We don't need continual learning for AGI. What top labs are currently doing

Many people think that we won't reach AGI, or even ASI, if LLMs don't have something called "continual learning": the ability for an AI to learn on the job, updating its neural weights in real time and getting smarter without forgetting everything else it knows (catastrophic forgetting). This is what we do every day, without much effort.

What's interesting is that if you look at what the top labs are doing, they've stopped trying to solve the underlying math of real-time weight updates. Instead, they're simply brute-forcing it. That is exactly why, in the past ~3 months, there has been a step-function increase in how good the models have gotten.

Long story short, the gist of it is: if you combine

very long context windows

reliable summarization

structured external documentation,

you can approximate a lot of what people mean by continual learning.

How it works: the model does a task and absorbs a massive amount of situational detail. Then, before it "hands off" to the next instance of itself, it writes two things: short "memories" (always carried forward in the prompt/context) and long-form documentation (stored externally, retrieved only when needed). The next run starts with these notes, so it doesn't have to start from scratch.
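A minimal sketch of what such a handoff could look like, assuming a hypothetical schema: short memories are always injected into the next prompt, while long-form docs live in an external store and are pulled in only when a (here, toy keyword-based) retriever decides they are relevant. All names and the retrieval logic are illustrative, not any lab's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """State one agent run passes to the next (hypothetical schema)."""
    memories: list[str] = field(default_factory=list)   # short notes, always in-context
    docs: dict[str, str] = field(default_factory=dict)  # long-form docs, retrieved on demand

def keyword_retrieve(task: str, docs: dict[str, str]) -> list[str]:
    """Toy retriever: return titles of docs whose title words overlap the task."""
    words = set(task.lower().split())
    return [title for title in docs if set(title.lower().split()) & words]

def build_prompt(task: str, handoff: Handoff, retrieve=keyword_retrieve) -> str:
    """Assemble the next run's prompt: task + carried memories + retrieved docs."""
    parts = [f"Task: {task}"]
    parts += [f"Memory: {m}" for m in handoff.memories]   # always carried forward
    for title in retrieve(task, handoff.docs):            # only relevant docs
        parts.append(f"Doc [{title}]:\n{handoff.docs[title]}")
    return "\n\n".join(parts)
```

In a real system the retriever would be embedding-based rather than keyword matching, but the split is the same: cheap always-on memories, expensive retrieved docs.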

The clever part is that labs can train this behaviour directly with a reinforcement learning (RL) loop, without any exotic new theory.

They treat memory-writing as an RL objective: after a run, have the model write memories/docs, then spin up new instances on the same, similar, and dissimilar tasks while feeding those memories back in. Performance is scored across the whole sequence, with an explicit penalty on memory length so you don't get endless "notes" that eventually blow up the context window.

Over many iterations, you reward models that (a) write high-signal memories, (b) retrieve the right docs at the right time, and (c) edit/compress stale notes instead of mindlessly accumulating them.
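The scoring described above can be sketched as a simple reward function: average task performance across the follow-up runs, minus a penalty proportional to how many tokens the written memories consume. The penalty weight and all numbers here are illustrative assumptions, not anyone's published training setup.

```python
def sequence_reward(task_scores: list[float], memory_tokens: int,
                    length_penalty: float = 0.001) -> float:
    """Score a memory-writing episode: how well did downstream runs do,
    discounted by the size of the memories they had to carry?
    (Illustrative sketch; penalty weight is a made-up hyperparameter.)"""
    avg_performance = sum(task_scores) / len(task_scores)
    return avg_performance - length_penalty * memory_tokens
```

Because longer memories only pay off if they actually raise downstream scores, optimizing this kind of objective pushes the model toward the behaviours listed above: high-signal notes, timely retrieval, and compressing stale material rather than hoarding it.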

This is pretty crazy, because when you combine it with the current release cadence of frontier labs, where each new model ships after major post-training and scaling improvements, even a deployed instance that never updates its weights in real time can still "get smarter" when the next version ships, AND it can inherit all the accumulated memories/docs from its predecessor.

This is a new force multiplier, another scaling paradigm, and likely what the top labs are doing right now (source: TBA).

Ignoring any black-swan-level event (unknown unknowns), you get a plausible 2026 trajectory:

We're going to see more and more improvements on an accelerating timeline. The top labs ARE, in effect, using continual learning (a really good approximation of it), and because they are directly training this approximation, it rapidly gets better and better.

Don't believe me? Look at what both OpenAI (https://openai.com/index/introducing-openai-frontier/) and Anthropic (https://resources.anthropic.com/2026-agentic-coding-trends-report) have named as their core areas of focus. It's exactly why governments and corporations are bullish on this; there is no wall…

6 points | by kok14 11 hours ago

4 comments

  • Imanari 5 hours ago
    I don't know who you are or how you are so sure about 'what top labs are actually doing', but I have a similar feeling about the issue. The models don't have to 'actually learn'; the setup has to approximate 'actual learning' just well enough to be useful.

    > AND it can inherit all the accumulated memories/docs from its predecessor.

    So we are talking about a whole system, not just the model? Reminds me of something I heard a while back 'AGI will be a product, not a model'

  • aristofun 7 hours ago
    How does the model reliably know if notes and memories were good or bad and update accordingly?
  • uaas 8 hours ago
    Would a human end a long-winded text with…?
  • potsandpans 4 hours ago
    I have been experimenting with hands-off-keyboard, agent-driven implementations of non-trivial tasks, and I'm finding that the same pattern you're outlining proves to be very useful:

    Explore, summarize, plan, handoff. That info distillation loop seems to be quite effective at keeping agents on task. I've been considering adding memories between agents, but haven't thought of a good data model yet.

    The problem with something like memories is that I've observed that when context gets polluted, agents get confused, especially distilled models like MiniMax and Kimi. So the challenge is ensuring that only "relevant" memories are pulled into context for a given task.