31 comments

  • Frannky 36 minutes ago
    Opus 4.6 is nuts. Everything I throw at it works. Frontend, backend, algorithms—it does not matter.

    I start with a PRD, ask for a step-by-step plan, and just execute one step at a time. Sometimes its ideas are dumb, but checking and guiding step by step helps it ship working things in hours.

    It was also the first AI that made me feel, "Damn, this thing is smarter than me."

    The other crazy thing is that with today's tech, these things can be made to work at 1k tokens/sec with multiple agents working at the same time, each at that speed.

    • koreth1 3 minutes ago
      I wish I had this kind of experience. I threw a tedious but straightforward task at Claude Code using Opus 4.6 late last week: find the places in a React code base where we were using useState and useEffect to calculate a value that was purely dependent on the inputs to useEffect, and replace them with useMemo. I told it to be careful to only replace cases where the change did not introduce any behavior changes, and I put it in plan mode first.

      It gave me an impressive plan of attack, including a reasonable way to determine which code it could safely modify. I told it to start with just a few files and let me review; its changes looked good. So I told it to proceed with the rest of the code.

      It made hundreds of changes, as expected (big code base). And most of them were correct! Except the places where it decided to do things like put its "const x = useMemo(...)" call after some piece of code that used the value of "x", meaning I now had a bunch of undefined variable references. There were some other missteps too.
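
      To illustrate the failure shape (names invented, not the actual codebase):

        import { useEffect, useMemo, useState } from "react";

        // Before: derived state via useState + useEffect.
        function Before({ a, b }: { a: number; b: number }) {
          const [sum, setSum] = useState(a + b);
          useEffect(() => { setSum(a + b); }, [a, b]);
          return <span>{sum}</span>;
        }

        // The intended rewrite: same value, no extra render cycle.
        function After({ a, b }: { a: number; b: number }) {
          const sum = useMemo(() => a + b, [a, b]);
          return <span>{sum}</span>;
        }

        // The failure mode: the useMemo declaration landed below code
        // that reads the value. (Won't compile -- that's the point:
        // "Block-scoped variable 'sum' used before its declaration".)
        function Broken({ a, b }: { a: number; b: number }) {
          const label = `sum is ${sum}`; // error: used before declaration
          const sum = useMemo(() => a + b, [a, b]);
          return <span>{label}</span>;
        }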

      I tried to convince it to fix the places where it had messed up, but it quickly started wanting to make larger structural changes (extracting code into helper functions, etc.) rather than just moving the offending code a few lines higher in the source file. Eventually I gave up trying to steer it and, with the help of another dev on my team, fixed up all the broken code by hand.

      It probably still saved time compared to making all the changes myself. But it was way more frustrating.

    • Aperocky 8 minutes ago
      I did manage to get it into the classic AI loop once.

      It was a problem involving a discrete (i.e. turn-based) calculation for filling a topographical water basin with sedimentation, specifically the edge case where both water and sediment overflow the basin. To keep the matter simple: the facts were A, B, and C, and it oscillated between explanation 1, which refuted C; explanation 2, which refuted A; and explanation 3, which refuted B.

      I'll give it to Opus's training stability that all 3 of my tries consistently got into this loop, so I decided to directly order it to use a brute-force solution that avoided (but didn't solve) this problem.

      I did feel like with a human, there's no way that 3-way loop would survive to a second pass. Or at least for the majority of us. But there was just no way to get through to Opus 4.6.

    • dzink 21 minutes ago
      Opus 4.6 is AGI in my book. They won't admit it, but it's absolutely true. It shows initiative, not only getting things right but also adding improvements the original prompt didn't request that match the goals of the job.
      • winrid 17 minutes ago
        On the adding improvements and being helpful thing, isn't that part of the system prompt?
    • hrishikesh-s 5 minutes ago
      Opus 4.6 is so far ahead of the rest that I think Anthropic is the winner in a winner-take-all market.
    • eru 34 minutes ago
      > [...] with multiple agents working at the same time, each at that speed.

      Horizontal parallelising of tasks doesn't really require any modern tech.

      But I agree that Opus 4.6 with 1M context window is really good at lots of routine programming tasks.

      • travisgriggs 14 minutes ago
        Opus helped me brick my RPi CM4 today. It glibly apologized for telling me to use an e instead of a 6 in a boot loader sequence.

        Spent an hour or so unraveling the mess. My feelings are growing more and more conflicted about these tools. They're here to stay, obviously.

        I'm honestly uncertain about the junior engineers I'm working with, who are more productive than they might otherwise be but are gaining zero (or very little) experience. It's like the future is a world where the entire programming sphere is dominated by the clueless non-technical management that we've all had to deal with in small doses a time or two.

    • vessenes 29 minutes ago
      I'll put out a suggestion: pair it with Codex or Deepthink for audit and review. Opus is still prone to … enthusiastic architectural decisions. I promise you'll be at least thankful, and at most like 'wtf?', at some of the audit outputs.

      Also, a shout-out to beads from Yegge: I highly recommend pairing Opus with it. Opus can lay out a large project with beads, keep track of what to do next, and churn through the list beautifully with a little help.

    • interpol_p 9 minutes ago
      I had Opus 4.6 running on a backend bug for hours. It got nowhere. Turned out the problem was in AWS X-ray swizzling the fetch method and not handling the same argument types as the original, which led to cryptic errors.

      I had Opus 4.6 tell me I was "seeing things wrong" when I tried to have it correct some graphical issues. It got stuck in a loop of re-introducing the same bug every hour or so in an attempt to fix the issue.

      I'm not disagreeing with your experience, but in my experience it is largely the same as what I had with Opus 4.5 / Codex / etc.

  • dimitri-vs 8 hours ago
    The big change here is:

    > Standard pricing now applies across the full 1M window for both models, with no long-context premium. Media limits expand to 600 images or PDF pages.

    For Claude Code users this is huge - assuming coherence remains strong past 200k tok.

    • MikeNotThePope 3 hours ago
      Is it ever useful to have a context window that full? I try to keep usage under 40%, or about 80k tokens, to avoid what Dex Horthy calls the dumb zone in his research-plan-implement approach. Works well for me so far.

      No vibes allowed: https://youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ

      • furyofantares 2 hours ago
        I'd been on Codex for a while and with Codex 5.2 I:

        1) No longer found the dumb zone

        2) No longer feared compaction

        Switching to Opus for stupid political reasons, I still have not hit the dumb zone - but I'm back to disliking compaction events, so its smaller context window has really hurt.

        I hope they copy OpenAI's compaction magic soon, but I am also very excited to try the longer context window.

        • mgambati 1 hour ago
          1M context in OpenAI and Gemini is just marketing. Opus is the only model that provides real, usable big context.
          • furyofantares 48 minutes ago
            I'm directly conveying my actual experience to you. I have tasks that fill up Opus context very quickly (at the 200k context) and which took MUCH longer to fill up Codex since 5.2 (which I think had 400k context at the time).

            This is a direct comparison. I spent months subscribed to both of their $200/mo plans. I would try both, and Opus always filled up fast while Codex continued working great. It's also direct experience that Codex continues working great post-compaction since 5.2.

            I don't know about Gemini but you're just wrong about Codex. And I say this as someone who hates reporting these facts because I'd like people to stop giving OpenAI money.

            • dotancohen 31 minutes ago
              What's wrong with OpenAI?
              • furyofantares 9 minutes ago
                When Anthropic said they wouldn't sell LLMs to the government for mass surveillance or autonomous killing machines, and got labeled a supply chain risk as a result, OpenAI told the public they have the same policy as Anthropic while inking a deal with the government that clearly means "actually we will sell you LLMs for mass surveillance or autonomous killing machines but only if you tell us it's legal".

                If you already knew all that I'm not interested in an argument, but if you didn't know any of that, you might be interested in looking it up.

              • awakeasleep 12 minutes ago
                It's so difficult to have an earnest discussion about ethical matters like this that your question almost reads as flame bait.

                I don't think it is, though, so I just recommend you do a quick Google search or even ask a chatbot to list the ethical compromises.

          • hu3 1 hour ago
            Source? I ask because I use 500k+ context on these on a daily basis.

            Big refactorings guided by automated tests eat context window for breakfast.

            • 8note 1 hour ago
              I find Gemini gets real, real bad when you get far into the context - it gets into loops, forgets how to call tools, etc.
              • girvo 26 minutes ago
                I find Gemini does that normally, personally. Noticeably worse in my usage than either Claude or Codex.
              • petesergeant 15 minutes ago
                I find Gemini to be real bad. Are you just using it for price reasons, or?
          • johnebgd 49 minutes ago
            Codex high reasoning has been a legitimately excellent tool for generating feedback on every plan Claude opus thinking has created for me.
        • iknowstuff 1 hour ago
          Hmm I’ve felt the dumb zone on codex
          • nomel 51 minutes ago
            From what I've seen, it means whatever he's doing is very statistically significant.
      • kaizenb 32 minutes ago
        Thanks for the video.

        His fix for "the dumb zone" is the RPI Framework:

        ● RESEARCH. Don't code yet. Let the agent scan the files first. Docs lie. Code doesn't.

        ● PLAN. The agent writes a detailed step-by-step plan. You review and approve the plan, not just the output. Dex calls this avoiding "outsourcing your thinking." The plan is where intent gets compressed before execution starts.

        ● IMPLEMENT. Execute in a fresh context window. The meta-principle he calls Frequent Intentional Compaction: don't let the chat run long. Ask the agent to summarize state, open a new chat with that summary, and keep the model in the smart zone (a sketch of such a handoff prompt is below).
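
        A sketch of what that handoff prompt can look like (my wording, not Dex's):

          Before we stop: summarize (1) the goal, (2) decisions made and
          why, (3) files touched, (4) what remains, (5) known gotchas.
          Output one markdown block I can paste into a fresh session.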

        • girvo 27 minutes ago
          That's fascinating: that is identical to the workflow I've landed on myself.
          • hedora 1 minute ago
            It's also identical to what Claude Code does if you put it in plan mode (bound to <tab> key), at least in my experience.
      • SkyPuncher 2 hours ago
        Yes. I've recently become a convert.

        For me, it's less about being able to look back ~800k tokens. It's about being able to let a conversation flow a lot longer without forcing compaction. Generally, I really only need the most recent ~50k tokens, but having the old context sitting around is helpful.

        • hombre_fatal 2 hours ago
          Also, when you hit compaction at 200k tokens, that was probably when things were just getting good. The plan was in its final stage. The context had the hard-fought nuances discovered in the final moment. Or the agent just discovered some tiny important details after a crazy 100k token deep dive or flailing death cycle.

          Now you have to compact and you don’t know what will survive. And the built-in UI doesn’t give you good tools like deleting old messages to free up space.

          I’ll appreciate the 1M token breathing room.

          • roygbiv2 2 hours ago
            I've found compaction kills the whole thing. Important debug steps go completely missing, and the AI loops back round thinking it's found a solution when we've already done that step.
            • s900mhz 40 minutes ago
              I find it useful to make Claude track the debugging session with a markdown file. It’s like a persistent memory for a long session over many context windows.
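
              For example, a skeleton like this (hypothetical, adapt to taste):

                # Debug log: <issue>
                ## Facts established
                - ...
                ## Hypotheses ruled out
                - ...
                ## Current theory / next step
                - ...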

              Or make a subagent do the debugging and let the main agent orchestrate it over many subagent sessions.

            • garciasn 1 hour ago
              For me, Claude was like that until about 2 months ago. Now it rarely gets dumb after compaction like it did before.
              • 8note 1 hour ago
                Oh, I've found that something about compaction has been dropping everything that might be useful. Exact opposite experience.
            • myrak 1 hour ago
              [dead]
      • ogig 3 hours ago
        When running long autonomous tasks it's quite common to fill the context, even several times. You're out of the loop, so it just happens if Claude goes a bit in circles, or it needs to iterate over CI reds, or the task was too complex. I'm hoping a long context > small context + 2 compacts.
        • SequoiaHope 2 hours ago
          Yep I have an autonomous task where it has been running for 8 hours now and counting. It compacts context all the time. I’m pretty skeptical of the quality in long sessions like this so I have to run a follow on session to critically examine everything that was done. Long context will be great for this.
        • MikeNotThePope 2 hours ago
          I haven't figured out how to make use of tasks running that long yet, or maybe I just don't have a good use case for it yet. Or maybe I'm too cheap to pay for that many API calls.
          • ashdksnndck 2 hours ago
            My change cuts across multiple systems with many tests/static analysis/AI code reviews happening in CI. The agent keeps pushing new versions and waits for results until all of them come up clean, taking several iterations.
          • tudelo 2 hours ago
            I mean, if you don't have your company paying for it I wouldn't bother... We're talking sessions costing $500-1000.
        • boredtofears 2 hours ago
          All of those things are smells imo, you should be very weary of any code output from a task that causes that much thrashing to occur. In most cases it’s better to rewind or reset and adapt your prompt to avoid the looping (which usually means a more narrowly defined scope)
          • grafmax 2 hours ago
            A person has a supervision budget. They can supervise one agent in a hands-on way, or many mostly-hands-off agents. Even though there's some thrashing, the assistants still get farther as a team than a single micromanaged agent. At least that's my experience.
            • not_kurt_godel 1 hour ago
              Just curious, what kind of work are you doing where agentic workflows are consistently able to make notable progress semi-autonomously in parallel? Hearing people are doing this, supposedly productively/successfully, kind of blows my mind given my near-daily in-depth LLM usage on complex codebases spanning the full stack from backend to frontend. It's rare for me to have a conversation where the LLM (usually Opus 4.6 these days) lasts 30 minutes without losing the plot. And when it does last that long, I usually become the bottleneck in terms of having to think about design/product/engineering decisions; having more agents wouldn't be helpful even if they all functioned perfectly.
              • avereveard 1 hour ago
                I've passed that bottleneck with a review task that produces engineering recommendations along six axes (encapsulation, decoupling, simplification, deduplication, security, reducing documentation drift) and an ideation task that gives, per component, a new feature idea, an idea to improve an existing feature, and an idea to expand a feature to be more useful. These two generate constant bulk work that I move into new chats, grouped by changeset and sent to subagents to protect the context window.

                What I'm doing mostly these days is maintaining a goal.md (project direction) and a spec.md (coding and process standards, global across projects), plus developing new macro tasks; I've got one in the works that's meant to automatically build PNG mockups and self-review.

                • not_kurt_godel 54 minutes ago
                  What are you using to orchestrate/apply changes? Claude CLI?
          • chrisweekly 2 hours ago
            weary (tired) -> wary (cautious)
          • saaaaaam 2 hours ago
            Wary, not weary. Wary: cautious. Weary: tired.
      • Barbing 17 minutes ago
        Looking at this URL: typo, or did YouTube flip the si tracking parameter?

          youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ
      • ricksunny 2 hours ago
        Since I've yet to seriously dive into vibe coding or AI-assisted coding: does the IDE experience offer a running tally of the context size, so you know when you're getting close to or entering the "dumb zone"?
        • MikeNotThePope 1 hour ago
          The 2 I know, Cursor and Claude Code, will give you a percentage used for the context window. So if you know the size of the window, you can deduce the number of tokens used.
        • 8note 1 hour ago
          Cline gives you such a thing. You don't really know where the dumb zone is by the numbers, though, only by feel.
        • stevula 2 hours ago
          Most tools do, yes.
        • quux 2 hours ago
          OpenCode does this. Not sure about other tools
        • nujabe 2 hours ago
          > Since I'm yet to seriously dive into vibe coding or AI-assisted coding

          Unless you’re using a text editor as an IDE you probably have already

      • dimitri-vs 2 hours ago
        It's kind of like having a 16 gallon gas tank in your car versus a 4 gallon tank. You don't need the bigger one the majority of the time, but the range anxiety that comes with the smaller one and annoyance when you DO need it is very real.
        • steve-atx-7600 2 hours ago
          It seems possible, say a year or two from now, that context becomes more like a smart human with a "small" vs "medium" vs "large" working memory. The small fellow can play some popular songs on the piano, the medium one plays in an orchestra professionally, and the x-large is like Wagner composing the Der Ring marathon opera. This is my current, admittedly not-well-informed mental model, anyway. Well, at least we know we've got a little more time before the singularity :)
          • twodave 1 hour ago
            It’s more like the size of the desk the AI has to put sheets of paper on as a reference while it builds a Lego set. More desk area/context size = able to see more reference material = can do more steps in one go. I’ve lately been building checklists and having the LLM complete and check off a few tasks at a time, compacting in-between. With a large enough context I could just point it at a PLAN.md and tell it to go to work.
        • scwoodal 2 hours ago
          Except after 4 gallons it might as well be pure oil, mucking everything up.
      • maskull 2 hours ago
        After running a context window up high (probably near 70% on Opus 4.6 High) and watching it take 20% bites out of my 5-hr quota per prompt, I've been experimenting with dumping context after completing a task. Seems to be working OK. I wonder if I was running into the long-context premium. Would that apply to Pro subs, or is it just relevant to API pricing?
      • saaaaaam 2 hours ago
        That video is bizarre. Such a heavy breather.
      • bushbaba 48 minutes ago
        Yes. I’ve used it for data analysis
      • twodave 1 hour ago
        I mean, try using Copilot on any substantial back-end codebase and watch it eat 90+% just building a plan/checklist. Of course, Copilot is constrained to 120k, I believe? So having 10x that will blow open some doors that have been closed for me in my work so far.

        That said, 120k is pleeenty if you’re just building front-end components and have your API spec on hand already.

    • a_e_k 2 hours ago
      I've been using the 1M window at work through our enterprise plan as I'm beginning to adopt AI in my development workflow (via Cline). It seems to have been holding up pretty well until about 700k+. Sometimes it would continue to do okay past that, sometimes it started getting a bit dumb around there.

      (Note that I'm using it in more of a hands-on pair-programming mode, and not in a fully-automated vibecoding mode.)

    • hagen8 3 hours ago
      Well, the question is what is contributing to the usage, because as the context grows, the number of input tokens increases. A model call with 800K tokens of input is 8 times more expensive than one with 100K tokens of input. Especially if we resume a conversation and the cache doesn't hit, it would be very expensive at API pricing.
    • chatmasta 2 hours ago
      So a picture is worth 1,666 words?
    • islewis 3 hours ago
      The quality with the 1M window has been very poor for me, specifically for coding tasks. It constantly forgets stuff that has happened in the existing conversation. n=1, ymmv
      • robwwilliams 43 minutes ago
        Yes, especially with shifts in focus over a long conversation. But given the high error rates of Opus 4.6 over the last few weeks, it's possibly due to other factors. Conversational and code prodding has been essential.
  • convenwis 8 hours ago
    Is there a writeup anywhere on what this means for effective context? I think that many of us have found that even when the context window was 100k tokens the actual usable window was smaller than that. As you got closer to 100k performance degraded substantially. I'm assuming that is still true but what does the curve look like?
    • esperent 1 hour ago
      > As you got closer to 100k performance degraded substantially

      In practice, I haven't found this to be the case at all with Claude Code using Opus 4.6. So maybe it's another one of those things that used to be true, and now we all expect it to be true.

      And of course when we expect something, we'll find it, so any mistakes at 150k context use get attributed to the context, while the same mistake at 50k gets attributed to the model.

    • tyleo 3 hours ago
      I mentioned this at work, but context still rots at the same rate: 90k tokens consumed gives just as bad results in a 100k context window as in a 1M one.

      Personally, I’m on a 6M+ line codebase and had no problems with the old window. I’m not sending it blindly into the codebase though like I do for small projects. Good prompts are necessary at scale.

    • minimaxir 8 hours ago
      The benchmark charts provided are the writeup. Everything else is just anecdata.
    • FartyMcFarter 3 hours ago
      Isn't transformer attention quadratic in complexity in terms of context size? Self-attention over n tokens builds an n×n score matrix, so going from 200k to 1M tokens naively means 25x the attention compute. To achieve a 1M-token context I think these models have to be employing a lot of shortcuts.

      I'm not an expert, but maybe this explains context rot.

      • vlovich123 2 hours ago
        Nope, there are no tricks, unless there have been major architectural shifts I missed. The rot doesn't come from inference tricks to bring down the quadratic cost of attention over the KV cache. Task performance problems are generally a training problem: the longer the context, the fewer examples you have to train on. So how do you train the model to behave well? That's where the tricks are. I believe most of it relies on synthetically generated data, if I'm not mistaken, which explains the rot.
  • minimaxir 8 hours ago
    Claude Code 2.1.75 now no longer distinguishes between base Opus and 1M Opus: it's the same model. Oddly, I have Pro, where the change is supposedly only for Max+, but I'm still seeing this to be the case.

    EDIT: I don't think Pro has access to it; a typical prompt just hit the context limit.

    The removal of extra pricing beyond 200k tokens may be Anthropic's salvo in the agent wars against GPT 5.4, whose 1M window still carries extra pricing.

    • zaptrem 2 hours ago
      I have Max 20x and they're still separate on 2.1.75.
    • auggierose 3 hours ago
      No change for Pro, just checked it, the 1M context is still extra usage.
  • wewewedxfgdf 3 hours ago
    The weirdest thing about Claude pricing is that their 5x plan is 5 times the cost of the previous plan.

    Normally buying the bigger plan gives some sort of discount.

    At Claude, it's just "5 times more usage 5 times more cost, there you go".

    • apetresc 2 hours ago
      Those sorts of volume discounts are what you do when you're trying to incentivize more consumption. Anthropic already has more demand than they're logistically able to serve at the moment (look at their uptime chart; it's barely even one 9 of reliability). For them, 1 user consuming 5 units of compute is less attractive than 5 users consuming 1 unit.

      They would probably implement _diminishing_-value pricing if pure pricing efficiency was their only concern.

    • auggierose 3 hours ago
      It is not the plan they want you to buy. It is a pricing strategy to get you to buy the 20x plan.
      • radley 3 hours ago
        5x Max is the plan I use because the Pro plan limits out so quickly. I don't use Claude full-time, but I do need Claude Code, and I do prefer to use Opus for everything because it's focused and less chatty.
        • auggierose 2 hours ago
          Sure, I get it. For me a 2x Max would be ideal and usually enough. Now, guess why they are not offering that?
          • prtmnth 42 minutes ago
            Same here. I'd love a 2x Max plan! More than enough usage for my needs.
    • tclancy 27 minutes ago
      We’ll make it up on volume.
    • operatingthetan 3 hours ago
      I think they are both subsidized so either is a great deal.
    • Zambyte 3 hours ago
      5 for 5
  • vessenes 3 hours ago
    This is super exciting. I've been poking at it today, and it definitely changes my workflow -- I feel like a full three or four hour parallel coding session with subagents is now generally fitting into a single master session.

    The stats claim Opus at 1M is about like 5.4 at 256k -- these needle-in-a-haystack long-context tests don't always track quality of reasoning, sadly -- but this is still a significant improvement, and I haven't seen dramatic falloff in my tests, unlike with Q4 '25 models.

    p.s. what's up with sonnet 4.5 getting comparatively better as context got longer?

    • steve-atx-7600 2 hours ago
      Did it get better? I used Sonnet 4.5 1M frequently, and my impression was that it was around the same performance but a hell of a lot faster, since the 1M model was willing to spend more tokens at each step vs preferring more token-cautious tool calls.
      • vessenes 1 hour ago
        Opus 4.6 is wayy better than sonnet 4.5 for sure.
    • mattfrommars 2 hours ago
      Random: are you personally paying for Claude Code, or is it paid for by your employer?

      My employer only pays for GitHub copilot extension

      • kiratp 58 minutes ago
        GitHub Copilot CLI lets you use all these models (unless your employer disables them).

        https://github.com/features/copilot/cli

        Disclosure: work at Msft

        • tclancy 25 minutes ago
          Disclosure: have to use them via Copilot at work. Be glad I don't write code for nuclear plants. Why does it have to be so hard? Doubly so in JetBrains IDEs, but I've a feeling that's on both of you rather than just you personally. But I still resent you now.
      • celestialcheese 2 hours ago
        Both. Employer pays for work max 20x, i pay for a personal 10x for my side projects and personal stuff.
  • throw03172019 22 minutes ago
    The Pentagon may switch to Claude, knowing OpenAI charges premium rates for 1M context.
  • causalzap 51 minutes ago
    I've been using Opus 4.5 for programmatic SEO and localizing game descriptions. If 4.6 truly improves context compaction, it could significantly lower the API costs for large-scale content generation. Has anyone tested its logic consistency on JSON output compared to 4.5?
  • aragonite 2 hours ago
    Do long sessions also burn through token budgets much faster?

    If the chat client is resending the whole conversation each turn, then once you're deep into a session every request already includes tens of thousands of tokens of prior context. So a message at 70k tokens into a conversation is much "heavier" than one at 2k (at least in terms of input tokens). Yes?

    • dathery 1 hour ago
      That's correct. Input caching helps, but even then, at e.g. 800k tokens with all of them cached, the API price is $0.50/MTok * 0.8 MTok = $0.40 per request, which adds up really fast. A "request" can be e.g. a single tool-call response, so you can easily end up making many $0.40 requests per minute.
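
      Back-of-the-envelope sketch (the $0.50/MTok cache-read rate is from above; the output rate is my assumption, so check the current price sheet):

        // Rough per-request cost with a fully cached prefix.
        const CACHE_READ_PER_MTOK = 0.5; // cached input read, USD per million tokens
        const OUTPUT_PER_MTOK = 25;      // model output (assumed rate)

        function turnCostUSD(cachedTokens: number, outputTokens: number): number {
          return (cachedTokens / 1e6) * CACHE_READ_PER_MTOK +
                 (outputTokens / 1e6) * OUTPUT_PER_MTOK;
        }

        // 800k cached context + ~1k of output per tool-call round trip:
        console.log(turnCostUSD(800_000, 1_000)); // ~0.425
        // A couple dozen such calls per prompt: ~24 * $0.43 ≈ $10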
      • acjohnson55 48 minutes ago
        Interesting, so a prompt that causes a couple dozen tool calls will end up costing in the tens of dollars?
    • jasondclinton 1 hour ago
      If you use context caching, it saves quite a lot on costs/budgets. You can cache 900k tokens if you want.
  • pixelpoet 3 hours ago
    Compared to yesterday, my Claude Max subscription burns usage like absolutely crazy (13% of weekly usage from a fresh reset today, with just a handful of prompts on two new C++ projects, no deps) and has become unbearably slow (as in 1 hr for a prompt response). GGWP Anthropic; it was great while it lasted, but this isn't worth the hundreds of dollars.
    • Spooky23 3 hours ago
      Yeah, morning eastern time Claude is brutal.
  • dkpk 33 minutes ago
    Is this also applicable to usage in the Claude web / mobile apps for chat?
  • chaboud 2 hours ago
    Awesome.... With Sonnet 4.5, I had Cline soft trigger compaction at 400k (it wandered off into the weeds at 500k). But the stability of the 4.6 models is notable. I still think it pays to structure systems to be comprehensible in smaller contexts (smaller files, concise plans), but this is great.

    (And, yeah, I'm all Claude Code these days...)

  • margorczynski 3 hours ago
    What about response coherence with longer context? Usually, in other models with such big windows, I see quality drop rapidly past a certain point.
  • 8note 1 hour ago
    I'm guessing this is why the compacts have started sucking? I just finished building some nicer tools for manipulating the graph so I could compact less frequently and fish out context from the prior session.

    Maybe it'll still be useful, though I only have Opus at 1M, not Sonnet yet.

  • vicchenai 3 hours ago
    The no-degradation-at-scale claim is the interesting part. Context rot has been the main thing limiting how useful long context actually is in practice — curious to see what independent evals show on retrieval consistency across the full 1M window.
    • apetresc 2 hours ago
      I don't think they're claiming "no degradation at scale", are they? They still report a 91.9 -> 78.3 drop. That's just a smaller drop than everyone else's (is the claim).
  • arjie 2 hours ago
    This is fantastic. I keep having to save to memory with instructions and then tell it to restore to get anywhere on long running tasks.
  • thunkle 2 hours ago
    Just have to ask. Will I be spending way more money since my context window is getting so much bigger?
  • swader999 2 hours ago
    I notice Claude steadily consuming fewer tokens every week too, especially with tool calling.
  • aliljet 3 hours ago
    Are there evals showing how this improves outputs?
    • apetresc 2 hours ago
      Improves outputs relative to what? Compared to previous contexts of 1M, it improves outputs by allowing them to exist (because previously you couldn't exceed 200K). Compared to contexts of <200K, it degrades outputs rather than improves them, but that's what you'd expect from longer contexts. It's still better than compaction, which was previously the alternative.
  • johnwheeler 3 hours ago
    This is incredible. I just blew through $200 last night in a few hours on 1M context. This is like the best news I've heard all year with regard to my business.

    What is OpenAI's response to this? Do they even have a 1M context window, or is it still opaque and "depends on the time of day"?

    • hagen8 3 hours ago
      Did you use the API or a subscription?
      • johnwheeler 3 hours ago
        Max subscription and "extra usage" billing
        • steve-atx-7600 2 hours ago
          That sounds high. I mean, if you paid for the 20x Max plan you'd be capped at around $200/month, and at least for me, as a professional engineer running a few Claudes in parallel all day, I haven't exceeded the plan's limits.
          • Wowfunhappy 1 hour ago
            Prior to this announcement, all 1M context use consumed "extra usage", it wasn't included in a normal subscription plan.
    • dominotw 3 hours ago
      I rarely go over 25 percent in Codex, but I hit 80 on Claude Code in just a short time.
  • LoganDark 39 minutes ago
    Finally, I don't have to constantly reload my Extra Usage balance when I already pay $200/mo for their most expensive plan. I can't believe they even did that. I couldn't use 1M context at all because I already pay $200/mo and it was going to ask me for even more.

    Next step should be to allow fast mode to draw from the $200/mo usage balance. Again, I pay $200/mo; I should at least be able to send a single message without being asked to cough up more (one message in fast mode costs a few dollars). One would think $200/mo would give me some measure of ability to use their more expensive capabilities, but it seems it's bucketed to only the capabilities that are offered even to free users.

  • 8cvor6j844qw_d6 3 hours ago
    Oh nice, does this mean less of a game of /compact, /clear, and updating CLAUDE.md with Claude Code?
    • fnordpiglet 3 hours ago
      I've been using 1M for a while; it defers compaction and almost makes it worse when it does happen, since compacting a context that big loses a ton of fidelity. But I've taken to just editing the context instead (double esc). I'm also planning to build an agent to slice the session logs into the contextually useful and the useless, discarding the useless and keeping things high fidelity that way. (I.e., carve up the jsonl with a script and have a Haiku subagent return the relevant parts, then reconstruct the jsonl.)
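
      The carve-up step could look roughly like this (field names and the usefulness test are placeholders; the real session jsonl schema differs):

        import { readFileSync, writeFileSync } from "node:fs";

        // Split a session log into records, ask a cheap model which ones
        // still matter, and write back only the keepers.
        type Entry = { role: string; content: string };

        const entries: Entry[] = readFileSync("session.jsonl", "utf8")
          .split("\n")
          .filter(Boolean)
          .map((line) => JSON.parse(line));

        // Placeholder for the Haiku subagent call: return true if the
        // entry is still contextually useful for the current task.
        async function isUseful(entry: Entry): Promise<boolean> {
          return entry.content.trim().length > 0;
        }

        const kept: Entry[] = [];
        for (const e of entries) {
          if (await isUseful(e)) kept.push(e);
        }

        writeFileSync(
          "session.pruned.jsonl",
          kept.map((e) => JSON.stringify(e)).join("\n") + "\n",
        );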
      • dominotw 3 hours ago
        TIL you can edit context. I keep a running log and /clear, then reload the log.
        • 8note 58 minutes ago
          Double escape gets you a rewind; I'm not sure about much else.

          The conversation history is a linked list, so you can screw with it, with some care.

          I spent this afternoon building an MCP to break the conversation up into topics, then suggest ones that aren't useful but are taking up a bunch of context to remove (e.g. iterations through build/edit just need the end result).

          It's gonna take a while before I'm confident it's worth sharing.

  • zmmmmm 3 hours ago
    Noticed this just now - all of a sudden I have a 1M context window (!!!) without changing anything. It's actually slightly disturbing, because this IS a behavior change. Don't get me wrong, I like having longer context, but we really need to pin down behavior for how things are deployed.
    • steve-atx-7600 2 hours ago
      You can pin to specific models with --model. Check out their docs: https://support.claude.com/en/articles/11940350-claude-code-.... You can also pin to a less specific tag like sonnet-4.5[1m] (that's from memory, might be a little off).
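
      E.g. (untested, and the tag is from memory as noted, so verify against the docs):

        claude --model 'sonnet-4.5[1m]'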
      • zmmmmm 1 hour ago
        Sure - but the model hasn't changed; I'm specifying it explicitly. Yet suddenly the context window has. I'm not using Claude Code; this is an application built against the Bedrock APIs. I assume there's a way I could specify the context window and I'm just using API defaults. But it definitely makes me wonder what else I'm not controlling that I really should be.
    • phist_mcgee 3 hours ago
      Anthropic is famous for changing things under your feet. Claude code is basically alpha software with a global footprint.
  • alienbaby 1 hour ago
    Is this the market playing out in front of our eyes, slice by slice? OK, maybe not, but watching these entities duke it out is kinda amusing. There will be consequences, but we may as well sit back for the ride; who knows where we're going?
  • nemo44x 1 hour ago
    Has anyone started a project to replace Linux yet?
  • dominotw 3 hours ago
    Can someone tell me how to make this instruction work in Claude Code:

    "put high level description of the change you are making in log.md after every change"

    It works perfectly in Codex, but I just can't get Claude to do it automatically. I always have to ask, "did you update the log".

  • gaigalas 3 hours ago
    I'm getting close to my goal of fitting an entire bootstrappable-from-source system's source code into context and just telling Claude "go ahead, make it better".
  • sunilgentyala 1 hour ago
    [dead]
  • sriramgonella 2 hours ago
    [dead]
  • sysutil_dev 1 hour ago
    [dead]