This is already the case for many startups. In fact, the figure might be closer to 100%. The work shifts to requirements analysis, high-level specifications, and final review instead (after AI code review).
This is really a zero information blog post. I want to know how they use the LSP to improve their understanding of the code base. Would be great if it was open source for us to review.
A post like this should be providing people with some reassurance about Claude's ability to understand code at a large scale. It's mostly fluff.
> Claude Code navigates a codebase the way a software engineer would: it traverses the file system, reads files, uses grep to find exactly what it needs, and follows references across the codebase. It operates locally on the developer’s machine and doesn’t require a codebase index to be built, maintained, or uploaded to a server....
> Agentic search avoids those failure modes. There's no embedding pipeline or centralized index to maintain as thousands of engineers commit new code. Each developer's instance works from the live codebase.
The frame of "the way a software engineer would" and the conclusion seem at odds. I'd love to be schooled otherwise?
I use autocomplete/LSPs all the time and they're useful. That's an index? Why wouldn't Claude be able to use one? Also a "software engineer" remembers the codebase - that's definitely a RAG. I have a lot of muscle memory to find the file I need through an auto-completed CMD+P.
It doesn't need to particularly be real-time across thousands of engineers -- just the branch I'm on.
It's rare that I'd be navigating a codebase from first-principles traversal. It would usually be a new codebase and in those cases it's definitely not what I'd call an optimal experience.
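To make the trade-off in this exchange concrete, here is a minimal Python sketch contrasting a grep-style scan with a small prebuilt symbol index. The names `grep_scan` and `build_index` and the definition regex are hypothetical, chosen for illustration; this is not how Claude Code or any LSP server is actually implemented.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical, deliberately naive pattern for Python definitions.
DEF_RE = re.compile(r"^\s*(?:def|class)\s+([A-Za-z_]\w*)", re.MULTILINE)


def grep_scan(root: Path, symbol: str) -> list[tuple[str, int]]:
    """Traversal style: re-read every file on every query, nothing is cached."""
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            if pattern.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits


def build_index(root: Path) -> dict[str, list[tuple[str, int]]]:
    """Index style: one pass up front, after which lookups are dict reads."""
    index = defaultdict(list)
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for match in DEF_RE.finditer(text):
            # Line number of the symbol name itself, counted from 1.
            lineno = text.count("\n", 0, match.start(1)) + 1
            index[match.group(1)].append((path.relative_to(root).as_posix(), lineno))
    return dict(index)
```

The scan always sees the live files but pays the full read cost on every question. The index answers instantly but goes stale until it is rebuilt, which is cheap if it only has to track the branch you are on.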
> Claude Code is running in production across multi-million-line monorepos, decades-old legacy systems, distributed architectures spanning dozens of repositories (…)
So it is optimized for the general case, using robust tooling that works everywhere, especially when large & messy.
That being said, your remark is right, and for well-organised smaller repos there’s better tooling it can and should use. But I think it does; at least Codex does in my case, so I guess Claude does too. For example, Codex uses ‘go doc’ first before doing greps.
Even if there is first-principles traversal of some parts of the codebase, there are other bits that definitely do not change, and where exploring every time is a massive waste of tokens. My arguments with claude often have to do with making it explore a lot less, because I know better, and faster, than its slow, expensive navigation of things that basically never change. And it just goes into the same kind of rabbit holes every time.
I don't have any LSPs hooked up to CC yet (going to fix that today), or particularly sophisticated CLAUDE.md files.
So, if I've read this post correctly, that means that CC is navigating my codebase today by sending lots of it up to a model, and building an understanding. Is that correct? Did I misunderstand it?
I kinda suspected there was more local inference going on somehow -- partly because the iteration times are fairly fast.
Interesting that MCP was mentioned over CLI. For production or controlled environments, I would not make MCP the deployment path. I would let MCP help generate or choose commands, but have the actual deployment go through CLI scripts, Git commits, and CI/CD approval.
the fishing: 1) install the official `skill-creator`; 2) use that with the above link to create `claude-md-improver`; 3) improve the skill by tasking claude with researching the topic of `progressive-disclosure`, in the official docs; 4) point the new skill at your CLAUDE.md file and accept the changes
How very interesting. In an industry where things shift around in months if not weeks, there’s been not only enough time for clear patterns to emerge, but these patterns have also proven successful on large codebases. What are the success criteria? Didn’t delete production database? Team velocity has increased? Codebase TTL has increased? Operations guys are happier?
I still say if this happens to you with AI tooling, that's both a failure on you and your org for giving a developer prod credentials that could nuke production resources. I don't think I've worked in a place that gave me this level of blind access.
I have only worked in startups and I have been an early engineer in both of them. I would always get high privileges within a short time where I would have the access to create and delete resources. I don't think it's that uncommon.
We kinda need to architect things with the assumption that all token-output from an LLM can be unpredictably sneaky and malicious.
Alas, humans suck at constant vigilance, we're built to avoid it whenever possible, so a "reverse centaur" future of "do what the AI says but only if you see it's good" is going to suck.
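One way to avoid relying on constant human vigilance is a mechanical, deny-by-default gate in front of anything the model wants to run. Below is a minimal Python sketch; the `ALLOWED` table and `is_allowed` function are hypothetical, and a real policy would need to be far stricter (for example, `cat` on an arbitrary path can still read secrets).

```python
import shlex

# Hypothetical allowlist: program -> permitted subcommands (None = any arguments).
ALLOWED = {
    "git": {"status", "diff", "log"},
    "ls": None,
    "cat": None,
}

# Characters that enable chaining, substitution, or redirection in a shell.
SHELL_METACHARACTERS = ";|&$`><\n"


def is_allowed(command: str) -> bool:
    """Deny by default: only exact, known-safe command shapes pass."""
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse failures are rejected.
        return False
    if not argv or argv[0] not in ALLOWED:
        return False
    subcommands = ALLOWED[argv[0]]
    if subcommands is None:
        return True
    return len(argv) > 1 and argv[1] in subcommands
```

Anything that fails the gate gets escalated to a human, so review effort is spent only on the unusual cases rather than on every line of output.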
The first step I do when I do any meaningful side project is to set up rds with snapshots. So any startup that doesn't do this one basic step already deserves to fail in my opinion.
Then next I've used AI agents like crazy, we even have linked mcp servers that let it query on the dev database. Haven't seen it try deleting everything a single time. I haven't seen any agent try to do anything destructive. Ever. Perhaps it's just reflecting an outrageously bad engineer and nothing else.
Exactly. So is that level of obvious hygiene where the bar is, or is it somewhere else? What ticks me off is the audacity of blanket claims without even a remote attempt to state why this is called a list of successful patterns and what success means. We’re just supposed to eat it up, because, you know, Claude.
Disagree, but also what do you classify as local storage? Does the repo “size” include all projects or just one? What about multiple branches? How much capacity is local storage?
A stock Unreal Engine project is several hundred gigs, consists of multiple solutions, multiple languages, and I would classify as large personally.
Without some kind of indexing it’s very awkward to work with and very slow. To work with LLMs and Unreal projects we create a local index, that index file alone is 46GB.
Without distributed compilers and caches it can take multiple hours to compile the main solution per platform (usually PC, Linux, Xbox, PlayStation, Switch, and sometimes mobile).
So the codebase easily fits on local storage so long as you don’t count assets (those are several TB), and even more so source assets (10s of TB), and that’s per stream per large project.
Anyways, point is I disagree and think Unreal Engine is an example of a large codebase that fits locally.
Wondering if enterprises have a modified version of CC that doesn't have to optimize to stop bleeding on fixed cost subscription plans.
The article really does not align with the current sentiment. Everyone with a choice has mostly moved on to codex (ofc in this world all it takes is a model update/harness update to turn things around).
CC is great at a lot of things, but it repeatedly misses reading crucial parts of the codebase, hallucinates about the work that was done, and has a bunch of other issues.
The influencer economy trades on hype, on frenzy, and ultimately, eyeballs. The more the better.
They want you to feel like you’re missing out. They want you to switch. Being boring is far more productive. Pin your versions. Stick to stable releases and avoid the nightlies.
Significant noise created from 4.6 to 4.7 Opus transition has caused some to interpret this as signal. Excluding certain genuine and real bugs, the noise about perceived quality falling dramatically was noise. Influencers doing influencing turned it into “signal”. The reality was that if you had strong planning and spec driven development it ranged from manageable to non-existent.
The vast majority of the people I know and work with have not switched off CC or their Max sub.
I have a choice and have not moved to codex (100/mo personal + my employer pays for a subscription). I try codex here and there and it seems to go off the rails every time. I have had some good experiences with codex, but generally trying to get something big accomplished it doesn't work out.
But I may not have paid enough to get the full real experience with codex
I use codex at home for 20 bucks a month; the limits are very high relative to the price. Maybe the gravy train ends soon for these, and then it's probably on to OpenRouter Chinese models.
At work it's CC or sometimes codex, personally don't see much difference at all and most normies will notice none. The cultists have their opinions.
What bleeding? Anthropic wants as much of that "bleeding" as possible. The interaction data gathered from genuine human CC subscription usage of their models goes directly into their RL training, it's invaluable and they are more than happy to lose money on the inference to get it. That data is what xAI was recently willing to pay $10b to cursor to get.
They want you to use Claude Code. They hate other UI surfaces like OpenCode etc purely because they lose control over that data, so they're subsidizing the inference without getting what they actually want, the data (they still get some of it of course, but it's much less ergonomic for them. Those tools often abstract away the subagent calls, for example). OpenCode can collect that data themselves, so by allowing subscription there, Anthropic sees itself as subsidizing another org getting that data. Hard no.
And tools like OpenClaw are useless because they're mechanical and don't represent actual users interacting with the service - again, subsidizing but not getting the reward.
It's all very simple once you understand their motivations.
You must be using a different CC. Or what they’re writing here is correct, and it’s all due to the CLAUDE.md file that I only occasionally yell at claude.
Hmm please share more. I have had the max CC sub since it came out. Religiously follow all of Boris/Cats advice but still struggle with it. Meanwhile a really badly written AGENTS.md will still get the work done.
I find that most “techniques” are basically user hallucinations. Simple plan-write-refactor loops and trivial CLAUDE/AGENTS.md, generated by the harness itself, work nicely. Maaaaaaaaaybe write a skill or two, but usually it’s better to just write a script.
I think it's a good rule of thumb that if you find yourself saying everyone prefers this model or that model you're in a bubble. I've made this mistake before, I used to go around saying everyone knew Claude was the only model for serious professional use, but I was wrong.
I always assume that people making those comments on HN are trying to convince others to switch to their model. Surely no one actually believes their friend circle is a representative sample of the hundreds of millions of people that use these LLMs?
Btw the guy in charge of that stuff for Anthropic is the same guy who said GPT 2 was too dangerous to release, Jack Clark. LMAO. That model could barely string a sentence together.
I’m super interested to know what the back and forth between models and tools really looks like in practice.
Are there any much more detailed walkthroughs of how it works and how it decides the tools to use and the grep to use etc and what the conversations actually look like?
In the UI you see just enough to know it’s doing something but you don’t really see the jumps it’s making offscreen.
Lots of concepts. Release the harness that made it possible to port Bun to Rust in 9 days. That's what everyone really wants. Then everyone can go "do that but for this other goal".
Meanwhile we are still waiting for these statements to come true:
https://eu.36kr.com/en/p/3648851352018565
https://www.businessinsider.com/anthropic-ceo-ai-90-percent-...
https://www.reddit.com/r/Anthropic/comments/1nemhxb/futurism...
https://medium.com/@coders.stop/dario-amodei-said-90-of-code...
https://www.youtube.com/shorts/0j1HqEEDThc
Accountability, anyone?
Also, is this the main point: The better you explain the codebase to the LLM the better it explains it to you?
Although if you've ever used Claude's search tool, you'll be unsurprised that the team knows nothing about indexing.
How a company, whose primary product is text-based chat, doesn't allow users to easily perform text search on said chat is beyond comprehension.
Indeed it’s a good practice to use roles where supported (AWS has them) and explicitly switch when needed
Ha!
As the product they deliver is greenfield and in the newest of domain spaces, there is a serious halo-effect to consider.
On a side note, at a company I know the devs are split between:
Stick to Copilot inside Visual Studio
- suspiciously cheap Opus quotas there
+ they read their code
pi coding agent
+ control all the things my way
- each their own way
Claude Code
+ it's magic
- you mean it did that to my prompt!
You are deep in an information bubble, mostly driven by hype-train influencers with magpie attention spans.