When I reject AI code even if it works

(vinibrasil.com)

47 points | by vnbrs 2 hours ago

6 comments

  • ecshafer 30 minutes ago
    If we rephrased this to "When I reject my coworker's code even if it works" and gave the same reasons, there would be zero dissent. There is this weird idea that seems to come up with AI that any solution must be good and adequate. Software engineering is all about rejecting code that works in favor of the right code that works.
    • api 10 minutes ago
      Which means it doesn’t matter if the code is from AI or not.

      If it’s not good it’s not good.

  • Aurornis 39 minutes ago
    Even using Fable (while it was briefly available), having it refine a plan, and directing it to make only small incremental changes, I still found reasons to reject its first pass at a lot of work. There were a lot of “You’re right to push back” responses, and a lot of incidents where it would create some giant complex set of abstractions to accomplish something that I could find ways to do much more elegantly and in a more maintainable manner.

    It’s really eye opening to work with these tools on a codebase you know deeply because these problems are everywhere.

    However if I opened an unfamiliar project in another language and I wanted to add a little feature with no intention of maintaining it, I’d happily accept the changes and loop until it worked well enough for my temporary needs.

    The scary middle is when you’re dealing with coworkers who don’t care about anything other than closing tickets and collecting credit. With enough of a token budget you can now wrap loops around an LLM and have it try things until the program appears to work. Ask it to do a code review and then submit the PR without having understood what it was doing. There are a lot of workplaces where there isn’t a good mechanism to push back on this and the tech debt just keeps growing.

    • resonious 11 minutes ago
      All Claude models are huge suck ups. The "you're absolutely right" meme is real even if that exact phrase doesn't show up as much anymore.

      I don't want to start a fight or anything but IME Codex has a bit more of a spine. If you point out something weird, it sometimes gives a good reason for it. Whereas Claude will always say "whoopsie you're right as always sir" even when it's me who missed something.

    • embedding-shape 18 minutes ago
      > There are a lot of workplaces where there isn’t a good mechanism to push back on this and the tech debt just keeps growing.

      If the "big ball of spaghetti" theory holds, where software companies that can't manage the debt stumble over themselves as they continue to add to the big ball of spaghetti code, I guess we'll see a row of companies declaring "software bankruptcy" or something in some/many months, depending on how well these workplaces learn to care slightly more and get better at pushing back against slop.

    • busterarm 29 minutes ago
      > With enough of a token budget you can now wrap loops around an LLM and have it try things until the program appears to work. Ask it to do a code review and then submit the PR without having understood what it was doing. There are a lot of workplaces where there isn’t a good mechanism to push back on this and the tech debt just keeps growing.

      I'm not making an argument in favor of people using LLMs for this, but people were doing this before we had LLMs; it was just usually a bit slower. I can't even say it usually doesn't work out long term, because I worked with a lot of guys who did this and took a ton of Adderall while working practically around the clock. Every incentive structure in the organizations rewarded it, along with social credibility from more junior engineers. (The last cowboy I worked with who pulled this shit ended up becoming the most senior engineer in the company, a multi-millionaire, and worshipped like a god by 90% of the mostly fresh grads we were hiring.)

      The problem is that when these people invariably burn out and leave, they leave a massive vacuum in their stead. Not from the load they were carrying, but from the load they were creating.

      I think the larger the organization I've been at, the more they reward the people making huge commits on nights and weekends. Worse, they could get away with TBRing their shit and merging it without review.

      LLMs are often all of the bad habits and organizational problems that we already carried, just being speedrun. There are some places doing it right, but they already were.

  • summerlight 38 minutes ago
    My personal rule of thumb: I am usually okay with agents driving e2e implementations if this won't make life noticeably worse when it does not work. Some analytical code? Perfectly fine. Hobby projects? Fine, though I prefer doing a fun part myself. Refactoring production code generating 10x more revenue than my salary? You'd better be at least understanding what it does.
    • resonious 9 minutes ago
      Yes this is the thing with these new tools. You have to know when to use them and when not to.

      Good ol' software architecture tricks can also help you slot "vibe coded" components into a larger system safely.

  • datadrivenangel 1 hour ago
    "The reality is that code that runs and makes the CI green can still be a bad solution, and engineering has always been about implementing adequate, scalable, and extensible solutions."

    Adequate often means done and cheap

    • josephg 42 minutes ago
      > Adequate often means done and cheap

      It really, REALLY depends what you're working on. If you're throwing together an internal tool or simple dashboard, it doesn't really matter what the code looks like. But if you're writing software that other programs will depend on, bad design choices ripple out and affect another generation of software. Imagine slop in the Linux kernel, in Google Chrome, or in your compiler or runtime. It's not acceptable.

      I know a lot of people spend their careers writing end user software and web UIs. AI is increasingly a good choice for this sort of code. But that's not all of us. And it's not all of the software being written.

    • DrewADesign 55 minutes ago
      As long as safe and stable are assumed to be base-level requirements… maybe?
    • solid_fuel 57 minutes ago
      Disagree, adequate means adequate. Done and cheap is what you call it when a solution is adequate. If the solution isn't adequate, it doesn't matter if it's cheap, because it isn't done.
  • rvz 10 minutes ago
    > Before coding agents, when given a task, I would explore the codebase, think of different solutions, experiment, and only then implement. That could take days of consolidating all that context. When I finally submitted that PR, confidence was higher, and explaining each of my changes to my coworkers was easier.

    Now we are getting to the point where we are speed-running the deskilling of engineers into comprehension debt, and they themselves are rapidly losing confidence in reviewing code they did not write.

    I think this blog post [0] is the best example of what could go entirely wrong and even worse when you do not know the technology.

    If you cannot explain a change even when "the CI is green" or "all tests passing", I will immediately reject it.

    Maybe great for vibe coding prototypes, but it all changes when that code is deployed onto mission critical systems. Just ask Amazon with Kiro. [1]

    [0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

    [1] https://www.reuters.com/business/retail-consumer/amazons-clo...

  • _wire_ 1 hour ago
    "Even if it works?"

    How do you verify that it works?

    • serious_angel 54 minutes ago
      For example, the following "works":

          json='{ "left":2, "right":2 }';
          result="$(
              perl -e '($_)=<>; / "left":(\d+), "right":(\d+)/; print $1 + $2, "\n";' <<< "$json";
          )";
          printf '%s\n' "$result";
      
      Yet, it is literally the same as:

          printf '%s\n' "$(( 2 + 2 ))";
    • p1024k 1 hour ago
      As I read the author's intent, the issue is code that he cannot understand or control. Even if the solution provided by the AI works, he will not adopt it unless he can understand and control it. That should be the baseline assumption.

      However, if AI provides a solution, as the person using AI, one should conduct research before making a decision. This is not in conflict with or hindered by the use of the ideas provided by AI.

      • Grombobulous 38 minutes ago
        I think this policy is probably more prescriptive than I would go with myself. I like to think of my risk tolerance first to help make that determination.

        For example, I use a vibecoded internal tool written in Go. I don’t even know how to write Go. Haven’t read a single line of the code. I just wanted to move from bash scripts to using cloud SDKs for performance reasons.

        But the internal tool is a convenience tool, and you can do everything it does using alternative methods. So if it breaks, there is no real negative impact besides the personal convenience of anyone using it. There’s some documentation on how to do everything manually if needed.

        Here’s another example: you’re making a static website. No JavaScript, no interactivity. Truly, what could go wrong? And while I do understand HTML a lot better than Go, it wouldn’t really matter if I didn’t.

      • andyfilms1 52 minutes ago
        I will say--as someone who has fielded late night troubleshooting calls--I totally understand OP's point of view. It's reasonable to expect that you will be able to answer questions about something that you ship, or brainstorm ways to solve a problem a customer is encountering while using something you provided them.

        The obvious counterargument is "well, just ask the AI for those answers," but the AI lacks the context and experience that you have. Sometimes, genuinely, the user really is just "holding it wrong," but none of the current AI models would ever admit that, and you'd spend hours trying to solve an unsolvable problem.