Let's be honest, Generative AI isn't going all that well

(garymarcus.substack.com)

100 points | by 7777777phil 6 hours ago

27 comments

gejose 1 hour ago
I believe Gary Marcus is quite well known for terrible AI predictions. He's not in any way an expert in the field. Some of his predictions from 2022 [1]
> In 2029, AI will not be able to watch a movie and tell you accurately what is going on (what I called the comprehension challenge in The New Yorker, in 2014). Who are the characters? What are their conflicts and motivations? etc.
> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
> In 2029, AI will not be able to work as a competent cook in an arbitrary kitchen (extending Steve Wozniak’s cup of coffee benchmark).
> In 2029, AI will not be able to reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]
> In 2029, AI will not be able to take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.
Many of these have already been achieved, and it's only early 2026.
[1]https://garymarcus.substack.com/p/dear-elon-musk-here-are-fi...
[-]
- merlincorey 1 hour ago
  Which ones are you claiming have already been achieved?
  My understanding of the current scorecard is that he's still technically correct, though I agree with you there is velocity heading towards some of these things being proven wrong by 2029.
  For example, in the recent thread about LLMs and solving an Erdos problem I remember reading in the comments that it was confirmed there were multiple LLMs involved as well as an expert mathematician who was deciding what context to shuttle between them and helping formulate things.
  Similarly, I've not yet heard of any non-expert Software Engineers creating 10,000+ lines of non-glue code that is bug-free. Even expert Engineers at Cloud Flare failed to create a bug-free OAuth library with Claude at the helm because some things are just extremely difficult to create without bugs even with experts in the loop.
  [-]
  - bspammer 1 hour ago
    The bug-free code one feels unfalsifiable to me. How do you prove that 10,000 lines of code is bug-free, and then there's a million caveats about what a bug actually is and how we define one.
    The second claim about novels seems obviously achieved to me. I just pasted a random obscure novel from project gutenberg into a file and asked claude questions about the characters, and then asked about the motivations of a random side-character. It gave a good answer, I'd recommend trying it yourself.
    [-]
    - verse 37 minutes ago
      I agree with you but I'd point out that unless you've read the book it's difficult to know if the answer you got was accurate or it just kinda made it up. In my experience it makes stuff up.
      Like, it behaves as if any answer is better than no answer.
  - stingrae 1 hour ago
    1 and 2 have been achieved.
    4 is close, the interface needs some work to allow nontechnical people use it. (claude code)
    [-]
    - fxtentacle 1 hour ago
      I strongly disagree. I’ve yet to find an AI that can reliably summarise emails, let alone understand nuance or sarcasm. And I just asked ChatGPT 5.2 to describe an Instagram image. It didn’t even get the easily OCR-able text correct. Plus it completely failed to mention anything sports or stadium related. But it was looking at a cliche baseball photo taken by an fan inside the stadium.
    - falloutx 1 hour ago
      I dispute 1 & 2 more than 4.
      1) Is it actually watching a movie frame by frame or just searching about it and then giving you the answer?
      2) Again can it handle very long novels, context windows are limited and it can easily miss something. Where is the proof for this?
      4 is probably solved
      4) This is more on predictor because this is easy to game. you can create some gibberish code with LLM today that is 10k lines long without issues. Even a non-technical user can do
      [-]
      - CjHuber 48 minutes ago
        I think all of those are terrible indicators, 1 and 2 for example only measure how well LLMs can handle long context sizes.
        If a movie or novel is famous the training data is already full of commentary and interpretations of them.
        If its something not in the training data, well I don't know many movies or books that use only motives that no other piece of content before them used, so interpreting based on what is similar in the training data still produces good results.
        EDIT: With 1 I meant using a transcript of the Audio Description of the movie. If he really meant watch a movie I'd say thats even sillier because well of course we could get another Agent to first generate the Audio Description, which definitely is possible currently.
        [-]
        zdragnar 42 minutes ago
        Just yesterday I saw an article about a police station's AI body cam summarizer mistakenly claim that a police officer turned into a frog during a call. What actually happened was that the cartoon "princess and the frog" was playing in the background.
        Sure, another model might have gotten it right, but I think the prediction was made less in the sense of "this will happen at least once" and more of "this will not be an uncommon capability".
        When the quality is this low (or variable depending on model) I'm not too sure I'd qualify it as a larger issue than mere context size.
        [-]
        CjHuber 35 minutes ago
        My point was not that those video to text models are good like they are used for example in that case, but more generally I was referring to that list of indicators. Like surely when analysing a movie it is alright if some things are misunderstood by it, especially as the amount of misunderstanding can be decreased a lot. That AI body camera surely is optimized on speed and inference cost. but if you give an agent 10 1s images along with the transcript of that period and the full prior transcript, and give it reasoning capabilities, it would take almost endlessy for that movie to process but the result surely will be much better than the body cameras. After all the indicator talks about "AI" in general so judge a model not optimized for capability but something else to measure on that indicator
- zozbot234 1 hour ago
  > In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
  Can AI actually do this? This looks like a nice benchmark for complex language processing, since a complete novel takes up a whole lot of context (consider War and Peace or The Count of Monte Cristo). Of course the movie variety is even more challenging since it involves especially complex multi-modal input. You could easily extend it to making sense of a whole TV series.
  [-]
  - idreyn 1 hour ago
    Yes. I am a novelist and I noticed a step change in what was possible here around Claude Sonnet 3.7 in terms of being able to analyze my own unpublished work for theme, implicit motivations, subtext, etc -- without having any pre-digested analysis of the work in its training data.
  - the-grump 1 hour ago
    Yes they can. The size of many codebases is much larger and LLMs can handle those.
    Consider also that they can generate summaries and tackle the novel piecemeal, just like a human would.
    Re: movies. Get YouTube premium and ask YouTube to summarize a 2hr video for you.
    [-]
    - falloutx 1 hour ago
      Novel is different from a codebase. In code you can have a relationship between files and most files can be ignored depending on what you're doing. But for a novel, its a sequential thing, in most cases A leads to B and B leads to C and so on.
      > Re: movies. Get YouTube premium and ask YouTube to summarize a 2hr video for you.
      This is different from watching a movie. Can it tell what suit actor was wearing? Can it tell what the actor's face looked like? Summarising and watching are too different things.
  - colechristensen 1 hour ago
    >Can AI actually do this? This looks like a nice benchmark for complex language processing, since a complete novel takes up a whole lot of context (consider War and Peace or The Count of Monte Cristo)
    Yes, you just break the book down by chapters or whatever conveniently fits in the context window to produce summaries such that all of the chapter summaries can fit in one context window.
    You could also do something with a multi-pass strategy where you come up with a collection of ideas on the first pass and then look back with search to refine and prove/disprove them.
    Of course for novels which existed before the time of training an LLM will already contain trained information about so having it "read" classic works like The Count of Monte Cristo and answer questions about it would be a bit of an unfair pass of the test because models will be expected to have been trained on large volumes of existing text analysis on that book.
    >reliably answer questions about plot, character, conflicts, motivations
    LLMs can already do this automatically with my code in a sizable project (you know what I mean), it seems pretty simple to get them to do it with a book.
- jgalt212 40 minutes ago
  This comment or something very close always appears alongside a Gary Marcus post.
- colechristensen 1 hour ago
  Besides being a cook which is more of a robotics problem all of the rest are accomplished to the point of being arguable about how reliably LLMs can perform these tasks, the arguments being between the enthusiast and naysayer camps.
  The keyword being "reliably" and what your threshold is for that. And what "bug free" means. Groups of expert humans struggle to write 10k lines of "bug free" code in the absolutist sense of perfection, even code with formal proofs can have "bugs" if you consider the specification not matching the actual needs of reality.
  All but the robotics one are demonstrable in 2026 at least.
- ls612 1 hour ago
  I'm pretty sure it can do all of those except for the one which requires a physical body (in the kitchen) and the one that humans can't do reliably either (construct 10000 loc bug-free).
- thethirdone 1 hour ago
  Which ones of those have been achieved in your opinion?
  I think the arbitrary proofs from mathematical literature is probably the most solved one. Research into IMO problems, and Lean formalization work have been pretty successful.
  Then, probably reading a novel and answering questions is the next most successful.
  Reliably constructing 10k bug free lines is probably the least successful. AI tends to produce more bugs than human programmers and I have yet to meet a programmer who can reliably produce less than 1 bug per 10k lines.
  [-]
  - zozbot234 1 hour ago
    Formalizing an arbitrary proof is incredibly hard. For one thing, you need to make sure that you've got at least a correct formal statement for all the prereqs you're relying on, or the whole thing becomes pointless. Many areas of math ouside of the very "cleanest" fields (meaning e.g. algebra, logic, combinatorics etc.) have not seen much success in formalizing existing theory developments.
  - kleene_op 1 hour ago
    > Reliably constructing 10k bug free lines is probably the least successful.
    You imperatively need to try Claude Code, because it absolutely does that.
    [-]
    - thethirdone 1 hour ago
      I have seen many people try to use Claude Code and get LOTS of bugs. Show me any > 10k project you have made with it and I will put the effort in to find one bug free of charge.
mattmaroon 2 hours ago
Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
I myself am saving a small fortune on design and photography and getting better results while doing it.
If this is not all that well I can’t wait until we get to mediocre!
[-]
- merlincorey 1 hour ago
  > Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
  Code is not an asset it's a liability, and code that no one has reviewed is even more of a liability.
  However, in the end, execution is all that matters so if you and your cofounder are able to execute successfully with mountains of generated code then it doesn't matter what assets and liabilities you hold in the short term.
  The long term is a lot harder to predict in any case.
  [-]
  - _vertigo 1 hour ago
    > Code is not an asset it's a liability, and code that no one has reviewed is even more of a liability.
    Code that solves problems and makes you money is by definition an asset. Whether or not the code in question does those things remains to be seen, but code is not strictly a liability or else no one would write it.
    [-]
    - merlincorey 1 hour ago
      "Code is a liability. What the code does for you is an asset." as quoted from https://wiki.c2.com/?SoftwareAsLiability with Last edit December 17, 2013.
      This discussion and distinction used to be well known, but I'm happy to help some people become "one of today's lucky 10,000" as quoted from https://xkcd.com/1053/ because it is indeed much more interesting than the alternative approach.
  - wouldbecouldbe 1 hour ago
    Developers that can’t see the change are blind.
    Just this week, sun-tue. I added a fully functional subscription model to an existing platform, build out a bulk async elasticjs indexing for a huge database and migrated a very large Wordpress website to NextJS. 2.5 days, would have cost me at least a month 2 years ago.
    [-]
    - fxtentacle 1 hour ago
      To me, this sounds like:
      AI is helping me solve all the issues that using AI has caused.
      Wordpress has a pretty good export and Markdown is widely supported. If you estimate 1 month of work to get that into NextJS, then maybe the latter is not a suitable choice.
- nsoonhui 1 hour ago
  It's not directly comparable. The first time writing the code is always the hardest because you might have to figure out the requirements along the way. When you have the initial system running for a while, doing a second one is easier because all the requirements kinks are figured out.
  By the way, why does your co-founder have to do the rewrite at all?
  [-]
  - el_benhameen 1 hour ago
    I find the opposite to be true. Once you know the problem you’re trying to solve (which admittedly can be the biggest lift), writing the fist cut of the code is fun, and you can design the system and set precedent however you want. Once it’s in the wild, you have to work within the consequences of your initial decisions, including bad ones.
- segfaultex 1 hour ago
  Sounds like an argument for better hiring practices and planning.
  Producing a lot of code isn’t proof of anything.
  [-]
  - sheeh 59 minutes ago
    Yep. Let’s see the projects and more importantly the incremental returns…
- aprdm 1 hour ago
  lol same. I just wrote a bunch of diagrams with mermaid that would legit take me a week, also did a mock of an UI for a frontend engineer that would take me another week to do .. or some designers. All of that in between meetings...
  Waiting for it to actually go well to see what else I can do !
- bwestergard 1 hour ago
  Out of curiosity, what is your product?
- venndeezl 1 hour ago
  I suspect he means as a trillion dollar corporation led endeavor.
  I trained a small neural net on pics of a cat I had in the 00s (RIP George, you were a good cat).
  Mounted a webcam I had gotten for free from somewhere, above the cat door, in the exterior of the house.
  If the neural net recognized my cat it switched off an electromagnetic holding the pet door locked. Worked perfectly until I moved out of the rental.
  Neural nets are, end of the day, pretty cool. It's the data center business that's the problem. Just more landlords, wannabe oligarchs, claiming ownership over anything they can get the politicians to give them.
- fzeroracer 1 hour ago
  > Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
  This is one of those statements that would horrify any halfway competent engineer. A cowboy coder going in, seeing a bunch of code and going 'I should rewrite this' is one of the biggest liabilities to any stable system.
  [-]
  - hactually 1 hour ago
    I assume this is because they're already insanely profitable after hitting PMF and are now trying to bring down infra costs?
    Right? RIGHT?!
  - habinero 1 hour ago
    Every professional SWE is going to stare off into the middle distance, as they flashback to some PM or VP deciding to show everyone they still got it.
    The "how hard could it be" fallacy claims another!
    [-]
    - iwontberude 1 hour ago
      Definitely been in that room multiple times.
    - sheeh 45 minutes ago
      As someone who is more involved in shaping the product direction rather than engineering what composes the product - I will readily admit many product people are utterly, utterly clueless.
      Most people have no clue the craftsmanship, work etc it takes to create a great product. LLMs are not going to change this, in fact they serve as a distraction.
      I’m not a SWE so I gain nothing by being bearish on the contributions of LLMs to the real economy ;)
- mschuster91 1 hour ago
  The problem is... you're going to deprive yourself of the talent chain in the long run, and so is everyone else who is switching over to AI, both generative like ChatGPT and transformative like the various translation, speech recognition/transcription or data wrangling models.
  For now, it works out for companies - but forward to, say, ten years in the future. There won't be new intermediates or seniors any more to replace the ones that age out or quit the industry entirely in frustration of them not being there for actual creativity but to clean up AI slop, simply because there won't have been a pipeline of trainees and juniors for a decade.
  But by the time that plus the demographic collapse shows its effects, the people who currently call the shots will be in pension, having long since made their money. And my generation will be left with collapse everywhere and find ways to somehow keep stuff running.
  Hell, it's already bad to get qualified human support these days. Large corporations effectively rule with impunity, with the only recourse consumers have being to either shell out immense sums of money for lawyers and court fees or turning to consumer protection/regulatory authorities that are being gutted as we speak both in money and legal protections, or being swamped with AI slop like "legal assistance" AI hallucinating case law.
tombert 2 hours ago
I find it a bit odd that people are acting like this stuff is an abject failure because it's not perfect yet.
Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.
Yes, people have probably been deploying it in spots where it's not quite ready but it's myopic to act like it's "not going all that well" when it's pretty clear that it actually is going pretty well, just that we need to work out the kinks. New technology is always buggy for awhile, and eventually it becomes boring.
[-]
- maccard 1 hour ago
  > Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.
  Every 2/3 months we're hearing there's a new model that just blows the last one out of the water for coding. Meanwhile, here I am with Opus and Sonnet for $20/mo and it's regularly failing at basic tasks, antigravity getting stuck in loops and burning credits. We're talking "copy basic examples and don't hallucinate APIs" here, not deep complicated system design topics.
  It can one shot a web frontend, just like v0 could in 2023. But that's still about all I've seen it work on.
  [-]
  - tombert 1 hour ago
    Sure, but think about what it's replacing.
    If you hired a human, it will cost you thousands a week. Humans will also fail at basic tasks, get stuck in useless loops, and you still have to pay them for all that time.
    For that matter, even if I'm not hiring anyone, I will still get stuck on projects and burn through the finite number of hours I have on this planet trying to figure stuff out and being wrong for a lot of it.
    It's not perfect yet, but these coding models, in my mind, have gotten pretty good if you're specific about the requirements, and even if it misfires fairly often, they can still be useful, even if they're not perfect.
    I've made this analogy before, but to me they're like really eager-to-please interns; not necessarily perfect, and there's even a fairly high risk you'll have to redo a lot of their work, but they can still be useful.
    [-]
    - falloutx 1 hour ago
      I am an AI-skeptic but I would agree this looks impressive from certain angles, especially if you're an early startup (maybe) or you are very high up the chain and just want to focus on cutting costs. On the other hand, if you are about to be unemployed, this is less impressive. Can it replace a human? I would say no its still long way to go, but a good salesman can convince executives that it does and thats all that matters.
      [-]
      - xp84 1 hour ago
        > On the other hand, if you are about to be unemployed, this is less impressive
        > salesman can convince executives that it does
        I tend to think that reality will temper this trend as the results develop. Replacing 10 engineers with one engineer using Cursor will result in a vast velocity hit. Replacing 5 engineers with 5 "agents" assigned to autonomously implement features will result in a mess eventually. (With current technology -- I have no idea what even 2027 AI will do). At that point those unemployed engineers will find their phones ringing off the hook to come and clean up the mess.
        Not that unlike what happens in many situations where they fire teams and offshore the whole thing to a team of average developers 180 degrees of longitude away who don't have any domain knowledge of the business or connections to the stakeholders. The pendulum swings back in the other direction.
      - tombert 1 hour ago
        I just think Jevins paradox [1]/Gustafson's Law [2] kind of applies here.
        Maybe I shouldn't have used the word "replaced", as I don't really think it's actually going to "replace" people long term. I think it's likely to just lead to higher output as these get better and better .
        [1] https://en.wikipedia.org/wiki/Jevons_paradox
        [2] https://en.wikipedia.org/wiki/Gustafson%27s_law
        [-]
        falloutx 1 hour ago
        Not you, but the word replaced is the being used all the time. Even senior engineers are saying they are using it as a junior engineers while we can easily hire junior engineers (but Execs don't want to). Jevon's paradox wont work in Software because user's wallets and time is limited, and if software becomes too easy to build, it becomes harder to sell. Normal people can have 5 subscriptions, may be 10, but they wont be going to 50 or 100. I would say we would have already exhausted users already, with all the bad practices.
    - maccard 23 minutes ago
      You’ve missed my point here - I agree that gen AI has changed everything and is useful, _but_ I disagree that it’s improved substantially - which is what the comment I replied to claimed.
      Anecdotally I’ve seen no difference in model changes in the last year, but going from LLM to Claude code (where we told the LLMs they can use tools on our machines) was a game changer. The improvement there was the agent loop and the support for tools.
      In 2023 I asked v0.dev to one shot me a website for a business I was working on and it did it in about 3 minutes. I feel like we’re still stuck there with the models.
      [-]
      - tombert 18 minutes ago
        In my experience it has gotten considerably better. When I get it to generate C, it often gets the pointer logic correct, which wasn't the case three years ago. Three years ago, ChatGPT would struggle with even fairly straightforward LaTeX, but now I can pretty easily get it to generate pretty elaborate LaTeX and I have even had good success generating LuaTeX. I've been able to fairly successfully have it generate TLA+ spec from existing code now, which didn't work even a year ago when I tried it.
        Of course, sample size of one, so if you haven't gotten those results then fair enough, but I've at least observed it getting a lot better.
  - Aurornis 1 hour ago
    You’re doing exactly the thing that the parent commenter pointed out: Complaining that they’re not perfect yet as if that’s damning evidence of failure.
    We all know LLMs get stuck. We know they hallucinate. We know they get things wrong. We know they get stuck in loops.
    There are two types of people: The first group learns to work within these limits and adapt to using them where they’re helpful while writing the code when they’re not.
    The second group gets frustrated every time it doesn’t one-shot their prompt and declares it all a big farce. Meanwhile the rest of us are out here having fun with these tools, however limited they are.
    [-]
    - maccard 15 minutes ago
      Someone else said this perfectly farther down:
      > The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.
      As I’ve said, I use LLMs, and I use tools that are assisted by LLMs. They help. But they don’t work anywhere near as reliably as people talk about them working. And that hasn’t changed in the 18 months since I first promoted v0 to make me a website.
  - BeetleB 1 hour ago
    > We're talking "copy basic examples and don't hallucinate APIs" here, not deep complicated system design topics.
    If your metric is an LLM that can copy/paste without alterations, and never hallucinate APIs, then yeah, you'll always be disappointed with them.
    The rest of us learn how to be productive with them despite these problems.
    [-]
    - drewbug01 1 hour ago
      > If your metric is an LLM that can copy/paste without alterations, and never hallucinate APIs, then yeah, you'll always be disappointed with them.
      I struggle to take comments like this seriously - yes, it is very reasonable to expect these magical tools to copy and paste something without alterations. How on earth is that an unreasonable ask?
      The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.
      What, precisely, are they good for?
      [-]
      - ubercow13 1 hour ago
        It seems like just such a weird and rigid way to evaluate it? I am a somewhat reasonable human coder, but I can't copy and paste a bunch of code without alterations from memory either. Can someone still find a use for me?
      - BeetleB 1 hour ago
        For a long time, I've wanted to write a blog post on why programmers don't understand the utility of LLMs[1], whereas non-programmers easily see it. But I struggle to articulate it well.
        The gist is this: Programmers view computers as deterministic. They can't tolerate a tool that behaves differently from run to run. They have a very binary view of the world: If it can't satisfy this "basic" requirement, it's crap.
        Programmers have made their career (and possibly life) being experts at solving problems that greatly benefit from determinism. A problem that doesn't - well either that needs to be solved by sophisticated machine learning, or by a human. They're trained on essentially ignoring those problems - it's not their expertise.
        And so they get really thrown off when people use computers in a nondeterministic way to solve a deterministic problem.
        For everyone else, the world, and its solutions, are mostly non-deterministic. When they solve a problem, or when they pay people to solve a problem, the guarantees are much lower. They don't expect perfection every time.
        When a normal human asks a programmer to make a change, they understand that communication is lossy, and even if it isn't, programmers make mistakes.
        Using a tool like an LLM is like any other tool. Or like asking any other human to do something.
        For programmers, it's a cardinal sin if the tool is unpredictable. So they dismiss it. For everyone else, it's just another tool. They embrace it.
        [1] This, of course, is changing as they become better at coding.
        [-]
        maccard 19 minutes ago
        I’m perfectly happy for my tooling to not be deterministic. I’m not happy for it to make up solutions that don’t exist, and get stuck in loops because of that.
        I use LLMs, I code with a mix of antigravity and Claude code depending on the task, but I feel like I’m living in a different reality when the code I get out of these tools _regularly just doesn’t work, at all_. And to the parents point, I’m doing something wrong for noticing that?
      - tombert 1 hour ago
        I think what they're best at right now is the initial scaffolding work of projects. A lot of the annoying bootstrap shit that I hate doing is actually generally handled really well by Codex.
        I agree that there's definitely some overhype to them right now. At least for the stuff I've done they have gotten considerably better though, to a point where the code it generates is often usable, if sub-optimal.
        For example, about three years ago, I was trying to get ChatGPT to write me a C program to do a fairly basic ZeroMQ program. It generated something that looked correct, but it would crash pretty much immediately, because it kept trying to use a pointer after free.
        I tried the same thing again with Codex about a week ago, and it worked out of the box, and I was even able to get it to do more stuff.
        [-]
        smithkl42 1 hour ago
        I think it USED to be true that you couldn't really use an LLM on a large, existing codebase. Our codebase is about 2 million LOC, and a year ago you couldn't use an LLM on it for anything but occasional small tasks. Now, probably 90% of the code I commit each week was written by Claude (and reviewed by me and other humans - and also by Copilot and ZeroPath).
      - blibble 1 hour ago
        > What, precisely, are they good for?
        scamming people
      - falloutx 1 hour ago
        Its strong enough to replace humans at their jobs and weak enough that it cant do basic things. Its a paradox. Just learn to be productive with them. Pay $200/month and work around with its little quirks. /s
  - elzbardico 1 hour ago
    There’s a subtle point a moment when you HAVE to take the driver wheel from the AI. All issues I see are from people insisting to use far beyond the point it stops being useful.
    It is a helper, a partner, it is still not ready go the last mile
    [-]
    - maccard 12 minutes ago
      As someone else said in this thread:
      > The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.
      I’m perfectly happy to write code, to use these tools. I do use them, and sometimes they work (well). Other times they have catastrophic failures. But apparently it’s my failure for not understanding the tool or expecting too much of the tool, while others are screaming from the rooftops about how this new model changes everything (which happens every 3 months at this point)
    - xp84 1 hour ago
      It's funny how many people don't get that. It's like adding a pretty great senior or staff level engineer to sit on-call next to every developer and assist them, for basically free (I've never used any of the expensive stuff yet. Just things like Copilot, Grok Code in JetBrains, just asking Gemini to write bits of code for me).
      If you hired a staff engineer to sit next to me, and I just had him/her write 100% of the code and never tried to understand it, that would be an unwise decision on my part and I'd have little room to complain about the times he made mistakes.
- barbazoo 1 hour ago
  We implement pretty cool workflows at work using "GenAI" and the users of our software are really appreciative. It's like saying a hammer sucks because it breaks most things you hit with it.
- onlyrealcuzzo 1 hour ago
  > Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.
  I think the big problem is that the pace of improvement was UNBELIEVABLE for about 4 years, and it appears to have plateaued to almost nothing.
  ChatGPT has barely improved in, what, 6 months or so.
  They are driving costs down incredibly, which is not nothing.
  But, here's the thing, they're not cutting costs because they have to. Google has deep enough pockets.
  They're cutting costs because - at least with the current known paradigm - the cost is not worth it to make material improvements.
  So unless there's a paradigm shift, we're not seeing MASSIVE improvements in output like we did in the previous years.
  You could see costs go down to 1/100th over 3 years, seriously.
  But they need to make money, so it's possible non of that will be passed on.
  [-]
  - tombert 35 minutes ago
    I think that even if it never improves, its current state is already pretty useful. I do think it's going to improve though I don't think AGI is going to happen any time soon.
    I have no idea what this is called, but it feels like a lot of people assume that progress will continue at a linear pace for forever for things, when I think that generally progress is closer to a "staircase" shape. A new invention or discovery will lead to a lot of really cool new inventions and discoveries in a very short period of time, eventually people will exhaust the low-to-middle-hanging fruit, and progress kind of levels out.
    I suspect it will be the same way with AI; I don't now if we've reached the top of our current plateau, but if not I think we're getting fairly close.
  - sheeh 56 minutes ago
    They are focused on reducing costs in order to survive. Pure and simple.
    Alphabet / Google doesn’t have that issue. OAI and other money losing firms do.
- 1970-01-01 59 minutes ago
  >and is likely to keep improving.
  I'm not trying to be pedantic, but how did you arrive at 'keep improving' as a conclusion? Nobody is really sure how this stuff actually works. That's why AI safety was such a big deal a few years ago.
  [-]
  - tombert 26 minutes ago
    Totally reasonable question, and I only am making an assumption based on observed progress. AI generated code, at least in my personal experience, has gotten a lot better, and while I don't think that will go to infinity, I do think that there's still more room for improvement that could happen.
    I will acknowledge that I don't have any evidence of this claim, so maybe the word "likely" was unwise, as that suggests probability. Feel free to replace "is "likely to" with "it feels like it will".
- jbs789 1 hour ago
  Because the likes of Altman have set short term expectations unrealistically high.
  [-]
  - tombert 1 hour ago
    I mean that's every tech company.
    I made a joke once after the first time I watched one of those Apple announcement shows in 2018, where I said "it's kind of sad, because there won't be any problems for us to solve because the iPhone XS Max is going to solve all of them".
    The US economy is pretty much a big vibes-based Ponzi scheme now, so I don't think we can single-out AI, I think we have to blame the fact that the CEOs running these things face no negative consequences for lying or embellishing and they do get rewarded for it because it will often bump the stock price.
    Is Tesla really worth more than every other car company combined in any kind of objective sense? I don't think so, I think people really like it when Elon lies to them about stuff that will come out "next year", and they feel no need to punish him economically.
  - hamdingers 1 hour ago
    I maintain that most anti-AI sentiment is actually anti-lying-tech-CEO sentiment misattributed.
    The technology is neat, the people selling it are ghouls.
    [-]
    - acdha 1 hour ago
      Exactly: the technology is useful but because the executive class is hyping it as close to AGI because their buddies are slavering for layoffs. If that “when do you get fired?” tone wasn’t behind the conversation, I think a lot of people would be interested in applying LLMs to the smaller subset of things they actually perform well at.
      [-]
      - tombert 11 minutes ago
        Maybe CEOs should face consequences for going on the stage and outwardly lying. Instead they're rewarded by a bump in stock price because people appear to have amnesia.
dreadsword 1 hour ago
This feels like a pretty low effort post that plays heavily to superficial reader's cognitive biases.
I work commercializing AI in some very specific use cases where it extremely valuable. Where people are being lead astray is layering generalizations: general use cases (copilots) deployed across general populations and generally not doing very well. But that's PMF stuff, not a failure of the underlying tech.
[-]
- kokanee 1 hour ago
  I think both sides of this debate are conflating the tech and the market. First of all, there were forms of "AI" before modern Gen AI (machine learning, NLP, computer vision, predictive algorithms, etc) that were and are very valuable for specific use cases. Not much has changed there AFAICT, so it's fair that the broader conversation about Gen AI is focused on general use cases deployed across general populations. After all, Microsoft thinks it's a copilot company, so it's fair to talk about how copilots are doing.
  On the pro-AI side, people are conflating technology success with product success. Look at crypto -- the technology supports decentralization, anonymity, and use as a currency; but in the marketplace it is centralized, subject to KYC, and used for speculation instead of transactions. The potential of the tech does not always align with the way the world decides to use it.
  On the other side of the aisle, people are conflating the problematic socio-economics of AI with the state of the technology. I think you're correct to call it a failure of PMF, and that's a problem worth writing articles about. It just shouldn't be so hard to talk about the success of the technology and its failure in the marketplace in the same breath.
- Aurornis 1 hour ago
  > This feels like a pretty low effort post that plays heavily to superficial reader's cognitive biases.
  I haven’t followed this author but the few times he’s come up his writings have been exactly this.
1a527dd5 1 hour ago
A year ago I would have agreed wholeheartedly and I was a self confessed skeptic.
Then Gemini got good (around 2.5?), like I-turned-my-head good. I started to use it every week-ish, not to write code. But more like a tool (as you would a calculator).
More recently Opus 4.5 was released and now I'm using it every day to assist in code. It is regularly helping me take tasks that would have taken 6-12 hours down to 15-30 minutes with some minor prompting and hand holding.
I've not yet reached the point where I feel letting is loose and do the entire PR for me. But it's getting there.
[-]
- kstrauser 1 hour ago
  > I was a self confessed skeptic.
  I think that's the key. Healthy skepticism is always appropriate. It's the outright cynicism that gets me. "AI will never be able to [...]", when I've been sitting here at work doing 2/3rds of those supposedly impossible things. Flawlessly? No, of course not! But I don't do those things flawlessly on the first pass, either.
  Skepticism is good. I have no time or patience for cynics who dismiss the whole technology as impossible.
- spaceywilly 1 hour ago
  I would strongly recommend this podcast episode with Andrej Karpathy. I will poorly summarize it by saying his main point is that AI will spread like any other technology. It’s not going to be a sudden flash and everything is done by AI. It will be a slow rollout where each year it automates more and more manual work, until one day we realize it’s everywhere and has become indispensable.
  It sounds like what you are seeing lines up with his predictions. Each model generation is able to take on a little more of the responsibilities of a software engineer, but it’s not as if we suddenly don’t need the engineer anymore.
  https://www.dwarkesh.com/p/andrej-karpathy
  [-]
  - sheeh 54 minutes ago
    AI first of all is not a technology.
    Can people get their words straight before typing?
daedrdev 6 hours ago
This post is literally just 4 screenshots of articles, not even its own commentary or discussion.
saberience 1 hour ago
Gary Marcus (probably): "Hey this LLM isn't smarter than Einstein yet, it's not going all that well"
The goalposts keep getting pushed further and further every month. How many math and coding Olympiads and other benchmarks will LLMs need to dominate before people will actually admit that in some domains it's really quite good.
Sure, if you're a Nobel prize winner or PhD then LLMs aren't as good as you yet, but for 99% of the people in the world, LLMs are better than you at Math, Science, Coding, and every language probably except your native language, and it's probably better at you at that too...
m463 45 minutes ago
I see stuff like this and think of these two things:
1) https://en.wikipedia.org/wiki/Gartner_hype_cycle
or
2) "First they ignore you, then they laugh at you, then they fight you, then you win."
or maybe originally:
"First they ignore you. Then they ridicule you. And then they attack you and want to burn you. And then they build monuments to you"
smashed 2 hours ago
Should have used an LLM to proofread.. LLMs can still cannot be trusted?
[-]
- warkdarrior 1 hour ago
  How dare you accuse Gary-Marcus-5.2-2025-12-11 of being an LLM??
emp17344 6 hours ago
Guessing this isn’t going to be popular here, but he’s right. AI has some use cases, but isn’t the world-changing paradigm shift it’s marketed as. It’s becoming clear the tech is ultimately just a tool, not a precursor to AGI.
[-]
- teej 2 hours ago
  Is that the claim the OP is making?
- avaer 2 hours ago
  If AGI is ever going to happen, then it's definitionally a precursor to it.
  So I'm not really sure how to parse your statement.
  [-]
  - alex_young 1 hour ago
    I’m not sure I follow. What if LLMs are helpful but not useful to AGI, but some other technology is? Seems likely.
    [-]
    - avaer 1 hour ago
      The comment wasn't referencing LLMs, but generative AI.
      Even then, given the deep impact of LLMs and how many people are using them already, it's a stretch to say LLMs will have no effect on the development of AGI.
      I think it's pretty obvious that AGI requires something more than LLMs, but I think it's equally obvious LLMs will have been involved in its development somewhere, even if just a stepping stone. So, a "precursor".
- sajithdilshan 2 hours ago
  not YET.
billsunshine 1 hour ago
a historic moron. Marcus will make Krugman's internet==fax machine look like a good prediction
thechao 6 hours ago
You're absolutely right!
The irony of a five sentence article making giant claims isn't lost on me. Don't get me wrong: I'm amenable to the idea; but, y'know, my kids wrote longer essays in 4th grade.
herunan 1 hour ago
First of all, popping in a few screenshots of articles and papers is not proper analysis.
Second of all, GenAI is going well or not depending on how we frame it.
In terms of saving time, money and effort when coding, writing, analysing, researching, etc. It’s extremely successful.
In terms of leading us to AGI… GenAI alone won’t reach that. Current ROI is plateauing, and we need to start investing more somewhere else.
rpowers 1 hour ago
I keep reading comments that claim GenAI's positive traits, but this usually amounts to some toy PoC that very eerily mirrors work found in code bootcamps. You want an app that has logins and comments and upvotes? GenAI is going to look amazing setting up a non-relational db to your node backend.
sghiassy 6 hours ago
LLMs help me read code 10x faster - I’ll take the win and say thanks
anarticle 16 minutes ago
Download models you can find now and forever. The guardrails will only get worse, or models banned entirely. Whether it's because of "hurts people's health" or some other moral panic, it will kill this tech off.
gpt-oss isn't bad, but even models you cannot run are worth getting since you may be able to run them in the future.
I'm hedging against models being so nerfed they are useless. (This is unlikely, but drives are cheap and data is expensive.)
mrbluecoat 1 hour ago
> LLMs can still cannot be trusted
But can they write grammatically correct statements?
mythrwy 2 hours ago
It's going well for coding. I just knocked out a mapping project that would have been a week+ of work (with docs and stackoverflow opened in the background) in a few hours.
And yes, I do understand the code and what is happening and did have to make a couple of adjustments manually.
I don't know that reducing coding work justifies the current valuations, but I wouldn't say it's "not going all that well".
afspear 1 hour ago
Meanwhile I'm over here reducing my ADO ticket time estimates by 75%.
amw-zero 1 hour ago
I’m starting to think this take is legitimately insane.
As said in the article, a conservative estimate is that Gen AI can currently do 2.5% of all jobs in the entire economy. A technology that is really only a couple of years old. This is supposed to be _disappointing_? That’s millions of jobs _today_, in a totally nascent form.
I mean I understand skepticism, I’m not exactly in love with AI myself, but the world has literally been transformed.
robertclaus 2 hours ago
Odds this was AI generated?
[-]
- kingstnap 1 hour ago
  It's literally just four screenshots paired with this sentence.
  > Trying to orient our economy and geopolitical policy around such shoddy technology — particularly on the unproven hopes that it will dramatically improve– is a mistake.
  The screenshots are screenshots of real articles. The sentence is shorter than a typical prompt.
andy99 1 hour ago
Gary Marcus is a troll, there is no novel viewpoint he brings, he just has an anti-AI blog and keeps repeating the same tired stuff.
Healthy scepticism is fine, this is not that, it’s just a crazy person yelling in the background.
Jadiiee 1 hour ago
It's more about how you use it. It should be a source of inspo. Not the end all be all.
bawolff 1 hour ago
Holy moving goal posts batman!
I hate generative AI, but its inarguable what we have now would have been considered pure magic 5 years ago.
meowface 1 hour ago
How on Earth do people keep taking Gary Marcus seriously?
[-]
- throw310822 50 minutes ago
  He's such a joke that even LLMs make fun of him. The Gemini-generated Hacker News frontpage for December 9 2035 contains an article by Gary Marcus: "AI progress is stalling": https://dosaygo-studio.github.io/hn-front-page-2035/news
- andy99 1 hour ago
  I think they haven’t heard of him, or just react to the headline. It’s hard to imagine a circle in which he isn’t a joke. Even if you agree with him, it’s the broken clock thing. Being a contrarian is easy, making correct, actionable bets is not, and he has made none.
w4yai 1 hour ago
[flagged]
segfaultex 1 hour ago
I wholeheartedly agree. Shitty companies steal art and then put out shitty products that shitty people use to spam us with slop.
The same goes for code as well.
I’ve explored Claude code/antigravity/etc, found them mostly useless, tried a more interactive approach with copilot/local models/ tried less interactive “agents”/etc. it’s largely all slop.
My coworkers who claim they’re shipping at warp speed using generative AI are almost categorically our worst developers by a mile.