What makes 5% of AI agents work in production?

(motivenotes.ai)

121 points | by AnhTho_FR 5 days ago

19 comments

  • sbierwagen 1 day ago
    >This Monday, I moderated a panel in San Francisco with engineers and ML leads from Uber, WisdomAI, EvenUp, and Datastrato. The event, Beyond the Prompt, drew 600+ registrants, mostly founders, engineers, and early AI product builders.

    >We weren’t there to rehash prompt engineering tips.

    >We talked about context engineering, inference stack design, and what it takes to scale agentic systems inside enterprise environments. If “prompting” is the tip of the iceberg, this panel dove into the cold, complex mass underneath: context selection, semantic layers, memory orchestration, governance, and multi-model routing.

    I bet those four people love that the moderator took a couple notes and then asked ChatGPT to write a blog post.

    As always, the number one tell of LLM output, besides the tone, is that by default it will never include links in the body of the post.

    • stingraycharles 1 day ago
      Yeah, “here’s the reality check:”, “not because they’re flashy, but because they’re blah blah”.

      Why can’t anyone be bothered anymore to write actual content, especially when writing about AI, where your whole audience is probably already exposed to these patterns in content day in, day out?

      It comes off as so cheap.

      • mccoyb 1 day ago
        It comes off as someone who lives their life according to quantity, not quality.

        The real insight: have some fucking pride in what you make, be it a blog post, or a piece of software.

        • palmotea 1 day ago
          > The real insight: have some fucking pride in what you make, be it a blog post, or a piece of software.

          The businessmen's job will be complete when they've totally eliminated all pride from work.

          • philipallstar 21 hours ago
            This same instinct is why a pencil costs almost nothing and is perfect, and isn't rubbish, really expensive, and created by someone who took pride in their work.
            • soks86 21 hours ago
              I hope you don't take pride in that sentence because I'm still not sure what it means.

              Also, automation and pride can go hand in hand. Pride doesn't mean "make it by hand," that would be silly.

              • philipallstar 21 hours ago
                To put it another way: an apocryphal businessman took something that people took pride in and gradually optimised everything so much that all the logging, transportation, graphite work and combination resulted in a perfect pencil that costs basically nothing almost anywhere in the world.
                • gf000 16 hours ago
                  Pencils here are a bit like grains. The market works for them because they fall into such a niche that economic "laws" actually work there.

                  But it's a fallacy to apply it elsewhere and there are millions of examples where the free market failed to optimize a product.

                  • philipallstar 2 hours ago
                    I don't agree. Loads of things are like this. Cars, microchips, hard drive storage, monitors, TVs, laptops. All either much better than they used to be, or much cheaper, or both.
            • palmotea 20 hours ago
              > This same instinct is why a pencil costs almost nothing and is perfect, and isn't rubbish, really expensive, and created by someone who took pride in their work.

              No. Have you worked with businessmen? 90% of the time they're telling you to cut corners and leave things broken, to the point you have a janky mess that can barely be held together. And, right now, we're talking about a technology (LLMs) that is well known to introduce stupid but often hard-to-spot errors.

              They don't want a pencil that's perfect. They want one that's just barely good enough to write with and that they can get maximum profit margin on.

              And then, you know, there's the whole thing about life being more than output.

              • philipallstar 19 hours ago
                Life can be more than output, which is why you don't want buying pencils, or anything else, to take up any more of your wages than is absolutely necessary.
                • palmotea 18 hours ago
                  > Life can be more than output, which is why you don't want buying pencils, or anything else, to take up any more of your wages than is absolutely necessary.

                  You're not getting it. It'd probably help if you stopped focusing on your pencil story, it's frankly off-topic.

                  To try one more time: You probably spend half your waking hours at work. The quality of that time is important to your well-being. Even if the businessmen sell you cheap, perfect pencils (which I do not grant), swimming in them in your off hours won't help with the other half of your time.

                  • philipallstar 2 hours ago
                    > It'd probably help if you stopped focusing on your pencil story, it's frankly off-topic.

                    I've no idea what this italicisation is meant to do; nor why this is off-topic. Stating things isn't explaining them.

                    > Even if the businessmen sell you cheap, perfect pencils (which I do not grant), swimming in them in your off hours won't help with the other half of your time.

                    It helps in that I don't have to spend as much of my time working to buy pencils. It's the same with everything. There's no reason why a laptop doesn't cost $1m except that the incredible, detailed, cross-continent cooperative work is done by experts and coordinated by a market for that work driving costs down and quality up.

            • blargey 8 hours ago
              Do you actually use pencils? The most popular US (cheapo) brands have atrocious quality because they compromised on materials and construction to get the lowest sticker price possible.

              The brands that do have a claim to "perfection" necessarily had the pride to not participate in that race to the bottom.

        • WhyOhWhyQ 3 hours ago
          Where's the pride in what you make when you're using AI agents? Seems like you're fantasizing about a by-gone era. The name of the activity, "vibe-coding", already makes it clear that this is a pride-free industry.
        • jihadjihad 22 hours ago
          Don't forget to turn your point into a playful rhetorical question [0].

          "The real insight?"

          0: https://en.wikipedia.org/wiki/Hypophora

        • Analemma_ 21 hours ago
          Taking pride in your work makes your labor more expensive than that of someone who does not do this, so over time as "efficiency" increases, you will eventually be removed and replaced by someone without these compunctions. Taking no pride in your work is economically rational and maximizes your long-term value to capital.
          • mccoyb 20 hours ago
            Economically rational, but bereft of identity or _soul_ -- which, paradoxically, becomes highly valued when economically rational agents all regress to a mean of mediocrity.
            • littlecosmic 2 hours ago
              Valued by the worker, to give meaning and quality of life, not by the buyer - so it doesn't carry much weight.
      • alexchantavy 1 day ago
        Yeah it bugs me. We've got enough examples in this article to make Cards Against Humanity ChatGPT edition

        > One panelist shared a personal story that crystallized the challenge: his wife refuses to let him use Tesla’s autopilot. Why? Not because it doesn’t work, but because she doesn’t trust it.

        > Trust isn’t about raw capability, it’s about consistent, explainable, auditable behavior.

        > One panelist described asking ChatGPT for family movie recommendations, only to have it respond with suggestions tailored to his children by name, Claire and Brandon. His reaction? “I don’t like this answer. Why do you know my son and my girl so much? Don’t touch my privacy.”

        • nylonstrung 1 hour ago
          Are there any good lists of these GPTisms or research on the common patterns?

          Beyond the em dashes and overuse of "delve" etc. there is this distinctive style of composition I want to understand and recognize better

        • stingraycharles 1 day ago
          Yeah, AI isn’t creative. You need to ask it to describe these types of patterns, and then include avoiding them in your original prompt to make it come across as somewhat natural.

          What I wonder is whether the author of the article recognized these patterns and didn’t care, didn’t even recognize them, or didn’t proofread the article?

          • EForEndeavour 1 day ago
            I gather he's operating Beyond the Prompt, and isn't here to rehash prompt engineering tips.
        • donnaoana 19 hours ago
          it's not written by AI
          • sethaurus 14 hours ago
            You've said plainly elsewhere in these comments that you did use AI to write it:

            > thanks, I used AI but aren't we all? I thought the point of AI is to get us to be more productive.

            You've also repeatedly dismissed any criticism of the writing as "hate."

            If you want readers to do you the favor of reading your work, please do them the favor of writing it.

      • rapind 1 day ago
        > Why can’t anyone be bothered anymore to write actual content

        The way I see it is that the majority of people never bothered to write actual content. Now there’s a tool the non-writers can use to write dubious content.

        I would wager this tool is being used much differently by actual writers focused on producing quality. There are just way fewer of them, the same way there are fewer of any specialization.

        The real question with AI to me is whether it will remain consistently better when wielded by a specialist who has invested their time into whatever the thing is they are producing. If that ever changes then we are doomed. When it’s no longer slop…

        • stingraycharles 1 day ago
          That’s a good insight. So basically we have a whole new generation of authors out there, in the same way we have a whole new generation of coders out there.

          Perhaps they can be called vibe bloggers?

          What bothers me compared to code is that for software, the code is just a means to an end. But for articles, it's much more than that.

          I wonder how this will end up affecting our lives. Last week I saw a video that highlighted how AI is already affecting our vocabulary. It introduces words not typically used in American English (but more common in Nigeria, where a lot of content writing is outsourced) into mainstream media.

          I can totally see how this will slowly start affecting language itself.

        • retSava 1 day ago
          You're absolutely right! (/s)

          The tone of AI-written stuff sounds to me just like the soul-less SEO-optimized content marketing blog crap we saw the years before AI became a thing. Very prevalent on Linkedin too. It just sounds/reads so hopelessly artificial.

          If I were to begin using AI to write stuff for me (comments or articles or whatever), I'd at least begin with having it train on the collection of everything I've written so far.

          • collingreen 21 hours ago
            This makes sense, is extremely possible, and is how I thought these things would be positioned in the first place. I'm surprised we don't see this more - it would give better results, throw less shame at users, and make the product stickier.
          • pickledoyster 7 hours ago
            SEO slop is what the LLMs were trained on. GIGO
    • tkgally 1 day ago
      I started to suspect a few paragraphs in that this post was written with a lot of AI assistance, but I continued to read to the end because the content was interesting to me. Here's one point that resonated in particular:

      "There’s a missing primitive here: a secure, portable memory layer that works across apps, usable by the user, not locked inside the provider. No one’s nailed it yet. One panelist said if he weren’t building his current startup, this would be his next one."

      • donnaoana 20 hours ago
        thanks, I used AI but aren't we all? I thought the point of AI is to get us to be more productive. But that was only after I came up with the questions for the speakers and wrote a draft of the blog; the panelists read it, added comments, and I published. It seems I get a lot of hate here for it, but I am happy with the number of engineers and founders sharing feedback that this was useful to them. I'm not forcing anyone to read my content, but if people want to put the time into hating on it, it's their choice.
      • ares623 1 day ago
        Isn’t that markdown files?
        • tkgally 1 day ago
          I was thinking about consumer-facing AI products, where md files controlled by the user presumably wouldn’t fly.

          I find it annoying that, when prompting ChatGPT, Claude, Gemini, etc. on personal tasks through their chat interfaces, I have to provide the same context about myself and my job again and again to the different providers.

          The memory functions of the individual providers now reduce some of that repetition, but it would be nice to have a portable personal-memory context (under my control, of course) that is shared with and updated semiautomatically by any AI provider I interact with.

          As isoprophlex suggests in a sister comment, though, that would be hard to monetize.
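          As a sketch of what such a portable layer could look like (purely hypothetical; the file format and every detail below are invented, since as noted above no such standard exists): the user owns a plain JSON memory file, and any chat client renders it into a context preamble before a conversation, appending new facts only with the user's consent.

```python
import json

def render_memory(memory_json: str) -> str:
    """Turn a user-owned memory file into a provider-agnostic preamble.

    The format is invented for illustration: a flat list of facts the
    user has chosen to share. Any chat client could prepend the
    rendered text to a conversation, regardless of provider.
    """
    memory = json.loads(memory_json)
    lines = [f"- {fact}" for fact in memory["facts"]]
    return "Known context about the user:\n" + "\n".join(lines)

# Example of a user-controlled memory file (contents are made up):
stored = json.dumps({"facts": ["Works as a lexicographer", "Based in Tokyo"]})
print(render_memory(stored))
```

          The interesting (and hard) parts are everything this sketch leaves out: consent, conflict resolution between providers, and, as noted, monetization.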

          • ares623 1 day ago
            Brb going to squat openmemory.org

            Edit: Aaaand it’s gone.

        • isoprophlex 1 day ago
          Sheesh how ever will you monetize a text file

          Will someone please think of the MRR!

    • esperent 1 day ago
      > the number one tell of LLM output, besides the tone, is that by default it will never include links in the body of the post.

      This isn't true. I've been using Gemini 2.5 a lot recently and I can't get it to stop adding links!

      I added custom instructions: Do not include links in your output. At the start of every reply say "I have not added any links as requested".

      It works for the first couple of responses but then it's back to loads of links again.

    • donnaoana 20 hours ago
      thanks for the hate, they did love it indeed: the questions I asked them, the draft I wrote for them to read, published only after they read it and added comments. I am curious, do you not use AI? Isn't the point to polish things and make them more efficient? Was there anything useful to you in the article, or do you have constructive criticism? I was sad to read some of the hate, but overall I am very happy with the many notes from founders and builders who found it useful.
    • carimura 22 hours ago
      the future is now, where debates about human vs machine will influence our trust and enjoyment! I read the article wondering how much of it was AI generated (new worry!), but also how biased it was based on the author's startup business interest (old worry!), and concluded that if I learned something about the panel it was worth the 5 minutes. Or maybe 2 minutes if an AI summarized it.
    • geoffbp 22 hours ago
      And the Oxford comma
      • collingreen 21 hours ago
        Nooooo I believe in the oxford comma don't let them drag it down! :(
    • scotty79 1 day ago
      It did a good enough job for me to skim it.
  • thisisit 22 hours ago
    It seems to me that people think AI is somehow magic. Recently I led a product demo. The conversation went something like this:

    End users (at my company) - Can your AI system look at numbers and find differences and generate a text description?

    Pre-sales - (trying to clarify) For our systems to generate text it will be better if you give it some live examples so that it understands what text to generate.

    End users - But there is supporting data (metadata) around the numbers. Can't your AI system just generate text?

    Pre-Sales - It can, but you need to provide context and examples. Otherwise it is going to generate generic text like "there is x difference".

    End user - You mean I need to write comments manually first? That is too much work.

    Now these users have a call with another product - MS Copilot.

    • beezlebroxxxxxx 21 hours ago
      Well, you hear a lot about how AI will "empower" employees and generate new "insights" based off of data for analysts and execs. In reality, most executives aren't really interested in that. They'd like it for sure, but really what they want is automation. They want "efficiencies"; they want cost cutting.

      Anyone that's been involved in data science roles in corporate environments knows that "the data" is usually forced into an execs pre-existing understanding of a phenomenon. With AI, execs are really excited at "cutting out the middlemen" when the middlemen in the equation are very often their own paid employees. That's all fine and dandy in an abstract economic view, but it's sure something they won't say publicly (at least most won't).

      In terms of potential cost cutting, it probably is the most recent "new magic". You used to have to pay a consultant, now you can "ask AI".

    • nowittyusername 21 hours ago
      This is a very common sentiment I see everywhere and it really highlights how uneducated most people are about technology in general. Most folks seem to expect things to work magically and perform physics-breaking feats, and it honestly baffles me. I would expect this attitude from maybe the younger generations who grew up only being users of technology like tablets and smartphones, but I honestly never expected millennials to be in the same camp; yet they are just as ignorant. And I am thinking to myself, did I grow up different? Were my friends also not using the same Nintendo cartridges, and VCRs and camcorders, and all the other tech that you had no choice but to learn at least basic fundamentals to use? Apparently most people never delved deeper than surface level on how to use these things and everything else went right over their heads...
      • __s 21 hours ago
        Vonnegut in On Writing Science Fiction reflected on Player Piano being labeled sci-fi since it involved machines, "The feeling persists that no one can simultaneously be a respectable writer and understand how a refrigerator works, just as no gentleman wears a brown suit in the city"
      • nitwit005 19 hours ago
        Plenty of people have a story of managers asking them to do impossible or nonsensical things. It should be unsurprising people will do the same with a machine.
      • TheHegemon 21 hours ago
        > Apparently most people never delved deeper than surface level on how to use these things and everything else went right over their heads...

        This is really the truth of all things in life.

      • bluefirebrand 15 hours ago
        > Most folks seem to expect things to work magically and perform physics breaking feats and it honestly baffles me

        This is how it is being marketed and I guess people are silly enough to believe marketing so it's not too surprising

    • hadlock 19 hours ago
      The MS Copilot pre-sales person responded "oh, there is metadata? then yes, it will discover that and generate a text description, no problem"
    • alansaber 21 hours ago
      TBF synthetic data generation exists for this reason. I do understand why a lot of companies go with the "safe" choice (copilot) even though it's crap.
    • alganet 21 hours ago
      > It seems to me that people think AI is somehow magic.

      That's because it is marketed as magic. It's marketed as magic so people will adopt the thing before knowing its shortcomings.

      https://pbfcomics.com/comics/the-masculator/

    • amenhotep 17 hours ago
      Pray, Mr Babbage, etc
  • iagooar 1 day ago
    Wow, half of this article deeply resonates with what I am working on.

    Text-to-SQL is the funniest example. It seems to be the "hello world" of agentic use in enterprise environments. It looks so easy, so clear, so straight-forward. But just because the concept is easy to grasp (LLMs are great at generating markup or code, so let's have them translate natural language to SQL) doesn't mean it is easy to get right.

    I have spent the past 3 months building a solution that actually bridges the stochastic nature of AI agents and the need for deterministic queries. And boy oh boy is that rabbit hole deep.
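    One common shape for that bridge (a minimal sketch only, not the actual system; the table names, columns, and intent format below are all invented) is to have the LLM emit a constrained JSON "intent" instead of raw SQL, and let deterministic code validate it against a schema whitelist before any query is assembled.

```python
# Minimal sketch: the LLM returns a constrained JSON "intent" rather
# than raw SQL; deterministic code validates it against a whitelist
# and assembles the query itself. All identifiers here are invented.
import json

ALLOWED = {
    "orders": {"id", "customer_id", "total", "created_at"},
    "customers": {"id", "name", "region"},
}

def build_query(intent_json: str) -> str:
    intent = json.loads(intent_json)
    table = intent["table"]
    columns = intent["columns"]
    if table not in ALLOWED:
        raise ValueError(f"unknown table: {table}")
    unknown = set(columns) - ALLOWED[table]
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # SQL is assembled only from whitelisted identifiers, never from
    # free-form model output, so a hallucinated column fails loudly
    # instead of producing a plausible-looking wrong query.
    return f"SELECT {', '.join(columns)} FROM {table}"
```

    The rabbit hole starts once intents need joins, filters, and aggregations: the intent schema grows toward a full semantic layer, which is exactly the panel's point.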

    • data-ottawa 20 hours ago
      SQL is never just the tables and joins, it’s knowing the table grains, the caveats, all the modelling definitions and errors (and your data warehouse almost certainly has modelling errors as business logic in your app drifts), plus the business context to correctly answer questions.

      60% of the time I spend writing sql is probably validation. A single hallucinated assumption can blow the whole query. And there are questions that don’t have clear modelling approaches that you have to deal with.

      Plus, a lot of the sql training data in LLMs is pretty bad, so I’ve not been impressed yet. Certainly not to let business users run an AI query agent unchecked.

      I’m sure AI will get good at this, so I’m building up my warehouse knowledge base and putting together documentation as best I can. It’s just pretty awful today.

    • jamesblonde 1 day ago
      Text2SQL was 75% on bird-bench 6 months ago. Now it's 80%. Humans are still at 90+%. We're not quite there yet. I suspect text-to-sql needs a lot of intermediate state and composition of abstractions, which vanilla attention is not great at.

      https://bird-bench.github.io/

      • ares623 1 day ago
        Text-to-SQL is solved by having good UX and a reasonable team that's in touch with the customer's needs.

        A user having to come up with novel queries all the time, enough to warrant text-to-SQL, is a failure of product design.

        • strange_quark 20 hours ago
          This 1000x. I’ve sat through several vendor demos of BI tools that have a chatbot and seen my PM go all starry eyed that you can ask it “show me top x over the last week” and get a chart back. How an empty text box is easier to use than a UI with several filter drop-downs, I’ll never understand, and I suspect that the people impressed with this stuff don’t know either.
        • caust1c 1 day ago
          This is exactly it. AI is sniffing out the good data models from the bad. Easy to understand? AI can understand it too! Complex business mess with endless technical debt? Not so much.

          But this is precisely why we're seeing startups build insane things fast while well established companies are still questioning if it's even worth it or not.

      • impossiblefork 21 hours ago
        There were some iffy things about the text to SQL datasets though, historically.

        People got good results on the test datasets, but the test datasets had errors so the high performance was actually just the models being overfitted.

        I don't remember where this was identified, but it's quite recent, though before GPT-5.

    • juleiie 1 day ago
      > building a solution that actually bridges the stochastic nature of AI agents and the need for deterministic queries

      Wait but this just sounds unhinged, why oh why

      • pbronez 23 hours ago
        The problem is that precision is expensive. Writing is thinking. Writing software is defining the business problem.

        People don't know exactly what they want from the data warehouse, just a fuzzy approximation of it. You need stochastic software (AI) to map the imprecise instructions from your users to precise instructions the warehouse can handle.

    • donnaoana 20 hours ago
      glad it resonates, that was the intention
  • another_twist 1 day ago
    So I have read the MIT paper and the methodology as well as the conclusions are just something else.

    For example, the number comes from perceived successes and failures and not actual measurements. The customer conclusions are also things like "it doesn't improve" or "it doesn't remember" - literally buying into the hype of recursive self-improvement, and completely oblivious to the fact that API users don't control model weights and as such can't do much self-improvement besides writing more CRUD layers. The other complaints are about integrations, which are totally valid. But some industries still run Windows XYZ without any API platforms, so that's not going away in those cases.

    Point being, if the paper itself is not good discourse, just well-marketed punditry, why should we debate the 5% number? It makes no sense.

  • AdieuToLogic 1 day ago
    It's funny that what the author identifies as "the reality check":

      Here’s the reality check: One panelist mentioned that 95%
      of AI agent deployments fail in production. Not because the 
      models aren’t smart enough, but because the scaffolding 
      around them, context engineering, security, memory design, 
      isn’t there yet.
    
    Could be a reasonable definition of "understanding the problem to solve."

    In other words, everything identified as what "the scaffolding" needs is what qualified people provide when delivering solutions to problems people want solved.

    • whatever1 1 day ago
      They fail because the “scaffolding” is building the complicated expert system that AI promised that one would not have to do.

      If I implement myself a strict parser and an output post-processor to guard against hallucinations, I have done 100% of the business related logic. I can skip the LLM in the middle altogether.
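      To illustrate with a toy example (hypothetical; the status vocabulary below is invented): if every model response must parse into a known shape and every field must come from a closed vocabulary, the guard rails themselves already encode the business logic.

```python
# Toy post-processor: the model's answer only survives if it parses
# and lands in a closed vocabulary. Note that the checker already
# knows every legal answer, which is the point being made above.
import json

VALID_STATUSES = {"approved", "rejected", "needs_review"}

def postprocess(raw: str) -> dict:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"unparseable model output: {exc}") from exc
    status = data.get("status")
    if status not in VALID_STATUSES:
        # A hallucinated status like "definitely_fine" is rejected here.
        raise ValueError(f"invalid status: {status!r}")
    return {"status": status}
```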

      • nylonstrung 1 hour ago
        Very interesting that you've found ways to mitigate the hallucination issue. Are you able to share more about what worked for you with the post processor and parser?
      • moduspol 20 hours ago
        > If I implement myself a strict parser and an output post-processor to guard against hallucinations, I have done 100% of the business related logic. I can skip the LLM in the middle altogether.

        You might even be able to put a UI on it that is a lot more effective than asking the user to type text into a box.

      • AdieuToLogic 1 day ago
        > If I implement myself a strict parser and an output post-processor to guard against hallucinations, I have done 100% of the business related logic. I can skip the LLM in the middle altogether.

        Well said and I could not agree more.

    • mnky9800n 23 hours ago
      You see, in order to get the AI agent to do its job, we needed to write a lot of software to provide it with guard rails so that it doesn't lose its mind when doing so.

      might as well just write the ai agent part of the software yourself as well.

    • codyb 20 hours ago
      At work we're deploying a chat bot to help users with our internal tools and it's just a forcing function to write and mark as deprecated the documentation we never maintained in the first place.

      So...

      The bot, to its credit, returns some decent results. But my guess is that it will be quite a while before we see it in prod since a lot of these projects go from 0 - 80% in a week and 80% - deployable in several years.

    • danieltanfh95 1 day ago
      It is really just BS. These are just basic DSA stuff. We deployed a real-world solution by doing all of that on our side. It's not magic. It's engineering.
  • marcosdumay 20 hours ago
    Those 5% that generate revenue on the MIT article do that because the only thing they are used for is creating marketing spam to send to people.

    And now we have an entire panel of bullshitters with an article-long theory about how to make LLMs program actually for real this time.

    (Oh, and it would be great if journalists actually cited their public sources, instead of pretending they link to the article but actually linking to their review of related content.)

  • ares623 1 day ago
    At some point, say 5 years from now, someone will revisit their AI-powered production workloads and ask the question "how can we optimize this by falling back to non-AI workload?". Where does that leave AI companies when the obvious choice is to do away with their services once their customers reach a threshold?
    • anonzzzies 1 day ago
      A lot of what we encounter is: there is this 'chat' interface which is the 'wow factor': you type something in English and something (like text-to-SQL) falls out, maybe 60-80% of what was needed. But then the frustration (for the user) starts: the fine-tuning of the result. After a few uses, they always ask for the 'old way' back to do that: just editing the query, or being given knobs to turn to fine-tune the result. And most want knobs which are, outside the most generic cases (pick a timespan for a datetime column), custom work. So AI is used for the first 10% of the work time (which gives you 60%+ of the solution) until the frustration lands: the last 40% or less is going to take 90% of your time. Still great, as overall it will probably take far less time than before.
    • EdwardDiego 1 day ago
      "Huh, turns out we could replace it all with a 4 line Perl script doing linear regression."
      • ares623 1 day ago
        “How I used ancient programming techniques to save the company $100k/year in token costs”
        • topaz0 21 hours ago
          They're going to need gigawatts worth of datacenters just to hold all the posts with that title.
  • zoeey 4 hours ago
    I've always felt the real challenge isn't the LLM itself, but managing the context around it. Many people assume that writing a good prompt is enough, but the real work is turning something unpredictable into a tool you can actually rely on.
  • intended 1 day ago
    I just refuse to read long AI generated text. Sadly this feels exactly like that.
    • donnaoana 20 hours ago
      I am curious, how would you use AI then, if not to make one more productive? The text is not AI generated: I came up with the questions, moderated the discussion, and wrote a draft that the speakers read and added comments to; the AI polished it. The AI was a custom GPT that I trained on my previous text from that Substack. I am curious what you would have done differently, or whether you would refuse to use AI at all? I wrote the article, so I am genuinely curious. I didn't know someone had posted it on Hacker News; I know people like to be negative here because there is no accountability, and I want to learn from all this hate. I am personally happy with the outcome: I got over 30 notes from people who are building that this was useful to them, and the speakers were happy. So I am curious what I could have done differently, or what my learning should be from all these people who take time from their day to hate on this piece of writing instead of deciding not to read it and moving on.
      • intended 19 hours ago
        Hey it’s your call! As you said it’s your productivity.

        If you said it’s something you made for perusal and reading? Then it reads like AI.

        I’ve had to read tons of papers and articles, the most testing being conference submissions. I won’t read something with that structure unless I have to.

    • codyb 20 hours ago
      I get really frustrated when I see it on PRs cause it's such a time sink, super obvious, and so fluffy.

      So you scaffold this up in 30 seconds but want me to read through it carefully? Cool, thanks.

  • EdwardDiego 1 day ago
    > One team suggested that instead of text-to-SQL, we should build semantic business logic layers, “show me Q4 revenue” should map to a verified calculation, not raw SQL generation.

    Okay, how would that work though? Verified by who and calculated by what?

    I need deets.

    • meheleventyone 23 hours ago
      They're saying that someone should implement the CalculateQuarterRevenue(year, quarter) function somewhere, in a manner that has been verified (e.g. run it against previous quarters to make sure it works correctly), and then, rather than using the LLM to generate SQL, use it to decide which domain function should be called. Which to me suggests that someone on the panel was gently taking the piss out of the idea: if you've done all the hard work anyway, presenting this in a deterministic way with a nice UX is a straightforward bit of front-end work.
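      A minimal sketch of that routing idea, with hypothetical names and the model call stubbed out — the LLM only emits a structured tool choice, never SQL:

```python
from typing import Callable

def calculate_quarter_revenue(year: int, quarter: int) -> float:
    # In reality this would run a vetted, tested SQL query.
    verified_results = {(2024, 4): 1_250_000.0}
    return verified_results[(year, quarter)]

# Registry of verified calculations the model is allowed to invoke.
TOOLS: dict[str, Callable] = {
    "calculate_quarter_revenue": calculate_quarter_revenue,
}

def route(llm_decision: dict) -> float:
    """Dispatch the model's structured tool choice to a verified function."""
    fn = TOOLS[llm_decision["name"]]      # unknown tools raise KeyError
    return fn(**llm_decision["arguments"])

# Stand-in for the model's output for "show me Q4 revenue":
decision = {"name": "calculate_quarter_revenue",
            "arguments": {"year": 2024, "quarter": 4}}
print(route(decision))  # 1250000.0
```

      Which is, of course, exactly the point: the LLM is reduced to a classifier over a menu of deterministic functions.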
      • moduspol 20 hours ago
        It also removes a lot of the value of the LLM. They're perceived as being smart, and the interface (open-ended text) implies they are capable of more than executing pre-defined functions.

        So if you have a "CalculateQuarterRevenue(year, quarter)" function, you'll soon find your users asking for the data per-month. Or just for the last six weeks. Or just for a specific client. And they'll be confused when it doesn't work.

        • esafak 19 hours ago
          The conversational user interface is misleading then, isn't it? It can't make you a sandwich either, though it allows you to submit this request.
          • moduspol 18 hours ago
            Yes. The sandwich example is contrived, but the basis is "discoverability." It's very opaque to the user what actually can be done and how reliable the result is.

            Compare this to basically any website you've ever been to. It's the "GUIs vs. CLIs" discussion all over again, except even CLIs had man pages for discoverability.

    • esafak 22 hours ago
      In other words, there should be a list of predefined queries, or possibly subqueries, that the user can request. This is basically how products used to work before AI. The difference is now you can request which query you want verbally.

      edit: I'm serious. I'm just answering the question, not making a value judgement.

      • lesuorac 22 hours ago
        I assume you're being tongue in cheek, but I've watched a lot of people use software and they really just don't know anything about it. Verbally requesting something is a skill they can learn, while googling "how do I normalize the scores in my rubric to add up to 100" is something they couldn't do.

        Verbal queries is the solution for the world we have even if it's not optimal.

        • slfnflctd 21 hours ago
          Your last sentence sums it up. This is what users want.

          The main killer app, I think, boils down to really expensive speech-to-text (and vice versa) with a reasonable number of seemingly authoritative querying details in fairly plain language. It's a new, 'better' search engine, just with different pitfalls people need to get up to speed on. And that may be enough, because employing humans to fill the same role as effectively is more expensive still.

      • thr0w 22 hours ago
        So simple classification problem. Big deal.
    • dchftcs 1 day ago
      On one side, you have an agent calculating the revenue.

      On the other side, you have an SQL that calculates the revenue

      Compare the two. If the two disagree, get the AI to try again. If the AI is still wrong after 10 tries, just use the SQL output.
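      The loop described above, sketched out (the agent call is stubbed; in practice it would be the LLM-generated calculation):

```python
def reference_sql_revenue() -> float:
    return 1_000_000.0  # ground truth from the hand-written SQL

def agent_revenue(attempt: int) -> float:
    # Stub: pretend the agent only converges on a later attempt.
    return 1_000_000.0 if attempt >= 3 else 999_999.0

def reconciled_revenue(max_tries: int = 10) -> float:
    truth = reference_sql_revenue()
    for attempt in range(1, max_tries + 1):
        guess = agent_revenue(attempt)
        if abs(guess - truth) < 1e-6:   # the two agree: accept the agent
            return guess
    return truth                        # still wrong after 10 tries: use SQL

print(reconciled_revenue())  # 1000000.0
```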

      • mnky9800n 23 hours ago
        so you have an answer and then you throw compute at trying to produce the answer in a different way.

        What I hear is a billion dollar AI startup in the making!

    • tirumaraiselvan 1 day ago
      A simple way is perhaps implement a text-to-metrics system where metrics could be defined as SQL functions.
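      One reading of "metrics defined as SQL functions" is a registry mapping a metric name to a pre-vetted, parameterized SQL string, where the LLM's only job is to pick the metric and its parameters. A sketch, with hypothetical names:

```python
METRICS = {
    "quarterly_revenue": (
        "SELECT SUM(amount) FROM orders "
        "WHERE quarter = %(quarter)s AND year = %(year)s"
    ),
}

def lookup(metric: str, params: dict):
    # Return the vetted SQL plus bound parameters; a real system would hand
    # these to the database driver rather than interpolating strings.
    if metric not in METRICS:
        raise KeyError(f"unknown metric: {metric}")
    return METRICS[metric], params

sql, params = lookup("quarterly_revenue", {"quarter": 4, "year": 2024})
print(sql)
```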
    • moomoo11 1 day ago
      psychedelics
  • LogicFailsMe 22 hours ago
    95% of the talent is being paid top dollar to build ~5% of the applications?
    • alansaber 21 hours ago
      Absolutely, when we're talking about infrastructure versus model development (RL/fine tuning, let alone pre-training).
  • janalsncm 20 hours ago
    > The panel’s consensus: conversation works when it removes a learning curve.

    Conversational UIs are controversial, but I think there are a good number of websites where better search would be the more central win. Not generating text, but surfacing the most relevant text.

    I’m thinking of a lot of library documentation, government info websites, etc. Basically an improvement over deep hierarchical navigation, where their way of organizing info is a leaky abstraction.

    Maybe that will be one of the side effects of this AI boom. Who knows.

  • monero-xmr 1 day ago
    A non-open ended path collapses into a decision tree. Very hard to think of customer support use-cases that do not collapse into decision trees. Most prompt engineering on the SaaS side results in very long prompts to re-invent decision trees and protect against edge cases. Ultimately the AI makes a “decision function call” which hits a decision tree. LLM is very poor replacement for a decision tree.

    I use LLM every day of my life to make myself highly productive. But I do not use LLM tools to replace my decision trees.

    • LPisGood 1 day ago
      It just occurred to me that with those massive system prompt files people use, we're basically reinventing the expert systems of the past. Time is a flat circle, I suppose.
      • schrodinger 1 day ago
        A decision tree is simply a model where you follow branches and make a decision at each point. Like...

        If we had tech support for a toaster, you might see:

            if toaster toasts the bread:
              if no: has turning it off and on again worked?
                if yes: great! you found a solution
                if no: hmm, try ...
              if yes:
                is the bread burnt after?
                  if no: sounds like your toaster is fine!
                  if yes: have you tried adjusting the darkness knob?
                    if no: ship it in for repair
                    if yes: try replacing the timer. does that help?
                      if no: ship it in for repair
                      if yes: yay, your toaster is fixed
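        The same tree as runnable Python, in case it helps: each node is either a question with yes/no branches, or a leaf with the final advice.

```python
TREE = {
    "q": "Does the toaster toast the bread?",
    "no": {
        "q": "Has turning it off and on again worked?",
        "yes": "Great! You found a solution.",
        "no": "Hmm, try ...",
    },
    "yes": {
        "q": "Is the bread burnt after?",
        "no": "Sounds like your toaster is fine!",
        "yes": {
            "q": "Have you tried adjusting the darkness knob?",
            "no": "Ship it in for repair",
            "yes": "Try replacing the timer; if that doesn't help, ship it in for repair",
        },
    },
}

def walk(tree, answers):
    """Follow a list of yes/no answers down to a leaf."""
    node = tree
    for a in answers:
        node = node[a]
        if isinstance(node, str):   # reached a leaf: final advice
            return node
    return node["q"]                # ran out of answers: ask the next question

print(walk(TREE, ["yes", "no"]))   # Sounds like your toaster is fine!
```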
      • LostMyLogin 1 day ago
        Any chance you can ELI5 this to me?
  • hn_throwaway_99 1 day ago
    > Here’s the reality check: One panelist mentioned that 95% of AI agent deployments fail in production. Not because the models aren’t smart enough, but because the scaffolding around them, context engineering, security, memory design, isn’t there yet.

    It's a big pet peeve of mine when an author states an opinion, with no evidence, as some kind of axiom. I think there is plenty of evidence that "the models aren't smart enough". Or to put it more accurately, it's an incredibly difficult problem to get a big productivity gain when an automated system is blatantly wrong ~1% of the time but when those wrong answers are inherently designed to look like right answers as much as possible.

  • jongjong 1 day ago
    It's interesting because my management philosophy when delegating work has been to always start by telling people what my intent is, so that they don't get too caught up in a specific approach. Many problems require out-of-the-box thinking. This is really about providing context. Context engineering is basically a management skill.

    Without context, even the brightest people will not be able to fill in the gaps in your requirements. Context is not just nice-to-have, it's a necessity when dealing with both humans and machines.

    I suspect that people who are good engineering managers will also be good at 'vibe coding'.

    • HardCodedBias 22 hours ago
      "I suspect that people who are good engineering managers will also be good at 'vibe coding'."

      I have observed that those who have both technical and management experience seem to be more adept at (or perhaps more willing to) use LLMs in their daily life to good effect.

      Of course what really helps, like in all things, is conscientiousness and an obsession for working through problems (if people don't like obsession then tenacity and diligence).

  • tirumaraiselvan 1 day ago
    This article is getting a lot of hate, but honestly it does have a good amount of useful content learned through practical experience, albeit at an abstract level. For example, this section:

    ``` The teams that succeed don’t just throw SQL schemas at the model. They build:

    Business glossaries and term mappings

    Query templates with constraints

    Validation layers that catch semantic errors before execution ```

    Unfortunately, the mixing of fluffy tone and high level ideas is bound to be detested by hands on practitioners.

  • another_twist 1 day ago
    It's weird that this makes the front page when Meta's Code World Model never did.
  • ath3nd 1 day ago
    [dead]
  • hshdhdhehd 1 day ago
    Base models are the seed, fine tuning is the genetically modified seed. Context is the fertiliser.
    • handfuloflight 1 day ago
      Agents are the oxen pulling the plow through the seasons... turning over ground, following furrows, adapting to terrain. RAG is the irrigation system. Prompts are the farmer's instructions. And the harvest? That depends on how well you understood what you were trying to grow.
      • nylonstrung 1 hour ago
        Locusts are when LLMs inexplicably rewrite your existing code despite in-line and prompt instructions not to
    • ath3nd 1 day ago
      [dead]