59 comments

  • Obertr 45 minutes ago
    At this point in time I start to believe OAI is very much behind on the models race and it can't be reversed

    The image model they released is much worse than nano banana pro; the Ghibli moment did not happen

    Their GPT 5.2 is obviously overfit on benchmarks, according to the consensus of many developers and friends I know. So Opus 4.5 is staying on top when it comes to coding

    The weight of Google's ad money, its general direction, and Brin's founder sense brought the massive Google giant back to life. None of my companies' workflows run on OAI GPT right now. Even though we love their agent SDK, after the Claude agent SDK it feels like peanuts.

    • int32_64 16 minutes ago
      Is there a "good enough" endgame for LLMs and AI where benchmarks stop mattering because end users don't notice or care? In such a scenario brand would matter more than the best tech, and OpenAI is way out in front in brand recognition.
      • holler 12 minutes ago
        this. I don't know any non-tech people who use anything other than chatgpt. On a similar note, I've wondered why Amazon doesn't make a chatgpt-like app with their latest Alexa+ makeover, seems like a missed opportunity. The Alexa app has a feature to talk to the LLM in chat mode, but the overall app is geared towards managing devices.
        • macNchz 5 minutes ago
          Google has great distribution to be able to just put Gemini in front of people who are already using their many other popular services. ChatGPT definitely came out of the gate with a big lead on name recognition, but I have been surprised to hear various non-techy friends talking about using Gemini recently, I think for many of them just because they have access at work through their Workspace accounts.
        • Obertr 10 minutes ago
          Most of Europe is full of Gemini ads, my parents use Gemini because it is free and it popped up in a YouTube ad before the video

          Just go outside the bubble and look at slightly older people

    • avazhi 33 minutes ago
      "At this point in time I start to believe OAI is very much behind on the models race and it can't be reversed"

      This has been true for at least 4 months and yeah, based on how these things scale and also Google's capital + in-house hardware advantages, it's probably insurmountable.

      • drawnwren 18 minutes ago
        OAI also got talent mined. Their top intellectual leaders left after fight with sama, then Meta took a bunch of their mid-senior talent, and Google had the opposite. They brought Noam and Sergey back.
      • mmaunder 30 minutes ago
        Yeah the only thing standing in Google's way is Google. And it's the easy stuff, like sensible billing models, easy to use docs and consoles that make sense and don't require 20 hours to learn/navigate, and then just the slew of bugs in Gemini CLI that are basic usability and model API interaction things. The only differentiator that OpenAI still has is polish.

        Edit: And just to add an example: openAI's Codex CLI billing is easy for me. I just sign up for the base package, and then add extra credits which I automatically use once I'm through my weekly allowance. With Gemini CLI I'm using my oauth account, and then having to rotate API keys once I've used that up.

        Also, Gemini CLI loves spewing out its own chain of thought when it gets into a weird state.

        Also Gemini CLI has an insane bias to action that is almost insurmountable. DO NOT START THE NEXT STAGE still has it starting the next stage.

        Also Gemini CLI has been terrible at visibility on what it's actually doing at each step - although that seems a bit improved with this new model today.

        • mips_avatar 20 minutes ago
          I'd be curious how many people use openrouter byok just to avoid figuring out the cloud consoles for gcp/azure.
          • mmaunder 6 minutes ago
            Agreed. It's ridiculous.
    • baq 27 minutes ago
      GPT 5.2 is actually getting me better outputs than Opus 4.5 on very complex reviews (on high, I never use less) - but the speed makes Opus the default for 95% of use cases.
    • GenerWork 14 minutes ago
      I'm actually liking 5.2 in Codex. It's able to take my instructions, do a good job at planning out the implementation, and will ask me relevant questions around interactions and functionality. It also gives me more tokens than Claude for the same price. Now, I'm trying to white label something that I made in Figma so my use case is a lot different from the average person on this site, but so far it's my go to and I don't see any reason at this time to switch.
    • raincole 24 minutes ago
      That's a quite sensationalized view.

      Ghibli moment was only about half a year ago. At that moment, OpenAI was so far ahead in terms of image editing. Now it's behind for a few months and "it can't be reversed"?

      • Obertr 17 minutes ago
        Check the size and budget of Google initiatives. It’s unlimited
    • dieortin 34 minutes ago
      Is there anything pointing to Brin having anything to do with Google’s turnaround in AI? I hear a lot of people saying this, but no one explaining why they do
    • yieldcrv 15 minutes ago
      the trend I've seen is that none of these companies are behind in concept and theory, they are just spending longer intervals baking a superior foundation model

      so they get lapped a few times and then drop a fantastic new model out of nowhere

      the same is going to happen to Google again, Anthropic again, OpenAI again, Meta again, etc

      they're all shuffling the same talent around, it's California, that's how it goes, the companies have the same institutional knowledge - at least regarding their consumer facing options

    • encroach 23 minutes ago
      OAI's latest image model outperforms Google's in LMArena in both image generation and image editing. So even though some people like yourself may prefer nano banana pro in your own anecdotal tests, the average person prefers GPT image 1.5 in blind evaluations.

      https://lmarena.ai/leaderboard/text-to-image

      https://lmarena.ai/leaderboard/image-edit

      • Obertr 12 minutes ago
        Add this to Gemini's distribution, which is being advertised by Google in all of their products, and the average Joe will pick the Snickers on the shelf near the checkout rather than the healthier option in the back
        • encroach 8 minutes ago
          That's not how the arena works. The evaluation is blind so Google's advertising/integration has no effect on the results.
          • Obertr 4 minutes ago
            3 points, sure
    • random9749832 34 minutes ago
      This is obviously trained on Pro 3 outputs for benchmaxxing.
      • CuriouslyC 22 minutes ago
        Not trained on pro, distilled from it.
  • samyok 2 hours ago
    Don’t let the “flash” name fool you, this is an amazing model.

    I have been playing with it for the past few weeks, it’s genuinely my new favorite; it’s so fast and it has such vast world knowledge that it’s more performant than Claude Opus 4.5 or GPT 5.2 extra high, for a fraction (basically an order of magnitude less!!) of the inference time and price

    • thecupisblue 1 hour ago
      Oh wow - I recently tried 3 Pro preview and it was too slow for me.

      After reading your comment I ran my product benchmark against 2.5 flash, 2.5 pro and 3.0 flash.

      The results are better AND the response times have stayed the same. What an insane gain - especially considering the price compared to 2.5 Pro. I'm about to get much better results for 1/3rd of the price. Not sure what magic Google did here, but would love to hear a more technical deep dive comparing what they do differently in Pro and Flash models to achieve such performance.

      Also wondering, how did you get early access? I'm using the Gemini API quite a lot and have a quite nice internal benchmark suite for it, so would love to toy with the new ones as they come out.

    • mmaunder 48 minutes ago
      Thanks, having it walk a hardcore SDR signal chain right now --- oh damn it just finished. The blog post makes it clear this isn't just some 'lite' model - you get low latency and cognitive performance. really appreciate you amplifying that.
    • unsupp0rted 34 minutes ago
      How good is it for coding, relative to recent frontier models like GPT 5.x, Sonnet 4.x, etc?
    • esafak 2 hours ago
      What are you using it for and what were you using before?
    • encroach 21 minutes ago
      How did you get early access?
    • epolanski 1 hour ago
      Gemini 2.0 flash was good already for some tasks of mine long time ago..
    • freedomben 1 hour ago
      Cool! I've been using 2.5 flash and it is pretty bad. 1 out of 5 answers it gives will be a lie. Hopefully 3 is better
      • samyok 1 hour ago
        Did you try with the grounding tool? Turning it on solved this problem for me.
        • Davidzheng 1 hour ago
          what if the lie is a logical deduction error not a fact retrieval error
          • rat9988 54 minutes ago
            The error rate would still be improved overall and might make it a viable tool for the price depending on the usecase.
    • jauntywundrkind 1 hour ago
      Just to point this out: the cost of many of these frontier models isn't that far from two orders of magnitude more than what DeepSeek charges. It doesn't compare the same, no, but with coaxing I find it to be a pretty capable coding model that also answers a lot of general queries satisfactorily (but if it's a short session, why economize?). DeepSeek is $0.28/m in, $0.42/m out. Opus 4.5 is $5/$25 (roughly 18x/60x).

      I've been playing around with other models recently (Kimi, GPT Codex, Qwen, others) to try to better appreciate the difference. I knew there was a big price difference, but watching myself feed dollars into the machine rather than nickels has also given me quite the reverse appreciation.

      I only assume "if you're not getting charged, you are the product" has to be somewhat in play here. But when working on open source code, I don't mind.

      • happyopossum 34 minutes ago
        Two orders of magnitude would imply that these models cost $28/m in and $42/m out. Nothing is even close to that.
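        A quick sanity check of the ratios being argued about, as a minimal sketch. The prices are the ones quoted in the parent comment; the helper names are made up for illustration.

```python
import math

# Prices in USD per million tokens, as quoted in the parent comment.
DEEPSEEK = {"input": 0.28, "output": 0.42}
OPUS_4_5 = {"input": 5.00, "output": 25.00}


def price_ratios(model, baseline):
    """Return how many times more expensive `model` is than `baseline`."""
    return {key: model[key] / baseline[key] for key in baseline}


def orders_of_magnitude(ratio):
    """Express a price ratio as a power of ten (2.0 means 100x)."""
    return math.log10(ratio)


ratios = price_ratios(OPUS_4_5, DEEPSEEK)
for direction, ratio in ratios.items():
    print(f"{direction}: {ratio:.1f}x, {orders_of_magnitude(ratio):.2f} orders of magnitude")
```

        On these numbers both ratios land between one and two orders of magnitude, which is roughly where the two comments disagree on wording.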
    • Sincere6066 1 hour ago
      [flagged]
  • meetpateltech 2 hours ago
  • Def_Os 1 minute ago
    Consolidating their lead. I'm getting really excited about the next Gemma release.
  • zurfer 5 minutes ago
    It's a cool release, but if someone on the google team reads that: flash 2.5 is awesome in terms of latency and total response time without reasoning. In quick tests this model seems to be 2x slower. So for certain use cases like quick one-token classification flash 2.5 is still the better model. Please don't stop optimizing for that!
  • __jl__ 2 hours ago
    This is awesome. No preview release either, which is great for production.

    They are pushing the prices higher with each release though: API pricing is up to $0.5/M for input and $3/M for output

    For comparison:

    Gemini 3.0 Flash: $0.50/M for input and $3.00/M for output

    Gemini 2.5 Flash: $0.30/M for input and $2.50/M for output

    Gemini 2.0 Flash: $0.15/M for input and $0.60/M for output

    Gemini 1.5 Flash: $0.075/M for input and $0.30/M for output (after price drop)

    Gemini 3.0 Pro: $2.00/M for input and $12/M for output

    Gemini 2.5 Pro: $1.25/M for input and $10/M for output

    Gemini 1.5 Pro: $1.25/M for input and $5/M for output

    I think image input pricing went up even more.

    Correction: It is a preview model...
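    For anyone who wants the release-over-release jumps rather than the raw list, here is a small sketch. It only uses the Flash prices listed above; the function name is an illustrative choice, not anything from Google's tooling.

```python
# (input, output) in USD per million tokens, copied from the list above.
FLASH_PRICES = {
    "1.5 Flash": (0.075, 0.30),
    "2.0 Flash": (0.15, 0.60),
    "2.5 Flash": (0.30, 2.50),
    "3.0 Flash": (0.50, 3.00),
}


def step_increases(prices):
    """Percent change in (input, output) price between consecutive releases."""
    names = list(prices)
    changes = {}
    for prev, cur in zip(names, names[1:]):
        prev_in, prev_out = prices[prev]
        cur_in, cur_out = prices[cur]
        changes[f"{prev} -> {cur}"] = (
            round((cur_in / prev_in - 1) * 100, 1),
            round((cur_out / prev_out - 1) * 100, 1),
        )
    return changes


for step, (inp, out) in step_increases(FLASH_PRICES).items():
    print(f"{step}: input +{inp}%, output +{out}%")
```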

    • srameshc 2 hours ago
      Thanks, that was a great breakdown of cost. I just assumed before that it was the same pricing. The pricing probably comes from the confidence and the buzz around Gemini 3.0 as one of the best performing models. But competition is hot in this area and it won't be long before we get similar performing models for a cheaper price.
    • mips_avatar 1 hour ago
      I'm more curious how Gemini 3 flash lite performs/is priced when it comes out. Because it may be that for most non coding tasks the distinction isn't between pro and flash but between flash and flash lite.
    • martythemaniak 6 minutes ago
      The price increase sucks, but you really do get a whole lot more. They also had the "Flash Lite" series, 2.5 Flash Lite is $0.10/M, hopefully we see something like 3.0 Flash Lite for $0.20-$0.25.
    • sunaookami 32 minutes ago
      This is a preview release.
    • uluyol 1 hour ago
      Are these the current prices or the prices at the time the models were released?
      • __jl__ 1 hour ago
        Mostly at the time of release except for 1.5 Flash which got a price drop in Aug 2024.

        Google has been discontinuing older models after several months of transition period so I would expect the same for the 2.5 models. But that process only starts when the release version of 3 models is out (pro and flash are in preview right now).

    • misiti3780 25 minutes ago
      is there a website where i can compare openai, anthropic and gemini models on cost/token ?
    • YetAnotherNick 51 minutes ago
      For comparison, GPT-5 mini is $0.25/M for input and $2.00/M for output, so double the price for input and 50% higher for output.
      • AuthError 48 minutes ago
        flash is closer to sonnet than gpt minis though
  • fariszr 2 hours ago
    These flash models keep getting more expensive with every release.

    Is there an OSS model that's better than 2.0 flash with similar pricing, speed and a 1m context window?

    Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.

    > Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.

    The replacement for the old flash models will probably be the 3.0 flash lite then.

    • aoeusnth1 2 hours ago
      I think it's good, they're raising the size (and price) of flash a bit and trying to position Flash as an actually useful coding / reasoning model. There's always lite for people who want dirt cheap prices and don't care about quality at all.
    • thecupisblue 1 hour ago
      Yes, but the 3.0 Flash is cheaper, faster and better than 2.5 Pro.

      So if 2.5 Pro was good for your usecase, you just got a better model for about 1/3rd of the price, but might hurt the wallet a bit more if you use 2.5 Flash currently and want an upgrade - which is fair tbh.

    • mips_avatar 28 minutes ago
      For my app's evals Gemini flash and grok 4 fast are the only ones worth using. I'd love for an open weights model to compete in this arena but I haven't found one.
    • fullstackwife 2 hours ago
      cost of e2e task resolution should be cheaper, even if single inference cost is higher, you need fewer loops to solve a problem now
      • fariszr 2 hours ago
        Sure, but for simple tasks that require a large context window, aka the typical usecase for 2.0 flash, it's still significantly more expensive.
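        Both points can be put into numbers with a rough sketch. The prices are the 2.0 Flash and 3.0 Flash figures quoted elsewhere in the thread; the token counts per call are hypothetical workloads picked only for illustration.

```python
def task_cost(loops, input_tokens, output_tokens, input_price, output_price):
    """Total USD cost of a task that needs `loops` model calls.

    Prices are USD per million tokens; token counts are per call.
    """
    per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return loops * per_call


# (input, output) prices in USD per million tokens, as quoted in the thread.
FLASH_2_0 = (0.15, 0.60)
FLASH_3_0 = (0.50, 3.00)

# Hypothetical workloads: (input tokens, output tokens) per call.
AGENTIC = (20_000, 2_000)
LARGE_CONTEXT = (500_000, 1_000)


def break_even_loop_ratio(workload):
    """How many times more calls 2.0 Flash must need before 3.0 Flash is cheaper."""
    return task_cost(1, *workload, *FLASH_3_0) / task_cost(1, *workload, *FLASH_2_0)


for name, workload in (("agentic", AGENTIC), ("large context", LARGE_CONTEXT)):
    print(f"{name}: break-even at {break_even_loop_ratio(workload):.2f}x the loops")
```

        Under these assumptions the newer model only wins on cost if it cuts the number of calls by more than the break-even ratio; a one-shot large-context task gets no such saving.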
  • simonsarris 2 hours ago
    Even before this release the tools (for me: Claude Code and Gemini for other stuff) reached a "good enough" plateau that means any other company is going to have a hard time making me (I think soon most users) want to switch. Unless a new release from a different company has a real paradigm shift, they're simply sufficient. This was not true in 2023/2024 IMO.

    With this release the "good enough" and "cheap enough" intersect so hard that I wonder if this is an existential threat to those other companies.

    • bgirard 2 hours ago
      Why wouldn't you switch? The cost to switch is near zero for me. Some tools have built in model selectors. Direct CLI/IDE plug-ins practically the same UI.
      • azuanrb 1 hour ago
          Not OP, but I feel the same way. Cost is just one of the factors. I'm used to Claude Code UX, my CLAUDE.md works well with my workflow too. Unless there's any significant improvement, changing to new models every few months is going to hurt me more.
        • bgirard 59 minutes ago
          I used to think this way. But I moved to AGENTS.md. Now I use the different UI as a mental context separation. Codex is working on Feature A, Gemini on feature B, Claude on Feature C. It has become a feature.
    • theLiminator 2 hours ago
      For me, the last wave of models finally started delivering on their agentic coding promises.
      • orourke 1 hour ago
        This has been my experience exactly. Even over just the last few weeks I’ve noticed a dramatic drop in having to undo what the agents have done.
      • inquirerGeneral 1 hour ago
        [dead]
    • calflegal 2 hours ago
      I asked a similar question yesterday:

      https://news.ycombinator.com/item?id=46290797

    • catigula 1 hour ago
      Correct. Opus 4.5 'solved' software engineering. What more do I need? Businesses need uncapped intelligence, and that is a very high bar. Individuals often don't.
      • gaigalas 48 minutes ago
        If Opus is one-size-fits-all, then why does Claude keep the other series? (rhetorical).

        Opus and Sonnet are slower than Haiku. For lots of less sophisticated tasks, you benefit from the speed.

        All vendors do this. You need smaller models that you can rapid-fire for lots of other reasons than vibe coding.

        Personally, I actually use more smaller models than the sophisticated ones. Lots of small automations.

    • nprateem 2 hours ago
      But for me the previous models were routinely wrong time wasters that overall added no speed increase taking the lottery of whether they'd be correct into account.
    • alex1138 17 minutes ago
      I just can't stop thinking though about the vulnerability of training data

      You say good enough. Great, but what if I as a malicious person were to just make a bunch of internet pages containing things that are blatantly wrong, to trick LLMs?

      • calflegal 10 minutes ago
        The internet has already tried this, for a few decades. The garbage is in the corpus; it gets weighted as such
    • szundi 1 hour ago
      [dead]
  • simonw 1 hour ago
    Quick pricing comparison: https://www.llm-prices.com/#it=100000&ot=10000&sel=gemini-3-...

    It's 1/4 the price of Gemini 3 Pro ≤200k and 1/8 the price of Gemini 3 Pro >200k - notable that the new Flash model doesn’t have a price increase after that 200,000 token point.

    It’s also twice the price of GPT-5 Mini for input, half the price of Claude 4.5 Haiku.

  • zhyder 1 hour ago
    Glad to see a big improvement in the SimpleQA Verified benchmark (28->69%), which is meant to measure factuality (built-in, i.e. without adding grounding resources). That's one benchmark where all models seemed to have low scores until recently. Can't wait to see a model go over 90%... then it will be years until the competition is over the number of 9s in such a factuality benchmark, but that'd be glorious.
  • primaprashant 2 hours ago
    Pricing is $0.5 / $3 per million input / output tokens. 2.5 Flash was $0.3 / $2.5. That's a 67% increase in input token pricing and a 20% increase in output token pricing.

    For comparison, from 2.5 Pro ($1.25 / $10) to 3 Pro ($2 / $12), there was a 60% increase in input token pricing and a 20% increase in output token pricing.

    • simonw 1 hour ago
      Calculating price increases is made more complex by the difference in token usage. From https://blog.google/products/gemini/gemini-3-flash/ :

      > Gemini 3 Flash is able to modulate how much it thinks. It may think longer for more complex use cases, but it also uses 30% fewer tokens on average than 2.5 Pro.
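      A back-of-the-envelope sketch of what that means for output cost. The prices are the ones quoted in the parent comment; applying the 30% reduction uniformly to output tokens is my assumption, since the blog only gives an average.

```python
def effective_output_cost(price_per_m, relative_tokens):
    """USD cost to produce what would be 1M output tokens on the baseline model."""
    return price_per_m * relative_tokens


PRO_2_5_OUTPUT = 10.00  # USD per million output tokens (2.5 Pro)
FLASH_3_OUTPUT = 3.00   # USD per million output tokens (3 Flash)
TOKEN_REDUCTION = 0.30  # "30% fewer tokens on average than 2.5 Pro" (assumed uniform)

baseline = effective_output_cost(PRO_2_5_OUTPUT, 1.0)
flash = effective_output_cost(FLASH_3_OUTPUT, 1.0 - TOKEN_REDUCTION)

print(f"2.5 Pro: ${baseline:.2f} per baseline-million output tokens")
print(f"3 Flash: ${flash:.2f} for the same work, {flash / baseline:.0%} of the 2.5 Pro cost")
```

      The same adjustment cannot be applied to the 2.5 Flash comparison, because the quote only gives the token reduction relative to 2.5 Pro.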

  • caminanteblanco 1 hour ago
    Does anyone else understand what the difference is between Gemini 3 'Thinking' and 'Pro'? Thinking "Solves complex problems" and Pro "Thinks longer for advanced math & code".

    I assume that these are just different reasoning levels for Gemini 3, but I can't even find mention of there being 2 versions anywhere, and the API doesn't even mention the Thinking-Pro dichotomy.

    • flakiness 1 hour ago
      It seems:

         - "Thinking" is Gemini 3 Flash with higher "thinking_level"
         - "Pro" is Gemini 3 Pro. It doesn't mention "thinking_level" but I assume it is set to high-ish.
    • peheje 1 hour ago
      I think:

      Fast = Gemini 3 Flash without thinking (or very low thinking budget)

      Thinking = Gemini 3 flash with high thinking budget

      Pro = Gemini 3 Pro with thinking

    • lysace 46 minutes ago
      Really stupid question: How is Gemini-like 'thinking' separate from artificial general intelligence (AGI)?

      When I ask Gemini 3 Flash this question, the answer is vague but agency comes up a lot. Gemini thinking is always triggered by a query.

      This seems like a higher-level programming issue to me. Turn it into a loop. Keep the context. Those two things make it costly for sure. But does it make it an AGI? Surely Google has tried this?

  • JumpCrisscross 45 minutes ago
    Kara Swisher recently compared OpenAI to Netscape. It’s starting to look prescient.
  • outside2344 53 minutes ago
    I don't want to say OpenAI is toast for general chat AI, but it sure looks like they are toast.
  • kingstnap 1 hour ago
    It has a SimpleQA score of 69%, a benchmark that tests knowledge on extremely niche facts, that's actually ridiculously high (Gemini 2.5 *Pro* had 55%) and reflects either training on the test set or some sort of cracked way to pack a ton of parametric knowledge into a Flash Model.

    I'm speculating but Google might have figured out some training magic trick to balance out the information storage in model capacity. That or this flash model has a huge number of parameters or something.

    • leumon 2 minutes ago
      Or could it be that it's using tool calls in reasoning (e.g. a google search)?
    • tanh 1 hour ago
      This will be fantastic for voice. I presume Apple will use it
    • GaggiX 1 hour ago
      >or some sort of cracked way to pack a ton of parametric knowledge into a Flash Model.

      More experts with a lower percentage of active ones -> more sparsity.

  • dandiep 13 minutes ago
    For someone looking to switch over to Gemini from OpenAI, are there any gotchas one should be aware of? E.g. I heard some mention of API limits and approvals? Or in terms of prompt writing? What advice do people have?
  • tootyskooty 1 hour ago
    Since it now includes 4 thinking levels (minimal-high) I'd really appreciate if we got some benchmarks across the whole sweep (and not just what's presumably high).

    Flash is meant to be a model for lower cost, latency-sensitive tasks. Long thinking times will both make TTFT >> 10s (often unacceptable) and also won't really be that cheap?

    • happyopossum 23 minutes ago
      Google appears to be changing what flash is “meant for” with this release - the capability it has along with the thinking budgets make it superior to previous Pro models in both outcome and speed. The likely-soon-coming flash-lite will fit right in to where flash used to be - cheap and fast.
  • rohitpaulk 1 hour ago
    Wild how this beats 2.5 Pro in every single benchmark. Don't think this was true for Haiku 4.5 vs Sonnet 3.5.
    • FergusArgyll 1 hour ago
      Sonnet 3.5 might have been better than opus 3. That's my recollection anyhow
  • SubiculumCode 26 minutes ago
    In Gemini Pro interface, I now have Fast, Thinking, and Pro options. I was a bit confused by that, but did find this: https://discuss.ai.google.dev/t/new-model-levels-fast-thinki...
  • acheong08 2 hours ago
    Thinking along the line of speed, I wonder if a model that can reason and use tools at 60fps would be able to control a robot with raw instructions and perform skilled physical work currently limited by the text-only output of LLMs. Also helps that the Gemini series is really good at multimodal processing with images and audio. Maybe they can also encode sensory inputs in a similar way.

    Pipe dream right now, but 50 years later? Maybe

  • mmaunder 35 minutes ago
    I think about what would be most terrifying to Anthropic and OpenAI i.e. The absolute scariest thing that Google could do. I think this is it: Release low latency, low priced models with high cognitive performance and big context window, especially in the coding space because that is direct, immediate, very high ROI for the customer.

    Now, imagine for a moment they had also vertically integrated the hardware to do this.

    • JumpCrisscross 17 minutes ago
      > think about what would be most terrifying to Anthropic and OpenAI

      The most terrifying thing would be Google expanding its free tiers.

    • avazhi 30 minutes ago
      "Now, imagine for a moment they had also vertically integrated the hardware to do this."

      Then you realise you aren't imagining it.

  • jug 2 hours ago
    Looks like a good workhorse model, like I felt 2.5 Flash also was at its time of launch. I hope I can build confidence with it because it'll be good to offload Pro costs/limits, and of course the speed is always nice for more basic coding or queries. I'm impressed and curious about the recent extreme gains on ARC-AGI-2 from 3 Pro, GPT-5.1 and now even 3 Flash.
  • k8sToGo 1 hour ago
    I remember the preview price for 2.5 flash was much cheaper. And then it got quite expensive when it went out of preview. I hope the same won't happen.
  • xnx 2 hours ago
    OpenAI is pretty firmly in the rear-view mirror now.
    • walthamstow 1 hour ago
      Google Antigravity is a buggy mess at the moment, but I believe it will eventually eat Cursor as well. The £20/mo tier currently has the highest usage limits on the market, including Google models and Sonnet and Opus 4.5.
  • alach11 1 hour ago
    I really wish these models were available via AWS or Azure. I understand strategically that this might not make sense for Google, but at a non-software-focused F500 company it would sure make it a lot easier to use Gemini.
    • lbhdc 34 minutes ago
      I feel like that is part of their cloud strategy. If your company wants to pump a huge amount of data through one of these you will pay a premium in network costs. Their sales people will use that as a lever for why you should migrate some or all of your fleet to their cloud.
  • bearjaws 2 hours ago
    I've been using the preview flash model exclusively since it came out, the speed and quality of response is all I need at the moment. Although still using Claude Code w/ Opus 4.5 for dev work.

    Google keeps their models very "fresh" and I tend to get more correct answers when asking about Azure or O365 issues, ironically copilot will talk about now deleted or deprecated features more often.

    • sv123 1 hour ago
      I've found copilot within the Azure portal to be basically useless for solving most problems.
      • djeastm 1 hour ago
        Me too. I don't understand why companies think we devs need a custom chat on their website when we all have access to a chat with much smarter models open in a different tab.
  • whinvik 1 hour ago
    Ok, I was a bit addicted to Opus 4.5 and was starting to feel like there's nothing like it.

    Turns out Gemini 3 Flash is pretty close. The Gemini CLI is not as good but the model more than makes up for it.

    The weird part is Gemini 3 Pro is nowhere near as good an experience. Maybe because it's just so slow.

    • __jl__ 1 hour ago
      I will have to try that. Cursor bill got pretty high with Opus 4.5. Never considered opus before the 4.5 price drop but now it's hard to change... :)
      • diamondfist25 29 minutes ago
        $100 Claude max is the best subscription I’ve ever had.

        Well worth every penny now

  • SyrupThinker 2 hours ago
    I wonder if this suffers from the same issue as 3 Pro, that it frequently "thinks" for a long time about date incongruity, insisting that it is 2024, and that information it receives must be incorrect or hypothetical.

    Just avoiding/fixing that would probably speed up a good chunk of my own queries.

    • archon1410 1 minute ago
      I just checked and Gemini 3 Flash is completely convinced that the 2025 date in the system prompt is a simulation and the current time is actually May 2024. It didn't immediately comment on it when asked about recent events though, only when asked to ponder whether the events feel real.
    • robrenaud 1 hour ago
      Omg, it was so frustrating to say:

      Summarize recent working arxiv url

      And then it tells me the date is from the future and it simply refuses to fetch the URL.

  • elvin_d 14 minutes ago
    Gemini 3 are great models but lacking a few things:

    - The app experience is atrocious, with poor UX all over the place. A few examples: silly jumps when reading the text as the model starts to respond, and the slide-over view on iPad breaking the request while Claude and ChatGPT work fine.
    - Google offers 2 choices: either your data is used for whatever they want or, if you want privacy, the app experience gets even worse.
  • doomerhunter 2 hours ago
    Pretty stoked for this model. Building a lot with "mixture of agents" / mix of models and Gemini's smaller models do feel really versatile in my opinion.

    Hoping that the local ones (the Gemma line) keep progressing too

  • Fiveplus 2 hours ago
    It is interesting to see the "DeepMind" branding completely vanish from the post. This feels like the final consolidation of the Google Brain merger. The technical report mentions a new "MoE-lite" architecture. Does anyone have details on the parameter count? If this is under 20B params active, the distillation techniques they are using are lightyears ahead of everyone else.
  • jtrn 1 hour ago
    This is the first flash/mini model that doesn't make a complete ass of itself when I prompt for the following: "Tell me as much as possible about Skatval in Norway. Not general information. Only what is uniquely true for Skatval."

    Skatval is a small local area I live in, so I know when it's bullshitting. Usually, I get a long-winded answer that is PURE Barnum-statement, like "Skatval is a rural area known for its beautiful fields and mountains" and bla bla bla.

    Even with minimal thinking (it seems to do none), it gives an extremely good answer. I am really happy about this.

    I also noticed it had VERY good scores on tool-use, terminal, and agentic stuff. If that is TRUE, it might be awesome for coding.

    I'm tentatively optimistic about this.

    • amunozo 1 hour ago
      I tried the same with my father's little village (Zarza Capilla, in Spain), and it gave a surprisingly good answer in a couple of seconds. Amazing.
    • kingstnap 1 hour ago
      You are effectively describing SimpleQA but with a single question instead of a comprehensive benchmark and you can note the dramatic increase in performance there.
  • timpera 33 minutes ago
    Looks awesome on paper. However, after trying it on my usual tasks, it is still very bad at using the French language, especially for creative writing. The gap between the Gemini 3 family and GPT-5 or Sonnet 4.5 is significant for my usage.

    Also, I hate that I cannot put the Google models into a "Thinking" mode like in ChatGPT. When I set GPT 5.1 Thinking on a legal task and tell it to check and cite all sources, it takes 10+ minutes to answer, but it does check everything and cite all its sources in the text; whereas the Gemini models, even 3 Pro, always answer after a few seconds and never cite their sources, making it impossible to click to check the answer. It makes the whole model unusable for these tasks. (I have the $20 subscription for both)

    • happyopossum 16 minutes ago
      > whereas the Gemini models, even 3 Pro, always answer after a few seconds and never cite their sources

      Definitely has not been my experience using 3 Pro in Gemini Enterprise. In fact, just yesterday it took so long to do a similar task that I thought something was broken. Nope, it was just re-checking a source.

      • timpera 3 minutes ago
        Does Gemini Enterprise have more features?

        Just tried once again with the exact same prompt: GPT-5.1-Thinking took 12m46s and Gemini 3.0 Pro took about 20 seconds. The latter obviously has a dramatically worse answer as a result.

        (Also, the thinking trace is not in the correct language, and doesn't seem to show which sources have been read at which steps; there is only a "Sources" tab at the end of the answer.)

  • user_7832 2 hours ago
    Two quick questions to Gemini/AI Studio users:

    1, has anyone actually found 3 Pro better than 2.5 (on non code tasks)? I struggle to find a difference beyond the quicker reasoning time and fewer tokens.

    2, has anyone found any non-thinking models better than 2.5 or 3 Pro? So far I find the thinking ones significantly ahead of non thinking models (of any company for that matter.)

    • Workaccount2 2 hours ago
      Gemini 3 is a step change up against 2.5 for electrical engineering R&D.
    • Davidzheng 2 hours ago
      I think it's probably actually better at math, though still not enough to be useful in my research in a substantial way. I suspect this will change suddenly at some point as the models move past a certain threshold. It is also heavily limited by the fact that the models are very bad at not giving wrong proofs/counterexamples, so even if they succeed at a useful rate, the labor of sorting through a bunch of trash makes it hard to justify.
    • tmaly 2 hours ago
      Not for coding but for the design aspect, 3 outshines 2.5
  • sunaookami 33 minutes ago
    Sadly not available in the free tier...
  • speedgoose 1 hour ago
    I’m wondering why Claude Opus 4.5 is missing from the benchmarks table.
    • anonym29 1 hour ago
      I wondered this, too. I think the emphasis here was on the faster / lower-cost models, but that would suggest that Haiku 4.5 should be the Anthropic entry in the table instead. They also did not use the most powerful xAI model, opting for the fast one instead. Either way, this new Gemini 3 Flash model is good enough that Anthropic should be feeling pressure on both price and output quality, whichever Anthropic model it is compared against, which is ultimately good for the consumer.
  • Workaccount2 2 hours ago
    Really hoping this is used for real time chatting and video. The current model is decent, but when doing technical stuff (help me figure out how to assemble this furniture) it falls far short of 3 pro.
  • poplarsol 2 hours ago
    Will be interesting to see what their quota is. Gemini 3.0 Pro only gives you 250 / day until you spam them with enough BS requests to increase your total spend > $250.
  • JeremyHerrman 2 hours ago
    Disappointed to see continued increased pricing for 3 Flash (up from $0.30/$2.50 to $0.50/$3.00 for 1M input/output tokens).

    I'm more excited to see 3 Flash Lite. Gemini 2.5 Flash Lite needs a lot more steering than regular 2.5 Flash, but it is a very capable model and combined with the 50% batch mode discount it is CHEAP ($0.05/$0.20).

    • jeppebemad 1 hour ago
      Have you seen any indications that there will be a Lite version?
      • summerlight 1 hour ago
        I guess if they want to eventually deprecate the 2.5 family they will need to provide a substitute. And there is huge demand for cheap models.
  • walthamstow 1 hour ago
    I'm sure it's good, I thought the last one was too, but it seems like the backdoor way to increase prices is to release a new model
    • jeffbee 1 hour ago
      If the model is better in that it resolves the task with fewer iterations then the i/o token pricing may be a wash or lower.
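
      A rough sketch of that break-even point, using the 2.5 Flash and 3 Flash prices quoted elsewhere in this thread. The per-iteration token counts are hypothetical, and the sketch assumes every iteration uses the same token mix on both models, which real workloads will not match exactly.

```python
# USD per 1M tokens (input, output), as quoted elsewhere in this thread.
OLD_PRICE = (0.30, 2.50)  # 2.5 Flash
NEW_PRICE = (0.50, 3.00)  # 3 Flash


def cost_per_iteration(price, input_tokens, output_tokens):
    """Cost in USD of one iteration with the given token counts."""
    in_price, out_price = price
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def break_even_ratio(input_tokens, output_tokens):
    """Largest new/old iteration ratio at which the new model costs no more."""
    old = cost_per_iteration(OLD_PRICE, input_tokens, output_tokens)
    new = cost_per_iteration(NEW_PRICE, input_tokens, output_tokens)
    return old / new


if __name__ == "__main__":
    # Hypothetical mix: 10k input tokens and 1k output tokens per iteration.
    ratio = break_even_ratio(10_000, 1_000)
    print(f"New model breaks even at {ratio:.1%} of the old iteration count")
```

      With that hypothetical mix, the newer model has to finish in roughly 69% of the iterations to cost the same; a more output-heavy mix moves the break-even point closer to 83%, since output went up by only 20%.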
  • tanh 2 hours ago
    Does this imply we don't need as much compute for models/agents? How can any other AI model compete against that?
  • heliophobicdude 1 hour ago
    Any word on whether this is using their diffusion architecture?
  • nickvec 1 hour ago
    So is Gemini 3 Fast the same as Gemini 3 Flash?
  • hubraumhugo 2 hours ago
    You can get your HN profile analyzed and roasted by it. It's pretty funny :) https://hn-wrapped.kadoa.com
    • onraglanroad 1 hour ago
      I didn't feel roasted at all. In fact I feel vindicated! https://hn-wrapped.kadoa.com/onraglanroad
    • SubiculumCode 23 minutes ago
    • apparent 30 minutes ago
      Pretty fucking hilarious, if completely off-topic.
    • WhereIsTheTruth 1 hour ago
      This is exactly why you keep your personal life off the internet
    • echelon 2 hours ago
      This is hilarious. The personalized pie charts and XKCD-style comics are great, and the roast-style humor is perfect.

      I do feel like it's not an entirely accurate caricature (recency bias? limited context?), but it's close enough.

      Good work!

      You should do a "show HN" if you're not worried about it costing you too much.

    • peheje 1 hour ago
      This is great. I literally "LOL'd".
  • FergusArgyll 1 hour ago
    So much for "Monopolies get lazy, they just rent seek and don't innovate"
    • NitpickLawyer 1 hour ago
      Also so much for the "wall, stagnation, no more data" folks. Womp womp.
    • deskamess 1 hour ago
      Monopolies and wanna-be monopolies on the AI-train are running for their lives. They have to innovate to be the last one standing (or second last) - in their mind.
    • concinds 1 hour ago
      The LLM market has no moats so no one "feels" like a monopoly, rightfully.
    • incrudible 1 hour ago
      LLMs are a big threat to their search engine revenue, so whatever monopoly Google may have had does not exist anymore.
  • Tiberium 2 hours ago
    Yet again Flash receives a notable price hike: from $0.3/$2.5 for 2.5 Flash to $0.5/$3 (+66.7% input, +20% output) for 3 Flash. Also, as a reminder, 2 Flash used to be $0.1/$0.4.
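
    For anyone who wants to check the percentages, here is a small sketch that recomputes them from the per-1M-token prices quoted above. The dictionary and function names are just illustrative.

```python
# Prices in USD per 1M tokens (input, output), as quoted in the comment above.
PRICES = {
    "2.0 Flash": (0.10, 0.40),
    "2.5 Flash": (0.30, 2.50),
    "3 Flash": (0.50, 3.00),
}


def pct_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


def compare(old_model: str, new_model: str) -> tuple:
    """Return (input % increase, output % increase) between two models."""
    old_in, old_out = PRICES[old_model]
    new_in, new_out = PRICES[new_model]
    return pct_increase(old_in, new_in), pct_increase(old_out, new_out)


if __name__ == "__main__":
    for old in ("2.5 Flash", "2.0 Flash"):
        d_in, d_out = compare(old, "3 Flash")
        print(f"{old} -> 3 Flash: input +{d_in:.1f}%, output +{d_out:.1f}%")
```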
    • BeetleB 2 hours ago
      Yes, but this Flash is a lot more powerful - beating Gemini 3 Pro on some benchmarks (and pretty close on others).

      I don't view this as a "new Flash" but as "a much cheaper Gemini 3 Pro/GPT-5.2"

      • jexe 55 minutes ago
        Right, depends on your use cases. I was looking forward to the model as an upgrade to 2.5 Flash, but when you're processing hundreds of millions of tokens a day (not hard to do if you're dealing in documents or emails with a few users), the economics fall apart.
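
        To put numbers on that, a minimal sketch of daily spend at the two Flash price points quoted in this thread. The daily volume (300M input and 30M output tokens) is a made-up example, not a measurement.

```python
# USD per 1M tokens (input, output), as quoted in this thread.
PRICES = {
    "2.5 Flash": (0.30, 2.50),
    "3 Flash": (0.50, 3.00),
}


def daily_cost(model, input_tokens, output_tokens):
    """Daily spend in USD for the given token volume."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 300M input tokens and 30M output tokens per day.
    daily_in, daily_out = 300_000_000, 30_000_000
    for model in PRICES:
        cost = daily_cost(model, daily_in, daily_out)
        print(f"{model}: ${cost:,.2f}/day, about ${cost * 30:,.2f}/month")
```

        For that input-heavy example the bill goes from about $165 to about $240 per day, a larger jump than the 20% output increase alone would suggest.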
      • Tiberium 2 hours ago
        I would be less salty if they gave us 3 Flash Lite at same price as 2.5 Flash or cheaper with better capability, but they still focus on the pricier models :(
        • zzleeper 2 hours ago
          Same! I want to do some data stuff from documents and 2.0 pricing was amazing, but the constant increases go the wrong way for this task :/
  • GaggiX 2 hours ago
    They went too far, now the Flash model is competing with their Pro version. Better SWE-bench, better ARC-AGI 2 than 3.0 Pro. I imagine they are going to improve 3.0 Pro before it leaves Preview.

    Also I don't see it written in the blog post but Flash supports more granular settings for reasoning: minimal, low, medium, high (like openai models), while pro is only low and high.

    • minimaxir 2 hours ago
      "minimal" is a bit weird.

      > Matches the “no thinking” setting for most queries. The model may think very minimally for complex coding tasks. Minimizes latency for chat or high throughput applications.

      I'd prefer a hard "no thinking" rule than what this is.

      • GaggiX 2 hours ago
        It still supports the legacy mode of setting the budget: you can set it to 0, which would be equivalent to the "none" reasoning effort in GPT 5.1/5.2.
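
        A hedged sketch of the two options as request bodies. The level names come from the comment above; the JSON field names (`generationConfig`, `thinkingConfig`, `thinkingLevel`, `thinkingBudget`) are my assumption about the Gemini REST API and should be checked against the current docs. `build_request` is a hypothetical helper, and nothing here sends a request.

```python
import json

# Levels named in the comment above. The JSON field names used below are
# assumptions about the Gemini REST API; verify them before relying on this.
LEVELS = ("minimal", "low", "medium", "high")


def build_request(prompt, level=None, budget=None):
    """Build a generateContent-style JSON body with one thinking control."""
    if (level is None) == (budget is None):
        raise ValueError("set exactly one of level or budget")
    if level is not None:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        thinking = {"thinkingLevel": level}
    else:
        if budget < 0:
            raise ValueError("budget must be >= 0")
        # Legacy style: per the comment above, a budget of 0 disables thinking.
        thinking = {"thinkingBudget": budget}
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": thinking},
    }


if __name__ == "__main__":
    print(json.dumps(build_request("Hello", budget=0), indent=2))
    print(json.dumps(build_request("Hello", level="minimal"), indent=2))
```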
    • skerit 2 hours ago
      > They went too far, now the Flash model is competing with their Pro version

      Wasn't this the case with the 2.5 Flash models too? I remember being very confused at that time.

      • JohnnyMarcone 41 minutes ago
        This is similar to how Anthropic has treated sonnet/opus as well. At least pre opus 4.5.

        To me it seems like the big model has been "look what we can do", and the smaller model is "actually use this one though".

    • jug 2 hours ago
      I'm not sure how I'm going to live with this!
  • jijji 1 hour ago
    I tried Gemini CLI the other day, typed in two one-line requests, and then it responded that it would not go further because I had run out of tokens. I've heard other people complain that it will rewrite your entire codebase from scratch and that you should make backups before even starting any code-based work with the Gemini CLI. I understand they are trying to compete against Claude Code, but this is not ready for prime time IMHO.
  • andrepd 2 hours ago
    Is there a way to try this without a Google account?
    • mschulkind 1 hour ago
      Just use openrouter or a similar aggregator.
  • moralestapia 2 hours ago
    Not only is it fast, it is also quite cheap, nice!
  • anonym29 1 hour ago
    I never have, do not, and conceivably never will use gemini models, or any other models that require me to perform inference on Alphabet/Google's servers (i.e. gemma models I can run locally or on other providers are fine), but kudos to the team over there for the work here, this does look really impressive. This kind of competition is good for everyone, even people like me who will probably never touch any gemini model.
    • oklahomasports 23 minutes ago
      You don’t want Google to know that you are searching for like advice on how much a 61 yr old can contribute to a 401k. What are you hiding?
      • anonym29 19 minutes ago
        Why do you close the bathroom stall door in public?

        You're not doing anything wrong. Everyone knows what you're doing. You have no secrets to hide.

        Yet you value your privacy anyway. Why?

        Also - I have no problem using Anthropic's cloud-hosted services. Being opposed to some cloud providers doesn't mean I'm opposed to all cloud providers.

  • imvetri 2 hours ago
    this is why samsung is stopping production in flash
    • Tepix 1 hour ago
      This is why they stopped The Flash after season 9 in 2023.
  • bennydog224 2 hours ago
    From the article, speed & cost match 2.5 Flash. I'm working on a project where there's a huge gap between 2.5 Flash and 2.5 Flash Lite as far as performance and cost goes.

    -> 2.5 Flash Lite is super fast & cheap (~1-1.5s inference), but poor quality responses.

    -> 2.5 Flash gives high quality responses, but fairly expensive & slow (5-7s inference)

    I really just need an in-between for Flash and Flash Lite for cost and performance. Right now, users have to wait up to 7s for a quality response.