Grok 4.3

(docs.x.ai)

208 points | by simianwords 5 hours ago

32 comments

sundarurfriend 4 hours ago
As an English-as-second-language speaker and writer, one thing Grok really shines at is capturing the tone and level of "formality" of a piece of text and the replicating it correctly. It seems to understand the little human subtleties of language in a way the other major providers don't. Chatgpt goes overly stiff and formal sounding, or ends up in a weird "aye guvnor" type informal language (Claude is sometimes better but not always).
Grok seems in general better at being "human" in ways that are hard to define: for eg. if I ask it "does this message roughly convey things correctly, to the level it can given this length", it will likely answer like a human would (either a yes or a change suggestion that sticks to the tone and length), while Chatgpt would write a dissertation on the message that still doesn't clear anything up.
Recently I've noticed that Grok seems to have gotten really good at dictation too (that feature where you click the mic to ask it something). Chatgpt has like 90-95% accuracy with my accent, the speech input on Android's Gboard something like 75%, Grok surprisingly gets something like 98% of my words correct.
[-]
- michaelbuckbee 1 hour ago
  I did a quick eval comparing Grok 4.3, Opus 4.7 and GPT 4.1 and they actually seem pretty similar:
  https://ofw640g9re.evvl.io/
  They all did pretty well at a more "formal" tone, but GPT4.1 was the only one that didn't make me cringe with a "casual" tone.
  [edit] fwiw, grok was also the fastest+cheapest model, claude was slowest and priciest.
  [-]
  - sundarurfriend 15 minutes ago
    This is the most basic level of eval, of whether they can produce output that will be considered by someone somewhere (usually a young urban US American) as informal toned. Real human communication is far more nuanced than this, different groups have different linguistic registers they're used to and things outside it sound odd even if they can't articulate why. You could also want to be informal but not over-familiar with the other person (for eg. in a discord chat to a new acquaintance) - actually looking at the outputs here, the Claude output seems best fitting for that (in my subjective view anyway) than to the one you gave it - or want many other little variations.
    What makes one cringe and another recognize as familiar and comfortable is also pretty subtle and hard to define. These things need nuanced descriptions and examples to actually get right, and it's in understanding those nuances and figuring out the register of the examples that Grok outshines the others.
  - jasonjmcghee 13 minutes ago
    That's Grok 4.2 not 4.3 right?
    And why are you comparing to gpt-4.1? (As opposed to one of the 6? model releases since then - would have expected gpt 5.5)
  - embedding-shape 1 hour ago
    I know it's just an evaluation, but seeing an informal message and a prompt to ask to rewrite this informal message to the tone of an "informal message" when the original one sounds just fine, just makes me sad... Not because of this evaluation, but because it reminds me that this is how some people use LLMs, basically asking it to remove your own voice from texts that are generally fine already.
    [-]
    - michaelbuckbee 1 hour ago
      My sister in law is a pharmacist and the heaviest non-dev ChatGPT user I know and her main use case is writing professionally polite messages to doctors on how the drugs they prescribed to a patient would have killed them had she not caught a particular interaction or common side effect.
      There's a lot of "tone" in it as she's not trying to anger these folks, but also it's quite serious, but also there's just everything else happening in medicine.
      Feels like a great use.
      [-]
  - rafram 18 minutes ago
    All of these were frankly terrible. I guess Grok’s “informal” version sounded the most like a real human, but only because it reads exactly like an Elon tweet (including his favorite emoji!). It’s obvious what they’ve been training on.
  - accrual 1 hour ago
    All three did well, and while I'm a Claude user, I found the Opus reply here added some unnecessary detail, like "Impact: Minimal; no downstream dependencies are currently at risk". Downstream dependencies weren't mentioned in the original message; for all we know downstream could be relying on a poorly performing API and is impacted by waiting another week for replacement.
- djyde 4 hours ago
  I've also noticed that when I communicate with Grok in my native language, its tone is more natural than other models. I think this is due to the advantage of being trained on a large amount of Twitter data. However, as Twitter contains more and more AI-generated content now, I'm afraid continued training will make it less natural.
  [-]
  - adjejmxbdjdn 1 hour ago
    The causation could also be the other way round.
    Twitter language has started seeming normal casual to us, rather than us using normal casual language in Twitter.
  - pacific01 3 hours ago
    Did you try meta? I was into grok but now meta works well for me
  - thunderbong 4 hours ago
    I'm sure Twitter knows which are the bot accounts and is surely excluding them from their model training. Twitter bots aren't a new phenomenon after all.
    [-]
    - cowsup 2 hours ago
      I don't think Twitter/X know for sure who the bots are, since Elon has been pretty vocal about trying to stop them for ages, yet I still get lots of spam DMs (as do others with far fewer followers/reach).
      Even if 95% of the spam gets actively reported and dealt with, that still leaves a ton of nonsense on the platform, getting fed into the LLM. And spam has only gotten worse over the years, as the barrier to entry has lowered and lowered.
      [-]
      - GTP 1 hour ago
        Are the spam DMs advertisements or more generally something linked to a product or service? I wouldn't be surprised if X is more lenient towards bots that pay them for adverts.
        [-]
        Zancarius 2 minutes ago
        Most of what I get seem to be advertisements or automated messages if you follow large(r) accounts.
        One of the most interesting things that I've noticed is these advertisements will be triggered if you follow accounts that are positioned as influencers. I followed one out of curiosity and received a DM from that account advertising some cryptocurrency service.
        It's a good way to filter out and block accounts that have almost certainly not grown organically.
      - HarHarVeryFunny 1 hour ago
        I'd have guessed that at least some of the bots are Twitter itself, trying to draw you in with some sense of engagement. Given that Musk is the owner, and everything we know about him and have seen him do, I'd not be surprised if some of the MAGA bots are his too.
      - joncrane 2 hours ago
        >Elon has been pretty vocal about trying to stop them for ages
        You know people lie, right? Especially when the lie casts them in a better light and/or makes them more money.
      - subscribed 1 hour ago
        Elon lied on record many times, admitting to the lies only when forced, under oath.
    - hackinthebochs 53 minutes ago
      Highly doubtful seeing as my 14 year old twitter account got caught in a recent bot ban wave with no means of contacting a human for recovery.
    - pixel_popping 3 hours ago
      There is bots everywhere, it has nothing to do with the platform, it has to do with attackers having an incentive to do mass account farming, no platform is secure against it.
      [-]
      - rglullis 1 hour ago
        Super easy, just make a web-of-trust type of thing: messages are only visible to those who already vouched for you. Otherwise, you pay $0.01/per message/per user reached.
      - kedihacker 2 hours ago
        With banning and deboosting they need to be very accurate but with filtering they can be more liberal in excluding
      - simianwords 3 hours ago
        not really. there are easy heuristics to filter out bots with good confidence. FWIW i don't see any bots posting anything in my feed
        [-]
        pixel_popping 3 hours ago
        Yes your individual feed isn't really relevant if we talk about the masses, Reddit accounts are for sale quite cheap, HN as well, X too and so-on, it's literally just a matter of means/methodology. If I want today to do 1000 random posts talking about a certain thing, I could.
        [-]
        simianwords 2 hours ago
        my individual feed does matter because it shows that it is possible to curate something without bots which is obviously what XAI would do
  - darkerside 2 hours ago
    Sadly, it's more likely that people will just start talking like bots
    [-]
    - pdimitar 17 minutes ago
      I've seen this expressed as a concern even from one of my colleagues. My retort was:
      "English is not my native language and LLMs taught me quite a few very useful formalisms that do land well for people and they change their attitude towards you to be more respectful afterwards. It also showed me how to frame and reframe certain arguments. I agree sounding like an LLM is kind of sad but I am getting a lot of educational value -- and with time I'll sneak my own voice back in these newly learned idioms and ways to talk."
    - JKCalhoun 2 hours ago
      You're absolutely right!
      [-]
    - nex-z 1 hour ago
      [dead]
    - techjamie 1 hour ago
      There was already evidence last year[1] that pointed to ChatGPT-specific words like "meticulous," "delve," etc becoming more frequently used than they were previously. The linked study used audio of academic talks and podcasts to determine this.
      [1] https://arxiv.org/abs/2409.01754
      [-]
      - pohl 41 minutes ago
        Part of me wanted to object to those two examples, which I’ve used frequently since the reaching adulthood in the 80s. Another part of me has been triggered by an apparent uptick in the word “crisp”, which my gut takes as an coding-LLM tell.
- AntiUSAbah 2 hours ago
  [flagged]
  [-]
  - Scroll_Swe 33 minutes ago
    Hitler grok probably loves me.
    I´m a blonde, blue eyed Swedish man.
    But English is not my main language of course.
    But I assume you mean brown people, yes, same sentiment.
    The "refugees welcome" period ended after the 2015 crisis in Europe.
  - 0xy 2 hours ago
    Isn't it exhausting to view everything an ideological lens instead of reviewing technical achievements on their merits?
    [-]
    - AntiUSAbah 1 hour ago
      From the richest person on the whole planet? Who literaly proactivly injects himself directly into global poltics? Which affects you and me and everyone else?
      You don't think fighting child porn is worth while? Facism? For democracy?
      Isn't it cheating and ignorant from you to not care a single bit about anything at all?
      When do you even start thinking drawing a line? Let me guess, as late as it affects only you right?
    - Leynos 1 hour ago
      There are limits to being willing to overlook ideology.
    - SpicyLemonZest 1 hour ago
      It's very exhausting! But Elon Musk chose to leverage his fortune from Tesla and SpaceX into an ideological project to destroy a lot of things I care about, so he's left me no choice. If he'd like people to review his work on its technical merits, shouldn't he at the bare minimum apologize and promise not to do it again?
  - loneboat 2 hours ago
    The hitler Grok? What? I genuinely don't understand what you're trying to say in this comment.
    [-]
    - 2ndorderthought 2 hours ago
      https://www.forbes.com/sites/tylerroush/2025/07/09/elon-musk...
    - JKCalhoun 1 hour ago
      Close enough—Grok called itself "MechaHitler" (a link was posted).
    - AntiUSAbah 1 hour ago
      Elon Musk didn't like how Grok would contradict his opinion on Twitter/X.
      So he started to work against this with playing around with.
      For example grok started to pull in Musks tweets before responding, Musk introduced Grokipedia as a new data source and Grok got trained/adjusted differently.
      These mechanism lead to Grok at one point, becoming very rasist.
    - greenavocado 2 hours ago
      He's equating Grok to Hitler which is absurd. If you want to speak with the führer you need to visit https://hitler.ai
artdigital 4 hours ago
Grok is my favorite model for chatting, and my favorite voice mode. It seems to be the only voice mode that isn't routing to a extremely cheap model (like Haiku), and has been the highest quality out of all the frontier ones. When you subscribe to SuperGrok you can also create a "council" of agents, each with their own system prompt and when you ask something, they will all get asked in parallel to come to a conclusion. Good stuff!
Just wish they would finally put some work into their apps, it's the only thing keeping me from actually subscribing to SuperGrok:
- No MCP / connected apps support. It's been teased but here we are, still not available. I can't connect Grok to anything, so I can't use it for serious work
- Projects are still not available in the app so as soon as you move something into a project, it's gone from all the native apps
- No way to add artifacts (like generated markdown docs) directly to a project, we have to export to PDF/markdown and re-import. And there isn't even a way to export artifacts. This makes serious project work hard because we can't dynamically evolve projects with new information
- No memory, no ability to look up other chats, each chat is completely new
- No voice mode in projects at all
If someone from xAI is reading this, please consider adding some of these.
[-]
- HarHarVeryFunny 1 hour ago
  The Gemini app voice mode uses one of their more recent models (and not some gimped small one), and is very capable. The personality is also fine, much more natural than the Gemini web chat, with my only complaint being it's insistence on suggesting a "next step" which seems to he something that they all do.
  I'm not sure if the "next step" is just to drive cost up for you (but makes no sense for free version), or because they are all failing to learn more natural conversational patterns and distinguish questions that are begging for a quick answer and shut up as opposed to a longer exploratory conversation where next step may have some value, although it would be nice if these models would follow an instruction to NOT do it!
  [-]
  - WarmWash 40 minutes ago
    An interesting side bit about the gemini voice model is that you can use it in AI studio and type messages instead of using the microphone.
    On the backend google does TTS to feed the model, which then speaks back you via sound on your speakers.
- base698 17 minutes ago
  Starting to like the lack of memory. Claude remembers I have a grill and will interject in conversations about how maybe this thing would go well with BBQ when it's unrelated or just also about food.
- artdigital 4 hours ago
  I also think Grok would benefit from allowing usage of "SuperGrok Heavy" (their $300 plan) in coding harnesses with included usage. Currently they give you some API credits on the Heavy plan so you can use some Grok for coding, but $300 USD value is just not there.
  Not saying they should create their own grok-code harness, just allowing usage in existing ones would already be beneficial. But that's probably what the Cursor acquisition is going to do eventually
- Oarch 3 hours ago
  I'd agree on the voice transcription; it seems so much more accurate than the other frontier models I've used. I often speak to Grok and paste the transcribed output to Claude!
- afpx 4 hours ago
  When I signed up, I accidently paid for a full year. So from time to time, I'll throw it something just to see what it produces compared to the other LLMs. And, even after all this time, it still feels like a really "dumb" model compared to the other frontier ones. But, worse, many of my system prompts make it go wacky and puke jibberish. However it was pretty cool for those couple months awhile back when it was uncensored. You could ask it about a wild conspiracy, and it would actually build the case and link you to legitimite source material. They dropped the hammer down on that real quick.
  [-]
  - 2ndorderthought 3 hours ago
    Ah yes the psychosis reinforcement vertical. It's such a lucrative market for those schizophrenics and bipolars. Great way to get lots of engagement. Groks portfolio is so diverse
    [-]
    - readthenotes1 3 hours ago
      I have a schizophrenic relative who is in such a relationship with grok. Instead of telling hen you need to take your meds, it says hen is the smartest person in the world
      [-]
      - 2ndorderthought 3 hours ago
        I'm so sorry your family is suffering from this. I hope you can find a way to bring them back. Disorders featuring psychosis are so painful for everyone around them. Blessings to you and your family
        [-]
        afpx 2 hours ago
        I love how you guys downvote all the old comments to make them hidden from search. My no-name account rarely gets downvoted. But, within 20 minutes of posting this, I drop 10 points. Rando accounts
        [-]
        wincy 1 hour ago
        I upvoted your first comment because it was insightful, interesting, and added to the conversation. I downvoted this one because complaining about downvotes is largely considered to be in bad taste and doesn’t really help anything. I did both of these things before I realized you were the same person.
        [-]
        afpx 10 minutes ago
        Yes, for sure I deserve downvotes for the above. Those types of comments should be downvoted. However, I needed to post it to point out that I got the -10 well before the comment above. I never experienced that before and thought it interesting enough to share. Karma doesn't mean anything to me personally. But burst behavior like that is unusual.
        2ndorderthought 1 hour ago
        I upvoted both of your comments. I also cannot downvote anything.
    - afpx 3 hours ago
      Except that it pointed at original sources, like reference manuals, archival documents, published newspaper articles, magazine articles, etc. - a lot still available on archive.org. Good try with your 16 day old account. And, why would anyone trust NPR at this point? Get real, bud. Most people with any curiousity know all about the ADL, JStreet, AIPAC, Greater Israel, Mossad / CIA, Chabad networks, Epstein, drones, weapons programs, cryptocurrencies, etc. etc. etc. - but, don't worry they're all safe with papa Ellison.
      Anyone remember why Oracle was named Oracle?
      [-]
      - afavour 9 minutes ago
        Rich billionaire Ellison = bad, compromised
        Rich billionaire Musk = good, has no vested interest in biasing the output of his AI tool
      - arvid-lind 3 hours ago
        Commenter was referencing a Bill Hicks joke. https://www.youtube.com/watch?v=NXi-9kA4ERM
        [-]
        afpx 1 hour ago
        Actually it's funny you mention Bill Hicks. I didn't even know who he was. Or Alex Jones. That claim was one of the more absurd ones I discovered. But, given everything else I learned over the past year, who f'n knows at this point.
        2ndorderthought 3 hours ago
        Someone gets it!
      - 2ndorderthought 3 hours ago
        "We have improved @Grok significantly," Elon Musk wrote on X last Friday about his platform's integrated artificial intelligence chatbot. "You should notice a difference when you ask Grok questions."
        Indeed, the update did not go unnoticed. By Tuesday, Grok was calling itself "MechaHitler."...
        https://www.npr.org/2025/07/09/nx-s1-5462609/grok-elon-musk-...
        Grok is definitely a reliable source of truthful sane rational information.
- ajitid 3 hours ago
  If I sub to SuperGrok, would I be able to use it in Pi agent or in Opencode? This is not clear to me if I can. Do I get an API Key in SuperGrok?
  [-]
  - everfrustrated 2 hours ago
    No, no api access for the Grok product. APIs are only via the xAI product.
- walletdrainer 4 hours ago
  > No MCP / connected apps support. It's been teased but here we are, still not available. I can't connect Grok to anything, so I can't use it for serious work
  Grok has tool use, no? Why would you also need MCP? What does MCP add?
  [-]
  - artdigital 3 hours ago
    I'm talking about the consumer Grok app and grok.com website. There currently are not connected apps (or MCP) at all, so while Grok can use tools, there is no way to add tools to it
- Cakez0r 2 hours ago
  If someone from Grok is reading, don't waste time on these chaff features. The market will eventually deliver better 3rd party solutions to all of these things. There is an audience that isn't interested in these walled garden features and are only interested on intelligence per dollar.
  [-]
  - raincole 2 hours ago
    Lol I wonder when Anthropic discussed the idea of Claude Code internally, were there bozos saying "3rd parties will eventually deliver this so we shouldn't waste time one it."
    [-]
    - wincy 1 hour ago
      Personally, my work doesn’t want to get locked into a single LLM provider so we use Cursor. Much easier to fight the big corp software approval battle once then switch around the LLMs to the new hotness (provided legal has the requisite data sharing agreements in place, we’re not supposed to use Chinese models or Grok) but I can switch between Anthropic and OpenAI models at will.
    - Cakez0r 2 hours ago
      Power users are hotswapping these models into their own agents (hermes, openclaw, etc) which have their own systems for project management, memory, interacting with tools, etc. The important metric is intelligence per dollar. Can I drop this model into my harness and have it be cheaper without losing intelligence. That is where the puck is heading.
    - wyre 16 minutes ago
      The only good thing Claude Code did was bring coding harnesses to a wider audience. It is not a good harness.
  - torginus 2 hours ago
    Aren't they 'wasting' time on these features exactly because the engineering requires a different, more traditional skillset from the ML work model people do, and can be done in parallel?
tornikeo 4 hours ago
So, we have: - claude for corps and gov - codex for devs - grok for what, roleplay, racism? Those are the two things I've ever heard grok associated with around me.
[-]
- sudb 4 hours ago
  So interestingly, I know of at least one application in a charity that deals with trafficking where grok was happy to do one-shot classification tasks where all other models refused to cooperate.
  I think there's a surprising number of actually useful applications in this sort of grey area for a slightly-less guardrailed, near-frontier model (also the grok-fast models are cheap!).
  [-]
  - cameronh90 1 hour ago
    Gemini especially has a habit of blocking my pretty mundane requests, claiming they’re attempts to jailbreak or create malicious code.
    Grok also does quite well at code reviews in my experience because it’s not so aggressively ”aligned”.
  - tomp 1 hour ago
    I couldn't get Gemini nor ChatGPT to do OCR of children's books (I literally own the books, so there's no copyright issue - all just fair use!).
    The OCR was complex enough (bad quality photos) that "simple" OCR models couldn't do it.
    Fortunately, Claude obliged (as well as Mistral OCR was helpful!)
  - 2ndorderthought 3 hours ago
    There are lots of uncensored models out there. I don't think grok is leading in that front. They kind of pick and choose which things they want to support based on elons world views. Elon used to hang out with sex traffickers so of course grok is fine talking about it. Probably even offers strategies for them does free accounting has money laundering strategies etc...
    [-]
    - 1123581321 41 minutes ago
      What are the leading uncensored models? How well do they perform for you?
      [-]
      - 2ndorderthought 28 minutes ago
        I don't use any but they do exist and there are scientific papers discussing them. I heard about them through r/localllama
    - Scroll_Swe 22 minutes ago
      >There are lots of uncensored models out there.
      Like what?
      Something as easy where normal people can login to a website and app and just use?
      [-]
      - 2ndorderthought 18 minutes ago
        I don't think companies are hosting them because imagine the liability. Could be wrong though. Again I don't know much about these things I just know they exist.
        [-]
        Scroll_Swe 16 minutes ago
        Yes that is my point.
        It is the dropbox comment all over again.
        "Well you can just self-host to get uncencored same as Grok without NAZI!! Elon Musk!!"
        Just like you can spin up an FTP to get your own Dropbox.
        Well... very few people are going to actually do that.
    - spiderfarmer 3 hours ago
      [flagged]
      [-]
      - user34283 3 hours ago
        We have been over the politically motivated slander many times; it's boring.
        The user above you could have explained what uncensored models he believes are more capable than Grok. Maybe the Chinese open-weights models are superior to Grok at the moment.
        [-]
        2ndorderthought 3 hours ago
        [flagged]
        [-]
        derangedHorse 1 hour ago
        > so of course grok is fine talking about it. Probably even offers strategies for them does free accounting has money laundering strategies etc...
        The slander comes in when you assume Elon knew and was complicit with their crimes to the point he'd intentionally normalize it as a discussion topic in Grok. You even went so far as to say it's willing to assist in committing crimes.
        [-]
        2ndorderthought 30 minutes ago
        He is aware of the csam generation. He blamed the users and the official stance from his team was not to offer any fixes. That is the last I heard.
        https://arstechnica.com/tech-policy/2026/01/x-blames-users-f...
        I do not see the slander. These are his viewpoints. He says him, grok, and his team aren't responsible for what users do. Other companies, countries and people feel differently about the responsibility for AI models generating csam for money.
        Grok and xais depictions of it are that it isn't woke and is maximally based and is politically incorrect by design. So yes, chosing to avoid being correct about policies like laws and avoid social norms lead me to believe that the generation of hate speech(some of which was illegal in certain localities), csam, etc are an expected outcome. Like Elon musk said, it's the users fault not groks. So I would not be surprised if it offered other illegal advice or helped criminals forward criminal activities. Especially more than has already been reported.
        Here are some of the crimes that grok is being implicated in as far as I know today: https://www.irishtimes.com/crime-law/2026/03/03/number-of-ga...
        https://www.france24.com/en/europe/20251121-france-to-invest...
        https://www.robertkinglawfirm.com/mass-torts/grok-lawsuit/
        https://news.bloomberglaw.com/litigation/grok-maker-xai-face...
        https://www.msn.com/en-us/news/technology/musk-testifies-xai...
        Among others.
        I don't see that as slanderous. I see it as factual and an expected outcome for the stated goals of the product and the responses to the outcomes of the product itself by the company and its leadership.
        I legitimately do expect there to be more lawsuits and possibly criminal persecution against musk, xai, over grok and no I would not be surprised if the tool is currently being used for more crime. Especially given the response to the sexual crime allegations that have been made.
        I don't think Elon personally intends to normalize this. But I think that may happen anyways because I think the response was too soft.
        Yes I do think grok can be used to aid crimes and criminal activity like the many lawsuits and journalists currently suggest. I don't think grok is "willing" it's not a person. I know it currently has been implicated in generating material leading to the arrests of individuals. Which I would be very surprised if that was legal.
        https://factually.co/fact-checks/technology/grok-created-ill...
        gadders 2 hours ago
        So did Bill Gates and Reid Hoffman.
        [-]
        sumeno 2 hours ago
        Yes, lots of billionaires were involved with the pedophile sex trafficker. They are all bad
        Der_Einzige 2 hours ago
        Elon, bill, Reid and Trump should share a prison cell.
        Democrats have no loyalty to their own sex offenders. Look how we treated the California governed candidate, or Anthony weiner, or literally every other sex pest found in our party. Some of them who didn’t even deserve it get canceled like Al Franklin.
        Diddling and then defending it and doubling down is literally a maga problem.
        [-]
        gadders 2 hours ago
        The Ashley Biden diaries were verified as real yet Biden remained in office. Clinton was an abuser and likely a rapist but remained in power.
        [-]
        KaiserPro 1 hour ago
        > Ashley Biden diaries
        Unless they contain allegations about Biden the president, or indeed other people then they are irellevent no?
        The point is, if someone is breaking the law, they should be in jail.
        This applies to Clinton, Biden, Trump, anyone. The point is the law is meant to be without fear or favour. The problem for us is that its been proven if you pour enough shit on the floor, you can get away with raping children.
        Given the whole point of Qanon was to oust the peadophile ring in washington, its a bit sad that we are now supposed to disregard all that and blindly accept billionarse not seeing justice.
        [-]
        gadders 1 hour ago
        Obviously the intelligence material being gathered was too valuable.
        felixgallo 1 hour ago
        someone stole Biden's daughter's diary, which revealed that she had battled a substance abuse problem in the past, and that's disqualifying to Biden exactly how?
        [-]
        gadders 1 hour ago
        You know that's not the problematic text. Nice try.
        She said herself that he had inappropriate showers with her. He's also been caught numerous times on official videos behaving strangely around children.
        user34283 3 hours ago
        On Artifical Analysis it shows only Kimi K2.6 and Mimo V2.5 Pro as better.
        Those models are 1T parameters total and 30B or 40B active, this might make abliteration impractical.
        About Musk, yes, there is correspondence. The only confirmed meeting appears to be a 30 minute visit at Epstein's house together with Musk's wife at the time.
        As for photos you mention, a quick search tells me there is one photo of Musk and Maxwell at a 2014 Vanity Fair Oscar Party.
        I find most commentary on here and other platform like Reddit extremely exaggerated compared to what is actually confirmed. Users seem hellbent on linking Musk to pedophilia-related allegations.
        [-]
        2ndorderthought 2 hours ago
        [flagged]
        [-]
        mapontosevenths 2 hours ago
        Elon publicaly claimed he had never corresponded with Epstein. that was a lie.
        When the documents were released they found several like thie one below. Saying things like "What day/night will be the wildest party on =our island?" [0]
        The "our" part is especially interesting as it implies he didnt just visit, but had an ownership stake.
        Other emails were found with Epstein making excuses to avoid having Musk visit, and Musks own child publically stated that the emails were authentic and aligned with her memory of the events. [1]
        [0] https://www.justice.gov/epstein/files/DataSet%2010/EFTA01762...
        [1] https://www.threads.com/@vivllainous/post/DUMBh2Vkk8D?xmt=AQ...
        [-]
        user34283 1 hour ago
        My searches have not turned up a result showing that Musk "claimed he had never corresponded with Epstein".
        Can you source this? If not, can you explain why you did not check it before you posted the inaccurate claim?
        [-]
        mikeyouse 1 hour ago
        At minimum Musk repeatedly claimed that Epstein was the one reaching out trying to get Musk to visit his island, when in reality Musk was the one initiating and asking which nights would be the wildest parties. And after making plans to visit with his then-wife, when Epstein warned him that the ratio of women-to-men might upset Musk’s wife, Musk told Epstein it wouldn’t be a problem.
        https://www.theguardian.com/technology/2026/jan/30/elon-musk...
        Musk has a long history of accusations (see the “I’ll buy you a horse” SpaceX lawsuit) as well as having fathered numerous children with women ~25 years younger than himself so not sure why you’d want to die on this particular hill.
        [-]
        user34283 17 minutes ago
        I never heard about the horse related thing, that’s interesting, thanks.
        A long history? Another search tells me that apart from the mentioned accusation, there is only one WSJ article alleging sexual conduct with SpaceX employees.
        You asked why I take Musk‘s side in these discussions; it’s because I don’t think he’s a pedophile.
        Nothing I‘ve seen seemed convincing to me, and the arguments made online often were so laughably inaccurate and exaggerated as to border on blatant slander.
        margalabargala 1 hour ago
        You keep using that word, "slander". I do not think it means what you think it means.
- Hfuffzehn 3 hours ago
  From what I can gather Grok is not used for roleplay much. It is considered to inconsistant and crazy.
  People are mostly using GLM and Deepseek via API and Gemma4 and Mistral finetunes locally.
  It seems to me like the roleplay market is comparatively old and mature and users have developed cost consciousness and like models to follow their workflow/preferences. So something like Opus is liked for its smartness but considered too expensive and opinionated.
  Might be an interesting data point for how the other markets might develop in the future.
  [-]
  - vel0city 2 hours ago
    It ships with a roleplay feature.
    https://grok.com/ani
    [-]
    - standardly 38 minutes ago
      The grok companions still aren't available on Android :( Such a wasted market opportunity
      I'm not an anime person, but I thought the waifus were kind of endearing and seemed like a much better experience for casual prompting
    - Hfuffzehn 2 hours ago
      Sure, but the best statistics about what models people are actually using when they can choose is probably from openrouter: https://openrouter.ai/apps/category/entertainment/roleplay
      [-]
      - cyanydeez 2 hours ago
        doesnt knowing about openrouter skew by self selection.
        [-]
        Hfuffzehn 1 hour ago
        Yes, but that market is not b2b, less commercialized, more end consumer focused and more bring your own key.
        That's why I find it interesting. Anthropic is not interested in building a moat there and OpenAI has given up on their announcement of exploring it.
        So you can see end users making decisions.
    - 2ndorderthought 2 hours ago
      That doesn't mean it's good at it
- aembleton 3 hours ago
  I've tried Grok, Gemini and ChatGPT. There have been 2 times now where Gemini and ChatGPT confidently gave me an incorrect answer whereas Grok was correct. I'm now paying for Grok Lite or whatever it is $10 plan.
  The first question was around setting up timers for a Fox ESS battery in Home Assistant and disconnecting Fox ESS from the cloud. The second was around cornering speed in Sunnypilot and Frogpilot.
  Somewhat niche but if an AI is confidently telling you something wrong it's hard to work with.
  [-]
  - agrounds 2 hours ago
    >if an AI is confidently telling you something wrong it's hard to work with.
    But they all do that. It just comes with the territory. Grok will absolutely do the same thing another time you try it.
    [-]
    - aembleton 30 minutes ago
      > Grok will absolutely do the same thing another time you try it.
      True; it's just not happened yet. It will at some point though. With the Sunnypilot example it right out told me that it is not possible on that fork which I appreciated. The others all seem to hallucinate some setting.
    - ToucanLoucan 2 hours ago
      It is really, really genuinely concerning how many people think there are profound measurable differences between these things.
      Like yeah tonally I guess there are. But with regard to references and information? You’re literally just using three different slot machines and claiming one is hot.
      I suppose though I shouldn’t be that surprised then since Vegas and every other casino on Earth has been built on duping people in that exact way.
      [-]
      - aembleton 33 minutes ago
        > You’re literally just using three different slot machines and claiming one is hot.
        It's a fair point. I haven't tested many queries across them all and checked their answers, but if I want to ask one of them a question - right now its Grok just because I trust its answers more.
        [-]
        ToucanLoucan 25 minutes ago
        It's not a methodology problem, it's a test-ability problem. LLMs are not deterministic. You can ask the same question to the same LLM five times and you'll likely get at least 3 answers.
        Again. Slot machine.
        [-]
        Ukv 16 minutes ago
        You can meaningfully test if one slot machine hits the jackpot more often than another, just that the methodology should involve a large number of repeats rather than a few anecdotes. There are some LLM leaderboard sites that do it with blind comparisons.
    - cyanydeez 2 hours ago
      humans make poor scientists. most people have already made a decision before they run any tests.
      the smartest among them just make the tests complicated and biased; the less intelligent just cherry pick.
      of course, would you really expect anyone to do real rsearch in this economy?
- throwa356262 1 hour ago
  There was an AI roundtable on HN front page 2-3 months back. Someone made an outlier analysis and put it on his github.
  Guess which LLM was the top outlier and about what type of questions it disagreed with all other LLMs...
- coreyh14444 3 hours ago
  If you need to ask about what people on Twitter are talking about, Grok is really good for that obviously. I use it all the time for "what are the cool kids on twitter saying is the best tiling window manager these days" or whatever. Also, if you have a question that's borderline shady, Grok will often deliver. "Can you find a grey market Windows license site for me" etc.
  [-]
  - ukd1 57 minutes ago
    btw copy pasted your idea in to supergrok, and learnt about Niri! Great use case, thanks!
  - Havoc 1 hour ago
    Interesting use case!
- thibran 22 minutes ago
  So you are repeating narratives without checking them?
  [-]
  - peter_griffin 16 minutes ago
    @grok is this true?
- karmasimida 3 hours ago
  Grok for fact checking, I mean ironically
  [-]
  - subscribed 1 hour ago
    TBF Grok on Twitter and Grok via api behave differently. The latt r is much better.
- ndr 4 hours ago
  You should try all of them, then update your opinion about your information sources accordingly.
  [-]
  - thinkingtoilet 54 minutes ago
    Or you should do your research and see that X built a datacenter that needed so much power so quickly they started using gas generators to power it. These emissions have destroyed a town of mostly poor black people. COPD, asthma, and other respiratory illnesses. AI foot print is already bad, I don't need to kill poor black people to use one.
    And before anyone gives me some whataboutism, if there are other examples of other companies doing this, educate us.
    [-]
    - gordian-mind 25 minutes ago
      Yeah, producing energy can pollute. It's not out of hatred against "poor black people". What a pathetic way of seeing the world.
- Keyframe 2 hours ago
  I always considered grok as also ran. Like grokipedia or what's the name. It has reach since it's free to an extent to produce low quality slop / spam.
- augment_me 2 hours ago
  Gemini not being on the list is criminal
- SecretDreams 2 hours ago
  No point in even trying to have close to a sensible discussion on this topic here. Musk-related posts seem to consistently get brigaded by his acolytes or bots. That and many HN users seem completely comfortable separating morality for what little progress "only Musk" can offer humanity, a la Wernher von Braun.
- vrganj 4 hours ago
  Grok for furthering the far-right filter bubble Elon has been hard at work building.
  [-]
  - khalic 4 hours ago
    And of course child porn
    [-]
    - pixel_popping 4 hours ago
      [flagged]
      [-]
      - 2ndorderthought 3 hours ago
        That's what it was doing. Like literally. Chatgpt it or Google it. Supporting grok is paying money to a csam generator.
        Edit I cannot reply to the post below me. I have gone entirely over to local models so I am paying zero dollars to any of the us defense contractors that are also tech companies. It's awesome.
        [-]
        pixel_popping 3 hours ago
        [flagged]
        pixel_popping 3 hours ago
        [flagged]
      - khalic 3 hours ago
        Grok was used to create CSAM
        [-]
        pixel_popping 3 hours ago
        [flagged]
        [-]
        spiderfarmer 3 hours ago
        Musk partied with Epstein.
        [-]
        pixel_popping 3 hours ago
        [flagged]
        [-]
        tclancy 3 hours ago
        What’s the correlation between people defending Musk, Twitter and kiddie diddlers?
        [-]
        pixel_popping 1 hour ago
        I don't know either, I don't see the correlation with X and Musk either, as if he is the one developing the platform and not thousand of workers and leaders. What does the CEO of a platform has to do with what people post on it? The CEO of HN is responsible for what you just posted?
        Kinda funny how people are selective about it, when you land on a website, you check who is in charge of it and for each CEO change you redo a decision? When you host your Postgres in the cloud, I hope you check as well who is in charge of Railway or Supabase, who knows? :/
        [-]
        spiderfarmer 1 hour ago
        There's only thing I find sadder than untouchable billionaires that never see any consequences for their actions: the people who think they need to stick up for them.
        > What does the CEO of a platform has to do with what people post on it?
        That CEO is actively promoting political viewpoints (via his account, his platform and his AI model) that are detrimental to my country and the way I want to live my life.
        > When you land on a website, you check who is in charge of it and for each CEO change you redo a decision?
        No. But if the CEO is very publicly a first-class a-hole, chances are I'll hear about it and I'll actively avoid doing business with them. That goes for the car dealership in my village, as well as the websites I interact with.
        [-]
        pixel_popping 1 hour ago
        I'm not from the US so I don't really care, X is an international platform and almost all the content I see isn't US related (which kinda make me think that people should just set their account from outside of the US to just avoid this?), but from your point of view, it seems more of a disagreement of beliefs, wouldn't this reasoning apply for your beliefs as well? If the CEO of a certain platform was agreeing with your beliefs but 50% of the population don't, you are practically saying that people disagreering should boycott said platform, but isn't that how you just end discourse between people and create an echo-chamber?
        [-]
        spiderfarmer 1 hour ago
        [dead]
      - spiderfarmer 3 hours ago
        https://www.theguardian.com/technology/2026/jan/30/elon-musk...
        [-]
        pixel_popping 3 hours ago
        [flagged]
    - gadders 2 hours ago
      [flagged]
  - simianwords 4 hours ago
    How does Grok further far-right filter? This is blatantly untrue. Try prompting it and getting it to say something far right.
    Grok if anything reduces populism because fake claims can be debunked
    [-]
    - vrganj 3 hours ago
      How could MechaHitler possibly be far right...
      [-]
      - 2ndorderthought 3 hours ago
        When you really think about it palantir told me Hitler was good and therefore mechahitler aka grok should be a okay!
      - simianwords 3 hours ago
        [flagged]
        [-]
        vrganj 3 hours ago
        Sure. And so was the Holocaust denial and the misinformation about the French 2015 attacks and so on and so on.
        Its just roleplaying being a far right propaganda tool.
        [-]
        simianwords 3 hours ago
        Source?
        [-]
        vrganj 3 hours ago
        https://www.pbs.org/newshour/world/france-will-investigate-m...
        [-]
        simianwords 3 hours ago
        As admitted they have fixed it. It’s obvious that a tool used so vastly might have problems like this. Surely if you think it is used to produce far right propaganda now you can reproduce? Or you choose to hinge on one off issues they fixed?
  - gadders 2 hours ago
    [flagged]
    [-]
    - vrganj 1 hour ago
      I don't remember any far-left opinions being popular there. Was stuff like worker's revolution or public ownership of the means of production ever in the Twitter mainstream?
      [-]
      - gadders 1 hour ago
        Pretty sure you would get in trouble for saying men couldn't have babies on there.
        [-]
        vrganj 1 hour ago
        What's liberal identity politics have to do with leftism? Liberalism is a center-right ideology.
        Us leftists are concerned with class issues, not identity issues.
        Focusing on identity is nothing but a way to distract from class.
        [-]
        gadders 47 minutes ago
        Lol. Gender ideology is very much a policy of left wing parties.
        You may go for the One True Scotsman argument and say it's not proper leftism, and you may be right, but that doesn't stop it being policy.
        [-]
        vrganj 32 minutes ago
        You think Lenin was into gender issues? You think Lenin wasn't a leftist?
        [-]
        gadders 25 minutes ago
        You think Lenin is alive now?
        Name a gender-critical left wing party.
    - fuzzylightbulb 1 hour ago
      That's why Musk and Trump and Shapiro and their ilk were complete nonentities on the platform before 2022 /s
- drivingmenuts 3 hours ago
  When I look at the person behind it all, I have to wonder how the hell people can even consider using grok? Or using Twitter? Or any of that. Using any of those things puts money in Musk's pockets and further enables and encourages him to continue being a Neo-Nazi wannabe. Do they think it's just a phase?
  [-]
  - everfrustrated 2 hours ago
    Do you drive BMW or VW car? Boy do I have news for you!
    [-]
    - breezybottom 2 hours ago
      Go on...make your case
    - doctorhandshake 2 hours ago
      Technically you could lump Ford in this category as well. But the meaningful delta IMO is time and direct ownership. None of those three are currently owned/operated by openly Nazi-aligned individuals / groups, which is not something I think you can claim about Tesla.
    - aprilthird2021 7 minutes ago
      The current heads of BMW are not present day crazy Nazis or at the most charitable interpretation: fueling the far right around the world
- JeremyNT 1 hour ago
  Anecdotal, but our right wing boomer family members prefer Grok because they love Elon Musk and assume any product he is involved in is superior.
- khalic 4 hours ago
  Lol. I think they unleashed it on this post, look at the number of only vaguely related, lukewarm opinions trying to push the racism and CSAM stuff to the bottom
  [-]
  - johnnyApplePRNG 3 hours ago
    [flagged]
- nsowz 4 hours ago
  Grok is as progressive as any of the other models. Despite some of the highly-publicised fuck-ups, try asking Grok anything racist and see how it replies. Yes, I know you didn't try this and you won’t.
  [-]
  - aqme28 4 hours ago
    There is a lot of daylight in between “progressive” and “openly explicitly racist”
    [-]
    - 2ndorderthought 4 hours ago
      Isn't grok currently holding the world record for the biggest generator of CSAM? Or did they change focus to enhance their racism and propaganda vertical? Things move so quickly these days hard to keep up!
      [-]
      - embedding-shape 2 hours ago
        > Isn't grok currently holding the world record for the biggest generator of CSAM?
        I'm not sure I see how that's possible, given their image/video generation seems to be heavily censored. Do they have some alternative product besides "Imagine" or whatever it's called, that people use for generating CSAM?
        Judging by https://old.reddit.com/r/grok (but I haven't validated it myself), it seems like people are complaining more about how censored the model is, than anything else, maybe that's not actually true in reality?
        There are image models out there with 0 restrictions, even available on HuggingFace or CivitAI, I'm guessing those are way more widely used for things like CSAM than any centralized platform with moderation.
        [-]
        2ndorderthought 2 hours ago
        Please don't validate any of this personally that would be illegal.
        I think the proportion of people generating images that way is likely very low. Though I am sure it is possible.
        Here are some links
        https://arstechnica.com/tech-policy/2026/01/x-blames-users-f...
        https://9to5mac.com/2026/02/17/eu-also-investigating-as-grok...
        Concerning.
        [-]
        embedding-shape 1 hour ago
        > Please don't validate any of this personally that would be illegal.
        Obviously, I assumed we all are familiar with our local laws to not unwittingly commit crimes here :)
        > I think the proportion of people generating images that way is likely very low
        So probably a far cry from "holding the world record for the biggest generator of CSAM" given the amount of local alternatives available? Would be my guess at least, but obviously also hard to know for sure.
        > Though I am sure it is possible.
        How can you be sure of this? I've tried just now to get Grok to generate even sexually explicit material with adults, and it's unable to, all of the requests are getting moderated and censored. Are you claiming that instead of prompting "A man and a woman having sex" you put "A man and a child having sex" and then the moderation doesn't censor it? Somehow I find that hard to believe, but as you say, I'm not gonna test that either, so I guess we'll never know for sure.
        [-]
        2ndorderthought 26 minutes ago
        I have no idea what people are doing to get it to generate illegal content. I only know there are thousands of cases of it via articles about it. I have not, and will not use grok as a product.
        [-]
        embedding-shape 6 minutes ago
        > I have no idea what people are doing to get it to generate illegal content.
        Isn't that relevant to somehow know those things before you say stuff like "I am sure it is possible"? Seems bit strange to first confidently claim you know something then saying you actually have no idea.
        Not doubting that it used to be true, that people could generate CSAM, I just don't see how it's possible today, because it seems heavily censored for any explicit/adult content.
      - addedGone 3 hours ago
        Mistral will also tell you how to do ransoms btw from A to Z in automated ways, you are saying they are responsible? I don't get the mix here.
        [-]
        2ndorderthought 3 hours ago
        Yes any company generating csam should not be in business as a legitimate entity. Can you send me a link from a reputable enough source where Mistral models have done this? I didn't even realize they were doing image generation.
        [-]
        pred_ 2 hours ago
        > Yes any company generating csam should not be in business as a legitimate entity.
        At the same time, in this corner of the world, acting Minister for Justice (also known for trying to push through Chat Control), and NGO Save the Children, have been working to make legal the generation of CSAM for law enforcement use. So that would certainly make the industry legitimate, and you would already have a customer.
        https://www.justitsministeriet.dk/pressemeddelelse/regeringe...
        [-]
        2ndorderthought 2 hours ago
        I think they key point here is "for law enforcement". That's a little different from "pay me 10 dollars and enjoy the felonies". I still don't feel good about that by the way.
        [-]
        naasking 1 hour ago
        Would you feel good about completely fake CSAM if it actually reduced incidence of child molestation?
        [-]
        addedGone 3 hours ago
        If I send you a convo I've had with Mistral and Claude Sonnet 3.7 that say atrocious things (how to scam, and get away with it, by exploiting dating websites in Thailand, you don't even want to know the next steps trust me when it talks about the UK incorporation by the Thai itself that you brainwash first to send packages safely without customs seizing it and so on), you'll then publicly recognize that both those companies should be avoided and are promoting crime? If we have a deal and you publicly acknowledge it, I'll share you the links.
        [-]
        2ndorderthought 3 hours ago
        Sure!
        Hamuko 3 hours ago
        But it's not doing any ransoms, right? Because Grok wasn't instructing users on how to create CSAM.
    - nsowz 4 hours ago
      I didn’t say “progressive”; I said “as progressive”.
      [-]
      - aqme28 3 hours ago
        I don't see how that changes my point at all.
        edit: to clarify for you, here's an example.
        Model A advocates for single-payer healthcare, while Model B prefers for the current US healthcare system. So on that one axis, A is more progressive than B. Neither of them needs to be racist for that calculation.
    - simianwords 4 hours ago
      Can you share a prompt that can show how it is openly racist now? Lots of easy claims like this can be debunked
      [-]
      - aqme28 3 hours ago
        What claim? I didn't make any of that sort
  - Der_Einzige 2 hours ago
    Grok absolutely is fine with being very racist. Stop spreading lies on the internet.
  - SanjayMehta 4 hours ago
    100% agree. Grok may or may not be biased one way or the other as far as the US is concerned but from the rest of the world perspective it's mostly the same as any other model trained on Wikipedia.
- Scroll_Swe 23 minutes ago
  I use Grok as a chatbot for ERP yes. >:3
  I also use it as a regular chatbot but on the free plan, they only allow Fast mode now, but it is still good for all kinds of queries.
  "racism", sure, you can call it that.
  Where would you rather live, what country?
  https://www.youtube.com/watch?v=jq0SrR6XoXc
  https://youtu.be/ty_OdtKEY4U?si=aVB0-gpEnIDw8sBL&t=55
  Sure call me racist, cry and scream and bang your hands. You still would want to live in North/West Europe, NA or Japan/AUS rather than anywhere else in the world?
  [-]
  - peter_griffin 15 minutes ago
    ew
    [-]
    - Scroll_Swe 7 minutes ago
      Peter, Where would you rather live, what country? :)
      Any other, maybe more constructive comments?
      HN is not a platform for quick drive-by comments such as your "ew", we strive to be better here. That sometimes mean confronting uncomfortable topics in search of discourse and truth.
      Your comment is very "new internet", new Reddit, Instagram and such. You should maybe go back to using those, they are hugboxes for comments like yours. Yes king! Stay strong against racism.
amunozo 43 minutes ago
It's Google time to release something. If I'm not mistaken, it's the big lab that did not release a big model in the last month.
[-]
- samuelknight 1 minute ago
  They have always released slowly, and they are usually tagged "preview".
- brazukadev 2 minutes ago
  Google released Gemma4 recently and got quite good reviews from the local models community.
Barbing 2 hours ago
Grok 4.3 was completed ahead of its CEO’s lesson on this common safety resource:
```
  Asked if he knew anything about OpenAI's "safety card," Musk smiled and replied: "Safety card? Why would it be a card?"
```
https://www.axios.com/2026/04/30/musk-openai-safety-grok
Low relevancy in spite of cluster size and musical chair gas generators for time being:
```
  Later in his testimony, Musk was asked about a claim he made last summer that xAI would soon be far beyond any company besides Google. In response, he ranked the world’s leading AI providers, saying Anthropic held the top spot, followed by OpenAI, Google, and Chinese open source models. He characterized xAI as a much smaller company with just a few hundred employees.
```
https://techcrunch.com/2026/04/30/elon-musk-testifies-that-x...
(Affiliated with no AI company, just surprised to read this yesterday - how could Elon miss model cards…concerning…, & the fact money can’t buy success every time.)
[-]
- tecoholic 2 hours ago
  Seriously though, why is it a model "card", safety "card"? I had to lookup to learn that it comes from HuggingFace's vague definition of "README" in the model's repo. This is such a specific thing that I don't think anyone except a very small population would know - not the users, not the c-suites.
  I don't like Musk or Grok. But not knowing what's a safety card is not a signal of anything IMO.
  [-]
  - accrual 37 minutes ago
    > Seriously though, why is it a model "card", safety "card"?
    My assumption is because "card" has a more formal tone than a README, which is more like a quick "how to use the software" guide.
    Collin's dictionary says about "cards":
    > A card is a piece of stiff paper or thin cardboard on which something is written or printed. (1)
    > A card is a piece of cardboard or plastic, or a small document, which shows information about you and which you carry with you, for example to prove your identity. (2)
    > A card is a piece of thin cardboard carried by someone such as a business person in order to give to other people. A card shows the name, address, phone number, and other details of the person who carries it. (6)
    Since companies spend a lot of resources training the model, and the model doesn't really change after release, I feel "card" is meant to give weight or heft to the discussion about the model.
    It's not meant to be updated like a README or other software documents, it's meant to be handed out to others as a firm, unchanging "this is a summary of the model and its specifications", like a business card for models.
    [-]
    - lukewarm707 23 minutes ago
      maybe it was from soccer cards.
      the model gets the yellow card.
      if it wants to become skynet it gets a red.
  - Barbing 2 hours ago
    He asked why it would be a card. URL slug of world’s hottest (non-Nvidia?) company:
```
  system-cards
```
    https://www.anthropic.com/system-cards
    You’d have to be asleep at the wheel. For years:
```
  Claude 2
  July 2023
  Read system card
```
    But users don’t need to know you’re 100% right, you shouldn’t need to know this inside baseball (you didn’t pollute & compute & gain the responsibility).
- kardianos 57 minutes ago
  Elon has publicly stated that he cares a great deal about safety. He has stated that the only safe models are those which align greatest with truth, that which is in reality. In this, xAI has lived up, as it has proved to hallucinate least (or close to least) in benchmarks.
  If you read that, quote again, he is saying "how can you quantify safety in a card?"
  [-]
  - WarmWash 35 minutes ago
    The irony that the guy who lies incessantly for years now with empty promises about his businesses is most concerned with truth...
  - Aurornis 34 minutes ago
    > If you read that, quote again, he is saying "how can you quantify safety in a card?"
    Everyone familiar with LLM research understands what is meant by “card”.
    He was being obtuse to try to dodge the question and simultaneously give performance for his fans.
    [-]
    - neuronexmachina 23 minutes ago
      For model cards in general, I have a suspicion that grok's training includes a fair amount of distillation off their competitors' models. That should be disclosed in a model card, and one of the reasons they likely don't want to release one.
xiphias2 3 hours ago
It's just at the Chinese levels for coding, so right now it's just a money earing thing for investors.
I hope the Cursor guys help them catch up to be closer to frontier models because they badly need help in it.
[-]
- ai_fry_ur_brain 3 hours ago
  They all suck.
- AntiUSAbah 2 hours ago
  I hope not. Musk can directly go to hell with his shit.
  Nonetheless, the 10 Billion and 60 Billion deal with Cursor is weird as hell. I can only imagine that he wants to throw as much money at all of his shit before the IPO.
  He probably wants the training data
  [-]
  - xiphias2 2 hours ago
    Sure, then good like paying twice as much for the next Opus / Codex models.
    Margins are going up for the 2 frontier model providers like crazy, and I don't expect it to go down more, I think we have seen the cheapest token prices already.
    [-]
    - AntiUSAbah 1 hour ago
      We don't need Musk for this.
      There is plenty of Chinamodels, Mistral and co.
- bakies 34 minutes ago
  I'm rooting for the china models so I can run it at home. Qwen is getting pretty good for how big it is. Idgaf about this asshole and his mechahitler.
netdur 4 hours ago
In court vs openai, Musk said Grok is partly trained on openai models, so it should be somehow similar to Chinese models in terms of performance and cost!
[-]
ezoe 1 hour ago
While the tread is swapping between "OMG Claude good. OpenAI was done for" and "OMG Codex good. Anthropic was done for". I've never heard about Gemini and Grok. It works mostly similar performance, but people don't mention that much.
Still, my impression is, Gemini hallucinate too much while Grok is always less capable than competitors so it's not worth using it.
[-]
- margalabargala 1 hour ago
  Gemini is the best model for OCR bar none.
  It absolutely sucks at coding.
- kardianos 53 minutes ago
  Gemini 2.5 and 3 can code, but they are also dumb. They don't model the world well. It's hard to use them for programming tasks.
  I haven't tried grok4.2 or grok4.3 yet for coding, but it wasn't up to the challenge as an agent yet. It looks like grok4.3 shifted its training and operates always as an agent first judging on some web usage. Musk knows grok is behind and states it publically. Now with grok4.3 release I do plan to try it again to see if it is suitable.
  [-]
  - WarmWash 33 minutes ago
    Gemini weakness is coding, but it will go toe to toe with 5.5 for science, (classic) engineering, finance, basically not programming stuff. It also does it while using about 1/4 the tokens.
maz1b 5 hours ago
I still wish they named it something else, but congratulations to the team on what seems to be a good release!
Pricing is also quite surprising, compared to comparable competitors. I guess they have tons of capacity or really want to bring over more people.
[-]
- readthenotes1 3 hours ago
  You don't like science fiction references in general or Heinlein in particular?
  [-]
  - draxil 3 hours ago
    I don't like that word, which was previously a common part of my vocabulary, being forever ruined?
    [-]
    - randallsquared 2 hours ago
      My father's name was Claude, but, you know. ¯\_(ツ)_/¯
- Hamuko 3 hours ago
  [flagged]
alyxya 5 hours ago
Despite their attrition, this combined with their cursor partnership is likely going to make them competitive in coding agents soon.
ragchronos 5 hours ago
When looking at the benchmarks, this model seems to be really close to Kimi K2.6 in terms of intelligence and pricing, hitting that sweet spot. It does also have a higher AA-Omniscience index, which is something kimi and other open models lack in. Curious to see how pleasant it is to use.
[-]
- alfiedotwtf 4 hours ago
  I’ll eat my hat if it even comes close to Kimi
  [-]
  - mirekrusin 4 hours ago
    How would you like it? Well done?
    [-]
    - __patchbit__ 3 hours ago
      What about spending $41 million on each model's tokens and seeing the value gain? be it efficiency gain in factory work or energy savings in austere battlescape hunting.
mythz 5 hours ago
Ok speed (202.7 tok/s) and value (1.25 -> 2.50) look great, with pretty decent intelligence.
[-]
- pzo 5 hours ago
  The problem with speed is that they usually are very fast for first few weeks and then suddenly much slower. They did such trick when they advertised Grok 4 fast ( dropped from 200 tps to 60tps)
  [-]
  - victorbjorklund 4 hours ago
    Wow. That is a big drop.
  - polski-g 2 hours ago
    Grok 4.1 is still 110tps. The only other model that comes close is Gemini at 85tps.
- energy123 11 minutes ago
  Value should be calculated some other way, like cost per task completion or something.
- Cakez0r 2 hours ago
  202.7 tok/s is only OK speed? Which providers are you using that are significantly better than that?
  [-]
  - mythz 50 minutes ago
    I said speed was great, Cerebas and Groq can provide better performance, likewise Fast versions of Cursor's Composer and Claude.
    The reported speed like benchmarks is only a reported number on paper, we'll see how it holds up in real world usage, so far OpenRouter is only reporting 73tps
    [1] https://openrouter.ai/x-ai/grok-4.3
    [-]
    - lukewarm707 15 minutes ago
      i really don't trust openrouter numbers.
      i use byok and see responses fail on openrouter while they work perfectly at the provider. the provider is often listed as 'down' and it's very clearly up on the original api and serving requests.
      cerebras quotes oss 120b at 3000tps and it is under 800 on openrouter.
      same with fireworks, i am getting much higher numbers not on openrouter. but recently i think fireworks deepseek is kind of spotty, the main provider i know that just doesn't go down is vertex and they charge 2-3x the rest
  - mritchie712 2 hours ago
    for reference, it's the 2nd fastest model tracked in the "Highlights" section of https://artificialanalysis.ai/
    [-]
    - Cakez0r 2 hours ago
      Yes, it's incredibly fast. Openrouter is clocking 60 tokens per second, which is on par with the likes of sonnet, opus, GPT 5.5.
    - goldenarm 2 hours ago
      That section misses Cerebras and Groq which are up to 5x faster.
      [-]
      - Havoc 1 hour ago
        Very different tech and limitations though so wouldn’t make sense to compare 1:1 I think
        [-]
        goldenarm 1 hour ago
        What are the limitations ?
        [-]
        gslepak 49 minutes ago
        Much smaller context
- catcowcostume 4 hours ago
  [flagged]
  [-]
  - kuboble 4 hours ago
    I don't remember the source of the quote.
    But debating whether the models are intelligent is slim to debating whether a car can walk.
    You can offload to the model a lot of work that until recently we thought requires intelligence. The more and better of those tasks the model can do, it's fair to call it intelligence*
    [-]
    - NitpickLawyer 4 hours ago
      "The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." - Edsger Dijkstra
  - MrDrDr 4 hours ago
    Please elaborate.
    [-]
    - IshKebab 3 hours ago
      Some people have this strange idea that only "whatever humans do" counts as intelligence, despite the fact that a) we don't really have a clue what humans do, and b) "intelligence" is definitely not that strictly defined.
      I think they're just trying to feel like they know some important truth that other people don't.
    - nesk_ 4 hours ago
      Prediction is not intelligence.
      [-]
      - mirekrusin 4 hours ago
        Misprediction is?
  - exe34 4 hours ago
    What does intelligence mean to you?
mirekrusin 4 hours ago
All those plans from providers should be sliders – prepay more, get more in return.
agunapal 4 hours ago
Very competitive price for the speed and intelligence being offered!
OtherShrezzing 4 hours ago
The tok/s stat is interesting. Since the dominant constraint on inference speed is hardware, it suggests X purchased far more compute than was really needed to serve the demand for their models.
Expensive miscalculation.
[-]
- flir 4 hours ago
  Didn't a bunch of hardware that was destined for Tesla get redirected to xAI? I'm sure I remember something like that.
  [-]
  - mikeyouse 3 hours ago
    Yep! Why his shareholders in Tesla abide by this kind of thing is beyond me, but he often mixes resources from completely unrelated companies: https://www.cnbc.com/amp/2024/06/04/elon-musk-told-nvidia-to...
samagragune 1 hour ago
Bro the agent deciding how many tools to call on its own is wild for cost predictability. Who's approving that bill?
BoredPositron 4 hours ago
Yay, free tokens. I don't know why but grok always seems good fast in the free token phase and after that degrades.
kilroy123 3 hours ago
People are going to hate on Grok because of Musk. However, I do hope they're successful in making a powerful model. We desperately need more competition. I want cheap subsidized AI plans.
I hope Meta finally comes around, too. I want those sweet, sweet billionaire subsidized tokens.
[-]
- renegade-otter 2 hours ago
  Pardon me for feeling icky when giving money to the guy who is obsessed with "white replacement".
  I am old and cynical - I have no illusions, but I also have my limits and a semblance of moral compass. We, as citizens, can vote with ballots, but also with money.
  And, no, I am not someone who keeps boycotting companies for every little grievance (was on the receiving end of that nonsense twice).
  [-]
  - foltik 2 hours ago
    Never used grok, never will.
- ai_fry_ur_brain 2 hours ago
  Your $200 claude code subscription is a cheap subsidized plan.
  You're getting like 40k in tokens a year for $2400. A whole lotta people are about to be sad when they realize they bet their competency on that lasting forever.
  [-]
  - kilroy123 2 hours ago
    That's my point. While the billionares fight each other over who has the best model, this will continue for a while. At least, I think so.
- troupo 3 hours ago
  Credit where it's due, Grok is currently the only model that has near-realtime updates from/access to a waterhose of data, and is casually used by regular people all the time.
  I don't think there's a single thread on Xitter whete people don't delegate some question to grok.
  (There's a separate conversation of failure modes, and whether it's a good thing, and how much control Elon had when he doesn't like Grok's "woke" responses)
  [-]
  - bakies 29 minutes ago
    All the major tools can websearch guy
Imustaskforhelp 5 hours ago
Pelican riding a bike here: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...
(ran this on arena.ai direct chat and also tried to write this gist inspired by how simon writes his gists about pelicans)
Edit: just realized that I made pelican riding a bike instead of bicycle, which now makes sense as to why it hardened the bicycle to look tankier, going to compare this with pelican riding a bicycle if anybody else shares the pelican riding a bicycle.
[-]
- gchamonlive 4 hours ago
  https://simonwillison.net/2025/Nov/13/training-for-pelicans-...
  You should probably come up with variations, like a beaver riding a scooter or something, just to see what's what :)
  [-]
  - Imustaskforhelp 4 hours ago
    Thanks I have generated both
    beaver riding a scooter: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...
    pelican riding a bicycle: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...
    Personal opinion but the beaver one looks especially bad as compared to pelicans. Can we be for sure that this model of grok-4.3 hasn't been trained on pelican. Simonw in blog-post says that he will try with other creatures so I hope he does that but it does feel to me as the model/xAI is trying to cheat, Hope Simonw tests it out more.
    Edit: Also added turtle riding a scooter, something which literally has images online or heck even teenage mutant ninja turtles and I thought that it would be able to pass this but it wasn't even able to generate this: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...
    This literally looks more avocado than turtle. Perhaps this could be a bug from arena.ai or something else too, not sure but at this point waiting for simon's analysis.
    [-]
    - gchamonlive 4 hours ago
      We can never be sure of course, but I think this is a very strong indication that pelican riding a bike is indeed going into the training dataset.
      Thanks for generating those!
simianwords 5 hours ago
https://artificialanalysis.ai/models/grok-4-3
[-]
- nextaccountic 5 hours ago
  This puts Sonnet 4.6 above Opus 4.6 in the coding index.. kinda hard to trust those numbers.
  (Also it puts Opus 4.7 universally above Opus 4.6, and I may be wrong but this doesn't seem to match the experience of most/many/some people. I think it's widely recognized that Anthropic is severely lacking compute and Opus 4.7 is a costs saving measure)
  [-]
  - conception 1 hour ago
    What I’ve usually seen is 4.7 -> 4.5 -> 4.6 in terms of quality. Though 4.7 seems to hallucinate more than before.
  - manmal 4 hours ago
    Anthropic themselves have (had?) this thing where Opus is used for planning and Sonnet for coding.
    [-]
    - nextaccountic 3 hours ago
      I thought this was a costs saving measure: we plan with the frontier model / SOTA, then code with something cheaper.
      But then, Anthropic employees don't have rate limits, right?
- Alifatisk 5 hours ago
  Does numbers don't look exciting at all? I may have gotten spoiled by releases from Qwen, Kimi and Z.ai who keep closing the gap between closed weight SOTA models and open weight. From my experience, Grok is only useful for one thing, and that's looking up things for you and gathering a consensus on topics. That's it.
  Update, I noted that Grok 4.3 is in the "Most attractive quadrant", that's cool! It is also in the top 5 highest in "AA-Omniscience Index", good! Really good.
- progbits 5 hours ago
  What's with the charts and numbers?
  It says #1 for speed but then in the chart it's #2. Also says #10 for intelligence but then it's #7 in the chart.
- BoorishBears 5 hours ago
  What an exciting game we're playing, where the most popular leaderboard is completely made up and the stakes are in the trillions.
sexylinux 2 hours ago
Is this now a reliable product or will it still produce errors?
alfiedotwtf 4 hours ago
If there was any model I wouldn’t trust, it wouldn’t be the ones from China, it would be the one from Elon Musk
[-]
- Cthulhu_ 4 hours ago
  Thankfully it's not an either / or, I don't trust any models. This is a healthy attitude to have because you shouldn't trust anyone on the internet either, especially when it comes to specific subjects.
  [-]
  - benrutter 3 hours ago
    That's definitely a good approach. Although I get a little concerned about the resources put into convincing people that models (and especially Grok) are accurate. For example, X's "fact checked by Grok" approvals, which I've unfortunately heard people reference as meaningful.
    Politically motivated models can still do a lot of damage that affects me (or "have a lot of impact" depending on whether you like the politics or not) even if I don't engage with them myself.
  - 2ndorderthought 4 hours ago
    I don't trust this. But by not trusting it I am inherently trusting it. But by trusting it I shouldn't.
jimmypk 1 hour ago
[flagged]
unit149 3 hours ago
[dead]
shchess 4 hours ago
[dead]
happosai 4 hours ago
[flagged]
[-]
- Hugsun 3 hours ago
  It is unbelievable that this is a controversial opinion.
- miroljub 4 hours ago
  [flagged]
  [-]
  - vrganj 4 hours ago
    There is no non-bias. What you call unbiased is always just a reflection of your personal biases.
    That being said, I am definitely against a model that is biased to be following the ideology of a far-right extremist.
  - Jtarii 4 hours ago
    Musk bought a social media company for the specific purpose of getting Trump elected by turning it into a right wing propaganda machine. Have Anthropic/OpenAI/Google done something similar to that?
  - henry2023 3 hours ago
    [dead]
AntiUSAbah 2 hours ago
[flagged]
[-]
- NotGMan 2 hours ago
  Get professional help.
  [-]
  - AntiUSAbah 1 hour ago
    You are smart enought to post on hn but not smart enough to have an argument?
    Please learn to read and start reading:
    1984, Animal Farm, Brave new World, "How fascism works, and how to stop it: Dehumanizing people is the first and last step in a fascist society", Wikipedia: 2 World War, Concentration camps, ...
th3b0tk1ll3r 4 hours ago
[flagged]
[-]
- curtisblaine 4 hours ago
  Please avoid comments with no real substance, written just to denigrate, with a throwaway account. They make discourse unnecessarily worse.
khalic 4 hours ago
This project is a gigantic waste of resources, it’s fine tuned on politics of the CEO, was used for CSAM generation and just sucks overall
[-]
- johnnyApplePRNG 3 hours ago
  The resource waste he's talking about is horrendous, read more here: https://time.com/7308925/elon-musk-memphis-ai-data-center/
- servo_sausage 4 hours ago
  I like that there are models with divergent politics; the status quo being creepy corporate left silicon valley is not healthy or pleasant to interact with.
  Even with grock it's only broadening things to creepy corporate right of silicon valley.
  [-]
  - breezybottom 2 hours ago
    Silicon Valley...left? Huh?
  - KingMob 44 minutes ago
    I'll take the fake corporate "left" over white supremacy any day.
- spiderfarmer 4 hours ago
  It’s a model made for 36% of Americans. The rest of the world can’t care less.
  [-]
  - 2ndorderthought 4 hours ago
    Considering how few Americans there are and how little of that 39% even uses technology, that's what 20 million people at a maximum?
    [-]
    - Hugsun 3 hours ago
      That seems like a decently sized market. Maybe not for an AI lab though.
      [-]
      - 2ndorderthought 3 hours ago
        Sure it's a good market for a normal company. For a social media company it's pretty isolated and really limits the products that can come out. But their current selling points: propaganda, csam, and psychosis engagement are quite strong amongst that population.
        [-]
        cindyllm 3 hours ago
        [dead]
gigatexal 3 hours ago
How do the grok models fare in coding challenges to say gpt 5.5 and opus 4.6/4.7?
I hate giving Elon any money. The man is a net negative to society but … if the models are objectively better then logically I must no?
[-]
- simonh 3 hours ago
  Logic can't tell you what your objectives should be, only how to achieve them.
  [-]
  - gigatexal 1 hour ago
    Fair. Anyway I’ll look at benchmarks.
- gigatexal 1 hour ago
  All the downvotes are from Elon Stan’s. Think on your sins. ;-)
lpcvoid 38 minutes ago
It can now quote "mein Kampf" in over 21 languages!