Kitten TTS: 25MB CPU-Only, Open-Source Voice Model

(algogist.com)

190 points | by jainilprajapati 16 hours ago

4 comments

  • dang 12 hours ago
    We've moved the (relevant) comments to https://news.ycombinator.com/item?id=44807868, which was posted by the project creators.

    I've re-upped that thread to the same position the previous discussion (this one) was at.

  • divamgupta 13 hours ago
    Thanks for posting about our project on HN! I am one of the creators of KittenTTS.

    Here is the link to our repo: https://github.com/KittenML/KittenTTS

  • binary132 6 hours ago
    I’m new to TTS models, but is this something I can plug into my own engine like with LLMs, or does it require the Python stack it ships with?
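
    For context, the quickstart in the repo linked above looks roughly like the snippet below. I'm going from the README, so the package, class, and voice names here are my guesses and may not be exact:

        # Rough sketch of the KittenTTS quickstart as described in the repo README.
        # Package name, class name, voice IDs, and sample rate may differ from the
        # current release; treat this as illustrative rather than authoritative.
        from kittentts import KittenTTS
        import soundfile as sf

        model = KittenTTS("KittenML/kitten-tts-nano-0.1")  # fetches the small ONNX weights
        audio = model.generate(
            "This high quality TTS model works without a GPU",
            voice="expr-voice-2-f",  # one of the bundled voices
        )
        sf.write("output.wav", audio, 24000)  # 24 kHz mono output

    Since the weights seem to ship as an ONNX file, I'd guess the model can also be loaded directly with onnxruntime from another language or runtime, but I haven't tried that.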
  • colonCapitalDee 15 hours ago
    [flagged]
    • esseph 15 hours ago
      Everybody always thinks everything is AI. AI learned from consuming writing.

      This is an ouroboros that will continue.

      (Not saying this is or isn't, simply that these claims are rampant on a huge number of posts and seem to be growing.)

      • treyd 15 hours ago
        This is strictly true but misleading. LLMs were trained on human-written text, but they were post-trained to generate text in a particular style, and that style does have some common patterns.
        • esseph 14 hours ago
          So are you saying all LLMs were post-trained in that style then?

          Because, well, there's a huge number of models. Are they all, as they say, "in cahoots"? (working together, clandestinely)

          • rgoulter 14 hours ago
            Examples of LLM-style text: short, punchy sentences; negative parallelism ("not just X, it's Y"); bullet points, especially with emojis and bolded text; overuse of the em dash.

            This is a good list: https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

            It's one thing to observe "LLM-generated writing all looks the same". Whether the LLMs were all post-trained the same way is a different question.

            I don't agree that "everyone says everything is AI". Do you have examples where a consensus of people accuse something of being AI-generated even though it lacks those indicators?

            • K0balt 13 hours ago
              I consistently get accused of AI writing, but I’m not really sure why. I use spellcheck, that’s about it. Although I am a fan of an appropriately used em-dash, I don’t really follow the other patterns. I also find that people say that as a form of character assassination though, literally denying your humanity lol.
            • EnnEmmEss 11 hours ago
              Many of those rules are kind of hazy, though. Curly quotes, em-dashes, etc. are also signs of using MS Word for writing, for example.
            • Teknomadix 14 hours ago
              It's our fault—we've all been overusing emojis and the em—dash for years.
              • EGreg 13 hours ago
                I know exactly what you mean ^_^ honestly — and I’m saying this with a certain satisfaction — it’s been difficult to stop smiling :)

                It’s not slop — it’s inspiration!

            • esseph 14 hours ago
              Just reading through posts on here about various blogs/posts/opinion pieces, there always seems to be a handful of people who jump to "this is AI". And maybe it is! But the driving force behind this seems not to be to identify that something is AI, but that they despise AI writing, so they can quickly rule out caring about the material once it's identified as AI slop.

              The problem I see this leading to is plenty of legitimate writing getting thrown away because somebody's online exposure bubbles don't end up including a lot of Medium or Tumblr or a certain Discord or whatever bubble where _huge_ groups of people actually are writing in whatever $STYLE is being identified by the reader and commenter as AI. Which then, because of their post, also gets other people to not even look.

              It seems like a disaster, frankly.

              • rgoulter 13 hours ago
                > But the driving force behind this seems not to be to identify that something is AI, but that they despise AI writing, so they can quickly rule out caring about the material.

                Your expressed concern is "people don't like AI; this dislike motivates people to dismiss the material".

                I think it's misguided to assume motivation.

                For myself, I dislike the writing style because it's insincere and inauthentic. If the author isn't motivated enough to write out something in their own words, what's there to motivate a reader?

                > The problem I see this leading to is plenty of legitimate writing getting thrown away because somebody's online exposure bubbles don't end up

                Do you have any actual examples where legitimate writing was dismissed as written by AI? If not, I'd suggest your concern is hypothetical.

          • raincole 14 hours ago
            There was a time when everyone trained their models on ChatGPT output. You can still find open-source models that tell you they're ChatGPT if you ask.
          • koolala 14 hours ago
            Seems like many train on the output of other models for post-training and catch some kind of cooties.
          • someguy101010 14 hours ago
            If the people who develop and release these models were all optimizing for the same goals, they could converge on strategies or behaviors without coordinating.
        • mikepurvis 14 hours ago
          I'm one of the unlucky ones who has coincidentally trained myself over the past fifteen years to write in the style that is now largely recognized to be the ChatGPT style— bolded lists, clear section breakdowns with intro and concluding sentences, correct and liberal use of semicolons and em-dashes. The only parts of it I don't do are litter my text with random emojis or directly address the reader with simpering praise.
          • esseph 14 hours ago
            Sounds like someone that shoots for simple but effective communication, to me.
            • mikepurvis 13 hours ago
              I mean, that has always been my intention with it— particularly in the context of something like a ticket or design doc where it's critical that other busy people be able to quickly get a high level overview and then scan for the bits that are most relevant or of interest to themselves.

              It's just ironic that I've now been asked if I was using AI to write (or punch up) content that I've produced in this style when I most certainly was not.

    • jainilprajapati 13 hours ago
      This is HOW I WRITE, man. Yes, I agree, I take a LITTLE help from AI.
    • namuol 13 hours ago
      I think it’s fair enough to just say that the writing is cringe, AI or not.
    • dismalaf 15 hours ago
      The writing style we associate with AI is the 2010s blogging style that AI learned from... So it definitely could have been written by a person.
      • hildolfr 15 hours ago
        No, it isn't; it's something new, born from ingesting that stuff... That's exactly why a lot of us can detect it from a mile away.

        No human comments on meta formatting like that outside the deepest trenches of Apple/FB corporate stuff.

        • esseph 13 hours ago
          This is very much our internal newsletter at work, which is actually still written by human hand (and we know it is; she can't stand "using those things").
        • croes 14 hours ago
          > That's exactly why a lot of us can detect it from a mile away.

          Is that tested and proven or just gut feeling?

        • dismalaf 13 hours ago
          You must not have read a lot of blogs... This style is 100% the pretentious kind of writing that was in vogue.
    • anonym29 14 hours ago
      [flagged]
      • tomhow 12 hours ago
        Please don't post snarky comments attacking other users like this on HN, no matter what you're replying to. It's not what this site is for, and destroys what it is for.

        If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

        • anonym29 12 hours ago
          Point taken, but I’d like to raise some serious questions: is the 18 millionth post of someone whining about having to read text written by an LLM that much more of a substantive contribution?

          Is it "thoughtful criticism" to have the same pedantic complaint made everywhere?

          Is offering zero feedback to OP other than whining about the presence of LLM-written text in a README not a "shallow dismissal"?

          What about "Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."?

          Or is snark the only rule that matters enough to warrant reminders about rules?

          Sorry for inappropriately handling the frustration I get with this kind of repetitive, shallow, pedantic, no-value-add whining that clogs up HN any time LLM-generated text accompanies any part of a featured article or link, and that never gets the same kind of warnings or moderation.

          The people who make these kinds of complaints need to accept that LLM-generated text is a fact of life now, even in (or perhaps especially in) interesting technical projects. We all heard their complaints the first time, and the fiftieth time, and the five thousandth time; those complaints added no value to the relevant discussions then and they add no value to the discussion here. It's just bullies taking advantage of the latest snobby, belittling way to shit on other people's work over what amounts to little more than subjective cosmetic preferences.

          A tiny CPU-only TTS model is awesome. Why is it appropriate to derail the discussion about the actual technical innovation here with a low-effort complaint that's so common it has become a trope?

          • tomhow 11 hours ago
            We have frequently asked users not to make public accusations of posting LLM-generated content, so much so that several of the most engaged HN users routinely flag these kinds of comments and post their own reminders not to do it, and email us to let us know about it.

            That is the right way to deal with undesirable activity on HN: flagging, emailing us so we can take action, and if replies are to be posted, expressing them with respect and kindness as the guidelines ask of us.

            The problem with a reply like yours – one that is much worse than an already-bad comment – is that it becomes the highest-priority comment for us to respond to, and makes it harder for us to deal with the original comment with our normal approaches.

    • PontifexMinimus 14 hours ago
      Indeed, the blurb is absurd and very off-putting. It's not a big deal that "It clocks in at under 25MB with just 15 million parameters", because text-to-speech is a long-solved problem; in fact, the Texas Instruments Speak & Spell from 1978 (nearly half a century ago, FFS) solved it, probably with a good deal less than 25MB.
      • paulryanrogers 14 hours ago
        The Speak & Spell was a toy. I loved it as a kid in the eighties. But it was very limited and sounded terrible.