Stable-Audio-Demo

(stability-ai.github.io)

512 points | by beefman 505 days ago

39 comments

shon 505 days ago
Interestingly, Ed Newton-Rex, the person hired to build Stable Audio, quit shortly after it was released due to concerns around copyright and the training data being used.
He’s since founded https://www.fairlytrained.org/
Reference: https://x.com/ednewtonrex
- doctorpangloss 505 days ago
  For generative models, if the model authors do not publish the architecture of their model, and the model uses a transformation from text to another kind of media, you can assume that they have delegated some part of their model to a text encoder or similar feature which is trained on data that they do not have an express license to.
  Even for rightsholders with tens of millions to hundreds of millions of library items like images or audio snippets, the performance of the encoder or similar feature in text-to-X generative models is too poor on the fewer than a billion tokens of text in the large repositories. This includes Adobe's Firefly.
  It is also a misconception that large amounts of similar data, like the kinds that appear in these libraries, are especially useful. Without a powerful text encoder, the net result is that most text-to-X models create things that look or sound very average.
  The simplest way to dispel such issues is to publish the architecture of the model.
  But anyway, even if it were all true, the only reason we are talking about diffusers, and the only reason we are paying attention to this author's work Fairly Trained, is because of someone training on data that was not expressly licensed.
  - sillysaurusx 505 days ago
    If you require licensing fees for training data, you kill open source ML.
    That’s why it’s important for OpenAI to win the upcoming court cases.
    If they lose, they’ll survive. But it will be the end of open model releases.
    To be clear, I don’t like the idea of companies profiting off of people’s work. I just like open source dying even less.
    - JoshTriplett 505 days ago
      > If you require licensing fees for training data, you kill open source ML.
      And likely proprietary ML as well, hopefully.
      (To be clear, I think AI is an absolutely incredible innovation, capable of both good and harm; I also think it's not unreasonable to expect it to play a safer, slower strategy than the Uber "break the rules to grow fast until they catch up to you" playbook.)
      I'm all for eliminating copyright. Until that happens, I'm utterly opposed to AI getting a special pass to ignore it while everyone else cannot.
      Fair use was intended for things like reviews, commentary, education, remixing, non-commercial use, and many other things; that doesn't make it appropriate for "slurp in the entire Internet and make billions remixing all of it at once". The commercial value of AI should utterly break the four-factor test.
      Here's the four-factor test, as applied to AI:
      "What is the character of the use?" - Commercial
      "What is the nature of the work to be used?" - Anything and everything
      "How much of the work will you use?" - All of it
      "If this kind of use were widespread, what effect would it have on the market for the original or for permissions?" - Directly competes with the original, killing or devaluing large parts of it
      Literally every part of the four-factor test is maximally against this being fair use. (Open Source AI fails three of four factors, and then many users of the resulting AI fail the first factor as well.)
      > If they lose, they’ll survive.
      That seems like an open question. If they lose these court cases, setting a precedent, then there will be ten thousand more on the heels of those, and it seems questionable whether they'd survive those.
      > To be clear, I don’t like the idea of companies profiting off of people’s work. I just like open source dying even less.
      You're positioning these as opposed because you're focused on the case of Open Source AI. There are a massive number of Open Source projects whose code is being trained on, producing AIs that launder the copyrights of those projects and ignore their licenses. I don't want Open Source projects serving as the training data for AIs that ignore their license.
      - sillysaurusx 505 days ago
        It’s not so clear cut. Many lawyers believe all that matters is whether the output of the model is infringing. As much as people love to cite ChatGPT spitting out code that violates copyright, the vast majority of the outputs do not. Those that do, are quickly clamped down on — you’ll find it hard to get Dalle to generate an image of anything Nintendo related, unless you’re using crafty language.
        There’s also the moral question. Should creators have the right to prevent their bits from being copied at all? Fundamentally, people are upset that their work is being used. But "used" in this case means "copied, then transformed." There’s precedent for such copying and transformation. Fair use is only one example. You’re allowed to buy someone’s book and tear it up; that copy is yours. You can also download an image and turn it into a meme. That’s something that isn’t banned either. The question hinges on whether ML is quantitatively different, not qualitatively different. Scale matters, and it’s a difference of opinion whether the scale in this case is enough to justify banning people from training on art and source code. The courts’ opinion will have the final say.
        The thing is, I basically agree with you in terms of what you want to happen. Unfortunately the most likely outcome is a world where no one except billion dollar corporations can afford to pay the fees to create useful ML models. Are you sure it’s a good outcome? The chance that OpenAI will die from lawsuits seems close to nil. Open source AI, on the other hand, will be the first on the chopping block.
        bryanrasmussen 505 days ago
        >Those that do, are quickly clamped down on — you’ll find it hard to get Dalle to generate an image of anything Nintendo related, unless you’re using crafty language.
        Really, it seems more like someone was afraid of angering Nintendo, a corporate adversary one does not like to fight, and so it has a bunch of blocks to keep it from generating anything that offends Nintendo. That does not really translate to quickly and easily stopping and blocking offending generations across every copyrighted work in the world.
        navjack27 504 days ago
        Dalle on Bing is happy to generate Mario and Luigi and Sonic and basically everybody from everybody without using crafty language so I'm unsure of what you're talking about.
        JAlexoid 504 days ago
        It would be interesting to see if courts agree that training+transforming = copying.
        If I paint a picture inspired by Starry Night (Van Gogh) - does that inherently infringe on the original? I looked at that painting, learned the characteristics, looked at other similar paintings and painted my own. I basically trained my brain. (And I mean the copyright, not the individual physical painting.)
        And I mean cases where I am not intentionally trying to recreate the original, but doing a derivative (aka inspired) work.
        Because it's already settled that recreating the original from memory will infringe on copyright.
        raphman 504 days ago
        > Many lawyers believe all that matters is whether the output of the model is infringing.
        What I don't understand (as a European with little knowledge of court decisions on fair use): with the same reasoning you might make software piracy a case of 'fair use', no? You take stuff someone else wrote - without their consent - and use it to create something new. The output (e.g. the artwork you create with Photoshop) is definitely not copyrighted by the manufacturer of the software. But in the case of software piracy, it is not about the output. With software, it seems clear that the act of taking something you do not have the rights for and using it for personal (financial) gain is not covered by fair use.
        Why can OpenAI steal copyrighted content to create transformative works but I cannot steal Photoshop to create transformative works? What am I missing?
        JAlexoid 504 days ago
        > make software piracy a case of 'fair use'
        That's not a good example. Making a copy of a record you own (for example, ripping an audio CD to MP3) is absolutely fair use. Giving your video game to your neighbor to play - that's also fair use.
        Fair use is limited when it comes to transformative/derivative work. Similar laws are in place all over the world, just in the US some of those come from case law.
        > With software, it seems clear that the act of taking something you do not have the rights for and using it for personal (financial) gain is not covered by fair use.
        > Why can OpenAI steal copyrighted content to create transformative works but I cannot steal Photoshop to create transformative works?
        That's not a good analogy. The argument, which is not settled yet, is that a model doesn't contain enough copyrightable material to produce an infringing output.
        Take your software example - you legally acquire Civ6, you play Civ6, you learn the concepts and the visuals of Civ6... then you take that knowledge and create a game that is similar to Civ6. If you're a copyright maximalist - then you would say that creating any games that mimic Civ6 by people who have played Civ6 is copyright infringement. Legally there are definitely lower limits to copyright - like no one owns the copyright to the phrase "Once upon a time", but there may be a copyright on "In a galaxy far far away".
        Ukv 504 days ago
        > Why can OpenAI steal copyrighted content to create transformative works but I cannot steal Photoshop to create transformative works? What am I missing?
        If Photoshop was hosted online by Adobe, you would be free to do so. It's copyrighted, but you'd have an implied license to use it by the fact it's being made available to you to download. Same reason search engines can save and present cached snapshots of a website (Field v. Google).
        In other situations (e.g: downloading from an unofficial source) you're right that private copying is (in the US) still prima facie copyright infringement. However, when considering a fair use defense, courts do take the distinction into strong consideration: "verbatim intermediate copying has consistently been upheld as fair use if the copy is ‘not reveal[ed] . . . to the public.’" (Authors Guild v. Google)
        If you were using Photoshop in some transformative way that gives it new purpose (e.g: documenting the evolution of software UIs, rather than just making a photo with it as designed) then you may* be able to get away with downloading it from unofficial sources via a fair use defense.
        *: (this is not legal advice)
        dannyobrien 504 days ago
        So, fair use is seen as a balance, and generally the balance is thought of as being codified under four factors:
        https://www.copyright.gov/title17/92chap1.html#107
        (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
        (2) the nature of the copyrighted work;
        (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
        (4) the effect of the use upon the potential market for or value of the copyrighted work.
        There's more detailed discussion here: https://copyright.columbia.edu/basics/fair-use.html
      - andybak 505 days ago
        Bear with me here. Rushed and poorly articulated post incoming...
        In the broadest sense, generative AI helps achieve the same goals that copyleft licences aim for. A future where software isn't locked away in proprietary blobs and users are empowered to create, combine and modify software that they use.
        Copyleft uses IP law against itself to push people to share their work. Generative AI aims to assist in writing (or generating) code and make sharing less necessary.
        I argue that if you are a strong believer in the ultimate goals of copyleft licences you should also be supporting the legality of training on open source code.
        TheOtherHobbes 505 days ago
        The obvious difference is that copyleft is voluntary, while having your art style stolen isn't.
        If an artist approached a software developer, created a painting of them using their Mac, and said "There, I've done your job for you" you'd think they were an idiot.
        This is the same from the other side. The inability to understand why that's a realistic analogy does not change the fact that it is.
        llm_trw 505 days ago
        > The obvious difference is that copyleft is voluntary, while having your art style stolen isn't.
        What a curious type of theft where the author keeps their art and I get different art.
        webmaven 504 days ago
        > The obvious difference is that copyleft is voluntary, while having your art style stolen isn't.
        This is why it is important whether you consider that infringement occurs upon ingestion or output. If it only matters for outputs, then artists have a problem, since copyright doesn't protect styles at all, see for example the entire fashion industry.
        There is a saving grace though: Artists can make a case that the association of their distinctive style with their name is at least potentially a violation of trademark or trade dress, especially if that association is being used to promote the outputs to the public. This is a fairly clear case of commercial substitution in the market for creating new works in that artist's style and creating confusion concerning the origin of the resulting work.
        Note that the market for creating new works in a particular artist's distinctive and named style kind of goes away upon the artist's passing. What remains is the trademark issue of whether a particular work was actually created by the artist or not, which existing trademark law is well suited to policing, as long as the trademark is defended, even past the expiration of the copyright.
        Meanwhile, trademark (and copyright) also apply to the subjects of works, like Nintendo's Mario or Disney's Mickey Mouse or Marvel's Iron Man. But we don't really want models to simply be forbidden from producing them as outputs, or they become useless as tools for the purpose of parody and satire, not to mention the ability to create non-commercial fan art. The potential liability for violating these trademarks by publishing works featuring those characters rests with the users rather than the tools, though, and again existing law is fairly well suited to policing the market. Similarly, celebrities' right of publicity probably shouldn't prevent models from learning what they look like or from making images that include their likeness when prompted with their name, but users better be prepared to justify publishing those results if sued.
        You can also make the (technical) argument that if you just ask for an image of Wonder Woman, and you get an image that looks like Gal Gadot as Wonder Woman, the model is overfitting. That's also the issue with the recent spate of coverage of Midjourney producing near-verbatim screenshots from movies.
        It might be appropriate though to regulate commercial generative AI services to the extent of requiring them to warn users of all the potential copyright/trademark/etc. violations, if they ask for images of Taylor Swift as Elsa, or Princess Peach, or Wonder Woman, for example.
        kmeisthax 504 days ago
        The majority of AI models out there (at least by popularity / capability) are proprietary, with weights and even model architectures that are treated as trade secrets. Instead of having human-written music and movies that you legally can't copy, but practically can, you now have slop-generating models that live on a cloud server you have no control over. Artists and programmers who want to actually publish something - copyright or no - now have to compete with AI spam on search engines, while ChatGPT gets to merely be "confidently wrong" because it was built on the Internet equivalent of low-background metal - pre-AI training data. Generative AI is not a road that leads to less intellectual property[0], it's just an argument for reappropriating it to whoever has the fastest GPUs.
        This is contrary to the goals of the Free Software movement - and also why Free Software people were the first to complain about all the copying going on. One of the things Generative AI is really good at is plagiarism - i.e. taking someone else's work and "rewriting it" in different words. If that's fair use, then copyleft is functionally useless.
        It's important to keep in mind the difference between violating the letter of the law and opposing the business interests of the people who wrote the law. Copyleft and share-alike clauses have the intention of getting in the way of copyright as an institution, but they also rely on copyright to work, which is why the clauses have power even though they violate the spirit of copyright. Generative AI might violate the letter of the law, but it's very much in the spirit of what the law wants.
        [0] Cory Doctorow: "Intellectual property is any law that allows you to dictate the conduct of your competitors"
        CaptainFever 504 days ago
        Is FSF's stance on AI actually clear? I thought they were just upset it was made by Microsoft.
        Creative Commons has been fairly pro-AI -- they have been quite balanced, actually, but they do say that opt-in is not acceptable, it should be opt-out at most. EFF is fairly pro AI too -- at least, against using copyright to legislate against it.
        You shouldn't discount progress in the open model ecosystem. You can sort of pirate ChatGPT by fine tuning on its responses, there's GPU sharing initiatives like Stable Horde, there's TabbyML which works very well nowadays, and Stable Diffusion is still the most advanced way of generating images. There's very much of an anti-IP spirit going on there, which is a good thing -- it's what copyleft is there for in spirit, isn't it?
        kmeisthax 504 days ago
        I haven't paid any attention to the FSF in years.
        The Software Freedom Conservancy has been complaining about GitHub Copilot since 2022[0]. They specifically cite Copilot's use of training data in ways that violate the copyleft and attribution requirements of various FOSS licenses. Hector Martin (the guy porting Linux to MacBooks) also agrees with this. It's also important to note that the first AI training lawsuit was specifically to enforce GPL copyleft[1].
        The EFF's argument has come across to me less like "AI is cool and good" and more like "copyright doesn't do a good job of protecting artists against AI taking their jobs". Cory Doctorow's also taken a similar position, arguing that unions are better at protecting against AI than copyright is. e.g. WGA being able to get contractual provisions preventing workers from being replaced with AI.
        This is a different vein of opposition to AI from what we saw the following year in 2023 with artists and writers, though. Even then, those artists and writers aren't suddenly massively pro-copyright[2] and more consider it a means to fatally wound AI companies[3]. In contrast, big businesses that own shittons of copyright have been oddly quiet about AI. Sure, you have Getty Images and The New York Times suing Stability and OpenAI, but where's, say, Disney or Nintendo's litigation? These models can draw shittons of unlicensed fanart[4], and nobody cares. Wizards and Wacom made big statements against AI art, but then immediately got caught using it anyway, because stock image sites are absolutely flooded with it.
        My personal opinion is that generative AI creates enough issues that we can't group them down into neat "pro-copyright" vs. "anti-copyright" arguments. People who share their work for free online are complaining about it while people who expect you to pay money for their work are oddly ambivalent. AI is orthogonal to copyright.
        I will give you that the open model community is doing cool shit with their stolen loot. However, that's still something large corporations can benefit from (e.g. Facebook and LLaMA).
        [0] https://sfconservancy.org/GiveUpGitHub/
        [1] https://en.wikipedia.org/wiki/GitHub_Copilot#Licensing_contr...
        [2] Which, for the record, many of them break.
        [3] Their actual argument against AI is based on moral grounds, not legal ones. I don't think any artist is going to accept licensing payments for training data, they just want the models deleted off the Internet, full stop.
        [4] OpenAI tried to ban asking for fanart, but if you ask for something vaguely related (e.g. "red videogame plumber" or "70s sci-fi robot") you'll get fanart every time.
      - Ukv 505 days ago
        > Fair use was intended for things like reviews, commentary, education, remixing, non-commercial use, and many other things
        "many other things" has included, for example, Google Books scanning millions of in-copyright books, storing them internally in full, and making snippets available.
        The basis for copyright itself is to "promote the progress of science and useful arts". For that reason a key consideration of fair use, which you've skipped entirely, is the transformative nature of the new work. As in Campbell v. Acuff-Rose Music: "The more transformative the new work, the less will be the significance of other factors", defined as "whether the new work merely 'supersede[s] the objects' of the original creation [...] or instead adds something new".
        > "How much of the work will you use?" - All of it
        For the substantiality factor, courts make the distinction between intermediate copying and what is ultimately made available to the public. As in Sega v. Accolade: "Accolade, a commercial competitor of Sega, engaged in wholesale copying of Sega's copyrighted code as a preliminary step in the development of a competing product" yet "where the ultimate (as opposed to direct) use is as limited as it was here, the factor is of very little weight". Or as in Authors Guild v. Google: “verbatim intermediate copying has consistently been upheld as fair use if the copy is ‘not reveal[ed] . . . to the public.’”
        The factor also takes into account whether the copying was necessary for the purpose. As in Kelly v. Arriba Soft: "If the secondary user only copies as much as is necessary for his or her intended use, then this factor will not weigh against him or her"
        While there are still cases of overfitting resulting in generated outputs overly similar to training data, I think it's more favorable to AI than simply "it trained on everything, so this factor is maximally against fair use".
        > Directly competes with the original, killing or devaluing large parts of it
        The factor is specifically the effect of the use upon the work - not the extent to which your work would be devalued even if it had not been trained on your work.
        TheOtherHobbes 505 days ago
        None of those arguments make sense. The output of AI absolutely does supersede the objects of the original creation. If it didn't, artists wouldn't care that they were no longer able to make a living.
        Substantiality of code does not apply to substantiality of style. What's being copied is look and feel, which is very much protected by copyright.
        The copying clearly is necessary for the purpose. No copying, no model. The fact that the copying is then compressed after ingestion doesn't change the fact that it's necessary for the modelling process.
        Last point - see first point.
        IANAL, but if I was a lawyer I'd be referring back to look and feel cases. It's the essence of an artist's look and feel that's being duplicated and used for commercial gain without a license.
        That's true whether it's one artist - which it can be, with added training - or thousands.
        Essentially what MJ etc do is curate a library of looks and feels, and charge money for access.
        It's a little more subtle than copying fixed objects, but the principle remains the same - original work is being copied and resold.
        Ukv 504 days ago
        > None of those arguments make sense. The output of AI absolutely does supersede the objects of the original creation. If it didn't, artists wouldn't care that they were no longer able to make a living.
        The question for transformative nature is whether it merely supersedes or instead adds something new. E.g.: Google Translate was trained on books/documents translated by human translators and may in part displace that need, but adds new value in on-demand translation of arbitrary text - which the static works it was trained on did not provide.
        > Substantiality of code does not apply to substantiality of style.
        I'm not certain what you're saying here.
        > The copying clearly is necessary for the purpose. No copying, no model.
        Which, for the substantiality factor, works in favor of the model developers.
        > It's the essence of an artist's look and feel that's being duplicated and used for commercial gain without a license.
        Copyright protects works fixed in a tangible medium, not ideas in someone's head. It would protect a work's look/appearance (which can be an issue for AI when overfitting causes outputs that are substantially similar to a protected work), but not style or "an artist's look and feel".
        JAlexoid 504 days ago
        > What's being copied is _look and feel_, which is very much protected by copyright.
        If that were the case, no one would be able to paint any cubist paintings. (Picasso estate would own the copyright, to this day)
        It's not that clear cut, there are a lot of nuances.
        UncleEntity 504 days ago
        Ironically, Picasso was notorious for copying other artists' 'look and feel'...
        JoshTriplett 504 days ago
        > "many other things" has included, for example, Google Books scanning millions of in-copyright books, storing them internally in full, and making snippets available.
        That succeeds on a different part of the four-factor test, the degree to which it competes with / affects the market for the original.
        Google Books is not automatically producing new books derived from their copies that compete with the original books.
        Ukv 504 days ago
        > That succeeds on a different part of the four-factor test, the degree to which it competes with / affects the market for the original
        It satisfied multiple parts of the four-factor test. It was found to satisfy the first factor due to being "highly transformative", the second factor was considered not dispositive in isolation and favoring Google when combined with its transformative purpose, and it satisfied the third factor as the usage was "necessary to achieve that purpose" - with the court making the distinction between what was copied (lots) and what is revealed to the public (limited snippets).
        As you had all factors as "maximally against" fair use, do you believe that AI is significantly less transformative than Google Books? I'd say even in cases where the output is the same format as the content it was trained on, like Google Translate, it's still generally highly transformative.
        > the degree to which it competes with
        Specifically, to be pedantic, it's the effect of the use/copying of the original copyrighted work.
      - magicalhippo 505 days ago
        > "What is the character of the use?" - Commercial
        Your first factor seems to not at all be like that which Stanford has in its guidelines[1], which they call the transformative factor:
        In a 1994 case, the Supreme Court emphasized this first factor as being an important indicator of fair use. At issue is whether the material has been used to help create something new or merely copied verbatim into another work.
        LLMs mostly create something new, but sometimes seem to be able to regurgitate passages verbatim, so I can see arguments for and against, but to my untrained eyes it doesn't seem as clear cut.
        [1]: https://fairuse.stanford.edu/overview/fair-use/four-factors/
      - fenomas 504 days ago
        Where this argument falls down for me is that "use" w.r.t. copyright means copying, and neither AI models nor their outputs include any material copied from the training data, in any usual sense. (Of course the inputs are copied during training, but those copies seem clearly ephemeral.)
        Genuinely curious: for anyone who thinks AI obviously violates copyright, how do you resolve this? E.g. do you think the violation happens during training or inference? And is it the trained model, or the model output, that you think should be considered a derived work?
        frabcus 504 days ago
        Personally I think trained models are derived works of all the training data.
        Just like a translation of a book is a derived work of the original. Or a binary compiled output is a derived work of some source code.
        JAlexoid 504 days ago
        You're trying to use words without the legal context here. The legal definition of words isn't 1-1 with our colloquial usage.
        Translation of a book is non-transformative and retains the original author's artistic expression.
        As a counter example - if you write an essay about Picasso's Guernica painting, it is derivative according to our colloquial use of the term, but legally it's an original work.
        fenomas 504 days ago
        Wikipedia:
        > In copyright law, a derivative work is an expressive creation that includes major copyrightable elements of ... the underlying work
        A trained model fails that on two counts, doesn't it? Both the "includes" part, and the fact that a model is itself not an expressive work of authorship.
        frabcus 502 days ago
        I'm not sure. If it fails, then I reckon a binary compiled from source code fails too.
        There's nothing creative about the act of a compiler, it is automatic, just like the training run of an LLM.
        And no part of the original source code is in the binary output.
        And yet, binaries are a derived work from the source code that went into them.
        So something is up! I am not a lawyer though.
        fenomas 502 days ago
        > And no part of the original source code is in the binary output.
        It's not about whether the binary includes the raw text of the source, but whether it copies the expressive content. Anything expressive (i.e. copyrightable) in a compiled binary must have come from the source code, so that's what makes it a derived work.
        But the same isn't true of LLMs, which are more like "data about their inputs", than "a transformed version of their inputs".
        thuuuomas 504 days ago
        Curating training data is an exercise in editorial judgement.
        fenomas 504 days ago
        If a trained model doesn't meet the definition of being a derivative work, it doesn't matter whether the data it's not a derivative work of was curated.
      - viraptor 505 days ago
        > "How much of the work will you use?" - All of it
        That depends on the interpretation of "use", and it would be interesting to read what lawyers think. You learned the language largely from speech and copyrighted works. (All the stories, books, movies, etc. you ever read/heard) When you wrote this comment did you use all of them for that purpose? Is the case of AI different?
        To be clear that's a rhetorical question - I don't expect anyone here to actually have a convincing enough argument either way.
        [-]
        JoshTriplett 505 days ago
        Principles applied to human brains are not automatically applicable to AI training. To the best of my knowledge, there's no particular law that says a human brain is exempt from copyright, but it empirically is, because the alternative would be utterly unreasonable. No such exemption exists for AI training, nor should it.
        Ideas/works/etc literally live rent-free in your head. That doesn't mean they should live rent-free in an AI's neural network.
        Changing that should involve actually reducing or eliminating copyright, for everyone, not giving a special pass to AI.
        [-]
        JAlexoid 504 days ago
        > To the best of my knowledge, there's no particular law that says a human brain is exempt from copyright, but it empirically is, because the alternative would be utterly unreasonable.
        The human brain most definitely is not exempt. If you read Lord of the Rings and then write down a new book, with the same characters and same story line - that's plain copying (look up the etymology of the verb "to copy"). If you look at a painting and paint a very similar painting - that's still copying.
        Human brains are the reason we have copyright. Your recital of passages from any copyrighted book would violate the copyright, if not for fair use doctrine. And it has nothing to do with whether you do it yourself, or have a TTS engine produce the sound.
        [-]
        JoshTriplett 504 days ago
        The human brain is absolutely exempt, insofar as the copy stored in your brain does not make your brain subject to copyright, even if a subsequent work you produce might be. Nobody's filing copyright infringement claims over people's memories in and of themselves.
        I'm saying that AI does not and should not automatically get the exception that a human brain does.
      - barrkel 504 days ago
        AI is a genie that you can't really stuff back into a bottle. It's out and it's global.
        If the US had tighter regulations, China or someone else would take over the market. If AI is genuinely transformative for productivity, then the US would just fall behind, sooner or later.
        [-]
        beepbooptheory 504 days ago
        Then let them! If another country put forward tighter regulations to help actual people over and above the state that holds them, then that is good in itself, and either way will pay for itself. Why are we worried about China or whoever taking over the market of something that we see has bad effects?
        Like, we see this line everywhere now, and it simply doesn't make sense. At some point you just have to believe something, be principled. Treating the entire world as this zero sum deadlock of "progress" does nothing but prevent one from actually being critical about anything.
        This would-be Oppenheimer cosplay is growing really old in these discussions.
    - dkjaudyeqooe 505 days ago
      That makes no sense. OpenAI must lose and it must not be possible to have proprietary models based on copyrighted works. It's not fair use because OpenAI is profiting from the copyright holders' work and substituting for it while not giving them recompense.
      The alternative is that any models widely trained on copyrighted work are uncopyrightable and must be disclosed, along with their data sources. In essence this is forcing all such models to be open. This is the only equitable outcome. Any use of the model to create works has the same copyright issues as existing work creation, i.e. if it substantially replicates an existing work it must be licensed.
      [-]
      - Joeri 504 days ago
        Just because something is not copyrightable doesn’t automatically mean it must be disclosed. If weights aren’t copyrightable (and I don’t think they should be, as the weights are not a human creation), commercial AIs just get locked behind API barriers, with terms of usage that forbid cloning. Copyright then never enters the picture, unless weights get leaked.
        Whether or not that’s equitable is in the eye of the beholder. Copyright is an artificial construct, not a natural law. There is nothing that says we must have it, or we must have it in its current form, and I would argue the current system of copyright has been largely harmful to creativity for a long time now. One of the most damning statements I’ve read in this thread about the current copyright system is how there’s simply not enough unlicensed content to train models on. That is the bed that the copyright-holding corporations have made for themselves by lobbying to extend copyright to a century, and it all but assured the current situation.
        [-]
        dkjaudyeqooe 504 days ago
        > Just because something is not copyrightable doesn’t automatically mean it must be disclosed.
        No, I'm saying that's what the law should be, because models can be built and used without anyone knowing. If it's illegal not to disclose them you can punish people.
        Copyright is something that protects the little guy as much as big corps. But the former has more to lose as a group in the world of AI models, and they will lose something here no matter what happens.
        Hoasi 504 days ago
        > I would argue the current system of copyright has been largely harmful to creativity for a long time now
        I'd love to hear that argument.
        How has the current system of copyright been harmful to creativity?
      - sillysaurusx 505 days ago
        For what it’s worth, I agree with your second paragraph. But it would take legislation to enforce that. For now, it’s unclear that OpenAI will lose. Quite the opposite; I’ve spoken with a few lawyers who believe OpenAI is on solid legal footing, because all that matters is whether the model’s output is infringing. And it’s not. No one reads books via ChatGPT, and Dalle 3 has tight controls preventing it from generating Pokémon or Mario.
        All outcomes suck. The trick is to find the outcome that sucks the least for the majority of people. Maybe the needs of copyright holders will outweigh the needs of open source, but it’s basically guaranteed that open source ML will die if your first paragraph comes true.
        [-]
        dkjaudyeqooe 505 days ago
        > But it would take legislation to enforce that.
        Absolutely true. That's the end game and we should be working toward influencing that. It's within our power.
        > I’ve spoken with a few lawyers who believe OpenAI is on solid legal footing
        No one knows anything, this is too novel, and even if OpenAI gets some fair use ruling, it will be inequitable and legislation is inevitable. OpenAI is between a rock and a hard place here. If you read the basis for fair use and give each aspect serious consideration, as a judge should do, I can't see it passing fair use muster. It's not a case of simply reproducing work, which is unclear here, it's the negative effect on copyright holders, and that effect is undeniable.
        > All outcomes suck.
        I don't think so. It's possible to fashion something equitable, but people other than the corporations have to get involved.
        dr_dshiv 505 days ago
        Proposal: revenue from Generative AI should be taxed 10% for an international endowment for the arts. In exchange, copyright claims are settled.
        [-]
        Filligree 504 days ago
        With a minimum rate, such that no-one can pretend they’re getting no income from it.
        We might apply that as a $5000 or so surcharge on AI accelerators capable of running the models, such as the 4090.
    - chasing 504 days ago
      > If you require licensing fees for training data, you kill open source ML.
      This is another one of those “well if you treat the people fairly it causes problems” sort of arguments. And: Sorry. If you want to do this you have to figure out how to do it ethically.
      There are all sorts of situations where research would go much faster if we behaved unethically or illegally. Medicine, for example. Or shooting people in rockets to Mars. But we can’t live in a society where we harm people in the name of progress.
      Everyone in AI is super smart — I’m sure they can chin-scratch and figure out a way to make progress while respecting the people whose work they need to power these tools. Those incapable of this are either lazy, predatory, or not that smart.
      [-]
      - sillysaurusx 504 days ago
        "Ethical" in this case is a matter of opinion. The whole point of copyright was to promote useful sciences and arts. It’s in the US constitution. You don’t get to control your work out of some sense of fairness, but rather because it’s better for the society you live in.
        As an ML researcher, no, there’s basically no way to make progress without the data. Not in comparison with billion dollar corporations that can throw money at the licensing problem. Synthetic data is still a pipe dream, and arguably still a copyright violation according to you, since traditional models generate such data.
        To believe that this problem will just go away or that we can find some way around it is to close one’s eyes and shout "la la la, not listening." If you want to kill open source AI, that’s fine, but do it with eyes open.
        [-]
        chasing 504 days ago
        Yes, it’s true that open source projects that cannot pay to license content owned by other people are at a disadvantage versus those who can. Open source projects cannot, for example, wholly copy code owned by other people.
        Also, beware of originalist interpretations of the Constitution. I believe there’s been about 250 years of law clarifying how copyright works, and, not to beat a dead horse, I don’t think it carves out a special exception for open source projects.
    - throw10920 504 days ago
      > If you require licensing fees for training data, you kill open source ML.
      I don't think this is true. There's a huge amount of public domain works, as well as stuff licensed under permissive copyleft licenses, that can be used.
      But, even if it did kill off open-source ML, it would still be necessary, because it's morally wrong to train ML models on copyrighted content without compensating the copyright owners (on their terms).
      As a content creator, I explicitly do not want or consent to any of my creative works being used to train ML models without having a licensing agreement through which I am financially compensated.
      [-]
      - sillysaurusx 504 days ago
        I’m sympathetic, but currently the courts don’t agree. https://news.ycombinator.com/item?id=39364447
        Morals are different from the law, but you seek a legal remedy, and those aren’t going well.
        [-]
        throw10920 503 days ago
        Doesn't the Ars Technica article of that post state that the courts have not rejected the claim of copyright infringement?
        > failed to provide evidence supporting any of their claims except for direct copyright infringement
        (emphasis mine)
        Where are the courts saying that models can be trained on copyrighted content? (I believe that it's possible but unless I'm missing something I don't see it in that Ars article)
    - sillysaurusx 505 days ago
      Replying to a deleted comment:
      > It sounds as if you imply that would be bad. But what if it wasn't?
      Entirely possible. The early history of aviation was open source in the sense that many unlicensed people participated, and died. The world is strictly better with licensing requirements in place for that field.
      But no one knows. And if history is any guide for software, it seems better to err on the side of freedoms that happen to have some downside rather than clamping down on them. One could imagine a world where BitTorrent was illegal. Or cryptography, or bitcoin.
      [-]
      - raverbashing 505 days ago
        Are you really comparing licensing for a profession with licensing of IP?
        [-]
        sillysaurusx 505 days ago
        It’s much the same. Only authorized people are allowed to do X. Since X costs a lot of money, by definition it can’t be open source. There are no hobbyist pilots that carry passengers without a license, and if there are, they’re quickly told to stop. Generative AI faces a real chance of having the same fate. Which means open source will look similar to these planes trying to compete with commercial aircraft: https://pilotinstitute.com/flying-without-a-license/
        If you can think of a better example, I’d like to know though. I’ll use it in future discussions. It’s hard to think of good analogies when the tech has new social effects.
        [-]
        PeterStuer 505 days ago
        If I fly a plane and crash, my passengers die. If I generate an image using a model whose training included some unlicensed imagery... Disney misses out on a fraction of a cent?
        There is a real reason why some professions are licenced and others are not.
        Your analogy is nonsensical. Not having a better one is irrelevant.
        [-]
        sillysaurusx 505 days ago
        If training data requires licensing fees, ML practitioners will become a licensed field de facto, because no one in the open source world will have the resources to pursue it on their own.
        Perhaps a better analogy is movies. At least with acting, you can make your own movies, even if you’re on a shoestring budget. With ML, you quite literally can’t make a useful model. There’s not enough uncopyrighted data to do anything remotely close to commercial models, even in spirit.
        [-]
        avisser 504 days ago
        > If training data requires licensing fees, ML practitioners will become a licensed field de facto,
        You know the word "license" has multiple, dissimilar meanings, right?
    - marcyb5st 505 days ago
      Is there a license that states: if you use this data for ML training you must open source model weights and architecture?
      [-]
      - sillysaurusx 505 days ago
        It’s deeper than that. The basis of licensing is copyright. If the upcoming court cases rule in OpenAI’s favor, you won’t be able to apply copyright to training data. Which means you can’t license it.
        Or rather, you can, but everyone is free to ignore you. A license without teeth is no license at all. The GPL is only relevant because it’s enforceable in court.
        I’m sure some countries will try the licensing route though, so perhaps there you’d be able to make one.
        EDIT: I misread you, sorry. You’re saying that if OpenAI loses and license fees become the norm, maybe people will be willing to let their data be used for open source models, and a license could be crafted to that effect.
        Probably, yes. But the question is whether there’s enough training data to compete with the big companies that can afford to license much more. I’m doubtful, but it could be worth a try.
        [-]
        JAlexoid 504 days ago
        >The GPL is only relevant because it’s enforceable in court.
        The irony of the GPL is that its validity with respect to users is only now being tested in court.
        https://www.dlapiper.com/en/insights/publications/2024/01/sf...
    - pk-protect-ai 504 days ago
      I would say that GPT-3 and its successors have nothing to do with open source, and if OpenAI uses open source as a shield, then we are all doomed. I would distance myself and any open source projects from involvement in OpenAI court cases as far as possible. Yes, they have delivered some open source models, but not all of them. Their defense must revolve around fair use and purchased content if they use books and materials that were never freely available. It should be permissible to purchase a book or other materials once and use them for the training of an unlimited number of models without incurring licensing fees.
    - deely3 505 days ago
      > If you require licensing fees for training data, you kill open source ML.
      kill open source ML -> decrease speed of improvements for some open source ML
      [-]
      - sillysaurusx 505 days ago
        Sadly not. Making something illegal has social effects, not just legal effects. I’ve grown tired of being verbally spit on for books3. One lovely fellow even said that he hoped my daughter grows up resenting me for it.
        It being legal is the only guard against that kind of thing. People will still be angry, but they won’t be so numerous. Right now everyone outside of AI almost universally despises the way AI is trained.
        Which means you won’t be able to say that you do open source ML without risking your job. People will be angry enough to try to get you fired for it.
        (If that sounds extreme, count yourself lucky that you haven’t tried to assemble any ML datasets and release them. The LAION folks are in the crosshairs for supposedly including CSAM in their dataset, and they’re not even a dataset, just an index.)
        [-]
        viraptor 505 days ago
        US copyright has limited reach. There are models trained in China, where the IP rules are... not really enforced. It would be an interesting world where you use / pay for those models because you can't train them locally.
        multjoy 505 days ago
        If everyone is unhappy with your rampant piracy, then perhaps that is a sign that you’re doing it wrong?
        [-]
        4bpp 505 days ago
        Is there evidence that it's actually everyone or even close to everyone? The core innovation that the internet brought to harassment is that it is sufficient for some 0.0...01% of all people to take issue with you and be sufficiently dedicated to it for every waking minute of your life to be filled with a non-stop torrent of vitriol, as a tiny percentage of all internet users still amounts to thousands.
        sillysaurusx 505 days ago
        Perhaps. The reason I did it was because OpenAI was doing it, and it’s important for open source to be able to compete with ChatGPT. But if OpenAI’s actions are ruled illegal, then empirically open source wasn’t a persuasive enough reason to allow it.
        JAlexoid 504 days ago
        > Right now everyone outside of AI almost universally despises the way AI is trained.
        I don't agree with this. Most people don't care at all, and at best people would argue about some form of compensation.
        Saying "everyone" is unsubstantiated.
        I mean... "Everyone was angry at Napster" at the same time "everyone is angry at the MPAA/RIAA"
    - iamsaitam 505 days ago
      The point should be to kill training on unlicensed material. There needs to be regulation and tools to identify what was the training data. But as always, first comes the siphoning part, the massive extraction of value, then when the damage is done there will be the slow moving reparations and conservationism.
      [-]
      - cornel_io 505 days ago
        A ton of us out here don't agree with your goals. I think these models are transformative enough that the value added by organizing and extracting patterns from the data outweighs the interests of the extremely diffuse set of copyright holders whose data was ingested. So regardless of the technical details of copyright law (which I still think are firmly in favor of OpenAI et al) I would strongly oppose any effort to tighten a legal noose here.
      - JAlexoid 504 days ago
        Agreed. And every software engineer writing code should pay 10% of their salary to the publishers of the books that they learned their programming skills from.
    - benreesman 504 days ago
      The reality is always a dynamic tension between law, regulation, precedent, and enforceability.
      It is possible to strangle OpenAI without strangling AI: pmarca is anti-OpenAI in print, but you can bet your butt he hopes to invest in whatever replaces it, and he’s got access to information that like, 10 people do.
      A useful example would be the Napster Wars: the music industry had been rent seeking (taking the fucking piss really) for decades and technology destroyed the free ride one way or another. The public (led by the technical/hacker/maker public) quickly showed that short of disconnecting the internet, we were going to listen to the 2 good songs without buying the 8 shitty ones. The technical public doesn’t flex its muscles in a unified way very often, but when it does, it dictates what is and isn’t on the menu.
      The public wants AI, badly. They want it aligned by them within the constraints of the law (which is what “aligned” should mean to begin with).
      The public is getting what it wants on this: you can bet the rent. Whether or not OpenAI gets on board or gets run the fuck over is up to them.
      “You in the market for a Tower Records franchise Eduardo?”
      [-]
      - emadm 504 days ago
        a16z are investors in openai
        [-]
        benreesman 504 days ago
        I'd look again: https://twitter.com/pmarca/status/1756803719327621141
  - silviot 505 days ago
    > But anyway, even if it were all true, the only reason we are talking about diffusers, and the only reason we are paying attention to this author's work Fairly Trained, is because of someone training on data that was not expressly licensed.
    Thanks for putting this into words. I'm of the same opinion and this is the best articulation I have so far.
- ImprobableTruth 504 days ago
  Calling him "the person hired to build Stable Audio" seems a bit misleading? He was in an executive position (VP of product for Stability's audio group). An important position, but "person hired to build" to me evokes the image of lead developer/researcher.
  I think that also helps in understanding his departure, since he's a founder with a music background.
  [-]
  - a_vanderbilt 504 days ago
    It isn't unusual for those in leadership positions to use such phrasing when talking about projects and products. It's not a "taking credit" from the engineers sort of thing, but rather about the leadership of the engineers.
    [-]
    - Zetaphor 504 days ago
      Managing a group of people is not synonymous with doing the actual knowledge work of researching and developing innovations that enabled this technology. I find it hard to believe that the contribution of his management somehow uniquely enabled this group of engineers to create this using their experience and expertise.
      A captain may steer the ship, but they're not the one actually creating and maintaining the means by which it moves.
      [-]
      - apetresc 503 days ago
        > A captain may steer the ship, but they're not the one actually creating and maintaining the means by which it moves.
        And yet virtually everyone will go along with a statement like "The captain sailed the ship across the ocean" or "Captain Kirk charted the Gamma Quadrant" or whatever, so I'm not sure how this serves as an objection to the original phrasing.
      - a_vanderbilt 504 days ago
        Depending on the work, an equal level of technical competency can be very beneficial for leadership to have, if not required in some situations. To debate his contribution as a counterfactual exercise without the context of his leadership is entirely non-productive. HN is always quick to dismiss leadership, despite evidence of good and poor leadership being tautologically debated about nearly constantly here. A talented group of engineers can self-organize, but their collective technical expertise does not translate to business acumen.
        The crux of the debate appears to rest on the fact that context matters. If an engineer says to another engineer "I built the spam detection system", it is understood that they mean they either wrote the code or had some direct part in producing it. If an executive says to another executive "I made the Mac", neither is interpreting that as them literally building the thing. They know they are in leadership, the meaning is assumed to be "as a leader".
    - ARandomerDude 504 days ago
      Person A gets hired to write the software that is the company's actual product.
      Person B gets hired to observe Person A working, check email, and be the audio output buffer for Jira.
      Person B says "I built this."
      That's dishonesty no matter what the titles are or how important the emails were.
    - shon 504 days ago
      Agreed. Leadership can sometimes bring actual value ;)
      And to be clear, I’m not sure Ed would call himself that. Those are my words, not his.
      [-]
      - enr428 504 days ago
        Ed here. Saw this thread and thought I'd weigh in.
        Agreed, I wouldn't say I was hired to build Stable Audio. Crazy talented team of research engineers / software engineers / designers did the building.
        Also wanted to clarify that I didn't quit due to concerns around the training data used for Stable Audio. I was proud of the approach we took to training data - a rev share with rights holders. I quit because of the prevailing view on training data at the wider company, as documented in its public response to the copyright office, where it argues that training on people's work without consent is fair use.
        [-]
        williamcotton 504 days ago
        FWIW, I am a rightsholder for a number of published songs and recordings. I once spent $12k of my own money on a record and made about $1000 back.
        I have spent more blood, tears and money on art than most of you would find even remotely bearable.
        I not only consider my songs to be fair use for training a model but I would also be honored if my works were included and influenced further musicians in a way that my records probably never will.
        The best songwriters I know have other careers and keep on going otherwise. If you actually care about musicians you should make it a habit to go see local live music!
        a_vanderbilt 504 days ago
        Thanks for setting the record straight! People tend to put a singular face to things, especially if they have strong feelings about it. Leads to a lot of misrepresentation, especially when the context doesn't accompany the message.
        shon 503 days ago
        Thanks for the clarification, Ed. That’s quite interesting.
        Also congrats on the new company!
      - a_vanderbilt 504 days ago
        HN is usually allergic to recognizing the value of leadership, which always struck me as ironic considering it's leadership that made plenty of startups work.
- az226 505 days ago
  That’s an interesting take. But quite the odd stance since he joined Stability and the training of Stable Diffusion was well known.
- prmoustache 505 days ago
  Not that it would have stopped the company from doing it anyway, but couldn't he have thought about that before working for them?
  Or did he need that, as it is part of the business model of his certifications?
  [-]
  - emadm 504 days ago
    It's a complex topic and perceptions change.
    Ed still likes Stability, especially as we fully trained stable audio on rights licensed data (bit different in audio to other media types), offer opt out of datasets etc.
- gcanko 504 days ago
  There has to be a solution for the copyright roadblocks that companies encounter when training models. I see it as no different from an artist creating music which is influenced by the music the artist has been listening to throughout his whole life; fundamentally it's the exact same thing. You cannot create music or art in general in a vacuum.
jsiepkes 505 days ago
> Warning: This website may not function properly on Safari. For the best experience, please use Google Chrome.
We've come full circle with the 90's and Internet Explorer. Well I guess this time the dominant browser is open source so that's at least something...
Can someone please create an animated GIF button for Chrome which says: "Best viewed with Google Chrome"?
[-]
- IndisciplineK 504 days ago
  > Can someone please create an animated GIF button for Chrome which says: "Best viewed with Google Chrome"?
  Here you go:
  <img src="data:image/gif;base64,R0lGODlhWAAfAKIAAAAAAP///zNm/zOZM//MM/8zM8zMzP///yH/C05FVFNDQVBFMi4wAwEAAAAh+QQFZAAHACwAAAAAWAAfAAAD/wi63P4wyklnuDjrzbv/YNgpQWGehaGubOu+cCzP81KiJr02uqLvlYnBhsv9fEMADXmEFAuRJKBELZR+SRVyAVRym40n6iGtVrG8rI/J7DHETx7RnGpqvYy7Hr2Ai/NzGXVuem2FSnwAfnBcREWJbI2RiYt/ayRPWJqbQANPGgShoqGXU1anV5yqQDAKA54nFwKzsxejpHimdC9beKsthjuvsCYBtMcBt6RlqKe8iMG/WbzDsMbHyMq5VILPh3fQvr2IUuTA1cXY2bfbmc+9auLy8dMuANWe1+oCyezMj+/ClZtX6lK9c+j0qes3qt2FYoPskTPIwsGeb9TwKcTGUJuUQys3YkwqtyfSOHMV8b3aWEsZgY83IkqbeUelLGQdcTHTIJPmRT4qV2YY4LJUTBR2hnyDt2lBUJXajLpbkusgU01Onw4rKrWKEapS6EmKJjKr1qhdT32t4UWpQXhkA957ijZtzERh6el10wBqQ4uBMPQsW1UvRb4OqnmE8A8HH1bT3qKUEUSII8c+M/u0Ubmz58+gQ4seTXpBAgAh+QQFyAAHACwNAAMAKwAZAAADf2i63P4wyvkArSBfZ/fqHhcaIJOB55d26VQqKPnJsBxLrBbvs3VqOBPN1iu+XMUaTzk8YoAtElCqg01HHid2E916v+DwNQz5+bRka2OcNr3M6hz0R1Xp3jnqOZ6X+vVreVNzf4RmXYV7an6EjCVjhiiCfXeBK5NujZp0bZ2eDAkAOw==">
  Edit: View the button: https://indiscipline.github.io/post/best-viewed-in-google-ch...
  [-]
  - rikafurude21 504 days ago
    Surprised to see an actual gif pop up after adding that to a site. I guess that's just base64, still kind of amazing that it's all inside a seemingly random string of text
    [-]
    - IndisciplineK 504 days ago
      By the way, you can simply paste the base64-encoded data (everything inside the quotes) into your address bar to view it. Probably not the safest action generally, but should be OK if it's an image.
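      For anyone curious what is inside that "seemingly random string", here is a minimal Python sketch that decodes only the first 16 base64 characters of the data URI above and reads the GIF header from them. The helper name `gif_header` is made up for illustration; the byte layout (a 6-byte signature followed by little-endian 16-bit width and height) is the standard GIF logical screen descriptor.

```python
import base64
import struct

# First 16 base64 characters of the data URI posted above.
# 16 characters decode to 12 bytes, which covers the GIF header fields read below.
PREFIX = "R0lGODlhWAAfAKIA"


def gif_header(b64_prefix: str):
    """Hypothetical helper: return (signature, width, height) from a base64 GIF prefix."""
    raw = base64.b64decode(b64_prefix)
    signature = raw[:6].decode("ascii")               # "GIF87a" or "GIF89a"
    width, height = struct.unpack("<HH", raw[6:10])   # two little-endian uint16 values
    return signature, width, height


print(gif_header(PREFIX))
```

      If the decoding above is right, the header describes an 88x31 image, the classic web button size, which matches what the pasted data renders as.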
- Maxion 505 days ago
  Chrome isn't open source, chromium is. Best not to confuse the two.
  [-]
  - m463 505 days ago
    I found this article to explain it well:
    https://www.lifewire.com/chromium-and-chrome-differences-417...
    and there is a further ungoogled-chromium:
    https://en.wikipedia.org/wiki/Ungoogled-chromium
  - schleck8 505 days ago
    Chrome and Chromium are virtually identical except for Google services, which aren't required to do anything with the browser except for installing Chrome extensions that can alternatively be sideloaded, so this is nitpicking.
    [-]
    - urbandw311er 505 days ago
      Jumping in to defend parent comment, there’s nothing Open Source about Google Chrome and it’s highly relevant in this context because they are notorious for putting technologies and tracking in there that many people find objectionable.
    - forgotusername6 505 days ago
      Tangential, but I tried to build chromium the other day but stopped when it said it required access to Google cloud platform to actually build it. If something requires a proprietary build system, does it matter that it's open source?
      [-]
      - nolist_policy 505 days ago
        That is not true. See every distribution packaging chromium.
        In particular, this package[1] by openSUSE builds completely offline. Many other distributions require packages to build offline.
        [1] https://build.opensuse.org/package/show/network:chromium/chr...
        [-]
        forgotusername6 504 days ago
        I think I got my wires crossed with ChromiumOS which when I last read the docs seemed to suggest that Google cloud platform was required. I now can't find those specific docs either so I retract my statement.
    - squeaky-clean 504 days ago
      Don't forget media DRM built into Chrome but not Chromium.
      [-]
      - schleck8 501 days ago
        Widevine is a Google service so I didn't forget it. You can still play media if you dislike DRM usually, at =< 720p that is, which is a lesser security standard. I'm not even sure whether it involves additional servers.
    - berkes 505 days ago
      It's essential nitpicking
- superb_dev 505 days ago
  Website works fine on safari too, I didn’t notice any issues
  [-]
  - nness 505 days ago
    Same, I wonder what issue they thought they had...
    [-]
    - earthnail 504 days ago
      Safari is known to be troublesome when a webpage contains many HTML audio players. It can get extremely slow and unresponsive.
      Every researcher I know in the audio domain uses Chrome for exactly that reason. The alternative would be not to use the standard HTML audio tag which would be ridiculous.
romanzubenko 505 days ago
As with Stable Diffusion, text prompting will be the least controllable way to get useful output with this model. I can easily imagine MIDI being used as an input with ControlNet to essentially get a neural synthesizer.
[-]
- zone411 505 days ago
  Yes. Since working on my AI melodies project (https://www.melodies.ai/) two years ago, I've been saying that producing a high-quality, finalized song from text won't be feasible or even desirable for a while, and it's better to focus on using AI in various aspects of music making that support the artist's process.
  [-]
  - 3cats-in-a-coat 504 days ago
    Text will be an important input channel for texture, sound type, voice type and so on. You can't just use input audio, that defeats the point of generating something new. You can't also only use MIDI, it still needs to know what sits behind those notes, what performance, what instrument. So we need multiple channels.
  - l33tman 504 days ago
    Emad hinted here on HN the last time this was discussed that they were experimenting with exactly that. It will come, by them or by someone else quickly.
    Text-prompting is just a very coarse tool to quickly get some base to stand on, ControlNet is where the human creativity again enters.
    [-]
    - emadm 504 days ago
      Yeah, we build ComfyUI so you can imagine what is coming soon around that.
      Need to add more stuff to my Soundcloud https://on.soundcloud.com/XrqNb
- raincole 505 days ago
  For music perhaps. For sound effects I think text prompting is a rather good UI.
  [-]
  - bemmu 504 days ago
    Controlnet/img2img style where you can mimic a sound with your mouth and it then makes it realistic could also be usable.
- gcanko 504 days ago
  I think it would be ideal if it could take an audio recording of humming or singing a melody together with a text prompt and spit out a track that resembles it.
  [-]
  - yogorenapan 504 days ago
    1. Do your humming and pass it to something like Stable Audio with ControlNet
    2. Convert/average the tone for each beat to generate something resembling a music sheet
    3. Use vocaloid with LLM generated lyrics based on your prompt (or just put in your lyrics) and pass in the music file
    4. Combine the 1-3
    Would love to see this
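    Step 2 above can be sketched roughly as follows. This is only an illustration: it assumes a pitch tracker has already produced one frequency estimate per frame (0 for unvoiced frames), and the function names and the fixed `frames_per_beat` are hypothetical. It also uses the median rather than a plain average of each beat, so a single stray octave jump does not drag the note.

```python
import math
from statistics import median

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = MIDI 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def midi_to_name(midi):
    """Note name with octave, using the convention that MIDI 60 is C4."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def notes_per_beat(pitch_hz, frames_per_beat):
    """Collapse a per-frame pitch track into one note name per beat.

    Frames with a pitch of 0 (or None) count as unvoiced; a beat with no
    voiced frames becomes "rest".
    """
    notes = []
    for start in range(0, len(pitch_hz), frames_per_beat):
        voiced = [f for f in pitch_hz[start:start + frames_per_beat] if f]
        if voiced:
            notes.append(midi_to_name(hz_to_midi(median(voiced))))
        else:
            notes.append("rest")
    return notes


if __name__ == "__main__":
    # Hypothetical hummed input: a beat around A4, a silent beat, a beat around C4.
    hummed = [440, 441, 439, 440, 0, 0, 0, 0, 261.6, 262, 261, 262]
    print(notes_per_beat(hummed, frames_per_beat=4))
```

    A real version would need beat tracking instead of a fixed frame count, and the resulting note list would still have to be turned into MIDI or notation for step 3.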
- b0ner_t0ner 504 days ago
  But works great when you don’t need much control, prompt example: “Free-jazz solo by tenor saxophonist, no time signature.”
- MetalGuru 504 days ago
  What other inputs besides text prompting are there for SD? Are you referring to img2img, ControlNet, etc?
- numpad0 505 days ago
  It's crazy that nobody cares. It seems to me that ML hype trends focus on denying skills and disproving creativity by denoising randoms into what are indistinguishable from human generation, and to me this whole chain of negatives doesn't seem to have proven its worth.
  [-]
  - JAlexoid 504 days ago
    LLMs allow people without certain skills to be creative in forms of art that are inaccessible to them.
    With DALL-E - I can get an image of something I have in my head, without investing in watching hundreds of hours of Bob Ross (which I do anyway).
    With audio generators - I can produce music that is in my head, without learning how to play an instrument or paying someone to do it. I have to arrange it correctly, but I can put out a techno track without spending years in learning the intricacies.
reissbaker 505 days ago
This is incredibly good compared to SOTA music models (MusicGen, MusicLM). It looks like there's also a product page where you can subscribe to use it, similar to Midjourney: https://www.stableaudio.com/
Sadly it's not open-weight and it doesn't look like there's an API (again like Midjourney): you subscribe monthly to generate audio in their UI, rather than having something developers can integrate or wrap.
[-]
- nullandvoid 505 days ago
  I was hoping to use it to generate some sound effects to use in a game I'm working on - but looks like I need an "enterprise license" (https://www.stableaudio.com/pricing)
  Why does this have a different clause I wonder, and doesn't just fall under "In commercial products below 100,000 MAU"?
  [-]
  - emadm 504 days ago
    Different deal with the underlying data holders with revenue share etc
- emadm 504 days ago
  A CC-licensed version is coming soon, plus an API.
  Models are advancing very fast, will be quite the year for music.
  [-]
  - reissbaker 502 days ago
    Any chance of a commercially licensed version? CC is alright for research but I feel like the real meat of a lot of these models is finetuning.
    (Or, will the API support finetuning?)
- ex3ndr 505 days ago
  Thankfully you can train it at home, the bigger question is the data.
qwertox 505 days ago
I think we still need the step where the AI learns what a high quality sound library sounds like and then applies the previously learned abilities by triggering sounds of that library via MIDI.
That way you'd get perfect audio quality with the creativity of a musical AI.
[-]
- jchw 505 days ago
  I've always wished for something like that for image generation AI. It'd be much cooler/more interesting to watch AI try to draw/paint pictures with strokes rather than just magically iterate into a fully-rendered image. I dunno what kind of dataset or architecture you could possibly apply to accomplish this, but it would be very interesting.
  [-]
  - AuryGlenz 505 days ago
    I get what you’re saying, but if you watch Stable Diffusion do each step it’s at least kind of similar. If you keep the same seed but change a detail, often the broad “strokes” are completely the same.
- eru 505 days ago
  How would MIDI get you eg a guitar being played dirty? Or some subtle echo that comes from recording in a bathroom?
  [-]
  - qwertox 505 days ago
    It would use a sampler and for the subtle echo effect add a reverb to the bus.
    https://www.youtube.com/watch?v=EQdp2QLiSYQ&t=187s
  - sebzim4500 504 days ago
    You could have AI do some postprocessing. I think a similar approach is the future for image generation: you have a model output a 3D scene, use a classical raytracer to do rendering and then have a final model apply corrections to achieve photorealism.
  - arrakeen 505 days ago
    the AI designs and controls the effects chain and mastering too
- 3ds 505 days ago
  Isn‘t that what suno.ai does?
gregorvand 505 days ago
Not trying to knock the progress here, it's impressive. As a drummer, I find the 'drum solo' about as boring as it gets, with some weird interspersed sounds. So, it depends on the intended audience.
FWIW the sound effects also are not 'realistic' to my ear, at the moment.
But again, the progress is huge, well done!
[-]
- ZoomZoomZoom 505 days ago
  As a drummer, the 'drum solo' was surprisingly interesting to listen to, if you consider it happening over a stable 4/4 pulse. The random-but-not-quite nature of the part makes for very unconventional rhythmic patterns. I'd like to be able to syncopate like this on the spot.
  Don't ask me to transcribe it.
  Tempo consistency is great. Extraneous noises and random cymbal tails show the deficiency of the model though.
- pier25 503 days ago
  I agree. It's an impressive effort but it's still very far from being able to generate viable music/sound.
  There are already millions of library music tracks and sound effects available which sound a lot better. It's going to take a huge investment in gen AI to compete with that and I don't think it makes economic sense (unlike text or images).
- redman25 504 days ago
  I think I was more disappointed by the music samples not having any transitions. Most songs have key changes and percussion turnovers.
- toxik 505 days ago
  Yeah the drum solo really highlights how badly the model missed the point of a drum solo. I'm not a drummer, but this is just not pleasing to hear. Sounds like somebody randomly banging drums more or less in tempo.
  It does okay with muzak-type things though, which I guess tracks with my expectations.
ttul 505 days ago
I find it interesting that they are releasing the code and lovely instructions for training, but no model. They are almost begging anonymous folks to hook the data loader up to an Apple Music account and go nuts. Not that I am suggesting anyone do that.
[-]
- zamadatix 504 days ago
  Speculatively, it might have been part of the agreement under which they were given the licensed stock audio library from AudioSparx to train on that they wouldn't redistribute the resulting model.
TillE 505 days ago
I was briefly excited about the idea of generating sound effects, but those "footsteps" are incredibly bad.
[-]
- laborcontract 505 days ago
  I tried generating music on stableaudio.com and, yes, it's bad. However, given the blistering pace of development in these models, I would not be surprised if these sound incredible in a year or two.
  [-]
  - berkes 505 days ago
    Everyone every time seems to assume a linear (or exponential) curve upwards.
    But what is the proof for that?
    I consider it far more likely that we had a breakthrough and are now rushing towards the next plateau. Maybe we are nearing it.
    Like in the curve of a PID controller. It's how most or many human improvements go.
    [-]
    - spacebanana7 504 days ago
      The plateau we're heading for is getting professional human level output from these models with logarithmic progress.
      I suspect this is because the underlying production factors like compute, data & model design are steadily improving whilst humans have diminishing sensitivity to output quality.
      In the game of AI generated photorealistic images or history essays there's not much improvement left to make. Most humans are already convinced by the output of these things.
    - laborcontract 504 days ago
      I think the proof is seeing how good diffusion models have gotten for making images. They're not perfect but they're leaps and bounds over what we had just a year and a half ago.
      Many of these problems seem to have been unexploited simply on the basis of nobody throwing enough GPU clusters at them yet.
    - leodriesch 505 days ago
      I'd say most are thinking of Midjourney's success in image generation when talking about this kind of progress.
      [-]
      - berkes 504 days ago
        I am too.
        But I still see no evidence that this keeps improving and not plateauing at some (current?) level.
ShamelessC 505 days ago
So there aren't public weights, is that right? Having trouble finding anything that says one way or the other.
edit: Oh okay, didn't realize this was somehow a controversial comment to make. It would have been great if you had answered the question before downvoting but that's fine I suppose.
[-]
- grey8 505 days ago
  Nope. They did release code for training, inference and fine tuning, but no datasets or weights.
  See https://github.com/Stability-AI/stable-audio-tools
  [-]
  - turnsout 504 days ago
    Wonder if it's an IP issue. They don't want every record label coming after them.
    [-]
    - ShamelessC 504 days ago
      Yeah that tracks.
      [-]
      - Timwi 503 days ago
        I see what you did there.
  - ShamelessC 505 days ago
    Thanks!
- NoPedantsThanks 505 days ago
  [flagged]
lopkeny12ko 505 days ago
> We append “high-quality, stereo” to our sound effects prompts because it is generally helpful.
It's hilarious that we've discovered you can get better outputs from LLMs by simply nicely telling it to generate better results.
[-]
- nine_k 505 days ago
  Maybe sometimes you want an old cassette sound, or even older scratched 78 rpm sound, etc. Computers, as usual, do what you asked them to do, not what you meant.
PeterStuer 505 days ago
"Gen AI is the only mass-adoption technology that claims it's Ok to exploit everyone's work without permission, payment, or bringing them any other benefit."
Is it? What about the printing press, photography, the copier, the scanner ...
Sure, if a commercial image is used in a commercial setting, there is a potential legal case that could argue about infringement. This should NOT depend on the production means, but on the merit of the comparisons of the produced images.
Xerox should not be sued because you can use a copier to copy a book (trust me kids, book copying used to be very, very big).
Art by its social nature is always derivative, I can use diffusion models to create uncontestably original imagery. I can also try to get them to generate something close to an image in the training set if the model was large enough compared to the training set or the work is just really formulaic. However, it would be far easier and more efficient to just Google the image in the first place and patch it up with some Photoshop if that was my goal.
[-]
- wnkrshm 505 days ago
  But the social nature of art also means that humans give the originator and their influences credit - of course not the entire chain but at least the nearest neighbours of influence. While a user of a diffusion generator does not even know the influences unless specifically asked for.
  Shoulders of giants as a service.
- haswell 504 days ago
  > Art by its social nature is always derivative, I can use diffusion models to create uncontestably original imagery
  How are you defining “uncontestably original” here?
  The output could not exist if not for the training set used to train the model. While the process of deriving the end result is different than the one humans use when creating artwork, the end result is still derived from other works, and the degree of originality is a difference of degree, not of kind when compared to human output. (I acknowledge that the AI tool is enabled by a different process than the one humans use, but I’m not sure that a change in process changes the derivative nature of all subsequent output).
  As a thought experiment, imagine that assuming we survive, after another million years of human evolution, our brains can process imagery at the scale of generative AI models, and can produce derivative output taking into account more influences than any human could even begin to approach with our 2024 brains.
  Is the output no longer derivative?
  Now consider the future human’s interpretation of the work vs. the 2024 human’s interpretation of the work. “I’ve never seen anything like this”, says the 2024 human. “The influences from 5 billion artists over time are clear in this piece” says the future human.
  The fundamental question is: on what basis is the output of an AI model original? What are the criteria for originality?
- zamadatix 504 days ago
  Where was this quote pulled from? I can't find it in the site, paper, or code repo readmes for some reason. Did the HN link get changed?
- webmaven 504 days ago
  > Xerox should not be sued because you can use a copier to copy a book (trust me kids, book copying used to be very, very big).
  The appropriate analogy here isn't suing Xerox, but suing Kinko's (now FedEx Office).
  And it isn't just books, but other sorts of copyrighted material as well, such as photographs, which are still an issue.
alacritas0 505 days ago
this can produce some pretty disturbing, but interesting music using the prompt "energetic music, violin, voice, orchestra, piano, minimalism, john adams, nixon in china": https://www.stableaudio.com/1/share/953f079e-d704-4138-904c-...
[-]
- seydor 504 days ago
  Finally, some music from the future
- FergusArgyll 505 days ago
  It reminds me a little of breath of the wild guardian music
kleiba 505 days ago
My son suggested to play "Calm meditation music to play in a spa lobby" and "Drum solo" at the same time - sounds pretty good, actually...
[-]
- Jeff_Brown 504 days ago
  That's some pretty advanced musicality.
joecarrot 503 days ago
Why are AI developers so goddamned keen on having it make art, one of the few kinds of work that human beings actually LIKE doing? We could use AI to be a CPA, or to write citations for a paper, but noooo, AI has to be a painter and a musician.
It's almost like the software developers are jealous that someone out there is having a good time and want to take it from them.
Also miss me with that 'AI enables me (a scrub) to make art I couldn't otherwise because I don't want to learn how to do it'. You are lazy. Congrats on finding a high horse about your laziness.
[-]
- eutropia 503 days ago
  I think the development of Generative models for images and audio has more to do with the fact that Computer Vision research goes back decades, and the same systems that originally recognized and labeled images or audio were tweaked to invert the process - and it became naturally an intriguing topic of development precisely because creation is seen as an innately human thing. Beyond that, I'd speculate that the reason we keep seeing developments in "the arts" (though I disagree that an AI can make art, even if it can make beautiful images or music) is because there's no readily-agreed-upon value for that task.
  An AI CPA has a specific economic value, but is also a commodity service that no one wants unless they need it. Since there's a clearly comparable cost for needed CPA services, then naturally creating an AI system to do it has a readily comparable market price. People aren't going to make that AI system unless they can do it in a way that will be an improvement as compared to that existing service and price.
  I think "just because" has always been a justifiable reason for humans creating beauty (not the same as making art), so it works for research projects better than building a better mousetrap.
  [-]
  - joecarrot 503 days ago
    Thanks for the thoughtful reply! You've given me some stuff to think about
- greycol 503 days ago
  I'd bet the 'scrubs' making AI art are enjoying it, so to twist your words: why would you force them to do the work they don't enjoy (learning to paint) to get to the part they do? You obviously wouldn't decry a painter for not making their art by carving marble, or the Mona Lisa for not being as big as The Creation of Adam (funnily enough the Mona Lisa took longer to paint). Though I do feel for the 'real artists' who probably aren't enjoying being forced by economic considerations to output what they view as crap quality using those tools.
  Having said that, I bet you're seeing many more developers creating AI art stuff because frankly there are many more developers who enjoy making art and being creative than there are developers who enjoy creating AI CPA or AI citation stuff. So the getting-rid-of-unenjoyable-work-AI stuff is mainly being made by those seeking a profit and it's naturally much less open as they'll sell it as solutions to those seeking it.
- notfed 503 days ago
  I assume you're just being tongue-in-cheekfully dramatic, but the answer of course, is that there are AIs for those things, but they're under much less demand and are much less controversial.
lbourdages 505 days ago
This is right into the "uncanny valley" of music.
It definitely sounded "like music", but none of it is what a human would produce. There's just something off.
[-]
- bane 505 days ago
  The overall audio quality sounds pretty good and it seems to do a good job of sustaining a consistent rhythm and musical concept. But I agree there's something "off" about some of the clips.
  - The rave music sounds great. But that's because EDM can be quite out there in terms of musical construction.
  - The guitar sounds weird because it sounds like chords a human hand can't make, in a tuning nobody tunes their guitar to, with a strange mix of open and closed strings that doesn't make sense. I think the restrictions of what a guitar can do aren't well understood by the model.
  - The disco chord progression is bizarre. It doesn't sound bad, but it's unlikely to be something somebody working in the genre would choose.
  - meditation music - I mean, most of that genre may as well just be some randomized process
  - drum solo - there's some weird issues in some of the drum sounds, things like cymbals, rides and hats changing tone in the middle of a note, some of the toms sound weird, it sounds like a mix of stick and brush and stick and stick and brush all at the same time...it's sort of the same problem the solo guitar has where it's just not produced within the constraints of what a drum player can actually do on an instrument made of actual drums
  - sound effects, all are pretty good, a little chunky and low bit-rate or low sample-rate sounding, there's probably something going on in the network that's reducing the rate before it gets built back up. There's a constant sort of reverb in all of the examples
  I honestly can't say I prefer their model over some of the musicgen output even if their model is doing a better job at following the prompts in some cases.
  All of the models have very low-bitrate encoding problems and other weird anomalous things. Some of it reminds me of the output from older mp3 encoders, where hihats and such would get very "swishy" sounding. You can hear some of it in the autoencoder reconstructions, especially the trumpet and the last example.
  However, in any case, I'm actually glad in some ways to see the progress being made in this area. It's really impressive. This was complete science fiction only a very few years ago.
  [-]
  - darkwater 505 days ago
    > - drum solo - there's some weird issues in some of the drum sounds, things like cymbals, rides and hats changing tone in the middle of a note, some of the toms sound weird, it sounds like a mix of stick and brush and stick and stick and brush all at the same time...it's sort of the same problem the solo guitar has where it's just not produced within the constraints of what a drum player can actually do on an instrument made of actual drums
    And I would say that there is also background noise from time to time, at some point I heard some noise akin to voices. Maybe it is some artifact caused by the training data (many drum solos are performed exclusively live).
- RobinL 505 days ago
  Here is a silly song I generated using suno.ai, which I have found to be incredibly impressive (at least, a small percentage of its outputs are very good, most are bad). I think it's good enough that most humans wouldn't realise it's AI generated. https://app.suno.ai/song/8a64868d-9dd3-46db-91af-f962d4bec8b...
  [-]
  - Agraillo 505 days ago
    Very good for my taste, but I should clarify, I'm obsessed with catchy tunes, as a listener and as a hobby musician, growing my own brainworms from time to time. And I must say that suno.ai is very impressive: in my case it yields semi-ready brainworms in roughly 30%-50% of cases. And what's more important, it's really an inspiration tool for all kinds of tasks, like lyrics polishing or playing along after track separation. Maybe catchy melodies are not for all, but who can argue with the charts when The Beatles, ABBA and Queen were almost always producers of them.
  - urbandw311er 505 days ago
    That’s impressive. Why do the printed lyrics for the second chorus differ from the audio? (Which repeats those from the first chorus)
    [-]
    - RobinL 505 days ago
      I generated the lyrics using ChatGPT 4 and the suno model attempts to follow them.
      It generally does a good job, but I have noticed it's fairly common in a second chorus for it to ignore the direction and instead use the same lyrics as the first chorus
      [-]
      - urbandw311er 504 days ago
        That’s fascinating, thanks for clarifying.
  - npteljes 505 days ago
    That is fantastic. It has a bit of weirdness in the background, but nothing that would stop me from enjoying it.
  - comex 505 days ago
    Wow. I’m guessing it’s generating MIDI or something rather than synthesizing audio from scratch? Even so, the quality of the score is leaps and bounds better than any of the long-form audio on the Stable Audio demo page (either Stable Audio itself or the other models). The audio model outputs seem to take a sequence of 1 to 3 chords, add a barebones melody on top, and basically loop this over and over. When they deviate from the pattern, it feels unplanned and chaotic and they often just snap back to the pattern without resolving the idea added by the deviation. (Either that or they completely change course and forget what they were doing before.) Yes, EDM in particular often has repetitive chord structures and basic melodies, but it’s not that repetitive. In comparison, from listening to a few suno.ai outputs, they reliably have complex melodies and reasonable chord progressions. They do tend to be repetitive and formulaic, but the repetition comes on a longer time scale and isn’t as boring. And they do sometimes get confused and randomly set off in a new direction, but not as often. Most of the time, the outputs sound like real songs. Which is not something I knew AI could do in 2024.
    [-]
    - Agraillo 504 days ago
      My understanding is that they use a side effect of the Bark model. The comment https://news.ycombinator.com/item?id=35647569 from JonathanFly probably explains it well. If you train your model on a massive amount of audio mixes of lyrics+music, then prompting lyrics alone pulls the music with it, much as that comment suggested that prompting context-correlated texts might pull in the background noises usual for such a context. Already while writing this I imagine training with a huge set of publicly performed poetry pieces that would allow generating novel performances of artificial poets with novel prompts. This is different from the riffusion.com approach, which relies on the genius idea of more or less feeding spectrograms as images to Stable Diffusion.
    - RobinL 505 days ago
      I don't have any special insight into how it works, but I suspect it is largely synthesizing audio from scratch. The more I've thought about it, the task of generating music feels very similar to the task of text-to-speech with realistic intonation. So feels like the same techniques would be applicable.
      Suno do have an open source repo here that presumably uses similar tech: https://github.com/suno-ai/bark
      > Bark was developed for research purposes. It is not a conventional text-to-speech model but instead a fully generative text-to-audio model, which can deviate in unexpected ways from provided prompts. Suno does not take responsibility for any output generated. Use at your own risk, and please act responsibly.
      I've generated probably >200 songs now with Suno, of which perhaps 10 have been any good, and I can't detect any pattern in terms of the outputs.
      Here's another one which is pretty good. I accidentally copied and pasted the prompt and lyrics, and it's amazing to me how 'musically' it renders the prompt:
      https://app.suno.ai/song/d7bad82b-3018-4936-a06d-8477b400aae...
      Here are a couple more which are pretty good (i use it primarily for making fun songs for my kids):
      https://app.suno.ai/song/a308ca8a-9971-47a3-8bb3-a95126ff1a8...
      https://app.suno.ai/song/3b78a631-b52a-4608-a885-94f2edc190b...
      And this one's kind of interesting in that it can render 'gregorian chant' (i mean it's not very good): https://app.suno.ai/song/0da7502b-73cf-4106-88e8-26f4f465a5f...
      But this is one reason it feels like these models are very similar to text-to-speech but with a different training set
- dcre 504 days ago
  One thing I noticed is that when it’s playing chords, it seems a lot more likely than human players to put both major and minor thirds in. This isn’t unheard of — the famous Hendrix chord in “Purple Haze” consists of root, major third, 7th, minor third. But it sounds pretty weird when you do it in every chord.
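  The chord described above can be spelled out as semitone offsets. The sketch below is only illustrative: the helper names are made up, and it assumes the usual reading of that chord as a dominant 7#9, where the "minor third" is the sharp ninth sounding an octave above the root.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets from the root: major third (4), minor seventh (10),
# and the minor third an octave up (15), i.e. the sharp ninth.
HENDRIX_CHORD = (0, 4, 10, 15)
MAJOR_TRIAD = (0, 4, 7)


def spell_chord(root, intervals):
    """Pitch-class names of a chord built on `root` from semitone offsets."""
    root_pc = NOTE_NAMES.index(root)
    return [NOTE_NAMES[(root_pc + i) % 12] for i in intervals]


def has_both_thirds(intervals):
    """True if the chord contains both a minor and a major third above the root."""
    pitch_classes = {i % 12 for i in intervals}
    return {3, 4} <= pitch_classes


if __name__ == "__main__":
    print(spell_chord("E", HENDRIX_CHORD))
    print(has_both_thirds(HENDRIX_CHORD), has_both_thirds(MAJOR_TRIAD))
```

  A check like `has_both_thirds` over transcribed chords would be one crude way to count how often a generated track leans on this clash, assuming a reliable transcription is available.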
- otabdeveloper4 505 days ago
  AI pictures are the same. We are more tolerant of six fingered-pictures with missing limbs, for some reason.
  [-]
  - lbourdages 504 days ago
    We're used to drawings, 3D renders, etc.
    There's no such thing as "artificial music" - at the very least, not since electronic music has become mainstream.
Jeff_Brown 504 days ago
Music without changes is boring. I enjoyed the much less stable results of OpenAI's Jukebox (2021?) more than any music AI to come since. Their sound quality is better but they only seem to produce one monotonous texture at a time.
[-]
- coldcode 504 days ago
  As a musician, I found the pieces unremarkable. Of course, a lot of contemporary music is forgettable as well, as people try to create songs that all sound like hits but, in doing so, create uninteresting songs. I wonder what music the model is based on. I suppose for game music/sounds, perhaps it's good enough?
emadm 504 days ago
This is part of a paper on the prior version of the model: https://x.com/stableaudio/status/1755558334797685089?s=20
https://arxiv.org/abs/2402.04825
Which outperforms similar music models.
The pace is accelerating and even better ones are coming with far greater cohesion and... stuff. Will be quite the year for music.
[-]
- emadm 504 days ago
  Particularly interesting with the scaled up version of https://www.text-description-to-speech.com
  Do try https://www.stableaudio.com for a rights-licensed model you can use commercially.
williamcotton 504 days ago
None of these tools are even remotely useful to me unless I can give grooves, chord changes and melodic themes.
It’s just a glorified loop library at this point!
nprateem 504 days ago
The problem with music generation is difficulty in editing. Photos and text can be easily edited, but music can't be. Either the piece needs to be MIDI, with relevant parameterisation of instruments, or there needs to be a UI that allows segments of the audio to be reworked, like in-painting.
[-]
- Jeff_Brown 504 days ago
  What's the easiest way you found for using AI to edit photos? I was just yesterday looking at the OpenAI DALL-E 3 API and it feels pretty limited. For instance, in the picture I have of a fisherman with too many fishing lines hanging down from his fishing rod, I'd like to just point it at the extra fishing lines and say make these go away, but there's no way to do that.
MrThoughtful 505 days ago
So many questions ...
They publish the code to train on your own music, but not the weights of their model? So you cannot just upload this thing to some EC2 instance and start creating your own music, correct?
Is this the same as https://www.stableaudio.com?
[-]
- nextworddev 505 days ago
  StabilityAI is just a marketing machine at this point that is praying for an acquisition, since the runway is diminishing
- alacritas0 505 days ago
  this sounds like progress, but it is still very bad except for highly repetitive music like the EDM examples they give, and even then, it still can't get tempo right
Wistar 504 days ago
A small point: Needs to be in something other than 44.1kHz. The two to which they make comparisons are at either 32kHz or 48kHz, both of which are friendlier for video work, something for which I think AI audio will be used a lot.
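For a rough sense of what moving between those rates involves, the sketch below computes the exact resampling ratio and the number of audio samples per video frame. The function names are made up for illustration, and 24 fps is just one example frame rate.

```python
from fractions import Fraction


def resample_ratio(src_hz, dst_hz):
    """Exact ratio of output samples to input samples when resampling."""
    return Fraction(dst_hz, src_hz)


def samples_per_frame(sample_rate_hz, fps):
    """Audio samples that span one video frame, as an exact fraction."""
    return Fraction(sample_rate_hz) / Fraction(fps)


if __name__ == "__main__":
    # 44.1 kHz -> 48 kHz is not a simple integer factor.
    print(resample_ratio(44_100, 48_000))
    # Samples per frame at 24 fps for each rate.
    print(samples_per_frame(48_000, 24), samples_per_frame(44_100, 24))
```

At 24 fps a 48 kHz track has a whole number of samples per frame while a 44.1 kHz track does not. At other frame rates the picture differs, so this shows only part of why 48 kHz is the usual choice for video.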
zdimension 504 days ago
The few examples I was able to play are very promising, unfortunately the host seems to be getting some sort of HN-hug, because all the audio files are buffering every other second -- they seem to throttle at 32 KiB/s.
slicerdicer1 505 days ago
obviously someone shadowy and non-corporate (eg. an artist) just needs to come out and make a model which includes promptable artist/producer/singer/instrumentalist/song metadata.
describing music without referring to musicians is so clunky because music is never labelled well. of course saying "disco house with funk bass and soulful vocals, uplifting" is going to be bland. Saying "disco house with nile rodgers rhythm guitar, michael mcdonald singing, and a bassline in the style of patrick alavi's power" is going to get you some magic
[-]
- ever1337 504 days ago
  so this model can only ever understand music which is classified, described, labelled, standardized. and recombine those. sounds boring, sounds like the opposite of what (I would like to believe) people listen to music for, outside of a corporate stock audio context.
3cats-in-a-coat 504 days ago
The reconstruction demo is in effect an audio compression codec. And I bet it makes existing audio codecs look like absolute toys.
TrackerFF 504 days ago
Now, if they can also generate MIDI tracks to accompany the audio, that'd be great.
That would add some much-needed levels of customization.
8n4vidtmkvmk 505 days ago
The music is pretty meh but the sound effects are exciting for indie game dev!
[-]
- AuryGlenz 505 days ago
  Too bad that, according to their page, you need an enterprise license even for indie games.
- nullandvoid 505 days ago
  <deleted>
m3kw9 504 days ago
Lots of work left to do man
seydor 504 days ago
Trying to describe music with words is awkward! We need a model that is trained on dance
[-]
- Jeff_Brown 504 days ago
  Or architecture.
exword76nick 503 days ago
rock band guitar lead solo performance
andrewstuart 505 days ago
I felt a great disturbance in the Force, as though all the music licensing lawyers in the USA cried out at once.
[-]
- shon 505 days ago
  Perhaps the disturbance you feel is actually the RIAA moving their Death Star into firing range of Stability.ai
  [-]
  - emadm 504 days ago
    stableaudio.com is fully licensed, music is an interesting area
    https://www.musicbusinessworldwide.com/stability-ai-launches...
    [-]
    - kouteiheika 504 days ago
      Serious question, I'd genuinely like to know - why?
      You didn't license the images when training Stable Diffusion, and yet you did for Stable Audio? In both cases the training should either be fair use and legal without any licensing, or be infringing and need licensing. Why is audio different than images? Am I missing something here?
      [-]
      - emadm 504 days ago
        Law for music is different to other media types
NoPedantsThanks 505 days ago
[flagged]
[-]
- frizlab 505 days ago
  I know right, what year is this?
  [-]
  - otabdeveloper4 505 days ago
    I do. It's year of the Google (c), like every year.
    (David Foster Wallace was wrong, there's no way a company of Google's caliber would settle for anything less than a whole decade.)
    [-]
    - shon 505 days ago
      Is it? To me it feels like Google is about where Microsoft was in 2002.
      Case in point: This thread about using anything other than Chrome…
- shon 505 days ago
  What’s your preferred browser?
  [-]
  - otabdeveloper4 505 days ago
    Anything that isn't Chrome.
  - DaiPlusPlus 505 days ago
    If I could have my way, NCSA Mosaic.
    [-]
    - shon 505 days ago
      Wish granted: https://archive.org/details/mosaic-ncsa-evolt_browsers
  - NoPedantsThanks 505 days ago
    [dead]
- XorNot 505 days ago
  Worked fine in Firefox.
  [-]
  - consumer451 505 days ago
    Also, worked fine in Safari mobile reader mode.
hansonpeter 505 days ago
[dead]
andbberger 505 days ago
wake me up when it can write a fugue
jpc0 505 days ago
> Warning: This website may not function properly on Safari. For the best experience, please use Google Chrome
Do better
[-]
- pmontra 505 days ago
  By the way, it does work on Firefox Android. No idea of what there is in Safari that's not standard in Chrome and Firefox.
- popalchemist 505 days ago
  Have you ever heard of an MVP?
  [-]
  - prmoustache 505 days ago
    That would be pertinent if it wasn't just a static web page with just text and some audio files to be played.
    [-]
    - zamadatix 504 days ago
      Reading about it, that ironically seems to be the exact problem Safari has. I mean, the page "works" in Safari; it's just that you get these really random delays at the start of some of the sounds, with all sorts of web discussion threads suggesting different ways to mitigate it on different platforms. I don't really fault them for having the goal of publishing a paper and going the extra bit to make a friendly but imperfect webpage, instead of being website creators who happen to publish papers on the side.
- Aachen 505 days ago
  ...and recommend Firefox
  is what you meant to say right? :)
webprofusion 505 days ago
Music is perfect for AI generation using trained models, because artists have been copying each other for at least the past 100 years and having a computer do it for you is only notionally different. Sure a computer can never truly know your pain, but it can copy someone else's.
ecmascript 505 days ago
Just a few days ago I was downvoted for stating that AI will be better at creating music than humans would be: https://news.ycombinator.com/item?id=39273380#39273532
Now this is released, and I feel I've got grist for my mill.
Sure, it still kind of sucks, but it's very impressive for a _demo_. Remember that this tech is very much in its infancy and it's very impressive already.
[-]
- larschdk 504 days ago
  I don't find this music to be good in any way. It sounds interesting over a few notes, but then completely fails to find any kind of progression that goes anywhere interesting, never iterating on the theme, never teasing you with subtle or surprising variation over a core theme, no build-ups or clear resolution. Very annoying to actually listen to.