Cessation of public development of Kefir C compiler

(kefir.protopopov.lv)

96 points | by f311a 5 hours ago

15 comments

  • kator 4 hours ago
    > Yet, this shift made me re-evaluate the open source code publishing. Prior to that, I have been positive about free and open software, and considered this to be the default mode for work such as kefir. I did not require any justifications from myself to publish something. Now, however, I feel more and more that the main beneficiaries of my unpaid work are companies scraping the internet to train large language models. Currently accepted status quo in this area goes against my own intentions in licensing this work under GNU GPLv3. Publication has ceased to be the "null hypothesis" for me, and requires explicit mental justification which I am not able to provide.

    I feel this pain: one of my small donation-driven sites has been destroyed by crawlers that just ignore robots.txt and burn the site into the ground.

    Sort of jokingly I proposed an update to the "spam fax" law:

    https://www.karlbunch.com/random/website-protection-act/

    • account42 2 hours ago
      This is essentially the digital world transforming from a high trust society into a low trust one. Sad to see.
      • Gormo 18 minutes ago
        To whom would you attribute the greater part of that reduction in trust: the people using FOSS to train LLMs, or the people trying to block them?
    • malwrar 3 hours ago
      Really hate to say it, but I’ve stopped publishing my work too for this reason. I spend most of my time now building my own little software ark, and I aspire to no longer think of programming in the next few years. I feel like the creative economy in general will be unrecognizable in the near future, maybe nonexistent. I wonder what modes of collaboration on ideas might form in the next few years.
      • irdc 2 hours ago
        Here is what the purveyors of AI don't seem to realise. You can bend copyright law all you want in order to train your models on whatever you can grab, but in the absence of genuine protection of their creative work authors are simply not going to be publishing at all.
        • buran77 13 minutes ago
          I think they see it all too well. They still think they can make bank today while it lasts, whatever comes after is some other shareholder's problem. And if we're talking about open source, killing it might be a positive side effect, they'll be ready to sell you a closed source alternative when you no longer have options.
        • egypturnash 31 minutes ago
          People who are making stuff because they want to share it are still going to be publishing. And fighting to be noticed in an unending torrent of slop.
          • irdc 23 minutes ago
            Without any material or immaterial benefits? And with one's work being ground up and turned into weights for the next version of the machine that's threatening one's employment?
        • dzhiurgis 2 hours ago
          Great. More work for AI then.
      • kator 25 minutes ago
        The sad thing is I feel trapped on all sides of the debate. I wrote a book about LLMs and human creativity (spoiler: humans win for a long time). I was going to do it as a blog series, but instead I published https://www.amazon.com/dp/B0GXCSY4W8 because I felt I might at least get a bit back for the literally hundreds of hours of my life I poured into the book, and for my editor and the friends who read it and provided reviews.

        And I push a lot of open source code, including a ton for the SWGEmu project, but now I’m of mixed mind about whether to stop pushing anything public. I can’t decide. Am I talking out of both sides of my mouth? It’s a confusing time to navigate for sure.

    • jagged-chisel 3 hours ago
      > The sender pays, not the receiver.

      You have a hole here. Your web server is sending the response and the bot is receiving.

      Fix that and … profit? :-)

      • kator 24 minutes ago
        oh good point got that backwards… OMG my fax brain didn’t even think about it.
      • wizzwizz4 1 hour ago
        I'm trying to compose a better wording, but my attempts aren't working. The best I've got is:

        > The initiator of the communication pays, not the server operator.

  • keyle 55 minutes ago

    > This project in particular has been unconcerned with new coding practices so far, primarily, because I derive pleasure from hand-written implementations of my ideas, and believe that overcoming challenges the hard way is the main value I get from it.
    
    This is 100% the same for me. Outside of work, where speed is more important than quality and I work with people who use AI, I don't use AI at all on my own projects. It poisons the mind and the soul. OK, that sounds dramatic, but I felt down up until the point where I started hand-writing everything again. Software engineering is still fun and powerful, and to hell with where the world is going.
  • binaryturtle 1 hour ago
    I'm also very hesitant to release any new works (code, artworks, etc.) to the public. I usually release code under the GPL or AGPL, but I don't think any of those choices are properly respected by the AI crawlers, and subsequent "mixing into" those models.

    Multiple times I got partially broken "citations" of GPL-licensed code out of the models as answers to basic research questions (aka prompts), without any mention of the original license applied to the code. Just adding some random bugs every 10th line doesn't make it not a direct derivative. Image generators happily generated Sonics or Bart Simpsons (without my directly prompting for that either), with no mention that those are copyrighted characters.

  • rurban 1 hour ago
    One of the very few small compilers that pass the full gcc torture tests. For me kefir is good enough as the reference small compiler: not as fast as tcc, but more correct.
    • paufernandez 25 minutes ago
      I've been taking a look at the source and it's a work of art :O
  • rgoulter 3 hours ago
    Seems to me LLMs have changed some things. I'm not sure how it's best put, but it used to be:

    - Seeing code (or a blogpost or whatever) was a result from effort where thought had gone into it. The writer paid effort so the reader didn't have to.

    - There'd be some level of attachment to what you've put effort into.

    With LLMs, that's undermined: it's easy to produce thoughtless imitations. Code or comments where thought didn't go into it. So, seeing some result isn't an indication of skill, but also not even an indication thought went into it.

    I guess there's still something lost if someone isn't going to share code they've put thought into. -- But on the other hand, if it's just for me & I don't have to share it with a wider audience, getting LLMs to write out code isn't so expensive.. so code itself isn't necessarily something to value so much.

    • f6v 15 minutes ago
      I don't know... I've been writing code for a good twenty years (15 professionally).

      First, I think it's the best time to write software, since so much boring stuff can be automated. I can put my thoughts into what I'm trying to achieve instead of how. To put it another way, I think about the big picture much more than about mundane details like dealing with the particularities of a programming language.

      Second, most people were using SO to solve just about any issue they had. The number of developers producing truly original code was minimal even 10 years ago.

    • irdc 2 hours ago
      But LLMs don’t seem particularly good at inventing new ways to code (or write, or…). It’s literally all derivative. So what happens in 10 years? Are we headed for a great stagnation?
      • rgoulter 54 minutes ago
        > But LLMs don’t seem particularly good at inventing new ways to code (or write, or…). It’s literally all derivative.

        I think the key part is how much thought goes into something.

        Optimistically, LLMs are good at taking unstructured input, and (probably) producing the intended output from that. -- This allows for an interesting new way of coding: a set of instructions doesn't need to be as rigorous as a shell script, but can be natural language.

        That part surely extends creativity. An LLM will be familiar with domain ideas I'm not, even if an LLM is completely disinterested in doing things.

        Pessimistically, I think it's still not clear what the right way of interacting online with all of this is (other than clear expectations of "no AI")... in some sense LLM output is worthless to share, in the sense that I'm just as capable of asking the LLM to output something as anyone else is.

      • dzhiurgis 2 hours ago
        It’s like arguing that nobody is going to invent new ways to ride horses in the age of the automobile.
        • irdc 1 hour ago
          If the way humanity advances were via new ways to ride horses, then yes.
        • asibahi 1 hour ago
          You made me curious. Has anyone invented new ways to ride horses in the age of the automobile?
  • RetroTechie 1 hour ago
    So how big is the community around this project?

    If it's a one-person show, would closing it up effectively kill it? Or (re?)turn it into a hobby project developed at a snail's pace?

    If some community exists: fork coming up?

  • turtleyacht 5 hours ago
    It was nice hearing about it. If this is a healthy direction for the project, then so be it. At least source to previous versions is still available.
  • Max-Ganz-II 3 hours ago
    I put my site behind a username/password wall, to block LLM bots.
    • Xirdus 3 hours ago
      Spambots learned to autoregister 30 years ago. Do LLMs not do that? Crazy.
    • krystalgamer 3 hours ago
      same, not worth having 100GB of content scraped every other day.
  • altmanaltman 3 hours ago
    What a well-rounded, nicely written announcement that touches on all parts of the argument without any rage baiting or flexing. It would be easy to just ramble against AI and how it's the end of the world, but the author focused on a point that's not even related to the use or misuse of AI in software, but rather to how we have made it acceptable for large corporations to skirt copyright without any issue and make rivers of money doing it. This problem extends not only to coding but to other industries as well.
    • snarfy 31 minutes ago
      Instead of a derivative work we have a machine that creates derivative works. I fail to see how this is fair use.
  • 34aSHGAS 25 minutes ago
    That a function which is at its core literally trained to be as close to its input as possible is not (yet, court cases are still pending) IP theft is one of the great mysteries of our time.

    Worse, because the sometimes valuable real time answers are generated by scraping the web and rewriting the IP in plain sight.

    A couple of academic psychopaths who write horrible academic code themselves steal all valuable human knowledge right before our eyes and market it as "tech".

    There should be a new civil war against these modern plantation owners and slave holders.

  • bjourne 3 hours ago
    People taking your work and not giving anything back was ALWAYS the risk you took when writing free software. LLM training doesn't change that much. That the US military is no doubt using gcc to compile embedded software for its ICBMs surely irks the GNU people. But you can't have it any other way. "You can only use my software for good things" just is not consistent with "free software".
    • Gormo 20 minutes ago
      Yeah, I really can't comprehend these sentiments as anything other than an "I don't like AI" argument. FOSS has always been about just writing code and putting it out into the world where others can do as they please with it.

      I see a lot of risks involved in people surrendering their own decision-making to LLMs, but that's a question of how they're used, not how they're trained. The idea that using FOSS software to train LLMs is somehow a violation of FOSS norms just doesn't seem valid.

      • xigoi 18 minutes ago
        > FOSS has always been about just writing code and putting it out into the world where others can do as they please with it.

        Not true. Most FOSS licenses require attribution and many require derivatives to be released under the same license.

        • Gormo 15 minutes ago
          Sure, but I guess I'm not seeing the relevance here. Are we seeing some greater-than-normal wave of people redistributing FOSS code without attribution, or creating derivative works without adhering to the license terms? LLM training doesn't seem to be either of these things.
    • xigoi 19 minutes ago
      Before LLMs, you could use the GNU GPL or other copyleft licenses to protect your code from being used to develop non-free software. Unfortunately, the courts have decided that LLMs are free to ignore licenses.
    • TheOtherHobbes 2 hours ago
      There's an almost intergalactic level of irony in the extent to which open source has benefited giant corporations and the military at the expense of individuals, and ultimately contributed to the commercialised enclosure of software IP.

      I suppose you could argue it also indirectly led to the empowerment of non-developers to create their own vibe coded solutions. But we're not quite there yet.

      And the AI IP that makes that possible is still enclosed rather than open.

      • Gormo 24 minutes ago
        > There's an almost intergalactic level of irony in the extent to which open source has benefited giant corporations and the military at the expense of individuals, and ultimately contributed to the commercialised enclosure of software IP.

        Could you perhaps explain that irony a bit more explicitly?

        Can you provide any examples of "commercialized enclosure of software IP" somehow backwashing into the FOSS ecosystem and closing things up that are already open?

      • nine_k 1 hour ago
        Aren't open-weight models sort of returning the favor?
      • fragmede 2 hours ago
        > But we're not quite there yet.

        Judging from the number of projects I've seen from people who aren't software developers, we're there enough.
