What if we set GPT-4 free in Minecraft?


268 points | by ecliptik 15 days ago


  • Imnimo 15 days ago
    It's always fun to look at the prompts used by these projects. Here are a few snippets from this one:

    >You are a helpful assistant that tells me the next immediate task to do in Minecraft. My ultimate goal is to discover as many diverse things as possible, accomplish as many diverse tasks as possible and become the best Minecraft player in the world.

    >8) Tasks that require information beyond the player's status to verify should be avoided. For instance, "Placing 4 torches" and "Dig a 2x1x2 hole" are not ideal since they require visual confirmation from the screen. All the placing, building, planting, and trading tasks should be avoided. Do not propose task starting with these keywords

    >7) Use `exploreUntil(bot, direction, maxDistance, callback)` when you cannot find something. You should frequently call this before mining blocks or killing mobs. You should select a direction at random every time instead of constantly using (1, 0, 1).

    >9) Do not write infinite loops or recursive functions.

    You can really imagine the sorts of pitfalls the agent fell into that induced the authors to add these stipulations.

    • doctoboggan 15 days ago
      >9) Do not write infinite loops or recursive functions.

      did GPT-4 just solve the halting problem?

      • travisjungroth 15 days ago
        One description of the halting problem is that there are not just two categories of programs (halts and doesn’t), but a third of “undecidable”. To limit yourself to the category of “halts” isn’t solving the halting problem.
        • jrockway 15 days ago
          eBPF is a real-world system that also refuses to accept programs that might not halt. You don't have to solve the halting problem, you just have to deny the programmer loops (and loop-alikes, of course; recursive functions, jmp, etc).
          • dragontamer 15 days ago
            IIRC, some language model proved that one-layer of loops can be proven to halt vs not-halt.

            As soon as you add a 2nd layer of loops however, you reach Turing-completeness and suddenly the halting problem becomes unsolvable.


            So you don't need to deny jmp/loops. You just need to deny _nested_ loops. And... find that old paper I read like 15 years ago to figure out the details to discriminate halt/not-halt in the single-layer loop language.

            • jakear 15 days ago
              This is most assuredly false:

                  while (state !== STOP):
                     read = TAPE[index]
                     [state, write, dir] = TABLE[read, state]
                     TAPE[index] = write
                     index = index + dir
            • vidarh 14 days ago
              It takes much more. You need to e.g. limit APIs you can call to ones that are also guaranteed to halt, and guarantee an exit condition that is guaranteed to occur, which means either seriously limiting input or adding additional implicit conditions.

              E.g. this Ruby program is undecidable:

              This one using a hypothetical gets guaranteed to eventually halt is still undecidable:

                  while getsWithTimeout != "halt"
            • napsterbr 15 days ago
              You could ask ChatGPT to find the paper for you! It told me the name of an old videogame I was trying to remember for years.
          • insanitybit 15 days ago
            Microsoft's implementation of eBPF allows for loops :D


            • deredede 14 days ago
              For loops (where the user is not allowed to change the value of the iterator mid-iteration) are guaranteed to terminate.
              • fardo 14 days ago
                Or create more loops, since for example adding

                For (;;) { print(“hahaha you didn’t say the magic word!”); }

                In another loop would also prevent termination without program shutdowns.

                • deredede 13 days ago
                  Don't let weird C syntax choices fool you, this is a while loop, not a for loop.

                  To clarify, when I said "for loop", I meant what is sometimes called a "counted for loop" (or simply "counted loop"): there is an (maximum, if you allow early exit) iteration count that is computed before executing the loop and can't increase later.

                  In C syntax, it is for (int i = 0, e = ...; i < e; ++i) { ... } and the body of the loop is not allowed to change the value of either i or e.

                  Edit: actually I may have been unclear. When I said "for loops are guaranteed to terminate", in the context of the discussion, I meant "if the only kind of loops you allow are (counted) for loops in a language where loop-free expressions are guaranteed to terminate, you get a language where all expressions are guarantee to terminate". So loops can contain other loops, as long as they are all of the "counted for loop" kind.

              • insanitybit 14 days ago
                Yeah, I just wanted to point out that it's possible in ebpf.
                • deredede 13 days ago
                  Fair enough :) Although skimming the paper you linked, they say they don't check termination of loops, which surprises me.
        • wendyshu 15 days ago
          If it's undecidable that means it doesn't halt.
          • sharkbot 15 days ago
            Incorrect. The program under inspection could halt or could not halt; undecidability is a statement about Turing machines inspecting that program and unable to make a determination (via a clever proof by contradiction).

            The Halting Problem is a truly interesting result, and for the most part uninteresting in practice.

          • bmacho 14 days ago
            Why is this downvoted? It can't happen that you run an undecidable program, and it halts, since its running would be a proof that it halts.

            Say your Turing machine is searching for a contradiction in ZFC. It enumerates all the possible proofs, and checks if they are valid and they prove 0=1. You can prove in ZFC that you can't prove that it halts, nor you can prove that it doesn't halt.

            Now your Turing machine won't halt, according to ZFC. (It can halt in practice, if ZFC has a contradiction.) Same for any other undecidable programs.

            • imtringued 13 days ago
              Problems which cannot be solved by an algorithm/program are considered undecidable as well. Saying that something that doesn't exist, doesn't halts makes no sense in the general case.
              • bmacho 13 days ago
                The thread is about programs. Existing programs do exist. There are problems not solvable by Turing machines, like the meaning of life, but this thread is not about them.

                GGP was talking about programs halting, not halting, and "something else", which GGP called "undecidable".

                There are programs that ZFC cannot prove if they halt or not. For example, searching for a contradiction in ZFS is such a program. Its undecidability means that there is no sequence of axioms of ZFC that ends with "this program halts" or "this program does not halt".

                But then this program does not halt, since its halting would mean that ZFC can prove that this program halts.

                In any case, programs can halt or not halt. There are programs that ZFC cannot prove that they halt or not, but those programs do not halt.

                • SAI_Peregrinus 12 days ago
                  > But then this program does not halt, since its halting would mean that ZFC can prove that this program halts.

                  That step is not logically consistent. It could halt, it's just that ZFC can't prove it will ever do so. For example, the program which computes the 8000th busy beaver number halts (per definition of the busy beaver function), but is undecidable in ZFC[1].

                  An undecidable program can halt. An undecidable program can run forever. But whatever axiom system is being used to decide that can't prove which (without running the program, potentially for an infinite number of steps).

                  [1] https://scottaaronson.blog/?p=2725

                  • bmacho 12 days ago
                    > For example, the program which computes the 8000th busy beaver number

                    I don't believe that there exist such a program.

                    • SAI_Peregrinus 12 days ago
                      Why not? That implies that an 8000 state Turing machine which eventually halts when started with a blank tape doesn't exist. If any of them do (and it's trivial to simulate one), then exactly such a program can exist.
                      • bmacho 11 days ago
                        Well, there can indeed exist a program that writes out the number BB(8000). (In the sense that it is consistent with ZFC that there is a program that writes out the number of steps of the smallest contradiction in ZFC.)

                        What I don't believe that there exist a program that is a proper counter-example, that is, its halting is undecidable in ZFC, and it halts. Exactly because what I wrote earlier.

          • enugu 14 days ago
            Actually, this comment makes sense and is used in logic to derive true but non-provable propositions withing the current axiom system assuming it is sound.

            The problem is that you dont know if the checker that you use to detect if a program halts will itself halt on your given input or continue forever. But given a sound checker with some assumption, one can find non-halting programs which wont be detected by the checker using the diagonalization trick.

          • kaba0 14 days ago
            n = 1;

            while (isFamousMathProblemWeDontKnowWhetherItHoldsForAllTheIntegers(n)) { n++ }

      • geoelectric 15 days ago
        Seems slightly easier for me to guarantee something I wrote halts. The halting problem is more about a process analyzing someone else's arbitrary code.
        • cyanydeez 15 days ago
          Is gpt conceptually always analyzing someone else's code?
          • IIAOPSW 15 days ago
            Are you conceptually always analyzing some stackoverflow code?
      • moffkalast 15 days ago

      • hgsgm 14 days ago
        Is there a name for the phenomenon where people who understand a concept don't talk about it because it isn't important or relevant, but people who don't understand think it's a big deal?
      • jhanschoo 14 days ago
        As others mentioned, it's trivial* to write useful programs that provably halt; the impossible task is to determine if an arbitrary program halts.

        *: Trivial in a mathematician's sense

      • redox99 15 days ago
        It's trivial to avoid infinite loops by adding some exit condition in case it takes too long, which is probably what you want here.

        And obviously trivial to avoid recursion too.

        • Too 14 days ago
          Avoiding recursion isn’t always that trivial if there are intermediate functions between the first function is called again. More dynamic code with callback functions is another case making this difficult.
      • nr2x 15 days ago
        Probably the best programming advice any anyone could get.
      • bentcorner 15 days ago
        Every program halts because no computer can run forever
      • narrator 15 days ago
        If the brain is like a computer, how do humans even solve the halting problem? Maybe it's because humans feel anxiety at the unproductive passing of time while computers will just do what they're doing forever.
        • jameshart 15 days ago
          Humans can’t solve the halting problem.

          Write a program that, for a positive integer, runs the Collatz process on it (if even: halve it; if odd, multiply by three and add one; repeat).

          If the process results in a 1, move on to the next number and repeat.

          If the process produces a number it produced previously, halt. (Worried you’ll need unbounded state for this part? Use tortoise+hare, it’s fine).

          This program halts if and only if the Collatz conjecture is false.

          Now, the Collatz conjecture might not be unprovable. But for now no human can tell you whether or not that program halts.

          • genewitch 14 days ago
            When I read about collatz I wrote a C# program that short circuits the calculation like that, but my programming chops are not great and it could only handle 3000 digits of memory serves. It's somewhere on my GitHub page and I'm curious how many other people released similar code to speed things up.

            I ended up using it to compare single core performance on any windows machine, because the timestamped logging was deterministic. Rewrote it in python and still use it these days.

        • umanwizard 15 days ago
          Humans can’t solve the halting problem any more than computers can.

          For example, it’s trivial to write a program that searches by brute force for a cycle that would disprove the Collatz conjecture, and halts if it finds one. No human knows whether that program will halt.

        • ethanbond 15 days ago
          They don’t really. People have different thresholds for acceptable exploration costs and heuristics for approximating those costs (as well as the opportunity cost of alternative courses of action).

          IMO, in biological systems the explore vs exploit tradeoff is pretty analogous to the halting problem. There doesn’t seem to be an optimal general “solution” to it.

          Once an organism is very familiar with its environment it’ll approximate near-optimal trade offs, which would suggest it’s just using heuristics.

        • pjerem 14 days ago
          > If the brain is like a computer

          That’s a common misconception. The brain is not like a computer. The brain can’t store and execute programs.

          • shkkmo 14 days ago
            > That’s a common misconception. The brain is not like a computer. The brain can’t store and execute programs.

            Can't it? The brain is absolutely capable of emulating a turing complete system such as running a program.

            • pjerem 13 days ago
              Can it ?

              I mean, sure you can reason about a portion of code on your screen but there is no way you could emulate anything without some visual support.

              There is no way your short term memory could store the program and the variables. The human brain can barely remember 5 to 10 words for several seconds and you have no control on your long term memory.

              So yes your brain can somehow emulate a computer if you give him a pen, a sheet of paper paper, time, and a lot of sugar. But that’s not because it’s functioning like a computer but rather because you learnt how a computer work.

          • kaba0 14 days ago
            The halting problem is about computable functions. Every real thing can at most solve computable functions — so for all practical purposes, a brain is a computer that has all the same limits.
        • IIAOPSW 15 days ago
          Nothing you think about consciously has higher thread priority than the stay-alive loop.
        • i2cmaster 15 days ago
          Humans tend to communicate/write software in what are really sub languages. Often these sub languages are not recursive.
    • qayxc 15 days ago
      In the end it all looks like classical programming with extra steps to me :)
      • Imnimo 15 days ago
        Even more than the complex prompts, they give the bot access to a variety of functions implementing primitive actions, which are crafted to give helpful hints if the bot is struggling. For example, the code for mining a block:


        This function keeps a global count of how many times it's been called to mine a block that doesn't exist nearby, and will warn the bot that it should try exploring first instead.

        There's nothing necessarily wrong with all that - it's an important research question to understand how much hand-holding the agent needs to be able to do these sorts of tasks. But readers should be aware that it's hardly dropping an agent into the world and leaving it to its own devices.

        • sterlind 15 days ago
          I do love their use of `bot.chat()` as, like, throwing an exception to the AI. maybe this finally gives developers an incentive to document their code and throw intelligible errors - the machines need it to figure out what they're doing wrong!
          • ineedasername 15 days ago
            ChatGPT: I am sorry for the confusion. I am a large language model (LLM) trained on natural language and my training data cutoff is September 2021. Hexadecimal references to memory addresses are not part of my training data set.”
      • uw_rob 13 days ago
    • tmountain 15 days ago
      How did you get the prompt?
    • kodah 15 days ago
      It would be really cool if they simplified an NPCs world to include GPT prompts for an RPG-like experience.
  • thesuperbigfrog 15 days ago
    At first glance this just seems like an alternate approach at building expert systems.

    The Minecraft videos are impressive.

    Nethack (https://www.nethack.org/) has been used for AI development in the past and more recently:







    I am curious how well Voyager would do in Nethack.

  • smcl 15 days ago
    That opening sentence is a very funny statement when you take into account that "...in Minecraft" is a way some YouTubers hide hyperbolic/unserious statements (to skirt TOS violations). Like after 100 deaths to Malenia in Elden Ring: "Oh fuck me, I might as well kill myself ... IN MINECRAFT"
    • opan 15 days ago
      I thought the same thing, although I think of the meme as being used to avoid feds and legal problems, like saying you'll kill someone "in Minecraft" to make it not a realistic-sounding threat.
      • TylerLives 15 days ago
        • smcl 15 days ago
          Yeah a few people got banned from places using it, and also tried "...in Roblox". I think I remember seeing "... Roblox [your|my]self" as a euphemism too.
          • GauntletWizard 15 days ago
            Euphemisms are a treadmill, and modern "content policies" have only accelerated it. There was a time not long ago that "retarded" was what kids called each other, and now my phone won't type that word. Every euphemism and hyperbolic statement will eventually be taken seriously, and a new one will be created.
      • IIAOPSW 15 days ago
        Its the kiddie version of "asking for a friend".
        • Gigachad 14 days ago
          Minecraft players are in their 20s these days.
  • lsy 15 days ago
    It's not playing in-context of minecraft, it's playing in-context of an API to minecraft. You can see one of the limitations in its error condition when it tries to craft an "acacia axe" out of acacia planks and sticks, fails, and then replaces all the references to "wooden axe". Of course in the real world it doesn't matter what you call the axe you made, and it's pretty clear what an acacia axe is. Even if it did matter, you could also easily keep the function name and output message, and just make an "wooden axe" behind the scenes. The fact that the GPT is so tightly bound to the formalism of the API is an indication that this is a task the GPT can likely do quite well as this API is well-used and documented.
  • blooalien 15 days ago
    Where I think it's gonna start to get really scary, and much more closely approaching "real A.I" / AGI is when they start augmenting and wiring together various differing forms of "A.I." with each other. GPT-4, no matter how impressive it might appear on the surface is still "just" a large language model. Augment it with other types of learning models and at some point you might just hit on the right combination for it to start some form of actual "reasoning" thought or "creativity".

    As long as they're all still "special" single-purpose systems (LLM is about processing and responding to language input for example, CV / Computer Vision models specialize in operating on visual or image inputs, etc.), that's all they'll ever be, no matter how good they get at pretending they're more.

    • hahajk 15 days ago
      Isn't GPT-4 multimodal? I remember it designing a website based off of a sketch during the initial demo.
      • rst 15 days ago
        Theres a multimodal variant, but it's not widely available yet.
      • ShamelessC 15 days ago
        It is.
    • fnordpiglet 14 days ago
      This is happening. These projects are examples of feedback loops. While the models aren’t playing directly they are receiving feedback and iteratively improving. Constraining and optimizing using classical AI is an obvious next step. I agree this is when the magic happens.
    • MagicMoonlight 15 days ago
      A LLM is a form of input like a keyboard. If they put it in front of something like siri and used it as input instead of a processor then you could make siri actually functional.

      It’s very good at understanding text but by itself it can’t think. Turning text into commands it knows is doable.

  • jmugan 15 days ago
    I wish there was a summary of how this worked. I see the abstract and lots of figures and movies, but I still don't get a good sense of what exactly the algorithm is. I even skimmed the whole paper.
    • ShamelessC 15 days ago

      You just scroll down a tiny bit on the twitter page and get this nice video and summary from the author.

      > Voyager has 3 key components:

      > 1) An iterative prompting mechanism that incorporates game feedback, execution errors, and self-verification to refine programs;

      > 2) A skill library of code to store & retrieve complex behaviors;

      > 3) An automatic curriculum to maximize exploration.

      • jmugan 14 days ago
        Right, but I wanted to know what each of those things was.
  • ChrisArchitect 15 days ago
  • nitwit005 15 days ago
    I searched for the word "infinite", and my suspicion was quickly proven correct:

    > 9) Do not write infinite loops or recursive functions.

    > Sometimes GPT-4 will write an infinite loop that runs forever.

    • csours 15 days ago
      do not write infinite loops
      • lsy 15 days ago
        I can only imagine the desperation of the researcher pleading with a computer to do its best to solve the halting problem.
  • codeulike 15 days ago
    This is like its writing a bot to play minecraft?

    I'd like to see a visual/language model/AI that learns to play minecraft as an actual inhabitant of the game. i.e. processing visual input, recognising objects, working out whats going on, learning how to move around. Learning how to make food and avoid monsters. It would be an 'Embodied AI' within the world of Minecraft.

    The language part would allow us to talk to this being. You could ask it things like:

    "Do you prefer to make a house, or dig a cave?"

    "What are your hopes for the future?"

    "Is there a recent achievement you are particularly proud of?"


    • AlecSchueler 15 days ago
      You could ask it those things but it will tell you that it doesn't have feelings, preferences or hopes. You'd also need to give a reason to do anything. Eventually you get back to the point where you're "writing a bot" but with behaviour closer to how you imagine it should be.
    • krapp 15 days ago
      >I'd like to see a visual/language model/AI that learns to play minecraft as an actual inhabitant of the game. i.e. processing visual input, recognising objects, working out whats going on, learning how to move around. Learning how to make food and avoid monsters. It would be an 'Embodied AI' within the world of Minecraft.

      There is already an AI VTuber, Neurosama, trained to "play" games, including Minecraft (also OSU! and Among Us.)

      I don't think she's learned lava is bad yet.

  • tehsauce 15 days ago
  • slg 15 days ago
    And yet it still digs straight down.
    • neura 15 days ago
      Yeah... was wondering if it eventually learns that digging straight down is more likely to kill you than any other direction or if it could ever learn to down down only when standing on the edge of another block, instead of the one you're breaking.
      • yazaddaruvala 15 days ago
        They don't explicitly talk about curriculum improvement based on negative reinforcement. It would be relatively straight forward to do tho.

        Perform self-reflection every time damage is taken (the iterations of self-reflection can depend on % health lost). Something like "Please output as a list of general guidelines / best-practices that any Minecraft player can use in the future. For each guideline add a risk profile that is introduced if the guideline is neglected. Based on the following game log over the last 100 steps, what should the Minecraft player have done differently to avoid taking damage? \n <game log>"

        And keep storing those guidelines over multiple iterations. If there start to become too many best-practices, ask GPT-4 to "Please output as a list of general guidelines / best-practices that any Minecraft player can use in the future. For each guideline add a risk profile that is introduced if the guideline is neglected. Take the guidelines below and summarize them. Prioritize retaining more detail about the guidelines with a higher risk profile, and do merge guidelines if possible and appropriate. \n <old guidelines>"

        • zimpenfish 14 days ago
          > Perform self-reflection every time damage is taken

          Given the number of times I've taken damage in Minecraft from a Creeper that has seemingly just appeared behind me, I expect this feedback loop to pretty quickly end up with a bot that does a full surroundings scan after every action to make sure there's no Creepers around.

    • js8 14 days ago
      It also fights an Enderman with a shovel, how smart.
  • sdenton4 15 days ago
    There's untold billions of tokens of Minecraft related content on the web - the controlling LLM personally has memorized every Minecraft strategy guide ever written.
  • Geee 15 days ago
    Very interesting. Here LLM writes code that plays Minecraft. If software 2.0 is neural networks, then software 3.0 is code written by neural networks.
    • ablyveiled 15 days ago
      And software 4.0 is sticks and stones. har har.
  • Fgehono 14 days ago
    Crazy how fast the peaces fall in place.

    A big text corpus gives quite a huge amount of context.

    SD also got better through this

  • mensetmanusman 15 days ago
    It will set out to build a library that contains all the sources that it quotes but that only exist in the multiverse.
  • iinnPP 15 days ago
    I imagine this is being worked on for OSRS as well. Exciting times. Terrifying times.
    • drekk 14 days ago
      It's interesting you mention OSRS because some bots are using ChatGPT (or other LLMs) to respond to players and it's incredibly effective. Only having to deal with conversations seems like a perfect use-case for the technology


  • phendrenad2 15 days ago
    I'm imagining Twitch Plays Pokemon but even less coherent.
  • busseio 15 days ago
    I’d like to build something like this, but for Robot Odyssey.
  • thdespou 14 days ago
    Prompt engineering at it's finest form.
  • mbgerring 15 days ago
    Why? Who wants this? To what end?
    • mkaic 15 days ago
      > Why?

      Because the researchers found it to be an interesting problem. Using an AI to beat Minecraft has been an active benchmark in the ML world for some time now.

      > Who wants this?

      Me, as well as the many others who have liked and shared the paper.

      > To what end?

      Furthering our ability to create autonomous agents. If we can get it to work in Minecraft, that's one step closer to getting it to work in real life.

  • tommywiseausmom 15 days ago