What if we set GPT-4 free in Minecraft?

(twitter.com)

268 points | by ecliptik 766 days ago

23 comments

Imnimo 766 days ago
It's always fun to look at the prompts used by these projects. Here are a few snippets from this one:
>You are a helpful assistant that tells me the next immediate task to do in Minecraft. My ultimate goal is to discover as many diverse things as possible, accomplish as many diverse tasks as possible and become the best Minecraft player in the world.
>8) Tasks that require information beyond the player's status to verify should be avoided. For instance, "Placing 4 torches" and "Dig a 2x1x2 hole" are not ideal since they require visual confirmation from the screen. All the placing, building, planting, and trading tasks should be avoided. Do not propose task starting with these keywords
>7) Use `exploreUntil(bot, direction, maxDistance, callback)` when you cannot find something. You should frequently call this before mining blocks or killing mobs. You should select a direction at random every time instead of constantly using (1, 0, 1).
>9) Do not write infinite loops or recursive functions.
You can really imagine the sorts of pitfalls the agent fell into that induced the authors to add these stipulations.
- doctoboggan 766 days ago
  >9) Do not write infinite loops or recursive functions.
  did GPT-4 just solve the halting problem?
  - travisjungroth 766 days ago
    One description of the halting problem is that there are not just two categories of programs (halts and doesn’t), but a third of “undecidable”. To limit yourself to the category of “halts” isn’t solving the halting problem.
    - jrockway 766 days ago
      eBPF is a real-world system that also refuses to accept programs that might not halt. You don't have to solve the halting problem, you just have to deny the programmer loops (and loop-alikes, of course; recursive functions, jmp, etc).
      - dragontamer 766 days ago
        IIRC, some language model proved that one-layer of loops can be proven to halt vs not-halt.
        As soon as you add a 2nd layer of loops however, you reach Turing-completeness and suddenly the halting problem becomes unsolvable.
        -------
        So you don't need to deny jmp/loops. You just need to deny _nested_ loops. And... find that old paper I read like 15 years ago to figure out the details to discriminate halt/not-halt in the single-layer loop language.
        jakear 765 days ago
        This is most assuredly false:
        while (state !== STOP):
            read = TAPE[index]
            [state, write, dir] = TABLE[read, state]
            TAPE[index] = write
            index = index + dir
        vidarh 765 days ago
        It takes much more. You need to e.g. limit APIs you can call to ones that are also guaranteed to halt, and guarantee an exit condition that is guaranteed to occur, which means either seriously limiting input or adding additional implicit conditions.
        E.g. this Ruby program is undecidable:
        gets
        This one using a hypothetical gets guaranteed to eventually halt is still undecidable:
        while getsWithTimeout != "halt"
        end
        napsterbr 766 days ago
        You could ask ChatGPT to find the paper for you! It told me the name of an old videogame I was trying to remember for years.
      - insanitybit 765 days ago
        Microsoft's implementation of eBPF allows for loops :D
        https://vbpf.github.io/assets/prevail-paper.pdf
        deredede 765 days ago
        For loops (where the user is not allowed to change the value of the iterator mid-iteration) are guaranteed to terminate.
        fardo 765 days ago
        Or create more loops, since for example adding
        for (;;) { print("hahaha you didn't say the magic word!"); }
        in another loop would also prevent termination without program shutdowns.
        deredede 764 days ago
        Don't let weird C syntax choices fool you, this is a while loop, not a for loop.
        To clarify, when I said "for loop", I meant what is sometimes called a "counted for loop" (or simply "counted loop"): there is an (maximum, if you allow early exit) iteration count that is computed before executing the loop and can't increase later.
        In C syntax, it is for (int i = 0, e = ...; i < e; ++i) { ... } and the body of the loop is not allowed to change the value of either i or e.
        Edit: actually I may have been unclear. When I said "for loops are guaranteed to terminate", in the context of the discussion, I meant "if the only kind of loops you allow are (counted) for loops in a language where loop-free expressions are guaranteed to terminate, you get a language where all expressions are guaranteed to terminate". So loops can contain other loops, as long as they are all of the "counted for loop" kind.
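        A minimal sketch of the "counted loop" idea in Python, offered as an illustration rather than as anything from the paper or from eBPF. It relies on the fact that `range(n)` is evaluated once, before the first iteration, so nothing in the body can raise the trip count. The function names are made up for this example.

```python
def count_iterations(n):
    """Counted loop: the bound is fixed before the first iteration."""
    iterations = 0
    for i in range(n):    # range(n) is built once, from the value n has right now
        i = 0             # rebinding the loop variable does not rewind the loop
        n += 10           # changing n afterwards does not extend the range
        iterations += 1
    return iterations


def nested_iterations(n, m):
    """Nested counted loops still terminate: at most n * m body executions."""
    total = 0
    for _ in range(n):
        for _ in range(m):
            total += 1
    return total
```

        Calling `count_iterations(5)` runs the body five times even though the body tampers with both `i` and `n`, and `nested_iterations(3, 4)` runs the inner body twelve times. The guarantee only holds if the loop body itself is built from terminating pieces.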
        insanitybit 765 days ago
        Yeah, I just wanted to point out that it's possible in ebpf.
        deredede 764 days ago
        Fair enough :) Although skimming the paper you linked, they say they don't check termination of loops, which surprises me.
    - wendyshu 765 days ago
      If it's undecidable that means it doesn't halt.
      - sharkbot 765 days ago
        Incorrect. The program under inspection could halt or could not halt; undecidability is a statement about Turing machines inspecting that program and unable to make a determination (via a clever proof by contradiction).
        The Halting Problem is a truly interesting result, and for the most part uninteresting in practice.
      - bmacho 765 days ago
        Why is this downvoted? It can't happen that you run an undecidable program, and it halts, since its running would be a proof that it halts.
        Say your Turing machine is searching for a contradiction in ZFC. It enumerates all the possible proofs, and checks if they are valid and they prove 0=1. You can prove in ZFC that you can't prove that it halts, nor you can prove that it doesn't halt.
        Now your Turing machine won't halt, according to ZFC. (It can halt in practice, if ZFC has a contradiction.) Same for any other undecidable programs.
        imtringued 764 days ago
        Problems which cannot be solved by an algorithm/program are considered undecidable as well. Saying that something that doesn't exist doesn't halt makes no sense in the general case.
        bmacho 764 days ago
        The thread is about programs. Existing programs do exist. There are problems not solvable by Turing machines, like the meaning of life, but this thread is not about them.
        GGP was talking about programs halting, not halting, and "something else", which GGP called "undecidable".
        There are programs that ZFC cannot prove if they halt or not. For example, searching for a contradiction in ZFC is such a program. Its undecidability means that there is no sequence of axioms of ZFC that ends with "this program halts" or "this program does not halt".
        But then this program does not halt, since its halting would mean that ZFC can prove that this program halts.
        In any case, programs can halt or not halt. There are programs that ZFC cannot prove that they halt or not, but those programs do not halt.
        SAI_Peregrinus 763 days ago
        > But then this program does not halt, since its halting would mean that ZFC can prove that this program halts.
        That step is not logically consistent. It could halt, it's just that ZFC can't prove it will ever do so. For example, the program which computes the 8000th busy beaver number halts (per definition of the busy beaver function), but is undecidable in ZFC[1].
        An undecidable program can halt. An undecidable program can run forever. But whatever axiom system is being used to decide that can't prove which (without running the program, potentially for an infinite number of steps).
        [1] https://scottaaronson.blog/?p=2725
        bmacho 763 days ago
        > For example, the program which computes the 8000th busy beaver number
        I don't believe that there exists such a program.
        SAI_Peregrinus 763 days ago
        Why not? That implies that an 8000 state Turing machine which eventually halts when started with a blank tape doesn't exist. If any of them do (and it's trivial to simulate one), then exactly such a program can exist.
        bmacho 762 days ago
        Well, there can indeed exist a program that writes out the number BB(8000). (In the sense that it is consistent with ZFC that there is a program that writes out the number of steps of the smallest contradiction in ZFC.)
        What I don't believe is that there exists a program that is a proper counter-example, that is, its halting is undecidable in ZFC, and it halts. Exactly because of what I wrote earlier.
      - enugu 765 days ago
        Actually, this comment makes sense and is used in logic to derive true but non-provable propositions within the current axiom system, assuming it is sound.
        The problem is that you don't know if the checker that you use to detect if a program halts will itself halt on your given input or continue forever. But given a sound checker with some assumption, one can find non-halting programs which won't be detected by the checker using the diagonalization trick.
      - kaba0 765 days ago
        n = 1;
        while (isFamousMathProblemWeDontKnowWhetherItHoldsForAllTheIntegers(n)) { n++ }
  - geoelectric 766 days ago
    Seems slightly easier for me to guarantee something I wrote halts. The halting problem is more about a process analyzing someone else's arbitrary code.
    - cyanydeez 766 days ago
      Is gpt conceptually always analyzing someone else's code?
      - IIAOPSW 766 days ago
        Are you conceptually always analyzing some stackoverflow code?
  - moffkalast 766 days ago
```
    if(going_to_halt)
        dont();
```
  - hgsgm 765 days ago
    Is there a name for the phenomenon where people who understand a concept don't talk about it because it isn't important or relevant, but people who don't understand think it's a big deal?
  - jhanschoo 765 days ago
    As others mentioned, it's trivial* to write useful programs that provably halt; the impossible task is to determine if an arbitrary program halts.
    *: Trivial in a mathematician's sense
  - redox99 766 days ago
    It's trivial to avoid infinite loops by adding some exit condition in case it takes too long, which is probably what you want here.
    And obviously trivial to avoid recursion too.
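    A minimal sketch of that kind of exit condition in Python. The names (`run_with_budget`, `step`) and the default limits are assumptions chosen for illustration, not anything taken from the Voyager code.

```python
import time


def run_with_budget(step, max_steps=1_000, max_seconds=1.0):
    """Call step() until it returns True, or give up when the budget runs out.

    Returns (succeeded, number_of_calls_made).
    """
    deadline = time.monotonic() + max_seconds
    calls = 0
    while calls < max_steps and time.monotonic() < deadline:
        calls += 1
        if step():
            return True, calls
    return False, calls
```

    The loop is bounded by both a call count and a wall-clock deadline, so it always exits as long as each individual `step()` call returns. A `step()` that blocks forever would still hang, since the budget is only checked between calls.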
    - Too 765 days ago
      Avoiding recursion isn’t always that trivial if there are intermediate functions before the first function is called again. More dynamic code with callback functions is another case making this difficult.
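      A rough sketch of why indirect recursion takes a whole-program check: given a static call graph (here a plain dict, with all names hypothetical), recursion shows up as a cycle, which may pass through several intermediate functions. Callbacks are exactly the case where such a graph is not known ahead of time.

```python
def find_recursive_cycle(call_graph):
    """Return one cycle of function names as a list, or None if there is none.

    call_graph maps a function name to the names it calls directly.
    Note that this checker is itself written recursively, for brevity.
    """
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {}
    stack = []

    def visit(name):
        state[name] = ACTIVE
        stack.append(name)
        for callee in call_graph.get(name, ()):
            status = state.get(callee, UNSEEN)
            if status == ACTIVE:
                # callee is already on the current call path: a cycle
                return stack[stack.index(callee):] + [callee]
            if status == UNSEEN:
                found = visit(callee)
                if found:
                    return found
        stack.pop()
        state[name] = DONE
        return None

    for name in call_graph:
        if state.get(name, UNSEEN) == UNSEEN:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

      With `{"a": ["b"], "b": ["c"], "c": ["a"]}` no function calls itself directly, yet the check reports the path from `a` back to `a`.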
  - nr2x 765 days ago
    Probably the best programming advice anyone could get.
  - bentcorner 766 days ago
    Every program halts because no computer can run forever
  - narrator 766 days ago
    If the brain is like a computer, how do humans even solve the halting problem? Maybe it's because humans feel anxiety at the unproductive passing of time while computers will just do what they're doing forever.
    - jameshart 766 days ago
      Humans can’t solve the halting problem.
      Write a program that, for a positive integer, runs the Collatz process on it (if even: halve it; if odd, multiply by three and add one; repeat).
      If the process results in a 1, move on to the next number and repeat.
      If the process produces a number it produced previously, halt. (Worried you’ll need unbounded state for this part? Use tortoise+hare, it’s fine).
      This program halts if and only if the Collatz conjecture is false.
      Now, the Collatz conjecture might not be unprovable. But for now no human can tell you whether or not that program halts.
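      A sketch of that program in Python, with one deliberate change so that it can actually be run: the search stops at a `limit` instead of continuing forever. The function names are invented for this example. One caveat on the "if and only if": a starting value whose trajectory grew without bound would also falsify the conjecture, but it would keep this search busy forever rather than make it halt.

```python
def collatz_step(n):
    """One step of the Collatz process."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def enters_cycle_without_one(start):
    """Tortoise and hare: True if the trajectory loops without ever reaching 1.

    Uses constant memory. Does not return if the trajectory diverges.
    """
    tortoise = hare = start
    while True:
        for _ in range(2):            # the hare moves two steps per round
            if hare == 1:
                return False          # reached 1: nothing interesting here
            hare = collatz_step(hare)
        if hare == 1:
            return False
        tortoise = collatz_step(tortoise)
        if tortoise == hare:
            return True               # met inside a cycle that avoids 1


def first_nontrivial_cycle(limit):
    """Return the first start value up to limit that enters such a cycle, else None."""
    for start in range(1, limit + 1):
        if enters_cycle_without_one(start):
            return start
    return None
```

      Removing the `limit` and looping over all positive integers gives the program described above: it halts only if it finds a cycle that avoids 1.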
      - genewitch 765 days ago
        When I read about Collatz I wrote a C# program that short circuits the calculation like that, but my programming chops are not great and it could only handle 3000 digits, if memory serves. It's somewhere on my GitHub page and I'm curious how many other people released similar code to speed things up.
        I ended up using it to compare single core performance on any windows machine, because the timestamped logging was deterministic. Rewrote it in python and still use it these days.
    - umanwizard 766 days ago
      Humans can’t solve the halting problem any more than computers can.
      For example, it’s trivial to write a program that searches by brute force for a cycle that would disprove the Collatz conjecture, and halts if it finds one. No human knows whether that program will halt.
    - ethanbond 766 days ago
      They don’t really. People have different thresholds for acceptable exploration costs and heuristics for approximating those costs (as well as the opportunity cost of alternative courses of action).
      IMO, in biological systems the explore vs exploit tradeoff is pretty analogous to the halting problem. There doesn’t seem to be an optimal general “solution” to it.
      Once an organism is very familiar with its environment it’ll approximate near-optimal trade offs, which would suggest it’s just using heuristics.
    - pjerem 765 days ago
      > If the brain is like a computer
      That’s a common misconception. The brain is not like a computer. The brain can’t store and execute programs.
      - shkkmo 765 days ago
        > That’s a common misconception. The brain is not like a computer. The brain can’t store and execute programs.
        Can't it? The brain is absolutely capable of emulating a turing complete system such as running a program.
        pjerem 764 days ago
        Can it ?
        I mean, sure you can reason about a portion of code on your screen but there is no way you could emulate anything without some visual support.
        There is no way your short term memory could store the program and the variables. The human brain can barely remember 5 to 10 words for several seconds and you have no control on your long term memory.
        So yes your brain can somehow emulate a computer if you give it a pen, a sheet of paper, time, and a lot of sugar. But that’s not because it’s functioning like a computer but rather because you learnt how a computer works.
      - kaba0 765 days ago
        The halting problem is about computable functions. Every real thing can at most solve computable functions — so for all practical purposes, a brain is a computer that has all the same limits.
    - IIAOPSW 766 days ago
      Nothing you think about consciously has higher thread priority than the stay-alive loop.
    - i2cmaster 766 days ago
      Humans tend to communicate/write software in what are really sub languages. Often these sub languages are not recursive.
- qayxc 766 days ago
  In the end it all looks like classical programming with extra steps to me :)
  - Imnimo 766 days ago
    Even more than the complex prompts, they give the bot access to a variety of functions implementing primitive actions, which are crafted to give helpful hints if the bot is struggling. For example, the code for mining a block:
    https://github.com/MineDojo/Voyager/blob/main/voyager/contro...
    This function keeps a global count of how many times it's been called to mine a block that doesn't exist nearby, and will warn the bot that it should try exploring first instead.
    There's nothing necessarily wrong with all that - it's an important research question to understand how much hand-holding the agent needs to be able to do these sorts of tasks. But readers should be aware that it's hardly dropping an agent into the world and leaving it to its own devices.
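    To make the pattern concrete, here is a hypothetical Python sketch of a primitive that counts its own failures and starts handing out hints. It is not the Voyager implementation (which is linked above and not reproduced here); `find_blocks`, `say`, and `warn_after` are stand-ins invented for illustration.

```python
class HintingMiner:
    """A mining primitive that nudges the caller after repeated misses."""

    def __init__(self, find_blocks, say, warn_after=3):
        self.find_blocks = find_blocks  # assumed: name -> list of nearby blocks
        self.say = say                  # assumed: sends a chat message to the agent
        self.warn_after = warn_after
        self.misses = 0                 # persists across calls, like a global count

    def mine(self, name):
        """Return how many blocks were found; emit hints when none are nearby."""
        blocks = self.find_blocks(name)
        if not blocks:
            self.misses += 1
            self.say(f"No {name} nearby.")
            if self.misses >= self.warn_after:
                self.say("Mining keeps failing; try exploring first.")
            return 0
        return len(blocks)
```

    The hint travels over the same chat channel the agent already reads as feedback, so the "error message" doubles as a piece of prompt.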
    - sterlind 766 days ago
      I do love their use of `bot.chat()` as, like, throwing an exception to the AI. maybe this finally gives developers an incentive to document their code and throw intelligible errors - the machines need it to figure out what they're doing wrong!
      - ineedasername 766 days ago
        ChatGPT: I am sorry for the confusion. I am a large language model (LLM) trained on natural language and my training data cutoff is September 2021. Hexadecimal references to memory addresses are not part of my training data set.
  - uw_rob 764 days ago
    Some would call this Software 2.0 https://karpathy.medium.com/software-2-0-a64152b37c35
- tmountain 766 days ago
  How did you get the prompt?
  - cjbprime 766 days ago
    https://github.com/MineDojo/Voyager/blob/main/voyager/prompt...
- kodah 766 days ago
  It would be really cool if they simplified an NPCs world to include GPT prompts for an RPG-like experience.
  - 1270018080 765 days ago
    That's the only application I see for black box language models that hallucinate text.
  - neuronexmachina 765 days ago
    Something like this? https://arstechnica.com/information-technology/2023/04/surpr...
thesuperbigfrog 766 days ago
At first glance this just seems like an alternate approach at building expert systems.
The Minecraft videos are impressive.
Nethack (https://www.nethack.org/) has been used for AI development in the past and more recently:
http://shelf2.library.cmu.edu/Tech/9997774.pdf
https://portfolios.cs.earlham.edu/wp-content/uploads/2018/12...
https://arxiv.org/abs/2211.00539
https://proceedings.neurips.cc/paper/2020/hash/569ff987c643b...
https://github.com/facebookresearch/nle
https://ojs.aaai.org/index.php/AIIDE/article/view/12923
I am curious how well Voyager would do in Nethack.
smcl 766 days ago
That opening sentence is a very funny statement when you take into account that "...in Minecraft" is a way some YouTubers hide hyperbolic/unserious statements (to skirt TOS violations). Like after 100 deaths to Malenia in Elden Ring: "Oh fuck me, I might as well kill myself ... IN MINECRAFT"
- opan 766 days ago
  I thought the same thing, although I think of the meme as being used to avoid feds and legal problems, like saying you'll kill someone "in Minecraft" to make it not a realistic-sounding threat.
  - TylerLives 766 days ago
    It doesn't seem to be very effective - https://www.kotaku.com.au/2023/03/man-arrested-after-making-...
    - smcl 766 days ago
      Yeah a few people got banned from places using it, and also tried "...in Roblox". I think I remember seeing "... Roblox [your|my]self" as a euphemism too.
      - GauntletWizard 765 days ago
        Euphemisms are a treadmill, and modern "content policies" have only accelerated it. There was a time not long ago that "retarded" was what kids called each other, and now my phone won't type that word. Every euphemism and hyperbolic statement will eventually be taken seriously, and a new one will be created.
  - IIAOPSW 766 days ago
    It's the kiddie version of "asking for a friend".
    - Gigachad 765 days ago
      Minecraft players are in their 20s these days.
lsy 766 days ago
It's not playing in-context of Minecraft, it's playing in-context of an API to Minecraft. You can see one of the limitations in its error condition when it tries to craft an "acacia axe" out of acacia planks and sticks, fails, and then replaces all the references with "wooden axe". Of course in the real world it doesn't matter what you call the axe you made, and it's pretty clear what an acacia axe is. Even if it did matter, you could also easily keep the function name and output message, and just make a "wooden axe" behind the scenes. The fact that the GPT is so tightly bound to the formalism of the API is an indication that this is a task the GPT can likely do quite well as this API is well-used and documented.
blooalien 766 days ago
Where I think it's gonna start to get really scary, and much more closely approaching "real A.I" / AGI is when they start augmenting and wiring together various differing forms of "A.I." with each other. GPT-4, no matter how impressive it might appear on the surface is still "just" a large language model. Augment it with other types of learning models and at some point you might just hit on the right combination for it to start some form of actual "reasoning" thought or "creativity".
As long as they're all still "special" single-purpose systems (LLM is about processing and responding to language input for example, CV / Computer Vision models specialize in operating on visual or image inputs, etc.), that's all they'll ever be, no matter how good they get at pretending they're more.
- hahajk 765 days ago
  Isn't GPT-4 multimodal? I remember it designing a website based off of a sketch during the initial demo.
  - rst 765 days ago
    There's a multimodal variant, but it's not widely available yet.
  - ShamelessC 765 days ago
    It is.
- fnordpiglet 765 days ago
  This is happening. These projects are examples of feedback loops. While the models aren’t playing directly they are receiving feedback and iteratively improving. Constraining and optimizing using classical AI is an obvious next step. I agree this is when the magic happens.
- MagicMoonlight 765 days ago
  An LLM is a form of input like a keyboard. If they put it in front of something like Siri and used it as input instead of a processor then you could make Siri actually functional.
  It’s very good at understanding text but by itself it can’t think. Turning text into commands it knows is doable.
jmugan 766 days ago
I wish there was a summary of how this worked. I see the abstract and lots of figures and movies, but I still don't get a good sense of what exactly the algorithm is. I even skimmed the whole paper.
- ShamelessC 766 days ago
  https://twitter.com/DrJimFan/status/1662117785487704067
  You just scroll down a tiny bit on the twitter page and get this nice video and summary from the author.
  > Voyager has 3 key components:
  > 1) An iterative prompting mechanism that incorporates game feedback, execution errors, and self-verification to refine programs;
  > 2) A skill library of code to store & retrieve complex behaviors;
  > 3) An automatic curriculum to maximize exploration.
  - jmugan 765 days ago
    Right, but I wanted to know what each of those things was.
ChrisArchitect 766 days ago
[dupe]
https://news.ycombinator.com/item?id=36085936
nitwit005 766 days ago
I searched for the word "infinite", and my suspicion was quickly proven correct:
> 9) Do not write infinite loops or recursive functions.
> Sometimes GPT-4 will write an infinite loop that runs forever.
- csours 766 days ago
  do not write infinite loops
  - lsy 766 days ago
    I can only imagine the desperation of the researcher pleading with a computer to do its best to solve the halting problem.
codeulike 766 days ago
This is like it's writing a bot to play Minecraft?
I'd like to see a visual/language model/AI that learns to play minecraft as an actual inhabitant of the game. i.e. processing visual input, recognising objects, working out whats going on, learning how to move around. Learning how to make food and avoid monsters. It would be an 'Embodied AI' within the world of Minecraft.
The language part would allow us to talk to this being. You could ask it things like:
"Do you prefer to make a house, or dig a cave?"
"What are your hopes for the future?"
"Is there a recent achievement you are particularly proud of?"
etc
- AlecSchueler 766 days ago
  You could ask it those things but it will tell you that it doesn't have feelings, preferences or hopes. You'd also need to give a reason to do anything. Eventually you get back to the point where you're "writing a bot" but with behaviour closer to how you imagine it should be.
- krapp 766 days ago
  >I'd like to see a visual/language model/AI that learns to play minecraft as an actual inhabitant of the game. i.e. processing visual input, recognising objects, working out whats going on, learning how to move around. Learning how to make food and avoid monsters. It would be an 'Embodied AI' within the world of Minecraft.
  There is already an AI VTuber, Neurosama, trained to "play" games, including Minecraft (also OSU! and Among Us.)
  I don't think she's learned lava is bad yet.
tehsauce 766 days ago
Earlier discussion: https://news.ycombinator.com/item?id=36085936
slg 766 days ago
And yet it still digs straight down.
- neura 766 days ago
  Yeah... was wondering if it eventually learns that digging straight down is more likely to kill you than any other direction or if it could ever learn to dig down only when standing on the edge of another block, instead of the one you're breaking.
  - yazaddaruvala 766 days ago
    They don't explicitly talk about curriculum improvement based on negative reinforcement. It would be relatively straight forward to do tho.
    Perform self-reflection every time damage is taken (the iterations of self-reflection can depend on % health lost). Something like "Please output as a list of general guidelines / best-practices that any Minecraft player can use in the future. For each guideline add a risk profile that is introduced if the guideline is neglected. Based on the following game log over the last 100 steps, what should the Minecraft player have done differently to avoid taking damage? \n <game log>"
    And keep storing those guidelines over multiple iterations. If there start to become too many best-practices, ask GPT-4 to "Please output as a list of general guidelines / best-practices that any Minecraft player can use in the future. For each guideline add a risk profile that is introduced if the guideline is neglected. Take the guidelines below and summarize them. Prioritize retaining more detail about the guidelines with a higher risk profile, and do merge guidelines if possible and appropriate. \n <old guidelines>"
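    A hedged sketch of that loop in Python. Everything here is an assumption made for illustration: `llm` is any callable that takes a prompt string and returns a list of guideline strings, the prompts are abbreviated versions of the ones above, and "one extra reflection round per 25% health lost" is just one way to make the iteration count depend on damage.

```python
REFLECT_PROMPT = (
    "Based on the following game log, what should the player have done "
    "differently to avoid taking damage? Output a list of guidelines.\n"
)
SUMMARIZE_PROMPT = (
    "Take the guidelines below and summarize them, keeping more detail "
    "for the higher-risk ones.\n"
)


def reflect_on_damage(llm, guidelines, game_log, health_lost_pct, max_guidelines=20):
    """Return an updated guideline list after a damage event."""
    # Assumption: one round, plus one more per 25% of health lost.
    rounds = 1 + int(health_lost_pct // 25)
    updated = list(guidelines)
    for _ in range(rounds):
        updated.extend(llm(REFLECT_PROMPT + game_log))
    # Compact the list once it grows past the limit.
    if len(updated) > max_guidelines:
        updated = llm(SUMMARIZE_PROMPT + "\n".join(updated))
    return updated
```

    The caller keeps the returned list and passes it back in on the next damage event, so the guidelines accumulate across episodes and get compressed only when they grow too long.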
    - zimpenfish 765 days ago
      > Perform self-reflection every time damage is taken
      Given the number of times I've taken damage in Minecraft from a Creeper that has seemingly just appeared behind me, I expect this feedback loop to pretty quickly end up with a bot that does a full surroundings scan after every action to make sure there's no Creepers around.
- js8 765 days ago
  It also fights an Enderman with a shovel, how smart.
sdenton4 765 days ago
There's untold billions of tokens of Minecraft related content on the web - the controlling LLM personally has memorized every Minecraft strategy guide ever written.
Geee 766 days ago
Very interesting. Here LLM writes code that plays Minecraft. If software 2.0 is neural networks, then software 3.0 is code written by neural networks.
- ablyveiled 766 days ago
  And software 4.0 is sticks and stones. har har.
Fgehono 765 days ago
Crazy how fast the pieces fall in place.
A big text corpus gives quite a huge amount of context.
SD also got better through this
mensetmanusman 765 days ago
It will set out to build a library that contains all the sources that it quotes but that only exist in the multiverse.
iinnPP 766 days ago
I imagine this is being worked on for OSRS as well. Exciting times. Terrifying times.
- drekk 765 days ago
  It's interesting you mention OSRS because some bots are using ChatGPT (or other LLMs) to respond to players and it's incredibly effective. Only having to deal with conversations seems like a perfect use-case for the technology
  https://youtu.be/fu9OQRh6K8U
phendrenad2 766 days ago
I'm imagining Twitch Plays Pokemon but even less coherent.
busseio 765 days ago
I’d like to build something like this, but for Robot Odyssey.
thdespou 765 days ago
Prompt engineering at its finest.
mbgerring 765 days ago
Why? Who wants this? To what end?
- mkaic 765 days ago
  > Why?
  Because the researchers found it to be an interesting problem. Using an AI to beat Minecraft has been an active benchmark in the ML world for some time now.
  > Who wants this?
  Me, as well as the many others who have liked and shared the paper.
  > To what end?
  Furthering our ability to create autonomous agents. If we can get it to work in Minecraft, that's one step closer to getting it to work in real life.
tommywiseausmom 766 days ago
[flagged]