> - Physics is still hard and there are obvious failure cases when I tried the classical intuitive physics experiments from psychology (tower of blocks).
> - Social and multi-agent interactions are tricky to handle. 1vs1 combat games do not work
> - Long instruction following and simple combinatorial game logic fails (e.g. collect some points / keys etc, go to the door, unlock and so on)
> - Action space is limited
> - It is far from being a real game engines and has a long way to go but this is a clear glimpse into the future.
Even with these limitations, this is still bonkers. It suggests to me that world models may have a bigger part to play in robotics and real world AI than I realized. Future robots may learn in their dreams...
I similarly am surprised at how fast they are progressing. I wrote this piece a few months ago about how I think steering world model output is the next realm of AAA gaming:
But even when I wrote that I thought things were still a few years out. I facetiously said that Rockstar would be nerd-sniped on GTA6 by a world model, which sounded crazy a few months ago. But seeing the progress already made since GameNGen and knowing GTA6 is still a year away... maybe it will actually happen.
> Rockstar would be nerd-sniped on GTA6 by a world model
I'm having trouble parsing your meaning here.
GTA isn't really a "drive on the street simulator", is it? There is deliberate creative and artistic vision that makes the series so enjoyable to play even decades after release, despite the graphics quality becoming more dated every year by AAA standards.
Are you saying someone would "vibe model" a GTAish clone with modern graphics that would overtake the actual GTA6 in popularity? That seems extremely unlikely to me.
GTA VI's story mode won't be surpassed by a world model, but the fucking around and blowing things up part conceivably could, and that's how people are spending their time in GTA. I don't see a world model providing the framing needed to contextualize the mayhem, thereby making it fun, anytime soon myself, but down the line? Maybe.
They will then learn the bitter lesson that convincing the GenAI to create something that brings your vision to life is impossible. It's a real talent to even be able to define for yourself what your vision is, and then to have artists achieve it visually in any medium is a process of back and forth between people with their own interpretations evolving the idea into something even better and cohesive.
GenAI will never get there because it can't, by design. It can riff on what was, and it can please the prompter, but it cannot challenge anyone creatively. No current LLM's can, either. I'll eat my hat if this is wrong in ten years, but it won't be.
It will generate refined slop ad nauseam, and that will train people's brains into spotting said slop faster using less energy. And then it'll be shunned.
I don't _really_ mean it obviously but I think a key component of what makes something like GTA compelling is that fully modeled world you move around in. These things take what amounts to hundreds if not thousands of man years to create "traditionally", and the fact someone can now prompt to life a city or any other environment with simiailr (or better) fidelity is a massive change in how we think about creative content production.
GTA6 will not actually be nerd-sniped, but it's easy to see how a lot of what makes the game defensible is being rapidly commoditized.
Probably depends on how you engage with GTA. “Drive on the street simulator” along with arrays of weapons and explosions is the majority of my hours in GTA.
I despise the creative and artistic vision of GTA online, but I’m clearly in a minority there gauging by how much money they’ve made off it.
I'm trying to wrap my head around this since we're still seeing text spit out slowly ( I mean slowly as in 1000's of tokens a second)
I'm starting to think some of the names behind LLMs/GenAI are cover names for aliens and any actual humans involved have signed an NDA that comes with millions of dollars and a death warrant if disobeyed.
The future of games was MMORPGs and RPG-ization in general as other genres adopted progression systems. But the former two are simply too expensive and risky even today for AAA to develop. Which brings us to another point, the problem with Western AAA is more about high levels of risk aversion, which is what's really feeding the lack of imaginative. And that's more to do with the economics of opportunity cost to the S&P 500.
Anyways, crafting pretty looking worlds is one thing, but you still need to fill them in with something worth doing, and that's something we haven't really figured out. That's one of the reasons why the sandbox MMORPG was developed as opposed to "themeparks". The underlying systems, the backend is the real meat here. At most with the world models right now is that you're replacing 3d artists and animators, but I would not say that is a real bottleneck in relation to one's own limitations.
> Which brings us to another point, the problem with Western AAA is more about high levels of risk aversion, which is what's really feeding the lack of imaginative.
Maybe I’m misinterpreting what you’re saying here, but 2021 til present has been a glut of some of the best titles ever made, by pretty much any measure
Reality is not composed of words, syntax, and semantics. A human modal is.
Other human modals are sensory only, no language.
So vision learning and energy models that capture the energy to achieve a visual, audio, physical robotics behavior are the only real goal.
Software is for those who read the manual with their new NES game. Where are the words inside us?
Statistical physics of energy to make machine draw the glyphs of language not opionated clustering of language that will close the keyboard and mouse input loop. We're like replicating human work habits. Those are real physical behaviors. Not just descriptions in words.
A neural net can produce information outside of its original data set, but it is all and directly derived from that initial set. There are fundamental information constraints here. You cannot use a neural net to itself generate from its existing data set wholly new and original full quality training data for itself.
You can use a neural net to generate data, and you can train a net on that data, but you'll end up with something which is no good.
I might be misunderstanding your comment so sorry if so. Robots have sensors and RL is a thing, they can collect real world data and then processing and consolidating real world experiences during downtime (or in real time), running simulations to prepare for scenarios, and updating models based on the day's collected data. The way I saw it that I thought was impressive was the robot understood the scene, but didn't know how the scene would respond to it's actions, so it gens videos of the possible scenarios, and then picks the best ones and models it's actuation based on it's "imagination".
The benefit of these AI-generated simulation models as a training mechanism is that it helps add robustness without requiring a large training set. The recombinations can generate wider areas of the space to explore and learn with but using a smaller basis space.
To pick an almost trivial example, let's say OCR digit recognition. You'll train on the original data-set, but also on information-preserving skews and other transforms of that data set to add robustness (stretched numbers, rotated numbers, etc.). The core operation here is taking a smallset in some space (original training data) and producing some bigset in that same space (generated training data).
For simple things like digit recognition, we can imagine a lot of transforms as simple algorithms, but one can consider more complex problems and realize that an ML model would be able to do a good job of learning how to generate bigset candidates from the smallset.
Humans are dependent on their input data (through lifetime learning and, perhaps, information encoded in the brain from evolution), and yet they can produce out of distribution information. How?
There is an uncountably large number of models that perfectly replicate the data they're trained on; some generalize out of distribution much better. Something like dreaming might be a form of regularization: experimenting with simpler structures that perform equally well on training data but generalize better (e.g. by discovering simple algorithms that reproduce the data equally well as pure memorization but require simpler neural circuits than the memorizing circuits).
Once you have those better generalizing circuits, you can generate data that not only matches the input data in quality but potentially exceeds it, if the priors built into the learning algorithm match the real world.
Humans produce out-of-distribution data all the time, yet if you had a teacher making up facts and teaching them to your kids, you would probably complain.
This is definitely one of the potential issues that might happen to embodied agents/robots/bodies trained on the "world model". As we are training a model for the real world based on a model that simulates the real world, the glitches in the world simulator model will be incorporated into the training. There will be edge cases due to this layered "overtraining", where a robot/agent/body will expect Y to happen but X will happen, causing unpredictable behaviour.I assume that a generic world agent will be able to autocorrect, but this could also lead to dangerous issues.
I.e. if the simulation has enough videos of firefighters breaking glass where it seems to drop instantaneously and in the world sim it always breaks, a firefighter robot might get into a problem when confronted with unbreakable glass, as it expects it to break as always, leading to a loop of trying to shatter the glass instead of performing another action.
It is possible - for example, getting a blob of physics data, fitting a curve then projecting the curve to theorise what would happen in new unseen situations. The information constraints don't limit the ability to generate new data in a specific domain from a small sample; indeed it might be possible to fully comprehend the domain if there is an underlying process it can infer. It is impossible to come up with wildly unrelated domains though.
Approximately speaking, you have a world model and an agent model. You continue to train the world model using data collected by the robot day-to-day. The robot "dreams" by running the agent model against the world model instead of moving around in the real world. Dreaming for thousands of (simulated) hours is much more efficient than actually running the physical hardware for thousands of wall clock hours.
We are miles away from the fundamental constraint. We know that our current training methodologies are scandalously data inefficient compared to human/animal brains. Augmenting observations with dreams has long been theorized to be (part of) the answer.
> current training methodologies are scandalously data inefficient compared to human/animal brains
Are you sure? I've been ingesting boatloads of high definition multi-sensory real-time data for quite a few decades now, and I hardly remember any of it. Perhaps the average quality/diversity of LLM training data has been higher, but they sure remember a hell of a lot more of it than I ever could.
Humans can learn from visualising situations and thinking through different scenarios. I don't see why AI / robots can't do similar. In fact I think quite a lot of training for things like Tesla self driving is done in simulation.
Give it tool access let it formulate it's own experiments etc.
The only question here is if it becomes a / the singularity because of this, gets stuck in some local minimum or achieves random perfection and random local minimum locations.
I'm invested in a startup that is doing something unrelated robotics, but they're spending a lot of time in Shenzhen, I keep a very close eye on robotics and was talking to their CTO about what he is seeing in China, versions of this are already being implemented.
And these are consumer options, affordable to you and me, not only to some military. If those are the commonly available options... there may be way more advanced stuff that we haven't seen.
this stuff is old tech, and has nothing to do with transformers. The Boston Dynamics style robot dogs are always shown in marketing demos like the one you linked in secretly very controlled environments. Let me know when I can order one that will bring the laundry downstairs for my wife.
I asked for real examples from someone who claimed to have first hand experience, not more marketing bullshit
"Consciousness" is an overloaded thought killer that swerves all conversation into obfuscated semantic arguments. One person will be talking about 'internality' and self-image (in the testable, mechanical sense that you could argue Chain of Thought models already have in a petty way) and the other will be grappling with the concept of qualia and the ineffable nature of human experience.
That's not even a devil's advocate, many other animals clearly have consciousness, at least if we're not solipsistic. There have been many very dangerous precedents in medicine where people have been declared "brain dead" only to awake and remember.
Since consciousness is closely linked to being a moral patient, it is all the more important to err on the side of caution when denying qualia to other beings.
AI has traditionally been driven by "metaphor-driven development" where people assume the brain has system X, program something they give the same name, and then assume because they've given it that name it must work because it works in the brain.
This is generally a bad idea, but a few of the results like "neural networks" did work out… eventually.
"World model" is another example of a metaphor like this. They've assumed that humans have world models (most likely not true), and that if they program something and call it a "world model" it will work the same way (definitely not true) and will be beneficial (possibly true).
(The above critique comes from Phil Agre and David Chapman.)
Even as a layman and AI skeptic, to me this entirely matches my expectations, and something like this seemed like it was basically inevitable as of the first demos of video rendering responding to user input (a year ago? maybe?).
Not to detract from what has been done here in any way, but it all seems entirely consistent with the types of progress we have seen.
It's also no surprise to me that it's from Google, who I suspect is better situated than any of its AI competitors, even if it is sometimes slow to show progress publicly.
>It's basically what every major AI lab head is saying from the start.
I suppose it depends what you count as "the start". The idea of AI as a real research project has been around since at least the 1950s. And I'm not a programmer or computer scientist, but I'm a philosophy nerd and I know debates about what computers can or can't do started around then. One side of the debate was that it awaited new conceptual and architectural breakthroughs.
I also think you can look at, say, Ted Talks on the topic, with guys like Jeff Hawkins presenting the problem as one of searching for conceptual breakthroughs, and I think similar ideas of such a search have been at the center of Douglas Hofstadter's career.
I think in all those cases, they would have treated "more is different" like an absence of nuance, because there was supposed to be a puzzle to solve (and in a sense there is, and there has been, in terms of vector space and back propagation and so on, but it wasn't necessarily clear that physics could "pop out" emergently from such a foundation).
When they say "the start", I think they mean the start of the current LLM era (circa 2017). The main story of this time has been a rejection of the idea that major conceptual breakthroughs and complex architectures are needed to achieve intelligence. Instead, it's better to focus on simple, general-purpose methods that can scale to massive amounts of data and compute (i.e. the Bitter Lesson [1]).
Oof ... to call other people's decades of research into directed machine learning "a colossal waste of researcher's time" is indeed a rather toxic point of view unsurprisingly causing a bitter reaction in scientists/researchers.
Even if his broader point might be valid (about the most fruitful directions in ML), calling something a "bitter lesson" while insulting a whole field of science is ... something.
Also as someone involved in early RL, he should know better.
It's akin to us sending a rocket to space and immediately discovering a wormhole. Sure, there's a lot of science about what's out there, but to discover all this in our first few trips to orbit ...
Joscha Bach postulates that what we call consciousness must be something rather simple, an emergent property present in all sufficiently complex biological organisms.
We don't inherit any software, so cognitive function must bootstrap itself from it's underlying structure alone.
I wonder, though. Many animal species just "know" how to perform certain complex actions without being taught the way humans have to be taught. Building a nest, for example.
If you say that this is emergent from the "underlying structure alone", doesn't this mean that it would still be "inherited" software (though in this case, maybe we think of it like punch cards).
I’ve seen different figures for information content of DNA but they’re all mostly misleading. What we actually inherit is much more. We are the result of an unpacking algorithm starting from a single cell over time, so our information content should at the very least include the entirety of the cell (which is probably impossible to calculate). Additionally, in a more general sense, arbitrarily complex behavior can be derived from very simple mathematics, e.g. cellular automata. With sufficient complex dynamics (which for us are given by the laws of physics), even very small information changes lead to vastly different “emergent behavior”, whatever that means. One could improperly say that part of the information is included in the laws of physics itself.
A biological example that I like: the neural structures for vision develop almost fully formed from the very beginning. The state of our network at initialization is effectively already functional. I’m not sure to which extent this is true for humans, but it is certainly true for simpler organisms like flies. The way cells achieve this is through some extremely simple growth rules as the structure is being formed for the first time. Different kinds of cells behave almost independently of each other, and it just so happens that the final structure is a perfectly functional eye. I’ve seen animations of this during a conference talk and it was one of the most fascinating things I’ve ever seen. It truly shows how the complexity of a biological organism is just billions of times any human technology. And at the same time, it’s a beautiful illustration of the lack of intelligent design. It’s like watching a Lego assemble by just shaking the pieces.
Problems like this will turn out to have simple solutions. Once we get past the idea of "inherited instinct" (obvious nonsense and easily proved to be so) the solution will be easier to see.
An example that might be useful: dragonflies lay their eggs in water. Since a dragonfly has like a 4-bit CPU you might be amazed at how it manages to get all the processing required to identify a body of water from a distance into its tiny mind, and also marvel at what sort of JPEG+++ encoding must be used to convey what water looks like from generation to generation.
But they don't do that at all: instead they have eyes that are sensitive to polarized light. The surface of water polarizes reflected light. So do things like polished gravestones. So dragonflies will lay their eggs on gravestones too.
One I like to ponder is: beavers building damns. Do they have an encoded algorithm that knows that they need to damn the river to have a place to live, by gnawing on trees, carrying them to the right place on the river bed, etc? Nope, certainly they don't have that. Perhaps they have teeth that grow so long that they hurt, motivating the animal to gnaw on something solid to wear them down. The only solid thing they have available is a tree.
A similar phenomenon was demonstrated with deep neural networks nearly a decade ago. You optimize the architecture using randomized weights instead of optimizing the weights. You can still optimize the weights in a separate additional step to improve performance.
I’ve always said that animals have short term and long term memory via the hippocampus, and then there’s supragenerational memory stored in DNA - behaviors that are learned over many generations and passed down via genetics.
The emergent property theory seems logical, but I'm also partial to the quantum-tunneling-miasma theory which basically posits that there could be something fairly complex going on, and we just lack the ability to observe/measure it in our current physics. (Although I have difficulty coherently separating this theory from faith-based beliefs)
>We don't inherit any software, so cognitive function must bootstrap itself from it's underlying structure alone.
Hardware and software, as metaphors applied to biology, I think are better understood as a continuum than a binary, and if we don't inherit any software (is that true?), we at least inherit assembly code.
> we don't inherit any software (is that true?), we at least inherit assembly code
To stay with the metaphor, DNA could be rather understood as firmware that runs on the cell. What I mean with software is the 'mind' that runs on a collection of cells. Things like language, thoughts and ideas.
There is also a second level of software that runs not on a single mind alone, but collection of minds, to form cliques or a societies. But this is not encoded in genes, but in memes.
I think we have some notion of a proto-grammar or ability to linguistically conceptualize, probably at the level of some primordial conceptual units that are more fundamental than language, thoughts and ideas in the concrete forms we generally understand them to have.
I think it's like Chomsky said, that we don't learn this infrastructure for understanding language any more than a bird "learns" their feathers. But I might be losing track of what you're suggesting is software in the metaphor. I think I'm broadly on board with your characterization of DNA, the mind and memes generally though.
Lemme start by saying this is objectively amazing. But I just really wouldn't call it a breakthrough.
We had one breakthrough a couple of years ago with GPT-3, where we found that neural networks / transformers + scale does wonders.
Everything else has been a smooth continuous improvement. Compare today's announcement to Genie-2[1] release less than 1 year ago.
The speed is insane, but not surprising if you put in context on how fast AI is advancing. Again, nothing _new_. Just absurdly fast continuous progress.
Why wouldn't it? I still have to hear one convincing argument how our brain isn't working as a function of probable next best actions. When you look at amoebas work, and animals that are somewhere between them and us in intelligence, and then us, it is a very similar kind of progression we see with current LLMs, from almost no state of the world, to a pretty solid one.
You don’t ask people to speak how you want, you simply only invite people who already have a history of speaking how you want. This phenomena is explained in detail I. Noam Chomsky’s work around mass media (eg NY Times doesn’t tell their editors what to do exactly, but only hire editors who already want to say what NY Times wants, or have a certain world view). The same can be applied to social media reviews. Invite the person who gives glowing reviews all the time.
Do you know where Noam makes that argument? I've been trying to figure out where I picked it up years ago. I'd like to revisit it to deepen my understanding. It's a pretty universal insight.
"I don't say you're self-censoring - I'm sure you believe everything you're saying; but what I'm saying is, if you believed something different, you wouldn't be sitting where you're sitting." -- Noam Chomksy to Andrew Marr
It's a shame the interviewer didn't quite grasp that point and dig a little deeper into it. Listening to it again I'm reminded of "The masters tools will never dismantle the master's house".
Though this is often associated with his and Herman's "Propaganda Model," Chomsky has also commented that the same appears in scholarly literature, despite the overt propaganda forces of ownership and advertisement being absent:
Don't put the world state into the model. Use the model as a renderer of whatever objects the "engine" throws at it.
Use the CPU and RAM for world state, then pass it off to the model to render.
Regardless of how this is done, Unreal Engine with all of its bells and whistles is toast. That C++ pile of engineering won't outdo something this flexible.
How many watts and how much capital does it take to run this model? How many watts and how much capital does it take to run unity or unreal? I suspect there's a huge discrepancy here, among other things.
> What I don't think this technology will do is replace game engines. I just don't see how you could get the very precise and predictable editing you have in a regular game engine from anything like the current model. The real advantage of game engines is how they allow teams of game developers to work together, making small and localized changes to a game project.
I've been thinking about this a while and it's obvious to me:
Put Minecraft (or something similar) under the hood. You just need data structures to encode the world. To enable mutation, location, and persistence.
If the model is given additional parameters such as a "world mesh", then it can easily persist where things are, what color or texture they should be, etc.
That data structure or server can be running independently on CPU-bound processes. Genie or whatever "world model" you have is just your renderer.
It probably won't happen like this due to monopolistic forces, but a nice future might be a future where you could hot swap renderers between providers yet still be playing the same game as your friends - just with different looks and feels. Experiencing the world differently all at the same time. (It'll probably be winner take all, sadly, or several independent vertical silos.)
If I were Tim Sweeny at Epic Games, I'd immediately drop all work on Unreal Engine and start looking into this tech. Because this is going to shore them up on both the gaming and film fronts.
As a renderer, given a POV, lighting conditions, and world mesh might be a very, very good system. Sort of a tight MCP connection to the world-state.
I think in this context, it could be amazing for game creation.
I’d imagine you would provide item descriptions to vibe-code objects and behavior scripts, set up some initial world state(maps), populated with objects made of objects - hierarchically vibe-modeled, make a few renderings to give inspirational world-feel and textures, and vibe-tune the world until you had the look and feel you want. Then once the textures and models and world were finalised, it would be used as the rendering context.
I think this is a place that there is enough feedback loops and supervision that with decent tools along these lines, you could 100x the efficiency of game development.
It would blow up the game industry, but also spawn a million independent one or two person studios producing some really imaginative niche experiences that could be much, much more expansive (like a AAA title) than the typical indie-studio product.
> you could 100x the efficiency of game development.
> It would blow up the game industry, but also spawn a million independent one or two person studios producing some really imaginative niche experiences that could be much, much more expansive (like a AAA title) than the typical indie-studio product.
All video games become Minecraft / Roblox / VRChat. You don't need AAA studios. People can make and share their own games with friends.
Scary realization: YouTube becomes YouGame and Google wins the Internet forever.
I haven’t checked on Roblox recently, but afaik it doesn’t really allow complete creative freedom or the ability to have a picture and say “make the world look like this, and make the character textures match the vibe” and have it happen. Don’t they still have a unified world experience or can you really customize things that deeply now?
Can you make a basically indistinguishable copy of other games in Roblox? If so, that’s pretty cool, even without AI integration.
Roblox can't beat Google in AI. Roblox has network effects with users, but on an old school tech platform where users can't magic things into existence.
I've seen Roblox's creative tools, even their GenAI tools, but they're bolted on. It's the steam powered horse problem.
I think this puts Epic Games, Nintendo, and the whole lot into a very tough spot if this tech takes off.
I don't see how Unreal Engine, with its voluminous and labyrinthine tomes of impenetrable legacy C++ code, survives this. Unreal Engine is a mess, gamers are unhappy about it, and it's a PITA to develop with. I certainly hate working with it.
Innovator's Dilemma fast approaching the entire gaming industry and they don't even see it coming it's happening so fast.
Exciting that building games could become as easy as having the idea itself. I'm imagining something like VRChat or Roblox or Fortnite, but where new things are simply spoken into existence.
It's absolutely terrifying that Google has this much power.
I played around with Diamond WM on my 3090 machine. I also ran fast SDXL-turbo and LCM models with ControlNets paired with a 3D game prototype I threw together. The results were very compelling, and I was just one person hacking things together.
This is 100% going to happen on-device. It's just a matter of time.
It is plausible to run a full simulation the old fashioned way and realtime render it with a diffusion model.
It is not currently, or near term, realistic to make a video game where a meaningful portion of the simulation is part of the model.
There will probably be a few interactive model-first experiences. But they’ll be popular as short novelties not meaningful or long experiences.
A simple question to consider is how would you adjust a set of simple tunables in a model-first simulator? For example giving the player more health, making enemies deal 2x damage, increasing move speed, etc etc. You can not.
This is very encouraging progress, and probably what Demis was teasing [1] last month. A few speculations on technical details based on staring at the released clips:
1. You can see fine textures "jump" every 4 frames - which means they're most likely using a 4x-temporal-downscaling VAE with at least 4-frame interaction latency (unless the VAE is also control-conditional). Unfortunately I didn't see any real-time footage to confirm the latency (at one point they intercut screen recordings with "fingers on keyboard" b-roll? hmm).
2. There's some 16x16 spatial blocking during fast motion which could mean 16x16 spatial downscaling in the VAE. Combined with 1, this would mean 24x1280x720/(4x16x16) = 21,600 tokens per second, or around 1.3 million tokens per minute.
3. The first frame of each clip looks a bit sharper and less videogamey than later stationary frames, which suggests this is could be a combination of text-to-image + image-to-world system (where the t2i system is trained on general data but the i2w system is finetuned on game data with labeled controls). Noticeable in e.g. the dirt/textures in [2]. I still noticed some trend towards more contrast/saturation over time, but it's not as bad as in other autoregressive video models I've seen.
Regarding latency, I found a live video of gameplay here [1] and it looks like closer to 1.1s keypress-to-photon latency (33 frames @ 30fps) based on when the onscreen keys start lighting up vs when the camera starts moving. This writeup [2] from someone who tried the Genie 3 research preview mentions that "while there is some control lag, I was told that this is due to the infrastructure used to serve the model rather than the model itself" so a lot of this latency may be added by their client/server streaming setup.
You know that thing in anxiety dreams where you feel very uncoordinated and your attempts to manipulate your surroundings result in unpredictable consequences? Like you try to slam on the brake pedal but your car doesn’t slow down, or you’re trying to get a leash on your dog to lead it out of a dangerous situation and you keep failing to hook it on the collar? Maybe that’s extra latency because your brain is trying to render the environment at the same time as it is acting.
Firstly it can render environments in detail. I'm (mostly) aphantasic even in dreams, so this wasn't obvious to me. But most people literally get visual renderings in their mind.
Secondly, it's fairly clear now that our sensory inputs are not being experienced as sensory inputs. We experience a reconstruction. Obvious basic sign of this is that we fill in the gap in vision where the optic nerve is. But generally, we're making an integrated world model all the time out of the senses, and are conscious of that world model.
You're right though, both the above are rendering the experience and can take shortcuts for that. It's sufficiently detailed in each case though that it kinda is rendering the world too, in some sense.
> I found a live video of gameplay here [1] and it looks like closer to 1.1s keypress-to-photon latency (33 frames @ 30fps) based on when the onscreen keys start lighting up vs when the camera starts moving.
Really impressive... but wow this is light on details.
While I don't fully align with the sentiment of other commenters that this is meaningless unless you can go hands on... it is crazy to think of how different this announcement is than a few years ago when this would be accompanied by an actual paper that shared the research.
Instead... we get this thing that has a few aspects of a paper - authors, demos, a bibtex citation(!) - but none of the actual research shared.
I was discussing with a friend that my biggest concern with AI right now is not that it isn't capable of doing things... but that we switched from research/academic mode to full value extraction so fast that we are way out over our skis in terms of what is being promised, which, in the realm of exciting new field of academic research is pretty low-stakes all things considered... to being terrifying when we bet policy and economics on it.
To be clear, I am not against commercialization, but the dissonance of this product announcement made to look like research written in this way at the same time that one of the preeminent mathematicians writing about how our shift in funding of real academic research is having real, serious impact is... uh... not confidence inspiring for the long term.
I wish they would share more about how it works. Maybe a reseach paper for once? we didn't even get a technical report.
From my best guess: it's a video generation model like the ones we already head. But they condition inputs (movement direction, viewangle). Perhaps they aren't relative inputs but absolute and there is a bit of state simulation going on? [although some demo videos show physics interactions like bumping against objects - so that might be unlikely, or maybe it's 2D and the up axis is generated??].
It's clearly trained on a game engine as I can see screenspace reflection artefacts being learned. They also train on photoscans/splats... some non realistic elements look significantly lower fidelity too..
some inconsistencies I have noticed in the demo videos:
- wingsuit discollcusions are lower fidelity (maybe initialized by high resolution image?)
- garden demo has different "geometry" for each variation, look at the 2nd hose only existing in one version (new "geometry" is made up when first looked at, not beforehand).
- school demo has half a caroutside the window? and a suspiciously repeating pattern (infinite loop patterns are common in transformer models that lack parameters, so they can scale this even more! also might be greedy sampling for stability)
- museum scene has odd reflection in the amethyst box, like the rear mammoth doesn't have reflections on the right most side of the box before it's shown through the box. The tusk reflection just pops in. This isn't fresnel effect.
I feel after the 2017 transformer paper, and its impact on current state of AI, and google stocks, it seems Google is much more hesistant to keep things under their wings for now. Sadly, so.
I'm still struggling to imagine a world where predicting the next pixel wins over over building a deterministic thing that is then ran.
Eg: Using AI to generate textures, wire models, motion sequences which themselves sum up to something that local graphics card can then render into a scene.
I'm very much not an expert in this space, but to me it seems if you do that, then you can tweak the wire model, the texture, move the camera to wherever you want in the scene etc.
At some point it will be computationally cheaper to predict the next pixel than to classically render the scene, when talking about scenes beyond a certain graphical fidelity.
The model can infinitely zoom in to some surface and depict(/predict) what would really be there. Trying to do so via classical rendering introduces many technical challenges
I imagine a future where the “high level” stuff in the environment is pre defined by a human (with or without assistance from AI), and then AI sort of fills in the blanks on the fly.
So for example, a game designer might tell the AI the floor is made of mud, but won’t tell the AI what it looks like if the player decides to dig a 10 ft hole in the mud, or how difficult it is to dig, or what the mud sounds like when thrown out of the hole, or what a certain NPC might say when thrown down the hole, etc.
I'll try! Let's consider a tree blowing in the wind:
To classically render this in any realistic fashion, it quickly gets complex. Between the physics simulation (rather involved) and the number of triangles (trees have many branches and leaves), you're going to be doing a lot of math.
I'll emphasize "realistic" - sure, we can real-time render trees in 2025 that look.. ok. However, take more than a second to glance at it and you will quickly start to see where we have made compromises to the tree's fidelity to ensure it renders at an adequate speed on contemporary hardware.
Now consider a world model trained on enough tree footage that it has gained an "intuition" about how trees look and behave. This world model doesn't need to actually simulate the entire tree to get it to look decent.. it can instead directly output the pixels that "make sense". Much like a human brain can "simulate" the movement of an object through space without expending much energy - we do it via prediction based on a lot of training data, not by accurately crunching a bunch of numbers.
That's just one tree, though - the real world has a lot of fidelity to it. Fidelity that would be extremely expensive to simulate to get a properly realistic output on the other side.
Instead we can use these models which have an intuition for how things aught to look. They can skip the simulation and just give you the end result that looks passable because it's based on predictions informed by real-world data.
Don't you think a sufficiently advanced model will end up emulating what normal 3D engines already do mathematically? At least for the rendering part, I don't see you can "compress" the meaning behind light interaction without ending up with a somewhat latent representation of the rendering equation
> At some point it will be computationally cheaper to predict the next pixel than to classically render the scene,
This is already happening to some extent, some games struggle to reach 60 FPS at 4K resolution with maximum graphics settings using traditional rasterization alone, so technologies like DLSS 3 frame generation are used to improve performance.
Instead of the binary of traditional games vs AI, it's worth thinking more about hybrids.
You could have a stripped down traditional game engine, but without any rendering, that gives a richer set of actions to the neural net. Along with some asset hints, story, a database (player/environment state) the AI can interact with, etc. The engine also provides bounds and constraints.
Basically, we need to work out the new boundary between engine and AI. Right now it's "upsample and interpolate frames", but as AI gets better, what does that boundary become?
Pass-through AR that does more than add a few things to the scene or very crude relighting from a scan mesh. Classic methods aren't great at it and tend to feel like you are just sticking some of of place objects on top of things. apple gives a lighting estimate to make it sit better in the scene, but may already be using some AI for that (I think it's just a cube map or a spherical harmonic based thing). But we'll want to do much more than matching lighting.
Another linguistic devastation. A "world model" is in epistemology the content of a representation of states of thing - all states of things, facts and logic.
This use of the expression "world model" seems to be a reduction. An that's too bad, because we needed the idea in its good form to speak about what neural networks contain, in this LLM sub-era.
Like the new widespread sloppy use of the expression "AI", this does not contribute to collective mental clarity.
And made hands 10x worse. Now hands are good, text is good, image is good, so we’ll have to play where’s Waldo all over again trying to find the flaw. It’s going to eventually get to a point where it’s one of those infinite zoom videos where the AI watermark is the size of 1/3rd of a pixel.
What I’d really love to see more of is augmented video. Like, the stormtrooper vlogs. Runway has some good stuff but man is it all expensive.
someone mentioned physics. Which might be an interesting conundrum because an important characteristic of games is that some part of them is both novel and unrealistic. (They're less fun if they're too real)
It depends on the genre. Simulation “games” tend to love realism of simulation while providing an accelerated time. Others like you said, are more fun with plausible physics or physics that bend the rules a little bit. Sometimes, a game just about funky physics becomes a hit - Goat Simulator.
Walking/Running/Steps have already been solved pretty well with NN’s, but simulation of vehicle engines and vehicle physics have not. Not to my knowledge. I suspect iRacing would be extremely interested in such a model.
edit
I take it back, PINN’s are a thing and now I have a new rabbit hole…
I wouldn't say that the text problem has been fully fixed. It has certainly gotten a lot better, but even gpt-image-1 still fails occasionally when generating text.
This is revolutionary. I mean, we already could see this coming, but now it's here. With limitations, but this is the beginning.
In game engines it's the engineers, the software developers who make sure triangles are at the perfect location, mapping to the correct pixels, but this here, this is now like a drawing made by a computer, frame by frame, with no triangles computed.
Everyone is in agreement, this is impressive stuff. Mind blowing, even. But have the good people at Google decided why exactly we need to build the torment nexus?
Very cool! I've done research on reinforcement/imitation learning in world models. A great intro to these ideas is here: https://worldmodels.github.io/
I'm most excited for when these methods will make a meaningful difference in robotics. RL is still not quite there for long-horizon, sparse reward tasks in non-zero-sum environments, even with a perfect simulator; e.g. an assistant which books travel for you. Pay attention to when virtual agents start to really work well as a leading signal for this. Virtual agents are strictly easier than physical ones.
Compounding on that, mismatches between the simulated dynamics and real dynamics make the problem harder (sim2real problem). Although with domain randomization and online corrections (control loop, search) this is less of an issue these days.
Multi-scale effects are also tricky: the characteristic temporal length scale for many actions in robotics can be quite different from the temporal scale of the task (e.g. manipulating ingredients to cook a meal). Locomotion was solved first because it's periodic imo.
Check out PufferAI if you're scale-pilled for RL: just do RL bigger, better, get the basics right. Check out Physical Intelligence for the same in robotics, with a more imitation/offline RL feel.
Advances in generative AI are making me progressively more and more depressive.
Creativity is taken from us at exponential rate. And I don't buy argument from people who are saying they are excited to live in this age. I can get that if that technology stopped at current state and remained to be just tools for our creative endeavours, but it doesn't seem to be an endgame here. Instead it aims to be a complete replacement.
Granted, you can say "you still can play musical instruments/paint pictures/etc for yourself", but I don't think there was ever a period of time where creative works were just created for sake of itself rather for sharing it with others at masse.
So what is final state here for us? Return to menial not-yet-automated work? And when this would be eventually automated, what's left? Plug our brains to personalized autogenerated worlds that are tailored to trigger related neuronal circuitry for producing ever increasing dopamine levels and finally burn our brains out (which is arguably already happening with tiktok-style leasure)? And how you are supposed to pay for that, if all work is automated? How economics of that is supposed to work?
Looks like a pretty decent explanation of Fermi paradox. No-one would know how technology works, there are no easily available resources left to make use of simpler tech and planet is littered to the point of no return.
How to even find the value in living given all of that?
> I don't think there was ever a period of time where creative works were just created for sake of itself rather for sharing it with others at masse.
Numerous famous writers, painters, artists, etc counter this idea, Kafka being a notable example, whose significant works only came to light after his passing and against his will. This doesn't take away from the rest of your discussion point, but art always has and always will also exist solely for its own sake.
> I don't buy argument from people who are saying they are excited to live in this age
What argument is required for excitement? Excitement is a feeling not a rational act. It comes from optimism and imagination. There is no argument for optimism. There is often little reason in imagination.
> How to even find the value in living given all of that?
You might have heard of the Bhagavad Gita, a 2000+ year old spiritual text. It details a conversation between a warrior prince and a manifestation of God. The warrior prince is facing a very difficult battle and he is having doubts justifying any action in the face of the decisions he has to make. He is begging this manifestation of God to give him good reasons to act, good reasons not just to throw his weapons down, give away all his possessions and sit in a cave somewhere.
There are no definite answers in the text, just meditations on the question. Why should we act when the result is ultimately pointless, we will all die, people will forget you, situations will be resolved with or without you, etc.
This isn't some new question that LLMs are forcing us to confront. LLMs are just providing us a new reason to ask the same age-old questions we have been facing for as long as writing has existed.
Today physical world is largely mechanized, we rarely walk, run lift heavy things for survival. So we grow fat and weak unless we exercise. Tomorrow vast majority of us will never think, create, investigate for earning a living. So we will get dumb and dumber over time. A small minority of us will keep polishing their intellect but will never be smarter than machines just like the best athletes of today can't outrun machines.
This is surprisingly a great analogy because millions of people still run every week for their own benefit (physical and mental health, social connection, etc).
I wonder if mental exercises will move to the same category? Not necessarily a way to earn money, but something everybody does as a way of flourishing as a human.
I don't know... There are plenty of otherwise capable adults who just get home from work and watch TV. They either never, or extremely rarely, indulge in hobbies, go see a concert, or even go out to meet others. Not that TV can't be art and challenge us but lets be honest, 99% of it is not that.
We already live in a world where a vast library of songs by musicians who play much better than you are readily available on YouTube and Spotify. This seems like more of the same?
I like living in a world where I know that people who have spent actually time on nurturing a talent get rewarded for doing so, even if that talent is not something I will ever be good at.
I don't want to live in a world where these things are generated cheaply and easily for the profit of a very select few group of people.
I know the world doesn't work like I described in the top paragraph. But it's a lot closer to it than the bottom.
It's hard to see how there will be room for profit as this all advances
There will be two classes of media:
- Generated, consumed en-masse by uncreative, uninspired individuals looking for cheap thrill
- Human created, consumed by discerning individuals seeking out real human talent and expression. Valuing it based merely on the knowledge that a biological brain produced (or helped produce) it.
I tend to suspect that the latter will grow in value, not diminish, as time progresses
It seems to me that you’re describing Hollywood? Admittedly, there are big budget productions, but Hollywood is all about fakery, it’s cheap for the consumer, and there’s a lot of audience-pleasing dreck.
There’s no bright line between computer and human-created video - computer tools are used everywhere.
> I like living in a world where I know that people who have spent actually time on nurturing a talent get rewarded for doing so, even if that talent is not something I will ever be good at.
Rewarded how? 99.99% of people who do things like sports or artistic like writing never get "rewarded for doing so", at least in the way I imagine you mean the phrase. The reward is usually the experience itself. When someone picks up a ball or an instrument, they don't do so for some material reward.
Why should anyone be rewarded materially for something like this? Why are you so hung up on the <0.001% that can actually make some money now having to enjoy the activity more as a hobby than a profession.
99.99% of people, really? You think there isn't a huge swath of the economy that are made up of professional writers, artists, musicians, graphic designers, and all the other creative professionals that the producers of these models aim to replicate the skills of?
Why am I so "hung up" on the livelihood of these people?
Doing art is a Hobby is a good in and of itself. I did not say otherwise. But when I see a movie, when I listen to a song, I want to appreciate the integrity and talent of the people that wrote them. I want them to get paid for that enjoyment. I don't think that's bizarre.
You can still makes movies , music etc. But now with better tools. Just accept the new reality and try to play this new level. The old won't come back. Its a waste of time to complain and feel frustrated. There are plenty of opportunities to express your creativity.
I could see that theater and live music (especially performed on acoustic instruments) become hyper popular because it'll be the only talent worth paying to see when everything else is 'cheaply' made.
> I like living in a world where I know that people who have spent actually time on nurturing a talent get rewarded for doing so, even if that talent is not something I will ever be good at.
That world has only existed for the last hundred or so years, and the talent is usually brutally exploited by people whose main talent is parasitism. Only a tiny percentage of people who sell creative works can make a living out of it; the living to be made is in buying their works at a premium, bundling them, and reselling them, while offloading almost all of the risk to the creative as an "advance."
Then you're left in a situation where both the buyer of art and the creator of art are desperate to pander to the largest audience possible because everybody is leveraged. It's a dogshit world that creates dogshit art.
Yes but I don't want to hear some anonymous background music.
A better example would be Spotify replacing artist-made music recommandations with low-quality alternatives, to reduce what it pays to artists. Everyone except Spotify loses in this scenario.
My prediction is that personal generation is going to be niche forever, for purely social reasons. The demand for fandoms and fan communities seems to be essentially unlimited. Big artists have big fandoms, tiny ones have tiny fandoms, but none of that works with personalized generations.
Well, maybe. But there are overwhelmingly large numbers of people who want to be in a fandom, and that means being fans of some shared thing. Maybe that shared thing will be AI generated, but it won't be a world of solipsists.
I think what the person you’re responding to meant was that you can generate a fandom for the content that was generated for you. So, you can get the feeling of being in a fandom despite there being no actual other humans that know what you’re talking about.
Communities around fictional universes are already fractured and shrinking in member size because of the sheer number of algorithmically targeted universes available.
Water cooler talk about what happened this week in M.A.S.H. or Friends is extinct.
Worse, in the long run even community may be synthesized. If a friend is meat or if they're silicon (or even carbon fiber!), does it matter if you can't tell the difference? It might to pre-modern boomers like me and you.
I think things will look a lot more like Vinge's Rainbows End than everyone burrowing into their own personal algoentertainment. I can't speak for GenZ but when D&D can sell out Madison Square Garden, there doesn't seem to be any softening in people's interest in fandom.
Virtual influencers might be a big thing, Hatsune Miku has lots of fans. But it's still a shared fandom.
But availability of new works shall change once the floor of how popular you need to be to survive off of art will change and it will, since not everyone will care. Taylor Swift will be fine either way, but it's not about her.
I don't understand your argument at all. I've made hundreds of songs in my life that I haven't shared with anyone and so have all other musicians I know. The act of creating is separate from finding or having an audience. In fact, I would say that the complete opposite of what you say is true.
And even so, music production has been a constant evolution of replacing prior technologies and making it easier to get into. It used to be gatekept by expensive hardware.
You seem to forget that most artists enjoy it but due to the structure of our society are forced to either give it up for most of their waking life to earn money or attempt to market their art to the masses to make money. This AI stuff only makes it harder for artists to make any kind of living off of their work.
While there are plenty of cases where good artists make most of their money from the art, there are plenty of other cases where good artists have a 'real job' on the side.
We can dream bigger: when music, images, video and 3d assets are far easier then treat them as primitives.
We can use these to create entire virtual worlds, games, software that incorporates these, and to incorporate creativity and media into infinitely more situations in real life.
We can create massive installations that are not a single image but an endless video with endless music, and then our hand turns to stabilizing and styling and aestheticizing those exactly in line with our (the artist's) preferences.
Romanticizing the idea that picking at a guitar is somehow 'more creative' than using a DAW to create incredibly complex and layered and beautiful music is the same thing that's happening here, even if the primitives seem 'scarier' and 'bigger'.
Plus, there are many situations in life that would be made infinitely more human by the introduction of our collective work in designing our aesthetic and putting it into the world, and encoding it into models. Installations and physical spaces can absolutely be more beautiful if we can produce more, taking the aesthetic(s) that we've built so far and making them dynamic to spaces.
Also for learning: as a young person learning to draw and sing and play music and so many other things, I would have tremendously appreciated the ability to generate and follow subtle, personalized generation - to take a photo of a scene in front of me and have the AI first sketch it loosely so that I can copy it, then escalate and escalate until I can do something bigger.
I'm one of those excited people! We haven't lost anything with this new technology, only gained.
The way I see it, most people aren't creative. And the people who are creatives are mostly creating for the love of it. Most books that are published are read exclusively by the friends and family of the author. Most musicians, most stand-up comedians, most artist get to show off their works for small groups of people and make no money doing so. But they do it anyway. I draw terrible portraits, make little inventions and sometimes I build something for the home, knowing full well that I do these things for my own enjoyment and whatever ego boost I get from showing these things off to people I know.
I'm doing a marathon later and I've been working my ass off for the prospect of crossing the finishing line as number four thousand and something, and I'll do it again next year.
Greed makes no sense in a truly post scarcity society. There is no scarcity from which to take in a zero sum way from another.
Status is the real issue. Humans use status to select sexually, and the display is both competitive and comparative. It doesnt matter absolutely how many pants you have, only that you have more and better than your competition.
I actually think this thing is baked into our DNA and until sex itself is saturated (if there is such a thing), or DNA is altered, we will continue to have a however subtle form of competition undergirding all interactions.
That's only if greed is applied exclusively to real wealth. The reality is that greed is also applied to second, third, and even fourth order signs[1].
There is no end to semiotics, and therefore no end to greed. In this case, scarcity is artificial; created only via socially imposed monopoly. If we truly want a post-scarcity society, then we must abolish copyright.
Imagine you discover tech that ushers in a post-scarcity society. Let's say you make a replicator that can make anything from a bit of dirt.
Greedy people wouldn't even think about sharing this tech with the world, even though they could literally end world hunger. They will hoard it, and use it to make themselves more powerful and to kill anyone else who looks like they might discover it.
Yes, it'd be difficult. I have some faith that once things escalate far enough the people wielding the weapons are unwilling to murder their countrymen en masse.
Luigi Mangione has shown that all it takes is one person in the right time and place to remove some evil from the world.
It needs to happen, at minimum, before drones can reliably maintain themselves and kill dissidents in the street. At that point even if the human police and soldiers become disloyal it'll be too late; a society of two types of people, the one guy with access to issue prompts, and everyone else.
"Granted, you can say "you still can play musical instruments/paint pictures/etc for yourself", but I don't think there was ever a period of time where creative works were just created for sake of itself rather for sharing it with others at masse."
I sit and play guitar by myself all the time, I play for nobody but myself, and I enjoy it a lot. Your argument is absurd.
Relax, contrary to expectations of the autistic misanthropes that abound here on this bubble, most of this is just computationally expensive slope and far, very far from even having the chance to become real AGI.
> but I don't think there was ever a period of time where creative works were just created for sake of itself rather for sharing it with others at masse
Kids do it all the time.
> So what is final state here for us?
Something I haven't seen discussed too much is taste - human tastes change based on what has come before. What we will care about tomorrow is not what we care about today.
It seems plausible to me that generative AI could get higher and higher quality without really touching how human tastes changes. That would leave a lot of room for human creativity IMO - we have shared experience in a changing world that seems very hard to capture with data.
> Granted, you can say "you still can play musical instruments/paint pictures/etc for yourself", but I don't think there was ever a period of time where creative works were just created for sake of itself rather for sharing it with others at masse.
There's a whole host of "art" that has been created by people - sometimes for themselves, sometimes for a select few friends - which had little purpose beyond that creation[1]. Some people create art because they simply have to create art - for pleasure, for therapy, for whatever[2]. For many, the act of creation was far more important than the act of distribution[3].
For me, my obsession is constructing worlds, maps, societies and languages that will almost certainly die with me. And that's fine. When I feel the compulsion, I'll work on my constructions for a while, until the compulsion passes - just as I have done (on and off) for the past 50 years. If the world really needs to know about me, then it can learn more than it probably wants to know through my poetry.
You're quite the pessimist. I think the arts would do well to look at sports as a glimpse of their future. Machines are faster and stronger than people, but that hasn't had any impact on sports at all. Nobody's tuning in to the robot olympics.
Agreed that no one wants to watch shotput when the ball is launched out of a cannon, but people might be interested when the robots competing are anthropomorphs.
Machine learning as it is needs human data and input to progress further.
Synthetic data can be useful until a certain point, but you can’t expect to have a better model on synthetic data alone indefinitely.
The moat of GDM here is YouTube. That have a bazillion of gameplay and whatever videos. But here it is.
The downside I can see is that most people will stop to publish content online for free since this companies have absolutely no respect whatsoever for the humans that created the data they use.
I've never understood this argument... The real world is an unbounded training set that its cheap to observe with readily available sensors that have existed for almost a century.
In my opinion, what humans need, crave, chase, is novelty. Just look at how phobic we are of boredom. I believe creativity is part of the chasing of novelty, or the allaying of boredom. I studied film making in my 20s when the shift to digital happened, and I was the first cohort through the first digital film program in my country. When new ways to create become available, the people who struggle are often the ones who are unable to adapt their mindset to the new creative mediums and don't think "what is new to be done here". Many people when I graduated thought I was totally nuts of not owning or using an analogue camera, so many reasons, oh you can't trust the CF cards, oh the HDR will never get there, oh the shutter is too slow. This is just a version of that imo. I think AI and robotics are going all the way to the end, I'm trying to adjust my old man brain to the new world the best I can, feel blessed to have been part of a version of this before.
I agree. While I love AI, advancements must be responsible. We are made to be social beings and giving more and more of lives over to AI takes us away from the fundamental need to draw creativity, inspiration, and connection from other people. Thoughts?
I think we have a long way to go yet. Humanity is still in the early stages of its tech tree with so many unknown and unsolved problems. If ASI does happen and solves literally everything, we will be in a position that is completely alien to what we have right now.
> How to even find the value in living given all of that?
I feel like a lot of AI angst comes from people who place their self-worth and value on external validation. There is value in simply existing and doing what you want to do even if nobody else wants it.
> I feel like a lot of AI angst comes from people who place their self-worth and value on external validation. There is value in simply existing and doing what you want to do even if nobody else wants it.
I agree on this point, and have come to that conclusion myself regarding my own AI angst. However that doesn't solve the economic issues that arise from this technology. As large swathes of the workforce becomes replaced (something that, in my opinion, is rapidly approaching), how do we organise society so that everyone can survive / thrive?
As far as I can see there is very little impetus behind tackling such issues, compared to the forces pushing this tech forward so rapidly.
People still value Amish furniture or woodworking despite Ikea existing. I love that if I want a cheap chair made of cardboard and glue that I can find something to satisfy that need; but I still buy nice furniture when I can.
AI creations are analogous. I've seen some cool AI stuff, but it definitely doesn't replace the real "organic" art one finds.
Man, same here. I was initially a massive AI evangelist up until about a year ago, now I just feel sad for some reason - and I don’t want to feel sad, I’m a technologist at heart and I’ve been thrilled by every advance since I was born. I feel like some sad old boomer yelling at clouds and I’m not even 30 yet.
My only hope is this: I think the depression is telling us something real, we are collectively mourning what we see as the loss of our humanity and our meaning. We are resilient creatures though, and hopefully just like the ozone layer, junk food, and even the increasing rejections of social media and screen time, we will navigate it and reclaim what’s important to us. It might take some pain first though.
Be comforted by the fact that no matter how good the AI gets, people crave human connection. Just like AI can generate music there is an uncanny valley effect where you quickly deduce there's no true humanity behind any of it, and ultimately undervalue it. At best you can have something like Minecraft or Dwarf Fortress where the generated worlds CAN be inspiring to a degree, but that is because the rules around generation are incredibly intricate and, ultimately, human.
Yes, AI can make music that sounds decent and lyrics that rhyme and can even be clever. But listen to a couple songs and your brain quickly spots the patterns. Maybe AI gets there some day, but the uncanny valley seems to be quite a chasm - and anything that approaches the other side seems to do so by piling lots of human intention along the way.
What's interesting to me along these lines is I assume most of the companies funding the research are targeting the "creative" media in terms of image generation, music generation, avatars, speach, etc.
I can understand it's very interesting from a researcher's point-of-view (I'm a software dev who's worked adjacent to some ML researchers doing pipeline stuff to integrate models into software), but at the same time: Where are the robots to do menial work like clean toilets, kitchens, homes, etc?
I assume the funding isn't there? Or maybe it's much less exciting to research diffusion networks for image generation that working out algorithms for the best way to clean toilets :)
There are companies out there working on those problems as well. How the funding climate for them are. I don't know. But the market for smart robots, should be gigantic. So there must be some.
Keep in mind that what is easy, and hard for a human, which is the result of billions of years of evolution. Isn't necessary the same things that are hard or easy for our technologies.
Or replacing CEOs, investors, bankers? I would have thought those would be easier to replace than creating robots to clean or replacing artists, or even developers. Maybe I am wrong?
All these jobs are more who you know not what you know. The social network of these people is often an integral part of the work, so they are in a sense much safer than programmers, accountants and artists.
robotics is difficult and since transformers are just next word predictors they can't actually help us design those robots
:)
also the billionaires have help so they don't give a shit if the menial stuff is automated or not. throw in a little misogyny by and large too; I saw a LinkedIn Lunatic in the wild (some C-level) saying laundry is already automated because laundry machines exist
fucking.. tell me you don't ever do the laundry without telling me. That guy's poor wife.
I don't know how on Earth people can think like this. Most people can find "value" in a slice of pizza. It doesn't even have to be a good pizza.
Or kittens and puppies. Do you think there won't be kittens and puppies?
And that's putting aside all the obvious space-exploration stuff that will probably be more interesting than anything the previous 100 billion humans ever saw.
This is digressing a bit, but I don't buy that space exploration will be more interesting than anything you can see on Earth.
Aren't other planets / moons etc. basically just barren deserts of rock and dust? Once you get over the novelty of it, it will basically just be the shittiest and most uncomfortable place you've ever been.
In theory, creativity is an infinite space. As technology advances it allows humans to explore more and more complex things; take the advancement of music as an example, synths, loops etc.
If humans are not stretched to their limits, and are still able to be creative, then the tools will help us find our way through this infinite space.
AI will never be able to generate everything for us, because that means it will need infinite computation.
It doesn't need to generate everything. It only needs to be marginally better or more efficient than a human for it to start generating everything humans need when needed.
Edit: left the page open for a while before responding, and the other person responded with basically the same thing within that time.
If human need drives the creative process, then there will always be a human in the loop. Instead, each human becomes the “random seed” that initialises the process based on their own unique make-up. This is only different from how things work now, in that humans are also creating the artefact.
Similar to how synths meant we no longer need to play an instruments by plucking strings, it hasn’t affected the higher level creativity of creating music, only expanded it.
AI will not be able to generate everything for us. Just the things that are able to be explored by humans and hopefully a tad bit more. AI is already more creative than humans by a lot of measures.
Depends what you mean by creativity. In some ways, AI is not creative at all, everything is generated by mapping text to visuals using diffusion modelling via a shared latent space. It has no agency or creative thought of its own.
Humans have demonstrated time and again, even things beyond our experience can be explored by us; quantum mechanics for example. Humans find a way to map very complex subjects to our own experience using analogy. Maybe AI can help us go further by allowing us to do this on even more complex ideas.
> So what is final state here for us? Return to menial not-yet-automated work? And when this would be eventually automated, what's left? Plug our brains to personalized autogenerated worlds that are tailored to trigger related neuronal circuitry for producing ever increasing dopamine levels and finally burn our brains out (which is arguably already happening with tiktok-style leasure)? And how you are supposed to pay for that, if all work is automated? How economics of that is supposed to work?
Wow. What a picture! Here's an optimistic take, fwiw: Whenever we have had a paradigm shift in our ability to process information, we have grappled with it by shifting to higher-level tasks.
We tend to "invent" new work as we grapple with the technology. The job of a UX designer did not exist in 1970s (at least not as a separate category employing 1000s of people; now I want to be careful this is HN, so there might be someone on here who was doing that in the 70s!).
And there is capitalism -- if everyone has access to the best-in-class model, then no one has true edge in a competition. That is not a state that capitalism likes. The economics _will_ ultimately kick in. We just need this recent S-curve to settle for a bit.
> Whenever we have had a paradigm shift in our ability to process information, we have grappled with it by shifting to higher-level tasks.
People say this all the time, but I think it's a very short-sighted view. It really begs the question: do you believe that there are tasks that exist which a human can do, but we could not train an AI to also do? The difference between AI and any other technological advancement is that AI is (or promises to be, and I have no reason to believe otherwise) a tool that can be adapted to any task. I don't think analogies to history really apply here.
It's not a new problem (for individuals), though perhaps at an unprecedented scale (so, maybe a new problem for civilization). I'm sure there were black smiths that felt they had lost their meaning when they were replaced by industrial manufacturing.
What specific form of creative media is this supposed to replace though?
I feel like its just going to create a brand new, exciting category of entertainment.
I personally fail to see any bad precedent within this announcement.
My bets are hedged on being replaced one day, followed by a few years roughing it, to be eventually be met with something along the lines "Well damn, we really couldn't complete the entire loop on automating the automation" because frankly autocomplete will always be just that. Autocomplete.
Till then, I just learn the tools with the deepest understanding that I can muster and so far the deeper I go, the less impressed with "automated everything" I become, because it isn't really going to be capable of doing anything people are going to find interesting when the creativity well dries up.
Do you honestly believe that human minds won't be overtaken within the century?
I'll concede that it might take even longer to get full artificial human capabilities (robust, selfrepairing, selfreplicating, adaptable), but the writing is on the wall.
Even in the very best case that I see (non-malicious AI with a soft practical ceiling not too far beyond human capabilities) poses giant challenges for our whole society, just in ressource allocation alone (because people, as workers, become practically worthless, undermining our whole system completely).
> I don't think there was ever a period of time where creative works were just created for sake of itself rather for sharing it with others at masse.
You don't think there was ever a time without a mass media culture? Plenty of people have furniture older than mass media culture. Even 20 years ago people could manage to be creative for a tiny audience of what were possibly other people doing creative things. It's only the zoomers who have never lived in a world where you never thought to consider how you could sell the song you were writing in your bedroom to the Chinese market.
It used to be that music didn't come on piano rolls, records, tapes, CDs or files. It used to be that your daughter would play music on the piano in the living room for the entire family. Even if it was music that wouldn't really sell, and wasn't perfectly played, people somehow managed to enjoy it. It was not a situation that AI could destroy. If anything, AI could assist.
I share your feelings. Also couple that with a populist and cynical political climate that can’t create effective regulations even if it wanted, and that by its very appetite for scale AI thrives at the hands of the few that can feed it and you get something quite bleak.
My only hope is that we could have created 100k nukes of monstrous yields but collectively decided not to. We instead created 10k smaller ones. We could have destroyed ourselves long ago but managed to avoid it.
Automation only leads to more labor if we allow that employer relation to dictate so. Automation affords leisure time (for everything besides labor that life has to offer, including optional labor-like pursuits) but it’s currently unevenly distributed who gets to benefit from that
You need to read Brave New World. Already have all that figured out.
Work is fundamental part of society and will never be eliminated, regardless of its utility/usefulness. The cast/class system determines the type of work. The amount (time) of work is set as it was discovered additional leisure and to reduce it does not improve individuals happiness.
Don’t be mad bro. Seriously. Every single person working on a film has creative input, not just someone hand painting a backdrop. You have an immense number of tools available to be creative with now. This is a great thing!
For too long has humanity been collectively submerged into this hyper-consumption of the arts. We, our parents and our grandparents have been getting bombarded by some or the other form of artificial dopamine sweets - from videos to reels to xeets to "news" to ads to tunes to mainstream media - every second of the day, every single day. The kind of media consumption we have every day is something our forefathers would have been overwhelmed by within an hour. It is not natural.
This complete cheapening of the arts is finally giving us a chance to shed off this load for good.
The main challenge over the next decade as all our media channels are flooded with generated media will become curation. We desperately need ways to filter human-created content from generated content. Not just for the sake of preserving art, but for avoiding societal collapse from disinformation, which is a much more direct and closer threat. Hell, we've been living with the consequences of mass disinformation for the past decade, but automated and much more believable campaigns flooding our communication platforms will drastically lower the signal-to-noise ratio. We're currently unable to even imagine the consequences of that, and are far from being prepared for it.
This tech needs strict regulation on a global scale. Anyone against this is either personally invested in it, or is ignorant of its dangers.
Genuinely technically impressive, but I have a weird issue with calling these world simulator models. To me, they're video game simulator models.
I've only ever seen demos of these models where things happen from a first-person or 3rd-person perspective, often in the sort of context where you are controlling some sort of playable avatar. I've never seen a demo where they prompted a model to simulate a forest ecology and it simulated the complex interplay of life.
Hence, it feels like a video game simulator, or put another way, a simulator of a simulator of a world model.
Also, to drive my point further home, in one of the demos they were operating a jetski during a festival. If the jetski bumps into a small Chinese lantern, it will move the lantern. Impressive. However, when the jetski bumped into some sort of floating structure the structure itself was completely unaffected while the jetski simply stopped moving.
This is a pretty clear example of video game physics at work. In the real world, both the jetski and floating structure would be much more affected by a collision, but in the context of video game physics such an interaction makes sense.
So yeah, it's a video game simulator, not a world simulator.
In the "first person standing in a room" demo, it's cool to see 100% optical (trained from recorded footage from cameras) graphics, including non-rectilinear distortion of parallel lines as you'd get from a wide-angle lens and not a high-FOV game engine. But still the motion of the human protagonist and the camera angle were 100% trained on how characters and controllers work in video games.
Sure, but if you're trying to get there by training a model on video games then you're likely going to wind up inadvertently creating a video game simulator rather than a physics simulator.
I don't doubt they're trying to create a world simulator model, I just think they're inadvertently creating a video game simulator model.
Are they training only on video game data though? I would be surprised when its so easy to generate proper training data for this.
It is interesting to think about. This kind of training and model will only capture macro effects. You cannot use this to simulate what happens in a biological cell or tweak a gravity parameter and see how plants grow etc. For a true world model, you'd need to train models that can simulate at microscopic scales as well and then have it all integrated into a bigger model or something.
As an aside, I would love to see something like this for the human body. My belief is that we will only be able to truly solve human health if we have a way of simulating the human body.
It doesn't feel incredibly far off from demoscene scripts that generate mountain ranges in 10k bytes or something. It is wildly impressive but may also be wildly limited in how it accomplishes it and not extensible in a way we would like.
Consistent output and spatial coherence across each eye, maybe a couple years? But meeting head tracking accuracy and latency requirements, I’d bet decades.
There’s no way any of this tech reduces end to end latency to acceptable levels, without a massive change in hardware. We’ll probably see someone use reprojection techniques in a year or so and claim they’ve done it. But true generated pixels straight to the headset based on head tracking, is so so far away.
You don't have to do it in real time, per se. I imagine a world in which the renderer and the world generation are decoupled. For example, you could descriptively articulate what you wanted to achieve and have it generate a world, quietly do some structure from motion (or just generate the models and textures), and those those as assets in a game engine for the actual moment to moment rendering.
You'd have some "please wait in this lobby space while we generate the universe" moments, but those are easy to hide with clever design.
I think VR will come at the same time they make multiplayer. There needs to be differentiation between the world-state and the viewport. Right now, I suspect they're the same.
But once you can get N cameras looking at the same world-state, you can make them N players, or a player with 2 eyes.
It's hard to get an acceptable VR output for today's rendering engines still. In the examples provided, the movement seems to be slow and somewhat linear, which doesn't translate to head movements in VR. VR needs 2 consistent videos with much higher resolutions and low latency is a must. The feedback would still be very dependent on people's tolerance to all imperfections - some would be amazed, others would puke. That's why VR still isn't in the spotlight after all the years (I personally find it great).
This kind of announcement without an appropriate demo to verify their claims is pretty common with DeepMind at this point. They barely even discuss their limitations, so as always, this should be taken with a grain of salt.
Most of the big labs never go into their models' limitations. OpenAI does it best, despite their inveterate hype-building. Their releases always have a reasonable limitations section, usually with text/image/video examples of failures.
Google does a good job with that too usually. Which makes their last two announcements (IMO success and Genie 3) being a bit light on details is somewhat surprising.
> World models are also a key stepping stone on the path to AGI, since they make it possible to train AI agents in an unlimited curriculum of rich simulation environments.
I don't think Humans are the target market for this model, at least right now.
Sounds like the use case is creating worlds for AI agents to play in.
Can you imagine explaining to someone from the 1800s that we've created a fully generative virtual world experience and the demo was "painting a wall blue"
There's an old documentary where a film crew transported some of those "uncontacted tribe" guys to central London. Rather than being in awe of jet engines or networked internet, the tribesmen guys spent most of their time admiring house construction. "You're telling me there's METAL inside these walls?" I guess we tend to appreciate what we can understand.
The documentary is real. I don't remember a house construction scene but it's been over 10 years since I watched it. It's not an uncontacted tribe, but tribes from Vanuatu and the Solomon Islands.
Reading works of early computer scientists (mathematicians?) like Ada Lovelace or Alan Turing it seems to me that they would be a lot less surprised than some current observers. The idea of artificial mind comes up a lot and they weren't witness to 30 years of slow and uninspiring NLP developments.
What gets me is the egocentric perspective it has naturally produced from its training data, where you have the perception of a 3D 6 degrees of freedom world space around you. Once it's running at 90 frames per second and working in a meshed geometry space, this will intersect with augmented virtual XR headsets, and the metaverse will become an interaction arena for working with artificial intelligence using our physical action, our gaze, our location, and a million other points of background noise telemetry, all of which will be integrated into what we now today call context and the response will be adjusting in a useful, meaningful way what we see painted into our environment. Imagine the world as a tangible user interface.
I believe that the corpus of video data to train on with video far exceeds that of 3D data. It's also much cheaper to produce video data. So I'd expect that this is probably the quickest way forward from a current world state perspective.
Additionally, video seems like a pretty forward output shape to me - 2D image with a time component. If we were talking 3D assets and animations I wouldn't even know where to start with modeling that as input data for training. That seems really hard to model as a fixed input size problem to me.
If there was comparable 3D data available for training, I'd guess that we'd see different issues with different approaches.
A couple of examples that I could think of quickly: Using these to build games, might be easier if we could interact with the underlying "assets". Getting photorealistic results with intricate detail (e.g. hair, vegetation) might be easier with video based solutions.
If the fidelity of the video is high enough, you could use SFM to build point clouds from the generated video frames and essentially do photogrammatry on the assets from a genie video.
well actually image output is fixed and there s lots of training data. Neural networks can learn anything in their latent space so there is no need to impose 3D rendering constraints, and it s not evident that it's less efficient (for the model).
3D model rendering would be useful however for interfacing with robots.
You often view 3D games on a 2D screen. That doesn’t mean that a game is natively 2D and the 3D world is an inconvenient step that can be bypassed. Actually the opposite, the 2D representation on screen is just a projection.
In VR, for example, the same 3D scene will be rendered twice, once for each eye, from two viewpoints 10-15cm apart.
If you don’t have an internal 3D representation of the world, the AI would need to generate exactly the same scene from a very slightly different perspective for each eye, without any discrepancies or artefacts.
And that’s not even discussing physics, collisions or any form of consistent world logic that happens off-screen. Or multiplayer!
Wow, a few years ago, if you've shown me this and Genie 3, I'd assume there were at least 10 years of development between them. This looks worse than Doom.
I'm pretty sure that got debunked as a handful of overfitted examples that fall apart as soon as you leave the area that was 3D recorded. Google's new tech is on another level entirely.
I feel like this tech is a dead end. If it could instead generate 3d models which are then rendered, that would be immensely useful. Eliminates memory and playtime constraints, allows it to be embedded in applications like games. But this? Where do we go from here? Even if we eliminate all graphical issues and get latency from 1s to 0, what purpose does it serve?
I think the most likely path forward for commercialization/widespread use is to use AI as a post-processing filter for low poly games. Imagine if you could take low quality/low poly assets, run it through a game engine to add some basic lighting, then pass this through AI to get a photo-realistic image. This solves the most egregious cases of world inconsistency and still allows for creative human fine-tuning. The trick will be getting the post-processor to run at a reasonable frame rate.
Don’t we already have upscalers which are frequently used in games for this purpose? Maybe they could go further and get better but I’d expect a model specifically designed to improve the quality of an existing image to be better/more efficient at doing so than an image generation model retrofitted to this purpose.
This is beautiful. An incredible device that could expand people's view of history and science. We could create such immersive experiences with this.
I know that everyone always worries about trapping people in a simulation of reality etc. etc. but this would have blown my mind as a child. Even Riven was unbelievable to me. I spent hours in Terragen.
> To that end, we're exploring how we can make Genie 3 available to additional testers in the future.
No need to explore; I can tell you how. Release the weights to the general public so that everyone can play with it and non-Google researchers can build their work upon it.
Of course this isn't going to happen because "safety". Even telling us how many parameters this model has is "unsafe".
Modern AI wouldn't exist without Google's contributions. Yet they're a for-profit company. I'm ok with them keeping some things closed source every now and then.
I thought I was not going to see too many negative comments here, yet I was mistaken. I thought if it's not LLM, people would have a more nuanced take and could look at the research with an open mind. The examples on the website are probably cherry-picked, but progress is really nice compared to Genie 2.
It's a nice step towards gains in embodied AI. Good work, DeepMind.
A lot of the negativity around this post is about the fact that there’s no demo and no open weights, which is Correct Negativity. Like don’t get me wrong, it would be cool for something like this to exist, but I’ve generally learned not to trust AI companies’ descriptions of their models until someone (or I) can actually get their hands on it and see if it’s usable at all. A description of a model that isn’t going to be released to the public isn’t very interesting to me.
Can anyone specifically working or with expertise in this field, give even a best guest breakdown (or better) of the technology and architecture, system design or possibly even the compute requirement's of how they think this was implemented? Very curious as to how thing works and methods employed, as they are atm tight lipped generally. So kind of curious for those who are specialists in this space what they could surmise or speculate on the implementation of Genie 3
Interesting! This feels like they're trying to position it as a competitor to Nvidia's Omniverse, which is based on the Universal Scene Descriptor format as the backbone. I wonder what format world objects can be ingested into Genie in - e.g. for the manufacturing use cases mentioned.
This looks incredibly promising not just for AI research but for practical use cases in game development. Being able to generate dynamic, navigable 3D environments from text prompts could save studios hundreds of hours of manual asset design and prototyping. It could also be a game-changer for indie devs who don’t have big teams.
Another interesting angle is retrofitting existing 2D content (like videos, images, or even map data) into interactive 3D experiences. Imagine integrating something like this into Google Maps suddenly street view becomes a fully explorable 3D simulation generated from just text or limited visual data.
That would be more useful and there are some services that attempt to do that, though I don’t know of any that do it well enough that a human isn’t needed to clean up the mess.
Genie 3 isn’t that though. I don’t think it’s actually intended to be used for games at all.
First AI thing that’s made me feel a bit of derealization…
…and this is the worst the capabilities will ever be.
Watching the video created a glimmer of doubt that perhaps my current reality is a future version of myself, or some other consciousness, that’s living its life in an AI hallucinated environment.
I suggest you go to google/bing/whatever floats your boat and search "it will only get better" then filter results earlier than 2010. Things that I just found that were going to "only get better":
Jokes aside, Google Search results are worse thanks to so much web content being just ad scaffolding, but the interesting one here is music.
Music is typically imagined to be its best at whatever ages one most listened to it, partly trained in and partly thanks to meanings/memories/nostalgia attached to it. As a consequence, for most everyone, more recent music seems to be “getting worse”!
That said, and back to the SEO effect on Google Results, I'd argue mass distribution/advertising/marketing has resulted in most audio airtime getting objectively* less complex, but if one turns off the mass distribution, and looks around, there seems to be plenty of just as good — even building on what came before — music to be found.
Very much disagree. Current AI benchmarks are quite arbitrary as evidenced by the ability of a model to be fitted to a particular benchmark. Like the closest benchmark to objectivity is “does it answer this question factually” and benchmarks like that are just as failable really because who decides what questions we ask? The same struggles happen when we try to measure human intelligence. The more complex the algorithm the harder it is to quantify because there are so many parameters. I could easily contrive some “search engine benchmark”, but it wouldn’t be that useful because it’s only adherent to my own subjective definition of what it means for a search engine to be good.
Those are worse due to economic and cultural reasons, not technological reasons. The technology itself will only get better.
(Also, implying that music has gotten worse is a boomer-ass take. It might not be to your liking, but there's more of it than ever before, and new sonic frontiers are being discovered every day.)
Are you really trying to say that these models aren't going to get better from here? You think that the insane progress of the last 5 years just stops right here?
The difference is the incentive to improve, and actual present rate of improvement, for models like this is far higher than it is for jetpacks. (That and certain intrinsic features at least suggest the route to improvement is roughly "more of the same," vs "needs massive unknown breakthrough".)
And if trillions of dollars were being invested in that, it would mean lots of investors being disappointed in a few years, not that jet packs were close to being useful.
Not sure if that's what you are trying to say about AI, or not.
It's an unsettling feeling as what's more complicated - all the atoms and galaxies, trillions of life forms, the unimaginable distances of our universe OR a relatively simple world model that is our conscious experience and nothing else.
If it helps, if you look at the biology of human vision, you find out things like the width of your cone of sharp vision is about 2 degrees, or the size of your thumb held out at arms length.
Due to this physical limitation, what you 'see' in front of you, widely accepted as ground truth reality, cannot possibly real, its a hallucination produced by your brain.
Your brain, compared to the sensory richness of reality you experience around you, has very limited direct inputs from the outside world, it must construct a rich internal model based on this.
It's very weird (at least to me), that the boundary between reality and assumption (basically educated guessing) is very arbitrary, and definitely only exists in our heads.
No it's good, you're ahead of the curve, most people aren't there yet.
The next step is to realize that, if life is a cheap simulation, not everyone might have... uh... fully simulated minds. Player Characters vs NPCs is what gamers would say, though it doesn't have to be binary like that, and the term NPC has already been ruined by social media rants. (Also, NPC is a bad insult because most of the coolest characters in games are NPC rivals or bosses or whatnot.)
> First AI thing that’s made me feel a bit of derealization…
> …and this is the worst the capabilities will ever be.
I guess if this bothers you (and I can see how it might) you can take some small comfort in thinking that (due to enshitification) this could in fact be the _best_ the capabilities will ever be.
Once it has been proven to be possible, other companies [1][2][3] can and will reproduce it, and will attempt to push the frontier. As far as we know, there's no bottleneck that's stalling development here.
The Simulation Theory presents the following trilemma, one of which must be true:
1. Almost all human-level civilizations go extinct before reaching a technologically mature “posthuman” stage capable of running high-fidelity ancestor simulations.
2. Almost no posthuman civilizations are interested in running simulations of their evolutionary history or beings like their ancestors.
3. We are almost certainly living in a computer simulation.
If you take the idea of it needing to be a constructed simulation you get the dream argument. If you add that one can't verify anyone else having subjective experience you get Boltzmann brain. If you add the idea that maybe the ancestor simulations are designed to teach us virtuous behavior through repeated visits to simulation worlds you get the karmic cycle, and Boltzmann brain + karmic cycle is roughly the egg theory.
I think some/all of these things can roughly true at the same time. Imagine an infinite space full of chaotic noise that arises a solitary Boltzmann brain, top level universe and top level intelligence. This brain, seeking purpose and company in the void, dreams of itself in various situations (lower level universes) and some of those universes' societies seek to improve themselves through deliberate construction of karmic cycle ancestor simulation. A hierarchy of self-similar universes.
It was incredibly comforting to me to think that perhaps the reason my fellow human beings are so poor at empathy, inclusion, justice, is that this is a karmic kindergarten where we're intended to be learning these skills (and the consequences for failing to perform them) and so of course we're bad at it, it's why we're here.
But there are lots of critiques of that supposed trilemma.
Why would beings in simulations be conscious?
Or maybe running simulations is really expensive and so it's done sometimes (more than "almost none") but only sometimes (nowhere near "we are almost certainly").
Or simulations are common but limited? You don't need to simulate a universe if all you want to do is simulate a city.
The "trilemma" is an extreme example of black-and-white thinking. In the real world, things cost resources and so there are tradeoffs -- so middle grounds are the rule, not extremes.
The idea of a possible new AI winter to come seems less likely with each new announcement. Robust world models can be used as an effectively infinite source of training data.
We were working towards this years ago with Doarama/Ayvri, and I remember fondly in 2018 an investor literally yelling at me that I didn't know what I was talking about and AI would never be able to do this. Less than a decade later, here we are.
Our product was a virtual 3d world made up of satellite data. Think of a very quick, higher-res version of google earth, but the most important bit was that you uploaded a GPS track and it re-created the world around that space. The camera was always focused on the target, so it wasn't a first person point of view, which, for the most part, our brains aren't very good at understanding over an extended period of time.
For those curious about the use case, our product was used by every paraglider in the world, commercial drone operations, transportation infrastructure sales/planning, out-door events promotions (specifically bike and ultramarathon races).
Though I suspect we will see a new form of media come from this. I don't pretend to suggest exactly what this media will be, but mixing this with your photos we can see the potential for an infinitely re-framable and zoomable type of photo media.
Creating any "watchable" content will be challenging if the camera is not target focused, and it makes it difficult to create a storyline if you can't dictate where the viewer is pointed.
I find the model very impressive, but how could it be used in the wild? They mention robots (maybe to test them cheaply in completely different environments?), but I don't see the use in games except during development to generate ideas/assets.
The claims being made in this announcement are not demonstrated in the video. A very careful first person walk in an AI video isn’t very impressive these days…
This is bad use of AI, we spend our compute to make science faster. I am pretty confident computational cost of this will be maybe 100x of chatgpt query. I don't want to think even environmental effects.
People are thinking "how are video games going to use this?"
That's not the point, video games are worth chump-change compared to robotics. Training AIs on real-world robotic arms scaled poorly, so they're looking for paths that leverage what AI scales well at.
This is scary. I don’t have a benchmark to propose but in don’t think my brain can imagine things with greater fidelity than this. I can probably write down the physics better but I think these systems have reached parity with at least my imagination model
Why? Sure a virtual walk around the Pantheon in all its glory would be nice. But would that really improve history lessons? It doesn't help students understand why things happened, and what the consequences were and how they have impacted the rest of history of the modern world.
Engagement is one of the core pieces education and one of the hardest things to solve. If you remember back to being a kid, reading white papers is not really a thing. Interesting (e.g. engaging) teachers and field trips (which not all schools have access to) are tools that help kids learn.
At the limit, if you could stay engaged you would be an expert in pretty much anything.
"It doesn't help students understand why things happened, and what the consequences were and how they have impacted the rest of history of the modern world."
I would say the opposite, let's recreate each step in that historical journey so you can see exactly what the concequenses were, exactly why they happened and when.
Or maybe the constant detachment from reality that this technology and social media provide will only make it seem like they're more engaged when in fact they're mentally retreating from the physical world.
Inhabiting a foreign cultural context can provide information that factual lessons may struggle to convey to the same degree. Of course, there's a limit to this - especially with regards to historical accuracy - but you are much more likely to understand why specific historical decisions were made if you are "in the room" where they happened, so to speak.
Some physicist once said "I endeavor to never write more clearly than I think"; in the same way, history probably shouldn't be presented more vividly than it's understood. (We already have this problem with people remembering incidental details and emotional vibes from historical fiction as if they were established historical fact; VR diffusion delusions would make this much worse.)
If you read actual history the historians typically go into quite a lot of depth on why they think X happened as opposed to Y, and what the limitations are on the theories and the reasoning. The amount of archaeological and written records we have is very important to those facts.
There's an entire genre of games (immersive sims) that focus on experiencing the world with little to sometimes no skill required on the part of the player. The genre is diverse and incorporates elements of more gameplay-focused genres. It's also pretty popular.
I think some people want to play, and some want to experience, in different proportions. Tetris is the emanation of pure gameplay, but then you have to remember "Colossal Cave Adventure" is even older than Tetris. So there's a long history of both approaches, and for one of them, these models could be helpful.
Not that it matters. Until the models land in the hands of indie developers for long enough for them to prove their usefulness, no large developer will be willing to take on the risks involved in shipping things that have the slightest possibility of generating "wrong" content. So, the AI in games is still a long way off, I think.
> Do people play video games to look at pretty scenery?
Yes.
> No most people are testing skills in video games
That's not mutually exclusive with playing for scenery.
Games, like all art, have different communities that enjoy them for different reasons. Some people do not want their skills tested at all by a game. Some people want the maximum skill testing. Some want to experience novel fantasy places, some people want to experience real places. Some people want to tell complex weaving narratives, some people want to optimize logistics.
A game like Flower is absolutely a game about looking at pretty scenery and not one about testing skill.
They do both. Nobody played Cyberpunk 2077 for the riveting gameplay.
Actually that game felt a lot like these videos, because often you would turn around and then look back and the game had deleted the NPCs and generated new ones, etc.
I doubt it. The only video games I play are competitive games like DotA 2, Counter Strike 2, Call of Duty, Rainbow 6 Siege, etc. I don't really see how this completes or replaced that at all.
I'm not sure this is interesting beyond the wow effect. Unless we can actually get the world out of the AI. The real reason chatgpt and friends actually have customers is that the text interface is actually durable and easily to build upon after generation. It's also super ez to feed text into a fresh cycle. But this, while looking fancy, doesn't seem to be on the path to actually working out. Unless there is a sane export to unreal or something.
I’m seeing a lot of variations on this in this thread, but we have been able to render photoreal things, and do intricate physical simulations, for a long time. This is mostly impressive because it is a real-time way to generate and render big, intricate worlds.
But if you believe reality is a simulation, why would these “efficient” world-generation methods convince you of anything? The tech our reality would have to be running on is still inconceivable science fiction.
but we have been able to render photoreal things, and do intricate physical simulations, for a long time.
Not like this we haven't. This is convincing because I can have any of you close your eyes and imagine a world where pink rabbits hand out parking tickets. We're a neurolink away from going from thought > to prompt > to fantasy.
I guess I should have clarified: when you talk about reality being a simulation, do you mean that we collectively live in a simulated universe, or that you personally are playing a very realistic vr game?
To add: our reality does not have to be rendered in it's entirety, we'll just have very convincing and unscripted first-person view simulations. Only what you look at is getting rendered (e.g. tiny structures only get rendered when you use microscope).
"developing simulated environments for open-ended learning and robotics"
What this means is that a robot model could be trained 1000x faster on GPUs compared to training a robot in the physical world where normal spacetime constraints apply.
Same here. Though if I were a 17-year-old film fan or gamer with an imaginative drive, I would be really excited about the powerful creative tools that might become available to me soon.
Hollywood maybe for small scenes, but gamers would quickly realize and destroy this level of quality and continuity vs. a 3D game engine with defined meshes
I actually think indie game dev is quite safe from AI (well its already insanely competitive). It might change the field, or shrink the market but I think AI has a chance at replacing workers where the only metric that matters is $$$ and productivity. I just don't see myself consuming, for example, an AI generated autobiography or any AI generated book. As long as enough people feel that way the market will continue to be there.
It's interesting, because I was always a bit confused and annoyed by the Giant's Drink/Mind Game that Ender plays in Ender's Game. It just always felt so different to how games I knew played, it felt odd that he would "discover" things that the developers hadn't intended, because I always just thought "wait, someone had to build that into the game just in case he happened to do that one specific thing?" Or if it was implied that they didn't do that, then my thought was "that's not how this works, how is it coming up with new/emergent stories?"
This feels almost exactly like that, especially the weird/dreamlike quality to it.
A Mind needs a few things: The ability to synthesize sensor data about the outside world into a form that can be compressed into important features, the ability to choose which of those features to pay attention to, the ability to model the physical world around it, find reasonable solutions to problems, and simulate its actions before taking them, The ability to understand and simulate the actions of other Minds, the ability to compress events into important features and store them in memory, the ability to retrieve those memories and appropriate times and in appropriate clarity, etc.
I feel like as time goes on more and more of these important features are showing up as disconnected proofs of concept. I think eventually we'll have all the pieces and someone will just need to hook them together.
I am more and more convinced that AGI is just going to eventually happen and we'll barely notice because we'll get there inch by inch, with more and more amazing things every day.
Why even be sarcastic about it ? There is no human invention that has not exploded thanks (or because of) pornographic possibilities. HD-DVD vs Blueray, Internet...I'd even argue that XR is not as big as it could be because it is really clamped down to deviant usage !
What a strange take. Do you not care about news coming from the James Webb Telescope either, just because you can't play with the telescope personally?
It's a whitepaper release to share the STOTA research. This doesn't seem like an economically viable model, nor does it look polished enough to be practically usable.
I think it's a perfectly valid take coming from some intersection of an engineering mindset and FOSS culture. And, the comparison you bring up is a bit of a category error.
We know how James Webb works and it's developed by an international consortium of researchers. One of our most trusted international institutions, and very verifiable.
We do not know how Genie works, it is unverifiable to non-Google researchers, and there are not enough technical details to move much external teams forward. Worst case, this page could be a total fabrication intended to derail competition by lying about what Google is _actually_ spending their time on.
We really don't know.
I don't say this to defend the other comment and say you're wrong, because I empathize with both points. But I do think that treating Google with total credulity would be a mistake, and the James Webb comparison is a disservice to the JW team.
James Webb Telescope is not something that can be - and is released. AI models are, and others are announcing them when they're available, but DeepMind introduces noise here with their "trust us, that works, now go away" approach.
> James Webb Telescope is not something that can be - and is released
I would actually turn that around. The Telescope is released. It's flying around up there taking photos. If they kept it in some garage while releasing flashy PR pages about how groundbreaking it is, then I'd be pretty skeptical.
What is the purpose of this? It seems designed to muddy the waters of reality vs. falsehood and put creatives in film/tv out of jobs. Real Jurassic Park moment here
They mention some possible applications in the video. Training environments for robotics (use sample data to simulate the surface of mars or the inside of a nuclear reactor), educational worlds for students (like the old Encarta virtual tours), and disaster preparedness simulations (e.g. training firefighters on an endless variety of burning homes).
Obviously, none of these are super viable given the low accuracy and steerability of world models out today, but positive applications for this kind of tech do exist.
Also (I'm speculating now instead of restating the video), I think pretty soon someone will hook up a real time version of this to a voice model, and we will get some kind of interactive voice + keyboard (or VR) lucid dream experience.
Arrogant profit seeking capitalist dick measuring that will break our societies and ours and our childrens worldviews under some pathetic label of "exploring a new scientific frontier"
> Genie 3’s consistency is an emergent capability
So this just happened from scaling the model, rather than being a consequence of deliberate architecture changes?
Edit: here is some commentary on limitations from someone who tried it: https://x.com/tejasdkulkarni/status/1952737669894574264
> - Physics is still hard and there are obvious failure cases when I tried the classical intuitive physics experiments from psychology (tower of blocks).
> - Social and multi-agent interactions are tricky to handle. 1vs1 combat games do not work
> - Long instruction following and simple combinatorial game logic fails (e.g. collect some points / keys etc, go to the door, unlock and so on)
> - Action space is limited
> - It is far from being a real game engines and has a long way to go but this is a clear glimpse into the future.
Even with these limitations, this is still bonkers. It suggests to me that world models may have a bigger part to play in robotics and real world AI than I realized. Future robots may learn in their dreams...
https://www.theguardian.com/technology/2025/aug/05/google-st...
Gemini Robot launch 4 mo ago:
https://news.ycombinator.com/item?id=43344082
https://kylekukshtel.com/diffusion-aaa-gamedev-doom-minecraf...
But even when I wrote that I thought things were still a few years out. I facetiously said that Rockstar would be nerd-sniped on GTA6 by a world model, which sounded crazy a few months ago. But seeing the progress already made since GameNGen and knowing GTA6 is still a year away... maybe it will actually happen.
I'm having trouble parsing your meaning here.
GTA isn't really a "drive on the street simulator", is it? There is deliberate creative and artistic vision that makes the series so enjoyable to play even decades after release, despite the graphics quality becoming more dated every year by AAA standards.
Are you saying someone would "vibe model" a GTAish clone with modern graphics that would overtake the actual GTA6 in popularity? That seems extremely unlikely to me.
But someone creative having a vision in their head and then just guiding AI to flesh out the assets, details, etc.
GenAI will never get there because it can't, by design. It can riff on what was, and it can please the prompter, but it cannot challenge anyone creatively. No current LLM's can, either. I'll eat my hat if this is wrong in ten years, but it won't be.
It will generate refined slop ad nauseam, and that will train people's brains into spotting said slop faster using less energy. And then it'll be shunned.
GTA6 will not actually be nerd-sniped, but it's easy to see how a lot of what makes the game defensible is being rapidly commoditized.
I despise the creative and artistic vision of GTA online, but I’m clearly in a minority there gauging by how much money they’ve made off it.
I'm starting to think some of the names behind LLMs/GenAI are cover names for aliens and any actual humans involved have signed an NDA that comes with millions of dollars and a death warrant if disobeyed.
Anyways, crafting pretty looking worlds is one thing, but you still need to fill them in with something worth doing, and that's something we haven't really figured out. That's one of the reasons why the sandbox MMORPG was developed as opposed to "themeparks". The underlying systems, the backend is the real meat here. At most with the world models right now is that you're replacing 3d artists and animators, but I would not say that is a real bottleneck in relation to one's own limitations.
Maybe I’m misinterpreting what you’re saying here, but 2021 til present has been a glut of some of the best titles ever made, by pretty much any measure
https://kylekukshtel.com/game-design-mimetics
Reality is not composed of words, syntax, and semantics. A human modal is.
Other human modals are sensory only, no language.
So vision learning and energy models that capture the energy to achieve a visual, audio, physical robotics behavior are the only real goal.
Software is for those who read the manual with their new NES game. Where are the words inside us?
Statistical physics of energy to make machine draw the glyphs of language not opionated clustering of language that will close the keyboard and mouse input loop. We're like replicating human work habits. Those are real physical behaviors. Not just descriptions in words.
So prescient. I definitely think this will be a thing in the near future ~12-18 months time horizon
A neural net can produce information outside of its original data set, but it is all and directly derived from that initial set. There are fundamental information constraints here. You cannot use a neural net to itself generate from its existing data set wholly new and original full quality training data for itself.
You can use a neural net to generate data, and you can train a net on that data, but you'll end up with something which is no good.
To pick an almost trivial example, let's say OCR digit recognition. You'll train on the original data-set, but also on information-preserving skews and other transforms of that data set to add robustness (stretched numbers, rotated numbers, etc.). The core operation here is taking a smallset in some space (original training data) and producing some bigset in that same space (generated training data).
For simple things like digit recognition, we can imagine a lot of transforms as simple algorithms, but one can consider more complex problems and realize that an ML model would be able to do a good job of learning how to generate bigset candidates from the smallset.
There is an uncountably large number of models that perfectly replicate the data they're trained on; some generalize out of distribution much better. Something like dreaming might be a form of regularization: experimenting with simpler structures that perform equally well on training data but generalize better (e.g. by discovering simple algorithms that reproduce the data equally well as pure memorization but require simpler neural circuits than the memorizing circuits).
Once you have those better generalizing circuits, you can generate data that not only matches the input data in quality but potentially exceeds it, if the priors built into the learning algorithm match the real world.
We have truly reached peak hackernews here.
I.e. if the simulation has enough videos of firefighters breaking glass where it seems to drop instantaneously and in the world sim it always breaks, a firefighter robot might get into a problem when confronted with unbreakable glass, as it expects it to break as always, leading to a loop of trying to shatter the glass instead of performing another action.
Are you sure? I've been ingesting boatloads of high definition multi-sensory real-time data for quite a few decades now, and I hardly remember any of it. Perhaps the average quality/diversity of LLM training data has been higher, but they sure remember a hell of a lot more of it than I ever could.
The LLM has plenty of experts and approaches etc.
Give it tool access let it formulate it's own experiments etc.
The only question here is if it becomes a / the singularity because of this, gets stuck in some local minimum or achieves random perfection and random local minimum locations.
And these are consumer options, affordable to you and me, not only to some military. If those are the commonly available options... there may be way more advanced stuff that we haven't seen.
I asked for real examples from someone who claimed to have first hand experience, not more marketing bullshit
y'all are in a religion
https://developer.nvidia.com/isaac/gr00t
What's with this insane desire for anthropomorphism? What do you even MEAN learn in its dreams? Fine-tuning overnight? Just say that!
Whether a computational medium is carbon-based or silicon-based seems irrelevant. Call it "carbon-chauvinism".
Since consciousness is closely linked to being a moral patient, it is all the more important to err on the side of caution when denying qualia to other beings.
This is generally a bad idea, but a few of the results like "neural networks" did work out… eventually.
"World model" is another example of a metaphor like this. They've assumed that humans have world models (most likely not true), and that if they program something and call it a "world model" it will work the same way (definitely not true) and will be beneficial (possibly true).
(The above critique comes from Phil Agre and David Chapman.)
No-one cares. It's just terminology.
Unbelievable. How is this not a miracle? So we're just stumbling onto breakthroughs?
It's basically what every major AI lab head is saying from the start. It's the peanut gallery that keeps saying they are lying to get funding.
Not to detract from what has been done here in any way, but it all seems entirely consistent with the types of progress we have seen.
It's also no surprise to me that it's from Google, who I suspect is better situated than any of its AI competitors, even if it is sometimes slow to show progress publicly.
I think this was the first mention of world models I've seen circa 2018.
This is based on VAEs though.
I suppose it depends what you count as "the start". The idea of AI as a real research project has been around since at least the 1950s. And I'm not a programmer or computer scientist, but I'm a philosophy nerd and I know debates about what computers can or can't do started around then. One side of the debate was that it awaited new conceptual and architectural breakthroughs.
I also think you can look at, say, Ted Talks on the topic, with guys like Jeff Hawkins presenting the problem as one of searching for conceptual breakthroughs, and I think similar ideas of such a search have been at the center of Douglas Hofstadter's career.
I think in all those cases, they would have treated "more is different" like an absence of nuance, because there was supposed to be a puzzle to solve (and in a sense there is, and there has been, in terms of vector space and back propagation and so on, but it wasn't necessarily clear that physics could "pop out" emergently from such a foundation).
[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Even if his broader point might be valid (about the most fruitful directions in ML), calling something a "bitter lesson" while insulting a whole field of science is ... something.
Also as someone involved in early RL, he should know better.
We don't inherit any software, so cognitive function must bootstrap itself from it's underlying structure alone.
https://media.ccc.de/v/38c3-self-models-of-loving-grace
If you say that this is emergent from the "underlying structure alone", doesn't this mean that it would still be "inherited" software (though in this case, maybe we think of it like punch cards).
A biological example that I like: the neural structures for vision develop almost fully formed from the very beginning. The state of our network at initialization is effectively already functional. I’m not sure to which extent this is true for humans, but it is certainly true for simpler organisms like flies. The way cells achieve this is through some extremely simple growth rules as the structure is being formed for the first time. Different kinds of cells behave almost independently of each other, and it just so happens that the final structure is a perfectly functional eye. I’ve seen animations of this during a conference talk and it was one of the most fascinating things I’ve ever seen. It truly shows how the complexity of a biological organism is just billions of times any human technology. And at the same time, it’s a beautiful illustration of the lack of intelligent design. It’s like watching a Lego assemble by just shaking the pieces.
An example that might be useful: dragonflies lay their eggs in water. Since a dragonfly has like a 4-bit CPU you might be amazed at how it manages to get all the processing required to identify a body of water from a distance into its tiny mind, and also marvel at what sort of JPEG+++ encoding must be used to convey what water looks like from generation to generation.
But they don't do that at all: instead they have eyes that are sensitive to polarized light. The surface of water polarizes reflected light. So do things like polished gravestones. So dragonflies will lay their eggs on gravestones too.
One I like to ponder is: beavers building damns. Do they have an encoded algorithm that knows that they need to damn the river to have a place to live, by gnawing on trees, carrying them to the right place on the river bed, etc? Nope, certainly they don't have that. Perhaps they have teeth that grow so long that they hurt, motivating the animal to gnaw on something solid to wear them down. The only solid thing they have available is a tree.
But then you have things like language or societal customs that are purely 'software'.
Hardware and software, as metaphors applied to biology, I think are better understood as a continuum than a binary, and if we don't inherit any software (is that true?), we at least inherit assembly code.
To stay with the metaphor, DNA could be rather understood as firmware that runs on the cell. What I mean with software is the 'mind' that runs on a collection of cells. Things like language, thoughts and ideas.
There is also a second level of software that runs not on a single mind alone, but collection of minds, to form cliques or a societies. But this is not encoded in genes, but in memes.
I think it's like Chomsky said, that we don't learn this infrastructure for understanding language any more than a bird "learns" their feathers. But I might be losing track of what you're suggesting is software in the metaphor. I think I'm broadly on board with your characterization of DNA, the mind and memes generally though.
How do you claim to know this?
We had one breakthrough a couple of years ago with GPT-3, where we found that neural networks / transformers + scale does wonders. Everything else has been a smooth continuous improvement. Compare today's announcement to Genie-2[1] release less than 1 year ago.
The speed is insane, but not surprising if you put in context on how fast AI is advancing. Again, nothing _new_. Just absurdly fast continuous progress.
[1] - https://deepmind.google/discover/blog/genie-2-a-large-scale-...
Kind of like how a single neuron doesn't do much, but connect 100 billion of them and well...
He seems to me too enthusiastic, such that I feel Google asked him in particular because they trusted him to write very positively.
The lead in to the quote starts at https://youtu.be/GjENnyQupow?t=662
"I don't say you're self-censoring - I'm sure you believe everything you're saying; but what I'm saying is, if you believed something different, you wouldn't be sitting where you're sitting." -- Noam Chomksy to Andrew Marr
Thank you for finding that link for me :)
https://en.wikipedia.org/wiki/Manufacturing_Consent#:~:text=...
https://www.goodreads.com/book/show/12617.Manufacturing_Cons...
Though this is often associated with his and Herman's "Propaganda Model," Chomsky has also commented that the same appears in scholarly literature, despite the overt propaganda forces of ownership and advertisement being absent:
https://en.wikipedia.org/wiki/Propaganda_model#:~:text=Choms...
& you're basically seeing GPT-3 and saying it will never be used in any serious application.. the rate of improvement in their model is insane
Use the CPU and RAM for world state, then pass it off to the model to render.
Regardless of how this is done, Unreal Engine with all of its bells and whistles is toast. That C++ pile of engineering won't outdo something this flexible.
I've been thinking about this a while and it's obvious to me:
Put Minecraft (or something similar) under the hood. You just need data structures to encode the world. To enable mutation, location, and persistence.
If the model is given additional parameters such as a "world mesh", then it can easily persist where things are, what color or texture they should be, etc.
That data structure or server can be running independently on CPU-bound processes. Genie or whatever "world model" you have is just your renderer.
It probably won't happen like this due to monopolistic forces, but a nice future might be a future where you could hot swap renderers between providers yet still be playing the same game as your friends - just with different looks and feels. Experiencing the world differently all at the same time. (It'll probably be winner take all, sadly, or several independent vertical silos.)
If I were Tim Sweeny at Epic Games, I'd immediately drop all work on Unreal Engine and start looking into this tech. Because this is going to shore them up on both the gaming and film fronts.
I think in this context, it could be amazing for game creation.
I’d imagine you would provide item descriptions to vibe-code objects and behavior scripts, set up some initial world state(maps), populated with objects made of objects - hierarchically vibe-modeled, make a few renderings to give inspirational world-feel and textures, and vibe-tune the world until you had the look and feel you want. Then once the textures and models and world were finalised, it would be used as the rendering context.
I think this is a place that there is enough feedback loops and supervision that with decent tools along these lines, you could 100x the efficiency of game development.
It would blow up the game industry, but also spawn a million independent one or two person studios producing some really imaginative niche experiences that could be much, much more expansive (like a AAA title) than the typical indie-studio product.
> It would blow up the game industry, but also spawn a million independent one or two person studios producing some really imaginative niche experiences that could be much, much more expansive (like a AAA title) than the typical indie-studio product.
All video games become Minecraft / Roblox / VRChat. You don't need AAA studios. People can make and share their own games with friends.
Scary realization: YouTube becomes YouGame and Google wins the Internet forever.
Can you make a basically indistinguishable copy of other games in Roblox? If so, that’s pretty cool, even without AI integration.
I've seen Roblox's creative tools, even their GenAI tools, but they're bolted on. It's the steam powered horse problem.
I think this puts Epic Games, Nintendo, and the whole lot into a very tough spot if this tech takes off.
I don't see how Unreal Engine, with its voluminous and labyrinthine tomes of impenetrable legacy C++ code, survives this. Unreal Engine is a mess, gamers are unhappy about it, and it's a PITA to develop with. I certainly hate working with it.
Innovator's Dilemma fast approaching the entire gaming industry and they don't even see it coming it's happening so fast.
Exciting that building games could become as easy as having the idea itself. I'm imagining something like VRChat or Roblox or Fortnite, but where new things are simply spoken into existence.
It's absolutely terrifying that Google has this much power.
This is 100% going to happen on-device. It's just a matter of time.
Maybe just as kind of a DLSS on steroids where the engine only renders very simple objects and a world model translates these to the actual graphics.
Not for video games it isn’t.
I for one would love a video game where you're playing in a psychedelic, dream-like fugue.
It is not currently, or near term, realistic to make a video game where a meaningful portion of the simulation is part of the model.
There will probably be a few interactive model-first experiences. But they’ll be popular as short novelties not meaningful or long experiences.
A simple question to consider is how would you adjust a set of simple tunables in a model-first simulator? For example giving the player more health, making enemies deal 2x damage, increasing move speed, etc etc. You can not.
1. You can see fine textures "jump" every 4 frames - which means they're most likely using a 4x-temporal-downscaling VAE with at least 4-frame interaction latency (unless the VAE is also control-conditional). Unfortunately I didn't see any real-time footage to confirm the latency (at one point they intercut screen recordings with "fingers on keyboard" b-roll? hmm).
2. There's some 16x16 spatial blocking during fast motion which could mean 16x16 spatial downscaling in the VAE. Combined with 1, this would mean 24x1280x720/(4x16x16) = 21,600 tokens per second, or around 1.3 million tokens per minute.
3. The first frame of each clip looks a bit sharper and less videogamey than later stationary frames, which suggests this is could be a combination of text-to-image + image-to-world system (where the t2i system is trained on general data but the i2w system is finetuned on game data with labeled controls). Noticeable in e.g. the dirt/textures in [2]. I still noticed some trend towards more contrast/saturation over time, but it's not as bad as in other autoregressive video models I've seen.
[1] https://x.com/demishassabis/status/1940248521111961988
[2] https://deepmind.google/api/blob/website/media/genie_environ...
[1] https://x.com/holynski_/status/1952756737800651144
[2] https://togelius.blogspot.com/2025/08/genie-3-and-future-of-...
Secondly, it's fairly clear now that our sensory inputs are not being experienced as sensory inputs. We experience a reconstruction. Obvious basic sign of this is that we fill in the gap in vision where the optic nerve is. But generally, we're making an integrated world model all the time out of the senses, and are conscious of that world model.
You're right though, both the above are rendering the experience and can take shortcuts for that. It's sufficiently detailed in each case though that it kinda is rendering the world too, in some sense.
so better than Stadia?
While I don't fully align with the sentiment of other commenters that this is meaningless unless you can go hands on... it is crazy to think of how different this announcement is than a few years ago when this would be accompanied by an actual paper that shared the research.
Instead... we get this thing that has a few aspects of a paper - authors, demos, a bibtex citation(!) - but none of the actual research shared.
I was discussing with a friend that my biggest concern with AI right now is not that it isn't capable of doing things... but that we switched from research/academic mode to full value extraction so fast that we are way out over our skis in terms of what is being promised, which, in the realm of exciting new field of academic research is pretty low-stakes all things considered... to being terrifying when we bet policy and economics on it.
To be clear, I am not against commercialization, but the dissonance of this product announcement made to look like research written in this way at the same time that one of the preeminent mathematicians writing about how our shift in funding of real academic research is having real, serious impact is... uh... not confidence inspiring for the long term.
From my best guess: it's a video generation model like the ones we already head. But they condition inputs (movement direction, viewangle). Perhaps they aren't relative inputs but absolute and there is a bit of state simulation going on? [although some demo videos show physics interactions like bumping against objects - so that might be unlikely, or maybe it's 2D and the up axis is generated??].
It's clearly trained on a game engine as I can see screenspace reflection artefacts being learned. They also train on photoscans/splats... some non realistic elements look significantly lower fidelity too..
some inconsistencies I have noticed in the demo videos:
- wingsuit discollcusions are lower fidelity (maybe initialized by high resolution image?)
- garden demo has different "geometry" for each variation, look at the 2nd hose only existing in one version (new "geometry" is made up when first looked at, not beforehand).
- school demo has half a caroutside the window? and a suspiciously repeating pattern (infinite loop patterns are common in transformer models that lack parameters, so they can scale this even more! also might be greedy sampling for stability)
- museum scene has odd reflection in the amethyst box, like the rear mammoth doesn't have reflections on the right most side of the box before it's shown through the box. The tusk reflection just pops in. This isn't fresnel effect.
Eg: Using AI to generate textures, wire models, motion sequences which themselves sum up to something that local graphics card can then render into a scene.
I'm very much not an expert in this space, but to me it seems if you do that, then you can tweak the wire model, the texture, move the camera to wherever you want in the scene etc.
The model can infinitely zoom in to some surface and depict(/predict) what would really be there. Trying to do so via classical rendering introduces many technical challenges
So for example, a game designer might tell the AI the floor is made of mud, but won’t tell the AI what it looks like if the player decides to dig a 10 ft hole in the mud, or how difficult it is to dig, or what the mud sounds like when thrown out of the hole, or what a certain NPC might say when thrown down the hole, etc.
To classically render this in any realistic fashion, it quickly gets complex. Between the physics simulation (rather involved) and the number of triangles (trees have many branches and leaves), you're going to be doing a lot of math.
I'll emphasize "realistic" - sure, we can real-time render trees in 2025 that look.. ok. However, take more than a second to glance at it and you will quickly start to see where we have made compromises to the tree's fidelity to ensure it renders at an adequate speed on contemporary hardware.
Now consider a world model trained on enough tree footage that it has gained an "intuition" about how trees look and behave. This world model doesn't need to actually simulate the entire tree to get it to look decent.. it can instead directly output the pixels that "make sense". Much like a human brain can "simulate" the movement of an object through space without expending much energy - we do it via prediction based on a lot of training data, not by accurately crunching a bunch of numbers.
That's just one tree, though - the real world has a lot of fidelity to it. Fidelity that would be extremely expensive to simulate to get a properly realistic output on the other side.
Instead we can use these models which have an intuition for how things aught to look. They can skip the simulation and just give you the end result that looks passable because it's based on predictions informed by real-world data.
This is already happening to some extent, some games struggle to reach 60 FPS at 4K resolution with maximum graphics settings using traditional rasterization alone, so technologies like DLSS 3 frame generation are used to improve performance.
You could have a stripped down traditional game engine, but without any rendering, that gives a richer set of actions to the neural net. Along with some asset hints, story, a database (player/environment state) the AI can interact with, etc. The engine also provides bounds and constraints.
Basically, we need to work out the new boundary between engine and AI. Right now it's "upsample and interpolate frames", but as AI gets better, what does that boundary become?
Disconcerting that it's daydreaming rather than authoring?
Another linguistic devastation. A "world model" is in epistemology the content of a representation of states of thing - all states of things, facts and logic.
This use of the expression "world model" seems to be a reduction. An that's too bad, because we needed the idea in its good form to speak about what neural networks contain, in this LLM sub-era.
Like the new widespread sloppy use of the expression "AI", this does not contribute to collective mental clarity.
Reminds me of when image AIs weren't able to generate text. It wasn't too long until they fixed it.
What I’d really love to see more of is augmented video. Like, the stormtrooper vlogs. Runway has some good stuff but man is it all expensive.
Walking/Running/Steps have already been solved pretty well with NN’s, but simulation of vehicle engines and vehicle physics have not. Not to my knowledge. I suspect iRacing would be extremely interested in such a model.
edit
I take it back, PINN’s are a thing and now I have a new rabbit hole…
In game engines it's the engineers, the software developers who make sure triangles are at the perfect location, mapping to the correct pixels, but this here, this is now like a drawing made by a computer, frame by frame, with no triangles computed.
I'm most excited for when these methods will make a meaningful difference in robotics. RL is still not quite there for long-horizon, sparse reward tasks in non-zero-sum environments, even with a perfect simulator; e.g. an assistant which books travel for you. Pay attention to when virtual agents start to really work well as a leading signal for this. Virtual agents are strictly easier than physical ones.
Compounding on that, mismatches between the simulated dynamics and real dynamics make the problem harder (sim2real problem). Although with domain randomization and online corrections (control loop, search) this is less of an issue these days.
Multi-scale effects are also tricky: the characteristic temporal length scale for many actions in robotics can be quite different from the temporal scale of the task (e.g. manipulating ingredients to cook a meal). Locomotion was solved first because it's periodic imo.
Check out PufferAI if you're scale-pilled for RL: just do RL bigger, better, get the basics right. Check out Physical Intelligence for the same in robotics, with a more imitation/offline RL feel.
Creativity is taken from us at exponential rate. And I don't buy argument from people who are saying they are excited to live in this age. I can get that if that technology stopped at current state and remained to be just tools for our creative endeavours, but it doesn't seem to be an endgame here. Instead it aims to be a complete replacement.
Granted, you can say "you still can play musical instruments/paint pictures/etc for yourself", but I don't think there was ever a period of time where creative works were just created for sake of itself rather for sharing it with others at masse.
So what is final state here for us? Return to menial not-yet-automated work? And when this would be eventually automated, what's left? Plug our brains to personalized autogenerated worlds that are tailored to trigger related neuronal circuitry for producing ever increasing dopamine levels and finally burn our brains out (which is arguably already happening with tiktok-style leasure)? And how you are supposed to pay for that, if all work is automated? How economics of that is supposed to work?
Looks like a pretty decent explanation of Fermi paradox. No-one would know how technology works, there are no easily available resources left to make use of simpler tech and planet is littered to the point of no return.
How to even find the value in living given all of that?
Numerous famous writers, painters, artists, etc counter this idea, Kafka being a notable example, whose significant works only came to light after his passing and against his will. This doesn't take away from the rest of your discussion point, but art always has and always will also exist solely for its own sake.
What argument is required for excitement? Excitement is a feeling not a rational act. It comes from optimism and imagination. There is no argument for optimism. There is often little reason in imagination.
> How to even find the value in living given all of that?
You might have heard of the Bhagavad Gita, a 2000+ year old spiritual text. It details a conversation between a warrior prince and a manifestation of God. The warrior prince is facing a very difficult battle and he is having doubts justifying any action in the face of the decisions he has to make. He is begging this manifestation of God to give him good reasons to act, good reasons not just to throw his weapons down, give away all his possessions and sit in a cave somewhere.
There are no definite answers in the text, just meditations on the question. Why should we act when the result is ultimately pointless, we will all die, people will forget you, situations will be resolved with or without you, etc.
This isn't some new question that LLMs are forcing us to confront. LLMs are just providing us a new reason to ask the same age-old questions we have been facing for as long as writing has existed.
I wonder if mental exercises will move to the same category? Not necessarily a way to earn money, but something everybody does as a way of flourishing as a human.
Nothing can take away your ability to have incredible experiences, except if the robots kill us all.
I don't want to live in a world where these things are generated cheaply and easily for the profit of a very select few group of people.
I know the world doesn't work like I described in the top paragraph. But it's a lot closer to it than the bottom.
There will be two classes of media:
- Generated, consumed en-masse by uncreative, uninspired individuals looking for cheap thrill
- Human created, consumed by discerning individuals seeking out real human talent and expression. Valuing it based merely on the knowledge that a biological brain produced (or helped produce) it.
I tend to suspect that the latter will grow in value, not diminish, as time progresses
people said the world could literally end if we train anything bigger than chatgpt4... I would take these projections with a handful of salt
There’s no bright line between computer and human-created video - computer tools are used everywhere.
Rewarded how? 99.99% of people who do things like sports or artistic like writing never get "rewarded for doing so", at least in the way I imagine you mean the phrase. The reward is usually the experience itself. When someone picks up a ball or an instrument, they don't do so for some material reward.
Why should anyone be rewarded materially for something like this? Why are you so hung up on the <0.001% that can actually make some money now having to enjoy the activity more as a hobby than a profession.
Why am I so "hung up" on the livelihood of these people?
Doing art is a Hobby is a good in and of itself. I did not say otherwise. But when I see a movie, when I listen to a song, I want to appreciate the integrity and talent of the people that wrote them. I want them to get paid for that enjoyment. I don't think that's bizarre.
That world has only existed for the last hundred or so years, and the talent is usually brutally exploited by people whose main talent is parasitism. Only a tiny percentage of people who sell creative works can make a living out of it; the living to be made is in buying their works at a premium, bundling them, and reselling them, while offloading almost all of the risk to the creative as an "advance."
Then you're left in a situation where both the buyer of art and the creator of art are desperate to pander to the largest audience possible because everybody is leveraged. It's a dogshit world that creates dogshit art.
A better example would be Spotify replacing artist-made music recommandations with low-quality alternatives, to reduce what it pays to artists. Everyone except Spotify loses in this scenario.
The future with AI is not going to be our current world with some parts replaced by AI. It will be a whole new way of life.
What does social mean in a future that could just simulate it.
Water cooler talk about what happened this week in M.A.S.H. or Friends is extinct.
Worse, in the long run even community may be synthesized. If a friend is meat or if they're silicon (or even carbon fiber!), does it matter if you can't tell the difference? It might to pre-modern boomers like me and you.
Virtual influencers might be a big thing, Hatsune Miku has lots of fans. But it's still a shared fandom.
And even so, music production has been a constant evolution of replacing prior technologies and making it easier to get into. It used to be gatekept by expensive hardware.
- Because you enjoy it
- Because you get pats in the back from people you share it with
- Because you want to earn money from it
The 1st one will continue to be true in this dystopian AI art future, the other not so much.
And sincerely I find that kind of human art, the one that comes from a pure inner force, the more interesting one.
EDIT: list formatting
Why should we be so desperate to cling to a system that isn't even working?
No it won’t, you’ll be too busy trying to survive off of what pittance is left for you to have any time to waste on leisure activities.
We can use these to create entire virtual worlds, games, software that incorporates these, and to incorporate creativity and media into infinitely more situations in real life.
We can create massive installations that are not a single image but an endless video with endless music, and then our hand turns to stabilizing and styling and aestheticizing those exactly in line with our (the artist's) preferences.
Romanticizing the idea that picking at a guitar is somehow 'more creative' than using a DAW to create incredibly complex and layered and beautiful music is the same thing that's happening here, even if the primitives seem 'scarier' and 'bigger'.
Plus, there are many situations in life that would be made infinitely more human by the introduction of our collective work in designing our aesthetic and putting it into the world, and encoding it into models. Installations and physical spaces can absolutely be more beautiful if we can produce more, taking the aesthetic(s) that we've built so far and making them dynamic to spaces.
Also for learning: as a young person learning to draw and sing and play music and so many other things, I would have tremendously appreciated the ability to generate and follow subtle, personalized generation - to take a photo of a scene in front of me and have the AI first sketch it loosely so that I can copy it, then escalate and escalate until I can do something bigger.
The way I see it, most people aren't creative. And the people who are creatives are mostly creating for the love of it. Most books that are published are read exclusively by the friends and family of the author. Most musicians, most stand-up comedians, most artist get to show off their works for small groups of people and make no money doing so. But they do it anyway. I draw terrible portraits, make little inventions and sometimes I build something for the home, knowing full well that I do these things for my own enjoyment and whatever ego boost I get from showing these things off to people I know.
I'm doing a marathon later and I've been working my ass off for the prospect of crossing the finishing line as number four thousand and something, and I'll do it again next year.
1. Universal Basic Income as we're on the way to a post-scarcity society. Unlikely to actually happen due to greed.
2. We take inspiration from the french revolution and then return to a simpler time.
Greed makes no sense in a truly post scarcity society. There is no scarcity from which to take in a zero sum way from another.
Status is the real issue. Humans use status to select sexually, and the display is both competitive and comparative. It doesnt matter absolutely how many pants you have, only that you have more and better than your competition.
I actually think this thing is baked into our DNA and until sex itself is saturated (if there is such a thing), or DNA is altered, we will continue to have a however subtle form of competition undergirding all interactions.
>Vote for me and we'll hand free money to everyone and the robots will do the work
at the moment is the robots doing the work don't exist. Things will change when they do.
There is no end to semiotics, and therefore no end to greed. In this case, scarcity is artificial; created only via socially imposed monopoly. If we truly want a post-scarcity society, then we must abolish copyright.
[1] - https://en.wikipedia.org/wiki/Simulacra_and_Simulation
Greedy people wouldn't even think about sharing this tech with the world, even though they could literally end world hunger. They will hoard it, and use it to make themselves more powerful and to kill anyone else who looks like they might discover it.
Luigi Mangione has shown that all it takes is one person in the right time and place to remove some evil from the world.
I sit and play guitar by myself all the time, I play for nobody but myself, and I enjoy it a lot. Your argument is absurd.
Kids do it all the time.
> So what is final state here for us?
Something I haven't seen discussed too much is taste - human tastes change based on what has come before. What we will care about tomorrow is not what we care about today.
It seems plausible to me that generative AI could get higher and higher quality without really touching how human tastes changes. That would leave a lot of room for human creativity IMO - we have shared experience in a changing world that seems very hard to capture with data.
There's a whole host of "art" that has been created by people - sometimes for themselves, sometimes for a select few friends - which had little purpose beyond that creation[1]. Some people create art because they simply have to create art - for pleasure, for therapy, for whatever[2]. For many, the act of creation was far more important than the act of distribution[3].
For me, my obsession is constructing worlds, maps, societies and languages that will almost certainly die with me. And that's fine. When I feel the compulsion, I'll work on my constructions for a while, until the compulsion passes - just as I have done (on and off) for the past 50 years. If the world really needs to know about me, then it can learn more than it probably wants to know through my poetry.
[1] - Emily Dickinson is an obvious example: https://en.wikipedia.org/wiki/Emily_Dickinson
[2] - Coral Castle, Florida: https://en.wikipedia.org/wiki/Coral_Castle
[3] - Federico Garcia Lorca almost certainly didn't write his Sonetos del amor oscuro for publication - he just needed to write them: https://es.wikisource.org/wiki/Sonetos_del_amor_oscuro
For example, robot boxing: https://www.youtube.com/watch?v=rdkwjs_g83w
But it might also go the way of pottery, glass-making and weaving. They’re still around but extremely niche.
Most commercial artists are very much unknown, in the background. This is a different situation from sport
Synthetic data can be useful until a certain point, but you can’t expect to have a better model on synthetic data alone indefinitely.
The moat of GDM here is YouTube. That have a bazillion of gameplay and whatever videos. But here it is.
The downside I can see is that most people will stop to publish content online for free since this companies have absolutely no respect whatsoever for the humans that created the data they use.
I think we have a long way to go yet. Humanity is still in the early stages of its tech tree with so many unknown and unsolved problems. If ASI does happen and solves literally everything, we will be in a position that is completely alien to what we have right now.
> How to even find the value in living given all of that?
I feel like a lot of AI angst comes from people who place their self-worth and value on external validation. There is value in simply existing and doing what you want to do even if nobody else wants it.
I agree on this point, and have come to that conclusion myself regarding my own AI angst. However that doesn't solve the economic issues that arise from this technology. As large swathes of the workforce becomes replaced (something that, in my opinion, is rapidly approaching), how do we organise society so that everyone can survive / thrive?
As far as I can see there is very little impetus behind tackling such issues, compared to the forces pushing this tech forward so rapidly.
People still value Amish furniture or woodworking despite Ikea existing. I love that if I want a cheap chair made of cardboard and glue that I can find something to satisfy that need; but I still buy nice furniture when I can.
AI creations are analogous. I've seen some cool AI stuff, but it definitely doesn't replace the real "organic" art one finds.
These fears aren't realized if AI never achieves superhuman performance, but what if they do?
(2) AI has already achieved superhuman performance in breadth and, with tuning, depth.
My only hope is this: I think the depression is telling us something real, we are collectively mourning what we see as the loss of our humanity and our meaning. We are resilient creatures though, and hopefully just like the ozone layer, junk food, and even the increasing rejections of social media and screen time, we will navigate it and reclaim what’s important to us. It might take some pain first though.
Yes, AI can make music that sounds decent and lyrics that rhyme and can even be clever. But listen to a couple songs and your brain quickly spots the patterns. Maybe AI gets there some day, but the uncanny valley seems to be quite a chasm - and anything that approaches the other side seems to do so by piling lots of human intention along the way.
The merge. (https://blog.samaltman.com/the-merge)
I'm quite enthusiastic. I've always thought mortality sucks.
I can understand it's very interesting from a researcher's point-of-view (I'm a software dev who's worked adjacent to some ML researchers doing pipeline stuff to integrate models into software), but at the same time: Where are the robots to do menial work like clean toilets, kitchens, homes, etc?
I assume the funding isn't there? Or maybe it's much less exciting to research diffusion networks for image generation that working out algorithms for the best way to clean toilets :)
I wonder how advanced world models like genie 3 would change the approach if it all.
also the billionaires have help so they don't give a shit if the menial stuff is automated or not. throw in a little misogyny by and large too; I saw a LinkedIn Lunatic in the wild (some C-level) saying laundry is already automated because laundry machines exist
fucking.. tell me you don't ever do the laundry without telling me. That guy's poor wife.
Or kittens and puppies. Do you think there won't be kittens and puppies?
And that's putting aside all the obvious space-exploration stuff that will probably be more interesting than anything the previous 100 billion humans ever saw.
Aren't other planets / moons etc. basically just barren deserts of rock and dust? Once you get over the novelty of it, it will basically just be the shittiest and most uncomfortable place you've ever been.
"Nothing human makes it out of the near-future."
If humans are not stretched to their limits, and are still able to be creative, then the tools will help us find our way through this infinite space.
AI will never be able to generate everything for us, because that means it will need infinite computation.
Edit: left the page open for a while before responding, and the other person responded with basically the same thing within that time.
Similar to how synths meant we no longer need to play an instruments by plucking strings, it hasn’t affected the higher level creativity of creating music, only expanded it.
Humans have demonstrated time and again, even things beyond our experience can be explored by us; quantum mechanics for example. Humans find a way to map very complex subjects to our own experience using analogy. Maybe AI can help us go further by allowing us to do this on even more complex ideas.
Wow. What a picture! Here's an optimistic take, fwiw: Whenever we have had a paradigm shift in our ability to process information, we have grappled with it by shifting to higher-level tasks.
We tend to "invent" new work as we grapple with the technology. The job of a UX designer did not exist in 1970s (at least not as a separate category employing 1000s of people; now I want to be careful this is HN, so there might be someone on here who was doing that in the 70s!).
And there is capitalism -- if everyone has access to the best-in-class model, then no one has true edge in a competition. That is not a state that capitalism likes. The economics _will_ ultimately kick in. We just need this recent S-curve to settle for a bit.
People say this all the time, but I think it's a very short-sighted view. It really begs the question: do you believe that there are tasks that exist which a human can do, but we could not train an AI to also do? The difference between AI and any other technological advancement is that AI is (or promises to be, and I have no reason to believe otherwise) a tool that can be adapted to any task. I don't think analogies to history really apply here.
Till then, I just learn the tools with the deepest understanding that I can muster and so far the deeper I go, the less impressed with "automated everything" I become, because it isn't really going to be capable of doing anything people are going to find interesting when the creativity well dries up.
It's not. We will be replaced, but the AI will carry on.
a lot of these comments border on cult thinking. it's a fucking text to 3D image model, not R Daneel Olivaw, calm down
I'll concede that it might take even longer to get full artificial human capabilities (robust, selfrepairing, selfreplicating, adaptable), but the writing is on the wall.
Even in the very best case that I see (non-malicious AI with a soft practical ceiling not too far beyond human capabilities) poses giant challenges for our whole society, just in ressource allocation alone (because people, as workers, become practically worthless, undermining our whole system completely).
Nothing is being taken away.
With UBI, probably. With a central government formed by our robot overlords. But why even pay us at that point?
If your value in living is in any way affected by AI, ever, then, well, let's just say I would never choose that for myself. Good luck.
You don't think there was ever a time without a mass media culture? Plenty of people have furniture older than mass media culture. Even 20 years ago people could manage to be creative for a tiny audience of what were possibly other people doing creative things. It's only the zoomers who have never lived in a world where you never thought to consider how you could sell the song you were writing in your bedroom to the Chinese market.
It used to be that music didn't come on piano rolls, records, tapes, CDs or files. It used to be that your daughter would play music on the piano in the living room for the entire family. Even if it was music that wouldn't really sell, and wasn't perfectly played, people somehow managed to enjoy it. It was not a situation that AI could destroy. If anything, AI could assist.
My only hope is that we could have created 100k nukes of monstrous yields but collectively decided not to. We instead created 10k smaller ones. We could have destroyed ourselves long ago but managed to avoid it.
Work is fundamental part of society and will never be eliminated, regardless of its utility/usefulness. The cast/class system determines the type of work. The amount (time) of work is set as it was discovered additional leisure and to reduce it does not improve individuals happiness.
With business as usual capital is power and capital is increasingly getting centralized.
For too long has humanity been collectively submerged into this hyper-consumption of the arts. We, our parents and our grandparents have been getting bombarded by some or the other form of artificial dopamine sweets - from videos to reels to xeets to "news" to ads to tunes to mainstream media - every second of the day, every single day. The kind of media consumption we have every day is something our forefathers would have been overwhelmed by within an hour. It is not natural.
This complete cheapening of the arts is finally giving us a chance to shed off this load for good.
The main challenge over the next decade as all our media channels are flooded with generated media will become curation. We desperately need ways to filter human-created content from generated content. Not just for the sake of preserving art, but for avoiding societal collapse from disinformation, which is a much more direct and closer threat. Hell, we've been living with the consequences of mass disinformation for the past decade, but automated and much more believable campaigns flooding our communication platforms will drastically lower the signal-to-noise ratio. We're currently unable to even imagine the consequences of that, and are far from being prepared for it.
This tech needs strict regulation on a global scale. Anyone against this is either personally invested in it, or is ignorant of its dangers.
I've only ever seen demos of these models where things happen from a first-person or 3rd-person perspective, often in the sort of context where you are controlling some sort of playable avatar. I've never seen a demo where they prompted a model to simulate a forest ecology and it simulated the complex interplay of life.
Hence, it feels like a video game simulator, or put another way, a simulator of a simulator of a world model.
This is a pretty clear example of video game physics at work. In the real world, both the jetski and floating structure would be much more affected by a collision, but in the context of video game physics such an interaction makes sense.
So yeah, it's a video game simulator, not a world simulator.
I don't doubt they're trying to create a world simulator model, I just think they're inadvertently creating a video game simulator model.
It is interesting to think about. This kind of training and model will only capture macro effects. You cannot use this to simulate what happens in a biological cell or tweak a gravity parameter and see how plants grow etc. For a true world model, you'd need to train models that can simulate at microscopic scales as well and then have it all integrated into a bigger model or something.
As an aside, I would love to see something like this for the human body. My belief is that we will only be able to truly solve human health if we have a way of simulating the human body.
That's an insane product right there just waiting to happen. Too bad Google sleeps so hard on the tech they create.
You'd have some "please wait in this lobby space while we generate the universe" moments, but those are easy to hide with clever design.
But once you can get N cameras looking at the same world-state, you can make them N players, or a player with 2 eyes.
https://odyssey.world/introducing-interactive-video
I don't think Humans are the target market for this model, at least right now.
Sounds like the use case is creating worlds for AI agents to play in.
I DECLARE BANKRUPTCY vibes here
Meet the Natives:
https://www.youtube.com/watch?v=S1DqarK4NlA
There is a second season where a different tribe go to USA.
Additionally, video seems like a pretty forward output shape to me - 2D image with a time component. If we were talking 3D assets and animations I wouldn't even know where to start with modeling that as input data for training. That seems really hard to model as a fixed input size problem to me.
If there was comparable 3D data available for training, I'd guess that we'd see different issues with different approaches.
A couple of examples that I could think of quickly: Using these to build games, might be easier if we could interact with the underlying "assets". Getting photorealistic results with intricate detail (e.g. hair, vegetation) might be easier with video based solutions.
There’s absolutely no reason that a game needs to be generated frame-by-frame like this. It seems like a deeply unserious approach to making games.
(My feeling is that it must be easier to train this way.)
3D model rendering would be useful however for interfacing with robots.
In VR, for example, the same 3D scene will be rendered twice, once for each eye, from two viewpoints 10-15cm apart.
If you don’t have an internal 3D representation of the world, the AI would need to generate exactly the same scene from a very slightly different perspective for each eye, without any discrepancies or artefacts.
And that’s not even discussing physics, collisions or any form of consistent world logic that happens off-screen. Or multiplayer!
https://odyssey.world/introducing-interactive-video
I know that everyone always worries about trapping people in a simulation of reality etc. etc. but this would have blown my mind as a child. Even Riven was unbelievable to me. I spent hours in Terragen.
No need to explore; I can tell you how. Release the weights to the general public so that everyone can play with it and non-Google researchers can build their work upon it.
Of course this isn't going to happen because "safety". Even telling us how many parameters this model has is "unsafe".
It's a nice step towards gains in embodied AI. Good work, DeepMind.
Sora was described very similar to this as a "world simulator" but ultimately it never materialized.
This one is a bit more hopeful from the videos though.
Another interesting angle is retrofitting existing 2D content (like videos, images, or even map data) into interactive 3D experiences. Imagine integrating something like this into Google Maps suddenly street view becomes a fully explorable 3D simulation generated from just text or limited visual data.
Genie 3 isn’t that though. I don’t think it’s actually intended to be used for games at all.
…and this is the worst the capabilities will ever be.
Watching the video created a glimmer of doubt that perhaps my current reality is a future version of myself, or some other consciousness, that’s living its life in an AI hallucinated environment.
- Google search
- Web browsers
- Web content
- Internet Explorer
- Music
- Flight process at Mosul airport
- Star Wars
And then you watched Mandalorian and Andor?
Jokes aside, Google Search results are worse thanks to so much web content being just ad scaffolding, but the interesting one here is music.
Music is typically imagined to be its best at whatever ages one most listened to it, partly trained in and partly thanks to meanings/memories/nostalgia attached to it. As a consequence, for most everyone, more recent music seems to be “getting worse”!
That said, and back to the SEO effect on Google Results, I'd argue mass distribution/advertising/marketing has resulted in most audio airtime getting objectively* less complex, but if one turns off the mass distribution, and looks around, there seems to be plenty of just as good — even building on what came before — music to be found.
* https://www.researchgate.net/publication/387975100_Decoding_...
(Also, implying that music has gotten worse is a boomer-ass take. It might not be to your liking, but there's more of it than ever before, and new sonic frontiers are being discovered every day.)
Personal jetpacks are the worst they’ll ever be. Doesn’t mean they’re any close to being useful.
Your comparison is incorrect
Not sure if that's what you are trying to say about AI, or not.
Have they become better over the past 20 years?
Due to this physical limitation, what you 'see' in front of you, widely accepted as ground truth reality, cannot possibly real, its a hallucination produced by your brain.
Your brain, compared to the sensory richness of reality you experience around you, has very limited direct inputs from the outside world, it must construct a rich internal model based on this.
It's very weird (at least to me), that the boundary between reality and assumption (basically educated guessing) is very arbitrary, and definitely only exists in our heads.
The next step is to realize that, if life is a cheap simulation, not everyone might have... uh... fully simulated minds. Player Characters vs NPCs is what gamers would say, though it doesn't have to be binary like that, and the term NPC has already been ruined by social media rants. (Also, NPC is a bad insult because most of the coolest characters in games are NPC rivals or bosses or whatnot.)
> …and this is the worst the capabilities will ever be.
I guess if this bothers you (and I can see how it might) you can take some small comfort in thinking that (due to enshitification) this could in fact be the _best_ the capabilities will ever be.
[1]: https://www.worldlabs.ai/
[2]: https://wayfarerlabs.ai/
[3]: https://runwayml.com/research/introducing-general-world-mode...
1. Almost all human-level civilizations go extinct before reaching a technologically mature “posthuman” stage capable of running high-fidelity ancestor simulations.
2. Almost no posthuman civilizations are interested in running simulations of their evolutionary history or beings like their ancestors.
3. We are almost certainly living in a computer simulation.
I think some/all of these things can roughly true at the same time. Imagine an infinite space full of chaotic noise that arises a solitary Boltzmann brain, top level universe and top level intelligence. This brain, seeking purpose and company in the void, dreams of itself in various situations (lower level universes) and some of those universes' societies seek to improve themselves through deliberate construction of karmic cycle ancestor simulation. A hierarchy of self-similar universes.
It was incredibly comforting to me to think that perhaps the reason my fellow human beings are so poor at empathy, inclusion, justice, is that this is a karmic kindergarten where we're intended to be learning these skills (and the consequences for failing to perform them) and so of course we're bad at it, it's why we're here.
Why would beings in simulations be conscious?
Or maybe running simulations is really expensive and so it's done sometimes (more than "almost none") but only sometimes (nowhere near "we are almost certainly").
Or simulations are common but limited? You don't need to simulate a universe if all you want to do is simulate a city.
The "trilemma" is an extreme example of black-and-white thinking. In the real world, things cost resources and so there are tradeoffs -- so middle grounds are the rule, not extremes.
I wonder how much it costs to run something like this.
Our product was a virtual 3d world made up of satellite data. Think of a very quick, higher-res version of google earth, but the most important bit was that you uploaded a GPS track and it re-created the world around that space. The camera was always focused on the target, so it wasn't a first person point of view, which, for the most part, our brains aren't very good at understanding over an extended period of time.
For those curious about the use case, our product was used by every paraglider in the world, commercial drone operations, transportation infrastructure sales/planning, out-door events promotions (specifically bike and ultramarathon races).
Though I suspect we will see a new form of media come from this. I don't pretend to suggest exactly what this media will be, but mixing this with your photos we can see the potential for an infinitely re-framable and zoomable type of photo media.
Creating any "watchable" content will be challenging if the camera is not target focused, and it makes it difficult to create a storyline if you can't dictate where the viewer is pointed.
(See "Exploring locations and historical settings" scene 5.)
https://arstechnica.com/information-technology/2023/12/googl...
While watching the video I was just imagining the $ increasing by the second. But then it's not available at all yet :(
Really great work though, impressive to see.
That's not the point, video games are worth chump-change compared to robotics. Training AIs on real-world robotic arms scaled poorly, so they're looking for paths that leverage what AI scales well at.
At the limit, if you could stay engaged you would be an expert in pretty much anything.
"It doesn't help students understand why things happened, and what the consequences were and how they have impacted the rest of history of the modern world." I would say the opposite, let's recreate each step in that historical journey so you can see exactly what the concequenses were, exactly why they happened and when.
I think some people want to play, and some want to experience, in different proportions. Tetris is the emanation of pure gameplay, but then you have to remember "Colossal Cave Adventure" is even older than Tetris. So there's a long history of both approaches, and for one of them, these models could be helpful.
Not that it matters. Until the models land in the hands of indie developers for long enough for them to prove their usefulness, no large developer will be willing to take on the risks involved in shipping things that have the slightest possibility of generating "wrong" content. So, the AI in games is still a long way off, I think.
Yes.
> No most people are testing skills in video games
That's not mutually exclusive with playing for scenery.
Games, like all art, have different communities that enjoy them for different reasons. Some people do not want their skills tested at all by a game. Some people want the maximum skill testing. Some want to experience novel fantasy places, some people want to experience real places. Some people want to tell complex weaving narratives, some people want to optimize logistics.
A game like Flower is absolutely a game about looking at pretty scenery and not one about testing skill.
You must be young. As people get older they (usually) care less about that.
Actually that game felt a lot like these videos, because often you would turn around and then look back and the game had deleted the NPCs and generated new ones, etc.
But if you believe reality is a simulation, why would these “efficient” world-generation methods convince you of anything? The tech our reality would have to be running on is still inconceivable science fiction.
Not like this we haven't. This is convincing because I can have any of you close your eyes and imagine a world where pink rabbits hand out parking tickets. We're a neurolink away from going from thought > to prompt > to fantasy.
To add: our reality does not have to be rendered in it's entirety, we'll just have very convincing and unscripted first-person view simulations. Only what you look at is getting rendered (e.g. tiny structures only get rendered when you use microscope).
What this means is that a robot model could be trained 1000x faster on GPUs compared to training a robot in the physical world where normal spacetime constraints apply.
https://extraakt.com/extraakts/google-s-genie-3-capabilities...
I think it just outputs image frames...
Are they just multimodal for everything?
Are foundational time series models included in this category?
This feels almost exactly like that, especially the weird/dreamlike quality to it.
I feel like as time goes on more and more of these important features are showing up as disconnected proofs of concept. I think eventually we'll have all the pieces and someone will just need to hook them together.
I am more and more convinced that AGI is just going to eventually happen and we'll barely notice because we'll get there inch by inch, with more and more amazing things every day.
This is starting to feel pretty **ing exponential.
/s
It's a whitepaper release to share the STOTA research. This doesn't seem like an economically viable model, nor does it look polished enough to be practically usable.
The main product of the telescope is its data, not the ability for anyone to play with the instruments.
The main product of the model is the ability for anyone to play with it.
Strange rebutal.
We know how James Webb works and it's developed by an international consortium of researchers. One of our most trusted international institutions, and very verifiable.
We do not know how Genie works, it is unverifiable to non-Google researchers, and there are not enough technical details to move much external teams forward. Worst case, this page could be a total fabrication intended to derail competition by lying about what Google is _actually_ spending their time on.
We really don't know.
I don't say this to defend the other comment and say you're wrong, because I empathize with both points. But I do think that treating Google with total credulity would be a mistake, and the James Webb comparison is a disservice to the JW team.
I would actually turn that around. The Telescope is released. It's flying around up there taking photos. If they kept it in some garage while releasing flashy PR pages about how groundbreaking it is, then I'd be pretty skeptical.
good writers will remain scarce though.
maybe we will have personalized movies written entirely through A.I
Obviously, none of these are super viable given the low accuracy and steerability of world models out today, but positive applications for this kind of tech do exist.
Also (I'm speculating now instead of restating the video), I think pretty soon someone will hook up a real time version of this to a voice model, and we will get some kind of interactive voice + keyboard (or VR) lucid dream experience.