Ask HN: What'd be possible with 1000x faster CPUs?
Imagine an unlikely scientific breakthrough makes general-purpose CPUs many orders of magnitude faster, and that they become widely available alongside petabyte-scale RAM modules and an appropriately fast memory bus. Besides enabling bloatware on a previously unimaginable scale, what other interesting, maybe revolutionary applications, impossible or at least impractical today, would crop up?
Video engineer here. Many seemingly network-restricted tasks could be unlocked by faster CPUs doing advanced compression and decompression.
1. Video Calls
In video calls, encoding and decoding are actually a significant cost, not just networking. Right now the peak is Zoom's 30 video streams onscreen, but with 1000x CPUs you could have hundreds of high-quality streams with advanced face detection and super-resolution upscaling[1]. Advanced computer vision models could analyze each face, build a mesh of vectors, and send only the vector changes across the wire instead of a video frame (rough sketch at the end of this comment). The receiving computer could then reconstruct the face for each frame. This could turn video calling into an almost entirely CPU-bound task.
2. Incredibly Realistic and Vast Virtual Worlds
Imagine the most advanced movie-grade CGI being generated for each frame: something like the new Lion King or Avatar-like worlds created before you through your VR headset. With extremely advanced eye tracking and graphics, VR would hit that next level of realism. AR and VR use cases could explode with incredibly light headsets.
To be imaginative, you could have everything from huge concerts to regular meetings take place in the real world, but be scanned and sent to VR participants in real time. The entire space, including the room and whiteboard or live audience, could be rendered in real time for all VR participants.
[1] https://developer.nvidia.com/maxine-getting-started
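To make item 1 concrete, here is a toy sketch in Python of what the wire format could look like. Everything here is an assumption for illustration: detect_landmarks is a stand-in for a real face-mesh model (Maxine-style), 468 points is just a typical dense mesh size, and a real system would quantize and compress the deltas further.

    # Hypothetical sketch: ship face-mesh deltas instead of encoded video frames.
    import numpy as np

    NUM_POINTS = 468  # assumed dense face-mesh size, not a spec

    def detect_landmarks(frame) -> np.ndarray:
        """Placeholder for a real face-mesh model; returns (NUM_POINTS, 3) vertices."""
        raise NotImplementedError

    class Sender:
        def __init__(self):
            self.prev = np.zeros((NUM_POINTS, 3), dtype=np.float32)

        def encode(self, frame) -> bytes:
            mesh = detect_landmarks(frame).astype(np.float32)
            delta = mesh - self.prev      # only what moved since the last frame
            self.prev = mesh
            return delta.tobytes()        # ~468*3*4 bytes, about 5.6 KB per frame

    class Receiver:
        def __init__(self):
            self.mesh = np.zeros((NUM_POINTS, 3), dtype=np.float32)

        def decode(self, payload: bytes) -> np.ndarray:
            delta = np.frombuffer(payload, dtype=np.float32).reshape(NUM_POINTS, 3)
            self.mesh = self.mesh + delta  # reconstruct the current mesh
            return self.mesh               # a renderer re-skins the face from this

The bandwidth side is almost trivial; the point of the 1000x CPU is that detect_landmarks and the photoreal re-rendering on the receiver are where all the compute goes.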
> Advanced computer vision models could analyze each face, build a mesh of vectors, and send only the vector changes across the wire instead of a video frame. The receiving computer could then reconstruct the face for each frame.
Interesting. How do you see this as different from the deep-learning-based video coding demonstrated recently? [1]
[1] https://dl.acm.org/doi/10.1145/3368405
Realistically, AI network training at the level currently done by corporations with big server farms becomes accessible to solo devs and hobbyists (let's count GPUs as general purpose). So if you want your own network for Stable Diffusion or Leela Chess, you can train it on your own PC. I think that is the most interesting obvious consequence.
Also, large-scale data hoarding becomes far more affordable (I assume the petabyte RAM modules also mean exabyte disk drives). So you can be your own Internet Archive, which is great. Alternatively, you can be your own NSA or Google/Facebook in terms of tracking everyone, which is less great.
I think when that hardware is attainable and the tech is democratized, things are going to get very bizarre very quickly. I'm hitting a wall in my imagination of what a society where this is common even looks like, and it scares me.
Electron basically IS an entire OS, since Chromium has APIs for doing just about anything, including accessing the filesystem, USB devices, and 500 other things.
If _accessing_ the filesystem counts toward being an OS, and not _implementing_ the filesystem, then I guess Qt and the stdlib of every language are also "kind of an OS".
That's splitting hairs. Paravirtualised IO on a virtual machine doesn't make the guest OS running inside it any less of an OS just because it has a simpler interface to the outside world than a SATA/SAS/NVMe controller.
Some applications depend on approximately solving optimization problems that are hard even at small sizes.
The poster child here is combinatorial optimization (more or less equivalently, NP-complete problems); concrete examples are SMT solvers and their applications to software verification [1]. Non-convex problems are sometimes similarly bad.
Non-smooth and badly conditioned optimization problems scale much better with size, but getting high-precision solutions is hard. These are important for the simulations mentioned elsewhere, not just for architecture and games but also for automating design, inspections, etc. [2]
[1] https://ocamlpro.github.io/verification_for_dummies/
[2] https://www.youtube.com/watch?v=1ALvgx-smFI&t=14s
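To give a concrete feel for "combinatorial", here is a tiny constraint problem in Python using the z3-solver package (a real library; the puzzle choice is mine). The relevant intuition: worst-case search grows roughly exponentially with the number of discrete choices, so a 1000x (about 2^10) faster CPU only buys you on the order of ten more "bits" of problem.

    # pip install z3-solver -- the classic SEND + MORE = MONEY puzzle as SMT.
    from z3 import Int, Solver, Distinct, sat

    digits = {c: Int(c) for c in "SENDMORY"}
    s = Solver()
    s.add([d >= 0 for d in digits.values()])
    s.add([d <= 9 for d in digits.values()])
    s.add(digits["S"] > 0, digits["M"] > 0)       # no leading zeros
    s.add(Distinct(list(digits.values())))        # the combinatorial part

    def word(w):
        value = 0
        for c in w:
            value = value * 10 + digits[c]
        return value

    s.add(word("SEND") + word("MORE") == word("MONEY"))
    if s.check() == sat:
        print(s.model())   # finds 9567 + 1085 = 10652

Real verification problems are the same shape, just with millions of constraints, which is where the exponential worst case actually bites.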
The thing is, computing has been getting steadily faster, just not at quite the pace it was before and in a different way.
With GPUs we have proven that parallelism can be just as good or even better than speed increases in enhancing computation. And even there, speed increases have kept trickling in.
I don't think it's realistic to say that more speed advances are unlikely. We have already been through many different paradigm shifts in computing, from mechanical to nanoscale. There are new paradigms coming up such as memristors and optical computing.
It seems like 1000x will make Stable Diffusion-style video generation feasible.
We will be able to use larger, currently slow AI models in realtime for things like streaming compression or games.
Real global illumination in graphics could become standard.
Much more realistic virtual reality. For example, imagine a realistic forest stream that your avatar is wading through, with realtime accurate simulation of the water, and complex models for animal cognition of the birds and squirrels around you.
I think with this type of speed increase we will see fairly general purpose AI, since it will allow average programmers to easily and inexpensively experiment with combining many, many different AI models together to handle broader sets of tasks and eventually find better paradigms.
It also could allow for emphasis on iteration in AI, and that could move the focus away from parallel-specific types of computation back to more programmer-friendly imperative styles, for example if combined with many smaller neural networks to enable program synthesis, testing and refinement in real time.
Here's a weird one: imagine something like emojis in VR, but in 3d, animated, and customized on the fly for the context of what you are discussing, automatically based on an AI you have given permission to.
Or, hook the AI directly into your neocortex. Hook it into several people's neocortices and then train an animated AI 3d scene generation system to respond to their collective thoughts and visualizations. You could make serialized communication almost obsolete.
However, 1000x is really not very much. With a 1000x uplift, we could certainly get better weather predictions, but not necessarily paradigm-altering improvement. In a real sense, we already have a 1000x speedup: it's what you get in a contemporary "supercomputer", whatever that is in a given market at a given point in history.
Let's say we had a perfect 1000x improvement in compute, storage, and IO such that everything remains balanced. A fluid-dynamics or atmospheric simulation can only increase resolution by about 10x if a 3D volumetric grid is refined uniformly, or only about 5x if we spread it uniformly over 4D to also improve temporal resolution. Or maybe you decide to increase the 2D geographic reach of a model by 30x and leave the height and temporal resolution alone. These growth factors are not life-changing unless you happen to be close to a non-linear boundary where you cross a threshold from impractical to practical.
I'm not sure we can say how much a video game would improve. There are so many "dimensions" that are currently limited and it's hard to say where that extra resource budget should go. Maybe you currently can simulate a dozen interesting NPCs and now you could have a crowd of 10,000 of them. But you still couldn't handle a full stadium full of these interesting behaviors without another 10x of resources...
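The arithmetic behind those factors, for anyone who wants to poke at it (nothing here beyond taking roots of the budget):

    # Spreading a 1000x budget uniformly across d dimensions gives 1000**(1/d) per axis.
    budget = 1000
    for dims, label in [(1, "a single axis"),
                        (2, "2D geographic reach"),
                        (3, "3D volumetric grid"),
                        (4, "3D grid plus temporal resolution")]:
        print(f"{label}: ~{budget ** (1 / dims):.1f}x per axis")
    # a single axis: ~1000.0x, 2D: ~31.6x, 3D: ~10.0x, 4D: ~5.6x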
I work on an open source multiplayer game that's limited by single thread CPU speed so I can give a perspective of what would improve for us at least.
The fastest thing to change is that we'd increase player limits per server; per-player CPU costs are significant, and we could bring the player limit to maybe 500 before network speeds start being a consideration. Certain AI improvements that are currently not viable, like goal-oriented AI design and pathfinding improvements, could be added and would make new kinds of gameplay possible. Hell, with even just 10x I would be very tempted to try unifying our atmospheric and chemistry simulations so they use the same data structures, allowing chemical reactions between gases that aren't basically masses of nonstandard performance hacks on the back end.
In short, though, even minor performance improvements would vastly change what we could accomplish. 1000x is extreme, and you would see very different games making use of techniques that today are mostly relegated to games built around them as a gimmick, with sacrifices made elsewhere.
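For the unified-simulation idea above, a purely hypothetical sketch of what "same data structures" could mean (this is my illustration, not any existing game's code): one cell type that both the atmospherics pass and the chemistry pass operate on.

    # Hypothetical shared cell for atmospherics + chemistry; all thresholds and
    # reaction rules are made up for illustration.
    from dataclasses import dataclass, field

    R = 8.314  # ideal gas constant, J/(mol*K)

    @dataclass
    class GasCell:
        moles: dict = field(default_factory=dict)   # e.g. {"O2": 21.0, "N2": 79.0}
        temperature_k: float = 293.15
        volume_l: float = 2500.0

        def pressure_kpa(self) -> float:
            # Ideal gas law P = nRT/V; with V in litres this lands in kPa.
            return sum(self.moles.values()) * R * self.temperature_k / self.volume_l

        def react(self) -> None:
            # Toy chemistry pass: hot H2 + O2 -> H2O (2:1 stoichiometry).
            if self.temperature_k > 800:
                h2, o2 = self.moles.get("H2", 0.0), self.moles.get("O2", 0.0)
                burned = min(h2 / 2, o2)
                self.moles["H2"] = h2 - 2 * burned
                self.moles["O2"] = o2 - burned
                self.moles["H2O"] = self.moles.get("H2O", 0.0) + 2 * burned

    def equalize(a: GasCell, b: GasCell) -> None:
        # Toy atmospherics pass: crudely average two adjacent cells' contents.
        for gas in set(a.moles) | set(b.moles):
            mean = (a.moles.get(gas, 0.0) + b.moles.get(gas, 0.0)) / 2
            a.moles[gas] = b.moles[gas] = mean

Running both passes over every cell of a big map every tick is exactly the kind of thing that is CPU-bound today.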
>With GPUs we have proven that parallelism can be just as good or even better than speed increases in enhancing computation.
Not really, no. It's just that certain classes of problems can be very readily parallelized and it's relatively easy to figure out how to do something 1000x in parallel compared to figuring out how to achieve a 1000x single thread speedup.
>Much more realistic virtual reality. For example, imagine a realistic forest stream that your avatar is wading through, with realtime accurate simulation of the water, and complex models for animal cognition of the birds and squirrels around you.
I'm not sure 1000x would do much more than scratch the surface of that, especially if you're already tying a lot of it up with higher fidelity rendering.
In a lot of ways, the limiting factor in using mobiles as workstations is the software and OS: you can add a Bluetooth keyboard and mouse and then cast to a screen, but all you will get is a bigger phone, not a workstation. Mobile CPUs are not that bad nowadays.
8 cores at 2.8 GHz, 11 GB RAM, 256 GB storage, liquid cooling, and a camera that zooms at the level of a toy microscope. This is more powerful than some gaming PCs from just a while ago.
It runs fine, but anything less gets laggy, so I suspect apps like Facebook and TikTok are just going to continue to swallow up any additional power.
The best thing I know of is that you could emulate 256-bit precision with 4x64-bit floats (doubles) and then use the derivative of the Mandelbrot iteration to approximate the fractal around interesting points.
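If I'm reading this right, it's close to what deep-zoom Mandelbrot renderers already do: one high-precision reference orbit, then cheap double-precision "delta" orbits for neighbouring pixels. A rough Python sketch, with mpmath standing in for hand-rolled 4x64-bit arithmetic (and ignoring the glitch handling real renderers need when the reference escapes early):

    # One high-precision reference orbit, perturbed double-precision orbits nearby.
    from mpmath import mp, mpc

    mp.prec = 256  # ~256 bits, but only for the single reference point

    def reference_orbit(c_ref, max_iter):
        """Iterate z -> z^2 + c at high precision; store each Z_n as a plain double."""
        orbit, z = [], mpc(0)
        for _ in range(max_iter):
            orbit.append(complex(z))
            z = z * z + c_ref
            if abs(z) > 2:
                break
        return orbit

    def perturbed_escape(orbit, dc, max_iter):
        """Pixel at c_ref + dc. With z = Z + dz: dz' = 2*Z*dz + dz^2 + dc (all doubles)."""
        dz = 0j
        for n, Z in enumerate(orbit):
            dz = 2 * Z * dz + dz * dz + dc
            if abs(Z + dz) > 2:
                return n          # escape iteration drives the colouring
        return max_iter

The expensive high-precision work happens once per frame; the per-pixel loop stays in ordinary doubles, which is why this scales so well.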
It would be nice for the architecture field. We deal with lots of crappy, unoptimized software that's 20-30 years old. So if you like nice buildings and better energy performance (which requires simulations), give us faster CPUs.
Imagine you're working on an airport: thousands of sheets, all of them PDF, and hundreds or thousands of people flipping through PDFs and waiting 2-3+ seconds for the screen to refresh. CPUs, baby, we need CPUs.
No, we're not there yet. Ray tracing in games is still merely augmenting traditional rasterization, and requires heavy post-processing to denoise because we cannot yet run with enough rays per pixel to get a stable, accurate render.
I feel like we are - I can run Minecraft RTX at 4k with acceptable framerate using DLSS 2.0 on a 3090. Minecraft is using pure raytracing (no rasterization). It also isn't using A-SVGF or ReSTIR, so there are 2 pretty big improvements that could be made.
Minecraft RTX does suffer really badly with ghosting when you destroy a light source, but my intuition says that A-SVGF would fix that entirely.
That being said, some of the newest techniques, like ReSTIR PT (a generalized form), have only been published for a couple of months, so current games don't have that yet. But in 3-6 months I would start to expect some games to go with a 100% RT approach.
Still orders of magnitude away from full tracing, only as a part of traditional rendering, with a ton of hacks on top.
Actually, there has always been a lingering suspicion that brute-force simulation might get sidestepped by some other clever technique long before it's achieved, as a way to get both photorealism and ease of creation. ML style transfer could potentially become such a technique (or not).
One thing I'd like to see would be smart traffic lights. For example, as soon as a person has finished crossing the road and no one else is waiting, the light switches back to green immediately.
Assuming that a CPU at today's speeds would require vastly less power, we would have very powerful, very efficient mobile devices such as smartwatches.
Probably using AI a lot more, on-device for every single camera.
Cheaper employees. With faster CPUs, they won't need to understand leetcode-level optimization, i.e. they won't need expensive or sophisticated training. Just find someone with a pulse and stick them in front of the computer. Less-than-ideal big Os won't be an issue with this kind of speed.
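For a rough sense of how much slack that actually buys (my arithmetic, not the parent's):

    # How much larger the input can get before a 1000x-faster machine is as slow
    # as today's, for a few common complexities.
    import math

    speedup = 1000
    print(f"O(n):   n can grow ~{speedup}x")
    print(f"O(n^2): n can grow ~{math.sqrt(speedup):.0f}x")
    print(f"O(n^3): n can grow ~{speedup ** (1 / 3):.0f}x")
    print(f"O(2^n): n can grow by only ~{math.log2(speedup):.0f} more items")

So quadratic sloppiness becomes much more survivable, but exponential blowups stay exponential.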
Less time spent in software development on optimization. That might sound horrible at first, but it also means that fewer resources need to be spent on programming something.
Windows Update in the background would take 3 hours instead of 4.
The average nodejs manifest file would contain 12,000x more dependencies.
Also, we would see a ton more AI being done on the local CPU. Anything from genuine OS improvements to super realistic cat filters on teams/zoom.
And finally, I think people would need to figure out storage and network bottlenecks, because there is only so much you can do with compute before you end up stalling while waiting for more data.
We have always been memory-bound, in one way or another, even today.
The difference in performance between an application using RAM with random access patterns and an application using RAM sequentially is far more than you are expecting it to be if you haven't actually measured it: an order of magnitude or more for sequential access over random access. Having your data already in the L1 cache before you need it is worth the effort it takes to make that happen.
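A crude way to see this from Python (numpy gathering through a random permutation vs. walking the same array in order; interpreter overhead muddies the absolute numbers, and a dependent pointer-chasing loop in C would show an even larger gap):

    # Sequential vs. random-order access over the same 160 MB array.
    import time
    import numpy as np

    n = 20_000_000
    data = np.arange(n, dtype=np.int64)
    perm = np.random.permutation(n)      # random visiting order

    t0 = time.perf_counter()
    seq = data.sum()                     # sequential, prefetcher-friendly
    t1 = time.perf_counter()
    rnd = data[perm].sum()               # gather in random order, cache-hostile
    t2 = time.perf_counter()

    print(f"sequential {t1 - t0:.3f}s, random order {t2 - t1:.3f}s, equal={seq == rnd}")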
Indeed, but in the case of your average application it is not only a lack of will or expertise to optimize; it is also simply that the program domain has a much more random memory allocation pattern. Most programs are not operating in a single hot loop on terabytes of data.
A 1,000,000,000 hp equivalent electric truck generating that much torque would probably lift off and fly to the Moon, or dig itself so deep it would melt in lava. In the meantime, a Cybertruck with 3 motors (or 4) may soon (2023?) challenge Ferrari.
Training time is a massive constraint on advancement of the science, so at the very least the field would progress much faster and be much more accessible to researchers.
I feel people are overlooking the OP's mention of parallel improvements in storage and speed of access. While there are physical limits to this, I feel like capabilities will continue to expand not so much in terms of pure speed as in better automation of parallelization and resource allocation.
I think AGI requires different topological/conceptual paradigms rather than pure speed/processing capacity. But the latter is necessary to experiment and create recognizable results.
A lot of the current excitement around AI image construction and SD's availability is the intuitive sense that these tools have succeeded in emulating some key aspects of our visual cortex - given a set of object classifiers they can create imaginary views that are recognizable to us. It's sort of an illusion - Stable Diffusion has no aesthetic or experiential preferences of its own and so its activity is reflexive rather than conscious, and we don't understand if or how consciousness is emergent from complex reflexivity.
But the key point is that it's doing such a good job at this 'narrow' task of visual synthesis, and other models are doing such a good job at the 'narrow' tasks of textual or audible synthesis, that it's competitive with a human in an idiot-savant kind of way. And we know from our own experience that skill and learning are protean - we may disagree on the value of different types of learning, but we don't question the similarity of the underlying mechanism. Thus I might think that becoming an expert on, say, the fictional universe of Star Wars is a waste of time, but the processes of knowledge acquisition, recall, and synthesis are not fundamentally different from those used to learn history or engineering ('experimentation' can exist in terms of consensus establishment in a fandom about whether an innovation is canonical or parodic).
So if we can train models with a billion semantically-tagged media objects and have them generate new media objects that meaningfully reflect the tags we supply, it means we have a decent general environmental-feature detection, recall, and resynthesis tool. Being able to take an existing model and tune it on workstations instead of needing a whole datacenter substantially widens the field of possibilities. So what happens if we connect it to sensors and actuators and train our model to navigate a dynamic landscape, which includes 'internal' signals that can't be directly responded to? Consider a virtual or lab environment which is complex and dynamic, and includes energy units (batteries). Our model has internal batteries and feedback mechanisms, but their state can only be altered through external activity and their signals are heavily weighted. Sensory subsystems attached to the model have some precomputed models of their own.
My idea is that the brain is a 'system of systems' and that consciousness emerges from the instrumentation of the time cost of model tuning vs the rate of environmental variation.
Java might run at a decent speed...
Might, but probably won't (jk, sorry, I couldn't help myself...) [edit: Grammarly decided to remove some text when fixing spelling...]
> Also, large-scale data hoarding becomes far more affordable (I assume the petabyte RAM modules also mean exabyte disk drives). So you can be your own Internet Archive, which is great.
"Play me Frank Zappa's new album featuring Kanye West."
It will also mean data in general will be bigger and scale accordingly.
> It runs fine, but anything less gets laggy, so I suspect apps like Facebook and TikTok are just going to continue to swallow up any additional power.
I mean one of the fundamental attributes of infinity is that you can never be 'almost there'.
Also, 1000x parallelism or 1000x single core?
Higher IPC, higher clock, more cores, more cache, more cache levels, more memory bandwidth, faster memory access, faster decode, etc.
One idea I imagine would be possible with a 1000x speedup is real-time software-defined radio capture, analysis, and injection.
This would lead to complete chaos until we update our security standards.
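To ground what "capture and analysis" means computationally, here is a minimal spectrum step on synthetic IQ samples (the sample rate, signal, and block size are all my assumptions; a real front end would just replace the synthetic array with hardware samples):

    # One spectrum-analysis block of an SDR pipeline, on synthetic IQ samples.
    import numpy as np

    sample_rate = 10_000_000      # assume 10 MS/s
    block = 4096
    t = np.arange(block) / sample_rate

    # Synthetic capture: a carrier at +1.2 MHz plus noise, as complex IQ samples.
    iq = (np.exp(2j * np.pi * 1.2e6 * t)
          + 0.1 * (np.random.randn(block) + 1j * np.random.randn(block)))

    window = np.hanning(block)
    spectrum = np.fft.fftshift(np.fft.fft(iq * window))
    power_db = 20 * np.log10(np.abs(spectrum) + 1e-12)
    freqs = np.fft.fftshift(np.fft.fftfreq(block, d=1 / sample_rate))

    print(f"strongest signal near {freqs[np.argmax(power_db)] / 1e6:.2f} MHz")

Doing this, plus demodulation and protocol decoding, continuously across a wide band is where the 1000x would actually go.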
With 1000x CPU computing, each computer would have computing power equivalent to a human brain.
So a brain-computer interface or Jarvis-like AI might become possible.
> The difference in performance between an application using RAM with random access patterns and an application using RAM sequentially is far more than you are expecting it to be if you haven't actually measured it.
This is absolutely true
And MacOS updates will still find a way to take your machine offline for an hour
It uses 50x the RAM to do so. But you're dead wrong to think Java is slow.
The only reason physics game engines are written in C++ is because physics game engines are written in C++.
It's not 90%; more like several times slower: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
>The only reason physics game engines are written in C++ is because physics game engines are written in C++.
They are written in C++ because of latency requirements that are nearly impossible to meet in a GCed language.