That wasp is one of the most impressive pieces of computer graphics I have ever seen and, seemingly in contradiction, also a fantastic piece of macro photography. The fact that it renders in real time is amazing.
There was a discussion on here the other day about the PS6, and honestly, if I were still involved in console/game production I'd be looking seriously at how to incorporate assets like this.
I wonder if there's research into fitting Gaussian splats that are dependent on focus distance? Basically a way of modeling bokeh: you'd feed in the raw, unstacked shots and get a sharp-everywhere model back.
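I don't know of an off-the-shelf way to do this, but a minimal sketch of the idea (all names and parameters here are hypothetical, not taken from any particular paper): model each defocused observation as the splat's projected Gaussian convolved with a thin-lens circle of confusion. Since a Gaussian convolved with a Gaussian is again a Gaussian, the blur is just extra covariance at render time, so the optimizer can explain blurry pixels without blurring the underlying splats.

    import numpy as np

    def coc_radius_px(depth, focus_dist, focal_len, aperture_diam, px_per_mm):
        # Thin-lens circle-of-confusion radius (in pixels) for a point at
        # `depth`. All distances in mm; this is the standard thin-lens formula.
        c = aperture_diam * focal_len * abs(depth - focus_dist) / (
            depth * (focus_dist - focal_len))
        return 0.5 * c * px_per_mm

    def defocused_cov2d(cov2d, depth, focus_dist, focal_len, aperture_diam, px_per_mm):
        # Approximate the CoC as an isotropic Gaussian and add it to the
        # splat's projected 2D covariance; convolving Gaussians just sums
        # their covariances.
        sigma = coc_radius_px(depth, focus_dist, focal_len, aperture_diam, px_per_mm)
        return cov2d + (sigma ** 2) * np.eye(2)

During training you'd render each raw shot with its recorded focus distance and fit the sharp splats to the blurry observations; at view time you drop the blur term (or set the aperture to taste) and get the sharp-everywhere model.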
https://dof-gs.github.io/
https://dof-gaussian.github.io/
Thanks for the links, that is great to know.
I'm not quite sold that it's the better approach. You'd need to do SfM (tracking) on the out-of-focus images, which with macro subjects can be really blurry; I don't know how well that works... and you'd need a lot more images too. You'd have to group them somehow or preprocess them... then you're back to focus stacking first :-)
The linked paper describes a pipeline that starts with “point cloud from SfM” so they’re assuming away this problem at the moment.
Is it possible to handle SfM out of band? For example, by precisely measuring the location and orientation of the camera?
The paper’s pipeline includes a stage that identifies the in-focus area of an image. Perhaps you could use that to partition the input images. Exclusively use the in-focus areas for SfM, perhaps supplemented by out of band POV information, then leverage the whole image for training the splat.
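For that partitioning step, a simple local focus measure might already be enough. Here's a sketch using per-block variance of the Laplacian, a classic sharpness metric; the block size and threshold are made-up knobs you'd tune per dataset, and this isn't necessarily what the paper does:

    import cv2
    import numpy as np

    def sharpness_mask(img_bgr, block=64, thresh=50.0):
        # Per-block variance of the Laplacian: high variance means strong
        # local edges, i.e. in focus. Returns True where the image is sharp.
        gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
        lap = cv2.Laplacian(gray, cv2.CV_64F)
        h, w = gray.shape
        mask = np.zeros((h, w), dtype=bool)
        for y in range(0, h, block):
            for x in range(0, w, block):
                tile = lap[y:y + block, x:x + block]
                mask[y:y + block, x:x + block] = tile.var() > thresh
        return mask

COLMAP, for instance, accepts per-image masks during feature extraction, so you could exclude the defocused regions from SfM while still training the splat on the full frames.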
Overall this seems like a slow journey to building end-to-end model pipelines. We’ve seen that in a few other domains, such as translation. It’s interesting to see when specialized algorithms are appropriate and when a unified neural pipeline works better. I think the main determinant is how much benefit there is to sharing information between stages.
How does it capture the reflection (the iridescence of the fly's body)? It's almost as if I can see the background through the reflection.
I would have thought that since that reflection has a different color in different directions, gaussian splat generation would have a hard time coming to a solution that satisfies all of the rays. Or at the very least, that a reflective surface would turn out muddy rather than properly reflective-looking.
Is there some clever trickery that's happening here, or am I misunderstanding something about gaussian splats?
Gaussian splats can have colour components that depend on the viewing direction. As far as I know, they are implemented as spherical harmonics. The angular resolution is determined by the number of spherical harmonic components. If this is too low, all reflection changes will be slow and smooth, and any reflection will be blurred.
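For a sense of scale: the reference 3DGS implementation stores up to degree-3 SH (16 coefficients per color channel) and evaluates them per view direction. A minimal sketch truncated to the first two bands, using the standard real-SH constants:

    import numpy as np

    SH_C0 = 0.28209479177387814  # degree-0 constant
    SH_C1 = 0.4886025119029199   # degree-1 constant

    def splat_color(sh, view_dir):
        # View-dependent color of one splat. sh: (4, 3) array holding the
        # degree-0 and three degree-1 coefficients per RGB channel;
        # view_dir: unit vector from the camera toward the splat center.
        x, y, z = view_dir
        rgb = (SH_C0 * sh[0]
               - SH_C1 * y * sh[1]
               + SH_C1 * z * sh[2]
               - SH_C1 * x * sh[3])
        return np.clip(rgb + 0.5, 0.0, 1.0)  # +0.5 offset as in the reference code

With only these terms the color can vary at most linearly with direction, which is exactly the "slow and smooth" behaviour described above; even the full degree-3 basis is quite low-frequency, so sharp mirror-like reflections come out blurred.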
The color is view-dependent, which also means the lighting is baked in and results in them not being usable directly for 3D animation/environments (though I’m sure there must be research happening on dynamic lighting).
Sometimes it will “go wrong”: you can see in some of the fly models that if you get too close, body parts start looking a bit transparent, as some of the specular highlights are actually splats on the back of an internal surface. This is very evident with mirrors: they are just an inverted projection which you can walk right into.
Nothing fancy. Postshot does need an NVIDIA card though; I have a 3060 Ti. A single insect, with around 5 million splats, takes about 3 hours to train at high quality.
It is remarkable that this is accomplished with a relatively modest setup and effort, and the results are already great. Makes me wonder what you could get with high-end gear (e.g. the 61 MP Sony a7R V and the new 100mm 1.4x macro) and by capturing more frames. I also imagine the web versions lose some detail to reduce file size.
I presume these would look great on a good VR headset?
I'd also like to show my gratitude for you releasing this as a free culture file! (CC BY)
Black text on a dark grey background is nearly unreadable - I used Reader Mode.
I'd love to know the compute hardware he used and the time it took to produce.