This article reports on A$AP Rocky's music video production using volumetric performance capture and Gaussian splatting, celebrating technological innovation that enables creative freedom. The content positively engages with cultural participation through artistic expression and recognition of skilled labor, framing technical advancement as expanding human expressive capability and participation in the arts and sciences.
Be sure to watch the video itself* - it’s really a great piece of work. The energy is frenetic and it’s got this beautiful balance of surrealism from the effects and groundedness from the human performances.
* (Mute it if you don’t like the music, just like the rest of us will if you complain about the music)
To be honest it looks like it was rendered in an old version of Unreal Engine. That may be an intentional choice - I wonder how realistic Gaussian splatting can look. Can you redo lights and shadows, or remove or move parts of the scene, while preserving the original fidelity and realism?
The way TV/movie production is going (record 100s of hours of footage from multiple angles and edit it all in post) I wonder if this is the end state. Gaussian splatting for the humans and green screens for the rest?
Really amazing video. Unfortunately this article is like 60% over my head. Regardless, I actually love reading jargon-filled statements like this that are totally normal to the initiated but are completely inscrutable to outsiders.
"That data was then brought into Houdini, where the post production team used CG Nomads GSOPs for manipulation and sequencing, and OTOY’s OctaneRender for final rendering. Thanks to this combination, the production team was also able to relight the splats."
Super cool to read but can someone eli5 what Gaussian splatting is (and/or radiance fields?) specifically to how the article talks about it finally being "mature enough"? What's changed that this is now possible?
I'm David Rhodes, Co-founder of CG Nomads, developer of GSOPs (Gaussian Splatting Operators) for SideFX Houdini. GSOPs was used in combination with OTOY OctaneRender to produce this music video.
If you're interested in the technology and its capabilities, learn more at https://www.cgnomads.com/ or AMA.
Tangential, but I've been exploring gaussian splatting as a photographic/artistic medium for a while, and love the expressionistic quality of the model output when deprived of data.
The end result is really interesting. As others have pointed out, it looks sort of like it was rendered by an early 2000s game engine. There’s a cohesiveness to the art direction that you just can’t get from green screens and the like. In service of some of the worst music made by human brains, but still really cool tech.
A shame that kid was slept on. Allegedly (according to Discord) he abandoned this because so many artists reached out to have him do this style of music video, instead of wanting to collaborate on music.
Hello! I’m Chris Rutledge, the post EP / cg supervisor at Grin Machine. Happy to answer any questions. Glad people are enjoying this video, was so fun to get to play with this technique and help break it into some mainstream production
The texture of Gaussian splatting always looks off to me. It looks like the entire scene has been textured with a bad, uniform film-grain filter. Everything looks a little off in an unpleasing way -- things that should be sharp aren't, and things that should be blurry aren't either. It's uncanny valley and not in a good way. I don't get what all the rage is about, and it always looks like really poor B-roll to me.
It’s interesting to see Gaussian splatting show up in a mainstream music video this quickly. A year ago it was mostly a research demo, and now artists are using it as part of their visual toolkit.
What I find most notable is how well splats handle chaotic motion like helicopter shots — it’s one of the few 3D reconstruction methods that doesn’t completely fall apart with fast movement.
Feels like we’re going to see a lot more of this in creative work before it shows up in anything “serious”.
> many viewers focused on the chaos, the motion, and the unmistakable early MTV energy of the piece
It certainly moves around a lot!
It certainly looks like the tech and art style here are inseparable. Not only did the use of Gaussian splats make such extreme camera movement possible; one could argue it made it necessary.
Pause the video and notice the blurriness and general lack of detail. But the frantic motion doesn't let the viewer focus on these details, most of which are hidden by a copious amount of motion blur anyway.
To me it is typical of demos, both as in the "demoscene" and "tech demo" sense, where the art style is driven by the technology, insisting on what it enables, while at the same time working around its shortcomings. I don't consider it a bad thing of course, it leads to a lot of creativity and interesting art styles.
Oh wow, somehow I was not aware of how capable this technology has become, looks like a major game changer, across many fields.
In the near term, it could be very useful for sports replays. The UFC has this thing where they stitch together sequences of images from cameras all around the ring, to capture a few seconds of '360 degree' video of important moments. It looks horrible, this would be a huge improvement.
Hello! I'm Ben Nunez, CEO at Evercoast. Our software was used to capture and reconstruct the 4D Gaussian splats in this A$AP Rocky video.
The music video is a mix of creative and technical genius from several different teams, and it's ultimately a byproduct of building tooling to capture reality once and reuse it downstream.
There’s a lot more to explore here. Once you have grounded 4D human motion and appearance tied to a stable world coordinate system, it becomes a missing primitive for things like world models, simulation, and embodied AI, where synthetic or purely parametric humans tend to break down.
A$AP Mob has some really great music videos. They're usually not the first to adopt a new technology, but they love to push the envelope and popularize fringe techniques.
For the ELI5, Gaussian splatting represents the scene as millions of tiny, blurry colored blobs in 3D space and renders by quickly "splatting" them onto the screen, making it much faster than computing an image by querying a neural net model like radiance fields.
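As a toy illustration of that "splatting" step (a 1D sketch with made-up numbers, not the real renderer), rendering is just drawing each fuzzy blob onto the pixel grid and alpha-compositing front to back; no network is queried per pixel:

```python
import math

def splat_1d(blobs, width):
    """Toy 1D 'splatting': draw fuzzy blobs front-to-back onto a pixel row.
    Each blob is (center, sigma, color, opacity); no neural net is queried."""
    color = [0.0] * width          # accumulated color per pixel
    transmittance = [1.0] * width  # how much light still passes through
    for center, sigma, c, opacity in blobs:  # assume sorted near-to-far
        for x in range(width):
            # Gaussian falloff gives each blob its soft edge
            alpha = opacity * math.exp(-0.5 * ((x - center) / sigma) ** 2)
            color[x] += transmittance[x] * alpha * c
            transmittance[x] *= (1.0 - alpha)
    return color

# Two invented blobs: a bright one near pixel 3, a dimmer one near pixel 7.
row = splat_1d([(3.0, 1.0, 1.0, 0.8), (7.0, 1.5, 0.5, 0.9)], 10)
```

The real method does the same thing with millions of anisotropic 3D Gaussians projected onto the image plane, but the forward pass is still a direct rasterization-style loop, which is why it renders so fast.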
Hah, for the past day, I've been trying to somehow submit the Helicopter music video / album as a whole to HN. Glad someone figured out the angle was Gaussian.
Gaussian splatting is a way to record 3-dimensional video. You capture a scene from many angles simultaneously and then combine all of those into a single representation. Ideally, that representation is good enough that you can then, post-production, simulate camera angles you didn't originally record.
For example, the camera orbits around the performers in this music video are difficult to imagine in real space. Even if you could pull it off using robotic motion control arms, it would require that the entire choreography is fixed in place before filming. This video clearly takes advantage of being able to direct whatever camera motion the artist wanted in the 3d virtual space of the final composed scene.
To do this, the representation needs to estimate the radiance field, i.e. the amount and color of light visible at every point in your 3D volume, viewed from every angle. It's not possible to do this at high resolution by breaking that space up into voxels; dense grids scale badly, O(n^3). You could attempt to guess at some mesh geometry and paint textures onto it compatible with the camera views, but that's difficult to automate.
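To make that scaling concrete, here's back-of-envelope arithmetic (assuming 4 bytes per voxel for RGBA, and ignoring view dependence, which would multiply the cost further):

```python
# Dense voxel grids scale as O(n^3): doubling resolution costs 8x the memory.
def voxel_gib(n, bytes_per_voxel=4):
    """Memory for a dense n^3 grid, in GiB."""
    return n ** 3 * bytes_per_voxel / 1024 ** 3

mem_1024 = voxel_gib(1024)  # 1024^3 RGBA voxels -> 4.0 GiB
mem_2048 = voxel_gib(2048)  # double the resolution -> 32.0 GiB
```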
Gaussian splatting estimates these radiance fields by assuming that the radiance is built from millions of fuzzy, colored balls positioned, stretched, and rotated in space. These are the Gaussian splats.
Once you have that representation, constructing a novel camera angle is as simple as positioning and angling your virtual camera and then recording the colors and positions of all the splats that are visible.
It turns out that this approach is pretty amenable to techniques from modern deep learning. You basically train the positions/shapes/rotations of the splats via gradient descent. It's mostly been explored in research labs, but lately production-oriented plugins have been built for popular 3D motion-graphics packages like Houdini, making it more widely available.
Hi, I'm one of the creators of GSOPs for SideFX Houdini.
The gist is that Gaussian splats can replicate reality quite effectively with many 3D ellipsoids (stored as a type of point cloud). Houdini is software that excels at manipulating vast numbers of points, and renderers (such as Octane) can now leverage this type of data to integrate with traditional computer graphics primitives, lights, and techniques.
The aesthetic here is at least partially an intentional choice to lean into the artifacts produced by Gaussian splatting, particularly dynamic (4DGS) splatting. There is temporal inconsistency when capturing performances like this, which is exacerbated by relighting.
That said, the technology is rapidly advancing and this type of volumetric capture is definitely sticking around.
It's a point cloud where each point is a semitransparent blob that can have a view-dependent color: the color changes depending on the direction you look at it. This allows capturing reflections, iridescence…
You generate the point cloud from multiple images of a scene or an object, plus some machine learning magic.
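A minimal sketch of the view-dependent color part (using one common real-spherical-harmonics convention up to degree 1; the coefficients below are invented for illustration): each splat stores SH coefficients per color channel, and the rendered color is the SH basis evaluated in the view direction:

```python
# Real spherical-harmonic basis constants (degree 0 and 1).
Y00 = 0.28209479177  # constant term: the splat's base color
Y1 = 0.48860251190   # degree-1 terms: vary linearly with view direction

def sh_color(coeffs, d):
    """coeffs = [c0, c1, c2, c3] for one channel; d = unit view direction.
    Sign conventions for the degree-1 terms differ between implementations."""
    x, y, z = d
    basis = [Y00, -Y1 * y, Y1 * z, -Y1 * x]
    return sum(c * b for c, b in zip(coeffs, basis))

# Hypothetical coefficients for one splat's red channel: brighter from +z.
red = [1.0, 0.0, 0.4, 0.0]
front = sh_color(red, (0.0, 0.0, 1.0))   # viewed from +z
back = sh_color(red, (0.0, 0.0, -1.0))   # viewed from -z
```

Averaged over all directions, only the constant term survives; the higher-order terms are what encode the reflections and iridescence.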
Similarly, the music video for Taylor Swif[0] (another track by A$AP Rocky) is just as surrealistic and weird in the best way possible, but with an Eastern European flavor (which is obviously intentional and makes sense, given the filming location and the very on-the-nose theme).
Knowing what I know about the artist in this video this was probably more about the novelty of the technology and the creative freedom it offers rather than it is budget.
For me it felt more like a higher-detail version of Teardown, the voxel-based 3D demolition game. Sure, it's splats and not voxels, but the camera and the lighting give it a strong voxel-game vibe.
1. Create a point cloud from a scene (either via lidar, or via photogrammetry from multiple images)
2. Replace each point of the point cloud with a fuzzy ellipsoid, that has a bunch of parameters for its position + size + orientation + view-dependent color (via spherical harmonics up to some low order)
3. If you render these ellipsoids using a differentiable renderer, then you can subtract the resulting image from the ground truth (i.e. your original photos), and calculate the partial derivatives of the error with respect to each of the millions of ellipsoid parameters that you fed into the renderer.
4. Now you can run gradient descent using the differentiable renderer, which makes your fuzzy ellipsoids converge to something closely reproducing the ground truth images (from multiple angles).
5. Since the ellipsoids started at the 3D point cloud's positions, the 3D structure of the scene will likely be preserved during gradient descent, thus the resulting scene will support novel camera angles with plausible-looking results.
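Steps 3-5 can be sketched with a toy 1D "differentiable renderer" (a single blob with hand-derived analytic gradients; everything here is illustrative, whereas a real implementation optimizes millions of splats on a GPU):

```python
import math

def render(center, amp, sigma, width):
    """Toy differentiable 'renderer': one fuzzy 1D blob on a pixel row."""
    return [amp * math.exp(-0.5 * ((x - center) / sigma) ** 2)
            for x in range(width)]

def loss_and_grads(center, amp, sigma, target):
    """Squared error vs. ground truth, plus analytic partial derivatives
    of the loss with respect to the blob's center and amplitude."""
    loss, d_center, d_amp = 0.0, 0.0, 0.0
    for x in range(len(target)):
        g = math.exp(-0.5 * ((x - center) / sigma) ** 2)
        err = amp * g - target[x]
        loss += err * err
        d_amp += 2 * err * g
        d_center += 2 * err * amp * g * (x - center) / sigma ** 2
    return loss, d_center, d_amp

# Ground-truth blob (standing in for the photos), and a wrong initial guess.
sigma = 2.0
target = render(center=10.0, amp=1.0, sigma=sigma, width=20)
center, amp = 6.0, 0.5
for _ in range(1000):  # step 4: gradient descent on the blob parameters
    loss, dc, da = loss_and_grads(center, amp, sigma, target)
    center -= 0.05 * dc
    amp -= 0.05 * da
# The blob drifts to center ~10, amp ~1, reproducing the ground truth.
```

Real pipelines get the gradients for free from a differentiable rasterizer (e.g. via autodiff) rather than deriving them by hand, and also split, clone, and prune splats during optimization.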
Could such a plugin be possible for DaVinci Resolve, to merge scenes captured from two iPhones with spatial data into a single 3D scene?
With the M4, that shouldn't be a problem?
Several of A$AP's videos have a lo-fi retro vibe, or specific effects such as simulated MPEG A/V corruption; check out A$AP Mob - Yamborghini High (https://www.youtube.com/watch?v=tt7gP_IW-1w)
My bad! I am the author. Gaussian splatting allows you to take a series of normal 2D images or a video and reconstruct very lifelike 3D from it. It’s a type of radiance field, like NeRFs or voxel based methods like Plenoxels!
I’m fascinated by the aesthetic of this technique. I remember early versions that were completely glitched out and presented 3d clouds of noise and fragments to traverse through. I’m curious if you have any thoughts about creatively ‘abusing’ this tech? Perhaps misaligning things somehow or using some wrong inputs.
I think this tech has become "production-ready" recently due to a combination of research progress (the seminal paper was published in 2023 https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/) and improvements to differentiable programming libraries (e.g. PyTorch) and GPU hardware.
Awesome work, incredibly well done! What was the process like for setting the direction on use of these techniques with Rakim? Were you basically just trusted to make something great or did they have a lot of opinions on the technicalities?
Article substantively celebrates technological innovation in service of artistic and cultural production. Frames volumetric capture and Gaussian splatting as advancing human capability to participate in cultural creation. Presents as 'one of the most ambitious real world deployments' suggesting significant cultural achievement.
FW Ratio: 50%
Observable Facts
Article documents technological innovation applied to music video production, describing it as 'one of the most ambitious real world deployments of dynamic gaussian splatting in a major music release'
Narrative celebrates advancement of 3D capture and rendering techniques, positioning them as enabling new forms of artistic expression
Article discusses how technology enables participation in cultural expression previously impossible: 'visuals that simply weren't possible before'
Inferences
The framing of technical advancement as culturally significant suggests advocacy for human participation in scientific and artistic progress
The emphasis on innovation in service of creative expression positions technology as expanding human cultural capability
The celebration of novel visual techniques applied to music implicitly endorses advancement of the arts and sciences as human good
Article celebrates creative expression through technology, emphasizing the director's goal of 'radical freedom in post-production' and technology enabling visuals that 'simply weren't possible before.' Frames volumetric capture as expanding expressive possibilities.
FW Ratio: 50%
Observable Facts
Article states director approached project with goal to capture performances in way allowing 'radical freedom in post-production'
Article describes volumetric capture as enabling 'visuals that simply weren't possible before'
Narrative emphasizes that every physical performance was 'captured in real space' and authentically preserved
Inferences
The framing suggests that technological advancement expands human creative expression capacity
The emphasis on 'freedom' and enabling new visual possibilities implicitly advocates for creative autonomy
The authentication of human performance against synthetic appearance suggests advocacy for authentic creative expression
Article credits multiple named professionals and teams (Evercoast, Chris Rutledge, Grin Machine, Wilfred Driscoll, WildCapture, Fitsū.ai), acknowledging skilled labor and expertise. However, does not discuss wages, working conditions, or labor rights per se.
FW Ratio: 50%
Observable Facts
Article names and credits multiple professional contributors: Evercoast, Chris Rutledge (CG Supervisor), Grin Machine, Wilfred Driscoll, WildCapture, and Fitsū.ai
Author describes role of 'teams' and acknowledges expertise ('CG Supervisor,' 'director,' technical specialists)
Inferences
Crediting named professionals suggests recognition of their skilled labor and intellectual contribution
The detailed attribution implies respect for professional expertise, though without explicit engagement with labor rights
build 1ad9551+j7zs · deployed 2026-03-02 09:09 UTC · evaluated 2026-03-02 10:41:39 UTC