5.5 KB · updated 2026-07-06 · md

examples/demo-corpus/data/research/weekly/2026-W18/synthesis.md

Weekly synthesis — 2026-W18

The thread this week: the SDS distillation paradigm that DreamFusion launched in 2022 is quietly being deprecated. Two replacements are competing for the slot: splatting-native SDS variants that make distillation cheap enough to live with, and feed-forward Large Reconstruction Models that skip per-scene optimisation entirely.

Where SDS got to

The SDS line is well traced in this corpus. DreamFusion was the first paper to backpropagate a 2D diffusion model's score into a per-scene 3D representation; it produced cartoonish geometry and took hours per asset. Magic3D added a coarse-to-fine schedule with a mesh-export second stage. Zero-1-to-3 reconditioned a diffusion model on camera pose, so the prior had geometric structure rather than only appearance-level structure; Magic123 combined that pose-conditioned prior with the original 2D one. ProlificDreamer is the most intellectually satisfying of the line: it reframed SDS as a variational problem (Variational Score Distillation), which addresses the over-saturation, over-smoothing and mode collapse that plain SDS is prone to.
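To make the "backpropagate a score into a 3D representation" step concrete, here is a minimal toy sketch of the SDS update. It is not DreamFusion's implementation. The "diffusion prior" is an analytic Gaussian denoiser standing in for a pretrained model, and the "3D representation" is just a parameter vector with an identity renderer. The names `toy_noise_pred` and `sds_optimise`, the cosine noise schedule, and the weighting `w = sigma**2` are all assumptions made for illustration.

```python
import numpy as np


def toy_noise_pred(x_t, alpha, sigma, mu, s):
    """Exact noise prediction when the data prior is N(mu, s^2 I).

    Hypothetical stand-in for a pretrained 2D diffusion model.
    """
    var_t = alpha**2 * s**2 + sigma**2  # variance of the noised marginal
    return sigma * (x_t - alpha * mu) / var_t


def sds_optimise(theta0, mu, s=0.1, steps=500, lr=0.05, seed=0):
    """Run SDS-style updates on theta, with render(theta) = theta."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        # Sample a noise level (cosine schedule, assumed for this toy).
        t = rng.uniform(0.02, 0.98)
        alpha, sigma = np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)
        # Noise the current "render".
        eps = rng.standard_normal(theta.shape)
        x_t = alpha * theta + sigma * eps
        # SDS gradient: w(t) * (eps_hat - eps) * d(render)/d(theta).
        # The denoiser's own Jacobian is skipped; the renderer Jacobian
        # is the identity in this toy.
        eps_hat = toy_noise_pred(x_t, alpha, sigma, mu, s)
        w = sigma**2  # assumed weighting
        theta -= lr * w * (eps_hat - eps)
    return theta
```

Starting from an arbitrary point, for example `sds_optimise([3.0, -2.0], mu=np.array([1.0, 1.0]))`, the parameters drift to the prior's mode. That mode-seeking behaviour is the property the later variants in this line work around.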

Then DreamGaussian swapped the backbone for 3D Gaussians, and the per-asset cost dropped from hours to minutes. The same paradigm now applies, but cheaply. LucidDreamer is the text-to-Gaussian-scene generalisation; GALA3D extends to layout-guided compositional scenes; AGG is the amortised single-image variant.

Where it's going

The more disruptive move is feed-forward. LRM (Hong et al., 2023) trains a giant transformer on Objaverse to produce a triplane radiance field from a single image in ~5 seconds, with no per-scene optimisation and no SDS loop. Instant3D chains a multi-view diffusion model into an LRM to make the input less ambiguous. LGM ports the same idea onto the splatting representation, predicting Gaussians in a single forward pass from sparse multi-view inputs. TripoSR is the production-quality open release in this line.

Two ways to read this: either feed-forward will absorb SDS the way splatting absorbed the NeRF backbone, or the two will stratify by use case, with feed-forward serving low-effort consumer image-to-3D and SDS-on-splats serving high-fidelity asset creation. The corpus doesn't have head-to-head benchmarks across both regimes; that is the gap the inference-cost-vs-fidelity question tries to articulate.

A note on Feature 3DGS: the "distil a 2D foundation feature into the 3D Gaussian primitives" trick is orthogonal to both axes. It's worth flagging because it predicts a third direction — splatting representations as the substrate for 3D-aware foundation features generally, not just for view synthesis.

The dynamic side

The other half of this week was the dynamic / 4D line. Nerfies (Park et al., 2020) and HyperNeRF (Park et al., 2021) are the deformable-NeRF parents: per-observation warps of a canonical field. Dynamic View Synthesis from Dynamic Monocular Video (Gao et al., 2021) is the monocular companion. K-Planes (Fridovich-Keil et al., 2023) factorises 4D into six explicit feature planes, one per pair of the x, y, z and t axes, as a tractable alternative to a 4D MLP. MERF (Reiser et al., 2023) is the memory-efficient variant for unbounded scenes; it is strictly static, but its tri-plane plus sparse 3D grid layout is the spatial counterpart of the planar factorisation that K-Planes extends to time.
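The six-plane factorisation can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the K-Planes codebase: it uses a single resolution, random plane values, and no decoder, whereas the paper uses multi-scale planes and a learned or linear decoder. The helper names `make_planes`, `bilinear` and `query_kplanes` are hypothetical. What it keeps is the core idea: project a 4D point onto each of the six axis pairs, interpolate a feature on each plane, and combine the six features with an elementwise product.

```python
from itertools import combinations

import numpy as np

# Axis order is (x, y, z, t); choosing 2 of 4 axes gives the 6 planes:
# xy, xz, xt, yz, yt, zt.
PLANE_PAIRS = list(combinations(range(4), 2))


def make_planes(resolution=8, channels=4, seed=0):
    """One (resolution x resolution x channels) feature grid per axis pair."""
    rng = np.random.default_rng(seed)
    # Initial values are arbitrary here; in practice they would be learned.
    return [
        rng.uniform(0.5, 1.5, size=(resolution, resolution, channels))
        for _ in PLANE_PAIRS
    ]


def bilinear(plane, u, v):
    """Bilinearly interpolate a plane at continuous coords u, v in [0, 1]."""
    res = plane.shape[0]
    fu, fv = u * (res - 1), v * (res - 1)
    i0, j0 = int(np.floor(fu)), int(np.floor(fv))
    i1, j1 = min(i0 + 1, res - 1), min(j0 + 1, res - 1)
    a, b = fu - i0, fv - j0
    return (
        (1 - a) * (1 - b) * plane[i0, j0]
        + a * (1 - b) * plane[i1, j0]
        + (1 - a) * b * plane[i0, j1]
        + a * b * plane[i1, j1]
    )


def query_kplanes(planes, point):
    """Feature at a 4D point (x, y, z, t), each coordinate in [0, 1]."""
    feat = np.ones(planes[0].shape[-1])
    for plane, (i, j) in zip(planes, PLANE_PAIRS):
        # Project the point onto this plane's two axes, then multiply in.
        feat *= bilinear(plane, point[i], point[j])
    return feat
```

The storage argument follows directly: at resolution N with C channels, six planes hold 6·N²·C values, against N⁴·C for a dense 4D grid. For example, `query_kplanes(make_planes(), (0.3, 0.7, 0.1, 0.5))` returns one C-dimensional feature that a small decoder would turn into density and colour.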

4D Gaussian Splatting (Wu et al., 2023) and 4D-Rotor Gaussian Splatting (Duan et al., 2024) are the splatting analogues — the same "explicit primitives, cheap rasterisation" trade-off applied to the dynamic case. Neither paper benchmarks at the scale or scene complexity that monocular dynamic capture in the wild would demand. That gap is the dynamic-reconstruction-scale question.

What to read next

Two follow-ups: (1) anything that benchmarks feed-forward (LRM/LGM/TripoSR) against SDS-on-splats head to head on the same prompts or input images, and (2) anything that pushes 4D Gaussian Splatting past short clips into minute-scale dynamic capture. Neither sub-area is solved.