This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to:
Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material for any purpose, even commercially
Under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made to the text in this document.
Citation
If you use this work in your research, please cite it as:
Deybach, N. (2025). Complete explanations on the gradient descent for a 3D gaussian ray traced, 3D reconstruction algorithm.
https://github.com/ndeybach/gaussian-gradient-descent-3DGRT
Keywords: Gradient Descent, 3D Gaussian, Ray Tracing, Gaussian Splatting, 3DGRT, 3DGUT
Acknowledgments
Special thanks to NVIDIA researchers [1] (and their 3DGRT paper), the gaussian splatting community, and the open-source contributors whose prior work allowed me to reach the following explanations.
I am also deeply thankful to the gsplat team [2] for their 3DGS paper. It is the clearest and best-explained gaussian splatting gradient descent that I have read. Arriving in the field without prior knowledge, it was the main source of information that allowed me to come up with the 3D gaussian descent.
Coming from a background in theoretical physics and industrial engineering, and an internship in project management at an industrial water treatment site in Africa, going into the gaussian splatting/radiance fields/computer vision domain was not the most obvious of transitions...
I did computer vision in my high school days with OpenCV and robotics, and I am also an avid programmer (cut my teeth on C++/Qt, python, javascript/node, and of course bash/linux managing my own server 😁). However, getting back up to speed on AI-related algorithms, matrix derivations, encoders, optimisers, Jacobian matrices and the like has not been the smoothest of experiences 👾. Especially as the field is usually the home turf of more senior computer vision artists and researchers already deep in the weeds of brand new shiny techs 😎.
However, I once came across an article on 80 Level that mentioned [gaussian splatting](https://80.lv/articles/an-ultimate-free-beginner-s-course-on-unreal-engine-5-unveiled/). Intrigued by the technology, I was slowly but surely drawn into it. And in September 2024, when I was done with my final internship in Africa, I decided to give up all job offers and go all in on radiance fields. Two associates and I decided to explore the field and see what we could learn and build with it.
3DGRT
One of the paths we explored was 3D Gaussian Ray Tracing (3DGRT for short) [1:1]. It is an improvement on the original 3DGS idea from INRIA [3]. The technique requires GPUs with hardware-accelerated ray tracing to be feasible in real time, but the improvements in visual fidelity are massive, so we were interested in how we could use it. However, the code was never released. As a learning exercise, understanding and implementing it seemed like a good way to dive headfirst into the field in November 2024.
What this article is all about
A significant challenge encountered was the absence of a comprehensive explanation within the 3DGRT paper [1:2] concerning the gradient descent and the overall parametrization of the Gaussian in 3D.
Even in the field of gaussian splatting at large, I did not at first find a well-made explanation of gaussian splatting algorithms, before discovering the gsplat paper [2:1] with its rather well-made global explanation of the gaussian splatting mathematical parametrization and the relations between the gradients. But even it had inconsistencies, occasionally changing notation, and it did not describe the 3DGRT gaussian parametrization at all. I thus learned things here and there, tore down and rebuilt the mathematics of the gaussians to form a parametrization and backpropagation suited to 3DGRT, since it was not described by the NVIDIA researchers.
Despite promising results, internal considerations have made us diverge from the 3DGRT route. However, I would like to give back to the radiance field community as much as I can, since it allowed me to learn much from such an incredible technique. I hope the following explanation of the gradient descent in 3DGRT (and similarly gaussian ''splatting'') is as clear and understandable as possible and will help others after me.
Ray tracing explanations
3DGRT and its ray tracing are best explained in NVIDIA researchers' paper [1:3] (arXiv:2407.07090), and I do not think I could do better on that part. Thank you to the amazing team of researchers at NVIDIA for a "newbie compatible" explanation 👌.
↪ 3DGRT ray tracing and BVH collisions explanation by NVIDIA paper
What is, however, not described or explained in any form (mathematically or programmatically) is the gradient descent required by the new paradigm. And without it you cannot efficiently compute the gaussian reconstruction.
I will therefore detail in this publication the whole gradient descent applicable for a fully 3D gaussian reconstruction of 3D scenes.
General explanations on Gaussian Splatting
Gaussian Splatting is a novel approach to 3D scene representation and rendering that has gained significant attention for its speed and visual quality. At its core, it represents a 3D scene as a collection of 3D Gaussian functions (or "splats") rather than using traditional triangle meshes or voxel grids.
Core Concept
A 3D scene is represented as a point cloud where each point is replaced by a 3D Gaussian function. Each Gaussian is characterized by:
Position ($\mu_n^{3D}$): The center point of the Gaussian in 3D space
Covariance matrix ($\Sigma_n^{3D}$): Defines the shape, orientation, and extent of the Gaussian and is generally decomposed into two fundamental parameters:
Rotation ($R_n$): A 3×3 rotation matrix typically derived from a quaternion ($q_n$) that defines the orientation of the Gaussian
Scale ($S_n$): A diagonal 3×3 matrix containing the scaling factors along the principal axes
The covariance is constructed as: $\Sigma_n^{3D} = R_n S_n S_n^\top R_n^\top$ (further explained later in this article)
Color ($c_n$): The RGB color values (can be view-dependent through spherical harmonics, more details later)
Opacity ($o_n$): Controls the transparency of the Gaussian
These Gaussians act as "splats" of color and density that collectively reconstruct the scene when properly positioned and parameterized.
The traditional Gaussian Splatting pipeline is well explained in this learnopencv article and should give anyone a good first understanding of the classical overall pipeline (without gradient descent specifics). I highly encourage anyone not already familiar with gaussian splatting and its overall ''classical'' technique to read it before reading this article!
It explains well the process of rasterizing the 3D gaussian to 2D "screenspace" gaussian for computing reasons.
However, these approximations (and others such as 2D sorting, first-order Jacobians, hit direction approximations, etc.) lead to many small visual artifacts or aberrations.
NVIDIA researchers have addressed many of these in the 3DGRT and subsequent 3DGUT papers thanks to a ray traced, fully 3D pipeline. This provides more accurate rendering, especially for transparent media, reflections, and (if extended) complex lighting effects, but at the cost of higher computational complexity and a need for hardware acceleration if real-time performance is required.
Diagram of the gaussian Ray Traced gradient descent
Since you already learned about general gaussian splatting, I will now dive head first into this article's main problem. The following diagram introduces the gradient descent and overall parameter calculation in the gaussian splatting pipeline. If it is overwhelming at first, do not worry, further step-by-step explanations will follow. If need be, print the diagram and look at it whenever you need 😉.
↪ Fully 3D gaussian splatting gradient descent diagram and parameters computation
Parameters Explanations
$C_i$: Pixel color or intensity contribution computed by integrating the contributions of Gaussians along a ray $R$. $C_i \in \mathbb{R}^3$ for RGB color space.
$c_n(k)$: Color vector of the 3D Gaussian splat $n$, represented in RGB or another feature domain, $c_n(k) \in \mathbb{R}$ for each RGB channel $k$. Can optionally depend on the direction of the ray for better reproduction (spherical harmonics is an example technique that can be used).
$\alpha_n$: alpha (opacity) for the Gaussian splat $n$ at the effective position along the ray and relative to the gaussian (weighted by the gaussian distribution), $\alpha_n \in \mathbb{R}$ for the optimization/compute domain, mapped to $\mathbb{R}_0^+$ for rendering and regularization.
$o_n$: Opacity scalar controlling the transparency of Gaussian splat $n$, $o_n \in \mathbb{R}$ for the optimization domain, mapped to $[0,1]$ for rendering and regularization (or, looking precisely at the implementation-dependent domain, $[\text{pruning\_threshold}, \text{max\_threshold}]$, where gaussians below pruning_threshold (0.01, for example) are pruned (deleted from the scene forever) because they no longer contribute enough, and max_threshold allows gaussians behind others to still contribute).
$G_n^{3D}$: 3D Gaussian function represented by its parameters $(\mu_n, \Sigma_n)$, where $\mu_n$ is the mean and $\Sigma_n$ is the covariance. $G_n^{3D}: \mathbb{R}^3 \to \mathbb{R}^+$.
$\Sigma_n^{3D}$: 3D covariance matrix parameterizing the orientation, shape, and size of the Gaussian in 3D space, $\Sigma_n^{3D} \in \mathcal{S}_{++}^3$ (positive definite symmetric matrices of dimension 3).
$\sigma_n$: Shape parameter controlling the extent of the Gaussian splat $n$ in 3D space, $\sigma_n \in \mathbb{R}^+$, influencing its contribution to $G_n^{3D}$ and affecting the overall density profile. It defines how "spread out" or "compact" the Gaussian appears along each axis.
$\mu_n^{3D}$: 3D mean (center) position of the Gaussian splat in world coordinates, $\mu_n^{3D} \in \mathbb{R}^3$.
$M_n$: Transformation matrix that combines rotation and scaling to define each Gaussian's orientation and size in the splatting process, $M_n \in \mathbb{R}^{3 \times 3}$.
$R_n$: Rotation matrix derived from the quaternion that orients the 3D Gaussian in world coordinates, $R_n \in SO(3)$ (special orthogonal group in 3D).
$S_n$: Scale matrix representing the anisotropic scaling of the Gaussian in each spatial direction, $S_n \in \mathbb{R}_{++}^3$ (positive diagonal matrices of dimension 3).
$s_{ni}$: Scale factor for a specific axis (x, y, or z) of Gaussian splat $n$, $s_{ni} \in \mathbb{R}^+$.
$q_n$: Quaternion representing the orientation of the Gaussian splat in 3D space, $q_n \in \mathbb{H}_1$ (unit quaternions, isomorphic to $S^3$).
$q_{ni}$: Components of the quaternion $q_n = [q_w, q_x, q_y, q_z]$, where each $q_{ni} \in \mathbb{R}$ with the constraint $\|q_n\| = 1$.
Indices
$i$: Pixel index in the image, or ray index during tracing. There can be one or more rays per pixel; with one ray per pixel, ray $i$ is equivalent to pixel $i$.
k: Color channel index (e.g., R, G, B) for the Gaussian splat's color representation.
n: Index for the Gaussian splats contributing to the ray.
Intermediary Parameters and Gradients for 3D Gaussian Ray-Traced Rendering
Parameters Definitions
This chapter will detail how each parameter is obtained (without too much detail; full derivations are available further down). It is important to note that all "base" parameters (gaussian center $\mu_n^{3D}$; gaussian orientation $q_n$; gaussian scaling $s_n$; gaussian opacity $o_n$; gaussian color $c_n$) live, for gradient traversal reasons, in $\mathbb{R}$ space (see the parameter updates section below).
1. Color Contribution along a Ray
During the traversal of a ray, each gaussian contribution is taken into account (colorwise for now) by:
$$C_i(k) = \sum_{n \in N} c_n(k) \times \alpha_n \times T_n$$
Where:
$C_i(k)$ represents the color contribution along ray $i$ for color channel $k$.
$N$ is the set of Gaussians encountered along ray $i$.
$c_n(k)$ is the color value at sample point $n$.
To make the color more realistic, it is possible to have the color depend on the direction of the incoming ray. One commonly used technique is spherical harmonics: the more degrees the spherical harmonics expansion has, the more direction-dependent the result will be (with the trade-off of a higher computation cost). The basic formula of the spherical harmonics (as described in [1:4]) is: $$c_n(k, d) = \phi_\beta(d) = f\left(\sum_{\ell=0}^{\ell_{max}} \sum_{m=-\ell}^{\ell} \beta_\ell^m Y_\ell^m(d)\right)$$
where $f$ is the sigmoid function used to normalize the colors.
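As an illustration, here is a minimal sketch of a degree-1 spherical harmonics color evaluation in PyTorch. The function name, the coefficient layout, and the basis sign conventions are my own assumptions (they follow common 3DGS implementations), not something prescribed by the paper:

```python
import torch

# Hard-coded real SH basis constants for degrees 0 and 1
SH_C0 = 0.28209479177387814   # Y_0^0
SH_C1 = 0.4886025119029199    # |Y_1^m| prefactor

def sh_to_color(sh_coeffs, ray_dir):
    """Degree-1 SH evaluation. sh_coeffs: (N, 4, 3), the beta coefficients per
    RGB channel; ray_dir: (N, 3) normalized view directions. Returns (N, 3) RGB."""
    x, y, z = ray_dir[:, 0:1], ray_dir[:, 1:2], ray_dir[:, 2:3]
    raw = (SH_C0 * sh_coeffs[:, 0]        # constant (view-independent) term
           - SH_C1 * y * sh_coeffs[:, 1]  # Y_1^{-1}
           + SH_C1 * z * sh_coeffs[:, 2]  # Y_1^{0}
           - SH_C1 * x * sh_coeffs[:, 3]) # Y_1^{1}
    return torch.sigmoid(raw)             # f = sigmoid, as in the formula above
```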
$\alpha_n$ is the effective opacity value at sample point $n$, computed as: $$\alpha_n = o_n \exp(-\sigma_n) = o_n G_n^{3D}$$
where $o_n$ is the base opacity and $\sigma_n$ controls the spread, with $G_n^{3D} = \exp(-\sigma_n)$.
$T_n$ is the accumulated transparency, accumulating the complement of opacity of each gaussian between the camera's pixel $i$ and the crossed gaussian $n$, thus measuring something akin to the transmittance allowed by the gaussians in front of gaussian $n$.
The equation computes how each sample point along a ray contributes to the final color by multiplying its color value with its opacity and accumulated transparency. The sum over N aggregates all these contributions along the ray path.
2. Transparency Accumulation
Along the ray, each effective opacity $\alpha_j$ is accumulated into a transparency intermediary that we denote $T_n$: $$T_n = \prod_{j=1}^{n-1} (1 - \alpha_j)$$
3. Shape Parameter and Maximum Response
The shape parameter of gaussian $n$, evaluated at the hit position $x$, is: $$\sigma_n = \frac{1}{2} (\Delta_n^{3D})^\top (\Sigma_n^{3D})^{-1} \Delta_n^{3D}$$
where $\Delta_n^{3D} = x - \mu_n^{3D}$ represents the offset from the Gaussian mean. Considering the rays hit in a continuous manner, $x$ represents the most effective hit position. You could also integrate over the ray path for a better result that accounts for density superpositions, but it would probably need a different primitive with more easily integrable densities.
In the scope of gaussian ray tracing, $x$ can be obtained from the intrinsics ($R_n$ and $S_n$) of the gaussian $n$ whose polyhedron was hit by ray $i$ during descent (as described in NVIDIA's paper): $$x = o + \tau_{max} \times d$$
Where:
$o \in \mathbb{R}^3$ represents the origin of the ray hitting the gaussian $n$.
$\tau_{max}$ is the distance along ray $i$ to the point of maximum response of gaussian $n$. It is defined by the paper (for ease of computation and as a first-order approximation) as: $$\tau_{max} = \frac{(\mu_n - o)^\top \Sigma_n^{-1} d}{d^\top \Sigma_n^{-1} d} = \frac{-o_{ng}^\top d_{ng}}{d_{ng}^\top d_{ng}}$$
where $o_{ng} = S_n^{-1} R_n^\top (o - \mu_n)$ and $d_{ng} = S_n^{-1} R_n^\top d$.
$d \in \mathbb{R}^3$ is the normalized direction vector of the ray hitting the gaussian $n$.
All those factors are obtained by any standard ray tracing algorithm (such as OptiX in the case of NVIDIA). The only other element specific to the ray traced rendering is how you create the vertices of the bounding polyhedron; NVIDIA chose to implement it with a stretched icosahedron.
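To make this concrete, here is a small sketch of the maximum-response computation under the formula above (function names and tensor layout are mine):

```python
import torch

def tau_max(o, d, mu, R, S_inv_diag):
    """Distance along the ray to the point of maximum gaussian response.
    o, d: (3,) ray origin and normalized direction (world space);
    mu: (3,) gaussian mean; R: (3, 3) rotation; S_inv_diag: (3,) inverse scales."""
    # Transform the ray into the gaussian's canonical space: v_g = S^-1 R^T (v - mu)
    o_g = S_inv_diag * (R.T @ (o - mu))
    d_g = S_inv_diag * (R.T @ d)
    # tau_max = -(o_g . d_g) / (d_g . d_g)
    return -(o_g @ d_g) / (d_g @ d_g)

def max_response_point(o, d, mu, R, S_inv_diag):
    # x = o + tau_max * d, the most effective hit position used in sigma_n
    return o + tau_max(o, d, mu, R, S_inv_diag) * d
```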
4. Covariance Matrix
The covariance matrix is a combination of the rotation matrix and scale matrix such that: $$\Sigma_n^{3D} = R_n S_n S_n^\top R_n^\top = R_n S_n S_n R_n^\top = M_n M_n^\top$$
since $S_n$ is a symmetric, diagonal matrix, and where $M_n = R_n S_n$ is the transformation matrix.
5. Scale Matrix
$$S_n = \mathrm{diag}(s_{n1}, s_{n2}, s_{n3})$$
6. Rotation Matrix (Quaternion Representation)
Given a quaternion $q_n = [w, x, y, z]$, the rotation matrix $R_n$ is: $$R_n = \begin{pmatrix} 1 - 2(y^2 + z^2) & 2(xy - wz) & 2(xz + wy) \\ 2(xy + wz) & 1 - 2(x^2 + z^2) & 2(yz - wx) \\ 2(xz - wy) & 2(yz + wx) & 1 - 2(x^2 + y^2) \end{pmatrix}$$
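As a sketch, here is how $R_n$ and $\Sigma_n^{3D}$ could be assembled in PyTorch, assuming a [w, x, y, z] quaternion layout (function names are mine):

```python
import torch

def quat_to_rotmat(q):
    """q: (4,) quaternion [w, x, y, z] -> (3, 3) rotation matrix."""
    w, x, y, z = q / q.norm()  # normalize defensively to stay on the unit sphere
    return torch.stack([
        torch.stack([1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)]),
        torch.stack([2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)]),
        torch.stack([2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]),
    ])

def covariance_3d(q, scales):
    """Sigma = M M^T with M = R S, S = diag(scales)."""
    R = quat_to_rotmat(q)
    M = R * scales  # right-multiplying by diag(scales) scales the columns of R
    return M @ M.T
```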
Gradients
In this section, I will provide a complete overview of the gradients necessary for the 3D Gaussian Ray Traced reconstruction algorithm. We'll examine each gradient component in detail, starting with the initial loss function and working through the chain rule to derive gradients for all Gaussian parameters.
1. Gradient on Initial Loss
In the context of 3D Gaussian Ray-Traced Reconstruction, the initial loss function is designed to balance pixel-wise accuracy with structural fidelity. It is defined as:
$$\mathcal{L} = (1 - \lambda)\mathcal{L}_1 + \lambda \mathcal{L}_{D\text{-}SSIM}$$
Breakdown of the Loss Function Components
$\mathcal{L}$:
This is the overall loss that the optimization process aims to minimize. It quantifies the discrepancy between the reconstructed 3D model and the observed rendering.
Minimizing L ensures that the reconstructed model closely matches the target rendering and a correct visual reconstruction.
$\mathcal{L}_1$:
Represents the primary loss metric, focusing on the intensity-based error between the reconstructed image and the ground truth. Common implementations include:
Mean Squared Error (MSE): $$\mathcal{L}_{MSE} = \frac{1}{N} \sum_{i=1}^{N} (C_i - \hat{C}_i)^2$$
Mean Absolute Error (MAE): $$\mathcal{L}_{MAE} = \frac{1}{N} \sum_{i=1}^{N} |C_i - \hat{C}_i|$$
where:
$N$ is the number of pixels or rays.
$C_i$ is the predicted color/intensity for the $i$-th pixel or ray.
$\hat{C}_i$ is the ground truth color/intensity for the $i$-th pixel or ray.
I would advise MSE for quicker convergence, but the learning rates would then need to be more finely tuned to the material at hand. For a general, tried and true approach, prefer MAE.
Whichever implementation is chosen, this term is, physically speaking, the intensity-based error: the straightforward "pixel-wise" or "ray-wise" difference from a reference/ground-truth pixel/ray.
It ensures that the reconstructed image does not deviate significantly from the observed rendering at the local (pixel or ray) level, providing a foundational accuracy constraint.
$\mathcal{L}_{D\text{-}SSIM}$:
It represents the Structural Dissimilarity component of the loss. It quantifies the structural differences between the reconstructed image and the ground truth using the Structural Similarity Index Measure (SSIM): $$\mathcal{L}_{D\text{-}SSIM} = 1 - \mathrm{SSIM}(C, \hat{C})$$
where:
$\mathrm{SSIM}(C, \hat{C})$ measures the similarity between two images, considering luminance, contrast, and structure.
While $\mathcal{L}_1$ ensures pixel-wise accuracy, $\mathcal{L}_{D\text{-}SSIM}$ preserves the structural integrity of the reconstructed image. This is crucial for maintaining perceptual quality and ensuring that important features and textures are accurately represented at the regional level. The method to achieve this is a bit outside the scope of this publication and I invite the reader to read relevant documentation on the subject.
$\lambda \in [0,1]$:
A weighting parameter that balances the influence of the primary loss L1 against the structural dissimilarity loss LD-SSIM.
By adjusting $\lambda$, one can control the trade-off between achieving high pixel-wise accuracy and preserving structural features. $\lambda = 0.2$ is quite a safe starting value.
By computing the gradient $\frac{\partial \mathcal{L}}{\partial C_i}$ and backpropagating it through the network of Gaussian parameters, the optimization process iteratively adjusts the model to minimize the loss, resulting in a high-quality 3D reconstruction.
Derivative of the Loss Function with Respect to $C_i$
To perform gradient descent optimization, we need to compute the derivative of the loss $\mathcal{L}$ with respect to each pixel or ray contribution $C_i$:
$$\frac{\partial \mathcal{L}}{\partial C_i} = (1 - \lambda)\frac{\partial \mathcal{L}_1}{\partial C_i} + \lambda \frac{\partial \mathcal{L}_{D\text{-}SSIM}}{\partial C_i}$$
$\frac{\partial \mathcal{L}_1}{\partial C_i}$:
For MSE: $$\frac{\partial \mathcal{L}_{MSE}}{\partial C_i} = \frac{2}{N}(C_i - \hat{C}_i)$$
For MAE: $$\frac{\partial \mathcal{L}_{MAE}}{\partial C_i} = \frac{1}{N} \cdot \mathrm{sign}(C_i - \hat{C}_i)$$
$\frac{\partial \mathcal{L}_{D\text{-}SSIM}}{\partial C_i}$:
The derivative of the SSIM-based loss is more complex and involves the partial derivatives of the SSIM index with respect to Ci.
This term ensures that changes in Ci not only reduce pixel-wise errors but also enhance structural similarity.
Both can be automated with most autodiff frameworks, so I will not dive further into this aspect. Typical code for this would look something like the following with PyTorch:
```python
import torch
import torch.nn.functional as F
from ssim import ssim  # use the SSIM library of your choice

def compute_loss_gradient(gaussian_render_image, target_image, max_value=255.0):
    # Normalize inputs between 0 and 1 (for 8-bit images, divide by max_value)
    gaussian_render_image = gaussian_render_image / max_value
    target_image = target_image / max_value

    # Prepare for gradient computation, mark as the variable to take the gradient w.r.t.
    gaussian_render_image = gaussian_render_image.requires_grad_(True)

    # Compute the pixel-wise error metric (L1 / MAE)
    error_metric1 = F.l1_loss(gaussian_render_image, target_image)

    # Compute the similarity-based loss
    dissimilarity_loss = 1.0 - ssim(gaussian_render_image, target_image)

    # Combine losses with the lambda weighting (lambda = 0.2 here)
    total_loss = 0.8 * error_metric1 + 0.2 * dissimilarity_loss

    # Compute the gradient
    total_loss.backward()

    # Get and normalize the gradient
    gradient = gaussian_render_image.grad / gaussian_render_image.grad.norm()
    return gradient
```
2. Gradient of Color Contribution
The gradient of the rendered color w.r.t. the color of gaussian $n$ is simply: $$\frac{\partial C_i(k)}{\partial c_n(k)} = \alpha_n T_n$$
The gradient w.r.t. the effective opacity is: $$\frac{\partial C_i(k)}{\partial \alpha_n} = c_n(k) T_n - \frac{S_n(k)}{1 - \alpha_n}$$
where $S_n(k)$ represents the accumulated (back to front, since it is a backward traversal) contribution of the samples along the ray behind gaussian $n$. It is also expressed as: $$S_n(k) = \sum_{m > n} c_m(k) \alpha_m T_m$$
If $S_n(k)$ is computed after a forward pass (so that the value $C_i(k)$ of the color at pixel $i$ is known), it can also take the value: $$S_n(k) = C_i(k) - \sum_{m \leq n} c_m(k) \alpha_m T_m$$
This is useful because in ray tracing the ''standard'' evolution of a ray is from the camera plane up to the far plane/end of frustum (infinity, or closer for computational purposes). Computing $S_n(k)$ directly would require first tracing a ray, storing the values of each "hit", and then computing the product in a backward loop, which consumes memory and reduces efficiency. A rewrite in the direction of the ray is thus useful to save on compute resources.
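Here is a minimal sketch of that front-to-back rewrite (a plain PyTorch loop for clarity; a real implementation would fuse this into a kernel, and all names are mine):

```python
import torch

def backward_color_alpha_grads(colors, alphas, dL_dC):
    """colors: (N, 3) per-hit colors c_n, sorted front to back along the ray;
    alphas: (N,) effective opacities; dL_dC: (3,) upstream pixel gradient.
    Returns per-hit dL/dc_n and dL/dalpha_n using the forward-direction rewrite."""
    N = colors.shape[0]
    T = 1.0                       # accumulated transmittance T_n
    C = torch.zeros(3)            # forward-accumulated pixel color
    Ts = torch.empty(N)
    # First pass: forward render to get the full pixel color C_i
    for n in range(N):
        Ts[n] = T
        C = C + colors[n] * alphas[n] * T
        T = T * (1.0 - alphas[n])
    # Second pass: S_n(k) = C_i(k) - sum_{m<=n} c_m alpha_m T_m, still front to back
    dL_dc = torch.empty(N, 3)
    dL_dalpha = torch.empty(N)
    partial = torch.zeros(3)
    for n in range(N):
        partial = partial + colors[n] * alphas[n] * Ts[n]
        S_n = C - partial         # contribution of everything behind gaussian n
        dL_dc[n] = dL_dC * alphas[n] * Ts[n]
        dC_dalpha = colors[n] * Ts[n] - S_n / (1.0 - alphas[n])
        dL_dalpha[n] = (dL_dC * dC_dalpha).sum()
    return dL_dc, dL_dalpha
```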
3. Gradient of the Effective Opacity
If we take the gradient of the effective opacity $\alpha_n$ w.r.t. the base opacity $o_n$, we have:
$$\frac{\partial \alpha_n}{\partial o_n} = \exp(-\sigma_n)$$
And w.r.t. the 3D Gaussian function we have: $$\frac{\partial \alpha_n}{\partial G_n^{3D}} = o_n$$
4. Gradient of Gaussian Function
Relative to the shape parameter $\sigma_n$ of the gaussian function: $$\frac{\partial G_n^{3D}}{\partial \sigma_n} = -\exp(-\sigma_n) = -G_n^{3D}$$
5. Gradients of the Shape Parameter
Relative to the 3D mean $\mu_n^{3D}$: $$\frac{\partial \sigma_n}{\partial \mu_n^{3D}} = -(\Sigma_n^{3D})^{-1} \Delta_n^{3D}$$
where $\Delta_n^{3D} = x - \mu_n^{3D}$ represents the offset from the Gaussian mean.
Relative to the 3D covariance $\Sigma_n^{3D}$: $$\frac{\partial \sigma_n}{\partial \Sigma_n^{3D}} = -\frac{1}{2}(\Sigma_n^{3D})^{-1} \Delta_n^{3D} (\Delta_n^{3D})^\top (\Sigma_n^{3D})^{-1}$$
6. Gradient of the gaussian transformation matrix
$$\frac{\partial \mathcal{L}}{\partial M_n} = \frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}} M_n + \left(\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}\right)^\top M_n$$
7. Gradient of Loss with Respect to Quaternion
Let $q_n = [w_n, x_n, y_n, z_n]$; in the general case of $q = [w, x, y, z]$:
$$\frac{\partial R}{\partial w} = 2\begin{pmatrix} 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \end{pmatrix} \qquad \frac{\partial R}{\partial x} = 2\begin{pmatrix} 0 & y & z \\ y & -2x & -w \\ z & w & -2x \end{pmatrix}$$
$$\frac{\partial R}{\partial y} = 2\begin{pmatrix} -2y & x & w \\ x & 0 & z \\ -w & z & -2y \end{pmatrix} \qquad \frac{\partial R}{\partial z} = 2\begin{pmatrix} -2z & -w & x \\ w & -2z & y \\ x & y & 0 \end{pmatrix}$$
8. Gradient of Scale
For $S_n = \mathrm{diag}(s_{n1}, s_{n2}, s_{n3})$: $$\frac{\partial (S_n)_{jj}}{\partial s_{ni}} = \delta_{ij}$$
where $\delta_{ij}$ is the Kronecker delta, defined as: $$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
(As an example, for the 3×3 case as in the 3D gaussian descent, collecting these derivatives over the diagonal is equivalent to:) $$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
Full Gaussian Parameterization details
In this section I will detail some of the parameterization in a bit more depth.
1. 3D Gaussian Function
The 3D Gaussian function is: $$G_n^{3D}(x) = \exp\left(-\frac{1}{2}(x - \mu_n^{3D})^\top (\Sigma_n^{3D})^{-1} (x - \mu_n^{3D})\right)$$
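A one-function sketch of evaluating $G_n^{3D}$ (names are mine; the inverse covariance is assumed precomputed):

```python
import torch

def gaussian_3d(x, mu, cov_inv):
    """x, mu: (3,) points; cov_inv: (3, 3) precomputed inverse covariance.
    Returns exp(-sigma_n) with sigma_n = 0.5 * delta^T Sigma^-1 delta."""
    delta = x - mu
    sigma = 0.5 * (delta @ cov_inv @ delta)
    return torch.exp(-sigma)
```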
The accumulated transmittance along the ray is $T_n = \prod_{m<n}(1 - \alpha_m)$, where $m < n$ ensures that compositing respects the order of contributions of the Gaussians: accumulated from the camera pixel up to gaussian $n$ along the ray.
In ray tracing, the ''standard'' evolution of the ray is from the camera plane up to the far plane/end of frustum (infinity, or closer for computational purposes).
Explicit Chain Rules for Gradient Descent
This chapter will detail all the gradients of the 5 main parameters.
Gradient Propagation to 3D Parameters intermediary steps explanations
The steps around the transformation matrix of the gradient descent are not so straightforward, and I think a little more detailed explanation is appreciated.
First, the chain rule leading to the 3D covariance matrix gradient:
$$\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}} = \frac{\partial \mathcal{L}}{\partial C_i} \cdot \frac{\partial C_i}{\partial \alpha_n} \cdot \frac{\partial \alpha_n}{\partial G_n^{3D}} \cdot \frac{\partial G_n^{3D}}{\partial \sigma_n} \cdot \frac{\partial \sigma_n}{\partial \Sigma_n^{3D}}$$
with the factors (the first depending on the chosen loss):
$$\frac{\partial C_i}{\partial \alpha_n} = c_n(k) T_n - \frac{S_n(k)}{1 - \alpha_n}, \quad \frac{\partial \alpha_n}{\partial G_n^{3D}} = o_n, \quad \frac{\partial G_n^{3D}}{\partial \sigma_n} = -G_n^{3D}, \quad \frac{\partial \sigma_n}{\partial \Sigma_n^{3D}} = -\frac{1}{2}(\Sigma_n^{3D})^{-1} \Delta_n^{3D} (\Delta_n^{3D})^\top (\Sigma_n^{3D})^{-1}$$
Then we can obtain the gradient of the loss with respect to the transformation matrix $M_n$ from there:
$$\frac{\partial \mathcal{L}}{\partial M_n} = \frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}} M_n + \left(\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}\right)^\top M_n$$
which can eventually be expanded (with sign simplifications) by substituting the full chain-rule product above for $\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}$ in both terms.
Proof for this step can be derived with the help of the Matrix Cookbook. See the proofs in the annex below.
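In code, this final step is short. A sketch, assuming the gradient w.r.t. the covariance has already been assembled from the chain-rule factors above (function names are mine):

```python
import torch

def grad_M_from_grad_Sigma(dL_dSigma, R, scales):
    """dL/dM = dL/dSigma @ M + (dL/dSigma)^T @ M, with M = R S."""
    M = R * scales  # M = R @ diag(scales), scaling the columns of R
    return dL_dSigma @ M + dL_dSigma.T @ M
```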
Parameters Modifications with Respect to the Gradients
Gradient Descent Update Rule
To optimize the 5 primary Gaussian parameters, we apply gradient descent to each parameter (noted generically $\theta$) as follows:
$$\theta^{t+1} = \theta^t - \lambda \frac{\partial \mathcal{L}}{\partial \theta}$$
where:
$\theta^t$ is the current value of the parameter at iteration $t$.
$\theta^{t+1}$ is the updated parameter after the gradient step.
$\lambda$ is the learning rate controlling the step size.
$\frac{\partial \mathcal{L}}{\partial \theta}$ is the computed gradient of the loss function with respect to the parameter.
This iterative update adjusts each parameter in the direction that minimizes the reconstruction error.
Parameter Updates
During training, some points are to be noted.
Firstly, each parameter is updated with exactly the generic rule above. This allows converging toward the least difference between the ground truth (GT, the input images) and the raster of the current 3D model obtained from the gaussians.
Secondly, there are some considerations for each parameter to keep a physical, updatable, and normalized update flow. Mainly, the parameters seen by the gradient optimizer (I recommend following the traditional gsplat implementation of Adam or a derivative as a basis that works) should be mapped to $(-\infty, +\infty)$ to take advantage of unbounded real values and the dynamic range of the float32 types used during training, allowing for a more stable gradient flow.
Thirdly, the learning rates should be determined empirically in a further study if possible. At first glance, taking the same order of magnitude as gsplat or similar 2D gaussian implementations seems to work.
The first two concerns are detailed in the following subsections for each main parameter update, as much as possible.
1. Position Update ($\mu_n^{3D}$)
The 3D position update follows:
$$\mu_n^{3D,t+1} = \mu_n^{3D,t} - \lambda \frac{\partial \mathcal{L}}{\partial \mu_n^{3D}}$$
This moves the Gaussian center in 3D space toward an optimal position that reduces the loss function.
2. Rotation Update (Quaternion) ($q_n$)
Since quaternions represent rotations, we must normalize them after applying gradient descent:
$$q_n^{t+1} = \mathrm{normalize}\left(q_n^t - \lambda \frac{\partial \mathcal{L}}{\partial q_n}\right)$$
where the normalization ensures the updated quaternion remains on the unit hypersphere:
$$\mathrm{normalize}(q) = \frac{q}{\|q\|}$$
This maintains numerical stability while optimizing rotation in SO(3).
3. Scale Update ($s_n$)
To ensure positive scaling, we parameterize the scale factors as exponentials:
$$s_n = e^{\hat{s}_n}$$
which transforms the update into an update on $\hat{s}_n$:
$$\hat{s}_n^{t+1} = \hat{s}_n^t - \lambda \frac{\partial \mathcal{L}}{\partial \hat{s}_n}$$
This guarantees that the scale remains strictly positive during optimization.
4. Opacity Update ($o_n$)
Opacity is updated using:
$$o_n^{t+1} = o_n^t - \lambda \frac{\partial \mathcal{L}}{\partial o_n}$$
Since the normalized opacity $\hat{o}_n$ (normalized to $[0,1]$, not the compute opacity $o_n \in (-\infty, \infty)$ used in the optimizer) must remain in $[0,1]$, you can apply clipping. Using 0.999 or a similar not-quite-1 value allows the gradient descent to keep evolving, instead of getting stuck because the splat becomes fully opaque and nothing behind it can evolve: $$\hat{o}_n^{t+1} = \mathrm{clip}(\hat{o}_n^{t+1}, 0, 0.999) = \mathrm{clip}(\mathrm{sigmoid}(o_n^{t+1}), 0, 0.999)$$
5. Color Update ($c_n$)
Color values are updated similarly:
$$c_n^{t+1} = c_n^t - \lambda \frac{\partial \mathcal{L}}{\partial c_n}$$
NB:
To ensure stability in SDR, HDR, or other imaging spaces, colors should first be clipped to a standard, common range. With a $[0,1]$ space: $$\hat{c}_n = \mathrm{clip}(c_n^{raw}, 0, 1)$$
With:
$c_n^{raw}$ the color as input (for example, for 8-bit SDR values you would divide $c_n^{raw}$ by 255 to bring it into the $[0,1]$ space).
$\hat{c}_n$ the intermediary, normalized color.
And then during training you can work in the full $\mathbb{R}$ space to take full advantage of the dynamic range of your float32 gradient descent (as mentioned earlier): $$c_n = \mathrm{logit}(\hat{c}_n) = \log\left(\frac{\hat{c}_n}{1 - \hat{c}_n}\right)$$
With $c_n$ the color as used by the gradient descent optimizer.
This takes advantage of the logit function (inverse sigmoid) to work with unbounded real values, allowing for a more stable gradient flow.
Colors are then rendered using the sigmoid plus "unclipping" (back to your original range of values, i.e. $\times 255$ for SDR 8-bit content): $$c_n^{rendered} = \mathrm{unclip}(\mathrm{sigmoid}(c_n)) = \mathrm{unclip}\left(\frac{1}{1 + e^{-c_n}}\right)$$
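A small sketch of this mapping in PyTorch (max_value and the epsilon clamp are my own additions; without the epsilon, the logit would be infinite at exactly 0 or 1):

```python
import torch

def color_to_optim_space(c_raw, max_value=255.0, eps=1e-6):
    """8-bit SDR color -> unbounded optimizer space via logit."""
    c_hat = torch.clamp(c_raw / max_value, eps, 1.0 - eps)
    return torch.log(c_hat / (1.0 - c_hat))    # logit

def color_to_render_space(c_optim, max_value=255.0):
    """Optimizer space -> rendered color, back in the original range."""
    return torch.sigmoid(c_optim) * max_value  # sigmoid + "unclip"
```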
Gradient descent algorithm quick flow and summary table
For each iteration:
Compute gradients $\frac{\partial \mathcal{L}}{\partial \theta}$ for each parameter $\theta$.
Update each parameter using the rules above.
Normalize the quaternion qn to maintain unit norm.
Ensure constraints for positive scales sn, valid opacities on, and correct color range.
Iterate until convergence.
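As a compact sketch, one such iteration with plain SGD updates could look like this (a real implementation would use Adam per parameter group, as recommended earlier; all names are mine):

```python
import torch

def sgd_step(params, lr=1e-3):
    """params: dict of leaf tensors {'mu', 'quat', 'log_scale', 'opacity', 'color'}
    whose .grad fields were filled by loss.backward()."""
    with torch.no_grad():
        for p in params.values():
            p -= lr * p.grad                 # generic theta update
        # Re-normalize quaternions onto the unit hypersphere
        q = params['quat']
        q /= q.norm(dim=-1, keepdim=True)
        # Scales live as log values (s = exp(s_hat)), opacity/color as logits,
        # so no clipping is needed here; they are mapped back at render time.
        for p in params.values():
            p.grad = None                    # reset for the next iteration
```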
Here is a summary table for a general overview of what was in this chapter:
| Parameter | Update Rule |
| --- | --- |
| Position ($\mu_n^{3D}$) | $\mu_n^{3D,t+1} = \mu_n^{3D,t} - \lambda \frac{\partial \mathcal{L}}{\partial \mu_n^{3D}}$ |
| Rotation ($q_n$) | $q_n^{t+1} = \mathrm{normalize}\left(q_n^t - \lambda \frac{\partial \mathcal{L}}{\partial q_n}\right)$ |
| Scale ($s_n$) | $\hat{s}_n^{t+1} = \hat{s}_n^t - \lambda \frac{\partial \mathcal{L}}{\partial \hat{s}_n}$, with $s_n = e^{\hat{s}_n}$ |
| Opacity ($o_n$) | $o_n^{t+1} = o_n^t - \lambda \frac{\partial \mathcal{L}}{\partial o_n}$ |
| Color ($c_n$) | $c_n^{t+1} = c_n^t - \lambda \frac{\partial \mathcal{L}}{\partial c_n}$ |
Result example:
The following image is the result of an optix + cuda + pytorch algorithm following the implementation described earlier. It did not implement splitting/pruning and is only the gradient descent described above:
For legal reasons I cannot publish the associated code, as I do not have the consent of all parties involved. However, the mathematics described here is my own adaptation of already published mathematics to 3DGRT (obviously built upon the shoulders of giants, fully disclosed in references and links).
Conclusion
It is my hope that this detailed explanation of the gradient descent of 3DGRT will prove useful to those new to the field or wishing to build upon 3DGRT. A preliminary review suggests that the gradient descent method may also be applicable to other papers, including EVER [4].
In my opinion, it serves as an excellent introduction to the mathematical principles underlying Gaussian splatting in general. It even covers most of the traditional technique with its 3D to 2D steps (including projection to screen space and the Jacobian in the gradient descent); some minor adjustments would be required to extend this article to cover them fully. However, this falls outside the scope of my current work and I encourage anyone interested to either add it or contact me to extend the current version of this article with it.
The results obtained without the use of advanced techniques (such as pruning, splitting, kernel filtering, etc.) appear to confirm that the basics of the gradient descent work. I am hopeful that the mathematics in this article will make their way into open source implementations in the future, and that ray tracing will become more accessible in terms of hardware in the coming years.
Nota Bene:
If any error in this article is found, feel free to contact the author for a revision to be made (contacts at the beginning).
Annexes
Annex: Gradients and other mathematical proofs:
In this section, I present detailed mathematical proofs for the gradient formulations described earlier. These proofs are essential for understanding the derivation of update rules in the 3D Gaussian Ray-Traced reconstruction algorithm.
Proof 1: Gradient of Color Contribution
We begin by proving the gradients of the color contribution with respect to color and opacity parameters.
Gradient with respect to opacity αn:
For the derivative with respect to $\alpha_n$, the situation is more complex because $\alpha_n$ affects not only its own term but also the transparency factors $T_m$ for $m > n$ (recall that with this indexing, lower $n$ is closer to the camera plane and higher $n$ is toward the back of the frustum).
Starting with:
$$C_i(k) = \sum_{m \in N} c_m(k) \times \alpha_m \times T_m$$
We separate this into terms before, at, and after index n:
For m<n, the terms are independent of αn, so their derivatives are zero.
For $m = n$: $$\frac{\partial}{\partial \alpha_n}\left(c_n(k) \times \alpha_n \times T_n\right) = c_n(k) \times T_n$$
For m>n, we need to consider that Tm depends on αn through the transparency accumulation. The resulting derivative is:
$$\frac{\partial}{\partial \alpha_n}\left(c_m(k) \times \alpha_m \times T_m\right) = c_m(k) \times \alpha_m \times \frac{-T_m}{1 - \alpha_n}$$
Let's derive this more explicitly:
First, recall the definition of accumulated transparency: $$T_m = \prod_{j=1}^{m-1}(1 - \alpha_j)$$
We can separate this product to isolate the term containing $\alpha_n$: $$T_m = \left(\prod_{j=1}^{n-1}(1 - \alpha_j)\right) \times (1 - \alpha_n) \times \left(\prod_{j=n+1}^{m-1}(1 - \alpha_j)\right) = T_n \times (1 - \alpha_n) \times \prod_{j=n+1}^{m-1}(1 - \alpha_j)$$
Now we take the partial derivative with respect to $\alpha_n$: $$\frac{\partial T_m}{\partial \alpha_n} = \frac{\partial}{\partial \alpha_n}\left[T_n \times (1 - \alpha_n) \times \prod_{j=n+1}^{m-1}(1 - \alpha_j)\right] = -T_n \times \prod_{j=n+1}^{m-1}(1 - \alpha_j)$$
To simplify this further, we note that: $$T_m = T_n \times (1 - \alpha_n) \times \prod_{j=n+1}^{m-1}(1 - \alpha_j) \iff \prod_{j=n+1}^{m-1}(1 - \alpha_j) = \frac{T_m}{T_n \times (1 - \alpha_n)}$$
Substituting this back into our derivative: $$\frac{\partial T_m}{\partial \alpha_n} = -T_n \times \frac{T_m}{T_n \times (1 - \alpha_n)} = \frac{-T_m}{1 - \alpha_n}$$
Thus, for $m > n$: $$\frac{\partial}{\partial \alpha_n}\left(c_m(k) \times \alpha_m \times T_m\right) = c_m(k) \times \alpha_m \times \frac{-T_m}{1 - \alpha_n}$$
Combining all terms: $$\frac{\partial C_i(k)}{\partial \alpha_n} = c_n(k) \times T_n - \sum_{m>n} \frac{c_m(k) \times \alpha_m \times T_m}{1 - \alpha_n}$$
Recognizing that $S_n(k) = \sum_{m>n} c_m(k) \times \alpha_m \times T_m$ is the accumulated contribution of samples after $n$:
$$\frac{\partial C_i(k)}{\partial \alpha_n} = c_n(k) \times T_n - \frac{S_n(k)}{1 - \alpha_n}$$
which matches the previous gradient formulation.
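This result is easy to sanity-check numerically against autograd; a throwaway sketch:

```python
import torch

torch.manual_seed(0)
N = 5
c = torch.rand(N)                                   # single-channel colors c_n
alpha = (torch.rand(N) * 0.8 + 0.1).requires_grad_(True)

# Forward render: C_i = sum_n c_n * alpha_n * T_n with T_n = prod_{j<n} (1 - alpha_j)
T = torch.cat([torch.ones(1), torch.cumprod(1 - alpha, dim=0)[:-1]])
C = (c * alpha * T).sum()
C.backward()

# Analytical gradient from the proof: dC/dalpha_n = c_n T_n - S_n / (1 - alpha_n)
a, T_d = alpha.detach(), T.detach()
contrib = c * a * T_d
S = contrib.flip(0).cumsum(0).flip(0) - contrib     # S_n = sum_{m>n} c_m alpha_m T_m
analytical = c * T_d - S / (1 - a)
print(torch.allclose(alpha.grad, analytical, atol=1e-6))  # expect: True
```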
Proof 2: Gradient of the Shape Parameter with Respect to the Mean ($\mu_n^{3D}$)
Let $\Delta_n^{3D} : (\mathbb{R}^3, \mathbb{R}^3) \to \mathbb{R}^3$, $\Delta_n^{3D}(x, \mu_n^{3D}) = x - \mu_n^{3D}$, and $\sigma_n : (\mathbb{R}^3, \mathbb{R}^{3\times3}) \to \mathbb{R}$, $\sigma_n(\Delta_n^{3D}, (\Sigma_n^{3D})^{-1}) = \frac{1}{2} (\Delta_n^{3D})^\top (\Sigma_n^{3D})^{-1} \Delta_n^{3D}$.
Differentiating the quadratic form gives $\frac{\partial \sigma_n}{\partial \Delta_n^{3D}} = \frac{1}{2}\left((\Sigma_n^{3D})^{-1} + (\Sigma_n^{3D})^{-\top}\right) \Delta_n^{3D}$. Since $(\Delta_n^{3D})^\top (\Sigma_n^{3D})^{-1}$ is a $1 \times 3$ (row) vector and $(\Sigma_n^{3D})^{-1} \Delta_n^{3D}$ is a $3 \times 1$ (column) vector, and both represent the same mathematical quantity in transposed form (since $\Sigma_n^{3D}$ is symmetric), this simplifies to $(\Sigma_n^{3D})^{-1} \Delta_n^{3D}$. With $\frac{\partial \Delta_n^{3D}}{\partial \mu_n^{3D}} = -I$, the chain rule yields:
$$\frac{\partial \sigma_n}{\partial \mu_n^{3D}} = -(\Sigma_n^{3D})^{-1} \Delta_n^{3D}$$
Proof 4: Gradient of the Shape Parameter $\sigma_n$ w.r.t. the Covariance Matrix ($\Sigma_n^{3D}$)
The gradient of $\sigma_n$ with respect to the inverse covariance is $H = \frac{\partial \sigma_n}{\partial (\Sigma_n^{3D})^{-1}} = \frac{1}{2} \Delta_n^{3D} (\Delta_n^{3D})^\top$. Applying the rule for differentiating through a matrix inverse w.r.t. $\Sigma_n^{3D}$ itself, we obtain:
$$\frac{\partial \sigma_n}{\partial \Sigma_n^{3D}} = -\left((\Sigma_n^{3D})^{-1}\right)^\top H \left((\Sigma_n^{3D})^{-1}\right)^\top = -\frac{1}{2} (\Sigma_n^{3D})^{-1} \Delta_n^{3D} (\Delta_n^{3D})^\top (\Sigma_n^{3D})^{-1}$$
Proof 5: Gradient of Loss w.r.t Gaussian Transformation Matrix (Mn)
We need to derive ∂Mn∂L based on the chain rule.
Starting with the relationship $\Sigma_n^{3D} = M_n M_n^\top$, we need to apply the chain rule: $$\frac{\partial \mathcal{L}}{\partial M_n} = \frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}} \frac{\partial \Sigma_n^{3D}}{\partial M_n}$$
To find $\frac{\partial \Sigma_n^{3D}}{\partial M_n}$, we differentiate $\Sigma_n^{3D} = M_n M_n^\top$: $$d\Sigma_n^{3D} = dM_n\, M_n^\top + M_n\, dM_n^\top$$
By using the rules of the Frobenius inner product $\langle A, B \rangle = \mathrm{tr}(A^\top B)$ (see the Matrix Cookbook): $$\left\langle \frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}, d\Sigma_n^{3D} \right\rangle = \left\langle \frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}} M_n, dM_n \right\rangle + \left\langle \left(\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}\right)^\top M_n, dM_n \right\rangle$$
which yields: $$\frac{\partial \mathcal{L}}{\partial M_n} = \frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}} M_n + \left(\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}\right)^\top M_n$$
Proof 6: Gradient with Respect to Quaternion Parameters
For the gradient with respect to quaternion parameters, we need to derive how changes in quaternion components affect the rotation matrix Rn, which then influences the transformation matrix Mn.
Given a quaternion $q_n = [w, x, y, z]$, the rotation matrix $R_n$ is defined as: $$R_n = \begin{pmatrix} 1 - 2(y^2 + z^2) & 2(xy - wz) & 2(xz + wy) \\ 2(xy + wz) & 1 - 2(x^2 + z^2) & 2(yz - wx) \\ 2(xz - wy) & 2(yz + wx) & 1 - 2(x^2 + y^2) \end{pmatrix}$$
Taking the derivative with respect to each quaternion component (w, x, y, z) requires differentiating each element of this matrix.
For example, for the component $w$, the derivation involves calculating $\frac{\partial R_{ij}}{\partial w}$ for each element $R_{ij}$ of the rotation matrix. For instance: $$\frac{\partial R_{12}}{\partial w} = \frac{\partial}{\partial w}\left(2(xy - wz)\right) = -2z$$
Similar calculations for the other elements lead to the complete derivative matrices.
For the component $w$: $$\frac{\partial R_n}{\partial w} = 2\begin{pmatrix} 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \end{pmatrix}$$
For the component $x$: $$\frac{\partial R_n}{\partial x} = 2\begin{pmatrix} 0 & y & z \\ y & -2x & -w \\ z & w & -2x \end{pmatrix}$$
For the component $y$: $$\frac{\partial R_n}{\partial y} = 2\begin{pmatrix} -2y & x & w \\ x & 0 & z \\ -w & z & -2y \end{pmatrix}$$
For the component $z$: $$\frac{\partial R_n}{\partial z} = 2\begin{pmatrix} -2z & -w & x \\ w & -2z & y \\ x & y & 0 \end{pmatrix}$$
These derivatives are then used in the chain rule to compute $\frac{\partial \mathcal{L}}{\partial q_n}$ as shown earlier.
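These derivative matrices can likewise be validated by finite differences; a quick self-contained sketch (the un-normalized rotation formula is differentiated, matching the matrices above):

```python
import torch

def rotmat_raw(q):
    """Rotation matrix formula without re-normalizing q = [w, x, y, z]."""
    w, x, y, z = q.tolist()
    return torch.tensor([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ], dtype=torch.float64)

def dR_dw(q):
    """Analytical dR/dw from the matrix above."""
    w, x, y, z = q.tolist()
    return 2 * torch.tensor([[0., -z,  y],
                             [ z,  0., -x],
                             [-y,  x,  0.]], dtype=torch.float64)

q = torch.tensor([0.5, 0.5, 0.5, 0.5], dtype=torch.float64)  # arbitrary unit quaternion
eps = 1e-6
dq = torch.tensor([eps, 0., 0., 0.], dtype=torch.float64)
numerical = (rotmat_raw(q + dq) - rotmat_raw(q - dq)) / (2 * eps)
print(torch.allclose(numerical, dR_dw(q), atol=1e-8))  # expect: True
```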
References
[1] Moenne-Loccoz, N., Mirzaei, A., Perel, O., de Lutio, R., Martinez Esturo, J., State, G., Fidler, S., Sharp, N., & Gojcic, Z. (2024). 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes. ACM Transactions on Graphics, 43(6). arXiv:2407.07090.
[2] Ye, V., & Kanazawa, A. (2023). Mathematical Supplement for the gsplat Library. arXiv:2312.02121.
[3] Kerbl, B., Kopanas, G., Leimkühler, T., & Drettakis, G. (2023). 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics, 42(4).
[4] Mai, A., Hedman, P., Kopanas, G., Verbin, D., Futschik, D., Xu, Q., Kuester, F., Barron, J. T., & Zhang, Y. (2024). EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis. arXiv:2410.01804.