Complete explanations on the gradient descent for a 3D gaussian ray traced, 3D reconstruction algorithm.

General publication information:

Author: Nils Deybach
Date: March 19th, 2025
Version: 1.0.3

Contact Information

License

This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to:

Under the following terms:

Citation

If you use this work in your research, please cite it as:

Deybach, N. (2025). Complete explanations on the gradient descent for a 3D gaussian ray traced, 3D reconstruction algorithm.
https://github.com/ndeybach/gaussian-gradient-descent-3DGRT

Repository

The source code of the article and html are available at:
github.com/ndeybach/gradient-descent-3DGRT

Keywords

Gradient Descent, 3D Gaussian, Ray Tracing, Gaussian Splatting, 3DGRT, 3DGUT

Acknowledgments

Special thanks to the NVIDIA researchers[1] (and their 3DGRT paper), the gaussian splatting community, and the open-source contributors whose prior work allowed me to reach the following explanations.

I am also deeply thankful to the gsplat team [2] for their 3DGS paper. It is the easiest to follow and best explained gaussian splatting derivation that I have read. Arriving in the field without prior knowledge, it was the main source of information that allowed me to come up with the 3D gaussian descent.


Table of content

Introduction and Purpose of this article

Personal background before radiance fields

Coming from a background in theoretical physics and industrial engineering, followed by an internship in project management at an industrial water-treatment site in Africa, going into the gaussian splatting / radiance fields / computer vision domain was not the most obvious of transitions...

I did computer vision back in my high school days with OpenCV and robotics, and I am also an avid programmer (I cut my teeth on C++/Qt, Python, JavaScript/Node, and of course bash/Linux managing my own server 😁). However, getting back up to speed on AI-related algorithms, matrix derivations, encoders, optimisers, Jacobian matrices and the like has not been the smoothest of experiences 👾. Especially as the field is usually the home turf of more senior computer vision artists and researchers already deep in the weeds of brand new shiny techs 😎.

However, I once came across an article by 80 Level that mentioned [gaussian splatting](https://80.lv/articles/an-ultimate-free-beginner-s-course-on-unreal-engine-5-unveiled/). Intrigued by the technology, I was slowly but surely drawn into it. In September 2024, when I was done with my final internship in Africa, I decided to turn down all job offers and go all in on radiance fields. Two other associates and I decided to explore the field and see what we could learn and build with it.

3DGRT

One of the paths we explored was 3D Gaussian Ray Traced (3DGRT for short) [1:1]. It is an improvement on the original 3DGS idea from INRIA [3]. The technique requires GPUs with hardware-accelerated ray tracing to be feasible in real time, but the improvements in visual fidelity are massive, so we were interested in how we could use it. However, the code was never released. As a learning exercise, understanding and implementing it seemed like a good way to dive headfirst into the field in November 2024.

What this article is all about

A significant challenge encountered was the absence, within the 3DGRT paper [1:2], of a comprehensive explanation of the gradient descent and of the overall parametrization of the Gaussians in 3D.
Even in the field of gaussian splatting at large, I did not at first find a well-made explanation of gaussian splatting algorithms, until discovering the gsplat paper [2:1] and its rather well-made global explanation of the gaussian splatting mathematical parametrization and of the relations between the gradients. But even it had some incoherences, sometimes changed notations, and overall did not describe the 3DGRT gaussian parametrization. I thus learned things here and there, tore down and rebuilt the mathematics of the gaussians, and formed a parametrization and backpropagation suited to 3DGRT, since it was not described by the NVIDIA researchers.

Despite promising results, internal considerations have made us diverge from the 3DGRT route. However, I would like to give back to the radiance field community as much as I can, since it allowed me to learn a lot from such an incredible technique. I hope the following explanation of the gradient descent in 3DGRT (and, similarly, gaussian ''splatting'') is as clear and understandable as possible and will help others after me.

Ray tracing explanations

3DGRT and its ray tracing are best explained in the NVIDIA researchers' paper [1:3] - arXiv:2407.07090, and I do not think I could do better on that part. Thank you to the amazing team of researchers at NVIDIA for a "newbie compatible" explanation 👌.

\hookrightarrow 3DGRT ray tracing and BVH collisions explanation by NVIDIA paper

What is, however, not described or explained in any form (mathematically or programmatically) is the gradient descent required by this new paradigm. And without it, you cannot efficiently compute the gaussian reconstruction.

I will therefore detail in this publication the whole gradient descent applicable for a fully 3D gaussian reconstruction of 3D scenes.

General explanations on Gaussian Splatting

Gaussian Splatting is a novel approach to 3D scene representation and rendering that has gained significant attention for its speed and visual quality. At its core, it represents a 3D scene as a collection of 3D Gaussian functions (or "splats") rather than using traditional triangle meshes or voxel grids.

Core Concept

A 3D scene is represented as a point cloud where each point is replaced by a 3D Gaussian function. Each Gaussian is characterized by:

- a center (mean) position $\mu_n^{3D}$
- a rotation, stored as a quaternion $q_n$
- a scale $s_n$ along its three axes
- an opacity $o_n$
- a color $c_n$ (optionally view dependent through spherical harmonics)

Single splat with all 5 main parameters annotated

These Gaussians act as "splats" of color and density that collectively reconstruct the scene when properly positioned and parameterized.

The traditional Gaussian Splatting pipeline is well explained in this learnopencv article and should give anyone a good first understanding of the classical overall pipeline (without gradient descent specifics). I highly encourage anyone not already familiar with gaussian splatting and its overall ''classical'' technique to read it before reading this article!
It explains well the process of rasterizing the 3D gaussians into 2D "screen-space" gaussians for computational reasons.

However, these approximations (and others such as 2D sorting, first-order Jacobians, hit-direction approximations, etc.) lead to many small visual artifacts or aberrations.
NVIDIA researchers have addressed many of them in 3DGRT and the subsequent 3DGUT paper, thanks to a ray-traced, fully 3D pipeline. This provides more accurate rendering, especially for transparent media, reflections, and (if extended) complex lighting effects, but at the cost of higher computational complexity and hardware-acceleration requirements if real-time performance is needed.

Diagram of the gaussian Ray Traced gradient descent

Since you already learned about general gaussian splatting, I will now dive head first into this article's main problem. The following diagram introduces the gradient descent and overall parameter calculation in the gaussian splatting pipeline. If it is overwhelming at first, do not worry, further step-by-step explanations will follow. If need be, print the diagram and look at it whenever you need 😉.

graph RL
%%{init:{'flowchart':{'nodeSpacing': 80, 'rankSpacing': 30}}}%%
subgraph Loss Chain
    direction TB
    L["$$\mathcal{L}$$"] -.->|"$$\frac{\partial \mathcal{L}}{\partial C_i}$$"| C_i["$$C_i$$"]
    C_i --> L["$$\mathcal{L}$$"]
    C_i -.->|"$$\frac{\partial C_i}{\partial \alpha_n}$$"| alpha_n["$$\alpha_n$$"]
    alpha_n --> C_i
    C_i -.->|"$$\frac{\partial C_i}{\partial c_n}$$"| c_n["$$c_n(k)$$"]
    c_n --> C_i
    alpha_n -.->|"$$\frac{\partial \alpha_n}{\partial G^{3D}_n}$$"| G3D_n["$$G^{3D}_n$$"]
    G3D_n --> alpha_n
    alpha_n -.->|"$$\frac{\partial \alpha_n}{\partial o_n}$$"| o_n["$$o_n$$"]
    o_n --> alpha_n
    G3D_n -.->|"$$\frac{\partial G^{3D}_n}{\partial \sigma_n}$$"| sigma_n["$$\sigma_n$$"]
    sigma_n --> G3D_n
    sigma_n -.->|"$$\frac{\partial \sigma_n}{\partial \Sigma^{3D}_n}$$"| Sigma3D_n["$$\Sigma^{3D}_n$$"]
    Sigma3D_n --> sigma_n
    sigma_n -.->|"$$\frac{\partial G^{3D}_n}{\partial \mu^{3D}_n}$$"| mu3D_n["$$\mu^{3D}_n$$"]
    mu3D_n --> sigma_n
    Sigma3D_n -.->|"$$\frac{\partial \Sigma^{3D}_n}{\partial M_n}$$"| M_n["$$M_n$$"]
    M_n --> Sigma3D_n
    M_n -.->|"$$\frac{\partial M_n}{\partial R_n}$$"| R_n["$$R_n$$"]
    R_n --> M_n
    R_n -.->|"$$\frac{\partial R_n}{\partial q_n}$$"| q_n["$$q_n$$"]
    q_n --> R_n
    q_n -.->|"$$\frac{\partial q_n}{\partial q_i}$$"| q_i["$$q_{ni} \equiv (q_w,q_x,q_y,q_z)$$"]
    q_i --> q_n
    M_n -.->|"$$\frac{\partial M_n}{\partial S_n}$$"| S_n["$$S_n$$"]
    S_n --> M_n
    S_n -.->|"$$\frac{\partial S_n}{\partial s_n}$$"| s_n["$$s_{ni}$$"]
    s_n --> S_n
    style c_n stroke:orangered, fill:#ffe6de
    style o_n stroke:orangered, fill:#ffe6de
    style mu3D_n stroke:orangered, fill:#ffe6de
    style q_i stroke:orangered, fill:#ffe6de
    style s_n stroke:orangered, fill:#ffe6de
    %% linkStyle 2,4,6,8,10,12,14,16,18,20,22,24,26,28 stroke:red
end
subgraph Legend
    direction TB
    %%{init:{'flowchart':{'nodeSpacing': 50}}}%%
    subgraph Forward
        direction LR
        start1[ ] ---> stop1[ ]
        style start1 height:0px, stroke:red;
        style stop1 height:0px;
    end
    style Forward stroke-width:0, fill:none, stroke:none
    subgraph Backward
        direction LR
        start2[ ] -.-> stop2[ ]
        style start2 height:0px;
        style stop2 height:0px;
    end
    style Backward stroke-width:0, fill:none, stroke:none
    %% linkStyle 1 stroke:red;
    gparams["Gaussian parameter"]
    style gparams stroke:orangered, fill:#ffe6de;
    gparamsinter["Intermediary parameter"]
end

\hookrightarrow Fully 3D gaussian splatting gradient descent diagram and parameters computation

Parameters Explanations


Indices


Intermediary Parameters and Gradients for 3D Gaussian Ray-Traced Rendering

Parameters Definitions

This chapter will detail how each parameter is obtained (without too many details; these are available further down). It is important to note that all "base" parameters ($\text{gaussian center: }\mu_n^{3D}$; $\text{gaussian rotation: }q_n$; $\text{gaussian scaling: }s_n$; $\text{gaussian opacity: }o_n$; $\text{gaussian color: }c_n$) are, for gradient traversal reasons, kept in $\mathbb{R}$ space (see the parameter updates section further down).

1. Color Contribution along a Ray

During the traversal of a ray, each gaussian contribution is taken into account (colorwise for now) by:

\begin{equation} C_{i}(k) = \sum_{n \in N} c_n \times \alpha_n \times T_n \end{equation}

Where:

To make the color more realistic, it is possible to have the color depend on the direction of the incoming ray. One of the techniques commonly used is spherical harmonics. The more degrees the spherical harmonics have, the more direction-dependent the result will be (with a trade-off of higher computation cost). The basic formula of the spherical harmonics is (as described in [1:4]):
\begin{equation} c_{n}(k,d) = \phi_{\beta}(\mathbf{d}) = f \left( \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell} \beta_{\ell}^{m} Y_{\ell}^{m} (\mathbf{d}) \right) \end{equation}
where $f$ is the sigmoid function used to normalize the colors.
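As an illustration, here is a minimal NumPy sketch of such a view-dependent color evaluation, restricted to degrees 0 and 1. The constants are the usual real spherical harmonics prefactors; the exact basis ordering and sign convention of a given pipeline may differ, so treat this as an assumption of mine rather than the 3DGRT implementation:

```python
import numpy as np

# Real spherical harmonics prefactors for degrees 0 and 1 (assumed convention)
SH_C0 = 0.28209479177387814   # Y_0^0
SH_C1 = 0.4886025119029199    # magnitude of the degree-1 basis functions

def sh_color(beta, d):
    """View-dependent color from SH coefficients (degrees 0 and 1 only).

    beta: (4, 3) array of RGB coefficients, assumed ordered as
          [Y_0^0, Y_1^{-1}, Y_1^0, Y_1^1].
    d:    (3,) viewing direction (normalized inside).
    """
    x, y, z = d / np.linalg.norm(d)
    basis = np.array([SH_C0, SH_C1 * y, SH_C1 * z, SH_C1 * x])
    raw = basis @ beta                     # un-normalized RGB value
    return 1.0 / (1.0 + np.exp(-raw))      # f(.) = sigmoid to map into [0, 1]
```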

The equation for $C_i(k)$ computes how each sample point along a ray contributes to the final color by multiplying its color value with its opacity and accumulated transparency. The sum over $N$ aggregates all these contributions along the ray path.

2. Transparency Accumulation

Along the ray, each effective opacity $\alpha_j$ is accumulated into a transparency intermediary that we denote as $T_n$:

\begin{equation} T_n = \prod_{j=1}^{n-1} (1 - \alpha_j) \end{equation}
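In practice $T_n$ is accumulated incrementally while marching the hits front to back, together with the running color sum. A minimal sketch, assuming the hits are already sorted by distance along the ray (names are mine):

```python
import numpy as np

def composite_ray(colors, alphas, t_threshold=1e-4):
    """Front-to-back compositing along one ray.

    colors: (N, 3) per-hit colors c_n, sorted from the camera outwards.
    alphas: (N,)  per-hit effective opacities alpha_n = o_n * G_n^{3D}.
    Returns the composited color C_i and the list of T_n values.
    """
    C_i = np.zeros(3)
    T = 1.0                      # T_1 = 1 (nothing in front of the first hit)
    transmittances = []
    for c_n, a_n in zip(colors, alphas):
        transmittances.append(T)
        C_i += c_n * a_n * T     # contribution c_n * alpha_n * T_n
        T *= (1.0 - a_n)         # T_{n+1} = T_n * (1 - alpha_n)
        if T < t_threshold:      # early stop once the ray is saturated
            break
    return C_i, transmittances
```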

3. 3D Gaussian Function

\begin{equation} \begin{split} G_n^{3D}(x) &= \exp \left( -\frac{1}{2} (x - \mu_n^{3D})^T (\Sigma_n^{3D})^{-1} (x - \mu_n^{3D}) \right) \\ G_n^{3D}(x) &= \exp \left( -\frac{1}{2} (\Delta_n^{3D})^T (\Sigma_n^{3D})^{-1} (\Delta_n^{3D}) \right) \\ &= \exp(-\sigma_n) \end{split} \end{equation}

where $\Delta_n^{3D} = x - \mu_n^{3D}$ represents the offset from the Gaussian mean. Considering that the rays hit in a continuous manner, $x$ represents the most effective hit position, i.e. the point of maximum response along the ray. You could also integrate over the ray path for a better result that accounts for density superpositions, but that would probably require a different primitive with more easily integrable densities.

In the scope of gaussian ray tracing, $x$ can be obtained from the intrinsics ($R$ and $S$) of the gaussian (noted $n$) whose bounding polyhedron was hit by ray $i$ (as described in NVIDIA's paper):
\begin{equation} x = o + \tau_{max} \times d \end{equation}
Where:

- $o$: the origin of ray $i$
- $d$: the (normalized) direction of ray $i$
- $\tau_{max}$: the distance along the ray at which the gaussian response $G_n^{3D}$ is maximal
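As a small, hypothetical sketch (function name and layout are mine), $\tau_{max}$ can be obtained by maximizing the gaussian along the ray, i.e. minimizing $\sigma_n$ in $\tau$, which gives a closed form:

```python
import numpy as np

def max_response_point(o, d, mu, Sigma_inv):
    """Point of maximum gaussian response along the ray x(tau) = o + tau * d.

    Obtained by minimizing (x - mu)^T Sigma^{-1} (x - mu) with respect to tau.
    """
    diff = mu - o
    tau_max = (d @ Sigma_inv @ diff) / (d @ Sigma_inv @ d)
    return o + tau_max * d, tau_max
```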

4. Covariance Matrix

The covariance matrix is a combination of the rotation matrix and the scale matrix, such that:
\begin{equation} \begin{split} \Sigma_n^{3D} &= R_n S_n S_n^\top R_n^\top \\ &= R_n S_n S_n R_n^\top \\ &= M_n M_n^\top \end{split} \end{equation}

The second line holds since $S_n$ is a diagonal (hence symmetric) matrix, and the last one follows from defining $M_n = R_n S_n$.
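A minimal sketch of this construction (names are mine):

```python
import numpy as np

def covariance_from_rs(R, s):
    """Build Sigma = R S S^T R^T = M M^T from a rotation matrix and scales."""
    S = np.diag(s)        # S_n = diag(s_n1, s_n2, s_n3)
    M = R @ S             # gaussian transformation matrix M_n
    return M @ M.T, M
```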

5. Scale Matrix

\begin{equation} S_n = \text{diag}(s_{n1}, s_{n2}, s_{n3}) \end{equation}

6. Rotation Matrix (Quaternion Representation)

Given a quaternion $q_n = [w, x, y, z]$, the rotation matrix $R_n$ is:
\begin{equation} R_n = \begin{bmatrix} 1 - 2(y^2 + z^2) & 2(xy - wz) & 2(xz + wy) \\ 2(xy + wz) & 1 - 2(x^2 + z^2) & 2(yz - wx) \\ 2(xz - wy) & 2(yz + wx) & 1 - 2(x^2 + y^2) \end{bmatrix} \end{equation}


Gradients

In this section, I will provide a complete overview of the gradients necessary for the 3D Gaussian Ray Traced reconstruction algorithm. We'll examine each gradient component in detail, starting with the initial loss function and working through the chain rule to derive gradients for all Gaussian parameters.

1. Gradient on initial loss

Gradient on Initial Loss

In the context of 3D Gaussian Ray-Traced Reconstruction, the initial loss function is designed to balance pixel-wise accuracy with structural fidelity. It is defined as:

\begin{equation} \mathcal{L} \;=\; (1 - \lambda)\,\mathcal{L}_1 \;+\; \lambda\,\mathcal{L}_{\text{D-SSIM}} \end{equation}

Breakdown of the Loss Function Components

By computing the gradient $\frac{\partial \mathcal{L}}{\partial C_i}$ and backpropagating it through the network of Gaussian parameters, the optimization process iteratively adjusts the model to minimize the loss, resulting in a high-quality 3D reconstruction.

Derivative of the Loss Function with Respect to $C_i$

To perform gradient descent optimization, we need to compute the derivative of the loss $\mathcal{L}$ with respect to each pixel or ray contribution $C_i$:

\begin{equation}\frac{\partial \mathcal{L}}{\partial C_i} = (1 - \lambda)\,\frac{\partial \mathcal{L}_1}{\partial C_i} \;+\; \lambda\,\frac{\partial \mathcal{L}_{\text{D-SSIM}}}{\partial C_i}.\end{equation}

Both can be automated with most compute frameworks, so I will not dive further into this aspect. A typical implementation with PyTorch would look something like this:

import torch
import torch.nn.functional as F
from ssim import ssim  # use the SSIM implementation/library of your choice

def compute_loss_gradient(gaussian_render_image, target_image, max_value=1.0):

    # Normalize inputs between 0 and 1 (e.g. pass max_value=255 for 8-bit images)
    gaussian_render_image = gaussian_render_image / max_value
    target_image = target_image / max_value

    # Prepare for gradient computation: mark as the variable to differentiate w.r.t.
    gaussian_render_image = gaussian_render_image.requires_grad_(True)

    # Pixel-wise error metric (L1)
    error_metric1 = F.l1_loss(gaussian_render_image, target_image)

    # Structural dissimilarity loss (D-SSIM)
    dissimilarity_loss = 1.0 - ssim(
        gaussian_render_image,
        target_image
    )

    # Combine losses: (1 - lambda) * L1 + lambda * D-SSIM, here with lambda = 0.2
    total_loss = 0.8 * error_metric1 + 0.2 * dissimilarity_loss

    # Backpropagate to populate gaussian_render_image.grad
    total_loss.backward()

    # Get and normalize the gradient w.r.t. the rendered image
    gradient = gaussian_render_image.grad / gaussian_render_image.grad.norm()
    return gradient

2. Gradient of Color Contribution

Let's remember that:

$$C_{i}(k) = \sum_{n \in N} c_n \times \alpha_n \times T_n$$

Now, we have:

\begin{equation} \frac{\partial C_i(k)}{\partial c_n(k)} = \alpha_n T_n \end{equation}
\begin{equation} \frac{\partial C_i(k)}{\partial \alpha_n} = c_n(k) T_n - \frac{S_n(k)}{1 - \alpha_n} \end{equation}

where $S_n(k)$ represents the accumulated (back-to-front, since it is a backward traversal) contribution of the samples behind $n$ along the ray. It is also expressed as:
\begin{equation} S_n(k) = \sum_{m>n} c_m(k) \alpha_m T_m \end{equation}

If $S_n(k)$ is computed after a forward pass (so that the value $C_i(k)$ of the color at pixel $i$ is already known), it can also take the value of:
\begin{equation} S_n(k) = C_i(k) - \sum_{m\leq n} c_m(k) \alpha_m T_m \end{equation}
This is useful because in ray tracing the ''standard'' evolution of a ray is from the camera plane up to the far plane / end of frustum (infinity, or closer for computational purposes). Computing $S_n(k)$ back to front would require first tracing the ray, storing the values of each "hit", and then running a backward loop over them, which consumes memory and reduces efficiency. Rewriting it in the direction of the ray is thus useful to save on compute resources.
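A minimal sketch of this forward-order computation, assuming the composited color $C_i$ is already known from the forward pass (names are mine; opacities are assumed clipped strictly below 1):

```python
import numpy as np

def color_gradients_along_ray(colors, alphas, C_i):
    """Per-hit gradients dC_i/dc_n and dC_i/dalpha_n in a single forward pass.

    colors: (N, 3) colors c_n sorted from the camera outwards.
    alphas: (N,)  effective opacities alpha_n (clipped below 1).
    C_i:    (3,)  composited ray color from the forward pass.
    """
    T = 1.0                               # running transmittance T_n
    accumulated = np.zeros(3)             # sum_{m<=n} c_m alpha_m T_m
    dC_dc, dC_dalpha = [], []
    for c_n, a_n in zip(colors, alphas):
        dC_dc.append(a_n * T)             # dC_i/dc_n = alpha_n T_n (per channel)
        accumulated += c_n * a_n * T
        S_n = C_i - accumulated           # contribution of everything behind n
        dC_dalpha.append(c_n * T - S_n / (1.0 - a_n))
        T *= (1.0 - a_n)
    return np.array(dC_dc), np.array(dC_dalpha)
```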

3. Gradient of Transparency

If we take a look at the gradient of the effective opacity $\alpha_n$ w.r.t. the opacity $o_n$, we have:

\begin{equation} \frac{\partial \alpha_n}{\partial o_n} = \exp(-\sigma_n) \end{equation}

And w.r.t. the 3D Gaussian function we have:
\begin{equation} \frac{\partial \alpha_n}{\partial G_n^{3D}} = o_n \end{equation}

4. Gradient of Gaussian Function

Relative to the shape parameter $\sigma_n$ of the gaussian function:
\begin{equation} \frac{\partial G_n^{3D}}{\partial \sigma_n} = -\exp(-\sigma_n) = - G_n^{3D} \end{equation}

5. Gradient of shape parameter Covariance Matrix

\begin{equation} \frac{\partial \sigma_n}{\partial \Sigma_n^{3D}} = -\frac{1}{2} (\Sigma_n^{3D})^{-1} \Delta_n^{3D} (\Delta_n^{3D})^\top (\Sigma_n^{3D})^{-1} \end{equation}

where $\Delta_n^{3D} = x - \mu_n^{3D}$ represents the offset from the Gaussian mean (see Proof 4 in the annexes for the derivation).

6. Gradient of the gaussian transformation matrix

\begin{equation} \frac{\partial \mathcal{L}}{\partial M_n} = \frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}} M_n + \left(\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}} \right)^T M_n \end{equation}

7. Gradient of Loss with Respect to Quaternion

Let $q_n = [w_{n}, x_{n}, y_{n}, z_{n}]$, and take the general case of $q = [w, x, y, z]$, then:
\begin{align} \frac{\partial R}{\partial w} &= 2 \begin{bmatrix} 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \end{bmatrix} \\ \frac{\partial R}{\partial x} &= 2 \begin{bmatrix} 0 & y & z \\ y & -2x & -w \\ z & w & -2x \end{bmatrix} \\ \frac{\partial R}{\partial y} &= 2 \begin{bmatrix} -2y & x & w \\ x & 0 & z \\ -w & z & -2y \end{bmatrix} \\ \frac{\partial R}{\partial z} &= 2 \begin{bmatrix} -2z & -w & x \\ w & -2z & y \\ x & y & 0 \end{bmatrix} \end{align}
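A direct transcription of these four matrices (assuming the $[w, x, y, z]$ ordering used above):

```python
import numpy as np

def rotation_quaternion_jacobians(q):
    """The four matrices dR/dw, dR/dx, dR/dy, dR/dz for q = [w, x, y, z]."""
    w, x, y, z = q
    dR_dw = 2 * np.array([[0.0,   -z,    y],
                          [z,    0.0,   -x],
                          [-y,     x,  0.0]])
    dR_dx = 2 * np.array([[0.0,    y,    z],
                          [y,  -2*x,   -w],
                          [z,     w, -2*x]])
    dR_dy = 2 * np.array([[-2*y,   x,    w],
                          [x,    0.0,    z],
                          [-w,     z, -2*y]])
    dR_dz = 2 * np.array([[-2*z,  -w,    x],
                          [w,  -2*z,    y],
                          [x,     y,  0.0]])
    return dR_dw, dR_dx, dR_dy, dR_dz
```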

8. Gradient of Scale

For $S_n = \text{diag}(s_{n1}, s_{n2}, s_{n3})$:
\begin{equation} \frac{\partial S}{\partial s_{ni}} = \delta_{ij} \end{equation}

where $\delta_{ij}$ is the Kronecker delta, defined as:
\begin{equation} \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \end{equation}

(As an example, for the 3×3 case used in the 3D gaussian descent, the Kronecker delta seen as a matrix is simply the identity:)
\begin{equation} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \end{equation}


Full Gaussian Parameterization details

In this section I will detail a bit more some of the equations introduced above.

1. 3D Gaussian Function

The 3D Gaussian function is:
\begin{equation} G_n^{3D}(x) = \exp \left( -\frac{1}{2} (x - \mu_n^{3D})^T (\Sigma_n^{3D})^{-1} (x - \mu_n^{3D}) \right) \end{equation}

2. Final Compositing Equation

\begin{equation} \boxed{ \begin{split} C_i &= \sum_{n \in N} c_n \alpha_n \prod_{m=1}^{n-1} (1 - \alpha_m) \\[23pt] &= \sum_{n \in N} c_n \alpha_n \prod_{m<n} (1 - \alpha_m) \end{split} } \end{equation}

where $m<n$ ensures that compositing respects the order of the Gaussians' contributions: accumulated from the camera pixel up to gaussian $n$ along the ray.
In ray tracing, the ''standard'' evolution of a ray is from the camera plane up to the far plane / end of frustum (infinity, or closer for computational purposes).


Explicit Chain Rules for Gradient Descent

This chapter will detail all the gradients of the 5 main parameters.

1. Gradient of Loss w.r.t. Quaternion $(q_i)$

Lqi=((LCiCiαnαnGn3DGn3DσnσnΣn3D)Mn+(LCiCiαnαnGn3DGn3DσnσnΣn3D)TMn)MnRnRnqni\begin{equation} {\small \boxed{\frac{\partial \mathcal{L}}{\partial q_i}} = \left( \left( \frac{\partial \mathcal{L}}{\partial C_i} \cdot \frac{\partial C_i}{\partial \alpha_n} \cdot \frac{\partial \alpha_n}{\partial G_n^{3D}} \cdot \frac{\partial G_n^{3D}}{\partial \sigma_n} \cdot \frac{\partial \sigma_n}{\partial \Sigma_n^{3D}} \right) M_n + \left( \frac{\partial \mathcal{L}}{\partial C_i} \cdot \frac{\partial C_i}{\partial \alpha_n} \cdot \frac{\partial \alpha_n}{\partial G_n^{3D}} \cdot \frac{\partial G_n^{3D}}{\partial \sigma_n} \cdot \frac{\partial \sigma_n}{\partial \Sigma_n^{3D}} \right)^T M_n \right) \cdot \frac{\partial M_n}{\partial R_n} \cdot \frac{\partial R_n}{\partial q_{ni}} } \end{equation}

Fully substituted:
Lqni=SMnRnRnqnisee Gradients - 7.(LCi(k)(cn(k)TnSn(k)1αn)(on)(Gn3D)(12(Σn3D)1Δn3D(Δn3D)T(Σn3D)1)LΣn3DMn+(LCi(k)(cn(k)TnSn(k)1αn)(on)(Gn3D)(12(Σn3D)1Δn3D(Δn3D)T(Σn3D)1)LΣn3D)TMn)\begin{equation} \begin{split} \frac{\partial \mathcal{L}}{\partial q_{ni}} = \underbrace{S}_{\frac{\partial M_n}{\partial R_n}} \cdot \underbrace{\frac{\partial R_n}{\partial q_{ni}}}_{\text{see Gradients - 7.}}\cdot \Biggl( \underbrace{ \frac{\partial \mathcal{L}}{\partial C_i(k)} \cdot \left( c_n(k)\,T_n - \frac{S_n(k)}{1 - \alpha_n} \right) \cdot (o_n) \cdot (\color{brown}{\cancel{\color{grey}-}} \color{normalcolor}G_n^{3D}) \cdot \left( \color{brown}{\cancel{\color{grey}-}} \color{normalcolor}\frac{1}{2} (\Sigma_n^{3D})^{-1} \Delta_n^{3D} (\Delta_n^{3D})^T (\Sigma_n^{3D})^{-1} \right) }_{\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}} \cdot M_n + \\ \left( \underbrace{ \frac{\partial \mathcal{L}}{\partial C_i(k)} \cdot \left( c_n(k)\,T_n - \frac{S_n(k)}{1 - \alpha_n} \right) \cdot (o_n) \cdot (\color{brown}{\cancel{\color{grey}-}} \color{normalcolor}G_n^{3D}) \cdot \left( \color{brown}{\cancel{\color{grey}-}} \color{normalcolor}\frac{1}{2} (\Sigma_n^{3D})^{-1} \Delta_n^{3D} (\Delta_n^{3D})^T (\Sigma_n^{3D})^{-1} \right) }_{\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}} \right)^T \cdot M_n \Biggl) \end{split} \end{equation}


2. Gradient of Loss w.r.t. Scale Factor $(s_{ni})$

Lsni=MnSnSnsni((LCiCiαnαnGn3DGn3DσnσnΣn3D)Mn+(LCiCiαnαnGn3DGn3DσnσnΣn3D)TMn)\begin{equation} {\small \boxed{\frac{\partial \mathcal{L}}{\partial s_{ni}}} = \frac{\partial M_n}{\partial S_n} \cdot \frac{\partial S_n}{\partial s_{ni}} \cdot \left( \left( \frac{\partial \mathcal{L}}{\partial C_i} \cdot \frac{\partial C_i}{\partial \alpha_n} \cdot \frac{\partial \alpha_n}{\partial G_n^{3D}} \cdot \frac{\partial G_n^{3D}}{\partial \sigma_n} \cdot \frac{\partial \sigma_n}{\partial \Sigma_n^{3D}} \right) M_n + \left( \frac{\partial \mathcal{L}}{\partial C_i} \cdot \frac{\partial C_i}{\partial \alpha_n} \cdot \frac{\partial \alpha_n}{\partial G_n^{3D}} \cdot \frac{\partial G_n^{3D}}{\partial \sigma_n} \cdot \frac{\partial \sigma_n}{\partial \Sigma_n^{3D}} \right)^T M_n \right) } \end{equation}

Fully substituted:
Lsni=RTMnSnδijSnsni(LCi(k)(cn(k)TnSn(k)1αn)(on)(Gn3D)(12(Σn3D)1Δn3D(Δn3D)T(Σn3D)1)LΣn3DMn+(LCi(k)(cn(k)TnSn(k)1αn)(on)(Gn3D)(12(Σn3D)1Δn3D(Δn3D)T(Σn3D)1)LΣn3D)TMn)\begin{equation} \begin{split} \normalsize { \frac{\partial \mathcal{L}}{\partial s_{ni}} = \underbrace{R^T}_{\frac{\partial M_n}{\partial S_n}} \cdot \underbrace{\delta_{ij}}_{\frac{\partial S_n}{\partial s_{ni}}} \cdot \Biggl( \underbrace{ \frac{\partial \mathcal{L}}{\partial C_i(k)} \cdot \left( c_n(k)\,T_n - \frac{S_n(k)}{1 - \alpha_n} \right) \cdot (o_n) \cdot (\color{brown}{\cancel{\color{grey}-}} \color{normalcolor}G_n^{3D}) \cdot \left( \color{brown}{\cancel{\color{grey}-}} \color{normalcolor}\frac{1}{2} (\Sigma_n^{3D})^{-1} \Delta_n^{3D} (\Delta_n^{3D})^T (\Sigma_n^{3D})^{-1} \right) }_{\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}} \cdot M_n} \\ \normalsize { + \left( \underbrace{ \frac{\partial \mathcal{L}}{\partial C_i(k)} \cdot \left( c_n(k)\,T_n - \frac{S_n(k)}{1 - \alpha_n} \right) \cdot (o_n) \cdot (\color{brown}{\cancel{\color{grey}-}} \color{normalcolor}G_n^{3D}) \cdot \left( \color{brown}{\cancel{\color{grey}-}} \color{normalcolor}\frac{1}{2} (\Sigma_n^{3D})^{-1} \Delta_n^{3D} (\Delta_n^{3D})^T (\Sigma_n^{3D})^{-1} \right) }_{\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}} \right)^T \cdot M_n \Biggl) } \end{split} \end{equation}


3. Gradient of Loss w.r.t. Gaussian Mean $(\mu_n)$

\begin{equation} \boxed{\frac{\partial \mathcal{L}}{\partial \mu_n^{3D}}} = \frac{\partial \mathcal{L}}{\partial C_i} \cdot \frac{\partial C_i}{\partial \alpha_n} \cdot \frac{\partial \alpha_n}{\partial G_n^{3D}} \cdot \frac{\partial G_n^{3D}}{\partial \sigma_n} \cdot \frac{\partial \sigma_n}{\partial \mu_n^{3D}} \end{equation}

Fully substituted:
Lμn3D=LCi(k)(cn(k)TnSn(k)1αn)CiαnonαnGn3D(Gn3D)Gn3Dσn((Σn3D)1Δn3D)σnμn3D=LCi(k)(cn(k)TnSn(k)1αn)onGn3D(Σn3D)1Δn3D\begin{equation} \begin{split} \frac{\partial \mathcal{L}}{\partial \mu_n^{3D}} &= \frac{\partial \mathcal{L}}{\partial C_i(k)} \cdot \underbrace{\left( c_n(k)\,T_n - \frac{S_n(k)}{1 - \alpha_n} \right)}_{\frac{\partial C_i}{\partial \alpha_n}} \cdot \underbrace{o_n}_{\frac{\partial \alpha_n}{\partial G_n^{3D}}} \cdot \underbrace{(\color{brown}{\cancel{\color{grey}-}} \color{normalcolor}G_n^{3D})}_{\frac{\partial G_n^{3D}}{\partial \sigma_n}} \cdot \underbrace{\left( \color{brown}{\cancel{\color{grey}-}} \color{normalcolor} (\Sigma_n^{3D})^{-1} \Delta_n^{3D} \right)}_{\frac{\partial \sigma_n}{\partial \mu_n^{3D}}} \\ &= \frac{\partial \mathcal{L}}{\partial C_i(k)} \cdot \left( c_n(k)\,T_n - \frac{S_n(k)}{1 - \alpha_n} \right) \cdot o_n \cdot G_n^{3D} \cdot (\Sigma_n^{3D})^{-1} \Delta_n^{3D} \end{split} \end{equation}


4. Gradient of Loss w.r.t. Opacity $(o_n)$

\begin{equation} \boxed{\frac{\partial \mathcal{L}}{\partial o_n}} = \frac{\partial \mathcal{L}}{\partial C_i} \cdot \frac{\partial C_i}{\partial \alpha_n} \cdot \frac{\partial \alpha_n}{\partial o_n} \end{equation}

Fully substituted:
\begin{equation} \frac{\partial \mathcal{L}}{\partial o_n} = \frac{\partial \mathcal{L}}{\partial C_i(k)} \cdot \underbrace{\left( c_n(k)\,T_n - \frac{S_n(k)}{1 - \alpha_n} \right)}_{\frac{\partial C_i}{\partial \alpha_n}} \cdot \underbrace{\exp(-\sigma_n)}_{\frac{\partial \alpha_n}{\partial o_n}} \end{equation}


5. Gradient of Loss w.r.t. Color $(c_n(k))$

\begin{equation} \boxed{\frac{\partial \mathcal{L}}{\partial c_n(k)}} = \frac{\partial \mathcal{L}}{\partial C_i(k)} \cdot \frac{\partial C_i(k)}{\partial c_n(k)} \end{equation}

Fully substituted:
\begin{equation} \frac{\partial \mathcal{L}}{\partial c_n(k)} = \frac{\partial \mathcal{L}}{\partial C_i(k)} \cdot \underbrace{(\alpha_n \, T_n)}_{\frac{\partial C_i(k)}{\partial c_n(k)}} \end{equation}
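To make the chain rules above concrete, here is a minimal NumPy sketch assembling the five gradients for a single gaussian contribution along a single ray. It reuses the `rotation_quaternion_jacobians` helper sketched earlier; all names are mine, and it ignores batching, the spherical harmonics color chain and optimizer details:

```python
import numpy as np

def gaussian_gradients(dL_dC, c_n, alpha_n, T_n, S_n, o_n, G_n,
                       delta, Sigma_inv, R, s, q):
    """Loss gradients w.r.t. the 5 gaussian parameters for one ray hit.

    dL_dC: (3,) upstream gradient of the loss w.r.t. the ray color C_i.
    S_n:   (3,) accumulated contribution of the samples behind n.
    delta: (3,) offset x - mu_n; Sigma_inv: (3,3) inverse covariance;
    R: (3,3) rotation; s: (3,) scales; q: (4,) quaternion [w, x, y, z].
    """
    S = np.diag(s)
    M = R @ S

    # dC_i/dalpha_n, then chain down to sigma_n
    dC_dalpha = c_n * T_n - S_n / (1.0 - alpha_n)          # (3,)
    dL_dalpha = dL_dC @ dC_dalpha                          # scalar
    dL_dG = dL_dalpha * o_n                                # dalpha/dG = o_n
    dL_dsigma = -dL_dG * G_n                               # dG/dsigma = -G

    # Position: dsigma/dmu = -Sigma^{-1} delta
    dL_dmu = dL_dsigma * (-(Sigma_inv @ delta))

    # Covariance: dsigma/dSigma = -Sigma^{-1} (1/2 delta delta^T) Sigma^{-1}
    dL_dSigma = dL_dsigma * (-(Sigma_inv @ (0.5 * np.outer(delta, delta)) @ Sigma_inv))

    # Transformation matrix: dL/dM = dL/dSigma M + (dL/dSigma)^T M
    dL_dM = dL_dSigma @ M + dL_dSigma.T @ M

    # Rotation and scale (M = R S): dL/dR = dL/dM S^T, dL/ds_i = (R^T dL/dM)_{ii}
    dL_dR = dL_dM @ S.T
    dL_ds = np.diag(R.T @ dL_dM).copy()

    # Quaternion: contract dL/dR with each dR/dq_i (helper sketched earlier)
    dL_dq = np.array([np.sum(dL_dR * dR_dqi)
                      for dR_dqi in rotation_quaternion_jacobians(q)])

    # Opacity and color
    dL_do = dL_dalpha * G_n                                # dalpha/do = exp(-sigma) = G
    dL_dc = dL_dC * alpha_n * T_n

    return dL_dmu, dL_dq, dL_ds, dL_do, dL_dc
```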


Gradient Propagation to 3D Parameters intermediary steps explanations

The steps at the transformation matrix of the gradient descent are not so straightforward, and I think a slightly more detailed explanation is appreciated.

Which can eventually be expanded (with the minus signs simplifying away) as:
LMn=(LΣn3D)Mn  +  (LΣn3D)TMn=[LCi(k)(depends on loss)  (cn(k)Tn    Sn(k)1αn)Ciαn  (on)αnGn3D  (Gn3D)Gn3Dσn  (12(Σn3D)1Δn3D(Δn3D)T(Σn3D)1)σnΣn3D]Mn+  [LCi(k)(same factor)  (cn(k)Tn    Sn(k)1αn)Ciαn  (on)αnGn3D  (Gn3D)Gn3Dσn  (12(Σn3D)1Δn3D(Δn3D)T(Σn3D)1)σnΣn3D]TMn.\begin{equation} \begin{split} \frac{\partial \mathcal{L}}{\partial M_n} &= \left(\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}\right)\,M_n \;+\; \Bigl(\tfrac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}\Bigr)^{T}\,M_n \\[6pt] &= \Bigl[ \underbrace{\frac{\partial \mathcal{L}}{\partial C_i(k)}}_{\text{(depends on loss)}} \;\underbrace{\Bigl(c_n(k)\,T_n \;-\;\frac{S_n(k)}{1-\alpha_n}\Bigr)}_{\displaystyle \frac{\partial C_i}{\partial \alpha_n}} \;\underbrace{\bigl(\,o_n\bigr)}_{\displaystyle \frac{\partial \alpha_n}{\partial G_n^{3D}}} \;\underbrace{\bigl(\,\color{brown}{\cancel{\color{grey}-}} \color{normalcolor} G_n^{3D}\bigr)}_{\displaystyle \frac{\partial G_n^{3D}}{\partial \sigma_n}} \;\underbrace{\Bigl(\color{brown}{\cancel{\color{grey}-}} \color{normalcolor}\tfrac12 (\Sigma_n^{3D})^{-1}\,\Delta_n^{3D}\,(\Delta_n^{3D})^T\,(\Sigma_n^{3D})^{-1}\Bigr)}_{\displaystyle \frac{\partial \sigma_n}{\partial \Sigma_n^{3D}}} \Bigr] \,M_n \\ &\qquad{} +\; \Bigl[ \underbrace{\frac{\partial \mathcal{L}}{\partial C_i(k)}}_{\text{(same factor)}} \;\underbrace{\Bigl(c_n(k)\,T_n \;-\;\frac{S_n(k)}{1-\alpha_n}\Bigr)}_{\displaystyle \frac{\partial C_i}{\partial \alpha_n}} \;\underbrace{\bigl(\,o_n\bigr)}_{\displaystyle \frac{\partial \alpha_n}{\partial G_n^{3D}}} \;\underbrace{\bigl(\color{brown}{\cancel{\color{grey}-}} \color{normalcolor} G_n^{3D}\bigr)}_{\displaystyle \frac{\partial G_n^{3D}}{\partial \sigma_n}} \;\underbrace{\Bigl(\color{brown}{\cancel{\color{grey}-}} \color{normalcolor}\tfrac12 (\Sigma_n^{3D})^{-1}\,\Delta_n^{3D}\,(\Delta_n^{3D})^T\,(\Sigma_n^{3D})^{-1}\Bigr)}_{\displaystyle \frac{\partial \sigma_n}{\partial \Sigma_n^{3D}}} \Bigr]^{T} \,M_n. \end{split} \end{equation}

The proof for this step can be derived with the help of the Matrix Cookbook. See the corresponding proof in the annexes.


Parameters Modifications with Respect to the Gradients

Gradient Descent Update Rule

To optimize the 5 primary Gaussian parameters, we apply gradient descent as follows:

\begin{equation} q_i^{t+1} = q_i^{t} - \lambda \frac{\partial \mathcal{L}}{\partial q_i} \end{equation}

where:

- $\lambda$ is the learning rate of the parameter
- $t$ is the current iteration (and $t+1$ the next one)

This iterative update adjusts each parameter in the direction that minimizes the reconstruction error.


Parameter Updates

During training, some points are to be noted.

Firstly, the parameters (hereafter noted generically $\theta$) should generally be updated in the form of:

\begin{equation} \theta^{t+1} = \theta^{t} - \lambda \frac{\partial \mathcal{L}}{\partial \theta} \end{equation}

With:

- $\theta$: any of the gaussian parameters
- $\lambda$: the learning rate associated with that parameter

This allows converging toward a smaller difference between the ground truth (GT, i.e. the input images) and the current raster of the 3D model obtained from the gaussians.

Secondly, there are some considerations for each parameter to keep a physically meaningful, updatable and normalized flow of parameter updates. Mainly, the parameters in the gradient optimizer (I recommend following the traditional gsplat implementation of Adam, or a derivative, as a basis that works) should be mapped to $[-\infty, +\infty]$ to take advantage of unbounded real values and the dynamic range of the float32 type used during training, allowing for a more stable gradient flow.

Thirdly, the learning rates should be determined empirically in a further study if possible. At first glance, taking the same order of magnitude as gsplat or similar 2D gaussian splatting implementations seems to work.

The first two concerns are detailed as much as possible in the following subsections, for each main parameter update.

1. Position Update $(\mu_n^{3D})$

The 3D position update follows:

\begin{equation} \mu_n^{3D, t+1} = \mu_n^{3D, t} - \lambda \frac{\partial \mathcal{L}}{\partial \mu_n^{3D}} \end{equation}

This moves the Gaussian center in 3D space toward an optimal position that reduces the loss function.


2. Rotation Update (Quaternion) $(q_n)$

Since quaternions represent rotations, we must normalize them after applying gradient descent:

\begin{equation} q_n^{t+1} = \text{normalize} \Big( q_n^{t} - \lambda \frac{\partial \mathcal{L}}{\partial q_n} \Big) \end{equation}

where the normalization ensures the updated quaternion remains on the unit hypersphere:

\begin{equation} \text{normalize}(q) = \frac{q}{\| q \|} \end{equation}

This maintains numerical stability while optimizing rotation in SO(3)\text{SO(3)}.


3. Scale Update $(s_n)$

To ensure positive scaling, we parameterize the scale factors as exponentials:

\begin{equation} s_n = e^{\hat{s}_n} \end{equation}

which transforms the update into:

\begin{equation} s_n^{t+1} = s_n^{t} - \lambda \frac{\partial \mathcal{L}}{\partial s_n} \end{equation}

This guarantees that the scale remains strictly positive during optimization.


4. Opacity Update $(o_n)$

Opacity is updated using:

\begin{equation} o_n^{t+1} = o_n^{t} - \lambda \frac{\partial \mathcal{L}}{\partial o_n} \end{equation}

Since the normalized opacity $\hat{o}_n$ (normalized to [0,1], as opposed to the raw opacity $o_n \in [-\infty,\infty]$ used in the optimizer) must remain in [0,1], you can apply clipping. Using 0.999, or a similar value strictly below 1, allows the gradient descent to keep evolving instead of getting stuck once a splat becomes fully opaque and nothing behind it can evolve:
\begin{equation} \begin{split} \hat{o}_n^{t+1} &= \text{clip}(\hat{o}_n^{t+1}, 0, 0.999) \\ &= \text{clip}(\text{sigmoid}(o_n^{t+1}), 0, 0.999) \end{split} \end{equation}


5. Color Update $(c_n)$

Color values are updated similarly:

\begin{equation} c_n^{t+1} = c_n^t - \lambda \frac{\partial \mathcal{L}}{\partial c_n} \end{equation}

NB:
To ensure stability in SDR, HDR, or other imaging spaces, colors should first be clipped to a standard, common range, such that, with a [0,1] space:
\begin{equation} \hat{c}_n = \text{clip}(c_{n_{raw}}, 0, 1) \end{equation}
With:

- $c_{n_{raw}}$: the raw, unbounded color value stored in the optimizer before clipping


Gradient descent algorithm quick flow and summary table

For each iteration:

  1. Compute the gradients $\frac{\partial \mathcal{L}}{\partial \theta}$ for each parameter $\theta$.
  2. Update each parameter using the rules above.
  3. Normalize the quaternion $q_n$ to maintain unit norm.
  4. Ensure constraints for positive scales $s_n$, valid opacities $o_n$, and correct color range.
  5. Iterate until convergence.

Here is a summary table for a general overview of what was in this chapter:

| Parameter | Update Rule |
| --- | --- |
| Position $(\mu_n^{3D})$ | $\mu_n^{3D, t+1} = \mu_n^{3D, t} - \lambda \frac{\partial \mathcal{L}}{\partial \mu_n^{3D}}$ |
| Rotation $(q_n)$ | $q_n^{t+1} = \text{normalize}(q_n^t - \lambda \frac{\partial \mathcal{L}}{\partial q_n})$ |
| Scale $(s_n)$ | $s_n^{t+1} = s_n^{t} - \lambda \frac{\partial \mathcal{L}}{\partial s_n}$ |
| Opacity $(o_n)$ | $o_n^{t+1} = o_n^{t} - \lambda \frac{\partial \mathcal{L}}{\partial o_n}$ |
| Color $(c_n)$ | $c_n^{t+1} = c_n^t - \lambda \frac{\partial \mathcal{L}}{\partial c_n}$ |
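Putting the rules of the table together, here is a minimal sketch of one plain (non-Adam) update step with the constraints discussed above. It assumes the gradients are already expressed with respect to the stored, unbounded parameters; names and the dictionary layout are mine:

```python
import numpy as np

def update_gaussian(params, grads, lr):
    """One plain gradient-descent step on the 5 parameters, with constraints.

    params/grads: dicts with keys 'mu', 'q', 's_hat', 'o', 'c'.
    lr: dict of per-parameter learning rates. Scales are stored as
    s_hat = log(s) and opacity as an unbounded value, as discussed above.
    """
    for k in ('mu', 'q', 's_hat', 'o', 'c'):
        params[k] = params[k] - lr[k] * grads[k]

    # Keep the quaternion on the unit hypersphere
    params['q'] = params['q'] / np.linalg.norm(params['q'])

    # Derived, constrained values actually used at render time
    s = np.exp(params['s_hat'])                                       # strictly positive scales
    o_hat = np.clip(1.0 / (1.0 + np.exp(-params['o'])), 0.0, 0.999)   # sigmoid + clip
    c_hat = np.clip(params['c'], 0.0, 1.0)                            # clipped color
    return params, s, o_hat, c_hat
```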

Result example:

The following image is the result of an OptiX + CUDA + PyTorch algorithm following the implementation described earlier. It does not implement splitting/pruning and uses only the gradient descent described above:

Description

For legal reasons I cannot publish the associated code, as I do not have the consent of all parties involved. However, the mathematics described here is an adaptation to 3DGRT of already published mathematics, made by me and myself alone (and obviously built upon the shoulders of giants, fully disclosed in the references and links).

Conclusion

It is my hope that this detailed explanation of the gradient descent of 3DGRT will prove useful to those new to the field or wishing to build upon 3DGRT. A preliminary review suggests that the gradient descent method may also be applicable to other papers, including EVER [4].

In my opinion, it serves as an excellent introduction to the mathematical principles underlying Gaussian splatting in general. With some minor adjustments it could even be extended to the traditional technique with its 3D-to-2D steps (including the projection to screen space and its Jacobian in the gradient descent). However, this falls outside the scope of my current work, and I encourage anyone interested to either add it or contact me to extend the current version of this article with it.

The results obtained without the use of advanced techniques (such as pruning, splitting, kernel filtering, etc.) appear to confirm the working basics of the gradient descent. I am hopeful that the mathematics in this article will be built upon openly in the future and that ray tracing will become more accessible in terms of hardware in the coming years.


Nota Bene:

If any error in this article is found, feel free to contact the author for a revision to be made (contacts at the beginning).


Annexes

Annex: Gradients and other mathematical proofs:

In this section, I present detailed mathematical proofs for the gradient formulations described earlier. These proofs are essential for understanding the derivation of update rules in the 3D Gaussian Ray-Traced reconstruction algorithm.

Proof 1: Gradient of Color Contribution

We begin by proving the gradients of the color contribution with respect to color and opacity parameters.

Gradient with respect to opacity $\alpha_n$:

For the derivative with respect to $\alpha_n$, the situation is more complex because $\alpha_n$ affects not only its own term but also the transparency factors $T_m$ for $m > n$ (recall that $n=1$ is the first gaussian hit from the camera plane and $n=N$ is the last one, towards the back of the frustum).

Starting with:

Ci(k)=mNcm(k)×αm×Tm\begin{equation} C_i(k) = \sum_{m \in N} c_m(k) \times \alpha_m \times T_m \end{equation}

We separate this into terms before, at, and after index nn:

Ci(k)=m<ncm(k)×αm×Tm+cn(k)×αn×Tn+m>ncm(k)×αm×Tm\begin{equation} C_i(k) = \sum_{m < n} c_m(k) \times \alpha_m \times T_m + c_n(k) \times \alpha_n \times T_n + \sum_{m > n} c_m(k) \times \alpha_m \times T_m \end{equation}

For the terms with $m > n$:
\begin{equation} \frac{\partial}{\partial \alpha_n}(c_m(k) \times \alpha_m \times T_m) = c_m(k) \times \alpha_m \times \frac{-T_m}{1-\alpha_n} \end{equation}

Let's derive this more explicitly:


First, recall the definition of accumulated transparency:
Tm=j=1m1(1αj)\begin{equation} T_m = \prod_{j=1}^{m-1} (1 - \alpha_j) \end{equation}
We can separate this product to isolate the term containing $\alpha_n$:
Tm=(j=1n1(1αj))×(1αn)×(j=n+1m1(1αj))=Tn×(1αn)×j=n+1m1(1αj)\begin{equation} \begin{split} T_m &= \left(\prod_{j=1}^{n-1} (1 - \alpha_j)\right) \times (1-\alpha_n) \times \left(\prod_{j=n+1}^{m-1} (1 - \alpha_j)\right) \\ &= T_n \times (1-\alpha_n) \times \prod_{j=n+1}^{m-1}(1-\alpha_j) \\ \end{split} \end{equation}
Now we take the partial derivative with respect to $\alpha_n$:
Tmαn=αn[Tn×(1αn)×j=n+1m1(1αj)]=Tn×j=n+1m1(1αj)\begin{equation} \begin{split} \frac{\partial T_m}{\partial \alpha_n} &= \frac{\partial}{\partial \alpha_n}\left[T_n \times (1-\alpha_n) \times \prod_{j=n+1}^{m-1}(1-\alpha_j)\right] \\ &= -T_n \times \prod_{j=n+1}^{m-1}(1-\alpha_j) \end{split} \end{equation}
To simplify this further, we note that:
Tm=Tn×(1αn)×j=n+1m1(1αj)j=n+1m1(1αj)=TmTn×(1αn)\begin{align} T_m &= T_n \times (1-\alpha_n) \times \prod_{j=n+1}^{m-1}(1-\alpha_j) \\ \Leftrightarrow \prod_{j=n+1}^{m-1}(1-\alpha_j) &= \frac{T_m}{T_n \times (1-\alpha_n)} \end{align}
Substituting this back into our derivative:
Tmαn=Tn×TmTn×(1αn)=Tm1αn\begin{equation} \begin{split} \frac{\partial T_m}{\partial \alpha_n} &= -{\color{brown}{\cancel{\color{grey}T_n}}} \times \frac{T_m}{{\color{brown}{\cancel{\color{grey}T_n}}} \times (1-\alpha_n)} \\ &= \frac{-T_m}{1-\alpha_n} \end{split} \end{equation}

Thus, for $m > n$:
αn(cm(k)×αm×Tm)=cm(k)×αm×Tm1αn\begin{equation} \frac{\partial}{\partial \alpha_n}(c_m(k) \times \alpha_m \times T_m) = c_m(k) \times \alpha_m \times \frac{-T_m}{1-\alpha_n} \end{equation}

Combining all terms:
Ci(k)αn=cn(k)×Tnm>ncm(k)×αm×Tm1αn\begin{equation} \frac{\partial C_i(k)}{\partial \alpha_n} = c_n(k) \times T_n - \sum_{m > n} \frac{c_m(k) \times \alpha_m \times T_m}{1-\alpha_n} \end{equation}

Recognizing that $S_n(k) = \sum_{m > n} c_m(k) \times \alpha_m \times T_m$ is the accumulated contribution of the samples after $n$:

\begin{equation} \boxed{\frac{\partial C_i(k)}{\partial \alpha_n} = c_n(k) \times T_n - \frac{S_n(k)}{1-\alpha_n}} \end{equation}

which matches the previous gradient formulation.
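A quick numerical sanity check of this result with central finite differences (0-based indices; names and layout are mine):

```python
import numpy as np

def check_dC_dalpha(colors, alphas, n, eps=1e-6):
    """Compare the closed-form dC_i/dalpha_n with a finite difference.

    colors: (N, 3) array of colors, alphas: (N,) array of effective opacities,
    n: 0-based index of the gaussian being perturbed.
    """
    def composite(a):
        C, T = np.zeros(3), 1.0
        for c_m, a_m in zip(colors, a):
            C += c_m * a_m * T
            T *= (1.0 - a_m)
        return C

    # Analytic: c_n T_n - S_n / (1 - alpha_n)
    T_n = np.prod(1.0 - alphas[:n])
    S_n = composite(alphas) - sum(
        colors[m] * alphas[m] * np.prod(1.0 - alphas[:m]) for m in range(n + 1))
    analytic = colors[n] * T_n - S_n / (1.0 - alphas[n])

    # Numeric central difference
    plus, minus = alphas.copy(), alphas.copy()
    plus[n] += eps
    minus[n] -= eps
    numeric = (composite(plus) - composite(minus)) / (2 * eps)
    return analytic, numeric
```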

Proof 2: Gradient of the Shape Parameter with Respect to the Inverse Covariance

Let $\Delta_n^{3D}: (\mathbb{R}^3, \mathbb{R}^3) \rightarrow \mathbb{R}^3, \space\space \Delta_n^{3D}(x, \mu_n^{3D})= x - \mu_n^{3D}$ and $\sigma_n: (\mathbb{R}^3, \mathbb{R}^{3 \times 3}) \rightarrow \mathbb{R}, \space\space \sigma_n(\Delta_n^{3D}, (\Sigma_n^{3D})^{-1}) = \frac{1}{2}\Delta_n^{3D^T} (\Sigma_n^{3D})^{-1} \Delta_n^{3D}$:

Taking the derivative with respect to $(\Sigma_n^{3D})^{-1}$:

\begin{equation} \frac{\partial \sigma_n}{\partial (\Sigma_n^{3D})^{-1}} = \frac{\partial}{\partial (\Sigma_n^{3D})^{-1}} \left[ \frac{1}{2}\underbrace{(\Delta_n^{3D})^{\top}}_{a^{\top}} (\Sigma_n^{3D})^{-1} \underbrace{\Delta_n^{3D}}_{b} \right] \end{equation}

Using the identity (70) in the matrix cookbook:

\begin{equation} \frac{\partial \mathbf{a}^\top \mathbf{X} \mathbf{b}}{\partial \mathbf{X}} = \mathbf{a} \mathbf{b}^\top \end{equation}

We thus obtain:

\begin{equation} \boxed{\frac{\partial \sigma_n}{\partial (\Sigma_n^{3D})^{-1}} = \frac{1}{2}\Delta_n^{3D}(\Delta_n^{3D})^{\top}} \end{equation}

Proof 3: Gradient of Shape Parameter with Respect to the Mean

Let $\Delta_n^{3D}: (\mathbb{R}^3, \mathbb{R}^3) \rightarrow \mathbb{R}^3, \space\space \Delta_n^{3D}(x, \mu_n^{3D})= x - \mu_n^{3D}$ and $\sigma_n: (\mathbb{R}^3, \mathbb{R}^{3 \times 3}) \rightarrow \mathbb{R}, \space\space \sigma_n(\Delta_n^{3D}, \Sigma_n^{3D}) = \frac{1}{2}\Delta_n^{3D^T} (\Sigma_n^{3D})^{-1} \Delta_n^{3D}$:

Taking the derivative with respect to μn3D\mu_n^{3D}:
σnμn3D=μn3D[12(xμn3D)(Δn3D)(Σn3D)1(xμn3D)Δn3D]\begin{align} \frac{\partial \sigma_n}{\partial \mu_n^{3D}} &= \frac{\partial}{\partial \mu_n^{3D}} \left[ \frac{1}{2}\underbrace{(x - \mu_n^{3D})^{\top}}_{(\Delta_n^{3D})^{\top}} (\Sigma_n^{3D})^{-1} \underbrace{(x - \mu_n^{3D})}_{\Delta_n^{3D}} \right] \\ \end{align}

Using the product rule on derivatives and noting that $\frac{\partial \Delta_n^{3D}}{\partial \mu_n^{3D}} = -I$ (since $\Delta_n^{3D} = x - \mu_n^{3D}$):

σnμn3D=12μn3D[(Δn3D)(Σn3D)1Δn3D]=12[(Δn3D)μn3D(Σn3D)1Δn3D+(Δn3D)(Σn3D)1Δn3Dμn3D]=12[(I)(Σn3D)1Δn3D+(Δn3D)(Σn3D)1(I)]=12[(Σn3D)1Δn3D(Δn3D)(Σn3D)1]\begin{equation} \begin{split} \frac{\partial \sigma_n}{\partial \mu_n^{3D}} &= \frac{1}{2} \frac{\partial}{\partial \mu_n^{3D}} \left[ (\Delta_n^{3D})^{\top} (\Sigma_n^{3D})^{-1} \Delta_n^{3D} \right] \\ &= \frac{1}{2} \left[ \frac{\partial (\Delta_n^{3D})^{\top}}{\partial \mu_n^{3D}} (\Sigma_n^{3D})^{-1} \Delta_n^{3D} + (\Delta_n^{3D})^{\top} (\Sigma_n^{3D})^{-1} \frac{\partial \Delta_n^{3D}}{\partial \mu_n^{3D}} \right] \\ &= \frac{1}{2} \left[ (-I)^{\top} (\Sigma_n^{3D})^{-1} \Delta_n^{3D} + (\Delta_n^{3D})^{\top} (\Sigma_n^{3D})^{-1} (-I) \right] \\ &= \frac{1}{2} \left[ -(\Sigma_n^{3D})^{-1} \Delta_n^{3D} - (\Delta_n^{3D})^{\top} (\Sigma_n^{3D})^{-1} \right] \\ \end{split} \end{equation}

Since $(\Delta_n^{3D})^{\top} (\Sigma_n^{3D})^{-1}$ is a $1 \times 3$ (row) vector and $(\Sigma_n^{3D})^{-1} \Delta_n^{3D}$ is a $3 \times 1$ (column) vector, and both represent the same mathematical quantity in transposed form (since $\Sigma_n^{3D}$ is symmetric), we can simplify:

σnμn3D=(Σn3D)1Δn3D\begin{align} \frac{\partial \sigma_n}{\partial \mu_n^{3D}} &= -(\Sigma_n^{3D})^{-1} \Delta_n^{3D} \end{align}

Therefore:
\begin{equation} \boxed{\frac{\partial \sigma_n}{\partial \mu_n^{3D}} = -(\Sigma_n^{3D})^{-1} \Delta_n^{3D}} \end{equation}
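Again, a small finite-difference check of this result (names are mine):

```python
import numpy as np

def check_dsigma_dmu(x, mu, Sigma, eps=1e-6):
    """Finite-difference check of dsigma/dmu = -Sigma^{-1} (x - mu)."""
    Sigma_inv = np.linalg.inv(Sigma)
    sigma = lambda m: 0.5 * (x - m) @ Sigma_inv @ (x - m)
    analytic = -Sigma_inv @ (x - mu)
    numeric = np.array([
        (sigma(mu + eps * e) - sigma(mu - eps * e)) / (2 * eps)
        for e in np.eye(3)])
    return analytic, numeric
```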

Proof 4: Gradient of the shape parameter $\sigma_n$ w.r.t. the Covariance Matrix ($\Sigma_n^{3D}$)

Let's recall the result of Proof 2:

\begin{equation} \frac{\partial \sigma_n}{\partial (\Sigma_n^{3D})^{-1}} = \frac{1}{2}\Delta_n^{3D}(\Delta_n^{3D})^{\top} \end{equation}

Furthermore, the Matrix Cookbook gives us:

$$\partial \mathbf{Y}^{-1} = -\mathbf{Y}^{-1} \partial \mathbf{Y} \mathbf{Y}^{-1}$$
(from equation (59) of the Matrix Cookbook, on the derivative of an inverse)

Which can be rewritten in our case as:
\begin{equation} \partial \mathbf{{\Sigma_n^{3D}}^{-1}} = -\mathbf{{\Sigma_n^{3D}}^{-1}} \space \partial \mathbf{(\Sigma_n^{3D})} \space \space \mathbf{{\Sigma_n^{3D}}^{-1}} \end{equation}

From these, using the Frobenius inner product (see the gsplat paper [2:1] for an introduction if needed) and letting $H = \frac{\partial \sigma_n}{\partial {\Sigma_n^{3D}}^{-1}}$, we have:

σn=σnΣn3D1,Σn3D1=H,Σn3D1=H,Σn3D1(Σn3D)Σn3D1=(Σn3D1)H,(Σn3D)Σn3D1=(Σn3D1)H(Σn3D1),(Σn3D)\begin{equation} \begin{split} \partial \sigma_n &= \left\langle \frac{\partial \sigma_n}{\partial {\Sigma_n^{3D}}^{-1}}, \partial {\Sigma_n^{3D}}^{-1} \right\rangle \\ &= \left\langle H, \partial {\Sigma_n^{3D}}^{-1} \right\rangle \\ &= \left\langle H, -{\Sigma_n^{3D}}^{-1} \partial (\Sigma_n^{3D}) {\Sigma_n^{3D}}^{-1} \right\rangle \\ &= \left\langle -({\Sigma_n^{3D}}^{-1})^{\top} H, \partial (\Sigma_n^{3D}) {\Sigma_n^{3D}}^{-1} \right\rangle \\ &= \left\langle -({\Sigma_n^{3D}}^{-1})^{\top} H ({\Sigma_n^{3D}}^{-1})^{\top}, \partial (\Sigma_n^{3D}) \right\rangle \end{split} \end{equation}

Identifying the left factor of the inner product as the gradient with respect to $\Sigma_n^{3D}$, we obtain:
σnΣn3D=(Σn3D1)σnΣn3D1H(Σn3D1)=(Σn3D1)12Δn3D(Δn3D)H(Σn3D1)\begin{equation} \boxed{ \begin{split} \frac{\partial \sigma_n}{\partial {\Sigma_n^{3D}}} &= -({\Sigma_n^{3D}}^{-1})^{\top} \overbrace{\frac{\partial \sigma_n}{\partial {\Sigma_n^{3D}}^{-1}}}^{\text{H}} ({\Sigma_n^{3D}}^{-1})^{\top} \\ &= -({\Sigma_n^{3D}}^{-1})^{\top} \underbrace{\frac{1}{2}\Delta_n^{3D}(\Delta_n^{3D})^{\top}}_{\text{H}} ({\Sigma_n^{3D}}^{-1})^{\top} \end{split}} \end{equation}

Proof 5: Gradient of Loss w.r.t. Gaussian Transformation Matrix ($M_n$)

We need to derive LMn\frac{\partial \mathcal{L}}{\partial M_n} based on the chain rule.

Starting with the relationship $\Sigma_n^{3D} = M_n M_n^T$, we need to apply the chain rule:
$$\frac{\partial \mathcal{L}}{\partial M_n} = \frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}} \frac{\partial \Sigma_n^{3D}}{\partial M_n}$$

To find $\frac{\partial \Sigma_n^{3D}}{\partial M_n}$, we differentiate $\Sigma_n^{3D} = M_n M_n^\top$.

By using the following rules on the Frobenius inner product:

- $\left\langle A, BC \right\rangle = \left\langle B^{\top}A, C \right\rangle = \left\langle AC^{\top}, B \right\rangle$
- $\left\langle A, B \right\rangle = \left\langle A^{\top}, B^{\top} \right\rangle$

Let $F = \frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}$.

We then have:

L=LΣn3D,Σn3D=F,Σn3D=F,(MM)=F,(M)M+F,M(M)=F,(M)M+F,(M(M))=F,(M)M+F,(M)M=FM,(M)+FM,(M)=FM+FM,(M)\begin{equation} \begin{split} \partial \mathcal{L} &= \left\langle \frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}, \partial \Sigma_n^{3D} \right\rangle = \left\langle F, \partial \Sigma_n^{3D} \right\rangle = \left\langle F, \partial (MM^{\top}) \right\rangle \\ &= \left\langle F, \partial (M) M^{\top} \right\rangle + \left\langle F, M \partial (M^{\top}) \right\rangle \\ &= {\color{grey} {\left\langle F, \partial (M) M^{\top} \right\rangle}} + \left\langle F^{\top}, \left( M \partial \left( M^{\top} \right) \right)^{\top} \right\rangle \\ &= {\color{grey} {\left\langle F, \partial (M) M^{\top} \right\rangle}} + \left\langle F^{\top}, \partial(M) M^{\top} \right\rangle \\ &= \left\langle F M, \partial (M) \right\rangle + \left\langle F^{\top}M, \partial(M) \right\rangle \\ &= \left\langle F M + F^{\top}M, \partial(M) \right\rangle \end{split} \end{equation}

If we then substitute $F$ back and take the partial derivative with respect to the transformation matrix (i.e. identify it as the left factor of the inner product):

\begin{equation} \boxed{\begin{split} \frac{\partial \mathcal{L}}{\partial M_n} &= F M_n + F^{\top} M_n \\ &= \frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}} M_n + \left(\frac{\partial \mathcal{L}}{\partial \Sigma_n^{3D}}\right)^{\top} M_n \end{split}} \end{equation}

Proof 6: Gradient with Respect to Quaternion Parameters

For the gradient with respect to the quaternion parameters, we need to derive how changes in the quaternion components affect the rotation matrix $R_n$, which then influences the transformation matrix $M_n$.

Given a quaternion $q_n = [w, x, y, z]$, the rotation matrix $R_n$ is defined as:
Rn=[12(y2+z2)2(xywz)2(xz+wy)2(xy+wz)12(x2+z2)2(yzwx)2(xzwy)2(yz+wx)12(x2+y2)]\begin{equation} R_n = \begin{bmatrix} 1 - 2(y^2 + z^2) & 2(xy - wz) & 2(xz + wy) \\ 2(xy + wz) & 1 - 2(x^2 + z^2) & 2(yz - wx) \\ 2(xz - wy) & 2(yz + wx) & 1 - 2(x^2 + y^2) \end{bmatrix} \end{equation}

Taking the derivative with respect to each quaternion component (w, x, y, z) requires differentiating each element of this matrix.

For example, for the component $w$, the derivation involves calculating $\frac{\partial R_{ij}}{\partial w}$ for each element $R_{ij}$ of the rotation matrix. For instance:
$$\frac{\partial R_{12}}{\partial w} = \frac{\partial}{\partial w}(2(xy - wz)) = -2z$$

Similar calculations for the other elements lead to the complete derivative matrix.

For the component $w$:
\begin{equation} \frac{\partial R_n}{\partial w} = 2 \begin{bmatrix} 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \end{bmatrix} \end{equation}


For the component $x$:
\begin{equation} \frac{\partial R_n}{\partial x} = 2 \begin{bmatrix} 0 & y & z \\ y & -2x & -w \\ z & w & -2x \end{bmatrix} \end{equation}

For the component $y$:
\begin{equation} \frac{\partial R_n}{\partial y} = 2 \begin{bmatrix} -2y & x & w \\ x & 0 & z \\ -w & z & -2y \end{bmatrix} \end{equation}

For the component $z$:
\begin{equation} \frac{\partial R_n}{\partial z} = 2 \begin{bmatrix} -2z & -w & x \\ w & -2z & y \\ x & y & 0 \end{bmatrix} \end{equation}

These derivatives are then used in the chain rule to compute $\frac{\partial \mathcal{L}}{\partial q_i}$ as shown earlier.
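These analytic matrices can be checked numerically against central finite differences, reusing the `rotation_quaternion_jacobians` helper sketched earlier. Note that the raw (un-normalized) rotation formula must be used for the comparison, otherwise the normalization Jacobian leaks into the numeric derivative:

```python
import numpy as np

def quat_to_rot_raw(q):
    """Rotation-matrix formula applied to q = [w, x, y, z] without normalizing."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def check_dR_dq(q, eps=1e-6):
    """Compare the analytic dR/dq matrices with central finite differences."""
    analytic = rotation_quaternion_jacobians(q)   # helper sketched earlier
    numeric = []
    for i in range(4):
        dq = np.zeros(4)
        dq[i] = eps
        numeric.append((quat_to_rot_raw(q + dq) - quat_to_rot_raw(q - dq)) / (2 * eps))
    return analytic, numeric
```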


References


  1. Nicolas Moenne-Loccoz et al., "3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes," arXiv, 2024. https://arxiv.org/abs/2407.07090

  2. Vickie Ye et al., "gsplat: An Open-Source Library for Gaussian Splatting," arXiv, 2024. https://arxiv.org/abs/2409.06765

  3. Kerbl, B., Kopanas, G., Leimkühler, T., & Drettakis, G. (2023). "3D Gaussian Splatting for Real-Time Radiance Field Rendering." ACM Transactions on Graphics, 42(4). Link

  4. Mai, A., Hedman, P., Kopanas, G., Verbin, D., Futschik, D., Xu, Q., Kuester, F., Barron, J. T., & Zhang, Y. (2024). "EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis." arXiv:2410.01804. Link