Projected normal distribution

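Notation	${\mathcal {PN}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$
Parameters	${\boldsymbol {\mu }}\in \mathbb {R} ^{n}$ (location); ${\boldsymbol {\Sigma }}\in \mathbb {R} ^{n\times n}$ (scale)
Support	${\boldsymbol {\theta }}\in [0,\pi ]^{n-2}\times [0,2\pi )$
PDF	complicated, see text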

In directional statistics, the projected normal distribution (also known as offset normal distribution, angular normal distribution or angular Gaussian distribution)^[1]^[2] is a probability distribution over directions that describes the radial projection of a random variable with an n-variate normal distribution onto the unit (n-1)-sphere.

Definition and properties

Given a random variable ${\boldsymbol {X}}\in \mathbb {R} ^{n}$ that follows a multivariate normal distribution ${\mathcal {N}}_{n}({\boldsymbol {\mu }},\,{\boldsymbol {\Sigma }})$ , the projected normal distribution ${\mathcal {PN}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ represents the distribution of the random variable ${\boldsymbol {Y}}={\frac {\boldsymbol {X}}{\lVert {\boldsymbol {X}}\rVert }}$ obtained by projecting ${\boldsymbol {X}}$ radially onto the unit sphere. In the general case, the projected normal distribution can be asymmetric and multimodal. If ${\boldsymbol {\mu }}$ is parallel to an eigenvector of ${\boldsymbol {\Sigma }}$ , the distribution is symmetric.^[3] The first version of this distribution was introduced by Pukkila and Rao (1988).^[4]
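The definition translates directly into a sampling recipe: draw from the generating normal distribution and divide by the Euclidean norm. The following is a minimal sketch, assuming NumPy; the function name `sample_projected_normal` and the parameter values are illustrative choices, not part of the cited sources.

```python
import numpy as np

def sample_projected_normal(mu, sigma, size, rng):
    """Draw `size` unit vectors Y = X / ||X|| with X ~ N(mu, sigma)."""
    x = rng.multivariate_normal(mu, sigma, size=size)      # shape (size, n)
    return x / np.linalg.norm(x, axis=1, keepdims=True)    # radial projection

rng = np.random.default_rng(0)
mu = np.array([1.0, 0.5, -0.2])                # illustrative location
sigma = np.array([[1.0, 0.3, 0.0],             # illustrative positive-definite scale
                  [0.3, 2.0, 0.1],
                  [0.0, 0.1, 0.5]])
y = sample_projected_normal(mu, sigma, size=1000, rng=rng)
```

Every row of `y` lies on the unit 2-sphere; only its direction carries information about the original draw.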

Density function

The density of the projected normal distribution ${\mathcal {PN}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ can be constructed from the density of the generating n-variate normal distribution ${\mathcal {N}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ by re-parametrising to n-dimensional spherical coordinates and then integrating over the radial coordinate.

In spherical coordinates with radial component $r\in [0,\infty )$ and angles ${\boldsymbol {\theta }}=(\theta _{1},\dots ,\theta _{n-1})\in [0,\pi ]^{n-2}\times [0,2\pi )$ , a point ${\boldsymbol {x}}=(x_{1},\dots ,x_{n})\in \mathbb {R} ^{n}$ can be written as ${\boldsymbol {x}}=r{\boldsymbol {v}}$ , with $\lVert {\boldsymbol {v}}\rVert =1$ . The joint density becomes

p(r,{\boldsymbol {\theta }}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }})={\frac {r^{n-1}}{{\sqrt {|{\boldsymbol {\Sigma }}|}}(2\pi )^{\frac {n}{2}}}}e^{-{\frac {1}{2}}(r{\boldsymbol {v}}-{\boldsymbol {\mu }})^{\top }\Sigma ^{-1}(r{\boldsymbol {v}}-{\boldsymbol {\mu }})}

and the density of ${\mathcal {PN}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ can then be obtained as^[5]

p({\boldsymbol {\theta }}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }})=\int _{0}^{\infty }p(r,{\boldsymbol {\theta }}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }})dr.

The same density had been previously obtained in Pukkila and Rao (1988, Eq. (2.4))^[4] using a different notation.
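The radial integral can also be approximated numerically, which gives a reference value for any dimension. The sketch below assumes NumPy and uses a plain trapezoidal rule on a truncated range $[0,r_{\max }]$; the function name `pn_density_numeric`, the truncation point and the grid size are illustrative assumptions. As a sanity check it integrates the resulting circular density ($n=2$) over the angle.

```python
import numpy as np

def pn_density_numeric(v, mu, sigma, r_max=20.0, num=4001):
    """Trapezoidal approximation of int_0^inf r^(n-1) N(r v | mu, sigma) dr."""
    n = len(mu)
    prec = np.linalg.inv(sigma)
    r = np.linspace(0.0, r_max, num)
    d = r[:, None] * v[None, :] - mu[None, :]           # row i is r_i * v - mu
    quad = np.einsum("ij,jk,ik->i", d, prec, d)          # quadratic forms in the exponent
    norm_const = np.sqrt(np.linalg.det(sigma)) * (2.0 * np.pi) ** (n / 2)
    integrand = r ** (n - 1) * np.exp(-0.5 * quad) / norm_const
    dr = r[1] - r[0]
    return dr * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

mu = np.array([1.0, 0.5])                      # illustrative parameters
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
thetas = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
values = [pn_density_numeric(np.array([np.cos(t), np.sin(t)]), mu, sigma)
          for t in thetas]
total = float(np.sum(values) * (2.0 * np.pi / len(thetas)))   # should be close to 1
```

The truncation is only safe when $r_{\max }$ is large compared with $\lVert {\boldsymbol {\mu }}\rVert$ and the largest standard deviation of ${\boldsymbol {\Sigma }}$; for other parameter values it would need to be adjusted.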

Circular distribution

Parametrising the position on the unit circle in polar coordinates as ${\boldsymbol {v}}=(\cos \theta ,\sin \theta )$ , the density function can be written with respect to the parameters ${\boldsymbol {\mu }}$ and ${\boldsymbol {\Sigma }}$ of the initial normal distribution as

p(\theta |{\boldsymbol {\mu }},{\boldsymbol {\Sigma }})={\frac {e^{-{\frac {1}{2}}{\boldsymbol {\mu }}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {\mu }}}}{2\pi {\sqrt {|{\boldsymbol {\Sigma }}|}}{\boldsymbol {v}}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {v}}}}\left(1+T(\theta ){\frac {\Phi (T(\theta ))}{\phi (T(\theta ))}}\right)I_{[0,2\pi )}(\theta )

where $\phi$ and $\Phi$ are the density and cumulative distribution of a standard normal distribution, $T(\theta )={\frac {{\boldsymbol {v}}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {\mu }}}{\sqrt {{\boldsymbol {v}}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {v}}}}}$ , and $I$ is the indicator function.^[3]
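The closed form above can be evaluated with elementary functions. The following sketch, assuming NumPy and the standard-library `math` module, implements it term by term and checks numerically that it integrates to one over $[0,2\pi )$; the function name `pn_circular_density` and the parameter values are illustrative.

```python
import math
import numpy as np

def pn_circular_density(theta, mu, sigma):
    """Closed-form density of PN_2(mu, sigma) at angle theta."""
    prec = np.linalg.inv(sigma)
    v = np.array([math.cos(theta), math.sin(theta)])
    a = v @ prec @ v                                   # v' Sigma^-1 v
    t = (v @ prec @ mu) / math.sqrt(a)                 # T(theta)
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)   # standard normal pdf
    Phi = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))          # standard normal cdf
    pref = math.exp(-0.5 * mu @ prec @ mu) / (
        2.0 * math.pi * math.sqrt(np.linalg.det(sigma)) * a)
    return pref * (1.0 + t * Phi / phi)

mu = np.array([1.0, 0.5])                      # illustrative parameters
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
thetas = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
total = sum(pn_circular_density(t, mu, sigma) for t in thetas) * (2.0 * np.pi / 2000)
```

The ratio $\Phi /\phi$ is computed naively here; for strongly negative $T(\theta )$ it loses precision, so a production implementation would likely want a more careful evaluation.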

In the circular case, if the mean vector ${\boldsymbol {\mu }}$ is parallel to the eigenvector associated with the largest eigenvalue of the covariance, the distribution is symmetric and has a mode at $\theta =\alpha$ and either a mode or an antimode at $\theta =\alpha +\pi$ , where $\alpha$ is the polar angle of ${\boldsymbol {\mu }}=(r\cos \alpha ,r\sin \alpha )$ . If the mean is instead parallel to the eigenvector associated with the smallest eigenvalue, the distribution is also symmetric but has either a mode or an antimode at $\theta =\alpha$ and an antimode at $\theta =\alpha +\pi$ .^[6]

Spherical distribution

Parametrising the position on the unit sphere in spherical coordinates as ${\boldsymbol {v}}=(\cos \theta _{1}\sin \theta _{2},\sin \theta _{1}\sin \theta _{2},\cos \theta _{2})$ where ${\boldsymbol {\theta }}=(\theta _{1},\theta _{2})$ are the azimuth $\theta _{1}\in [0,2\pi )$ and inclination $\theta _{2}\in [0,\pi ]$ angles respectively, the density function becomes

p({\boldsymbol {\theta }}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }})={\frac {e^{-{\frac {1}{2}}{\boldsymbol {\mu }}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {\mu }}}}{{\sqrt {|{\boldsymbol {\Sigma }}|}}\left(2\pi {\boldsymbol {v}}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {v}}\right)^{\frac {3}{2}}}}\left({\frac {\Phi (T({\boldsymbol {\theta }}))}{\phi (T({\boldsymbol {\theta }}))}}+T({\boldsymbol {\theta }})\left(1+T({\boldsymbol {\theta }}){\frac {\Phi (T({\boldsymbol {\theta }}))}{\phi (T({\boldsymbol {\theta }}))}}\right)\right)I_{[0,2\pi )}(\theta _{1})I_{[0,\pi ]}(\theta _{2})

where $\phi$ , $\Phi$ , $T$ , and $I$ have the same meaning as the circular case.^[7]
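A vectorised sketch of the spherical formula is given here, assuming NumPy. Because the joint density carries the factor $r^{n-1}$ rather than the full spherical Jacobian, the closed form is read in this example as a density with respect to the surface measure of the sphere, so the normalisation check integrates it against the area element $\sin \theta _{2}\,d\theta _{1}\,d\theta _{2}$; that reading, the function name `pn_spherical_density` and the parameter values are assumptions of this example.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)   # elementwise error function without extra dependencies

def pn_spherical_density(theta1, theta2, mu, sigma):
    """Closed-form PN_3 density at azimuth theta1 and inclination theta2 (arrays allowed)."""
    theta1, theta2 = np.broadcast_arrays(theta1, theta2)
    v = np.stack([np.cos(theta1) * np.sin(theta2),
                  np.sin(theta1) * np.sin(theta2),
                  np.cos(theta2)], axis=-1)
    prec = np.linalg.inv(sigma)
    a = np.einsum("...i,ij,...j->...", v, prec, v)      # v' Sigma^-1 v
    t = (v @ prec @ mu) / np.sqrt(a)                     # T(theta)
    phi = np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)
    Phi = 0.5 * (1.0 + _erf(t / np.sqrt(2.0)))
    ratio = Phi / phi
    pref = np.exp(-0.5 * mu @ prec @ mu) / (
        np.sqrt(np.linalg.det(sigma)) * (2.0 * np.pi * a) ** 1.5)
    return pref * (ratio + t * (1.0 + t * ratio))

mu = np.array([1.0, 0.5, -0.2])                # illustrative parameters
sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.1],
                  [0.0, 0.1, 0.5]])
n1, n2 = 200, 200                              # midpoint grid over azimuth and inclination
t1 = (np.arange(n1) + 0.5) * 2.0 * np.pi / n1
t2 = (np.arange(n2) + 0.5) * np.pi / n2
g1, g2 = np.meshgrid(t1, t2, indexing="ij")
p = pn_spherical_density(g1, g2, mu, sigma)
total = float(np.sum(p * np.sin(g2)) * (2.0 * np.pi / n1) * (np.pi / n2))
```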

Angular central Gaussian distribution

In the special case ${\boldsymbol {\mu }}=\mathbf {0}$ , the projected normal distribution with $n\geq 2$ is known as the angular central Gaussian (ACG) distribution,^[8] and its density function can be obtained in closed form as a function of Cartesian coordinates. Let $\mathbf {x} \sim {\mathcal {N}}_{n}(\mathbf {0} ,{\boldsymbol {\Sigma }})$ and project radially, $\mathbf {v} =\lVert \mathbf {x} \rVert ^{-1}\mathbf {x}$ , so that $\mathbf {v} \in \mathbb {S} ^{n-1}=\{\mathbf {z} \in \mathbb {R} ^{n}:\lVert \mathbf {z} \rVert =1\}$ (the unit hypersphere). We write $\mathbf {v} \sim \operatorname {ACG} ({\boldsymbol {\Sigma }})$ , which, as explained above, has density (with respect to the Lebesgue measure pulled back to $\mathbb {S} ^{n-1}$ ):

p_{\text{ACG}}(\mathbf {v} \mid {\boldsymbol {\Sigma }})=\int _{0}^{\infty }r^{n-1}{\mathcal {N}}_{n}(r\mathbf {v} \mid \mathbf {0} ,{\boldsymbol {\Sigma }})\,dr={\frac {\Gamma ({\frac {n}{2}})}{2\pi ^{\frac {n}{2}}}}\left|{\boldsymbol {\Sigma }}\right|^{-{\frac {1}{2}}}(\mathbf {v} '{\boldsymbol {\Sigma }}^{-1}\mathbf {v} )^{-{\frac {n}{2}}}

where the integral can be solved by a change of variables and then using the standard definition of the gamma function. Notice that:

For any $k>0$ there is a parameter indeterminacy, since the density is invariant to rescaling of ${\boldsymbol {\Sigma }}$ :

p_{\text{ACG}}(\mathbf {v} \mid k{\boldsymbol {\Sigma }})=p_{\text{ACG}}(\mathbf {v} \mid {\boldsymbol {\Sigma }})

If ${\boldsymbol {\Sigma }}=k\mathbf {I} _{n}$ , the uniform distribution $\operatorname {ACG} (\mathbf {I} _{n})$ results, with constant density equal to the reciprocal of the surface area of $\mathbb {S} ^{n-1}$ :

p_{\text{ACG}}(\mathbf {v} \mid k\mathbf {I} _{n})=p_{\text{uniform}}={\frac {\Gamma ({\frac {n}{2}})}{2\pi ^{\frac {n}{2}}}}
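Both properties are easy to confirm numerically from the closed-form ACG density. The following is a minimal sketch, assuming NumPy; the function name `acg_density` and the test values are illustrative.

```python
import math
import numpy as np

def acg_density(v, sigma):
    """Closed-form ACG density at the unit vector v."""
    n = len(v)
    prec = np.linalg.inv(sigma)
    const = math.gamma(n / 2) / (2.0 * math.pi ** (n / 2))   # 1 / area of S^(n-1)
    return const / math.sqrt(np.linalg.det(sigma)) * (v @ prec @ v) ** (-n / 2)

v = np.array([1.0, 2.0, 2.0]) / 3.0            # a unit vector in R^3
sigma = np.array([[1.0, 0.3, 0.0],             # illustrative positive-definite matrix
                  [0.3, 2.0, 0.1],
                  [0.0, 0.1, 0.5]])

p_original = acg_density(v, sigma)
p_rescaled = acg_density(v, 3.7 * sigma)       # same value: the scale of sigma is not identified
p_identity = acg_density(v, np.eye(3))         # uniform density on S^2, i.e. 1 / (4 pi)
```

The rescaling invariance arises because the determinant contributes a factor $k^{-n/2}$ and the quadratic form a factor $k^{n/2}$, which cancel.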

ACG via transformation of normal or uniform variates

Let $\mathbf {T}$ be any $n$ -by- $n$ invertible matrix such that $\mathbf {T} \mathbf {T} '={\boldsymbol {\Sigma }}$ . Let $\mathbf {u} \sim \operatorname {ACG} (\mathbf {I} _{n})$ (uniform) and, independently, $s\sim \chi (n)$ (chi distribution), so that $\mathbf {x} =s\mathbf {Tu} \sim {\mathcal {N}}_{n}(\mathbf {0} ,{\boldsymbol {\Sigma }})$ (multivariate normal). Now consider:

\mathbf {v} ={\frac {\mathbf {Tu} }{\lVert \mathbf {Tu} \rVert }}={\frac {\mathbf {x} }{\lVert \mathbf {x} \rVert }}\sim \operatorname {ACG} ({\boldsymbol {\Sigma }})

which shows that the ACG distribution also results from applying, to uniform variates, the normalized linear transform:^[8]

f_{\mathbf {T} }(\mathbf {u} )={\frac {\mathbf {Tu} }{\lVert \mathbf {Tu} \rVert }}

Some further explanation of these two ways to obtain $\mathbf {v} \sim \operatorname {ACG} ({\boldsymbol {\Sigma }})$ may be helpful:

If we start with $\mathbf {x} \in \mathbb {R} ^{n}$ , sampled from a multivariate normal, we can project radially onto $\mathbb {S} ^{n-1}$ to obtain ACG variates. To derive the ACG density, we first do a change of variables, $\mathbf {x} \mapsto (r,\mathbf {v} )$ , which is still an $n$ -dimensional representation. This transformation induces the differential volume change factor $r^{n-1}$ , which is proportional to volume in the $(n-1)$ -dimensional tangent space perpendicular to $\mathbf {x}$ . Then, to finally obtain the ACG density on the $(n-1)$ -dimensional unit sphere, we need to marginalize over $r$ .
If we start with $\mathbf {u} \in \mathbb {S} ^{n-1}$ , sampled from the uniform distribution, we do not need to marginalize, because we are already in $n-1$ dimensions. Instead, to obtain ACG variates (and the associated density), we can directly do the change of variables, $\mathbf {v} =f_{\mathbf {T} }(\mathbf {u} )$ , for which further details are given in the next subsection.
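The two routes described above can be compared empirically. The sketch below, assuming NumPy, draws ACG variates once by projecting normal samples and once by applying the normalized linear transform to uniform samples, and then compares the empirical second-moment matrices of $\mathbf {v} \mathbf {v} '$ . Taking $\mathbf {T}$ as the Cholesky factor and generating uniform directions by normalizing standard normal vectors are choices made for this example.

```python
import numpy as np

def normalize(x):
    """Project the rows of x radially onto the unit sphere."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(1)
sigma = np.array([[1.0, 0.3, 0.0],             # illustrative positive-definite matrix
                  [0.3, 2.0, 0.1],
                  [0.0, 0.1, 0.5]])
T = np.linalg.cholesky(sigma)                  # one valid choice with T T' = sigma
N = 200_000

# Route 1: sample x ~ N(0, sigma) and project radially.
v_normal = normalize(rng.multivariate_normal(np.zeros(3), sigma, size=N))

# Route 2: sample u uniformly on the sphere and apply f_T(u) = T u / ||T u||.
u = normalize(rng.standard_normal((N, 3)))
v_uniform = normalize(u @ T.T)

m_normal = v_normal.T @ v_normal / N           # empirical E[v v'] for route 1
m_uniform = v_uniform.T @ v_uniform / N        # empirical E[v v'] for route 2
max_gap = float(np.abs(m_normal - m_uniform).max())
```

With this sample size the two moment matrices should agree up to Monte Carlo error of order $10^{-3}$ ; agreement of second moments is a consistency check, not a proof that the two distributions coincide.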

Caveat: when ${\boldsymbol {\mu }}$ is nonzero, although $s\mathbf {Tu} +{\boldsymbol {\mu }}\sim {\mathcal {N}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ , a similar duality does not hold:

{\frac {\mathbf {Tu} +{\boldsymbol {\mu }}}{\lVert \mathbf {Tu} +{\boldsymbol {\mu }}\rVert }}\neq {\frac {s\mathbf {Tu} +{\boldsymbol {\mu }}}{\lVert s\mathbf {Tu} +{\boldsymbol {\mu }}\rVert }}\sim {\mathcal {PN}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})

Although we can radially project affine-transformed normal variates to get ${\mathcal {PN}}_{n}$ variates, this does not work for uniform variates.

Wider application of the normalized linear transform

The normalized linear transform, $\mathbf {v} =f_{\mathbf {T} }(\mathbf {u} )$ , is a bijection from the unit sphere to itself; the inverse is $\mathbf {u} =f_{\mathbf {T} ^{-1}}(\mathbf {v} )$ . This transform is of independent interest, as it may be applied as a probabilistic flow on the hypersphere (similar to a normalizing flow) to generalize other (non-uniform) distributions on hyperspheres, for example the von Mises–Fisher distribution. Because the ACG density is available in closed form, the differential volume change induced by this transform can also be recovered in closed form.

For the change of variables, $\mathbf {v} =f_{\mathbf {T} }(\mathbf {u} )$ on the manifold, $\mathbb {S} ^{n-1}$ , the uniform and ACG densities are related as:^[9]

p_{\text{ACG}}(\mathbf {v} \mid {\boldsymbol {\Sigma }})={\frac {p_{\text{uniform}}}{R(\mathbf {v} ,{\boldsymbol {\Sigma }})}}

where the (constant) uniform density is $p_{\text{uniform}}={\frac {\Gamma (n/2)}{2\pi ^{n/2}}}$ and where $R(\mathbf {v} ,{\boldsymbol {\Sigma }})$ is the differential volume change factor from the input to the output of the transformation; specifically, it is given by the absolute value of the determinant of an $(n-1)$ -by- $(n-1)$ matrix:

R(\mathbf {v} ,{\boldsymbol {\Sigma }})=\operatorname {abs} \left|\mathbf {Q} _{\mathbf {v} }'\mathbf {J} _{\mathbf {u} }\mathbf {Q} _{\mathbf {u} }\right|

where $\mathbf {J} _{\mathbf {u} }$ is the $n$ -by- $n$ Jacobian matrix of the transformation in Euclidean space, $f_{\mathbf {T} }:\mathbb {R} ^{n}\to \mathbb {R} ^{n}$ , evaluated at $\mathbf {u}$ . In Euclidean space, the transformation and its Jacobian are non-invertible, but when the domain and co-domain are restricted to $\mathbb {S} ^{n-1}$ , then $f_{\mathbf {T} }:\mathbb {S} ^{n-1}\to \mathbb {S} ^{n-1}$ is a bijection and the induced differential volume ratio, $R(\mathbf {v} ,{\boldsymbol {\Sigma }})$ is obtained by projecting $\mathbf {J} _{\mathbf {u} }$ onto the $(n-1)$ -dimensional tangent spaces at the transformation input and output: $\mathbf {Q} _{\mathbf {u} },\mathbf {Q} _{\mathbf {v} }$ are $n$ -by- $(n-1)$ matrices whose orthonormal columns span the tangent spaces. Although the above determinant formula is relatively easy to evaluate numerically on a software platform equipped with linear algebra and automatic differentiation, a simple closed form is hard to derive directly. However, since we already have $p_{\text{ACG}}$ , we can recover:

R(\mathbf {v} ,{\boldsymbol {\Sigma }})=\left|{\boldsymbol {\Sigma }}\right|^{\frac {1}{2}}(\mathbf {v} '{\boldsymbol {\Sigma }}^{-1}\mathbf {v} )^{\frac {n}{2}}={\frac {\operatorname {abs} \left|\mathbf {T} \right|}{\lVert \mathbf {Tu} \rVert ^{n}}}

where in the final RHS it is understood that ${\boldsymbol {\Sigma }}=\mathbf {T} \mathbf {T} '$ and $\mathbf {u} =f_{\mathbf {T} ^{-1}}(\mathbf {v} )$ .
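The agreement between the determinant formula and the closed form can be checked numerically. The sketch below assumes NumPy and, instead of automatic differentiation, uses the analytic Euclidean Jacobian $\mathbf {J} _{\mathbf {u} }=(\mathbf {I} -\mathbf {v} \mathbf {v} ')\mathbf {T} /\lVert \mathbf {Tu} \rVert$ , which is derived here for the example and not taken from the cited sources. The tangent bases are read off a singular value decomposition; all function names are illustrative.

```python
import numpy as np

def tangent_basis(p):
    """n-by-(n-1) matrix whose orthonormal columns span the tangent space at unit vector p."""
    _, _, vh = np.linalg.svd(p[None, :])       # vh is n-by-n; its first row is +/- p
    return vh[1:].T

def f(T, u):
    """Normalized linear transform f_T(u) = T u / ||T u||."""
    w = T @ u
    return w / np.linalg.norm(w)

def jacobian(T, u):
    """Euclidean Jacobian of f_T at u (analytic form, an assumption of this sketch)."""
    w = T @ u
    norm_w = np.linalg.norm(w)
    v = w / norm_w
    return (np.eye(len(u)) - np.outer(v, v)) @ T / norm_w

rng = np.random.default_rng(2)
n = 4
T = rng.standard_normal((n, n)) + 2.0 * np.eye(n)    # an arbitrary invertible matrix
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
v = f(T, u)
sigma = T @ T.T

# Determinant of the Jacobian projected onto the two tangent spaces.
R_det = abs(np.linalg.det(tangent_basis(v).T @ jacobian(T, u) @ tangent_basis(u)))
# The two closed forms.
R_sigma = np.sqrt(np.linalg.det(sigma)) * (v @ np.linalg.inv(sigma) @ v) ** (n / 2)
R_T = abs(np.linalg.det(T)) / np.linalg.norm(T @ u) ** n
```

The three values agree to floating-point precision. The result does not depend on which orthonormal tangent bases are chosen, because a change of basis only multiplies the determinant by $\pm 1$ .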

The normalized linear transform can now be used, for example, to give a closed-form density for a more flexible distribution on the hypersphere that generalizes the von Mises–Fisher distribution. Let $\mathbf {x} \sim {\text{VMF}}({\boldsymbol {\mu }},\kappa )$ and $\mathbf {v} =f_{\mathbf {T} }(\mathbf {x} )$ ; the resulting density is:

p(\mathbf {v} \mid {\boldsymbol {\mu }},\kappa ,\mathbf {T} )={\frac {p_{\text{VMF}}{\bigl (}f_{\mathbf {T} ^{-1}}(\mathbf {v} )\mid {\boldsymbol {\mu }},\kappa {\bigr )}}{R(\mathbf {v} ,\mathbf {T} \mathbf {T} ')}}
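As an illustration for $n=3$ , the sketch below, assuming NumPy, evaluates this transformed density on $\mathbb {S} ^{2}$ and checks that it integrates to one over the sphere. It relies on the standard closed form of the von Mises–Fisher density on $\mathbb {S} ^{2}$ , $p_{\text{VMF}}(\mathbf {x} )={\frac {\kappa }{4\pi \sinh \kappa }}e^{\kappa {\boldsymbol {\mu }}'\mathbf {x} }$ , and on the identity $\lVert \mathbf {Tu} \rVert =1/\lVert \mathbf {T} ^{-1}\mathbf {v} \rVert$ ; the function names and the values of ${\boldsymbol {\mu }}$ , $\kappa$ and $\mathbf {T}$ are illustrative.

```python
import numpy as np

def vmf_density_s2(x, mu, kappa):
    """von Mises-Fisher density on the 2-sphere (rows of x are unit vectors)."""
    return kappa / (4.0 * np.pi * np.sinh(kappa)) * np.exp(kappa * (x @ mu))

def transformed_vmf_density(v, mu, kappa, T):
    """Density of v = f_T(x) with x ~ VMF(mu, kappa), for n = 3."""
    w = v @ np.linalg.inv(T).T                         # rows are T^-1 v
    norm_w = np.linalg.norm(w, axis=-1)
    u = w / norm_w[..., None]                          # u = f_{T^-1}(v)
    R = abs(np.linalg.det(T)) * norm_w ** 3            # |T| / ||T u||^3 with ||T u|| = 1 / ||T^-1 v||
    return vmf_density_s2(u, mu, kappa) / R

mu = np.array([0.0, 0.0, 1.0])                 # illustrative mean direction (unit vector)
kappa = 2.0                                    # illustrative concentration
T = np.array([[1.5, 0.2, 0.0],                 # illustrative invertible matrix
              [0.0, 1.0, 0.3],
              [0.1, 0.0, 0.7]])

n1, n2 = 200, 200                              # midpoint grid over azimuth and inclination
t1 = (np.arange(n1) + 0.5) * 2.0 * np.pi / n1
t2 = (np.arange(n2) + 0.5) * np.pi / n2
g1, g2 = np.meshgrid(t1, t2, indexing="ij")
v = np.stack([np.cos(g1) * np.sin(g2), np.sin(g1) * np.sin(g2), np.cos(g2)], axis=-1)
p = transformed_vmf_density(v, mu, kappa, T)
total = float(np.sum(p * np.sin(g2)) * (2.0 * np.pi / n1) * (np.pi / n2))
```

With $\mathbf {T} =\mathbf {I}$ the factor $R$ equals one and the function reduces to the ordinary von Mises–Fisher density.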

References

[1] Wang & Gelfand 2013.
[2] Pukkila & Rao 1988.
[3] Hernandez-Stumpfhauser, Breidt & van der Woerd 2017, p. 115.
[4] Pukkila & Rao 1988, p. 381.
[5] Hernandez-Stumpfhauser, Breidt & van der Woerd 2017, p. 117.
[6] Hernandez-Stumpfhauser, Breidt & van der Woerd 2017, Supplementary material, p. 1.
[7] Hernandez-Stumpfhauser, Breidt & van der Woerd 2017, p. 123.
[8] Tyler 1987.
[9] Sorrenson et al. 2024, Appendix A.

Sources

Pukkila, Tarmo M.; Rao, C. Radhakrishna (1988). "Pattern recognition based on scale invariant discriminant functions". Information Sciences. 45 (3): 379–389. doi:10.1016/0020-0255(88)90012-6.
Hernandez-Stumpfhauser, Daniel; Breidt, F. Jay; van der Woerd, Mark J. (2017). "The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference". Bayesian Analysis. 12 (1): 113–133. doi:10.1214/15-BA989.
Wang, Fangpo; Gelfand, Alan E (2013). "Directional data analysis under the general projected normal distribution". Statistical Methodology. 10 (1). Elsevier: 113–127. doi:10.1016/j.stamet.2012.07.005. PMC 3773532. PMID 24046539.
Tyler, David E (1987). "Statistical analysis for the angular central Gaussian distribution on the sphere". Biometrika. 74 (3): 579–589. doi:10.2307/2336697.
Sorrenson, Peter; Draxler, Felix; Rousselot, Armand; Hummerich, Sander; Köthe, Ullrich (2024). "Learning Distributions on Manifolds with Free-Form Flows". arXiv:2312.09852 [cs.LG].
