Spherical counterpart of Gaussian kernel

Heat kernel on high-dimensional spheres

Introduction

Normal distribution may be the most common and important continuous distribution. Gaussian radial basis kernel (RBF) is also widely applied in various kernel methods such as SVM and, kernel PCA. Gaussian kernel is essentially the same as normal density up to a scaling factor such that self-similarity is one.

Cosine similarity is often employed to measure overlaps between high-dimensional feature vectors where vectors in Rn/{0}\mathbb R^n / \{0\}are normalized by their positive 2\ell_2-norms. In this case, the feature or embedding vectors are identified as points on a unit Sn1S^{n-1}.

Generalization of Gaussian kernel

If the feature space is a high-dimensional sphere, what is the equivalent of Gaussian RBF kernel?

von Mises-Fisher

The von Mises-Fisher distribution is directly related to normal distribution by restricting the density to a unit hypersphere. The corresponding kernel is exp(κcosθ)\sim \exp (\kappa \cos \theta) where κ>0\kappa > 0 is the precision parameter. If we rescale the kernel such that self-similarity is one, then we have

Kvmf(x^,y^)=eκ(cosθ1)=eκ(x^y^1)=eκ2x^y^2K_\text{vmf}(\hat x, \hat y) = {\rm e}^{\kappa (\cos \theta - 1)} = {\rm e}^{ \kappa ( \hat x \cdot \hat y - 1)} = {\rm e}^{- \frac{\kappa}{2} \Vert \hat x - \hat y \Vert^2 }

which is Gaussian RBF kernel in disguise with Euclidean distance as the chordal length x^y^\Vert \hat x - \hat y \Vert.

Geodesic Gaussian

If we replace the Euclidean distance squared by geodesic distance squared, we have a measure of similarity exp(κθ2)\sim \exp (- \kappa \theta^2 ). However, the symmetric binary function Sn1×Sn1RS^{n-1} \times S^{n-1} \rightarrow \mathbb R is not positive definite which can be proved using randomly generated examples. Thus, the geodesic Gaussian function is not a reproducing kernel. Hence, it cannot be applied to any of the kernel methods based on the assumption of reproducing kernel Hilbert space (RKHS).

Heat Kernel

Euclidean heat kernel

Normal distribution is the Green's function of heat (diffusion) equation in Euclidean space. If we assume the diffusion constant is isotropic and equals one, then we get a one-parameter family of kernels parameterized by diffusion time tt,

Geuc(xy;t;n)=(14πt)n2exp(xy24t)G_\text{euc} (\Vert x - y \Vert; t; n) = \left ( \frac{1}{4\pi t }\right )^{\frac{n}{2}} \exp \left( - \frac{\Vert x - y \Vert^2}{4t}\right)

Spherical heat kernel

The Laplacian operator in Euclidean space may be interpreted as momentum squared or kinetic energy. The spherical Laplacian operator must be the kinetic energy of a particle on a hypersphere and the kinetic energy is quantum mechanical angular momentum squared. For more details, please refer to Exact Heat Kernel on a Hypersphere and Its Applications in Kernel SVM. The heat kernel on Sn1S^{n-1} takes the exact expansion

Gsph(x^y^;t;n1)==0e(+n2)t2+n2n21ASn1Cn21(x^y^)G_\text{sph} (\hat x \cdot \hat y; t; n-1) = \sum_{\ell = 0}^\infty {\rm e}^{-\ell (\ell + n - 2) t} \frac{2\ell + n - 2}{n - 2} \frac{1}{A_{S^{n-1}}} C_{\ell}^{\frac{n}{2} - 1}(\hat x \cdot \hat y)

where ASn1A_{S^{n-1}} is area of Sn1S^{n-1} and Cα(w)C_\ell^\alpha (w) are Gegenbauer polynomials.

For very large time, the heat must be uniform on the sphere. It is easy to verify that when tt\uparrow \infty the heat kernel reduces to uniform distribution 1/ASn11/A_{S^{n-1}}.

The kernel was obtained through eigenfunction expansion with positive eigenvalues. Thus, it is not only positive definite but also a Mercer kernel.

If we rescale the kernel such that self-similarity is unity, we have

Ksph(x^y^;t)=Gsph(x^y^;t)/Gsph(1;t).K_\text{sph}(\hat x \cdot \hat y; t) = G_\text{sph}(\hat x \cdot \hat y; t)/G_\text{sph}(1; t).

Reduction to Euclidean heat kernel within a small vicinity

When x^y^cosθ1\hat x \cdot \hat y \equiv \cos \theta \approx 1, the hypersphere is locally homeomorphic to Rn1\mathbb R^{n-1}and the physics of heat diffusion should reduce to that of Euclidean space as well. Therefore, we need to show that the exact hyperspherical heat kernel reduces to Euclidean heat kernel for θ0\theta \downarrow 0.

First, let x^y^=1ε\hat x \cdot \hat y = 1 - \varepsilon then

Gsph(1ε;t)==0e(+n2)t2+n2n21ASn1Cn21(1ε).G_\text{sph} (1 - \varepsilon; t) = \sum_{\ell = 0}^\infty {\rm e}^{-\ell (\ell + n - 2) t} \frac{2\ell + n - 2}{n - 2} \frac{1}{A_{S^{n-1}}} C_{\ell}^{\frac{n}{2} - 1}(1 - \varepsilon).

Gegenbauer polynomials can be expressed using hypergeometric functions

Keep only first order terms, and letx=1εx=1-\varepsilonand α=n21\alpha = \frac{n}{2} - 1

Cn21(1ε)=(+2n3)(1(+n2)n1ε+O(ε2)).C_{\ell}^{\frac{n}{2} - 1}(1 - \varepsilon) = \binom{\ell + 2n - 3}{\ell}\left( 1 - \frac{\ell (\ell + n - 2)}{n-1}\varepsilon + \mathcal O (\varepsilon^2)\right).

The heat kernel becomes

Gsph(1ε;t)=Gsph(1;t)+1n1tGsph(1;t)ε+O(ε2)Gsph(1;t)e1n1tlogGsph(1;t)εG_\text{sph} (1 - \varepsilon; t) = G_\text{sph} (1; t)+ \frac{1}{n-1}\partial_t G_\text{sph} (1; t) \varepsilon + \mathcal O(\varepsilon^2) \approx G_\text{sph} (1; t) {\rm e}^{\frac{1}{n-1} \partial_t \log G_\text{sph} (1; t) \varepsilon}

Set ε=θ2/2\varepsilon = \theta^2 / 2 and f(t)=1/Gsph(1;t)f(t) = 1 / G_\text{sph}(1; t). In geodesic spherical coordinates, normalization of the density requires

f(t)=(2πn1tlogf(t))n12f(t) = \left (2\pi \frac{n-1}{\partial_t \log f(t)} \right)^{\frac{n-1}{2}}

which is a nonlinear differential equation for f(t)f(t). Let f(t)=Ctαf(t) = C t^\alpha, then we have the equality

C2n1α2π(n1)=t12αn1=1\frac{C^\frac{2}{n-1} \alpha}{2\pi (n-1)} = t^{1 - \frac{2 \alpha}{n-1}}=1

which gives α=(n1)/2\alpha = (n-1)/2 and C=(4π)(n1)/2C=(4\pi)^{(n-1)/2} or f(t)=(4πt)(n1)/2f(t) = (4\pi t)^{(n-1)/2}.

Thus, for small geodesic distance, we have

Gsph(cosθ;t;n1)=Geuc(θ;t;n1)+O(θ4).G_\text{sph}(\cos\theta; t; n-1) = G_\text{euc}(\theta; t; n-1) + \mathcal O (\theta^4).

Hence, we have shown that the spherical heat kernel is indeed an extension of Euclidean heat kernel to the whole hypersphere.

Last updated

Was this helpful?