☕
Machine Learning Demystified
  • Introduction
  • Blogs
    • Spherical counterpart of Gaussian kernel
    • Vanilla Kalman Filter
    • Support Vector Machine
    • One Class SVM
    • Differentiable Bayesian Structure Learning
    • Stein Variational Gradient Descent
    • Finite Step Unconditional Markov Diffusion Models
Powered by GitBook
On this page
  • Introduction
  • Generalization of Gaussian kernel
  • von Mises-Fisher
  • Geodesic Gaussian
  • Heat Kernel

Was this helpful?

Edit on GitHub
  1. Blogs

Spherical counterpart of Gaussian kernel

Heat kernel on high-dimensional spheres

PreviousIntroductionNextVanilla Kalman Filter

Last updated 4 years ago

Was this helpful?

Introduction

Normal distribution may be the most common and important continuous distribution. Gaussian radial basis kernel (RBF) is also widely applied in various kernel methods such as SVM and, kernel PCA. Gaussian kernel is essentially the same as normal density up to a scaling factor such that self-similarity is one.

Cosine similarity is often employed to measure overlaps between high-dimensional feature vectors where vectors in Rn/{0}\mathbb R^n / \{0\}Rn/{0}are normalized by their positive ℓ2\ell_2ℓ2​-norms. In this case, the feature or embedding vectors are identified as points on a unit Sn−1S^{n-1}Sn−1.

Generalization of Gaussian kernel

If the feature space is a high-dimensional sphere, what is the equivalent of Gaussian RBF kernel?

von Mises-Fisher

The is directly related to normal distribution by restricting the density to a unit hypersphere. The corresponding kernel is ∼exp⁡(κcos⁡θ)\sim \exp (\kappa \cos \theta)∼exp(κcosθ) where κ>0\kappa > 0κ>0 is the precision parameter. If we rescale the kernel such that self-similarity is one, then we have

Kvmf(x^,y^)=eκ(cos⁡θ−1)=eκ(x^⋅y^−1)=e−κ2∥x^−y^∥2K_\text{vmf}(\hat x, \hat y) = {\rm e}^{\kappa (\cos \theta - 1)} = {\rm e}^{ \kappa ( \hat x \cdot \hat y - 1)} = {\rm e}^{- \frac{\kappa}{2} \Vert \hat x - \hat y \Vert^2 }Kvmf​(x^,y^​)=eκ(cosθ−1)=eκ(x^⋅y^​−1)=e−2κ​∥x^−y^​∥2

which is Gaussian RBF kernel in disguise with Euclidean distance as the chordal length ∥x^−y^∥\Vert \hat x - \hat y \Vert∥x^−y^​∥.

Geodesic Gaussian

If we replace the Euclidean distance squared by geodesic distance squared, we have a measure of similarity ∼exp⁡(−κθ2)\sim \exp (- \kappa \theta^2 )∼exp(−κθ2). However, the symmetric binary function Sn−1×Sn−1→RS^{n-1} \times S^{n-1} \rightarrow \mathbb RSn−1×Sn−1→R is not positive definite which can be proved using randomly generated examples. Thus, the geodesic Gaussian function is not a reproducing kernel. Hence, it cannot be applied to any of the kernel methods based on the assumption of reproducing kernel Hilbert space (RKHS).

Heat Kernel

Euclidean heat kernel

Spherical heat kernel

If we rescale the kernel such that self-similarity is unity, we have

Reduction to Euclidean heat kernel within a small vicinity

Gegenbauer polynomials can be expressed using hypergeometric functions

C_{\ell}^{\alpha}(x) = \binom{\ell + 2\alpha - 1}{\ell}\,_2F_1\left(-\ell, \ell + 2\alpha; \alpha + \frac12; \frac{1-x}{2} \right).

The heat kernel becomes

Thus, for small geodesic distance, we have

Hence, we have shown that the spherical heat kernel is indeed an extension of Euclidean heat kernel to the whole hypersphere.

Normal distribution is the Green's function of heat (diffusion) equation in Euclidean space. If we assume the diffusion constant is isotropic and equals one, then we get a one-parameter family of kernels parameterized by diffusion time ttt,

Geuc(∥x−y∥;t;n)=(14πt)n2exp⁡(−∥x−y∥24t)G_\text{euc} (\Vert x - y \Vert; t; n) = \left ( \frac{1}{4\pi t }\right )^{\frac{n}{2}} \exp \left( - \frac{\Vert x - y \Vert^2}{4t}\right)Geuc​(∥x−y∥;t;n)=(4πt1​)2n​exp(−4t∥x−y∥2​)

The Laplacian operator in Euclidean space may be interpreted as momentum squared or kinetic energy. The spherical Laplacian operator must be the kinetic energy of a particle on a hypersphere and the kinetic energy is quantum mechanical angular momentum squared. For more details, please refer to . The heat kernel on Sn−1S^{n-1}Sn−1 takes the exact expansion

Gsph(x^⋅y^;t;n−1)=∑ℓ=0∞e−ℓ(ℓ+n−2)t2ℓ+n−2n−21ASn−1Cℓn2−1(x^⋅y^)G_\text{sph} (\hat x \cdot \hat y; t; n-1) = \sum_{\ell = 0}^\infty {\rm e}^{-\ell (\ell + n - 2) t} \frac{2\ell + n - 2}{n - 2} \frac{1}{A_{S^{n-1}}} C_{\ell}^{\frac{n}{2} - 1}(\hat x \cdot \hat y)Gsph​(x^⋅y^​;t;n−1)=ℓ=0∑∞​e−ℓ(ℓ+n−2)tn−22ℓ+n−2​ASn−1​1​Cℓ2n​−1​(x^⋅y^​)

where ASn−1A_{S^{n-1}}ASn−1​ is area of Sn−1S^{n-1}Sn−1 and Cℓα(w)C_\ell^\alpha (w)Cℓα​(w) are Gegenbauer polynomials.

For very large time, the heat must be uniform on the sphere. It is easy to verify that when t↑∞t\uparrow \inftyt↑∞ the heat kernel reduces to uniform distribution 1/ASn−11/A_{S^{n-1}}1/ASn−1​.

The kernel was obtained through eigenfunction expansion with positive eigenvalues. Thus, it is not only positive definite but also a .

Ksph(x^⋅y^;t)=Gsph(x^⋅y^;t)/Gsph(1;t).K_\text{sph}(\hat x \cdot \hat y; t) = G_\text{sph}(\hat x \cdot \hat y; t)/G_\text{sph}(1; t).Ksph​(x^⋅y^​;t)=Gsph​(x^⋅y^​;t)/Gsph​(1;t).

When x^⋅y^≡cos⁡θ≈1\hat x \cdot \hat y \equiv \cos \theta \approx 1x^⋅y^​≡cosθ≈1, the hypersphere is locally homeomorphic to Rn−1\mathbb R^{n-1}Rn−1and the physics of heat diffusion should reduce to that of Euclidean space as well. Therefore, we need to show that the exact hyperspherical heat kernel reduces to Euclidean heat kernel for θ↓0\theta \downarrow 0θ↓0.

First, let x^⋅y^=1−ε\hat x \cdot \hat y = 1 - \varepsilonx^⋅y^​=1−ε then

Gsph(1−ε;t)=∑ℓ=0∞e−ℓ(ℓ+n−2)t2ℓ+n−2n−21ASn−1Cℓn2−1(1−ε).G_\text{sph} (1 - \varepsilon; t) = \sum_{\ell = 0}^\infty {\rm e}^{-\ell (\ell + n - 2) t} \frac{2\ell + n - 2}{n - 2} \frac{1}{A_{S^{n-1}}} C_{\ell}^{\frac{n}{2} - 1}(1 - \varepsilon).Gsph​(1−ε;t)=ℓ=0∑∞​e−ℓ(ℓ+n−2)tn−22ℓ+n−2​ASn−1​1​Cℓ2n​−1​(1−ε).

Keep only first order terms, and letx=1−εx=1-\varepsilonx=1−εand α=n2−1\alpha = \frac{n}{2} - 1α=2n​−1

Cℓn2−1(1−ε)=(ℓ+2n−3ℓ)(1−ℓ(ℓ+n−2)n−1ε+O(ε2)).C_{\ell}^{\frac{n}{2} - 1}(1 - \varepsilon) = \binom{\ell + 2n - 3}{\ell}\left( 1 - \frac{\ell (\ell + n - 2)}{n-1}\varepsilon + \mathcal O (\varepsilon^2)\right).Cℓ2n​−1​(1−ε)=(ℓℓ+2n−3​)(1−n−1ℓ(ℓ+n−2)​ε+O(ε2)).
Gsph(1−ε;t)=Gsph(1;t)+1n−1∂tGsph(1;t)ε+O(ε2)≈Gsph(1;t)e1n−1∂tlog⁡Gsph(1;t)εG_\text{sph} (1 - \varepsilon; t) = G_\text{sph} (1; t)+ \frac{1}{n-1}\partial_t G_\text{sph} (1; t) \varepsilon + \mathcal O(\varepsilon^2) \approx G_\text{sph} (1; t) {\rm e}^{\frac{1}{n-1} \partial_t \log G_\text{sph} (1; t) \varepsilon}Gsph​(1−ε;t)=Gsph​(1;t)+n−11​∂t​Gsph​(1;t)ε+O(ε2)≈Gsph​(1;t)en−11​∂t​logGsph​(1;t)ε

Set ε=θ2/2\varepsilon = \theta^2 / 2ε=θ2/2 and f(t)=1/Gsph(1;t)f(t) = 1 / G_\text{sph}(1; t)f(t)=1/Gsph​(1;t). In geodesic spherical coordinates, normalization of the density requires

f(t)=(2πn−1∂tlog⁡f(t))n−12f(t) = \left (2\pi \frac{n-1}{\partial_t \log f(t)} \right)^{\frac{n-1}{2}}f(t)=(2π∂t​logf(t)n−1​)2n−1​

which is a nonlinear differential equation for f(t)f(t)f(t). Let f(t)=Ctαf(t) = C t^\alphaf(t)=Ctα, then we have the equality

C2n−1α2π(n−1)=t1−2αn−1=1\frac{C^\frac{2}{n-1} \alpha}{2\pi (n-1)} = t^{1 - \frac{2 \alpha}{n-1}}=12π(n−1)Cn−12​α​=t1−n−12α​=1

which gives α=(n−1)/2\alpha = (n-1)/2α=(n−1)/2 and C=(4π)(n−1)/2C=(4\pi)^{(n-1)/2}C=(4π)(n−1)/2 or f(t)=(4πt)(n−1)/2f(t) = (4\pi t)^{(n-1)/2}f(t)=(4πt)(n−1)/2.

Gsph(cos⁡θ;t;n−1)=Geuc(θ;t;n−1)+O(θ4).G_\text{sph}(\cos\theta; t; n-1) = G_\text{euc}(\theta; t; n-1) + \mathcal O (\theta^4).Gsph​(cosθ;t;n−1)=Geuc​(θ;t;n−1)+O(θ4).
Mercer kernel
von Mises-Fisher distribution
Exact Heat Kernel on a Hypersphere and Its Applications in Kernel SVM