Support Vector Machine

Formulation of Support Vector Machine (SVM)

Primal Optimization Problem

SVM is a maximum margin classifier whose decision boundary $f(x) = \vec w \cdot \vec x + b$ separates two classes labeled by $y = \pm 1$ in $\mathbb R^n$ with the largest possible margin.

The normal vector of the hyperplane

The normal vector in the direction of increasing $f(x)$ is

$$\hat n = \frac{\nabla f(x)}{\Vert \nabla f(x) \Vert_2} = \frac{\vec w}{\Vert \vec w \Vert_2}$$

The perpendicular distance to the hyperplane from any point $\vec x' \in \mathbb R^n$

The distance between a point $\vec x'$ and the plane $f(x) = 0$ is the vector from any point $\vec x$ on the plane to $\vec x'$, projected onto the normal direction: $d = \vert (\vec x' - \vec x) \cdot \hat n \vert$, where $f(x) = \Vert \vec w \Vert \, \hat n \cdot \vec x + b$. Now ground the decision boundary to zero potential, so that $0 = \Vert \vec w \Vert \, \hat n \cdot \vec x + b$, and thus

$$d = \frac{\left\vert \vec w \cdot \vec x' + b \right\vert}{\Vert \vec w \Vert}$$
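As a quick numeric check of this formula (a sketch with a made-up hyperplane $w = (3, 4)$, $b = -5$):

```python
import numpy as np

# Hypothetical hyperplane parameters for illustration: w = (3, 4), b = -5.
w = np.array([3.0, 4.0])
b = -5.0

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance |w . x + b| / ||w|| from point x to the plane."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# For x' = (3, 4): |9 + 16 - 5| / 5 = 4.
print(distance_to_hyperplane(np.array([3.0, 4.0]), w, b))  # 4.0
```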

Maximize the distance subject to the constraint of correct class labels

The optimization is

$$\max_{w, b} \min_{x} d = \max_{w, b} \min_{x} \frac{\left\vert \vec w \cdot \vec x + b \right\vert}{\Vert \vec w \Vert}$$

subject to

$$\vec w \cdot \vec x + b \ge 1, \quad y = +1$$
$$\vec w \cdot \vec x + b \le -1, \quad y = -1$$

which is equivalent to

$$\max_{w,b} \frac{1}{\Vert w \Vert} = \min_{w,b} \frac12 \Vert w \Vert_2^2, \quad \forall (\vec x, y): \ 1 - y(\vec w \cdot \vec x + b) \le 0$$

Introducing KKT multipliers $\alpha_i$ for each pair of data and label $(\vec x_i, y_i)$, the primal Lagrangian, or cost function, is

$$\mathcal L^P(w, b) = \frac12 w^\top w + \sum_i \alpha_i \left[1 - y_i(w^\top x_i + b)\right]$$

with

$$\alpha_i \left[1 - y_i(w^\top x_i + b)\right] = 0, \quad \alpha_i \ge 0$$

Soft margin and slack variables

The hinge loss $\max(0, 1 - y(w^\top x + b))$ measures the degree of margin violation, with slack variable $\xi = \max(0, 1 - y(w^\top x + b))$.
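A quick numeric sketch of the hinge loss at three illustrative points (values made up):

```python
import numpy as np

def hinge(y, f):
    """Hinge loss max(0, 1 - y * f(x)); zero for points outside the margin."""
    return np.maximum(0.0, 1.0 - y * f)

print(hinge(1, 2.0))   # 0.0  (correct, outside the margin)
print(hinge(1, 0.5))   # 0.5  (correct but inside the margin)
print(hinge(-1, 0.5))  # 1.5  (misclassified)
```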

The optimization becomes

$$\min_{w,b} \frac12 \Vert w \Vert^2 + C \sum_i \max(0, 1 - y_i(w^\top x_i + b))$$
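This unconstrained form can be minimized directly by subgradient descent. The following is a minimal sketch on synthetic toy data (the data, learning rate, and epoch count are all assumptions for illustration, not part of the original):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy separable data: class +1 around (2, 2), class -1 around (-2, -2).
X = np.vstack([rng.normal(2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)

C, lr, epochs = 1.0, 0.01, 200
w, b = np.zeros(2), 0.0

for _ in range(epochs):
    margins = y * (X @ w + b)
    viol = margins < 1                      # points violating the margin
    # Subgradient of (1/2)||w||^2 + C * sum of hinge losses
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

print((np.sign(X @ w + b) == y).mean())  # training accuracy
```

On clearly separable data like this, a few hundred subgradient steps suffice to classify the training set correctly.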

If we wish to use the dual form, we need to cast the above into constraints. Now the soft margin constraint becomes

$$(1 - \xi_i) - y_i(w^\top x_i + b) \le 0, \quad \xi_i \ge 0$$

The primal Lagrangian becomes

$$\mathcal L^P(w, b, \{\xi_i\}) = \frac12 w^\top w + C \sum_i \xi_i + \sum_i \alpha_i \left[(1 - \xi_i) - y_i(w^\top x_i + b)\right] - \sum_i \mu_i \xi_i$$

The gradients are

$$\partial_w \mathcal L^P = w - \sum_i \alpha_i y_i x_i = 0$$
$$\partial_b \mathcal L^P = -\sum_i \alpha_i y_i = 0$$
$$\partial_{\xi_i} \mathcal L^P = C - \mu_i - \alpha_i = 0$$

with $\mu_i \ge 0$, $\alpha_i \ge 0$, $\xi_i \ge 0$.

Dual Formulation

The dual Lagrangian reads

$$\mathcal L^D = \sum_i \alpha_i - \frac12 \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$

obtained by substituting the stationarity conditions back into $\mathcal L^P$: the condition $C = \mu_i + \alpha_i$ makes all $\xi_i$ terms cancel, $\sum_i \alpha_i y_i = 0$ removes the $b$ term, and inserting $w = \sum_i \alpha_i y_i x_i$ combines $\frac12 w^\top w - \sum_i \alpha_i y_i w^\top x_i = \frac12 w^\top w - w^\top w$ into $-\frac12 w^\top w$.

Now the optimization problem becomes

$$\max_{\{\alpha_i\}} \sum_i \alpha_i - \frac12 \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$

subject to $\alpha_i \in [0, C]$ and $\sum_i \alpha_i y_i = 0$.
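For a tiny problem the dual can be solved with an off-the-shelf constrained optimizer. This sketch uses a hypothetical two-point dataset $x_1 = (1, 0)$, $y_1 = +1$ and $x_2 = (-1, 0)$, $y_2 = -1$, for which the analytic solution is $\alpha_1 = \alpha_2 = \tfrac12$ and $w = (1, 0)$:

```python
import numpy as np
from scipy.optimize import minimize

# Two-point toy problem: x1 = (1, 0) with y = +1, x2 = (-1, 0) with y = -1.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
G = (y[:, None] * X) @ (y[:, None] * X).T   # matrix of y_i y_j <x_i, x_j>

def neg_dual(a):
    # Negated dual objective, since scipy minimizes.
    return -(a.sum() - 0.5 * a @ G @ a)

C = 10.0
res = minimize(neg_dual, x0=np.zeros(2),
               bounds=[(0, C)] * 2,
               constraints={"type": "eq", "fun": lambda a: a @ y})
alpha = res.x
w = (alpha * y) @ X          # recover w = sum_i alpha_i y_i x_i
print(alpha, w)              # alpha ~ [0.5, 0.5], w ~ [1, 0]
```

With the equality constraint $\alpha_1 = \alpha_2 = \alpha$, the dual reduces to $2\alpha - 2\alpha^2$, maximized at $\alpha = \tfrac12$, which matches the optimizer's output.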

Kernel Trick

Replace $\langle x_i, x_j \rangle$ with a kernel function $K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle_K$. The weight becomes (Representer Theorem)

$$w = \sum_i \alpha_i y_i \phi(x_i)$$

and the decision hypersurface

$$f(x) = \langle w, \phi(x) \rangle + b = \sum_i \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b = \sum_i \alpha_i y_i K(x_i, x) + b$$

where the bias (or threshold) can be solved using

$$y_j f(x_j) = 1 \Rightarrow \sum_{i} \alpha_i y_i y_j K(x_i, x_j) = 1 - b y_j$$

where $x_j$ is any support vector on the margin, i.e. one with $0 < \alpha_j < C$.
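The kernel expansion of $f(x)$ can be verified against a fitted classifier. This sketch assumes scikit-learn is available and uses synthetic toy data; `SVC` stores $\alpha_i y_i$ in `dual_coef_`, the support vectors in `support_vectors_`, and $b$ in `intercept_`:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

def rbf(a, b, gamma=0.5):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

# f(x) = sum_i alpha_i y_i K(x_i, x) + b, summed over the support vectors.
x_new = np.array([1.0, 1.0])
K = rbf(clf.support_vectors_, x_new)
f_manual = clf.dual_coef_[0] @ K + clf.intercept_[0]

print(np.isclose(f_manual, clf.decision_function([x_new])[0]))  # True
```

The manual sum over support vectors reproduces `decision_function` exactly, confirming the expansion above.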
