SVM is a maximum-margin classifier: the decision boundary $f(x) = w \cdot x + b = 0$ separates two classes labeled by $y = \pm 1$ in $\mathbb{R}^n$ with the largest possible margin.
The normal vector of the hyperplane
The unit normal vector in the direction of increasing $f(x)$ is
$$\hat{n} = \frac{\nabla f(x)}{\|\nabla f(x)\|_2} = \frac{w}{\|w\|_2}$$
The perpendicular distance to the hyperplane from any point $x' \in \mathbb{R}^n$
The distance from any point $x'$ to the plane is the projection, onto the normal direction, of the vector from $x'$ to any point $x$ on the plane: $d = |(x' - x) \cdot \hat{n}|$, where $f(x) = \|w\|\, \hat{n} \cdot x + b$. On the decision boundary $f(x) = 0$, so $0 = \|w\|\, \hat{n} \cdot x + b$ and $\hat{n} \cdot x = -b / \|w\|$; substituting gives
$$d = \frac{|w \cdot x' + b|}{\|w\|}$$
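As a quick numerical check of this formula, here is a minimal NumPy sketch; the hyperplane parameters `w`, `b` and the query point `x_prime` are made-up illustrative values, not taken from the text.

```python
import numpy as np

# Hypothetical hyperplane parameters and query point (illustrative only)
w = np.array([3.0, 4.0])     # normal vector of the hyperplane
b = -5.0                     # offset
x_prime = np.array([2.0, 1.0])

# Perpendicular distance d = |w . x' + b| / ||w||
d = abs(w @ x_prime + b) / np.linalg.norm(w)
print(d)  # |3*2 + 4*1 - 5| / 5 = 1.0
```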
Maximize the distance subject to the constraint of correct class labels
The optimization is
$$\max_{w,b} \min_{x} d = \max_{w,b} \min_{x} \frac{|w \cdot x + b|}{\|w\|}$$
subject to
$$w \cdot x + b \ge +1 \quad \text{if } y = +1, \qquad w \cdot x + b \le -1 \quad \text{if } y = -1$$
which is equivalent to
$$\max_{w,b} \frac{1}{\|w\|} = \min_{w,b} \frac{1}{2} \|w\|_2^2, \quad \text{subject to } 1 - y(w \cdot x + b) \le 0 \;\; \forall (x, y)$$
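The problem above is a convex quadratic program, so for linearly separable data it can be handed to a generic solver. A minimal sketch, assuming cvxpy is available; the toy arrays `X` and `y` are hypothetical and only illustrate the constraint structure.

```python
import numpy as np
import cvxpy as cp

# Hypothetical, linearly separable toy data (illustrative only)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(X.shape[1])
b = cp.Variable()

# Hard-margin primal: minimize (1/2)||w||^2  s.t.  y_i (w . x_i + b) >= 1
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)
```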
Introducing a Lagrange (KKT) multiplier $\alpha_i$ for each data-label pair $(x_i, y_i)$, the primal Lagrangian, or cost function, is
$$L_P(w, b) = \frac{1}{2} w^\top w + \sum_i \alpha_i \left[1 - y_i (w^\top x_i + b)\right]$$
with
$$\alpha_i \left[1 - y_i (w^\top x_i + b)\right] = 0, \quad \alpha_i \ge 0$$
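Complementary slackness means $\alpha_i$ can be nonzero only for points lying exactly on the margin, the support vectors. Purely as an illustration (not part of the derivation itself), scikit-learn's linear-kernel `SVC` exposes these quantities; a large `C` approximates the hard-margin case, and the data are the same hypothetical toy arrays as above.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data (illustrative only)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin

print(clf.support_vectors_)        # points with alpha_i > 0 (on the margin)
print(clf.dual_coef_)              # y_i * alpha_i for the support vectors
print(clf.coef_, clf.intercept_)   # recovered w and b
```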
Soft margin and slack variables
The hinge loss $\max(0, 1 - y(w^\top x + b))$ measures the degree of margin violation; the slack variable $\xi = \max(0, 1 - y(w^\top x + b))$ is exactly this loss for each point.
The optimization becomes
$$\min_{w,b} \frac{1}{2} \|w\|^2 + C \sum_i \max\left(0,\; 1 - y_i (w^\top x_i + b)\right)$$
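In this form the objective is an unconstrained function of $w$ and $b$ that can be evaluated, or minimized by gradient methods, directly. A minimal NumPy sketch of the evaluation; `C`, `w`, `b`, and the data are hypothetical placeholders.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * w @ w + C * hinge.sum()

# Hypothetical values (illustrative only)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([0.25, 0.25])
b = 0.0

print(soft_margin_objective(w, b, X, y, C=1.0))
```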
If we wish to use the dual form, we need to recast the hinge loss as constraints. The soft-margin constraint becomes