Certainly! The mathematical formulation of the linear Support Vector Machine (SVM) is central to understanding how it performs classification. Here’s a step-by-step breakdown:
The primary objective of a linear SVM is to find a hyperplane that best separates the data into two classes. In a two-dimensional space this hyperplane is simply a line; in general it is defined by:

\[ w \cdot x + b = 0 \]

Where:

- \( w \) is the weight vector, normal (perpendicular) to the hyperplane,
- \( x \) is an input feature vector, and
- \( b \) is the bias term, which shifts the hyperplane away from the origin.
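As a quick illustration, here is a minimal NumPy sketch of how the hyperplane is used to classify a point once \( w \) and \( b \) are known; the particular values of `w` and `b` below are invented purely for demonstration.

```python
import numpy as np

# Hypothetical, already-learned parameters (for illustration only).
w = np.array([2.0, -1.0])   # weight vector, normal to the hyperplane
b = -0.5                    # bias term

def predict(x):
    """Classify a point by the sign of the decision function w·x + b."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(predict(np.array([1.0, 0.5])))   # lies on the positive side -> 1
print(predict(np.array([-1.0, 2.0])))  # lies on the negative side -> -1
```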
The margin is the distance between the two parallel boundaries that pass through the closest data points of each class; those closest points are known as the support vectors. The SVM aims to maximize this margin to improve the model’s generalization ability. Mathematically, the margin is given by:

\[ \frac{2}{\|w\|} \]

The goal is to maximize \( \frac{2}{\|w\|} \), which is equivalent to minimizing \( \|w\| \) or \( \|w\|^2 \) (the latter is preferred because it is smooth and convex).
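One step the formula above takes for granted: the perpendicular distance from a point \( x \) to the hyperplane is \( \frac{|w \cdot x + b|}{\|w\|} \), and by convention \( w \) and \( b \) are scaled so that the support vectors satisfy \( w \cdot x + b = \pm 1 \). Each margin boundary therefore sits \( \frac{1}{\|w\|} \) from the hyperplane, giving a total width of

\[ \frac{1}{\|w\|} + \frac{1}{\|w\|} = \frac{2}{\|w\|}. \]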
For data points \( (x_i, y_i) \), where \( y_i \) is either +1 or -1 (representing the two classes), the constraints ensuring that these points are correctly classified by the hyperplane are:

\[ y_i (w \cdot x_i + b) \geq 1, \; \forall i \]

These constraints assert that:

- points with \( y_i = +1 \) satisfy \( w \cdot x_i + b \geq 1 \), and
- points with \( y_i = -1 \) satisfy \( w \cdot x_i + b \leq -1 \),

so every training point lies on the correct side of the hyperplane, outside the margin boundaries.
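Putting the margin objective and these constraints together yields the hard-margin primal problem, which assumes the data are perfectly linearly separable:

\[ \min_{w,\, b} \; \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \geq 1, \; \forall i \]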
To allow for some misclassifications (especially when the data are not linearly separable), slack variables \( \xi_i \geq 0 \) are introduced. This leads to the soft-margin SVM, which relaxes the constraints to:

\[ y_i (w \cdot x_i + b) \geq 1 - \xi_i, \; \forall i \]

The objective function is adjusted to penalize these slack variables:

\[ \min_{w,\, b,\, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \xi_i \]

Where:

- \( C > 0 \) is the regularization parameter controlling the trade-off between a wide margin and few margin violations, and
- \( \xi_i \) measures how far point \( i \) falls inside the margin or onto the wrong side of the hyperplane (\( \xi_i = 0 \) for points that satisfy the original constraint).
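At the optimum each slack variable equals the hinge loss, \( \xi_i = \max(0,\, 1 - y_i (w \cdot x_i + b)) \). A minimal NumPy sketch, with data and a candidate \( (w, b) \) invented purely for illustration, showing how the soft-margin objective is evaluated:

```python
import numpy as np

# Toy data and a candidate solution (values invented for illustration).
X = np.array([[2.0, 1.0], [0.2, 0.1], [-1.0, -1.5], [-2.0, 0.5]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 0.5])
b = 0.0
C = 1.0  # larger C penalizes margin violations more heavily

# Slack variables are the hinge losses of the individual points.
margins = y * (X @ w + b)
xi = np.maximum(0.0, 1.0 - margins)

# Soft-margin primal objective: (1/2)||w||^2 + C * sum(xi).
objective = 0.5 * np.dot(w, w) + C * xi.sum()
print(xi, objective)
```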
The linear SVM problem is typically solved using quadratic programming techniques, because the objective function is quadratic and the constraints are linear. The solution involves finding the values of \( w \) and \( b \) that minimize the objective function while satisfying the constraints.
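In practice this QP is rarely coded by hand; libraries such as scikit-learn wrap well-tested solvers. A quick sketch, assuming scikit-learn is installed, where `SVC` with a linear kernel fits the soft-margin formulation above (data invented for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (values invented for illustration).
X = np.array([[2.0, 2.0], [1.5, 2.5], [2.5, 1.0],
              [-1.0, -1.5], [-2.0, -0.5], [-1.5, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1.0)  # C is the soft-margin trade-off parameter
clf.fit(X, y)

print("w =", clf.coef_[0])               # learned weight vector
print("b =", clf.intercept_[0])          # learned bias
print("support vectors:", clf.support_vectors_)
print("prediction:", clf.predict([[1.0, 1.0]]))
```

For a linear kernel, `coef_` and `intercept_` expose the learned \( w \) and \( b \), and `support_vectors_` lists the training points whose constraints are active.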
Often, it’s beneficial to solve the SVM in its dual form, especially when extending SVMs to non-linear kernels. The dual formulation involves Lagrange multipliers \( \alpha_i \), and the problem transforms into:

\[ \max_{\alpha} \; \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j) \]

Subject to:

\[ \sum_{i=1}^n \alpha_i y_i = 0 \]

\[ 0 \leq \alpha_i \leq C, \; \forall i \]

Where \( \alpha_i \) are the Lagrange multipliers for each constraint. This form is particularly useful as it naturally accommodates the kernel trick for non-linear classification.
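A minimal sketch of solving this dual numerically, using SciPy’s general-purpose SLSQP solver as a stand-in for the dedicated QP/SMO solvers that real implementations use; it then recovers \( w = \sum_i \alpha_i y_i x_i \) and \( b \) from a support vector with \( 0 < \alpha_i < C \) (data invented for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Toy data (invented for illustration).
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, C = len(y), 1.0

# Gram matrix of dot products x_i · x_j, scaled by the labels.
K = (y[:, None] * y[None, :]) * (X @ X.T)

def neg_dual(alpha):
    # Negate the dual objective because scipy minimizes.
    return 0.5 * alpha @ K @ alpha - alpha.sum()

constraints = [{"type": "eq", "fun": lambda a: a @ y}]  # sum_i alpha_i y_i = 0
bounds = [(0.0, C)] * n                                 # 0 <= alpha_i <= C

res = minimize(neg_dual, np.zeros(n), bounds=bounds,
               constraints=constraints, method="SLSQP")
alpha = res.x

# Recover the primal solution from the dual variables.
w = (alpha * y) @ X
sv = np.argmax((alpha > 1e-6) & (alpha < C - 1e-6))  # an unbounded support vector
b = y[sv] - w @ X[sv]
print("alpha =", alpha, "\nw =", w, "\nb =", b)
```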
This dual formulation not only allows the SVM problem to be solved efficiently, it also extends the algorithm to non-linear decision boundaries: replacing the dot product \( x_i \cdot x_j \) with a kernel function \( K(x_i, x_j) \) implicitly maps the data into a higher-dimensional space, enabling non-linear classification without ever computing that mapping explicitly.
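For instance, with a radial basis function (RBF) kernel, \( K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2) \) simply takes the place of every \( x_i \cdot x_j \) in the dual above. A short sketch of building such a Gram matrix (the gamma value is arbitrary):

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2), replacing x_i · x_j."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(rbf_kernel(X))  # drop-in replacement for the linear Gram matrix X @ X.T
```

In scikit-learn the same switch is a one-line change, e.g. `SVC(kernel="rbf", gamma=0.5)`, with the rest of the training code unchanged.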