One-class Support Vector Machines (SVMs) are a specialized variant of SVMs used primarily for anomaly detection, also known as novelty detection. The model learns a decision function for outlier detection, classifying new data points as similar to or different from the training set. The method is particularly useful in applications where the majority of the data is ‘normal’ and anomalies are few or not well defined. This chapter delves into the concept, usage, and implementation of one-class SVMs.
A one-class SVM works by fitting a decision boundary that encloses the majority of the normal data points. In feature space, it separates the training data from the origin with a hyperplane while maximizing the hyperplane's distance from the origin. The resulting model is sensitive to new data points that differ from the training set and identifies them as anomalies.
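To make the prediction convention concrete before the full example, here is a minimal sketch (the cluster location and hyperparameter values are illustrative, not tuned): in Scikit-learn, decision_function returns a score that is positive inside the learned boundary and negative outside, and predict maps those scores to +1 for inliers and -1 for outliers.
import numpy as np
from sklearn.svm import OneClassSVM
# Train only on 'normal' points clustered around (2, 2)
rng = np.random.RandomState(0)
X_normal = 0.3 * rng.randn(50, 2) + 2
clf = OneClassSVM(nu=0.05, kernel="rbf", gamma=0.5).fit(X_normal)
# A point near the training cluster should be predicted +1 (inlier),
# while a point far from it should be predicted -1 (outlier)
print(clf.predict([[2.0, 2.0], [-3.0, -3.0]]))
# The underlying scores: positive inside the boundary, negative outside
print(clf.decision_function([[2.0, 2.0], [-3.0, -3.0]]))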
Here is an example of using a one-class SVM for detecting outliers in a dataset. We’ll simulate a dataset where the majority of the data is normal but includes some outliers.
import numpy as np
from sklearn.svm import OneClassSVM
import matplotlib.pyplot as plt
# Simulate some data
np.random.seed(42)
X = 0.3 * np.random.randn(100, 2)
X_train = np.r_[X + 2, X - 2] # Normal data
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2)) # Outliers
# Fit the model
clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)
# Predict the labels (inlier +1 or outlier -1)
y_pred_train = clf.predict(X_train)
y_pred_outliers = clf.predict(X_outliers)
# Plot the decision boundary, the learned region, and the data points
xx, yy = np.meshgrid(np.linspace(-5, 5, 500), np.linspace(-5, 5, 500))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.title("One-Class SVM")
plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
plt.scatter(X_train[:, 0], X_train[:, 1], c='white', s=20, edgecolor='k')
plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red', s=20, edgecolor='k')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.show()
Evaluating a one-class SVM can be challenging because true anomalies are often not known in practice. Performance can be assessed with artificially generated anomalies (for which the ground truth is known by construction) or, in the absence of labels, by checking that the proportion of points flagged as outliers is consistent with the nu parameter.
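As a rough sanity check, here is a short sketch that reuses clf, X_train, and y_pred_train from the example above: the fraction of training points flagged as outliers should be close to, and in practice not much larger than, nu.
# y_pred_train comes from the example above; nu (0.1 here) is roughly
# an upper bound on the fraction of training points flagged as outliers
outlier_fraction = np.mean(y_pred_train == -1)
print(f"Observed outlier fraction on training data: {outlier_fraction:.2f}")
print(f"nu parameter: {clf.nu}")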
One-class SVMs offer a robust method for anomaly detection in a wide range of applications. Because the model is built solely from data it treats as normal, it can effectively flag deviations that signify important or critical anomalies.
This chapter explained the fundamental concepts and practical applications of one-class SVMs for anomaly detection, demonstrated with a Python example using Scikit-learn. Understanding how to implement and adjust one-class SVMs is crucial for professionals working in fields where outlier detection is paramount.