2. Projecting data features into higher dimensions
The Kernel Trick

In this tutorial, we shall take our first step away from linear models, and begin identifying more complex, non-linear patterns in real-world data. We begin with an example problem which is not linearly solvable in two dimensions, and show that a solution presents itself in higher dimensions. This lends itself to the idea of the kernel trick, a clever mathematical method which enables building non-linear models with linear methods.

šŸ“‘ Learning Objectives
  • Identifying linearly separable data
  • The difference between parametric and non-parametric models
  • The importance of model flexibility for identifying unforeseen features
  • Understanding the benefit of projecting features into higher dimensions

Pushing data into new dimensions


Figure 1. Data points recorded in the northern and southern hemispheres are linearly separable, split by the equator.

Does your data split linearly? Whether your task is classification or regression, your job as a data scientist is to make predictions from the patterns in your dataset. If those patterns can be separated by straight lines, we can use a linear model to make predictions of the form

$$
f(\mathbf{x}) = \sum_{i=1}^{D} \beta_i x_i \tag{1}
$$

where the features $\mathbf{x} = (x_1, \ldots, x_D)$ contribute to the outcome $y$ in proportion to the constants $\beta_i$, which we optimise in advance: this is the ultimate interpretable model, as $\beta_i$ indicates the relevance of the $i$-th feature.
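As a minimal, self-contained sketch (not code from this course), the coefficients $\beta_i$ in Eq. (1) can be fitted by ordinary least squares with NumPy; the matrix X and targets y below are synthetic stand-ins for a real dataset.

```python
import numpy as np

# Minimal sketch: fit the coefficients beta_i of f(x) = sum_i beta_i * x_i
# by ordinary least squares. X and y are synthetic stand-ins for real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # 100 samples, D = 3 features
true_beta = np.array([2.0, -1.0, 0.5])           # hidden "true" weights
y = X @ true_beta + 0.1 * rng.normal(size=100)   # noisy linear outcome

beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # optimise the beta_i in advance
print(beta)  # close to [2.0, -1.0, 0.5]; each beta_i signals feature relevance
```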

Figure 2. Non-linearly separable data in two dimensions.

However, life is rarely so straightforward. For instance, how would you analyse the data in Fig. 2? Inspecting it by eye, we could chalk out the blue centres of the three gold blobs. But without our human know-how, how might a machine identify these rings within rings?

šŸ’” Answer: Project into new dimensions.

All a matter of perspective


What may look like a ring to us may appear as a flat surface (a linear model) in 3D. But how do we get there? Let's go ahead and project onto a third dimension. To do this, let's systematically examine the patterns in the two-dimensional data. Along the horizontal $x$ axis, we see three peaks of blue amidst the troughs of gold: a periodic structure of the form $\cos(x)$. Along the vertical $y$ axis we observe a central peak of blue with decaying gold, similar to the quadratic function $-y^2$. So let us plot a projection map $\phi\colon \mathbb{R}^2 \rightarrow \mathbb{R}^3$, from two dimensions into three, given by

$$
\phi(x, y) = \left(x,\; y,\; \cos(\pi x) - y^2\right).
$$
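In code, this projection is a one-liner. The NumPy sketch below is our own illustration of the map, rather than the course's implementation:

```python
import numpy as np

def phi(points):
    """Map 2D points of shape (N, 2) to 3D via (x, y) -> (x, y, cos(pi*x) - y**2)."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([x, y, np.cos(np.pi * x) - y**2])

# A point at a blue centre gains a high third coordinate; a gold point a low one.
print(phi(np.array([[0.0, 0.0], [1.0, 0.0]])))   # [[0. 0. 1.], [1. 0. -1.]]
```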
Figure 3. Linearly separable data in three dimensions.

Success! Stretching our two-dimensional dataset into this third dimension lets us drive a flat plane right between the blue and gold data points.
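To see this claim in code, the sketch below builds a toy dataset loosely modelled on Fig. 2 (the generating recipe and labels are our own assumption, not the course's data), applies the projection $\phi$, and fits a linear SVM from scikit-learn in the new three-dimensional space.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)

# Toy stand-in for Fig. 2 (assumed recipe): blue points sit where
# cos(pi*x) - y^2 is large, gold points everywhere else.
X2d = np.column_stack([rng.uniform(-3, 3, 2000), rng.uniform(-2, 2, 2000)])
z = np.cos(np.pi * X2d[:, 0]) - X2d[:, 1] ** 2
labels = (z > 0.5).astype(int)                   # 1 = "blue", 0 = "gold"

# Project into 3D: (x, y) -> (x, y, cos(pi*x) - y^2)
X3d = np.column_stack([X2d, z])

flat_plane = LinearSVC(C=10.0, max_iter=10_000).fit(X3d, labels)
print("3D accuracy:", flat_plane.score(X3d, labels))      # close to 1.0: a plane does the job

straight_line = LinearSVC(C=10.0, max_iter=10_000).fit(X2d, labels)
print("2D accuracy:", straight_line.score(X2d, labels))   # near the majority-class baseline:
                                                          # a straight line cannot carve out the blobs
```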

In machine learning, we ask the computer to do the heavy lifting. In fact, for subtle pattern-recognition tasks, the whole point is that a sophisticated model will very often dissect the data better than a human can. The art is in choosing the correct model, such as a kernel method.

Next Lesson
3. Introducing: The Kernel Trick