4. Choosing Between Kernel Functions
The Kernel Trick
📑 Learning Objectives
  • Three key types of kernel function:
    • Polynomial kernels
    • Squared-exponential kernels
    • Laplacian kernels
  • How varying the hyperparameters alters the prediction
  • Kernel interpretation of the motivating example's projection into 3D

Constructing kernels can be close to an art form, and an important one when it comes to understanding trends in data.

The simplest example is the linear kernel, formed by the inner product of the original data features, $k(\mathbf{x}, \mathbf{x}') = \mathbf{x} \cdot \mathbf{x}'$. Other kernels may be built from existing ones through sums, products, and positive multiples. Three commonly used prototypes are:

  • $k(\mathbf{x}, \mathbf{x}') = (\mathbf{x} \cdot \mathbf{x}' + c)^d$; a polynomial kernel with constant $c$ and degree $d$.
  • $k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|^{2})$; the squared-exponential kernel, with 'inverse length-scale' $\gamma$.
  • $k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|_{1})$; the Laplacian kernel, based on the taxi-cab norm $\|\mathbf{v}\|_{1}$.
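
The three prototypes can be written out directly. The following is a minimal sketch in plain Python (standard library only). The function names, the default parameter values, and the two sample points are illustrative assumptions, not something fixed by the lesson.

```python
import math


def dot(x, y):
    """Inner product of two equal-length sequences (the linear kernel)."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, c=1.0, d=2):
    """(x . y + c)^d with constant c and degree d (defaults are assumptions)."""
    return (dot(x, y) + c) ** d


def squared_exponential_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2), using the squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def laplacian_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||_1), using the taxi-cab (L1) distance."""
    l1_dist = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-gamma * l1_dist)


# Two arbitrary sample points in 2D (hypothetical data).
x, x_prime = (1.0, 2.0), (0.0, 1.0)
print(polynomial_kernel(x, x_prime))           # (2 + 1)^2
print(squared_exponential_kernel(x, x_prime))  # exp(-1 * 2)
print(laplacian_kernel(x, x_prime))            # exp(-1 * 2)
```

Both exponential kernels return 1 when the two points coincide and decay towards 0 as the points move apart; they differ only in how the distance between the points is measured.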

In this video we shall explore how the various hyperparameters impact the prediction, and how the different choices of kernel can improve model fitting.
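
To get a feel for one of these hyperparameters before looking at any fits, the sketch below tabulates the squared-exponential kernel at a few distances for several values of the inverse length-scale. The distances and the values of `gamma` are arbitrary choices made for illustration only.

```python
import math


def squared_exponential(r, gamma):
    """Kernel value for two points a Euclidean distance r apart."""
    return math.exp(-gamma * r ** 2)


distances = [0.0, 0.5, 1.0, 2.0]   # illustrative distances
gammas = [0.1, 1.0, 10.0]          # illustrative hyperparameter values

# One row of kernel values per gamma, one column per distance.
table = {g: [squared_exponential(r, g) for r in distances] for g in gammas}

print("distance:    " + "  ".join(f"{r:6.2f}" for r in distances))
for g, row in table.items():
    print(f"gamma={g:5.1f}: " + "  ".join(f"{v:6.4f}" for v in row))
```

At any fixed non-zero distance the kernel value shrinks as `gamma` grows, so each data point influences a narrower neighbourhood. A large `gamma` therefore tends to give predictions that follow the data very locally, while a small `gamma` gives smoother ones.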

Choosing Kernels

Figure 1. Diagram of various kernels used to fit data using the Kernel Ridge Regression model.

Back to our example

Recall our running dataset from the previous example.

Projection Diagram

Figure 2. A projection of the dataset that produces linearly separable data.

The kernel that makes the projection from 2D to 3D in our two-dimensional problem would correspond to the product

$$\boldsymbol{\phi}(x,y) \cdot \boldsymbol{\phi}(x',y') = xx' + yy' + (\cos(\pi x) - y^2)(\cos(\pi x') - y'^2).$$

This is the sum of a linear kernel, $xx' + yy'$, and the product of a non-linear mapping $\mathbb{R}^2 \rightarrow \mathbb{R}$, $(x, y) \mapsto \cos(\pi x) - y^2$, evaluated at each of the two points.
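
As a quick numerical check, the sketch below compares the inner product computed in the projected 3D space with the kernel evaluated directly on the original 2D coordinates. It assumes the feature map is $\boldsymbol{\phi}(x,y) = (x, y, \cos(\pi x) - y^2)$, which is what the expression above implies. The names `g`, `phi`, and `projection_kernel`, and the two test points, are hypothetical.

```python
import math


def g(x, y):
    """The non-linear third coordinate, cos(pi * x) - y^2."""
    return math.cos(math.pi * x) - y ** 2


def phi(x, y):
    """Assumed feature map from 2D to 3D: (x, y, g(x, y))."""
    return (x, y, g(x, y))


def projection_kernel(p, q):
    """The same quantity computed without ever forming the 3D vectors."""
    (x, y), (xp, yp) = p, q
    linear_part = x * xp + y * yp
    nonlinear_part = g(x, y) * g(xp, yp)
    return linear_part + nonlinear_part


p, q = (0.5, -1.0), (1.0, 2.0)   # arbitrary test points
inner_in_3d = sum(a * b for a, b in zip(phi(*p), phi(*q)))
print(inner_in_3d, projection_kernel(p, q))
```

The two printed numbers agree up to floating-point rounding, which is the point of the kernel trick: a model only ever needs the kernel values, not the projected coordinates themselves.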

Remember, in machine learning, it is not the task of the user to derive these projections by hand. This is the part where the algorithm itself will find the optimal projection.
