6. Applying the Kernel Trick in the Wild
The Kernel Trick
📑 Learning Objectives
  • Applying the Kernel Ridge Regression model with scikit-learn
  • Exploring hyperparameters and kernel choices
  • Visualising output

In this video, we'll get stuck straight into our problem: predicting the compressive strength of concrete from a few easy-to-measure variables. This is the kind of end-to-end kernel ridge regression implementation you'll need when deploying models in the wild.

Inspect the data


Loading the data and inspecting the columns, we print out the following summary:

Target:  Concrete compressive strength(MPa, megapascals)
Features:  Index(['Cement (component 1)(kg in a m^3 mixture)',
       'Blast Furnace Slag (component 2)(kg in a m^3 mixture)',
       'Fly Ash (component 3)(kg in a m^3 mixture)',
       'Water  (component 4)(kg in a m^3 mixture)',
       'Superplasticizer (component 5)(kg in a m^3 mixture)',
       'Coarse Aggregate  (component 6)(kg in a m^3 mixture)',
       'Fine Aggregate (component 7)(kg in a m^3 mixture)', 'Age (day)'],
      dtype='object')
X shape:  (1030, 8)
y shape:  (1030, 1)

This tells us that there are 8 input features from which we want to predict a single target: the compressive strength of concrete.
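
The loading step itself isn't shown on screen, but a minimal sketch might look like the following (assuming a local copy of the UCI Concrete Compressive Strength spreadsheet saved as Concrete_Data.xls; the file name and column layout are assumptions):

import pandas as pd

# Hypothetical local copy of the UCI Concrete Compressive Strength data set
data = pd.read_excel('Concrete_Data.xls')

# The final column is the target; everything else is a feature
X = data.iloc[:, :-1]
y = data.iloc[:, -1:]

print('Target: ', y.columns[0])
print('Features: ', X.columns)
print('X shape: ', X.shape)
print('y shape: ', y.shape)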

Preprocessing

We'll need to centre and scale the data set so that each feature has zero mean and unit variance. We'll also split it into training and test sets.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hold back 20% of the rows as an unseen test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=31)

print('X_train shape: ', X_train.shape)
print('X_test shape: ', X_test.shape)
print('y_train shape: ', y_train.shape)
print('y_test shape: ', y_test.shape)

# Learn the centring and scaling parameters from the training set only
scaler = StandardScaler()
scaler.fit(X_train)

# Transform both X matrices with these values
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
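
As a quick sanity check (a short sketch, not part of the video), the scaled training features should now have roughly zero mean and unit variance:

import numpy as np

# After StandardScaler, each column should have mean ~0 and std ~1
print('Feature means: ', np.round(X_train.mean(axis=0), 2))
print('Feature stds:  ', np.round(X_train.std(axis=0), 2))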

Try a baseline linear model


So that we can benchmark any improvement, we'll first try a linear model.

from sklearn.linear_model import LinearRegression

# Fit a linear model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
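
To see where this baseline lands, we can score it on the held-out test set (a quick sketch; the lesson simply reports the number below):

# R^2 of the linear baseline on the test set
print('Linear R^2: ', linear_model.score(X_test, y_test))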

The results aren't awful, but an R² score of 0.58 indicates there is much more to be done.

Figure 1. A plot of the predictions vs. the ground truth for a baseline linear fit.

The Kernel Ridge Regressor


Our assumption is that a non-parametric model can capture the non-linear behaviour left in the residuals of the linear model. This is where the kernel's implicit feature mapping can improve the fit.

from sklearn.kernel_ridge import KernelRidge

# Fit a KRR model with a degree-4 polynomial kernel
krr = KernelRidge(kernel='polynomial', degree=4, alpha=1.0)
krr.fit(X_train, y_train)

# Predictions from both models on the held-out test set
y_pred_linear = linear_model.predict(X_test)
y_pred_poly = krr.predict(X_test)
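
To put a number on the improvement, we can compare the R² of the two sets of predictions (a sketch, not shown verbatim in the video):

from sklearn.metrics import r2_score

print('Linear R^2:       ', r2_score(y_test, y_pred_linear))
print('Kernel ridge R^2: ', r2_score(y_test, y_pred_poly))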

The results give us a much-improved model.

Figure 2. The Kernel Ridge Regression solution.
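
Figures 1 and 2 are predicted-vs-actual scatter plots; something along the following lines reproduces them (a sketch only, since the plotting code isn't shown here, and the axis labels are assumptions):

import matplotlib.pyplot as plt
import numpy as np

y_true = np.ravel(y_test)
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
for ax, y_pred, title in zip(axes,
                             [np.ravel(y_pred_linear), np.ravel(y_pred_poly)],
                             ['Linear baseline', 'Kernel ridge']):
    ax.scatter(y_true, y_pred, s=10, alpha=0.6)
    lims = [y_true.min(), y_true.max()]
    ax.plot(lims, lims, 'k--')  # perfect-prediction reference line
    ax.set_xlabel('True strength (MPa)')
    ax.set_ylabel('Predicted strength (MPa)')
    ax.set_title(title)
plt.show()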

Tune the model over parameter space


In the implementation, we specified a number of hyperparameters: kernel, degree, and alpha. To ensure that we make the best possible choice, we can automate a search through the options and assess each candidate with K-fold cross-validation.

First of all, let's specify the parameters that we shall search through.

param_grid = [
    {'degree': [2, 3, 4, 5, 6],
     'alpha': [1e-1, 1.0, 10],
     'kernel': ['polynomial'],
    },
    {'gamma': [1e-2, 1e-1, 1, 10],
     'alpha': [1e-1, 1.0, 10],
     'kernel': ['rbf'],
    },
]

Then we can perform the grid search.

from sklearn.model_selection import GridSearchCV, KFold

krr = KernelRidge()

# The target is continuous, so stratified splits don't apply here;
# plain (shuffled) K-fold cross validation is used instead.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    estimator=krr, param_grid=param_grid, cv=cv,
)
search.fit(X_train, y_train)
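
Once the fit completes, we can ask the search which configuration won and how the tuned model fares on the held-out test set (a sketch; the exact printout isn't shown in the lesson):

# Winning hyperparameters and the corresponding scores
print('Best parameters: ', search.best_params_)
print('Best CV R^2:     ', search.best_score_)
print('Test-set R^2:    ', search.best_estimator_.score(X_test, y_test))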

This object now contains a search.best_estimator_, from which we can compute a score and recall the chosen kernel, as in the sketch above. In fact, if we want to call upon more outputs of the search, we can return the full cross-validation results as a pandas.DataFrame:

import pandas as pd

# Collect the cross-validation results and filter to the polynomial
# kernel with alpha = 0.1
df = pd.DataFrame(search.cv_results_)
df[(df['param_kernel'] == 'polynomial') & (df['param_alpha'] == 0.1)]

This gives a complete overview of the K-fold cross-validation statistics to report on. Don't forget the df.to_markdown() method!

|    | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_alpha | param_degree | param_kernel | param_gamma | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---:|---:|---:|---:|---:|---:|---:|:---|---:|:---|---:|---:|---:|---:|---:|---:|---:|---:|
|  0 | 0.313852 | 0.154708   | 0.11445   | 0.0619005 | 0.1 | 2 | polynomial | nan | {'alpha': 0.1, 'degree': 2, 'kernel': 'polynomial'} | 0.768445 | 0.799914 | 0.799908   | 0.742548 | 0.809902 | 0.784143 | 0.0250491 | 12 |
|  1 | 0.323135 | 0.0222913  | 0.0726411 | 0.0193653 | 0.1 | 3 | polynomial | nan | {'alpha': 0.1, 'degree': 3, 'kernel': 'polynomial'} | 0.864129 | 0.895091 | 0.874832   | 0.841756 | 0.878514 | 0.870864 | 0.0176288 | 3 |
|  2 | 0.342396 | 0.0471995  | 0.0740518 | 0.0387501 | 0.1 | 4 | polynomial | nan | {'alpha': 0.1, 'degree': 4, 'kernel': 'polynomial'} | 0.874275 | 0.899858 | 0.913486   | 0.777022 | 0.875907 | 0.868109 | 0.0478807 | 4 |
|  3 | 0.32883  | 0.0436676  | 0.056971  | 0.0127754 | 0.1 | 5 | polynomial | nan | {'alpha': 0.1, 'degree': 5, 'kernel': 'polynomial'} | 0.840566 | 0.784525 | 0.686616   | 0.673168 | 0.698032 | 0.736581 | 0.0649849 | 17 |
|  4 | 0.305537 | 0.00571628 | 0.076536  | 0.0288526 | 0.1 | 6 | polynomial | nan | {'alpha': 0.1, 'degree': 6, 'kernel': 'polynomial'} | 0.792378 | 0.663244 | -0.0384043 | 0.462756 | 0.664408 | 0.508876 | 0.29327   | 19 |