# 13. Performance analysis of models¶

## 13.1. Introduction¶

In the previous chapters, we saw examples of ‘supervised machine learning’, i.e. classification and regression models, and we calculated the ‘score’ to see the performance of these models. There are, however, several other standard methods to evaluate the performance of a model. Table 13.1 lists the metrics which can be used to measure the performance of different types of model; these are discussed in this chapter.

Table 13.1 Metrics to measure the performance

| Problem | Performance metric |
| --- | --- |
| Classification | Accuracy, Receiver operating curve (ROC), Area under ROC, Logarithmic loss, Confusion matrix, Classification report |
| Regression | Mean square error (MSE), Root MSE (RMSE), Mean absolute error, $$R^2$$ |

## 13.2. Performance of classification problem¶

In this section, we will see the performance measurement of the classification problem.

Note

Cross-validation is used in this section, which is discussed in Chapter 5.

Remember, cross-validation does not create a model that can predict new samples; it only gives an idea about the accuracy of the model.

### 13.2.1. Accuracy¶

The ‘accuracy’ is the ratio of the ‘correct predictions’ to ‘all the predictions’. By default, classifiers are scored based on ‘accuracy’.

Note

In the previous chapters, we already calculated the ‘accuracy’ for the ‘training’ and ‘test’ datasets. For easy analysis, the ‘cross_val_score’ routine has built-in performance-measurement options, e.g. ‘accuracy’, ‘neg_mean_squared_error’ and ‘r2’ etc., as shown in this chapter.

>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.model_selection import cross_val_score
>>>
>>> # create object of class 'load_iris'
... iris = load_iris()
>>>
>>> # save features and targets from the 'iris'
... features, targets = iris.data, iris.target
>>>
>>> # use KNeighborsClassifier for classification
... classifier = KNeighborsClassifier()
>>>
>>> # cross-validation
... scores = cross_val_score(classifier,
...                     features, targets,
...                     cv=7, scoring="accuracy")
>>> print("Cross validation scores:", scores)
Cross validation scores: [ 0.95833333  1.          0.95238095
  0.9047619   0.95238095  1.          1.        ]
>>> print("Mean={0:0.4f}, Var={1:0.4f}".format(
...                             np.mean(scores),
...                             np.var(scores)))
Mean=0.9668, Var=0.0011
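The ‘ratio’ definition of accuracy can be verified directly. Below is a minimal sketch with hypothetical labels (not from the iris example above), comparing a hand-computed ratio with ‘accuracy_score’ from ‘sklearn.metrics’:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# hypothetical true and predicted labels, for illustration only
y_true = np.array([0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 2, 2, 2, 1])

# accuracy = correct predictions / all predictions
manual = np.sum(y_true == y_pred) / len(y_true)
score = accuracy_score(y_true, y_pred)
print(manual, score)  # both are 4/6, as 4 of the 6 predictions are correct
```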


### 13.2.2. Logarithmic loss¶

It measures the probability assigned to the correct predictions, and reports the logarithm of that probability. Since a probability has a range between 0 and 1, its logarithm lies between ‘-infinity’ and 0; therefore the ‘neg_log_loss’ score used below has a range between ‘-infinity’ and 0.

Note

The higher the ‘neg_log_loss’ value (i.e. the closer to 0), the better the model. A perfect model will have the maximum value, i.e. ‘0’.

>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.model_selection import cross_val_score
>>>
>>> # create object of class 'load_iris'
... iris = load_iris()
>>>
>>> # save features and targets from the 'iris'
... features, targets = iris.data, iris.target
>>>
>>> # use KNeighborsClassifier for classification
... classifier = KNeighborsClassifier()
>>>
>>> # cross-validation
... scores = cross_val_score(classifier,
...                     features, targets,
...                     cv=7, scoring="neg_log_loss")
>>> print("Cross validation scores:", scores)
Cross validation scores: [-1.45771098 -0.03187765
-0.07858381 -0.14654173 -1.66902867 -0.02125177
-0.03495091]
>>> print("Mean={0:0.4f}, Var={1:0.4f}".format(
...        np.mean(scores),
...        np.var(scores)))
Mean=-0.4914, Var=0.4644
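The effect of the predicted probabilities on the loss can be illustrated with ‘log_loss’ from ‘sklearn.metrics’, which returns the positive loss (the ‘neg_log_loss’ scorer simply negates it). The labels and probabilities below are hypothetical, chosen for illustration:

```python
from sklearn.metrics import log_loss

# hypothetical true labels and predicted class probabilities
y_true = [0, 1, 1]
good = [[0.95, 0.05], [0.05, 0.95], [0.10, 0.90]]  # confident and correct
bad = [[0.10, 0.90], [0.90, 0.10], [0.80, 0.20]]   # confident but wrong

# confident correct predictions give a loss close to 0;
# confident wrong predictions are penalized heavily
print(log_loss(y_true, good))  # small value, near 0
print(log_loss(y_true, bad))   # much larger value
```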


### 13.2.3. Classification report¶

Classification report gives the ‘precision’, ‘recall’, ‘F1-score’ and ‘support’ values for each class as shown below,

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.metrics import classification_report
>>>
>>> # create object of class 'load_iris'
... iris = load_iris()
>>> X, y = iris.data, iris.target
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...         test_size=0.2,
...         random_state=23,
...         stratify=y)
>>>
>>> # Linear classifier
... cls = LogisticRegression()
>>> cls.fit(X_train, y_train)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False)
>>>
>>> prediction = cls.predict(X_test)
>>> report = classification_report(y_test, prediction)
>>> print(report) # print classification_report
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        10
          1       0.90      0.90      0.90        10
          2       0.90      0.90      0.90        10

avg / total       0.93      0.93      0.93        30
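The individual entries of the report can be checked with ‘precision_score’, ‘recall_score’ and ‘f1_score’. Below is a minimal sketch with hypothetical binary labels (not the iris data), showing that precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 is their harmonic mean:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# hypothetical binary labels, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

# here TP = 3, FP = 1, FN = 1
p = precision_score(y_true, y_pred)  # TP / (TP + FP)
r = recall_score(y_true, y_pred)     # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)        # 2*p*r / (p + r)
print(p, r, f1)  # 0.75 0.75 0.75
```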


### 13.2.4. Confusion matrix (Binary classification)¶

Let’s understand the confusion matrix first, which is the basis for ROC and can be used for ‘binary (not multiclass) classification’. The confusion matrix is a $$2 \times 2$$ matrix, whose entries are shown in Table 13.2 and explained below,

• True positive : Actual value is positive, and predicted value is also positive.
• False negative : Actual value is positive, and predicted value is negative.
• False positive : Actual value is negative, and predicted value is positive.
• True negative : Actual value is negative, and predicted value is negative.
Table 13.2 Confusion matrix

| | Predicted positive | Predicted negative |
| --- | --- | --- |
| Actual positive | True Positive | False Negative |
| Actual negative | False Positive | True Negative |

Note

Clearly, the desired results are the ‘True positive’ and ‘True negative’ counts. Therefore, for better performance, these counts should be higher than the ‘False negative’ and ‘False positive’ counts.

Below is an example of a confusion matrix. Here the results have the following values,

• True positive = 9
• True negative = 9
• False positive = 1
• False negative = 1
>>> from sklearn.datasets import make_blobs
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.metrics import confusion_matrix
>>>
>>> X, y = make_blobs(centers=2, random_state=0)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...         test_size=0.2,
...         random_state=23,
...         stratify=y)
>>>
>>> # Linear classifier
... cls = LogisticRegression()
>>> cls.fit(X_train, y_train)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False)
>>> prediction = cls.predict(X_test)
>>> c_matrix = confusion_matrix(y_test, prediction)
>>> print(c_matrix) # print confusion_matrix
[[9 1]
[1 9]]
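The four counts can be read out of the matrix directly with ‘ravel()’. Note that scikit-learn sorts the class labels, so for labels {0, 1} the first row/column corresponds to the negative class, i.e. the layout is [[TN, FP], [FN, TP]] (the transpose of Table 13.2’s orientation). A sketch with hypothetical labels:

```python
from sklearn.metrics import confusion_matrix

# hypothetical binary labels (1 = positive class), for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]

# scikit-learn sorts the classes as [0, 1], so the layout is
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 4 1 1 4
```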


### 13.2.5. Area under ROC (AUC)¶

ROC is the plot of the ‘true positive rate’ against the ‘false positive rate’, which are defined as below,

• True positive rate = (True positive) / (True positive + False negative)
• False positive rate = (False positive) / (False positive + True negative)

Note

ROC and AUC are used for ‘binary (not multiclass) classification’ problems; ‘AUC = 1’ represents a perfect model.

>>> import numpy as np
>>> from sklearn.datasets import make_blobs
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.linear_model import LogisticRegression
>>>
>>> X, y = make_blobs(centers=2, random_state=0)
>>>
>>> # use LogisticRegression for classification
... classifier = LogisticRegression()
>>>
>>> # cross-validation
... scores = cross_val_score(classifier,
...                     X, y,
...                     cv=7, scoring="roc_auc")
>>>
>>> print("Cross validation scores:", scores)
Cross validation scores: [ 1.          1.          0.97959184  0.91836735  0.97959184  1.          1.        ]
>>> print("Mean={0:0.4f}, Var={1:0.4f}".format(
...        np.mean(scores),
...        np.var(scores)))
Mean=0.9825, Var=0.0008
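The AUC can also be computed once on a held-out test set with ‘roc_curve’ and ‘auc’ from ‘sklearn.metrics’. Below is a sketch reusing the ‘make_blobs’ data from above; the second column of ‘predict_proba’ is the probability of the positive class:

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

# same two-class data as in the sections above
X, y = make_blobs(centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=23, stratify=y)

cls = LogisticRegression()
cls.fit(X_train, y_train)

# probability of the positive class for each test sample
probs = cls.predict_proba(X_test)[:, 1]

# true/false positive rates at every threshold; 'auc' integrates the curve
fpr, tpr, thresholds = roc_curve(y_test, probs)
print("AUC:", auc(fpr, tpr))
```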


## 13.3. Performance of regression problem¶

The code used in this section is discussed in Chapter 4.

### 13.3.1. MAE, MSE and R2¶

Note

• By default, the Scikit library calculates the ‘r2_score’, as shown in the code below. The ‘r2_score’ has values between 0 (no fit) and 1 (perfect fit) for reasonable models; it can even become negative when the model fits worse than simply predicting the mean of the data.
• Mean absolute error (MAE) is the mean of the ‘absolute differences’ between the predicted and the actual values, and is also calculated below.
• Mean square error (MSE) is the mean of the squares of the errors, where the errors are the differences between the ‘predicted’ and ‘actual’ values. This too is calculated below.
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import mean_squared_error, mean_absolute_error
>>> from sklearn.metrics import r2_score
>>>
>>> N = 100 # 100 samples
>>> x = np.linspace(-3, 3, N) # coordinates
>>> noise_sample = np.random.RandomState(20) # constant random value
>>> # growing sinusoid with random fluctuation
... sine_wave = x + np.sin(4*x) + noise_sample.uniform(N)
>>>
>>> # convert features in 2D format i.e. list of list
... features = x[:, np.newaxis]
>>>
>>> # save sine wave in variable 'targets'
... targets = sine_wave
>>>
>>> # split the training and test data
... train_features, test_features, train_targets, test_targets = train_test_split(
...         features, targets,
...         train_size=0.8,
...         test_size=0.2,
...         # random but same for all runs; also, the accuracy depends on the
...         # selection of data e.g. if we put 10 then the accuracy will be 1.0
...         # in this example
...         random_state=23,
...         # keep same proportion of 'targets' in test and train data
...         # stratify=targets # can not be used for a single feature
...         )
>>>
>>> # training using 'training data'
... regressor = LinearRegression()
>>> regressor.fit(train_features, train_targets) # fit the model for training data
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>>
>>> # predict the 'target' for 'test data'
... prediction_test_targets = regressor.predict(test_features)
>>> test_accuracy = regressor.score(test_features, test_targets)
>>> print("Accuracy for test data:", test_accuracy)
Accuracy for test data: 0.822872868183
>>>
>>> r2_score = r2_score(test_targets, prediction_test_targets)
>>> print("r2_score: ", r2_score)
r2_score:  0.822872868183
>>>
>>> mean_absolute_error = mean_absolute_error(test_targets, prediction_test_targets)
>>> print("mean_absolute_error: ", mean_absolute_error)
mean_absolute_error:  0.680406590952
>>>
>>> mean_squared_error = mean_squared_error(test_targets, prediction_test_targets)
>>> print("mean_squared_error: ", mean_squared_error)
mean_squared_error:  0.584535345592
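MAE and MSE are simple enough to verify with plain NumPy. Below is a minimal sketch with hypothetical actual/predicted values (not the sine-wave data above), comparing hand-computed means against the ‘sklearn.metrics’ functions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# hypothetical actual and predicted values, for illustration only
actual = np.array([3.0, -0.5, 2.0, 7.0])
predicted = np.array([2.5, 0.0, 2.0, 8.0])

errors = actual - predicted
mae = np.mean(np.abs(errors))   # mean of absolute errors
mse = np.mean(errors ** 2)      # mean of squared errors
print(mae, mean_absolute_error(actual, predicted))  # both 0.5
print(mse, mean_squared_error(actual, predicted))   # both 0.375
```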

### 13.3.2. Problem with cross-validation¶

Below is an issue that arises when the ‘regressor performances’ are measured with the ‘cross-validation’ method,

Error

• The mean score for ‘r2’ is calculated as ‘-7.7967’, which is negative. A negative ‘r2’ means that the model fits the data worse than simply predicting the mean of the targets; here this happens because ‘KFold’ without shuffling splits the ordered samples into contiguous blocks, so each fold forces the model to extrapolate outside its training range.
• Similarly, if we replace ‘r2’ with ‘neg_mean_squared_error’ or ‘neg_mean_absolute_error’, we may get equally misleading results.
>>> import numpy as np
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.model_selection import KFold
>>> from sklearn.linear_model import LinearRegression
>>>
>>> N = 100 # 100 samples
>>> x = np.linspace(-3, 3, N) # coordinates
>>> noise_sample = np.random.RandomState(20) # constant random value
>>> # growing sinusoid with random fluctuation
... sine_wave = x + np.sin(4*x) + noise_sample.uniform(N)
>>>
>>> # convert features in 2D format i.e. list of list
... features = x[:, np.newaxis]
>>>
>>> # save sine wave in variable 'targets'
... targets = sine_wave
>>>
>>> # cross-validation
... regressor = LinearRegression()
>>>
>>> cv = KFold(n_splits=10, random_state=7)
>>> scores = cross_val_score(regressor, features, targets, cv=cv,
...                 scoring="r2")
>>>
>>> print("Cross validation scores:", scores)
Cross validation scores: [-13.91006325 -20.21043299   0.36952646  -1.92292726
  -3.30936741  -3.30936741  -1.92292726   0.36952646 -20.21043299 -13.91006325]
>>> print("Mean={0:0.4f}, Var={1:0.4f}".format(
...             np.mean(scores),
...             np.var(scores)))
Mean=-7.7967, Var=62.5597
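One way to see that the unshuffled folds are the culprit is to repeat the evaluation with ‘shuffle=True’ in ‘KFold’, so that each fold contains samples from the whole range of x. This is a sketch of an assumed remedy, not part of the original text; the data generation is the same as above:

```python
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression

# same data generation as in the section above
N = 100
x = np.linspace(-3, 3, N)
noise_sample = np.random.RandomState(20)
sine_wave = x + np.sin(4*x) + noise_sample.uniform(N)

features = x[:, np.newaxis]
targets = sine_wave

regressor = LinearRegression()

# shuffle=True mixes the ordered samples before splitting into folds,
# so no fold is forced to extrapolate far outside its training range
cv = KFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_val_score(regressor, features, targets, cv=cv, scoring="r2")
print("Mean={0:0.4f}".format(np.mean(scores)))  # now clearly positive
```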