Model Metrics

The assessment of a machine learning model’s performance can be achieved using statistical metrics that quantify the accuracy of the model’s predictions (how closely the predictions match their true values) and identify possible underfitting or overfitting. A regression model’s performance is assessed based on how close the actual (\(y_i\)) and the predicted (\(\widehat{y_i}\)) values of the target are, usually by calculating the differences (residuals) between actual and predicted values. The performance of classification models is evaluated based on the number of correct predictions and the number of misclassifications.

Table of contents

  1. Regression Metrics
  2. Classification Metrics
  3. Tips
  4. See also
  5. References
  6. Version History

Regression Metrics

The computed statistical metrics include the mean squared error (MSE, Eq. 1), the root mean squared error (RMSE, Eq. 2), the mean absolute error (MAE, Eq. 3), and the R squared (\(R^2\), Eq. 4) [1,2].

$$ \begin{equation} \text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 {\qquad [1] \qquad} \end{equation} $$
$$ \begin{equation} \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} {\qquad [2] \qquad} \end{equation} $$
$$ \begin{equation} \text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| {\qquad [3] \qquad} \end{equation} $$
$$ \begin{equation} R^2 = \left( \frac{\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2} \sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}} \right)^2 {\qquad [4] \qquad} \end{equation} $$

where \(y_i\) and \(\hat{y}_i\) are the actual and predicted target values, respectively, over the \(N\) samples, and \(\bar{y}\) and \(\bar{\hat{y}}\) are the averages of the actual and predicted values, respectively.
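For reference, the following minimal Python/NumPy sketch illustrates how the quantities in Eqs. 1–4 can be reproduced from two columns of actual and predicted values. It is not part of the Isalos platform, and the numerical values shown are hypothetical.

```python
# Minimal sketch of Eqs. 1-4; illustrative only, not the platform's implementation.
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)        # Eq. 1
    rmse = np.sqrt(mse)                  # Eq. 2
    mae = np.mean(np.abs(residuals))     # Eq. 3

    # Eq. 4: squared Pearson correlation between actual and predicted values
    # (note this is not the 1 - SS_res/SS_tot form of R^2).
    r = np.corrcoef(y_true, y_pred)[0, 1]
    r2 = r ** 2

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

# Hypothetical actual and predicted target values
print(regression_metrics([200.0, 150.0, 320.0, 275.0],
                         [210.0, 140.0, 300.0, 280.0]))
```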

Use the Regression Metrics function by browsing in the top ribbon:

Statistics \(\rightarrow\) Model Metrics \(\rightarrow\) Regression Metrics

Input

Data matrix with at least two columns, including the actual and predicted response (target/endpoint) variable values (numerical data).

Configuration

Actual Value Column: Select from the dropdown menu the column name corresponding to the actual target values.
Prediction Value Column: Select from the dropdown menu the column name corresponding to the predicted target values.

Output

Table with the computed statistical metrics.

Example

Input

In the left-hand spreadsheet of the tab, import the data matrix where the actual [1] and the predicted [2] target values are included.

regression-predictions
Configuration
  1. Select Statistics \(\rightarrow\) Model Metrics \(\rightarrow\) Regression Metrics.
  2. Select from the dropdown lists Actual Value Column [3] and Prediction Value Column [4] the column names that correspond to the actual and predicted target values, respectively. In this case the column “price” contains the observed target values and the column “kNN Prediction” the predicted values from a developed machine learning model.
  3. Click on the Execute button [5] to perform computations.
regression-configuration
Output

In the right-hand spreadsheet of the tab the metrics values are presented [6].

regression-metrics

Classification Metrics

The results of the classification are summarized in a confusion matrix, which is a table showing the number of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). From the confusion matrix the following statistical metrics are calculated: accuracy (Eq. 5), precision (Eq. 6), sensitivity/recall (Eq. 7), specificity (Eq. 8), the F1 and F\(\mathrm{\beta}\) scores (Eqs. 9 and 10), and the Matthews correlation coefficient (MCC, Eq. 11) [1,2].

                      Actual Positive   Actual Negative
Predicted Positive          TP                FP
Predicted Negative          FN                TN
$$ \begin{equation} \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} {\qquad [5] \qquad} \end{equation} $$
$$ \begin{equation} \text{Precision} = \frac{TP}{TP + FP} {\qquad [6] \qquad} \end{equation} $$
$$ \begin{equation} \text{Sensitivity/Recall} = \frac{TP}{TP + FN} {\qquad [7] \qquad} \end{equation} $$
$$ \begin{equation} \text{Specificity} = \frac{TN}{TN + FP} {\qquad [8] \qquad} \end{equation} $$
$$ \begin{equation} F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} {\qquad [9] \qquad} \end{equation} $$
$$ \begin{equation} F_\beta = (1 + \beta^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{(\beta^2 \cdot \text{Precision}) + \text{Recall}} {\qquad [10] \qquad} \end{equation} $$
$$ \begin{equation} MCC = \frac{ TP \times TN - FP \times FN }{ \sqrt{ (TP + FP)(TP + FN)(TN + FP)(TN + FN) } } {\qquad [11] \qquad} \end{equation} $$
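The short Python sketch below shows how the metrics of Eqs. 5–11 can be computed from the confusion-matrix counts of a binary (positive/negative) problem. It is illustrative only, not the platform's implementation, and the labels used are hypothetical.

```python
# Minimal sketch of Eqs. 5-11 for a binary problem; illustrative only.
import math

def classification_metrics(y_true, y_pred, positive, beta=1.0):
    # Confusion-matrix counts with respect to the chosen positive class
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)

    accuracy = (tp + tn) / (tp + tn + fp + fn)                       # Eq. 5
    precision = tp / (tp + fp) if tp + fp else 0.0                   # Eq. 6
    recall = tp / (tp + fn) if tp + fn else 0.0                      # Eq. 7
    specificity = tn / (tn + fp) if tn + fp else 0.0                 # Eq. 8
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                            # Eq. 9
    fbeta = ((1 + beta**2) * precision * recall /
             (beta**2 * precision + recall)
             if beta**2 * precision + recall else 0.0)               # Eq. 10
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0              # Eq. 11

    return {"Accuracy": accuracy, "Precision": precision, "Recall": recall,
            "Specificity": specificity, "F1": f1, f"F{beta}": fbeta, "MCC": mcc}

# Hypothetical labels; "yes" is treated as the positive class.
print(classification_metrics(["yes", "no", "yes", "no", "yes"],
                             ["yes", "no", "no", "no", "yes"],
                             positive="yes", beta=0.5))
```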

Use the Classification Metrics function by browsing in the top ribbon:

Statistics \(\rightarrow\) Model Metrics \(\rightarrow\) Classification Metrics

Input

Data matrix with at least two columns, including the actual and predicted response (target/endpoint) variable values (categorical data, two or more classes).

Configuration

Actual Value Column: Select from the dropdown menu the column name corresponding to the actual (original/observed) class values.
Prediction Value Column: Select from the dropdown menu the column name corresponding to the predicted class values.
beta of F score: The \(\beta\) parameter for the calculation of the F\(\mathrm{\beta}\) score (setting \(\beta = 1\) yields the F1 score).

Output

The confusion matrix and the accuracy statistics.

Example

Input

In the left-hand spreadsheet of the tab, import the data matrix where the actual [1] and the predicted [2] class values are included.

classification-predictions
Configuration
  1. Select Statistics \(\rightarrow\) Model Metrics \(\rightarrow\) Classification Metrics.
  2. Select from the dropdown lists Actual Value Column [3] and Prediction Value Column [4] the column names that correspond to the actual and predicted class values, respectively. In this case the column “Species” contains the observed class values and the column “Prediction” the predicted values from a developed machine learning model.
  3. Type the \(\beta\) value for the F\(\mathrm{\beta}\) calculation in the beta of F Score field [5].
  4. Click on the Execute button [6] to perform computations.
classification-configuration
classification-configuration2
Output

In the right-hand spreadsheet of the tab the confusion matrix is depicted [7], followed by the overall classification accuracy [8] and the rest of the statistical measures [9].

classification-metrics

Tips

  • In model development, the abovementioned statistical measures can be calculated for both training set predictions (to assess model fit and potential overfitting) and external/test set predictions (to assess generalization and predictive performance).
  • In classification, the metrics precision, recall/sensitivity, specificity, F1 score, and F\(\mathrm{\beta}\) score are calculated for every class. This per-class view is crucial when dealing with imbalanced class distributions, where one class might be much more frequent than the other(s). By calculating these statistics for both the positive and negative classes, you can better understand how well the developed model performs across both common and rare events, giving a more complete picture of its performance; a short sketch of this per-class evaluation is given after this list.
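As an illustration of the per-class evaluation mentioned above, the sketch below uses scikit-learn (not part of the Isalos platform) to compute precision, recall, and F\(\mathrm{\beta}\) separately for each class of a hypothetical multi-class prediction.

```python
# Per-class precision/recall/F-beta for a hypothetical multi-class example,
# computed with scikit-learn; illustrative only.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica", "virginica", "setosa"]
labels = ["setosa", "versicolor", "virginica"]

# average=None returns one value per class (one-vs-rest), which exposes
# performance differences between frequent and rare classes.
precision, recall, fbeta, support = precision_recall_fscore_support(
    y_true, y_pred, beta=0.5, labels=labels, average=None, zero_division=0
)

for lbl, p, r, f, n in zip(labels, precision, recall, fbeta, support):
    print(f"{lbl}: precision={p:.2f} recall={r:.2f} F0.5={f:.2f} (n={n})")
```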

See also

For the development of regression models, refer to the Regression functions; for the development of classification models, refer to the Classification functions.

Workflows

Bodyfat prediction case study

House pricing case study

Insurance charges case study

MA score case study

Salary prediction case study

Breast cancer case study

Credit card case study

Parkinson’s disease case study

Students’ performance case study

References

  1. Naser MZ, Alavi AH. Error Metrics and Performance Fitness Indicators for Artificial Intelligence and Machine Learning in Engineering and Sciences. Archit Struct Constr 2021. doi.org/10.1007/s44150-021-00015-8.
  2. Witten IH, Frank E, Hall MA, Pal CJ. Data Mining: Practical Machine Learning Tools and Techniques. 4th ed. Morgan Kaufmann; 2011. doi.org/10.1016/C2009-0-19715-5.

Version History

Introduced in Isalos Analytics Platform v0.1.18

Instructions last updated in May 2024