Model Metrics
A machine learning model’s performance can be assessed using statistical metrics that quantify the accuracy of its predictions (how closely the predictions match the true values) and help identify possible underfitting or overfitting. A regression model’s performance is assessed based on how close the actual (\(y_i\)) and predicted (\(\widehat{y_i}\)) values of the target are, usually by calculating the differences (residuals) between them. A classification model’s performance is evaluated based on the number of correct predictions and the number of misclassifications.
Regression Metrics
The computed statistical metrics include the mean squared error (MSE, Eq. 1), the root mean squared error (RMSE, Eq. 2), the mean absolute error (MAE, Eq. 3), and R-squared (\(R^2\), Eq. 4) (refs. 1, 2).
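The numbered equations are not reproduced in this text. As a hedged reconstruction, Eqs. 1 to 3 are assumed to follow the standard definitions. For Eq. 4 the squared-correlation form is assumed, because the symbol definitions refer to the average of the predicted values, which the alternative form \(1 - \sum_i (y_i - \widehat{y_i})^2 / \sum_i (y_i - \bar{y})^2\) does not use:

\[ \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \widehat{y_i}\right)^2 \tag{1} \]

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \widehat{y_i}\right)^2} \tag{2} \]

\[ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \widehat{y_i}\right| \tag{3} \]

\[ R^2 = \frac{\left[\sum_{i=1}^{N}\left(y_i - \bar{y}\right)\left(\widehat{y_i} - \bar{\hat{y_i}}\right)\right]^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2 \, \sum_{i=1}^{N}\left(\widehat{y_i} - \bar{\hat{y_i}}\right)^2} \tag{4} \]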
where \(y_i\) and \(\widehat{y_i}\) are the actual and predicted target values, respectively, over the \(N\) samples, and \(\bar{y}\) and \(\bar{\hat{y_i}}\) are the averages of the actual and predicted values, respectively.
Use the Regression Metrics function by browsing in the top ribbon:
Statistics \(\rightarrow\) Model Metrics \(\rightarrow\) Regression Metrics
Input
Data matrix with at least two columns, including the actual and predicted response (target/endpoint) variable values (numerical data).
Configuration
Actual Value Column | Select from the dropdown menu the column name corresponding to the actual target values. |
Prediction Value Column | Select from the dropdown menu the column name corresponding to the predicted target values. |
Output
Table with the computed statistical metrics.
Example
Input
Import the data matrix that contains the actual [1] and predicted [2] target values into the left-hand spreadsheet of the tab.
Configuration
- Select Statistics \(\rightarrow\) Model Metrics \(\rightarrow\) Regression Metrics.
- From the dropdown lists Actual Value Column [3] and Prediction Value Column [4], select the column names that correspond to the actual and predicted target values, respectively. In this case the column “price” contains the observed target values and the column “kNN Prediction” contains the values predicted by a developed machine learning model.
- Click on the Execute button [5] to perform the computations.
Output
The metric values are presented in the right-hand spreadsheet of the tab [6].
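To cross-check the reported values outside the platform, the regression metrics can be sketched in plain Python. This is a minimal illustration, not the platform's implementation: the function name `regression_metrics` is hypothetical, the numbers are made up for the example (only the column names “price” and “kNN Prediction” come from the text above), and \(R^2\) is assumed to be the squared correlation between actual and predicted values.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R^2 for two equally long numeric sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")

    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]

    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n

    # Assumption: R^2 as the squared correlation of actual and predicted values.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov ** 2 / (ss_a * ss_p) if ss_a * ss_p else float("nan")

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}


# Illustrative values standing in for the "price" and "kNN Prediction" columns.
price = [3.0, 5.0, 7.0, 9.0]
knn_prediction = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(price, knn_prediction)
print(metrics)
```

Tools that define \(R^2\) as one minus the ratio of the residual to the total sum of squares can report a different value for the same data, so the definition should be checked before comparing numbers.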
Classification Metrics
The results of the classification are summarized in a confusion matrix, which is a table showing the number of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). From the confusion matrix the following statistical metrics are calculated: accuracy (Eq. 5), precision (Eq. 6), sensitivity/recall (Eq. 7), specificity (Eq. 8), the F1 and F\(_\beta\) scores (Eqs. 9 and 10), and the Matthews correlation coefficient (MCC, Eq. 11) (refs. 1, 2).
 | Actual Positive | Actual Negative |
Predicted Positive | TP | FP |
Predicted Negative | FN | TN |
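The numbered equations are not reproduced in this text. As a hedged reconstruction, the standard definitions in terms of the confusion-matrix counts are assumed for Eqs. 5 to 11:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5} \]

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{6} \]

\[ \mathrm{Sensitivity} = \mathrm{Recall} = \frac{TP}{TP + FN} \tag{7} \]

\[ \mathrm{Specificity} = \frac{TN}{TN + FP} \tag{8} \]

\[ F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9} \]

\[ F_\beta = \frac{\left(1 + \beta^2\right) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}} \tag{10} \]

\[ \mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{11} \]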
Use the Classification Metrics function by browsing in the top ribbon:
Statistics \(\rightarrow\) Model Metrics \(\rightarrow\) Classification Metrics
Input
Data matrix with at least two columns, including the actual and predicted response (target/endpoint) variable values (categorical data, two or more classes).
Configuration
Actual Value Column | Select from the dropdown menu the column name corresponding to the actual (original/observed) class values. |
Prediction Value Column | Select from the dropdown menu the column name corresponding to the predicted class values. |
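beta of F score | The \(\beta\) parameter used to calculate the F\(_\beta\) score. Values of \(\beta\) greater than 1 weight recall more heavily than precision, and values below 1 weight precision more heavily. |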
Output
The confusion matrix and the accuracy statistics.
Example
Input
Import the data matrix that contains the actual [1] and predicted [2] class values into the left-hand spreadsheet of the tab.
Configuration
- Select Statistics \(\rightarrow\) Model Metrics \(\rightarrow\) Classification Metrics.
- From the dropdown lists Actual Value Column [3] and Prediction Value Column [4], select the column names that correspond to the actual and predicted class values, respectively. In this case the column “Species” contains the observed class values and the column “Prediction” contains the values predicted by a developed machine learning model.
- Type the \(\beta\) value for the F\(_\beta\) calculation in the beta of F Score field [5].
- Click on the Execute button [6] to perform the computations.
Output
The right-hand spreadsheet of the tab shows the confusion matrix [7], followed by the overall classification accuracy [8] and the remaining statistical measures [9].
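The way the statistics follow from the confusion-matrix counts can be sketched in plain Python for a two-class case. This is a minimal illustration under the standard metric definitions, not the platform's implementation: the function name `binary_classification_metrics`, its parameters, and the labels and values in the example are all hypothetical.

```python
import math


def binary_classification_metrics(actual, predicted, positive, beta=1.0):
    """Count TP, FP, FN, TN for one positive label and derive the metrics."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "F1": ratio(2 * precision * recall, precision + recall),
        "F_beta": ratio((1 + beta ** 2) * precision * recall,
                        beta ** 2 * precision + recall),
        "MCC": ratio(tp * tn - fp * fn,
                     math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


# Illustrative labels only; "yes" is treated as the positive class.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
predicted = ["yes", "yes", "no", "yes", "no", "no", "no", "no"]

result = binary_classification_metrics(actual, predicted, positive="yes", beta=2.0)
print(result)
```

Because precision and recall are equal in this example, the F1 and F\(_\beta\) scores coincide; they differ whenever precision and recall differ and \(\beta \neq 1\).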
Tips
- In model development, the abovementioned statistical measures can be calculated for both training set predictions (to assess model fit and potential overfitting) and external/test set predictions (to assess generalization and predictive performance).
- In classification, precision, recall/sensitivity, specificity, F1 score, and F\(_\beta\) score are calculated for every class. This matters for imbalanced class distributions, where one class is much more frequent than the other(s): per-class statistics show how well the developed model performs on both common and rare classes, which the overall accuracy alone can hide.
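The point about imbalanced classes can be illustrated with a small one-vs-rest calculation, in which each class in turn is treated as the positive class. This is a minimal sketch with made-up labels, and the function name `per_class_report` is hypothetical; it is not the platform's implementation.

```python
from collections import Counter


def per_class_report(actual, predicted):
    """Return overall accuracy and one-vs-rest precision, recall and specificity."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")

    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))  # (actual, predicted) -> frequency
    total = len(actual)

    def ratio(num, den):
        return num / den if den else float("nan")

    report = {}
    for label in labels:
        tp = counts[(label, label)]
        fp = sum(counts[(other, label)] for other in labels if other != label)
        fn = sum(counts[(label, other)] for other in labels if other != label)
        tn = total - tp - fp - fn
        report[label] = {
            "precision": ratio(tp, tp + fp),
            "recall": ratio(tp, tp + fn),
            "specificity": ratio(tn, tn + fp),
        }

    accuracy = sum(counts[(label, label)] for label in labels) / total
    return accuracy, report


# Illustrative imbalanced data: 18 "common" samples and 2 "rare" samples.
actual = ["common"] * 18 + ["rare"] * 2
predicted = ["common"] * 18 + ["common", "rare"]

accuracy, report = per_class_report(actual, predicted)
print(accuracy)
for label, values in report.items():
    print(label, values)
```

In this example the overall accuracy is high even though only half of the rare-class samples are recovered, which is the situation the per-class statistics are meant to expose.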
See also
For the development of regression models refer to the Regression functions, and for the development of classification models refer to the Classification functions.
Workflows
Bodyfat prediction case study
House pricing case study
Insurance charges case study
MA score case study
Salary prediction case study
Breast cancer case study
Credit card case study
Parkinson’s disease case study
Students’ performance case study
References
1. Naser MZ, Alavi AH. Error Metrics and Performance Fitness Indicators for Artificial Intelligence and Machine Learning in Engineering and Sciences. Archit Struct Constr 2021. doi.org/10.1007/s44150-021-00015-8.
2. Witten IH, Frank E, Hall MA, Pal CJ. Data Mining: Practical Machine Learning Tools and Techniques. 4th ed. Morgan Kaufmann; 2011. doi.org/10.1016/C2009-0-19715-5.
Version History
Introduced in Isalos Analytics Platform v0.1.18
Instructions last updated on May 2024