Model Metrics

The performance of a machine learning model can be assessed with statistical metrics that quantify the accuracy of its predictions (how closely the predictions match the true values) and help identify possible underfitting or overfitting. A regression model is assessed by how close the actual ($y_i$) and predicted ($\hat{y}_i$) target values are, usually by calculating the differences (residuals) between them. A classification model is evaluated from the number of correct predictions and the number of misclassifications.

Regression Metrics
Classification Metrics
Tips
See also
References
Version History

Regression Metrics

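The computed statistical metrics include the mean squared error (MSE, Eq. 1), the root mean squared error (RMSE, Eq. 2), the mean absolute error (MAE, Eq. 3), and R squared ($R^2$, Eq. 4), calculated as the squared Pearson correlation coefficient between the actual and predicted values$^{1,2}$.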

$$ \begin{equation} \text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 {\qquad [1] \qquad} \end{equation} $$

$$ \begin{equation} \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} {\qquad [2] \qquad} \end{equation} $$

$$ \begin{equation} \text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| {\qquad [3] \qquad} \end{equation} $$

$$ \begin{equation} R^2 = \left( \frac{\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2} \sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}} \right)^2 {\qquad [4] \qquad} \end{equation} $$

where $y_i$ and $\hat{y}_i$ are the actual and predicted target values of the $i$-th sample, $N$ is the number of samples, and $\bar{y}$ and $\bar{\hat{y}}$ are the means of the actual and predicted values, respectively.
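To check the reported values outside the platform, Eqs. 1 to 4 can be reproduced in a few lines. The following is a minimal Python sketch using only the standard library; the function name `regression_metrics` and the input values are hypothetical illustrations, not part of Isalos. The sketch follows Eq. 4 (squared Pearson correlation), which can differ from the coefficient of determination $1 - SS_{res}/SS_{tot}$ reported by some libraries, such as scikit-learn's `r2_score`.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, MAE and R^2 (Eqs. 1 to 4) for paired values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    n = len(actual)
    residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]

    mse = sum(r * r for r in residuals) / n   # Eq. 1
    rmse = math.sqrt(mse)                     # Eq. 2
    mae = sum(abs(r) for r in residuals) / n  # Eq. 3

    # Eq. 4: squared Pearson correlation between actual and predicted values
    y_bar = sum(actual) / n
    y_hat_bar = sum(predicted) / n
    cov = sum((y - y_bar) * (p - y_hat_bar) for y, p in zip(actual, predicted))
    ss_y = sum((y - y_bar) ** 2 for y in actual)
    ss_p = sum((p - y_hat_bar) ** 2 for p in predicted)
    if ss_y == 0 or ss_p == 0:
        r2 = math.nan  # undefined when either column is constant
    else:
        r2 = (cov / (math.sqrt(ss_y) * math.sqrt(ss_p))) ** 2

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}


# Illustrative values only
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0])
print(metrics)  # MSE 0.375, RMSE ~0.612, MAE 0.5, R2 ~0.939
```

Because Eq. 4 measures only linear association, predictions that are off by a constant factor (for example, $\hat{y}_i = 2y_i$) still give $R^2 = 1$, whereas MSE, RMSE and MAE expose the error. Reading $R^2$ together with the error metrics is therefore advisable.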

To use the Regression Metrics function, navigate in the top ribbon to:

Statistics $\rightarrow$ Model Metrics $\rightarrow$ Regression Metrics

Input

Data matrix with at least two columns, including the actual and predicted response (target/endpoint) variable values (numerical data).

Configuration

Actual Value Column	Select from the dropdown menu the column name corresponding to the actual target values.
Prediction Value Column	Select from the dropdown menu the column name corresponding to the predicted target values.

Output

Table with the computed statistical metrics.

Example

Input

Import the data matrix containing the actual [1] and predicted [2] target values into the left-hand spreadsheet of the tab.

Configuration

Select Statistics $\rightarrow$ Model Metrics $\rightarrow$ Regression Metrics.
In the Actual Value Column [3] and Prediction Value Column [4] dropdown lists, select the columns that contain the actual and predicted target values, respectively. In this case, the column “price” contains the observed target values and the column “kNN Prediction” contains the values predicted by a developed machine learning model.
Click on the Execute button [5] to perform computations.

Output

The computed metric values are presented in the right-hand spreadsheet of the tab [6].

Classification Metrics

The results of a classification are summarized in a confusion matrix, a table showing the numbers of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). From the confusion matrix, the following statistical metrics are calculated: accuracy (Eq. 5), precision (Eq. 6), sensitivity/recall (Eq. 7), specificity (Eq. 8), the $F_1$ and $F_\beta$ scores (Eqs. 9 and 10), and the Matthews correlation coefficient (MCC, Eq. 11)$^{1,2}$.

	Actual Positive	Actual Negative
Predicted Positive	TP	FP
Predicted Negative	FN	TN
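The four counts in the table above can be tallied directly from paired actual and predicted labels once one class is designated as positive. The following is a minimal Python sketch; the function name `confusion_counts` and the example labels are hypothetical illustrations.

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, FP, TN, FN), treating the label `positive` as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


# Hypothetical labels for illustration
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
print(confusion_counts(actual, predicted, positive="yes"))  # (2, 1, 2, 1)
```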

$$ \begin{equation} \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} {\qquad [5] \qquad} \end{equation} $$

$$ \begin{equation} \text{Precision} = \frac{TP}{TP + FP} {\qquad [6] \qquad} \end{equation} $$

$$ \begin{equation} \text{Sensitivity/Recall} = \frac{TP}{TP + FN} {\qquad [7] \qquad} \end{equation} $$

$$ \begin{equation} \text{Specificity} = \frac{TN}{TN + FP} {\qquad [8] \qquad} \end{equation} $$

$$ \begin{equation} F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} {\qquad [9] \qquad} \end{equation} $$

$$ \begin{equation} F_\beta = (1 + \beta^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{(\beta^2 \cdot \text{Precision}) + \text{Recall}} {\qquad [10] \qquad} \end{equation} $$

$$ \begin{equation} MCC = \frac{ TP \times TN - FP \times FN }{ \sqrt{ (TP + FP)(TP + FN)(TN + FP)(TN + FN) } } {\qquad [11] \qquad} \end{equation} $$
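Eqs. 5 to 11 can all be computed from the four confusion-matrix counts. The following is a minimal Python sketch; the function names are hypothetical, and returning NaN when a denominator is zero is an assumption of this sketch (the platform may handle such cases differently). The counts in the usage example are illustrative only.

```python
import math


def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero (an assumed convention)."""
    return num / den if den else math.nan


def classification_metrics(tp, fp, tn, fn, beta=1.0):
    """Compute the metrics of Eqs. 5 to 11 from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)  # Eq. 6
    recall = _ratio(tp, tp + fn)     # Eq. 7
    b2 = beta ** 2
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),           # Eq. 5
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),                       # Eq. 8
        "F1": _ratio(2 * precision * recall, precision + recall),  # Eq. 9
        "F_beta": _ratio((1 + b2) * precision * recall,
                         b2 * precision + recall),                 # Eq. 10
        "MCC": _ratio(tp * tn - fp * fn,                           # Eq. 11
                      math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


# Illustrative counts only
m = classification_metrics(tp=40, fp=10, tn=45, fn=5, beta=2.0)
for name, value in m.items():
    print(f"{name:12s} {value:.3f}")
```

In this example recall (about 0.889) is higher than precision (0.800), so with β = 2, which weights recall more heavily, $F_2$ (about 0.870) exceeds $F_1$ (about 0.842). A β below 1 would instead favour precision, and β = 1 reduces Eq. 10 to Eq. 9.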

To use the Classification Metrics function, navigate in the top ribbon to:

Statistics $\rightarrow$ Model Metrics $\rightarrow$ Classification Metrics

Input

Data matrix with at least two columns, including the actual and predicted response (target/endpoint) variable values (categorical data, two or more classes).

Configuration

Actual Value Column	Select from the dropdown menu the column name corresponding to the actual (original/observed) class values.
Prediction Value Column	Select from the dropdown menu the column name corresponding to the predicted class values.
beta of F score	The $\beta$ parameter used to calculate the $F_\beta$ score (Eq. 10).

Output

The confusion matrix and the computed classification metrics.
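When the target has more than two classes, the confusion matrix becomes a $k \times k$ table, and the overall accuracy is the sum of its diagonal divided by the number of samples. The following is a minimal Python sketch that assumes the same orientation as the binary table in the Classification Metrics section (rows: predicted class, columns: actual class); the layout of the platform's output may differ, and the labels "A", "B" and "C" are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix); matrix[i][j] counts samples predicted as
    labels[i] whose actual class is labels[j] (rows: predicted, columns: actual)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(predicted, actual))  # keys are (predicted, actual) pairs
    matrix = [[counts[(p, a)] for a in labels] for p in labels]
    return labels, matrix


# Hypothetical three-class labels
actual = ["A", "B", "C", "B", "C", "A"]
predicted = ["A", "C", "C", "B", "B", "A"]
labels, matrix = confusion_matrix(actual, predicted)
for label, row in zip(labels, matrix):
    print(label, row)

# Overall accuracy: correct predictions lie on the diagonal
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)
print(f"accuracy = {accuracy:.3f}")
```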

Example

Input

Import the data matrix containing the actual [1] and predicted [2] class values into the left-hand spreadsheet of the tab.

Configuration

Select Statistics $\rightarrow$ Model Metrics $\rightarrow$ Classification Metrics.
In the Actual Value Column [3] and Prediction Value Column [4] dropdown lists, select the columns that contain the actual and predicted class values, respectively. In this case, the column “Species” contains the observed class values and the column “Prediction” contains the values predicted by a developed machine learning model.
Type the $\beta$ value for the $F_\beta$ calculation in the beta of F Score field [5].
Click on the Execute button [6] to perform computations.

Output

The right-hand spreadsheet of the tab displays the confusion matrix [7], followed by the overall classification accuracy [8] and the remaining statistical metrics [9].

Tips

During model development, the above-mentioned metrics can be calculated both for training set predictions (to assess model fit and potential overfitting) and for external/test set predictions (to assess generalization and predictive performance).
In classification, precision, recall/sensitivity, specificity, the $F_1$ score and the $F_\beta$ score are calculated for every class, each time treating that class as the positive one. This is crucial when the class distribution is imbalanced, i.e., when one class is much more frequent than the other(s): the overall accuracy can then be high even if the rare class is poorly predicted. Per-class statistics show how well the developed model performs on both common and rare classes.
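The per-class evaluation described in the tips above can be sketched by treating each class in turn as the positive class (one-vs-rest). This is a hypothetical Python illustration, not the platform's implementation; undefined ratios are returned as NaN here, whereas other tools may report 0. The toy data mimic a model that always predicts the majority class: its accuracy is 0.8, yet the recall of the rare class is 0.

```python
import math


def per_class_metrics(actual, predicted):
    """One-vs-rest precision, recall, specificity and F1 for every class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    results = {}
    for label in sorted(set(actual) | set(predicted)):
        # Treat `label` as the positive class and all other classes as negative
        tp = sum(1 for a, p in pairs if a == label and p == label)
        fp = sum(1 for a, p in pairs if a != label and p == label)
        fn = sum(1 for a, p in pairs if a == label and p != label)
        tn = len(pairs) - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else math.nan
        recall = tp / (tp + fn) if tp + fn else math.nan
        specificity = tn / (tn + fp) if tn + fp else math.nan
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else math.nan)
        results[label] = {"precision": precision, "recall": recall,
                          "specificity": specificity, "F1": f1}
    return results


# Hypothetical imbalanced data: the model always predicts the majority class
actual = ["neg"] * 8 + ["pos"] * 2
predicted = ["neg"] * 10
for label, scores in per_class_metrics(actual, predicted).items():
    print(label, {k: round(v, 3) for k, v in scores.items()})
```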

References

Naser MZ, Alavi AH. Error Metrics and Performance Fitness Indicators for Artificial Intelligence and Machine Learning in Engineering and Sciences. Archit Struct Constr. 2021. doi.org/10.1007/s44150-021-00015-8.
Witten IH, Frank E, Hall MA, Pal CJ. Data Mining: Practical Machine Learning Tools and Techniques. 4th ed. Morgan Kaufmann; 2011. doi.org/10.1016/C2009-0-19715-5.

Version History

Introduced in Isalos Analytics Platform v0.1.18

Instructions last updated in May 2024

See also

Workflows

Bodyfat prediction case study
House pricing case study
Insurance charges case study
MA score case study
Salary prediction case study
Breast cancer case study
Credit card case study
Parkinson’s disease case study
Students’ performance case study
