Classification
Classification is a typical supervised learning technique in predictive modeling that aims to categorize data into predefined classes. Classification analysis is used when the target is a categorical value, and algorithms are trained on labeled datasets to predict the category of unseen objects. There are two types of classification problems: binary classification, where the outcome is one of two distinct classes, and multiclass classification, where the outcome is one of several categories.1
Table of contents
- k Nearest Neighbors (kNN)
- Fully Connected Neural Network
- Radial Basis Function Network
- XGBoost
- J48 Decision Tree
- Random Forest
- Tips
- See also
- References
- Version History
k Nearest Neighbors (kNN)
k Nearest Neighbors (kNN) is a simple non-parametric algorithm that operates by identifying the training-set data points that are most proximate to a new, unseen input. This instance-based learning method calculates the Euclidean distance between instances over all attribute values to determine the closeness of data points. The k parameter denotes the number of nearest neighbors considered for the prediction.1,2 A new instance is assigned the class label that is most frequently represented among its k nearest neighbors, using the inverse distance as the weighting factor. Because Euclidean distances are calculated, scaling of the data is performed within the function.
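The following minimal scikit-learn sketch (an illustration only, not the Isalos implementation) mirrors the behavior described above: features are scaled, Euclidean distances are computed, and neighbor votes are weighted by the inverse distance. The dataset and variable names are illustrative assumptions.

```python
# Distance-weighted kNN sketch (scikit-learn); illustration only,
# not the Isalos implementation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling is part of the pipeline, echoing how the Isalos function
# scales the data internally before computing Euclidean distances.
knn = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, weights="distance", metric="euclidean"),
)
knn.fit(X_train, y_train)

# Apply the trained model to external (test) data.
print(knn.predict(X_test[:5]))
print(knn.score(X_test, y_test))
```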
Use the k Nearest Neighbors classification function by browsing in the top ribbon:
Analytics \(\rightarrow\) Classification \(\rightarrow\) k Nearest Neighbors (kNN)
Input
Data matrix with training set data. The categories of the target feature should be presented as strings.
Configuration
Target Column | Select from the drop-down menu the column containing the target (dependent) variable that is going to be predicted. Columns with numerical features cannot be selected as targets. |
Number of Neighbors | An integer representing the number of closest data points (k) used to make predictions for a new data point. |
Output
A data matrix including the actual target value and the value predicted by the algorithm (“kNN Prediction”). For each data point, the closest neighbors from the training set are listed in the “Closest NN” columns, along with their corresponding distances in the “Distance from NN” columns.
Example
Input
In the left-hand spreadsheet of the tab, import the data matrix including the target variable for prediction.

Configuration
- Select Analytics \(\rightarrow\) Classification \(\rightarrow\) k Nearest Neighbors (kNN).
- Select the column that is going to be predicted from the drop-down menu [1].
- Type the Number of Neighbors [2] to consider.
- Click on the Execute button [3] to apply the training algorithm on the input columns.
Output
In the right-hand spreadsheet of the tab, the output data matrix with the actual and the predicted value of the target is presented. The k most proximate instances identified from the training set are also given for each data point, along with the corresponding Euclidean distance from each neighbor. Note that “Closest NN1” represents the nearest neighbor, which is the data point itself when the model is applied on the training set. Consequently, the “Distance from NN1” is 0 for all given training instances.

Application on external set
You can apply the trained k Nearest Neighbors (kNN) model to any external (test) data using the Existing Model Utilization function:
- Import the external data in the left-hand spreadsheet of the tab. Include the same columns used to build the kNN model.
- Select Analytics \(\rightarrow\) Existing Model Utilization. Select the kNN model [1] and click on the Execute button [2].
- Inspect the results in the right-hand spreadsheet of the tab. Note that in this case the closest neighbor listed in the “Closest NN1” column belongs to the training set, and the “Distance from NN1” is not zero.

Fully Connected Neural Network
A fully connected neural network, also known as a multilayer perceptron (MLP), is a type of feedforward artificial neural network consisting of multiple layers of neurons. It comprises an input layer, one or more hidden layers and an output layer, which are fully connected with each other. A variety of non-linear activation functions are typically used in the hidden layers, allowing the network to learn complex patterns in data. The MLP uses a backpropagation algorithm to train the model and classify instances.1,2
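As a rough illustration of how these hyperparameters interact, the sketch below uses scikit-learn's MLPClassifier as a stand-in for the Isalos function (an assumption; scikit-learn offers fewer activation functions, and all values shown are illustrative). Batch size, number of epochs, learning rate, momentum and the hidden-layer sizes correspond to the configuration fields described below.

```python
# Fully connected (MLP) classifier sketch trained with backpropagation
# via stochastic gradient descent; illustration only.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(32, 16),  # two hidden layers of 32 and 16 neurons
        activation="relu",            # non-linear activation of the hidden layers
        solver="sgd",                 # backpropagation with gradient descent
        batch_size=32,                # Batch Size
        max_iter=200,                 # Number of Epochs
        learning_rate_init=0.01,      # Learning Rate
        momentum=0.9,                 # Momentum
        random_state=42,              # RNG Seed for reproducibility
    ),
)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))
```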
Use the Fully Connected Neural Network function by browsing in the top ribbon:
Analytics \(\rightarrow\) Classification \(\rightarrow\) Fully Connected Neural Network
Input
Data matrix with training set data. The categories of the target feature should be presented as strings. Other categorical features must be encoded into numerical values, presented as either integers or doubles.
Configuration
Batch Size | Select from the drop-down menu the number of training instances that will be propagated through the network. Four options are available for selection: 16, 32, 64, and 128. |
Number of Epochs | Specify the number (integer) of complete passes through the data during training. |
Learning Rate | Specify the learning rate (between 0 and 1), which controls the size of the steps taken during optimization. |
Momentum | Specify the momentum rate (between 0 and 1) for the backpropagation algorithm. |
+/- | Click on the + and - buttons to add or remove hidden layers, respectively. |
Hidden Layers | For each added hidden layer, specify the number of neurons and select the non-linear activation function used to map the weighted inputs to the output of each neuron. Options for the activation functions include: RELU, RELU6, LEAKYRELU, SELU, SWISH, RRELU, SIGMOID, SOFTMAX, SOFTPLUS, SOFTSIGN, TANH, THRESHOLDEDRELU, GELU, ELU, MISH, CUBE, HARDSIGMOID, HARDTANH, IDENTITY, RATIONALTANH, and RECTIFIEDTANH. |
Target Column | Select from the drop-down menu the column containing the feature that is going to be predicted. |
RNG Seed | Select an integer as seed to get reproducible results. The option to select a time-based random number-generated seed is available. |
Output
A data matrix including the actual categorical value and the class predicted by the algorithm (“Prediction”) is presented.
Example
Input
In the left-hand spreadsheet of the tab, import the data matrix including the target variable with at least two distinct categories for prediction. If categorical (string) columns are included in the set, they should be encoded into representative numerical values.

Configuration
- Select Analytics \(\rightarrow\) Classification \(\rightarrow\) Fully Connected Neural Network.
- Select the hyperparameters that determine the training procedure: the Batch Size [1], the Number of Epochs [2], the Learning Rate [3] and the Momentum [4].
- Select the hyperparameters that determine the Hidden Layers [5] of the neural network: the number of neurons [6] and the activation function [7] of each layer.
- Add [8] or remove [9] hidden layers to define the architecture of the neural network.
- Select the column that is going to be predicted from the drop-down menu [10]. Only columns containing categorical features should be selected for prediction.
- Select an RNG Seed for reproducible results or a random number generated Time-based (RNG) Seed [11].
- Click on the Execute button [12] to apply the training algorithm on the input columns.
Output
In the right-hand spreadsheet of the tab, the output data matrix with the actual and the predicted category of the target is presented.

Radial Basis Function Network
A radial basis function (RBF) network is an artificial neural network that employs RBF kernels as activation functions. The network consists of three layers: the input layer, which passes the data vector; the hidden layer, which performs the computations; and the output layer, designated for classification problems. The output layer of the network is a linear combination of the activations (outputs) of the hidden units.5,6
A radial basis function is a real-valued function $\varphi(r)$ whose value depends only on the distance $r$ between an input point $\mathbf{x}$ and the center $\mathbf{c}$ of each neuron, which serves as reference point: $\varphi(\mathbf{x}) = \varphi(\lVert \mathbf{x} - \mathbf{c} \rVert)$ (Eq. 1).3
The radial basis function kernels available in Isalos include:
Gaussian: $\varphi(r) = e^{-(\varepsilon r)^{2}}$ (Eq. 2)
Multiquadric: $\varphi(r) = \sqrt{1 + (\varepsilon r)^{2}}$ (Eq. 3)
Inverse Quadratic: $\varphi(r) = \dfrac{1}{1 + (\varepsilon r)^{2}}$ (Eq. 4)
Inverse Multiquadric: $\varphi(r) = \dfrac{1}{\sqrt{1 + (\varepsilon r)^{2}}}$ (Eq. 5)
Polyharmonic spline: $\varphi(r) = r^{k}$ for odd $k$, and $\varphi(r) = r^{k}\ln(r)$ for even $k$ (Eq. 6), where $k$ is the order of the spline.
Thin Plate Spline: $\varphi(r) = r^{2}\ln(r)$ (Eq. 7)
Bump Function: $\varphi(r) = \exp\!\left(\dfrac{-1}{1-(\varepsilon r)^{2}}\right)$ for $r < \dfrac{1}{\varepsilon}$, and $\varphi(r) = 0$ otherwise (Eq. 8),
where $\varepsilon$ is the shape parameter used to scale the input of the radial kernel.
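To make the three-layer structure concrete, the sketch below (an illustrative assumption, not the Isalos implementation) uses k-means cluster centers as the hidden units, the Gaussian kernel of Eq. 2 for the hidden activations, and a linear (logistic) output layer for the classification.

```python
# RBF network sketch: k-means centers + Gaussian kernel + linear output layer.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_hidden, eps = 10, 1.0  # Hidden Neurons and Epsilon shape parameter (illustrative)
centers = KMeans(n_clusters=n_hidden, n_init=10, random_state=0).fit(X_train).cluster_centers_

def rbf_layer(X):
    r = pairwise_distances(X, centers)  # distance of each point to each center
    return np.exp(-(eps * r) ** 2)      # Gaussian kernel (Eq. 2)

# The output layer is a linear combination of the hidden activations.
out = LogisticRegression(max_iter=1000).fit(rbf_layer(X_train), y_train)
print(out.score(rbf_layer(X_test), y_test))
```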
Use the Radial Basis Function Network classification function by browsing in the top ribbon:
Analytics \(\rightarrow\) Classification \(\rightarrow\) Radial Basis Function Network
Input
Data matrix with training set data. The categories of the target feature should be presented as strings. Other categorical features must be encoded into numerical values, presented as doubles, since RBF does not allow the use of integers.
Configuration
Hidden Neurons | Specify the number of neurons in the hidden layer of the network. |
RBF Kernel | Select from the drop-down menu the radial basis function kernel. Options include: - GAUSSIAN Eq. 2, - MULTIQUADRIC Eq. 3, - INVERSE QUADRATIC Eq. 4, - INVERSE MULTIQUADRIC Eq. 5, - POLYHARMONIC SPLINE Eq. 6, - THIN PLATE SPLINE Eq. 7, and - BUMP FUNCTION Eq. 8. A new configuration field appears accordingly after RBF Kernel selection, for the selection of Epsilon ($\varepsilon$) shape parameter or K ($k$) where applicable. |
Point Selection | Select the method that determines how the centers of the neural network are chosen. Options include: - Random Points from Training set : centers are chosen randomly from the training data. - Use KMeans : RBF centers are the cluster centers of the partitioned training data. |
RNG Seed | Select an integer as seed to get reproducible results. The option to select a time-based random number-generated seed is available. |
Target Column | Select from the drop-down menu the column containing the target variable that is going to be predicted. Columns with numerical features cannot be selected as targets. |
Output
A data matrix including the actual categorical value and the class predicted by the algorithm (“Prediction”) is presented.
Example
Input
In the left-hand spreadsheet of the tab, import the data matrix including the target variable for prediction, containing at least two distinct categories presented as strings. If categorical (string) columns are included in the set, they should be encoded into representative numerical values (doubles).

Configuration
- Select Analytics \(\rightarrow\) Classification \(\rightarrow\) Radial Basis Function Network.
- Type the number (integer) of Hidden Neurons [1].
- Select the RBF Kernel [2] used as the activation function of the hidden layer and subsequently select the Epsilon [3] or K parameter where applicable.
- Select the Point Selection [4] method to determine the centers of the network.
- Select an RNG Seed for reproducible results or a random number generated Time-based (RNG) Seed [5].
- Select the column that is going to be predicted from the drop-down menu [6].
- Click on the Execute button [7] to apply the training algorithm on the input columns.
Output
In the right-hand spreadsheet of the tab, the output data matrix with the actual and the predicted category of the target is presented.

XGBoost
The Extreme Gradient Boosting (XGBoost) open-source library7 is used to implement the gradient boosting framework. The library uses a class of ensemble machine learning algorithms constructed from decision tree models. Ensemble learning operates by combining different individual base learners to obtain a final prediction.8 In an iterative process, trees are added to the ensemble so that the prediction error (loss) of previous models is reduced. In classification tasks, there are different loss functions available for binary and multiclass problems.9
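Because the Isalos function builds on the XGBoost library, the configuration fields below map closely onto the library's own Python API. The sketch that follows shows this mapping (an illustration assuming the xgboost package is installed; values are the documented defaults where noted).

```python
# XGBoost classification sketch; the parameter names map onto the
# configuration fields described below. Illustration only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = XGBClassifier(
    booster="gbtree",      # booster
    n_estimators=100,      # number of estimators
    learning_rate=0.3,     # eta (default 0.3)
    gamma=0.0,             # minimum loss reduction for a split (default 0)
    max_depth=6,           # max depth (default 6)
    min_child_weight=1,    # min child weight (default 1)
    colsample_bytree=1.0,  # column sample by tree
    subsample=1.0,         # sub sample (default 1)
    tree_method="hist",    # tree method
    reg_lambda=1.0,        # lambda, L2 regularization (default 1)
    reg_alpha=0.0,         # alpha, L1 regularization (default 0)
    random_state=42,       # RNG Seed
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```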
Use the XGBoost classification function by browsing in the top ribbon:
Analytics \(\rightarrow\) Classification \(\rightarrow\) XGBoost
Input
Data matrix with training set data. The categories of the target feature should be presented as strings. Other categorical features must be encoded into numerical values, presented as doubles.
Configuration
Target Column | Select from the drop-down menu the column containing the target variable that is going to be predicted. Columns with numerical features cannot be selected as targets. |
booster | Select from the drop-down menu which booster to use. Three options are available for selection, namely: - gbtree : default tree-based models, - dart : tree-based models, and - gblinear : linear functions. |
objective | Select from the drop-down menu the learning objective of the method. Options include: - reg:squarederror : regression with squared loss, - reg:gamma : gamma regression with log-link, whose output is a mean of gamma distribution, and - reg:tweedie : tweedie regression with log-link. |
number of estimators | Type the number of models (integer) to train in the learning ensemble. |
eta | Specify the learning rate (between 0 and 1) which determines the step size shrinkage to prevent overfitting (default value: 0.3). |
gamma | Specify the minimum loss reduction required to make a further partition on a leaf node of the tree (default value: 0). |
max depth | Specify the maximum depth of a tree as a positive integer (default value: 6). |
min child weight | Specify the minimum sum of instance weight (hessian) needed in a child (default value: 1). |
column sample by tree | Specify the subsample ratio of features when constructing each tree. Subsampling will occur once for every tree constructed. |
sub sample | Specify the subsampling ratio (between 0 and 1) of the training instances. Subsampling will occur once in every boosting iteration (default value: 1). |
tree method | Select the tree construction algorithm used in XGBoost. Options include: - auto : use this heuristically to choose the fastest method typically based on the dataset size, - exact : exact greedy algorithm, - approx : approximates the greedy algorithm using quantile sketch and gradient histogram, and - hist : fast histogram optimized approximate greedy algorithm. |
lambda | Specify the L2 regularization term on leaf weights (default value: 1). |
alpha | Specify the L1 regularization term on leaf weights (default value: 0). |
RNG Seed | Select an integer as seed to get reproducible results. The option to select a time-based random number-generated seed is available by clicking on the Time-based RNG Seed checkbox. |
Output
A data matrix including the actual categorical value and the class predicted by the algorithm (“Prediction”) is presented.
Example
Input
In the left-hand spreadsheet of the tab, import the data matrix including the target variable for prediction, containing at least two distinct categories presented as strings. If categorical (string) columns are included in the set, they should be encoded into representative numerical values.

Configuration
- Select Analytics \(\rightarrow\) Classification \(\rightarrow\) XGBoost.
- Select the column that is going to be predicted from the drop-down menu [1].
- Select the tree booster [2] method and the objective function [3] for the loss, and type the number of estimators [4] involved in the ensemble.
- Select the hyperparameters involved in the regularization of the model: eta [5], gamma [6], lambda [12] and alpha [13]. Select the hyperparameters involved in tree construction: max depth [7] and min child weight [8]. Select the column sampling rate by tree [9] and the overall subsampling rate [10]. Default values, data types (double or integer) and acceptable ranges are indicated as guidance on the input parameter values.
- Select the tree construction algorithm [11] used in XGBoost.
- Select an RNG Seed for reproducible results or a random number generated Time-based (RNG) Seed [14].
- Click on the Execute button [15] to apply the training algorithm on the input columns.
Output
In the right-hand spreadsheet of the tab, the output data matrix with the actual and the predicted category of the target is presented.

J48 Decision Tree
J48 is an open-source Java implementation of the C4.5 statistical classifier, which performs classification based on pruned or unpruned decision trees, or on rules generated from them. In this implementation, the algorithm uses the normalized information gain as the criterion to split the dataset into subsets, using the most influential attribute to make the decision at each node. J48 applies the pruning technique, i.e., the removal of branches from a decision tree that contribute little to the predictive performance of the model.1,10
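J48 itself is a Java (Weka) implementation. As a loose Python analogue only, the sketch below uses scikit-learn's CART tree with an entropy (information-gain) split criterion to illustrate the two configuration fields, the minimum sample split and the maximum depth; this is not C4.5's exact algorithm.

```python
# Decision tree sketch with an information-gain-style criterion;
# a rough analogue of J48/C4.5, not the J48 implementation itself.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(
    criterion="entropy",  # split on information gain, as C4.5 does
    min_samples_split=2,  # Minimum Sample Split
    max_depth=4,          # Max Depth
    random_state=0,
)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))
print(export_text(tree))  # text view of the tree, akin to Draw Decision Tree
```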
Use the J48 Decision Tree classification function by browsing in the top ribbon:
Analytics \(\rightarrow\) Classification \(\rightarrow\) J48 Decision Tree
Input
Data matrix with training set data. The categories of the target feature should be presented as strings. Other categorical features must be encoded into numerical values, presented as either integers or doubles.
Configuration
Minimum Sample Split | Specify the minimum number of data samples required to split a node. |
Max Depth | Specify the maximum depth of the decision tree. |
Target Column | Select from the drop-down menu the column containing the feature that is going to be predicted. Columns with numerical features should not be selected as targets. |
Output
A data matrix including the actual categorical value and the class predicted by the algorithm (“Prediction”) is presented. The visualization of the constructed decision tree can be also generated.
Example
Input
In the left-hand spreadsheet of the tab, import the data matrix including the target variable with at least two distinct categories for prediction. If categorical (string) columns are included in the set, they should be encoded into representative numerical values.

Configuration
- Select Analytics \(\rightarrow\) Classification \(\rightarrow\) J48 Decision Tree.
- Select the hyperparameters that determine the structure of the decision tree: the Minimum Sample Split [1] and Max Depth [2]. Default values, data types and acceptable ranges are indicated as guidance on the input parameter values.
- Select the column that is going to be predicted from the drop-down menu [3]. Only columns containing categorical features should be selected for prediction.
- Click on the Execute button [4] to apply the training algorithm on the input columns.
Output
In the right-hand spreadsheet of the tab, the output data matrix with the actual and the predicted category of the target is presented.

After the predictions are obtained, the option Draw Decision Tree [1] appears on the configuration window. Clicking this button will provide a visualization of the constructed decision tree.
The decision tree appears in a new window, offering a detailed view to help understand the decision-making process of the model. Specifically, each node is represented as a box containing information about the selected feature and its threshold used for data splitting. Also, the measured node purity (Gini) is presented, as well as the majority class that is predicted in each node.

Random Forest
The random forest classifier is an ensemble learning method that operates by building multiple randomized decision trees during training and obtaining a category prediction from each tree. The decision trees are constructed in parallel, with no interaction between them, using random subsets of the training data and input attributes to ensure diversity. The classifications made independently by all the trees in the forest are aggregated, and the majority vote is selected as the final class prediction.8,11
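A hedged scikit-learn sketch of the same idea is shown below; its arguments correspond roughly to the configuration fields that follow (Features fraction maps to max_features, Min impurity decrease to min_impurity_decrease, Number of ensembles to n_estimators). The dataset and values are illustrative.

```python
# Random forest sketch: parallel randomized trees aggregated by majority vote.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=10,            # Number of ensembles (default 10)
    max_features=0.9,           # Features fraction (default 0.9)
    min_impurity_decrease=0.1,  # Min impurity decrease (default 0.1)
    random_state=42,            # Seed
)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # majority vote across the trees
```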
Use the Random Forest function by browsing in the top ribbon:
Analytics \(\rightarrow\) Classification \(\rightarrow\) Random Forest
Input
Data matrix with training set data. The categories of the target feature should be presented as strings. Other categorical features must be encoded into numerical values, presented as either integers or doubles.
Configuration
Features fraction | Specify the feature subsampling rate represented as a fraction of features (between 0 and 1) available in each tree split (default value: 0.9). |
Min impurity decrease | Specify the impurity decrease threshold (between 0 and 1) necessary to determine the quality of splits in the decision trees. A split is only considered if it results in a decrease of impurity greater than or equal to this value (default value: 0.1). |
Seed | Select an integer as seed to get reproducible results. The option to select a time-based random number-generated seed is available. |
Number of ensembles | Specify the number of individual trees to be generated by the algorithm (default value: 10). |
Target column | Select from the drop-down menu the column containing the feature that is going to be predicted. Columns with numerical features cannot be selected as targets. |
Output
A data matrix including the actual categorical value and the class predicted by the algorithm (“Prediction”) is presented.
Example
Input
In the left-hand spreadsheet of the tab, import the data matrix including the target variable for prediction, with at least two distinct categories presented as strings. If other categorical (string) features are included in the set, they should be encoded into representative numerical values.

Configuration
- Select Analytics \(\rightarrow\) Classification \(\rightarrow\) Random Forest.
- Select the hyperparameters that determine the structure of the model: the Features fraction [1], Min impurity decrease [2] and Number of ensembles [4]. Default values, data types (double or integer) and acceptable ranges are indicated as guidance on the input parameter values.
- Select a Seed for reproducible results or a random number generated Time-based (RNG) Seed [3].
- Select the column that is going to be predicted from the drop-down menu [5]. Only columns containing categorical features should be selected for prediction.
- Click on the Execute button [6] to apply the training algorithm on the input columns.
Output
In the right-hand spreadsheet of the tab, the output data matrix with the actual and the predicted category of the target is presented.

Tips
k Nearest Neighbors:
- kNN works most efficiently for small-to-medium datasets and low-dimensional data. It is sensitive to missing data.
- The performance of the model is highly influenced by the selection of k.
Radial Basis Function Network:
- The number of neurons in the hidden layer has a high impact on the model performance, since a large number of neurons can lead to overfitting.
XGBoost:
- Be cautious during hyperparameter tuning: choosing smaller eta values, as well as increasing the lambda, alpha and gamma values, results in a more conservative boosting process. Increasing the value of the max depth parameter makes the model more complex and more likely to overfit.
Random Forest:
- This algorithm performs well with datasets that contain missing values. However, it is not as efficient with a large number of sparse features or with categorical variables of many levels that are improperly encoded.
See also
The model generated by any of the k Nearest Neighbors (kNN), Fully Connected Neural Network, Radial Basis Function Network, XGBoost, J48 Decision Tree, or Random Forest classification algorithms can be applied to any input data through the Existing Model Utilization function (e.g., a classification algorithm trained on the training set data of a machine learning model can be applied to the test/external set data).
Workflows
Breast cancer case study
Credit card case study
Parkinson’s disease case study
Students’ performance case study
References
- Witten IH, Frank E, Hall MA, Pal CJ. Data Mining: Practical Machine Learning Tools and Techniques. 4th ed. Morgan Kaufmann; 2011. https://doi.org/10.1016/C2009-0-19715-5.
- Murphy KP. Machine Learning: A Probabilistic Perspective. The MIT Press; 2012. 10.5555/2380985.
- Kohonen T. Self-organized formation of topologically correct feature maps. Biol Cybern 1982;43:59–69. https://doi.org/10.1007/BF00337288.
- Van Hulle MM. Self-organizing Maps. In: Rozenberg G, Bäck T, Kok JN, editors. Handbook of Natural Computing, Berlin, Heidelberg: Springer Berlin Heidelberg; 2012, p. 585–622. https://doi.org/10.1007/978-3-540-92910-9_19.
- Lee C-C, Chung P-C, Tsai J-R, Chang C-I. Robust radial basis function neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 1999;29:674–85. https://doi.org/10.1109/3477.809023.
- Ghosh J, Nag A. An Overview of Radial Basis Function Networks. In: Howlett RJ, Jain LC, editors. Radial Basis Function Networks 2: New Advances in Design, Heidelberg: Physica-Verlag HD; 2001, p. 1–36. https://doi.org/10.1007/978-3-7908-1826-0_1.
- XGBoost Parameters — xgboost 2.1.0-dev documentation n.d. https://xgboost.readthedocs.io/en/latest/parameter.html (accessed June 3, 2024).
- Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York, NY: Springer New York; 2009. https://doi.org/10.1007/978-0-387-84858-7.
- Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, p. 785–94. https://doi.org/10.1145/2939672.2939785.
- Salzberg SL. C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Mach Learn 1994;16:235–40. https://doi.org/10.1007/BF00993309.
- Breiman L. Random Forests. Machine Learning 2001;45:5–32. https://doi.org/10.1023/A:1010933404324.
Version History
Introduced in Isalos Analytics Platform v0.1.18
Instructions last updated in January 2025