Descriptive Statistics
Descriptive statistics is a collection of analytical methods used to summarize or describe a given dataset. These methods include measures of central tendency (e.g., mean, median, and mode) and measures of variability (e.g., range, variance, and standard deviation). Descriptive statistics allow the distribution and patterns within a dataset to be evaluated quickly, without relying on additional assumptions. They offer insight into a dataset in a straightforward manner and without the need for graphical representation, which may not be feasible for large datasets.¹
The statistical measures available in Isalos, applicable to both a population and a sample of a population, include:
Minimum
The minimum (min) is the smallest value in a dataset, defining the lower boundary of the data range. There is no distinction in calculation between a sample and a population for this measure.
Maximum
The maximum (max) is the largest value in a dataset, defining the upper boundary of the data range. Like the minimum, the calculation does not vary between a sample and a population.
Range
The range is a measure of the spread between the maximum and minimum values in the dataset, calculated as:
$$\text{Range} = x_{\max} - x_{\min}$$
This measure does not differ between population and sample data.
Size/Count
The size or count is the total number of data points in the dataset, denoted as $n$. It is used as the denominator in many statistical formulas to calculate averages and variations. Both population and sample datasets use the same method for counting data points.
Sum
The sum is the total of all numerical values in a dataset, calculated as:
$$\text{Sum} = \sum_{i=1}^{n} x_{i}$$
Here, $x_{i}$ represents each value in the dataset. The calculation is identical whether dealing with a sample or a population.
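The measures defined so far (minimum, maximum, range, count, and sum) can be sketched in a few lines of Python. This is only an illustration of the definitions, not Isalos code, and the values in `data` are made up for the example.

```python
# Illustrative values only; they are not taken from Isalos.
data = [2, 4, 4, 4, 5, 5, 7, 9]

minimum = min(data)               # smallest value
maximum = max(data)               # largest value
value_range = maximum - minimum   # range = max - min
count = len(data)                 # n, the number of data points
total = sum(data)                 # sum of all x_i

print(minimum, maximum, value_range, count, total)  # 2 9 7 8 40
```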
Median
The median is the middle value of a dataset when ordered. The method does not vary between sample and population datasets.
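If $n$ is odd, the median is the value at position $p$ of the ordered data $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ (where $p = \frac{n + 1}{2}$):
$$\text{Median} = x_{(p)}$$
If $n$ is even, it is the average of the two middle values at positions $p$ and $p+1$ of the ordered data (where $p = \frac{n}{2}$):
$$\text{Median} = \frac{x_{(p)} + x_{(p+1)}}{2}$$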
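The odd and even cases can be sketched as follows. The function name `median_value` is hypothetical and chosen for this example; the positions $p$ and $p+1$ are 1-based in the text, so the code subtracts 1 when indexing the sorted list.

```python
def median_value(values):
    """Median following the odd/even rule described above."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    if n % 2 == 1:
        p = (n + 1) // 2              # 1-based middle position
        return ordered[p - 1]
    p = n // 2                        # 1-based position of the lower middle value
    return (ordered[p - 1] + ordered[p]) / 2

print(median_value([7, 1, 3]))      # 3
print(median_value([7, 1, 3, 5]))   # 4.0
```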
Mode
The mode is the value that appears most frequently in a dataset. A dataset may have one mode, multiple modes, or no mode if no number repeats more frequently than others. This calculation is consistent for both population and sample data.
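A minimal sketch of the three cases (one mode, several modes, no mode) is given below. The helper `modes` is hypothetical, and it assumes that a dataset in which every distinct value occurs equally often has no mode; how Isalos treats such ties is not stated here.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumption: if all distinct values are equally frequent, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 4, 4, 4, 5, 5, 7, 9]))  # [4]
print(modes([1, 1, 2, 2, 3]))           # [1, 2]
print(modes([1, 2, 3]))                 # []
```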
Midrange
The midrange (MR) is the average of the maximum and minimum values of the dataset, calculated as:
$$MR = \frac{x_{\max} + x_{\min}}{2}$$
This calculation is the same for both sample and population data.
Root Mean Square
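The root mean square (RMS) provides a measure of the magnitude of a set of numbers. It is related to the standard deviation: the population standard deviation is the RMS of the deviations from the mean. The RMS is calculated by taking the square root of the average of the squares of the numbers:
$$RMS = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_{i}^{2}}$$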
This formula is consistent for both population and sample datasets.
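The midrange and the RMS can be sketched as below, using made-up values. The last lines illustrate that the RMS of the deviations from the mean coincides with the population standard deviation.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
n = len(data)

midrange = (max(data) + min(data)) / 2           # (9 + 2) / 2 = 5.5
rms = math.sqrt(sum(x * x for x in data) / n)    # sqrt(232 / 8) = sqrt(29)

# RMS of the deviations from the mean: this is the population standard deviation.
mean = sum(data) / n
rms_of_deviations = math.sqrt(sum((x - mean) ** 2 for x in data) / n)  # sqrt(32 / 8) = 2.0

print(midrange, rms, rms_of_deviations)
```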
Mean
The mean, often referred to as the average, is a measure of central tendency obtained by summing all the numerical values in a dataset and dividing by the count of the values. The equation for calculating the mean is:
For populations:
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_{i}$$
For samples:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$$
Mean Absolute Deviation
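The mean absolute deviation (MAD) is the average of the absolute deviations from the mean, providing a measure of variability. Here $\mu$ denotes the population mean and $\bar{x}$ the sample mean.
For populations:
$$MAD = \frac{1}{n}\sum_{i=1}^{n} \left| x_{i} - \mu \right|$$
For samples:
$$MAD = \frac{1}{n}\sum_{i=1}^{n} \left| x_{i} - \bar{x} \right|$$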
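A short sketch of the mean and the mean absolute deviation follows. It assumes the usual definition in which the absolute deviations are averaged over all $n$ values; the data are illustrative.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
n = len(data)

mean = sum(data) / n                          # 40 / 8 = 5.0
# Assumption: absolute deviations from the mean are averaged over n.
mad = sum(abs(x - mean) for x in data) / n    # 12 / 8 = 1.5

print(mean, mad)  # 5.0 1.5
```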
Standard Deviation
Standard deviation measures the spread of data around the mean.
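For populations:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_{i} - \mu)^{2}}$$
For samples:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}$$
Here $\mu$ is the population mean and $\bar{x}$ is the sample mean. The difference lies in the denominator, where $n-1$ (Bessel’s correction) is used for samples so that the sample variance $s^{2}$ is an unbiased estimator of the population variance.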
Variance
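Variance is the average squared deviation from the mean, where $\mu$ is the population mean and $\bar{x}$ is the sample mean.
For populations:
$$\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n} (x_{i} - \mu)^{2}$$
For samples:
$$s^{2} = \frac{1}{n-1}\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}$$
As with the standard deviation, the sample formula divides by $n-1$ rather than $n$.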
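The population and sample versions of the standard deviation and variance can be compared with Python's standard `statistics` module, whose `pvariance`/`pstdev` functions divide by $n$ and whose `variance`/`stdev` functions divide by $n-1$. This is a sketch with made-up data, not a description of how Isalos computes these values.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

pop_variance = statistics.pvariance(data)     # sum of squares / n       = 32 / 8
sample_variance = statistics.variance(data)   # sum of squares / (n - 1) = 32 / 7
pop_sd = statistics.pstdev(data)              # square root of the population variance
sample_sd = statistics.stdev(data)            # square root of the sample variance

print(pop_variance, sample_variance, pop_sd, sample_sd)
```

For the same data, the sample values are always slightly larger than the population values, and the gap shrinks as $n$ grows.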
Quartiles
Quartiles divide the data into four equal parts. $Q1$ is the median of the lower half, $Q2$ is the median of the dataset, and $Q3$ is the median of the upper half. The method for calculating quartiles is consistent across sample and population data.
Interquartile Range
The interquartile range (IQR) is the difference between the third and first quartiles ($Q3 - Q1$), measuring the spread of the middle 50% of the data. There is no variation in calculating IQR between samples and populations.
Outliers
Outliers are data points significantly distant from other observations. They are typically identified using the IQR; values below $Q1 - 1.5 \times IQR$ or above $Q3 + 1.5 \times IQR$ are considered outliers. This identification method applies equally to sample and population data.
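The quartile, IQR, and outlier rules can be sketched as follows. The helper names are hypothetical, and the sketch assumes that for an odd number of values the overall median is excluded from both halves; other quartile conventions exist and may give slightly different results.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 as medians of the lower half, the whole set, and the upper half."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two values are needed")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

def iqr_outliers(values):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

data = [2, 4, 4, 4, 5, 5, 7, 9, 15]  # illustrative values only
print(quartiles(data))      # Q1 = 4, Q2 = 5, Q3 = 8
print(iqr_outliers(data))   # [15], since 15 > 8 + 1.5 * 4 = 14
```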
Sum of Squares
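The sum of squares (SS) is the sum of squared differences from the mean, where $\mu$ is the population mean and $\bar{x}$ is the sample mean.
For populations:
$$SS = \sum_{i=1}^{n} (x_{i} - \mu)^{2}$$
For samples:
$$SS = \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}$$
This foundational calculation underpins variance and standard deviation: dividing SS by $n$ (population) or $n-1$ (sample) gives the variance.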
Standard Error of the Mean
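The standard error of the mean (SEM) measures how precisely the mean of a sample of size $n$ estimates the population mean. It uses the population standard deviation $\sigma$ when this is known and the sample standard deviation $s$ otherwise.
For populations:
$$SEM = \frac{\sigma}{\sqrt{n}}$$
For samples:
$$SEM = \frac{s}{\sqrt{n}}$$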
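The sum of squares and the standard error of the mean can be sketched together, since both build on the deviations from the mean. The data are illustrative, and the two SEM variants simply divide the population or sample standard deviation by $\sqrt{n}$.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
n = len(data)
mean = sum(data) / n

sum_of_squares = sum((x - mean) ** 2 for x in data)      # 32.0

sem_population = statistics.pstdev(data) / math.sqrt(n)  # sigma / sqrt(n)
sem_sample = statistics.stdev(data) / math.sqrt(n)       # s / sqrt(n)

print(sum_of_squares, sem_population, sem_sample)
```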
Skewness
Skewness ($\gamma_1$) is a measure of the asymmetry of the probability distribution of a real-valued random variable. Positive skewness indicates a distribution with an asymmetric tail extending towards more positive values, and negative skewness indicates a tail that extends towards more negative values.
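For populations, with $\mu$ the population mean and $\sigma$ the population standard deviation:
$$\gamma_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_{i} - \mu)^{3}}{\sigma^{3}}$$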
For samples:
Kurtosis
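Kurtosis ($\beta_2$) measures the “tailedness” of the probability distribution of a real-valued random variable. High kurtosis indicates heavy tails, meaning that extreme values (outliers) are more likely than under a normal distribution.
For populations, with $\mu$ the population mean and $\sigma$ the population standard deviation:
$$\beta_2 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_{i} - \mu)^{4}}{\sigma^{4}}$$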
For samples:
Kurtosis Excess
Excess kurtosis ($\alpha_4$) is calculated by subtracting 3 from the standard kurtosis measurement. This adjustment helps to compare the tails of the distribution to those of a normal distribution, which has a kurtosis of 3.
For populations, with $\beta_2$ the population kurtosis:
$$\alpha_4 = \beta_2 - 3$$
For samples:
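The population (moment-based) forms of skewness, kurtosis, and excess kurtosis can be sketched as below. Only the population forms are shown, because several bias-corrected sample formulas are in common use and the one applied by Isalos is not assumed here. The function name `central_moment` is hypothetical and the data are illustrative.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over all n values."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

m2 = central_moment(data, 2)   # population variance: 4.0
m3 = central_moment(data, 3)   # 42 / 8 = 5.25
m4 = central_moment(data, 4)   # 356 / 8 = 44.5

skewness = m3 / m2 ** 1.5      # third moment / sigma^3  = 5.25 / 8
kurtosis = m4 / m2 ** 2        # fourth moment / sigma^4 = 44.5 / 16
excess_kurtosis = kurtosis - 3

print(skewness, kurtosis, excess_kurtosis)
```

The positive skewness reflects the longer right tail of these values, and the negative excess kurtosis indicates tails lighter than those of a normal distribution.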
Coefficient of Variation
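The coefficient of variation (CV) is a standardized measure of dispersion of a probability distribution or frequency distribution, defined as the ratio of the standard deviation to the mean.
For populations, with standard deviation $\sigma$ and mean $\mu$:
$$CV = \frac{\sigma}{\mu}$$
For samples, with standard deviation $s$ and mean $\bar{x}$:
$$CV = \frac{s}{\bar{x}}$$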
This measure is useful because it allows comparison between distributions with different units or means.
Relative Standard Deviation
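The relative standard deviation (RSD) is expressed as a percentage and describes the dispersion or variability relative to the magnitude of the mean of the data.
For populations, with standard deviation $\sigma$ and mean $\mu$:
$$RSD = \frac{\sigma}{|\mu|} \times 100\%$$
For samples, with standard deviation $s$ and mean $\bar{x}$:
$$RSD = \frac{s}{|\bar{x}|} \times 100\%$$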
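The coefficient of variation and the relative standard deviation can be sketched as follows. The sketch assumes that the RSD is the absolute value of the CV expressed as a percentage, which is the common convention; the data are illustrative and have a positive mean, so the absolute value has no effect here.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
mean = statistics.mean(data)     # 5

cv_population = statistics.pstdev(data) / mean   # 2.0 / 5 = 0.4
cv_sample = statistics.stdev(data) / mean        # uses the n - 1 standard deviation

# Assumption: RSD = |CV| * 100 (a percentage).
rsd_population = abs(cv_population) * 100
rsd_sample = abs(cv_sample) * 100

print(cv_population, rsd_population)
print(cv_sample, rsd_sample)
```

Both measures are undefined when the mean is zero and become unstable when it is close to zero.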
Frequency
Frequency ($f_{i}$) refers to the number of times a particular data point or value appears in a dataset. It is a simple yet fundamental measure used to describe the distribution of data values. The basic formula for frequency is:
$$f_{i} = \text{number of occurrences of } x_{i} \text{ in the dataset}$$
where $f_{i}$ represents the frequency of the $i$-th distinct value $x_{i}$. The frequencies of all distinct values add up to $n$.
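Counting frequencies can be sketched with `collections.Counter` from the Python standard library; the data are illustrative.

```python
from collections import Counter

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

frequency = Counter(data)        # maps each distinct value x_i to its count f_i
for value in sorted(frequency):
    print(value, frequency[value])

# The frequencies of all distinct values add up to n.
total_count = sum(frequency.values())
```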
Use the Descriptive Statistics function by browsing in the top ribbon:
Statistics \(\rightarrow\) Descriptive Statistics
Input
The input spreadsheet consists of one or more columns, each representing a distinct dataset for which descriptive statistics will be computed.
Configuration
List of statistics | Select the appropriate methods for conducting descriptive statistics analysis by clicking on the corresponding radio buttons. The available statistics are described above. |
Select Type | Select from the dropdown list the option that describes whether the data represents a sample or a population. |
Output
The output sheet contains the selected statistical measures, computed for each dataset of the input sheet. The results display the statistical measures alongside their corresponding datasets in the respective columns.
Example
Input
The input sheet on the left-hand side is organized with individual columns representing each dataset. Insert the datasets in a manner that allows for independent analysis of each one when performing descriptive statistics.

Configuration
- Select Statistics \(\rightarrow\) Descriptive Statistics.
- Click on the statistics that you wish to calculate for the input data [1] and determine whether the data correspond to Population or Sample [2].
- Click on the Execute button [3] to calculate the selected statistics.
Output
The right-hand side of the output sheet displays individual datasets per column, accompanied by the corresponding statistical measure chosen on each row.
References
- Jones, J.S., Exploratory and descriptive statistics. 2022: SAGE Publications Limited.
Version History
Introduced in Isalos Analytics Platform v0.2.4
Instructions last updated on July 2024