Branch of statistics that describes or summarizes data.
Descriptive Statistics
Differentiate Exploratory Data Analysis (EDA) from Descriptive Statistics
Exploratory Data Analysis (EDA) helps you understand your data.
Descriptive statistics help you explain your data to others.
Ways of Describing Data
Frequency Distribution → shows values and how often they occur.
Bar Graph → for nominal/ordinal data.
Histogram → for interval/ratio data.
Frequency Polygon → plots points at class midpoints instead of bars.
A way of describing data that presents the score values and their frequency of occurrence.
Frequency Distribution
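A minimal Python sketch of building a frequency distribution from raw scores. The `frequency_distribution` helper and the example scores are hypothetical, and values are listed from lowest to highest.

```python
from collections import Counter

def frequency_distribution(scores):
    """Return (score value, frequency) pairs, sorted from lowest value to highest."""
    counts = Counter(scores)  # maps each distinct value to how often it occurs
    return sorted(counts.items())

scores = [3, 5, 4, 5, 2, 5, 3, 4]  # hypothetical example data
for value, f in frequency_distribution(scores):
    print(value, f)
```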
How frequency distributions of nominal or ordinal data are customarily plotted
Bar Graph
Used to represent frequency distributions composed of interval or ratio data using adjacent bars
Histogram
Used to represent interval or ratio data using a point that is plotted over the midpoint of each interval at a height corresponding to the frequency of the interval; the points are then joined by straight lines
Frequency Polygon
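A Python sketch of grouping interval/ratio scores into class intervals and getting the midpoint and frequency of each, which are the points of a frequency polygon (and the bar positions and heights of a histogram). It assumes equal-width, half-open intervals [lower, upper), so the midpoint of 10 to 20 is 15 (texts using limits such as 10-19 would put it at 14.5). `grouped_frequencies` and the data are hypothetical.

```python
def grouped_frequencies(scores, start, width, n_intervals):
    """Count scores in equal-width class intervals [lower, lower + width).

    Returns a list of (lower, upper, midpoint, frequency) tuples.
    """
    rows = []
    for i in range(n_intervals):
        lower = start + i * width
        upper = lower + width
        f = sum(1 for s in scores if lower <= s < upper)  # scores in this interval
        midpoint = (lower + upper) / 2
        rows.append((lower, upper, midpoint, f))
    return rows

scores = [12, 15, 17, 21, 22, 22, 25, 28, 31, 34]  # hypothetical data
rows = grouped_frequencies(scores, start=10, width=10, n_intervals=3)
# A histogram draws one bar per interval; a frequency polygon plots
# (midpoint, frequency) for each interval and joins the points.
polygon_points = [(mid, f) for _, _, mid, f in rows]
print(polygon_points)
```

Many texts also add a zero-frequency point one interval beyond each end so the polygon meets the horizontal axis.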
indicates the proportion of the total number of scores in each interval.
Relative Frequency Distribution
indicates the number of scores that fall below the upper limit of each interval.
Cumulative Frequency Distribution
indicates the percentage of scores that fall below the upper limit of each interval.
Cumulative Percentage Distribution
f / N (frequency of the interval ÷ total number of scores)
Relative Frequency
Frequency of the interval + the frequencies of all class intervals below it.
Cumulative Frequency
cumulative f / N × 100
Cumulative Percentage
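A sketch that applies the relative frequency, cumulative frequency, and cumulative percentage formulas to a list of interval frequencies (lowest interval first). `distribution_table` and the counts are hypothetical.

```python
def distribution_table(freqs):
    """Given interval frequencies ordered from lowest interval to highest,
    return rows of (f, relative f, cumulative f, cumulative %)."""
    n = sum(freqs)  # N, the total number of scores
    rows = []
    cumulative = 0
    for f in freqs:
        cumulative += f  # this interval + all intervals below it
        # cumulative f / N x 100, multiplied first to limit float rounding
        rows.append((f, f / n, cumulative, 100 * cumulative / n))
    return rows

freqs = [3, 5, 2]  # hypothetical counts for three intervals, lowest first
for row in distribution_table(freqs):
    print(row)
```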
also known as the Gaussian Distribution
Normal Distribution
Symmetrical and bell-shaped
Curves outward at the top and then inward nearer the bottom, with the tails getting thinner and thinner
Normal Distribution
Note: Real data are rarely exactly normal, but as long as the distribution is close to a normal distribution, small departures will not matter too much.
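A small Python check of the symmetry and the thinning tails, using the standard formula for the normal density with mean μ (`mu`) and standard deviation σ (`sigma`). The function name `normal_pdf` is hypothetical.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal (Gaussian) distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Symmetry: points equally far above and below the mean have equal density,
# and the density keeps shrinking as we move further into the tails.
for d in (0.0, 1.0, 2.0, 3.0):
    print(d, normal_pdf(d), normal_pdf(-d))
```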
a non-symmetrical distribution
skewed distribution
the curve rises rapidly and then drops off slowly
Positive Skew
📌 In simpler terms:
The tail of the distribution is stretched out to the right (higher values).
Most of the scores are low, but a few very high scores pull the mean upward.
the curve rises slowly and then decreases rapidly
Negative Skew
📌 In simpler terms:
The tail of the distribution is stretched out to the left (lower values).
Most of the scores are high, but a few very low scores pull the mean downward.
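One way to put a number on skew is the moment coefficient of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the average squared and cubed deviations from the mean. The sketch assumes that formula (statistics packages often apply a small-sample correction, so their values differ slightly) and uses hypothetical data: a right tail gives g1 > 0 with the mean above the median, and its mirror image gives g1 < 0 with the mean below the median.

```python
from statistics import mean, median

def skewness(scores):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments).

    Positive: tail stretched to the right. Negative: tail stretched to the left.
    """
    m = mean(scores)
    n = len(scores)
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 15]  # hypothetical: a few very high scores
left_tailed = [-x for x in right_tailed]  # mirror image: a few very low scores
for data in (right_tailed, left_tailed):
    print(round(skewness(data), 2), mean(data), median(data))
```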
Occurs when there are either too many people at the extremes of the scale or not enough people at the extremes, compared with a normal distribution.
Kurtosis
When there are too many people, too far away, in the tails (ends) of the distribution compared with a normal distribution; also called leptokurtic.
Positive Kurtosis
When there are insufficient people in the tails (ends) of the scores to make the distribution normal; also called platykurtic.
Negative Kurtosis
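Kurtosis is usually quantified as excess kurtosis, g2 = m4 / m2^2 - 3, which is about 0 for a normal distribution; positive values (leptokurtic) indicate heavier tails than normal and negative values (platykurtic) lighter tails. The Python sketch assumes this uncorrected moment formula (software often applies a small-sample correction); the function name and data sets are hypothetical.

```python
from statistics import mean

def excess_kurtosis(scores):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (population moments).

    About 0 for a normal distribution; > 0 means heavier tails than normal,
    < 0 means lighter tails.
    """
    m = mean(scores)
    n = len(scores)
    m2 = sum((x - m) ** 2 for x in scores) / n
    m4 = sum((x - m) ** 4 for x in scores) / n
    return m4 / m2 ** 2 - 3

flat = list(range(1, 11))           # hypothetical: evenly spread scores, no tails
heavy = [5] * 8 + [4, 6] + [0, 10]  # hypothetical: mostly central, a few far out
print(round(excess_kurtosis(flat), 2), round(excess_kurtosis(heavy), 2))
```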
A small number of data points that lie far outside the rest of the distribution, even when the distribution is otherwise approximately normal. Usually easy to spot in histograms.
Outliers
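Histograms are the visual check; a common numeric rule of thumb (swapped in here, not taken from these notes) is Tukey's fences, which flag scores more than 1.5 × IQR below the first quartile or above the third. `iqr_outliers` and the data are hypothetical, and quartile values depend on the method used (`statistics.quantiles` defaults to the "exclusive" method).

```python
from statistics import quantiles

def iqr_outliers(scores, k=1.5):
    """Return scores outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(scores, n=4)  # quartile cut points (Q1, median, Q3)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in scores if x < low or x > high]

scores = [48, 50, 51, 52, 52, 53, 54, 55, 56, 90]  # hypothetical data
print(iqr_outliers(scores))
```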
The most central value of a data set; different measures interpret “central” in different ways.
Central Tendency
Measures of central tendency
Mean (x̄): sum of scores ÷ number of scores.
Median: middle score when the scores are ordered (the mean of the two middle scores when there is an even number of scores).
Mode: most frequent score.
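The three measures written out by hand in Python, as a sketch; the function names and scores are hypothetical (the standard library's `statistics.mean`, `statistics.median`, and `statistics.multimode` do the same jobs).

```python
from collections import Counter

def mean(scores):
    """Sum of scores divided by the number of scores."""
    return sum(scores) / len(scores)

def median(scores):
    """Middle score once ordered; average of the two middle scores if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(scores):
    """Most frequent score(s); returns a list because ties are possible."""
    counts = Counter(scores)
    top = max(counts.values())
    return [value for value, c in counts.items() if c == top]

scores = [2, 3, 3, 4, 5, 7, 9, 15]  # hypothetical data
print(mean(scores), median(scores), modes(scores))
```

Here the single high score (15) pulls the mean (6.0) above the median (4.5), the same pattern as positive skew.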