What is a stem and leaf diagram
A diagram that displays numerical data with each value split into two parts:
the stem, which represents the leading digit(s), and
the leaf, which represents the last digit. It is a way to see the distribution, patterns and spread of the data.
The data below show the ages (in years) of 20 people who attended a movie night:
13, 15, 17, 18, 19, 20, 21, 21, 22, 22,
23, 24, 25, 25, 26, 28, 30, 31, 33, 35
a) Construct a stem-and-leaf diagram to represent the data.
1 | 3 5 7 8 9
2 | 0 1 1 2 2 3 4 5 5 6 8
3 | 0 1 3 5
Key: 1 | 3 means 13 years
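The construction above can be sketched in Python. This is a minimal sketch, assuming whole-number values where the last digit is the leaf and everything before it is the stem; the function name `stem_and_leaf` is made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return one text row per stem, lowest stem first (hypothetical helper)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 23 -> stem 2, leaf 3
        rows[stem].append(leaf)
    # Note: stems with no values are simply skipped in this sketch.
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(rows.items())
    ]

ages = [13, 15, 17, 18, 19, 20, 21, 21, 22, 22,
        23, 24, 25, 25, 26, 28, 30, 31, 33, 35]

diagram = stem_and_leaf(ages)
for row in diagram:
    print(row)
```

The printed rows match the hand-drawn diagram for the movie night ages.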
What can we also use a stem and leaf diagram for
comparing two data sets in one back-to-back stem and leaf plot (remember that the lowest stem goes at the top of the diagram)
What other method can be used for comparing two sets of data, and what is the purpose of this method
box plots: useful as they show the median, range, IQR and skewness
In order, what are the different points on a box plot
lower adjacent value
lower hinge
median
upper hinge
upper adjacent value
what is the middle 50 percent of the data known as in a boxplot
the interquartile range
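The box plot points listed above can be computed for the movie night ages. This is a rough sketch that assumes two conventions the notes do not state: hinges are taken as the medians of the lower and upper halves of the sorted data, and adjacent values are the most extreme data points within 1.5 × IQR of the hinges. The function name `box_plot_points` is made up for illustration.

```python
from statistics import median

def box_plot_points(values):
    """Return the box plot points in order (hypothetical helper)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # the middle value joins both halves when n is odd
    lower_hinge = median(data[:half])
    upper_hinge = median(data[n - half:])
    iqr = upper_hinge - lower_hinge  # spread of the middle 50% of the data
    # Assumed fence rule: 1.5 x IQR beyond each hinge.
    low_fence = lower_hinge - 1.5 * iqr
    high_fence = upper_hinge + 1.5 * iqr
    return {
        "lower adjacent value": min(x for x in data if x >= low_fence),
        "lower hinge": lower_hinge,
        "median": median(data),
        "upper hinge": upper_hinge,
        "upper adjacent value": max(x for x in data if x <= high_fence),
        "IQR": iqr,
    }

ages = [13, 15, 17, 18, 19, 20, 21, 21, 22, 22,
        23, 24, 25, 25, 26, 28, 30, 31, 33, 35]

points = box_plot_points(ages)
for name, value in points.items():
    print(name, value)
```

For these ages the hinges are 19.5 and 27, the median is 22.5 and the IQR is 7.5. Statistical software may use a slightly different quartile rule and give slightly different hinges.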
Describe features of a frequency distribution
what are the features of a probability distribution
bell curves
smooth, but can be divided into sections by standard deviations (SDs)
the area under the curve over a range of values is the probability that a value falls in that range
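The idea that area under the curve equals probability can be checked with Python's built-in `statistics.NormalDist`. This is a small sketch, assuming a standard normal curve (mean 0, SD 1); the helper name `probability_between` is made up for illustration.

```python
from statistics import NormalDist

# Assumption: a standard normal distribution (mean 0, SD 1).
standard_normal = NormalDist(mu=0, sigma=1)

def probability_between(dist, low, high):
    """Area under the curve between low and high (hypothetical helper)."""
    return dist.cdf(high) - dist.cdf(low)

within_1_sd = probability_between(standard_normal, -1, 1)
within_2_sd = probability_between(standard_normal, -2, 2)

print(round(within_1_sd, 3))
print(round(within_2_sd, 3))
```

Roughly 68% of the area lies within 1 SD of the mean and roughly 95% within 2 SDs, which is why the curve is usually drawn in SD segments.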
why do we want normally distributed data that fits a bell-shaped curve
as it allows us to do more powerful and accurate statistical tests.
power = the probability that a test detects an effect that really exists
what are the two ways that distribution can deviate from normality
lack of symmetry (skewness)
pointiness (kurtosis)
what is skewness and how does it occur
it is a deviation from symmetry. This happens when extreme scores pull the mean towards one tail.
It shows up in a histogram as a big difference between the mean, median and mode
what is positively skewed data
when the tail extends to the right
mode<median<mean
what is negatively skewed data
when the tail extends to the left of the graph and most of the data sits on the right
mean < median < mode, meaning the mean is the smallest
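The ordering of mean, median and mode can be checked in Python. This is a minimal sketch using two small made-up data sets (not from the lecture); the function name `skew_direction` is hypothetical, and real data will not always follow the ordering this cleanly.

```python
from statistics import mean, median, mode

def skew_direction(values):
    """Classify skew from the mean/median/mode ordering (hypothetical helper)."""
    m, md, mo = mean(values), median(values), mode(values)
    if mo < md < m:
        return "positive"  # tail extends to the right
    if m < md < mo:
        return "negative"  # tail extends to the left
    return "unclear"       # ordering rule does not apply

# Made-up illustrative data, not real measurements.
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]
left_tailed = [2, 5, 9, 10, 11, 11, 12, 12, 12, 13]

print(skew_direction(right_tailed))  # mode 2 < median 3 < mean 4.3
print(skew_direction(left_tailed))   # mean 9.7 < median 11 < mode 12
```

If a data set has several equally common values, `mode` returns the first one it meets, so the rule is only a rough guide.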
what is a good method for showing the distribution of data
histograms
why do we not report the mean in skewed data sets
the mean is more sensitive to skew (it is pulled towards the tail), so we report the median instead
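A quick way to see this sensitivity is to add one extreme score and compare the two statistics. The scores below are made up for illustration.

```python
from statistics import mean, median

# Made-up scores: identical apart from one extreme value.
scores = [20, 21, 22, 23, 24]
scores_with_outlier = [20, 21, 22, 23, 90]

mean_shift = mean(scores_with_outlier) - mean(scores)
median_shift = median(scores_with_outlier) - median(scores)

print("mean moved by", mean_shift)      # the mean is dragged towards the outlier
print("median moved by", median_shift)  # the median stays where it was
```

Here the mean jumps from 22 to 35.2 while the median stays at 22, so the median describes the typical score better.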
what is kurtosis
it is a measure of the tailedness of a distribution
tailedness = how often outliers occur
(in simple terms, it is often described as the pointiness of the curve)
the higher the peak, the heavier the tails (more outliers)
what are the three types of kurtosis
mesokurtic (normal peak and tails)
platykurtic (negative excess kurtosis)
leptokurtic (positive excess kurtosis)
for the types of kurtosis, what are the values of their kurtosis
platykurtic: < 3
mesokurtic: = 3
leptokurtic: > 3
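The cut-off of 3 can be illustrated by computing kurtosis directly. This is a rough sketch, assuming the simple population formula (fourth central moment divided by the squared variance); statistics packages often report excess kurtosis instead, which is this value minus 3. The data sets are made up for illustration.

```python
def kurtosis(values):
    """Population kurtosis: fourth central moment / variance squared."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n  # variance
    m4 = sum((x - centre) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

# Made-up illustrative data.
flat = [1, 2, 3, 4, 5]                         # evenly spread, no outliers
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # two extreme outliers

for name, data in [("flat", flat), ("heavy tails", heavy_tails)]:
    k = kurtosis(data)
    print(name, round(k, 2), "excess:", round(k - 3, 2))
```

The flat data gives about 1.7 (below 3, platykurtic) and the heavy-tailed data gives 5 (above 3, leptokurtic).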
what are the features of a leptokurtic distribution
- fat tails: signifies either lots of outliers or occasional outliers that are extreme
Platykurtic distribution features
negative excess kurtosis (kurtosis below 3)
flatter distribution
broad in the middle
skinny tails (few outliers, or outliers that are not so extreme)
Mesokurtic distribution features
normal distribution shape
medium peak
normal amount of outliers
excess kurtosis = 0 (kurtosis = 3)
draw the distribution kurtosis
refer to lecture slides
why is distribution so important
(cover later in semester)
it determines what we can do with the data:
parametric/normal = mean and SD
non-parametric/skewed = median or range (IQR)
which inferential tests can we use
parametric data = parametric tests
non-parametric data = non-parametric (NP) tests
how do we determine skewness
skewness statistics
histogram with normality curve
probability density curves
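A skewness statistic can be computed by hand as a check on what software reports. This is a minimal sketch, assuming the simple moment-based formula (third central moment divided by the variance raised to the power 1.5); software packages may apply a small-sample correction and give slightly different values. The data sets are made up for illustration.

```python
def skewness(values):
    """Moment-based skewness: third central moment / variance ** 1.5."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n  # variance
    m3 = sum((x - centre) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Made-up illustrative data.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]
left_tailed = [2, 5, 9, 10, 11, 11, 12, 12, 12, 13]

for name, data in [("symmetric", symmetric),
                   ("right tailed", right_tailed),
                   ("left tailed", left_tailed)]:
    print(name, round(skewness(data), 2))
```

A value near 0 suggests symmetry, a positive value a right tail and a negative value a left tail.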