Descriptive statistics
Use numerical and visual summaries to describe and illustrate a data set
Variable
Measurement made on each individual case in a population
Ex. If we measure everyone’s weight in the entire population then “weight” would be a variable
Variables can be
Qualitative or quantitative
Quantitative
Ex. Weight. Height. Age. Number of products sold. Etc.
Discrete (countable, like number of products sold)
Continuous (not countable, like weight or height)
Qualitative
Ex. Colours. Customer satisfaction level. Suits of cards. Etc.
Ordinal (has an order/ranking/hierarchy to it, like satisfaction level)
Nominative (also called nominal: no natural order, like colours)
Measures of central tendency
(Centrality, location): mean, median, mode
A measure of central tendency is a measurement of the centre or typical value of a data set
Median
Half of the observations are greater than the median and half are less than the median
1,3,3,6,7,8,9
Median is 6
Sample median is denoted $M_d$ (capital M with a lowercase d subscript)
How to calculate median if odd and if even
Order the data set
If n is odd, take $x_{(n+1)/2}$
Otherwise
If n is even, take $(x_{n/2} + x_{n/2+1})/2$
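The odd/even rule above can be sketched in Python. This is a minimal illustration, not a course-provided method: the function name sample_median and the second data set are assumptions made for the example, and the index is shifted by 1 because Python lists start at position 0 while the notes count from 1.

```python
def sample_median(values):
    """Sample median via the order-then-pick rule (odd n) or average-the-middle-two rule (even n)."""
    ordered = sorted(values)  # step 1: order the data set
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # n odd: x_{(n+1)/2}; subtract 1 because Python lists are 0-indexed
        return ordered[(n + 1) // 2 - 1]
    # n even: average of x_{n/2} and x_{n/2 + 1}
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(sample_median([1, 3, 3, 6, 7, 8, 9]))  # data set from the notes: 6
print(sample_median([1, 2, 3, 4]))           # hypothetical even-sized set: 2.5
```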
Mode
The most frequently occurring value in a set
Ex. 1,2,3,3,3,4,5 mode is 3
Bimodal data set
Ex. 1,2,2,2,3,3,3,4,5,6
This is bimodal. There are 2 modes, 2 and 3
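A short Python sketch of finding every mode, using the two example data sets above. The function name modes is an assumption for illustration; it counts how often each value occurs and keeps all values tied for the highest count, so a data set with two modes returns both.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in ascending order."""
    counts = Counter(values)  # value -> number of occurrences
    if not counts:
        return []             # an empty data set has no mode
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 3, 3, 3, 4, 5]))           # [3]
print(modes([1, 2, 2, 2, 3, 3, 3, 4, 5, 6]))  # [2, 3]
```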
When to use the sample mean
When you don’t have any extreme values in your data set (data is symmetric) or if you must use the sample mean for subsequent analysis
When to use the sample median
When you have extreme values in your data set, or your data set exhibits skewness
The sample mean gets “pulled” by extreme observations, the sample median does not
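To see the "pull" in numbers, here is a small Python sketch using the standard library statistics module. The data values are hypothetical, chosen only to illustrate the effect of adding one extreme observation.

```python
from statistics import mean, median

typical = [48, 50, 51, 52, 49]   # hypothetical values with no extreme observations
with_outlier = typical + [400]   # the same values plus one extreme observation

# Without the extreme value, mean and median agree (both 50).
print(mean(typical), median(typical))

# With it, the mean jumps to about 108.3 while the median only moves to 50.5.
print(mean(with_outlier), median(with_outlier))
```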
When to use sample mode
Only when you need to for subsequent analysis, because it doesn’t provide much information about the data set
(In this course, if you need to state the modes, the question will ask for them)
Mean formula for Excel
=AVERAGE(range), e.g. =AVERAGE(A1:A10)
Mode formula for Excel
=MODE(range), e.g. =MODE(A1:A10)
Median formula for Excel
=MEDIAN(range), e.g. =MEDIAN(A1:A10)
Skewness
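A sample, data set, or population is skewed (exhibits skew) if it contains extreme values on one side, so that it is not symmetric
Positive vs negative skew
Positive skew is right-hand skew (extreme values on the high end)
Ex. 1,2,3,400
Negative skew is left-hand skew (extreme values on the low end)
Ex. -20,50,51,52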
Weighted mean
We’ve looked at a sample mean, which is a measurement of central tendency. Since each observation is given an equal amount of importance, we call this an unweighted mean
Weighted mean
When we need to assign more importance to some observations, we use a weighted mean
When should you use a weighted vs an unweighted mean?
If some of your observations are more important than others, use a weighted mean. Otherwise use a regular sample mean
In example 3 the grade on test 2 was more important than the grade on test 1
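A minimal Python sketch of a weighted mean, computed as the sum of each value times its weight, divided by the sum of the weights. The grades and weights below are hypothetical (they are not the numbers from example 3), and the function name weighted_mean is an assumption for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


grades = [70, 90]  # hypothetical grades on test 1 and test 2
weights = [2, 3]   # hypothetical weights: test 2 counts more than test 1

print(weighted_mean(grades, weights))  # (70*2 + 90*3) / 5 = 82.0
print(sum(grades) / len(grades))       # unweighted mean for comparison: 80.0
```

Giving every observation the same weight returns the regular sample mean, which is why the regular mean is called unweighted.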
Symmetric data
If the mean and median are approximately equal, the data set is approximately symmetric
If the median is less than the mean
The data set is right (positively) skewed
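The comparison of median and mean can be sketched in Python as a rough check. Several parts are assumptions of this sketch rather than rules stated in the notes: the function name skew_direction, the tolerance parameter, treating a mean and median within that tolerance as approximately symmetric, and the mirrored rule that a median greater than the mean suggests left (negative) skew.

```python
from statistics import mean, median


def skew_direction(values, tolerance=0.0):
    """Rough skew check: compare the sample median with the sample mean."""
    m, md = mean(values), median(values)
    if abs(m - md) <= tolerance:
        # assumption of this sketch: mean close to median is read as symmetric
        return "approximately symmetric"
    if md < m:
        return "right (positive) skew"  # rule from the notes: median < mean
    return "left (negative) skew"       # mirrored rule, an assumption here


print(skew_direction([1, 2, 3, 400]))     # mean 101.5, median 2.5
print(skew_direction([-20, 50, 51, 52]))  # mean 33.25, median 50.5
```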