What is statistics?
The practice or science of collecting and analysing numerical data in large quantities, especially to make inferences about a population from a representative sample
Descriptive stats
Describe and summarise the data through numbers and graphs: central tendency, spread, counts, proportions, skewness, etc.
Inferential stats
What are the types of statistics?
What are the measures of central tendency?
Mean, Median, Mode
When is mean usually used?
Suitable for symmetric distributions; usually reported with the standard deviation (SD)
When is median usually used?
Suitable for skewed distributions; usually reported with the interquartile range (IQR)
Why is the median preferred for skewed distributions?
It is less sensitive to extreme values than the mean, which is pulled in the direction of the skew
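A small sketch of that sensitivity, using made-up salary figures (the numbers and variable names are illustrative assumptions, not data from these notes). Adding one extreme value moves the mean a long way but barely shifts the median:

```python
import statistics

# Hypothetical salaries in thousands; the values are invented for illustration.
salaries = [30, 32, 35, 36, 40]
with_outlier = salaries + [400]  # one extreme value creates a right skew

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

The mean jumps by tens of thousands while the median moves by half a unit, which is why the median is the usual summary for skewed data.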
What is variance?
Average of the squared differences between each data point and the mean; its units are the square of the data's units
What is standard deviation?
Square root of the variance; it is in the same units as the data
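The two definitions above can be worked through by hand. This is a minimal sketch on an invented data set; it computes the population variance (dividing by n), whereas the sample variance used to estimate a population divides by n - 1:

```python
import math
import statistics

# Hypothetical measurements, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]

# Population variance: average of the squared differences from the mean.
variance = sum(squared_diffs) / len(data)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# Sample variance divides by n - 1 instead of n.
sample_variance = statistics.variance(data)

print(mean, variance, sd, sample_variance)
```

The standard library's `statistics.pvariance` and `statistics.pstdev` return the same population values without the manual steps.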
What do a small and a large SD mean?
Small - data points cluster closely around the mean
Large - data points are spread further from the mean
What do a small and a large variance mean?
Small - data points are close to the mean and to each other
Large - data points are far from the mean and from each other
What is the empirical rule?
For approximately normally distributed data:
About 68% - within 1 SD of the mean
About 95% - within 2 SD of the mean
About 99.7% - within 3 SD of the mean
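These percentages can be checked against the normal distribution itself. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `share_within` is an illustrative assumption:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.4f}")
```

The rule only holds for data that are close to normal; strongly skewed data can deviate from it considerably.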
What is the purpose of inferential stats?
What is confidence interval?
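A range of values, calculated from a sample, that is likely to contain the true population parameter (e.g. the mean) at a stated confidence level such as 95%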
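As a rough sketch, a confidence interval for a mean can be computed as sample mean ± critical value × standard error. The sample values and the function name `mean_confidence_interval` are assumptions for illustration, and the code uses a normal (z) critical value; for small samples a t critical value would usually be more appropriate and gives a slightly wider interval:

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of measurements, invented for illustration.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(values)
    sample_mean = statistics.mean(values)
    # Standard error: sample SD divided by the square root of the sample size.
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

lower, upper = mean_confidence_interval(sample)
lower_99, upper_99 = mean_confidence_interval(sample, confidence=0.99)

print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print(f"99% CI: ({lower_99:.3f}, {upper_99:.3f})")
```

A higher confidence level gives a wider interval, and a larger sample gives a narrower one.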
What is the distribution assumed for parametric tests?
Normal (Gaussian) distribution
What are the limitations of inferential stats?
What is a non-parametric test based on?
It does not require the data to follow a normal distribution; it is mostly based on the rank order or frequency of the data
Which measure of central tendency do parametric and non-parametric tests compare?
Parametric - mean
Non-parametric - median
What type of variables do parametric tests handle?
Continuous
What type of variables do non-parametric tests handle?
Continuous and discrete
Assumptions for parametric test?
How to check for normality?
T-test
Determines whether there is a significant difference between the means of two groups. It is widely used in hypothesis testing, where sample means are compared to make inferences about the population means
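To make the comparison concrete, here is a minimal sketch of the test statistic for two independent groups. It uses Welch's variant of the t-test (which does not assume equal variances), chosen here for illustration; the group values and the function name `welch_t` are invented assumptions. It stops at the t statistic and degrees of freedom, since turning these into a p-value needs the t distribution, which Python's standard library does not provide:

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group's mean (sample variance / n).
    se_sq_a = statistics.variance(group_a) / n_a
    se_sq_b = statistics.variance(group_b) / n_b
    # Difference in means scaled by its combined standard error.
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    t_stat = diff / math.sqrt(se_sq_a + se_sq_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t_stat, df

# Hypothetical measurements for two independent groups.
control = [5.1, 4.9, 5.0, 5.2, 4.8]
treated = [5.6, 5.4, 5.7, 5.5, 5.8]

t_stat, df = welch_t(control, treated)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A t statistic far from zero relative to the degrees of freedom suggests the group means differ. In practice a statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) would report the p-value directly.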