Descriptive statistics
Describing the data you have and presenting it in an easily understandable manner
Inferential statistics
Using data from a sample to draw conclusions (estimates or predictions) about the larger population it was drawn from
Mean
Median
Mode
Variance
Standard deviation
Range
IQR: Interquartile Range
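A minimal sketch of the measures listed above, using only Python's standard `statistics` module. The data values are hypothetical and chosen for illustration. Variance and standard deviation are computed here as sample statistics (n - 1 denominator), and the quartiles use the module's default method; other tools may use a different quartile convention and give slightly different IQR values.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)          # arithmetic average -> 6
median = statistics.median(data)      # middle value of the sorted data -> 5
mode = statistics.mode(data)          # most frequent value -> 4
variance = statistics.variance(data)  # sample variance (n - 1 denominator) -> 10
std_dev = statistics.stdev(data)      # square root of the sample variance
value_range = max(data) - min(data)   # largest minus smallest value -> 9

# Quartiles depend on the interpolation method; this uses the
# default ("exclusive") method of statistics.quantiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # spread of the middle 50% of the data

print(mean, median, mode, variance, round(std_dev, 3), value_range, iqr)
```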
Sampling and simulations
Selecting a subset of individuals or observations from a larger population for analysis
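A small sketch of simple random sampling, assuming a simulated population (the population, its parameters and the sample size are all hypothetical). It draws a sample without replacement and compares the sample mean with the population mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated values.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Simple random sample of 200 values, drawn without replacement.
sample = rng.sample(population, k=200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means will usually be close but not identical; that gap is sampling error.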
Regression analysis
Models how changes in one variable are associated with changes in another, and uses that relationship to make predictions
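As an illustration, the simplest case is a straight line fitted by least squares. The sketch below assumes hypothetical data (hours studied against exam score) and a single predictor; real analyses often involve more variables and a dedicated library.

```python
# Hypothetical data: hours studied (x) and exam score (y).
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares estimates for the line y = intercept + slope * x.
covariation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
variation_x = sum((xi - mean_x) ** 2 for xi in x)
slope = covariation / variation_x
intercept = mean_y - slope * mean_x


def predict(new_x):
    """Predicted y for a new x value, using the fitted line."""
    return intercept + slope * new_x


print(f"slope={slope:.2f}, intercept={intercept:.2f}, predict(6)={predict(6):.2f}")
```

Here the slope is the estimated change in score for each extra hour studied.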
Hypothesis testing
Used to determine whether there is enough evidence to support a claim about a population parameter
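One way to make this concrete is a permutation test on the difference between two group means. This is only one of many possible tests, chosen here because it needs nothing beyond the standard library. The group values and the 0.05 significance level are assumptions for illustration.

```python
from itertools import combinations
from statistics import mean

# Hypothetical measurements for two groups.
group_a = [12.1, 11.8, 12.5, 12.0, 12.3, 11.9]
group_b = [13.0, 13.4, 12.9, 13.2, 13.5, 13.1]

pooled = group_a + group_b
n_a = len(group_a)


def mean_difference(indices_a):
    """Difference in means when the pooled values at indices_a form group A."""
    chosen = set(indices_a)
    a = [v for i, v in enumerate(pooled) if i in chosen]
    b = [v for i, v in enumerate(pooled) if i not in chosen]
    return mean(a) - mean(b)


observed = mean_difference(range(n_a))

# Null hypothesis: the group labels do not matter, so every way of
# splitting the pooled values into two groups is equally likely.
splits = list(combinations(range(len(pooled)), n_a))
extreme = sum(
    1 for s in splits if abs(mean_difference(s)) >= abs(observed) - 1e-12
)

# p-value: share of splits at least as extreme as the observed one.
p_value = extreme / len(splits)
alpha = 0.05  # assumed significance level
reject_null = p_value < alpha

print(f"observed difference={observed:.3f}, p-value={p_value:.4f}, reject={reject_null}")
```

A small p-value means the observed difference would be unusual if the groups really came from the same population.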
Confidence interval
A range of values, computed from sample data, that is expected to contain the true value of a population parameter at a stated confidence level (e.g. 95%)
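A rough sketch of a 95% confidence interval for a mean, assuming a simulated sample of 100 values and a normal approximation. For small samples a t-distribution would normally be used instead; the sample and its parameters are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical sample of 100 measurements.
sample = [rng.gauss(50, 5) for _ in range(100)]

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# z-value leaving 2.5% in each tail of the standard normal (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```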
Probability
Measures the likelihood of an event occurring, expressed as a number between 0 (impossible) and 1 (certain)
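A short illustration with a hypothetical event: rolling a total of 7 with two fair dice. The sketch computes the exact probability by counting outcomes, then estimates the same value by simulation.

```python
import random
from fractions import Fraction
from itertools import product

# Exact probability: favourable outcomes over all equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))   # 36 ordered pairs
favourable = [o for o in outcomes if sum(o) == 7]  # 6 of them total 7
exact = Fraction(len(favourable), len(outcomes))   # 1/6

# Simulated estimate: repeat the experiment many times and count hits.
rng = random.Random(1)  # fixed seed so the sketch is reproducible
trials = 100_000
hits = sum(
    1 for _ in range(trials) if rng.randint(1, 6) + rng.randint(1, 6) == 7
)
estimate = hits / trials

print(f"exact={exact} ({float(exact):.4f}), simulated={estimate:.4f}")
```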
Correlation
Measures the strength and direction of the relationship between two variables
The correlation coefficient ranges from -1 to 1, where:
1 indicates a perfect positive correlation
-1 indicates a perfect negative correlation
0 indicates no linear correlation (the variables may still be related in a non-linear way)
Correlation does not imply causation
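A minimal sketch of the three reference values, assuming Pearson's correlation coefficient (the notes do not name a specific coefficient) and hypothetical data. The helper `pearson_r` is written out here for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)


x = [1, 2, 3, 4, 5]

positive = pearson_r(x, [2, 4, 6, 8, 10])   # y rises with x
negative = pearson_r(x, [10, 8, 6, 4, 2])   # y falls as x rises
# y = (x - 3) ** 2: y clearly depends on x, yet there is no linear trend.
curved = pearson_r(x, [4, 1, 0, 1, 4])

print(positive, negative, curved)
```

The last case shows why a coefficient of 0 rules out only a linear relationship, not every relationship.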
Signal vs noise
Signal: true underlying pattern or effect you’re interested in detecting and understanding
Noise: random, irrelevant variations or errors that can distort the signal/pattern
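To illustrate the distinction, the sketch below builds hypothetical observed data as a known signal plus random noise, then smooths it with a moving average. The moving average is just one simple noise-reduction technique picked for the example; the trend, noise level and window size are all assumptions.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical signal: a steady upward trend.
signal = [0.5 * t for t in range(100)]

# Observed data = signal + random noise.
observed = [s + rng.gauss(0, 2) for s in signal]

# Centred moving average over a window of 5 points.
half = 2
window = 2 * half + 1
smoothed = [
    sum(observed[i - half:i + half + 1]) / window
    for i in range(half, len(observed) - half)
]


def mean_abs_error(estimates, truth):
    """Average absolute distance between estimates and the true signal."""
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(estimates)


# Compare on the interior points, where the moving average is defined.
interior_signal = signal[half:len(signal) - half]
error_raw = mean_abs_error(observed[half:len(observed) - half], interior_signal)
error_smoothed = mean_abs_error(smoothed, interior_signal)

print(f"raw error={error_raw:.2f}, smoothed error={error_smoothed:.2f}")
```

Averaging neighbouring points cancels part of the noise, so the smoothed values sit closer to the underlying signal.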
Exploratory Data Analysis (EDA)
Aims to understand the characteristics of the data and the relationships within it, mostly through visualisations, in order to surface potential insights
Univariate analysis
Analysis of each variable’s distribution individually
Bivariate analysis
Explores relationships between pairs of variables
Multivariate analysis
Explores relationships between three or more variables
Normal distribution