Correlation
measures the strength and direction of the linear relationship between two variables
to compute it, we need to find the sample covariance first
covariance formula
sXY = (Sum of all (Xi - Xmean)(Yi - Ymean))/(n-1)
correlation formula
rXY = sXY/(sX*sY), where sX and sY are the sample standard deviations of X and Y
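A minimal Python sketch of the two formulas above, using only the standard library. The function names and the sample data are made up for illustration; they are not from the notes.

```python
from math import sqrt


def sample_covariance(xs, ys):
    """Sample covariance sXY with the (n - 1) denominator."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(xs):
    """Sample standard deviation: square root of the covariance of xs with itself."""
    return sqrt(sample_covariance(xs, xs))


def correlation(xs, ys):
    """Sample correlation rXY = sXY / (sX * sY)."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Illustrative data (hypothetical)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

s_xy = sample_covariance(xs, ys)   # 1.5 for this data
r_xy = correlation(xs, ys)         # about 0.775 for this data

# Perfect linear relationships give r = +1 or r = -1 (up to rounding)
r_pos = correlation(xs, [2 * x + 1 for x in xs])
r_neg = correlation(xs, [-3 * x for x in xs])
```

Dividing the covariance by sX*sY removes the units, which is why rXY always lands between -1 and 1 while sXY does not.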
the 4 important properties of correlation
–> -1 <= rXY <= 1
–> rXY = 0 means there is no linear relationship between the variables
–> rXY > 0 means a positive linear relationship. In other words, an increase in X is associated with an increase in Y. When rXY = 1, the variables have a perfect positive linear relationship
–> rXY < 0 means a negative (inverse) linear relationship. In other words, an increase in X is associated with a decrease in Y. When rXY = −1, the variables have a perfect negative linear relationship
if there is no linear pattern, is it appropriate to use the correlation coefficient to test any relationship between variables?
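nope, correlation only captures linear relationships, so it can be close to 0 even when the variables are strongly related in a nonlinear way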
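A small sketch of why the answer is no, assuming a made-up dataset where Y is completely determined by X (Y = X^2) but the relationship is not linear. The helper name is hypothetical.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation rXY = sXY / (sX * sY)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_mean) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_mean) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# X is symmetric around 0 and Y = X^2, so Y depends entirely on X
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r_nonlinear = correlation(xs, ys)  # 0 here, despite the exact dependence
```

The positive and negative halves of the parabola cancel out in the covariance sum, so the coefficient reports no linear relationship even though Y is a function of X.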
Limitations of Correlation Analysis
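–> Correlation is sensitive to outliers. Analysts need to justify the inclusion of the outliers in the data or handle them through trimming or winsorization
–> A spurious correlation: two variables can be correlated by chance or because both depend on a third variable, so correlation does not imply causation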
–> Different pairs of datasets may have the same correlation but different underlying relationships, so it helps to look at a scatter plot and not only at rXY
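A rough sketch of the outlier limitation, on made-up data where a single extreme point flips the sign of the correlation. The `winsorize` helper is a simplified, hypothetical version (it clamps the k smallest and k largest values to their nearest remaining neighbors) and is applied to Y only to keep the example short.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation rXY = sXY / (sX * sY)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_mean) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_mean) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


def winsorize(values, k=1):
    """Clamp the k smallest and k largest values to the nearest remaining value."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical data: Y falls as X rises, except for one extreme last point
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 5, 4, 3, 2, 30]

r_raw = correlation(xs, ys)                   # positive: the outlier dominates

# Trimming: drop the observation with the extreme Y value
keep = [i for i, y in enumerate(ys) if y != max(ys)]
r_trimmed = correlation([xs[i] for i in keep], [ys[i] for i in keep])

# Winsorization: keep every observation but pull extreme Y values inward
r_winsorized = correlation(xs, winsorize(ys, k=1))  # negative again
```

Trimming discards the point entirely, while winsorization keeps the sample size and only caps the extreme value. Which one is appropriate depends on whether the outlier is an error or real information.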