Regression
Regression is a method in statistics and Machine Learning that models a target value
based on independent predictors. It is essentially a statistical tool for finding the relationship
between a dependent variable and one or more independent variables. This method comes into play in
forecasting and in studying cause-and-effect relationships between variables.
Regression techniques differ based on:
data used
Regression is generally performed when the dependent variable is of a continuous data type. The
independent variables, however, can be of any data type: continuous, nominal/categorical, etc.
What regression methods do
Regression methods find the line that best describes the relationship between the dependent
variable and the predictors, with the least error. In regression, the dependent variable is a function
of the independent variables, the coefficients and the error term.
Correlation
A measure of the strength of a linear relationship between two quantitative variables
(e.g. price and sales).
Correlation can have a value between -1 and +1.
cross tabs
Cross tabs help us establish a relationship between two variables. This relationship is shown in tabular form.
Column percentages
(these are percentages within the columns, so that each column’s
percentages add up to 100%)
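As a rough illustration of column percentages, the sketch below builds a small cross tab from paired observations and converts the counts in each column to percentages. The data, the category labels and the helper name `column_percentages` are made up for illustration only.

```python
from collections import Counter

# Hypothetical observations: (column category, row category) pairs.
records = [
    ("male", "A"), ("male", "B"), ("male", "A"),
    ("female", "B"), ("female", "B"), ("female", "A"), ("female", "B"),
]

def column_percentages(pairs):
    """Cross-tabulate (column, row) pairs and return percentages within each column."""
    counts = Counter(pairs)                          # cell counts
    col_totals = Counter(col for col, _ in pairs)    # column totals
    rows = sorted({row for _, row in pairs})
    return {
        col: {row: 100.0 * counts[(col, row)] / col_totals[col] for row in rows}
        for col in sorted(col_totals)
    }

table = column_percentages(records)
for col, cells in table.items():
    # Each column's percentages add up to 100.
    print(col, {row: round(pct, 1) for row, pct in cells.items()})
```

Reading down a column then shows how the row categories are distributed within that column's group.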
In cross tabs when the variables are not ordered
Where neither variable is ordered, we can simply refer to the strength of the
correlation without discussing its direction.
Scatterplots
What type of correlation is shown here?
This is a negative correlation. As we move along the x-axis toward the greater numbers,
the points move down, which means the y-values are decreasing as x increases.
Pearson’s r
Requirements for Pearson’s correlation coefficient are as follows: the scale of measurement should be
interval or ratio.
What does this test do?
Pearson’s r
What values can the Pearson correlation coefficient take?
How can we determine the strength of association based on the Pearson correlation coefficient?
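A minimal sketch of how Pearson’s r could be computed for paired data is given here. The function name `pearson_r` and the price/sales figures are assumptions made for illustration. The result always lies between -1 and +1. The sign gives the direction of the linear relationship, and the closer the absolute value is to 1, the stronger the association.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: sales fall as price rises, so r should be negative.
price = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [10.0, 8.0, 7.0, 4.0, 2.0]
r = pearson_r(price, sales)
print(round(r, 3))
```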
If we use a simple linear regression model where y depends on x, then the regression line
of y on x is:
y = a + bx
regression constant
The two constants a and b are the regression parameters. Furthermore, we denote the
constant b as b_yx and call it the regression coefficient of y on x.
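One common way to estimate a and b in y = a + bx is the least squares method. The sketch below assumes that approach, and the name `fit_line` and the sample data are hypothetical. The slope b is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept a makes the line pass through the point of means.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # regression coefficient of y on x
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
a, b = fit_line(xs, ys)
print(a, b)
```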
least square method is suitable for
The standard form of the regression equation of variable x on y is:
(x - x̄)/Sx = r (y - ȳ)/Sy
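Rearranged, the standard form gives x = x̄ + r (Sx/Sy)(y - ȳ). The sketch below uses that rearrangement to predict x from y. It assumes that Sx and Sy denote the standard deviations of x and y and that r is the correlation coefficient. The function name `predict_x_from_y` and the data are hypothetical.

```python
import math

def predict_x_from_y(xs, ys, y_new):
    """Predict x at y_new from the regression of x on y in standard form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population standard deviations (assumed meaning of Sx and Sy).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if s_x == 0 or s_y == 0:
        raise ValueError("both variables must vary")
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * s_x * s_y)
    # (x - mean_x)/s_x = r * (y - mean_y)/s_y, solved for x.
    return mean_x + r * (s_x / s_y) * (y_new - mean_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
print(predict_x_from_y(xs, ys, 5.0))
```

At y equal to the mean of y, the prediction is the mean of x, which is a quick sanity check on the formula.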
a regression line
In statistics, a regression line is a line that best describes the behaviour of a set of data. In other
words, it’s the line that best fits the trend of the given data.
The regression line formula is:
Y = a + bX + u
The multiple regression formula is:
Y = a + b1X1 + b2X2 + b3X3 + … + btXt + u
u is the regression residual.
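To make the role of the residual u concrete, here is a small sketch that fits a multiple regression with two predictors by least squares and then computes u for each observation as the observed Y minus the fitted value. It solves the normal equations directly, which is only one possible approach. The helper names and the data are assumptions for illustration.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution

def fit_multiple(rows, ys):
    """Least-squares fit of Y = a + b1*X1 + ... + bt*Xt; returns [a, b1, ..., bt]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is for the intercept a
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data: Y is roughly 1 + 2*X1 + 3*X2 plus small disturbances.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
noise = [0.1, -0.1, 0.0, 0.1, -0.1]
ys = [1.0 + 2.0 * x1 + 3.0 * x2 + e for (x1, x2), e in zip(rows, noise)]

coef = fit_multiple(rows, ys)
fitted = [coef[0] + coef[1] * x1 + coef[2] * x2 for x1, x2 in rows]
residuals = [y - f for y, f in zip(ys, fitted)]  # u for each observation
print([round(c, 3) for c in coef])
print([round(u, 3) for u in residuals])
```

Because the model includes the intercept a, the least-squares residuals sum to zero up to rounding error.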
purpose of regression line