scatterplots
graphical summary of the relationship between two quantitative variables, plotted as (x, y) pairs
a straight line drawn through the points of a scatterplot to summarize the relationship is called a regression line
must first determine which variable is the explanatory variable (x) and which is the response variable (y)
the regression line describes how the response variable changes as the explanatory variable changes
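regression equation: ŷ = a + bx
a = intercept coefficient (the predicted value of y when x = 0)
b = slope coefficient (the predicted change in y when x increases by one unit)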
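The notes name a and b but not how they are computed. Below is a minimal sketch, assuming the least-squares method (the usual way a regression line is fitted in introductory statistics, though the notes do not say so): the slope is the sum of products of x- and y-deviations divided by the sum of squared x-deviations, and the intercept is chosen so the line passes through the point of means. The function name `least_squares_line` and the data are hypothetical.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    # intercept: forces the line through the point (x_bar, y_bar)
    a = y_bar - b * x_bar
    return a, b

# hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # prints: y-hat = 2.20 + 0.60x
```

Here the slope 0.60 means the predicted y rises by 0.6 for each one-unit increase in x.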

linear correlation coefficient (r)
measures the strength and direction of a linear relationship between two quantitative variables
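A minimal sketch of computing r, assuming the standard Pearson formula; the function name `correlation` and the data are hypothetical. Dividing the sum of cross-products of deviations by the square root of the product of the sums of squared deviations gives the same value as the textbook form r = (1/(n−1)) Σ z_x·z_y, because the n−1 factors cancel. r always lies between −1 and 1: its sign gives the direction, and values near −1 or 1 indicate a strong linear relationship.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired quantitative data (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    # sums of cross-products and of squared deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)

# hypothetical data, for illustration only
print(round(correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.775
print(correlation([1, 2, 3], [6, 4, 2]))  # -1.0 (points on a falling line)
```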

regression line for prediction
after fitting the regression equation, use it to predict the value of y for a given value of x, including values of x that were not in the original sample data; predictions for x far outside the range of the observed data (extrapolation) are unreliable
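A minimal sketch of using a fitted line for prediction, assuming the coefficients a and b are already known; the function name `predict`, the line ŷ = 2.2 + 0.6x and the x range 1 to 5 are hypothetical. Besides ŷ, the function reports whether x lies outside the range of x values used to fit the line, so extrapolated predictions can be flagged rather than trusted.

```python
def predict(a, b, x, x_min, x_max):
    """Return (y_hat, extrapolated) for the line y-hat = a + b*x (hypothetical helper).

    x_min and x_max are the smallest and largest x values in the data
    used to fit the line; extrapolated is True when x lies outside them.
    """
    y_hat = a + b * x
    extrapolated = not (x_min <= x <= x_max)
    return y_hat, extrapolated

# hypothetical fitted line y-hat = 2.2 + 0.6x, fitted on x values from 1 to 5
for x in (3.5, 10):
    y_hat, extrapolated = predict(2.2, 0.6, x, x_min=1, x_max=5)
    note = " (extrapolation: treat with caution)" if extrapolated else ""
    print(f"x = {x}: predicted y = {y_hat:.2f}{note}")
```

x = 3.5 was not in the sample but lies inside the observed range, so its prediction is reasonable; x = 10 is flagged because it is far beyond the data.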
outliers
observations that lie outside the overall pattern of other observations
influential points
observations that, if removed, would markedly change the correlation or the regression line
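The definition above suggests a direct check: refit the line with each point left out and see how much it moves. A minimal sketch, assuming a least-squares slope and looking only at the slope (the correlation could be checked the same way); the helper names `slope` and `slope_changes` and the data are hypothetical, and how large a change counts as "marked" is a judgment call the notes do not fix.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope b (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    return sxy / sxx

def slope_changes(xs, ys):
    """For each point, the change in slope when that point alone is removed."""
    if len(xs) < 3:
        raise ValueError("need at least 3 points so 2 remain after removal")
    full = slope(xs, ys)
    changes = []
    for i in range(len(xs)):
        # refit without point i
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        changes.append(slope(rest_x, rest_y) - full)
    return changes

# hypothetical data: four points on a rising line plus one far to the right
xs = [1, 2, 3, 4, 10]
ys = [1, 2, 3, 4, 1]
for x, y, d in zip(xs, ys, slope_changes(xs, ys)):
    print(f"({x}, {y}): slope changes by {d:+.2f}")
```

With all five points the slope is about -0.08; removing (10, 1) alone makes it 1.00, while removing any other single point changes it by less than 0.15, so (10, 1) is the influential point here.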
best method to establish causation
an experiment: manipulate the explanatory variable while controlling for other variables
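when an experiment is not possible (smoking and lung cancer), several lines of evidence together can support causation:
people who smoke more often or for a longer period get lung cancer more often, and people who stop smoking reduce their risk
lung cancer develops after years of smoking, and it was rare among women until women began to smoke
animal experiments (non-observational studies) have shown that tar from cigarettes does cause cancer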