What are descriptive statistics and inferential statistics?
Descriptive statistics summarize the data, either numerically or graphically, to describe the sample (e.g., mean and standard deviation). They are computed from all of the data at hand.
Inferential statistics are used to model patterns in the data, accounting for randomness, and to draw inferences about the larger population. They are computed from a sample.
These inferences may take the form of:
Data mining is sometimes referred to as exploratory statistics, since it is used to generate new hypotheses.
What are random variables?
$X$ is a random variable if it represents a random draw from some population and is associated with a probability distribution.
For example, a Normal distribution with mean $\mu$ and variance $\sigma^2$, written $N(\mu, \sigma^2)$, has a pdf of
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
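The pdf formula translates directly into code. A minimal Python sketch (the function name `normal_pdf` is illustrative, not from the source) that evaluates the density and cross-checks it against the standard library's `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x, written out from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The density peaks at the mean, where it equals 1 / (sigma * sqrt(2*pi)).
print(round(normal_pdf(0.0, 0.0, 1.0), 4))  # 0.3989

# Cross-check against the standard library implementation (Python 3.8+).
print(math.isclose(normal_pdf(1.3, 2.0, 0.5), NormalDist(2.0, 0.5).pdf(1.3)))  # True
```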
The Standard Normal
Any random variable can be "standardized" by subtracting the mean, $\mu$, and dividing by the standard deviation, $\sigma$, so that $Z = (X - \mu)/\sigma$ satisfies
$$E[Z] = 0, \quad \operatorname{Var}[Z] = 1.$$
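Applied to observed data, the same recipe produces z-scores. A minimal sketch, assuming we standardize a small list of made-up values using that list's own mean and population standard deviation (`pstdev`), so the z-scores have mean 0 and SD 1 by construction; the helper name `standardize` is hypothetical:

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores (v - mean) / sd, using the population SD of the values."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative numbers only
z = standardize(data)
print(z)                         # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z), pstdev(z))        # 0.0 1.0
```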
If $X$ is normal, then $Z$ follows the standard normal distribution, $N(0, 1)$, which has probability density function (pdf):
$$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$$
Statistical Estimation
Population (with parameters) → random sample: every member of the population has the same chance of being selected.
Random sample → population: the sample is used to estimate the population parameters.
Expected value of $X$: the population mean, $E(X) = \mu$.
Sampling Distribution of the Mean
Examples of Estimators

Estimators Should Be Unbiased
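An estimator (e.g., the arithmetic sample mean) is a statistic (a function of the observable sample data) that is used to estimate an unknown population parameter (e.g., the expected value). An estimator is unbiased if its expected value equals the parameter it estimates; for example, $E(\bar{X}) = \mu$.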
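Unbiasedness can be checked by simulation: average an estimator over many repeated samples and compare the average with the true parameter. A sketch under assumed, hypothetical values (a normal population with μ = 10, σ = 3, samples of n = 5). As an added contrast not in the original, it also shows that the sample variance dividing by n is biased low, while dividing by n − 1 is unbiased:

```python
import random
from statistics import mean

random.seed(0)
MU, SIGMA, N, REPS = 10.0, 3.0, 5, 20_000  # hypothetical population and design

sample_means, var_n, var_n1 = [], [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
    sample_means.append(xbar)
    var_n.append(ss / N)          # divides by n: biased low
    var_n1.append(ss / (N - 1))   # divides by n - 1: unbiased

print(f"average sample mean    ~ {mean(sample_means):.2f} (mu = {MU})")
print(f"average variance, /n   ~ {mean(var_n):.2f} (sigma^2 = {SIGMA ** 2})")
print(f"average variance, /n-1 ~ {mean(var_n1):.2f}")
```

With n = 5 the divide-by-n version should average about (n − 1)/n · σ² = 7.2 rather than 9.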
Standard Error of the Mean: Standard Deviation of Sample Means
The standard deviation of the sample means is equal to the standard deviation of the population divided by the square root of the sample size.
$SE(\bar{X}) = \sigma / \sqrt{n}$
Rule: $\operatorname{Var}[aX + b] = a^2 \operatorname{Var}[X]$
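The variance rule, together with the fact that variances of independent random variables add, yields the $\sigma/\sqrt{n}$ result. A short derivation, assuming $X_1, \dots, X_n$ are independent draws with common variance $\sigma^2$ (the first step uses the rule with $a = 1/n$, $b = 0$; the second uses independence):

```latex
\begin{aligned}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\Big(\tfrac{1}{n}\sum_{i=1}^{n} X_i\Big)
   = \tfrac{1}{n^2}\operatorname{Var}\Big(\sum_{i=1}^{n} X_i\Big) \\
  &= \tfrac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \tfrac{1}{n^2}\, n\sigma^2
   = \tfrac{\sigma^2}{n},
\qquad
SE(\bar{X}) = \sqrt{\operatorname{Var}(\bar{X})} = \tfrac{\sigma}{\sqrt{n}}.
\end{aligned}
```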
Random Samples and Sampling
Central Limit Theorem
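The Central Limit Theorem says that, for large n, the sample mean is approximately normal with standard deviation σ/√n even when the population itself is not normal (provided it has finite variance). A simulation sketch, assuming an exponential population with rate 1 (mean 1, SD 1, strongly right-skewed) and samples of n = 40; these choices are illustrative only:

```python
import math
import random
from statistics import stdev

random.seed(1)
N, REPS = 40, 10_000  # illustrative sample size and number of repetitions

# Exponential population with rate 1: mean 1, SD 1, strongly right-skewed.
means = [sum(random.expovariate(1.0) for _ in range(N)) / N for _ in range(REPS)]

se_theory = 1.0 / math.sqrt(N)  # sigma / sqrt(n) with sigma = 1
print(f"SD of sample means: {stdev(means):.3f} vs sigma/sqrt(n) = {se_theory:.3f}")

# If the sample means are roughly normal, about 95% fall within 1.96 SE of the mean.
inside = sum(abs(m - 1.0) <= 1.96 * se_theory for m in means) / REPS
print(f"share within 1.96 SE of the population mean: {inside:.3f}")
```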

Student's t-Distribution

Statistical Estimation (Types)
Confidence Interval (CI)
A confidence interval provides a range of values that we believe, with a given level of confidence, contains a population parameter. The 95% CI for the population mean satisfies:
$\Pr\left(\bar{X} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$
The two endpoints are the lower bound and the upper bound.
Interpretation: if we repeated the sampling many times, about 95% of the intervals constructed this way would contain $\mu$.
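This repeated-sampling interpretation can be checked directly. A sketch that assumes a hypothetical normal population with μ = 215 and known σ = 20 (values chosen to resemble the example below), draws many samples of n = 100, and counts how often the interval X̄ ± 1.96σ/√n contains μ:

```python
import math
import random

random.seed(2)
MU, SIGMA, N, REPS = 215.0, 20.0, 100, 5_000  # assumed "true" population values

half_width = 1.96 * SIGMA / math.sqrt(N)  # sigma treated as known
covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    if xbar - half_width <= MU <= xbar + half_width:
        covered += 1

coverage = covered / REPS
print(f"coverage: {coverage:.3f}")  # should be close to 0.95
```

Any single interval either contains μ or it does not; the 95% describes the long-run behaviour of the procedure.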
Example: Standard Normal Distribution
Suppose a sample of $n = 100$ persons has mean $\bar{X} = 215$ and standard deviation $s = 20$.
95% CI $= \bar{X} \pm 1.96\, s/\sqrt{n} = 215 \pm 1.96(20)/\sqrt{100} = 215 \pm 3.92$
"We are 95% confident that the interval 211-219 contains $\mu$."
Effect of Sample Size
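Suppose we had only 10 observations. What happens to the confidence interval?
$\bar{X} \pm 1.96\, s/\sqrt{n} = 215 \pm 1.96(20)/\sqrt{10} \approx 215 \pm 12.4 = (202.6, 227.4)$
Fewer observations = larger interval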
Suppose we use a 90% interval instead. What happens to the confidence interval?
$\bar{X} \pm 1.645\, s/\sqrt{n}$
90%: $215 \pm 1.645(20)/\sqrt{100} = 215 \pm 3.29 \approx (212, 218)$
Lower confidence level = smaller interval. (A 99% interval would use 2.58 as the multiplier, and the interval would be larger.)
Effect of Standard Deviation
Suppose we had an SD of 40 (instead of 20). What happens to the confidence interval?
$\bar{X} \pm 1.96\, s/\sqrt{n}$
$215 \pm 1.96(40)/\sqrt{100} = 215 \pm 7.84 \approx (207, 223)$
More variation = larger interval
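The three effects (sample size, confidence level, standard deviation) can be compared side by side. A minimal sketch; the helper `z_interval` is hypothetical and uses the normal multipliers quoted in the text (1.645, 1.96, 2.58), rounding the bounds to one decimal:

```python
import math

def z_interval(xbar, s, n, z):
    """Normal-multiplier CI for the mean: xbar +/- z * s / sqrt(n), rounded."""
    half_width = z * s / math.sqrt(n)
    return round(xbar - half_width, 1), round(xbar + half_width, 1)

scenarios = {
    "baseline (95%, n=100, s=20)": (215, 20, 100, 1.96),
    "fewer observations (n=10)":   (215, 20, 10, 1.96),
    "lower level (90%)":           (215, 20, 100, 1.645),
    "higher level (99%)":          (215, 20, 100, 2.58),
    "more variation (s=40)":       (215, 40, 100, 1.96),
}
for label, args in scenarios.items():
    lo, hi = z_interval(*args)
    print(f"{label:30s} ({lo}, {hi})  width {hi - lo:.1f}")
```

With only 10 observations a Student's t multiplier (about 2.26 for 9 degrees of freedom) would strictly be more appropriate and widens the interval further; the sketch keeps 1.96 to match the text.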
Statistical Inference
Random error (chance) can be assessed with tests of statistical significance or with confidence intervals.
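The two approaches lead to the same conclusion. A sketch reusing the earlier example (X̄ = 215, s = 20, n = 100) with a hypothetical null value μ₀ = 210, chosen only for illustration: a two-sided z-test (normal approximation via `statistics.NormalDist`) rejects at the 5% level, and correspondingly the 95% CI excludes μ₀:

```python
import math
from statistics import NormalDist

xbar, s, n = 215, 20, 100
mu0 = 210  # hypothetical null value, for illustration only

se = s / math.sqrt(n)                          # standard error of the mean
z = (xbar - mu0) / se                          # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
ci = (xbar - 1.96 * se, xbar + 1.96 * se)      # 95% confidence interval

print(f"z = {z:.2f}, p = {p_value:.4f}")  # z = 2.50, p = 0.0124
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}); contains mu0: {ci[0] <= mu0 <= ci[1]}")
```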
Hypothesis Testing
Possible Results of Tests