population mean
sample mean
median
mode
Range
(R)
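A minimal R sketch of the measures on the cards above, using made-up scores (the data and variable names are assumptions, not from the notes). Base R has no built-in statistical mode (`mode()` reports the storage type), so the mode here is taken from a frequency table; with ties, only the first most-frequent value is returned.

```r
# Hypothetical sample data, for illustration only
scores <- c(70, 85, 85, 90, 100)

mean_value   <- mean(scores)    # arithmetic mean
median_value <- median(scores)  # middle value of the sorted data

# Mode: tabulate the values and pick the most frequent one
counts     <- table(scores)
mode_value <- as.numeric(names(counts)[which.max(counts)])

# range() returns c(min, max); the range R is their difference
data_range <- diff(range(scores))
```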
population variance
n <- length(data$Score)
x <- data$Score
(v.population <- sum((x - mean(x))^2) / n)
sample variance
R: var(DataTable$Column)
- R's var() returns only the sample variance (divides by n - 1)
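Because var() divides by n - 1, one way to get the population variance is to rescale its result. A small sketch with made-up data (the vector `x` is an assumption for illustration):

```r
# Hypothetical data, for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

v.sample     <- var(x)                  # sample variance: divides by n - 1
v.population <- var(x) * (n - 1) / n    # rescaled so it divides by n

# Same population variance computed directly from the definition
v.direct <- sum((x - mean(x))^2) / n
```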
standard deviation
population and sample
square root of the variance
- denoted by σ (population) or s (sample)
Coefficient of Variation
(CV)
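The coefficient of variation is the standard deviation divided by the mean, often reported as a percentage. A short sketch assuming sample data (the vector `x` is made up):

```r
# Hypothetical data, for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

cv         <- sd(x) / mean(x)   # sd() is the sample standard deviation
cv_percent <- 100 * cv          # same value expressed as a percentage
```

CV is unitless, so it can compare the spread of variables measured on different scales; it is only meaningful when the mean is not close to zero.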
Computing standard deviation
Excel and R
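A sketch of both versions in R, with the usual Excel counterparts noted in comments (the data are made up; the Excel function names are the modern ones and may differ in older versions):

```r
# Hypothetical data, for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Sample standard deviation (divides by n - 1)
# Excel counterpart: =STDEV.S(range)
s.sample <- sd(x)

# Population standard deviation (divides by n); base R has no built-in for this
# Excel counterpart: =STDEV.P(range)
s.population <- sqrt(mean((x - mean(x))^2))
```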
Empirical Rule
left to right percentages under curve:
0.15 + (2.35 + (13.5 + (34 + 34) + 13.5) + 2.35) + 0.15
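The band percentages above are rounded; they can be checked against the standard normal curve with pnorm(). A quick sketch (variable names are illustrative):

```r
# Area within 1, 2, and 3 standard deviations of the mean
within1 <- pnorm(1) - pnorm(-1)   # about 0.6827 (34 + 34 = 68%)
within2 <- pnorm(2) - pnorm(-2)   # about 0.9545 (68 + 2 * 13.5 = 95%)
within3 <- pnorm(3) - pnorm(-3)   # about 0.9973 (95 + 2 * 2.35 = 99.7%)

# The eight bands from left to right add up to 100%
bands <- c(0.15, 2.35, 13.5, 34, 34, 13.5, 2.35, 0.15)
total <- sum(bands)
```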
Standardized values
z value
z = (data value (y) - mean (μ)) / standard deviation (σ)
p-value
p-value calculation for specific range
example
probability of a student scoring between 450 and 600 on the SAT
mean = 500, sd = 100
z = (600-500)/100 = 1.0, p-value (left area) = 0.8413
z = (450-500)/100 = -0.50, p-value (left area) = 0.3085
0.8413 - 0.3085 = 0.5328 or 53.28%
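The same SAT example can be reproduced in R. This is a sketch; the variable names are illustrative.

```r
mu    <- 500   # mean from the example
sigma <- 100   # standard deviation from the example

z.high <- (600 - mu) / sigma   # 1.0
z.low  <- (450 - mu) / sigma   # -0.5

# Left area up to 600 minus left area up to 450
p.between <- pnorm(z.high) - pnorm(z.low)   # about 0.5328

# Equivalent call without standardizing by hand
p.direct <- pnorm(600, mean = mu, sd = sigma) - pnorm(450, mean = mu, sd = sigma)
```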
Standard normal curve
(μ=0, σ=1)
Normalizing Data
Computing z-values
Excel and R
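A sketch of computing z-values for a whole column in R, assuming made-up data. scale() centers on the mean and divides by the sample standard deviation by default; in Excel, =STANDARDIZE(x, mean, standard_dev) plays a similar role for a single value.

```r
# Hypothetical data, for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# By hand: subtract the mean, divide by the standard deviation
z.manual <- (x - mean(x)) / sd(x)

# Built-in: scale() returns a one-column matrix, so flatten it to a vector
z.scaled <- as.numeric(scale(x))
```

After standardizing, the z-values have mean 0 and standard deviation 1.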
Computing p-values
p(z) = P(x < z)
Excel and R
“Left” Area (Probability) under Standard Normal Curve
- Excel: =NORMSDIST(z-value) (standard normal distribution)
- R: pnorm(z-value)
“Right” Area (Probability) under Standard Normal Curve
- Excel: =1-NORMSDIST(z-value)
- R: 1 - pnorm(z-value)
“In Between” Area (Probability) under Standard Normal Curve
- Excel: =NORMSDIST(high z) - NORMSDIST(low z)
- R: pnorm(high z) - pnorm(low z)
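The three areas side by side in R, using z = -1.96 and z = 1.96 as example cut-offs (the values are chosen for illustration only):

```r
z.low  <- -1.96
z.high <-  1.96

left.area    <- pnorm(z.high)                      # area to the left of z.high
right.area   <- 1 - pnorm(z.high)                  # area to the right of z.high
right.tail   <- pnorm(z.high, lower.tail = FALSE)  # same right area, via argument
between.area <- pnorm(z.high) - pnorm(z.low)       # about 0.95
```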
Converting p-value (‘left’ area) to z-value
Excel and R
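In R, qnorm() is the inverse of pnorm(): it takes a left area and returns the z-value. Excel offers =NORMSINV(probability) (or =NORM.S.INV in newer versions) for the same purpose. A short sketch:

```r
# Left area of 0.8413 corresponds to z of approximately 1.0
z.from.p <- qnorm(0.8413)

# Left area of 0.975 corresponds to z of approximately 1.96
z.975 <- qnorm(0.975)

# Round trip: converting back with pnorm() recovers the original area
p.back <- pnorm(qnorm(0.3))
```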
Skewness
types (3)
Percentile
Quartile
The quartiles divide the data into 4 equal parts
First quartile: Q1
- Bottom 25% = 25th percentile
Second quartile: Q2
- Bottom 50% = 50th percentile (median)
Third quartile: Q3
- Bottom 75% = 75th percentile
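A sketch of computing the quartiles in R with quantile(), using made-up data. R's default interpolation method (type 7) is assumed; other `type` values, or Excel, may give slightly different quartiles on small data sets.

```r
# Hypothetical data, for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Q1, Q2 (median), and Q3 as the 25th, 50th, and 75th percentiles
q <- quantile(x, probs = c(0.25, 0.50, 0.75))

# Interquartile range: Q3 - Q1
iqr <- IQR(x)
```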
Boxplot
features and R command
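A minimal sketch of drawing a boxplot in R and reading off its features, with made-up data that include one outlier. boxplot.stats() returns the five numbers the plot is built from (lower whisker, lower hinge, median, upper hinge, upper whisker) plus any points beyond the whiskers.

```r
# Hypothetical data with one large value, for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# Draw the boxplot (box = Q1 to Q3, line = median, points = outliers)
boxplot(x, horizontal = TRUE, main = "Example boxplot")

# The numbers behind the plot
s <- boxplot.stats(x)
five.numbers <- s$stats   # whisker, hinge, median, hinge, whisker
outliers     <- s$out     # points more than 1.5 * box length beyond the hinges
```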
Skewness and Boxplots
Normal (symmetric) Distribution
- (Q3-Q2) = (Q2-Q1)
Positive Skew
- (Q3-Q2) > (Q2-Q1)
Negative Skew
- (Q3-Q2) < (Q2-Q1)
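The quartile comparison above can be sketched as a small helper. The function name `quartile_skew` and the data are hypothetical; this is a rough check based on the box only, not a formal skewness statistic.

```r
# Classify skew by comparing the upper and lower halves of the box
quartile_skew <- function(x) {
  q <- unname(quantile(x, probs = c(0.25, 0.50, 0.75)))
  upper <- q[3] - q[2]   # Q3 - Q2
  lower <- q[2] - q[1]   # Q2 - Q1
  if (isTRUE(all.equal(upper, lower))) {
    "symmetric"
  } else if (upper > lower) {
    "positive skew"
  } else {
    "negative skew"
  }
}

# Hypothetical right-skewed data, for illustration only
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 8, 15)
result <- quartile_skew(x)
```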
Data Standardization and Scaling
Standardization Data Variation (z-value)
- Typical range: about -3 to +3
Scaling Data Variation
- (value - min value) / (max value - min value)
- Range: 0 to 1
- not effective with outliers, because an extreme value compresses the scaled values of all other data elements
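A sketch of min-max scaling in R that also illustrates the outlier problem. The helper name `min_max` and the data are assumptions for illustration.

```r
# Min-max scaling: (value - min) / (max - min), result lies in 0 to 1
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical data without an outlier: values spread evenly over 0 to 1
x <- c(10, 20, 30, 40, 50)
scaled <- min_max(x)

# Same data plus one outlier: the original five values are squeezed near 0
x.outlier <- c(x, 1000)
scaled.outlier <- min_max(x.outlier)
```

With the outlier included, the first five scaled values all fall below 0.05, so their differences are hard to see.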