Data Analysis Flashcards

(64 cards)

1
Q

Mean/Median/Mode

A
  • Mean: Average (sum of all values ÷ number of values)
  • Median: Middle number when the values are arranged in ascending (numerical) order
  • Mode: The data element that appears most often in the dataset.
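A minimal Python sketch of these three definitions using the standard `statistics` module. The dataset is hypothetical and chosen only for illustration.

```python
from statistics import mean, median, mode

data = [3, 7, 7, 2, 9]  # hypothetical dataset

print(mean(data))    # 5.6 -> (3 + 7 + 7 + 2 + 9) / 5
print(median(data))  # 7   -> middle of 2, 3, 7, 7, 9
print(mode(data))    # 7   -> appears twice
```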
2
Q

Finding the Median

A
  1. Arrange the numbers in ascending order, count the number of terms, and ÷ 2
  • If the result ends in .5 (odd # of terms), round up: the median is that term. 5.5→6th term, 8.5→9th term
  • If the result is a whole # (even # of terms), there are two middle terms: that term and the next one. 6→6th and 7th terms (the 6th term from each end). Average those two terms to find the median
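The counting rule above can be sketched in Python as follows. `median_by_position` is a hypothetical helper name and the data are illustrative; Python lists are 0-based, so the 1-based nth term is index n − 1.

```python
def median_by_position(values):
    """Median found with the 'count the terms and divide by 2' rule."""
    xs = sorted(values)
    n = len(xs)
    if n % 2 == 1:
        # n / 2 ends in .5 (e.g. 11 / 2 = 5.5): the median is the 6th term,
        # i.e. 1-based position (n + 1) / 2, which is 0-based index n // 2.
        return xs[n // 2]
    # n / 2 is whole (e.g. 12 / 2 = 6): average the 6th and 7th terms.
    k = n // 2
    return (xs[k - 1] + xs[k]) / 2


print(median_by_position([9, 1, 5, 3, 7]))           # 5   (5 / 2 = 2.5 -> 3rd term)
print(median_by_position([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5 (8 / 2 = 4 -> avg of 4th and 5th)
```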
3
Q

Cases where Mean = Median

A

The Mean = Median in the below cases:
* The Dataset is Evenly Spaced: The gap between consecutive numbers is always the same: 1, 5, 9, 13, 17→+4 btwn each term
* The Dataset is Symmetrical: The gaps between consecutive numbers read the same forward and backward: 4, 6, 9, 9, 12, 14
4+2=6, 6+3=9, 9+0=9, 9+3=12, 12+2=14→gaps of 2, 3, 0, 3, 2
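A small Python check of both cases, using the card's own example datasets. The helper `gaps` is an illustrative name, not a library function.

```python
from statistics import mean, median

evenly_spaced = [1, 5, 9, 13, 17]   # +4 between terms
symmetrical = [4, 6, 9, 9, 12, 14]  # gaps 2, 3, 0, 3, 2


def gaps(data):
    """Differences between consecutive terms of a sorted list."""
    return [b - a for a, b in zip(data, data[1:])]


for data in (evenly_spaced, symmetrical):
    g = gaps(data)
    # Symmetrical gaps read the same forward and backward
    print(g, g == g[::-1], mean(data), median(data))
```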

4
Q

If Mean = Median

A

If the dataset is evenly spaced or symmetrical (so Mean = Median), some calculations become easier:
* Calculating Median/Mean: (First # + Last #) ÷ 2
* Calculating Sum of Integers: Mean x # of integers
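A minimal sketch of the two shortcuts for the evenly spaced example set from the previous card, compared against a direct `sum`.

```python
data = list(range(1, 18, 4))    # 1, 5, 9, 13, 17: evenly spaced, +4 each time

avg = (data[0] + data[-1]) / 2  # (first + last) / 2 -> 9.0
total = avg * len(data)         # mean x number of terms -> 45.0

print(avg, total, sum(data))    # 9.0 45.0 45
```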

5
Q

Mode

A
  • The data element that appears most often in the dataset.
  • There can be more than one mode if several numbers appear the same (highest) number of times: e.g. bimodal (two modes)
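Python 3.8+ `statistics.multimode` returns every value tied for the highest count, which makes bimodal data easy to spot. The lists are illustrative.

```python
from statistics import multimode

print(multimode([1, 2, 2, 3, 3, 4]))  # [2, 3] -> bimodal
print(multimode([5, 1, 5, 2]))        # [5]    -> one mode
```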
6
Q

The Weighted Mean in FQ

Calculate weighted avg of below:
Number | Frequency (FQ)
3 | 1
7 | 5
11 | 3

A

The average, but each number is weighted by its frequency (how many times it appears), so more frequent numbers count more.

Weighted Mean = Sum of (Number x FQ) ÷ Sum of FQ

[(3 x 1) + (7 x 5) + (11 x 3)] ÷ (1 + 5 + 3) = (3 + 35 + 33) ÷ 9 = 71/9 ≈ 7.89
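A minimal Python sketch of the frequency-weighted mean, using `fractions.Fraction` so the answer stays exact. `weighted_mean` is an illustrative helper name.

```python
from fractions import Fraction


def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    return Fraction(sum(v * w for v, w in zip(values, weights)), sum(weights))


wm = weighted_mean([3, 7, 11], [1, 5, 3])
print(wm, float(wm))  # 71/9, about 7.89
```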

7
Q

The Weighted Mean in Grades

What is the weighted avg?
Midterm: 85% score/40% weight
Final Exam: 80% score/60% weight

A
  1. Convert each weight % to points (40%→40, 60%→60; the points should add up to 100) and each score % to a decimal (85%→.85)
  2. Calculate average:
    Weighted Avg = (Score ① x weight pt) + (Score ② x weight pt) + … (one term per score)

Weighted Avg: (.85 x 40) + (.8 x 60) = 34 + 48 = 82→82% Weighted Avg
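The same calculation in Python, assuming scores and weights are given as percentage points. Dividing by the sum of the weights (100 here) keeps the result a percentage and also covers weights that do not add up to 100.

```python
scores = [85, 80]    # percent scores: midterm, final
weights = [40, 60]   # weights as points (40% -> 40, 60% -> 60)

weighted_avg = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
print(weighted_avg)  # 82.0
```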

8
Q

Quartiles

A

Dividing an ordered dataset into four groups of (roughly) equal size.
The three cut points are Q₁, Q₂ (the median), and Q₃; the four groups are often called the 1st, 2nd, 3rd, and 4th quartiles.

9
Q

Q₁, Q₂, Q₃ values

What are the quartiles for the below datasets?
* Dataset 1: 10,14,18,21,25,34,46
* Dataset 2: 13,17,19,22,23,40,41,46

A
  • Q₁ : The boundary # that separates the lowest 25% of values from the rest, also known as the 25th percentile
  • Q₂ : The boundary # that separates the lowest 50% of values from the rest, also known as the median/50th percentile
  • Q₃ : The boundary # that separates the lowest 75% of values from the rest, also known as the 75th percentile
  1. Find the median (Q₂) and draw a vertical line through it to separate the dataset into two equal halves
  2. Q₁ : The median of the first (lower) half
  3. Q₃ : The median of the second (upper) half
  • If the median is a term in the dataset (odd # of terms), draw the line through that term and leave it out of both halves
  • If the median is the avg btwn the two middle terms (even # of terms), draw the line btwn them; each middle term stays in its own half (the lower one in the first half, the upper one in the second)

  • Dataset 1: Q₁ = 14 Q₂ = 21 Q₃ = 34
  • Dataset 2: Q₁ = 18 Q₂ = 22.5 Q₃ = 40.5
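A sketch of the median-of-halves method described above; `quartiles` is an illustrative helper. Python's `statistics.quantiles` uses a different interpolation method by default, so its results can differ from this hand method.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]          # for odd n, the middle term is left out
    upper = xs[half + n % 2:]  # skip the middle term when n is odd
    return median(lower), median(xs), median(upper)


print(quartiles([10, 14, 18, 21, 25, 34, 46]))      # (14, 21, 34)
print(quartiles([13, 17, 19, 22, 23, 40, 41, 46]))  # (18.0, 22.5, 40.5)
```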

10
Q

Percentiles

A
  • Dividing the dataset into 100 equal groups.
  • If a value is at the 50th percentile, it is greater than (or equal to) about 50% of the values in the dataset.
11
Q

Measures of Dispersion

A

Tells us how “spread out” a certain dataset is.

3 measures:
* The Range
* The Interquartile Range
* Standard Deviation

12
Q

Range

What is the range of the below dataset?
Dataset: {62, 60, 75, 73, 80, 95}

A

Range = Largest # - Smallest #
* Arranging the numbers in ascending order makes the largest and smallest # easy to spot
* If all values in a dataset are equal to each other, the range is 0

  1. Re-arrange so that numbers are in ascending order.
  2. largest # - smallest #: 95 - 60 = 35

13
Q

Outliers and Effects

A

Datapoints that differ significantly from most (or all) of the other data.

Outlier Effects:
* Average: Affected. Adding a very large outlier pulls the average up (a very small one pulls it down).
* Median: NOT affected (or only slightly so).
* Range: Affected. Adding a very large (or very small) outlier increases the range.
* Interquartile range: NOT affected (or only slightly so).
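A quick Python illustration on a made-up dataset: adding one large outlier moves the mean and range a lot, but the median only slightly. `summary` is an illustrative helper name.

```python
from statistics import mean, median

base = [1, 2, 3, 4, 5]       # hypothetical data
with_outlier = base + [100]  # add one large outlier


def summary(data):
    """Return (mean, median, range) of the data."""
    xs = sorted(data)
    return mean(xs), median(xs), xs[-1] - xs[0]


print(summary(base))          # (3, 3, 4)
print(summary(with_outlier))  # mean jumps to about 19.17, median 3.5, range 99
```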

14
Q

Interquartile Range

Calculate the interquartile range of the dataset below:
9,13,17,30,34,37,42,46,49,53

A

Use it to measure spread when there is an outlier: unlike the range, it only covers the middle 50% of the data, so extreme values barely affect it.
Interquartile Range: Q₃ - Q₁

  1. Calculate Q₁ and Q₃ values:
    med: 35.5, Q₁: 17 Q₃: 46
  2. Q₃ - Q₁: 46 - 17 = 29
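A sketch of the IQR computation using the same median-of-halves quartile method; `iqr` is an illustrative helper name.

```python
from statistics import median


def iqr(data):
    """Q3 - Q1, with quartiles taken as the medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + len(xs) % 2:]  # skip the middle term when the count is odd
    return median(upper) - median(lower)


print(iqr([9, 13, 17, 30, 34, 37, 42, 46, 49, 53]))  # 29
```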

15
Q

Boxplots

Construct a box plot with below values:
3,6,12,19,31,36,37,60

A

A box diagram that displays 5 data points (the five-number summary):
* Lowest #
* Highest #
* Q₁
* Q₂
* Q₃
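The card does not list the five values for its example. A sketch like the one below computes them, assuming the median-of-halves quartile method from the quartiles card; `five_number_summary` is an illustrative name.

```python
from statistics import median


def five_number_summary(data):
    """Return (lowest, Q1, Q2, Q3, highest) for drawing a box plot."""
    xs = sorted(data)
    half = len(xs) // 2
    lower, upper = xs[:half], xs[half + len(xs) % 2:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


print(five_number_summary([3, 6, 12, 19, 31, 36, 37, 60]))
# (3, 9.0, 25.0, 36.5, 60)
```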

16
Q

Standard Deviation

A

A value that gives us an idea of how “spread out” a group of numbers is. The higher the standard deviation, the greater the spread.

  • Standard deviation is represented by σ
  • Standard deviation is never negative
17
Q

Calculating SD (simple cases)

What’s the standard deviation of a dataset of 10 and 22?

A
  1. If there is only one number in the dataset (or every number is the same), the SD is 0: 1,1,1,1,1→SD 0
  2. If there are exactly two numbers in the dataset, first calculate the average of the two numbers, then subtract the smaller number from it: that difference (half the gap between the two numbers) = SD (population SD).

  1. Calculate average of 10 and 22 = 16
  2. 16 - 10 = 6
  3. SD = 6
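Python's `statistics.pstdev` computes the population SD (dividing by the number of terms), which matches both simple cases.

```python
from statistics import pstdev

print(pstdev([1, 1, 1, 1, 1]))  # 0.0 -> every number is the same
print(pstdev([10, 22]))         # 6.0 -> (22 - 10) / 2
```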

18
Q

Calculating SD (complex cases)

What is the SD of the dataset: 1,4,9,16,25?

A
  1. Calculate average
  2. Find difference btwn average and each term (avg - 1st term…)
  3. Square each difference and add them all together:
    difference of 1st term² + difference of 2nd term² + difference of 3rd term²….
  4. Sum of differences² ÷ # of terms
  5. Take the square root of above number √ →SD

  1. Calculate avg = 11
  2. Find difference btwn 11 and each term: (11-1) , (11-4), (11-9), (11-16), (11-25)
  3. Square each difference and add: 10² + 7² + 2² + (−5)² + (−14)² = 100 + 49 + 4 + 25 + 196 = 374
  4. Divide above number by # of terms: 374 ÷ 5 = 74.8
  5. Square root of above number: √74.8
  6. SD ≈ 8.65
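The same five steps as a short Python function; `population_sd` is an illustrative name, and its result should match `statistics.pstdev`.

```python
import math


def population_sd(data):
    avg = sum(data) / len(data)                      # step 1: average
    squared_diffs = [(avg - x) ** 2 for x in data]   # steps 2-3: differences, squared
    variance = sum(squared_diffs) / len(data)        # step 4: divide by # of terms
    return math.sqrt(variance)                       # step 5: square root


print(round(population_sd([1, 4, 9, 16, 25]), 2))  # 8.65
```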

19
Q

Population vs Sample SD

Sample SD of below dataset:
5, 6, 8, 21

A

Population: Calculating SD using all members of the group (all terms)
Sample: Calculating SD with only a sample (not all)

Population: Normal SD calculation (divide by # of terms)
Sample: Normal SD calculation, but divide by (# of terms - 1)

  1. Calculate avg: 10
  2. Find differences and square each difference: (−5)², (−4)², (−2)², 11²→25, 16, 4, 121
  3. Add up all the squared differences: 166
  4. 166 ÷ (4-1): 166 ÷ 3 ≈ 55.3
  5. Sample SD = √55.3 ≈ 7.44
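In Python's `statistics` module, `pstdev` divides by the number of terms (population) and `stdev` divides by the number of terms minus 1 (sample).

```python
from statistics import pstdev, stdev

data = [5, 6, 8, 21]

print(round(pstdev(data), 2))  # 6.44 -> population SD, sqrt(166 / 4)
print(round(stdev(data), 2))   # 7.44 -> sample SD, sqrt(166 / 3)
```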

20
Q

SD Effects

A
  • Add or Subtract the Same Number to/from Each Number: No Change to SD
  • Multiply or Divide Each Number by the Same Number: SD is multiplied/divided by that number (by its absolute value if the number is negative)
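A quick numerical check of both rules using `statistics.pstdev`; the dataset and constants are arbitrary.

```python
import math
from statistics import pstdev

data = [1, 4, 9, 16, 25]
sd = pstdev(data)

shifted = [x + 100 for x in data]  # add the same number to each value
scaled = [x * 3 for x in data]     # multiply each value by 3
flipped = [x * -2 for x in data]   # multiply by a negative number

print(math.isclose(pstdev(shifted), sd))      # True: no change
print(math.isclose(pstdev(scaled), 3 * sd))   # True: SD x 3
print(math.isclose(pstdev(flipped), 2 * sd))  # True: SD x |-2|
```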
21
Q

Sets vs. Lists

A

Sets:
* Written with {}
* Order does not matter: {1, 2, 3} = {3, 2, 1}
* Does not count repeats: {1, 1, 2, 2, 3} = 3 elements
* Can be finite or infinite elements

Lists:
* Written with (), [], or no brackets
* Elements are in order, so order matters: (1, 2, 3) ≠ (3, 2, 1)
* Counts repeats: 1, 1, 2, 2, 3 = 5 elements
* Always finite
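Python happens to follow the same conventions: a `set` literal uses braces and ignores order and repeats, while a `list` keeps both.

```python
print({1, 2, 3} == {3, 2, 1})  # True: order does not matter in a set
print(len({1, 1, 2, 2, 3}))    # 3: repeats are not counted
print([1, 2, 3] == [3, 2, 1])  # False: order matters in a list
print(len([1, 1, 2, 2, 3]))    # 5: repeats count
```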

22
Q

Standardization

Standardize the values below. The average is 3 and σ is √2 or approximately 1.4
1,2,3,4,5

A
  • A process that tells you “how many standard deviations” from the avg a number in a dataset is.
  • For example, imagine we have a list of numbers in which the average = 30 and SD = 5
  • The number 25 is exactly one standard deviation below the mean, so we would give it the “standard” value of −1.
  1. Subtract avg from each number in the dataset
  2. Divide each difference by the SD

  1. Subtract avg from each number: (1-3), (2-3), (3-3), (4-3), (5-3)
  2. Divide each difference by SD: (-2÷√2), (-1÷√2), (0÷√2), (1÷√2), (2÷√2)→−√2, −√2/2, 0, √2/2, √2 (≈ −1.41, −0.71, 0, 0.71, 1.41)
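A minimal sketch of standardization (z-scores) in Python, using the population SD as the card does.

```python
from statistics import mean, pstdev

data = [1, 2, 3, 4, 5]
avg, sd = mean(data), pstdev(data)       # 3 and sqrt(2)

z_scores = [(x - avg) / sd for x in data]
print([round(z, 2) for z in z_scores])   # [-1.41, -0.71, 0.0, 0.71, 1.41]
```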

23
Q

Set Intersections

What is the intersection of the sets below?
Set A={1,2,3,4,5,6,7,8,9,10}
Set B ={2,3,5,7,11,13,17,19}

A
  • A new set that is comprised of all elements the two original sets share in common.
  • Intersections are indicated as ∩

Set A and Set B both contain 2, 3, 5, and 7, so the set intersection:
A∩B={2,3,5,7}
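Python's built-in `set` type supports intersection directly with the `&` operator.

```python
A = set(range(1, 11))            # {1, 2, ..., 10}
B = {2, 3, 5, 7, 11, 13, 17, 19}

common = A & B
print(sorted(common))            # [2, 3, 5, 7]
```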

24
Q

Set Unions

What is the set union for the below sets?
Set A={1,2,3,4,5,6,7,8,9,10}
Set B ={2,3,5,7,11,13,17,19}

A
  • A new set that is comprised of all the elements of both original sets.
  • Unions are indicated as ∪
  • Remember that a set does NOT HAVE REPEATS: if a number appears in both sets, we write it down only once.

The set union is every number that appears in either set:
A∪B={1,2,3,4,5,6,7,8,9,10,11,13,17,19}
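Union is the `|` operator on Python sets. The last line also checks that the number of elements in A∪B equals |A| + |B| − |A∩B|.

```python
A = set(range(1, 11))            # {1, 2, ..., 10}
B = {2, 3, 5, 7, 11, 13, 17, 19}

union = A | B
print(sorted(union))             # [1, 2, ..., 10, 11, 13, 17, 19]
print(len(union) == len(A) + len(B) - len(A & B))  # True (14 = 10 + 8 - 4)
```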

25
Mutually Exclusive Sets
* Two sets that share NOTHING in common. (Set A: Even numbers, Set B: Odd numbers)
26
Complete Overlap
Two sets where one set lies entirely inside the other (every element of the smaller set is also in the larger set).
**Case 1: The two sets are identical.**
Both the intersection and the union of the two sets are equal to each set individually: A∩B = Set A = Set B = A∪B
Set A: 1,2,3, Set B: 1,2,3
Since both sets are identical, A∩B = {1,2,3} = A∪B
**Case 2: The set with fewer elements is contained entirely in the larger set.**
The intersection of the two sets is equal to the smaller set. The union of the two sets is equal to the larger set.
A∩B = set with fewer elements
A∪B = set with more elements
Set A: 1,2,3, Set B: 1,2,3,4,5
Since the numbers in Set A are all included in Set B, A∩B = {1,2,3} and A∪B = {1,2,3,4,5}
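Python's set operators can confirm Case 2: `<=` tests whether one set is contained in another.

```python
A = {1, 2, 3}
B = {1, 2, 3, 4, 5}

print(A <= B)      # True: A is contained entirely in B
print(A & B == A)  # True: intersection = smaller set
print(A | B == B)  # True: union = larger set
```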
27
Inclusion-Exclusion Formula
Example: 100 students take either physics, biology or both. If 38 students take physics and 81 students take biology, how many students take both?
Used to find the number of elements in the intersection or the union of two sets.
Total = A + B − both + neither
* Total: Total # of elements (in Set A, in Set B, or in neither)
* A: Total # of elements in Set A
* B: Total # of elements in Set B
* both: # of elements shared between Sets A & B
* neither: # of elements in neither set
Solution: Total students = # of physics students + # of bio students − # of students who take both + # of students who take neither
100 = 38 + 81 − X + 0→X = 19
19 students take both physics and biology
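Rearranging the formula to solve for `both` takes one line; the variable names are illustrative.

```python
total, physics, biology, neither = 100, 38, 81, 0

# Total = A + B - both + neither  ->  both = A + B + neither - Total
both = physics + biology + neither - total
print(both)  # 19
```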
28
Inclusion-Exclusion Principle (Only one)
Example: There are 95 houses, and each house has a yard, a pool or both. If 34 houses have a pool and 77 houses have a yard, how many houses have exactly one of the two?
Only one = A + B − (2 x both)
Solution:
1. First, calculate the # of houses that have both: 95 = 34 + 77 − both→both = 16
2. Then plug it into the only-one formula: Only one = 34 + 77 − (2 x 16) = 79
79 houses have either only a yard or only a pool.
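The two-step solution as a short Python sketch, assuming no house has neither (as the problem states).

```python
total, pool, yard = 95, 34, 77

both = pool + yard - total         # 16 houses have both (neither = 0)
only_one = pool + yard - 2 * both  # remove the overlap from both circles
print(both, only_one)              # 16 79
```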
29
Three Overlapping Sets
Example: In a neighborhood containing 235 houses, 77 have a swimming pool, 100 have a garage, and 156 have a yard. If every house has at least one of these elements, and if 14 houses have all three, how many houses have exactly two?
Total = A + B + C − (exactly two) − 2(all three) + none
Solution:
1. 235 = 77 + 100 + 156 − (exactly two) − (2 x 14) + 0
2. Exactly two = 333 − 28 − 235 = 70
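The three-set formula rearranged to solve for the "exactly two" count, assuming none = 0 because every house has at least one element.

```python
total, pool, garage, yard = 235, 77, 100, 156
all_three, none = 14, 0

# Total = A + B + C - exactly_two - 2 * all_three + none
exactly_two = pool + garage + yard - 2 * all_three + none - total
print(exactly_two)  # 70
```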
30
The "Choice" Method
Example: At a certain restaurant, you have the choice of four entrees, seven side dishes, and three desserts. If you must choose only one of each, how many different meals can you order?
The "Choice" Method is a great way to solve problems that ask you to calculate the "number of ways" you can do something.
1. Write out one hash mark for each category in the problem.
2. For each category, write the number of choices you have on top of its hash mark.
3. Multiply these numbers together.
Solution:
1. Write out number of hash marks: ____ ____ ____
2. Input # of choices for each category in the hash marks: 4 entrees, 7 sides, 3 desserts
3. Multiply all numbers together: 4 x 7 x 3 = 84
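The choice method is the multiplication principle; `itertools.product` lists every possible meal, so you can confirm the count. Entrees, sides, and desserts are stood in for by index numbers.

```python
from itertools import product

entrees, sides, desserts = range(4), range(7), range(3)

meals = list(product(entrees, sides, desserts))  # one (entree, side, dessert) per meal
print(len(meals))  # 84 = 4 x 7 x 3
```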
31
Combinatorics: Permutations and Combinations
Combinatorics: Number of ways things can be arranged or grouped.
* Permutations: The number of possible arrangements, where order is important.
* Permutations are indicated as P(n, r)
* Combinations: The number of possible groupings, where order is not important.
* Combinations are indicated as C(n, r) or (n r), written with n above r inside parentheses and no fraction line.
* n = Total number of items, r = Number of items being chosen
* P(n, r) = C(n, r) x r!, so for the same n and r, Permutations ≥ Combinations (strictly greater once r ≥ 2)
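Python 3.8+ provides `math.perm` and `math.comb`; the check below illustrates the relationship P(n, r) = C(n, r) × r! for a small example.

```python
import math

n, r = 5, 2
print(math.perm(n, r))  # 20 ordered arrangements
print(math.comb(n, r))  # 10 unordered groupings
print(math.perm(n, r) == math.comb(n, r) * math.factorial(r))  # True
```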
32
Permutations or Combinations?
**Permutations:**
* arranged/arrangements
* ranking
* routes
* position/placements
**Combinations:**
* group/groupings
* subs/subsets
* teams/committees
* pairs/pairings
33
Permutations
Example: 10 people enter a contest that awards 1st, 2nd, and 3rd prizes. How many different ways can three people win the prizes?
n! ÷ (n - r)!
n = Total # of elements, r = # of elements being chosen/arranged
Solution:
1. 10! ÷ (10-3)! = 10! ÷ 7!
2. (10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1) ÷ (7 x 6 x 5 x 4 x 3 x 2 x 1)
3. 10 x 9 x 8 = 720→720 different ways
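A minimal check of the formula against a brute-force count using `itertools.permutations`, which generates every ordered selection of r items.

```python
import math
from itertools import permutations

n, r = 10, 3

formula = math.factorial(n) // math.factorial(n - r)      # n! / (n - r)!
brute_force = sum(1 for _ in permutations(range(n), r))   # every (1st, 2nd, 3rd) triple
print(formula, brute_force)  # 720 720
```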
34
Permutations with Restrictions
Permutations with restrictions: Problems that ask for a number of ways elements can be arranged with a certain restriction/rule applied.
35
Permutations with Restrictions: Problem 1
Abe, Bill, Cat, Demi and Ed go to a movie. Bill must sit in the middle. How many seating arrangements are possible?
1. Write down hash marks corresponding to # of seats: 5
2. If Bill must sit in the middle, the middle seat is fixed with 1 choice (Bill), so we mark the middle hash mark as 1
3. The other 4 seats are filled from the remaining 4 people (Abe, Cat, Demi, Ed): 4 x 3 x 2 x 1
4. Multiply all choices: 4 x 3 x 1 x 2 x 1 = 24
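A brute-force check in Python, assuming the 5 seats are in a single row numbered left to right (index 2 is the middle seat).

```python
from itertools import permutations

people = ["Abe", "Bill", "Cat", "Demi", "Ed"]

# Each permutation is one seating order, left to right
count = sum(1 for seats in permutations(people) if seats[2] == "Bill")
print(count)  # 24
```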
36
Permutations with Restrictions: Problem 2
Abe, Bill, Cat, Demi and Ed go to a movie. Abe and Bill must sit together. How many seating arrangements are possible?
1. Write down hash marks corresponding to # of seats: 5
2. If Abe + Bill must sit together, they count as 1 unit, so we combine their hash marks into 1, resulting in 4 hash marks.
3. Fill in the 4 hash marks: 4 x 3 x 2 x 1
4. Multiply all choices: 4 x 3 x 2 x 1 = 24
5. But we have to take into account the order of Abe and Bill within the unit: Abe+Bill or Bill+Abe. Therefore, 24 x 2 = 48 total arrangements.
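A brute-force check, assuming a single row of 5 seats; `adjacent` is an illustrative helper that tests whether two people sit in neighboring seats.

```python
from itertools import permutations

people = ["Abe", "Bill", "Cat", "Demi", "Ed"]


def adjacent(seats, a, b):
    """True if a and b sit in neighboring seats."""
    return abs(seats.index(a) - seats.index(b)) == 1


count = sum(1 for seats in permutations(people) if adjacent(seats, "Abe", "Bill"))
print(count)  # 48
```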
37
Permutations with Restrictions: Problem 3
Abe, Bill, Cat, Demi and Ed go to a movie. Ed must sit on either end. How many seating arrangements are possible?
1. Write down hash marks corresponding to # of seats: 5
2. If Ed must sit on an end, that end seat is fixed with 1 choice (Ed), leaving 4 people for the other seats: 1 x 4 x 3 x 2 x 1 = 24
3. But Ed can sit on either end (left or right), so we x 2 to get all options:
Ed on left: **1** x 4 x 3 x 2 x 1 = 24
Ed on right: 4 x 3 x 2 x 1 x **1** = 24
24 + 24 = 48 total choices
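A brute-force check, assuming a single row of 5 seats (index 0 and index 4 are the ends).

```python
from itertools import permutations

people = ["Abe", "Bill", "Cat", "Demi", "Ed"]

count = sum(1 for seats in permutations(people) if "Ed" in (seats[0], seats[-1]))
print(count)  # 48
```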
38
Permutations with Restrictions: Problem 4
Abe, Bill, Cat, Demi and Ed go to a movie. Abe cannot sit next to Bill. How many seating arrangements are possible?
* In this case, it is easiest to count the undesired outcomes and subtract them from the total w/out restrictions.
* Total w/out restrictions − undesired outcomes
1. Calculate the # of choices when there are no restrictions: 5 x 4 x 3 x 2 x 1 = 120
2. Calculate the # of choices when Abe+Bill sit together (Problem 2): 4 x 3 x 2 x 1 x 2 = 48
3. Subtract to get the # of choices where Abe does not sit next to Bill: 120 − 48 = 72
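A brute-force check of the subtraction method, assuming a single row of 5 seats.

```python
from itertools import permutations

people = ["Abe", "Bill", "Cat", "Demi", "Ed"]

total = 0
not_adjacent = 0
for seats in permutations(people):
    total += 1
    if abs(seats.index("Abe") - seats.index("Bill")) != 1:
        not_adjacent += 1

print(total, total - not_adjacent, not_adjacent)  # 120 48 72
```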
39
Permutations with Repeats
Example 1: How many distinct ways can the letters of the word BANANA be arranged?
Example 2: How many distinct ways can the letters of TEXTBOOK be arranged if all vowels must be together at the front?
Total # of elements! ÷ (# of repeats of each repeated element)!, e.g. n! ÷ (a! x b! x …)
We divide by the repeats b/c swapping identical letters does not create a new arrangement.
BANANA: 6 letters, with 3 As and 2 Ns→6! ÷ (3! x 2!) = 720 ÷ 12 = 60
TEXTBOOK:
1. Separate vowels (E,O,O) and consonants (T,X,T,B,K)
2. Calculate the arrangements for each group: Vowels: 3! ÷ (1! x 2!) = 3, Consonants: 5! ÷ (2! x 1! x 1! x 1!) = 60
3. Multiply both to get the total: 3 x 60 = 180
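A brute-force check in Python: `set(permutations(...))` keeps only distinct letter orders, so it counts arrangements with repeats directly. The TEXTBOOK check keeps orders whose first three letters are all vowels.

```python
import math
from itertools import permutations

banana = set(permutations("BANANA"))
print(len(banana), math.factorial(6) // (math.factorial(3) * math.factorial(2)))  # 60 60

vowels = set("AEIOU")
textbook = {p for p in permutations("TEXTBOOK") if set(p[:3]) <= vowels}
print(len(textbook))  # 180
```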
40
Using Permutations in a graph/route
Example: Imagine there is a small car located at the origin of the xy-coordinate plane. The car can move in one-unit increments. How many distinct shortest routes can be taken by this car to the point (5,4)?
We can use the permutations-with-repeats formula to count routes.
1. Map out the car's route using letters: U for an "up" move and R for a "right" one. Any shortest route moves UP four times and RIGHT five times, so every route is an arrangement of the word UUUURRRRR.
2. Use the permutations-with-repeats formula to count the ways the letters (routes) can be arranged: 9! ÷ (4! x 5!) = 126
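A sketch that cross-checks the formula by counting routes on the grid directly: a shortest route reaches each point only from the point to its left or the point below it. `count_routes` is an illustrative helper name.

```python
import math


def count_routes(x, y):
    """Number of shortest up/right routes from (0, 0) to (x, y)."""
    # ways[i][j] = routes to (i, j); points on the axes can be reached only one way
    ways = [[1] * (y + 1) for _ in range(x + 1)]
    for i in range(1, x + 1):
        for j in range(1, y + 1):
            ways[i][j] = ways[i - 1][j] + ways[i][j - 1]  # last move was R or U
    return ways[x][y]


formula = math.factorial(9) // (math.factorial(4) * math.factorial(5))
print(count_routes(5, 4), formula)  # 126 126
```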
41
Permutations in a Circle
Example: We have three people, Abe, Bob, and Cob, sitting in a circle. How many distinct arrangements are there?
(n − 1)!
n = # of people or elements around the circle
At a circular table, rotating everyone by one seat gives the same arrangement, so each line-up is repeated n times (once per rotation). Dividing n! by n removes the repeats: n! ÷ n = (n − 1)!
Solution: (3 − 1)! = 2! = 2
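A brute-force sketch that treats two seatings as identical when one is a rotation of the other (mirror images still count as different, matching (n − 1)!). `circular_arrangements` is an illustrative name.

```python
from itertools import permutations


def circular_arrangements(people):
    """Count seatings around a circle, treating rotations as the same."""
    seen = set()
    for p in permutations(people):
        i = p.index(people[0])   # rotate so the first person is always in seat 0
        seen.add(p[i:] + p[:i])
    return len(seen)


print(circular_arrangements(["Abe", "Bob", "Cob"]))  # 2 = (3 - 1)!
print(circular_arrangements(["A", "B", "C", "D"]))   # 6 = (4 - 1)!
```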
42
Combinations
Example 1: A boss has to select three people for a business trip. Out of 6 people, how many different combinations are there?
Example 2: A professor must select three juniors and two seniors out of 5 juniors and 6 seniors. How many different combinations are there?
n! ÷ [r! x (n − r)!]
* n: total number of elements
* r: the size of each grouping
Solution:
1. 6! ÷ [3! x (6 − 3)!]→6! ÷ (3! x 3!) = 20
2. Separate juniors and seniors and calculate for each. Juniors: 5! ÷ [3! x (5 − 3)!] = 10, Seniors: 6! ÷ [2! x (6 − 2)!] = 15. Multiply both combinations to get the total: 10 x 15 = 150
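Both examples with Python's `math.comb` (Python 3.8+).

```python
import math

one_group = math.comb(6, 3)                      # 3 travelers out of 6
two_groups = math.comb(5, 3) * math.comb(6, 2)   # juniors x seniors
print(one_group, two_groups)  # 20 150
```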
43
The Combinations Pattern
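Listing C(n, r) for r = 0, 1, …, n (a row of Pascal's triangle) gives values that rise to a peak in the middle and then fall symmetrically.
* If the total number of elements (n) is odd, there are two equal peaks in the middle (two medians): n = 5→1, 5, 10, 10, 5, 1
* If the total number of elements (n) is even, there is one tallest peak in the middle (one median): n = 4→1, 4, 6, 4, 1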
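Printing a row of C(n, r) values shows the pattern for one even and one odd n; `pascal_row` is an illustrative helper name.

```python
import math


def pascal_row(n):
    """C(n, r) for r = 0, 1, ..., n."""
    return [math.comb(n, r) for r in range(n + 1)]


print(pascal_row(4))  # [1, 4, 6, 4, 1]       -> even n: one peak
print(pascal_row(5))  # [1, 5, 10, 10, 5, 1]  -> odd n: two equal peaks
```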
44
Yes or No
Example: An ice cream store has 8 toppings. Customers can have as many or as few toppings as they want. How many different combinations are possible?
The "Choice" method can be used to count combinations in Yes-or-No problems: each item is either chosen (Yes) or not (No), so each hash mark gets 2 choices.
Solution:
1. Write down hash marks for toppings: 8
2. Input 2 in each hash mark
3. Multiply all together: 2⁸ = 256 (this includes the option of no toppings at all)
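A quick check that the choice method (2 choices per topping) matches adding up the combinations of 0, 1, …, 8 toppings.

```python
import math

toppings = 8

by_choice = 2 ** toppings  # yes/no for each topping
by_combinations = sum(math.comb(toppings, k) for k in range(toppings + 1))
print(by_choice, by_combinations)  # 256 256
```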
45
Probability
* Probability is a number from 0 to 1. The probabilities of all possible outcomes add up to 1.
* A value of 0 indicates that something will never happen.
* A value of 1 indicates that it is guaranteed to happen.
* Probability can be presented as a decimal, a fraction, or a percentage: 0.25 = ¼ = 25%
* To calculate the probability of an event occurring (when all cases are equally likely), create a fraction with the number of desired cases in the numerator and the total number of cases in the denominator.
* To calculate the probability of an event not happening, calculate the probability of it happening and subtract it from 1.
* The probability of drawing a particular color does not depend on **when** (on which draw) you pull the marble, even when you don't replace it, as long as you don't know the results of the earlier draws.
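The last point can be checked by brute force. The sketch assumes a hypothetical bag of 3 red and 2 blue marbles drawn one at a time without replacement; every draw order is equally likely, so we can simply count.

```python
from fractions import Fraction
from itertools import permutations

bag = ["R1", "R2", "R3", "B1", "B2"]  # hypothetical bag: 3 red, 2 blue marbles
orders = list(permutations(bag))      # every equally likely draw order, no replacement


def p_red_at(position):
    """Probability that the marble drawn at this position (0 = first) is red."""
    hits = sum(1 for order in orders if order[position].startswith("R"))
    return Fraction(hits, len(orders))


print(p_red_at(0), p_red_at(1))  # 3/5 3/5 -> same for the 1st and 2nd draw
print(1 - p_red_at(0))           # 2/5: probability the first marble is NOT red
```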
46
Independent Events
* Events that DO NOT influence each other. For example, one coin flip does not influence the next.
* Can happen simultaneously
* In a Venn diagram, the circles overlap, and the overlap equals P(A) x P(B).
47
Mutually Exclusive Events
* Two events that CANNOT happen simultaneously. For example, a ball cannot be "red" and "not red." It's either one or the other.
* An event and its opposite (occurring / not occurring) are mutually exclusive, and their probabilities add up to exactly 1.
* For any two mutually exclusive events, the probabilities cannot add up to >1, b/c that would mean there is an overlap.
* In a Venn diagram, the circles are separate and there is no overlap
48
Probability of A AND B (Independent Events/Mutually Exclusive Events)
**Independent Events:** If we have two independent events, A and B, what is the probability they both occur?
Probability of both A and B = A x B
But remember, this ONLY works for independent events.
**Mutually Exclusive Events:** Since there is no overlap, the probability of both is zero.
49
Probability of A OR B (Independent Events/Mutually Exclusive Events)
**Independent Events:**
A or B = A + B − AB
We subtract A x B to remove the double count of the probability of them BOTH happening (overlap).
What is the probability that either event occurs but not both?
A or B & not both = A + B − (2 x AB)
We subtract A x B twice to completely remove the probability of them BOTH happening (overlap).
**Mutually Exclusive Events:**
A or B = A + B
There is no need to subtract AB b/c they don't overlap.
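A brute-force sketch of the AND/OR formulas for two independent events, assuming a hypothetical experiment: roll a fair die (A = rolling a 6) and flip a fair coin (B = heads).

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), "HT"))  # 12 equally likely (die, coin) outcomes


def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def a(o):
    return o[0] == 6    # die shows 6, P = 1/6


def b(o):
    return o[1] == "H"  # coin shows heads, P = 1/2


pA, pB = prob(a), prob(b)
both = prob(lambda o: a(o) and b(o))
either = prob(lambda o: a(o) or b(o))
exactly_one = prob(lambda o: a(o) != b(o))

print(both == pA * pB)                       # True
print(either == pA + pB - pA * pB)           # True
print(exactly_one == pA + pB - 2 * pA * pB)  # True
```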
50
The Extremes of Probability
If a problem doesn't tell us whether two events are independent or mutually exclusive, we have to consider The Extremes of Probability.
* Independent Events: A and B = A x B, A or B = A + B − AB
* Mutually Exclusive: A and B = 0, A or B = A + B
* Complete Overlap (one event can only happen if the other does; in a Venn diagram, one circle is inside the other): if B is the smaller circle inside A, A and B = B, A or B = A
51
Probability and Combinatorics Combined
Example 1: If someone flips a fair coin five times, what is the probability he or she gets heads exactly three times?
Example 2: The chance of rain on any given day is 0.4. What is the probability that it rains exactly twice in a 4-day period?
1. Calculate the probability of one "successful" case (using the "choice" method)
2. Calculate the number of ways to arrange that "successful" case (using combinatorics)
3. Multiply both numbers.
Example 1:
* Probability of Heads: ½. One successful case, H x H x H x T x T: ½ x ½ x ½ x ½ x ½ = ¹⁄₃₂
* Number of ways to arrange "HHHTT": 5! ÷ (3! x 2!) = 10
* Multiply both numbers: ¹⁄₃₂ x 10 = ¹⁰⁄₃₂ = ⁵⁄₁₆
Example 2:
1. Use R for "rain" and N for "no rain." The probability of R = 0.4, so the probability of N = 1 − 0.4 = 0.6
2. Successful case: 0.4 x 0.4 x 0.6 x 0.6 = 0.0576
3. Ways to arrange the successful case "RRNN": 4! ÷ (2! x 2!) = 6
4. Multiply both: 0.0576 x 6 = 0.3456
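Both examples as a single Python function, using `Fraction` for exact answers; `prob_exactly` is an illustrative name for this "one case × number of arrangements" calculation.

```python
from fractions import Fraction
from math import comb


def prob_exactly(k, n, p):
    """P(exactly k successes in n independent tries, each with probability p)."""
    # (ways to arrange k successes) x (probability of one such arrangement)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(prob_exactly(3, 5, Fraction(1, 2)))  # 5/16    (= 10/32)
print(prob_exactly(2, 4, Fraction(2, 5)))  # 216/625 (= 0.3456)
```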
52
AT LEAST / AT MOST Questions
* John flips a fair coin four times. What is the probability that he gets **at least** one heads?
* The probability that John wears a green shirt on any given day is ⅕. What is the probability that he wears a green shirt **at most** 4 days in a 6-day period?
At least/At most probability = 1 − Cases we don't want
**AT LEAST:**
1. Calculate the case we don't want: All Tails TTTT = ½ x ½ x ½ x ½ = ¹⁄₁₆
2. Subtract from 1: 1 − ¹⁄₁₆ = ¹⁵⁄₁₆
**AT MOST:**
1. Calculate the cases we don't want: Green shirt 5 days and 6 days
* 5 days: ⅕ x ⅕ x ⅕ x ⅕ x ⅕ x ⅘ = ⁴⁄₁₅₆₂₅, Number of ways: 6! ÷ (5! x 1!) = 6, ⁴⁄₁₅₆₂₅ x 6 = ²⁴⁄₁₅₆₂₅
* 6 days: ⅕ x ⅕ x ⅕ x ⅕ x ⅕ x ⅕ = ¹⁄₁₅₆₂₅, Number of ways: 6! ÷ 6! = 1, ¹⁄₁₅₆₂₅ x 1 = ¹⁄₁₅₆₂₅
2. Add both cases: ²⁴⁄₁₅₆₂₅ + ¹⁄₁₅₆₂₅ = ²⁵⁄₁₅₆₂₅
3. Subtract from 1: 1 − ²⁵⁄₁₅₆₂₅ = ¹⁵⁶⁰⁰⁄₁₅₆₂₅ = ⁶²⁴⁄₆₂₅
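The same complement approach in Python; `prob_exactly` is an illustrative helper computing the probability of exactly k successes in n tries.

```python
from fractions import Fraction
from math import comb


def prob_exactly(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# At least one head in 4 flips = 1 - P(0 heads)
at_least_one_head = 1 - prob_exactly(0, 4, Fraction(1, 2))

# At most 4 green-shirt days out of 6 = 1 - P(5 days) - P(6 days)
at_most_4_green = 1 - sum(prob_exactly(k, 6, Fraction(1, 5)) for k in (5, 6))

print(at_least_one_head)  # 15/16
print(at_most_4_green)    # 624/625
```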
53
Given Probability
A couple has four children. If it is assumed or "given" that at least two of the children are boys, what is the probability that all four are boys?
For some problems, we have to assume that some cases are given, so any case that contradicts the given information must be excluded.
* 0 boys: GGGG→4! ÷ 4! = 1 way→exclude 1 case
* 1 boy: BGGG→4! ÷ (1! x 3!) = 4 ways→exclude 4 cases
* 2 boys: BBGG→4! ÷ (2! x 2!) = 6
* 3 boys: BBBG→4! ÷ (3! x 1!) = 4
* 4 boys: BBBB→4! ÷ 4! = 1
Total cases = 1 + 4 + 6 + 4 + 1 = 16
In total, we have 16 cases, but 5 of them are invalid and must be excluded. So we're only dealing with 11 valid cases (16 − 5); that is the denominator of our probability calculation.
How many of those valid cases have all four boys? Only 1. Therefore, the probability that all four children are boys = ¹⁄₁₁
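A brute-force sketch that lists all 16 equally likely boy/girl orders, keeps only those consistent with the given information, and counts.

```python
from fractions import Fraction
from itertools import product

families = list(product("BG", repeat=4))             # 16 equally likely birth orders
given = [f for f in families if f.count("B") >= 2]  # 11 valid cases
all_boys = [f for f in given if f.count("B") == 4]  # 1 case

print(len(families), len(given), Fraction(len(all_boys), len(given)))  # 16 11 1/11
```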
54
Expected Value
Example: If you correctly guess a randomly chosen positive integer from 1 to 100 inclusive, you will win $25,000. However, to play this game you must pay $300. Is the game worth playing?
Each outcome has a certain probability and value; if we multiply each value by its probability and add up all the outcomes, we get the expected value, or the weighted average.
* Question: If you roll a fair, 6-sided die enough times, what is the average value you can expect?
* The Expected Value of a Roll of the Die: ⅙ (1) + ⅙ (2) + ⅙ (3) + ⅙ (4) + ⅙ (5) + ⅙ (6) = 3.5
So on average, we can "expect" a value of 3.5 for each roll of a fair die.
Solution: Correctly guessing a randomly chosen integer from 1 to 100 has a probability of ¹⁄₁₀₀ or 0.01, so we can expect to win this game about 1% of the time. The expected winnings are this probability multiplied by the prize money: 0.01 × $25,000 = $250
We can expect to win $250 on average every time we play, but we have to pay $300 to play once, so this is NOT a good deal. The TRUE expected value of playing this game is $250 − $300 = −$50: on average, it COSTS you $50 to play the game.
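Both expected values computed exactly with `Fraction`; the game's value subtracts the $300 cost from the expected winnings.

```python
from fractions import Fraction

# Fair die: each face 1..6 has probability 1/6
die_ev = sum(Fraction(1, 6) * face for face in range(1, 7))
print(die_ev)   # 7/2, i.e. 3.5

# Guessing game: win $25,000 with probability 1/100, pay $300 to play
game_ev = Fraction(1, 100) * 25000 - 300
print(game_ev)  # -50
```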
55
Relative Frequency
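The fraction (or percentage) of all observations that take a particular value: # of times the value appears ÷ total # of values. It shows how common/uncommon a value is relative to the other values in the same dataset.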
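A minimal sketch using `collections.Counter`; the data are hypothetical, chosen to match the families example in the histogram card.

```python
from collections import Counter

children = [0, 1, 1, 1, 2]  # hypothetical survey: children per family

counts = Counter(children)
rel_freq = {value: count / len(children) for value, count in sorted(counts.items())}
print(rel_freq)  # {0: 0.2, 1: 0.6, 2: 0.2}
```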
56
Histograms
* A bar graph used to show frequency distributions.
* Frequency distribution describes how often each different value occurs in the dataset.
* A histogram's total area sums to 100% when the vertical axis represents relative frequency.
* Symmetric histogram: mean = median
* Histogram with a tail to the left: mean < median (the mean is pulled toward the tail)
* A flatter (more spread out) histogram has a higher SD
e.g. Families w/ 0 children make up 20% of the data, families w/ 1 child make up 60%, and families w/ 2 children make up 20%
57
Probability Distribution Table
* Tables can be used to display relative frequencies.
* The entries in the first row are the values of the variable, and the entries in the second row are their corresponding probabilities.
* All probabilities must add up to 1.
58
Random Variable Mean
Example: Rolling a fair 6-sided die
* A random variable = something that gives random numbers.
* The mean = the average number you'd expect if you repeated it many times, weighted by how likely each result is. Also known as expected value/weighted average.
Calculating random variable mean: (value1 x probability1) + (value2 x probability2) + (value3 x probability3) ...
Solution: Possible values: 1, 2, 3, 4, 5, 6, each with a probability of ⅙
To find the mean: 1(⅙) + 2(⅙) + 3(⅙) + 4(⅙) + 5(⅙) + 6(⅙) = ⅙ (1+2+3+4+5+6) = ²¹⁄₆ = 3.5
So the average result is 3.5, even though you'll never actually roll a 3.5!
59
Uniform Distribution
The probability is distributed evenly across all possible outcomes. e.g. a fair die (⅙, ⅙, ⅙, ⅙, ⅙, ⅙)
60
Skewness
* Data being biased toward one direction (the right or the left): one tail of the distribution is longer than the other.
* If the distribution has a longer tail to the left, we say it is "left-skewed."
* If the tail is longer to the right, we say it is "right-skewed."
61
Mean versus Median (Skewness)
* Positively skewed distribution (tail to the right): Mean > Median
* Negatively skewed distribution (tail to the left): Mean < Median
* Symmetrical distribution: Mean = Median
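A quick illustration with two small made-up datasets, one with a long left tail and one with a long right tail.

```python
from statistics import mean, median

left_tail = [1, 8, 9, 9, 10]   # hypothetical data, long tail to the left
right_tail = [1, 2, 2, 3, 10]  # hypothetical data, long tail to the right

print(mean(left_tail), median(left_tail))    # 7.4 9 -> mean < median
print(mean(right_tail), median(right_tail))  # 3.6 2 -> mean > median
```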
62
Normal Distribution
* Normal Distribution: A bell-shaped, symmetrical distribution
* Most of the data is "bunched up" toward the middle. As we move either to the left or the right, the datapoints get rarer and rarer
* The curve extends infinitely in both directions.
* The area under the curve is always = 1.
* Mean = Median = Mode
* A vertical line through the highest point of the curve meets the number line below at the mean (which is also the median and mode).
63
Normal Distribution Graph
* Between the mean and 1 SD above (or below) the mean: ≈ 34% of the area
* Between 1 SD and 2 SD above (or below) the mean: ≈ 14% of the area
* Between 2 SD and 3 SD above (or below) the mean: ≈ 2% of the area
* So ≈ 68% of values lie within 1 SD of the mean, ≈ 95% within 2 SD, and ≈ 99.7% within 3 SD
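These percentages can be checked with Python's `statistics.NormalDist` (Python 3.8+), whose `cdf` gives the area under the curve to the left of a point.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Area between k and k + 1 SDs above the mean, for k = 0, 1, 2
bands = [z.cdf(k + 1) - z.cdf(k) for k in range(3)]
print([round(b, 3) for b in bands])  # [0.341, 0.136, 0.021]

print(round(z.cdf(1) - z.cdf(-1), 3))  # 0.683 -> within 1 SD
```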
64
Effects on Normal Distribution
* Add or subtract something to/from the mean: Add (graph moves to the right), Subtract (graph moves to the left)
* Add or subtract something to/from the standard deviation: Add (graph becomes wider/shorter), Subtract (graph becomes narrower/taller)