Ch.3 Flashcards

(27 cards)

1
Q

Mean of a data set

A

The mean of a data set is the sum of the observations divided by the number of observations.

2
Q

Measures of central tendency or measures of center

A

Descriptive measures that indicate where the center, or most typical value, of a data set lies are called measures of central tendency or measures of center.

  • Measures of center are often called averages.

Three important measures of center: mean, median, and mode.

  • The mean and median apply only to quantitative data.
  • The mode applies to either quantitative or qualitative data.
3
Q

Median of a data set

A

Essentially, it is the number that divides the bottom 50% of the data from the top 50%.

Arrange the data in increasing order.

• If the number of observations is odd, then the median is the observation exactly in the middle of the ordered list.

• If the number of observations is even, then the median is the mean of the two middle observations in the ordered list.
In both cases, if we let n denote the number of observations, then the median is at position (n + 1)/2 in the ordered list. When n is even, this position falls halfway between the two middle observations.
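
The odd and even cases above can be sketched in a few lines of Python. This is only an illustration: the function name `median` and the sample values are made up for this card, not taken from the textbook.

```python
def median(data):
    """Median by the ordered-list rule described on this card."""
    ordered = sorted(data)          # arrange the data in increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the observation exactly in the middle
        return ordered[mid]
    # even n: the mean of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([7, 1, 5])       # ordered list: 1, 5, 7
even_result = median([8, 2, 6, 4])   # ordered list: 2, 4, 6, 8
```

Here `odd_result` is the middle value 5, and `even_result` is the mean of 4 and 6.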

4
Q

Mode of a data set

A

Find the frequency of each value in the data set.

• If no value occurs more than once, then the data set has no mode.
• Otherwise, any value that occurs with the greatest frequency is a mode of the data set.
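
As a rough sketch of the rule above, the following Python function returns every mode of a data set, or an empty list when no value occurs more than once. The name `modes` and the example data are illustrative assumptions.

```python
from collections import Counter


def modes(data):
    """Return all modes of the data set, or [] if no value repeats."""
    counts = Counter(data)                 # frequency of each value
    if not counts:
        return []
    greatest = max(counts.values())
    if greatest == 1:
        return []                          # no value occurs more than once
    # every value that occurs with the greatest frequency is a mode
    return [value for value, count in counts.items() if count == greatest]


no_mode = modes([1, 2, 3])
two_modes = modes([2, 3, 3, 5, 5])
qualitative = modes(["red", "blue", "red"])
```

The last call shows that the same rule works for qualitative data, where a mean or median would not make sense.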

5
Q

Resistant measure and trimmed mean

A

A resistant measure is not sensitive to the influence of a few extreme observations.

The median is a resistant measure of center, but the mean is not.

A trimmed mean can improve the resistance of the mean: removing a percentage of the smallest and largest observations before computing the mean gives a trimmed mean.
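
A small Python sketch of a trimmed mean follows. It assumes the stated percentage is removed from each end of the ordered data and that the number of removed observations is rounded down; the card does not specify either detail, and conventions may differ. The function name and data are hypothetical.

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent` percent of the observations from EACH end.

    Assumption: the count dropped per end is rounded down to a whole number.
    """
    ordered = sorted(data)
    n = len(ordered)
    k = int(n * percent / 100)      # observations removed from each end
    kept = ordered[k:n - k]
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)


values = [2, 3, 4, 5, 6, 7, 8, 9, 10, 96]    # 96 is an extreme observation
plain_mean = sum(values) / len(values)       # pulled upward by 96
trimmed = trimmed_mean(values, 10)           # drops 2 and 96
```

With this made-up data the ordinary mean is 15.0, while the 10% trimmed mean is 6.5, which matches the median and is much closer to the typical values.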

6
Q

Summation notation

A

In statistics, as in algebra, letters such as x, y, and z are used to denote variables.

We can often use notation for variables, along with other mathematical notations, to express statistics definitions and formulas concisely.

  • Of particular importance, in this regard, is summation notation.
7
Q

The sample mean

A

The values of a variable for a sample from a population are called sample data.

For a variable x, the mean of the observations for a sample is called a sample mean and is denoted x̄ (an x with a bar on top), read as "x bar".

8
Q

Measures of variation or measures of spread

A

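Measures of variation, or measures of spread, describe quantitatively how much the observations differ from one another; they indicate the amount of variation, or spread, in a data set.

Two frequently used measures of variation: the range and the sample standard deviation.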

9
Q

Range of a data set

A

The range of a data set is given by the formula
Range = Max - Min,

where Max and Min denote the maximum and minimum observations, respectively.

The range takes into account only the largest and smallest observations. For that reason, two other measures of variation are generally favored over it: the standard deviation and the interquartile range.

10
Q

The sample standard deviation

A

Takes into account all the observations.
It is the preferred measure of variation when the mean is used as the measure of center.

  • Roughly speaking, standard deviation measures variation by indicating how far, on average, the observations are from the mean.
  • For a data set with a large amount of variation, the observations will, on average, be far from the mean; so the standard deviation will be large.
  • For a data set with a small amount of variation, the observations will, on average, be close to the mean; so the standard deviation will be small.
  • The formulas for the standard deviations of sample data and population data differ slightly.
11
Q

Steps to computing standard deviation

A
  • The first step in computing a sample standard deviation is to find the deviations from the mean, that is, how far each observation is from the mean.
  • The second step in computing a sample standard deviation is to obtain a measure of the total deviation from the mean for all the observations.
  • The deviations from the mean always sum to zero, so to obtain quantities that do not sum to zero, we square the deviations from the mean.
  • The sum of the squared deviations from the mean is called the sum of squared deviations and gives a measure of total deviation from the mean for all the observations.
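
The steps above can be sketched in Python as follows. The card stops at the sum of squared deviations; the final step shown here (dividing by n - 1 and taking the square root) is the standard definition of the sample standard deviation rather than something stated on this card. The function name and data are illustrative.

```python
import math


def sample_standard_deviation(data):
    """Sample standard deviation, following the steps on this card."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(data) / n
    # Step 1: deviations from the mean
    deviations = [x - mean for x in data]
    # Step 2: square the deviations and add them up
    sum_of_squares = sum(d * d for d in deviations)
    # Final step (standard definition, not on the card):
    # divide by n - 1 and take the square root
    return math.sqrt(sum_of_squares / (n - 1))


observations = [72, 73, 76, 76, 78]             # made-up data, mean 75
raw_deviation_total = sum(x - 75 for x in observations)
s = sample_standard_deviation(observations)
```

For this data the deviations are -3, -2, 1, 1, 3, which sum to zero; their squares sum to 24, so the sample standard deviation is the square root of 24/4 = 6, roughly 2.45.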
12
Q

Variation and the standard deviation

A

The more variation there is in a data set, the larger its standard deviation.

13
Q

Three-standard-deviations rule

A

Almost all the observations in any data set lie within three standard deviations to either side of the mean.

  • A data set with a great deal of variation has a large standard deviation, so three standard deviations to either side of its mean will cover a wide interval.
  • A data set with little variation has a small standard deviation, so three standard deviations to either side of its mean will cover a narrow interval.
14
Q

Quartiles

A

Certain percentiles are particularly important:

• the 10th, 20th, ..., 90th percentiles are called the deciles and divide a data set into tenths (10 equal parts);
• the 20th, 40th, 60th, and 80th percentiles are called the quintiles and divide a data set into fifths (five equal parts).

• The most commonly used percentiles other than the median are the quartiles, which are the 25th, 50th, and 75th percentiles, and divide a data set into quarters (four equal parts).
  Because of their importance, we use a special notation for the three quartiles, namely, Q1, Q2, and Q3.

Hence, roughly speaking,

• the first quartile, Q1, is the number that divides the bottom 25% of the data from the top 75%;

• the second quartile, Q2, is the median, which, as you know, is the number that divides the bottom 50% of the data from the top 50%; and

• the third quartile, Q3, is the number that divides the bottom 75% of the data from the top 25%.

15
Q

Simpler explanation of Quartiles

A
  • The quartiles divide a data set into quarters (four equal parts).
  • Quartiles are used to define the interquartile range, which is a resistant measure of variation.
  • First, arrange the data in increasing order. Next, determine the median. Then, divide the (ordered) data set into two halves, a bottom half and a top half; if the number of observations is odd, include the median in both halves.
  • The first quartile (Q1) is the median of the bottom half of the data set.
  • The second quartile (Q2) is the median of the entire data set.
  • The third quartile (Q3) is the median of the top half of the data set.
16
Q

To determine the Quartiles

A

Step 1 Arrange the data in increasing order.

Step 2 Find the median of the entire data set. This value is the second quartile, Q2.

Step 3 Divide the ordered data set into two halves, a bottom half and a top half; if the number of observations is odd, include the median in both halves.

Step 4 Find the median of the bottom half of the data set. This value is the first quartile, Q1.

Step 5 Find the median of the top half of the data set. This value is the third quartile, Q3.

Step 6 Summarize the results.
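
The procedure above can be sketched in Python as shown here. Note that it follows this card's convention of including the median in both halves when the number of observations is odd; statistical software often uses other quartile conventions and may give slightly different values. The function names and data are illustrative.

```python
def median_of_ordered(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using the steps on this card."""
    if not data:
        raise ValueError("quartiles of an empty data set are undefined")
    ordered = sorted(data)                    # Step 1
    n = len(ordered)
    q2 = median_of_ordered(ordered)           # Step 2
    half = (n + 1) // 2                       # Step 3: for odd n, each half
    bottom = ordered[:half]                   #   includes the median
    top = ordered[n - half:]
    q1 = median_of_ordered(bottom)            # Step 4
    q3 = median_of_ordered(top)               # Step 5
    return q1, q2, q3                         # Step 6


odd_case = quartiles([1, 2, 3, 4, 5, 6, 7])       # halves: 1-4 and 4-7
even_case = quartiles([1, 2, 3, 4, 5, 6, 7, 8])   # halves: 1-4 and 5-8
```

In the odd case the median 4 appears in both halves, giving Q1 = 2.5 and Q3 = 5.5.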

17
Q

The interquartile range

A

The interquartile range, or IQR, is the difference between the first and third quartiles; that is, IQR = Q3 - Q1.

18
Q

The 5 number summary

A

The 5 number summary of a data set is Min, Q1, Q2, Q3, Max.

  • the minimum, maximum, and quartiles together provide, among other things, information on center and variation.
19
Q

Outliers

A

In data analysis, the identification of outliers (observations that fall well outside the overall pattern of the data) is important.

An outlier requires special attention. It may be the result of a measurement or recording error, an observation from a different population, or an unusual extreme observation.

  • Note that an extreme observation need not be an outlier; it may instead be an indication of skewness.
20
Q

Lower and upper limits (or fence)

A

The lower limit and upper limit of a data set are
Lower limit = Q1 - 1.5 * IQR
Upper limit = Q3 + 1.5 * IQR

  • Observations that lie below the lower limit or above the upper limit are potential outliers.

  • To determine whether a potential outlier is truly an outlier, you should perform further data analyses by constructing a histogram, stem-and-leaf diagram, etc.
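
As a minimal sketch, the two limit formulas and the potential-outlier check can be written in Python like this. The quartile values passed in are assumed to have been computed already; the function names and the data are hypothetical.

```python
def limits(q1, q3):
    """Lower and upper limits (fences) from the first and third quartiles."""
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return lower_limit, upper_limit


def potential_outliers(data, q1, q3):
    """Observations below the lower limit or above the upper limit."""
    lower_limit, upper_limit = limits(q1, q3)
    return [x for x in data if x < lower_limit or x > upper_limit]


data = [1, 10, 11, 12, 13, 14, 15, 30]
# Assumed quartiles for this data: Q1 = 10.5 (median of 1, 10, 11, 12)
# and Q3 = 14.5 (median of 13, 14, 15, 30)
fences = limits(10.5, 14.5)
flagged = potential_outliers(data, 10.5, 14.5)
```

With an IQR of 4, the limits are 4.5 and 20.5, so 1 and 30 are flagged. They are only potential outliers; the further analysis mentioned above is still needed.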
21
Q

Box plots

A

A boxplot, also called a box-and-whisker diagram, is based on the five-number summary and can be used to provide a graphical display of the center and variation of a data set.

  • Boxplots are especially suited for comparing two or more data sets. In doing so, the same scale should be used for all boxplots.

To construct a boxplot, we also need the concept of adjacent values. The adjacent values of a data set are the most extreme observations that still lie within the lower and upper limits; they are the most extreme observations that are not potential outliers.

  • Note that, if a data set has no potential outliers, the adjacent values are just the minimum and maximum observations.
  • The two lines emanating from the box are called whiskers.
  • Boxplots are frequently drawn vertically instead of horizontally.
  • Symbols other than an asterisk are often used to plot potential outliers.
22
Q

To construct a box plot

A

Step 1 Determine the quartiles.

Step 2 Determine potential outliers and the adjacent values.

Step 3 Draw a horizontal axis on which the numbers obtained in Steps 1 and 2 can be located. Above this axis, mark the quartiles and the adjacent values with vertical lines.

Step 4 Connect the quartiles to make a box, and then connect the box to the adjacent values with lines.

Step 5 Plot each potential outlier with an asterisk.
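
Step 2 is the part of this procedure that is easiest to get wrong, so here is a small Python sketch of finding the adjacent values, which are where the whiskers end. It assumes the lower and upper limits have already been computed from the quartiles; the function name and data are made up for illustration, and no plotting is attempted.

```python
def adjacent_values(data, lower_limit, upper_limit):
    """Most extreme observations that still lie within the limits."""
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    if not inside:
        raise ValueError("no observations lie within the limits")
    return min(inside), max(inside)


data = [1, 10, 11, 12, 13, 14, 15, 30]
# Assumed limits for this data: 4.5 and 20.5 (from Q1 = 10.5, Q3 = 14.5)
whisker_ends = adjacent_values(data, 4.5, 20.5)

# With limits wide enough that nothing is a potential outlier,
# the adjacent values are just the minimum and maximum.
no_outlier_ends = adjacent_values(data, -100, 100)
```

In the first call the whiskers would run from the box out to 10 and 15, and the observations 1 and 30 would be plotted separately as potential outliers.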

23
Q

Population mean (mean of a variable)

A

To find a sample mean, we first sum the observations of the variable for the sample and then divide by the size of the sample.

We can find the mean of a finite population similarly:

  • first, we sum all possible observations of the variable for the entire population, and
  • then we divide by the size of the population.
  • However, to distinguish the population mean from a sample mean, we use the Greek letter μ (pronounced "mew") to denote the population mean.
  • We also use the uppercase English letter N to represent the size of the population.

Note: For a particular variable on a particular population:

• There is only one population mean, namely, the mean of all possible observations of the variable for the entire population.

• There are many sample means, one for each possible sample of the population.

24
Q

Parameter and statistic

A

Parameter: A descriptive measure for a population

Statistic: A descriptive measure for a sample

25
Standardized variable
A standardized variable always has mean 0 and standard deviation 1. For this and other reasons, standardized variables play an important role in many aspects of statistical theory and practice.
26
Z-score
For an observed value of a variable x, the corresponding value of the standardized variable z is called the z-score of the observation. The term standard score is often used instead of z-score.

  • A negative z-score indicates that the observation is below (less than) the mean.
  • A positive z-score indicates that the observation is above (greater than) the mean.
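
The card does not give the formula, but the usual definition is z = (x - mean) / standard deviation. The Python sketch below applies it to a small made-up population, using the population mean and the population standard deviation (dividing by N); the function name and data are illustrative assumptions.

```python
import math


def z_scores(population):
    """z-score of every observation: (x - mean) / standard deviation."""
    n = len(population)
    mu = sum(population) / n
    # population standard deviation: divide by N, not N - 1
    sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / n)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in population]


values = [2, 4, 4, 4, 5, 5, 7, 9]        # mean 5, standard deviation 2
z = z_scores(values)

# The standardized values themselves have mean 0 and standard deviation 1.
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))
```

Observations below the mean of 5 get negative z-scores, those above it get positive ones, and the two observations equal to the mean get a z-score of 0.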
27
The z-Score as a Measure of Relative Standing
• The three-standard-deviations rule states that almost all the observations in any data set lie within three standard deviations to either side of the mean. Thus, for any variable, almost all possible observations have z-scores between -3 and 3.
• The z-score of an observation, therefore, can be used as a rough measure of its relative standing among all the observations comprising a data set.
  - For instance, a z-score of 3 or more indicates that the observation is larger than most of the other observations; a z-score of -3 or less indicates that the observation is smaller than most of the other observations; and a z-score near 0 indicates that the observation is located near the mean.
• The use of z-scores as a measure of relative standing can be refined and made more precise by applying Chebyshev's rule.
• Moreover, if the distribution of the variable under consideration is roughly bell shaped, then the use of z-scores as a measure of relative standing can be improved even further.
• Percentiles usually give a more exact method of measuring relative standing than do z-scores. However, if only the mean and standard deviation of a variable are known, z-scores provide a feasible alternative to percentiles for measuring relative standing.