Ch.2 Flashcards

(42 cards)

1
Q

quantitative variables

A

(numerical): variables that yield numerical information.

  • Classified as either discrete or continuous.
2
Q

qualitative variables

A

(categorical): a variable that does not assume a numerical value but instead is classified into two or more distinct, non-numeric categories.

  • e.g. gender (male, female, other),
    species (chinstrap, gentoo, adelie), smell (stinky, neutral, fresh, Old Spice fresh), etc.
  • When there is ordering among the levels of a categorical variable, the variable is said to be ordinal; when no such ordering exists, the variable is called nominal.
3
Q

Variable

A

A variable is a characteristic or condition that can vary from one individual (observation) to another.

e.g. height, weight, GPA, gender, number of times visited by aliens, etc.

4
Q

discrete variable

A

A variable whose possible values can be listed, even though the list may continue indefinitely.

- Typically involves a count of something.

5
Q

continuous variable

A

A variable whose possible values form some interval of numbers.

- Typically involves a measurement of something.

6
Q

Data

A

Values of a variable.

  • information, measurements, or observations.
  • In order to obtain data, one must first identify variables of interest. The data then
    correspond to particular realizations of said variables.
7
Q

Observation

A

Each individual piece of data is called this.

8
Q

Data set

A

The collection of all observations is called a data set.

9
Q

frequency distribution

A

A frequency distribution of qualitative data is a listing of the distinct values and their frequencies.

  • Frequency distributions are typically displayed in a table.
10
Q

Frequency or count

A

In a qualitative dataset, the number of times a particular, distinct value occurs is referred to as its frequency or count.

11
Q

To Construct a Frequency Distribution of Qualitative Data

A

Step 1 List the distinct values of the observations in the data set in the first column of a table.

Step 2 For each observation, place a tally mark in the second column of the table in the row of the appropriate distinct value.

Step 3 Count the tallies for each distinct value and record the totals in the third column of the table.
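The three steps can be sketched in Python. This is a minimal illustration, not the textbook's procedure verbatim: the helper name `frequency_distribution` and the penguin-species list are hypothetical, and distinct values are listed in order of first appearance (any order works).

```python
def frequency_distribution(observations):
    """Return (distinct value, tally marks, count) rows, one per distinct value."""
    # Step 1: list the distinct values (kept in order of first appearance).
    distinct_values = list(dict.fromkeys(observations))
    # Step 2: place one tally mark per observation in the row of its value.
    tallies = {value: "" for value in distinct_values}
    for obs in observations:
        tallies[obs] += "|"
    # Step 3: count the tallies for each distinct value.
    return [(value, tallies[value], len(tallies[value])) for value in distinct_values]


# Hypothetical data for illustration only.
species = ["gentoo", "adelie", "gentoo", "chinstrap", "adelie", "gentoo"]
for value, tally, count in frequency_distribution(species):
    print(f"{value:<10} {tally:<6} {count}")
```

In practice, `collections.Counter(species)` from the Python standard library produces the same counts directly; the tally column is kept here only to mirror Step 2.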

12
Q

Relative-Frequency Distribution of Qualitative Data (or proportions)

A

A relative-frequency distribution of qualitative data is a listing of the distinct values and their relative frequencies.

  • provides a table of the values of the observations and (relatively) how often they occur.

- To obtain one, first find a frequency distribution and then divide each frequency by the total number of observations.

13
Q

To Construct a Relative-Frequency Distribution of Qualitative Data

A

Step 1 Obtain a frequency distribution of the data.

Step 2 Divide each frequency by the total number of observations.
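A short Python sketch of the two steps, assuming a hypothetical list of eight penguin species and a hypothetical helper name `relative_frequency_distribution`. Because every frequency is divided by the same total n, the relative frequencies always sum to 1 (up to rounding).

```python
from collections import Counter


def relative_frequency_distribution(observations):
    """Map each distinct value to its relative frequency (proportion)."""
    # Step 1: obtain a frequency distribution.
    frequencies = Counter(observations)
    # Step 2: divide each frequency by the total number of observations.
    n = len(observations)
    return {value: count / n for value, count in frequencies.items()}


# Hypothetical data for illustration only.
species = ["gentoo", "adelie", "gentoo", "chinstrap",
           "adelie", "gentoo", "adelie", "gentoo"]
rel = relative_frequency_distribution(species)
for value, proportion in rel.items():
    print(f"{value:<10} {proportion:.3f}")
print("total:", sum(rel.values()))
```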

14
Q

Pie chart

A

A disk divided into wedge-shaped pieces proportional to the relative frequencies of the qualitative data.

  • Look for: values that form large and small proportions of the data set.
15
Q

To construct a pie chart

A

Step 1 Obtain a relative-frequency distribution of the data by applying Procedure 2.2.

Step 2 Divide a disk into wedge-shaped pieces proportional to the relative frequencies.

Step 3 Label the slices with the distinct values and their relative frequencies.
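Step 2 is easiest to carry out by turning each relative frequency into an angle: a full disk is 360 degrees, so a value with relative frequency p gets a wedge of p × 360 degrees. The sketch below assumes the relative-frequency distribution is already available as a dictionary (the values shown are hypothetical), and `wedge_angles` is a made-up helper name.

```python
def wedge_angles(relative_frequencies):
    """Convert relative frequencies into pie-chart wedge angles (degrees)."""
    # A full disk is 360 degrees; each wedge gets its proportional share.
    return {value: p * 360 for value, p in relative_frequencies.items()}


# Hypothetical relative-frequency distribution.
rel = {"gentoo": 0.5, "adelie": 0.375, "chinstrap": 0.125}
angles = wedge_angles(rel)
for value, angle in angles.items():
    # Step 3: label each slice with its distinct value and relative frequency.
    print(f"{value:<10} {rel[value]:.3f} -> {angle:.1f} degrees")
```

A plotting library such as matplotlib (`matplotlib.pyplot.pie`) can draw the wedges itself from the frequencies; the angle computation above is what such a function does under the hood.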

16
Q

Bar chart

A
  • displays the distinct values of the qualitative data on a horizontal axis and the relative frequencies (or frequencies or percents) of those values on a vertical axis.
  • The relative frequency of each distinct value is represented by a vertical bar whose height is equal to the relative frequency of that value.
  • The bars should be positioned so that they do not touch each other.
  • A bar chart is a graphical display for qualitative data. Frequencies, relative frequencies, or percents can be used to label it. Although we primarily use relative frequencies, some of our applications employ frequencies or percents.
  • Look for: frequently and infrequently occurring values.
17
Q

To construct a bar chart

A

Step 1 Obtain a relative-frequency distribution of the data by applying Procedure 2.2.

Step 2 Draw a horizontal axis on which to place the bars and a vertical axis on which to display the relative frequencies.

Step 3 For each distinct value, construct a vertical bar whose height equals the relative frequency of that value.

Step 4 Label the bars with the distinct values, the horizontal axis with the name of the variable, and the vertical axis with “Relative frequency.”
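As a rough plain-text sketch of the four steps (no plotting library assumed), the function below draws one vertical bar of `#` per distinct value, with blank space between bars so they do not touch. The scale `rows_per_unit` (how many text rows represent a relative frequency of 1.0), the helper name `text_bar_chart`, and the data are all hypothetical choices for illustration.

```python
def text_bar_chart(relative_frequencies, variable_name, rows_per_unit=16):
    """Plain-text vertical bar chart with a gap between neighbouring bars."""
    values = list(relative_frequencies)
    # Step 3: bar height proportional to relative frequency
    # (rows_per_unit rows of '#' correspond to a relative frequency of 1.0).
    heights = {v: round(p * rows_per_unit) for v, p in relative_frequencies.items()}
    col_width = max(len(v) for v in values) + 2  # padding keeps bars from touching
    lines = ["Relative frequency"]  # Step 4: vertical-axis label
    for level in range(max(heights.values()), 0, -1):
        cells = [("#" if heights[v] >= level else "").center(col_width) for v in values]
        lines.append("".join(cells).rstrip())
    lines.append("-" * (col_width * len(values)))  # Step 2: horizontal axis
    # Step 4: label each bar with its distinct value, then the axis with the variable name.
    lines.append("".join(v.center(col_width) for v in values).rstrip())
    lines.append(variable_name.center(col_width * len(values)).rstrip())
    return "\n".join(lines)


# Hypothetical relative-frequency distribution.
rel = {"gentoo": 0.5, "adelie": 0.375, "chinstrap": 0.125}
print(text_bar_chart(rel, "Species"))
```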

18
Q

Classes/categories/bins

A

To organize quantitative data, we first group the observations into classes (also known as categories or bins) and then treat the classes as the distinct values of qualitative data.

  • Consequently, once we group the quantitative data into classes, we construct frequency and relative-frequency distributions of the data in exactly the same way as we did for qualitative data.
  • Classes (bins) are non-overlapping intervals of equal width that, collectively, span the
    entire range of the data.
19
Q

Three commonsense and important guidelines for grouping quantitative data into classes are:

A
  1. The number of classes should be small enough to provide an effective summary but large enough to display the relevant characteristics of the data. A rule of thumb is that the number of classes should be between 5 and 20.
  2. Each observation must belong to one, and only one, class. That is, each observation should belong to some class and no observation should belong to more than one class.
  3. Whenever feasible, all classes should have the same width. Roughly speaking, this guideline means that, if possible, all classes should cover the same number of possible values. We’ll make this guideline more precise later in this section.
20
Q

Single-value grouping

A

In single-value grouping, we use the distinct values of the observations as the classes, a method completely analogous to that used for qualitative data.

  • Single-value grouping is particularly suitable for discrete data in which there are only a small number of distinct values.
21
Q

Limit grouping

A

A second way to group quantitative data is to use class limits. With this method, each class consists of a range of values.

  • The smallest value that could go in a class is called the lower limit of the class, and the largest value that could go in the class is called the upper limit of the class.
  • This method of grouping quantitative data is called limit grouping.
  • It is particularly useful when the data are expressed as whole numbers and there are too many distinct values to employ single-value grouping.
22
Q

Terms used in limit grouping

A

Lower class limit: The smallest value that could go in a class.

Upper class limit: The largest value that could go in a class.

Class width: The difference between the lower limit of a class and the lower limit of the next-higher class.

Class mark: The average of the two class limits of a class.
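The terms above can be checked with a small Python sketch of limit grouping for whole-number data. It assumes classes of a fixed width starting at a chosen lower limit that is no larger than the smallest observation; the helper name `limit_grouping` and the list of ages are hypothetical.

```python
def limit_grouping(data, first_lower_limit, class_width):
    """Group whole-number data into classes keyed by (lower limit, upper limit)."""
    # Assumes first_lower_limit <= min(data).
    classes = {}
    largest = max(data)
    lower = first_lower_limit
    while lower <= largest:
        upper = lower + class_width - 1  # largest whole number that fits in this class
        # Each observation lands in exactly one class.
        classes[(lower, upper)] = sum(lower <= x <= upper for x in data)
        lower += class_width  # lower limit of the next-higher class
    return classes


# Hypothetical data for illustration only.
ages = [21, 25, 34, 38, 39, 42, 47, 55, 58, 59, 33, 28]
for (lower, upper), freq in limit_grouping(ages, 20, 10).items():
    class_mark = (lower + upper) / 2  # average of the two class limits
    print(f"{lower}-{upper}  class mark={class_mark}  frequency={freq}")
```

Here the class width is 10 (for example, 30 - 20, the difference between consecutive lower limits), not 29 - 20 = 9, which is a common slip when working from the limits of a single class.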

23
Q

Cutpoint grouping

A

A third way to group quantitative data is to use class cutpoints. As with limit grouping, each class consists of a range of values.

  • The smallest value that could go in a class is called the lower cutpoint of the class, and the smallest value that could go in the next-higher class is called the upper cutpoint of the class.
  • Note that the lower cutpoint of a class is the same as its lower limit and that the upper cutpoint of a class is the same as the lower limit of the next-higher class.
  • The method of grouping quantitative data by using cutpoints is called cutpoint grouping.
  • This method is particularly useful when the data are continuous and are expressed with decimals.
24
Q

Terms used in cutpoint grouping

A

Lower class cutpoint: The smallest value that could go in a class.

Upper class cutpoint: The smallest value that could go in the next-higher class (equivalent to the lower cutpoint of the next-higher class).

Class width: The difference between the cutpoints of a class.

Class midpoint: The average of the two cutpoints of a class.
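A minimal Python sketch of cutpoint grouping, assuming each class includes its lower cutpoint and excludes its upper cutpoint (so an observation equal to a cutpoint goes into the higher class). The helper name `cutpoint_grouping`, the weights, and the cutpoints are hypothetical.

```python
def cutpoint_grouping(data, cutpoints):
    """Rows of (lower, upper, width, midpoint, frequency) for classes [lower, upper)."""
    table = []
    for lower, upper in zip(cutpoints, cutpoints[1:]):
        # The lower cutpoint belongs to the class; the upper cutpoint starts the next one.
        freq = sum(lower <= x < upper for x in data)
        width = upper - lower           # class width
        midpoint = (lower + upper) / 2  # class midpoint
        table.append((lower, upper, width, midpoint, freq))
    return table


# Hypothetical continuous data and cutpoints.
weights = [2.3, 2.9, 3.0, 3.4, 3.8, 4.1, 4.5, 4.9, 5.0, 3.3]
cutpoints = [2, 3, 4, 5, 6]
for lower, upper, width, midpoint, freq in cutpoint_grouping(weights, cutpoints):
    print(f"{lower} - under {upper}: width={width} midpoint={midpoint} frequency={freq}")
```

Note that 3.0 is counted in the class from 3 to under 4, not in the class from 2 to under 3.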

25
Q

Choosing the grouping method

A

  • Single-value grouping: Use with discrete data in which there are only a small number of distinct values.
  • Limit grouping: Use when the data are expressed as whole numbers and there are too many distinct values to employ single-value grouping.
  • Cutpoint grouping: Use when the data are continuous and are expressed with decimals.
26
Q

Histogram

A

A histogram displays the classes of the quantitative data on a horizontal axis and the frequencies (relative frequencies, percents) of those classes on a vertical axis.

  • The frequency (relative frequency, percent) of each class is represented by a vertical bar whose height is equal to the frequency (relative frequency, percent) of that class.
  • The bars should be positioned so that they touch each other.
  • For single-value grouping, we use the distinct values of the observations to label the bars, with each such value centered under its bar.
  • For limit grouping or cutpoint grouping, we use the lower class limits (or, equivalently, lower class cutpoints) to label the bars.
  • Note: Some statisticians and technologies use class marks or class midpoints centered under the bars.
  • Look for: central or typical value and corresponding spread; gaps in the data or outliers; presence of symmetry in the distribution; number and location of peaks.
27
Q

Frequency histogram

A

A histogram that uses frequencies on the vertical axis.

  • A histogram that uses relative frequencies or percents on the vertical axis is called a relative-frequency histogram or a percent histogram, respectively.
28
Q

To construct a histogram

A

Step 1 Obtain a frequency (relative-frequency, percent) distribution of the data.

Step 2 Draw a horizontal axis on which to place the bars and a vertical axis on which to display the frequencies (relative frequencies, percents).

Step 3 For each class, construct a vertical bar whose height equals the frequency (relative frequency, percent) of that class.

Step 4 Label the bars with the classes, as explained in Definition 2.9, the horizontal axis with the name of the variable, and the vertical axis with "Frequency" ("Relative frequency," "Percent").
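A plain-text sketch of a frequency histogram, assuming the class frequencies have already been computed (Step 1). Unlike the bar chart, each bar is a solid block as wide as its column, so neighbouring bars touch, and each bar is labeled with its lower cutpoint at its left edge. The frequencies, labels, and the helper name `text_histogram` are hypothetical.

```python
def text_histogram(frequencies, labels, col_width=5):
    """Plain-text frequency histogram in which adjacent bars touch."""
    lines = ["Frequency"]  # Step 4: vertical-axis label
    for level in range(max(frequencies), 0, -1):
        # Step 3: a solid block the full column width, so bars have no gaps.
        row = "".join(("#" * col_width if f >= level else " " * col_width)
                      for f in frequencies)
        lines.append(row.rstrip())
    lines.append("-" * (col_width * len(frequencies)))  # Step 2: horizontal axis
    # Step 4: label each bar with its lower cutpoint, at the bar's left edge.
    lines.append("".join(str(lab).ljust(col_width) for lab in labels).rstrip())
    return "\n".join(lines)


# Hypothetical frequencies for classes 2 - under 3, 3 - under 4, 4 - under 5, 5 - under 6.
freqs = [2, 4, 3, 1]
print(text_histogram(freqs, labels=[2, 3, 4, 5]))
```

For real plots, matplotlib's `pyplot.hist` bins raw data and draws touching bars directly.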
29
Q

Dot plot

A

A dotplot is a graph in which each observation is plotted as a dot at an appropriate place above a horizontal axis. Observations having equal values are stacked vertically.

  • Dotplots are particularly useful for showing the relative positions of the data in a data set or for comparing two or more data sets.
  • Look for: typical values and corresponding spread; gaps in the data or outliers; presence of symmetry in the distribution; number and location of peaks.
30
Q

To construct a dot plot

A

Step 1 Draw a horizontal axis that displays the possible values of the quantitative data.

Step 2 Record each observation by placing a dot over the appropriate value on the horizontal axis.

Step 3 Label the horizontal axis with the name of the variable.
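A plain-text dot plot sketch for small integer data, using `*` as the dot. It assumes the axis should show every whole number from the minimum to the maximum observation; the helper name `text_dot_plot` and the "days absent" data are hypothetical.

```python
from collections import Counter


def text_dot_plot(data, variable_name):
    """Plain-text dot plot for integer data: one '*' per observation, ties stacked."""
    counts = Counter(data)
    axis_values = range(min(data), max(data) + 1)  # Step 1: axis of possible values
    col = max(len(str(v)) for v in axis_values) + 1
    lines = []
    for level in range(max(counts.values()), 0, -1):
        # Step 2: one dot over a value for every observation equal to it,
        # so equal observations stack vertically.
        row = "".join(("*" if counts[v] >= level else "").center(col) for v in axis_values)
        lines.append(row.rstrip())
    lines.append("-" * (col * len(axis_values)))
    lines.append("".join(str(v).center(col) for v in axis_values).rstrip())
    lines.append(variable_name)  # Step 3: label the axis with the variable name
    return "\n".join(lines)


# Hypothetical data for illustration only.
days = [3, 5, 5, 6, 7, 7, 7, 9]
print(text_dot_plot(days, "Days absent"))
```

The empty positions above 4 and 8 make gaps in the data visible, which is one of the features a dot plot is meant to reveal.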
31
Q

Stem-and-leaf diagram

A

In a stem-and-leaf diagram (or stemplot), each observation is separated into two parts, namely, a stem, consisting of all but the rightmost digit, and a leaf, the rightmost digit.
32
Q

To Construct a Stem-and-Leaf Diagram

A

Step 1 Think of each observation as a stem, consisting of all but the rightmost digit, and a leaf, the rightmost digit.

Step 2 Write the stems from smallest to largest in a vertical column to the left of a vertical rule.

Step 3 Write each leaf to the right of the vertical rule in the row that contains the appropriate stem.

Step 4 Arrange the leaves in each row in ascending order.
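A Python sketch of the four steps for non-negative whole numbers: `divmod(x, 10)` splits off the rightmost digit as the leaf and keeps the remaining digits as the stem. Listing every stem between the smallest and largest (even stems with no leaves) is an assumed convention that keeps gaps visible; the exam scores and the helper name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Stem-and-leaf diagram for non-negative integers."""
    leaves = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)  # Step 1: leaf = rightmost digit, stem = the rest
        leaves[stem].append(leaf)
    lines = []
    # Step 2: stems from smallest to largest, left of a vertical rule.
    for stem in range(min(leaves), max(leaves) + 1):
        # Steps 3-4: leaves right of the rule, in ascending order.
        row = "".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem:>3} | {row}".rstrip())
    return "\n".join(lines)


# Hypothetical data for illustration only.
scores = [58, 62, 67, 71, 73, 73, 75, 80, 84, 91, 69, 77]
print(stem_and_leaf(scores))
```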
33
Q

Distribution of a data set

A

The distribution of a data set is a table, graph, or formula that provides the values of the observations and how often they occur.
34
Q

Modality

A

When considering the shape of a distribution, you should observe its number of peaks (highest points).

  • A distribution is unimodal if it has one peak,
  • bimodal if it has two peaks, and
  • multimodal if it has three or more peaks.
35
Q

Symmetry

A

A distribution that can be divided into two pieces that are mirror images of one another is called symmetric.

  • Bell-shaped, triangular, and uniform (or rectangular) distributions are specific categories of symmetric distributions.
36
Q

Skewness

A

A unimodal distribution that is not symmetric is either right skewed or left skewed.

  • A right-skewed distribution rises to its peak rapidly and comes back toward the horizontal axis more slowly; its "right tail" is longer than its "left tail."
  • A left-skewed distribution rises to its peak slowly and comes back toward the horizontal axis more rapidly; its "left tail" is longer than its "right tail."

It is important to note the following distinction between general and specific classifications of distribution shape:

  • Modality, symmetry, and skewness are general classifications of distribution shape.
  • Such designations as bell shaped, triangular, and uniform are specific classifications of distribution shape.
37
Q

Population and sample data

A

Population data: The values of a variable for the entire population.

Sample data: The values of a variable for a sample of the population.
38
Q

Population and sample distributions; distribution of a variable

A

The distribution of population data is called the population distribution, or the distribution of the variable.

The distribution of sample data is called a sample distribution.
39
Q

Population and sample distributions

A

For a simple random sample, the sample distribution approximates the population distribution (i.e., the distribution of the variable under consideration). The larger the sample size, the better the approximation tends to be.
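A small simulation can illustrate the "tends to be" in this card. The sketch assumes a hypothetical population of 10,000 penguins, 30% of them gentoo, and draws simple random samples without replacement using the standard library's `random.Random.sample`; `sample_proportion` is a made-up helper name and the seed is arbitrary. Individual runs can buck the trend, which is why the card says "tends to be" rather than "always is".

```python
import random


def sample_proportion(population, value, n, rng):
    """Relative frequency of `value` in a simple random sample of size n."""
    sample = rng.sample(population, n)  # draws without replacement
    return sample.count(value) / n


# Hypothetical population: 30% gentoo.
population = ["gentoo"] * 3000 + ["other"] * 7000
true_p = population.count("gentoo") / len(population)
rng = random.Random(1)  # fixed seed so the run is reproducible
for n in (10, 100, 1000, 10000):
    p_hat = sample_proportion(population, "gentoo", n, rng)
    print(f"n={n:>5}: sample proportion={p_hat:.3f}, error={abs(p_hat - true_p):.3f}")
```

When n equals the population size, the "sample" is the whole population, so the sample proportion matches the population proportion exactly.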
40
Q

Improper scaling

A

Improper scaling of the axes can result in misleading graphs and charts that give a false visual impression of the data.
41
Q

univariate data

A

Data characterized by one variable.
42
Q

bivariate data

A

Data characterized by two variables. There are several ways to summarize bivariate data:

  • Side-by-side/stacked plots: generally used when the variable of interest is numerical and the grouping variable is categorical (or sometimes discrete with a small number of distinct values).
  • Scatterplot: when both variables are numerical, a scatterplot is the preferred graphical summary. The x-variable goes on the x-axis and the y-variable on the y-axis.
  • Two-way tables: give the joint frequency (or joint relative frequency), which is the number (or proportion) of elements in the data corresponding to one level of each categorical variable.
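A two-way table can be sketched in Python by counting each (row level, column level) pair. The (sex, species) pairs below and the helper name `two_way_table` are hypothetical; dividing each joint frequency by the number of pairs would give the joint relative frequencies instead.

```python
from collections import Counter


def two_way_table(pairs):
    """Joint frequencies of two categorical variables from (row, column) pairs."""
    counts = Counter(pairs)  # joint frequency of each combination of levels
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Include combinations that never occur, with a joint frequency of 0.
    return rows, cols, {(r, c): counts[(r, c)] for r in rows for c in cols}


# Hypothetical data for illustration only.
data = [("female", "gentoo"), ("male", "adelie"), ("female", "adelie"),
        ("male", "gentoo"), ("female", "gentoo"), ("male", "chinstrap")]
rows, cols, table = two_way_table(data)
print(" " * 8 + "".join(f"{c:>11}" for c in cols))
for r in rows:
    print(f"{r:<8}" + "".join(f"{table[(r, c)]:>11}" for c in cols))
```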