Probability
Probability is a mathematical tool used to study randomness. It deals with the chance (the likelihood) of an event occurring. Technically, a probability is a number between zero and one, inclusive, that gives the likelihood that a specific event will occur. Example: the chance that the beautiful and bright girl sitting in front of you offers you her phone number by the end of this class, or the chance that you rate the sandwich you bought at the cafeteria as “delicious”.
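When all outcomes are equally likely, a probability can be computed as the number of favorable outcomes divided by the total number of outcomes, which always gives a number between 0 and 1. The sketch below assumes a fair six-sided die; the helper name `classical_probability` is hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def classical_probability(favorable: int, total: int) -> Fraction:
    """Hypothetical helper: probability of an event when all `total` outcomes are equally likely."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need total > 0 and 0 <= favorable <= total")
    return Fraction(favorable, total)

# Rolling an even number (2, 4 or 6) on a fair six-sided die: 3 favorable outcomes out of 6.
p_even = classical_probability(3, 6)
print(p_even)         # 1/2
print(float(p_even))  # 0.5
```

The impossible event (0 favorable outcomes) gets probability 0 and the certain event (all outcomes favorable) gets probability 1, matching the "between zero and one, inclusive" definition.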
Population
all individuals, objects, or measurements whose properties are being studied.
random sampling
a method of selecting a sample that gives every member of the population an equal chance of being selected.
Sample
a subset of the population studied.
Statistic
a numerical characteristic of the sample; a statistic estimates the corresponding population parameter.
Parameter
a number that is used to represent a population characteristic and that generally cannot be determined easily.
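The difference between a parameter and a statistic can be sketched in a few lines: the parameter is computed from the whole population, the statistic from a random sample of it. The population below is generated, hypothetical data (ages of 1,000 people), not real measurements; in practice the whole population is usually not available, which is why we rely on the statistic.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people, generated for illustration only.
rng = random.Random(42)
population = [rng.randint(18, 80) for _ in range(1000)]

# Parameter: a characteristic of the whole population (usually unknown in practice).
mu = statistics.mean(population)

# Statistic: the same characteristic computed on a random sample; it estimates the parameter.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```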
Representative sample
a subset of the population that has the same characteristics as the population.
Variable
A characteristic, measure or property of interest for each person or object in a population.
Numerical versus Categorical
variables that take on values indicated by numbers (continuous or discrete) versus variables that take on values that are names or labels, such as Liberal – Conservative – New Democrat, or True and False.
Data (plural) / Datum (singular)
a set of observations (a set of possible outcomes). Most data can be put into two groups: qualitative (an attribute whose value is indicated by a label) or quantitative (an attribute whose value is indicated by a number). Quantitative data can be separated into two subgroups: discrete and continuous. Data is discrete if it is the result of counting (such as the number of students of a given ethnic group in a class or the number of books on a shelf). Data is continuous if it is the result of measuring (such as distance traveled or weight of luggage).
Quantitative
data are always numbers. Quantitative data are the result of counting or measuring attributes of a population.
Qualitative
data are the result of categorizing or describing attributes of a population. Qualitative data are also often called categorical data.
discrete
Data is discrete if it is the result of counting (such as the number of students of a given ethnic group in a class or the number of books on a shelf).
continuous
Data is continuous if it is the result of measuring (such as distance traveled or weight of luggage).
pie chart
categories of data are represented by wedges in a circle, and each wedge is proportional in size to the percent of individuals in its category. The percentages of all categories should add up to 100%; if they do not, use a bar graph or a Pareto chart instead.
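Each wedge of a pie chart takes the same share of the circle's 360 degrees as its category takes of the total. This sketch computes the percent and wedge angle per category; the survey counts and the helper name `pie_shares` are hypothetical, for illustration only.

```python
def pie_shares(counts: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Hypothetical helper: map each category to (percent of total, wedge angle in degrees)."""
    total = sum(counts.values())
    return {cat: (100 * n / total, 360 * n / total) for cat, n in counts.items()}

# Hypothetical counts from a survey of 100 people (illustrative only).
counts = {"Liberal": 45, "Conservative": 35, "New Democrat": 20}
shares = pie_shares(counts)
for cat, (percent, angle) in shares.items():
    print(f"{cat:<13} {percent:5.1f}%  {angle:6.1f} degrees")

# Every individual falls in exactly one category, so the wedges fill the whole circle.
assert abs(sum(p for p, _ in shares.values()) - 100) < 1e-9
```

If individuals could pick several categories, the counts would no longer sum to the number of people, the percentages would not add up to 100%, and a pie chart would be misleading.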
bar graph
the length of the bar for each category is proportional to the number or percent of individuals in each category. Bars may be vertical or horizontal.
Pareto chart
consists of bars that are sorted into order by category size (largest to smallest).
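Building a Pareto chart amounts to sorting the categories by size, largest first, before drawing the bars. A minimal text-only sketch, assuming hypothetical counts of how students get to campus:

```python
# Hypothetical counts: how students get to campus (illustrative only).
counts = {"Bus": 12, "Car": 30, "Bike": 5, "Walk": 18}

# A Pareto chart orders the bars from the largest category to the smallest.
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Draw each bar as a row of '#' characters, one per individual.
for category, n in pareto_order:
    print(f"{category:<5} {'#' * n} {n}")
```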
Sampling with replacement
True random sampling is done with replacement: once a member is picked, that member goes back into the population and thus may be chosen more than once. When does it matter? And when does it not?
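Python's standard `random` module offers both schemes: `choices` draws with replacement, `sample` draws without replacement. The five names below are a hypothetical population. Drawing 8 members with replacement from only 5 must produce at least one repeat, while a sample without replacement can never repeat a member and can never be larger than the population.

```python
import random

# Hypothetical population of five students.
population = ["Ana", "Ben", "Cleo", "Dev", "Eli"]
rng = random.Random(7)

# With replacement: every draw is made from the full population,
# so the same member may be chosen more than once.
with_repl = rng.choices(population, k=8)

# Without replacement: a chosen member is not put back,
# so there are no repeats and k cannot exceed len(population).
without_repl = rng.sample(population, k=5)

print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```

With a very large population and a small sample, repeats are rare, so the two schemes give nearly the same results; the difference shows up when the sample is a sizable fraction of the population, as here.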
sampling errors
There are two types of errors: sampling errors and nonsampling errors. The actual process of sampling causes sampling errors. For example, the sample may not be large enough, you miscalculated twice the 50th following
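One source of sampling error is a sample that is too small. This simulation, using a hypothetical population of the numbers 0 to 999, repeatedly draws random samples and measures how far the sample mean lands from the population mean on average. The helper `mean_abs_sampling_error` and the choice of 200 trials are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(0)
population = list(range(1000))      # hypothetical population: 0, 1, ..., 999
mu = statistics.mean(population)    # the parameter: 499.5

def mean_abs_sampling_error(n: int, trials: int = 200) -> float:
    """Hypothetical helper: average |sample mean - population mean| over repeated samples of size n."""
    errors = [abs(statistics.mean(rng.sample(population, n)) - mu) for _ in range(trials)]
    return statistics.mean(errors)

small = mean_abs_sampling_error(5)
large = mean_abs_sampling_error(500)
print(f"n = 5:   average sampling error {small:.1f}")
print(f"n = 500: average sampling error {large:.1f}")
```

The larger sample typically lands much closer to the parameter; a nonsampling error (such as a faulty counting device) would not shrink this way as the sample grows.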
nonsampling errors
A defective counting or reporting device can cause a nonsampling error. For example, the postal codes were wrong for part of the sample in a mailed questionnaire.
Sampling bias
is created when a sample is collected from a population and some members of the population are not as likely to be chosen as others (remember, each member of the population should have an equally likely chance of being chosen).
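The effect of sampling bias can be simulated by giving some members a higher chance of being chosen. In this hypothetical population exactly half the people drive to campus; the biased method (imagine surveying only in a parking lot, an illustrative assumption) makes drivers three times as likely to be picked. Note that `random.choices` with weights draws with replacement.

```python
import random

rng = random.Random(1)
# Hypothetical population: 1 = drives to campus, 0 = does not; exactly half drive.
population = [1] * 500 + [0] * 500

# Unbiased: every member has the same chance (simple random sample, no replacement).
fair = rng.sample(population, 200)

# Biased: drivers are 3 times as likely to be chosen (weighted draws, with replacement).
weights = [3 if x == 1 else 1 for x in population]
biased = rng.choices(population, weights=weights, k=200)

print("true proportion of drivers: 0.50")
print(f"fair sample estimate:       {sum(fair) / len(fair):.2f}")
print(f"biased sample estimate:     {sum(biased) / len(biased):.2f}")
```

The biased estimate tends toward 0.75 rather than 0.50, no matter how large the sample is: a bigger sample does not fix a biased selection method.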
Variation
is present in any set of data. Do you really think there are precisely 16 oz of drink in every soda can you buy?
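Variation can be quantified. This sketch uses hypothetical fill volumes for eight cans labeled 16 oz (not real measurements) and computes two common measures of spread: the range (largest minus smallest value) and the sample standard deviation from Python's `statistics` module.

```python
import statistics

# Hypothetical measured fill volumes (oz) for eight cans labeled 16 oz.
fills = [15.9, 16.1, 16.0, 15.8, 16.2, 16.0, 15.9, 16.1]

mean_fill = statistics.mean(fills)
fill_range = max(fills) - min(fills)   # largest minus smallest value
spread = statistics.stdev(fills)       # sample standard deviation

print(f"mean = {mean_fill:.2f} oz, range = {fill_range:.2f} oz, std dev = {spread:.3f} oz")
```

Even though the cans average 16 oz here, no single value tells the whole story: individual cans vary around that average.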
Nominal scale level
is qualitative (categorical), such as “weekly”, “New Democrats”, “Blue eyes”. Nominal comes from “nom…”, that is, the name of the category. You cannot order nominal values and you cannot average them; you can only count how many elements of your sample carry each label.
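Since counting is the only meaningful operation on nominal data, a frequency count (and the most frequent label, the mode) is essentially all we can compute. A sketch with a hypothetical sample of eye colors, using `collections.Counter`:

```python
from collections import Counter

# Hypothetical sample of eye colors (nominal data: labels only).
eye_colors = ["blue", "brown", "brown", "green", "blue", "brown"]

# Count how many elements of the sample carry each label.
counts = Counter(eye_colors)
print(counts.most_common())  # [('brown', 3), ('blue', 2), ('green', 1)]
```

There is no meaningful "average eye color"; the most we can say is which label occurs most often.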
Ordinal scale level
is like nominal, but you can rank the values, that is, put them in order. We could rate a restaurant from “awful” to “delicious”. We all know we prefer “delicious” to “awful”, no number needed! But the difference between “awful” and “delicious” cannot be measured: it has no numerical meaning.
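For ordinal data, an ordered list of labels captures everything the scale allows: we can sort ratings and pick the middle one, but subtracting positions would be meaningless. The intermediate labels “poor”, “okay” and “good” and the sample of ratings are hypothetical additions for illustration.

```python
# Hypothetical ordered scale: the position encodes rank only, not distance.
SCALE = ["awful", "poor", "okay", "good", "delicious"]
rank = {label: i for i, label in enumerate(SCALE)}

# Hypothetical sample of restaurant ratings.
ratings = ["good", "awful", "delicious", "okay", "good"]

# Ordering is meaningful: sort the ratings from worst to best.
ordered = sorted(ratings, key=rank.__getitem__)

# The middle (median) category only needs the order, so it is meaningful too.
median_label = ordered[len(ordered) // 2]

print(ordered)       # ['awful', 'okay', 'good', 'good', 'delicious']
print(median_label)  # good
```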