DATA AND SAMPLING Flashcards

(22 cards)

1
Q

Hypothesis

A

A statement which may or may not be true. (A question is not a hypothesis.) A statistical investigation is used to see if there is evidence to support the hypothesis.

2
Q

Population

A

All items/people being investigated. (e.g. all students in Year 10, all fireworks made by a factory, etc)

3
Q

Sample frame

A

A list of all members of the population. (Instead of ‘list’ it may be e.g. a register, or database.)

4
Q

Random sample

A

All items in the population have an equal chance of being selected for the sample.
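As a minimal sketch of this definition, Python's standard `random.sample` draws without replacement so that every member of the sampling frame is equally likely to be chosen. The function name `simple_random_sample` and the list `year_10` are made up for illustration.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Pick n distinct members, each equally likely to be selected."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: a register of 200 Year 10 students.
year_10 = [f"student_{i}" for i in range(1, 201)]
sample = simple_random_sample(year_10, 20, seed=1)
print(sample)
```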

5
Q

Stratified (random) sampling

A

The population is divided into clear strata (e.g. gender/school year), and the proportions in the sample are matched to the proportions in the population. Members of each stratum are then chosen at random.
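The proportional calculation can be sketched as follows: each stratum contributes (stratum size ÷ population size) × sample size members, chosen at random. The helper `stratified_sample` and the `school` data are hypothetical, and rounding means the total can differ slightly from the requested size for awkward numbers.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """strata maps a stratum name to the list of its members."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Match the stratum's share of the sample to its share of the population.
        k = round(sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, k)  # random choice within the stratum
    return sample

# Made-up school of 300 pupils in three year groups.
school = {
    "Year 7": [f"Y7-{i}" for i in range(120)],
    "Year 8": [f"Y8-{i}" for i in range(100)],
    "Year 9": [f"Y9-{i}" for i in range(80)],
}
chosen = stratified_sample(school, 30, seed=0)
for year, pupils in chosen.items():
    print(year, len(pupils))  # 12, 10 and 8 pupils respectively
```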

6
Q

Judgement sampling

A

Non-random sampling, selecting using some criteria (e.g. the first 20 items/people).

7
Q

Cluster sampling

A

Non-random sampling, using all members of randomly chosen cluster(s). (e.g. all pupils in 3 randomly chosen tutor groups.)

8
Q

Quota sampling

A

Non-random sampling, where, e.g. an interviewer selects a pre-determined number of people of different age-groups/genders.

9
Q

Systematic sampling

A

Non-random sampling: from a random start point, selecting at fixed intervals.
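A short sketch of the idea, assuming the sampling frame is an ordered list: pick a random start within the first interval, then take every k-th member. The name `systematic_sample` and the `register` list are illustrative only.

```python
import random

def systematic_sample(sampling_frame, interval, seed=None):
    """Take every `interval`-th member, starting from a random position."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start point within the first interval
    return sampling_frame[start::interval]

# Hypothetical register of 100 numbered items, sampling every 10th.
register = list(range(1, 101))
sample = systematic_sample(register, 10, seed=3)
print(sample)
```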

10
Q

Cleaning data

A

Data may need to be cleaned to improve reliability, and so that it can be understood and used by statistical software (for diagrams and calculations). Cleaning data may involve dealing with outliers or missing data, or standardising the format/units of data, removing symbols, etc.
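For illustration, here is one possible way to standardise units and handle missing entries. The raw values, the list of "missing" markers and the rule that bare numbers are already in centimetres are all assumptions made for this sketch.

```python
def clean_height(raw):
    """Return a height in centimetres as a float, or None if the entry is missing."""
    text = raw.strip().lower()
    if text in {"", "n/a", "-", "?"}:  # assumed markers for missing data
        return None
    if text.endswith("cm"):
        return float(text[:-2])  # remove the unit symbol
    if text.endswith("m"):
        return round(float(text[:-1]) * 100, 1)  # convert metres to centimetres
    return float(text)  # assumption: bare numbers are already in centimetres

raw_heights = ["172cm", " 1.65 m", "180", "n/a", "158 cm", ""]
cleaned = [clean_height(h) for h in raw_heights]
print(cleaned)  # [172.0, 165.0, 180.0, None, 158.0, None]
```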

11
Q

Anomaly

A

A value that appears not to fit the rest of the data, e.g. a point that is a long way from the line of best fit on a scatter diagram.

12
Q

Outlier

A

A suspiciously high or low value. Outlier boundaries are found using: mean ± 3 × s.d., or 1.5 × IQR above the upper quartile / below the lower quartile.
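Both boundary rules can be sketched with Python's `statistics` module. Two assumptions to note: `statistics.quantiles` uses one particular quartile convention (its default "exclusive" method), which may differ slightly from a hand method, and the standard deviation here is the population form (`pstdev`). The data set is made up.

```python
import statistics

def iqr_bounds(data):
    """Boundaries 1.5 x IQR below the lower quartile and above the upper quartile."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def sd_bounds(data):
    """Boundaries at mean - 3 s.d. and mean + 3 s.d."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # assumption: population standard deviation
    return mean - 3 * sd, mean + 3 * sd

def outliers(data, bounds):
    low, high = bounds
    return [x for x in data if x < low or x > high]

values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 48]
print(outliers(values, iqr_bounds(values)))  # [48]
print(outliers(values, sd_bounds(values)))   # []
```

In this made-up data the IQR rule flags 48 but the 3 s.d. rule does not, because the extreme value itself inflates the standard deviation. The two rules need not agree.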

13
Q

Variables, multivariate

A

Variables are the ‘values’ being investigated that vary between different members of the population. They may be discrete, continuous, qualitative, etc. A multivariate problem is where more than one linked variable is being investigated. (e.g. bivariate is two linked variables.) For example, investigating how driving test performance varies by gender and by time of day.

14
Q

Categorical data

A

Data fits into clearly defined categories. e.g. gender, voting intention, car make,…

15
Q

Ordinal data

A

Data indicating a rank order. e.g. position in a race.

16
Q

Distribution

A

The set of values of a variable along with their frequencies or probabilities.
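As a small illustration, a distribution can be tabulated by pairing each value with its frequency and its relative frequency (an estimate of probability). The dice rolls below are invented and `frequency_distribution` is a hypothetical helper name.

```python
from collections import Counter

def frequency_distribution(values):
    """Map each distinct value to (frequency, relative frequency)."""
    counts = Counter(values)
    total = len(values)
    return {v: (counts[v], counts[v] / total) for v in sorted(counts)}

rolls = [1, 3, 3, 6, 2, 3, 6, 1, 4, 3]
dist = frequency_distribution(rolls)
for value, (freq, rel) in dist.items():
    print(value, freq, rel)
```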

17
Q

Extraneous variables

A

Variables we are not investigating: In planning an investigation, we aim to limit the effect of variables we are not interested in that might affect the outcome. (e.g. if comparing reaction times for two groups, ‘time of day’ may affect the results – using the same time of day for the two groups eliminates any effect of this extraneous variable.)

18
Q

Control groups & matched pairs

A

A control group is used alongside a test group so that comparisons can be made. Matched pairs can be used (one in each group) to help make the two groups similar. (e.g. test group get a new drug, control group get a placebo. The two groups should be as similar as possible to minimise the effect of extraneous variables. Any differences are then likely due to the new drug.)

19
Q

Closed/open questions

A

Closed questions require a choice from stated answers (e.g. with tick box options). Results can be easily analysed and used to produce graphs. Open questions have no restriction on how they can be answered (no options). Results are not easy to analyse – open questions are usually best avoided.

20
Q

Pilot survey / pre-test

A

Trying out a questionnaire on a small scale to see if any changes are needed, before using it with a larger sample. (To check: Are the questions understood? Is the required information obtained? Are sufficient questionnaires returned (response rate)? Do response boxes cover all options? etc.)

21
Q

Random response

A

Used to estimate responses to a sensitive question. Allows more reliable responses to be collected by using an element of chance. e.g. Each subject first rolls a dice or tosses a coin: depending on the outcome, some subjects answer the question truthfully (ticking box A only if it applies to them), whilst others ‘tick box A’ regardless. Nobody can tell why an individual ticked box A, so subjects can answer honestly.
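One common version of this idea can be sketched as follows, under an assumed design: each subject tosses a fair coin, heads means tick box A regardless, tails means answer truthfully. Then P(tick A) = 0.5 + 0.5 × p, so the true proportion p is estimated as 2 × (proportion of ticks) − 1. The function names and the 30% figure are illustrative only.

```python
import random

def simulate_ticks(true_proportion, n, seed=None):
    """Count 'box A' ticks from n subjects under the assumed coin design."""
    rng = random.Random(seed)
    ticks = 0
    for _ in range(n):
        if rng.random() < 0.5:
            ticks += 1  # heads: subject ticks box A whatever the truth is
        elif rng.random() < true_proportion:
            ticks += 1  # tails: subject answers truthfully, and box A applies
    return ticks

def estimate_true_proportion(ticks, n):
    """Invert P(tick A) = 0.5 + 0.5 * p to recover an estimate of p."""
    return 2 * ticks / n - 1

n = 100_000
ticks = simulate_ticks(0.3, n, seed=0)
estimate = estimate_true_proportion(ticks, n)
print(round(estimate, 2))
```

With a large sample the estimate should land close to the true proportion used in the simulation, even though no individual answer reveals anything.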

22
Q

Reliability & validity

A

Reliability is the extent to which repeating a process would lead to similar results. (e.g. using too small a sample may be unreliable.) Validity is the extent to which a process measures what was intended. (e.g. obtaining opinions about school food from Year 7 only has poor validity if investigating the opinions of all students.)