DATA AND SAMPLING Flashcards

Question 1

Q

Hypothesis

Answer

A

A statement which may or may not be true. (A question is not a hypothesis.) A statistical investigation is used to see if there is evidence to support the hypothesis.

Question 2

Q

Population

Answer

A

All items/people being investigated (e.g. all students in Year 10, all fireworks made by a factory, etc.).

Question 3

Q

Sample frame

Answer

A

A list of all members of the population. (Instead of ‘list’ it may be e.g. a register, or database.)

Question 4

Q

Random sample

Answer

A

All items in the population have an equal chance of being selected for the sample.
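A minimal Python sketch of simple random sampling, assuming the sample frame is stored as a list. `random.Random.sample` picks without replacement, so no member is chosen twice and every member has the same chance of selection. The function name and the 200-name frame are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Choose n different members of the sample frame at random."""
    rng = random.Random(seed)  # a seed makes the selection repeatable
    return rng.sample(frame, n)

# Hypothetical sample frame: a register of 200 students
students = [f"Student {i}" for i in range(1, 201)]
sample = simple_random_sample(students, 10, seed=1)
print(sample)
```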

Question 5

Q

Stratified (random) sampling

Answer

A

The population is divided into clear strata (e.g. gender or school year), and the proportion of the sample taken from each stratum matches that stratum's proportion of the population. Members within each stratum are then chosen at random.
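For example, if (hypothetically) Year 10 has 120 students and Year 11 has 80, a stratified sample of 20 takes 20 × 120/200 = 12 from Year 10 and 20 × 80/200 = 8 from Year 11. The Python sketch below does the same calculation and then samples at random within each stratum; the names and numbers are illustrative assumptions. Rounded allocations may not always add up exactly to n, in which case one stratum's count has to be adjusted by hand.

```python
import random

def stratified_sample(strata, n, seed=None):
    """strata maps each stratum name to its list of members.
    Each stratum gets a share of the n places in proportion to its size,
    then members are chosen at random within that stratum."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, k)
    return sample

strata = {
    "Year 10": [f"Y10-{i}" for i in range(120)],
    "Year 11": [f"Y11-{i}" for i in range(80)],
}
sample = stratified_sample(strata, 20, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})  # {'Year 10': 12, 'Year 11': 8}
```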

Question 6

Q

Judgement sampling

Answer

A

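Non-random sampling, where the person carrying out the survey chooses the sample using their own judgement or some chosen criteria (e.g. selecting people they believe are typical of the population). Simply taking the first 20 items/people is usually called convenience (or opportunity) sampling.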

Question 7

Q

Cluster sampling

Answer

A

Non-random sampling, using all members of randomly chosen cluster(s). (e.g. all pupils in 3 randomly chosen tutor groups.)
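A minimal sketch of cluster sampling in Python, assuming the clusters (here, eight hypothetical tutor groups of 25 pupils) are stored in a dictionary. Only the choice of clusters is random; every member of each chosen cluster is included.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """clusters maps each cluster name to its list of members.
    Pick num_clusters clusters at random and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

tutor_groups = {
    f"10{letter}": [f"10{letter}-pupil{i}" for i in range(1, 26)]
    for letter in "ABCDEFGH"
}
sample = cluster_sample(tutor_groups, 3, seed=2)
print(sorted(sample), sum(len(pupils) for pupils in sample.values()))
```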

Question 8

Q

Quota sampling

Answer

A

Non-random sampling where, for example, an interviewer selects a pre-determined number of people from each age group/gender.
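A sketch of how quota sampling fills its quotas, assuming the interviewer simply accepts people in the order they are met until each group's quota is full. The group labels and arrival list are made up for illustration. Nothing in the selection is random, which is why quota sampling can be biased towards whoever happens to be available.

```python
def quota_sample(arrivals, quotas):
    """arrivals: (person, group) pairs in the order the interviewer meets them.
    quotas: number of people wanted from each group.
    Each person is accepted only if their group's quota is not yet full."""
    remaining = dict(quotas)
    chosen = []
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:
            chosen.append(person)
            remaining[group] -= 1
    return chosen

arrivals = [("P1", "under 30"), ("P2", "30-59"), ("P3", "under 30"),
            ("P4", "60+"), ("P5", "under 30"), ("P6", "30-59")]
print(quota_sample(arrivals, {"under 30": 2, "30-59": 1, "60+": 1}))
# ['P1', 'P2', 'P3', 'P4']
```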

Question 9

Q

Systematic sampling

Answer

A

Non-random sampling: from a random start point, selecting at fixed intervals.
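A minimal Python sketch, assuming the population is listed in a sample frame of N members and a sample of n is needed: the interval is k = N ÷ n (rounded down), the start is chosen at random from the first k members, and then every kth member is taken. With a hypothetical frame of 200 and n = 10, k = 20. The function name is illustrative, and the sketch assumes n is no larger than N.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start, then every k-th member of the frame (assumes 1 <= n <= len(frame))."""
    k = len(frame) // n                       # sampling interval
    start = random.Random(seed).randrange(k)  # random start among the first k members
    return frame[start::k][:n]                # every k-th member, trimmed to n

frame = list(range(1, 201))  # hypothetical frame: members numbered 1 to 200
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```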

Question 10

Q

Cleaning data

Answer

A

Data may need to be cleaned to improve reliability, and so that it can be understood and used by statistical software (for diagrams and calculations). Cleaning data may involve dealing with outliers or missing data, or standardising the format/units of data, removing symbols, etc.
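As a hypothetical illustration, the Python sketch below cleans a column of heights typed in different formats: it strips spaces, removes unit symbols, converts metres to centimetres and marks missing or unreadable entries as `None` so they can be dealt with separately. The input values, and the rule that bare numbers are already in centimetres, are assumptions made for the example.

```python
def clean_heights(raw):
    """Convert entries such as '1.62m', '158 cm' or ' 170' to centimetres.
    Missing or unreadable entries become None."""
    cleaned = []
    for value in raw:
        text = value.strip().lower().replace(" ", "")
        if text == "":
            cleaned.append(None)      # missing data
            continue
        if text.endswith("cm"):
            text, factor = text[:-2], 1
        elif text.endswith("m"):
            text, factor = text[:-1], 100  # metres -> centimetres
        else:
            factor = 1                # assume bare numbers are already in cm
        try:
            cleaned.append(round(float(text) * factor, 1))
        except ValueError:
            cleaned.append(None)      # e.g. 'n/a' or a typo
    return cleaned

print(clean_heights(["1.62m", "158 cm", " 170", "", "n/a"]))
# [162.0, 158.0, 170.0, None, None]
```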

Question 11

Q

Anomaly

Answer

A

A value that does not appear to fit the rest of the data, e.g. a point a long way from the line of best fit on a scatter diagram.
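One way to make "a long way from the line of best fit" concrete is to fit a least-squares line and look for the point with the largest residual (vertical distance from the line). A minimal Python sketch follows; the revision-hours and test-mark data are made up, and a least-squares line is used here in place of a line drawn by eye. Note that an anomalous point also pulls the fitted line towards itself.

```python
def line_of_best_fit(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def furthest_point(xs, ys):
    """Return the (x, y) point with the largest residual from the line of best fit."""
    slope, intercept = line_of_best_fit(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    i = max(range(len(xs)), key=lambda k: abs(residuals[k]))
    return xs[i], ys[i]

hours = [1, 2, 3, 4, 5, 6, 7, 8]          # hypothetical hours of revision
marks = [20, 28, 35, 44, 90, 58, 66, 75]  # hypothetical test marks
print(furthest_point(hours, marks))  # (5, 90)
```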

Question 12

Q

Outlier

Answer

A

A suspiciously high or low value. Outlier boundaries are found using either mean ± 3 × s.d., or 1.5 × IQR above the upper quartile / below the lower quartile.
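A minimal Python sketch of both outlier rules using the standard `statistics` module. Quartiles can be calculated in several slightly different ways; `statistics.quantiles` with its default "exclusive" method uses the (n + 1)/4 and 3(n + 1)/4 positions, and `statistics.pstdev` divides by n. The data are hypothetical.

```python
import statistics

def iqr_bounds(data):
    """Values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR are outliers."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def sd_bounds(data):
    """Values more than 3 standard deviations from the mean are outliers."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return mean - 3 * sd, mean + 3 * sd

data = [12, 15, 17, 18, 20, 21, 22, 25, 60]
print(iqr_bounds(data))  # (4.75, 34.75)
print(sd_bounds(data))
```

Here the IQR rule flags 60 as an outlier but the 3 s.d. rule does not (its upper bound is about 63.7), because the extreme value itself inflates the standard deviation.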

Question 13

Q

Variables, multivariate

Answer

A

Variables are the ‘values’ being investigated that vary between different members of the population. They may be discrete, continuous, qualitative, etc. A multivariate problem is one where more than one linked variable is being investigated (e.g. bivariate means two linked variables). For example, investigating how driving test performance varies with gender and with time of day.

Question 14

Q

Categorical data

Answer

A

Data that fit into clearly defined categories, e.g. gender, voting intention, car make, etc.

Question 15

Q

Ordinal data

Answer

A

Data indicating a rank order. e.g. position in a race.

Question 16

Q

Distribution

Answer

A

The set of values of a variable along with their frequencies or probabilities.
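A short Python sketch that turns a list of raw values into a frequency distribution, giving each value with its frequency and relative frequency (an estimate of its probability). The goals data are hypothetical.

```python
from collections import Counter

def distribution(values):
    """Map each distinct value to (frequency, relative frequency)."""
    counts = Counter(values)
    n = len(values)
    return {v: (counts[v], counts[v] / n) for v in sorted(counts)}

goals = [0, 2, 1, 0, 3, 1, 1, 2, 0, 1]  # hypothetical goals scored in 10 matches
for value, (freq, rel) in distribution(goals).items():
    print(value, freq, rel)
```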

Question 17

Q

Extraneous variables

Answer

A

Variables that are not being investigated. In planning an investigation, we aim to limit the effect of variables that might affect the outcome but that we are not interested in. (e.g. if comparing reaction times for two groups, ‘time of day’ may affect the results; testing both groups at the same time of day removes any effect of this extraneous variable.)

Question 18

Q

Control groups & matched pairs

Answer

A

A control group is used alongside a test group so that comparisons can be made. Matched pairs can be used (one in each group) to help make the two groups similar. (e.g. test group get a new drug, control group get a placebo. The two groups should be as similar as possible to minimise the effect of extraneous variables. Any differences are then likely due to the new drug.)

Question 19

Q

Closed/open questions

Answer

A

Closed questions require a choice from stated answers (e.g. with tick box options). Results can be easily analysed and used to produce graphs. Open questions have no restriction on how they can be answered (no options). Results are not easy to analyse – open questions are usually best avoided.

Question 20

Q

Pilot survey / pre-test

Answer

A

Trying out a questionnaire on a small scale, before using it with a larger sample, to see if any changes are needed. (To check: Are the questions understood? Is the required information obtained? Are enough questionnaires returned (response rate)? Do the response boxes cover all options? etc.)

Question 21

Q

Random response

Answer

A

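Used to estimate the proportion of people giving a particular answer to a sensitive question. Adding an element of chance means no one can tell whether an individual's answer is genuine, so responses are more honest and reliable. e.g. Each person first rolls a dice or tosses a coin in private: depending on the outcome, they either answer the question truthfully or simply ‘tick box A’ anyway. Because the probabilities for the dice or coin are known, the number of genuine ‘box A’ answers can be estimated.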
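One common version, assumed here for illustration: each person tosses a coin in private; on heads they answer the sensitive question truthfully, on tails they tick "Yes" regardless. If p is the true proportion whose honest answer is "Yes", then P(Yes) = ½p + ½, so p can be estimated as 2 × (proportion of "Yes" answers) − 1. For example, 65 "Yes" answers from 100 people gives 2 × 0.65 − 1 = 0.3. The Python sketch below applies this formula and checks it on simulated responses; the probabilities, sample size and function names are hypothetical.

```python
import random

def simulate_yes_count(true_p, n, p_truthful=0.5, seed=None):
    """Simulate n people: with probability p_truthful (e.g. a coin shows heads)
    a person answers truthfully; otherwise they tick 'Yes' whatever the truth."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        if rng.random() < p_truthful:
            if rng.random() < true_p:  # this person's true answer is 'Yes'
                yes += 1
        else:
            yes += 1                   # forced 'Yes' from the coin outcome
    return yes

def estimate_true_p(yes_count, n, p_truthful=0.5):
    """Invert P(Yes) = p_truthful * p + (1 - p_truthful) to estimate p."""
    observed = yes_count / n
    return (observed - (1 - p_truthful)) / p_truthful

print(estimate_true_p(65, 100))  # approximately 0.3
yes = simulate_yes_count(true_p=0.2, n=100_000, seed=5)
print(estimate_true_p(yes, 100_000))  # close to 0.2
```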

Question 22

Q

Reliability & validity

Answer

A

Reliability is the extent to which repeating a process would lead to similar results. (e.g. using too small a sample may be unreliable) Validity is the extent to which a process measures what was intended. (e.g. obtaining opinions about school food from Year 7 has poor validity if investigating opinions of all students.)
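A small simulation can show why a very small sample may be unreliable: when the sampling is repeated many times, the sample means vary much more for small n. A Python sketch, assuming a hypothetical population made up of the values 1 to 1000:

```python
import random
import statistics

def spread_of_sample_means(population, n, repeats=1000, seed=None):
    """Standard deviation of the means of many random samples of size n."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.pstdev(means)

population = list(range(1, 1001))
small = spread_of_sample_means(population, 5, seed=4)
large = spread_of_sample_means(population, 50, seed=4)
print(round(small, 1), round(large, 1))  # the n = 5 spread is roughly three times larger
```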
