Data analysis unit 1 Flashcards by Kaibar Safi

What are the main ideas of data analysis?

Statistics is part of any scientific research—it helps us make sense of data and answer questions.

Two types of research:

Qualitative: open-ended, descriptive (e.g. interviews, observations).

Quantitative: numbers-based (e.g. test scores, reaction times).

Explain the research process

Data Analysis is a core part of the research process:

Ask a research question

Gather documentation (literature, background)

Create a hypothesis

Design your study

Collect data

Analyze data ← This is where statistics comes in

Interpret what it means

Example:

“Do students who sleep more have better grades?”
→ Collect sleep hours and grade data → analyze statistically → interpret the results.
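
The "analyze statistically" step for this example could be sketched as below. This is a minimal illustration, not the course's method: the sleep and grade values are made up, and the choice of a Pearson correlation is an assumption about how such data might be analyzed.

```python
from math import sqrt

# Hypothetical data for illustration only: one pair of values per student.
sleep_hours = [5.0, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
grades = [5.5, 6.0, 6.2, 7.0, 7.4, 7.9, 8.1]


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


r = pearson_r(sleep_hours, grades)
print(f"r = {r:.2f}")  # values near +1 mean more sleep goes with higher grades
```

Interpreting the result is still a separate step: a high correlation in these made-up numbers would not by itself show that sleep causes better grades.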

Population – What Do We Study?

Population: Everyone or everything that has the characteristic you’re interested in.

Could be: people, animals, institutions, etc.

Two types:

Finite (quantifiable): You can count every unit (e.g., “Deaths in 2021”).

Infinite (non-quantifiable): Theoretically unending, or too large to count (e.g., “All possible coin tosses”).

Example:

Population = All university students in Spain → That’s finite, but very large.

Population = All future tosses of a coin → Infinite.

Explain sample and units of analysis

Sample = A subset of the population.

You can’t study everyone, so you study a representative group.

The units of analysis are the actual elements being studied (e.g., people, coin tosses).

✅ A good sample must:

Be large enough (size matters).

Be randomly selected (to avoid bias).

Example:

Population = All UCAM students.

Sample = 150 UCAM students randomly selected.

Unit of analysis = each individual student.
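
Random selection like in the UCAM example can be sketched in a few lines. The population below is just a list of made-up ID numbers, and its size (12,000) is an assumption for illustration, not a real enrolment figure.

```python
import random

# Hypothetical population: ID numbers standing in for all UCAM students.
# The size 12000 is an assumption for illustration only.
population = list(range(1, 12001))

rng = random.Random(42)  # fixed seed so the draw can be repeated
sample = rng.sample(population, k=150)  # simple random sampling, no repeats

N = len(population)  # N: population size
n = len(sample)      # n: sample size
print(N, n)
```

Each ID in `sample` is one unit of analysis; because every student has the same chance of being drawn, the selection itself does not favor any group.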

What do we use to study it? Parameters and statistics

Quantitative values (numbers) used to describe:

Population → Called parameters → Use Greek letters (μ, σ, π); the population size is written N

Sample → Called statistics → Use Latin letters (X̄, S, P, n)

Property              Population (parameter)    Sample (statistic)
Mean (average)        μ                         X̄
Standard deviation    σ                         S
Proportion            π                         P
Size                  N                         n

Example:

μ = the average anxiety score of all students in Spain.

X̄ = average anxiety score of your sample.
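
As a rough sketch, the sample statistics can be computed directly from the data, while the parameters (μ, σ, π) stay unknown unless you measure the whole population. The scores below are made up, and the cut-off of 4 used for the proportion is an arbitrary assumption.

```python
from statistics import mean, stdev

# Hypothetical anxiety scores (0-10) from a sample of 8 students.
scores = [2, 4, 3, 5, 1, 3, 4, 2]

n = len(scores)       # n: sample size
x_bar = mean(scores)  # sample mean (X-bar)
s = stdev(scores)     # S: sample standard deviation (divides by n - 1)
p = sum(score >= 4 for score in scores) / n  # P: proportion scoring 4 or more

print(n, x_bar, round(s, 2), p)
```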

What Do We Measure? → Variables

Variable: A feature of the units of analysis that can take different values from one individual to another.

Examples: Age, anxiety, blood pressure.
_____________________________

Constant: The opposite of a variable: a feature that has the same value for everyone in the study.

Example: All participants are female (gender = constant here).

🔄 Cross-sectional vs. Longitudinal

These terms refer to the time dimension of your study design.

Term: Cross-sectional

What it means: All data collected at one single time point.

__________________

Term: Longitudinal

What it means: Data collected at multiple time points over a period of time.

A hypothesis always has 2 variables:

Cause = the variable we think makes a difference

Result = what we think is affected

But the words we use for the variables depend on the type of study.
_____________________________

🔷 1. Experimental / Quasi-Experimental Studies

🧪 You’re comparing groups or testing something.

📌 You use:

Independent variable (IV) = the cause (you change or compare it)

Dependent variable (DV) = the result (you measure it)

✅ Example:

Does coffee improve memory?

Coffee = IV

Memory score = DV

Do smokers have higher anxiety? (You can’t assign people to smoke, so the design is quasi-experimental)

Smoking status = IV

Anxiety level = DV
_____________________

🔶 2. Cross-sectional / Longitudinal Studies

📋 You are observing things over time or at one time point. No manipulation.

📌 You use:

Predictive variable = the “cause” you think predicts something

Result variable = the “effect” you measure

✅ Examples:

Do people who sleep more feel less anxious? (Cross-sectional: measured once)

Sleep hours = Predictive variable

Anxiety score = Result variable

Does social media use affect GPA over 1 year? (Longitudinal)

Social media use = Predictive

GPA = Result

Find the dependent and independent variables in these statements:

* Energy drinks decrease fatigue.
* Walking daily increases life expectancy.
* Generous behavior is more frequent among people with lower socioeconomic status.

Energy drinks (IV) decrease fatigue (DV)

Walking daily (IV) increases life expectancy (DV)

Socioeconomic status (IV) is linked to generous behavior (DV)

Reminder: IV causes or predicts the DV.

Can generalizable conclusions be drawn from the data collected in one sample?

Normally, no: you cannot draw a generalizable conclusion from just any single sample.
To generalize, the sample has to be large, representative of the population, and collected in a systematic way (ideally by random selection). A typical single sample is not big enough, not representative enough, or not collected systematically enough.

What are the Two Types of Statistics?

Type: Descriptive

Purpose: Summarize the data from your sample. (Goal: to summarize a set of information in order to interpret it and draw conclusions.)

E.g., graphs, percentages, means.

Typical questions:
* How many people are younger than 18 years old?
* Do more anxious people go to the doctor more often?

Example: “The average anxiety score in our sample is 2.”
________________

Type: Inferential

Purpose: Use that sample to predict, estimate, or generalize to the population. (Goal: based on probability calculations and on the sample data, estimate, predict, or generalize conclusions.)

It determines whether the observed effects are strong enough to be generalized beyond the sample.

E.g., hypothesis testing, regression analysis…

Example: “We estimate that the average anxiety score for all students in Spain is also around 2.”
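
The difference between the two types can be sketched with one small data set. The scores are made up, and the interval uses the normal approximation (z = 1.96), which is an assumption chosen for simplicity; with so few cases a t-based interval would be wider.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical anxiety scores from a sample of 10 students.
scores = [1, 2, 3, 2, 2, 1, 3, 2, 2, 2]

# Descriptive: summarize the sample itself.
n = len(scores)
x_bar = mean(scores)
s = stdev(scores)

# Inferential: estimate the population mean with a rough 95% interval.
# Normal approximation only; treat this as a sketch, not a full analysis.
standard_error = s / sqrt(n)
lower = x_bar - 1.96 * standard_error
upper = x_bar + 1.96 * standard_error

print(f"Sample mean: {x_bar}")
print(f"Estimated range for the population mean: {lower:.2f} to {upper:.2f}")
```

The first print is a statement about the sample only; the second is a statement about the population, and it is only as trustworthy as the sample is random and representative.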

Explain Notation of Variables

In data analysis, we give variables letters to work with them more easily.

Symbol: X

Meaning: Predictor / Independent variable
______________________

Symbol: Y

Meaning: Outcome / Dependent variable
______________________

Symbol: U, V

Meaning: Outcome / Dependent variable

Example:

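Does stimulant treatment (Y) affect creativity (X)?
Creativity = X (here the outcome)
Treatment = Y (here the predictor)

⚠️ The letters are reversed from the usual convention (normally treatment = X and creativity = Y), so always check the context: the role of a variable comes from the hypothesis, not from its letter.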

🔢 Variables are coded as numbers in the dataset.

Example:

Variable: Gender

Codes: 1 = Male, 2 = Female
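
A coding scheme like this is essentially a lookup table. Here is a minimal sketch using the codes from the card; the column of coded values is made up for illustration.

```python
# Codebook from the card: numeric code -> category label.
GENDER_CODES = {1: "Male", 2: "Female"}

# Hypothetical column of coded values as it might be stored in the dataset.
gender_column = [1, 2, 2, 1, 2]

# Translate the codes back into readable labels.
labels = [GENDER_CODES[code] for code in gender_column]
print(labels)
```

The numbers are only labels here, so calculating something like an "average gender" from the codes would be meaningless.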

What are the Categories of Variables

📌 Two rules for categories of variables:

Mutual exclusion
→ One person can only be in one category.
✅ Example: You can’t be both Spanish and Italian in the same study.

Exhaustiveness
→ All people must fit into some category.
✅ Example: If “education level” is a variable, everyone must fall into one level.
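
Both rules can be checked mechanically. The sketch below assumes each participant's answer is stored as a set of chosen categories; the category names and the helper `check_categories` are hypothetical, made up for this illustration.

```python
# Hypothetical categories for "education level".
CATEGORIES = {"Primary", "Secondary", "University"}


def check_categories(responses, categories):
    """List rule violations as (participant index, broken rule) pairs."""
    problems = []
    for index, chosen in enumerate(responses):
        # More than one category chosen: categories are not mutually exclusive.
        if len(chosen) > 1:
            problems.append((index, "mutual exclusion"))
        # Nothing chosen, or an unlisted answer: categories are not exhaustive.
        if not chosen or not chosen <= categories:
            problems.append((index, "exhaustiveness"))
    return problems


responses = [
    {"Primary"},                  # fine: exactly one listed category
    {"Secondary", "University"},  # fits two categories at once
    {"Doctorate"},                # fits no listed category
    set(),                        # no category chosen at all
]
problems = check_categories(responses, CATEGORIES)
print(problems)
```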

Explain the types of Variables

Qualitative (names, types, can’t do math)

Just say what someone is.

Example: Gender, nationality.

Dichotomous: only 2 categories (example: Male / Female, Yes / No)

Polytomous: more than 2 categories (example: Spanish / Italian / German)

Quasi-quantitative / Ordinal (can be ordered, but not measured)

Can sort them, but can’t do real math.

Example: Low / Medium / High stress

Quantitative (numbers you can do math on)

Real numeric values: age, weight, time.

Explain Dichotomous vs. Polytomous

Term: Dichotomous
Meaning: Only 2 categories
Example: Male / Female, Yes / No

Term: Polytomous
Meaning: More than 2 categories
Example: Spanish / Italian / German

✅ Tip:

“Di” = 2

“Poly” = many

So it’s about how many choices the variable gives you.

Explain 🔸 Dichotomized vs. Polytomized, and why this applies only to quasi-quantitative (ordinal) variables

This means the variable was changed into 2 or more groups.

These terms apply when you transform a variable to group it.

Term: Dichotomized
Meaning: A variable has been simplified into 2 categories
Example: Age: “< 18” vs. “18+”

Term: Polytomized
Meaning: A variable has been grouped into more than 2 categories
Example: Income: low, medium, high

Example 1: Stress level (ordinal)

Original (5 levels):

Very low, Low, Medium, High, Very high → Ordinal

🔸 If you turn it into 2 categories:

Low stress (Very low, Low, Medium)

High stress (High, Very high)
→ You dichotomized the ordinal variable.

🔸 If you make 3 groups:

Low, Medium, High
→ You polytomized the ordinal variable.

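Why only for quasi-quantitative (ordinal) variables?

Because:

Ordinal variables have a ranking or order (like “low”, “medium”, “high”), but they’re not true numbers.

You can group or collapse these ordered levels into fewer categories, and when you do, that’s called dichotomizing or polytomizing.

The grouped result is itself an ordered (quasi-quantitative) variable, which is also what happens when a quantitative variable such as age is split into “< 18” vs. “18+”.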
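
The stress-level example can be written as two small recoding functions. The two-group split follows the card; the three-group split (which original levels go into "Low", "Medium", and "High") is an assumption, since the card does not spell it out.

```python
# The five ordered stress levels from the card, lowest to highest.
LEVELS = ["Very low", "Low", "Medium", "High", "Very high"]


def dichotomize(level):
    """Collapse five levels into two: everything up to Medium is low stress."""
    return "Low stress" if LEVELS.index(level) <= 2 else "High stress"


def polytomize(level):
    """Collapse five levels into three groups (assumed grouping)."""
    rank = LEVELS.index(level)
    if rank <= 1:
        return "Low"
    if rank == 2:
        return "Medium"
    return "High"


for level in LEVELS:
    print(level, "->", dichotomize(level), "/", polytomize(level))
```

Both functions rely on the position of a level in the ordered list, which is why the recoding needs a variable that has an order in the first place.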

Explain Quantitative → Continuous vs Discrete

All quantitative variables are either:

Type: Continuous
Example: Height, time, temperature
Can have decimals?: ✅ Yes (e.g., 172.5 cm)

Type: Discrete
Example: number of people, number of sessions
Can have decimals?: ❌ No (whole numbers only)

Explain the qualitative variable: nominal

🔵 Nominal (names only, no order)

Just categories

You can only say: same or different

Numbers = labels only

✅ Example:
0 = Single
1 = Married
2 = Divorced

You can’t say 1 is “more” than 0.

Explain the variable ordinal

🟠 Ordinal (order matters, but math doesn’t)

You can say who is higher or lower

But you can’t calculate how much higher

✅ Example:
Education level:
1 = Primary
2 = Secondary
3 = University
But: University (3) is not Primary (1) + Secondary (2), so arithmetic on the codes is meaningless.

What does arbitrariness mean?

Arbitrariness = how random or artificial the choice of numbers or zero is.

If a number (like 0) is arbitrary, it means it was chosen by humans, not because it represents “nothing” in the real world.

Explain interval

🔵 Interval Scale = Some arbitrariness

✔️ What you can do:

You can say how much more or less something is → add or subtract

You can measure the distance between values

❌ What you can't do:

You cannot say "twice as much" or "half as much", because the zero is arbitrary: it doesn’t mean “none”

🌡️ Example: Celsius temperature

0°C is not “no temperature”; it’s the freezing point of water, chosen by scientists as a reference point.

You can say: 20°C is 10°C warmer than 10°C ✅

But you can’t say 20°C is twice as hot as 10°C ❌

📅 Example: Years

The starting point of our calendar is also arbitrary: it doesn’t mark the beginning of time, it’s just a reference point taken from a historical or religious event.

You can say: 2024 - 2020 = 4 years passed ✅

But: the year 2000 is not “double” the year 1000 ❌

🔁 So why does the slide say "Interval = some arbitrariness"?

Because:

The numbers make sense in terms of distance

But the zero point is artificial (not a true zero)

That limits the math you can do
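
A quick way to see why ratios fail on an interval scale is to convert the same two temperatures to Kelvin, which the comparison card lists as a ratio scale. This is only an illustrative sketch; the function name `celsius_to_kelvin` is made up here.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


a_c, b_c = 20.0, 10.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

difference_c = a_c - b_c  # distance between the two values in Celsius
difference_k = a_k - b_k  # the same distance, measured in Kelvin
ratio_c = a_c / b_c       # 2.0, but only because of where 0 °C happens to sit
ratio_k = a_k / b_k       # about 1.035, the ratio with a true zero

print(difference_c, round(difference_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The difference survives the change of scale, but the "twice as hot" ratio does not, which is exactly the limit described for interval scales.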

🟠 What is a Ratio Scale?

A ratio scale is the most precise and most powerful measurement scale in statistics.

✅ It allows:

Ordering (like ordinal)

Meaningful differences (like interval)

Ratios (like “twice as much”, “half as much”)

🔑 Why? Because there is NO arbitrariness:

The zero point is real and absolute

0 = none of the thing you're measuring

It’s not chosen by humans; it represents a true absence

Everyday example:

Variable: Weight
Why it's ratio: 0 kg = no weight, 10 kg = twice 5 kg

In ratio scales, the 0 point is not chosen by humans; it's the natural starting point of that measurement.

0 kg = no mass
0 sec = no time
0 money = no money

So we can use all math operations:

Add / Subtract ✅
Multiply / Divide ✅
Use proportions ✅

⚖️ Comparison: Interval vs Ratio

Feature: Zero point
Interval: ❌ Arbitrary (chosen)
Ratio: ✅ True zero (none of it)
_______________________

Feature: Can values be negative?
Interval: ✅ Yes
Ratio: ❌ No (always 0 or more)
_______________________

Feature: Can compare ratios?
Interval: ❌ No (can’t say 2× or ½)
Ratio: ✅ Yes (can say 2×, ½, etc.)
_______________________

Feature: Examples
Interval: °C, years, IQ
Ratio: Weight, time, income, Kelvin

(28 cards)