What are the main ideas of data analysis?
Statistics is part of any scientific research—it helps us make sense of data and answer questions.
Two types of research:
Qualitative: open-ended, descriptive (e.g. interviews, observations).
Quantitative: numbers-based (e.g. test scores, reaction times).
Explain the research process
Data Analysis is a core part of the research process:
Ask a research question
Gather documentation (literature, background)
Create a hypothesis
Design your study
Collect data
Analyze data ← This is where statistics comes in
Interpret what it means
Example:
“Do students who sleep more have better grades?”
→ Collect sleep hours and grade data → analyze statistically → interpret the results.
Population – What Do We Study?
Population: Everyone or everything that has the characteristic you’re interested in.
Could be: people, animals, institutions, etc.
Two types:
Finite (quantifiable): You can count every element (e.g., “Deaths in 2021”).
Infinite (non-quantifiable): Theoretically unending, or too large to count (e.g., “All possible coin tosses”).
Example:
Population = All university students in Spain → That’s finite, but very large.
Population = All future tosses of a coin → Infinite.
Explain Sample and Units of Analysis
Sample = A subset of the population.
You usually can’t study everyone, so you study a representative group.
The units of analysis are the actual elements being studied (e.g., people, coin tosses).
✅ A good sample must:
Be large enough (size matters).
Be randomly selected (to avoid bias).
Example:
Population = All UCAM students.
Sample = 150 UCAM students randomly selected.
Unit of analysis = each individual student.
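A minimal sketch of drawing a simple random sample in Python. The ID numbers, the population size of 5,000, and the seed are invented for illustration; only the sample size of 150 comes from the example above.

```python
import random

# Hypothetical population: ID numbers standing in for all UCAM students.
population = list(range(1, 5001))        # N = 5000 is an assumed figure

rng = random.Random(42)                  # fixed seed so the draw is repeatable
sample = rng.sample(population, k=150)   # simple random sample, no repeats

N = len(population)   # population size
n = len(sample)       # sample size
print(N, n)           # 5000 150
```

Each element of `sample` is one unit of analysis. Because every ID has the same chance of being picked, the selection itself does not favor any group of students.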
What do we use to study it? Parameters and statistics
Quantitative values (numbers) used to describe:
Population → Called parameters → Use Greek letters (μ, σ, π); population size is written with a capital N
Sample → Called statistics → Use Latin letters (X̄, S, P); sample size is written with a lowercase n
Property: Mean (average)
Population (Parameter): μ
Sample (Statistic): X̄
____________________________________
Property: Standard deviation
Population (Parameter): σ
Sample (Statistic): S
____________________________________
Property: Proportion
Population (Parameter): π
Sample (Statistic): P
____________________________________
Property: Size
Population (Parameter): N
Sample (Statistic): n
____________________________________
Example:
μ = the average anxiety score of all students in Spain.
X̄ = average anxiety score of your sample.
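The difference between a parameter and a statistic can be sketched in Python as follows. The anxiety scores are randomly generated stand-ins, not real data, and the 0 to 10 scale and the sizes (10,000 and 100) are assumptions made for the example.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of anxiety scores on an assumed 0-10 scale.
population = [rng.randint(0, 10) for _ in range(10_000)]

mu = statistics.mean(population)        # parameter: population mean (μ)
sigma = statistics.pstdev(population)   # parameter: population SD (σ)

sample = rng.sample(population, k=100)
x_bar = statistics.mean(sample)         # statistic: sample mean (X̄)
s = statistics.stdev(sample)            # statistic: sample SD (S), n - 1 denominator

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}, s = {s:.2f}")
```

In real research μ and σ are usually unknown, which is why X̄ and S are computed from the sample and used to estimate them.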
What Do We Measure? → Variables
Variable: A feature of the units of analysis. Something that can change between individuals.
Examples: Age, anxiety, blood pressure.
_____________________________
Constant: Same value for everyone. The opposite of a variable.
Example: All participants are female (gender = constant here).
🔄 Cross-section vs. Longitudinal
These terms refer to the time dimension of your study design.
Term: Cross-sectional
What it means: All data collected at one single time point.
__________________
Term: Longitudinal
What it means: Data collected at multiple time points over a period of time.
A hypothesis usually links 2 variables:
Cause = the variable we think makes a difference
Result = what we think is affected
But the words we use for the variables depend on the type of study.
_____________________________
🔷 1. Experimental / Quasi-Experimental Studies
🧪 You’re comparing groups or testing something.
📌 You use:
Independent variable (IV) = the cause (you change or compare it)
Dependent variable (DV) = the result (you measure it)
✅ Example:
Does coffee improve memory?
Coffee = IV
Memory score = DV
Do smokers have higher anxiety? (You can’t assign people to smoke, so this is quasi-experimental.)
Smoking status = IV
Anxiety level = DV
_____________________
🔶 2. Cross-sectional / Longitudinal Studies
📋 You are observing things over time or at one time point. No manipulation.
📌 You use:
Predictive variable = the “cause” you think predicts something
Result variable = the “effect” you measure
✅ Examples:
Do people who sleep more feel less anxious? (Cross-sectional: measured once)
Sleep hours = Predictive variable
Anxiety score = Result variable
Does social media use affect GPA over 1 year? (Longitudinal)
Social media use = Predictive
GPA = Result
Find the dependent and independent variables in these:
Energy drinks decrease fatigue.
Walking daily increases life expectancy.
Generous behavior is more frequent among people with lower socioeconomic status.
Energy drinks (IV) decrease fatigue (DV)
Daily walking (IV) increases life expectancy (DV)
Socioeconomic status (IV) is linked to generous behavior (DV)
Reminder: The IV causes or predicts the DV.
Can generalizable conclusions be drawn from the data collected in one sample?
Normally, no. One sample on its own rarely supports a generalizable conclusion.
The sample would have to be very large, representative of the population, and collected systematically. Often a single sample is not big enough, not representative enough, or not collected systematically enough.
What are the Two Types of Statistics?
Type: Descriptive
Purpose: Summarize the data from your sample. (Goal: to summarize a set of information in order to interpret it and draw conclusions.)
E.g., graphs, percentages, means.
Example questions:
How many people are younger than 18 years old?
Do more anxious people go to the doctor more often?
Example: “The average anxiety score in our sample is 2.”
________________
Type: Inferential
Purpose: Use that sample to predict, estimate, or generalize to the population. (Goal: using probability calculations on the sample data, to estimate, predict, or generalize conclusions, and to determine whether the observed effects are strong enough to be generalized beyond the sample.)
E.g., hypothesis testing, regression analysis…
Example: “We estimate that the average anxiety score for all students in Spain is also around 2.”
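A rough Python sketch of the two kinds of statistics, using an invented sample of 12 anxiety scores. The interval uses the normal approximation (z ≈ 1.96), which is an assumption made to keep the example short; with so few observations a t-based interval would be somewhat wider.

```python
import math
import statistics

# Invented anxiety scores for a sample of 12 students.
scores = [2, 1, 3, 2, 2, 4, 1, 2, 3, 2, 1, 1]

# Descriptive: summarize the sample itself.
n = len(scores)
x_bar = statistics.mean(scores)
s = statistics.stdev(scores)

# Inferential: estimate the population mean with an approximate 95% interval.
z = statistics.NormalDist().inv_cdf(0.975)   # about 1.96
margin = z * s / math.sqrt(n)
ci = (x_bar - margin, x_bar + margin)

print(f"sample mean = {x_bar}")
print(f"approx. 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The first part only describes these 12 students. The second part makes a probability-based statement about the population they were drawn from, which is only meaningful if the sample was selected at random.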
Explain Notation of Variables
In data analysis, we give variables letters to work with them more easily.
Symbol: X
Meaning: Predictor / Independent variable
______________________
Symbol: Y
Meaning: Outcome / Dependent variable
______________________
Symbol: U, V
Meaning: Outcome / Dependent variable
Example:
Does stimulant treatment (Y) affect creativity (X)?
Treatment = Y (the predictor here)
Creativity = X (the outcome here)
⚠️ The letters are reversed from the usual convention (normally treatment = X and creativity = Y). The roles of the variables do not change, only their labels, so always check the context.
🔢 Variables are coded as numbers in the dataset.
E.g.:
Variable: Gender
Code: 1, 2
Meaning: Male, Female
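As a small illustration, a codebook like this can be written as a Python dictionary. The codes and labels come from the example above; the data column is made up.

```python
# Codebook: numeric code -> label (taken from the example above).
GENDER_CODES = {1: "Male", 2: "Female"}

# Hypothetical raw column as it might be stored in the dataset.
gender_column = [1, 2, 2, 1, 2]

# Translate the stored codes back into readable labels.
labels = [GENDER_CODES[code] for code in gender_column]
print(labels)  # ['Male', 'Female', 'Female', 'Male', 'Female']
```

The numbers here are only labels: averaging `gender_column` would give a number, but it would not mean anything.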
What are the Categories of Variables
📌 Two rules for categories of variables:
Mutual exclusion
→ One person can only be in one category.
✅ Example: You can’t be both Spanish and Italian in the same study.
Exhaustiveness
→ All people must fit into some category.
✅ Example: If “education level” is a variable, everyone must fall into one level.
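The two rules can be checked mechanically. The sketch below is one possible way to do it in Python; the category list, the respondent IDs, and the helper name `check_coding` are all hypothetical.

```python
# Assumed category scheme for "education level".
CATEGORIES = {"Primary", "Secondary", "University"}

def check_coding(responses):
    """Return (IDs breaking mutual exclusion, IDs breaking exhaustiveness).

    `responses` maps a respondent ID to the set of categories recorded for them.
    """
    not_exclusive = [pid for pid, ticked in responses.items()
                     if len(ticked & CATEGORIES) > 1]
    not_exhaustive = [pid for pid, ticked in responses.items()
                      if not (ticked & CATEGORIES)]
    return not_exclusive, not_exhaustive

responses = {
    "p1": {"Primary"},
    "p2": {"Secondary", "University"},  # in two categories at once
    "p3": {"Vocational"},               # fits none of the listed categories
}
print(check_coding(responses))  # (['p2'], ['p3'])
```

A respondent like `p3` signals that the category scheme is incomplete; adding a category such as “Other” is a common way to restore exhaustiveness.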
Explain the types of Variables
Qualitative (nominal): Just say what someone is.
Example: Gender, nationality.
Can be dichotomous (only 2 categories, e.g., Male / Female, Yes / No) or polytomous (more than 2 categories, e.g., Spanish / Italian / German).
Quasi-quantitative (ordinal): Can sort them, but can’t do real math.
Example: Low / Medium / High stress
Quantitative: Real numeric values.
Example: age, weight, time.
Explain Dichotomous vs. Polytomous
Term: Dichotomous
Meaning: Only 2 categories
Example: Male / Female, Yes / No
Term: Polytomous
Meaning: More than 2 categories
Example: Spanish / Italian / German
✅ Tip:
“Di” = 2
“Poly” = many
So it’s about how many choices the variable gives you.
Explain Dichotomized vs. Polytomized, and why this applies only to quasi-quantitative (ordinal) variables
This means the variable was changed into 2 or more groups.
These terms apply when you transform a variable to group it.
Term: Dichotomized
Meaning: A variable has been simplified into 2 categories
Example: Age: “< 18” vs. “18+”
Term: Polytomized
Meaning: A variable has been grouped into more than 2 categories
Example: Income: low, medium, high
Example 1: Stress level (ordinal)
Original (5 levels):
Very low, Low, Medium, High, Very high → Ordinal
🔸 If you turn it into 2 categories:
Low stress (Very low, Low, Medium)
High stress (High, Very high)
→ You dichotomized the ordinal variable.
🔸 If you make 3 groups:
Low, Medium, High
→ You polytomized the ordinal variable.
Why only for quasi-quantitative (ordinal) variables?
Because:
Ordinal variables have a ranking or order (like “low”, “medium”, “high”), but they’re not true numbers.
You can group or collapse these into fewer categories — and when you do, that’s called dichotomizing or polytomizing.
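The stress-level example can be sketched in Python as below. The two-group cut follows the example above; the three-group mapping is an assumption, since the notes do not say which original levels go into “Low”, “Medium”, and “High”.

```python
# The five original levels, listed in rank order (ordinal).
LEVELS = ["Very low", "Low", "Medium", "High", "Very high"]

def dichotomize(level):
    # Cut after "Medium", as in the example above.
    return "Low stress" if LEVELS.index(level) <= 2 else "High stress"

def polytomize(level):
    # Assumed grouping: the two lowest and the two highest levels are merged.
    rank = LEVELS.index(level)
    if rank <= 1:
        return "Low"
    if rank == 2:
        return "Medium"
    return "High"

answers = ["Very low", "Medium", "High", "Very high", "Low"]
print([dichotomize(a) for a in answers])
# ['Low stress', 'Low stress', 'High stress', 'High stress', 'Low stress']
print([polytomize(a) for a in answers])
# ['Low', 'Medium', 'High', 'High', 'Low']
```

Both functions rely only on the rank of each level, which is why the grouping needs a variable whose categories have an order.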
Explain Quantitative → Continuous vs Discrete
All quantitative variables are either:
Type: Continuous
Example: Height, time, temperature
Can have decimals?: ✅ Yes (e.g., 172.5 cm)
Type: Discrete
Example: number of people, number of sessions
Can have decimals?: ❌ No (whole numbers only)
Explain the qualitative variable: nominal
🔵 Nominal (names only, no order)
Just categories
You can only say: same or different
Numbers = labels only
✅ Example:
0 = Single
1 = Married
2 = Divorced
You can’t say 1 is “more” than 0.
Explain the ordinal variable
🟠 Ordinal (order matters, but math doesn’t)
You can say who is higher or lower
But you can’t calculate how much higher
✅ Example:
Education level:
1 = Primary
2 = Secondary
3 = University
But the codes are not quantities: University (3) is not Primary (1) + Secondary (2).
What does arbitrariness mean?
Arbitrariness = how random or artificial the choice of numbers or zero is.
If a number (like 0) is arbitrary, it means it was chosen by humans, not because it represents “nothing” in the real world.