Week 1 - Introduction, Understanding data using SPSS Flashcards

(37 cards)

1
Q

What are the two categories of data based on structure?

A

Structured data, unstructured data

2
Q

What is structured data?

A

organised, often numeric, easy to store in databases (e.g., transaction data).

3
Q

What is unstructured data?

A

not organised, harder to analyse (e.g., contracts, texts, voices, video).

4
Q

What are the four categories of data based on characteristics?

A

Descriptive data
Behaviour data
Interaction data
Attitudinal data

5
Q

What does descriptive data include?

A

attributes, characteristics, geo/demographics.

6
Q

What does behaviour data include?

A

orders, transactions, payments, credit history.

7
Q

What does interaction data include?

A

emails, chat transcripts, click-streams.

8
Q

What does attitudinal data include?

A

opinions, preferences, needs, desires.

9
Q

What is Business Analytics (BA)?

A

The skills, technologies, and practices for continuous exploration of past business performance to gain insight and support fact-based decision making.

10
Q

What are the 4 main components of Business Analytics?

A

(Big) Data

Statistical & Quantitative Analysis

Explanatory or Predictive Models

Decision Making & Actions

11
Q

What types of questions does Business Analytics answer?

A

What happened? (Describe)

Why is this happening?

What if these trends continue?

What will happen next? (Predict)

What is the best that can happen? (Optimize)

12
Q

What are the three categories of analytics?

A

Descriptive Analytics
Predictive Analytics
Prescriptive Analytics

13
Q

What are descriptive analytics?

A

Using and visualising data (e.g., reports, scorecards, clustering) to understand performance.

14
Q

What are predictive analytics?

A

Using statistical/machine learning techniques (time-series, regression) to find relationships and predict outcomes.

15
Q

What are prescriptive analytics?

A

Using optimisation and simulation techniques to improve business performance, given objectives and constraints.

16
Q

Example of Descriptive Analytics.

A

Reports showing past sales trends, dashboards, and performance scorecards.

17
Q

Example of Predictive Analytics.

A

Using regression models to forecast future sales or customer churn.

18
Q

Example of Prescriptive Analytics.

A

Supply chain optimisation to minimise costs under constraints or scheduling staff to maximise efficiency.

19
Q

What are the two main types of structured data?

A

Categorical (qualitative)

Numerical (quantitative)

20
Q

What are the two subtypes of categorical (qualitative) data?

A

Nominal: Categories without order (e.g., gender, colors, countries).

Ordinal: Categories with implied order (e.g., rankings, education levels).

21
Q

What are the two subtypes of numerical (quantitative) data?

A

Interval: Numeric values where differences make sense, but no absolute zero (e.g., temperature in °C).

Ratio: Numeric values with a true zero, allowing all arithmetic operations, including ratios (e.g., weight, salary).
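
One way to see the interval/ratio distinction is to change units and check which comparisons survive. The following is a minimal sketch, not part of the course material: the values (10, 20, 30 °C; 5 and 10 kg) are made up for illustration, and the helper names are hypothetical.

```python
def celsius_to_fahrenheit(c):
    """Convert an interval-scale temperature from °C to °F."""
    return c * 9 / 5 + 32


def kg_to_pounds(kg):
    """Convert a ratio-scale weight from kilograms to pounds."""
    return kg * 2.20462


# Illustrative values only (not taken from the course data).
t_low, t_mid, t_high = 10.0, 20.0, 30.0   # temperatures in °C
w_light, w_heavy = 5.0, 10.0              # weights in kg

f_low, f_mid, f_high = (celsius_to_fahrenheit(t) for t in (t_low, t_mid, t_high))

# Interval scale: equal differences in °C remain equal differences in °F.
diffs_equal_in_f = abs((f_mid - f_low) - (f_high - f_mid)) < 1e-9

# Interval scale: ratios are NOT preserved, because 0 °C is not a true zero.
ratio_c = t_mid / t_low      # 20 °C / 10 °C
ratio_f = f_mid / f_low      # the same two temperatures in °F

# Ratio scale: a true zero means ratios do not depend on the unit.
ratio_kg = w_heavy / w_light
ratio_lb = kg_to_pounds(w_heavy) / kg_to_pounds(w_light)

print("differences preserved:", diffs_equal_in_f)
print("temperature ratio in °C vs °F:", ratio_c, round(ratio_f, 2))
print("weight ratio in kg vs lb:", ratio_kg, round(ratio_lb, 2))
```

The temperature ratio changes with the unit while the weight ratio does not, which is why statements like "twice as much" only make sense for ratio data.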

22
Q

Formula for the mean (x̄)?

A

x̄ = (∑ x_i) / n

23
Q

Formula for standard deviation (s)?

A

s = √( ∑(x_i - x̄)^2 / (n - 1) )
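
As a quick check of the mean and standard deviation formulas, here is a small sketch in Python (the course itself uses SPSS). The data values are made up, and the function names `sample_mean` and `sample_sd` are hypothetical; the result is compared against Python's built-in `statistics.stdev`, which also uses the n - 1 divisor.

```python
import math
import statistics


def sample_mean(xs):
    """Mean: sum of the values divided by the number of values n."""
    return sum(xs) / len(xs)


def sample_sd(xs):
    """Sample standard deviation: sqrt( sum((x_i - mean)^2) / (n - 1) )."""
    x_bar = sample_mean(xs)
    squared_devs = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(squared_devs / (len(xs) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

print("mean:", sample_mean(data))
print("sd (by formula):", sample_sd(data))
print("sd (statistics.stdev):", statistics.stdev(data))
```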

24
Q

What does skewness describe?

A

The shape of a distribution – whether it is symmetric, left-skewed (negative), or right-skewed (positive).

25
Q

What does right skewness (positive skew) mean?

A

A distribution with a long right tail (e.g., compensation levels, income distributions).

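
Right skew can be checked numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second), which is one common definition; statistical packages often report a bias-adjusted variant, so values may differ slightly. The pay figures are hypothetical, not the course dataset.

```python
import statistics


def moment_skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5,
    where m_k is the k-th central moment computed with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical pay figures (in thousands) with one very large value.
pay = [30, 32, 35, 38, 40, 45, 50, 60, 90, 250]
symmetric = [1, 2, 3, 4, 5]

print("skewness of pay:", round(moment_skewness(pay), 2))
print("skewness of symmetric data:", moment_skewness(symmetric))

# In a right-skewed distribution the long tail pulls the mean above the median.
print("mean:", statistics.mean(pay), "median:", statistics.median(pay))
```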
26
Q

Example of data with strong right-skewness?

A

Compensation levels – very varied, max = 16 × mean.

27
Q

How do compensation levels vary across industries?

A

Highest mean in Telecoms. Maximum overall in IT.

28
Q

What factors influence compensation besides industry?

A

Compensation is related to Net Sales, but other factors also play a role.

29
Q

Are there industry-specific compensation patterns?

A

Yes – fairly strong industry-specific relationships exist between Compensation and Net Sales.

30
Q

Pros and cons of a Dotplot

A

Pros: Simple, easy to construct. Shows individual data values. Good for small datasets.

Cons: Not suitable for large datasets (too cluttered). Doesn’t show distribution shape well for big data.

31
Q

Pros and cons of a Histogram

A

Pros: Shows distribution (shape, spread, skewness). Useful for large datasets. Good for continuous variables like salary.

Cons: Loses individual data details. Sensitive to bin width choice.

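
To illustrate the "sensitive to bin width" point, the sketch below bins the same made-up salary values with two different widths. The helper `histogram_counts` is hypothetical and only counts values per bin; it does not draw a chart.

```python
def histogram_counts(values, bin_width, start=0):
    """Count values per bin, where bin k covers
    [start + k * bin_width, start + (k + 1) * bin_width)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)
        left_edge = start + k * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical salaries (in thousands); not the course dataset.
salaries = [21, 24, 26, 29, 31, 34, 36, 39, 58, 95]

narrow = histogram_counts(salaries, bin_width=10)
wide = histogram_counts(salaries, bin_width=50)

# Print a simple text histogram for each bin width.
for label, counts in (("width 10", narrow), ("width 50", wide)):
    print(label)
    for left_edge, count in counts.items():
        print(f"  from {left_edge:>3}: " + "#" * count)
```

With narrow bins the two high values stand apart from the main cluster; with wide bins they merge into a single bar and the long right tail is much less visible.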
32
Q

Pros and cons of a Boxplot

A

Pros: Summarises data with median, quartiles, and outliers. Easy to compare multiple groups (e.g., industries). Good for skewed data like compensation.

Cons: Doesn’t show the full distribution shape. Can be hard to interpret without practice.

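
The numbers behind a boxplot can be computed directly. The sketch below assumes the common 1.5 × IQR rule for flagging outliers and Python's "inclusive" quartile method; the cards do not specify either, and other software may use slightly different quartile conventions. The pay figures are hypothetical.

```python
import statistics


def boxplot_summary(xs):
    """Return the quartiles, IQR fences, and outliers a boxplot would show.
    Assumes the 1.5 * IQR rule for outliers (a convention, not from the cards)."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical pay figures (in thousands) with one extreme value.
pay = [30, 32, 35, 38, 40, 45, 50, 60, 90, 250]

summary = boxplot_summary(pay)
for name, value in summary.items():
    print(f"{name}: {value}")
```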
33
Q

Pros and cons of a Pie Chart

A

Pros: Good for showing proportions/percentages. Easy for simple categorical data (e.g., % employees per industry).

Cons: Hard to compare slices accurately. Not suitable for many categories. Misleading if values are close.

34
Q

Pros and cons of a Bar Chart

A

Pros: Good for categorical comparisons (e.g., mean pay by industry). Clear visual differences between groups. Handles many categories better than pie charts.

Cons: Doesn’t show distribution within each category. Can oversimplify variation.

35
Q

What is the use of Boxplots by Industry in compensation analysis?

A

Compare pay distributions across industries. Show median pay, variation, and outliers. Highlight industries with high inequality or extreme salaries.

36
Q

What is the use of Mean Compensation by Industry (Bar Chart)?

A

Shows average pay per industry. Easy to identify highest/lowest paying industries. Simpler than boxplots but hides variation within industries.

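
The values plotted in a mean-by-industry bar chart are simple group averages. The sketch below uses hypothetical records, chosen only to echo the pattern in the cards (highest mean in Telecoms, largest single value in IT); the function name `mean_by_group` is also an assumption.

```python
from collections import defaultdict


def mean_by_group(records):
    """Average the values for each group label in (group, value) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in records:
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical (industry, compensation) records; not the course dataset.
records = [
    ("Telecoms", 120.0), ("Telecoms", 100.0),
    ("IT", 40.0), ("IT", 50.0), ("IT", 180.0),
    ("Retail", 50.0), ("Retail", 70.0),
]

means = mean_by_group(records)
for industry, mean_pay in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{industry}: {mean_pay}")

# The mean hides within-group variation: IT holds the largest single value
# even though its mean is lower than that of Telecoms.
top_industry = max(means, key=means.get)
largest_record = max(records, key=lambda r: r[1])
print("highest mean:", top_industry, "| largest single value in:", largest_record[0])
```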
37
Q

What is the use of a Scatterplot (Compensation vs. Sales)?

A

Pros: Shows relationship between compensation and net sales. Identifies patterns or correlations (e.g., higher sales linked to higher pay). Can reveal industry-specific trends.

Cons: Harder to interpret with overlapping points. Doesn’t show distribution within industries.
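
The strength of a linear pattern in a scatterplot is often summarised with Pearson's correlation coefficient. The cards do not name a specific measure, so this is an assumed choice, and the sales and compensation figures below are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y divided by the
    product of their standard deviations (the divisors cancel out)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: higher net sales paired with higher compensation.
net_sales = [10, 20, 30, 40, 50]
compensation = [1.0, 1.8, 3.1, 3.9, 5.2]

r = pearson_r(net_sales, compensation)
print("Pearson r:", round(r, 3))
```

A value near +1 indicates a strong positive linear relationship, but it says nothing about other factors or about differences between industries, which is why the scatterplot itself is still worth inspecting.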