Data Mining Flashcards

(24 cards)

1
Q

What is data mining?

A

The process of applying machine learning and statistics to large datasets to uncover hidden patterns and make predictions.

2
Q

What are the main steps in a data mining workflow?

A

Prepare/clean data, choose algorithm, train (fit) model, validate, deploy/update.

3
Q

Difference between OLTP and OLAP?

A

OLTP (online transaction processing) handles real-time transactional updates; OLAP (online analytical processing) analyses large historical datasets.
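A minimal sketch of the contrast, using Python's built-in sqlite3 module. The `sales` table and its rows are made up for illustration, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: small individual writes, each committed as its own transaction.
for region, amount in [("north", 120.0), ("south", 80.0), ("north", 30.0)]:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sales (region, amount) VALUES (?, ?)", (region, amount)
        )

# OLAP-style work: a read-only aggregate over the accumulated history.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)
```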

4
Q

What’s the difference between descriptive and predictive data mining?

A

Descriptive finds and explains patterns (stats, clustering, dimensionality reduction); Predictive uses patterns to make predictions (classification, regression).

5
Q

What is the CRISP-DM methodology?

A

Six-step framework: business understanding, data understanding, data preparation, modelling, evaluation, deployment.

6
Q

What is the difference between descriptive and inferential statistics?

A

Descriptive summarises the data at hand (mean, variance); inferential uses a sample to draw conclusions about a wider population and to test hypotheses (t-tests, regression).
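A small sketch of both kinds of statistic on the same sample, using only the standard library. The measurements and the null-hypothesis value `mu0` are made up for illustration. Turning the t statistic into a p-value would need a t-distribution table or a statistics package.

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 5.0, 5.4]  # made-up measurements

# Descriptive: summarise the sample itself.
mean = statistics.mean(sample)
variance = statistics.variance(sample)  # sample variance (n - 1 denominator)

# Inferential: one-sample t statistic for H0 "the population mean equals mu0".
mu0 = 5.0  # hypothetical value under the null hypothesis
std_err = math.sqrt(variance / len(sample))
t_stat = (mean - mu0) / std_err
print(mean, variance, t_stat)
```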

7
Q

Supervised vs Unsupervised learning?

A

Supervised uses labelled data (classification/regression), unsupervised finds patterns without labels (clustering, dimensionality reduction).

8
Q

Examples of classification tasks?

A

Spam detection, species classification, predicting yes/no outcomes.

9
Q

Examples of regression tasks?

A

Predicting continuous values like stock prices or sales amounts.

10
Q

What is an association rule?

A

An if–then pattern, e.g. ‘If bread, then butter.’

11
Q

What are support, confidence, lift, and conviction?

A

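Support = fraction of transactions containing the itemset; Confidence = probability of Y given X, i.e. support(X and Y) / support(X); Lift = confidence / support(Y), how much more likely Y is with X than if X and Y were independent; Conviction = (1 - support(Y)) / (1 - confidence), a measure of how strongly Y depends on X.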
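The four measures can be worked through by hand on a toy dataset. The baskets below are invented for illustration, and `support` and `rule_metrics` are hypothetical helper names, not library functions.

```python
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(x, y):
    """Return support, confidence, lift and conviction for the rule x -> y."""
    sup_xy = support(x | y)
    confidence = sup_xy / support(x)
    lift = confidence / support(y)
    # Conviction is infinite when the rule never fails (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support(y)) / (1 - confidence)
    return {
        "support": sup_xy,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }

metrics = rule_metrics({"bread"}, {"butter"})
print(metrics)
```

In this toy data the rule bread -> butter has confidence 0.75 but lift below 1, because butter already appears in 80% of all baskets. High confidence alone does not show that X makes Y more likely.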

12
Q

What algorithm is commonly used for association rules?

A

Apriori algorithm (finds frequent itemsets using the downward closure property: every subset of a frequent itemset is also frequent).
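A plain-Python sketch of the idea, for study purposes only. It is not mlxtend's implementation. `apriori_sketch` is a hypothetical name and the baskets are made up. The pruning step is where downward closure is used: a candidate is dropped if any of its smaller subsets is not already frequent.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]
frequent_itemsets = apriori_sketch(baskets, min_support=0.6)
print(frequent_itemsets)
```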

13
Q

Which Python library provides TransactionEncoder?

A

mlxtend.preprocessing

14
Q

What does TransactionEncoder do?

A

Converts a list of transactions into a boolean array for market basket analysis.
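To make the transformation concrete, here is a plain-Python sketch of the same idea. It is a stand-in for illustration, not mlxtend's code: `encode_transactions` is a hypothetical helper, and sorting the column names is this sketch's own choice.

```python
def encode_transactions(transactions):
    """Return (columns, rows): item names and one boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[item in t for item in columns] for t in transactions]
    return columns, rows

columns, rows = encode_transactions(
    [["bread", "butter"], ["bread", "jam"], ["milk"]]
)
print(columns)
for row in rows:
    print(row)
```

Each row answers "is this item in this basket?" for every known item, which is the table shape that frequent-itemset functions work on.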

15
Q

Which function finds frequent itemsets in mlxtend?

A

apriori()

16
Q

Which function generates association rules?

A

association_rules()

17
Q

What is Flask in Python?

A

A lightweight web framework for creating servers and APIs.

18
Q

What caused the Graphviz error?

A

Graphviz system executable (dot) not installed or not in PATH.
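One quick way to check for this cause from Python is to look for `dot` on PATH with the standard library. The helper name `graphviz_dot_available` is hypothetical, and the check only confirms that the executable can be found, not that it works.

```python
import shutil

def graphviz_dot_available():
    """Return True if a `dot` executable can be found on PATH."""
    return shutil.which("dot") is not None

if not graphviz_dot_available():
    print("Graphviz 'dot' not found: install Graphviz and add its bin directory to PATH.")
```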

19
Q

How can you export code as PDF from VS Code?

A

For Jupyter: Export to PDF or HTML. For .py: Use ‘Print to PDF’ or an extension like PrintCode.

20
Q

What was the feature mismatch error about?

A

The model expected more features than the JSON/test input provided, so the feature count of the input X did not match what the model was trained on.
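A defensive check along these lines can catch the problem before the input reaches the model. This is a sketch under assumptions: the feature names in `EXPECTED_FEATURES` and the helper `payload_to_row` are hypothetical, and a real project would take the list from its own training columns.

```python
import json

EXPECTED_FEATURES = ["age", "income", "tenure"]  # hypothetical training columns

def payload_to_row(payload_json, expected=EXPECTED_FEATURES):
    """Parse a JSON object and return its values in training-column order.

    Raises ValueError naming the missing or unexpected keys.
    """
    record = json.loads(payload_json)
    missing = [f for f in expected if f not in record]
    extra = [k for k in record if k not in expected]
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing}, unexpected={extra}")
    return [record[f] for f in expected]

row = payload_to_row('{"age": 41, "income": 52000, "tenure": 3}')
print(row)
```

Failing early with the names of the missing keys is easier to debug than a bare count mismatch raised inside the model.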

21
Q

How did your workflow improve during the course?

A

I moved from running Jupyter in CMD to using VS Code with a virtual environment.

22
Q

What was the key milestone in your technical workflow?

A

Running Jupyter notebooks directly in VS Code with .venv.

23
Q

How does association rule mining connect to FP&A/Power BI work?

A

Helps understand patterns and relationships (cross-sell, basket-type analysis) that can be replicated in reporting.

24
Q

What did you learn from dataset differences compared to the instructor’s examples?

A

Real-world data often needs extra cleaning and preparation, unlike ‘clean’ demo data.