Data Mining Flashcards

Question 1

Q

What is data mining?

Answer

A

The process of combining big data, machine learning, and statistics to uncover hidden patterns and make predictions.

Question 2

Q

What are the main steps in a data mining workflow?

Answer

A

Prepare/clean data, choose an algorithm, train (fit) the model, validate, deploy/update.

Question 3

Q

Difference between OLTP and OLAP?

Answer

A

OLTP (online transaction processing) handles real-time transactional updates; OLAP (online analytical processing) analyses large historical datasets.
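
The contrast can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `sales` table and its rows are invented, and a real OLAP workload would normally run on a separate warehouse rather than on the transactional database itself.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: small, frequent writes, each committed as its own transaction.
for region, amount in [("north", 120.0), ("south", 80.0), ("north", 30.0)]:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sales (region, amount) VALUES (?, ?)", (region, amount)
        )

# OLAP-style work: a read-only aggregate over the accumulated history.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

The writes touch one row at a time, while the analytical query scans the whole table and summarises it by region.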

Question 4

Q

What’s the difference between descriptive and predictive data mining?

Answer

A

Descriptive finds and explains patterns (stats, clustering, dimensionality reduction); Predictive uses patterns to make predictions (classification, regression).

Question 5

Q

What is the CRISP-DM methodology?

Answer

A

Six-step framework: business understanding, data understanding, data preparation, modelling, evaluation, deployment.

Question 6

Q

What is the difference between descriptive and inferential statistics?

Answer

A

Descriptive summarises the data at hand (mean, variance); Inferential uses a sample to draw conclusions about a wider population and to test hypotheses (t-tests, regression).
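
A small sketch of the distinction, using only the standard library. The sample values and the helper name `one_sample_t` are made up for illustration; the helper returns only the t statistic, which would still need to be compared against a t distribution with n − 1 degrees of freedom to get a p-value.

```python
import math
import statistics

# Hypothetical sample of eight measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Descriptive statistics: summarise the sample itself.
mean = statistics.mean(sample)
variance = statistics.variance(sample)  # sample variance (divides by n - 1)

def one_sample_t(values, mu0):
    """t statistic for the null hypothesis that the population mean is mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / standard_error

# Inferential statistics: ask whether the population mean could plausibly be 4.
t_stat = one_sample_t(sample, 4)
print(mean, variance, t_stat)
```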

Question 7

Q

Supervised vs Unsupervised learning?

Answer

A

Supervised uses labelled data (classification, regression); unsupervised finds patterns without labels (clustering, dimensionality reduction).

Question 8

Q

Examples of classification tasks?

Answer

A

Spam detection, species classification, predicting yes/no outcomes.

Question 9

Q

Examples of regression tasks?

Answer

A

Predicting continuous values like stock prices or sales amounts.

Question 10

Q

What is an association rule?

Answer

A

An if–then pattern, e.g. ‘If bread, then butter.’

Question 11

Q

What are support, confidence, lift, and conviction?

Answer

A

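Support = fraction of transactions containing the itemset; Confidence = probability of Y given X, i.e. support(X ∪ Y) / support(X); Lift = confidence(X → Y) / support(Y), how much more likely Y is with X than if X and Y were independent (lift > 1 suggests a positive association); Conviction = (1 − support(Y)) / (1 − confidence(X → Y)), how much more often the rule would fail if X and Y were independent.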
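
The four measures can be computed directly from transaction counts. The sketch below uses the standard formulas (confidence = support(X ∪ Y) / support(X), lift = confidence / support(Y), conviction = (1 − support(Y)) / (1 − confidence)); the baskets and the function names are hypothetical and exist only for this example.

```python
# Hypothetical toy baskets, invented for illustration only.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "butter", "jam"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def rule_metrics(x, y, baskets):
    """Support, confidence, lift and conviction for the rule X -> Y."""
    s_xy = support(set(x) | set(y), baskets)
    s_x = support(x, baskets)
    s_y = support(y, baskets)
    confidence = s_xy / s_x
    lift = confidence / s_y
    # Conviction is treated as infinite when the rule never fails.
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - s_y) / (1 - confidence)
    return {
        "support": s_xy,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }

metrics = rule_metrics({"bread"}, {"butter"}, transactions)
print(metrics)
```

For these baskets the rule bread → butter has support 0.6, confidence 0.75, lift 1.25 and conviction 1.6, up to floating-point rounding.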

Question 12

Q

What algorithm is commonly used for association rules?

Answer

A

Apriori algorithm (finds frequent itemsets using downward closure property).
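
The downward closure property says that every subset of a frequent itemset must itself be frequent, so any candidate with an infrequent subset can be discarded without counting it. A minimal pure-Python sketch of that idea follows; `apriori_sketch` and the baskets are hypothetical, and this is not how any particular library implements the algorithm.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        previous = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        # Count step: keep only candidates that meet the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy baskets.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread"],
    ["milk"],
    ["bread", "butter", "jam"],
]
frequent_itemsets = apriori_sketch(baskets, min_support=0.6)
print(frequent_itemsets)
```

With a minimum support of 0.6, only bread, butter and the pair {bread, butter} survive; milk and jam are dropped at level 1, so no pair containing them is ever counted.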

Question 13

Q

Which Python library provides TransactionEncoder?

Answer

A

mlxtend (the class lives in the mlxtend.preprocessing module).

Question 14

Q

What does TransactionEncoder do?

Answer

A

Converts a list of transactions into a boolean array for market basket analysis.
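
The kind of conversion described here can be sketched in plain Python. This is not mlxtend's implementation: `one_hot_transactions` is a hypothetical helper, and the baskets are invented, but the output has the same shape, one boolean column per distinct item and one row per transaction.

```python
def one_hot_transactions(transactions):
    """Return (sorted item names, boolean rows), one row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[item in set(t) for item in columns] for t in transactions]
    return columns, rows

# Hypothetical baskets.
columns, rows = one_hot_transactions(
    [["bread", "butter"], ["milk"], ["bread", "milk"]]
)
print(columns)
for row in rows:
    print(row)
```

Each `True` marks an item that appears in that transaction, which is the tabular form that frequent-itemset functions typically expect.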

Question 15

Q

Which function finds frequent itemsets in mlxtend?

Answer

A

apriori(), from mlxtend.frequent_patterns.

Question 16

Q

Which function generates association rules?

Answer

A

association_rules(), from mlxtend.frequent_patterns.

Question 17

Q

What is Flask in Python?

Answer

A

A lightweight web framework for creating servers and APIs.

Question 18

Q

What caused the Graphviz error?

Answer

A

Graphviz system executable (dot) not installed or not in PATH.
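
One way to confirm this diagnosis is to check whether `dot` is visible on PATH from the same Python environment. The sketch below uses only the standard library; `find_dot` is a hypothetical helper name.

```python
import shutil

def find_dot():
    """Return the full path of the Graphviz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")

dot_path = find_dot()
if dot_path is None:
    print("Graphviz 'dot' not found on PATH; install Graphviz or add its bin directory to PATH.")
else:
    print(f"Graphviz 'dot' found at: {dot_path}")
```

Installing the Python `graphviz` package alone does not provide the `dot` executable; the Graphviz system software has to be installed separately.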

Question 19

Q

How can you export code as PDF from VS Code?

Answer

A

For Jupyter: Export to PDF or HTML. For .py: Use ‘Print to PDF’ or an extension like PrintCode.

Question 20

Q

What was the feature mismatch error about?

Answer

A

The model expected more features than the JSON/test input provided, so prediction failed with a feature-count mismatch on the input X.
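
A mismatch like this can be caught before the model is called by checking the incoming payload against the feature list used in training. The sketch below is an assumption about how such a guard might look: the feature names, the payload, and `build_feature_row` are all hypothetical.

```python
def build_feature_row(payload, expected_features):
    """Return values ordered like expected_features, or raise naming the gaps."""
    missing = [name for name in expected_features if name not in payload]
    if missing:
        raise ValueError(
            f"Expected {len(expected_features)} features; missing: {missing}"
        )
    # Extra keys in the payload are ignored; order follows the training columns.
    return [payload[name] for name in expected_features]

# Hypothetical training columns and JSON-style input.
expected_features = ["age", "income", "tenure"]
good_payload = {"income": 52000, "age": 41, "tenure": 3}
bad_payload = {"age": 41, "income": 52000}

print(build_feature_row(good_payload, expected_features))

try:
    build_feature_row(bad_payload, expected_features)
except ValueError as error:
    print(error)
```

Validating the payload this way turns a shape error raised deep inside the model into a message that names the missing fields.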

Question 21

Q

How did your workflow improve during the course?

Answer

A

I moved from running Jupyter in CMD to using VS Code with a virtual environment.

Question 22

Q

What was the key milestone in your technical workflow?

Answer

A

Running Jupyter notebooks directly in VS Code with .venv.

Question 23

Q

How does association rule mining connect to FP&A/Power BI work?

Answer

A

Helps understand patterns and relationships (cross-sell, basket-type analysis) that can be replicated in reporting.

Question 24

Q

What did you learn from dataset differences compared to the instructor’s examples?

Answer

A

Real-world data often needs extra cleaning and preparation, unlike ‘clean’ demo data.

(24 cards)