What is data mining?
The process of applying statistics, machine learning, and database techniques to large datasets (big data) to uncover hidden patterns and make predictions.
What are the main steps in a data mining workflow?
Prepare/clean the data, choose an algorithm, train (fit) the model, validate it on held-out data, then deploy it and update it as new data arrives.
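The workflow steps can be sketched end to end with scikit-learn. This is a minimal illustration, assuming scikit-learn is installed; the iris dataset, the decision tree and the 75/25 split are arbitrary choices, and the imputer only stands in for the cleaning that messier data would need.

```python
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# 1. Prepare the data: load a sample dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 2. Choose an algorithm: a decision tree, wrapped in a pipeline so the same
#    cleaning step (median imputation of missing values) runs at fit and predict time.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    DecisionTreeClassifier(max_depth=3, random_state=42),
)

# 3. Train (fit) on the training split only.
model.fit(X_train, y_train)

# 4. Validate on data the model has not seen.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")

# 5. Deploy/update: in practice the fitted pipeline would be saved
#    (e.g. with joblib.dump) and refitted when new data arrives.
```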
Difference between OLTP and OLAP?
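OLTP (Online Transaction Processing) handles many small, real-time transactional inserts and updates; OLAP (Online Analytical Processing) runs complex, read-heavy queries over large historical datasets.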
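The contrast shows up in the shape of the queries. The sketch below uses Python's built-in sqlite3 with a hypothetical sales table purely for illustration; in a real setup the OLTP database and the OLAP warehouse would normally be separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style workload: many small writes, each committed as its own transaction.
orders = [("North", 120.0), ("South", 80.0), ("North", 200.0), ("East", 50.0)]
for region, amount in orders:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", (region, amount))

# A typical OLTP update touches one row, located by its key.
with conn:
    conn.execute("UPDATE sales SET amount = ? WHERE id = ?", (90.0, 2))

# OLAP-style workload: a read-only aggregate that scans the whole history.
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(summary)
```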
What’s the difference between descriptive and predictive data mining?
Descriptive finds and explains patterns in existing data (summary statistics, clustering, association rules, dimensionality reduction); Predictive uses those patterns to estimate unknown or future values (classification, regression).
What is the CRISP-DM methodology?
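CRISP-DM (Cross-Industry Standard Process for Data Mining) is a six-phase framework: business understanding, data understanding, data preparation, modelling, evaluation, deployment. The phases are iterative, so projects often loop back to earlier ones.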
What is the difference between descriptive and inferential statistics?
Descriptive summarises the data at hand (mean, variance); Inferential uses a sample to draw conclusions about a wider population and to test hypotheses (t-tests, confidence intervals, regression).
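A small sketch of the difference, assuming SciPy is available. The two samples are hypothetical daily sales figures for two store layouts, invented only to show the calls: the descriptive part just summarises each sample, while the inferential part (Welch's t-test) asks whether the difference is likely to hold beyond these samples.

```python
import statistics

from scipy import stats

# Hypothetical daily sales for two store layouts (illustrative numbers only).
layout_a = [102, 98, 110, 105, 99, 101, 108]
layout_b = [112, 115, 109, 118, 111, 116, 113]

# Descriptive: summarise each sample as it is.
for name, sample in [("A", layout_a), ("B", layout_b)]:
    print(name, round(statistics.mean(sample), 2), round(statistics.variance(sample), 2))

# Inferential: test the hypothesis that the population means differ,
# using Welch's t-test (does not assume equal variances).
result = stats.ttest_ind(layout_a, layout_b, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

A small p-value here would suggest the layouts differ in general, not just in these seven days; the means alone cannot say that.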
Supervised vs Unsupervised learning?
Supervised learning trains on labelled data (classification/regression); unsupervised learning finds patterns in unlabelled data (clustering, dimensionality reduction).
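One way to see the difference is to run both kinds of model on the same features. This sketch assumes scikit-learn and uses the iris dataset only as a convenient example: the classifier is given the labels y, while KMeans sees X alone.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: the labels y are provided and the model learns to predict them.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Training accuracy:", round(clf.score(X, y), 2))

# Unsupervised: only X is used; KMeans groups similar rows into clusters whose
# numbering has no fixed relation to the true species labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", sorted(int((km.labels_ == k).sum()) for k in range(3)))
```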
Examples of classification tasks?
Spam detection, species classification, predicting yes/no outcomes.
Examples of regression tasks?
Predicting continuous values like stock prices or sales amounts.
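A regression model outputs a number rather than a class. The sketch below assumes NumPy and scikit-learn; the ad-spend and sales figures are synthetic, generated from an assumed relationship (sales = 3 × spend + 20, plus noise) purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): monthly ad spend and sales, both in thousands.
rng = np.random.default_rng(0)
spend = rng.uniform(5, 50, size=40).reshape(-1, 1)
sales = 3 * spend.ravel() + 20 + rng.normal(0, 2, size=40)

model = LinearRegression().fit(spend, sales)

# The prediction is a continuous value, not a class label.
forecast = model.predict(np.array([[30.0]]))[0]
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(f"Predicted sales at 30k spend: {forecast:.1f}k")
```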
What is an association rule?
An if–then pattern, e.g. ‘If bread, then butter.’
What are support, confidence, lift, and conviction?
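Support = fraction of transactions containing the itemset; Confidence = P(Y | X) = support(X ∪ Y) / support(X); Lift = confidence / support(Y), i.e. how much more likely Y is when X is present than in general (1 = independent); Conviction = (1 − support(Y)) / (1 − confidence), the ratio of how often the rule would be wrong if X and Y were independent to how often it is actually wrong (higher = stronger dependency of Y on X).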
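The four metrics can be computed by hand to check the formulas. This is a plain-Python sketch over a handful of hypothetical baskets; the function names are illustrative, not from any library.

```python
import math

# Hypothetical baskets (one set of items per transaction).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(x, y, transactions):
    """Support, confidence, lift and conviction for the rule X -> Y."""
    s_x = support(x, transactions)
    s_y = support(y, transactions)
    s_xy = support(x | y, transactions)
    confidence = s_xy / s_x        # P(Y | X)
    lift = confidence / s_y        # 1 means X and Y look independent
    # Conviction: how often the rule would be wrong under independence,
    # divided by how often it is actually wrong; infinite if it is never wrong.
    if math.isclose(confidence, 1.0):
        conviction = math.inf
    else:
        conviction = (1 - s_y) / (1 - confidence)
    return {"support": s_xy, "confidence": confidence,
            "lift": lift, "conviction": conviction}


for x, y in [({"bread"}, {"butter"}), ({"jam"}, {"bread"})]:
    m = rule_metrics(x, y, transactions)
    print(sorted(x), "->", sorted(y), {k: round(v, 3) for k, v in m.items()})
```

On these baskets, bread → butter has confidence 0.75 but lift below 1 (about 0.94), because butter is common anyway; jam → bread has confidence 1, lift 1.25 and infinite conviction.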
What algorithm is commonly used for association rules?
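Apriori algorithm (builds frequent itemsets level by level using the downward closure property: every subset of a frequent itemset is also frequent, so any candidate with an infrequent subset can be pruned without counting it).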
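The join-and-prune idea behind Apriori fits in a short plain-Python sketch. This is a teaching version written for these notes (not mlxtend's implementation), run on hypothetical baskets with an arbitrary minimum support of 0.3.

```python
from itertools import combinations


def apriori_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: merge frequent (k-1)-itemsets into candidate k-itemsets.
        prev = list(current)
        candidates = {p | q for p in prev for q in prev if len(p | q) == k}
        # Prune (downward closure): a candidate with any infrequent
        # (k-1)-subset cannot be frequent, so it is never counted.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical baskets.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
    ["bread", "butter", "jam"],
]
result = apriori_itemsets(transactions, min_support=0.3)
for itemset, s in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

Here {bread, butter, jam} is pruned without a support count because its subset {butter, jam} is infrequent.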
Which Python library provides TransactionEncoder?
mlxtend.preprocessing
What does TransactionEncoder do?
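Converts a list of transactions (each a list of items) into a one-hot encoded boolean array, with one row per transaction and one column per distinct item, for market basket analysis.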
Which function finds frequent itemsets in mlxtend?
apriori(), in mlxtend.frequent_patterns
Which function generates association rules?
association_rules(), also in mlxtend.frequent_patterns
What is Flask in Python?
A lightweight web framework for creating servers and APIs.
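A minimal Flask API for serving predictions might look like the sketch below, assuming Flask is installed. The /predict route, the feature names and the threshold rule standing in for a trained model are all hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical input schema: the feature names and order a model was trained on.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [f for f in FEATURES if f not in payload]
    if missing:
        # Reject incomplete input with a clear 400 instead of a server error.
        return jsonify(error=f"missing features: {missing}"), 400
    row = [float(payload[f]) for f in FEATURES]
    # Placeholder rule standing in for a trained model's model.predict([row]).
    label = "large" if row[2] > 2.5 else "small"
    return jsonify(prediction=label)
```

Saved as app.py, it can be served locally with flask --app app run (recent Flask versions) and exercised by POSTing JSON to /predict; app.test_client() allows the same checks without starting a server.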
What caused the Graphviz error?
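The Graphviz system executable (dot) was not installed or not on PATH; the pip graphviz package is only a Python wrapper and needs the system program to render anything.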
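A quick way to diagnose this from Python is to ask the standard library whether dot is visible on PATH. The helper name below is illustrative.

```python
import shutil


def graphviz_available() -> bool:
    """Return True if the Graphviz 'dot' executable can be found on PATH."""
    return shutil.which("dot") is not None


if graphviz_available():
    print("Graphviz found at", shutil.which("dot"))
else:
    print("Graphviz 'dot' not found: install Graphviz, add its bin folder "
          "to PATH, then restart the terminal/VS Code.")
```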
How can you export code as PDF from VS Code?
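For Jupyter notebooks: use Export and choose PDF or HTML (PDF export goes through nbconvert and needs a LaTeX installation; exporting to HTML and printing that to PDF avoids this). For .py files: use an extension such as PrintCode, which opens the code in a browser, then choose ‘Print to PDF’.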
What was the feature mismatch error about?
The model was trained on more features than the JSON/test input supplied, so the number of columns in X (the input feature matrix) did not match what the model expected and prediction failed.
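A defensive pattern is to rebuild the input row in the exact training column order and fail early with a readable message. This sketch assumes scikit-learn and NumPy; iris stands in for the course model, and the feature names and helper functions are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train on the 4 iris features, so the model records n_features_in_ == 4.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Hypothetical names for those 4 columns, in training order.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def missing_features(payload):
    """List the expected features absent from a JSON-style dict."""
    return [f for f in FEATURES if f not in payload]


def payload_to_row(payload):
    """Build a 1 x n_features array in training column order, or fail early."""
    missing = missing_features(payload)
    if missing:
        raise ValueError(f"missing {missing}; model expects {model.n_features_in_} features")
    return np.array([[payload[f] for f in FEATURES]], dtype=float)


row = payload_to_row({"sepal_length": 5.1, "sepal_width": 3.5,
                      "petal_length": 1.4, "petal_width": 0.2})
print(row.shape, model.predict(row))
```

Without such a check, scikit-learn itself rejects a row with the wrong number of columns by raising a ValueError, but the message names only counts, not which fields were missing.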
How did your workflow improve during the course?
I moved from running Jupyter in CMD to using VS Code with a virtual environment.
What was the key milestone in your technical workflow?
Running Jupyter notebooks directly in VS Code with .venv.
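To confirm that a notebook kernel in VS Code really is using the .venv interpreter, a cell like this sketch can be run; inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base Python installation.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix differs from sys.base_prefix)."""
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the kernel is set up correctly, the interpreter path printed should sit inside the project's .venv folder.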
How does association rule mining connect to FP&A/Power BI work?
It reveals product relationships (cross-sell, basket-type analysis) that can then be reproduced and tracked in FP&A reports and Power BI dashboards.
What did you learn from dataset differences compared to the instructor’s examples?
Real-world data often needs extra cleaning and preparation, unlike ‘clean’ demo data.
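A typical clean-up before basket analysis is sketched below with pandas. The raw line-item export is hypothetical, built to show common problems (missing values, inconsistent case and whitespace, duplicated rows), and the result is one item list per invoice in the form TransactionEncoder expects.

```python
import pandas as pd

# Hypothetical raw line-item export with typical real-world problems.
raw = pd.DataFrame({
    "invoice": ["A1", "A1", "A1", "A2", "A2", None, "A3"],
    "item": [" Bread", "butter ", "butter ", "MILK", None, "jam", "Jam"],
})

clean = (
    raw.dropna(subset=["invoice", "item"])                         # drop rows missing a key field
       .assign(item=lambda d: d["item"].str.strip().str.lower())  # normalise item names
       .drop_duplicates()                                          # remove repeated line items
)

# One list of items per invoice, ready for one-hot encoding.
baskets = clean.groupby("invoice")["item"].apply(list).tolist()
print(baskets)
```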