Chapter 4 Flashcards

(96 cards)

1
Q

What are the three different types of data mining tasks that can be used depending on the business need?

A

Prediction, clustering, or association.

2
Q

What is classification in data mining prediction?

A

Predicting a class label (e.g., weather: sunny, cloudy, rainy).

3
Q

What is regression in data mining prediction?

A

Predicting a numeric value (e.g., temperature: 31).

4
Q

What kind of learning does classification employ?

A

Supervised learning.

5
Q

What is the most frequently used data mining method?

A

Classification.

6
Q

What is nominal data?

A

Data labelled or classified into mutually exclusive categories within a variable, with no inherent order among the categories.

7
Q

What is ordinal data?

A

Data whose values fall into categories that have a natural order (e.g., low, medium, high).

8
Q

What are the two steps in classification-type prediction?

A

Model development/training and model testing/deployment.

9
Q

What is the simple split method in classification?

A

Splitting the data into two mutually exclusive sets (e.g., a training set of about 70% and a testing set of about 30%).
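
Note: a minimal Python sketch of the simple split described in this card. The function name simple_split, the fixed seed, and the ten-record sample are illustrative assumptions, not taken from the slides.

```python
import random

def simple_split(records, train_fraction=0.7, seed=42):
    """Shuffle the records and cut them into two mutually exclusive sets."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed only for repeatability
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))                     # stand-in for ten data records
train, test = simple_split(data)
print(len(train), len(test))
```

Every record lands in exactly one of the two sets, which is what "mutually exclusive" means here.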

10
Q

What is k-fold cross-validation?

A

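The data is split into k mutually exclusive subsets (folds); k training/testing experiments are run, each using a different fold for testing and the remaining k-1 folds for training.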
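
Note: a rough Python sketch of how the k experiments in this card could be set up. The helper name k_fold_indices and the way records are dealt into folds (every k-th index) are assumptions made for illustration; real tools usually shuffle first.

```python
def k_fold_indices(n_records, k):
    """Yield (train_indices, test_indices) for each of the k experiments."""
    indices = list(range(n_records))
    # Fold i takes every k-th index starting at i, so the folds never overlap.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Training data is everything outside the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

experiments = list(k_fold_indices(10, k=5))
for train, test in experiments:
    print("train:", train, "test:", test)
```

Each record is used for testing exactly once and for training in the other k-1 experiments.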

11
Q

What is a confusion matrix used for?

A

Accuracy estimation in classification problems.

12
Q

What is a True Positive (TP)?

A

A correctly predicted positive case.

13
Q

What is a False Positive (FP)?

A

A negative case incorrectly predicted as positive.

14
Q

What is a False Negative (FN)?

A

A positive case incorrectly predicted as negative.

15
Q

What is a True Negative (TN)?

A

A correctly predicted negative case.
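
Note: a small Python sketch that tallies the four confusion-matrix cells (TP, FP, FN, TN) for a two-class problem and derives accuracy as (TP + TN) / total. The function name confusion_counts and the sample labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN and TN for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, really positive
        elif p == positive:
            fp += 1          # predicted positive, really negative
        elif a == positive:
            fn += 1          # predicted negative, really positive
        else:
            tn += 1          # predicted negative, really negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

actual    = [1, 1, 1, 0, 0, 0, 0, 1]   # hypothetical true labels
predicted = [1, 0, 1, 0, 1, 0, 0, 1]   # hypothetical model output
counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, accuracy)
```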

16
Q

What are classification techniques in data mining?

A

Classification techniques are methods used to assign data items to predefined classes based on their attributes.

17
Q

List the classification techniques mentioned in the slides.

A

Decision tree analysis, statistical analysis, neural networks, support vector machines, case-based reasoning, Bayesian classifiers, genetic algorithms, and rough sets.

18
Q

True or False: Decision tree analysis is a type of classification technique.

A

True.

19
Q

True or False: Neural networks are included under classification techniques.

A

True.

20
Q

Which classification technique is based on probabilistic reasoning?

A

Bayesian classifiers.

21
Q

Which classification technique mimics biological evolution concepts?

A

Genetic algorithms.

22
Q

Which classification technique focuses on similarity to past cases?

A

Case-based reasoning.

23
Q

Which classification technique separates classes using optimal boundaries?

A

Support vector machines.

24
Q

What is a decision tree?

A

A decision tree is a classification method that employs a divide-and-conquer approach to classify data.

25
What method do decision trees employ?
Decision trees employ a divide-and-conquer method.
26
How does a decision tree divide a training set?
It recursively divides a training set until each division consists of examples from one class.
27
What happens when decision tree division is complete?
Each division contains examples from only one class.
28
What is the first step in building a decision tree?
Create a root node and assign all of the training data to it.
29
What is the second step in building a decision tree?
Select the best splitting attribute.
30
What does “best splitting attribute” mean in decision trees?
It refers to the attribute that best separates the data into distinct classes based on a chosen splitting criterion.
31
What is the third step in building a decision tree?
Add a branch to the root node for each value of the split.
32
What does splitting the data into mutually exclusive subsets mean?
Each data instance belongs to only one subset created by the split.
33
What is done after creating branches in a decision tree?
The data is split into mutually exclusive subsets along the specific split.
34
What is the fourth step in building a decision tree?
Repeat steps 2 and 3 for every leaf node until the stopping criteria are met.
35
When does the decision tree building process stop?
When the stopping criteria are met.
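
Note: the four building steps in cards 28-35 can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the cards do not name a splitting criterion, so Gini impurity is assumed here; the stopping rule (pure node or no attributes left), all function names, and the tiny weather table are hypothetical; attributes are treated as categorical and pruning is omitted.

```python
from collections import Counter

def gini(rows):
    """Gini impurity of the class labels (the label is the last field of a row)."""
    counts = Counter(row[-1] for row in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())

def partition(rows, attr):
    """Split rows into mutually exclusive subsets, one per value of attr."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attr], []).append(row)
    return subsets

def best_attribute(rows, attrs):
    """Step 2: choose the attribute whose split has the lowest weighted impurity."""
    def weighted_impurity(attr):
        parts = partition(rows, attr).values()
        return sum(len(p) / len(rows) * gini(p) for p in parts)
    return min(attrs, key=weighted_impurity)

def build_tree(rows, attrs):
    """Step 1 is the first call: all training rows are assigned to the root."""
    labels = [row[-1] for row in rows]
    # Assumed stopping criterion: the node is pure or no attributes remain.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]   # leaf holds majority class
    attr = best_attribute(rows, attrs)
    remaining = [a for a in attrs if a != attr]
    # Step 3: one branch per value; step 4: repeat on each resulting subset.
    branches = {value: build_tree(subset, remaining)
                for value, subset in partition(rows, attr).items()}
    return {"attr": attr, "branches": branches}

def classify(tree, row):
    """Follow branches until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attr"]]]
    return tree

# Hypothetical training rows: (outlook, windy, play)
rows = [
    ("sunny", "no", "yes"), ("sunny", "yes", "yes"),
    ("rainy", "no", "yes"), ("rainy", "yes", "no"),
    ("cloudy", "no", "yes"), ("cloudy", "yes", "yes"),
]
tree = build_tree(rows, attrs=[0, 1])
print(tree)
print(classify(tree, ("rainy", "yes", None)))
```

The sketch raises a KeyError for attribute values it never saw during training; a fuller version would fall back to the majority class at that node.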
36
What are the main aspects in which decision tree algorithms differ?
Splitting criteria, stopping criteria, and pruning.
37
What do splitting criteria refer to in decision tree algorithms?
They refer to which variable is selected and what value is used to split the data.
38
What do stopping criteria determine in decision trees?
They determine when to stop building the tree.
39
What is pruning in decision trees?
A generalization method that determines which parts of the tree to remove.
40
Why is pruning used in decision trees?
To improve generalization and reduce overfitting.
41
List the most popular decision tree algorithms.
ID3, C4.5, C5, CART, CHAID, and M5.
42
True or False: CART is a decision tree algorithm.
True.
43
True or False: CHAID is used in decision tree construction.
True.
44
What is cluster analysis used for in data mining?
Automatic identification of natural groupings of things.
45
Which learning family does cluster analysis belong to?
The machine-learning family.
46
What type of learning does cluster analysis employ?
Unsupervised learning.
47
What does unsupervised learning mean in clustering?
Learning patterns from data without a predefined output or target variable.
48
True or False: Cluster analysis has an output variable.
False.
49
How does cluster analysis handle new instances?
It learns clusters from past data and then assigns new instances to those clusters.
50
What is cluster analysis called in marketing?
Segmentation.
51
What can clustering results be used to identify?
Natural groupings of customers.
52
How can clustering help in targeting or diagnostic purposes?
By identifying rules for assigning new cases to classes.
53
What role does clustering play in population analysis?
It provides characterization, definition, and labeling of populations.
54
True or False: Clustering decreases the size and complexity of problems for other data mining methods.
True.
55
How does clustering help with rare-event detection?
By identifying outliers in a specific domain.
56
What statistical methods are used in cluster analysis?
k-means, k-modes, and similar statistical methods.
57
Which neural network methods are used in cluster analysis?
Adaptive Resonance Theory (ART) and Self-Organizing Maps (SOM).
58
What role does fuzzy logic play in clustering?
It enables fuzzy clustering such as the fuzzy c-means algorithm.
59
Which evolutionary technique is used in clustering?
Genetic algorithms.
60
What is an important question to answer before clustering?
How many clusters?
61
What is k-means clustering?
A clustering algorithm that partitions data into a pre-determined number (k) of clusters.
62
What does “k” represent in k-means clustering?
The pre-determined number of clusters.
63
What is Step 0 in the k-means clustering algorithm?
Determine the value of k.
64
What is Step 1 in the k-means algorithm?
Generate k random points as initial cluster centers.
65
What is Step 2 in the k-means algorithm?
Assign each point to the nearest cluster center.
66
What is Step 3 in the k-means algorithm?
Re-compute the new cluster centers.
67
What is the repetition step in k-means clustering?
Repeat steps 2 and 3 until a convergence criterion is met.
68
What does convergence usually mean in k-means clustering?
The assignment of points to clusters becomes stable.
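
Note: Steps 0-3 and the convergence test from cards 63-68 can be sketched in plain Python as below. The names k_means and dist2, the fixed seed, and the six sample points are illustrative assumptions. One deliberate variation: the initial centers are sampled from the data points rather than generated anywhere in space.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1 (variant): pick k of the data points as the initial centers.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest cluster center.
        new_assignment = [min(range(k), key=lambda c: dist2(p, centers[c]))
                          for p in points]
        # Convergence: the assignment of points to clusters is stable.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: re-compute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assignment

# Step 0: choose k. Two well-separated groups of hypothetical points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, assignment = k_means(points, k=2)
print(centers, assignment)
```

Which cluster gets index 0 depends on the random start, so only the grouping itself is meaningful, not the cluster numbers.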
69
What is association rule mining?
A data mining method that finds interesting relationships (affinities) between variables.
70
True or False: Association rule mining is popular in business.
True.
71
What types of relationships does association rule mining discover?
Relationships between items or events.
72
Which learning family does association rule mining belong to?
The machine-learning family.
73
What type of learning does association rule mining use?
Unsupervised learning.
74
Does association rule mining have an output variable?
No, there is no output variable.
75
What is another name for association rule mining?
Market basket analysis or affinity analysis.
76
List business applications of association rule mining.
Cross-marketing, cross-selling, store design, catalog design, e-commerce site design, optimization of online advertising, product pricing, and sales/promotion configuration.
77
How is association rule mining used in medicine?
To identify relationships between symptoms and illnesses, diagnoses and patient characteristics, treatments in medical DSS, and genes and their functions.
78
What is the generic form of an association rule?
X ⇒ Y [S%, C%].
79
What do X and Y represent in an association rule?
Products and/or services.
80
What is X called in an association rule?
The left-hand-side (LHS) or antecedent.
81
What is Y called in an association rule?
The right-hand-side (RHS) or consequent.
82
What does support (S) measure in association rules?
How often X and Y occur together.
83
Write the formula for support of an association rule.
Supp(X→Y) = (Number of baskets that contain both X and Y) / Total number of baskets.
84
What does confidence (C) measure in association rules?
How often Y occurs given X.
85
Write the formula for confidence of an association rule.
Confidence(X→Y) = Supp(X→Y) / Supp(X).
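
Note: a short Python sketch of the two formulas above, computed over a handful of made-up baskets. The function names and the basket contents are assumptions chosen for illustration.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for b in baskets if items <= set(b)) / len(baskets)

def confidence(baskets, lhs, rhs):
    """Supp(X -> Y) / Supp(X): how often Y occurs given X."""
    return support(baskets, set(lhs) | set(rhs)) / support(baskets, lhs)

# Hypothetical transactions
baskets = [
    {"laptop", "antivirus"},
    {"laptop", "antivirus", "mouse"},
    {"laptop", "mouse"},
    {"antivirus"},
    {"laptop", "antivirus", "warranty"},
]
s = support(baskets, {"laptop", "antivirus"})        # 3 of 5 baskets
c = confidence(baskets, {"laptop"}, {"antivirus"})   # 3 of the 4 laptop baskets
print(f"laptop => antivirus [S={s:.0%}, C={c:.0%}]")
```

In the X ⇒ Y [S%, C%] notation, this sample data gives a rule with roughly 60% support and 75% confidence.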
86
True or False: All association rules are interesting and useful.
False.
87
Why are association rule algorithms needed?
To discover and identify association rules from data.
88
List algorithms used for discovering association rules.
Apriori, Eclat, FP-Growth, and their derivatives and hybrids.
89
What do association rule algorithms first identify?
Frequent item sets.
90
What happens to frequent item sets after identification?
They are converted into association rules.
91
What is the Apriori algorithm?
An algorithm that finds subsets common to at least a minimum number of itemsets based on support.
92
What criterion does Apriori use to find frequent subsets?
Minimum support value.
93
What approach does the Apriori algorithm use?
A bottom-up approach.
94
How does Apriori expand frequent subsets?
One item at a time, increasing subset size incrementally.
95
What is the progression of subset sizes in Apriori?
One-item subsets, then two-item subsets, then three-item subsets, and so on.
96
How are candidate groups evaluated in Apriori?
They are tested against the data for minimum support value.
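
Note: the bottom-up idea in cards 91-96 can be sketched in Python as follows. This is a simplified illustration, not a tuned implementation: the function name apriori, the candidate-generation shortcut (unions of frequent sets that are one item larger), and the sample baskets are all assumptions. It stops at frequent item sets and does not convert them into rules.

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Return {frequent itemset: support}, growing itemsets one item at a time."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def supp(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Start with one-item subsets that meet the minimum support value.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if supp(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = supp(itemset)
        size += 1
        # Candidate groups: unions of frequent sets that are one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Drop candidates with an infrequent subset, then test against the data.
        current = {c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, size - 1))
                   and supp(c) >= min_support}
    return frequent

# Hypothetical transactions
baskets = [
    {"laptop", "antivirus"},
    {"laptop", "antivirus", "mouse"},
    {"laptop", "mouse"},
    {"antivirus"},
    {"laptop", "antivirus", "warranty"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, s in frequent.items():
    print(sorted(itemset), s)
```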