Decision Trees Flashcards

(54 cards)

1
Q

What type of data structure, used in machine learning for both regression and classification, is based on a binary tree structure?

A

A decision tree.

2
Q

In a decision tree, what is the term for a node that represents a decision point and has two child nodes?

A

An internal node.

3
Q

How does a decision tree process a single data point to arrive at a classification or prediction?

A

It passes the data point down from the root to a leaf node, making a decision at each internal node based on a feature value.
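This root-to-leaf routing can be sketched in a few lines of Python. The `Node` class and the threshold rule below are illustrative, not taken from any particular library:

```python
# Minimal sketch of decision-tree inference: route a data point from the
# root to a leaf, branching on one feature test per internal node.

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature = feature      # index of the feature tested at this node
        self.threshold = threshold  # split threshold for that feature
        self.left = left            # child taken when feature value <= threshold
        self.right = right          # child taken when feature value > threshold
        self.label = label          # prediction stored at a leaf node

def predict(node, x):
    # Walk down from the root until a leaf is reached, then return its label.
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

# Tiny hand-built tree: one internal node splitting on feature 0 at 2.5.
tree = Node(feature=0, threshold=2.5,
            left=Node(label="A"), right=Node(label="B"))
print(predict(tree, [1.0]))  # "A"
print(predict(tree, [3.0]))  # "B"
```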

4
Q

What is the final outcome for a data point once it reaches a leaf node in a decision tree?

A

It is assigned the label or prediction value associated with that leaf node.

5
Q

List three common use cases for decision trees in business.

A

Credit scoring, fraud detection, and customer segmentation.

6
Q

In the context of decision trees, what does the term “serving” refer to?

A

The process where the trained model is used to make predictions on new, unseen data.

7
Q

A decision tree is what type of machine learning model: supervised or unsupervised?

A

Supervised.

8
Q

For a classification task, what two outputs does a decision tree typically emit for a given data point?

A

The class label and the probability of belonging to that class.

9
Q

What is the key advantage of decision trees regarding input data features compared to other models?

A

They can handle numerical and categorical features with equal ease, often without extensive data pre-processing.

10
Q

In the decision tree training process, what hyperparameter defines the maximum depth the tree can grow to?

A

The max_depth.

11
Q

What hyperparameter in decision tree training sets the minimum number of samples a node must have to be eligible for splitting?

A

The min_samples.
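In scikit-learn, one widely used implementation, these two hyperparameters surface as `max_depth` and `min_samples_split` (the exact names vary by library). A minimal sketch:

```python
# Sketch assuming scikit-learn; parameter names are specific to that library.
from sklearn.tree import DecisionTreeClassifier

X = [[0], [1], [2], [3], [4], [5], [6], [7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Cap the tree at depth 3 and require at least 4 samples in a node
# before it may be split further.
clf = DecisionTreeClassifier(max_depth=3, min_samples_split=4)
clf.fit(X, y)
print(clf.get_depth())  # stays within the cap of 3
```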

12
Q

During training, a decision tree algorithm splits a node by selecting the feature and threshold that achieves the greatest reduction in a ____ _____.

A

score function

13
Q

For a classification tree, what label is assigned to a leaf node during training?

A

The majority class label of the training data points that occupy that leaf node.

14
Q

For a regression tree, what prediction value is assigned to a leaf node during training?

A

The average of the target values among all the training data points that occupy that leaf node.

15
Q

What score function is typically used to decide the split for a classification decision tree?

A

Information gain (entropy reduction) or Gini impurity reduction.

16
Q

What score function is used as the splitting criterion for a regression decision tree?

A

The reduction in variance.

17
Q

How is feature importance calculated in a decision tree?

A

By calculating the weighted reduction in impurity (like entropy or Gini) attributable to a feature across all nodes where it was used for splitting.
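Assuming scikit-learn, where this impurity-based measure is exposed as the `feature_importances_` attribute, a small illustration with one informative feature and one noise feature:

```python
# Sketch assuming scikit-learn; feature_importances_ is that library's
# impurity-based importance measure.
from sklearn.tree import DecisionTreeClassifier

# Feature 0 cleanly separates the classes; feature 1 is pure noise.
X = [[0, 5], [1, 3], [2, 4], [10, 5], [11, 3], [12, 4]]
y = [0, 0, 0, 1, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.feature_importances_)  # feature 0 carries essentially all the weight
```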

18
Q

The term “random” in random forests signifies that the subsample of data points, features, and feature _____ are randomly chosen at each node.

A

thresholds

19
Q

What is a major limitation of vanilla decision trees regarding the training data?

A

The entire dataset must be loaded into memory, which can impose size limitations.

20
Q

What popular decision tree algorithm uses Gini impurity values for classification, chosen for computational efficiency?

A

CART (Classification and Regression Trees).

21
Q

What is the formula for calculating Entropy at a node D?

A

$Entropy(D) = -\sum_{i=1}^{k} p_i \log_2(p_i)$, where $p_i$ is the proportion of data points belonging to class i.
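The formula can be evaluated directly; a small sketch with illustrative class proportions:

```python
import math

def entropy(proportions):
    # -sum(p * log2(p)) over classes with nonzero proportion;
    # the p=0 terms are skipped because lim p->0 of p*log2(p) is 0.
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 (maximum uncertainty for two classes)
print(entropy([1.0]))       # 0.0 (pure node)
```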

22
Q

When is entropy at a node at its maximum value?

A

When the data is evenly split among all classes, reflecting maximum uncertainty.

23
Q

What are the two main reasons the Gini index is often preferred over entropy for splitting nodes in decision trees?

A

Computational efficiency (no logarithms) and greater robustness to small changes in class probabilities.

24
Q

What is the formula for the Gini index at node t?

A

$Gini(t) = 1 - \sum_{y=1}^{k} [p(y|t)]^2$, where $p(y|t)$ is the proportion of cases belonging to class y at node t.
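Like entropy, the Gini index is easy to evaluate directly; a small sketch with illustrative class proportions:

```python
def gini(proportions):
    # 1 - sum(p^2) over the class proportions at a node.
    return 1 - sum(p ** 2 for p in proportions)

print(gini([0.5, 0.5]))  # 0.5 (maximum impurity for two classes)
print(gini([1.0]))       # 0.0 (pure node)
```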

25

Q

In a regression tree, how is the variance at a node D calculated?

A

$Var(D) = \frac{1}{r} \sum_{i=1}^{r} (y_i - \mu)^2$, where $r$ is the number of data points and $\mu$ is the mean of their target values.
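The node variance formula maps directly to code; a small sketch with illustrative target values:

```python
def node_variance(targets):
    # Mean of the target values at the node, then the average squared deviation.
    mu = sum(targets) / len(targets)
    return sum((y - mu) ** 2 for y in targets) / len(targets)

print(node_variance([2.0, 4.0]))  # 1.0
```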
26

Q

What is the term for the limitation of decision trees where small changes in training data can lead to significantly different tree structures?

A

Brittleness or instability.

27

Q

What is a common ensemble method used to address the brittleness and improve the robustness of decision trees?

A

Random forests or gradient-boosted decision trees.

28

Q

Finding the globally optimal decision tree is computationally intractable; the problem is known to be __-________.

A

NP-complete

29

Q

What is the primary risk associated with growing a decision tree that is too deep?

A

Overfitting to the training data, which harms its ability to generalize to new data.

30

Q

What is a common technique used to prevent decision trees from overfitting?

A

Tree pruning, such as limiting the maximum depth or setting a minimum number of samples per leaf.

31

Q

Why is using batches of data not practical for training a single, vanilla decision tree?

A

Training a single tree requires the entire dataset to be loaded into memory to effectively select feature thresholds at each node.

32

Q

How can ensemble methods be used to train on a dataset that is too large to fit into memory?

A

By training multiple decision trees, each on a different subset of the dataset.

33

Q

What is a Hoeffding tree?

A

A type of decision tree that can grow incrementally as new data arrives, making it suitable for online learning.

34

Q

What is the general approach for updating an existing decision tree ensemble with new incremental data?

A

Mini-batch the new data to train a new decision tree, which is then incorporated into the existing ensemble, possibly replacing an older tree.

35

Q

In decision analysis, a decision tree is a flowchart-like structure where each internal node represents a _____ on an attribute.

A

test

36

Q

What do the leaf nodes in a decision tree represent?

A

A class label or a final decision taken after computing all attributes.

37

Q

What are the three types of nodes in a decision tree used for decision analysis?

A

Decision nodes, chance nodes, and end nodes.

38

Q

In a decision analysis flowchart, what shape typically represents a decision node?

A

A square.

39

Q

In a decision analysis flowchart, what shape typically represents a chance node?

A

A circle.

40

Q

A decision tree can be linearized into a set of if-then statements known as _____ _____.

A

decision rules
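The linearization of a tree into if-then rules can be illustrated with a toy example; the feature names and thresholds below are invented for illustration:

```python
# A two-level tree read off as nested if-then decision rules
# (hypothetical credit-decision features, not from any real model).
def classify(income, debt):
    if income <= 50_000:
        if debt > 10_000:
            return "deny"
        return "approve"
    return "approve"

print(classify(40_000, 20_000))  # "deny"
print(classify(60_000, 20_000))  # "approve"
```

Each root-to-leaf path in the original tree becomes one rule: the conjunction of the tests along the path implies the leaf's label.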
41

Q

What is a major disadvantage of decision trees regarding categorical variables with many levels?

A

Information gain is biased in favor of attributes with more levels.

42

Q

What is meant by a "pure node" in a decision tree?

A

A node where all the data belongs to a single class.

43

Q

While increasing tree depth can increase accuracy, what are two potential negative consequences?

A

Increased runtime and a higher risk of overfitting.

44

Q

What is a major drawback of using information gain as a splitting criterion?

A

It tends to select features that have more unique values.

45

Q

An ensemble method that generates many decision trees and uses their collective votes to make a final classification is called a _____ _____.

A

random forest

46

Q

What is the formula for Accuracy based on a confusion matrix?

A

Accuracy = (TP + TN) / (TP + TN + FP + FN).

47

Q

What evaluation metric, also known as the true positive rate (TPR), is calculated as TP / (TP + FN)?

A

Sensitivity or Recall.

48

Q

What does a low sensitivity score indicate about a classification model?

A

It indicates the model performs poorly at identifying actual positive samples.

49

Q

What evaluation metric, also known as the true negative rate (TNR), is calculated as TN / (TN + FP)?

A

Specificity.

50

Q

What evaluation metric, also known as positive predictive value (PPV), measures the proportion of positive predictions that were actually correct?

A

Precision.

51

Q

What is the formula for Precision?

A

Precision = TP / (TP + FP).
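The confusion-matrix metrics above can be computed together from the four cell counts; the counts below are invented for illustration:

```python
def metrics(tp, tn, fp, fn):
    # Standard confusion-matrix metrics from the four cell counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)       # sensitivity / true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    return accuracy, recall, specificity, precision

print(metrics(tp=40, tn=45, fp=5, fn=10))  # (0.85, 0.8, 0.9, 0.888...)
```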
52

Q

What is a key difference in structure between a decision tree and an influence diagram?

A

An influence diagram can represent information more compactly, focusing on relationships between events, whereas a decision tree shows explicit paths.

53

Q

In regression, the training process for a decision tree aims to select splits that maximize the reduction in _____.

A

variance

54

Q

What is a key difference between the Gini index and entropy regarding their mathematical computation?

A

The Gini index uses straightforward arithmetic (squaring), while entropy requires logarithmic calculations, making Gini computationally faster.