What type of data structure, used in machine learning for both regression and classification, is based on a binary tree structure?
A decision tree.
In a decision tree, what is the term for a node that represents a decision point and has two child nodes?
An internal node.
How does a decision tree process a single data point to arrive at a classification or prediction?
It passes the data point down from the root to a leaf node, making a decision at each internal node based on a feature value.
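As an illustration, this root-to-leaf walk can be sketched in a few lines of Python; the `Node` layout, feature indices, and labels below are hypothetical, not taken from any particular library.

```python
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature      # index of the feature tested at this internal node
        self.threshold = threshold  # split threshold for that feature
        self.left = left            # child taken when feature value <= threshold
        self.right = right          # child taken when feature value > threshold
        self.value = value          # label/prediction stored at a leaf (None if internal)

def predict_one(node, x):
    """Walk a data point x from the root to a leaf, one feature test per node."""
    while node.value is None:                 # internal node: keep descending
        if x[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.value                         # leaf: emit its stored label

# Tiny two-level tree: split on feature 0 at 5.0, then on feature 1 at 2.0.
tree = Node(feature=0, threshold=5.0,
            left=Node(value="A"),
            right=Node(feature=1, threshold=2.0,
                       left=Node(value="B"),
                       right=Node(value="C")))

print(predict_one(tree, [3.0, 9.9]))  # -> A
print(predict_one(tree, [7.0, 1.0]))  # -> B
```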
What is the final outcome for a data point once it reaches a leaf node in a decision tree?
It is assigned the label or prediction value associated with that leaf node.
List three common use cases for decision trees in business.
Credit scoring, fraud detection, and customer segmentation.
In the context of decision trees, what does the term “serving” refer to?
The process where the trained model is used to make predictions on new, unseen data.
A decision tree is what type of machine learning model: supervised or unsupervised?
Supervised.
For a classification task, what two outputs does a decision tree typically emit for a given data point?
The class label and the probability of belonging to that class.
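For instance, if each leaf stores the class counts of the training points that reached it, both outputs fall out directly; the counts below are invented for illustration.

```python
from collections import Counter

def leaf_output(class_counts):
    """Return (majority label, its proportion) from a leaf's training-class counts."""
    total = sum(class_counts.values())
    label, count = max(class_counts.items(), key=lambda kv: kv[1])
    return label, count / total

counts = Counter({"spam": 8, "ham": 2})   # 10 training points fell into this leaf
label, proba = leaf_output(counts)
print(label, proba)  # -> spam 0.8
```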
What is the key advantage of decision trees regarding input data features compared to other models?
They can handle numerical and categorical features with equal ease, often without extensive data pre-processing.
In the decision tree training process, what hyperparameter defines the maximum depth the tree can grow to?
The max_depth.
What hyperparameter in decision tree training sets the minimum number of samples a node must have to be eligible for splitting?
The min_samples (called min_samples_split in scikit-learn).
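A hedged sketch of how these two hyperparameters act as stopping rules during tree growth; the default values below are illustrative, not standard.

```python
def should_stop(depth, n_samples, max_depth=3, min_samples=2):
    """Stop splitting a node when the depth cap is hit or the node is too small."""
    return depth >= max_depth or n_samples < min_samples

print(should_stop(depth=3, n_samples=50))   # -> True  (reached max_depth)
print(should_stop(depth=1, n_samples=1))    # -> True  (below min_samples)
print(should_stop(depth=1, n_samples=50))   # -> False (keep splitting)
```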
During training, a decision tree algorithm splits a node by selecting the feature and threshold that achieves the greatest reduction in a ____ _____.
score function
For a classification tree, what label is assigned to a leaf node during training?
The majority class label of the training data points that occupy that leaf node.
For a regression tree, what prediction value is assigned to a leaf node during training?
The average of the target values among all the training data points that occupy that leaf node.
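These two leaf-assignment rules can be sketched directly; the toy label and target lists below are illustrative.

```python
from collections import Counter
from statistics import mean

def classification_leaf(labels):
    """Majority class among the training labels that landed in this leaf."""
    return Counter(labels).most_common(1)[0][0]

def regression_leaf(targets):
    """Mean of the training targets that landed in this leaf."""
    return mean(targets)

print(classification_leaf(["cat", "dog", "cat"]))  # -> cat
print(regression_leaf([2.0, 4.0, 6.0]))            # -> 4.0
```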
What score function is typically used to decide the split for a classification decision tree?
The reduction in entropy (information gain) or the reduction in Gini impurity.
What score function is used as the splitting criterion for a regression decision tree?
The reduction in variance.
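Variance reduction is the parent node's variance minus the sample-weighted variance of the two children; a minimal sketch with made-up target values:

```python
from statistics import pvariance

def variance_reduction(parent, left, right):
    """Weighted decrease in (population) variance from splitting parent into left/right."""
    n = len(parent)
    weighted_child = (len(left) / n) * pvariance(left) + (len(right) / n) * pvariance(right)
    return pvariance(parent) - weighted_child

# Splitting the low targets from the high ones removes almost all variance.
parent = [1.0, 2.0, 9.0, 10.0]
print(variance_reduction(parent, [1.0, 2.0], [9.0, 10.0]))  # -> 16.0
```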
How is feature importance calculated in a decision tree?
By calculating the weighted reduction in impurity (like entropy or Gini) attributable to a feature across all nodes where it was used for splitting.
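A hedged sketch of that calculation: sum, over every node that splits on a feature, the node's sample fraction times its impurity decrease, then normalize (as scikit-learn does). The per-node records below are invented, not taken from a real tree.

```python
def feature_importances(splits, n_total):
    """splits: list of (feature_name, n_samples_at_node, impurity_decrease) tuples,
    one per internal node. Returns normalized importances summing to 1."""
    importances = {}
    for feature, n_node, decrease in splits:
        importances[feature] = importances.get(feature, 0.0) + (n_node / n_total) * decrease
    total = sum(importances.values())
    return {f: v / total for f, v in importances.items()}

# Hypothetical tree: "age" used at the root (100 samples) and one deeper node.
splits = [("age", 100, 0.20), ("income", 60, 0.10), ("age", 40, 0.05)]
print(feature_importances(splits, n_total=100))
```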
The term “random” in random forests signifies that the subsample of data points, features, and feature _____ are randomly chosen at each node.
thresholds
What is a major limitation of vanilla decision trees regarding the training data?
The entire dataset must be loaded into memory, which can impose size limitations.
What popular decision tree algorithm uses Gini impurity values for classification, chosen for computational efficiency?
CART (Classification and Regression Trees).
What is the formula for calculating Entropy at a node D?
$Entropy(D) = -\sum_{i=1}^{k} p_i \log_2(p_i)$, where $p_i$ is the proportion of data points belonging to class i.
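The formula translates directly into code; a minimal sketch taking the class proportions $p_i$ as input:

```python
import math

def entropy(proportions):
    """Entropy(D) = -sum_i p_i * log2(p_i); empty classes (p = 0) contribute 0."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))  # -> 1.0 (even two-class split: maximum uncertainty)
```

A pure node (`entropy([1.0])`) has entropy 0, the minimum.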
When is entropy at a node at its maximum value?
When the data is evenly split among all classes, reflecting maximum uncertainty.
What are the two main reasons the Gini index is often preferred over entropy for splitting nodes in decision trees?
Computational efficiency (no logarithms) and greater robustness to small changes in class probabilities.
What is the formula for the Gini index at node t?
$Gini(t) = 1 - \sum_{y=1}^{k} [p(y|t)]^2$, where $p(y|t)$ is the proportion of cases belonging to class y at node t.
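As with entropy, the Gini index is a one-liner over the class proportions $p(y|t)$; note the absence of logarithms, which underlies its computational advantage:

```python
def gini(proportions):
    """Gini(t) = 1 - sum over classes of p(y|t)^2."""
    return 1.0 - sum(p * p for p in proportions)

print(gini([0.5, 0.5]))   # -> 0.5 (maximum for two classes)
print(gini([1.0, 0.0]))   # -> 0.0 (pure node)
```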