id5059 Flashcards

(38 cards)

1
Q

What is the purpose of boosting methods in ensemble techniques?

A

To train a sequence of simple predictors that correct the errors of previous ones

Boosting methods improve the performance of weak classifiers.

2
Q

Name the two boosting methods discussed.

A
  • AdaBoost (Adaptive Boosting)
  • Gradient Boosting

These methods are used to enhance the performance of weak classifiers.

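As a supplementary sketch (assuming scikit-learn is available; the toy dataset and hyperparameters are illustrative, not from the course), both boosting methods can be fit like this:

```python
# Hedged sketch: assumes scikit-learn; dataset and parameters are made up.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
gbt = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

print(ada.score(X, y), gbt.score(X, y))  # training accuracy of each ensemble
```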
3
Q

In AdaBoost, what happens to the training data instances when training the ith predictor?

A

Instances are weighted to increase the influence of those misclassified by the ensemble

This helps improve the accuracy of the ensemble model.

4
Q

What is the AdaBoost model based on?

A

A weighted voting classifier

The predictions are made based on the performance of the predictors.

5
Q

In AdaBoost, what does a weight of 2 for a training instance imply?

A

The instance contributes twice as much to the objective function

This means it has a greater influence on the training process.

6
Q

What is the initial weight of each training instance in AdaBoost?

A

w₁⁽ⁱ⁾ = 1/m for all i

Here, m is the number of training instances.

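As a supplementary sketch of the weighting scheme in the cards above (the misclassification pattern and the predictor weight α are made-up illustrative values, not from the course):

```python
import numpy as np

# Illustrative AdaBoost instance weighting: start from uniform weights
# w_1^(i) = 1/m, then upweight the instances the current predictor got wrong.
m = 5                                        # number of training instances
w = np.full(m, 1.0 / m)                      # initial weights, all 1/m

misclassified = np.array([False, True, False, True, False])  # assumed errors
alpha = 0.5                                  # assumed predictor weight

w = w * np.exp(alpha * misclassified)        # boost misclassified instances
w = w / w.sum()                              # renormalize to sum to 1

print(w)  # misclassified instances now carry more weight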
7
Q

What is a key characteristic of Gradient Boosting?

A

Each predictor is trained to directly correct the errors of the previous ones

This sequential training improves the overall model accuracy.

8
Q

In Gradient Boosting for Regression, what is the initial value of F₀(x)?

A

F₀(x) = 0

This serves as the starting point for the predictions.

9
Q

What does the learning rate parameter (η) do in Gradient Boosting?

A

Controls the contribution of each predictor to the final prediction

Affects how quickly the model learns.

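A minimal sketch of the regression procedure from the cards above (assumes scikit-learn for the base trees; the data, tree depth, and η = 0.1 are illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

eta = 0.1                  # learning rate: scales each predictor's contribution
F = np.zeros_like(y)       # F_0(x) = 0, the starting point

for _ in range(100):
    tree = DecisionTreeRegressor(max_depth=2).fit(X, y - F)  # fit the residuals
    F = F + eta * tree.predict(X)        # add the scaled correction

print(np.mean((y - F) ** 2))             # training error falls as trees are added
```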
10
Q

True or false: In AdaBoost, the estimators can be trained in parallel.

A

FALSE

Estimators in AdaBoost must be trained sequentially.

11
Q

What is a common base classifier used in AdaBoost?

A

Small decision trees

However, any classifier can be used.

12
Q

What is a drawback of ensemble methods like AdaBoost and Gradient Boosting?

A

Loss of interpretability

The complexity of the model makes it harder to understand.

13
Q

What is the final ensemble in Gradient Boosting based on?

A

The sum of the predictions of the base predictors

This aggregation improves the overall prediction accuracy.

14
Q

What is the role of validation in boosting methods?

A

To determine the learning rate and the number of predictors to use

Helps in tuning the model for better performance.

15
Q

What is the main goal of ensemble learning?

A

To combine the predictions of many weak predictors to make a strong predictor

This enhances the overall predictive performance.

16
Q

What is the curse of dimensionality?

A

The phenomenon where algorithms and ML models suffer degraded performance on high-dimensional data

This can affect computational cost and model performance.

17
Q

List two issues caused by the curse of dimensionality.

A
  • Increased computational cost
  • Degraded model performance

High-dimensional space has counter-intuitive geometry leading to these issues.

18
Q

What does dimensionality reduction aim to achieve?

A

Creating a new representation of a data set in fewer dimensions while preserving its structural properties

This process can help mitigate issues related to high-dimensional data.

19
Q

Name two benefits of dimensionality reduction.

A
  • May reduce overfitting
  • Speeds up model training

Some models perform better in lower dimensions.

20
Q

What is a major drawback of dimensionality reduction?

A

Loss of information

Ground truth decision boundaries may become more complex.

21
Q

What are projection techniques in the context of dimensionality reduction?

A

A class of linear dimensionality reduction techniques

Examples include principal component analysis (PCA) and random projection.

22
Q

What does Principal Component Analysis (PCA) find?

A

The dimensions of maximum variation in the data

Finding principal components is equivalent to computing a singular value decomposition (SVD).

23
Q

What is the equation for the PCA decomposition?

A

X = UΣV⊺

Where U, V, and Σ are matrices representing the decomposition.

24
Q

How can you reduce dimensions using PCA?

A

By projecting onto the first k principal components: Tₖ = XVₖ

Vₖ contains the first k columns of V.

25
Q

What do the squared entries of Σ represent in PCA?

A

Proportional to the variance of the data explained by each principal component

This helps in understanding the importance of each component.

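A numpy sketch tying the PCA cards together (random data for illustration; PCA assumes the columns are centered):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)                   # center the data first

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U Σ V^T
k = 2
Tk = X @ Vt[:k].T                        # T_k = X V_k: first k components

explained = s**2 / np.sum(s**2)          # squared entries of Σ ∝ variance explained
print(Tk.shape, explained)
```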
26
Q

What is Incremental PCA?

A

An algorithm for approximately computing the PCA decomposition by processing the dataset in batches

Useful for large datasets.

27
Q

What is the Johnson-Lindenstrauss Lemma (JL Lemma)?

A

States that if k ≥ 4 log(n) / (ε²/2 − ε³/3), then n points can be mapped into k dimensions with all pairwise distances preserved to within a factor of 1 ± ε

This lemma supports the effectiveness of random projections.

28
Q

List two advantages of Random Projection.

A
  • Simple to implement
  • Fast to compute

Effective on very high-dimensional data.

29
Q

What is a disadvantage of Random Projection?

A

Requires more dimensions than PCA

Dense random projection has a high memory overhead.

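A numpy sketch of a dense Gaussian random projection (dimensions and sample count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 2000, 400
X = rng.normal(size=(n, d))               # n points in d dimensions

R = rng.normal(size=(d, k)) / np.sqrt(k)  # random projection matrix
Y = X @ R                                 # projected to k dimensions

# Pairwise distances are approximately preserved, per the JL Lemma.
before = np.linalg.norm(X[0] - X[1])
after = np.linalg.norm(Y[0] - Y[1])
print(before, after)
```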
30
Q

What is a manifold in the context of topology?

A

A topological space that locally resembles Euclidean space near each point

In data science, a manifold is a low-dimensional structure embedded in a higher-dimensional space.

31
Q

What happens to the manifold structure of data when using projection techniques?

A

It can be lost

This is a concern in manifold learning, where the goal is to discover the structure of the manifold.

32
Q

What is the goal of manifold learning?

A

To discover the structure of the manifold

This involves not simply projecting the data along some axis but may involve stretching or twisting the data.

33
Q

What is Locally Linear Embedding?

A

A manifold learning method that finds a low-dimensional embedding of the data preserving relative positions of nearby points

It represents each point as a linear combination of its k nearest neighbors.

34
Q

List the steps involved in Locally Linear Embedding.

A
  • Represent each point as a linear combination of other data points
  • Restrict to the k nearest neighbors
  • Minimize the reconstruction error
  • Normalize the weights
  • Find low-dimensional points minimizing reconstruction error
  • Use the weights calculated from the original dataset

It preserves local distances well but not far distances and has a high computational cost.

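A short sketch using scikit-learn's LocallyLinearEmbedding on the classic swiss-roll dataset (dataset and parameters are illustrative choices):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=500, random_state=0)  # a 2-D manifold in 3-D

# Each point is reconstructed from its 10 nearest neighbors, then embedded in 2-D.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=0)
X2 = lle.fit_transform(X)

print(X2.shape)
```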
35
Q

What are some other manifold learning methods?

A
  • Multidimensional Scaling
  • Isomap
  • t-SNE

Each method has its own approach and computational considerations.

36
Q

What does Multidimensional Scaling aim to achieve?

A

Find a low dimensional embedding preserving distances between all pairs of points

It is computationally expensive with a complexity of O(n³).

37
Q

How does Isomap work?

A

Builds the nearest neighbor graph and embeds it to preserve geodesic distances

It is effective and popular but expensive in high dimensions.

38
Q

What is the purpose of t-Distributed Stochastic Neighbor Embedding (t-SNE)?

A

A visualization method for high dimensional data that keeps similar instances close and dissimilar instances apart

The algorithm follows a complex optimization process, which can be computationally expensive.

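A short sketch using scikit-learn's TSNE for visualization (the digits dataset and sample size are illustrative; t-SNE can be slow on large inputs):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)      # 64-dimensional digit images

# Map the first 300 instances to 2-D; similar digits land near each other.
X2 = TSNE(n_components=2, random_state=0).fit_transform(X[:300])

print(X2.shape)
```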