id5059 Flashcards

(38 cards)

1
Q

What is the purpose of boosting methods in ensemble techniques?

A

To train a sequence of simple predictors that correct the errors of previous ones

Boosting methods improve the performance of weak classifiers.

2
Q

Name the two boosting methods discussed.

A
  • AdaBoost (Adaptive Boosting)
  • Gradient Boosting

These methods are used to enhance the performance of weak classifiers.

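As a supplementary sketch (assuming scikit-learn is available; the toy dataset and hyperparameters are illustrative, not from the course), both boosting methods can be fit like this:

```python
# Hedged sketch: assumes scikit-learn; dataset and parameters are made up.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
gbt = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

print(ada.score(X, y), gbt.score(X, y))  # training accuracy of each ensemble
```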
3
Q

In AdaBoost, what happens to the training data instances when training the ith predictor?

A

Instances are weighted to increase the influence of those misclassified by the ensemble

This helps improve the accuracy of the ensemble model.

4
Q

What is the AdaBoost model based on?

A

A weighted voting classifier

The predictions are made based on the performance of the predictors.

5
Q

In AdaBoost, what does a weight of 2 for a training instance imply?

A

The instance contributes twice as much to the objective function

This means it has a greater influence on the training process.

6
Q

What is the initial weight of each training instance in AdaBoost?

A

w₁⁽ⁱ⁾ = 1/m for all i

Here, m is the number of training instances.

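As a supplementary sketch of the weighting scheme in the cards above (the misclassification pattern and the predictor weight α are made-up illustrative values, not from the course):

```python
import numpy as np

# Illustrative AdaBoost instance weighting: start from uniform weights
# w_1^(i) = 1/m, then upweight the instances the current predictor got wrong.
m = 5                                        # number of training instances
w = np.full(m, 1.0 / m)                      # initial weights, all 1/m

misclassified = np.array([False, True, False, True, False])  # assumed errors
alpha = 0.5                                  # assumed predictor weight

w = w * np.exp(alpha * misclassified)        # boost misclassified instances
w = w / w.sum()                              # renormalize to sum to 1

print(w)  # misclassified instances now carry more weight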
7
Q

What is a key characteristic of Gradient Boosting?

A

Each predictor is trained to directly correct the errors of the previous ones

This sequential training improves the overall model accuracy.

8
Q

In Gradient Boosting for Regression, what is the initial value of F₀(x)?

A

F₀(x) = 0

This serves as the starting point for the predictions.

9
Q

What does the learning rate parameter (η) do in Gradient Boosting?

A

Controls the contribution of each predictor to the final prediction

Affects how quickly the model learns.

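A minimal sketch of the regression procedure from the cards above (assumes scikit-learn for the base trees; the data, tree depth, and η = 0.1 are illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

eta = 0.1                  # learning rate: scales each predictor's contribution
F = np.zeros_like(y)       # F_0(x) = 0, the starting point

for _ in range(100):
    tree = DecisionTreeRegressor(max_depth=2).fit(X, y - F)  # fit the residuals
    F = F + eta * tree.predict(X)        # add the scaled correction

print(np.mean((y - F) ** 2))             # training error falls as trees are added
```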
10
Q

True or false: In AdaBoost, the estimators can be trained in parallel.

A

FALSE

Estimators in AdaBoost must be trained sequentially.

11
Q

What is a common base classifier used in AdaBoost?

A

Small decision trees

However, any classifier can be used.

12
Q

What is a drawback of ensemble methods like AdaBoost and Gradient Boosting?

A

Loss of interpretability

The complexity of the model makes it harder to understand.

13
Q

What is the final ensemble in Gradient Boosting based on?

A

The sum of the predictions of the base predictors

This aggregation improves the overall prediction accuracy.

14
Q

What is the role of validation in boosting methods?

A

To determine the learning rate and the number of predictors to use

Helps in tuning the model for better performance.

15
Q

What is the main goal of ensemble learning?

A

To combine the predictions of many weak predictors to make a strong predictor

This enhances the overall predictive performance.

16
Q

What is the curse of dimensionality?

A

The phenomenon where algorithms and ML models suffer degraded performance on high-dimensional data

This can affect computational cost and model performance.

17
Q

List two issues caused by the curse of dimensionality.

A
  • Increased computational cost
  • Degraded model performance

High-dimensional space has counter-intuitive geometry leading to these issues.

18
Q

What does dimensionality reduction aim to achieve?

A

Creating a new representation of a data set in fewer dimensions while preserving its structural properties

This process can help mitigate issues related to high-dimensional data.

19
Q

Name two benefits of dimensionality reduction.

A
  • May reduce overfitting
  • Speeds up model training

Some models perform better in lower dimensions.

20
Q

What is a major drawback of dimensionality reduction?

A

Loss of information

Ground truth decision boundaries may become more complex.

21
Q

What are projection techniques in the context of dimensionality reduction?

A

A class of linear dimensionality reduction techniques

Examples include principal component analysis (PCA) and random projection.

22
Q

What does Principal Component Analysis (PCA) find?

A

The dimensions of maximum variation in the data

Finding principal components is equivalent to computing a singular value decomposition (SVD).

23
Q

What is the equation for the PCA decomposition?

A

X = UΣV⊺

Where U, V, and Σ are matrices representing the decomposition.

24
Q

How can you reduce dimensions using PCA?

A

By projecting onto the first k principal components: Tₖ = XVₖ

Vₖ contains the first k columns of V.

25
Q

What do the squared entries of Σ represent in PCA?

A

Proportional to the variance of the data explained by each principal component

This helps in understanding the importance of each component.

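A numpy sketch tying the PCA cards together (random data for illustration; PCA assumes the columns are centered):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)                   # center the data first

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U Σ V^T
k = 2
Tk = X @ Vt[:k].T                        # T_k = X V_k: first k components

explained = s**2 / np.sum(s**2)          # squared entries of Σ ∝ variance explained
print(Tk.shape, explained)
```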
26
Q

What is Incremental PCA?

A

An algorithm for approximately computing the PCA decomposition by processing the dataset in batches

Useful for large datasets.

27
Q

What is the Johnson-Lindenstrauss Lemma (JL Lemma)?

A

States that if k ≥ 4 log(n) / (ε²/2 − ε³/3), then n points can be mapped into k dimensions with all pairwise distances preserved to within a factor of 1 ± ε

This lemma supports the effectiveness of random projections.

28
Q

List two advantages of Random Projection.

A
  • Simple to implement
  • Fast to compute

Effective on very high-dimensional data.

29
Q

What is a disadvantage of Random Projection?

A

Requires more dimensions than PCA

Dense random projection has a high memory overhead.

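A numpy sketch of a dense Gaussian random projection (dimensions and sample count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 2000, 400
X = rng.normal(size=(n, d))               # n points in d dimensions

R = rng.normal(size=(d, k)) / np.sqrt(k)  # random projection matrix
Y = X @ R                                 # projected to k dimensions

# Pairwise distances are approximately preserved, per the JL Lemma.
before = np.linalg.norm(X[0] - X[1])
after = np.linalg.norm(Y[0] - Y[1])
print(before, after)
```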
30
Q

What is a manifold in the context of topology?

A

A topological space that locally resembles Euclidean space near each point

In data science, a manifold is a low-dimensional structure embedded in a higher-dimensional space.

31
Q

What happens to the manifold structure of data when using projection techniques?

A

It can be lost

This is a concern in manifold learning, where the goal is to discover the structure of the manifold.

32
Q

What is the goal of manifold learning?

A

To discover the structure of the manifold

This involves not simply projecting the data along some axis but may involve stretching or twisting the data.

33
Q

What is Locally Linear Embedding?

A

A manifold learning method that finds a low-dimensional embedding of the data preserving relative positions of nearby points

It represents each point as a linear combination of its k nearest neighbors.

34
Q

List the steps involved in Locally Linear Embedding.

A
  • Represent each point as a linear combination of other data points
  • Restrict to the k nearest neighbors
  • Minimize the reconstruction error
  • Normalize the weights
  • Find low-dimensional points minimizing reconstruction error
  • Use the weights calculated from the original dataset

It preserves local distances well but not far distances and has a high computational cost.

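A short sketch using scikit-learn's LocallyLinearEmbedding on the classic swiss-roll dataset (dataset and parameters are illustrative choices):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=500, random_state=0)  # a 2-D manifold in 3-D

# Each point is reconstructed from its 10 nearest neighbors, then embedded in 2-D.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=0)
X2 = lle.fit_transform(X)

print(X2.shape)
```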
35
Q

What are some other manifold learning methods?

A
  • Multidimensional Scaling
  • Isomap
  • t-SNE

Each method has its own approach and computational considerations.

36
Q

What does Multidimensional Scaling aim to achieve?

A

Find a low dimensional embedding preserving distances between all pairs of points

It is computationally expensive with a complexity of O(n³).

37
Q

How does Isomap work?

A

Builds the nearest neighbor graph and embeds it to preserve geodesic distances

It is effective and popular but expensive in high dimensions.

38
Q

What is the purpose of t-Distributed Stochastic Neighbor Embedding (t-SNE)?

A

A visualization method for high dimensional data that keeps similar instances close and dissimilar instances apart

The algorithm follows a complex optimization process, which can be computationally expensive.

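A short sketch using scikit-learn's TSNE for visualization (the digits dataset and sample size are illustrative; t-SNE can be slow on large inputs):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)      # 64-dimensional digit images

# Map the first 300 instances to 2-D; similar digits land near each other.
X2 = TSNE(n_components=2, random_state=0).fit_transform(X[:300])

print(X2.shape)
```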