Unsupervised learning
The ability to produce useful characterizations of objects when no class labels are given.
Data dimensionality
Data dimensionality is the number of attributes a data point has.
Problems with high data dimensionality
More input dimensions (more attributes) can lead to worse performance from learning algorithms.
The curse of dimensionality
As the number of dimensions increases, data points become more spread out, which can make models overly complex and prone to overfitting.
Higher dimensions also increase the computational cost of algorithms, and data with many dimensions is hard to visualize.
Dimensionality reduction
Techniques that reduce dimensionality to manageable levels.
This can be done through feature selection, linear projection, non-linear projection, or feature extraction.
Feature selection
Filter methods: evaluate the importance of features independently of any learning algorithm.
Wrapper methods: measure the usefulness of attributes based on model performance.
Embedded methods: perform feature selection as part of the model training process.
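As an illustration of a filter method, the sketch below (a hypothetical `filter_select` helper, assuming numpy and synthetic data) scores each feature by its absolute Pearson correlation with the labels, independently of any learning algorithm, and keeps the top k:

```python
import numpy as np

def filter_select(X, y, k):
    """Filter method: rank features by |Pearson correlation| with y,
    independently of any learning algorithm, and keep the top k."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
y = rng.normal(size=200)
X = np.column_stack([
    y + 0.1 * rng.normal(size=200),   # strongly related to y
    rng.normal(size=200),             # pure noise
    -y + 0.5 * rng.normal(size=200),  # moderately related to y
])
print(filter_select(X, y, 2))  # indices of the two most relevant features
```

Wrapper and embedded methods would instead score feature subsets by training a model on them, which is more expensive but accounts for feature interactions.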
Feature extraction
Transform the input set into a much smaller set of features.
How should we determine the “best” lower-dimensional space?
The best lower-dimensional representation of the data is defined by the principal components, which are the eigenvectors of the covariance matrix that capture the most variance.
Eigenvectors are directions in which data varies the most.
PCA
Decomposes a multidimensional dataset into a set of orthogonal components. PCA can dramatically reduce the dimensionality of a large data set and potentially reveal a simpler structure.
PCA basic steps:
1. Centre the data by subtracting each attribute’s mean.
2. Compute the covariance matrix of the centred data.
3. Compute its eigenvectors and eigenvalues.
4. Sort the eigenvectors by decreasing eigenvalue.
5. Project the data onto the top eigenvectors (the principal components).
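A minimal numpy sketch of these basic steps (centre, covariance, eigendecomposition, sort, project), using synthetic 3-D data that mostly varies along one direction:

```python
import numpy as np

def pca(X, n_components):
    """PCA: centre the data, eigendecompose its covariance matrix,
    sort components by explained variance, and project."""
    Xc = X - X.mean(axis=0)                 # 1. centre each attribute
    cov = np.cov(Xc, rowvar=False)          # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # 3. eigendecomposition
    order = np.argsort(eigvals)[::-1]       # 4. sort by variance captured
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                  # 5. project onto components

rng = np.random.default_rng(1)
t = rng.normal(size=100)
X = np.column_stack([t,
                     2 * t + 0.05 * rng.normal(size=100),
                     0.05 * rng.normal(size=100)])
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```

The first projected column captures the most variance, the second the most of what remains, and so on.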
Multi-Dimensional Scaling (MDS)
A set of data analysis techniques used to explore similarities or dissimilarities in data.
Used to map high-dimensional data into a lower-dimensional space such that pairwise distances between points are preserved as much as possible.
Limitations of MDS
MDS can be computationally intensive for large datasets
The choice of output dimensions can affect the interpretability
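A minimal sketch of classical (metric) MDS with numpy, assuming a Euclidean distance matrix as input: double-centre the squared distances, eigendecompose, and keep the top-k coordinates so that pairwise distances are preserved as well as possible.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: recover k-dimensional coordinates whose pairwise
    Euclidean distances approximate the input distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]        # largest eigenvalues first
    L = np.sqrt(np.maximum(eigvals[order], 0))   # clip tiny negatives
    return eigvecs[:, order] * L                 # embedded coordinates

# Distances between 4 points on a line at positions 0, 1, 2, 3
pts = np.array([[0.0], [1.0], [2.0], [3.0]])
D = np.abs(pts - pts.T)
Y = classical_mds(D, k=1)
# Pairwise distances in the 1-D embedding match the originals
D_hat = np.abs(Y - Y.T)
```

Because the whole n x n distance matrix is eigendecomposed, the cost grows quickly with the number of points, which is the computational limitation noted above.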
t-SNE
t-SNE is used to map high-dimensional data into 2D or 3D for visualization, preserving local neighborhoods.
t-SNE limitations
Computationally intensive: slow on very large datasets.
Interpretation challenges: the axes in t-SNE plots don’t have a specific meaning.
Eigenvectors and eigenvalues
Eigenvectors are directions in which data varies the most
Eigenvalues are used to calculate the variance represented by each eigenvector.
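A small numpy illustration of this relationship, using synthetic 2-D data: the eigenvectors of the covariance matrix are the directions of variation, and each eigenvalue equals the variance of the data along its eigenvector.

```python
import numpy as np

rng = np.random.default_rng(2)
# 2-D data stretched far more along the first axis than the second
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
cov = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(cov)  # eigvecs: directions of variation
proj = X @ eigvecs                      # project data onto the eigenvectors
# Each eigenvalue is the variance of the data along its eigenvector
print(np.allclose(proj.var(axis=0, ddof=1), eigvals))  # True
```

This is exactly why PCA ranks components by eigenvalue: sorting eigenvectors by decreasing eigenvalue sorts directions by decreasing variance.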
What is the main objective of PCA?
To reduce dimensionality by creating orthogonal axes (principal components) that capture the most variance in the data
How is PCA typically visualized?
Using linear projections — usually shown as 2D or 3D scatter plots where the new axes are the principal components
When is PCA most applicable?
When the data is linearly related; suitable for feature extraction and noise reduction.
How should PCA results be interpreted?
Every axis is a linear combination of the original features.
What is the main objective of MDS?
To find a low-dimensional configuration of data points that preserves pairwise distances between them.
How is MDS visualized?
As spatial plots (in 2D or 3D) that reflect dissimilarities or similarities among data points.
When is MDS most applicable?
When it’s important to preserve distances between points — often used in psychology, market research, or perception studies.
How should MDS results be interpreted?
The axes are not interpretable; the focus is on relative distances between points, not on specific features.
What is the main objective of t-SNE?
To preserve local structure and neighborhood relationships in high-dimensional data.