FODS dimensionality reduction Flashcards

(28 cards)

1
Q

Unsupervised learning

A

Learning to output useful characterizations of objects when no class labels are given.

2
Q

Data dimensionality

A

The number of attributes (features) that each data point has.

3
Q

Problems with high data dimensionality

A

More input dimensions (more attributes) generally lead to worse performance of learning algorithms.

4
Q

The curse of dimensionality

A

As the number of dimensions increases, data points become more spread out, which can make models overly complex and prone to overfitting.

Higher dimensions also increase the computational cost of algorithms, and data with many dimensions is hard to visualize.

5
Q

Dimensionality reduction

A

Techniques that reduce the dimensionality of the data down to manageable levels.

This can be done through feature selection or through feature extraction, using linear or non-linear projections.

6
Q

Feature selection

A

Filter methods: evaluate the importance of features independently of any learning algorithm.

Wrapper methods: measure the usefulness of features based on the performance of a model trained on them.

Embedded methods: perform feature selection as part of the model training process.
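
As an illustration of a filter method, a minimal sketch assuming scikit-learn is available (SelectKBest scores each feature with the ANOVA F-statistic, independently of any downstream model):

```python
# Filter-method sketch (assumes scikit-learn): score each feature
# independently of any learning algorithm, then keep the top k.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)   # keep the 2 highest-scoring features

print(X.shape, "->", X_reduced.shape)      # (150, 4) -> (150, 2)
```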

7
Q

Feature extraction

A

Transform the original input features into a much smaller set of new features.

8
Q

How should we determine the “best” lower-dimensional space?

A

The best lower-dimensional representation of the data is defined by the principal components, which are the eigenvectors of the covariance matrix that capture the most variance.

Eigenvectors are directions in which data varies the most.

9
Q

PCA

A

Breaks down a multidimensional dataset into a set of orthogonal components. PCA dramatically reduces the dimensionality of a large data set and can reveal a simpler underlying structure.

10
Q

PCA basic steps:

A
  1. Standardize data
  2. Calculate the covariance matrix of the data
  3. Calculate the eigenvectors and eigenvalues of the covariance matrix
  4. Choose the principal components
  5. Project data onto principal components
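
The five steps above can be sketched directly in NumPy (a minimal illustration on random toy data, not a production implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # toy data: 100 points, 5 attributes

Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # 1. standardize
C = np.cov(Xs, rowvar=False)               # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)       # 3. eigenvectors / eigenvalues
order = np.argsort(eigvals)[::-1]          # 4. choose the components with
W = eigvecs[:, order[:2]]                  #    the two largest eigenvalues
Z = Xs @ W                                 # 5. project the data

print(Z.shape)                             # (100, 2)
```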
11
Q

Multi-Dimensional Scaling (MDS)

A

A set of data analysis techniques used to explore similarities or dissimilarities in data.

Used to map high-dimensional data into a lower-dimensional space such that pairwise distances between points are preserved as much as possible.
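
A minimal sketch of metric MDS, assuming scikit-learn's MDS implementation is available:

```python
# MDS sketch (assumes scikit-learn): embed 4-D iris data in 2-D while
# preserving pairwise distances between points as well as possible.
from sklearn.datasets import load_iris
from sklearn.manifold import MDS

X, _ = load_iris(return_X_y=True)          # 150 samples, 4 dimensions
mds = MDS(n_components=2, random_state=0)
X_2d = mds.fit_transform(X)                # low-dimensional configuration

print(X_2d.shape)                          # (150, 2)
```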

12
Q

Limitations of MDS

A

MDS can be computationally intensive for large datasets.
The choice of output dimensions can affect interpretability.

13
Q

t-SNE

A

t-SNE is used to map high-dimensional data into 2D or 3D for visualization, preserving local neighborhoods.
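
A minimal sketch, assuming scikit-learn's TSNE implementation is available:

```python
# t-SNE sketch (assumes scikit-learn): map 4-D iris data into 2-D,
# preserving local neighborhoods rather than global distances.
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X, _ = load_iris(return_X_y=True)          # 150 samples, 4 dimensions
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(emb.shape)                           # (150, 2)
```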

14
Q

t-SNE limitations

A

Computationally intensive: can be slow on very large datasets.
Interpretation challenges: the axes in t-SNE plots don’t have a specific meaning.

15
Q

Eigenvectors and eigenvalues

A

Eigenvectors are directions in which data varies the most.
Eigenvalues measure the amount of variance represented by each eigenvector.
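
Both properties can be checked numerically; a small sketch using NumPy's `eigh` on a hand-picked symmetric matrix:

```python
import numpy as np

# A symmetric 2x2 "covariance-like" matrix (illustrative values).
C = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(C)       # ascending order: eigvals == [1., 3.]

v = eigvecs[:, -1]                         # eigenvector of the largest eigenvalue
assert np.allclose(C @ v, eigvals[-1] * v) # defining property: C v = lambda v

# Each eigenvalue's share of the total gives the variance it represents.
print(eigvals / eigvals.sum())             # [0.25 0.75]
```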

16
Q

What is the main objective of PCA?

A

To reduce dimensionality by creating orthogonal axes (principal components) that capture the most variance in the data

17
Q

How is PCA typically visualized?

A

Using linear projections — usually shown as 2D or 3D scatter plots where the new axes are the principal components

18
Q

When is PCA most applicable?

A

When the data is approximately linearly related; it is well suited to feature extraction and noise reduction.

19
Q

How should PCA results be interpreted?

A

Every axis is a linear combination of the original features.

20
Q

What is the main objective of MDS?

A

To find a low-dimensional configuration of data points that preserves pairwise distances between them.

21
Q

How is MDS visualized?

A

As spatial plots (in 2D or 3D) that reflect dissimilarities or similarities among data points.

22
Q

When is MDS most applicable?

A

When it’s important to preserve distances between points — often used in psychology, market research, or perception studies.

23
Q

How should MDS results be interpreted?

A

The axes are not interpretable; the focus is on relative distances between points, not on specific features.

24
Q

What is the main objective of t-SNE?

A

To preserve local structure and neighborhood relationships in high-dimensional data.

25
Q

How is t-SNE visualized?

A

Through 2D or 3D plots that show clusters and local groupings of similar data points.
26
Q

When is t-SNE most applicable?

A

For complex, nonlinear datasets, such as images, text embeddings, or biological data (like gene expression).
27
Q

How should t-SNE results be interpreted?

A

The axes have no direct meaning; the emphasis is on cluster layout and relationships, not on feature-based interpretation.
28
Q

Explain how PCA determines the principal components onto which the data is projected, mentioning the role of variance and eigenvectors.

A

PCA standardizes the data, computes the covariance matrix, and finds its eigenvectors and eigenvalues. Eigenvectors show directions of maximum variance; eigenvalues show how much variance each direction explains. The eigenvectors with the largest eigenvalues form the principal components onto which the data is projected.
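
The answer above maps onto a few lines of scikit-learn (assuming it is available); `explained_variance_ratio_` exposes how much variance each principal component captures:

```python
# PCA sketch (assumes scikit-learn): standardize, fit, project.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
Xs = StandardScaler().fit_transform(X)     # standardize the data
pca = PCA(n_components=2).fit(Xs)          # covariance + eigendecomposition
Z = pca.transform(Xs)                      # project onto the top 2 components

print(Z.shape)                             # (150, 2)
print(pca.explained_variance_ratio_)       # variance captured per component
```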