Unsupervised learning
The ability to produce useful characterizations of objects when no class labels are given.
Data dimensionality
Data dimensionality is the number of attributes a data point has.
Problems with high data dimensionality
More input dimensions (more attributes) can lead to worse performance from learning algorithms.
The curse of dimensionality
As the number of dimensions increases, data points become more spread out, which can make models overly complex and prone to overfitting.
Higher dimensions also increase the computational cost of algorithms, and data with many dimensions is hard to visualize.
Dimensionality reduction
Techniques that reduce dimensionality to manageable levels.
This can be done through feature selection, linear projection, non-linear projection, or feature extraction.
Feature selection
Filter methods: evaluate the importance of features independently of any learning algorithm.
Wrapper methods: measure the usefulness of attributes based on model performance.
Embedded methods: perform feature selection as part of the model training process.
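As an illustration of a filter method, the sketch below (a hypothetical `filter_select` helper, assuming numpy and synthetic data) scores each feature by its absolute Pearson correlation with the labels, independently of any learning algorithm, and keeps the top k:

```python
import numpy as np

def filter_select(X, y, k):
    """Filter method: rank features by |Pearson correlation| with y,
    independently of any learning algorithm, and keep the top k."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
y = rng.normal(size=200)
X = np.column_stack([
    y + 0.1 * rng.normal(size=200),   # strongly related to y
    rng.normal(size=200),             # pure noise
    -y + 0.5 * rng.normal(size=200),  # moderately related to y
])
print(filter_select(X, y, 2))  # indices of the two most relevant features
```

Wrapper and embedded methods would instead score feature subsets by training a model on them, which is more expensive but accounts for feature interactions.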
Feature extraction
Transform the input set into a much smaller set of features.
How should we determine the “best” lower-dimensional space?
The best lower-dimensional representation of the data is defined by the principal components, which are the eigenvectors of the covariance matrix that capture the most variance.
Eigenvectors are directions in which data varies the most.
PCA
Decomposes a multidimensional dataset into a set of orthogonal components. PCA can dramatically reduce the dimensionality of a large data set and potentially reveal a simpler structure.
PCA basic steps:
1. Centre the data by subtracting each attribute’s mean.
2. Compute the covariance matrix of the centred data.
3. Compute its eigenvectors and eigenvalues.
4. Sort the eigenvectors by decreasing eigenvalue.
5. Project the data onto the top eigenvectors (the principal components).
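A minimal numpy sketch of these basic steps (centre, covariance, eigendecomposition, sort, project), using synthetic 3-D data that mostly varies along one direction:

```python
import numpy as np

def pca(X, n_components):
    """PCA: centre the data, eigendecompose its covariance matrix,
    sort components by explained variance, and project."""
    Xc = X - X.mean(axis=0)                 # 1. centre each attribute
    cov = np.cov(Xc, rowvar=False)          # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # 3. eigendecomposition
    order = np.argsort(eigvals)[::-1]       # 4. sort by variance captured
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                  # 5. project onto components

rng = np.random.default_rng(1)
t = rng.normal(size=100)
X = np.column_stack([t,
                     2 * t + 0.05 * rng.normal(size=100),
                     0.05 * rng.normal(size=100)])
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```

The first projected column captures the most variance, the second the most of what remains, and so on.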
Multi-Dimensional Scaling (MDS)
A set of data analysis techniques used to explore similarities or dissimilarities in data.
Used to map high-dimensional data into a lower-dimensional space such that pairwise distances between points are preserved as much as possible.
Limitations of MDS
MDS can be computationally intensive for large datasets
The choice of output dimensions can affect the interpretability
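A minimal sketch of classical (metric) MDS with numpy, assuming a Euclidean distance matrix as input: double-centre the squared distances, eigendecompose, and keep the top-k coordinates so that pairwise distances are preserved as well as possible.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: recover k-dimensional coordinates whose pairwise
    Euclidean distances approximate the input distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]        # largest eigenvalues first
    L = np.sqrt(np.maximum(eigvals[order], 0))   # clip tiny negatives
    return eigvecs[:, order] * L                 # embedded coordinates

# Distances between 4 points on a line at positions 0, 1, 2, 3
pts = np.array([[0.0], [1.0], [2.0], [3.0]])
D = np.abs(pts - pts.T)
Y = classical_mds(D, k=1)
# Pairwise distances in the 1-D embedding match the originals
D_hat = np.abs(Y - Y.T)
```

Because the whole n x n distance matrix is eigendecomposed, the cost grows quickly with the number of points, which is the computational limitation noted above.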
t-SNE
t-SNE is used to map high-dimensional data into 2D or 3D for visualization, preserving local neighborhoods.
t-SNE limitations
Computationally intensive: slow on very large datasets.
Interpretation challenges: the axes in t-SNE plots don’t have a specific meaning.
Eigenvectors and eigenvalues
Eigenvectors are directions in which data varies the most
Eigenvalues are used to calculate the variance represented by each eigenvector.
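A small numpy illustration of this relationship, using synthetic 2-D data: the eigenvectors of the covariance matrix are the directions of variation, and each eigenvalue equals the variance of the data along its eigenvector.

```python
import numpy as np

rng = np.random.default_rng(2)
# 2-D data stretched far more along the first axis than the second
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
cov = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(cov)  # eigvecs: directions of variation
proj = X @ eigvecs                      # project data onto the eigenvectors
# Each eigenvalue is the variance of the data along its eigenvector
print(np.allclose(proj.var(axis=0, ddof=1), eigvals))  # True
```

This is exactly why PCA ranks components by eigenvalue: sorting eigenvectors by decreasing eigenvalue sorts directions by decreasing variance.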
What is the main objective of PCA?
To reduce dimensionality by creating orthogonal axes (principal components) that capture the most variance in the data
How is PCA typically visualized?
Using linear projections — usually shown as 2D or 3D scatter plots where the new axes are the principal components
When is PCA most applicable?
When the data is linearly related; suitable for feature extraction and noise reduction.
How should PCA results be interpreted?
Every axis is a linear combination of the original features.
What is the main objective of MDS?
To find a low-dimensional configuration of data points that preserves pairwise distances between them.
How is MDS visualized?
As spatial plots (in 2D or 3D) that reflect dissimilarities or similarities among data points.
When is MDS most applicable?
When it’s important to preserve distances between points — often used in psychology, market research, or perception studies.
How should MDS results be interpreted?
The axes are not interpretable; the focus is on relative distances between points, not on specific features.
What is the main objective of t-SNE?
To preserve local structure and neighborhood relationships in high-dimensional data.