PCA
When to Use PCA
MDS - Multidimensional Scaling
MDS is a method that takes the distance matrix and tries to find a configuration of points in a space (in this case, a 2D space) where the distances between the points match the distances in the matrix as closely as possible.
principal coordinates analysis (PCO) - give example
Imagine you have a dataset with information about the heights and weights of a group of people. A classical solution like principal coordinates analysis (PCO) or classical scaling would take this data and find a way to represent each person as a point in a space where the distances between the points reflect how similar or dissimilar the people are based on their heights and weights.
“Euclidean space”
“Euclidean space” refers to the familiar space we live in, where distances are measured in a straight line, and the principles of Euclidean geometry apply. It’s named after the ancient Greek mathematician Euclid, who laid down the foundational rules for this type of space.
Here’s a simple way to understand it:
Imagine you have a piece of graph paper, with two axes: one horizontal (x-axis) and one vertical (y-axis). Each point on this graph represents a unique location in this space. For example, the point (2,3) would mean moving 2 units to the right along the x-axis and 3 units upwards along the y-axis from the origin (0,0).
Jaccard’s coefficient and Jaccard dissimilarity
Directly interpretable as the “proportion of species (or characteristics)
shared” between two objects or sites. S = a/(a+b+c) ,Ignores joint absence information
DJ = 1 – SJ = Jaccard dissimilarity, interpretable as the proportion of
unshared species (or characters)
Distance measures for continuous variables
Bray-Curtis dissimilarity
It does not include double zeros.
* The comparison between two sampling units does
not depend on the rest of the data set.
* It is a ratio with an upper limit: 0 ≤ d ≤ 1.
* Can be interpreted as a “percentage difference” in
ecological terms.
* The contribution of variables to the measure will be
relative to their scale of measurement – so an
appropriate transformation a priori is generally
recommended
Bray-Curtis takes into account both the types of species in a community and how abundant they are. It’s like considering not only the types of houses in a neighborhood but also how many of each type there are.
For instance, if in Neighborhood A, there are mostly small houses and in Neighborhood B, there are mostly big houses, Bray-Curtis helps capture these differences in both types and abundances.
A complexity of PCO
Eigenvalues
Eigenvalues are like measures of how much variation or spread there is in the data. Higher eigenvalues mean more spread, while lower ones mean less spread.
Shepard plot
what is the relationship between the distances in
the original matrix compared with what we are
seeing in Euclidean space on our plot. To check if your squishing job is any good, you can look at something called a Shepard plot. This plot compares the distances between points in the original data with the distances between those same points in the squished-down space. If the distances look similar, then your squishing job is probably pretty good. But if the distances are all wonky, then maybe you need to squish the data in a different way to capture the important stuff better.
Why use MDS?
Why use MDS?
* Sometimes PCO doesn’t do a very good job of
representing the relative inter-point
relationships in a small no. of dimensions.
* Instead of using an eigenvector technique, we
could use a criterion based on distances
themselves that focuses explicitly on
dimension reduction.
* Namely, why not use the concept underlying
the Shepard plot as our criterion for
ordination?
MDS
Choose a number of dimensions (k) that you want for the
configuration.
*Place the points into that space so that the Euclidean distances
in the configuration (of reduced dimension) replicate the actual
underlying dissimilarities as well as possible
Metric vs Non-metric MDS
Non-metric MDS
least squares monotone regression
the MDS technique is being used to find the least squares monotone regression of a new variable (ˆ new d rs) based on the original variable (drs) while minimizing stress, ensuring that the relationship between the variables is preserved as much as possible in the low-dimensional space.
stress in MDS
Stress is a measure of how much the distances between the points in the low-dimensional space differ from the original distances in the high-dimensional space. Minimizing stress means finding the configuration of points that best represents the original distances.