Steps from Euclidean distances E to a configuration X
What are the steps from a configuration X to distances E, and what information do we lose?
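Below is a minimal NumPy sketch of the standard classical-scaling steps. It assumes E is an n × n Euclidean distance matrix. The steps are: (1) form A with $a_{ij} = -e_{ij}^2/2$; (2) double-centre it, $B = HAH$ with $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$; (3) eigen-decompose B; (4) take $X = \Gamma_k \Lambda_k^{1/2}$ from the k largest eigenvalues. The function name `classical_mds` is illustrative and does not come from any library.

```python
import numpy as np


def classical_mds(E, k=2):
    """Classical scaling: n x n Euclidean distance matrix E -> n x k configuration X."""
    E = np.asarray(E, dtype=float)
    n = E.shape[0]
    A = -0.5 * E ** 2                          # step 1: a_ij = -e_ij^2 / 2
    H = np.eye(n) - np.ones((n, n)) / n        # centring matrix H = I - (1/n) 11'
    B = H @ A @ H                              # step 2: double-centred matrix B = HAH
    eigvals, eigvecs = np.linalg.eigh(B)       # step 3: eigen-decomposition (ascending order)
    order = np.argsort(eigvals)[::-1]          # re-sort so the largest eigenvalues come first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lam = np.clip(eigvals[:k], 0.0, None)      # guard against tiny negative round-off
    X = eigvecs[:, :k] * np.sqrt(lam)          # step 4: column j scaled so its squared length is lambda_j
    return X, eigvals
```

When E is exactly Euclidean, the recovered X reproduces E. It does so only up to translation, rotation and reflection, and the X returned here is centred at the origin.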
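Going from X back to E just means computing all pairwise Euclidean distances. What is lost is the absolute position and orientation of X: any translation, rotation or reflection of X gives exactly the same E. The small pure-Python check below illustrates this. The helper names are hypothetical.

```python
import math


def distances(X):
    """Pairwise Euclidean distance matrix E of a configuration X (list of 2-D points)."""
    return [[math.dist(p, q) for q in X] for p in X]


def rotate_and_shift(X, theta, shift):
    """Rotate every point by angle theta, then translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in X]


def reflect(X):
    """Mirror the configuration in the horizontal axis."""
    return [(x, -y) for x, y in X]
```

All three configurations, X, its rotated and shifted copy, and its mirror image, produce the same distance matrix up to floating-point error.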
What is classical multidimensional scaling? What is the impact of eigenvalues on CMS?
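In classical scaling the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots$ of $B = HAH$ carry three kinds of information:

- E is Euclidean exactly when B is positive semi-definite, so clearly negative eigenvalues signal a non-Euclidean E.
- The number of positive eigenvalues is the dimension needed to reproduce E exactly.
- The leading eigenvalues show how good a k-dimensional fit is.

The sketch below computes two commonly used ratios, $\sum_{i\le k}\lambda_i / \sum_i|\lambda_i|$ and $\sum_{i\le k}\lambda_i^2 / \sum_i\lambda_i^2$. Which criterion your course uses is an assumption, and the function names are hypothetical.

```python
def goodness_of_fit(eigvals, k):
    """Two common fit ratios for a k-dimensional classical-scaling solution."""
    lam = sorted(eigvals, reverse=True)                               # lambda_1 >= lambda_2 >= ...
    alpha1 = sum(lam[:k]) / sum(abs(v) for v in lam)                  # share of the absolute eigenvalue total
    alpha2 = sum(v * v for v in lam[:k]) / sum(v * v for v in lam)    # share of the squared eigenvalues
    return alpha1, alpha2


def is_euclidean(eigvals, tol=1e-9):
    """E is Euclidean iff B is positive semi-definite, i.e. no clearly negative eigenvalue."""
    return min(eigvals) > -tol
```

For example, eigenvalues (5, 3, 2, −2) give a 2-D fit ratio of 8/12 under the first criterion, and the −2 shows that E is not Euclidean.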
How do we run multidimensional scaling?
We can use the R function cmdscale(distance object, k = max dimension, eig = TRUE), where the distance object E must be symmetric (symmetrise it first if it is not); eig = TRUE also returns the eigenvalues.
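An asymmetric E (for example, one where $e_{ij} \ne e_{ji}$ because dissimilarities were recorded in both directions) has to be made symmetric before scaling. One common choice, and an assumption here, is to average E with its transpose, $(E + E^T)/2$. In R this is `(E + t(E)) / 2`. The same step in Python:

```python
def symmetrise(E):
    """Return (E + E^T) / 2 for a square matrix given as a list of lists."""
    n = len(E)
    return [[(E[i][j] + E[j][i]) / 2 for j in range(n)] for i in range(n)]
```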
Give properties of a metric d(x,y)
Keep in mind: if $\{d_\alpha\}$ is a (finite) family of metrics, then $\sum_\alpha d_\alpha$ is also a metric.
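The four metric axioms are:

- non-negativity;
- $d(x,y) = 0$ exactly when $x = y$;
- symmetry;
- the triangle inequality.

The brute-force checker below tests all four, but it is only meaningful on a finite set of points and proves nothing in general. It also illustrates the note on sums of metrics: Euclidean plus Chebyshev distance is again a metric, while squared Euclidean distance breaks the triangle inequality. All helper names are hypothetical.

```python
import math
from itertools import product


def is_metric(d, points, tol=1e-12):
    """Check the metric axioms for d on a finite list of points."""
    for x, y in product(points, repeat=2):
        dxy = d(x, y)
        if dxy < -tol:                         # non-negativity
            return False
        if (dxy <= tol) != (x == y):           # d(x, y) = 0 exactly when x = y
            return False
        if abs(dxy - d(y, x)) > tol:           # symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True


def euclidean(x, y):
    return math.dist(x, y)


def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))
```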
Define Hamming distance
The number of mismatches: $d(x,y) = \sum_{i=1}^{n} d_i(x,y) = b + c$, where $d_i(x,y) = 1$ if $x_i \neq y_i$ and 0 otherwise (for binary vectors, b and c are the mismatch counts defined below)
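A direct sketch of the mismatch count: each position contributes $d_i(x,y) = 1$ when the entries differ and 0 when they agree. Raising an error on unequal lengths is a choice made here, not part of the definition.

```python
def hamming(x, y):
    """Number of positions where x and y differ (sum of per-position mismatch indicators)."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)
```

It works for any sequences of equal length, not only binary vectors. For example, the strings "karolin" and "kathrin" are at Hamming distance 3.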
Define Jaccard distance
$d_J(x,y) = \frac{b+c}{a+b+c}$
$c = \sum_i 1\{x_i = 0, y_i = 1\}$ and $b = \sum_i 1\{x_i = 1, y_i = 0\}$
$a = \sum_i 1\{x_i = y_i = 1\}$ and $d = \sum_i 1\{x_i = y_i = 0\}$ (joint absences d do not enter $d_J$)
Give examples of 5 other dissimilarity measures.
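Returning to the Jaccard distance, here is a sketch that computes the counts a, b, c, d for two binary (0/1) vectors and then $d_J = (b+c)/(a+b+c)$. The count d is computed but ignored, as in the definition. Returning 0 for two all-zero vectors is an assumption, since $d_J$ is 0/0 there.

```python
def match_counts(x, y):
    """Counts (a, b, c, d) of (1,1), (1,0), (0,1), (0,0) pairs in two binary vectors."""
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard_distance(x, y):
    a, b, c, _ = match_counts(x, y)   # joint absences d are ignored
    if a + b + c == 0:
        return 0.0                    # assumption: two all-zero vectors count as identical
    return (b + c) / (a + b + c)
```

For the pair used in the Hamming example, the counts are (a, b, c, d) = (2, 1, 1, 1), so $d_J = 2/4 = 0.5$. The Hamming distance is b + c = 2.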
What is the stress function?
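The stress function measures the lack of agreement between the dissimilarities $\{\delta_{ml}\}$ and the Euclidean distances $\{d_{ml}\}$ of the constructed configuration (smaller stress = better agreement)
- use monotone (isotonic) regression to get fitted values $\{\hat d_{ml}\}$ that are non-decreasing in the same order as $\{\delta_{ml}\}$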
- $S(X) = \sqrt{S^*/T^*}$, where $S^* = \sum_{m<l} (d_{ml} - \hat d_{ml})^2$ and $T^* = \sum_{m<l} d_{ml}^2$
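Here is a sketch of the stress value. It assumes Kruskal's form $S(X) = \sqrt{S^*/T^*}$ with $S^* = \sum_{m<l}(d_{ml} - \hat d_{ml})^2$ and $T^* = \sum_{m<l} d_{ml}^2$. The distances d and fitted values $\hat d$ are passed as flat lists over the pairs m < l, and the $\hat d$ would come from the monotone regression step.

```python
import math


def kruskal_stress(d, d_hat):
    """S(X) = sqrt(S*/T*), with d and d_hat listed over the pairs m < l in the same order."""
    s_star = sum((dm - fm) ** 2 for dm, fm in zip(d, d_hat))   # S*: squared misfit
    t_star = sum(dm ** 2 for dm in d)                          # T*: normalising term
    return math.sqrt(s_star / t_star)
```

A stress of 0 means the distances are already monotone in the dissimilarities. Dividing by $T^*$ makes the value unaffected by rescaling the whole configuration.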
What are the Miles algorithm and Young’s boundary search algorithm?
Algorithms for monotone (isotonic) regression, giving a non-decreasing step function (each step value is the mean of the values in that “block”)
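The block-averaging idea can be illustrated with the pool-adjacent-violators algorithm (PAVA). This is a standard monotone-regression method swapped in here, not a claim to be the Miles or Young procedure. The least-squares monotone fit is unique, so any correct algorithm gives the same step function. The input is assumed to be the values $d_{ml}$ already sorted by increasing $\delta_{ml}$, with no ties in $\delta$.

```python
def pava(y):
    """Least-squares non-decreasing fit to y (pool-adjacent-violators)."""
    blocks = []                                    # each block stores [sum of values, count]
    for v in y:
        blocks.append([v, 1])
        # while the previous block's mean exceeds the last block's mean, pool the two
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)                 # every value in a block gets the block mean
    return fitted
```

For example, [1, 3, 2, 4] becomes [1, 2.5, 2.5, 4]: the violating pair (3, 2) is pooled into one block with mean 2.5.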
How do we choose a configuration X from non-Euclidean dissimilarities?
- define the stress S(X) and minimise it over all possible configurations X
How do we find the optimal configuration from the stress function?
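A minimal sketch of the minimisation step. It deliberately substitutes a simpler criterion for Kruskal's stress: a metric least-squares ("raw") stress $\sum_{m<l}(d_{ml} - \delta_{ml})^2$, with no monotone regression, minimised by plain gradient descent from a random start. The step size, iteration count and function names are all assumptions chosen for illustration.

```python
import numpy as np


def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(X, delta):
    """Metric least-squares stress: sum over pairs m < l of (d_ml - delta_ml)^2."""
    iu = np.triu_indices(len(X), k=1)
    return float(((pairwise_distances(X)[iu] - delta[iu]) ** 2).sum())


def stress_gradient(X, delta):
    diff = X[:, None, :] - X[None, :, :]              # x_m - x_l, shape (n, n, p)
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, 1.0)                          # placeholder; diagonal terms are zeroed below
    coef = 2.0 * (D - delta) / D                      # d/dx_m of (d_ml - delta_ml)^2 is coef * (x_m - x_l)
    np.fill_diagonal(coef, 0.0)
    return (coef[:, :, None] * diff).sum(axis=1)


def fit_configuration(delta, p=2, lr=0.01, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(delta.shape[0], p))          # random starting configuration
    for _ in range(n_iter):
        X -= lr * stress_gradient(X, delta)           # move downhill on the stress surface
    return X
```

Gradient descent like this can stop in a local minimum, so in practice several random starts are usually tried and the configuration with the lowest stress is kept.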
Mention an advantage of ordinal scaling
It can cope with missing data
Steps of K-means clustering
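A pure-Python sketch of the usual (Lloyd-style) K-means steps:

1. Pick k initial centroids.
2. Assign each point to its nearest centroid.
3. Move each centroid to the mean of its assigned points.
4. Repeat steps 2 and 3 until the centroids stop changing.

Initialising from k random data points and keeping the old centroid for an empty cluster are assumptions made for this sketch.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                       # step 1: k random data points as centroids
    labels = []
    for _ in range(n_iter):
        # step 2: assign each point to its nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # step 3: move each centroid to the mean of its assigned points
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])          # empty cluster: keep the old centroid
        if new_centroids == centroids:                      # step 4: stop when nothing changes
            break
        centroids = new_centroids
    return labels, centroids
```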
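What is a self-organising map (SOM)?
Similar to k-means, but the nodes (cluster centres) sit on a fixed grid: for each data point, the closest node and its grid neighbours are all moved towards the point, instead of only the closest centroid.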
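A tiny sketch of the difference from k-means. It uses scalar data, a 1-D grid of nodes and a fixed neighbourhood radius. These choices, and the linearly decreasing learning rate, are illustrative assumptions rather than a standard SOM configuration.

```python
import random


def train_som_1d(data, n_nodes=4, n_epochs=50, alpha0=0.5, radius=1, seed=0):
    """Tiny 1-D self-organising map on scalar data (illustrative sketch)."""
    rng = random.Random(seed)
    nodes = rng.sample(data, n_nodes)                   # initialise node values from the data
    for epoch in range(n_epochs):
        alpha = alpha0 * (1 - epoch / n_epochs)         # learning rate shrinks towards 0
        for x in rng.sample(data, len(data)):           # visit the data in random order
            # winner: the node whose value is closest to x (like a k-means assignment)
            win = min(range(n_nodes), key=lambda j: abs(x - nodes[j]))
            # update the winner AND its grid neighbours (k-means would update only the winner)
            for j in range(max(0, win - radius), min(n_nodes, win + radius + 1)):
                nodes[j] += alpha * (x - nodes[j])
    return nodes
```

Each update moves a node part of the way towards a data point, so the nodes stay within the range of the data. Because neighbours are dragged along together, adjacent nodes on the grid end up representing nearby regions of the data.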
What is Procrustes Analysis?
Given two configurations X and Y of the same n points, find the translation, rotation and scale change of Y that bring it as close as possible to X, i.e. that minimise $G(X,Y) = \sum_{i=1}^{n}\sum_{k} (x_{ik} - y_{ik})^2$.
Steps of Procrustes
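A NumPy sketch of one standard way to carry out these steps:

1. Centre X and Y, which removes translation.
2. Take the SVD $Y_c^T X_c = U S V^T$. The best orthogonal matrix is $A = U V^T$.
3. The best scale is $\rho = \operatorname{tr}(S) / \operatorname{tr}(Y_c^T Y_c)$.
4. The fitted configuration is $\rho Y_c A$ moved to X's centroid.

The orthogonal A may include a reflection as well as a rotation. The function name is hypothetical.

```python
import numpy as np


def procrustes(X, Y):
    """Match Y to X by translation, orthogonal rotation and scaling; return fitted Y and G."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my                   # step 1: centre both configurations
    U, s, Vt = np.linalg.svd(Yc.T @ Xc)       # step 2: SVD of Y'X
    A = U @ Vt                                # optimal orthogonal matrix
    rho = s.sum() / np.trace(Yc.T @ Yc)       # step 3: optimal scale factor
    Y_fit = rho * Yc @ A + mx                 # step 4: transformed Y, placed at X's centroid
    G = float(((X - Y_fit) ** 2).sum())       # residual G(X, Y_fit)
    return Y_fit, G
```

If Y is an exactly rotated, scaled and shifted copy of X, the fitted Y coincides with X and G is 0 up to round-off.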
What is the Generalised Procrustes Analysis algorithm?
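A simplified sketch of the generalised version for several configurations:

1. Centre every configuration.
2. Rotate each one onto the current mean (consensus) configuration with an orthogonal Procrustes fit.
3. Recompute the mean.
4. Repeat steps 2 and 3 until the mean stops changing.

Scaling is left out here as an assumption, because allowing free scale changes needs an extra size constraint to stop the configurations from shrinking. Starting from the first configuration is also a choice made for this sketch.

```python
import numpy as np


def rotate_onto(Y, target):
    """Orthogonal Procrustes rotation of a centred Y onto a centred target (no scaling)."""
    U, _, Vt = np.linalg.svd(Y.T @ target)
    return Y @ (U @ Vt)


def generalised_procrustes(configs, n_iter=100, tol=1e-12):
    Z = [np.asarray(C, dtype=float) - np.mean(C, axis=0) for C in configs]  # remove translations
    mean = Z[0].copy()                              # start from the first configuration
    for _ in range(n_iter):
        Z = [rotate_onto(C, mean) for C in Z]       # align every configuration to the mean
        new_mean = sum(Z) / len(Z)                  # recompute the consensus configuration
        converged = np.sum((new_mean - mean) ** 2) < tol
        mean = new_mean
        if converged:
            break
    return Z, mean
```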
(end of Block 2)