Give two reasons why unsupervised learning is often more challenging than supervised learning
Describe how principal components analysis works
PCA transforms a high-dimensional dataset into a smaller, more manageable set of representative variables (the principal components) that capture most of the information in the original dataset; it is especially useful when the variables are highly correlated
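A minimal numpy sketch of this idea: center the data, take the SVD, and project onto the top principal component loadings. The toy data, the artificially correlated column, and the choice of k = 2 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]   # make two columns highly correlated (assumed toy data)

Xc = X - X.mean(axis=0)                  # center each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
scores = Xc @ Vt[:k].T                   # scores on the first k principal components
pve = S**2 / np.sum(S**2)                # proportion of variance explained by each PC
```

Because the singular values are sorted in decreasing order, `pve` is decreasing, and summing its first few entries shows how much information a low-dimensional representation retains.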
Describe how centering and scaling the variables will affect the results of principal components analysis
Describe the drawbacks (or limitations) of principal components analysis
Explain how K-means clustering works
K-means clustering assigns each observation in a dataset to one of K relatively homogeneous clusters, where K is specified up front.
First, we randomly assign K points to be the initial cluster centers. Then we iterate:
1. Assign each observation to the closest cluster center based on Euclidean distance
2. Recalculate the center of each of the K clusters
3. Repeat until the cluster assignments no longer change
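The steps above can be sketched in numpy as follows; the two-blob toy data and K = 2 are assumptions for illustration.

```python
import numpy as np

# Assumed toy data: two well-separated groups of points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
K = 2

# Randomly pick K observations as initial cluster centers.
centers = X[rng.choice(len(X), K, replace=False)]
labels = np.zeros(len(X), dtype=int)
while True:
    # Step 1: assign each observation to the nearest center (Euclidean distance).
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    new_labels = dists.argmin(axis=1)
    # Step 3: stop when the assignments no longer change.
    if np.array_equal(new_labels, labels):
        break
    labels = new_labels
    # Step 2: recompute each center as the mean of its members
    # (keeping the old center if a cluster happens to be empty).
    centers = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                        else centers[k] for k in range(K)])
```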
Explain what the term “K-means” refers to
The algorithm iteratively calculates the K means (centers) of the clusters, hence the name
Explain why it is desirable to run a K-means clustering algorithm multiple times
This is because the K-means algorithm is guaranteed to converge to a local, but not necessarily the global, optimum. The initial cluster assignments determine which local optimum is reached, so running the algorithm multiple times (say, 20 to 50) with different random initializations increases the chance of identifying the global optimum and getting a representative cluster grouping
Explain how the elbow method can be used to select the value of K
Plot the proportion of variance explained (equal to the between-cluster variation divided by the total variation in the data) against the number of clusters K. The elbow of this plot marks the point where the proportion of variance explained begins to plateau; clusters added beyond the elbow yield little improvement, so choose K at the elbow.
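A sketch of the elbow computation on toy data with three well-separated groups. The hand-rolled k-means helper (with a few random restarts, reflecting the multiple-runs advice above) is an assumption for illustration, not a specific library routine.

```python
import numpy as np

# Assumed toy data: three well-separated groups.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(m, 0.4, (40, 2)) for m in (0, 4, 8)])

def kmeans(X, K, iters=25, restarts=10):
    # Simple Lloyd's iterations with random restarts (illustrative helper).
    best_within, best_labels = np.inf, None
    for _ in range(restarts):
        centers = X[rng.choice(len(X), K, replace=False)]
        for _ in range(iters):
            labels = np.linalg.norm(X[:, None] - centers, axis=2).argmin(axis=1)
            centers = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                                else centers[k] for k in range(K)])
        within = sum(((X[labels == k] - centers[k]) ** 2).sum() for k in range(K))
        if within < best_within:
            best_within, best_labels = within, labels
    return best_within, best_labels

total_ss = ((X - X.mean(axis=0)) ** 2).sum()
# PVE = between-cluster SS / total SS = 1 - within-cluster SS / total SS
pve = [1 - kmeans(X, K)[0] / total_ss for K in range(1, 7)]
# pve rises steeply up to K = 3, then plateaus: the elbow suggests K = 3.
```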
Explain how hierarchical clustering works
Hierarchical clustering consists of a series of successive fusions. It is a bottom-up (agglomerative) method that starts with each observation treated as its own cluster, then repeatedly fuses the closest pair of clusters, one pair at a time. The process iterates until all observations are merged into a single cluster
Explain the difference between average linkage and centroid linkage
Explain the two differences between K-means clustering and hierarchical clustering
K-means
* Randomization is needed to determine initial cluster centers
* Number of clusters is pre-specified
* Clusters are not nested
Hierarchical clustering
* Randomization is not needed
* Number of clusters is not pre-specified
* Clusters are nested
Explain how scaling the variables will affect the results of hierarchical clustering
Without scaling: variables measured on larger scales dominate the distance calculations and exert a disproportionate impact on cluster assignments
With scaling: equal importance is attached to each feature when performing distance calculations
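A small numeric illustration of this point; the income/age toy values are assumptions. Unscaled, the large-scale variable (income) drives the distances; after standardizing, both variables contribute comparably and the nearest neighbor can change.

```python
import numpy as np

# Assumed toy example: income (large scale) vs. age (small scale).
X = np.array([[30000.0, 25.0],
              [31000.0, 60.0],
              [90000.0, 26.0]])

# Unscaled: income dominates, so rows 0 and 1 look "close"
# despite a 35-year age gap.
d_unscaled = (np.linalg.norm(X[0] - X[1]), np.linalg.norm(X[0] - X[2]))

# Standardize each variable, giving both equal weight in the distance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
d_scaled = (np.linalg.norm(Z[0] - Z[1]), np.linalg.norm(Z[0] - Z[2]))
# After scaling, row 0 is closer to row 2 (similar age) than to row 1.
```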
Explain two ways in which clustering can be used to generate features for predicting a target variable
Generates features in two ways
1. Cluster groups: the group assignments produced by clustering form a factor variable that can be used as a feature to predict the target variable
2. Cluster centers: replace the original variables with their cluster centers, which serve as numeric features. Two advantages of this approach:
* Interpretation: cluster centers provide a numeric summary of the characteristics of the observations in each cluster
* Prediction: cluster centers retain the numeric characteristics of the observations, and these summarized characteristics can help produce better predictions
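Both feature types can be built in a couple of lines; here the cluster labels come from a hand-rolled nearest-center assignment on toy data, an assumption for illustration.

```python
import numpy as np

# Assumed toy data: two groups, with centers taken as given for illustration.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
labels = np.linalg.norm(X[:, None] - centers, axis=2).argmin(axis=1)

# 1. Cluster groups: the label is a factor feature (here one-hot encoded).
group_feature = np.eye(len(centers))[labels]   # shape (60, 2)

# 2. Cluster centers: replace each observation with its cluster's center,
#    keeping a numeric summary of that cluster's characteristics.
center_features = centers[labels]              # shape (60, 2)
```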
What are the properties of principal components?
What are two applications of PCA?
What is the tradeoff of increasing the number of PCs (M) to use?
As M increases:
* cumulative PVE increases
* dimension increases
* (if a target variable y exists) model complexity increases
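The cumulative PVE side of this tradeoff can be computed directly; the toy low-rank data and the 90% cutoff below are assumptions (a common rule of thumb, not a fixed rule).

```python
import numpy as np

# Assumed toy data: 8 variables driven by ~3 underlying dimensions.
rng = np.random.default_rng(4)
A = rng.normal(size=(200, 3))
X = A @ rng.normal(size=(3, 8))

Xc = X - X.mean(axis=0)
S = np.linalg.svd(Xc, compute_uv=False)
cum_pve = np.cumsum(S**2) / np.sum(S**2)       # cumulative PVE as M grows

# Smallest M whose cumulative PVE reaches 90% (assumed threshold).
M = int(np.searchsorted(cum_pve, 0.90)) + 1
```

Because the data are effectively rank 3, the cumulative PVE saturates by the third component, so increasing M beyond that adds dimension (and, with a target, complexity) without capturing more variance.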
How can we choose the number of principal components (M) to use?
Why might we use complete and average linkage over single and centroid?
List two ways you can perform feature generation using PCA