What is Cluster Analysis?
A multivariate statistical technique that groups observations on the basis of some of the features or variables that describe them
observations in a dataset can be divided into different groups
example: clustering by geographic proximity
or language
or Market Segmentation
What is the goal of Cluster Analysis?
To maximize the similarity of observations within a cluster and maximize the dissimilarity between clusters
When is clustering most often used?
Often as a preliminary step in many types of analysis
it is a useful technique for exploring and identifying patterns in the data
Data Scientists often turn to it when they have no idea where to start or what to expect
What is a key distinguishing trait of supervised learning?
We are dealing with labeled data
What is the Euclidean distance?
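A minimal sketch of the standard definition: the Euclidean distance is the straight-line distance between two points, computed as the square root of the sum of squared coordinate differences. The function name euclidean_distance is illustrative and not part of the original notes.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square each coordinate difference, add them up, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

The same function works for any number of dimensions, as long as both points have the same length.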
What is a Centroid?
the mean position of a group of points
aka - center of mass
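As a minimal sketch, the mean position can be found by averaging each coordinate separately. The helper name centroid and the sample points are assumptions made for illustration.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty group of points."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    # zip(*points) groups all x values together, all y values together, and so on.
    return tuple(sum(coords) / n for coords in zip(*points))

# Corners of a 2 x 2 square: the centroid is its middle.
print(centroid([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (1.0, 1.0)
```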
What does K in K-means clustering stand for?
The number of clusters
What is the proper way of selecting the number of clusters?
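The elbow method: a common heuristic rather than a strict rule
plot the within-cluster sum of squares against the number of clusters and pick the point where the curve bends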
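The elbow method can be sketched in plain Python as follows: run K-means for several values of K, record the within-cluster sum of squares (the total squared distance from each point to its cluster center) and look for the K after which the curve flattens. Everything here is an assumption made for illustration: the helper names, the toy data, and the deterministic farthest-point initialisation (real implementations usually use random or k-means++ starts).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans_wcss(points, k, max_iter=100):
    """Run a basic K-means and return the within-cluster sum of squares."""
    # Illustrative deterministic start: first point, then repeatedly the
    # point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = mean_point(members)
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

# Toy data: three well-separated groups of four points each.
data = [(1, 1), (1, 2), (2, 1), (2, 2),
        (8, 8), (8, 9), (9, 8), (9, 9),
        (1, 8), (1, 9), (2, 8), (2, 9)]

curve = {k: kmeans_wcss(data, k) for k in range(1, 6)}
for k, value in curve.items():
    print(k, round(value, 2))
```

On this toy data the value falls sharply up to K = 3 and barely changes from K = 3 to K = 4, so the bend suggests three clusters. On real data the bend is often less clear and reading it takes judgment.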
What is Clustering about?
What does WCSS stand for?
Within-cluster sum of squares
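for a fixed number of clusters, a lower WCSS means tighter clusters
minimizing WCSS alone does not give the best solution: it reaches 0 when every observation is its own cluster, so it has to be balanced against the number of clusters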
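A minimal sketch of how WCSS can be computed for a given grouping: for each cluster, sum the squared distances from its points to the cluster centroid, then add the clusters together. The function name wcss and the toy data are assumptions made for illustration.

```python
def wcss(clusters):
    """Within-cluster sum of squares for a list of clusters (each a list of points)."""
    total = 0.0
    for points in clusters:
        n = len(points)
        # Centroid of this cluster: the coordinate-wise mean.
        center = [sum(c) / n for c in zip(*points)]
        for p in points:
            total += sum((x - m) ** 2 for x, m in zip(p, center))
    return total

data = [(1, 1), (1, 3), (9, 9), (9, 11)]

one_cluster = [data]                  # everything in a single cluster
two_clusters = [data[:2], data[2:]]   # the two natural groups
singletons = [[p] for p in data]      # every point on its own

print(wcss(one_cluster))   # 132.0
print(wcss(two_clusters))  # 4.0
print(wcss(singletons))    # 0.0
```

The last line shows the degenerate case: with one cluster per observation WCSS is 0, even though nothing has been grouped.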
What are pros of K-Means Clustering?
What are some cons of K-means Clustering?
What are the 3 Types of Analysis?
What are characteristics of Exploratory Analysis?
What are characteristics of Confirmatory and Explanatory Analysis?
using hypothesis testing and regression analysis
What are the two broad types of clustering?
What are the two types of Hierarchical Clustering?