What is unsupervised machine learning?
Unsupervised machine learning is a type of machine learning where the algorithm learns patterns and structures in the data without being provided with explicit labels or target variables.
What is K-Means clustering?
K-Means clustering is an unsupervised machine learning algorithm used for partitioning data into K clusters based on similarity. It aims to minimize the sum of squared distances between data points and their cluster centroids.
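To make the "minimize the sum of squared distances" objective concrete, here is a minimal from-scratch sketch of the standard alternating procedure (Lloyd's algorithm) in NumPy. The function name `kmeans` and the toy data are invented for illustration; a production system would normally use a library implementation instead.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-Means sketch: alternate between assigning points to their
    nearest centroid and moving each centroid to the mean of its points,
    which reduces the within-cluster sum of squared distances."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (keeping the old centroid if a cluster happens to be empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break  # converged: assignments no longer change the centroids
        centroids = new
    return labels, centroids

# Toy data: two well-separated blobs around (0, 0) and (10, 10).
X = np.vstack([np.random.default_rng(1).normal(0, 0.5, (20, 2)),
               np.random.default_rng(2).normal(10, 0.5, (20, 2))])
labels, centroids = kmeans(X, k=2)
```

In practice `sklearn.cluster.KMeans` offers the same idea with smarter initialisation (k-means++) and multiple restarts to reduce the sensitivity to initial centroid placement mentioned below.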
What are the advantages of K-Means clustering?
Advantages of K-Means clustering include its simplicity, scalability to large datasets, and effectiveness in identifying well-separated spherical clusters.
When should you use K-Means clustering?
K-Means clustering is suitable when the data is continuous and there is a need to partition it into distinct groups based on similarity or proximity; common applications include customer segmentation and image compression.
What are the limitations of K-Means clustering?
Limitations of K-Means clustering include sensitivity to the initial placement of cluster centroids, the requirement to specify the number of clusters in advance, and the assumption of spherical clusters.
What is DBSCAN clustering?
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised machine learning algorithm that groups data points into clusters based on density. It can find clusters of arbitrary shapes and handle outliers.
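A minimal from-scratch sketch of the density-based idea, assuming Euclidean distance (the function name `dbscan` and the toy data are invented for this example): points with at least `min_pts` neighbours within radius `eps` are "core" points, clusters grow outward from cores, and anything unreachable is labelled noise.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN sketch: expand clusters from core points (points with
    at least min_pts neighbours within radius eps); points reachable from
    no core point are labelled -1 (noise)."""
    n = len(X)
    labels = np.full(n, -1)                       # -1 = noise / unassigned
    dists = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i] or len(neighbours[i]) < min_pts:
            continue                               # not an unvisited core point
        # Iteratively expand a new cluster outward from core point i.
        stack = [i]
        visited[i] = True
        while stack:
            p = stack.pop()
            labels[p] = cluster
            if len(neighbours[p]) >= min_pts:      # only core points expand
                for q in neighbours[p]:
                    if not visited[q]:
                        visited[q] = True
                        stack.append(q)
        cluster += 1
    return labels

# Toy data: two dense clumps plus a lone outlier far from both.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (15, 2)),
               rng.normal(5, 0.2, (15, 2)),
               [[20.0, 20.0]]])
labels = dbscan(X, eps=1.0, min_pts=3)
```

Note that the number of clusters falls out of the expansion process rather than being specified up front, and the outlier is simply left at -1, which is the behaviour the next answers describe.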
What are the advantages of DBSCAN clustering?
Advantages of DBSCAN clustering include its ability to discover clusters of various shapes, its robustness to noise and outliers, and the ability to determine the number of clusters automatically.
When should you use DBSCAN clustering?
DBSCAN clustering is suitable when clusters have irregular shapes, when the number of clusters is not known in advance, and when noise or outliers need to be identified.
What are the limitations of DBSCAN clustering?
Limitations of DBSCAN clustering include sensitivity to the choice of its distance parameters (the neighbourhood radius eps and the minimum number of points), difficulty in handling clusters of widely varying densities with a single eps value, and degraded performance in high-dimensional spaces where distance measures become less meaningful.
What is hierarchical clustering?
Hierarchical clustering is an unsupervised machine learning algorithm that creates a hierarchy of clusters. It iteratively merges or divides clusters based on their similarity, forming a tree-like structure called a dendrogram.
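The agglomerative (bottom-up) variant can be sketched in a few lines: start with every point in its own cluster and repeatedly merge the two closest clusters. This illustrative implementation uses single linkage (cluster distance = closest cross-cluster pair); the function name `agglomerative` and the toy data are invented for the example.

```python
import numpy as np

def agglomerative(X, n_clusters):
    """Minimal agglomerative clustering sketch with single linkage: start
    with singleton clusters, then repeatedly merge the two clusters whose
    closest pair of points is nearest, until n_clusters remain."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single-linkage distance: minimum over cross-cluster pairs.
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters.pop(b)   # merge cluster b into cluster a
    return clusters

# Toy data: three nearby points and a distant pair.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1],
              [9.0, 9.0], [9.1, 9.0]])
groups = agglomerative(X, n_clusters=2)
```

Recording the distance at which each merge happens is what produces the dendrogram mentioned above; in practice `scipy.cluster.hierarchy.linkage` and `dendrogram` handle both the merging and the tree visualisation.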
What are the advantages of hierarchical clustering?
Advantages of hierarchical clustering include its ability to reveal the hierarchical structure of the data, its flexibility in handling different similarity measures, and the visualization provided by dendrograms.
When should you use hierarchical clustering?
Hierarchical clustering is suitable when the data has a hierarchical structure, and the goal is to explore relationships and similarities at different levels of granularity.
What are the limitations of hierarchical clustering?
Limitations of hierarchical clustering include its computational complexity for large datasets, sensitivity to the choice of distance or similarity measures, and difficulty in handling noise and outliers.
How do you determine the optimal number of clusters in K-Means clustering?
The optimal number of clusters in K-Means clustering can be determined using techniques such as the elbow method, silhouette analysis, or visual inspection of cluster quality.
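The elbow method can be sketched as follows: run K-Means for a range of k values, record the inertia (within-cluster sum of squared distances) for each, and look for the k where the curve stops dropping steeply. The helper `kmeans_inertia` and the toy data are invented for this illustration.

```python
import numpy as np

def kmeans_inertia(X, k, n_iter=50, seed=0):
    """Run a basic K-Means and return its inertia (within-cluster SSE),
    the quantity the elbow method plots against k."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    return float(((X - centroids[labels]) ** 2).sum())

# Toy data: three blobs, so the elbow should typically appear near k = 3.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.3, (25, 2)) for c in (0, 5, 10)])
inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
# Inertia generally falls as k grows; the elbow method looks for the k
# after which further increases yield only small improvements.
```

In practice one would plot `inertias` against k (e.g. with matplotlib) and read off the bend, or use scikit-learn's `KMeans(...).inertia_` attribute instead of the hand-rolled helper.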
What is the silhouette coefficient used for in clustering?
The silhouette coefficient is a measure of how well each data point fits into its assigned cluster in terms of both cohesion and separation. It ranges from -1 to 1, where higher values indicate better clustering quality.
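The definition above can be computed directly: for each point, take a = the mean distance to the other members of its own cluster (cohesion) and b = the mean distance to the nearest other cluster (separation), and score the point as (b - a) / max(a, b). This from-scratch sketch (function name and toy data invented for the example) shows that a geometry-respecting labelling scores near 1 while an arbitrary one scores much lower.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient: a = mean intra-cluster distance,
    b = mean distance to the nearest other cluster, s = (b - a) / max(a, b),
    averaged over all points. Ranges from -1 to 1."""
    dists = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = (labels == labels[i])
        same[i] = False                      # exclude the point itself
        if not same.any():                   # convention: singleton scores 0
            scores.append(0.0)
            continue
        a = dists[i][same].mean()
        b = min(dists[i][labels == c].mean()
                for c in set(labels.tolist()) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two tight, well-separated groups of three points each.
X = np.array([[0.0, 0], [0.1, 0], [0.2, 0], [5.0, 0], [5.1, 0], [5.2, 0]])
good = silhouette(X, np.array([0, 0, 0, 1, 1, 1]))  # matches the geometry
bad = silhouette(X, np.array([0, 1, 0, 1, 0, 1]))   # ignores the geometry
```

`sklearn.metrics.silhouette_score` provides the same measure; comparing its value across candidate k values is the "silhouette analysis" mentioned in the previous answer.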
What is the difference between K-Means and Hierarchical Clustering?
K-Means Clustering is a partitioning-based algorithm that requires specifying the number of clusters in advance, while Hierarchical Clustering is an agglomerative or divisive algorithm that creates a hierarchy of clusters without the need for a predetermined number of clusters.
What are the Three C’s of ML?
Three C’s of ML:
1. Collaborative filtering: a technique for building recommendation systems from the preferences of many users.
2. Clustering: algorithms that discover structure in unlabelled collections of data.
3. Classification: a form of ‘supervised’ learning that assigns data points to predefined categories.