What is soft clustering and its characteristics
Data points can belong to multiple clusters simultaneously, each with a probability representing its degree of membership
Useful for overlapping or ambiguous data
Provides more nuanced results
More computationally intensive and harder to interpret than hard clustering
What is Soft K-means
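A variant of k-means in which the distance from each point to every centroid is weighted by U, the point's degree of membership in that cluster
U is a continuous value between 0 and 1 giving the probability that the point belongs to the cluster; for each point, the U values sum to 1 across all clusters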
How is U calculated to determine the probability that a point belongs to a cluster
β is called the stiffness, or inverse temperature
Small β (high temperature) gives very soft assignments: every cluster gets some weight
Large β (low temperature) approaches hard k-means: assignments become nearly 0/1
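As a rough sketch of how U could be computed, the snippet below assumes the usual softmax form, u_ik = exp(-β·d_ik²) / Σ_j exp(-β·d_ij²), where d_ik is the Euclidean distance from point i to centroid k. The function name soft_assignments and the sample points are illustrative, not part of these notes.

```python
import math

def soft_assignments(points, centroids, beta):
    """Return U, where U[i][k] is the weight of point i in cluster k.

    Assumes the softmax form exp(-beta * squared distance), normalized
    over clusters (an assumption; other weightings exist).
    """
    U = []
    for p in points:
        # Squared Euclidean distance from this point to every centroid.
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        # Shift by the smallest distance so exp() cannot underflow to all
        # zeros; the shift cancels out in the normalization below.
        d_min = min(d2)
        w = [math.exp(-beta * (d - d_min)) for d in d2]
        total = sum(w)
        U.append([x / total for x in w])
    return U

# Hypothetical 2-D data: the middle point is closer to the first centroid.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centroids = [(0.0, 0.0), (4.0, 0.0)]

soft = soft_assignments(points, centroids, beta=0.01)  # high temperature
hard = soft_assignments(points, centroids, beta=10.0)  # low temperature
```

With the small β, the middle point's weights stay close to an even split between the two clusters; with the large β, almost all of its weight goes to the nearer centroid, which mirrors the behaviour described above.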
How to calculate the new centroid for Soft K-means
Take the weighted mean of the points: multiply each point by its weight U for that cluster, sum the results, and divide by the sum of the weights
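A minimal sketch of this update, assuming points are tuples of coordinates and U[i][k] is the weight of point i in cluster k. The name update_centroids and the example weights are hypothetical.

```python
def update_centroids(points, U):
    """Return one centroid per cluster as the U-weighted mean of the points.

    Assumes every cluster has a total weight greater than zero.
    """
    n_clusters = len(U[0])
    dim = len(points[0])
    centroids = []
    for k in range(n_clusters):
        # Total weight assigned to cluster k across all points.
        total = sum(U[i][k] for i in range(len(points)))
        # Weighted sum of each coordinate, divided by the total weight.
        centroid = tuple(
            sum(U[i][k] * points[i][d] for i in range(len(points))) / total
            for d in range(dim)
        )
        centroids.append(centroid)
    return centroids

# Hypothetical weights: the middle point is shared equally by both clusters.
points = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
U = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
new_centroids = update_centroids(points, U)
```

Because the shared point contributes half its weight to each cluster, both centroids are pulled toward it rather than sitting on the outer points.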
What is the Gaussian mixture model
A probabilistic model that assumes the data is generated from a mixture of Gaussian distributions
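To make the "generated from a mixture" assumption concrete, here is a small sketch of the generative process in one dimension: pick a component according to its mixing weight, then draw a value from that component's Gaussian. The function name sample_gmm and all parameter values are made up for illustration.

```python
import random

def sample_gmm(n, weights, means, stds, seed=0):
    """Draw n (value, component) pairs from a 1-D Gaussian mixture."""
    rng = random.Random(seed)  # seeded so the draw is repeatable
    samples = []
    for _ in range(n):
        # Step 1: choose a component with probability given by its weight.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Step 2: draw a value from that component's Gaussian.
        x = rng.gauss(means[k], stds[k])
        samples.append((x, k))
    return samples

# Hypothetical mixture: 30% of points near -2, 70% of points near 3.
data = sample_gmm(1000, weights=[0.3, 0.7], means=[-2.0, 3.0], stds=[0.5, 1.0])
```

Fitting a Gaussian mixture model runs this process in reverse: given only the values, it estimates the weights, means, and spreads, along with a soft assignment of each point to each component.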