Unlabeled Data
Any data that's not organized in an easily identifiable manner is known as unstructured or unlabeled data.
Goals of Unsupervised Learning
The goal is to learn about the data's underlying structure and to find out how different features relate to each other.
Name 2 methodologies of unsupervised learning
Briefly describe a recommendation system
Recommendation systems are a subclass of machine learning algorithms that offer relevant suggestions to users.
They can be either supervised or unsupervised.
What is the goal of a recommendation system?
To quantify how similar one thing is to another, and to use this information to suggest a closely related option.
What is content-based filtering?
Content-based filtering is a type of recommendation system where comparisons are made based on the attributes of the content itself.
For example, attributes of a song you played are compared to attributes of other songs to determine similarity.
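As a rough illustration of comparing attributes, the sketch below scores songs by cosine similarity and suggests the closest match. The song names, the three attributes, and their values are made up for the example, and cosine similarity is only one of several possible similarity measures.

```python
import math

# Hypothetical song attributes: (tempo, energy, acousticness), each scaled to 0..1.
songs = {
    "played_song": (0.80, 0.90, 0.10),
    "song_a": (0.75, 0.85, 0.15),
    "song_b": (0.20, 0.30, 0.90),
}

def cosine_similarity(u, v):
    """Return the cosine of the angle between two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(target, catalog):
    """Return the name of the catalog entry most similar to `target`."""
    candidates = [name for name in catalog if name != target]
    return max(
        candidates,
        key=lambda name: cosine_similarity(catalog[target], catalog[name]),
    )

recommendation = most_similar("played_song", songs)
print(recommendation)  # song_a
```

Only the attributes of the content are used here; no information about other users is needed.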
What are some benefits of content-based filtering?
What are some drawbacks of content-based filtering?
What is collaborative filtering?
Collaborative filtering is a type of recommendation system that uses the likes and dislikes of users to make recommendations.
It does not need to know anything about the content itself. All that matters is whether users liked it.
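A minimal sketch of the idea, assuming likes are stored as one set per user: find the most similar other user by overlap of liked items (Jaccard similarity, chosen here for simplicity) and suggest what that user liked. The user names and items are hypothetical.

```python
# Hypothetical like data: user -> set of items the user liked.
likes = {
    "ana": {"song_1", "song_2", "podcast_1"},
    "ben": {"song_1", "song_2", "podcast_1", "movie_1"},
    "cy": {"song_9"},
}

def jaccard(a, b):
    """Overlap between two sets of liked items, from 0 (none) to 1 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend(user, likes):
    """Suggest items liked by the most similar other user that `user` has not liked yet."""
    others = [name for name in likes if name != user]
    nearest = max(others, key=lambda name: jaccard(likes[user], likes[name]))
    return sorted(likes[nearest] - likes[user])

suggestions = recommend("ana", likes)
print(suggestions)  # ['movie_1']
```

Note that the code never inspects what a song, podcast, or movie actually is; it only looks at who liked what.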
What are some benefits of collaborative filtering?
The benefits include
- the ability to recommend across content types,
- finding hidden correlations in the data, and
- not requiring tedious manual mapping.
What are some drawbacks of collaborative filtering?
Drawbacks include
- needing lots of data to even start getting useful results,
- requiring every user to give the system lots of data, and
- dealing with sparse data that has a lot of missing values.
What type of model is K-means and what does it do?
What is a Centroid?
List the 4 steps to build a K-means model
What is the difference between clustering and partitioning algorithms?
Clustering algorithms: outlying points can exist outside of the clusters.
Partitioning algorithms: all points must be assigned to a cluster.
In other words, K-means does not allow unassigned outliers.
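To make the partitioning behavior concrete, here is a small sketch of nearest-centroid assignment, the rule K-means uses to label points. The centroids and points are hypothetical; the last point is deliberately far from everything else, yet it still receives a cluster label.

```python
import math

# Hypothetical 2-D centroids and points; the last point is a far-away outlier.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.2), (9.5, 10.1), (100.0, 100.0)]

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

labels = [nearest_centroid(p, centroids) for p in points]
print(labels)  # [0, 1, 1]
```

There is no "unassigned" label in this scheme: the outlier simply joins whichever centroid happens to be least far away.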
What is k in the "Initiate k centroids" step?
k = the number of centroids in your model, which is how many clusters you'll have.
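The sketch below shows how k determines the number of clusters in a basic K-means loop. It follows a common textbook formulation (assign points to the nearest centroid, then move each centroid to the mean of its points), which may not match these notes' four steps word for word. The data, the fixed starting centroids, and the `k_means` helper are all hypothetical; real implementations usually pick starting centroids randomly and stop on convergence rather than after a fixed number of iterations.

```python
import math

def k_means(points, initial_centroids, n_iterations=10):
    """Run a basic k-means loop; k is simply len(initial_centroids)."""
    centroids = list(initial_centroids)
    for _ in range(n_iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [math.dist(p, c) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its assigned points.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, clusters

# Hypothetical data: three well-separated groups, so k = 3.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 9), (2, 9), (1, 8)]
centroids, clusters = k_means(points, initial_centroids=[(0, 0), (10, 10), (0, 10)])
print(len(centroids), [len(c) for c in clusters])  # 3 [3, 3, 3]
```

Passing three starting centroids yields exactly three clusters; changing that number is how you change k.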
Who decides the value of k?
You do.
How do you choose the value of k?
Sometimes it is known in advance: for instance, if there are 3 species of beetle to cluster, then k = 3. Sometimes it is unknown.
Name 2 other clustering methodologies
What cluster shape does K-means work best with?
Round clusters
DBSCAN (density-based spatial clustering of applications with noise)
DBSCAN Hyperparameters
eps (epsilon, ε) and min_samples
DBSCAN: eps, Epsilon (ε)
The radius of your search area from any given point.
DBSCAN: min_samples
The minimum number of samples in an ε-neighborhood for a point to be considered a core point (including the point itself).
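The two hyperparameters can be illustrated with a small core-point check. This is only a sketch of the core-point definition, not a full DBSCAN implementation; the `core_points` helper and the data are hypothetical, while the names `eps` and `min_samples` mirror the parameters of scikit-learn's `DBSCAN`.

```python
import math

def core_points(points, eps, min_samples):
    """Return the points whose eps-neighborhood holds at least min_samples points.

    The neighborhood count includes the point itself.
    """
    cores = []
    for p in points:
        neighbors = [q for q in points if math.dist(p, q) <= eps]
        if len(neighbors) >= min_samples:
            cores.append(p)
    return cores

# Hypothetical data: a tight group of four points plus one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
cores = core_points(points, eps=1.5, min_samples=4)
print(cores)  # [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
```

The isolated point has only itself within radius eps, so it is not a core point; in DBSCAN such a point can end up labeled as noise instead of being forced into a cluster.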