t/f: data mining has one definition
false, has many
data mining
origins of data mining
ideas come from many disciplines including machine learning/AI, pattern recognition, statistics, and database systems
compared to data mining, traditional techniques may be unsuitable due to
types of data mining algorithms
supervised algorithms (classification) and unsupervised algorithms (clustering)
supervised algorithms
unsupervised algorithms
approaches for supervised (classification)
classification: description
classification goal
previously unseen records should be assigned to a class as accurately as possible
classification: a ____ is used to determine the accuracy of the model.
test set
- usually, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it
classification approach: k-Nearest Neighbor
look at characteristics/attributes
“if it walks like a duck and quacks like a duck, then it’s probably a duck”
nearest neighbor requires 3 things
to classify an unknown record
choosing the value of k
if k is too small, model is sensitive to noise.
if k is too large, neighborhood may include too many points from other classes
unsupervised algorithm: clustering
given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
- data points in one cluster are more similar to one another
- data points in separate clusters are less similar to one another
clustering similarity measures
examples of classification
examples of clustering