What is Discretisation, and where might it be used?
Discretisation = the translation of continuous (numeric) attributes into nominal (discrete) attributes.
It might be used with learners such as Decision Trees, which generally work better with nominal attributes.
Summarise some approaches to supervised discretisation
What is Equal Width?
Equal width is an unsupervised method: it splits the attribute's range into buckets of the same width.
1) range = max value - min value
2) width of each bucket = range / number of buckets
3) the bucket boundaries are min + width, min + 2 × width, …, up to max
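Here is a minimal Python sketch of the three steps. It assumes the attribute values are plain numbers, and the function names are hypothetical. Bucket indices start at 0.

```python
import bisect

def equal_width_cut_points(values, num_buckets):
    """Interior bucket boundaries for equal-width discretisation."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets        # steps 1-2: range / number of buckets
    # step 3: min + width, min + 2*width, ...; max itself closes the last bucket
    return [lo + i * width for i in range(1, num_buckets)]

def equal_width_discretise(values, num_buckets):
    """Map every value to a bucket index 0 .. num_buckets - 1."""
    cuts = equal_width_cut_points(values, num_buckets)
    # bisect_right counts the boundaries <= x, which is x's bucket index;
    # a value exactly on a boundary therefore lands in the upper bucket
    return [bisect.bisect_right(cuts, x) for x in values]
```

For example, the values 1, 3, 4, 7, 9, 10 split into 3 buckets give width 3 and boundaries 4.0 and 7.0. The values then map to buckets 0, 0, 1, 2, 2, 2. Note how unevenly the buckets are filled.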
What is Equal Frequency?
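Equal Freq. is an unsupervised method: each bucket receives (roughly) the same number of instances.
1) for a specific attribute, sort the instances in ascending order of that attribute's value
2) split the sorted list into as many equal-sized groups as we want buckets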
3) so that new data added later can also be transformed, define each dividing point at the median (midpoint) of the two neighbouring values on either side of the split
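Here is a minimal sketch of equal-frequency discretisation. It assumes that "dividing point at the median" means placing each cut midway between the last value of one bucket and the first value of the next, and that is only one reading of step 3. The function names are hypothetical.

```python
import bisect

def equal_frequency_cut_points(values, num_buckets):
    """Dividing points that give each bucket (roughly) the same number of values."""
    if not 1 <= num_buckets <= len(values):
        raise ValueError("need between 1 and len(values) buckets")
    ordered = sorted(values)                      # step 1: sort ascending
    n = len(ordered)
    cuts = []
    for i in range(1, num_buckets):
        split = i * n // num_buckets              # step 2: first index of bucket i
        # step 3 (assumed reading): dividing point midway between the
        # neighbouring values, so unseen values can still be placed in a bucket
        cuts.append((ordered[split - 1] + ordered[split]) / 2)
    return cuts

def assign_bucket(x, cuts):
    # index of the bucket x falls into (0 .. len(cuts))
    return bisect.bisect_right(cuts, x)
```

With the values 10, 1, 7, 3, 9, 4 and 3 buckets, the cuts are 3.5 and 8.0, so every bucket holds two values. A new value such as 5 falls into bucket 1.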
What is k-means in the context of discretisation?
K-means is a “clustering” approach, but it also works well for discretisation: cluster the values of a single attribute into k clusters, and each cluster becomes one interval (bucket).
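Here is a minimal sketch of k-means on one numeric attribute. The starting centroids are evenly spaced picks from the sorted values; this is an assumption, since k-means is often initialised randomly. The helper names are hypothetical. In one dimension, nearest-centroid assignment switches exactly midway between neighbouring centroids, so those midpoints serve as the interval boundaries.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Cluster one numeric attribute with k-means; returns the sorted centroids."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialisation: evenly spaced picks from the sorted values
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iterations):                  # at most t iterations
        clusters = [[] for _ in range(k)]
        for x in ordered:                            # n values x k distances
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # move each centroid to the mean of its cluster (keep it if empty)
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:                     # converged
            break
        centroids = updated
    return sorted(centroids)

def kmeans_cut_points(centroids):
    # in 1-D, nearest-centroid assignment switches midway between centroids
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
```

Each iteration does k distance checks for each of the n values, which is where the O(tkn) cost comes from. For the values 1, 2, 3, 10, 11, 12, 20, 21, 22 with k = 3, the centroids settle at 2, 11 and 21, and the cut points are 6.5 and 16.0.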
How to calculate the (sample) mean?
mean of a specific attribute C = (1/N) × (sum of Ci for i = 1..N)
How to calculate the standard deviation?
1) Sum the squared differences between each attribute value and the sample mean: sum of (Ci - mean)²
2) Divide by one less than the number of values (N - 1); this gives the sample variance
3) Take the positive square root
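The mean formula and the three standard-deviation steps translate directly into a short Python sketch. The function names are hypothetical. Python's standard `statistics.stdev` also divides by N - 1, so it can serve as a cross-check.

```python
import math

def sample_mean(values):
    # mean = (1/N) * sum of C_i
    return sum(values) / len(values)

def sample_std(values):
    m = sample_mean(values)
    squared_diffs = sum((c - m) ** 2 for c in values)   # step 1: sum of (C_i - mean)^2
    variance = squared_diffs / (len(values) - 1)         # step 2: divide by N - 1
    return math.sqrt(variance)                           # step 3: positive square root
```

For example, for 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared differences sum to 32. The sample standard deviation is therefore sqrt(32 / 7), roughly 2.14.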
How could we use the MEAN and STANDARD DEVIATION when building a classifier?
We could construct a Gaussian probability density function (e.g. one per class, as in Naïve Bayes). This lets us estimate the probability density (relative likelihood) of observing any given value, based on how many standard deviations it lies from the mean (its z-score).
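Here is a minimal sketch of that idea for a single numeric attribute. It estimates a mean and sample standard deviation per class, then scores a new value with each class's Gaussian density; that density depends on the value only through its z-score. The function names and toy data are hypothetical. Class priors are deliberately ignored, which amounts to assuming equal priors; a full Naïve Bayes classifier would multiply them in.

```python
import math

def z_score(x, mean, std):
    # number of standard deviations x lies from the mean
    return (x - mean) / std

def gaussian_density(x, mean, std):
    z = z_score(x, mean, std)
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def fit(values_by_class):
    """Estimate (mean, sample std) of one numeric attribute for each class."""
    params = {}
    for label, values in values_by_class.items():
        m = sum(values) / len(values)
        s = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
        params[label] = (m, s)
    return params

def most_likely_class(x, params):
    # assumes equal class priors: pick the class whose density at x is highest
    return max(params, key=lambda label: gaussian_density(x, *params[label]))
```

With "no" = 60, 65, 70 and "yes" = 80, 85, 90, both classes have standard deviation 5. The value 72 has z = 1.4 under "no" and z = -2.6 under "yes", so it is assigned to "no".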
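What is a z-score?
The number of standard deviations a value x lies from the mean: z = (x - mean) / standard deviation. A negative z-score means x is below the mean.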
What is a hyperparameter? What does it mean for the model to parametrise the data? How do these relate to the model being non-parametric / parametric?
???
What are the general two steps in discretisation?
1) Decide how many discrete values (= intervals/buckets) to map each feature onto
2) Map each continuous value onto a discrete value
Pros and Cons of K-means clustering
Pros - Efficient: O(tkn), where n = # of instances, k = # of clusters, t = # of iterations; normally k, t << n
Cons
Information-Based supervised discretisation
1) sort the values
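2) calculate the mean information (the entropy of each side of the split, weighted by its share of the instances) at each breakpoint in class membership
3) choose the breakpoint with the lowest mean information (i.e. the highest information gain)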
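Here is a minimal sketch of one split step. It sorts the values, computes the size-weighted entropy (mean information) at every point where the class changes, and picks the breakpoint with the lowest value. Candidates are restricted to positions where the numeric value also changes; that restriction, the function names and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mean_information_at_breakpoints(values, labels):
    """(cut point, mean information) for each breakpoint in class membership."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])   # step 1: sort
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    n = len(pairs)
    results = []
    for i in range(1, n):
        # breakpoint: the class changes (and the value changes, so a cut fits)
        if ys[i] != ys[i - 1] and xs[i] != xs[i - 1]:
            left, right = ys[:i], ys[i:]
            # step 2: entropy of each side, weighted by its share of instances
            info = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
            results.append(((xs[i - 1] + xs[i]) / 2, info))
    return results

def best_breakpoint(values, labels):
    # step 3: lowest mean information = highest information gain
    return min(mean_information_at_breakpoints(values, labels), key=lambda r: r[1])
```

For the values 1 to 7 with classes a, a, a, b, a, b, b, the candidate cuts are 3.5, 4.5 and 5.5. The cut at 3.5 has the lowest mean information (about 0.46 bits). In practice the split is applied recursively to each side with a stopping criterion (such as the MDL principle), which this sketch omits.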
Naïve Supervised Discretisation
“cluster” values into class-contiguous intervals
1) sort the values and identify breakpoints in class membership
2) reposition any breakpoint that falls between two identical numeric values (a cut cannot separate them), moving it to where the value changes
3) set the breakpoints midway between the neighbouring values
*SIMPLE TO IMPLEMENT
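Here is a minimal sketch of the three naïve steps. It assumes a breakpoint that lands between equal values is moved forward to the next change in value; the notes do not say which direction to move it. The function name is hypothetical.

```python
def naive_supervised_cut_points(values, labels):
    """Cut points between class-contiguous intervals of one numeric attribute."""
    # step 1: sort by value (key avoids comparing labels when values tie)
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []
    i = 1
    while i < len(pairs):
        if pairs[i][1] == pairs[i - 1][1]:
            i += 1                      # same class: no breakpoint here
            continue
        # step 2 (assumed direction): a breakpoint cannot sit between two equal
        # values, so move it forward to where the numeric value changes
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[j - 1][0]:
            j += 1
        if j < len(pairs):
            # step 3: place the cut midway between the neighbouring values
            cuts.append((pairs[j - 1][0] + pairs[j][0]) / 2)
        i = j + 1
    return cuts
```

For the values 1, 2, 3, 3, 4, 5 with classes a, a, a, b, b, b, the class changes between the two 3s. The breakpoint therefore moves to between 3 and 4, giving a single cut at 3.5.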
What is the Gaussian distribution (aka normal distribution)?
A bell-shaped continuous distribution that is fully determined by its mean and standard deviation: given these two values, we can compute the probability density at any value x.
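For reference, here is the density written with mean μ and standard deviation σ. The second form shows that it depends on x only through the z-score.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
     = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-z^2/2},
\qquad z = \frac{x-\mu}{\sigma}
```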
Why is smoothing important in NB?
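Without smoothing, an attribute value that never appeared with a class in the training data gets a conditional probability of 0. Because NB multiplies the per-attribute probabilities together, that single 0 makes the whole product 0 for that class and wipes out all the other evidence.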
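Here is a minimal sketch that assumes Laplace (add-one) smoothing, which is one common choice; the notes do not name a specific method. `alpha` generalises it to add-alpha smoothing, and the function name and toy data are hypothetical.

```python
from collections import Counter

def smoothed_likelihoods(observed, possible_values, alpha=1.0):
    """P(attribute = v | class), estimated from the values seen with one class.

    alpha = 1 gives Laplace (add-one) smoothing; alpha = 0 gives raw frequencies.
    """
    counts = Counter(observed)        # missing values count as 0
    total = len(observed) + alpha * len(possible_values)
    return {v: (counts[v] + alpha) / total for v in possible_values}
```

Suppose a class only ever saw "sunny", "sunny", "rainy". With `alpha=0` the estimate for "overcast" is 0, so any product of likelihoods that includes it is 0. With add-one smoothing it becomes 1/6 instead, and the three estimates still sum to 1.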