What does machine learning mean?
Statistical methods to enable machines to learn and improve with data
What is the main difference between machine learning and traditional programming?
Inputs of classical programming:
- dataset and algorithm
Inputs of machine learning:
- dataset and “output” (the expected answers/labels)
Output of machine learning:
- algorithm (the learned model)
Name some machine learning applications in medical imaging (4)
Disease / object detection
Segmentation
Registration
Image generation
Describe the overall workflow of a typical ML algorithm in the training and inference (prediction) phase
Consider classification task:
Training Phase:
(iterative learning to find the best model)
1. input images w/ labels (benign / malignant)
2. Feature extraction
3. Feature vectors
4. machine learning algorithm
Prediction Phase:
(applying the model on new data)
1. new images
2. feature extraction
3. feature vectors
4. predicted labels
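The two phases above can be sketched in Python. This is a minimal sketch, not the course's method: synthetic 8x8 "images" stand in for real data, "feature extraction" is just mean/std intensity, and a nearest-centroid rule stands in for the ML algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(images):
    # steps 2-3: raw image -> feature vector (here: mean and std intensity)
    return np.stack([[img.mean(), img.std()] for img in images])

# Training phase: images with labels (0 = benign, 1 = malignant)
train_images = [rng.normal(loc=lbl, size=(8, 8)) for lbl in [0, 1] * 20]
train_labels = np.array([0, 1] * 20)
X = extract_features(train_images)
# step 4: "learning" here is just computing one centroid per class
centroids = np.stack([X[train_labels == c].mean(axis=0) for c in (0, 1)])

# Prediction phase: new images -> features -> predicted labels
new_images = [rng.normal(loc=1.0, size=(8, 8)) for _ in range(3)]
Xnew = extract_features(new_images)
predicted = np.argmin(np.linalg.norm(Xnew[:, None] - centroids, axis=2), axis=1)
print(predicted)  # -> [1 1 1]
```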
a) What is a feature?
b) What does feature extraction mean in medical image analysis?
a) a measurable attribute of the data
ex. intensity, shape, texture
b) process of generating such attributes from images
- converts raw image data into interpretable and actionable info for machine learning
a) Define texture features. Give some examples.
b) Do texture features provide relative position information?
a) spatial distribution of grey levels over the pixels in an image
- measured via first-order statistics derived from the first-order (grey-level) histogram
ex.
- mean
- variance
- sd
- skewness (measure of asymmetry)
- kurtosis (measure of tailedness of distribution - how often outliers occur)
- measure of smoothness
- uniformity
- entropy
b) no info about the relative position of various grey levels within the image
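The first-order statistics above can be computed directly from a normalized grey-level histogram. A sketch on a small hypothetical 4x4 patch (grey levels 0-3 for brevity); note that shuffling the pixels would leave every statistic unchanged, which is exactly the "no relative position" point in (b).

```python
import numpy as np

patch = np.array([[0, 0, 1, 1],
                  [0, 1, 1, 2],
                  [1, 1, 2, 2],
                  [1, 2, 2, 3]])

levels = np.arange(4)
hist = np.bincount(patch.ravel(), minlength=4)
p = hist / hist.sum()                                # normalized histogram p(i)

mean = (levels * p).sum()
var = ((levels - mean) ** 2 * p).sum()
sd = np.sqrt(var)
skew = ((levels - mean) ** 3 * p).sum() / sd ** 3    # asymmetry
kurt = ((levels - mean) ** 4 * p).sum() / sd ** 4    # tailedness
smooth = 1 - 1 / (1 + var)                           # measure of smoothness
uniformity = (p ** 2).sum()                          # aka energy
entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()      # randomness of grey levels
```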
a) What is feature normalization? Why is it needed?
b) Name some of the common feature normalization techniques
a) process of transforming each numeric input variable so its values lie on a comparable scale
- allows models to treat features more equitably during learning
- prevents bias in proximity measures
- prevents distortion of penalty effects and model interpretation
b)
Z-score normalization
(subtracts the mean and divides by the sd so the feature has mean 0 and variance 1)
min-max normalization
(changes dynamic range)
linear scaling to unit range
(squash range AND keep linear structure)
softMax scaling
(squash range but does not keep linearity )
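The techniques above, sketched on one feature vector with an outlier. The sigmoid form used for softmax scaling is the common textbook definition, assumed here; min-max / linear scaling to unit range is shown once since both rescale linearly (to [0, 1] in this case).

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 100.0])    # one feature across 5 samples

# z-score: subtract mean, divide by sd -> mean 0, variance 1
z = (x - x.mean()) / x.std()

# min-max / linear scaling to unit range: squashes to [0, 1], keeps linearity
unit = (x - x.min()) / (x.max() - x.min())

# softmax (sigmoid) scaling: squashes to (0, 1) but is nonlinear,
# so the outlier at 100 is compressed far more than the inner points
soft = 1.0 / (1.0 + np.exp(-(x - x.mean()) / x.std()))
```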
What are the three realms of ML?
What is the main difference between supervised and unsupervised learning?
Three realms: supervised, unsupervised, and reinforcement learning
Unsupervised: learn w/o labels
Supervised: learn w/ labels on training data
Name some applications of unsupervised learning in medical image analysis
ex. clustering of unlabeled images or pixels (e.g., grouping tissue types), anomaly detection, dimensionality reduction of extracted features
a) What does clustering mean?
b) What are the clustering essentials
a) aggregate samples (unlabeled data) into groups
- membership to a group determined by similarity metric or distance
b)
1. Proximity measure
- similarity / dissimilarity (distance) between samples
2. Criterion (objective) function
- scores cluster quality; the clustering algorithm optimizes it
a) What does proximity measure do?
b) What does the criterion function do?
c) what are the common distance definitions?
a)
computes a numeric value to reflect how close two objects are in feature space
- small dissimilarity / large similarity –> points belong to same cluster
b)
(aka objective function)
evaluates quality of clusters by aggregating pairwise proximity or clusters statistics into a single score
- clustering algorithms want to optimize this function
c)
Euclidean distance
- straight-line distance in feature space
d(xa,xb) = [ sum_k (xa^k - xb^k)^2 ]^(1/2)
Manhattan distance
- sum of absolute differences
d(xa,xb) = sum_k | xa^k - xb^k |
similarity:
sim(x1, x2) = 1 / [dist(x1,x2)]
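The two distance definitions and the inverse-distance similarity, as a sketch:

```python
import numpy as np

def euclidean(xa, xb):
    # straight-line distance: square root of summed squared differences
    return np.sqrt(((xa - xb) ** 2).sum())

def manhattan(xa, xb):
    # city-block distance: sum of absolute differences
    return np.abs(xa - xb).sum()

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(a, b))      # 5.0 (a 3-4-5 triangle)
print(manhattan(a, b))      # 7.0
print(1 / euclidean(a, b))  # similarity as inverse distance: 0.2
```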
a) How can a cluster be evaluated
b) What is measured for assessing the compactness of a cluster?
c) What is measured to evaluate cluster separation?
a) compactness and separation
b) Compactness: intra-cluster cohesion
- how near the cluster data points are to the cluster centroid
- sum of squared error
c) Separation: inter-cluster separation
- how separated different cluster centroids are w.r.t each other
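Both measures can be sketched on two hypothetical 2-D clusters; small SSE plus large inter-centroid distance indicates a good clustering.

```python
import numpy as np

# two hypothetical clusters in 2-D feature space
clusters = [np.array([[0., 0.], [0., 2.], [2., 0.], [2., 2.]]),
            np.array([[10., 10.], [10., 12.], [12., 10.], [12., 12.]])]
centroids = [c.mean(axis=0) for c in clusters]

# compactness: sum of squared errors (each point to its own centroid)
sse = sum(((c - m) ** 2).sum() for c, m in zip(clusters, centroids))

# separation: distance between the cluster centroids
sep = np.linalg.norm(centroids[0] - centroids[1])
print(sse, sep)
```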
a) How does k-means algorithm work?
b) What is the main objective of this algorithm?
a)
K-means clustering:
- partitional: groups data into K clusters (K is user defined)
- centroid-based: cluster has a center
b) Minimize the total variance within each cluster by updating cluster centroids
i. initialize K centroids (e.g., pick K random data points)
ii. calculate each data point's distance to every centroid
iii. group points by minimum distance (assign each to the nearest centroid)
iv. move each centroid to the mean of its assigned points; repeat ii-iv until assignments stop changing
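Steps i-iv can be sketched in a few lines of numpy. This is a minimal sketch only: real implementations use random or k-means++ initialization and a proper convergence check, while here the first K points seed the centroids for determinism.

```python
import numpy as np

def kmeans(X, k, n_iter=10):
    # i. initialize centroids (here: the first k points, for determinism)
    centroids = X[:k].copy()
    for _ in range(n_iter):
        # ii. distance from every point to every centroid
        d = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        # iii. group points by minimum distance
        labels = d.argmin(axis=1)
        # iv. move each centroid to the mean of its points (minimizes SSE)
        centroids = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [9., 9.], [9., 10.], [10., 9.]])
labels, centroids = kmeans(X, 2)
print(labels)  # -> [0 0 0 1 1 1]
```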
How can you optimize a k-means algorithm? What are the variables to change to improve the performance?
- vary K (the number of clusters), the centroid initialization, the number of iterations, and the distance metric
How do you define the optimum value for the number of clusters - describe the approach
Elbow method: run k-means over a range of K values, plot the total within-cluster SSE against K, and pick the K at the "elbow" where adding more clusters no longer reduces SSE substantially
a) What is the concept of hierarchical clustering?
b) What does linkage mean?
c) What are the main categories of hierarchical clustering?
d) How do you interpret dendrograms?
a)
Divide the dataset into a sequence of nested partitions (visualized as a dendrogram, a tree of clusters)
- works with any distance matrix
- sensitive to outliers
- computationally expensive on large datasets
b)
dissimilarity between the pairs of observations (user defines linkage criterion)
- “samples that belong to the child cluster also belong to the parent cluster” –> small clusters are part of a big cluster
c)
single linkage:
- dist between closest points
centroid linkage:
- dist between cluster centroids
complete linkage:
- dist between furthest points
average linkage:
- average dist between all pairs of points
d)
observations that fuse at:
bottom –> similar to each other
top –> diff from each other
Similarity of 2 observations is based on their location on the vertical axis:
- the height at which the branches containing the two observations first fuse (lower = more similar)
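A tiny agglomerative sketch with single linkage makes the dendrogram heights concrete: repeatedly merge the two closest clusters and record the fusion height, which is exactly what the dendrogram's vertical axis shows (early low fusions at the bottom = similar observations, the final high fusion at the top = dissimilar groups). Hypothetical 1-D points for readability.

```python
import numpy as np

points = np.array([[0.0], [0.4], [5.0], [5.3]])   # 1-D for readability
clusters = [[i] for i in range(len(points))]
fusions = []   # each entry: (cluster_a, cluster_b, fusion height)

while len(clusters) > 1:
    # single linkage: cluster distance = closest pair of member points
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = min(abs(points[a, 0] - points[b, 0])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    d, i, j = best
    fusions.append((clusters[i], clusters[j], d))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] \
               + [clusters[i] + clusters[j]]

print(fusions)   # fusion heights rise: 0.3, 0.4, then 4.6 at the top
```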
What is the primary goal of the k-means algorithm during its iterative process?
Minimize the total variance within each cluster by updating cluster centroids