receiver operating characteristic (ROC) graph
usually for ranking classifiers (usually binary); for a cutoff n, classify the n highest-ranked instances in the test set (of size N) as positive, the rest as negative, and create the confusion matrix; plot the false positive rate on the x axis (false positives among the top n, divided by total negatives in the whole set) and the true positive rate on the y axis (true positives among the top n, divided by total positives in the whole set); plot over all cutoffs n
features:
ROC space details:
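a minimal sketch of computing ROC points by sweeping the cutoff n, using made-up scores and labels (1 = positive):

```python
# hedged sketch: ROC points from a ranking classifier's scores
# `scores` and `labels` are made-up example data (1 = positive)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0,   1,   0]

P = sum(labels)            # total positives in the test set
Neg = len(labels) - P      # total negatives in the test set

# sort instances by descending score, then sweep the cutoff n
order = sorted(range(len(scores)), key=lambda i: -scores[i])
roc = [(0.0, 0.0)]         # (FPR, TPR) pairs, starting at the origin
tp = fp = 0
for i in order:
    if labels[i] == 1:
        tp += 1
    else:
        fp += 1
    roc.append((fp / Neg, tp / P))
# plotting the (FPR, TPR) pairs traces the ROC curve from (0,0) to (1,1)
```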
profit curve
with a ranking classifier, create the confusion matrix for classifying the n most likely instances as positive; compute profit/loss from the confusion matrix (using per-cell costs and benefits); plot profit/loss as a function of n
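a minimal sketch, assuming a made-up cost/benefit matrix (profit per true positive, cost per false positive; TN/FN taken as zero here) and labels already sorted by descending score:

```python
# hedged sketch: a profit curve from a ranked test set
# labels (1 = positive) are made-up and already sorted by descending score
labels = [1, 1, 0, 1, 0, 0, 0, 0]
BENEFIT_TP = 99.0    # illustrative profit from a true positive
COST_FP = -1.0       # illustrative cost of a false positive

profits = []
tp = fp = 0
for y in labels:
    if y == 1:
        tp += 1
    else:
        fp += 1
    profits.append(tp * BENEFIT_TP + fp * COST_FP)
# plotting profits against n = 1..len(labels) gives the profit curve
```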
2x2 classification table
frequency matrix for binary classification problems
usually,
predictions are on rows: (1) positive, (2) negative
true classes are on columns: (1) positive, (2) negative
rates are column-based:
confusion matrix
a frequency matrix for classification problems; each row is a model (class) prediction and each column an actual class; the more mass on the diagonal, the better the model; useful for imbalanced classes, since it gives more information than accuracy alone
learning curve
for a given model and a fixed holdout set size, plot the model accuracy as a function of training set data size; typically plateaus as marginal gain of more data goes to 0
gini coefficient
a general measure of dispersion, proportional to the area between the Lorenz curve and the diagonal line (twice that area, so it ranges from 0 to 1); eg plot the cumulative share of wealth held by the population, with the population ordered by increasing wealth–if everyone had the same wealth, g.c.=0
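a minimal sketch: the Gini coefficient as twice the area between the Lorenz curve and the equality diagonal, via the trapezoid rule; the wealth values are made up:

```python
# hedged sketch: Gini coefficient from a Lorenz curve
# `wealth` is made-up example data, sorted ascending
wealth = [1.0, 1.0, 2.0, 4.0, 12.0]

total = sum(wealth)
n = len(wealth)
# cumulative share of wealth held by the poorest k people
lorenz = [0.0]
acc = 0.0
for w in wealth:
    acc += w
    lorenz.append(acc / total)

# trapezoid-rule area under the Lorenz curve (x step = 1/n)
area_lorenz = sum((lorenz[k] + lorenz[k + 1]) / 2 for k in range(n)) / n
gini = 1.0 - 2.0 * area_lorenz   # area under the equality diagonal is 1/2
```

with equal wealth the Lorenz curve is the diagonal, area_lorenz = 1/2, and gini = 0, matching the note above.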
fitting graph
typically the x axis is “model complexity” and the y axis is model accuracy on (a) training data and (b) holdout data; the “sweet spot” is where the two plots begin to diverge from each other–where training accuracy keeps improving (overfitting) while holdout accuracy starts to drop
cumulative response curve and lift curve
for a ranking classifier at cutoff n with a test set of size N, plots the true positive rate on the y-axis (true positives among the top n, divided by the total number of positives in the test set) against the proportion of the population targeted (i.e. n/N); the lift curve instead plots the ratio of these two quantities (TPR divided by n/N)
features:
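a minimal sketch computing cumulative response and lift values from a ranked test set; the labels (1 = positive) are made up and already sorted by descending score:

```python
# hedged sketch: cumulative response curve and lift values
# labels are made-up example data, sorted by descending score
labels = [1, 1, 0, 1, 0, 0, 0, 0]
P, N = sum(labels), len(labels)

points = []
tp = 0
for n, y in enumerate(labels, start=1):
    tp += y
    tpr = tp / P                 # y-axis of the cumulative response curve
    frac = n / N                 # x-axis: fraction of population targeted
    points.append((frac, tpr, tpr / frac))   # third value is the lift
# a good ranker gives lift well above 1 at small n, converging to 1 at n = N
```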
dendrogram
a 2-D visualization for progressive clustering; instances are on the x-axis, and the degree of clustering (low to high) is on the y-axis; the instances are ordered so that initial clusters are immediate neighbors, recursing on this ordering scheme as clustering is increased (i.e. at a given height / level on the dendrogram, the ordering scheme applies to subgroups of instances)
entropy graph
re segmentation and information gain–a visualization of the weighted-sum-of-entropies resulting from any given segmentation scheme–each segment occupies a proportion (0 to 1) on the x axis, the segment’s height is the classification entropy (so a kind of bar plot); low height means low entropy (so “good” classification for that segment)
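a minimal sketch of the weighted sum of per-segment entropies for one candidate segmentation; the segments and class counts are made up:

```python
import math

# hedged sketch: weighted sum of entropies for a segmentation scheme
# the segments and their class counts are made-up example data
segments = [
    {"yes": 9, "no": 1},    # fairly pure segment -> low entropy
    {"yes": 5, "no": 5},    # maximally impure -> entropy 1 bit
]

def entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

grand_total = sum(sum(s.values()) for s in segments)
# each segment contributes (width = proportion) x (height = entropy),
# mirroring the bar-plot reading of the entropy graph
weighted = sum(sum(s.values()) / grand_total * entropy(s) for s in segments)
```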
scree plot
used (at least) in context of PCA, showing the percent of total variance as a function of the number of (leading) PCA components retained; so it allows figuring out how many PCA components to retain for modeling purposes
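a minimal sketch of the quantities a scree plot shows: variance explained per component, from the eigenvalues of the covariance matrix; the data matrix is synthetic:

```python
import numpy as np

# hedged sketch: per-component variance explained, as plotted in a scree plot
# the data matrix is synthetic; one injected correlation makes PC1 dominate
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] += 2 * X[:, 0]

cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues, descending
explained = eigvals / eigvals.sum()       # fraction of variance per component
cumulative = np.cumsum(explained)         # plot this against component index
```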
calibration plot / reliability diagram
for checking performance of probabilistic classification models
for k classes, pick the class of interest, C (one plot per class)
define a bin as a probability range [p-low,p-high]
group all instances in the test set with class C predicted probability in [p-low,p-high] into set S
count the number, n, of instances in S that are actually of class C
n / |S| should be approximately within [p-low,p-high]
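the steps above can be sketched for one bin; the predicted probabilities and true labels are made-up example data:

```python
# hedged sketch: one bin of a calibration check for class C
# probabilities and labels (1 = class C) are made-up example data
probs = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9, 0.88, 0.12]
truth = [0,   0,    1,   1,   1,    1,   0,    0]

p_low, p_high = 0.8, 1.0
# S = instances whose predicted probability of C falls in the bin
S = [i for i, p in enumerate(probs) if p_low <= p <= p_high]
n = sum(truth[i] for i in S)   # how many are actually class C
observed = n / len(S)
# a well-calibrated model puts `observed` roughly inside [p_low, p_high]
```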
calibration histograms / heat maps
for checking performance of probabilistic classification models
for 2 classes
* group test set into true positive and true negative outcomes
* for each group, plot histogram of probability predictions for (say) negative outcomes
* the true positive histogram should be skewed toward 0 (low probability of a negative outcome), and the true negative histogram should be skewed toward 1
for > 2 classes
* construct a per-instance heat map, usually with eg rows grouped by true class
* for each instance, each of k categories gets a color/intensity, reflecting probability
* for each instance true class group, probabilities should be clustered around the given class
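a minimal sketch of the two-class grouping step: binning predicted probabilities of the negative class separately for true positives and true negatives; all data here is illustrative:

```python
# hedged sketch: per-group histograms of P(negative) predictions
# probabilities and labels (1 = positive) are made-up example data
p_neg = [0.1, 0.2, 0.9, 0.8, 0.15, 0.7, 0.05, 0.95]
truth = [1,   1,   0,   0,   1,    0,   1,    0]

def hist(values, n_bins=5):
    """Count values into n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts

pos_hist = hist([p for p, t in zip(p_neg, truth) if t == 1])
neg_hist = hist([p for p, t in zip(p_neg, truth) if t == 0])
# pos_hist should pile up in the low bins, neg_hist in the high bins
```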
scatterplot matrix
shows pairwise scatterplots between all numeric predictors, revealing correlations; (note a feature plot may include scatterplots, but is more general)
predictor plot
plots each predictor against target variable (varies depending on categoric / numeric types)
classifier probability plot
for categorical outcomes, traceplane bar plot, by predictor, of frequency in each outcome factor level
mosaic plot
a categorical predictor vs categorical outcome trace plane plot; shows instance frequencies over the discrete 2D space
volcano plot
added variable plot