3: Machine Learning Flashcards

Question 1

Q

Data are shuffled randomly and then divided into k equal subsamples.
One subsample is held out as the validation sample, and the other k-1 subsamples are used as training samples. The process is repeated k times so that each subsample serves as the validation sample once.

Answer

A

K-fold cross validation
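
A minimal sketch of the split described on this card, using only the Python standard library. The function name `k_fold_splits` and the `seed` parameter are illustrative assumptions, not part of the original card.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # shuffle the data once, up front
    folds = [indices[i::k] for i in range(k)]   # k near-equal subsamples
    for i in range(k):
        validation = folds[i]                   # one fold held out for validation
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# With 10 observations and k = 5, every round trains on 8 and validates on 2.
for train, validation in k_fold_splits(n_samples=10, k=5):
    print(len(train), len(validation))
```

Each observation lands in the validation sample exactly once, so the k validation errors can be averaged into a single out-of-sample estimate.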

Question 2

Q

Technique of combining predictions from a number of models, with the objective of canceling out noise

Answer

A

Ensemble Learning

Results in: more accuracy & stable predictions (vs single model)
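
As a rough illustration, two common ways of combining model outputs are a majority vote (for class labels) and a simple average (for numeric forecasts). The helper names and the sample predictions below are hypothetical.

```python
from collections import Counter
from statistics import mean

def majority_vote(predictions):
    """Combine class labels from several models by taking the most common one."""
    return Counter(predictions).most_common(1)[0][0]

def average_prediction(predictions):
    """Combine numeric forecasts from several models by averaging them."""
    return mean(predictions)

# Hypothetical outputs from three separate models for one observation.
print(majority_vote(["up", "down", "up"]))      # up
print(average_prediction([0.02, 0.04, 0.03]))
```

Errors that are not shared across models tend to offset each other in the vote or the average, which is the noise-canceling effect the card refers to.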

Question 3

Q

Nodes connected by links
Useful in: Supervised Regression & Classification models
Works well in presence of: nonlinearities & complex interactions among variables
Recognizes: patterns, clusters, and classifies

Answer

A

Neural Networks

Question 4

Q

Unsupervised neural networks with many hidden layers (often >20), and reinforcement learning, learn from their own prediction errors

Used for: complex tasks; image, pattern, & character recognition

Answer

A

Deep Learning Networks

Question 5

Q

Algorithm learns from success & mistakes
Seeking to maximize reward and minimize punishment
Defined constraints

Answer

A

Reinforcement Learning

Question 6

Q

Inputs & outputs are identified for the computer, and the algorithm uses this labeled training data to model relationships

Answer

A

Supervised Learning

Question 7

Q

Computer is provided unlabeled data that the algorithm uses to determine the structure of the data

Answer

A

Unsupervised Learning

Question 8

Q

Least Absolute Shrinkage and Selection Operator (LASSO) is useful in building:

Penalized regression model

Answer

A

Parsimonious models, through feature reduction
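
To see how the penalty leads to feature reduction, consider the simplified case of orthonormal features with the penalty scaled accordingly. In that special case the LASSO coefficient is the least-squares coefficient shrunk toward zero by the penalty and set to exactly zero if it is smaller than the penalty (soft-thresholding). The sketch below assumes that special case; the feature names and coefficient values are made up.

```python
def soft_threshold(coefficient, penalty):
    """Shrink a coefficient toward zero by `penalty`; zero it out if it is smaller."""
    if coefficient > penalty:
        return coefficient - penalty
    if coefficient < -penalty:
        return coefficient + penalty
    return 0.0

# Hypothetical least-squares coefficients (assumes orthonormal features).
ols_coefficients = {"momentum": 0.8, "value": -0.5, "noise": 0.05}
penalty = 0.1

lasso_coefficients = {
    name: soft_threshold(b, penalty) for name, b in ols_coefficients.items()
}
selected = [name for name, b in lasso_coefficients.items() if b != 0.0]
print(selected)  # ['momentum', 'value']
```

The weak "noise" feature is dropped entirely, leaving a more parsimonious model.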

Question 9

Q

K-Nearest Neighbor, investment application includes:

Used in: classification & regression

Answer

A

predicting bankruptcy
assigning bond rating classes
predicting stock prices
creating customized indices
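
A minimal sketch of KNN classification applied to the bankruptcy example. The two features (debt-to-equity, interest coverage), the data points, and the function name `knn_classify` are all hypothetical.

```python
from collections import Counter
from math import dist

def knn_classify(training_points, training_labels, query, k=3):
    """Label `query` with the most common label among its k nearest training points."""
    neighbors = sorted(
        zip(training_points, training_labels),
        key=lambda pair: dist(pair[0], query),   # Euclidean distance to the query
    )
    nearest_labels = [label for _, label in neighbors[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical features: (debt-to-equity, interest coverage).
points = [(0.5, 8.0), (0.7, 6.5), (0.6, 7.0), (3.0, 1.0), (2.5, 1.5), (2.8, 0.8)]
labels = ["solvent", "solvent", "solvent", "bankrupt", "bankrupt", "bankrupt"]

print(knn_classify(points, labels, query=(2.6, 1.2)))  # bankrupt
```

In practice the features would usually be rescaled first, since the distance is dominated by whichever feature has the largest range.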

Question 10

Q

Random Forest investment applications include:

Answer

A

factor based asset allocation
prediction models for IPO success

Question 11

Q

Linear relationships

A penalized regression model tries to use a limited number of most important features that…

Answer

A

explain the variation in the dependent variable

Example: monthly returns on 100 stocks

Question 12

Q

Overfitting occurs when:
Bias error:
Variance error:

Answer

A

when the model fits the training data too well
Bias error: low
Variance error: high

displaying nonlinear characteristics
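
One way to picture low bias with high variance is a model that simply memorizes the training data. The sketch below uses a made-up data set and a hypothetical `memorizing_model` helper; it is an illustration, not a method from the original cards.

```python
def memorizing_model(train_x, train_y):
    """Return a predictor that copies the target of the closest training point."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[nearest]
    return predict

def mean_squared_error(model, xs, ys):
    """Average squared prediction error of `model` on the given sample."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: roughly y = x, with noise in the training sample.
train_x, train_y = [1, 2, 3, 4], [1.4, 1.7, 3.5, 3.8]
test_x, test_y = [1.5, 2.5, 3.5], [1.5, 2.5, 3.5]

model = memorizing_model(train_x, train_y)
in_sample = mean_squared_error(model, train_x, train_y)     # zero: perfect training fit
out_of_sample = mean_squared_error(model, test_x, test_y)   # positive: noise was memorized
print(in_sample, out_of_sample)
```

The training error is zero, yet the error on new data is not, because the model has fitted the noise in the training sample along with the signal.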

Question 13

Q

Generalization is the degree to which the model retains its explanatory power when:

Answer

A

predicting out of sample

Question 14

Q

Bias error is the degree to which:

Answer

A

the model fits the training data

Question 15

Q

Variance error shows how much the model responds to:

Question 16

Q

How to prevent overfitting:

Answer

A

don’t let the model become too complex
proper data sampling using cross validation (k-fold)

Question 17

Q

Complexity Reduction:

Answer

A

Dimension Reduction
Use: PCA

Question 18

Q

With supervised learning, the training data contains:

Answer

A

ground truth

Question 19

Q

Supervised ML algorithm

Classification focuses on sorting observations into:

Answer

A

distinct categories:
* pass or fail

Question 20

Q

Regression based uses:

Answer

A

continuous variables

Question 21

Q

Regression:

CART & Random forests are used for:

Answer

A

complex & non-linear data

Question 22

Q

Classified unsupervised data:

K-means is used for:

Answer

A

complex & linear data
with a known number of k clusters
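
A minimal one-dimensional sketch of the k-means loop: assign each observation to its nearest centroid, then move each centroid to the mean of its cluster. The data, the starting centroids, and the function name `k_means_1d` are illustrative assumptions.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster numbers around len(centroids) centers; k must be chosen in advance."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assignment step: put each value in the cluster of its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data with two visible groups, so k = 2.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = k_means_1d(values, centroids=[0.0, 6.0])
print(clusters)
```

The number of clusters is fixed by the length of the starting centroid list, which reflects the card's point that k must be known beforehand.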

(22 cards)