Quantitative Methods Flashcards

Basics of multiple regression and underlying assumptions, evaluating regression model fit and interpreting model results, model misspecification, extensions of multiple regression, time-series analysis, machine learning, and big data projects (143 cards)

1
Q

Adjusted R²

A

The coefficient of determination adjusted for degrees of freedom.
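The adjustment can be sketched in Python (an illustrative helper; the function name is ours, not from any library):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    where n = observations and k = independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Unlike plain R^2, adding a weak variable can lower this figure:
print(adjusted_r2(0.60, 50, 3))  # ~0.574
print(adjusted_r2(0.61, 50, 4))  # ~0.575
```

Adjusted R² rises only when a new variable improves fit by more than its cost in degrees of freedom.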

2
Q

Akaike’s Information Criterion (AIC)

A

A measure of model fit, adjusted for parsimony, used when the goal is prediction; among candidate models, the one with the lowest AIC is expected to provide the most accurate forecasts.

3
Q

Schwarz’s Bayesian Information Criterion (BIC)

A

A measure of model fit that penalizes additional parameters more heavily than AIC, used when the goal is goodness of fit; among candidate models, the one with the lowest BIC is preferred.

4
Q

Joint test of hypotheses

What are its degrees of freedom? What are its conclusions?

A

A form of hypothesis test that determines whether a subset of q independent variables (those excluded from the restricted model) has significant power to explain changes in the dependent variable, relative to the unrestricted model that includes them.

The test is F-distributed with q and n - k - 1 degrees of freedom:

F = [(SSE_restricted - SSE_unrestricted) / q] / [SSE_unrestricted / (n - k - 1)]

H₀: the q excluded slope coefficients all equal 0
Hₐ: at least one of the q excluded slope coefficients ≠ 0

Rejecting H₀ means that the excluded variables jointly have significant explanatory power; the unrestricted model is preferred.
Failing to reject H₀ means that the excluded variables add no significant explanatory power; the restricted model is preferred.

5
Q

General linear F-test

What are its degrees of freedom? What are its conclusions?

A

A form of hypothesis test that determines whether an entire regression has significant power to explain changes in the dependent variable.

The test is F-distributed with k and n - k - 1 degrees of freedom.

H₀: b₁ = b₂ = … = bₖ = 0
Hₐ: At least one b ≠ 0

Rejecting H₀ means that at least one independent variable has significant explanatory power; the regression as a whole is significant.
Failing to reject H₀ means that the regression as a whole has no significant explanatory power.

6
Q

Model specification

A

The set of independent variables in a regression model as well as its functional form; in order for a model to be correctly specified, it must:

1.) Be grounded in economic reasoning
2.) Be parsimonious
3.) Perform well out-of-sample
4.) Have the appropriate functional form
5.) Satisfy regression assumptions

7
Q

Omitted variable bias

What can it lead to?

A

A functional form misspecification in which one or more independent variables with significant explanatory power for the dependent variable are missing from the regression; may lead to biased and inconsistent coefficient estimates, as well as heteroskedasticity and/or serial correlation in the residuals.

8
Q

Inappropriate form of variables

What can it lead to?

A

A functional form misspecification in which a nonlinear relationship between the independent and dependent variables is ignored; may lead to heteroskedasticity.

9
Q

Inappropriate scaling of variables

What can it lead to?

A

A functional form misspecification in which variables must be transformed before estimating the regression; may lead to heteroskedasticity and/or multicollinearity.

10
Q

Inappropriate pooling of variables

What can it lead to?

A

A functional form misspecification in which the regression pools observations from different contexts (e.g. fiscal regime, recession) leading to data clustering; may lead to heteroskedasticity and/or serial correlation.

11
Q

Breusch-Pagan test

What is it? How is it calculated? What are its conclusions?

A

A form of hypothesis test that determines whether or not conditional heteroskedasticity exists in a regression model.

BP = nR², where R² comes from a regression of the squared residuals on the independent variables; the statistic is chi-squared distributed with k degrees of freedom.

H₀: no conditional heteroskedasticity exists
Hₐ: conditional heteroskedasticity exists
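A minimal numeric sketch of the statistic (Python, illustrative only; in practice the R² input comes from the auxiliary regression of squared residuals on the independent variables):

```python
def breusch_pagan_stat(n: int, r2_resid: float) -> float:
    """BP = n * R^2; compare against a chi-squared critical value
    with k degrees of freedom (k = number of independent variables)."""
    return n * r2_resid

# 60 observations, auxiliary-regression R^2 of 0.08:
bp = breusch_pagan_stat(60, 0.08)  # 4.8; the 5% chi-squared critical
# value with k = 1 is ~3.84, so H0 would be rejected in this example
```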

12
Q

Durbin-Watson test

What is it? What are its degrees of freedom? What are its conclusions?

A

A form of hypothesis test that determines whether a model exhibits first-order serial correlation; the statistic is approximately DW ≈ 2(1 - r), where r is the correlation between consecutive residuals.

The critical values dₗ and dᵤ depend on n and k.

H₀: no positive serial correlation (DW = 2)
Hₐ: positive serial correlation (DW < 2)

DW < dₗ: reject H₀; the model exhibits positive serial correlation.
DW > dᵤ: fail to reject H₀; dₗ ≤ DW ≤ dᵤ: the test is inconclusive.

13
Q

Breusch-Godfrey test

What is it? What are its degrees of freedom? What does it conclude?

A

A form of hypothesis test that determines whether a model exhibits serial correlation up to an order p.

The test is F-distributed with p and n - p - k - 1 degrees of freedom.

H₀: no pth-order serial correlation exists
Hₐ: pth-order serial correlation exists

14
Q

Variance inflation factor

What is it? How is it calculated? What are its conclusions?

A

A figure representing the magnitude of multicollinearity.

VIF = 1/(1-R²ⱼ), where R²ⱼ comes from regressing independent variable j on the remaining independent variables

VIF > 5: Further investigation into the independent variable is warranted
VIF > 10: A serious multicollinearity issue is present with regard to the independent variable
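The thresholds can be illustrated with a small Python helper (the function name is ours):

```python
def vif(r2_j: float) -> float:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    independent variable j on all the other independent variables."""
    return 1.0 / (1.0 - r2_j)

print(vif(0.75))  # 4.0  -> no obvious issue
print(vif(0.85))  # ~6.7 -> warrants investigation (VIF > 5)
print(vif(0.95))  # ~20  -> serious multicollinearity (VIF > 10)
```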

15
Q

Maximum likelihood estimation

a.k.a. MLE

A

A method that estimates values for the intercept and slope coefficients in a logistic regression; the logit equivalent of ordinary least squares (OLS).

16
Q

Likelihood ratio test

What is it and how is it calculated?

A

A joint test of hypotheses for a logit regression which uses the chi-squared distribution.

The closer the log-likelihood is to 0, the better the model fits the data.

LR = -2(LLR - LLU), where LLR = log-likelihood of restricted model and LLU = log-likelihood of unrestricted model

17
Q

Time series

A

A set of observations of a variable measured over successive periods of time; in a trend model, time serves as the independent variable.

18
Q

Studentized residual

A

A t-statistic which is used to determine whether an observation is an outlier.

19
Q

Trend

A

A long-term pattern of the dependent variable’s movement in a particular direction.

20
Q

Leverage (regression)

What is it? How is it used to determine influence?

A

A measure of how distant an observation's value of the independent variable is from the mean of that variable; it ranges from 0 to 1, with higher values indicating greater potential influence on the regression.

If the leverage of an observation > 3[(k+1)/n], then the observation is potentially influential

21
Q

Linear trend

A

A trend in which the dependent variable moves at a constant rate with respect to time.

22
Q

Studentized deleted residual

What is it? How is it used to determine influence?

A

A figure that quantifies the effect of removing an observation from a regression on the residuals of that regression.

If |studentized deleted residual| > 3, the observation is an outlier
If |studentized deleted residual| > critical t-value with n-k-2 degrees of freedom at a selected significance level, the observation is potentially influential

23
Q

Log-linear trend

A

A trend in which the dependent variable moves at an exponential rate with respect to time.

24
Q

Autoregressive model

a.k.a. AR model

A

A type of time-series regression in which the dependent variable is modeled to be explained by previous values of itself.

25
Covariance stationary | What is it? What are its implications on a time series?
The property of a time series in which:

1.) The expected value is constant & finite in all periods
2.) The variance is constant & finite in all periods
3.) The covariance of the series with its own past/future values, for a fixed number of periods of separation, is constant & finite in all periods

A time series must be covariance stationary in order for an AR model to yield inferences which are statistically sound.
26
kth-order autocorrelation
Correlation between observations in a time series separated by k periods.
27
Residual autocorrelation
Sample autocorrelation of an error term used to deduce whether the errors of an AR model are correlated.
28
Mean reversion | What is it and how is it calculated?
The property of a time series in which the value of the dependent variable returns to its average in the long term. Mean-reverting level of an AR(1) model = b0 / (1-b1)
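The mean-reverting level can be sketched in Python (illustrative helper, not a library function):

```python
def mean_reverting_level(b0: float, b1: float) -> float:
    """Long-run level of an AR(1) model x_t = b0 + b1 * x_(t-1) + e_t;
    only defined for a mean-reverting series (|b1| < 1)."""
    if abs(b1) >= 1:
        raise ValueError("not mean-reverting when |b1| >= 1")
    return b0 / (1 - b1)

print(mean_reverting_level(2.0, 0.6))  # ~5.0: forecasts drift toward 5
```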
29
Chain rule of forecasting
An estimation method in which the next period's value as predicted by the forecasting equation is substituted into the right-hand side of the equation in order to deduce the value two periods ahead.
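The substitution can be sketched as a short loop (Python, illustrative; coefficient values are made up):

```python
def chain_forecast(b0: float, b1: float, x_t: float, steps: int) -> float:
    """Iterate x_(t+1) = b0 + b1 * x_t, feeding each forecast
    back into the right-hand side for the next period."""
    x = x_t
    for _ in range(steps):
        x = b0 + b1 * x
    return x

# AR(1) with b0 = 1, b1 = 0.5, starting from x_t = 10:
print(chain_forecast(1.0, 0.5, 10.0, 1))  # 6.0 (one period ahead)
print(chain_forecast(1.0, 0.5, 10.0, 2))  # 4.0 (two periods ahead)
```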
30
Slope dummy
A dummy variable that functions to steepen or flatten the regression line.
31
In-sample forecast errors
The residuals within the sample that was used to fit the model.
32
Out-of-sample forecast errors
The residuals of forecasts outside of the sample that was used to fit the model.
33
Root mean squared error | a.k.a. RMSE
The criterion used to quantify and compare the out-of-sample forecast performances of models against each other.
34
Regime
A set of technological, legal, political and regulatory characteristics that defines an economic environment over a particular timeframe.
35
Random walk
A time-series in which the value in the current period is equal to the value in the past period plus some random error.
36
First-differencing | What is it and what is it used for?
A regression transformation in which the value at time t -1 is subtracted from the value at time t; used to make time-series covariance stationary.
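The transformation can be sketched in a line of Python (illustrative):

```python
def first_difference(series):
    """y_t = x_t - x_(t-1); the result has one fewer observation."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

# Differencing a random walk leaves only the (stationary) error term:
print(first_difference([100, 103, 101, 106]))  # [3, -2, 5]
```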
37
Unit root
A time series whose lag coefficient in an AR(1) model equals 1 (as in a random walk) is said to have a unit root; such a series is not covariance stationary.
38
Dickey-Fuller test | What is it used for? How is it performed? What are its conclusions?
A hypothesis test used to determine whether a time series contains a unit root, based on the transformed AR(1) regression xₜ - xₜ₋₁ = g₁xₜ₋₁ + eₜ, where g₁ = b₁ - 1. The test uses a modified t-table.

H₀: g₁ = 0, thus the time series has a unit root
Hₐ: g₁ < 0, thus the time series does not have a unit root

Rejecting H₀ means that the time series is covariance stationary.
Failing to reject H₀ means that the time series is non-covariance-stationary.
39
Seasonality | How do we test for seasonality in a time series?
A characteristic of a time series in which a pattern emerges within a particular smaller timeframe. To test for seasonality in a time series: 1. Check the autocorrelations of each lag's residuals. 2. If one of the autocorrelations seems high, test if it is significantly different from zero by calculating its t-statistic. 3. If the t-statistic rejects the null hypothesis that autocorrelation is zero, this lag contains seasonality.
40
Seasonal lag
The value of a time series one year before the current period, included as an extra term to correct seasonality.
41
Autoregressive conditional heteroskedasticity (ARCH) | How do we test for ARCH? What are its implications?
A characteristic of an autoregressive model in which the variance of the residuals in one period depends on the variance of the residuals in a previous period.

To test for ARCH, regress the squared residuals on their own lagged values: êₜ² = a₀ + a₁êₜ₋₁² + uₜ. Then perform the following hypothesis test:

H₀: a₁ = 0
Hₐ: a₁ ≠ 0

Rejecting H₀ means that the model exhibits ARCH and must be re-estimated using generalized least squares.
Failing to reject H₀ means that the model does not exhibit ARCH.
42
Cointegration
The characteristic of a linear regression with two time series in which the series share a common long-term trend and thus do not diverge without bound in the long run.
43
Engle-Granger Dickey-Fuller test | What is it used for? How is it performed? What are its conclusions?
A hypothesis test used to determine whether two time series in a linear regression are cointegrated; it is performed by testing whether eₜ in y = b₀ + b₁x₁ + eₜ contains a unit root.

H₀: eₜ contains a unit root and thus is non-covariance-stationary
Hₐ: eₜ does not contain a unit root and thus is covariance stationary

Rejecting H₀ means that y and x₁ are cointegrated.
Failing to reject H₀ means that y and x₁ are not cointegrated.
44
Least absolute shrinkage & selection operator | What type of algorithm is this and what is it best suited for?
An ML algorithm using penalized regression in which the penalty term is a hyperparameter λ multiplied by the sum of the absolute values of the regression coefficients; this is a supervised algorithm best suited for regression problems.
45
Hyperparameter
A value set by a researcher before machine learning begins.
46
Regularization
The reduction of statistical variability in high-dimensional data estimation problems.
47
Support vector machine | What is it and what is it used for?
An ML algorithm characterized by a linear classifier that aims to maximize the distance between two groups of data; this is a supervised algorithm best suited for classification problems.
48
Linear classifier
A vector used to separate groups of data based on the features of each data point.
49
Soft margin classification
A modification of the support vector machine as to both maximize the distance between the two groups of data and minimize misclassification of data points.
50
K-nearest neighbour | What is it and what is it used for?
An ML algorithm which classifies the new data point based on the similarity of its characteristics to k existing data points, where k is a hyperparameter; this is a supervised algorithm used mostly for classification problems, but can also be used for regression.
51
Classification & regression tree
An ML algorithm that can be applied to predict either a categorical target variable, creating a classification tree, or a continuous target variable, creating a regression tree.
52
Ensemble learning
A method that combines multiple models to reach a more accurate classification or regression.
53
Ensemble method
A method that combines multiple ML algorithms to reach a more accurate classification or regression.
54
Majority-vote classifier
A method that assigns to a new data point the label with the most votes from the ensemble.
55
Bootstrap aggregating | A.k.a. bagging
A technique whereby the original training dataset is used to generate n new training datasets (bags) of data; each new dataset is generated by random sampling with replacement from the original dataset.
56
Random forest classifier
A collection of a large number of classification trees trained through a bagging method.
57
Precision
The ratio of correctly predicted positive classes to all predicted positive classes. Precision = TP / (TP + FP)
58
Recall | A.k.a. sensitivity
The ratio of correctly predicted positive classes to all actual positive classes. Recall = TP / (TP + FN)
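Both ratios can be sketched in Python (illustrative helpers; the counts are made up):

```python
def precision(tp: int, fp: int) -> float:
    """Of all predicted positives, the share that really was positive."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Of all actual positives, the share the model found."""
    return tp / (tp + fn)

# 40 true positives, 10 false positives, 20 false negatives:
print(precision(40, 10))         # 0.8
print(round(recall(40, 20), 3))  # 0.667
```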
59
Penalized regression
A system in which regression coefficients are chosen so as to minimize the sum of squared errors plus a penalty term that grows with each additional independent variable included in the model.
60
Pruning
Regularization of CART in which sections that do not hold sufficient classification/regression power are removed from the tree.
61
Principal components analysis
A type of dimension reduction that converts many correlated variables into fewer, uncorrelated composite variables.
62
Supervised machine learning
Algorithms which are trained to classify or regress data using a labeled dataset; includes penalized regression, LASSO, SVM, KNN, CART, ensemble learning and random forest.
63
Unsupervised machine learning
Algorithms which are trained to classify or regress data by finding patterns within the data itself; includes dimension reduction and clustering.
64
Dimension reduction
An ML algorithm which aims to represent a dataset with many correlated features to one represented by fewer features that maintain their explanatory power.
65
Composite variable | What are they and how are they formed?
A variable that is formed of multiple variables that are statistically strongly correlated with each other; they are represented by eigenvectors that each have an eigenvalue corresponding to their power in explaining the initial dataset.
66
Projection error
The perpendicular distance of a data point from a principal component; PCA aims to minimize the sum of this error across all data points.
67
Spread
The distance between data points measured along (parallel to) a principal component; PCA aims to maximize the spread across all data points.
68
Scree plot
A plot that shows the total variance of the data explained by each principal component.
69
Clustering
An ML algorithm that organizes data points into subsets called clusters, in which all points within a cluster are deemed similar; can be k-means or hierarchical.
70
K-means clustering | What is an advantage and disadvantage?
Clustering that partitions observations into k clusters; works well for large datasets, but k is a hyperparameter that must be estimated beforehand.
71
Centroid
The centre of a cluster formed using k-means clustering.
72
Hierarchical clustering | What are the advantages and disadvantages of each type?
Clustering that creates intermediate clusters that increase (agglomerative) or decrease (divisive) in size until a final clustering is reached; agglomerative clustering computes quicker, handles large datasets better, and is better suited for identifying small clusters, while divisive clustering is better suited for identifying large clusters.
73
Dendrogram
A type of tree diagram used to outline the results of hierarchical clustering at each iteration.
74
Structured ML model building steps
1. Conceptualization 2. Data collection 3. Data preparation (cleansing) and preprocessing (wrangling) 4. Data exploration 5. Model training
75
Unstructured ML model building steps
1. Text problem formulation 2. Text curation 3. Text preparation (cleansing) and preprocessing (wrangling) 4. Text exploration 5. Model training
76
Web scraping | a.k.a. web spidering, web crawling
The employment of a program to scour external data sources (usually websites) in order to collect raw textual information.
77
Readme files
Files containing instructions on how to use a given piece of raw data.
78
Application Programming Interface (API)
A set of well-defined methods that allow software applications to communicate; often used to deliver data.
79
Data Preparation (Cleansing)
The correction of invalid, inaccurate, inconsistent, incomplete, non-uniform or duplicate data before its use as an input to the preprocessing stage.
80
Steps in numerical data preprocessing (wrangling)
1. Transformation (extracting, aggregating, filtering, selecting, converting) 2. Outlier removal (trimming, winsorization) 3. Scaling (normalization, standardization)
81
Metadata
Data that provides information about other data.
82
Trimming
The removal of k% of highest and lowest values from a dataset.
83
Winsorization
The replacement of outliers with maximum and minimum values which are not outliers.
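The two outlier treatments can be contrasted in a short Python sketch (illustrative helpers; the data and bounds are made up):

```python
def trim(xs, k):
    """Drop the k smallest and the k largest values outright."""
    s = sorted(xs)
    return s[k:len(s) - k]

def winsorize(xs, lower, upper):
    """Replace values outside [lower, upper] with the boundary values."""
    return [min(max(x, lower), upper) for x in xs]

data = [1, 50, 52, 55, 58, 200]
print(trim(data, 1))            # [50, 52, 55, 58]
print(winsorize(data, 50, 58))  # [50, 50, 52, 55, 58, 58]
```

Trimming shrinks the sample; winsorization keeps every observation but caps the extremes.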
84
Scaling
Adjusting the range of data by shifting and changing the scale of the data; normalization and standardization are two types of scaling
85
Normalization
A type of scaling in which numbers are rescaled to fit in the range [0, 1]; this scaling method is sensitive to outliers. Xnorm = (X - Xmin)/(Xmax-Xmin)
86
Standardization
A type of scaling in which numerical data is centered around a mean of 0; data must follow a normal distribution in order for it to be standardized. Xstandard = (X - mean)/standard deviation
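The two scaling methods can be sketched together (Python, illustrative; the population standard deviation is used here as an assumption):

```python
def normalize(xs):
    """Min-max rescaling into [0, 1]; sensitive to outliers."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Center on mean 0 and scale by the standard deviation."""
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sd for x in xs]

print(normalize([2, 4, 6, 8]))  # [0.0, ~0.333, ~0.667, 1.0]
```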
87
Summation operator
The functional part of a neural network's node that multiplies each input value by its respective weight and sums the weighted values to form the total net input, which is then passed to the activation function.
88
Activation function
The functional part of a neural network's node that receives the total net input from the summation operator and transforms it into the final output of the node; operates akin to a dimmer switch that in/decreases the strength of the output.
89
Forward propagation
The process by which input values are passed forward through a neural network's layers, each node applying its weights and activation function, to produce the network's output.
90
Backward propagation
The method of adjusting weights in a neural network to minimize its error by moving backward through the network.
91
Neural network weight updating | What is learning rate?
New weight = Old weight - (Learning rate × partial derivative of the total error with respect to that weight), where learning rate is a hyperparameter controlling the size of each adjustment.
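The update rule can be sketched as one gradient-descent step (Python, illustrative; the numbers are made up):

```python
def update_weight(old_weight: float, learning_rate: float, grad: float) -> float:
    """One gradient-descent step: grad is the partial derivative of
    the total error with respect to this particular weight."""
    return old_weight - learning_rate * grad

# A positive gradient means error rises with the weight, so step down:
print(update_weight(0.8, 0.1, 2.5))  # ~0.55
```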
92
Deep neural network (DNN)
A neural network with at least two hidden layers.
93
Reinforcement learning
Machine learning in which an agent learns through trial and error, choosing actions that maximize its rewards over time from interacting with its environment.
94
Regular expression (Regex)
A series of particular characters in order used to find patterns in a body of text.
95
Text cleansing
The removal of html tags, punctuation, numbers, and whitespaces from a body of text in order to prepare it for preprocessing.
96
Steps in text data preprocessing (wrangling)
1. Tokenization 2. Normalization (lowercasing, removing stop words, stemming, lemmatization) 3. Bag-of-words 4. Document term matrix
97
Tokenization (data analysis)
The process of splitting a body of text into separate words, or tokens.
98
Bag-of-words (BOW)
A collection of a distinct set of tokens from all texts in a sample dataset.
99
Document term matrix (DTM)
A two-dimensional representation in which each column represents a token from the BOW, each row represents the name of a text, and each intersection represents the number of appearances of that token in that text.
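The BOW and DTM constructions can be sketched in plain Python (illustrative helpers; whitespace splitting stands in for real tokenization):

```python
def bag_of_words(texts):
    """Distinct tokens across all texts, in first-seen order."""
    bow = []
    for text in texts:
        for token in text.split():
            if token not in bow:
                bow.append(token)
    return bow

def document_term_matrix(texts):
    """One row per text, one column per BOW token, cell = count."""
    bow = bag_of_words(texts)
    return bow, [[t.split().count(tok) for tok in bow] for t in texts]

bow, dtm = document_term_matrix(["buy low sell high", "buy buy hold"])
print(bow)  # ['buy', 'low', 'sell', 'high', 'hold']
print(dtm)  # [[1, 1, 1, 1, 0], [2, 0, 0, 0, 1]]
```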
100
N-grams
The representation of a sequence of words aggregated into a single token.
101
Exploratory data analysis (EDA)
The first step in data exploration; it involves creating graphs, charts and other visualizations and analyzing descriptive statistics in order to understand relationships within the data, and typically requires a high degree of collaboration with other departments.
102
Feature selection
The second step in data exploration; a process in which only the features most relevant to model training are kept in the dataset in order to prevent overfitting and model overcomplication.
103
Feature engineering
The third and final step in data exploration; a process in which new features are created by changing or transforming existing features.
104
One-hot encoding
The process in which categorical variables are transformed into binary form for machine reading.
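The transformation can be sketched in one line of Python (illustrative; the category names are made up):

```python
def one_hot(value, categories):
    """Binary vector with a 1 at the position of the matching category."""
    return [1 if value == c else 0 for c in categories]

sectors = ["energy", "tech", "utilities"]
print(one_hot("tech", sectors))  # [0, 1, 0]
```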
105
Term frequency (TF)
TF = frequency of token in a dataset / total number of tokens in a dataset
106
Feature selection methods for text data
1. Frequency analysis 2. Chi-squared test 3. Mutual information (MI)
107
Document frequency (DF)
DF = number of documents (i.e. sentences, texts) in a dataset containing a token / total number of documents in the dataset
108
Chi-squared test (feature selection) | What does a high/low statistic mean?
A statistical method used to determine the independence of two events: the occurrence of a token and the occurrence of a class; a token with a high chi-squared statistic indicates higher discriminatory potential.
109
Mutual information (MI)
A measure of a token's tendency to appear in texts of a specific class rather than uniformly across all texts; MI of 0 indicates uniformity, while MI approaching 1 indicates concentration in one class and thus discriminatory potential.
110
Feature engineering methods for text data
1. Numbers 2. N-grams 3. Name entity recognition 4. Parts-of-speech
111
Parts of speech (POS)
An algorithm that tags a token to an element of a sentence (e.g. noun, verb) based on the words surrounding it.
112
Name entity recognition (NER)
An algorithm that tags a token to an object class (e.g. organization, year, name) based on the words surrounding it.
113
Steps in model training
1. Method selection 2. Performance evaluation 3. Tuning
114
Factors in model selection
1. Type of machine learning (supervised vs unsupervised) 2. Type of data 3. Size of data
115
Ground truth
The known, actual outcome of the target variable; its availability is a defining characteristic of supervised ML.
116
Class imbalance
An event in which the number of data points belonging to one class greatly outnumbers those belonging to another class; can be alleviated by oversampling the underrepresented class and undersampling the overrepresented class.
117
Intercept dummy
A dummy variable that functions to raise or lower the regression line parallel to the original regression.
118
Base error
Model error due to randomness in the data.
119
Bias error
Describes the degree to which the model fits the training data; high bias error can be caused by erroneous assumptions and will lead to underfitting and high in-sample error.
120
Complexity
Refers to the number of dimensions/features/parameters in the data and whether they are linear or non-linear.
121
Cross-validation
The process of estimating out-of-sample error directly by determining the error in validation samples.
122
Deep learning
An ML algorithm that uses neural networks to find patterns in highly complex data.
123
Dendrogram
A tree diagram used to visualize hierarchical clustering.
124
F1 score
The harmonic mean of precision and recall; F1 score is a better performance measure than accuracy when there is class imbalance. F1 score = 2PR / (P+R)
125
Accuracy
The ratio of correctly predicted classes to total predictions; an overall performance metric for classification problems. Accuracy = (TP + TN) / (TP + FP + TN + FN)
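Accuracy and F1 can be compared in a short Python sketch (illustrative helpers; the counts are made up to show class imbalance):

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)

# Class imbalance: accuracy looks strong while F1 exposes weak precision.
print(accuracy(tp=5, fp=5, tn=90, fn=0))  # 0.95
print(round(f1_score(0.5, 1.0), 3))       # 0.667
```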
126
Features
The independent variables in a labeled dataset.
127
Fitting curve
A curve that shows in- and out-of-sample error rates on the y-axis plotted against model complexity on the x-axis.
128
Generalization
The retention of a model's explanatory power when performing out-of-sample.
129
Target (machine learning)
The dependent variable in a labeled dataset.
130
Confusion matrix
A grid with actual classes on the x-axis and predicted classes on the y-axis used to evaluate Type I and Type II error rates, as well as correct predictions.

Top left cell → True positives
Top right cell → False positives (Type I error)
Bottom left cell → False negatives (Type II error)
Bottom right cell → True negatives
131
Performance evaluation
The measurement of model performance for goodness of fit. The three performance evaluation methods are as follows: 1. Error analysis 2. Receiver operating characteristic 3. RMSE
132
Receiver operating characteristic (ROC)
A curve illustrating the trade-off between the false positive rate (x-axis) and the true positive rate (y-axis) for various cutoff points.
133
Tuning | What are the two tuning methods?
A process performed on an ML model that aims to achieve the optimal parameters and hyperparameters that neither underfits nor overfits the model, and thus involves optimizing the bias-variance error tradeoff; two tuning methods are grid search and ceiling analysis.
134
Grid search
A tuning method in which different combinations of hyperparameters are applied to an ML model until the best model is found.
135
Ceiling analysis
An assessment of the pipeline of ML model development to locate at which step the model requires tuning.
136
Corpus
A collection of text data in any form.
137
Sentence length
The number of characters, including spaces, in a sentence.
138
Frequency analysis
The process of determining how important certain tokens are in a sentence and the corpus as a whole. Frequency analysis measures include: 1. Term frequency 2. Document frequency 3. Collection frequency 4. Inverse document frequency
139
Collection frequency (CF)
Number of instances of a token in a corpus / Number of tokens in the corpus
140
Inverse document frequency (IDF)
A relative measure of how unique a token is across the corpus. IDF = log(1/DF)
141
TF-IDF
An overall measure of the value of a token across the entire dataset. TF-IDF = TF x IDF A token with a higher TF-IDF appears more frequently throughout a small number of documents. A token with a lower TF-IDF appears across many documents.
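The TF, DF, IDF and TF-IDF definitions above can be applied literally in a Python sketch (illustrative; each document is modeled as a list of tokens, and the example tokens are made up):

```python
import math

def tf_idf(token, documents):
    """TF x IDF with IDF = log(1/DF), per the definitions above."""
    all_tokens = [t for doc in documents for t in doc]
    tf = all_tokens.count(token) / len(all_tokens)
    df = sum(token in doc for doc in documents) / len(documents)
    return tf * math.log(1 / df)

docs = [["rates", "rose"], ["rates", "fell"], ["stocks", "fell"]]
# "rates" is spread across documents, so it scores lower than "stocks":
print(tf_idf("rates", docs) < tf_idf("stocks", docs))  # True
```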
142
Holdout samples
Data samples that are not used to train the model.
143
K-fold cross validation | What is it used for?
A method in which data is shuffled and divided randomly into k equal subsamples; k - 1 subsamples are used as the training sample and the remaining kth subsample is used as the validation sample, rotating until each subsample has served as the validation sample once; used to mitigate the excessive shrinkage of the training sample caused by setting aside holdout data.
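The rotation can be sketched in Python (illustrative helper; the striding split and fixed seed are our simplifications):

```python
import random

def k_fold_splits(data, k, seed=0):
    """Shuffle, split into k folds, and let each fold serve once as
    the validation sample while the rest form the training sample."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, folds[i]

splits = list(k_fold_splits(list(range(10)), k=5))
print(len(splits))  # 5 train/validation pairs, each with 8 + 2 points
```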