ML Flashcards

Question

True or false: AI is good at solving a specific problem based on data and also finds it easy to transfer techniques to similar problems.

Answer 1

FALSE ## Footnote AI struggles to transfer techniques to similar problems.

Answer 2

Machine learning ## Footnote Machine learning uses large datasets processed and labeled by humans.

Answer 3

Synonyms for the variable you are interested in predicting ## Footnote Examples include categorical variables like 'car' or 'no car'.

Answer 4

An estimator fitted to the data that classifies response variables into predefined classes ## Footnote It can be used for binary or multiple classifications.

Answer 5

The iterative process of fitting a line to a numeric data set ## Footnote It predicts future numbers based on past data.

Answer 6

classes ## Footnote An example is analyzing past customer purchase patterns.

Answer 7

To reduce discrepancies between known values and model estimates ## Footnote It applies weighting to improve prediction accuracy.

Answer 8

A set of instructions used to analyze input data and predict output values ## Footnote Algorithms learn without explicit programming as they are exposed to more data.

Answer 9

The output provided by a trained machine learning model based on input data ## Footnote Predictions can be used in decision-making processes or as inputs for other systems.

Answer 10

Factors on which the response or prediction variable depends ## Footnote Features are also known as attributes.

Answer 11

Features refer to form, shape, or proportion; attributes refer to characteristics or qualities ## Footnote Both are used to describe data in machine learning.

Answer 12

A single data point or a sample ## Footnote It can also be referred to as an example or observation.

Answer 13

To help refine algorithms and reduce errors in model predictions ## Footnote They work with data to make predictions or classify items.

Answer 14

TRUE ## Footnote This allows for more sophisticated predictions.

Answer 15

To gain new abilities or knowledge through the analysis of prior data ## Footnote This involves utilizing various methods to analyze and interpret data.

Answer 16

Supervised learning ## Footnote It is often the first method encountered in workplace applications.

Answer 17

* Features * Label variable (target) ## Footnote Features are the input variables, while the label is what we aim to predict.

Answer 18

The variable we aim to predict ## Footnote In the provided example, the label is the Review.

Answer 19

Training the algorithm ## Footnote This involves finding patterns in the training data.

Answer 20

Classification ## Footnote An example is determining if an image contains a car.

Answer 21

Predicting waiter tip values in a restaurant ## Footnote Regression tasks involve predicting continuous values.

Answer 22

Only features provided without any label or target ## Footnote The algorithm discovers patterns in the data without correct answers.

Answer 23

Grouping data by similarity ## Footnote This helps in identifying patterns without predefined labels.

Answer 24

To find relationships between variables ## Footnote An example is determining if buying X leads to buying Y.

Answer 25

Computers generate vast amounts of unlabeled data ## Footnote Labeling all data in advance is often impractical.

Answer 26

No guarantee that the model will output relevant results ## Footnote The algorithm works based on its internal rules without true answers.

Answer 27

A mix of labeled and unlabeled data ## Footnote It occurs when some data points are labeled while others are not.

Answer 28

* Drop unlabeled examples and use supervised learning * Predict labels for unlabeled examples * Use only input variables in an unsupervised model ## Footnote Each option has its own implications for model performance.

Answer 29

An agent learns through punishment or reward in an environment ## Footnote The agent discovers which actions lead to achieving its goal.

Answer 30

The current world or environment of the task ## Footnote It is crucial for the agent's decision-making process.

Answer 31

An agent playing noughts and crosses (tic-tac-toe) ## Footnote The agent learns from sequences of actions and their outcomes.

Answer 32

* Self-supervised * Multi-instance * Inductive * Deductive * Transductive * Multi-task * Active * Online * Transfer * Ensemble ## Footnote These methods may appear in literature or workplace applications.

Answer 33

To analyze large quantities of data and identify patterns ## Footnote This capability exceeds human analytical abilities.

Answer 34

A recipe or set of instructions that takes inputs and returns an output ## Footnote It learns patterns from historical data to make predictions on new unseen data.

Answer 35

Algorithms ## Footnote The type of algorithm used will impact the result.

Answer 36

Paw Size ## Footnote The y-axis represents weight.

Answer 37

The output of the algorithm ## Footnote It indicates how the ML model makes predictions.

Answer 38

* The line itself (straight or squiggly) * Its placement in the graph ## Footnote These aspects visually represent how the model classifies data.

Answer 39

A sequence of tasks performed when training a machine learning model ## Footnote Typical tasks include data validation, model training, model tuning, and model deployment.

Answer 40

The ML model ## Footnote The preceding steps prepare the data for the model.

Answer 41

* Regression * Classification * Clustering ## Footnote Regression and Classification are used in supervised learning, while Clustering is used in unsupervised learning.

Answer 42

When the target variable is a continuous numerical value ## Footnote This often involves complex relationships between multiple attributes.

Answer 43

Classification problem ## Footnote Classes can be strings, numbers, or Booleans, and must be nominal or ordinal values.

Answer 44

Classification problem with more than 2 classes ## Footnote The type of classification algorithm used depends on the number of classes.

Answer 45

Clustering ## Footnote It groups data points based on their similarities.

Answer 46

Groups them based on similar properties ## Footnote Dissimilar data points are placed in separate clusters.

Answer 47

A free open-source machine learning library written in Python ## Footnote It contains tools for predictive data analysis and is built upon NumPy, SciPy, and Matplotlib.

Answer 48

sklearn ## Footnote This is the package name used in import statements (for example, import sklearn).

Answer 49

* Classification * Regression * Clustering * Data processing * Dimensionality reduction * Feature engineering * Feature scaling * Feature selection * Tuning model hyperparameters * Creating ML pipelines * Evaluating model performance ## Footnote These functionalities make it a comprehensive tool for machine learning tasks.

Answer 50

A mathematical function that learns from data ## Footnote Examples include machine learning models like random forests or linear regression.

Answer 51

* Predictors * Transformers ## Footnote Each type has different methods for learning and transforming data.

Answer 52

* .fit() * .predict() ## Footnote These methods allow the model to learn patterns from data and make predictions.
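
A minimal sketch of the predictor interface, assuming scikit-learn is installed. The toy data (points on the line y = 2x + 1) and the variable names are illustrative only and do not come from the cards.

```python
from sklearn.linear_model import LinearRegression

# Toy data (illustrative assumption): points that lie on y = 2x + 1
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [1.0, 3.0, 5.0, 7.0]

model = LinearRegression()
model.fit(X_train, y_train)                  # learn slope and intercept from the data
predictions = model.predict([[4.0], [5.0]])  # apply the learned line to unseen inputs
print(predictions)
```

The same two calls apply to other predictors in the library, such as random forests; only the class being instantiated changes.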

Answer 53

* .fit() * .transform() ## Footnote These methods allow the estimator to learn from data and transform it into a form better suited to modeling, for example by scaling it.
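
A minimal sketch of the transformer interface, assuming scikit-learn is installed. StandardScaler is used here as one example of a transformer; the three-row input is made up for illustration.

```python
from sklearn.preprocessing import StandardScaler

# Toy single-feature data (illustrative assumption)
X = [[1.0], [2.0], [3.0]]

scaler = StandardScaler()
scaler.fit(X)                    # learn the mean and standard deviation of each column
X_scaled = scaler.transform(X)   # rescale to zero mean and unit variance
print(X_scaled)
```

Fitting on training data and then calling .transform() on new data reuses the statistics learned during .fit().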

Answer 54

A sequence of tasks arranged into a single function ## Footnote It typically includes data preparation steps followed by the ML model.
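
A minimal sketch of a pipeline, assuming scikit-learn is installed. The choice of steps (a scaler followed by logistic regression) and the toy data are assumptions made for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (illustrative assumption): small values are class 0, large values are class 1
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]

# Data preparation step first, ML model last
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X, y)                             # fits the scaler, then the model on scaled data
labels = pipe.predict([[2.0], [11.0]])     # scaling is applied automatically before predicting
print(labels)
```

Because the whole sequence behaves like a single estimator, the preparation steps cannot be accidentally skipped at prediction time.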

Answer 55

It is a centralized and complete library for conventional ML ## Footnote It provides effective modules that assist from development through to deploying ML pipelines.

Answer 56

Machine Learning ## Footnote This field focuses on algorithms and statistical models that enable computers to perform tasks without explicit instructions.

Answer 57

Logistic Regression ## Footnote Logistic regression is widely used for predicting binary outcomes.

Answer 58

A mathematical function that learns from data ## Footnote Estimators are fundamental components in Scikit-learn for building machine learning models.

Answer 59

To learn from rewards and punishments ## Footnote This approach allows agents to learn optimal behaviors through trial and error.

Answer 60

Median value of owner-occupied homes ## Footnote The target variable is what the model aims to predict.

Answer 61

Grouping data without predefined classes ## Footnote Clustering is an unsupervised learning technique.

Answer 62

The sensitivity to positive instances ## Footnote Recall measures the ability of a model to find all relevant cases.
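
Recall can be computed as true positives divided by the sum of true positives and false negatives. A small pure-Python sketch follows; the function name and the example labels are hypothetical.

```python
def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the model found: TP / (TP + FN)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return true_pos / (true_pos + false_neg)

# Made-up labels: four actual positives, three of which were predicted as positive
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1]
score = recall(y_true, y_pred)
print(score)  # 0.75
```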

Answer 63

Regression ## Footnote Regression algorithms predict continuous outcomes.

Answer 64

The factors on which the response variable depends ## Footnote Features are the input variables used for making predictions.

Answer 65

Regression ## Footnote Regression analysis is a key technique for prediction.

Answer 66

Split the dataset into a train and test set ## Footnote This step is crucial for evaluating model performance.
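
A minimal sketch of the split, assuming scikit-learn is installed. The 80/20 ratio, the random seed, and the toy data are illustrative choices, not values given in the cards.

```python
from sklearn.model_selection import train_test_split

# Toy dataset (illustrative assumption): ten rows, one feature, alternating labels
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

# Hold out 20% of the rows for evaluation; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))
```

The model is fitted on the training rows only, so the test rows act as unseen data when measuring performance.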

Answer 67

To classify response variables into predefined classes ## Footnote Classifiers are used in supervised learning to categorize data.

Answer 68

Reasoning ## Footnote AI systems aim to replicate cognitive functions.

Answer 69

Supervised Learning ## Footnote This approach uses input-output pairs for training.

Answer 70

R-squared Score ## Footnote R-squared indicates the proportion of variance explained by the model.
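
R-squared can be written as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations from the mean. A pure-Python sketch with made-up numbers (the function name is hypothetical):

```python
def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1 - ss_res / ss_tot

# Made-up values: the predictions miss the first two targets by 0.5 each
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
score = r_squared(y_true, y_pred)
print(round(score, 3))  # 0.975
```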

Answer 71

Clustering ## Footnote Clustering is a fundamental concept in unsupervised learning.

Answer 72

Mimic human intelligence ## Footnote AI aims to replicate cognitive functions and decision-making processes.

Answer 73

Modelling the relationship between a dependent variable and one or more independent variables ## Footnote Linear regression is used to predict outcomes based on linear relationships.

Answer 74

Train a model, evaluate its performance, and use the model for prediction ## Footnote Pipelines streamline the workflow in machine learning projects.

Answer 75

Label ## Footnote Labels are the outcomes that models aim to predict.

Answer 76

Whether a system acts like a human ## Footnote The Turing test assesses a machine's ability to exhibit intelligent behavior that is indistinguishable from a human's.

Answer 77

* Supervised learning * Unsupervised learning ## Footnote Supervised learning involves labeled data, while unsupervised learning works with unlabeled data.

Answer 78

Y = mX + c ## Footnote In ML terms: β₀ (beta zero) is the intercept (c) and β₁ (beta one) is the slope (m).
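
As an illustration, the slope and intercept of Y = mX + c can be estimated with the ordinary least squares formulas for a single feature. The function name and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope (beta one): covariance of x and y divided by the variance of x
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept (beta zero): makes the line pass through the point of means
    c = mean_y - m * mean_x
    return m, c

# Made-up points that lie exactly on y = 1.5x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.5, 4.0, 5.5, 7.0]
m, c = fit_line(xs, ys)
print(m, c)  # 1.5 1.0
```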

Answer 79

* Predicting a number * Examples: House prices, Revenue, Temperature, Sales ## Footnote It draws a straight line that best fits the data and minimizes error.

Answer 80

Logistic Regression outputs a probability (0 to 1), not a raw number ## Footnote Logistic function produces an S-shaped curve and is used for binary classification.
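
A small sketch of the logistic (sigmoid) function and of turning its probability into a class. The 0.5 threshold and the function names are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1 (the S-shaped curve)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, threshold=0.5):
    """Turn a probability into a binary class using a chosen cut-off."""
    return 1 if probability >= threshold else 0

print(sigmoid(0.0))             # 0.5
print(classify(sigmoid(2.0)))   # 1
print(classify(sigmoid(-2.0)))  # 0
```

Raising or lowering the threshold changes which probabilities count as the positive class without retraining the model.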

Answer 81

* Pass / Fail * Spam / Not spam * Yes / No ## Footnote Logistic regression predicts probability, then you choose how to classify it.

Answer 82

A flowchart of yes/no questions ## Footnote Each question splits the data, starting with general questions and moving toward specific decisions.

Answer 83

Overfitting ## Footnote This occurs when there are too many questions, causing the model to memorize training data.

Answer 84

* More accurate * More robust * Reduce overfitting ## Footnote Ensemble methods combine multiple models to improve predictions.

Answer 85

* Random samples of data → many trees * Trees run in parallel * Final result = majority vote / average ## Footnote Random Forest is the most famous bagging method.
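
A minimal sketch of bagging with a random forest, assuming scikit-learn is installed. The number of trees, the seed, and the toy data are illustrative choices.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data (illustrative assumption): low values are class 0, high values are class 1
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Each of the 25 trees is trained on its own bootstrap sample of the rows.
# The trees are independent of each other, which is why they can be built in parallel.
forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X, y)

# The forest combines the predictions of all trees into one answer per input
votes = forest.predict([[1.5], [11.5]])
print(votes)
```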

Answer 86

Reduces overfitting ## Footnote It is also fast due to parallel processing.

Answer 87

Trees run one after another ## Footnote Each new tree focuses on correcting the errors of the previous trees, using the same dataset reweighted or otherwise modified each time.

Answer 88

* Pros: Very strong predictive power * Cons: Slower (serial), more prone to overfitting if overused ## Footnote Examples include AdaBoost, Gradient Boosting, XGBoost.

Answer 89

Grouping similar data ## Footnote It is an unsupervised learning method that requires no labels.

Answer 90

* Points close together are similar * Clusters are roughly circular ## Footnote K-Means is sensitive to outliers and initial cluster positions.
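
A minimal K-Means sketch, assuming scikit-learn is installed. The two well-separated groups of points and the choice of two clusters are made up for illustration.

```python
from sklearn.cluster import KMeans

# Toy 2-D points (illustrative assumption): one group near (1, 1), another near (8, 8)
points = [
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.3],
]

# n_init reruns the algorithm from several starting positions and keeps the best result,
# which softens the sensitivity to initial cluster positions
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)
print(cluster_ids)
```

The numeric cluster ids are arbitrary; what matters is which points share an id.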

Answer 91

Number ## Footnote It is used to predict values.

Answer 92

Class (2) ## Footnote It is used for binary classification.

Answer 93

Class / Number ## Footnote They are used for rule-based decisions.

Answer 94

Class / Number ## Footnote It provides strong general predictions.

Answer 95

Class / Number ## Footnote It is used for high accuracy models.

Answer 96

Clusters ## Footnote It is used for grouping data.

Answer 97

What are you trying to predict (if anything)? ## Footnote This question guides the selection of the appropriate algorithm based on the prediction goal.

Answer 98

* Predicting a continuous number * Examples: Sales revenue, House prices, Demand forecasting, Temperature ## Footnote Use regression when the output is a number, not a category.

Answer 99

* Linear Regression * Decision Trees * Random Forest * Ensemble Tree Algorithms ## Footnote These algorithms are effective for predicting continuous numerical outcomes.

Answer 100

* Predicting a category * Examples: Fraud / Not fraud, Pass / Fail, Customer churn / No churn ## Footnote Classification is used when the output is a category rather than a number.

Answer 101

* Binary classification: Logistic Regression, Decision Trees, Random Forests, Ensemble Trees * Multi-class classification: Decision Trees, Random Forests, Ensemble Trees, Artificial Neural Networks * Image classification: Convolutional Neural Networks (CNNs) ## Footnote The choice of algorithm depends on the number of classes and the data type.

Answer 102

* No labels * Want to group similar data ## Footnote Clustering is used when there is no target variable and the goal is to discover structure in the data.

Answer 103

K-Means ## Footnote K-Means is widely used for grouping similar data points without predefined outcomes.

Answer 104

* Predict a number: Regression (Linear regression, Trees) * Predict a class: Classification (Logistic regression, Trees, ANN) * Group data: Clustering (K-Means) * Images: Classification (CNNs) * Text / language: NLP ## Footnote This framework helps in understanding the types of algorithms based on the task.

Answer 105

* Algorithms need enough data to learn patterns * Rough guidelines: Fewer than ~100 images → very hard to learn patterns; fewer than ~50–100 rows of tabular data → most models struggle ## Footnote More complex data requires more examples for effective learning.

Answer 106

* Help Desk Automation (Chatbots) * Recommender Systems * Semantic Search * Using Existing Analytics as ML Opportunities * ML is often R&D ## Footnote Quick wins are low effort, high impact ML projects that can improve efficiency.

Answer 107

* Handles FAQs * Faster response * Frees human agents for complex issues ## Footnote Users prefer fast answers over human interaction for simple tasks.

Answer 108

* Understands meaning, not just words * Learns relationships from past searches ## Footnote This prevents duplicate documents and wasted time by providing more relevant search results.

Answer 109

* Trends * Bottlenecks * Clear targets to predict ## Footnote Defining a target variable can help uncover patterns that analytics may have missed.

Answer 110

* Depends on what you’re predicting * Type of output * Data type * Amount of data ## Footnote Quick ML wins usually come from automation, recommendations, search improvements, and addressing existing business pain points.

(134 cards)