ML Flashcards

(134 cards)

1
Q

What is the main purpose of a dashboard in an ML project?

A

To display key performance indicators in a graphical user interface

Dashboards utilize data visualization and are linked to the data source for interactivity.

2
Q

True or false: A dashboard is primarily a backend application on a server.

A

FALSE

A dashboard is a graphical user interface, while an API is the backend application.

3
Q

What does an API provide for clients in an ML project?

A

An endpoint URL for data input and prediction output

The API sends responses typically in JSON format.

4
Q

Fill in the blank: An API exposes an endpoint URL which the frontend app can make a _______ request to send data.

A

HTTP POST

This allows communication between the frontend client app and the backend API.

5
Q

What are the common HTTP status codes returned by an API?

A
  • 200 range: Successful operation
  • 400 range: Client error
  • 500 range: Server error

These codes help indicate the status of the API request.
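
The three ranges on this card can be checked in code. The following is a minimal sketch using only Python's standard library; the helper name `status_family` is an assumption made for illustration and is not part of any API discussed in the deck.

```python
from http import HTTPStatus


def status_family(code: int) -> str:
    """Map an HTTP status code to the broad family named on the card."""
    if 200 <= code < 300:
        return "success"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"  # 1xx and 3xx codes are outside the three ranges on the card


for status in (HTTPStatus.OK, HTTPStatus.BAD_REQUEST, HTTPStatus.INTERNAL_SERVER_ERROR):
    print(status.value, status.phrase, "->", status_family(status.value))
```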

6
Q

What is a key challenge when conveying ML results to a non-technical audience?

A

Making complex statistical results understandable

Interactive visualizations, like dashboards, can help bridge this gap.

7
Q

What is the difference between a dashboard and an API?

A
  • Dashboard: Standalone solution, graphical/tabular format
  • API: Integrated into existing client frontend, supplies data as JSON

Dashboards are great for explaining findings, while APIs are more production-ready.

8
Q

What is required for authentication when accessing an API?

A

An API key in the authorization header

Clear instructions must be provided for customers to access the API.
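
As a rough illustration of what such instructions might show a customer, the sketch below builds (but does not send) an HTTP POST request carrying an API key in the authorization header. The URL, the key, the `Bearer` scheme, and the `paw_size` field are all hypothetical; a real API documents its own values.

```python
import json
import urllib.request

API_URL = "https://api.example.com/predict"  # hypothetical endpoint
API_KEY = "my-secret-key"                    # hypothetical key


def build_prediction_request(features: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body and an API key header."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",  # the scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_prediction_request({"paw_size": 4.2})
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```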

9
Q

What is a simple framework for creating an API?

A

Flask

For more complex, production-ready versions, Django REST Framework (DRF) can be used.

10
Q

What does a 400 BAD REQUEST error indicate?

A

Incorrect data sent that the server did not understand

This is a common client error returned by an API.
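
A framework-free sketch of how a backend might decide between 200 and 400 follows. The function names, the `total_bill` field, and the 15% "model" are hypothetical stand-ins, not taken from the deck; a real API would call a trained model and a web framework would handle the transport.

```python
import json
from typing import Tuple


def predict_tip(total_bill: float) -> float:
    """Hypothetical stand-in for a trained model."""
    return round(0.15 * total_bill, 2)


def handle_predict(raw_body: str) -> Tuple[int, str]:
    """Return (status_code, json_body) for a prediction request."""
    try:
        payload = json.loads(raw_body)
        total_bill = float(payload["total_bill"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # The client sent something the server cannot interpret.
        error = {"error": "total_bill must be a number in a JSON object"}
        return 400, json.dumps(error)
    return 200, json.dumps({"prediction": predict_tip(total_bill)})


print(handle_predict('{"total_bill": 20}'))
print(handle_predict("not json"))
```

Returning the error in the same JSON shape every time lets the client display it consistently.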

11
Q

What is the role of error messages in an API?

A

To provide appropriate codes and consistent formats for client display

This helps clients understand issues that arise during API requests.

12
Q

What is the purpose of a dashboard in data analysis?

A

To explain findings and assess if the proposed hypothesis yielded expected predictions and met business requirements

A dashboard serves as a storytelling tool for a non-technical audience.

13
Q

What should be included in a dashboard summary?

A
  • Problem statement
  • Business requirements
  • Associated jargon

This summary helps clarify the context for the audience.

14
Q

What are the two key aspects of dashboard design?

A
  • Art
  • Science

Effective dashboard design combines aesthetic appeal with clear communication of complex ideas.

15
Q

What should be prioritized in a dashboard to answer business questions?

A

The most important data

Prioritization ensures clarity and relevance in the information presented.

16
Q

What type of controls should be used for a small number of categorical classes?

A
  • Dropdown
  • Radio button

These controls allow users to select from predefined options easily.

17
Q

What control should be used for a continuous numerical feature?

A

Slider

A slider allows users to adjust values within a range intuitively.

18
Q

What is the principle of progressive disclosure in dashboard design?

A

Don’t show everything at once; let the user drill down

This approach helps manage complexity and enhances user experience.

19
Q

How should visuals be accentuated in a dashboard?

A

With titles and labels

Clear titles and labels help users understand the data being presented.

20
Q

What are the different types of users that may interact with a dashboard?

A
  • Analyst
  • Manager
  • ML Engineer
  • Data Scientist

Different users have varying needs for data access and interaction.

21
Q

What might be necessary for different user types regarding dashboard access?

A

Authentication with different access levels

This ensures that users only see data relevant to their role.

22
Q

Does Streamlit support native authentication?

A

Not in older releases

Newer Streamlit versions add OpenID Connect login through `st.login`; before that, dashboards needing authentication were typically built with Flask or Django.

23
Q

What does authentication also apply to besides dashboards?

A

APIs

Ensuring secure access is crucial for both dashboards and APIs.

24
Q

What is Artificial Intelligence?

A

Simulation of human intelligence and mimicry of the human brain’s problem-solving and decision-making skills

AI mimics the brain using algorithms that adapt to input using mathematical functions.

25
True or false: AI is good at solving a specific problem based on data and can easily transfer its techniques to similar problems.
FALSE ## Footnote AI struggles to transfer techniques to similar problems.
26
What is the sub-field of AI that deals with how computers learn?
Machine learning ## Footnote Supervised machine learning typically relies on large datasets that humans have processed and labeled.
27
In machine learning, what are **labels** or **targets**?
Synonyms for the variable you are interested in predicting ## Footnote Examples include categorical variables like 'car' or 'no car'.
28
What is a **classifier** in machine learning?
An estimator fitted to data that assigns each observation to one of a set of predefined classes ## Footnote It can be used for binary or multi-class classification.
29
What does **regression** refer to in machine learning?
Fitting a function, often a line, to numeric data so that a continuous value can be predicted ## Footnote It predicts future numbers based on past data.
30
Fill in the blank: A **clustering task** is where you want to group data but don’t have predefined _______.
classes ## Footnote An example is analyzing past customer purchase patterns.
31
What is the purpose of **model optimizing**?
To reduce discrepancies between known values and the model's estimates ## Footnote It adjusts the model's weights to improve prediction accuracy.
32
What is an **algorithm** in the context of machine learning?
A set of instructions used to analyze input data and predict output values ## Footnote Algorithms learn without explicit programming as they are exposed to more data.
33
What is a **prediction** in machine learning?
The output provided by a trained machine learning model based on input data ## Footnote Predictions can be used in decision-making processes or as inputs for other systems.
34
What are **features** in a dataset?
Factors on which the response or prediction variable depends ## Footnote Features are also known as attributes.
35
What is the difference between **features** and **attributes**?
In everyday English, features refer to form, shape, or proportion, while attributes refer to characteristics or qualities ## Footnote In machine learning the two terms are used interchangeably for the input variables of a dataset.
36
What is an **instance** in a dataset?
A single data point or a sample ## Footnote It can also be referred to as an example or observation.
37
What is the role of a **data practitioner** in machine learning?
To help refine algorithms and reduce errors in model predictions ## Footnote They work with data to make predictions or classify items.
38
True or false: Machine learning models can fit a surface in multidimensional space for complex data.
TRUE ## Footnote This allows for more sophisticated predictions.
39
What is the primary goal of **machine learning**?
To gain new abilities or knowledge through the analysis of prior data ## Footnote This involves utilizing various methods to analyze and interpret data.
40
What type of learning is most commonly used in practical machine learning use cases?
Supervised learning ## Footnote It is often the first method encountered in workplace applications.
41
In supervised learning, what are the two types of variables in the training dataset?
* Features * Label variable (target) ## Footnote Features are the input variables, while the label is what we aim to predict.
42
In the context of supervised learning, what is the **label**?
The variable we aim to predict ## Footnote In the provided example, the label is the Review.
43
What is the process of mapping the relationship between features and the label in supervised learning called?
Training the algorithm ## Footnote This involves finding patterns in the training data.
44
What is the term for the task in supervised learning when the target variable is a class or category?
Classification ## Footnote An example is determining if an image contains a car.
45
What is an example of a **regression** task in supervised learning?
Predicting waiter tip values in a restaurant ## Footnote Regression tasks involve predicting continuous values.
46
What does **unsupervised learning** involve?
Only features provided without any label or target ## Footnote The algorithm discovers patterns in the data without correct answers.
47
What is an example of **clustering** in unsupervised learning?
Grouping data by similarity ## Footnote This helps in identifying patterns without predefined labels.
48
What is the purpose of **association rules** in unsupervised learning?
To find relationships between variables ## Footnote An example is determining if buying X leads to buying Y.
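
The "buying X leads to buying Y" idea is usually quantified with support and confidence. Here is a minimal sketch on made-up shopping baskets; the data and function names are illustrative assumptions, not from the deck.

```python
# Made-up transactions for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]


def support(items: set) -> float:
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(x: set, y: set) -> float:
    """Of the baskets containing x, the fraction that also contain y."""
    return support(x | y) / support(x)


print("support(bread, butter):", support({"bread", "butter"}))
print("confidence(bread -> butter):", confidence({"bread"}, {"butter"}))
```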
49
Why is unsupervised learning important in today's technological world?
Computers generate vast amounts of unlabelled data ## Footnote Labeling all data in advance is often impractical.
50
What is the **caveat** of unsupervised learning?
No guarantee that the model will output relevant results ## Footnote The algorithm works based on its internal rules without true answers.
51
What is **semi-supervised learning**?
A mix of labeled and unlabeled data ## Footnote It occurs when some data points are labeled while others are not.
52
What is a potential option when dealing with semi-supervised learning?
* Drop unlabeled examples and use supervised learning * Predict labels for unlabeled examples * Use only input variables in an unsupervised model ## Footnote Each option has its own implications for model performance.
53
What is the main concept of **reinforcement learning**?
An agent learns through punishment or reward in an environment ## Footnote The agent discovers which actions lead to achieving its goal.
54
In reinforcement learning, what does the **state** represent?
The current world or environment of the task ## Footnote It is crucial for the agent's decision-making process.
55
What is an example of reinforcement learning in action?
An agent playing noughts and crosses (tic-tac-toe) ## Footnote The agent learns from sequences of actions and their outcomes.
56
What are some specialized methods in machine learning that were mentioned but not detailed?
* Self-supervised * Multi-instance * Inductive * Deductive * Transductive * Multi-task * Active * Online * Transfer * Ensemble ## Footnote These methods may appear in literature or workplace applications.
57
What is the overall purpose of learning methods in machine learning?
To analyze large quantities of data and identify patterns ## Footnote This capability exceeds human analytical abilities.
58
What is a **machine learning model** in abstract form?
A recipe or set of instructions that takes inputs and returns an output ## Footnote It learns patterns from historical data to make predictions on new unseen data.
59
What does an **ML model** use to learn patterns?
Algorithms ## Footnote The type of algorithm used will impact the result.
60
In a scatter plot for a dataset of Paw and Weight measurements, what does the **x-axis** represent?
Paw Size ## Footnote The y-axis represents weight.
61
What does the **line** in a scatter plot represent in the context of a machine learning model?
The output of the algorithm ## Footnote It indicates how the ML model makes predictions.
62
What are the two parts to consider about the graph in a machine learning model?
* The line itself (straight or squiggly) * Its placement in the graph ## Footnote These aspects visually represent how the model classifies data.
63
What is an **ML pipeline**?
A sequence of tasks performed when training a machine learning model ## Footnote Typical tasks include data validation, model training, model tuning, and model deployment.
64
What is the last step in an ML pipeline?
The ML model ## Footnote The preceding steps prepare the data for the model.
65
What are the **three commonly used machine learning tasks**?
* Regression * Classification * Clustering ## Footnote Regression and Classification are used in supervised learning, while Clustering is used in unsupervised learning.
66
When would you use a **regression problem**?
When the target variable is a continuous numerical value ## Footnote This often involves complex relationships between multiple attributes.
67
What type of problem is it if the label variable you wish to predict is a **class**?
Classification problem ## Footnote Classes can be strings, numbers, or Booleans, and must be nominal or ordinal values.
68
What is **multi-class classification**?
Classification problem with more than 2 classes ## Footnote The type of classification algorithm used depends on the number of classes.
69
What is the main type of **unsupervised learning algorithm**?
Clustering ## Footnote It groups data points based on their similarities.
70
What does clustering do with data points?
Groups them based on similar properties ## Footnote Dissimilar data points are placed in separate clusters.
71
What is **Scikit-learn**?
A free open-source machine learning library written in Python ## Footnote It contains tools for predictive data analysis and is built upon NumPy, SciPy, and Matplotlib.
72
What name is used to import **Scikit-learn** in Python scripts?
sklearn ## Footnote The library is installed as scikit-learn but imported under the package name sklearn.
73
List the main functionalities provided by **Scikit-learn**.
* Classification * Regression * Clustering * Data processing * Dimensionality reduction * Feature engineering * Feature scaling * Feature selection * Tuning model hyperparameters * Creating ML pipelines * Evaluating model performance ## Footnote These functionalities make it a comprehensive tool for machine learning tasks.
74
Define an **estimator** in the context of Scikit-learn.
A mathematical function that learns from data ## Footnote Examples include machine learning models like random forests or linear regression.
75
What are the two types of **estimators** in Scikit-learn?
* Predictors * Transformers ## Footnote Each type has different methods for learning and transforming data.
76
What methods do **predictor estimators** use?
* .fit() * .predict() ## Footnote These methods allow the model to learn patterns from data and make predictions.
77
What methods do **transformer estimators** use?
* .fit() * .transform() ## Footnote These methods allow the estimator to learn from data and transform it into a better distribution.
78
What is a **pipeline** in Scikit-learn?
A sequence of steps combined into a single estimator object ## Footnote It typically includes data preparation steps followed by the ML model.
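
The `.fit()` / `.transform()` / `.predict()` conventions and the way a pipeline chains them can be imitated in a few lines of plain Python. The classes below are toy stand-ins written for this sketch; they are not Scikit-learn classes, and unlike Scikit-learn they take flat lists rather than 2-D arrays.

```python
from statistics import mean


class MeanCenterer:
    """Toy transformer: learns the mean in fit(), subtracts it in transform()."""

    def fit(self, X, y=None):
        self.mean_ = mean(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class SlopeOnlyRegressor:
    """Toy predictor: fits y = slope * x through the origin by least squares."""

    def fit(self, X, y):
        self.slope_ = sum(a * b for a, b in zip(X, y)) / sum(a * a for a in X)
        return self

    def predict(self, X):
        return [self.slope_ * x for x in X]


class ToyPipeline:
    """Chains transformers, then a final predictor, behind one fit/predict."""

    def __init__(self, transformers, predictor):
        self.transformers = transformers
        self.predictor = predictor

    def fit(self, X, y):
        for step in self.transformers:
            X = step.fit(X).transform(X)  # each step learns, then reshapes the data
        self.predictor.fit(X, y)
        return self

    def predict(self, X):
        for step in self.transformers:
            X = step.transform(X)  # reuse what was learned during fit
        return self.predictor.predict(X)


pipe = ToyPipeline([MeanCenterer()], SlopeOnlyRegressor())
pipe.fit([1.0, 2.0, 3.0], [-2.0, 0.0, 2.0])
print(pipe.predict([4.0]))
```

The point of the design is that new data passes through exactly the same preparation steps that were learned on the training data.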
79
Why is **Scikit-learn** important for data practitioners?
It is a centralized and complete library for conventional ML ## Footnote It provides effective modules that assist from development through to deploying ML pipelines.
80
What is the sub-field of AI that deals with how computers are made smart?
Machine Learning ## Footnote This field focuses on algorithms and statistical models that enable computers to perform tasks without explicit instructions.
81
Which machine learning technique is essential for **binary classification problems**?
Logistic Regression ## Footnote Logistic regression is widely used for predicting binary outcomes.
82
What is an **estimator** in Scikit-learn?
A mathematical function that learns from data ## Footnote Estimators are fundamental components in Scikit-learn for building machine learning models.
83
What is the primary goal of **reinforcement learning**?
To learn from rewards and punishments ## Footnote This approach allows agents to learn optimal behaviors through trial and error.
84
In the Boston housing dataset example, what is the **target variable**?
Median value of owner-occupied homes ## Footnote The target variable is what the model aims to predict.
85
What is a **clustering task** in machine learning?
Grouping data without predefined classes ## Footnote Clustering is an unsupervised learning technique.
86
In a classification report, what does '**recall**' indicate?
The sensitivity to positive instances ## Footnote Recall measures the ability of a model to find all relevant cases.
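
Recall is true positives divided by all actual positives. A small sketch with made-up labels (the function name and data are illustrative):

```python
def recall(y_true, y_pred, positive=1):
    """True positives / (true positives + false negatives).

    Assumes y_true contains at least one positive instance.
    """
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_neg = sum(t == positive and p != positive for t, p in pairs)
    return true_pos / (true_pos + false_neg)


y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(recall(y_true, y_pred))
```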
87
What type of algorithm would you use if your **target variable** is a continuous numerical value?
Regression ## Footnote Regression algorithms predict continuous outcomes.
88
What are the **features** within a dataset?
The factors on which the response variable depends ## Footnote Features are the input variables used for making predictions.
89
Which statistical method is commonly used in machine learning for making **predictions**?
Regression ## Footnote Regression analysis is a key technique for prediction.
90
What is the typical first step in a **supervised learning workflow**?
Split the dataset into a train and test set ## Footnote This step is crucial for evaluating model performance.
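
A minimal sketch of the split using only the standard library follows. The function name `split_train_test`, the 20% test fraction, and the seed are assumptions for illustration; libraries such as Scikit-learn ship their own splitting utilities.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(10), test_fraction=0.2)
print(len(train), "training rows,", len(test), "test rows")
```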
91
What is the role of a **classifier** in machine learning?
To classify response variables into predefined classes ## Footnote Classifiers are used in supervised learning to categorize data.
92
Which of these abilities does AI attempt to mimic from the **human brain**?
Reasoning ## Footnote AI systems aim to replicate cognitive functions.
93
Which type of learning involves training a model with **labelled data**?
Supervised Learning ## Footnote This approach uses input-output pairs for training.
94
Which metric is used to measure the performance of a **regression model**?
R-squared Score ## Footnote R-squared indicates the proportion of variance explained by the model.
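
One way to see "proportion of variance explained" is to compute it directly as 1 minus the ratio of residual to total sum of squares. The function name and sample values below are illustrative.

```python
def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3], [1, 2, 3]))  # perfect predictions
print(r_squared([1, 2, 3], [2, 2, 2]))  # always predicting the mean
```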
95
Which term refers to **grouping data points** based on their similarities without predefined labels?
Clustering ## Footnote Clustering is a fundamental concept in unsupervised learning.
96
What is the primary objective of an **AI system**?
Mimic human intelligence ## Footnote AI aims to replicate cognitive functions and decision-making processes.
97
What is the primary focus of **linear regression**?
Modelling the relationship between a dependent variable and one or more independent variables ## Footnote Linear regression is used to predict outcomes based on linear relationships.
98
In the context of machine learning, what does a **pipeline** allow you to do?
Train a model, evaluate its performance, and use the model for prediction ## Footnote Pipelines streamline the workflow in machine learning projects.
99
What is the term for the variable that you are interested in **predicting** in machine learning?
Label ## Footnote Labels are the outcomes that models aim to predict.
100
In the context of AI, what is a **Turing test** used to determine?
Whether a system acts like a human ## Footnote The Turing test assesses whether a machine's behavior is indistinguishable from a human's.
101
What are the **two main learning styles** in machine learning?
* Supervised learning * Unsupervised learning ## Footnote Supervised learning involves labeled data, while unsupervised learning works with unlabeled data.
102
What is the **formula** for Linear Regression?
Y = mX + c ## Footnote In ML terms: β₀ (beta zero) is the intercept (c) and β₁ (beta one) is the slope (m).
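
For a single feature, the slope m and intercept c that minimize squared error have a closed form. A short sketch on made-up points that lie exactly on Y = 2X + 1 (the function name and data are illustrative):

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope m, intercept c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


m, c = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"Y = {m}X + {c}")
```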
103
What is **Linear Regression** used for?
* Predicting a number * Examples: House prices, Revenue, Temperature, Sales ## Footnote It draws a straight line that best fits the data and minimizes error.
104
What is the key difference between **Linear Regression** and **Logistic Regression**?
Logistic Regression outputs a probability (0 to 1), not a raw number ## Footnote Logistic function produces an S-shaped curve and is used for binary classification.
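
The S-shaped curve is the logistic (sigmoid) function applied to a linear score. In the sketch below the slope, intercept, and 0.5 threshold are made-up values chosen for illustration, not fitted parameters.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))


def predict_proba(x: float, slope: float, intercept: float) -> float:
    """Probability of the positive class for a single feature x."""
    return sigmoid(slope * x + intercept)


def classify(x: float, slope: float, intercept: float, threshold: float = 0.5) -> int:
    """Turn the probability into a class by comparing it to a chosen threshold."""
    return int(predict_proba(x, slope, intercept) >= threshold)


# Hypothetical pass/fail model: x is hours studied.
print(predict_proba(2.0, slope=1.0, intercept=-1.0))
print(classify(2.0, slope=1.0, intercept=-1.0))
```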
105
What are the **examples** of Logistic Regression?
* Pass / Fail * Spam / Not spam * Yes / No ## Footnote Logistic regression predicts probability, then you choose how to classify it.
106
What do **Decision Trees** resemble?
A flowchart of yes/no questions ## Footnote Each question splits the data, starting with general questions and moving toward specific decisions.
107
What is a common risk associated with **Decision Trees**?
Overfitting ## Footnote This occurs when there are too many questions, causing the model to memorize training data.
108
What is the purpose of **Ensemble Methods**?
* More accurate * More robust * Reduce overfitting ## Footnote Ensemble methods combine multiple models to improve predictions.
109
How does **Bagging** (Bootstrap Aggregation) work?
* Random samples of data → many trees * Trees run in parallel * Final result = majority vote / average ## Footnote Random Forest is the most famous bagging method.
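
The bootstrap-then-vote idea can be sketched without any ML library. Here the base learner is a deliberately simple one-feature threshold "stump" rather than a real decision tree, and the data, seed, and model count are made-up assumptions.

```python
import random
from collections import Counter

# Made-up (feature, label) rows; larger values belong to class 1.
rows = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]


def bootstrap_sample(data, rng):
    """Draw len(data) rows with replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """Toy base learner: threshold halfway between the two class means."""
    zeros = [x for x, label in sample if label == 0]
    ones = [x for x, label in sample if label == 1]
    if not zeros or not ones:  # the sample happened to contain one class only
        only = 0 if zeros else 1
        return lambda x: only
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: int(x > threshold)


def majority_vote(predictions):
    """Most common prediction across the models."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
models = [fit_stump(bootstrap_sample(rows, rng)) for _ in range(25)]


def bagged_predict(x):
    return majority_vote([model(x) for model in models])


print(bagged_predict(8.5), bagged_predict(1.5))
```

Each model is fitted independently of the others, which is why this step can run in parallel. For regression, the vote would be replaced by an average.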
110
What is a key strength of **Bagging**?
Reduces overfitting ## Footnote It is also fast due to parallel processing.
111
How does **Boosting** differ from Bagging?
Trees run one after another ## Footnote Each new tree focuses on correcting previous errors, using the same dataset modified each time.
112
What are the **pros** and **cons** of Boosting?
* Pros: Very strong predictive power * Cons: Slower (serial), more prone to overfitting if overused ## Footnote Examples include AdaBoost, Gradient Boosting, XGBoost.
113
What is the main use of **K-Means** in machine learning?
Grouping similar data ## Footnote It is an unsupervised learning method that requires no labels.
114
What are the **key assumptions** of K-Means?
* Points close together are similar * Clusters are roughly circular ## Footnote K-Means is sensitive to outliers and initial cluster positions.
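
A one-dimensional sketch of the K-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) follows. The points and starting centroids are made up; because the result depends on the starting positions, they are passed in explicitly here.

```python
def k_means_1d(points, centroids, iterations=10):
    """Return (final centroids, clusters) for 1-D points."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```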
115
What is the **output type** of Linear Regression?
Number ## Footnote It is used to predict values.
116
What is the **output type** of Logistic Regression?
Class (2) ## Footnote It is used for binary classification.
117
What is the **output type** of Decision Trees?
Class / Number ## Footnote They are used for rule-based decisions.
118
What is the **output type** of Random Forest (Bagging)?
Class / Number ## Footnote It provides strong general predictions.
119
What is the **output type** of Boosting?
Class / Number ## Footnote It is used for high accuracy models.
120
What is the **output type** of K-Means?
Clusters ## Footnote It is used for grouping data.
121
What is the key question when choosing an **algorithm**?
What are you trying to predict (if anything)? ## Footnote This question guides the selection of the appropriate algorithm based on the prediction goal.
122
When should you use **regression** algorithms?
* Predicting a continuous number * Examples: Sales revenue, House prices, Demand forecasting, Temperature ## Footnote Use regression when the output is a number, not a category.
123
What are suitable algorithms for **regression**?
* Linear Regression * Decision Trees * Random Forest * Ensemble Tree Algorithms ## Footnote These algorithms are effective for predicting continuous numerical outcomes.
124
When should you use **classification** algorithms?
* Predicting a category * Examples: Fraud / Not fraud, Pass / Fail, Customer churn / No churn ## Footnote Classification is used when the output is a category rather than a number.
125
What are the types of **classification** and their suitable algorithms?
* Binary classification: Logistic Regression, Decision Trees, Random Forests, Ensemble Trees * Multi-class classification: Decision Trees, Random Forests, Ensemble Trees, Artificial Neural Networks * Image classification: Convolutional Neural Networks (CNNs) ## Footnote The choice of algorithm depends on the number of classes and the data type.
126
What is the purpose of **clustering** in unsupervised learning?
* No labels * Want to group similar data ## Footnote Clustering is used when there is no target variable and the goal is to discover structure in the data.
127
What is a common algorithm used for **clustering**?
K-Means ## Footnote K-Means is widely used for grouping similar data points without predefined outcomes.
128
What are the **algorithm families** based on ML tasks?
* Predict a number: Regression (Linear regression, Trees) * Predict a class: Classification (Logistic regression, Trees, ANN) * Group data: Clustering (K-Means) * Images: Classification (CNNs) * Text / language: NLP ## Footnote This framework helps in understanding the types of algorithms based on the task.
129
Why does **data quantity** matter in machine learning?
* Algorithms need enough data to learn patterns * Rough guidelines: Fewer than ~100 images → very hard to learn patterns; < 50–100 rows for tabular data → most models struggle ## Footnote More complex data requires more data for effective learning.
130
What are **quick wins** in machine learning?
* Help Desk Automation (Chatbots) * Recommender Systems * Semantic Search * Using Existing Analytics as ML Opportunities * ML is often R&D ## Footnote Quick wins are low effort, high impact ML projects that can improve efficiency.
131
What is the benefit of **chatbots** in help desk automation?
* Handles FAQs * Faster response * Frees human agents for complex issues ## Footnote Users prefer fast answers over human interaction for simple tasks.
132
What is the purpose of **semantic search**?
* Understands meaning, not just words * Learns relationships from past searches ## Footnote This prevents duplicate documents and wasted time by providing more relevant search results.
133
What can traditional reports show that may lead to **ML opportunities**?
* Trends * Bottlenecks * Clear targets to predict ## Footnote Defining a target variable can help uncover patterns that analytics may have missed.
134
What is a key takeaway regarding choosing algorithms in machine learning?
* Depends on what you’re predicting * Type of output * Data type * Amount of data ## Footnote Quick ML wins usually come from automation, recommendations, search improvements, and addressing existing business pain points.