AI Flashcards

Learning about AI (174 cards)

1
Q

Where was the 1956 conference held that saw the first public mention of the term artificial intelligence?

A

Dartmouth

2
Q

Who came up with the term artificial intelligence during his preparations for the 1956 conference in Dartmouth?

A

John McCarthy

3
Q

What does DARPA abbreviate?

A

Defense Advanced Research Projects Agency

4
Q

What was the programming language Prolog originally designed for?

A

natural language processing and AI applications, particularly computational linguistics

It works by expressing problems as facts and rules in first-order logic (Horn clauses), then uses unification and backtracking to perform automated reasoning — effectively proving whether a query follows from the given knowledge base.

5
Q

Please name the key disciplines that contributed to the development of AI.

A
  • decision theory
  • game theory
  • natural language processing
  • neuroscience
6
Q

Name a big challenge that the first forays into the realm of automatic translation faced in the 1960s.

A

word ambiguity

7
Q

What are the 3 prerequisites for AI to be successful?

A
  • Availability of data
  • Computational power
  • Mature algorithms
8
Q

Please define AI winter.

A

periods when interest, research activities, and funding of AI projects significantly decrease

9
Q

What broader category of AI systems do expert systems belong to?

A

Knowledge-based systems.

Expert systems are a specialized subset that use a knowledge base of domain-specific facts and rules combined with an inference engine to emulate the decision-making of a human expert.

10
Q

If a problem to be solved can be categorized as a decision problem, how might the knowledge be typically represented in an expert system?

A

as a decision tree

11
Q

What was Dendral, one of the first expert systems put into practice, created to achieve?

A

identify organic molecules

12
Q

What might have caused the downfall of expert systems?

A
  1. computational complexity of inference growing faster than linearly with the number of rules
  2. difficulty of proving consistency as the base grows
  3. inability of rule-based systems to update their own rules and learn from experience
13
Q

What are typical examples of possible rules used to flag potentially fraudulent transactions in credit institutions?

A
  • deviation from normal customer behaviour
  • geographical anomalies
  • rapid succession of multiple small transactions
  • unusual transaction time
  • unusually high amount
14
Q

What is the purpose of an expert system?

A

It encodes the knowledge and reasoning of domain experts into a rule-based system, enabling non-expert users to reach expert-level conclusions and decisions.

knowledge base (facts and rules from experts) ► inference engine (applies the rules) ► user interface (faces the non-expert)

15
Q

Who developed the perceptron in 1957?

A

Frank Rosenblatt

American psychologist and computer scientist

at the Cornell Aeronautical Laboratory

16
Q

What is the basic operating principle of the perceptron?

A

Each input is multiplied by its corresponding weight, the weighted inputs and a bias term are summed, and the result is passed through a step function (activation function) that produces a binary output — 1 if the sum exceeds a threshold, 0 otherwise.

17
Q

What are the typical components of a perceptron?

A
  1. input: x1, x2, x3
  2. weights: w1, w2, w3
  3. transfer function: Σ
  4. activation function
  5. output: y
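The components listed above can be sketched in Python (the AND-gate weights and bias are illustrative assumptions, not part of the card):

```python
def perceptron(inputs, weights, bias):
    # Transfer function: weighted sum of inputs plus the bias term
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function: step function producing a binary output
    return 1 if s > 0 else 0

# Example: with these (assumed) parameters the unit acts as a logical AND gate
weights, bias = [1.0, 1.0], -1.5
print(perceptron([1, 1], weights, bias))  # -> 1
print(perceptron([1, 0], weights, bias))  # -> 0
```

With weights [1.0, 1.0] and bias −1.5, the weighted sum exceeds 0 only when both inputs are 1.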
18
Q

When was the backpropagation algorithm introduced by Rumelhart et al.?

A

1986

19
Q

What does the backpropagation algorithm allow the neural network to do?

A

learn from errors

by gradually adjusting the connection weights to improve predictions

20
Q

What does the number of neurons in the input layer correspond to in ANNs?

A

the no. of features in the input data

21
Q

What does the number of neurons in the output layer correspond to in ANNs?

A

the no. of output classes or output variables

22
Q

What are the main architectural characteristics of feed-forward neural networks?

A
  1. the outputs of neurons are only connected to the next layer
  2. the information only flows in one direction
23
Q

What does the term GAN abbreviate in the context of the history of generative AI?

A

generative adversarial networks

24
Q

What does GPT abbreviate in the context of AI?

A

generative pre-trained transformers

25
What is Chomsky's main concept called, which states that people generate an unlimited no. of sentences based on a set of rules in their minds?
transformational-generative grammar
26
What does AlphaZero base its learning on?
intensive self-play following a set of rules
27
What is the main assumption of quantum mechanics?
Physical systems can be characterized using a wave function describing the probabilities of the system being in a particular state.
28
What are the 5 phases of the Gartner hype curve?
1. discovery - innovation trigger 2. exaggerated/inflated expectations 3. trough of disillusionment 4. slope of enlightenment 5. plateau of productivity
29
What does the Gartner term AI TRiSM stand for?
Trust, Risk and Security Management | (for AI systems) ## Footnote scoped to AI governance — not general enterprise risk
30
Give an example for a technology within AI that reached the phase of the plateau of productivity.
computer vision
31
Outline what the knowledge base for an expert system to detect COVID-19 could look like.
1. Symptom Ontology 2. Epidemiological & Risk-Factor Rules 3. Diagnostic Test Interpretation Module 4. Temporal Reasoning Engine 5. Differential Diagnosis & Exclusion Rules 6. Variant-Specific Modifiers 7. Inference Strategy
32
Where should we put deep learning on the hype cycle curve?
Plateau of Productivity ## Footnote It powers production systems in search, recommendation, NLP, computer vision, fraud detection, and medical imaging at scale.
33
Where should we put smart robots on the hype cycle curve?
Peak of Inflated Expectations | Demos are impressive. ## Footnote Reliable real-world deployment outside structured environments remains extremely limited.
34
Where should we put autonomous vehicles on the hype cycle curve?
Trough of Disillusionment | The industry is quietly grinding through hard engineering. ## Footnote After peak hype around 2017–2019 ("fully self-driving by 2020"), reality hit hard: Argo AI shut down, Cruise suspended operations after incidents, Uber sold its AV unit.
35
How many AI systems have already reached the plateau of productivity at the Gartner hype cycle curve?
only computer vision – per the course book | Knowledge graphs were also identified as hitting the plateau by Gartner. ## Footnote And from the 2023 AI Hype Cycle, cloud AI services (many of which are low-code) were recognized as approaching the Plateau of Productivity, with potential to transform industries from manufacturing to finance.
36
From which disciplines did American mathematician Marvin Minsky combine knowledge to contribute to AI?
computer science and cognitive science
37
How do we calculate the F score?
(2 * precision * recall) / (precision + recall)
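The formula can be evaluated directly as a quick sanity check (the precision and recall values are made up):

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall
    return (2 * precision * recall) / (precision + recall)

print(f1_score(0.5, 0.5))  # -> 0.5 (equal precision and recall)
print(f1_score(0.8, 0.5))  # -> ~0.615 (pulled toward the lower value)
```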
38
Describe at least 5 characteristics that a humanoid robot equipped with an AGI would have to possess.
1. **General-Purpose Reasoning & Transfer Learning**. It must solve novel problems across domains it was never explicitly trained on — not just pattern-match within a narrow task. 2. **Embodied Perception & Sensorimotor Integration**. It needs to fuse multimodal sensory input (vision, proprioception, tactile feedback, auditory signals) into a coherent world model in real time. 3. **Natural Language Understanding** & Social Cognition. Functioning among humans requires pragmatic communication: understanding sarcasm, implicature, cultural context, and emotional subtext — not just syntax. Beyond language, it needs a theory of mind — the capacity to model others' beliefs, intentions, and knowledge states to cooperate, negotiate, or assist effectively. 4. **Autonomous Goal Setting & Planning Under Uncertainty**. True AGI implies the robot can formulate its own sub-goals, prioritize competing objectives, and plan over long time horizons while dealing with incomplete information. This goes beyond reactive behavior into deliberate, hierarchical planning with contingency handling (e.g., "the store is closed → re-plan the recipe with available ingredients"). 5. **Continuous Learning & Self-Correction.** It must learn incrementally from experience after deployment — adapting to a specific household, new tools, or changing user preferences — without catastrophic forgetting of previously acquired skills. This includes metacognition: recognizing when it doesn't know something, when to ask for help, and when its own model of the world is wrong. 6. **Ethical Reasoning & Value Alignment.** Operating autonomously in human environments demands the ability to weigh competing moral considerations, respect social norms, and degrade gracefully when facing dilemmas (e.g., prioritizing human safety over task completion). It must internalize values rather than merely following a rule list. 7. **Fine-Grained Dexterous Manipulation**. 
Humanoid form implies human-like manipulation tasks: threading a needle, cracking an egg, handling fragile objects.
39
Reflect on how AI could contribute to reducing the workload of the HR department of a company.
**Recruitment & Screening** NLP-based systems can parse resumes, rank candidates against job requirements, and flag top matches. **Employee Onboarding** An automation layer with an LLM-based assistant can guide new hires through the process, answer FAQ, and escalate only genuine edge cases to a human. **Performance Review Support** AI can draft initial review summaries from collected peer feedback, project metrics, and self-assessments — giving managers a structured starting point rather than a blank page. **Internal Mobility & Skill Mapping** By analyzing employee profiles, completed trainings, and project histories, AI can recommend internal candidates for open roles or suggest upskilling paths — something HR teams rarely have bandwidth to do systematically. **Attrition Prediction** ML models trained on historical data (tenure, engagement survey scores, promotion velocity, team changes) can flag flight-risk employees, giving HR time to intervene proactively.
40
What are the 3 main types of machine learning by the learning paradigm (how the model learns from data)?
1. Reinforcement learning 2. Supervised learning 3. Unsupervised learning
41
What are the 5 most important terms in reinforcement learning?
1. Action (A) – all possible actions the agent can perform 2. Environment (E) – the scenario the agent must explore 3. States (S) – all possible states in the given environment 4. Reward (R) – feedback from the environment to reward an action 5. Policy (π) – long-term value of current state S using policy π
42
How do we formalize the process of receiving a reward from a state–action pair in RL?
f(st, at) = rt+1 ## Footnote the agent is in state st, performs action at, and the environment returns the scalar reward rt+1
43
For a sequence of discrete time steps t = 0, 1, 2, … starting at the state s0 ∈ S, what sequence will the agent–environment interaction lead to?
s0, a0, r1, s1, a1, r2, s2, a2, r3, s3, …
44
Please explain how the environment reacts to the action of an agent in reinforcement learning.
by returning a pair of state and reward
45
What does MDP generally abbreviate in the context of reinforcement learning?
Markov decision process
46
When do we say that a state *s* holds the **Markov property** in reinforcement learning?
When the state contains all relevant information from the history, so that the future depends only on the present state.
47
In the context of the Markov decision process, what is the formula for the policy π which describes the chosen action in a certain state?
π(s, a) = p(at = a | st = s)
48
What do value functions estimate in the context of reinforcement learning?
how good it is for an agent to be in a state and perform a specific action in that given state
49
What is γ in the following value function equation? ## Footnote **Vπ(s) = Eπ[rt+1 + γ rt+2 + … + γ^(T−1) rT | st = s] = Eπ[∑k≥0 γ^k rt+k+1 | st = s]**
the discount factor in [0, 1] that controls how much the agent cares about future vs. immediate rewards | security of the expected return ## Footnote A γ close to 1 expresses high confidence that future rewards are "secure," while a γ close to 0 says "I only trust what's right in front of me."
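The role of γ can be illustrated with a short sketch of the discounted return (the reward sequence is a made-up example):

```python
def discounted_return(rewards, gamma):
    # G_t = sum over k of gamma^k * r_{t+k+1}
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

rewards = [1, 1, 1]                     # hypothetical rewards r_{t+1}, r_{t+2}, r_{t+3}
print(discounted_return(rewards, 0.9))  # -> ~2.71: future rewards count almost fully
print(discounted_return(rewards, 0.0))  # -> 1.0: only the immediate reward matters
```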
50
In a Markov decision process, what aims to maximize the reward?
agent
51
In model-based reinforcement learning, what does an agent try to understand?
the model of the environment
52
Temporal difference learning is a model-free approach, so no model of the learning environment is required. What is the learning based on instead?
Learning happens directly from the experience in a system that is partially unknown. | the most prominent illustration of this principle is weather forecasting ## Footnote TD learning makes predictions based on correlation between subsequent predictions.
53
When we initialize the Q-learning algorithm, all values in the Q table are set to zero. What are the iterative steps of the following phase?
1. Choose an action for the current state 2. Perform the chosen action 3. Evaluate the outcome and update the Q matrix
54
What is the difference between exploration and exploitation as a strategy when choosing an action for the current state in the Q-learning algorithm?
Exploration: choosing an action randomly, maybe because the Q matrix is still full of 0s. Exploration is necessary to discover new promising strategies. Exploitation: choosing an action based on already acquired knowledge. Exploitation ensures that already proven methods are used optimally.
55
What is the transition function (state-transition probability) in a Markov decision process?
T(s, a, s′) = P(st+1 = s′ | st = s, at = a) | the P of arriving in *s′* given that the agent is in *s* and does *a* ## Footnote The function satisfies ∑s′∈S T(s, a, s′) = 1 for all *s*, *a* — i.e., it's a proper probability distribution over next states.
56
What is the formula expressing the Q-learning update rule, which makes the agent learn?
**Qt(s, a) = Qt−1(s, a) + α TDt(s, a)** | the new Q equals the old Q + a small correction. α (the learning rate) ## Footnote TDt is the temporal difference error, which is essentially "how surprised was the agent" — the gap between what it expected and what it actually got. So the whole equation says: update your beliefs by a small fraction of your surprise.
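A minimal sketch of this update rule (the toy Q table, states, and hyperparameters are illustrative assumptions):

```python
def q_update(q, state, action, reward, next_state, alpha, gamma):
    # TD error: reward plus discounted best future value, minus the current estimate
    td = reward + gamma * max(q[next_state]) - q[state][action]
    # New Q = old Q + learning rate * TD error ("update beliefs by a fraction of the surprise")
    q[state][action] += alpha * td
    return q[state][action]

# Hypothetical 2-state, 2-action Q table initialized to zero
q = [[0.0, 0.0], [0.0, 0.0]]
print(q_update(q, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9))  # -> 0.5
```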
57
Imagine you have a gardening bot to water your plants. How could the bot apply reinforcement learning to learn how to perfectly water the plants?
States (s): Soil moisture level, time of day, temperature, humidity, plant type, days since last watering, season. Actions (a): What the bot can do — water 50ml, water 100ml, water 200ml, or do nothing. Rewards (r): Positive rewards for healthy soil moisture staying in the optimal range and negative rewards for overwatering (root rot, waterlogging) or underwatering (wilting, dry soil). Transitions (s → s′): After the bot takes action a in state s, the environment moves to s′. The learning loop then uses the Q-learning update: Qt(s, a) = Qt−1(s, a) + α TDt(s, a)
58
You have a computer that learns to play chess by playing against a random opponent. How would the learning process change if the computer played against another computer using the same algorithm?
Against a random opponent, the learning process has a low ceiling. The computer quickly discovers that simple tactics work and then its improvement stalls, because the opponent never punishes sophisticated mistakes. The reward signal becomes uninformative: it keeps winning with r = +1, the TD error shrinks to near zero, and Q-values stop updating meaningfully. The agent converges on a policy that's "good enough to beat chaos" but brittle against any robust strategy. Against a copy of itself, everything changes in several key ways. 1. The opponent co-evolves. Every time the agent finds a better strategy, its opponent instantly has that same improvement. There's no plateau — the bar keeps rising. This is sometimes called an arms race dynamic. 2. The reward signal stays rich. Against a random player, 95% of games end in a win, so r ≈ +1 almost always and the gradient vanishes. In self-play, the win rate hovers around 50%, which means the TD error remains large and informative throughout training. The agent is perpetually surprised, perpetually learning. 3. Exploration happens naturally. A random opponent already provides "random" situations, but they're meaninglessly random — nonsensical positions that won't arise in real play. A self-play opponent creates meaningfully challenging positions that force the agent to discover deep tactics like positional sacrifices, and long-term pawn structure planning. 4. The risk is strategy collapse — both copies might lock into a narrow cycle of mutual exploitation (A beats B beats C beats A) and never discover broader strategies. AlphaZero addressed this with Monte Carlo Tree Search (MCTS) for structured exploration alongside the neural network's policy and value heads. | DeepMind did with AlphaZero - leap from basic RL to self-play ## Footnote In terms of our Q update: Qt(s, a) = Qt−1(s, a) + α TDt(s, a) Against a random opponent, TDt shrinks fast and learning flatlines. 
In self-play, TDt stays substantial because the opponent keeps getting smarter, so the Q-values keep refining toward genuinely optimal play — which is how AlphaZero surpassed centuries of human chess knowledge in roughly four hours of training against itself.
59
Applying Q learning in chess learning, exploration and exploitation play an important role. What is more important at what stage of the learning process and why?
Early stage → exploration dominates Middle stage → gradual shift Late stage → exploitation dominates ## Footnote At the start, the Q-table is essentially blank. The agent has no basis for making good decisions, so exploiting Q-values would mean following noise. High ε (0.9–1.0) forces the agent to try random actions across many states, which does two critical things: it populates the Q-table with actual experience, and it prevents premature convergence on a bad policy. Think of the gardening bot on day one — it has no idea, so it should try everything and observe what happens. As TD errors accumulate and Q-values start reflecting reality, the agent can increasingly trust its own estimates. We anneal ε downward (e.g., from 0.9 toward 0.1), so the agent mostly follows its best-known actions but still occasionally tries something unexpected. This is where the bulk of the real learning happens — the agent is refining its policy, not building it from scratch. Once Q-values have converged (TD errors are consistently small), the agent should mostly exploit its learned policy. Low ε (0.01–0.05) keeps a tiny door open for discovering rare improvements, but the agent is now confident in its strategy. A well-trained agent shouldn't randomly sacrifice its queen just to "explore" — it should play its best move almost every time.
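The staged ε-greedy strategy described above can be sketched as follows (the Q-values and the decay schedule are illustrative assumptions):

```python
import random

def epsilon_greedy(q_values, epsilon):
    # Explore with probability epsilon, otherwise exploit the best-known action
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # exploration
    return max(range(len(q_values)), key=q_values.__getitem__)       # exploitation

# Hypothetical annealing schedule: decay epsilon from 0.9 toward a floor of 0.1
epsilon, decay, floor = 0.9, 0.995, 0.1
for episode in range(1000):
    epsilon = max(floor, epsilon * decay)
print(round(epsilon, 2))  # -> 0.1 (late stage: mostly exploitation)
```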
60
What component describes which action is picked in a certain state in reinforcement learning?
policy π | a function that determines the action based on the state ## Footnote Formally: π(s, a) = p(at = a | st = s). It can be deterministic (1 action / state) or stochastic (a probability distribution over actions).
61
What state do the decisions in Markov decision processes depend on?
the present state | This is the defining characteristic of the Markov property. ## Footnote Formally: P(st+1 | st, st−1, …, s0) = P(st+1 | st).
62
What kind of approach is used in temporal difference learning?
a model-free approach
63
What elements does the reward in the Bellman equation consist of?
the immediate and the future expected reward
64
What are the three main fields of study natural language processing is rooted in?
computer science, cognitive science, and linguistics
65
What are the three main subdomains in NLP?
1. **Speech recognition** identifies words in spoken language and includes speech-to-text processing. 2. Natural **language understanding** extracts the meaning of words and sentences, as well as reading comprehension. 3. Natural language **generation** is the ability to generate meaningful sentences and texts.
66
Around 1985, almost 20 years after the NLP winter began, NLP started to attract interest again due to which three developments?
1. Increased computing power 2. Shift of paradigms from complex rule-based systems to statistical tools (e.g. decision trees) 3. Part-of-speech tagging
67
What was the full name of the creator of the Turing test?
Alan Mathison Turing
68
What was the computer program created by Joseph Weizenbaum called, which simulated a conversation with a psychotherapist?
ELIZA
69
What was the chatbot called, which first seemed to pass the Turing test in 2014?
Eugene Goostman
70
What are the most notable application areas of NLP?
* chatbots * named entity recognition * sentiment analysis * text summarization * topic identification * translation
71
In the context of text summarization, what is a common text summarization technique that works in an unsupervised extractive way?
TextRank | It compares every sentence of a given text with all other sentences. ## Footnote done by computing a similarity score for every pair of sentences
72
In the original TextRank paper (Mihalcea & Tarau, 2004), the similarity between two sentences Si and Sj for extractive summarization is computed as a normalized overlap. What does it mean in practice?
We count the shared words and then **divide by the sum of the log-lengths** of both sentences. ## Footnote The logarithmic normalization prevents long sentences from dominating just because they contain more words and thus have a higher chance of overlap with everything.
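A rough sketch of this similarity measure, with the assumption that the overlap is counted over distinct words per sentence (the sample sentences are made up):

```python
import math

def textrank_similarity(s1, s2):
    # Shared words divided by the sum of the log sentence lengths
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2)
    return overlap / (math.log(len(w1)) + math.log(len(w2)))

a = "the cat sat on the mat"
b = "the dog sat on the log"
print(round(textrank_similarity(a, b), 3))  # 3 shared words over two 5-word sets -> 0.932
```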
73
What is the main difference between topic identification and sentiment analysis?
Topic identification focuses on **objective** aspects of the text. In contrast, sentiment analysis centers on **subjective** characteristics like moods and emotions.
74
What are the main challenges of detecting emotions from user-generated content in the context of sentiment analysis?
irony/sarcasm, negation, and multipolarity
75
What are the main categories that entities located in text can be classified into in named entity recognition?
1. CoNLL-2003 scheme: PER — person names; ORG — organizations; LOC — locations; MISC — miscellaneous (nationalities, events, works of art, etc.) 2. MUC-6/7 scheme: all the above + TIME and MONEY/PERCENTAGE/QUANTITY
76
What are major challenges in named entity recognition (NER)?
* ambiguity in entity boundaries and types * domain adaptation * limited annotated data
77
Around when did neural MT become more popular than statistical MT?
2017
78
Give an example for a speech-to-speech translation system.
Skype translator
79
What is done when using pivot MT in the context of NLP?
the source and target languages are bridged using a third language
80
What are the three levels of chatbot intelligence (from lowest to highest)?
1. **Notification** assistants — unidirectional; push only 2. **FAQ** assistants — bidirectional; match user queries against a knowledge base 3. **Contextual** assistants — bidirectional and context-aware (conversation history)
81
In NLP, what is the stage called where raw text is cleaned and normalized (e.g., tokenized, stemmed, lemmatized) before further processing?
text preprocessing
82
What are typical text preprocessing steps in NLP?
1. Tokenization 2. Stop word removal 3. Lemmatization or stemming
83
On what basis is text most commonly tokenized?
spaces
84
How could we solve the typical problem with simple tokenization that punctuation remains attached to the words?
using regex
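A minimal regex-based tokenizer that separates punctuation from words (this pattern is one common choice, not the only one):

```python
import re

def tokenize(text):
    # Match either a run of word characters or a single punctuation mark,
    # so punctuation becomes its own token instead of sticking to words
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello, world!"))  # -> ['Hello', ',', 'world', '!']
```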
85
What are the main advantages of building rule-based systems in NLP?
1. Explainability 2. Flexibility (no need to change application core if rules change) 3. Volume of training data required is relatively small
86
What are the main disadvantages of rule-based systems in NLP?
1. Expert working hours 2. Domain-specificity
87
What is the main disadvantage of statistics-based NLP systems?
a lot of annotated training data are required to produce good results
88
Into what four categories can NLP tasks be divided?
**Syntax** (e.g., tokenization, POS tagging) **Semantics** (e.g., sentiment analysis, NER, topic identification) **Discourse** (e.g., text summarization, topic identification) **Speech** (e.g., speech-to-text, text-to-speech)
89
Give an example for syntactic ambiguity which exemplifies why part-of-speech tagging is tricky?
I saw her duck.
90
Name a couple of fields where semantics are important for correct classification results in NLP.
NER, sentiment analysis
91
What does discourse deal with?
coherent texts that are longer than a single sentence
92
List a few subtasks belonging to the **discourse** domain of NLP.
* analyzing coreference: linking linguistic expressions that refer to the same object or person * examining conversational structure * identifying topic structures
93
What are the 2 main subtasks in the speech tasks domain in NLP?
1. STT 2. TTS
94
What kind of input do algorithms accept in machine learning?
numerical
95
In the context of the Bag-of-Words model, what is the vector length for each sentence if the vocabulary consists of 8 words?
8
96
In the context of the Bag-of-Words model, what are the two main strategies for summarizing vocabulary distribution of a whole text in one vector?
* In a Boolean representation, the vector simply indicates if a word occurs. E.g. [1, 1, 1, 1, 1, 1, 1, 1]. * In a count of words, the resulting vector reflects how often a word occurs. E.g. [2, 1, 2, 1, 2, 2, 2, 1].
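Both strategies can be sketched with a toy vocabulary (the vocabulary and sentence are made-up examples):

```python
def bag_of_words(sentence, vocabulary, boolean=False):
    # One vector position per vocabulary word, in vocabulary order
    counts = [sentence.lower().split().count(w) for w in vocabulary]
    # Boolean representation only records presence; count representation records frequency
    return [1 if c > 0 else 0 for c in counts] if boolean else counts

vocab = ["i", "like", "coffee", "tea"]          # hypothetical vocabulary
sent = "I like coffee and I like tea"
print(bag_of_words(sent, vocab))                # -> [2, 2, 1, 1] (count of words)
print(bag_of_words(sent, vocab, boolean=True))  # -> [1, 1, 1, 1] (Boolean)
```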
97
Name an example for what word similarities can be based on?
cosine similarity
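Cosine similarity between two word vectors can be computed directly (the 2-dimensional toy vectors are for illustration):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1, 0], [1, 0]))  # identical direction -> 1.0
print(cosine_similarity([1, 0], [0, 1]))  # orthogonal -> 0.0
```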
98
In which year did Google Research publish the Word2Vec neural network model that generates word embeddings based on only one hidden layer?
2013
99
What does CBOW abbreviate in the subfield of embedding studies?
**continuous** bag of words
100
How many values are set to 0 in the input vector of the skip-gram prediction model?
all of them except for 1 for which the context has to be reconstructed
101
Which technique is faster, CBOW or the skip-gram architecture?
CBOW
102
What is the general equation for term frequency?
TF(t, d) = (no. of occurrences of t in d) / (no. of words in d)
103
What does document frequency indicate in the context of NLP?
the percentage of documents including a specific term t in relation to the total number of documents D
104
What is the mathematical formula for inverse document frequency?
IDF(t, D) = log(1/DF(t, D))
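The TF, DF, and IDF definitions from the last three cards can be combined in a small sketch (the three-document corpus is made up):

```python
import math

def tf(term, doc):
    # TF(t, d) = occurrences of t in d / number of words in d
    words = doc.lower().split()
    return words.count(term) / len(words)

def idf(term, docs):
    # DF(t, D) = fraction of documents containing t; IDF(t, D) = log(1 / DF(t, D))
    df = sum(term in d.lower().split() for d in docs) / len(docs)
    return math.log(1 / df)

docs = ["the cat sat", "the dog ran", "a cat ran"]  # hypothetical corpus
print(tf("cat", docs[0]))                   # 1/3
print(idf("the", docs))                     # log(1 / (2/3)) = log(1.5)
print(tf("cat", docs[0]) * idf("cat", docs))  # combined TF-IDF score
```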
105
What type of input matrix does the GloVe model (Pennington et al., 2014) operate on?
A **global word–word co-occurrence matrix**, where each entry records how often two words appear together within a context window across the corpus. ## Footnote GloVe learns word vectors by factorizing the log of these co-occurrence counts.
106
What does USE abbreviate in the context of NLP?
Universal Sentence Encoder
107
Please name three methods for word vectorization.
* GloVe * TF-IDF * Word2Vec
108
What does the term n-gram refer to?
a sequence of n consecutive words (bigram, trigram etc.)
109
What does P(w │ ℎ) denote in the context of n-grams?
the probability of a word occurring after a series of words contained in the history of words | P of w occurring after ℎ ## Footnote P(w │ ℎ) = P(beach │ I like to go to the)
110
How do we estimate the probability of a word coming after a recorded sequence of words using frequency counts based on a corpus?
P(w │ ℎ) = C(ℎw)/C(ℎ) ## Footnote P(beach │ I like to go to the) = C(I like to go to the beach)/C(I like to go to the)
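A naive sketch of this frequency-count estimate (substring counting is a simplification, and the corpus is made up):

```python
def next_word_probability(history, word, corpus):
    # P(w | h) = C(hw) / C(h), estimated from raw frequency counts
    # Note: substring counting is a simplification; a real model counts n-gram tokens
    text = " ".join(corpus)
    return text.count(history + " " + word) / text.count(history)

corpus = [  # tiny hypothetical corpus
    "i like to go to the beach",
    "i like to go to the beach in summer",
    "i like to go to the peach orchard",
]
print(next_word_probability("go to the", "beach", corpus))  # C=2 of C=3 -> 2/3
```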
111
Why might n-gram-based word probability scoring be useful in speech applications?
Similar-sounding sentences can cause ambiguity. ## Footnote In the context of "I like to go to the b/peach", although both sentences might sound alike, it is statistically more likely that the last word of the sentence is “beach” rather than “peach".
112
What are disadvantages of using statistical models in NLP?
Words or **sequences** **not** **present** in the training data **are** **assigned** a **p**robability of **0**, reducing the model’s robustness. Furthermore, statistical models **lack the ability to generalize effectively**, especially when compared to neural models, which are better equipped to capture complex linguistic patterns.
113
What does the encoder do in an encoder-decoder architecture?
It converts the input text into a vector, which encapsulates all important information from the input sequence.
114
What does the decoder do in an encoder-decoder architecture?
It takes the information from the encoded vector and converts it back to the original representation.
115
What is the distinguishing mechanism of the transformer models proposed in 2017 by Google?
self-attention
116
Please explain the issue posed by the phenomenon of vanishing gradients in NLP.
The influence of input or earlier layers diminishes as it passes through deep layers, causing the network to struggle in learning long-range dependencies.
117
Name a few pre-trained models based on the transformer architecture.
* BERT (Bidirectional Encoder Representations from Transformers, Devlin et al., 2018) * GPT (Generative Pre-trained Transformer, Radford, 2018) * RoBERTa (a Robustly Optimized BERT Pretraining Approach, Liu et al., 2019) * DistilBERT (a distilled version of BERT, Sanh et al., 2019) * XLNet (Yang et al., 2019)
118
How does a masked language model work?
The masked model takes a sentence from the training set. Next, about 15 percent of the words in that sentence are masked. | I like to [mask1] a cup of coffee with [mask2] in the morning. ## Footnote The model is then trained to predict the missing words in the sentence. The focus of the model is to understand the context of the words. The text data processing is no longer done in a unidirectional way from left to right or right to left.
119
How does next-sentence prediction work in NLP?
The model receives a pair of sentences. The model’s goal is to predict if the first sentence is followed by the second sentence. | The resulting model focuses on how a pair of sentences is related.
120
What decade does NLP date back to in computer science?
the 1950s
121
What is the Weaver’s memorandum in the context of NLP?
In July 1949, the mathematician Warren Weaver circulated a short memorandum titled "Translation" to about 200 colleagues. ## Footnote In it, he proposed that **translating** between natural languages could be treated as a problem of **cryptography** and **information theory** — arguing that a Russian text is really just an English text that has been "encoded" in Russian, and that with the right statistical methods a machine could "decode" it back.
122
How does self-attention address the issue of word ambiguities?
by dynamically weighting the importance of surrounding words when computing each token's representation | qualitative leap over static embedding approaches (Word2Vec, GloVe) ## Footnote It is the reason transformer-based models handle polysemy, coreference, and syntactic ambiguity much better than anything that came before them.
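A minimal NumPy sketch of single-head scaled dot-product self-attention, with random illustrative weights; the row-wise softmax is what dynamically weights the surrounding tokens:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8
X = rng.normal(size=(n_tokens, d_model))            # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # scaled dot-product
    # row-wise softmax: how strongly each token attends to every other token
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V, w

out, weights = self_attention(X, Wq, Wk, Wv)
```

Each output row is a context-dependent mixture of all value vectors, which is why the same word can get different representations in different sentences.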
123
Where was GloVe developed?
at Stanford University
124
What kind of approaches were used in early NLP?
rule-based approaches
125
What aspects of a text does sentiment analysis deal with?
subjective aspects
126
What was the goal of early computer vision?
to mimic human vision
127
What are the 4 major categories computer vision tasks can be separated into?
1. geometry reconstruction 2. image restoration 3. motion analysis 4. recognition
128
What are the 5 main challenges that must be tackled in computer vision?
1. Differentiating similar objects (ball vs. egg) 2. Illumination of an object (red vs. orange) 3. Location 4. Rotation 5. Size and aspect ratios
129
How many more bits are used in a true color vs. a monochrome image per pixel?
24 times more | Monochrome uses a single bit, being 0 or 1. ## Footnote True color uses 24 bits.
130
In RGB, into how many parts are the 24 bits of a true color image separated?
three parts | each 8 bits in length
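The split into three 8-bit channels can be illustrated by packing and unpacking a 24-bit RGB value (a minimal sketch in plain Python, using bit shifts):

```python
def pack_rgb(r, g, b):
    # three channels, 8 bits each, packed into one 24-bit value
    return (r << 16) | (g << 8) | b

def unpack_rgb(v):
    return (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF

magenta = pack_rgb(255, 0, 255)   # red and blue at full intensity
```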
131
What color do we get if we set all 3 values in a true color RGB image to 0?
black | (additive mixing)
132
What color does (255, 0, 255) yield in additive RGB true colour?
magenta
133
What is the range of color values used in CMYK?
from 0 to 1
134
When we are building coloured images from single pixels, what function do we need?
a mapping from 2D coordinates (x, y) to specific color values
135
What are the 3 commonly used padding techniques?
1. **constant** padding 2. **replication** padding 3. **reflection** padding
136
What is the output if we apply reflection padding to fill the N cells of the 3x3 matrix [41 24 N; 80 4 N; N N N]?
41 24 41 80 4 80 41 24 41
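All three padding techniques from card 135 can be reproduced with NumPy's `np.pad`, filling the unknown cells on the right column and bottom row (illustrative 2x2 input):

```python
import numpy as np

a = np.array([[41, 24],
              [80,  4]])

# pad one column on the right and one row at the bottom
constant    = np.pad(a, ((0, 1), (0, 1)), mode="constant")  # fill with 0
replication = np.pad(a, ((0, 1), (0, 1)), mode="edge")      # repeat the border
reflection  = np.pad(a, ((0, 1), (0, 1)), mode="reflect")   # mirror, edge not repeated
```

Note that `reflect` mirrors about the border pixel without repeating it, which reproduces the flashcard's answer exactly.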
137
What are the 4 types of distortions in the group of radial distortions?
1. **Barrel** (positive radial) distortion 2. **Pincushion** (negative radial) distortion 3. **Complex** (mustache radial) distortion 4. **Fisheye** radial distortion
138
What is the cause of the so-called tangential distortion observed in digital images?
misalignment of the image sensor unit and the camera lens
139
What do we estimate in the process of camera calibration in the context of computer vision?
the extrinsic and intrinsic parameters of a camera
140
Name 3 intrinsic characteristics of a camera.
* focal length * lens distortion parameters * optical center
141
In the context of the pinhole camera, what coordinate systems are used?
1. the 3D real-world coordinate system 2. the 3D coordinate system of the camera 3. the 2D coordinate system of the projected image
142
What are the two steps of the projection process using a pinhole camera?
1. Transform the coordinates from the 3D world to the 3D camera coordinates. 2. Transform the 3D camera coordinates to the 2D image coordinates.
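The two projection steps can be sketched with NumPy; all values, including the rotation, translation, focal length, and optical center, are illustrative assumptions:

```python
import numpy as np

# Step 1: 3D world -> 3D camera coordinates via the extrinsics (R, t)
R = np.eye(3)                      # no rotation (illustrative)
t = np.array([0.0, 0.0, 5.0])      # translation along the optical axis
X_world = np.array([1.0, 2.0, 10.0])
X_cam = R @ X_world + t            # -> [1, 2, 15]

# Step 2: 3D camera -> 2D image coordinates via the intrinsic matrix K
f = 800.0                          # focal length in pixels (illustrative)
cx, cy = 320.0, 240.0              # optical center / principal point
K = np.array([[f,   0.0, cx],
              [0.0, f,   cy],
              [0.0, 0.0, 1.0]])
x = K @ X_cam
u, v = x[0] / x[2], x[1] / x[2]    # perspective divide -> pixel coordinates
```

The extrinsics handle step 1, the intrinsics handle step 2; the final division by the depth component is what makes the projection perspective rather than orthographic.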
143
How does the camera calibration process work when we use a checkerboard?
1. We select at least 2 sample images. 2. We identify distinctive points in each image (corners). 3. We localize the distinctive points in the 3D real world as well as in the 2D image, calculating the camera matrix and the distortion coefficients.
144
What are the 3 main types of features in digital images in the context of computer vision?
blobs, corners, edges
145
What are the 3 main steps of feature engineering in the context of computer vision?
1. Feature detection 2. Feature description/extraction 3. Feature matching
146
What are the 2 techniques commonly used for edge detection?
1. **Canny** edge detector 2. **Sobel** filter
147
When using Sobel filters for edge detection, how many special kernel matrices are used?
2 | one for each of the axes
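The two kernels can be written out and applied to a single patch containing a vertical edge (a minimal NumPy sketch with illustrative values):

```python
import numpy as np

# the two Sobel kernels, one per image axis
Gx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])
Gy = Gx.T

# a 3x3 patch with a vertical edge: dark on the left, bright on the right
patch = np.array([[0, 0, 9],
                  [0, 0, 9],
                  [0, 0, 9]])

gx = int(np.sum(Gx * patch))   # strong response across the vertical edge
gy = int(np.sum(Gy * patch))   # no response along the edge direction
```

In practice the two responses are combined into a gradient magnitude, e.g. `np.hypot(gx, gy)`.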
148
Please give the three steps of feature engineering.
1. Feature detection 2. Feature description 3. Feature matching
149
What type of deep learning architecture first enabled end-to-end semantic segmentation as a dense pixel-wise prediction task?
Fully Convolutional Networks, which replaced the fully connected classification layers of standard CNNs with convolutional layers, enabling output at the same spatial resolution as the input. ## Footnote Modern approaches extend this with encoder–decoder designs (U-Net, DeepLab) and transformer-based architectures (SegFormer, Mask2Former).
150
Please define semantic segmentation in the context of computer vision.
In semantic segmentation, parts of an image belonging to the same **object class** are **clustered together**. ## Footnote The algorithm receives an image with several objects as input and returns an image where the pixels are labeled by their semantic category.
151
What are typical methods used in image generation?
1. Diffusion models 2. GANs 3. VAEs
152
What does VAE abbreviate in the context of image generation?
variational autoencoder
153
What do diffusion models use as the first step in image generation?
random noise
154
What does CLIP abbreviate in the context of image synthesis?
Contrastive Language-Image Pre-training
155
What is the issue known as mode collapse in the context of generative adversarial networks?
the generator produces only a limited variety of images
156
What techniques are experimented with when the goal is to develop models that require less data and fewer computational resources in image generation?
few-shot learning, transfer learning, use of pre-trained models
157
When was the subfield of computer vision born?
in the 1960s | Larry Roberts' 1963 PhD thesis (extracting 3D geometric information from 2D photos) ## Footnote the 1966 MIT Summer Vision Project, where Minsky and Papert famously proposed "solving" computer vision as an undergraduate summer assignment — a scope estimate that turned out to be optimistic by about six decades
158
What is the building block that we use for feature matching?
feature vectors
159
What are the 2 main components of GANs in computer vision?
Generator - Generates synthetic data that should look as realistic as possible. Discriminator - Distinguishes between real and generated data, and provides feedback to improve the generator.
160
Please identify what type of architecture VAEs are based on.
encoder-decoder
161
How do diffusion models generate images?
By learning to reverse a gradual noising process. | In training, noise is added; the model then learns to denoise. ## Footnote At generation time, it starts from pure noise and iteratively refines it into a coherent image.
162
What does a 3x3 kernel do if it has a single 1 at the center-right position (row 1, col 2, zero-indexed) and zeros everywhere else?
it shifts the image one pixel to the left | translation/shift kernel ## Footnote Each output pixel copies the value of its right neighbor from the input.
163
What does a 3×3 box blur kernel (all entries = 1, scaled by 1/9) do to an image?
Each output pixel becomes the arithmetic mean of the 3×3 neighborhood around it (9 pixels, each weighted equally at 1/9). | This smooths the image by averaging out local variations. ## Footnote reducing noise but also softening edges
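Both kernels from cards 162 and 163 can be verified with a small pure-NumPy cross-correlation; `correlate2d` here is a hypothetical helper written only for illustration:

```python
import numpy as np

def correlate2d(img, kernel):
    # valid-mode cross-correlation (no kernel flip), pure NumPy
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)

shift = np.zeros((3, 3))
shift[1, 2] = 1.0                  # single 1 at center-right: shifts image left
blur = np.ones((3, 3)) / 9.0       # box blur: mean of the 3x3 neighborhood
```

Applying `shift` makes each output pixel equal its right neighbor in the input, and applying `blur` makes it the mean of the surrounding 3x3 window.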
164
For camera calibration, it is important to know external and internal parameters. What are the main extrinsic and intrinsic parameters?
**Extrinsic parameters:** * Rotation — the orientation of the camera relative to the world coordinate system * Translation — the position of the camera in the world coordinate system | **Intrinsic parameters:** * Focal length — the distance between the lens and the image sensor * Optical center — the point where the optical axis meets the image plane * Lens distortion coefficients — parameters describing radial and tangential distortion (as modeled e.g. by the Brown–Conrady model)
165
Think about possible use cases for semantic image segmentation. Where could it be used?
* autonomous driving (detecting vehicles, lanes, pedestrians) * geosensing (land use classification from satellite imagery) * medical imaging (e.g. tumor detection in brain scans) * pose estimation / motion capture (segmenting body parts)
166
How does video denoising relate to image restoration?
Video denoising extends image restoration techniques to the temporal domain. | The broader umbrella term is video restoration. ## Footnote Classical image restoration methods (spatial filtering, transform-domain thresholding) can be applied frame by frame, but dedicated video denoising also exploits inter-frame redundancy and motion compensation for better results.
167
Is it true that in computer vision we assume that a camera image is a radial projection of a scene?
No. | a linear projection of the real-world scene is assumed ## Footnote Straight lines in the real world are expected to appear as straight lines in the image. Deviations from this assumption are what camera calibration and distortion correction aim to fix.
168
How many bits does the monochrome representation of images require?
1 bit / pixel
169
Which architecture is primarily responsible for facilitating the refinement of images from noise in diffusion models?
U-Net architecture | developed by Ronneberger et al., 2015, for biomedical image segmentation ## Footnote a symmetric encoder-decoder network with skip connections that turned out to be exceptionally well-suited for the iterative denoising process in diffusion models
170
What is a development set also called?
validation set ## Footnote used to evaluate and further optimize the model's performance before the final, one-time evaluation on the test set
171
Which AI model type allows processing of language without the sequential limits of traditional neural networks?
bidirectional encoder representations from transformers (BERT) | built on the transformer architecture (Vaswani et al., 2017) ## Footnote replacing the recurrent connections of RNNs/LSTMs with a self-attention mechanism, which relates all positions in a sequence simultaneously rather than processing tokens one by one
172
Sam is designing a new generative model that aims to reduce the computational demands associated with traditional models. Choose the approach best suited to achieve this goal.
utilizing pre-trained models and transfer learning techniques | to minimize resource usage ## Footnote Instead of training from scratch, Sam should leverage a model already trained on vast datasets and fine-tune it for the specific task, drastically reducing the time, data, and compute needed.
173
Briefly explain the Internet of Things.
IoT connects physical and virtual devices using information and communication technologies.
174
Name four examples of how artificial intelligence can be used in high tech and telecommunications.
- Ensuring that networks are healthy and secure - Optimizing and automating networks - Predicting network anomalies - **Predictive maintenance**: fix network issues before they occur