AI Flashcards

Learning about AI (174 cards)

1
Q

Where was the 1956 conference held that saw the first public mention of the term artificial intelligence?

A

Dartmouth

2
Q

Who came up with the term artificial intelligence during his preparations for the 1956 conference in Dartmouth?

A

John McCarthy

3
Q

What does DARPA abbreviate?

A

Defense Advanced Research Projects Agency

4
Q

What was the programming language Prolog originally designed for?

A

natural language processing and AI applications, particularly computational linguistics

It works by expressing problems as facts and rules in first-order logic (Horn clauses), then uses unification and backtracking to perform automated reasoning — effectively proving whether a query follows from the given knowledge base.

5
Q

Please name the key disciplines that contributed to the development of AI.

A
  • decision theory
  • game theory
  • natural language processing
  • neuroscience
6
Q

Name a big challenge that the first forays into the realm of automatic translation faced in the 1960s.

A

word ambiguity

7
Q

What are the 3 prerequisites for AI to be successful?

A
  • Availability of data
  • Computational power
  • Mature algorithms
8
Q

Please define AI winter.

A

periods when interest, research activities, and funding of AI projects significantly decrease

9
Q

What broader category of AI systems do expert systems belong to?

A

Knowledge-based systems.

Expert systems are a specialized subset that use a knowledge base of domain-specific facts and rules combined with an inference engine to emulate the decision-making of a human expert.

10
Q

If a problem to be solved can be categorized as a decision problem, how might the knowledge be typically represented in an expert system?

A

as a decision tree

11
Q

What was Dendral, one of the first expert systems put into practice, created to achieve?

A

identify organic molecules

12
Q

What might have caused the downfall of expert systems?

A
  1. computational complexity of inference growing faster than linearly with the number of rules
  2. difficulty of proving consistency as the base grows
  3. inability of rule-based systems to update their own rules and learn from experience
13
Q

What are typical examples of possible rules used to flag potentially fraudulent transactions in credit institutions?

A
  • deviation from normal customer behaviour
  • geographical anomalies
  • rapid succession of multiple small transactions
  • unusual transaction time
  • unusually high amount
14
Q

What is the purpose of an expert system?

A

It encodes the knowledge and reasoning of domain experts into a rule-based system, enabling non-expert users to reach expert-level conclusions and decisions.

knowledge base (facts and rules from experts) ► inference engine (applies the rules) ► user interface (faces the non-expert)

15
Q

Who developed the perceptron in 1957?

A

Frank Rosenblatt

American psychologist and computer scientist

at the Cornell Aeronautical Laboratory

16
Q

What is the basic operating principle of the perceptron?

A

Each input is multiplied by its corresponding weight, the weighted inputs and a bias term are summed, and the result is passed through a step function (activation function) that produces a binary output — 1 if the sum exceeds a threshold, 0 otherwise.

17
Q

What are the typical components of a perceptron?

A
  1. input: x1, x2, x3
  2. weights: w1, w2, w3
  3. transfer function: Σ
  4. activation function
  5. output: y
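The components listed above can be sketched in Python (the AND-gate weights and bias are illustrative assumptions, not part of the card):

```python
def perceptron(inputs, weights, bias):
    # Transfer function: weighted sum of inputs plus the bias term
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function: step function producing a binary output
    return 1 if s > 0 else 0

# Example: with these (assumed) parameters the unit acts as a logical AND gate
weights, bias = [1.0, 1.0], -1.5
print(perceptron([1, 1], weights, bias))  # -> 1
print(perceptron([1, 0], weights, bias))  # -> 0
```

With weights [1.0, 1.0] and bias −1.5, the weighted sum exceeds 0 only when both inputs are 1.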
18
Q

When was the backpropagation algorithm introduced by Rumelhart et al.?

A

1986

19
Q

What does the backpropagation algorithm allow the neural network to do?

A

learn from errors

by gradually adjusting the connection weights to improve predictions

20
Q

What does the number of neurons in the input layer correspond to in ANNs?

A

the no. of features in the input data

21
Q

What does the number of neurons in the output layer correspond to in ANNs?

A

the no. of output classes or output variables

22
Q

What are the main architectural characteristics of feed-forward neural networks?

A
  1. the outputs of neurons are only connected to the next layer
  2. the information only flows in one direction
23
Q

What does the term GAN abbreviate in the context of the history of generative AI?

A

generative adversarial networks

24
Q

What does GPT abbreviate in the context of AI?

A

generative pre-trained transformers

25
What is Chomsky's main concept called, which states that people generate an unlimited no. of sentences based on a set of rules in their minds?
transformational-generative grammar
26
What does AlphaZero base its learning on?
intensive self-play following a set of rules
27
What is the main assumption of quantum mechanics?
Physical systems can be characterized using a wave function describing the probabilities of the system being in a particular state.
28
What are the 5 phases of the Gartner hype curve?
1. discovery - innovation trigger 2. exaggerated/inflated expectations 3. trough of disillusionment 4. slope of enlightenment 5. plateau of productivity
29
What does the Gartner term AI TRiSM stand for?
Trust, Risk and Security Management | (for AI systems) ## Footnote scoped to AI governance — not general enterprise risk
30
Give an example for a technology within AI that reached the phase of the plateau of productivity.
computer vision
31
Outline what the knowledge base for an expert system to detect COVID-19 could look like.
1. Symptom Ontology 2. Epidemiological & Risk-Factor Rules 3. Diagnostic Test Interpretation Module 4. Temporal Reasoning Engine 5. Differential Diagnosis & Exclusion Rules 6. Variant-Specific Modifiers 7. Inference Strategy
32
Where should we put deep learning on the hype cycle curve?
Plateau of Productivity ## Footnote It powers production systems in search, recommendation, NLP, computer vision, fraud detection, and medical imaging at scale.
33
Where should we put smart robots on the hype cycle curve?
Peak of Inflated Expectations | Demos are impressive. ## Footnote Reliable real-world deployment outside structured environments remains extremely limited.
34
Where should we put autonomous vehicles on the hype cycle curve?
Trough of Disillusionment | The industry is quietly grinding through hard engineering. ## Footnote After peak hype around 2017–2019 ("fully self-driving by 2020"), reality hit hard: Argo AI shut down, Cruise suspended operations after incidents, Uber sold its AV unit.
35
How many AI systems have already reached the plateau of productivity at the Gartner hype cycle curve?
only computer vision – per the course book | Knowledge graphs were also identified as hitting the plateau by Gartner. ## Footnote And from the 2023 AI Hype Cycle, cloud AI services (many of which are low-code) were recognized as approaching the Plateau of Productivity, with potential to transform industries from manufacturing to finance.
36
From which disciplines did American mathematician Marvin Minsky combine knowledge to contribute to AI?
computer science and cognitive science
37
How do we calculate the F score?
(2 * precision * recall) / (precision + recall)
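The formula can be evaluated directly as a quick sanity check (the precision and recall values are made up):

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall
    return (2 * precision * recall) / (precision + recall)

print(f1_score(0.5, 0.5))  # -> 0.5 (equal precision and recall)
print(f1_score(0.8, 0.5))  # -> ~0.615 (pulled toward the lower value)
```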
38
Describe at least 5 characteristics that a humanoid robot equipped with an AGI would have to possess.
1. **General-Purpose Reasoning & Transfer Learning**. It must solve novel problems across domains it was never explicitly trained on — not just pattern-match within a narrow task. 2. **Embodied Perception & Sensorimotor Integration**. It needs to fuse multimodal sensory input (vision, proprioception, tactile feedback, auditory signals) into a coherent world model in real time. 3. **Natural Language Understanding** & Social Cognition. Functioning among humans requires pragmatic communication: understanding sarcasm, implicature, cultural context, and emotional subtext — not just syntax. Beyond language, it needs a theory of mind — the capacity to model others' beliefs, intentions, and knowledge states to cooperate, negotiate, or assist effectively. 4. **Autonomous Goal Setting & Planning Under Uncertainty**. True AGI implies the robot can formulate its own sub-goals, prioritize competing objectives, and plan over long time horizons while dealing with incomplete information. This goes beyond reactive behavior into deliberate, hierarchical planning with contingency handling (e.g., "the store is closed → re-plan the recipe with available ingredients"). 5. **Continuous Learning & Self-Correction.** It must learn incrementally from experience after deployment — adapting to a specific household, new tools, or changing user preferences — without catastrophic forgetting of previously acquired skills. This includes metacognition: recognizing when it doesn't know something, when to ask for help, and when its own model of the world is wrong. 6. **Ethical Reasoning & Value Alignment.** Operating autonomously in human environments demands the ability to weigh competing moral considerations, respect social norms, and degrade gracefully when facing dilemmas (e.g., prioritizing human safety over task completion). It must internalize values rather than merely following a rule list. 7. **Fine-Grained Dexterous Manipulation**. 
Humanoid form implies human-like manipulation tasks: threading a needle, cracking an egg, handling fragile objects.
39
Reflect on how AI could contribute to reducing the workload of the HR department of a company.
**Recruitment & Screening** NLP-based systems can parse resumes, rank candidates against job requirements, and flag top matches. **Employee Onboarding** An automation layer with an LLM-based assistant can guide new hires through the process, answer FAQ, and escalate only genuine edge cases to a human. **Performance Review Support** AI can draft initial review summaries from collected peer feedback, project metrics, and self-assessments — giving managers a structured starting point rather than a blank page. **Internal Mobility & Skill Mapping** By analyzing employee profiles, completed trainings, and project histories, AI can recommend internal candidates for open roles or suggest upskilling paths — something HR teams rarely have bandwidth to do systematically. **Attrition Prediction** ML models trained on historical data (tenure, engagement survey scores, promotion velocity, team changes) can flag flight-risk employees, giving HR time to intervene proactively.
40
What are the 3 main types of machine learning by the learning paradigm (how the model learns from data)?
1. Reinforcement learning 2. Supervised learning 3. Unsupervised learning
41
What are the 5 most important terms in reinforcement learning?
1. Action (A) – all possible actions the agent can perform 2. Environment (E) – the scenario the agent must explore 3. States (S) – all possible states in the given environment 4. Reward (R) – feedback from the environment to reward an action 5. Policy (π) – long-term value of current state S using policy π
42
How do we formalize the process of receiving a reward from a state–action pair in RL?
f(st, at) = rt+1 ## Footnote the agent is in state st, performs action at, and the environment returns the scalar reward rt+1
43
For a sequence of discrete time steps t = 0, 1, 2, … starting at the state s0 ∈ S, what sequence will the agent–environment interaction lead to?
s0, a0, r1, s1, a1, r2, s2, a2, r3, s3, …
44
Please explain how the environment reacts to the action of an agent in reinforcement learning.
by returning a pair of state and reward
45
What does MDP generally abbreviate in the context of reinforcement learning?
Markov decision process
46
When do we say that a state *s* holds the **Markov property** in reinforcement learning?
When the state contains all relevant information from the history, so that the future depends only on the present state.
47
In the context of the Markov decision process, what is the formula for the policy π which describes the chosen action in a certain state?
π(s, a) = p(at = a | st = s)
48
What do value functions estimate in the context of reinforcement learning?
how good it is for an agent to be in a state and perform a specific action in that given state
49
What is γ in the following value function equation? ## Footnote **Vπ(s) = Eπ[rt+1 + γ rt+2 + … + γ^(T−1) rT | st = s] = Eπ[∑k≥0 γ^k rt+k+1 | st = s]**
the discount factor in [0, 1] that controls how much the agent cares about future vs. immediate rewards | security of the expected return ## Footnote A γ close to 1 expresses high confidence that future rewards are "secure," while a γ close to 0 says "I only trust what's right in front of me."
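The role of γ can be illustrated with a short sketch of the discounted return (the reward sequence is a made-up example):

```python
def discounted_return(rewards, gamma):
    # G_t = sum over k of gamma^k * r_{t+k+1}
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

rewards = [1, 1, 1]                     # hypothetical rewards r_{t+1}, r_{t+2}, r_{t+3}
print(discounted_return(rewards, 0.9))  # -> ~2.71: future rewards count almost fully
print(discounted_return(rewards, 0.0))  # -> 1.0: only the immediate reward matters
```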
50
In a Markov decision process, what aims to maximize the reward?
agent
51
In model-based reinforcement learning, what does an agent try to understand?
the model of the environment
52
Temporal difference learning is a model-free approach, so no model of the learning environment is required. What is the learning based on instead?
Learning happens directly from the experience in a system that is partially unknown. | the most prominent illustration of this principle is weather forecasting ## Footnote TD learning makes predictions based on correlation between subsequent predictions.
53
When we initialize the Q-learning algorithm, all values in the Q table are set to zero. What are the iterative steps of the following phase?
1. Choose an action for the current state 2. Perform the chosen action 3. Evaluate the outcome and update the Q matrix
54
What is the difference between exploration and exploitation as a strategy when choosing an action for the current state in the Q-learning algorithm?
Exploration: choosing an action randomly, maybe because the Q matrix is still full of 0s. Exploration is necessary to discover new promising strategies. Exploitation: choosing an action based on already acquired knowledge. Exploitation ensures that already proven methods are used optimally.
55
What is the transition function (state-transition probability) in a Markov decision process?
T(s, a, s′) = P(st+1 = s′ | st = s, at = a) | the P of arriving in *s′* given that the agent is in *s* and does *a* ## Footnote The function satisfies ∑s′∈S T(s, a, s′) = 1 for all *s*, *a* — i.e., it's a proper probability distribution over next states.
56
What is the formula expressing the Q-learning update rule, which makes the agent learn?
**Qt(s, a) = Qt−1(s, a) + α TDt(s, a)** | the new Q equals the old Q + a small correction. α (the learning rate) ## Footnote TDt is the temporal difference error, which is essentially "how surprised was the agent" — the gap between what it expected and what it actually got. So the whole equation says: update your beliefs by a small fraction of your surprise.
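A minimal sketch of this update rule (the toy Q table, states, and hyperparameters are illustrative assumptions):

```python
def q_update(q, state, action, reward, next_state, alpha, gamma):
    # TD error: reward plus discounted best future value, minus the current estimate
    td = reward + gamma * max(q[next_state]) - q[state][action]
    # New Q = old Q + learning rate * TD error ("update beliefs by a fraction of the surprise")
    q[state][action] += alpha * td
    return q[state][action]

# Hypothetical 2-state, 2-action Q table initialized to zero
q = [[0.0, 0.0], [0.0, 0.0]]
print(q_update(q, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9))  # -> 0.5
```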
57
Imagine you have a gardening bot to water your plants. How could the bot apply reinforcement learning to learn how to perfectly water the plants?
States (s): Soil moisture level, time of day, temperature, humidity, plant type, days since last watering, season. Actions (a): What the bot can do — water 50ml, water 100ml, water 200ml, or do nothing. Rewards (r): Positive rewards for healthy soil moisture staying in the optimal range and negative rewards for overwatering (root rot, waterlogging) or underwatering (wilting, dry soil). Transitions (s → s′): After the bot takes action a in state s, the environment moves to s′. The learning loop then uses the Q-learning update: Qt(s, a) = Qt−1(s, a) + α TDt(s, a)
58
You have a computer that learns to play chess by playing against a random opponent. How would the learning process change if the computer played against another computer using the same algorithm?
Against a random opponent, the learning process has a low ceiling. The computer quickly discovers that simple tactics work and then its improvement stalls, because the opponent never punishes sophisticated mistakes. The reward signal becomes uninformative: it keeps winning with r = +1, the TD error shrinks to near zero, and Q-values stop updating meaningfully. The agent converges on a policy that's "good enough to beat chaos" but brittle against any robust strategy. Against a copy of itself, everything changes in several key ways. 1. The opponent co-evolves. Every time the agent finds a better strategy, its opponent instantly has that same improvement. There's no plateau — the bar keeps rising. This is sometimes called an arms race dynamic. 2. The reward signal stays rich. Against a random player, 95% of games end in a win, so r ≈ +1 almost always and the gradient vanishes. In self-play, the win rate hovers around 50%, which means the TD error remains large and informative throughout training. The agent is perpetually surprised, perpetually learning. 3. Exploration happens naturally. A random opponent already provides "random" situations, but they're meaninglessly random — nonsensical positions that won't arise in real play. A self-play opponent creates meaningfully challenging positions that force the agent to discover deep tactics like positional sacrifices, and long-term pawn structure planning. 4. The risk is strategy collapse — both copies might lock into a narrow cycle of mutual exploitation (A beats B beats C beats A) and never discover broader strategies. AlphaZero addressed this with Monte Carlo Tree Search (MCTS) for structured exploration alongside the neural network's policy and value heads. | DeepMind did with AlphaZero - leap from basic RL to self-play ## Footnote In terms of our Q update: Qt(s, a) = Qt−1(s, a) + α TDt(s, a) Against a random opponent, TDt shrinks fast and learning flatlines. 
In self-play, TDt stays substantial because the opponent keeps getting smarter, so the Q-values keep refining toward genuinely optimal play — which is how AlphaZero surpassed centuries of human chess knowledge in roughly four hours of training against itself.
59
Applying Q learning in chess learning, exploration and exploitation play an important role. What is more important at what stage of the learning process and why?
Early stage → exploration dominates Middle stage → gradual shift Late stage → exploitation dominates ## Footnote At the start, the Q-table is essentially blank. The agent has no basis for making good decisions, so exploiting Q-values would mean following noise. High ε (0.9–1.0) forces the agent to try random actions across many states, which does two critical things: it populates the Q-table with actual experience, and it prevents premature convergence on a bad policy. Think of the gardening bot on day one — it has no idea, so it should try everything and observe what happens. As TD errors accumulate and Q-values start reflecting reality, the agent can increasingly trust its own estimates. We anneal ε downward (e.g., from 0.9 toward 0.1), so the agent mostly follows its best-known actions but still occasionally tries something unexpected. This is where the bulk of the real learning happens — the agent is refining its policy, not building it from scratch. Once Q-values have converged (TD errors are consistently small), the agent should mostly exploit its learned policy. Low ε (0.01–0.05) keeps a tiny door open for discovering rare improvements, but the agent is now confident in its strategy. A well-trained agent shouldn't randomly sacrifice its queen just to "explore" — it should play its best move almost every time.
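The staged ε-greedy strategy described above can be sketched as follows (the Q-values and the decay schedule are illustrative assumptions):

```python
import random

def epsilon_greedy(q_values, epsilon):
    # Explore with probability epsilon, otherwise exploit the best-known action
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # exploration
    return max(range(len(q_values)), key=q_values.__getitem__)       # exploitation

# Hypothetical annealing schedule: decay epsilon from 0.9 toward a floor of 0.1
epsilon, decay, floor = 0.9, 0.995, 0.1
for episode in range(1000):
    epsilon = max(floor, epsilon * decay)
print(round(epsilon, 2))  # -> 0.1 (late stage: mostly exploitation)
```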
60
What component describes which action is picked in a certain state in reinforcement learning?
policy π | a function that determines the action based on the state ## Footnote Formally: π(s, a) = p(at = a | st = s). It can be deterministic (1 action / state) or stochastic (a probability distribution over actions).
61
What state do the decisions in Markov decision processes depend on?
the present state | This is the defining characteristic of the Markov property. ## Footnote Formally: P(st+1 | st, st−1, …, s0) = P(st+1 | st).
62
What kind of approach is used in temporal difference learning?
a model-free approach
63
What elements does the reward in the Bellman equation consist of?
the immediate and the future expected reward
64
What are the three main fields of study natural language processing is rooted in?
computer science, cognitive science, and linguistics
65
What are the three main subdomains in NLP?
1. **Speech recognition** identifies words in spoken language and includes speech-to-text processing. 2. Natural **language understanding** extracts the meaning of words and sentences, as well as reading comprehension. 3. Natural language **generation** is the ability to generate meaningful sentences and texts.
66
Around 1985, almost 20 years after the NLP winter began, NLP started to attract interest again due to which three developments?
1. Increased computing power 2. Shift of paradigms from complex rule-based systems to statistical tools (e.g. decision trees) 3. Part-of-speech tagging
67
What was the full name of the creator of the Turing test?
Alan Mathison Turing
68
What was the computer program created by Joseph Weizenbaum called, which simulated a conversation with a psychotherapist?
ELIZA
69
What was the chatbot called, which first seemed to pass the Turing test in 2014?
Eugene Goostman
70
What are the most notable application areas of NLP?
* chatbots * named entity recognition * sentiment analysis * text summarization * topic identification * translation
71
In the context of text summarization, what is a common text summarization technique that works in an unsupervised extractive way?
TextRank | It compares every sentence of a given text with all other sentences. ## Footnote done by computing a similarity score for every pair of sentences
72
In the original TextRank paper (Mihalcea & Tarau, 2004), the similarity between two sentences Si and Sj for extractive summarization is computed as a normalized overlap. What does it mean in practice?
We count the shared words and then **divide by the sum of the log-lengths** of both sentences. ## Footnote The logarithmic normalization prevents long sentences from dominating just because they contain more words and thus have a higher chance of overlap with everything.
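A rough sketch of this similarity measure, with the assumption that the overlap is counted over distinct words per sentence (the sample sentences are made up):

```python
import math

def textrank_similarity(s1, s2):
    # Shared words divided by the sum of the log sentence lengths
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2)
    return overlap / (math.log(len(w1)) + math.log(len(w2)))

a = "the cat sat on the mat"
b = "the dog sat on the log"
print(round(textrank_similarity(a, b), 3))  # 3 shared words over two 5-word sets -> 0.932
```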
73
What is the main difference between topic identification and sentiment analysis?
Topic identification focuses on **objective** aspects of the text. In contrast, sentiment analysis centers on **subjective** characteristics like moods and emotions.
74
What are the main challenges of detecting emotions from user-generated content in the context of sentiment analysis?
irony/sarcasm, negation, and multipolarity
75
What are the main categories that entities located in text can be classified into in named entity recognition?
1. CoNLL-2003 scheme: PER — person names; ORG — organizations; LOC — locations; MISC — miscellaneous (nationalities, events, works of art, etc.) 2. MUC-6/7 scheme: all the above + TIME and MONEY/PERCENTAGE/QUANTITY
76
What are major challenges in named entity recognition (NER)?
* ambiguity in entity boundaries and types * domain adaptation * limited annotated data
77
Around when did neural MT become more popular than statistical MT?
2017
78
Give an example for a speech-to-speech translation system.
Skype translator
79
What is done when using pivot MT in the context of NLP?
the source and target languages are bridged using a third language
80
What are the three levels of chatbot intelligence (from lowest to highest)?
1. **Notification** assistants — unidirectional; push only 2. **FAQ** assistants — bidirectional; match user queries against a knowledge base 3. **Contextual** assistants — bidirectional and context-aware (conversation history)
81
In NLP, what is the stage called where raw text is cleaned and normalized (e.g., tokenized, stemmed, lemmatized) before further processing?
text preprocessing
82
What are typical text preprocessing steps in NLP?
1. Tokenization 2. Stop word removal 3. Lemmatization or stemming
83
On what basis is text most commonly tokenized?
spaces
84
How could we solve the typical problem with simple tokenization that punctuation remains attached to the words?
using regex
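A minimal regex-based tokenizer that separates punctuation from words (this pattern is one common choice, not the only one):

```python
import re

def tokenize(text):
    # Match either a run of word characters or a single punctuation mark,
    # so punctuation becomes its own token instead of sticking to words
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello, world!"))  # -> ['Hello', ',', 'world', '!']
```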
85
What are the main advantages of building rule-based systems in NLP?
1. Explainability 2. Flexibility (no need to change application core if rules change) 3. Volume of training data required is relatively small
86
What are the main disadvantages of rule-based systems in NLP?
1. Expert working hours 2. Domain-specificity
87
What is the main disadvantage of statistics-based NLP systems?
a lot of annotated training data are required to produce good results
88
Into what four categories can NLP tasks be divided?
**Syntax** (e.g., tokenization, POS tagging) **Semantics** (e.g., sentiment analysis, NER, topic identification) **Discourse** (e.g., text summarization, topic identification) **Speech** (e.g., speech-to-text, text-to-speech)
89
Give an example for syntactic ambiguity which exemplifies why part-of-speech tagging is tricky?
I saw her duck.
90
Name a couple of fields where semantics are important for correct classification results in NLP.
NER, sentiment analysis
91
What does discourse deal with?
coherent texts that are longer than a single sentence
92
List a few subtasks belonging to the **discourse** domain of NLP.
* analyzing coreference: linking linguistic expressions that refer to the same object or person * examining conversational structure * identifying topic structures
93
What are the 2 main subtasks in the speech tasks domain in NLP?
1. STT 2. TTS
94
What kind of input do algorithms accept in machine learning?
numerical
95
In the context of the Bag-of-Words model, what is the vector length for each sentence if the vocabulary consists of 8 words?
8
96
In the context of the Bag-of-Words model, what are the two main strategies for summarizing vocabulary distribution of a whole text in one vector?
* In a Boolean representation, the vector simply indicates if a word occurs. E.g. [1, 1, 1, 1, 1, 1, 1, 1]. * In a count of words, the resulting vector reflects how often a word occurs. E.g. [2, 1, 2, 1, 2, 2, 2, 1].
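Both strategies can be sketched with a toy vocabulary (the vocabulary and sentence are made-up examples):

```python
def bag_of_words(sentence, vocabulary, boolean=False):
    # One vector position per vocabulary word, in vocabulary order
    counts = [sentence.lower().split().count(w) for w in vocabulary]
    # Boolean representation only records presence; count representation records frequency
    return [1 if c > 0 else 0 for c in counts] if boolean else counts

vocab = ["i", "like", "coffee", "tea"]          # hypothetical vocabulary
sent = "I like coffee and I like tea"
print(bag_of_words(sent, vocab))                # -> [2, 2, 1, 1] (count of words)
print(bag_of_words(sent, vocab, boolean=True))  # -> [1, 1, 1, 1] (Boolean)
```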
97
Name an example for what word similarities can be based on?
cosine similarity
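Cosine similarity between two word vectors can be computed directly (the 2-dimensional toy vectors are for illustration):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1, 0], [1, 0]))  # identical direction -> 1.0
print(cosine_similarity([1, 0], [0, 1]))  # orthogonal -> 0.0
```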
98
In which year did Google Research publish the Word2Vec neural network model that generates word embeddings based on only one hidden layer?
2013
99
What does CBOW abbreviate in the subfield of embedding studies?
**continuous** bag of words
100
How many values are set to 0 in the input vector of the skip-gram prediction model?
all of them except for 1 for which the context has to be reconstructed
101
Which technique is faster, CBOW or the skip-gram architecture?
CBOW
102
What is the general equation for term frequency?
TF(t, d) = (no. of occurrences of t in d) / (no. of words in d)
103
What does document frequency indicate in the context of NLP?
the percentage of documents including a specific term t in relation to the total number of documents D
104
What is the mathematical formula for inverse document frequency?
IDF(t, D) = log(1/DF(t, D))
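The TF, DF, and IDF definitions from the last three cards can be combined in a small sketch (the three-document corpus is made up):

```python
import math

def tf(term, doc):
    # TF(t, d) = occurrences of t in d / number of words in d
    words = doc.lower().split()
    return words.count(term) / len(words)

def idf(term, docs):
    # DF(t, D) = fraction of documents containing t; IDF(t, D) = log(1 / DF(t, D))
    df = sum(term in d.lower().split() for d in docs) / len(docs)
    return math.log(1 / df)

docs = ["the cat sat", "the dog ran", "a cat ran"]  # hypothetical corpus
print(tf("cat", docs[0]))                   # 1/3
print(idf("the", docs))                     # log(1 / (2/3)) = log(1.5)
print(tf("cat", docs[0]) * idf("cat", docs))  # combined TF-IDF score
```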
105
What type of input matrix does the GloVe model (Pennington et al., 2014) operate on?
A **global word–word co-occurrence matrix**, where each entry records how often two words appear together within a context window across the corpus. ## Footnote GloVe learns word vectors by factorizing the log of these co-occurrence counts.
106
What does USE abbreviate in the context of NLP?
Universal Sentence Encoder
107
Please name three methods for word vectorization.
* GloVe * TF-IDF * Word2Vec
108
What does the term n-gram refer to?
a sequence of n consecutive words (bigram, trigram etc.)
109
What does P(w │ ℎ) denote in the context of n-grams?
the probability of a word occurring after a series of words contained in the history of words | P of w occurring after ℎ ## Footnote P(w │ ℎ) = P(beach │ I like to go to the)
110
How do we estimate the probability of a word coming after a recorded sequence of words using frequency counts based on a corpus?
P(w │ ℎ) = C(ℎw)/C(ℎ) ## Footnote P(beach │ I like to go to the) = C(I like to go to the beach)/C(I like to go to the)
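A naive sketch of this frequency-count estimate (substring counting is a simplification, and the corpus is made up):

```python
def next_word_probability(history, word, corpus):
    # P(w | h) = C(hw) / C(h), estimated from raw frequency counts
    # Note: substring counting is a simplification; a real model counts n-gram tokens
    text = " ".join(corpus)
    return text.count(history + " " + word) / text.count(history)

corpus = [  # tiny hypothetical corpus
    "i like to go to the beach",
    "i like to go to the beach in summer",
    "i like to go to the peach orchard",
]
print(next_word_probability("go to the", "beach", corpus))  # C=2 of C=3 -> 2/3
```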
111
Why might n-gram-based word probability scoring be useful in speech applications?
Similar-sounding sentences can cause ambiguity. ## Footnote In the context of "I like to go to the b/peach", although both sentences might sound alike, it is statistically more likely that the last word of the sentence is “beach” rather than “peach".
112
What are disadvantages of using statistical models in NLP?
Words or **sequences** **not** **present** in the training data **are** **assigned** a **p**robability of **0**, reducing the model’s robustness. Furthermore, statistical models **lack the ability to generalize effectively**, especially when compared to neural models, which are better equipped to capture complex linguistic patterns.
113
What does the encoder do in an encoder-decoder architecture?
It converts the input text into a vector, which encapsulates all important information from the input sequence.
114
What does the decoder do in an encoder-decoder architecture?
It takes the information from the encoded vector and converts it back to the original representation.
115
What is the distinguishing mechanism of the transformer models proposed in 2017 by Google?
self-attention
116
Please explain the issue posed by the phenomenon of vanishing gradients in NLP.
The influence of input or earlier layers diminishes as it passes through deep layers, causing the network to struggle in learning long-range dependencies.
117
Name a few pre-trained models based on the transformer architecture.
* BERT (Bidirectional Encoder Representations from Transformers, Devlin et al., 2018) * GPT (Generative Pre-trained Transformer, Radford, 2018) * RoBERTa (a Robustly Optimized BERT Pretraining Approach, Liu et al., 2019) * DistilBERT (a distilled version of BERT, Sanh et al., 2019) * XLNet (Yang et al., 2019)
118
How does a masked language model work?
The masked model takes a sentence from the training set. Next, about 15 percent of the words in that sentence are masked. | I like to [mask1] a cup of coffee with [mask2] in the morning. ## Footnote The model is then trained to predict the missing words in the sentence. The focus of the model is to understand the context of the words. The text data processing is no longer done in a unidirectional way from left to right or right to left.
119
How does next-sentence prediction work in NLP?
The model receives a pair of sentences. The model’s goal is to predict if the first sentence is followed by the second sentence. | The resulting model focuses on how a pair of sentences is related.
120
What decade does NLP date back to in computer science?
the 1950s
121
What is the Weaver’s memorandum in the context of NLP?
In July 1949, the mathematician Warren Weaver circulated a short memorandum titled "Translation" to about 200 colleagues. ## Footnote In it, he proposed that **translating** between natural languages could be treated as a problem of **cryptography** and **information theory** — arguing that a Russian text is really just an English text that has been "encoded" in Russian, and that with the right statistical methods a machine could "decode" it back.
122
How does self-attention address the issue of word ambiguities?
by dynamically weighting the importance of surrounding words when computing each token's representation | qualitative leap over static embedding approaches (Word2Vec, GloVe) ## Footnote It is the reason transformer-based models handle polysemy, coreference, and syntactic ambiguity much better than anything that came before them.
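A minimal NumPy sketch of single-head scaled dot-product self-attention, with random illustrative weights; the row-wise softmax is what dynamically weights the surrounding tokens:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8
X = rng.normal(size=(n_tokens, d_model))            # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # scaled dot-product
    # row-wise softmax: how strongly each token attends to every other token
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V, w

out, weights = self_attention(X, Wq, Wk, Wv)
```

Each output row is a context-dependent mixture of all value vectors, which is why the same word can get different representations in different sentences.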
123
Where was GloVe developed?
at Stanford University
124
What kind of approaches were used in early NLP?
rule-based approaches
125
What aspects of a text does sentiment analysis deal with?
subjective aspects
126
What was the goal of early computer vision?
to mimic human vision
127
What are the 4 major categories computer vision tasks can be separated into?
1. geometry reconstruction 2. image restoration 3. motion analysis 4. recognition
128
What are the 5 main challenges that must be tackled in computer vision?
1. Differentiating similar objects (ball vs. egg) 2. Illumination of an object (red vs. orange) 3. Location 4. Rotation 5. Size and aspect ratios
129
How many more bits are used in a true color vs. a monochrome image per pixel?
24 times more | Monochrome uses a single bit, being 0 or 1. ## Footnote True color uses 24 bits.
130
In RGB, into how many parts are the 24 bits of a true color image separated?
three parts | each 8 bits in length
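The split into three 8-bit channels can be illustrated by packing and unpacking a 24-bit RGB value (a minimal sketch in plain Python, using bit shifts):

```python
def pack_rgb(r, g, b):
    # three channels, 8 bits each, packed into one 24-bit value
    return (r << 16) | (g << 8) | b

def unpack_rgb(v):
    return (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF

magenta = pack_rgb(255, 0, 255)   # red and blue at full intensity
```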
131
What color do we get if we set all 3 values in a true color RGB image to 0?
black | (additive mixing)
132
What color does (255, 0, 255) yield in additive RGB true colour?
magenta
133
What is the range of color values used in CMYK?
from 0 to 1
134
When we are building coloured images from single pixels, what function do we need?
a mapping from 2D coordinates (x, y) to specific color values
135
What are the 3 commonly used padding techniques?
1. **constant** padding 2. **replication** padding 3. **reflection** padding
136
What is the output if we apply reflection padding to fill the N cells of the 3x3 matrix [41 24 N; 80 4 N; N N N]?
41 24 41 80 4 80 41 24 41
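All three padding techniques from card 135 can be reproduced with NumPy's `np.pad`, filling the unknown cells on the right column and bottom row (illustrative 2x2 input):

```python
import numpy as np

a = np.array([[41, 24],
              [80,  4]])

# pad one column on the right and one row at the bottom
constant    = np.pad(a, ((0, 1), (0, 1)), mode="constant")  # fill with 0
replication = np.pad(a, ((0, 1), (0, 1)), mode="edge")      # repeat the border
reflection  = np.pad(a, ((0, 1), (0, 1)), mode="reflect")   # mirror, edge not repeated
```

Note that `reflect` mirrors about the border pixel without repeating it, which reproduces the flashcard's answer exactly.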
137
What are the 4 types of distortions in the group of radial distortions?
1. **Barrel** (positive radial) distortion 2. **Pincushion** (negative radial) distortion 3. **Complex** (mustache radial) distortion 4. **Fisheye** radial distortion
138
What is the cause of the so-called tangential distortion observed in digital images?
misalignment of the image sensor unit and the camera lens
139
What do we estimate in the process of camera calibration in the context of computer vision?
the extrinsic and intrinsic parameters of a camera
140
Name 3 intrinsic characteristics of a camera.
* focal length * lens distortion parameters * optical center
141
In the context of the pinhole camera, what coordinate systems are used?
1. the 3D real-world coordinate system 2. the 3D coordinate system of the camera 3. the 2D coordinate system of the projected image
142
What are the two steps of the projection process using a pinhole camera?
1. Transform the coordinates from the 3D world to the 3D camera coordinates. 2. Transform the 3D camera coordinates to the 2D image coordinates.
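The two projection steps can be sketched with NumPy; all values, including the rotation, translation, focal length, and optical center, are illustrative assumptions:

```python
import numpy as np

# Step 1: 3D world -> 3D camera coordinates via the extrinsics (R, t)
R = np.eye(3)                      # no rotation (illustrative)
t = np.array([0.0, 0.0, 5.0])      # translation along the optical axis
X_world = np.array([1.0, 2.0, 10.0])
X_cam = R @ X_world + t            # -> [1, 2, 15]

# Step 2: 3D camera -> 2D image coordinates via the intrinsic matrix K
f = 800.0                          # focal length in pixels (illustrative)
cx, cy = 320.0, 240.0              # optical center / principal point
K = np.array([[f,   0.0, cx],
              [0.0, f,   cy],
              [0.0, 0.0, 1.0]])
x = K @ X_cam
u, v = x[0] / x[2], x[1] / x[2]    # perspective divide -> pixel coordinates
```

The extrinsics handle step 1, the intrinsics handle step 2; the final division by the depth component is what makes the projection perspective rather than orthographic.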
143
How does the camera calibration process work when we use a checkerboard?
1. We select at least 2 sample images. 2. We identify distinctive points in each image (corners). 3. We localize the distinctive points in the 3D real world as well as in the 2D image, calculating the camera matrix and the distortion coefficients.
144
What are the 3 main types of features in digital images in the context of computer vision?
blobs, corners, edges
145
What are the 3 main steps of feature engineering in the context of computer vision?
1. Feature detection 2. Feature description/extraction 3. Feature matching
146
What are the 2 techniques commonly used for edge detection?
1. **Canny** edge detector 2. **Sobel** filter
147
When using Sobel filters for edge detection, how many special kernel matrices are used?
2 | one for each of the axes
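The two kernels can be written out and applied to a single patch containing a vertical edge (a minimal NumPy sketch with illustrative values):

```python
import numpy as np

# the two Sobel kernels, one per image axis
Gx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])
Gy = Gx.T

# a 3x3 patch with a vertical edge: dark on the left, bright on the right
patch = np.array([[0, 0, 9],
                  [0, 0, 9],
                  [0, 0, 9]])

gx = int(np.sum(Gx * patch))   # strong response across the vertical edge
gy = int(np.sum(Gy * patch))   # no response along the edge direction
```

In practice the two responses are combined into a gradient magnitude, e.g. `np.hypot(gx, gy)`.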
148
Please give the three steps of feature engineering.
1. Feature detection 2. Feature description 3. Feature matching
149
What type of deep learning architecture first enabled end-to-end semantic segmentation as a dense pixel-wise prediction task?
Fully Convolutional Networks, which replaced the fully connected classification layers of standard CNNs with convolutional layers, enabling output at the same spatial resolution as the input. ## Footnote Modern approaches extend this with encoder–decoder designs (U-Net, DeepLab) and transformer-based architectures (SegFormer, Mask2Former).
150
Please define semantic segmentation in the context of computer vision.
In semantic segmentation, parts of an image belonging to the same **object class** are **clustered together**. ## Footnote The algorithm receives an image with several objects as input and returns an image where the pixels are labeled by their semantic category.
151
What are typical methods used in image generation?
1. Diffusion models 2. GANs 3. VAEs
152
What does VAE abbreviate in the context of image generation?
variational autoencoder
153
What do diffusion models use as the first step in image generation?
random noise
154
What does CLIP abbreviate in the context of image synthesis?
Contrastive Language-Image Pre-training
155
What is the issue known as mode collapse in the context of generative adversarial networks?
the generator produces only a limited variety of images
156
What techniques are experimented with when the goal is to develop models that require less data and fewer computational resources in image generation?
few-shot learning, transfer learning, use of pre-trained models
157
When was the subfield of computer vision born?
in the 1960s | Larry Roberts' 1963 PhD thesis (extracting 3D geometric information from 2D photos) ## Footnote the 1966 MIT Summer Vision Project, where Minsky and Papert famously proposed "solving" computer vision as an undergraduate summer assignment — a scope estimate that turned out to be optimistic by about six decades
158
What is the building block that we use for feature matching?
feature vectors
159
What are the 2 main components of GANs in computer vision?
Generator - Generates synthetic data that should look as realistic as possible. Discriminator - Distinguishes between real and generated data, and provides feedback to improve the generator.
160
Please identify what type of architecture VAEs are based on.
encoder-decoder
161
How do diffusion models generate images?
By learning to reverse a gradual noising process. | In training, noise is added; the model then learns to denoise. ## Footnote At generation time, it starts from pure noise and iteratively refines it into a coherent image.
162
What does a 3x3 kernel do if it has a single 1 at the center-right position (row 1, col 2, zero-indexed) and zeros everywhere else?
it shifts the image one pixel to the left | translation/shift kernel ## Footnote Each output pixel copies the value of its right neighbor from the input.
163
What does a 3×3 box blur kernel (all entries = 1, scaled by 1/9) do to an image?
Each output pixel becomes the arithmetic mean of the 3×3 neighborhood around it (9 pixels, each weighted equally at 1/9). | This smooths the image by averaging out local variations. ## Footnote reducing noise but also softening edges
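Both kernels from cards 162 and 163 can be verified with a small pure-NumPy cross-correlation; `correlate2d` here is a hypothetical helper written only for illustration:

```python
import numpy as np

def correlate2d(img, kernel):
    # valid-mode cross-correlation (no kernel flip), pure NumPy
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)

shift = np.zeros((3, 3))
shift[1, 2] = 1.0                  # single 1 at center-right: shifts image left
blur = np.ones((3, 3)) / 9.0       # box blur: mean of the 3x3 neighborhood
```

Applying `shift` makes each output pixel equal its right neighbor in the input, and applying `blur` makes it the mean of the surrounding 3x3 window.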
164
For camera calibration, it is important to know external and internal parameters. What are the main extrinsic and intrinsic parameters?
**Extrinsic parameters:** * Rotation — the orientation of the camera relative to the world coordinate system * Translation — the position of the camera in the world coordinate system | **Intrinsic parameters:** * Focal length — the distance between the lens and the image sensor * Optical center — the point where the optical axis meets the image plane * Lens distortion coefficients — parameters describing radial and tangential distortion (as modeled e.g. by the Brown–Conrady model)
165
Think about possible use cases for semantic image segmentation. Where could it be used?
* autonomous driving (detecting vehicles, lanes, pedestrians) * geosensing (land use classification from satellite imagery) * medical imaging (e.g. tumor detection in brain scans) * pose estimation / motion capture (segmenting body parts)
166
How does video denoising relate to image restoration?
Video denoising extends image restoration techniques to the temporal domain. | The broader umbrella term is video restoration. ## Footnote Classical image restoration methods (spatial filtering, transform-domain thresholding) can be applied frame by frame, but dedicated video denoising also exploits inter-frame redundancy and motion compensation for better results.
167
Is it true that in computer vision we assume that a camera image is a radial projection of a scene?
No. | a linear projection of the real-world scene is assumed ## Footnote Straight lines in the real world are expected to appear as straight lines in the image. Deviations from this assumption are what camera calibration and distortion correction aim to fix.
168
How many bits does the monochrome representation of images require?
1 bit / pixel
169
Which architecture is primarily responsible for facilitating the refinement of images from noise in diffusion models?
U-Net architecture | developed by Ronneberger et al., 2015, for biomedical image segmentation ## Footnote a symmetric encoder-decoder network with skip connections that turned out to be exceptionally well-suited for the iterative denoising process in diffusion models
170
What is a development set also called?
validation set ## Footnote used to evaluate and further optimize the model's performance before the final, one-time evaluation on the test set
171
Which AI model type allows processing of language without the sequential limits of traditional neural networks?
bidirectional encoder representations from transformers (BERT) | built on the transformer architecture (Vaswani et al., 2017) ## Footnote replacing the recurrent connections of RNNs/LSTMs with a self-attention mechanism, which relates all positions in a sequence simultaneously rather than processing tokens one by one
172
Sam is designing a new generative model that aims to reduce the computational demands associated with traditional models. Choose the approach best suited to achieve this goal.
utilizing pre-trained models and transfer learning techniques | to minimize resource usage ## Footnote Instead of training from scratch, Sam should leverage a model already trained on vast datasets and fine-tune it for the specific task, drastically reducing the time, data, and compute needed.
173
Briefly explain the Internet of Things.
IoT connects physical and virtual devices using information and communication technologies.
174
Name four examples of how artificial intelligence can be used in high tech and telecommunications.
- Ensuring that networks are healthy and secure - Optimizing and automating networks - Predicting network anomalies - **Predictive maintenance**: fix network issues before they occur