09-reasoning-slms-key-formulas Flashcards

(20 cards)

1
Q

What is the Hallucination Paradox?

A

Reasoning models (e.g., OpenAI o1, o3) hallucinate at HIGHER rates on simple factual benchmarks despite improved logical reasoning. Mechanism: more internal claims in the reasoning chain create more opportunities for error propagation. o3 had a 0.51 hallucination rate vs. 0.44 for o1 on SimpleQA.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are Reasoning Models (e.g., o1, o3)?

A

LLMs that generate explicit chains of thought (“System 2 thinking”) before producing a final answer. They improve performance on logic and coding but exhibit the Hallucination Paradox on factual benchmarks.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

When should you use a standard LLM with RAG instead of a Reasoning Model?

A

For applications requiring strict factual accuracy (e.g., benefits eligibility, regulatory citations). Reasoning models should be reserved for complex analysis where the logic can be verified (e.g., code vulnerability analysis, legal reasoning over provided statutes).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a Small Language Model (SLM)?

A

An LLM with fewer than ~7 billion parameters (e.g., Phi-3). Advantages: runs on edge devices without cloud connectivity, dramatically cheaper inference, data never leaves the device (privacy), and lower latency for simple tasks.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the key advantages of SLMs for federal use?

A

(1) Edge capability — run on laptops or tactical devices without cloud; (2) Cost — orders of magnitude cheaper; (3) Privacy — data never leaves the device; (4) Speed — lower latency for simple tasks; (5) Bypasses data center energy bottleneck.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is the formula for the Compound Mistake Problem?

A

Accuracy_total = (Accuracy_step)^n, where n is the number of steps. A 95% per-step accuracy over 10 steps yields only ~60% total accuracy (0.95^10 ≈ 0.598).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is RAG in one sentence?

A

Retrieve relevant documents from an external knowledge base, inject them into the model’s context window, and generate answers grounded in those documents to reduce hallucination.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the Data Flywheel in one sentence?

A

A virtuous cycle where user interactions generate data that improves the AI system, which produces better interactions that generate more valuable data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is LoRA in one sentence?

A

A parameter-efficient fine-tuning method that adds small trainable low-rank matrices (~100MB) to frozen model layers, avoiding full retraining of billions of parameters.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is AI-as-a-Judge in one sentence?

A

Using a strong LLM to automatically evaluate the outputs of another LLM for quality, faithfulness, or safety at scale.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is the Model Router in one sentence?

A

An abstraction layer that routes AI requests to different models based on complexity, cost, or capability — enabling vendor-agnostic, cost-optimized architectures.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the Instruction Hierarchy in one sentence?

A

A prompt design pattern where system-level instructions take precedence over user-level inputs, enforcing behavioral compliance regardless of user attempts to override.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is Span-Level Verification in one sentence?

A

Requiring the model to cite the exact sentence in a source document that supports its answer, enforcing auditability and truthfulness.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is Hybrid Search in one sentence?

A

Combining dense vector search (semantic meaning) with sparse keyword search (BM25 for exact terms) to maximize retrieval recall in RAG pipelines.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is Reranking in one sentence?

A

A second-stage retrieval step where a cross-encoder model re-scores initial search results to maximize precision before context is passed to the LLM.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What does Faithfulness measure in one sentence?

A

Whether a generated answer is derived ONLY from the retrieved context, with no hallucinated information introduced by the model.

17
Q

What is the Planner-Evaluator-Executor pattern in one sentence?

A

A decoupled agent architecture where a Planner proposes actions, an Evaluator checks them for safety and compliance, and an Executor carries out only approved actions.

18
Q

What is Evaluation-Driven Development in one sentence?

A

The discipline of defining success metrics and building the evaluation pipeline BEFORE building the AI system, so every design decision is tested against measurable criteria.

19
Q

What are the three memory types in Huyen’s hierarchy?

A

Parametric (model weights), Episodic (context window), and Semantic (vector database / external storage).

20
Q

What is PEFT in one sentence?

A

Parameter-Efficient Fine-Tuning methods (like LoRA) that adapt only a small subset of model parameters, dramatically reducing compute cost while preserving most fine-tuning benefits.