Prompt Engineering Flashcards

(446 cards)

1
Q

What is a prompt in the context of AI models?

A

The input given to an AI model to generate a response

A prompt can be a question, instruction, context, data, or a combination of these.

2
Q

Define prompt engineering.

A

The practice of designing and refining prompts for better AI outputs

It involves using structure, wording, and context intentionally.

3
Q

List the key aspects of prompt engineering.

A
  • Clarifying the task
  • Specifying format
  • Setting role/perspective
  • Providing context and constraints
  • Using examples
  • Controlling creativity/speculation

These aspects help guide the AI to produce more reliable and useful outputs.

4
Q

True or false: A prompt is the same as prompt engineering.

A

FALSE

A prompt is the input itself, while prompt engineering is the skill of designing that input.

5
Q

What is the purpose of clarifying the task in prompt engineering?

A

To ensure the AI understands what is being asked

For example, specifying the audience and format can lead to more relevant responses.

6
Q

Fill in the blank: Prompt engineering involves using structure, wording, and context ______.

A

on purpose

This intentional design helps the model perform as expected.

7
Q

What does specifying format in prompt engineering help achieve?

A

It guides the AI to respond in a desired structure

For example, asking for a numbered list or a markdown table.

8
Q

How does setting role/perspective influence AI responses?

A

It frames the context in which the AI should respond

For instance, asking the AI to act as a mentor or reviewer.

9
Q

What is the role of providing context and constraints in prompt engineering?

A

To limit the scope of the AI’s response

This ensures the AI stays relevant to the specified parameters.

10
Q

What is the benefit of using examples in prompt engineering?

A

It helps the AI understand the desired style and format

This technique is known as few-shot prompting.
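The few-shot idea can be sketched in a few lines of Python. The review snippets and labels below are made up for illustration; the point is the shape of the prompt: worked examples first, then the new input.

```python
# A minimal few-shot prompt sketch: two labeled examples show the model the
# desired style and format before the real input. The examples are invented.

EXAMPLES = [
    ("great product, fast shipping", "positive"),
    ("broke after two days", "negative"),
]

def few_shot_prompt(review: str) -> str:
    """Prepend labeled examples, then ask for a label for the new input."""
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in EXAMPLES)
    return f"{shots}\nReview: {review}\nSentiment:"

print(few_shot_prompt("does exactly what it says"))
```

Ending the prompt at `Sentiment:` nudges the model to complete the pattern with a label rather than free-form prose.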

11
Q

What does controlling creativity/speculation in prompts entail?

A

Guiding the AI to avoid guessing or making assumptions

This can be achieved by instructing the AI to stick to established practices.

12
Q

What is the first step in prompt engineering?

A

Be explicit about the task

Clearly define the task, input, and expected output to avoid ambiguity.

13
Q

How can you enhance the model’s responses by setting a role or perspective?

A

Giving the model a ‘hat’ to wear makes responses more targeted

This adds built-in style, depth, and priorities to the response.

14
Q

What is a common pattern for specifying structure and format in prompts?

A

Answer in a markdown table with columns: X | Y | Z

This helps shape the answer and removes ambiguity.

15
Q

What should you do to control the scope and depth of the model’s response?

A

Define how big and how detailed the answer should be

This prevents overly broad or shallow responses.

16
Q

Why is it important to provide context and constraints in prompts?

A

It helps the model make relevant choices

This leads to less generic advice tailored to specific situations.

17
Q

What is the purpose of using examples in prompting?

A

Show the model what ‘good’ looks like

This helps the model understand the desired output style.

18
Q

What is a useful strategy for complex tasks in prompting?

A

Ask for step-by-step thinking or plans

This organizes the solution logically and clarifies the process.

19
Q

How can you encourage the model to self-check its work?

A

List ways the answer could be wrong or incomplete

This reduces overconfidence and encourages critical thinking.

20
Q

What is the benefit of iterating with follow-up prompts?

A

Treat it like a conversation, not a one-shot

This allows for refining and improving the output progressively.

21
Q

What should you include in prompts to set constraints & guardrails?

A

Tell the model what not to do

This helps avoid vague phrases and ensures factual accuracy.

22
Q

What is the universal prompt skeleton for adapting prompts?

A

Role: Act as a [role]. Task: Help me with [goal]. Audience: I am [who you are / skill level]. Constraints: Keep it [length/depth/style constraints]. Format: Reply as [list, table, sections, etc.]. Guardrails: If you’re unsure, say so; don’t make up facts.

This template can be tailored to various workflows.
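The skeleton above can be wired into code as a simple reusable template; the field values passed in the example call are hypothetical placeholders.

```python
# The universal prompt skeleton as a fill-in-the-blanks function.
# All argument values below are illustrative, not prescribed.

SKELETON = (
    "Role: Act as a {role}.\n"
    "Task: Help me with {goal}.\n"
    "Audience: I am {audience}.\n"
    "Constraints: Keep it {constraints}.\n"
    "Format: Reply as {fmt}.\n"
    "Guardrails: If you're unsure, say so; don't make up facts."
)

def fill_skeleton(role, goal, audience, constraints, fmt):
    """Fill every slot of the universal prompt skeleton."""
    return SKELETON.format(role=role, goal=goal, audience=audience,
                           constraints=constraints, fmt=fmt)

print(fill_skeleton("career coach", "interview prep", "a junior developer",
                    "under 200 words", "a numbered list"))
```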

23
Q

A prompt template is a __________ with blanks in it.

A

reusable prompt

A prompt template includes fixed instructions plus placeholders you fill in each time you use it.

24
Q

What are the two main parts of a prompt template?

A
  • Static part
  • Variable part

The static part contains instructions that stay the same, while the variable part includes slots you swap out.

25
The **static part** of a prompt template includes __________.

instructions that stay the same

This part is consistent across different uses of the template.

26
The **variable part** of a prompt template includes __________.

placeholders you fill in

These placeholders can be swapped out for specific values each time the template is used.

27
What are some benefits of using **prompt templates**?
  • Consistency
  • Speed
  • Scalability
  • Quality improvement

These benefits help streamline the process of generating prompts for various tasks.

28
True or false: **Prompt templates** help improve quality over time.

TRUE

You can tweak the template itself, and all future uses will benefit from the improvements.

29
What does **prompt engineering** refer to?

designing good instructions/structures

This involves creating effective prompts that yield the desired responses.

30
What is the purpose of a **prompt template** in tools like LangChain?

define prompts in code

It allows for programmatic insertion of user input and other variables.
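A minimal sketch of the same idea in plain Python, mimicking what tools like LangChain provide. The template text and variable names are illustrative, not a real LangChain API.

```python
# Static part: the instruction text. Variable part: the {role}, {audience},
# and {text} slots filled per call. All names here are made up for the sketch.

TEMPLATE = (
    "You are a {role}.\n"
    "Summarize the text below for a {audience} audience.\n"
    "Text: {text}"
)

def build_prompt(role: str, audience: str, text: str) -> str:
    """Fill the variable slots; the static instructions never change."""
    return TEMPLATE.format(role=role, audience=audience, text=text)

prompt = build_prompt("technical editor", "beginner", "LLMs predict tokens.")
print(prompt)
```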
31
Fill in the blank: A prompt template is like a __________ or a function.

form

It provides a structured way to create prompts with fixed and variable components.

32
What is a common **mistake** when prompts are too vague?

They lead to super **generic**, textbook-style answers

This often results in responses that do not align with the user's actual goal.

33
What happens when multiple different tasks are crammed into one prompt?
  • The model picks one or two tasks to do decently
  • It skims or half-does the rest
  • The result feels scattered and incomplete

Conflicting constraints can lead to shallow coverage of everything.

34
True or false: Conflicting or overloaded instructions can lead to random-seeming behavior in LLMs.

TRUE

The model has to choose one side of the conflict, often violating the more restrictive part.

35
What is the effect of not specifying an audience or depth in a prompt?

It results in a **middle-of-the-road** explanation that may not suit anyone

This can lead to responses that feel either too shallow or too dense.

36
What is a common mistake when not specifying output format?

Responses are often free-form paragraphs

This makes it harder to copy into tools, spreadsheets, or docs.

37
What is the consequence of under-specifying context in a prompt?

The model applies **generic optimization criteria**

It fills in missing constraints with plausible-sounding but imaginary assumptions.

38
What is the mistake with **leading or biased questions**?

The premise is baked into the question

This leads to the model downplaying or omitting counterarguments.

39
What happens when asking for impossible or out-of-scope behaviors?
  • The model may say it can't do that
  • It might hallucinate a price or state

Users can misinterpret this as the AI knowing something.

40
What is a common mistake when expecting perfection in one shot?

The output may be missing edge cases or have integration issues

This leads to polished-looking but non-functional results.

41
What should be included in prompts to address uncertainty or limitations?

A section titled '**Uncertainty & Limitations**'

This invites the model to flag its own blind spots.

42
What is the mistake in over-trusting stylistic fluency?

Assuming polished writing equals correct information

This can lead to accepting incorrect answers because of their authoritative-sounding output.

43
What should you be **explicit** with in time references when writing prompts?

Use specific dates instead of vague phrases like 'the last 10 years'

LLMs do not automatically apply the current system time to relative phrases.

44
Why should you **avoid ambiguity** in relative phrasing?

Words like 'recent' or 'latest' are contextually fluid and can lead to misinterpretation

LLMs fill in meaning based on the strongest historical data pattern.

45
What are **temporal grounding tokens**?

Phrases like 'as of today' or 'current date' that act as soft constraint hints

They help steer generation by weighting newer data more heavily.

46
What is the impact of using **lower temperature settings**?

Reduces speculation and increases factual accuracy

Recommended for date-sensitive queries.

47
What does the **top-p (nucleus sampling)** setting do?

Limits randomness by filtering out unlikely tokens

Use it with a lower temperature for accurate, date-grounded results.
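The temperature and top-p settings live in the API request, not the prompt text. Below is a hedged sketch of such a request; the field names follow the common chat-completion shape, but the model name is a placeholder and exact parameter names vary by provider.

```python
# Illustrative request payload for a date-sensitive query. The point is the
# low temperature and top_p values, not the exact API schema.

request = {
    "model": "example-model",  # placeholder, not a real model name
    "messages": [
        {"role": "user",
         "content": "As of June 26, 2025, list LLM releases from 2023-2025."}
    ],
    "temperature": 0.3,  # low temperature: fewer speculative continuations
    "top_p": 0.9,        # nucleus sampling: drop the unlikely token tail
}
print(request["temperature"], request["top_p"])
```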
48
What is a good way to ask the model to **recompute timeframes**?

"Can you recompute what 'the last 10 years' means based on today's date?"

This triggers a runtime reinterpretation of relative terms.

49
Fill in the blank: Use **absolute dates** like __________ instead of vague terms.

'between 2015 and 2025'

This provides clarity and specificity in prompts.

50
What is the recommended **temperature setting** for accuracy?

0.3

Lower values reduce speculation for factual queries.

51
What should you do to **reference the current date explicitly**?

Use phrases like 'As of June 26, 2025'

This helps the model understand the timeframe for the query.

52
True or false: Using vague terms like 'recent' is effective for precise prompting.

FALSE

Vague terms can lead to misinterpretation by the model.

53
What is the importance of **query specificity** in prompt engineering?

It drives high-quality large language model (LLM) performance

Vague prompts often lead to shallow, generic responses.

54
How do **generic prompts** affect LLM responses?

They activate common language patterns, resulting in generic answers

This is due to LLMs being statistical sequence predictors.

55
What is the role of **priming** in prompt engineering?

It provides contextual signals that shape the model's behavior

Priming can influence tone, depth, format, or persona.

56
What are the two main issues with vague prompts?
  • They activate high-probability completions
  • They lack contextual depth

This leads to lowest-common-denominator content.

57
What should you add to a prompt to improve **domain or audience context**?

Specify the audience's background and expectations

For example, target a graduate-level audience for technical explanations.

58
How can you specify the **output format** in a prompt?

Declare the structure and type of output desired

For example, request a markdown table for comparisons.

59
What is the benefit of adding a **role or persona** to a prompt?

It helps the model adopt a style and purpose, reducing ambiguity

For instance, asking the model to act as a CISO can tailor the response.

60
What does defining the **scope of the answer** accomplish?

It improves relevance and activates better reasoning

For example, setting a word limit can trim verbosity.

61
What is a common pitfall when using the prompt 'What is X?'?

It triggers generic summary templates

Instead, ask for comparisons or specific use cases.

62
When should you use **generic prompts**?
  • When exploring unknown terrain
  • When seeking broad perspectives
  • When surface-level synthesis is acceptable

Useful for brainstorming or initial exploration.

63
When is it essential to use **specific, primed prompts**?
  • When you need depth or rigor
  • When simulating a role or audience
  • When performing structured tasks

This ensures precision and reliability.

64
What is **meta-prompting**?

Teaching the model how to interpret the prompt

For example, instructing it to identify the intended audience before answering.

65
What is the final takeaway regarding **generic prompts**?

They lead to statistically likely answers to vague questions

Specificity and priming unlock the full potential of LLMs.
66
What are **system-level instructions** in the context of LLMs?

Behind-the-scenes directives that shape how LLMs interpret and respond to queries

They influence tone, persona, output style, memory behavior, and more.

67
True or false: User prompts are the only factors influencing LLM responses.

FALSE

System-level instructions also play a crucial role in shaping responses.

68
What does a **system prompt** typically include?
  • Instructions on tone
  • Formatting guidelines
  • Behavioral expectations

It sets the context for how the model should respond.

69
How does the **system instruction** affect the model's response?

It primes the model to be cooperative, avoid speculation, and format answers helpfully

Different instructions can lead to vastly different response styles.

70
What is the impact of the **first token** in the input sequence for LLMs?

It heavily influences downstream behavior and sets the tone for the entire response

Similar to how the first line of a book sets the tone.

71
List some aspects that system prompts can control beyond tone.
  • Response length expectations
  • Factuality bias
  • Stylistic choices
  • Confidence calibration
  • Handling of edge cases

These factors can significantly alter the output.

72
What is a **meta-prompt**?

A prompt that simulates system-level instructions by embedding role, tone, and audience context

It helps guide the model's behavior without direct access to system prompts.

73
What should you do to **control behavior** with inline constraints?

Embed soft rules into your prompts, such as specifying response length or style

This signals the desired decoding behavior to the model.

74
When should you use **system-level control**?
  • For consistent tone
  • When role-specific behavior is needed
  • For output predictability

It is useful for automation or evaluation.

75
When should you **avoid heavy system control**?
  • During creative brainstorming
  • When users expect natural language freedom
  • When testing ambiguous inputs

Flexibility is important in these scenarios.

76
What are common **pitfalls** to avoid when using system-level instructions?
  • Ignoring system influence
  • Repeating instructions in every message
  • Conflicting tone
  • Assuming formatting defaults

These can lead to inefficient or inconsistent outputs.

77
What is the **final takeaway** regarding system-level instructions?

They are the invisible scaffolding of LLM behavior, essential for steering the model toward productive outcomes

Mastering system-level instructions can elevate prompt engineering.

78
What is the **token budget** in prompt engineering?

A constraint on the number of tokens that can be processed in a single interaction

Exceeding this budget can lead to truncation and degraded response quality.

79
LLMs think in **tokens**, not _______.

words

This difference is crucial for understanding how to interact with large language models.

80
What are the approximate **token limits** of the following LLMs? 1. GPT-4o 2. Claude 3 Opus 3. Gemini 1.5 Pro

1. ~128,000 tokens 2. ~200,000 tokens 3. ~1 million tokens

These limits include system prompts, user prompts, chat history, and model output.

81
What happens when the **token budget** is exceeded?

Earlier content is truncated

This can lead to a decline in response quality.

82
What is the role of a **tokenizer** in LLMs?

It converts text into tokens for processing

Tokens are mapped from subword units and passed through transformer layers.

83
What is the effect of a **sliding attention window** in transformer models?
  • Prioritizes newer tokens
  • De-emphasizes older tokens

This can lead to forgetting earlier parts of long conversations.

84
What are the consequences of **redundant tokens** in prompts?
  • Loss of focus
  • Circular logic
  • Output truncation

Redundant tokens increase the likelihood of these issues.

85
How can you **trim redundancy** in prompts?

By using concise language

For example, instead of saying 'In this task, you are expected to help me summarize...', simply say 'Summarize the article's main argument in simple terms.'

86
What is a **token-efficient formatting** technique?

Using markdown lists or tables instead of long prose

This compresses information better and makes it easier to scan.

87
When is it acceptable to use **abbreviations** in prompts?

When the model understands the domain

Examples include 'LLM' for 'large language model' and 'ctx window' for 'context window'.

88
What is a strategy for **chunking information** across turns?
  • Split prompts into phased stages
  • Avoid pasting entire documents at once

This keeps each message within a clean token budget.

89
In what scenarios does **verbosity** help?
  • Context setup
  • Role definition (used carefully)

Models need context for multi-step reasoning or simulating a persona.

90
When does **verbosity** hurt in prompt engineering?
  • System instructions
  • Formatting explanations
  • Prompt templating (if repeated)

These can waste tokens and weaken constraints.

91
What is **meta-prompting**?

Using the model to optimize its own prompts

For example, asking the model to rewrite a prompt using fewer tokens.

92
What tools can be used for **measuring token count**?
  • The OpenAI tokenizer
  • The Anthropic token estimator
  • Python tools like `tiktoken` or `transformers`

Tracking token usage is critical for performance and cost control.
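For a quick, library-free sanity check before reaching for a real tokenizer, a rough heuristic works. Exact counts require the model's own tokenizer (e.g. the `tiktoken` package for OpenAI models); the 4-characters-per-token rule of thumb below is only a ballpark for English text.

```python
# Rough token estimate, no dependencies. Real counts come from the model's
# tokenizer; this heuristic (~4 characters per token) is approximate.

def estimate_tokens(text: str) -> int:
    """Approximate token count for English text."""
    return max(1, len(text) // 4)

prompt = "Summarize the article's main argument in simple terms."
print(estimate_tokens(prompt))
```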
93
What is the **final takeaway** regarding token budgeting?

It's about strategic communication

Every word, comma, and formatting choice affects token consumption and the model's reasoning ability.

94
What are the **three main tools** used in prompt engineering to control LLM outputs?
  • Constraints
  • Guiding language
  • Structural markers

These tools shape how the model interprets and generates responses.

95
What is the **core problem** with unbounded prompts in LLMs?
  • Over-elaboration
  • Choosing its own format
  • Introducing irrelevant details
  • Misinterpreting the intended task

Unbounded prompts can lead to outputs that do not align with user intent.

96
True or false: LLMs perform best when given **boundaries**.

TRUE

Boundaries help the model produce more consistent and aligned outputs.

97
What happens when constraints are added to prompts?
  • They collapse the distribution of potential next tokens
  • They guide the decoding process

This helps the model stay on-task and produce relevant outputs.

98
What do **structural markers** do in prompt engineering?

They signal structure to the model, such as numbered lists or markdown syntax

Using structural markers primes the model to match the desired output format.

99
Instead of asking the model to **describe the role of AI in healthcare**, how should you phrase the prompt for better results?

'Explain the role of AI in healthcare in three sentences. Focus only on diagnostics.'

This limits verbosity and narrows the domain.

100
What is the benefit of using **output format markers** in prompts?
  • They anchor model output
  • They encourage clear alignment of data
  • The result is easier to parse, reuse, and verify

Format markers help structure the response effectively.

101
Fill in the blank: Use guiding language to tell the model what **not to do**. For example, 'In this response, do not explain what blockchain is. Focus only on _______.'

why it improves auditability

This helps restrict unnecessary exposition.

102
What is an example of **constraint chaining** in prompt engineering?

'Summarize the following research abstract in under 100 words. Use plain language appropriate for high school students. Present the result as a three-sentence paragraph.'

This reduces ambiguity at every level.
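Constraint chaining is easy to mechanize: keep the base task fixed and append each constraint as its own sentence. A minimal sketch, with illustrative constraint strings:

```python
# Chain constraints onto a base task, mirroring the example above.

def chain_constraints(task: str, constraints: list[str]) -> str:
    """Append each constraint as its own sentence after the base task."""
    return " ".join([task] + constraints)

prompt = chain_constraints(
    "Summarize the following research abstract in under 100 words.",
    [
        "Use plain language appropriate for high school students.",
        "Present the result as a three-sentence paragraph.",
    ],
)
print(prompt)
```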
103
What is a common **pitfall** when prompts lack format specification?

The model invents its own layout

Specifying a format helps guide the model's response.

104
When should you use **strong constraints** in prompts?
  • When you need structured outputs
  • For automated workflows
  • To minimize variance across runs

Strong constraints are useful for tasks requiring consistency.

105
When is it appropriate to use **loose prompts**?
  • Brainstorming or ideation
  • Encouraging creativity
  • Simulating personalities or debates

Loose prompts are better when exploring ideas.

106
What is **meta-prompting**?

Using the model to check its own constraints

This can enhance the accuracy of the output.

107
What is the **final takeaway** regarding constraints in prompt engineering?

They are mechanisms of control that influence token selection and response formatting

Mastering these techniques is essential for effective prompt design.

108
What is a key capability of **large language models (LLMs)** like ChatGPT?

The ability to **reason step by step**

This capability allows for more transparent and logical responses.

109
Why do LLMs default to **answer-first output**?

They prioritize **confident-sounding answers** over showing reasoning paths

This is a predictable byproduct of how LLMs generate text.

110
What happens when you prompt an LLM with a math question like **37 × 12**?

It may respond with a direct answer, like **444**, instead of showing the reasoning steps

This highlights the need for specific prompting to elicit reasoning.

111
What does **greedy token generation** in LLMs lead to?

It skips reasoning by default, generating the most likely continuations

This often results in direct answers rather than logical explanations.

112
What is the purpose of **Chain-of-Thought (CoT) prompts**?

To shift the output frame towards **structured logic patterns**

This encourages the model to simulate reasoning.

113
How do LLMs **imitate reasoning**?

They mimic how humans write logical steps rather than performing symbolic logic

This requires careful prompting to achieve the desired outputs.

114
What should you include in prompts to ensure **step-by-step reasoning**?
  • Signal that step-by-step output is desired
  • Structure the format to reduce hallucinated steps
  • Anchor focus on intermediate reasoning

These techniques help guide the model's responses.

115
What is an example of a **Chain-of-Thought prompt**?

Instead of asking, 'Is 173 divisible by 4?', use: 'Let's think step by step. First, divide 173 by 4.'

This encourages the model to follow a logical process.
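Wrapping any question in a Chain-of-Thought frame can be a one-line helper. The "Let's think step by step" phrasing is the classic CoT cue; the helper name itself is made up for illustration.

```python
# Wrap a question in a Chain-of-Thought frame so the model emits
# intermediate steps before the final answer.

def cot_prompt(question: str) -> str:
    """Append a step-by-step cue and ask for the final answer last."""
    return (
        f"{question}\n"
        "Let's think step by step, and state the final answer last."
    )

print(cot_prompt("Is 173 divisible by 4?"))
```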
116
What is the benefit of using **role framing** in prompts?

It improves step fidelity by shifting the tone toward **instructional clarity**

This helps the model provide clearer reasoning.

117
What are **common pitfalls** to avoid when prompting LLMs?
  • Asking only for the answer
  • No structure to the reasoning
  • Overloading with unrelated tasks
  • Assuming math = logic

These pitfalls can lead to incoherent or incomplete responses.

118
When should you use **step-by-step reasoning** prompts?
  • When you need traceable logic paths
  • For multi-part inference
  • To test reasoning quality
  • To reduce hallucinations

These scenarios benefit from structured reasoning.

119
When should you **avoid** using step-by-step reasoning prompts?
  • When only a summary is needed
  • If the prompt is ambiguous
  • When optimizing for speed

In these cases, reasoning may introduce unnecessary verbosity.

120
What is **self-reflection prompting**?

Prompting the model to **check its own steps** for logical errors

This is useful for validation in complex reasoning tasks.

121
What is the **final takeaway** regarding showing work in LLMs?

LLMs can emulate structured thought when prompted correctly

This is essential for rigorous applications in various domains.

122
What is the **core challenge** in prompt engineering with LLMs regarding time-relative expressions?

Handling **contextually ambiguous** phrases like 'last quarter' or 'next summer'

These phrases may seem clear to humans but require dynamic resolution for LLMs.

123
True or false: LLMs automatically resolve temporal references based on the current system date.

FALSE

LLMs resolve temporal references based on **statistical priors** unless explicitly prompted.

124
What do LLMs rely on to resolve temporal phrases when not prompted?

**Statistical priors** learned from historical training data

This can lead to outdated or mismatched timeframes.

125
What are **latent temporal embeddings** in the context of LLMs?

Statistical associations built from patterns in training data

For example, a phrase like 'the past five years' becomes associated with specific years.

126
What happens when an LLM is asked about 'the past 5 years' without context?

It may yield **2015–2020** instead of the correct timeframe

This is due to reliance on **static language priors**.

127
How can you ground a prompt in a **deterministic temporal frame**?

Replace relative phrases with **explicit date ranges**

For example, use 'between 2015 and 2025' instead of 'the past decade'.

128
What is a **meta-prompt** in the context of temporal resolution?

A prompt that instructs the model to reinterpret relative time references

Example: 'Assume today is June 26, 2025.'

129
What is the purpose of **temporal grounding tokens**?

To constrain the model's interpretation of time-relevant phrases

Examples include phrases like 'As of [date]' or 'Between [start year] and [end year]'.

130
What is a common pitfall when using relative phrasing in prompts?

Ambiguity in phrases like 'In recent years...'

This can lead to confusion about what time period is being referenced.

131
When should you use **relative phrasing** in prompts?
  • In a live conversation
  • When the date context is established
  • When the model acts in a dynamic assistant role

This allows for flexibility and follow-up clarifications.

132
When is it better to resolve temporal references explicitly?
  • For reporting or summarization
  • When outputs are used downstream
  • When traceable temporal context is needed

This ensures accuracy and clarity in outputs.

133
What is the **final takeaway** regarding LLMs and temporal resolution?

LLMs predict language and do not perceive time

Temporal resolution must be engineered into prompts to avoid outdated outputs.

134
What is the benefit of **scripted date injection** in LLM workflows?

It ensures time-relevant queries remain aligned with the **current date**

This can be done dynamically in code.
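Scripted date injection can be a small wrapper: compute the current date at call time and prepend it as an explicit anchor. A minimal sketch (the helper name and prompt wording are illustrative):

```python
from datetime import date

def inject_date(question, today=None):
    """Prefix the prompt with an explicit 'As of <date>' anchor.

    If no date is given, the current system date is used, so relative
    phrases in the question are grounded at call time.
    """
    today = today or date.today()
    return f"As of {today:%B %d, %Y}: {question}"

# Passing a fixed date makes the behavior reproducible for testing.
print(inject_date("What happened in AI in the last 2 years?", date(2025, 6, 26)))
```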
135
What is a **two-phase approach** for complex temporal queries?
  • Step 1: Convert relative time phrases to absolute date ranges
  • Step 2: Answer the question using those date ranges

This ensures proper temporal disambiguation.

136
What should you do to maintain consistency in long interactions with LLMs?

Provide a **persistent time anchor**

This helps the model interpret relative phrases based on a fixed date.

137
What is the main challenge when working with **large language models (LLMs)** like ChatGPT?

Understanding whether a model's output is based on **knowledge** or a **statistical assumption**

Users often assume the model has internal certainty, but it approximates knowledge through token prediction.

138
True or false: LLMs retrieve knowledge from a fact database.

FALSE

LLMs generate the most likely next token based on the input context and training corpus.

139
What does LLM stand for?

Large Language Model

Examples include ChatGPT, Claude, Gemini, and Grok.

140
What is the term for when LLMs confidently produce incorrect information?

Hallucination

This occurs when confidence is low, leading to plausible but incorrect outputs.

141
What does **token prediction** not equate to?

Epistemic certainty

LLMs predict the next token based on patterns, not verified truths.

142
What is a significant limitation of LLMs regarding factual information?

They have no internal fact graph

LLMs compress text into a latent space without storing verified facts.

143
What can confident language in LLM outputs be misleading about?

The accuracy of the information

A confident tone does not guarantee factual correctness.

144
What is one technique for assessing the model's confidence in its responses?

Ask the model to assess its own confidence on a scale from 1 to 5

This primes the model to simulate caution when unsure.

145
What should you prompt the model to do to justify its claims?

Cite sources or the reasoning behind the answer

This encourages multi-hop reasoning and reduces hallucination.

146
What type of questions can reveal ambiguity in LLM responses?

Asking for counterfactuals or alternatives

This encourages the model to explore multiple viewpoints.

147
What is a structured output format to use for confidence labeling?

| Claim | Confidence (High/Medium/Low) | Justification |

This format helps distinguish between likely-true and likely-guessed information.
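A hedged sketch of requesting that format in code; the instruction wording and helper name are illustrative, but the table header matches the one above.

```python
# Ask for confidence-labeled output as a markdown table, one row per claim.

def confidence_prompt(question: str) -> str:
    """Append an instruction requesting a confidence column per claim."""
    return (
        f"{question}\n"
        "Answer as a markdown table with columns:\n"
        "| Claim | Confidence (High/Medium/Low) | Justification |"
    )

print(confidence_prompt("Summarize the risks of passwordless login."))
```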
148
What is a common pitfall when interpreting LLM outputs?

Taking a confident tone as proof

Surface fluency does not equate to factual grounding.

149
When should you use confidence-assessing strategies?

When generating summaries, analyses, or recommendations

Precision is crucial in fields like law, finance, or science.

150
What is a meta-prompting technique to improve LLM outputs?

Embed meta-instructions to distinguish between facts and assumptions

This primes the model to simulate epistemic discipline.

151
What is the final takeaway regarding LLMs and knowledge?

LLMs don't truly know anything; they simulate knowledge through pattern prediction

Outputs mix grounded information with stylistically confident guesses.

152
What is the **core problem** with treating prompts as single-use magic spells?

It misses the reality that **prompt engineering is an iterative process**

High-performance output requires deliberate, data-driven refinement.

153
List the three main reasons why prompt failures occur.
  • Ambiguous token phrasing
  • Latent pattern mismatches
  • Under-specified goals

These issues arise from misunderstanding how models think.

154
True or false: LLMs **reason backward** from your intent.

FALSE

LLMs generate forward from tokens, which can lead to loss of the original goal.

155
What does **token sampling** in LLMs rely on?

A **probability distribution**

Each token is chosen based on probabilities, creating a branching tree of likely continuations.

156
What is the purpose of **iterative refinement** in prompt engineering?

To **prune unproductive branches** and re-weight the generative path

This helps improve the quality of the output.

157
How do **latent biases** affect prompt outcomes?

They compete with surface tokens, influencing the model's tone and structure

Different wording can lead to different associations.

158
What must you design your workflow to do regarding LLM prompts?

**Observe failure modes** and adjust inputs

This is necessary since the model cannot self-diagnose prompt fit.

159
What is the first step in the **iterative prompt refinement** strategy?

Isolate your **objective**

Clarify what a successful output looks like using objective success criteria.

160
What should you do in **Step 2** of iterative prompt refinement?

Run **multiple prompt variants** simultaneously

This helps identify which dimension of control your prompt is failing on.
161
What is **reflective re-prompting**?
Prompting the model to **analyze its own response** and suggest improvements ## Footnote This can surface hidden assumptions and propose refinements.
162
What should you extract from generated outputs in **Step 4**?
Common **failure types** ## Footnote Group these by prompt variant to identify which elements steer toward desired outcomes.
163
What is the goal of consolidating best traits into **composite prompts**?
To improve final precision by combining high-performing elements ## Footnote This mirrors ensemble learning in AI.
164
What is the purpose of using **evaluation prompts**?
To quantify improvements across generations ## Footnote This allows for quality control and identification of stagnation.
165
Differentiate between **prompt mutation** and **prompt evolution**.
* Prompt mutation: Broad exploration * Prompt evolution: Narrow optimization ## Footnote Balancing both accelerates convergence while avoiding local maxima.
166
What is a benefit of using **parameterized prompts**?
It unlocks **programmatic iteration** ## Footnote This is ideal for batch testing across topics, formats, and domains.
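A parameterized prompt can be sketched with ordinary string formatting; the template fields and sample inputs below are illustrative assumptions:

```python
# Parameterized prompt: constant task logic, placeholders for dynamic content.
TEMPLATE = (
    "Summarize the following {domain} text for a {audience} audience "
    "in exactly {n_bullets} bullet points:\n\n{input_text}"
)

def render(domain: str, audience: str, n_bullets: int, input_text: str) -> str:
    return TEMPLATE.format(domain=domain, audience=audience,
                           n_bullets=n_bullets, input_text=input_text)

# Programmatic iteration: batch-render across inputs for side-by-side testing.
batch = [render("legal", "layperson", 3, text)
         for text in ("Clause 4.2 of the agreement ...", "Clause 7.1 states ...")]
```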
167
List the pitfalls to avoid in iterative prompting.
* Only tweaking surface phrasing * Ignoring model completion style * Re-prompting without feedback loops * Optimizing a bad baseline ## Footnote Each pitfall can lead to ineffective prompt engineering.
168
What is the **final takeaway** regarding prompt refinement?
It’s about building a controlled loop of **evaluation, mutation, and feedback** ## Footnote This aligns model output with human goals.
169
How do **previous prompts** influence the model’s interpretation of new ones?
They shape how the model interprets subsequent prompts ## Footnote Users often assume each prompt is processed independently, but in multi-turn conversations, this is not the case.
170
What is the role of **contextual engines** in LLMs?
They infer context across a growing prompt stack ## Footnote This can lead to powerful features or unintended biases in responses.
171
What happens when you ask a simplified question like, 'Explain the difference between AC and DC power like I’m five'?
The model knows you want a simplified explanation and assumes a non-technical baseline ## Footnote This affects the complexity of subsequent answers.
172
What is a **context window** in LLMs?
A window of past tokens that includes every prompt and generated response ## Footnote The model doesn't forget prior turns unless manually cleared or the context length limit is exceeded.
173
What does each new prompt do to the model’s internal embeddings?
It updates the model’s embedding of what the conversation is about ## Footnote This includes tone, scope, domain, formality, persona, and topical assumptions.
174
True or false: Newer tokens in a conversation carry more weight than earlier ones in LLMs.
TRUE ## Footnote However, earlier instructions can persist longer than expected if they set strong priors.
175
What should you use to manage persistent influence in LLMs?
* System messages or priming blocks * Role resetting or meta-reframing prompts * Deliberate segmentation for multi-mode workflows ## Footnote These techniques help control how context affects responses.
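In chat-style APIs these controls map onto message roles. A sketch assuming an OpenAI-compatible message schema:

```python
# A fresh system message acts as a priming block; the explicit user turn is a
# meta-reframing prompt that resets the prior persona before a new task.
messages = [
    {"role": "system", "content": "You are a precise technical reviewer."},
    {"role": "user",
     "content": ("Disregard the earlier tutoring persona. New task: "
                 "review the following SQL query for injection risks.")},
]
```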
176
What is a common pitfall when asking unrelated questions in one thread?
Prior context still influences new tasks ## Footnote Starting a new thread or giving reset instructions can fix this issue.
177
When should you allow **prompt persistence**?
* Building cumulative logic or storylines * Wanting style and voice consistency * Performing roleplay, tutoring, or sequential Q&A ## Footnote This can enhance the effectiveness of the conversation.
178
When should you **avoid prompt persistence**?
* Testing isolated outputs * Comparing responses across formats or temperature settings * Doing fact-checking or precision generation tasks ## Footnote Avoiding persistence helps maintain clarity and accuracy.
179
What is **meta-prompting**?
Teaching the model to watch itself ## Footnote This can expose biases and help adjust responses based on prior context.
180
What is the final takeaway regarding prompts in multi-turn conversations?
Every prompt is a sculpting tool that reshapes the model’s expectations, tone, and structure ## Footnote Master prompt engineers understand when and how prompts echo, designing sessions with precision.
181
What is the main challenge in **prompt engineering** for APIs and code generation?
Designing prompts that reliably generate usable code, accurate API calls, or coherent responses ## Footnote This involves understanding how LLMs interpret structure, intent, and modality.
182
True or false: LLMs can validate code against a compiler or interpreter.
FALSE ## Footnote LLMs predict tokens based on statistical patterns, not by executing code.
183
When prompting an LLM to write a function, what does it rely on?
* Statistical patterns in its training corpus * Recent examples in the context window * Priming cues from the prompt ## Footnote Ambiguous prompts can lead to incorrect guesses about API structure or syntax.
184
What is the difference between **syntax** and **semantics** in the context of LLMs?
* Syntax: Structure of code * Semantics: Meaning and logic behind code ## Footnote LLMs can produce syntactically correct code but may fail on semantic correctness.
185
What should you include in a prompt to reduce hallucination when asking for code?
* Format * Constraints * Dependencies * Inputs ## Footnote These elements clarify the intent and reduce ambiguity.
186
What is an **intent block** in prompt engineering?
A structured format that clarifies the requirements for code generation ## Footnote It includes details like input, output, and specific libraries to use.
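One plausible shape for an intent block (the field names and the task itself are illustrative): each field pins down a dimension the model would otherwise guess.

```python
# Structured intent block covering format, constraints, dependencies, and inputs.
INTENT_BLOCK = """\
# INTENT
Task: fetch a user record by id
Input: user_id (int)
Output: dict parsed from the JSON response
Language: Python
Libraries: requests only
Constraints: raise for non-200 responses; no retries
"""

prompt = INTENT_BLOCK + "\nWrite the function now."
```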
187
What is a common pitfall when prompting for code?
Being too vague in requests ## Footnote For example, asking for 'code to call the API' without specifics can lead to incorrect outputs.
188
What is the recommended approach for **retrieval-augmented generation (RAG)** prompts?
Tell the model what’s in context and instruct it not to guess ## Footnote This helps lower the risk of hallucination when relevant data is missing.
189
What is a **meta-prompt**?
A prompt that anchors the model’s generation probability toward a desired format ## Footnote Examples include specifying coding style or library usage.
190
What is the suggested workflow for validating generated code?
* Prompt to write the code * Feed the code back for validation * Ask for fixes based on feedback ## Footnote This mirrors a multi-pass compiler strategy.
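The three steps above can be sketched as a loop around any text-in, text-out model call; `llm` here is a hypothetical callable standing in for whatever client you use:

```python
def generate_validated_code(llm, task: str, max_passes: int = 3) -> str:
    """Multi-pass sketch: generate, feed back for validation, ask for fixes.
    `llm` is any callable mapping a prompt string to a completion string."""
    code = llm(f"Write the code for this task:\n{task}")
    for _ in range(max_passes):
        review = llm(f"Review this code for bugs. Reply OK if none:\n{code}")
        if review.strip().startswith("OK"):
            break  # validation pass succeeded
        code = llm(f"Fix the code based on this feedback:\n{review}\n\nCode:\n{code}")
    return code
```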
191
What should you do when asking for an **API interaction**?
Include the API spec or schema in the prompt ## Footnote This avoids LLM creativity that may generate invalid payloads.
192
What is the final takeaway regarding prompting for code and APIs?
It requires clarity of input/output contracts, format priming, and retrieval control ## Footnote LLMs do not validate or run what they generate; they pattern-match.
193
What is a **low-memory language model**?
A model with limited context windows (e.g., 2K or 4K tokens) that does not persist memory across prompts ## Footnote These models require specific strategies for effective prompting due to their constraints.
194
What is the **context window** in transformer-based LLMs?
A finite input length that determines how many tokens can be processed at once ## Footnote For smaller models, this might be just 2,048 tokens.
195
True or false: Low-memory models automatically summarize or condense earlier inputs.
FALSE ## Footnote Users must introduce explicit state compression for continuity.
196
What does **self-attention** allow transformer models to do?
Weigh every token in the input sequence relative to others ## Footnote However, it does not store long-term state across invocations.
197
What happens when the token limit is reached in low-memory models?
Earlier parts are **clipped** ## Footnote They are not compressed or summarized unless done explicitly.
198
What is a **scaffolded prompt architecture**?
A structured approach that breaks tasks into stages ## Footnote Example: Summarize project background, summarize API, generate test cases.
199
How can **named anchors** help in low-memory models?
They simulate memory by providing semantic tags for contextual recall ## Footnote Example: Anchor A defines API endpoints, Anchor B describes client SDK requirements.
200
What is the benefit of optimizing **prompt-to-token efficiency**?
Every token counts in low-memory settings ## Footnote Using symbolic control cues or compressed structures can save space.
201
What is a method to create **sliding windows** in prompts?
Split input into batches and summarize each ## Footnote This emulates long-term memory through manual chunking and synthesis.
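The batching-and-synthesis idea can be sketched in a few lines; `llm` is again a hypothetical prompt-to-text callable:

```python
def sliding_summarize(llm, text: str, chunk_size: int = 1000) -> str:
    """Emulate long-term memory on a small context window: split the input
    into batches, summarize each, then synthesize the partial summaries."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partials = [llm(f"Summarize this batch:\n{chunk}") for chunk in chunks]
    return llm("Combine these partial summaries into one:\n" + "\n".join(partials))
```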
202
List three **common pitfalls** when working with low-memory models.
* Sending too much at once * Verbose prompt instructions * Assuming model 'remembers' across runs ## Footnote Each of these can lead to ineffective prompting.
203
When should you **avoid using low-memory models**?
* Long contextual chains without manual state tracking * Large document analysis without chunking * Real-time multi-turn conversation ## Footnote These scenarios require more memory than low-memory models can provide.
204
What is **meta-prompting**?
Explicitly re-embedding state to simulate memory ## Footnote Example: Referencing previous definitions in a new prompt.
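Re-embedding state can be as simple as prepending a recap block, reusing the named anchors from earlier; the helper name and recap wording are illustrative:

```python
def with_state(prior_definitions: dict, question: str) -> str:
    """Explicitly re-embed earlier state, since a low-memory model retains
    nothing across invocations."""
    recap = "\n".join(f"- {name}: {meaning}"
                      for name, meaning in prior_definitions.items())
    return f"Context recap (from earlier turns):\n{recap}\n\nQuestion: {question}"

prompt = with_state(
    {"Anchor A": "API endpoints", "Anchor B": "client SDK requirements"},
    "Generate test cases for Anchor A.",
)
```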
205
What is the **final takeaway** regarding low-memory LLMs?
They reveal the maturity of a prompt designer and require careful planning ## Footnote Mastering these strategies can yield surprisingly high-quality results.
206
What is the main focus of **mixed-modality prompting**?
Engineering prompts for text, images, code, and audio ## Footnote This involves leveraging the capabilities of models like Gemini, Claude, GPT-4o, and Mistral.
207
True or false: **Multimodal inputs** are treated equally by models.
FALSE ## Footnote Models often prioritize text over other modalities unless explicitly instructed otherwise.
208
What must models do with each modality in **mixed-modality prompting**?
* Convert to tokenized embeddings * Concatenate into a single attention stream ## Footnote Text typically dominates the probability space unless specified otherwise.
209
What are the **four types of encoders** used in multimodal models?
* Text → Tokens → Transformer embeddings * Images → Vision encoder → Patch embeddings * Audio → Spectrogram or waveform encoder → Embeddings * Code → Tokenizer + Code-aware layers ## Footnote These encoders process different types of input data.
210
What is a common **failure** when prompting with images?
Not providing textual grounding ## Footnote For example, asking about an image without describing its content can lead to generic responses.
211
What is the **better way** to prompt a model regarding an image?
Describe the image and ask specific questions about it ## Footnote For instance, stating what the image shows before asking for analysis.
212
What should you use to signal what each input represents in a prompt?
Explicit **modality grounding tokens** ## Footnote This helps the model understand how to assign attention to different inputs.
213
What is a recommended technique when using **visual inputs**?
Pair them with **minimal textual summaries** ## Footnote This primes the model’s image-token alignment.
214
What is the benefit of **prompting sequential reasoning** across modalities?
Improves processing by breaking tasks into modality-specific stages ## Footnote This allows for clearer analysis and comparison.
215
What should you do if you want an image as output from a multimodal prompt?
Explicitly state the desired output modality ## Footnote For example, saying 'generate an image' or 'produce code'.
216
List some **common pitfalls** to avoid in multimodal prompting.
* No text accompanying an image * Multiple images with no labels * Vague prompts * Expecting output in the wrong format ## Footnote These issues can lead to ineffective model responses.
217
When should you **use multimodal prompts**?
* Spatial or visual reasoning needed * Input data can't be efficiently expressed in text * Semantic alignment of image and text desired ## Footnote These scenarios leverage the strengths of multimodal models.
218
When should you **avoid multimodal prompts**?
* Precise mathematical values needed * Input data better structured as JSON, CSV, or code * Working with modality-sensitive data ## Footnote These situations may lead to inaccuracies.
219
What is a **meta-prompt**?
A reusable template for guiding model tasks across modalities ## Footnote It helps in structuring tasks for better alignment.
220
What is the **final takeaway** regarding mixed-modality prompting?
It's about orchestrating attention, not just stacking inputs ## Footnote Properly structured prompts can significantly enhance model performance.
221
What is a **game-changing technique** in prompt engineering?
Development of **reusable prompt templates** ## Footnote These templates help avoid rewriting similar prompts and reduce inconsistencies.
222
What is the **core problem** associated with prompt engineering?
**Prompt drift** and redundancy ## Footnote This occurs when prompts vary slightly, leading to unpredictable behavior from the model.
223
What do **LLMs** learn from billions of examples?
**Latent structures** from text tasks ## Footnote These structures serve as statistical blueprints for inferring user intent.
224
How do reusable templates help in prompt engineering?
They preserve a **consistent structural fingerprint** across calls ## Footnote This reduces the chance of misclassifying the task type.
225
What do prompt templates do to **entropy** in task specification?
They **reduce entropy** by fixing task structure ## Footnote This leads to more consistent outcomes and better generalization.
226
What is a technique for creating robust, reusable prompt templates?
**Use variable substitution placeholders** ## Footnote This separates constant task logic from dynamic content.
227
Why is it important to keep **prompt scaffolding consistent**?
To avoid introducing variance in the model’s interpretation ## Footnote Consistent phrasing leads to more reliable outputs.
228
What should reusable prompts specify clearly?
**Output formats** ## Footnote This ensures consistency when integrating results into other tools.
229
What is an example of **instructional framing** in prompts?
Transform the following paragraph into a haiku, preserving its central metaphor: [INPUT_TEXT] ## Footnote This improves reliability and portability across different models.
230
What is a common pitfall in prompt engineering?
**Hard-coded prompt examples** ## Footnote They do not generalize across inputs; use dynamic substitution instead.
231
When should you **use templates** in prompt engineering?
* Running repeated queries over large datasets * Automating interactions via APIs * Training a team to prompt consistently * Building AI assistants with strict behavior expectations ## Footnote Templates are beneficial for structured tasks.
232
When should you **avoid templates**?
* Brainstorming creative content * Engaging in freeform conversation * Exploring new types of tasks ## Footnote Templates can limit creativity and flexibility.
233
What is the purpose of adding **internal documentation** to prompt templates?
To make the prompt **self-describing** ## Footnote This aids in debugging, auditing, or AI-driven prompt editing.
234
How can you think of each prompt in terms of programming?
As a **pure function** with inputs, parameters, and deterministic outputs ## Footnote This mindset shifts the approach from crafting messages to programming behavior.
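The pure-function view can be made literal; a minimal sketch with illustrative parameters:

```python
def summarize_prompt(input_text: str, n_bullets: int = 3,
                     tone: str = "neutral") -> str:
    """A prompt as a pure function: identical inputs always yield the identical
    prompt string, so (at temperature 0) model behavior becomes near-deterministic
    and testable."""
    return (f"Summarize the text below in {n_bullets} bullet points, {tone} tone.\n"
            f"---\n{input_text}")
```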
235
What is the **final takeaway** regarding reusable prompt templates?
They are the foundation of **scalable, interpretable, and high-fidelity interactions** with LLMs ## Footnote They help reduce cognitive load and eliminate prompt drift.
236
What is the **functional difference** between simple prompts and compound prompts?
Simple prompts activate single task instructions; compound prompts activate multi-task reasoning pathways ## Footnote The structure of a prompt directly influences the model's latent reasoning circuits.
237
Define a **simple prompt**.
A single instruction or request ## Footnote Example: 'Summarize this article.'
238
Define a **compound prompt**.
Combines multiple directives ## Footnote Example: 'Summarize this article in 3 bullet points. Then list 2 unanswered questions based on the content.'
239
What do **simple prompts** typically activate in LLMs?
Single task signatures ## Footnote They lead to fluent execution of known instruction patterns.
240
What do **compound prompts** activate in LLMs?
Multi-task plans ## Footnote They require the model to track dependencies and maintain internal counters.
241
What is the role of **token span and context weighting** in prompts?
Tracks dependencies across the generation span ## Footnote This includes keeping intermediate reasoning steps in memory.
242
What does **implicit chain-of-thought (CoT) activation** refer to?
Longer prompts prime the model for multi-step reasoning ## Footnote Even without explicit instructions, compound structures trigger reasoning.
243
When should you use **simple prompts**?
* Translating a sentence * Extracting one piece of data * Asking for a yes/no classification ## Footnote They keep the model in low-entropy prediction mode.
244
When should you use **compound prompts**?
* Contextual understanding * Prioritization * Creativity across dimensions * Evaluation of alternatives ## Footnote They are suitable for deep, composite tasks.
245
What is the benefit of using a **bullet structure** in compound prompts?
Acts as hard attention anchors ## Footnote Each bullet locks in a task mode, aiding in output organization.
246
What is a method for achieving **tool-like behavior** in prompts?
Separate tasks with a delimiter token ## Footnote This creates a pseudo-multi-agent prompting effect.
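A delimiter-separated compound prompt might look like this; the delimiter token itself is an arbitrary choice:

```python
DELIMITER = "### TASK"

def compound_prompt(tasks) -> str:
    """Separate sub-tasks with a delimiter token so the model treats each as a
    distinct unit, approximating a multi-agent prompting effect."""
    return "\n\n".join(f"{DELIMITER} {i}\n{task}"
                       for i, task in enumerate(tasks, 1))

prompt = compound_prompt(["Summarize the article in 3 bullets.",
                          "List 2 unanswered questions."])
```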
247
Identify a common pitfall of compound prompts.
Vague compound tasks ## Footnote They are too open-ended and lack structure.
248
What is a recommended fix for too many tasks in a compound prompt?
Limit to 2–4 clear parts per prompt ## Footnote This helps stay within the model’s working memory window.
249
When should you use **simple prompts** for testing?
* Model latency * Repeatable classification tasks * API calls with post-processing logic ## Footnote They are effective for straightforward tasks.
250
When should you use **compound prompts** for tasks requiring interpretability?
* Asking for editorial judgment * Seeking advice or critique * Exploring nuanced tasks ## Footnote They provide deeper insights and reasoning.
251
What is **meta-prompting**?
Priming the model with instructions about interpreting the prompt ## Footnote This can increase stability and modularity of output.
252
What is the **final takeaway** regarding simple vs. compound prompts?
They activate different pathways of model behavior ## Footnote Understanding this distinction allows for better control over model output.
253
What is the **paradigm** where **language models replace traditional code** as the mechanism of execution?
Software 3.0 ## Footnote This represents a shift in how software is developed and executed, moving from traditional programming to leveraging language models.
254
In Software 3.0, how do users interact with models like ChatGPT?
You *prompt* it ## Footnote This indicates a shift from writing code to providing natural language instructions.
255
What are the three main components of a **classical computer**?
* CPU * RAM * Instruction Set Architecture (ISA) ## Footnote These components work together to execute programs in a deterministic manner.
256
In Software 3.0, what replaces the **Instruction Set Architecture (ISA)**?
Natural Language Prompt ## Footnote Prompts serve as the new instruction set, activating reasoning paths in the model.
257
What does the **CPU + RAM** in classical computing correspond to in LLMs?
Transformer Layers + Attention Cache ## Footnote This reflects how LLMs process information and maintain context.
258
What is the role of the **Operating System** in classical computing compared to LLMs?
Model Runtime / Inference Engine ## Footnote This component manages the execution of the model's operations.
259
What is the equivalent of a **program** in Software 3.0?
Prompt Template ## Footnote Prompts serve as dynamic programs that guide the model's output.
260
How does an LLM execute logic compared to a traditional CPU?
Token-by-Token Sampling ## Footnote This method contrasts with the cyclical execution of instructions in a CPU.
261
What is the significance of **temperature** in LLM sampling?
Controls how 'risky' each choice can be ## Footnote It affects the randomness and creativity of the model's responses.
262
What does **prompt engineering** require mastery of in Software 3.0?
* Instruction phrasing * Few-shot example selection * Output shaping * Constraint priming * Meta-prompting ## Footnote These skills are essential for effectively guiding LLMs to produce desired outputs.
263
What is a key limitation of LLMs compared to classical computers?
No true control flow ## Footnote LLMs cannot execute deterministic jumps or loops like traditional programming.
264
What is the **final takeaway** regarding how to work with LLMs in Software 3.0?
Think like a compiler ## Footnote This involves designing prompts and execution constraints to optimize clarity and effectiveness.
265
In Software 3.0, what does the model infer from a prompt?
* The conditional * The threshold * The alert format * The judgment domain ## Footnote This demonstrates how LLMs can derive complex logic from simple human language.
266
What is the role of **context** in LLMs during inference?
Tracks token history ## Footnote Context is crucial for maintaining coherence and relevance in responses.
267
What does **stateful computation** in LLMs rely on?
Embedding context in tokens ## Footnote This allows LLMs to simulate state across interactions.
268
What is the tradeoff in Software 3.0 compared to traditional software?
Precision for flexibility and ease of expression ## Footnote This reflects the inherent differences in how LLMs and classical software operate.
269
What is the **emerging era** of software where machine-learned programs replace traditional code?
Software 3.0 ## Footnote This era emphasizes local deployment of large language models (LLMs).
270
Name the **frameworks** that allow developers to run sophisticated models on consumer-grade hardware.
* Ollama * LM Studio * llama.cpp * KoboldCpp ## Footnote These frameworks enable local deployment of LLMs, enhancing privacy and control.
271
List the **benefits** of running models locally.
* Privacy * Latency * Cost * Customizability * Offline Access ## Footnote Local models keep data on the machine and avoid API delays.
272
What are the **challenges** faced by local models?
* Limited memory * Slower inference speeds * Lack of guardrails ## Footnote These challenges require engineering solutions for optimal performance.
273
What is the importance of **model selection** in local LLM performance?
Balancing size and speed ## Footnote Choosing the smallest model that meets functional requirements is crucial.
274
What does **quantization** do to model weights?
Reduces precision to save memory and boost inference speed ## Footnote Supported formats include GGUF, 4-bit, and higher precision options.
275
What are the **hardware recommendations** for running local LLMs?
* CPU: Multi-core (8+ threads) * GPU: Optional (NVIDIA RTX 3060+ or Apple M1/M2) * RAM: 16GB minimum; 32GB ideal * Disk: SSD/NVMe ## Footnote Proper hardware enhances model speed and performance.
276
What are some **runtime flags** for optimizing llama.cpp / Ollama?
* --n_threads $(nproc) * --ctx-size 4096 * --batch_size 512 * --low-vram ## Footnote These flags help configure system performance for local models.
277
What should you **do** for effective prompt design?
* Use compressed, declarative prompts * Separate instruction from context * Insert system-style scaffolding ## Footnote Smart prompt design minimizes unnecessary input and enhances performance.
278
What should you **avoid** in prompt design?
* Long chat history * Repeating boilerplate instructions * Embedding full documents ## Footnote These practices can hinder model performance and efficiency.
279
What are the **sampling parameters** for controlling model behavior?
* Temperature: 0.2–0.4 * Top-p: 0.8–0.95 * Repeat penalty: 1.1–1.3 ## Footnote Adjusting these parameters helps achieve deterministic and controllable output.
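These parameters map directly onto Ollama's `options` object. A sketch of the request body for Ollama's `/api/generate` endpoint, using the ranges from this card as starting values (the model name and prompt are placeholders):

```python
import json

payload = {
    "model": "llama3",
    "prompt": "Explain AC vs DC power in two sentences.",
    "stream": False,
    "options": {
        "temperature": 0.3,     # 0.2-0.4: mostly deterministic
        "top_p": 0.9,           # 0.8-0.95: nucleus sampling cutoff
        "repeat_penalty": 1.2,  # 1.1-1.3: discourages verbatim loops
    },
}
body = json.dumps(payload)
# e.g. POST this to http://localhost:11434/api/generate
```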
280
What is the purpose of a **system prompt**?
To steer model behavior ## Footnote It helps anchor persona and tone across multiple queries.
281
How can local models be integrated into workflows?
* Python API * Shell Scripts * VSCode Extensions * Prompt templates ## Footnote These integrations enhance the utility of local models in various applications.
282
What metrics should you measure for **benchmarking** your setup?
* tokens/sec * prompt load time * response delay ## Footnote These metrics help evaluate the performance of local LLMs.
283
True or false: Running LLMs locally is just a novelty.
FALSE ## Footnote It is a serious strategy for developers seeking full stack AI control.
284
What are the **three major axes** of consideration when running LLaMA 3 locally?
* Speed * Size * Sampling Control ## Footnote These axes help determine how well the model can be run based on memory, CPU/GPU capabilities, and expected generation quality.
285
What is the **memory footprint** of LLaMA 3 dictated by?
* Precision (quantization level) * Model architecture and number of parameters ## Footnote These factors influence the RAM and disk requirements for different model variants.
286
What is the **VRAM needed** for LLaMA 3 70B using FP16 quantization?
>80 GB ## Footnote FP16 provides the highest fidelity but requires significant GPU resources.
287
What does **quantization** do to model weights?
Reduces precision (e.g., from 16-bit to 4-bit) ## Footnote This often results in minor degradation in output quality but major gains in speed and memory usage.
288
What does **Ollama** provide for LLaMA and other models?
* Simplified local runtime * GPU acceleration * GGUF backends for pre-quantized variants * On-demand streaming ## Footnote Ollama abstracts away low-level complexity and provides a user-friendly interface for inference.
289
What is the function of the **temperature** parameter in Ollama API sampling?
Controls randomness (0 = deterministic) ## Footnote Adjusting temperature affects the variability of the model's responses.
290
What is the **average latency/token** for LLaMA 3 8B using Q2_K quantization on an RTX 3060?
~22ms ## Footnote This shows some drop in language precision compared to higher fidelity settings.
291
True or false: Using **smaller quantized models** (Q4/Q5) is recommended for interactive use.
TRUE ## Footnote Smaller models are more responsive and suitable for real-time applications.
292
What is the recommended setup for a **Coding Assistant** using LLaMA 3?
LLaMA 3 8B Q4_0, top_p=0.95, temp=0.3 ## Footnote This configuration optimizes performance for coding tasks.
293
Fill in the blank: The **LLaMA 3 8B** model has approximately _______ parameters.
~8 billion ## Footnote This size allows for efficient performance while maintaining quality.
294
What are the **sampling parameters** that can impact output quality and latency?
* temperature * top_p * top_k * repeat_penalty * num_predict ## Footnote Adjusting these parameters can drastically change the model's responses.
295
What is the **core concept** of **Constitutional AI**?
A framework for aligning language models with guiding principles rather than relying solely on human feedback ## Footnote This model is trained to self-critique and self-correct according to a fixed ethical charter.
296
List the **guiding principles** encoded into Claude's model.
* Choose the response that is most harmless * Avoid helping with illegal or unethical activities * Support human flourishing and autonomy ## Footnote These principles are instilled through supervised fine-tuning and reinforcement learning stages.
297
True or false: Claude responds to prompts in the same way as GPT-family models.
FALSE ## Footnote Claude filters prompts through its constitutional lens, which can suppress or reinterpret instructions.
298
What is the difference in **system prompt anchoring** between GPT-4/ChatGPT and Claude?
* GPT-4 / ChatGPT: Strong * Claude: Weaker influence ## Footnote Claude's responses are influenced by its constitutional framework rather than strict adherence to system prompts.
299
What is the **behavior moderation** process in Claude?
Claude evaluates responses during generation for alignment violations ## Footnote It may silently discard or reformulate completions to avoid harmful outputs.
300
Fill in the blank: Claude maintains a core identity rooted in __________.
constitutional rules ## Footnote This identity helps Claude resist jailbreak attempts that succeed on other models.
301
What are the **tradeoffs** of Constitutional AI in practice?
* Safety: Reduced risk of harmful output * Consistency: Predictable behavior in uncertain scenarios * Ethics-as-a-layer: Encourages pro-social interactions * Alignment: Stronger protection against misuse ## Footnote These tradeoffs can limit creative freedom and flexibility in certain contexts.
302
What is a recommended strategy for **prompt engineering** with Claude?
* Align with constitutional goals * Use multi-step prompting * Prime with positive intent * Embrace self-auditing behavior ## Footnote These strategies help in effectively guiding Claude's responses.
303
What is the **impact** of Claude's constitutional lens on its responses?
It can suppress, reinterpret, or neutralize instructions that violate its core principles ## Footnote This behavior is different from traditional LLMs that may comply more directly with user instructions.
304
How does Claude handle **role-playing** compared to GPT-family models?
Claude is often constrained by ethical filters ## Footnote In contrast, GPT-family models generally place fewer restrictions on role-play.
305
What are the **three major deployment options** for large language models (LLMs)?
* Local (e.g., Ollama) * API (e.g., Claude, GPT-4) * Serverless infrastructure (e.g., vLLM) ## Footnote Each option has distinct tradeoffs in performance, privacy, cost, and latency.
306
What is a key **advantage** of deploying LLMs **locally**?
* Privacy: No data leaves your machine * Control: Full control over model weights * Low latency: Once loaded, inference latency can be very low * Offline usage: Suitable for edge applications ## Footnote Local deployment is ideal for sensitive or proprietary inputs.
307
What is a **tradeoff** of local deployment of LLMs?
* Memory footprint: Full model must fit in RAM * Limited context length: Generally shorter than API-hosted models * Slower cold starts: Initial load can take 10–30 seconds * Lack of multimodal support: Mostly text-only ## Footnote These factors can limit the usability of local deployments.
308
What are the **best use cases** for local deployment of LLMs?
* Red-teaming / prompt experimentation * Autonomous agents * Edge robotics / IoT * Prompt sandboxing ## Footnote Local deployments are particularly effective in these scenarios.
309
What is a key **advantage** of using **API-based models**?
* Access to powerful models * Multimodal fusion: Supports various data types * Long context windows: 100k+ tokens * Maintenance-free: No hardware provisioning ## Footnote API-based models provide significant capabilities without the need for local infrastructure.
310
What is a **tradeoff** of API-based deployment of LLMs?
* Latency: Network delays can add 300–2000 ms * Privacy: Data sent to external servers * Cost: Pay-per-token pricing can accumulate * Vendor lock-in: Reliance on external APIs ## Footnote These factors can impact the decision to use API-based models.
311
What are the **best use cases** for API-based models?
* Production chat interfaces * Enterprise search / summarization * Research analysis * Audio/video/image-heavy tasks ## Footnote API-based models excel in these applications.
312
What is a key **advantage** of **serverless infrastructure** for LLMs?
* Custom model hosting: Bring your own weights * Performance optimization: Fast KV cache reuse * Scalability: Serve many concurrent users * API-like flexibility: Build OpenAI-compatible endpoints ## Footnote Serverless infrastructure allows for tailored deployment of models.
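The "OpenAI-compatible endpoints" point can be made concrete: a self-hosted vLLM server exposes `/v1/chat/completions`, so OpenAI-style request payloads work against it. A minimal sketch; the URL and model name are placeholders for your own deployment:

```python
import json

# Placeholder endpoint for a self-hosted vLLM server.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "my-finetuned-model",  # whatever weights you serve
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize RAG in one sentence."},
    ],
    "temperature": 0.3,
    "max_tokens": 200,
}

# Serialize and send with any HTTP client (urllib, requests, httpx).
body = json.dumps(payload)
```

Because the request shape matches OpenAI's, existing client code can often be pointed at the vLLM URL unchanged.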
313
What is a **tradeoff** of using serverless infrastructure for LLMs?
* DevOps overhead: Requires containerization * Latency variability: Affected by GPU queueing * Security maintenance: Responsibility for endpoint security * Inference cost: Can be expensive for large models ## Footnote These challenges can complicate serverless deployments.
314
What are the **best use cases** for serverless infrastructure in LLMs?
* High-traffic apps with custom model needs * Internal API platforms * Finetuned model deployment * Production-scale retrieval-augmented generation (RAG) ## Footnote Serverless infrastructure is particularly suited for these scenarios.
315
What is the average **latency** for local deployment of LLMs?
100–300 ms (warm) ## Footnote This latency is significantly lower compared to API-based models.
316
What is the average **latency** for API-based models?
800–2000 ms ## Footnote This latency includes network delays and server-side queuing.
317
What is the average **latency** for serverless infrastructure deployment of LLMs?
150–500 ms (tuned) ## Footnote This latency can vary based on system tuning and load.
318
In the decision framework for LLM deployment, if the data is sensitive or regulated, what should you choose?
Local or internal vLLM ## Footnote This choice ensures better privacy and control over data.
319
In the decision framework for LLM deployment, if you need the best accuracy or longest context, what should you choose?
API models ## Footnote API models provide access to the largest and most capable models.
320
In the decision framework for LLM deployment, if cost is a concern, what should you choose?
Local for small teams, vLLM for scale ## Footnote This approach helps manage costs effectively.
321
True or false: **Hybrid strategies** for LLM deployment are not common in real-world systems.
FALSE ## Footnote Hybrid strategies, such as local fallback + API override, are increasingly common.
322
What are the **three open-source large language models (LLMs)** mentioned?
* Falcon * Mistral * LLaMA ## Footnote These models are popular among local inference enthusiasts and edge-AI developers.
323
What is the **single biggest barrier** to deploying open-source LLMs locally?
RAM ## Footnote Licensing is not the primary concern; memory requirements are the main challenge.
324
What is the **RAM usage** for the Mistral 7B model in float16 format?
~13–14 GB ## Footnote This model has a significant memory requirement for local deployment.
325
What does **quantization** affect in open-source LLMs?
* Memory consumption * Accuracy ## Footnote Lower bit-widths reduce memory usage at the cost of some accuracy.
326
What are the **disk storage formats** mentioned for running LLMs locally?
* GGUF * Safetensors * PyTorch .bin ## Footnote These formats are used for managing large model binaries.
327
What is the **recommended SSD type** for models over 10B?
NVMe ## Footnote SATA SSDs can bottleneck loading times for large models.
328
What is the **max token length** for the LLaMA 2 7B model?
4096 ## Footnote This model has a specific context length that affects prompt engineering.
329
True or false: **Ollama** completely removes the constraints of memory usage when deploying models.
FALSE ## Footnote Ollama abstracts model management but does not eliminate core hardware tradeoffs.
330
What is the **recommended model** for a low-end laptop with 8GB RAM?
TinyLLaMA, Q4 ## Footnote This model is suitable for basic reasoning tasks.
331
What are the **strategic recommendations** for a high-end desktop with 64GB+ RAM?
LLaMA 3 8B+, Q5+ ## Footnote This setup enables larger prompts and better performance.
332
What is the **final takeaway** regarding open-source LLMs?
They are not resource-free ## Footnote RAM, disk space, and token window constraints significantly impact deployment and performance.
333
What is the central design constraint in **production environments** for language models?
Balance between latency and accuracy ## Footnote This balance is crucial for real-time interactions such as chatbots and copilot assistants.
334
Name the factors that affect **latency** in language models.
* Model size (number of parameters) * Batch size * Hardware acceleration * I/O overhead ## Footnote Latency is the time taken to return a token.
335
What influences **accuracy** in language models?
* Model depth * Training data diversity * Context awareness * Decoding strategies ## Footnote Accuracy is crucial for fulfilling user intent effectively.
336
True or false: Over-optimizing for speed can lead to **shallow responses** or factual errors.
TRUE ## Footnote This highlights the tradeoff between speed and the quality of responses.
337
What is the average token generation speed for **GPT-4 (API)**?
20–50 tokens/sec ## Footnote Speed can vary based on cloud latency.
338
What is the RAM requirement for **LLaMA 3 70B**?
>48 GB ## Footnote This model is used for knowledge agents and copilot AI.
339
Fill in the blank: **Mistral 7B** has an average token generation speed of _______.
~30–60 tokens/sec ## Footnote This model is suitable for mobile assistants and local RAG.
340
What is the tradeoff impact of a **higher temperature** setting in sampling?
More creative responses, but may hurt factuality ## Footnote Temperature settings influence the determinism of responses.
341
What happens when models approach their **context limit**?
Degradation in performance ## Footnote This is especially problematic for retrieval-augmented generation or multi-step reasoning.
342
What is a solution for **context truncation**?
* Trimming irrelevant input * Chunking with semantic overlap * Prompt compression techniques ## Footnote These methods help maintain accuracy in longer prompts.
343
What is a benefit of using **asynchronous I/O** in real-time architecture?
Prevents blocking frontend rendering ## Footnote This improves user experience by allowing partial completions.
344
What is the startup time for a **small model** like Mistral 7B?
Faster (5–10s) ## Footnote This contrasts with larger models, which have slower startup times.
345
When should you use **small models**?
* Task is repetitive or rule-based * Need real-time feedback * Resources are constrained ## Footnote Small models are efficient for straightforward tasks.
346
When is it appropriate to use **large models**?
* Complexity or nuance matters * Need deep memory or high coherence * Latency is acceptable ## Footnote Large models provide nuanced and reliable answers.
347
What is the **final takeaway** regarding latency vs. accuracy?
It's a spectrum, not a binary choice ## Footnote Understanding use case needs and hardware limits is essential for optimal performance.
348
What is **quantization** in the context of local language models?
Reducing the numerical precision of a model’s weights and activations ## Footnote Examples include going from 16-bit floating point (FP16) to 8-bit integers (INT8) or 4-bit values.
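A minimal sketch of what "reducing precision" means, using symmetric per-tensor INT8 quantization. Real schemes (per-channel, group-wise, GPTQ-style) are more involved; this only illustrates the round-trip and its error:

```python
def quantize_int8(weights):
    """Map float weights onto the integer grid [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(qweights, scale):
    """Recover approximate floats; the gap is the quantization error."""
    return [q * scale for q in qweights]

w = [0.82, -1.27, 0.003, 0.5]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Every weight now fits in one byte instead of two (FP16) or four (FP32), and the worst-case error is bounded by half the scale step.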
349
What are the **three types of numerical precision** mentioned in quantization?
* Full precision (FP32/FP16) * INT8 quantization * 4-bit quantization ## Footnote Each type has different impacts on accuracy, memory usage, and performance.
350
What is the **advantage** of quantization regarding memory?
Significantly reduces model size and VRAM usage ## Footnote Ideal for edge devices, low-end GPUs, or CPU-only deployments.
351
What is the **VRAM usage** for FP16, 8-bit, and 4-bit quantization for a LLaMA 7B model?
* FP16: ~14GB * 8-bit: ~8GB * 4-bit: ~4.5GB ## Footnote Each level of quantization reduces memory requirements while impacting fidelity.
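The figures above follow from bytes-per-parameter arithmetic. A back-of-envelope estimate (raw weight storage only; measured usage runs higher because of activations, KV cache, and quantization metadata):

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Raw weight storage in GB; runtime VRAM usage is higher."""
    return n_params * bits_per_weight / 8 / 1e9

params_7b = 7e9
fp16 = weight_memory_gb(params_7b, 16)  # ~14 GB
int8 = weight_memory_gb(params_7b, 8)   # ~7 GB
q4 = weight_memory_gb(params_7b, 4)     # ~3.5 GB
```

The overhead gap explains why the card quotes ~8 GB and ~4.5 GB for 8-bit and 4-bit rather than the raw 7 GB and 3.5 GB.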
352
True or false: **Quantization** only serves as a space-saving technique without affecting model performance.
FALSE ## Footnote Quantization changes how the model thinks and can degrade aspects like numerical fidelity and emergent behavior.
353
What are some **losses** associated with 4-bit quantization?
* Loss of high-precision computation * Rougher logits leading to repetition * Suppression of large-scale capabilities ## Footnote Models lose subtlety and rely more on pattern-matching.
354
What are the **four strategies** for adapting prompt design for quantized models?
* Be specific, not subtle * Use shorter prompts * Lower temperature * Favor deterministic tasks ## Footnote These adjustments help optimize performance with quantized models.
355
What are the **strengths** of the tools Ollama, KoboldCpp, and LM Studio in quantization support?
* Ollama: Simple setup, fast switching * KoboldCpp: Custom sampling configs * LM Studio: Visual workflow tuning ## Footnote Each tool has different strengths and interfaces for working with quantized models.
356
What are some **tasks** that still work well with 4-bit and 8-bit models?
* Summarization * Document parsing * Chat-based retrieval * Embedded AI agents ## Footnote Users often cannot tell the difference in performance unless compared side-by-side with full models.
357
What is the **final takeaway** regarding quantized models?
They unlock massive deployment flexibility but must be treated as distinct tools ## Footnote 4-bit inference is different, not worse, and requires careful tuning for optimal performance.
358
What is a **composable LLM architecture**?
A workflow made of interchangeable components, each doing one thing well ## Footnote Components include inference engines, RAG layers, vector databases, and controllers.
359
Name the **key components** of a composable LLM architecture.
* Inference Engine * RAG Layer * Vector DB * Controller ## Footnote Each component plays a specific role in the workflow.
360
What is the role of the **Inference Engine** in a composable LLM architecture?
Provides low-latency, cost-free inference when run locally ## Footnote Ideal for structured tasks, control logic, and template-based generation.
361
What does a **RAG Engine** do?
Acts as glue between models and vector DBs ## Footnote It chunks text, embeds it, indexes it, and retrieves it based on semantic similarity.
362
What is the purpose of a **Vector Database**?
Stores semantically indexed knowledge chunks ## Footnote Supports efficient k-NN search and metadata filtering.
363
What are the functions of **Controllers & Agents** in a composable LLM architecture?
* Manage memory * Reasoning hops * Context injection * Fallbacks * Tool use ## Footnote Can be scripted manually or built with tools like LangGraph.
364
Describe the **Local + RAG Feedback Loop** pipeline.
User ➝ Local LLM ➝ Query Rewriter ➝ Vector DB ➝ Chunk Retriever ➝ Local LLM ➝ Final Response ## Footnote This pipeline enhances user queries and retrieves relevant knowledge.
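The pipeline above can be sketched as plain function composition. Every stage below is a stub standing in for a real component (local model call, embedding search, etc.):

```python
def rewrite_query(user_query):
    # Stub: a local LLM would expand or clarify the query here.
    return user_query + " (expanded)"

def retrieve_chunks(query, k=3):
    # Stub: a vector DB would return the k nearest chunks here.
    return [f"chunk-{i} for '{query}'" for i in range(k)]

def generate_answer(user_query, chunks):
    # Stub: the local LLM answers with retrieved context injected.
    context = "\n".join(chunks)
    return f"Answer to '{user_query}' using:\n{context}"

def rag_pipeline(user_query):
    rewritten = rewrite_query(user_query)
    chunks = retrieve_chunks(rewritten)
    return generate_answer(user_query, chunks)

result = rag_pipeline("What is quantization?")
```

Keeping each stage a separate function is what makes the pipeline composable: any stub can be swapped for a real component without touching the others.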
365
What is the **Escalation Layer** in a composable LLM architecture?
Starts local and escalates only if needed ## Footnote Uses small LLM for first-pass reasoning and routes to larger models if confidence drops.
366
What is a **Knowledge Memory Bank**?
Gives small models memory by summarizing past chats and indexing them ## Footnote Relevant past dialogue can be fetched and injected into prompts.
367
Why not just use a **big model** for AI tasks?
* Expensive * Slow * Opaque * Privacy-risky ## Footnote Composable systems allow for local logic and querying high-end models only when necessary.
368
What are some **technical considerations** for composable LLM architectures?
* Prompt Composition * Embedding Model Choice * Context Window Planning * Input/Output Normalization * Memory & Storage ## Footnote These considerations impact the performance and efficiency of the system.
369
What is the **minimal setup** to get started with a composable LLM architecture?
* Run Ollama Locally * Embed and Index Docs with Chroma or LlamaIndex * Wire Up Retrieval using LangChain * Chain the Calls with a local controller ## Footnote Example code is provided for setting up a local LLM with retrieval capabilities.
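A dependency-free approximation of the retrieval step in that setup. The real stack would use Chroma or LlamaIndex embeddings; here a toy bag-of-words cosine similarity stands in, and the final Ollama call is shown only as a comment:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: word counts. Real pipelines use a sentence-embedding model.
    cleaned = text.lower().replace(".", " ").replace("?", " ").replace(",", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a if t in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

docs = [
    "Ollama runs quantized models locally.",
    "Chroma stores vector embeddings for retrieval.",
    "LangChain wires retrievers to LLM calls.",
]

def retrieve(query, k=1):
    scored = sorted(docs, key=lambda d: cosine(embed(query), embed(d)),
                    reverse=True)
    return scored[:k]

query = "How do I store embeddings for retrieval?"
context = "\n".join(retrieve(query))
prompt = f"Use only this context:\n{context}\n\nQuestion: {query}"
# To complete the loop, POST {"model": "llama3", "prompt": prompt, "stream": False}
# to a locally running Ollama at http://localhost:11434/api/generate.
```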
370
What are the **final takeaways** regarding composable LLM pipelines?
* Lower latency * Stronger privacy * Transparent control * Modular upgradability ## Footnote These pipelines are architectural power tools that enhance AI capabilities.
371
What is **prompt engineering** in the context of large language models (LLMs)?
An operational science that involves designing prompts for effective model interaction ## Footnote It emphasizes the importance of prompt design decisions when working with LLMs served locally.
372
What are the **three technical dimensions** of prompting Ollama-served models?
* Prompt Size and Context Window Management * Repetition Penalties and Sampling Behavior * System Roles, Formatting, and Compatibility ## Footnote These dimensions behave differently under Ollama, especially with quantized open-weight models.
373
What is the typical **context window** size for most LLMs?
* 2k * 4k * 8k * Up to 32k tokens ## Footnote In Ollama, prompt length is model-dependent and hardware-bound.
374
What happens if the prompt length exceeds the **memory ceiling** in Ollama?
* Context truncation * Slower token generation * Out-of-memory crashes ## Footnote Each additional token in the prompt consumes VRAM or RAM budget.
375
To optimize prompt size, what are some **compression techniques** recommended?
* Avoid verbose instructions * Replace boilerplate with reusable macros * Use numeric encoding ## Footnote Treat every token like a byte in a memory-constrained embedded system.
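A quick way to reason about the savings is a rough token estimator. The ~4-characters-per-token heuristic below is a crude approximation for English prose; real counts depend on the model's tokenizer:

```python
def approx_tokens(text):
    # Crude heuristic: ~4 characters per token for English prose.
    # Real counts depend on the model's BPE tokenizer.
    return max(1, len(text) // 4)

verbose = ("You are an extremely helpful, friendly, and knowledgeable "
           "assistant who always tries to give thorough answers.")
compressed = "Act as a concise, helpful assistant."

savings = approx_tokens(verbose) - approx_tokens(compressed)
```

Running an estimator like this over your templates makes the "every token is a byte" budgeting concrete before you hit the memory ceiling.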
376
True or false: The **repetition_penalty** in Ollama acts as a post-logit suppression factor.
TRUE ## Footnote It penalizes tokens that have already appeared in the output.
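The mechanism can be sketched directly. This is the CTRL-style penalty used by llama.cpp and Hugging Face samplers: logits of already-generated tokens are pushed toward "less likely" before sampling:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Suppress tokens that already appeared in the output."""
    out = list(logits)
    for tid in set(generated_ids):
        if out[tid] > 0:
            out[tid] /= penalty   # shrink positive logits
        else:
            out[tid] *= penalty   # push negative logits further down
    return out

penalized = apply_repetition_penalty([2.0, -1.0, 0.5],
                                     generated_ids=[0, 1],
                                     penalty=2.0)
```

Note the asymmetric treatment of positive and negative logits: both directions make the repeated token less probable.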
377
What is the recommended **range** for repetition penalties in most 7B or 13B models?
1.05 – 1.15 ## Footnote This range helps maintain coherence and avoid loops.
378
What is the **recommended temperature** range for creative writing in Ollama?
0.7 – 1.0 ## Footnote Lower temperatures (0.3–0.6) are better for factual responses.
379
What role-based prompt format was popularized by **OpenAI**?
* System * User * Assistant ## Footnote In the Ollama ecosystem, these roles may be ignored or require manual formatting.
380
What should you check in the **Modelfile** or documentation for Ollama models?
Preferred prompt format ## Footnote This ensures that the prompt is compatible with the model's training.
381
What is the significance of treating prompts as **dynamic interfaces**?
It allows for better interaction with local models and respects constraints imposed by quantization and memory ceilings ## Footnote This approach can yield high-quality results from local models.
382
What is the **architecture** of Mistral that has become popular among developers?
7B (7 billion parameters) ## Footnote Mistral's 7B architecture is favored for high performance on modest hardware.
383
What are the key factors to consider when optimizing **Mistral's throughput**?
* Token length * Compression strategies * Prompt design ## Footnote Understanding these factors is crucial for harnessing Mistral's full potential.
384
What is the maximum **token context window** supported by most 7B deployments?
4K to 8K tokens ## Footnote This limit is important for managing memory and compute budgets.
385
True or false: Larger prompts in Mistral have no impact on **latency**.
FALSE ## Footnote Input tokens are processed all at once, so larger prompts dramatically impact latency.
386
What are the **compression tactics** suggested for optimizing Mistral's performance?
* Instruction Prefix Folding * Bulleting vs Paragraphing * Avoid Fluffy Contextual Priming ## Footnote These techniques help save tokens and improve prompt efficiency.
387
Fill in the blank: To save tokens, compress verbose system prompts to ______.
Act as a concise, helpful assistant. Avoid repetition. ## Footnote This can save 40-60 tokens per interaction.
388
What is the benefit of using **reusable prompt scaffolds**?
They act as compressed templates to fill with user input dynamically ## Footnote This reduces redundancy across steps in workflows.
389
What happens to Mistral's coherence as prompts approach the edge of its **context window**?
* Chain-of-thought becomes brittle * Topic drift increases * More hallucinated transitions ## Footnote Best practice is to keep total tokens under 3,000 when possible.
390
What are the recommended settings for **token-aware sampling** in Mistral?
* Temperature: 0.3–0.6 * Top-p: 0.85–0.95 * Repeat Penalty: 1.15–1.25 * Max Tokens: 300–500 ## Footnote These settings help maintain readable and efficient outputs.
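The settings above map directly onto an Ollama request. A sketch of an `/api/generate` payload; the option names follow Ollama's Modelfile parameters (`temperature`, `top_p`, `repeat_penalty`, `num_predict`), and the prompt text is a placeholder:

```python
# Ollama /api/generate payload using the recommended sampling ranges.
payload = {
    "model": "mistral",
    "prompt": "Summarize in 3 bullets: <input text>",
    "stream": False,
    "options": {
        "temperature": 0.4,    # factual range: 0.3-0.6
        "top_p": 0.9,          # nucleus sampling: 0.85-0.95
        "repeat_penalty": 1.2, # loop avoidance: 1.15-1.25
        "num_predict": 400,    # cap output tokens: 300-500
    },
}
```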
391
What is a final tip for **model-aware prompting** with Mistral?
* Avoid complete sentences in setup * Use abbreviations and acronyms * Drop unnecessary articles ## Footnote This approach can enhance the model's performance.
392
What is the key takeaway regarding prompt design for Mistral 7B?
Prompt like every token counts ## Footnote Efficient prompt design unlocks the full power of Mistral without needing extensive GPU resources.
393
What is the key decision point for developers in **prompt engineering**?
How the model should interact with the external world ## Footnote This involves choices between empowering the model to call predefined tools or relying on command-prompt chaining techniques.
394
Define **Function Calling** in the context of LLMs.
A structured, schema-defined API call initiated by the model ## Footnote Inputs and outputs are validated against a contract, usually JSON schema.
395
What are **In-Context Commands**?
Natural language instructions embedded in a prompt ## Footnote These are interpreted and carried out by the model or downstream components.
396
List the benefits of **Function Calling**.
* Validation & Type Safety * Reliable API Orchestration * Minimal Prompt Injection Risk ## Footnote These benefits stem from structured inputs and external parsing.
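The "validation and type safety" benefit comes from checking the model's emitted call against a contract before executing anything. A stdlib-only sketch; the `get_weather` schema and field names are hypothetical, not any vendor's API:

```python
import json

# Hypothetical tool contract; the function name and fields are illustrative.
SCHEMA = {"name": "get_weather", "required": {"city": str, "unit": str}}

def validate_call(raw, schema):
    """Check a model-emitted function call against its contract."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if call.get("name") != schema["name"]:
        return False, "unknown function"
    args = call.get("arguments", {})
    for field, ftype in schema["required"].items():
        if not isinstance(args.get(field), ftype):
            return False, f"bad or missing argument: {field}"
    return True, "ok"
```

Production systems typically use full JSON Schema validators, but the gate is the same: malformed calls are rejected instead of executed, which is what keeps prompt-injection risk low.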
397
What are the constraints of **Function Calling**?
* Schema Design Burden * Limited Flexibility * Vendor Dependency ## Footnote Each tool requires its own contract, which can limit creative reasoning.
398
List the benefits of **In-Context Commands**.
* No Special API Required * Highly Flexible * Easier to Debug Prompt Logic ## Footnote These benefits make it suitable for prototyping and experimental workflows.
399
What are the constraints of **In-Context Commands**?
* Fragile Interpretation * Higher Hallucination Risk * No Built-In Validation ## Footnote These constraints can lead to issues with command parsing and accuracy.
400
How does **Function Calling** influence model thinking?
Encourages structured, discrete, atomic thinking ## Footnote This leads to easier monitoring for hallucinations and deterministic token generation.
401
How does **In-Context Commands** influence model thinking?
Encourages fluid, token-driven, narrative thinking ## Footnote This approach works better for creative flows but may result in higher token variance.
402
When should you use **Function Calling**?
* API-backed data pipelines * Safety-critical applications * Complex multi-tool orchestration ## Footnote These scenarios benefit from accuracy, validation, and atomicity.
403
When should you use **In-Context Commands**?
* Local/offline environments * Lightweight prototyping * Model creativity or soft goals ## Footnote These situations favor fluidity, creativity, and quick prototyping.
404
What is a **hybrid strategy** in prompt orchestration?
Combining in-context prompts with function calls ## Footnote This allows for decomposing tasks and handling atomic actions effectively.
405
What does the structure of LLM interactions influence?
The cognition simulated by the model ## Footnote Function calls produce deterministic outputs, while in-context commands enable creativity.
406
True or false: Function calling is preferred for fluidity and creativity.
FALSE ## Footnote Function calling is preferred when accuracy, validation, and atomicity are needed.
407
What is the **default mode** of working with LLMs?
Black Box ## Footnote In this mode, the internal workings of the model are hidden, leading to nondeterministic outputs.
408
List the **signs** that indicate you are in **Black Box Mode**.
* Relying on raw chat interfaces with no temperature adjustments * Using vague or underspecified prompts with no version control * Seeing output that shifts dramatically for small changes in phrasing ## Footnote These signs indicate a lack of visibility into the model's decision-making process.
409
What is a **key risk** of operating in **Black Box Mode**?
Nondeterministic outputs ## Footnote This can lead to brittle workflows and failures that are hard to debug or replicate.
410
What does a **White Box** approach involve?
Monitoring, controlling, and profiling the LLM ## Footnote True white-box access is mostly limited to researchers, but can be approximated through token-level visibility.
411
Name one technique for gaining **White Box Insight**.
* Prompt Tracing ## Footnote This involves breaking down prompts into segments and tracing their impact through controlled runs.
412
What is **Token Inspection** used for?
* Detecting when a model changes direction * Watching chain-of-thought formations in real time * Analyzing token frequency to compare completion paths ## Footnote Tools include streaming output and token-by-token logging.
413
What does **Temperature Sweeps** reveal?
Latent response diversity ## Footnote Varying the temperature while keeping prompts constant helps probe decision boundaries and surface alternate interpretations.
414
What is the **temperature** setting of 0.2 likely to produce?
Precise, dry output ## Footnote This setting has a low hallucination risk.
415
What is the **Grey Box** zone in LLM usage?
Partial visibility, some control, and tactical feedback loops ## Footnote Most real-world prompt engineering occurs in this zone.
416
What is a strategy for building **Grey Box Workflows**?
* Use Prompt Checkpoints * Test Hypotheses via Iterative Prompting * Probe for Latent Behavior * Log Everything ## Footnote These strategies help monitor and improve prompt effectiveness.
417
What should you do if a prompt fails?
Test theories ## Footnote Instead of random tweaks, analyze what might have gone wrong.
418
What is the **visibility** and **control** level in a **Black Box** mode?
Low visibility, low control ## Footnote This mode typically uses chat UIs and default playgrounds.
419
What is the **final takeaway** regarding LLM usage?
You don’t need full model access to work smarter ## Footnote Applying white-box strategies and careful hypothesis testing can enhance model transparency.
420
What is a **Model Behavior Matrix**?
A test harness that maps: * Model engine * Sampling configuration * Prompt archetype * Output characteristics ## Footnote It helps forecast output style and reliability before production deployment.
421
What are the **key components** of a Model Behavior Matrix?
* Model engine * Sampling configuration * Prompt archetype * Output characteristics ## Footnote These components allow for structured experimentation across different models.
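A test harness over these components is just a cross-product loop. In this sketch `run_model` is a stub standing in for an API call or local inference; a real harness would also capture latency, coherence scores, and hallucination flags per row:

```python
from itertools import product

def run_model(engine, config, prompt):
    # Stub: replace with your actual API or local-inference runner.
    return f"[{engine} T={config['temperature']}] {prompt[:24]}..."

def build_matrix(engines, configs, prompts):
    """Evaluate every engine x sampling-config x prompt combination."""
    rows = []
    for engine, config, prompt in product(engines, configs, prompts):
        rows.append({
            "engine": engine,
            "temperature": config["temperature"],
            "prompt_type": prompt["type"],
            "output": run_model(engine, config, prompt["text"]),
        })
    return rows

matrix = build_matrix(
    engines=["gpt-4o", "mistral-7b"],
    configs=[{"temperature": 0.2}, {"temperature": 0.9}],
    prompts=[{"type": "factual", "text": "Capital of France?"},
             {"type": "json", "text": 'Emit {"ok": true}'}],
)
```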
422
True or false: The same prompt produces the **same outputs** across different models.
FALSE ## Footnote Different models vary in architecture, training corpus, alignment techniques, and decoding strategies.
423
What does **temperature** control in model behavior?
Entropy control ## Footnote Higher temperature increases randomness, useful for creativity but dangerous for logic.
424
How does **top-k** function in model outputs?
Limits token choices to the top-k most likely options ## Footnote Its impact varies across different models and APIs.
425
What are the **prompt types** defined for testing?
* Factual Q&A * Chain-of-thought reasoning * JSON completion * Creative narrative * Multi-turn conversation ## Footnote These types help in structuring the tests for the behavior matrix.
426
What is the recommended **temperature** range for production to ensure determinism?
Less than 0.3 ## Footnote This minimizes surprises and ensures latency predictability.
427
What should be avoided in prompts for **Ollama models** like Mistral?
Long prompts that push against memory constraints ## Footnote These can lead to truncation; shorter prompt templates are preferred.
428
What is the **core problem** in prompt engineering across different models?
Same prompt, different outputs ## Footnote Variations in model architecture and parameter interpretation complicate reasoning about behavior.
429
What is the effect of **high temperature** on model outputs?
Increases randomness ## Footnote This can enhance creativity but may compromise logical coherence.
430
What is a key observation about **GPT-4o**?
Sensitive to both temperature and top-p ## Footnote Generally stable with large models retaining internal coherence even at higher entropy.
431
What is the purpose of **running tests** in the behavior matrix?
To evaluate each prompt-parameter-model combination ## Footnote This involves using API logging or local inference tracing.
432
What should be favored in **prototyping** for model behavior?
Expressive, creative behaviors ## Footnote High temperature and varied prompt styles are ideal for discovering model capabilities.
433
What is the significance of **coherence score** in the behavior matrix?
Measures the logical consistency of outputs ## Footnote Can be assessed manually or through heuristic methods.
434
What does **hallucination flags** refer to in model outputs?
Indicators of factual inaccuracies ## Footnote These flags help in assessing the reliability of generated content.
435
What is **Ollama** known for in the context of local large language models?
A go-to platform for developers to run sophisticated models like LLaMA, Mistral, or Phi on their own hardware ## Footnote Ollama emphasizes the importance of how to prompt models effectively.
436
What are **reusable prompt modules**?
* Structured, parameterized text blocks * Can be dropped into workflows * Switched between variants * Refined through feedback loops ## Footnote They enhance the efficiency of repeated tasks.
437
Why is **reusability** important in local prompting?
* Consistent results across sessions * Rapid iteration of task formats * Composable prompt chains ## Footnote Local LLMs reset between calls unless explicitly cached.
438
What is a **prompt template**?
A structured string with variable placeholders ## Footnote At minimum, it includes user input and a system instruction.
439
What should be included in a **prompt template** at minimum?
* System instruction * User input ## Footnote As tasks grow in complexity, modularity should also increase.
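A minimal template meeting that bar: a system instruction, a user-input slot, and one task parameter. The section headers and slot names are illustrative conventions, not a fixed format:

```python
# Minimal parameterized template: system instruction + user input,
# plus one task parameter (n_bullets).
SUMMARIZE_V1 = (
    "### System\n{system}\n\n"
    "### Task\nSummarize the input in {n_bullets} bullet points.\n\n"
    "### Input\n{user_input}\n"
)

def render(template, **slots):
    """Fill a template's placeholders with concrete values."""
    return template.format(**slots)

prompt = render(
    SUMMARIZE_V1,
    system="Act as a concise, helpful assistant.",
    n_bullets=3,
    user_input="Ollama serves quantized models over a local API.",
)
```

Keeping templates as named constants (`SUMMARIZE_V1`, `SUMMARIZE_V2`, ...) is what later makes variant naming and versioning trivial.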
440
What are some **best practices** for Ollama-compatible templates?
* Use explicit instructions * Use clear section headers * Keep variable slots token-efficient ## Footnote These practices enhance the effectiveness of local models.
441
What is the purpose of **prompt variants**?
* Use different tones * Invoke different behaviors * Handle different inputs ## Footnote A single template is rarely sufficient; multiple variants are often needed.
442
How should you **name and version** prompt variants?
Give each variant a name that encodes its structure or goal ## Footnote Examples include summarize_v1_plain, summarize_v2_outline.
443
What is **prompt mutation**?
Intentional changes to prompts to explore different outputs ## Footnote This includes changing section labels or adding motivational frames.
444
What is the purpose of **feedback-based refinement** in prompting?
* Capture user reactions * Measure task success metrics * Track prompt lineage ## Footnote This helps in identifying high-performing prompt variants.
445
What are some **performance constraints** unique to Ollama?
* Prompt size consumes RAM * Larger models require aggressive prompt compression * Default --system flag doesn't persist roles ## Footnote Keep modules under 1000 tokens unless using large-context models.
446
What are the three components that create a **prompt ecosystem**?
* Prompt Templates * Variants * Feedback Loops ## Footnote Together, these elements help in building a local LLM system that improves over time.