What are the two primary trade-offs associated with using larger Foundation Models (more parameters)?
They capture more complex patterns but require more computational resources (cost) and have higher latency.
Which model selection criterion ensures a model can handle long documents for tasks like summarization without truncation?
Input Length (or Context Window).
If a real-time application like a customer service chatbot requires faster responses, which Bedrock runtime API parameter should be adjusted?
Set the latency parameter (in the request's performance configuration) to “optimized”.
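As a sketch: the setting above maps to the `performanceConfig` block of a Bedrock Runtime Converse request. The model ID below is illustrative; this only builds the request arguments, so no AWS call is made.

```python
def build_converse_request(model_id: str, prompt: str) -> dict:
    """Keyword arguments for bedrock_runtime.converse()."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        # "optimized" trades some cost for faster responses;
        # "standard" is the default.
        "performanceConfig": {"latency": "optimized"},
    }

# Usage (requires AWS credentials and boto3):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**build_converse_request(
#       "anthropic.claude-3-5-haiku-20241022-v1:0", "Hello"))
```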
What is the effect of setting a low temperature value (e.g., near 0) during inference?
The output becomes more deterministic, focused, and consistent (steep probability distribution).
What is the effect of setting a high temperature value during inference?
The output becomes more random, creative, and diverse (flat probability distribution).
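A minimal demonstration of why temperature changes the shape of the distribution: dividing the logits by the temperature before the softmax makes the distribution steep (low T) or flat (high T).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax.
    Low temperature -> steep, near-deterministic distribution;
    high temperature -> flat, near-uniform distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.1)   # almost all mass on token 0
hot = softmax_with_temperature(logits, 20.0)   # close to uniform
```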
Which inference parameter allows you to control the verbosity or conciseness of the generated response?
Output Length (Max Tokens).
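These inference parameters sit together in the `inferenceConfig` block of a Converse request; the defaults below are illustrative, not prescribed values.

```python
def build_inference_config(max_tokens: int,
                           temperature: float = 0.7,
                           top_p: float = 0.9) -> dict:
    """inferenceConfig block for the Bedrock Converse API.
    A small maxTokens forces concise answers; a large one permits
    verbose output (generation may still stop earlier on its own)."""
    return {
        "maxTokens": max_tokens,
        "temperature": temperature,
        "topP": top_p,
    }
```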
What is Retrieval-Augmented Generation (RAG)?
A method to improve model accuracy by retrieving relevant data from external knowledge bases and adding it to the context before generation.
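RAG in miniature, with a toy word-overlap retriever standing in for a real vector search: retrieved passages are prepended to the prompt so the model answers from the supplied context.

```python
def retrieve(query, knowledge_base, top_k=2):
    """Toy retriever: rank documents by shared words with the query.
    A real RAG system would use embedding similarity search instead."""
    q = set(query.lower().split())
    scored = sorted(knowledge_base,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_augmented_prompt(query, knowledge_base):
    """Add retrieved context to the prompt before generation."""
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Context:\n{context}\n\nQuestion: {query}"
```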
What are “Embeddings” in the context of RAG?
Numerical representations (vectors) of text/data that capture semantic meaning.
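Embeddings are typically produced by a dedicated embedding model. A sketch of the request for Amazon Titan Text Embeddings via `invoke_model` (the model ID and body shape are assumptions; check the model's documentation):

```python
import json

def build_titan_embedding_request(text: str) -> dict:
    """Request arguments for bedrock_runtime.invoke_model() against
    a Titan text-embedding model (IDs/shapes are assumptions)."""
    return {
        "modelId": "amazon.titan-embed-text-v2:0",
        "body": json.dumps({"inputText": text}),
    }

# Usage (requires AWS credentials and boto3):
#   client = boto3.client("bedrock-runtime")
#   resp = client.invoke_model(**build_titan_embedding_request("hello"))
#   vector = json.loads(resp["body"].read())["embedding"]
```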
What algorithm is typically used by vector databases to find relevant embeddings for a user query?
k-nearest neighbor (k-NN) similarity search.
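The idea behind k-NN over embeddings, shown as an exact linear scan with cosine similarity (production vector databases use approximate k-NN indexes such as HNSW instead):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query_vec, index, k=2):
    """Return the k stored documents most similar to the query vector."""
    scored = sorted(index.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [doc for doc, _ in scored[:k]]
```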
Which AWS service supports vector search using the pgvector extension?
Amazon Aurora PostgreSQL (or Amazon RDS for PostgreSQL).
Which AWS Graph Database service supports vector indexing via Neptune Analytics?
Amazon Neptune.
Which AWS document database (MongoDB compatible) supports storing and searching vector embeddings?
Amazon DocumentDB.
What is the primary cost advantage of “In-Context Learning” over “Fine-Tuning”?
It has lower initial costs because it requires no model training or weight updates.
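In-context learning can be as simple as embedding labeled examples directly in the prompt (few-shot prompting), as sketched below; the prompt format is illustrative.

```python
def build_few_shot_prompt(examples, query):
    """In-context learning: labeled examples go into the prompt
    itself, so no training or weight updates are required."""
    shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\nInput: {query}\nOutput:"
```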
Comparing RAG and Fine-Tuning, which method typically offers lower latency during the inference phase?
Fine-Tuning (because it avoids the external retrieval step).
What is “Continued Pre-training”?
Training a foundation model further on large amounts of unlabeled domain-specific data to improve general domain knowledge.
What are “Agents” used for in Generative AI applications?
To orchestrate multi-step tasks, break down complex problems, and interact with external APIs/systems.
In a multi-agent system, what is the role of a “Supervisor Agent”?
To coordinate and orchestrate the efforts of specialized agents to ensure tasks are executed in the correct sequence.
What is “Metadata Filtering” in a RAG application?
Refining search results based on custom attributes (tags) to reduce noise and retrieve more relevant context.
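A sketch of metadata filtering with Bedrock Knowledge Bases: the `retrievalConfiguration` for a Retrieve call can carry a filter on document metadata. The attribute name `department` is illustrative, and only the request structure is built here.

```python
def build_kb_retrieval_config(department: str) -> dict:
    """retrievalConfiguration for a Knowledge Bases Retrieve call,
    restricting results to documents whose 'department' metadata tag
    matches (field name is a hypothetical example)."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "filter": {
                "equals": {"key": "department", "value": department}
            },
        }
    }
```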
Which customization method is best for adapting a model to a specific task using a labeled dataset?
Fine-tuning.
What are the four steps of the Amazon OpenSearch vector search process?
Configure permissions, Create collection, Upload data (embeddings), Perform similarity search.
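The final step above (similarity search) uses OpenSearch's k-NN query type. A sketch of the query body, assuming an index whose mapping defines a k-NN vector field; the field name `embedding` is an assumption.

```python
def build_knn_query(vector, k=5, field="embedding"):
    """OpenSearch k-NN query body for the similarity-search step.
    Assumes the index mapping declares `field` as a knn_vector."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
    }

# Usage (requires an OpenSearch client, e.g. opensearch-py):
#   client.search(index="my-vectors", body=build_knn_query(query_vec))
```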