A portion of the training data is labeled, and feedback is provided in the form of rewards or penalties. What type of learning is this?
Reinforcement learning
What are the two types of inferencing?
Batch and Real time
Which two use cases is deep learning commonly used for?
Computer vision and NLP
What are FMs in generative AI?
Foundation models: large pretrained models that can be adapted to a wide range of tasks
What are Transformer models?
They build on the encoder-decoder concept in generative AI and use self-attention to process input data. Self-attention allows the model to weigh the importance of different words in a sentence when encoding a particular word
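The self-attention mechanism described above can be sketched as scaled dot-product attention. This is a minimal NumPy illustration with toy random weights, not code from any real transformer implementation:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Each row of x is a token embedding; returns one context-aware vector per token."""
    q = x @ w_q  # queries
    k = x @ w_k  # keys
    v = x @ w_v  # values
    d_k = q.shape[-1]
    # score[i, j] = how much token i attends to token j
    scores = q @ k.T / np.sqrt(d_k)
    # softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens, embedding dimension 8
w = [rng.normal(size=(8, 8)) for _ in range(3)]
out = self_attention(x, *w)
print(out.shape)  # (4, 8)
```

Each output row is a weighted mix of all token values, which is how a word's encoding comes to depend on the other words in the sentence.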
Are FMs pre-trained using reinforcement learning? True or False?
False. FMs are typically pre-trained through self-supervised learning
Where are pretext tasks used?
In self-supervised learning
Self-supervised learning makes use of the structure within the data to autogenerate labels. True or False?
True
Optimization of pre-trained FMs is done using what?
Prompt engineering,
Retrieval-augmented generation (RAG),
Fine-tuning on task-specific data
LLMs, diffusion models, and multimodal models are examples of what?
They are types of foundation models (FMs)
These are numerical representations of tokens, where each token is assigned a vector (a list of numbers) that captures its meaning and relationships with other tokens?
Embeddings
What is a context window?
The maximum number of tokens an LLM can take as input when generating text
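A minimal sketch of what a context window implies in practice: when the input exceeds the window, older tokens must be dropped. The token IDs and window size here are made up for illustration:

```python
def fit_to_context(tokens, context_window):
    """Keep only the most recent tokens that fit within the context window."""
    if len(tokens) <= context_window:
        return tokens
    # drop the oldest tokens; keep the last `context_window` of them
    return tokens[-context_window:]

tokens = list(range(10))           # pretend token IDs
print(fit_to_context(tokens, 4))   # [6, 7, 8, 9]
```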
What is a vector?
It is an array of numerical values
What is the process of vectorization?
Text -> [Tokenization]->Tokens -> [Embeddings Model] -> Vectors
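The pipeline above can be sketched with a toy whitespace tokenizer and a stand-in embedding table. A real embeddings model learns its vectors during training; here they are just fixed random values for illustration:

```python
import numpy as np

def tokenize(text):
    """Naive whitespace tokenizer (real tokenizers use subword units)."""
    return text.lower().split()

rng = np.random.default_rng(42)
vocab = {}  # token -> vector; a stand-in for a trained embeddings model

def embed(token, dim=4):
    """Assign each token a fixed random vector the first time it is seen."""
    if token not in vocab:
        vocab[token] = rng.normal(size=dim)
    return vocab[token]

# Text -> [Tokenization] -> Tokens -> [Embeddings Model] -> Vectors
text = "Vectors capture meaning"
tokens = tokenize(text)
vectors = [embed(t) for t in tokens]
print(tokens)                          # ['vectors', 'capture', 'meaning']
print(len(vectors), vectors[0].shape)  # 3 (4,)
```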
What is the process of vectorization in Bedrock KBs using RAG?
Customer KB->[Upload in Amazon S3]->[Select a vector DB]->[Select a Model]->[Sync with customer KB]->Vectorization of Customer KB text
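After the sync step above, queries are answered by embedding the question and searching the vector DB for the closest chunks. This is a generic cosine-similarity sketch of that retrieval step, with toy hand-written embeddings rather than a real Bedrock embeddings model or vector store:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these vectors came from an embeddings model during the KB sync
kb = {
    "refund policy":  np.array([0.9, 0.1, 0.0]),
    "shipping times": np.array([0.1, 0.9, 0.2]),
    "contact info":   np.array([0.0, 0.2, 0.9]),
}

query_vec = np.array([0.8, 0.2, 0.1])  # embedding of the user's question
best = max(kb, key=lambda chunk: cosine(query_vec, kb[chunk]))
print(best)  # refund policy
```

The retrieved chunk is then appended to the prompt, which is what makes the generation "retrieval-augmented".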
What is Watermark detection for Amazon Bedrock?
It identifies images generated by Amazon Titan Image Generator, a foundation model that allows users to create realistic, studio-quality images in large volumes and at low cost, using natural language prompts
What is continued pretraining in Amazon Bedrock?
You provide unlabeled data to pre-train a model by familiarizing it with certain types of inputs
Which models start from noise and gradually add more and more meaningful information until they end up with a clear and coherent output, such as an image or a piece of text?
Diffusion model
Which model has generator and discriminator?
Generative adversarial networks
Which model has encoders and decoders?
Variational autoencoders
What are the components of prompt engineering?
Instructions, Context, Input data and Output indicator
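The four components above can be assembled into a single prompt string. The wording of each component below is illustrative, not a prescribed template:

```python
def build_prompt(instructions, context, input_data, output_indicator):
    """Join the four prompt-engineering components into one prompt."""
    return "\n\n".join([instructions, context, input_data, output_indicator])

prompt = build_prompt(
    instructions="Summarize the review in one sentence.",      # what to do
    context="The review is for a wireless keyboard.",          # background
    input_data="Review: Great battery life, but mushy keys.",  # data to act on
    output_indicator="Summary:",                               # desired output format
)
print(prompt)
```

The output indicator at the end cues the model to begin its answer in the expected format.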
What are non-deterministic LLMs popularly called?
Generative Language Models
What is the supervised learning process that involves taking a pre-trained model and further training it on smaller, task-specific datasets?
Fine tuning
What are the two types of fine-tuning?
Instruction fine-tuning and Reinforcement learning from human feedback (RLHF)