A retail company is exploring machine learning algorithms to improve its customer segmentation systems. The data science team is evaluating both K-Means and K-Nearest Neighbors (KNN) algorithms but needs to understand the key differences between them, since understanding these distinctions will help the team choose the right algorithm for their specific tasks.
Given this context, what do you recommend to the company?
A. K-Means is primarily used for regression tasks, while KNN is used for reducing the dimensionality of data
B. K-Means is a supervised learning algorithm used for classification, while KNN is an unsupervised learning algorithm used for clustering
C. K-Means requires labeled data to form clusters, whereas KNN does not use labeled data for making predictions
D. K-Means is an unsupervised learning algorithm used for clustering data points into groups, while KNN is a supervised learning algorithm used for classifying data points based on their proximity to labeled examples
D
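To make the distinction in option D concrete, here is a minimal standard-library sketch, not a production implementation. The customer features, the segment labels, and the naive "first k points" initialization are all assumptions made for illustration. K-Means receives only unlabeled points, while KNN needs labeled examples to vote.

```python
from collections import Counter
from math import dist


def kmeans(points, k, iterations=10):
    """Unsupervised: group unlabeled points into k clusters."""
    centroids = points[:k]  # naive initialization (assumption): the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters


def knn_predict(labeled, query, k=3):
    """Supervised: label a query by majority vote of its k nearest labeled examples."""
    neighbors = sorted(labeled, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


# Hypothetical customers as (monthly_visits, average_basket) pairs, no labels.
customers = [(1, 20), (9, 95), (2, 25), (10, 100), (1, 22), (8, 90)]
segments = kmeans(customers, k=2)

# The same customers with hypothetical, human-assigned labels.
labeled_customers = [
    ((1, 20), "occasional"), ((2, 25), "occasional"), ((1, 22), "occasional"),
    ((9, 95), "loyal"), ((10, 100), "loyal"), ((8, 90), "loyal"),
]
prediction = knn_predict(labeled_customers, (9, 92))
```

Here `segments` holds two groups discovered without any labels, and `prediction` is a label borrowed from the nearest labeled customers.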
A retail company is exploring machine learning to improve customer segmentation and discover hidden patterns in sales data. The data science team is particularly interested in using unsupervised learning to analyze large volumes of unlabeled customer and product data to identify trends and groupings without predefined categories. To determine the best approach, they need to understand which methods fall under unsupervised learning.
Which of the following would you suggest to the company as examples of unsupervised learning? (Select two)
A. Clustering
B. Decision tree
C. Neural network
D. Dimensionality reduction
E. Sentiment analysis
A and D
A financial services company is developing a machine learning model to predict credit risk. During the model evaluation, the data science team notices that the model performs exceptionally well on the training data but struggles with new, unseen data, indicating overfitting. To address this issue, the team needs to identify the root cause of overfitting.
What would you recommend to the team?
A. Overfitting occurs when the model is using fewer feature combinations
B. Overfitting occurs when the model ignores the training data and makes predictions based on pre-defined rules
C. Overfitting occurs when the model is not updated frequently enough with new data, leading to outdated patterns
D. Overfitting occurs when the model is overly complex and captures noise or random fluctuations in the training data rather than the underlying patterns
D
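The behaviour described in option D can be reproduced on a toy dataset. This is a sketch under stated assumptions: the single feature (a credit utilization ratio), the data points, and the two deliberately mislabeled training rows are invented for illustration. The "memorizer" is a 1-nearest-neighbor lookup that fits every training label, including the noise; the threshold rule is a much simpler model.

```python
def accuracy(model, data):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in data) / len(data)


# Hypothetical (utilization_ratio, defaulted) rows. The true pattern is
# "default when ratio > 0.5"; the rows at 0.3 and 0.7 carry noisy labels.
train = [(0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0),
         (0.6, 1), (0.7, 0), (0.8, 1), (0.9, 1)]
test = [(0.15, 0), (0.28, 0), (0.32, 0), (0.45, 0),
        (0.55, 1), (0.68, 1), (0.72, 1), (0.85, 1)]


def memorizer(x):
    """Overly flexible model: copies the label of the closest training row."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def threshold_rule(x):
    """Simple model: captures only the underlying pattern."""
    return int(x > 0.5)


for name, model in [("memorizer", memorizer), ("threshold", threshold_rule)]:
    print(name, accuracy(model, train), accuracy(model, test))
```

On this data the memorizer scores 1.0 on the training rows but 0.5 on the unseen rows, while the threshold rule scores 0.75 and 1.0: the gap between training and test accuracy is the signature of overfitting.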
A video streaming company is developing machine learning models to recommend content and analyze user interactions. The data science team needs to understand the specific capabilities of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
Which of the following would you suggest to the team?
A. While RNNs are used for single image analysis, CNNs are used for video analysis
B. Both RNNs and CNNs are used for single image analysis
C. Both RNNs and CNNs are used for video analysis
D. While CNNs are used for single image analysis, RNNs are used for video analysis
D
A technology company is planning to implement machine learning to improve its product recommendation system and optimize supply chain management. The data science team is evaluating different types of machine learning approaches. Gaining a clear understanding of these types will help them choose the right strategy for model development.
Which of the following options would you suggest to the team as the three main types of machine learning?
A. Transfer Learning, Semi-supervised Learning, Self-supervised Learning
B. Deep Learning, Self-supervised Learning, Reinforcement Learning
C. Reinforcement Learning, Transfer Learning, Semi-supervised Learning
D. Supervised Learning, Unsupervised Learning, Reinforcement Learning
D
A healthcare company is considering using Amazon Bedrock to develop AI solutions that handle sensitive patient data, such as medical records and diagnostic information. Given the strict regulatory requirements in healthcare, the company needs to ensure that Amazon Bedrock provides robust data security and compliance features. The company is evaluating the platform’s capabilities to safeguard data and meet compliance standards like HIPAA.
Which of the following is correct regarding the data security and compliance aspects of Amazon Bedrock for the given use case?
A. The company’s data is only used to improve the base Foundation Models (FMs); however, it is not shared with any model providers
B. The company’s data is used to improve the base Foundation Models (FMs) and it is also shared with the model providers for model optimization
C. The company’s data is not used to improve the base Foundation Models (FMs), however, it is shared with the model providers for model optimization
D. The company’s data is not used to improve the base Foundation Models (FMs) and it is not shared with any model providers
D
A media analytics company utilizes Amazon Bedrock to run inferences with its generative AI models to analyze large volumes of user-generated content and provide insights to its clients. The company frequently processes numerous inference requests and is looking for a way to minimize the costs associated with running these inferences while still maintaining the required level of service. Given that the company can tolerate some delays in receiving responses, it seeks a cost-effective inference method that optimizes resource usage without sacrificing too much on turnaround time.
Which inference approach would be the most suitable for the company to use in order to reduce its overall inference costs?
A. The company should use batch inference, thereby allowing it to run multiple inference requests in a single batch
B. The company should use real-time inference, which is designed for low-latency responses and continuous, immediate processing
C. The company should use on-demand inference, which allows the company to pay only for the resources consumed during each inference
D. The company should use serverless inference, which automatically scales resources based on traffic
A
Which security discipline in the Generative AI Security Scoping Matrix focuses on identifying potential threats to generative AI solutions and recommending mitigations?
A. Risk management
B. Legal and privacy
C. Governance and compliance
D. Resilience
A
A company is using an Amazon Bedrock-based foundation model in a Retrieval Augmented Generation (RAG) configuration to provide tailored insights and responses based on client data stored in Amazon S3. Each team within the company is assigned to different clients and uses the foundation model to generate insights specific to its clients’ data. To maintain data privacy and security, the company needs to ensure that each team can only access the model responses generated from the data of its own clients, preventing any unauthorized access to other teams’ client data.
What is the most effective approach to implement this access control and maintain data security?
A. The company should create a service role for Amazon Bedrock for each team, granting access only to the specific team’s client data in Amazon S3
B. The company should configure S3 bucket policies to allow access to all teams but monitor usage through AWS CloudTrail logs to detect any unauthorized access
C. The company should create a single IAM policy that grants read-only access to all S3 buckets for all teams
D. The company should create a single role for Amazon Bedrock with full access to Amazon S3 and then create separate IAM roles for each team that are limited to each team’s client data
A
You are working as an NLP engineer at a tech company tasked with building an advanced text summarization tool to help customers generate concise summaries of lengthy documents. After successfully training your model, your manager asks you to evaluate its performance and quality to ensure it meets the required standards for deployment.
Considering the nature of text summarization, which evaluation method would be most appropriate to assess the model’s output effectively?
A. Conduct code review to analyze the implementation code for logical or structural issues
B. Use automated testing via metrics like ROUGE or BLEU to measure the similarity between generated and reference summaries
C. Leverage human evaluation to assess the quality of summaries
D. Test the model on established benchmark datasets to evaluate performance
C
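For context on option B, the sketch below shows roughly what a word-overlap metric measures. It is a simplified ROUGE-1 recall using whitespace tokenization, which is an assumption; real ROUGE implementations add stemming and further variants. The example sentences are invented.

```python
from collections import Counter


def rouge1_recall(reference, candidate):
    """Share of reference unigrams that also appear in the candidate."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    if not ref:
        return 0.0
    overlap = sum((ref & cand).values())  # clipped unigram matches
    return overlap / sum(ref.values())


reference = "the report shows sales grew in europe"
good_summary = "sales grew in europe"
scrambled = "europe in grew sales the shows report"

good_score = rouge1_recall(reference, good_summary)
scrambled_score = rouge1_recall(reference, scrambled)
```

The scrambled sentence gets a perfect recall of 1.0 even though it is unreadable, while the fluent summary gets 4/7. Overlap metrics say nothing about coherence or faithfulness, which is the kind of quality a human reviewer can judge.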
A traffic monitoring application needs to detect license plate numbers for the vehicles that pass a certain location from 11 PM to 7 AM every day.
Which ML-powered AWS service is the right fit for this requirement?
A. Amazon Rekognition
B. Amazon Textract
C. Amazon SageMaker image classification algorithm
D. Amazon SageMaker JumpStart
A
Which of the following represents the CORRECT statement regarding Amazon SageMaker Model Cards?
A. Describes how a model should be used in a production environment
B. Model Cards can be customized to meet the business needs
C. The purpose of a Model Card is to describe the technical requirements under which an ML model should be deployed
D. Model cards cannot be created for models not trained on Amazon SageMaker
A
A large e-commerce company uses a large language model (LLM) to assist its customer service agents by generating responses to customer queries. However, the company is concerned about prompt engineering attacks, where malicious users craft inputs to manipulate the LLM into producing incorrect or harmful responses.
What is the best approach to mitigate this issue?
A. Disable user-generated inputs and rely solely on internal data for prompts
B. Create a prompt template that teaches the LLM to detect attack patterns
C. Monitor the length of the prompts to prevent overly long inputs
D. Restrict LLM output to a fixed set of predefined responses
B
A healthcare analytics company is developing machine learning models to predict patient outcomes and improve treatment plans. To enhance these models, conduct rigorous testing, and ensure data privacy when sharing with partners, the company needs to generate synthetic data that closely mirrors its existing patient records without compromising sensitive information. The synthetic data must maintain the statistical properties and patterns of the original dataset to be useful for model training and evaluation.
Which of the following methods would be most suitable for generating synthetic data?
A. The company should use Support Vector Machines (SVMs), a type of supervised learning algorithm used for generating synthetic data
B. The company should use WaveNet, a deep generative model specifically designed for generating general-purpose synthetic data across various domains
C. The company should use a Generative Adversarial Network (GAN) for creating realistic synthetic data while preserving the statistical properties of the original data
D. The company should use a Convolutional Neural Network (CNN), a type of deep learning model, best-suited for generating synthetic data
C
A technology company is exploring AWS DeepRacer to introduce its employees to machine learning through an engaging and hands-on platform. The team wants to understand the key features and capabilities of AWS DeepRacer.
Which of the following represents the CORRECT statement about AWS DeepRacer?
A. You need an AWS DeepRacer car to use the AWS DeepRacer simulator
B. The AWS DeepRacer vehicle is a Wi-Fi enabled, physical vehicle that can drive itself on a physical track
C. AWS DeepRacer vehicle is only a virtual vehicle running on AWS DeepRacer simulator
D. AWS DeepRacer car is based on a model that uses a supervised learning ML algorithm
B
A company is creating a custom search solution that will bring together the company’s data repositories, FAQs, and support tickets. The support tickets might contain personally identifiable information (PII) that needs to be redacted before the tickets are processed to create the search indexes.
Which AWS service will help you redact the PII in support tickets?
A. Amazon Comprehend
B. Amazon Kendra
C. Amazon Lex
D. Amazon Textract
A
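As a rough illustration of the redaction step, the sketch below applies PII spans to a ticket before indexing. Amazon Comprehend's DetectPiiEntities operation returns entities with `Type`, `BeginOffset`, and `EndOffset` fields; the ticket text, the hand-written sample entities, and the `redact` helper are hypothetical, so the block runs without AWS credentials.

```python
def redact(text, entities):
    """Replace each PII span with its entity type in brackets.

    Spans are processed right to left so that earlier offsets stay valid
    after each replacement.
    """
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[:e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text


ticket = "Customer Jane Doe (jane@example.com) cannot log in."

# Hypothetical entities, hand-written in the shape of a DetectPiiEntities
# response. With credentials configured, live results could come from:
#   boto3.client("comprehend").detect_pii_entities(
#       Text=ticket, LanguageCode="en")["Entities"]
sample_entities = [
    {"Type": "NAME", "BeginOffset": 9, "EndOffset": 17, "Score": 0.99},
    {"Type": "EMAIL", "BeginOffset": 19, "EndOffset": 35, "Score": 0.99},
]

redacted = redact(ticket, sample_entities)
```

With the sample entities above, `redacted` is "Customer [NAME] ([EMAIL]) cannot log in.", which is the form that would then be passed on for indexing.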