A healthcare company is developing a machine learning model to analyze medical images and patient records to assist with diagnostics. The team has access to a large amount of unlabeled data and a smaller set of labeled data, and they are considering using semi-supervised learning to maximize the utility of both datasets. To make an informed decision on the approach, the data science team wants to understand which methods fall under semi-supervised learning.
Which of the following are examples of semi-supervised learning? (Select two)
A. Clustering
B. Dimensionality reduction
C. Fraud identification
D. Neural network
E. Sentiment analysis
C and E

Fraud identification and sentiment analysis are common semi-supervised learning use cases: typically only a small share of the data (confirmed fraud cases, hand-labeled reviews) is labeled, so a model is first trained on the labeled subset and then improved using the much larger unlabeled portion.

Incorrect options:
Neural network - A neural network is a more complex supervised learning technique. It takes a set of inputs and passes them through one or more layers of mathematical transformations, adjusting weights during training to produce a desired output. An example of a neural network technique is predicting a digit from a handwritten image.
Clustering - Clustering is an unsupervised learning technique that groups similar data inputs so they can be treated as a category. There are various types of clustering algorithms depending on the input data. An example of clustering is grouping different types of network traffic to predict potential security incidents.
Dimensionality reduction - Dimensionality reduction is an unsupervised learning technique that reduces the number of features in a dataset. It’s often used to preprocess data for other machine learning functions and reduce complexity and overheads. For example, it may blur out or crop background features in an image recognition application.
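The idea behind the correct options can be sketched in code. Below is a minimal self-training loop in plain NumPy, assuming made-up two-cluster data and a simple nearest-centroid classifier: a handful of labeled points seed the model, and high-confidence predictions on the unlabeled pool are promoted to pseudo-labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two clusters of points; only 5 points in each cluster carry labels.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
y_true = np.array([0] * 100 + [1] * 100)
labels = np.full(200, -1)            # -1 marks "unlabeled"
labels[:5], labels[100:105] = 0, 1   # the small labeled set

# Self-training: fit class centroids on the labeled data, pseudo-label
# the most confident unlabeled points, and repeat.
for _ in range(10):
    centroids = np.array([X[labels == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    pred = dists.argmin(axis=1)
    confidence = np.abs(dists[:, 0] - dists[:, 1])
    idx = np.flatnonzero(labels == -1)
    if idx.size == 0:
        break
    # promote the 20 most confident unlabeled points to pseudo-labels
    best = idx[np.argsort(-confidence[idx])[:20]]
    labels[best] = pred[best]

accuracy = (pred == y_true).mean()
print(f"accuracy: {accuracy:.2f}")
```

A fraud or sentiment pipeline would replace the centroid classifier with a real model, but the labeled-seed-plus-unlabeled-pool structure is the same.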
The development team at a company needs to select the most appropriate large language model (LLM) for the company’s flagship application. Given the vast array of LLMs available, the team is uncertain about the best choice. Additionally, since the application will be publicly accessible, the team has concerns about the possibility of generating harmful or inappropriate content.
Which AWS solutions should the team implement to address both the selection of the appropriate model and the mitigation of harmful content generation? (Select two).
A. Guardrails for Amazon Bedrock
B. Amazon Comprehend
C. Model Evaluation on Amazon Bedrock
D. Amazon SageMaker Clarify
E. Amazon SageMaker Model Monitor
A and C

Model Evaluation on Amazon Bedrock lets the team evaluate, compare, and select foundation models for their use case, using automatic evaluations or human review. Guardrails for Amazon Bedrock adds configurable safeguards that filter harmful or inappropriate content, addressing the risks of a publicly accessible application.
A healthcare company is evaluating the use of Foundation Models (FMs) in generative AI to automate tasks such as medical report generation, data analysis, and personalized patient communications. The company’s data science team wants to better understand the key features and benefits of Foundation Models, particularly how they can be applied to various tasks with minimal fine-tuning and customization. To ensure they choose the right model for their needs, the team is seeking to clarify the essential characteristics of FMs in generative AI.
Which of the following is correct regarding Foundation Models (FMs) in the context of generative AI?
A. FMs use unlabeled training data sets for supervised learning
B. FMs use unlabeled training data sets for self-supervised learning
C. FMs use labeled training data sets for self-supervised learning
D. FMs use labeled training data sets for supervised learning
B

FMs use unlabeled training data sets for self-supervised learning

Foundation Models are pretrained on vast amounts of unlabeled data using self-supervised learning, in which the training labels are derived from the data itself (for example, predicting the next word in a sentence). This broad pretraining is what allows FMs to be adapted to many downstream tasks with minimal fine-tuning and customization.
A healthcare technology company is developing machine learning models to analyze both structured data, such as patient records, and unstructured data, such as medical images and clinical notes. The data science team is working on feature engineering to extract the most relevant information for the models but is aware that the process differs depending on whether the data is structured or unstructured. To ensure they approach each data type correctly, they need to understand the key differences in feature engineering tasks for structured versus unstructured data in machine learning.
What is a key difference in feature engineering tasks for structured data compared to unstructured data in the context of machine learning?
A. Feature engineering for structured data often involves tasks such as normalization and handling missing values, while for unstructured data, it involves tasks such as tokenization and vectorization
B. Feature engineering for structured data is not necessary as the data is already in a usable format, whereas for unstructured data, extensive preprocessing is always required
C. Feature engineering tasks for structured data and unstructured data are identical and do not vary based on data type
D. Feature engineering for structured data focuses on image recognition, whereas for unstructured data, it focuses on numerical data analysis
A
Feature engineering for structured data often involves tasks such as normalization and handling missing values, while for unstructured data, it involves tasks such as tokenization and vectorization
Feature engineering for structured data typically includes tasks like normalization, handling missing values, and encoding categorical variables. For unstructured data, such as text or images, feature engineering involves different tasks like tokenization (breaking down text into tokens), vectorization (converting text or images into numerical vectors), and extracting features that can represent the content meaningfully.
Incorrect options:
Feature engineering for structured data focuses on image recognition, whereas for unstructured data, it focuses on numerical data analysis - Structured data can include numerical and categorical data, while unstructured data includes text, images, audio, etc. The focus is not limited to image recognition or numerical data analysis.
Feature engineering for structured data is not necessary as the data is already in a usable format, whereas for unstructured data, extensive preprocessing is always required - Feature engineering is important for both structured and unstructured data. While structured data may require less preprocessing, tasks like normalization and handling missing values are still crucial. Unstructured data typically requires more extensive preprocessing.
Feature engineering tasks for structured data and unstructured data are identical and do not vary based on data type - Feature engineering tasks vary significantly between structured and unstructured data due to the inherent differences in data types and the requirements for preprocessing each type.
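The two feature-engineering paths from the correct option can be sketched with the standard library alone; the patient ages and the clinical-note sentence below are invented purely for illustration.

```python
# Structured data: fill a missing value, then min-max normalize the column.
ages = [25, 32, None, 47]
known = [a for a in ages if a is not None]
mean_age = sum(known) / len(known)
imputed = [a if a is not None else mean_age for a in ages]   # handle missing values
lo, hi = min(imputed), max(imputed)
normalized = [(a - lo) / (hi - lo) for a in imputed]          # normalization

# Unstructured data: tokenize text, then vectorize as bag-of-words counts.
doc = "the model analyzes the clinical notes"
tokens = doc.lower().split()                                   # tokenization
vocab = sorted(set(tokens))
vector = [tokens.count(word) for word in vocab]                # vectorization

print(normalized)
print(dict(zip(vocab, vector)))
```

Real pipelines would use richer tokenizers and learned embeddings, but the contrast holds: numeric cleanup for structured columns, token-to-vector conversion for text.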
An e-commerce company is developing a chatbot to enhance its user experience by allowing customers to submit queries that include both text descriptions and images, such as product photos or screenshots of issues. The company aims for the chatbot to understand these multi-modal inputs and provide accurate and context-aware responses, seamlessly combining visual and textual information to address customer needs effectively.
Which approach would be the most cost-effective for enabling the chatbot to process such multi-modal queries effectively?
A. The company should use a multi-modal embedding model, which is designed to represent and align different types of data (such as text and images) in a shared embedding space, allowing the chatbot to understand and interpret both forms of input simultaneously
B. The company should use a multi-modal generative model, which can generate responses or outputs based on combined inputs from different modalities, such as text and images, enhancing the chatbot’s ability to provide contextually relevant answers
C. The company should use a convolutional neural network (CNN), a deep learning model primarily designed for processing image data
D. The company should use a text-only language model, which is trained exclusively on textual data
A

A multi-modal embedding model represents text and images in a shared embedding space, so the chatbot can match and interpret combined queries at a lower inference cost than running a multi-modal generative model. A CNN handles only images, and a text-only language model ignores the visual part of the query entirely.
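A conceptual sketch of the shared embedding space follows. In practice a multi-modal embedding model would produce these vectors from the raw text and images; the 4-dimensional numbers below are made up purely for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical embeddings of one text query and two product images,
# all living in the same shared space.
text_query = [0.9, 0.1, 0.0, 0.2]   # "red running shoes"
image_a    = [0.8, 0.2, 0.1, 0.3]   # photo of red running shoes
image_b    = [0.1, 0.9, 0.7, 0.0]   # photo of a blue backpack

# Because both modalities share one space, a plain similarity
# search finds the image that best matches the text.
scores = {"image_a": cosine(text_query, image_a),
          "image_b": cosine(text_query, image_b)}
best = max(scores, key=scores.get)
print(best)  # image_a
```

This cross-modal similarity search is what the shared space enables: no generation step is needed just to relate an image to a text query.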
A logistics company is exploring the use of Machine Learning models to optimize its supply chain operations, such as demand forecasting, route optimization, and inventory management. The company’s data science team needs to understand the fundamental principles of Machine Learning models, including how they are trained, evaluated, and applied to real-world problems. This understanding will help the team select the right model for their use cases and improve operational efficiency.
Which of the following is correct regarding Machine Learning models?
A. Machine Learning models are deterministic for supervised learning and probabilistic for unsupervised learning
B. Machine Learning models can only be probabilistic
C. Machine Learning models can only be deterministic
D. Machine Learning models can be deterministic or probabilistic or a mix of both
D
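The correct option's distinction can be shown with two toy "models", both invented for illustration: a deterministic model maps an input to one fixed output, while a probabilistic model returns a distribution over outcomes.

```python
import math

def deterministic_model(shipment_weight_kg):
    # A fixed rule: the same input always yields the same route.
    return "air" if shipment_weight_kg < 10 else "ground"

def probabilistic_model(shipment_weight_kg):
    # Returns class probabilities instead of a single hard answer
    # (a hand-written logistic-style curve over weight).
    p_air = 1 / (1 + math.exp(shipment_weight_kg - 10))
    return {"air": p_air, "ground": 1 - p_air}

print(deterministic_model(5))   # always "air" for this input
probs = probabilistic_model(5)
print(probs)
```

Many production systems mix the two, e.g. a probabilistic classifier whose scores feed a deterministic business rule.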
A financial services company is deploying AI models to assess credit risk and make lending decisions. As part of ensuring ethical AI use, the company wants to build models that are both interpretable and explainable to regulators, stakeholders, and customers. The data science team needs to understand the distinction between interpretability and explainability in the context of Responsible AI to choose the right techniques for transparency. This distinction will guide the company in making its AI models more trustworthy and compliant.
Which of the following represents the best option for the given use case?
A. Interpretability refers to the ability to understand the technical details of the model’s code, while explainability refers to the ability to reproduce the model’s results
B. Interpretability is about understanding the internal mechanisms of a machine learning model, whereas explainability focuses on providing understandable reasons for the model’s predictions and behaviors to stakeholders
C. Explainability is about understanding the internal mechanisms of a machine learning model, whereas interpretability focuses on providing understandable reasons for the model’s predictions and behaviors to stakeholders
D. Interpretability is used to enhance the model’s performance, while explainability is used to ensure the model’s security
B
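The distinction in the correct option can be sketched with a toy credit-risk model; the data and coefficients below are fabricated for illustration. Interpretability means the internal mechanism itself is readable (a linear model's coefficients), while explainability means producing a stakeholder-facing reason for one prediction.

```python
import numpy as np

rng = np.random.default_rng(1)
income = rng.normal(5, 1, 200)
debt = rng.normal(2, 1, 200)
risk = 0.5 * debt - 0.3 * income + rng.normal(0, 0.05, 200)

X = np.column_stack([income, debt])
w, *_ = np.linalg.lstsq(X, risk, rcond=None)

# Interpretability: the mechanism is inspectable -- each coefficient
# states how a unit change in a feature moves the predicted risk.
print("coefficients (income, debt):", w)

# Explainability: for one applicant, quantify the effect of one extra
# unit of debt -- a statement a non-technical stakeholder can act on.
applicant = np.array([5.0, 3.0])
baseline = np.array([5.0, 2.0])
delta = applicant @ w - baseline @ w
print("risk increase from 1 extra unit of debt:", delta)
```

A deep model would need dedicated explanation techniques (e.g. perturbation-based attributions) to produce the second kind of statement, which is why the two properties are distinguished in Responsible AI.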
A healthcare startup is developing a machine learning model to predict patient outcomes based on historical medical data. During the training process, the data science team notices signs of overfitting, where the model performs well on the training data but struggles with new, unseen data. To ensure the model generalizes effectively and avoids memorizing the training data, the team needs to implement strategies to prevent overfitting.
How can you prevent model overfitting in machine learning?
A. By only training the model on a small subset of the available data to reduce the amount of information it has to learn
B. By increasing the complexity of the model to ensure it captures all nuances in the training data
C. By avoiding any form of model validation or testing to prevent the model from learning incorrect patterns
D. By using techniques such as cross-validation, regularization, and pruning to simplify the model and improve its generalization
D
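One of the techniques named in the correct option, regularization, can be sketched in closed form with NumPy: ridge (L2) regression solves w = (XᵀX + λI)⁻¹Xᵀy, and a larger λ shrinks the weights, simplifying the model so it memorizes less training noise. The data here is synthetic, chosen as a few-samples/many-features setting that is prone to overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))            # few samples, many features
y = X[:, 0] + rng.normal(0, 0.5, 30)     # only feature 0 actually matters

def ridge_weights(X, y, lam):
    """Closed-form ridge regression: (X^T X + lam*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

norms = {}
for lam in (0.0, 1.0, 100.0):
    w = ridge_weights(X, y, lam)
    norms[lam] = np.linalg.norm(w)
    print(f"lambda={lam:6.1f}  ||w|| = {norms[lam]:.3f}")
```

Cross-validation complements this by measuring generalization directly, so λ itself can be chosen on held-out folds rather than on the training data.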