You are tasked with analyzing a large dataset of unlabeled customer feedback text. Your goal is to discover underlying patterns and group similar feedback entries to identify common themes or issues without any predefined categories. Which machine learning approach, and which type of unsupervised learning, would you most likely employ?
A. Supervised learning, using a classification model.
B. Unsupervised learning, specifically a regression problem.
C. Unsupervised learning, using clustering to identify patterns.
D. Supervised learning, using a linear regression model.
C
Unsupervised learning is used when data is unlabeled and the objective is to find underlying patterns or structures. Clustering is a major type of unsupervised learning that groups data points with similar characteristics, such as using customer demographics to determine customer segmentation. Since the customer feedback is unlabeled and the goal is to group the entries by similarity, unsupervised learning with clustering is the appropriate approach. Classification and regression are types of supervised learning, which require labeled data.
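To make the clustering idea concrete, here is a minimal Python sketch, assuming a toy bag-of-words representation and plain k-means with a deterministic first-k initialization. The feedback strings and function names are hypothetical; a production pipeline would more likely use TF-IDF or text embeddings and a library implementation (k-means++ initialization, several restarts). No labels are supplied at any point: the groups emerge only from word overlap.

```python
from collections import Counter
import math

def vectorize(texts):
    # Shared vocabulary, then one word-count vector per text (bag of words).
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = []
    for t in texts:
        counts = Counter(t.lower().split())
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

def kmeans(points, k, iterations=10):
    # Deterministic init for this sketch: the first k points become the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical unlabeled feedback entries.
feedback = [
    "app crashes on login",
    "delivery was late again",
    "login crashes the app",
    "late delivery and damaged box",
]
vocab, vectors = vectorize(feedback)
labels = kmeans(vectors, k=2)
print(labels)  # → [0, 1, 0, 1]
```

The two groups that emerge (login crashes vs. late deliveries) are the kind of themes the question describes; naming them is still a human step.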
An application developer needs to deploy a new stateless, event-driven microservice that must scale automatically and efficiently, from zero instances when not in use, and only incur costs for the compute resources consumed during execution. Which Google Cloud computing service is best suited for these requirements?
A. Compute Engine
B. Google Kubernetes Engine (GKE)
C. App Engine
D. Cloud Run
D
Cloud Run is described as a “fully managed compute platform that enables you to run requests or event-driven stateless workloads without having to worry about servers”. It “automatically scales up and down from zero” and “charges only for the resources you use, so you never pay for over-provisioned resources,” making it ideal for the specified requirements. Compute Engine is an Infrastructure as a Service (IaaS) offering that provides maximum flexibility but requires more server management. GKE runs containerized applications and offers fine-grained control, but Cloud Run is specifically highlighted for its serverless nature, auto-scaling from zero, and cost efficiency for stateless workloads. App Engine is a fully managed Platform as a Service (PaaS) focused on application logic, but Cloud Run is more specifically tailored to event-driven stateless workloads that scale to zero.
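As an illustration of what “stateless” means in practice, below is a minimal sketch of a Cloud Run-style HTTP service using only the Python standard library. It assumes the container contract in which Cloud Run supplies the listening port through the PORT environment variable; the event shape and the names handle_event, Handler, and serve are hypothetical. Because handle_event keeps no state between requests, any instance, including one just started after scaling up from zero, can serve any request.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_event(payload: dict) -> dict:
    # Pure function: the response depends only on the request, with no local state,
    # so any instance can handle any event. (Hypothetical event shape.)
    return {"received": payload.get("type", "unknown"), "status": "processed"}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body (empty body is treated as an empty object).
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle_event(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve() -> None:
    # Cloud Run passes the port to listen on in the PORT environment variable.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("", port), Handler).serve_forever()
```

A container entrypoint would call serve(); keeping the business logic in a pure function also makes it easy to test without starting a server.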
You are working on a machine learning project within BigQuery ML to predict whether a customer will make a purchase in the future (a binary outcome: Yes/No). After preparing your data, which SQL command would you use to initiate the training of your machine learning model for this problem, and which model type would be appropriate given the nature of the prediction?
A. ML.EVALUATE, using a linear regression model
B. ML.PREDICT, using a k-means clustering model
C. CREATE MODEL, using a linear regression model
D. CREATE MODEL, using a logistic regression model
D
To create and train an ML model in BigQuery ML, the CREATE MODEL command is used. Predicting “whether a customer will make a purchase in the future” is a classification problem, as it predicts a categorical outcome (will buy/will not buy). For classification problems, a logistic regression model is appropriate. ML.EVALUATE is used to assess model performance, and ML.PREDICT is used to generate predictions after a model has been trained. K-means clustering is for unsupervised learning, and linear regression is for regression problems (predicting a numeric variable).
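The reasoning above (a binary label calls for logistic regression, a numeric label for linear regression, and no label for clustering) can be sketched as a small Python helper that picks a BigQuery ML model_type value and renders the OPTIONS clause of a CREATE MODEL statement. The model_type strings 'logistic_reg', 'linear_reg', and 'kmeans' and the input_label_cols option are BigQuery ML options; the selection heuristic and the column name will_purchase are assumptions for illustration only.

```python
def choose_model_type(labels=None):
    # Hypothetical heuristic, not a BigQuery ML feature.
    # No label column at all: nothing to supervise, so group the rows instead.
    if labels is None:
        return "kmeans"
    distinct = set(labels)
    numeric = all(isinstance(v, (int, float)) for v in distinct)
    # A numeric target with many distinct values is treated as regression.
    if numeric and len(distinct) > 2:
        return "linear_reg"
    # A categorical target such as Yes/No is treated as classification.
    return "logistic_reg"

label_col = "will_purchase"  # hypothetical label column
model_type = choose_model_type(["Yes", "No", "No", "Yes"])
options = f"OPTIONS(model_type='{model_type}', input_label_cols=['{label_col}'])"
print(options)  # → OPTIONS(model_type='logistic_reg', input_label_cols=['will_purchase'])
```

The heuristic is deliberately crude: a numeric column with only a few distinct values (for example, star ratings) might really be a classification target, so the final choice still needs judgment.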
✅ Why Option D is correct
Option D, CREATE MODEL, using a logistic regression model, is the correct answer because it gets both key points right:
SQL command: To start training a new model in BigQuery ML, you use the CREATE MODEL command. This command tells BigQuery to create a model and train it on the data you provide.
Model type: The problem is to predict a binary outcome (Yes/No), which is known as binary classification. Logistic regression is a machine learning model well suited to binary classification, because it predicts the probability that an outcome belongs to one of the two categories.
In short, you use CREATE MODEL to start training, and you choose logistic regression because it is the right model for predicting “Yes” or “No”.
❌ Why the other options are incorrect
Option A: ML.EVALUATE, using a linear regression model.
Wrong command: ML.EVALUATE is not used to train a model; it evaluates the performance of a model that has already been trained.
Wrong model: Linear regression predicts continuous numeric values (e.g., the price of a house), not categories such as “Yes” or “No”.
Option B: ML.PREDICT, using a k-means clustering model.
Wrong command: ML.PREDICT makes predictions with a model that has already been trained; it does not train one from scratch.
Wrong model: K-means is an unsupervised clustering model. It finds natural groups in the data, but it does not predict a specific label such as “purchase” or “no purchase” from labeled historical data.
Option C: CREATE MODEL, using a linear regression model.
Correct command: CREATE MODEL is the right command to start training.
Wrong model: However, as explained above, linear regression is the wrong model type for this problem. It is not suited to a Yes/No prediction.
Google’s approach to Artificial Intelligence (AI) is guided by three core principles for responsible AI. Which of the following statements accurately reflects one of these principles?
A. AI development should solely prioritize financial gain for the company, as this is the primary driver of innovation.
B. AI systems should be deployed without extensive testing to accelerate market entry and gain a competitive advantage.
C. Google strives to make tools that empower others to harness AI for individual and collective benefit.
D. AI should be used to solve any problem, regardless of potential ethical complexities or societal impact.
C
One of Google’s three AI principles for responsible AI is “Collaborative progress, together,” which states that Google makes tools that empower others to harness AI for individual and collective benefit. The other options contradict Google’s stated principles of responsible development and deployment, which emphasize ethical considerations, fairness, accountability, safety, and transparency throughout the AI lifecycle. Google understands that AI poses evolving complexities and risks and pursues AI responsibly.
A large enterprise needs to store massive amounts of structured data (petabyte-scale) from various operational systems. This data will primarily be used for complex analytical queries involving aggregations and requires SQL access. Which Google Cloud storage service is the most suitable for this specific use case?
A. Cloud Storage
B. Firestore
C. Bigtable
D. BigQuery
D
BigQuery is Google’s data warehouse solution, designed to “analyze petabyte-scale datasets”. It is specifically highlighted for “analytical workloads that require SQL commands” and is used when “entire datasets need to be read,” often requiring “complex queries, for example, aggregations”. Cloud Storage is typically used for unstructured data like documents, images, and audio files. Firestore is a transactional, NoSQL, document-oriented database suitable for transactional workloads without SQL. Bigtable provides a scalable NoSQL solution for analytical workloads, best for real-time, high-throughput applications requiring millisecond latency, but for SQL access and complex aggregations on petabyte-scale structured data, BigQuery is the optimal choice.
✅ Correct Answer
D. BigQuery
Explanation: BigQuery is Google Cloud’s serverless data warehouse, designed specifically for this use case. It meets every requirement:
Massive scale: It is built to handle petabyte-scale data.
Complex analytical queries: Its main strength is quickly running complex queries involving aggregations (SUM, AVG, COUNT) and joins (JOIN) over huge datasets. 📊
SQL access: It uses a standard SQL interface, which makes it easy to use for data analysis.
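To show the shape of such an aggregation query, here is a small Python sketch that runs one against an in-memory SQLite table from the standard library. SQLite is only a local stand-in chosen so the example runs anywhere; the orders table and its values are hypothetical. In BigQuery, the same GROUP BY with COUNT, SUM, and AVG would be written in GoogleSQL and executed in a distributed way over far larger tables.

```python
import sqlite3

# Local stand-in for a warehouse table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, 'EU', 120.0), (2, 'EU', 80.0),
  (3, 'US', 200.0), (4, 'US', 50.0), (5, 'US', 30.0);
""")

# Analytical query: aggregate every row per region.
rows = conn.execute("""
SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue, AVG(amount) AS avg_order
FROM orders
GROUP BY region
ORDER BY region
""").fetchall()

for row in rows:
    print(row)
```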
❌ Incorrect Answers
A. Cloud Storage
Explanation: It is an object storage service, ideal for storing large amounts of unstructured files (images, videos, backups). It is not a database and cannot be queried directly with SQL. 🗄️
B. Firestore
Explanation: It is a NoSQL document database designed for mobile and web applications. It excels at fast, small lookups, such as retrieving a user’s profile, but not at running complex analysis over an entire dataset.
C. Bigtable
Explanation: It is a high-performance NoSQL database, ideal for large read/write workloads with low latency, such as IoT or time-series data. It is not a SQL data warehouse and is not optimized for the complex analytical queries this case requires.
What is the main benefit of decoupling compute and storage in Google Cloud infrastructure?
A. It improves local data access speed.
B. It reduces the number of servers needed.
C. It allows compute and storage to scale independently.
D. It ensures compute and storage are always used together.
C
In cloud computing, decoupling compute and storage means they can scale separately based on demand, which increases flexibility and efficiency.
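A back-of-the-envelope sketch of why independent scaling matters, assuming a hypothetical coupled server that always bundles 10 TB of storage with 8 vCPUs (these numbers are illustrative, not Google Cloud figures). A workload that needs a lot of storage but little compute forces the coupled design to buy idle processors.

```python
import math

# Hypothetical coupled node: storage and compute come only in this fixed ratio.
NODE_STORAGE_TB = 10
NODE_VCPUS = 8

def coupled_nodes(storage_tb, vcpus):
    # Whole nodes must cover whichever resource is the bottleneck.
    return max(math.ceil(storage_tb / NODE_STORAGE_TB), math.ceil(vcpus / NODE_VCPUS))

def decoupled(storage_tb, vcpus):
    # Each resource is provisioned on its own, matching demand.
    return {"storage_tb": storage_tb, "vcpus": vcpus}

nodes = coupled_nodes(storage_tb=100, vcpus=16)
print(nodes, nodes * NODE_VCPUS)  # → 10 80  (80 vCPUs provisioned, only 16 needed)
print(decoupled(storage_tb=100, vcpus=16))
```

With decoupled resources, the same workload provisions 100 TB of storage and 16 vCPUs separately, and compute can be scaled back down when no queries are running.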
Which of the following are valid supervised learning tasks?
A. Classifying emails as spam or not spam
B. Predicting housing prices
C. Segmenting customers based on behavior
D. Forecasting future product sales
A, B, and D
Supervised learning refers to any machine learning task in which a model learns from a dataset that already contains the “correct answers,” or labels. Think of it as learning with a teacher who shows you examples along with their solutions so that you can learn the pattern.
There are two main types:
Classification: The label is a category (e.g., “spam,” “dog,” “fraud”).
Regression: The label is a continuous number (e.g., €250,000, 32°, $1.5M).
Valid Supervised Tasks ✅
A. Classifying emails as spam or not spam
This is a classic classification problem. The model learns from thousands of emails that humans have already labeled as spam or not spam so that it can classify new emails.
B. Predicting housing prices
This is a regression problem. The model learns from a dataset of houses with known features (square meters, number of rooms, etc.) and their final sale price (the numeric label).
D. Forecasting future product sales
This is also a regression problem (specifically, time-series forecasting). The model uses historical sales data (each past day’s or month’s sales are the labels) to predict future sales figures.
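The point that past sales act as labels can be made concrete with a minimal Python sketch that turns a sales series into supervised (features, label) pairs using lagged values. The sales figures and the make_supervised helper are hypothetical; real forecasting models typically add calendar features, seasonality, and other signals.

```python
def make_supervised(series, n_lags):
    # Each example: the previous n_lags values (features) -> the next value (label).
    examples = []
    for i in range(n_lags, len(series)):
        examples.append((series[i - n_lags:i], series[i]))
    return examples

monthly_sales = [100, 120, 130, 150, 160]  # hypothetical figures
pairs = make_supervised(monthly_sales, n_lags=2)
print(pairs)  # → [([100, 120], 130), ([120, 130], 150), ([130, 150], 160)]
```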
Unsupervised Task ❌
C. Segmenting customers based on behavior
This task is an example of unsupervised learning, specifically clustering. There are no predefined labels. The goal is for the algorithm to discover on its own the natural groups or segments in the data. You don’t teach it what the segments are; you ask it to find them.
Which Google Cloud product enables the creation of ML models using only SQL commands?
A. AutoML
B. Vertex AI Studio
C. BigQuery ML
D. TensorFlow
C
BigQuery ML allows users to create and run machine learning models using standard SQL syntax, simplifying the ML workflow because no code in a separate programming language (such as Python) is required.
Which tools or products are part of the AI and machine learning layer in the Google Cloud data-to-AI workflow?
A. BigQuery
B. Vertex AI
C. AutoML
D. Cloud Storage
B and C
Vertex AI and AutoML are part of the machine learning layer. BigQuery is primarily an analytics tool, and Cloud Storage belongs to the storage layer.
Which machine learning algorithm is most appropriate for predicting whether a customer will make a future purchase (binary classification)?
A. Linear regression
B. K-means clustering
C. Logistic regression
D. Principal component analysis
C
Logistic regression is used for classification problems, especially binary ones like predicting whether a user will or won’t buy.
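A minimal pure-Python sketch of logistic regression for a single feature, trained with batch gradient descent on the log-loss. The feature (pages viewed) and the data are hypothetical; in practice you would use a library or BigQuery ML’s logistic_reg model rather than hand-rolled training. The sigmoid output is a probability, which is what makes the model suited to yes/no predictions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: maps any real number to (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    # Batch gradient descent on the average log-loss for one feature.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus true label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

pages_viewed = [1, 2, 3, 7, 8, 9]  # hypothetical feature
purchased = [0, 0, 0, 1, 1, 1]     # hypothetical binary label
w, b = train_logistic(pages_viewed, purchased)
print(round(sigmoid(w * 2 + b), 2), round(sigmoid(w * 8 + b), 2))
```

A visitor is then predicted to buy when the probability exceeds a chosen threshold, commonly 0.5.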
Which of the following is NOT a layer in the AI/ML framework on Google Cloud?
A. AI foundations
B. AI development
C. AI solutions
D. Deep learning
D
The three layers in the Google Cloud AI/ML framework are AI foundations, AI development, and AI solutions. Deep learning is a subset of machine learning, not a framework layer.
Which Google Cloud storage class is best for data that needs to be accessed less than once a year?
A. Standard storage
B. Nearline storage
C. Coldline storage
D. Archive storage
D
Archive storage is designed for data that is accessed less than once a year. It offers the lowest storage cost, but with higher data-access fees and a 365-day minimum storage duration.
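A minimal Python sketch of choosing a Cloud Storage class from expected access frequency, assuming the usual rule of thumb: Standard for frequent access, Nearline for data accessed less than once every 30 days, Coldline for less than once every 90 days, and Archive for less than once a year. The function name is hypothetical; the class names match the storage-class identifiers Cloud Storage uses. The minimum storage durations are included because deleting an object early is charged as if it had been stored for the full minimum.

```python
# Minimum storage duration per class, in days.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}

def choose_storage_class(days_between_accesses: float) -> str:
    # Hypothetical helper: the rarer the access, the colder (cheaper to store) the class.
    if days_between_accesses >= 365:
        return "ARCHIVE"
    if days_between_accesses >= 90:
        return "COLDLINE"
    if days_between_accesses >= 30:
        return "NEARLINE"
    return "STANDARD"

print(choose_storage_class(400))  # → ARCHIVE
```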
Which of the following are examples of unsupervised learning tasks? (Select all that apply)
A. Clustering
B. Regression
C. Association
D. Classification
A and C
Clustering and association are unsupervised learning tasks, as they find patterns in unlabeled data. Regression and classification are supervised learning tasks.
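Since association is less familiar than clustering, here is a minimal Python sketch that mines item-pair rules from unlabeled shopping baskets by computing support and confidence. It brute-forces pairs instead of running a full algorithm such as Apriori, and the baskets are hypothetical. No label is given: the rules come only from co-occurrence.

```python
from collections import Counter
from itertools import combinations

baskets = [  # hypothetical transactions
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"butter", "jam"},
]

def pair_rules(baskets, min_support=0.5):
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        # Support: fraction of baskets containing both items.
        if count / n >= min_support:
            # Confidence(a -> b) = P(b | a) = count(a, b) / count(a).
            rules[(a, b)] = count / item_counts[a]
            rules[(b, a)] = count / item_counts[b]
    return rules

rules = pair_rules(baskets)
print(rules[("jam", "butter")])  # → 1.0
```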
What is the main advantage of using TPUs (Tensor Processing Units) for machine learning workloads on Google Cloud?
A. They are general-purpose processors
B. They are optimized for matrix multiplication and ML tasks
C. They are cheaper than CPUs for all workloads
D. They are only used for storage management
B
TPUs are custom hardware designed specifically for machine learning, especially for operations like matrix multiplication, making them faster and more efficient for ML workloads.
Which SQL command is used to create a machine learning model in BigQuery ML?
A. ML.EVALUATE
B. CREATE MODEL
C. ML.PREDICT
D. CREATE CLASSIFICATION
B
The CREATE MODEL SQL command is used in BigQuery ML to define and train a new machine learning model within the BigQuery environment.
A retail company wants to analyze its customer purchase history to group customers into distinct segments like “high-value,” “frequent but low-spend,” and “at-risk of churn.” The company does not have these segments predefined and wants the model to discover these groupings from the data itself.
Which machine learning approach and specific task are most appropriate for this business problem?
A. Supervised Learning, Regression
B. Unsupervised Learning, Clustering
C. Supervised Learning, Classification
D. Unsupervised Learning, Dimensionality Reduction
B
The key is that the segments are not “predefined.” The model needs to discover the underlying patterns and group the data. This is the definition of Unsupervised Learning. The specific task of grouping data points into sets is Clustering. Regression is for predicting continuous values, classification is for predicting predefined categories, and dimensionality reduction is for reducing the number of features, not for creating customer segments.
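Before clustering, purchase history has to be turned into one feature vector per customer. A minimal Python sketch, assuming the common recency/frequency/monetary (RFM) summary (a choice made here for illustration, not something the question prescribes) and hypothetical purchase records:

```python
from collections import defaultdict

purchases = [  # (customer_id, days_ago, amount): hypothetical records
    ("c1", 2, 500.0), ("c1", 10, 700.0),
    ("c2", 1, 15.0), ("c2", 3, 20.0), ("c2", 5, 10.0), ("c2", 6, 12.0),
    ("c3", 200, 40.0),
]

def customer_features(purchases):
    # Per customer: recency (days since last purchase), frequency (count), monetary (total spend).
    feats = defaultdict(lambda: {"recency": float("inf"), "frequency": 0, "monetary": 0.0})
    for cust, days_ago, amount in purchases:
        f = feats[cust]
        f["recency"] = min(f["recency"], days_ago)
        f["frequency"] += 1
        f["monetary"] += amount
    return dict(feats)

feats = customer_features(purchases)
for cust, f in feats.items():
    print(cust, f)
```

Scaled versions of these vectors could then be fed to a clustering algorithm such as k-means; the “high-value,” “frequent but low-spend,” and “at-risk” names would be attached afterwards by inspecting each cluster.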
According to the material, a core architectural principle of Google Cloud’s data platform is the separation of two key components, allowing them to scale independently based on demand. This is a major advantage over traditional, on-premises systems.
What are these two decoupled components?
A. Networking and Security
B. Machine Learning and Data Analytics
C. Compute and Storage
D. SQL and NoSQL databases
C
The slides (e.g., slides 48 and 118) repeatedly emphasize that a fundamental advantage of Google Cloud’s infrastructure is that compute and storage are decoupled. This allows a user to, for example, store petabytes of data in BigQuery (storage) and only pay for the processing power (compute) when they run a query, scaling each resource independently as needed.
A data analyst is using BigQuery ML to build a model that predicts whether a website visitor will make a purchase in the future. After preparing the data, they are ready to train the model using SQL.
Which BigQuery ML command should they use to initiate the model training process?
A. ML.PREDICT
B. ML.EVALUATE
C. CREATE MODEL
D. TRAIN MODEL
C
Why is ‘CREATE MODEL’ the correct answer?
The CREATE MODEL (or CREATE OR REPLACE MODEL) statement is the specific BigQuery SQL instruction used to start creating and training a machine learning model.
When you run this query, you are telling BigQuery to:
Create a new model object in your dataset.
Train it on the data you supply in the AS SELECT … subquery.
Configure it with the options you specify in the OPTIONS(…) clause, such as the model type (logistic regression, k-means, etc.), the target, and other hyperparameters.
In short, CREATE MODEL is the command that starts the entire training process from scratch. ⚙️
Why are the other options incorrect?
Options A and B are functions used after the model has already been trained, and option D is not a valid command.
A. ML.PREDICT: This function makes predictions with a model that already exists and has been trained. You give it new data, and the model returns the predicted result. It is the step for using the model, not for creating it.
B. ML.EVALUATE: This function evaluates the performance of an already trained model. It returns metrics such as accuracy, loss, or AUC so that you can tell how good your model is. It is the step for validating the model, not for training it.
D. TRAIN MODEL: Although “train model” describes what you want to do, it is not a valid SQL statement in BigQuery ML. It is a common distractor because it names the action, but it is not the correct syntax. The correct syntax to start training is CREATE MODEL.
The typical BigQuery ML workflow is:
CREATE MODEL to train the model.
ML.EVALUATE to check its performance.
ML.PREDICT to use it for predictions.
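The three-step workflow can be sketched as the SQL statements a Python script might submit. The dataset, table, and label names are hypothetical; the statement shapes follow BigQuery ML’s CREATE MODEL, ML.EVALUATE, and ML.PREDICT syntax. The script only builds and prints the SQL, so it runs without a Google Cloud project.

```python
# Hypothetical names for illustration only.
MODEL = "mydataset.purchase_model"
TRAIN = "mydataset.visits_train"
EVAL = "mydataset.visits_eval"
NEW = "mydataset.visits_new"
LABEL = "will_purchase"

workflow = [
    # 1. Train: CREATE MODEL builds and trains the model from the SELECT's rows.
    f"CREATE OR REPLACE MODEL `{MODEL}`\n"
    f"OPTIONS(model_type='logistic_reg', input_label_cols=['{LABEL}']) AS\n"
    f"SELECT * FROM `{TRAIN}`",
    # 2. Evaluate the trained model on held-out rows.
    f"SELECT * FROM ML.EVALUATE(MODEL `{MODEL}`, (SELECT * FROM `{EVAL}`))",
    # 3. Predict on new, unlabeled rows.
    f"SELECT * FROM ML.PREDICT(MODEL `{MODEL}`, (SELECT * FROM `{NEW}`))",
]

for statement in workflow:
    print(statement, end=";\n\n")
```

To execute them, one option is the google-cloud-bigquery client library, e.g. bigquery.Client().query(statement).result(), run in order so the model exists before it is evaluated or used.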
The Data-to-AI workflow on Google Cloud consists of several stages. Products like Pub/Sub and Dataflow are used for the initial stage, while BigQuery and Looker are used for a later stage.
To which stage of the workflow do Vertex AI, AutoML, and Model Garden primarily belong?
A. Ingestion and process
B. Storage
C. Analytics
D. AI / machine learning
D
As outlined in the workflow diagrams (slides 56-61), Vertex AI is Google’s unified platform for machine learning development. AutoML (a tool within Vertex AI) and Model Garden are specifically for building, training, and managing ML models. This places them squarely in the final AI / machine learning stage of the workflow, which consumes data from the previous stages to create predictive or generative outputs.
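The stage assignment described above can be summarized as a lookup table. This Python sketch uses only the products named in these questions and simplifies the picture (BigQuery, for instance, also stores data); the dictionary and the stage_of helper are hypothetical.

```python
# Stages in workflow order; products as named in these questions.
DATA_TO_AI_WORKFLOW = {
    "Ingestion and process": ["Pub/Sub", "Dataflow"],
    "Storage": ["Cloud Storage"],
    "Analytics": ["BigQuery", "Looker"],
    "AI / machine learning": ["Vertex AI", "AutoML", "Model Garden"],
}

def stage_of(product):
    # Return the first stage that lists the product.
    for stage, products in DATA_TO_AI_WORKFLOW.items():
        if product in products:
            return stage
    raise KeyError(product)

print(stage_of("Model Garden"))  # → AI / machine learning
```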
Google has developed custom Application-Specific Integrated Circuits (ASICs) to accelerate ML workloads, making them significantly faster and more energy-efficient than general-purpose hardware for certain tasks.
Which Google hardware innovation is a domain-specific architecture tailored to accelerate the tensor and matrix operations fundamental to deep learning models?
A. DPU (Data Processing Unit)
B. GPU (Graphics Processing Unit)
C. CPU (Central Processing Unit)
D. TPU (Tensor Processing Unit)
D
The Tensor Processing Unit (TPU) is Google’s custom-developed ASIC designed specifically to accelerate the workloads of ML frameworks like TensorFlow (as mentioned on slide 46). While GPUs are also used for ML, TPUs are a Google-specific innovation purpose-built for the matrix multiplication (tensor operations) that are at the core of neural networks.
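To see why matrix multiplication is the operation worth accelerating, here is a minimal pure-Python sketch of one fully connected neural-network layer: a matrix multiply, a bias add, and a ReLU. The weights and inputs are hypothetical; a TPU performs the same multiply-accumulate pattern in dedicated hardware over much larger matrices.

```python
def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m): the multiply-accumulate pattern TPUs accelerate.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dense_layer(x, w, bias):
    # A fully connected layer: matrix multiply, add bias, then ReLU.
    out = matmul(x, w)
    return [[max(0.0, v + bias[j]) for j, v in enumerate(row)] for row in out]

x = [[1.0, 2.0]]            # batch of 1 example with 2 features
w = [[0.5, -1.0, 2.0],
     [1.0, 0.0, -1.0]]      # 2 inputs -> 3 units
bias = [0.0, 0.5, 0.0]
print(dense_layer(x, w, bias))  # → [[2.5, 0.0, 0.0]]
```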
What is the primary difference between supervised and unsupervised learning?
A) Supervised learning uses labeled data, while unsupervised learning does not.
B) Supervised learning is used for regression tasks, while unsupervised learning is for classification.
C) Supervised learning requires more computational power than unsupervised learning.
D) Unsupervised learning is always used for predictive modeling.
A
Supervised learning involves training on labeled data, where inputs are paired with correct outputs, such as in classification or regression. Unsupervised learning, conversely, works with unlabeled data to identify patterns, like clustering, without predefined outputs.
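A minimal Python sketch of the difference, assuming hypothetical two-number feature vectors: the supervised case pairs each input with a label and predicts with a 1-nearest-neighbour rule, while the unsupervised case keeps only the inputs. Nearest-neighbour is chosen here only because it is the shortest supervised method to write out.

```python
import math

# Supervised: each input is paired with its correct output (label). Hypothetical data.
labeled = [
    ((1.0, 1.0), "spam"), ((0.9, 1.2), "spam"),
    ((5.0, 5.0), "not spam"), ((5.2, 4.8), "not spam"),
]
# Unsupervised: the same inputs without labels; any grouping must be discovered.
unlabeled = [features for features, _ in labeled]

def nearest_neighbor_predict(examples, query):
    # 1-nearest-neighbour: copy the label of the closest labeled example.
    _, label = min(examples, key=lambda pair: math.dist(pair[0], query))
    return label

print(nearest_neighbor_predict(labeled, (1.1, 0.9)))  # → spam
```

A clustering algorithm such as k-means would receive only unlabeled and could, at best, report that the points form two groups, without knowing which group is “spam”.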
Which of the following Google Cloud services is specifically designed for accelerating machine learning workloads?
A) Compute Engine
B) Cloud Storage
C) Tensor Processing Units (TPUs)
D) BigQuery
C
TPUs are custom AI accelerators optimized for speeding up machine learning model training and inference, distinguishing them from general compute services like Compute Engine or data storage solutions like Cloud Storage.
In the data-to-AI workflow on Google Cloud, what is the role of Pub/Sub?
A) It is used for storing large datasets.
B) It provides real-time messaging for ingesting streaming data.
C) It is a managed relational database service.
D) It is used for running batch processing jobs.
B
Pub/Sub facilitates real-time messaging, enabling the ingestion of streaming data into the data pipeline, which is essential for real-time data processing in the data-to-AI workflow.
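To illustrate the publish/subscribe pattern behind Pub/Sub, here is a hypothetical in-memory model in Python. It is not the google-cloud-pubsub client API and omits acknowledgements, retries, and persistence; it only shows that publishers and subscribers are decoupled and that each subscription receives its own copy of every message.

```python
from collections import deque

class ToyTopic:
    """Hypothetical in-memory topic with pull subscriptions (not the Pub/Sub client API)."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        self.subscriptions[name] = deque()

    def publish(self, message):
        # Fan-out: every subscription gets its own copy of each message.
        for queue in self.subscriptions.values():
            queue.append(message)

    def pull(self, name):
        # Drain and return whatever this subscription has accumulated.
        queue = self.subscriptions[name]
        messages = list(queue)
        queue.clear()
        return messages

topic = ToyTopic()
topic.subscribe("dataflow-pipeline")
topic.subscribe("audit-log")
topic.publish({"sensor": "t1", "temp": 21.5})
topic.publish({"sensor": "t2", "temp": 19.0})
print(len(topic.pull("dataflow-pipeline")), len(topic.pull("audit-log")))  # → 2 2
```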
Which SQL command is used to create a machine learning model in BigQuery ML?
A) CREATE TABLE
B) SELECT
C) CREATE MODEL
D) INSERT INTO
C
The CREATE MODEL command in BigQuery ML is used to define and train a machine learning model, specifying parameters like model type and input data, which is a key step in model development.