For all non-numeric columns other than timestamp, BigQuery ML performs a one-hot encoding transformation. This transformation generates a separate feature for each unique value in the column.
True
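To make the one-hot statement above concrete, here is a minimal pure-Python sketch of the transformation. It is not BigQuery ML's implementation; the function name `one_hot_encode` and the sample column are hypothetical and used only for illustration.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 feature per unique value."""
    # Sort the unique values so the feature order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one feature is "hot" per row
        rows.append(row)
    return categories, rows


# Hypothetical non-numeric column with three unique values.
categories, rows = one_hot_encode(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(rows[0])     # [0, 0, 1]
```

A column with three unique values becomes three separate features, which is why high-cardinality columns can produce very wide feature sets.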
To understand other performance metrics, you can configure Google Cloud Monitoring to monitor your model’s traffic patterns, error rates, latency, and resource utilization. This can help you spot problems with your models and find the right machine type to optimize latency and cost.
True
What is the trade-off between static and dynamic training?
What are three potential architectures to explore for dynamic training?
Describe a general architecture for dynamic training using Cloud Functions.
Describe a general architecture for dynamic training using App Engine.
Describe how a Dataflow pipeline can invoke the model for predictions.
How can latency be improved when serving models?
Describe the space-time trade-off in serving a prediction model.
What is peakedness in a data distribution?
Peakedness in a data distribution is the degree to which data values are concentrated around the mean, or in the case of choosing between model serving approaches, how concentrated the distribution of the prediction workload is.
What is cardinality in a data distribution?
Cardinality refers to the number of values in a set. In this case, the set is composed of all the possible things we might have to make predictions for.
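The two definitions above can be estimated from a log of prediction requests. The sketch below is an illustration under stated assumptions: it counts only the keys that were actually observed (a lower bound on the set of all possible keys), and it uses the share of traffic going to the most frequent keys as a rough proxy for peakedness, which is an assumption of this example rather than a standard statistical definition. The name `workload_profile` and the sample data are hypothetical.

```python
from collections import Counter


def workload_profile(requests, top_k=1):
    """Return (cardinality, top_share) for a list of requested keys."""
    counts = Counter(requests)
    # Cardinality: number of distinct keys seen in the workload.
    cardinality = len(counts)
    # Peakedness proxy: fraction of all requests hitting the top_k keys.
    top_requests = sum(count for _, count in counts.most_common(top_k))
    top_share = top_requests / len(requests)
    return cardinality, top_share


# Hypothetical workload: key "a" dominates the traffic.
requests = ["a"] * 8 + ["b", "c"]
cardinality, top_share = workload_profile(requests)
print(cardinality)  # 3
print(top_share)    # 0.8
```

A high top share with a small number of distinct keys suggests that precomputing predictions for the popular keys could cover most of the workload.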
When should you choose static vs. dynamic model serving?
What design changes need to be made if you want to build a static serving system?
Explain extrapolation and interpolation.
How can you protect a model from changing distributions?
Describe types of drift in ML models.
List changes in the data distribution of the inputs.
Define concept drift in an ML model.
Concept drift occurs when the distribution of our observations shifts over time; that is, the joint probability distribution of inputs and labels changes.
Concept drift can occur due to shifts in the feature space and/or the decision boundary, so we need to be aware of these during production.
What should you do if you diagnose concept drift?
If you diagnose concept drift, the old data needs to be relabeled and the model needs to be retrained.
What should you do if you diagnose data drift?
If you diagnose data drift, enough of the data needs to be labeled to introduce new classes, and the model needs to be retrained.
Why and when does distribution skew occur?
Distribution skew occurs when the distribution of feature values for training data is significantly different from that of serving data. One of the key causes of distribution skew is a difference in how data is handled or changed in training vs. production.
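One way to quantify the difference described above for a categorical feature is the L-infinity distance between the normalized value frequencies in training and serving data. The sketch below assumes that metric; the threshold `SKEW_THRESHOLD`, the function names, and the sample data are hypothetical and would need tuning for a real feature.

```python
from collections import Counter


def normalized_frequencies(values):
    """Map each distinct value to its share of the total count."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def l_infinity_distance(training_values, serving_values):
    """Largest absolute difference in frequency for any single value."""
    train = normalized_frequencies(training_values)
    serve = normalized_frequencies(serving_values)
    # Include values that appear in only one of the two datasets.
    all_values = set(train) | set(serve)
    return max(abs(train.get(v, 0.0) - serve.get(v, 0.0)) for v in all_values)


# Hypothetical feature: the device type seen in training vs. serving.
training = ["web"] * 5 + ["mobile"] * 5
serving = ["web"] * 2 + ["mobile"] * 8

SKEW_THRESHOLD = 0.1  # assumed value, chosen only for this example
distance = l_infinity_distance(training, serving)
skewed = distance > SKEW_THRESHOLD
print(skewed)  # True (the distance is roughly 0.3)
```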
What is TensorFlow Data Validation?
TensorFlow Data Validation is a library for analyzing and validating machine learning data. It has three components:
- the Statistics Generation component, which generates feature statistics and random samples over training data that can be used for visualization and validation,
- the Schema Generation component,
- the Example Validator component.
What is the SchemaGen pipeline component for?
A SchemaGen pipeline component will automatically generate a schema by inferring types, categories, and ranges from the training data.
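The idea of inferring types, categories, and ranges can be sketched in a few lines of plain Python. This is a simplified illustration, not the SchemaGen implementation: the function `infer_schema`, the dictionary layout of the schema, and the sample records are all assumptions made for this example.

```python
def infer_schema(rows):
    """Infer a per-feature schema from a list of dict records."""
    schema = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            # Numeric feature: record the observed range.
            schema[name] = {"type": "numeric", "min": min(values), "max": max(values)}
        else:
            # Anything else: treat as categorical and record the observed domain.
            schema[name] = {
                "type": "categorical",
                "domain": sorted({str(v) for v in values}),
            }
    return schema


# Hypothetical training records.
training_rows = [
    {"age": 34, "plan": "basic"},
    {"age": 51, "plan": "premium"},
    {"age": 27, "plan": "basic"},
]
schema = infer_schema(training_rows)
print(schema["age"])   # {'type': 'numeric', 'min': 27, 'max': 51}
print(schema["plan"])  # {'type': 'categorical', 'domain': ['basic', 'premium']}
```

An inferred schema like this is a starting point; in practice it is usually reviewed and adjusted by hand, since the training sample may not contain every valid value.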
What is the ExampleValidator pipeline component for?
The ExampleValidator pipeline component identifies anomalies in training and serving data by comparing data statistics computed by the StatisticsGen pipeline component against a schema.
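The statistics-versus-schema comparison can be sketched as two small steps: summarize the data, then check the summary against the schema. This is an illustrative pure-Python sketch, not the ExampleValidator implementation; the function names, the schema layout, and the sample serving records are hypothetical.

```python
def compute_statistics(rows, feature_names):
    """Summarize each feature: missing count and the set of observed values."""
    stats = {}
    for name in feature_names:
        present = [row[name] for row in rows if name in row]
        stats[name] = {
            "num_missing": len(rows) - len(present),
            "values": set(present),
        }
    return stats


def validate_statistics(stats, schema):
    """Return {feature: [problems]} for features that violate the schema."""
    anomalies = {}
    for name, spec in schema.items():
        feature = stats[name]
        values = feature["values"]
        problems = []
        if feature["num_missing"] > 0:
            problems.append("missing values")
        if spec["type"] == "numeric":
            if values and (min(values) < spec["min"] or max(values) > spec["max"]):
                problems.append("out of range")
        else:
            unexpected = values - set(spec["domain"])
            if unexpected:
                problems.append("unexpected values: " + ", ".join(sorted(unexpected)))
        if problems:
            anomalies[name] = problems
    return anomalies


# Hypothetical schema (as might be inferred from training data) and serving records.
schema = {
    "age": {"type": "numeric", "min": 27, "max": 51},
    "plan": {"type": "categorical", "domain": ["basic", "premium"]},
}
serving_rows = [{"age": 40, "plan": "basic"}, {"age": 19, "plan": "trial"}]

stats = compute_statistics(serving_rows, schema.keys())
anomalies = validate_statistics(stats, schema)
print(anomalies)
# {'age': ['out of range'], 'plan': ['unexpected values: trial']}
```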