Normalizations (MinMaxScaler) are usually good if you know that the distribution of your data is not ____.
Gaussian
What are Feature crosses?
Well, they combine multiple features into a new feature. That’s fundamentally what a feature cross is.
It encodes non-linearity in the feature space, or encodes the same information in fewer features.
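As an illustration (not from the course materials), a feature cross of two categorical features can be sketched in plain Python by concatenating their values, so two features become one joint feature:

```python
def feature_cross(a, b):
    """Cross two categorical features into a single combined feature.

    Each (a_i, b_i) pair becomes one joint category, letting a linear
    model learn a separate weight per combination -- i.e., it encodes
    non-linearity in the original feature space.
    """
    return [f"{x}_x_{y}" for x, y in zip(a, b)]

day = ["mon", "mon", "sat"]
hour_bucket = ["am", "pm", "pm"]
crossed = feature_cross(day, hour_bucket)
# crossed: ["mon_x_am", "mon_x_pm", "sat_x_pm"]
```

Here `day` and `hour_bucket` are hypothetical features; the point is that the crossed column captures combinations (Saturday evening vs. Monday morning) that neither feature expresses alone.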
You are working on a taxi tip prediction model and your raw dataset has columns for the latitude and longitude of both pickup and dropoff locations. These values do not follow a Gaussian distribution.
Is the below feature engineering process useful? Why?
Because the data does not follow a Gaussian distribution, you should normalize these location features using the formula: Xnorm = (X - Xmin) / (Xmax - Xmin). This puts the values into the range [0, 1], which can help training converge faster.
It’s not useful.
Normalizing the raw data implies that these geographic coordinates carry quantitative meaning. In this application, that is misleading. For example: with all other features held constant, you can’t say a bigger tip will be given just because the pickup location is 1 degree of longitude “greater” than on a previous trip.
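To make this concrete, a common alternative (a sketch, not the course’s exact recipe) is to bucketize latitude and longitude into grid cells and cross the two bins, so the model learns per-area behavior instead of treating degrees as magnitudes. The bounding-box numbers below are illustrative only:

```python
def bucketize(value, low, high, n_buckets):
    """Map a continuous value to an integer bucket index in [0, n_buckets - 1]."""
    if value <= low:
        return 0
    if value >= high:
        return n_buckets - 1
    width = (high - low) / n_buckets
    return int((value - low) / width)

def latlon_cell(lat, lon, n=100):
    """Cross the bucketized latitude and longitude into one grid-cell id.

    The rough NYC-like bounding box here is a made-up example.
    """
    lat_bin = bucketize(lat, 40.5, 41.0, n)
    lon_bin = bucketize(lon, -74.3, -73.7, n)
    return lat_bin * n + lon_bin

cell = latlon_cell(40.75, -74.0)
```

The resulting cell id is a categorical feature: two trips in the same cell share it, and “greater” no longer implies anything quantitative.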
What are four issues with doing feature engineering at scale?
What are the benefits of using tf.transform?
With TensorFlow Transform, you can preprocess data using the same code for both training a model and serving inferences in production. True/False
C2-W2-Lab1
True
It provides several utility functions for common preprocessing tasks, including creating features that require a full pass over the training dataset.
What are the outputs of TensorFlow Transform?
C2-W2-Lab1
The outputs are the transformed features and a TensorFlow graph which you can use for both training and serving.
Using the same graph for both training and serving can prevent feature skew, since the same transformations are applied in both stages.
What’s the difference between a TensorFlow operation and a TensorFlow Transform analyzer?
C2-W2-Lab1: Create a preprocessing function
Unlike TensorFlow ops, analyzers run only once, during training, and typically make a full pass over the entire training dataset. They create tensor constants, which are added to your graph. For example, tft.min computes the minimum of a tensor over the training dataset.
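The distinction can be mimicked in plain Python (an analogy, not the tf.Transform implementation): the “analyzer” makes one full pass over the training data and produces constants, while the “op” is a per-example formula that uses those constants at both training and serving time:

```python
def analyze(training_data):
    """Analyzer phase: one full pass over the whole training set.

    Like tft.min / tft.max, the results become constants that are
    baked into the transformation graph.
    """
    return {"min": min(training_data), "max": max(training_data)}

def transform(x, constants):
    """Op phase: a per-example computation using the baked-in constants.

    The same function is applied at training and at serving time,
    which is what prevents training/serving skew.
    """
    return (x - constants["min"]) / (constants["max"] - constants["min"])

constants = analyze([2.0, 4.0, 10.0])  # runs once, full pass
scaled = [transform(x, constants) for x in [2.0, 4.0, 10.0]]
# scaled: [0.0, 0.25, 1.0]
```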
What are the main steps of TensorFlow Transform to preprocess input data?
C2-W2-Lab1
Collect raw data
Define metadata
Create a preprocessing function
Generate a constant graph with the required transformations
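The “create a preprocessing function” step can be sketched as below, assuming tensorflow_transform is installed; the feature names (trip_miles, payment_type) are hypothetical and the exact transforms will vary:

```python
import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Map raw features to transformed features.

    The tft analyzer calls below make a full pass over the training
    data; their results are emitted as constants in the generated
    transformation graph.
    """
    outputs = {}
    # Scale a numeric feature into [0, 1] using the dataset-wide min/max.
    outputs['trip_miles_scaled'] = tft.scale_to_0_1(inputs['trip_miles'])
    # Build a vocabulary over the dataset and map each string to an integer id.
    outputs['payment_type_id'] = tft.compute_and_apply_vocabulary(
        inputs['payment_type'])
    return outputs
```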
Like TFDV, TensorFlow Transform also uses ____ for deployment scalability and flexibility.
C2-W2-Lab1: Create a preprocessing function
Apache Beam
What do:
ExampleGen
StatisticsGen
SchemaGen
ExampleValidator
Transform
do in a TFX pipeline?
C2-W2-Lab2
ingest data from a base directory with ExampleGen
compute the statistics of the training data with StatisticsGen
infer a schema with SchemaGen
detect anomalies in the evaluation data with ExampleValidator
preprocess the data into features suitable for model training with Transform
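Using the standard TFX component API, the five components above are typically wired together as follows (a sketch: the data directory and the Transform module file path are hypothetical):

```python
from tfx.components import (CsvExampleGen, StatisticsGen, SchemaGen,
                            ExampleValidator, Transform)

# Ingest data from a base directory.
example_gen = CsvExampleGen(input_base='data_root')

# Compute statistics over the ingested examples.
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])

# Infer a schema from those statistics.
schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])

# Detect anomalies in the data against the schema.
example_validator = ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])

# Apply a user-defined preprocessing_fn to produce training features.
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file='preprocessing.py')
```

Each component consumes the artifacts produced by the one before it via its `outputs` dictionary, which is how the pipeline’s dependency graph is expressed.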
What are the steps of building a data pipeline using Tensorflow Extended (TFX) to prepare features from a dataset?
C2W2-Assignment
We refer to the outputs of pipeline components as artifacts.