What is feature engineering?
The process of creating or transforming features to make data better suited for machine learning.
What is the main goal of feature engineering?
To improve model performance, reduce computational needs, or enhance interpretability.
What does mutual information measure?
The reduction in uncertainty about the target given a feature; detects any kind of relationship.
How does mutual information differ from correlation?
MI detects any kind of dependence (linear or nonlinear), while (Pearson) correlation only measures linear relationships.
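A minimal sketch of the difference, using scikit-learn's kNN-based MI estimator on a purely quadratic relationship (the data here is synthetic, just for illustration):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = x ** 2  # deterministic but nonlinear: no linear trend at all

# Pearson correlation is near zero despite perfect dependence...
pearson = np.corrcoef(x, y)[0, 1]

# ...while mutual information is clearly positive.
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"Pearson r = {pearson:.3f}, MI = {mi:.3f}")
```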
What is a common method to encode high-cardinality categorical features?
Target encoding (e.g., mean encoding with smoothing).
What is target encoding?
Replacing categories with a number derived from the target (e.g., mean of target per category).
Why is smoothing used in target encoding?
To prevent overfitting and handle rare or missing categories by blending category mean with overall mean.
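One common smoothing formula blends the category mean with the overall mean using a pseudo-count `m`; a sketch in pandas (column names and the value of `m` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "C"],
    "price": [10.0, 12.0, 11.0, 40.0, 44.0, 25.0],
})

m = 2.0                      # smoothing weight (pseudo-count), a tunable choice
overall = df["price"].mean()
stats = df.groupby("city")["price"].agg(["mean", "count"])

# Blend each category mean with the overall mean; rare categories
# (small count) are pulled more strongly toward the overall mean.
encoding = (stats["count"] * stats["mean"] + m * overall) / (stats["count"] + m)
df["city_encoded"] = df["city"].map(encoding)
```

Note how city "C", seen only once, ends up much closer to the overall mean than its raw category mean of 25.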
What is PCA (Principal Component Analysis) used for?
To reduce dimensionality and capture the main axes of variation in the data.
What does PCA produce?
Principal components: new features that are linear combinations of original features.
When should you scale data before PCA?
Whenever features are on different scales, standardize first: PCA follows variance, so large-scale features dominate the components otherwise.
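A short sketch of the scale-then-PCA workflow with scikit-learn, on synthetic data where two correlated features live on very different scales:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=200)
# Feature 2 is the same signal as feature 1, but 1000x larger in scale.
X = np.column_stack([
    base + 0.1 * rng.normal(size=200),
    1000 * (base + 0.1 * rng.normal(size=200)),
])

X_scaled = StandardScaler().fit_transform(X)  # scale first
pca = PCA(n_components=2).fit(X_scaled)

# The features are strongly correlated, so the first component
# captures most of the (standardized) variance.
print(pca.explained_variance_ratio_)
```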
What is K-means clustering used for in feature engineering?
To create cluster labels as new categorical features based on similarity.
What does the ‘k’ in K-means represent?
The number of clusters (centroids) to create.
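A minimal sketch of using K-means labels as a new feature, with two synthetic blobs (the cluster count k = 2 is chosen to match the data):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated blobs in 2-D feature space.
X = np.vstack([
    rng.normal(0, 0.5, size=(50, 2)),
    rng.normal(5, 0.5, size=(50, 2)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_label = km.labels_  # one categorical label per row, usable as a feature
```

The labels themselves are arbitrary integers, so downstream models should treat them as categorical, not ordinal.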
What is a group transform?
Aggregating information across rows grouped by a category (e.g., average income by state).
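The "average income by state" example maps directly onto pandas `groupby(...).transform`, which broadcasts the group aggregate back to every row (the data below is made up):

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["CA", "CA", "NY", "NY", "NY"],
    "income": [50.0, 70.0, 60.0, 80.0, 100.0],
})

# Each row receives the mean income of its own state as a new feature.
df["avg_income_by_state"] = df.groupby("state")["income"].transform("mean")
```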
How can you create interaction features?
By combining two or more features (e.g., multiplying, adding, or concatenating).
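A sketch of both flavors of interaction, numeric and categorical (the column names are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "length": [2.0, 3.0],
    "width": [4.0, 5.0],
    "color": ["red", "blue"],
    "size": ["S", "L"],
})

df["area"] = df["length"] * df["width"]            # numeric x numeric product
df["color_size"] = df["color"] + "_" + df["size"]  # categorical concatenation
```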
What is a mathematical transform in feature engineering?
Applying arithmetic operations or functions (e.g., log, square, ratio) to features.
What is a count feature?
A feature created by counting how many of a set of binary/boolean features are true (present) for each row.
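In pandas this is a row-wise sum over the boolean columns; a sketch with hypothetical amenity flags:

```python
import pandas as pd

df = pd.DataFrame({
    "has_pool": [1, 0, 1],
    "has_garage": [1, 1, 0],
    "has_garden": [0, 0, 0],
})

amenities = ["has_pool", "has_garage", "has_garden"]
# Count how many amenities each row has.
df["amenity_count"] = df[amenities].sum(axis=1)
```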
What is a useful way to handle skewed numerical features?
Applying a log transformation to reduce skew and make the distribution more symmetric.
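A minimal sketch using `np.log1p` (log(1 + x)), a common choice because it also handles zero values safely:

```python
import numpy as np

skewed = np.array([1.0, 2.0, 5.0, 10.0, 1000.0])  # heavy right tail
transformed = np.log1p(skewed)

# The extreme value is pulled in dramatically: 1000 -> about 6.9,
# while the spacing among the small values is preserved.
print(transformed)
```

Remember to invert with `np.expm1` if predictions must be reported on the original scale.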
What is the purpose of feature utility metrics?
To rank features by their potential usefulness for predicting the target.
What is an example of a domain-motivated feature?
Creating a feature like ‘apparent temperature’ from temperature, humidity, and wind speed.
Why is feature engineering important for linear models?
Linear models can only learn linear relationships; transformations can make relationships linear.
What is a common pitfall when using target encoding?
Overfitting via target leakage, especially with rare categories or when the encoding is fit on the same rows used for training instead of a held-out encoding split.
How can PCA help with multicollinearity?
It creates uncorrelated components, reducing redundancy among features.
What is a Voronoi tessellation in K-means?
The partition of feature space into regions where each point belongs to the nearest centroid.
What is the difference between overfitting and underfitting in feature engineering?
Overfitting: capturing noise; Underfitting: missing important patterns.