What is feature engineering?
The process of creating or transforming features to make data better suited for machine learning.
What is the main goal of feature engineering?
To improve model performance, reduce computational needs, or enhance interpretability.
What does mutual information measure?
The reduction in uncertainty about the target given a feature; detects any kind of relationship.
How does mutual information differ from correlation?
MI detects any kind of dependence (linear or nonlinear), while (Pearson) correlation only measures linear relationships.
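A minimal sketch of the difference, using scikit-learn's kNN-based MI estimator on a purely quadratic relationship (the data here is synthetic, just for illustration):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = x ** 2  # deterministic but nonlinear: no linear trend at all

# Pearson correlation is near zero despite perfect dependence...
pearson = np.corrcoef(x, y)[0, 1]

# ...while mutual information is clearly positive.
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"Pearson r = {pearson:.3f}, MI = {mi:.3f}")
```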
What is a common method to encode high-cardinality categorical features?
Target encoding (e.g., mean encoding with smoothing).
What is target encoding?
Replacing categories with a number derived from the target (e.g., mean of target per category).
Why is smoothing used in target encoding?
To prevent overfitting and handle rare or missing categories by blending category mean with overall mean.
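One common smoothing formula blends the category mean with the overall mean using a pseudo-count `m`; a sketch in pandas (column names and the value of `m` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "C"],
    "price": [10.0, 12.0, 11.0, 40.0, 44.0, 25.0],
})

m = 2.0                      # smoothing weight (pseudo-count), a tunable choice
overall = df["price"].mean()
stats = df.groupby("city")["price"].agg(["mean", "count"])

# Blend each category mean with the overall mean; rare categories
# (small count) are pulled more strongly toward the overall mean.
encoding = (stats["count"] * stats["mean"] + m * overall) / (stats["count"] + m)
df["city_encoded"] = df["city"].map(encoding)
```

Note how city "C", seen only once, ends up much closer to the overall mean than its raw category mean of 25.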
What is PCA (Principal Component Analysis) used for?
To reduce dimensionality and capture the main axes of variation in the data.
What does PCA produce?
Principal components: new features that are linear combinations of original features.
When should you scale data before PCA?
Whenever features are on different scales, standardize first: PCA follows variance, so large-scale features dominate the components otherwise.
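A short sketch of the scale-then-PCA workflow with scikit-learn, on synthetic data where two correlated features live on very different scales:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=200)
# Feature 2 is the same signal as feature 1, but 1000x larger in scale.
X = np.column_stack([
    base + 0.1 * rng.normal(size=200),
    1000 * (base + 0.1 * rng.normal(size=200)),
])

X_scaled = StandardScaler().fit_transform(X)  # scale first
pca = PCA(n_components=2).fit(X_scaled)

# The features are strongly correlated, so the first component
# captures most of the (standardized) variance.
print(pca.explained_variance_ratio_)
```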
What is K-means clustering used for in feature engineering?
To create cluster labels as new categorical features based on similarity.
What does the ‘k’ in K-means represent?
The number of clusters (centroids) to create.
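A minimal sketch of using K-means labels as a new feature, with two synthetic blobs (the cluster count k = 2 is chosen to match the data):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated blobs in 2-D feature space.
X = np.vstack([
    rng.normal(0, 0.5, size=(50, 2)),
    rng.normal(5, 0.5, size=(50, 2)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_label = km.labels_  # one categorical label per row, usable as a feature
```

The labels themselves are arbitrary integers, so downstream models should treat them as categorical, not ordinal.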
What is a group transform?
Aggregating information across rows grouped by a category (e.g., average income by state).
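The "average income by state" example maps directly onto pandas `groupby(...).transform`, which broadcasts the group aggregate back to every row (the data below is made up):

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["CA", "CA", "NY", "NY", "NY"],
    "income": [50.0, 70.0, 60.0, 80.0, 100.0],
})

# Each row receives the mean income of its own state as a new feature.
df["avg_income_by_state"] = df.groupby("state")["income"].transform("mean")
```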
How can you create interaction features?
By combining two or more features (e.g., multiplying, adding, or concatenating).
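A sketch of both flavors of interaction, numeric and categorical (the column names are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "length": [2.0, 3.0],
    "width": [4.0, 5.0],
    "color": ["red", "blue"],
    "size": ["S", "L"],
})

df["area"] = df["length"] * df["width"]            # numeric x numeric product
df["color_size"] = df["color"] + "_" + df["size"]  # categorical concatenation
```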
What is a mathematical transform in feature engineering?
Applying arithmetic operations or functions (e.g., log, square, ratio) to features.
What is a count feature?
A feature created by counting how many of a set of binary/boolean features are true (present) for each row.
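In pandas this is a row-wise sum over the boolean columns; a sketch with hypothetical amenity flags:

```python
import pandas as pd

df = pd.DataFrame({
    "has_pool": [1, 0, 1],
    "has_garage": [1, 1, 0],
    "has_garden": [0, 0, 0],
})

amenities = ["has_pool", "has_garage", "has_garden"]
# Count how many amenities each row has.
df["amenity_count"] = df[amenities].sum(axis=1)
```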
What is a useful way to handle skewed numerical features?
Applying a log transformation to reduce skew and make the distribution more symmetric.
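A minimal sketch using `np.log1p` (log(1 + x)), a common choice because it also handles zero values safely:

```python
import numpy as np

skewed = np.array([1.0, 2.0, 5.0, 10.0, 1000.0])  # heavy right tail
transformed = np.log1p(skewed)

# The extreme value is pulled in dramatically: 1000 -> about 6.9,
# while the spacing among the small values is preserved.
print(transformed)
```

Remember to invert with `np.expm1` if predictions must be reported on the original scale.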
What is the purpose of feature utility metrics?
To rank features by their potential usefulness for predicting the target.
What is an example of a domain-motivated feature?
Creating a feature like ‘apparent temperature’ from temperature, humidity, and wind speed.
Why is feature engineering important for linear models?
Linear models can only learn linear relationships; transformations can make relationships linear.
What is a common pitfall when using target encoding?
Overfitting via target leakage, especially with rare categories or when the encoding is fit on the same rows used for training instead of a held-out encoding split.
How can PCA help with multicollinearity?
It creates uncorrelated components, reducing redundancy among features.
What is a Voronoi tessellation in K-means?
The partition of feature space into regions where each point belongs to the nearest centroid.
What is the difference between overfitting and underfitting in feature engineering?
Overfitting: capturing noise; Underfitting: missing important patterns.