Feature Selection Algorithms Flashcards

(17 cards)

1
Q

Describe the 2 methods of feature selection

A

Scalar methods -
Consider each feature independently, evaluating its importance and relevance to the task. Do not take into account the relationships or dependencies between features

Vector methods -
Consider the joint distribution or relationships between features

Evaluate features in groups or as a whole, taking into account interactions between them

More computationally intensive than scalar methods, but can lead to better feature selection when features interact in complex ways

2
Q

Key characteristics of scalar methods

A

Each feature is assessed on its own merit

They are simple and computationally efficient

Suitable for problems where features are independent or have minimal interactions

3
Q

Common techniques of scalar methods

A

Filter methods: Apply a statistical measure to evaluate the relationship between a feature and a target variable

Univariate selection / Statistical tests: features are ranked based on a statistical test and the top-ranked ones are selected
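A minimal sketch of univariate selection in Python. The choice of absolute Pearson correlation as the ranking statistic, and the toy data, are illustrative assumptions:

```python
# Univariate (scalar) selection sketch: score each feature column
# independently by |Pearson correlation| with the target, keep the top n.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def univariate_select(X, y, n):
    # X is a list of feature columns; each is scored on its own
    scores = [abs(pearson(col, y)) for col in X]
    ranked = sorted(range(len(X)), key=lambda k: scores[k], reverse=True)
    return ranked[:n]

X = [[1, 2, 3, 4],      # strongly correlated with y
     [4, 1, 3, 2],      # weakly related
     [2, 4, 6, 8.5]]    # also strongly correlated
y = [10, 20, 30, 40]
print(univariate_select(X, y, 2))   # indices of the 2 top-ranked features
```

Note that features 0 and 2 are selected even though they are nearly copies of each other; this is exactly the redundancy a scalar method cannot see.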

4
Q

Key characteristics of vector methods

A

Features are considered together in their joint distributions

Can capture correlations and interactions between features

More computationally expensive but more powerful when dealing with correlated features

5
Q

Define wrapper methods

A

Vector methods which evaluate subsets of features by training a model on them and measuring the model’s performance

6
Q

3 examples of wrapper methods

A

Forward selection: Starts with an empty set and adds the best remaining feature one at a time

Backward elimination: Starts with all features and removes the least useful ones step by step

Recursive Feature Elimination: Recursively removes the least important features based on model performance
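Recursive Feature Elimination can be sketched in a few lines. Assuming a plain least-squares linear model as the wrapped estimator (and made-up data), the loop repeatedly fits the model and drops the feature with the smallest absolute coefficient:

```python
import numpy as np

# RFE sketch: fit a linear least-squares model on the remaining features,
# eliminate the one with the smallest |coefficient|, repeat until n remain.

def rfe(X, y, n_keep):
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        coef, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        weakest = int(np.argmin(np.abs(coef)))
        remaining.pop(weakest)          # drop the least important feature
    return remaining

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.01 * rng.normal(size=100)
print(sorted(rfe(X, y, 2)))             # features 0 and 1 drive y
```

Because the model is retrained on each candidate subset, this is a vector method: a feature's score depends on which other features are still present.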

7
Q

Define embedded methods

A

Vector method that performs feature selection during the model training process

8
Q

Give 4 examples of embedded methods

A

Lasso

Decision trees/Random forests

Principal Component Analysis

Independent Component analysis

9
Q

Describe the lasso method

A

Penalises the absolute value of coefficients, effectively setting some coefficients to zero
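The mechanism behind this can be illustrated with the soft-thresholding operator, which is how the L1 penalty acts on each least-squares coefficient (under an orthonormal design; the coefficients and penalty value below are made up):

```python
# Why the lasso zeroes coefficients: the L1 penalty soft-thresholds each
# coefficient - shrink by lam, and set to exactly zero inside [-lam, lam].

def soft_threshold(beta, lam):
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0

ols_coefs = [2.5, -0.3, 0.8, 0.05]
lasso_coefs = [soft_threshold(b, lam=0.5) for b in ols_coefs]
print(lasso_coefs)   # small coefficients are driven exactly to zero
```

Features whose coefficients land at exactly zero are effectively deselected, which is why the lasso performs feature selection as a side effect of training.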

10
Q

Describe the decision tree/random forest method

A

Feature importance can be derived from tree-based models by examining the importance of each feature
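A stripped-down version of this idea: for each feature, fit a one-split "decision stump" and measure how much the best threshold split reduces the variance of the target. Full tree ensembles accumulate such impurity reductions over many splits; the toy data here is an assumption:

```python
# Tree-style importance sketch: per feature, the best single split's
# reduction in target variance (the impurity decrease a stump achieves).

def variance(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def stump_importance(x, y):
    base = variance(y)
    best = 0.0
    for t in sorted(set(x))[:-1]:       # candidate thresholds
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        w = (len(left) * variance(left)
             + len(right) * variance(right)) / len(y)
        best = max(best, base - w)       # biggest impurity reduction
    return best

X = [[1, 2, 3, 4, 5, 6],   # informative: y jumps when this exceeds 3
     [5, 1, 4, 2, 6, 3]]   # shuffled, uninformative
y = [0, 0, 0, 10, 10, 10]
scores = [stump_importance(col, y) for col in X]
print(scores.index(max(scores)))   # feature 0 gives the biggest reduction
```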

11
Q

Describe the PCA method

A

Transforms the feature space into a new set of orthogonal axes that capture the maximum variance of the data. Not strictly a feature selection method; rather, it reduces the feature space by creating a smaller set of uncorrelated features
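A minimal PCA sketch via eigendecomposition of the covariance matrix, on made-up data with one redundant feature:

```python
import numpy as np

# PCA sketch: centre the data, then project it onto the eigenvectors of
# the covariance matrix with the largest eigenvalues. The resulting
# components are orthogonal (uncorrelated) axes of maximum variance.

def pca(X, n_components):
    Xc = X - X.mean(axis=0)                  # centre each feature
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)         # ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return Xc @ top                          # project onto top axes

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + X[:, 1]                  # redundant third feature
Z = pca(X, 2)
print(Z.shape)                               # (200, 2)
```

The redundant feature contributes almost nothing beyond the first two components, so two uncorrelated axes retain essentially all the variance of the three original features.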

12
Q

Describe the ICA method

A

Similar to PCA, ICA focuses on separating statistically independent components rather than uncorrelated components

13
Q

Describe the process of simple scalar feature selection

A

Choose a 1-dimensional class separability criterion, C (something that evaluates one feature at a time, e.g. divergence)

The value of C(k) is computed for each feature, k

Select the n features corresponding to the n best values of C(k)

Simple to perform but does not consider correlation between features
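The steps above can be sketched directly. Taking C(k) to be the Fisher discriminant ratio is my assumption for the separability criterion, and the two-class data is made up:

```python
# Simple scalar selection sketch: C(k) = Fisher discriminant ratio
# (m1 - m2)^2 / (v1 + v2) per feature; keep the n best-scoring features.

def fisher_ratio(x, labels):
    a = [v for v, c in zip(x, labels) if c == 0]
    b = [v for v, c in zip(x, labels) if c == 1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / len(a)
    vb = sum((v - mb) ** 2 for v in b) / len(b)
    return (ma - mb) ** 2 / (va + vb)

def select_scalar(X, labels, n):
    scores = [fisher_ratio(col, labels) for col in X]   # C(k) per feature
    return sorted(range(len(X)), key=lambda k: scores[k], reverse=True)[:n]

X = [[0, 1, 0, 9, 10, 9],   # well separated between classes
     [5, 6, 5, 6, 5, 6],    # no separation
     [1, 0, 1, 4, 5, 4]]    # moderately separated
labels = [0, 0, 0, 1, 1, 1]
print(select_scalar(X, labels, 2))   # the two most separable features
```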

14
Q

Describe the improved scalar feature selection method

A

Calculate the value of C(k) for each feature, k as before

Select the feature with the largest C(k)

Calculate C’(k) for each of the remaining features:

C’(k) = C(k) - p(selected feature, feature k)

where p(·,·) is the cross-correlation between the feature already selected and feature k.

This gives the next best feature which does not correlate with the feature already selected.

In full, C’(k) has weights:
C’(k) = a1·C(k) - a2·p(selected feature, feature k)
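A sketch of one step of this method. The C(k) scores, the weights a1 = a2 = 1, and the data are illustrative assumptions; Pearson correlation stands in for the cross-correlation p:

```python
# Improved scalar selection sketch: pick the best feature by C(k), then
# re-score the rest as C'(k) = a1*C(k) - a2*|rho(selected, k)|.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def improved_scalar(X, scores, a1=1.0, a2=1.0):
    best = max(range(len(X)), key=lambda k: scores[k])
    rest = [k for k in range(len(X)) if k != best]
    # penalise correlation with the feature already selected
    adjusted = {k: a1 * scores[k] - a2 * abs(pearson(X[best], X[k]))
                for k in rest}
    return best, max(rest, key=lambda k: adjusted[k])

X = [[1, 2, 3, 4],        # best by C(k)
     [1.1, 2, 3, 4.2],    # nearly a copy of the best feature
     [4, 1, 3, 2]]        # weaker C(k) but uncorrelated with the best
scores = [0.9, 0.8, 0.5]  # assumed C(k) values
print(improved_scalar(X, scores))
```

Plain scalar selection would pick features 0 and 1; the correlation penalty instead pairs feature 0 with the uncorrelated feature 2.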

15
Q

Describe the Sequential forward selection algorithm

A

Start with empty vector and progressively add features

At each iteration, try adding each remaining feature in turn and keep the one that gives the best separability measure for the enlarged subset

Repeat until the vector is of the required length
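A sketch of sequential forward selection. Using the R² of a least-squares fit as the subset score is my stand-in for the separability measure, and the data is made up:

```python
import numpy as np

# Sequential forward selection sketch: grow the subset one feature at a
# time, each step keeping whichever feature most improves the subset score.

def r2(X, y, subset):
    A = X[:, subset]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def sfs(X, y, n):
    selected = []
    while len(selected) < n:
        candidates = [k for k in range(X.shape[1]) if k not in selected]
        best = max(candidates, key=lambda k: r2(X, y, selected + [k]))
        selected.append(best)        # keep the feature that helps most
    return selected

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 1] + 1.0 * X[:, 3] + 0.01 * rng.normal(size=100)
print(sorted(sfs(X, y, 2)))          # features 1 and 3 explain y
```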

16
Q

Describe the Sequential backward selection algorithm

A

Start by selecting all the features

Evaluate the impact of removing each feature one at a time

Select the feature whose removal has the least impact on model performance, and remove it

Keep selecting and removing until n features are left
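The mirror-image sketch for backward selection, under the same assumptions (R² of a least-squares fit as the subset score, made-up data):

```python
import numpy as np

# Sequential backward selection sketch: start from all features and, at
# each step, remove the feature whose removal hurts the score least.

def r2(X, y, subset):
    A = X[:, subset]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def sbs(X, y, n):
    selected = list(range(X.shape[1]))
    while len(selected) > n:
        # drop the feature whose removal leaves the best remaining score
        drop = max(selected,
                   key=lambda k: r2(X, y, [j for j in selected if j != k]))
        selected.remove(drop)
    return selected

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] + 1.5 * X[:, 4] + 0.01 * rng.normal(size=100)
print(sorted(sbs(X, y, 2)))          # only the informative features survive
```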

17
Q

When to use forward/backward selection

A

Both are suboptimal

If the desired number of features is close to the total number of features, choose backward selection

If the desired number of features is closer to 1, choose forward selection