What are Categorical Variables?
What import statement is needed for OneHotEncoder?
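A minimal sketch of the import, assuming scikit-learn is installed; the encoder lives in the sklearn.preprocessing module.

```python
from sklearn.preprocessing import OneHotEncoder

# Instantiate with default settings; nothing is fitted yet.
encoder = OneHotEncoder()
```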
What is ordinal encoding?
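As a rough illustration, ordinal encoding maps each category to an integer code that respects a chosen order. The sketch below uses scikit-learn's OrdinalEncoder; the "size" column and its ordering are made up for the example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data with a natural order: small < medium < large.
df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Pass the order explicitly; otherwise categories are sorted alphabetically.
enc = OrdinalEncoder(categories=[["small", "medium", "large"]])
codes = enc.fit_transform(df[["size"]])

print(codes.ravel())  # [0. 2. 1. 0.]
```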
What is one-hot and dummy encoding?
How do we use pandas to apply one-hot or dummy encoding to a dataframe?
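One possible answer, sketched with pd.get_dummies on a toy dataframe (the "color" column is illustrative, not from any real dataset). By default this produces a one-hot encoding: one indicator column per category.

```python
import pandas as pd

# Hypothetical toy data; the column name "color" is illustrative only.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encoding: one 0/1 indicator column per category.
one_hot = pd.get_dummies(df, columns=["color"], dtype=int)

print(one_hot.columns.tolist())  # ['color_blue', 'color_green', 'color_red']
```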
How do we change this one-hot encoding to dummy encoding?
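A sketch of the change, assuming the one-hot encoding was produced with pd.get_dummies: passing drop_first=True drops the first category's column, leaving k-1 indicator columns for k categories. The toy "color" data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the column name "color" is illustrative only.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Dummy encoding: drop the first (alphabetically sorted) category, here "blue".
dummy = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=int)

print(dummy.columns.tolist())  # ['color_green', 'color_red']
```

A row of all zeros then stands for the dropped category.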
How do we use scikit-learn for one-hot encoding for a pandas df? How is it different from pandas?
This is what is returned when sparse_output is left at its default of True rather than set to False:
<Compressed Sparse Row sparse matrix of dtype 'float64'
with 5 stored elements and shape (5, 3)>
How do we use scikit-learn's encoder to get dummy encoding?
When do you use dummy encoding vs. one-hot encoding?
Say you have used scikit-learn for one-hot encoding: how do we convert the result back into a dataframe? What do we need to be mindful of?
What are mixed or heterogeneous data types?
Datasets that have both numerical and categorical features have mixed data types, also called multiple variable types or heterogeneous data.
How would you write code to perform one-hot and dummy encoding on a mixed dataset, laptop_price.csv?
For a mixed dataset, how could you explicitly specify the categorical features for one-hot encoding?
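A hedged sketch using pd.get_dummies with an explicit columns list. The contents of laptop_price.csv are not shown in these notes, so a small stand-in dataframe is used; its column names and values are hypothetical.

```python
import pandas as pd

# Stand-in for pd.read_csv("laptop_price.csv"); columns and values are hypothetical.
laptops = pd.DataFrame({
    "Company": ["Apple", "HP", "Dell"],
    "Ram": [8, 16, 8],
    "Price": [1339.69, 575.00, 898.94],
})

# Name the categorical features explicitly instead of relying on dtype detection.
categorical_cols = ["Company"]

one_hot = pd.get_dummies(laptops, columns=categorical_cols, dtype=int)
dummy = pd.get_dummies(laptops, columns=categorical_cols,
                       drop_first=True, dtype=int)

print(one_hot.shape, dummy.shape)  # (3, 5) (3, 4)
```

Numerical columns such as "Ram" and "Price" pass through unchanged.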
How could you separate out the numerical and categorical columns?
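One option is DataFrame.select_dtypes, sketched here on a stand-in dataframe with hypothetical columns. Treating every non-numeric column as categorical is an assumption that should be checked against the real data.

```python
import pandas as pd

# Hypothetical mixed-type data standing in for a real dataset.
laptops = pd.DataFrame({
    "Company": ["Apple", "HP", "Dell"],
    "Ram": [8, 16, 8],
    "Price": [1339.69, 575.00, 898.94],
})

# Numeric dtypes on one side, everything else on the other.
numeric_cols = laptops.select_dtypes(include="number").columns.tolist()
categorical_cols = laptops.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)      # ['Ram', 'Price']
print(categorical_cols)  # ['Company']
```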
How is inconsistent preprocessing a common pitfall?
Common pitfalls: What is data leakage?
How can we avoid data leakage when pre-processing?
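One commonly recommended approach is to wrap preprocessing and model in a Pipeline, so that during cross-validation the transformer is fitted only on each training fold. A minimal sketch on synthetic data (the scaler and classifier are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The scaler is refitted inside every CV split, on the training part only.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=5)

print(len(scores))  # 5
```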
What preprocessing could we perform here that would lead to data leakage?
How can we correct for this preprocessing error that is causing data leakage?
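A sketch of the usual correction, assuming the leaking step was fitting a transformer on the full dataset before splitting: split first, fit on the training subset only, then reuse the fitted transformer on the test subset. The data and the choice of StandardScaler are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = rng.randint(0, 2, size=100)

# 1. Split before any preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 2. Learn the scaling statistics from the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 3. Apply the same fitted scaler to the test data; never call fit on it.
X_test_scaled = scaler.transform(X_test)
```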
What property do some scikit-learn objects have inherently? How does this end up leading to preprocessing errors?
What does the random_state parameter determine?
What happens to our estimators if we pass RandomState instances to random_state?
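A small experiment that may help answer this, using SGDClassifier on synthetic data (both are arbitrary choices). With an integer seed, repeated calls to fit start from the same random state; with a RandomState instance, each fit consumes the shared generator, so repeated fits are not guaranteed to match.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Integer seed: every call to fit restarts from the same RNG state.
est_int = SGDClassifier(random_state=0)
coef_a = est_int.fit(X, y).coef_.copy()
coef_b = est_int.fit(X, y).coef_.copy()
print(np.allclose(coef_a, coef_b))  # True

# RandomState instance: the second fit continues from where the first left off.
rng = np.random.RandomState(0)
est_rng = SGDClassifier(random_state=rng)
coef_c = est_rng.fit(X, y).coef_.copy()
coef_d = est_rng.fit(X, y).coef_.copy()
print(np.allclose(coef_c, coef_d))  # not guaranteed to be True
```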
MIGHT NEED ADJUSTING
What are CV Splitters?
When do we want to pass an integer vs. a RandomState instance to an estimator?
When RandomState instances are passed to CV splitters, what occurs?
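A sketch that can be run to observe the behaviour, using KFold with shuffle=True on ten toy samples. The helper held_out_folds is a hypothetical name introduced only for this example. With an integer seed, every call to split yields the same folds; with a RandomState instance, each call draws from the shared generator, so the folds are not guaranteed to repeat.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # ten toy samples

def held_out_folds(cv):
    """Return the test indices of each fold as plain lists (illustrative helper)."""
    return [test.tolist() for _, test in cv.split(X)]

# Integer seed: split() gives identical folds on every call.
cv_int = KFold(n_splits=5, shuffle=True, random_state=0)
same = held_out_folds(cv_int) == held_out_folds(cv_int)
print(same)  # True

# RandomState instance: each call to split() advances the generator.
cv_rng = KFold(n_splits=5, shuffle=True,
               random_state=np.random.RandomState(0))
print(held_out_folds(cv_rng) == held_out_folds(cv_rng))  # not guaranteed to be True
```

This matters when two estimators are compared with the same splitter object: if the folds change between calls, the comparison is no longer on identical splits.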