What is exploratory data analysis (EDA)?
The process of going through a dataset and discovering more about it.
EDA helps in understanding the data’s structure, patterns, and anomalies.
What is model training?
Create model(s) to learn to predict a target variable based on other variables.
This involves using training data to adjust the model parameters.
What does model evaluation entail?
Evaluating a model’s predictions using problem-specific evaluation metrics.
Common metrics include accuracy, precision, recall, and F1 score.
What is the purpose of model comparison?
Comparing several different models to find the best one.
This helps in identifying which model performs best under given conditions.
What is model hyperparameter tuning?
Tweaking a model’s hyperparameters to improve it after finding a good model.
Hyperparameters are settings that are not learned from the data but are set prior to training.
What does feature importance refer to?
Identifying features/characteristics that are more important for predicting heart disease.
Feature importance helps in understanding which variables have the most influence on the prediction.
What is cross-validation?
A method to ensure a good model works on unseen data.
It involves partitioning the dataset into subsets to evaluate the model’s performance.
What should be included in reporting what we’ve found?
Presenting the work and findings in a clear manner.
This may include visualizations, key metrics, and insights derived from the analysis.
What are attributes in the context of predictive modeling?
Attributes are the variables used to predict the target variable
What are attributes also referred to as?
Independent variables
What is the target variable in predictive modeling?
The dependent variable
Fill in the blank: Attributes are also called _______.
features
True or False: The target variable can also be referred to as an independent variable.
False
What is an evaluation metric?
An evaluation metric is something you usually define at the start of a project.
Why can evaluation metrics change over time?
Because machine learning is very experimental.
What example goal might a project start with?
Reach 95% accuracy at predicting whether or not a patient has heart disease.
What is the purpose of setting a goal for machine learning engineers?
It provides a rough goal to work towards.
What may happen to the project goal as it progresses?
It may have to be adjusted based on real-world testing.
What are features in the context of data?
Different parts and characteristics of the data.
What should you do during the step of identifying important features?
Start exploring what each portion of the data relates to and create a reference.
What is a common way to document features of data?
Create a data dictionary.
What is a data dictionary?
A data dictionary describes the data you’re dealing with.
It provides metadata about the data elements in a dataset.
Do all datasets come with a data dictionary?
No, not all datasets come with data dictionaries.
This may require additional research or consultation with a subject matter expert.
What should you do if a dataset does not have a data dictionary?
You may have to do your research or ask a subject matter expert.
A subject matter expert is someone who knows about the data.