What does NLP stand for?
Natural Language Processing
NLP is the process of turning messy human language into something machines can analyze and learn from.
Why is NLP considered challenging?
These factors make it difficult for machines to understand human language accurately.
What is the Bag of Words / TF-IDF technique used for in NLP?
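Turns text into numbers: Bag of Words counts how often each word appears, and TF-IDF reweights those counts
TF-IDF gives low weight to words that appear in most documents, like ‘and’, and high weight to words that are rare across the collection.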
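A minimal sketch of both ideas in plain Python may help here. The three-document corpus is made up for illustration, and the IDF formula is the basic unsmoothed one, log(N / document frequency); libraries such as scikit-learn use a smoothed variant, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus; each document is already lowercased.
docs = [
    "the cat and the dog",
    "the cat sat",
    "the bird and the cat sang",
]
tokenized = [d.split() for d in docs]

# Bag of Words: raw term counts per document.
bags = [Counter(tokens) for tokens in tokenized]


def idf(term):
    # Inverse document frequency: log(N / number of documents containing the term).
    doc_freq = sum(1 for tokens in tokenized if term in tokens)
    return math.log(len(tokenized) / doc_freq)


def tf_idf(term, doc_index):
    # Term frequency (count / document length) scaled by IDF.
    tokens = tokenized[doc_index]
    return bags[doc_index][term] / len(tokens) * idf(term)


print(bags[0]["the"])    # raw count of "the" in the first document
print(tf_idf("the", 0))  # "the" occurs in every document, so its weight is 0
print(tf_idf("dog", 0))  # "dog" occurs in one document, so its weight is higher
```

The word that appears everywhere carries no weight, while the word unique to one document stands out.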
What does tokenization do?
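Splits text into units (tokens) such as words or sentences, often dropping punctuation along the way
Splitting can break meaning, for example when a multi-word name such as ‘New York’ becomes two unrelated tokens.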
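As a rough illustration, a regex-based word tokenizer can be sketched as follows. The pattern and the sample sentence are assumptions chosen for the example, not a standard tokenizer.

```python
import re


def tokenize(text):
    # Keep runs of letters, digits, and apostrophes; everything else separates tokens.
    return re.findall(r"[A-Za-z0-9']+", text.lower())


tokens = tokenize("Dr. O'Neil moved to New York.")
print(tokens)
```

The punctuation disappears, and the place name ends up as two separate tokens, which is the kind of lost meaning the card warns about.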
What is the purpose of stop word removal?
Removes very common, low-information words like ‘the’, ‘and’, ‘to’
This makes text smaller and faster to process, but care must be taken not to remove meaning: dropping ‘not’, for instance, can flip the sense of a sentence.
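A small sketch shows both the benefit and the risk. The stop word list here is hypothetical and deliberately includes ‘not’ to show how meaning can be lost.

```python
# Hypothetical stop word list; real lists are much longer.
STOP_WORDS = {"the", "and", "to", "a", "is", "not"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    # Keep only the tokens that are not in the stop word set.
    return [t for t in tokens if t not in stop_words]


tokens = "the movie is not good".split()

# With "not" treated as a stop word, the negation is lost.
filtered = remove_stop_words(tokens)

# Keeping "not" out of the stop list preserves the meaning.
safer = remove_stop_words(tokens, STOP_WORDS - {"not"})

print(filtered)
print(safer)
```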
What is stemming in NLP?
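Chops words down to a base form (stem) by stripping suffixes with simple rules
For example, ‘running’ becomes ‘run’, and an aggressive stemmer reduces ‘faster’ to ‘fast’. The result is not always a real word: the Porter stemmer turns ‘studies’ into ‘studi’.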
What is the difference between stemming and lemmatization?
Stemming strips suffixes by rule; lemmatization maps each word to its dictionary form (lemma)
Lemmatization keeps meaning better, e.g., ‘coding’, ‘coded’, ‘codes’ → ‘code’.
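The contrast can be sketched with two toy functions. Both are hypothetical simplifications: `naive_stem` is a crude suffix stripper, not a real stemming algorithm, and `toy_lemmatize` uses a three-entry lookup table where a real lemmatizer would use a full dictionary.

```python
def naive_stem(word):
    # Strip the first matching suffix; no dictionary is consulted.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma table standing in for a real dictionary.
LEMMAS = {"coding": "code", "coded": "code", "codes": "code"}


def toy_lemmatize(word):
    # Look the word up; fall back to the word itself if it is unknown.
    return LEMMAS.get(word, word)


words = ["coding", "coded", "codes"]
stems = [naive_stem(w) for w in words]
lemmas = [toy_lemmatize(w) for w in words]

print(stems)
print(lemmas)
```

The stemmer collapses all three words to the same non-word stem, while the lookup returns a real word.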
What do recommender systems do?
Suggest things you’ll probably like
They are used by platforms like Netflix, Amazon, and Spotify.
What is Collaborative Filtering?
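Based on past behavior: yours and that of users with similar tastes
Example: ‘You liked X, so try Y’, because other people who liked X also liked Y. It improves as more ratings come in but struggles with new users who have no history yet (the cold-start problem).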
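One possible way to sketch this is user-based filtering with cosine similarity. The users, items, and ratings below are invented for illustration, and the scoring (a similarity-weighted average of other users' ratings) is only one of several common choices.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ana": {"X": 5, "Y": 4, "Z": 1},
    "ben": {"X": 5, "Y": 5},
    "cara": {"X": 1, "Z": 5},
    "dev": {"X": 5},  # the user we want recommendations for
}


def cosine(a, b):
    # Cosine similarity between two sparse rating dictionaries.
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user):
    # Rank unseen items by the similarity-weighted average rating of other users.
    seen = ratings[user]
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)


print(recommend("dev"))
```

A user with an empty rating dictionary has zero similarity to everyone, so `recommend` returns an empty list for them, which is the new-user weakness in miniature.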
What is Content-Based Filtering?
Based on who you are and what the item is
It uses factors like age, gender, preferences, and item features.
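A minimal sketch of the item-feature side, assuming a made-up catalog tagged with genres. User attributes such as age or gender are left out here to keep the example short; only the features of items the user already liked are used.

```python
# Hypothetical catalog: item -> set of descriptive features.
items = {
    "Alien": {"sci-fi", "horror"},
    "Arrival": {"sci-fi", "drama"},
    "Notting Hill": {"romance", "comedy"},
}


def recommend(liked, catalog=items):
    # Build a profile from the features of everything the user liked.
    profile = set()
    for title in liked:
        profile |= catalog[title]
    # Score each unseen item by how many features it shares with the profile.
    scores = {
        title: len(features & profile)
        for title, features in catalog.items()
        if title not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)


ranking = recommend(["Alien"])
print(ranking)
```

Unlike collaborative filtering, this needs no other users: one liked item is enough to rank the rest of the catalog.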
True or false: NLP helps machines extract meaning from human language.
TRUE
It involves both simple methods that count words and advanced methods that try to understand context.
What is the final takeaway regarding NLP and recommender systems?
NLP helps machines extract meaning; cleaned text feeds into ML models for recommendations
Human-level language understanding remains a significant AI research goal.
What is the definition of Machine learning?
A data-driven approach that uses algorithms to learn patterns and relationships from data without being explicitly programmed
Machine learning enables systems to improve their performance on tasks through experience.
In machine learning, what does the developer provide to the algorithm?
Data
The algorithm uses this information to learn and create a model.
What is created after the algorithm is trained on the provided data?
A model
The trained model is used for predicting behaviors and outputs.
True or false: The trained model in machine learning can be used for decision-making on unseen data.
TRUE
This capability allows for predictions based on new inputs.
List some practical applications of machine learning.
These applications demonstrate the versatility of machine learning across various industries.
What is the first step in the supervised ML project workflow in scikit-learn?
Data
The workflow follows a loop: Data → split → fit model → predict → (evaluate) → save model.
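The whole workflow can be sketched end to end in scikit-learn. The choice of dataset (the built-in iris data), model (`LogisticRegression`), metric, and file name are assumptions made for this example; the deck itself does not name them.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data: flower measurements (X) and species labels (y).
X, y = load_iris(return_X_y=True)

# Split: hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=101
)

# Fit model: the classifier here is an assumed choice for illustration.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict, then evaluate on the held-out rows.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(accuracy)

# Save model: write it to a temporary location and load it back.
model_path = os.path.join(tempfile.gettempdir(), "iris_model.joblib")
joblib.dump(model, model_path)
reloaded = joblib.load(model_path)
```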
What libraries are imported to handle data in a supervised ML project?
These libraries are essential for data manipulation and analysis.
In the context of supervised ML, what does X represent?
Features (inputs)
X = df.drop('species', axis=1) represents measurements of the flower.
In the context of supervised ML, what does y represent?
Target (what we predict)
y = df['species'] indicates the flower species being predicted.
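To make the X / y split concrete, here is a small sketch with pandas. The three-row DataFrame and its measurement column names are hypothetical stand-ins for the real flower dataset; only the `species` column name comes from the cards.

```python
import pandas as pd

# Hypothetical miniature dataset; a real one has many more rows and columns.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 7.0, 6.3],
        "petal_length": [1.4, 4.7, 6.0],
        "species": ["setosa", "versicolor", "virginica"],
    }
)

# Features: every column except the target. drop() returns a new DataFrame.
X = df.drop("species", axis=1)

# Target: the single column we want to predict.
y = df["species"]

print(X.columns.tolist())
print(y.tolist())
```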
What is the purpose of the train_test_split function?
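Splits the data into a training set and a test set
Holding out test data is crucial for evaluating model performance on examples the model has not seen.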
What does test_size=0.2 indicate in the train_test_split function?
80% train, 20% test
This defines the proportion of data used for training versus testing.
What does random_state=101 ensure in the train_test_split function?
Reproducibility
It allows for the same split every time, yielding consistent results.
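Both parameters can be checked on a tiny example. The ten-row dataset is made up; the call itself is scikit-learn's `train_test_split` with the same `test_size` and `random_state` values as on the cards.

```python
from sklearn.model_selection import train_test_split

# Hypothetical data: ten samples with one feature each.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

# test_size=0.2 keeps 80% of the rows for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=101
)

# Repeating the call with the same random_state gives the same split.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=101
)

print(len(X_train), len(X_test))
print(X_test == X_test2)
```

Changing or omitting `random_state` would shuffle the rows differently on each run, so results would no longer be directly comparable.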