Image & Text Analysis Flashcards

(28 cards)

1
Q

Which Python libraries will you use to explore your image dataset?

A
  • NumPy
  • Matplotlib

NumPy handles the array manipulation and Matplotlib handles the visualization in image analysis.
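As a rough illustration of that first look at an image dataset, the sketch below summarizes a stack of images with NumPy. The array `images` is synthetic random data standing in for a real dataset, and `summarize_images` is a hypothetical helper name, not something from the course.

```python
import numpy as np

# Synthetic stand-in for a real dataset: 100 grayscale images of 28x28 pixels.
rng = np.random.default_rng(seed=0)
images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)


def summarize_images(images: np.ndarray) -> dict:
    """Collect basic facts about a stack of images shaped (n, height, width)."""
    return {
        "count": images.shape[0],
        "image_shape": images.shape[1:],
        "dtype": str(images.dtype),
        "min_pixel": int(images.min()),
        "max_pixel": int(images.max()),
        "mean_pixel": float(images.mean()),
    }


summary = summarize_images(images)
print(summary)
```

With Matplotlib installed, a single image could then be displayed with `plt.imshow(images[0], cmap="gray")`.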

2
Q

In image analysis, what are the common ML tasks mentioned?

A
  • Classification
  • Object detection

Classification predicts a label for each image; object detection also predicts where each object is located in the image.

3
Q

What does it mean if a dataset is balanced?

A

Labels have similar frequencies

A balanced dataset helps the model learn every label about equally well.

4
Q

What is the consequence of an imbalanced dataset?

A

Requires extra preparation before model training (for example, resampling or class weighting)

Without it, the model tends to favor the frequent labels and perform poorly on the rare ones.
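One common way to compensate for imbalance is to weight each label inversely to its frequency. The sketch below is only an illustration of that idea, not the deck's prescribed method; the label counts are made up, and the formula `n_samples / (n_classes * count)` is the widely used "balanced" heuristic.

```python
import numpy as np

# Hypothetical labels: 90 cats, 8 dogs, 2 parrots.
labels = np.array(["cat"] * 90 + ["dog"] * 8 + ["parrot"] * 2)

# Count how often each label occurs (classes come back sorted).
classes, counts = np.unique(labels, return_counts=True)

# Inverse-frequency weights: n_samples / (n_classes * count_per_class).
weights = len(labels) / (len(classes) * counts)
class_weights = dict(zip(classes.tolist(), weights.tolist()))

# Ratio between the most and least frequent label; 1.0 would be perfectly balanced.
imbalance_ratio = counts.max() / counts.min()

print(class_weights)
print(imbalance_ratio)
```

Rare labels receive larger weights, so mistakes on them count more during training if the model supports per-class weights.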

5
Q

Why is it advisable to visualize examples from each label in a dataset?

A

To get familiar with the dataset

People are very good at spotting mislabeled or low-quality images, so looking at examples is a quick check on data quality.

6
Q

What technique can help you in your exploratory data analysis process?

A

Image montage for each label

This helps verify if the images correspond to their labels.
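A montage can be built with plain NumPy by tiling the images of one label into a single grid. The sketch below assumes grayscale images stored as an array shaped `(n, height, width)`; `make_montage`, the random `images`, and the random `labels` are all hypothetical stand-ins.

```python
import numpy as np


def make_montage(images: np.ndarray, n_cols: int) -> np.ndarray:
    """Tile images shaped (n, h, w) into one 2-D grid, padding with zeros."""
    n, h, w = images.shape
    n_rows = -(-n // n_cols)  # ceiling division
    padded = np.zeros((n_rows * n_cols, h, w), dtype=images.dtype)
    padded[:n] = images
    # (rows, cols, h, w) -> (rows, h, cols, w) -> (rows * h, cols * w)
    grid = padded.reshape(n_rows, n_cols, h, w).transpose(0, 2, 1, 3)
    return grid.reshape(n_rows * h, n_cols * w)


# Synthetic data: 50 random 28x28 images with labels 0 to 9.
rng = np.random.default_rng(seed=0)
images = rng.integers(0, 256, size=(50, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=50)

# One montage per label that actually occurs in the data.
montages = {
    int(label): make_montage(images[labels == label], n_cols=4)
    for label in np.unique(labels)
}
```

Each montage is an ordinary 2-D array, so it can be shown with Matplotlib's `plt.imshow(montages[label], cmap="gray")` and checked by eye against its label.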

7
Q

What dataset will be explored for analyzing handwritten Arabic numerals?

A

A dataset of handwritten Arabic numerals written by many people

Because many people wrote the digits, the data shows both the general patterns and the variability in handwriting.

8
Q

Why do Machine Learning models learn better with a balanced dataset?

A

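The model sees a similar number of examples for every label, so balanced datasets can be fed directly into model training

Models trained on balanced data tend to perform well across all labels instead of favoring the most frequent ones.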

9
Q

What happens to a model trained on data with almost no images of parrots?

A

It will struggle to identify a parrot

This highlights the importance of having sufficient examples for each label.

10
Q

What are the applications of data science when dealing with text?

A

Remarkable applications, such as comparing the words used in good and bad movie reviews

Data science techniques can be applied to analyze and derive insights from textual data.

11
Q

What is the purpose of learning about the most commonly occurring words in your data?

A

To visualize and compare the most frequently occurring words

This can be particularly useful for analyzing reviews, such as good and bad movie reviews.
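A minimal way to compare frequent words across two classes is to count tokens per class. The sketch below uses only the standard library; the reviews, the tiny stop-word set, and the `top_words` helper are invented for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stop-word set; real projects usually use a fuller list.
STOP_WORDS = frozenset({"the", "a", "was", "and", "it", "this"})


def top_words(texts, n=3):
    """Return the n most frequent non-stop-words across the given texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(n)


# Made-up example reviews.
good_reviews = ["A great film, great cast", "Great story and a lovely score"]
bad_reviews = ["The plot was boring", "Boring and slow, a boring mess"]

print("good:", top_words(good_reviews))
print("bad: ", top_words(bad_reviews))
```

Comparing the two lists side by side gives a first impression of which words separate the classes.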

12
Q

What tool can help visualize frequent words in a text?

A

Wordcloud

A word cloud displays the frequent words in a text, with larger font size indicating higher frequency.

13
Q

Why is it important to know the technique of Exploratory Data Analysis?

A

It allows you to analyze words present in different classes of textual data

This can quickly reveal insights that are relevant to your project's business goals.

14
Q

What is the first step in the machine learning process?

A

Understand the problem

This step involves clarifying what you want to predict and understanding the input and output.

15
Q

What are the possible inputs in machine learning?

A
  • Images
  • Text
  • Both

Understanding the type of input is crucial for selecting the right algorithms.

16
Q

What is the expected output in machine learning?

A
  • A category (classification)
  • A number (regression)

The output type influences the choice of model and evaluation metrics.

17
Q

What should you check after loading the data?

A
  • Dataset size
  • First few rows
  • Missing values

These checks ensure that the data is loaded correctly and is ready for analysis.

18
Q

What command is used to check for missing values in a DataFrame?

A

df.isnull().sum()

Identifying missing values is essential for data cleaning.
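The three post-loading checks (size, first rows, missing values) can be run in a few lines of pandas. The DataFrame below is a made-up toy example; in practice `df` would come from a loader such as `pd.read_csv`.

```python
import pandas as pd

# Hypothetical toy data with one missing text and one missing label.
df = pd.DataFrame({
    "text": ["good movie", None, "bad plot", "fine"],
    "label": ["pos", "neg", "neg", None],
})

print(df.shape)   # dataset size as (rows, columns)
print(df.head())  # first few rows

# Number of missing values in each column.
missing = df.isnull().sum()
print(missing)
```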

19
Q

What does EDA stand for in the context of data analysis?

A

Exploratory Data Analysis

EDA is crucial for understanding the dataset before applying machine learning techniques.

20
Q

What should you check during Dataset Overview in EDA?

A
  • Total number of samples
  • Types of data (text, image, numbers)

This helps in understanding the dataset’s structure and characteristics.

21
Q

How can you check the samples per class in a dataset?

A

df['label'].value_counts()

This command provides insight into class distribution and balance.
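As a small illustration, the sketch below runs `value_counts()` on a made-up `label` column and also asks for relative shares with `normalize=True`, which makes imbalance easier to read at a glance.

```python
import pandas as pd

# Hypothetical labels: 6 cats, 3 dogs, 1 parrot.
df = pd.DataFrame({"label": ["cat"] * 6 + ["dog"] * 3 + ["parrot"] * 1})

# Absolute number of samples per class, most frequent first.
counts = df["label"].value_counts()

# Fraction of the dataset that each class makes up.
shares = df["label"].value_counts(normalize=True)

print(counts)
print(shares)
```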

22
Q

What are the critical steps in cleaning the data?

A
  • Remove corrupted images
  • Remove empty texts
  • Fix or remove wrong labels
  • Remove duplicates
  • Normalize formats

Cleaning data is essential for improving model performance.
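For text data, several of these steps can be sketched with pandas. The example below covers only empty texts, duplicates, and format normalization; detecting corrupted images or wrong labels needs task-specific checks and is not shown. The column names and the `clean_text_frame` helper are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical raw data with whitespace, casing, empty, and missing problems.
df = pd.DataFrame({
    "text": ["  Great Movie ", "great movie", "", None, "Bad plot"],
    "label": ["pos", "pos", "neg", "neg", "neg"],
})


def clean_text_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Drop missing/empty texts, normalize formatting, and remove duplicates."""
    out = df.dropna(subset=["text"]).copy()
    out["text"] = out["text"].str.strip().str.lower()    # normalize formats
    out = out[out["text"] != ""]                          # remove empty texts
    out = out.drop_duplicates(subset=["text", "label"])   # remove duplicates
    return out.reset_index(drop=True)


cleaned = clean_text_frame(df)
print(cleaned)
```

Normalizing before deduplicating matters here: the first two rows only become identical after stripping whitespace and lowercasing.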

23
Q

What is the purpose of feature engineering?

A

Turn data into numbers

This process is necessary for preparing data for machine learning algorithms.

24
Q

What is a common method for converting text into numbers?

A

Count Vectorizer or TF-IDF

These methods are used to transform text data into a numerical format suitable for machine learning.

25
Q

What is the command to split data into train and test sets?

A

train_test_split(X, y, test_size=0.2, random_state=42)

This step is crucial for evaluating model performance on unseen data.

26
Q

Name a few text models suitable for text features.

A
  • Logistic Regression
  • Naive Bayes
  • SVM

These models are commonly used for text classification tasks.

27
Q

What is the first step in the model training process?

A

Train on training data

This involves fitting the model to the training dataset.

28
Q

What should you do if the model performance is bad?

A
  • Go back to EDA
  • Improve features
  • Try a different model
  • Clean data more

This iterative process helps in refining the model and improving accuracy.
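The last few cards (convert text to numbers, split, train, evaluate) can be tied together in one small scikit-learn sketch. The reviews and labels are invented, Logistic Regression is just one of the models the deck lists, and `stratify=y` is an addition beyond the card's command that keeps both classes present in the tiny test set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews; 1 = good review, 0 = bad review.
X = ["great film", "loved it", "wonderful cast", "great story", "lovely score",
     "boring plot", "hated it", "terrible cast", "boring story", "awful score"]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Hold out 20% of the data for testing (stratify is an assumption, see above).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits TF-IDF on the training texts only, then trains the model.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

# Evaluate on data the model has not seen.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(accuracy)
```

If the score is poor, the pipeline makes it easy to swap the vectorizer or the model and rerun, which is the iteration loop the final card describes.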