8a. Text Mining Flashcards

(22 cards)

1
Q

What is the relevance of text data today?

A

Highly relevant…
-> mainly due to the growth of unstructured data

2
Q

Is text data unstructured?

A

Somewhat.

Text data is often not completely unstructured: punctuation and other semantic features give it some structure -> weakly structured

3
Q

Text data is _________ structured

A

Text data is WEAKLY structured

4
Q

What are some typographic elements that give structure to a document?

A
  • Punctuation
  • Capitalization
  • Numeric and special characters
  • Asterisks
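
As a rough illustration of how these typographic cues can be picked up from raw text, here is a minimal Python sketch. The function name `typographic_features` and the sample sentence are made up for this example, and the regular expressions are only one simple way to count such cues.

```python
import re

def typographic_features(text: str) -> dict:
    """Count a few surface cues that give raw text some structure."""
    return {
        # Common sentence and clause punctuation marks
        "punctuation": len(re.findall(r"[.,;:!?]", text)),
        # Words that start with a capital letter followed by lowercase letters
        "capitalized_words": len(re.findall(r"\b[A-Z][a-z]+", text)),
        # Individual numeric characters
        "digits": len(re.findall(r"\d", text)),
        # Asterisks, often used for emphasis or bullet-like markers
        "asterisks": text.count("*"),
    }

# Hypothetical sample sentence, not taken from any real dataset
features = typographic_features("Great phone!! Battery lasts 2 days. *Highly* recommended.")
print(features)
```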
5
Q

What are the 3 types of text data?

A
  1. Customer
  2. Firm
  3. Institutions / Society
6
Q

What is “customer text data”?

A
  • Customer reviews of online products on Amazon
  • Customer emails, chats, or even (transcribed) calls with customer service staff
  • Customer posts and comments on social networking sites such as Facebook or Instagram.
7
Q

What is “firm text data”?

A
  • Owned media (e.g. company social media)
  • Adverts
  • Communication with investors
  • Packaging + labels
8
Q

What is “Institutions / Society text data”?

A
  • News content
  • Movies
  • Songs
  • Books
9
Q

What is the definition of “text mining”?

A

“A computer-assisted methodology that allows researchers to rid themselves of measurement straitjackets, such as scales and scripted questions, and to quantify information contained in textual data as it naturally occurs”.

10
Q

What is the goal of “text mining”?

A

Convert text into numbers that can be analyzed with statistical and machine learning techniques
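
One common way to turn text into numbers is a bag-of-words count. The sketch below is only an illustration of that idea, not necessarily the representation the course has in mind; the helper names `tokenize` and `document_term_matrix` and the two sample documents are assumptions made up for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def document_term_matrix(docs):
    """Return a sorted vocabulary and one row of word counts per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix

# Hypothetical toy documents
docs = ["Great battery, great screen", "Battery died fast"]
vocab, matrix = document_term_matrix(docs)
print(vocab)
print(matrix)
```

Each row of `matrix` is a numeric vector for one document, which is the kind of input statistical and machine learning methods expect.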

11
Q

How does “text mining” relate to linguistics/ Natural language processing/ ML/ Statistics?

A
12
Q

What are two main analytical techniques for analysing unstructured big text data?

A
  1. Text retrieval
  2. Text mining
13
Q

What is the difference between “text retrieval” and “text mining”?

A

Text retrieval:
- obtaining the data
- also referred to as scraping (and sometimes as mining, which is a major source of confusion)
- finds the most relevant data in a large collection of text data

Text mining:
- processing the data
- assists the user in analysing patterns in text data and discovering actionable knowledge for decision-making
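
The two steps can be sketched as a tiny pipeline: first select the relevant documents, then look for patterns in them. This is a minimal illustration under simplifying assumptions: `retrieve` ranks by plain query-term overlap and `mine` just counts frequent words. Both function names and the sample collection are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def retrieve(collection, query, top_k=2):
    """Text retrieval: keep the documents that share the most terms with the query."""
    query_terms = set(tokenize(query))
    scored = [(len(query_terms & set(tokenize(doc))), doc) for doc in collection]
    relevant = [(score, doc) for score, doc in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:top_k]]

def mine(docs, top_n=3):
    """Text mining: summarize a pattern, here the most frequent terms."""
    counts = Counter(token for doc in docs for token in tokenize(doc))
    return counts.most_common(top_n)

# Hypothetical toy collection
collection = [
    "Battery life is poor and the battery overheats",
    "Delivery was quick",
    "Poor battery, poor screen",
]
hits = retrieve(collection, "battery problems")
top_terms = mine(hits, top_n=2)
print(hits)
print(top_terms)
```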

14
Q

Between “text retrieval” and “text mining”, which takes place first?

A

Text retrieval first! (obtaining relevant data)
-> then text mining (discovering actionable knowledge)

15
Q

What are 4 main applications of “text mining”?

A
  • Marketing
  • Information systems (IS)
  • Management
  • Finance
16
Q

Why is text said to “matter”?

A
  • Text generates framing (Gamson 1992): the way it conveys a product, company, market or idea has been shown to change perceptions and behaviour across many different contexts.
  • Agenda-setting, psychological priming, availability, associations, etc.
17
Q

What are the two main functions of “text mining”? (goals)

A
  • Prediction
  • Understanding
18
Q

How is “prediction” a goal of “text mining”?

A

Goal: predict a particular outcome with the highest possible accuracy
-> focus on how features can best be combined to achieve the best prediction

e.g. “Which movie will be more profitable?” using IMDb reviews (prediction)
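
To make "combining features for prediction" concrete, here is a minimal sketch that learns one weight per word with a simple perceptron. The perceptron is a stand-in chosen for brevity, not a method named in the deck, and the four training snippets and their labels are invented toy data, not real IMDb reviews.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train_perceptron(examples, epochs=10):
    """Learn one weight per word; each label is +1 or -1."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            tokens = tokenize(text)
            score = sum(weights[t] for t in tokens)
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Nudge the weights of the words in a misclassified text
                for t in tokens:
                    weights[t] += label
    return weights

def predict(weights, text):
    """Combine the word weights into a single score and threshold it."""
    score = sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if score > 0 else -1

# Invented toy data: +1 = successful movie, -1 = unsuccessful movie
train = [
    ("brilliant gripping story", 1),
    ("dull boring plot", -1),
    ("gripping and brilliant", 1),
    ("boring and dull", -1),
]
weights = train_perceptron(train)
print(predict(weights, "a gripping film"))
print(predict(weights, "dull and boring"))
```

The model is judged only on whether its predictions are right; the learned weights are not meant to explain why a movie succeeds.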

19
Q

How is “understanding” a goal of “text mining”?

A

Goal: gain a deeper understanding of why and how something has happened.
-> Focus on uncovering which text features drive which outcomes and why

e.g. “Why do some movies become more popular than others?”

20
Q

What are the two types of “Automated text analyses”?

A
  • Top-down
  • Bottom-up
21
Q

What is “Bottom-up” text analysis?

A

Bottom-up means you analyze the text without a preset dictionary and let themes, patterns, or topics emerge from the data itself.
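
A very small illustration of the bottom-up idea: no categories are defined in advance, and recurring word pairs are simply counted to see what surfaces. Real bottom-up analyses usually rely on richer methods such as topic models; the function name `emergent_bigrams`, the `min_count` threshold, and the sample reviews are assumptions made up for this sketch.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def emergent_bigrams(docs, min_count=2):
    """Return adjacent word pairs that recur across the documents."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        counts.update(zip(tokens, tokens[1:]))
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]

# Hypothetical toy reviews
reviews = [
    "battery life is short",
    "short battery life again",
    "love the screen, hate the battery life",
]
themes = emergent_bigrams(reviews)
print(themes)
```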

22
Q

What is “Top-down” text analysis?

A

Top-down = you use a dictionary or predefined coding scheme to look for specific things in the data, so it’s more deductive.
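
A dictionary-based approach can be sketched in a few lines: each category has a predefined word list, and the text is coded by counting matches. The `DICTIONARY` below is a hypothetical two-category coding scheme invented for this example, not an established lexicon.

```python
import re

# Hypothetical coding scheme: category -> words that signal it
DICTIONARY = {
    "positive": {"great", "love", "excellent"},
    "negative": {"poor", "hate", "broken"},
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def code_text(text, dictionary=DICTIONARY):
    """Count how many tokens fall into each predefined category."""
    tokens = tokenize(text)
    return {
        category: sum(token in words for token in tokens)
        for category, words in dictionary.items()
    }

# Hypothetical sample review
scores = code_text("Love the screen, but the battery is poor and the case arrived broken")
print(scores)
```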