What is the relevance of text data today?
Highly relevant…
-> mainly due to the growth of unstructured data
Is text data unstructured?
Somewhat.
Oftentimes text data is not completely unstructured due to punctuation and other semantic features -> weakly structured
Text data is _________ structured
Text data is WEAKLY structured
What are some typographic elements that give structure to a document?
What are the 3 types of text data?
What is “customer text data”?
What is “firm text data”?
What is “Institutions/Society text data”?
What is the definition of “text mining”?
“A computer-assisted methodology that allows researchers to rid themselves of measurement straitjackets, such as scales and scripted questions, and to quantify information contained in textual data as it naturally occurs”.
What is the goal of “text mining”?
convert text into numbers to be analyzed with statistical and machine learning techniques
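A minimal sketch of what "converting text into numbers" can look like, assuming a simple word-count (bag-of-words) representation. The function names `tokenize` and `document_term_matrix` and the two example reviews are made up for illustration; real projects usually rely on a dedicated library and more careful preprocessing.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def document_term_matrix(docs):
    """Return (vocabulary, rows); each row holds the word counts of one document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))          # every distinct word, in alphabetical order
    rows = [[c[w] for w in vocab] for c in counts]  # Counter returns 0 for absent words
    return vocab, rows

# Hypothetical example documents
docs = ["Great movie, great cast", "Boring movie"]
vocab, rows = document_term_matrix(docs)
print(vocab)  # ['boring', 'cast', 'great', 'movie']
print(rows)   # [[0, 1, 2, 1], [1, 0, 0, 1]]
```

Each document is now a row of numbers that statistical or machine learning techniques can take as input.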
How does “text mining” relate to linguistics/ Natural language processing/ ML/ Statistics?
What are two main analytical techniques for analysing unstructured big text data?
What is the difference between “text retrieval” and “text mining”?
Text retrieval:
- obtaining the data
- also referred to as scraping (and sometimes as mining, which is why it's a major source of confusion)
- finds most relevant data needed from a large collection of text data
Text mining:
- processing the data
- assists the user in analysing patterns in text data and discovering actionable knowledge for decision-making
Between “text retrieval” and “text mining”, which takes place first?
Text retrieval first! (obtaining relevant data)
-> then text mining (discovering actionable knowledge)
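The order of the two steps can be sketched as a tiny pipeline. This is only an illustration under simplifying assumptions: retrieval is approximated by word overlap with a query, and mining by word counts. The functions `retrieve` and `mine`, the stopword list, and the example corpus are all hypothetical.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def retrieve(query, docs, k=2):
    """Step 1 (retrieval): keep up to k documents that share the most words with the query."""
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(d))), d) for d in docs]
    matching = [(s, d) for s, d in scored if s > 0]      # drop documents with no overlap
    matching.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps corpus order on ties
    return [d for _, d in matching[:k]]

def mine(docs, stopwords=frozenset({"the", "was", "and", "a", "is"})):
    """Step 2 (mining): count the remaining words in the retrieved documents."""
    counts = Counter()
    for d in docs:
        counts.update(w for w in tokenize(d) if w not in stopwords)
    return counts

# Hypothetical customer comments
corpus = [
    "The battery life is short",
    "Delivery was fast and the packaging was fine",
    "Battery drains fast and the battery gets hot",
]
relevant = retrieve("battery problems", corpus)  # retrieval first
patterns = mine(relevant)                        # then mining
print(patterns.most_common(1))  # [('battery', 3)]
```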
What are 4 main applications of “text mining”?
Why is text said to “matter”?
What are the two main functions of “text mining”? (goals)
How is “prediction” a goal of “text mining”?
Goal: predict a particular outcome with the highest level of accuracy
-> focus on how features can best be combined to achieve the best prediction
e.g. “Which movie will be more profitable?” using IMDb reviews (prediction)
How is “understanding” a goal of “text mining”?
Goal: gain a deeper understanding of why and how something has happened.
-> Focus on uncovering which text features drive which outcomes and why
e.g. “Why do some movies become more popular than others?”
What are the two types of “Automated text analyses”?
What is “Bottom-up” text analysis?
Bottom-up means you analyze the text without a preset dictionary and let themes, patterns, or topics emerge from the data itself.
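One way to picture the bottom-up idea, as a rough sketch only: count which word pairs occur together most often and see what emerges, without defining any categories beforehand. The stopword list, the function names, and the example reviews are assumptions for illustration; real bottom-up analyses typically use methods such as topic models or clustering.

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "is", "was", "and", "but", "very"}  # illustrative list

def content_words(text):
    """Lowercase, tokenize, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def frequent_pairs(docs, top_n=3):
    """Count adjacent word pairs across all documents; no categories are set in advance."""
    pairs = Counter()
    for d in docs:
        words = content_words(d)
        pairs.update(zip(words, words[1:]))
    return pairs.most_common(top_n)

# Hypothetical reviews
reviews = [
    "The customer service was slow",
    "Slow customer service and a rude agent",
    "Customer service is friendly but slow",
]
top = frequent_pairs(reviews)
print(top[0])  # (('customer', 'service'), 3)
```

Here the theme "customer service" surfaces from the data itself rather than from a list prepared by the researcher.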
What is “Top-down” text analysis?
Top-down means you use a dictionary or predefined coding scheme to look for specific things in the data, so it is more deductive.
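A minimal sketch of the top-down approach, assuming a tiny hand-made dictionary with two categories. The word lists and the function name `code_text` are hypothetical; published dictionaries are far larger and validated.

```python
import re

# Hypothetical mini-dictionary: category -> words that signal it
DICTIONARY = {
    "positive": {"great", "love", "excellent"},
    "negative": {"bad", "boring", "terrible"},
}

def code_text(text, dictionary=DICTIONARY):
    """Count how many tokens of the text fall into each predefined category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {cat: sum(t in words for t in tokens) for cat, words in dictionary.items()}

scores = code_text("Great cast, great score, but a boring plot")
print(scores)  # {'positive': 2, 'negative': 1}
```

The categories are fixed before looking at the data, which is what makes the approach deductive.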