L 1.1.x
What are some alternative names for the process of data mining?
Why do we need data mining?
To handle the explosive growth of data. By 2025, roughly 100 zettabytes of data are expected to be generated worldwide.
1 zettabyte is 1 billion terabytes.
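The conversion above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units; the constant and function names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: powers of ten).
TERABYTE = 10**12
ZETTABYTE = 10**21

# Number of terabytes in one zettabyte: 10**21 // 10**12 = 10**9 (one billion).
TB_PER_ZB = ZETTABYTE // TERABYTE


def zettabytes_to_terabytes(zb: int) -> int:
    """Convert a whole number of zettabytes to terabytes."""
    return zb * TB_PER_ZB


print(zettabytes_to_terabytes(1))    # terabytes in 1 zettabyte
print(zettabytes_to_terabytes(100))  # terabytes in ~100 zettabytes
```

With binary units (powers of 1024) the ratio would differ slightly, so the "1 billion" figure holds only under the SI assumption.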
What is one of the purposes of data mining?
To extract knowledge from the data.
What are Jim Gray’s four paradigms of Science?
What is another name for data mining?
Knowledge Discovery from Data
What is Sunita Sarawagi’s definition of data mining?
The process of semi-automatically analyzing large databases to find patterns that are:
• Valid: hold on new data with some certainty.
• Novel: non-obvious to the system.
• Useful: should be possible to act on the item.
• Understandable: humans should be able to interpret the pattern.
What is Jiawei Han’s definition of data mining?
The extraction of interesting [non-trivial, implicit, previously unknown, potentially useful] patterns or knowledge from huge amounts of data.
What is Vipin Kumar’s definition of data mining?
Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.
Not everything is data mining. Give some examples of tasks that are not.
L 1.2.x
Name concepts related to but different from data mining.
What is the database view of data mining?
The processes and techniques that connect data warehouses to the discovery of patterns.
What is the machine learning view?
What is the Business Intelligence view?
Business Intelligence (bottom up):
• Lower level: data sources; preprocessing, integration, and warehousing
• Middle level: data mining
• Higher level: knowledge evaluation, presentation, and business decisions
What is the Human-Centered view (as presented by UM)?
• Selection, Detection, Characterization, Explanation, Prediction, Intervention
What is the first dimension of data mining?
The Data to be mined (inputs).
What is the second dimension of data mining?
Knowledge to be discovered
• Can be descriptive or predictive
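To make the descriptive/predictive split concrete, here is a minimal sketch. The data and the function names `describe` and `fit_line` are made up for illustration: the descriptive step summarizes values that already exist, while the predictive step fits a least-squares line and applies it to an unseen input.

```python
from statistics import mean

# Hypothetical toy data: ad spend (x) and resulting sales (y).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 6.0, 8.0, 10.0]


def describe(values):
    """Descriptive: summarize the data that is already there."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Predictive: fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


summary = describe(sales)                      # describes observed sales
slope, intercept = fit_line(ad_spend, sales)   # learns a relationship
predicted = slope * 6.0 + intercept            # estimate for an unseen spend of 6.0

print(summary)
print(predicted)
```

A straight-line fit is only one possible predictive technique; it is used here because it is short enough to read in full.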
What is the third dimension of data mining?
Techniques Utilized
What is the fourth dimension of data mining?
Application of Data Mining
L 1.3.x
What is the name of the object that bridges the gap between typical data structures and those required for data mining?
Data Representation (DR).
DR is a mathematical way to represent data.
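As an illustration of what "a mathematical way to represent data" can look like, the sketch below turns ordinary records into numeric vectors. The records, field names, and the `to_vectors` helper are hypothetical, and one-hot encoding is just one assumed choice for the categorical field.

```python
# Hypothetical raw records, as they might sit in an application database.
records = [
    {"age": 34, "city": "Lima", "purchases": 3},
    {"age": 27, "city": "Paris", "purchases": 0},
    {"age": 45, "city": "Lima", "purchases": 7},
]


def to_vectors(rows):
    """Map each record to a numeric vector: [age, purchases, one-hot city...]."""
    # Fix a stable ordering of categories so every vector has the same layout.
    cities = sorted({row["city"] for row in rows})
    vectors = []
    for row in rows:
        one_hot = [1.0 if row["city"] == c else 0.0 for c in cities]
        vectors.append([float(row["age"]), float(row["purchases"])] + one_hot)
    return cities, vectors


cities, vectors = to_vectors(records)
for vec in vectors:
    print(vec)
```

Once every record is a vector of the same length, distances, averages, and other mathematical operations become well defined, which is what mining techniques generally rely on.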
What are the three V’s of big data?
Volume, Variety, and Velocity
What else is there to know about Data Formulation?
What questions does a Data Scientist look to answer when observing data?
A Data Scientist must be able to answer these questions in a mathematical way!
• This is the task and purpose of data representation.
Name several types of data representation.
Describe an Item Set.
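One common convention, assumed here rather than taken from the course material, treats an item set as a set of items and says a transaction contains it when every item of the set appears in that transaction. The sketch below uses hypothetical market-basket data and a hypothetical `support` helper to show the idea.

```python
# Hypothetical market-basket transactions; each one is a set of items.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers"}),
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    # `itemset <= basket` is the subset test for Python sets.
    return sum(itemset <= basket for basket in baskets) / len(baskets)


print(support({"bread", "milk"}, transactions))
print(support({"diapers", "beer"}, transactions))
```

Representing both item sets and transactions as sets means order and duplicates are ignored, which matches the usual reading of "a set of items bought together".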