What would be three key components of Data Science (DS)?
According to lecture PPU161_Introduction_to_DataScience_230925.pdf
What is data mining?
Short version: Non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
Explain the 3 different data mining process models
Describe the phases of KDD
Understanding the Goal: Define the problem you want to address and determine the goal of the knowledge discovery process.
Data Selection: Gather and select the data that is relevant to the problem you’re trying to solve. This step involves choosing the right dataset from various available sources.
Data Preprocessing: Clean and preprocess the data to handle missing values, outliers, and noise. Data preprocessing also involves transforming data into a suitable format for analysis.
Data Transformation: Convert the preprocessed data into appropriate forms for mining. This step can include normalization, aggregation, and other transformations to make the data suitable for the chosen data mining technique.
Data Mining: Apply various data mining techniques to extract patterns, trends, and insights from the transformed data. Common data mining techniques include clustering, classification, regression, and association rule mining.
Pattern Evaluation: Evaluate the discovered patterns to ensure their quality and relevance to the problem at hand. This step involves assessing patterns based on measures like accuracy, precision, recall, and relevance to the problem domain.
Knowledge Representation: Present the discovered knowledge in a comprehensible form, often using visualization techniques. This step is crucial for stakeholders to understand and interpret the results effectively.
Interpretation and Evaluation: Interpret the mined patterns in the context of the problem domain. Evaluate the knowledge discovered to determine its usefulness and effectiveness in addressing the initial goal.
Deployment: Implement and integrate the discovered knowledge into existing systems or processes. Deployed knowledge can lead to informed decision-making and improved outcomes in various applications.
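The KDD phases above can be sketched in a few lines of Python. This is a minimal, hypothetical example: the toy records, the field names `rooms`, `area` and `large`, and the simple threshold rule standing in for a real data mining algorithm are all invented for illustration. Interpretation and deployment are left out because they happen outside the code.

```python
from statistics import mean

# Data selection: hypothetical raw records (None marks a missing value).
raw = [
    {"rooms": 2, "area": 55.0, "large": False},
    {"rooms": 5, "area": 140.0, "large": True},
    {"rooms": 3, "area": None, "large": False},
    {"rooms": 6, "area": 160.0, "large": True},
    {"rooms": 1, "area": 30.0, "large": False},
    {"rooms": 4, "area": 120.0, "large": True},
]

def preprocess(records):
    """Preprocessing: fill missing 'area' values with the mean of the known values."""
    known = [r["area"] for r in records if r["area"] is not None]
    fill = mean(known)
    return [{**r, "area": r["area"] if r["area"] is not None else fill} for r in records]

def transform(records):
    """Transformation: min-max normalise 'area' to the range [0, 1]."""
    lo = min(r["area"] for r in records)
    hi = max(r["area"] for r in records)
    return [{**r, "area": (r["area"] - lo) / (hi - lo)} for r in records]

def mine(records):
    """Data mining (simplistic stand-in): predict 'large' when normalised area > 0.5."""
    return [r["area"] > 0.5 for r in records]

def evaluate(predicted, actual):
    """Pattern evaluation: accuracy, precision and recall for the positive class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    correct = sum(p == a for p, a in zip(predicted, actual))
    accuracy = correct / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

data = transform(preprocess(raw))
predictions = mine(data)
print(evaluate(predictions, [r["large"] for r in data]))
```

Running it prints `(0.8333333333333334, 0.75, 1.0)`. The house whose missing area was filled with the mean ends up above the threshold and becomes the only false positive, which shows why preprocessing choices may need to be revisited during pattern evaluation.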
Describe the phases of CRISP-DM
From exam 2021-10-23
From: PPU161_Introduction_to_DataScience_230925.pdf, page 27
Describe the phases of SEMMA
Explain/define the following: Artificial intelligence, Machine Learning and Deep Learning
From lecture: PPU161_Datamining_Visualization_AI_ML_lecture_slides_230928.pdf
Big data sources and forms
From lecture: PPU161_Introduction_to_DataScience_230925.pdf
Big data and its characteristics
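Big data is any data that is expensive to manage and hard to extract value from due to its associated primary characteristics, called the 3Vs: volume (the amount of data), velocity (the speed at which it is generated and must be processed) and variety (the many different forms it comes in).
From lecture: PPU161_Introduction_to_DataScience_230925.pdf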
Describe the different data preprocessing steps
From lecture: PPU161_Datamining_Visualization_AI_ML_lecture_slides_230928.pdf
Explain supervised learning and under what conditions you would use it.
Supervised learning is used when you have examples to train the system with and the results (labels) of those examples are known. It covers regression and classification problems.
Example: Use the labels to build a model. The model is then used to classify the size of a new house based only on its known feature set.
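A minimal sketch of the house-size example, assuming a 1-nearest-neighbour classifier (just one of many possible supervised methods) and hypothetical features (number of rooms and number of floors) with made-up labels. The labelled examples build the "model"; a new house is then classified from its features alone.

```python
from math import dist

# Hypothetical labelled training data: (rooms, floors) -> size label.
training = [
    ((2, 1), "small"),
    ((3, 1), "small"),
    ((6, 2), "large"),
    ((7, 3), "large"),
]

def predict(features, examples):
    """1-nearest-neighbour: return the label of the closest training example."""
    _, label = min(examples, key=lambda ex: dist(ex[0], features))
    return label

# A new house whose size is unknown: only its features are given.
print(predict((5, 2), training))  # prints "large"
```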
Explain unsupervised learning and under what conditions you would use it.
When it is not clear in advance what type of information is going to be found.
* Finding patterns in data without any ground truth
* Target/output values do not exist
* Knowledge discovery
* Most data have this form
Example: The size is missing. We need to look for similarities in the data and group the houses into clusters.
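The same houses without size labels can be grouped with k-means clustering, one common unsupervised technique swapped in here for illustration. The points, the choice of k = 2 and the starting centroids are hypothetical; in practice the result can depend on how the centroids are initialised.

```python
from math import dist
from statistics import mean

# Hypothetical unlabelled data: (rooms, floors) per house, no size label.
points = [(2, 1), (3, 1), (1, 1), (6, 2), (7, 3), (6, 3)]

def k_means(data, centroids, max_iterations=100):
    """Plain k-means: assign each point to its nearest centroid, move each
    centroid to the mean of its points, stop when centroids no longer change."""
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in data:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # A centroid with no assigned points stays where it was.
        new_centroids = [
            tuple(mean(coord) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# k = 2, using two of the points as hypothetical starting centroids.
centroids, clusters = k_means(points, [(2, 1), (6, 2)])
print(clusters)
```

With these inputs the loop converges on its second pass and prints `[[(2, 1), (3, 1), (1, 1)], [(6, 2), (7, 3), (6, 3)]]`. The algorithm only finds the groups; naming them "small" and "large" is left to the analyst.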