Data Flashcards

Question 1

Q

What would be three key components of Data Science (DS)?

According to lecture PPU161_Introduction_to_DataScience_230925.pdf

Question 2

Q

What is data mining?

Answer

A

Short version: A non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

Data mining (knowledge discovery in databases): Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.
Alternative names: Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, information harvesting, business intelligence, etc.

Question 3

Q

Explain the 3 different data mining process models

Answer

A

The Knowledge Discovery in Databases (KDD) model is an iterative and interactive model. It has nine steps in total. It refers to the overall process of finding knowledge in data and emphasizes the high-level application of specific data mining methods.
Cross-Industry Standard Process for Data Mining (CRISP-DM) was launched in late 1996 by DaimlerChrysler (then Daimler-Benz), SPSS (then ISL), and NCR. The model has been refined over the years. It has six steps, or phases.
The Sample, Explore, Modify, Model, Assess (SEMMA) model was developed by the SAS Institute. It has five phases.

Question 4

Q

Describe the phases of KDD

Answer

A

Understanding the Goal: Define the problem you want to address and determine the goal of the knowledge discovery process.

Data Selection: Gather and select the data that is relevant to the problem you’re trying to solve. This step involves choosing the right dataset from various available sources.

Data Preprocessing: Clean and preprocess the data to handle missing values, outliers, and noise. Data preprocessing also involves transforming data into a suitable format for analysis.

Data Transformation: Convert the preprocessed data into appropriate forms for mining. This step can include normalization, aggregation, and other transformations to make the data suitable for the chosen data mining technique.

Data Mining: Apply various data mining techniques to extract patterns, trends, and insights from the transformed data. Common data mining techniques include clustering, classification, regression, and association rule mining.

Pattern Evaluation: Evaluate the discovered patterns to ensure their quality and relevance to the problem at hand. This step involves assessing patterns based on measures like accuracy, precision, recall, and relevance to the problem domain.

Knowledge Representation: Present the discovered knowledge in a comprehensible form, often using visualization techniques. This step is crucial for stakeholders to understand and interpret the results effectively.

Interpretation and Evaluation: Interpret the mined patterns in the context of the problem domain. Evaluate the knowledge discovered to determine its usefulness and effectiveness in addressing the initial goal.

Deployment: Implement and integrate the discovered knowledge into existing systems or processes. Deployed knowledge can lead to informed decision-making and improved outcomes in various applications.
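
As a rough illustration, the middle phases above (selection, preprocessing, transformation, mining, evaluation) can be sketched as a chain of small functions. The records, field names, and the mean-split "pattern" are made-up assumptions for this example and are not part of the KDD model itself.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, age, monthly_spend); None marks a missing value.
raw = [
    (1, 25, 120.0),
    (2, 41, None),
    (3, 33, 310.0),
    (4, 58, 95.0),
    (5, 47, 280.0),
]

def select(records):
    """Data selection: keep only the attribute relevant to the goal (monthly spend)."""
    return [spend for _, _, spend in records]

def preprocess(values):
    """Data preprocessing: drop missing values."""
    return [v for v in values if v is not None]

def transform(values):
    """Data transformation: min-max normalization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def mine(values):
    """Data mining: a deliberately trivial 'pattern', splitting values at the mean."""
    threshold = mean(values)
    return {
        "low": [v for v in values if v < threshold],
        "high": [v for v in values if v >= threshold],
    }

def evaluate(pattern):
    """Pattern evaluation: report how many records fall into each group."""
    return {name: len(group) for name, group in pattern.items()}

summary = evaluate(mine(transform(preprocess(select(raw)))))
print(summary)  # {'low': 2, 'high': 2}
```

In practice the process is iterative: a poor result in the evaluation step usually sends you back to an earlier phase, such as selecting different data or choosing another transformation.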

Question 5

Q

Describe the phases of CRISP-DM

From exam 2021-10-23

Answer

A

From: PPU161_Introduction_to_DataScience_230925.pdf, p. 27

Question 6

Q

Describe the phases of SEMMA

Question 7

Q

Explain/define the following: Artificial Intelligence, Machine Learning and Deep Learning

Answer

A

Artificial intelligence: Getting machines to do what humans are good at
Machine Learning: Feeding an algorithm data to learn and predict something
Deep Learning: A subtype of machine learning which utilizes multi-layer neural networks

From lecture: PPU161_Datamining_Visualization_AI_ML_lecture_slides_230928.pdf

Question 8

Q

Big data sources and forms

Answer

A

From lecture: PPU161_Introduction_to_DataScience_230925.pdf

Question 9

Q

Big data and its characteristics

Answer

A

Big data is any data that is expensive to manage and hard to extract value from due to its associated primary characteristics called the 3Vs:

Volume - The size of the data
Velocity - The speed at which the data is generated and processed
Variety and complexity - The diversity of sources, formats, quality, and structures
Other Vs - Veracity, Value, Variability

From lecture: PPU161_Introduction_to_DataScience_230925.pdf

Question 10

Q

Describe the different data preprocessing steps

Answer

A

Data cleaning - Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
Data integration - Integration of multiple databases, files, or notes
Data transformation - Normalization (scaling to a specific range) and summarization/aggregation
Data reduction - A reduced representation of the data that is smaller in volume but produces the same or similar analytical results, for example through feature selection, dimensionality reduction, or data compression
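
The four steps can be shown on a tiny example. This is only a sketch: the three source dictionaries, the mean-fill strategy, and the "drop constant features" rule are assumptions chosen to keep the code short, not methods prescribed by the lecture.

```python
from statistics import mean

# Hypothetical sources keyed by house id; None marks a missing value.
sizes = {1: 50.0, 2: None, 3: 150.0}
rooms = {1: 2, 2: 3, 3: 5}
country = {1: "SE", 2: "SE", 3: "SE"}  # constant, so it carries no information

# Data integration: combine the sources into one record per id.
rows = [
    {"id": k, "size": sizes[k], "rooms": rooms[k], "country": country[k]}
    for k in sorted(sizes)
]

# Data cleaning: fill missing sizes with the mean of the known sizes.
fill = mean(r["size"] for r in rows if r["size"] is not None)
for r in rows:
    if r["size"] is None:
        r["size"] = fill

# Data transformation: min-max normalization of size to the range [0, 1].
lo = min(r["size"] for r in rows)
hi = max(r["size"] for r in rows)
for r in rows:
    r["size_scaled"] = (r["size"] - lo) / (hi - lo)

# Data reduction: drop features with the same value in every row
# (a very simple form of feature selection).
constant = [k for k in rows[0] if len({r[k] for r in rows}) == 1]
reduced = [{k: v for k, v in r.items() if k not in constant} for r in rows]

print([r["size_scaled"] for r in reduced])  # [0.0, 0.5, 1.0]
```

Mean-filling is only one option for missing values. Dropping the row or using the median are common alternatives, and the right choice depends on the data.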

From lecture: PPU161_Datamining_Visualization_AI_ML_lecture_slides_230928.pdf

Question 11

Q

Explain supervised learning and under what conditions would you use it.

Answer

A

Supervised learning is used when you have training examples with known results (labels). It covers regression and classification problems.

Patterns that predict some target value
Target/output values do exist and are used

Example: Use the labels to build a model. The model is then used to classify the size of a new house based only on its known feature set.
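
A minimal sketch of the house example, assuming two made-up features (living area and number of rooms) and a 1-nearest-neighbor classifier. The card does not name an algorithm, so nearest neighbor is used here only because it is short to write down.

```python
import math

# Hypothetical labeled training data:
# (living area in m^2, number of rooms) -> size label.
training = [
    ((45.0, 2), "small"),
    ((60.0, 3), "small"),
    ((140.0, 5), "large"),
    ((180.0, 6), "large"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    _, label = min(training, key=lambda example: math.dist(example[0], features))
    return label

# New houses: only the features are known, the label is predicted.
print(predict((55.0, 2)))   # small
print(predict((160.0, 5)))  # large
```

The key point is that the labels ("small", "large") exist in the training data and are used to build the model.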

Question 12

Q

Explain unsupervised learning and under what conditions would you use it.

Answer

A

Unsupervised learning is used when it is not clear in advance what kind of information will be found.
* Finding patterns in data without any ground truth
* Target/output values do not exist
* Knowledge discovery
* Most data have this form

Example: The size label is missing. We need to look for similarities in the data and group the data points into clusters.
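
A minimal sketch of the clustering example, assuming one made-up feature (living area) and a basic k-means with two clusters. The data, the choice of k-means, and the starting centers are illustrative assumptions.

```python
from statistics import mean

# Hypothetical unlabeled data: living area in m^2, with no size label attached.
areas = [45.0, 50.0, 60.0, 140.0, 160.0, 180.0]

def kmeans_1d(values, centers, iterations=10):
    """Basic k-means on 1-D data: assign each value to its nearest center,
    then move each center to the mean of its assigned values."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Start from the smallest and largest value as initial centers (an arbitrary choice).
centers, clusters = kmeans_1d(areas, centers=[min(areas), max(areas)])
print(clusters)  # [[45.0, 50.0, 60.0], [140.0, 160.0, 180.0]]
```

No labels are used anywhere. The two groups emerge from similarity alone, and it is up to the analyst to interpret them, for example as "small" and "large" houses.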


(12 cards)