DAHI Flashcards

(71 cards)

1
Q

– Uses statistical treatment (mathematical methods) to analyze data. Helps identify main characteristics, patterns, and distribution before deeper analysis.

A

Exploratory Data Analysis

2
Q

– Reshaping cleaned data and reprocessing missing values to make the dataset ready for modeling or assessment.

A

Data Preparation for Modelling/Assessment

3
Q

– Testing different models and strategies to solve the business problem and achieve objectives.

A

Modeling

4
Q

– Setting up a validation scheme while the data product is working to monitor performance and ensure reliable results.

A

Implementation

5
Q

– The practice of analyzing large datasets to discover useful patterns and hidden relationships. Uses machine learning, statistics, and AI for tasks like marketing, fraud detection, and scientific discovery. Also known as KDD (Knowledge Discovery in Data).

A

Data Mining

6
Q

– Process with stages: Business Understanding, Gathering of Data, Data Preparation, Conceptualization, and Evaluation of Model.

A

Traditional Data Mining Life Cycle

7
Q

– A five-stage process designed to guide data mining projects for developing predictive models.

A

SEMMA Methodology

8
Q

– selecting the dataset for modeling

A

Sample

9
Q

– understanding the data by discovering both expected and unexpected relationships, including abnormalities, often with visualization

A

Explore

10
Q

– selecting, creating, and transforming variables in preparation for modeling

A

Modify

11
Q

– building the strategy to solve the problem

A

Model

12
Q

– evaluating the modeling results to test reliability and usefulness

A

Assess

13
Q

What are the five stages of the SEMMA methodology in data mining?

A

Sample, Explore, Modify, Model, Assess

14
Q

– Refers to a collection of data that is extremely large in volume, grows exponentially over time, and is too complex for traditional data management tools to store or process efficiently. It is still data, but much larger and harder to handle.

A

Big Data

15
Q

Characteristics of Big Data

A

Volume, Velocity, Variety, Veracity

16
Q

– the size of data is massive

A

Volume

17
Q

– the speed of data generation and processing

A

Velocity

18
Q

– data comes in multiple forms (Structured, Unstructured, Semi-structured)

A

Variety

19
Q

What form of data in Big Data is available in spreadsheets and databases?

A

Structured

20
Q

What form of data in Big Data includes text, images, audio, and video?

A

Unstructured

21
Q

What form of data in Big Data is a combination of structured and unstructured?

A

Semi-structured

22
Q

– ensures completeness and quality of information

A

Veracity

23
Q

– identifying the problem and determining the main cause

A

Business Problem Definition

24
Q

– analyzing what other companies have done in similar cases

A

Research

25
– checking if staff is capable of completing the project
Human Resource Assessment
26
– key step; data is collected according to policies and guidelines
Data Acquisition
27
– reviewing, cleaning, filtering, and transforming data for analysis, visualization, or modeling
Data Munging
28
– saving the prepared data in databases for further use
Data Storage
29
What are the steps in the Big Data Cycle?
Business Problem Definition, Research, Human Resource Assessment, Data Acquisition, Data Munging, Data Storage
30
– A process of extracting useful information or knowledge from previously collected data.
Data Mining
31
Types of Database
Relational Database, Data Warehouse, Transactional Database, Multimedia/Streaming Database, Text Database
32
– uses tables, for example Microsoft SQL Server
Relational Database
33
– centralized data storage, for example Oracle
Data Warehouse
34
– records of purchases, deposits, and withdrawals
Transactional Database
35
– stores text, images, audio, video
Multimedia/Streaming Database
36
– collection of documents, articles, and emails
Text Database
37
What are the steps in the Data Mining Implementation Process?
Business Understanding, Data Understanding, Data Preparation, Data Transformation, Evaluation, Deployment
38
– defining business and client objectives
Business Understanding
39
– collecting data from multiple databases and using metadata to minimize errors
Data Understanding
40
– cleansing, selecting, and integrating data into a consistent format
Data Preparation
41
– changing data into a usable format through: Smoothing, Aggregation, Generalization, Normalization, Attribute Construction
Data Transformation
42
What method of data transformation removes noise or irregularities from data to make it clearer?
Smoothing
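One common way to smooth data is a simple moving average. The sketch below is only an illustration: the function name `moving_average`, the window size, and the readings are assumptions, not part of the deck.

```python
def moving_average(values, window=3):
    """Return the simple moving average of `values` over `window` points."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up readings; 30 is an irregular spike.
noisy = [10, 12, 30, 11, 13, 12]
smoothed = moving_average(noisy, window=3)
print(smoothed)
```

The spike is still visible in the smoothed series, but it is spread across neighbouring points instead of standing alone.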
43
What method of data transformation summarizes data by combining it into more compact forms (e.g., totals, averages)?
Aggregation
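Aggregation can be sketched as grouping records and keeping only totals and averages. The sales figures and the helper name `aggregate` below are made up for illustration.

```python
from collections import defaultdict


def aggregate(records):
    """Summarize (key, amount) records into a total and average per key."""
    groups = defaultdict(list)
    for key, amount in records:
        groups[key].append(amount)
    return {
        key: {"total": sum(amounts), "average": sum(amounts) / len(amounts)}
        for key, amounts in groups.items()
    }


# Hypothetical daily sales tagged by month.
sales = [("Jan", 120.0), ("Jan", 80.0), ("Feb", 50.0), ("Feb", 150.0), ("Feb", 100.0)]
summary = aggregate(sales)
print(summary)
```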
44
What method of data transformation replaces detailed, low-level data with higher-level concepts for easier analysis?
Generalization
45
What method of data transformation adjusts values to a common scale, such as converting numbers to a range like -2.0 to 2.0?
Normalization
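A minimal sketch of normalization, assuming min-max scaling is the method meant (other schemes such as z-score scaling exist). The function name `min_max_scale` and the sample values are illustrative; the target range matches the -2.0 to 2.0 example in the card.

```python
def min_max_scale(values, new_min=-2.0, new_max=2.0):
    """Linearly rescale `values` so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot rescale constant data")
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]


scaled = min_max_scale([10, 20, 30, 40, 50])
print(scaled)
```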
46
What method of data transformation creates new attributes from the existing data that may be more useful for analysis?
Attribute Construction
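Attribute construction can be illustrated by deriving a new field from existing ones. In this sketch the `revenue` attribute, the order records, and the helper `add_revenue` are all hypothetical.

```python
def add_revenue(rows):
    """Return copies of `rows` with a derived `revenue` attribute added."""
    return [{**row, "revenue": row["price"] * row["quantity"]} for row in rows]


# Made-up order records with two existing attributes.
orders = [{"price": 2.5, "quantity": 4}, {"price": 10.0, "quantity": 1}]
enriched = add_revenue(orders)
print(enriched)
```

The original records are left unchanged; the new attribute only appears in the returned copies.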
47
– testing if identified patterns match business objectives
Evaluation
48
– applying findings in business operations and creating reports
Deployment
49
What are the different Data Mining Techniques?
Classification, Clustering, Regression, Association Rules, Outlier Detection, Sequential Patterns, Prediction
50
– classifying data into different groups or categories
Classification
51
– grouping data that are similar and identifying differences
Clustering
52
– determining relationships among variables. Example: sales forecasting based on seasonality or economic indicators
Regression
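A minimal sketch of regression, assuming a simple least-squares line with one predictor. The function name `fit_line` and the month and sales figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # projected sales for month 5
print(slope, intercept, forecast)
```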
53
– finding associations between two or more items to predict customer behavior
Association Rules
54
– identifying unusual or anomalous data that do not match expected patterns
Outlier Detection
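One simple way to flag outliers is a z-score rule. The sketch below assumes a cutoff of 2 standard deviations, which is a common but arbitrary choice; the function name `z_score_outliers` and the amounts are made up.

```python
import statistics


def z_score_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical transaction amounts; 50 does not fit the pattern.
amounts = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = z_score_outliers(amounts)
print(outliers)
```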
55
– discovering repeated trends in transaction data over time
Sequential Patterns
56
– forecasting future customer behavior by analyzing past events in sequence
Prediction
57
– Methods used with a specific reason or purpose in mind.
Techniques for Data Analytics/Analysis
58
– discovering patterns in large datasets using statistics, databases, or mining tools; involves extracting and interpreting meaning from text
Text Analytics
59
– answering the question “Why did it happen?” by finding causes from statistical insights
Diagnostic Analytics
60
– answering “What is likely to happen?” by using historical data to predict future outcomes
Predictive Analytics
61
– recommending the best actions to take in solving current problems
Prescriptive Analytics
62
– collecting, organizing, exploring, and interpreting data systematically
Statistical Analysis
63
– summarizing data into a complete overview, whether sample or whole population
Descriptive Analytics
64
What are the Statistical Analysis Tools?
Mean and Deviation, Percentage and Frequency, Inferential tools
65
What statistical tool is used to analyze continuous data by measuring central tendency and spread?
Mean and Deviation
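Central tendency and spread can be computed with Python's standard `statistics` module. The values below are made up, and the sketch assumes "deviation" here means standard deviation.

```python
import statistics

# Hypothetical continuous measurements.
values = [2, 4, 4, 4, 5, 5, 7, 9]

center = statistics.mean(values)           # central tendency
spread = statistics.pstdev(values)         # population standard deviation
sample_spread = statistics.stdev(values)   # sample standard deviation (n - 1)

print(center, spread, sample_spread)
```

Use `pstdev` when the data is the whole population and `stdev` when it is a sample drawn from a larger one.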
66
What statistical tool is used to analyze categorical data by showing proportions and counts?
Percentage and Frequency
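A frequency and percentage table for categorical data can be sketched with `collections.Counter`. The survey responses and the helper name `frequency_table` are illustrative assumptions.

```python
from collections import Counter


def frequency_table(categories):
    """Map each category to a (count, percentage) pair."""
    counts = Counter(categories)
    total = len(categories)
    return {
        label: (count, 100 * count / total)
        for label, count in counts.items()
    }


# Made-up survey answers.
responses = ["yes", "no", "yes", "yes", "no"]
table = frequency_table(responses)
print(table)
```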
67
What statistical tool is used to test for significant differences and relationships between variables?
Inferential tools
68
What are the methods under Inferential Analysis?
ANOVA and ANCOVA, T-test, Pearson-r
69
What inferential analysis tool is used to test for differences among three (3) or more groups?
ANOVA and ANCOVA
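The F statistic behind a one-way ANOVA can be sketched by hand as the ratio of between-group to within-group variance. The groups and the function name `one_way_anova_f` are made up; the sketch stops at the F value and does not compute a p-value or cover ANCOVA. In practice a library routine such as SciPy's `scipy.stats.f_oneway` would normally be used.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (v - mean) ** 2
        for group, mean in zip(groups, means)
        for v in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Three hypothetical groups of scores.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat)
```

A large F value suggests the group means differ by more than the within-group scatter would explain.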
70
What inferential analysis tool is used to compare the means of only two (2) variables or groups?
T-test
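A minimal sketch of the statistic behind a two-sample t-test, assuming the Welch form, which does not require equal variances. The samples and the function name `welch_t` are made up, and no p-value is computed. SciPy's `scipy.stats.ttest_ind` is the usual library routine.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for the difference between two sample means."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Two hypothetical groups of measurements.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat = welch_t(group_a, group_b)
print(t_stat)
```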
71
What inferential analysis tool measures if there is a significant correlation or linear relationship between two variables?
Pearson-r
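Pearson's r can be computed directly from its definition: the covariance of the two variables divided by the product of their standard deviations. The data and the function name `pearson_r` below are illustrative assumptions. Python 3.10 and later also provide `statistics.correlation`.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up study hours and test scores with a perfect linear relationship.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
r = pearson_r(hours, scores)
print(r)
```

Values near 1 or -1 indicate a strong linear relationship; values near 0 indicate little or none. Whether r is statistically significant requires a separate test that this sketch does not perform.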