Big Data and Machine Learning Flashcards

(11 cards)

1
Q

What are continuous data and discrete data?

A
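  1. Continuous data: quantitative data that can take any value within a range (for example, height or temperature)
  2. Discrete data: quantitative data that can take only separate, countable values (for example, number of customers)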
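
A minimal sketch of the distinction, using made-up sample values. The variable names and the `looks_discrete` helper are hypothetical, and the whole-number check is only a rough heuristic, not a formal test:

```python
# Hypothetical sample values, chosen only to illustrate the two kinds of data.
heights_cm = [172.4, 168.95, 180.0]   # continuous: any value in a range is possible
siblings = [0, 2, 1]                  # discrete: only separate, countable values

def looks_discrete(values):
    """Rough heuristic: treat data as discrete if every value is a whole number."""
    return all(float(v).is_integer() for v in values)

print(looks_discrete(heights_cm))
print(looks_discrete(siblings))
```

The heuristic can be fooled (a continuous measurement may happen to land on whole numbers), so it is best read as an illustration rather than a classifier.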
2
Q

What is a relational database?

A

A type of database that organises data into tables made up of rows and columns, where tables can be related to one another through shared keys
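
As an illustration, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`customers`, `orders`, `customer_id`) and the sample rows are assumptions made up for this example:

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "total REAL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (10, 1, 19.99)")

# Relate the two tables through the shared customer id.
rows = conn.execute(
    "SELECT customers.name, orders.total "
    "FROM orders JOIN customers ON orders.customer_id = customers.id"
).fetchall()
print(rows)
```

Each table holds one kind of record, and the join shows how rows in separate tables are connected by a common value.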

3
Q

What are the 3 types of data?

A

Structured, semi-structured and unstructured

4
Q

Give examples for each data type

A

Structured: spreadsheets, transactional records, continuous data, relational records

Semi-structured: emails, zipped files, XML files

Unstructured: social media, weather data, entertainment
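
To make the three types concrete, here is a small sketch using only the Python standard library. The sample strings are invented for illustration:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a spreadsheet export.
structured = list(csv.DictReader(io.StringIO("item,price\npen,1.20\n")))

# Semi-structured: tags describe the data, but there is no fixed table schema.
semi = ET.fromstring("<email><to>sam@example.com</to><body>Hi Sam</body></email>")

# Unstructured: free text with no built-in fields.
unstructured = "Loved the film last night, the soundtrack was great!"

print(structured[0]["price"])   # a value looked up by column name
print(semi.find("to").text)     # a value looked up by tag
print(len(unstructured.split()))  # only crude measures, such as a word count, come for free
```

The structured and semi-structured values can be addressed by name, whereas the unstructured text needs further processing before specific facts can be extracted.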

5
Q

How can each data type be captured?

A

Structured: IoE sensors, retail checkpoints, online surveys

Semi-structured: qualitative data, sensors, satellites, GPS

Unstructured: scrapers, posts, comments, photos, videos

6
Q

What is data preparation?

A

Identifying the data that has been gathered and transforming it from its raw form into a form that is ready for analysis.

7
Q

What steps are taken when cleaning and preparing textual data?

A

-Remove useless words (stop words)
-Remove punctuation
-Convert to lower case
-Fix errors
-Translate language
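
The first three steps above can be sketched in a few lines of Python. The `clean_text` function and the tiny `STOP_WORDS` set are hypothetical names for this example; error fixing and translation are left out because they normally need external tools:

```python
import string

# A tiny, hypothetical stop-word list; real projects usually use a larger one.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to"}

def clean_text(text):
    """Lower-case the text, strip punctuation, and drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [word for word in text.split() if word not in STOP_WORDS]

print(clean_text("The weather is GREAT, and the sky is clear!"))
# → ['weather', 'great', 'sky', 'clear']
```

Lower-casing happens before the stop-word check so that "The" and "the" are treated as the same word.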

8
Q

What is data wrangling?

A

The conversion of data from one form to another, typically from its raw form into one better suited to analysis.

9
Q

What are the data wrangling steps?

(Challenge: Define them)

A

Discovery - identifying the data that has been gathered and working out how it needs altering for analysis

Structuring - converting data so it can be analysed, for example taking an HTML document and putting it into a table

Cleaning - removing duplicates and converting data types

Enriching - combining the data with data from other sources to fill in gaps and add context, or identifying additional data that may need gathering

Validation - making sure the data is reasonably complete and consistent

Publishing - the changed data is released in its new form for analysis
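
A minimal sketch of how these six steps might look on a toy dataset, in plain Python. The raw lines, field names, and the `country_names` lookup are all invented for illustration:

```python
# Hypothetical raw records, as they might arrive from a scrape or export.
raw = ["ada,36,UK", "bob,41,FR", "ada,36,UK", "cy,,UK"]

# Discovery: after inspecting the raw lines, decide which fields they hold.
fields = ["name", "age", "country"]

# Structuring: turn each line into a record with named fields.
records = [dict(zip(fields, line.split(","))) for line in raw]

# Cleaning: remove duplicates (keeping first occurrences) and convert data types.
unique = [dict(t) for t in dict.fromkeys(tuple(r.items()) for r in records)]
for r in unique:
    r["age"] = int(r["age"]) if r["age"] else None

# Enriching: add context from another (hypothetical) source.
country_names = {"UK": "United Kingdom", "FR": "France"}
for r in unique:
    r["country_name"] = country_names.get(r["country"])

# Validation: keep only records that are complete.
valid = [r for r in unique if all(v is not None for v in r.values())]

# Publishing: hand the result on for analysis (here, simply print it).
print(valid)
```

The duplicate "ada" line is dropped during cleaning, and the "cy" record is rejected during validation because its age is missing.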

10
Q

What is data mining?

A

The process of automatically finding hidden patterns, relationships, and anomalies in large datasets to predict outcomes or gain valuable insights.
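
As one very small example of finding an anomaly, the sketch below flags values that sit far from the mean. The sales figures are made up, and the threshold of 2 standard deviations is an assumption chosen for this illustration, not a fixed rule:

```python
from statistics import mean, stdev

# Hypothetical daily sales figures with one unusual day.
sales = [102, 98, 105, 99, 101, 250, 97, 103]

mu = mean(sales)
sigma = stdev(sales)

# Flag values more than 2 standard deviations from the mean.
anomalies = [x for x in sales if abs(x - mu) > 2 * sigma]
print(anomalies)
# → [250]
```

Real data mining works on far larger datasets and uses more robust techniques, but the idea of automatically surfacing values that break the usual pattern is the same.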

11
Q
A