Big Data Flashcards

(16 cards)

1
Q

What is big data?

A

Data that can’t be processed or analysed using traditional processes or tools.

2
Q

What are the 3 characteristics of big data?

A

1) Variety: Variety of different forms of information // data may lack structure
Cannot be represented by a relational database: e.g. email messages, videos, images

2) Volume: There is a lot / high volume of data (to process as one dataset) // data will not fit on one server
hundreds of terabytes: e.g. medical datasets for diagnosis, predicting disease outbreaks

3) Velocity: The data is generated/received and processed at high velocity
Data must be processed as it is received - it cannot be batched and processed later: e.g. card payment fraud detection, recommendation systems

3
Q

What are the challenges that come with big data?

A
  • Data cannot be stored on one server / computer.
  • Not possible to process data quickly enough with one computer.
  • Data cannot be represented in a table // by a relational database.
  • Some forms of data / unstructured data are difficult to analyse.
4
Q

How are Big Data’s challenges overcome?

A
  • Distributed database systems
  • Functional programming can help process and analyse unstructured data: e.g. MapReduce splits the input data into parts, executes a mapper on each part, then combines the results with Reduce. Functional programming also makes it easier to write correct code
  • Using a fact-based model can manage bigger data sets better than relational models
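
The MapReduce idea in this answer can be sketched in a few lines of Python. This is a minimal illustration only, assuming a word-count task; the names mapper, reducer and word_count are hypothetical and everything runs on one machine, whereas a real system would send each part to a different server.

```python
from collections import Counter
from functools import reduce


def mapper(chunk):
    """Count the words in one part of the input; uses no shared state."""
    return Counter(chunk.lower().split())


def reducer(left, right):
    """Combine two partial counts into a new Counter (inputs are unchanged)."""
    return left + right


def word_count(chunks):
    """Map over every part, then reduce the partial results to one total."""
    # Each mapper call is independent, so it could run on a separate server.
    partial_counts = map(mapper, chunks)
    return reduce(reducer, partial_counts, Counter())


# Hypothetical input, already split into two parts.
chunks = ["big data is big", "data is everywhere"]
totals = word_count(chunks)
print(totals["big"], totals["data"], totals["is"], totals["everywhere"])
# prints: 2 2 2 1
```

Because the mapper only reads its own part and the reducer only combines two results, the parts can be processed in any order.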
5
Q

What are the ethical issues around big data?

A

Area 3: Ethical and Legal Issues

  • How can data be kept securely?
  • Who should have access to what data?
  • Will people know what data is being stored about them?
  • Where should / will the data be stored // concerns relating to data being stored in other countries.
  • What rights do people have in relation to data stored about them?
  • Example laws (allow two examples): Computer Misuse Act, General Data Protection Regulation / GDPR / Data Protection Act, Regulation of Investigatory Powers Act / RIPA.
  • Who owns data about individuals?
6
Q

What features of functional programming languages make it easier to write code that can be distributed to run across multiple servers?

A
  • Data structures are immutable, meaning the values stored in them cannot be changed
  • Programs are stateless, meaning functions do not have side effects
  • Map-reduce can be used // Higher-order functions can compose the results of processing on multiple processors
  • The order of execution can be determined at runtime
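
A small sketch of why these features help, assuming a sum-of-squares task chosen purely for illustration (the names readings, square, add and sum_of_squares are hypothetical). The data is held in an immutable tuple and the functions have no side effects, so the work can be split into parts and combined in either order with the same result.

```python
from functools import reduce

# A tuple is immutable: its values cannot be changed after creation.
readings = (3, 1, 4, 1, 5, 9, 2, 6)


def square(x):
    """Pure function: the result depends only on the argument."""
    return x * x


def add(a, b):
    """Pure function used to combine partial results."""
    return a + b


def sum_of_squares(values):
    """Higher-order functions map and reduce do the processing."""
    return reduce(add, map(square, values), 0)


# Process everything in one go ...
whole = sum_of_squares(readings)

# ... or split the data as if across two servers and combine the
# partial results in the opposite order.
first_half, second_half = readings[:4], readings[4:]
split = add(sum_of_squares(second_half), sum_of_squares(first_half))

print(whole, split)
# prints: 173 173
```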
7
Q

In the fact-based model, each individual piece of information is stored as a […]

A

In the fact-based model, each individual piece of information is stored as a fact

8
Q

In the fact-based model, […] is stored as a fact

A

In the fact-based model, each individual piece of information is stored as a fact

9
Q

What is stored with each fact in a fact-based model?

A

Timestamp of the date and time at which a piece of information was recorded

10
Q

Why are timestamps stored with facts?

A

Facts are immutable: they are never deleted or overwritten, so multiple different values can be held for the same attribute. The timestamps allow the computer to determine which value is the most recent.
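
A minimal sketch of how timestamps resolve this, assuming a simple in-memory store. The Fact fields and the add_fact and latest_value helpers are hypothetical names for illustration, not part of any particular database, and the example data is made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: a fact cannot be modified once created
class Fact:
    entity: str
    attribute: str
    value: str
    timestamp: datetime


def add_fact(facts, fact):
    """Append a fact by building a new tuple; nothing is overwritten."""
    return facts + (fact,)


def latest_value(facts, entity, attribute):
    """Return the value from the most recently timestamped matching fact."""
    matching = [f for f in facts
                if f.entity == entity and f.attribute == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f.timestamp).value


# Hypothetical data: the same attribute is recorded twice.
facts = ()
facts = add_fact(facts, Fact("user1", "city", "Leeds",
                             datetime(2020, 1, 5, 9, 0)))
facts = add_fact(facts, Fact("user1", "city", "York",
                             datetime(2023, 6, 1, 14, 30)))

print(latest_value(facts, "user1", "city"))  # prints: York
print(len(facts))                            # prints: 2 (old fact is kept)
```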

11
Q

Why can traditional relational databases not be used for big data?

A

Big data usually amounts to several terabytes or more, so it will not fit on one server; much of it is unstructured, so it cannot be represented in relational tables; and it needs to be processed extremely rapidly.

12
Q

How is a graph schema constructed?

A
  • Solid lines, labelled with text, to show the relationship between two nodes
  • Dashed lines between a node and its properties
  • Rectangular boxes for properties
  • Ovals to represent nodes
13
Q

Why are fact-based models useful for storing Big Data?

A
  • Facts are immutable
  • Reduces the risk of accidentally losing data due to human error
  • Does not require indexing as new data is simply appended as it is created
14
Q

Big Data characteristics MS with examples

A
15
Q

Big Data problems and solutions MS

A
16
Q

Big Data ethics MS