Core Data Concepts Flashcards

(30 cards)

1
Q

Why has data exploded recently?

A

Because modern apps, devices, and systems produce huge amounts of data.

This explosion is driven by the increasing number of connected devices and applications.

2
Q

Why is it easier to collect data today?

A

Storage is cheaper and tools are more advanced.

Advances in technology have made data collection more efficient and cost-effective.

3
Q

Why do businesses depend on data?

A
  • To make decisions
  • Improve revenue
  • Gain competitive advantage

Data-driven decision-making is crucial for modern business success.

4
Q

What is data?

A

A collection of facts, such as numbers, descriptions, and observations, that record information about something.

Data serves as the foundation for analysis and decision-making.

5
Q

Give four examples of data entities.

A
  • Customers
  • Products
  • Sales
  • Orders

These entities represent key components of business operations.

6
Q

What are attributes?

A

Details describing an entity (e.g., name, price, address).

Attributes provide specific information about data entities.
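
As a minimal sketch of the entity/attribute idea, the Python below models a hypothetical `Product` entity whose fields are its attributes. The class name, field names, and values are illustrative assumptions, not part of the deck.

```python
from dataclasses import dataclass, fields


@dataclass
class Product:
    # Each field is one attribute describing the Product entity.
    name: str
    price: float
    category: str


# One concrete instance of the entity, with a value for every attribute.
laptop = Product(name="Laptop", price=899.0, category="Electronics")

# List the attribute names that the entity defines.
attribute_names = [f.name for f in fields(laptop)]
print(attribute_names)
```

In a relational table, the entity would typically map to the table and each attribute to a column.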

7
Q

What is structured data?

A

Data organised in tables with rows and columns.

Structured data is easily searchable and analyzable.

8
Q

Where is structured data usually stored?

A

Relational databases (SQL).

SQL databases are designed to handle structured data efficiently.

9
Q

What is semi-structured data?

A

Data with some structure but flexible (e.g., JSON).

Semi-structured data allows for more variability than structured data.
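
To illustrate that variability, here is a small sketch using Python's standard `json` module. The customer records and their field names are made up for the example; the point is only that two records in the same collection need not share the same fields.

```python
import json

# Two hypothetical customer records: both have "id" and "name",
# but only one has "email" and only the other has "phones".
raw = """[
  {"id": 1, "name": "Ada", "email": "ada@example.com"},
  {"id": 2, "name": "Grace", "phones": ["555-0100", "555-0101"]}
]"""

customers = json.loads(raw)

# Because fields may be missing, readers usually supply a default.
emails = [c.get("email", "unknown") for c in customers]
print(emails)
```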

10
Q

What is unstructured data?

A

Data with no fixed format (images, PDFs, videos, emails).

Unstructured data is more challenging to analyze due to its lack of organization.

11
Q

What are the two big storage categories?

A
  • Files
  • Databases

These categories encompass the primary methods for storing data.

12
Q

What are CSV/TSV files good for?

A

Structured, table-like data.

These formats are commonly used for data interchange.
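
A short sketch of reading table-like data from CSV with Python's standard `csv` module. The column names and rows are invented for illustration, and the text is held in a string rather than a real file so the example is self-contained.

```python
import csv
import io

# Hypothetical CSV content: a header row followed by one row per order.
csv_text = "order_id,product,quantity\n1,Keyboard,2\n2,Mouse,5\n"

# DictReader uses the header row as keys, giving one dict per data row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values arrive as strings, so convert before doing arithmetic.
total_quantity = sum(int(row["quantity"]) for row in rows)
print(total_quantity)
```

A TSV file could be read the same way by passing `delimiter="\t"` to `csv.DictReader`.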

13
Q

Why is JSON popular?

A

It’s flexible and hierarchical.

JSON is widely used for data interchange in web applications.

14
Q

What type of data is stored in binary files?

A
  • Images
  • Audio
  • Video

Binary files are used for non-text data types.

15
Q

Name three optimised big-data formats.

A
  • Avro
  • ORC
  • Parquet

These formats are designed for efficient storage and processing of large datasets.

16
Q

What do relational databases store?

A

Tables of organised data.

Relational databases use Structured Query Language (SQL) for data manipulation.

17
Q

What prevents data duplication in SQL databases?

A

Normalisation.

Normalisation is a process to reduce redundancy in database design.
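
One way to picture normalisation, sketched with Python's built-in `sqlite3` module and an in-memory database. The table and column names are assumptions made for the example: customer details are stored once in `customers`, and each order refers to them by `customer_id` instead of repeating the name and city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Customer details live in exactly one row per customer.
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT,
        city TEXT
    );
    -- Orders point at a customer instead of copying name and city.
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        product     TEXT
    );
    INSERT INTO customers VALUES (1, 'Ada', 'London');
    INSERT INTO orders VALUES (10, 1, 'Keyboard'), (11, 1, 'Mouse');
""")

# A JOIN recombines the two tables when the full picture is needed.
joined = conn.execute("""
    SELECT o.id, c.name, c.city, o.product
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

customer_rows = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(joined, customer_rows)
```

If the customer moves city, only the single `customers` row has to change, which is the practical benefit of reducing redundancy.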

18
Q

What language is used to query relational databases?

A

SQL.

SQL stands for Structured Query Language.

19
Q

What is a key characteristic of NoSQL databases?

A

They do NOT use the traditional relational model of fixed-schema tables.

NoSQL databases are designed to handle unstructured and semi-structured data.

20
Q

Give an example of a key-value database.

A

Redis.

Key-value databases are known for their simplicity and speed.
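
The key-value model can be sketched with a plain Python dictionary. This is not the Redis API; `KeyValueStore` and its methods are hypothetical names used only to show the idea of looking values up by a unique key.

```python
class KeyValueStore:
    """A toy in-memory key-value store (illustrative only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Store or overwrite the value held under this key.
        self._data[key] = value

    def get(self, key, default=None):
        # Look the value up directly by key; no query language involved.
        return self._data.get(key, default)

    def delete(self, key):
        # Return True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.set("session:42", "user=ada")
print(store.get("session:42"))
```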

21
Q

Give an example of a document database.

A

MongoDB.

Document databases store data in document formats like JSON.

22
Q

Give an example of a column family database.

A

Cassandra.

Column family databases are designed for high scalability.

23
Q

Give an example of a graph database.

A

Neo4j.

Graph databases excel in managing relationships between data points.
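
To show why relationships are the focus, here is a rough sketch of a "friends of friends" lookup over a tiny graph held in a Python dictionary. This is not Neo4j or its query language; the `follows` data and the `friends_of_friends` helper are invented for illustration.

```python
# Each key is a person (node); each set holds who they follow (edges).
follows = {
    "ada": {"grace", "linus"},
    "grace": {"linus", "margaret"},
    "linus": set(),
    "margaret": {"ada"},
}


def friends_of_friends(graph, person):
    """People reachable in exactly two hops who are not already direct contacts."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    # Exclude the person themselves and anyone they already follow.
    return two_hops - direct - {person}


suggestions = sorted(friends_of_friends(follows, "ada"))
print(suggestions)
```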

24
Q

What is OLTP used for?

A

Real-time business operations (orders, bank transfers).

OLTP stands for Online Transaction Processing.

25
Q

What do the ACID rules ensure?

A
  • Correct
  • Reliable
  • Consistent transactions

ACID is an acronym for Atomicity, Consistency, Isolation, Durability.
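
As a sketch of atomicity specifically, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the names, and the `transfer` helper are assumptions for illustration. A transfer either applies both updates or neither: when the debit would push a balance below zero, the credit that was already made is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint rejects any update that leaves a negative balance.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates succeed
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```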

26
Q

What is OLAP used for?

A

Analytics, reporting, dashboards.

OLAP stands for Online Analytical Processing.

27
Q

What type of data does OLAP typically use?

A

Large amounts of historical data.

OLAP systems are designed for complex queries and analysis.
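
A typical analytical question summarises many historical records rather than touching one at a time. The sketch below totals revenue per year in plain Python; the `sales` rows are made-up sample data, and a real OLAP system would run this kind of aggregation over far larger volumes.

```python
from collections import defaultdict

# Hypothetical historical sales: (ISO date, region, amount).
sales = [
    ("2022-03-14", "North", 120.0),
    ("2022-11-02", "South", 80.0),
    ("2023-01-20", "North", 200.0),
    ("2023-06-05", "North", 50.0),
]

# Aggregate across all rows, grouping by the year part of the date.
revenue_by_year = defaultdict(float)
for date, region, amount in sales:
    revenue_by_year[date[:4]] += amount

print(dict(revenue_by_year))
```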

28
Q

What is ETL?

A

Extract, Transform, Load.

ETL is a process used to integrate data from multiple sources.
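
The three stages can be sketched end to end with Python's standard library. Everything here is an illustrative assumption: the source is a CSV string rather than a real system, the cleaning rules are arbitrary, and the target is an in-memory SQLite table named `products`.

```python
import csv
import io
import sqlite3

# Hypothetical raw source with untidy names and one unparseable price.
source_csv = "name,price\n keyboard ,25.50\nmouse,abc\n Monitor,199.99\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert prices, skip rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            price = float(row["price"])
        except ValueError:
            continue  # drop rows with an invalid price
        clean.append((row["name"].strip().title(), price))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(source_csv)))
loaded = conn.execute("SELECT name, price FROM products ORDER BY name").fetchall()
print(loaded)
```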

29
Q

What is a Data Lake?

A

Raw file-based storage for flexible analysis.

Data lakes can store structured and unstructured data.

30
Q

What is a Data Warehouse?

A

Structured, relational storage designed for BI and fast SQL queries.

Data warehouses are optimized for reporting and analysis.