Why has data exploded recently?
Because modern apps, devices, and systems produce huge amounts of data.
This explosion is driven by the increasing number of connected devices and applications.
Why is it easier to collect data today?
Storage is cheaper and tools are more advanced.
Advances in technology have made data collection more efficient and cost-effective.
Why do businesses depend on data?
Data-driven decision-making is crucial for modern business success.
What is data?
Recorded facts or values about something (e.g., a customer's name, a sale amount, a temperature reading).
Data serves as the foundation for analysis and decision-making.
Give four examples of data entities.
These entities represent key components of business operations.
What are attributes?
Details describing an entity (e.g., name, price, address).
Attributes provide specific information about data entities.
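A minimal Python sketch of the idea: an entity can be modelled as a class, and its attributes as that class's fields. The `Customer` and `Product` types and their values are hypothetical, chosen only to match the attribute examples above.

```python
from dataclasses import dataclass

# Hypothetical entity: a customer, described by its attributes.
@dataclass
class Customer:
    name: str
    address: str

# Hypothetical entity: a product, described by its attributes.
@dataclass
class Product:
    name: str
    price: float

alice = Customer(name="Alice", address="12 High Street")
mug = Product(name="Mug", price=4.50)
print(alice.name, mug.price)
```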
What is structured data?
Data organised in tables with rows and columns.
Structured data is easily searchable and analysable.
Where is structured data usually stored?
Relational databases (SQL).
SQL databases are designed to handle structured data efficiently.
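As an illustration, the sketch below uses SQLite (bundled with Python as the `sqlite3` module) as a stand-in relational database. The `products` table and its rows are hypothetical. Because every row has the same columns, filtering and sorting by a column is a one-line query.

```python
import sqlite3

# In-memory database; every row of the table has the same fixed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("Mug", 4.50), ("Lamp", 19.99), ("Pen", 1.20)],
)

# The fixed structure makes it easy to filter and sort on a column.
cheap = conn.execute(
    "SELECT name, price FROM products WHERE price < ? ORDER BY price", (5,)
).fetchall()
print(cheap)
conn.close()
```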
What is semi-structured data?
Data with some structure (e.g., keys or tags) but no fixed schema (e.g., JSON).
Semi-structured data allows for more variability than structured data.
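A small sketch of that flexibility, using Python's standard `json` module: the two records below (hypothetical data) share a `name` key, but only one has an `email`, and they store `phone` in different shapes. The reading code has to cope with missing or differently shaped fields.

```python
import json

# Hypothetical records: same kind of thing, but no fixed schema.
raw = """[
  {"name": "Alice", "email": "alice@example.com", "phone": "555-0100"},
  {"name": "Bob", "phone": ["555-0101", "555-0102"]}
]"""

records = json.loads(raw)
for r in records:
    # .get() returns a default for a missing key instead of raising KeyError
    print(r["name"], r.get("email", "<no email>"))

emails = [r.get("email") for r in records]
```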
What is unstructured data?
Data with no fixed format (images, PDFs, videos, emails).
Unstructured data is more challenging to analyse due to its lack of organisation.
What are the two big storage categories?
These categories encompass the primary methods for storing data.
What are CSV/TSV files good for?
Structured, table-like data.
These formats are commonly used for data interchange.
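For example, Python's standard `csv` module reads both formats; only the delimiter changes (comma for CSV, tab for TSV). The data below is hypothetical, and `io.StringIO` stands in for a real file. Note that every value comes back as a string.

```python
import csv
import io

# Hypothetical tab-separated data; a real program would open a file instead.
tsv_text = "name\tprice\nMug\t4.50\nLamp\t19.99\n"

# The first line is used as the header; each row becomes a dict.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
rows = list(reader)

# Values are strings, so numeric columns must be converted explicitly.
total = sum(float(row["price"]) for row in rows)
print(rows[0], total)
```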
Why is JSON popular?
It’s flexible and hierarchical.
JSON is widely used for data interchange in web applications.
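A short sketch of the hierarchical part, again with Python's `json` module: a hypothetical order nests a customer object and a list of items, and it survives a round trip to a JSON string (as it would when sent between a web server and a browser) unchanged.

```python
import json

# Hypothetical order: objects inside objects, plus a list of objects.
order = {
    "id": 1001,
    "customer": {"name": "Alice", "address": {"city": "Leeds"}},
    "items": [
        {"product": "Mug", "qty": 2},
        {"product": "Pen", "qty": 5},
    ],
}

text = json.dumps(order)     # serialise to a JSON string for sending or storing
restored = json.loads(text)  # parse it back into nested dicts and lists

city = restored["customer"]["address"]["city"]
units = sum(item["qty"] for item in restored["items"])
```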
What type of data is stored in binary files?
Binary files are used for non-text data types.
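To make this concrete, the sketch below writes and reads raw bytes with Python's `struct` module. The record layout (a little-endian 32-bit integer ID followed by a 64-bit float reading, 12 bytes per record) is a hypothetical example; the point is that the bytes are not readable text, and a program can only decode them if it knows the layout.

```python
import os
import struct
import tempfile

# Hypothetical layout: "<" little-endian, "i" 4-byte int, "d" 8-byte float.
RECORD = struct.Struct("<id")

path = os.path.join(tempfile.mkdtemp(), "readings.bin")
with open(path, "wb") as f:  # "b" = binary mode, no text encoding applied
    f.write(RECORD.pack(7, 21.5))
    f.write(RECORD.pack(8, 22.25))

with open(path, "rb") as f:
    data = f.read()

size = len(data)                           # 2 records x 12 bytes each
readings = list(RECORD.iter_unpack(data))  # decode using the same layout
```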
Name three optimised big-data formats.
These formats are designed for efficient storage and processing of large datasets.
What do relational databases store?
Tables of organised data.
Relational databases are queried and updated with SQL (Structured Query Language).
What reduces data duplication in SQL databases?
Normalisation.
Normalisation is the process of splitting data into related tables so that each fact is stored in one place, reducing redundancy and update errors.
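A minimal sketch of the idea with SQLite: instead of repeating a customer's city on every order row, customer details live once in their own table and orders refer to them by key. The tables and data are hypothetical. When the customer moves, one `UPDATE` is enough, and every order sees the new city through a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalised alternative (for contrast), with the city repeated per order:
#   orders(order_id, customer_name, customer_city, product)
# Normalised design: customer details stored once, referenced by customer_id.
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice', 'Leeds');
    INSERT INTO orders VALUES (100, 1, 'Mug'), (101, 1, 'Pen');
""")

# The city is stored in one row only, so a single UPDATE changes it everywhere.
conn.execute("UPDATE customers SET city = 'York' WHERE customer_id = 1")
rows = conn.execute("""
    SELECT o.order_id, c.name, c.city
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()
```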
What language is used to query relational databases?
SQL.
SQL stands for Structured Query Language.
What is a key characteristic of NoSQL databases?
They do NOT use the traditional relational table model.
NoSQL databases are designed to handle unstructured and semi-structured data.
Give an example of a key-value database.
Redis.
Key-value databases are known for their simplicity and speed.
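The model can be sketched with a plain Python dictionary. The `KeyValueStore` class below is hypothetical and only mirrors the spirit of Redis's SET and GET commands; it is not the Redis API and has none of Redis's persistence or networking. It shows why the model is simple and fast: a value is found by its exact key, with no querying by content.

```python
class KeyValueStore:
    """Toy in-memory key-value store (hypothetical; not the Redis API)."""

    def __init__(self):
        self._data = {}  # hash table: average O(1) lookup by exact key

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed before deletion
        existed = key in self._data
        self._data.pop(key, None)
        return existed


store = KeyValueStore()
store.set("session:42", "alice")
store.set("cart:alice", "mug,pen")
user = store.get("session:42")
```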
Give an example of a document database.
MongoDB.
Document databases store data as JSON-like documents (MongoDB uses BSON, a binary form of JSON).
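The data model can be illustrated with a list of Python dicts acting as a "collection". The `find` helper is hypothetical and only does exact matches on top-level fields; a real document database such as MongoDB adds indexing, persistence, and a full query language. Note that each document can carry different fields.

```python
# Hypothetical collection of JSON-like documents with differing fields.
products = [
    {"_id": 1, "name": "Mug", "price": 4.5, "tags": ["kitchen"]},
    {"_id": 2, "name": "Lamp", "price": 19.99, "specs": {"watts": 40}},
    {"_id": 3, "name": "Pen", "price": 1.2},
]


def find(collection, **criteria):
    """Hypothetical helper: documents whose top-level fields equal every criterion."""
    return [
        doc for doc in collection
        if all(doc.get(k) == v for k, v in criteria.items())
    ]


lamp = find(products, name="Lamp")[0]
watts = lamp["specs"]["watts"]  # nested field present only in this document
```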
Give an example of a column family database.
Cassandra.
Column family databases are designed for high scalability.
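As a rough sketch of the data model only, a column-family table can be pictured as nested maps: row key, then column family, then column, then value. The layout and data below are hypothetical. Rows do not have to share columns, and reading one family of one row touches only that slice. Cassandra's scalability comes from spreading rows across many nodes by key, which this single-process sketch does not model.

```python
from collections import defaultdict

# Hypothetical layout: row key -> column family -> column -> value.
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    table[row_key][family][column] = value


put("user:alice", "profile", "city", "Leeds")
put("user:alice", "activity", "last_login", "2024-05-01")
put("user:bob", "profile", "city", "York")
put("user:bob", "profile", "email", "bob@example.com")  # column only Bob has

# Read a single column family of a single row.
bob_profile = dict(table["user:bob"]["profile"])
```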
Give an example of a graph database.
Neo4j.
Graph databases excel in managing relationships between data points.
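To show what "managing relationships" means in practice, the sketch below stores a hypothetical "follows" graph as an adjacency list and uses breadth-first search to count the hops between two people. Neo4j expresses traversals like this in its own query language (Cypher); the Python here is only a stand-in for the idea, not how Neo4j works internally.

```python
from collections import deque

# Hypothetical graph: person -> set of people they follow.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave"},
    "dave": {"erin"},
    "erin": set(),
}


def degrees_of_separation(graph, start, goal):
    """Breadth-first search: number of hops from start to goal, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


hops = degrees_of_separation(follows, "alice", "erin")
```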
What is OLTP used for?
Real-time business operations (orders, bank transfers).
OLTP stands for Online Transaction Processing.
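OLTP workloads consist of many short transactions like the bank transfer sketched below, using SQLite as a stand-in database. The `accounts` table and amounts are hypothetical. The transfer must be atomic: both balance updates happen, or neither does. Using the `sqlite3` connection as a context manager commits on success and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return True if committed."""
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # CHECK constraint failed (insufficient funds); the credit was rolled back
        return False


ok = transfer(conn, "A", "B", 30)       # succeeds
failed = transfer(conn, "A", "B", 500)  # would overdraw A, so nothing changes
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```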