Normalisation
Normalisation is the process of structuring the data in a database to reduce redundancy and improve data integrity.
First Normal Form
Requirements: There are no repeating groups of data. The data is atomic: each field contains one value only.
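As an illustration (the student/course data here is made up, not from the notes), a minimal Python sketch of bringing a repeating group into 1NF by giving each value its own row:

```python
# Unnormalised: the "courses" field holds a repeating group (not atomic).
unnormalised = [
    {"student_id": 1, "name": "Amy", "courses": "Maths, Physics"},
    {"student_id": 2, "name": "Ben", "courses": "Art"},
]

# 1NF: one atomic value per field, so each course gets its own row.
first_nf = [
    {"student_id": row["student_id"], "name": row["name"], "course": course.strip()}
    for row in unnormalised
    for course in row["courses"].split(",")
]

for row in first_nf:
    print(row)
```

After the split, every field holds exactly one value, so individual courses can be searched and sorted like any other column.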
Second Normal Form
Requirements: Meets the requirements of 1NF. There are no partial dependencies: every non-key attribute depends on the entire primary key, not just part of it.
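A small sketch of removing a partial dependency (the enrolment table and its composite key are hypothetical): student_name depends only on student_id, which is just part of the (student_id, course_id) key, so it is moved to its own table.

```python
# Hypothetical table with composite primary key (student_id, course_id).
# student_name depends on student_id alone — a partial dependency.
enrolments = [
    {"student_id": 1, "course_id": "C1", "student_name": "Amy", "grade": "A"},
    {"student_id": 1, "course_id": "C2", "student_name": "Amy", "grade": "B"},
    {"student_id": 2, "course_id": "C1", "student_name": "Ben", "grade": "C"},
]

# 2NF: split into two tables so each name is stored once, not per enrolment.
students = {row["student_id"]: row["student_name"] for row in enrolments}
grades = [
    {"student_id": r["student_id"], "course_id": r["course_id"], "grade": r["grade"]}
    for r in enrolments
]

print(students)
```

The redundancy is gone: changing a student's name now means updating one row, not every enrolment.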
Third Normal Form
Requirements: Meets the requirements of 2NF. There are no transitive dependencies: non-key attributes depend only on the primary key, not on other non-key attributes.
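A sketch of removing a transitive dependency (the employee/department data is invented for illustration): dept_name depends on dept_id rather than directly on the primary key emp_id, so it moves to a separate departments table.

```python
# dept_name depends on dept_id, not directly on the key emp_id —
# a transitive dependency (emp_id -> dept_id -> dept_name).
employees = [
    {"emp_id": 1, "name": "Amy", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "name": "Ben", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 3, "name": "Cal", "dept_id": 20, "dept_name": "HR"},
]

# 3NF: each department name is stored once, keyed by dept_id.
departments = {row["dept_id"]: row["dept_name"] for row in employees}
employees_3nf = [
    {"emp_id": r["emp_id"], "name": r["name"], "dept_id": r["dept_id"]}
    for r in employees
]

print(departments)
```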
SQL
Structured Query Language (SQL) is the language used to query databases. A query language allows users to retrieve and manipulate data in a database. The SELECT statement is used to output or display data.
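A sketch of that statement in action, using Python's built-in sqlite3 module (the Student table and its rows are made up for illustration):

```python
import sqlite3

# An in-memory database with a small, invented Student table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT, form TEXT)")
conn.executemany(
    "INSERT INTO Student (id, name, form) VALUES (?, ?, ?)",
    [(1, "Amy", "10A"), (2, "Ben", "10B")],
)

# SELECT retrieves (outputs/displays) data; WHERE filters which rows come back.
rows = conn.execute("SELECT name FROM Student WHERE form = '10A'").fetchall()
print(rows)  # [('Amy',)]
```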
Big Data
Extremely large and complex datasets that traditional databases are unable to store and process within acceptable time frames.
The 5 V’s
Volume: Huge amounts of data.
Example: Facebook generates 4 petabytes/day.
Velocity: Data is created at high speed.
Example: Social media posts in real-time.
Variety: Different types of data – structured, unstructured, semi-structured.
Example: Text, images, videos.
Veracity: Data quality and accuracy.
Example: Inconsistent or missing values in surveys.
Value: Useful information extracted from data.
Example: Targeted advertising from customer data.
Data Warehousing
A central repository where data from multiple sources is stored in an organised way for reporting and analysis.
Why Data Warehousing is needed
It combines data from many different sources into one consistent store, keeps historical data for trend analysis, and allows complex queries to run without slowing down day-to-day operational databases.
Data Mining
The retrieval and analysis of large sets of data in data warehouses to identify trends and patterns, e.g. identifying marketing opportunities.
Structured Data
Data that is organised in a fixed format, usually in rows and columns.
Characteristics: Stored in tables. Easy to search, sort, and analyse.
E.g. Student records, Bank transactions.
Unstructured Data
Data that does not have a predefined format or organised structure.
Characteristics: Not stored in a traditional table format; difficult to analyse using simple tools and requires advanced tools.
E.g. emails, social media posts, images, videos and PDF files.
Data Mining Techniques
Classification: Sorting data into categories.
Clustering: Grouping similar data together.
Association: Finding relationships between data items.
Examples: Market basket analysis: Customers who buy bread often buy butter.
Detecting fraudulent credit card transactions.
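The market-basket example above can be sketched by counting how often pairs of items appear in the same basket (the transactions here are invented):

```python
from itertools import combinations
from collections import Counter

# Hypothetical shopping baskets for a market-basket (association) sketch.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

# Count how often each pair of items is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))  # bread & butter appear together most often
```

The strongest association found (bread with butter) is exactly the kind of pattern a shop could use for product placement or promotions.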
Predictive Analysis
A subset of data mining used to make predictions about future events based on historical behaviour. E.g. weather/economic forecasting.
Techniques used for Predictive Analysis
Predictive Score - Assigns a probability for the likelihood that something, such as a customer, will behave a certain way. It is used to predict behaviour and assess risk over a wide variety of disciplines.
Statistical modelling - Using statistical techniques to build models to predict what might happen in the future.
Machine learning algorithms - AI techniques that learn patterns from historical data in order to make predictions.
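As a toy sketch of the statistical-modelling technique above (the monthly figures are made up), a least-squares straight line is fitted to past values and used to predict the next one:

```python
# Made-up historical values (e.g. monthly sales).
history = [10.0, 12.0, 14.0, 16.0]
xs = list(range(len(history)))

# Fit y = a*x + b by ordinary least squares.
n = len(history)
mean_x = sum(xs) / n
mean_y = sum(history) / n
slope_num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
slope_den = sum((x - mean_x) ** 2 for x in xs)
a = slope_num / slope_den
b = mean_y - a * mean_x

# Predict the next period from the fitted trend.
forecast = a * n + b
print(forecast)  # 18.0 for this perfectly linear history
```

Real predictive models are far richer than a straight line, but the principle is the same: build a model from historical behaviour, then extrapolate.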
Distributed Systems/Processing
Distributed Systems/Processing is a technique for carrying out a large computing task by sharing the processing between computers in different locations.
How Distributed Systems/Processing Works
Each computer runs its own programs and has its own store of data, but will share data with other computers.
Computers in various locations will be linked in a wide-area network.
Each computer will have the software necessary to carry out database operations on records and to display any associated information/images.
Records will be held locally, but additional records may be held centrally when needed.
Staff/users may access and update information at any of the locations by means of the network.
The overall system may provide summary management data.
The system will be able to inform users of updates and any actions needed.
Advantages of Distributed Systems/Processing
Faster processing - Tasks are split into smaller parts and processed at the same time by multiple machines to complete the work faster.
Fault tolerance - if one machine fails, others continue.
Distribution of Data
When data is distributed, the database is physically stored in multiple locations. Each site holds part or all of the data.
Distribution of Processing
When processing is distributed, the computations and database operations are executed across several computers rather than on a single machine. This means: Each node can process requests. Workload is shared among multiple machines. Tasks can run in parallel.
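The chunk-and-combine idea can be mimicked on a single machine with Python's concurrent.futures; in a real distributed system each chunk would be sent to a different computer rather than a different worker thread:

```python
from concurrent.futures import ThreadPoolExecutor

# One large task (summing numbers) split into smaller parts.
data = list(range(1_000))
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]

# Each worker processes its own chunk in parallel;
# partial results are combined at the end.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum, chunks))

total = sum(partial_sums)
print(total)  # 499500, the same answer as summing the data in one go
```

The workload is shared, the pieces run in parallel, and if one worker's chunk failed it could be reassigned, which is the essence of the fault tolerance mentioned above.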