A2 Databases & Distributed Systems Flashcards

(20 cards)

1
Q

Normalisation

A

Normalisation is the process of structuring the data in a database to
reduce redundancy and improve integrity.

2
Q

First Normal Form

A

Requirements: There are no repeating groups of data. The data is atomic: each field contains one value only.

3
Q

Second Normal Form

A

Requirements: Meets the requirements for 1NF. No partial dependency. All non-key attributes fully depend on the entire primary key and not only on part of the primary key.

4
Q

Third Normal Form

A

Requirements: Meets the requirements for 2NF. No transitive dependency. Non-key attributes depend only on the primary key.
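The three normal forms above can be illustrated with a small sketch using Python's built-in sqlite3 module. The tables and data (customers, orders, the names) are hypothetical, invented for the example: an unnormalised orders table would repeat customer details on every row, so splitting it into two tables removes that redundancy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 3NF: non-key attributes (name, city) depend only on the key, customer_id,
# with no transitive dependencies.
cur.execute("""CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT,
    city TEXT)""")

# Each order references its customer instead of duplicating the details.
# Fields are atomic (1NF) and depend on the whole key (2NF).
cur.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    product TEXT,
    quantity INTEGER)""")

cur.execute("INSERT INTO customers VALUES (1, 'Aisha', 'Lagos')")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                [(10, 1, 'Bread', 2), (11, 1, 'Butter', 1)])

# A join reassembles the data without the stored redundancy.
rows = cur.execute("""SELECT o.order_id, c.name, o.product
                      FROM orders o JOIN customers c
                      ON o.customer_id = c.customer_id
                      ORDER BY o.order_id""").fetchall()
print(rows)  # [(10, 'Aisha', 'Bread'), (11, 'Aisha', 'Butter')]
```

If the customer's name were stored in every order row, changing it would require updating many rows; the decomposition means it is stored once, improving integrity.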

5
Q

SQL

A

Structured Query Language is the language used to query databases. A query language allows users to retrieve and manipulate data from a database. The SELECT statement is used to output or display data.
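A minimal sketch of a SELECT query, run here through Python's built-in sqlite3 module; the students table and its contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, grade INTEGER)")
cur.executemany("INSERT INTO students VALUES (?, ?, ?)",
                [(1, 'Ben', 72), (2, 'Zara', 88)])

# SELECT retrieves data; WHERE filters rows, ORDER BY sorts the output.
result = cur.execute(
    "SELECT name, grade FROM students WHERE grade >= 80 ORDER BY grade DESC"
).fetchall()
print(result)  # [('Zara', 88)]
```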

6
Q

Big Data

A

Extremely large and complex datasets that traditional databases are unable to store and process within acceptable time frames.

7
Q

The 5 V’s

A

Volume: Huge amounts of data.
Example: Facebook generates 4 petabytes/day.

Velocity: Data is created at high speed.
Example: Social media posts in real-time.

Variety: Different types of data – structured, unstructured, semi-structured.
Example: Text, images, videos.

Veracity: Data quality and accuracy.
Example: Inconsistent or missing values in surveys.

Value: Useful information extracted from data.
Example: Targeted advertising from customer data.

8
Q

Data Warehousing

A

A central repository where data from multiple sources is stored in an organised way for reporting and analysis.

9
Q

Why Data Warehousing is needed

A
  1. Integrates Data from Multiple Sources
    In most organizations, data comes from different systems: sales, inventory, HR, and finance.
    Problem: Each system has its own format → hard to analyse together.
    Solution: A data warehouse collects and organizes all data in a standard format.
    Example: A supermarket combining online and in-store sales data.
  2. Supports Efficient Reporting and Analysis
    Operational databases are optimized for daily transactions, not for complex queries.
    Data warehouse is optimized for analysis → faster queries and reports.
    Example: Generating monthly sales trend reports in seconds instead of hours.
10
Q

Data Mining

A

The retrieval and analysis of large sets of data in data warehouses to identify trends and patterns, e.g. spotting marketing opportunities.

11
Q

Structured Data

A

Data that is organised in a fixed format, usually in rows and columns.
Characteristics: Stored in tables. Easy to search, sort, and analyse.
E.g. Student records, Bank transactions.

12
Q

Unstructured Data

A

Data that does not have a predefined format or organised structure.
Characteristics: Not stored in traditional table format & difficult to analyse using simple tools. Requires advanced tools.
E.g., emails, social media posts, images, videos and PDF files.

13
Q

Techniques

A

Classification: Sorting data into categories.
Clustering: Grouping similar data together.
Association: Finding relationships between data items.
Examples: Market basket analysis: Customers who buy bread often buy butter.
Detecting fraudulent credit card transactions.
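Association in its simplest form can be sketched by counting how often items appear together in transactions. The baskets below are hypothetical data invented for the bread-and-butter example; real market basket analysis uses algorithms such as Apriori on far larger datasets.

```python
from itertools import combinations
from collections import Counter

# Made-up transactions: each basket is the set of items one customer bought.
baskets = [
    {'bread', 'butter', 'milk'},
    {'bread', 'butter'},
    {'bread', 'jam'},
    {'milk', 'eggs'},
]

# Count every pair of items bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support: the fraction of baskets containing both items.
support = pair_counts[('bread', 'butter')] / len(baskets)
print(support)  # 0.5 -> bread and butter co-occur in half of all baskets
```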

14
Q

Predictive Analysis

A

A subset of data mining used to make predictions about future events based on historical behaviour. E.g. weather/economic forecasting.

15
Q

Techniques used for Predictive Analysis

A

Predictive Score - Assigns a probability for the likelihood that something, such as a customer, will behave a certain way. It is used to predict behaviour and assess risk over a wide variety of disciplines.
Statistical modelling - Using statistical techniques to build models to predict what might happen in the future.
Machine learning algorithms - AI techniques that learn patterns from historical data and use them to make predictions.
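Statistical modelling can be sketched with the simplest possible model: fitting a straight line to past values by least squares and extrapolating it. The monthly sales figures are made-up data; real predictive analysis uses far richer models.

```python
# Hypothetical monthly sales for months 0..4.
sales = [100, 110, 125, 130, 145]
n = len(sales)
xs = list(range(n))

# Ordinary least-squares fit of y = intercept + slope * x.
mean_x = sum(xs) / n
mean_y = sum(sales) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Extrapolate the trend to predict month 5.
prediction = intercept + slope * n
print(prediction)  # 155.0
```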

16
Q

Distributed Systems/Processing

A

Distributed Systems/Processing is a system/technique of carrying out a large computing task by sharing the processing between computers in different locations.

17
Q

How Distributed Systems/Processing Works

A

Each computer runs its own programs and has its own store of data, but will share data with other computers.
Computers in various locations will be linked in a wide-area network.
Each computer will have the software necessary to carry out database operations on records and to display any associated information/images.
Records will be held locally, but additional records may be held centrally when needed.
Staff/users may access and update information at any of the locations by means of the network.
The overall system may provide summary management data.
The system will be able to inform users of updates and any actions needed.
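The core idea above, splitting a large task so several processors work on shares of it at once, can be sketched on a single machine with Python's concurrent.futures. This is only an analogy: in a real distributed system the workers would be computers in different locations linked by a wide-area network, not threads in one process.

```python
from concurrent.futures import ThreadPoolExecutor

# A large task: sum the numbers 1..100 (stand-in for real work).
data = list(range(1, 101))

def process_chunk(chunk):
    # Each "node" processes its own share of the data.
    return sum(chunk)

# Split the task into four equal parts and process them concurrently.
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))

# Combine the partial results into the final answer.
total = sum(partials)
print(total)  # 5050
```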

18
Q

Advantages of Distributed Systems/Processing

A

Faster processing - Tasks are split into smaller parts and processed at the same time by multiple machines to complete the work faster.
Fault tolerance - if one machine fails, others continue.

19
Q

Distribution of Data

A

When data is distributed, the database is physically stored in multiple locations. Each site holds part or all of the data.

20
Q

Distribution of Processing

A

When processing is distributed, the computations and database operations are executed across several computers rather than on a single machine. This means: Each node can process requests. Workload is shared among multiple machines. Tasks can run in parallel.