What is the motivation for distributed databases?
What does each site store data about in distributed databases?
When are distributed databases needed?
More generally: in large organisations/companies
◦ …with different branches, offices or sub-companies
◦ …or that are simply so large that one computer can’t handle all the requests/transactions; a distributed database spreads that load over several machines.
Why are distributed databases useful for providing access to large datasets to many users?
◦ Data is distributed over several computers, which don’t have to be identical in software or hardware.
◦ The computers can be at geographically different locations (or in the same place), depending on the system you’re dealing with.
What are the advantages of using distributed databases?
◦ Balance workload and network traffic and handle multiple queries simultaneously: with two sites, up to roughly twice as many operations can be executed in the same time.
◦ Easier to extend capacity or scale to a higher number of users: just plug in more hardware.
◦ If one site suffers physical damage (for example, a fire), the other locations remain undamaged.
What is the formal definition of distributed databases?
◦ Collection of multiple logically interrelated databases that are distributed over a computer network.
What does DDBMS stand for?
What does a graph representing a DDBMS contain?
- The different sites are known as nodes; they correspond to where the database is stored.
- The lines connecting them represent network links. In general, there may not be a network link between every pair of nodes.
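The node-and-link picture can be sketched as a small adjacency structure. This is only an illustration: the site names other than Liverpool and the helper functions `directly_linked` and `reachable` are assumptions made for the example, not part of any real DDBMS.

```python
# Hypothetical sites (nodes) and network links (edges); illustrative only.
links = {
    "Liverpool": {"London"},
    "London": {"Liverpool", "Manchester"},
    "Manchester": {"London"},
}


def directly_linked(a, b):
    """True if there is a network link between sites a and b."""
    return b in links.get(a, set())


def reachable(start, goal):
    """True if goal can be reached from start over one or more links."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in links.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

Here Liverpool and Manchester have no direct link, but each can still reach the other via London, which is the "not every pair is linked" situation described above.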
What are the advantages of DDBMS?
What are the two methods of transparency in DDBMS?
Fragmentation and replication
What is transparency in DDBMS?
Hides the details of how the data is distributed (for example, how it is fragmented or replicated across sites) from the people accessing the database.
What is fragmentation?
Describe horizontal fragmentation
How do you get the original table back after doing horizontal fragmentation?
Describe vertical fragmentation
How is the database transparent due to using fragmentation?
The user doesn’t actually see the individual fragments; they just see the full relation when they query R, because the DBMS puts all the fragments back together.
Which methods for transparency are the most commonly used?
Fragmentation and replication transparency
What is typical with fragmentation so a DBMS can put all the fragments back together?
Typically, tuples are stored at a particular site according to a common value of a specific attribute.
- So all rows that have "a" as their type may be stored at the site in Liverpool.
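A minimal sketch of this idea, assuming a toy relation R with a `type` attribute: rows are assigned to a site by their `type` value, and the full relation is recovered as the union of the fragments. The sample rows, the London site, the `SITE_FOR_TYPE` mapping and both function names are hypothetical and exist only for the illustration.

```python
# Toy relation R; the rows and attribute names are made up for illustration.
R = [
    {"id": 1, "type": "a", "item": "bolt"},
    {"id": 2, "type": "b", "item": "nut"},
    {"id": 3, "type": "a", "item": "screw"},
]

# Assumed rule: the value of "type" decides which site stores the row.
SITE_FOR_TYPE = {"a": "Liverpool", "b": "London"}


def fragment_horizontally(relation):
    """Split the relation into one fragment (list of rows) per site."""
    fragments = {}
    for row in relation:
        site = SITE_FOR_TYPE[row["type"]]
        fragments.setdefault(site, []).append(row)
    return fragments


def reconstruct(fragments):
    """Union of all fragments, ordered by id so it can be compared with R."""
    rows = [row for frag in fragments.values() for row in frag]
    return sorted(rows, key=lambda r: r["id"])


fragments = fragment_horizontally(R)
```

Because every row lands in exactly one fragment, taking the union of the fragments gives back the original relation, which is what lets the DBMS present R to the user as a single table.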
What is special about fragmentation?
Why does redundancy improve resilience?
Why does redundancy improve efficiency?
Example: we have a query about suppliers. If other sites keep copies of the data about the suppliers, we may be able to execute the query faster by getting different parts of it from different sites.
- This also allows stores to answer queries involving suppliers without establishing a connection to the central office.
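The supplier example can be sketched as follows, under the assumption that a store holds a local copy of the supplier data. The `Site` class, the `lookup_supplier` function and the supplier values are hypothetical, and the sketch ignores how replicas are kept consistent when data is updated.

```python
class Site:
    """A site that may hold a (possibly empty) copy of the supplier data."""

    def __init__(self, name, suppliers=None):
        self.name = name
        self.suppliers = dict(suppliers or {})


# Made-up supplier data held at the central office.
central = Site("central office", {"S1": "Acme", "S2": "Bolt Co"})

# A store that keeps a replica of the supplier data.
store = Site("store", central.suppliers)


def lookup_supplier(site, supplier_id, fallback):
    """Return (supplier name, name of the site that answered).

    The local copy is used when it has the row; otherwise the query
    goes to the fallback site (here, the central office).
    """
    if supplier_id in site.suppliers:
        return site.suppliers[supplier_id], site.name
    return fallback.suppliers[supplier_id], fallback.name
```

With the replica in place, `lookup_supplier(store, "S1", central)` is answered by the store itself; a site created without a copy would have to fall back to the central office for the same query.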
How is replication used to provide transparency?
What is full replication?
Where a copy of the whole database is stored at every site.