Distributed systems Flashcards

(43 cards)

1
Q

What is a distributed system?

A
  • A collection of nodes that appear to users as a single coherent system
  • Nodes can be hardware devices or software processes
2
Q

What are 4 benefits of a distributed system?

A
  • Resource sharing
  • Performance improvement (parallel execution)
  • Scalability
  • Reliability and availability (redundancy improves fault tolerance)
3
Q

Fallacies:

How is “the network is reliable” a false assumption?

A
  • The network can have hardware failures or software bugs
4
Q

Fallacies:

How is “the network is secure” a false assumption?

A
  • Networks are vulnerable, so security must be built into applications
5
Q

Fallacies:

How is “the network is homogeneous” a false assumption?

A
  • Systems contain different OSs, hardware and protocols
  • Require interoperability
6
Q

Fallacies:

How is “the network topology does not change” a false assumption?

A
  • Devices frequently join and leave
  • Use abstractions like DNS names instead of fixed IP addresses
7
Q

Fallacies:

How is “latency is zero” a false assumption?

A
  • Communication delay is unavoidable
8
Q

Fallacies:

How is “bandwidth is infinite” a false assumption?

A
  • Transfer capacity is limited
  • Data compression and traffic management are required
9
Q

Fallacies:

How is “transport cost is zero” a false assumption?

A
  • Data serialisation requires time and infrastructure
  • Network usage costs money
10
Q

Fallacies:

How is “there is one administrator” a false assumption?

A
  • Multiple organisations manage different networks, so coordination is needed
  • This means that compatibility issues may arise
11
Q

How does communication take place in a distributed system?

A
  • Message passing: nodes send messages rather than sharing memory
  • Marshalling: transforms object memory into a transferable format
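
As a rough illustration of marshalling, the sketch below turns an in-memory object into bytes and back. It assumes JSON as the transferable format (the card does not name one), and the `Reading` class and its fields are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Reading:
    # Hypothetical in-memory object that a node wants to send.
    sensor: str
    value: float


def marshal(reading: Reading) -> bytes:
    # Turn the in-memory object into bytes that can cross the network.
    return json.dumps(asdict(reading)).encode("utf-8")


def unmarshal(payload: bytes) -> Reading:
    # Rebuild an equivalent object on the receiving node.
    return Reading(**json.loads(payload.decode("utf-8")))


message = marshal(Reading(sensor="t1", value=21.5))
received = unmarshal(message)
```

The receiver ends up with its own copy of the object; no memory is shared between the two nodes.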
12
Q

What is the middleware in a distributed system?

A
  • The middle software layer above the OS that provides support and security
  • Provides a uniform interface to applications
13
Q

What is transparency in a distributed system and name 7 types?

A
  • Transparency hides complexity from users and applications
  • Types: access, location, relocation, migration, replication, concurrency, failure
14
Q

Explain the difference between data and process synchronisation

A
  • Data: keeping multiple copies of data coherent
  • Process: coordinating multiple processes so that they act together and communicate quickly and reliably
15
Q

What is Lamport’s logical clock?

A
  • When an event occurs, the logical clock is incremented by 1
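  • The logical clock value is sent to the receiver, and the receiver sets its clock to one step ahead of the larger of its own value and the received value, to preserve partial ordering
  • Because the ordering is only partial, timestamps alone cannot tell us the true order of 2 events that are not causally related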
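
The clock rules can be sketched in a few lines. This is a minimal single-process illustration, assuming the usual receive rule of `max(local, received) + 1`; the class and method names are hypothetical.

```python
class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event: advance the clock by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Jump past both the local clock and the message timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()            # a: 1
stamp = a.send()    # a: 2, message carries 2
b.receive(stamp)    # b: max(0, 2) + 1 = 3
```

The receive always gets a larger timestamp than the matching send, which is what keeps the send ordered before the receive.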
16
Q

What are central lock servers?

A
  • Marks a resource as being used, and grants a mutex lock
  • If not available, the mutex request is put into a queue

However, the central lock server is a single point of failure
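
A minimal in-memory sketch of the grant-or-queue behaviour, assuming a single resource and string client names. `LockServer` is a hypothetical class, and a real server would receive these calls as network messages.

```python
from collections import deque
from typing import Optional


class LockServer:
    def __init__(self) -> None:
        self.owner: Optional[str] = None
        self.waiting: deque = deque()

    def acquire(self, client: str) -> bool:
        # Grant the lock if the resource is free, otherwise queue the request.
        if self.owner is None:
            self.owner = client
            return True
        self.waiting.append(client)
        return False

    def release(self, client: str) -> Optional[str]:
        # Only the owner may release; the next queued client becomes owner.
        if client != self.owner:
            raise ValueError(f"{client} does not hold the lock")
        self.owner = self.waiting.popleft() if self.waiting else None
        return self.owner


server = LockServer()
granted_a = server.acquire("A")   # resource was free
granted_b = server.acquire("B")   # B is queued behind A
next_owner = server.release("A")  # lock passes to the head of the queue
```

All state lives in one object, which is the single point of failure: if the server is lost, so are the owner and the queue.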

17
Q

What is a two-phase commit algorithm and give an example?

A
  • Ensures a sequence of events is either completed or returned to its original state
  • A coordinator node (which can be chosen by an election algorithm such as the Bully algorithm) requests a transaction
  • The coordinator sends a vote request, and all participating nodes must vote whether to commit or abort
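
The voting step can be sketched as below. This is a simplified illustration, assuming each participant is modelled as a hypothetical callable that returns its vote; timeouts, logging and crash recovery, which a real two-phase commit needs, are left out.

```python
from typing import Callable, Dict


def two_phase_commit(participants: Dict[str, Callable[[], bool]]) -> str:
    # Phase 1 (voting): ask every participant whether it can commit.
    votes = {name: can_commit() for name, can_commit in participants.items()}
    # Phase 2 (decision): commit only on a unanimous yes, otherwise abort.
    return "commit" if all(votes.values()) else "abort"


all_yes = two_phase_commit({"db1": lambda: True, "db2": lambda: True})
one_no = two_phase_commit({"db1": lambda: True, "db2": lambda: False})
```

A single "no" vote is enough to abort, which is how the all-or-nothing guarantee is kept.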
18
Q

Explain the two clock synchronisation algorithms

A
  • Cristian's algorithm: measures the Round Trip Time (RTT) to the time server, halves it, and adds it to the time server's time
  • Berkeley algorithm: server is master, polls the clocks of the other nodes (allowing for RTT), averages them, and sends out the amount each slave must adjust its clock by
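
Both calculations are short enough to sketch. The function names and sample values are hypothetical, and the Berkeley part assumes the clock readings have already been compensated for RTT.

```python
def cristian_estimate(t_sent: float, t_received: float, server_time: float) -> float:
    # Round-trip time measured on the client's own clock.
    rtt = t_received - t_sent
    # Assume the reply took half the round trip to arrive.
    return server_time + rtt / 2


def berkeley_adjustments(clocks: dict) -> dict:
    # The master averages all reported clock values (its own included)
    # and tells each node how far to move towards that average.
    average = sum(clocks.values()) / len(clocks)
    return {node: average - value for node, value in clocks.items()}


estimate = cristian_estimate(t_sent=10.0, t_received=10.5, server_time=100.0)
offsets = berkeley_adjustments({"master": 180.0, "n1": 185.0, "n2": 175.0})
```

Here the RTT is 0.5, so the client estimates 100.25. In the Berkeley example the average is 180.0, so `n1` moves back by 5 and `n2` moves forward by 5.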
19
Q

What are the 4 conditions of deadlock?

A
  • Mutual exclusion: at least one resource is non-shareable
  • Hold and wait: a process is holding a resource while waiting for another
  • No pre-emption: nothing can force a process to relinquish a resource
  • Circular wait: a cycle of processes exists in which each is waiting for a resource held by the next
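
The circular-wait condition can be checked on a wait-for graph. The sketch below assumes, for simplicity, that each process waits for at most one other process; the function name and process labels are hypothetical.

```python
from typing import Dict


def has_circular_wait(waits_for: Dict[str, str]) -> bool:
    # waits_for maps a process to the process holding the resource it needs.
    for start in waits_for:
        seen = set()
        current = start
        # Follow the chain of waits until it ends or revisits a process.
        while current in waits_for:
            if current in seen:
                return True
            seen.add(current)
            current = waits_for[current]
    return False


deadlocked = has_circular_wait({"P1": "P2", "P2": "P3", "P3": "P1"})
safe = has_circular_wait({"P1": "P2", "P2": "P3"})
```

In the second case `P3` waits for nothing, so the chain ends and no cycle is found.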
20
Q

What is name resolution?

A
  • How a process can access a named entity using its name
  • Supported by a naming system
21
Q

Explain the 3 naming systems

A
  • Centralised naming: single point of contact that allocates unique names; a single point of failure and not scalable
  • Free-for-all naming: allows each object to make up its own name; names are not guaranteed to be unique
  • Delegating responsibility: smaller parts of the system allocate names based on rules; balances uniqueness against scalability and failure tolerance
23
Q

Give 3 delegated naming systems

A
  • MAC addresses (48-bit): unique identifier for each device; does not say where the device is on the network
  • IP addresses (32-bit in IPv4): contain information about where a device is on a network
  • DNS: distributed system on top of the internet which associates names with IP addresses; uses real-time allocation with top of hierarchy dealing with user requests
24
Q

What is a protocol and give 2?

A
  • Protocols are sets of rules governing how objects interact
  • HTTP and SMTP
25
Explain HTTP
- **Hypertext Transfer Protocol**
- Allows clients to **request resources** from stateless web servers
- Stateless means that once a request is fulfilled, the server forgets the client
- Allows the system to treat each request as an independent transaction
26
Explain SMTP
- **Simple Mail Transfer Protocol**
- Connection-based protocol that **sends emails** between clients and servers
- Ensures emails are in exactly one place at any one time
27
How does the OS manage resource usage?
- **Assigns a unique identity** to each process and grants access by **assigning an address space**
- Ensures no other process tampers with those locations
28
What is multi-tasking?
- Allows multiple processes to be underway by **controlling resources**
- Uses a scheduling policy so that processes take turns; they are **not really running at the same time**
29
What is multi-processing and how does it differ in forking and threading?
- Multi-processing allows 2 processes to be active concurrently
- **Forking** **copies** a process and gives the copies distinct address spaces: **safe but expensive due to copying**
- **Threading** **shares** the address space rather than copying it, which allows the threads to interact: **less expensive and less safe**
30
Explain the difference between concurrent, parallel and distributed?
- **Concurrent**: one processor with multiple threads
- **Parallel**: many interconnected processors
- **Distributed**: many independent machines that exchange messages
31
What is a centralised architecture?
- Traditional client-server, where the server implements the software
- **Simple**: only has **clients and a server** with request-reply behaviour
- **Multi-tiered**: has a **UI layer, processing layer and data layer**, which are separated between the client machine and the server machine; all functionality is handled by the server
## Footnote
Called **vertical distribution**
32
What is a decentralised architecture?
- **Peer-to-peer architectures** where all nodes are equal
- Distributes **client-server functionality evenly**, giving better workload balance
## Footnote
Called horizontal distribution
33
What are overlay networks?
- Implementation of decentralised architecture
- **Symmetrical network** where each node communicates through other nodes
- **Structured**: nodes organised in a topology to efficiently look up data using hashing
- **Unstructured**: each node keeps an ad hoc list of neighbours, which changes continuously, so lookups require searching
34
Explain 2 searching methods in overlay networks
- **Flooding**: nodes pass the request to all neighbours until the data is found; **expensive**, so a **time-to-live value** is needed to cap the number of forwards
- **Random walks**: the request is passed to one randomly chosen neighbour at a time; **less traffic** but more time, so n walks can be started simultaneously
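
Flooding with a time-to-live can be sketched on a small graph. The overlay below is hypothetical, and the sketch simplifies the search to finding a target node rather than a data item.

```python
from typing import Dict, List, Set


def flood(graph: Dict[str, List[str]], start: str, target: str, ttl: int) -> bool:
    # Breadth-first forwarding: each round, every frontier node passes the
    # request to all of its neighbours; ttl caps the number of forwards.
    frontier: Set[str] = {start}
    visited: Set[str] = {start}
    for _ in range(ttl + 1):
        if target in frontier:
            return True
        frontier = {n for node in frontier for n in graph[node]} - visited
        visited |= frontier
    return False


overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B", "E"], "E": ["D"]}
found_near = flood(overlay, "A", "D", ttl=2)   # D is 2 hops from A
found_far = flood(overlay, "A", "E", ttl=2)    # E is 3 hops from A
```

With a time-to-live of 2 the request reaches `D` but expires before reaching `E`, which shows the trade-off between traffic and coverage.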
35
What is a hybrid architecture and give 2 types?
- **Combines** centralised and decentralised architectures
- **Collaborative distributed systems**: centralised when a node joins, decentralised once it has joined
- **Edge-server systems**: deployed on the internet, with servers placed at the edge of the network that serve all content
36
Give and explain 4 architecture styles
- **Layered**: bottom layers provide services to top layers, with requests flowing from the top
- **Object-based**: objects correspond to components and communicate by invoking each other's methods (Remote Procedure Calls, RPCs)
- **Resource-centered**: the system is a collection of resources that are managed by components, and primary communication happens in the data center
- **Event-based**: processes publish notifications and other processes subscribe to them
37
Explain the 2 characteristics of processes in event-based architecture style
- **Referentially decoupled**: one process does not explicitly know any other processes
- **Temporally coupled**: processes must be running at the same time to communicate
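
Both characteristics show up in a small publish-subscribe sketch. `EventBus` is a hypothetical in-process stand-in for the middleware, assuming notifications are delivered only to handlers subscribed at the moment of publishing.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        # The publisher names a topic, never a specific receiver.
        handlers = list(self.subscribers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
missed = bus.publish("alerts", "disk full")   # nobody is listening yet
inbox: List[str] = []
bus.subscribe("alerts", inbox.append)
delivered = bus.publish("alerts", "cpu hot")
```

The publisher never refers to a subscriber (referential decoupling), and the first notification is lost because no subscriber was active when it was published (temporal coupling).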
38
What is interprocess communication?
- **Communication mechanism** which allows processes on different machines to exchange information
- Enables data transfer and coordination
## Footnote
Forms the basis of Remote Procedure Call and middleware
39
Explain Remote Procedure Call
- Hides message-passing by allowing processes to call procedures remotely
- **Client stub marshals** parameters into a message, **server stub unmarshals** and executes
## Footnote
Improves communication transparency, making it ideal for client-server architectures
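
The stub roles can be sketched without a real network. This is an illustration only, assuming JSON as the message format and a hypothetical `add` procedure; the direct function call stands in for the network hop.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical procedures exposed by the server.
PROCEDURES: Dict[str, Callable[..., Any]] = {"add": lambda a, b: a + b}


def server_stub(request: bytes) -> bytes:
    # Unmarshal the request, run the procedure, marshal the result.
    call = json.loads(request.decode("utf-8"))
    result = PROCEDURES[call["proc"]](*call["args"])
    return json.dumps({"result": result}).encode("utf-8")


def client_stub(proc: str, *args: Any) -> Any:
    # Marshal the call into a message; the direct call to server_stub
    # stands in for sending it over the network.
    request = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply = server_stub(request)
    return json.loads(reply.decode("utf-8"))["result"]


total = client_stub("add", 2, 3)
```

To the caller, `client_stub("add", 2, 3)` looks like an ordinary local call; the message-passing is hidden inside the two stubs.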
40
What is parameter marshalling in Remote Procedure Call?
- Packing **parameters into a message** before transmission
- Transforms data into a **machine-independent** and **network-independent** format
- Ensures data is interpreted consistently
- Passes referenced objects by copying entire data structures
41
Explain asynchronous Remote Procedure Call
- Server **immediately sends a reply/acknowledgement** after receiving the request (before processing)
- Client continues execution without **waiting for the full result**
- Enables **non-blocking communication** between processes
## Footnote
Useful in systems where responsiveness is more important than immediate results
42
What is Message-Oriented Middleware?
- Middleware that enables communication through **message-queueing mechanisms**
- Supports **asynchronous** communication where the **sender and receiver are not simultaneously active**
- Messages are stored in queues and forwarded to their destination
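
A minimal sketch of the queueing idea, using Python's standard `queue.Queue` as a stand-in for the middleware's message store. The `send` and `receive_all` helpers and the message names are hypothetical.

```python
import queue

# The queue plays the role of the middleware's message store.
mailbox: queue.Queue = queue.Queue()


def send(message: str) -> None:
    # The sender only talks to the queue and returns immediately.
    mailbox.put(message)


def receive_all() -> list:
    # The receiver drains whatever was stored while it was inactive.
    messages = []
    while not mailbox.empty():
        messages.append(mailbox.get())
    return messages


send("order-1")
send("order-2")
delivered = receive_all()
```

Both messages are sent before the receiver runs, yet they arrive in order, because the queue holds them in the meantime.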
43