What is cloud computing?
Cloud computing is computing over the Internet: computing resources are used as a service on remote machines instead of locally.
What is the fat tree design?
Network design for data centers:
- Three-tier design: Edge, Aggregation, Core
- Defined by a single parameter k = number of ports on a switch
- All layers use the same k-port switch
- Supports k³/4 hosts
- High redundancy: k²/4 shortest paths between two hosts in different pods
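The numbers above can be checked with a small sketch. The host and path counts come from the notes; the pod and per-layer switch counts are filled in from the standard k-ary fat tree construction and are not part of the notes. The function name `fat_tree_summary` is made up for this example.

```python
def fat_tree_summary(k: int) -> dict:
    """Sizes of a k-ary fat tree built from identical k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return {
        # Assumption (standard construction): k pods, each with k/2 edge
        # and k/2 aggregation switches, plus (k/2)^2 core switches.
        "pods": k,
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half * half,
        # From the notes: k^3/4 hosts and k*k/4 paths.
        "hosts": k ** 3 // 4,
        "paths_between_pods": half * half,
    }


print(fat_tree_summary(4))
```

For k = 4 this gives 16 hosts and 4 paths; for k = 48 it gives 27648 hosts and 576 paths.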
What is the jellyfish network design?
Drop the fixed network structure and connect switches at random:
- Each switch with 4L ports connects to
– L hosts
– 3L other, randomly chosen switches
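A minimal sketch of the random wiring, assuming a simplified procedure: keep adding a random link between two switches that both still have a free port and are not yet connected, and stop when no such pair is left. A few ports may stay unused; handling those leftovers is omitted here. The function name `build_jellyfish` and its parameters are made up for this example.

```python
import random


def build_jellyfish(num_switches: int, L: int, seed: int = 0):
    """Return (hosts_per_switch, links) for a Jellyfish-style random topology."""
    rng = random.Random(seed)
    # Each 4L-port switch reserves L ports for hosts and 3L for other switches.
    free = {s: 3 * L for s in range(num_switches)}
    links = set()
    while True:
        # Pairs that can still be linked: free port on both ends, not yet connected.
        candidates = [
            (a, b)
            for a in free
            for b in free
            if a < b and free[a] > 0 and free[b] > 0 and (a, b) not in links
        ]
        if not candidates:
            break
        a, b = rng.choice(candidates)
        links.add((a, b))
        free[a] -= 1
        free[b] -= 1
    hosts = {s: L for s in range(num_switches)}
    return hosts, links


hosts, links = build_jellyfish(num_switches=10, L=1)
print(len(links), "switch-to-switch links")
```

Rebuilding the candidate list on every step is quadratic in the number of switches, which is fine for a sketch but not for a large topology.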
What is the CAP theorem?
In a distributed system you can satisfy at most two of the following three properties:
1. Consistency: all nodes see the same data at any time
2. Availability: the system accepts operations at all times
3. Partition-tolerance: the system continues to work in spite of network partitions
How does Cassandra handle the CAP theorem?
Weak consistency: Cassandra favors availability and partition-tolerance and settles for eventual consistency.
What are the characteristics of Cassandra?
How is data stored in Cassandra?
What are the replica policies in Cassandra?
How does a write operation in Cassandra work?
How do Bloom filters work and what are they used for in Cassandra?
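Bloom filter: a bit map plus a set of hash functions.
- Use the set of hash functions to create a fingerprint for a given key:
– for each hash function h: h(x) = y -> BIT[y] = 1
- To look up a key, check whether all of its bits are set
- Is used to check whether data might be present on a node
- Can produce false positives, but never false negatives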
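A minimal sketch of such a filter. The hash functions are an assumption for illustration: they are derived by salting SHA-256 with an index, which is not what Cassandra itself uses. The class name, the method names and the default sizes are made up for this example.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, key: str):
        # Assumption: simulate several hash functions by salting SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        # h(x) = y -> BIT[y] = 1, for every hash function h.
        for y in self._positions(key):
            self.bits[y] = 1

    def might_contain(self, key: str) -> bool:
        # True can be a false positive; False is always correct.
        return all(self.bits[y] for y in self._positions(key))


bf = BloomFilter()
bf.add("row-42")
print(bf.might_contain("row-42"))
```

A key that was added always yields True. A key that was never added usually yields False, but can yield True when other keys happen to have set all of its bits.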
How is a delete operation done in Cassandra?
How is a read operation done in Cassandra?
How is the potential speed-up of parallelization computed?
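Amdahl's law: S = 1 / ((1-p) + p/n)
- p = fraction of the program that can be parallelized
- n = number of processors
- For n -> infinity the speed-up is limited to 1 / (1-p)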
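A worked example of the formula, reading p as the parallelizable fraction of the program and n as the number of processors (the usual reading of this formula, known as Amdahl's law). The function name `speedup` is made up for this example.

```python
def speedup(p: float, n: int) -> float:
    """Speed-up S = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


# 90% parallel code on 10, 100 and 1000 processors.
for n in (10, 100, 1000):
    print(n, round(speedup(0.9, n), 2))
```

With p = 0.9 the speed-up on 10 processors is 1 / 0.19, about 5.26, and it can never exceed 1 / (1 - 0.9) = 10 no matter how many processors are added.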
Describe the two methods of parallelization in cloud computing
Request Level Parallelism (RLP):
- Concurrent processing of multiple requests: e.g. Google
– Distribute indexing, images, documents, ads, … to multiple nodes
Data Level Parallelism (DLP):
- Concurrent processing of multiple data items: e.g. MapReduce
– Distribute data across map and reduce nodes
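A toy illustration of data level parallelism: the input is split into chunks, each chunk is counted independently, and the partial results are merged. This is a single-machine sketch in which threads stand in for map nodes; it is not how a MapReduce cluster is implemented. All function names are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk: str) -> Counter:
    # "Map" step: count the words of one chunk, independent of all others.
    return Counter(chunk.split())


def reduce_counts(partials) -> Counter:
    # "Reduce" step: merge the partial counts into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(chunks) -> Counter:
    # Threads stand in for map nodes here (an assumption for illustration).
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


print(word_count(["a b a", "b c"]))
```

Because every chunk is processed without looking at the others, the map step can be spread over as many workers as there are chunks.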
Explain the main principle of MapReduce
What does the architecture of MapReduce look like?
Master-Worker architecture:
Master = Job Tracker (JT), Worker = Task Tracker (TT)
- TT pulls map or reduce tasks from JT
- TT periodically sends heartbeat to JT
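The pull and heartbeat interaction can be sketched as follows. This is a toy single-process model: the class and method names are made up for this example and are not Hadoop's API, and the Job Tracker here only records when it last heard from each Task Tracker.

```python
from collections import deque


class JobTracker:
    def __init__(self, tasks):
        self.pending = deque(tasks)   # map and reduce tasks not yet handed out
        self.last_heartbeat = {}      # tracker id -> time of last heartbeat

    def request_task(self, tracker_id):
        # Called by a Task Tracker: hand out the next task, if any.
        return self.pending.popleft() if self.pending else None

    def heartbeat(self, tracker_id, now):
        self.last_heartbeat[tracker_id] = now


class TaskTracker:
    def __init__(self, tracker_id, job_tracker):
        self.tracker_id = tracker_id
        self.job_tracker = job_tracker
        self.done = []

    def step(self, now):
        # One round: send a heartbeat, then pull a task from the Job Tracker.
        self.job_tracker.heartbeat(self.tracker_id, now)
        task = self.job_tracker.request_task(self.tracker_id)
        if task is not None:
            self.done.append(task)    # stand-in for actually running the task
        return task


jt = JobTracker(["map-0", "map-1", "reduce-0"])
tt1, tt2 = TaskTracker("tt1", jt), TaskTracker("tt2", jt)
for t in range(2):
    tt1.step(now=float(t))
    tt2.step(now=float(t))
print(tt1.done, tt2.done, jt.last_heartbeat)
```

The Job Tracker never pushes work: a Task Tracker only gets a task when it asks for one, so a slow or busy worker simply asks less often.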
How is fault tolerance implemented in MapReduce?