The Web and Caching
Protocol: Hypertext Transfer Protocol (HTTP)
Server is stateless
HTTP Requests
Request line
Optional headers
HTTP Response
Status line
Early HTTP
v0.9/1.0
One request/response per TCP Connection
Advantage: simple to implement
Disadvantage: a new TCP connection for every request adds overhead and slows transfers
Solution to increase efficiency: persistent connections
Persistent Connections
Multiple HTTP requests/responses are multiplexed on a single TCP connection
Pipelining
Client sends the next request as soon as it encounters a referenced object
So fetching all referenced objects can begin after as little as one RTT
Persistent connections with pipelining was default in HTTP 1.1
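A minimal sketch of pipelining over one persistent connection (host, port, and paths are placeholder values): all requests are written back-to-back in a single batch, and the responses arrive in order on the same TCP connection.

```python
import socket

def build_pipelined_requests(host, paths):
    """Concatenate HTTP/1.1 GET requests so they can be sent back-to-back."""
    return b"".join(
        f"GET {p} HTTP/1.1\r\nHost: {host}\r\nConnection: keep-alive\r\n\r\n".encode()
        for p in paths
    )

def fetch_pipelined(host, paths, port=80):
    """One TCP connection; all requests written at once, responses read in order."""
    with socket.create_connection((host, port)) as s:
        s.sendall(build_pipelined_requests(host, paths))  # one write, no per-request RTT
        s.settimeout(2)
        chunks = []
        try:
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
        except socket.timeout:
            pass
    return b"".join(chunks)
```

Note that HTTP/1.1 servers must return pipelined responses in request order; in practice many deployments avoid pipelining, which is one motivation for multiplexing in HTTP/2.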
Caching
Improves performance
Can occur in multiple places
Caches expire to ensure clients see the most recent version of a page
Ways to redirect clients to caches:
TCP throughput is inversely proportional to RTT, so placing caches closer to clients (lower RTT) improves throughput
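This is the standard TCP throughput approximation (Mathis et al.): throughput is bounded by roughly MSS / (RTT · √p) for loss rate p. A quick illustration with made-up numbers:

```python
from math import sqrt

def tcp_throughput_bound(mss_bytes, rtt_s, loss_rate):
    """Mathis approximation: throughput <= MSS / (RTT * sqrt(p)), in bytes/sec."""
    return mss_bytes / (rtt_s * sqrt(loss_rate))

# Halving the RTT (e.g., by serving from a nearby cache) doubles the bound:
far  = tcp_throughput_bound(1460, 0.100, 1e-4)  # 100 ms RTT to origin
near = tcp_throughput_bound(1460, 0.050, 1e-4)  #  50 ms RTT to a cache
```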
Benefits of caching
Reduced transit costs for ISP
Improved performance for clients
Web Content Distribution Networks (CDN) defined
Overlay network of web caches designed to deliver content to a client from optimal location
-Optimal can be geographically closest or something else.
CDNs often have geographically disparate servers
Purpose to place caches as close to users as possible
Owned by content providers (Google), networks (AT&T), or operated independently (Akamai)
Non-network CDNs typically place servers in other autonomous systems or ISPs
The number of cache nodes varies.
How server selection works in a CDN
Criteria for choosing a server (e.g., least-loaded server, lowest latency, geographic proximity)
Content Routing: How clients get redirected to the right server in CDN
How to direct clients
1-Routing systems (e.g., anycast): give all replicas the same IP address, then let routing carry each client to the closest replica
–simple, but operators have very little control since they are at the whims of Internet routing (simple but coarse)
2-Application-based (e.g., HTTP redirect): requires the client to first contact the origin server to get the redirect, increasing latency (simple but adds delay)
3-Naming system (e.g., DNS): the client looks up a domain name and the response contains the IP address of a nearby cache. Gives significant flexibility in directing different clients to different server replicas (fine-grained control and fast)
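A toy sketch of the DNS approach (replica IPs and coordinates are invented for illustration): the authoritative DNS server answers with the replica closest to the querying client.

```python
# Hypothetical replica set: IP -> (latitude, longitude)
REPLICAS = {
    "198.51.100.10": (37.77, -122.42),  # US-west
    "203.0.113.20":  (52.52,   13.40),  # Europe
}

def resolve(client_coord):
    """Return the replica IP closest to the client (squared Euclidean distance
    on lat/lon -- a toy proximity metric; real CDNs use network measurements)."""
    def dist(ip):
        lat, lon = REPLICAS[ip]
        return (lat - client_coord[0]) ** 2 + (lon - client_coord[1]) ** 2
    return min(REPLICAS, key=dist)
```

In practice the CDN's DNS server sees the client's resolver address (or the EDNS client subnet), not the client itself, and proximity is judged by latency and load rather than raw geography.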
CDNs and ISPs
CDNs and ISPs have symbiotic relationship when it comes to peering relationships
CDNs like peering directly with the ISPs where their customers are located because:
ISPs like peering with CDNs (or host caches locally) because:
BitTorrent
Peer-to-Peer CDN used for file sharing and distribution of large files
Fetch content from peers rather than origin
Break the original file into many pieces and replicate different pieces on different peers as soon as possible; each peer assembles the file by picking up different pieces and getting the remaining pieces from other peers, eventually assembling the entire file
BitTorrent Publishing
Leechers: clients with incomplete copies of the file.
Trackers: allow peers to find each other; a tracker returns a random list of peers that leechers can use to swap parts of the file
Freeloading (freeriding): a client leaves the network as soon as it finishes downloading the file
Solution to freeriding
BitTorrent addresses freeriding with choking
Choking (tit-for-tat): temporary refusal to upload chunks to another peer
-if you can’t download from a peer, don’t upload to it
(mitigates the freerider problem)
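A rough sketch of the tit-for-tat unchoke decision (peer names and rates are made up): a peer uploads only to the neighbors that have recently given it the best download rates.

```python
def choose_unchoked(upload_rates, k=4):
    """Return the k peers with the highest recent upload rate to us.
    These get our regular unchoke slots; everyone else stays choked."""
    return sorted(upload_rates, key=upload_rates.get, reverse=True)[:k]

# Hypothetical per-peer download rates observed over the last interval (KB/s):
rates = {"peer_a": 120, "peer_b": 5, "peer_c": 300, "peer_d": 0, "peer_e": 80}
unchoked = choose_unchoked(rates, k=3)  # peers we will upload to
```

Real BitTorrent additionally reserves one "optimistic unchoke" slot for a randomly chosen peer, so newcomers with nothing to trade yet can bootstrap.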
BitTorrent: getting chunks to swap
If all clients received the same chunks, no one would have a complete copy and clients wouldn’t swap
Rarest piece first: the client determines which pieces are rarest among its peers and downloads those first
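A minimal sketch of rarest-first selection (peer bitfields are made up): count how many peers hold each piece we still need, and request the least-replicated one.

```python
from collections import Counter

def rarest_first(needed, peer_bitfields):
    """needed: set of piece indices we lack.
    peer_bitfields: peer -> set of piece indices that peer has.
    Returns the needed piece held by the fewest peers (None if unavailable)."""
    counts = Counter()
    for pieces in peer_bitfields.values():
        for p in pieces & needed:
            counts[p] += 1
    return min(counts, key=counts.get) if counts else None

peers = {"p1": {0, 1, 2}, "p2": {1, 2}, "p3": {2}}
nxt = rarest_first({0, 1, 2}, peers)  # piece 0 is held by only one peer
```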
Distributed Hash Tables
Enable a form of content overlay called a structured overlay
Chord
Distributed hash table
Scalable, distributed lookup service (maps keys to values)
Scalable
Provable correctness
Reasonably good performance
Main motivation is scalable location of data in a large distributed system
Key problem is lookup: the hash table isn’t located in one place but is distributed across the network (hence Distributed Hash Table, DHT)
Consistent Hashing
keys and nodes map to same ID space
Create a metric space (like a ring). Nodes and keys map into the same ID space: a consistent hash function assigns every node and key an identifier in this space. Node IDs are the hash of the node’s IP address; key IDs are the hash of the key.
In Chord, a key is stored at its successor, the node with the next-highest ID.
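A minimal sketch of this mapping (toy 16-bit ID space for readability; Chord itself uses SHA-1’s full 160 bits):

```python
import hashlib

M = 2 ** 16  # toy ID space; Chord uses 2^160 with SHA-1

def ident(name):
    """Consistent hash: map a node address or key name into the ID space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % M

def successor(key_id, node_ids):
    """First node ID at or after key_id on the ring, wrapping around."""
    for n in sorted(node_ids):
        if n >= key_id:
            return n
    return min(node_ids)  # wrap past the top of the ring

# Made-up node addresses; each node's ID is the hash of its IP
nodes = [ident(f"10.0.0.{i}") for i in range(1, 6)]
home = successor(ident("some-file.txt"), nodes)  # node responsible for the key
```

Because only the keys between a node and its predecessor move when a node joins or leaves, consistent hashing disturbs O(K/N) keys per membership change.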
Consistent hashing provides:
Implementing consistent hashing
Option 1: every node knows the location of every other node (fast lookup, but lots of state per node)
Option 2: each node only knows the location of its immediate successor (minimal state, but lookups take O(N) hops)
Option 3: finger tables: every node knows the location of about log N other nodes, and the distance of the nodes it knows increases exponentially
-finger i points to the successor of n + 2^i
-to look up an ID, find its predecessor and ask for that ID’s successor
finger tables include a mapping to the predecessor too; the query moves around the ring looking for the node whose successor’s ID is bigger than the ID of the data
-lookups require O(log n) hops
-O(log n) messages per lookup
-finger table size is O(log n) state per node
-when a node joins the network: initialize the new node’s fingers, update the fingers of existing nodes, and transfer keys from the successor to the new node
-when a node leaves: nodes keep track of their successor’s fingers, so a predecessor can reach the nodes corresponding to the failed/departed node’s fingers
Finger tables
(from wiki)
To avoid the linear search above, Chord implements a faster search method by requiring each node to keep a finger table containing up to m entries; recall that m is the number of bits in the hash key.
The i-th entry of node n will contain:
successor((n + 2^(i-1)) mod 2^m)
The first entry of finger table is actually the node’s immediate successor (and therefore an extra successor field is not needed). Every time a node wants to look up a key k, it will pass the query to the closest successor or predecessor (depending on the finger table) of k in its finger table (the “largest” one on the circle whose ID is smaller than k), until a node finds out the key is stored in its immediate successor.
With such a finger table, the number of nodes that must be contacted to find a successor in an N-node network is O(log N).
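Putting the pieces together, a toy lookup over finger tables might look like this (m = 6, so IDs live in [0, 64); the node IDs are chosen for illustration, and all nodes are known up front so the fingers can be built directly, but the lookup itself only follows fingers as described above):

```python
m = 6
SPACE = 2 ** m

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b], with wrap-around."""
    return a < x <= b if a < b else (x > a or x <= b)

def successor(i, nodes):
    """First node at or after position i on the ring."""
    cands = [n for n in sorted(nodes) if n >= i % SPACE]
    return cands[0] if cands else min(nodes)

def build_fingers(n, nodes):
    """finger[i] = successor(n + 2^i); finger distances grow exponentially."""
    return [successor(n + 2 ** i, nodes) for i in range(m)]

def lookup(start, key, fingers):
    """Route toward key by repeatedly jumping to the closest preceding finger."""
    n, hops = start, 0
    while not in_interval(key, n, fingers[n][0]):  # key past n's successor?
        nxt = n
        for f in reversed(fingers[n]):             # try the furthest finger first
            if in_interval(f, n, key):             # finger precedes the key
                nxt = f
                break
        if nxt == n:                               # no usable finger; stop
            break
        n, hops = nxt, hops + 1
    return fingers[n][0], hops                     # node storing key, hop count

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
fingers = {n: build_fingers(n, nodes) for n in nodes}
owner, hops = lookup(8, 54, fingers)  # resolves to node 56 in a few hops
```

Each jump at least halves the remaining ring distance to the key, which is where the O(log N) hop bound comes from.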