Load Balancer
a) A load balancer is a system component that distributes incoming network traffic across multiple servers so that no single server gets overwhelmed. This prevents any one server from becoming a single point of failure.
b) It improves availability, scalability, and responsiveness.
How does it work
a) LB has its own IP
b) LB typically:
1) Maintains a list of backend servers and their IP addresses or DNS names.
2) Regularly checks their health using health checks (like HTTP pings, TCP connects), usually every 5-30 seconds.
3) Keeps metadata, typically in an internal configuration and runtime memory structure (often just in RAM), containing:
The list of registered backend servers (their IP addresses or DNS names + ports).
Their health status (healthy/unhealthy).
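The registry described above can be sketched in a few lines. This is a minimal illustration, not any real LB's data structure; the class name, backend addresses, and method names are all hypothetical:

```python
import time

# Hypothetical in-memory registry a load balancer might keep (a sketch).
class BackendRegistry:
    def __init__(self, backends):
        # backend address -> metadata: health status and last-check timestamp
        self.state = {b: {"healthy": True, "last_check": 0.0} for b in backends}

    def record_health_check(self, backend, ok):
        """Update a backend's status after a health-check probe."""
        self.state[backend]["healthy"] = ok
        self.state[backend]["last_check"] = time.time()

    def healthy_backends(self):
        """Only healthy servers are eligible to receive traffic."""
        return [b for b, meta in self.state.items() if meta["healthy"]]

registry = BackendRegistry(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
registry.record_health_check("10.0.0.2:80", ok=False)  # failed an HTTP ping
print(registry.healthy_backends())  # the failed server drops out of the pool
```

A real LB would run the probes on a timer (every 5-30 seconds, as noted above) and often requires several consecutive failures before marking a server unhealthy.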
Where is it usually present?
a)Between the user and the web server
b) Between web servers and an internal platform layer, like application servers or cache servers
c) Between internal platform layer and database.
Flow: Client -> LB -> Web Servers -> Internal LB -> App Servers -> DB LB -> DB Servers
Describe the full flow
1️⃣ Client requests website:
Enters www.xyz.com in the browser.
2️⃣ DNS resolution:
Browser (or OS) performs a DNS lookup.
DNS returns the public IP address of the external load balancer (or a CNAME that ultimately resolves to it).
3️⃣ Client connects to LB:
Browser opens a TCP connection to the external load balancer on port 80/443.
Sends the HTTP request for the page.
4️⃣ External Load Balancer distributes the request:
Receives the HTTP request and forwards it to one of the available web servers.
5️⃣ Web server handles request:
If it’s for static content (HTML, CSS, JS, images), the web server responds directly back to the client with an HTTP response.
6️⃣ If dynamic / business logic needed:
The web server forwards the HTTP request to an internal load balancer, which then routes it to an appropriate application server.
7️⃣ Application server processes:
Runs the business logic, may query the database via its own internal connections.
8️⃣ Response flows back:
Application server sends the HTTP response back to the web server.
The web server then sends the final HTTP response to the client via the existing TCP connection to the LB.
So does the response from the web servers go directly to the client, or via the external LB?
It depends on how the external load balancer is configured.
a) If the LB is a Layer 7 HTTP(S) proxy, the client opens a TCP connection to the LB and sends the request to it. The LB opens a new TCP connection to the web server and forwards the request.
The web server responds to the LB, which sends the HTTP response over the original TCP connection back to the client.
b) If the LB is a Layer 4 TCP forwarder:
The LB just forwards the TCP connection itself to the web server. In this case, the response is effectively sent directly back to the client.
How does the load balancer choose the backend server?
a) The LB should only forward requests to healthy servers. It runs health checks regularly; if a server fails a health check, it is automatically removed from the pool, and traffic is not forwarded to it until it passes health checks again.
b) Based on a pre-configured algorithm.
Different LB algorithms
a) Least Connection method: This method directs traffic to the server with the fewest active connections.
Good for: when there are a large number of persistent client connections which are unevenly distributed between the servers.
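The Least Connection method above reduces to picking the minimum of a counter map. A minimal sketch (the server names and connection counts are illustrative; a real LB increments/decrements these counters as connections open and close):

```python
# Least Connection selection: pick the server with the fewest active connections.
def least_connections(active_conns):
    """active_conns maps server name -> current number of open connections."""
    return min(active_conns, key=active_conns.get)

conns = {"server-a": 12, "server-b": 3, "server-c": 7}
print(least_connections(conns))  # server-b
```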
b) Least Response Time method: This method directs traffic to the server with the lowest average response time.
Good for: Systems where response times can vary widely, or some servers occasionally get slower due to CPU/memory pressure.
c) Least Bandwidth Method - This method selects the server that is currently serving the least amount of traffic measured in megabits per second (Mbps).
Good for: workloads where the amount of data sent is uneven, like streaming video or large file downloads.
d) Round Robin: This method simply rotates through the list of servers, sending each new request to the next server in the list; when it reaches the end of the list, it starts over at the beginning.
Good for: setups where the servers have equal specifications.
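Round Robin is just an endless cycle over the server list. A sketch (server names are illustrative):

```python
import itertools

# Round Robin: rotate through the servers, wrapping back to the start.
servers = ["server-1", "server-2", "server-3"]
rotation = itertools.cycle(servers)

picks = [next(rotation) for _ in range(5)]
print(picks)  # ['server-1', 'server-2', 'server-3', 'server-1', 'server-2']
```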
e) Weighted Round Robin: Designed to better handle servers with different specifications/processing capacities. Each server is assigned a weight (an integer value that indicates its processing capacity), and servers with higher weights get more traffic.
eg: Server A (weight 3), Server B (weight 1)
So Server A gets 3x the requests.
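The simplest way to realize those weights is to repeat each server in the rotation weight-many times. This naive sketch uses the example weights above; production implementations (e.g. NGINX) use a "smooth" variant that interleaves servers instead of sending bursts to one:

```python
# Naive Weighted Round Robin: expand each server into the rotation
# according to its weight.
def weighted_rotation(weights):
    """weights maps server name -> integer weight."""
    order = []
    for server, weight in weights.items():
        order.extend([server] * weight)
    return order

rotation = weighted_rotation({"server-a": 3, "server-b": 1})
print(rotation)  # ['server-a', 'server-a', 'server-a', 'server-b']
```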
f) IP hash: Uses the client’s IP address to determine which server gets the request.
✅ Good for: simple session stickiness (same client tends to hit the same server). (helpful if the server keeps temporary user session data in memory)
eg: Client IP: 203.0.113.25
hash(203.0.113.25) → 4
server index = (hash value) MOD (number of servers)
4 MOD 3 (let's say 3 servers) = 1, i.e. Server 2 (0-indexed). All requests from this IP will always go to Server 2.
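The hash-then-modulo scheme above can be sketched as follows. hashlib is used here because Python's built-in hash() is salted per process and would not be stable; the server names are illustrative:

```python
import hashlib

# IP hash: hash the client IP, then take it modulo the number of servers.
def pick_server(client_ip, servers):
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    index = int(digest, 16) % len(servers)
    return servers[index]

servers = ["server-1", "server-2", "server-3"]
# The same client IP always maps to the same server (session stickiness):
assert pick_server("203.0.113.25", servers) == pick_server("203.0.113.25", servers)
print(pick_server("203.0.113.25", servers))
```

Note one caveat: if the number of servers changes, the modulo result shifts for most IPs, so stickiness is lost; consistent hashing is the usual fix.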
Redundant LB
An LB can itself be a single point of failure. To overcome this, a second (passive) LB is connected to the first to form a cluster.
a) If the active LB fails, the passive one takes over the traffic.
b) They share the same virtual IP, and when the active LB fails, the passive one takes over the virtual IP and starts accepting traffic.
c) They monitor each other's health by sending heartbeat pings to each other every 1–2 seconds.
The monitoring is bi-directional:
If the passive stops sending heartbeats, the active notices that the passive may have failed; it might log alerts, escalate, or prepare for degraded operation. We are then in a degraded state: the system still serves traffic, but redundancy is compromised.
If the active stopped sending heartbeats, the passive detects it and takes over the virtual IP to keep traffic flowing.
Types of LB
ALB (Application Load Balancer) eg: NGINX, HAProxy, AWS ALB
NLB (Network Load Balancer) eg: NGINX (TCP/stream mode), HAProxy (TCP mode), AWS NLB
ALB: Layer 7 (Application layer)
NLB: Layer 4 (Transport layer)
ALB: Understands HTTP/HTTPS
NLB: Only sees TCP/UDP packets
ALB: Best for web apps, REST APIs, microservices
NLB: Best for databases, MQTT, IoT, Redis, gaming, non-HTTP protocols
ALB: Slightly higher latency (parses HTTP)
NLB: Ultra-low latency, very high scale
Why does ALB (Application Load Balancer) have higher latency?
Because it’s a Layer 7 load balancer, which means:
a) It doesn’t just pass the traffic blindly.
b) It actually looks inside the HTTP request (like URLs, headers, cookies, etc.)
c) It sends the request to the pool based on the pre-defined routing logic inside the LB's memory.
Host header: If Host is api.example.com -> Send to API servers
URL path: If path starts with /images -> Send to Image servers
Header: If header X-App-Version = v2 -> Send to v2 backend
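The routing rules above map naturally to an ordered chain of checks. This is a sketch, not ALB's actual rule engine; the hostname, path prefix, header, and pool names are the illustrative values from these notes:

```python
# Layer-7 routing sketch: first matching rule wins, like an ALB rule list.
def route(host, path, headers):
    if host == "api.example.com":
        return "api-servers"
    if path.startswith("/images"):
        return "image-servers"
    if headers.get("X-App-Version") == "v2":
        return "v2-backend"
    return "default-pool"

print(route("api.example.com", "/", {}))                       # api-servers
print(route("www.example.com", "/images/logo.png", {}))        # image-servers
print(route("www.example.com", "/", {"X-App-Version": "v2"}))  # v2-backend
```

Note that evaluating these rules requires terminating the connection and parsing the HTTP request, which is exactly why an ALB adds a bit of latency compared to an NLB.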
Compared to NLB:
It’s Layer 4, so it only checks IP and port.
It doesn’t look into the request.
It just forwards traffic — much faster, but less intelligent.