Load Balancing: Basic Algorithms and Methods

Workloads need to be assessed and addressed when a project is first being development. This is the best way to avoid dropped servers and ensuing losses, both reputational and material. Even though we can increase a server’s power and optimize its algorithms and code to accommodate growing loads, sooner or later, these methods just won’t cut it.

We’ll eventually have to turn to clustering, which is when servers are linked together and the workload is split between them. This is done using various methods known as balancing. In addition to solving high-load related issues, balancing also offers added redundancy.
It’s important to note that clustering is only as effective as the method of distributing (balancing) loads between cluster members.

Both hardware and software solutions can be used for load balancing. In this article, we’ll be reviewing the basic balancing methods and algorithms.

Balancing Levels

Balancing is done by performing a full series of algorithms and operations on the following OSI model layers:

network;
transport;
application.

Let’s take a closer look at these layers.

Balancing on the Network Layer

Balancing on the network layer can be used if we want several physical machines to respond to one IP address. There are a lot of different ways we can do this:

DNS balancing. Several IP addresses are shared under one domain. The server that receives a client’s request is usually defined by a round robin algorithm (we’ll cover balancing methods and algorithms in more detail below).
Creating an NLB cluster. With this method, servers are connected in a cluster made up of front-end and back-end nodes. The workload is distributed using a special algorithm. This is used in Microsoft solutions.
Balancing over IP using an additional router.
Geographic load balancing. This is done by placing identical services with identical addresses in geographically different regions (this is how Anycast DNS technology works). Geographic load balancing is used by many CDNs.

Balancing on the Transport Layer

This is the simplest option: the client communicates with a balancer, which forwards a request to one of the servers, which then processes it. The server that processes the request is chosen using different algorithms (which we’ll talk about below): cyclic scheduling, identifying the least loaded server from the pool, etc.
At times, it may be difficult to distinguish transport-layer balancing from network-layer balancing. Let’s look at the following rule for a PF network filter in BSD systems:

web_servers = "{ 10.0.0.10, 10.0.0.11, 10.0.0.13 }"
match in on $ext_if proto tcp to port 80 rdr-to $web_servers round-robin sticky-address

Here, we are balancing traffic on a specific TCP port.

Now let’s take a look at another example:

pass in on $int_if from $lan_net \
   route-to { ($ext_if1 $ext_gw1), ($ext_if2 $ext_gw2) }\
   round-robin

In this rule, outgoing traffic is balanced on the network layer. No specific port or protocol has been defined.

The differences between these balancing methods can be explained as follows: solutions that don’t terminate with the user session work on the network layer. They forward traffic and don’t work in proxy mode.
At the network layer, the balancer only decides which server packets will be sent to. The server executes the client session.

At the transport layer, the balancer acts as a proxy and isolates communication with the client. It serves as a middleman, transferring client information as additional data and in headings. This is how the HAProxy balancer works.

Balancing on the Application Layer

When balancing occurs at the application layer, the balancer works in smart-proxy mode. It analyzes client requests and forwards them to different servers depending on the properties of the requested content. This is how Nginx works, distributing requests to either the front-end or back-end. The Upstream module in Nginx is responsible for balancing. You can read more on the specifics of balancing in Nginx with different algorithms here.

Pgpool is another tool for balancing at the application layer–an abstraction layer between the client and PostgreSQL DBMS server. Depending on the content, this can be used to distribute requests by database servers. For example, read requests will be forwarded to one server, and write requests to another. For more information and details on pgpool, see the official documentation.

Balancing Algorithms and Methods

There are a lot of algorithms and methods for balancing workloads. Algorithms should be chosen based firstly on the specifics of the project and secondly on what exactly needs to be accomplished.

Balancing is used to achieve the following goals:

fairness: we have to ensure that every request is allocated the appropriate system resources and that there is never a situation where one request is processed while others are queued;
efficiency: all of the servers processing requests should be working at 100% capacity; we should avoid situations where a server idles while requests are queued for processing (we should say right off the bat that this in particular is far from always achieved);
reduced request execution time: we should ensure that requests are processed in the shortest time possible from start (or from queuing) to finish.
reduced response time: the user-request response time should be as short as possible.

The following properties are also recommended for balancing algorithms:

predictability: you need to clearly understand in which situations and under which loads the algorithm will efficiently perform tasks;
equal loads on system resources;
scalability: the algorithm should still be able to work under increased workloads.

Round Robin

A round robin algorithm is a cyclical scheduling function: the first request is sent to one server, the second to another, and so on. Once the last server has been sent a request, the cycle starts anew.

The most popular implementation of this algorithm is of course the round robin DNS balancing method. As you know, every DNS server saves “host name – IP address” pairs for every machine under a specific domain. This list may look like the following:

example.com	xxx.xxx.xxx.2
www.example.com	xxx.xxx.xxx.3

Every domain on the list may be associated with several IP addresses:

www.example.com	xxx.xxx.xxx.2
www.example.com	xxx.xxx.xxx.3
www.example.com	xxx.xxx.xxx.4
www.example.com	xxx.xxx.xxx.5
www.example.com	xxx.xxx.xxx.6

The DNS server scans all of these records and returns an IP address for every new request: xxx.xxx.xxx.2 to the first request, xxx.xx.xxx.3 to the second, and so on. This way, all of the servers in the cluster receive the same number of requests.

Out of all the obvious advantages of this algorithm, we should first mention that it works independent of a top-level protocol. Round robin algorithms can be used by any protocol that refers servers by name.
Round robin balancing in no way depends on the server’s workload: caching DNS servers help manage client inflow.

Since round robin algorithms can work even when servers are not connected to each other, they can be used for both local and global balancing.
Finally, solutions built on round robin algorithms are often low cost: adding a few records to the DNS server is usually all it takes to get going.

We should mention that round robin algorithms have their drawbacks as well. In order for the workload distribution to meet the aforementioned criteria of fairness and efficiency, each server must have the same set of resources available. When performing operations, the same number of resources should also be activated. In practice, this condition is not often met.

Additionally, the workloads of individual servers in the cluster are not considered when balancing with a round robin algorithm. Let’s say, hypothetically, that one server is under 100% load, and at the same time, others are only under 10-15%. The round robin algorithm doesn’t consider this, so the overloaded server will still receive requests. This lacks any fairness, efficiency, or predictability.

Because of these setbacks, actual red robin applications are extremely limited.

Weighted Round Robin

This is the improved version of the round robin algorithm. In this case, every server is assigned a weighted coefficient according to its capacity. This makes workload distribution more flexible since heavier servers process more requests. However, load balancing problems still remain. More effective balancing can be achieved using other methods that take more factors into consideration when planning and distributing workloads.

Least Connections

In the previous section, we listed the main disadvantages of the round robin algorithm. We’ll mention one more here: the number of active connections is not taken into consideration.

Let’s look at an example. Say we have two servers, we’ll call them A and B. Server A has fewer users than server B at the moment, but server A appears to be under a heavier load than B. How is this possible? The answer is fairly simple: server A’s connections have been maintained longer than server B’s.

This problem can be resolved using the Least Connections algorithm (abbreviated as leastconn). It factors in the number of connections a server has at that moment in time. Each subsequent request is sent to the server with the least number of active connections.

There is an updated version of this algorithm, which is mainly intended for clusters of servers with different specifications and capacities. It’s called Weighted Least Connections and doesn’t only consider the number of active connections, but the server’s weighted coefficient when distributing workloads.

There are other versions of the Least Connections algorithm, namely Locality-Based Least Connection Scheduling and Locality-Based Least Connection Scheduling with Replication Scheduling.

The first method was created specially for proxy-cache servers. Essentially, requests are sent to the servers with the fewest connections. A group of client IPs is attached to each client server. Requests from these IPs are sent to the native server, provided it’s not at full capacity. If it is, then the request will be forwarded to another server (whose load should be less than its max capacity).

In the Locality-Based Least Connection Scheduling with Replication Scheduling algorithm, each IP or group of IP addresses is attached not to individual servers, but to the entire server group. Requests are sent to the server under the lightest load. If all of the servers in the native group are overloaded, then another under-loaded server from the cluster is picked up and added to the group servicing the IP the request came from. The server with the lightest load will then be deleted from the group. This lets us avoid excessive replication.

Destination Hash Scheduling and Source Hash Scheduling

The Destination Hash Scheduling algorithm was created for proxy-cache server clusters, but is often used with other configurations. With this algorithm, servers are chosen to process a request from a static table of destination IP addresses.

The Source Hash Scheduling algorithm is based on the same principles as Destination Hash Scheduling, but source IP addresses are used instead of destination addresses.

Sticky Sessions

Sticky Sessions is an algorithm for distributing incoming requests whereby connections are forwarded to a specific server in a group. This algorithm is used by Nginx. User sessions can be attached to a specific server using IP hash (see the official documentation for more information). Requests are distributed to servers based on the client’s IP address. As stated in the documentation (see the previous link), “The method ensures that requests from the same client will always be passed to the same server except when this server is unavailable.” If the designated server is unavailable, the request will be forwarded to another. A sample fragment of the configuration file would look like this:

upstream backend {
ip_hash;

server backend1.example.com;
server backend2.example.com;
server backend3.example.com;
server backend4.example.com;
}

As of version 1.2.2, a weight can be set for each server in Nginx.

This method has its own problems though. If a client uses a dynamic IP address, then there may be errors when attaching sessions. If a large number of requests are transferred to one proxy server, then we can’t really consider this balancing efficient or fair. The problems we’ve described can be solved using cookies, however. There is a special sticky module in the commercial version of Nginx that can use cookies for balancing. There are similar free modules, like nginx-sticky-module.

Sticky sessions can also be used in HAProxy, which you can read about here.

Conclusion

This article is an introduction to load balancing and related issues. We’ll continue this topic in future articles. If you have any questions, comments, or anything to add, please feel free to do so in the comments below. We’d also be happy to hear your experience with load balancing.

Monitoring

Network and Security

Kubernetes

Selectel Serverless