Lately I was visiting a customer that was moving its applications to new servers. They had acquired two huge servers to host their middleware solution, connecting their on premise applications and their cloud-hosted solutions. The client went for an active-active setup, where both servers are handling the load, but if one server fails, the remaining server must be able to handle all the trafic. It sounds like a good setup, but it introduces a lot of waste. We will go into this and show why a cloud solution using auto-scaling is so much better.
The machines are scaled to handle peak performance. This is 100% load, and we must keep in mind that this is a large company with a big IT landscape, so it must handle a lot of data at times. This load is balanced over two instances, but if either server fails, the other must be able to take on the full load, so we can never put more than 50% load on one machine. This means that at best, we can expect to have 50% idle time on these machines. However, since we won’t be running at maximum capacity all day, only at a few peak-moments, the real loss can be expected to be larger, like 60-80% idle time.
To summarize, the client had bought two huge, expensive machines that would be working for 20-40% of the time…
What could they have done better?
Improved server utilization
We keep the original requirement of the client to have active-active setup, and to have one machine in reserve in case an other goes down. However, we change our approach from a few big machines to many, small machines.
Lets do the calculations again, now for 5 smaller machines instead of two big machines. Again we want to be able to handle a single machine failure.
100% load must be handled by our landscape of 5 machines, but if one fails, only 4 machines are left, so each machine must be scaled to handle 25% of the total traffic. During normal operations however, we have 5 machines available, so they will run at 20% of the traffic at peak-moments. This means that at peak moments, the server will be running at 20/25 * 100 = 80% of the maximum load the machine can handle. In other words, it will have 20% idle time, instead of 50% idle time in the previous example.
You pay for capacity, so having less idle time means you’ll be paying less for your servers.
Can we improve on this?
The cloud approach to such a scaling problem would be to have auto-scaling group. Such an auto-scaling group consists of a number of smaller machines, all virtualized. In the above example, we would have created a group of max 5 machines, and a minimum of two machines. This satisfies the requirements of the customer that we must have one hot standby at all times, and we must be able to handle 100% load at peak-moments. The auto-scaling group can spin up extra servers when the need arises, or it can remove servers if the load drops below a certain threshold. The rules must be set so, that the total load of the n servers will never be more than n-1 servers could handle, because when a server fails, the others must be able to handle the load. As soon as this threshold is met, a new instance will be started in order to lower the load per machine.
Now, whenever the load of the system drops below the peak-load, the auto-scaling group can turn of servers to reduce cost. When the load increases again, the system simply starts a new server.
The thresholds for scaling up- and down can be calculated using the following formula:
let S be the sum of the loads of all the running instances, let T be the maximum load that can be handled by all running instances, and let N be the number of running instances.
upper-bound: if S > (N-1 / N) * T then scale up
lower-bound: if S < (N-2 / N-1) * T then scale down
If the loads fluctuates around a boundary, the number of servers would keep going up and down. To prevent this from happening, a delay is introduced by the cloud hoster, but we should also lower the lower-bound by 5% or so to have some bandwidth in which the number of servers is stable. We can’t touch the upper bound, since it would violate the one server standby requirement.
Of course down-scaling can only occur for N>2, as we need a minimum of 2 servers. Up-scaling could occur unlimited, we only introduce a soft limit of 5 servers to limit the maximum costs, but if business needs change, we could up this limit.
From this follows that the load of N machines (when we have more than 2) will never be below N-2 / N-1, for example the load of 3 machines will never be below 1/2 (50%), the load of 4 machines will never be below 2/3 (67%) etc. The higher the load, the better the utilization becomes, and when the load drops, the extra machines are shut down to improve utilization again.
The result will be that depending on the fluctuations in your landscape, your average utilization percentage will increase even more, resulting in a lower running cost of your landscape.
How to achieve this auto-scaling behavior will be addressed in an upcoming post.