In a previous post, I showcased an example of waste in a customer environment. The customer had two massive servers in a fail-over setup, which resulted in low utilization. The customer was paying for a system that was at best 50% utilized, so 50% of his costs were for the fail-over scenario that may happen once or twice during the lifetime of the servers. To reduce the cost, we can switch to a scalable environment. But how do you achieve a scalable environment? How do you implement cloud scaling?
How to imlement cloud scaling
The general approach in cloud design is to apply the pets versus cattle idiom:
Pets versus Cattle
Pets versus Cattle
A pet is unique. It is something you care for, put in lots of effort and if something happens to it, it can’t be replaced without loss.
Cattle is uniform. If it gets sick or it misbehaves, you take it away and get new cattle. The replacement costs are low.
How does this apply to cloud computing and scaling? We must make sure our servers are not like pets: high value, high risk, hard to replace. We can do this by taking the following steps.
Before we can make a scalable cluster of servers, we first need to make sure the server needs as little attention as possible. All steps for startup and shutdown should be automated. This allows an automated system to manage the server without the need of a human to intervene. The system can adjust to the actual load during the night, in weekends and during the holidays at no extra costs.
Automating startup is not enough. If it takes two hours to start an extra server, the need may already have passed. Even worse, the load may have been to high for the other servers to handle, causing problems and unhappy customers. The startup process should be as light as possible.
In order to do this, the server should be pre-installed as much as possible. All packages should be there in the correct version, and no installation steps should be required.
Pre-install all packages
To get to this point, we need a server image that isn’t a raw operating system like we normally use to start a fresh server. The server image we need is one that has been prepared with the correct software pre-installed. The complete runtime platform must be configured on the image so that no time is wasted installing modules. Once we have all platform software installed, we make our own image snapshot, and we’ll use this master image for all our server instances. Only one master image exists, and new servers are created by cloning the master image and starting the copy. The master image itself is never started.
Second, we will need the platform to load the actual packages. Where the platform software usually is stable and proven software, the actual business logic usually resides in custom build packages that are loaded in the platform. These packages are fast changing with business needs. To merge them with the server image would mean we either have to install the latest version after the server comes up, or we have to rebuild the server image whenever a new package is released. Both are unwanted.
Separate the platform and the custom software
One solution is to separate the packages from the platform. Packages need to be stored at a central location, configured and all, ready to be loaded by the server. Once the server starts, it loads the package and configuration from the central location without the need for an extra installation step. When the server shuts down, the configuration persists, while the server image is discarded.
To create packages that can be stored and accessed like this is not trivial. Not every platform supports this behavior. One of the possible solutions to this is the combination of micro-services and docker. We will go into these subjects in a future post.
Cloud scaling is not simply the process of starting more servers on demand. The server images need to be prepared in order to start automatically, efficiently and reliably. Once this is done, the cloud software is able to manage the server instances and to keep the utilization at the requested levels.
One important thing to note is that the above steps for preparing a scalable server image also hold for preparing any other scalable service. From docker images to database clusters to cloud scaling, every scalable environment shares these same principles.
Lately I was visiting a customer that was moving its applications to new servers. They had acquired two huge servers to host their middleware solution, connecting their on premise applications and their cloud-hosted solutions. The client went for an active-active setup, where both servers are handling the load, but if one server fails, the remaining server must be able to handle all the trafic. It sounds like a good setup, but it introduces a lot of waste. We will go into this and show why a cloud solution using auto-scaling is so much better.
The machines are scaled to handle peak performance. This is 100% load, and we must keep in mind that this is a large company with a big IT landscape, so it must handle a lot of data at times. This load is balanced over two instances, but if either server fails, the other must be able to take on the full load, so we can never put more than 50% load on one machine. This means that at best, we can expect to have 50% idle time on these machines. However, since we won’t be running at maximum capacity all day, only at a few peak-moments, the real loss can be expected to be larger, like 60-80% idle time.
To summarize, the client had bought two huge, expensive machines that would be working for 20-40% of the time…
What could they have done better?
Improved server utilization
We keep the original requirement of the client to have active-active setup, and to have one machine in reserve in case an other goes down. However, we change our approach from a few big machines to many, small machines.
Lets do the calculations again, now for 5 smaller machines instead of two big machines. Again we want to be able to handle a single machine failure.
100% load must be handled by our landscape of 5 machines, but if one fails, only 4 machines are left, so each machine must be scaled to handle 25% of the total traffic. During normal operations however, we have 5 machines available, so they will run at 20% of the traffic at peak-moments. This means that at peak moments, the server will be running at 20/25 * 100 = 80% of the maximum load the machine can handle. In other words, it will have 20% idle time, instead of 50% idle time in the previous example.
You pay for capacity, so having less idle time means you’ll be paying less for your servers.
Can we improve on this?
The cloud approach to such a scaling problem would be to have auto-scaling group. Such an auto-scaling group consists of a number of smaller machines, all virtualized. In the above example, we would have created a group of max 5 machines, and a minimum of two machines. This satisfies the requirements of the customer that we must have one hot standby at all times, and we must be able to handle 100% load at peak-moments. The auto-scaling group can spin up extra servers when the need arises, or it can remove servers if the load drops below a certain threshold. The rules must be set so, that the total load of the n servers will never be more than n-1 servers could handle, because when a server fails, the others must be able to handle the load. As soon as this threshold is met, a new instance will be started in order to lower the load per machine.
Now, whenever the load of the system drops below the peak-load, the auto-scaling group can turn of servers to reduce cost. When the load increases again, the system simply starts a new server.
The thresholds for scaling up- and down can be calculated using the following formula:
let S be the sum of the loads of all the running instances, let T be the maximum load that can be handled by all running instances, and let N be the number of running instances.
upper-bound: if S > (N-1 / N) * T then scale up
lower-bound: if S < (N-2 / N-1) * T then scale down
If the loads fluctuates around a boundary, the number of servers would keep going up and down. To prevent this from happening, a delay is introduced by the cloud hoster, but we should also lower the lower-bound by 5% or so to have some bandwidth in which the number of servers is stable. We can’t touch the upper bound, since it would violate the one server standby requirement.
Of course down-scaling can only occur for N>2, as we need a minimum of 2 servers. Up-scaling could occur unlimited, we only introduce a soft limit of 5 servers to limit the maximum costs, but if business needs change, we could up this limit.
From this follows that the load of N machines (when we have more than 2) will never be below N-2 / N-1, for example the load of 3 machines will never be below 1/2 (50%), the load of 4 machines will never be below 2/3 (67%) etc. The higher the load, the better the utilization becomes, and when the load drops, the extra machines are shut down to improve utilization again.
The result will be that depending on the fluctuations in your landscape, your average utilization percentage will increase even more, resulting in a lower running cost of your landscape.
How to achieve this auto-scaling behavior will be addressed in an upcoming post.