Containers

Containerization is a lightweight alternative to machine virtualization that involves encapsulating an application in a container with its own operating environment.


7 golden rules for Docker images

Creating a good Docker image is an art. There are no fixed rules that can be applied in every situation. Instead, we need to look at the pros and cons of every decision. We can however provide guidance.

Here are seven golden rules for Docker images. By following them, you can improve the containers you build, making them more reusable, more efficient and more stable.

Stateless

The prime requirement for all scalable containers is to never keep track of state. Every action should be executed in its own context, without the need to store long-term information anywhere inside the micro-service. This means that permanent storage, such as databases, data files or caches, should not live inside the container. When no data lives inside the container, requests can be executed by any copy of the micro-service, so we can load-balance the requests or recycle malfunctioning containers.

Statelessness is hard to achieve. It requires that the software you try to encapsulate is written in a way that allows statelessness. When done properly, the software allows you to push the state to an external resource, such as a database. As a hint, you could put your data files in central folders that can be mounted as external shared volumes (watch out for file locks and concurrency on the files), make use of external, shared caches, and so on.
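
As an illustration, here is a minimal Dockerfile sketch of that idea; the service name, paths and variables are made up. The data directory is declared as a volume so it can be mounted externally, and the location of the database is passed in at startup, so no state has to live inside the image.

```dockerfile
# Illustrative sketch only: names and paths are hypothetical.
FROM alpine:3.19

# Data files go to a path that is meant to be mounted as an external, shared volume,
# so any copy of the micro-service can serve any request.
VOLUME /var/lib/myservice/data

# The real state store is external; only its location is injected at startup.
ENV DATABASE_URL=""

COPY myservice /usr/local/bin/myservice
CMD ["/usr/local/bin/myservice"]
```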

If the software doesn’t allow a stateless runtime, you might be able to use clustering features of the software. This is not advised, because it puts extra requirements on the network setup of your runtime, but it can allow you to run load-balanced.

My request should be handled by any container, independent of previous requests.

Small

When your container travels through the development landscape towards production, it is downloaded and uploaded many times. Even when it has hit production, it will be copied and unpacked every time a new instance is started. Even though disk, memory and CPU might be cheap, the total sum can add up. Look at the following examples:

  • When a micro-service fails, we need a replacement instantly, and any delay caused by unpacking should be avoided.
  • After a disaster, all services might need to start at the same time, congesting the bandwidth for downloading images and other resources needed to unpack the Docker image.
  • Your artifact repository might hold hundreds of versions of your images. Even with good housekeeping rules, the disk space might grow beyond what is available.

What can you do about this?

  • Clean up your package repository cache after using it to install. Package managers like yum and apt download a copy of the version information of all available packages. You won’t need that inside your running container, so clean up after installing.
  • Make sure you clean up at the end of every line in your Dockerfile, as shown in the sketch below. Docker creates a filesystem snapshot after every line, and stores the diff as a layer. Cleaning up on the next line will not reduce space; the deleted files still exist in the previous layer, so you only add another layer on top.
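
A minimal sketch of this rule, using apt as an example (the package names are arbitrary): the cache is removed in the same RUN line that created it, so it never ends up in a layer.

```dockerfile
FROM ubuntu:22.04

# Good: install and clean up in one RUN line, so the apt cache never becomes part of a layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Bad: cleaning up in a separate line would not shrink the image;
# the cache would already be stored in the layer above.
# RUN rm -rf /var/lib/apt/lists/*
```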

Fast Startup

Load-balancing and high-availability depend on the ability to react to changing conditions in a timely manner. If a container fails, you wish to have it replaced now, not one hour later. When sales are peaking on your website, you wish to have extra capacity now, without any lead time, or you might miss some revenue.

Your micro-service should be able to start fast, and without dependencies on external systems. You don’t want your service to download extra packages from a repository at startup; instead, all the packages should be part of the Docker image. A service should not have to register itself on a central server; other services should be able to connect to your service by using a well-known URL or name, such as an OpenShift service name. And there should never be an external license server that needs to authorize your instance and becomes a show-stopper when it is unavailable.

Configurable

Configuration is all about being able to change behavior without rebuilding the image

Think about how others will use your micro-service. What flexibility can we give them? Does your container need a server name and port to access an external database, or can we provide a full JNDI URL, which allows more fine-tuning? Try not to restrict the use of your container. Containers are for IT experts, not for end-users, so give them a powerful interface. It will be used by competent people who are trying to make it work in a situation you might not have foreseen, so give them the tools.

There are multiple ways to inject configuration into your container.

  • Environment variables are the easiest to use. They are clear, easy to find and well understood. They are however immutable once the container has started. The process that runs inside the container gets a copy at startup.
  • Configuration files are a bit more complex. The format depends on your software, and they may be scattered all over your file system. However, good software packages are able to detect changes at runtime and can re-load the file. Also, files can be mounted, so multiple containers can use the same central configuration file, which makes it easier to maintain the settings. A small sketch of both options follows below.
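
As a hedged illustration (the image name, variables and paths are made up): defaults are baked into the image as environment variables and a default configuration file, and both can be overridden from the outside at startup.

```dockerfile
FROM tomcat:9.0

# Environment variables: sensible defaults that can be overridden at startup,
# for example with `docker run -e DB_URL=...`.
ENV DB_URL="jdbc:postgresql://db:5432/orders" \
    LOG_LEVEL="INFO"

# Configuration file: a default is shipped with the image, but the path can be
# bind-mounted at runtime so several containers share one central file.
COPY conf/app.properties /usr/local/tomcat/conf/app.properties
```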

Other solutions, such as storing settings in a central database, are possible, but not advisable in a micro-service landscape. If you are running a large OpenShift or Kubernetes environment, you want configuration changes to be visible, and hiding them in a database works against that.

Extendable

When configuration is all about changing the runtime behavior of the container within the bounds of the software, extendability is about being able to enhance software behavior by building on top of the image.

Many container images on docker.io are built according to this principle. When you use an image of an application server such as GlassFish, you can choose to start the container and upload your application modules to the running instance. This is, however, a painful process that needs to be repeated every time you start the container. Instead, you can build a new Docker image on top of the application server image with the packages pre-installed. When you do so, you have extended the original micro-service with your code, creating a new micro-service.

When you design your micro-service, think about how others can extend it. Can they add classes, libraries or other things that enhance the behavior? Maybe you should split your container into two parts: a reusable base image and a customized extension to that image for your single purpose.
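
A minimal sketch of such an extension; the base image tag, module name and deployment path are assumptions, not a recipe for a specific application server.

```dockerfile
# Hypothetical extension of an application server image: our module is baked in,
# so it no longer has to be uploaded every time the container starts.
FROM glassfish:5.0
COPY target/orders.war /opt/glassfish/domains/domain1/autodeploy/
```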

Layered

Docker containers are built in layers. A layer is basically a diff: we take the previous layer and apply a number of changes to it, in order to arrive at a new situation. Each layer adds a piece of the final image, and each layer depends on a previous layer.

We already mentioned that we should avoid adding unnecessary data to the layers, but even when we avoid that, we still need to look critically at the layer structure. When you create a docker image, you are focused on the end result: making it work. This is your primary goal. Once you have it working, you should review your Dockerfile, and see if you need to make changes.

Docker tries to be smart about the layer structure. Whenever you use a Docker image, the layers are downloaded and cached locally. A cached layer can be re-used when 1) the chain of layers from the root layer up to this layer is exactly the same, and 2) the layer itself has not changed. As soon as you make a change to one layer, it invalidates that layer and all layers that come after it. All these layers will need to be downloaded again, even though the previous version of each layer was cached and the layer itself has not changed.

When you look at the order of the layers, you should follow this guideline:

large before small, stable before volatile

When the order of two lines in your Dockerfile is not dictated by any dependencies, consider the rule above. Ideally, the line that produces the largest layer should come first. Also, lines that are not subject to change in the next version should come before lines that will change, such as lines referring to a specific version of a package. These two rules allow Docker to use its cache more effectively, reducing memory, disk space and bandwidth significantly. As a bonus, the build time for images is also reduced any time you make a change to one of the volatile layers.
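
A sketch of this ordering under the rule above (the base image, packages and file names are examples): the large, stable runtime layer comes first, and the application artifact, which changes with every release, comes last.

```dockerfile
FROM ubuntu:22.04

# Large and stable: the runtime rarely changes between releases, so this layer stays cached.
RUN apt-get update \
 && apt-get install -y --no-install-recommends openjdk-17-jre-headless \
 && rm -rf /var/lib/apt/lists/*

# Small and stable: generic directory structure and defaults.
RUN mkdir -p /opt/app/config

# Volatile: the application itself changes with every release, so it comes last
# and invalidates as little of the cache as possible.
COPY target/app.jar /opt/app/app.jar
CMD ["java", "-jar", "/opt/app/app.jar"]
```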

Versioned

You should use explicit versioning, always.

Dockerfiles are code, just like any other language. You put them in source control and use a compiler (docker) to build them into an executable (the image). You want this process to be predictable and repeatable. You don’t want it to break suddenly when a dependency is updated. Imagine your container is in production and has been happily running for more than a year. A small change is requested and you agree to it. You take the Dockerfile out of source control, only to find out that it fails to build. Now you are left with an investigation that is preventing you from meeting your deadline.

What are the pieces you need to version?

  • The docker image in the FROM header. This is quite obvious.
  • The software package you encapsulate in the container, still a no-brainer.
  • The libraries and packages used by the software.
  • And finally, the tools you install with apt, yum etc. to prepare the docker image.

This last item is often forgotten. If you use a packaging system from a distribution, these packages also change. Their behavior or interface may change slightly, and the newer versions might be incompatible with the old distribution you are using through your FROM image. Make sure you version the tools, and that the tools remain available for download, for example by copying them to an artifact repository under your control.
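
A hedged sketch of what that can look like; the version strings and package names below are placeholders, not real pins.

```dockerfile
# Every dependency is pinned to an explicit version (the versions shown are examples).
FROM ubuntu:22.04

# Tools installed with the distribution's package manager are pinned as well.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      curl=7.81.0-1ubuntu1 \
      unzip=6.0-26ubuntu3 \
 && rm -rf /var/lib/apt/lists/*

# The encapsulated software is copied in by explicit version, not "latest".
COPY target/orders-service-2.3.1.jar /opt/app/app.jar
```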

Conclusion

There are many things to take into account when we create a container image. This makes the creation of a good image an art in its own right. Good design might not be apparent at first. If the program inside the container runs correctly, who will complain? Only when an image is used extensively will the flaws become visible. By following the steps in this article, you can steer clear of many of the pitfalls.

Component testing with Docker

Component testing is an important protection against regression errors. After every change to your component, you should test its public interfaces in isolation from the environment it runs in. In classic OTAP setups this can be a pain, but using Docker you can avoid many of the problems by creating a dedicated environment, just for the occasion.

Our test strategy consists of just five simple steps for component testing, which are fully automated using a Jenkins build server:

1) Perform unit tests after compiling your code

During development, it’s important to get quick feedback on errors in your code. The IDE you use is the first layer, and the most direct protection: it protects against syntax errors. The second layer is the unit test. It should verify that your code has no erroneous constructions, like breaking on empty lists and other out-of-bounds conditions. It should focus on the technical details of your implementation. It should test the constructions in your code, but take care that you are not testing the libraries you are using. Libraries have their own test suites, and duplicating these tests will not add any value.

 

If you are using a code-quality gateway, such as SonarQube, it should be invoked just after the unit tests. A gateway will improve the quality of the entire project by enforcing code standards and unit-test coverage, and by preventing architectural debt in your code. It reduces the burden of peer reviews by automating the bulk of the review work, leaving only the interesting work to the developers.

 

2) Create a docker image of your component, as usual

Once you have passed your unit tests, you should create a Docker image of your component, ready for deployment in the environment. This is a candidate image for production, and it will not be changed anymore. Whenever it passes a test phase, it will be promoted to the next environment. This means that our Docker image needs to be configurable for different environments, but the executable inside, together with the internal structure of the image, must be final.

3) Create a docker image from the image of step 2, and add mocks and settings

The image created in step 2 is final, but Docker allows us to derive from an image and add extra components. We run an application server in our Docker image, where the component is deployed. In the same application server, we can deploy our mock services. All external APIs used by our component are mocked using the same platform as the component itself. This is important, because when using Docker, you should have only one executable running per container; this executable has to perform the role of both the component and the mocked services. It also simplifies things, because the developer needs only one skill set instead of two: the application platform also serves as the mock framework.

The configuration for our component is also added to this image, so that it connects to the mocked APIs out of the box instead of the external APIs. The Docker image needs no further configuration and is ready to respond to our test messages directly after spinning up.
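
A minimal sketch of such a derived test image; the registry, image names and paths are made up. The production candidate stays untouched and only gains mocks and test settings.

```dockerfile
# Hypothetical test image derived from the untouched production candidate.
FROM registry.example.com/orders-service:2.3.1

# Deploy the mock services in the same application server as the component itself.
COPY mocks/customer-mock.war /opt/appserver/deployments/

# Settings that point the component at the mocked APIs instead of the external ones.
COPY test-config/application.properties /opt/appserver/config/application.properties
```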

4) Deploy the image of step 3, and run your component tests against your mocked component

Our component uses one dependency that is hard to mock: the database. This can be circumvented by creating a dedicated database per test. Again, Docker shows its strengths, as we can just spin up a Docker database image together with our component test image. This implies that our component must be able to create its own database structure, or that we have a database image with the predefined structure available. We use the former.
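
For the second option, a database image with a predefined structure, a sketch could look like the following; it assumes the official postgres image, which runs initialization scripts placed in /docker-entrypoint-initdb.d on first start, and the schema file name is made up.

```dockerfile
# Hypothetical dedicated test database: the schema is created on first start,
# so every test run begins with the same known structure.
FROM postgres:15
COPY test-db/schema.sql /docker-entrypoint-initdb.d/
```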

Now that our component is running together with its mocks and database dependencies, we can initiate the component test suite from Jenkins. All tests are run in isolation, on the just-created stand-alone environment, and the results are gathered.

The things we verify in the component test phase are functional, and can be written down using the following format: given that the mocks provide certain data, when I call the provided public API of my component, then I expect a certain result. For example: given that a customer X is returned from the customer mock service, when I call the order service to create an order for customer X, then the result should be that an order is created.

A good practice is to write down small scenarios of business events and bundle each scenario in a test case. Test cases should be independent of each other, so you shouldn’t use database data stored in one scenario to execute the next one. The only dependencies are between steps inside a scenario, where you can create something, read it, update it, etc. This way you can choose the order of the scenarios, and perhaps limit your testing to one case when you try to reproduce an issue.

5) Proceed to deploy the image of step 2, and perform integration and system tests as usual

Once the component testing is successful, the component test environment is deleted, since it isn’t needed anymore; it should be created fresh before every test run anyway.

We take the base image we created in step 2 and deploy it on our integration test environment.

 

Some points to take away

  • Our component is able to create its own database structure from scratch, so we can start with an empty database every time.
  • We use an application server to host both the component and the mock services.
  • We build a mocked Docker image on top of our production-ready image.
  • Jenkins is used to create and destroy the Docker environments.
  • Docker Compose can be used to create an environment, but specialized products such as Kubernetes or OpenShift make life for a developer much easier.
  • Component testing can seem expensive, but the longer the software lives, the more value is returned from component tests. Don’t skip out on the tests, but make implementing tests easier.