Cloud Computing in Development
When developers are working on a given task, they generally want to work in their own environment, isolated from the changes (including stopping and restarting the system) that other developers may be making at the same time. This becomes increasingly important as a development team grows, because more changes go into the system every day. Traditionally, each developer runs a copy of the software on their own development machine, or on a shared development server. Every couple of days, if the Continuous Integration system indicates that the software is in a working state, the developer updates their system with everyone else's changes and can continue developing on an up-to-date copy of the software. Depending on how much has changed, this can take a significant amount of time.
If each developer in the team has their own environment, we quickly reach the point where there are dozens of environments, all running slightly different versions of the system. Add the system testing environments and it all becomes very complicated very quickly. On a recent project, we had a total of 40 developers working on a distributed system of Web Services that co-operate to provide a business capability. A full build from scratch took approximately one and a half hours, and every few days each developer would spend this time bringing his system up to date. We also had two engineers working almost full time on keeping the system test environments up to date, along with the other aspects of environment management (operating system updates, testing bug fixes provided by software vendors, etc.). That is a lot of time spent just keeping environments current. To make things worse, if a problem is discovered during the build process, or a bug sneaks into the system, the person maintaining the environment faces the prospect of going through the entire build process again to revert to an older copy of the software.
This is where Cloud Computing can help, or rather two important concepts from it: disk snapshots, and the ability to create computing environments quickly. The concept is quite simple. When a user (developer or tester) wants an up-to-date environment, he finds the disk image of the last known good Continuous Integration build, clones it, runs it as a virtualised environment and voilà! No waiting around for one and a half hours, and no worrying about whether the build has completed successfully. No wondering whether the little experiment you did last Thursday has affected the operation of your system, because you have just created a completely fresh environment.
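As a minimal sketch of the "find the last known good build" step, assuming (hypothetically) that CI records each build's number, its pass/fail status and the name of the disk-image snapshot it produced, choosing the image to clone is just a filter and a max. The record layout and the clone naming scheme are illustrative assumptions, not part of any particular CI tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BuildImage:
    """One CI build and the disk-image snapshot it produced (hypothetical record)."""
    build_number: int
    passed: bool   # True if the CI build and its tests succeeded
    snapshot: str  # hypothetical snapshot name, e.g. "images/base@b41"


def last_known_good(images: List[BuildImage]) -> Optional[BuildImage]:
    """Return the newest image whose build passed, or None if none did."""
    good = [img for img in images if img.passed]
    return max(good, key=lambda img: img.build_number, default=None)


def clone_name(user: str, image: BuildImage) -> str:
    """Name for a developer's private clone (hypothetical naming scheme)."""
    return f"dev-{user}-b{image.build_number}"


# Usage: build 42 failed, so the newest good image is build 41.
history = [
    BuildImage(40, True, "images/base@b40"),
    BuildImage(41, True, "images/base@b41"),
    BuildImage(42, False, "images/base@b42"),
]
chosen = last_known_good(history)
print(chosen.snapshot, clone_name("alice", chosen))
# prints: images/base@b41 dev-alice-b41
```

Because a broken build is simply skipped, a bad CI run never reaches anyone's desktop; they keep cloning the previous good image until the build is fixed.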
When a new tool is required for the development environment, or a new version of one is released, there is no need to instruct each developer to install it separately. All that has to be done is to update the base image; the next time each developer creates an environment, he will automatically pick up the changes.
The situation is just as good for the system testing environments, as we generally want all of them to be identical. Instead of building each environment separately, we simply build it once and clone it as many times as we need.
Snapshots and Cloud Computing also make very efficient use of the available computing resources, particularly disk storage. With copy-on-write volumes, each environment only needs disk storage for the blocks that differ between itself and the base image it was cloned from. Because the environments will be 99% the same, each one uses very little storage. One terabyte disks are commonplace now, and would be capable of storing hundreds of these disk images.
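A back-of-the-envelope comparison makes the saving concrete. This sketch assumes a hypothetical 20 GB base image on a 1 TB (1000 GB) disk, reads "99% the same" as 1% of blocks changed per clone, and ignores snapshot metadata and the way clones drift further from the base over time. It works in integer megabytes to avoid floating-point rounding.

```python
def full_copies_that_fit(disk_gb: int, image_gb: int) -> int:
    """How many independent full copies of the image fit on the disk."""
    return disk_gb // image_gb


def clones_that_fit(disk_gb: int, image_gb: int, changed_percent: int) -> int:
    """How many copy-on-write clones fit alongside one shared base image.

    Each clone stores only its changed blocks: changed_percent of the image.
    """
    disk_mb = disk_gb * 1000
    image_mb = image_gb * 1000
    delta_mb = image_mb * changed_percent // 100  # private storage per clone
    return (disk_mb - image_mb) // delta_mb


print(full_copies_that_fit(1000, 20))   # 50
print(clones_that_fit(1000, 20, 1))     # 4900
print(clones_that_fit(1000, 20, 5))     # 980
```

Even if each clone diverges by 5% of the image, several hundred environments still fit on one disk, against 50 full copies.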
But the biggest advantage of this approach is that it drastically reduces the workload on the project's environment engineers. If a developer needs a change, or another environment (say, to test operation in a clustered configuration), he can do it himself. The few tasks the environment engineers still need to perform also scale much better: they can support a project with 100 engineers almost as easily as one with just 10.
Of course, there are a few things that need to be done to your environments before they can operate in a cloud computing system, especially when they are cloned. For example, Oracle's OC4J application server stores the hostname and IP address of the server in its configuration, so these need to be changed each time the disk image is cloned. Many cloud computing environments (including EC2) do not support multicast either, so alternative methods must be found for managing clusters. None of these problems is insurmountable, however.
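The hostname and IP address fix-up can be scripted as part of the clone. This is a minimal sketch, assuming the old and new values are known and that they appear as plain text in the configuration files. Which OC4J files hold them is not covered here, so the path passed to the hypothetical `rewrite_file` helper is up to you. The patterns only match whole tokens, so rewriting `10.0.0.1` leaves `10.0.0.12` alone.

```python
import re
from pathlib import Path


def _whole_token(value: str, token_chars: str) -> "re.Pattern[str]":
    # Match value only when it is not part of a longer token made of token_chars.
    return re.compile(rf"(?<![{token_chars}]){re.escape(value)}(?![{token_chars}])")


def rewrite_identity(text: str, old_host: str, old_ip: str,
                     new_host: str, new_ip: str) -> str:
    """Replace the cloned machine's old hostname and IP with the new ones."""
    # IP addresses: neighbours must not be digits or dots.
    text = _whole_token(old_ip, r"\d.").sub(lambda _: new_ip, text)
    # Hostnames: neighbours must not be word characters or hyphens, but a
    # following dot is allowed so "devbox.example.com" is rewritten too.
    text = _whole_token(old_host, r"\w-").sub(lambda _: new_host, text)
    return text


def rewrite_file(path: str, old_host: str, old_ip: str,
                 new_host: str, new_ip: str) -> None:
    """Apply rewrite_identity to one configuration file in place."""
    p = Path(path)
    p.write_text(rewrite_identity(p.read_text(), old_host, old_ip, new_host, new_ip))


cfg = 'host="devbox" ip="10.0.0.1" peer="10.0.0.12" url="http://devbox.example.com"'
print(rewrite_identity(cfg, "devbox", "10.0.0.1", "clone7", "10.0.0.7"))
# host="clone7" ip="10.0.0.7" peer="10.0.0.12" url="http://clone7.example.com"
```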
More of a problem is the licensing arrangement for application servers. Some vendors do not charge license fees for development environments, which is great. Others, such as Oracle, charge per server or per named developer. It is difficult to reconcile this licensing model with a cloud computing system, where environments come and go frequently.
The final challenge is organisational. Some development shops are not set up in a way that makes it easy to use cloud computing services. It may not be possible to reach EC2 from your intranet, or management (or even your client) may be nervous about running software on computers that are not under their direct control. To get around this, you might be able to set up your own virtualisation cloud within your organisation. It's not that hard to do, and depending on how sophisticated you make the setup, you may get most of the benefits of a real cloud computing provider.
First up, we need some shared (NAS) storage on which to keep our disk images. Because different people on different computers will want access to the images, we need a way of reaching them over the network. ATA over Ethernet (AoE) lets other computers (here, the ones running the virtualised environments) access a disk exported by a central file server (NAS) over the network as though it were a local drive. iSCSI is another option: it is more standards-compliant and works across routed networks rather than just the local Ethernet segment, but it is a little more resource hungry. Both are well supported by Linux, so should be easy to use.
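Before wiring the images into any virtualisation software, it can be useful to check from an ordinary Linux desktop that the NAS is exporting them over iSCSI. This is a minimal sketch, assuming the open-iscsi initiator (`iscsiadm`) is installed. The portal address and target name are placeholders, not real values. The functions only build argument lists, so they can be inspected or handed to `subprocess.run`.

```python
import subprocess
from typing import List


def discover_cmd(portal: str) -> List[str]:
    """Ask the iSCSI server at portal (host or host:port) which targets it exports."""
    return ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal]


def login_cmd(target_iqn: str, portal: str) -> List[str]:
    """Log in to one target, after which it appears as a local block device."""
    return ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"]


def logout_cmd(target_iqn: str, portal: str) -> List[str]:
    """Detach the target again."""
    return ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--logout"]


def run(cmd: List[str]) -> str:
    """Run a command (normally as root), raising if it fails; return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Placeholder values: substitute your NAS address and the IQN it reports.
PORTAL = "192.168.1.10"
TARGET = "iqn.2008-01.example:images.dev-alice"
print(" ".join(discover_cmd(PORTAL)))
# iscsiadm -m discovery -t sendtargets -p 192.168.1.10
```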
Whatever NAS solution we choose should also support copy-on-write snapshots of disk images. Linux LVM has snapshot support, but performance drops as the number of snapshots increases. A better solution is ZFS, which comes with OpenSolaris. ZFS has very good snapshot support, along with other new and exciting storage features, but OpenSolaris only supports iSCSI, not AoE. That's fine, as xVM VirtualBox, the virtualisation solution described next, has iSCSI support out of the box. Once the image snapshot box is set up, it is important to keep spare parts and take backups, as it becomes a single point of failure for all of your developers.
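On the ZFS side, the snapshot-and-clone cycle maps onto a handful of `zfs` subcommands. This is a minimal sketch, assuming each base image is a ZFS volume (zvol) so that it can be exported over iSCSI. The dataset names are hypothetical. Exporting uses the `shareiscsi` property of the OpenSolaris releases of this era; later releases replaced it with the COMSTAR framework, so treat that command as version-specific.

```python
from typing import List

Command = List[str]


def snapshot_cmd(base: str, build: int) -> Command:
    """Freeze the base image produced by a CI build (e.g. tank/images/base@build-41)."""
    return ["zfs", "snapshot", f"{base}@build-{build}"]


def clone_cmd(base: str, build: int, clone: str) -> Command:
    """Create a writable copy-on-write clone of that snapshot."""
    return ["zfs", "clone", f"{base}@build-{build}", clone]


def share_cmd(clone: str) -> Command:
    """Export the cloned volume over iSCSI (legacy OpenSolaris property)."""
    return ["zfs", "set", "shareiscsi=on", clone]


def destroy_cmd(clone: str) -> Command:
    """Throw the environment away once it is no longer needed."""
    return ["zfs", "destroy", clone]


def new_environment(base: str, build: int, clone: str) -> List[Command]:
    """Commands to give a developer a fresh environment from a known-good build."""
    return [clone_cmd(base, build, clone), share_cmd(clone)]


for cmd in new_environment("tank/images/base", 41, "tank/images/dev-alice"):
    print(" ".join(cmd))
# zfs clone tank/images/base@build-41 tank/images/dev-alice
# zfs set shareiscsi=on tank/images/dev-alice
```

Reverting to an older build is just a clone of an older snapshot, rather than a rebuild. ZFS also refuses to destroy a snapshot while clones of it exist (unless told to destroy the dependents too), which protects images that developers are still using.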
So, now that we've got our snapshot storage sorted out, how are we going to do the virtualisation? If we were setting up a full cloud, we could use Eucalyptus. That would allow us to centralise all of our virtual environments and provide proper scalability. Each developer is likely to need only one environment, though, and even laptops these days have enough oomph to run at least one virtual environment. So why not let our developers run the virtualised environment directly on their own machines? Sun provide a virtualisation solution called xVM VirtualBox, which can source its disk images via a built-in iSCSI initiator. It also has a command line utility, which makes it very easy to include in scripts. Perfect.
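With VirtualBox's command line utility, `VBoxManage`, creating and starting a VM whose disk lives on the NAS takes a few commands. This is a minimal sketch, assuming the syntax of current VirtualBox releases (older releases used different subcommands for iSCSI disks). The VM name, controller name, memory size, server address and IQN are all placeholders.

```python
import subprocess
from typing import List


def vm_commands(vm: str, nas_host: str, target_iqn: str,
                memory_mb: int = 2048) -> List[List[str]]:
    """VBoxManage invocations that build and boot a VM on an iSCSI disk."""
    return [
        ["VBoxManage", "createvm", "--name", vm, "--register"],
        ["VBoxManage", "modifyvm", vm, "--memory", str(memory_mb)],
        # A SATA controller to attach the network disk to.
        ["VBoxManage", "storagectl", vm, "--name", "SATA", "--add", "sata"],
        # Use VirtualBox's built-in iSCSI initiator; no host-side iSCSI setup needed.
        ["VBoxManage", "storageattach", vm, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd",
         "--medium", "iscsi", "--server", nas_host, "--target", target_iqn],
        ["VBoxManage", "startvm", vm, "--type", "headless"],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


for cmd in vm_commands("dev-alice", "192.168.1.10", "iqn.2008-01.example:images.dev-alice"):
    print(" ".join(cmd))
```

`--type headless` suits a VM that developers reach over the network; `--type gui` opens a window on the desktop instead.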
All that is left now is to write the tools that let users (and Continuous Integration) manipulate disk images from their desktops, and to set up a base development image. As the technique is intended for internal development, we have chosen to implement the tools as shell scripts run over SSH, with a Web application in front of them for manual management. Setting up the images can be tricky. Luckily, Oracle have already made some Oracle Fusion images available. You can download an image and convert it to a VDI image (instructions to follow).
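The "scripts over SSH" approach boils down to running a command on the snapshot server from a developer's desktop or the CI server. This is a minimal sketch in Python rather than shell, assuming key-based SSH access to a hypothetical host named `nas01`. `shlex.join` quotes the remote command so that arguments containing spaces survive the trip. `BatchMode=yes` makes SSH fail rather than wait on a password prompt during unattended CI runs.

```python
import shlex
import subprocess
from typing import List


def ssh_cmd(host: str, remote_argv: List[str]) -> List[str]:
    """Wrap a command so it runs on host via ssh, with safe quoting."""
    return ["ssh", "-o", "BatchMode=yes", host, shlex.join(remote_argv)]


def run_remote(host: str, remote_argv: List[str]) -> str:
    """Run the command remotely, raising CalledProcessError on failure."""
    result = subprocess.run(ssh_cmd(host, remote_argv),
                            check=True, capture_output=True, text=True)
    return result.stdout


print(ssh_cmd("nas01", ["zfs", "list", "-t", "snapshot"]))
# ['ssh', '-o', 'BatchMode=yes', 'nas01', 'zfs list -t snapshot']
```

A Web front end for manual management could call the same helpers, so that CI, the scripts and the Web application all share one code path.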
So, we can set up a virtualised environment with snapshot management for use in development using only one extra server and a few cheap drives. This provides a very easy way of setting up this sort of development environment without having to go down the sometimes difficult path of convincing your company to fully embrace cloud computing.