A Lightweight Linux-image Engine (ALLiE)

ALLiE is a software kit which helps to set up a cluster of PCs or laptops running Linux, and to configure and administer those systems from common system images and configurations. It allows systems to be installed and updated with zero down-time. It was written to accommodate the likelihood of several such systems coexisting on one PC or laptop, each separately and centrally maintainable, though that's not essential!

ALLiE is lightweight in that it requires no constantly-running daemons of its own, and requires no metafiles to describe the files that are to be downloaded and kept in synchronisation across the cluster. Furthermore, it's efficient in that file transfer uses rsync, so overheads are minimal. It's also lightweight in being simple, with a short learning curve.

Maybe ALLiE shouldn't be called an engine, as it has no daemon, but I've named it by analogy with other, more elaborate and powerful tools like CFEngine and Puppet. And it makes for a nice acronym.

Overview

ALLiE provides the methodology and scripts for installing images on PCs in a cluster. This includes the case where a PC or laptop holds multiple images simultaneously, and where those images can be used via the long-standing method of multi-boot, or (as I do) via a chrooting invocation like MELEE, or via some VM method.

Here an image just means the complete set of files which comprise an operating system and its set of applications. Whether that image is the original vanilla tree of files comprising an unmodified system, or is some customised version, is up to the local administrator.

A first image on a brand-new PC or server can be installed in the traditional way, with or without kickstart, or can be fetched directly from the central server holding the image. Further images on that PC will be fetchable automatically.

Thereafter, for each system image, day-to-day configuration is performed as an rsync overlay. By this I mean that there is a centrally-maintained sparse tree of files which represent the changes to the system from the original image, and that sparse tree is periodically rsynced by the client PCs/laptops to overlay any similarly-named files already present. This is very efficient because it means updates are performed on that small set of files, rather than the full set of files of the original image (unlike some other methods using rsync).
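The overlay idea can be illustrated without rsync at all. The following is a minimal Python sketch, not ALLiE's own code: it copies every file in a sparse tree over the same-named path in a target tree and leaves everything else alone. The names overlay_tree and demo, and the example files, are assumptions made for illustration; the real transfer is done by rsync, which additionally skips files that have not changed.

```python
import os
import shutil
import tempfile


def overlay_tree(overlay_root, target_root):
    """Copy each file under overlay_root onto the same relative path in target_root.

    Files in target_root with no counterpart in the overlay are left untouched.
    Returns the sorted list of relative paths that were written.
    """
    written = []
    for dirpath, _dirnames, filenames in os.walk(overlay_root):
        rel_dir = os.path.relpath(dirpath, overlay_root)
        dest_dir = os.path.normpath(os.path.join(target_root, rel_dir))
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            shutil.copy2(os.path.join(dirpath, name), os.path.join(dest_dir, name))
            written.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(written)


def _write(path, text):
    """Create a small text file, making parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(text)


def demo():
    """Build a tiny image and overlay in a temporary directory and apply the overlay."""
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "image")
        overlay = os.path.join(tmp, "overlay")
        _write(os.path.join(image, "etc", "motd"), "vanilla\n")
        _write(os.path.join(image, "etc", "fstab"), "untouched\n")
        _write(os.path.join(overlay, "etc", "motd"), "customised\n")
        written = overlay_tree(overlay, image)
        with open(os.path.join(image, "etc", "motd")) as handle:
            motd = handle.read()
        with open(os.path.join(image, "etc", "fstab")) as handle:
            fstab = handle.read()
        return written, motd, fstab
```

In demo, only etc/motd is in the overlay, so only that file is rewritten; etc/fstab keeps its original contents.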

Another feature is the use of file tags to designate file versions in the overlay tree which are destined to become active only on particular clients. This saves creating a separate overlay tree for such special cases. For example, the file /etc/hosts.allow might need to be different on one or two clients while all the other configured files might be the same. This is optional: you might prefer to have a separate overlay tree per client type, and/or several overlaid trees per image.
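To make the tagging idea concrete, here is a small Python sketch. The tag syntax is not described here, so the "@@" suffix used below (as in hosts.allow@@gateway) is purely hypothetical, as are the function names and the tag names; the sketch only shows the selection logic of picking tagged versions that apply to the current client.

```python
TAG_SEPARATOR = "@@"  # hypothetical marker; ALLiE's real tag syntax may differ


def split_tag(path):
    """Return (base_path, tag), with tag None for an untagged path."""
    base, sep, tag = path.rpartition(TAG_SEPARATOR)
    if not sep or not tag:
        return path, None
    return base, tag


def files_to_activate(fetched_paths, client_tags):
    """Map each destination path to the tagged file that should replace it on this client."""
    actions = {}
    for path in fetched_paths:
        base, tag = split_tag(path)
        if tag is not None and tag in client_tags:
            actions[base] = path
    return actions
```

For a client carrying the (made-up) tag "gateway", a fetched /etc/hosts.allow@@gateway would be selected to become /etc/hosts.allow, while files tagged for other client classes would be ignored.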

An important added-on feature is the use of conf scripts. These scripts are part of the file trees rsynced from the central server, most likely the overlay tree, and sit in a particular directory of the client, /root/conf/. They are run periodically, perhaps every hour, as well as at boot-time. However, once a particular script has run and exited with a zero return code, it is considered satisfied and will not be re-run in the normal course of events. It therefore acts like an assertion that the client system has performed some action. Uses of such conf scripts are many, but a reasonable example is to install packages which are additional, perhaps conditionally additional, to the original image. When the package has been installed, the script can exit with a zero return code, and the script won't need to be invoked again. If a client is switched off for a while and then switched on, the methodology ensures that it automatically updates to the latest state, without intervention.
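The run-until-satisfied behaviour can be sketched as follows. This is an illustrative Python sketch and not the actual runner: it assumes that scripts are named with a two-digit prefix, that only executable files are run, and that satisfaction is recorded as an empty marker file in a separate directory. All of these, along with the names run_pending and is_executable, are assumptions. A script that needs to recur simply exits non-zero and so never gets a marker.

```python
import os
import stat
import subprocess


def is_executable(path):
    """True if path is a regular file with the owner-execute bit set."""
    mode = os.stat(path).st_mode
    return stat.S_ISREG(mode) and bool(mode & stat.S_IXUSR)


def run_pending(conf_dir, done_dir, run=subprocess.call):
    """Run each unsatisfied, executable, number-prefixed script in conf_dir, in name order.

    A script is treated as satisfied once it has exited with status 0; this
    sketch records that with an empty marker file of the same name in done_dir
    (an assumed mechanism). Returns the names of the scripts run this time.
    """
    os.makedirs(done_dir, exist_ok=True)
    ran = []
    for name in sorted(os.listdir(conf_dir)):
        if not name[:2].isdigit():
            continue  # assumed convention: only number-sequenced scripts are run
        script = os.path.join(conf_dir, name)
        marker = os.path.join(done_dir, name)
        if not is_executable(script) or os.path.exists(marker):
            continue
        ran.append(name)
        if run([script]) == 0:
            open(marker, "w").close()  # satisfied: skip on later passes
    return ran
```

Calling run_pending from cron and at boot would give the behaviour described: a client that has been switched off catches up on its next pass, because every script without a marker is tried again.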

For a case where a client has multiple images, all active at the same time, the overlay step and the conf scripts can run independently within their separate images and so keep each of those images separately up to date.

Example

At the time of writing I have around 70 PCs and interactive servers which require the latest packages on a Fedora 18 system. They also have an older Fedora 15 system, so that users can move between the two at their own pace, and SL4, SL5 and SL6 systems (and an Ubuntu system in one or two cases) for software compatibility with experimental collaborators. Each PC has those systems installed, in their own partitions, and each system separately keeps itself up to date by reference to central server images. When a new system comes along, it can be added on the central server and then installed on each client remotely, without the user even knowing that this is taking place. Users can then start making use of that system, or even reboot to make that system their base system, at a time convenient to themselves, without any down-time.

Important scripts

The following scripts can be downloaded as part of the overlay area:

/root/conf/run-conf: this script invokes number-sequenced scripts in the same directory.

/root/conf/00overlay: this script invokes /root/util/rsync-image to overlay the current image with its overlay file tree. As you can imagine, this script can therefore bring in further conf scripts as well as other files. It keeps a list of files that have been fetched on this occasion (could be empty).

/root/conf/01overact: this script scans the list of files that rsync has just fetched, and acts on any tag classes that match this server.

/root/util/rsync-image: this general script invokes rsync as a client to fetch files from a central server. It can be invoked either to load the initial image of the system, or to fetch an overlay area: the sparse tree of files which logically overlay that initial image. Since rsync in the -a mode contains all the logic to ensure that it will only fetch files that have been added or changed since the last fetch, this is fast and efficient.
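As a rough illustration of how a wrapper might call rsync and keep the list of files fetched on this occasion, here is a Python sketch. It is not the rsync-image script itself. It uses rsync's -a (archive) option and --out-format=%n, which prints the name of each updated item with a trailing "/" on directories. The function names, and the server and module names in the usage note, are hypothetical.

```python
import subprocess


def build_rsync_command(source, destination):
    """Return an rsync argument list that brings destination up to date from source.

    -a is rsync's archive mode; --out-format=%n prints the name of each
    item rsync updates, one per line.
    """
    return ["rsync", "-a", "--out-format=%n", source, destination]


def parse_fetched(rsync_output):
    """Turn --out-format=%n output into a list of file names, skipping directories."""
    return [
        line for line in rsync_output.splitlines()
        if line and not line.endswith("/")
    ]


def fetch(source, destination):
    """Run rsync and return the files it reported as transferred (possibly none)."""
    result = subprocess.run(
        build_rsync_command(source, destination),
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_fetched(result.stdout)
```

A call such as fetch("imageserver::f18-overlay/", "/") (both names invented for this example) would overlay the running image and return the list that a later step could scan. When nothing has changed on the server, the list is empty.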

Testing scripts

A /root/conf script can easily be introduced in test mode by putting it into the overlay image but not marking it as executable. It can then be run selectively on a chosen test PC to see if it works satisfactorily. For example, on the test PC you could run /root/conf/00overlay, followed by /root/conf/01overact (if necessary); this is sufficient to fetch the updated files, which you can then test.
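When several scripts are in flight it can be handy to see at a glance which ones are live and which are still in test mode. The following Python sketch does that by looking at the owner-execute bit; the function name classify_scripts is an assumption, and the sketch is only a convenience for the administrator, not part of ALLiE.

```python
import os
import stat


def classify_scripts(conf_dir):
    """Split the regular files in conf_dir into (active, test_mode) name lists.

    active holds files with the owner-execute bit set; test_mode holds the rest.
    """
    active, test_mode = [], []
    for name in sorted(os.listdir(conf_dir)):
        mode = os.stat(os.path.join(conf_dir, name)).st_mode
        if not stat.S_ISREG(mode):
            continue  # ignore subdirectories and other non-files
        if mode & stat.S_IXUSR:
            active.append(name)
        else:
            test_mode.append(name)
    return active, test_mode
```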

Alarms

Some /root/conf scripts might do non-trivial processing and so ideally are not run often. An example is a script which checks to see if packages in this image are up-to-date. To avoid overhead, such a script can touch a particular file in /root/done/, time-stamping it for a week hence (for example); this acts as its alarm. On each invocation, if the script sees that its alarm time has passed, it runs in full. Otherwise it exits quickly with a non-zero (unsatisfied) return code.
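The alarm mechanism amounts to comparing a file's modification time with the current time. Here is a minimal Python sketch of that check, under the assumption that the alarm is stored as the file's mtime; the function names, the one-week default and the example file name are illustrative only.

```python
import os
import time

WEEK = 7 * 24 * 3600  # seconds


def alarm_passed(alarm_file, now=None):
    """True if the alarm file is missing or its timestamp is not in the future."""
    now = time.time() if now is None else now
    try:
        return os.path.getmtime(alarm_file) <= now
    except FileNotFoundError:
        return True  # never set: treat as due


def set_alarm(alarm_file, delay=WEEK, now=None):
    """Create alarm_file if needed and stamp it with the time now + delay."""
    now = time.time() if now is None else now
    os.makedirs(os.path.dirname(alarm_file) or ".", exist_ok=True)
    with open(alarm_file, "a"):
        pass  # like touch: create without truncating
    os.utime(alarm_file, (now + delay, now + delay))
```

A conf script would call alarm_passed on, say, /root/done/50pkgcheck (a made-up name) and exit non-zero straight away if it returns False; otherwise it would do its work and call set_alarm to schedule the next full run.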
