We’ve packaged all of the free software…what now?
Today, virtually all of the free software available can be found in packaged form in distributions like Debian and Ubuntu. Users of these distributions have access to a library of thousands of applications, ranging from trivial to highly sophisticated software systems. Developers can find a vast array of programming languages, tools and libraries for constructing new applications.
This is possible because we have a mature system for turning free software components into standardized modules (packages). Some software is more difficult to package and maintain, and I’m occasionally surprised to find something very useful which isn’t packaged yet, but in general, the software I want is packaged and ready before I realize I need it. Even the “long tail” of niche software is generally packaged very effectively.
Thanks to coherent standards, sophisticated management tools, and the principles of software freedom, these packages can be mixed and matched to create complete software stacks for a wide range of devices, from netbooks to supercomputing clusters. These stacks are tightly integrated, and can be tested, released, maintained and upgraded as a unit. The Debian system is unparalleled for this purpose, which is why Ubuntu is based on it. The vision, for a free software operating system which is highly modular and customizable, has been achieved.
This is a momentous achievement, and the Debian packaging system fulfills its intended purpose very well. However, there are a number of areas where it introduces friction, because the package model doesn’t quite fit some new problems. Most of these are becoming more common over time as technology evolves and changes shape.
- Embedded systems need to be pared down to the essentials to minimize storage, distribution, computation and maintenance costs. Standardized packaging introduces excessive code, data and interdependency which make the system larger than necessary. Tight integration makes it difficult to bootstrap the system from scratch for custom hardware. Projects like Embedded Debian aim to adapt the Debian system to be more suitable for use in these environments, to varying degrees of success. Meanwhile, smart phones will soon become the most common type of computer globally.
- Data, in contrast to software, has simple requirements. It just needs to be up to date and accessible to programs. Packaging and distributing it through the standardized packaging process is awkward, doesn’t offer tangible benefits, and introduces overhead. There have been extensive debates in Debian about how to handle large data sets. Meanwhile, this problem is becoming increasingly important as data science catalyzes a new wave of applications.
- Client/server and other types of distributed applications are notoriously tricky to package. The packaging system works within the context of a single OS instance, and so relationships which span multiple OS instances (e.g. a server application which depends on a database running on another server) are not straightforward. Meanwhile, the web has become a first-class application development platform, and this kind of interdependency is extremely common on both clients and servers.
- Cross-platform applications such as Firefox, Chromium and OpenOffice.org have long struggled with packaging. In order to be portable, they tend to bundle the components they depend on, such as libraries. Packagers strive for normalization, and want these applications to use the packaged versions of these libraries instead. Application developers build, test and ship one set of dependencies, but their users receive a different stack when they use the packaged version of the application. Developers on both sides are in constant tension as they expect their configuration to be the canonical one, and want it to be tightly integrated. Cross-platform application developers want to provide their own, application-specific cross-platform update mechanism, while distributions want to use the same mechanism for all their components.
- Virtual appliances aim to combine application and operating system into a portable bundle. While a modular OS is definitely called for, appliances face some of the same problems as embedded systems as they need to be minimized. Furthermore, the appliance becomes a component in itself, and requires metadata, distribution mechanisms and so on. If someone wants to “install” a virtual appliance, how should that work? Packaging them up as .debs doesn’t make much sense for the same reasons that apply to large data sets. I haven’t seen virtual appliances really taking off, but I expect cloud to change that.
- Runtime libraries for languages such as Perl, Python and Ruby provide their own packaging systems, which manage dependencies and other metadata, installation, upgrades and removal in a standardized way. Because these operate independently of the OS package manager, all sorts of problems arise. Projects such as GoboLinux have attempted to tie them together, to varying degrees of success. Meanwhile, each new programming language we invent comes with a different, incompatible package manager, and distribution developers need to spend time repackaging them into their preferred format.
Why are we stuck?
I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
– Abraham Maslow
The packaging ecosystem is very strong. Not only do we have powerful tools for working with packages, we also benefit from packages being a well-understood concept, and having established processes for developing, exchanging and talking about them. Once something is packaged, we know what it is and how to work with it, and it “fits” into everything else. So, it is tempting to package everything in sight, as we already know how to make sense of packages. However, this may not always be the right tool for the job.
Various attempts have been made to extend the packaging concept to make it more general, for example:
- Portage, of Gentoo fame, offers impressive flexibility by building packages with a custom configuration, tailored for the needs of the target system.
- Conary, from rPath, offers finer-grained dependencies, powerful revision control and object-oriented build recipes.
- Nix provides a consistent build and runtime environment, ensuring that programs are run with the same dependencies used to build them, by keeping the relevant versions installed. I don’t know much about it, but it sounds like all dependencies implicitly refer to an exact version.
Other package managers aim to solve a specific problem, such as providing lightweight package management for embedded systems, or lazy dependency installation, or fixing the filesystem hierarchy. There is a long list of package managers of various levels which solve different problems.
Most of these systems suffer from an important fundamental tradeoff: they are designed to manage the entire system, from the kernel through applications, and so they must be used wholesale in order to reap their full benefit. In other words, in their world, everything is a package, and anything which is not a package is out of scope. Therefore, each of these systems requires a separate collection of packages, and each time we invent a new one, its adherents set about packaging everything in the new format. It takes a very long time to do this, and most of them lose momentum before a mature ecosystem can form around them.
This lock-in effect makes it difficult for new packaging technologies to succeed.
Divide and Conquer
No single package management framework is flexible enough to accommodate all of the needs we have today. Even more importantly, a generic solution won’t account for the needs we will have tomorrow. I propose that in order to move forward, we must make it possible to solve packaging problems separately, rather than attempting to solve them all within a single system.
- Decouple applications from the platform. Debian packaging is an excellent solution for managing the network of highly interdependent components which make up the core of a modern Linux distribution. It falls short, however, for managing the needs of modern applications: fast-moving, cross-platform and client/server (especially web). Let’s stop trying to fit these square pegs into round holes, and adopt a different solution for this space, preferably one which is comprehensible and useful to application developers so that they can do most of the work.
- Treat data as a service. It’s no longer useful to package up documentation in order to provide local copies of it on every Linux system. The web is a much, much richer and more effective solution to that problem. The same principle is increasingly applicable to structured data. From documents and contacts to anti-virus signatures and PCI IDs, there’s much better data to be had “out there” on the web than “down here” on the local filesystem.
- Simplify integration between packaging systems in order to enable a heterogeneous model. When we break the assumption that everything is a package, we will need new tools to manage the interfaces between different types of components. Applications will need to introspect their dependency chain, and system management tools will need to be able to interrogate applications. We’ll need thoughtfully designed interfaces which provide an appropriate level of abstraction while offering sufficient flexibility to solve many different packaging problems. There is unarguably a cost to this heterogeneity, but I believe it would easily outweigh the shortcomings of our current model.
But I like things how they are!
We don’t have a choice. The world is changing around us, and distributions need to evolve with it. If we don’t adapt, we will eventually give way to systems which do solve these problems.
Take, for example, modern web browsers like Firefox and Chromium. Arguably the most vital application for users, the browser is coming under increasing pressure to keep up with the breakneck pace of innovation on the web. The next wave of real-time collaboration and multimedia applications relies on the rapid development of new capabilities in web browsers. Browser makers are responding by accelerating deployment in the field: both aggressively push new releases to their users. A report from Google found that Chrome upgrades 97% of their users within 21 days of a new release, and Firefox 85% (both impressive numbers). Mozilla recently changed their maintenance policies, discontinuing maintenance of stable releases and forcing Ubuntu to ship new upstream releases to users.
These applications are just the leading edge of the curve, and the pressure will only increase. Equally powerful trends are pressing server applications, embedded systems, and data to adapt as well. The ideas I’ve presented here are only one possible way forward, and I’m sure there are more and better ideas brewing in distribution communities. I’m sure that I’m not the only one thinking about these problems.
Whatever it looks like in the end, I have no doubt that change is ahead.