Posted on December 27, 2015 by Gabe Parmer
Outside of research communities, it is not very common to commit serious time to writing a new operating system. There are many good reasons for this including the fact that existing OSes seem to do a pretty good job, and the momentum behind existing systems makes adoption of something new unrealistic. It is easy to look at Linux and think that it solves most of the problems we really care about. It has proven to be an amazingly adaptive system and powers everything from cell phones up to huge servers.
However, within the halls of research, the goal is not market share but intellectual discovery. Popular OSes tend to have designs that complicate, if not preclude, a number of desirable properties, including predictable parallelism, system-level fault tolerance, adaptable system structure, functional correctness, and effective use of heterogeneous hardware.
Each of these is debatable, and requires a long list of caveats to be accurate. Quite a bit of great research attempts to solve these problems from the other direction by increasing the viability of existing systems along each of these dimensions. Regardless, new systems enable us to understand the core of each of these issues, rather than have it occluded by the software complexity of adapting feature-rich, existing systems.
Composite does not focus on all of these dimensions, as other systems are better positioned to investigate the underlying problems. For example, seL4 is best positioned for functional correctness, and Barrelfish is better positioned for heterogeneous hardware. Composite has differentiated itself in the publication record in the following main categories.
Predictable parallelism. Qi Wang has done great work in the design and implementation of system support for predictable parallelism. The focus has been on scaling to many cores and sockets, while maintaining low latency bounds. The intention is to open up the use of large-scale parallelism in systems that require timing guarantees such as embedded and real-time systems, while also providing lower latency for systems that already use parallelism. We’ve investigated programming model runtimes (for example, in FJOS, an OpenMP runtime), and scalable kernel construction in Speck.
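To ground the programming-model side, the snippet below is the kind of standard fork-join OpenMP loop that a runtime like FJOS is built to execute with predictable latency across cores. It is plain OpenMP C, not Composite-specific, and the function is purely illustrative.

```c
/* A standard fork-join OpenMP loop: the kind of data-parallel,
 * deadline-sensitive workload that an OpenMP runtime such as FJOS
 * targets.  Compile with -fopenmp; without it, the pragma is ignored
 * and the loop runs sequentially. */
void
scale_vector(double *v, int n, double factor)
{
	int i;

	#pragma omp parallel for schedule(static)
	for (i = 0; i < n; i++) {
		v[i] *= factor;
	}
}
```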
System-level fault tolerance. Jiguo Song has researched the Computational Crash Cart (C3), which provides efficient, latency-bounded recovery from system-level faults. It uses a unique combination of micro-rebooting portions of the system and an interface-driven approach to re-constituting the failed component’s state. This recovery process is integrated into a timing analysis, enabling system deadlines to be met regardless of faults.
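To give a rough feel for the interface-driven idea (purely as a sketch: the lock functions and tracking table below are hypothetical, not the actual C3 interfaces), a client-side stub can record just enough interface-level history to rebuild a component’s state after a micro-reboot:

```c
#include <stdio.h>

#define MAX_LOCKS 64

/* Hypothetical invocations into a lock component; stubbed here so the
 * sketch is self-contained. */
static int  lock_component_alloc(void)     { static int next = 0; return next++; }
static void lock_component_rebuild(int id) { printf("rebuilding lock %d\n", id); }

static int tracked_locks[MAX_LOCKS];
static int n_tracked = 0;

/* Client-facing stub: record each allocation made through the interface. */
int
lock_alloc(void)
{
	int id = lock_component_alloc();

	if (id >= 0 && n_tracked < MAX_LOCKS) tracked_locks[n_tracked++] = id;
	return id;
}

/* Invoked after the lock component is micro-rebooted: replay the
 * interface-level history so clients observe no lost state. */
void
lock_recover(void)
{
	int i;

	for (i = 0; i < n_tracked; i++) lock_component_rebuild(tracked_locks[i]);
}
```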
Adaptable system structure. Composite’s design enables communication between components, and the isolation boundaries throughout the system, to take many different forms, often with the ability to change at runtime. For example, Mutable Protection Domains (MPD) enable the hardware protection boundaries between separate user-level software components to be erected and torn down, trading between performance and fault isolation. An emphasis is placed on implementing all system policies as user-level components, so this adaptable structure reaches even some of the lowest-level software in the system.
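As an illustration of the kind of runtime adaptation MPD enables (the names below are hypothetical, not the literal Composite API), a user-level policy might collapse the boundary on a hot invocation path and re-erect it when isolation matters more than speed:

```c
#include <stdio.h>

typedef int compid_t;

/* Hypothetical MPD operations, stubbed so the sketch is self-contained.
 * Merging collapses the hardware boundary between two components so
 * invocations become cheap function calls; splitting re-erects it. */
static int mpd_merge(compid_t a, compid_t b) { printf("merge %d,%d\n", a, b); return 0; }
static int mpd_split(compid_t a, compid_t b) { printf("split %d,%d\n", a, b); return 0; }

/* Example user-level policy: trade isolation for performance on a hot
 * invocation path, and restore isolation when the path cools down. */
void
tune_boundary(compid_t client, compid_t server, unsigned long invocations_per_sec)
{
	if (invocations_per_sec > 100000UL) mpd_merge(client, server);
	else                                mpd_split(client, server);
}
```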
Additionally, Composite is being adapted to handle some of the other areas discussed above.
What makes Composite a good platform for investigating some of these issues? A number of guiding principles lay the necessary foundation.
Kernel minimality. We share the motivation of many modern micro-kernels (introduced in Liedtke’s seminal paper) to remove policy from the kernel so that it can be customized at user level. We take this a step further than most modern micro-kernels by removing scheduling and capability management from the kernel.
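A minimal sketch of what this implies, assuming a hypothetical thread-dispatch primitive and thread structure (not the literal Composite kernel interface): the kernel only switches threads, while the scheduling policy itself is ordinary user-level code in a scheduler component.

```c
#include <stdio.h>

/* Scheduling policy state lives entirely at user level. */
struct thd {
	int capability;  /* kernel capability used to switch to this thread   */
	int priority;    /* lower value = higher priority; pure policy state  */
	int runnable;
};

/* Hypothetical kernel dispatch primitive, stubbed for the sketch. */
static int thd_dispatch(int thd_capability) { printf("dispatch %d\n", thd_capability); return 0; }

/* User-level policy: pick the highest-priority runnable thread. */
void
schedule(struct thd *thds, int n)
{
	struct thd *best = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (!thds[i].runnable) continue;
		if (!best || thds[i].priority < best->priority) best = &thds[i];
	}
	if (best) thd_dispatch(best->capability);
}
```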
End-to-end bounded latency. All relevant code paths in Composite must have a bounded latency between the reception of I/O and its transmission. This requires not only that the latency of each constituent operation in the system is bounded (i.e. predictable), but also that their composition is bounded (though, of course, with a higher bound). Domains that care about tail latency, such as the cloud, don’t require such strict latency bounds, but these bounds aid in controlling latency nonetheless.
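As a toy illustration of how bounds compose (the stages and numbers are invented for the example), the end-to-end bound on an I/O path is conservatively the sum of the worst-case latencies of its constituent operations:

```c
#define N_STAGES 4

/* Worst case, in microseconds, of: interrupt handling, IPC into the
 * processing component, the processing itself, and IPC out to transmit. */
static const unsigned int wcet_us[N_STAGES] = { 5, 2, 40, 2 };

unsigned int
end_to_end_bound_us(void)
{
	unsigned int bound = 0;
	int i;

	for (i = 0; i < N_STAGES; i++) bound += wcet_us[i];
	return bound; /* 49us: higher than any single stage, but still bounded */
}
```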
Fine-grained components. System software is complicated. Composite prefers that different policies and abstractions be implemented as separate, share-nothing components, each with a contractually specified interface of functions through which to harness their functionality. This enables the composition of system software from components that are customized for the system’s goals. This is similar in motivation to unikernels, but software is customized at a finer granularity and can maintain inter-component isolation. This principle effectively subsumes related principles such as the separation of concerns and the separation of mechanism and policy. When paired with controlled inter-component interactions (via a capability system), it provides a version of the principle of least privilege. Even more so than the first principle, this implies that IPC must be lightning fast.
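A sketch of what such a contract looks like, using a hypothetical event interface (illustrative names only, not an actual Composite interface): a small header of functions that clients are compiled against, independent of which component implements it.

```c
/* evt.h: a sketch of a contractually specified interface.  A component
 * exports this small set of functions, and clients depend only on the
 * contract, so any component implementing it can be slotted in. */
typedef int evtid_t;

evtid_t evt_create(void);          /* allocate an event endpoint             */
int     evt_wait(evtid_t id);      /* block the caller until the event fires */
int     evt_trigger(evtid_t id);   /* wake any thread waiting on the event   */
int     evt_free(evtid_t id);      /* release the endpoint                   */
```

Two quite different components (say, one tuned for single-core latency, another for multicore scalability) could both implement this contract without the rest of the system changing.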
Effective polymorphism. A fine-grained decomposition of system functionality is not useful if each component can only be used in a very specific context (i.e. with a very specific set of other components). This would prevent the practical composition of systems that designers didn’t foresee. Components export a relatively small set of interfaces (thus, functions) and rely on structural subtyping for practical polymorphism. This enables a large number of components to be used in a specific “location” in a system, thus greatly altering the system behavior by simply plugging in a different policy. This should remind you of UNIX pipes, but with more efficient interfaces than text streams, and more efficient IPC than pipes. For many data-movement components, the generic interface is similar to Plan 9’s 9P, but spans even low-level system components.
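For a flavor of what such a generic interface might look like (the names are illustrative, not the literal Composite interface), consider a small data-movement contract that a network stack, a buffer cache, or a logger could each export:

```c
/* A small, generic data-movement interface in the spirit of UNIX pipes
 * or Plan 9's 9P: many different components can export these same
 * functions, so any of them can be plugged into the same "location" in
 * a composed system. */
typedef int chan_t;   /* descriptor for an open channel/resource */

chan_t chan_open(const char *path);                     /* open by hierarchical name    */
int    chan_read(chan_t c, char *buf, int len);         /* pull data from the component */
int    chan_write(chan_t c, const char *buf, int len);  /* push data into the component */
int    chan_close(chan_t c);                            /* release the channel          */
```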
It is inevitable that our society will become more dependent on software to manage most aspects of our lives. Given this, we can’t think about our software in the traditional way. Failure will have a physical impact; in extreme cases, it has the potential for physical harm. Our personal interactions with this software will be limited, so both updating and software maintenance will have to be automated. These systems are already network connected, so remote attacks can manipulate the physical environment (this has already caused damage in a steel mill). Search YouTube for “building hack”, and be terrified about our autonomous future. Computational latency spikes in complex systems impact a physical system’s dynamics. Software development must change to accommodate these concerns.
Embedded systems have been dealing with these problems for a long time. But the scale of these systems and the number of developers involved will be unprecedented, and the quality of the software development process will be comparatively inferior (an economic necessity, as current processes for software certification are a significant burden).
In short, the next twenty years of software development will see significant changes. I think it is reasonable to look at how systems are put together to better accommodate this future. A revolutionary investigation will likely yield significant insight into what systems and software can provide. An evolution from existing systems might suitably accommodate this future. Even in this case, this evolution will likely be guided by the clean-slate approach.
All Composite code is public.
Why the name “Composite”? This follows from the third and fourth principles of the system. We compose fully featured systems from a set of constituent components. Thus, an executable system is, quite literally, a composite of atomic components.