Gallifrey: a New Language for Distributed Programming

Stop me if you’ve heard this one before.

You’re working on your phone or tablet, editing an online document, and you get on a plane. At first everything is fine; your edits are cached locally, and most of the application is local to your phone anyway. But then something happens; the page refreshes, or your device auto-connects to the dodgy in-flight wifi, and your previously-functioning application suddenly begins to … not.

Distributed programs need to accept disconnection.

This common experience is a demonstration of an equally common flaw in modern client-facing distributed applications. Trouble is, it’s fundamentally hard to write a wide-area, heavily replicated distributed system—leading many applications to opt for a weaker model, like an “interactive web page” where data can be “cached at the client” for quick responsiveness. But this model fundamentally isn’t reality. No matter what our frameworks tell us, distributed programs can’t use the same assumptions and metaphors as local concurrency or traditional websites. Modern distributed frameworks must accept that caches are replicas, that replicas are fundamental to good performance, and that replicas must tolerate periods of disconnection elegantly. It is key that a distributed program should never implicitly block, operating off of local replicas whenever possible.

This makes consistency hard.

Because the moment you embrace the fact that your cached data is actually a first-class replica of shared global state, you must ask yourself how stale you can tolerate this data becoming, and what guarantees you need on the visibility of updates you make to this data. Existing replicated systems that handle disconnection well do quite badly at this, giving rise to bizarre errors on re-connection including the ever-present threat of losing the last few hours—or even days—of work. Correct applications must provide strong consistency by default, while ensuring any deviation from sequential consistency is locally justified and isolated from sensitive parts of the application.

These features need language support.

There are a few (very few) applications today which can offer transparent replication without sacrificing consistency. But these applications are invariably hard to build, and prone to shipping subtle consistency bugs in production. The properties that we’re proposing—non-blocking replication which tolerates disconnection while ensuring strong consistency by default—are global properties of a distributed application. They can’t be enforced by a library alone, and they’re difficult enough that relying on “programmer discipline” has, repeatedly, been shown to be insufficient. These features can only be enforced at the language level—which is where we come in.

So we’re building a new language.

We’ve recently begun work on Gallifrey, a new programming language designed to address exactly these issues. Gallifrey enables per-task programming, where each thread can share any object with the world without sacrificing consistency. Gallifrey is based on Java with a minimum of changes, making it easy to pick up and intuitive to use.

Gallifrey’s core feature is enabled via author-provided Restrictions on objects’ interfaces. When you want to share an object in Gallifrey, you must define a Restriction of its interface, limiting the available operations to only those which are safe to call concurrently and during disconnection. This Restriction, in effect, transforms your normal local object into a CRDT, a class of object which has a rich and well-studied semantics for guaranteeing convergence even in the face of weakly-consistent updates.

To keep the world of Restricted objects from consuming entire local programs, Gallifrey also employs a hybrid affine ownership type system which statically guarantees that an object (and everything it points to), when shared under a Restriction, can only be accessed via that Restriction. We’ve managed to take the best of affine typing (from languages like Rust) and apply it to Gallifrey, while still allowing a majority of Gallifrey code to enjoy the [almost] unrestricted aliasing and pervasive mutability common to Java programs.

Finally we introduce branches, a new take on transactions inspired by the eponymous concept from version control. Opening a branch allows a program to create a new scope which is totally isolated from the rest of the world, in effect voluntarily disconnecting a portion of the program from the wider Gallifrey network. While within a branch, programmers can perform provisional operations which don’t guarantee sequential consistency, without worrying that these operations will threaten the consistency of the wider application.

More details can be found in the Gallifrey SNAPL paper.

Avatar
Mae Milano
PhD, Programming Languages and Systems

Posts

My dissertation is now available here

CONTINUE READING

Publications

Despite decades of research and practical experience, developers have few tools for programming reliable distributed applications without resorting to expensive coordination techniques. Conflict-free replicated datatypes (CRDTs) are a promising line of work that enable coordination-free replication and offer certain eventual consistency guarantees in a relatively simple object-oriented API. Yet CRDT guarantees extend only to data updates; observations of CRDT state are unconstrained and unsafe. We propose an agenda that embraces the simplicity of CRDTs, but provides richer, more uniform guarantees. We extend CRDTs with a query model that reasons about which queries are safe without coordination by applying monotonicity results from the CALM Theorem, and lay out a larger agenda for developing CRDT data stores that let developers safely and efficiently interact with replicated application state.
Proceedings of the VLDB endowment, Vol. 16 No. 4, 2023

Conflict-free replicated data types (CRDTs) are a promising tool for designing scalable, coordination-free distributed systems. However, constructing correct CRDTs is difficult, posing a challenge for even seasoned developers. As a result, CRDT development is still largely the domain of academics, with new designs often awaiting peer review and a manual proof of correctness. In this paper, we present Katara, a program synthesis-based system that takes sequential data type implementations and automatically synthesizes verified CRDT designs from them. Key to this process is a new formal definition of CRDT correctness that combines a reference sequential type with a lightweight ordering constraint that resolves conflicts between noncommutative operations. Our process follows the tradition of work in verified lifting, including an encoding of correctness into SMT logic using synthesized inductive invariants and hand-crafted grammars for the CRDT state and runtime. Katara is able to automatically synthesize CRDTs for a wide variety of scenarios, from reproducing classic CRDTs to synthesizing novel designs based on specifications in existing literature. Crucially, our synthesized CRDTs are fully, automatically verified, eliminating entire classes of common errors and reducing the process of producing a new CRDT from a painstaking paper proof of correctness to a lightweight specification.
Proceedings of the ACM in Programming Languages, Vol. 6, No. OOPSLA2, Article 173, 2022

This paper proposes a new type system for concurrent programs, allowing threads to exchange complex object graphs without risking destructive data races. While this goal is shared by a rich history of past work, existing solutions either rely on strictly enforced heap invariants that prohibit natural programming patterns or demand pervasive annotations even for simple programming tasks. As a result, past systems cannot express intuitively simple code without unnatural rewrites or substantial annotation burdens. Our work avoids these pitfalls through a novel type system that provides sound reasoning about separation in the heap while remaining flexible enough to support a wide range of desirable heap manipulations. This new sweet spot is attained by enforcing a heap domination invariant similarly to prior work, but tempering it by allowing complex exceptions that add little annotation burden. Our results include: (1) code examples showing that common data structure manipulations which are difficult or impossible to express in prior work are natural and direct in our system, (2) a formal proof of correctness demonstrating that well-typed programs cannot encounter destructive data races at run time, and (3) an efficient type checker implemented in Gallina and OCaml.
PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2022

Programming efficient distributed, concurrent systems requires new abstractions that go beyond traditional sequential programming. But programmers already have trouble getting sequential code right, so simplicity is essential. The core problem is that low-latency, high-availability access to data requires replication of mutable state. Keeping replicas fully consistent is expensive, so the question is how to expose asynchronously replicated objects to programmers in a way that allows them to reason simply about their code. We propose an answer to this question in our ongoing work designing a new language, Gallifrey, which provides orthogonal replication through restrictions with merge strategies, contingencies for conflicts arising from concurrency, and branches, a novel concurrency control construct inspired by version control, to contain provisional behavior.
3rd Summit on Advances in Programming Languages (SNAPL 2019), 2019

Talks

Presenting a new type system for Fearless Concurrency

Presenting a new type system for Fearless Concurrency

Presenting Gallifrey, a new Java-like language for wide-area distributed programming

Presenting Gallifrey, a new Java-like language for wide-area distributed programming
Next
Previous