Gallifrey: a New Language for Distributed Programming
Last updated on
Stop me if you’ve heard this one before.
You’re working on your phone or tablet, editing an online document,
and you get on a plane. At first everything is fine; your edits are
cached locally, and most of the application is local to your phone
anyway. But then something happens; the page refreshes, or your
device auto-connects to the dodgy in-flight wifi, and your
previously-functioning application suddenly begins to … not.
Distributed programs need to accept disconnection.
This common experience is a demonstration of an equally common flaw in
modern client-facing distributed applications. Trouble is, it’s
fundamentally hard to write a wide-area, heavily replicated
distributed system—leading many applications to opt for a weaker
model, like an “interactive web page” where data can be “cached at the
client” for quick responsiveness. But this model fundamentally
isn’t reality. No matter what our frameworks tell us, distributed
programs can’t use the same assumptions and metaphors as local
concurrency or traditional websites. Modern distributed frameworks
must accept that caches are replicas, that replicas are fundamental to
good performance, and that replicas must tolerate periods of
disconnection elegantly. It is key that a distributed program should
never implicitly block, operating off of local replicas
whenever possible.
This makes consistency hard.
Because the moment you embrace the fact that your cached data is
actually a first-class replica of shared global state, you must ask
yourself how stale you can tolerate this data becoming, and what
guarantees you need on the visibility of updates you make to this
data. Existing replicated systems that handle disconnection well do
quitebadlyatthis, giving rise to bizarre errors on
re-connection including the ever-present threat of losing the last few
hours—or even days—of work. Correct applications must provide
strong consistency by default, while ensuring any deviation from
sequential consistency is locally justified and isolated from
sensitive parts of the application.
These features need language support.
There are a few (veryfew) applications
today which can offer transparent replication without sacrificing
consistency. But these applications are invariably hard to build, and
prone to shipping subtle consistency bugs
in production. The properties that we’re proposing—non-blocking
replication which tolerates disconnection while ensuring
strong consistency by default—are global properties of a
distributed application. They can’t be enforced by a library alone,
and they’re difficult enough that relying on “programmer discipline”
has, repeatedly, been shown to be insufficient. These features can
only be enforced at the language level—which is where we come in.
So we’re building a new language.
We’ve recently begun work on
Gallifrey,
a new programming language designed to address exactly these issues.
Gallifrey enables per-task programming, where each thread can share
any object with the world without sacrificing consistency.
Gallifrey is based on Java with a minimum of changes, making it easy
to pick up and intuitive to use.
Gallifrey’s core feature is enabled via author-provided
Restrictions on objects’ interfaces. When you want to share an
object in Gallifrey, you must define a Restriction of its interface,
limiting the available operations to only those which are safe to call
concurrently and during disconnection. This Restriction, in effect,
transforms your normal local object into a
CRDT,
a class of object which has a rich and well-studied semantics for
guaranteeing convergence even in the face of weakly-consistent
updates.
To keep the world of Restricted objects from consuming entire local
programs, Gallifrey also employs a hybrid affine ownership type
system which statically guarantees that an object (and everything it
points to), when shared under a Restriction, can only be accessed via
that Restriction. We’ve managed to take the best of affine typing
(from languages like Rust) and apply it
to Gallifrey, while still allowing a majority of Gallifrey code to
enjoy the [almost] unrestricted aliasing and pervasive mutability
common to Java programs.
Finally we introduce branches, a new take on transactions inspired
by the eponymous concept from version control. Opening a branch
allows a program to create a new scope which is totally isolated from
the rest of the world, in effect voluntarily disconnecting a portion
of the program from the wider Gallifrey network. While within a
branch, programmers can perform provisional operations which
don’t guarantee sequential consistency, without worrying that these
operations will threaten the consistency of the wider application.
Despite decades of research and practical experience, developers have few tools for programming reliable distributed applications without resorting to expensive coordination techniques. Conflict-free replicated datatypes (CRDTs) are a promising line of work that enable coordination-free replication and offer certain eventual consistency guarantees in a relatively simple object-oriented API. Yet CRDT guarantees extend only to data updates; observations of CRDT state are unconstrained and unsafe. We propose an agenda that embraces the simplicity of CRDTs, but provides richer, more uniform guarantees. We extend CRDTs with a query model that reasons about which queries are safe without coordination by applying monotonicity results from the CALM Theorem, and lay out a larger agenda for developing CRDT data stores that let developers safely and efficiently interact with replicated application state.
Shadaj Laddad, Conor Power, Mae Milano, Alvin Cheung, Joseph M. Hellerstein
Proceedings of the VLDB endowment, Vol. 16 No. 4,
2023
Conflict-free replicated data types (CRDTs) are a promising tool for designing scalable, coordination-free distributed systems. However, constructing correct CRDTs is difficult, posing a challenge for even seasoned developers. As a result, CRDT development is still largely the domain of academics, with new designs often awaiting peer review and a manual proof of correctness. In this paper, we present Katara, a program synthesis-based system that takes sequential data type implementations and automatically synthesizes verified CRDT designs from them. Key to this process is a new formal definition of CRDT correctness that combines a reference sequential type with a lightweight ordering constraint that resolves conflicts between noncommutative operations. Our process follows the tradition of work in verified lifting, including an encoding of correctness into SMT logic using synthesized inductive invariants and hand-crafted grammars for the CRDT state and runtime. Katara is able to automatically synthesize CRDTs for a wide variety of scenarios, from reproducing classic CRDTs to synthesizing novel designs based on specifications in existing literature. Crucially, our synthesized CRDTs are fully, automatically verified, eliminating entire classes of common errors and reducing the process of producing a new CRDT from a painstaking paper proof of correctness to a lightweight specification.
Shadaj Laddad, Conor Power, Mae Milano, Alvin Cheung, Joseph M. Hellerstein
Proceedings of the ACM in Programming Languages, Vol. 6, No. OOPSLA2, Article 173,
2022
This paper proposes a new type system for concurrent programs, allowing threads to exchange complex object graphs without risking destructive data races. While this goal is shared by a rich history of past work, existing solutions either rely on strictly enforced heap invariants that prohibit natural programming patterns or demand pervasive annotations even for simple programming tasks. As a result, past systems cannot express intuitively simple code without unnatural rewrites or substantial annotation burdens. Our work avoids these pitfalls through a novel type system that provides sound reasoning about separation in the heap while remaining flexible enough to support a wide range of desirable heap manipulations. This new sweet spot is attained by enforcing a heap domination invariant similarly to prior work, but tempering it by allowing complex exceptions that add little annotation burden. Our results include: (1) code examples showing that common data structure manipulations which are difficult or impossible to express in prior work are natural and direct in our system, (2) a formal proof of correctness demonstrating that well-typed programs cannot encounter destructive data races at run time, and (3) an efficient type checker implemented in Gallina and OCaml.
Mae Milano, Joshua Turcotti, Andrew Myers
PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation,
2022
Programming efficient distributed, concurrent systems requires new abstractions that go beyond traditional sequential programming. But programmers already have trouble getting sequential code right, so simplicity is essential. The core problem is that low-latency, high-availability access to data requires replication of mutable state. Keeping replicas fully consistent is expensive, so the question is how to expose asynchronously replicated objects to programmers in a way that allows them to reason simply about their code. We propose an answer to this question in our ongoing work designing a new language, Gallifrey, which provides orthogonal replication through restrictions with merge strategies, contingencies for conflicts arising from concurrency, and branches, a novel concurrency control construct inspired by version control, to contain provisional behavior.
Mae Milano, Rolph Recto, Tom Magrino, Andrew Myers
3rd Summit on Advances in Programming Languages (SNAPL 2019),
2019