Verifying State & Reconciliation in Collaborative Web Apps

Each post in this series pushes verification into harder territory. Post 1 introduced a replay kernel for single-user state with undo/redo — proving that every state in history satisfies the domain invariant. Post 2 walked through building a real app with that kernel.

This post tackles how (and why) we built, from the ground up, a formally verified system that handles multi-user collaboration in a web app. We will cover, at a high level, the architecture and infrastructure used to achieve true state verification and the many fun complexities we came across — client state orchestration, data atomicity, serialization, and more.

There are many ways to approach the system’s design and implementation. We took one approach to demonstrate feasibility. This post will walk through the general idea behind verified multi-user collaborative systems rather than the specific implementation. The code is open and the app is live for anyone to inspect in more detail. As always, we welcome feedback and contributions.


Architecture

The Traditional Way: Two Layers of Correctness

Collaborative state management traditionally involves two distinct problems that are solved at different layers.

  • Convergence: When multiple users make concurrent edits, all replicas should eventually agree on the same state. There are different ways to achieve convergence (operational transforms, CRDTs, a server-authoritative model), but the main job at this level is to ensure all users eventually see the same data, regardless of the order in which operations arrive.

  • Domain correctness: When the state converges, does it satisfy the application’s domain-specific business rules? A task can’t exist in two lists simultaneously. A project can’t have negative capacity. Removed users cannot be tagged.

Our Way: A Gated System

Most systems handle these as separate layers: converge first, validate after. We kept them as separate concerns (the kernel handles convergence, the domain defines validity) but wired them together so reconciliation attempts are validated atomically. No path through the kernel can produce an invalid state.

[Architecture diagram: a client pairing a React UI (captures intent, renders state) with the Effect State Machine (optimistic updates, queuing, retries), connected to Supabase infrastructure: Auth/RLS, an Edge Function orchestrating reconciliation (rebase, candidates, cross-project atomicity), PostgreSQL (state, versions, log), and Realtime broadcast. Verified and unverified components are marked.]

Data Flow

When a user performs an action in the UI (e.g., creates a task), the React UI captures that intent and passes it to the Effect State Machine. The user sees the change immediately because the state machine applies an optimistic update while queuing the action for the server. In the background, the action is dispatched to a Supabase Edge Function, which runs the verified reconciliation logic (accepting, rebasing if the user’s view was stale, trying fallback interpretations, or rejecting). If accepted, the new state is written to PostgreSQL and broadcast via Realtime to all connected clients, keeping everyone in sync.
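
To make that flow concrete, here is a minimal TypeScript sketch of the optimistic dispatch loop. Every name in it (applyOptimistic, dispatchToServer, the endpoint path, the response shape) is illustrative, not the app’s actual API:

// Sketch of the client-side data flow: show the change immediately,
// dispatch to the server, and let the verified result win.
type Model = { tasks: string[]; version: number };
type Action = { kind: "AddTask"; title: string };

type ServerResult =
  | { status: "accepted"; model: Model }
  | { status: "rejected"; reason: string };

// Apply the action locally so the UI can update immediately.
function applyOptimistic(model: Model, action: Action): Model {
  return { ...model, tasks: [...model.tasks, action.title] };
}

// Stand-in for the POST to the Supabase Edge Function.
async function dispatchToServer(action: Action, baseVersion: number): Promise<ServerResult> {
  const res = await fetch("/functions/v1/dispatch", { // hypothetical endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ action, baseVersion }),
  });
  return res.json();
}

async function perform(
  model: Model,
  action: Action,
  render: (m: Model) => void
): Promise<Model> {
  render(applyOptimistic(model, action)); // 1. the user sees the change now
  const result = await dispatchToServer(action, model.version); // 2. the server verifies
  // 3. the server's reconciled state wins; a rejection rolls back to the old model
  const next = result.status === "accepted" ? result.model : model;
  render(next);
  return next;
}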

For a more technical and applied architectural overview of this system, see ARCHITECTURE.md.

The Kernels

Our system uses three verified kernels:

  • Multi-Collaboration Kernel: server-side reconciliation. Rebases stale actions, tries fallback interpretations when needed, checks domain rules. Key guarantee: every accepted action preserves domain invariants.

  • Multi-Project Kernel: cross-project atomicity. When a task moves between projects, both must update together or neither does. Key guarantee: no operation leaves data inconsistent across projects.

  • Effect State Machine: client-side orchestration. Queues actions, handles retries, processes server responses. Key guarantees: pending actions are never lost, retries are bounded, the system makes progress when online.
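
To give a feel for what those client-side guarantees mean operationally, here is a small sketch of a bounded-retry flush loop. The names and the backoff policy are assumptions, not the kernel’s code; the real state machine is written and proven in Dafny:

// Sketch: a queue flushed with bounded retries. An action is removed only
// after the server has answered, so pending work is never silently dropped.
const MAX_RETRIES = 5;

// `send` resolves to true once the server has responded (accepted or rejected).
async function flushQueue(
  queue: { kind: string }[],
  send: (a: { kind: string }) => Promise<boolean>
): Promise<void> {
  while (queue.length > 0) {
    const action = queue[0];
    let answered = false;
    for (let attempt = 0; attempt < MAX_RETRIES && !answered; attempt++) {
      try {
        answered = await send(action);
      } catch {
        // Network failure: back off and retry, bounded by MAX_RETRIES.
        await new Promise((r) => setTimeout(r, 2 ** attempt * 100));
      }
    }
    if (!answered) return; // still offline: the action stays queued, never dropped
    queue.shift();         // removed only after the server has responded
  }
}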

Reconciliation Under Verification

Rebase is a well-established technique in collaborative systems. When two users act on the same version and one lands first, one action must be transformed to account for the other. The technique itself isn’t new, but we wanted to approach it from a formal verification angle.

The question we asked was: Can we verify reconciliation logic? That is, can we prove at compile time that every transformed action, if accepted, preserves the domain invariant regardless of which interleaving occurs at runtime?

In reconciliation, the number of possible interleavings grows combinatorially, making the state space particularly difficult to test exhaustively. By verifying that every transformation preserves the domain invariant, we cover all possible interleavings without enumerating test cases.

The key mechanism is having a single entry point for all state changes: TryStep. TryStep is a function that takes the current model and an action, and returns either a new valid model or an error. Every state transition goes through this gate.
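
Here is a minimal TypeScript rendering of that gate, with a stand-in invariant (unique task ids) in place of the app’s real domain rules. In the actual system this lives in Dafny, where the invariant is discharged by the verifier at compile time rather than checked at runtime:

// Sketch of the TryStep gate. Types and the invariant are illustrative.
type Task = { id: number; title: string };
type Model = { tasks: Task[] };
type Action =
  | { kind: "AddTask"; task: Task }
  | { kind: "DeleteTask"; id: number };
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Stand-in domain invariant: task ids are unique.
function invariantHolds(m: Model): boolean {
  return new Set(m.tasks.map((t) => t.id)).size === m.tasks.length;
}

// The single gate: current model + action -> new valid model, or an error.
function tryStep(model: Model, action: Action): Result<Model> {
  const next: Model =
    action.kind === "AddTask"
      ? { tasks: [...model.tasks, action.task] }
      : { tasks: model.tasks.filter((t) => t.id !== action.id) };
  return invariantHolds(next)
    ? { ok: true, value: next }
    : { ok: false, error: `invariant violated by ${action.kind}` };
}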

The multi-collaboration kernel is generic and has no knowledge of domain-specific rules. TryStep is domain-specific; it defines what a valid state means. Both are proven. If TryStep returns Ok, the kernel trusts the result. In principle, the multi-collaboration kernel can work with any suitable proven implementation of TryStep.
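
A sketch of how rebasing and the gate fit together, reusing tryStep from above. The rebase rule here is a trivial placeholder; the real transformations are domain-specific, and they are what get proven to preserve the invariant:

// Sketch of server-side reconciliation: transform a stale action over each
// action that landed since the client's base version, then gate the result.
function rebase(stale: Action, landed: Action): Action {
  // Placeholder: a real rule might remap ids or positions to account for
  // the concurrent action that landed first.
  return stale;
}

function reconcile(
  model: Model,       // latest server state
  log: Action[],      // accepted actions, in order; index = version
  baseVersion: number, // version of the model the client acted on
  incoming: Action
): Result<Model> {
  let action = incoming;
  for (const landed of log.slice(baseVersion)) {
    action = rebase(action, landed); // account for each concurrent action
  }
  return tryStep(model, action);     // nothing reaches the state except via the gate
}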

In practice, TryStep is tailored to collaborative domains built around discrete actions (a to-do app, a kanban board, an expense tracker). We are still exploring how to apply this to continuously changing domains like collaborative text editors.

Cross-Project Operations

We could have stopped at single-project collaboration. The multi-collaboration kernel handles concurrent users editing the same project. But users naturally work across project boundaries — moving a task from “Work” to “Personal,” reorganizing lists between team projects. We wanted to see if the verified foundation could extend to operations that span multiple projects.

The Problem: Partial Failure

When you add a task to a project, one thing changes. When you move a task between projects, two things must change together. The task leaves the source and arrives at the destination. If only one succeeds, the task is either duplicated or lost.

A cross-project move decomposes into two single-project mutations:

MoveTaskTo(projectA → projectB, task #42)
  ├── projectA: DeleteTask(#42)
  └── projectB: AddTask(#42's data)

Both must succeed, or neither should. Without this guarantee, a network hiccup or version conflict mid-operation could leave you with inconsistent state.

Scoping the Operation

TouchedProjects is a function that returns the set of projects an action affects — one for single-project actions, two for cross-project moves. That set then drives three things:

  1. Loading: The server only fetches projects in that set
  2. Locking: Version checks apply to exactly those projects
  3. Atomicity: All touched projects update in one transaction, or roll back together

When the database update runs, it checks versions on all touched projects. If any has been modified since the client last synced, the entire transaction rolls back. The task stays safe in the source project until the client retries with fresh versions.
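
A sketch of how TouchedProjects can drive that commit, under assumed names (MultiAction, Db, updateIfVersion) standing in for the real storage layer, and reusing the Model and Action types from the TryStep sketch above:

// Sketch: the touched set scopes the version checks, and a single
// transaction makes the whole update all-or-nothing.
type MultiAction =
  | { kind: "Single"; projectId: string; action: Action }
  | { kind: "MoveTaskTo"; from: string; to: string; taskId: number };

function touchedProjects(a: MultiAction): string[] {
  return a.kind === "Single" ? [a.projectId] : [a.from, a.to];
}

interface Tx {
  // Compare-and-set: write only if the stored version matches `seen`.
  updateIfVersion(projectId: string, seen: number, next: Model): Promise<boolean>;
}
interface Db {
  transaction(body: (tx: Tx) => Promise<void>): Promise<void>; // rolls back on throw
}

async function commitAtomically(
  db: Db,
  action: MultiAction,
  seenVersions: Map<string, number>,
  newStates: Map<string, Model>
): Promise<void> {
  await db.transaction(async (tx) => {
    for (const id of touchedProjects(action)) {
      const ok = await tx.updateIfVersion(id, seenVersions.get(id)!, newStates.get(id)!);
      if (!ok) throw new Error(`version conflict on ${id}`); // aborts every write
    }
  });
}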

What the Multi-Project Kernel Proves

The kernel proves one key property: if all projects are valid before a cross-project operation, and the operation succeeds, all projects are valid after.

Internally, a cross-project move calls TryStep on the source project, then again on the destination. Each TryStep is already proven to preserve the domain invariant. The multi-project proof composes these — no new domain reasoning required, just the observation that two valid transitions produce two valid states.
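
The composition is small enough to sketch directly, again reusing tryStep from earlier. The point is visible in the control flow: nothing is committed unless both steps return Ok:

// Sketch of the composition: a cross-project move is two TrySteps, and the
// pair of new states is returned only if both validate.
function tryMoveTask(
  source: Model,
  dest: Model,
  taskId: number
): Result<{ source: Model; dest: Model }> {
  const task = source.tasks.find((t) => t.id === taskId);
  if (!task) return { ok: false, error: "task not in source project" };

  const afterDelete = tryStep(source, { kind: "DeleteTask", id: taskId });
  if (!afterDelete.ok) return afterDelete; // nothing committed

  const afterAdd = tryStep(dest, { kind: "AddTask", task });
  if (!afterAdd.ok) return afterAdd;       // the delete is discarded too

  // Both transitions validated: two valid states, committed together.
  return { ok: true, value: { source: afterDelete.value, dest: afterAdd.value } };
}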

Infrastructure: Supabase

We used Supabase for hosted PostgreSQL, Realtime, and Auth. The interesting part is what runs inside the edge function: compiled Dafny code.

When a client dispatches an action, it hits a Supabase Edge Function. The function is TypeScript, but the core logic — dispatch() — is compiled Dafny. The reconciliation, rebasing, and candidate selection all run as proven code.
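
The shape of that edge function, sketched below. The import paths, converter names, and the dispatch signature are illustrative assumptions; only Deno.serve reflects the standard Supabase Edge Function entry point:

// Sketch: a thin TypeScript shell around the compiled-Dafny core.
// Dafny compiled to JavaScript (hypothetical module path):
import { dispatch } from "./compiled/kernel.js";
// Generated, unverified converters at the trust boundary (hypothetical):
import { modelFromJson, actionFromJson, resultToJson } from "./generated/converters.js";

Deno.serve(async (req: Request) => {
  const { state, action, baseVersion } = await req.json();
  // JSON -> Dafny datatypes, proven reconciliation, Dafny -> JSON.
  const result = dispatch(modelFromJson(state), actionFromJson(action), BigInt(baseVersion));
  return new Response(JSON.stringify(resultToJson(result)), {
    headers: { "Content-Type": "application/json" },
  });
});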

The Trust Boundary

The critical boundary is JSON conversion. The edge function receives JSON from the client, converts it to Dafny datatypes, runs verified reconciliation, then converts back to JSON for storage. If the conversion is wrong, the verified code operates on garbage.

We do both serialization and deserialization using dafny2js — it’s unverified, but systematic. The converters are automatically generated from the Dafny datatype definitions, not hand-written. Each Dafny type (Task, Model, Action, MultiAction, etc.) gets a corresponding typeFromJson and typeToJson function that mirrors the type’s structure exactly. This means the conversion logic is formulaic: BigNumber for integers, Dafny sequences for strings, type-tagged objects for discriminated unions. Bugs can still exist, but they’d be systematic across the entire type, not one-off mistakes in ad-hoc parsing code.
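
For illustration, here is roughly what a generated converter pair looks like. The exact shapes and tag names are assumptions; the real converters come from dafny2js rather than hand-written code like this:

// Sketch of the formulaic conversion: BigNumber for Dafny integers,
// character sequences for Dafny strings, type tags for discriminated unions.
import BigNumber from "bignumber.js";

// JSON wire shapes (illustrative):
type TaskJson = { id: string; title: string };
type ActionJson =
  | { tag: "AddTask"; task: TaskJson } // type-tagged discriminated union
  | { tag: "DeleteTask"; id: string };

// Dafny-side shape after conversion:
type DafnyTask = { id: BigNumber; title: string[] };

function taskFromJson(j: TaskJson): DafnyTask {
  return { id: new BigNumber(j.id), title: Array.from(j.title) }; // string -> seq<char>
}

function taskToJson(t: DafnyTask): TaskJson {
  return { id: t.id.toString(), title: t.title.join("") };
}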

We accepted this boundary — verifying serialization is possible but orthogonal to our focus.

What the Proofs Guarantee

The proofs guarantee that every accepted action preserves the domain invariant, user intent is never silently dropped, and cross-project operations are atomic. We verified the logic; we trusted the plumbing (JSON serialization, Supabase, React).

In Summary

This is a proof of concept for formally verified collaborative state management in web apps. We built a system where concurrent users can edit shared state, the server reconciles conflicts, and every accepted state transition is proven to preserve domain invariants — at compile time, not runtime.

The verified pieces: a multi-collaboration kernel that rebases stale actions and tries fallback interpretations, a multi-project kernel for atomic cross-project operations, and a client-side effect state machine for orchestration. All written in Dafny, compiled to JavaScript, running in the browser and Supabase edge functions.

The unverified pieces: JSON serialization (systematic but not proven), the database, the network, React. We drew the trust boundary and accepted it.

The domain we used to test this is simple and low-stakes (a to-do app), but the challenges are real. Concurrent reconciliation, cross-project atomicity, client orchestration: these are hard to test exhaustively, while verification covers all paths at once. Code written with a correct-by-construction approach drastically shrinks the review surface and the room for bugs. The to-do app is a demonstration vehicle; the techniques apply wherever collaborative state needs guarantees.

What’s Next?

Open questions remain. Can this approach extend to domains with continuous state, like collaborative text editors where every keystroke is an operation? How do we handle schemas that evolve over time — can migrations be verified? And as the number of concurrent users grows, what are the practical limits of server-side reconciliation before we need to consider peer-to-peer architectures? Can peer-to-peer architecture be verified?

We’re continuing to explore these edges. If you’re working on similar problems or have thoughts on where verification helps (or doesn’t), we’d like to hear from you.