You’ll build a room booking system, the kind you’d use for meeting rooms, studio space, or appointment slots. It’s a simple concept, but under the hood, booking systems are full of edge cases that silently corrupt data: overlapping reservations, double-bookings after cancellations, capacity violations. You’ll prove all of those edge-case bugs are impossible before compiling the code.
You don’t need to know Dafny or formal methods.
You’ll need:
- Node.js 18+,
- Claude Code
- lemmafit
Lemmafit works best with Claude Opus 4.6. Dafny proofs require precise reasoning about preconditions and postconditions — Opus handles this more reliably than Sonnet.
Step 1: Install and scaffold
Install lemmafit globally, then create a new project.
$ npm install -g lemmafit
$ lemmafit init bookings
$ cd bookings && npm install
Your project:
bookings/
├── SPEC.yaml # your requirements go here
├── lemmafit/
│ ├── dafny/
│ │ ├── Domain.dfy # verified logic (Claude writes this)
│ │ └── Replay.dfy # state machine kernel (don't edit)
│ └── .vibe/
│ └── config.json
├── src/
│ ├── dafny/ # auto-generated TypeScript API (don't edit)
│ └── App.tsx
├── .claude/ # hooks and skills that connect Claude to the verifier
└── package.json
Step 2: Start three terminals
# Terminal 1 — verification daemon
# Watches your .dfy files and SPEC.yaml; writes to status.json
$ npm run daemon
# Terminal 2 — Vite dev server
# Run your app locally
$ npm run dev
# Terminal 3 — Claude Code
# Vibe as usual
$ claude
Use split panes or tmux. Keep the daemon terminal visible — it’s satisfying to watch proofs pass.
Step 3: Describe what you want
You can go about this step in two different ways:
- Start vague, iterate on design decisions with Claude
- Start with a SPEC.md or PRD.md containing detailed design decisions
Vague with iteration
In the Claude Code terminal, tell it what to build. You don’t need to mention Dafny, verification, or proofs.
> Build a room booking app. Users can book time slots for meeting rooms, modify bookings, and cancel them. Multiple rooms, each with a max capacity. No double-booking allowed. What considerations should we take into account for a system like this?
Start with a detailed markdown file
Add a SPEC.md file (like this example) to your root directory and simply tell Claude to review and implement it.
Once you and Claude agree on design decisions, you can move on to Step 4.
Step 4: Claude writes your SPEC.yaml
First, Claude translates your description into structured requirements. This is the source of truth for what gets verified. Here are a few example entries (your app will have more):
entries:
# A rule that must always be true, no matter what operations have happened.
# This one is verifiable — the prover can mathematically guarantee it.
- id: spec-001
req_id: null
title: No two bookings in the same room can overlap in time
group: Business Logic
layer: logic
type: invariant
property: "NoOverlaps(model.bookings)"
module: BookingCore
depends_on: []
verifiable: true
guarantee_type: verified
state: DRAFT
# A condition that must be true before an operation is allowed.
# The function rejects the input rather than corrupting state.
- id: spec-005
req_id: null
title: Bookings cannot be created in the past
group: Business Logic
layer: logic
type: precondition
property: "booking.start >= model.now"
module: BookingCore
depends_on: []
verifiable: true
guarantee_type: verified
state: DRAFT
# A guarantee about what happens after a specific operation.
# Cancelling a booking must actually free the slot for someone else.
- id: spec-008
req_id: null
title: Cancelled bookings free their slot for new reservations
group: Business Logic
layer: logic
type: postcondition
property: "NoConflict(CancelBooking(model, id).bookings, newBooking)"
module: BookingCore
depends_on: [spec-001]
verifiable: true
guarantee_type: verified
state: DRAFT
Each verifiable entry becomes something the prover will prove. For all possible inputs, all possible orderings, all possible room configurations.
Invariant — must hold at all times, after every operation. “No overlapping bookings” is always true, no matter how many creates, modifies, and cancels have happened.
Postcondition — must hold after a specific operation. “Cancelled bookings free their slot” is guaranteed after every cancel.
Precondition — must be true before an operation is allowed. “Cannot book in the past” means the function rejects the input rather than corrupting state.
Step 5: Claude writes verified logic
Next, Claude writes Domain.dfy — the core booking logic in Dafny. This is the formalized spec against which the proof will run. You do not need to review this file. Later steps will help ensure that the formalized specs accurately capture your intent.
You don’t need to know Dafny. Claude writes all the Dafny, and a deterministic verifier checks it via the daemon. The hooks feed Claude the verification errors and Claude iterates until verification passes.
You can watch the daemon terminal as it automatically runs the verifier and compiles the code after Claude finishes writing a .dfy file.
22:27:13 Change detected, verifying...
22:27:18 Verification failed (1 error)
22:27:33 Change detected, verifying...
22:28:05 Verification passed, compiling...
22:28:07 Change detected, compiling...
When verification fails, a hook sends the error to Claude and Claude automatically iterates on them until verification passes and the TypeScript API is compiled. A few (or even many) failures before a pass is normal, especially for complex invariants like overlap detection.
Proofs can take time. It’s normal for a complex invariant — like proving that no sequence of creates, modifies, and cancels can ever produce an overlap — to take double-digit minutes and multiple iterations. The exact amount of time will depend on your machine and the complexity of the proof. The verifier is exhaustive, so harder properties take longer to prove.
But if Claude has been cycling on the same error for a while without making meaningful progress, you can help it along:
Give it a hint. Stop the current iteration (Ctrl+C) and tell Claude what direction to try. For example:
> Try splitting difficult proofs into multiple lemmas
> Use a different approach for the proofs that keep failing
Tell it to use an axiom. If a property is true but difficult to prove — maybe it depends on arithmetic the verifier is slow to resolve, or on a property you’re confident about but Claude can’t close — you can tell Claude to assume it as an axiom and move on.
An axiom is an unproven assumption. Instead of proving the property, the verifier accepts it as given and uses it to prove everything else. The property still shows up in your guarantees report, but marked as assumed instead of verified.
Accept the use of axioms sparingly, and ensure that the assumption is reflected in the final guarantees about the app.
> Assume the time range arithmetic is correct as an axiom for now
> and move on to the next proof.
This is a practical tradeoff. You’re saying: “I believe this is true, I don’t need the machine to prove it right now, let’s keep going.” You can come back to an axiom later and prove it, or leave it as an assumption you test conventionally.
Check the daemon logs. Run lemmafit logs to see the full verifier output. Sometimes the error message reveals that Claude is proving something slightly different from what you intended — a misunderstanding in the spec rather than a hard proof problem. In that case, clarify your intent rather than pushing through.
This proof step is fundamentally different from writing code and testing it at runtime. The verifier is checking the structure of the code before it runs to ensure that no bugs are possible. When verification doesn’t pass, that means it has a counterexample showing exactly which inputs violate the property. When it passes, the property holds for all possible inputs, not just a subset of targeted or random inputs.
Step 6: Pre-React audit
Before writing any React code, Claude should automatically run /lemmafit-pre-react-audits.
This runs two audits that catch gaps between what you think is proven and what’s actually proven.
Proof Strength Audit — checks every proof against your SPEC.yaml entries. It looks for missing lemmas, weak postconditions that technically verify but prove less than you claimed, axiom debt, and invariants that are looser than they should be. A proof that passes isn’t necessarily a strong proof.
For example, your spec says “no two bookings in the same room can overlap,” but the proof might only ensure new bookings don’t overlap with existing ones, which misses the case where modifying a booking’s time range could create a conflict. The audit would catch this.
Logic-in-JS Audit — checks whether any verifiable logic is expected to live in JavaScript instead of Dafny. State derivations, validation, conditional rules — anything that belongs in the verified layer but might slip into React components or hooks. If it’s effect-free logic, it should be in Dafny.
Each finding gets a severity label:
| Severity | Meaning |
|---|---|
critical | Unverified logic that could produce wrong behavior, or a spec entry with no corresponding proof |
moderate | Weak proof that doesn’t fully cover the spec, or JS logic that should move to Dafny |
minor | Style issue or axiom with clear justification — can proceed, fix later |
The audit passes when there are zero critical and zero moderate findings. If any remain, Claude iterates on the proofs and re-runs the audit until it’s clean.
This step exists because verification passing doesn’t mean coverage is complete. A proof can be correct but narrow — proving less than the spec entry claims, or missing an interaction between two features. The audit closes that gap before you build on top of it.
If Claude moves straight to writing React after proofs pass, stop the process (Ctrl+C) and tell it to run the audits first. You can also just type /lemmafit-pre-react-audits directly in the chat to call the skill manually.
At this point, the verified and compiled TypeScript logic is ready to be hooked into the React app via the API. Claude will take care of this as well.
Step 7: See the running app
If you haven’t yet, run npm run dev in your terminal to see the app running locally. You’ll see a booking interface — rooms, time slots, a way to create and manage reservations.
This will likely have a typical vibe-coded look, and you can feel free to iterate on the UI since the logic and the UI are separate concerns.
If all worked correctly, you won’t be able to violate the rules that have been proved (e.g., create a booking that overlaps with another).
Step 8: Post-React audit
After writing React code, Claude should automatically run /lemmafit-post-react-audit. This is a mirror of Step 6, but instead of checking proofs before the UI exists, it checks whether any logic slipped into JavaScript while building it.
If you iterate on the app further (e.g., add/remove features or modify a business rule) and suspect that Claude is writing logic directly in JavaScript, you can ask it to run this audit to ensure that all verifiable logic is routed through Dafny.
Logic-in-JS Audit — scans your React components, hooks, and utils for effect-free logic that belongs in Dafny. State derivations, validation, conditional guards, formatting that encodes business rules, or logic duplicated between JS and Dafny. If it’s pure logic, it should be verified.
The same severity labels apply:
| Severity | Meaning |
|---|---|
critical | Effect-free logic in JS that could produce wrong behavior, or logic duplicated between JS and Dafny |
moderate | Logic that should be a Dafny function or predicate but isn’t yet causing a correctness risk |
minor | Borderline case — arguably pure formatting or a trivial derivation not worth a Dafny round-trip |
The audit passes with zero critical and zero moderate findings. If any remain, Claude moves the logic to Dafny, re-verifies, and re-runs the audit.
If Claude doesn’t run the post-React audit automatically, type /lemmafit-post-react-audit directly in the chat.
Step 9: Generate a guarantees report
Once the app is working and you are satisfied with it, run /lemmafit-guarantees in Claude Code (or ask Claude to run guarantees):
> /lemmafit-guarantees
This step does more than just summarize; it independently verifies that the proofs actually mean what you think they mean.
Here’s how it works: during verification, the daemon extracts claims from your Dafny code. These are the specific mathematical properties that were proven (invariant conjuncts, lemma postconditions, function contracts, and any axioms). The guarantees process maps each claim to the corresponding requirement in your SPEC.yaml, then runs claimcheck — a separate tool that reads each lemma and its requirement side by side and checks whether the proof faithfully expresses the natural-language claim.
Claimcheck can return three results for each guarantee:
- Confirmed — the proof faithfully expresses the requirement.
- Disputed — there’s a gap between what the requirement says and what the proof actually proves. For example, the requirement says “no overlapping bookings” but the lemma only proves it for creation, not modification. The result is similar to the pre-react-audits proof strength result, but it uses a more robust mechanism.
- Error — the lemma wasn’t found, usually a naming mismatch.
If any claims are disputed, Claude suggests fixes and iterates until everything is confirmed. Once clean, it generates a report in lemmafit/reports/guarantees.md.
This is the most important artifact lemmafit produces. Each guarantee has been both proven by the verifier and confirmed by claimcheck to faithfully express your requirement. You also know exactly what’s assumed and what’s outside the verification boundary entirely.
Why does claimcheck exist? Because a proof can be technically correct but prove the wrong thing. A lemma might verify successfully but express a weaker property than your requirement claims — for example, proving a tautology, or narrowing the scope to fewer cases than the spec describes. Claimcheck catches these gaps by reading the claims extracted from the proofs without visibility to the spec and then independently comparing the claims against the spec to flag any mismatches.
What do you get from this process?
You just built a booking app where the core business rules are mathematically proven, not just tested. Here’s what that actually means in practice:
You can trust vibe-coded logic. The UI can be as loosely built as you want. Swap CSS frameworks, refactor components, let Claude rewrite the entire frontend — the logic layer doesn’t care. As long as the React code dispatches through the verified API, the guarantees hold. The boundary between “proven correct” and “works well enough” is explicit.
You can iterate without fear. Add a new booking type, adjust capacity logic, introduce modification rules. The verifier tells you immediately if your change breaks an existing guarantee. No regression suite to maintain, no edge cases to remember. The proofs catch conflicts at compile time, before anything runs.
You know exactly what’s covered. The guarantees report isn’t a test coverage percentage; it’s a list of properties that will always hold true, alongside a list of what’s assumed and what’s not covered. The properties that are proved will not break. There’s no ambiguity about how your app can and can’t break.
The cost is front-loaded. Verification takes longer upfront than writing tests. But the time and tokens you don’t spend debugging overlap edge cases, writing regression tests, tracking diffs, or fixing bugs that slip through integration tests compounds quickly. For business-critical logic, the tradeoff is worth it.
Next steps: Where to go from here
Add more business rules like “no bookings longer than 4 hours,” “minimum 15-minute gap between bookings,” “recurring weekly bookings.” Each one becomes a new spec entry that gets proven.
Try lemmafit add to add a verified module to an existing project (guide coming soon).
Check the lemmafit repo (star, fork, contribute!) or the dafny-replay repo to understand the underlying methodology.
Good to know: What gets verified (and what doesn’t)
lemmafit verifies effect-free logic — properties that can be expressed as mathematical invariants and proven for all possible inputs. Everything else lives outside the verification boundary and needs standard testing. The guarantees report makes this boundary explicit.
Verified
- State transitions — “only valid states are reachable”
- Data integrity — “no overlapping bookings”
- Arithmetic — “capacity is never exceeded”
- Access rules — “checked-in bookings can’t be cancelled”
Not verified
- React rendering, CSS, browser APIs
- HTTP requests, network behavior
- Database reads and writes
- Authentication
- The wiring between UI and verified API
You still need tests, just not for the verified logic. Integration tests, end-to-end tests, and manual QA still matter for everything in the right column. Verification eliminates an entire class of bugs, but it doesn’t replace testing.