useful-notes/hassan/004-subduction-notes.md
2026-03-04 09:59:09 +01:00


Subduction Project Notes

Overview

Subduction is a peer-to-peer synchronization protocol and implementation for CRDTs (Conflict-free Replicated Data Types) that enables efficient synchronization of encrypted, partitioned data between peers without requiring a central server.

  • Repository: https://github.com/inkandswitch/subduction
  • Developed by: Ink & Switch
  • License: MIT OR Apache-2.0
  • Language: Rust (with WebAssembly bindings)
  • Status: Early release preview (unstable API)
  • Version: 0.5.0

Core Purpose & Features

Key Capabilities

  1. Efficient Sync Protocol — Uses Sedimentree for history sharding, diffing, and incremental synchronization
  2. Encryption-Friendly — Works with encrypted data partitions without requiring decryption during sync
  3. Decentralized — True peer-to-peer synchronization via pluggable transports
  4. Multi-Platform — Runs on native Rust, WebAssembly (browser & Node.js), and provides a CLI tool
  5. Automerge Integration — Protocol-agnostic, but originally designed for Automerge documents

Design Principles

  • no_std Compatible — Core logic works without standard library
  • Transport Agnostic — Protocol-independent message format
  • Policy Separation — Authentication separate from authorization
  • Subscription-Based — Efficient update forwarding
  • Content-Addressed — All data identified by BLAKE3 hash
  • Idempotent — Receiving same data twice is safe
  • Compile-Time Validation — Types make invalid states unrepresentable
  • Newtypes for Domain Concepts — Prevent mixing different semantic types
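
The content-addressed, idempotent, and newtype principles above can be sketched together. This is a minimal illustration, not the project's code: `ContentId` and `BlobStore` are hypothetical names, and a 64-bit hash from the standard library stands in for the 32-byte BLAKE3 digest so the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Newtype for a content address, so it cannot be mixed up with other integers.
/// ASSUMPTION: a 64-bit std hash stands in for the 32-byte BLAKE3 digest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ContentId(u64);

impl ContentId {
    /// Derives the address from the bytes themselves.
    pub fn of(bytes: &[u8]) -> Self {
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        ContentId(hasher.finish())
    }
}

/// Hypothetical in-memory store keyed by content address.
#[derive(Default)]
pub struct BlobStore {
    blobs: HashMap<ContentId, Vec<u8>>,
}

impl BlobStore {
    /// Stores a blob under its own hash and reports whether it was new.
    /// Receiving the same bytes again leaves the store unchanged.
    pub fn put(&mut self, bytes: &[u8]) -> (ContentId, bool) {
        let id = ContentId::of(bytes);
        let is_new = !self.blobs.contains_key(&id);
        if is_new {
            self.blobs.insert(id, bytes.to_vec());
        }
        (id, is_new)
    }

    /// Number of distinct blobs held.
    pub fn blob_count(&self) -> usize {
        self.blobs.len()
    }
}
```

Because the key is derived from the value, a duplicate delivery maps to the same entry, which is what makes re-receiving data safe.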

Architecture & Components

The project is organized as a Rust workspace with 16 member crates:

Core Layer (3 crates)

| Crate | Description |
|---|---|
| sedimentree_core | Core data partitioning scheme using depth-based hierarchical layers for efficient metadata-based synchronization |
| subduction_crypto | Cryptographic types: Ed25519-signed payloads with type-state pattern for verification status |
| subduction_core | Main synchronization protocol implementation with pluggable storage, connections, and policies |

Storage Layer (1 crate)

| Crate | Description |
|---|---|
| sedimentree_fs_storage | Filesystem-based persistent storage for Sedimentree data |

Transport Layer (3 crates)

| Crate | Description |
|---|---|
| subduction_http_longpoll | HTTP long-poll transport for restrictive network environments |
| subduction_iroh | Iroh (QUIC) transport for direct P2P connections with NAT traversal |
| subduction_websocket | WebSocket transport for browser and Node.js environments |

Integration Layer (4 crates)

| Crate | Description |
|---|---|
| automerge_sedimentree | Adapter for synchronizing Automerge CRDT documents via Sedimentree |
| subduction_keyhive | Integration with Keyhive access control system |
| subduction_keyhive_policy | Keyhive-based authorization policy for connections |
| bijou64 | Utility crate (64-bit optimizations) |

WebAssembly Bindings (4 crates)

| Crate | Description |
|---|---|
| sedimentree_wasm | Wasm bindings for Sedimentree |
| subduction_wasm | Wasm bindings for browser and Node.js |
| automerge_sedimentree_wasm | Wasm wrapper for Automerge + Sedimentree |
| automerge_subduction_wasm | Full sync stack (Automerge + Subduction) for Wasm |

Tools (1 crate)

| Crate | Description |
|---|---|
| subduction_cli | Command-line tool for running sync servers and managing data |

Technical Architecture

Key Abstractions

FutureForm Trait

Enables portable async code across native Rust (Tokio) and WebAssembly (single-threaded):

  • Provides two implementations: Sendable (Send + Sync) for multi-threaded and Local for single-threaded environments
  • Uses macro-based code generation to support both forms
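
One way to picture the idea is a trait that selects the boxed future type, with traits such as `Storage` written generically over it. The sketch below is an assumption about the general shape, not the crate's actual definitions: the `load` method, `MemoryStorage`, and `poll_once` are hypothetical, only a `Send` bound is modelled for `Sendable`, and the two impls are written by hand where the notes say the project uses macros.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for FutureForm: it selects the boxed future type.
pub trait FutureForm {
    type Future<'a, T: 'a>: Future<Output = T> + 'a;
}

/// Marker for multi-threaded runtimes: futures must be `Send`.
pub enum Sendable {}
/// Marker for single-threaded environments such as Wasm: no `Send` bound.
pub enum Local {}

impl FutureForm for Sendable {
    type Future<'a, T: 'a> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;
}
impl FutureForm for Local {
    type Future<'a, T: 'a> = Pin<Box<dyn Future<Output = T> + 'a>>;
}

/// A storage trait declared once, generic over the future form.
pub trait Storage<F: FutureForm> {
    fn load<'a>(&'a self, key: u64) -> F::Future<'a, Option<Vec<u8>>>;
}

/// Hypothetical in-memory backend used only for this sketch.
pub struct MemoryStorage(pub Vec<(u64, Vec<u8>)>);

impl MemoryStorage {
    fn lookup(&self, key: u64) -> Option<Vec<u8>> {
        self.0.iter().find(|(k, _)| *k == key).map(|(_, v)| v.clone())
    }
}

impl Storage<Sendable> for MemoryStorage {
    fn load<'a>(
        &'a self,
        key: u64,
    ) -> Pin<Box<dyn Future<Output = Option<Vec<u8>>> + Send + 'a>> {
        Box::pin(async move { self.lookup(key) })
    }
}

impl Storage<Local> for MemoryStorage {
    fn load<'a>(&'a self, key: u64) -> Pin<Box<dyn Future<Output = Option<Vec<u8>>> + 'a>> {
        Box::pin(async move { self.lookup(key) })
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future once; enough for the never-pending futures in this sketch.
pub fn poll_once<Fut: Future + Unpin>(mut fut: Fut) -> Option<Fut::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match Pin::new(&mut fut).poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

Code that is generic over `F: FutureForm` can then be instantiated with `Sendable` under Tokio and with `Local` in the browser without duplicating the protocol logic.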

Generic Parameters on Subduction

pub struct Subduction<'a, F: FutureForm, S: Storage<F>, C: Connection<F>,
    P: ConnectionPolicy<F> + StoragePolicy<F>,
    M: DepthMetric, const N: usize>

Compile-time configuration enables type-safe instantiation without runtime dispatch overhead.

Policy Traits (Capability-Based Access Control)

| Trait | Purpose |
|---|---|
| ConnectionPolicy | Authorization at the connection level (is this peer allowed?) |
| StoragePolicy | Authorization at the document level (can this peer read/write this document?) |
| OpenPolicy | Permissive default (allows everything) |
| KeyhivePolicy | Real authorization with Keyhive integration |
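
The split between connection-level and document-level checks might look roughly like the following. The method names and the synchronous signatures are assumptions made for illustration (the real traits are generic over `FutureForm`), and `AllowList` is a hypothetical restrictive policy used only to contrast with `OpenPolicy`.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PeerId(pub u64);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DocumentId(pub u64);

/// Returned when a policy refuses a request.
#[derive(Debug, PartialEq, Eq)]
pub struct Denied;

fn allow_if(ok: bool) -> Result<(), Denied> {
    if ok { Ok(()) } else { Err(Denied) }
}

/// Connection level: is this peer allowed to talk to us at all?
pub trait ConnectionPolicy {
    fn authorize_connect(&self, peer: PeerId) -> Result<(), Denied>;
}

/// Document level: may this peer read or write this document?
pub trait StoragePolicy {
    fn authorize_fetch(&self, peer: PeerId, doc: DocumentId) -> Result<(), Denied>;
    fn authorize_put(&self, peer: PeerId, doc: DocumentId) -> Result<(), Denied>;
}

/// Permissive default: allows everything.
pub struct OpenPolicy;

impl ConnectionPolicy for OpenPolicy {
    fn authorize_connect(&self, _peer: PeerId) -> Result<(), Denied> {
        Ok(())
    }
}
impl StoragePolicy for OpenPolicy {
    fn authorize_fetch(&self, _peer: PeerId, _doc: DocumentId) -> Result<(), Denied> {
        Ok(())
    }
    fn authorize_put(&self, _peer: PeerId, _doc: DocumentId) -> Result<(), Denied> {
        Ok(())
    }
}

/// Hypothetical restrictive policy backed by explicit allowlists.
#[derive(Default)]
pub struct AllowList {
    pub peers: HashSet<PeerId>,
    pub readers: HashSet<(PeerId, DocumentId)>,
    pub writers: HashSet<(PeerId, DocumentId)>,
}

impl ConnectionPolicy for AllowList {
    fn authorize_connect(&self, peer: PeerId) -> Result<(), Denied> {
        allow_if(self.peers.contains(&peer))
    }
}
impl StoragePolicy for AllowList {
    fn authorize_fetch(&self, peer: PeerId, doc: DocumentId) -> Result<(), Denied> {
        allow_if(self.readers.contains(&(peer, doc)))
    }
    fn authorize_put(&self, peer: PeerId, doc: DocumentId) -> Result<(), Denied> {
        allow_if(self.writers.contains(&(peer, doc)))
    }
}
```

Keeping the two traits separate lets a node accept a connection from a known peer while still refusing individual documents.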

Cryptographic Types

| Type | Description |
|---|---|
| Signed<T> | Payload with Ed25519 signature (unverified) |
| VerifiedSignature<T> | Witness that signature has been verified |
| VerifiedMeta<T> | Witness that signature is valid AND blob matches metadata |

The type-state pattern prevents "verify and forget" bugs at compile time.
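
A minimal sketch of the type-state idea follows. It is not the crate's API: `Verified<T>`, `seal`, `from_parts`, and `try_verify` are hypothetical names, and a toy keyed checksum replaces Ed25519 so the example stays dependency-free. The point is only that the payload is unreachable until verification has produced the witness type.

```rust
/// A payload with an attached signature that has NOT been checked yet.
/// The payload field is private, so callers cannot read it directly.
pub struct Signed<T> {
    payload: T,
    signature: u64,
}

/// Witness type: can only be obtained through `Signed::try_verify`.
pub struct Verified<T> {
    payload: T,
}

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidSignature;

/// ASSUMPTION: toy keyed FNV-1a checksum standing in for Ed25519. Not secure.
fn toy_sign(key: u64, bytes: &[u8]) -> u64 {
    let mut hash = 0xcbf2_9ce4_8422_2325_u64 ^ key;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

impl<T: AsRef<[u8]>> Signed<T> {
    /// Signs a payload locally.
    pub fn seal(key: u64, payload: T) -> Self {
        let signature = toy_sign(key, payload.as_ref());
        Signed { payload, signature }
    }

    /// Rebuilds an unverified value, e.g. from bytes received over the network.
    pub fn from_parts(payload: T, signature: u64) -> Self {
        Signed { payload, signature }
    }

    pub fn signature(&self) -> u64 {
        self.signature
    }

    /// Consumes the unverified value; the payload is released only on success.
    pub fn try_verify(self, key: u64) -> Result<Verified<T>, InvalidSignature> {
        if toy_sign(key, self.payload.as_ref()) == self.signature {
            Ok(Verified { payload: self.payload })
        } else {
            Err(InvalidSignature)
        }
    }
}

impl<T> Verified<T> {
    /// Only verified values expose their payload.
    pub fn payload(&self) -> &T {
        &self.payload
    }
}
```

A function that takes `Verified<T>` as its argument cannot be called with unchecked data, so forgetting the verification step becomes a type error instead of a runtime bug.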

NonceCache

  • Replay protection for handshake protocol
  • Time-based bucket system (4 buckets × 3 min = 12 min window)
  • Lazy garbage collection, no background task needed
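
The bucket scheme can be sketched as a ring of four sets, each covering three minutes, where a slot is cleared only when a request next lands in it. Everything here is an assumption for illustration (the `check_and_insert` signature, `u128` nonces, timestamps as seconds, and rejecting future-dated handshakes outright); the real cache may differ.

```rust
use std::collections::HashSet;

const BUCKETS: u64 = 4;
const BUCKET_SECS: u64 = 3 * 60; // 4 buckets x 3 min = 12 min window

#[derive(Default)]
struct Bucket {
    /// Which 3-minute period this slot currently holds.
    epoch: u64,
    nonces: HashSet<u128>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum NonceError {
    OutOfWindow,
    Replayed,
}

/// Hypothetical replay cache; timestamps are seconds since an arbitrary epoch.
#[derive(Default)]
pub struct NonceCache {
    buckets: [Bucket; BUCKETS as usize],
}

impl NonceCache {
    /// Accepts a nonce once within the window; rejects repeats and stale timestamps.
    pub fn check_and_insert(
        &mut self,
        nonce: u128,
        issued_at: u64,
        now: u64,
    ) -> Result<(), NonceError> {
        let epoch = issued_at / BUCKET_SECS;
        let now_epoch = now / BUCKET_SECS;
        // Simplification: future-dated and expired handshakes are both rejected.
        if epoch > now_epoch || epoch + BUCKETS <= now_epoch {
            return Err(NonceError::OutOfWindow);
        }
        let bucket = &mut self.buckets[(epoch % BUCKETS) as usize];
        // Lazy garbage collection: a slot is cleared when it is reused,
        // so no background task is needed.
        if bucket.epoch != epoch {
            bucket.nonces.clear();
            bucket.epoch = epoch;
        }
        if bucket.nonces.insert(nonce) {
            Ok(())
        } else {
            Err(NonceError::Replayed)
        }
    }

    /// Total nonces currently held across all buckets.
    pub fn stored(&self) -> usize {
        self.buckets.iter().map(|bucket| bucket.nonces.len()).sum()
    }
}
```

Memory stays bounded by the traffic of the last four periods, and expiry costs nothing until the slot is touched again.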

Sedimentree Data Structure

Organizes CRDT data into depth-stratified layers based on content hash:

| Depth | Contains |
|---|---|
| Depth 0 | All commits |
| Depth 1 | Commits with 1+ leading zero bytes |
| Depth 2 | Commits with 2+ leading zero bytes |
| Depth 3+ | Further filtering |
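
The rule in the table reduces to counting leading zero bytes of the digest. The sketch below assumes a 32-byte digest; `depth` and `in_layer` are illustrative names, not the `DepthMetric` trait itself.

```rust
/// A 32-byte content hash, as produced by BLAKE3.
pub type Digest = [u8; 32];

/// Depth of a commit: the number of leading zero bytes in its digest.
pub fn depth(digest: &Digest) -> usize {
    digest.iter().take_while(|byte| **byte == 0).count()
}

/// A commit appears in layer `layer` if its depth is at least `layer`,
/// so layer 0 contains every commit.
pub fn in_layer(digest: &Digest, layer: usize) -> bool {
    depth(digest) >= layer
}
```

If digests are uniformly distributed, each layer holds roughly 1/256 of the commits in the layer below it, which is what keeps the higher-depth summaries small.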

Benefits:

  • Efficient sync through fingerprint-based reconciliation
  • Compare summaries at higher depths first
  • Drill down only where differences exist
  • ~75% bandwidth reduction via 8-byte SipHash fingerprints instead of 32-byte digests
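
The bandwidth figure follows from the sizes: an 8-byte fingerprint is 8/32 = 25% of a 32-byte digest, hence a saving of about 75% on the summary. A rough sketch of fingerprint-based reconciliation is shown below; `fingerprint`, `summarize`, and `missing_on_peer` are hypothetical helpers, and the standard library's `DefaultHasher` stands in for the keyed SipHash the notes mention.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

pub type Digest = [u8; 32];

/// 8-byte fingerprint of a 32-byte digest.
/// ASSUMPTION: std's DefaultHasher stands in for the protocol's SipHash.
pub fn fingerprint(digest: &Digest) -> u64 {
    let mut hasher = DefaultHasher::new();
    digest.hash(&mut hasher);
    hasher.finish()
}

/// The compact summary a peer would send instead of full digests.
pub fn summarize(digests: &[Digest]) -> HashSet<u64> {
    digests.iter().map(fingerprint).collect()
}

/// Digests we hold whose fingerprints do not appear in the peer's summary.
pub fn missing_on_peer(ours: &[Digest], peer_summary: &HashSet<u64>) -> Vec<Digest> {
    ours.iter()
        .filter(|digest| !peer_summary.contains(&fingerprint(digest)))
        .copied()
        .collect()
}

/// Fraction of summary bytes saved by sending fingerprints instead of digests.
pub fn bandwidth_saving(fingerprint_bytes: u32, digest_bytes: u32) -> f64 {
    1.0 - f64::from(fingerprint_bytes) / f64::from(digest_bytes)
}
```

Only the entries returned by `missing_on_peer` need to be transferred in full; everything else is settled by comparing the short fingerprints.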

Connection Lifecycle

Client ─(TCP/WebSocket)─→ Server
Client ─(Signed Challenge)─→ Server
Server ─(Signed Response)─→ Client
Both verify signatures and check ConnectionPolicy
↓
Authenticated connection established

Protocol Layers

| Layer | Component | Description |
|---|---|---|
| Layer 1 | Transport | WebSocket, HTTP Long-Poll, or Iroh/QUIC |
| Layer 2 | Connection | Handshake with mutual Ed25519 authentication, policy-based authorization |
| Layer 3 | Sync | Batch sync (pull), Incremental sync (push), Subscriptions |
| Layer 4 | Application | Automerge, custom CRDTs, or other data structures |

Sync Protocols

Batch Sync

  • Type: Pull-based (request/response)
  • Mechanism: Uses fingerprint-based reconciliation for compact diffs
  • Delivery: Guaranteed
  • Use Cases: Initial sync, reconnection, consistency checks

Incremental Sync

  • Type: Push-based (fire-and-forget)
  • Mechanism: Immediate updates for active editing
  • Delivery: Best-effort
  • Use Cases: Real-time collaboration

Subscriptions

  • Optional per-document subscription
  • Updates forwarded only to authorized subscribed peers
  • Efficient batch update forwarding
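
Forwarding to authorized subscribers only can be pictured as a per-document subscriber set filtered through an authorization check. This is a sketch under assumptions: `Subscriptions`, `recipients`, and the `can_read` callback are hypothetical, and integer IDs replace the real peer and document identifiers.

```rust
use std::collections::{BTreeSet, HashMap};

pub type PeerId = u32;
pub type DocId = u32;

/// Hypothetical per-document subscription registry.
#[derive(Default)]
pub struct Subscriptions {
    by_doc: HashMap<DocId, BTreeSet<PeerId>>,
}

impl Subscriptions {
    pub fn subscribe(&mut self, peer: PeerId, doc: DocId) {
        self.by_doc.entry(doc).or_default().insert(peer);
    }

    pub fn unsubscribe(&mut self, peer: PeerId, doc: DocId) {
        if let Some(peers) = self.by_doc.get_mut(&doc) {
            peers.remove(&peer);
        }
    }

    /// Peers that should receive an update to `doc`: subscribed, not the
    /// sender, and still authorized according to `can_read`.
    pub fn recipients(
        &self,
        doc: DocId,
        sender: PeerId,
        can_read: impl Fn(PeerId, DocId) -> bool,
    ) -> Vec<PeerId> {
        self.by_doc
            .get(&doc)
            .into_iter()
            .flatten()
            .copied()
            .filter(|peer| *peer != sender && can_read(*peer, doc))
            .collect()
    }
}
```

Checking authorization at forwarding time, not only at subscription time, means a peer whose access is revoked stops receiving updates even if it never unsubscribes.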

Reconnection

  • Automatic detection and recovery
  • Batch sync used to catch up after missed incremental updates
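
How the two modes complement each other can be shown with a toy model: pushes are dropped while the link is down, and a later pull fills the gap. `Replica`, `push_to`, and `batch_sync_from` are hypothetical names, commits are plain integers, and the pull compares full sets where the protocol would compare fingerprints.

```rust
use std::collections::BTreeSet;

/// Toy replica holding a set of commit IDs.
#[derive(Default)]
pub struct Replica {
    commits: BTreeSet<u64>,
}

impl Replica {
    pub fn commit(&mut self, id: u64) {
        self.commits.insert(id);
    }

    /// Incremental sync: best-effort push of one commit. If the link is
    /// down the update is silently lost, with no retry.
    pub fn push_to(&self, id: u64, peer: &mut Replica, link_up: bool) {
        if link_up && self.commits.contains(&id) {
            peer.commits.insert(id);
        }
    }

    /// Batch sync: pull everything the peer has that we lack.
    /// Returns how many commits were fetched.
    pub fn batch_sync_from(&mut self, peer: &Replica) -> usize {
        let missing: Vec<u64> = peer.commits.difference(&self.commits).copied().collect();
        let pulled = missing.len();
        self.commits.extend(missing);
        pulled
    }

    pub fn commits(&self) -> Vec<u64> {
        self.commits.iter().copied().collect()
    }
}
```

Because a batch sync is idempotent, running it again after the replicas have converged transfers nothing.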

Security Model

Trust Assumptions

Trusted:

  • Ed25519 signatures are unforgeable
  • BLAKE3 is collision-resistant and preimage-resistant
  • TLS provides transport encryption when used
  • Private keys remain secret
  • Local storage is not compromised

Not Trusted:

  • Network (attacker can observe, delay, drop, inject)
  • Peers (may be malicious, compromised, buggy)
  • Clocks (tolerate ±10 minutes drift)
  • Server operators (may attempt unauthorized access)
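
The clock tolerance amounts to a symmetric bound on the difference between the local clock and a peer's claimed timestamp. A minimal sketch, assuming timestamps in seconds and a hypothetical `within_drift` helper:

```rust
/// Maximum tolerated clock difference: 10 minutes in either direction.
pub const MAX_DRIFT_SECS: u64 = 10 * 60;

/// True if a peer's claimed timestamp is within the tolerated drift of our clock.
pub fn within_drift(local_now: u64, claimed: u64) -> bool {
    local_now.abs_diff(claimed) <= MAX_DRIFT_SECS
}
```

Using `abs_diff` avoids underflow when the claimed time is ahead of the local clock.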

Security Goals

  1. Authentication — Know who you're talking to
  2. Integrity — Detect message tampering
  3. Replay Protection — Reject replayed handshakes
  4. Authorization — Enforce access control per document
  5. Confidentiality — Data encrypted at rest and in transit

CLI Tool (subduction_cli)

Commands

| Command | Description |
|---|---|
| server | Starts a sync node with configurable transports |
| purge | Deletes all stored data |

Features

  • Multi-transport support (WebSocket, HTTP long-poll, Iroh)
  • Flexible key management (command-line, file-based, ephemeral)
  • Peer connection configuration
  • Metrics export (Prometheus)
  • Reverse proxy support
  • NixOS and Home Manager integration

Technologies & Dependencies

Core Rust

| Dependency | Purpose |
|---|---|
| tokio | Async runtime for native targets |
| futures | Async abstractions |
| ed25519-dalek | Ed25519 signatures |
| blake3 | Content hashing |
| serde + ciborium | CBOR serialization |

Networking

| Dependency | Purpose |
|---|---|
| async-tungstenite | WebSocket implementation |
| axum | HTTP server framework |
| iroh | QUIC protocol and NAT traversal |
| hyper | HTTP client/server |

WebAssembly

| Dependency | Purpose |
|---|---|
| wasm-bindgen | JS↔Wasm FFI |
| wasm-bindgen-futures | Async support for Wasm |
| wasm-tracing | Logging for browser |

Testing & Quality

| Tool | Purpose |
|---|---|
| bolero | Property-based fuzzing |
| criterion | Benchmarking |
| playwright | E2E testing for Wasm |

Project Structure

subduction/
├── sedimentree_core/          # Core partitioning & metadata
├── sedimentree_fs_storage/    # Filesystem storage
├── sedimentree_wasm/          # Wasm bindings
├── subduction_crypto/         # Signed types & crypto
├── subduction_core/           # Sync protocol
├── subduction_{http_longpoll,iroh,websocket}/  # Transports
├── subduction_wasm/           # Full Wasm bindings
├── subduction_keyhive/        # Keyhive types
├── subduction_keyhive_policy/ # Keyhive authorization
├── automerge_sedimentree/     # Automerge integration
├── automerge_*_wasm/          # Automerge Wasm bindings
├── subduction_cli/            # CLI server tool
├── design/                    # Protocol documentation
│   ├── security/             # Threat model
│   └── sync/                 # Sync protocols
└── HACKING.md                # Contributor guide

Applications

Subduction's architecture makes it suitable for a wide range of decentralized, collaborative applications:

Real-Time Collaborative Editing

  • Document Editors — Google Docs-style collaborative editing without central servers
  • Code Editors — Pair programming and collaborative coding environments
  • Design Tools — Multi-user design applications (like Figma) with local-first architecture
  • Whiteboards — Real-time collaborative diagramming and brainstorming tools

Local-First Applications

  • Note-Taking Apps — Obsidian-like applications with seamless multi-device sync
  • Task Management — Todo lists and project management tools that work offline
  • Personal Knowledge Bases — Roam Research or Notion alternatives with true data ownership
  • Journaling Apps — Private journals with end-to-end encryption and cross-device sync

Decentralized Social & Communication

  • Chat Applications — End-to-end encrypted messaging without central servers
  • Social Networks — Federated or fully decentralized social platforms
  • Forums & Discussion Boards — Community platforms with no single point of failure
  • Email Alternatives — Decentralized messaging systems

Data Synchronization Infrastructure

  • Database Replication — Syncing distributed databases across nodes
  • Configuration Management — Distributing configuration across a fleet of servers
  • CDN-like Content Distribution — Efficient content propagation across edge nodes
  • IoT Device Sync — Synchronizing state across IoT devices and gateways

Privacy-Focused Applications

  • Healthcare Records — Patient-controlled medical records with selective sharing
  • Financial Data — Personal finance apps with encrypted cloud backup
  • Legal Documents — Secure document sharing for legal proceedings
  • Whistleblower Platforms — Secure, anonymous document sharing

Gaming & Virtual Worlds

  • Multiplayer Game State — Synchronizing game world state without dedicated servers
  • Virtual Worlds — Decentralized metaverse applications
  • Persistent Worlds — MMO-style games with community-run infrastructure

Developer Tools

  • Distributed Version Control — Git-like systems with better merge semantics
  • API Mocking & Testing — Sharing API state across development teams
  • Feature Flags — Distributed feature flag management
  • Configuration Sync — Developer environment configuration sharing

Enterprise Applications

  • Offline-First Field Apps — Applications for workers in low-connectivity environments
  • Multi-Region Deployment — Consistent state across globally distributed systems
  • Compliance & Audit — Tamper-evident logs with cryptographic verification
  • Disaster Recovery — Resilient data storage with no single point of failure

Research & Academic

  • Collaborative Research — Sharing datasets and findings across institutions
  • Lab Notebooks — Electronic lab notebooks with provenance tracking
  • Reproducible Science — Versioned, content-addressed research artifacts

Why Subduction Over Alternatives?

| Feature | Subduction | Traditional Sync | Blockchain |
|---|---|---|---|
| Decentralized | Yes | No | Yes |
| Efficient Bandwidth | Yes (fingerprints) | Varies | No (full replication) |
| Works with Encryption | Yes | Varies | Limited |
| Real-time Updates | Yes | Yes | No (consensus delay) |
| Offline Support | Yes | Limited | No |
| No Token/Cryptocurrency | Yes | Yes | Usually No |
| Flexible Authorization | Yes (Keyhive) | Centralized | Smart contracts |

Development & Deployment

Build Requirements

  • Rust 1.90+
  • For Wasm: wasm-pack
  • For browser testing: Node.js 22+, pnpm

Deployment Options

  • Standalone binary
  • Nix flake (with NixOS/Home Manager modules)
  • Docker containers (via CLI)
  • System services (systemd, launchd)

Reverse Proxy Support

  • Caddy integration
  • Custom TLS/HTTPS setup

Testing Strategy

| Type | Tool/Approach |
|---|---|
| Unit Tests | Standard #[test] inline in modules |
| Property-Based Tests | bolero for fuzz testing |
| E2E Tests | Playwright tests for Wasm bindings |
| Integration Tests | Round-trip transport connection tests |

Current Status & Roadmap

  • Version: 0.5.0 (core crates)
  • Maturity: Early release preview with unstable API
  • Production Use: NOT recommended at this time
  • Active Development: Regular updates and bug fixes

Known Limitations

  • API is unstable and may change
  • Documentation is still evolving
  • Some edge cases may not be fully handled
  • Performance optimization ongoing

Getting Started

Basic Usage (Rust)

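// Example usage with Automerge.
// Illustrative only: `storage`, `transport`, `policy`, and `document_id` are
// placeholders, and the constructor and method names may differ in the unstable API.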
use automerge_sedimentree::AutomergeSedimentree;
use subduction_core::Subduction;

// Initialize with your chosen transport and storage
let subduction = Subduction::new(storage, transport, policy);

// Sync documents
subduction.sync(document_id).await?;

CLI Server

# Start a WebSocket sync server
subduction server --websocket-port 8080

# Start with multiple transports
subduction server --websocket-port 8080 --http-port 8081

# Connect to peers
subduction server --peer ws://other-node:8080

Changelog

  • Mar 4, 2026 -- The first version was created.