useful-notes/hassan/004-subduction-notes.md

2026-03-04 09:58:21 +01:00
## Subduction Project Notes
### Overview
**Subduction** is a peer-to-peer synchronization protocol and implementation for CRDTs (Conflict-free Replicated Data Types) that enables efficient
synchronization of encrypted, partitioned data between peers without requiring a central server.
- **Repository:** https://github.com/inkandswitch/subduction
- **Developed by:** Ink & Switch
- **License:** MIT OR Apache-2.0
- **Language:** Rust (with WebAssembly bindings)
- **Status:** Early release preview (unstable API)
- **Version:** 0.5.0
---
### Core Purpose & Features
#### Key Capabilities
1. **Efficient Sync Protocol** — Uses Sedimentree for history sharding, diffing, and incremental synchronization
2. **Encryption-Friendly** — Works with encrypted data partitions without requiring decryption during sync
3. **Decentralized** — True peer-to-peer synchronization via pluggable transports
4. **Multi-Platform** — Runs on native Rust, WebAssembly (browser & Node.js), and provides a CLI tool
5. **Automerge Integration** — Protocol-agnostic, but originally designed for Automerge documents
#### Design Principles
- **no_std Compatible** — Core logic works without standard library
- **Transport Agnostic** — Protocol-independent message format
- **Policy Separation** — Authentication separate from authorization
- **Subscription-Based** — Efficient update forwarding
- **Content-Addressed** — All data identified by BLAKE3 hash
- **Idempotent** — Receiving the same data twice is safe
- **Compile-Time Validation** — Types make invalid states unrepresentable
- **Newtypes for Domain Concepts** — Prevent mixing different semantic types
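
Several of these principles can be illustrated together. The following is a minimal sketch, not Subduction's actual code: `BlobId` and `BlobStore` are hypothetical names, and the standard library's `DefaultHasher` stands in for BLAKE3 so the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical newtype: the private field means a `BlobId` can only be
/// obtained by hashing content, never from an arbitrary integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BlobId(u64);

impl BlobId {
    /// Content addressing: the ID is derived from the bytes themselves.
    /// ASSUMPTION: `DefaultHasher` is a stand-in here, NOT BLAKE3.
    pub fn of(bytes: &[u8]) -> Self {
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        BlobId(hasher.finish())
    }
}

/// Hypothetical in-memory store keyed by content ID.
#[derive(Default)]
pub struct BlobStore {
    blobs: HashMap<BlobId, Vec<u8>>,
}

impl BlobStore {
    /// Idempotent insert: storing the same bytes twice leaves one entry.
    /// The flag reports whether the blob was new.
    pub fn put(&mut self, bytes: Vec<u8>) -> (BlobId, bool) {
        let id = BlobId::of(&bytes);
        let is_new = !self.blobs.contains_key(&id);
        self.blobs.entry(id).or_insert(bytes);
        (id, is_new)
    }

    pub fn get(&self, id: BlobId) -> Option<&[u8]> {
        self.blobs.get(&id).map(|v| v.as_slice())
    }

    pub fn len(&self) -> usize {
        self.blobs.len()
    }
}
```

Because the key is a function of the content, a duplicate delivery maps to the same entry, which is what makes repeated receipt harmless.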
---
### Architecture & Components
The project is organized as a Rust workspace with 16 member crates:
#### Core Layer (3 crates)
| Crate | Description |
|---------------------|------------------------------------------------------------------------------------------------------------------|
| `sedimentree_core` | Core data partitioning scheme using depth-based hierarchical layers for efficient metadata-based synchronization |
| `subduction_crypto` | Cryptographic types: Ed25519-signed payloads with type-state pattern for verification status |
| `subduction_core` | Main synchronization protocol implementation with pluggable storage, connections, and policies |
#### Storage Layer (1 crate)
| Crate | Description |
|--------------------------|----------------------------------------------------------|
| `sedimentree_fs_storage` | Filesystem-based persistent storage for Sedimentree data |
#### Transport Layer (3 crates)
| Crate | Description |
|----------------------------|---------------------------------------------------------------------|
| `subduction_http_longpoll` | HTTP long-poll transport for restrictive network environments |
| `subduction_iroh` | Iroh (QUIC) transport for direct P2P connections with NAT traversal |
| `subduction_websocket` | WebSocket transport for browser and Node.js environments |
#### Integration Layer (4 crates)
| Crate | Description |
|-----------------------------|--------------------------------------------------------------------|
| `automerge_sedimentree` | Adapter for synchronizing Automerge CRDT documents via Sedimentree |
| `subduction_keyhive` | Integration with Keyhive access control system |
| `subduction_keyhive_policy` | Keyhive-based authorization policy for connections |
| `bijou64` | Utility crate (64-bit optimizations) |
#### WebAssembly Bindings (4 crates)
| Crate | Description |
|------------------------------|---------------------------------------------------|
| `sedimentree_wasm` | Wasm bindings for Sedimentree |
| `subduction_wasm` | Wasm bindings for browser and Node.js |
| `automerge_sedimentree_wasm` | Wasm wrapper for Automerge + Sedimentree |
| `automerge_subduction_wasm` | Full sync stack (Automerge + Subduction) for Wasm |
#### Tools (1 crate)
| Crate | Description |
|------------------|--------------------------------------------------------------|
| `subduction_cli` | Command-line tool for running sync servers and managing data |
---
### Technical Architecture
#### Key Abstractions
##### FutureForm Trait
Enables portable async code across native Rust (Tokio) and WebAssembly (single-threaded):
- Provides two implementations: `Sendable` (Send + Sync) for multi-threaded and `Local` for single-threaded environments
- Uses macro-based code generation to support both forms
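
One way such a trait could be shaped is sketched below. This is an assumption-laden illustration, not the real definition: the associated type, the `Storage::load` method, `MemStorage`, the `impl_storage!` macro, and the tiny `block_on` helper are all hypothetical, and the real bounds may differ.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical shape: each form picks its own boxed future type.
pub trait FutureForm {
    type Future<'a, T: 'a>: Future<Output = T> + 'a;
}

/// Multi-threaded form: futures must be `Send`.
pub enum Sendable {}
/// Single-threaded form (e.g. Wasm): no `Send` requirement.
pub enum Local {}

impl FutureForm for Sendable {
    type Future<'a, T: 'a> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;
}
impl FutureForm for Local {
    type Future<'a, T: 'a> = Pin<Box<dyn Future<Output = T> + 'a>>;
}

/// Hypothetical storage trait, generic over the future form.
pub trait Storage<F: FutureForm> {
    fn load<'a>(&'a self, key: u64) -> F::Future<'a, Option<Vec<u8>>>;
}

#[derive(Default)]
pub struct MemStorage {
    pub items: HashMap<u64, Vec<u8>>,
}

/// One macro body generates the impl for both forms.
macro_rules! impl_storage {
    ($form:ty) => {
        impl Storage<$form> for MemStorage {
            fn load<'a>(
                &'a self,
                key: u64,
            ) -> <$form as FutureForm>::Future<'a, Option<Vec<u8>>> {
                Box::pin(async move { self.items.get(&key).cloned() })
            }
        }
    };
}
impl_storage!(Sendable);
impl_storage!(Local);

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor for the example only: it polls in a loop, which is
/// fine for futures that complete without waiting on I/O.
pub fn block_on<Fut: Future>(fut: Fut) -> Fut::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The caller chooses the form at the type level, for example `<MemStorage as Storage<Sendable>>::load(&store, 7)` on Tokio and `Storage<Local>` in a browser.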
##### Generic Parameters on Subduction
```rust
pub struct Subduction<'a, F: FutureForm, S: Storage<F>, C: Connection<F>,
P: ConnectionPolicy<F> + StoragePolicy<F>,
M: DepthMetric, const N: usize>
```
Compile-time configuration enables type-safe instantiation without runtime dispatch overhead.
##### Policy Traits (Capability-Based Access Control)
| Trait | Purpose |
|--------------------|-------------------------------------------------------------------------------|
| `ConnectionPolicy` | Authorization at the connection level (is this peer allowed?) |
| `StoragePolicy` | Authorization at the document level (can this peer read/write this document?) |
| `OpenPolicy` | Permissive default (allows everything) |
| `KeyhivePolicy` | Real authorization with Keyhive integration |
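
To make the split between the two policy levels concrete, here is a simplified, synchronous sketch. The real traits are parameterized by `FutureForm` and are presumably async; the method names (`authorize_connect`, `authorize_fetch`, `authorize_put`), the `AllowList` type, and `handle_put` are assumptions made for illustration.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PeerId(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DocId(pub u32);

#[derive(Debug, PartialEq, Eq)]
pub struct Denied;

/// Connection level: is this peer allowed to talk to us at all?
pub trait ConnectionPolicy {
    fn authorize_connect(&self, peer: PeerId) -> Result<(), Denied>;
}

/// Document level: may this peer read or write this document?
pub trait StoragePolicy {
    fn authorize_fetch(&self, peer: PeerId, doc: DocId) -> Result<(), Denied>;
    fn authorize_put(&self, peer: PeerId, doc: DocId) -> Result<(), Denied>;
}

/// Permissive default: allows everything.
pub struct OpenPolicy;
impl ConnectionPolicy for OpenPolicy {
    fn authorize_connect(&self, _: PeerId) -> Result<(), Denied> { Ok(()) }
}
impl StoragePolicy for OpenPolicy {
    fn authorize_fetch(&self, _: PeerId, _: DocId) -> Result<(), Denied> { Ok(()) }
    fn authorize_put(&self, _: PeerId, _: DocId) -> Result<(), Denied> { Ok(()) }
}

/// Hypothetical restrictive policy: known peers may read, listed writers may write.
pub struct AllowList {
    pub peers: HashSet<PeerId>,
    pub writers: HashSet<(PeerId, DocId)>,
}
impl ConnectionPolicy for AllowList {
    fn authorize_connect(&self, peer: PeerId) -> Result<(), Denied> {
        if self.peers.contains(&peer) { Ok(()) } else { Err(Denied) }
    }
}
impl StoragePolicy for AllowList {
    fn authorize_fetch(&self, peer: PeerId, _doc: DocId) -> Result<(), Denied> {
        self.authorize_connect(peer)
    }
    fn authorize_put(&self, peer: PeerId, doc: DocId) -> Result<(), Denied> {
        if self.writers.contains(&(peer, doc)) { Ok(()) } else { Err(Denied) }
    }
}

/// Generic over the policy, so the check is resolved at compile time
/// rather than through a trait object.
pub fn handle_put<P: ConnectionPolicy + StoragePolicy>(
    policy: &P,
    peer: PeerId,
    doc: DocId,
) -> Result<(), Denied> {
    policy.authorize_connect(peer)?;
    policy.authorize_put(peer, doc)
}
```

Swapping `OpenPolicy` for a stricter policy changes only the type argument, not the calling code.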
##### Cryptographic Types
| Type | Description |
|------------------------|-----------------------------------------------------------|
| `Signed<T>` | Payload with Ed25519 signature (unverified) |
| `VerifiedSignature<T>` | Witness that signature has been verified |
| `VerifiedMeta<T>` | Witness that signature is valid AND blob matches metadata |

The type-state pattern prevents "verify and forget" bugs at compile time.
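
A minimal sketch of that type-state idea follows. It is not the `subduction_crypto` API: `seal`, `from_parts`, and `try_verify` are hypothetical names, and `toy_signature` is a forgeable hash used only so the example runs without an Ed25519 dependency.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// ASSUMPTION: toy stand-in for an Ed25519 signature. Anyone can compute
/// it, so it provides no security; it only drives the control flow.
pub fn toy_signature(signer: u64, payload: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    signer.hash(&mut hasher);
    payload.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug)]
pub struct InvalidSignature;

/// Unverified state: deliberately has no accessor for the payload.
pub struct Signed<T> {
    signer: u64,
    payload: T,
    signature: u64,
}

/// Witness state: can only be constructed by `Signed::try_verify`.
#[derive(Debug)]
pub struct VerifiedSignature<T> {
    signer: u64,
    payload: T,
}

impl<T: AsRef<[u8]>> Signed<T> {
    /// Sign a payload locally.
    pub fn seal(signer: u64, payload: T) -> Self {
        let signature = toy_signature(signer, payload.as_ref());
        Signed { signer, payload, signature }
    }

    /// As if decoded from the wire: nothing is trusted yet.
    pub fn from_parts(signer: u64, payload: T, signature: u64) -> Self {
        Signed { signer, payload, signature }
    }

    /// Consumes the unverified value; the payload is reachable only on success.
    pub fn try_verify(self) -> Result<VerifiedSignature<T>, InvalidSignature> {
        if toy_signature(self.signer, self.payload.as_ref()) == self.signature {
            Ok(VerifiedSignature { signer: self.signer, payload: self.payload })
        } else {
            Err(InvalidSignature)
        }
    }
}

impl<T> VerifiedSignature<T> {
    pub fn payload(&self) -> &T {
        &self.payload
    }
    pub fn signer(&self) -> u64 {
        self.signer
    }
}
```

A function that takes `VerifiedSignature<T>` cannot be called with data that skipped verification, because there is no other way to build that type.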
##### NonceCache
- Replay protection for handshake protocol
- Time-based bucket system (4 buckets × 3 min = 12 min window)
- Lazy garbage collection, no background task needed
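
The bucket scheme above can be sketched as follows. This is an illustrative reconstruction from the three bullet points, not the project's implementation: the 16-byte nonce size, the `check_and_insert` name, and passing the time in as seconds are assumptions, and the sketch assumes time does not move backwards.

```rust
use std::collections::HashSet;

pub const BUCKETS: usize = 4;
pub const BUCKET_SECS: u64 = 3 * 60; // 4 buckets x 3 min = 12 min window

/// ASSUMPTION: nonce width chosen for the example.
pub type Nonce = [u8; 16];

#[derive(Default)]
pub struct NonceCache {
    /// Each slot holds (epoch, nonces first seen during that epoch).
    slots: [Option<(u64, HashSet<Nonce>)>; BUCKETS],
}

impl NonceCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns `true` if the nonce is fresh (and records it),
    /// `false` if it was already seen inside the window.
    pub fn check_and_insert(&mut self, nonce: Nonce, now_secs: u64) -> bool {
        let epoch = now_secs / BUCKET_SECS;

        // Lazy garbage collection: expired buckets are dropped on access,
        // so no background task is needed.
        for slot in self.slots.iter_mut() {
            let expired = slot
                .as_ref()
                .map_or(false, |(e, _)| e + BUCKETS as u64 <= epoch);
            if expired {
                *slot = None;
            }
        }

        // Replay check across every live bucket.
        if self.slots.iter().flatten().any(|(_, seen)| seen.contains(&nonce)) {
            return false;
        }

        // Record the nonce in the bucket for the current epoch.
        let idx = (epoch % BUCKETS as u64) as usize;
        let slot = &mut self.slots[idx];
        if slot.as_ref().map_or(true, |(e, _)| *e != epoch) {
            *slot = Some((epoch, HashSet::new()));
        }
        if let Some((_, seen)) = slot {
            seen.insert(nonce);
        }
        true
    }
}
```

In this sketch a nonce is forgotten once its bucket is four epochs old, so it is remembered for somewhere between 9 and 12 minutes depending on when in the bucket it arrived.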
#### Sedimentree Data Structure
Sedimentree organizes CRDT commits into depth-stratified layers, where a commit's depth is the number of leading zero bytes in its content hash:
| Depth | Contains |
|----------|------------------------------------|
| Depth 0 | All commits |
| Depth 1 | Commits with 1+ leading zero bytes |
| Depth 2 | Commits with 2+ leading zero bytes |
| Depth 3+ | Further filtering |

**Benefits:**
- Efficient sync through fingerprint-based reconciliation
- Compare summaries at higher depths first
- Drill down only where differences exist
- ~75% smaller item identifiers during reconciliation: 8-byte SipHash fingerprints are exchanged instead of 32-byte digests
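
The depth rule in the table can be written down directly. This is a sketch under the assumption that depth is exactly the count of leading zero bytes of a 32-byte digest; `depth` and `layer` are hypothetical helper names, not functions from `sedimentree_core`.

```rust
pub type Digest = [u8; 32];

/// Depth of a commit: the number of leading zero bytes in its digest.
pub fn depth(digest: &Digest) -> usize {
    digest.iter().take_while(|&&byte| byte == 0).count()
}

/// All commits that belong to the layer at `min_depth` or deeper.
/// Depth 0 therefore contains every commit.
pub fn layer(commits: &[Digest], min_depth: usize) -> Vec<Digest> {
    commits
        .iter()
        .copied()
        .filter(|digest| depth(digest) >= min_depth)
        .collect()
}
```

For a uniformly distributed hash, each additional leading zero byte is 256 times rarer, so every layer is a much smaller summary of the one below it, and peers can compare the sparse layers first.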
#### Connection Lifecycle
```
Client ─(TCP/WebSocket)─→ Server
Client ─(Signed Challenge)─→ Server
Server ─(Signed Response)─→ Client
Both verify signatures and check ConnectionPolicy
Authenticated connection established
```
---
### Protocol Layers
| Layer | Component | Description |
|---------|-------------|--------------------------------------------------------------------------|
| Layer 1 | Transport | WebSocket, HTTP Long-Poll, or Iroh/QUIC |
| Layer 2 | Connection | Handshake with mutual Ed25519 authentication, policy-based authorization |
| Layer 3 | Sync | Batch sync (pull), Incremental sync (push), Subscriptions |
| Layer 4 | Application | Automerge, custom CRDTs, or other data structures |
---
### Sync Protocols
#### Batch Sync
- **Type:** Pull-based (request/response)
- **Mechanism:** Uses fingerprint-based reconciliation for compact diffs
- **Delivery:** Guaranteed
- **Use Cases:** Initial sync, reconnection, consistency checks
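
A rough sketch of fingerprint-based reconciliation, seen from the responder's side, is shown below. The message shape, the `reconcile` and `fingerprint` names, and the seed parameter are assumptions; `DefaultHasher` stands in for the keyed SipHash so that the example needs only the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

pub type Digest = [u8; 32];

/// 8-byte fingerprint of a 32-byte digest.
/// ASSUMPTION: mixing a seed into `DefaultHasher` stands in for keyed SipHash.
pub fn fingerprint(seed: u64, digest: &Digest) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    digest.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug, PartialEq, Eq)]
pub struct Diff {
    /// Items we hold that the requester did not list.
    pub to_send: Vec<Digest>,
    /// Fingerprints the requester listed that we do not recognize.
    pub to_request: Vec<u64>,
}

/// Compare the requester's fingerprints against our local digests.
pub fn reconcile(seed: u64, remote_fps: &HashSet<u64>, local: &[Digest]) -> Diff {
    let local_by_fp: HashMap<u64, Digest> = local
        .iter()
        .map(|digest| (fingerprint(seed, digest), *digest))
        .collect();

    let mut to_send: Vec<Digest> = local_by_fp
        .iter()
        .filter(|(fp, _)| !remote_fps.contains(*fp))
        .map(|(_, digest)| *digest)
        .collect();
    to_send.sort();

    let mut to_request: Vec<u64> = remote_fps
        .iter()
        .filter(|fp| !local_by_fp.contains_key(*fp))
        .copied()
        .collect();
    to_request.sort();

    Diff { to_send, to_request }
}
```

Only the two difference lists cross the wire after the fingerprint exchange, which is what keeps the diff compact when the peers already share most of their history.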
#### Incremental Sync
- **Type:** Push-based (fire-and-forget)
- **Mechanism:** Immediate updates for active editing
- **Delivery:** Best-effort
- **Use Cases:** Real-time collaboration
#### Subscriptions
- Optional per-document subscription
- Updates forwarded only to authorized subscribed peers
- Efficient batch update forwarding
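
The forwarding rule (only subscribed and authorized peers receive an update) might look like this in outline. The `Subscriptions` type and its methods are hypothetical, and the authorization check is passed in as a closure rather than a real `StoragePolicy`.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct PeerId(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DocId(pub u32);

/// Hypothetical per-document subscription table.
#[derive(Default)]
pub struct Subscriptions {
    by_doc: HashMap<DocId, BTreeSet<PeerId>>,
}

impl Subscriptions {
    pub fn subscribe(&mut self, peer: PeerId, doc: DocId) {
        self.by_doc.entry(doc).or_default().insert(peer);
    }

    pub fn unsubscribe(&mut self, peer: PeerId, doc: DocId) {
        if let Some(peers) = self.by_doc.get_mut(&doc) {
            peers.remove(&peer);
        }
    }

    /// Peers that should receive an update to `doc`: subscribed,
    /// authorized to read it, and not the peer that sent it.
    pub fn recipients(
        &self,
        doc: DocId,
        sender: PeerId,
        may_read: impl Fn(PeerId, DocId) -> bool,
    ) -> Vec<PeerId> {
        self.by_doc
            .get(&doc)
            .into_iter()
            .flatten()
            .copied()
            .filter(|&peer| peer != sender && may_read(peer, doc))
            .collect()
    }
}
```

Checking authorization at forwarding time, not only at subscription time, means a peer whose access is revoked stops receiving updates even if its subscription entry is still present.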
#### Reconnection
- Automatic detection and recovery
- Batch sync used to catch up after missed incremental updates
---
### Security Model
#### Trust Assumptions
**Trusted:**
- Ed25519 signatures are unforgeable
- BLAKE3 is collision-resistant and preimage-resistant
- TLS provides transport encryption when used
- Private keys remain secret
- Local storage is not compromised

**Not Trusted:**
- Network (attacker can observe, delay, drop, inject)
- Peers (may be malicious, compromised, buggy)
- Clocks (tolerate ±10 minutes drift)
- Server operators (may attempt unauthorized access)
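
The clock-drift tolerance could be enforced with a check like the one below. It is only a sketch: the function names are hypothetical, timestamps are assumed to be Unix seconds, and treating the 10-minute bound as inclusive is a guess.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Tolerated difference between a peer's claimed time and the local clock.
pub const MAX_DRIFT_SECS: u64 = 10 * 60;

/// Accept a timestamp that is at most `MAX_DRIFT_SECS` ahead of or behind
/// the local clock. ASSUMPTION: the bound is inclusive.
pub fn timestamp_acceptable(claimed_secs: u64, local_now_secs: u64) -> bool {
    claimed_secs.abs_diff(local_now_secs) <= MAX_DRIFT_SECS
}

/// Current Unix time in seconds; falls back to 0 if the clock is before the epoch.
pub fn now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| elapsed.as_secs())
        .unwrap_or(0)
}
```

Because clocks are untrusted, a timestamp check like this can only bound how stale a handshake is; rejecting exact replays inside the window is the job of the nonce cache.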
#### Security Goals
1. **Authentication** — Know who you're talking to
2. **Integrity** — Detect message tampering
3. **Replay Protection** — Reject replayed handshakes
4. **Authorization** — Enforce access control per document
5. **Confidentiality** — Data encrypted at rest and in transit
---
### CLI Tool (`subduction_cli`)
#### Commands
| Command | Description |
|----------|-------------------------------------------------|
| `server` | Starts a sync node with configurable transports |
| `purge` | Deletes all stored data |
#### Features
- Multi-transport support (WebSocket, HTTP long-poll, Iroh)
- Flexible key management (command-line, file-based, ephemeral)
- Peer connection configuration
- Metrics export (Prometheus)
- Reverse proxy support
- NixOS and Home Manager integration
---
### Technologies & Dependencies
#### Core Rust
| Dependency | Purpose |
|----------------------|----------------------------------|
| `tokio` | Async runtime for native targets |
| `futures` | Async abstractions |
| `ed25519-dalek` | Ed25519 signatures |
| `blake3` | Content hashing |
| `serde` + `ciborium` | CBOR serialization |
#### Networking
| Dependency | Purpose |
|---------------------|---------------------------------|
| `async-tungstenite` | WebSocket implementation |
| `axum` | HTTP server framework |
| `iroh` | QUIC protocol and NAT traversal |
| `hyper` | HTTP client/server |
#### WebAssembly
| Dependency | Purpose |
|------------------------|------------------------|
| `wasm-bindgen` | JS↔Wasm FFI |
| `wasm-bindgen-futures` | Async support for Wasm |
| `wasm-tracing` | Logging for browser |
#### Testing & Quality
| Tool | Purpose |
|--------------|------------------------|
| `bolero` | Property-based fuzzing |
| `criterion` | Benchmarking |
| `playwright` | E2E testing for Wasm |
---
### Project Structure
```
subduction/
├── sedimentree_core/ # Core partitioning & metadata
├── sedimentree_fs_storage/ # Filesystem storage
├── sedimentree_wasm/ # Wasm bindings
├── subduction_crypto/ # Signed types & crypto
├── subduction_core/ # Sync protocol
├── subduction_{http_longpoll,iroh,websocket}/ # Transports
├── subduction_wasm/ # Full Wasm bindings
├── subduction_keyhive/ # Keyhive types
├── subduction_keyhive_policy/ # Keyhive authorization
├── automerge_sedimentree/ # Automerge integration
├── automerge_*_wasm/ # Automerge Wasm bindings
├── subduction_cli/ # CLI server tool
├── design/ # Protocol documentation
│ ├── security/ # Threat model
│ └── sync/ # Sync protocols
└── HACKING.md # Contributor guide
```
---
### Applications
Subduction's architecture makes it suitable for a wide range of decentralized, collaborative applications:
#### Real-Time Collaborative Editing
- **Document Editors** — Google Docs-style collaborative editing without central servers
- **Code Editors** — Pair programming and collaborative coding environments
- **Design Tools** — Multi-user design applications (like Figma) with local-first architecture
- **Whiteboards** — Real-time collaborative diagramming and brainstorming tools
#### Local-First Applications
- **Note-Taking Apps** — Obsidian-like applications with seamless multi-device sync
- **Task Management** — Todo lists and project management tools that work offline
- **Personal Knowledge Bases** — Roam Research or Notion alternatives with true data ownership
- **Journaling Apps** — Private journals with end-to-end encryption and cross-device sync
#### Decentralized Social & Communication
- **Chat Applications** — End-to-end encrypted messaging without central servers
- **Social Networks** — Federated or fully decentralized social platforms
- **Forums & Discussion Boards** — Community platforms with no single point of failure
- **Email Alternatives** — Decentralized messaging systems
#### Data Synchronization Infrastructure
- **Database Replication** — Syncing distributed databases across nodes
- **Configuration Management** — Distributing configuration across a fleet of servers
- **CDN-like Content Distribution** — Efficient content propagation across edge nodes
- **IoT Device Sync** — Synchronizing state across IoT devices and gateways
#### Privacy-Focused Applications
- **Healthcare Records** — Patient-controlled medical records with selective sharing
- **Financial Data** — Personal finance apps with encrypted cloud backup
- **Legal Documents** — Secure document sharing for legal proceedings
- **Whistleblower Platforms** — Secure, anonymous document sharing
#### Gaming & Virtual Worlds
- **Multiplayer Game State** — Synchronizing game world state without dedicated servers
- **Virtual Worlds** — Decentralized metaverse applications
- **Persistent Worlds** — MMO-style games with community-run infrastructure
#### Developer Tools
- **Distributed Version Control** — Git-like systems with better merge semantics
- **API Mocking & Testing** — Sharing API state across development teams
- **Feature Flags** — Distributed feature flag management
- **Configuration Sync** — Developer environment configuration sharing
#### Enterprise Applications
- **Offline-First Field Apps** — Applications for workers in low-connectivity environments
- **Multi-Region Deployment** — Consistent state across globally distributed systems
- **Compliance & Audit** — Tamper-evident logs with cryptographic verification
- **Disaster Recovery** — Resilient data storage with no single point of failure
#### Research & Academic
- **Collaborative Research** — Sharing datasets and findings across institutions
- **Lab Notebooks** — Electronic lab notebooks with provenance tracking
- **Reproducible Science** — Versioned, content-addressed research artifacts
---
### Why Subduction Over Alternatives?
| Feature | Subduction | Traditional Sync | Blockchain |
|-------------------------|--------------------|------------------|-----------------------|
| Decentralized | Yes | No | Yes |
| Efficient Bandwidth | Yes (fingerprints) | Varies | No (full replication) |
| Works with Encryption | Yes | Varies | Limited |
| Real-time Updates | Yes | Yes | No (consensus delay) |
| Offline Support | Yes | Limited | No |
| No Token/Cryptocurrency | Yes | Yes | Usually No |
| Flexible Authorization | Yes (Keyhive) | Centralized | Smart contracts |
---
### Development & Deployment
#### Build Requirements
- Rust 1.90+
- For Wasm: wasm-pack
- For browser testing: Node.js 22+, pnpm
#### Deployment Options
- Standalone binary
- Nix flake (with NixOS/Home Manager modules)
- Docker containers (via CLI)
- System services (systemd, launchd)
#### Reverse Proxy Support
- Caddy integration
- Custom TLS/HTTPS setup
---
### Testing Strategy
| Type | Tool/Approach |
|----------------------|---------------------------------------|
| Unit Tests | Standard `#[test]` inline in modules |
| Property-Based Tests | `bolero` for fuzz testing |
| E2E Tests | Playwright tests for Wasm bindings |
| Integration Tests | Round-trip transport connection tests |
---
### Current Status & Roadmap
- **Version:** 0.5.0 (core crates)
- **Maturity:** Early release preview with unstable API
- **Production Use:** NOT recommended at this time
- **Active Development:** Regular updates and bug fixes
#### Known Limitations
- API is unstable and may change
- Documentation is still evolving
- Some edge cases may not be fully handled
- Performance optimization ongoing
---
### Getting Started
#### Basic Usage (Rust)
```rust
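// Example usage with Automerge.
// Illustrative only: the API is unstable, so check the crate docs for the
// current constructor and method signatures.
use automerge_sedimentree::AutomergeSedimentree;
use subduction_core::Subduction;
// Initialize with your chosen storage, transport, and policy
// (`storage`, `transport`, and `policy` are constructed elsewhere).
let subduction = Subduction::new(storage, transport, policy);
// Sync a document by ID; `.await?` needs an async context that returns `Result`.
subduction.sync(document_id).await?;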
```
#### CLI Server
```bash
# Start a WebSocket sync server
subduction server --websocket-port 8080
# Start with multiple transports
subduction server --websocket-port 8080 --http-port 8081
# Connect to peers
subduction server --peer ws://other-node:8080
```
---
### References
- [Ink & Switch Research](https://www.inkandswitch.com/)
- [Local-First Software](https://www.inkandswitch.com/local-first/)
- [Automerge](https://automerge.org/)
- [CRDTs](https://crdt.tech/)
- [Iroh](https://iroh.computer/)
## Changelog
* **Mar 4, 2026** -- The first version was created.