diff --git a/hassan/004-subduction-notes.md b/hassan/004-subduction-notes.md
new file mode 100644
index 0000000..debd869
--- /dev/null
+++ b/hassan/004-subduction-notes.md
@@ -0,0 +1,492 @@
+## Subduction Project Notes
+
+### Overview
+
+**Subduction** is a peer-to-peer synchronization protocol and implementation for CRDTs (Conflict-free Replicated Data Types) that enables efficient
+synchronization of encrypted, partitioned data between peers without requiring a central server.
+
+- **Repository:** https://github.com/inkandswitch/subduction
+- **Developed by:** Ink & Switch
+- **License:** MIT OR Apache-2.0
+- **Language:** Rust (with WebAssembly bindings)
+- **Status:** Early release preview (unstable API)
+- **Version:** 0.5.0
+
+---
+
+### Core Purpose & Features
+
+#### Key Capabilities
+
+1. **Efficient Sync Protocol** — Uses Sedimentree for history sharding, diffing, and incremental synchronization
+2. **Encryption-Friendly** — Works with encrypted data partitions without requiring decryption during sync
+3. **Decentralized** — True peer-to-peer synchronization via pluggable transports
+4. **Multi-Platform** — Runs on native Rust, WebAssembly (browser & Node.js), and provides a CLI tool
+5. **Automerge Integration** — While protocol-agnostic, it was originally designed for Automerge documents
+
+#### Design Principles
+
+- **no_std Compatible** — Core logic works without the standard library
+- **Transport Agnostic** — Protocol-independent message format
+- **Policy Separation** — Authentication is separate from authorization
+- **Subscription-Based** — Efficient update forwarding
+- **Content-Addressed** — All data is identified by its BLAKE3 hash
+- **Idempotent** — Receiving the same data twice is safe
+- **Compile-Time Validation** — Types make invalid states unrepresentable
+- **Newtypes for Domain Concepts** — Prevent mixing different semantic types
+
+---
+
+### Architecture & Components
+
+The project is organized as a Rust workspace with 16 member crates:
+
+#### Core Layer (3 crates)
+
+| Crate | Description |
+|-------|-------------|
+| `sedimentree_core` | Core data partitioning scheme using depth-based hierarchical layers for efficient metadata-based synchronization |
+| `subduction_crypto` | Cryptographic types: Ed25519-signed payloads with type-state pattern for verification status |
+| `subduction_core` | Main synchronization protocol implementation with pluggable storage, connections, and policies |
+
+#### Storage Layer (1 crate)
+
+| Crate | Description |
+|-------|-------------|
+| `sedimentree_fs_storage` | Filesystem-based persistent storage for Sedimentree data |
+
+#### Transport Layer (3 crates)
+
+| Crate | Description |
+|-------|-------------|
+| `subduction_http_longpoll` | HTTP long-poll transport for restrictive network environments |
+| `subduction_iroh` | Iroh (QUIC) transport for direct P2P connections with NAT traversal |
+| `subduction_websocket` | WebSocket transport for browser and Node.js environments |
+
+#### Integration Layer (4 crates)
+
+| Crate | Description |
+|-------|-------------|
+| `automerge_sedimentree` | Adapter for synchronizing Automerge CRDT documents via Sedimentree |
+| `subduction_keyhive` | Integration with Keyhive access control system |
+| `subduction_keyhive_policy` | Keyhive-based authorization policy for connections |
+| `bijou64` | Utility crate (64-bit optimizations) |
+
+#### WebAssembly Bindings (4 crates)
+
+| Crate | Description |
+|-------|-------------|
+| `sedimentree_wasm` | Wasm bindings for Sedimentree |
+| `subduction_wasm` | Wasm bindings for browser and Node.js |
+| `automerge_sedimentree_wasm` | Wasm wrapper for Automerge + Sedimentree |
+| `automerge_subduction_wasm` | Full sync stack (Automerge + Subduction) for Wasm |
+
+#### Tools (1 crate)
+
+| Crate | Description |
+|-------|-------------|
+| `subduction_cli` | Command-line tool for running sync servers and managing data |
+
+---
+
+### Technical Architecture
+
+#### Key Abstractions
+
+##### FutureForm Trait
+
+Enables portable async code across native Rust (Tokio) and WebAssembly (single-threaded):
+
+- Provides two implementations: `Sendable` (Send + Sync) for multi-threaded and `Local` for single-threaded environments
+- Uses macro-based code generation to support both forms
+
+##### Generic Parameters on Subduction
+
+```rust
+pub struct Subduction<'a, F: FutureForm, S: Storage, C: Connection,
+                      P: ConnectionPolicy + StoragePolicy,
+                      M: DepthMetric, const N: usize>
+```
+
+Compile-time configuration enables type-safe instantiation without runtime dispatch overhead.
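A minimal sketch of what compile-time configuration via generic parameters can look like, using simplified stand-ins rather than the real Subduction API. The `Storage` and `Policy` traits, `MemoryStorage`, `AllowList`, `Node`, and the `MAX_BLOB` const parameter are all hypothetical names chosen for illustration. The point is only that each combination of type parameters is monomorphized, so storage and policy calls are resolved statically instead of through `dyn` trait objects.

```rust
// Hypothetical stand-ins for illustration; NOT the real Subduction traits or types.
trait Storage {
    fn put(&mut self, blob: &[u8]);
    fn count(&self) -> usize;
}

trait Policy {
    fn allows(&self, peer_id: u8) -> bool;
}

#[derive(Default)]
struct MemoryStorage {
    blobs: Vec<Vec<u8>>,
}

impl Storage for MemoryStorage {
    fn put(&mut self, blob: &[u8]) {
        self.blobs.push(blob.to_vec());
    }
    fn count(&self) -> usize {
        self.blobs.len()
    }
}

/// Permissive policy: every peer is allowed.
struct OpenPolicy;

/// Restrictive policy: only the listed peer ids are allowed.
struct AllowList<const N: usize>([u8; N]);

impl Policy for OpenPolicy {
    fn allows(&self, _peer_id: u8) -> bool {
        true
    }
}

impl<const N: usize> Policy for AllowList<N> {
    fn allows(&self, peer_id: u8) -> bool {
        self.0.contains(&peer_id)
    }
}

/// A node configured entirely through type and const parameters.
struct Node<S: Storage, P: Policy, const MAX_BLOB: usize> {
    storage: S,
    policy: P,
}

impl<S: Storage, P: Policy, const MAX_BLOB: usize> Node<S, P, MAX_BLOB> {
    fn new(storage: S, policy: P) -> Self {
        Self { storage, policy }
    }

    /// Stores `blob` if the peer is allowed and the blob fits; returns whether it was stored.
    fn receive(&mut self, peer_id: u8, blob: &[u8]) -> bool {
        if blob.len() > MAX_BLOB || !self.policy.allows(peer_id) {
            return false;
        }
        self.storage.put(blob);
        true
    }

    fn stored(&self) -> usize {
        self.storage.count()
    }
}

fn main() {
    // Two distinct concrete types are generated here; no vtable lookup is involved.
    let mut open: Node<MemoryStorage, OpenPolicy, 16> =
        Node::new(MemoryStorage::default(), OpenPolicy);
    let mut strict: Node<MemoryStorage, AllowList<2>, 16> =
        Node::new(MemoryStorage::default(), AllowList([1, 2]));

    let open_ok = open.receive(9, b"hello");
    let strict_ok = strict.receive(9, b"hello");
    println!("open accepted: {open_ok}, strict accepted: {strict_ok}");
    println!("open stored: {}, strict stored: {}", open.stored(), strict.stored());
}
```

Swapping `OpenPolicy` for `AllowList<2>` changes the node's type, so a function that expects one configuration cannot be handed the other by accident. The trade-off is that every distinct combination compiles to its own copy of the code.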
+
+##### Policy Traits (Capability-Based Access Control)
+
+| Trait | Purpose |
+|-------|---------|
+| `ConnectionPolicy` | Authorization at the connection level (is this peer allowed?) |
+| `StoragePolicy` | Authorization at the document level (can this peer read/write this document?) |
+| `OpenPolicy` | Permissive default (allows everything) |
+| `KeyhivePolicy` | Real authorization with Keyhive integration |
+
+##### Cryptographic Types
+
+| Type | Description |
+|------|-------------|
+| `Signed` | Payload with Ed25519 signature (unverified) |
+| `VerifiedSignature` | Witness that signature has been verified |
+| `VerifiedMeta` | Witness that signature is valid AND blob matches metadata |
+
+The type-state pattern prevents "verify and forget" bugs at compile time.
+
+##### NonceCache
+
+- Replay protection for handshake protocol
+- Time-based bucket system (4 buckets × 3 min = 12 min window)
+- Lazy garbage collection, no background task needed
+
+#### Sedimentree Data Structure
+
+Organizes CRDT data into depth-stratified layers based on content hash:
+
+| Depth | Contains |
+|-------|----------|
+| Depth 0 | All commits |
+| Depth 1 | Commits with 1+ leading zero bytes |
+| Depth 2 | Commits with 2+ leading zero bytes |
+| Depth 3+ | Further filtering |
+
+**Benefits:**
+
+- Efficient sync through fingerprint-based reconciliation
+- Compare summaries at higher depths first
+- Drill down only where differences exist
+- ~75% bandwidth reduction via 8-byte SipHash fingerprints instead of 32-byte digests
+
+#### Connection Lifecycle
+
+```
+Client ─(TCP/WebSocket)─→ Server
+Client ─(Signed Challenge)─→ Server
+Server ─(Signed Response)─→ Client
+Both verify signatures and check ConnectionPolicy
+↓
+Authenticated connection established
+```
+
+---
+
+### Protocol Layers
+
+| Layer | Component | Description |
+|-------|-----------|-------------|
+| Layer 1 | Transport | WebSocket, HTTP Long-Poll, or Iroh/QUIC |
+| Layer 2 | Connection | Handshake with mutual Ed25519 authentication, policy-based authorization |
+| Layer 3 | Sync | Batch sync (pull), Incremental sync (push), Subscriptions |
+| Layer 4 | Application | Automerge, custom CRDTs, or other data structures |
+
+---
+
+### Sync Protocols
+
+#### Batch Sync
+
+- **Type:** Pull-based (request/response)
+- **Mechanism:** Uses fingerprint-based reconciliation for compact diffs
+- **Delivery:** Guaranteed
+- **Use Cases:** Initial sync, reconnection, consistency checks
+
+#### Incremental Sync
+
+- **Type:** Push-based (fire-and-forget)
+- **Mechanism:** Immediate updates for active editing
+- **Delivery:** Best-effort
+- **Use Cases:** Real-time collaboration
+
+#### Subscriptions
+
+- Optional per-document subscription
+- Updates forwarded only to authorized subscribed peers
+- Efficient batch update forwarding
+
+#### Reconnection
+
+- Automatic detection and recovery
+- Batch sync used to catch up after missed incremental updates
+
+---
+
+### Security Model
+
+#### Trust Assumptions
+
+**Trusted:**
+
+- Ed25519 signatures are unforgeable
+- BLAKE3 is collision-resistant and preimage-resistant
+- TLS provides transport encryption when used
+- Private keys remain secret
+- Local storage is not compromised
+
+**Not Trusted:**
+
+- Network (attacker can observe, delay, drop, inject)
+- Peers (may be malicious, compromised, buggy)
+- Clocks (tolerate ±10 minutes drift)
+- Server operators (may attempt unauthorized access)
+
+#### Security Goals
+
+1. **Authentication** — Know who you're talking to
+2. **Integrity** — Detect message tampering
+3. **Replay Protection** — Reject replayed handshakes
+4. **Authorization** — Enforce access control per document
+5. **Confidentiality** — Data encrypted at rest and in transit
+
+---
+
+### CLI Tool (`subduction_cli`)
+
+#### Commands
+
+| Command | Description |
+|---------|-------------|
+| `server` | Starts a sync node with configurable transports |
+| `purge` | Deletes all stored data |
+
+#### Features
+
+- Multi-transport support (WebSocket, HTTP long-poll, Iroh)
+- Flexible key management (command-line, file-based, ephemeral)
+- Peer connection configuration
+- Metrics export (Prometheus)
+- Reverse proxy support
+- NixOS and Home Manager integration
+
+---
+
+### Technologies & Dependencies
+
+#### Core Rust
+
+| Dependency | Purpose |
+|------------|---------|
+| `tokio` | Async runtime for native targets |
+| `futures` | Async abstractions |
+| `ed25519-dalek` | Ed25519 signatures |
+| `blake3` | Content hashing |
+| `serde` + `ciborium` | CBOR serialization |
+
+#### Networking
+
+| Dependency | Purpose |
+|------------|---------|
+| `async-tungstenite` | WebSocket implementation |
+| `axum` | HTTP server framework |
+| `iroh` | QUIC protocol and NAT traversal |
+| `hyper` | HTTP client/server |
+
+#### WebAssembly
+
+| Dependency | Purpose |
+|------------|---------|
+| `wasm-bindgen` | JS↔Wasm FFI |
+| `wasm-bindgen-futures` | Async support for Wasm |
+| `wasm-tracing` | Logging for browser |
+
+#### Testing & Quality
+
+| Tool | Purpose |
+|------|---------|
+| `bolero` | Property-based fuzzing |
+| `criterion` | Benchmarking |
+| `playwright` | E2E testing for Wasm |
+
+---
+
+### Project Structure
+
+```
+subduction/
+├── sedimentree_core/                            # Core partitioning & metadata
+├── sedimentree_fs_storage/                      # Filesystem storage
+├── sedimentree_wasm/                            # Wasm bindings
+├── subduction_crypto/                           # Signed types & crypto
+├── subduction_core/                             # Sync protocol
+├── subduction_{http_longpoll,iroh,websocket}/   # Transports
+├── subduction_wasm/                             # Full Wasm bindings
+├── subduction_keyhive/                          # Keyhive types
+├── subduction_keyhive_policy/                   # Keyhive authorization
+├── automerge_sedimentree/                       # Automerge integration
+├── automerge_*_wasm/                            # Automerge Wasm bindings
+├── subduction_cli/                              # CLI server tool
+├── design/                                      # Protocol documentation
+│   ├── security/                                # Threat model
+│   └── sync/                                    # Sync protocols
+└── HACKING.md                                   # Contributor guide
+```
+
+---
+
+### Applications
+
+Subduction's architecture makes it suitable for a wide range of decentralized, collaborative applications:
+
+#### Real-Time Collaborative Editing
+
+- **Document Editors** — Google Docs-style collaborative editing without central servers
+- **Code Editors** — Pair programming and collaborative coding environments
+- **Design Tools** — Multi-user design applications (like Figma) with local-first architecture
+- **Whiteboards** — Real-time collaborative diagramming and brainstorming tools
+
+#### Local-First Applications
+
+- **Note-Taking Apps** — Obsidian-like applications with seamless multi-device sync
+- **Task Management** — Todo lists and project management tools that work offline
+- **Personal Knowledge Bases** — Roam Research or Notion alternatives with true data ownership
+- **Journaling Apps** — Private journals with end-to-end encryption and cross-device sync
+
+#### Decentralized Social & Communication
+
+- **Chat Applications** — End-to-end encrypted messaging without central servers
+- **Social Networks** — Federated or fully decentralized social platforms
+- **Forums & Discussion Boards** — Community platforms with no single point of failure
+- **Email Alternatives** — Decentralized messaging systems
+
+#### Data Synchronization Infrastructure
+
+- **Database Replication** — Syncing distributed databases across nodes
+- **Configuration Management** — Distributing configuration across a fleet of servers
+- **CDN-like Content Distribution** — Efficient content propagation across edge nodes
+- **IoT Device Sync** — Synchronizing state across IoT devices and gateways
+
+#### Privacy-Focused Applications
+
+- **Healthcare Records** — Patient-controlled medical records with selective sharing
+- **Financial Data** — Personal finance apps with encrypted cloud backup
+- **Legal Documents** — Secure document sharing for legal proceedings
+- **Whistleblower Platforms** — Secure, anonymous document sharing
+
+#### Gaming & Virtual Worlds
+
+- **Multiplayer Game State** — Synchronizing game world state without dedicated servers
+- **Virtual Worlds** — Decentralized metaverse applications
+- **Persistent Worlds** — MMO-style games with community-run infrastructure
+
+#### Developer Tools
+
+- **Distributed Version Control** — Git-like systems with better merge semantics
+- **API Mocking & Testing** — Sharing API state across development teams
+- **Feature Flags** — Distributed feature flag management
+- **Configuration Sync** — Developer environment configuration sharing
+
+#### Enterprise Applications
+
+- **Offline-First Field Apps** — Applications for workers in low-connectivity environments
+- **Multi-Region Deployment** — Consistent state across globally distributed systems
+- **Compliance & Audit** — Tamper-evident logs with cryptographic verification
+- **Disaster Recovery** — Resilient data storage with no single point of failure
+
+#### Research & Academic
+
+- **Collaborative Research** — Sharing datasets and findings across institutions
+- **Lab Notebooks** — Electronic lab notebooks with provenance tracking
+- **Reproducible Science** — Versioned, content-addressed research artifacts
+
+---
+
+### Why Subduction Over Alternatives?
+
+| Feature | Subduction | Traditional Sync | Blockchain |
+|---------|------------|------------------|------------|
+| Decentralized | Yes | No | Yes |
+| Efficient Bandwidth | Yes (fingerprints) | Varies | No (full replication) |
+| Works with Encryption | Yes | Varies | Limited |
+| Real-time Updates | Yes | Yes | No (consensus delay) |
+| Offline Support | Yes | Limited | No |
+| No Token/Cryptocurrency | Yes | Yes | Usually No |
+| Flexible Authorization | Yes (Keyhive) | Centralized | Smart contracts |
+
+---
+
+### Development & Deployment
+
+#### Build Requirements
+
+- Rust 1.90+
+- For Wasm: wasm-pack
+- For browser testing: Node.js 22+, pnpm
+
+#### Deployment Options
+
+- Standalone binary
+- Nix flake (with NixOS/Home Manager modules)
+- Docker containers (via CLI)
+- System services (systemd, launchd)
+
+#### Reverse Proxy Support
+
+- Caddy integration
+- Custom TLS/HTTPS setup
+
+---
+
+### Testing Strategy
+
+| Type | Tool/Approach |
+|------|---------------|
+| Unit Tests | Standard `#[test]` inline in modules |
+| Property-Based Tests | `bolero` for fuzz testing |
+| E2E Tests | Playwright tests for Wasm bindings |
+| Integration Tests | Round-trip transport connection tests |
+
+---
+
+### Current Status & Roadmap
+
+- **Version:** 0.5.0 (core crates)
+- **Maturity:** Early release preview with unstable API
+- **Production Use:** NOT recommended at this time
+- **Active Development:** Regular updates and bug fixes
+
+#### Known Limitations
+
+- API is unstable and may change
+- Documentation is still evolving
+- Some edge cases may not be fully handled
+- Performance optimization ongoing
+
+---
+
+### Getting Started
+
+#### Basic Usage (Rust)
+
+```rust
+// Example usage with Automerge.
+// Illustrative sketch only: `storage`, `transport`, `policy`, and `document_id`
+// are placeholders the caller must supply, and the exact constructor and
+// method signatures may differ because the API is unstable.
+use automerge_sedimentree::AutomergeSedimentree;
+use subduction_core::Subduction;
+
+// Initialize with your chosen transport and storage
+let subduction = Subduction::new(storage, transport, policy);
+
+// Sync documents (must run inside an async function that returns a `Result`)
+subduction.sync(document_id).await?;
+```
+
+#### CLI Server
+
+```bash
+# Start a WebSocket sync server
+subduction server --websocket-port 8080
+
+# Start with multiple transports
+subduction server --websocket-port 8080 --http-port 8081
+
+# Connect to peers
+subduction server --peer ws://other-node:8080
+```
+
+---
+
+### References
+
+- [Ink & Switch Research](https://www.inkandswitch.com/)
+- [Local-First Software](https://www.inkandswitch.com/local-first/)
+- [Automerge](https://automerge.org/)
+- [CRDTs](https://crdt.tech/)
+- [Iroh](https://iroh.computer/)
+
+## Changelog
+
+* **Mar 4, 2026** -- The first version was created.