useful-notes/hassan/004-subduction-notes.md
2026-03-04 09:59:09 +01:00

## Subduction Project Notes
### Overview
**Subduction** is a peer-to-peer synchronization protocol and implementation for CRDTs (Conflict-free Replicated Data Types).
It synchronizes encrypted, partitioned data efficiently between peers without requiring a central server.
- **Repository:** https://github.com/inkandswitch/subduction
- **Developed by:** Ink & Switch
- **License:** MIT OR Apache-2.0
- **Language:** Rust (with WebAssembly bindings)
- **Status:** Early release preview (unstable API)
- **Version:** 0.5.0
---
### Core Purpose & Features
#### Key Capabilities
1. **Efficient Sync Protocol** — Uses Sedimentree for history sharding, diffing, and incremental synchronization
2. **Encryption-Friendly** — Works with encrypted data partitions without requiring decryption during sync
3. **Decentralized** — True peer-to-peer synchronization via pluggable transports
4. **Multi-Platform** — Runs on native Rust, WebAssembly (browser & Node.js), and provides a CLI tool
5. **Automerge Integration** — Protocol-agnostic, but originally designed for Automerge documents
#### Design Principles
- **no_std Compatible** — Core logic works without standard library
- **Transport Agnostic** — Protocol-independent message format
- **Policy Separation** — Authentication separate from authorization
- **Subscription-Based** — Efficient update forwarding
- **Content-Addressed** — All data identified by BLAKE3 hash
- **Idempotent** — Receiving same data twice is safe
- **Compile-Time Validation** — Types make invalid states unrepresentable
- **Newtypes for Domain Concepts** — Prevent mixing different semantic types
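The content-addressed, idempotent, and newtype principles can be illustrated with a minimal sketch. This is not Subduction's storage API: `Digest`, `BlobStore`, and `put` are hypothetical names, and std's `DefaultHasher` stands in for BLAKE3 only so that the example has no dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical newtype for a content digest. Wrapping the raw value means
/// the compiler rejects a plain `u64` where a digest is expected.
/// ASSUMPTION: a 64-bit std hash replaces the 32-byte BLAKE3 digest here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Digest(u64);

impl Digest {
    /// Derives the identifier from the bytes themselves (content addressing).
    pub fn of(bytes: &[u8]) -> Self {
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        Digest(hasher.finish())
    }
}

/// Hypothetical in-memory blob store keyed by content digest.
#[derive(Default)]
pub struct BlobStore {
    blobs: HashMap<Digest, Vec<u8>>,
}

impl BlobStore {
    /// Stores `bytes` under their own digest. Returns the digest and whether
    /// the blob was new. Storing the same bytes again changes nothing, which
    /// is what makes repeated delivery safe.
    pub fn put(&mut self, bytes: &[u8]) -> (Digest, bool) {
        let digest = Digest::of(bytes);
        let is_new = !self.blobs.contains_key(&digest);
        if is_new {
            self.blobs.insert(digest, bytes.to_vec());
        }
        (digest, is_new)
    }

    /// Number of distinct blobs held.
    pub fn len(&self) -> usize {
        self.blobs.len()
    }
}
```

Calling `put` twice with the same bytes yields the same `Digest` and leaves `len()` at 1, so a receiver does not need to track what it has already been sent.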
---
### Architecture & Components
The project is organized as a Rust workspace with 16 member crates:
#### Core Layer (3 crates)
| Crate | Description |
|---------------------|------------------------------------------------------------------------------------------------------------------|
| `sedimentree_core` | Core data partitioning scheme using depth-based hierarchical layers for efficient metadata-based synchronization |
| `subduction_crypto` | Cryptographic types: Ed25519-signed payloads with type-state pattern for verification status |
| `subduction_core` | Main synchronization protocol implementation with pluggable storage, connections, and policies |
#### Storage Layer (1 crate)
| Crate | Description |
|--------------------------|----------------------------------------------------------|
| `sedimentree_fs_storage` | Filesystem-based persistent storage for Sedimentree data |
#### Transport Layer (3 crates)
| Crate | Description |
|----------------------------|---------------------------------------------------------------------|
| `subduction_http_longpoll` | HTTP long-poll transport for restrictive network environments |
| `subduction_iroh` | Iroh (QUIC) transport for direct P2P connections with NAT traversal |
| `subduction_websocket` | WebSocket transport for browser and Node.js environments |
#### Integration Layer (4 crates)
| Crate | Description |
|-----------------------------|--------------------------------------------------------------------|
| `automerge_sedimentree` | Adapter for synchronizing Automerge CRDT documents via Sedimentree |
| `subduction_keyhive` | Integration with Keyhive access control system |
| `subduction_keyhive_policy` | Keyhive-based authorization policy for connections |
| `bijou64` | Utility crate (64-bit optimizations) |
#### WebAssembly Bindings (4 crates)
| Crate | Description |
|------------------------------|---------------------------------------------------|
| `sedimentree_wasm` | Wasm bindings for Sedimentree |
| `subduction_wasm` | Wasm bindings for browser and Node.js |
| `automerge_sedimentree_wasm` | Wasm wrapper for Automerge + Sedimentree |
| `automerge_subduction_wasm` | Full sync stack (Automerge + Subduction) for Wasm |
#### Tools (1 crate)
| Crate | Description |
|------------------|--------------------------------------------------------------|
| `subduction_cli` | Command-line tool for running sync servers and managing data |
---
### Technical Architecture
#### Key Abstractions
##### FutureForm Trait
Enables portable async code across native Rust (Tokio) and WebAssembly (single-threaded):
- Provides two implementations: `Sendable` (Send + Sync) for multi-threaded and `Local` for single-threaded environments
- Uses macro-based code generation to support both forms
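One way to picture the two forms is a trait whose associated type selects between a `Send` boxed future and a non-`Send` one. The sketch below is an assumption about the general shape, not the actual `FutureForm` definition: the `Boxed` associated type, the simplified `Storage` trait, `MemStore`, and the busy-polling `block_on` helper are all hypothetical, and no macros are used.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `FutureForm`: picks the boxed future type.
pub trait FutureForm {
    type Boxed<'a, T: 'a>: Future<Output = T> + 'a;
}

/// Multi-threaded form: futures must be `Send`.
pub enum Sendable {}
/// Single-threaded form (for example Wasm): no `Send` bound.
pub enum Local {}

impl FutureForm for Sendable {
    type Boxed<'a, T: 'a> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;
}
impl FutureForm for Local {
    type Boxed<'a, T: 'a> = Pin<Box<dyn Future<Output = T> + 'a>>;
}

/// Simplified, hypothetical storage trait that is generic over the form.
pub trait Storage<F: FutureForm> {
    fn load<'a>(&'a self, key: u64) -> F::Boxed<'a, Option<Vec<u8>>>;
}

pub struct MemStore(pub HashMap<u64, Vec<u8>>);

impl Storage<Local> for MemStore {
    fn load<'a>(&'a self, key: u64) -> Pin<Box<dyn Future<Output = Option<Vec<u8>>> + 'a>> {
        Box::pin(async move { self.0.get(&key).cloned() })
    }
}

impl Storage<Sendable> for MemStore {
    fn load<'a>(&'a self, key: u64) -> Pin<Box<dyn Future<Output = Option<Vec<u8>>> + Send + 'a>> {
        Box::pin(async move { self.0.get(&key).cloned() })
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny executor that polls in a loop. Adequate for futures that never wait
/// on external events; a real program would use Tokio or the browser event loop.
pub fn block_on<Fut: Future>(fut: Fut) -> Fut::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Written once, usable with either form.
pub fn load_len<F: FutureForm, S: Storage<F>>(store: &S, key: u64) -> usize {
    let bytes: Option<Vec<u8>> = block_on(store.load(key));
    bytes.map_or(0, |b| b.len())
}
```

With this shape, `load_len::<Local, _>(&store, 1)` and `load_len::<Sendable, _>(&store, 1)` share one body, and the choice of form is fixed at compile time.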
##### Generic Parameters on Subduction
```rust
pub struct Subduction<'a, F: FutureForm, S: Storage<F>, C: Connection<F>,
P: ConnectionPolicy<F> + StoragePolicy<F>,
M: DepthMetric, const N: usize>
```
Compile-time configuration enables type-safe instantiation without runtime dispatch overhead.
##### Policy Traits (Capability-Based Access Control)
| Trait | Purpose |
|--------------------|-------------------------------------------------------------------------------|
| `ConnectionPolicy` | Authorization at the connection level (is this peer allowed?) |
| `StoragePolicy` | Authorization at the document level (can this peer read/write this document?) |
| `OpenPolicy` | Permissive default (allows everything) |
| `KeyhivePolicy` | Real authorization with Keyhive integration |
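The split between connection-level and document-level checks might look roughly like the following. The trait names come from the table above, but the method names (`allow_connect`, `allow_read`, `allow_write`), the synchronous signatures, and the `AllowList` policy are assumptions made for illustration; the real traits are generic over `FutureForm` and will differ.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical newtypes so that peers and documents cannot be swapped.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PeerId(pub u8);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DocId(pub u8);

/// Connection level: is this peer allowed to connect at all?
pub trait ConnectionPolicy {
    fn allow_connect(&self, peer: PeerId) -> bool;
}

/// Document level: may this peer read or write this document?
pub trait StoragePolicy {
    fn allow_read(&self, peer: PeerId, doc: DocId) -> bool;
    fn allow_write(&self, peer: PeerId, doc: DocId) -> bool;
}

/// Permissive default: allows everything.
pub struct OpenPolicy;

impl ConnectionPolicy for OpenPolicy {
    fn allow_connect(&self, _peer: PeerId) -> bool { true }
}
impl StoragePolicy for OpenPolicy {
    fn allow_read(&self, _peer: PeerId, _doc: DocId) -> bool { true }
    fn allow_write(&self, _peer: PeerId, _doc: DocId) -> bool { true }
}

/// Hypothetical restrictive policy: known peers may connect and read,
/// and only listed writers may write a given document.
#[derive(Default)]
pub struct AllowList {
    pub peers: HashSet<PeerId>,
    pub writers: HashMap<DocId, HashSet<PeerId>>,
}

impl ConnectionPolicy for AllowList {
    fn allow_connect(&self, peer: PeerId) -> bool {
        self.peers.contains(&peer)
    }
}
impl StoragePolicy for AllowList {
    fn allow_read(&self, peer: PeerId, _doc: DocId) -> bool {
        self.peers.contains(&peer)
    }
    fn allow_write(&self, peer: PeerId, doc: DocId) -> bool {
        self.writers.get(&doc).is_some_and(|w| w.contains(&peer))
    }
}

/// Both checks must pass; the policy type is chosen at compile time.
pub fn may_write<P: ConnectionPolicy + StoragePolicy>(policy: &P, peer: PeerId, doc: DocId) -> bool {
    policy.allow_connect(peer) && policy.allow_write(peer, doc)
}
```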
##### Cryptographic Types
| Type | Description |
|------------------------|-----------------------------------------------------------|
| `Signed<T>` | Payload with Ed25519 signature (unverified) |
| `VerifiedSignature<T>` | Witness that signature has been verified |
| `VerifiedMeta<T>` | Witness that signature is valid AND blob matches metadata |
The type-state pattern prevents "verify and forget" bugs at compile time.
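A rough sketch of how such a type-state can work: the unverified wrapper exposes no way to read its payload, and the only route to the payload is a verification call that consumes it. The names `seal`, `try_verify`, `payload`, and `BadSignature` are hypothetical, and `toy_tag` is a deliberately insecure placeholder for Ed25519 so the example needs no crates.

```rust
/// Error returned when verification fails.
#[derive(Debug, PartialEq, Eq)]
pub struct BadSignature;

/// Unverified payload plus tag. No accessor for `payload` exists on this type.
pub struct Signed<T> {
    payload: T,
    tag: u64,
}

/// Witness that verification succeeded; only `try_verify` can construct it.
pub struct VerifiedSignature<T> {
    payload: T,
}

/// ASSUMPTION: toy keyed checksum, NOT a real signature scheme.
fn toy_tag(key: u64, bytes: &[u8]) -> u64 {
    bytes.iter().fold(key, |acc, b| acc.rotate_left(5) ^ u64::from(*b))
}

impl<T: AsRef<[u8]>> Signed<T> {
    /// Produces a tagged payload (stands in for signing).
    pub fn seal(key: u64, payload: T) -> Self {
        let tag = toy_tag(key, payload.as_ref());
        Signed { payload, tag }
    }

    /// Consumes the unverified value. On success the caller holds a
    /// `VerifiedSignature`, so "verified" is recorded in the type.
    pub fn try_verify(self, key: u64) -> Result<VerifiedSignature<T>, BadSignature> {
        if toy_tag(key, self.payload.as_ref()) == self.tag {
            Ok(VerifiedSignature { payload: self.payload })
        } else {
            Err(BadSignature)
        }
    }
}

impl<T> VerifiedSignature<T> {
    /// The payload is readable only after verification.
    pub fn payload(&self) -> &T {
        &self.payload
    }
}

/// Downstream code asks for the verified type, so passing a `Signed<_>`
/// by mistake is a compile error rather than a runtime bug.
pub fn apply_commit(commit: &VerifiedSignature<Vec<u8>>) -> usize {
    commit.payload().len()
}
```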
##### NonceCache
- Replay protection for handshake protocol
- Time-based bucket system (4 buckets × 3 min = 12 min window)
- Lazy garbage collection, no background task needed
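The bucket scheme above can be sketched as a ring of four sets, each tagged with the 3-minute epoch it currently holds. This is an illustrative reading of the notes, not the actual `NonceCache`: the method name `check_and_insert`, the 16-byte nonce, second-granularity timestamps, and the outright rejection of future-dated nonces are assumptions.

```rust
use std::collections::HashSet;

const BUCKETS: usize = 4;
const BUCKET_SECS: u64 = 3 * 60; // 4 buckets x 3 min = 12 min window

/// Hypothetical replay cache: a ring of buckets, each holding the nonces
/// seen during one 3-minute epoch.
pub struct NonceCache {
    /// (epoch currently stored in this slot, nonces seen in that epoch)
    buckets: [(u64, HashSet<[u8; 16]>); BUCKETS],
}

impl NonceCache {
    pub fn new() -> Self {
        Self { buckets: std::array::from_fn(|_| (0, HashSet::new())) }
    }

    /// Returns `true` if the nonce is fresh and was recorded, `false` if it
    /// is a replay or its timestamp falls outside the window.
    pub fn check_and_insert(&mut self, nonce: [u8; 16], issued_at: u64, now: u64) -> bool {
        let epoch = issued_at / BUCKET_SECS;
        let now_epoch = now / BUCKET_SECS;
        // ASSUMPTION: future-dated nonces are rejected outright in this sketch.
        if epoch > now_epoch || now_epoch - epoch >= BUCKETS as u64 {
            return false;
        }
        let slot = &mut self.buckets[(epoch % BUCKETS as u64) as usize];
        if slot.0 != epoch {
            // Lazy garbage collection: the slot still holds an expired epoch,
            // so it is cleared on reuse rather than by a background task.
            slot.0 = epoch;
            slot.1.clear();
        }
        // `HashSet::insert` returns false when the nonce was already present.
        slot.1.insert(nonce)
    }
}
```

Because a slot is reused only once its previous epoch has left the window, clearing on reuse never discards a nonce that could still be replayed successfully.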
#### Sedimentree Data Structure
Organizes CRDT data into depth-stratified layers based on content hash:
| Depth | Contains |
|----------|------------------------------------|
| Depth 0 | All commits |
| Depth 1 | Commits with 1+ leading zero bytes |
| Depth 2 | Commits with 2+ leading zero bytes |
| Depth 3+ | Further filtering |
**Benefits:**
- Efficient sync through fingerprint-based reconciliation
- Compare summaries at higher depths first
- Drill down only where differences exist
- ~75% less bandwidth for exchanged identifiers, by sending 8-byte SipHash fingerprints instead of 32-byte digests
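As a concrete reading of the depth table and the bandwidth figure, the sketch below computes a commit's depth as the number of leading zero bytes in its 32-byte digest and checks the 8-byte versus 32-byte arithmetic. The function names are hypothetical, and the real depth metric is pluggable (the `DepthMetric` parameter), so this shows only the scheme described in the table.

```rust
/// Depth of a commit: the number of leading zero bytes in its digest.
pub fn depth(digest: &[u8; 32]) -> usize {
    digest.iter().take_while(|b| **b == 0).count()
}

/// A commit appears in every layer up to and including its own depth,
/// so layer 0 contains all commits.
pub fn in_layer(digest: &[u8; 32], layer: usize) -> bool {
    depth(digest) >= layer
}

/// Fraction of identifier bytes saved by sending a fingerprint instead of
/// a full digest: 1 - 8/32 = 0.75 for the sizes quoted above.
pub fn identifier_savings(fingerprint_bytes: u32, digest_bytes: u32) -> f64 {
    1.0 - f64::from(fingerprint_bytes) / f64::from(digest_bytes)
}
```

If digests are uniformly distributed, each additional leading zero byte occurs with probability 1/256, so each layer is expected to hold about 1/256 of the commits in the layer below it. That thinning is what keeps the higher-depth summaries small.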
#### Connection Lifecycle
```
Client ─(TCP/WebSocket)─→ Server
Client ─(Signed Challenge)─→ Server
Server ─(Signed Response)─→ Client
Both verify signatures and check ConnectionPolicy
Authenticated connection established
```
---
### Protocol Layers
| Layer | Component | Description |
|---------|-------------|--------------------------------------------------------------------------|
| Layer 1 | Transport | WebSocket, HTTP Long-Poll, or Iroh/QUIC |
| Layer 2 | Connection | Handshake with mutual Ed25519 authentication, policy-based authorization |
| Layer 3 | Sync | Batch sync (pull), Incremental sync (push), Subscriptions |
| Layer 4 | Application | Automerge, custom CRDTs, or other data structures |
---
### Sync Protocols
#### Batch Sync
- **Type:** Pull-based (request/response)
- **Mechanism:** Uses fingerprint-based reconciliation for compact diffs
- **Delivery:** Guaranteed
- **Use Cases:** Initial sync, reconnection, consistency checks
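Fingerprint-based reconciliation can be sketched as follows: the requester sends short fingerprints of what it holds, and the responder returns only the items whose fingerprints are absent. This is a simplified illustration rather than the wire protocol. The function names and the per-request `seed` are assumptions, and std's `DefaultHasher` stands in for the keyed SipHash mentioned earlier.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

pub type Digest = [u8; 32];

/// 8-byte fingerprint of a 32-byte digest.
/// ASSUMPTION: `seed` models a per-request key; `DefaultHasher` is a stand-in.
pub fn fingerprint(seed: u64, digest: &Digest) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    digest.hash(&mut hasher);
    hasher.finish()
}

/// Requester side: summarize local holdings as a set of fingerprints.
pub fn summarize(seed: u64, local: &[Digest]) -> HashSet<u64> {
    local.iter().map(|d| fingerprint(seed, d)).collect()
}

/// Responder side: return the local digests the requester appears to lack,
/// preserving local order.
pub fn missing_for_requester(
    seed: u64,
    requester_fingerprints: &HashSet<u64>,
    local: &[Digest],
) -> Vec<Digest> {
    local
        .iter()
        .filter(|d| !requester_fingerprints.contains(&fingerprint(seed, d)))
        .copied()
        .collect()
}
```

The request carries 8 bytes per item instead of 32, and the response carries only the difference. Two distinct digests can share a fingerprint, in which case an item would be wrongly treated as already present; how the real protocol handles that case is not covered in these notes.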
#### Incremental Sync
- **Type:** Push-based (fire-and-forget)
- **Mechanism:** Immediate updates for active editing
- **Delivery:** Best-effort
- **Use Cases:** Real-time collaboration
#### Subscriptions
- Optional per-document subscription
- Updates forwarded only to authorized subscribed peers
- Efficient batch update forwarding
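A minimal sketch of subscription-based forwarding, assuming a per-document subscriber set and a read-authorization callback. `Subscriptions`, `subscribe`, and `recipients` are hypothetical names, and excluding the sender from the recipient list is an assumption of this example.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct PeerId(pub u8);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DocId(pub u8);

/// Hypothetical subscription table: document -> subscribed peers.
#[derive(Default)]
pub struct Subscriptions {
    by_doc: HashMap<DocId, BTreeSet<PeerId>>,
}

impl Subscriptions {
    /// Registers `peer` for updates to `doc`.
    pub fn subscribe(&mut self, peer: PeerId, doc: DocId) {
        self.by_doc.entry(doc).or_default().insert(peer);
    }

    /// Peers that should receive an update to `doc`: subscribed, authorized
    /// to read according to `can_read`, and not the sender. The result is in
    /// ascending `PeerId` order because the set is a `BTreeSet`.
    pub fn recipients(
        &self,
        doc: DocId,
        sender: PeerId,
        can_read: impl Fn(PeerId, DocId) -> bool,
    ) -> Vec<PeerId> {
        self.by_doc
            .get(&doc)
            .into_iter()
            .flatten()
            .copied()
            .filter(|peer| *peer != sender && can_read(*peer, doc))
            .collect()
    }
}
```

Checking authorization at forwarding time, rather than only at subscription time, means a peer whose access is revoked stops receiving updates without an explicit unsubscribe.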
#### Reconnection
- Automatic detection and recovery
- Batch sync used to catch up after missed incremental updates
---
### Security Model
#### Trust Assumptions
**Trusted:**
- Ed25519 signatures are unforgeable
- BLAKE3 is collision-resistant and preimage-resistant
- TLS provides transport encryption when used
- Private keys remain secret
- Local storage is not compromised
**Not Trusted:**
- Network (attacker can observe, delay, drop, inject)
- Peers (may be malicious, compromised, buggy)
- Clocks (tolerate ±10 minutes drift)
- Server operators (may attempt unauthorized access)
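The clock-drift tolerance amounts to a symmetric bound on the difference between a peer's claimed timestamp and the local clock. The sketch below assumes timestamps in seconds and an inclusive bound; the constant and function names are hypothetical.

```rust
/// Maximum tolerated difference between a claimed timestamp and local time.
pub const MAX_DRIFT_SECS: u64 = 10 * 60;

/// Accepts timestamps up to 10 minutes behind or ahead of the local clock.
/// ASSUMPTION: the bound is inclusive and timestamps are Unix seconds.
pub fn within_drift(claimed: u64, local_now: u64) -> bool {
    claimed.abs_diff(local_now) <= MAX_DRIFT_SECS
}
```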
#### Security Goals
1. **Authentication** — Know who you're talking to
2. **Integrity** — Detect message tampering
3. **Replay Protection** — Reject replayed handshakes
4. **Authorization** — Enforce access control per document
5. **Confidentiality** — Data encrypted at rest and in transit
---
### CLI Tool (`subduction_cli`)
#### Commands
| Command | Description |
|----------|-------------------------------------------------|
| `server` | Starts a sync node with configurable transports |
| `purge` | Deletes all stored data |
#### Features
- Multi-transport support (WebSocket, HTTP long-poll, Iroh)
- Flexible key management (command-line, file-based, ephemeral)
- Peer connection configuration
- Metrics export (Prometheus)
- Reverse proxy support
- NixOS and Home Manager integration
---
### Technologies & Dependencies
#### Core Rust
| Dependency | Purpose |
|----------------------|----------------------------------|
| `tokio` | Async runtime for native targets |
| `futures` | Async abstractions |
| `ed25519-dalek` | Ed25519 signatures |
| `blake3` | Content hashing |
| `serde` + `ciborium` | CBOR serialization |
#### Networking
| Dependency | Purpose |
|---------------------|---------------------------------|
| `async-tungstenite` | WebSocket implementation |
| `axum` | HTTP server framework |
| `iroh` | QUIC protocol and NAT traversal |
| `hyper` | HTTP client/server |
#### WebAssembly
| Dependency | Purpose |
|------------------------|------------------------|
| `wasm-bindgen` | JS↔Wasm FFI |
| `wasm-bindgen-futures` | Async support for Wasm |
| `wasm-tracing` | Logging for browser |
#### Testing & Quality
| Tool | Purpose |
|--------------|------------------------|
| `bolero` | Property-based fuzzing |
| `criterion` | Benchmarking |
| `playwright` | E2E testing for Wasm |
---
### Project Structure
```
subduction/
├── sedimentree_core/ # Core partitioning & metadata
├── sedimentree_fs_storage/ # Filesystem storage
├── sedimentree_wasm/ # Wasm bindings
├── subduction_crypto/ # Signed types & crypto
├── subduction_core/ # Sync protocol
├── subduction_{http_longpoll,iroh,websocket}/ # Transports
├── subduction_wasm/ # Full Wasm bindings
├── subduction_keyhive/ # Keyhive types
├── subduction_keyhive_policy/ # Keyhive authorization
├── automerge_sedimentree/ # Automerge integration
├── automerge_*_wasm/ # Automerge Wasm bindings
├── subduction_cli/ # CLI server tool
├── design/ # Protocol documentation
│ ├── security/ # Threat model
│ └── sync/ # Sync protocols
└── HACKING.md # Contributor guide
```
---
### Applications
Subduction's architecture makes it suitable for a wide range of decentralized, collaborative applications:
#### Real-Time Collaborative Editing
- **Document Editors** — Google Docs-style collaborative editing without central servers
- **Code Editors** — Pair programming and collaborative coding environments
- **Design Tools** — Multi-user design applications (like Figma) with local-first architecture
- **Whiteboards** — Real-time collaborative diagramming and brainstorming tools
#### Local-First Applications
- **Note-Taking Apps** — Obsidian-like applications with seamless multi-device sync
- **Task Management** — Todo lists and project management tools that work offline
- **Personal Knowledge Bases** — Roam Research or Notion alternatives with true data ownership
- **Journaling Apps** — Private journals with end-to-end encryption and cross-device sync
#### Decentralized Social & Communication
- **Chat Applications** — End-to-end encrypted messaging without central servers
- **Social Networks** — Federated or fully decentralized social platforms
- **Forums & Discussion Boards** — Community platforms with no single point of failure
- **Email Alternatives** — Decentralized messaging systems
#### Data Synchronization Infrastructure
- **Database Replication** — Syncing distributed databases across nodes
- **Configuration Management** — Distributing configuration across a fleet of servers
- **CDN-like Content Distribution** — Efficient content propagation across edge nodes
- **IoT Device Sync** — Synchronizing state across IoT devices and gateways
#### Privacy-Focused Applications
- **Healthcare Records** — Patient-controlled medical records with selective sharing
- **Financial Data** — Personal finance apps with encrypted cloud backup
- **Legal Documents** — Secure document sharing for legal proceedings
- **Whistleblower Platforms** — Secure, anonymous document sharing
#### Gaming & Virtual Worlds
- **Multiplayer Game State** — Synchronizing game world state without dedicated servers
- **Virtual Worlds** — Decentralized metaverse applications
- **Persistent Worlds** — MMO-style games with community-run infrastructure
#### Developer Tools
- **Distributed Version Control** — Git-like systems with better merge semantics
- **API Mocking & Testing** — Sharing API state across development teams
- **Feature Flags** — Distributed feature flag management
- **Configuration Sync** — Developer environment configuration sharing
#### Enterprise Applications
- **Offline-First Field Apps** — Applications for workers in low-connectivity environments
- **Multi-Region Deployment** — Consistent state across globally distributed systems
- **Compliance & Audit** — Tamper-evident logs with cryptographic verification
- **Disaster Recovery** — Resilient data storage with no single point of failure
#### Research & Academic
- **Collaborative Research** — Sharing datasets and findings across institutions
- **Lab Notebooks** — Electronic lab notebooks with provenance tracking
- **Reproducible Science** — Versioned, content-addressed research artifacts
---
### Why Subduction Over Alternatives?
| Feature | Subduction | Traditional Sync | Blockchain |
|-------------------------|--------------------|------------------|-----------------------|
| Decentralized | Yes | No | Yes |
| Efficient Bandwidth | Yes (fingerprints) | Varies | No (full replication) |
| Works with Encryption | Yes | Varies | Limited |
| Real-time Updates | Yes | Yes | No (consensus delay) |
| Offline Support | Yes | Limited | No |
| No Token/Cryptocurrency | Yes | Yes | Usually No |
| Flexible Authorization | Yes (Keyhive) | Centralized | Smart contracts |
---
### Development & Deployment
#### Build Requirements
- Rust 1.90+
- For Wasm: wasm-pack
- For browser testing: Node.js 22+, pnpm
#### Deployment Options
- Standalone binary
- Nix flake (with NixOS/Home Manager modules)
- Docker containers (via CLI)
- System services (systemd, launchd)
#### Reverse Proxy Support
- Caddy integration
- Custom TLS/HTTPS setup
---
### Testing Strategy
| Type | Tool/Approach |
|----------------------|---------------------------------------|
| Unit Tests | Standard `#[test]` inline in modules |
| Property-Based Tests | `bolero` for fuzz testing |
| E2E Tests | Playwright tests for Wasm bindings |
| Integration Tests | Round-trip transport connection tests |
---
### Current Status & Roadmap
- **Version:** 0.5.0 (core crates)
- **Maturity:** Early release preview with unstable API
- **Production Use:** NOT recommended at this time
- **Active Development:** Regular updates and bug fixes
#### Known Limitations
- API is unstable and may change
- Documentation is still evolving
- Some edge cases may not be fully handled
- Performance optimization ongoing
---
### Getting Started
#### Basic Usage (Rust)
```rust
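// Illustrative usage with Automerge. The API is unstable, so the exact
// constructor and method signatures may differ from what is shown here.
use automerge_sedimentree::AutomergeSedimentree;
use subduction_core::Subduction;

// `storage`, `transport`, and `policy` are placeholders for your chosen
// storage backend, transport, and policy implementations.
let subduction = Subduction::new(storage, transport, policy);

// Sync a document (`document_id` is a placeholder); run inside an async context.
subduction.sync(document_id).await?;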
```
#### CLI Server
```bash
# Start a WebSocket sync server
subduction server --websocket-port 8080
# Start with multiple transports
subduction server --websocket-port 8080 --http-port 8081
# Connect to peers
subduction server --peer ws://other-node:8080
```
---
### References
- [Ink & Switch Research](https://www.inkandswitch.com/)
- [Local-First Software](https://www.inkandswitch.com/local-first/)
- [Automerge](https://automerge.org/)
- [CRDTs](https://crdt.tech/)
- [Iroh](https://iroh.computer/)
## Changelog
* **Mar 4, 2026** -- The first version was created.