Tuần Bonus: CRDTs — Conflict-free Replicated Data Types

“Five people editing the same Google Doc from five countries. Everyone's internet is flaky. Yet the final document always converges — nobody loses a sentence, nobody resolves a conflict by hand. Magic? No, that's a CRDT — and Figma, Notion, Apple Notes, Linear, and Yjs all use this pattern.”

Tags: system-design crdt distributed-systems consistency collaboration bonus Student: Hieu (Backend Dev → Architect) Prerequisites: Tuan-Bonus-Consistency-Models-Isolation · Tuan-Bonus-Consensus-Raft-Paxos Related: Tuan-07-Database-Sharding-Replication · Tuan-20-Design-Key-Value-Store · Case-Design-Nearby-Friends


1. Context & Why

An everyday analogy — five people keeping a shared ledger

Hieu, imagine you and four friends jointly maintain a notebook tallying a shop's revenue. The rules:

  • No "boss" decides who is right (no central authority)
  • Each person keeps their own copy of the notebook
  • The internet is flaky → sometimes you can't reach each other
  • By the end of the day, everyone must arrive at the same revenue total

There are two approaches:

Approach 1 — Lock & coordinate:

  • Before writing, "raise your hand" and ask: "Is anyone else writing?"
  • Wait for everyone to agree → write → release
  • Problem: one person loses internet → the whole group pauses

Approach 2 — the CRDT pattern:

  • Everyone writes freely in their own notebook, asking no one
  • When you meet (sync), compare notebooks and merge automatically
  • The merge rules have mathematical properties guaranteeing that whatever the merge order, the final result is identical
  • Problem: only works for specific structures (counter, set, register, sequence)

A CRDT (Conflict-free Replicated Data Type) is approach 2. It is a class of data structures designed so that automatic merging is guaranteed to converge.

Why should a backend dev understand CRDTs?

| Reason | Real-world examples |
|---|---|
| Multi-master DBs | Riak, Redis CRDT (Active-Active) — no "primary" |
| Collaborative editing | Figma (multiplayer design), Notion (docs), Linear (issues) |
| Offline-first apps | Apple Notes, Bear, Obsidian Sync — edit offline, merge when online |
| Mobile sync | iOS CloudKit, Signal, WhatsApp |
| Edge computing | Cloudflare Workers, Vercel — replicate state to the edge |
| Decentralized apps | CRDTs are the foundation of local-first software |

Key insight: the CRDT trade-off is reduced expressiveness (only specific data types are supported) in exchange for automatic merge without coordination. If your app fits a CRDT shape → you may be able to eliminate the entire consensus layer.

Why doesn't Alex Xu go deep on CRDTs?

Alex Xu vol 1+2 mention CRDTs in about one line (in the Chat System chapter). But CRDTs are an active research area with a growing number of production applications. Figma, Notion, and Apple Notes are all built on CRDT ideas — a critical gap for modern systems.



2. Deep Dive — Core Concepts

2.1 Why CRDTs — the problem

Scenario: a multi-master database with 3 replicas. User A updates X=10 on Replica 1. At the same time, User B updates X=20 on Replica 2. When they sync, X = ?

Traditional solutions:

| Approach | Problem |
|---|---|
| Last-Write-Wins (LWW) | One update is lost; clock drift causes bugs |
| Quorum (Cassandra) | Still needs coordination within the quorum |
| Manual conflict resolution | Bad UX (for the user) |
| 2PC | Blocks on partition |

The CRDT solution: design the data structure so that A ⊕ B = B ⊕ A and merging is idempotent. There is no conflict — any two states can be merged.

2.2 The two kinds of CRDT

2.2.1 State-based CRDT (CvRDT — Convergent)

Mechanism: each replica maintains the full state. On sync, replicas exchange states and merge.

Replica 1: state_1
Replica 2: state_2

Sync: state_merged = merge(state_1, state_2)

Requirements (for convergence):

  1. Commutative: merge(a, b) = merge(b, a)
  2. Associative: merge(merge(a, b), c) = merge(a, merge(b, c))
  3. Idempotent: merge(a, a) = a

These 3 properties make the state space a join-semilattice — an abstract bit of math, but an important one.

Pros:

  • Simple to reason about
  • The network may be lossy (dropped messages are OK; the next sync repairs them)

Cons:

  • State can be large (the full state must be sent)
  • Mitigation: delta-state CRDTs send only the deltas

2.2.2 Operation-based CRDT (CmRDT — Commutative)

Mechanism: replicas exchange operations (deltas). Each op is applied locally + broadcast.

Replica 1: apply op1 → broadcast op1
Replica 2: receive op1 → apply op1

Yêu cầu:

  • Reliable broadcast: every op must reach every replica
  • Causal delivery: causally dependent ops must arrive in order
  • Commutativity: concurrent ops must commute

Pros:

  • Smaller messages (only ops, not state)
  • Ops can be composed (e.g., delta CRDTs)

Cons:

  • Requires reliable + causal broadcast
  • A dropped op = inconsistency

2.3 Foundational CRDTs

2.3.1 G-Counter (Grow-only Counter)

Use case: Page view counter, like count, distributed metric.

Constraint: increment only, never decrement.

State: a vector [c_1, c_2, ..., c_N] where c_i = the count contributed by replica i.

class GCounter:
    def __init__(self, num_replicas, my_id):
        self.counts = [0] * num_replicas
        self.my_id = my_id
 
    def increment(self, n=1):
        self.counts[self.my_id] += n
 
    def value(self):
        return sum(self.counts)
 
    def merge(self, other):
        for i in range(len(self.counts)):
            self.counts[i] = max(self.counts[i], other.counts[i])

Why max() guarantees convergence:

  • Commutative: max(a, b) = max(b, a) ✓
  • Associative: max(max(a,b), c) = max(a, max(b,c)) ✓
  • Idempotent: max(a, a) = a ✓

Example:

R1: counts=[5, 0, 0], increment → [6, 0, 0]
R2: counts=[0, 3, 0], increment → [0, 4, 0]

Merge: [max(6,0), max(0,4), max(0,0)] = [6, 4, 0] → value = 10
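The three properties can be checked mechanically. A minimal sketch (plain Python lists standing in for G-Counter state; `merge` is just the element-wise max described above):

```python
import itertools
import random

def merge(a, b):
    """State-based G-Counter merge: element-wise max of the count vectors."""
    return [max(x, y) for x, y in zip(a, b)]

random.seed(42)
vectors = [[random.randint(0, 10) for _ in range(3)] for _ in range(3)]
a, b, c = vectors

assert merge(a, b) == merge(b, a)                      # commutative
assert merge(merge(a, b), c) == merge(a, merge(b, c))  # associative
assert merge(a, a) == a                                # idempotent

# Convergence: every merge order over the same states yields the same result
results = set()
for order in itertools.permutations(vectors):
    acc = order[0]
    for v in order[1:]:
        acc = merge(acc, v)
    results.add(tuple(acc))
assert len(results) == 1
```

Swap in larger random vectors and more replicas and the assertions still hold — that is exactly what the semilattice structure buys.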

2.3.2 PN-Counter (Positive-Negative Counter)

Use case: a counter that needs both increment and decrement (e.g., shopping cart quantity, balance).

Trick: Combine 2 G-Counters — one for increments (P), one for decrements (N).

class PNCounter:
    def __init__(self, num_replicas, my_id):
        self.p = GCounter(num_replicas, my_id)
        self.n = GCounter(num_replicas, my_id)
 
    def increment(self, k=1):
        self.p.increment(k)
 
    def decrement(self, k=1):
        self.n.increment(k)
 
    def value(self):
        return self.p.value() - self.n.value()
 
    def merge(self, other):
        self.p.merge(other.p)
        self.n.merge(other.n)

2.3.3 LWW-Register (Last-Write-Wins Register)

Use case: a single value with occasional updates (e.g., a user profile field).

State: (value, timestamp). Merge: keep the value with the largest timestamp.

class LWWRegister:
    def __init__(self):
        self.value = None
        self.timestamp = 0  # Hybrid Logical Clock recommended
 
    def write(self, value, timestamp):
        if timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp
 
    def merge(self, other):
        if other.timestamp > self.timestamp:
            self.value = other.value
            self.timestamp = other.timestamp

Problem: tie-breaking when two timestamps are equal. Solutions:

  • Tie-break by replica ID (deterministic)
  • Use an HLC (Hybrid Logical Clock) → unique per replica

Limitation: LWW LOSES UPDATES. If two users update concurrently → one update "loses". Not suitable for collaborative editing.
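A toy illustration of the lost update (plain `(value, timestamp)` tuples with integer timestamps; the names are illustrative):

```python
def lww_merge(a, b):
    """Keep whichever (value, ts) pair has the larger timestamp."""
    return a if a[1] >= b[1] else b

# Two users edit the same field concurrently
replica_1 = ("Alice's edit", 100)   # written at t=100
replica_2 = ("Bob's edit", 101)     # concurrent write at t=101

merged = lww_merge(replica_1, replica_2)
assert merged == ("Bob's edit", 101)   # Alice's edit is silently discarded
```

The merge converges, but convergence is not the same as preserving intent — which is why text editing needs sequence CRDTs instead.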

2.3.4 OR-Set (Observed-Remove Set)

Use case: a distributed set with add + remove (e.g., shopping cart items, task lists, friend lists).

The problem with a naive set: if A adds X while B removes X concurrently → what is the final state?

Naive 2P-Set (two-phase set):

class TwoPhaseSet:
    """Add to A, remove to T (tombstones). Once removed, can't re-add."""
    def __init__(self):
        self.added = set()
        self.removed = set()  # tombstones
 
    def contains(self, x):
        return x in self.added and x not in self.removed
 
    def add(self, x):
        self.added.add(x)
 
    def remove(self, x):
        if x in self.added:
            self.removed.add(x)
 
    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed

Problem: an element can never be re-added after removal (tombstoned forever).
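The limitation shows up directly (the 2P-Set above, condensed here so the snippet is self-contained):

```python
class TwoPhaseSet:
    """Adds go to `added`, removes go to `removed` (tombstones)."""
    def __init__(self):
        self.added = set()
        self.removed = set()

    def add(self, x):
        self.added.add(x)

    def remove(self, x):
        if x in self.added:
            self.removed.add(x)

    def contains(self, x):
        return x in self.added and x not in self.removed

s = TwoPhaseSet()
s.add("milk")
s.remove("milk")
s.add("milk")                    # attempted re-add
assert not s.contains("milk")    # the tombstone wins forever
```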

How OR-Set solves it: every add is tagged with a unique ID. A remove only removes the unique IDs it has observed.

import uuid
 
class ORSet:
    def __init__(self):
        # element → set of unique add tags
        self.adds = {}     # {x: {tag1, tag2, ...}}
        self.removes = {}  # {x: {tag1, tag2, ...}} (tombstoned tags)
 
    def contains(self, x):
        active_tags = self.adds.get(x, set()) - self.removes.get(x, set())
        return len(active_tags) > 0
 
    def add(self, x):
        tag = str(uuid.uuid4())
        self.adds.setdefault(x, set()).add(tag)
 
    def remove(self, x):
        # Remove only the tags we've seen
        tags = self.adds.get(x, set()).copy()
        self.removes.setdefault(x, set()).update(tags)
 
    def merge(self, other):
        for x, tags in other.adds.items():
            self.adds.setdefault(x, set()).update(tags)
        for x, tags in other.removes.items():
            self.removes.setdefault(x, set()).update(tags)

Magic: Concurrent add and remove resolve correctly:

  • A removes X (sees tag T1), B adds X concurrently (creates tag T2)
  • A’s remove only tombstones T1
  • After merge: adds = {T1, T2}, removes = {T1} → active = {T2} → X is in set (B’s add wins)

Real-world use cases:

  • Cart items (add/remove products)
  • Tag/label assignments
  • Friend/follow lists

2.3.5 RGA (Replicated Growable Array) — for ordered text

Use case: Collaborative text editing (Google Docs, Figma).

Problem: text = a sequence of characters, with inserts/deletes at any position. Two users edit concurrently → both must converge to the same final text.

Naive approach: use string positions (e.g., index 5). Bug: A inserts at position 5, B inserts at position 5 → after A's insert, B's "position 5" means something different.

RGA solution: each character gets a unique ID + a reference to the previous character's ID (a linked list).

Initial: "ABC"
A:1 → B:2 → C:3

Replica 1 inserts X after B:
A:1 → B:2 → X:1.5 → C:3

Replica 2 inserts Y after B (concurrently):
A:1 → B:2 → Y:1.6 → C:3

Merge: A:1 → B:2 → [X:1.5, Y:1.6] → C:3
       Tie-break by ID → "ABXYC" or "ABYXC" (deterministic)

Implementation: Each char = (char, unique_id, parent_id). Merge inserts both X and Y after B, sort by ID.
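A minimal sketch of this idea (the `MiniRGA` name and two-character demo are illustrative, not a production RGA — there are no deletes, no tombstones, and IDs are plain strings):

```python
import copy

class MiniRGA:
    """Toy RGA: each element has a unique ID and points at the element it was
    inserted after. Merge = union of elements; ordering ties between concurrent
    siblings are broken by ID, so all replicas agree on the final text."""
    def __init__(self):
        self.elems = {}  # uid -> (parent_uid, char)

    def insert_after(self, parent_uid, uid, char):
        self.elems[uid] = (parent_uid, char)

    def merge(self, other):
        self.elems.update(other.elems)  # element set is grow-only here

    def text(self):
        children = {}
        for uid, (parent, _) in self.elems.items():
            children.setdefault(parent, []).append(uid)
        out = []
        def walk(parent):
            # Descending ID order = deterministic tie-break for concurrent inserts
            for uid in sorted(children.get(parent, []), reverse=True):
                out.append(self.elems[uid][1])
                walk(uid)
        walk(None)
        return "".join(out)

# Shared prefix "ABC"
base = MiniRGA()
base.insert_after(None, "a1", "A")
base.insert_after("a1", "b2", "B")
base.insert_after("b2", "c3", "C")

r1, r2 = copy.deepcopy(base), copy.deepcopy(base)
r1.insert_after("b2", "x9", "X")  # concurrent insert after B
r2.insert_after("b2", "x5", "Y")  # concurrent insert after B

r1.merge(r2)
r2.merge(r1)
assert r1.text() == r2.text()  # both replicas converge to the same interleaving
```

Real RGAs additionally handle deletes (via tombstones) and use (counter, replica-id) pairs as IDs, but the tie-break-by-ID trick is the same.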

2.3.6 Yjs / Y-CRDT — Production text CRDT

Yjs (https://yjs.dev) is a production CRDT library for text + structured data, used by:

  • Linear (issue tracking)
  • Sanity (CMS)
  • Notion (in parts)
  • JupyterLab (RTC notebooks)

Y-Doc structure:

Y.Doc
├── Y.Text (collaborative string)
├── Y.Array (ordered list)
├── Y.Map (key-value)
└── Y.XmlElement (rich text DOM)

// Yjs example
import * as Y from 'yjs'
import { WebsocketProvider } from 'y-websocket'
 
const ydoc = new Y.Doc()
const provider = new WebsocketProvider('wss://demos.yjs.dev', 'my-room', ydoc)
 
const ytext = ydoc.getText('chapter-1')
 
// Local edit
ytext.insert(0, 'Hello, ')
ytext.insert(7, 'World!')
 
// Observe remote changes
ytext.observe(event => {
  console.log('Text changed:', ytext.toString())
})

Yjs underlying algorithm: YATA (Yet Another Transformation Approach) — variant of RGA.

Performance:

  • 1M operations: ~100ms merge time
  • Memory ~10x text size (with tombstones)
  • Garbage collection reduces the overhead

2.3.7 Automerge — alternative to Yjs

Automerge (https://automerge.org/) — CRDT library by Martin Kleppmann.

Differences from Yjs:

  • More structured (JSON-like documents with full change history)
  • Originally pure JavaScript (Automerge 2 has a Rust core; Yjs also has a Rust port, y-crdt)
  • Better suited to complex nested documents
  • Slightly slower than Yjs

Use cases: Local-first software, offline-first mobile apps.


2.4 CRDT Limitations

2.4.1 Tombstones — garbage problem

OR-Set, RGA, and Yjs all use tombstones (remove markers). After a delete, the tombstone stays in the state forever (needed to reject re-delivered ops).

Result: State grows indefinitely.

Solutions:

  • Causal stability: once a tombstone has propagated to all replicas → it is safe to delete
  • Periodic GC: Yjs has a GC mode (deletes tombstones after a threshold)
  • Trade-off: GC shrinks state but can lose history

2.4.2 Cannot enforce invariants

CRDTs cannot enforce invariants like: "total balance ≥ 0", "inventory ≥ 0", "max 5 admins".

Reason: each replica decides independently. Two replicas process concurrent withdrawals → each sees a sufficient balance → both succeed → balance < 0.

Solutions:

  • Use consensus (Raft) for invariants
  • Or escrow tokens: pre-allocate a budget per replica → each replica spends only within its budget
  • Or a bounded-counter CRDT: still research-stage, but increasingly common
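A sketch of the escrow idea (the `EscrowedCounter` name and the 60/40 split are illustrative assumptions):

```python
class EscrowedCounter:
    """Each replica may only spend tokens from its pre-allocated local budget,
    so the global invariant (total spent <= total budget) holds without any
    cross-replica coordination."""
    def __init__(self, local_budget: int):
        self.budget = local_budget
        self.spent = 0

    def try_spend(self, n: int) -> bool:
        if n < 0 or self.spent + n > self.budget:
            return False  # would violate the invariant locally -> reject
        self.spent += n
        return True

# Total budget of 100 tokens, split 60/40 between two replicas
r1, r2 = EscrowedCounter(60), EscrowedCounter(40)
assert r1.try_spend(50)
assert not r1.try_spend(20)        # exceeds r1's escrow, rejected locally
assert r2.try_spend(40)
assert r1.spent + r2.spent <= 100  # global invariant preserved
```

The cost is availability of the budget itself: if r1 exhausts its escrow it must either reject or ask for a rebalance, which does need coordination — but only occasionally, not per operation.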

2.4.3 Causal context required

Operation-based CRDTs require causal delivery → you need vector clocks or an HLC.

Cost: Vector clock O(N) bytes per operation. Optimizations:

  • Dotted version vector
  • Compact vector clock (compress runs)
  • HLC (Hybrid Logical Clock) — fixed size

2.5 CRDT vs Operational Transformation (OT)

OT is the pre-CRDT approach to collaborative editing (Google Docs uses OT).

| | OT | CRDT |
|---|---|---|
| Conflict resolution | Transform operations (complex) | Commutative ops (simpler) |
| Server requirement | Central server (relay + transform) | Optional (P2P possible) |
| Math foundation | Operational | Algebraic |
| Implementations | Google Docs, ShareJS | Yjs, Automerge, Riak |
| Complexity | Hard to implement correctly | Easier (compose primitives) |
| Performance | Better for centralized | Better for P2P |

Trend 2024-2026: New systems prefer CRDT (Figma, Linear). Legacy stays on OT (Google Docs).

2.6 Real-world implementations

2.6.1 Riak (Database with CRDT support)

# Riak CRDT data types: Counter, Set, Map, Register, Flag
import riak
client = riak.RiakClient()
bucket = client.bucket_type('counters').bucket('page_views')
 
counter = bucket.new('homepage')
counter.increment(1)
counter.store()
 
# Active-active across regions: each region writes locally, merges async

2.6.2 Redis CRDT (Redis Enterprise)

Redis Enterprise offers Active-Active CRDBs with CRDT support for:

  • Strings, Counters (G-Counter, PN-Counter)
  • Sets (OR-Set)
  • Hashes (Map of LWW-Registers)
  • Sorted Sets, Lists, Streams

Multi-region active-active without conflicts.

2.6.3 Figma’s multiplayer

Figma uses a CRDT-inspired design (custom-built, not a CRDT library) for live collaboration (10+ designers in the same file):

  • Each shape has unique ID
  • Properties (color, position) are LWW-Registers with HLC
  • Tree structure (parent-child) uses RGA-like ordering
  • Server is relay + state snapshot, not coordinator

Tham chiếu: Figma engineering blog — https://www.figma.com/blog/how-figmas-multiplayer-technology-works/

2.6.4 Notion’s blocks

Notion reportedly uses CRDT techniques for block-level editing:

  • Each block has UUID
  • Block tree = ordered list (RGA)
  • Block content = Y.Text
  • Sync via WebSocket + persistent server state

2.6.5 Apple Notes

iOS Notes uses CRDTs for cross-device sync:

  • Edit offline (CloudKit cache)
  • Sync when online → automatic merge
  • No “conflict resolution dialog” (unlike old Dropbox)

3. Estimation — CRDT Overhead

3.1 Memory overhead

| CRDT | Memory ratio (vs raw data) |
|---|---|
| G-Counter (5 replicas) | 5x (vector of N ints) |
| OR-Set | 2-3x (each element + tags) |
| LWW-Register | 1.1x (value + timestamp) |
| Y.Text | 5-10x (with tombstones), 2-3x with GC |
| Automerge document | 5-15x |

Example: 100KB text document → Y.Text uses ~500KB memory (5x).

3.2 Network bandwidth

State-based: send the full state on sync (large)
Operation-based: send only ops (small per op, ~50-200 bytes)
Delta-state: hybrid (send deltas of state, not the full state)
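The delta-state idea can be sketched on the G-Counter from section 2.3.1 (the `DeltaGCounter` name and method names are illustrative):

```python
class DeltaGCounter:
    """G-Counter that ships only the entries changed since the last sync,
    instead of the full count vector."""
    def __init__(self, num_replicas: int, my_id: int):
        self.counts = [0] * num_replicas
        self.my_id = my_id
        self.delta = {}  # replica index -> latest count, pending since last flush

    def increment(self, n: int = 1):
        self.counts[self.my_id] += n
        self.delta[self.my_id] = self.counts[self.my_id]

    def flush_delta(self) -> dict:
        """Return and clear the pending delta (what would go on the wire)."""
        d, self.delta = self.delta, {}
        return d

    def apply_delta(self, d: dict):
        # max() keeps delta application commutative and idempotent,
        # exactly like a full-state merge
        for i, c in d.items():
            self.counts[i] = max(self.counts[i], c)

r1, r2 = DeltaGCounter(3, 0), DeltaGCounter(3, 1)
r1.increment(5)
r2.increment(3)
r2.apply_delta(r1.flush_delta())   # ships {0: 5} — a few bytes, not full state
r1.apply_delta(r2.flush_delta())
assert sum(r1.counts) == sum(r2.counts) == 8
```

Because delta application uses the same max-merge, re-delivered deltas are harmless — the network may still be lossy and duplicating.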

Yjs sync protocol:

  • State vector exchange: ~100 bytes
  • Update message: ~40 bytes/op + content
  • Compression (LZ4): 2-5x reduction

Bandwidth for 5 collaborators editing Y.Text:

  • ~1 KB/s/collaborator at active editing
  • Negligible during idle

3.3 Convergence time

For N replicas in fully-connected network:

  • Gossip protocol: O(log N) rounds for full convergence
  • WebSocket-based (Yjs): Sub-second for small docs, seconds for large
  • P2P (libp2p): variable, depends on topology

3.4 Garbage collection cost

Y.Text with 1M edit operations:

  • Without GC: 100MB+ memory, slow
  • With GC after 10K ops: 10MB memory, fast
  • GC pause: ~50-100ms per collection

4. Security First — CRDT Threats

4.1 Trust model

By default, CRDTs assume all replicas are non-Byzantine (they don't lie). An attacker who compromises one replica can:

  • Inject fake operations with backdated timestamps (LWW exploit)
  • Generate huge tombstones to DoS storage
  • Resurrect deleted data

Mitigation:

  • Authenticated CRDT operations: HMAC-sign each op with a per-user key
  • Op validation: reject ops with timestamps > now() + drift
  • Quotas: limit per-user operations/min

4.2 LWW timestamp attacks

Attack: Attacker writes ("malicious_value", 9999999999) → wins LWW forever.

Mitigation:

  • Use server-issued timestamps (not client-generated)
  • HLC (Hybrid Logical Clock) bounded by server time
  • Reject timestamps > now() + reasonable drift (5 minutes)

4.3 Tombstone DoS

Attack: Attacker spam add/remove pairs → tombstones explode.

Mitigation:

  • Per-user op rate limits
  • GC tombstones aggressively
  • Monitor state size; alert if it grows unexpectedly

4.4 Replay attacks

Attack: Capture op, replay later → cause unintended state.

Mitigation:

  • Each op has unique nonce
  • Replicas track seen nonces (per-user, per-time-window)
  • HMAC signature includes timestamp → bounded replay window
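A sketch combining the three mitigations (the per-user key, in-memory nonce set, and 60-second window are assumptions; a production system would persist and expire seen nonces per window):

```python
import hmac
import hashlib

WINDOW_MS = 60_000
seen_nonces: set[bytes] = set()  # in production: per-user, expired with the window

def sign_op(key: bytes, op: bytes, ts_ms: int, nonce: bytes) -> bytes:
    msg = op + ts_ms.to_bytes(8, "big") + nonce
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify_op(key: bytes, op: bytes, ts_ms: int, nonce: bytes,
              sig: bytes, now_ms: int) -> bool:
    if abs(now_ms - ts_ms) > WINDOW_MS:
        return False                              # outside the bounded replay window
    if nonce in seen_nonces:
        return False                              # replayed
    expected = sign_op(key, op, ts_ms, nonce)
    if not hmac.compare_digest(expected, sig):
        return False                              # tampered or wrong key
    seen_nonces.add(nonce)
    return True

key = b"per-user-secret"
sig = sign_op(key, b"insert 'x' at 5", 1_000, b"nonce-1")
assert verify_op(key, b"insert 'x' at 5", 1_000, b"nonce-1", sig, now_ms=1_500)
assert not verify_op(key, b"insert 'x' at 5", 1_000, b"nonce-1", sig, now_ms=1_500)  # replay
```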

4.5 Encryption

End-to-end encrypted CRDTs:

  • Difficult — the server can't validate ops without seeing their content
  • Solutions: ZK-CRDTs (research), or trusting client-side validation
  • Yjs has experimented with encrypted updates

5. DevOps — Operating CRDT Systems

5.1 Yjs server với y-websocket

// server.js
const WebSocket = require('ws')
const http = require('http')
const { setupWSConnection } = require('y-websocket/bin/utils')
 
const server = http.createServer()
const wss = new WebSocket.Server({ server })
 
wss.on('connection', (conn, req) => {
  setupWSConnection(conn, req, {
    docName: req.url.slice(1).split('?')[0],
    gc: true  // Enable garbage collection
  })
})
 
server.listen(1234, () => {
  console.log('Yjs WebSocket server on port 1234')
})

Persistence with LevelDB:

const { LeveldbPersistence } = require('y-leveldb')
const ldb = new LeveldbPersistence('./yjs-data')
 
setupWSConnection(conn, req, {
  docName: 'mydoc',
  persistence: ldb
})

5.2 Production deployment

# docker-compose.yml
version: "3.8"
 
services:
  yjs-server:
    build: .
    ports:
      - "1234:1234"
    volumes:
      - yjs-data:/app/yjs-data
    environment:
      NODE_ENV: production
    deploy:
      replicas: 3  # multiple instances
      restart_policy:
        condition: on-failure
 
  redis:
    image: redis:7
    # Use Redis pub/sub for cross-instance sync
    ports:
      - "6379:6379"
 
volumes:
  yjs-data:

Multi-instance challenge: y-websocket keeps state per instance. You need:

  • Sticky sessions (load balancer route same doc to same instance), OR
  • Redis pub/sub to broadcast updates between instances, OR
  • CRDT-as-Service (e.g., Cloudflare Durable Objects)

5.3 Monitoring

Key metrics:

  • Active connections per doc
  • State size per doc (memory)
  • Update frequency (ops/sec/doc)
  • GC efficiency (tombstones cleaned)
  • Sync latency (op → broadcast → ack)

Prometheus alerts:

- alert: YjsHighMemoryPerDoc
  expr: yjs_doc_state_size_bytes > 50000000  # 50 MB
  annotations:
    summary: "Doc {{ $labels.doc }} state is {{ $value | humanize1024 }}"
 
- alert: YjsHighOpRate
  expr: rate(yjs_ops_total[5m]) > 10000
  annotations:
    summary: "Suspicious op rate {{ $value }} ops/s — possible attack"
 
- alert: YjsTombstoneAccumulation
  expr: yjs_tombstone_count > 100000
  annotations:
    summary: "Run GC: {{ $value }} tombstones in {{ $labels.doc }}"

5.4 Backup & Recovery

State snapshots: Periodically dump full state.

// Backup
const state = Y.encodeStateAsUpdate(ydoc)
fs.writeFileSync('backup.bin', state)
 
// Restore
const state = fs.readFileSync('backup.bin')
const newDoc = new Y.Doc()
Y.applyUpdate(newDoc, state)

Frequency: Hourly snapshots + continuous WAL.

5.5 Sharding

Large Yjs workloads → load balance via:

  • Per-doc sharding: hash(doc_id) → instance
  • Sticky sessions: ensure same doc on same instance
  • Hot doc replication: replicate hot docs to N instances

6. Code Implementation

6.1 G-Counter from scratch

"""
G-Counter (state-based CRDT) — Python implementation
"""
 
class GCounter:
    def __init__(self, num_replicas: int, my_id: int):
        if my_id < 0 or my_id >= num_replicas:
            raise ValueError("my_id out of range")
        self.counts = [0] * num_replicas
        self.my_id = my_id
        self.num_replicas = num_replicas
 
    def increment(self, n: int = 1):
        if n < 0:
            raise ValueError("G-Counter only supports increment")
        self.counts[self.my_id] += n
 
    def value(self) -> int:
        return sum(self.counts)
 
    def merge(self, other: "GCounter"):
        if self.num_replicas != other.num_replicas:
            raise ValueError("Replica count mismatch")
        for i in range(self.num_replicas):
            self.counts[i] = max(self.counts[i], other.counts[i])
 
    def __repr__(self):
        return f"GCounter({self.counts})"
 
 
# Demo: 3 replicas, partition then heal
def demo_gcounter():
    print("=== G-Counter Demo ===")
    r1 = GCounter(3, 0)
    r2 = GCounter(3, 1)
    r3 = GCounter(3, 2)
 
    # Phase 1: each replica increments independently
    r1.increment(5)
    r2.increment(3)
    r3.increment(7)
 
    print(f"R1: {r1}, value={r1.value()}")
    print(f"R2: {r2}, value={r2.value()}")
    print(f"R3: {r3}, value={r3.value()}")
 
    # Phase 2: gossip merge (any order)
    r1.merge(r2)
    r3.merge(r1)
    r2.merge(r3)
    r1.merge(r3)
 
    # All should converge
    print(f"\nAfter sync:")
    print(f"R1: {r1}, value={r1.value()}")
    print(f"R2: {r2}, value={r2.value()}")
    print(f"R3: {r3}, value={r3.value()}")
 
    assert r1.value() == r2.value() == r3.value() == 15
    print("✓ Converged to 15")
 
 
demo_gcounter()

6.2 OR-Set implementation

"""
OR-Set (Observed-Remove Set) — supports concurrent add+remove correctly.
"""
 
import uuid
from typing import Any
 
 
class ORSet:
    def __init__(self):
        # element → set of unique tags (each add creates new tag)
        self._adds: dict[Any, set[str]] = {}
        # element → set of removed tags
        self._removes: dict[Any, set[str]] = {}
 
    def add(self, x):
        tag = str(uuid.uuid4())
        self._adds.setdefault(x, set()).add(tag)
 
    def remove(self, x):
        # Tombstone all currently observed tags
        observed = self._adds.get(x, set()).copy()
        if observed:
            self._removes.setdefault(x, set()).update(observed)
 
    def contains(self, x) -> bool:
        active_tags = self._adds.get(x, set()) - self._removes.get(x, set())
        return len(active_tags) > 0
 
    def elements(self) -> set:
        return {x for x in self._adds if self.contains(x)}
 
    def merge(self, other: "ORSet"):
        for x, tags in other._adds.items():
            self._adds.setdefault(x, set()).update(tags)
        for x, tags in other._removes.items():
            self._removes.setdefault(x, set()).update(tags)
 
 
# Demo: concurrent add + remove
def demo_orset():
    print("\n=== OR-Set Demo ===")
 
    # Initial: A and B both have empty set
    a = ORSet()
    b = ORSet()
 
    # Both add "milk" (different tags!)
    a.add("milk")
    b.add("milk")
    print(f"A: {a.elements()}")
    print(f"B: {b.elements()}")
 
    # Concurrent: A removes milk, B adds bread + milk again
    a.remove("milk")
    b.add("bread")
    b.add("milk")  # Re-add with new tag
 
    print(f"\nBefore merge:")
    print(f"A: {a.elements()}")
    print(f"B: {b.elements()}")
 
    # Merge: A's remove only tombstones tags it observed
    # B's later add creates new unobserved tag → wins
    a.merge(b)
    b.merge(a)
 
    print(f"\nAfter merge:")
    print(f"A: {a.elements()}")
    print(f"B: {b.elements()}")
 
    # Both contain {"milk", "bread"} — A's remove was overridden by B's later add
    assert "milk" in a.elements() and "bread" in a.elements()
    print("✓ Converged with re-added 'milk' winning")
 
 
demo_orset()

6.3 LWW-Register with HLC

"""
LWW-Register with Hybrid Logical Clock (HLC) for safer timestamps.
"""
 
import time
from dataclasses import dataclass
 
 
@dataclass
class HLC:
    """Hybrid Logical Clock — physical + logical components."""
    physical: int  # ms
    logical: int
 
    def __lt__(self, other):
        if self.physical != other.physical:
            return self.physical < other.physical
        return self.logical < other.logical
 
 
class HLCClock:
    def __init__(self):
        self.last = HLC(int(time.time() * 1000), 0)
 
    def now(self) -> HLC:
        wall = int(time.time() * 1000)
        if wall > self.last.physical:
            self.last = HLC(wall, 0)
        else:
            self.last = HLC(self.last.physical, self.last.logical + 1)
        return HLC(self.last.physical, self.last.logical)
 
    def update(self, remote: HLC) -> HLC:
        wall = int(time.time() * 1000)
        max_phys = max(wall, self.last.physical, remote.physical)
 
        if max_phys == self.last.physical == remote.physical:
            new_logical = max(self.last.logical, remote.logical) + 1
        elif max_phys == self.last.physical:
            new_logical = self.last.logical + 1
        elif max_phys == remote.physical:
            new_logical = remote.logical + 1
        else:
            new_logical = 0
 
        self.last = HLC(max_phys, new_logical)
        return HLC(self.last.physical, self.last.logical)
 
 
class LWWRegister:
    def __init__(self, replica_id: str, clock: HLCClock):
        self.value = None
        self.timestamp: HLC = HLC(0, 0)
        self.replica_id = replica_id
        self.writer = replica_id  # id of the replica that wrote the current value
        self.clock = clock

    def write(self, value):
        self.value = value
        self.timestamp = self.clock.now()
        self.writer = self.replica_id

    def merge(self, other: "LWWRegister"):
        # Tie-break on the writer id attached to the value (not this replica's
        # own id), so the comparison stays stable after values propagate.
        if other.timestamp > self.timestamp or (
            other.timestamp == self.timestamp
            and other.writer > self.writer
        ):
            self.value = other.value
            self.timestamp = other.timestamp
            self.writer = other.writer
        self.clock.update(other.timestamp)
 
 
def demo_lww():
    print("\n=== LWW-Register Demo ===")
    clock_a = HLCClock()
    clock_b = HLCClock()
 
    a = LWWRegister("A", clock_a)
    b = LWWRegister("B", clock_b)
 
    a.write("Alice's value")
    time.sleep(0.001)
    b.write("Bob's value")
 
    print(f"A: value={a.value}, ts={a.timestamp}")
    print(f"B: value={b.value}, ts={b.timestamp}")
 
    a.merge(b)
    b.merge(a)
 
    print(f"\nAfter merge:")
    print(f"A: {a.value}")
    print(f"B: {b.value}")
    assert a.value == b.value
    print("✓ Converged")
 
 
demo_lww()

6.4 Yjs with React (frontend)

import * as Y from 'yjs'
import { WebsocketProvider } from 'y-websocket'
import { useEffect, useState } from 'react'
 
function CollaborativeEditor() {
  const [text, setText] = useState('')
  const [ydoc] = useState(() => new Y.Doc())
  const [ytext] = useState(() => ydoc.getText('content'))
 
  useEffect(() => {
    const provider = new WebsocketProvider('wss://server.example.com', 'doc-id', ydoc)
 
    const observer = () => setText(ytext.toString())
    ytext.observe(observer)
 
    return () => {
      ytext.unobserve(observer)
      provider.destroy()
    }
  }, [])
 
  const handleChange = (e) => {
    const newText = e.target.value
    // Naive: replace the whole text (a production binding would compute a diff)
    ydoc.transact(() => {
      ytext.delete(0, ytext.length)
      ytext.insert(0, newText)
    })
  }
 
  return (
    <textarea
      value={text}
      onChange={handleChange}
      style={{ width: '100%', height: '400px' }}
    />
  )
}

7. System Design Diagrams

7.1 CRDT vs Consensus

flowchart LR
    subgraph Consensus["Consensus (Raft)"]
        C1[Leader] -.write.-> C2[Follower 1]
        C1 -.write.-> C3[Follower 2]
        C4[Client] --> C1
        Note1["Cần leader<br/>Latency = quorum RTT<br/>Strong consistency"]
    end

    subgraph CRDT["CRDT (Eventually Convergent)"]
        D1[Replica 1] <-->|gossip| D2[Replica 2]
        D2 <-->|gossip| D3[Replica 3]
        D1 <-->|gossip| D3
        D4[Client] --> D1
        Note2["No leader<br/>Local writes immediate<br/>Eventual consistency"]
    end

    style Note1 fill:#fff9c4
    style Note2 fill:#c8e6c9

7.2 OR-Set Concurrent Add + Remove

sequenceDiagram
    participant A as Replica A
    participant B as Replica B

    Note over A,B: Initial: both have {} (empty)

    A->>A: add("milk")<br/>tag=t1
    B->>B: add("milk")<br/>tag=t2

    Note over A: state.adds = {milk: {t1}}
    Note over B: state.adds = {milk: {t2}}

    A->>A: remove("milk")
    Note over A: state.removes = {milk: {t1}}<br/>(only tombstones t1, doesn't see t2)

    B->>B: add("bread")<br/>tag=t3
    Note over B: state.adds = {milk: {t2}, bread: {t3}}

    A->>B: sync
    B->>A: sync

    Note over A,B: After merge:<br/>adds = {milk: {t1, t2}, bread: {t3}}<br/>removes = {milk: {t1}}<br/>active = {milk: {t2}, bread: {t3}}<br/>→ {milk, bread}

7.3 Yjs Architecture

flowchart TB
    subgraph Client1["Client 1 (Browser)"]
        Y1[Y.Doc]
        Y1text[Y.Text 'content']
        Y1 --> Y1text
    end

    subgraph Client2["Client 2 (Browser)"]
        Y2[Y.Doc]
        Y2text[Y.Text 'content']
        Y2 --> Y2text
    end

    subgraph Server["Yjs WebSocket Server"]
        WS[y-websocket relay]
        Persistence[(LevelDB / Redis)]
    end

    Client1 <-->|"WebSocket<br/>updates"| WS
    Client2 <-->|"WebSocket<br/>updates"| WS
    WS --> Persistence

    Note["Server is RELAY only<br/>Not a coordinator<br/>State converges via CRDT merge"]

    style Note fill:#fff9c4

7.4 G-Counter Convergence

flowchart TD
    subgraph Initial["Initial State"]
        R1a["R1: [0,0,0]"]
        R2a["R2: [0,0,0]"]
        R3a["R3: [0,0,0]"]
    end

    subgraph Independent["Independent increments"]
        R1b["R1: [5,0,0]<br/>+5"]
        R2b["R2: [0,3,0]<br/>+3"]
        R3b["R3: [0,0,7]<br/>+7"]
    end

    subgraph Sync1["Sync R1↔R2"]
        R1c["R1: [5,3,0]<br/>value=8"]
        R2c["R2: [5,3,0]<br/>value=8"]
    end

    subgraph Sync2["Sync all"]
        R1d["R1: [5,3,7]<br/>value=15 ✓"]
        R2d["R2: [5,3,7]<br/>value=15 ✓"]
        R3d["R3: [5,3,7]<br/>value=15 ✓"]
    end

    Initial --> Independent
    Independent --> Sync1
    Sync1 --> Sync2

    style R1d fill:#c8e6c9
    style R2d fill:#c8e6c9
    style R3d fill:#c8e6c9

8. Aha Moments & Pitfalls

Aha Moments

#1: CRDTs trade expressiveness for automatic merge. You can't CRDT everything — only specific shapes (counter, set, register, sequence). But for those shapes, you get automatic merge with no coordination.

#2: The 3 properties (commutative, associative, idempotent) are the magic. With these properties, whatever order you merge in, the result is the same. This is the mathematical foundation of CRDTs.

#3: Tombstones are the price you pay. To support remove + concurrent re-add, an OR-Set accumulates tombstones. State grows. GC helps, but at a cost (history is lost).

#4: CRDTs do not replace consensus. CRDTs give you convergence, not invariants. If you need "balance ≥ 0" → you still need Raft. But for "shopping cart items", "page view counters", "collaborative text" → CRDT > Raft (no leader, no coordination).

#5: The server can be a pure relay. A Yjs server doesn't "decide" anything — it just relays updates and persists state. The CRDT logic lives in the client. This differs from a traditional server that arbitrates.

#6: Local-first = CRDT-first. Local-first software (Apple Notes, Obsidian Sync, Linear) is built on CRDTs underneath. Edit offline, sync when online → automatic merge, no conflict dialogs.

#7: HLC > plain LWW timestamps. A Hybrid Logical Clock is fixed-size (16 bytes), stays close to wall-clock time (easier debugging), yet preserves causality. CockroachDB, YugabyteDB, and MongoDB's cluster time all use HLCs.

Pitfalls

Pitfall 1: Using a CRDT for an invariant

Wrong: wallet balance as a PN-Counter → 2 users withdraw concurrently → balance < 0. Right: invariants need consensus (Raft) or escrow tokens.

Pitfall 2: LWW for collaborative editing

Wrong: a profile field as LWW is fine — but LWW for text → paragraphs lost when 2 users edit at once. Right: text uses RGA / Y.Text. LWW is only for atomic single-value updates.

Pitfall 3: Forgetting garbage collection

Wrong: a Yjs document grows without bound → a year later, 100MB for 1KB of visible text. Right: enable GC in y-websocket. Periodic state snapshots + cleared history.

Pitfall 4: Client-generated timestamps

Wrong: client clocks can skew → a future timestamp → wins LWW forever. Right: server-issued timestamps, or an HLC bounded by server time.

Pitfall 5: 2P-Set (can’t re-add)

Wrong: using a 2P-Set → "user removed friend X, can never add X back". Right: use an OR-Set. A 2P-Set is only for truly irrevocable state.

Pitfall 6: Partition + tombstone DoS

Wrong: a replica isolated for 30 days → returns → 100MB state diff → bandwidth + memory crash. Right: cap max state size. Forced GC. Anti-entropy throttling.

Pitfall 7: Vector clock scalability

Wrong: 1000 replicas → a 1000-entry vector clock per op → bandwidth explosion. Right: use an HLC (fixed size) or dotted version vectors. Or shard the CRDT to bound the replica set.

Pitfall 8: Treating Yjs/Automerge as a silver bullet

Wrong: migrating everything to Yjs — performance regresses with large docs (>10MB). Right: CRDTs suit real-time collab and offline-first. They don't suit heavy mutation or large monolithic docs.

Pitfall 9: No backup

Wrong: Yjs in memory only → server restart → everything is lost. Right: a persistence layer (LevelDB, Redis, S3 snapshots).

Pitfall 10: Unsynced multi-instance deployments

Wrong: 3 Yjs server instances behind an LB → user A on instance 1, user B on instance 2 → they never sync. Right: sticky sessions OR Redis pub/sub broadcast OR Cloudflare Durable Objects (one instance per doc).


| Topic | Connection |
|---|---|
| Tuan-Bonus-Consistency-Models-Isolation | CRDTs = a form of eventual consistency with automatic merge |
| Tuan-Bonus-Consensus-Raft-Paxos | Raft for strong consistency; CRDTs for high availability |
| Tuan-07-Database-Sharding-Replication | Multi-master replication → CRDTs when coordination is impossible |
| Tuan-20-Design-Key-Value-Store | Dynamo uses vector clocks; closely related to CRDTs |
| Tuan-17-Design-Chat-System | Chat presence, typing indicators can use CRDTs |
| Case-Design-Nearby-Friends | Real-time location updates — LWW position via CRDT |
| Case-Design-Google-Drive | Collaborative editing → CRDT (Yjs/OT) |

References

Foundational papers:

  • Shapiro et al., A comprehensive study of Convergent and Commutative Replicated Data Types (Inria 2011) — https://hal.inria.fr/inria-00555588/document
  • Bieniusa et al., An Optimized Conflict-free Replicated Set (2012) — OR-Set
  • Kleppmann & Beresford, A Conflict-Free Replicated JSON Datatype (2017) — Automerge foundation
  • Roh et al., Replicated abstract data types (2011) — CmRDT vs CvRDT formal



This completes the batch of bonus chapters: Consensus → Consistency Models → Outbox → CRDTs.

Return to Tuan-20-Design-Key-Value-Store with a deeper understanding of vector clocks + CRDTs to apply to a DynamoDB-style design.