Foundations: Mathematics for System Architects

“An architect doesn't need a math PhD. But they must understand: why vector similarity uses cosine, why an A/B test needs 5000 users, why a Bloom filter has a false positive rate, why zstd compresses 5x with a 1MB block, why CRDTs need the lattice property, why consistent hashing uses modular arithmetic. This is not advanced mathematics — it is everyday discrete math + probability + linear algebra applied to engineering.”

Tags: cs-foundations mathematics probability linear-algebra fundamentals Student: Hieu (Backend Dev → Architect) Related: Tuan-Bonus-LLM-Serving-Infrastructure · Tuan-Bonus-CRDTs-Conflict-Free-Data-Types · Tuan-Bonus-Progressive-Delivery


1. Context & Why

Why does an Architect need to know Math?

| System design topic | Underlying math |
|---|---|
| Vector similarity (RAG, semantic search) | Linear algebra: dot product, cosine |
| A/B test sample size | Statistics: hypothesis testing |
| Bloom filter false positive rate | Probability: birthday paradox |
| HyperLogLog cardinality | Probability: random hash |
| Compression ratio | Information theory: entropy |
| CRDT convergence | Discrete math: lattice theory |
| Consistent hashing | Number theory: modular arithmetic |
| Erasure coding (S3) | Linear algebra: Galois fields |
| Cache hit rate model | Probability: Markov chains |
| Capacity planning | Calculus: derivatives, growth rates |
| Embedding dimension | Linear algebra: vector spaces |
| GPU performance (FLOPS) | Linear algebra: matrix multiply |

Key insight: Architect-level decisions usually have a math model behind them. Understanding the math lets you predict performance, choose the right algorithm, and debug edge cases.

2. Linear Algebra

2.1 Vectors & Spaces

Vector = list of numbers, represents point in N-dimensional space.

v = [3, 4]  → point in 2D space
w = [1, 0, 0, ..., 1]  → 1024-dim embedding

Embedding: ML transforms text/image/anything into N-dim vector.

  • BERT: 768 dim
  • OpenAI ada-002: 1536 dim
  • Custom: 64-2048 dim typical

2.2 Operations You Need

2.2.1 Dot Product

import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.dot(a, b)  # 1*4 + 2*5 + 3*6 = 32

Geometric meaning: how strongly two vectors point in the same direction.

  • Positive: same direction
  • Zero: perpendicular (uncorrelated)
  • Negative: opposite

2.2.2 Vector Norm (Length)

np.linalg.norm([3, 4])  # 5 (Pythagoras)

2.2.3 Cosine Similarity

Most used in vector search:

def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
 
cosine_sim([1, 0], [1, 0])  # 1.0 (identical)
cosine_sim([1, 0], [0, 1])  # 0.0 (perpendicular)
cosine_sim([1, 0], [-1, 0]) # -1.0 (opposite)

Used by: All RAG systems, recommendation engines, deduplication.

Why cosine over Euclidean: Magnitude doesn’t matter for “meaning similarity” — only direction.

2.2.4 Euclidean Distance

np.linalg.norm(a - b)

When to use: When magnitudes matter (e.g., image pixels).

2.2.5 L1 vs L2 Norms

  • L1 (Manhattan): ‖v‖₁ = Σ|vᵢ| — robust to outliers
  • L2 (Euclidean): ‖v‖₂ = √(Σvᵢ²) — smooth, common
  • L∞ (Chebyshev): ‖v‖∞ = maxᵢ|vᵢ| — worst component
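
A quick numpy check of all three, using the ord parameter of np.linalg.norm:

import numpy as np

v = np.array([3.0, -4.0])
print(np.linalg.norm(v, ord=1))       # L1: |3| + |-4| = 7
print(np.linalg.norm(v, ord=2))       # L2: sqrt(9 + 16) = 5
print(np.linalg.norm(v, ord=np.inf))  # L-inf: max(|3|, |-4|) = 4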

2.3 Matrices

Matrix = 2D grid of numbers. Linear transformation from N-dim to M-dim.

M = np.array([[1, 2],
              [3, 4]])

2.3.1 Matrix Multiplication

A = np.random.rand(100, 50)   # 100 rows × 50 cols
B = np.random.rand(50, 30)    # 50 rows × 30 cols
C = A @ B                      # 100 × 30

Cost: O(N³) for N×N matrices. Most compute in ML.

Why GPUs dominate: Matrix multiply parallelizes perfectly.

2.3.2 Identity, Inverse, Transpose

  • Identity: I, M @ I = M
  • Transpose: M^T (rows become columns)
  • Inverse: M @ M^-1 = I (only square, full-rank)

2.3.3 Eigenvalues / Eigenvectors

Special vectors v where M @ v = λ × v (just scaled by λ).

Used in: PCA (dimensionality reduction), PageRank, spectral clustering.
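
A quick numpy sanity check (matrix values chosen arbitrarily for illustration):

import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(M)
v = eigvecs[:, 0]
# M @ v equals eigvals[0] * v: the eigenvector is only scaled, not rotated
print(np.allclose(M @ v, eigvals[0] * v))  # True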

2.4 Application: Attention Mechanism (LLM)

Transformer attention is matrix multiplication:

Attention(Q, K, V) = softmax(Q·Kᵀ / √d) · V

Where:

  • Q (Query): N × d
  • K (Key): N × d
  • V (Value): N × d
  • Output: N × d
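
A minimal single-head attention sketch in numpy (no masking, batching, or multi-head logic; shapes are illustrative):

import numpy as np

N, d = 128, 64                  # sequence length, head dimension
Q = np.random.rand(N, d)
K = np.random.rand(N, d)
V = np.random.rand(N, d)

scores = Q @ K.T / np.sqrt(d)   # (N, N): every token attends to every token
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
out = weights @ V               # (N, d)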

Why decoding is memory-bandwidth bound: for each new token, the K and V matrices of all previous tokens must be read from memory.

2.5 Application: Brute-Force Vector Search

# Database has 1M document embeddings (1536-dim)
embeddings = np.random.rand(1_000_000, 1536)
 
# Query embedding
query = np.random.rand(1536)
 
# Compute cosine similarity to all
scores = embeddings @ query / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query))
 
# Top 10
top_k = np.argsort(scores)[-10:]

Cost: 1.5B multiplications for 1M docs. → Why ANN (approximate nearest neighbor) like HNSW exists.


3. Probability & Statistics

3.1 Basics

Probability = number between 0 and 1.

Random variable X = outcome of random process (e.g., dice roll).

Expected value: E[X] = Σ xᵢ · P(X = xᵢ)

Variance: Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²

3.2 Common Distributions

3.2.1 Uniform

All outcomes equally likely: P(x) = 1/n for n outcomes.

Example: Random hash, dice roll.

3.2.2 Normal (Gaussian)

Bell curve. Defined by μ (mean) and σ (std dev).

Why important: Central Limit Theorem — sum of many random vars approaches normal.

Examples:

  • Network latency distribution (sometimes log-normal)
  • Measurement errors

3.2.3 Exponential

Time between events in a Poisson process: P(T > t) = e^(−λt), mean 1/λ.

Example: Time between requests in stable load.

3.2.4 Poisson

Number of events in a fixed interval: P(k) = λᵏ · e^(−λ) / k!

Example: Requests per second arrival.

3.2.5 Power-law / Pareto

80/20 rule. Few big, many small.

Examples:

  • File sizes (most small, few huge)
  • Web page popularity
  • Network traffic per user

3.3 The Birthday Paradox

Question: How many people in room before 50% chance 2 share birthday?

Answer: 23 (not 183!).

Why: With N people there are C(N,2) = N(N−1)/2 pairs, and each pair has a 1/365 chance of matching, so P(no match) ≈ (364/365)^(N(N−1)/2).

For N = 23 that is 253 pairs, giving P ≈ 50%.

Implications for engineering:

  • UUID collision: 128-bit space, but only ~2^64 generations expected before a collision (birthday bound √(2^128)).
  • Hash collision in 32-bit: expected after ~65K inserts (2^16 = √(2^32)).
  • Bloom filters: Trade-off based on this.

3.4 Bloom Filters Math

Bloom filter: Probabilistic set, “definitely not in” or “probably in”.

Parameters:

  • m = bit array size
  • k = number of hash functions
  • n = items inserted

False positive rate: p ≈ (1 − e^(−kn/m))^k

Optimal k: k = (m/n) · ln 2.

Example: Want 1% FP rate for 1M items:

  • m ≈ 9.6M bits = 1.2 MB
  • k ≈ 7 hash functions

import math
 
def bloom_size(n, fp_rate):
    m = -(n * math.log(fp_rate)) / (math.log(2) ** 2)
    k = (m / n) * math.log(2)
    return int(m), int(k)
 
m, k = bloom_size(1_000_000, 0.01)
print(f"m={m} bits ({m/8/1024:.1f} KB), k={k}")

Used by: Cassandra (per-SSTable), Bitcoin SPV, Chrome Safe Browsing, RocksDB.

3.5 HyperLogLog — Cardinality Estimation

Problem: Count distinct in stream without storing all items.

HLL idea:

  • Hash each item
  • Track longest leading-zeros run in any hash
  • Cardinality estimate = 2^max_zeros (loose math)

Practical: 16KB → estimate cardinality of billions with ~2% error.

Used by: Redis (PFCOUNT), BigQuery APPROX_COUNT_DISTINCT, Druid.

3.6 A/B Testing & Hypothesis Testing

Question: Is variant B better than A? (or random noise?)

3.6.1 Null hypothesis

H0: A = B (no difference)
H1: A ≠ B

Goal: Reject H0 with confidence.

3.6.2 P-value

P(observe ≥ this difference | H0 true).

  • p < 0.05 → significant (5% false positive rate)
  • p < 0.01 → highly significant
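
A hand-rolled two-proportion z-test sketch (scipy for the normal tail probability; the counts are made up for illustration):

import math
from scipy import stats

def two_proportion_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * stats.norm.sf(abs(z))

print(two_proportion_pvalue(1000, 10_000, 1080, 10_000))  # ~0.064: not significant at 0.05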

3.6.3 Sample size formula

For 80% power, 5% significance, detecting an absolute lift δ:

n per arm ≈ (z_{α/2} + z_β)² · 2·p̄(1−p̄) / δ²   (p̄ = average conversion rate)

Practical: ~10K-100K samples per arm typical for web A/B test.

3.6.4 Common pitfalls

  • Peeking: Stop test early when “significant” → false positives.
  • Multiple comparisons: Test 20 metrics, 1 will be “significant” by chance.
  • Sample ratio mismatch: a 50/50 split that actually lands 51/49 signals an assignment bug; results can't be trusted.

Tools: Statsig, LaunchDarkly, Eppo handle all this.

3.7 Markov Chains (briefly)

System with states + transition probabilities.

Stationary distribution: long-run state probabilities π, satisfying π = πP.

Used in:

  • PageRank (Google)
  • Caching strategies (LRU vs LFU)
  • Queue analysis
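
The stationary distribution can be found by power iteration (a toy 2-state cache model; the transition probabilities are made up):

import numpy as np

# Rows sum to 1: P[i][j] = P(next state j | current state i)
P = np.array([[0.9, 0.1],   # cached  -> cached / evicted
              [0.5, 0.5]])  # evicted -> cached / evicted
pi = np.array([1.0, 0.0])
for _ in range(100):
    pi = pi @ P             # one step of the chain
print(pi)  # ~[0.833, 0.167]: long-run fraction of time in each state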

3.8 Queueing Theory

Little’s Law: L = λ × W

  • L = average requests in system
  • λ = arrival rate
  • W = average wait time

Practical: 1000 req/s, 100ms avg latency → 100 concurrent requests.

M/M/1 queue: single server, Poisson arrivals (exponential inter-arrival times), exponential service times. Average time in system: W = 1/(μ − λ).

As utilization ρ = λ/μ → 100%, latency → ∞. This is why ~80% utilization is the sweet spot.
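
A numeric sketch of the blow-up, assuming a service rate of 1000 req/s:

mu = 1000.0                        # service rate (req/s)
for util in [0.5, 0.8, 0.9, 0.95, 0.99]:
    lam = util * mu                # arrival rate
    w = 1 / (mu - lam)             # M/M/1 average time in system
    print(f"{util:.0%} utilization -> {w * 1000:.1f} ms")
# 50% -> 2 ms, 80% -> 5 ms, 90% -> 10 ms, 95% -> 20 ms, 99% -> 100 ms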

Universal Scalability Law (USL, Neil Gunther):

X(N) = λN / (1 + α(N−1) + β·N(N−1))

  • α: contention
  • β: coherence

Predicts: Performance ceiling, knee of curve. Why scaling 100x cores ≠ 100x throughput.
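
Evaluating the USL with assumed coefficients shows the knee (the α and β values here are illustrative, not measured):

def usl_throughput(n, lam=1.0, alpha=0.03, beta=0.0001):
    """X(N) = lambda*N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return lam * n / (1 + alpha * (n - 1) + beta * n * (n - 1))

for n in [1, 8, 32, 64, 128, 256]:
    print(f"N={n:>3}: X={usl_throughput(n):.1f}")
# Throughput rises, flattens near N~128, then declines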

3.9 Power Laws & Pareto in Systems

80/20 rule appears everywhere:

  • 20% users → 80% traffic
  • 20% files → 80% accesses
  • 20% bugs → 80% incidents

Implications:

  • Cache top 20% → 80% hit rate
  • Optimize hot path
  • Per-tenant isolation matters (one user = noisy neighbor)

4. Discrete Mathematics

4.1 Set Theory

Set: Collection of unique elements.

  • A ∪ B (union)
  • A ∩ B (intersection)
  • A − B (difference)
  • A × B (Cartesian product)

Database: Tables are sets, JOIN is intersection-like, UNION = set union.

4.2 Graph Theory

Graph G = (V, E): vertices + edges.

Types:

  • Directed vs undirected
  • Weighted vs unweighted
  • Cyclic vs acyclic (DAG)
  • Connected vs disconnected

Used in system design:

  • Service dependency graphs
  • Database query plans (DAG)
  • Workflow systems (Airflow DAGs)
  • Saga compensations
  • Network topology

Algorithms:

  • BFS/DFS — connectivity
  • Dijkstra — shortest path (network routing)
  • Topological sort — task scheduling (sketch below)
  • Min-cut/max-flow — capacity planning
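
A minimal topological sort (Kahn's algorithm) for the task-scheduling case; the task graph is hypothetical:

from collections import deque

def topo_sort(graph):
    """graph: {node: [nodes that depend on it]}. Raises on cycles."""
    indegree = {n: 0 for n in graph}
    for deps in graph.values():
        for d in deps:
            indegree[d] += 1
    queue = deque(n for n, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for d in graph[n]:
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    if len(order) != len(graph):
        raise ValueError("cycle detected")
    return order

print(topo_sort({"build": ["test"], "test": ["deploy"], "deploy": []}))
# ['build', 'test', 'deploy']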

4.3 Lattices (for CRDTs)

Lattice = partially ordered set with join (∨) operation.

Properties:

  • Commutative: a ∨ b = b ∨ a
  • Associative: (a ∨ b) ∨ c = a ∨ (b ∨ c)
  • Idempotent: a ∨ a = a

These are the same 3 properties CvRDT requires!

Examples:

  • Natural numbers with max: max(a,b)
  • Sets with union: a ∪ b
  • Boolean with OR: a ∨ b

→ This is why G-Counter (max), OR-Set (union) work as CRDTs.
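
A minimal G-Counter merge showing the lattice join as element-wise max (node names are hypothetical):

def merge(a, b):
    """Join of two G-Counter states: element-wise max per node."""
    return {node: max(a.get(node, 0), b.get(node, 0))
            for node in set(a) | set(b)}

r1 = {"node_a": 5, "node_b": 2}
r2 = {"node_a": 3, "node_b": 4}
print(merge(r1, r2))                   # {'node_a': 5, 'node_b': 4}
print(merge(r1, r2) == merge(r2, r1))  # commutative: True
print(merge(r1, r1) == r1)             # idempotent: True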

Reference: Tuan-Bonus-CRDTs-Conflict-Free-Data-Types.

4.4 Modular Arithmetic

a mod n = remainder when a divided by n.

Used in:

  • Hashing (hash(key) % N for shard)
  • Consistent hashing (mod 2^32)
  • Crypto (RSA mod p×q)
  • Round-robin scheduling

Properties:

  • (a + b) mod n = ((a mod n) + (b mod n)) mod n
  • (a × b) mod n = ((a mod n) × (b mod n)) mod n: simplifies large-number computation

4.5 Combinatorics

Permutations (ordering matters): P(n,k) = n!/(n−k)!
Combinations (ordering doesn’t matter): C(n,k) = n!/(k!(n−k)!)

Used in:

  • Shuffle sharding (C(N, k) combinations)
  • Test case enumeration
  • Hash function analysis

Stirling’s approximation: n! ≈ √(2πn) · (n/e)ⁿ for large n.


5. Information Theory

5.1 Entropy

Information content of a message: H(X) = −Σ p(x) · log₂ p(x)

Unit: bits.

Examples:

  • Fair coin: 1 bit per flip
  • Biased coin (90% heads): 0.47 bits
  • 256 equally likely chars: 8 bits

Property: Maximum entropy when uniform distribution.
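
The examples above, computed directly:

import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))     # 1.0  (fair coin)
print(entropy([0.9, 0.1]))     # ~0.47 (biased coin)
print(entropy([1/256] * 256))  # 8.0  (256 equally likely chars)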

5.2 Compression Theory

Shannon’s source coding theorem: No lossless compressor can do better than entropy.

Why text compresses well: English entropy ~1-2 bits/char vs naive 8 bits/char → 4-8x compression possible.

Why random data doesn’t compress: Already maximum entropy.

5.3 Compression Algorithms

| Algorithm | Type | Used by |
|---|---|---|
| Huffman | Static, optimal for fixed dist | gzip (part) |
| LZ77 / LZ78 | Dictionary, sliding window | zip, gzip |
| DEFLATE | LZ77 + Huffman | gzip, zlib, PNG |
| LZ4 | Speed-optimized LZ77 | High-throughput systems |
| Snappy (Google) | Speed-optimized | Big data |
| Zstd (Facebook) | Modern, balanced | Best general purpose |
| Brotli (Google) | HTTP-optimized | Web |

Performance (typical):

| Algo | Speed | Ratio |
|---|---|---|
| LZ4 | 500 MB/s | 2x |
| Snappy | 250 MB/s | 2x |
| zstd-1 | 200 MB/s | 3x |
| zstd-19 | 5 MB/s | 5x |
| gzip-9 | 30 MB/s | 4x |

5.4 Erasure Coding

Problem: Replication 3x = 200% storage overhead. Can we be more efficient?

Erasure coding (Reed-Solomon):

  • Split data into K chunks
  • Add M parity chunks
  • Need any K of K+M to reconstruct

Example: K=10, M=4 (Reed-Solomon 10+4)

  • Storage overhead: 40% (vs 200% for 3-way replication)
  • Tolerates loss of 4 chunks

Used by: S3 (~1.5x overhead), Hadoop EC, Ceph, MinIO.

Math foundation: Polynomial interpolation in Galois fields.
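
Full Reed-Solomon needs Galois-field arithmetic, but the single-parity special case (K data chunks + 1 parity, tolerating 1 loss) can be sketched with plain XOR:

def xor_parity(chunks):
    """XOR all chunks together (all chunks must be equal length)."""
    out = bytes(len(chunks[0]))
    for c in chunks:
        out = bytes(x ^ y for x, y in zip(out, c))
    return out

data = [b"abcd", b"efgh", b"ijkl"]  # K = 3 data chunks
parity = xor_parity(data)           # M = 1 parity chunk

# Lose chunk 1: XOR the survivors with the parity to rebuild it
recovered = xor_parity([data[0], data[2], parity])
print(recovered == data[1])  # True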

5.5 Hash Functions Theory

Cryptographic hash properties:

  • Preimage resistance: Given h, hard to find x where hash(x) = h
  • Second preimage: Given x, hard to find x’ where hash(x’) = hash(x)
  • Collision resistance: Hard to find any x, x’ where hash(x) = hash(x’)

Birthday attack: Collision in 2^(n/2) tries (not 2^n).

  • SHA-1 (160-bit): 2^80 generic birthday bound, real attacks do far better → broken
  • SHA-256: 2^128 still strong

Non-cryptographic (faster, no security):

  • MurmurHash3, xxHash, CityHash
  • Used for: hash tables, consistent hashing, Bloom filter

6. Calculus (briefly)

6.1 Derivatives — Rate of Change

For f(x), derivative f’(x) = how fast f changes.

Used in:

  • Gradient descent (ML training)
  • Performance optimization (find peak)
  • Capacity planning (growth curves)

6.2 Optimization

Gradient descent: θ ← θ − η · ∇L(θ)

Iteratively move opposite to the gradient → minimum.

ML training = gradient descent over loss function.
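
A toy gradient descent on f(x) = (x − 3)², minimal sketch:

def grad(x):
    return 2 * (x - 3)   # derivative of (x - 3)^2

x, lr = 0.0, 0.1         # start point, learning rate
for _ in range(50):
    x -= lr * grad(x)    # step opposite to the gradient
print(round(x, 4))       # ~3.0, the minimum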

6.3 Big-O & Asymptotic Behavior

| Complexity | Growth |
|---|---|
| O(1) | Constant |
| O(log N) | Logarithmic |
| O(N) | Linear |
| O(N log N) | Sorting |
| O(N²) | Naive pairwise |
| O(N³) | Naive matrix multiply |
| O(2^N) | Exponential (NP-hard) |

Limit notation: Big-O describes behavior as N → ∞.


7. Practical Applications in System Design

7.1 Vector search complexity

Naive cosine search: O(N×D) per query (D = dim).

ANN (HNSW): O(log N) typical, with build cost.

Why important: 1B vectors × 1536 dim naive = infeasible. ANN makes it possible.

7.2 Capacity estimation

Daily users: 100M
Sessions per user: 10
Events per session: 50
Event size: 1 KB

Throughput = 100M × 10 × 50 / 86400s ≈ 580K events/s
Storage/day = 100M × 10 × 50 × 1KB = 50 TB/day

Apply Pareto: Peak = 3-5x average. Apply queueing: Plan for 50-70% utilization.

7.3 Cache sizing

Power law: 20% items → 80% traffic.

Cache top 20% items: 80% hit rate. Cache top 5% items: ~50% hit rate (Zipf distribution).

→ Small cache can be very effective if access pattern skewed.
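
A quick Zipf simulation of hit rate vs cache size (exponent s = 1 assumed; real workloads vary, so the exact numbers differ):

import numpy as np

n_items = 100_000
ranks = np.arange(1, n_items + 1)
weights = 1.0 / ranks               # Zipf (s = 1) popularity
weights /= weights.sum()

for frac in [0.01, 0.05, 0.20]:
    top = int(n_items * frac)
    hit_rate = weights[:top].sum()  # traffic covered by the hottest items
    print(f"cache top {frac:.0%} -> {hit_rate:.0%} hit rate")
# top 1% -> ~62%, top 5% -> ~75%, top 20% -> ~87%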

7.4 SLO & error budgets

99.9% SLO = 0.1% errors allowed = 43.2 minutes/month.

Error budget: 1 − SLO = allowed unreliability (here 0.1%, ≈ 43.2 min/month).

Burn rate alerting: Alert when consuming budget too fast.

  • 1-hour fast burn: consuming at 14.4× rate → 30-day budget exhausted in ~2 days
  • 6-hour slow burn: consuming at 2.4× rate → exhausted in ~12.5 days

7.5 Reliability math

Series: System works only if all components work. R = R₁ × R₂ × … × Rₙ.

10 services × 99.9% each = 99% overall (87.6 hours/year downtime!)

Parallel: System works if any component works. R = 1 − (1−R₁)(1−R₂)…(1−Rₙ).

3 replicas × 99% each = 99.9999% overall.

Implication: Avoid serial dependencies, embrace redundancy.

7.6 N-modular redundancy

For consensus: need a majority quorum. N = 2f + 1 nodes tolerate f failures:

  • 3 nodes tolerate 1 failure
  • 5 nodes tolerate 2 failures
  • 7 nodes tolerate 3 failures

→ Why Raft/Paxos use odd N. Reference: Tuan-Bonus-Consensus-Raft-Paxos.


8. Code Examples

8.1 Cosine similarity

import numpy as np
 
def cosine_similarity(a, b):
    """Range [-1, 1], 1 = identical direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
 
 
# Batch: 1 query vs N docs
def batch_cosine(query, docs):
    """docs: (N, D), query: (D,)"""
    norm_docs = np.linalg.norm(docs, axis=1)
    norm_q = np.linalg.norm(query)
    return (docs @ query) / (norm_docs * norm_q)
 
 
# Example: semantic search
docs = np.random.rand(10000, 768)
query = np.random.rand(768)
scores = batch_cosine(query, docs)
top10 = np.argsort(scores)[-10:][::-1]

8.2 Bloom filter

import mmh3
import math
 
class BloomFilter:
    def __init__(self, n_items, fp_rate):
        self.size = int(-(n_items * math.log(fp_rate)) / (math.log(2) ** 2))
        self.k = int((self.size / n_items) * math.log(2))
        self.bits = bytearray((self.size + 7) // 8)
 
    def _hashes(self, item):
        h1 = mmh3.hash(item, 0)
        h2 = mmh3.hash(item, 1)
        return [(h1 + i * h2) % self.size for i in range(self.k)]
 
    def add(self, item):
        for h in self._hashes(item):
            self.bits[h // 8] |= (1 << (h % 8))
 
    def __contains__(self, item):
        return all(
            self.bits[h // 8] & (1 << (h % 8))
            for h in self._hashes(item)
        )
 
 
bf = BloomFilter(n_items=10000, fp_rate=0.01)
bf.add("hello")
print("hello" in bf)  # True
print("world" in bf)  # False (probably)

8.3 Sample size for A/B test

import math
from scipy import stats
 
def sample_size(baseline, mde, alpha=0.05, power=0.8):
    """
    baseline: baseline conversion rate (e.g., 0.10)
    mde: minimum detectable effect (e.g., 0.01 for 1% absolute lift)
    alpha: significance level
    power: 1 - beta (typically 0.8)
    """
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
 
    p1 = baseline
    p2 = baseline + mde
    p_avg = (p1 + p2) / 2
 
    n = ((z_alpha * math.sqrt(2 * p_avg * (1 - p_avg))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (mde ** 2)
 
    return math.ceil(n)
 
 
# Need to detect 1% lift on 10% baseline
n = sample_size(baseline=0.10, mde=0.01)
print(f"Need {n} samples per arm")
# Need 14751 samples per arm

8.4 Reliability calculator

def reliability_series(*reliabilities):
    """Reliability of system requiring all components."""
    r = 1.0
    for x in reliabilities:
        r *= x
    return r
 
 
def reliability_parallel(*reliabilities):
    """Reliability of system requiring any component."""
    failure_prob = 1.0
    for x in reliabilities:
        failure_prob *= (1 - x)
    return 1 - failure_prob
 
 
# 10 microservices in series, each 99.9%
print(f"Series 10×99.9%: {reliability_series(*[0.999]*10) * 100:.3f}%")
# 99.004%
 
# 3 replicas in parallel, each 99%
print(f"Parallel 3×99%: {reliability_parallel(*[0.99]*3) * 100:.4f}%")
# 99.9999%

8.5 HyperLogLog (simplified)

import mmh3
 
class HyperLogLog:
    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p  # 16384 registers
        self.registers = [0] * self.m
 
    def add(self, item):
        h = mmh3.hash(item) & 0xFFFFFFFF
        idx = h >> (32 - self.p)           # top p bits pick the register
        w = (h << self.p) & 0xFFFFFFFF     # remaining bits, kept 32-bit wide
        rho = self._leading_zeros(w) + 1   # position of first 1-bit
        self.registers[idx] = max(self.registers[idx], rho)
 
    def _leading_zeros(self, x):
        if x == 0: return 32
        z = 0
        while x & 0x80000000 == 0:
            z += 1
            x <<= 1
        return z
 
    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        Z = sum(2 ** -r for r in self.registers)
        return int(alpha * self.m * self.m / Z)
 
 
hll = HyperLogLog()
for i in range(100000):
    hll.add(f"user_{i}")
print(f"Estimated: {hll.count()}")  # ~100K, ~1-2% error

9. System Design Diagrams

9.1 Vector Similarity

flowchart LR
    Query[Query embedding<br/>768-dim vector]
    DB[(Vector DB<br/>1M embeddings)]

    Query --> Compute[Compute cosine<br/>similarity]
    DB --> Compute

    Compute --> Sort[Sort by similarity]
    Sort --> TopK[Top-K results]

    Note["Naive: O(N·D)<br/>HNSW ANN: O(log N)"]

    style Note fill:#fff9c4

9.2 Bloom Filter

flowchart TB
    Item["Item: 'apple'"]

    Item --> H1[Hash 1: 312]
    Item --> H2[Hash 2: 1004]
    Item --> H3[Hash 3: 89]

    H1 --> BitArray
    H2 --> BitArray
    H3 --> BitArray

    subgraph BitArray["Bit array (size m=2048)"]
        Bits["...0010101...01010100...01010..."]
    end

    Lookup{"contains('apple')?"}
    Lookup -->|Check H1| Bits
    Lookup -->|Check H2| Bits
    Lookup -->|Check H3| Bits

    Result["All bits set → 'maybe in'<br/>Any bit unset → 'definitely not'"]

9.3 Reliability — Series vs Parallel

flowchart LR
    subgraph Series["Series: ALL must work"]
        S1[99.9%] --> S2[99.9%] --> S3[99.9%]
        SR[Result: 99.7%]
    end

    subgraph Parallel["Parallel: ANY must work"]
        P1[99%]
        P2[99%]
        P3[99%]
        PR[Result: 99.9999%]
    end

    style Series fill:#ffcdd2
    style Parallel fill:#c8e6c9

9.4 Universal Scalability Law

flowchart LR
    Throughput[Throughput]
    Cores[Cores N]

    subgraph USL["USL Curve"]
        Linear[Linear scaling<br/>ideal]
        Knee[Contention knee<br/>α causes saturation]
        Decline[Coherence decline<br/>β causes drop]
    end

    Cores --> Linear --> Knee --> Decline

    Note["X(N) = λN / (1 + α(N-1) + βN(N-1))<br/>Adding cores beyond knee<br/>= performance regression"]

    style Note fill:#fff9c4

10. Aha Moments & Pitfalls

Aha Moments

#1: Vectors + cosine = semantic search. Foundation of all RAG. Direction matters, not magnitude.

#2: Birthday paradox is everywhere. UUID collision, hash collision, all 2^(n/2) tries. SHA-1 → SHA-256 because of this.

#3: Bloom filter math is elegant. Trade memory for false positive rate. Cassandra, RocksDB use it.

#4: Lattice properties = CRDT properties. Math foundation: commutative, associative, idempotent. Same 3 things.

#5: 80/20 rule for design. Power law everywhere. Cache top 20%, optimize hot path.

#6: Series multiplies reliabilities, parallel multiplies failure probabilities (1−R). 10 microservices × 99.9% = 99% (catastrophic). Embrace parallelism.

#7: Universal Scalability Law predicts decline. More cores → eventually slower. Profile, don’t assume.

#8: Information theory bounds compression. Random data can’t compress. English text compresses 4-8x.

Pitfalls

Pitfall 1: Forget normalize before cosine

Cosine similarity assumes the norms are divided out. Skip normalization (or use a raw dot product) → magnitude leaks into the score. Fix: normalize embeddings once at index time, or use the cosine_similarity formula correctly.

Pitfall 2: 32-bit hash for big data

32-bit hash: expect a birthday collision after ~2^16 ≈ 65K items. UUID v1 can also collide (clock regressions). Fix: 128-bit (UUID v4) for uniqueness, 256-bit (SHA-256) for security.

Pitfall 3: Peeking in A/B test

Stop test early when it “looks significant” → false positive rate explodes. Fix: Predetermine sample size, or use a proper sequential testing procedure (e.g., SPRT, always-valid p-values).

Pitfall 4: Ignore queueing theory

“100 servers handle 100 req/s” → utilization 1 → infinite queue. Fix: Plan 50-70% utilization. Add capacity for burst.

Pitfall 5: Forget Big-O for big N

O(N²) fine for N=1000, dies at N=1M. Fix: Profile with realistic data sizes.

Pitfall 6: Linear scaling assumption

“10 cores = 10x throughput” → ignore Amdahl’s law, USL. Fix: Profile parallel speedup, identify contention points.

Pitfall 7: Misuse of Bloom filter

Used for “exists” check but app expects definitive yes/no. Fix: Bloom filter = “definitely not” or “probably yes”. Use for negative case (skip lookup).

Pitfall 8: Wrong distribution model

Assume normal distribution for latency → P99 way off. Fix: Latency often log-normal or Pareto. Use percentiles, not mean.

Pitfall 9: Sampling bias

A/B test on logged-in users only → bias. Fix: Random assignment at registration / first visit.

Pitfall 10: Compressing already-compressed data

Apply gzip on JPEG → minimal gain, CPU waste. Fix: Know data type. Random or compressed data ≈ entropy max.


| Topic | Connects to |
|---|---|
| Tuan-Bonus-LLM-Serving-Infrastructure | Linear algebra, attention math, FLOPS |
| Tuan-Bonus-CRDTs-Conflict-Free-Data-Types | Lattice theory, abstract algebra |
| Tuan-Bonus-Progressive-Delivery | A/B testing, sample size |
| Tuan-Bonus-Consistency-Models-Isolation | Probability of conflicts |
| Tuan-02-Back-of-the-envelope | Estimation math |
| Tuan-10-Consistent-Hashing | Modular arithmetic |
| Case-Design-S3-Object-Storage | Erasure coding (Galois fields) |
| Case-Design-Search-Autocomplete | Probability, frequency analysis |

References

Courses:

  • MIT 18.06 Linear Algebra (Strang)
  • MIT 6.041 Probability
  • Stanford CS229 Machine Learning math reviews

Phase B complete (5 Foundations files). Next: Phase C — update References.md.