Foundations: Compilers, Virtual Machines & Memory Models

“Cloudflare Workers V8 isolate cold start under a millisecond? That is V8 (the JS engine) spinning up an isolate in microseconds because it compiles JIT-on-demand. Wasm Component Model? That is bytecode plus a capability sandbox. Java GC pauses killing production? That is generational stop-the-world GC. Rust's borrow checker? That is a static type system catching data races at compile time. An architect who doesn't understand PL/compiler theory doesn't understand the runtime characteristics of the systems they build.”

Tags: cs-foundations compilers virtual-machines garbage-collection fundamentals
Student: Hieu (Backend Dev → Architect)
Related: Tuan-Bonus-Edge-Wasm-Architecture · Tuan-Foundations-OS-Essentials · Tuan-Foundations-Computer-Architecture


1. Context & Why

Why do you need to understand Compilers & VMs?

| Architecture decision | PL/Compiler concept |
| --- | --- |
| Wasm Component Model | Bytecode + capability sandbox |
| V8 isolate cold start | JIT compilation, snapshots |
| Java/Go GC pauses | Generational GC, stop-the-world |
| Rust: no GC, fast | Static analysis, ownership |
| Python GIL bottleneck | Interpreter design |
| Erlang/BEAM hot reload | VM-level deployment |
| WebAssembly polyglot | Common bytecode target |
| Kubernetes operators in Rust | Memory safety + performance |

Key insight: choosing a language is a trade-off between performance, safety, and productivity. Understanding compiler internals lets you make the right decision for each service.

Primary references

  • Crafting Interpreters (Bob Nystrom, free) — https://craftinginterpreters.com/
  • Compilers: Principles, Techniques, and Tools (Dragon Book — Aho et al.)
  • Programming Language Pragmatics (Scott)
  • The Garbage Collection Handbook (Jones, Hosking, Moss)
  • Engineering a Compiler (Cooper & Torczon)

2. Deep Dive — Compilation Pipeline

2.1 Phases of Compilation

Source code
    │
    ▼
┌───────────────────────┐
│  Lexer (Tokenizer)    │  Source → Tokens
└──────────┬────────────┘
           ▼
┌───────────────────────┐
│  Parser               │  Tokens → AST
└──────────┬────────────┘
           ▼
┌───────────────────────┐
│  Semantic Analysis    │  Type checking, scope
└──────────┬────────────┘
           ▼
┌───────────────────────┐
│  IR (Intermediate     │  Architecture-independent
│  Representation)      │
└──────────┬────────────┘
           ▼
┌───────────────────────┐
│  Optimization         │  Constant folding, dead code,
│                       │  inlining, vectorization
└──────────┬────────────┘
           ▼
┌───────────────────────┐
│  Code Generation      │  IR → target machine code
└──────────┬────────────┘
           ▼
       Machine code
       (or Bytecode)

2.1.1 Lexer

Convert characters to tokens:

Input:  "let x = 42;"
Tokens: [LET, IDENT(x), EQUALS, NUMBER(42), SEMI]

2.1.2 Parser

Tokens to AST:

let x = 42 + y * 2;

       Assign
       /    \
      x      Add
            /    \
          42     Mul
                /   \
               y     2

Two families: top-down (recursive descent, typically hand-written — see the parser in 6.2) vs bottom-up (LR, LALR, typically generator-produced).

2.1.3 Type checking

let x: int = "hello";  // Error: string not assignable to int

Static type system: Catch errors at compile time. Type inference: Infer types from usage (Rust, Haskell, TypeScript).

2.2 Compilation Strategies

2.2.1 Ahead-of-Time (AOT)

Compile entire program before run.

gcc main.c -o main    # AOT compile
./main                 # Execute machine code directly

Examples: C, C++, Rust, Go (default), Swift.

Pros:

  • Fast startup (no compile at runtime)
  • Predictable performance
  • Whole-program optimization

Cons:

  • Slow build
  • Less runtime adaptability

2.2.2 Just-In-Time (JIT)

Compile during execution (typically in VM).

JS source
    ↓
V8 parses to AST
    ↓
Ignition (interpreter): runs bytecode
    ↓
TurboFan (JIT): hot functions → optimized machine code
    ↓
Deoptimize if assumptions wrong → back to interpreter

Examples: V8 (JS), JVM (HotSpot), .NET CLR, PyPy.

Pros:

  • Adaptive optimization (profile-guided)
  • Fast startup (interpret first)
  • Reoptimize based on runtime info

Cons:

  • Warmup time before peak perf
  • More memory (compiler in process)
  • Complexity

2.2.3 Interpretation

Execute AST or bytecode directly, no machine code.

# CPython:
source.py → bytecode (.pyc) → interpreter executes

Examples: CPython, Ruby (MRI), Bash, Lua.

Pros: Simplest, portable. Cons: 10-100x slower than compiled.
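
You can inspect CPython's bytecode directly with the dis module (opcode names vary across versions; BINARY_OP replaced BINARY_ADD in 3.11):

import dis

def add(a, b):
    return a + b

dis.dis(add)
# Typical output shape:
#   LOAD_FAST    a
#   LOAD_FAST    b
#   BINARY_OP    + (BINARY_ADD before 3.11)
#   RETURN_VALUE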

2.2.4 AOT + JIT (hybrid)

  • Java: javac AOT-compiles source to JVM bytecode; HotSpot JITs it to native at runtime
  • C# / .NET: source AOT-compiles to IL; the CLR JITs it, or Native AOT compiles fully ahead of time
  • Wasm: source languages AOT-compile to Wasm bytecode; the browser/runtime then AOT-compiles or JITs that

2.3 Type Systems

2.3.1 Static vs Dynamic

| Aspect | Static (Java, Rust, Go) | Dynamic (Python, JS, Ruby) |
| --- | --- | --- |
| Type check | Compile time | Runtime |
| Errors caught | Early | Late |
| Refactoring safety | High | Low |
| Initial productivity | Lower | Higher |
| Productivity at scale | Higher | Lower |
| Performance | Higher | Lower |

2.3.2 Strong vs Weak

Strong (Python, Rust): no silent cross-type coercion — Python rejects "3" + 4. Weak (JS, PHP): coerce freely — "3" + 4 is "34" in JS. (Java is strongly typed overall, yet it too coerces in string concatenation: "3" + 4 yields "34".)
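
A quick REPL check of the strong-typing side in Python:

>>> "3" + 4
Traceback (most recent call last):
  ...
TypeError: can only concatenate str (not "int") to str
>>> "3" + str(4)   # the conversion must be explicit
'34'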

2.3.3 Nominal vs Structural

Nominal (Java, C#): Same name = same type.

class Point { int x, y; }
class Coord { int x, y; }  // Different from Point despite same shape

Structural (TypeScript, Go interfaces): Same shape = same type.

interface Point { x: number; y: number; }
interface Coord { x: number; y: number; }
// Compatible, can assign one to other

2.3.4 Type inference

let x = 42;          // Inferred: i32
let v = vec![1, 2];  // Inferred: Vec<i32>
 
fn add(a: i32, b: i32) -> i32 {
    a + b
}
let result = add(1, 2);  // Inferred: i32

Hindley-Milner: Foundational algorithm, used in Haskell, ML, OCaml.

2.4 Memory Management

3 main approaches: manual, garbage collected, ownership-based.

2.4.1 Manual

void* p = malloc(100);
// ... use p
free(p);  // Programmer responsibility

Pros: predictable, no GC overhead. Cons: memory leaks, use-after-free, double-free, buffer overflows.

2.4.2 Garbage Collection

VM tracks references, frees unreachable objects.

List<String> list = new ArrayList<>();  // Allocated on heap
list = null;  // Not freed yet, GC will collect

3 main GC algorithms:

Mark-Sweep
  1. Mark: Walk from roots (stack, globals), mark reachable objects
  2. Sweep: Free unmarked

Pros: simple. Cons: stop-the-world pauses, heap fragmentation.
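
A toy mark-sweep pass in Python — a sketch over an explicit object graph, not a real collector (real GCs walk stack roots and raw heap words):

class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing references
        self.marked = False

def mark(obj):
    if not obj.marked:        # 1. Mark: walk the graph from the roots
        obj.marked = True
        for ref in obj.refs:
            mark(ref)

def collect(heap, roots):
    for root in roots:
        mark(root)
    live = [o for o in heap if o.marked]   # 2. Sweep: drop unmarked objects
    for o in live:
        o.marked = False      # reset mark bits for the next cycle
    return live

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)              # a -> b; c is unreachable garbage
print([o.name for o in collect([a, b, c], roots=[a])])  # ['a', 'b']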

Mark-Compact

Mark-sweep + compact reachable objects to one end.

Pros: no fragmentation, better locality. Cons: even slower (extra pass to move objects and fix up references).

Generational

Insight: Most objects die young.

Young Generation (Eden + Survivor)
  ↓ (objects survive several GCs)
Old Generation (long-lived)
  • Minor GC: Young gen only, frequent, fast
  • Major GC (Full GC): All gens, infrequent, slow

Pros: Fast for typical workload. Cons: Tuning complexity.

2.4.3 Modern GCs

G1 GC (Java 9+ default)
  • Region-based heap
  • Concurrent marking
  • Pauses: 10-200ms typical
ZGC (Java 11+, low-pause)
  • Concurrent everything
  • Pauses < 10ms even for huge heaps (TBs)
  • Trade-off: higher CPU/memory overhead
Shenandoah (Red Hat)
  • Similar to ZGC: concurrent, low-pause

Go GC (concurrent, mark-sweep)
  • Designed for low-latency
  • Concurrent marking
  • Sub-millisecond pauses (modern Go)
Python (reference counting + cycle detection)
  • Each object has refcount
  • When refcount → 0, free immediately
  • Periodic cycle detection for cycles
  • GIL prevents concurrent modifications
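
You can watch CPython's refcounting live with sys.getrefcount (the count includes the temporary reference created by the call itself):

import sys

x = []
print(sys.getrefcount(x))  # 2: the local x plus the call's temporary reference
y = x
print(sys.getrefcount(x))  # 3
del y
print(sys.getrefcount(x))  # 2 again; when it hits 0 the list is freed at once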

2.4.4 Ownership (Rust)

Compile-time memory management without GC.

fn main() {
    let s = String::from("hello");  // s owns
    take_ownership(s);
    // println!("{}", s);  // Error: s moved
}
 
fn take_ownership(s: String) {
    println!("{}", s);
    // s freed at end of scope
}

Borrow checker: Ensures no use-after-free, no data race, all at compile time.

Trade-off: Steeper learning curve, but no runtime overhead.

2.5 Bytecode & Virtual Machines

2.5.1 What is a VM?

Software-emulated machine. Executes bytecode instead of native code.

Source → Compile to bytecode → VM executes bytecode (interpret/JIT)

Examples:

  • JVM (Java, Kotlin, Scala, Clojure)
  • CLR (C#, F#, VB.NET)
  • V8 (JavaScript)
  • Wasm runtime (Wasmtime, V8, etc.)
  • CPython VM (Python)
  • Erlang BEAM (Erlang, Elixir)

2.5.2 Why VMs?

  • Portability: Same bytecode → run on any platform with VM
  • Safety: VM enforces type/memory checks
  • Optimization: Profile-guided JIT
  • Hot reload: Replace bytecode at runtime
  • Sandbox: Limit what code can do

2.5.3 JVM bytecode example

public int add(int a, int b) {
    return a + b;
}

Compiled bytecode:

iload_1     ; push a
iload_2     ; push b
iadd        ; add ints, push result
ireturn     ; return

Stack-based VM: Operations pop operands from stack.

2.5.4 Wasm bytecode

(module
  (func $add (param $a i32) (param $b i32) (result i32)
    local.get $a
    local.get $b
    i32.add)
  (export "add" (func $add)))

Wasm = stack-based bytecode + structured control flow + linear memory + types.

Designed for: Fast load, fast verify, fast execute.

2.5.5 V8 architecture (browser + Node.js + Cloudflare Workers)

JS source
    ↓
Parser → AST
    ↓
Ignition (interpreter): bytecode
    ↓ profile hot functions
TurboFan (optimizing JIT): optimized machine code
    ↓ deoptimize on bad assumption
Back to Ignition

Isolates: lightweight V8 instances. Cloudflare Workers run each tenant's script in its own isolate (reused across requests) and share a heap snapshot so creating one is cheap.

Cold start: V8 isolate creation ~5ms vs 100-1000ms for container.

2.6 JIT Optimizations

JIT compilers do remarkable things at runtime:

2.6.1 Inlining

function square(x) { return x * x; }
 
for (let i = 0; i < 1000; i++) {
    sum += square(i);
}
 
// JIT inlines:
for (let i = 0; i < 1000; i++) {
    sum += i * i;
}

2.6.2 Type specialization

function add(a, b) { return a + b; }
 
// JIT sees always called with int
// → Generate int-specialized version (fast)
// If suddenly called with string → deoptimize
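
A toy model of that feedback loop in Python (illustrative only; real engines track hidden classes/shapes and emit machine-level specializations):

def profiled(fn):
    seen = set()                     # observed (type, type) pairs at this call site
    def wrapper(a, b):
        seen.add((type(a), type(b)))
        if len(seen) > 1:
            print("call site went polymorphic -> deoptimize")
        return fn(a, b)
    return wrapper

add = profiled(lambda a, b: a + b)
add(1, 2)        # monomorphic (int, int): eligible for a specialized fast path
add("x", "y")    # new type pair: a real JIT would deoptimize here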

2.6.3 Escape analysis

Allocate on stack instead of heap if object doesn’t escape function.

public void foo() {
    Point p = new Point(1, 2);  // JIT may allocate on stack
    System.out.println(p.x + p.y);
    // p doesn't escape
}

2.6.4 Loop optimizations

  • Loop unrolling
  • Loop invariant code motion
  • Vectorization (SIMD)

2.7 Memory Model & Concurrency

A memory model is the set of rules for when one thread sees memory writes made by other threads.

2.7.1 The problem

// Thread 1
x = 1;
y = 2;
 
// Thread 2
print(y);  // may print 2 ...
print(x);  // ... while this still prints 0 — the writes were reordered

Without memory model: Compiler/CPU can reorder writes for performance.

2.7.2 Memory ordering

Java (since 1.5), C++11, Rust define memory models:

| Ordering | Guarantee |
| --- | --- |
| Relaxed | No ordering — fastest |
| Acquire/Release | Pairwise synchronization |
| Sequential consistency | Total global order — slowest |

use std::sync::atomic::{AtomicI32, Ordering};
 
let x = AtomicI32::new(0);
x.store(1, Ordering::Release);  // Pairs with Acquire load elsewhere
let v = x.load(Ordering::Acquire);

2.7.3 Atomic operations

let counter = AtomicU64::new(0);
counter.fetch_add(1, Ordering::Relaxed);  // Hardware atomic

Hardware support: x86 LOCK prefix, ARM LDXR/STXR, etc.

2.7.4 Lock-free data structures

// Treiber stack push: lock-free via a CAS retry loop
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering::{Acquire, Relaxed, Release}};

struct Node<T> { value: T, next: *mut Node<T> }
struct Stack<T> { head: AtomicPtr<Node<T>> }

impl<T> Stack<T> {
    fn push(&self, value: T) {
        // Leak the box into a raw pointer; the stack owns it from here
        let new_node = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
        loop {
            let old_head = self.head.load(Acquire);
            unsafe { (*new_node).next = old_head; }
            // CAS: publish new_node only if head is still old_head
            if self.head.compare_exchange(old_head, new_node, Release, Relaxed).is_ok() {
                return;
            }
        }
    }
}

CAS (Compare-And-Swap) is the foundation; hardware provides it as a single atomic instruction.
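
To make the retry loop concrete without Rust, here is a semantic model of CAS in Python (the lock only simulates the atomicity that a hardware CAS gives in one instruction):

import threading

class AtomicRef:
    """Models CAS semantics; real CAS is a single hardware instruction."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def load(self):
        return self._value

    def compare_exchange(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True             # swap succeeded
            return False                # someone else won; caller retries

head = AtomicRef()

def push(value):
    node = {"value": value, "next": None}
    while True:                         # the Treiber retry loop
        old = head.load()
        node["next"] = old
        if head.compare_exchange(old, node):
            return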

2.7.5 Common pitfalls

  • Word tearing: a 64-bit value updated non-atomically on a 32-bit platform
  • Volatile (Java) ≠ atomic: guarantees visibility and ordering for that one field, but compound operations (count++) remain non-atomic
  • Java synchronized keyword: implicit memory barrier + mutex

2.8 Concurrency Models

2.8.1 Threads + locks (traditional)

synchronized void deposit(int amount) {
    balance += amount;
}

Pros: direct, familiar. Cons: deadlocks, race conditions, hard to reason about.

2.8.2 Actor model (Erlang, Akka)

% Each actor = process with mailbox
loop(State) ->
    receive
        {deposit, Amount} ->
            loop(State + Amount);
        {balance, From} ->
            From ! State,
            loop(State)
    end.

Pros: no shared state, isolated failures. Cons: a different mental model.
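
The same mailbox idea in Python — a minimal sketch using a thread plus a queue as the mailbox (Python 3.10+ for match):

import threading, queue

def account_actor(mailbox):
    balance = 0                      # private state: only this actor touches it
    while True:
        match mailbox.get():
            case ("deposit", amount):
                balance += amount
            case ("balance", reply_to):
                reply_to.put(balance)
            case "stop":
                return

mailbox = queue.Queue()
threading.Thread(target=account_actor, args=(mailbox,), daemon=True).start()
mailbox.put(("deposit", 100))
reply = queue.Queue()
mailbox.put(("balance", reply))
print(reply.get())                   # 100
mailbox.put("stop")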

2.8.3 CSP — Communicating Sequential Processes (Go)

ch := make(chan int)
go func() {
    ch <- 42
}()
result := <-ch

Channels for communication, goroutines for concurrency.

2.8.4 Async/await (modern)

async def fetch_user(user_id):
    # http_get stands in for an async HTTP client call (e.g. aiohttp); not defined here
    response = await http_get(f"/users/{user_id}")
    return response.json()

Cooperative scheduling: Function explicitly yields control.
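
A runnable sketch with asyncio: two coroutines share one thread and interleave only at their await points.

import asyncio

async def worker(name, delay):
    for i in range(3):
        await asyncio.sleep(delay)   # explicit yield point: the scheduler may switch here
        print(name, i)

async def main():
    # Both run concurrently on a single thread
    await asyncio.gather(worker("a", 0.10), worker("b", 0.15))

asyncio.run(main())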

2.8.5 STM — Software Transactional Memory (Clojure, Haskell)

atomically $ do
    bal <- readTVar balance
    writeTVar balance (bal + 100)

Optimistic concurrency: Transactions retry on conflict.

2.9 Sandboxing — Security via Compiler/VM

2.9.1 Levels of sandboxing

| Level | Example | Cost |
| --- | --- | --- |
| OS process | Container | Heavy (~10MB) |
| VM (gVisor) | Cloud Functions | Medium |
| Bytecode VM (Wasm) | Workers | Light (~1ms cold) |
| Language VM (V8 isolate) | Workers | Light |
| Native sandbox (seccomp) | Container | Very light |

2.9.2 Wasm sandbox

  • No direct syscalls
  • Limited memory (linear memory)
  • No network/file by default
  • Capability-based: import what you need

2.9.3 Capability security

Code can only do what’s explicitly granted:

Component imports:
  - wasi:filesystem  (file operations)
  - wasi:http        (HTTP calls)

Code cannot:
  - Spawn process
  - Access network outside http
  - Read files outside provided dirs

Different from “trusted code” — by construction, not by trust.

2.10 Compiler Optimizations

2.10.1 Constant folding

int x = 1 + 2 + 3;
// Compile time: x = 6
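
CPython performs the same fold; a quick check with the dis module (exact opcodes vary by version):

import dis

dis.dis(compile("x = 1 + 2 + 3", "<demo>", "exec"))
# The emitted bytecode loads the pre-folded constant:
#   LOAD_CONST   6
#   STORE_NAME   x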

2.10.2 Dead code elimination

if (false) {
    expensive_call();  // Removed
}

2.10.3 Common subexpression elimination

y = a + b + c;
z = a + b + d;
// → tmp = a + b; y = tmp + c; z = tmp + d;

2.10.4 Loop unrolling

for (int i = 0; i < 100; i++) sum += arr[i];
// →
for (int i = 0; i < 100; i += 4) {
    sum += arr[i] + arr[i+1] + arr[i+2] + arr[i+3];
}

2.10.5 Inlining

Replace function call with body.

2.10.6 Auto-vectorization

Compiler emits SIMD instructions automatically.

for (int i = 0; i < N; i++) c[i] = a[i] + b[i];
// Compiler emits: SIMD vectorized add (4-8 elements at once)

3. Practical Implications

3.1 Choosing language for backend

| Need | Recommended |
| --- | --- |
| Maximum performance, no GC | Rust, C++ |
| High throughput, simple | Go |
| Mature ecosystem, JVM | Java, Kotlin |
| Productivity, adequate perf | Python, Ruby (small services) |
| Browser/JS reuse | TypeScript / Node |
| Real-time, fault-tolerant | Erlang, Elixir |
| ML/data science | Python |
| Data engineering | Scala, Java (Spark) |

3.2 GC tuning matters

Java GC tuning can change throughput 2-10x:

# Throughput (G1 GC, default)
java -XX:+UseG1GC -Xmx4g app.jar
 
# Low latency (ZGC)
java -XX:+UseZGC -Xmx16g app.jar
 
# Heap sizing
-Xms2g -Xmx2g  # Same min/max → no resizing
 
# Logging
-Xlog:gc*:file=gc.log

Common settings:

  • -XX:MaxGCPauseMillis=200 (G1 target)
  • -XX:+ParallelRefProcEnabled
  • -XX:+AlwaysPreTouch (touch/commit heap pages upfront)

3.3 V8 isolate cold start

Why Workers fast:

  1. V8 process pre-started, heap snapshot shared
  2. First request for a script: spawn its isolate (~1ms)
  3. Run code (~ms)
  4. Isolate reused for later requests, evicted when idle

Compare:

  • AWS Lambda: 100-1000ms cold start (container init)
  • Lambda@Edge: 100-300ms
  • Workers: 5ms

3.4 Rust adoption in infrastructure

Why Rust shows up everywhere in infrastructure since ~2020:

  • Memory safety without GC
  • Performance ≈ C++
  • Zero-cost abstractions
  • Data races ruled out at compile time
  • Modern type system

Used by:

  • TiKV (distributed KV store)
  • Cloudflare Pingora (HTTP proxy framework that replaced Nginx at Cloudflare)
  • Firecracker (microVM monitor behind AWS Lambda)
  • Wasmtime (Wasm runtime)
  • Many sidecars/proxies (e.g. Linkerd2-proxy)

4. Performance & Profiling

4.1 JVM profiling

# Java Flight Recorder (free since Java 11)
java -XX:StartFlightRecording=duration=60s,filename=profile.jfr ...
 
# Async profiler (lower overhead)
./profiler.sh -d 30 -e cpu -f flame.html <pid>

4.2 Go profiling

import _ "net/http/pprof"
 
go func() {
    log.Println(http.ListenAndServe(":6060", nil))
}()
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

4.3 V8 profiling

node --prof app.js
# Generates isolate-*.log
 
node --prof-process isolate-*.log > processed.txt

4.4 Common metrics

  • CPU samples: Where is time spent?
  • Allocation rate: Bytes/sec allocated
  • GC pauses: Distribution of pauses
  • Heap fragmentation: Used vs reserved

5. Practical Code Patterns

5.1 Avoid GC pressure

Allocate less, reuse more:

// BAD: allocate per call
void process(Request req) {
    List<String> tags = new ArrayList<>();
    // ...
}
 
// GOOD: reuse via thread-local
ThreadLocal<List<String>> tagsTL = ThreadLocal.withInitial(ArrayList::new);
 
void process(Request req) {
    List<String> tags = tagsTL.get();
    tags.clear();
    // ...
}

5.2 Lock-free patterns

// Bad: blocking
private int count = 0;
synchronized void increment() { count++; }
 
// Good: atomic
private AtomicInteger count = new AtomicInteger();
void increment() { count.incrementAndGet(); }

5.3 Avoid boxing

// BAD: boxes on every call
Map<Integer, Integer> map = new HashMap<>();
map.put(1, 2);  // Integer boxing
 
// GOOD: primitive collections (Eclipse Collections, Trove)
MutableIntIntMap map = new IntIntHashMap();
map.put(1, 2);  // No boxing

6. Code Examples

6.1 Simple lexer in Python

import re
 
TOKEN_SPEC = [
    ('NUMBER',   r'\d+'),
    ('IDENT',    r'[a-zA-Z_]\w*'),
    ('PLUS',     r'\+'),
    ('MINUS',    r'-'),
    ('TIMES',    r'\*'),
    ('LPAREN',   r'\('),
    ('RPAREN',   r'\)'),
    ('SKIP',     r'\s+'),
]
 
def tokenize(text):
    pattern = '|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC)
    for match in re.finditer(pattern, text):
        kind = match.lastgroup
        value = match.group()
        if kind != 'SKIP':
            yield (kind, value)
 
 
tokens = list(tokenize("3 + (4 * 2)"))
print(tokens)
# [('NUMBER', '3'), ('PLUS', '+'), ('LPAREN', '('),
#  ('NUMBER', '4'), ('TIMES', '*'), ('NUMBER', '2'), ('RPAREN', ')')]

6.2 Recursive descent parser

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0
 
    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None
 
    def consume(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok
 
    def parse_expr(self):
        left = self.parse_term()
        while self.peek() and self.peek()[0] in ('PLUS', 'MINUS'):
            op = self.consume()
            right = self.parse_term()
            left = ('binop', op[0], left, right)
        return left
 
    def parse_term(self):
        # Handles '*' so multiplication binds tighter than '+'/'-'
        left = self.parse_atom()
        while self.peek() and self.peek()[0] == 'TIMES':
            op = self.consume()
            right = self.parse_atom()
            left = ('binop', op[0], left, right)
        return left
 
    def parse_atom(self):
        tok = self.consume()
        if tok[0] == 'NUMBER':
            return ('num', int(tok[1]))
        elif tok[0] == 'LPAREN':
            expr = self.parse_expr()
            self.consume()  # RPAREN
            return expr
 
 
tokens = tokenize("3 + 4 + 5")
ast = Parser(tokens).parse_expr()
print(ast)
# ('binop', 'PLUS', ('binop', 'PLUS', ('num', 3), ('num', 4)), ('num', 5))

6.3 Stack-based VM

class VM:
    def __init__(self):
        self.stack = []
 
    def execute(self, bytecode):
        for op in bytecode:
            if op[0] == 'PUSH':
                self.stack.append(op[1])
            elif op[0] == 'ADD':
                b = self.stack.pop()
                a = self.stack.pop()
                self.stack.append(a + b)
            elif op[0] == 'PRINT':
                print(self.stack.pop())
 
 
# Bytecode for: print(3 + 4)
program = [
    ('PUSH', 3),
    ('PUSH', 4),
    ('ADD',),
    ('PRINT',),
]
VM().execute(program)  # 7
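
Closing the loop between 6.2 and 6.3 — a minimal sketch of a compiler from the parser's AST to the VM's bytecode (handles PLUS only, since the toy VM only implements ADD):

def compile_ast(node):
    """Post-order walk: operands first, then the operator."""
    if node[0] == 'num':
        return [('PUSH', node[1])]
    if node[0] == 'binop':
        _, op, left, right = node
        return compile_ast(left) + compile_ast(right) + [('ADD',)]

ast = ('binop', 'PLUS', ('num', 3), ('num', 4))
VM().execute(compile_ast(ast) + [('PRINT',)])  # 7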

6.4 Reference counting GC (simplified)

class RefCounted:
    def __init__(self):
        self.refcount = 1
 
    def acquire(self):
        self.refcount += 1
        return self
 
    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self._destroy()
 
    def _destroy(self):
        # Free resources
        pass

7. System Design Diagrams

7.1 Compilation Pipeline

flowchart TB
    Source[Source code]
    Source --> Lex[Lexer<br/>tokens]
    Lex --> Parse[Parser<br/>AST]
    Parse --> TypeCheck[Type Check<br/>Semantic Analysis]
    TypeCheck --> IR[Intermediate<br/>Representation]
    IR --> Opt[Optimizer<br/>const fold, inline,<br/>vectorize]
    Opt --> CodeGen[Code Generator]

    CodeGen --> Native[Native Machine Code]
    CodeGen --> Bytecode[Bytecode<br/>JVM, Wasm]

    Bytecode --> VM[Virtual Machine<br/>Interpret or JIT]

    style Source fill:#bbdefb
    style Native fill:#c8e6c9
    style Bytecode fill:#fff9c4

7.2 V8 JIT Pipeline

flowchart LR
    JS[JS Source] --> Parse[Parser]
    Parse --> AST[AST]
    AST --> Igni[Ignition<br/>Interpreter]
    Igni --> Bytecode[Bytecode]
    Bytecode --> Hot{Hot function?}
    Hot -->|Yes| Turbo[TurboFan<br/>Optimizing JIT]
    Hot -->|No| Continue[Continue interpreting]
    Turbo --> Optimized[Optimized<br/>Native Code]
    Optimized --> Deopt{Assumption broken?}
    Deopt -->|Yes| Igni
    Deopt -->|No| Continue

7.3 GC Generations

flowchart LR
    subgraph Heap["JVM Heap"]
        Eden["Eden<br/>(Young)"]
        S0[Survivor 0]
        S1[Survivor 1]
        Old["Old Gen<br/>(Tenured)"]

        Eden -->|Minor GC, survives| S0
        S0 -->|Minor GC| S1
        S1 -->|Tenured after N| Old
    end

    GC{GC Trigger}
    GC -->|Eden full<br/>Minor GC| Eden
    GC -->|Old full<br/>Major GC| Old

7.4 Memory Models

flowchart LR
    subgraph Manual["Manual (C/C++)"]
        M1[malloc/free<br/>direct control]
        M2[Risk: leaks, UAF]
    end

    subgraph GC["GC (Java/Go/Python)"]
        G1[VM tracks refs<br/>auto-frees]
        G2[Pause times<br/>throughput cost]
    end

    subgraph Ownership["Ownership (Rust)"]
        O1[Compile-time<br/>borrow checker]
        O2[No GC<br/>no UAF<br/>steeper learning]
    end

    style Manual fill:#ffcdd2
    style GC fill:#fff9c4
    style Ownership fill:#c8e6c9

8. Aha Moments & Pitfalls

Aha Moments

#1: Compilers are translators with optimizers. Source → AST → IR → optimized → target. Each step opens optimization opportunities.

#2: JIT vs AOT trade-off. JIT adapts to runtime patterns; AOT gives predictable startup. HotSpot's JIT can beat AOT-compiled code in long-running workloads thanks to profile-guided optimization.

#3: GC vs ownership = different trade-offs. GC easier, runtime cost. Rust harder upfront, no runtime cost.

#4: Bytecode is portable + safe. Wasm runs in browser, server, edge — same binary. JVM bytecode is verified before exec.

#5: V8 isolate ≠ container. An isolate is a lightweight V8 sandbox with millisecond cold starts; a container is a whole OS namespace with init measured in hundreds of ms to seconds.

#6: Memory models matter for concurrent code. Without proper ordering you'll see “impossible” bugs. Java, C++, and Rust each formalize theirs.

#7: Lock-free is hard but rewarding. CAS-based stacks/queues can deliver 10-100x the throughput of mutex-based ones under contention, but they demand expert care.

#8: Capability security via compiler. Wasm Component Model enforces capabilities at type level. Stronger than runtime checks.

Pitfalls

Pitfall 1: Trusting the type system at runtime

Java type-checks at compile time, but the runtime can still throw ClassCastException via reflection or erased generics. Fix: treat the type system as the primary defense, with runtime validation as the safety net.

Pitfall 2: GC pause shock

A 4GB heap on default G1 hits a sudden 5-second pause → outage. Fix: tune the GC, monitor pause distributions, use ZGC for latency-sensitive services.

Pitfall 3: Memory leak in GC language

“GC frees everything” is wrong: references held in static maps or listener callbacks keep objects reachable → leak. Fix: heap profiling; weak references where appropriate.

Pitfall 4: Boxing causing GC pressure

Map<Integer, Integer> allocates Integer objects per entry → GC churn. Fix: Primitive collections (Eclipse Collections, Trove for Java).

Pitfall 5: synchronized everywhere

Coarse locks → contention: a 4-core machine performs like a single core. Fix: fine-grained locks, lock-free data structures, immutable data.

Pitfall 6: JIT warmup problem

Production deployment: the first requests are slow because hot paths haven't been JIT-compiled yet. Fix: warm up with synthetic load before serving real traffic.

Pitfall 7: Stop-the-world surprise

A JVM full GC freezes the app for seconds; the K8s health probe times out → restart loop. Fix: lengthen probe timeouts, tune the GC.

Pitfall 8: Native call boundary

JNI/FFI calls can be ~100x slower than regular in-language calls; crossing the boundary repeatedly becomes the bottleneck. Fix: batch calls, use direct buffers.

Pitfall 9: Reflection abuse

Class.forName(), dynamic proxies → the JIT can't see through them to optimize. Fix: compile-time code generation (e.g., MapStruct, Dagger).

Pitfall 10: Premature lock-free

Implement lock-free queue → 100 lines of subtle code → bugs. Fix: Use proven libs (java.util.concurrent, crossbeam in Rust).


| Topic | Connects to |
| --- | --- |
| Tuan-Bonus-Edge-Wasm-Architecture | Wasm Component Model, V8 isolates |
| Tuan-Foundations-OS-Essentials | Processes, threads, namespaces underlie VMs |
| Tuan-Foundations-Computer-Architecture | JIT optimizations target the memory hierarchy |
| Tuan-Bonus-Consistency-Models-Isolation | Memory models, atomicity |

References

Books:

  • Crafting Interpreters (Bob Nystrom, free) — https://craftinginterpreters.com/
  • Compilers: Principles, Techniques, and Tools (Dragon Book — Aho et al.)
  • Engineering a Compiler (Cooper & Torczon)
  • Programming Language Pragmatics (Scott)
  • The Garbage Collection Handbook (Jones, Hosking, Moss)
  • Modern Compiler Implementation in Java (Appel)

Papers:

  • Cliff Click's lock-free hash table (talk and Java implementation)
  • A History of Modern 64-bit Computing — context

Specific projects to study:

  • LLVM (modular compiler infrastructure)
  • V8 source code
  • Wasmtime (Wasm runtime)
  • HotSpot JVM
  • Go compiler

Next: Tuan-Foundations-Math-for-Architects — linear algebra, probability, discrete math, information theory.