Case Study: Design Ad Click Event Aggregation
"1 billion clicks a day, every click must be counted exactly once, and results must be ready within minutes. This is a problem where a 0.1% error can cost millions of advertising dollars."
Tags: system-design ad-click event-aggregation streaming exactly-once mapreduce alex-xu-vol2 Student: Hieu Source: Alex Xu — System Design Interview Volume 2, Chapter 3 Prerequisite: Tuan-08-Message-Queue · Tuan-02-Back-of-the-envelope · Tuan-01-Scale-From-Zero-To-Millions Related: Tuan-13-Monitoring-Observability · Case-Design-Metrics-Monitoring-Alerting · Tuan-07-Database-Sharding-Replication
1. Context & Why — Why does Ad Click Event Aggregation matter?
1.1 Analogy: Counting national election votes in real time
Hieu, picture this problem as counting the votes of a national election in real time:
| National election | Ad Click Aggregation | Common trait |
|---|---|---|
| Millions of ballots from across the country | Billions of clicks from around the world | Huge volume, arriving continuously |
| Each ballot counted exactly once — counting twice = fraud | Each click counted exactly once — counting twice = overbilling the advertiser | Exactly-once is a hard requirement |
| Results update continuously — the press wants the leader every minute | The dashboard updates continuously — advertisers want clicks from the last minute | Real-time aggregation with low latency |
| Broken down by region — per province, per city | Broken down by filter — per ad_id, campaign_id, country | Multi-dimensional aggregation |
| Late ballots — a remote office reports slowly | Late events — click events arriving late due to network delay | Late event handling is a major challenge |
| Cross-checks — hand counts must match the machines | Reconciliation — a batch job re-verifies the streaming results | Reconciliation guarantees correctness |
| Double voting — one person votes twice | Click fraud — bots clicking repeatedly | Deduplication must handle this |
Aha Moment: This is not a simple "count the clicks" problem. It is about counting correctly, completely, and fast at a scale of billions of events per day. A small error = real money lost — either the advertiser is overbilled or the platform loses revenue.
1.2 Why is this problem hard?
Hieu, counting looks trivial — but counting at huge scale, in real time, with absolute accuracy is one of the hardest problems in distributed systems:
| Challenge | Explanation |
|---|---|
| Huge volume | 1 billion clicks/day ≈ 11,574 clicks/sec on average; peaks can exceed 50K/sec |
| Real-time | Advertisers want results within minutes, not at end of day |
| Exactly-once | Over-count = wrong invoices. Under-count = lost revenue. Neither is acceptable |
| Late events | Events can arrive 5 minutes, 1 hour, even 1 day late due to mobile networks and CDN delays |
| Multi-dimensional | Must aggregate along many dimensions: ad_id, campaign_id, country, device_type, time window |
| Fault tolerance | No data loss on server crashes or network partitions |
| Hot spots | A viral ad can draw 100x normal traffic — creating a hot partition |
| Consistency vs Latency | Must balance result accuracy against serving speed |
1.3 Business context — Why do advertisers care?
In online advertising (Google Ads, Facebook Ads, TikTok Ads), advertisers pay per CPC (Cost Per Click). Every click = money. Example:
- Advertiser A runs a campaign at CPC = $0.50
- The ad receives 1 million clicks per day
- Total spend = $500,000/day
A 1% counting error on that spend is $5,000/day — roughly $1.8 million/year for this one advertiser. Multiply across millions of advertisers and the number explodes. This is why data accuracy is non-negotiable.
1.4 Problem scope
| Aspect | In scope | Out of scope |
|---|---|---|
| Click event aggregation (count clicks per ad_id, per time window) | Yes | — |
| Real-time query (advertiser dashboard) | Yes | — |
| Filtering (by campaign_id, country) | Yes | — |
| Click fraud detection (bots, click farms) | Related (Section 4) | Separate deep dive |
| Conversion tracking (click → purchase) | — | Different problem |
| Ad serving (choosing which ad to show) | — | Different problem |
| Billing (charging advertisers) | — | Downstream of aggregation |
2. Deep Dive — Alex Xu 4-Step Framework
Step 1 — Understand the Problem & Establish Design Scope
2.1.1 Functional Requirements
| Feature | Details |
|---|---|
| Aggregate ad clicks | Count clicks and unique clicks per ad_id over time windows: 1 minute, 5 minutes, 1 hour |
| Return aggregated data | Query API returns click_count and unique_click_count for an ad_id over a time range |
| Support filtering | Filter results by ad_id, campaign_id, country, device_type |
| Support multi-dimensional aggregation | Aggregate along several dimensions at once: e.g. click count per ad_id per country per minute |
| Store raw events | Keep raw click events for reconciliation and re-processing |
Questions you should ask the interviewer:
| Question | Assumed answer | Why ask |
|---|---|---|
| How many clicks/day? | 1 billion clicks/day | Sets the throughput of the whole system |
| How many active ads? | 2 million ads | Sets the cardinality of the aggregation |
| Which aggregation windows? | 1 minute, 5 minutes, 1 hour | Sets the windowing strategy |
| Latency requirement? | < a few minutes from click to queryable | Decides streaming vs batch |
| Data accuracy? | Exactly-once — no over- or under-counting | Decides delivery semantics |
| Do we store raw data? | Yes — for reconciliation and ad-hoc analysis | Decides the storage strategy |
| How long do we keep raw data? | Raw: 7 days. Aggregated: years | Decides data retention |
| Is click fraud in scope? | Basic dedup yes; advanced fraud detection out of scope | Sets the design boundary |
2.1.2 Non-Functional Requirements
| Requirement | Target | Why |
|---|---|---|
| Throughput | 1B clicks/day, peak 50K clicks/sec | Large, bursty traffic |
| Latency | End-to-end < a few minutes (click to queryable) | Near-real-time dashboard |
| Accuracy | Exactly-once processing | Money is involved |
| Availability | 99.99% for the ingestion pipeline | Lost events = lost revenue |
| Scalability | Scale horizontally as traffic grows | Sale seasons, viral ads |
| Fault tolerance | No data loss on node failure | Zero data loss |
| Idempotency | Re-processing does not change results | Retry safety |
| Queryability | Query response < 1 second for a single ad_id | Dashboard UX |
2.1.3 API Design
API 1: Aggregated click count for an ad_id
GET /v1/ads/{ad_id}/aggregated_count
Parameters:
| Parameter | Type | Description |
|---|---|---|
| from | long (epoch) | Start of the time range |
| to | long (epoch) | End of the time range |
| filter | object | Optional: {campaign_id: "xxx", country: "VN"} |
Response:
| Field | Type | Description |
|---|---|---|
| ad_id | string | ID of the ad |
| click_count | long | Total clicks |
| unique_click_count | long | Clicks from distinct users |
API 2: Top N most-clicked ads in a time window
GET /v1/ads/top_clicked
Parameters:
| Parameter | Type | Description |
|---|---|---|
| count | int | Number of top ads to return (N) |
| window | int | Time window in minutes (1, 5, 60) |
| filter | object | Optional filter |
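Before moving on, a minimal sketch of how these two responses might be modeled. All record and field names here are assumptions for illustration, not part of the book's design:

```java
import java.util.List;

public class AdQueryApi {
    /** Response for GET /v1/ads/{ad_id}/aggregated_count (hypothetical model). */
    public record AggregatedCount(
            String adId,             // "ad_001"
            long from,               // query range start (epoch seconds)
            long to,                 // query range end (epoch seconds)
            long clickCount,         // total clicks in [from, to)
            long uniqueClickCount) { // distinct users (HLL estimate, see 2.3.3)
    }

    /** One entry in the response for GET /v1/ads/top_clicked. */
    public record TopClickedAd(int rank, String adId, long clickCount) {}

    /** Response for GET /v1/ads/top_clicked?count=N&window=... */
    public record TopClickedResponse(int windowMinutes, List<TopClickedAd> ads) {}
}
```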
Step 2 — High-Level Design
2.2.1 Architecture overview
Hieu, this architecture is like the processing line in a seafood plant:
- Fishing boats (Client) land their catch at the port → click events from user browsers/apps
- The port (Load Balancer) routes the catch onto conveyor belts → the Ingestion Service receives events
- Cold storage (Message Queue) keeps the fish fresh before processing → Kafka buffers events
- The processing line (Aggregation Service) cuts, sorts, and packages → MapReduce aggregates clicks
- The finished-goods warehouse (Database) stores the packaged products → the Aggregated DB stores results
- The sales counter (Query Service) serves customers → the Query API returns results to advertisers
flowchart LR subgraph "Data Generation" C1["User Clicks<br/>(Browsers/Apps)"] end subgraph "Log Ingestion" LB["Load Balancer"] IS1["Ingestion<br/>Service 1"] IS2["Ingestion<br/>Service 2"] ISN["Ingestion<br/>Service N"] end subgraph "Message Queue" K["Apache Kafka<br/>(Partitioned by ad_id)"] end subgraph "Streaming Aggregation" AG1["Aggregation Node 1<br/>(Map + Aggregate)"] AG2["Aggregation Node 2<br/>(Map + Aggregate)"] AGN["Aggregation Node N<br/>(Map + Aggregate)"] end subgraph "Second Message Queue (Optional)" K2["Kafka Topic 2<br/>(Aggregated Results)"] end subgraph "Storage" RAW[("Raw Data Store<br/>(Cassandra / S3)")] AGG[("Aggregated DB<br/>(Cassandra / ClickHouse)")] end subgraph "Serving" QS["Query Service"] CACHE[("Redis Cache")] DASH["Advertiser<br/>Dashboard"] end C1 --> LB LB --> IS1 & IS2 & ISN IS1 & IS2 & ISN --> K K --> AG1 & AG2 & AGN K -->|"Raw events"| RAW AG1 & AG2 & AGN --> K2 K2 --> AGG AGG --> QS QS --> CACHE QS --> DASH style K fill:#FF9800,stroke:#333,stroke-width:2px,color:#fff style K2 fill:#FF9800,stroke:#333,stroke-width:2px,color:#fff style AGG fill:#4CAF50,stroke:#333,stroke-width:2px,color:#fff style QS fill:#2196F3,stroke:#333,stroke-width:2px,color:#fff
2.2.2 Data Flow — End to End
sequenceDiagram participant User as User Browser participant IS as Ingestion Service participant Kafka as Kafka (Raw Events) participant AGG as Aggregation Service participant Kafka2 as Kafka (Aggregated) participant DB as Aggregated DB participant QS as Query Service participant ADV as Advertiser Dashboard User->>IS: Click event (ad_id, timestamp, user_id, ip, country) IS->>IS: Validate & enrich event IS->>Kafka: Produce to partition (hash ad_id) Note over Kafka,AGG: Streaming Aggregation (per minute window) Kafka->>AGG: Consume batch of events AGG->>AGG: Map: filter + transform AGG->>AGG: Aggregate: count by ad_id per window AGG->>AGG: Reduce: merge partial results AGG->>Kafka2: Produce aggregated result Kafka2->>DB: Write aggregated data Note over DB: (ad_id, window_start, click_count, unique_count) ADV->>QS: GET /v1/ads/123/aggregated_count?from=...&to=... QS->>DB: Query aggregated data DB-->>QS: Results QS-->>ADV: JSON response
2.2.3 Component Responsibilities
| Component | Responsibility | Technology choices |
|---|---|---|
| Ingestion Service | Receive click events from clients, validate, enrich (geo lookup), publish to Kafka | Custom service (Go/Java), stateless, horizontally scalable |
| Kafka (Raw) | Buffer raw events, decouple ingestion from processing, enable replay | Apache Kafka, partitioned by ad_id |
| Raw Data Store | Keep raw events for reconciliation, ad-hoc queries, re-processing | Cassandra (write-heavy) or S3 (cost-effective) |
| Aggregation Service | Stream processing: map, aggregate, reduce within time windows | Apache Flink, Spark Streaming |
| Kafka (Aggregated) | Buffer aggregated results before the DB write, decouple aggregation from storage | Apache Kafka |
| Aggregated DB | Store aggregation results, serve read queries | Cassandra, ClickHouse |
| Query Service | Parse queries, fetch data from the DB, apply filters, return results | Custom service with a caching layer |
| Redis Cache | Cache hot queries (top ads, recent aggregations) | Redis cluster |
2.2.4 Why two Kafka topics?
| Benefit | Explanation |
|---|---|
| Decoupling | The aggregation service and the DB writer scale independently |
| Exactly-once | Kafka transactions guarantee each aggregated result is written exactly once |
| Replay | If the DB writer fails, replay from topic 2 without re-aggregating |
| Multiple consumers | Several downstream systems can read the aggregated data (alerting, billing, analytics) |
Step 3 — Design Deep Dive
2.3.1 Data Model
Raw Click Event
| Field | Type | Description | Example |
|---|---|---|---|
| ad_id | string | ID of the ad | "ad_001" |
| click_timestamp | long (epoch ms) | When the user clicked (event time) | 1678886400000 |
| user_id | string | User ID (cookie/device ID) | "user_abc123" |
| ip | string | User's IP address | "203.113.152.4" |
| country | string | Country code (from IP geo lookup) | "VN" |
| device_type | string | Device type | "mobile" |
| campaign_id | string | Campaign containing this ad | "camp_xyz" |
Average size: ~0.5-1 KB per event.
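As a sketch, the raw click event from the table above as a Java record (field names mirror the table, camelCased):

```java
// Raw click event — one per click, append-only, high volume.
public record RawClickEvent(
        String adId,          // "ad_001"
        long clickTimestamp,  // event time, epoch ms: 1678886400000
        String userId,        // cookie/device ID: "user_abc123"
        String ip,            // "203.113.152.4" — hashed before storage (see 4.2)
        String country,       // "VN", from IP geo lookup at ingestion
        String deviceType,    // "mobile"
        String campaignId) {} // "camp_xyz"
```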
Aggregated Data
| Field | Type | Description | Example |
|---|---|---|---|
| ad_id | string | ID of the ad | "ad_001" |
| window_start | long (epoch) | Start of the time window | 1678886400 |
| window_size | int | Window size (minutes) | 1 |
| click_count | long | Total clicks in the window | 45,231 |
| unique_click_count | long | Clicks from distinct users | 38,102 |
| filter_id | string | Composite key of the filter dimensions | "country=VN" |
Aha Moment: Raw events and aggregated data have completely different characteristics. Raw events are write-heavy, append-only, high volume. Aggregated data is read-heavy, updated in place (upsert), far lower volume. That is why they need different databases — or at least different tables with different access patterns.
Star Schema for Aggregation
Hieu, the aggregation data model resembles a star schema from data warehousing:
| Part | Role | Example |
|---|---|---|
| Fact table | Stores the metric (click_count) | aggregated_clicks (ad_id, window, click_count, unique_count) |
| Dimension tables | Store filter attributes | ads (ad_id, campaign_id, advertiser_id), campaigns (campaign_id, budget) |
Query pattern: join the fact table with dimension tables to filter by campaign_id, advertiser_id, country.
2.3.2 Choosing the Right Database
| Requirement | Raw Data | Aggregated Data |
|---|---|---|
| Write pattern | Append-only, sequential | Upsert (update or insert) |
| Read pattern | Rare (reconciliation, re-processing) | Frequent (dashboard queries) |
| Volume | Huge (~1 TB/day) | Much smaller (a few GB/day) |
| Query pattern | Full scan, time-range scan | Point query (ad_id + time range), top-N |
| Retention | 7 days (then archive) | Years |
| Consistency | Eventual is OK | Strong preferred |
Raw Data → Cassandra or S3
| Option | Pros | Cons |
|---|---|---|
| Cassandra | Write-optimized (LSM-tree), time-series friendly, TTL support, distributed | Limited query flexibility, expensive storage |
| S3 + Parquet | Very cheap, unlimited storage, queryable via Athena/Spark | High latency, unsuitable for real-time queries |
| Hybrid (Cassandra 7 days + S3 archive) | Best of both | More operational complexity |
Conclusion: use Cassandra for raw data with a 7-day TTL, and archive to S3 (Parquet format) for long-term storage. Cassandra's write-heavy performance fits this use case well — partition key ad_id, clustering key click_timestamp.
Aggregated Data → Cassandra or ClickHouse
| Option | Pros | Cons |
|---|---|---|
| Cassandra | Consistent with the raw data store, write-optimized | Slow OLAP queries (weak secondary indexes) |
| ClickHouse | OLAP-optimized, column-oriented, blazing-fast aggregation queries | Different write pattern (batch inserts work better than single upserts) |
| Druid | Real-time OLAP, pre-aggregation at ingestion | Operationally complex |
Conclusion: ClickHouse is the best fit for aggregated data because:
- Column-oriented storage is ideal for aggregation queries (SUM, COUNT, GROUP BY)
- Extremely fast queries for the dashboard use case
- Built-in materialized views can aggregate automatically
- High compression ratios on time-series data
Trade-off: if the team has no ClickHouse experience, use Cassandra for both raw and aggregated data to reduce operational complexity. Performance won't be as good, but it is simpler.
2.3.3 Aggregation Service — MapReduce Model
This is the heart of the system. The aggregation service uses the MapReduce model to process events in three stages:
Stage 1: Map Node (Filter + Transform)
| Task | Details |
|---|---|
| Consume raw events from Kafka | Each Map node reads from one or more Kafka partitions |
| Filter | Drop invalid events (missing ad_id, invalid timestamp) |
| Transform | Enrich data (geo lookup from IP → country), normalize fields |
| Emit key-value | Output: (ad_id, 1) — meaning "this ad was clicked once" |
Stage 2: Aggregate Node (Count by ad_id per window)
| Task | Details |
|---|---|
| Receive (ad_id, 1) from Map | Group by ad_id |
| Aggregate within the time window | Count clicks and unique clicks per window (1 minute) |
| Maintain state | Keep partial aggregations in in-memory state (Flink state backend) |
| Emit partial result | Output: (ad_id, window, partial_count, partial_unique_count) |
Stage 3: Reduce Node (Merge Results)
| Task | Details |
|---|---|
| Receive partial results from several Aggregate nodes | For when an ad_id is split across nodes |
| Merge | Sum all partial_counts; merge HyperLogLog sketches for the unique count |
| Output the final result | (ad_id, window, final_click_count, final_unique_count) |
| Write to Kafka topic 2 | The aggregated result is ready for the DB write |
flowchart TB subgraph "Kafka Partitions" P1["Partition 0<br/>ad_001, ad_005, ad_009..."] P2["Partition 1<br/>ad_002, ad_006, ad_010..."] P3["Partition 2<br/>ad_003, ad_007, ad_011..."] P4["Partition 3<br/>ad_004, ad_008, ad_012..."] end subgraph "Map Nodes" M1["Map Node 1<br/>Filter + Transform<br/>Emit (ad_id, 1)"] M2["Map Node 2<br/>Filter + Transform<br/>Emit (ad_id, 1)"] M3["Map Node 3<br/>Filter + Transform<br/>Emit (ad_id, 1)"] M4["Map Node 4<br/>Filter + Transform<br/>Emit (ad_id, 1)"] end subgraph "Aggregate Nodes (Windowed)" A1["Aggregate Node 1<br/>Window: 1 min<br/>Count by ad_id"] A2["Aggregate Node 2<br/>Window: 1 min<br/>Count by ad_id"] end subgraph "Reduce Nodes" R1["Reduce Node<br/>Merge partial counts<br/>Final result per ad_id"] end subgraph "Output" K2["Kafka Topic 2<br/>(Aggregated Results)"] DB[("ClickHouse")] end P1 --> M1 P2 --> M2 P3 --> M3 P4 --> M4 M1 & M2 --> A1 M3 & M4 --> A2 A1 & A2 --> R1 R1 --> K2 K2 --> DB style M1 fill:#42A5F5,color:#fff style M2 fill:#42A5F5,color:#fff style M3 fill:#42A5F5,color:#fff style M4 fill:#42A5F5,color:#fff style A1 fill:#FFA726,color:#fff style A2 fill:#FFA726,color:#fff style R1 fill:#EF5350,color:#fff
Aha Moment: In practice, with Apache Flink, Map and Aggregate usually merge into one operator. Flink does this automatically via operator chaining to cut network overhead. The 3-stage model is for understanding the concept — the real implementation is more compact.
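A minimal, runnable Flink DataStream sketch of the Map + Aggregate stages, under the assumption that a Kafka source is stubbed out with `fromElements` (the `Click` POJO and the anonymous count function are illustrative, not the book's code):

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ClickAggregationJob {
    // Simplified event; in production this comes from the Kafka deserializer.
    public static class Click {
        public String adId;
        public long timestamp;
        public Click() {}
        public Click(String adId, long timestamp) { this.adId = adId; this.timestamp = timestamp; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // fromElements stands in for a KafkaSource<Click> on the raw topic.
        DataStream<Click> clicks = env
            .fromElements(new Click("ad_001", 1_678_886_400_000L),
                          new Click("ad_001", 1_678_886_401_000L))
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Click>forBoundedOutOfOrderness(Duration.ofSeconds(10))
                    .withTimestampAssigner((c, ts) -> c.timestamp));

        clicks
            .filter(c -> c.adId != null)                          // Map: drop invalid events
            .keyBy(c -> c.adId)                                   // shuffle by ad_id
            .window(TumblingEventTimeWindows.of(Time.minutes(1))) // 1-minute tumbling window
            .aggregate(new AggregateFunction<Click, Long, Long>() { // Aggregate: count per window
                @Override public Long createAccumulator() { return 0L; }
                @Override public Long add(Click c, Long acc) { return acc + 1; }
                @Override public Long getResult(Long acc) { return acc; }
                @Override public Long merge(Long a, Long b) { return a + b; }
            })
            .print(); // sink elided; in the real pipeline this goes to Kafka topic 2

        env.execute("ad-click-aggregation");
    }
}
```

Note how the filter, keyBy, and windowed aggregate chain into a compact pipeline — exactly the operator chaining the Aha Moment describes.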
Unique Click Count — HyperLogLog
Counting unique clicks (distinct user_id) is hard because:
- You cannot keep every user_id in memory (far too many)
- An exact count needs O(n) memory per ad_id per window
Solution: use HyperLogLog (HLL) — a probabilistic data structure:
| Property | Value |
|---|---|
| Memory | ~12 KB per counter (fixed, regardless of element count) |
| Accuracy | ~0.81% error (with 16K registers) |
| Mergeable | Two HLL counters can be merged → perfect for distributed aggregation |
| Trade-off | Gives up exactness, but saves enormous amounts of memory |
With 2 million active ads, one HLL counter each = 2M × 12 KB ≈ 24 GB of memory — entirely feasible.
If an exact unique count is needed (some advertisers pay extra for accuracy), use a bitmap or a Bloom filter at a higher memory cost.
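To make the table concrete, a toy HyperLogLog sketch (my own illustration, not production code): 2^14 = 16,384 one-byte registers ≈ 16 KB here (~12 KB with the usual 6-bit register packing), standard error ≈ 1.04/√m ≈ 0.81%. The small-range (linear counting) correction is omitted for brevity:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HyperLogLog {
    private static final int P = 14;          // register-index bits
    private static final int M = 1 << P;      // 16,384 registers
    private final byte[] registers = new byte[M];

    public void add(String item) {
        long x = hash64(item);
        int idx = (int) (x >>> (64 - P));     // first P bits choose a register
        long rest = x << P;                   // remaining 64 - P bits
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - P + 1);
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    public long estimate() {
        double alpha = 0.7213 / (1 + 1.079 / M); // bias constant for m >= 128
        double sum = 0;
        for (byte r : registers) sum += Math.pow(2, -r);
        return Math.round(alpha * M * M / sum);
    }

    // Register-wise max merges two sketches losslessly — the property that
    // lets Reduce nodes combine partial unique counts from Aggregate nodes.
    public void merge(HyperLogLog other) {
        for (int i = 0; i < M; i++)
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
    }

    private static long hash64(String s) {
        try {
            byte[] h = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long x = 0;
            for (int i = 0; i < 8; i++) x = (x << 8) | (h[i] & 0xffL);
            return x;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```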
2.3.4 Streaming vs Batching — Lambda vs Kappa Architecture
This is the most important architectural decision in this problem. Hieu, you need to know both schools of thought:
Lambda Architecture (Batch + Speed Layer)
flowchart TB subgraph "Data Source" EVENTS["Click Events"] end subgraph "Batch Layer" BATCH_STORE[("Batch Storage<br/>(HDFS / S3)")] BATCH_PROC["Batch Processing<br/>(Spark / MapReduce)<br/>Chay moi gio hoac moi ngay"] BATCH_VIEW[("Batch View<br/>(accurate, complete)")] end subgraph "Speed Layer" STREAM_PROC["Stream Processing<br/>(Flink / Storm)<br/>Real-time"] RT_VIEW[("Real-time View<br/>(approximate, fast)")] end subgraph "Serving Layer" MERGE["Merge Results<br/>Batch View + Real-time View"] QS["Query Service"] end EVENTS --> BATCH_STORE EVENTS --> STREAM_PROC BATCH_STORE --> BATCH_PROC BATCH_PROC --> BATCH_VIEW STREAM_PROC --> RT_VIEW BATCH_VIEW --> MERGE RT_VIEW --> MERGE MERGE --> QS style BATCH_PROC fill:#4CAF50,color:#fff style STREAM_PROC fill:#FF9800,color:#fff style MERGE fill:#2196F3,color:#fff
| Pros | Cons |
|---|---|
| The batch layer guarantees accuracy | 2 codebases — two pipelines to maintain (batch + streaming) |
| The speed layer guarantees low latency | Duplicated logic — the same aggregation written twice |
| If streaming goes wrong, batch corrects it | Debugging is complicated — which layer is wrong? |
| | Merge logic is not simple |
Kappa Architecture (Streaming Only)
flowchart LR subgraph "Data Source" EVENTS["Click Events"] end subgraph "Streaming Layer (Only Layer)" KAFKA["Kafka<br/>(Immutable Log)"] FLINK["Stream Processing<br/>(Apache Flink)<br/>Real-time Aggregation"] end subgraph "Serving Layer" DB[("Aggregated DB")] QS["Query Service"] end subgraph "Reconciliation (Periodic)" RECON["Batch Reconciliation<br/>(Hourly / Daily)<br/>Verify streaming results"] end EVENTS --> KAFKA KAFKA --> FLINK FLINK --> DB DB --> QS KAFKA -.->|"Replay if needed"| FLINK DB -.-> RECON KAFKA -.-> RECON style KAFKA fill:#FF9800,color:#fff style FLINK fill:#4CAF50,color:#fff style RECON fill:#9E9E9E,color:#fff
| Pros | Cons |
|---|---|
| 1 codebase — only one pipeline to maintain | Needs a mature streaming framework (Flink has exactly-once) |
| Much simpler than Lambda | Re-processing means replaying from Kafka (needs enough retention) |
| Kafka is an immutable log — replay at any time | |
| Apache Flink supports exactly-once processing | |
Alex Xu's conclusion: Kappa architecture is the preferred choice for this problem. Reason: Apache Flink is mature enough to guarantee exactly-once processing. Lambda is only necessary when the streaming framework isn't trustworthy enough — which no longer holds for Flink in 2024+.
Aha Moment: Lambda architecture was born because early streaming frameworks (Storm, Samza) could not guarantee exactly-once. Once Flink solved that, Lambda's reason to exist largely disappeared. Reconciliation (a batch job that re-verifies results) is still needed as a safety net — but that is not Lambda, it is best practice.
2.3.5 Watermarks and Late Events
This is the hardest concept in stream processing, Hieu, and you need to understand it thoroughly:
Event Time vs Processing Time
| Concept | Definition | Example |
|---|---|---|
| Event time | When the event actually happened (the user clicked) | 14:00:00 — the user clicks the ad |
| Processing time | When the system receives the event | 14:00:05 — the event reaches Flink (5 s late due to the network) |
| Ingestion time | When the event enters Kafka | 14:00:02 — Kafka receives the event |
Why does this matter? Because you aggregate by time window. With processing time, a click that happened at 13:59:59 but arrived at 14:00:05 gets counted in the 14:00-14:01 window instead of 13:59-14:00. Wrong window = wrong result.
→ Conclusion: always use event time for aggregation. Processing time is only for system monitoring.
What is a watermark?
A watermark is the system's declaration that "all events with event_time ≤ W have arrived".
Example: watermark = 14:05:00 means the system believes no more events with timestamps before 14:05:00 will arrive. It can close every window before 14:05:00 and emit results.
The problem: how do you know all events have arrived? You can't, not with certainty. That is why a watermark always carries an allowed lateness.
| Watermark strategy | Description | Trade-off |
|---|---|---|
| Punctuated watermark | Emit watermarks based on event data (e.g. max event_time - 5 s) | Suits steady traffic |
| Periodic watermark | Emit watermarks on a timer (e.g. every 200 ms, watermark = max event_time seen - 10 s) | Most common, simple |
| Watermark = max_event_time - allowed_lateness | E.g. max_event_time - 5 minutes | Balances latency and completeness |
Late Event Handling
When an event arrives after the watermark (a late event), there are three ways to handle it:
| Strategy | Description | When to use |
|---|---|---|
| Drop | Ignore the late event | When accuracy doesn't matter much (approximate dashboards) |
| Update aggregation | Reopen the window, update the result, re-emit | When high accuracy is needed — use this for ad clicks |
| Side output | Send late events to a separate stream for later handling | When late events need different processing (reconciliation, alerting) |
For ad clicks, Alex Xu recommends combining Update + Side output, as sketched after the Aha Moment below:
- Late by up to 5 minutes: update the aggregation (reopen the window, recount)
- Late by more than 5 minutes: send to a side output → the reconciliation batch job handles it
Aha Moment: Watermarks are the hardest concept in stream processing because they embody the trade-off between latency and completeness. Too aggressive a watermark (small allowed_lateness) → many late events dropped → wrong results. Too lax (large allowed_lateness) → high latency → a slow dashboard. There is no "correct" value — only a value that fits the business requirement.
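A Flink fragment showing that recommended policy (it assumes the `clicks` stream and POJO from the 2.3.3 sketch, plus an assumed `ClickCountAggregate` function): events up to 5 minutes late re-fire the window; anything later goes to a side output.

```java
// Events <= 5 min late update the window; later stragglers go to a side output.
final OutputTag<Click> lateTag = new OutputTag<Click>("late-clicks") {};

SingleOutputStreamOperator<Long> counts = clicks
    .keyBy(c -> c.adId)
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .allowedLateness(Time.minutes(5))      // re-fire the window for events <= 5 min late
    .sideOutputLateData(lateTag)           // events later than watermark + 5 min land here
    .aggregate(new ClickCountAggregate()); // assumed AggregateFunction<Click, Long, Long>

// Route the stragglers to a dedicated topic for the reconciliation job.
DataStream<Click> lateClicks = counts.getSideOutput(lateTag);
```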
2.3.6 Exactly-Once Processing
In a distributed system, exactly-once is extremely hard because there are many failure points:
| Failure point | Problem | Solution |
|---|---|---|
| Kafka consumer reads an event | Consumer crashes after processing but before committing the offset | Flink checkpointing (snapshot state + Kafka offset together) |
| Aggregation service | A node crashes mid-processing | Flink savepoints/checkpoints — resume from the last consistent state |
| DB write | Write succeeds but is never acked | Idempotent writes — rewriting the same data doesn't change the result |
| Kafka producer (writing aggregated results) | Producer crashes after Kafka receives but before the ack | Kafka idempotent producer (enable.idempotence=true) |
Exactly-Once End-to-End Flow
flowchart TB subgraph "Source: Kafka Consumer" KC["Kafka Consumer<br/>Track offset per partition"] end subgraph "Processing: Flink" CP["Checkpoint Manager<br/>(Periodic snapshots)"] STATE["Flink State Backend<br/>(RocksDB / Heap)"] OP["Aggregation Operator<br/>(Stateful processing)"] end subgraph "Sink: Kafka Producer + DB" KP["Kafka Producer<br/>(Idempotent + Transactional)"] DB[("ClickHouse<br/>(Idempotent Upsert)")] end subgraph "Checkpoint Flow" BARRIER["Checkpoint Barrier<br/>(injected into stream)"] end KC -->|"Events"| OP OP --> STATE OP --> KP KP --> DB CP -->|"Trigger checkpoint"| BARRIER BARRIER -->|"Snapshot: offset + state + pending writes"| CP CP -->|"On success: commit Kafka offset + flush writes"| KC CP -->|"On success: commit Kafka transaction"| KP style CP fill:#e53935,color:#fff style STATE fill:#1e88e5,color:#fff style OP fill:#43a047,color:#fff
How the Flink checkpoint mechanism works:
- The Checkpoint Manager periodically (e.g. every 30 seconds) injects a checkpoint barrier into the stream
- As the barrier passes through an operator, the operator snapshots its state (partial aggregations, HLL counters) to durable storage (HDFS/S3)
- The Kafka consumer offset at that moment is saved alongside
- When every operator has finished its snapshot → the checkpoint is complete
- If a failure occurs → Flink rolls back to the last successful checkpoint:
- Reset the Kafka consumer offset to the checkpoint offset (re-read events from there)
- Restore operator state from the checkpoint
- Re-process events from the checkpoint offset → the result is identical because the state is identical
Important: exactly-once in Flink does not mean each event is processed only once. It means the final result is as if each event were processed only once. In practice, events may be re-processed (after a checkpoint restore), but the result doesn't change thanks to idempotent writes.
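A fragment, assuming the Flink Kafka connector (1.14+), of the three settings that combine into end-to-end exactly-once (broker address and topic names are placeholders; imports elided):

```java
// 1. Periodic checkpoints in EXACTLY_ONCE mode, every 30 s as described above.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(30_000, CheckpointingMode.EXACTLY_ONCE);

// 2. A transactional Kafka sink: aggregated results only become visible to
//    downstream consumers when the checkpoint that produced them commits.
KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("kafka:9092")               // placeholder address
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("aggregated-clicks")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        .setTransactionalIdPrefix("agg-sink")            // required for EXACTLY_ONCE
        .build();

// 3. The DB write must be an idempotent upsert keyed on
//    (ad_id, window_start, filter_id), so that replays after a checkpoint
//    restore overwrite rather than double-count.
```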
2.3.7 Deduplication — the same click sent twice
Unlike exactly-once (a system-side problem), deduplication deals with problems originating on the client side:
| Cause | Description | Example |
|---|---|---|
| Client retry | User clicks, the request times out, the client resends | 1 click → 2 events |
| Double click | User clicks twice quickly | 2 physical clicks that should count as 1 |
| Page reload | User refreshes a page containing the ad | The click event is sent again |
| CDN/Proxy retry | An intermediate server retries the request | 1 click → many events |
Dedup Strategy
| Approach | Description | Trade-off |
|---|---|---|
| Dedup key: (ad_id, user_id, click_timestamp) | Two events sharing all 3 fields = duplicate | Must hold seen keys for the window → memory |
| Dedup window | Only check duplicates within a time span (e.g. 5 minutes) | Duplicates beyond the window slip through |
| Bloom filter | Probabilistic "have I seen this?" with a low false-positive rate | Memory-cheap, but false positives drop some genuine events |
| External dedup store (Redis) | Keep dedup keys in Redis with a TTL | More accurate than a Bloom filter, but adds latency and a dependency |
Conclusion: use a Bloom filter inside the Flink operator with a 5-minute dedup window. A ~0.1% false-positive rate is acceptable (wrongly dropping ~1 in 1,000 genuine clicks — insignificant at 1 billion clicks/day).
Where absolute accuracy is required (billing), use Redis dedup with key {ad_id}:{user_id}:{minute_bucket} and a 10-minute TTL.
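A sketch of the Bloom-filter variant using Guava (the class name and the rotation policy are assumptions; the filter is sized for ~5 minutes of peak traffic, ~58K clicks/sec × 300 s ≈ 17.4M keys, at the ~0.1% false-positive rate the text accepts):

```java
import java.nio.charset.StandardCharsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class ClickDeduper {
    // One filter per 5-minute dedup window; swap in a fresh filter when the
    // window rolls over (rotation logic elided).
    private final BloomFilter<String> seen = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8), 18_000_000, 0.001);

    /** Returns true the first time an (ad_id, user_id, timestamp) key appears. */
    public boolean firstSeen(String adId, String userId, long clickTimestamp) {
        String key = adId + ":" + userId + ":" + clickTimestamp;
        if (seen.mightContain(key)) {
            return false; // probably a duplicate (~0.1% chance of a false positive)
        }
        seen.put(key);
        return true;
    }
}
```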
2.3.8 Time Window Types
Flink (and other streaming frameworks) supports several kinds of time window:
Tumbling Window
| Property | Value |
|---|---|
| Definition | Fixed-size, non-overlapping windows |
| Example | 1-minute windows: [14:00-14:01), [14:01-14:02), [14:02-14:03) |
| When to use | Ad click aggregation — counting clicks per minute, per hour |
| Characteristics | Each event belongs to exactly 1 window. Simplest and most efficient |
Sliding Window
| Property | Value |
|---|---|
| Definition | Fixed-size windows that overlap |
| Example | 5-minute window sliding every minute: [14:00-14:05), [14:01-14:06), [14:02-14:07) |
| When to use | Moving averages — e.g. "clicks in the last 5 minutes", refreshed every minute |
| Characteristics | Each event belongs to several windows. Costs more memory than tumbling |
Session Window
| Property | Value |
|---|---|
| Definition | The window closes when no new event arrives within a gap |
| Example | 30-minute gap: events at 14:00, 14:05, 14:10 form one session. An event at 14:50 starts a new session |
| When to use | User behavior analysis — one browsing session of a user |
| Characteristics | Window size is dynamic, not fixed |
For the ad click problem: use tumbling windows of 1 minute, 5 minutes, 1 hour. It is the natural choice because:
- Each click belongs to exactly 1 window → simple, no duplicate counting
- A 5-minute window = the sum of five 1-minute windows → computable from the smaller windows
- A 1-hour window = the sum of sixty 1-minute windows → the cascade continues
Aha Moment: You don't need 3 separate pipelines for 3 window sizes. Aggregate only the 1-minute window (the smallest), then cascade: five 1-minute windows → one 5-minute window, twelve 5-minute windows → one 1-hour window (see the sketch below). This saves significant compute and storage.
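A self-contained sketch of that cascade roll-up (my illustration; note that unique counts cannot be summed this way — they need the HLL merge from 2.3.3):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CascadeAggregation {
    /** Roll 1-minute click counts up into windows of `sizeMinutes`. */
    static Map<Long, Long> rollUp(Map<Long, Long> perMinute, int sizeMinutes) {
        long sizeMs = sizeMinutes * 60_000L;
        return perMinute.entrySet().stream().collect(Collectors.groupingBy(
                e -> (e.getKey() / sizeMs) * sizeMs,   // align to the larger window start
                TreeMap::new,
                Collectors.summingLong(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        // 1-minute window start (epoch ms) -> click_count, for one ad_id
        Map<Long, Long> perMinute = Map.of(
                0L, 100L, 60_000L, 120L, 120_000L, 80L,
                180_000L, 90L, 240_000L, 110L, 300_000L, 70L);
        System.out.println(rollUp(perMinute, 5));  // {0=500, 300000=70}
        System.out.println(rollUp(perMinute, 60)); // {0=570}
    }
}
```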
2.3.9 Scaling — handling heavy traffic
Kafka Partitioning
| Strategy | Description | Pros/Cons |
|---|---|---|
| Partition by ad_id | Hash(ad_id) % num_partitions | All clicks for one ad land in the same partition → simple aggregation, BUT hot ads create hot partitions |
| Partition by ad_id + minute | Hash(ad_id + minute_bucket) | Better spread, but needs a reduce step to merge |
| Random partition | Round-robin | Most even spread, but aggregation must shuffle data (expensive) |
Conclusion: use partition-by-ad_id as the default, and handle hot partitions separately (Section 2.3.12).
Flink Parallelism
| Component | Parallelism | Why |
|---|---|---|
| Source (Kafka consumer) | = number of Kafka partitions | One consumer reads one partition |
| Map operator | = number of Kafka partitions | 1:1 with the source |
| Aggregate operator | Per throughput | keyBy(ad_id) distributes automatically |
| Sink (Kafka producer) | Per throughput | Can be smaller than the source |
When you need to scale:
- Increase Kafka partitions (e.g. from 100 → 200)
- Raise Flink parallelism to match
- Add consumer group instances
- Flink auto-scaling (reactive mode) — supported since Flink 1.13+
Aggregation Node Horizontal Scaling
Flink automatically distributes state by key group. When parallelism increases:
- The state of key group A migrates from node 1 to node 2
- During migration there is a brief pause (managed via checkpoint/restore)
- No manual resharding as with a database
2.3.10 Fault Tolerance
| Mechanism | Layer | Description |
|---|---|---|
| Flink Checkpoints | Streaming | Periodic snapshot of all operator state + Kafka offsets. Automatic recovery |
| Flink Savepoints | Streaming | Manual checkpoints for planned maintenance (upgrades, config changes) |
| Kafka Consumer Group Rebalancing | Messaging | When a consumer dies, its partitions are redistributed to the remaining consumers |
| Kafka Replication | Messaging | Each partition has 3 replicas — tolerates 2 broker failures |
| DB Replication | Storage | Cassandra/ClickHouse replication factor 3 |
| End-to-end exactly-once | Whole system | Flink checkpoints + Kafka transactions + idempotent DB writes |
Failure scenarios and recovery:
| Scenario | System response |
|---|---|
| A Flink TaskManager crashes | JobManager detects it (heartbeat timeout) → redeploys tasks → restores from the last checkpoint |
| The Flink JobManager crashes | A standby JobManager takes over (HA mode with ZooKeeper) → restores the job from the last checkpoint |
| A Kafka broker crashes | ISR (In-Sync Replicas) → leader election → consumers reconnect → continue from the last committed offset |
| A DB node crashes | Replication → queries route to a replica → repair the node later |
| Network partition (Flink ↔ Kafka) | Consumer timeout → rebalance → reconnect → replay from the last offset |
| An entire data center fails | Cross-DC replication (Kafka MirrorMaker, Cassandra multi-DC) → fail over to another DC |
2.3.11 Reconciliation — Verify Streaming Results
Even with Flink's exactly-once, reconciliation is still needed because of:
- Bugs in the aggregation logic
- Clock skew between servers
- Edge cases in watermark/late-event handling
- Data corruption
Reconciliation Flow
| Step | Description |
|---|---|
| 1. A batch job runs hourly | Reads raw events from Cassandra/S3 for the window just ended |
| 2. Re-aggregate | Recomputes click_count and unique_count with batch processing (Spark) |
| 3. Compare | Compares the batch result with the streaming result in the aggregated DB |
| 4. Detect drift | If the difference exceeds a threshold (e.g. 0.1%) → alert |
| 5. Auto-correct (optional) | Overwrite the streaming result with the batch result (batch is the source of truth) |
| Metric | Threshold | Action |
|---|---|---|
| Drift < 0.01% | Normal | Log only |
| Drift 0.01% - 0.1% | Warning | Alert the team |
| Drift > 0.1% | Critical | Alert + auto-correct + investigate the root cause |
Aha Moment: Reconciliation is a safety net, not the primary mechanism. In a healthy system, reconciliation always matches (drift ≈ 0%). If drift is regularly > 0, there is a bug in the streaming pipeline.
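The comparison step (3-4 above) as a small self-contained sketch, using the drift thresholds from the table:

```java
public class Reconciliation {
    enum Severity { NORMAL, WARNING, CRITICAL }

    /** Classify the drift between the batch recount and the streaming result. */
    static Severity classify(long batchCount, long streamingCount) {
        if (batchCount == 0) return streamingCount == 0 ? Severity.NORMAL : Severity.CRITICAL;
        double drift = Math.abs(batchCount - streamingCount) / (double) batchCount;
        if (drift > 0.001) return Severity.CRITICAL;   // > 0.1%: alert + auto-correct
        if (drift > 0.0001) return Severity.WARNING;   // 0.01% - 0.1%: alert the team
        return Severity.NORMAL;                        // < 0.01%: log only
    }

    public static void main(String[] args) {
        System.out.println(classify(1_000_000, 999_950)); // 0.005% drift -> NORMAL
        System.out.println(classify(1_000_000, 999_200)); // 0.08%  drift -> WARNING
        System.out.println(classify(1_000_000, 985_000)); // 1.5%   drift -> CRITICAL
    }
}
```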
2.3.12 Hot Shard Problem — viral ads
The problem: a viral ad (say, a Super Bowl ad) can draw 100x-1000x normal traffic. With partition-by-ad_id, all clicks for that ad land on one Kafka partition and one Flink subtask → a hot shard → a bottleneck.
| Symptom | Consequence |
|---|---|
| One Kafka partition receives too much data | Consumer lag grows, latency grows |
| One Flink subtask burns too much CPU/memory | Backpressure, checkpoint timeouts |
| Other partitions/subtasks sit idle | Wasted resources |
Solution: Salted Keys + Secondary Aggregation
| Step | Description |
|---|---|
| 1. Detect the hot key | Monitor click rate per ad_id. Above a threshold (e.g. 10x average) → mark as hot |
| 2. Salt the key | Instead of partitioning by ad_id, partition by ad_id + random(0..N), where N = desired shard count |
| 3. First aggregation | Each shard aggregates independently → N partial results for one ad_id |
| 4. Secondary aggregation | One reducer merges the N partial results → the final result for the ad_id |
Example with ad_001 as the hot key, N=10:
- Instead of one partition taking every click for ad_001
- 10 partitions take the clicks: ad_001_0, ad_001_1, …, ad_001_9
- 10 aggregation nodes count independently
- 1 reduce node sums the 10 partial counts
Trade-off:
- Pro: traffic spreads evenly, no more hot shard
- Con: one extra aggregation step, more complex logic, slightly higher latency
- Con: unique counts (HLL) must be merged — but HLL merges cleanly, so this is fine
Aha Moment: The hot shard problem is not unique to this system. It appears whenever you partition by key and one key gets disproportionate traffic. "Salted keys + secondary aggregation" is a general pattern that applies to many other problems.
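A minimal sketch of the salting logic (class and method names are mine; hot-key detection and the partial-count merge are elided):

```java
import java.util.concurrent.ThreadLocalRandom;

public class SaltedKeys {
    static final int N = 10; // number of shards for a hot key

    /** Producer side: spread a hot ad across N Kafka partitions. */
    static String saltIfHot(String adId, boolean isHot) {
        if (!isHot) return adId;                  // normal ads keep their plain key
        int salt = ThreadLocalRandom.current().nextInt(N);
        return adId + "_" + salt;                 // e.g. "ad_001_7"
    }

    /** Reducer side: strip the salt so the N partial counts merge under one key.
     *  Only call this for keys known to have been salted. */
    static String unsalt(String saltedKey) {
        return saltedKey.substring(0, saltedKey.lastIndexOf('_')); // "ad_001_7" -> "ad_001"
    }
}
```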
3. Back-of-the-Envelope Estimation
3.1 Traffic Estimation
Assumptions:
- 1 billion clicks/day
- Peak traffic = 5x average
QPS: 10^9 / 86,400 ≈ 11,574 clicks/sec on average; peak ≈ 5 × 11,574 ≈ 58K clicks/sec.
3.2 Kafka Throughput
Raw event size: ~1 KB (ad_id + timestamp + user_id + ip + country + metadata)
Throughput: ~11,574 × 1 KB ≈ 11 MB/s on average; ~57 MB/s at peak.
Kafka partition estimation:
- Each partition handles ~10 MB/sec (conservative)
- Minimum partitions for peak: 57 / 10 ≈ 6 partitions
- In practice, use 100-200 partitions to scale consumers and give Flink enough parallelism
Kafka retention:
- Raw topic retention: 7 days
- Storage per day: 10^9 events × 1 KB ≈ 1 TB/day
- 7-day retention: ~7 TB
- With replication factor 3: ~21 TB of Kafka cluster storage
3.3 Raw Data Storage
Cassandra holds the 7-day hot window: ~1 TB/day × 7 ≈ 7 TB (7-day TTL). The S3 archive, at roughly 5:1 Parquet compression, adds ~0.2 TB/day ≈ 73 TB/year.
3.4 Aggregated Data Storage
Assumptions:
- 2 million active ads
- Aggregation windows: 1 minute, 5 minutes, 1 hour
- Each aggregated record: ~100 bytes (ad_id + window + counts)
The 1-minute records dominate: 2M ads × 100 bytes = 200 MB/minute ≈ 288 GB/day ≈ 105 TB/year uncompressed.
With a ClickHouse compression ratio of ~10:1: ~10.5 TB/year.
3.5 Memory for Flink State
HyperLogLog for unique counts: 2M active ads × ~12 KB per counter ≈ 24 GB.
Partial aggregation state (1-minute tumbling window): 2M ads × a few hundred bytes per open window ≈ 1 GB.
Total Flink state: ~25 GB — spread across many TaskManagers, entirely feasible.
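The Section 3 arithmetic as runnable code, so each number in the summary table below can be re-derived (constants mirror the assumptions above):

```java
public class Estimation {
    public static void main(String[] args) {
        long clicksPerDay = 1_000_000_000L;
        long avgQps = clicksPerDay / 86_400;             // 11,574 clicks/sec
        long peakQps = avgQps * 5;                       // 57,870 clicks/sec

        int eventBytes = 1_000;                          // ~1 KB per raw event
        double avgMBps = avgQps * (double) eventBytes / 1e6;   // ~11.6 MB/s
        double peakMBps = peakQps * (double) eventBytes / 1e6; // ~57.9 MB/s

        double rawTBperDay = clicksPerDay * (double) eventBytes / 1e12; // 1 TB/day
        double kafkaTB = rawTBperDay * 7 * 3;            // 7-day retention, RF=3 -> 21 TB

        long ads = 2_000_000;
        long aggBytesPerMinute = ads * 100;              // 200 MB of 1-min records/minute
        double aggTBperYear = aggBytesPerMinute * 60.0 * 24 * 365 / 1e12; // ~105 TB
        double clickhouseTB = aggTBperYear / 10;         // 10:1 compression -> ~10.5 TB

        System.out.printf("QPS avg=%d peak=%d | Kafka %.0f/%.0f MB/s | "
                + "Kafka storage %.0f TB | agg %.1f TB/yr%n",
                avgQps, peakQps, avgMBps, peakMBps, kafkaTB, clickhouseTB);
    }
}
```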
3.6 Summary
| Resource | Value |
|---|---|
| Traffic | 11.5K avg / 58K peak clicks/sec |
| Kafka cluster storage | 21 TB (7-day retention, RF=3) |
| Kafka throughput | 11 MB/s avg / 57 MB/s peak |
| Kafka partitions | 100-200 |
| Raw data (Cassandra) | 7 TB (7-day TTL) |
| Raw data (S3 archive) | 73 TB/year (compressed) |
| Aggregated data (ClickHouse) | 10.5 TB/year (compressed) |
| Flink state memory | ~25 GB (distributed) |
| Flink TaskManagers | 10-20 (depending on CPU/memory specs) |
4. Security & Data Privacy
4.1 Click Fraud Detection
Click fraud is the biggest threat to an advertising system. Undetected, advertisers lose money and the platform loses trust.
| Fraud type | Description | Telltale signs |
|---|---|---|
| Bot clicks | Automated scripts clicking ads continuously | Abnormal click rate from one IP, no mouse movement/scrolling |
| Click farms | Groups of people paid to click | Many clicks from the same geo location, identical patterns |
| Competitor clicking | Rivals click your ads to drain your budget | Clicks from the same IP range, no conversions |
| Ad stacking | Several ads stacked on top of each other; one click counts for all of them | Identical click coordinates across several ads |
| Pixel stuffing | An ad rendered at 1x1 pixel that nobody sees, still counted as an "impression" | Impressions with no corresponding clicks |
Anti-Fraud Strategies
| Strategy | Description | Where it runs |
|---|---|---|
| IP-based rate limiting | Cap clicks per IP per ad per time window | Real-time, at the ingestion layer |
| User-agent fingerprinting | Detect bots by browser fingerprint | Real-time, at the ingestion layer |
| Click-through rate (CTR) anomaly | Abnormally high CTR for one ad → suspicious | Near-real-time, at the aggregation layer |
| Geo anomaly | Clicks from countries outside the target audience | Near-real-time |
| Session analysis | Analyze post-click behavior (bounce rate, time on page) | Batch analysis |
| Machine learning model | Train a model to detect fraud patterns | Batch + real-time inference |
Implementation: fraud detection does not run inside the main aggregation pipeline. Instead:
- Raw events flow through a fraud detection service (in parallel with aggregation)
- The fraud service flags suspicious clicks
- Flagged clicks are excluded from aggregation (or counted separately)
- Advertisers can dispute and review flagged clicks
4.2 User Privacy
| Principle | Description |
|---|---|
| Aggregate only | Reports contain aggregated numbers (click_count) only, NEVER individual user data |
| No PII in reports | user_id and ip never appear in aggregated results |
| PII only for dedup | user_id is used only within the dedup window (5 minutes), then discarded |
| IP hashing | IP addresses are hashed before storage (no raw IPs kept) |
| Data minimization | Collect only the data aggregation needs, nothing more |
| GDPR/CCPA compliance | Users may demand deletion. The 7-day raw-data TTL supports this |
| Consent-based tracking | Clicks are tracked only after user consent (cookie consent banner) |
4.3 Data Retention Policies
| Data type | Retention | Why | Storage |
|---|---|---|---|
| Raw click events | 7 days (hot) + 90 days (archive) | Reconciliation + dispute resolution | Cassandra → S3 |
| Aggregated data (1-min) | 1 year | Dashboards + reporting | ClickHouse |
| Aggregated data (1-hour) | 3 years | Long-term analytics | ClickHouse (compressed) |
| Aggregated data (1-day) | Forever | Historical trends | ClickHouse (cold storage) |
| Dedup keys | 10 minutes | Dedup window | Redis (TTL) |
| Fraud detection logs | 1 year | Audits + disputes | S3 |
5. DevOps & Monitoring
5.1 Key metrics to monitor
Kafka Metrics
| Metric | Description | Alert threshold |
|---|---|---|
| Consumer lag | Messages not yet consumed | > 100K messages → Warning, > 1M → Critical |
| Under-replicated partitions | Partitions short of replicas | > 0 → Warning |
| Broker disk usage | Kafka broker disk usage | > 80% → Warning, > 90% → Critical |
| Producer error rate | Error rate when producing messages | > 0.1% → Warning |
| Request latency (p99) | Produce/consume request latency | > 100ms → Warning |
Flink Metrics
| Metric | Description | Alert threshold |
|---|---|---|
| Checkpoint duration | Time to complete one checkpoint | > 30s → Warning, > 60s → Critical |
| Checkpoint failure rate | Share of failed checkpoints | > 0 within 1 hour → Critical |
| Backpressure | An operator is slow and its buffers are full | isBackPressured = true > 5 minutes → Warning |
| Records per second | Flink job throughput | Drop > 50% vs baseline → Warning |
| State size | Size of operator state | Growth > 2x vs baseline → Warning |
| Late events dropped | Number of late events dropped | > 1% of total events → Warning |
| TaskManager failures | TaskManager crashes | > 0 → Alert |
Data Freshness & Accuracy
| Metric | Description | Alert threshold |
|---|---|---|
| End-to-end latency | Time from the click to the data being queryable in the DB | > 5 minutes → Warning, > 10 minutes → Critical |
| Data freshness | Gap between "now" and the newest record's timestamp in the DB | > 3 minutes → Warning |
| Reconciliation drift | % difference between streaming and batch results | > 0.01% → Warning, > 0.1% → Critical |
| Query latency (p99) | Query service latency | > 500ms → Warning, > 2s → Critical |
Database Metrics
| Metric | Description | Alert threshold |
|---|---|---|
| Write latency (p99) | ClickHouse/Cassandra write latency | > 100ms → Warning |
| Disk usage | Storage usage | > 75% → Warning |
| Compaction lag (Cassandra) | Pending compactions | > 10 → Warning |
| Query queue size (ClickHouse) | Queries waiting | > 50 → Warning |
5.2 Alerting Strategy
| Severity | Examples | Notification | Response time |
|---|---|---|---|
| P1 - Critical | Consumer lag > 1M, checkpoint failures, data freshness > 10 min | PagerDuty (phone call) | < 15 minutes |
| P2 - Warning | Consumer lag > 100K, reconciliation drift > 0.01%, high backpressure | Slack + PagerDuty (SMS) | < 1 hour |
| P3 - Info | Traffic spikes, scheduled maintenance | Slack | Next business day |
5.3 Runbook — Common Issues
| Issue | Common causes | How to respond |
|---|---|---|
| Consumer lag spikes suddenly | Traffic spike (viral ad), slow consumer, GC pauses | Scale Flink parallelism, check backpressure, check GC logs |
| Checkpoint failures | State too large, slow storage, network issues | Raise the checkpoint timeout, check state size, check HDFS/S3 health |
| Reconciliation drift | Bug in the aggregation logic, clock skew, late events | Compare raw events with aggregated results, check the watermark config |
| High query latency | ClickHouse overloaded, missing index, large result sets | Add a materialized view, optimize the query, scale read replicas |
| Kafka broker down | Full disk, hardware failure, network partition | Check disks, fail over to a replica, add a new broker |
5.4 Deployment Strategy
| Strategy | Description | When to use |
|---|---|---|
| Blue-green deployment | Run 2 Flink jobs in parallel (old + new), compare results, switch traffic | Major logic changes |
| Canary deployment | Route 5% of traffic to the new version, watch the metrics | Minor changes |
| Savepoint + restart | Take a savepoint of the Flink job, deploy the new version, restore from the savepoint | Config changes, minor updates |
| A/B testing | 2 pipelines process the same data, compare results | Testing new aggregation logic |
6. Mermaid Diagrams — Overview
6.1 High-Level Architecture (Complete)
flowchart TB subgraph "Clients" BROWSER["User Browsers"] APP["Mobile Apps"] end subgraph "Ingestion Layer" LB["Load Balancer<br/>(L7)"] IS["Ingestion Service Cluster<br/>(Stateless, Auto-scale)"] end subgraph "Message Queue Layer" KR["Kafka — Raw Events Topic<br/>(100+ partitions, RF=3)"] end subgraph "Processing Layer" FRAUD["Fraud Detection<br/>Service (Async)"] FLINK["Apache Flink Cluster<br/>Map → Aggregate → Reduce"] end subgraph "Storage Layer" RAW_CASS[("Cassandra<br/>Raw Events<br/>TTL 7 days")] RAW_S3[("S3 Archive<br/>Raw Events<br/>Parquet")] AGG_CH[("ClickHouse<br/>Aggregated Data")] end subgraph "Serving Layer" QS["Query Service"] REDIS[("Redis Cache")] end subgraph "Reconciliation" SPARK["Spark Batch Job<br/>(Hourly)"] end subgraph "Consumers" DASH["Advertiser Dashboard"] BILLING["Billing Service"] ANALYTICS["Analytics Platform"] end BROWSER & APP --> LB LB --> IS IS --> KR KR --> FRAUD KR --> FLINK KR --> RAW_CASS RAW_CASS -.->|"Archive"| RAW_S3 FLINK --> AGG_CH AGG_CH --> QS QS --> REDIS QS --> DASH & BILLING & ANALYTICS RAW_S3 -.-> SPARK AGG_CH -.-> SPARK style KR fill:#FF9800,stroke:#333,stroke-width:2px,color:#fff style FLINK fill:#4CAF50,stroke:#333,stroke-width:2px,color:#fff style AGG_CH fill:#2196F3,stroke:#333,stroke-width:2px,color:#fff style FRAUD fill:#f44336,stroke:#333,stroke-width:2px,color:#fff style SPARK fill:#9C27B0,stroke:#333,stroke-width:2px,color:#fff
6.2 Exactly-Once End-to-End Flow (Detailed)
sequenceDiagram participant Kafka as Kafka (Source) participant Flink as Flink Operator participant State as Flink State (RocksDB) participant CM as Checkpoint Manager participant Storage as Durable Storage (HDFS) participant Sink as Kafka (Sink) + DB Note over Kafka,Sink: Normal Processing Kafka->>Flink: Event batch (offset 100-200) Flink->>State: Update aggregation state Flink->>Sink: Emit partial result (uncommitted) Note over CM,Storage: Checkpoint Triggered (every 30s) CM->>Flink: Inject checkpoint barrier Flink->>State: Snapshot current state State->>Storage: Persist state snapshot Flink->>CM: Report: offset=200, state=snapshot_42 CM->>Kafka: Commit consumer offset=200 CM->>Sink: Commit Kafka transaction (flush writes to DB) Note over Kafka,Sink: Failure & Recovery Flink->>Flink: TaskManager CRASH! CM->>Storage: Load last checkpoint (snapshot_42, offset=200) CM->>Kafka: Reset consumer to offset=200 CM->>Flink: Restore state from snapshot_42 Kafka->>Flink: Replay events from offset 200 Flink->>State: Re-process (same state → same result) Flink->>Sink: Re-emit results (idempotent write → no duplicate)
6.3 Lambda vs Kappa Architecture Comparison
flowchart TB subgraph "Lambda Architecture" direction TB L_SRC["Events"] --> L_BATCH["Batch Layer<br/>(Spark, chay moi gio)"] L_SRC --> L_SPEED["Speed Layer<br/>(Flink, real-time)"] L_BATCH --> L_BV["Batch View<br/>(accurate, stale)"] L_SPEED --> L_RV["Real-time View<br/>(fast, approximate)"] L_BV --> L_MERGE["Merge"] L_RV --> L_MERGE L_MERGE --> L_QS["Query"] style L_BATCH fill:#4CAF50,color:#fff style L_SPEED fill:#FF9800,color:#fff style L_MERGE fill:#f44336,color:#fff end subgraph "Kappa Architecture (Preferred)" direction TB K_SRC["Events"] --> K_KAFKA["Kafka<br/>(Immutable Log)"] K_KAFKA --> K_STREAM["Stream Layer<br/>(Flink, real-time<br/>exactly-once)"] K_STREAM --> K_DB["Serving DB"] K_DB --> K_QS["Query"] K_KAFKA -.->|"Replay<br/>if needed"| K_STREAM K_DB -.-> K_RECON["Reconciliation<br/>(Safety net)"] style K_KAFKA fill:#FF9800,color:#fff style K_STREAM fill:#4CAF50,color:#fff style K_RECON fill:#9E9E9E,color:#fff end
6.4 Hot Shard Solution — Salted Keys
flowchart TB subgraph "Problem: Hot Shard" direction LR HOT_AD["ad_001 (viral)<br/>100K clicks/sec"] --> P1_HOT["Partition 0<br/>OVERLOADED"] NORMAL1["ad_002<br/>10 clicks/sec"] --> P2["Partition 1<br/>idle"] NORMAL2["ad_003<br/>15 clicks/sec"] --> P3["Partition 2<br/>idle"] style P1_HOT fill:#f44336,color:#fff end subgraph "Solution: Salted Keys + Secondary Aggregation" direction TB SALT["ad_001 → ad_001_0, ad_001_1, ..., ad_001_9"] subgraph "First Aggregation (Parallel)" S0["ad_001_0<br/>count=10K"] S1["ad_001_1<br/>count=10K"] S9["ad_001_9<br/>count=10K"] end subgraph "Secondary Aggregation (Merge)" REDUCE["Reduce Node<br/>10K x 10 = 100K"] end SALT --> S0 & S1 & S9 S0 & S1 & S9 --> REDUCE style SALT fill:#FF9800,color:#fff style REDUCE fill:#4CAF50,color:#fff end
7. Aha Moments & Pitfalls
7.1 Aha Moments — What Hieu must remember
| # | Insight | Explanation |
|---|---|---|
| 1 | Kappa > Lambda in most cases | Lambda existed because streaming frameworks weren't mature. With Flink's exactly-once, Kappa is simpler and accurate enough. Just add a reconciliation batch job as a safety net — that isn't Lambda |
| 2 | Watermarks are the hardest concept | They embody the latency-vs-completeness trade-off. There is no "correct" value — only one that fits the business. The only way to really understand watermarks: practice with Flink |
| 3 | A hot shard can kill performance | 1 viral ad → 1 overloaded partition → the whole pipeline slows down (backpressure). Salted keys solve it, at the cost of complexity. Monitor click rate per ad_id to detect it early |
| 4 | Reconciliation is a mandatory safety net | Even with streaming exactly-once, a batch job must verify. Logic bugs, clock skew, edge cases — all can cause small discrepancies. Reconciliation catches them before the advertiser does |
| 5 | Exactly-once does not mean "processed once" | It means "the result is as if processed once". Events can be replayed (after a checkpoint restore), but the result doesn't change thanks to idempotent writes. Misreading this concept → wrong designs |
| 6 | Cascade aggregation saves compute | Aggregate the 1-min window, then build 5-min from 5x 1-min and 1-hour from 12x 5-min. No need for 3 separate pipelines — 1 pipeline + cascade |
| 7 | HyperLogLog is the hidden hero | Counts unique users in 12 KB of memory per counter at < 1% error. Without HLL, unique counting at this scale is near-impossible with reasonable memory |
| 8 | Raw data is insurance | Always keep raw events (even though it's expensive). When logic breaks, when you must re-process, when an advertiser disputes — raw data is the only source of truth |
7.2 Pitfalls — Common mistakes
| # | Pitfall | Consequence | How to avoid |
|---|---|---|---|
| 1 | Using processing time instead of event time | A click at 13:59 is counted in the 14:00 window | Always use event time for aggregation; processing time only for system monitoring |
| 2 | Not handling late events | Late clicks are lost → under-count → advertiser complaints | Configure allowed lateness, use a side output for late events |
| 3 | No reconciliation | A streaming bug corrupts results and nobody notices | Run batch reconciliation hourly, alert when drift exceeds the threshold |
| 4 | Partitioning by a random key | Every aggregation node needs every ad's data → excessive shuffling | Partition by ad_id for data locality; salt only when handling hot keys |
| 5 | Checkpoint interval too long | On failure, many events must be replayed → slow recovery, large duplicate window | Checkpoint every 30s-1 minute. Balance overhead against recovery time |
| 6 | Not monitoring consumer lag | The pipeline falls behind unnoticed → stale data → advertisers see old numbers | Consumer lag is metric #1. Alert as soon as it grows abnormally |
| 7 | Fraud detection as a single point of failure | Fraud service down → all clicks accepted → money lost | Run fraud detection async, never blocking the aggregation pipeline. Fallback: flag and review later |
| 8 | Exact counting for unique counts | O(n) memory per ad per window → memory explosion with 2M ads | Use HyperLogLog. Reserve exact counting for billing-critical use cases |
| 9 | Not keeping raw data | No re-processing when logic breaks, no reconciliation | Always keep raw events — Cassandra (7 days) + S3 (archive) |
| 10 | Scaling vertically instead of horizontally | Hardware limits, single point of failure | Kafka partitions + Flink parallelism for horizontal scaling. Vertical only for DB tuning |
7.3 Interview Tips
| Tip | Explanation |
|---|---|
| Start with requirements | Ask clearly: clicks/day, latency target, accuracy requirement. Don't jump straight to a solution |
| Talk about trade-offs | Lambda vs Kappa, HLL vs exact counting, Cassandra vs ClickHouse — the interviewer wants to hear you weigh options, not just pick one |
| Diagram first, details later | High-level flow: Ingestion → Queue → Aggregation → DB → Query. Then dive into each component |
| Mention exactly-once early | It is the hardest requirement here. Explaining Flink checkpoints + idempotent writes leaves a strong impression |
| Hot shards are a bonus point | Many candidates never think of it. Raising it yourself with the salted-keys solution shows real production experience |
| Reconciliation shows you are senior | A junior engineer thinks streaming is enough. A senior knows a safety net is needed. Mentioning the reconciliation batch job proves production experience |
8. Summary Table
| Aspect | Decision | Why |
|---|---|---|
| Architecture | Kappa (streaming only + reconciliation) | Simpler than Lambda; Flink exactly-once is mature |
| Message queue | Apache Kafka | Durable, partitioned, replayable |
| Streaming engine | Apache Flink | Exactly-once, stateful processing, checkpoints/savepoints |
| Raw data store | Cassandra (7 days) + S3 (archive) | Write-heavy, time-series, cost-effective |
| Aggregated data store | ClickHouse | OLAP-optimized, fast aggregation queries |
| Partitioning strategy | By ad_id (salted for hot keys) | Data locality for aggregation |
| Time window | Tumbling 1-min, cascaded to 5-min and 1-hour | Simple, no overlap, efficient |
| Unique counting | HyperLogLog | Memory-efficient, mergeable, ~1% error is OK |
| Deduplication | Bloom filter (5-min window) | Memory-efficient, low false-positive rate |
| Late events | Watermark + 5-min allowed lateness + side output | Balances latency and completeness |
| Reconciliation | Hourly Spark batch job | Safety net, drift detection |
| Fraud detection | Async, parallel pipeline | Never blocks aggregation |
9. Internal Links & Further Reading
Prerequisites (already covered)
- Tuan-01-Scale-From-Zero-To-Millions — Scalability fundamentals
- Tuan-02-Back-of-the-envelope — Estimation skills
- Tuan-08-Message-Queue — Kafka fundamentals, consumer groups, partitions
Related (cross-reference)
- Tuan-13-Monitoring-Observability — A similar pipeline for monitoring (metrics ingestion)
- Case-Design-Metrics-Monitoring-Alerting — A similar problem, but for infrastructure metrics
- Tuan-07-Database-Sharding-Replication — Cassandra partitioning, ClickHouse replication
Concepts to explore further
- Apache Flink — Stateful stream processing, checkpointing, exactly-once semantics
- ClickHouse — Column-oriented OLAP database, materialized views
- HyperLogLog — Probabilistic cardinality estimation
- Kafka Streams vs Flink — when to use which
- Event sourcing — store all events, derive state from them (Kafka as the event log)
"A good system is not one that never fails — it is one that knows when it is wrong and can correct itself. Reconciliation is exactly that self-correction mechanism."