Bonus 1 — Time-Series Databases: TimescaleDB, InfluxDB v3, Prometheus, VictoriaMetrics

“Time-series data tăng theo cấp số nhân và truy vấn theo time window. Postgres làm được, nhưng tools chuyên biệt nhanh hơn 10-100x. Khi nào cần switch + cách chọn đúng.”

Tags: database time-series timescaledb influxdb prometheus observability Thời lượng: 5-6 ngày Prerequisites: Tuan-01-DB-Internals-Refresh · Tuan-14-OLAP-Columnar-ClickHouse Liên quan: Tuan-09-DB-Observability-Tuning · Case-Design-Data-Realtime-Analytics

1. Context — TS workloads

1.1 Characteristics

Time-series data:

Insert-heavy — write rate >> read rate (1000:1 typical)
Append-only — rare update/delete
Sequential timestamp — natural ordering
Aggregation queries — sum/avg/percentile over windows
Retention policy — drop old data, automated
Downsampling — keep summaries of old data
High cardinality issues — series count explodes with labels

1.2 Use cases

IoT sensors (temperature, humidity, GPS)
Observability metrics (CPU, memory, latency)
Financial ticks (price, volume per ms)
Application events aggregated
Real-time dashboards
Anomaly detection input
Forecast models input

1.3 Why specialized vs Postgres

Postgres time-series can work:

BRIN index on time column
Partition by month
TimescaleDB extension

But specialized TS DBs offer:

5-10x better compression
Sub-second aggregations on billions of points
Built-in downsampling
Cardinality optimizations

Pattern: start Postgres for small data (<100M points), move when scale demands.

2. TimescaleDB — Postgres Extension

2.1 Hypertable basics

CREATE EXTENSION timescaledb;
 
CREATE TABLE sensor_data (
    time timestamptz NOT NULL,
    sensor_id int NOT NULL,
    value double precision,
    metadata jsonb
);
 
-- Convert to hypertable
SELECT create_hypertable('sensor_data', 'time',
    chunk_time_interval => interval '1 day');

Hypertable behind scenes:

Postgres partitioned table by time
Each chunk = child table (1 day default)
Inserts auto-routed to right chunk

-- Query (looks normal)
SELECT * FROM sensor_data
WHERE sensor_id = 42 AND time > now() - interval '1 hour';
 
-- See chunks
SELECT * FROM timescaledb_information.chunks WHERE hypertable_name = 'sensor_data';

2.2 Continuous aggregates

Auto-materialized aggregates with incremental refresh.

CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    sensor_id,
    avg(value) AS avg,
    max(value) AS max,
    min(value) AS min,
    count(*) AS samples,
    percentile_cont(0.95) WITHIN GROUP (ORDER BY value) AS p95
FROM sensor_data
GROUP BY hour, sensor_id;
 
-- Schedule incremental refresh
SELECT add_continuous_aggregate_policy('sensor_hourly',
    start_offset => INTERVAL '2 hours',
    end_offset => INTERVAL '10 minutes',
    schedule_interval => INTERVAL '10 minutes');

Real-time aggregation: query hits BOTH materialized aggregate AND raw recent data → consistent up to current time.

SELECT * FROM sensor_hourly
WHERE sensor_id = 42 AND hour > now() - interval '24 hours';
-- Returns: pre-aggregated + real-time blended

2.3 Retention policy

SELECT add_retention_policy('sensor_data', INTERVAL '30 days');
-- Background job drops chunks >30 days old (drop, not delete = fast)

2.4 Compression

ALTER TABLE sensor_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby = 'time DESC'
);
 
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');

Chunks > 7 days → compressed (~95% reduction). Still queryable but read-mostly.

Compression algorithm: delta encoding (timestamps) + Gorilla (floats) + dictionary (low-card) → similar to ClickHouse for time-series.

2.5 TimescaleDB v2 changes

Notable changes 2022-2024:

Distributed Hypertables deprecated (was multi-node feature) — most users don’t need; complexity not worth
Focus on single-node scale + read replicas
Better continuous aggregate (refresh policies, hierarchical aggregates)
Hyperfunctions (statistical functions)

2.6 Hyperfunctions

-- Toolkit
CREATE EXTENSION timescaledb_toolkit;
 
-- Time-weighted average
SELECT sensor_id, time_weight('LOCF', time, value) AS tw_avg
FROM sensor_data WHERE time > now() - interval '1 hour' GROUP BY sensor_id;
 
-- Approximate percentile (HDR)
SELECT sensor_id, approx_percentile(0.95, percentile_agg(value))
FROM sensor_data GROUP BY sensor_id;
 
-- Counter reset detection
SELECT counter_agg(time, value) FROM counter_data;

2.7 License (TSL)

TimescaleDB has Timescale License (TSL) — not OSS by OSI. Some features Apache-licensed, advanced features (compression, CAGG policies) TSL.

For pure OSS: use base Postgres + manual partitioning (Bonus Postgres Partitioning).

2.8 Pros vs cons

Pros:

Postgres SQL/tools/ORMs
ACID transactions
Join with regular tables (lookup dims)
Continuous aggregate elegance
Compression decent

Cons:

Performance ceiling lower than dedicated TS at massive scale (billion+ points/day)
TSL license (not pure OSS)
Operational complexity (extension management)
No multi-node (since v2 removed)

3. InfluxDB

3.1 v2 vs v3 (2024-2026)

v1: TSM engine, InfluxQL
v2: TSM/TSI, Flux, OSS + cloud
v3 (IOx): GA 2024 — Rust + Apache Arrow + DataFusion + Parquet storage. Completely different.

InfluxDB v3 changes the game:

Parquet-based storage (open format)
SQL support (in addition to InfluxQL)
Lower memory, better cardinality
Cloud-first, self-host v3 also available 2024

3.2 Data model

Line protocol:

sensor_data,location=room1,sensor=temp value=22.5,humidity=65.2 1700000000000000000

Measurement: sensor_data (table)
Tags: indexed, low-cardinality (location, sensor)
Fields: not indexed, values (value, humidity)
Timestamp: nanoseconds

Critical: tags indexed → cardinality matters. Each unique tag combo = unique series.

3.3 Query languages

InfluxQL (v1, v2):

SELECT mean(value) FROM sensor_data WHERE time > now() - 1h GROUP BY time(5m), sensor

Flux (v2):

from(bucket: "sensors")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "sensor_data")
  |> aggregateWindow(every: 5m, fn: mean)

SQL (v3):

SELECT mean(value), date_bin('5 minutes', time) AS bucket
FROM sensor_data
WHERE time > now() - INTERVAL '1 hour'
GROUP BY bucket, sensor;

3.4 Write path (v3)

graph LR
    Client --> Write[Write API]
    Write --> Buffer[In-memory buffer]
    Buffer -.flush.-> Parquet[(Parquet files)]
    Parquet --> Compaction[Background compaction]
    Compaction --> Parquet2[(Compacted Parquet)]

Writes to in-memory buffer
Flush to Parquet
Background compaction merges files
Storage cheap (S3-compatible)

3.5 When InfluxDB

High write throughput (millions points/sec)
Tag-based filtering predominant
Don’t need joins with relational
Ecosystem (Telegraf collector, Kapacitor)
v3: when you want SQL + Parquet open format

4. Prometheus

4.1 Architecture

graph LR
    Target1[App / exporter] -.scrape.-> Prom[Prometheus]
    Target2[Node exporter] -.scrape.-> Prom
    SD[Service Discovery<br/>Kubernetes, Consul] --> Prom
    Prom --> TSDB[(Local TSDB)]
    Prom --> Remote[(Remote storage<br/>Cortex, Mimir, Thanos)]
    Prom --> Grafana
    Prom --> AM[Alertmanager]

Pull model — Prometheus scrapes targets at intervals (15s default).

Why pull:

Service discovery — Prometheus knows what’s healthy
Centralized scraping config
Caveat: doesn’t fit serverless/cron well → use Pushgateway

4.2 Local TSDB

Memory: 2h "head" block (recent writes)
Disk: 2h blocks → compacted into bigger blocks

Storage format:

Chunks of compressed samples per series
Gorilla compression (Facebook 2015) — ~1.37 bytes/sample average for typical metrics
Mmapped chunk files

Retention default 15 days. Configurable up to TBs.

4.3 PromQL

# Instant value
http_requests_total
 
# Rate (per-sec) over 5min
rate(http_requests_total[5m])
 
# Aggregate
sum by (status_code) (rate(http_requests_total[5m]))
avg without (instance) (cpu_usage)
 
# Histogram percentile
histogram_quantile(0.99, sum(rate(http_duration_bucket[5m])) by (le))
 
# Subquery
max_over_time(rate(http_requests_total[5m])[1h:1m])
 
# Difference
deriv(disk_usage[5m])
delta(counter_total[5m])
 
# Predict linear
predict_linear(disk_usage[1h], 24*3600)  # 24h ahead

4.4 Recording rules + Alerts

# rules.yml
groups:
- name: http
  rules:
  - record: http:request_rate_5m
    expr: sum by (service) (rate(http_requests_total[5m]))
 
- name: alerts
  rules:
  - alert: HighErrorRate
    expr: sum(rate(http_errors_total[5m])) by (service) > 0.05
    for: 5m
    annotations:
      summary: "{{ $labels.service }} error rate >5%"

Recording rules precompute hot expressions. Alerts evaluated periodically.

4.5 Long-term storage

Local 15 days. For longer:

	Thanos	Cortex / Mimir	VictoriaMetrics
Storage	Object (S3)	Object	Local + Object
Dedup	Yes	Yes	Yes
Multi-tenant	Limited	Yes	Yes
Query	Across clusters	Across	Single
Maintenance	Medium	High	Lower
Performance	Good	Good	Best (claims)

VictoriaMetrics rising 2024 — drop-in Prometheus replacement, faster, less memory, simpler ops.

4.6 Cardinality explosion

Each unique label combination = unique series.

http_requests_total{method="GET", status="200", user_id="42"}

Adding user_id as label → millions of series → OOM.

Rule: keep labels low-cardinality (<10K distinct). Move high-cardinality data to logs or traces.

Diagnose:

prometheus_tsdb_symbol_table_size_bytes
count by (__name__) ({__name__=~".+"})

4.7 Histograms vs Summaries

Histogram:

_bucket{le="..."}, _sum, _count
Aggregate across instances
Percentile via histogram_quantile

Summary:

Pre-computed percentile per instance
Can’t aggregate (mean of medians ≠ median)

→ Always Histogram for distributed systems.

PG14+ Postgres exporter ships Histogram-style.

5. VictoriaMetrics — Rising star 2024

5.1 Why notable

Drop-in Prometheus + PromQL compatible
7-10x less memory than Prometheus
MetricsQL (extended PromQL)
Single-binary or cluster mode
OSS Apache 2.0

5.2 Architecture

graph LR
    Scraper[vmagent<br/>scrape + push] --> Insert[vminsert]
    Insert --> Storage[(vmstorage)]
    Storage --> Select[vmselect]
    Select --> Grafana
    AM[vmalert] --> Storage

Each component scales independently.

5.3 Use case

If you’re on Prometheus and feeling OOM/perf pain, migrate to VictoriaMetrics. Same config, less pain.

Major adopters: Adidas, Wix, Razorpay.

6. Comparison

	TimescaleDB	InfluxDB v3	Prometheus	VictoriaMetrics	ClickHouse
Type	Postgres ext	Dedicated TS	Metrics-focused	Prom-compat	OLAP general
Write throughput	100K-1M/s	1M+/s	1M+/s (with remote)	1M+/s	1M+/s
Query language	SQL	SQL + Flux + InfluxQL	PromQL	PromQL + MetricsQL	SQL
Joins	Yes	Limited	No	No	Limited
Retention	Built-in	Built-in	Built-in	Built-in	Manual
Compression	Good	Excellent	Excellent (Gorilla)	Excellent	Excellent
Cardinality	Medium tolerance	Low-Med (v3 better)	Strict	Better tolerance	Best
Use case	Hybrid TS + OLTP	Pure TS at scale	Monitoring	Better Prometheus	TS as part of analytics
OSS	TSL (mixed)	MIT	Apache 2.0	Apache 2.0	Apache 2.0

7. Decision Matrix

flowchart TD
    A[TS workload] --> B{Use case?}

    B -->|App/infra metrics| C[Prometheus or VictoriaMetrics]
    C --> C1{Need long retention/cluster?}
    C1 -->|Yes| C2[VictoriaMetrics<br/>or Prom + Thanos/Mimir]
    C1 -->|No, single instance| C3[Prometheus]

    B -->|Sensor / business TS| D{Postgres already in stack?}
    D -->|Yes, <1B points| D1[TimescaleDB]
    D -->|No or very large| D2[InfluxDB v3]

    B -->|Analytics over TS<br/>+ other dims| E[ClickHouse]

    B -->|Financial ticks ultra-low latency| F[kdb+ commercial<br/>or QuestDB OSS]

    style C3 fill:#c8e6c9
    style C2 fill:#c8e6c9
    style D1 fill:#c8e6c9
    style E fill:#c8e6c9

8. Downsampling Strategy

Pattern: keep recent data fine-grained, downsample old to coarse.

Last 24h: 1-sec resolution → 86400 points/series
Last 7 days: 1-min resolution → 10080 points/series
Last 30 days: 5-min resolution → 8640 points/series
Last year: 1-hour resolution → 8760 points/series

Storage saved exponentially.

8.1 TimescaleDB

Continuous aggregate at each level + retention policy:

-- 1-min aggregate retain 7 days
SELECT add_continuous_aggregate_policy('sensor_1min',
    start_offset => INTERVAL '1 day',
    end_offset => INTERVAL '1 min',
    schedule_interval => INTERVAL '1 min');
SELECT add_retention_policy('sensor_1min', INTERVAL '7 days');
 
-- 5-min retain 30 days
-- 1-hour retain 1 year

8.2 Prometheus / VictoriaMetrics

Recording rules + remote write tier:

- record: 'job:http_rate:5m'
  expr: sum by (job) (rate(http_requests_total[5m]))

Aggregate stored as new series, can have different retention via storage tier.

8.3 InfluxDB v3

Continuous query + retention policy.

8.4 ClickHouse

Materialized view rollup (Tuần 14).

9. Operations & Tuning

9.1 Prometheus tuning

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: prod-us
 
storage:
  tsdb:
    retention.time: 15d
    retention.size: 200GB
    wal-compression: true

9.2 Cardinality monitoring

# Top metrics by series count
topk(10, count by (__name__) ({__name__=~".+"}))
 
# Memory usage
process_resident_memory_bytes{job="prometheus"}

Alert if cardinality > 1M series.

9.3 Backup

TimescaleDB: pgBackRest (it’s Postgres after all)
InfluxDB v3: Parquet on S3, just snapshot
Prometheus: snapshot endpoint then copy
VictoriaMetrics: same

9.4 HA

Prometheus: 2 instances scraping same targets + dedup downstream
TimescaleDB: streaming replication
VictoriaMetrics: built-in HA in cluster mode

10. Common Patterns

10.1 Alerting

# Disk filling up - predict in 4h
predict_linear(node_filesystem_avail_bytes[1h], 4*3600) < 0

10.2 SLO tracking

# Availability SLO
(1 - (sum(rate(errors_total[30d])) / sum(rate(requests_total[30d])))) * 100

10.3 Heartbeat

# Service hasn't reported in 5 min
absent_over_time(up{service="critical"}[5m])

10.4 Histogram-based latency

# P99 latency by endpoint
histogram_quantile(0.99,
    sum by (endpoint, le) (rate(http_duration_bucket[5m]))
)

11. Anti-patterns

Pattern	Why bad	Fix
High-cardinality labels (user_id, trace_id)	Series explosion → OOM	Use logs/traces for these
Scrape interval too short (1s)	Storage explosion	15-60s typical
No retention policy	Disk fills	Set retention
Using Summary instead of Histogram	Can’t aggregate percentiles	Histogram
Prometheus as long-term store	Limit 15-30d sensible	Thanos/Mimir/VM
Using TS DB for join-heavy workload	Joins limited	Use Postgres or warehouse
Mixing TS + relational in same DB at scale	Resource contention	Separate
TimescaleDB without compression	Storage bloats	Compression policy
Querying raw data when aggregates exist	Slow	Use continuous aggs
Aggregating across high-cardinality	Slow even with downsampling	Pre-aggregate

12. Lab

Day 1: TimescaleDB

Setup TimescaleDB + ingest 100M sensor readings. Create hypertable + continuous aggregate. Compare vs plain Postgres partitioned.

Day 2: Prometheus + Grafana

Scrape node-exporter + postgres-exporter. Build dashboard. Write 5 alerts.

Day 3: VictoriaMetrics migration

Replace Prometheus with VictoriaMetrics. Run same dashboard, compare memory + speed.

Day 4: InfluxDB v3

Ingest IoT-style data. SQL queries. Compare with TimescaleDB on same data.

Day 5: Downsampling

Build 3-tier downsampling in TimescaleDB. Verify storage saved.

Day 6: Cardinality experiment

Push high-cardinality metric to Prometheus. Watch memory. Refactor with label_replace or aggregation.

Day 7: Real-world

Build full observability stack: app instrumented with histograms, Prometheus, alerts, Grafana dashboards.

13. Self-check

TS workload characteristics — 5 đặc điểm?
Hypertable trong TimescaleDB là gì?
Continuous aggregate — khác Postgres MV thế nào?
Gorilla compression đặc trưng cho gì?
Pull vs Push model — Prometheus chọn pull tại sao?
Cardinality explosion — nguyên nhân + fix?
Histogram vs Summary — pick cái nào và tại sao?
VictoriaMetrics vs Prometheus — khác biệt chính?
TimescaleDB compression: trade-off?
Decision: business sensor data → pick DB nào?

14. Tiếp theo

Tuan-Bonus-Graph-DB-Neo4j

Cập nhật: 2026-05-16

lthieu's notes

Explorer

Tuan-Bonus-Time-Series-DB