Bonus 1 — Time-Series Databases: TimescaleDB, InfluxDB v3, Prometheus, VictoriaMetrics
“Time-series data tăng theo cấp số nhân và truy vấn theo time window. Postgres làm được, nhưng tools chuyên biệt nhanh hơn 10-100x. Khi nào cần switch + cách chọn đúng.”
Tags: database time-series timescaledb influxdb prometheus observability Thời lượng: 5-6 ngày Prerequisites: Tuan-01-DB-Internals-Refresh · Tuan-14-OLAP-Columnar-ClickHouse Liên quan: Tuan-09-DB-Observability-Tuning · Case-Design-Data-Realtime-Analytics
1. Context — TS workloads
1.1 Characteristics
Time-series data:
- Insert-heavy — write rate >> read rate (1000:1 typical)
- Append-only — rare update/delete
- Sequential timestamp — natural ordering
- Aggregation queries — sum/avg/percentile over windows
- Retention policy — drop old data, automated
- Downsampling — keep summaries of old data
- High cardinality issues — series count explodes with labels
1.2 Use cases
- IoT sensors (temperature, humidity, GPS)
- Observability metrics (CPU, memory, latency)
- Financial ticks (price, volume per ms)
- Application events aggregated
- Real-time dashboards
- Anomaly detection input
- Forecast models input
1.3 Why specialized vs Postgres
Postgres time-series can work:
- BRIN index on time column
- Partition by month
- TimescaleDB extension
But specialized TS DBs offer:
- 5-10x better compression
- Sub-second aggregations on billions of points
- Built-in downsampling
- Cardinality optimizations
Pattern: start Postgres for small data (<100M points), move when scale demands.
2. TimescaleDB — Postgres Extension
2.1 Hypertable basics
CREATE EXTENSION timescaledb;
CREATE TABLE sensor_data (
time timestamptz NOT NULL,
sensor_id int NOT NULL,
value double precision,
metadata jsonb
);
-- Convert to hypertable
SELECT create_hypertable('sensor_data', 'time',
chunk_time_interval => interval '1 day');Hypertable behind scenes:
- Postgres partitioned table by
time - Each chunk = child table (1 day default)
- Inserts auto-routed to right chunk
-- Query (looks normal)
SELECT * FROM sensor_data
WHERE sensor_id = 42 AND time > now() - interval '1 hour';
-- See chunks
SELECT * FROM timescaledb_information.chunks WHERE hypertable_name = 'sensor_data';2.2 Continuous aggregates
Auto-materialized aggregates with incremental refresh.
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS hour,
sensor_id,
avg(value) AS avg,
max(value) AS max,
min(value) AS min,
count(*) AS samples,
percentile_cont(0.95) WITHIN GROUP (ORDER BY value) AS p95
FROM sensor_data
GROUP BY hour, sensor_id;
-- Schedule incremental refresh
SELECT add_continuous_aggregate_policy('sensor_hourly',
start_offset => INTERVAL '2 hours',
end_offset => INTERVAL '10 minutes',
schedule_interval => INTERVAL '10 minutes');Real-time aggregation: query hits BOTH materialized aggregate AND raw recent data → consistent up to current time.
SELECT * FROM sensor_hourly
WHERE sensor_id = 42 AND hour > now() - interval '24 hours';
-- Returns: pre-aggregated + real-time blended2.3 Retention policy
SELECT add_retention_policy('sensor_data', INTERVAL '30 days');
-- Background job drops chunks >30 days old (drop, not delete = fast)2.4 Compression
ALTER TABLE sensor_data SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'sensor_id',
timescaledb.compress_orderby = 'time DESC'
);
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');Chunks > 7 days → compressed (~95% reduction). Still queryable but read-mostly.
Compression algorithm: delta encoding (timestamps) + Gorilla (floats) + dictionary (low-card) → similar to ClickHouse for time-series.
2.5 TimescaleDB v2 changes
Notable changes 2022-2024:
- Distributed Hypertables deprecated (was multi-node feature) — most users don’t need; complexity not worth
- Focus on single-node scale + read replicas
- Better continuous aggregate (refresh policies, hierarchical aggregates)
- Hyperfunctions (statistical functions)
2.6 Hyperfunctions
-- Toolkit
CREATE EXTENSION timescaledb_toolkit;
-- Time-weighted average
SELECT sensor_id, time_weight('LOCF', time, value) AS tw_avg
FROM sensor_data WHERE time > now() - interval '1 hour' GROUP BY sensor_id;
-- Approximate percentile (HDR)
SELECT sensor_id, approx_percentile(0.95, percentile_agg(value))
FROM sensor_data GROUP BY sensor_id;
-- Counter reset detection
SELECT counter_agg(time, value) FROM counter_data;2.7 License (TSL)
TimescaleDB has Timescale License (TSL) — not OSS by OSI. Some features Apache-licensed, advanced features (compression, CAGG policies) TSL.
For pure OSS: use base Postgres + manual partitioning (Bonus Postgres Partitioning).
2.8 Pros vs cons
Pros:
- Postgres SQL/tools/ORMs
- ACID transactions
- Join with regular tables (lookup dims)
- Continuous aggregate elegance
- Compression decent
Cons:
- Performance ceiling lower than dedicated TS at massive scale (billion+ points/day)
- TSL license (not pure OSS)
- Operational complexity (extension management)
- No multi-node (since v2 removed)
3. InfluxDB
3.1 v2 vs v3 (2024-2026)
- v1: TSM engine, InfluxQL
- v2: TSM/TSI, Flux, OSS + cloud
- v3 (IOx): GA 2024 — Rust + Apache Arrow + DataFusion + Parquet storage. Completely different.
InfluxDB v3 changes the game:
- Parquet-based storage (open format)
- SQL support (in addition to InfluxQL)
- Lower memory, better cardinality
- Cloud-first, self-host v3 also available 2024
3.2 Data model
Line protocol:
sensor_data,location=room1,sensor=temp value=22.5,humidity=65.2 1700000000000000000
- Measurement:
sensor_data(table) - Tags: indexed, low-cardinality (
location,sensor) - Fields: not indexed, values (
value,humidity) - Timestamp: nanoseconds
Critical: tags indexed → cardinality matters. Each unique tag combo = unique series.
3.3 Query languages
InfluxQL (v1, v2):
SELECT mean(value) FROM sensor_data WHERE time > now() - 1h GROUP BY time(5m), sensorFlux (v2):
from(bucket: "sensors")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "sensor_data")
|> aggregateWindow(every: 5m, fn: mean)SQL (v3):
SELECT mean(value), date_bin('5 minutes', time) AS bucket
FROM sensor_data
WHERE time > now() - INTERVAL '1 hour'
GROUP BY bucket, sensor;3.4 Write path (v3)
graph LR Client --> Write[Write API] Write --> Buffer[In-memory buffer] Buffer -.flush.-> Parquet[(Parquet files)] Parquet --> Compaction[Background compaction] Compaction --> Parquet2[(Compacted Parquet)]
- Writes to in-memory buffer
- Flush to Parquet
- Background compaction merges files
- Storage cheap (S3-compatible)
3.5 When InfluxDB
- High write throughput (millions points/sec)
- Tag-based filtering predominant
- Don’t need joins with relational
- Ecosystem (Telegraf collector, Kapacitor)
- v3: when you want SQL + Parquet open format
4. Prometheus
4.1 Architecture
graph LR Target1[App / exporter] -.scrape.-> Prom[Prometheus] Target2[Node exporter] -.scrape.-> Prom SD[Service Discovery<br/>Kubernetes, Consul] --> Prom Prom --> TSDB[(Local TSDB)] Prom --> Remote[(Remote storage<br/>Cortex, Mimir, Thanos)] Prom --> Grafana Prom --> AM[Alertmanager]
Pull model — Prometheus scrapes targets at intervals (15s default).
Why pull:
- Service discovery — Prometheus knows what’s healthy
- Centralized scraping config
- Caveat: doesn’t fit serverless/cron well → use Pushgateway
4.2 Local TSDB
Memory: 2h "head" block (recent writes)
Disk: 2h blocks → compacted into bigger blocks
Storage format:
- Chunks of compressed samples per series
- Gorilla compression (Facebook 2015) — ~1.37 bytes/sample average for typical metrics
- Mmapped chunk files
Retention default 15 days. Configurable up to TBs.
4.3 PromQL
# Instant value
http_requests_total
# Rate (per-sec) over 5min
rate(http_requests_total[5m])
# Aggregate
sum by (status_code) (rate(http_requests_total[5m]))
avg without (instance) (cpu_usage)
# Histogram percentile
histogram_quantile(0.99, sum(rate(http_duration_bucket[5m])) by (le))
# Subquery
max_over_time(rate(http_requests_total[5m])[1h:1m])
# Difference
deriv(disk_usage[5m])
delta(counter_total[5m])
# Predict linear
predict_linear(disk_usage[1h], 24*3600) # 24h ahead4.4 Recording rules + Alerts
# rules.yml
groups:
- name: http
rules:
- record: http:request_rate_5m
expr: sum by (service) (rate(http_requests_total[5m]))
- name: alerts
rules:
- alert: HighErrorRate
expr: sum(rate(http_errors_total[5m])) by (service) > 0.05
for: 5m
annotations:
summary: "{{ $labels.service }} error rate >5%"Recording rules precompute hot expressions. Alerts evaluated periodically.
4.5 Long-term storage
Local 15 days. For longer:
| Thanos | Cortex / Mimir | VictoriaMetrics | |
|---|---|---|---|
| Storage | Object (S3) | Object | Local + Object |
| Dedup | Yes | Yes | Yes |
| Multi-tenant | Limited | Yes | Yes |
| Query | Across clusters | Across | Single |
| Maintenance | Medium | High | Lower |
| Performance | Good | Good | Best (claims) |
VictoriaMetrics rising 2024 — drop-in Prometheus replacement, faster, less memory, simpler ops.
4.6 Cardinality explosion
Each unique label combination = unique series.
http_requests_total{method="GET", status="200", user_id="42"}
Adding user_id as label → millions of series → OOM.
Rule: keep labels low-cardinality (<10K distinct). Move high-cardinality data to logs or traces.
Diagnose:
prometheus_tsdb_symbol_table_size_bytes
count by (__name__) ({__name__=~".+"})4.7 Histograms vs Summaries
Histogram:
_bucket{le="..."},_sum,_count- Aggregate across instances
- Percentile via
histogram_quantile
Summary:
- Pre-computed percentile per instance
- Can’t aggregate (mean of medians ≠ median)
→ Always Histogram for distributed systems.
PG14+ Postgres exporter ships Histogram-style.
5. VictoriaMetrics — Rising star 2024
5.1 Why notable
- Drop-in Prometheus + PromQL compatible
- 7-10x less memory than Prometheus
- MetricsQL (extended PromQL)
- Single-binary or cluster mode
- OSS Apache 2.0
5.2 Architecture
graph LR Scraper[vmagent<br/>scrape + push] --> Insert[vminsert] Insert --> Storage[(vmstorage)] Storage --> Select[vmselect] Select --> Grafana AM[vmalert] --> Storage
Each component scales independently.
5.3 Use case
If you’re on Prometheus and feeling OOM/perf pain, migrate to VictoriaMetrics. Same config, less pain.
Major adopters: Adidas, Wix, Razorpay.
6. Comparison
| TimescaleDB | InfluxDB v3 | Prometheus | VictoriaMetrics | ClickHouse | |
|---|---|---|---|---|---|
| Type | Postgres ext | Dedicated TS | Metrics-focused | Prom-compat | OLAP general |
| Write throughput | 100K-1M/s | 1M+/s | 1M+/s (with remote) | 1M+/s | 1M+/s |
| Query language | SQL | SQL + Flux + InfluxQL | PromQL | PromQL + MetricsQL | SQL |
| Joins | Yes | Limited | No | No | Limited |
| Retention | Built-in | Built-in | Built-in | Built-in | Manual |
| Compression | Good | Excellent | Excellent (Gorilla) | Excellent | Excellent |
| Cardinality | Medium tolerance | Low-Med (v3 better) | Strict | Better tolerance | Best |
| Use case | Hybrid TS + OLTP | Pure TS at scale | Monitoring | Better Prometheus | TS as part of analytics |
| OSS | TSL (mixed) | MIT | Apache 2.0 | Apache 2.0 | Apache 2.0 |
7. Decision Matrix
flowchart TD A[TS workload] --> B{Use case?} B -->|App/infra metrics| C[Prometheus or VictoriaMetrics] C --> C1{Need long retention/cluster?} C1 -->|Yes| C2[VictoriaMetrics<br/>or Prom + Thanos/Mimir] C1 -->|No, single instance| C3[Prometheus] B -->|Sensor / business TS| D{Postgres already in stack?} D -->|Yes, <1B points| D1[TimescaleDB] D -->|No or very large| D2[InfluxDB v3] B -->|Analytics over TS<br/>+ other dims| E[ClickHouse] B -->|Financial ticks ultra-low latency| F[kdb+ commercial<br/>or QuestDB OSS] style C3 fill:#c8e6c9 style C2 fill:#c8e6c9 style D1 fill:#c8e6c9 style E fill:#c8e6c9
8. Downsampling Strategy
Pattern: keep recent data fine-grained, downsample old to coarse.
Last 24h: 1-sec resolution → 86400 points/series
Last 7 days: 1-min resolution → 10080 points/series
Last 30 days: 5-min resolution → 8640 points/series
Last year: 1-hour resolution → 8760 points/series
Storage saved exponentially.
8.1 TimescaleDB
Continuous aggregate at each level + retention policy:
-- 1-min aggregate retain 7 days
SELECT add_continuous_aggregate_policy('sensor_1min',
start_offset => INTERVAL '1 day',
end_offset => INTERVAL '1 min',
schedule_interval => INTERVAL '1 min');
SELECT add_retention_policy('sensor_1min', INTERVAL '7 days');
-- 5-min retain 30 days
-- 1-hour retain 1 year8.2 Prometheus / VictoriaMetrics
Recording rules + remote write tier:
- record: 'job:http_rate:5m'
expr: sum by (job) (rate(http_requests_total[5m]))Aggregate stored as new series, can have different retention via storage tier.
8.3 InfluxDB v3
Continuous query + retention policy.
8.4 ClickHouse
Materialized view rollup (Tuần 14).
9. Operations & Tuning
9.1 Prometheus tuning
# prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
external_labels:
cluster: prod-us
storage:
tsdb:
retention.time: 15d
retention.size: 200GB
wal-compression: true9.2 Cardinality monitoring
# Top metrics by series count
topk(10, count by (__name__) ({__name__=~".+"}))
# Memory usage
process_resident_memory_bytes{job="prometheus"}Alert if cardinality > 1M series.
9.3 Backup
- TimescaleDB: pgBackRest (it’s Postgres after all)
- InfluxDB v3: Parquet on S3, just snapshot
- Prometheus: snapshot endpoint then copy
- VictoriaMetrics: same
9.4 HA
- Prometheus: 2 instances scraping same targets + dedup downstream
- TimescaleDB: streaming replication
- VictoriaMetrics: built-in HA in cluster mode
10. Common Patterns
10.1 Alerting
# Disk filling up - predict in 4h
predict_linear(node_filesystem_avail_bytes[1h], 4*3600) < 010.2 SLO tracking
# Availability SLO
(1 - (sum(rate(errors_total[30d])) / sum(rate(requests_total[30d])))) * 10010.3 Heartbeat
# Service hasn't reported in 5 min
absent_over_time(up{service="critical"}[5m])10.4 Histogram-based latency
# P99 latency by endpoint
histogram_quantile(0.99,
sum by (endpoint, le) (rate(http_duration_bucket[5m]))
)11. Anti-patterns
| Pattern | Why bad | Fix |
|---|---|---|
| High-cardinality labels (user_id, trace_id) | Series explosion → OOM | Use logs/traces for these |
| Scrape interval too short (1s) | Storage explosion | 15-60s typical |
| No retention policy | Disk fills | Set retention |
| Using Summary instead of Histogram | Can’t aggregate percentiles | Histogram |
| Prometheus as long-term store | Limit 15-30d sensible | Thanos/Mimir/VM |
| Using TS DB for join-heavy workload | Joins limited | Use Postgres or warehouse |
| Mixing TS + relational in same DB at scale | Resource contention | Separate |
| TimescaleDB without compression | Storage bloats | Compression policy |
| Querying raw data when aggregates exist | Slow | Use continuous aggs |
| Aggregating across high-cardinality | Slow even with downsampling | Pre-aggregate |
12. Lab
Day 1: TimescaleDB
Setup TimescaleDB + ingest 100M sensor readings. Create hypertable + continuous aggregate. Compare vs plain Postgres partitioned.
Day 2: Prometheus + Grafana
Scrape node-exporter + postgres-exporter. Build dashboard. Write 5 alerts.
Day 3: VictoriaMetrics migration
Replace Prometheus with VictoriaMetrics. Run same dashboard, compare memory + speed.
Day 4: InfluxDB v3
Ingest IoT-style data. SQL queries. Compare with TimescaleDB on same data.
Day 5: Downsampling
Build 3-tier downsampling in TimescaleDB. Verify storage saved.
Day 6: Cardinality experiment
Push high-cardinality metric to Prometheus. Watch memory. Refactor with label_replace or aggregation.
Day 7: Real-world
Build full observability stack: app instrumented with histograms, Prometheus, alerts, Grafana dashboards.
13. Self-check
- TS workload characteristics — 5 đặc điểm?
- Hypertable trong TimescaleDB là gì?
- Continuous aggregate — khác Postgres MV thế nào?
- Gorilla compression đặc trưng cho gì?
- Pull vs Push model — Prometheus chọn pull tại sao?
- Cardinality explosion — nguyên nhân + fix?
- Histogram vs Summary — pick cái nào và tại sao?
- VictoriaMetrics vs Prometheus — khác biệt chính?
- TimescaleDB compression: trade-off?
- Decision: business sensor data → pick DB nào?
14. Tiếp theo
Cập nhật: 2026-05-16