Bonus 1 — Time-Series Databases: TimescaleDB, InfluxDB v3, Prometheus, VictoriaMetrics

“Time-series data tăng theo cấp số nhân và truy vấn theo time window. Postgres làm được, nhưng tools chuyên biệt nhanh hơn 10-100x. Khi nào cần switch + cách chọn đúng.”

Tags: database time-series timescaledb influxdb prometheus observability Thời lượng: 5-6 ngày Prerequisites: Tuan-01-DB-Internals-Refresh · Tuan-14-OLAP-Columnar-ClickHouse Liên quan: Tuan-09-DB-Observability-Tuning · Case-Design-Data-Realtime-Analytics


1. Context — TS workloads

1.1 Characteristics

Time-series data:

  • Insert-heavy — write rate >> read rate (1000:1 typical)
  • Append-only — rare update/delete
  • Sequential timestamp — natural ordering
  • Aggregation queries — sum/avg/percentile over windows
  • Retention policy — drop old data, automated
  • Downsampling — keep summaries of old data
  • High cardinality issues — series count explodes with labels

1.2 Use cases

  • IoT sensors (temperature, humidity, GPS)
  • Observability metrics (CPU, memory, latency)
  • Financial ticks (price, volume per ms)
  • Application events aggregated
  • Real-time dashboards
  • Anomaly detection input
  • Forecast models input

1.3 Why specialized vs Postgres

Postgres time-series can work:

  • BRIN index on time column
  • Partition by month
  • TimescaleDB extension

But specialized TS DBs offer:

  • 5-10x better compression
  • Sub-second aggregations on billions of points
  • Built-in downsampling
  • Cardinality optimizations

Pattern: start Postgres for small data (<100M points), move when scale demands.


2. TimescaleDB — Postgres Extension

2.1 Hypertable basics

CREATE EXTENSION timescaledb;
 
CREATE TABLE sensor_data (
    time timestamptz NOT NULL,
    sensor_id int NOT NULL,
    value double precision,
    metadata jsonb
);
 
-- Convert to hypertable
SELECT create_hypertable('sensor_data', 'time',
    chunk_time_interval => interval '1 day');

Hypertable behind scenes:

  • Postgres partitioned table by time
  • Each chunk = child table (1 day default)
  • Inserts auto-routed to right chunk
-- Query (looks normal)
SELECT * FROM sensor_data
WHERE sensor_id = 42 AND time > now() - interval '1 hour';
 
-- See chunks
SELECT * FROM timescaledb_information.chunks WHERE hypertable_name = 'sensor_data';

2.2 Continuous aggregates

Auto-materialized aggregates with incremental refresh.

CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    sensor_id,
    avg(value) AS avg,
    max(value) AS max,
    min(value) AS min,
    count(*) AS samples,
    percentile_cont(0.95) WITHIN GROUP (ORDER BY value) AS p95
FROM sensor_data
GROUP BY hour, sensor_id;
 
-- Schedule incremental refresh
SELECT add_continuous_aggregate_policy('sensor_hourly',
    start_offset => INTERVAL '2 hours',
    end_offset => INTERVAL '10 minutes',
    schedule_interval => INTERVAL '10 minutes');

Real-time aggregation: query hits BOTH materialized aggregate AND raw recent data → consistent up to current time.

SELECT * FROM sensor_hourly
WHERE sensor_id = 42 AND hour > now() - interval '24 hours';
-- Returns: pre-aggregated + real-time blended

2.3 Retention policy

SELECT add_retention_policy('sensor_data', INTERVAL '30 days');
-- Background job drops chunks >30 days old (drop, not delete = fast)

2.4 Compression

ALTER TABLE sensor_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby = 'time DESC'
);
 
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');

Chunks > 7 days → compressed (~95% reduction). Still queryable but read-mostly.

Compression algorithm: delta encoding (timestamps) + Gorilla (floats) + dictionary (low-card) → similar to ClickHouse for time-series.

2.5 TimescaleDB v2 changes

Notable changes 2022-2024:

  • Distributed Hypertables deprecated (was multi-node feature) — most users don’t need; complexity not worth
  • Focus on single-node scale + read replicas
  • Better continuous aggregate (refresh policies, hierarchical aggregates)
  • Hyperfunctions (statistical functions)

2.6 Hyperfunctions

-- Toolkit
CREATE EXTENSION timescaledb_toolkit;
 
-- Time-weighted average
SELECT sensor_id, time_weight('LOCF', time, value) AS tw_avg
FROM sensor_data WHERE time > now() - interval '1 hour' GROUP BY sensor_id;
 
-- Approximate percentile (HDR)
SELECT sensor_id, approx_percentile(0.95, percentile_agg(value))
FROM sensor_data GROUP BY sensor_id;
 
-- Counter reset detection
SELECT counter_agg(time, value) FROM counter_data;

2.7 License (TSL)

TimescaleDB has Timescale License (TSL) — not OSS by OSI. Some features Apache-licensed, advanced features (compression, CAGG policies) TSL.

For pure OSS: use base Postgres + manual partitioning (Bonus Postgres Partitioning).

2.8 Pros vs cons

Pros:

  • Postgres SQL/tools/ORMs
  • ACID transactions
  • Join with regular tables (lookup dims)
  • Continuous aggregate elegance
  • Compression decent

Cons:

  • Performance ceiling lower than dedicated TS at massive scale (billion+ points/day)
  • TSL license (not pure OSS)
  • Operational complexity (extension management)
  • No multi-node (since v2 removed)

3. InfluxDB

3.1 v2 vs v3 (2024-2026)

  • v1: TSM engine, InfluxQL
  • v2: TSM/TSI, Flux, OSS + cloud
  • v3 (IOx): GA 2024 — Rust + Apache Arrow + DataFusion + Parquet storage. Completely different.

InfluxDB v3 changes the game:

  • Parquet-based storage (open format)
  • SQL support (in addition to InfluxQL)
  • Lower memory, better cardinality
  • Cloud-first, self-host v3 also available 2024

3.2 Data model

Line protocol:

sensor_data,location=room1,sensor=temp value=22.5,humidity=65.2 1700000000000000000
  • Measurement: sensor_data (table)
  • Tags: indexed, low-cardinality (location, sensor)
  • Fields: not indexed, values (value, humidity)
  • Timestamp: nanoseconds

Critical: tags indexed → cardinality matters. Each unique tag combo = unique series.

3.3 Query languages

InfluxQL (v1, v2):

SELECT mean(value) FROM sensor_data WHERE time > now() - 1h GROUP BY time(5m), sensor

Flux (v2):

from(bucket: "sensors")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "sensor_data")
  |> aggregateWindow(every: 5m, fn: mean)

SQL (v3):

SELECT mean(value), date_bin('5 minutes', time) AS bucket
FROM sensor_data
WHERE time > now() - INTERVAL '1 hour'
GROUP BY bucket, sensor;

3.4 Write path (v3)

graph LR
    Client --> Write[Write API]
    Write --> Buffer[In-memory buffer]
    Buffer -.flush.-> Parquet[(Parquet files)]
    Parquet --> Compaction[Background compaction]
    Compaction --> Parquet2[(Compacted Parquet)]
  • Writes to in-memory buffer
  • Flush to Parquet
  • Background compaction merges files
  • Storage cheap (S3-compatible)

3.5 When InfluxDB

  • High write throughput (millions points/sec)
  • Tag-based filtering predominant
  • Don’t need joins with relational
  • Ecosystem (Telegraf collector, Kapacitor)
  • v3: when you want SQL + Parquet open format

4. Prometheus

4.1 Architecture

graph LR
    Target1[App / exporter] -.scrape.-> Prom[Prometheus]
    Target2[Node exporter] -.scrape.-> Prom
    SD[Service Discovery<br/>Kubernetes, Consul] --> Prom
    Prom --> TSDB[(Local TSDB)]
    Prom --> Remote[(Remote storage<br/>Cortex, Mimir, Thanos)]
    Prom --> Grafana
    Prom --> AM[Alertmanager]

Pull model — Prometheus scrapes targets at intervals (15s default).

Why pull:

  • Service discovery — Prometheus knows what’s healthy
  • Centralized scraping config
  • Caveat: doesn’t fit serverless/cron well → use Pushgateway

4.2 Local TSDB

Memory: 2h "head" block (recent writes)
Disk: 2h blocks → compacted into bigger blocks

Storage format:

  • Chunks of compressed samples per series
  • Gorilla compression (Facebook 2015) — ~1.37 bytes/sample average for typical metrics
  • Mmapped chunk files

Retention default 15 days. Configurable up to TBs.

4.3 PromQL

# Instant value
http_requests_total
 
# Rate (per-sec) over 5min
rate(http_requests_total[5m])
 
# Aggregate
sum by (status_code) (rate(http_requests_total[5m]))
avg without (instance) (cpu_usage)
 
# Histogram percentile
histogram_quantile(0.99, sum(rate(http_duration_bucket[5m])) by (le))
 
# Subquery
max_over_time(rate(http_requests_total[5m])[1h:1m])
 
# Difference
deriv(disk_usage[5m])
delta(counter_total[5m])
 
# Predict linear
predict_linear(disk_usage[1h], 24*3600)  # 24h ahead

4.4 Recording rules + Alerts

# rules.yml
groups:
- name: http
  rules:
  - record: http:request_rate_5m
    expr: sum by (service) (rate(http_requests_total[5m]))
 
- name: alerts
  rules:
  - alert: HighErrorRate
    expr: sum(rate(http_errors_total[5m])) by (service) > 0.05
    for: 5m
    annotations:
      summary: "{{ $labels.service }} error rate >5%"

Recording rules precompute hot expressions. Alerts evaluated periodically.

4.5 Long-term storage

Local 15 days. For longer:

ThanosCortex / MimirVictoriaMetrics
StorageObject (S3)ObjectLocal + Object
DedupYesYesYes
Multi-tenantLimitedYesYes
QueryAcross clustersAcrossSingle
MaintenanceMediumHighLower
PerformanceGoodGoodBest (claims)

VictoriaMetrics rising 2024 — drop-in Prometheus replacement, faster, less memory, simpler ops.

4.6 Cardinality explosion

Each unique label combination = unique series.

http_requests_total{method="GET", status="200", user_id="42"}

Adding user_id as label → millions of series → OOM.

Rule: keep labels low-cardinality (<10K distinct). Move high-cardinality data to logs or traces.

Diagnose:

prometheus_tsdb_symbol_table_size_bytes
count by (__name__) ({__name__=~".+"})

4.7 Histograms vs Summaries

Histogram:

  • _bucket{le="..."}, _sum, _count
  • Aggregate across instances
  • Percentile via histogram_quantile

Summary:

  • Pre-computed percentile per instance
  • Can’t aggregate (mean of medians ≠ median)

→ Always Histogram for distributed systems.

PG14+ Postgres exporter ships Histogram-style.


5. VictoriaMetrics — Rising star 2024

5.1 Why notable

  • Drop-in Prometheus + PromQL compatible
  • 7-10x less memory than Prometheus
  • MetricsQL (extended PromQL)
  • Single-binary or cluster mode
  • OSS Apache 2.0

5.2 Architecture

graph LR
    Scraper[vmagent<br/>scrape + push] --> Insert[vminsert]
    Insert --> Storage[(vmstorage)]
    Storage --> Select[vmselect]
    Select --> Grafana
    AM[vmalert] --> Storage

Each component scales independently.

5.3 Use case

If you’re on Prometheus and feeling OOM/perf pain, migrate to VictoriaMetrics. Same config, less pain.

Major adopters: Adidas, Wix, Razorpay.


6. Comparison

TimescaleDBInfluxDB v3PrometheusVictoriaMetricsClickHouse
TypePostgres extDedicated TSMetrics-focusedProm-compatOLAP general
Write throughput100K-1M/s1M+/s1M+/s (with remote)1M+/s1M+/s
Query languageSQLSQL + Flux + InfluxQLPromQLPromQL + MetricsQLSQL
JoinsYesLimitedNoNoLimited
RetentionBuilt-inBuilt-inBuilt-inBuilt-inManual
CompressionGoodExcellentExcellent (Gorilla)ExcellentExcellent
CardinalityMedium toleranceLow-Med (v3 better)StrictBetter toleranceBest
Use caseHybrid TS + OLTPPure TS at scaleMonitoringBetter PrometheusTS as part of analytics
OSSTSL (mixed)MITApache 2.0Apache 2.0Apache 2.0

7. Decision Matrix

flowchart TD
    A[TS workload] --> B{Use case?}

    B -->|App/infra metrics| C[Prometheus or VictoriaMetrics]
    C --> C1{Need long retention/cluster?}
    C1 -->|Yes| C2[VictoriaMetrics<br/>or Prom + Thanos/Mimir]
    C1 -->|No, single instance| C3[Prometheus]

    B -->|Sensor / business TS| D{Postgres already in stack?}
    D -->|Yes, <1B points| D1[TimescaleDB]
    D -->|No or very large| D2[InfluxDB v3]

    B -->|Analytics over TS<br/>+ other dims| E[ClickHouse]

    B -->|Financial ticks ultra-low latency| F[kdb+ commercial<br/>or QuestDB OSS]

    style C3 fill:#c8e6c9
    style C2 fill:#c8e6c9
    style D1 fill:#c8e6c9
    style E fill:#c8e6c9

8. Downsampling Strategy

Pattern: keep recent data fine-grained, downsample old to coarse.

Last 24h: 1-sec resolution → 86400 points/series
Last 7 days: 1-min resolution → 10080 points/series
Last 30 days: 5-min resolution → 8640 points/series
Last year: 1-hour resolution → 8760 points/series

Storage saved exponentially.

8.1 TimescaleDB

Continuous aggregate at each level + retention policy:

-- 1-min aggregate retain 7 days
SELECT add_continuous_aggregate_policy('sensor_1min',
    start_offset => INTERVAL '1 day',
    end_offset => INTERVAL '1 min',
    schedule_interval => INTERVAL '1 min');
SELECT add_retention_policy('sensor_1min', INTERVAL '7 days');
 
-- 5-min retain 30 days
-- 1-hour retain 1 year

8.2 Prometheus / VictoriaMetrics

Recording rules + remote write tier:

- record: 'job:http_rate:5m'
  expr: sum by (job) (rate(http_requests_total[5m]))

Aggregate stored as new series, can have different retention via storage tier.

8.3 InfluxDB v3

Continuous query + retention policy.

8.4 ClickHouse

Materialized view rollup (Tuần 14).


9. Operations & Tuning

9.1 Prometheus tuning

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: prod-us
 
storage:
  tsdb:
    retention.time: 15d
    retention.size: 200GB
    wal-compression: true

9.2 Cardinality monitoring

# Top metrics by series count
topk(10, count by (__name__) ({__name__=~".+"}))
 
# Memory usage
process_resident_memory_bytes{job="prometheus"}

Alert if cardinality > 1M series.

9.3 Backup

  • TimescaleDB: pgBackRest (it’s Postgres after all)
  • InfluxDB v3: Parquet on S3, just snapshot
  • Prometheus: snapshot endpoint then copy
  • VictoriaMetrics: same

9.4 HA

  • Prometheus: 2 instances scraping same targets + dedup downstream
  • TimescaleDB: streaming replication
  • VictoriaMetrics: built-in HA in cluster mode

10. Common Patterns

10.1 Alerting

# Disk filling up - predict in 4h
predict_linear(node_filesystem_avail_bytes[1h], 4*3600) < 0

10.2 SLO tracking

# Availability SLO
(1 - (sum(rate(errors_total[30d])) / sum(rate(requests_total[30d])))) * 100

10.3 Heartbeat

# Service hasn't reported in 5 min
absent_over_time(up{service="critical"}[5m])

10.4 Histogram-based latency

# P99 latency by endpoint
histogram_quantile(0.99,
    sum by (endpoint, le) (rate(http_duration_bucket[5m]))
)

11. Anti-patterns

PatternWhy badFix
High-cardinality labels (user_id, trace_id)Series explosion → OOMUse logs/traces for these
Scrape interval too short (1s)Storage explosion15-60s typical
No retention policyDisk fillsSet retention
Using Summary instead of HistogramCan’t aggregate percentilesHistogram
Prometheus as long-term storeLimit 15-30d sensibleThanos/Mimir/VM
Using TS DB for join-heavy workloadJoins limitedUse Postgres or warehouse
Mixing TS + relational in same DB at scaleResource contentionSeparate
TimescaleDB without compressionStorage bloatsCompression policy
Querying raw data when aggregates existSlowUse continuous aggs
Aggregating across high-cardinalitySlow even with downsamplingPre-aggregate

12. Lab

Day 1: TimescaleDB

Setup TimescaleDB + ingest 100M sensor readings. Create hypertable + continuous aggregate. Compare vs plain Postgres partitioned.

Day 2: Prometheus + Grafana

Scrape node-exporter + postgres-exporter. Build dashboard. Write 5 alerts.

Day 3: VictoriaMetrics migration

Replace Prometheus with VictoriaMetrics. Run same dashboard, compare memory + speed.

Day 4: InfluxDB v3

Ingest IoT-style data. SQL queries. Compare with TimescaleDB on same data.

Day 5: Downsampling

Build 3-tier downsampling in TimescaleDB. Verify storage saved.

Day 6: Cardinality experiment

Push high-cardinality metric to Prometheus. Watch memory. Refactor with label_replace or aggregation.

Day 7: Real-world

Build full observability stack: app instrumented with histograms, Prometheus, alerts, Grafana dashboards.


13. Self-check

  1. TS workload characteristics — 5 đặc điểm?
  2. Hypertable trong TimescaleDB là gì?
  3. Continuous aggregate — khác Postgres MV thế nào?
  4. Gorilla compression đặc trưng cho gì?
  5. Pull vs Push model — Prometheus chọn pull tại sao?
  6. Cardinality explosion — nguyên nhân + fix?
  7. Histogram vs Summary — pick cái nào và tại sao?
  8. VictoriaMetrics vs Prometheus — khác biệt chính?
  9. TimescaleDB compression: trade-off?
  10. Decision: business sensor data → pick DB nào?

14. Tiếp theo

Tuan-Bonus-Graph-DB-Neo4j

Cập nhật: 2026-05-16