Week 15: Data Security & Encryption

“Encryption without key management is like buying a safe and taping the combination to the door.”

Tags: system-design security encryption devops compliance
Student: Hieu
Prerequisite: Tuan-14-AuthN-AuthZ-Security
Related: Tuan-02-Back-of-the-envelope · Tuan-07-Database-Sharding-Replication · Tuan-12-CICD-Pipeline · Tuan-13-Monitoring-Observability


1. Context & Why

Everyday analogy: a bank vault with many layers of protection

Hieu, imagine you are depositing gold at a bank. The bank does not rely on a single lock:

  • The main entrance is guarded 24/7 → Network security (TLS/firewall)
  • The vault room requires a staff key card plus a PIN → Authentication & Authorization (covered in Tuan-14-AuthN-AuthZ-Security)
  • Each safe-deposit box has its own lock, and only the owner holds the key → Encryption at rest (data stored on disk)
  • Gold is transported in an armored truck with GPS tracking → Encryption in transit (data moving over the network)
  • Box keys are not kept inside the vault but in a separate secure store → Key Management Service (KMS)
  • A ledger records in detail who opened which box, when, and what they took → Audit logging (for compliance)
  • Vault codes are changed periodically → Key rotation
  • Gold, cash, and documents are classified and protected differently → Data classification

Key lesson: Data security is not a single layer but defense in depth, with multiple overlapping layers. If one layer fails, the others still protect the data.

Why does Alex Xu emphasize Data Security?

In every System Design Interview, when the interviewer asks “how do you handle sensitive data?”, they want to hear something like:

“User PII is encrypted at rest with AES-256-GCM using envelope encryption. Keys are managed by AWS KMS with automatic rotation every 365 days. Data in transit goes over TLS 1.3. PII fields in the database use column-level encryption. Audit logs record every access to sensitive data and are retained for 7 years for compliance. GDPR right to erasure is implemented with crypto-shredding: deleting the encryption key instead of deleting each record.”

That is the difference between an average engineer and a Security-aware Architect.


2. Deep Dive: Core Concepts

2.1 Encryption at Rest vs Encryption in Transit

Aspect | At Rest (stored data) | In Transit (data in motion)
Purpose | Protect data on disk/storage | Protect data as it crosses the network
Main techniques | AES-256, TDE, column-level encryption | TLS 1.2/1.3, mTLS
Defends against | Physical theft, unauthorized disk access | Man-in-the-middle, eavesdropping
Examples | Database files, S3 objects, backups | API calls, database connections, inter-service communication
Who manages keys | KMS (AWS KMS, Vault) | Certificate Authority (CA), cert-manager
Performance impact | Low (hardware-accelerated AES-NI) | Low with TLS 1.3 (1-RTT handshake)

Golden rule: Encrypt both. Never encrypt only one side. Unencrypted data at rest means a single breach exposes everything. Unencrypted data in transit means every request can be sniffed.

2.2 Symmetric vs Asymmetric Encryption

Symmetric Encryption

A single key is used for both encryption and decryption.

Algorithm | Key Size | Block Size | Speed | Use Case
AES-128 | 128 bit | 128 bit | Very fast | General purpose
AES-256 | 256 bit | 128 bit | Fast | Sensitive data, government, compliance
ChaCha20 | 256 bit | Stream | Fast (no AES-NI needed) | Mobile, IoT (no hardware AES)

Important AES modes:

Mode | Properties | Use when
ECB | Insecure: identical plaintext blocks produce identical ciphertext blocks | NEVER use
CBC | Needs an IV, sequential processing | Legacy systems
CTR | Parallelizable, needs a unique nonce | High-throughput encryption
GCM | Authenticated encryption (confidentiality + integrity) | Recommended default: TLS, API payloads

Aha Moment: AES-GCM is the “gold standard” because it both encrypts and guarantees integrity (through an authentication tag). If an attacker modifies the ciphertext, GCM detects it at decryption time. CBC has no such feature and needs a separate HMAC. GCM's guarantees hold only if a nonce is never reused with the same key.
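The tamper-detection property can be seen in a few lines. This is a minimal sketch, assuming the `cryptography` package is installed; the helper names `encrypt`, `decrypt`, and `is_tampered` are illustrative, not part of any library.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Return nonce || ciphertext || 16-byte tag."""
    nonce = os.urandom(12)  # 96-bit nonce, must be unique per key
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce and decrypt; raises InvalidTag if the blob was modified."""
    return AESGCM(key).decrypt(blob[:12], blob[12:], None)


def is_tampered(key: bytes, blob: bytes) -> bool:
    """True if GCM's authentication tag check fails for this blob."""
    try:
        decrypt(key, blob)
        return False
    except InvalidTag:
        return True


key = AESGCM.generate_key(bit_length=256)
blob = encrypt(key, b"hieu@company.com")

# Flip one bit in the last byte to simulate an attacker editing the ciphertext.
modified = bytearray(blob)
modified[-1] ^= 0x01

print(is_tampered(key, blob))             # untouched blob
print(is_tampered(key, bytes(modified)))  # modified blob
```

With CBC the modified blob would decrypt to garbage without any error, which is why CBC deployments add an HMAC over the ciphertext.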

Asymmetric Encryption

Two keys: a public key (encrypt / verify) and a private key (decrypt / sign).

Algorithm | Key Size | Speed | Use Case
RSA-2048 | 2048 bit | Slow (roughly 100-1000x slower than AES) | Key exchange, digital signature
RSA-4096 | 4096 bit | Very slow | High-security scenarios
ECDSA (P-256) | 256 bit | Much faster than RSA | TLS certificates, JWT signing
Ed25519 | 256 bit | Very fast | SSH keys, modern signatures

Key sizes are not comparable across families: a 256-bit elliptic-curve key gives roughly the same security level as AES-128 (about 128 bits), while RSA-2048 gives about 112 bits.

Why not use asymmetric encryption for everything? Because it is 100-1000x slower than symmetric encryption. In practice, asymmetric cryptography is used only to exchange or agree on a symmetric key (key exchange), and that symmetric key then handles bulk encryption. This is how TLS works.

2.3 TLS In Detail

TLS (Transport Layer Security) is the foundation of secure communication on the internet.

TLS 1.3 Handshake (1-RTT)

Client                                          Server
  |                                                |
  |--- ClientHello (supported ciphers, key share) -->|
  |                                                |
  |<-- ServerHello (chosen cipher, key share,      |
  |    EncryptedExtensions, Certificate,           |
  |    CertificateVerify, Finished) ---------------|
  |                                                |
  |--- Finished (encrypted) ---------------------->|
  |                                                |
  |<========= Application Data (encrypted) =======>|

TLS 1.2 vs TLS 1.3:

Aspect | TLS 1.2 | TLS 1.3
Handshake | 2-RTT | 1-RTT (0-RTT for resumption)
Cipher suites | Many (including weak ones) | Only 5 cipher suites (all strong)
Forward secrecy | Optional | Mandatory
RSA key exchange | Yes | Removed (ephemeral (EC)DHE only)
Speed | Slower | Faster handshake (one fewer round trip)

Forward Secrecy: Even if the server's long-term private key leaks in the future, previously recorded traffic stays safe, because every session uses its own ephemeral key (ECDHE).

mTLS (Mutual TLS)

In microservices, the client verifies the server and the server also verifies the client:

Service A (client)                           Service B (server)
  |                                              |
  |--- ClientHello ----------------------------->|
  |<-- ServerHello + Server Certificate          |
  |    + CertificateRequest ---------------------|
  |--- Client Certificate + CertificateVerify -->|
  |--- (Both verify each other's certificate) ---|
  |<========= Encrypted communication ==========>|

Use case: Service mesh (Istio, Linkerd), zero-trust architecture. See Tuan-11-Microservices-Pattern.
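As a rough illustration of what "TLS 1.3 only" and "server verifies client" mean in configuration terms, here is a sketch using Python's standard `ssl` module. The function names are illustrative, and the certificate paths in the comments are hypothetical placeholders.

```python
import ssl


def make_server_context(require_client_cert: bool = True) -> ssl.SSLContext:
    """Server-side context: TLS 1.3 minimum, optionally demanding a client certificate (mTLS)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    if require_client_cert:
        # The handshake fails unless the client presents a certificate
        # signed by a CA this server trusts.
        ctx.verify_mode = ssl.CERT_REQUIRED
    # In a real service (paths are hypothetical):
    # ctx.load_cert_chain("/certs/service-b.crt", "/certs/service-b.key")
    # ctx.load_verify_locations("/certs/internal-ca.crt")
    return ctx


def make_client_context() -> ssl.SSLContext:
    """Client-side context: verifies the server certificate and hostname by default."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # For mTLS the client also loads its own certificate (hypothetical paths):
    # ctx.load_cert_chain("/certs/service-a.crt", "/certs/service-a.key")
    return ctx


server_ctx = make_server_context()
client_ctx = make_client_context()
print(server_ctx.minimum_version, server_ctx.verify_mode)
print(client_ctx.minimum_version, client_ctx.check_hostname)
```

In a service mesh the sidecar proxy usually owns this configuration, so application code rarely builds these contexts by hand.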

2.4 Envelope Encryption

Problem: You encrypt 1 TB of data with one AES-256 key. When that key needs rotating, do you decrypt and re-encrypt the whole 1 TB? Not feasible!

Solution: Envelope Encryption:

  1. Data Encryption Key (DEK): The key that directly encrypts the data. Each object/record has its own DEK.
  2. Key Encryption Key (KEK): The key used to encrypt the DEK. Managed by the KMS.

                    ┌─────────────────┐
                    │   KMS (KEK)     │
                    │  Master Key     │
                    └────────┬────────┘
                             │ encrypt/decrypt DEK
                    ┌────────▼────────┐
                    │  Encrypted DEK  │
                    │  (stored with   │
                    │   data)         │
                    └────────┬────────┘
                             │ DEK decrypts data
                    ┌────────▼────────┐
                    │  Encrypted Data │
                    │  (on disk/S3)   │
                    └─────────────────┘

Benefits:

  • Fast key rotation: Only the DEKs (a few bytes each) are re-encrypted under the new KEK; the data itself is not re-encrypted
  • Performance: DEKs can be cached in memory for fast encrypt/decrypt
  • Isolation: Each object/tenant has its own DEK, so a compromised DEK exposes only part of the data
  • Scale: The KMS only handles DEK encrypt/decrypt operations (small, fast), never bulk data

This is how AWS S3 SSE-KMS, Google Cloud KMS, and Azure Key Vault work. Each object in S3 has its own DEK, which is encrypted by a KEK in KMS.

2.5 Key Management (KMS)

AWS KMS

Feature | Details
Key types | Symmetric (AES-256), Asymmetric (RSA, ECC)
Key storage | FIPS 140-2 Level 3 HSM
Key rotation | Automatic every 365 days (configurable)
Access control | IAM policies + Key policies + Grants
Audit | Every API call is logged in CloudTrail
Pricing | $1 per key per month + $0.03 per 10,000 requests
Multi-region | Multi-Region Keys (replicate key across regions)

HashiCorp Vault

Feature | Details
Secret engines | KV, PKI, Transit, Database, AWS, SSH, …
Auth methods | Token, AppRole, Kubernetes, LDAP, OIDC
Encryption as a Service | Transit engine: the app sends plaintext, Vault returns ciphertext
Dynamic secrets | Generates database credentials on demand, auto-revoked
Seal/Unseal | Master key split into Shamir shares (m-of-n)
Audit | Every operation is logged (tamper-evident)

When to use AWS KMS vs Vault?

Criterion | AWS KMS | HashiCorp Vault
Cloud-native AWS | Best | Good
Multi-cloud / Hybrid | No | Best
Dynamic secrets | Not available | Yes
PKI / Certificate management | Limited | Very strong
Encryption as a Service | Yes (but data must be sent to AWS) | Yes (self-hosted, data never leaves the datacenter)
Operational complexity | Low (managed) | High (you operate the cluster)
Cost at scale | Can get expensive (per-request) | Vault Enterprise license or self-hosting costs

2.6 Key Rotation

Why rotate keys?

  • Reduces the blast radius if a key is compromised
  • Compliance requirement (PCI-DSS requires replacing keys at the end of their defined cryptoperiod; rotating at least every 12 months is a common policy)
  • Limits the amount of data encrypted under any single key

Key rotation with envelope encryption:

Before rotation:
  KEK-v1 encrypt --> DEK-001 (encrypted)
  DEK-001 encrypt --> Data Object A

After rotation:
  KEK-v2 encrypt --> DEK-001 (re-encrypted by KEK-v2)
  DEK-001 unchanged --> Data Object A does NOT need re-encryption!

Rotation cost = number of DEKs x time to re-encrypt one DEK (microseconds for the cryptographic operation itself). With 1 million DEKs, rotation takes seconds of compute, not days. If each re-encryption is a remote KMS call, network latency dominates, so batch and parallelize.

Key versioning: Always keep old key versions so that old data can still be decrypted. AWS KMS automatically retains all previous versions.

2.7 Data Classification

Level | Examples | Required protection
Public | Marketing content, public API docs | Integrity check, no encryption needed
Internal | Internal wiki, employee directory | Encrypt in transit, access control
Confidential | Financial reports, customer emails, business plans | Encrypt at rest + in transit, audit logging, need-to-know access
Restricted | PII, PHI, payment card data, encryption keys | Encrypt at rest + in transit, field-level encryption, strict access control, audit every access, key management, compliance (GDPR/PCI-DSS/HIPAA)

Rule: Classify first, encrypt second. Without classification you do not know how much protection each dataset needs, so you either under-encrypt (dangerous) or over-encrypt (wasting money and performance).

2.8 PII Handling

PII (Personally Identifiable Information) includes:

Type | Examples | Sensitivity
Direct identifiers | Full name, national ID number (CMND/CCCD), email, phone number, address | High
Indirect identifiers | Date of birth, gender, zip code, IP address | Medium (combining 3 or more can identify an individual)
Sensitive PII | Credit card numbers, medical records, criminal history, biometric data | Very high

Techniques for protecting PII:

  1. Encryption (field-level): Encrypt only the PII fields in the database
  2. Tokenization: Replace PII with a random token and keep the mapping in a separate vault
  3. Data masking: Show ***@email.com instead of the full email address
  4. Pseudonymization: Replace PII with a pseudonym that can still be reversed with a separate key
  5. Anonymization: Remove any ability to identify the person; irreversible (anonymized data falls outside GDPR)
  6. Minimization: Collect only the PII that is actually needed

2.9 Data Masking & Tokenization

Data Masking

Original:    Nguyen Van Hieu | 0912345678 | hieu@company.com
Static Mask: Nguyen V** H*** | 091****678 | h***@company.com
Dynamic Mask (role=admin): Nguyen Van Hieu | 0912345678 | hieu@company.com
Dynamic Mask (role=support): Nguyen V** H*** | 091****678 | h***@company.com

Types of masking:

  • Static masking: Data is masked permanently (used for test/dev environments)
  • Dynamic masking: Data is masked at query time based on the user's role (production)
  • On-the-fly masking: Data is masked while being exported out of the system
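Dynamic masking can be sketched as a small function applied at read time. This is an illustrative sketch rather than a production masking engine: the role name `admin` comes from the example above, while `UNMASKED_ROLES`, `mask_email`, `mask_phone`, and `view_contact` are hypothetical names. Name masking is omitted for brevity.

```python
UNMASKED_ROLES = {"admin"}  # hypothetical: roles allowed to see raw values


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def mask_phone(phone: str) -> str:
    """Keep the first 3 and last 3 digits, star out the middle."""
    if len(phone) <= 6:
        return "*" * len(phone)
    return phone[:3] + "*" * (len(phone) - 6) + phone[-3:]


def view_contact(record: dict, role: str) -> dict:
    """Return the record as the given role is allowed to see it."""
    if role in UNMASKED_ROLES:
        return dict(record)
    return {
        **record,
        "phone": mask_phone(record["phone"]),
        "email": mask_email(record["email"]),
    }


record = {"phone": "0912345678", "email": "hieu@company.com"}
print(view_contact(record, "admin"))
print(view_contact(record, "support"))
# support sees: {'phone': '091****678', 'email': 'h***@company.com'}
```

Static masking would run the same functions once over a copy of the dataset before it is loaded into a test environment.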

Tokenization

Credit Card: 4532-1234-5678-9012
     |
     v (tokenize)
Token: tok_8f3a2b1c9d4e
     |
     v (stored in Token Vault - isolated, high security)
Mapping: tok_8f3a2b1c9d4e --> 4532-1234-5678-9012

Encryption vs Tokenization:

Aspect | Encryption | Tokenization
Format preservation | No (ciphertext is longer than plaintext, unless format-preserving encryption is used) | Yes (a token can keep the same format)
Mathematical relationship | Ciphertext is mathematically derived from the plaintext | None, the token is completely random
PCI-DSS scope | Systems stay in scope | Can reduce scope (only the Token Vault is in scope)
Performance | Fast (AES-NI) | Lookup table (needs a database call)

PCI-DSS tip: Tokenization is preferred for credit card data because it reduces PCI scope. Only the Token Vault has to be PCI compliant; every other service sees only tokens.
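The token-to-card mapping can be sketched with an in-memory vault. This is only an illustration of the idea: `TokenVault` is a hypothetical class, and a real vault would be an isolated, access-controlled, encrypted datastore rather than a Python dict.

```python
import secrets


class TokenVault:
    """Toy token vault: maps random tokens to card numbers (PANs)."""

    def __init__(self):
        self._token_to_pan = {}
        self._pan_to_token = {}

    def tokenize(self, pan: str) -> str:
        """Return the existing token for this PAN, or mint a new random one."""
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        token = "tok_" + secrets.token_hex(6)  # 12 random hex characters
        while token in self._token_to_pan:     # extremely unlikely collision
            token = "tok_" + secrets.token_hex(6)
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        """Only the vault can map a token back; raises KeyError if unknown."""
        return self._token_to_pan[token]


vault = TokenVault()
token = vault.tokenize("4532-1234-5678-9012")
print(token)                    # random, e.g. tok_ followed by 12 hex characters
print(vault.detokenize(token))  # 4532-1234-5678-9012
```

Because the token is drawn from a random source, nothing about the card number can be derived from it without access to the vault.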

2.10 GDPR Basics

GDPR (General Data Protection Regulation) is the EU's data protection law. It applies to any company that processes the personal data of people in the EU.

Right / Principle | Technical implication
Right to Access | Users can request an export of all their data
Right to Erasure (Right to be Forgotten) | When a user requests deletion, the data must be deleted everywhere (including backups!)
Right to Portability | Export user data in a standard format (JSON, CSV)
Right to Rectification | Users can request correction of inaccurate data
Consent | Proof of the user's consent must exist before data is collected
Data Minimization | Collect only the data that is actually needed
Breach Notification | The authority must be notified within 72 hours of discovering a breach

Crypto-shredding: a solution for the Right to Erasure

Problem: A user requests deletion, but their data lives in backups, replicas, the analytics pipeline, Kafka topics, and more. Deleting every copy is practically impossible!

Solution: Crypto-shredding:

  1. Each user has their own DEK (per-user encryption key)
  2. All of the user's PII is encrypted with this DEK
  3. When the user requests erasure, delete the user's DEK
  4. The data still exists but can no longer be decrypted, so it is effectively deleted

User A's data:
  DEK-A --> encrypt --> [encrypted PII in DB, backups, logs...]

User A requests erasure:
  DELETE DEK-A from KMS
  --> All User A's data = unreadable garbage
  --> GDPR compliant!

Aha Moment: Crypto-shredding is why per-entity encryption keys matter so much. If every user shares one key, you cannot erase one user's data without affecting everyone else.
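The per-user key idea can be sketched as follows, assuming the `cryptography` package. `UserKeyStore` is a hypothetical in-memory stand-in for a KMS; in a real system the DEKs would be wrapped by a KEK and stored outside the application.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class UserKeyStore:
    """In-memory stand-in for a KMS that holds one DEK per user."""

    def __init__(self):
        self._deks = {}

    def get_or_create(self, user_id: str) -> bytes:
        if user_id not in self._deks:
            self._deks[user_id] = AESGCM.generate_key(bit_length=256)
        return self._deks[user_id]

    def get(self, user_id: str) -> bytes:
        return self._deks[user_id]  # KeyError once the key has been shredded

    def shred(self, user_id: str) -> None:
        """Right to erasure: destroy the key, leave the ciphertext wherever it is."""
        self._deks.pop(user_id, None)


def encrypt_pii(store: UserKeyStore, user_id: str, plaintext: bytes) -> bytes:
    nonce = os.urandom(12)
    key = store.get_or_create(user_id)
    # Binding the user ID as associated data ties the ciphertext to its owner.
    return nonce + AESGCM(key).encrypt(nonce, plaintext, user_id.encode())


def decrypt_pii(store: UserKeyStore, user_id: str, blob: bytes) -> bytes:
    key = store.get(user_id)
    return AESGCM(key).decrypt(blob[:12], blob[12:], user_id.encode())


store = UserKeyStore()
blob = encrypt_pii(store, "user-a", b"hieu@company.com")  # copies may sit in backups, logs, ...
print(decrypt_pii(store, "user-a", blob))

store.shred("user-a")
try:
    decrypt_pii(store, "user-a", blob)
except KeyError:
    print("user-a data is now unreadable everywhere")
```

The erasure is only as good as the key deletion: every copy of the DEK, including cached and backed-up copies, has to be destroyed.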

2.11 PCI-DSS Overview

PCI-DSS (Payment Card Industry Data Security Standard) is mandatory for every system that handles payment cards.

Requirement | Description | Technique
Req 3 | Protect stored cardholder data | Encryption at rest (AES-256), tokenization, masking
Req 4 | Encrypt transmission of cardholder data | TLS 1.2+ for every transmission
Req 3.5 | Protect encryption keys | KMS, split knowledge, dual control
Req 3.6 | Key management procedures | Documented key rotation, generation, destruction
Req 10 | Track and monitor all access | Audit logging, SIEM, log retention
Req 3.1 | Minimize data storage | Keep only the data you need, with a retention policy

PCI-DSS scope reduction strategies:

  1. Tokenization: Replacing card numbers with tokens reduces the number of systems in scope
  2. Network segmentation: Isolate the payment zone and firewall it off from other zones
  3. Third-party processing: Use Stripe/Braintree so that they carry the PCI scope and you store only tokens

2.12 Backup Encryption

Aspect | Recommendation
Encryption | AES-256-GCM for all backups
Key management | A dedicated backup encryption key, stored in the KMS
Never store the key with the backup | Strict rule: if both are lost together, everything is lost
Test restore | Regularly test restoring from encrypted backups (at least once per quarter)
Offsite backup | Encrypt before moving data offsite
Retention | The backup key must live at least as long as the backup's retention period

Classic pitfall: The team rotates production keys but forgets to rotate the backup encryption key or, worse, deletes the old key while old backups still exist. The result: the backups exist but cannot be restored.

2.13 Secure Deletion

Method | Description | Effectiveness
rm / DELETE | Removes the pointer; the data stays on disk | Not secure
Overwrite (1-pass zero) | Overwrites with zeros | Sufficient for modern HDDs
DoD 5220.22-M | 3 passes (0, 1, random) | Legacy standard
Crypto-shredding | Delete the encryption key | Most effective for cloud/SSD
Physical destruction | Shredding, incineration, degaussing | For hardware decommissioning

SSD note: SSDs use wear leveling and spare blocks, so overwriting does not guarantee that every copy is erased. On SSDs and cloud storage, where you cannot control the physical blocks, crypto-shredding is the most dependable software-level option.

2.14 Database-level Encryption

TDE (Transparent Data Encryption)

TDE encrypts the entire set of database files on disk. The application needs no code changes.

Database | TDE Support | Details
PostgreSQL | No native TDE | Use the pg_tde extension, a third-party distribution, or file-system encryption; pgcrypto covers column-level encryption only
MySQL | Yes (InnoDB tablespace encryption) | AES-256, key held in a keyring plugin
SQL Server | Yes (Enterprise edition; Standard since SQL Server 2019) | AES-256, certificate-based key management
Oracle | Yes (Advanced Security Option) | AES-256, wallet-based key management
MongoDB | Yes (Enterprise) | AES-256-CBC, KMIP integration

TDE limitation: It only protects data on disk. Once data is loaded into memory (query results, buffer pool) it is plaintext, so a DBA with access can still read it.

Column-level Encryption

Encrypt only the columns that hold sensitive data:

-- PostgreSQL with pgcrypto (requires: CREATE EXTENSION pgcrypto;)
-- The literal key below is for illustration only. In practice pass it as a
-- bind parameter fetched from a KMS/secret store; never hard-code it.
-- Encrypt on INSERT
INSERT INTO users (name, email_encrypted, phone_encrypted)
VALUES (
    'Nguyen Van Hieu',
    pgp_sym_encrypt('hieu@company.com', 'my-secret-key'),
    pgp_sym_encrypt('0912345678', 'my-secret-key')
);

-- Decrypt on SELECT (only roles that hold the key can do this)
SELECT name,
       pgp_sym_decrypt(email_encrypted::bytea, 'my-secret-key') as email,
       pgp_sym_decrypt(phone_encrypted::bytea, 'my-secret-key') as phone
FROM users
WHERE id = 1;

Limitation: You cannot index or search on encrypted columns, because encrypting the same value twice yields a different ciphertext each time (random IV).

Field-level Encryption (Client-side)

The application encrypts data before sending it to the database. The database only ever sees ciphertext.

Advantages over TDE/column-level encryption:

  • The DBA cannot read the data (the key lives on the app/KMS side)
  • Data stays encrypted for its whole lifecycle (at rest and in transit between app and DB)
  • Different tenants can use different keys (multi-tenant isolation)

Disadvantages:

  • Application code is more complex
  • Encrypted fields cannot be queried directly (a separate search index has to be built before encryption)
  • Schema migrations are more complex
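One common way to build that separate search index is a "blind index": a keyed hash of the normalized value, stored next to the ciphertext and used only for equality lookups. The sketch below uses the standard library; `blind_index` and the key handling are illustrative assumptions, and in practice the index key would come from a KMS and be distinct from the encryption key.

```python
import hashlib
import hmac


def blind_index(index_key: bytes, value: str) -> str:
    """Deterministic keyed hash of a normalized value, usable for equality lookups."""
    normalized = value.strip().lower()
    return hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for the demo only.
INDEX_KEY = b"demo-index-key-demo-index-key-32"

# At write time: store the ciphertext plus the blind index in an indexed column.
stored_index = blind_index(INDEX_KEY, "Hieu@Company.com")

# At query time: recompute the index from the search term and compare,
# e.g. WHERE email_bidx = :lookup
lookup = blind_index(INDEX_KEY, " hieu@company.com ")
print(lookup == stored_index)
```

The trade-off is that a blind index reveals which rows share the same value, so it suits high-cardinality fields such as email better than low-cardinality ones such as gender.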

MongoDB Client-Side Field Level Encryption (CSFLE): MongoDB has supported native CSFLE since version 4.2, with automatic encryption/decryption. It is a very good fit for multi-tenant SaaS.


3. Estimation — Encryption Overhead

3.1 Encryption CPU Overhead

With hardware AES-NI (available on most modern CPUs), AES-256-GCM encrypts a small payload in roughly 0.5 µs.

Comparison: An average API request takes 10-100 ms. Encryption takes 0.5 µs, so the encryption overhead is below 0.005% of latency for small data.

3.2 Storage Overhead

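AES-GCM adds a 12-byte nonce and a 16-byte authentication tag to every encrypted object: 28 bytes.

With envelope encryption, add the encrypted DEK and its key metadata: roughly 256 bytes, for a total of about 284 bytes per object.

Example: Encrypting 100 million records of 1 KB each adds 100M x 284 bytes ≈ 28.4 GB on top of 100 GB of data, about 28%.

For large objects (images, videos) the overhead is close to 0%: 284 bytes / 2 MB = 0.014%.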
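The storage figures can be reproduced with a few lines of arithmetic. The sketch assumes the standard GCM sizes (12-byte nonce, 16-byte tag) and treats the remaining 256 bytes of the 284-byte total as the encrypted DEK plus metadata; that split is an inference from the 284-byte figure, not a measured value.

```python
GCM_NONCE_BYTES = 12
GCM_TAG_BYTES = 16
WRAPPED_DEK_BYTES = 256  # assumption: encrypted DEK + key metadata


def per_object_overhead() -> int:
    """Bytes added to every object by GCM plus envelope encryption."""
    return GCM_NONCE_BYTES + GCM_TAG_BYTES + WRAPPED_DEK_BYTES


def overhead_percent(object_size_bytes: int) -> float:
    """Overhead relative to the plaintext size, in percent."""
    return 100.0 * per_object_overhead() / object_size_bytes


def total_overhead_gb(num_objects: int) -> float:
    """Total extra storage in GB (decimal) for a given number of objects."""
    return num_objects * per_object_overhead() / 1e9


print(per_object_overhead())                   # 284
print(round(total_overhead_gb(100_000_000), 1))  # 28.4 GB for 100M records
print(round(overhead_percent(1_000), 1))       # 28.4 % for 1 KB records
print(round(overhead_percent(2_000_000), 3))   # 0.014 % for 2 MB objects
```

For small records the per-object DEK dominates, which is one reason to share a DEK across a tenant or a batch of records rather than minting one per row.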

3.3 KMS Request Cost

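AWS KMS pricing (us-east-1, as used for 2026 estimates): $1 per key per month, plus $0.03 per 10,000 requests.

Example: A system with 10 CMKs and 50M encrypt/decrypt requests per month costs 10 x $1 + (50M / 10,000) x $0.03 = $10 + $150 = $160/month.

Optimization: Cache DEKs in memory (with a TTL). Instead of calling KMS on every request, call it only on a DEK cache miss or expiry. That cuts 50M requests down to ~500K requests/month, reducing the request cost from $150 to about $1.50.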
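The effect of DEK caching on the bill is easy to check numerically. The sketch assumes $1 per key per month and $0.03 per 10,000 requests; treat both constants as inputs to adjust rather than authoritative prices.

```python
KEY_PRICE_PER_MONTH = 1.00     # USD per KMS key (assumed)
PRICE_PER_10K_REQUESTS = 0.03  # USD per 10,000 requests (assumed)


def monthly_kms_cost(num_keys: int, requests_per_month: int) -> float:
    """Monthly cost in USD: fixed per-key fee plus per-request fee."""
    key_cost = num_keys * KEY_PRICE_PER_MONTH
    request_cost = requests_per_month / 10_000 * PRICE_PER_10K_REQUESTS
    return round(key_cost + request_cost, 2)


without_cache = monthly_kms_cost(10, 50_000_000)
with_cache = monthly_kms_cost(10, 500_000)

print(without_cache)  # 160.0
print(with_cache)     # 11.5
```

With caching, the fixed per-key fee becomes the dominant term, so the next lever is the number of keys rather than the number of requests.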

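3.4 Audit Log Storage for Compliance

Every access to sensitive data must be logged.

Assume the system handles 10K sensitive data accesses/s. At roughly 500 bytes per log entry (the size implied by the 432 GB figure), that is 10,000 x 86,400 x 500 B ≈ 432 GB per day, about 13 TB per month, and about 1,100 TB over a 7-year retention period.

Storage cost (S3 Standard → Glacier for older data), at roughly $0.023, $0.0125, and $0.004 per GB-month for the three tiers:

Tier | Data held | Cost/month
S3 Standard (0-1 month) | ~13 TB | ~$300
S3 IA (1-12 months) | ~143 TB | ~$1,780
S3 Glacier (1-7 years) | ~946 TB | ~$3,780
Total | ~1,100 TB | ~$5,900/month

With compression (typically 10:1 for text logs), this drops to ~$590/month. That is still a significant cost.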
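The raw log volume behind these estimates can be sanity-checked with a short calculation. The 500-byte entry size is an assumption (it is the value that makes 10K accesses/s come out to 432 GB per day), and `audit_log_volume` is an illustrative helper.

```python
def audit_log_volume(events_per_second: int, bytes_per_event: int,
                     retention_years: int, compression_ratio: float = 1.0) -> dict:
    """Daily and total retained log volume (decimal GB / TB)."""
    daily_gb = events_per_second * 86_400 * bytes_per_event / 1e9
    retained_tb = daily_gb * 365 * retention_years / 1e3
    return {
        "daily_gb": daily_gb / compression_ratio,
        "retained_tb": retained_tb / compression_ratio,
    }


raw = audit_log_volume(10_000, 500, retention_years=7)
compressed = audit_log_volume(10_000, 500, retention_years=7, compression_ratio=10)

print(raw)         # about 432 GB/day and about 1,104 TB retained
print(compressed)  # about 43 GB/day and about 110 TB retained
```

Retention length and entry size are the two inputs that move the result most, so both are worth confirming against the actual compliance requirement and log schema.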

3.5 TLS Handshake Overhead

With connection pooling and keep-alive, the handshake happens once and is then amortized over many requests.

Conclusion: TLS overhead is negligible when connections are reused. There is no good reason not to use TLS.


4. Security Deep Dive

4.1 Key Escrow Risks

Key escrow: Handing an encryption key to a third party (e.g. a government or cloud provider) to hold on your behalf.

Risk | Description
Single point of compromise | If the third party is hacked, all data is exposed
Insider threat | The third party's employees can access the key
Legal compulsion | A government can force the third party to hand over the key
Trust boundary | You are forced to trust the third party, which violates zero-trust

Mitigations:

  • Manage keys yourself (self-managed KMS) for extremely sensitive data
  • BYOK (Bring Your Own Key): Use your own key material with a cloud KMS
  • Hold Your Own Key (HYOK): The key never leaves your own infrastructure
  • Split knowledge: No single person holds enough key material to decrypt (Shamir’s Secret Sharing)
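Split knowledge through Shamir's Secret Sharing can be illustrated with a small m-of-n sketch over a prime field. This is an educational sketch only: the function names and the choice of prime are assumptions, and real deployments should rely on an audited implementation such as the one built into Vault.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime; the secret must be smaller than this


def split_secret(secret: int, threshold: int, shares: int) -> list:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


master_key = secrets.randbelow(PRIME)
shares = split_secret(master_key, threshold=3, shares=5)

print(recover_secret(shares[:3]) == master_key)  # any 3 of the 5 shares work
print(recover_secret(shares[2:]) == master_key)
```

With fewer than `threshold` shares, interpolation yields an unrelated value, which is what prevents any single custodian from reconstructing the key alone.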

4.2 HSM vs Software KMS

Aspect | HSM (Hardware Security Module) | Software KMS
Key storage | Tamper-resistant hardware | Encrypted in software
FIPS 140-2 | Level 3 (physical tamper protection) | Level 1-2
Key extraction | Keys cannot be extracted from the HSM | Keys in memory can be dumped
Performance | 1,000-10,000 operations/s | 100,000+ operations/s
Cost | Up to ~$50,000/unit (or CloudHSM ~$1.50/hr) | Low (software license)
Use case | Root CA, master keys, PCI-DSS, government | Application-level encryption, dev/staging

Rule of thumb: Use an HSM for the root of trust (master key, CA signing key). Use a software KMS for bulk operations (encrypting/decrypting data). Combine the two: the HSM holds the KEK, and the software KMS handles DEKs.

4.3 Side-Channel Attacks Awareness

Side-channel attack: An attack that targets the implementation of an algorithm rather than the algorithm itself.

Type | How it works | Mitigation
Timing attack | Measure encrypt/decrypt time to infer the key | Constant-time comparison, padding
Cache attack | Observe CPU cache access patterns | Cache partitioning, disable hyperthreading
Power analysis | Measure power consumption during encryption | HSM (with shielding)
Spectre/Meltdown | Exploit speculative execution | OS/CPU patches, process isolation
Padding oracle | Exploit error messages returned on bad padding | Use authenticated encryption (GCM), do not leak error details

In practice for developers: Use audited crypto libraries (libsodium, OpenSSL, AWS Encryption SDK). NEVER implement a crypto algorithm yourself. Even Google and Apple use standard libraries.
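The "constant-time comparison" mitigation is available in Python's standard library. The sketch below verifies an HMAC tag with `hmac.compare_digest`, whose running time does not depend on where the first mismatching byte occurs; `sign` and `verify` are illustrative names.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time check; avoid `==`, which can return early on a mismatch."""
    return hmac.compare_digest(sign(key, message), tag)


key = b"demo-key-for-illustration-only!!"
tag = sign(key, b"amount=100&to=alice")

print(verify(key, b"amount=100&to=alice", tag))  # True
print(verify(key, b"amount=999&to=alice", tag))  # False
```

A plain `==` on bytes may stop at the first differing byte, which lets an attacker who can measure response times guess a valid tag byte by byte.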

4.4 Data Breach Response Plan

Required timeline (GDPR requires notification within 72 hours):

Hours | Action
0-1h | Detect and confirm the breach (monitoring/SIEM alert)
1-4h | Containment: isolate affected systems, revoke compromised credentials, rotate keys
4-12h | Assessment: determine the scope (how many records, what kind of data, who is affected)
12-24h | Evidence preservation: take a forensic copy before remediating
24-48h | Remediation: patch the vulnerability, restore from a clean backup
48-72h | Notification: inform the DPA (Data Protection Authority) and affected users
72h+ | Post-incident: root cause analysis, update the runbook, re-test

How encryption limits breach damage:

  • Encrypted data + uncompromised key = affected users may not need to be notified (GDPR Article 34(3)(a)); notifying the authority under Article 33 can still be required unless the breach is unlikely to pose a risk
  • Leaked tokenized data = worthless without the token vault

Aha Moment: Encrypting data properly can turn a catastrophic breach into a security incident that may not require notifying users. That is the real ROI of encryption.


5. DevOps: Hands-on Deployment

5.1 HashiCorp Vault Setup and Usage

Vault Architecture for Production

# vault-values.yaml (Helm chart for Kubernetes)
server:
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      config: |
        storage "raft" {
          path = "/vault/data"
          retry_join {
            leader_api_addr = "https://vault-0.vault-internal:8200"
          }
          retry_join {
            leader_api_addr = "https://vault-1.vault-internal:8200"
          }
          retry_join {
            leader_api_addr = "https://vault-2.vault-internal:8200"
          }
        }
 
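  # Auto-unseal with AWS KMS (no manual unseal needed), configured through
  # Vault's seal environment variables. The Helm chart has no `server.seal`
  # value, so the seal is set here rather than as a separate YAML key.
  extraEnvironmentVars:
    VAULT_SEAL_TYPE: awskms
    VAULT_AWSKMS_SEAL_KEY_ID: "alias/vault-unseal-key"
    AWS_REGION: "ap-southeast-1"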
 
  auditStorage:
    enabled: true
    size: 50Gi
 
ui:
  enabled: true
 
injector:
  enabled: true  # Vault Agent Injector cho Kubernetes pods

Vault Policies (Principle of Least Privilege)

# policy-app-payment.hcl
# Only allow the payment service to read its own secrets
 
path "secret/data/payment/*" {
  capabilities = ["read", "list"]
}
 
path "transit/encrypt/payment-key" {
  capabilities = ["update"]
}
 
path "transit/decrypt/payment-key" {
  capabilities = ["update"]
}
 
# Not allowed (denied by default because no path grants them):
# - Reading other services' secrets
# - Creating/deleting keys
# - Accessing root/admin paths

5.2 AWS KMS Integration

Terraform IaC cho KMS

# kms.tf
# Current account ID, referenced by the key policy below
data "aws_caller_identity" "current" {}

resource "aws_kms_key" "data_encryption" {
  description             = "Customer data encryption key"
  deletion_window_in_days = 30  # Safety: 30-day waiting period before actual deletion
  enable_key_rotation     = true  # Automatic rotation (every 365 days by default)
  multi_region            = false
 
  key_usage      = "ENCRYPT_DECRYPT"
  customer_master_key_spec = "SYMMETRIC_DEFAULT"  # AES-256
 
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowKeyAdmin"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/SecurityAdmin"
        }
        Action   = ["kms:*"]
        Resource = "*"
      },
      {
        Sid    = "AllowAppEncryptDecrypt"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/AppServiceRole"
        }
        Action = [
          "kms:Encrypt",
          "kms:Decrypt",
          "kms:GenerateDataKey",
          "kms:GenerateDataKeyWithoutPlaintext",
          "kms:DescribeKey"
        ]
        Resource = "*"
      }
    ]
  })
 
  tags = {
    Environment      = "production"
    DataClassification = "restricted"
    ManagedBy        = "terraform"
  }
}
 
resource "aws_kms_alias" "data_encryption" {
  name          = "alias/customer-data-key"
  target_key_id = aws_kms_key.data_encryption.key_id
}
 
# CloudWatch alarm for abnormal KMS request volume (possible attack)
# Assumes aws_sns_topic.security_alerts is defined elsewhere
resource "aws_cloudwatch_metric_alarm" "kms_anomaly" {
  alarm_name          = "kms-request-anomaly"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "NumberOfAPIRequests"
  namespace           = "AWS/KMS"
  period              = 300
  statistic           = "Sum"
  threshold           = 10000  # 10K requests within 5 minutes is abnormal
 
  dimensions = {
    KeyId = aws_kms_key.data_encryption.key_id
  }
 
  alarm_actions = [aws_sns_topic.security_alerts.arn]
}

5.3 cert-manager for Kubernetes

# cert-manager-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: security@company.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
 
---
# Automatically issue and renew the TLS certificate
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: production
spec:
  secretName: api-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
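  commonName: api.company.com
  dnsNames:
    - api.company.com
    # Wildcard names such as "*.api.company.com" need a DNS-01 solver;
    # the HTTP-01 solver configured above cannot issue them.
  duration: 2160h    # 90 days
  renewBefore: 720h  # Renew 30 days before expiry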
 
---
# Internal mTLS with a private CA
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  vault:
    path: pki/sign/internal-services
    server: https://vault.vault.svc.cluster.local:8200
    auth:
      kubernetes:
        role: cert-manager
        mountPath: /v1/auth/kubernetes
        secretRef:
          name: vault-token
          key: token
 
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: service-a-mtls
  namespace: production
spec:
  secretName: service-a-mtls-secret
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer
  commonName: service-a.production.svc.cluster.local
  usages:
    - server auth
    - client auth   # mTLS: valid for both server and client auth
  duration: 720h    # 30 days (internal certs rotate faster)
  renewBefore: 240h # Renew 10 days before expiry

5.4 Automated Key Rotation

# CronJob rotate application-level encryption keys
apiVersion: batch/v1
kind: CronJob
metadata:
  name: key-rotation
  namespace: security
spec:
  schedule: "0 2 1 */3 *"  # Every 3 months, at 02:00 on day 1
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: key-rotation-sa
          containers:
            - name: key-rotation
              image: company/key-rotation:latest
              env:
                - name: VAULT_ADDR
                  value: "https://vault.vault.svc.cluster.local:8200"
                - name: KMS_KEY_ALIAS
                  value: "alias/customer-data-key"
                - name: SLACK_WEBHOOK
                  valueFrom:
                    secretKeyRef:  # env vars read Secrets through secretKeyRef
                      name: slack-webhook
                      key: url
              command:
                - /bin/sh
                - -c
                - |
                  set -e  # stop at the first failing step

                  # 1. Rotate the key in the Vault Transit engine
                  vault write -f transit/keys/payment-key/rotate

                  # 2. Re-encrypt the DEKs under the new key version
                  python3 /scripts/reencrypt-deks.py

                  # 3. Raise min_decryption_version only after re-encryption,
                  #    keeping the 3 previous versions (never below 1)
                  latest=$(vault read -field=latest_version transit/keys/payment-key)
                  min=$((latest - 3))
                  if [ "$min" -lt 1 ]; then min=1; fi
                  vault write transit/keys/payment-key/config \
                    min_decryption_version="$min"

                  # 4. Report the result
                  curl -X POST "$SLACK_WEBHOOK" \
                    -H 'Content-Type: application/json' \
                    -d '{"text":"Key rotation completed for payment-key"}'
          restartPolicy: OnFailure

5.5 Data Classification Tooling

# Prometheus rules to detect unencrypted access to sensitive data
groups:
  - name: data_classification_alerts
    rules:
      - alert: UnencryptedPIIAccess
        expr: |
          rate(db_query_total{table=~"users|payments|medical_records",
                              encrypted="false"}[5m]) > 0
        for: 1m
        labels:
          severity: critical
          compliance: "gdpr,pci-dss"
        annotations:
          summary: "Unencrypted access to sensitive table {{ $labels.table }}"
          runbook: "https://wiki.internal/runbooks/data-classification"
 
      - alert: SensitiveDataInLogs
        expr: |
          rate(log_pii_detected_total[5m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "PII detected in application logs"
          action: "Scrub logs immediately, check log sanitization filters"

6. Code Examples

6.1 Python: AES-256-GCM Encryption/Decryption

"""
AES-256-GCM Encryption/Decryption with envelope encryption pattern.
For production use, pair it with proper key management (KMS).
"""
 
import os
import json
import base64
from dataclasses import dataclass
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
 
 
@dataclass
class EncryptedPayload:
    """Holds the ciphertext plus the metadata needed to decrypt it."""
    ciphertext: bytes
    nonce: bytes          # 12 bytes, unique per encryption
    key_id: str           # ID of the KEK that encrypted this DEK
    encrypted_dek: bytes  # DEK encrypted by the KEK
 
    def to_json(self) -> str:
        return json.dumps({
            "ciphertext": base64.b64encode(self.ciphertext).decode(),
            "nonce": base64.b64encode(self.nonce).decode(),
            "key_id": self.key_id,
            "encrypted_dek": base64.b64encode(self.encrypted_dek).decode(),
        })
 
    @classmethod
    def from_json(cls, data: str) -> "EncryptedPayload":
        d = json.loads(data)
        return cls(
            ciphertext=base64.b64decode(d["ciphertext"]),
            nonce=base64.b64decode(d["nonce"]),
            key_id=d["key_id"],
            encrypted_dek=base64.b64decode(d["encrypted_dek"]),
        )
 
 
class EnvelopeEncryptor:
    """
    Envelope encryption:
    1. Generate random DEK
    2. Encrypt the data with the DEK (AES-256-GCM)
    3. Encrypt the DEK with the KEK (from the KMS)
    4. Store the encrypted DEK alongside the ciphertext
    """
 
    def __init__(self, kms_client):
        """
        kms_client: object exposing encrypt_key() and decrypt_key() methods.
        Can be AWS KMS, Vault Transit, or a mock for testing.
        """
        self.kms = kms_client
 
    def encrypt(self, plaintext: bytes, key_id: str,
                associated_data: bytes = None) -> EncryptedPayload:
        # 1. Generate random DEK (256-bit)
        dek = AESGCM.generate_key(bit_length=256)
 
        # 2. Generate random nonce (96-bit, NIST recommended for GCM)
        nonce = os.urandom(12)
 
        # 3. Encrypt the data with the DEK
        aesgcm = AESGCM(dek)
        ciphertext = aesgcm.encrypt(nonce, plaintext, associated_data)
 
        # 4. Encrypt the DEK with the KEK (via the KMS)
        encrypted_dek = self.kms.encrypt_key(dek, key_id)
 
        # 5. Drop the local reference to the DEK as soon as it is not needed.
        # (`del` only removes the name; Python cannot guarantee the bytes are
        # wiped from memory, but this shortens the exposure window.)
        del dek
 
        return EncryptedPayload(
            ciphertext=ciphertext,
            nonce=nonce,
            key_id=key_id,
            encrypted_dek=encrypted_dek,
        )
 
    def decrypt(self, payload: EncryptedPayload,
                associated_data: bytes = None) -> bytes:
        # 1. Decrypt the DEK via the KMS
        dek = self.kms.decrypt_key(payload.encrypted_dek, payload.key_id)
 
        # 2. Decrypt the data with the DEK
        aesgcm = AESGCM(dek)
        plaintext = aesgcm.decrypt(payload.nonce, payload.ciphertext,
                                   associated_data)
 
        del dek
        return plaintext
 
 
# === Usage example ===
class MockKMS:
    """Mock KMS for the demo. In production use AWS KMS or Vault."""
 
    def __init__(self):
        # KEK: in production this key lives inside an HSM/KMS
        self._keys = {
            "key-001": AESGCM.generate_key(bit_length=256),
        }
 
    def encrypt_key(self, dek: bytes, key_id: str) -> bytes:
        kek = self._keys[key_id]
        nonce = os.urandom(12)
        aesgcm = AESGCM(kek)
        return nonce + aesgcm.encrypt(nonce, dek, None)
 
    def decrypt_key(self, encrypted_dek: bytes, key_id: str) -> bytes:
        kek = self._keys[key_id]
        nonce = encrypted_dek[:12]
        ciphertext = encrypted_dek[12:]
        aesgcm = AESGCM(kek)
        return aesgcm.decrypt(nonce, ciphertext, None)
 
 
if __name__ == "__main__":
    kms = MockKMS()
    encryptor = EnvelopeEncryptor(kms)
 
    # Encrypt PII
    user_data = json.dumps({
        "name": "Nguyen Van Hieu",
        "email": "hieu@company.com",
        "phone": "0912345678",
        "cccd": "001234567890"
    }).encode()
 
    # associated_data = context that is not encrypted but is bound to the ciphertext.
    # If associated_data changes, decryption fails, which detects tampering.
    context = b"user_id=12345"
 
    encrypted = encryptor.encrypt(user_data, "key-001",
                                  associated_data=context)
    print(f"Encrypted payload: {encrypted.to_json()[:100]}...")
 
    # Decrypt
    decrypted = encryptor.decrypt(encrypted, associated_data=context)
    print(f"Decrypted: {json.loads(decrypted)}")
 
    # Change the context: decryption must fail (integrity check)
    try:
        encryptor.decrypt(encrypted, associated_data=b"user_id=99999")
    except Exception as e:
        # InvalidTag carries no message, so print its repr
        print(f"Tamper detected! {e!r}")

6.2 Vault Integration (Read/Write Secrets)

"""
HashiCorp Vault integration for application secrets and encryption.
Uses the Vault Transit engine for Encryption as a Service.
"""
 
import hvac
import base64
import os
 
 
class VaultClient:
    """Thin wrapper around the hvac Vault client.
    Retry and caching are not implemented here; add them for production use."""
 
    def __init__(self, vault_addr: str = None, role: str = "app"):
        self.vault_addr = vault_addr or os.getenv("VAULT_ADDR",
                                                   "https://vault:8200")
        self.client = hvac.Client(url=self.vault_addr)
        self._authenticate(role)
 
    def _authenticate(self, role: str):
        """Authenticate with a Kubernetes ServiceAccount (production)
        or a token (development)."""
        token = os.getenv("VAULT_TOKEN")
        if token:
            self.client.token = token
            return
 
        # Kubernetes auth
        jwt_path = "/var/run/secrets/kubernetes.io/serviceaccount/token"
        if os.path.exists(jwt_path):
            with open(jwt_path) as f:
                jwt = f.read()
            self.client.auth.kubernetes.login(role=role, jwt=jwt)
        else:
            raise RuntimeError("No Vault authentication method available")
 
    # --- KV Secrets Engine (Static secrets) ---
 
    def read_secret(self, path: str) -> dict:
        """Read a secret from the KV v2 engine."""
        result = self.client.secrets.kv.v2.read_secret_version(path=path)
        return result["data"]["data"]
 
    def write_secret(self, path: str, data: dict):
        """Write a secret to the KV v2 engine (versioned)."""
        self.client.secrets.kv.v2.create_or_update_secret(
            path=path, secret=data
        )
 
    # --- Transit Engine (Encryption as a Service) ---
 
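    def encrypt(self, key_name: str, plaintext: bytes,
                context: bytes = None) -> str:
        """Encrypt data through the Vault Transit engine.
        The key NEVER leaves Vault; only the ciphertext is returned.
        `context` is used for key derivation, so it only has an effect
        when the Transit key was created with derived=true."""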
        b64_plaintext = base64.b64encode(plaintext).decode()
 
        params = {"plaintext": b64_plaintext}
        if context:
            params["context"] = base64.b64encode(context).decode()
 
        result = self.client.secrets.transit.encrypt_data(
            name=key_name, **params
        )
        return result["data"]["ciphertext"]  # "vault:v1:base64..."
 
    def decrypt(self, key_name: str, ciphertext: str,
                context: bytes = None) -> bytes:
        """Decrypt data through the Vault Transit engine."""
        params = {"ciphertext": ciphertext}
        if context:
            params["context"] = base64.b64encode(context).decode()
 
        result = self.client.secrets.transit.decrypt_data(
            name=key_name, **params
        )
        return base64.b64decode(result["data"]["plaintext"])
 
    def rewrap(self, key_name: str, ciphertext: str,
               context: bytes = None) -> str:
        """Re-encrypt with the latest key version (key rotation).
        Vault decrypts with the old key version and re-encrypts with the new one.
        The plaintext NEVER leaves Vault."""
        params = {"ciphertext": ciphertext}
        if context:
            params["context"] = base64.b64encode(context).decode()
 
        result = self.client.secrets.transit.rewrap_data(
            name=key_name, **params
        )
        return result["data"]["ciphertext"]
 
    # --- Dynamic Database Credentials ---
 
    def get_db_credentials(self, role: str = "app-readonly") -> dict:
        """Fetch dynamic database credentials (revoked automatically after the TTL)."""
        result = self.client.secrets.database.generate_credentials(
            name=role
        )
        return {
            "username": result["data"]["username"],
            "password": result["data"]["password"],
            "ttl": result["lease_duration"],
            "lease_id": result["lease_id"],
        }
 
 
# === Usage example ===
if __name__ == "__main__":
    vault = VaultClient()
 
    # 1. Read the database config from Vault (static secret)
    db_config = vault.read_secret("database/production")
    print(f"DB Host: {db_config['host']}")
 
    # 2. Encrypt PII through the Transit engine
    pii = b'{"email": "hieu@company.com", "phone": "0912345678"}'
    ciphertext = vault.encrypt("pii-key", pii,
                               context=b"user_id=12345")
    print(f"Ciphertext: {ciphertext}")
 
    # 3. Decrypt
    plaintext = vault.decrypt("pii-key", ciphertext,
                              context=b"user_id=12345")
    print(f"Plaintext: {plaintext.decode()}")
 
    # 4. Key rotation: rewrap the existing ciphertext with the newest key version
    new_ciphertext = vault.rewrap("pii-key", ciphertext,
                                  context=b"user_id=12345")
    print(f"Rewrapped: {new_ciphertext}")
 
    # 5. Dynamic DB credentials (expire automatically)
    creds = vault.get_db_credentials("app-readonly")
    print(f"Dynamic DB user: {creds['username']}, TTL: {creds['ttl']}s")

6.3 TLS Certificate Generation Script

#!/bin/bash
# generate-internal-certs.sh
# Creates an internal CA and service certificates for mTLS.
# For development/staging only. In production use cert-manager + Vault PKI.
# Requires OpenSSL 1.1.1+ (for -addext and -ext).
 
set -euo pipefail
 
CERTS_DIR="./certs"
CA_DAYS=3650      # CA valid for 10 years
CERT_DAYS=365     # Service certs valid for 1 year
KEY_SIZE=4096     # RSA key size for the CA
EC_CURVE="prime256v1"  # ECDSA for service certs (faster than RSA)
 
mkdir -p "${CERTS_DIR}"
 
echo "=== 1. Create Root CA ==="
openssl genrsa -out "${CERTS_DIR}/ca-key.pem" ${KEY_SIZE}
openssl req -new -x509 \
    -key "${CERTS_DIR}/ca-key.pem" \
    -out "${CERTS_DIR}/ca-cert.pem" \
    -days ${CA_DAYS} \
    -subj "/C=VN/ST=HCM/O=Company/OU=Security/CN=Internal Root CA" \
    -addext "basicConstraints=critical,CA:TRUE" \
    -addext "keyUsage=critical,keyCertSign,cRLSign"
 
echo "=== 2. Create Service Certificates (ECDSA) ==="
generate_service_cert() {
    local SERVICE_NAME=$1
    local SANS=$2  # Subject Alternative Names
 
    echo "--- Generating cert for ${SERVICE_NAME} ---"
 
    # Generate an ECDSA private key (faster than RSA, smaller key).
    # -noout omits the EC PARAMETERS block so the file holds only the key.
    openssl ecparam -genkey -noout -name ${EC_CURVE} \
        -out "${CERTS_DIR}/${SERVICE_NAME}-key.pem"
 
    # Create CSR
    openssl req -new \
        -key "${CERTS_DIR}/${SERVICE_NAME}-key.pem" \
        -out "${CERTS_DIR}/${SERVICE_NAME}.csr" \
        -subj "/C=VN/ST=HCM/O=Company/OU=${SERVICE_NAME}/CN=${SERVICE_NAME}"
 
    # Sign with the CA and add the SANs
    openssl x509 -req \
        -in "${CERTS_DIR}/${SERVICE_NAME}.csr" \
        -CA "${CERTS_DIR}/ca-cert.pem" \
        -CAkey "${CERTS_DIR}/ca-key.pem" \
        -CAcreateserial \
        -out "${CERTS_DIR}/${SERVICE_NAME}-cert.pem" \
        -days ${CERT_DAYS} \
        -extfile <(cat <<EOF
subjectAltName=${SANS}
keyUsage=critical,digitalSignature
extendedKeyUsage=serverAuth,clientAuth
EOF
)
 
    # Remove the CSR (no longer needed)
    rm -f "${CERTS_DIR}/${SERVICE_NAME}.csr"
 
    # Verify certificate
    openssl verify -CAfile "${CERTS_DIR}/ca-cert.pem" \
        "${CERTS_DIR}/${SERVICE_NAME}-cert.pem"
 
    echo "--- ${SERVICE_NAME} cert OK ---"
}
 
# Create certs for the services
generate_service_cert "api-gateway" \
    "DNS:api-gateway,DNS:api-gateway.production.svc.cluster.local,DNS:localhost,IP:127.0.0.1"
 
generate_service_cert "payment-service" \
    "DNS:payment-service,DNS:payment-service.production.svc.cluster.local"
 
generate_service_cert "user-service" \
    "DNS:user-service,DNS:user-service.production.svc.cluster.local"
 
echo "=== 3. Create Kubernetes Secrets ==="
echo "Run these commands to create K8s secrets:"
for service in api-gateway payment-service user-service; do
    echo "kubectl create secret tls ${service}-tls \\"
    echo "  --cert=${CERTS_DIR}/${service}-cert.pem \\"
    echo "  --key=${CERTS_DIR}/${service}-key.pem \\"
    echo "  -n production"
    echo ""
done
 
echo "=== 4. Certificate Info ==="
for cert in "${CERTS_DIR}"/*-cert.pem; do
    echo "--- $(basename "${cert}") ---"
    openssl x509 -in "${cert}" -noout -subject -dates -ext subjectAltName
    echo ""
done
 
echo "=== Done! ==="
echo "IMPORTANT: In production, use cert-manager + Vault PKI instead of this script."
echo "The CA private key (${CERTS_DIR}/ca-key.pem) must be protected with extreme care!"

6.4 Field-level Encryption in PostgreSQL

"""
Field-level encryption for PostgreSQL.
PII fields are encrypted before they are written to the database.
The database only ever sees ciphertext, so a DBA cannot read the values.
"""
 
import json
import hashlib
from typing import Optional
import asyncpg
from vault_client import VaultClient  # From section 6.2
 
 
class SecureUserRepository:
    """Repository pattern with field-level encryption for PII."""
 
    # Fields to encrypt (Restricted classification)
    ENCRYPTED_FIELDS = {"email", "phone", "cccd", "address"}
    # Fields that must stay searchable (a blind index is stored for each)
    SEARCHABLE_ENCRYPTED_FIELDS = {"email", "phone"}
 
    def __init__(self, db_pool: asyncpg.Pool, vault: VaultClient,
                 transit_key: str = "user-pii-key"):
        self.db = db_pool
        self.vault = vault
        self.transit_key = transit_key
        # Key for the blind index (enables search on encrypted fields)
        self._hmac_key = vault.read_secret(
            "secrets/blind-index-key"
        )["key"].encode()
 
    def _blind_index(self, field_name: str, value: str) -> str:
        """Build a blind index for searching an encrypted field.
        Keyed BLAKE2b over "field_name:normalized_value" gives a deterministic
        MAC that cannot be reversed without the key."""
        return hashlib.blake2b(
            f"{field_name}:{value.lower().strip()}".encode(),
            key=self._hmac_key,
            digest_size=32
        ).hexdigest()
 
    def _encrypt_field(self, value: str, user_id: str) -> str:
        """Encrypt one field with context = user_id (guards against
        ciphertexts being swapped between users)."""
        return self.vault.encrypt(
            self.transit_key,
            value.encode(),
            context=f"user:{user_id}".encode()
        )
 
    def _decrypt_field(self, ciphertext: str, user_id: str) -> str:
        """Decrypt one field."""
        return self.vault.decrypt(
            self.transit_key,
            ciphertext,
            context=f"user:{user_id}".encode()
        ).decode()
 
    async def create_user(self, user_id: str, data: dict) -> None:
        """Create a user with the PII fields encrypted."""
        encrypted_data = {}
        blind_indexes = {}
 
        for key, value in data.items():
            if key in self.ENCRYPTED_FIELDS and value:
                encrypted_data[key] = self._encrypt_field(value, user_id)
                if key in self.SEARCHABLE_ENCRYPTED_FIELDS:
                    blind_indexes[f"{key}_idx"] = self._blind_index(key, value)
            else:
                encrypted_data[key] = value
 
        await self.db.execute("""
            INSERT INTO users (
                id, name, email_encrypted, phone_encrypted,
                cccd_encrypted, address_encrypted,
                email_idx, phone_idx,
                created_at
            ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, NOW())
        """,
            user_id,
            encrypted_data.get("name"),  # name = Internal, not encrypted
            encrypted_data.get("email"),
            encrypted_data.get("phone"),
            encrypted_data.get("cccd"),
            encrypted_data.get("address"),
            blind_indexes.get("email_idx"),
            blind_indexes.get("phone_idx"),
        )
 
    async def get_user(self, user_id: str) -> Optional[dict]:
        """Read a user and decrypt its PII."""
        row = await self.db.fetchrow(
            "SELECT * FROM users WHERE id = $1", user_id
        )
        if not row:
            return None
 
        def decrypt_column(column: str) -> Optional[str]:
            # phone, cccd and address are nullable: skip decryption for NULL
            value = row[column]
            if value is None:
                return None
            return self._decrypt_field(value, user_id)
 
        return {
            "id": row["id"],
            "name": row["name"],
            "email": decrypt_column("email_encrypted"),
            "phone": decrypt_column("phone_encrypted"),
            "cccd": decrypt_column("cccd_encrypted"),
            "address": decrypt_column("address_encrypted"),
            "created_at": str(row["created_at"]),
        }
 
    async def find_by_email(self, email: str) -> Optional[dict]:
        """Find a user by email (uses the blind index, so no rows need to be
        decrypted in order to search)."""
        email_idx = self._blind_index("email", email)
        row = await self.db.fetchrow(
            "SELECT id FROM users WHERE email_idx = $1", email_idx
        )
        if not row:
            return None
        return await self.get_user(row["id"])
 
    async def gdpr_erase_user(self, user_id: str) -> None:
        """GDPR Right to Erasure.
 
        Option 1 (simple): delete the record outright.
        Option 2 (crypto-shredding): delete the user's DEK so the data
        becomes unreadable.
 
        Use Option 1 for database records.
        Use Option 2 for data in backups/logs (which cannot be deleted directly).
        """
        async with self.db.acquire() as conn:
            # The delete and the audit entry are committed atomically
            async with conn.transaction():
                # Delete from the database
                await conn.execute("DELETE FROM users WHERE id = $1", user_id)
 
                # Crypto-shredding is NOT performed here: it needs per-user
                # keys instead of the shared transit key, for example:
                # self.vault.delete_key(f"user-{user_id}-key")
 
                # Log the erasure for the compliance audit; record only the
                # method that was actually applied
                await conn.execute("""
                    INSERT INTO gdpr_erasure_log (user_id, erased_at, method)
                    VALUES ($1, NOW(), 'direct_delete')
                """, user_id)
 
 
# SQL Schema
CREATE_TABLE_SQL = """
CREATE TABLE users (
    id              VARCHAR(36) PRIMARY KEY,
    name            VARCHAR(255),                -- Internal: not encrypted
    email_encrypted TEXT NOT NULL,                -- Restricted: encrypted
    phone_encrypted TEXT,                         -- Restricted: encrypted
    cccd_encrypted  TEXT,                         -- Restricted: encrypted
    address_encrypted TEXT,                       -- Restricted: encrypted
    email_idx       VARCHAR(64),                  -- Blind index for search
    phone_idx       VARCHAR(64),                  -- Blind index for search
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ
);
 
CREATE INDEX idx_users_email ON users(email_idx);
CREATE INDEX idx_users_phone ON users(phone_idx);
 
-- Audit table for GDPR compliance
CREATE TABLE gdpr_erasure_log (
    id         SERIAL PRIMARY KEY,
    user_id    VARCHAR(36) NOT NULL,
    erased_at  TIMESTAMPTZ NOT NULL,
    method     VARCHAR(100) NOT NULL
);
"""

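The blind index used in 6.4 can be tried in isolation. The sketch below assumes a standalone helper named `blind_index` and a hard-coded demo key (both hypothetical; in 6.4 the key comes from Vault). It shows that the index is deterministic after normalization, and that it changes with the field name and with the key.

```python
import hashlib


def blind_index(key: bytes, field_name: str, value: str) -> str:
    """Keyed BLAKE2b over "field:normalized_value"; returns 64 hex characters."""
    normalized = value.lower().strip()
    return hashlib.blake2b(
        f"{field_name}:{normalized}".encode(),
        key=key,
        digest_size=32,
    ).hexdigest()


# Hypothetical demo key; a real key would come from a secret store.
demo_key = b"k" * 32

idx_a = blind_index(demo_key, "email", "Hieu@Company.com ")
idx_b = blind_index(demo_key, "email", "hieu@company.com")
idx_phone = blind_index(demo_key, "phone", "hieu@company.com")
idx_other_key = blind_index(b"x" * 32, "email", "hieu@company.com")

print(idx_a == idx_b)          # same normalized value, same key
print(idx_a == idx_phone)      # the field name is part of the input
print(idx_a == idx_other_key)  # a different key gives a different index
```

The trade-off is that a blind index reveals which rows share the same value, so it fits high-cardinality fields such as email better than low-cardinality ones.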
6.5 Field-level Encryption in MongoDB (CSFLE)

"""
MongoDB Client-Side Field Level Encryption (CSFLE).
The driver encrypts/decrypts automatically, transparently to application code.
Automatic encryption requires MongoDB Enterprise or Atlas.
"""
 
from pymongo import MongoClient
from pymongo.encryption import ClientEncryption, Algorithm
from pymongo.encryption_options import AutoEncryptionOpts
from bson.codec_options import CodecOptions
from bson.binary import STANDARD
import os
 
 
def setup_mongodb_csfle():
    """Set up MongoDB CSFLE with AWS KMS."""
 
    # KMS provider config
    kms_providers = {
        "aws": {
            "accessKeyId": os.getenv("AWS_ACCESS_KEY_ID"),
            "secretAccessKey": os.getenv("AWS_SECRET_ACCESS_KEY"),
        }
    }
 
    # Master key config (CMK in AWS KMS)
    master_key = {
        "region": "ap-southeast-1",
        "key": os.getenv("AWS_KMS_KEY_ARN"),  # ARN of the CMK
    }
 
    # Key vault setup for the Data Encryption Key (DEK); the DEK itself should be created only once
    key_vault_namespace = "encryption.__keyVault"
    key_vault_client = MongoClient(os.getenv("MONGODB_URI"))
 
    client_encryption = ClientEncryption(
        kms_providers=kms_providers,
        key_vault_namespace=key_vault_namespace,
        key_vault_client=key_vault_client,
        codec_options=CodecOptions(uuid_representation=STANDARD),
    )
 
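    # Create the DEK (stored in the key vault collection, encrypted by the AWS KMS CMK).
    # NOTE: as written this runs on every call; in a real deployment create the
    # DEK once at provisioning time and reuse it afterwards.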
    data_key_id = client_encryption.create_data_key(
        "aws", master_key=master_key, key_alt_names=["user-pii-key"]
    )
 
    # Schema map: defines which fields are encrypted and with which algorithm
    json_schema = {
        "bsonType": "object",
        "encryptMetadata": {"keyId": [data_key_id]},
        "properties": {
            "name": {"bsonType": "string"},  # Not encrypted
            "email": {
                "encrypt": {
                    "bsonType": "string",
                    # Deterministic: allows exact-match queries
                    "algorithm": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
                }
            },
            "phone": {
                "encrypt": {
                    "bsonType": "string",
                    # Random: more secure, but the field cannot be queried
                    "algorithm": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random,
                }
            },
            "cccd": {
                "encrypt": {
                    "bsonType": "string",
                    "algorithm": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random,
                }
            },
            "medical_history": {
                "encrypt": {
                    "bsonType": "object",
                    "algorithm": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random,
                }
            },
        },
    }
 
    schema_map = {"mydb.users": json_schema}
 
    # Create the auto-encrypting client
    auto_encryption_opts = AutoEncryptionOpts(
        kms_providers=kms_providers,
        key_vault_namespace=key_vault_namespace,
        schema_map=schema_map,
        # mongocryptd or the crypt_shared library
        crypt_shared_lib_path="/usr/lib/mongo_crypt_v1.so",
    )
 
    # This client AUTOMATICALLY encrypts on write and decrypts on read
    secure_client = MongoClient(
        os.getenv("MONGODB_URI"),
        auto_encryption_opts=auto_encryption_opts,
    )
 
    return secure_client
 
 
if __name__ == "__main__":
    client = setup_mongodb_csfle()
    db = client["mydb"]
    users = db["users"]
 
    # Insert: email, phone and cccd are encrypted AUTOMATICALLY before being sent to MongoDB
    users.insert_one({
        "name": "Nguyen Van Hieu",     # Plaintext (not encrypted)
        "email": "hieu@company.com",   # Auto-encrypted (deterministic)
        "phone": "0912345678",         # Auto-encrypted (random)
        "cccd": "001234567890",        # Auto-encrypted (random)
    })
 
    # Find by email (deterministic encryption allows exact match)
    user = users.find_one({"email": "hieu@company.com"})
    print(f"Found: {user['name']}, {user['email']}")
    # Output: Found: Nguyen Van Hieu, hieu@company.com
    # (decrypted automatically)
 
    # When connecting with a client WITHOUT auto-encryption:
    plain_client = MongoClient(os.getenv("MONGODB_URI"))
    raw = plain_client["mydb"]["users"].find_one({"name": "Nguyen Van Hieu"})
    print(f"Raw email: {raw['email']}")
    # Output: Raw email: Binary(b'...', 6)  <-- ciphertext (BSON binary subtype 6), unreadable

7. Mermaid Diagrams

7.1 Envelope Encryption Flow

sequenceDiagram
    participant App as Application
    participant KMS as KMS (AWS/Vault)
    participant Store as Database/S3

    Note over App,Store: === ENCRYPTION FLOW ===

    App->>KMS: GenerateDataKey(KeyId="master-key")
    KMS-->>App: {PlaintextDEK, EncryptedDEK}

    Note over App: Encrypt data with PlaintextDEK (AES-256-GCM)

    App->>App: ciphertext = AES-GCM(PlaintextDEK, plaintext)
    App->>App: Wipe PlaintextDEK from memory

    App->>Store: Store {ciphertext + EncryptedDEK + nonce}

    Note over App,Store: === DECRYPTION FLOW ===

    App->>Store: Read {ciphertext + EncryptedDEK + nonce}
    Store-->>App: {ciphertext, EncryptedDEK, nonce}

    App->>KMS: Decrypt(EncryptedDEK)
    KMS-->>App: PlaintextDEK

    App->>App: plaintext = AES-GCM-Decrypt(PlaintextDEK, ciphertext)
    App->>App: Wipe PlaintextDEK from memory

    Note over App,Store: === KEY ROTATION ===

    App->>Store: Read EncryptedDEK (encrypted by KEK-v1)
    App->>KMS: ReEncrypt(EncryptedDEK, NewKeyId="KEK-v2")
    KMS-->>App: NewEncryptedDEK (encrypted by KEK-v2)
    App->>Store: Update EncryptedDEK (ciphertext unchanged)

    Note over App: Data does NOT need re-encryption.<br/>Only the DEK is re-encrypted (tens of bytes)
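
The key rotation step in the diagram above can be sketched locally. This is a simulation under stated assumptions: the helpers `wrap`, `unwrap` and `rewrap` are hypothetical stand-ins for KMS operations, and both KEKs are held in process memory, whereas a real KMS performs the re-encryption internally so the plaintext DEK never reaches the application.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def wrap(kek: bytes, dek: bytes) -> bytes:
    """Encrypt a DEK under a KEK; the 12-byte nonce is prepended."""
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, dek, None)


def unwrap(kek: bytes, wrapped: bytes) -> bytes:
    """Recover the DEK from nonce + ciphertext."""
    return AESGCM(kek).decrypt(wrapped[:12], wrapped[12:], None)


def rewrap(old_kek: bytes, new_kek: bytes, wrapped: bytes) -> bytes:
    """Stand-in for a KMS ReEncrypt call: only the DEK is touched."""
    return wrap(new_kek, unwrap(old_kek, wrapped))


kek_v1 = AESGCM.generate_key(bit_length=256)
kek_v2 = AESGCM.generate_key(bit_length=256)
dek = AESGCM.generate_key(bit_length=256)

# Encrypt a large record once with the DEK.
record = b"large record " * 1000
data_nonce = os.urandom(12)
ciphertext = AESGCM(dek).encrypt(data_nonce, record, None)

# Rotation: rewrap the 32-byte DEK, leave the ciphertext untouched.
wrapped_v1 = wrap(kek_v1, dek)
wrapped_v2 = rewrap(kek_v1, kek_v2, wrapped_v1)

recovered = AESGCM(unwrap(kek_v2, wrapped_v2)).decrypt(data_nonce, ciphertext, None)
print(len(ciphertext), "bytes of data untouched;", len(wrapped_v2), "bytes rewrapped")
```

The wrapped DEK is 60 bytes here (12-byte nonce, 32-byte key, 16-byte tag), which is why rotating the KEK stays cheap no matter how much data the DEK protects.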

7.2 KMS Architecture

flowchart TB
    subgraph "Applications"
        A1[Payment Service]
        A2[User Service]
        A3[Analytics Service]
    end

    subgraph "Key Management Layer"
        direction TB
        V[HashiCorp Vault Cluster]
        V --> VT[Transit Engine<br/>Encryption as a Service]
        V --> VK[KV Engine<br/>Static Secrets]
        V --> VP[PKI Engine<br/>Certificate Authority]
        V --> VD[Database Engine<br/>Dynamic Credentials]
    end

    subgraph "Cloud KMS"
        KMS[AWS KMS]
        KMS --> CMK1["CMK: vault-unseal<br/>(Auto-unseal Vault)"]
        KMS --> CMK2["CMK: s3-encryption<br/>(S3 SSE-KMS)"]
        KMS --> CMK3["CMK: rds-encryption<br/>(RDS storage encryption)"]
    end

    subgraph "HSM Layer"
        HSM[CloudHSM Cluster]
        HSM --> HSMK1["Root CA Key<br/>(never extracted)"]
        HSM --> HSMK2["Master Signing Key"]
    end

    subgraph "Storage"
        S3[S3 Buckets<br/>SSE-KMS encrypted]
        RDS[RDS PostgreSQL<br/>storage encryption enabled]
        Mongo[MongoDB Atlas<br/>CSFLE enabled]
    end

    A1 --> VT
    A1 --> VD
    A2 --> VT
    A2 --> VK
    A3 --> KMS

    V --> KMS
    KMS --> HSM
    VP --> HSM

    CMK2 --> S3
    CMK3 --> RDS
    A2 --> Mongo

    style HSM fill:#ff6b6b,stroke:#333,stroke-width:2px,color:#fff
    style V fill:#7950f2,stroke:#333,stroke-width:2px,color:#fff
    style KMS fill:#f9a825,stroke:#333,stroke-width:2px

7.3 Data Classification Decision Tree

flowchart TD
    Start["What is this data?"] --> Q1{"Contains personally<br/>identifiable information<br/>(PII/PHI)?"}

    Q1 -->|Yes| Q2{"Which kind of PII?"}
    Q1 -->|No| Q3{"Internal<br/>or public data?"}

    Q2 -->|"Credit card,<br/>medical, biometric"| R["RESTRICTED<br/>🔴"]
    Q2 -->|"Email, phone,<br/>name, address"| C["CONFIDENTIAL<br/>🟠"]

    Q3 -->|Public| PUB["PUBLIC<br/>🟢"]
    Q3 -->|Internal| INT["INTERNAL<br/>🟡"]

    R --> R_ACT["Actions:<br/>- Field-level encryption<br/>- Audit every access<br/>- Key per tenant<br/>- Tokenization<br/>- PCI-DSS/HIPAA compliance<br/>- 7-year audit retention"]

    C --> C_ACT["Actions:<br/>- Encrypt at rest + transit<br/>- Column-level encryption<br/>- Access control (RBAC)<br/>- Audit log<br/>- GDPR compliance<br/>- Data masking for non-prod"]

    INT --> INT_ACT["Actions:<br/>- Encrypt in transit (TLS)<br/>- Basic access control<br/>- Standard logging"]

    PUB --> PUB_ACT["Actions:<br/>- Integrity check (signing)<br/>- CDN caching OK<br/>- No encryption needed"]

    style R fill:#ff6b6b,stroke:#333,color:#fff
    style C fill:#ff922b,stroke:#333,color:#fff
    style INT fill:#ffd43b,stroke:#333
    style PUB fill:#51cf66,stroke:#333

8. Aha Moments & Pitfalls

Aha Moments

#1 - Encrypting Everything vs Encrypting Smart: Encrypting the whole database with TDE is simple, but it does not stop a DBA from reading the data. Field-level encryption takes more work but protects better. Classify the data first, then choose the encryption level that fits.

#2 - Key management is harder than encryption: AES-256 is a “solved problem”: every library ships it. But who holds the key? Where is it stored? How is it rotated? How is it revoked? How is it backed up? That is where most of the difficulty of encryption lies. Encryption without key management is effectively no encryption.

#3 - GDPR Right to Erasure with Encrypted Data: There is no need to delete every record from every backup, log, and Kafka topic. Deleting the encryption key (crypto-shredding) is enough, provided each user's data is encrypted under its own key. The data still exists but can never be read again. For complex systems this is the most elegant answer to the “right to be forgotten”.
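
A minimal sketch of crypto-shredding, assuming a hypothetical in-memory `PerUserKeyStore` with one DEK per user (a real system would keep these keys in a KMS or Vault). Once the key is removed, every copy of the ciphertext, including copies sitting in backups, can no longer be decrypted.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class PerUserKeyStore:
    """Hypothetical in-memory key store holding one DEK per user."""

    def __init__(self):
        self._keys = {}

    def get_or_create(self, user_id: str) -> bytes:
        if user_id not in self._keys:
            self._keys[user_id] = AESGCM.generate_key(bit_length=256)
        return self._keys[user_id]

    def get(self, user_id: str) -> bytes:
        return self._keys[user_id]  # KeyError once the key has been shredded

    def shred(self, user_id: str) -> None:
        self._keys.pop(user_id, None)


def encrypt_for(store: PerUserKeyStore, user_id: str, plaintext: bytes) -> bytes:
    nonce = os.urandom(12)
    key = store.get_or_create(user_id)
    # user_id is bound as associated data so records cannot be swapped
    return nonce + AESGCM(key).encrypt(nonce, plaintext, user_id.encode())


def decrypt_for(store: PerUserKeyStore, user_id: str, blob: bytes) -> bytes:
    key = store.get(user_id)
    return AESGCM(key).decrypt(blob[:12], blob[12:], user_id.encode())


store = PerUserKeyStore()
record = encrypt_for(store, "user-12345", b"hieu@company.com")
backup_copy = bytes(record)  # stands in for copies in backups, logs, topics

before_erasure = decrypt_for(store, "user-12345", backup_copy)

store.shred("user-12345")  # the erasure request: delete only the key
try:
    decrypt_for(store, "user-12345", backup_copy)
    readable_after_erasure = True
except KeyError:
    readable_after_erasure = False

print(before_erasure, readable_after_erasure)
```

The design only works if the key store itself is not backed up in a way that lets the shredded key be restored.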

#4 - The audit log outgrows the data: In compliance-heavy systems the audit log is often larger than the primary data, commonly by 5-10x. Include it in the storage estimate and plan a tiered storage strategy (hot → warm → cold → archive).
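
A back-of-the-envelope sketch of that estimate. Every workload number here is an assumption chosen for illustration (10 million audit events per day, 1 KB per event, and the tier boundaries); only the 7-year retention comes from the notes.

```python
# All workload numbers below are hypothetical; only the 7-year retention
# comes from the notes.
EVENTS_PER_DAY = 10_000_000
BYTES_PER_EVENT = 1_000
RETENTION_DAYS = 7 * 365

# (tier name, days an event spends in that tier); the split is an assumption.
TIERS = [("hot", 30), ("warm", 335), ("cold/archive", RETENTION_DAYS - 365)]


def tier_sizes_tb(events_per_day: int, bytes_per_event: int, tiers) -> dict:
    """Steady-state size of each tier in terabytes (1 TB = 10**12 bytes)."""
    daily_bytes = events_per_day * bytes_per_event
    return {name: daily_bytes * days / 10**12 for name, days in tiers}


sizes = tier_sizes_tb(EVENTS_PER_DAY, BYTES_PER_EVENT, TIERS)
total_tb = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name:>12}: {size:6.2f} TB")
print(f"{'total':>12}: {total_tb:6.2f} TB")
```

Under these assumptions the log grows by 10 GB per day and most of the volume sits in the cold tier, which is where cheaper storage pays off.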

#5 - Tokenization reduces PCI scope: Encrypted card numbers are still in PCI scope. Tokenization replaces the card number with a token, so systems that only handle tokens can be taken out of scope and only the Token Vault must be PCI compliant. This cuts audit and compliance cost significantly.
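
A minimal sketch of the idea, assuming a hypothetical in-memory `TokenVault` class. A real token vault is a separate, hardened service with its own encrypted storage; the point here is only that the token is random and carries no information about the card number.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault (the only PCI-scoped component)."""

    def __init__(self):
        self._token_to_pan = {}
        self._pan_to_token = {}

    def tokenize(self, pan: str) -> str:
        """Return a random token for the card number, reusing an existing one."""
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        token = "tok_" + secrets.token_urlsafe(16)
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        """Only callable inside the PCI-scoped boundary."""
        return self._token_to_pan[token]


def mask(pan: str) -> str:
    """Display form: keep only the last four digits."""
    return "*" * (len(pan) - 4) + pan[-4:]


vault = TokenVault()
pan = "4111111111111111"  # well-known test card number

token = vault.tokenize(pan)
# Downstream services store and pass around only the token.
order = {"order_id": 1, "card_token": token, "card_display": mask(pan)}

print(order["card_display"])
print(vault.detokenize(order["card_token"]) == pan)
```

Unlike ciphertext, the token cannot be reversed with any key; the mapping exists only inside the vault.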

Pitfalls

Pitfall #1 - Encrypting everything with a single key: One key for the entire database means one leaked key exposes all the data. Use envelope encryption with a DEK per record or per tenant.

Pitfall #2 - Storing the encryption key next to the encrypted data: “I keep the key in a config file on the same server as the database.” An attacker who takes the server gets both the data and the key. The key MUST live separately (KMS/Vault/HSM).

Pitfall #3 - Forgetting to encrypt backups: Production is encrypted properly, but the backup files on S3 are not. An attacker only needs access to the backups to get all the data in plaintext. Every backup must be encrypted, and the backup encryption key must differ from the production key.

Pitfall #4 - Key rotation that deletes the old key: The key is rotated and the old key is deleted immediately, so everything encrypted under the old key can no longer be decrypted. Always keep old keys for at least the retention period of the data they protect. With automatic rotation, AWS KMS keeps all previous key versions.
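
A small sketch of why old versions must be kept, assuming a hypothetical `VersionedKeyRing` class that tags each ciphertext with the key version used. Rotation only changes which version encrypts new data; decryption looks the version up, and fails for good once that version is deleted.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class VersionedKeyRing:
    """Hypothetical key ring: new data uses the latest version, old data its own."""

    def __init__(self):
        self._versions = {}
        self._current = 0

    def rotate(self) -> int:
        """Add a new key version and make it current; old versions stay."""
        self._current += 1
        self._versions[self._current] = AESGCM.generate_key(bit_length=256)
        return self._current

    def encrypt(self, plaintext: bytes) -> bytes:
        nonce = os.urandom(12)
        key = self._versions[self._current]
        # Layout: 4-byte version | 12-byte nonce | ciphertext + tag
        return (self._current.to_bytes(4, "big") + nonce
                + AESGCM(key).encrypt(nonce, plaintext, None))

    def decrypt(self, blob: bytes) -> bytes:
        version = int.from_bytes(blob[:4], "big")
        key = self._versions[version]  # KeyError if that version was deleted
        return AESGCM(key).decrypt(blob[4:16], blob[16:], None)

    def delete_version(self, version: int) -> None:
        del self._versions[version]


ring = VersionedKeyRing()
ring.rotate()                       # version 1
old_blob = ring.encrypt(b"written before rotation")
ring.rotate()                       # version 2
new_blob = ring.encrypt(b"written after rotation")

old_ok = ring.decrypt(old_blob)     # still works: version 1 was kept
new_ok = ring.decrypt(new_blob)

ring.delete_version(1)              # the pitfall
try:
    ring.decrypt(old_blob)
    lost = False
except KeyError:
    lost = True
print(old_ok, new_ok, lost)
```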

Pitfall #5 - Using ECB mode: AES-ECB uses no IV, so identical plaintext blocks produce identical ciphertext blocks and patterns in the data remain visible (the famous “ECB penguin”). Always use GCM, or CTR with an HMAC.
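
The leak is easy to reproduce. The sketch below encrypts two identical 16-byte blocks with the `cryptography` library, once in ECB and once in GCM; the helper name `ecb_encrypt` is hypothetical, and ECB appears here only to demonstrate the problem.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = os.urandom(32)
plaintext = b"A" * 32  # two identical 16-byte blocks


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """AES-ECB, for demonstration only. Do not use in real systems."""
    encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return encryptor.update(data) + encryptor.finalize()


ecb_ct = ecb_encrypt(key, plaintext)
ecb_blocks_equal = ecb_ct[:16] == ecb_ct[16:32]

gcm_ct = AESGCM(key).encrypt(os.urandom(12), plaintext, None)
gcm_blocks_equal = gcm_ct[:16] == gcm_ct[16:32]

print("ECB repeats blocks:", ecb_blocks_equal)
print("GCM repeats blocks:", gcm_blocks_equal)
```

GCM encrypts each block with a different counter value, so equal plaintext blocks never map to equal ciphertext blocks within one message.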

Pitfall #6 - Never testing a restore from an encrypted backup: The team configures backup encryption but never tests a restore. When a real restore is needed, they discover the key was rotated and the old version deleted. Test restores regularly, at least once a quarter.

Pitfall #7 - Logs containing PII: The application logs a line such as "INFO: User hieu@company.com logged in from 192.168.1.1". The email is PII, and the IP address counts as personal data under GDPR. If the logs are accessed, that is a data breach. Sanitize PII in logs, or encrypt log entries that contain PII.
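
One way to sanitize is a logging filter that redacts before anything is written. The sketch below uses only the standard library; the class name `PiiRedactionFilter` and the two regular expressions are assumptions that cover emails and IPv4 addresses only, so a real deployment would need a broader pattern set.

```python
import logging
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(text: str) -> str:
    """Replace emails and IPv4 addresses with fixed placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return IPV4_RE.sub("[IP]", text)


class PiiRedactionFilter(logging.Filter):
    """Redacts the fully formatted message, so PII passed as args is caught too."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted
        return True


handler = logging.StreamHandler()
handler.addFilter(PiiRedactionFilter())  # attach to the handler, not one logger
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("User %s logged in from %s", "hieu@company.com", "192.168.1.1")

sample = redact("User hieu@company.com logged in from 192.168.1.1")
print(sample)
```

Attaching the filter to the handler rather than to a single logger means records propagated from child loggers are redacted as well.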

Pitfall #8 - GDPR right to erasure for backups: A user requests erasure. The team deletes the data from the database but forgets it still lives in 30 backups. With crypto-shredding (a per-user DEK), deleting the key is enough. Without per-user keys, every backup has to be restored and rewritten.


Prerequisite

Directly related

Will use this knowledge

References

  • Alex Xu, System Design Interview: Chapter 9, Design a Web Crawler (HTTPS/TLS); Chapter 12, Design a Chat System (E2E encryption)
  • NIST SP 800-57: Recommendation for Key Management
  • OWASP Cryptographic Failures (Top 10 #2)
  • AWS Well-Architected Framework — Security Pillar: Data Protection
  • GDPR Articles 5, 17, 20, 25, 32, 33, 34
  • PCI-DSS v4.0 Requirements 3, 4, 10
  • HashiCorp Vault Documentation: Transit Secrets Engine
  • MongoDB Client-Side Field Level Encryption Documentation

Previous week: Tuan-14-AuthN-AuthZ-Security (Authentication & Authorization). Next week: Tuan-16-Design-URL-Shortener (applying everything learned to a real-world problem)