Bonus Week: Platform Engineering & Internal Developer Platform (IDP)
“Year 1: every dev sets up K8s, CI/CD, and monitoring on their own → 2 weeks to ship the first feature. Year 2: an IDP with golden paths → a new dev ships a feature in 2 hours. That is ‘platform as a product’. Platform Engineering is not ‘the new DevOps’; it is the discipline of designing DevEx at scale.”
Tags: system-design platform-engineering idp backstage devex bonus
Student: Hieu (Backend Dev → Architect)
Prerequisite: Tuan-11-Microservices-Pattern · Tuan-12-CICD-Pipeline
Related: Tuan-13-Monitoring-Observability · Tuan-Bonus-FinOps-Cloud-Unit-Economics
1. Context & Why
Everyday analogy: the self-sufficient industrial park
Hieu, imagine you are launching 100 small startups in the same industrial park. There are 2 models:
Model 1: every startup fends for itself:
- Startup A arranges its own electricity, water, internet, and security guard
- Startup B does the same
- 100 startups × 5 setup tasks × 2 weeks = 1,000 weeks just to get started
- Every startup spends money and time on “non-core” work
Model 2: the industrial park provides shared services:
- The park has already set up electricity, water, internet, security, and fire safety
- A startup only needs to “register, sign the contract, plug in” → up and running in 1 day
- A “self-service catalog” for every utility
- 100 startups × 1 day = 100 days in total
This is exactly what Platform Engineering is: building the “industrial park” for dev teams. The Internal Developer Platform (IDP) is that self-service portal.
Why does a Backend Dev need to understand Platform Engineering?
| Reason | Implication |
|---|---|
| 53% of organizations use an IDP (Port 2025 report) | Industry standard; not adopting means falling behind |
| DevEx = retention | Bad DevEx → engineer churn ($150K/hire) |
| Cognitive load | Backend devs should not need detailed knowledge of K8s, Terraform, Prometheus |
| Cost vs value | Investing in a platform → 2-5x team productivity |
| Career path | “Platform Engineer” is a hot role for 2024-2026 |
Why doesn't Alex Xu cover this?
Alex Xu Vol 1+2 discuss CI/CD and K8s but do not cover the operating model of infrastructure. Platform Engineering is an organizational pattern, not a tool, which leaves a gap at the architect level.
Key references
- Team Topologies (Skelton & Pais, 2nd ed 2024) — https://teamtopologies.com/
- Platform Engineering (Camille Fournier et al. 2024)
- Backstage docs — https://backstage.io/docs/
- Port 2025 State of IDP — https://www.port.io/state-of-internal-developer-portals
- CNCF Platforms WG — https://github.com/cncf/tag-app-delivery/blob/main/platforms-whitepaper/
2. Deep Dive — Khái niệm cốt lõi
2.1 Team Topologies — Foundation
Team Topologies (Skelton & Pais 2019) defines 4 team types:
| Team type | Role |
|---|---|
| Stream-aligned | Customer-facing, ship features (most teams) |
| Platform | Provide internal capabilities for stream teams |
| Enabling | Coach stream teams on new tech (temporary) |
| Complicated subsystem | Specialized expertise (e.g., ML, cryptography) |
Interaction modes:
- X-as-a-Service: Platform team provides service, stream consumes (most common)
- Collaboration: Two teams work together (limited duration)
- Facilitating: Enabling team helps stream team adopt new tech
2.2 Platform as a Product
Critical mindset shift: Internal platform = product, dev teams = customers.
Product disciplines apply:
- User research: Interview devs, identify pain points
- Roadmap: Prioritize features by impact
- Adoption metrics: Are devs using the platform?
- NPS / CSAT: Are devs happy?
- Iterate: Continuous improvement
Anti-pattern: “Build it and they will come”, or forcing devs onto the platform → resentment, shadow IT.
Right mindset: “Make the right way the easy way” → devs want to use platform.
2.3 Golden Paths
Golden Path = opinionated, well-supported way to do common task.
Example: “Deploy a new microservice”
Without golden path (10 days):
1. Decide language (Go? Python? Node?)
2. Setup repo, CI, linters
3. Write Dockerfile from scratch
4. Setup K8s manifests
5. Configure ingress, certs
6. Setup monitoring (Prometheus scrape, dashboards)
7. Setup logging (Loki, log format)
8. Setup tracing (OpenTelemetry SDK)
9. Setup secrets (Vault integration)
10. Setup CI/CD pipeline
11. Code review process
12. Deploy to staging, prod
With golden path (1 day):
$ idp create-service --template=python-api --name=my-service
→ Repo created with template (Dockerfile, helm chart, monitoring)
→ CI/CD pipeline auto-configured
→ Service registered in catalog
→ Developer just writes business logic
Key principles:
- Opinionated: Strong defaults (specific language, framework)
- Paved: Well-supported (docs, on-call, examples)
- Optional: Devs can deviate if needed (but harder)
- Versioned: v1 → v2 with migration path
Common golden paths:
- New microservice
- New frontend app
- New data pipeline
- New ML model serving
- Database migration
2.4 Internal Developer Platform (IDP) Components
┌─────────────────────────────────────────────────────┐
│                 Developer Portal UI                 │
│              (Backstage, Port, Cortex)              │
│                                                     │
│  - Service Catalog                                  │
│  - TechDocs                                         │
│  - Software Templates (scaffolder)                  │
│  - Plugins (CI status, on-call, costs)              │
└──────────────────────┬──────────────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
   ┌────────┐     ┌─────────┐    ┌─────────┐
   │Source  │     │ CI/CD   │    │Infra as │
   │Control │     │         │    │Code     │
   │GitHub/ │     │ Argo,   │    │ Terra-  │
   │GitLab  │     │ Jenkins │    │ form,   │
   │        │     │         │    │ Cross-  │
   │        │     │         │    │ plane   │
   └────────┘     └─────────┘    └─────────┘
      ┌────────────────────────────────────────┐
      │       Underlying Infrastructure        │
      │      K8s, Cloud (AWS/GCP/Azure),       │
      │        DBs, Monitoring, Logging        │
      └────────────────────────────────────────┘
2.4.1 Service Catalog
What every dev team needs: “What services exist? Who owns them? What do they depend on?”
Backstage Catalog:
# catalog-info.yaml (committed to the repo)
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  description: Handles payment processing
  annotations:
    backstage.io/techdocs-ref: dir:.
    pagerduty.com/integration-key: PE-12345
    prometheus.io/dashboard: https://grafana/d/payment
spec:
  type: service
  lifecycle: production
  owner: team-payments
  system: checkout
  dependsOn:
    - resource:postgres-payments
    - component:fraud-detection
  providesApis:
    - payment-api-v1
Auto-discovery: Backstage scans repos for catalog-info.yaml files.
Visualizations: Service dependency graph, ownership map.
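To make the dependency graph concrete, here is a minimal Python sketch of the kind of question a catalog answers: "if this service breaks, who is affected and which owners should be told?". The entity list is hypothetical and hand-written (only `payment-service` comes from the example above; `checkout-web` and `team-checkout` are invented for illustration), and this is not how Backstage implements its graph internally.

```python
from collections import defaultdict

# Hypothetical, already-parsed catalog entities (trimmed to the fields used here).
ENTITIES = [
    {"name": "payment-service", "owner": "team-payments",
     "dependsOn": ["resource:postgres-payments", "component:fraud-detection"]},
    {"name": "fraud-detection", "owner": "team-fraud", "dependsOn": []},
    {"name": "checkout-web", "owner": "team-checkout",
     "dependsOn": ["component:payment-service"]},
]


def build_reverse_deps(entities):
    """Map each component name to the set of components that depend on it directly."""
    reverse = defaultdict(set)
    for entity in entities:
        for ref in entity["dependsOn"]:
            kind, _, target = ref.partition(":")
            if kind == "component":  # resources (databases, queues) are skipped here
                reverse[target].add(entity["name"])
    return reverse


def impacted_by(name, entities):
    """Return every component that depends on `name`, directly or transitively."""
    reverse = build_reverse_deps(entities)
    seen, stack = set(), [name]
    while stack:
        current = stack.pop()
        for dependent in reverse.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen


def owners_to_notify(name, entities):
    """Owners of all components impacted by an incident in `name`, sorted by name."""
    owner_of = {e["name"]: e["owner"] for e in entities}
    return sorted({owner_of[n] for n in impacted_by(name, entities)})


if __name__ == "__main__":
    print(sorted(impacted_by("fraud-detection", ENTITIES)))
    print(owners_to_notify("fraud-detection", ENTITIES))
```

The reverse index is the useful part: `dependsOn` is declared by the consumer, but during an incident you usually need to walk the edges in the other direction.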
2.4.2 TechDocs
Documentation lives with the code, not in a separate wiki.
my-service/
├── catalog-info.yaml
├── mkdocs.yml
├── docs/
│ ├── index.md
│ ├── architecture.md
│ ├── runbook.md
│ └── api.md
└── src/
Backstage TechDocs plugin auto-renders Markdown → searchable docs site.
Benefit: Docs versioned with code. Update code → update docs in same PR.
2.4.3 Software Templates (Scaffolder)
Self-service service creation.
# template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: python-microservice
  title: Python Microservice
  description: Create new Python service with FastAPI
spec:
  parameters:
    - title: Basic info
      properties:
        name:
          type: string
          title: Service name
        description:
          type: string
        owner:
          type: string
          ui:field: OwnerPicker
  steps:
    - id: fetch-template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
    - id: publish
      action: publish:github
      input:
        repoUrl: github.com?repo=${{ parameters.name }}&owner=myorg
        defaultBranch: main
    - id: register
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
Result: Dev clicks button → repo created with template, registered in catalog, CI configured.
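The core idea behind a template is a sequential pipeline in which each step can read the form parameters and the outputs of earlier steps. The Python sketch below imitates that idea only; it is not the Backstage scaffolder. The actions are stand-ins that compute strings instead of creating anything, `repoUrl` is simplified to a plain path, and all function names are invented for this example.

```python
import re

# Matches references such as ${{ parameters.name }} or ${{ steps.publish.output.repoContentsUrl }}.
_REF = re.compile(r"\$\{\{\s*([\w.]+)\s*\}\}")


def resolve(value, context):
    """Replace ${{ a.b.c }} references in a string by walking the nested context dict."""
    def lookup(match):
        node = context
        for key in match.group(1).split("."):
            node = node[key]
        return str(node)
    return _REF.sub(lookup, value)


def run_template(steps, parameters, actions):
    """Run steps in order; each step's output becomes visible to later steps."""
    context = {"parameters": parameters, "steps": {}}
    for step in steps:
        resolved = {k: resolve(v, context) for k, v in step["input"].items()}
        output = actions[step["action"]](resolved)
        context["steps"][step["id"]] = {"output": output}
    return context["steps"]


# Stand-in actions: they only compute values, no repository is actually created.
def fake_publish(inp):
    return {"repoContentsUrl": f"https://{inp['repoUrl']}/blob/main"}


def fake_register(inp):
    return {"catalogInfoUrl": inp["repoContentsUrl"] + "/catalog-info.yaml"}


ACTIONS = {"publish:github": fake_publish, "catalog:register": fake_register}

STEPS = [
    {"id": "publish", "action": "publish:github",
     "input": {"repoUrl": "github.com/myorg/${{ parameters.name }}"}},
    {"id": "register", "action": "catalog:register",
     "input": {"repoContentsUrl": "${{ steps.publish.output.repoContentsUrl }}"}},
]

if __name__ == "__main__":
    result = run_template(STEPS, {"name": "my-service"}, ACTIONS)
    print(result["register"]["output"]["catalogInfoUrl"])
```

Because references are resolved just before each step runs, a step can only see outputs of steps that come before it, which is why step order matters in a template.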
2.4.4 Crossplane / OpenTofu — Infrastructure as platform service
Crossplane: Provision cloud resources via K8s CRDs.
# Postgres database via Crossplane composition
apiVersion: db.example.org/v1alpha1
kind: PostgreSQLDatabase
metadata:
  name: payments-db
spec:
  parameters:
    storageGB: 100
    tier: production
    region: us-east-1
  compositionSelector:
    matchLabels:
      provider: aws
      tier: production
Behind the scenes: Crossplane creates RDS instance, security group, secret in K8s.
Why? Devs use familiar K8s YAML, platform team controls compositions.
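One way to picture the split of responsibilities: the dev supplies a few high-level parameters, and the platform team's composition translates them into provider-specific settings. The Python sketch below mimics that translation for the claim above. The tier mapping and base defaults mirror the composition in section 6.3; the function name and the returned dict shape are assumptions made for illustration, not Crossplane's API.

```python
# Tier-to-instance-class mapping, owned by the platform team (values from section 6.3).
TIER_TO_INSTANCE_CLASS = {
    "small": "db.t3.small",
    "medium": "db.t3.medium",
    "production": "db.r6i.xlarge",
}

# The claim a developer writes: only high-level intent, no AWS details.
CLAIM = {
    "kind": "PostgreSQLDatabase",
    "metadata": {"name": "payments-db"},
    "spec": {"parameters": {"storageGB": 100, "tier": "production", "region": "us-east-1"}},
}


def compose_rds_spec(claim):
    """Translate claim parameters into provider-level settings, as a composition patch would."""
    params = claim["spec"]["parameters"]
    tier = params["tier"]
    if tier not in TIER_TO_INSTANCE_CLASS:
        raise ValueError(f"unsupported tier: {tier!r}")
    return {
        # Fixed defaults chosen by the platform team.
        "engine": "postgres",
        "multiAZ": True,
        "deletionProtection": True,
        # Values patched in from the developer's claim.
        "allocatedStorage": params["storageGB"],
        "dbInstanceClass": TIER_TO_INSTANCE_CLASS[tier],
    }


if __name__ == "__main__":
    print(compose_rds_spec(CLAIM))
```

Rejecting unknown tiers is the point of the design: devs choose from a short menu, and the platform team can change what each tier means without touching any claim.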
2.5 Backstage vs Port vs Cortex vs OpsLevel
| Tool | Origin | Strengths | Best for |
|---|---|---|---|
| Backstage | Spotify (open source) | Most flexible, plugin ecosystem | Engineering-heavy orgs |
| Port | Israeli startup | No-code, fast time-to-value | Mid-size, less custom |
| Cortex | US startup | Service quality scorecards | Quality-focused orgs |
| OpsLevel | Canadian startup | DevEx maturity model | Ops-focused orgs |
| Humanitec | German startup (originator of Score) | Workload-centric, multi-cloud | Enterprise, multi-cloud |
| CNOE (CNCF) | Adobe et al. | OSS reference architecture | Vendor-neutral preference |
2.6 Adoption Patterns
Common failure: Build platform 18 months → 0 adoption.
Right approach (Camille Fournier):
- Start with 1-2 stream teams as design partners
- Solve their top 3 pain points (don’t build big bang)
- Make it easy to adopt (auto-migration tools)
- Measure adoption, iterate
- Expand to more teams
Adoption metrics:
- % services in catalog
- % services using golden path template
- DevEx survey scores (NPS, satisfaction)
- Time to first deploy (new dev)
- Incident frequency (lower with platform)
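The first three metrics in the list above are simple ratios, so they can be computed from data most orgs already have. The Python sketch below assumes a hypothetical inventory of services with two boolean flags and a list of 0-10 survey answers; the `Service` type and function names are invented here. The NPS formula is the standard one (share of promoters scoring 9-10 minus share of detractors scoring 0-6).

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    in_catalog: bool      # has a catalog entry
    from_template: bool   # was created from a golden path template


def adoption_report(services):
    """Percentage of services in the catalog and percentage created from a template."""
    total = len(services)
    if total == 0:
        return {"catalog_pct": 0.0, "template_pct": 0.0}
    in_catalog = sum(s.in_catalog for s in services)
    from_template = sum(s.from_template for s in services)
    return {
        "catalog_pct": round(100 * in_catalog / total, 1),
        "template_pct": round(100 * from_template / total, 1),
    }


def nps(scores):
    """Net Promoter Score from 0-10 answers: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        return 0.0
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores), 1)


if __name__ == "__main__":
    inventory = [
        Service("payment-service", True, True),
        Service("fraud-detection", True, True),
        Service("legacy-billing", True, False),
        Service("cron-scripts", False, False),
    ]
    print(adoption_report(inventory))
    print(nps([10, 10, 9, 8, 5]))
```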
2.7 Score.dev — Workload Specification
Score (open spec): Cloud-native workload definition agnostic of platform.
# score.yaml: describe the workload portably
apiVersion: score.dev/v1b1
metadata:
  name: my-service
containers:
  app:
    image: myorg/my-service:latest
    variables:
      PORT: "8080"
      DB_URL: ${resources.db.uri}
resources:
  db:
    type: postgres
service:
  ports:
    web:
      port: 80
      targetPort: 8080
Translate to:
- Local: score-compose generate → docker-compose.yml
- K8s: score-helm → Helm values
- Humanitec: native consumption
Goal: Dev writes 1 spec, deploys anywhere.
2.8 GitOps for Platform
Platform configuration = Git repo. Apply via ArgoCD/Flux.
platform-config/
├── teams/
│ ├── team-payments/
│ │ ├── members.yaml
│ │ ├── services.yaml
│ │ └── permissions.yaml
│ └── team-fraud/
├── golden-paths/
│ ├── python-api/
│ └── go-cli/
├── policies/
│ ├── opa/ # Open Policy Agent rules
│ └── kyverno/
└── infra/
├── shared/ # Shared infrastructure (DBs, queues)
└── per-tenant/
Changes via PR: Platform changes reviewed like code.
2.9 Policy as Code (OPA, Kyverno)
Enforce platform standards.
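As a plain-Python illustration of the check an admission policy performs, the function below rejects Deployments whose containers have no CPU limit. It is a sketch only: real enforcement would run in OPA or Kyverno, as in the rules in this section. The input follows the shape of the Kubernetes AdmissionReview `request` object, and the function name is made up for this example.

```python
def deny_messages(admission_request):
    """Return one message per container in a Deployment that lacks a CPU limit."""
    if admission_request["kind"]["kind"] != "Deployment":
        return []  # the rule only applies to Deployments
    pod_spec = admission_request["object"]["spec"]["template"]["spec"]
    messages = []
    for container in pod_spec["containers"]:
        cpu_limit = container.get("resources", {}).get("limits", {}).get("cpu")
        if not cpu_limit:
            messages.append(f"Container {container['name']} missing CPU limit")
    return messages


if __name__ == "__main__":
    request = {
        "kind": {"kind": "Deployment"},
        "object": {"spec": {"template": {"spec": {"containers": [
            {"name": "app", "resources": {"limits": {"cpu": "500m"}}},
            {"name": "sidecar"},
        ]}}}},
    }
    # An empty list means "admit"; any message means "reject with this reason".
    print(deny_messages(request))
```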
# policy: deployments must have resource limits
package kubernetes.admission
deny[msg] {
    input.request.kind.kind == "Deployment"
    container := input.request.object.spec.template.spec.containers[_]
    not container.resources.limits.cpu
    msg := sprintf("Container %v missing CPU limit", [container.name])
}
# Kyverno: enforce ownership label
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-owner-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-owner-label
      match:
        any:
          - resources:
              kinds: ["Deployment"]
      validate:
        message: "Deployments must have 'owner' label"
        pattern:
          metadata:
            labels:
              owner: "?*"
2.10 Anti-Patterns
Anti-pattern 1: Platform as Ivory Tower
Platform team builds without listening to dev needs. Fix: Embed platform engineers in stream teams initially.
Anti-pattern 2: Mandatory adoption from Day 1
“Use platform or get fired”. Fix: Voluntary adoption, make it 10x better than alternatives.
Anti-pattern 3: Platform as DevOps Rebrand
Same DevOps team, just renamed. Fix: Platform mindset = product mindset. Hire product manager.
Anti-pattern 4: Tool-first
“Let’s deploy Backstage!” without understanding why. Fix: Start with user research. Tool comes after problem.
Anti-pattern 5: All-in-one mega platform
Try to platform-ize everything immediately. Fix: Start with 1-2 golden paths, expand based on demand.
3. Estimation
3.1 Platform team size
Rule of thumb: 1 platform engineer per 10-30 stream-aligned engineers.
| Org size | Stream eng | Platform eng |
|---|---|---|
| Startup (50) | 30 | 2-3 |
| Growth (200) | 150 | 8-15 |
| Scale (1000) | 700 | 30-70 |
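The rule of thumb can be turned into a quick calculation. This Python sketch applies the 1-per-10 to 1-per-30 ratio literally; the function name and the rounding choice (round up, since you cannot hire a fraction of an engineer) are assumptions of this example.

```python
import math


def platform_team_range(stream_engineers, min_ratio=10, max_ratio=30):
    """(low, high) platform headcount for one engineer per max_ratio..min_ratio stream engineers."""
    low = math.ceil(stream_engineers / max_ratio)   # lean: 1 per 30
    high = math.ceil(stream_engineers / min_ratio)  # generous: 1 per 10
    return low, high


if __name__ == "__main__":
    for stream in (30, 150, 700):
        print(stream, platform_team_range(stream))
```

The upper bounds match the table (3, 15, 70). The table's lower bounds (2, 8, 30) sit above the literal 1-per-30 figure (1, 5, 24), which suggests the table leans toward the better-staffed end of the range.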
3.2 Time investment
To ship MVP IDP:
- Service catalog + TechDocs: 1-2 months
- 1-2 golden paths: 2-3 months
- Self-service infrastructure: 3-6 months
- Mature multi-team adoption: 12-18 months
Cost trade-off:
- Investment: 4-8 platform engineers × 12 months × 1-2M
- Savings: 100 stream engineers × 20% productivity gain × 3M/year
- ROI: Year 2 onwards
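A worked version of the trade-off above, as a Python sketch. It rests on one reading of the figures that the notes do not spell out: "1-2M" is taken as cost per platform engineer per month, "3M/year" as the yearly value of one stream engineer, and "M" is left as an unspecified currency unit. Function names are invented for this example.

```python
def platform_investment(engineers, months, cost_per_engineer_month):
    """Up-front cost of the platform team, in M."""
    return engineers * months * cost_per_engineer_month


def stream_savings(stream_engineers, productivity_gain, value_per_engineer_year):
    """Yearly value recovered through the productivity gain, in M."""
    return stream_engineers * productivity_gain * value_per_engineer_year


def payback_years(investment, yearly_savings):
    """Years of savings needed to recover the investment."""
    return investment / yearly_savings


if __name__ == "__main__":
    savings = stream_savings(100, 0.20, 3)    # 100 engineers × 20% × 3M
    cheapest = platform_investment(4, 12, 1)  # 4 engineers × 12 months × 1M
    priciest = platform_investment(8, 12, 2)  # 8 engineers × 12 months × 2M
    print(cheapest, priciest, savings)
    print(round(payback_years(cheapest, savings), 2),
          round(payback_years(priciest, savings), 2))
```

Under this reading the investment is 48M to 192M against 60M of savings per year, so payback falls between roughly 0.8 and 3.2 years, which brackets the "Year 2 onwards" estimate.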
3.3 Adoption metrics
Healthy IDP:
- > 80% of services in catalog
- > 60% of new services use templates
- DevEx NPS > 40
- Time to first deploy: < 1 day for new dev
Unhealthy:
- < 20% adoption (platform built but unused)
- DevEx NPS < 0
- Stream teams build shadow platforms
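These thresholds are easy to encode as a periodic health check. The Python sketch below assumes the healthy cut-offs are "more than 80% of services in the catalog" and "more than 60% of new services from templates", and it treats catalog coverage as the "adoption" figure for the unhealthy test; both are interpretations, and the function name and the "in-between" label are invented here.

```python
def classify_idp_health(catalog_pct, template_pct, nps, first_deploy_days):
    """Classify an IDP as 'healthy', 'unhealthy', or 'in-between' from four signals."""
    # Unhealthy signals win: low adoption or a negative NPS.
    if catalog_pct < 20 or nps < 0:
        return "unhealthy"
    healthy = (
        catalog_pct > 80
        and template_pct > 60
        and nps > 40
        and first_deploy_days < 1
    )
    return "healthy" if healthy else "in-between"


if __name__ == "__main__":
    print(classify_idp_health(85, 65, 45, 0.5))
    print(classify_idp_health(50, 40, 20, 2))
    print(classify_idp_health(15, 5, 10, 3))
```

Shadow platforms, the third unhealthy signal, are not modeled because they rarely show up in metrics; they tend to surface in interviews and surveys.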
4. Security First
4.1 RBAC across platform
# Illustrative RBAC policy shape, not a built-in Backstage config format.
# Upstream Backstage defines permission policies in code; adapt this to your RBAC plugin.
permissions:
  - resource: catalog-entity
    actions:
      - read: allow
      - update:
          conditions:
            - rule: IS_OWNER
              params:
                claims:
                  - sub
  - resource: scaffolder-template
    actions:
      - execute:
          conditions:
            - rule: HAS_GROUP
              params:
                claims:
                  - groups
                expected: developers
4.2 Secret management
Platform must integrate with secret store (Vault, AWS Secrets Manager).
# Service template includes secret integration
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-service-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: my-service-secrets
  data:
    - secretKey: db-password
      remoteRef:
        key: services/my-service/db-password
4.3 Supply chain security
- SBOM (Software Bill of Materials) for every service
- Image signing (Cosign, Sigstore)
- Dependency scanning (Snyk, Dependabot)
- Policy enforcement (no critical CVEs in production)
4.4 Audit trail
Every platform action logged:
- Who triggered template scaffolder
- What changed in catalog
- Who deployed to production
Forward to SIEM for compliance.
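A SIEM is easiest to feed with one structured record per action. The Python sketch below shows a minimal shape for such a record covering the "who, what, on which target" questions above. The field names and the example action string are assumptions, not a Backstage or SIEM schema.

```python
import json
from datetime import datetime, timezone


def audit_event(actor, action, target, timestamp=None):
    """Serialize one platform action as a JSON line suitable for log shipping."""
    ts = timestamp or datetime.now(timezone.utc)
    record = {
        "timestamp": ts.isoformat(),  # always timezone-aware UTC
        "actor": actor,               # who triggered the action
        "action": action,             # what was done
        "target": target,             # which entity or template was affected
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(audit_event("hieu", "scaffolder.execute", "template:python-api"))
```

Keeping the record flat and sorted by key makes the lines easy to diff, grep, and index, whatever log pipeline sits in between.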
5. DevOps — Vận hành Platform
5.1 Backstage deployment
# docker-compose.yml (local dev)
version: "3"
services:
  backstage:
    image: backstage:latest
    ports:
      - "3000:3000"
      - "7007:7007"
    environment:
      POSTGRES_HOST: db
      POSTGRES_PORT: 5432
      POSTGRES_USER: backstage
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      GITHUB_TOKEN: ${GITHUB_TOKEN}
    depends_on: [db]
  db:
    image: postgres:15
    environment:
      # Must match the user Backstage connects as; the image default is "postgres"
      POSTGRES_USER: backstage
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - bs-data:/var/lib/postgresql/data
volumes:
  bs-data:
Production: K8s deployment with HA postgres, Redis cache, OAuth integration.
5.2 Catalog auto-discovery
# app-config.yaml
catalog:
  rules:
    - allow: [Component, System, API, Resource, Location, Group, User]
  locations:
    # Static location
    - type: file
      target: ../../examples/entities.yaml
    # GitHub org auto-discovery (legacy, processor-based)
    - type: github-discovery
      target: https://github.com/myorg/*
  providers:
    # GitHub entity provider: scans the org's repos for catalog-info.yaml on a schedule
    github:
      myorg:
        organization: myorg
        catalogPath: /catalog-info.yaml
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }
5.3 Plugins ecosystem
Common plugins:
- CI/CD: GitHub Actions, GitLab CI, ArgoCD
- Monitoring: Grafana, Prometheus, Datadog
- On-call: PagerDuty, Opsgenie
- Cost: Kubecost, Vantage
- Security: Snyk, Dependabot
- Infra: AWS, Crossplane, Terraform
5.4 Metrics
# Metric names depend on your Backstage version and exporter; verify them before alerting.
groups:
  - name: idp_metrics
    rules:
      - alert: BackstageDown
        expr: up{job="backstage"} == 0
        for: 5m
      - alert: CatalogIngestionLag
        expr: backstage_catalog_processing_duration_seconds > 600
        for: 30m
      - alert: ScaffolderFailures
        expr: rate(backstage_scaffolder_task_failed_total[1h]) > 0.1
        for: 30m
Custom DevEx metrics:
- services_in_catalog_total
- golden_path_usage_total
- template_executions_total
- time_to_first_deploy_seconds (per dev)
5.5 Roll out plan
Phase 1 (Months 1-3): Pilot
- 1-2 design partner teams
- Service catalog only
- Validate value
Phase 2 (Months 4-6): Expand
- All teams onboarded to catalog
- 1-2 golden paths (most common service type)
- TechDocs for top services
Phase 3 (Months 7-12): Mature
- 5+ golden paths
- Self-service infrastructure
- Cost dashboards
- On-call integration
Phase 4 (Year 2+): Optimize
- DevEx metrics-driven improvement
- Multi-cluster, multi-cloud
- Advanced governance
6. Code Implementation
6.1 Custom Backstage plugin
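// plugins/cost-tracker/src/plugin.tsx
import React from 'react';
import { createApiRef, createPlugin, createRouteRef, useApi } from '@backstage/core-plugin-api';
import { Entity } from '@backstage/catalog-model';
import { Card, CardContent, CardHeader, Typography } from '@material-ui/core';
import useAsync from 'react-use/lib/useAsync';
// Hypothetical cost API; implement it against Kubecost, Vantage, etc.
type EntityCost = { total: number; compute: number; storage: number; network: number };
interface CostApi {
  getEntityCost(entity: Entity): Promise<EntityCost>;
}
export const costApiRef = createApiRef<CostApi>({ id: 'plugin.cost-tracker.service' });
export const costTrackerPlugin = createPlugin({
  id: 'cost-tracker',
  routes: {
    root: createRouteRef({ id: 'cost-tracker' }),
  },
});
// Fetch the cost for an entity; useAsync tracks the pending request
const useEntityCost = (entity: Entity) => {
  const costApi = useApi(costApiRef);
  return useAsync(() => costApi.getEntityCost(entity), [entity]);
};
// Component to display cost on entity page
export const CostCard = ({ entity }: { entity: Entity }) => {
  const { value: cost, loading } = useEntityCost(entity);
  if (loading || !cost) return null;
  return (
    <Card>
      <CardHeader title="Monthly Cost" />
      <CardContent>
        <Typography variant="h3">${cost.total}</Typography>
        <Typography>Compute: ${cost.compute}</Typography>
        <Typography>Storage: ${cost.storage}</Typography>
        <Typography>Network: ${cost.network}</Typography>
      </CardContent>
    </Card>
  );
};
6.2 Golden path template (FastAPI service)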
# template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: fastapi-service
  title: FastAPI Microservice
spec:
  type: service
  parameters:
    - title: Service Info
      properties:
        name: { type: string, title: Service Name }
        owner: { type: string, ui:field: OwnerPicker }
        description: { type: string }
    - title: Database
      properties:
        useDatabase:
          type: boolean
          title: Need Postgres?
      # Show dbSize only when useDatabase is checked
      dependencies:
        useDatabase:
          oneOf:
            - properties:
                useDatabase: { const: false }
            - properties:
                useDatabase: { const: true }
                dbSize:
                  type: string
                  enum: [small, medium, large]
  steps:
    - id: fetch-skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          description: ${{ parameters.description }}
          useDatabase: ${{ parameters.useDatabase }}
    - id: publish-github
      action: publish:github
      input:
        repoUrl: github.com?owner=myorg&repo=${{ parameters.name }}
        description: ${{ parameters.description }}
        repoVisibility: internal
    - id: provision-db
      if: ${{ parameters.useDatabase }}
      # aws:rds:create is a hypothetical custom action, not a built-in one
      action: aws:rds:create
      input:
        dbName: ${{ parameters.name }}-db
        dbSize: ${{ parameters.dbSize }}
        region: us-east-1
    - id: register-catalog
      action: catalog:register
      input:
        # Bracket notation is needed because the step id contains a hyphen
        repoContentsUrl: ${{ steps['publish-github'].output.repoContentsUrl }}
# skeleton/${{values.name}}/app/main.py
"""${{ values.name }} - FastAPI service"""
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
app = FastAPI(title="${{ values.name }}")
# Auto-monitoring
Instrumentator().instrument(app).expose(app)
FastAPIInstrumentor.instrument_app(app)
@app.get("/")
async def root():
    return {"service": "${{ values.name }}"}
@app.get("/health")
async def health():
    return {"status": "ok"}
# skeleton/${{values.name}}/catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name }}
  description: ${{ values.description }}
  annotations:
    github.com/project-slug: myorg/${{ values.name }}
    backstage.io/techdocs-ref: dir:.
    pagerduty.com/integration-key: REPLACE_ME
    prometheus.io/dashboard: https://grafana/d/service-template
spec:
  type: service
  lifecycle: experimental
  owner: ${{ values.owner }}
  system: my-system
# skeleton/${{values.name}}/.github/workflows/ci.yml
name: CI
on:
  push: { branches: [main] }
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - run: pip install -r requirements.txt
      - run: pytest
      - run: ruff check .
  build:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write  # lets GITHUB_TOKEN push to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          # raw/endraw keeps the scaffolder from rendering GitHub Actions expressions
          username: {% raw %}${{ github.actor }}{% endraw %}
          password: {% raw %}${{ secrets.GITHUB_TOKEN }}{% endraw %}
      - run: |
          docker build -t ghcr.io/myorg/${{ values.name }}:latest .
          docker push ghcr.io/myorg/${{ values.name }}:latest
6.3 Crossplane composition for Postgres
# composition for Postgres in K8s
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgres-aws
  labels:
    provider: aws
    tier: production
spec:
  compositeTypeRef:
    apiVersion: db.example.org/v1alpha1
    kind: PostgreSQLDatabase
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.crossplane.io/v1alpha1
        kind: DBInstance
        spec:
          forProvider:
            engine: postgres
            engineVersion: "15.4"
            dbInstanceClass: db.t3.medium
            allocatedStorage: 100
            multiAZ: true
            backupRetentionPeriod: 7
            deletionProtection: true
      patches:
        - fromFieldPath: spec.parameters.storageGB
          toFieldPath: spec.forProvider.allocatedStorage
        - fromFieldPath: spec.parameters.tier
          toFieldPath: spec.forProvider.dbInstanceClass
          transforms:
            - type: map
              map:
                small: db.t3.small
                medium: db.t3.medium
                production: db.r6i.xlarge
7. System Design Diagrams
7.1 Team Topologies
flowchart TB
    subgraph Platform["Platform Teams"]
        DataPlatform[Data Platform Team]
        InfraPlatform[Infra Platform Team]
        SecPlatform[Security Platform Team]
    end
    subgraph Stream["Stream-Aligned Teams"]
        Payments[Payments Team]
        Checkout[Checkout Team]
        Catalog[Catalog Team]
        Search[Search Team]
    end
    subgraph Enabling["Enabling Teams"]
        SRE[SRE Coaches]
        Architects[Architecture Council]
    end
    subgraph Subsystem["Complicated Subsystem"]
        ML[ML Platform]
        Crypto[Crypto/PKI]
    end
    Stream -->|consume X-as-a-Service| Platform
    Enabling -.facilitate.-> Stream
    Subsystem -->|specialized service| Stream
    style Platform fill:#bbdefb
    style Stream fill:#c8e6c9
    style Enabling fill:#fff9c4
    style Subsystem fill:#ffe0b2
7.2 IDP Layered Architecture
flowchart TB
    Dev[Developer] --> Portal[Developer Portal<br/>Backstage / Port]
    Portal --> Catalog[Service Catalog]
    Portal --> Templates[Software Templates]
    Portal --> Docs[TechDocs]
    Portal --> Insights[Insights & Scorecards]
    Templates --> Scaffolder[Scaffolder Engine]
    Scaffolder --> SCM[GitHub / GitLab]
    Scaffolder --> CICD[CI/CD Pipelines]
    Scaffolder --> Infra[Crossplane / Terraform]
    SCM --> Apps[Application Code]
    CICD --> Deploy[Deploy to K8s]
    Infra --> Cloud[Cloud Resources]
    Apps --> Deploy
    Deploy --> Runtime[K8s Cluster]
    Cloud --> Runtime
    Runtime --> Observ[Observability<br/>Prometheus / Grafana]
    Observ --> Insights
    style Portal fill:#bbdefb
    style Scaffolder fill:#c8e6c9
7.3 Golden Path Flow
sequenceDiagram
    participant Dev as Developer
    participant Portal as IDP Portal
    participant Scaff as Scaffolder
    participant GitHub
    participant CICD
    participant K8s
    participant Catalog
    Dev->>Portal: Click "New Service" template
    Portal->>Dev: Form (name, owner, options)
    Dev->>Portal: Submit
    Portal->>Scaff: Execute template
    Scaff->>GitHub: Create repo
    Scaff->>GitHub: Push skeleton code
    Scaff->>CICD: Configure pipeline
    Scaff->>K8s: Create namespace
    Scaff->>Catalog: Register entity
    Catalog-->>Portal: Service appears
    Portal-->>Dev: Done! Repo URL, dashboard link
    Note over Dev,Catalog: 5 minutes vs 5 days previously
7.4 Team-Platform Interaction Modes
flowchart LR
    subgraph TS1["Stream Team"]
        DevA[Developer A]
    end
    subgraph TP["Platform Team"]
        PE[Platform Engineer]
    end
    subgraph Modes["Interaction Modes"]
        XaaS["X-as-a-Service<br/>(default mode)<br/>Self-service portal,<br/>docs, low-friction"]
        Collab["Collaboration<br/>(temporary)<br/>Joint work on<br/>new pattern"]
        Facilitate["Facilitating<br/>(coaching)<br/>Help adopt<br/>new tech"]
    end
    DevA -->|90% interactions| XaaS
    DevA <-->|5%| Collab
    DevA <-.5%.-> Facilitate
    XaaS --> PE
    Collab --> PE
    Facilitate --> PE
    style XaaS fill:#c8e6c9
    style Collab fill:#fff9c4
    style Facilitate fill:#bbdefb
8. Aha Moments & Pitfalls
Aha Moments
#1: Platform = product, devs = customers. The most important mindset shift. Apply product disciplines: research, roadmap, NPS.
#2: Golden paths > flexibility. Strong opinions with good defaults > “you can use anything”. Reducing cognitive load is the main value.
#3: Self-service > tickets. A dev opens a ticket “give me a K8s namespace” → 2 days. Self-service template → 2 minutes. Time saved = team velocity.
#4: The catalog is the source of truth. Service ownership, dependencies, runbooks: all in the catalog. On-call can find anything in 30 seconds.
#5: Adoption is hard. Built ≠ used. It takes a continuous sales effort from the platform team. Make it 10x better than DIY.
#6: Team Topologies trumps tools. Right team structure > best tool. Wrong team structure can’t be fixed by Backstage.
#7: TechDocs with code. Docs live in the repo, versioned, and updated in the same PR. No more outdated wiki.
#8: Platform engineering ≠ DevOps rebrand. Different mindset. DevOps = “you build it, you run it”. Platform = “we provide tools so you build it well”.
Pitfalls
Pitfall 1: Build first, ask later
Spend 18 months → 5% adoption. Fix: Start with 1-2 design partners, MVP fast, iterate.
Pitfall 2: Force adoption
Mandate platform → resentment, shadow IT. Fix: Make it 10x better, voluntary adoption.
Pitfall 3: One platform fits all
Try to satisfy every team’s needs → bloat. Fix: 80/20 rule. Solve common cases well. Allow exceptions.
Pitfall 4: No product manager
Platform team without PM → no roadmap, no user research. Fix: Hire dedicated platform PM.
Pitfall 5: Tool worship
“We adopted Backstage!” → not used. Fix: Tool serves people. Start with problem, end with tool.
Pitfall 6: No DevEx metrics
Don’t know if platform is working. Fix: NPS quarterly, time-to-first-deploy, adoption metrics.
Pitfall 7: Platform team isolation
Platform team in vacuum, away from stream teams. Fix: Embed engineers in stream teams initially. Office hours.
Pitfall 8: Reinventing wheels
Build custom service catalog instead of Backstage. Fix: Adopt OSS, customize. Don’t compete with category leaders.
Pitfall 9: No security baked in
Devs use platform but bypass security controls. Fix: Make secure way the easy way. Policy-as-code enforces.
Pitfall 10: Underestimate ongoing investment
Build once, expect to last forever. Fix: Continuous investment. Tech debt accumulates fast.
9. Internal Links
| Topic | Connection |
|---|---|
| Tuan-11-Microservices-Pattern | Microservices need platform; service catalog tracks them |
| Tuan-12-CICD-Pipeline | Golden paths automate CI/CD setup |
| Tuan-13-Monitoring-Observability | Platform integrates monitoring |
| Tuan-14-AuthN-AuthZ-Security | RBAC across platform |
| Tuan-Bonus-FinOps-Cloud-Unit-Economics | Cost dashboard in IDP |
| Tuan-Bonus-Progressive-Delivery | Deploy strategy via platform |
References
Books:
- Team Topologies (Skelton & Pais, 2nd ed 2024) — https://teamtopologies.com/
- Platform Engineering (Camille Fournier, 2024)
- The DevOps Handbook (Kim, Humble, Debois 2016)
Reports:
- Port, 2025 State of Internal Developer Portals — https://www.port.io/state-of-internal-developer-portals
- Gartner Platform Engineering Magic Quadrant
- ThoughtWorks Tech Radar — Platform sections
Tools docs:
- Backstage — https://backstage.io/docs/
- Port — https://docs.getport.io/
- Cortex — https://docs.cortex.io/
- Crossplane — https://docs.crossplane.io/
- Score — https://score.dev/
Engineering blogs:
- Spotify Engineering — https://engineering.atspotify.com/category/platform/
- Netflix Tech Blog (platform posts)
- Pinterest, Platform-as-a-Product
- Adobe (CNOE founder)
Next: Tuan-Bonus-FinOps-Cloud-Unit-Economics. FinOps complements Platform Engineering with a cost lens.