Tuần Bonus: Model Context Protocol (MCP) Architecture

“Before MCP: every AI app had to hand-code its own integration with GitHub, Slack, Postgres, Stripe — an N×M problem for N apps × M services. After MCP: one standard protocol, 2000+ shared servers. MCP is the ‘USB-C of AI tools’ — introduced by Anthropic in late 2024, adopted by OpenAI/Google/Sourcegraph in 2025.”

Tags: system-design mcp ai-infrastructure protocol anthropic bonus Student: Hieu (Backend Dev → Architect) Prerequisites: Tuan-04-API-Design-REST-gRPC · Tuan-14-AuthN-AuthZ-Security Related: Tuan-Bonus-LLM-Serving-Infrastructure · Tuan-Bonus-Outbox-Pattern


1. Context & Why

Everyday analogy — the USB-C standard

Hieu, imagine it's 2010 and every phone brand has its own charging port:

  • iPhone: Lightning
  • Samsung: Micro-USB
  • Nokia: 3.5mm
  • BlackBerry: Micro-USB
  • Sony: Magnetic

Customers had to buy N different cables. Hotels had to stock M kinds of cable for guests. Total integrations = N × M.

USB-C (2015): one standard that works with every device. One cable per customer, one cable type per hotel.

Model Context Protocol (MCP) is the USB-C of AI tools:

  • Before: Claude Desktop, Cursor, ChatGPT, Cline each hand-coded its own integration with GitHub, Slack, Postgres, Linear…
  • After: one JSON-RPC standard that anyone can implement, with servers shared across the ecosystem

Why should a backend dev care about MCP?

| Reason | Consequence |
|---|---|
| AI agents are taking off | Every 2025+ product integrates AI |
| Tool standardization | Build 1 MCP server, it works with every LLM |
| Industry standard | Anthropic, OpenAI, Google DeepMind, Sourcegraph adopt |
| Security perimeter | MCP servers expose data → need auth, rate limits, audit |
| Backend skill | Building an MCP server = a Python/TS server with a rich API |

Timeline & Adoption

  • Nov 2024: Anthropic introduces MCP with Claude Desktop
  • Dec 2024: First wave servers (filesystem, GitHub, Slack)
  • Mar 2025: Streamable HTTP transport — production ready
  • 2025 Q2-Q3: ~2000 community servers
  • Aug 2025: OpenAI announces support
  • Q4 2025: Google DeepMind, Sourcegraph integrate
  • Nov 2025: Spec version 2025-11-25

Key references


2. Deep Dive — Core Concepts

2.1 The N×M Problem

Before MCP:

LLM Apps (N):                          Services (M):
- Claude Desktop                       - GitHub
- Cursor                               - Slack
- Cline                                - Postgres
- ChatGPT                              - Linear
- Custom apps                          - Notion
                                       - Stripe
                                       - Files

Total integrations needed: N × M = 5 × 7 = 35
Each app must build/maintain integration with each service.

After MCP:

LLM Apps speak MCP client:
- Claude Desktop → MCP client
- Cursor → MCP client

Services expose MCP server:
- github-mcp-server
- slack-mcp-server
- postgres-mcp-server

Total integrations: N + M = 5 + 7 = 12
Each app implements MCP client once.
Each service implements MCP server once.

2.2 MCP Architecture

┌──────────────────┐                  ┌──────────────────┐
│  AI Application  │                  │    MCP Server    │
│  (Host)          │                  │  (Tool/Service)  │
│                  │                  │                  │
│  ┌────────────┐  │   JSON-RPC 2.0   │ ┌──────────────┐ │
│  │ MCP Client ├──┼──────────────────┼─┤   Service    │ │
│  └────────────┘  │   (transport)    │ │   logic      │ │
│                  │                  │ └──────────────┘ │
│  ┌────────────┐  │                  │                  │
│  │    LLM     │  │                  │ Exposes:         │
│  │  (Claude,  │  │                  │ - Tools          │
│  │   GPT-4…)  │  │                  │ - Resources      │
│  └────────────┘  │                  │ - Prompts        │
└──────────────────┘                  └──────────────────┘

Three actors:

  • Host: User-facing AI app (Claude Desktop, Cursor)
  • Client: Embedded in host, manages MCP connections
  • Server: Exposes capabilities (tools, resources, prompts)

2.3 Three Capabilities

2.3.1 Tools (callable functions)

LLM can invoke functions on server.

// Server exposes tool
{
  "name": "search_github_issues",
  "description": "Search GitHub issues by query",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": { "type": "string" },
      "repo": { "type": "string" },
      "limit": { "type": "integer", "default": 10 }
    },
    "required": ["query"]
  }
}
 
// LLM calls
{
  "method": "tools/call",
  "params": {
    "name": "search_github_issues",
    "arguments": {
      "query": "memory leak",
      "repo": "anthropic/sdk",
      "limit": 5
    }
  }
}
 
// Response
{
  "content": [
    {
      "type": "text",
      "text": "Found 3 issues: ..."
    }
  ]
}

2.3.2 Resources (read-only data)

LLM can read files, DB rows, web pages.

// List resources
{
  "method": "resources/list"
}
// Response
{
  "resources": [
    {
      "uri": "file:///path/to/doc.md",
      "name": "Documentation",
      "mimeType": "text/markdown"
    }
  ]
}
 
// Read resource
{
  "method": "resources/read",
  "params": { "uri": "file:///path/to/doc.md" }
}
// Response
{
  "contents": [
    { "uri": "...", "mimeType": "text/markdown", "text": "# Title\n..." }
  ]
}

2.3.3 Prompts (reusable templates)

Server provides prompt templates user can invoke.

{
  "method": "prompts/get",
  "params": {
    "name": "code_review",
    "arguments": { "file": "src/app.py" }
  }
}
// Response: structured prompt with file content embedded

2.4 Transport Layers

2.4.1 stdio (local processes)

The most common transport for local tools (filesystem, shell access).

Host process spawns server as subprocess
Communication via stdin/stdout (JSON-RPC over newline-delimited JSON)

Pros: Simple, secure (no network exposure), low latency.
Cons: Local only, single client.

Example: Claude Desktop launches npx @modelcontextprotocol/server-filesystem as a subprocess.
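The stdio wire format is just newline-delimited JSON-RPC, which can be sketched in a few lines (the helper names `frame` and `parse_line` are mine, not SDK API):

```python
import json

def frame(msg: dict) -> bytes:
    """Encode one JSON-RPC message as newline-delimited JSON, as used on stdio."""
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode()

def parse_line(line: bytes) -> dict:
    """Decode one newline-delimited JSON-RPC message."""
    return json.loads(line)

# A client would write this to the server subprocess's stdin
# and read the response, one message per line, from its stdout:
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
wire = frame(request)
assert parse_line(wire) == request
```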

2.4.2 Streamable HTTP (production)

Introduced 2025: HTTP transport for remote MCP servers.

POST /mcp HTTP/1.1
Host: server.example.com
Content-Type: application/json
Authorization: Bearer <token>

{"jsonrpc":"2.0","method":"tools/list","id":1}

HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"jsonrpc":"2.0","result":{...},"id":1}
data: {"jsonrpc":"2.0","method":"notifications/...","params":{...}}

Features:

  • Server-Sent Events (SSE) for streaming responses
  • HTTP-friendly (auth, proxy, CDN)
  • Stateless or stateful (session)

Pros: Production-ready, rides standard HTTP infrastructure.
Cons: Requires an auth design.
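Parsing the `data:` lines above can be sketched as follows (simplified: real SSE also allows multi-line `data` fields plus `event:`/`id:` fields, and real clients use the SDK transport):

```python
import json

def parse_sse(body: str) -> list[dict]:
    """Extract JSON-RPC messages from a text/event-stream body.

    Simplification: assumes each `data:` line carries one complete message.
    """
    messages = []
    for line in body.splitlines():
        if line.startswith("data:"):
            messages.append(json.loads(line[len("data:"):].strip()))
    return messages

stream = 'data: {"jsonrpc":"2.0","result":{"tools":[]},"id":1}\n\n'
msgs = parse_sse(stream)
assert msgs[0]["id"] == 1
```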

2.4.3 Older remote transports

The original two-endpoint HTTP+SSE transport was superseded by Streamable HTTP; WebSocket was explored by the community but never standardized.

2.5 JSON-RPC 2.0 Foundation

MCP messages are JSON-RPC 2.0:

Request:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": { "name": "...", "arguments": {...} }
}

Response (success):

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": { "content": [...] }
}

Response (error):

{
  "jsonrpc": "2.0",
  "id": 1,
  "error": { "code": -32602, "message": "Invalid params" }
}

Notification (no response expected):

{
  "jsonrpc": "2.0",
  "method": "notifications/cancelled",
  "params": { "requestId": 1 }
}

2.6 Lifecycle & Capability Negotiation

Client                              Server
  │                                   │
  ├──── initialize ──────────────────►│
  │     {protocolVersion, capabilities,
  │      clientInfo}
  │                                   │
  │◄──── initialize result ───────────┤
  │     {protocolVersion, capabilities,
  │      serverInfo}
  │                                   │
  ├──── initialized notification ────►│
  │                                   │
  ├──── tools/list ──────────────────►│
  │◄──── result ──────────────────────┤
  │                                   │
  ├──── tools/call ──────────────────►│
  │◄──── result ──────────────────────┤
  │                                   │
  ├──── shutdown ────────────────────►│
  │◄──── result ──────────────────────┤

Capability negotiation: Client and server announce what they support (resources, tools, prompts, sampling, logging).
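The negotiation rule can be stated in a few lines: each side may only use what the other declared. A sketch with structures simplified from the spec:

```python
def negotiate(client_caps: dict, server_caps: dict) -> dict:
    """What each side may use = what the other side declared.

    e.g. a client only calls tools/list if the server declared 'tools';
    a server only requests sampling if the client declared it.
    """
    return {
        "client_can_use": sorted(server_caps),
        "server_can_use": sorted(client_caps),
    }

agreed = negotiate(
    client_caps={"sampling": {}},
    server_caps={"tools": {}, "resources": {}, "prompts": {}},
)
assert agreed["client_can_use"] == ["prompts", "resources", "tools"]
assert agreed["server_can_use"] == ["sampling"]
```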

2.7 Security Model

MCP servers expose powerful capabilities (file system, DB, APIs), so security is paramount.

2.7.1 Local stdio: process isolation

  • Server runs as subprocess of host
  • Inherits user permissions
  • No network attack surface
  • Risk: A malicious server can read user files. Mitigation: a trusted server registry.

2.7.2 Remote HTTP: OAuth 2.1 + DPoP

MCP spec 2025-11-25 mandates OAuth 2.1 for HTTP transport:

1. Client redirects user to /oauth/authorize
2. User consents to scopes (e.g., "tools:execute", "resources:read")
3. Authorization code → token exchange with PKCE
4. Token bound to client via DPoP (RFC 9449) — see Tuan-14
5. Each MCP request includes:
   Authorization: DPoP <access_token>
   DPoP: <signed JWT>

Reference: Tuan-14-AuthN-AuthZ-Security sections 2.16 (DPoP) and 2.17 (FAPI 2.0).

2.7.3 Server-side sandboxing

# Example: filesystem server with path restriction
import os

ALLOWED_PATHS = ["/tmp/mcp-workspace", "/home/user/projects"]

def read_file(path: str) -> str:
    abs_path = os.path.realpath(path)  # Resolve symlinks and ".." segments
    # Prefix check must be segment-aware: "/tmp/mcp-workspace-evil" must not pass
    if not any(abs_path == p or abs_path.startswith(p + os.sep) for p in ALLOWED_PATHS):
        raise PermissionError(f"Path {path} not allowed")
    with open(abs_path) as f:
        return f.read()

Best practices:

  • Validate all paths (prevent path traversal)
  • Sandbox file system access (chroot, containers)
  • Rate limit per client
  • Audit log all tool invocations

2.8 MCP vs OpenAPI vs gRPC

| Feature | MCP | OpenAPI/REST | gRPC |
|---|---|---|---|
| Designed for | LLM tool use | Human-facing APIs | RPC between services |
| Schema | JSON Schema (per tool) | OpenAPI YAML | Protobuf |
| Discoverability | Built-in (tools/list) | OpenAPI doc | Reflection (limited) |
| Streaming | Yes (SSE) | Limited | First-class |
| AI semantics | Tools, prompts, resources | Generic CRUD | Generic methods |
| Auth | OAuth 2.1 + DPoP | Bearer, OAuth | Pluggable |
| Best for | AI agents calling tools | Web APIs | Service-to-service |

MCP key differentiator: Designed for AI consumption — schema includes natural language descriptions, prompt templates, resource semantics.

2.9 Building MCP Server — Pattern

Reference architecture:

┌────────────────────────────────────────┐
│         MCP Server                      │
│                                          │
│  ┌──────────────────────────────────┐  │
│  │     Transport Layer               │  │
│  │  - stdio (subprocess)             │  │
│  │  - Streamable HTTP                │  │
│  └────────────┬─────────────────────┘  │
│               │                          │
│  ┌────────────▼──────────────────────┐ │
│  │     Protocol Handler               │ │
│  │  - JSON-RPC parsing                │ │
│  │  - Capability negotiation          │ │
│  └────────────┬──────────────────────┘ │
│               │                          │
│  ┌────────────▼──────────────────────┐ │
│  │     Auth & Authorization          │ │
│  │  - OAuth token validation         │ │
│  │  - Scope enforcement              │ │
│  │  - DPoP verification              │ │
│  └────────────┬──────────────────────┘ │
│               │                          │
│  ┌────────────▼──────────────────────┐ │
│  │     Tool/Resource Registry        │ │
│  │  - Tool schemas                   │ │
│  │  - Resource URIs                  │ │
│  │  - Prompt templates               │ │
│  └────────────┬──────────────────────┘ │
│               │                          │
│  ┌────────────▼──────────────────────┐ │
│  │     Business Logic                │ │
│  │  - Service integration            │ │
│  │  - Rate limiting                  │ │
│  │  - Audit log                      │ │
│  └───────────────────────────────────┘ │
└────────────────────────────────────────┘

2.10 Production Deployment Patterns

2.10.1 Local stdio (developer tools)

User runs claude-desktop → spawns local MCP servers as subprocesses.

// claude_desktop_config.json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/projects"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..."
      }
    }
  }
}

2.10.2 Remote MCP (enterprise)

Centralized MCP server, accessible via HTTPS.

Cloudflare Workers → MCP Server (TypeScript)
                     ↓
                     Postgres / GitHub API / etc.

Client (Claude Desktop, web app) → HTTPS → MCP Server

Cloudflare offers Remote MCP hosting (Apr 2025): Workers-based, OAuth built-in.

2.10.3 Multi-tenant MCP

            ┌──────────────────┐
            │  MCP Gateway     │
            │  - Auth          │
            │  - Tenant routing│
            └────────┬─────────┘
                     │
       ┌─────────────┼─────────────┐
       ▼             ▼             ▼
  ┌─────────┐  ┌─────────┐  ┌─────────┐
  │ Server A│  │ Server B│  │ Server C│
  │ Tenant 1│  │ Tenant 2│  │ Tenant 3│
  └─────────┘  └─────────┘  └─────────┘

Per-tenant MCP servers with isolated data and rate limits.


3. Estimation

3.1 Tool latency budget

Typical MCP tool call:

  • LLM decision to call: ~500ms
  • Network to MCP server: ~50ms
  • Tool execution: variable (10ms - 5s)
  • Response back: ~50ms
  • LLM processes result: ~500ms

Total: 1.1s + tool execution time.

Implication: Tools should be fast (<100ms) for good UX. Long-running ops → progress events.
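Tallying the budget above for a 100 ms tool (numbers taken from this section, not a benchmark):

```python
# Fixed per-call overhead dominates; the tool itself is often the smallest term.
budget_ms = {
    "llm_decides_to_call": 500,
    "network_to_server": 50,
    "tool_execution": 100,   # target: keep this under ~100 ms
    "response_back": 50,
    "llm_processes_result": 500,
}
total = sum(budget_ms.values())
assert total == 1200  # ~1.1 s of fixed overhead plus tool time
```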

3.2 Throughput

Single MCP server:

  • stdio: Limited by host (1 client)
  • HTTP: Standard web server throughput (~1K-10K req/s per instance)

Scaling: Stateless HTTP MCP servers scale horizontally with load balancer.

3.3 Cost

Cloudflare Workers-hosted MCP:

  • ~$20-100/month (mostly downstream API costs)

vs a custom REST API per integration:

  • $50-200/month server + integration engineering
  • MCP is cost-effective through standardization

4. Security First

4.1 Threat model

| Threat | Mitigation |
|---|---|
| Malicious MCP server steals data | Trusted registry, code signing, sandbox |
| Prompt injection via tool results | Validate tool output, sanitize before returning to LLM |
| Token leak | OAuth + DPoP, short-lived tokens, key rotation |
| Path traversal in filesystem server | Realpath validation, allowed-paths whitelist |
| SQL injection in DB MCP | Parameterized queries, read-only access |
| Resource exhaustion | Rate limits, timeouts, query complexity bounds |

4.2 OAuth 2.1 flow (MCP spec)

1. Client → /authorize?
     response_type=code&
     client_id=...&
     redirect_uri=...&
     scope=tools:execute resources:read&
     code_challenge=...&  (PKCE)
     code_challenge_method=S256

2. User consents

3. Server redirects → redirect_uri?code=...

4. Client → /token (POST)
     grant_type=authorization_code&
     code=...&
     code_verifier=...

5. Server returns access_token (short-lived) + refresh_token

6. Client uses access_token in MCP calls (DPoP scheme, not Bearer, for DPoP-bound tokens):
   Authorization: DPoP <access_token>
   DPoP: <signed JWT>
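The PKCE pair used in steps 1 and 4 can be generated with the stdlib alone (per RFC 7636, `code_challenge_method=S256`):

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for code_challenge_method=S256."""
    # 32 random bytes → 43-char base64url verifier (RFC 7636 allows 43-128)
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # challenge = BASE64URL(SHA256(verifier)), unpadded
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
assert 43 <= len(verifier) <= 128  # RFC 7636 length bounds
assert "=" not in challenge        # base64url, unpadded
```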

4.3 Permission scopes

Recommended scope structure:

tools:list                  # See tool catalog
tools:execute               # Call any tool
tools:execute:read_only     # Only side-effect-free tools
tools:execute:filesystem:read
tools:execute:database:write

resources:list
resources:read
resources:read:public

prompts:list
prompts:use

Principle: Users grant minimal scopes. Apps request granular permissions.
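Enforcing such a hierarchy reduces to a segment-aware prefix check. A sketch (the scope names mirror the list above; the helper is illustrative, not part of the MCP SDK):

```python
def is_allowed(granted: set[str], required: str) -> bool:
    """A granted scope covers `required` if it equals it or is an ancestor.

    e.g. granting 'tools:execute' covers 'tools:execute:filesystem:read',
    but granting 'tools:execute:read_only' does NOT cover 'tools:execute'.
    """
    return any(
        required == g or required.startswith(g + ":")
        for g in granted
    )

granted = {"tools:list", "tools:execute:read_only", "resources:read"}
assert is_allowed(granted, "tools:execute:read_only")
assert is_allowed(granted, "resources:read")
assert not is_allowed(granted, "tools:execute:database:write")
```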

4.4 Audit logging

CREATE TABLE mcp_audit_log (
    id BIGSERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    user_id UUID,
    client_id UUID,
    server_id TEXT,
    method TEXT NOT NULL,        -- e.g., 'tools/call'
    tool_name TEXT,              -- if tools/call
    arguments JSONB,
    result_status TEXT,          -- 'success', 'error', 'denied'
    duration_ms INT,
    error_message TEXT
);
 
-- Index for compliance queries
CREATE INDEX idx_audit_user_time ON mcp_audit_log (user_id, timestamp DESC);

4.5 Sandboxing untrusted servers

Risk: User installs random MCP server from internet → reads all files.

Mitigations:

  • Container sandbox: Run server in restricted container (no network, limited fs)
  • eBPF policy: Block syscalls
  • Code signing: Verified publisher (like browser extensions)
  • Permissions UI: Host shows what server can access

5. DevOps — MCP Operations

5.1 MCP server in TypeScript

// server.ts — minimal MCP server
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema
} from "@modelcontextprotocol/sdk/types.js";
 
const server = new Server(
  { name: "my-mcp-server", version: "1.0.0" },
  { capabilities: { tools: {} } }
);
 
// Define tools
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "search_users",
      description: "Search users by query",
      inputSchema: {
        type: "object",
        properties: {
          query: { type: "string" },
          limit: { type: "integer", default: 10 }
        },
        required: ["query"]
      }
    }
  ]
}));
 
// Handle tool calls
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;
 
  if (name === "search_users") {
    // searchUsers: your service-integration function (not shown here)
    const users = await searchUsers(args.query, args.limit);
    return {
      content: [
        { type: "text", text: JSON.stringify(users, null, 2) }
      ]
    };
  }
 
  throw new Error(`Unknown tool: ${name}`);
});
 
// Start
const transport = new StdioServerTransport();
await server.connect(transport);

5.2 MCP server in Python (with HTTP transport)

# server.py
from mcp.server.fastmcp import FastMCP
import httpx
 
mcp = FastMCP("my-server")
 
 
@mcp.tool()
async def search_users(query: str, limit: int = 10) -> list[dict]:
    """Search users by query."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://api.example.com/users",
            params={"q": query, "limit": limit}
        )
        resp.raise_for_status()
        return resp.json()
 
 
@mcp.resource("user://{user_id}")
async def get_user(user_id: str) -> str:
    """Get user profile by ID."""
    # ... fetch user
    return user_profile_markdown
 
 
@mcp.prompt()
def code_review(file: str) -> str:
    """Generate code review prompt."""
    return f"Please review the code in {file} and suggest improvements."
 
 
if __name__ == "__main__":
    # stdio transport (default)
    mcp.run()
    # Or HTTP:
    # mcp.run(transport="streamable-http")

5.3 Cloudflare Workers MCP

// wrangler.toml
// name = "mcp-worker"
// main = "src/index.ts"
// compatibility_date = "2025-11-01"
 
// src/index.ts
import { MCPServer } from "@cloudflare/workers-mcp";
 
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const server = new MCPServer({
      name: "github-mcp",
      version: "1.0.0",
      tools: [
        {
          name: "search_issues",
          description: "Search GitHub issues",
          handler: async (args) => {
            // Use env.GITHUB_TOKEN
            return await searchIssues(args.query, env.GITHUB_TOKEN);
          }
        }
      ]
    });
 
    return server.handleRequest(request);
  }
};

5.4 Monitoring

groups:
  - name: mcp_alerts
    rules:
      - alert: MCPHighErrorRate
        expr: |
          sum(rate(mcp_requests_total{status="error"}[5m])) /
          sum(rate(mcp_requests_total[5m])) > 0.05
        for: 5m
        annotations:
          summary: "MCP error rate > 5%"
 
      - alert: MCPHighLatency
        expr: |
          histogram_quantile(0.99,
            rate(mcp_tool_duration_seconds_bucket[5m])
          ) > 5
        for: 5m
        annotations:
          summary: "P99 MCP tool latency > 5s"
 
      - alert: MCPAuthFailures
        expr: rate(mcp_auth_failures_total[5m]) > 1
        for: 5m
        annotations:
          summary: "Suspicious auth failure rate"

5.5 Testing MCP servers

# pytest test (async test requires pytest-asyncio)
import pytest

from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client


@pytest.mark.asyncio
async def test_mcp_server():
    params = StdioServerParameters(
        command="python", args=["server.py"]
    )
 
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
 
            # Test tools/list
            tools = await session.list_tools()
            assert any(t.name == "search_users" for t in tools.tools)
 
            # Test tools/call
            result = await session.call_tool(
                "search_users",
                arguments={"query": "alice", "limit": 5}
            )
            assert len(result.content) > 0

6. Code Implementation

6.1 Production MCP server (Python)

"""
Production-grade MCP server với:
- OAuth 2.1 auth
- Rate limiting per client
- Audit logging
- Permission scopes
"""
 
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent
import asyncio
import time
 
 
class ProductionMCPServer:
    def __init__(self):
        self.server = Server("production-server")
        self.rate_limits: dict[str, list[float]] = {}
        self._setup_handlers()
 
    def _setup_handlers(self):
        @self.server.list_tools()
        async def list_tools() -> list[Tool]:
            return [
                Tool(
                    name="query_database",
                    description="Run SQL query (read-only)",
                    inputSchema={
                        "type": "object",
                        "properties": {
                            "sql": {"type": "string"},
                            "params": {"type": "array"}
                        },
                        "required": ["sql"]
                    }
                ),
            ]
 
        @self.server.call_tool()
        async def call_tool(name: str, arguments: dict) -> list[TextContent]:
            client_id = self._get_client_id()
 
            # Rate limiting
            if not self._check_rate_limit(client_id):
                return [TextContent(
                    type="text",
                    text="Rate limit exceeded. Try again later."
                )]
 
            # Permission check
            if not self._has_permission(client_id, name):
                return [TextContent(
                    type="text",
                    text=f"Permission denied for {name}"
                )]
 
            # Audit log wraps the actual execution
            start = time.time()
            status, error_msg = "success", None
            try:
                if name == "query_database":
                    result = await self._safe_query(arguments)
                    return [TextContent(type="text", text=str(result))]
                raise ValueError(f"Unknown tool: {name}")
            except Exception as e:
                status, error_msg = "error", str(e)
                raise
            finally:
                # Single log point: finally runs after except, so logging
                # "success" here unconditionally would double-log failures
                await self._audit_log(
                    client_id, name, arguments,
                    status=status, duration=time.time() - start,
                    error=error_msg
                )
 
    def _check_rate_limit(self, client_id: str) -> bool:
        now = time.time()
        history = self.rate_limits.setdefault(client_id, [])
        # Keep only last 60s
        self.rate_limits[client_id] = [t for t in history if t > now - 60]
 
        if len(self.rate_limits[client_id]) >= 100:  # 100 req/min
            return False
 
        self.rate_limits[client_id].append(now)
        return True
 
    async def _safe_query(self, arguments: dict):
        """Read-only query with timeout."""
        sql = arguments["sql"]
        # Naive guard; in production also connect with a read-only DB role
        if not sql.strip().lower().startswith("select"):
            raise ValueError("Only SELECT allowed")

        params = arguments.get("params", [])

        # db: your async pool (e.g. asyncpg), initialization not shown.
        # Parameterized query, 5s timeout (asyncio.timeout needs Python 3.11+)
        async with asyncio.timeout(5):
            return await db.fetch(sql, *params)
 
 
async def main():
    server = ProductionMCPServer()
    async with stdio_server() as (read, write):
        await server.server.run(read, write)
 
 
if __name__ == "__main__":
    asyncio.run(main())

6.2 MCP client in app

"""
Embed MCP client in custom AI app.
"""
 
from mcp.client.session import ClientSession
from mcp.client.streamable_http import streamablehttp_client
 
 
class MCPToolClient:
    def __init__(self, server_url: str, token: str):
        self.server_url = server_url
        self.token = token
        self.session: ClientSession | None = None
 
    async def connect(self):
        self.transport_cm = streamablehttp_client(
            self.server_url,
            headers={"Authorization": f"Bearer {self.token}"}
        )
        # streamablehttp_client yields (read, write, get_session_id)
        read, write, _ = await self.transport_cm.__aenter__()
        self.session = ClientSession(read, write)
        await self.session.__aenter__()  # starts the session's receive loop
        await self.session.initialize()

    async def list_tools(self):
        return (await self.session.list_tools()).tools

    async def call_tool(self, name: str, arguments: dict):
        result = await self.session.call_tool(name, arguments=arguments)
        return result.content

    async def close(self):
        if self.session:
            await self.session.__aexit__(None, None, None)
        await self.transport_cm.__aexit__(None, None, None)
 
 
# In LLM agent loop
async def agent_loop(user_query: str):
    mcp = MCPToolClient(
        "https://mcp.myapp.com/mcp",
        token=get_user_token()
    )
    await mcp.connect()
 
    tools = await mcp.list_tools()
    tool_specs = [
        {
            "name": t.name,
            "description": t.description,
            "input_schema": t.inputSchema
        }
        for t in tools
    ]
 
    # Pass to the LLM (`claude` = an anthropic.AsyncAnthropic() client, not shown)
    response = await claude.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        tools=tool_specs,
        messages=[{"role": "user", "content": user_query}]
    )
 
    if response.stop_reason == "tool_use":
        for tool_call in response.content:
            if tool_call.type == "tool_use":
                result = await mcp.call_tool(
                    tool_call.name,
                    arguments=tool_call.input
                )
                # Feed back to Claude...
 
    await mcp.close()

7. System Design Diagrams

7.1 N×M Problem → MCP Pattern

flowchart LR
    subgraph Before["Before MCP (N×M)"]
        AppA[App A] --> SvcA1[GitHub]
        AppA --> SvcB1[Slack]
        AppA --> SvcC1[Postgres]
        AppB[App B] --> SvcA2[GitHub]
        AppB --> SvcB2[Slack]
        AppB --> SvcC2[Postgres]
    end

    subgraph After["After MCP (N+M)"]
        AppA2[App A] --> MCP1[MCP Client]
        AppB2[App B] --> MCP2[MCP Client]
        MCP1 --> SrvA[GitHub MCP Server]
        MCP1 --> SrvB[Slack MCP Server]
        MCP1 --> SrvC[Postgres MCP Server]
        MCP2 --> SrvA
        MCP2 --> SrvB
        MCP2 --> SrvC
    end

    style Before fill:#ffcdd2
    style After fill:#c8e6c9

7.2 MCP Lifecycle

sequenceDiagram
    participant H as Host App
    participant C as MCP Client
    participant S as MCP Server
    participant L as LLM

    H->>C: Initialize
    C->>S: initialize(version, capabilities)
    S-->>C: serverInfo + capabilities
    C->>S: notifications/initialized

    C->>S: tools/list
    S-->>C: [tool schemas]

    H->>L: User query + tool schemas
    L-->>H: Tool call: search_users("alice")

    H->>C: callTool("search_users", {...})
    C->>S: tools/call
    S->>S: Auth check
    S->>S: Rate limit
    S->>S: Execute
    S-->>C: {content: [...]}
    C-->>H: result

    H->>L: Tool result
    L-->>H: Final response
    H-->>User: Answer

7.3 Streamable HTTP Transport

sequenceDiagram
    participant C as Client
    participant Auth as OAuth Server
    participant S as MCP Server

    C->>Auth: Authorization Code Flow + PKCE
    Auth-->>C: access_token

    C->>S: POST /mcp<br/>Authorization: Bearer ...<br/>DPoP: ...<br/>{tools/call}
    S->>S: Verify token + DPoP
    S->>S: Execute tool

    Note over S: Long-running tool

    S-->>C: 200 OK<br/>Content-Type: text/event-stream
    S-->>C: data: {progress 25%}
    S-->>C: data: {progress 50%}
    S-->>C: data: {progress 75%}
    S-->>C: data: {result}

7.4 Multi-tenant MCP Gateway

flowchart TB
    Clients[AI Clients] --> Gateway[MCP Gateway<br/>OAuth + Tenant Routing]

    Gateway --> Tenant1
    Gateway --> Tenant2
    Gateway --> Tenant3[Tenant 3 Servers]

    subgraph Tenant1["Tenant 1"]
        T1GH[GitHub MCP]
        T1DB[Postgres MCP]
        T1FS[Filesystem MCP]
    end

    subgraph Tenant2["Tenant 2"]
        T2GH[GitHub MCP]
        T2Slack[Slack MCP]
    end

    Audit[Audit Log] -.- Gateway

8. Aha Moments & Pitfalls

Aha Moments

#1: MCP solves the N×M integration problem for AI. Same pattern as USB-C, ODBC, and web standards: one protocol, anyone can implement it.

#2: The schema is AI-readable. JSON Schema with natural-language descriptions means LLMs can “read” tool definitions. Unlike OpenAPI, which targets human developers.

#3: The three capabilities are orthogonal: Tools (actions), Resources (data), Prompts (templates). A server can expose just one; it does not need all three.

#4: stdio for local, HTTP for remote. stdio is the simplest, secure, and enough for 90% of use cases. HTTP is for enterprise/cloud.

#5: OAuth 2.1 + DPoP is the standard. A stolen token is not reusable. See T14 for the deep dive.

#6: Streamable HTTP won out. SSE is simpler and HTTP-friendly, and works through proxies/CDNs; the spec superseded the older two-endpoint HTTP+SSE transport, and WebSocket never made it into the spec.

#7: MCP server = backend service. Same skills as REST/gRPC server: auth, rate limit, observability, audit. Just different protocol.

#8: Trust matters. User installs MCP server = giving access to data. Code signing, registry, sandboxing critical.

Pitfalls

Pitfall 1: No auth on remote MCP

Wrong: a public HTTP endpoint without auth → anyone can call tools. Right: OAuth 2.1 is mandatory; add DPoP for high-value targets.

Pitfall 2: Path traversal in filesystem server

Wrong: read_file(path) accepts ../../../etc/passwd. Right: validate the realpath against allowed prefixes.

Pitfall 3: SQL injection in DB MCP

Wrong: f"SELECT * FROM users WHERE id = {user_input}". Right: parameterized queries, always.

Pitfall 4: No rate limit

Wrong: an LLM stuck in a loop calls a tool 1000 times/sec → DoS on the service. Right: per-client, per-tool rate limits.

Pitfall 5: Long-running tools without progress

Wrong: compile_project runs for 60s with no feedback → the client times out. Right: send progress notifications via SSE.

Pitfall 6: Tool output too large

Wrong: a tool returns a 10MB result → LLM context overflow. Right: paginate, summarize, or return a URI the client can fetch separately.
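One mitigation sketch: cap the inline payload and point the client at a resource for the rest (the `result://` URI scheme and the 16 KB threshold here are assumptions, not spec):

```python
MAX_INLINE_BYTES = 16_384  # assumption: keep tool results well under the model's context

def cap_tool_output(text: str, result_id: str) -> dict:
    """Return output inline if small, else a preview plus a URI for the full result."""
    data = text.encode()
    if len(data) <= MAX_INLINE_BYTES:
        return {"type": "text", "text": text}
    return {
        "type": "text",
        "text": (
            f"{text[:500]}...\n"
            f"[truncated: {len(data)} bytes total; "
            f"read resource result://{result_id} for the full output]"
        ),
    }

big = "x" * 100_000
out = cap_tool_output(big, "abc123")
assert "truncated" in out["text"]
assert len(out["text"]) < 1000
```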

Pitfall 7: Trusting untrusted servers

Wrong: the user runs a random GitHub MCP server → the server reads all their files. Right: verified registry, code signing, sandboxing.

Pitfall 8: Verbose tool descriptions

Wrong: a 5KB description per tool → LLM context bloat. Right: concise, action-oriented descriptions, with short examples where needed.

Pitfall 9: No audit log

Wrong: no record of which tools were called, by whom, or when. Right: audit-log every tool call; required for compliance.

Pitfall 10: Tool errors not handled

Wrong: a tool throws an exception → MCP returns a generic error → the LLM gets stuck. Right: structured error responses with retry hints.


| Topic | Connection |
|---|---|
| Tuan-04-API-Design-REST-gRPC | Foundation; MCP is a JSON-RPC variant |
| Tuan-14-AuthN-AuthZ-Security | OAuth 2.1, DPoP, scopes |
| Tuan-09-Rate-Limiter | Per-client rate limiting for MCP |
| Tuan-13-Monitoring-Observability | Monitoring MCP servers |
| Tuan-Bonus-LLM-Serving-Infrastructure | The LLM consumes MCP tools |
| Tuan-Bonus-Multi-Tenancy-SaaS-Patterns | Multi-tenant MCP gateway |

References

Spec & Docs:

Servers:

Engineering blogs:

Tools:


Phase F complete. Next up: Phase G — Platform Engineering, FinOps, Progressive Delivery, Edge+Wasm.