Technical deep dive

Cloudflare-native architecture for a real front desk agent. Zero origin servers.

The live application runs its core channel path on Cloudflare Pages, Hono, D1, KV, Workers AI, Agent Memory, and Cloudflare Email. Larger services like R2, Durable Objects, Vectorize, Queues, and AI Gateway are explicit roadmap items until they are bound and used.

Live runtime
Pages + Hono

Every route enters the same Worker shell.

Live intelligence
Kimi K2.6 + tools

Conversation stays separate from deterministic pricing.

Live evidence
D1 audit trail

Inquiries, jobs, memory, and events persist for staff review.

Platform Architecture

Every inbound request enters through Hono on Cloudflare Pages, then flows through deterministic tools and D1-backed audit records. Roadmap services are marked directly in the diagram.

Ingress Channels (reach the Worker Layer via webhooks)
Twilio Voice: voice calls
CF Email / Resend: inbound email
Twilio SMS: text messages
Web Chat: browser widget

Worker Layer (reaches the AI + Tool Chain via function calling)
Channel Router: Hono middleware
KV Guardrails: caps / idempotency
Agent Memory: retrieve D1 chunks
Agent Router: Hono + Workers AI

AI + Tool Chain (reads from and writes to the Storage Layer)
Kimi K2.6: LLM inference
Memory Context: policies / FAQs
Whisper (roadmap): raw audio STT
ElevenLabs: TTS
check_availability: tool
quote_price: tool
book_job: tool

Storage Layer
D1: relational (SQLite)
D1 Memory: docs & chunks
KV: config & rate cards
R2 (roadmap): contracts & recordings
Vectorize (roadmap): semantic RAG
Queues (roadmap): async tasks

Live demo-critical path

Bound services that support the interview workflow today.

Live
Workers AI (AI binding)

Kimi K2.6 agent inference and native tool-calling in the channel path.

D1 Database (DB binding)

Businesses, service windows, rate cards, inquiries, jobs, events, and Agent Memory.

Workers KV (KV binding)

Request caps, payload guardrails, idempotency, and edge-ready config.

Cloudflare Email (Email Worker)

Companion Worker for inbound email and native outbound reply attempts.

Twilio Voice + SMS (Webhook)

Webhook paths normalize calls and texts into the shared agent loop.
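
As a rough illustration of what "normalize calls and texts into the shared agent loop" could look like, the sketch below maps a Twilio webhook body to one common message shape. The `InboundMessage` type and the `normalizeTwilio` function are hypothetical names, not taken from the application. The form fields `CallSid`, `MessageSid`, `From`, `Body`, `SpeechResult`, and `Confidence` are standard Twilio webhook parameters.

```typescript
// Hypothetical shared shape; the real application's type is not shown in this document.
type Channel = "voice" | "sms";

interface InboundMessage {
  channel: Channel;
  externalId: string; // Twilio CallSid or MessageSid
  from: string;
  text: string;
  confidence: number | null; // only voice webhooks carry a speech confidence
}

// Twilio posts application/x-www-form-urlencoded bodies, which URLSearchParams can parse.
function normalizeTwilio(body: string): InboundMessage | null {
  const params = new URLSearchParams(body);
  const from = params.get("From") ?? "";

  const callSid = params.get("CallSid");
  if (callSid) {
    const confidence = params.get("Confidence");
    return {
      channel: "voice",
      externalId: callSid,
      from,
      text: (params.get("SpeechResult") ?? "").trim(),
      confidence: confidence === null ? null : Number(confidence),
    };
  }

  const messageSid = params.get("MessageSid");
  if (messageSid) {
    return {
      channel: "sms",
      externalId: messageSid,
      from,
      text: (params.get("Body") ?? "").trim(),
      confidence: null,
    };
  }

  return null; // not a recognizable Twilio voice or SMS webhook
}
```

Once both channels produce the same shape, the agent loop does not need to know which webhook a message arrived on.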

Roadmap services

Important, but not claimed as live until bindings and code prove it.

Roadmap
Durable Objects (Planned)

Stateful sessions per business and conversation for richer multi-turn calls.

R2 Storage (Planned)

Contracts, call recordings, transcripts, and attachments.

Vectorize (Planned)

Semantic retrieval upgrade for larger policy and historical corpora.

AI Gateway / Queues (Planned)

AI observability, caching, provider fallback, async sync, and retries.

AI Stack

The agent combines Kimi reasoning, staff-approved D1 memory, speech input, and voice output without letting retrieved text override deterministic tools.

Kimi K2.6

@cf/moonshotai/kimi-k2.6

Primary reasoning model. MoE architecture activates only relevant expert sub-networks per token, keeping inference cost low at high volume.

Architecture: MoE, 1T total, ~32B active
Context: 128K tokens
Function calling: Native (not prompt-injected)
Temperature: 0.4 (bounded creativity for customer service)
Tool chain: 5 tools in job sequence
Fallback: Graceful apology response if Workers AI fails
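
To make the temperature and fallback settings concrete, here is a minimal sketch of one agent turn. It assumes the AI binding exposes a `run(model, input)` call, as the Workers AI binding does, and models only that call through a small `AiLike` interface. The function names, the `FALLBACK_REPLY` wording, and the shape of `tools` are illustrative assumptions, not the application's code.

```typescript
// Minimal stand-in for the Workers AI binding; only the run() call used here is modeled.
interface AiLike {
  run(model: string, input: Record<string, unknown>): Promise<unknown>;
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

const MODEL = "@cf/moonshotai/kimi-k2.6"; // model ID as listed in this section
const FALLBACK_REPLY =
  "Sorry, I'm having trouble right now. A team member will follow up shortly."; // hypothetical wording

interface AgentRequest {
  model: string;
  input: { messages: ChatMessage[]; tools: unknown[]; temperature: number };
}

// Builds the request with the 0.4 temperature stated in the spec rows.
function buildAgentRequest(messages: ChatMessage[], tools: unknown[]): AgentRequest {
  return { model: MODEL, input: { messages, tools, temperature: 0.4 } };
}

type TurnResult = { ok: true; output: unknown } | { ok: false; reply: string };

// Returns the raw model output on success; on any failure returns the apology text.
// Parsing tool calls out of the output is deliberately left out of this sketch.
async function runAgentTurn(
  ai: AiLike,
  messages: ChatMessage[],
  tools: unknown[],
): Promise<TurnResult> {
  const request = buildAgentRequest(messages, tools);
  try {
    const output = await ai.run(request.model, request.input);
    return { ok: true, output };
  } catch (_err) {
    return { ok: false, reply: FALLBACK_REPLY };
  }
}
```

Keeping the fallback in one place means every channel degrades to the same apology instead of surfacing a raw error to the customer.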

Agent Memory

D1 + toMarkdown

Live retrieval layer for business policies, procedures, warranty rules, and FAQs. Staff upload PDFs or paste text, and relevant enabled chunks are then injected into agent requests on every channel.

Upload types: PDF, TXT, Markdown, CSV
Conversion: Workers AI Markdown Conversion
Retrieval: Keyword-ranked D1 chunks
Safety: Cannot override pricing or availability

Whisper Large v3 Turbo

@cf/openai/whisper-large-v3-turbo

Roadmap speech-to-text layer. The current voice path accepts Twilio Gather speech transcripts and routes them into the same Kimi K2.6 loop.

Latency: < 300ms per chunk
Languages: 99+ (English primary)
Current input: Twilio SpeechResult / demo transcript
Features: Timestamps, confidence scores
Runs on: Cloudflare Workers AI GPU

ElevenLabs TTS

Primary + Workers AI fallback

Text-to-speech for voice responses. ElevenLabs provides production-quality voices; Workers AI MeloTTS is the on-platform fallback when ElevenLabs is unavailable or not configured.

Primary: ElevenLabs (configurable voice)
Fallback: Workers AI MeloTTS
Current output: MP3 via /api/tts/speak for Twilio
Voice IDs: Per-business configurable

Live knowledge retrieval

Agent Memory: upload docs, answer from policy

Agent Memory is the live RAG path for business operations. Original files are not retained; the app stores extracted Markdown, document metadata, and retrieval chunks in D1, then injects up to four relevant enabled snippets into each Kimi K2.6 request.

1. Upload or paste

Staff add PDF, TXT, Markdown, CSV, or direct policy text from /agent-memory.

2. Convert to Markdown

Workers AI toMarkdown extracts readable text from PDFs and documents.

3. Chunk in D1

Markdown is normalized, capped, chunked, keyword-indexed, and scoped by business_id.

4. Retrieve on every channel

Web, email, SMS, and voice retrieve policy snippets before the agent replies.

Memory boundary

Memory can answer warranty rules, safety procedures, escalation policy, and operational FAQs. Pricing still flows through quote_price, and service window availability still flows through check_availability.
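
The retrieval step described above (keyword-ranked, tenant-scoped, at most four enabled snippets) can be sketched as a pure function. This is only an illustration under stated assumptions: the `MemoryChunk` shape, the per-chunk `enabled` flag, and the overlap-count scoring are hypothetical, since the document does not specify the actual ranking formula.

```typescript
// Hypothetical in-memory view of an agent_memory_chunks row.
interface MemoryChunk {
  id: string;
  businessId: string;
  enabled: boolean; // assumption: enablement is resolved per chunk before ranking
  searchText: string;
  content: string;
}

const MAX_SNIPPETS = 4; // "up to four relevant enabled snippets"

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Scores each chunk by how many distinct query terms appear in its search text,
// then returns the best matches for one tenant only.
function retrieveSnippets(
  chunks: MemoryChunk[],
  businessId: string,
  query: string,
): MemoryChunk[] {
  const terms = Array.from(new Set(tokenize(query).filter((t) => t.length > 2)));
  return chunks
    .filter((c) => c.businessId === businessId && c.enabled)
    .map((chunk) => {
      const words = new Set(tokenize(chunk.searchText));
      const score = terms.filter((t) => words.has(t)).length;
      return { chunk, score };
    })
    .filter((scored) => scored.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, MAX_SNIPPETS)
    .map((scored) => scored.chunk);
}
```

Because the function only returns text snippets, it has no way to set a price or a booking slot, which is consistent with the memory boundary: those values still come from quote_price and check_availability.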

Memory-Aware 5-Tool Job Chain

Before Kimi K2.6 answers, the Worker retrieves relevant enabled memory chunks. The model still calls these tools via native function-calling for operational state and customer commitments.

1. check_availability

Queries D1 for matching service windows with date-overlap exclusion and job dimension filtering.

2. quote_price

Deterministic pricing engine: callout + service + urgency + after-hours/date + travel zone + add-ons. Never LLM-generated.

3. draft_contract

Demo work-order placeholder today. R2 PDF storage and e-signature envelopes are roadmap.

4. take_payment

Payment-pending response today. Stripe Checkout is roadmap.

5. book_job

Writes confirmed job to D1 and returns a demo ServiceM8 reference. External PMS sync is roadmap.
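
The "date-overlap exclusion" in check_availability is the part most worth pinning down. The sketch below shows the idea in memory rather than as a D1 query. The type names are hypothetical, and treating "job dimension filtering" as a match on service category and travel zone is an assumption.

```typescript
// Times are epoch milliseconds; intervals are half-open: [startTs, endTs).
interface Interval {
  startTs: number;
  endTs: number;
}

interface ServiceWindow extends Interval {
  id: string;
  businessId: string;
  serviceCategory: string;
  travelZone: string;
}

interface BookedJob extends Interval {
  availabilityWindowId: string;
}

interface AvailabilityRequest extends Interval {
  businessId: string;
  serviceCategory: string;
  travelZone: string;
}

// Two half-open intervals overlap when each starts before the other ends.
function overlaps(a: Interval, b: Interval): boolean {
  return a.startTs < b.endTs && b.startTs < a.endTs;
}

// A window qualifies if it belongs to the tenant, matches the job dimensions,
// fully contains the requested slot, and has no booked job overlapping that slot.
function findOpenWindows(
  windows: ServiceWindow[],
  jobs: BookedJob[],
  request: AvailabilityRequest,
): ServiceWindow[] {
  return windows.filter(
    (w) =>
      w.businessId === request.businessId &&
      w.serviceCategory === request.serviceCategory &&
      w.travelZone === request.travelZone &&
      w.startTs <= request.startTs &&
      request.endTs <= w.endTs &&
      !jobs.some((j) => j.availabilityWindowId === w.id && overlaps(j, request)),
  );
}
```

With half-open intervals, a job ending at 10:00 does not block a request starting at 10:00, so back-to-back bookings in the same window remain possible.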

Escape hatch: escalate_to_human

A sixth tool the agent can call at any point to route the conversation to a human. It is also triggered automatically when confidence drops below the configured threshold, the dollar cap is exceeded, or the maximum turn count is reached.
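
The automatic triggers can be expressed as one small check that runs before each reply. The sketch below uses the defaults listed under Agent Guardrails ($15,000, 0.75, and 20 turns). The type names, the function name, and the order in which the conditions are evaluated are assumptions made for illustration.

```typescript
interface Guardrails {
  dollarCapCents: number;
  confidenceThreshold: number;
  maxTurns: number;
}

// Defaults as listed under Agent Guardrails: $15,000, 0.75, 20 turns.
const DEFAULT_GUARDRAILS: Guardrails = {
  dollarCapCents: 1500000,
  confidenceThreshold: 0.75,
  maxTurns: 20,
};

interface ConversationState {
  confidence: number; // latest intent confidence, 0..1
  quotedCents: number; // highest quoted job value so far
  turns: number; // completed agent turns
}

type EscalationReason = "low_confidence" | "dollar_cap" | "max_turns";

// Returns the first matching reason, or null when the agent may continue.
function escalationReason(
  state: ConversationState,
  guardrails: Guardrails = DEFAULT_GUARDRAILS,
): EscalationReason | null {
  if (state.confidence < guardrails.confidenceThreshold) return "low_confidence";
  if (state.quotedCents > guardrails.dollarCapCents) return "dollar_cap";
  if (state.turns >= guardrails.maxTurns) return "max_turns";
  return null;
}
```

For example, a conversation at confidence 0.9 with a $2,000 quote on turn 3 continues, while the same conversation at confidence 0.6 would hand off with reason `low_confidence`.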

Data Layer: D1 Schema

9 tables, all scoped by business_id for strict multi-tenant isolation.

businesses

Tenant root table. One row per business.

id TEXT PK
name TEXT
field_service_tenant_id TEXT
timezone TEXT
business_hours_json TEXT
phone / email TEXT

availability_windows

Technician service-window inventory with category, zone, and notes.

id TEXT PK
business_id TEXT FK
window_label TEXT
technician_name TEXT
service_category TEXT
start_ts / end_ts DATETIME
travel_zone / status / capacity_score

rate_cards

Pricing configuration with deterministic JSON curves.

business_id TEXT PK
base_rate_json
urgency_curve_json
after_hours_curve_json
travel_zone_curve_json
add_on_json
minimum_charge_cents INT

agent_configs

Per-business AI agent personality and guardrails.

business_id TEXT FK
system_prompt TEXT
greeting_message TEXT
voice_id TEXT
dollar_cap_per_job INT
confidence_threshold REAL
max_turns_before_escalation INT
escalation_rules_json

agent_memory_documents

Uploaded or pasted staff knowledge converted to Markdown.

id TEXT PK
business_id TEXT FK
title TEXT
source_type ENUM
filename / mime_type / byte_size
extracted_markdown TEXT
status ENUM
chunk_count INT

agent_memory_chunks

Retrieval chunks automatically injected into Kimi context.

id TEXT PK
document_id TEXT FK
business_id TEXT FK
chunk_index INT
heading TEXT
content TEXT
search_text TEXT
token_estimate INT

inquiries

Every inbound interaction across all channels.

id TEXT PK
business_id TEXT FK
channel ENUM
caller_name / phone / email
transcript_text TEXT
intent TEXT
confidence REAL
status ENUM

jobs

Confirmed jobs with field-service sync status.

id TEXT PK
business_id TEXT FK
availability_window_id TEXT FK
inquiry_id TEXT FK
customer_name / phone / site_address
service_code / urgency TEXT
start_ts / end_ts DATETIME
price_cents INT
pms_synced BOOLEAN

events

Full audit trail for every action the agent takes.

id TEXT PK
business_id TEXT FK
inquiry_id TEXT FK
type TEXT
actor TEXT
payload_json TEXT
tool_name TEXT
ts DATETIME

Multi-Tenancy Rule

Every live D1 query and job/inquiry/event path is scoped by business_id. Roadmap storage surfaces like R2, Vectorize, Queues, and Durable Objects should keep the same tenant prefix rule when added.
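
One way to make the tenant rule hard to forget is to build every read through a helper that always puts business_id first in the WHERE clause. The sketch below is an assumption about how that could look, not the application's query layer: `scopedSelect`, `TENANT_TABLES`, and `ScopedQuery` are hypothetical names. It only produces SQL text and bound parameters, so it can be checked without a database.

```typescript
// Tables from the schema above that carry a business_id column (subset, for illustration).
const TENANT_TABLES = ["inquiries", "jobs", "events", "agent_memory_chunks"] as const;
type TenantTable = (typeof TENANT_TABLES)[number];

interface ScopedQuery {
  sql: string;
  params: (string | number)[];
}

// Builds a parameterized SELECT that is always filtered by business_id.
// Column names are validated because they are interpolated into the SQL text;
// values are never interpolated, only bound.
function scopedSelect(
  table: TenantTable,
  businessId: string,
  filters: Record<string, string | number> = {},
): ScopedQuery {
  if (!businessId) throw new Error("business_id is required");
  const clauses = ["business_id = ?"];
  const params: (string | number)[] = [businessId];
  for (const column of Object.keys(filters)) {
    if (!/^[a-z_]+$/.test(column)) throw new Error(`invalid column: ${column}`);
    clauses.push(`${column} = ?`);
    params.push(filters[column]);
  }
  return { sql: `SELECT * FROM ${table} WHERE ${clauses.join(" AND ")}`, params };
}
```

With a D1 binding, the result would typically be executed along the lines of `env.DB.prepare(q.sql).bind(...q.params).all()`, where `env.DB` is the DB binding named earlier in this document.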

Deterministic Pricing Engine

The LLM never generates prices. Every dollar amount comes from this formula, executed deterministically on the Worker.

// Final price calculation
subtotal = callout_fee + base_service + diagnostic_allowance
total = (subtotal
         × urgency_multiplier
         × after_hours_date_multiplier
         × travel_zone_multiplier)
        + add_ons
total = max(total, minimum_charge)

Urgency
Standard: 1.00×
Same day: 1.25×
After hours: 1.75× / Emergency: 2.20×

Date / Time
Business hours: 1.00×
Saturday: 1.25× / Sunday: 1.60×
Weekday after 5pm: 1.35×

Travel Zone
Metro: 1.00×
Outer metro: 1.15×
Regional: 1.35×

Service Base
Blocked drain: $265
Hot water fault: $315
Switchboard fault: $285

Minimum Floor
Default: $149
Quote visit can waive diagnostic
Never below configured floor

Add-Ons
Camera inspection: $185
Temporary make-safe: $120
Parts run: $65

Why deterministic?

LLMs are great at conversation but unreliable at arithmetic. A hallucinated price creates legal liability and erodes customer trust. By running pricing as a pure function on the Worker, the agent can confidently quote exact rates that match your published rate card.
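
A minimal sketch of the formula as a pure function, working in integer cents. The multipliers, base prices, add-ons, and the $149 floor come from the rate tables above. The $90 callout fee, the zero diagnostic allowance, the key names, and the choice to round once before adding add-ons are assumptions, since the document does not state them.

```typescript
interface RateCard {
  calloutFeeCents: number;
  diagnosticAllowanceCents: number;
  minimumChargeCents: number;
  baseServiceCents: Record<string, number>;
  urgency: Record<string, number>;
  dateTime: Record<string, number>;
  travelZone: Record<string, number>;
  addOnCents: Record<string, number>;
}

interface QuoteRequest {
  service: string;
  urgency: string;
  dateTime: string;
  travelZone: string;
  addOns: string[];
}

const EXAMPLE_CARD: RateCard = {
  calloutFeeCents: 9000, // hypothetical: not stated in this document
  diagnosticAllowanceCents: 0, // hypothetical: not stated in this document
  minimumChargeCents: 14900,
  baseServiceCents: { blocked_drain: 26500, hot_water_fault: 31500, switchboard_fault: 28500 },
  urgency: { standard: 1.0, same_day: 1.25, after_hours: 1.75, emergency: 2.2 },
  dateTime: { business_hours: 1.0, saturday: 1.25, sunday: 1.6, weekday_after_5pm: 1.35 },
  travelZone: { metro: 1.0, outer_metro: 1.15, regional: 1.35 },
  addOnCents: { camera_inspection: 18500, temporary_make_safe: 12000, parts_run: 6500 },
};

// Fails loudly on unknown keys so a typo can never turn into a silent default price.
function lookup(table: Record<string, number>, key: string, label: string): number {
  if (!Object.prototype.hasOwnProperty.call(table, key)) {
    throw new Error(`unknown ${label}: ${key}`);
  }
  return table[key];
}

function quotePriceCents(card: RateCard, req: QuoteRequest): number {
  const subtotal =
    card.calloutFeeCents +
    lookup(card.baseServiceCents, req.service, "service") +
    card.diagnosticAllowanceCents;
  // Multipliers apply to the subtotal only; round once to whole cents (an assumption).
  const multiplied = Math.round(
    subtotal *
      lookup(card.urgency, req.urgency, "urgency") *
      lookup(card.dateTime, req.dateTime, "date/time") *
      lookup(card.travelZone, req.travelZone, "travel zone"),
  );
  const addOns = req.addOns.reduce(
    (sum, name) => sum + lookup(card.addOnCents, name, "add-on"),
    0,
  );
  return Math.max(multiplied + addOns, card.minimumChargeCents);
}
```

With this example card, a same-day blocked drain on a Saturday in the outer metro zone with a camera inspection works out as (9000 + 26500) × 1.25 × 1.25 × 1.15, rounded to 63789, plus 18500, for 82289 cents ($822.89). The same inputs always produce the same number, which is the property the LLM cannot guarantee.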

Guardrails & Security

Production AI needs more than vibes. These are hard constraints, not suggestions.

Agent Guardrails

Dollar cap: Max job value before auto-escalation. Default: $15,000.
Confidence threshold: Below this score, the agent escalates. Default: 0.75.
Max turns: Turn limit before forcing human handoff. Default: 20.
Deterministic pricing: Prices always from rate card function, never LLM-generated.
Availability check: D1 query checks window overlaps before any job write.
Agent Memory boundary: Retrieved policy text can inform answers, but cannot set prices, discounts, availability, or payment status.
Out-of-policy detection: Warranty disputes, customer-supplied parts, and high-value commercial scopes are routed to staff.

Infrastructure Security

Google OAuth SSO: Google OAuth 2.0 with JWT session cookies on all dashboard routes.
API tokens as secrets: Twilio, ElevenLabs, email, and future vendor keys stored as Cloudflare Secrets.
Tenant isolation: All data paths include business_id. No shared-namespace leaks.
KV request guardrails: Demo-critical routes have payload caps, request caps, and idempotency where KV is bound.
Audit trail: Every agent action logged with actor, timestamp, and detail JSON.
Roadmap AI Gateway: AI request logs, caching, rate controls, and provider fallback come after live-channel reliability.
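
The three KV guardrails (payload caps, request caps, idempotency) could be sketched as follows. The `KvLike` interface models only the `get` and `put` calls of a Workers KV namespace, including the `expirationTtl` option. The key formats, the 16 KB cap, the one-hour TTL, and the function names are hypothetical.

```typescript
// Minimal subset of a Workers KV namespace used by this sketch.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const MAX_PAYLOAD_BYTES = 16 * 1024; // hypothetical cap
const TTL_SECONDS = 3600; // hypothetical window

// Payload cap: measure the UTF-8 byte length, not the character count.
function withinPayloadCap(body: string, maxBytes: number = MAX_PAYLOAD_BYTES): boolean {
  return new TextEncoder().encode(body).length <= maxBytes;
}

// Idempotency: the first caller for a request ID wins; repeats are rejected.
async function claimIdempotencyKey(
  kv: KvLike,
  businessId: string,
  requestId: string,
): Promise<boolean> {
  const key = `idem:${businessId}:${requestId}`;
  if ((await kv.get(key)) !== null) return false;
  await kv.put(key, "1", { expirationTtl: TTL_SECONDS });
  return true;
}

// Request cap: a per-tenant counter for one time window (for example, an hour bucket).
async function underRequestCap(
  kv: KvLike,
  businessId: string,
  windowId: string,
  cap: number,
): Promise<boolean> {
  const key = `cap:${businessId}:${windowId}`;
  const count = Number((await kv.get(key)) ?? "0");
  if (count >= cap) return false;
  await kv.put(key, String(count + 1), { expirationTtl: TTL_SECONDS });
  return true;
}
```

Workers KV is eventually consistent and the read-then-write here is not atomic, so this pattern limits duplicates and bursts rather than strictly preventing them. That is a reasonable fit for demo-scale guardrails; strict counters are the kind of job the roadmap Durable Objects would suit better.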

Voice Pipeline: Current And Roadmap

The current voice path uses Twilio webhooks and speech transcripts. Full Media Streams plus Whisper is the production evolution.

Twilio (Live)

Customer calls demo number. Twilio posts Gather speech results to the Worker.

Worker (Live)

Hono route validates shape, applies guardrails, and preserves XML response semantics.

Kimi K2.6 (Live)

Transcript → agent reasoning → tool calls → response text.

D1 (Live)

Inquiry, transcript, channel metadata, and tool events are persisted.

ElevenLabs (Live)

When configured, short replies are rendered as audio for Twilio.

Whisper (Roadmap)

Media Streams plus Workers AI speech-to-text for raw audio is roadmap.

Current proof: call transcript in, TwiML response out, audit trail saved. Media Streams is planned after the current voice proof.
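
The "TwiML response out" half of that proof can be sketched as a small string builder. `<Response>`, `<Say>`, `<Play>`, and `<Gather input="speech">` are standard TwiML verbs. The function names and the `/api/voice/gather` action path in the usage line are hypothetical; only `/api/tts/speak` is named in this document.

```typescript
// Escapes the five XML special characters so reply text cannot break the TwiML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

interface VoiceReplyOptions {
  gatherAction: string; // webhook path Twilio should post the next speech result to
  audioUrl?: string; // rendered audio, when TTS is configured
}

// Plays rendered audio when a URL is available, otherwise falls back to <Say>,
// then opens a new speech <Gather> so the caller can keep talking.
function buildVoiceTwiml(replyText: string, options: VoiceReplyOptions): string {
  const spoken = options.audioUrl
    ? `<Play>${escapeXml(options.audioUrl)}</Play>`
    : `<Say>${escapeXml(replyText)}</Say>`;
  const gather = `<Gather input="speech" action="${escapeXml(options.gatherAction)}" method="POST"/>`;
  return `<?xml version="1.0" encoding="UTF-8"?><Response>${spoken}${gather}</Response>`;
}
```

A call such as `buildVoiceTwiml("We can attend tomorrow at 9am.", { gatherAction: "/api/voice/gather" })` would be returned with a `text/xml` content type. Escaping matters here because reply text is model-generated and may contain characters like `&` or `<`.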

Full Stack Reference

Everything that powers Tradie Front Office AI, in one table.

Layer | Technology | Purpose
Framework | Hono 4 | Lightweight, fast web framework for Workers
Build | Vite + @hono/vite-build | SSR bundle for Cloudflare Pages
Runtime | Cloudflare Workers | V8 isolates at 300+ global PoPs
LLM | Kimi K2.6 (MoE) | Reasoning + native function-calling
STT | Twilio SpeechResult live; Whisper roadmap | Transcript input now, raw audio STT later
TTS | ElevenLabs / MeloTTS | Natural voice synthesis
Database | Cloudflare D1 (SQLite) | Relational data, multi-tenant
Agent Memory | D1 chunks + Workers AI Markdown Conversion | PDF/text policy retrieval
KV Store | Cloudflare Workers KV | Request caps, payload caps, idempotency, and edge config
Object Storage | Cloudflare R2 (roadmap) | Contracts, recordings, attachments
Vector DB | Cloudflare Vectorize (roadmap) | Semantic RAG upgrade for larger corpora
Sessions | Durable Objects (roadmap) | Stateful multi-turn agent sessions
Gateway | Cloudflare AI Gateway (roadmap) | LLM caching, rate limits, fallback
Queues | Cloudflare Queues (roadmap) | Async PMS sync, notifications
Auth | Google OAuth 2.0 + JWT | SSO for dashboard with session cookies
Voice | Twilio webhooks live; Media Streams roadmap | Telephony ingress/egress
Email | Cloudflare Email Service + Resend fallback | Native inbound/outbound email path
SMS | Twilio Messaging | Text message channel
Payments | Stripe Checkout (roadmap) | Customer payment collection
Contracts | DocuSign (roadmap) | E-signature for work orders and contracts
PMS | ServiceM8 API (roadmap) | Field-service job sync
Alerts | Slack API (roadmap) | Staff notifications & escalations
Frontend | Tailwind CSS + Space Grotesk | Utility-first styling, Dispatch Intelligence theme
TypeScript | ES2022 target | Type-safe Workers code

Ready to inspect the live system?

Open the dashboard, run the live-channel demo, and inspect the API readiness flags.