Tradie Front Office AI | Autonomous Front-Office Agent for Trades

Platform Architecture

Every inbound request enters through Hono on Cloudflare Pages, then flows through deterministic tools and D1-backed audit records. Roadmap services are marked directly in the diagram.

Ingress Channels

Twilio Voice: Voice calls
CF Email / Resend: Inbound email
Twilio SMS: Text messages
Web Chat: Browser widget

Ingress Channels → Worker Layer: Webhooks

Worker Layer

Channel Router: Hono middleware
KV Guardrails: Caps / Idempotency
Agent Memory: Retrieve D1 chunks
Agent Router: Hono + Workers AI

Worker Layer → AI + Tool Chain: Function Calling

AI + Tool Chain

Kimi K2.6: LLM Inference
Memory Context: Policies / FAQs
Whisper (roadmap): Raw audio STT
ElevenLabs: TTS
check_availability: Tool
quote_price: Tool
book_job: Tool

AI + Tool Chain → Storage Layer: Read / Write

Storage Layer

D1: Relational (SQLite)
D1 Memory: Docs & chunks
KV: Config & rate cards
R2 (roadmap): Contracts & recordings
Vectorize (roadmap): Semantic RAG
Queues (roadmap): Async tasks

Live demo-critical path

Bound services that support the interview workflow today.

Live

Workers AI (AI binding)

Kimi K2.6 agent inference and native tool-calling in the channel path.

D1 Database (DB binding)

Businesses, service windows, rate cards, inquiries, jobs, events, and Agent Memory.

Workers KV (KV binding)

Request caps, payload guardrails, idempotency, and edge-ready config.

Cloudflare Email (Email Worker)

Companion Worker for inbound email and native outbound reply attempts.

Twilio Voice + SMS (Webhook)

Webhook paths normalize calls and texts into the shared agent loop.
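As an illustration of the KV idempotency guardrail listed above, here is a minimal sketch. The key layout, the `claimOnce` helper, and the one-hour TTL are assumptions for illustration, not the project's actual code; only `get` and `put` with `expirationTtl` mirror the real Workers KV binding. Because KV is eventually consistent, a check like this is a best-effort replay guard rather than a strict lock.

```typescript
// Minimal subset of the Workers KV binding used here (get, and put with a TTL).
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

// Hypothetical key layout: tenant prefix first, then channel and provider id
// (for Twilio, MessageSid or CallSid would be natural provider ids).
function idempotencyKey(businessId: string, channel: string, providerId: string): string {
  return `idem:${businessId}:${channel}:${providerId}`;
}

// Returns true the first time a webhook id is seen, false on a replay.
// KV is eventually consistent, so this is best-effort, not a lock.
async function claimOnce(kv: KvLike, key: string, ttlSeconds = 3600): Promise<boolean> {
  const seen = await kv.get(key);
  if (seen !== null) return false;
  await kv.put(key, "1", { expirationTtl: ttlSeconds });
  return true;
}

// In-memory stand-in for local experiments (ignores the TTL).
function memoryKv(): KvLike {
  const store = new Map<string, string>();
  return {
    get: async (key) => store.get(key) ?? null,
    put: async (key, value) => {
      store.set(key, value);
    },
  };
}
```

In a Worker route, a replayed webhook (where `claimOnce` returns false) would typically be acknowledged without re-running the agent loop.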

Roadmap services

Important, but not claimed as live until bindings and code prove it.

Roadmap

Durable Objects (Planned)

Stateful sessions per business and conversation for richer multi-turn calls.

R2 Storage (Planned)

Contracts, call recordings, transcripts, and attachments.

Vectorize (Planned)

Semantic retrieval upgrade for larger policy and historical corpora.

AI Gateway / Queues (Planned)

AI observability, caching, provider fallback, async sync, and retries.

AI Stack

The agent combines Kimi reasoning, staff-approved D1 memory, speech input, and voice output without letting retrieved text override deterministic tools.

Kimi K2.6

@cf/moonshotai/kimi-k2.6

Primary reasoning model. MoE architecture activates only relevant expert sub-networks per token, keeping inference cost low at high volume.

Architecture: MoE, 1T total, ~32B active

Context: 128K tokens

Function calling: Native (not prompt-injected)

Temperature: 0.4 (bounded creativity for customer service)

Tool chain: 5 tools in job sequence

Fallback: Graceful apology response if Workers AI fails
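The settings above (model id, temperature 0.4, native tools, apology fallback) could be wired together roughly as follows. This is a sketch under assumptions: the `AiLike` interface is a minimal stand-in for the Workers AI binding's `run(model, inputs)` call, the response shape (`response`, `tool_calls`) is assumed rather than confirmed for this model, and the apology wording is hypothetical.

```typescript
// Minimal stand-in for the Workers AI binding: run(model, inputs).
interface AiLike {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface AgentTurn {
  text: string;
  toolCalls: { name: string; arguments: unknown }[];
  degraded: boolean; // true when the fallback apology was used
}

const MODEL = "@cf/moonshotai/kimi-k2.6";
// Hypothetical wording; the page only says a graceful apology is returned.
const FALLBACK_TEXT =
  "Sorry, I'm having trouble right now. A team member will follow up shortly.";

async function runAgentTurn(
  ai: AiLike,
  messages: { role: string; content: string }[],
  tools: unknown[],
): Promise<AgentTurn> {
  try {
    // Assumed response shape: optional text plus optional tool calls.
    const raw = (await ai.run(MODEL, { messages, tools, temperature: 0.4 })) as {
      response?: string;
      tool_calls?: { name: string; arguments: unknown }[];
    };
    return { text: raw.response ?? "", toolCalls: raw.tool_calls ?? [], degraded: false };
  } catch {
    // Never let an inference failure surface as a raw error to the customer.
    return { text: FALLBACK_TEXT, toolCalls: [], degraded: true };
  }
}
```

Returning a `degraded` flag rather than throwing lets each channel (voice, SMS, email, web) render the apology in its own format and still write an audit event.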

Agent Memory

D1 + toMarkdown

Live retrieval layer for business policies, procedures, warranty rules, and FAQs. Staff upload PDFs or paste text, then enabled chunks are injected into every channel.

Upload types: PDF, TXT, Markdown, CSV

Conversion: Workers AI Markdown Conversion

Retrieval: Keyword-ranked D1 chunks

Safety: Cannot override pricing or availability

Whisper Large v3 Turbo

@cf/openai/whisper-large-v3-turbo

Roadmap speech-to-text layer. The current voice path accepts Twilio Gather speech transcripts and routes them into the same Kimi K2.6 loop.

Latency: < 300ms per chunk

Languages: 99+ (English primary)

Current input: Twilio SpeechResult / demo transcript

Features: Timestamps, confidence scores

Runs on: Cloudflare Workers AI GPU

ElevenLabs TTS

Primary + Workers AI fallback

Text-to-speech for voice responses. ElevenLabs provides production-quality voices; Workers AI MeloTTS is the low-latency, in-platform fallback.

Primary: ElevenLabs (configurable voice)

Fallback: Workers AI MeloTTS

Current output: MP3 via /api/tts/speak for Twilio

Voice IDs: Per-business configurable

Live knowledge retrieval

Agent Memory: upload docs, answer from policy

Agent Memory is the live RAG path for business operations. Original files are not retained; the app stores extracted Markdown, document metadata, and retrieval chunks in D1, then injects up to four relevant enabled snippets into each Kimi K2.6 request.

1. Upload or paste

Staff add PDF, TXT, Markdown, CSV, or direct policy text from /agent-memory.

2. Convert to Markdown

Workers AI toMarkdown extracts readable text from PDFs and documents.

3. Chunk in D1

Markdown is normalized, capped, chunked, keyword-indexed, and scoped by business_id.

4. Retrieve on every channel

Web, email, SMS, and voice retrieve policy snippets before the agent replies.

Memory boundary

Memory can answer warranty rules, safety procedures, escalation policy, and operational FAQs. Pricing still flows through quote_price, and service window availability still flows through check_availability.
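The keyword-ranked retrieval described in this section can be sketched as a pure function. This is an illustrative approximation, not the project's ranking code: the `enabled` flag, the scoring rule (count of distinct query keywords found in `search_text`), and the three-character keyword minimum are assumptions. The tenant filter and the limit of four snippets follow the text above.

```typescript
interface MemoryChunk {
  business_id: string;
  enabled: boolean; // assumed flag; the page only says "enabled chunks"
  heading: string;
  content: string;
  search_text: string; // lowercased text used for keyword matching
}

// Split a query into lowercase keywords, dropping very short tokens.
function keywords(query: string): string[] {
  return query
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 2);
}

// Score = number of distinct query keywords present in the chunk.
// Only enabled chunks of the requesting tenant are considered.
function retrieve(
  chunks: MemoryChunk[],
  businessId: string,
  query: string,
  limit = 4,
): MemoryChunk[] {
  const words = Array.from(new Set(keywords(query)));
  return chunks
    .filter((c) => c.business_id === businessId && c.enabled)
    .map((c) => ({
      chunk: c,
      score: words.filter((w) => c.search_text.includes(w)).length,
    }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((s) => s.chunk);
}
```

The returned snippets would be added to the prompt as context only; prices and availability still come from quote_price and check_availability.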

Memory-Aware 5-Tool Job Chain

Before Kimi K2.6 answers, the Worker retrieves relevant enabled memory chunks. The model still calls these tools via native function-calling for operational state and customer commitments.

1. check_availability

Queries D1 for matching service windows with date-overlap exclusion and job dimension filtering.

2. quote_price

Deterministic pricing engine: callout + service + urgency + after-hours/date + travel zone + add-ons. Never LLM-generated.

3. draft_contract

Demo work-order placeholder today. R2 PDF storage and e-signature envelopes are roadmap.

4. take_payment

Payment-pending response today. Stripe Checkout is roadmap.

5. book_job

Writes confirmed job to D1 and returns a demo ServiceM8 reference. External PMS sync is roadmap.

Escape hatch: escalate_to_human

A sixth tool the agent can call at any point to route the conversation to a human. It is triggered automatically when confidence drops below the threshold, the dollar cap is exceeded, or the maximum turn count is reached.
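The three automatic escalation triggers can be expressed as a small deterministic check that runs outside the model. The sketch below is an assumption about how such a check might look: the field names follow the agent_configs table, the cap is treated as whole dollars, and the `ConversationState` shape and reason strings are hypothetical.

```typescript
interface Guardrails {
  dollar_cap_per_job: number; // assumed to be whole dollars
  confidence_threshold: number;
  max_turns_before_escalation: number;
}

// Hypothetical per-conversation state tracked by the Worker.
interface ConversationState {
  confidence: number;
  quotedTotalDollars: number;
  turns: number;
}

type EscalationReason = "low_confidence" | "dollar_cap" | "max_turns";

// Returns the first trigger that fires, or null when the agent may continue.
function escalationReason(g: Guardrails, s: ConversationState): EscalationReason | null {
  if (s.confidence < g.confidence_threshold) return "low_confidence";
  if (s.quotedTotalDollars > g.dollar_cap_per_job) return "dollar_cap";
  if (s.turns >= g.max_turns_before_escalation) return "max_turns";
  return null;
}
```

Keeping this check in Worker code rather than in the prompt means the handoff does not depend on the model deciding to call escalate_to_human itself.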

Data Layer: D1 Schema

9 tables, all scoped by business_id for strict multi-tenant isolation.

businesses

Tenant root table. One row per business.

id TEXT PK

name TEXT

field_service_tenant_id TEXT

timezone TEXT

business_hours_json TEXT

phone / email TEXT

availability_windows

Technician service-window inventory with category, zone, and notes.

id TEXT PK

business_id TEXT FK

window_label TEXT

technician_name TEXT

service_category TEXT

start_ts / end_ts DATETIME

travel_zone / status / capacity_score
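To illustrate the date-overlap exclusion that check_availability applies to these windows, here is a minimal sketch. It assumes `start_ts` and `end_ts` are ISO 8601 strings and treats intervals as half-open, so back-to-back bookings do not collide; the helper names are hypothetical, and the real tool runs this logic as a D1 query rather than in memory.

```typescript
// Assumed representation: ISO 8601 timestamps, half-open [start, end).
interface Interval {
  start_ts: string;
  end_ts: string;
}

// Two intervals overlap when each starts before the other ends.
function overlaps(a: Interval, b: Interval): boolean {
  return (
    Date.parse(a.start_ts) < Date.parse(b.end_ts) &&
    Date.parse(b.start_ts) < Date.parse(a.end_ts)
  );
}

// Keep only windows that do not collide with any already-booked job.
function freeWindows<W extends Interval>(windows: W[], booked: Interval[]): W[] {
  return windows.filter((w) => !booked.some((job) => overlaps(w, job)));
}
```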

rate_cards

Pricing configuration with deterministic JSON curves.

business_id TEXT PK

base_rate_json

urgency_curve_json

after_hours_curve_json

travel_zone_curve_json

add_on_json

minimum_charge_cents INT

agent_configs

Per-business AI agent personality and guardrails.

business_id TEXT FK

system_prompt TEXT

greeting_message TEXT

voice_id TEXT

dollar_cap_per_job INT

confidence_threshold REAL

max_turns_before_escalation INT

escalation_rules_json

agent_memory_documents

Uploaded or pasted staff knowledge converted to Markdown.

id TEXT PK

business_id TEXT FK

title TEXT

source_type ENUM

filename / mime_type / byte_size

extracted_markdown TEXT

status ENUM

chunk_count INT

agent_memory_chunks

Retrieval chunks automatically injected into Kimi context.

id TEXT PK

document_id TEXT FK

business_id TEXT FK

chunk_index INT

heading TEXT

content TEXT

search_text TEXT

token_estimate INT
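A rough sketch of how extracted Markdown might be turned into rows shaped like this table. The splitting rule (one section per Markdown heading, oversize sections cut at a character cap), the 1200-character default, and the four-characters-per-token estimate are assumptions for illustration, not the project's actual chunker.

```typescript
interface Chunk {
  chunk_index: number;
  heading: string;
  content: string;
  search_text: string;
  token_estimate: number;
}

// Split Markdown into heading-scoped chunks, capping each chunk's length.
function chunkMarkdown(markdown: string, maxChars = 1200): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let buffer: string[] = [];

  // Emit the buffered section as one or more chunks, then reset the buffer.
  const flush = (): void => {
    const content = buffer.join("\n").trim();
    buffer = [];
    if (content === "") return;
    for (let i = 0; i < content.length; i += maxChars) {
      const piece = content.slice(i, i + maxChars);
      chunks.push({
        chunk_index: chunks.length,
        heading,
        content: piece,
        search_text: `${heading} ${piece}`.toLowerCase(),
        token_estimate: Math.ceil(piece.length / 4), // rule-of-thumb estimate
      });
    }
  };

  for (const line of markdown.replace(/\r\n/g, "\n").split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous section before starting a new one
      heading = (match[1] ?? "").trim();
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```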

inquiries

Every inbound interaction across all channels.

id TEXT PK

business_id TEXT FK

channel ENUM

caller_name / phone / email

transcript_text TEXT

intent TEXT

confidence REAL

status ENUM

jobs

Confirmed jobs with field-service sync status.

id TEXT PK

business_id TEXT FK

availability_window_id TEXT FK

inquiry_id TEXT FK

customer_name / phone / site_address

service_code / urgency TEXT

start_ts / end_ts DATETIME

price_cents INT

pms_synced BOOLEAN

events

Full audit trail for every action the agent takes.

id TEXT PK

business_id TEXT FK

inquiry_id TEXT FK

type TEXT

actor TEXT

payload_json TEXT

tool_name TEXT

ts DATETIME

Multi-Tenancy Rule

Every live D1 query and job/inquiry/event path is scoped by business_id. Roadmap storage surfaces like R2, Vectorize, Queues, and Durable Objects should keep the same tenant prefix rule when added.
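One way to make the tenant rule hard to bypass is to build every query through a helper that always puts `business_id` first. The sketch below is hypothetical: `scopedSelect`, the table allow-list, and the column-name check are illustrative choices, not the project's code. The resulting SQL and parameters are meant for D1's `prepare(...).bind(...)` style of parameterized statements.

```typescript
interface ScopedQuery {
  sql: string;
  params: (string | number)[];
}

// Illustrative allow-list of tenant-scoped tables from the schema above.
const TENANT_TABLES = [
  "availability_windows",
  "inquiries",
  "jobs",
  "events",
  "agent_memory_chunks",
] as const;
type TenantTable = (typeof TENANT_TABLES)[number];

// Build a SELECT that is always filtered by business_id first.
function scopedSelect(
  table: TenantTable,
  businessId: string,
  filters: Record<string, string | number> = {},
): ScopedQuery {
  if (businessId.trim() === "") throw new Error("business_id is required");
  const clauses = ["business_id = ?"];
  const params: (string | number)[] = [businessId];
  for (const [column, value] of Object.entries(filters)) {
    // Column names are interpolated, so restrict them to a safe pattern.
    if (!/^[a-z_]+$/.test(column)) throw new Error(`invalid column: ${column}`);
    clauses.push(`${column} = ?`);
    params.push(value);
  }
  return { sql: `SELECT * FROM ${table} WHERE ${clauses.join(" AND ")}`, params };
}
```

With a D1 binding, the result would be used along the lines of `db.prepare(q.sql).bind(...q.params).all()`.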

Deterministic Pricing Engine

The LLM never generates prices. Every dollar amount comes from this formula, executed deterministically on the Worker.

// Final price calculation

subtotal = callout_fee + base_service + diagnostic_allowance
total = subtotal
        × urgency_multiplier
        × after_hours_date_multiplier
        × travel_zone_multiplier
        + add_ons
then apply minimum charge floor

Urgency

Standard: 1.00×

Same day: 1.25×

After hours: 1.75× / Emergency: 2.20×

Date / Time

Business hours: 1.00×

Saturday: 1.25× / Sunday: 1.60×

Weekday after 5pm: 1.35×

Travel Zone

Metro: 1.00×

Outer metro: 1.15×

Regional: 1.35×

Service Base

Blocked drain: $265

Hot water fault: $315

Switchboard fault: $285

Minimum Floor

Default: $149

Quote visit can waive diagnostic

Never below configured floor

Add-Ons

Camera inspection: $185

Temporary make-safe: $120

Parts run: $65

Why deterministic?

LLMs are great at conversation but unreliable at arithmetic. A hallucinated price creates legal liability and erodes customer trust. By running pricing as a pure function on the Worker, the agent can confidently quote exact rates that match your published rate card.
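The formula in this section translates into a pure function along these lines. It is a sketch under assumptions: amounts are held in integer cents, the multiplied portion is rounded to the nearest cent before add-ons are added, and the parameter names are hypothetical. Add-ons are not multiplied, and the minimum charge is applied last, as the formula states.

```typescript
interface QuoteInput {
  calloutFeeCents: number;
  baseServiceCents: number;
  diagnosticAllowanceCents: number;
  urgencyMultiplier: number;
  afterHoursDateMultiplier: number;
  travelZoneMultiplier: number;
  addOnCents: number[];
  minimumChargeCents: number;
}

// Pure function: same inputs always produce the same quote.
function quotePriceCents(q: QuoteInput): number {
  const subtotal = q.calloutFeeCents + q.baseServiceCents + q.diagnosticAllowanceCents;
  const multiplied =
    subtotal * q.urgencyMultiplier * q.afterHoursDateMultiplier * q.travelZoneMultiplier;
  const addOns = q.addOnCents.reduce((sum, cents) => sum + cents, 0);
  // Assumption: round the multiplied portion to whole cents, then add add-ons.
  const total = Math.round(multiplied) + addOns;
  // The configured floor is applied last.
  return Math.max(total, q.minimumChargeCents);
}
```

For example, a blocked drain ($265) with a hypothetical $95 callout fee, standard urgency, the Saturday multiplier (1.25×), the regional zone (1.35×), and a camera inspection add-on ($185) works out to (9500 + 26500) × 1.25 × 1.35 + 18500 = 79250 cents, or $792.50.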

Guardrails & Security

Production AI needs more than vibes. These are hard constraints, not suggestions.

Agent Guardrails

Dollar cap: Max job value before auto-escalation. Default: $15,000.

Confidence threshold: Below this score, the agent escalates. Default: 0.75.

Max turns: Turn limit before forcing human handoff. Default: 20.

Deterministic pricing: Prices always from rate card function, never LLM-generated.

Availability check: D1 query checks window overlaps before any job write.

Agent Memory boundary: Retrieved policy text can inform answers, but cannot set prices, discounts, availability, or payment status.

Out-of-policy detection: Warranty disputes, customer-supplied parts, and high-value commercial scopes are routed to staff.

Infrastructure Security

Google OAuth SSO: Google OAuth 2.0 with JWT session cookies on all dashboard routes.

API tokens as secrets: Twilio, ElevenLabs, email, and future vendor keys stored as Cloudflare Secrets.

Tenant isolation: All data paths include business_id. No shared-namespace leaks.

KV request guardrails: Demo-critical routes have payload caps, request caps, and idempotency where KV is bound.

Audit trail: Every agent action logged with actor, timestamp, and detail JSON.

Roadmap AI Gateway: AI request logs, caching, rate controls, and provider fallback come after live-channel reliability.

Voice Pipeline: Current And Roadmap

The current voice path uses Twilio webhooks and speech transcripts. Full Media Streams plus Whisper is the production evolution.

Twilio (Live)

A customer calls the demo number. Twilio posts Gather speech results to the Worker.

Worker (Live)

Hono route validates shape, applies guardrails, and preserves XML response semantics.

Kimi K2.6 (Live)

Transcript → agent reasoning → tool calls → response text.

D1 (Live)

Inquiry, transcript, channel metadata, and tool events are persisted.

ElevenLabs (Live)

When configured, short replies are rendered as audio for Twilio.

Whisper (Roadmap)

Media Streams plus Workers AI speech-to-text for raw audio is roadmap.

Current proof: call transcript in, TwiML response out, audit trail saved. Media Streams is planned after the current voice proof.
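The "transcript in, TwiML response out" step can be sketched as a small builder. This is an illustration, not the project's route code: `voiceTwiml` and its parameters are hypothetical, and any action URL passed in is an assumption. The `<Response>`, `<Gather input="speech">`, `<Say>`, and `<Play>` elements are standard TwiML verbs; reply text is XML-escaped so customer or model text cannot break the document.

```typescript
// Escape the five XML special characters.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Speak (or play) the agent reply, then gather the caller's next utterance.
// When an audio URL is available (for example from a TTS route), <Play> is
// used; otherwise Twilio's built-in <Say> voice reads the reply text.
function voiceTwiml(reply: string, actionUrl: string, audioUrl?: string): string {
  const spoken = audioUrl
    ? `<Play>${escapeXml(audioUrl)}</Play>`
    : `<Say>${escapeXml(reply)}</Say>`;
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<Response>` +
    `<Gather input="speech" action="${escapeXml(actionUrl)}" method="POST">${spoken}</Gather>` +
    `</Response>`
  );
}
```

The Worker would return this string with a `text/xml` content type so Twilio treats it as call instructions.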

Full Stack Reference

Everything that powers Tradie Front Office AI, in one table.

Layer	Technology	Purpose
Framework	Hono 4	Lightweight, fast web framework for Workers
Build	Vite + @hono/vite-build	SSR bundle for Cloudflare Pages
Runtime	Cloudflare Workers	V8 isolates at 300+ global PoPs
LLM	Kimi K2.6 (MoE)	Reasoning + native function-calling
STT	Twilio SpeechResult live; Whisper roadmap	Transcript input now, raw audio STT later
TTS	ElevenLabs / MeloTTS	Natural voice synthesis
Database	Cloudflare D1 (SQLite)	Relational data, multi-tenant
Agent Memory	D1 chunks + Workers AI Markdown Conversion	PDF/text policy retrieval
KV Store	Cloudflare Workers KV	Request caps, payload caps, idempotency, and edge config
Object Storage	Cloudflare R2 (roadmap)	Contracts, recordings, attachments
Vector DB	Cloudflare Vectorize (roadmap)	Semantic RAG upgrade for larger corpora
Sessions	Durable Objects (roadmap)	Stateful multi-turn agent sessions
Gateway	Cloudflare AI Gateway (roadmap)	LLM caching, rate limits, fallback
Queues	Cloudflare Queues (roadmap)	Async PMS sync, notifications
Auth	Google OAuth 2.0 + JWT	SSO for dashboard with session cookies
Voice	Twilio webhooks live; Media Streams roadmap	Telephony ingress/egress
Email	Cloudflare Email Service + Resend fallback	Native inbound/outbound email path
SMS	Twilio Messaging	Text message channel
Payments	Stripe Checkout (roadmap)	Customer payment collection
Contracts	DocuSign (roadmap)	E-signature for service contracts
PMS	ServiceM8 API (roadmap)	Field-service job sync
Alerts	Slack API (roadmap)	Staff notifications & escalations
Frontend	Tailwind CSS + Space Grotesk	Utility-first styling, Dispatch Intelligence theme
Language	TypeScript (ES2022 target)	Type-safe Workers code

Ready to inspect the live system?

Open the dashboard, run the live-channel demo, and inspect the API readiness flags.

Cloudflare-native architecture for a real front desk agent. Zero origin servers.
