Cannula Robotics
Platform

Request flow

What happens, in order, when you press send.

Chat (with retrieval)

  1. You hit ai.cannularobotics.com. Cloudflare Access checks your Google identity; accounts outside @cannularobotics.com are rejected at the edge. (Access setup is in progress; until it lands, the site is served from the cr-ai-intranet.pages.dev preview URL.)
  2. The Astro site loads. Pages and content are pre-rendered HTML.
  3. You open the chat panel and send a message.
  4. The browser POSTs to /api/chat. The Pages Function reads the cf-access-authenticated-user-email header, attaches a bearer token, and forwards the request to cr-ai-proxy at cr-proxy.cannularobotics.com.
  5. cr-ai-proxy:
    • Calls cr-retrieval/search via a service binding (an internal Worker-to-Worker call that is never exposed publicly) to fetch the top-k most relevant chunks across the indexed sources.
    • Builds a prompt with chunks as context and your message as the user turn.
    • Streams the request to Anthropic via cr-ai-gateway. If the gateway returns 401 (a misconfigured gateway slug or auth), the proxy automatically falls back to calling api.anthropic.com directly, so a gateway misconfiguration doesn't take chat down.
  6. Anthropic returns Claude’s response (SSE). The proxy streams it back through the Pages Function to the browser.
  7. The chat panel renders the response along with citations for the chunks retrieved in step 5.
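A minimal sketch of what step 5 could look like inside cr-ai-proxy. It is illustrative, not the proxy's actual code: the Chunk shape, the system-prompt wording, the max_tokens value, and the injected send function are all assumptions. The request body follows the Anthropic Messages API shape (model, max_tokens, system, messages, stream). The sketch only reports status codes; a real proxy would pass the response body (the SSE stream) back to the Pages Function.

```typescript
// Sketch of prompt assembly plus the gateway -> direct fallback.
// ASSUMPTIONS: Chunk, the prompt wording, max_tokens, and the Send
// function are hypothetical; URLs and model id are supplied by the caller.

interface Chunk {
  source: string; // URL of the document the chunk came from
  text: string;   // chunk text returned by cr-retrieval
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: { role: "user"; content: string }[];
  stream: boolean;
}

// Retrieved chunks go into the system prompt as numbered context so the
// model can cite them as [n]; the user's message is the only user turn.
function buildPrompt(chunks: Chunk[], userMessage: string, model: string): MessagesRequest {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source})\n${c.text}`)
    .join("\n\n");
  return {
    model,
    max_tokens: 1024,
    system: `Answer using the context below. Cite sources as [n].\n\n${context}`,
    messages: [{ role: "user", content: userMessage }],
    stream: true,
  };
}

// Anything that POSTs the body to a URL and reports the HTTP status,
// e.g. a thin wrapper around fetch(). Injected so the logic is testable.
type Send = (url: string, body: MessagesRequest) => Promise<{ status: number }>;

// Try the gateway first; only a 401 triggers a retry against the
// direct Anthropic endpoint. Other statuses are returned as-is.
async function sendWithFallback(
  send: Send,
  gatewayUrl: string,
  directUrl: string,
  body: MessagesRequest,
): Promise<{ status: number; via: "gateway" | "direct" }> {
  const viaGateway = await send(gatewayUrl, body);
  if (viaGateway.status !== 401) {
    return { status: viaGateway.status, via: "gateway" };
  }
  const direct = await send(directUrl, body);
  return { status: direct.status, via: "direct" };
}
```

Retrying on 401 should be safe because the gateway rejects the request before any tokens are streamed, so nothing has reached the browser yet.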

Search (Documents / Meeting notes)

  1. You type a query into /docs or /notes.
  2. The browser POSTs to /api/search. The Pages Function forwards the request to cr-retrieval with a bearer token.
  3. cr-retrieval embeds the query with Workers AI (BGE-large), runs a topK query against each requested Vectorize index, and returns the ranked chunks with their source URLs and snippets.
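When more than one index is requested, the per-index results have to be merged into one ranking. A sketch of that merge step, assuming every index was built with the same embedding model and similarity metric so scores are comparable across indices. The Match shape loosely mirrors a Vectorize query match (id, score, metadata); the url and snippet metadata field names are hypothetical.

```typescript
// Merge top-k matches from several Vectorize indices into one ranked list.
// ASSUMPTION: same embedding model + metric in every index, so raw scores
// can be compared directly. Metadata field names are hypothetical.

interface Match {
  id: string;
  score: number;
  metadata: { url: string; snippet: string };
}

interface RankedChunk {
  index: string; // which Vectorize index the hit came from
  id: string;
  score: number;
  url: string;
  snippet: string;
}

function mergeResults(perIndex: Record<string, Match[]>, topK: number): RankedChunk[] {
  const all: RankedChunk[] = [];
  for (const [index, matches] of Object.entries(perIndex)) {
    for (const m of matches) {
      all.push({ index, id: m.id, score: m.score, url: m.metadata.url, snippet: m.metadata.snippet });
    }
  }
  // Highest similarity first; keep only the overall top-k.
  all.sort((a, b) => b.score - a.score);
  return all.slice(0, topK);
}
```

Tagging each hit with its index lets the UI show whether a result came from documents or meeting notes.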

Ingest

  1. Ingest runs on demand via POST /api/admin/ingest?source={notion|granola|drive} with the X-Admin-Token header. Cron triggers are paused until a workers.dev subdomain is registered (a Cloudflare requirement).
  2. cr-retrieval walks the source (Notion search, Drive files.list, etc.), chunks each document, embeds the chunks via Workers AI, and upserts them into the appropriate Vectorize index.
  3. Each invocation processes at most 10 documents, to stay under the Workers free-plan subrequest limit. Repeated calls drain the backlog, resuming from a cursor stored in KV.
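The cap-and-resume loop from steps 2 and 3, sketched under simplifying assumptions: the source listing is modeled as an ordered array of document IDs and the cursor as a numeric offset, whereas the real Worker pages through the Notion/Drive APIs. CursorStore is a hypothetical interface covering roughly the get/put subset of a KV namespace, and processDoc stands in for chunk + embed + upsert.

```typescript
// Capped, resumable ingest. ASSUMPTIONS: docIds is a stable ordered listing,
// the cursor is an offset, and CursorStore / key names are hypothetical.

const MAX_DOCS_PER_INVOCATION = 10;

interface CursorStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

async function ingestOnce(
  source: string,
  docIds: string[],
  store: CursorStore,
  processDoc: (id: string) => Promise<void>, // chunk + embed + upsert
): Promise<{ processed: number; done: boolean }> {
  const key = `ingest-cursor:${source}`;
  // Resume where the previous invocation stopped (0 on the first run).
  const start = Number((await store.get(key)) ?? "0");
  const batch = docIds.slice(start, start + MAX_DOCS_PER_INVOCATION);
  for (const id of batch) {
    await processDoc(id);
  }
  // Persist the new cursor only after the whole batch succeeded.
  const next = start + batch.length;
  await store.put(key, String(next));
  return { processed: batch.length, done: next >= docIds.length };
}
```

Calling ingestOnce until done is true drains the backlog 10 documents at a time, which is what repeated POSTs to /api/admin/ingest amount to. Because the cursor is written after the batch, a failed invocation reprocesses its batch on the next call; upserts make that repeat harmless.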

Tracing

The request ID at the top of the chat response lets us cross-reference: