How a single answer is built
- Query rewrite — multi-turn condense (so "what about side effects" resolves against the previous question) plus peptide-synonym expansion (e.g. "BPC-157" → also searches for "Body Protection Compound 157").
- Retrieve + ground — Gemini File Search pulls the top-K (10) most relevant chunks from your vault, then generates the answer using only those chunks. Grounding is mandatory; we can't answer outside your sources.
- Rerank citations — a small Gemini call scores the citations the model used for relevance, drops the bottom half, and applies a diversity filter (max 2 cites per source).
- The final top 5 citations are attached to the answer for the UI to render.
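The citation steps above (drop the bottom half, cap each source at 2, keep the top 5) can be sketched as follows. This is a minimal illustration, not the production code: the `Citation` fields and the `select_citations` name are hypothetical, the scores are assumed to come from the rerank call, and rounding up when the count is odd is an assumption.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Citation:
    source_id: str   # hypothetical identifier of the source document
    chunk_id: str    # hypothetical identifier of the retrieved chunk
    score: float     # relevance score from the rerank call (higher is better)


def select_citations(citations, max_per_source=2, final_k=5):
    """Keep the better-scoring half, cap citations per source, return the top final_k."""
    # Sort by relevance, best first. sorted() is stable, so ties keep input order.
    ranked = sorted(citations, key=lambda c: c.score, reverse=True)

    # Drop the bottom half. Rounding up for odd counts is an assumption.
    kept = ranked[: ceil(len(ranked) / 2)]

    # Diversity filter: at most max_per_source citations from any one source.
    per_source = {}
    diverse = []
    for c in kept:
        seen = per_source.get(c.source_id, 0)
        if seen < max_per_source:
            diverse.append(c)
            per_source[c.source_id] = seen + 1

    return diverse[:final_k]
```

Under these assumptions an answer can end up with fewer than 5 citations: with 10 retrieved chunks, halving leaves 5, and the per-source cap may remove more.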
The off-topic refusal
If you ask something the corpus can't answer (e.g. "what's the weather"), you get the canonical refusal:
"I can only answer questions about the documents in this vault. Try rephrasing your question to be about something in your sources."
This is deliberate — we'd rather refuse than fabricate. If you think the corpus DOES have the answer and you got refused, try adding 1-2 more specific keywords (an author name, a peptide abbreviation) to surface the right chunks.
Model routing
We pick the cheapest model that still answers well, based on your plan and the shape of your question:
- Definitional queries on Free ("what is BPC-157?") → gemini-2.5-flash-lite
- Everything else → gemini-2.5-flash
- Pro + deep-research mode (Phase 6+) → gemini-2.5-pro
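As a rough sketch, the routing rules above reduce to a small decision function. The function name, the plan strings "free" and "pro", and the two boolean flags are assumptions for illustration. How a query is classified as definitional is not described here, so the sketch takes that as an input.

```python
def pick_model(plan: str, is_definitional: bool, deep_research: bool = False) -> str:
    """Return the model name for a query, following the routing rules above."""
    # Pro plan with deep-research mode gets the largest model.
    if plan == "pro" and deep_research:
        return "gemini-2.5-pro"
    # Definitional questions on the Free plan go to the cheapest model.
    if plan == "free" and is_definitional:
        return "gemini-2.5-flash-lite"
    # Everything else.
    return "gemini-2.5-flash"
```

For example, `pick_model("free", is_definitional=True)` selects gemini-2.5-flash-lite, while the same question on Pro without deep-research mode falls through to gemini-2.5-flash.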
Sessions and history
Each chat sits in a session. Sessions remember the prior turns so follow-ups work naturally ("and the dosing in that study?"). You can rename sessions, browse them at /vaults/{id}/history, and delete them individually.
What we don't do (yet)
- Streaming answers — the non-streaming endpoint returns the full reply at once. Phase 8 work.
- Source toggles in the prompt — every enabled source in the vault is searched. To exclude a source, toggle it off in the sources list.
- Multi-vault queries — one chat = one vault. To search across, copy sources between vaults or wait for the cross-vault feature.
Quota
Free: 50 queries/month. Pro: unlimited within fair use (about 2,000 queries/month and a $5/day Gemini cost ceiling). When you hit a cap, the chat endpoint returns HTTP 402 upgrade_required; see quotas.
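The caps above can be read as a simple check that runs before each query. The sketch below is an assumption-heavy illustration: the function and constant names are hypothetical, the approximate fair-use figure is treated as a hard limit, and a cap is assumed to trigger once usage reaches it (so the 51st Free query in a month is refused).

```python
FREE_MONTHLY_QUERIES = 50
PRO_MONTHLY_QUERIES = 2000    # approximate fair-use figure, treated as a hard limit here
PRO_DAILY_COST_USD = 5.00     # daily Gemini cost ceiling on Pro


def quota_status(plan, queries_this_month, cost_today_usd=0.0):
    """Return (http_status, error_code); (200, None) means the query may proceed."""
    if plan == "free":
        over = queries_this_month >= FREE_MONTHLY_QUERIES
    else:
        # Pro is capped by either the monthly query count or the daily cost ceiling.
        over = (
            queries_this_month >= PRO_MONTHLY_QUERIES
            or cost_today_usd >= PRO_DAILY_COST_USD
        )
    return (402, "upgrade_required") if over else (200, None)
```

A client can treat a 402 with upgrade_required as a signal to show an upgrade prompt rather than retrying.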