When to use bulk vs single-file upload
For one or two files, the regular Add source flow is fine. For more than ~5 at a time, bulk upload gives you per-file progress, a single retry path, and a single pass of extension and MIME validation for the whole batch instead of a separate round trip per file.
Bulk PDF / DOCX / TXT
From the vault, click Bulk upload. Pick or drag up to 50 files. We:
- Read each file and check its magic bytes, so a renamed `.exe` can't sneak through with a `.pdf` extension (see the sketch after this list)
- Reject anything over 50 MB
- Check your source quota up front; if the batch would put you over, we 402 the whole batch (nothing partially uploaded)
- Stage each file to Storage with a UUID-prefixed path, so two files named `paper.pdf` don't collide
- Create one `ingestion_jobs` row plus per-source rows and queued ingest jobs
- The worker picks them up and processes them in the background, usually 1-2 per minute depending on size
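Here's a minimal sketch of the validation steps in TypeScript. The PDF (`%PDF-`) and DOCX (ZIP, `PK\x03\x04`) signatures are standard file magic; everything else, including the function and constant names, is illustrative rather than the actual implementation.

```ts
// Sketch of per-file validation: magic-byte check + 50 MB size cap.
// Helper names and structure are assumptions, not the real code.
const MAX_BYTES = 50 * 1024 * 1024; // 50 MB cap described above

const MAGIC: Record<string, number[]> = {
  pdf: [0x25, 0x50, 0x44, 0x46, 0x2d], // "%PDF-"
  docx: [0x50, 0x4b, 0x03, 0x04],      // DOCX is a ZIP container: "PK\x03\x04"
};

function startsWith(buf: Uint8Array, sig: number[]): boolean {
  return sig.every((byte, i) => buf[i] === byte);
}

async function validateFile(file: File): Promise<string | null> {
  if (file.size > MAX_BYTES) return `${file.name}: over 50 MB`;
  const head = new Uint8Array(await file.slice(0, 8).arrayBuffer());
  const ext = file.name.split(".").pop()?.toLowerCase() ?? "";
  if (ext === "pdf" && !startsWith(head, MAGIC.pdf)) {
    return `${file.name}: .pdf extension but not a PDF`;
  }
  if (ext === "docx" && !startsWith(head, MAGIC.docx)) {
    return `${file.name}: .docx extension but not a DOCX`;
  }
  // Plain text has no magic bytes; a real check might reject NUL bytes instead.
  return null; // null = file passes
}
```

For the collision-free staging paths, a key built as `crypto.randomUUID() + "/" + file.name` would give each file the UUID prefix described above (an assumption about how the paths are constructed, not a documented detail).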
Bulk YouTube
Paste up to 50 YouTube URLs. We accept the four common shapes:
- `youtube.com/watch?v=VIDEOID`
- `youtu.be/VIDEOID`
- `youtube.com/embed/VIDEOID`
- `youtube.com/shorts/VIDEOID`
If any URL fails to parse to a video ID, the whole batch is rejected up front so you don't end up with a half-ingested set.
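A parser covering those four shapes might look like the sketch below. The function names and the 11-character `[A-Za-z0-9_-]` ID pattern are assumptions; that pattern is YouTube's de facto format, but the service's exact rule isn't stated here.

```ts
// Sketch of video-ID extraction for the four accepted URL shapes.
// The 11-char ID pattern is assumed, not a documented rule.
function parseVideoId(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw.includes("://") ? raw : `https://${raw}`);
  } catch {
    return null;
  }
  const id = url.hostname.endsWith("youtu.be")
    ? url.pathname.slice(1)                    // youtu.be/VIDEOID
    : url.pathname.startsWith("/embed/")
    ? url.pathname.slice("/embed/".length)     // youtube.com/embed/VIDEOID
    : url.pathname.startsWith("/shorts/")
    ? url.pathname.slice("/shorts/".length)    // youtube.com/shorts/VIDEOID
    : url.searchParams.get("v");               // youtube.com/watch?v=VIDEOID
  return id && /^[A-Za-z0-9_-]{11}$/.test(id) ? id : null;
}

// All-or-nothing: one unparseable URL rejects the whole batch up front.
function parseBatch(urls: string[]): string[] {
  const ids = urls.map(parseVideoId);
  const badIndex = ids.findIndex((id) => id === null);
  if (badIndex !== -1) {
    throw new Error(`URL #${badIndex + 1} did not parse to a video ID`);
  }
  return ids as string[];
}
```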
Tracking progress
After submission you get an `ingestion_job_id`. The UI polls `GET /api/v1/vaults/{vault_id}/sources/bulk/{ingestion_job_id}` every few seconds and shows:
- Total — how many you submitted
- Processed — done + failed (sum of terminal states)
- Error count — failed only
Counts are computed live from `job_queue` joined by `parent_job_id`; there's no separate counter to fall out of sync.
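A polling loop against that endpoint might look like the following sketch. The response field names (`total`, `processed`, `error_count`) mirror the three counts above but are assumptions about the payload shape.

```ts
// Sketch of the UI polling loop. BulkStatus field names are assumed
// from the three counts listed above, not a documented schema.
interface BulkStatus {
  total: number;       // files submitted
  processed: number;   // done + failed (terminal states)
  error_count: number; // failed only
}

async function pollBulkJob(
  vaultId: string,
  jobId: string,
  onUpdate: (s: BulkStatus) => void,
): Promise<BulkStatus> {
  for (;;) {
    const res = await fetch(`/api/v1/vaults/${vaultId}/sources/bulk/${jobId}`);
    if (!res.ok) throw new Error(`status poll failed: ${res.status}`);
    const status: BulkStatus = await res.json();
    onUpdate(status);
    if (status.processed >= status.total) return status; // all terminal
    await new Promise((r) => setTimeout(r, 3000)); // "every few seconds"
  }
}
```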
Pre-flight cost estimate
Before submitting a large batch, you can call `POST /api/v1/vaults/{vault_id}/sources/bulk/estimate` with `{file_count, total_bytes}`. It returns the indexing cost in USD plus an estimated wall-clock time. Useful when you're about to drop 50 large PDFs and want to know what it'll spend on Gemini File Search indexing.
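Calling it might look like this sketch; the response field names (`estimated_cost_usd`, `estimated_seconds`) are guesses at labels for the two values the endpoint is described as returning.

```ts
// Sketch of a pre-flight estimate call. Response field names are assumed.
async function estimateBatch(
  vaultId: string,
  files: File[],
): Promise<{ estimated_cost_usd: number; estimated_seconds: number }> {
  const res = await fetch(`/api/v1/vaults/${vaultId}/sources/bulk/estimate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      file_count: files.length,
      total_bytes: files.reduce((sum, f) => sum + f.size, 0),
    }),
  });
  if (!res.ok) throw new Error(`estimate failed: ${res.status}`);
  return res.json();
}
```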
Re-running a partial-failure batch
Bulk uploads are idempotent at the Storage layer (the underlying upload uses `upsert`) but the database insert path doesn't dedup yet. If a batch partially failed, retry only the failed files individually rather than re-submitting the whole batch.
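If the status endpoint exposes per-file results (an assumption; only the aggregate counts are documented above), a retry helper could filter to the failures and re-submit each one through the single-file path. The `sources` endpoint and result shape below are both hypothetical.

```ts
// Sketch only: assumes a per-file view with `file_name` and `status`,
// and a single-file upload endpoint at POST /sources. Neither is
// documented above; adapt to what your API actually exposes.
interface SourceResult {
  file_name: string;
  status: "done" | "failed" | "queued" | "processing";
}

async function retryFailed(
  vaultId: string,
  results: SourceResult[],
  originalFiles: Map<string, File>, // name -> File from the first attempt
): Promise<void> {
  const failed = results.filter((r) => r.status === "failed");
  for (const r of failed) {
    const file = originalFiles.get(r.file_name);
    if (!file) continue; // file no longer available locally
    const body = new FormData();
    body.append("file", file);
    // Single-file path avoids re-inserting rows for files that succeeded.
    const res = await fetch(`/api/v1/vaults/${vaultId}/sources`, {
      method: "POST",
      body,
    });
    if (!res.ok) console.error(`${r.file_name}: retry failed (${res.status})`);
  }
}
```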