Bulk upload — many files at once

Drop up to 50 PDFs or 50 YouTube URLs in one batch; track progress as they ingest.

When to use bulk vs single-file upload

For one or two files, the regular Add source flow is fine. For more than ~5 at a time, bulk upload gives you per-file progress, a single retry path, and one up-front validation pass (extension and MIME checks across the whole batch) instead of repeating the flow file by file.

Bulk PDF / DOCX / TXT

From the vault, click Bulk upload. Pick or drag up to 50 files. We:

  1. Read each file and check its magic bytes (so a renamed .exe can't sneak through with a .pdf extension)
  2. Reject anything over 50 MB
  3. Check your source quota up front — if the batch would put you over, we 402 the whole batch (nothing partially uploaded)
  4. Stage each file to Storage with a UUID-prefixed path (so two files named paper.pdf don't collide)
  5. Create one ingestion_jobs row plus per-source rows + queued ingest jobs
  6. The worker picks them up and processes them in the background — usually 1-2 per minute depending on size
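The validation in steps 1, 2, and 4 can be sketched as follows. This is an illustrative sketch, not the actual server code: the function name, the TXT heuristic (rejecting null bytes), and the exact magic-byte table are assumptions; only the 50 MB cap, the magic-byte check, and the UUID-prefixed path come from the steps above.

```python
import uuid

MAX_BYTES = 50 * 1024 * 1024  # step 2: reject anything over 50 MB

# Step 1: known file signatures ("magic bytes"), checked against content,
# never the extension alone.
MAGIC = {
    "pdf": b"%PDF",          # PDF header
    "docx": b"PK\x03\x04",   # DOCX is a ZIP container
}

def validate_and_stage(filename: str, data: bytes) -> str:
    """Hypothetical per-file check; returns a collision-safe storage path."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if len(data) > MAX_BYTES:
        raise ValueError(f"{filename}: over 50 MB")
    if ext in MAGIC and not data.startswith(MAGIC[ext]):
        # e.g. a renamed .exe carrying a .pdf extension fails here
        raise ValueError(f"{filename}: content does not match .{ext}")
    if ext == "txt" and b"\x00" in data:
        raise ValueError(f"{filename}: binary content in a .txt file")
    # Step 4: UUID prefix so two files named paper.pdf never collide
    return f"{uuid.uuid4()}/{filename}"
```

A renamed executable fails the magic-byte check even though its extension says `.pdf`, which is the whole point of step 1.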

Bulk YouTube

Paste up to 50 YouTube URLs. We accept the four common shapes:

  • youtube.com/watch?v=VIDEOID
  • youtu.be/VIDEOID
  • youtube.com/embed/VIDEOID
  • youtube.com/shorts/VIDEOID

If any URL fails to parse to a video ID, the whole batch is rejected up front so you don't end up with a half-ingested set.

Tracking progress

After submission you get an ingestion_job_id. The UI polls GET /api/v1/vaults/{vault_id}/sources/bulk/{ingestion_job_id} every few seconds and shows:

  • Total — how many you submitted
  • Processed — done + failed (sum of terminal states)
  • Error count — failed only

Counts are computed live from job_queue joined by parent_job_id — there's no separate counter to fall out of sync.
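The three displayed numbers derive directly from per-state job counts, and the polling loop is a thin wrapper around that. A minimal sketch, assuming the response exposes per-state counts (the state names queued/running/done/failed and both function names are assumptions, not the documented schema):

```python
import time

def summarize(counts: dict) -> dict:
    """Derive Total / Processed / Error count from per-state job counts."""
    done, failed = counts.get("done", 0), counts.get("failed", 0)
    return {
        "total": sum(counts.values()),
        "processed": done + failed,  # done + failed: the terminal states
        "errors": failed,
    }

def poll(fetch_counts, interval: float = 3.0) -> dict:
    """fetch_counts() would GET .../sources/bulk/{ingestion_job_id}."""
    while True:
        s = summarize(fetch_counts())
        if s["processed"] >= s["total"]:
            return s
        time.sleep(interval)
```

Because the counts are recomputed from the job rows on every call, a retried or newly failed job is reflected on the next poll with no counter to reconcile.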

Pre-flight cost estimate

Before submitting a large batch, you can call POST /api/v1/vaults/{vault_id}/sources/bulk/estimate with {file_count, total_bytes}. It returns the indexing cost in USD plus an estimated wall-clock time. Useful when you're about to drop 50 large PDFs and want to know what it'll spend on Gemini File Search indexing.
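Calling the estimate endpoint from a script might look like the sketch below. The request body fields come from the text above; the response field names (cost_usd, eta_seconds), the function name, and the injectable opener are assumptions for illustration.

```python
import json
import urllib.request

def estimate_bulk(base_url: str, vault_id: str, file_count: int,
                  total_bytes: int, opener=urllib.request.urlopen) -> dict:
    """POST {file_count, total_bytes} to the bulk estimate endpoint."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/vaults/{vault_id}/sources/bulk/estimate",
        data=json.dumps({"file_count": file_count,
                         "total_bytes": total_bytes}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:  # opener is injectable for testing
        return json.load(resp)
```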

Re-running a partial-failure batch

Bulk uploads are idempotent at the Storage layer (the underlying upload uses upsert) but the database insert path doesn't deduplicate yet. If a batch partially failed, retry only the failed files individually rather than re-submitting the whole batch.
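The retry-only-the-failures pattern can be sketched as below. Both names (the per-source status field and the single-file upload callable) are hypothetical; the point is simply to filter for the failed terminal state and re-submit one at a time, which avoids the duplicate database rows a full batch re-submission would create.

```python
def retry_failures(sources: list[dict], upload_single) -> list[str]:
    """Re-submit only the sources that ended in a failed state."""
    retried = []
    for src in sources:
        if src["status"] == "failed":
            upload_single(src["filename"])  # fresh single-file Add source
            retried.append(src["filename"])
    return retried
```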

Related articles