When to use bulk vs single-file upload
For one or two files, the regular Add source flow is fine. For more than ~5 at a time, bulk upload gives you per-file progress, a single retry path, and a single pass of extension and MIME validation for the whole batch instead of a separate round trip per file.
Bulk PDF / DOCX / TXT
From the vault, click Bulk upload. Pick or drag up to 50 files. We:
- Read each file and check its magic bytes, so a renamed `.exe` can't sneak through with a `.pdf` extension (see the sketch after this list)
- Reject anything over 50 MB
- Check your source quota up front; if the batch would put you over, we 402 the whole batch (nothing partially uploaded)
- Stage each file to Storage with a UUID-prefixed path, so two files named `paper.pdf` don't collide
- Create one `ingestion_jobs` row plus per-source rows and queued ingest jobs
- The worker picks them up and processes them in the background, usually 1-2 per minute depending on size
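Here's a minimal sketch of the validation steps in TypeScript. The PDF (`%PDF-`) and DOCX (ZIP, `PK\x03\x04`) signatures are standard file magic; everything else, including the function and constant names, is illustrative rather than the actual implementation.

```ts
// Sketch of per-file validation: magic-byte check + 50 MB size cap.
// Helper names and structure are assumptions, not the real code.
const MAX_BYTES = 50 * 1024 * 1024; // 50 MB cap described above

const MAGIC: Record<string, number[]> = {
  pdf: [0x25, 0x50, 0x44, 0x46, 0x2d], // "%PDF-"
  docx: [0x50, 0x4b, 0x03, 0x04],      // DOCX is a ZIP container: "PK\x03\x04"
};

function startsWith(buf: Uint8Array, sig: number[]): boolean {
  return sig.every((byte, i) => buf[i] === byte);
}

async function validateFile(file: File): Promise<string | null> {
  if (file.size > MAX_BYTES) return `${file.name}: over 50 MB`;
  const head = new Uint8Array(await file.slice(0, 8).arrayBuffer());
  const ext = file.name.split(".").pop()?.toLowerCase() ?? "";
  if (ext === "pdf" && !startsWith(head, MAGIC.pdf)) {
    return `${file.name}: .pdf extension but not a PDF`;
  }
  if (ext === "docx" && !startsWith(head, MAGIC.docx)) {
    return `${file.name}: .docx extension but not a DOCX`;
  }
  // Plain text has no magic bytes; a real check might reject NUL bytes instead.
  return null; // null = file passes
}
```

For the collision-free staging paths, a key built as `crypto.randomUUID() + "/" + file.name` would give each file the UUID prefix described above (an assumption about how the paths are constructed, not a documented detail).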
Bulk YouTube
Paste up to 50 YouTube URLs. We accept the four common shapes:
- `youtube.com/watch?v=VIDEOID`
- `youtu.be/VIDEOID`
- `youtube.com/embed/VIDEOID`
- `youtube.com/shorts/VIDEOID`
If any URL fails to parse to a video ID, the whole batch is rejected up front so you don't end up with a half-ingested set.
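A parser covering those four shapes might look like the sketch below. The function names and the 11-character `[A-Za-z0-9_-]` ID pattern are assumptions; that pattern is YouTube's de facto format, but the service's exact rule isn't stated here.

```ts
// Sketch of video-ID extraction for the four accepted URL shapes.
// The 11-char ID pattern is assumed, not a documented rule.
function parseVideoId(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw.includes("://") ? raw : `https://${raw}`);
  } catch {
    return null;
  }
  const id = url.hostname.endsWith("youtu.be")
    ? url.pathname.slice(1)                    // youtu.be/VIDEOID
    : url.pathname.startsWith("/embed/")
    ? url.pathname.slice("/embed/".length)     // youtube.com/embed/VIDEOID
    : url.pathname.startsWith("/shorts/")
    ? url.pathname.slice("/shorts/".length)    // youtube.com/shorts/VIDEOID
    : url.searchParams.get("v");               // youtube.com/watch?v=VIDEOID
  return id && /^[A-Za-z0-9_-]{11}$/.test(id) ? id : null;
}

// All-or-nothing: one unparseable URL rejects the whole batch up front.
function parseBatch(urls: string[]): string[] {
  const ids = urls.map(parseVideoId);
  const badIndex = ids.findIndex((id) => id === null);
  if (badIndex !== -1) {
    throw new Error(`URL #${badIndex + 1} did not parse to a video ID`);
  }
  return ids as string[];
}
```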
Tracking progress
After submission you get an `ingestion_job_id`. The UI polls `GET /api/v1/vaults/{vault_id}/sources/bulk/{ingestion_job_id}` every few seconds and shows:
- Total — how many you submitted
- Processed — done + failed (sum of terminal states)
- Error count — failed only
Counts are computed live from `job_queue` joined by `parent_job_id`; there's no separate counter to fall out of sync.
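A polling loop against that endpoint might look like the following sketch. The response field names (`total`, `processed`, `error_count`) mirror the three counts above but are assumptions about the payload shape.

```ts
// Sketch of the UI polling loop. BulkStatus field names are assumed
// from the three counts listed above, not a documented schema.
interface BulkStatus {
  total: number;       // files submitted
  processed: number;   // done + failed (terminal states)
  error_count: number; // failed only
}

async function pollBulkJob(
  vaultId: string,
  jobId: string,
  onUpdate: (s: BulkStatus) => void,
): Promise<BulkStatus> {
  for (;;) {
    const res = await fetch(`/api/v1/vaults/${vaultId}/sources/bulk/${jobId}`);
    if (!res.ok) throw new Error(`status poll failed: ${res.status}`);
    const status: BulkStatus = await res.json();
    onUpdate(status);
    if (status.processed >= status.total) return status; // all terminal
    await new Promise((r) => setTimeout(r, 3000)); // "every few seconds"
  }
}
```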
Pre-flight cost estimate
Before submitting a large batch, you can call `POST /api/v1/vaults/{vault_id}/sources/bulk/estimate` with `{file_count, total_bytes}`. It returns the indexing cost in USD plus an estimated wall-clock time. Useful when you're about to drop 50 large PDFs and want to know what it'll spend on Gemini File Search indexing.
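Calling it might look like this sketch; the response field names (`estimated_cost_usd`, `estimated_seconds`) are guesses at labels for the two values the endpoint is described as returning.

```ts
// Sketch of a pre-flight estimate call. Response field names are assumed.
async function estimateBatch(
  vaultId: string,
  files: File[],
): Promise<{ estimated_cost_usd: number; estimated_seconds: number }> {
  const res = await fetch(`/api/v1/vaults/${vaultId}/sources/bulk/estimate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      file_count: files.length,
      total_bytes: files.reduce((sum, f) => sum + f.size, 0),
    }),
  });
  if (!res.ok) throw new Error(`estimate failed: ${res.status}`);
  return res.json();
}
```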
Re-running a partial-failure batch
Bulk uploads are idempotent at the Storage layer (the underlying upload uses `upsert`) but the database insert path doesn't dedup yet. If a batch partially failed, retry only the failed files individually rather than re-submitting the whole batch.
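If the status endpoint exposes per-file results (an assumption; only the aggregate counts are documented above), a retry helper could filter to the failures and re-submit each one through the single-file path. The `sources` endpoint and result shape below are both hypothetical.

```ts
// Sketch only: assumes a per-file view with `file_name` and `status`,
// and a single-file upload endpoint at POST /sources. Neither is
// documented above; adapt to what your API actually exposes.
interface SourceResult {
  file_name: string;
  status: "done" | "failed" | "queued" | "processing";
}

async function retryFailed(
  vaultId: string,
  results: SourceResult[],
  originalFiles: Map<string, File>, // name -> File from the first attempt
): Promise<void> {
  const failed = results.filter((r) => r.status === "failed");
  for (const r of failed) {
    const file = originalFiles.get(r.file_name);
    if (!file) continue; // file no longer available locally
    const body = new FormData();
    body.append("file", file);
    // Single-file path avoids re-inserting rows for files that succeeded.
    const res = await fetch(`/api/v1/vaults/${vaultId}/sources`, {
      method: "POST",
      body,
    });
    if (!res.ok) console.error(`${r.file_name}: retry failed (${res.status})`);
  }
}
```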