Ingestion
How to get data into Docket.
Docket accepts files via multipart upload, or raw text as JSON, and processes them through a pipeline: validate, extract, classify (rich mode), embed, store.
Upload a file
curl -X POST http://localhost:3000/ingest \
-F "file=@tests/fixtures/bicycle.txt" \
-F "async=false"
The file's MIME type is used as contentType. For large files or batch uploads,
use async mode:
curl -X POST http://localhost:3000/ingest \
-F "file=@video.mp4" \
-F "async=true"
Response:
{
  "jobId": "job_xyz789",
  "status": "pending"
}
Async jobs are enqueued via the configured QueueAdapter. In v0.2.0 the
in-memory queue stores jobs; workers for processing are not yet wired.
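The enqueue-only behaviour described above can be pictured with a minimal Python sketch. The `QueueAdapter` protocol shape, the `InMemoryQueue` class, the `Job` fields, and the `job_` ID format are all assumptions for illustration, not Docket's actual interface.

```python
from dataclasses import dataclass, field
from typing import Protocol
import uuid


@dataclass
class Job:
    id: str
    type: str
    payload: dict
    status: str = "pending"


class QueueAdapter(Protocol):
    # Hypothetical interface; the real adapter may look different.
    def enqueue(self, job_type: str, payload: dict) -> Job: ...


@dataclass
class InMemoryQueue:
    # Jobs are only stored; nothing here consumes them, mirroring the
    # "workers not yet wired" state described for v0.2.0.
    jobs: dict[str, Job] = field(default_factory=dict)

    def enqueue(self, job_type: str, payload: dict) -> Job:
        job = Job(id=f"job_{uuid.uuid4().hex[:6]}", type=job_type, payload=payload)
        self.jobs[job.id] = job
        return job


queue = InMemoryQueue()
job = queue.enqueue("ingestion", {"filename": "video.mp4"})
response = {"jobId": job.id, "status": job.status}
```

In this sketch the job stays in `pending` because no worker ever picks it up.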
Upload raw text
curl -X POST http://localhost:3000/ingest \
-H "Content-Type: application/json" \
-d '{"text": "Rust uses ownership instead of GC", "contentType": "text/plain"}'
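The same raw-text request can be built from Python using only the standard library. This is a sketch: the base URL is taken from the curl examples, and the helper name `build_text_ingest_request` is hypothetical.

```python
import json
import urllib.request

# Assumed base URL, copied from the curl examples above.
BASE_URL = "http://localhost:3000"


def build_text_ingest_request(text: str, content_type: str = "text/plain") -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /ingest."""
    body = json.dumps({"text": text, "contentType": content_type}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_text_ingest_request("Rust uses ownership instead of GC")

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```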
Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| file | File | Yes* | The file to ingest (multipart only) |
| text | string | Yes* | Raw text to ingest (JSON only) |
| contentType | string | Yes* | MIME type of the content. Inferred from the uploaded file for multipart |
| async | boolean | No | true queues the job, false waits for completion |
| metadata | JSON | No | Arbitrary key-value metadata |
| sectorHint | string | No | Force sector: episodic, semantic, procedural, emotional, reflective |
| validFrom | datetime | No | ISO 8601 start of validity window (rich mode) |
| validTo | datetime | No | ISO 8601 end of validity window (rich mode) |
*Either file (multipart) or text (JSON) is required.
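A client can check these rules before sending a request. The sketch below is one possible reading of the table, not Docket's own validation: the function name is hypothetical, and treating contentType as mandatory for raw text is an assumption based on the note that it is inferred only for multipart uploads.

```python
from datetime import datetime

SECTORS = {"episodic", "semantic", "procedural", "emotional", "reflective"}


def validate_ingest_params(params: dict, has_file: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    errors = []
    has_text = "text" in params
    if not has_file and not has_text:
        errors.append("either file (multipart) or text (JSON) is required")
    # Assumption: contentType must be given for raw text, since there is
    # no uploaded file to infer it from.
    if has_text and not params.get("contentType"):
        errors.append("contentType is required for raw text")
    hint = params.get("sectorHint")
    if hint is not None and hint not in SECTORS:
        errors.append(f"unknown sectorHint: {hint}")
    for key in ("validFrom", "validTo"):
        value = params.get(key)
        if value is not None:
            try:
                # Note: fromisoformat accepts only a subset of ISO 8601
                # on Python versions before 3.11 (no trailing "Z").
                datetime.fromisoformat(value)
            except (ValueError, TypeError):
                errors.append(f"{key} is not an ISO 8601 datetime")
    return errors


problems = validate_ingest_params(
    {"text": "Rust uses ownership instead of GC", "contentType": "text/plain"}
)
```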
Supported content types
Currently implemented:
- text/plain, text/markdown, text/html, and other text/* types
Deferred to later phases:
- Images (OCR)
- PDF documents
- Audio/video (Whisper transcription)
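A client-side pre-check for the currently implemented types might look like the following sketch. The function name `is_supported` is hypothetical, and ignoring MIME parameters such as `charset` is an assumption about how the server treats them.

```python
def is_supported(content_type: str) -> bool:
    """Return True for text/* media types, the only ones implemented so far."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Require a non-empty subtype after "text/".
    return media_type.startswith("text/") and len(media_type) > len("text/")


checks = {ct: is_supported(ct) for ct in ("text/plain", "text/markdown", "application/pdf", "video/mp4")}
```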
What happens during ingestion
- Validate: check MIME type and size limits
- Store blob: save the raw file to BlobAdapter (multipart only)
- Extract text: plain-text extraction (OCR/PDF/audio deferred)
- Classify sector (rich mode only): the LLM chooses one of episodic, semantic, procedural, emotional, reflective
- Generate embedding: send the extracted text or summary to EmbedderAdapter
- Generate summary: the LLM produces a short summary
- Store memory: save the record to StoreAdapter with metadata, access policy, and relations
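The steps above can be sketched as a single function in Python. Everything here is illustrative: the adapters are modelled as plain callables, the `Memory` fields and the `MAX_BYTES` limit are assumptions, and the real pipeline may order or combine steps differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_BYTES = 10 * 1024 * 1024  # assumed limit; the real size limit is not stated here


@dataclass
class Memory:
    text: str
    summary: str
    embedding: list[float]
    sector: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def ingest(
    raw: bytes,
    content_type: str,
    *,
    embed: Callable[[str], list[float]],
    summarize: Callable[[str], str],
    save: Callable[[Memory], None],
    classify: Optional[Callable[[str], str]] = None,   # rich mode only
    store_blob: Optional[Callable[[bytes], None]] = None,  # multipart only
    metadata: Optional[dict] = None,
) -> Memory:
    # Validate: MIME type and size.
    if not content_type.startswith("text/"):
        raise ValueError(f"unsupported content type: {content_type}")
    if len(raw) > MAX_BYTES:
        raise ValueError("payload exceeds size limit")
    # Store blob (only when a blob store is supplied).
    if store_blob is not None:
        store_blob(raw)
    # Extract text: plain-text extraction only.
    text = raw.decode("utf-8")
    # Classify sector (skipped outside rich mode).
    sector = classify(text) if classify is not None else None
    # Generate embedding, then summary.
    embedding = embed(text)
    summary = summarize(text)
    # Store memory.
    memory = Memory(text, summary, embedding, sector, metadata or {})
    save(memory)
    return memory


# Usage with stub adapters standing in for the real ones.
saved: list[Memory] = []
memory = ingest(
    b"A bicycle has two wheels.",
    "text/plain",
    embed=lambda t: [float(len(t))],
    summarize=lambda t: t[:9],
    save=saved.append,
)
```

Passing the adapters in as arguments keeps the sketch testable without a running server.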
When RBAC is enabled, the current principal becomes the memory owner unless
owner or accessPolicy is provided explicitly.
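One way to read this ownership rule is shown in the sketch below. The function name and argument shapes are hypothetical, and the behaviour when RBAC is disabled or when only accessPolicy is supplied (returning no owner) is an assumption, since the text does not specify it.

```python
from typing import Optional


def resolve_owner(
    principal: str,
    rbac_enabled: bool,
    owner: Optional[str] = None,
    access_policy: Optional[dict] = None,
) -> Optional[str]:
    # An explicit owner always wins.
    if owner is not None:
        return owner
    # With RBAC on and nothing explicit, default to the current principal.
    if rbac_enabled and access_policy is None:
        return principal
    # Assumption: otherwise no owner is set by default.
    return None


default_owner = resolve_owner("alice", rbac_enabled=True)
explicit_owner = resolve_owner("alice", rbac_enabled=True, owner="bob")
```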
Ingestion jobs
The queue processes these job types:
| Type | Description |
|---|---|
| ingestion | Full pipeline for a new file |
| extraction | Re-run text extraction (e.g., after updating extractor) |
| summarization | Re-generate summary with a new prompt |
| insight-generation | Cross-memory pattern detection |
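A worker could route these job types with a simple lookup table. The sketch below is hypothetical: the handler functions are placeholders, and only the four type strings come from the table above.

```python
from enum import Enum
from typing import Callable


class JobType(str, Enum):
    INGESTION = "ingestion"
    EXTRACTION = "extraction"
    SUMMARIZATION = "summarization"
    INSIGHT_GENERATION = "insight-generation"


# Placeholder handlers; real ones would call into the pipeline.
def handle_ingestion(payload: dict) -> str:
    return "ran full pipeline"


def handle_extraction(payload: dict) -> str:
    return "re-ran text extraction"


def handle_summarization(payload: dict) -> str:
    return "re-generated summary"


def handle_insight_generation(payload: dict) -> str:
    return "ran cross-memory pattern detection"


HANDLERS: dict[JobType, Callable[[dict], str]] = {
    JobType.INGESTION: handle_ingestion,
    JobType.EXTRACTION: handle_extraction,
    JobType.SUMMARIZATION: handle_summarization,
    JobType.INSIGHT_GENERATION: handle_insight_generation,
}


def dispatch(job_type: str, payload: dict) -> str:
    # JobType(...) raises ValueError for an unknown type string.
    return HANDLERS[JobType(job_type)](payload)


result = dispatch("extraction", {"memoryId": "example"})
```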