Docket Docs

Ingestion

How to get data into Docket.

Docket accepts files via multipart upload, or raw text as JSON, and processes them through a pipeline: validate, extract, classify (rich mode only), embed, store.

Upload a file

curl -X POST http://localhost:3000/ingest \
  -F "file=@tests/fixtures/bicycle.txt" \
  -F "async=false"

The file's MIME type is used as contentType. For large files or batch uploads, use async mode:

curl -X POST http://localhost:3000/ingest \
  -F "file=@video.mp4" \
  -F "async=true"

Response:

{
  "jobId": "job_xyz789",
  "status": "pending"
}

Async jobs are enqueued via the configured QueueAdapter. In v0.2.0 the in-memory queue stores jobs, but no workers are wired up yet to process them.

Upload raw text

curl -X POST http://localhost:3000/ingest \
  -H "Content-Type: application/json" \
  -d '{"text": "Rust uses ownership instead of GC", "contentType": "text/plain"}'
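
The same request can be built from a script. The following is a minimal sketch using only the Python standard library; the function name build_text_request is hypothetical, and the URL is taken from the curl example above. It only constructs the request, so it can be inspected without a running server.

```python
import json
import urllib.request

# Base URL copied from the curl examples; adjust for your deployment.
INGEST_URL = "http://localhost:3000/ingest"


def build_text_request(text, content_type="text/plain", url=INGEST_URL):
    """Build (but do not send) a JSON ingestion request for raw text.

    Hypothetical helper, not part of Docket itself.
    """
    body = json.dumps({"text": text, "contentType": content_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it. The shape of the synchronous response is not shown here because this page documents only the async one.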

Parameters

Field        Type      Required  Description
file         File      Yes*      The file to ingest (multipart only)
text         string    Yes*      Raw text to ingest (JSON only)
contentType  string    Yes*      MIME type of the content. Inferred from the uploaded file for multipart
async        boolean   No        true queues the job, false waits for completion
metadata     JSON      No        Arbitrary key-value metadata
sectorHint   string    No        Force sector: episodic, semantic, procedural, emotional, reflective
validFrom    datetime  No        ISO 8601 start of validity window (rich mode)
validTo      datetime  No        ISO 8601 end of validity window (rich mode)

* Either file (multipart) or text (JSON) is required. With text, contentType must be given explicitly; with file, it is inferred from the upload.
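
As an illustration of how these fields fit together, here is a hedged sketch of a client-side check that could run before a request is sent. The function validate_params and its error messages are hypothetical, and the server's real validation rules may differ. In particular, the check that validFrom does not come after validTo is an assumption, not something this page states.

```python
from datetime import datetime

# Sector names as listed for sectorHint.
SECTORS = {"episodic", "semantic", "procedural", "emotional", "reflective"}


def _parse_iso(value):
    """Parse an ISO 8601 string, or return None if it is not valid."""
    try:
        # Older Python versions reject a trailing "Z", so normalise it first.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except (AttributeError, ValueError):
        return None


def validate_params(params, has_file=False):
    """Return a list of problems with an ingestion request (empty if none)."""
    errors = []
    has_text = "text" in params
    if not has_file and not has_text:
        errors.append("either file (multipart) or text (JSON) is required")
    if has_text and "contentType" not in params:
        errors.append("contentType is required when sending text")
    if "sectorHint" in params and params["sectorHint"] not in SECTORS:
        errors.append("sectorHint must be one of: " + ", ".join(sorted(SECTORS)))

    parsed = {}
    for key in ("validFrom", "validTo"):
        if key in params:
            parsed[key] = _parse_iso(params[key])
            if parsed[key] is None:
                errors.append(f"{key} is not an ISO 8601 datetime")

    start, end = parsed.get("validFrom"), parsed.get("validTo")
    if start is not None and end is not None:
        try:
            # Assumption: a validity window should not end before it starts.
            if start > end:
                errors.append("validFrom is after validTo")
        except TypeError:
            errors.append("validFrom and validTo mix offset-aware and naive datetimes")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.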

Supported content types

Currently implemented:

  • text/plain, text/markdown, text/html, and other text/* types

Deferred to later phases:

  • Images (OCR)
  • PDF documents
  • Audio/video (Whisper transcription)
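
A quick way to check a MIME type against the currently implemented list is sketched below. The helper is_supported is hypothetical. Ignoring parameters such as a charset suffix and matching case-insensitively are assumptions about how the server compares types.

```python
def is_supported(content_type):
    """True for text/* MIME types, the only family listed as implemented."""
    # Drop parameters such as "; charset=utf-8" and normalise case (assumed behaviour).
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Require a non-empty subtype after the "text/" prefix.
    return media_type.startswith("text/") and len(media_type) > len("text/")
```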

What happens during ingestion

  1. Validate — Check MIME type and size limits
  2. Store blob — Save raw file to BlobAdapter (multipart only)
  3. Extract text — Plain-text extraction (OCR/PDF/audio deferred)
  4. Classify sector (rich mode only) — LLM decides: episodic, semantic, procedural, emotional, reflective
  5. Generate embedding — Send extracted text or summary to EmbedderAdapter
  6. Generate summary — LLM produces a short summary
  7. Store memory — Save record to StoreAdapter with metadata, access policy, and relations
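
The steps above can be sketched as a single function. This is an illustration under stated assumptions, not Docket's implementation: the InMemoryAdapters class stands in for BlobAdapter, EmbedderAdapter, StoreAdapter, and the LLM calls; every method name, the record field names, and the MAX_BYTES limit are hypothetical. The sketch embeds the extracted text rather than the summary.

```python
from dataclasses import dataclass, field

MAX_BYTES = 10 * 1024 * 1024  # assumed size limit; the real limit is not documented here


@dataclass
class InMemoryAdapters:
    """Hypothetical stand-ins for the blob, embedder, store, and LLM adapters."""
    blobs: list = field(default_factory=list)
    memories: list = field(default_factory=list)

    def save_blob(self, data):
        self.blobs.append(data)
        return f"blob_{len(self.blobs)}"

    def classify(self, text):
        return "semantic"  # placeholder for the LLM sector decision

    def summarize(self, text):
        return text[:80]  # placeholder for the LLM summary

    def embed(self, text):
        return [float(len(text))]  # placeholder one-dimensional vector

    def save_memory(self, record):
        self.memories.append(record)
        return record


def ingest(data, content_type, adapters, rich_mode=False, multipart=True, metadata=None):
    # 1. Validate MIME type and size.
    if not content_type.startswith("text/"):
        raise ValueError(f"unsupported content type: {content_type}")
    if len(data) > MAX_BYTES:
        raise ValueError("file exceeds size limit")
    # 2. Store the raw blob (multipart uploads only).
    blob_id = adapters.save_blob(data) if multipart else None
    # 3. Extract text (plain-text extraction only).
    text = data.decode("utf-8")
    # 4. Classify the sector (rich mode only).
    sector = adapters.classify(text) if rich_mode else None
    # 5. Generate the embedding from the extracted text.
    embedding = adapters.embed(text)
    # 6. Generate a short summary.
    summary = adapters.summarize(text)
    # 7. Store the memory record.
    return adapters.save_memory({
        "text": text,
        "blobId": blob_id,
        "sector": sector,
        "embedding": embedding,
        "summary": summary,
        "metadata": metadata or {},
    })
```

Swapping InMemoryAdapters for real adapters would leave the control flow unchanged, which is the point of routing each step through an adapter.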

When RBAC is enabled, the current principal becomes the memory owner unless owner or accessPolicy is provided explicitly.

Ingestion jobs

The queue processes these job types:

Type                Description
ingestion           Full pipeline for a new file
extraction          Re-run text extraction (e.g., after updating extractor)
summarization       Re-generate summary with a new prompt
insight-generation  Cross-memory pattern detection
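
To make the job types concrete, here is a minimal sketch of an in-memory queue that stores jobs without running them, similar in spirit to the v0.2.0 behaviour described earlier. The class InMemoryQueue, its methods, and the sequential job ID format are hypothetical and do not reflect the actual QueueAdapter interface.

```python
import itertools
from collections import deque
from enum import Enum


class JobType(Enum):
    INGESTION = "ingestion"
    EXTRACTION = "extraction"
    SUMMARIZATION = "summarization"
    INSIGHT_GENERATION = "insight-generation"


class InMemoryQueue:
    """Hypothetical queue: records jobs as pending, with no worker to process them."""

    def __init__(self):
        self._jobs = deque()
        self._ids = itertools.count(1)

    def enqueue(self, job_type, payload):
        # JobType(...) raises ValueError for names outside the four known types.
        job = {
            "jobId": f"job_{next(self._ids)}",
            "type": JobType(job_type).value,
            "status": "pending",
            "payload": payload,
        }
        self._jobs.append(job)
        # Mirror the async response shape shown earlier on this page.
        return {"jobId": job["jobId"], "status": job["status"]}

    def pending(self):
        return [job for job in self._jobs if job["status"] == "pending"]
```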
