Responsibilities
| Layer | Responsibility |
|---|---|
| crates/protocol | Public DTOs, method constants, notifications, and JSON Schema export. |
| crates/artifacts | ArtifactService, blob store abstraction, local blob store, ingestion, path validation, quotas, GC planning, projections, and provider artifact resolution. |
| crates/crud | All artifact database reads and writes through CrudStore; no gateway handler should bypass this layer. |
| crates/entity | SeaORM entities for artifact tables. |
| crates/migration | Artifact schema creation and rollback. |
| crates/gateway | JSON-RPC handlers, WebSocket upload/download sessions, explicit agent artifact registration tools, notifications, and provider attachment normalization. |
| crates/desktop | Thread artifacts panel, timeline chips, preview cache, local download/open/reveal actions, and composer reuse. |
| crates/config | Artifact storage, staging, quota, upload, download, and projection defaults. |
Storage Layout
LocalArtifactBlobStore stores bytes under the gateway runtime home.
The turn-scoped artifact output directory, PIONEER_ARTIFACT_OUTPUT_DIR, is not durable storage and it is not a protocol identity. artifact_prepare allocates a safe path inside this directory, and artifact_register imports the completed regular file into the canonical blob store.
The storage key is derived from the SHA-256 digest. The physical store deduplicates identical blobs inside the same workspace and validates existing blobs before reusing them.
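As a rough illustration of content-addressed keys, the sketch below builds a key from a workspace id and a lowercase hex SHA-256 digest. The function name `storage_key` and the `<workspace>/<first two hex chars>/<digest>` layout are assumptions made for this example; the actual key format used by LocalArtifactBlobStore is not specified here.

```rust
/// Hypothetical helper: derive a storage key from a workspace id and a
/// lowercase hex SHA-256 digest. The key layout is an assumption for
/// illustration, not the documented on-disk layout.
pub fn storage_key(workspace_id: &str, sha256_hex: &str) -> Option<String> {
    // A SHA-256 digest is 32 bytes, so its hex form is exactly 64 characters.
    let is_valid_digest = sha256_hex.len() == 64
        && sha256_hex
            .bytes()
            .all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    let is_valid_workspace = !workspace_id.is_empty()
        && !workspace_id.contains('/')
        && !workspace_id.contains("..");
    if !is_valid_digest || !is_valid_workspace {
        return None;
    }
    // Including the workspace in the key keeps deduplication inside one workspace:
    // identical bytes in two workspaces map to two different keys.
    Some(format!("{}/{}/{}", workspace_id, &sha256_hex[..2], sha256_hex))
}
```

Because the key depends only on the workspace and the digest, two uploads of the same bytes in one workspace resolve to the same key, which is what makes reuse of an existing blob possible once it has been validated.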
Desktop preview files are separate. The desktop writes derived preview variants under its own runtime home at previews/artifacts and prunes that cache when it exceeds 512 MB. Those files are not authoritative and can be deleted without losing artifacts.
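A minimal sketch of how such a prune pass could be planned. The least-recently-used eviction order, the type `CachedPreview`, the function `plan_prune`, and reading 512 MB as 512 * 1024 * 1024 bytes are all assumptions; the source only states that the cache is pruned when it exceeds 512 MB.

```rust
/// Assumed interpretation of the 512 MB limit in binary units.
pub const PREVIEW_CACHE_LIMIT_BYTES: u64 = 512 * 1024 * 1024;

/// Hypothetical description of one derived preview file in the desktop cache.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CachedPreview {
    pub path: String,
    pub size_bytes: u64,
    pub last_used_unix_secs: u64,
}

/// Returns the paths to delete, least recently used first, until the cache
/// fits under `limit_bytes`. Nothing is deleted when the cache already fits.
pub fn plan_prune(mut entries: Vec<CachedPreview>, limit_bytes: u64) -> Vec<String> {
    let mut total: u64 = entries.iter().map(|e| e.size_bytes).sum();
    entries.sort_by_key(|e| e.last_used_unix_secs);
    let mut to_delete = Vec::new();
    for entry in entries {
        if total <= limit_bytes {
            break;
        }
        total -= entry.size_bytes;
        to_delete.push(entry.path);
    }
    to_delete
}
```

Since preview files are derived and not authoritative, a planner like this can be aggressive: anything it removes can be regenerated from the artifact.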
Database Model
Artifact metadata lives in gateway.db.
| Table | Purpose |
|---|---|
| artifact_blob | One stored blob: workspace, SHA-256, size, MIME type, storage backend, storage key, verification metadata. |
| artifact | User-facing artifact identity: display name, kind, status, current version, creator, primary thread, soft-delete state. |
| artifact_version | Immutable version metadata pointing to a blob, plus creation lineage such as turn, message, tool call, task, or task run. |
| artifact_binding | Links an artifact/version to a thread, turn, message, turn item, tool call, task, or task run with direction and role. |
| artifact_projection | Derived views such as plain text and thumbnails. A projection can store text inline or point at another artifact blob. |
| artifact_external_ref | Provider-specific upload refs so the same artifact can be reused with a model provider without re-uploading every time. |
| artifact_upload_session | Schema support for upload session bookkeeping. The active WebSocket chunk path also keeps connection-bound session state in the gateway. |
Identity And Versions
An artifact_id identifies the logical file. A version_id identifies a concrete blob-backed version. Most UI operations can use the current version, but protocol clients should pass version_id when they need an exact immutable file.
Artifacts can be soft deleted and restored. Deleting an artifact changes its status; blob GC is separate and must respect grace periods and active references.
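The difference between "current version" and "exact version" lookups can be sketched as follows. The types, the `Deleted` status name, and the choice to refuse reads of soft-deleted artifacts are assumptions for illustration; the real entities live in crates/entity and may differ.

```rust
/// Hypothetical status values; only soft-delete is implied by the text above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ArtifactStatus {
    Ready,
    Deleted,
}

/// Simplified in-memory stand-in for an artifact row plus its versions.
#[derive(Debug, Clone)]
pub struct Artifact {
    pub artifact_id: String,
    pub current_version_id: String,
    pub version_ids: Vec<String>,
    pub status: ArtifactStatus,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ResolveError {
    Deleted,
    UnknownVersion,
}

/// Resolves the version to read: the current version when the caller passes
/// `None`, or the exact immutable version when a `version_id` is given.
pub fn resolve_version<'a>(
    artifact: &'a Artifact,
    requested: Option<&str>,
) -> Result<&'a str, ResolveError> {
    // Assumption: soft-deleted artifacts are not readable until restored.
    if artifact.status == ArtifactStatus::Deleted {
        return Err(ResolveError::Deleted);
    }
    match requested {
        None => Ok(artifact.current_version_id.as_str()),
        Some(id) => artifact
            .version_ids
            .iter()
            .find(|v| v.as_str() == id)
            .map(|v| v.as_str())
            .ok_or(ResolveError::UnknownVersion),
    }
}
```

A UI caller would typically pass `None`, while a protocol client that must not observe a later version passes the `version_id` it was given.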
Ingestion
All ingestion flows go through ArtifactService.
User uploads:
- The client calls artifact/upload/start with workspace, optional thread/planned turn, file name, MIME type, size, SHA-256, and source kind.
- The client sends binary chunk frames on the same WebSocket connection.
- The gateway validates offsets, chunk hashes, declared size, session owner, and final SHA-256.
- artifact/upload/finish persists the temp file through ArtifactService::ingest_temp_file.
- The service writes the blob, creates metadata through CrudStore, creates projections when supported, and returns an ArtifactRef.
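The offset, declared-size, and session-owner checks in the upload flow above could look roughly like this. `UploadSession`, `ChunkError`, and the strictly sequential chunk order are assumptions for the sketch, and the chunk-hash and final SHA-256 checks are left out because the Rust standard library has no SHA-256 implementation.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ChunkError {
    WrongOwner,
    UnexpectedOffset { expected: u64, got: u64 },
    ExceedsDeclaredSize,
    Incomplete { received: u64, declared: u64 },
}

/// Hypothetical connection-bound upload session state.
pub struct UploadSession {
    owner_connection: u64,
    declared_size: u64,
    received: u64,
}

impl UploadSession {
    pub fn new(owner_connection: u64, declared_size: u64) -> Self {
        Self { owner_connection, declared_size, received: 0 }
    }

    /// Accepts one chunk if it comes from the owning connection, starts
    /// exactly where the previous chunk ended, and stays within the
    /// size declared at upload start.
    pub fn accept_chunk(&mut self, connection: u64, offset: u64, len: u64) -> Result<(), ChunkError> {
        if connection != self.owner_connection {
            return Err(ChunkError::WrongOwner);
        }
        if offset != self.received {
            return Err(ChunkError::UnexpectedOffset { expected: self.received, got: offset });
        }
        let end = offset.checked_add(len).ok_or(ChunkError::ExceedsDeclaredSize)?;
        if end > self.declared_size {
            return Err(ChunkError::ExceedsDeclaredSize);
        }
        self.received = end;
        Ok(())
    }

    /// Called at finish time: the upload is complete only when every declared byte arrived.
    pub fn finish(&self) -> Result<(), ChunkError> {
        if self.received == self.declared_size {
            Ok(())
        } else {
            Err(ChunkError::Incomplete { received: self.received, declared: self.declared_size })
        }
    }
}
```

Rejected chunks leave the session state unchanged, so a client can retry with the expected offset.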
Agent-created files:
- The turn runtime creates PIONEER_ARTIFACT_OUTPUT_DIR before model/tool execution.
- The prompt contract tells the model that user-visible result files must be registered before the final response.
- The model calls artifact_prepare when it needs a safe path before writing the file.
- A shell command, MCP tool, browser tool, renderer, or other tool writes bytes to that prepared path.
- The model calls artifact_register with the completed file path.
- The gateway validates the path, rejects symlink escapes and non-regular files, sniffs/validates MIME when possible, checks size/quota limits, writes the blob, creates DB rows, and binds the artifact to workspace/thread/turn/message/tool lineage.
- The gateway sends artifact/created and thread/artifacts/changed. Projection workers may later send artifact/projection/updated.
- Staging files are removed after successful registration or by TTL/GC cleanup.
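The path validation step in the registration flow above can be sketched with the standard library alone. `validate_register_path` and `PathError` are hypothetical names, and a production implementation would also need to guard against the file changing between validation and import, which this sketch does not attempt.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug)]
pub enum PathError {
    Io(io::Error),
    OutsideAllowedRoots,
    NotRegularFile,
}

/// Hypothetical check: resolve the candidate path, require that it lives under
/// one of the allowed roots, and require that it is a regular file.
pub fn validate_register_path(
    candidate: &Path,
    allowed_roots: &[PathBuf],
) -> Result<PathBuf, PathError> {
    // canonicalize resolves symlinks and `..` segments, so a link that points
    // outside the allowed roots is judged by its real target.
    let resolved = fs::canonicalize(candidate).map_err(PathError::Io)?;
    let inside = allowed_roots
        .iter()
        .filter_map(|root| fs::canonicalize(root).ok())
        .any(|root| resolved.starts_with(&root));
    if !inside {
        return Err(PathError::OutsideAllowedRoots);
    }
    // Directories, sockets, pipes, and device files are not regular files.
    let metadata = fs::metadata(&resolved).map_err(PathError::Io)?;
    if !metadata.is_file() {
        return Err(PathError::NotRegularFile);
    }
    Ok(resolved)
}
```

Comparing canonical paths on both sides is the important design choice: a prefix check on the raw string would accept a path such as `out/../other/secret.txt`.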
Projections And Previews
Projections are derived data for display and indexing. Current projection kinds are:
| Projection | Stored as | Notes |
|---|---|---|
| plain_text | Inline text in artifact_projection.text_content | Created for supported small text-like files up to 256 KB. |
| thumbnail | A PNG blob referenced by artifact_projection.blob_id | Created for image-like artifacts when the source is up to 64 MB. |
| json_summary | Reserved protocol kind | For future structured summaries. |
| pdf_text | Reserved protocol kind | For future document extraction. |
An artifact can be ready while its thumbnail is pending or failed.
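The size thresholds for the two active projection kinds can be expressed as a small planning function. The MIME-based classification of "text-like" and "image-like" files, the binary interpretation of KB and MB, and the names below are assumptions; the service may classify files differently.

```rust
/// Assumed binary interpretation of the documented limits.
pub const PLAIN_TEXT_MAX_BYTES: u64 = 256 * 1024;
pub const THUMBNAIL_MAX_SOURCE_BYTES: u64 = 64 * 1024 * 1024;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProjectionKind {
    PlainText,
    Thumbnail,
}

/// Hypothetical planner: which projections to attempt for a stored file.
pub fn planned_projections(mime_type: &str, size_bytes: u64) -> Vec<ProjectionKind> {
    let mut kinds = Vec::new();
    // Assumption: text/* and JSON count as "text-like".
    let text_like = mime_type.starts_with("text/") || mime_type == "application/json";
    if text_like && size_bytes <= PLAIN_TEXT_MAX_BYTES {
        kinds.push(ProjectionKind::PlainText);
    }
    // Assumption: image/* counts as "image-like".
    if mime_type.starts_with("image/") && size_bytes <= THUMBNAIL_MAX_SOURCE_BYTES {
        kinds.push(ProjectionKind::Thumbnail);
    }
    kinds
}
```

An empty result is a normal outcome: a file that is too large or of an unsupported type simply has no projections.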
Remote Gateway Boundary
The desktop app never reads the gateway filesystem directly. This is required for remote gateways. For uploads, the desktop streams local bytes to the gateway and the gateway persists them as artifacts. For downloads and open/reveal actions, the desktop uses artifact/download/* to copy bytes from the gateway into a local cache or user-selected folder.
Any new desktop artifact feature must use protocol methods. Do not add direct file-store access in the desktop layer.
Provider Attachment Reuse
Model providers often require their own file upload step before an attachment can be used in a request. Pioneer keeps that provider-specific state in artifact_external_ref.
The cache key includes workspace, artifact, optional version, provider, optional model family, and transport kind. Expired refs are pruned and ignored. This keeps provider attachment reuse workspace scoped and replaces the previous standalone provider attachment cache.
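The cache key described above maps naturally onto a hashable struct. The sketch below uses an in-memory `HashMap` as a stand-in for the artifact_external_ref table; the field types, `ExternalRef`, `ExternalRefCache`, and the Unix-seconds expiry are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Key fields follow the list in the text; the concrete types are assumed.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ExternalRefKey {
    pub workspace_id: String,
    pub artifact_id: String,
    pub version_id: Option<String>,
    pub provider: String,
    pub model_family: Option<String>,
    pub transport_kind: String,
}

/// Hypothetical provider-side reference with an optional expiry time.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExternalRef {
    pub provider_file_id: String,
    pub expires_at_unix: Option<u64>,
}

/// In-memory stand-in for the artifact_external_ref table.
#[derive(Default)]
pub struct ExternalRefCache {
    refs: HashMap<ExternalRefKey, ExternalRef>,
}

impl ExternalRefCache {
    pub fn insert(&mut self, key: ExternalRefKey, value: ExternalRef) {
        self.refs.insert(key, value);
    }

    /// Returns a live ref, pruning and ignoring it when it has expired.
    pub fn lookup(&mut self, key: &ExternalRefKey, now_unix: u64) -> Option<&ExternalRef> {
        let expired = matches!(
            self.refs.get(key),
            Some(r) if r.expires_at_unix.map_or(false, |t| t <= now_unix)
        );
        if expired {
            self.refs.remove(key);
            return None;
        }
        self.refs.get(key)
    }
}
```

Because workspace_id is part of the key, a ref uploaded for one workspace is never returned for another, even when the artifact bytes are identical.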
Prompt-Time Artifact References
Current-turn attachments are still sent to the provider as real provider attachments when the selected provider supports them. Historical artifacts are handled differently. When the gateway renders retained conversation history, it can append compact artifact references to the specific user or assistant message that originally had those artifacts. When thread episodic recall injects an older snippet, the snippet can carry a scoped block such as “available artifacts for this recalled snippet.” In both cases the prompt gets metadata, not bytes:
- artifact id;
- version id when known;
- display name;
- artifact kind;
- MIME type;
- size;
- role or binding direction;
- source message/snippet identity.
To load the bytes, the model can use the artifact tool domain and call artifact_read for the exact artifact ids it wants. artifact_read can return text-like content directly or provide a provider attachment for binary/image-like content. The tool does not accept a workspace id from the model; the gateway resolves workspace and authorization from the current turn context.
This design keeps three things separate:
| Surface | What the model sees | When bytes are loaded |
|---|---|---|
| Current attachments | Actual provider attachments plus message text. | Immediately for the current turn. |
| History artifact refs | Metadata beside retained recent history messages. | Only if artifact_read is called. |
| Thread-context artifact refs | Metadata beside recalled episodic snippets. | Only if artifact_read is called. |
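To make the "metadata, not bytes" point concrete, here is a sketch that renders artifact references as compact text lines. The struct `ArtifactPromptRef`, the function `render_artifact_refs`, and the `key=value` line format are hypothetical; the gateway's actual prompt format is not documented here.

```rust
/// Hypothetical metadata carried next to a history message or recalled snippet.
pub struct ArtifactPromptRef {
    pub artifact_id: String,
    pub version_id: Option<String>,
    pub display_name: String,
    pub kind: String,
    pub mime_type: String,
    pub size_bytes: u64,
    pub role: String,
    pub source: String,
}

/// Renders one line of metadata per artifact. No file content is included;
/// the model would have to call artifact_read to obtain bytes.
pub fn render_artifact_refs(refs: &[ArtifactPromptRef]) -> String {
    let mut out = String::new();
    for r in refs {
        // The version id is optional, so fall back to a placeholder.
        let version = r.version_id.as_deref().unwrap_or("unknown");
        out.push_str(&format!(
            "- artifact_id={} version_id={} name={:?} kind={} mime={} size={} role={} source={}\n",
            r.artifact_id, version, r.display_name, r.kind, r.mime_type, r.size_bytes, r.role, r.source
        ));
    }
    out
}
```

Each rendered line costs a few dozen tokens regardless of file size, which is why references can be attached to many history messages without loading any blob.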
Security And Lifecycle
The artifact boundary is designed for local and remote gateways:
- User uploads are normalized by streaming bytes from the client to the gateway; the desktop never sends an inaccessible desktop path as model context.
- artifact_register accepts files only from allowed roots such as the workspace and turn-scoped artifact output directory, plus any explicit path the user allowed for that turn.
- Symlink escapes are rejected after canonicalization.
- Directories, device files, sockets, pipes, and other non-regular files are rejected.
- MIME type and artifact kind are derived from declared data, sniffing, and file extension rather than trusted blindly from the model.
- Per-file size limits, workspace byte quotas, and workspace file-count quotas are checked before a blob is committed.
- Blob writes, metadata writes, version creation, and binding creation are transactional at the service/repository boundary. Failed registration cleans up partial temp state and does not create a misleading thread artifact.
- Staging directories and upload session directories are temporary; GC may remove expired staging/output bytes after their TTL.
- Blob GC is separate and must preserve any blob referenced by active artifact versions or projections.
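The three limits named in the list above (per-file size, workspace bytes, workspace file count) can be checked before anything is committed. The struct and error names below are assumptions, and the sketch ignores deduplication, which could mean an identical blob adds no new bytes.

```rust
/// Hypothetical limits; real defaults would come from crates/config.
#[derive(Debug, Clone, Copy)]
pub struct QuotaLimits {
    pub max_file_bytes: u64,
    pub max_workspace_bytes: u64,
    pub max_workspace_files: u64,
}

/// Current usage of one workspace.
#[derive(Debug, Clone, Copy)]
pub struct WorkspaceUsage {
    pub bytes: u64,
    pub files: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum QuotaError {
    FileTooLarge,
    WorkspaceBytesExceeded,
    WorkspaceFileCountExceeded,
}

/// Checks all three limits for one incoming file before a blob is committed.
pub fn check_quota(
    limits: &QuotaLimits,
    usage: &WorkspaceUsage,
    incoming_bytes: u64,
) -> Result<(), QuotaError> {
    if incoming_bytes > limits.max_file_bytes {
        return Err(QuotaError::FileTooLarge);
    }
    // saturating_add avoids overflow on pathological inputs.
    if usage.bytes.saturating_add(incoming_bytes) > limits.max_workspace_bytes {
        return Err(QuotaError::WorkspaceBytesExceeded);
    }
    if usage.files.saturating_add(1) > limits.max_workspace_files {
        return Err(QuotaError::WorkspaceFileCountExceeded);
    }
    Ok(())
}
```

Running this check before the blob write means a rejected file never reaches durable storage, so there is nothing to roll back.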
Developer Rules
- Keep artifact DB access inside crates/crud and expose it through CrudStore.
- Keep bytes behind ArtifactBlobStore; do not hard-code local filesystem paths outside the blob-store implementation.
- Register agent-created user-visible files with artifact_prepare and artifact_register; do not add heuristic filesystem discovery.
- Use artifact references for historical continuity; do not auto-attach every old artifact to every provider request.
- Keep historical artifact references scoped to the retained history message or recalled thread-context snippet that produced them.
- Always validate workspace_id before listing, reading, uploading, downloading, binding, deleting, or restoring artifacts.
- Bind artifacts at the narrowest known scope: message when known, otherwise turn/item/task lineage, and always workspace.
- Do not query artifacts for draft threads that are not materialized.
- Do not assume a desktop path exists on the gateway or a gateway path exists on the desktop.
- Send thread/artifacts/changed when a thread-level artifact set or binding changes.
- Add protocol schemas when adding public fields, methods, events, statuses, kinds, or projections.
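The "narrowest known scope" rule can be sketched as a simple fallback chain. The types below are hypothetical, and the fallback order after message (turn item, then turn, then task) is an assumption; the rule above only fixes message first and workspace always.

```rust
/// Hypothetical lineage known at registration time.
#[derive(Debug, Default, Clone)]
pub struct Lineage {
    pub message_id: Option<String>,
    pub turn_item_id: Option<String>,
    pub turn_id: Option<String>,
    pub task_id: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BindingScope {
    Message(String),
    TurnItem(String),
    Turn(String),
    Task(String),
    WorkspaceOnly,
}

/// Every binding carries the workspace; the scope is the narrowest known lineage.
#[derive(Debug, PartialEq, Eq)]
pub struct Binding {
    pub workspace_id: String,
    pub scope: BindingScope,
}

pub fn bind(workspace_id: &str, lineage: &Lineage) -> Binding {
    // Assumed order after message: turn item, turn, task.
    let scope = if let Some(id) = &lineage.message_id {
        BindingScope::Message(id.clone())
    } else if let Some(id) = &lineage.turn_item_id {
        BindingScope::TurnItem(id.clone())
    } else if let Some(id) = &lineage.turn_id {
        BindingScope::Turn(id.clone())
    } else if let Some(id) = &lineage.task_id {
        BindingScope::Task(id.clone())
    } else {
        BindingScope::WorkspaceOnly
    };
    Binding { workspace_id: workspace_id.to_string(), scope }
}
```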
Related Pages
- Artifacts User Guide explains the desktop workflow.
- Artifacts API documents JSON-RPC methods, binary chunk frames, and notifications.
- Persistence Layer explains the broader database and CrudStore boundary.
- Provider Architecture explains provider normalization and attachments.