Artifacts are Pioneer’s durable file layer. They keep file bytes out of transient chat UI state and give the gateway a stable way to store, list, preview, download, and reuse files that are uploaded by users or produced by agents.

The core rule is simple: artifacts are workspace scoped and gateway owned. The desktop app may cache previews or downloaded copies, but the gateway is the source of truth for artifact bytes and metadata.

Artifacts also participate in conversation continuity. Pioneer does not keep re-sending every old file to the model. Instead, it keeps compact artifact references attached to the exact conversation messages or recalled thread-context snippets where those artifacts appeared. The model can request the artifact domain and read a specific artifact only when it needs the content.

Responsibilities

| Layer | Responsibility |
| --- | --- |
| crates/protocol | Public DTOs, method constants, notifications, and JSON Schema export. |
| crates/artifacts | ArtifactService, blob store abstraction, local blob store, ingestion, path validation, quotas, GC planning, projections, and provider artifact resolution. |
| crates/crud | All artifact database reads and writes through CrudStore; no gateway handler should bypass this layer. |
| crates/entity | SeaORM entities for artifact tables. |
| crates/migration | Artifact schema creation and rollback. |
| crates/gateway | JSON-RPC handlers, WebSocket upload/download sessions, explicit agent artifact registration tools, notifications, and provider attachment normalization. |
| crates/desktop | Thread artifacts panel, timeline chips, preview cache, local download/open/reveal actions, and composer reuse. |
| crates/config | Artifact storage, staging, quota, upload, download, and projection defaults. |

Storage Layout

LocalArtifactBlobStore stores bytes under the gateway runtime home:
<runtime_home>/artifacts/workspaces/<workspace_id>/blobs/sha256/<aa>/<bb>/<sha256>
Temporary upload payloads are connection-bound and written under:
<runtime_home>/artifacts/upload_sessions/<workspace_id>/<upload_id>/payload.bin
Agent-created result files use a separate turn-scoped staging directory before they become durable artifacts:
<runtime_home>/artifact-output/<workspace_id>/<thread_id>/<turn_id>/
The staging directory is exposed to the model and tools as PIONEER_ARTIFACT_OUTPUT_DIR. It is not durable storage and it is not a protocol identity. artifact_prepare allocates a safe path inside this directory, and artifact_register imports the completed regular file into the canonical blob store.

The storage key is derived from the SHA-256 digest. The physical store deduplicates identical blobs inside the same workspace and validates existing blobs before reusing them.

Desktop preview files are separate. The desktop writes derived preview variants under its own runtime home at previews/artifacts and prunes that cache when it exceeds 512 MB. Those files are not authoritative and can be deleted without losing artifacts.
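The blob layout shown earlier in this section can be sketched as a small path builder. This is only an illustration: the function name blob_path is hypothetical, and it assumes that <aa> and <bb> are the first and second bytes of the lowercase hex digest, which the layout suggests but does not state. The workspace id check is likewise an assumed safeguard, not a documented rule.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds
/// `<runtime_home>/artifacts/workspaces/<workspace_id>/blobs/sha256/<aa>/<bb>/<sha256>`.
/// Assumption: `<aa>` and `<bb>` are hex characters 0..2 and 2..4 of the digest.
fn blob_path(runtime_home: &Path, workspace_id: &str, sha256_hex: &str) -> Option<PathBuf> {
    // A SHA-256 digest is 32 bytes, which is 64 lowercase hex characters.
    let digest_ok = sha256_hex.len() == 64
        && sha256_hex
            .bytes()
            .all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b));

    // Assumed safeguard: the workspace id must be a single safe path component.
    let workspace_ok = !workspace_id.is_empty()
        && workspace_id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_');

    if !digest_ok || !workspace_ok {
        return None;
    }

    Some(
        runtime_home
            .join("artifacts")
            .join("workspaces")
            .join(workspace_id)
            .join("blobs")
            .join("sha256")
            .join(&sha256_hex[0..2])
            .join(&sha256_hex[2..4])
            .join(sha256_hex),
    )
}
```

Because the path is a pure function of workspace and digest, two uploads with identical bytes in the same workspace map to the same location, which is what makes deduplication possible.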

Database Model

Artifact metadata lives in gateway.db.
| Table | Purpose |
| --- | --- |
| artifact_blob | One stored blob: workspace, SHA-256, size, MIME type, storage backend, storage key, verification metadata. |
| artifact | User-facing artifact identity: display name, kind, status, current version, creator, primary thread, soft-delete state. |
| artifact_version | Immutable version metadata pointing to a blob, plus creation lineage such as turn, message, tool call, task, or task run. |
| artifact_binding | Links an artifact/version to a thread, turn, message, turn item, tool call, task, or task run with direction and role. |
| artifact_projection | Derived views such as plain text and thumbnails. A projection can store text inline or point at another artifact blob. |
| artifact_external_ref | Provider-specific upload refs so the same artifact can be reused with a model provider without re-uploading every time. |
| artifact_upload_session | Schema support for upload session bookkeeping. The active WebSocket chunk path also keeps connection-bound session state in the gateway. |
The binding table is what makes list-by-thread, list-by-turn, and list-by-message fast and explicit. Do not infer artifact membership from file paths.

Identity And Versions

An artifact_id identifies the logical file. A version_id identifies a concrete blob-backed version. Most UI operations can use the current version, but protocol clients should pass version_id when they need an exact immutable file.

Artifacts can be soft-deleted and restored. Deleting an artifact changes its status only; blob GC is a separate process and must respect grace periods and active references.
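The difference between "current version" and "exact version" lookups can be sketched as follows. The ArtifactRecord struct, the ResolveError enum, and resolve_version are hypothetical names for illustration and are not taken from the Pioneer crates.

```rust
/// Hypothetical, simplified view of an artifact and its versions.
#[derive(Debug, Clone, PartialEq)]
struct ArtifactRecord {
    artifact_id: String,
    current_version_id: String,
    version_ids: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum ResolveError {
    UnknownVersion,
}

/// Returns the version to read: the pinned one when the caller passed a
/// version_id, otherwise whatever the artifact currently points at.
fn resolve_version<'a>(
    artifact: &'a ArtifactRecord,
    requested: Option<&str>,
) -> Result<&'a str, ResolveError> {
    match requested {
        // UI-style call: follow the current version.
        None => Ok(artifact.current_version_id.as_str()),
        // Protocol-style call: the exact immutable version must exist.
        Some(wanted) => artifact
            .version_ids
            .iter()
            .find(|id| id.as_str() == wanted)
            .map(|id| id.as_str())
            .ok_or(ResolveError::UnknownVersion),
    }
}
```

A client that pins version_id keeps reading the same bytes even after the artifact gains a newer current version.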

Ingestion

All ingestion flows go through ArtifactService.

User uploads:
  1. The client calls artifact/upload/start with workspace, optional thread/planned turn, file name, MIME type, size, SHA-256, and source kind.
  2. The client sends binary chunk frames on the same WebSocket connection.
  3. The gateway validates offsets, chunk hashes, declared size, session owner, and final SHA-256.
  4. artifact/upload/finish persists the temp file through ArtifactService::ingest_temp_file.
  5. The service writes the blob, creates metadata through CrudStore, creates projections when supported, and returns an ArtifactRef.
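The offset, size, and owner checks from step 3 can be sketched as a small state machine. This is an assumption-laden illustration: UploadSession, ChunkError, and the use of a numeric connection id are hypothetical, chunks are assumed to arrive strictly in order, and the chunk-hash and final SHA-256 checks are left out because they need a hashing crate.

```rust
#[derive(Debug, PartialEq)]
enum ChunkError {
    WrongOwner,
    OffsetMismatch { expected: u64, got: u64 },
    ExceedsDeclaredSize,
    Incomplete,
}

/// Hypothetical connection-bound upload state.
struct UploadSession {
    owner_connection: u64,
    declared_size: u64,
    received: u64,
}

impl UploadSession {
    fn new(owner_connection: u64, declared_size: u64) -> Self {
        Self { owner_connection, declared_size, received: 0 }
    }

    /// Accepts one chunk if it comes from the owning connection, starts at
    /// the next expected offset, and stays within the declared size.
    fn accept_chunk(&mut self, connection: u64, offset: u64, len: u64) -> Result<(), ChunkError> {
        if connection != self.owner_connection {
            return Err(ChunkError::WrongOwner);
        }
        if offset != self.received {
            return Err(ChunkError::OffsetMismatch { expected: self.received, got: offset });
        }
        // checked_add guards against overflow on hostile offsets and lengths.
        let end = offset.checked_add(len).ok_or(ChunkError::ExceedsDeclaredSize)?;
        if end > self.declared_size {
            return Err(ChunkError::ExceedsDeclaredSize);
        }
        self.received = end;
        Ok(())
    }

    /// The finish step only succeeds once every declared byte has arrived.
    fn finish(&self) -> Result<(), ChunkError> {
        if self.received == self.declared_size {
            Ok(())
        } else {
            Err(ChunkError::Incomplete)
        }
    }
}
```

A rejected chunk leaves the session state unchanged, so the client can retry from the expected offset.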
Agent-created files:
  1. The turn runtime creates PIONEER_ARTIFACT_OUTPUT_DIR before model/tool execution.
  2. The prompt contract tells the model that user-visible result files must be registered before the final response.
  3. The model calls artifact_prepare when it needs a safe path before writing the file.
  4. A shell command, MCP tool, browser tool, renderer, or other tool writes bytes to that prepared path.
  5. The model calls artifact_register with the completed file path.
  6. The gateway validates the path, rejects symlink escapes and non-regular files, sniffs/validates MIME when possible, checks size/quota limits, writes the blob, creates DB rows, and binds the artifact to workspace/thread/turn/message/tool lineage.
  7. The gateway sends artifact/created and thread/artifacts/changed. Projection workers may later send artifact/projection/updated.
  8. Staging files are removed after successful registration or by TTL/GC cleanup.
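The path checks in step 6 can be approximated with the standard library alone. The sketch below is not the gateway's implementation: validate_register_path and RegisterPathError are hypothetical names, and a production version would also need to guard against the file being swapped between validation and import.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug)]
enum RegisterPathError {
    Io(io::Error),
    OutsideAllowedRoots,
    NotRegularFile,
}

/// Resolves `candidate`, then requires that the resolved path lies inside one
/// of the allowed roots and refers to a regular file.
fn validate_register_path(
    candidate: &Path,
    allowed_roots: &[PathBuf],
) -> Result<PathBuf, RegisterPathError> {
    // canonicalize follows symlinks and removes `..`, so a link that points
    // outside the allowed roots shows up as an outside path here.
    let resolved = fs::canonicalize(candidate).map_err(RegisterPathError::Io)?;

    let mut inside = false;
    for root in allowed_roots {
        // Roots are canonicalized too so both sides are compared in resolved form.
        if let Ok(root) = fs::canonicalize(root) {
            if resolved.starts_with(&root) {
                inside = true;
                break;
            }
        }
    }
    if !inside {
        return Err(RegisterPathError::OutsideAllowedRoots);
    }

    // is_file() is true only for regular files, which excludes directories,
    // sockets, pipes, and device files.
    let meta = fs::metadata(&resolved).map_err(RegisterPathError::Io)?;
    if !meta.is_file() {
        return Err(RegisterPathError::NotRegularFile);
    }
    Ok(resolved)
}
```

Checking containment on the canonical path rather than on the string the model supplied is what makes the symlink and `..` cases fall out naturally.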
Task result files follow the same artifact service path and bind files to task and task-run lineage. If a task runs in a hidden subagent thread, the artifact remains bound to that child thread and task run, while parent-thread artifact listing can include the child-thread subtree through thread lineage.

There is no turn-end filesystem scan. Pioneer does not infer artifacts from newly-created files in a workspace, process current directory, home directory, browser runtime directory, or tool cache. Any tool can create a file, but user-visible turn results enter the artifact graph only through explicit registration.

Projections And Previews

Projections are derived data for display and indexing. Current projection kinds are:
| Projection | Stored as | Notes |
| --- | --- | --- |
| plain_text | Inline text in artifact_projection.text_content | Created for supported small text-like files up to 256 KB. |
| thumbnail | A PNG blob referenced by artifact_projection.blob_id | Created for image-like artifacts when the source is up to 64 MB. |
| json_summary | Reserved protocol kind | For future structured summaries. |
| pdf_text | Reserved protocol kind | For future document extraction. |
Clients should treat projection state independently from artifact state. An artifact can be ready while its thumbnail is pending or failed.
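The size thresholds for the two active projection kinds can be sketched as a planning function. Several details here are assumptions rather than documented behavior: the function name planned_projections, the MIME prefixes used to decide "text-like" and "image-like", and the reading of KB and MB as binary units (1024-based).

```rust
// Assumption: the documented limits are binary units.
const PLAIN_TEXT_MAX_BYTES: u64 = 256 * 1024;
const THUMBNAIL_MAX_SOURCE_BYTES: u64 = 64 * 1024 * 1024;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ProjectionKind {
    PlainText,
    Thumbnail,
}

/// Decides which projections are worth attempting for a new artifact version.
/// The MIME matching is a placeholder for whatever the service really uses.
fn planned_projections(mime_type: &str, size_bytes: u64) -> Vec<ProjectionKind> {
    let mut planned = Vec::new();

    let text_like = mime_type.starts_with("text/") || mime_type == "application/json";
    if text_like && size_bytes <= PLAIN_TEXT_MAX_BYTES {
        planned.push(ProjectionKind::PlainText);
    }

    let image_like = mime_type.starts_with("image/");
    if image_like && size_bytes <= THUMBNAIL_MAX_SOURCE_BYTES {
        planned.push(ProjectionKind::Thumbnail);
    }

    planned
}
```

An empty result is not an error: the artifact is still ready, it simply has no derived views.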

Remote Gateway Boundary

The desktop app never reads the gateway filesystem directly. This is required for remote gateways. For uploads, the desktop streams local bytes to the gateway and the gateway persists them as artifacts. For downloads and open/reveal actions, the desktop uses artifact/download/* to copy bytes from the gateway into a local cache or user-selected folder. Any new desktop artifact feature must use protocol methods. Do not add direct file-store access in the desktop layer.

Provider Attachment Reuse

Model providers often require their own file upload step before an attachment can be used in a request. Pioneer keeps that provider-specific state in artifact_external_ref. The cache key includes workspace, artifact, optional version, provider, optional model family, and transport kind. Expired refs are pruned and ignored. This keeps provider attachment reuse workspace scoped and replaces the previous standalone provider attachment cache.
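The cache key described above maps naturally onto a hashable struct. The sketch below uses an in-memory HashMap as a stand-in for the artifact_external_ref table. The names ExternalRefKey, ExternalRef, and ExternalRefCache, the provider_file_id field, and the Unix-timestamp expiry are all hypothetical.

```rust
use std::collections::HashMap;

/// Mirrors the documented key parts: workspace, artifact, optional version,
/// provider, optional model family, and transport kind.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ExternalRefKey {
    workspace_id: String,
    artifact_id: String,
    version_id: Option<String>,
    provider: String,
    model_family: Option<String>,
    transport_kind: String,
}

/// Hypothetical stored value: the provider's handle plus an optional expiry.
#[derive(Debug, Clone, PartialEq)]
struct ExternalRef {
    provider_file_id: String,
    expires_at_unix: Option<u64>,
}

struct ExternalRefCache {
    entries: HashMap<ExternalRefKey, ExternalRef>,
}

impl ExternalRefCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    fn insert(&mut self, key: ExternalRefKey, value: ExternalRef) {
        self.entries.insert(key, value);
    }

    /// Returns a usable ref, pruning it first if it has expired.
    fn lookup(&mut self, key: &ExternalRefKey, now_unix: u64) -> Option<&ExternalRef> {
        let expired = match self.entries.get(key) {
            Some(r) => r.expires_at_unix.map_or(false, |t| t <= now_unix),
            None => false,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key)
    }
}
```

Because workspace_id is part of the key, a ref uploaded on behalf of one workspace can never satisfy a lookup from another.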

Prompt-Time Artifact References

Current-turn attachments are still sent to the provider as real provider attachments when the selected provider supports them. Historical artifacts are handled differently. When the gateway renders retained conversation history, it can append compact artifact references to the specific user or assistant message that originally had those artifacts. When thread episodic recall injects an older snippet, the snippet can carry a scoped block such as “available artifacts for this recalled snippet.” In both cases the prompt gets metadata, not bytes:
  • artifact id;
  • version id when known;
  • display name;
  • artifact kind;
  • MIME type;
  • size;
  • role or binding direction;
  • source message/snippet identity.
This lets the model understand that a past message had a photo, PDF, CSV, screenshot, or generated file without paying the cost of loading every old artifact into every turn. If the model needs content, it must reveal the hidden artifact tool domain and call artifact_read for the exact artifact ids it wants. artifact_read can return text-like content directly or provide a provider attachment for binary/image-like content. The tool does not accept a workspace id from the model; the gateway resolves workspace and authorization from the current turn context.

This design keeps three things separate:

| Surface | What the model sees | When bytes are loaded |
| --- | --- | --- |
| Current attachments | Actual provider attachments plus message text. | Immediately for the current turn. |
| History artifact refs | Metadata beside retained recent history messages. | Only if artifact_read is called. |
| Thread-context artifact refs | Metadata beside recalled episodic snippets. | Only if artifact_read is called. |
Do not add a global “all thread artifacts” prompt section. Long threads can contain hundreds of artifacts, and most of them are irrelevant. Artifact references should stay close to the message or snippet that gives them meaning.
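A compact reference of the kind described in this section might be rendered as a single metadata line placed next to its message or snippet. The struct ArtifactPromptRef, the function render_ref, and the bracketed text format are illustrative assumptions. The source specifies which fields appear, not how they are serialized.

```rust
/// The metadata fields listed for prompt-time references. No file bytes.
struct ArtifactPromptRef {
    artifact_id: String,
    version_id: Option<String>,
    display_name: String,
    kind: String,
    mime_type: String,
    size_bytes: u64,
    role: String,
    source: String,
}

/// Renders one reference in a hypothetical single-line format.
fn render_ref(r: &ArtifactPromptRef) -> String {
    // The version id is only included "when known".
    let version = r.version_id.as_deref().unwrap_or("unknown");
    format!(
        "[artifact id={} version={} name={:?} kind={} mime={} size={} role={} source={}]",
        r.artifact_id,
        version,
        r.display_name, // {:?} quotes and escapes the user-controlled name
        r.kind,
        r.mime_type,
        r.size_bytes,
        r.role,
        r.source
    )
}
```

The rendered line is small and constant in size regardless of how large the underlying file is, which is the point of sending references instead of content.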

Security And Lifecycle

The artifact boundary is designed for local and remote gateways:
  • User uploads are normalized by streaming bytes from the client to the gateway; the desktop never sends an inaccessible desktop path as model context.
  • artifact_register accepts files only from allowed roots such as the workspace and turn-scoped artifact output directory, plus any explicit path the user allowed for that turn.
  • Symlink escapes are rejected after canonicalization.
  • Directories, device files, sockets, pipes, and other non-regular files are rejected.
  • MIME type and artifact kind are derived from declared data, sniffing, and file extension rather than trusted blindly from the model.
  • Per-file size limits, workspace byte quotas, and workspace file-count quotas are checked before a blob is committed.
  • Blob writes, metadata writes, version creation, and binding creation are transactional at the service/repository boundary. Failed registration cleans up partial temp state and does not create a misleading thread artifact.
  • Staging directories and upload session directories are temporary; GC may remove expired staging/output bytes after their TTL.
  • Blob GC is separate and must preserve any blob referenced by active artifact versions or projections.
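The three quota checks from the list above can be sketched as one pre-commit function. QuotaLimits, WorkspaceUsage, QuotaError, and check_quota are hypothetical names, and the sketch ignores deduplication, so it treats every incoming file as new bytes. How the real service accounts for deduplicated blobs is not stated here.

```rust
/// Hypothetical limits. The real defaults would come from configuration.
struct QuotaLimits {
    max_file_bytes: u64,
    max_workspace_bytes: u64,
    max_workspace_files: u64,
}

/// Current totals for one workspace.
struct WorkspaceUsage {
    bytes: u64,
    files: u64,
}

#[derive(Debug, PartialEq)]
enum QuotaError {
    FileTooLarge,
    WorkspaceBytesExceeded,
    WorkspaceFileCountExceeded,
}

/// Runs before any blob is committed, so a rejected file leaves no durable state.
fn check_quota(
    limits: &QuotaLimits,
    usage: &WorkspaceUsage,
    incoming_bytes: u64,
) -> Result<(), QuotaError> {
    if incoming_bytes > limits.max_file_bytes {
        return Err(QuotaError::FileTooLarge);
    }
    // checked_add treats arithmetic overflow as exceeding the byte quota.
    let new_total = usage
        .bytes
        .checked_add(incoming_bytes)
        .ok_or(QuotaError::WorkspaceBytesExceeded)?;
    if new_total > limits.max_workspace_bytes {
        return Err(QuotaError::WorkspaceBytesExceeded);
    }
    if usage.files >= limits.max_workspace_files {
        return Err(QuotaError::WorkspaceFileCountExceeded);
    }
    Ok(())
}
```

Running the check before the blob write keeps the failure path simple: there is nothing to roll back in the blob store when a quota is exceeded.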

Developer Rules

  • Keep artifact DB access inside crates/crud and expose it through CrudStore.
  • Keep bytes behind ArtifactBlobStore; do not hard-code local filesystem paths outside the blob-store implementation.
  • Register agent-created user-visible files with artifact_prepare and artifact_register; do not add heuristic filesystem discovery.
  • Use artifact references for historical continuity; do not auto-attach every old artifact to every provider request.
  • Keep historical artifact references scoped to the retained history message or recalled thread-context snippet that produced them.
  • Always validate workspace_id before listing, reading, uploading, downloading, binding, deleting, or restoring artifacts.
  • Bind artifacts at the narrowest known scope: message when known, otherwise turn/item/task lineage, and always workspace.
  • Do not query artifacts for draft threads that are not materialized.
  • Do not assume a desktop path exists on the gateway or a gateway path exists on the desktop.
  • Send thread/artifacts/changed when a thread-level artifact set or binding changes.
  • Add protocol schemas when adding public fields, methods, events, statuses, kinds, or projections.