SeaChat Developer Docs

Multimodal

Embeddings, rerank, and image/audio/video generation through the single /v1/invoke entrypoint.

One entrypoint

Embeddings, rerank, and media generation have no dedicated OpenAI-style routes; they all reach the single entrypoint POST /v1/invoke/{modality}/{model} with scope model:invoke. The SDK provides typed wrappers over it, and client.invoke(path, body?) is the raw escape hatch.
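
As a rough illustration of the route shape, the sketch below builds the path for a given modality and model id. The invokePath helper and the "acme/embed-v1" id are hypothetical and not part of the SDK; the "video" segment and the need to URL-encode the model id are assumptions. Whether client.invoke expects the full route or only a suffix is not stated here, so check the SDK before passing this string to it.

```typescript
// Modalities whose routes appear on this page, plus "video" (assumed segment name).
type Modality = "embedding" | "rerank" | "image" | "audio" | "video";

// Hypothetical helper, not part of the SDK: builds the documented route
// POST /v1/invoke/{modality}/{model}.
function invokePath(modality: Modality, model: string): string {
  // Assumption: a model id may contain characters (such as "/") that must be
  // escaped before it can be used as a single path segment.
  return `/v1/invoke/${modality}/${encodeURIComponent(model)}`;
}

// "acme/embed-v1" is a placeholder id, not a real model.
console.log(invokePath("embedding", "acme/embed-v1"));
// prints "/v1/invoke/embedding/acme%2Fembed-v1"
```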

Each wrapper takes the public model id as a field and injects it into the URL path. Model ids are **deployment-specific** — there are no fixed ids to copy. Discover one at runtime with client.models.get(modality) (filter to availability.state === "live_ok"), or read it from a SEACHAT_*_MODEL env var; do not hardcode them.

// Discover a usable model id for a modality, then pass it as `model`:
const list = await client.models.get("embedding"); // GET /v1/models/embedding
const model = list.find((m) => m.availability?.state === "live_ok")?.model;
if (!model) throw new Error("no live embedding model available"); // fail early instead of sending `undefined`
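
The two discovery options described above (an env var override, or the first live model) can be combined in one small function. This is a minimal sketch, assuming the listing entries look like the ones in the snippet above; resolveModel, the ModelEntry shape, the placeholder ids, and the "degraded" state are hypothetical.

```typescript
// Minimal shape of a model listing entry, inferred from the snippet above.
interface ModelEntry {
  model: string;
  availability?: { state?: string };
}

// Hypothetical helper: prefer an explicit override (for example from an env var),
// otherwise pick the first entry whose availability state is "live_ok".
function resolveModel(list: ModelEntry[], override?: string): string {
  if (override) return override;
  const live = list.find((m) => m.availability?.state === "live_ok");
  if (!live) throw new Error('no model with availability.state === "live_ok"');
  return live.model;
}

// Placeholder data; "degraded" is an invented state used only for contrast.
const sample: ModelEntry[] = [
  { model: "placeholder-a", availability: { state: "degraded" } },
  { model: "placeholder-b", availability: { state: "live_ok" } },
];
console.log(resolveModel(sample)); // prints "placeholder-b"
```

In application code the override would typically come from the environment, for instance a variable following the SEACHAT_*_MODEL pattern; the exact variable name depends on your deployment.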

Embeddings

client.embeddings.create sends POST /v1/invoke/embedding/{model}. input accepts a string, an array of strings, or token arrays; the optional dimensions and encoding_format ("float" | "base64") fields pass through.

const res = await client.embeddings.create({
  model,                       // deployment-specific, discovered above
  input: ["the quick brown fox", "lorem ipsum dolor sit amet"],
  dimensions: 1024,
});
for (const item of res.data ?? []) console.log(item.index, item.embedding?.length);
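
A common next step is to compare the returned vectors. The sketch below assumes the response items carry index and embedding as in the loop above, that embedding is a plain number array (encoding_format "float"), and that index refers to the position in input. The EmbeddingItem shape and both helpers are illustrative, not SDK exports.

```typescript
// Item shape inferred from the example above; not the SDK's exported type.
interface EmbeddingItem {
  index?: number;
  embedding?: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Sort by `index` so vectors line up with the `input` array
// (assumption: `index` is the position of the corresponding input).
function vectorsInInputOrder(data: EmbeddingItem[]): number[][] {
  return [...data]
    .sort((x, y) => (x.index ?? 0) - (y.index ?? 0))
    .map((item) => item.embedding ?? []);
}
```

With the response from the example above, `const [a, b] = vectorsInInputOrder(res.data ?? [])` followed by `cosineSimilarity(a, b)` would score the two input strings against each other.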

Rerank

client.rerank(params) (sugar) and client.reranker.rerank(params) both send POST /v1/invoke/rerank/{model}. Pass query, documents, and an optional top_n.

const ranked = await client.rerank({
  model,
  query: "best practices for caching",
  documents: ["HTTP cache headers", "How to bake bread", "Cache invalidation"],
  top_n: 2,
});
for (const r of ranked.results ?? []) console.log(r.index, r.relevance_score);
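
The results carry only index and relevance_score, so the caller has to join them back to the original text. A minimal sketch, assuming index is the position of the document in the documents array that was sent; attachDocuments is a hypothetical helper and the scores below are invented for illustration.

```typescript
// Result shape taken from the loop above; not the SDK's exported type.
interface RerankResult {
  index: number;
  relevance_score: number;
}

interface RankedDocument {
  document: string;
  score: number;
}

// Join each result back to its source document, skipping out-of-range indexes.
function attachDocuments(documents: string[], results: RerankResult[]): RankedDocument[] {
  return results
    .filter((r) => r.index >= 0 && r.index < documents.length)
    .map((r) => ({ document: documents[r.index] as string, score: r.relevance_score }));
}

const docs = ["HTTP cache headers", "How to bake bread", "Cache invalidation"];
// Invented scores, shaped like a top_n: 2 response.
const fakeResults: RerankResult[] = [
  { index: 2, relevance_score: 0.91 },
  { index: 0, relevance_score: 0.74 },
];
for (const hit of attachDocuments(docs, fakeResults)) console.log(hit.score, hit.document);
```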

Images

client.images.generate sends POST /v1/invoke/image/{model} and returns a GeneratedOutput | SubmitResponse. Use flat provider params (prompt, negative_prompt, size, n, batch_size, and image for image-to-image). Each output has a managed url and an artifactRefId that you can use with /v1/files/{id}.

const out = await client.images.generate({
  model,
  prompt: "a lighthouse at dawn, watercolor",
  size: "1024x1024",
  n: 1,
});
for (const item of out.outputs ?? []) console.log(item.type, item.url, item.artifactRefId);
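
Because the return type is a union, code that reads outputs may want to narrow it first. The sketch below is one way to do that and to turn each artifactRefId into a /v1/files/{id} path. The interfaces are stand-ins: the real fields of SubmitResponse are not documented on this page, so taskId is purely hypothetical, as are the isGenerated and filePaths helpers.

```typescript
// Stand-in shapes; only `outputs`, `type`, `url`, and `artifactRefId` come from this page.
interface OutputItem {
  type?: string;
  url?: string;
  artifactRefId?: string;
}
interface GeneratedOutput {
  outputs?: OutputItem[];
}
// Hypothetical: the actual SubmitResponse fields are not documented here.
interface SubmitResponse {
  taskId: string;
}

// Treat a response as generated output only when it carries an `outputs` array.
function isGenerated(res: GeneratedOutput | SubmitResponse): res is GeneratedOutput {
  return Array.isArray((res as GeneratedOutput).outputs);
}

// Build /v1/files/{id} paths for every output that has an artifactRefId.
function filePaths(res: GeneratedOutput | SubmitResponse): string[] {
  if (!isGenerated(res)) return []; // submitted task: nothing to fetch yet
  return (res.outputs ?? [])
    .map((item) => item.artifactRefId)
    .filter((id): id is string => typeof id === "string")
    .map((id) => `/v1/files/${encodeURIComponent(id)}`);
}
```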

Audio

client.audio.speech sends POST /v1/invoke/audio/{model}. The convenience input field (the text to synthesize) maps onto the provider prompt but does not overwrite an explicit prompt. Song models can also take lyrics.

const out = await client.audio.speech({
  model,
  input: "Welcome aboard. Please fasten your seatbelt.",
});
for (const item of out.outputs ?? []) console.log(item.mimeType, item.url);
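
The input-to-prompt mapping described above can be pictured as follows. This is an illustration of the stated behaviour, not the SDK's actual code: toProviderBody and the SpeechParams shape are hypothetical, and dropping input from the outgoing body is an assumption.

```typescript
// Hypothetical parameter shape based on the fields named on this page.
interface SpeechParams {
  model: string; // goes into the URL path, not the body
  input?: string;
  prompt?: string;
  lyrics?: string;
}

interface ProviderBody {
  prompt?: string;
  lyrics?: string;
}

// Map the convenience `input` onto `prompt` without overwriting an explicit prompt.
function toProviderBody(params: SpeechParams): ProviderBody {
  const body: ProviderBody = {};
  const resolved = params.prompt ?? params.input; // explicit prompt wins
  if (resolved !== undefined) body.prompt = resolved;
  if (params.lyrics !== undefined) body.lyrics = params.lyrics;
  return body; // assumption: `input` itself is not forwarded
}

console.log(toProviderBody({ model: "placeholder", input: "Welcome aboard." }));
console.log(toProviderBody({ model: "placeholder", input: "ignored", prompt: "explicit" }));
```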

Video (async)

Video generation is asynchronous. client.video.generateAndWait(params, options?) submits with mode:"submit", polls GET /v1/tasks/{id} until the task reaches terminal success, then returns the payload of GET /v1/tasks/{id}/result. To drive the lifecycle yourself, call client.video.generate with mode:"submit" and use client.tasks.get, client.tasks.result, and client.tasks.cancel.

const result = await client.video.generateAndWait(
  { model, prompt: "a paper boat drifting down a rain-soaked street", duration: 5 },
  { timeoutMs: 300_000, pollIntervalMs: 5_000 },
);
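
For the manual lifecycle, a polling loop along these lines is one possible shape. It is a sketch under stated assumptions: the Task shape and its status values are hypothetical (the real ones are not listed on this page), TaskApi is a stand-in for client.tasks, and cancelling on timeout is a design choice of this sketch rather than documented SDK behaviour.

```typescript
// Hypothetical task shape; real status values may differ.
interface Task {
  id: string;
  status: "pending" | "running" | "succeeded" | "failed" | "canceled";
}

// Stand-in for client.tasks.get / result / cancel so the sketch is self-contained.
interface TaskApi<R> {
  get(id: string): Promise<Task>;
  result(id: string): Promise<R>;
  cancel(id: string): Promise<void>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the task succeeds, fails, or the deadline passes.
async function waitForResult<R>(
  api: TaskApi<R>,
  id: string,
  opts: { timeoutMs: number; pollIntervalMs: number } = { timeoutMs: 300_000, pollIntervalMs: 5_000 },
): Promise<R> {
  const deadline = Date.now() + opts.timeoutMs;
  while (true) {
    const task = await api.get(id);
    if (task.status === "succeeded") return api.result(id);
    if (task.status === "failed" || task.status === "canceled") {
      throw new Error(`task ${id} ended as ${task.status}`);
    }
    if (Date.now() + opts.pollIntervalMs > deadline) {
      await api.cancel(id); // design choice of this sketch: do not leave the task running
      throw new Error(`task ${id} timed out after ${opts.timeoutMs} ms`);
    }
    await sleep(opts.pollIntervalMs);
  }
}
```

Injecting the task API keeps the loop testable with a fake; in real code the three methods would delegate to client.tasks.get, client.tasks.result, and client.tasks.cancel, using the task id returned by the submit call.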