# Multimodal

Embeddings, rerank, and image/audio/video generation through the single `/v1/invoke` entrypoint.

## One entrypoint

Embeddings, rerank, and media generation have no dedicated OpenAI-style routes; they all reach the single entrypoint `POST /v1/invoke/{modality}/{model}` with scope `model:invoke`. The SDK provides typed wrappers over it, and `client.invoke(path, body?)` is the raw escape hatch.
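To make the route template concrete, here is a minimal sketch of how a path for that entrypoint could be assembled. `buildInvokePath` is a hypothetical helper, not an SDK export. The `video` segment is assumed by analogy with the other routes, and percent-encoding the model id is likewise an assumption, not documented behavior.

```ts
// Sketch only: `buildInvokePath` is a hypothetical helper, not part of the SDK.
// The "video" segment is assumed by analogy with the documented routes.
type Modality = "embedding" | "rerank" | "image" | "audio" | "video";

function buildInvokePath(modality: Modality, model: string): string {
  // Assumption: encode the id so characters such as "/" cannot alter the route.
  return `/v1/invoke/${modality}/${encodeURIComponent(model)}`;
}

console.log(buildInvokePath("embedding", "acme/embed-v1"));
// prints "/v1/invoke/embedding/acme%2Fembed-v1"
```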

Each wrapper takes the public `model` id as a field and injects it into the URL path. Model ids are **deployment-specific**; there are no fixed ids to copy. Discover one at runtime with `client.models.get(modality)` (filter to `availability.state === "live_ok"`), or read it from a `SEACHAT_*_MODEL` env var. Do not hardcode model ids.

```ts
// Discover a usable model id for a modality, then pass it as `model`:
const list = await client.models.get("embedding"); // GET /v1/models/embedding
const model = list.find((m) => m.availability?.state === "live_ok")?.model;
if (!model) throw new Error("no live_ok embedding model in this deployment");
```
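The two discovery routes above (env var or runtime lookup) can be combined in one small helper. This is a sketch: `pickModel` and the `ModelEntry` shape are hypothetical and include only the fields the snippet above already reads. Letting the configured value win over discovery is a design assumption, not SDK behavior.

```ts
// Hypothetical shape: only the fields used in the discovery snippet are assumed.
interface ModelEntry {
  model: string;
  availability?: { state?: string };
}

// Sketch only: `pickModel` is not an SDK export.
function pickModel(list: ModelEntry[], envOverride?: string): string {
  // Assumption: an explicitly configured id takes precedence over discovery.
  if (envOverride) return envOverride;
  const live = list.find((m) => m.availability?.state === "live_ok");
  if (!live) throw new Error("no live_ok model available for this modality");
  return live.model;
}
```

Pass the value of whichever `SEACHAT_*_MODEL` variable applies as `envOverride`. Failing fast when nothing is live avoids sending `undefined` as the `model` field later.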

## Embeddings

`client.embeddings.create` → `POST /v1/invoke/embedding/{model}`. `input` accepts a string, an array of strings, or token arrays; optional `dimensions` and `encoding_format` (`"float" | "base64"`) pass through.

```ts
const res = await client.embeddings.create({
  model,                       // deployment-specific, discovered above
  input: ["the quick brown fox", "lorem ipsum dolor sit amet"],
  dimensions: 1024,
});
for (const item of res.data ?? []) console.log(item.index, item.embedding?.length);
```
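A common next step is comparing two returned vectors. The sketch below computes cosine similarity and assumes the embeddings arrive as plain number arrays (the `"float"` encoding). `cosineSimilarity` is an illustrative helper, not part of the SDK.

```ts
// Sketch only: assumes float-encoded embeddings, i.e. plain number arrays.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero vector as having no similarity instead of dividing by zero.
  return denom === 0 ? 0 : dot / denom;
}
```

With the response above, `cosineSimilarity(res.data[0].embedding, res.data[1].embedding)` would score the two inputs against each other, provided both embeddings are present.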

## Rerank

`client.rerank(params)` (sugar) and `client.reranker.rerank(params)` both send `POST /v1/invoke/rerank/{model}`. Pass `query`, `documents`, and optional `top_n`.

```ts
const ranked = await client.rerank({
  model,
  query: "best practices for caching",
  documents: ["HTTP cache headers", "How to bake bread", "Cache invalidation"],
  top_n: 2,
});
for (const r of ranked.results ?? []) console.log(r.index, r.relevance_score);
```
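The results carry an `index` and a score, not the document text, so callers usually join them back to their input. The sketch below assumes `index` is the position of the document in the submitted `documents` array, which the source does not state explicitly. `attachDocuments` is a hypothetical helper.

```ts
interface RerankResult {
  index: number;
  relevance_score: number;
}

// Sketch only. Assumption: `index` points into the original `documents` array.
function attachDocuments(
  documents: string[],
  results: RerankResult[],
): { document: string; score: number }[] {
  return results
    .filter((r) => r.index >= 0 && r.index < documents.length) // skip out-of-range entries
    .map((r) => ({ document: documents[r.index], score: r.relevance_score }))
    .sort((x, y) => y.score - x.score); // highest relevance first
}
```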

## Images

`client.images.generate` → `POST /v1/invoke/image/{model}`, returning a `GeneratedOutput | SubmitResponse`. Use flat provider params (`prompt`, `negative_prompt`, `size`, `n`, `batch_size`, `image` for image-to-image). Each output has a managed `url` and an `artifactRefId` you can use with `/v1/files/{id}`.

```ts
const out = await client.images.generate({
  model,
  prompt: "a lighthouse at dawn, watercolor",
  size: "1024x1024",
  n: 1,
});
for (const item of out.outputs ?? []) console.log(item.type, item.url, item.artifactRefId);
```
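To follow up on generated artifacts, the `artifactRefId` values can be turned into `/v1/files/{id}` paths. This is a sketch under two assumptions: that the id substitutes directly for `{id}`, and that outputs without an `artifactRefId` should be skipped. `artifactPaths` and the `OutputItem` shape are illustrative only.

```ts
// Hypothetical shape: only the fields printed in the example above are assumed.
interface OutputItem {
  type?: string;
  url?: string;
  artifactRefId?: string;
}

// Sketch only. Assumption: `artifactRefId` is used verbatim as the `{id}` segment.
function artifactPaths(outputs: OutputItem[]): string[] {
  return outputs
    .filter(
      (o): o is OutputItem & { artifactRefId: string } =>
        typeof o.artifactRefId === "string",
    )
    .map((o) => `/v1/files/${encodeURIComponent(o.artifactRefId)}`);
}
```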

## Audio

`client.audio.speech` → `POST /v1/invoke/audio/{model}`. The convenience `input` field (the text to synthesize) is mapped onto the provider `prompt` field; if you also pass an explicit `prompt`, it is kept and not overwritten. Song models can also take `lyrics`.

```ts
const out = await client.audio.speech({
  model,
  input: "Welcome aboard. Please fasten your seatbelt.",
});
for (const item of out.outputs ?? []) console.log(item.mimeType, item.url);
```
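The `input` to `prompt` mapping described above can be pictured roughly as follows. This is a sketch of the stated rule, not the SDK's actual code. `toProviderParams` is hypothetical, and dropping `input` from the outgoing body is an assumption.

```ts
interface SpeechFields {
  input?: string;
  prompt?: string;
  lyrics?: string;
}

// Sketch only: illustrates "input fills prompt unless prompt is explicit".
function toProviderParams(params: SpeechFields): Omit<SpeechFields, "input"> {
  const { input, ...rest } = params;
  // Fall back to `input` only when no explicit `prompt` was given.
  // Assumption: `input` itself is not forwarded to the provider.
  return { ...rest, prompt: rest.prompt ?? input };
}
```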

## Video (async)

Video is asynchronous. `client.video.generateAndWait(params, options?)` submits with `mode:"submit"`, polls `GET /v1/tasks/{id}` until terminal success, then returns `GET /v1/tasks/{id}/result`. Use `client.video.generate` with `mode:"submit"` plus `client.tasks.get` / `client.tasks.result` / `client.tasks.cancel` to drive the lifecycle yourself.

```ts
const result = await client.video.generateAndWait(
  { model, prompt: "a paper boat drifting down a rain-soaked street", duration: 5 },
  { timeoutMs: 300_000, pollIntervalMs: 5_000 },
);
```
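If you drive the lifecycle yourself, the polling step might look like the sketch below. It is not the SDK's implementation. The `Task` shape and the status strings `"succeeded"`, `"failed"`, and `"cancelled"` are hypothetical placeholders, so check the real task payload before relying on them. `getTask` is any function that fetches the current task, and `sleep` and `now` are injectable only to make the loop testable.

```ts
// Hypothetical task shape and status names, for illustration only.
interface Task {
  id: string;
  status: string;
}

interface PollOptions {
  timeoutMs: number;
  pollIntervalMs: number;
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForTask(
  getTask: () => Promise<Task>,
  opts: PollOptions,
): Promise<Task> {
  const sleep = opts.sleep ?? defaultSleep;
  const now = opts.now ?? Date.now;
  const deadline = now() + opts.timeoutMs;
  for (;;) {
    const task = await getTask();
    if (task.status === "succeeded") return task;
    if (task.status === "failed" || task.status === "cancelled") {
      throw new Error(`task ${task.id} ended as ${task.status}`);
    }
    // Give up if the next poll would land after the deadline.
    if (now() + opts.pollIntervalMs > deadline) {
      throw new Error(`task ${task.id} did not finish within ${opts.timeoutMs} ms`);
    }
    await sleep(opts.pollIntervalMs);
  }
}
```

In practice `getTask` would be a closure around `client.tasks.get`, and a successful return would be followed by `client.tasks.result` to fetch the output.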