API Documentation

Complete endpoint reference for the local AI server.

Local AI Server API

This document describes the HTTP API exposed by the app server (WebMServer).

Base URL

  • Default port: 11434
  • Base URL format: http://<iphone-ip-address>:<port>
  • The server binds to 0.0.0.0 and is reachable from other devices on the local network.
  • Bonjour name is exposed in the UI and can also be used on compatible networks.

Authentication

  • No authentication is enforced by the server.
  • If a client requires an API key (for OpenAI-compatible flows), any non-empty value can be used.

Content Type

  • Send Content-Type: application/json for most POST requests.
  • POST /v1/audio/transcriptions requires multipart/form-data.
  • POST /v1/audio/transcriptions accepts request bodies up to 30MB at the route level; uploaded audio files themselves are validated against a 25MB limit.

Endpoint Summary

Method  Path                               Purpose
HEAD    /                                  Lightweight health check
GET     /status                            Server status
GET     /api/services/screen/brightness    Read device brightness
POST    /api/services/screen/brightness    Set device brightness
GET     /v1/models                         List installed models plus Apple Foundation Model when available (OpenAI-like)
POST    /v1/chat/completions               Chat completion (OpenAI-compatible)
POST    /v1/chat                           Chat completion alias (legacy compatibility)
POST    /v1/completions                    Text completion (OpenAI-like)
POST    /v1/audio/transcriptions           Speech-to-text transcription (OpenAI-compatible)
POST    /v1/audio/speech                   Text-to-speech synthesis (OpenAI-compatible)
GET     /api/tags                          List installed models plus Apple Foundation Model when available (Ollama-like)
POST    /api/chat                          Chat completion (Ollama-like)
POST    /api/generate                      Prompt completion (Ollama-like)
POST    /api/show                          Show model details (Ollama-like)

Health and Status

HEAD /

  • Returns HTTP 200 OK with no response body.

GET /status

Returns:

{
  "status": "Running",
  "message": "Server is currently running."
}

The status field is one of Stopped, Running, or Failed.

Device Control

GET /api/services/screen/brightness

Returns current brightness:

{
  "brightness": 0.62
}

POST /api/services/screen/brightness

Request body:

{
  "brightness": 0.8
}

Behavior:

  • brightness must be a finite number.
  • Value is clamped to [0.0, 1.0].
  • Non-finite values return 400 Bad Request.

Response:

{
  "requestedBrightness": 0.8,
  "appliedBrightness": 0.8
}
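
The validation rules above can be mirrored client-side to predict what the server will apply. A minimal Python sketch of the same rule (the function name is illustrative, not part of the API):

```python
import math

def clamp_brightness(value: float) -> float:
    """Mirror the server's rule: reject non-finite input, clamp to [0.0, 1.0]."""
    if not math.isfinite(value):
        # The server responds 400 Bad Request in this case.
        raise ValueError("brightness must be a finite number")
    return min(max(value, 0.0), 1.0)
```

For example, sending 1.5 would yield an appliedBrightness of 1.0, and sending -0.2 would yield 0.0.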

OpenAI-like API (/v1)

Model Naming

For /v1 endpoints, send model names without the mlx-community/ prefix (for example Llama-3.2-1B-Instruct-4bit).

GET /v1/models

Returns installed models discovered by the app. When Apple Foundation Models are available on the device, apple-foundation-model is included. Response keys are snake_case.

Example:

{
  "object": "list",
  "data": [
    {
      "id": "llama-3.2-1b-instruct-4bit",
      "object": "model",
      "created": 0,
      "owned_by": "Unknown",
      "capabilities": {
        "completion_chat": true,
        "completion_fim": false,
        "function_calling": false,
        "vision": false,
        "fine_tuning": false
      },
      "name": "llama-3.2-1b-instruct-4bit",
      "description": "llama-3.2-1b-instruct-4bit",
      "aliases": [],
      "deprecation": null,
      "type": "base"
    }
  ]
}
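
Clients usually only need the id values from this listing. A small Python sketch that pulls them out of the parsed JSON body above (the helper name is illustrative):

```python
def installed_model_ids(models_response: dict) -> list[str]:
    """Extract model identifiers from a parsed GET /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]
```

Applied to the example response, this returns ["llama-3.2-1b-instruct-4bit"].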

POST /v1/chat/completions

Request body:

{
  "model": "Llama-3.2-1B-Instruct-4bit",
  "messages": [
    { "role": "user", "content": "Hello" }
  ]
}

Response shape:

{
  "id": "uuid",
  "object": "chat_completion",
  "created": 1739577600,
  "model": "Llama-3.2-1B-Instruct-4bit",
  "choices": [
    { "message": { "role": "assistant", "content": "..." }, "index": 0, "finish_reason": "stop" }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 24,
    "total_tokens": 36
  }
}

Notes:

  • /v1/chat is also available as a legacy alias and returns the same response shape.
  • temperature is accepted and applied.
  • top_p, max_tokens, n, and stream are accepted for compatibility but currently ignored.

POST /v1/completions

Request body:

{
  "model": "Llama-3.2-1B-Instruct-4bit",
  "prompt": "Explain transformers simply.",
  "max_tokens": 128,
  "temperature": 0.7,
  "top_p": 1.0,
  "n": 1
}

Response:

{
  "id": "uuid",
  "object": "text_completion",
  "created": 1739577600,
  "model": "Llama-3.2-1B-Instruct-4bit",
  "choices": [
    {
      "text": "...",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 20,
    "total_tokens": 50
  }
}

Notes:

  • n defaults to 1 when omitted.
  • temperature is accepted and applied.
  • max_tokens and top_p are accepted for compatibility but currently not applied.
  • n > 1 returns repeated choices containing the same generated text (not independent samples).
  • completion_tokens is an estimate.

POST /v1/audio/transcriptions

OpenAI-compatible speech-to-text endpoint.

Request (multipart/form-data) fields:

  • file (required): audio file.
  • model (required): model identifier (for compatibility).
  • language (optional): locale/BCP-47 language hint (for example en-US).
  • prompt (optional): context hint.
  • temperature (optional): must be in [0, 1].
  • response_format (optional): json (default), text, verbose_json, srt, or vtt.
  • timestamp_granularities or timestamp_granularities[] (optional, only with verbose_json): segment, word.

Supported upload extensions: flac, m4a, mp3, mp4, mpeg, mpga, ogg, wav, webm. Maximum upload size: 25MB.

Notes:

  • model is required for OpenAI compatibility, but transcription engine selection is currently independent of this value.
  • When response_format=verbose_json, omitting both timestamp_granularities and timestamp_granularities[] defaults to segment.
  • Multipart clients can send timestamp granularity as either repeated fields or a single scalar field.
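
Most HTTP libraries build multipart/form-data automatically, but the encoding is simple enough to assemble by hand. A stdlib-only Python sketch covering the two required fields (the boundary string is arbitrary, and "whisper-1" below is just a compatibility placeholder since engine selection ignores model):

```python
import io

def build_transcription_form(audio: bytes, filename: str, model: str,
                             boundary: str = "local-ai-boundary") -> tuple[bytes, str]:
    """Assemble a multipart/form-data body with the required file and model fields.

    Returns the raw body and the Content-Type header value to send with it.
    """
    buf = io.BytesIO()
    # File part: carries the raw audio bytes.
    buf.write(f"--{boundary}\r\n"
              f"Content-Disposition: form-data; name=\"file\"; filename=\"{filename}\"\r\n"
              f"Content-Type: application/octet-stream\r\n\r\n".encode())
    buf.write(audio)
    buf.write(b"\r\n")
    # Simple text part for the required model field.
    buf.write(f"--{boundary}\r\n"
              f"Content-Disposition: form-data; name=\"model\"\r\n\r\n"
              f"{model}\r\n".encode())
    buf.write(f"--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"
```

Optional fields (language, prompt, temperature, response_format, timestamp granularity) would be added as further text parts in the same way.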

json response example:

{
  "text": "Hello from local speech recognition."
}

verbose_json response example:

{
  "task": "transcribe",
  "language": "en-US",
  "duration": 1.42,
  "text": "Hello from local speech recognition.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0,
      "end": 1.42,
      "text": "Hello from local speech recognition.",
      "tokens": [],
      "temperature": 0,
      "avg_logprob": 0,
      "compression_ratio": 0,
      "no_speech_prob": 0
    }
  ],
  "words": [
    { "word": "Hello", "start": 0, "end": 0.2 }
  ]
}

text, srt, and vtt return plain text bodies.

POST /v1/audio/speech

OpenAI-compatible text-to-speech endpoint.

Request body:

{
  "model": "tts-1",
  "input": "Hello from local text to speech.",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1.0
}

Behavior:

  • model is required for OpenAI compatibility, but speech engine selection is currently independent of this value.
  • Supported voice: alloy, echo, fable, onyx, nova, shimmer.
  • Supported response_format: mp3 (default), wav.
  • speed must be between 0.25 and 4.
  • input max length is 4096 characters.

Response:

  • Binary audio (audio/mpeg for mp3, audio/wav for wav).
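
The limits listed above can be checked client-side before sending. A Python sketch mirroring the documented rules (the function name is illustrative):

```python
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "wav"}

def validate_speech_request(body: dict) -> None:
    """Raise ValueError when a /v1/audio/speech body violates the documented limits."""
    if len(body.get("input", "")) > 4096:
        raise ValueError("input exceeds 4096 characters")
    if body.get("voice") not in VOICES:
        raise ValueError("unsupported voice")
    if body.get("response_format", "mp3") not in FORMATS:
        raise ValueError("unsupported response_format")
    if not 0.25 <= body.get("speed", 1.0) <= 4:
        raise ValueError("speed must be between 0.25 and 4")
```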

Ollama-like API (/api)

Model Naming

  • /api/chat: model is optional.
  • If omitted, the server falls back, in order, to the model currently selected in app settings, then the first installed model, then the default model.
  • /api/generate: model is required.
  • /api/show: model is required.
  • Names may be provided with or without mlx-community/ prefix.

GET /api/tags

Returns installed models discovered by the app. When Apple Foundation Models are available on the device, apple-foundation-model is included.

Response:

{
  "models": [
    {
      "name": "llama-3.2-1b-instruct-4bit",
      "modified_at": "2026-02-15T12:00:00Z",
      "size": 0,
      "digest": "uuid",
      "details": {
        "format": "gguf",
        "family": "llama",
        "families": null,
        "parameter_size": "13B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}

POST /api/chat

Request body:

{
  "model": "Llama-3.2-1B-Instruct-4bit",
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "stream": false,
  "options": {
    "topK": 40,
    "topP": 0.9,
    "temperature": 0.7
  }
}

Response:

{
  "model": "Llama-3.2-1B-Instruct-4bit",
  "created_at": "2026-02-15T12:00:00Z",
  "message": { "role": "assistant", "content": "..." },
  "done": true
}

Notes:

  • stream is accepted for compatibility but currently ignored; response is non-streaming.
  • In options, only temperature is currently used; topK and topP are currently ignored.
  • For options, use camelCase keys (topK, topP, temperature).
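
The camelCase requirement is easy to miss, since stock Ollama clients typically send snake_case option keys (top_k, top_p). A small sketch that builds a conforming body (the helper name is illustrative):

```python
def build_api_chat_request(model: str, user_text: str,
                           temperature: float = 0.7) -> dict:
    """Build a POST /api/chat body with the camelCase option keys this server expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,  # accepted but ignored; responses are non-streaming either way
        "options": {"temperature": temperature},  # only temperature is applied today
    }
```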

POST /api/generate

Request body:

{
  "model": "Llama-3.2-1B-Instruct-4bit",
  "prompt": "Write a haiku about Swift.",
  "images": []
}

images can include URL strings, file paths, data: URLs, or base64 strings.
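
When sending raw image bytes, base64 is the most portable of the accepted forms. A stdlib Python sketch that produces either a bare base64 string or a data: URL (the default media type here is an assumption; adjust it to the actual image format):

```python
import base64

def encode_image(image_bytes: bytes, as_data_url: bool = False,
                 media_type: str = "image/png") -> str:
    """Encode image bytes for the images array: bare base64, or a data: URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{media_type};base64,{b64}" if as_data_url else b64
```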

Response:

{
  "response": "...",
  "model": "Llama-3.2-1B-Instruct-4bit",
  "created_at": "2026-02-15T12:00:00Z"
}

POST /api/show

Request body:

{
  "model": "Llama-3.2-1B-Instruct-4bit",
  "verbose": false
}

Response (verbose: false):

{
  "license": "Unknown",
  "parameters": "temperature 0.7\nquantization Q4_0",
  "details": {
    "format": "safetensors",
    "family": "llama",
    "families": ["llama"],
    "parameter_size": "1B",
    "quantization_level": "Q4_0"
  },
  "capabilities": ["completion"],
  "modified_at": "2026-02-15T12:00:00Z"
}

Response (verbose: true) also includes modelfile, template, model_info, and messages.

If model is not installed, returns 404:

{
  "error": "model 'your-model-name' not found"
}

Message Format Details

messages[].content accepts:

  • a plain string, or
  • a single part object or an array of parts mixing text and images (e.g. {"type":"text","text":"..."} and {"type":"image_url","image_url":"..."}).

Optional images can also be sent through messages[].images.

Supported roles include user, assistant, system, and tool.

cURL Examples

OpenAI-like chat:

curl http://<iphone-ip-address>:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer local-ai" \
  -d '{
    "model": "Llama-3.2-1B-Instruct-4bit",
    "messages": [{"role":"user","content":"Hello"}]
  }'

Ollama-like chat:

curl http://<iphone-ip-address>:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.2-1B-Instruct-4bit",
    "messages": [{"role":"user","content":"Hello"}],
    "stream": false
  }'