
Local AI

Run supported local AI models on-device and chat with them (availability varies).

Local AI: chat view with a compact control bar.

Overview

Local AI provides an on-device chat UI with two backends:

  • Apple Foundation (when available on your OS/device)
  • LLM.swift (uses locally stored model files)

It also shows live CPU and memory usage so you can see the cost of loading and running a model.

Quick Start

  1. Open Tools -> Local AI.
  2. Choose a backend (Apple Foundation or LLM.swift).
  3. Tap Load.
  4. Type a prompt and send it.

Control Bar

At the top of the chat screen, the control bar has three expansion states:

Compact (Default)

Shows:

  • Model status (unloaded/loading/loaded/unavailable)
  • Backend selection menu
  • Model picker (LLM.swift only)
  • Load / Unload button

Middle Expanded

Tap the control bar to expand it and reveal additional indicators:

  • Live CPU usage gauge
  • Live Memory usage gauge

Full Expanded

Tap again to open the full detail screen with three cards:

  • Model Status Card -- shows the backend name, model name, and file size (for LLM.swift models). Includes backend selection and model picker menus.
  • Performance Card -- shows a "Baseline" vs "Now" comparison for CPU and memory usage. Tap Capture Baseline to snapshot the current values, then watch how loading and running a model changes resource consumption.
  • Actions Card -- contains Load Model / Unload Model, New Conversation (clears messages and reloads), Manage Models (opens the Model Library), and Export Conversation.

The control bar remembers its expansion state between sessions.

Tap Load to load the selected backend/model.
When loaded, the control bar shows a loaded state and exposes Unload.

Backends

Apple Foundation

Apple Foundation uses Apple's built-in FoundationModels framework. It requires iOS 26.0+ or visionOS 26.0+ and supported hardware. If it is not available on your device, Lirum shows an unavailable message. Availability is rechecked whenever the app comes to the foreground.

LLM.swift

LLM.swift runs GGUF model files locally on your device. It uses the ChatML message template and streams responses token by token as they are generated.

Technical details:

  • Conversation history is maintained with an 8-turn limit -- older messages are dropped to keep context manageable.
  • Responses have a 2-minute timeout. If a model does not produce output within that time, an error is shown.
  • Special model tokens (such as <|...|> markers) are automatically stripped from responses.
  • If a KV cache error occurs, Lirum shows a specific diagnostic message.
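The history trimming, ChatML formatting, and token stripping described above can be sketched as follows. This is a minimal illustration of the behavior, not the app's actual implementation; the helper names and the assumption that one turn equals one user plus one assistant message are ours:

```python
import re

HISTORY_LIMIT = 8  # turns kept, per the limit described above

def format_chatml(history, prompt):
    """Render a trimmed conversation in the ChatML template."""
    # Keep only the most recent turns so the context stays manageable
    # (assuming one turn = one user message + one assistant message).
    trimmed = history[-HISTORY_LIMIT * 2:]
    parts = [f"<|im_start|>{role}\n{text}<|im_end|>" for role, text in trimmed]
    parts.append(f"<|im_start|>user\n{prompt}<|im_end|>")
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

def strip_special_tokens(response):
    """Remove <|...|> markers a model may echo into its output."""
    return re.sub(r"<\|[^|]*\|>", "", response).strip()
```

A response like `Hello<|im_end|>` would be cleaned to `Hello` before display.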

Model Library

Open the Model Library from the toolbar menu to download, manage, and select models. The library has three sections:

Installed Models

Lists all downloaded model folders with their name, file count, and total size. You can:

  • Select a model to use it with LLM.swift.
  • Import a GGUF file from the iOS Files app.
  • Enter selection mode to batch-export or batch-delete multiple models at once.

Catalog

A curated list of models bundled with the app. Each entry shows the model name, parameter count, and colored tags indicating characteristics:

  • Chat -- General-purpose conversational model
  • Instructions -- Tuned for following instructions
  • Reasoning -- Designed for step-by-step reasoning
  • Coding -- Optimized for code generation
  • Recommended -- Tested and works well on-device
  • Fast -- Generates responses quickly
  • Slow -- May be slow on some devices
  • Tested -- Verified to work in Lirum
  • Experimental -- May produce inconsistent results
  • Untested -- Not yet verified

Sort the catalog by Default, Alphabetical, Date (Newest/Oldest First), or Parameters (Largest/Smallest First).

Active Downloads

Shows any currently downloading models with:

  • Download progress (percentage, speed in MB/s, estimated time remaining)
  • Abort and Resume controls
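The progress figures above can all be derived from bytes received over time. A small sketch of that arithmetic (illustrative only; the app's actual sampling window and smoothing are not documented):

```python
def download_stats(bytes_received, total_bytes, elapsed_seconds):
    """Derive percentage, speed, and ETA for an in-flight download."""
    percent = 100.0 * bytes_received / total_bytes
    speed_mbps = bytes_received / elapsed_seconds / 1_000_000  # MB/s
    remaining = total_bytes - bytes_received
    eta_seconds = remaining / (bytes_received / elapsed_seconds)
    return percent, speed_mbps, eta_seconds
```

For example, 50 MB received out of 100 MB after 10 seconds gives 50%, 5 MB/s, and about 10 seconds remaining.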

Manual Model Entry

You can also add models manually in two ways:

  • Import from Files -- opens the iOS file picker for GGUF files and copies them with a progress display.
  • Manual URL download -- enter a direct download URL along with model name, quantization, and parameter count. Fields can be auto-filled from the catalog or parsed from the filename.

Model Library: manage and select local models for the LLM.swift backend.
Model details and actions (varies by model/backend).
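Auto-filling fields from the filename works because GGUF filenames conventionally encode the parameter count and quantization. A hypothetical sketch of that parsing (the app's real rules are not documented, and this function name is ours):

```python
import re

def parse_gguf_filename(name):
    """Guess parameter count and quantization from a GGUF filename."""
    params = None
    quant = None
    # Parameter counts usually appear as "8B", "1.5B", etc.
    m = re.search(r"(\d+(?:\.\d+)?)[bB]\b", name)
    if m:
        params = m.group(1) + "B"
    # Quantization labels usually look like "Q4_K_M", "Q8_0", etc.
    m = re.search(r"(Q\d+(?:_[A-Z0-9]+)*)", name)
    if m:
        quant = m.group(1)
    return params, quant
```

For instance, `llama-3-8B-Instruct-Q4_K_M.gguf` would yield `8B` and `Q4_K_M`; a filename without these markers leaves the fields empty for manual entry.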

Loading And Unloading

  • Load initializes the selected backend/model.
  • Unload releases the model and clears the current conversation.

Large models can take time to load and may fail if the device doesn't have enough free memory.

Chat

The main UI is a standard chat view:

  • Type a prompt and send it.
  • While a response is streaming, you can stop generation.

Enter a prompt in the chat composer.
After sending, the assistant begins generating a response.
Example response shown in the chat history.

Performance Snapshot

Local AI tracks CPU and memory usage while you use the tool.

In the expanded controls (AI Model panel), you can capture a baseline snapshot and compare baseline vs current CPU/memory.
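The baseline comparison is a simple delta between the captured snapshot and the current readings. A sketch under assumed field names (not the app's actual API):

```python
def resource_delta(baseline, now):
    """Compare a captured baseline snapshot with current readings."""
    return {key: now[key] - baseline[key] for key in baseline}

# Hypothetical values: baseline captured before loading a model,
# current readings taken while the model is running.
baseline = {"cpu_percent": 4.0, "memory_mb": 310.0}
now = {"cpu_percent": 38.0, "memory_mb": 2350.0}
```

Here the delta would show the model costing roughly 34 percentage points of CPU and about 2 GB of memory.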

Export Conversation

Use Export Conversation to share the current chat history. The conversation is exported as Markdown text with role prefixes (User: and Assistant:) for each message. You can then share it via any standard iOS sharing method.
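The export format described above can be sketched as follows. The role prefixes come from the text above; the blank-line separator between messages is an assumption:

```python
def export_markdown(messages):
    """Render chat history as Markdown with "User:" / "Assistant:" prefixes."""
    lines = []
    for role, text in messages:
        prefix = "User" if role == "user" else "Assistant"
        lines.append(f"{prefix}: {text}")
    # Separate messages with a blank line (assumed formatting).
    return "\n\n".join(lines)
```

A two-message chat would export as `User: Hi`, a blank line, then `Assistant: Hello!`.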

Notes And Limitations

  • On-device models can use significant CPU and memory.
  • Model availability, download options, and performance vary by device and OS.
  • Apple Foundation requires iOS 26.0+ or visionOS 26.0+ and supported hardware.
  • LLM.swift is not available on macOS Catalyst builds.
  • Large models may fail to load if the device does not have enough free memory.
  • The LLM.swift backend has an 8-turn conversation history limit and a 2-minute response timeout.