Local AI
Run supported local AI models on-device and chat with them (availability varies).
Overview
Local AI provides an on-device chat UI with two backends:
- Apple Foundation (when available on your OS/device)
- LLM.swift (uses locally stored model files)
It also shows live CPU and memory usage so you can see the cost of loading and running a model.
Table Of Contents
- Quick Start
- Control Bar
- Backends
- Model Library
- Loading And Unloading
- Chat
- Performance Snapshot
- Export Conversation
- Notes And Limitations
Quick Start
- Open Tools -> Local AI.
- Choose a backend (Apple Foundation or LLM.swift).
- Tap Load.
- Type a prompt and send it.
Control Bar
At the top of the chat screen, the control bar has three expansion states:
Compact (Default)
Shows:
- Model status (unloaded/loading/loaded/unavailable)
- Backend selection menu
- Model picker (LLM.swift only)
- Load / Unload button
Middle Expanded
Tap the control bar to expand it and reveal additional indicators:
- Live CPU usage gauge
- Live Memory usage gauge
Full Expanded
Tap again to open the full detail screen with three cards:
- Model Status Card -- shows the backend name, model name, and file size (for LLM.swift models). Includes backend selection and model picker menus.
- Performance Card -- shows a "Baseline" vs "Now" comparison for CPU and memory usage. Tap Capture Baseline to snapshot the current values, then watch how loading and running a model changes resource consumption.
- Actions Card -- contains Load Model / Unload Model, New Conversation (clears messages and reloads), Manage Models (opens the Model Library), and Export Conversation.
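The "Baseline vs. Now" flow in the Performance Card can be pictured as a small snapshot-and-diff routine. The sketch below is illustrative only: the `ResourceSnapshot` and `PerformanceTracker` names are hypothetical, and the reader closure stands in for whatever system API supplies the live CPU/memory values.

```swift
import Foundation

// Hypothetical sketch of "Capture Baseline": record a snapshot of the
// current readings, then diff later readings against it.
struct ResourceSnapshot {
    var cpuPercent: Double
    var memoryBytes: UInt64
}

final class PerformanceTracker {
    private let read: () -> ResourceSnapshot
    private(set) var baseline: ResourceSnapshot?

    init(reader: @escaping () -> ResourceSnapshot) {
        self.read = reader
    }

    func captureBaseline() {
        baseline = read()
    }

    /// Returns the change in CPU and memory relative to the baseline,
    /// or nil if no baseline has been captured yet.
    func deltaFromBaseline() -> (cpu: Double, memory: Int64)? {
        guard let base = baseline else { return nil }
        let now = read()
        return (now.cpuPercent - base.cpuPercent,
                Int64(now.memoryBytes) - Int64(base.memoryBytes))
    }
}
```

Capturing a baseline right before tapping Load makes the model's resource cost directly visible as the delta.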
The control bar remembers its expansion state between sessions.
Backends
Apple Foundation
Apple Foundation uses Apple's built-in FoundationModels framework. It requires iOS 26.0+ or visionOS 26.0+ and supported hardware. If it is not available on your device, Lirum shows an unavailable message. Availability is rechecked whenever the app comes to the foreground.
LLM.swift
LLM.swift runs GGUF model files locally on your device. It uses the ChatML message template and streams responses token by token as they are generated.
Technical details:
- Conversation history is maintained with an 8-turn limit -- older messages are dropped to keep context manageable.
- Responses have a 2-minute timeout. If a model does not produce output within that time, an error is shown.
- Special model tokens (such as `<|...|>` markers) are automatically stripped from responses.
- If a KV cache error occurs, Lirum shows a specific diagnostic message.
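The history-trimming and token-stripping behaviors described above can be sketched as follows. This assumes a simple (role, text) message model; the `ChatMessage` type and function names are hypothetical, not LLM.swift's actual API.

```swift
import Foundation

// Stand-in message type for illustration.
struct ChatMessage {
    let role: String   // "user" or "assistant"
    let text: String
}

/// Keep at most the last `maxTurns` turns, where one turn is a
/// user message plus the assistant's reply.
func trimHistory(_ messages: [ChatMessage], maxTurns: Int = 8) -> [ChatMessage] {
    let maxMessages = maxTurns * 2   // one user + one assistant message per turn
    return Array(messages.suffix(maxMessages))
}

/// Strip special model tokens such as <|im_end|> from a response.
func stripSpecialTokens(_ response: String) -> String {
    response.replacingOccurrences(of: "<\\|[^|]*\\|>",
                                  with: "",
                                  options: .regularExpression)
            .trimmingCharacters(in: .whitespacesAndNewlines)
}
```

Dropping whole turns (rather than individual messages) keeps the remaining history coherent, since every user message stays paired with its reply.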
Model Library
Open the Model Library from the toolbar menu to download, manage, and select models. The library has three sections:
Installed Models
Lists all downloaded model folders with their name, file count, and total size. You can:
- Select a model to use it with LLM.swift.
- Import a GGUF file from the iOS Files app.
- Enter selection mode to batch-export or batch-delete multiple models at once.
Catalog
A curated list of models bundled with the app. Each entry shows the model name, parameter count, and colored tags indicating characteristics:
| Tag | Meaning |
|---|---|
| Chat | General-purpose conversational model |
| Instructions | Tuned for following instructions |
| Reasoning | Designed for step-by-step reasoning |
| Coding | Optimized for code generation |
| Recommended | Tested and works well on-device |
| Fast | Generates responses quickly |
| Slow | May be slow on some devices |
| Tested | Verified to work in Lirum |
| Experimental | May produce inconsistent results |
| Untested | Not yet verified |
Sort the catalog by Default, Alphabetical, Date (Newest/Oldest First), or Parameters (Largest/Smallest First).
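The sort options above map onto straightforward comparisons. In this sketch, the `CatalogEntry` fields and `SortOrder` cases are assumptions for illustration, not Lirum's actual definitions.

```swift
import Foundation

// Illustrative catalog entry: name, release date, and size in
// billions of parameters (e.g. 1.5 for a 1.5B model).
struct CatalogEntry {
    let name: String
    let releaseDate: Date
    let parameterCount: Double
}

enum SortOrder {
    case alphabetical, newestFirst, largestFirst
}

func sorted(_ entries: [CatalogEntry], by order: SortOrder) -> [CatalogEntry] {
    switch order {
    case .alphabetical:
        return entries.sorted {
            $0.name.localizedCaseInsensitiveCompare($1.name) == .orderedAscending
        }
    case .newestFirst:
        return entries.sorted { $0.releaseDate > $1.releaseDate }
    case .largestFirst:
        return entries.sorted { $0.parameterCount > $1.parameterCount }
    }
}
```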
Active Downloads
Shows any currently downloading models with:
- Download progress (percentage, speed in MB/s, estimated time remaining)
- Abort and Resume controls
Manual Model Entry
You can also add models manually in two ways:
- Import from Files -- opens the iOS file picker for GGUF files and copies them with a progress display.
- Manual URL download -- enter a direct download URL along with model name, quantization, and parameter count. Fields can be auto-filled from the catalog or parsed from the filename.
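Filename parsing for the manual-entry fields might look like the sketch below, which handles common GGUF naming patterns such as `qwen2.5-1.5b-instruct-q4_k_m.gguf`. The parsing rules and the `ParsedModelFields` type are assumptions for illustration, not Lirum's actual parser.

```swift
import Foundation

// Hypothetical result of parsing a GGUF filename.
struct ParsedModelFields {
    var name: String
    var parameters: String?    // e.g. "1.5B"
    var quantization: String?  // e.g. "Q4_K_M"
}

func parseGGUFFilename(_ filename: String) -> ParsedModelFields {
    let stem = filename.hasSuffix(".gguf")
        ? String(filename.dropLast(5))
        : filename

    // Parameter count: a number followed by "b" (billions) or "m" (millions).
    var parameters: String?
    if let range = stem.range(of: "(?i)\\b[0-9]+(\\.[0-9]+)?[bm]\\b",
                              options: .regularExpression) {
        parameters = stem[range].uppercased()
    }

    // Quantization: a trailing tag such as q4_k_m or q8_0.
    var quantization: String?
    if let range = stem.range(of: "(?i)q[0-9]+(_[a-z0-9]+)*$",
                              options: .regularExpression) {
        quantization = stem[range].uppercased()
    }

    return ParsedModelFields(name: stem,
                             parameters: parameters,
                             quantization: quantization)
}
```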
Loading And Unloading
- Load initializes the selected backend/model.
- Unload releases the model and clears the current conversation.
Large models can take time to load and may fail if the device doesn't have enough free memory.
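One way an app could guard against out-of-memory load failures is a pre-load sanity check comparing the model's file size against an available-memory budget. The 1.2x overhead factor and everything else in this sketch are illustrative assumptions, not Lirum's actual logic.

```swift
import Foundation

enum LoadError: Error {
    case insufficientMemory(required: UInt64, available: UInt64)
}

/// Throws if the model is unlikely to fit in the given memory budget.
func checkMemoryBeforeLoad(modelFileSize: UInt64, availableMemory: UInt64) throws {
    // A loaded GGUF model typically needs somewhat more RAM than its
    // file size (KV cache, activations), so pad the requirement.
    let required = UInt64(Double(modelFileSize) * 1.2)
    guard required <= availableMemory else {
        throw LoadError.insufficientMemory(required: required,
                                           available: availableMemory)
    }
}
```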
Chat
The main UI is a standard chat view:
- Type a prompt and send it.
- While a response is streaming, you can stop generation.
Performance Snapshot
Local AI tracks CPU and memory usage while you use the tool.
In the expanded controls (AI Model panel), you can capture a baseline snapshot and compare baseline vs current CPU/memory.
(This is the same Capture Baseline control described under the Performance Card above.)
Export Conversation
Use Export Conversation to share the current chat history. The conversation is exported as Markdown text with role prefixes (User: and Assistant:) for each message. You can then share it via any standard iOS sharing method.
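The export format described above reduces to prefixing each message with its role. In this sketch the `Message` type and the blank-line separator between messages are assumptions for illustration:

```swift
import Foundation

// Stand-in message type for illustration.
struct Message {
    let isUser: Bool
    let text: String
}

/// Render the conversation as Markdown text with role prefixes.
func exportConversation(_ messages: [Message]) -> String {
    messages.map { message in
        let prefix = message.isUser ? "User:" : "Assistant:"
        return "\(prefix) \(message.text)"
    }
    .joined(separator: "\n\n")
}
```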
Notes And Limitations
- On-device models can use significant CPU and memory.
- Model availability, download options, and performance vary by device and OS.
- Apple Foundation requires iOS 26.0+ or visionOS 26.0+ and supported hardware.
- LLM.swift is not available on macOS Catalyst builds.
- Large models may fail to load if the device does not have enough free memory.
- The LLM.swift backend has an 8-turn conversation history limit and a 2-minute response timeout.