- Lemonade - Windows-only, optimized for AMD GPUs and NPUs
- Ollama - Cross-platform (Windows, macOS, Linux), supports various hardware
Why Local Inference?
Running models locally provides several key advantages:
- Complete Privacy: Your data never leaves your machine
- No API Costs: Eliminate ongoing API expenses
- Low Latency: No network round-trips for inference
- Offline Capability: Work without internet connectivity
- Hardware Acceleration: Leverage your local GPU, NPU, or specialized AI processors
Lemonade
Lemonade SDK provides high-performance local inference on Windows, with optimizations for AMD hardware. It exposes an OpenAI-compatible API and is already configured in Morphik.
Built-in Support: Lemonade models are pre-configured in `morphik.toml` for both embeddings and completions. Simply install Lemonade Server and select the models in the UI.

System Requirements
- Windows 10/11 only (x86/x64)
- 8GB+ RAM (16GB recommended)
- Python 3.10+
- Optional but recommended:
  - AMD Ryzen AI 300 series (NPU acceleration)
  - AMD Radeon 7000/9000 series (GPU acceleration)
Quick Start
1. Download Lemonade
Download and install Lemonade from the official site: lemonade-server.ai.
2. Start Lemonade Server
Start the Lemonade server following their documentation. Make sure it is running and note the port. The API is OpenAI-compatible (e.g., `/api/v1/models`); you can verify it with the check below.
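For example, assuming the server listens on the default port 8020 used elsewhere in this guide, you can confirm it is reachable and see which models it serves:

```bash
# Check that the Lemonade server is up and list the models it exposes.
# Adjust the port if your install uses a different one.
curl http://localhost:8020/health
curl http://localhost:8020/api/v1/models
```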
3. Configure Morphik (Two Options)
Option 1: Using the UI (Recommended)
- Open the Morphik UI and go to Settings → API Keys
- Select "Lemonade"; no API key is required
- Enter the host and port where Lemonade is running
- Open Chat and use the model selector pill (top left) to pick a Lemonade model
Running inside Docker? Use `host.docker.internal` instead of `localhost` for the host field.

If you are not using a vision-capable model, turn off ColPali in chat settings (Settings → ColPali) to avoid vision-dependent paths.
Option 2: Edit morphik.toml
You can also set Lemonade models directly in `morphik.toml` so they're used by default. Ensure the `api_base` points to your Lemonade server, as in the sketch below.

If your system has under 16GB RAM, prefer models under ~4B parameters or smaller quantizations (e.g., Q4/Q5). Larger models may fail to load or will be very slow on low-memory systems.
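A minimal sketch of what the registration might look like. The section names, keys, and model name here are assumptions for illustration; check the comments in your own `morphik.toml` for the exact schema:

```toml
# Hypothetical example - adapt names and keys to your morphik.toml.
[registered_models]
# Lemonade speaks the OpenAI-compatible API, so the entry points its
# api_base at the Lemonade server (default port 8020 assumed; inside
# Docker, use host.docker.internal instead of localhost).
lemonade_llama = { model_name = "openai/Llama-3.2-3B-Instruct-Hybrid", api_base = "http://localhost:8020/api/v1" }

[completion]
model = "lemonade_llama"  # use the Lemonade model for completions by default
```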
Performance Tips
- Model Quantization: Use GGUF quantized models for better performance
- Low-Memory Systems: With under 16GB RAM, prefer models under ~4B parameters
- Hardware Acceleration: Lemonade automatically detects and uses AMD GPUs/NPUs when available
- Memory Management: Models are cached after the first download; you can spot-check a model's speed with the request below
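Because the API is OpenAI-compatible, a quick way to gauge end-to-end speed is a small chat completion request. The model name below is an assumption; substitute one returned by `/api/v1/models`:

```bash
# Time a small completion to spot-check model speed.
# Replace the model value with one listed by /api/v1/models.
curl http://localhost:8020/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Llama-3.2-3B-Instruct-Hybrid",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64
      }'
```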
Troubleshooting
Connection Issues
- Verify server health: `curl http://localhost:8020/health`
- List models: `curl http://localhost:8020/api/v1/models`
- For Docker: use `host.docker.internal` instead of `localhost` (see the sketch after this list)
- Check firewall settings for port 8020
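From inside a Docker container, `localhost` refers to the container itself, so the same checks go through the Docker host alias; a quick sketch, assuming the default port 8020:

```bash
# Run from inside the Morphik container: reach the Lemonade server on the
# host machine via host.docker.internal rather than localhost.
curl http://host.docker.internal:8020/health
curl http://host.docker.internal:8020/api/v1/models
```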
Model Loading Errors
- Ensure sufficient disk space (5-15GB per model)
- Try smaller quantized versions (Q4, Q5)
- Check model compatibility with `lemonade list`
Performance Issues
- Use GGUF quantized models for better performance
- Monitor GPU/NPU usage with system tools
- Adjust batch size and context length in model config