Last Updated: 4/8/2026
Supported Models
Pie supports open-source language models from the Llama, Mistral, Qwen, Phi, and Gemma families.
Compatible Models
Llama Family
- Llama 3.2 (1B, 3B)
- Llama 3.1 (8B, 70B, 405B)
- Llama 3 (8B, 70B)
- Llama 2 (7B, 13B, 70B)
Mistral Family
- Mistral 7B
- Mixtral 8x7B
- Mixtral 8x22B
Other Models
- Qwen 2.5 (0.5B - 72B)
- Phi-3 (mini, small, medium)
- Gemma 2 (2B, 9B, 27B)
Model Loading
Via CLI
pie serve --model llama-3.2-1b
Via Configuration
[models]
default = "llama-3.2-1b"
cache_dir = "~/.pie/models"
Programmatically
use inferlet::get_model;
let model = get_model("llama-3.2-1b");
Model Requirements
GPU Memory
| Model Size | Minimum VRAM |
|---|---|
| 1B | 4 GB |
| 3B | 8 GB |
| 7-8B | 16 GB |
| 13B | 24 GB |
| 70B | 80 GB (multi-GPU) |
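As a rough pre-flight check, the table above can be encoded as a lookup. The sketch below is illustrative only and not part of Pie: `min_vram_gb` and `fits` are hypothetical helper names, and rounding sizes that fall between rows up to the next row is an assumption of this example, not documented Pie behavior.

```rust
/// Minimum VRAM in GB for a model with `params_b` billion parameters,
/// taken from the table above. Sizes between rows round up to the next
/// row (an assumption of this sketch); sizes above 70B return `None`.
fn min_vram_gb(params_b: f64) -> Option<u32> {
    // (largest parameter count in billions covered by the row, minimum VRAM in GB)
    const TABLE: [(f64, u32); 5] =
        [(1.0, 4), (3.0, 8), (8.0, 16), (13.0, 24), (70.0, 80)];
    TABLE
        .iter()
        .find(|(max_params, _)| params_b <= *max_params)
        .map(|(_, vram)| *vram)
}

/// True if `available_gb` of total VRAM meets the table's minimum.
/// Returns false for sizes the table does not cover.
fn fits(params_b: f64, available_gb: u32) -> bool {
    matches!(min_vram_gb(params_b), Some(min) if available_gb >= min)
}

fn main() {
    for size in [1.0, 3.0, 8.0, 13.0, 70.0, 405.0] {
        match min_vram_gb(size) {
            Some(gb) => println!("{size}B: at least {gb} GB"),
            None => println!("{size}B: not covered by the table"),
        }
    }
}
```

For the 70B row, `available_gb` should be the total across all GPUs, since the table marks that row as multi-GPU.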
Quantization
Pie supports the following weight precisions:
- FP16 - 16-bit floating point (half precision, unquantized)
- INT8 - 8-bit quantization
- INT4 - 4-bit quantization (GPTQ, AWQ)
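To see how precision affects memory, weight size can be estimated as parameter count times bytes per parameter. The following is a minimal sketch, not Pie's own accounting: `Precision` and `weight_gb` are hypothetical names, and the estimate covers weights only. It leaves out the KV cache, activations, and the per-group scales that GPTQ and AWQ store alongside the quantized weights, so real usage will be higher.

```rust
/// The weight precisions listed above.
#[derive(Clone, Copy, Debug)]
enum Precision {
    Fp16,
    Int8,
    Int4,
}

impl Precision {
    /// Storage cost of one parameter in bytes.
    fn bytes_per_param(self) -> f64 {
        match self {
            Precision::Fp16 => 2.0,
            Precision::Int8 => 1.0,
            Precision::Int4 => 0.5,
        }
    }
}

/// Approximate weight size in GB (10^9 bytes) for a model with
/// `params_b` billion parameters. Weights only; runtime overhead excluded.
fn weight_gb(params_b: f64, precision: Precision) -> f64 {
    params_b * precision.bytes_per_param()
}

fn main() {
    for precision in [Precision::Fp16, Precision::Int8, Precision::Int4] {
        println!("8B at {:?}: ~{} GB of weights", precision, weight_gb(8.0, precision));
    }
}
```

By this estimate, an 8B model's weights shrink from about 16 GB at FP16 to about 4 GB at INT4.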
Custom Models
To serve a custom model, pass the path to a model directory in Hugging Face format:
pie serve --model /path/to/custom/model
Next Steps
- Learn about Writing Inferlets
- Check Installation for GPU setup
- See CLI Reference for model management