Last Updated: 4/8/2026

Supported Models

Pie supports open-source language models from the Llama, Mistral, Qwen, Phi, and Gemma families.

Compatible Models

Llama Family

Llama 3.2 (1B, 3B)
Llama 3.1 (8B, 70B, 405B)
Llama 3 (8B, 70B)
Llama 2 (7B, 13B, 70B)

Mistral Family

Mistral 7B
Mixtral 8x7B
Mixtral 8x22B

Other Models

Qwen 2.5 (0.5B - 72B)
Phi-3 (mini, small, medium)
Gemma 2 (2B, 9B, 27B)

Model Loading

Via CLI


pie serve --model llama-3.2-1b

Via Configuration


[models]
default = "llama-3.2-1b"
cache_dir = "~/.pie/models"

Programmatically


use inferlet::get_model;
 
let model = get_model("llama-3.2-1b");

Model Requirements

GPU Memory

Model Size	Minimum VRAM
1B	4 GB
3B	8 GB
7-8B	16 GB
13B	24 GB
70B	80 GB (multi-GPU)
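
The table can be read as a set of tiers. The following is a minimal sketch of that lookup, not part of Pie itself: the names `min_vram_gb` and `fits` are hypothetical, and rounding a size that falls between two rows up to the next row is an assumption, since the table does not say how to handle such sizes.

```rust
/// Minimum VRAM in GB for a model with `params_b` billion parameters,
/// using the rows of the table above.
/// Assumption: a size between two rows is rounded up to the next row.
/// Returns None for sizes above 70B, which the table does not cover.
fn min_vram_gb(params_b: f64) -> Option<u32> {
    // (largest parameter count in billions for this tier, minimum VRAM in GB)
    const TIERS: [(f64, u32); 5] = [
        (1.0, 4),
        (3.0, 8),
        (8.0, 16),
        (13.0, 24),
        (70.0, 80), // the table marks this tier as multi-GPU
    ];
    TIERS
        .iter()
        .find(|(max_b, _)| params_b <= *max_b)
        .map(|&(_, gb)| gb)
}

/// True when `available_gb` of VRAM (summed across GPUs) meets the
/// table's minimum for the given model size.
fn fits(params_b: f64, available_gb: u32) -> bool {
    matches!(min_vram_gb(params_b), Some(need) if need <= available_gb)
}
```

For example, `fits(8.0, 24)` holds because the 7-8B row asks for 16 GB, while `fits(70.0, 24)` does not.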

Quantization

Pie supports quantized models alongside unquantized FP16 weights:

FP16 - Half precision (unquantized)
INT8 - 8-bit quantization
INT4 - 4-bit quantization (GPTQ, AWQ)
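
To see why quantization matters for the VRAM figures, it helps to estimate the size of the weights alone: parameter count times bits per weight, divided by 8. The sketch below is generic arithmetic under that assumption, not Pie's loader; the `Precision` enum and `weight_gb` function are hypothetical names. It ignores the KV cache, activations, and quantization metadata such as scales, so real usage is higher.

```rust
/// Weight formats from the list above.
#[derive(Clone, Copy)]
enum Precision {
    Fp16,
    Int8,
    Int4,
}

impl Precision {
    /// Bits used to store one weight.
    fn bits(self) -> u32 {
        match self {
            Precision::Fp16 => 16,
            Precision::Int8 => 8,
            Precision::Int4 => 4,
        }
    }
}

/// Approximate size of the weights alone, in GB (10^9 bytes), for a model
/// with `params_b` billion parameters.
/// Billions of parameters times bytes per weight gives GB directly.
fn weight_gb(params_b: f64, precision: Precision) -> f64 {
    params_b * f64::from(precision.bits()) / 8.0
}
```

Under this estimate a 7B model needs about 14 GB of weights at FP16, 7 GB at INT8, and 3.5 GB at INT4.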

Custom Models

To serve a custom model stored in Hugging Face format, pass its local path to --model:


pie serve --model /path/to/custom/model

Next Steps

Learn about Writing Inferlets
Check Installation for GPU setup
See CLI Reference for model management
