
Last Updated: 4/8/2026


Supported Models

Pie supports open-source language models from the Llama, Mistral, Qwen, Phi, and Gemma families.

Compatible Models

Llama Family

  • Llama 3.2 (1B, 3B)
  • Llama 3.1 (8B, 70B, 405B)
  • Llama 3 (8B, 70B)
  • Llama 2 (7B, 13B, 70B)

Mistral Family

  • Mistral 7B
  • Mixtral 8x7B
  • Mixtral 8x22B

Other Models

  • Qwen 2.5 (0.5B - 72B)
  • Phi-3 (mini, small, medium)
  • Gemma 2 (2B, 9B, 27B)

Model Loading

Via CLI

pie serve --model llama-3.2-1b

Via Configuration

[models]
default = "llama-3.2-1b"
cache_dir = "~/.pie/models"
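
The cache_dir value above starts with ~, which is shell shorthand for the home directory. The source does not say how Pie resolves it, so the following is only a minimal sketch of one common approach, assuming a leading ~ is replaced with the user's home directory. The function name expand_tilde is hypothetical and is not part of Pie.

```rust
use std::path::PathBuf;

/// Expand a leading `~` in `path` using the supplied home directory.
/// Paths that do not start with `~` or `~/` are returned unchanged.
/// Assumption: this mirrors typical shell behavior, not Pie's documented logic.
pub fn expand_tilde(path: &str, home: &str) -> PathBuf {
    if path == "~" {
        return PathBuf::from(home);
    }
    match path.strip_prefix("~/") {
        // Join the remainder onto the home directory.
        Some(rest) => PathBuf::from(home).join(rest),
        // No tilde prefix: treat the path as already resolved.
        None => PathBuf::from(path),
    }
}
```

Calling expand_tilde("~/.pie/models", "/home/alice") yields /home/alice/.pie/models. The home directory is passed in explicitly so the function stays easy to test; a caller would typically read it from the HOME environment variable.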

Programmatically

use inferlet::get_model;

let model = get_model("llama-3.2-1b");

Model Requirements

GPU Memory

| Model Size | Minimum VRAM      |
|------------|-------------------|
| 1B         | 4 GB              |
| 3B         | 8 GB              |
| 7-8B       | 16 GB             |
| 13B        | 24 GB             |
| 70B        | 80 GB (multi-GPU) |
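
The table above can be encoded as a small lookup for pre-flight checks. This is a sketch only: the function names min_vram_gb and meets_minimum are hypothetical, and rounding in-between sizes up to the next row is an assumption, since the source lists only the sizes shown.

```rust
/// Minimum VRAM in GB for a model with `params_b` billion parameters,
/// taken from the GPU Memory table. Sizes that fall between rows are
/// rounded up to the next row (an assumption). Sizes above 70B are not
/// covered by the table and return None.
pub fn min_vram_gb(params_b: f64) -> Option<u32> {
    // (largest model size in billions of parameters, minimum VRAM in GB)
    const TABLE: [(f64, u32); 5] =
        [(1.0, 4), (3.0, 8), (8.0, 16), (13.0, 24), (70.0, 80)];
    // Reject zero, negative, and NaN inputs.
    if !(params_b > 0.0) {
        return None;
    }
    TABLE
        .iter()
        .find(|(max_b, _)| params_b <= *max_b)
        .map(|&(_, gb)| gb)
}

/// True when `total_vram_gb` meets the table's minimum for the model.
/// For the 70B row the table expects the total to be spread over several GPUs.
pub fn meets_minimum(params_b: f64, total_vram_gb: u32) -> bool {
    matches!(min_vram_gb(params_b), Some(need) if need <= total_vram_gb)
}
```

For example, meets_minimum(13.0, 16) is false because the 13B row asks for 24 GB, while meets_minimum(3.0, 8) is true.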

Quantization

Pie supports the following weight formats:

  • FP16 - Half precision (16-bit floating point, unquantized)
  • INT8 - 8-bit quantization
  • INT4 - 4-bit quantization (GPTQ, AWQ)
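
To see why the format matters for memory, the sketch below estimates the space taken by model weights alone as parameter count times bits per parameter. It is a rule-of-thumb calculation, not Pie's own accounting: it ignores the KV cache, activations, and per-format overhead such as quantization scales, and the source does not state which format its VRAM figures assume. The names Precision, weight_bytes, and weight_gb are hypothetical.

```rust
/// Weight formats named in this section.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Precision {
    Fp16,
    Int8,
    Int4,
}

impl Precision {
    /// Nominal storage per parameter, ignoring quantization metadata.
    pub fn bits_per_param(self) -> u64 {
        match self {
            Precision::Fp16 => 16,
            Precision::Int8 => 8,
            Precision::Int4 => 4,
        }
    }
}

/// Approximate bytes needed for the weights only.
pub fn weight_bytes(params: u64, precision: Precision) -> u64 {
    params * precision.bits_per_param() / 8
}

/// Same estimate in decimal gigabytes (1 GB = 10^9 bytes).
pub fn weight_gb(params: u64, precision: Precision) -> f64 {
    weight_bytes(params, precision) as f64 / 1e9
}
```

Under these assumptions an 8B-parameter model needs about 16 GB of weights in FP16, 8 GB in INT8, and 4 GB in INT4. Actual VRAM use will be higher once the cache and activations are included.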

Custom Models

To serve a custom model stored in Hugging Face format, pass its local path to --model:

pie serve --model /path/to/custom/model
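
Before pointing Pie at a directory, it can help to confirm that the directory looks like a Hugging Face model checkout. The sketch below checks for a config.json plus at least one weights file ending in .safetensors or .bin, which is the usual layout of such a directory. The exact files Pie requires are not stated in the source, so treat this as a sanity check under that assumption; looks_like_hf_model is a hypothetical helper, not a Pie API.

```rust
use std::fs;
use std::path::Path;

/// Heuristic check that `dir` resembles a Hugging Face model directory:
/// it must contain `config.json` and at least one weights file
/// (`*.safetensors` or `*.bin`). Assumption: Pie may need more or fewer files.
pub fn looks_like_hf_model(dir: &Path) -> bool {
    if !dir.join("config.json").is_file() {
        return false;
    }
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(_) => return false, // unreadable or missing directory
    };
    // Look for any regular file with a known weights extension.
    entries.filter_map(Result::ok).any(|entry| {
        let path = entry.path();
        path.is_file()
            && matches!(
                path.extension().and_then(|ext| ext.to_str()),
                Some("safetensors") | Some("bin")
            )
    })
}
```

A caller could run looks_like_hf_model(Path::new("/path/to/custom/model")) and only then invoke pie serve with the same path.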

Next Steps