Last Updated: 4/8/2026


Welcome to Pie 🥧

Pie is a programmable system for LLM serving that lets you control the serving loop itself—not just send prompts and wait for responses.

Why Pie?

Modern AI applications need more than simple text completion. They require:

  • Fine-grained control over inference and caching
  • Integrated I/O without round-trip latency
  • Custom generation workflows for agentic patterns

Pie makes all of this possible through programmable inferlets.


⚡ Quick Example

Here’s a simple inferlet in Rust:

use inferlet::{Args, Result, Sampler, get_auto_model};
use inferlet::stop_condition::{max_len, ends_with_any};

#[inferlet::main]
async fn main(mut args: Args) -> Result<String> {
    let model = get_auto_model();
    let mut ctx = model.create_context();

    ctx.fill_system("You are a helpful assistant.");
    ctx.fill_user("How are you?");

    let sampler = Sampler::top_p(0.6, 0.95);
    let stop = max_len(100).or(ends_with_any(model.eos_tokens()));

    let response = ctx.generate(sampler, stop).await;
    Ok(response)
}

Compile to WebAssembly and run:

pie run my-inferlet -- --prompt "Hello!"
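
The quick example builds its stop condition by chaining two rules with `.or(...)`. To illustrate how such combinators can compose, here is a minimal sketch in plain Rust. It is not the `inferlet` implementation: the `StopCondition` trait and the `MaxLen`, `EndsWithAny`, and `Or` types are hypothetical names chosen for this illustration, and token ids are assumed to be `u32`.

```rust
/// Hypothetical stop-condition trait (illustration only, not the inferlet API).
pub trait StopCondition {
    /// Returns true when generation should stop, given the tokens produced so far.
    fn should_stop(&self, tokens: &[u32]) -> bool;

    /// Combines two conditions: stop as soon as either one fires.
    fn or<B: StopCondition>(self, other: B) -> Or<Self, B>
    where
        Self: Sized,
    {
        Or { a: self, b: other }
    }
}

/// Stops once at least this many tokens have been generated.
pub struct MaxLen(pub usize);

impl StopCondition for MaxLen {
    fn should_stop(&self, tokens: &[u32]) -> bool {
        tokens.len() >= self.0
    }
}

/// Stops when the output ends with any of the given token sequences.
pub struct EndsWithAny(pub Vec<Vec<u32>>);

impl StopCondition for EndsWithAny {
    fn should_stop(&self, tokens: &[u32]) -> bool {
        self.0
            .iter()
            .any(|suffix| !suffix.is_empty() && tokens.ends_with(suffix))
    }
}

/// Logical OR of two stop conditions.
pub struct Or<A, B> {
    a: A,
    b: B,
}

impl<A: StopCondition, B: StopCondition> StopCondition for Or<A, B> {
    fn should_stop(&self, tokens: &[u32]) -> bool {
        self.a.should_stop(tokens) || self.b.should_stop(tokens)
    }
}
```

With these definitions, `MaxLen(100).or(EndsWithAny(vec![vec![2]]))` stops after 100 tokens or as soon as the output ends with token id 2, whichever comes first. The id 2 is only a placeholder for an end-of-sequence token.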

🚀 Key Features

Programmable Serving

Write custom decoding logic and resource management policies in your inferlets. Control KV cache, embeddings, and generation at a fine-grained level.
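
As one concrete instance of decoding logic, the sketch below implements the filtering step of nucleus (top-p) sampling in plain Rust: it keeps the smallest set of highest-probability tokens whose cumulative probability reaches the threshold. The function name `top_p_filter` and its signature are assumptions made for this illustration and are not part of Pie's API. A real sampler would then renormalize the kept probabilities and draw a token from them.

```rust
/// Returns the token ids kept by top-p filtering, in descending probability order.
/// `probs` is assumed to be a normalized probability distribution over token ids.
/// (Hypothetical helper for illustration, not part of the inferlet API.)
pub fn top_p_filter(probs: &[f32], top_p: f32) -> Vec<usize> {
    // Sort token ids from most to least probable (stable sort keeps ties in id order).
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));

    // Keep tokens until their cumulative probability reaches the threshold.
    let mut kept = Vec::new();
    let mut cumulative = 0.0f32;
    for id in order {
        kept.push(id);
        cumulative += probs[id];
        if cumulative >= top_p {
            break;
        }
    }
    kept
}
```

For the distribution `[0.5, 0.25, 0.125, 0.125]` and a threshold of `0.7`, the function keeps token ids 0 and 1, because their combined probability of 0.75 is the first cumulative sum to reach the threshold.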

Application-Aware Optimization

Optimize for your specific use case—whether it’s tree-of-thought reasoning, multi-agent systems, or custom workflows.

Integrated Computation & I/O

Call external APIs, run code, and manage data directly in your serving loop without extra round-trips.

Multi-Language Support

Write inferlets in Rust, C++, Go, or any language that compiles to WebAssembly.

Framework Agnostic

Client libraries for Python, JavaScript, and Rust make it easy to integrate with any stack.


📖 Getting Started

Ready to dive in? Here’s your path:

  1. Installation - Get Pie installed on your system
  2. Quickstart Tutorial - Build your first inferlet in 5 minutes
  3. Core Concepts - Understand Pie’s architecture and design
  4. Client SDKs - Integrate Pie into your applications

🎯 Use Cases

Pie excels at:

  • Agentic Workflows - Multi-step reasoning with tool use
  • Custom Decoding - Speculative decoding, beam search, MCTS
  • Interactive Applications - Chat, code completion, real-time generation
  • Research - Experiment with novel serving strategies
  • Production Systems - High-throughput, low-latency inference

🌟 Performance

Pie delivers significant performance improvements for complex workflows:

  • Lower latency through application-aware KV cache management
  • Higher throughput with fine-grained resource control
  • Better efficiency by eliminating unnecessary round-trips

Benchmarked on Llama 3.2 1B using an L40 GPU.
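
To give a rough intuition for why cache awareness can reduce latency, the sketch below shows the general idea of prefix reuse: when a new request shares leading tokens with a sequence whose KV entries are already cached, only the remaining tokens need a forward pass. This is a generic illustration in plain Rust, not Pie's cache mechanism. Both function names are hypothetical, and token ids are assumed to be `u32`.

```rust
/// Number of leading tokens that two token sequences have in common.
pub fn shared_prefix_len(a: &[u32], b: &[u32]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

/// Tokens of `request` that still need to be computed if the KV entries for
/// the prefix shared with `cached` are reused. (Hypothetical helper.)
pub fn tokens_to_compute<'a>(cached: &[u32], request: &'a [u32]) -> &'a [u32] {
    &request[shared_prefix_len(cached, request)..]
}
```

If the cached sequence is `[1, 2, 3, 4]` and the request is `[1, 2, 3, 9, 10]`, the first three tokens are shared, so only `[9, 10]` remains to be processed.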


🧑‍💻 Community


📚 Documentation Sections

For New Users

For Developers

Advanced Topics


📄 Research

Pie is backed by academic research.


Resource        Description
Installation    Install Pie on your system
Quickstart      Build your first inferlet
Python Client   Python SDK documentation
CLI Reference   Command-line interface guide
Examples        Sample inferlets and code

Ready to get started? Head to the Installation Guide or jump into the Quickstart Tutorial!