<div align='center'>
<h1>
  The Easiest Way to Deploy Agents, MCP Servers, RAG, Pipelines, and Any Model.
  <br/>
  No MLOps. No YAML.
</h1>
<img alt="Lightning" src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/ls_banner2.png" width="800px" style="max-width: 100%;">
</div>
LitServe lets you serve any model (vision, audio, text) and build full AI systems - agents, chatbots, MCP servers, RAG, pipelines - with full control: batching, multi-GPU, streaming, custom logic, and multi-model support, all without YAML. Unlike most serving engines, which serve one model behind rigid abstractions, LitServe gives you the flexibility to build complex AI systems.

Self-host, or deploy in one click to [Lightning AI](https://lightning.ai/).
<div align='center'>
✅ Build full AI systems   ✅ 2× faster than FastAPI     ✅ Agents, RAG, pipelines, more
✅ Custom logic + control  ✅ Any PyTorch model          ✅ Self-host or managed
✅ Multi-GPU autoscaling   ✅ Batching + streaming       ✅ BYO model or vLLM
✅ No MLOps glue code      ✅ Easy setup in Python       ✅ Serverless support
<div align='center'>
[Downloads](https://pepy.tech/projects/litserve) •
[Discord](https://discord.gg/WajDThKAur) •
[Codecov](https://codecov.io/gh/Lightning-AI/litserve) •
[License](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)
</div>
</div>
<div align="center">
  <div style="text-align: center;">
    <a href="#quick-start">Quick Start</a> •
    <a href="#featured-examples">Examples</a> •
    <a href="#features">Features</a> •
    <a href="#performance">Performance</a> •
    <a href="#host-anywhere">Hosting</a> •
    <a href="https://lightning.ai/docs/litserve">Docs</a>
  </div>
</div>
<div align="center">
<a href="https://lightning.ai/docs/litserve/home/get-started">
  <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/get-started-badge.svg" height="36px" alt="Get started"/>
</a>
</div>
## Quick Start
Install LitServe via pip ([more options](https://lightning.ai/docs/litserve/home/install)):
```bash
pip install litserve
```

Example:

```python
import litserve as ls
# Define the API to include any number of models, databases, etc.
class InferencePipeline(ls.LitAPI):
    def setup(self, device):
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3
    def predict(self, request):
        x = request["input"]
        # Perform calculations using both models
        a = self.model1(x)
        b = self.model2(x)
        c = a + b
        return {"output": c}
if __name__ == "__main__":
    # 12+ features like batching, streaming, etc.
    server = ls.LitServer(InferencePipeline(max_batch_size=1), accelerator="auto")
    server.run(port=8000)
```

Deploy for free to Lightning cloud (or self-host anywhere):

```bash
# Deploy for free with autoscaling, monitoring, etc.
lightning deploy server.py --cloud
# Or run locally (self host anywhere)
lightning deploy server.py
# python server.py
```

Test the server: simulate an HTTP request (run this in any terminal):

```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
```
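You can also hit the endpoint programmatically. A minimal Python client sketch (assumes the toy server above is running locally; `requests` is the only dependency):

```python
import requests

# POST one input to the /predict endpoint exposed by LitServer
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"input": 4.0},
    timeout=10,
)
response.raise_for_status()
print(response.json())  # {'output': 80.0}, since 4**2 + 4**3 = 80
```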
AI agent example (fetches a webpage and asks an LLM to summarize the latest news):

```python
import re, requests, openai
import litserve as ls
class NewsAgent(ls.LitAPI):
    def setup(self, device):
        self.openai_client = openai.OpenAI(api_key="OPENAI_API_KEY")
    def predict(self, request):
        website_url = request.get("website_url", "https://text.npr.org/")
        website_text = re.sub(r'<[^>]+>', ' ', requests.get(website_url).text)
        # Ask the LLM to tell you about the news
        llm_response = self.openai_client.chat.completions.create(
           model="gpt-3.5-turbo",
           messages=[{"role": "user", "content": f"Based on this, what is the latest: {website_text}"}],
        )
        output = llm_response.choices[0].message.content.strip()
        return {"output": output}
if __name__ == "__main__":
    server = ls.LitServer(NewsAgent())
    server.run(port=8000)
```

Test it:

```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"website_url": "https://text.npr.org/"}'
```
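The same `LitAPI` pattern extends to streaming (for example, relaying LLM tokens as they are generated). A minimal sketch, assuming the generator-style `predict` and the `stream=True` server flag shown in the LitServe streaming docs; treat the flag placement as an assumption to verify for your version:

```python
import litserve as ls

class StreamingPipeline(ls.LitAPI):
    def setup(self, device):
        # Toy "model" that emits partial results one at a time
        self.model = lambda x: (x * i for i in range(5))

    def predict(self, request):
        # Yield chunks instead of returning a single payload
        for chunk in self.model(request["input"]):
            yield {"output": chunk}

if __name__ == "__main__":
    # stream=True switches the endpoint to streaming responses
    # (assumption: flag lives on LitServer, per the streaming docs)
    server = ls.LitServer(StreamingPipeline(), stream=True)
    server.run(port=8000)
```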
Here are a few key benefits of using LitServe:

- **Easy setup:** connect models, databases, and data in a few lines with `setup()` (more).

> ⚠️ Not a vLLM or Ollama alternative out of the box. LitServe gives you lower-level flexibility to build what they do (and more) if you need it.
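That lower-level flexibility mostly lives in the `LitAPI` hooks. A hedged sketch of batched serving: `decode_request`/`encode_response` are standard `LitAPI` hooks and the quick-start example already passes `max_batch_size` to the API, but the `batch_timeout` argument and the list-based default collation are assumptions to check against the batching docs:

```python
import litserve as ls

class BatchedPipeline(ls.LitAPI):
    def setup(self, device):
        # Toy model that scores a whole batch at once
        self.model = lambda xs: [x ** 2 + x ** 3 for x in xs]

    def decode_request(self, request):
        # Runs per request: pull the raw input out of the JSON body
        return request["input"]

    def predict(self, inputs):
        # With max_batch_size > 1, concurrent requests arrive together
        # (assumption: default collation delivers a Python list)
        return self.model(inputs)

    def encode_response(self, output):
        # Runs per request again, after the batch is split back apart
        return {"output": output}

if __name__ == "__main__":
    # batch_timeout caps how long to wait while filling a batch (assumption)
    api = BatchedPipeline(max_batch_size=8, batch_timeout=0.05)
    server = ls.LitServer(api, accelerator="auto")
    server.run(port=8000)
```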
## Featured Examples

Here are examples of inference pipelines for common model types and use cases.
**Toy model:**      <a href="#define-a-server">Hello world</a><br/>
**LLMs:**           <a href="https://lightning.ai/lightning-ai/studios/deploy-llama-3-2-vision-with-litserve">Llama 3.2</a>, <a href="https://lightning.ai/lightning-ai/studios/openai-fault-tolerant-proxy-server">LLM Proxy server</a>, <a href="https://lightning.ai/lightning-ai/studios/deploy-ai-agent-with-tool-use">Agent with tool use</a><br/>
**RAG:**            <a href="https://lightning.ai/lightning