opik

Opik is an open-source LLM evaluation framework designed to help build better, faster, and cheaper LLM systems through tracing, evaluations, and dashboards for RAG chatbots, code assistants, and agentic pipelines.

12,031

413

Opik: Open Source LLM Evaluation Framework

Opik is an open-source platform, built by Comet, for evaluating, testing, and monitoring LLM applications. It helps teams build better, faster, and cheaper LLM systems through tracing, evaluations, and dashboards.

Opik offers features for development, evaluation, and production monitoring. For development, it provides tracing, annotations through the Python SDK or the UI, and a prompt playground. For evaluation, it provides datasets and experiments, LLM-as-a-judge metrics for hallucination detection and RAG evaluation, and CI/CD integration via PyTest. For production monitoring, it supports logging production traces, monitoring dashboards, and online evaluation metrics.

Opik can be used through a free Comet account or self-hosted with Docker Compose. The Python SDK is installed with pip install opik and configured with opik configure. Opik integrates with OpenAI, LangChain, and other frameworks, and provides a track decorator for logging traces.
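
As a rough illustration of the track decorator mentioned above, the sketch below decorates two plain Python functions. It assumes the SDK has been installed and configured as described. The function names, the placeholder retrieval step, and the no-op fallback used when the SDK is not installed are hypothetical and are not part of Opik.

```python
# Minimal sketch, assuming `pip install opik` and `opik configure` have been run.
# If the SDK is not installed, a no-op stand-in keeps the example runnable.
try:
    from opik import track  # the track decorator named in the text
except ImportError:
    def track(func):  # hypothetical fallback, NOT part of Opik
        return func


@track
def retrieve_context(question: str) -> list[str]:
    # Placeholder retrieval step; a real app would query a document store.
    return ["Opik is built by Comet."]


@track
def answer_question(question: str) -> str:
    # Placeholder for an LLM call. The nested tracked call is expected to be
    # logged as part of the same trace (an assumption about SDK behavior).
    context = retrieve_context(question)
    return f"Based on {len(context)} document(s): {context[0]}"


if __name__ == "__main__":
    print(answer_question("Who builds Opik?"))
```

The decorated functions behave like ordinary functions and return the same values. Where the traces are sent depends on the settings chosen during opik configure.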

Repository

comet-ml

comet-ml/opik

Created

May 10, 2023

Updated

March 28, 2025

Language

Python