Braintrust

Open in Colab

Braintrust is a platform for evaluating LLM applications. This Colab notebook shows how to use Pi as a custom scorer within a Braintrust Eval.
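As a rough illustration of what a custom scorer looks like, the sketch below defines a plain Python function that takes the eval input, the model output, and an optional expected value, and returns a score between 0 and 1. The function `placeholder_pi_score` is a hypothetical stand-in for a real Pi call and is not the Pi API; the notebook linked above shows the actual integration.

```python
from typing import Optional


def placeholder_pi_score(llm_input: str, llm_output: str) -> float:
    """Hypothetical stand-in for a Pi score in [0, 1]. NOT the Pi API.

    Toy heuristic: fraction of input words that reappear in the output.
    """
    input_words = set(llm_input.lower().split())
    output_words = set(llm_output.lower().split())
    if not input_words:
        return 0.0
    return len(input_words & output_words) / len(input_words)


def pi_scorer(input: str, output: str, expected: Optional[str] = None) -> float:
    """Scorer in the (input, output, expected) shape used by eval frameworks.

    `expected` is unused because the score here does not rely on a
    reference answer.
    """
    score = placeholder_pi_score(input, output)
    # Clamp defensively so the scorer always returns a value in [0, 1].
    return max(0.0, min(1.0, score))
```

A function of this shape would typically be listed among the scorers of an eval, with the placeholder swapped for a real scoring call.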

Langfuse

Open in Colab

Langfuse provides a set of tools for building LLM applications. Pi can serve as a scoring function within Langfuse.

Promptfoo

Open in Colab

Promptfoo focuses on evaluating guardrails for LLM applications. Pi can serve as a custom metric within the platform.

LangSmith

Open in Colab

LangSmith by LangChain helps you build orchestrated applications. This notebook demonstrates how to use Pi as a LangSmith evaluator.

SFT

Open in Colab

This notebook demonstrates how to use Pi to evaluate a model checkpoint produced by supervised fine-tuning (SFT) with the Unsloth toolkit.

GRPO

Open in Colab

GRPO (Group Relative Policy Optimization) tunes a large language model against a reward function, without the need for labeled training data. This notebook demonstrates how to use Pi as the reward function with the Unsloth toolkit.
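To make the role of the reward function concrete, here is a minimal sketch of the group-relative step at the core of GRPO: several completions are sampled for one prompt, each is scored, and the scores are normalized within the group to produce advantages. `placeholder_reward` is a hypothetical stand-in for a Pi score and is not the Pi API; the population standard deviation and the `eps` constant are assumptions, and real trainers may differ in these details.

```python
from statistics import mean, pstdev
from typing import Callable, List


def placeholder_reward(prompt: str, completion: str) -> float:
    """Hypothetical stand-in for a Pi score in [0, 1]. NOT the Pi API.

    Toy heuristic: fraction of prompt words that appear in the completion.
    """
    prompt_words = set(prompt.lower().split())
    completion_words = set(completion.lower().split())
    if not prompt_words:
        return 0.0
    return len(prompt_words & completion_words) / len(prompt_words)


def group_relative_advantages(
    prompt: str,
    completions: List[str],
    reward_fn: Callable[[str, str], float],
    eps: float = 1e-6,
) -> List[float]:
    """Score each completion, then normalize rewards within the group."""
    rewards = [reward_fn(prompt, c) for c in completions]
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    group = ["gradient descent explain steps", "hello", "gradient only"]
    print(group_relative_advantages("explain gradient descent", group, placeholder_reward))
```

No labels are involved: completions that score above the group average receive positive advantages and are reinforced, while those below it are discouraged.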

Helicone

See webhook integration examples

Helicone is an open-source observability platform for LLM applications. Pi can serve as a custom metric within the platform, so you can annotate quality in addition to performance.