Braintrust

Open in Colab

Braintrust is a platform for evaluating and monitoring LLM applications. This Colab notebook shows how to use Pi as a custom scorer within a Braintrust Eval.
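A Braintrust custom scorer can be a plain function that receives the eval case's input, output, and expected values as keyword arguments and returns a score between 0 and 1. The sketch below assumes that convention. It wraps a hypothetical pi_score helper, which stands in for the actual Pi API call made in the notebook, in that shape. The commented-out Eval call shows where the scorer would be registered if the braintrust package is installed; the project name and my_app are placeholders.

```python
def pi_score(llm_input: str, llm_output: str) -> float:
    """Hypothetical stand-in for a Pi scoring call returning a value in [0, 1].

    The word-count heuristic only makes this sketch runnable; it is not how Pi
    scores. Replace it with the real Pi client call.
    """
    return min(len(llm_output.split()) / 50, 1.0)


def pi_scorer(input, output, expected=None, **kwargs):
    """Braintrust-style custom scorer: keyword arguments in, float in [0, 1] out."""
    # Pi grades the response against the prompt, so `expected` is not used here.
    return pi_score(str(input), str(output))


# Registering the scorer (requires `pip install braintrust` and an API key):
# from braintrust import Eval
# Eval(
#     "pi-demo",                                   # placeholder project name
#     data=lambda: [{"input": "Explain GRPO in one sentence.", "expected": None}],
#     task=lambda input: my_app(input),            # my_app is hypothetical
#     scores=[pi_scorer],
# )

if __name__ == "__main__":
    print(pi_scorer(input="Say hi", output="Hi there!"))
```

Because Pi judges a response from the prompt alone, the scorer ignores expected, so the eval dataset does not need reference answers.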

Langfuse

Open in Colab

Langfuse is an open-source platform for tracing, evaluating, and monitoring LLM applications.
Pi can serve as a scoring function within Langfuse.

Promptfoo

Open in Colab

Promptfoo is a tool for testing and evaluating LLM applications, with a focus on guardrails. Pi can serve as a custom metric within Promptfoo.
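Promptfoo can run a Python file as an assertion: the file defines get_assert(output, context), and returning a dict with pass, score, and reason reports a graded result. The sketch below assumes that convention and reads the prompt from context["prompt"]. The pi_score helper is a hypothetical stand-in for the Pi API call, and the 0.5 pass threshold is an arbitrary choice for illustration.

```python
# pi_assert.py, referenced from promptfooconfig.yaml, for example:
#   assert:
#     - type: python
#       value: file://pi_assert.py
from typing import Any, Dict

PASS_THRESHOLD = 0.5  # arbitrary cut-off chosen for this sketch


def pi_score(llm_input: str, llm_output: str) -> float:
    """Hypothetical stand-in for a Pi scoring call returning a value in [0, 1].

    The word-count heuristic only makes this sketch runnable; it is not how Pi scores.
    """
    return min(len(llm_output.split()) / 50, 1.0)


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Promptfoo Python assertion hook: grade the model output with Pi."""
    prompt = str(context.get("prompt", ""))
    score = pi_score(prompt, output)
    return {
        "pass": score >= PASS_THRESHOLD,
        "score": score,
        "reason": f"Pi score {score:.2f} (threshold {PASS_THRESHOLD})",
    }
```

Returning a continuous score alongside the pass/fail flag lets Promptfoo report how far each output sits from the threshold rather than only whether it cleared it.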

LangSmith

Open in Colab

LangSmith, from LangChain, is a platform for tracing, debugging, and evaluating LLM applications. This notebook demonstrates how to use Pi as a LangSmith evaluator.
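Recent versions of the LangSmith SDK (0.2 and later) accept evaluators written as plain functions whose parameters are named inputs and outputs, returning a dict with a key and a score. The sketch below assumes that convention. The field names "question" and "answer" are hypothetical and depend on your dataset and target function. The pi_score helper is a hypothetical stand-in for the Pi API call.

```python
def pi_score(llm_input: str, llm_output: str) -> float:
    """Hypothetical stand-in for a Pi scoring call returning a value in [0, 1].

    The word-count heuristic only makes this sketch runnable; it is not how Pi scores.
    """
    return min(len(llm_output.split()) / 50, 1.0)


def pi_evaluator(inputs: dict, outputs: dict) -> dict:
    """LangSmith-style evaluator: score the application's answer with Pi.

    Assumes dataset examples carry a "question" input and the target returns
    {"answer": ...}; both field names are hypothetical.
    """
    score = pi_score(inputs.get("question", ""), outputs.get("answer", ""))
    return {"key": "pi_score", "score": score}


# With the langsmith SDK installed and LANGSMITH_API_KEY set:
# from langsmith.evaluation import evaluate
# evaluate(my_app, data="my-dataset", evaluators=[pi_evaluator])
# (my_app and "my-dataset" are placeholders)
```

The evaluator never looks at reference outputs, so it can run on datasets that contain only inputs.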

SFT

Open in Colab

This notebook demonstrates how to use Pi to evaluate a model checkpoint produced by supervised fine-tuning (SFT) with the Unsloth toolkit.
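One way to use Pi on a checkpoint is to generate a response for each held-out prompt and aggregate the scores, so checkpoints (for example, before and after SFT) can be compared on the same prompts. The sketch below is a minimal version of that loop. The generate callables are placeholders for generation from a model loaded with Unsloth, and pi_score is a hypothetical stand-in for the Pi API call.

```python
from statistics import mean
from typing import Callable, Dict, List


def pi_score(llm_input: str, llm_output: str) -> float:
    """Hypothetical stand-in for a Pi scoring call returning a value in [0, 1].

    The word-count heuristic only makes this sketch runnable; it is not how Pi scores.
    """
    return min(len(llm_output.split()) / 50, 1.0)


def evaluate_checkpoint(
    generate: Callable[[str], str], prompts: List[str]
) -> Dict[str, float]:
    """Generate one response per held-out prompt and summarize the Pi scores."""
    scores = [pi_score(prompt, generate(prompt)) for prompt in prompts]
    return {"mean": mean(scores), "min": min(scores), "max": max(scores)}


if __name__ == "__main__":
    # Placeholders: in the notebook, each callable would wrap generation from a
    # checkpoint loaded with Unsloth (e.g. the base model and the SFT model).
    base_model = lambda prompt: "ok"
    tuned_model = lambda prompt: "A longer, more detailed answer to: " + prompt
    held_out = ["What is SFT?", "Summarize GRPO."]
    print("base :", evaluate_checkpoint(base_model, held_out))
    print("tuned:", evaluate_checkpoint(tuned_model, held_out))
```

Reporting the minimum alongside the mean helps surface prompts where a checkpoint regresses even when its average improves.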

GRPO

Open in Colab

GRPO (Group Relative Policy Optimization) tunes a large language model against a reward function instead of labeled target outputs, so it needs no labeled training data as long as the reward function is reliable. This notebook demonstrates how to use Pi as the reward function with the Unsloth toolkit.
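Unsloth's GRPO training builds on TRL's GRPOTrainer, which calls each reward function with the batch's prompts and completions (other dataset columns arrive as keyword arguments) and expects one float per completion. GRPO then samples several completions per prompt and scores each one relative to the rest of its group. The sketch below shows both steps under those assumptions. The pi_score helper is a hypothetical stand-in for the Pi API call, and the choice of population standard deviation and eps = 1e-4 is illustrative; implementations differ in these details.

```python
from statistics import mean, pstdev
from typing import Any, List


def pi_score(llm_input: str, llm_output: str) -> float:
    """Hypothetical stand-in for a Pi scoring call returning a value in [0, 1].

    The word-count heuristic only makes this sketch runnable; it is not how Pi scores.
    """
    return min(len(llm_output.split()) / 50, 1.0)


def _text(item: Any) -> str:
    """Accept a plain string or a chat-format list of {"role", "content"} messages."""
    if isinstance(item, str):
        return item
    return item[-1]["content"]  # last message: the user turn or the assistant reply


def pi_reward(prompts: List[Any], completions: List[Any], **kwargs) -> List[float]:
    """Reward function shaped like those GRPOTrainer calls: one float per completion."""
    return [pi_score(_text(p), _text(c)) for p, c in zip(prompts, completions)]


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO's core step: normalize each reward against its own group.

    Uses the population standard deviation; eps avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    prompt = "Explain GRPO briefly."
    group = [  # several sampled completions for the same prompt
        "Short.",
        "A somewhat longer answer here.",
        "An even longer and more thorough answer about GRPO.",
    ]
    rewards = pi_reward([prompt] * len(group), group)
    print("rewards   :", rewards)
    print("advantages:", group_relative_advantages(rewards))
```

Because advantages are computed within each group, only the relative ordering of Pi's scores for completions of the same prompt drives the update, which is why no labeled reference answers are needed.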