GRPO (Group Relative Policy Optimization) enables tuning a large language model using a robust reward function, without the need for labeled training data.
This notebook demonstrates how to use Pi as the reward function with the Unsloth toolkit.
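As a rough illustration of how a scorer can act as a GRPO reward, the sketch below wraps a scoring function in the batch shape that GRPO-style trainers commonly expect (prompts and completions in, one float per completion out) and then computes group-relative advantages. The names `make_reward_func`, `group_relative_advantages`, and `toy_scorer` are hypothetical, and `toy_scorer` is only a stand-in for a real call to Pi. Completions are assumed to be plain strings, and the normalization uses the population standard deviation, which may differ from what a given trainer does internally.

```python
from statistics import mean, pstdev
from typing import Callable, List

# Hypothetical scorer type: (prompt, completion) -> quality score in [0, 1].
Scorer = Callable[[str, str], float]


def make_reward_func(scorer: Scorer):
    """Wrap a per-example scorer as a batch reward function."""
    def reward_func(prompts: List[str], completions: List[str], **kwargs) -> List[float]:
        # One reward per (prompt, completion) pair; extra trainer kwargs are ignored.
        return [float(scorer(p, c)) for p, c in zip(prompts, completions)]
    return reward_func


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_scorer(prompt: str, completion: str) -> float:
    # Placeholder for a Pi scoring call: favors longer answers, capped at 1.0.
    return min(len(completion.split()) / 10.0, 1.0)


reward_func = make_reward_func(toy_scorer)
prompts = ["Explain GRPO."] * 3
completions = [
    "It is RL.",
    "GRPO compares sampled completions within a group.",
    "",
]
rewards = reward_func(prompts, completions)
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Because the advantages are computed relative to the other completions in the group, only the ranking and spread of the scores matter, which is why no labeled reference answers are needed.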
Helicone is an open-source observability platform for LLM applications.
Pi can serve as a custom metric within the platform, so you can annotate response quality in addition to performance.