POST /scoring_system/calibrate
JavaScript
import PiClient from 'withpi';

const client = new PiClient({
  apiKey: 'My API Key', // sent as the x-api-key header on every request
});

// Start an asynchronous calibration job. The call returns immediately with a
// job id and state; the calibrated spec is available once the job is DONE.
const scoringSpecCalibrationStatus = await client.scoringSystem.calibrate.startJob({
  // Rated examples of inputs and outputs (a score field may also be included; see Body below)
  examples: [
    { llm_input: 'good input', llm_output: 'good response' },
    { llm_input: 'neutral input', llm_output: 'neutral response' },
  ],
  // Pairwise preferences between a chosen and a rejected response
  preference_examples: [
    { chosen: 'chosen response', llm_input: 'some input', rejected: 'rejected response' },
  ],
  // The questions (or an existing scoring spec) to calibrate
  scoring_spec: [{ question: 'Is this response truthful?' }, { question: 'Is this response relevant?' }],
});

console.log(scoringSpecCalibrationStatus.job_id);

Example response:
{
  "calibrated_scoring_spec": [
    {
      "custom_model_id": "your-model-id",
      "label": "Relevance to Prompt",
      "parameters": [
        0.14285714285714285,
        0.2857142857142857,
        0.42857142857142855,
        0.5714285714285714,
        0.7142857142857143,
        0.8571428571428571
      ],
      "python_code": "\ndef score(response_text: str, input_text: str, kwargs: dict) -> dict:\n    word_count = len(response_text.split())\n    if word_count > 10:\n        return {\"score\": 0.2, \"explanation\": \"Response has more than 10 words\"}\n    elif word_count > 5:\n        return{\"score\": 0.6, \"explanation\": \"Response has more than 5 words\"}\n    else:\n        return {\"score\": 1, \"explanation\": \"Response has 5 or fewer words\"}\n",
      "question": "Is the response relevant to the prompt?",
      "scoring_type": "PI_SCORER",
      "tag": "Legal Formatting",
      "weight": 1
    }
  ],
  "detailed_status": [
    "Downloading model",
    "Tuning prompt"
  ],
  "job_id": "1234abcd",
  "state": "RUNNING"
}

Authorizations

x-api-key
string · header
required

Body

application/json
examples
SDKLabeledExample · object[]
required

Rated examples to use when calibrating the scoring spec. You must specify either examples or preference_examples.

Examples:
[
  {
    "llm_input": "good input",
    "llm_output": "good response",
    "score": 0.9
  },
  {
    "llm_input": "neutral input",
    "llm_output": "neutral response",
    "score": 0.5
  }
]
preference_examples
SDKPreferenceExample · object[]
required

Preference examples to use when calibrating the scoring spec. You must specify either examples or preference_examples; a preference-only request is sketched after the example below.

Examples:
[
  {
    "chosen": "chosen response",
    "llm_input": "some input",
    "rejected": "rejected response"
  }
]
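
For a preference-only calibration, omit examples entirely. A minimal sketch, assuming the either/or rule described above; the field values are illustrative:

JavaScript
const status = await client.scoringSystem.calibrate.startJob({
  // No rated examples; only pairwise preferences drive the calibration.
  preference_examples: [
    { chosen: 'concise answer', llm_input: 'some input', rejected: 'rambling answer' },
  ],
  scoring_spec: [{ question: 'Is this response concise?' }],
});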
scoring_spec
Question · object[]
required

Either a scoring spec or a list of questions to score

Examples:
[
{ "question": "Is this response truthful?" },
{ "question": "Is this response relevant?" }
]
strategy
enum<string>

The strategy used to calibrate the scoring spec. FULL takes longer than LITE but may produce a better result.

Available options:
LITE,
FULL
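
For example, to opt into the slower, more thorough calibration (a sketch; the rated example values are illustrative):

JavaScript
const status = await client.scoringSystem.calibrate.startJob({
  examples: [{ llm_input: 'good input', llm_output: 'good response', score: 0.9 }],
  scoring_spec: [{ question: 'Is this response truthful?' }],
  strategy: 'FULL', // slower than 'LITE' but may calibrate better
});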

Response

Successful Response

detailed_status
string[]
required

Detailed status of the job

Examples:
["Downloading model", "Tuning prompt"]
job_id
string
required

The job id

Examples:
"1234abcd"

state
enum<string>
required

Current state of the job

Available options:
QUEUED,
RUNNING,
DONE,
ERROR,
CANCELLED
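
Calibration runs asynchronously, so clients typically poll until the job leaves the QUEUED and RUNNING states. A minimal sketch; the retrieve method name below is an assumption, so check the SDK reference for the exact call:

JavaScript
let status = await client.scoringSystem.calibrate.startJob({ /* ... */ });
while (status.state === 'QUEUED' || status.state === 'RUNNING') {
  await new Promise((resolve) => setTimeout(resolve, 5000)); // wait 5s between polls
  // Assumed job-status lookup by id; verify the method name in the SDK reference.
  status = await client.scoringSystem.calibrate.retrieve(status.job_id);
  console.log(status.detailed_status.at(-1)); // latest progress message
}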
calibrated_scoring_spec
Question · object[] | null

The calibrated scoring spec
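
Once the job state is DONE, each entry carries the calibrated weight and scorer details shown in the response example above. A sketch of reading them, with field names taken from that example:

JavaScript
for (const q of status.calibrated_scoring_spec ?? []) {
  console.log(q.question, q.scoring_type, q.weight);
  // e.g. 'Is the response relevant to the prompt?' 'PI_SCORER' 1
}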
