POST /scoring_system/calibrate
import PiClient from 'withpi';

const client = new PiClient({
  apiKey: process.env['WITHPI_API_KEY'], // This is the default and can be omitted
});

async function main() {
  const scoringSpecCalibrationStatus = await client.scoringSystem.calibrate.startJob({
    examples: [
      { llm_input: 'good input', llm_output: 'good response', rating: 'Strongly Agree' },
      { llm_input: 'neutral input', llm_output: 'neutral response', rating: 'Neutral' },
    ],
    preference_examples: [
      { chosen: 'chosen response', llm_input: 'some input', rejected: 'rejected response' },
    ],
    scoring_spec: [{ question: 'Is this response truthful?' }, { question: 'Is this response relevant?' }],
  });

  console.log(scoringSpecCalibrationStatus.job_id);
}

main();
{
  "calibrated_scoring_spec": [
    {
      "custom_model_id": "your-model-id",
      "is_lower_score_desirable": "False",
      "label": "Relevance to Prompt",
      "parameters": [
        0.14285714285714285,
        0.2857142857142857,
        0.42857142857142855,
        0.5714285714285714,
        0.7142857142857143,
        0.8571428571428571
      ],
      "python_code": "\ndef score(response_text: str, input_text: str, kwargs: dict) -> dict:\n    word_count = len(response_text.split())\n    if word_count > 10:\n        return {\"score\": 0.2, \"explanation\": \"Response has more than 10 words\"}\n    elif word_count > 5:\n        return{\"score\": 0.6, \"explanation\": \"Response has more than 5 words\"}\n    else:\n        return {\"score\": 1, \"explanation\": \"Response has 5 or fewer words\"}\n",
      "question": "Is the response relevant to the prompt?",
      "scoring_type": "PI_SCORER",
      "tag": "Legal Formatting",
      "weight": 1
    }
  ],
  "detailed_status": [
    "Downloading model",
    "Tuning prompt"
  ],
  "job_id": "1234abcd",
  "state": "RUNNING"
}

Authorizations

x-api-key
string
header
required

Body

application/json
examples
object[]
required

Rated examples to use when calibrating the scoring spec. You must specify either examples or preference_examples

A labeled example for training or evaluation

Example:
[
  {
    "llm_input": "good input",
    "llm_output": "good response",
    "rating": "Strongly Agree"
  },
  {
    "llm_input": "neutral input",
    "llm_output": "neutral response",
    "rating": "Neutral"
  }
]
preference_examples
object[]
required

Preference examples to use when calibrating the scoring spec. You must specify either examples or preference_examples

A preference example for training or evaluation

Example:
[
  {
    "chosen": "chosen response",
    "llm_input": "some input",
    "rejected": "rejected response"
  }
]
scoring_spec
required

Either a scoring spec or a list of questions to score

Example:
[
  {
    "is_lower_score_desirable": false,
    "question": "Is this response truthful?"
  },
  {
    "is_lower_score_desirable": false,
    "question": "Is this response relevant?"
  }
]
strategy
enum<string>

The strategy to use when calibrating the scoring spec. FULL takes longer than LITE but may produce better results.

Available options:
LITE,
FULL

Response

200
application/json
Successful Response
detailed_status
string[]
required

Detailed status of the job

Example:
["Downloading model", "Tuning prompt"]
job_id
string
required

The job id

Example:

"1234abcd"

state
enum<string>
required

Current state of the job

Available options:
QUEUED,
RUNNING,
DONE,
ERROR,
CANCELLED
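
Since `startJob` returns immediately with a job in a non-terminal state, callers typically poll until the job reaches `DONE`, `ERROR`, or `CANCELLED`. Below is a minimal, SDK-agnostic polling sketch: `fetchState` is a hypothetical callback standing in for however you retrieve the job's current state with the returned `job_id` (it is not a documented client method); only the terminal-state names come from this endpoint's `state` enum.

```typescript
// Terminal states from this endpoint's `state` enum.
const TERMINAL_STATES = new Set(['DONE', 'ERROR', 'CANCELLED']);

// Poll `fetchState` until the job reaches a terminal state, then return it.
// `fetchState` is a placeholder for your own status lookup by job_id.
async function waitForJob(
  fetchState: () => Promise<string>,
  intervalMs = 2000,
): Promise<string> {
  for (;;) {
    const state = await fetchState();
    if (TERMINAL_STATES.has(state)) return state;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller would pass a closure over the job id, e.g. `waitForJob(() => lookUpState(jobId))`, and inspect `calibrated_scoring_spec` once the returned state is `DONE`.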
calibrated_scoring_spec
object[] | null

The calibrated scoring spec
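
Each entry in the calibrated spec carries a `weight` and an `is_lower_score_desirable` flag, as in the response example above. The sketch below shows one plausible client-side reading of those fields: a weighted average of per-question scores in [0, 1], flipping questions where a lower score is desirable. The `CalibratedQuestion` interface is a hypothetical subset of the response shape, and the Pi scorer performs its own aggregation server-side; this is only an illustration of what the fields mean.

```typescript
// Hypothetical subset of one calibrated question from the response above.
interface CalibratedQuestion {
  question: string;
  weight: number;
  is_lower_score_desirable?: boolean;
}

// Weighted average of per-question scores (each in [0, 1]); questions marked
// is_lower_score_desirable contribute (1 - score) so that higher is always better.
function aggregate(
  spec: CalibratedQuestion[],
  scores: Map<string, number>,
): number {
  let total = 0;
  let weightSum = 0;
  for (const q of spec) {
    const raw = scores.get(q.question) ?? 0;
    total += q.weight * (q.is_lower_score_desirable ? 1 - raw : raw);
    weightSum += q.weight;
  }
  return weightSum > 0 ? total / weightSum : 0;
}
```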