Start Job
Starts a Scoring Spec Calibration job
import PiClient from 'withpi';

const client = new PiClient({
  apiKey: process.env['WITHPI_API_KEY'], // This is the default and can be omitted
});

async function main() {
  // Start an asynchronous calibration job for the scoring spec below.
  const response = await client.scoringSystem.calibrate.startJob({
    scoring_spec: {
      description: "Write a children's story communicating a simple life lesson.",
      dimensions: [
        {
          description: 'dimension1 description',
          label: 'dimension1',
          sub_dimensions: [
            { description: 'subdimension1 description', label: 'subdimension1', scoring_type: 'PI_SCORER' },
          ],
        },
      ],
      name: 'Sample Scoring Spec',
    },
  });

  // The job runs in the background; keep the job_id to check on it later.
  console.log(response.job_id);
}

main();
{
"calibrated_scoring_spec": {
"description": "Write a children's story communicating a simple life lesson.",
"dimensions": [
{
"description": "dimension1 description",
"label": "dimension1",
"sub_dimensions": [
{
"description": "subdimension1 description",
"label": "subdimension1",
"scoring_type": "PI_SCORER"
}
]
}
],
"name": "Sample Scoring Spec"
},
"detailed_status": [
"Downloading model",
"Tuning prompt"
],
"job_id": "1234abcd",
"state": "RUNNING"
}
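Calibration runs asynchronously: the call returns immediately with a job_id while the job moves through the states listed under Response below. A minimal polling sketch follows; the retrieve method name is an assumption here, so verify it against the SDK's generated surface before relying on it.

import PiClient from 'withpi';

const client = new PiClient({ apiKey: process.env['WITHPI_API_KEY'] });

// Poll the calibration job until it leaves the QUEUED/RUNNING states.
// NOTE: `retrieve` is an assumed method name for fetching job status.
async function waitForCalibration(jobId: string) {
  for (;;) {
    const job = await client.scoringSystem.calibrate.retrieve(jobId);
    console.log(job.state, job.detailed_status);
    if (job.state !== 'QUEUED' && job.state !== 'RUNNING') {
      return job; // DONE, ERROR, or CANCELLED
    }
    await new Promise((resolve) => setTimeout(resolve, 5_000)); // wait 5s between polls
  }
}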
Authorizations
Requests are authenticated with your Pi API key; the client reads it from the WITHPI_API_KEY environment variable by default, as in the example above.
Body
The scoring spec to calibrate
The application description
"Write a children's story communicating a simple life lesson."
The dimensions of the scoring spec
The description of the dimension
"Relevance of the response"
The label of the dimension
"Relevance"
The sub dimensions of the dimension
The description of the subdimension
"Is the response relevant to the prompt?"
The label of the subdimension
"Relevance to Prompt"
The type of scoring performed for this dimension
PI_SCORER, PYTHON_CODE, CUSTOM_MODEL_SCORER
The ID of the custom model to use for scoring. Only relevant when scoring_type is CUSTOM_MODEL_SCORER
"your-model-id"
The learned parameters for the scoring method. These represent a piecewise linear interpolation on [0, 1].
[
0.14285714285714285,
0.2857142857142857,
0.42857142857142855,
0.5714285714285714,
0.7142857142857143,
0.8571428571428571
]
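To make the representation concrete, here is a sketch of how such parameters could be applied to a raw score, assuming the values are y-values at evenly spaced x-positions across [0, 1]. This illustrates the description above; it is not the service's confirmed internal implementation.

// Map a raw score in [0, 1] through a piecewise linear curve whose
// y-values (the learned parameters) sit at evenly spaced x-positions.
// ASSUMPTION: evenly spaced nodes; the actual spacing is not documented here.
function applyPiecewiseLinear(params: number[], x: number): number {
  const n = params.length;
  if (n === 1) return params[0];
  const t = Math.min(Math.max(x, 0), 1) * (n - 1); // position in segment space
  const i = Math.min(Math.floor(t), n - 2); // index of the segment's left node
  const frac = t - i; // fraction of the way into segment i
  return params[i] * (1 - frac) + params[i + 1] * frac;
}

// With the six example values above (1/7 ... 6/7), an input of 0.5 maps to ≈ 0.5.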
The Python code associated with the PYTHON_CODE DimensionScoringType.
"\ndef score(response_text: str, input_text: str, kwargs: dict) -> dict:\n word_count = len(response_text.split())\n if word_count > 10:\n return {\"score\": 0.2, \"explanation\": \"Response has more than 10 words\"}\n elif word_count > 5:\n return{\"score\": 0.6, \"explanation\": \"Response has more than 5 words\"}\n else:\n return {\"score\": 1, \"explanation\": \"Response has 5 or fewer words\"}\n"
The weight of the subdimension. The sum of subdimension weights will be normalized to one internally. A higher weight counts for more when aggregating this subdimension into the parent dimension.
1
{
"description": "subdimension1 description",
"label": "subdimension1",
"scoring_type": "PI_SCORER"
}
The learned parameters for the scoring method. These represent a piecewise linear interpolation on [0, 1].
[
0.14285714285714285,
0.2857142857142857,
0.42857142857142855,
0.5714285714285714,
0.7142857142857143,
0.8571428571428571
]
The weight of the dimension. The sum of dimension weights is normalized to one internally. A higher weight counts for more when this dimension is aggregated into the final score. For example, dimension weights of 2, 1, and 1 are normalized to 0.5, 0.25, and 0.25.
1
[
{
"description": "dimension1 description",
"label": "dimension1",
"sub_dimensions": [
{
"description": "subdimension1 description",
"label": "subdimension1",
"scoring_type": "PI_SCORER"
}
]
}
]
The name of the scoring spec
"Sample Scoring Spec"
Rated examples to use when calibrating the scoring spec. You must specify either the examples or the preference examples.
A labeled example for training or evaluation
The input to the LLM
"Tell me something different"
The output to evaluate
"The lazy dog was jumped over by the quick brown fox"
The rating of the llm_output given the llm_input
Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree
Preference examples to use when calibrating the scoring spec. You must specify either the examples or the preference examples.
A preference example for training or evaluation
The chosen output corresponding to the llm_input.
"The lazy dog was jumped over by the quick brown fox"
The input to the LLM
"Tell me something different"
The rejected output corresponding to the llm_input.
"The lazy dog was flied over by the quick brown fox"
The strategy to use to calibrate the scoring spec. FULL takes longer than LITE but may produce a better result.
LITE, FULL
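Putting the body fields together, here is a sketch of a request that also supplies rated examples and a calibration strategy. The property names examples, llm_input, llm_output, rating, and strategy are inferred from the field descriptions above; treat them as assumptions and verify them against the SDK's generated types.

// Calibrate against rated examples using the faster LITE strategy.
// ASSUMPTION: property names below are inferred from the field docs above.
const response = await client.scoringSystem.calibrate.startJob({
  scoring_spec: {
    name: 'Sample Scoring Spec',
    description: "Write a children's story communicating a simple life lesson.",
    dimensions: [/* ... as in the example above ... */],
  },
  examples: [
    {
      llm_input: 'Tell me something different',
      llm_output: 'The lazy dog was jumped over by the quick brown fox',
      rating: 'Agree',
    },
  ],
  strategy: 'LITE',
});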
Response
Detailed status of the job
["Downloading model", "Tuning prompt"]
The job id
"1234abcd"
Current state of the job
QUEUED, RUNNING, DONE, ERROR, CANCELLED
The calibrated scoring spec
The application description
"Write a children's story communicating a simple life lesson."
The dimensions of the scoring spec
The description of the dimension
"Relevance of the response"
The label of the dimension
"Relevance"
The sub dimensions of the dimension
The description of the subdimension
"Is the response relevant to the prompt?"
The label of the subdimension
"Relevance to Prompt"
The type of scoring performed for this dimension
PI_SCORER, PYTHON_CODE, CUSTOM_MODEL_SCORER
The ID of the custom model to use for scoring. Only relevant when scoring_type is CUSTOM_MODEL_SCORER
"your-model-id"
The learned parameters for the scoring method. These represent a piecewise linear interpolation on [0, 1].
[
0.14285714285714285,
0.2857142857142857,
0.42857142857142855,
0.5714285714285714,
0.7142857142857143,
0.8571428571428571
]
The Python code associated with the PYTHON_CODE DimensionScoringType.
"\ndef score(response_text: str, input_text: str, kwargs: dict) -> dict:\n word_count = len(response_text.split())\n if word_count > 10:\n return {\"score\": 0.2, \"explanation\": \"Response has more than 10 words\"}\n elif word_count > 5:\n return{\"score\": 0.6, \"explanation\": \"Response has more than 5 words\"}\n else:\n return {\"score\": 1, \"explanation\": \"Response has 5 or fewer words\"}\n"
The weight of the subdimension. The sum of subdimension weights will be normalized to one internally. A higher weight counts for more when aggregating this subdimension into the parent dimension.
1
{
"description": "subdimension1 description",
"label": "subdimension1",
"scoring_type": "PI_SCORER"
}
The learned parameters for the scoring method. These represent a piecewise linear interpolation on [0, 1].
[
0.14285714285714285,
0.2857142857142857,
0.42857142857142855,
0.5714285714285714,
0.7142857142857143,
0.8571428571428571
]
The weight of the dimension. The sum of dimension weights is normalized to one internally. A higher weight counts for more when this dimension is aggregated into the final score.
1
[
{
"description": "dimension1 description",
"label": "dimension1",
"sub_dimensions": [
{
"description": "subdimension1 description",
"label": "subdimension1",
"scoring_type": "PI_SCORER"
}
]
}
]
The name of the scoring spec
"Sample Scoring Spec"