POST /scoring_system/generate

JavaScript
import PiClient from 'withpi';

const client = new PiClient({
  apiKey: 'My API Key',
});

const response = await client.scoringSystem.generate.startJob({
  application_description: "Write a children's story communicating a simple life lesson.",
  examples: [
    { llm_input: 'good input', llm_output: 'good response' },
    { llm_input: 'neutral input', llm_output: 'neutral response' },
  ],
  preference_examples: [
    { chosen: 'chosen response', llm_input: 'some input', rejected: 'rejected response' },
  ],
});

console.log(response.job_id);

Example response:
{
  "balanced_accuracy": 123,
  "detailed_status": [
    "Downloading model",
    "Tuning prompt"
  ],
  "f1": 123,
  "job_id": "1234abcd",
  "num_labeled_examples_used": 123,
  "num_preference_examples_used": 123,
  "precision": 123,
  "recall": 123,
  "scoring_spec": [
    {
      "custom_model_id": "your-model-id",
      "label": "Relevance to Prompt",
      "parameters": [
        0.14285714285714285,
        0.2857142857142857,
        0.42857142857142855,
        0.5714285714285714,
        0.7142857142857143,
        0.8571428571428571
      ],
      "python_code": "\ndef score(response_text: str, input_text: str, kwargs: dict) -> dict:\n    word_count = len(response_text.split())\n    if word_count > 10:\n        return {\"score\": 0.2, \"explanation\": \"Response has more than 10 words\"}\n    elif word_count > 5:\n        return {\"score\": 0.6, \"explanation\": \"Response has more than 5 words\"}\n    else:\n        return {\"score\": 1, \"explanation\": \"Response has 5 or fewer words\"}\n",
      "question": "Is the response relevant to the prompt?",
      "scoring_type": "PI_SCORER",
      "tag": "Legal Formatting",
      "weight": 1
    }
  ],
  "state": "RUNNING",
  "threshold": 123
}

Authorizations

x-api-key
string
header
required
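
Under the hood, every request authenticates with the `x-api-key` header described above. The sketch below shows, as an illustration only, how the raw HTTP request to this endpoint might be assembled; the base URL is a placeholder assumption, not the real API host.

```javascript
// Sketch of the raw HTTP request behind the SDK call. BASE_URL is a
// hypothetical placeholder -- substitute your actual Pi API host.
const BASE_URL = 'https://api.example.com';

function buildGenerateRequest(apiKey, payload) {
  // The x-api-key header carries the API key, per the Authorizations section.
  return {
    method: 'POST',
    url: `${BASE_URL}/scoring_system/generate`,
    headers: {
      'x-api-key': apiKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),
  };
}

const req = buildGenerateRequest('My API Key', {
  application_description:
    "Write a children's story communicating a simple life lesson.",
});
```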

Body

application/json
application_description
string
required

The application description to generate a scoring spec for.

Examples:

"Write a children's story communicating a simple life lesson."

examples
SDKLabeledExample · object[]
required

Rated examples to use for generating the discriminating questions. Scores can be class labels or actual scores, but must be between 0 and 1.

Examples:
[
  {
    "llm_input": "good input",
    "llm_output": "good response",
    "score": 0.9
  },
  {
    "llm_input": "neutral input",
    "llm_output": "neutral response",
    "score": 0.5
  }
]
preference_examples
SDKPreferenceExample · object[]
required

Preference examples to use for generating the discriminating questions. You must specify either examples or preference_examples.

Examples:
[
  {
    "chosen": "chosen response",
    "llm_input": "some input",
    "rejected": "rejected response"
  }
]
batch_size
integer
default:10

Number of examples to use in one batch to generate the questions.

Examples:

10

existing_questions
Question · object[]

Existing questions for the application; these may or may not be retained in the output, depending on their performance.

Examples:
[
  {
    "label": "some input",
    "question": "Is the output relevant to input?",
    "weight": 1
  }
]
num_questions
integer
default:-1

The maximum number of new questions that the generated scoring system should contain. If <= 0, the number is auto-selected.

Examples:

10

retain_existing_questions
boolean
default:true

If true, only generate new questions that improve the accuracy.

Examples:

false

try_auto_generating_python_code
boolean
default:true

If true, try to generate Python code for the generated questions.

Examples:

false

Response

Successful Response

detailed_status
string[]
required

Detailed status of the job

Examples:
["Downloading model", "Tuning prompt"]
job_id
string
required

The job ID

Examples:

"1234abcd"

state
enum<string>
required

Current state of the job

Available options:
QUEUED,
RUNNING,
DONE,
ERROR,
CANCELLED
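A job ends in one of the terminal states (DONE, ERROR, or CANCELLED), while QUEUED and RUNNING mean the job is still in progress. The sketch below isolates that terminal-state logic; a real polling loop would re-fetch the job from the API between iterations (the exact SDK method for that is not shown here), whereas this sketch just walks a precomputed sequence of observed states.

```javascript
// Terminal states end polling; QUEUED and RUNNING mean "check again".
const TERMINAL_STATES = new Set(['DONE', 'ERROR', 'CANCELLED']);

// Given successive observed states, return how many polls it takes to
// reach a terminal state, or -1 if none is reached.
function pollsUntilTerminal(states) {
  for (let i = 0; i < states.length; i++) {
    if (TERMINAL_STATES.has(states[i])) return i + 1;
  }
  return -1;
}
```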
balanced_accuracy
number | null

Weighted combination of the average accuracy per class for the labeled data and the overall accuracy for the preference data.
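
The labeled-data half of this metric is the average of per-class accuracies. The weighting against preference-data accuracy is not specified here, so the sketch below covers only the per-class average as an illustration:

```javascript
// Average of per-class accuracies for labeled data. The weighting against
// preference-data accuracy is not documented here, so this sketch covers
// only the labeled-data component.
function perClassAverageAccuracy(labels, predictions) {
  const totals = new Map(); // label -> [correct, count]
  labels.forEach((label, i) => {
    const [correct, count] = totals.get(label) || [0, 0];
    totals.set(label, [correct + (predictions[i] === label ? 1 : 0), count + 1]);
  });
  let sum = 0;
  for (const [, [correct, count]] of totals) sum += correct / count;
  return sum / totals.size;
}
```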

f1
number | null

F1 for the labeled data.

num_labeled_examples_used
integer | null

Number of labeled examples used for spec generation.

num_preference_examples_used
integer | null

Number of preference examples used for spec generation.

precision
number | null

Precision for the labeled data.

recall
number | null

Recall for the labeled data.

scoring_spec
Question · object[] | null

The generated scoring spec
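
Each question in the spec carries a weight. One plausible way to combine per-question scores into an overall score is a weight-normalized average; the actual aggregation the scoring system uses is not documented here, so treat this sketch as an assumption:

```javascript
// Weight-normalized average of per-question scores. This aggregation
// scheme is an illustrative assumption, not the documented behavior.
function aggregateScores(questions, scores) {
  let weighted = 0;
  let totalWeight = 0;
  questions.forEach((q, i) => {
    weighted += (q.weight ?? 1) * scores[i];
    totalWeight += q.weight ?? 1;
  });
  return totalWeight > 0 ? weighted / totalWeight : 0;
}
```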

threshold
number | null

Threshold to use to separate 0 and 1 labels in the case of classification.
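
For classification, the threshold turns a continuous score into a 0/1 label. Whether a score exactly at the threshold maps to 1 or 0 is not specified here, so the `>=` comparison below is an assumption:

```javascript
// Map a continuous score to a binary label using the job's threshold.
// Treating a score equal to the threshold as 1 is an assumption.
function toLabel(score, threshold) {
  return score >= threshold ? 1 : 0;
}
```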