Authorizations
Body
The application description to generate a scoring spec for.
"Write a children's story communicating a simple life lesson."
Rated examples to use for generating the discriminating questions. Scores can be class labels or actual scores, but must lie between 0 and 1.
[
  {
    "llm_input": "good input",
    "llm_output": "good response",
    "score": 0.9
  },
  {
    "llm_input": "neutral input",
    "llm_output": "neutral response",
    "score": 0.5
  }
]
Preference examples to use for generating the discriminating questions. Either the examples or the preference examples must be specified.
[
  {
    "chosen": "chosen response",
    "llm_input": "some input",
    "rejected": "rejected response"
  }
]
Number of examples to use in one batch to generate the questions.
"10"
Existing questions for the application. These may or may not be retained in the output, depending on their performance.
[
  {
    "label": "some input",
    "question": "Is the output relevant to input?",
    "weight": 1
  }
]
The maximum number of new questions that the generated scoring system should contain. If <= 0, the number is auto-selected.
"10"
If true, only generate new questions that improve the accuracy.
false
If true, try to generate Python code for the generated questions.
false
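
The body fields above together form one JSON request payload. Below is a minimal sketch of assembling and submitting it; the endpoint URL, field names (`application_description`, `examples`, `preference_examples`, `batch_size`, `existing_questions`, `num_questions`, `only_improving_questions`, `generate_python_code`), and auth header are assumptions for illustration, not the confirmed API surface.

```python
import os

import requests  # third-party HTTP client

# Hypothetical endpoint and field names; substitute the real ones for
# your deployment.
URL = "https://api.example.com/scoring_spec/generate"

payload = {
    "application_description": (
        "Write a children's story communicating a simple life lesson."
    ),
    # Rated examples: scores may be class labels mapped into [0, 1].
    "examples": [
        {"llm_input": "good input", "llm_output": "good response", "score": 0.9},
        {"llm_input": "neutral input", "llm_output": "neutral response", "score": 0.5},
    ],
    # Alternatively, supply preference examples instead of rated examples;
    # one of the two must be present.
    # "preference_examples": [
    #     {"llm_input": "some input", "chosen": "chosen response",
    #      "rejected": "rejected response"},
    # ],
    "batch_size": 10,
    "existing_questions": [
        {"label": "some input", "question": "Is the output relevant to input?", "weight": 1},
    ],
    "num_questions": 10,            # <= 0 lets the service auto-select
    "only_improving_questions": False,
    "generate_python_code": False,
}

resp = requests.post(
    URL,
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
job = resp.json()
print(job["job_id"], job["state"])  # assumed response keys
```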
Response
Successful Response
Detailed status of the job
["Downloading model", "Tuning prompt"]The job id
"1234abcd"
Current state of the job
QUEUED, RUNNING, DONE, ERROR, CANCELLED
Weighted combination of average accuracy per class for the labeled data and overall accuracy for preference data.
F1 for the labeled data.
Number of labeled examples used for spec generation.
Number of preference examples used for spec generation.
Precision for the labeled data.
Recall for the labeled data.
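
For intuition, the labeled-data metrics above follow the conventional definitions, binarizing the 0-1 scores with the classification threshold documented below. A minimal sketch under those assumptions; note the exact weighting the service uses to combine labeled and preference accuracy is not specified here.

```python
from typing import List, Tuple

def labeled_metrics(
    pairs: List[Tuple[float, float]], threshold: float = 0.5
) -> dict:
    """Compute metrics from (true_score, predicted_score) pairs in [0, 1].

    Scores are binarized with `threshold` (see the threshold field below),
    then per-class average accuracy, precision, recall, and F1 are derived
    in the conventional way.
    """
    tp = fp = tn = fn = 0
    for true_score, pred_score in pairs:
        truth = true_score >= threshold
        pred = pred_score >= threshold
        if truth and pred:
            tp += 1
        elif not truth and not pred:
            tn += 1
        elif pred:
            fp += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Average of per-class accuracy, which is robust to class imbalance.
    acc_pos = tp / (tp + fn) if tp + fn else 0.0
    acc_neg = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy_per_class_avg": (acc_pos + acc_neg) / 2,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```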
The generated scoring spec
Threshold to use to separate 0 and 1 labels in the case of classification.
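
Since spec generation runs asynchronously, a client typically polls the job until it leaves QUEUED or RUNNING. A minimal polling sketch follows, assuming a GET status endpoint at `{base_url}/jobs/{job_id}` and response keys `state` and `detailed_status`; these names are hypothetical.

```python
import time

import requests

TERMINAL_STATES = {"DONE", "ERROR", "CANCELLED"}

def wait_for_spec(base_url: str, job_id: str, api_key: str, poll_secs: float = 2.0) -> dict:
    """Poll the job until it reaches a terminal state, then return the body."""
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        resp = requests.get(f"{base_url}/jobs/{job_id}", headers=headers, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        print(body.get("detailed_status"))  # e.g. ["Downloading model", "Tuning prompt"]
        if body["state"] in TERMINAL_STATES:
            return body
        time.sleep(poll_secs)
```

On DONE, the returned body would carry the generated scoring spec and the classification threshold; on ERROR or CANCELLED, no spec is produced.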