GRPO
Start Job
Starts an RL GRPO job.
POST
Authorizations
Body
application/json
The base model to use for the RL tuning process
Available options: LLAMA_3.2_3B, LLAMA_3.1_8B
Examples to use in the RL tuning process
An example for RL training
GRPO learning rate
Example:
0.000005
The LoRA configuration.
Number of GRPO training epochs
Example:
10
The scoring spec to use in the GRPO tuning process
A custom system prompt to use during the RL tuning process
Example:
"An optional system prompt."
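The body fields described above can be assembled into a JSON payload along the following lines. This is only a sketch: the key names (base_model, examples, learning_rate, num_train_epochs, system_prompt) and the helper build_grpo_request are hypothetical, since this page does not show the actual field names, endpoint URL, or the shape of a training example. The LoRA configuration and scoring spec are left out because their structure is not described here. The numeric values are the examples given above, not documented defaults.

```python
import json

# Model identifiers listed under "Available options" above.
ALLOWED_BASE_MODELS = {"LLAMA_3.2_3B", "LLAMA_3.1_8B"}


def build_grpo_request(base_model, examples, learning_rate,
                       num_train_epochs, system_prompt=None):
    """Assemble a request body. Every key name below is a hypothetical placeholder."""
    if base_model not in ALLOWED_BASE_MODELS:
        raise ValueError(f"unsupported base model: {base_model!r}")
    body = {
        "base_model": base_model,
        "examples": list(examples),
        "learning_rate": learning_rate,
        "num_train_epochs": num_train_epochs,
    }
    # The system prompt is described as optional, so include it only when given.
    if system_prompt is not None:
        body["system_prompt"] = system_prompt
    return body


payload = build_grpo_request(
    "LLAMA_3.1_8B",
    # Placeholder example; the real example schema is not specified on this page.
    examples=[{"placeholder": "example shape not specified"}],
    learning_rate=0.000005,
    num_train_epochs=10,
    system_prompt="An optional system prompt.",
)
request_json = json.dumps(payload)
```

Validating the base model on the client side catches a misspelled identifier before the request is sent; the resulting request_json string would then be sent as the application/json body of the POST request.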
Response
200
application/json
Successful Response
RlGrpoStatus is the status of an RL GRPO job.
Detailed status of the job
Example:
["Downloading model", "Tuning prompt"]
The job id
Example:
"1234abcd"
Current state of the job
Available options: QUEUED, RUNNING, DONE, ERROR, CANCELLED
A list of trained models selected based on the PI score.
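A client will typically want to know whether a returned status means the job has stopped. The sketch below shows one way to check this. The key names (job_id, state, detailed_status) and the helper is_finished are hypothetical, because the actual response field names are not shown on this page, and treating DONE, ERROR, and CANCELLED as final states is an assumption rather than something the page states.

```python
# State values listed under "Available options" above.
JOB_STATES = {"QUEUED", "RUNNING", "DONE", "ERROR", "CANCELLED"}

# Assumption: these three states are final; the page does not say so explicitly.
TERMINAL_STATES = {"DONE", "ERROR", "CANCELLED"}


def is_finished(status):
    """Return True when the job has stopped, reading a hypothetical "state" key."""
    state = status["state"]
    if state not in JOB_STATES:
        raise ValueError(f"unknown job state: {state!r}")
    return state in TERMINAL_STATES


# Sample status built from the example values on this page; key names are hypothetical.
sample_status = {
    "job_id": "1234abcd",
    "state": "RUNNING",
    "detailed_status": ["Downloading model", "Tuning prompt"],
}
still_running = not is_finished(sample_status)
```

Rejecting unknown state values makes it obvious when the service adds a state that the client does not yet handle.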