Sometimes throwing more API calls at a problem is the easiest way to increase accuracy.
Wouldn't this increase the latency?
When generating the outputs, you could run the prompts in parallel to minimize the latency increase. The only added latency then comes from the final prompt, where the model judges the outputs.
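Roughly what that pattern looks like, as a minimal sketch using the OpenAI Python SDK with asyncio. The model name, prompts, and candidate count are placeholders, not recommendations:

```python
# Parallel generation followed by a single judging call.
# Assumes the OpenAI Python SDK (>= 1.0) and OPENAI_API_KEY in the environment.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def generate(question: str) -> str:
    # One candidate answer.
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

async def answer_with_judge(question: str, n: int = 5) -> str:
    # Fire off n generations concurrently, so wall-clock latency is
    # roughly one generation rather than n of them.
    candidates = await asyncio.gather(*(generate(question) for _ in range(n)))

    # The only sequential extra call: ask the model to pick the best output.
    numbered = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(candidates))
    judge = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\n\nCandidate answers:\n{numbered}\n\n"
                "Reply with the single best answer, verbatim."
            ),
        }],
    )
    return judge.choices[0].message.content

# asyncio.run(answer_with_judge("What is 17 * 24?"))
```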