2 Comments
Neethu Nath:

Wouldn't this increase the latency?

Dan Cleary:

When generating the outputs, you can run the prompts in parallel to minimize the latency increase. The only added latency then comes from the final prompt, where the model judges the outputs.
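A minimal sketch of the pattern described above, using `asyncio` to overlap the candidate calls so wall-clock time is roughly one generation round plus one judging round. The `call_model` function is a hypothetical stand-in for a real LLM API call, simulated here with a short sleep:

```python
import asyncio
import time

async def call_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM API call; the sleep simulates latency.
    await asyncio.sleep(0.1)
    return f"output for: {prompt}"

async def generate_and_judge(prompts: list[str]) -> str:
    # Fan out the candidate prompts concurrently so their latencies overlap.
    outputs = await asyncio.gather(*(call_model(p) for p in prompts))
    # Only this final judging call adds sequential latency on top.
    judge_prompt = "Pick the best output:\n" + "\n".join(outputs)
    return await call_model(judge_prompt)

start = time.perf_counter()
best = asyncio.run(generate_and_judge(["prompt A", "prompt B", "prompt C"]))
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f}s")  # roughly 2x one call's latency, not 4x
```

With three candidates this takes about two calls' worth of latency instead of four, since the generation round runs concurrently.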
