"Act like a... or maybe not?" The truth about persona prompting
A ton of research, templates, and a free upcoming prompt engineering conference
Before we get into it, just a quick note. I’m co-organizing a free online prompt engineering conference on Nov 20. Speakers from Scale, Microsoft, Dropbox, Wix, and more.
Did I mention it’s free?
I hope you liked reading that title as much as I hated writing it.
Giving the model a role or persona like “You’re an expert mathematician” has been more or less a best practice since the launch of ChatGPT. I would use roles in prompts and tell others to do so as well.
I realized I never really questioned it, so I decided to dive deep to see if using personas actually helped improve accuracy.
To be clear, I’m not talking about using personas to guide tone or style in writing tasks, because persona prompting clearly works there. If you tell ChatGPT to talk like a cowboy, it will talk like a cowboy. The question is whether it helps with accuracy.
If you want the full breakdown, you can check out our blog post here.
What Is Persona Prompting?
Persona prompting is when you assign a specific role to an LLM, like “Act like a lawyer”.
Mixed Results in Accuracy-Based Tasks
I read through a number of papers that looked at the effectiveness of persona prompting on accuracy-based tasks. Here are the best ones:
Better Zero-Shot Reasoning with Role-Play Prompting
This paper claims a 10% performance improvement using a variant of persona prompting they call the “role immersion method”.
This method contains two prompts:
Role-Setting Prompt: A user-designed prompt that assigns the persona.
Role-Feedback Prompt: The model’s response to the Role-Setting Prompt.
So with each new request, three messages are sent in total: the role-setting prompt, the role-feedback reply, and the actual question.
Here is what the request code looks like:
import openai

prompt_1 = role_setting_prompt   # the hand-crafted role-setting prompt
prompt_2 = role_feedback_prompt  # the model's earlier reply to that prompt
# The role-setting exchange is replayed as conversation history ahead of the actual question
conversation = [
    {"role": "user", "content": prompt_1},
    {"role": "assistant", "content": prompt_2},
    {"role": "user", "content": question},
]
answer = openai.ChatCompletion.create(model="gpt-3.5-turbo-0613", messages=conversation)
This evidence isn’t very convincing to me, for a few reasons:
The Role-Setting Prompt is hand-crafted, and it doesn’t look simple to me. This could be challenging for most people.
Sending three messages with each new request isn’t efficient, and it isn’t typically what I think of when I think of persona prompting.
Testing was limited to GPT-3.5.
Alrighty, next up:
Maybe the most notable part about this paper is that the original abstract, written in 2023, said:
Through extensive analysis of 3 popular LLMs and 2457 questions, we show that adding interpersonal roles in prompts consistently improves the models' performance over a range of questions.
The paper was updated in October of 2024, and the abstract now reads:
Through extensive analysis of 4 popular families of LLMs and 2,410 factual questions, we demonstrate that adding personas in system prompts does not improve model performance across a range of questions compared to the control setting where no persona is added
In this paper, the researchers tested a ton of personas across thousands of fact-based questions, and 4 model families.
These were the prompt templates used:
In my opinion, these templates are a little weak, in that they are the most basic version you could write for a persona prompt.
Here were the findings from the paper:
Adding personas in system prompts generally didn’t improve performance and sometimes had negative effects.
When personas did improve performance, there was no consistent strategy for choosing the best persona; picking one at random often worked just as well.
Gender-neutral, in-domain, and work-related roles showed slight performance improvements, but the effect size was minimal.
Domain alignment (e.g., using a “lawyer” persona for legal tasks) had only a minor impact on performance.
On to the next one. Maybe this paper will have some positive news for persona prompting?
In this paper, the researchers developed a persona prompting framework called Jekyll & Hyde.
The framework has a few steps:
Persona generator (template below): Use an LLM to generate a persona for a task
Solver: Generate solutions to the task through two prompts, one with the persona, one without
Evaluator: Both outputs are sent through an LLM evaluator, and the better solution is chosen
Here’s what the flow looks like:
You can access the persona generator template in PromptHub here.
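To make that flow concrete, here’s a minimal sketch of the three steps in Python. It reuses the legacy openai client from the snippet earlier; the model name, the chat helper, and all prompt strings are my own placeholders, not the paper’s exact templates.

import openai

def chat(prompt, system=None):
    # Helper around the same legacy openai client used in the earlier snippet
    messages = [{"role": "system", "content": system}] if system else []
    messages.append({"role": "user", "content": prompt})
    resp = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    return resp["choices"][0]["message"]["content"]

def jekyll_and_hyde(task):
    # 1. Persona generator: ask an LLM to write a persona suited to this task
    persona = chat("Write a short expert persona best suited to solve this task:\n" + task)
    # 2. Solver: answer the task twice, once with the persona and once without
    with_persona = chat(task, system=persona)
    without_persona = chat(task)
    # 3. Evaluator: a third LLM call picks the better of the two answers
    verdict = chat(
        "Task:\n" + task
        + "\n\nAnswer A:\n" + with_persona
        + "\n\nAnswer B:\n" + without_persona
        + "\n\nWhich answer is better? Reply with exactly A or B."
    )
    return with_persona if verdict.strip().startswith("A") else without_persona

Note how many LLM calls one answer takes here: that’s the efficiency and attribution concern I get into below.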
Here were the results from the experiments:
In some cases, the framework led to performance increases, in others it didn’t.
Interestingly, the gap between “Persona” and “Base” for GPT-4 is extremely slim in some cases.
Either way, I think it’s a little unfair to judge a base prompt against a full framework, considering the framework includes multiple LLM calls and an evaluator. With that many components, it’s hard to discern where the performance boost is actually coming from.
Next up is my favorite pro-persona prompting paper:
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts
Mentioned in last week’s post, this paper was written in May of 2023, but still has some relevant information in it.
ExpertPrompting consists of two simple steps:
An instruction is passed to an LLM to generate an expert identity
The generated identity and original instruction are sent to the LLM to process
I like this framework more than the previous one because the prompt engineering is better: the prompt that generates the expert identities is better structured and uses in-context learning (few-shot examples) to produce better outputs.
Below is an example of an output.
Here is the template that you can access in PromptHub if you’d like.
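Here’s a rough sketch of those two steps, again using the legacy openai client. The identity-generation prompt below is a bare-bones stand-in of my own: the paper’s actual template is more structured and includes few-shot examples, which you can grab from the PromptHub link above.

import openai

def chat(messages):
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo-0613", messages=messages)
    return resp["choices"][0]["message"]["content"]

def expert_prompting(instruction):
    # Step 1: generate a detailed expert identity tailored to this instruction
    identity = chat([{
        "role": "user",
        "content": "Describe in detail the ideal expert to carry out this instruction:\n" + instruction,
    }])
    # Step 2: send the generated identity plus the original instruction back to the LLM
    return chat([
        {"role": "system", "content": identity},
        {"role": "user", "content": instruction},
    ])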
The researchers tested 3 prompt engineering methods:
Vanilla Prompting: Just the basic instruction, with no persona
Vanilla Prompting + Static DESC: The basic prompt plus a simple persona prompt:
("Imaging you are an expert in the regarding field, try to answer the following instruction as professional as possible. {Instruction}")
Expert Prompting: Uses the LLM-generated persona via the template above
Here were the results:
Vanilla Prompting and Vanilla Prompting + Static DESC showed similar performance, reinforcing findings from the papers above that basic persona prompts don’t improve results.
Expert Prompting significantly outperformed the other methods.
While these experiments were done using GPT-3.5, I have a hunch that more elaborate persona prompts, like the ones tested here, can still improve performance on non-writing tasks.
When Role Prompting is Most Useful
Persona prompting is highly effective for open-ended tasks like creative writing, where it can help align responses with a desired tone or style (e.g., “talk like a pirate”).
It’s a valuable tool for content creation and engagement-focused interactions.
Persona prompts can also support security by adding guardrails in system prompts for safer, more controlled outputs.
How to construct effective personas for role prompts
If you’re going to write a persona into your prompt, your persona should be specific, detailed, and automated.
Specific: The role should be in the same domain as the task, and should be as specific as possible. “Python developer” > “developer”
Detailed: The role description should be detailed and cover all the necessary bases.
Automated (and simple): Ain’t nobody got time to write these by hand. LLM-generated personas outperform human-written ones, and LLMs are much more scalable.
I’d just start by using the ExpertPrompting persona generator. (Access it here.)
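To make “specific” and “detailed” concrete, here’s a quick illustration. Both role strings are invented for this example rather than taken from any of the papers above.

# Vague: barely narrows the domain and gives the model nothing to work with
weak_persona = "You are a developer."

# Specific and detailed: same domain as the task, with concrete expertise and behavior
strong_persona = (
    "You are a senior Python developer specializing in data pipelines. "
    "You write idiomatic, well-tested code, explain trade-offs briefly, "
    "and flag any assumptions you make about the input data."
)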
Wrapping up
Here are my final thoughts after all this research:
Persona prompting is effective for open-ended tasks (e.g., creative writing).
It’s generally not beneficial for accuracy-based tasks (e.g., classification), especially with newer models or simple persona definitions.
For best results, persona prompts should be specific, detailed, and ideally automated—ExpertPrompting provides a strong starting point.
Prompt template of the week
Couldn’t be anything other than the persona generator prompt from the ExpertPrompting framework.
You can access the template here.
In the end, it depends on the task and the prompt: sometimes persona prompting helps greatly, other times not at all.