Using Prompt Engineering to build an LLM Recommendation System

Free templates, a video rundown, and unexpected experiment results

Feb 13, 2024

If you prefer video, there’s a YouTube link at the bottom 👇.

Recommendation systems have been around for a long time. Yelp recommends restaurants based on past bookings, Netflix recommends movies based on your binges, and Spotify recommends songs based on your listens.

Recommendation systems were one of the earliest implementations of AI, before Large Language Models (LLMs) became cool!

But, LLMs have provided a new way to build these systems. With just some prompt engineering, you can create a recommendation system in a day that performs on the same level as a system that used to take weeks to build.

A recent paper from the research teams at The University of Tokyo and University College Dublin make the implementation of this type of system even easier: RecPrompt: A Prompt Tuning Framework for News Recommendation Using Large Language Models.

RecPrompt is a prompt engineering framework built to make news article recommendations based on a user’s history.

Let’s dive a little deeper

Prompt Engineering a Recommendation System

RecPrompt has three main components and a few prompts.

RecPrompt flowchart — This is going to get a whole lot clearer in a few minutes

It all starts with a system instruction prompt and an initial prompt template.

System Instructions

An image containing the text for the System message used in RecPrompt

Initial Prompt Template

The three main components of the framework are:

The Prompt Optimizer: Optimizes the prompt based on the performance of previous prompt alongside the success of their corresponding recommendations. This makes for a positive feedback loop.
A Recommender: Responsible for generating the news recommendations. These recommendations get fed into the prompt template (see above) as examples.
A Monitor: Measures and evaluates the prompts against specific metrics

Here’s how it works in practice

The Prompt Optimizer is fed four items:

System Instructions
Current Prompt Template
Samples from the Recommender to use for in-context learning
The observation instruction (in the bottom right portion of the graphic above). This is the prompt that optimizes the prompt template.

Following this, the Prompt Optimizer outputs the updated and enhanced prompt template. That refined prompt is fed to the Recommender, which then generates the personalized recommendations for the user.

Lastly, the Monitor evaluates and records the best prompt templates based on performance on some metrics.

Experiment results

The researchers tested RecPrompt using the Microsoft News Dataset. They employed multiple evaluation metrics and LLMs (GPT-3.5, GPT-4). They compared RecPrompt’s effectiveness against other news recommendation methods and deep neural models.

Before we jump in, let’s orient ourselves.

We have some classic news recommendation methods in the top row, followed by AI-methods that don’t utilize LLMs, followed by three LLM-based methods.

The “Initial Prompt” is the template shared above. The “Hand-Crafted Prompt” leverages the framework, but the prompt optimizations are done by hand, rather than using the optimizer. Lastly, the “LLM-Generated Prompt”, makes full use of the framework and the Prompt Optimizer.

Got it? Great.

Here’s what jumps out to me:

Using GPT-3.5, “Initial Prompt” performs worse than TopicPop and all the Neural models.
Even when using GPT-4, “Initial Prompt” doesn’t beat all the Neural models (Loses in AUC)
Only when using GPT-4 as the prompt optimizer does an LLM-based method beat all the neural models.

Here comes my favorite part

Let’s focus only on the LLM-based methods, for a single evaluation set (MRR).

A table of results focusing on GPT 3.5 and GPT-4 on the MRR evaluation set

As a reminder “Initial Prompt” is the template shared above. It doesn’t leverage the whole RecPrompt framework.

On average, implementing the framework and letting the LLM optimize the prompt for you leads to ~5% gains. Depending on your use case, that could be a lot or a little.

If you just need to stand something up quickly, just using the Initial Prompt Template might be a better option. If you are in a highly competitive space and every percentage point matters, it is probably worth the investment to implement the framework.

Wrapping up + free templates and videos

We put a template together that mimics the Initial Prompt template structure. You can access it for free in PromptHub here.

More of a Youtube type of person? Check out our full rundown here.

If you have a minute, I’d appreciate any support you can give us on LinkedIn! Here is a post related to RecPrompt.

Happy prompting!

Dan

The Prompt Engineering Substack

Discussion about this post