This page provides links to the datasets. Our benchmark covers two settings for each task: 1) the User setting, which replicates real-world cold-start behavior; and 2) the Temporal setting, which supports studying how well LLMs capture user behavior over time, another common scenario. Both settings are described in more detail in the paper. The following sections introduce each task and provide dataset links, along with small subsets for previewing.

Note that the Avocado (Personalized Email Completion) dataset isn't publicly available. However, we provide the code and sample IDs used for dataset generation. Once you have access to the Avocado dataset, follow the instructions to generate the dataset.
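As a rough intuition for how the two settings differ, the toy sketch below splits a handful of records in both ways. It is only an illustration under stated assumptions: the record fields (`user`, `time`, `text`) and both helper functions are hypothetical, and the actual splitting procedure is the one described in the paper, not this code.

```python
from collections import defaultdict


def user_split(records, test_users):
    """User setting (toy version): held-out users appear only in test,
    so the model sees them as cold-start users."""
    train = [r for r in records if r["user"] not in test_users]
    test = [r for r in records if r["user"] in test_users]
    return train, test


def temporal_split(records):
    """Temporal setting (toy version): each user's most recent record
    goes to test and the earlier ones stay in train."""
    by_user = defaultdict(list)
    for r in records:
        by_user[r["user"]].append(r)
    train, test = [], []
    for items in by_user.values():
        items = sorted(items, key=lambda r: r["time"])
        train.extend(items[:-1])
        test.append(items[-1])
    return train, test


# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"user": "a", "time": 1, "text": "a1"},
    {"user": "a", "time": 2, "text": "a2"},
    {"user": "b", "time": 1, "text": "b1"},
    {"user": "b", "time": 2, "text": "b2"},
]

user_train, user_test = user_split(records, test_users={"b"})
temp_train, temp_test = temporal_split(records)
```

In the toy user split, user `b` never appears in training; in the toy temporal split, every user appears in both partitions, but only their latest record is held out.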

Personalized Email Completion

This dataset is built from a private email collection, the Avocado Research Email Collection. The objective of this task is to complete an email for a given user, given the email subject, part of the email, and the user's previous email-subject pairs. For more information, refer to the LongLaMP paper.
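To make the task inputs concrete, here is a minimal sketch of how the three pieces named above (subject, partial email, and previous email-subject pairs) might be combined into a single prompt. The function `build_prompt`, the dictionary keys `subject` and `email`, and the prompt layout are all assumptions made for illustration; they are not the file format of the released data or the prompt used in the paper.

```python
def build_prompt(subject, partial_email, profile, k=2):
    """Assemble a completion prompt from up to k previous
    email-subject pairs followed by the email to be completed."""
    lines = []
    for past in profile[:k]:
        lines.append(f"Subject: {past['subject']}")
        lines.append(f"Email: {past['email']}")
        lines.append("")  # blank line between examples
    lines.append(f"Subject: {subject}")
    lines.append(f"Email: {partial_email}")
    return "\n".join(lines)


# Hypothetical user profile; the contents are made up for illustration.
profile = [
    {"subject": "Meeting notes", "email": "Hi all, here are the notes."},
    {"subject": "Lunch", "email": "Anyone free at noon?"},
    {"subject": "Budget", "email": "Please review the attached sheet."},
]

prompt = build_prompt("Quarterly report", "Hi team, the report is", profile)
```

Here `k` caps how many previous pairs are included, which is one simple way to keep the prompt within a model's context limit.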

This is a sample of the input and output files of this dataset, and the table below contains the complete dataset for both settings.

Setting Train Validation Test

User Setting

Input / Output Input / Output Input / Output Input / Output Input

Temporal Setting

Input / Output Input / Output Input / Output Input / Output Input

Personalized Abstract Generation

This task tests the model's ability to distill complex ideas and generate accurate, concise, and coherent output spanning multiple paragraphs on domain-specific tasks. The input to the model is the title of the paper along with some keywords to guide the content. The expected output is an abstract conditioned on the title and keywords, written in the user's style.

This is a sample of the input and output files of this dataset, and the table below contains the complete dataset for both settings.

Setting Train Validation Test

User Setting

Input / Output Input / Output Input / Output Input / Output Input

Temporal Setting

Input / Output Input / Output Input / Output Input / Output Input

Personalized Topic Generation

This task involves generating the content of Reddit posts based on the provided input, which is a summary of the post. The primary objective is to evaluate the model’s capability in expanding concisely written ideas into detailed discussions while effectively integrating the user’s writing style and specific interests.

This is a sample of the input and output files of this dataset, and the table below contains the complete dataset for both settings.

Setting Train Validation Test

User Setting

Input / Output Input / Output Input / Output Input / Output Input

Temporal Setting

Input / Output Input / Output Input / Output Input / Output Input

Personalized Product Review Generation

This benchmark focuses on generating comprehensive product reviews from an input comprising the product description, the user's rating, and a summary of their review. The objective is to evaluate the model's capability to capture the user's writing style while producing detailed opinions on the product that reflect the individual's preferences. The generated review serves as the output, enabling an assessment of the model's ability to synthesize coherent, personalized text that aligns with the provided input and context.

This is a sample of the input and output files of this dataset, and the table below contains the complete dataset for both settings.

Setting Train Validation Test

User Setting

Input / Output Input / Output Input / Output Input / Output Input

Temporal Setting

Input / Output Input / Output Input / Output Input / Output Input