Long LaMP

This page provides links to the datasets. The benchmark has two settings for each task: 1) User setting, which replicates real-world cold-start behavior; 2) Temporal setting, which supports studying how well LLMs capture user behavior as it changes over time, another common scenario. Both settings are described in more detail in the paper. The following sections introduce each task and provide dataset links, along with small subsets for previewing.
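The difference between the two settings can be illustrated with a minimal sketch. It is not the procedure used to build the benchmark (the paper describes that); it only assumes that a user setting keeps test users out of training, and that a temporal setting holds out each user's most recent item. The record fields `user`, `time`, and `text` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def user_split(records, test_users):
    """Cold-start style split: users in test_users never appear in train."""
    train = [r for r in records if r["user"] not in test_users]
    test = [r for r in records if r["user"] in test_users]
    return train, test


def temporal_split(records):
    """Chronological split: each user's latest item is held out for testing."""
    by_user = defaultdict(list)
    for r in records:
        by_user[r["user"]].append(r)

    train, test = [], []
    for items in by_user.values():
        ordered = sorted(items, key=lambda r: r["time"])
        train.extend(ordered[:-1])  # earlier items form the training data
        test.append(ordered[-1])    # the most recent item is the test target
    return train, test


# Toy records with hypothetical field names.
records = [
    {"user": "a", "time": 1, "text": "a1"},
    {"user": "a", "time": 2, "text": "a2"},
    {"user": "b", "time": 1, "text": "b1"},
    {"user": "b", "time": 3, "text": "b2"},
]
```

In the user split, a held-out user contributes nothing to training, which mimics cold start. In the temporal split, every user appears on both sides, and the model is evaluated on behavior that comes after what it has seen.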

Personalized Email Completion

This dataset is built from a private email collection, the Avocado Research Email Collection. The objective of this task is to complete an email for a given user, given the email subject, part of the email, and the user's previous email-subject pairs.

The Avocado dataset is not publicly accessible. However, we provide the sample IDs and the code we used to generate our dataset. If you obtain access to Avocado, you can generate the dataset in the same format as the other LaMP datasets using the following command:

python data/avocado/create_avocado_dataset.py \
--avocado_files_dir *Address to the directory containing zip files for avocado dataset 'avocado-1.0.2/data/text'* \
--extract_addr *A temp dir to extract the files for creating dataset* \
--output_dir *The directory to generate the final dataset* \
--input_question_file_train *The address to the train_questions.json file we provided in LaMP* \
--input_question_file_dev *The address to the dev_questions.json file we provided in LaMP* \
--input_question_file_test *The address to the test_questions.json file we provided in LaMP*
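When the generation step is part of a larger pipeline, the same command can be assembled from Python. The sketch below is only a convenience wrapper around the flags listed above. The helper name `build_avocado_command` is hypothetical, and it assumes the three question files sit together in one directory, which may not match your layout.

```python
from pathlib import Path


def build_avocado_command(avocado_files_dir, extract_addr, output_dir, questions_dir):
    """Build the argument list for create_avocado_dataset.py.

    Assumption: train/dev/test question files live in `questions_dir`.
    """
    questions_dir = Path(questions_dir)
    return [
        "python", "data/avocado/create_avocado_dataset.py",
        "--avocado_files_dir", str(avocado_files_dir),
        "--extract_addr", str(extract_addr),
        "--output_dir", str(output_dir),
        "--input_question_file_train", str(questions_dir / "train_questions.json"),
        "--input_question_file_dev", str(questions_dir / "dev_questions.json"),
        "--input_question_file_test", str(questions_dir / "test_questions.json"),
    ]


# Placeholder paths for illustration only.
cmd = build_avocado_command(
    "avocado-1.0.2/data/text", "tmp_extract", "avocado_out", "questions"
)
```

Passing the resulting list to `subprocess.run(cmd, check=True)` from the repository root would run the script and raise an error if it exits with a non-zero status.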

Personalized Abstract Generation

This task tests the model’s ability to distill complex ideas and generate accurate, concise, and coherent multi-paragraph output on domain-specific content. The input to the model is the title of the paper along with some keywords to guide the content. The expected output is an abstract conditioned on the title and keywords, written in the user's style.

Setting	Train	Validation	Test
User Setting	Train	Validation	Test
Temporal Setting	Train	Validation	Test

Personalized Topic Generation

This task involves generating the content of Reddit posts based on the provided input, which is a summary of the post. The primary objective is to evaluate the model’s capability in expanding concisely written ideas into detailed discussions while effectively integrating the user’s writing style and specific interests.

Setting	Train	Validation	Test
User Setting	Train	Validation	Test
Temporal Setting	Train	Validation	Test

Personalized Product Review Generation

This task focuses on generating comprehensive product reviews from an input comprising the product description, the user’s rating, and a summary of their review. The objective is to evaluate the model’s capability to capture the user’s writing style while producing detailed opinions on the product that reflect the individual's preferences. The generated review is the output, which allows assessing whether the model can synthesize coherent, personalized text consistent with the provided input and context.

Setting	Train	Validation	Test
User Setting	Train	Validation	Test
Temporal Setting	Train	Validation	Test
