This page corresponds to "Track 5: All-around" in the Amazon KDD Cup 2024 Challenge.

🌟 Introduction

Online shopping has become an indispensable service in modern life. To provide better shopping experiences to users, machine learning has been extensively used to understand various entities in online shopping, such as queries and browse sessions, and to infer users' search and shopping intentions. However, few studies explore online shopping tasks in a multi-task, few-shot learning scenario. In practice, online shopping creates a massive multi-task learning problem that often involves the joint understanding of various shopping entities, such as products, attributes, queries, and purchases. Moreover, new shopping entities and tasks constantly emerge over time as a result of business expansion or new product lines, creating few-shot learning problems on these emerging tasks. 

Large language models (LLMs) have emerged as promising solutions to the multi-task, few-shot learning problem in online shopping. Many studies have underscored the ability of a single LLM to perform various text-related tasks with state-of-the-art performance, and to generalize to unseen tasks given only a few samples or task descriptions. Therefore, by training a single LLM for all shopping-related machine learning tasks, we reduce the cost of task-specific engineering, as well as of data labeling and re-training when new tasks arise. Furthermore, LLMs can improve customers' shopping experiences by providing interactive and real-time shopping recommendations. 

This track, the all-around track, pushes the multi-task nature of online shopping to the maximum. Participants are expected to solve all questions in Tracks 1-4 with a single solution, which calls for more unified and versatile approaches. 

πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’» Tasks

This track, Track 5: All-around, requires participants to solve all questions in Tracks 1-4 with a unified solution to further emphasize the generalizability and the versatility of the solutions. Please refer to webpages for Tracks 1-4 (linked below) for more detailed task descriptions. 

  1. Shopping Concept Understanding: There are many domain-specific concepts in online shopping, such as brands, product lines, etc. Moreover, these concepts often exist in short texts, such as queries, making it even more challenging for models to understand them without adequate contexts. This skill emphasizes the ability of LLMs to understand and answer questions related to these concepts. 
  2. Shopping Knowledge Reasoning: Complex reasoning with implicit knowledge is involved when people make shopping decisions, such as numeric reasoning (e.g., calculating the total amount of a product pack) and multi-step reasoning (e.g., identifying whether two products are compatible with each other). This skill focuses on evaluating the model's reasoning ability on products or product attributes with domain-specific implicit knowledge. 
  3. User Behavior Alignment: User behavior modeling is of paramount importance in online shopping. However, user behaviors are highly diverse, including browsing, purchasing, query-then-clicking, etc. Moreover, most of them are implicit and not expressed in text. Therefore, aligning with heterogeneous and implicit shopping behaviors is a unique challenge for language models in online shopping, which is the primary aim of this skill.  
  4. Multi-lingual Abilities: Multi-lingual models are especially desired in online shopping as they can be deployed in multiple marketplaces without re-training. Therefore, we include a separate multi-lingual track, including multi-lingual concept understanding and user behavior alignment, to evaluate how a single model performs in different shopping locales without re-training. 

🗃 Datasets

The ShopBench dataset is an anonymized, multi-task dataset sampled from real-world Amazon shopping data. Statistics of ShopBench for this track are given in Table 1. 

Table 1: Dataset statistics for Track 5: All-around.

| # Tasks | # Questions | # Products | # Product Categories | # Attributes | # Reviews | # Queries |
|---------|-------------|------------|----------------------|--------------|-----------|-----------|
| 57      | 20598       | ~13300     | 400                  | 1032         | ~11200    | ~4500     |

ShopBench is split into a few-shot development set and a test set to better mimic the few-shot learning setting. With this setting, we encourage participants to use any resource that is publicly available (e.g. pre-trained models, text datasets) to construct their solutions, instead of overfitting the given development data (e.g. generating pseudo data samples with GPT). 

The development datasets will be given in JSON format with the following fields; an illustrative example follows the list. 

  • 'input_field': This field contains the instructions and the question that should be answered by the model. 
  • 'output_field': This field contains the ground truth answer to the question. 
  • 'task_type': This field contains the type of the task (details in Section "Tasks"). 
  • 'metric': This field contains the metric used to evaluate the question (Details in Section "Evaluation Metrics"). 
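
For illustration, a development-set record might look like the sketch below. The field names follow the description above; the concrete values (question text, answer, task type, and metric labels) are hypothetical placeholders, not actual ShopBench content.

```python
import json

# Hypothetical development-set record. Field names follow the description
# above; the values are made-up placeholders, not real ShopBench data.
dev_record = {
    "input_field": "Which category best matches the query "
                   "'wireless noise cancelling headphones'? "
                   "0. Electronics  1. Grocery  2. Apparel  3. Toys",
    "output_field": "0",
    "task_type": "multiple_choice",   # assumed label, for illustration only
    "metric": "accuracy",             # assumed label, for illustration only
}

# A development file in this format could then be loaded with, e.g.:
# with open("development.json") as f:
#     records = json.load(f)
print(json.dumps(dev_record, indent=2))
```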

However, the test dataset (which will be hidden from participants) will have a different format with only two fields: 

  • 'input_field', which is the same as above. 
  • 'is_multiple_choice': This field contains 'True' or 'False', indicating whether the question is multiple choice. The detailed 'task_type' will not be given to participants.  

In Track 5, participants will not be given the track each question comes from.    
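
Accordingly, a unified Track 5 solution only sees the question text and a multiple-choice flag at test time. The sketch below, with a hypothetical record and placeholder answer logic, illustrates the kind of per-question dispatch this implies.

```python
# Hypothetical test-set record: only two fields are exposed at test time.
test_record = {
    "input_field": "Which category best matches the query "
                   "'wireless noise cancelling headphones'? "
                   "0. Electronics  1. Grocery  2. Apparel  3. Toys",
    "is_multiple_choice": True,
}

def answer(record: dict) -> str:
    """Placeholder dispatch: the track and task type are hidden, so the only
    structural hint is whether the question is multiple choice."""
    if record["is_multiple_choice"]:
        # e.g., constrain the model to output a single option index
        return "0"
    # e.g., let the model generate free-form text (ranking, extraction, ...)
    return "free-form answer"

print(answer(test_record))
```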

💯 Evaluation Metrics

Please see the detailed evaluation metrics here.

**The score of Track 5 will be a macro-average of the scores of Tracks 1-4.**
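
As a concrete illustration of the macro-average (an unweighted mean over the four tracks), consider the sketch below; the per-track scores are made-up placeholders, not official results.

```python
# Macro-average sketch for the Track 5 scoring rule described above.
# The per-track scores below are placeholders, not official results.
track_scores = {
    "track_1_concept_understanding": 0.60,
    "track_2_knowledge_reasoning": 0.50,
    "track_3_behavior_alignment": 0.55,
    "track_4_multilingual": 0.52,
}

track5_score = sum(track_scores.values()) / len(track_scores)
print(f"Track 5 (all-around) score: {track5_score:.4f}")  # 0.5425
```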

🚀 Baselines

To assist you in making your first submission effortlessly, we have granted access to the ShopBench baseline. This setup uses existing LLMs to generate answers in a zero-shot manner for a variety of questions. It features open-source LLMs, such as Vicuna-7B, as well as a proprietary LLM, Claude 2. Results of these models, together with Amazon Titan, are reported in Table 2. 

Table 2: Baseline results of Vicuna-7B and Claude 2 on ShopBench.

| Model | Track 1: Shopping Concept Understanding | Track 2: Shopping Knowledge Reasoning | Track 3: User Behavior Alignment | Track 4: Multi-lingual Abilities | Track 5: All-around |
|---|---|---|---|---|---|
| Vicuna-7B-v1.5 | 0.5273 | 0.4453 | 0.4103 | 0.4382 | 0.4785 |
| Claude 2 | 0.7511 | 0.6382 | 0.6322 | 0.6524 | 0.6960 |
| Amazon Titan | 0.6105 | 0.4500 | 0.5063 | 0.5531 | 0.5556 |

These results show that the challenge is manageable, in that open-source LLMs, without specific prompting techniques, can already achieve non-trivial performance. In addition, we observe a significant gap between open-source models (Vicuna-7B) and proprietary models (Claude 2), indicating considerable room for improvement. We encourage participants to develop effective solutions to close or even eliminate this gap. 
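
For reference, a minimal zero-shot baseline along these lines could be sketched as follows. This is not the official baseline code: the `lmsys/vicuna-7b-v1.5` checkpoint, the prompt template, and the generation settings are assumptions made for illustration.

```python
# Minimal zero-shot baseline sketch (not the official ShopBench baseline):
# feed each question directly to an open-source LLM and take its completion
# as the answer. Assumes the Hugging Face `transformers` library and the
# public lmsys/vicuna-7b-v1.5 checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "lmsys/vicuna-7b-v1.5"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def zero_shot_answer(question: str, max_new_tokens: int = 64) -> str:
    # Vicuna-style single-turn prompt; the exact template is an assumption.
    prompt = f"USER: {question}\nASSISTANT:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Strip the prompt tokens and decode only the newly generated answer.
    answer_ids = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(answer_ids, skip_special_tokens=True).strip()

print(zero_shot_answer(
    "Is a USB-C charging cable compatible with a USB-A port? Answer yes or no."))
```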

πŸ† Prizes

Track 5 carries the largest prizes. In total, we have prepared $16,500 in awards for the winners of Track 5.  

CASH PRIZES

We have prepared a total of $14,000 in cash prizes, assigned as follows. 

  • 🥇 First Place: $7,000
  • 🥈 Second Place: $3,500
  • 🥉 Third Place: $1,500
  • Student Award: The best student team (i.e. all team members are students) will be awarded $2,000. 

In addition to cash prizes, the winning teams will also have the opportunity to present their work at the KDD Cup workshop.

💳 AWS CREDITS

We will award $500 worth of AWS credits to each of the teams placing 4th-8th in this track. 

📨 Submission

Please see details of submission here.

The time limit for submissions to Track 5 in Phase 1 is 300 minutes, i.e. 5 hours.

📅 Timeline

Please see the timeline here.

📱 Contact 

Please use kddcup2024@amazon.com for all communication with the Amazon KDD Cup 2024 team. 

Organizers of this competition are: 

  • Yilun Jin
  • Zheng Li
  • Chenwei Zhang
  • Xianfeng Tang
  • Haodong Wang
  • Mao Li
  • Ritesh Sarkhel
  • Qingyu Yin
  • Yifan Gao
  • Xin Liu
  • Zhengyang Wang
  • Tianyu Cao
  • Jingfeng Yang
  • Ming Zeng
  • Qing Ping
  • Wenju Xu
  • Pratik Jayarao
  • Priyanka Nigam
  • Yi Xu
  • Xian Li
  • Hyokun Yun
  • Jianshu Chen
  • Meng Jiang
  • Kai Chen
  • Bing Yin
  • Qiang Yang
  • Trishul Chilimbi

🤝 Acknowledgements

We thank our partner at AWS, Paxton Hall, for supporting the competition and providing the AWS credits for the winning teams.