AIcrowd | Amazon KDD Cup 24: Multi-Lingual Abilities

Amazon Search

This page corresponds to "Track 4: Multi-lingual Abilities" in the Amazon KDD Cup 2024 Challenge.

🌟 Introduction

Online shopping has become an indispensable service in modern life. To provide better shopping experiences to users, machine learning has been extensively used to understand various entities in online shopping, such as queries and browse sessions, and to infer users' search and shopping intentions. However, few studies explore online shopping tasks under the multi-task, few-shot learning scenario. In practice, online shopping creates a massive multi-task learning problem that often involves joint understanding of various shopping entities, such as products, attributes, queries, and purchases. Moreover, new shopping entities and tasks constantly emerge over time as a result of business expansion or new product lines, creating few-shot learning problems on these emerging tasks.

Large language models (LLMs) have emerged as promising solutions to the multi-task, few-shot learning problem in online shopping. Many studies have underscored the ability of a single LLM to perform various text-related tasks with state-of-the-art performance, and to generalize to unseen tasks with only a few samples or task descriptions. Therefore, by training a single LLM for all shopping-related machine learning tasks, we can reduce the cost of task-specific engineering, and of data labeling and re-training when new tasks appear. Furthermore, LLMs can improve customers' shopping experiences by providing interactive and real-time shopping recommendations.

This track, Multi-lingual Abilities, focuses on the ability to extend the model's shopping knowledge to a wider range of languages. Online shopping is popular worldwide, demanding timely and accurate services for customers all over the globe. Therefore, a model adept at multiple languages is especially desirable, as it can be deployed to multiple marketplaces without the need for additional tuning. A real-world example is shown in Figure 1.

Figure 1: Example of how multi-lingual online shopping questions pose challenges to LLMs.

👨‍💻👩‍💻 Tasks

This track focuses on how models can extend their abilities to multiple languages to cater for multiple marketplaces around the globe simultaneously, an example of which is shown in Figure 1. For a more fine-grained evaluation, this track is further divided into the following sub-skills.

Multi-lingual Shopping Concept Understanding: Please see details in Track 1: Shopping Concept Understanding.
Multi-lingual User Behavior Alignment: Please see details in Track 3: User Behavior Alignment.

Note that both sub-skills may include cross-lingual tasks (i.e. finding relations between languages) as well as multi-lingual tasks (i.e. the same task in multiple languages).

🗃 Datasets

The ShopBench Dataset is an anonymized, multi-task dataset sampled from real-world Amazon shopping data. Statistics of ShopBench in this track are given in Table 1.

Table 1: Dataset statistics for Track 4: Multi-lingual Abilities.

# Tasks	# Questions	# Products	# Product Category	# Attributes	# Reviews	# Queries
7	2379	~6000	/	/	/	~520

ShopBench is split into a few-shot development set and a test set to better mimic the few-shot learning setting. With this setting, we encourage participants to use any resource that is publicly available (e.g. pre-trained models, text datasets) to construct their solutions, instead of overfitting the given development data (e.g. generating pseudo data samples with GPT).

The development datasets will be given in JSON format with the following fields.

'input_field': This field contains the instructions and the question that should be answered by the model.
'output_field': This field contains the ground truth answer to the question.
'task_type': This field contains the type of the task (details in the "Tasks" section above).
'metric': This field contains the metric used to evaluate the question (Details in Section "Evaluation Metrics").
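
As a minimal sketch of how the four development-set fields listed above might be read, the Python snippet below parses records and groups them by 'task_type'. It assumes the data arrives as one JSON object per line, which the page does not confirm. The sample records, the 'task_type' values, and the 'metric' values are made up for illustration, and the helper names load_dev_records and group_by_task_type are hypothetical.

```python
import json
from collections import defaultdict

# Two made-up records in the documented shape. The task_type and metric
# values are illustrative placeholders, not real ShopBench values.
SAMPLE_JSONL = """\
{"input_field": "Placeholder question A", "output_field": "1", "task_type": "multiple-choice", "metric": "accuracy"}
{"input_field": "Placeholder question B", "output_field": "placeholder answer", "task_type": "generation", "metric": "bleu"}
"""

# The four fields named in the dataset description.
REQUIRED_FIELDS = ("input_field", "output_field", "task_type", "metric")


def load_dev_records(text):
    """Parse one JSON object per line (an assumed layout) and check the fields."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_no} is missing fields: {missing}")
        records.append(record)
    return records


def group_by_task_type(records):
    """Bucket records by their 'task_type' so each task can be inspected separately."""
    groups = defaultdict(list)
    for record in records:
        groups[record["task_type"]].append(record)
    return dict(groups)


records = load_dev_records(SAMPLE_JSONL)
groups = group_by_task_type(records)
for task_type, items in groups.items():
    print(task_type, len(items))
```

Grouping by 'task_type' is only useful on the development set, since the hidden test set does not expose that field.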

However, the test dataset (which will be hidden from participants) will have a different format with only two fields:

'input_field', which is the same as above.
'is_multiple_choice': This field contains a 'True' or 'False' that indicates whether the question is a multiple choice or not. The detailed 'task_type' will not be given to participants.
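
Because the test records expose only 'input_field' and 'is_multiple_choice', a solution has to choose its answer handling from that flag alone. The Python sketch below shows one possible way to do this. It assumes the flag may arrive either as a boolean or as the text 'True'/'False', and that a multiple-choice answer is an option number written in digits. Neither assumption is confirmed by this page, and the function names are hypothetical.

```python
import re


def is_multiple_choice(record):
    """Normalize the flag, which may be a bool or the text 'True'/'False' (assumed)."""
    flag = record["is_multiple_choice"]
    if isinstance(flag, bool):
        return flag
    return str(flag).strip().lower() == "true"


def postprocess(record, raw_output):
    """Turn raw model text into a final answer string.

    Assumption: a multiple-choice answer is an option number, so the first
    run of digits in the model output is kept. Other questions keep the
    stripped text unchanged.
    """
    text = raw_output.strip()
    if is_multiple_choice(record):
        match = re.search(r"\d+", text)
        return match.group(0) if match else ""
    return text


# Made-up test-style records for illustration.
mc_record = {"input_field": "Placeholder question", "is_multiple_choice": "True"}
open_record = {"input_field": "Placeholder question", "is_multiple_choice": False}

print(postprocess(mc_record, "The answer is 2."))
print(postprocess(open_record, "  a free-form answer  "))
```

Keeping this branching in one place makes it easy to swap in a different extraction rule once the actual answer format is known.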

💯 Evaluation Metrics

Please see the detailed evaluation metrics here.

🚀 Baselines

To assist you in making your first submission effortlessly, we have granted access to the ShopBench baseline. This setup uses existing LLMs to generate answers in a zero-shot manner for a variety of questions. This resource features open-source LLMs, such as Vicuna-7B. We also include results of the proprietary LLMs Claude 2 and Amazon Titan. We report all baseline results in Table 2.

Table 2: Baseline results of Vicuna-7B, Claude 2, and Amazon Titan on ShopBench Track 4: Multi-lingual Abilities.

Models	Track 4: Multi-lingual Abilities
Vicuna-7B-v1.5	0.4382
Claude 2	0.6324
Amazon Titan	0.5300

These results show that the challenge is manageable, in that open-source LLMs, without specific prompting techniques, can already achieve non-trivial performance. In addition, we observe a significant gap between open-source models (Vicuna-7B, 0.4382) and proprietary models (Claude 2, 0.6324), showing the potential room for improvement. We encourage participants to develop effective solutions to close or even eliminate the gap.

🏆 Prizes

We have prepared a total prize pool of $6,250 for this track.

Cash Prizes

For this track, we assign the following awards.

🥇 First Place: $2,000
🥈 Second Place: $1,000
🥉 Third Place: $500
Student Award: The best student team (i.e. all team members are students) will be awarded $750.

In addition to cash prizes, the winning teams will also have the opportunity to present their work at the KDD Cup workshop.

💳 AWS Credits

We will award $500 worth of AWS credits to each of the teams ranked 4th through 7th in this track.

📨 Submission

Please see details of submission here.

The time limit for submissions to Track 4 in Phase 1 is 60 minutes.

📅 Timeline

Please see the timeline here.

📱 Contact

Please use kddcup2024@amazon.com for all communication with the Amazon KDD Cup 2024 team.

Organizers of this competition are:

Yilun Jin
Zheng Li
Chenwei Zhang
Xianfeng Tang
Haodong Wang
Mao Li
Ritesh Sarkhel
Qingyu Yin
Yifan Gao
Xin Liu
Zhengyang Wang
Tianyu Cao
Jingfeng Yang
Ming Zeng
Qing Ping
Wenju Xu
Pratik Jayarao
Priyanka Nigam
Yi Xu
Xian Li
Hyokun Yun
Jianshu Chen
Meng Jiang
Kai Chen
Bing Yin
Qiang Yang
Trishul Chilimbi

🤝 Acknowledgements

We thank our partner at AWS, Paxton Hall, for supporting the competition and providing the AWS credits for winning teams.