
This page corresponds to "Track 1: Shopping Concept Understanding" in the Amazon KDD Cup 2024 Challenge.

🌟 Introduction

Online shopping has become an indispensable service in the lives of modern citizens. To provide better shopping experiences to users, machine learning has been extensively used to understand various entities in online shopping, such as queries, browse sessions, etc. to infer the user's search and shopping intentions. However, few studies explore online shopping tasks under the multi-task, few-shot learning scenario. In practice, online shopping creates a massive multi-task learning problem that often involves a joint understanding of various shopping entities, such as products, attributes, queries, purchases, etc. Moreover, new shopping entities and tasks constantly emerge over time as a result of business expansion or new product lines, creating few-shot learning problems on these emerging tasks. 

Large language models (LLM) emerge as promising solutions to the multi-task, few-shot learning problem in online shopping. Many studies have underscored the ability of a single LLM to perform various text-related tasks with state-of-the-art abilities, and to generalize to unseen tasks with only a few samples or task descriptions. Therefore, by training a single LLM for all shopping-related machine learning tasks, we mitigate the costs for task-specific engineering efforts, and for data labeling and re-training upon new tasks. Furthermore, LLMs can improve the customers' shopping experiences by providing interactive and real-time shopping recommendations. 

This track, Shopping Concept Understanding, aims to evaluate the model's ability to understand shopping entities in the form of texts, such as product names, product categories, attributes, product descriptions, reviews, etc. Many domain-specific terms and concepts exist in online shopping, such as brands, product lines, etc., which may rarely be seen during general domain pre-training and fine-tuning. Moreover, these terms and concepts in online shopping often appear within very short texts, making it even more difficult for general LLMs to understand them without sufficient context. A real-world example is shown in Figure 1. This track and the underlying skill serve as the basis for performing more advanced tasks, such as reasoning, recommendation, etc. 

Figure 1: Example of domain-specific entities in short texts in online shopping.

πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’» Tasks

This track focuses on understanding entities and concepts that are specific to the domain of online shopping, an example of which is shown in Figure 1. For a more fine-grained evaluation, this track is further divided into the following sub-skills. 

  • Concept Normalization: In online shopping, many concepts refer to the same thing. For example, 'USB 3.0', 'USB 3.1 Gen 1', 'USB 3.2 Gen 1', and 'USB 5G' all refer to the same USB standard. These unnormalized concepts cause significant confusion among customers, so it is important to unify them into a single canonical form for a clearer product description. 
  • Elaboration: As a shop assistant, it is important to elaborate or explain shopping concepts to customers in plain, understandable, and concise language to facilitate customer shopping decisions.
  • Extraction and Summarization: Product descriptions are often long, while customers may only look for several important properties and highlights. It thus requires strong extraction and summarization abilities to tell a customer whether a product fits his specific needs. 
  • Relational Inference: Concepts and entities in online shopping are often related to each other. For example, an attribute only applies to certain products. Understanding how entities like products, product categories, and attributes can be related to each other plays an important role in retrieving products according to requirements. 
  • Sentiment Analysis: Reviews about products often serve as important references for customers to make shopping decisions. How well a model can analyze the sentiments expressed in reviews thus serves as the foundation for recommending high-quality products to customers. 
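As a toy illustration of the Concept Normalization sub-skill, the USB example above can be expressed as a mapping from unnormalized aliases to one canonical label. This is a minimal sketch; the alias table and the choice of 'USB 3.2 Gen 1' as the canonical form are assumptions for illustration, not part of the official task definition.

```python
# Hypothetical alias table for the USB example from the task description.
# The canonical label chosen here is an assumption, not an official answer.
USB_ALIASES = {
    "USB 3.0": "USB 3.2 Gen 1",
    "USB 3.1 Gen 1": "USB 3.2 Gen 1",
    "USB 3.2 Gen 1": "USB 3.2 Gen 1",
    "USB 5G": "USB 3.2 Gen 1",
}

def normalize(concept: str) -> str:
    """Map an unnormalized concept to its canonical form, if known."""
    return USB_ALIASES.get(concept, concept)
```

In practice the challenge expects an LLM to perform this mapping from context rather than a fixed lookup table, but the input/output contract is the same.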

πŸ—ƒ Datasets

The ShopBench Dataset is an anonymized, multi-task dataset sampled from real-world Amazon shopping data. Statistics of ShopBench for this track are given in Table 1. 

Table 1: Dataset statistics for Track 1: Shopping Concept Understanding.

# Tasks: 27
# Questions: 11129
# Products: ~1500
# Product Categories: 400
# Attributes: 1032
# Reviews: ~9600
# Queries: 361

The few-shot development datasets (shared across all tracks) will be given in json format with the following fields. 

  • 'input_field': This field contains the instructions and the question that should be answered by the model. 
  • 'output_field': This field contains the ground truth answer to the question. 
  • 'task_type': This field contains the type of the task (detailed in the "Tasks" section above). 
  • 'metric': This field contains the metric used to evaluate the question (Details in Section "Evaluation Metrics"). 

However, the test dataset (which will be hidden from participants) will have a different format with only two fields: 

  • 'input_field', which is the same as above. 
  • 'is_multiple_choice': This field contains a 'True' or 'False' that indicates whether the question is a multiple choice or not. The detailed 'task_type' will not be given to participants.   
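Under the field layout described above, the development set can be loaded and summarized with a few lines of Python. This is a minimal sketch assuming the file is a JSON list of records with the four documented fields; the filename "shopbench_dev.json" is a placeholder, not the official name.

```python
import json

def load_dev_set(path="shopbench_dev.json"):
    """Load the few-shot development set, assumed to be a JSON list of
    records with 'input_field', 'output_field', 'task_type', 'metric'."""
    with open(path) as f:
        return json.load(f)

def count_by_field(records, field):
    """Tally how many questions fall under each value of `field`,
    e.g. questions per 'task_type' or per 'metric'."""
    counts = {}
    for record in records:
        counts[record[field]] = counts.get(record[field], 0) + 1
    return counts
```

Note that the hidden test set omits 'task_type' and 'metric', so any per-task handling must be inferred from 'input_field' and 'is_multiple_choice' alone.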

πŸ’― Evaluation Metrics

Please see the detailed evaluation metrics here.

πŸš€ Baselines

To assist you in making your first submission effortlessly, we have granted access to the ShopBench baseline. This setup utilises existing LLMs to generate answers in a zero-shot manner for a variety of questions. This resource features open-source LLMs, such as Vicuna-7B. We also include results of proprietary LLMs, Claude 2 and Amazon Titan. We report their results in Table 2. 

Table 2: Baseline results of Vicuna-7B and Claude 2 on ShopBench Track 1.

Model: Score (Track 1: Shopping Concept Understanding)
  • Vicuna-7B-v1.5: 0.5273
  • Claude 2: 0.7511
  • Amazon Titan: 0.6105

With these results, we show that the challenge is manageable, in that open-source LLMs, without specific prompting techniques, can already achieve non-trivial performance. In addition, we observe a significant gap between open-source models (Vicuna-7B) and proprietary models (Claude 2), showing the potential room for improvement. We encourage participants to develop effective solutions that close or even eliminate this gap. 
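Because the hidden test set only exposes 'input_field' and 'is_multiple_choice', a zero-shot baseline reduces to wrapping each question in a prompt before passing it to the model. The sketch below illustrates that wrapping step; the instruction wording is an assumption for illustration, not the official baseline code.

```python
def build_prompt(input_field: str, is_multiple_choice: bool) -> str:
    """Wrap a test question in a zero-shot prompt. The hint wording is a
    hypothetical choice, not taken from the official ShopBench baseline."""
    if is_multiple_choice:
        hint = "Respond with only the number of the correct option."
    else:
        hint = "Respond concisely with the answer only."
    return f"{input_field}\n\n{hint}"
```

The resulting string would then be fed to the chosen LLM (e.g. Vicuna-7B) for generation; constraining multiple-choice outputs to a single option number simplifies answer parsing.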

πŸ† Prizes

We have prepared a total prize pool of $6,250 for this track. 

Cash Prizes

For this track, we assign the following awards. 

  • πŸ₯‡ First Place: $2,000
  • πŸ₯ˆ Second Place: $1,000
  • πŸ₯‰ Third Place: $500
  • Student Award: The best student team (i.e. all team members are students) will be awarded $750. 

In addition to cash prizes, the winning teams will also have the opportunity to present their work at the KDD Cup workshop.

πŸ’³ AWS Credits

We will award $500 worth of AWS credits to the 4th-7th teams in this track. 

πŸ“¨ Submission

Please see details of submission here.

The time limit for submissions to Track 1 in Phase 1 is 140 minutes.

πŸ“… Timeline

Please see the timeline here.

πŸ“± Contact 

Please use kddcup2024@amazon.com for all communication to reach the Amazon KDD Cup 2024 team. 

Organizers of this competition are: 

  • Yilun Jin
  • Zheng Li
  • Chenwei Zhang
  • Xianfeng Tang
  • Haodong Wang
  • Mao Li
  • Ritesh Sarkhel
  • Qingyu Yin
  • Yifan Gao
  • Xin Liu
  • Zhengyang Wang
  • Tianyu Cao
  • Jingfeng Yang
  • Ming Zeng
  • Qing Ping
  • Wenju Xu
  • Pratik Jayarao
  • Priyanka Nigam
  • Yi Xu
  • Xian Li
  • Hyokun Yun
  • Jianshu Chen
  • Meng Jiang
  • Kai Chen
  • Bing Yin
  • Qiang Yang
  • Trishul Chilimbi

🀝 Acknowledgements

We thank our partner at AWS, Paxton Hall, for supporting the competition and the AWS credits for winning teams.
