This challenge has now come to an end. You can browse interesting ongoing challenges on AIcrowd here.

IMPORTANT: Details about end of competition evaluations 🎯

🚀 Challenge Starter Kit | 🌈 Welcome thread

👥 Looking for teammates? | 📢 Share your feedback


🕵️ Introduction

Data for machine learning tasks usually does not come for free but has to be purchased, so the costs and benefits of data have to be weighed against each other. This is challenging for three reasons. First, data usually has combinatorial value. For instance, different observations might complement or substitute each other for a given machine learning task. In such cases, the decision to purchase one group of observations has to be made conditional on the decision to purchase another group of observations. If these relationships are high-dimensional, finding the optimal bundle becomes computationally hard. Second, data varies in quality, for instance, in its level of noise. Third, data has to be acquired under the assumption that it will be valuable out-of-sample, so distribution shifts have to be anticipated.

In this competition, you face these data purchasing challenges in the context of a multi-label image classification task in a quality control setting.

📑 Problem Statement

In short: You have to classify images. Some images in your training set are labelled but most of them aren't. How do you decide which images to label if you have a limited budget to do so?

In more detail: You face a multi-label image classification task. The dataset consists of synthetically generated images of painted metal sheets. A classifier is meant to predict whether the sheets have production damages and if so which ones. You have access to a set of images, a subset of which are labelled with respect to production damages. Because labeling is costly and your budget is limited, you have to decide for which of the unlabelled images labels should be purchased in order to maximize prediction accuracy.

Each image has a 6-dimensional label representing the presence or absence of ['scratch_small', 'scratch_large', 'dent_small', 'dent_large', 'stray_particle', 'discoloration'] in the image.
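As a minimal sketch of what such a label looks like, the snippet below encodes a set of defect names as a 6-dimensional 0/1 vector and decodes it again. The helper names `encode_label` and `decode_label` are hypothetical, and it is an assumption that the dataset stores the dimensions in the order listed above.

```python
# Label names in the order given in the challenge description.
# Assumption: the dataset uses this same order for the 6 dimensions.
LABEL_NAMES = ['scratch_small', 'scratch_large', 'dent_small',
               'dent_large', 'stray_particle', 'discoloration']


def encode_label(defects):
    """Turn a collection of defect names into a 6-dimensional 0/1 vector."""
    unknown = set(defects) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unknown defect names: {sorted(unknown)}")
    return [1 if name in defects else 0 for name in LABEL_NAMES]


def decode_label(vector):
    """Inverse of encode_label: list the defects flagged in a 0/1 vector."""
    return [name for name, flag in zip(LABEL_NAMES, vector) if flag]


example = encode_label({'scratch_small', 'discoloration'})
print(example)                # [1, 0, 0, 0, 0, 1]
print(decode_label(example))  # ['scratch_small', 'discoloration']
```

A sheet without any damage corresponds to the all-zero vector, and several entries can be 1 at once, which is what makes the task multi-label rather than multi-class.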

You are required to submit code, which will be used to run the three different phases of the competition:

  • Pre-Training Phase

    • In the Pre-Training Phase, your code will have access to 1,000 labelled images on a multi-label image classification task with 6 classes.
    • It is up to you, how you wish to use this data. For instance, you might want to pre-train a classification model.
  • Purchase Phase

    • In the Purchase Phase, your code, after going through the Pre-Training Phase, will have access to an unlabelled dataset of 10,000 images.
    • You will have a budget of 500 to 2,000 label purchases, which you can freely use across any of the images in the unlabelled dataset to obtain their labels.
    • You are tasked with designing your own approach to selecting the optimal subset of images in the unlabelled dataset, which would help you optimize your model's performance on the prediction task.
    • The labelling budget is passed to the purchase_phase via the budget parameter, and the available compute time is passed to the purchase_phase via the time_available parameter.
    • If your code times out on any of the labelling-budget & compute-constraint pairs, the evaluation will fail.
  • Post Purchase Training Phase

    • In the Post Purchase Training phase, we combine the labels purchased in the Purchase Phase with the available training set, and train an EfficientNet_b4 model for 10 epochs. The trained model is then used to make predictions on a held out test set, which is eventually used to compute the scores.
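One common way to approach the Purchase Phase is uncertainty sampling: buy labels for the images the pre-trained model is least sure about. The sketch below is one possible strategy, not the organizers' method or the starter kit API. The function name `select_for_purchase` is hypothetical, the per-label probabilities are assumed to come from a model you trained in the Pre-Training Phase, and `time_available` is assumed to be in seconds.

```python
import time


def select_for_purchase(probabilities, budget, time_available):
    """Pick up to `budget` image indices whose predictions are least certain.

    probabilities:  dict mapping image index -> list of per-label probabilities
                    (assumed output of a model from the Pre-Training Phase).
    budget:         maximum number of labels that may be purchased.
    time_available: compute limit for this phase, assumed to be in seconds.
    """
    # Keep a safety margin so selection finishes well before the timeout.
    deadline = time.monotonic() + 0.9 * time_available
    scored = []
    for index, probs in probabilities.items():
        if time.monotonic() > deadline:
            break  # out of time: work with what has been scored so far
        # Per-label uncertainty is 1 at p = 0.5 and 0 at p = 0 or p = 1.
        uncertainty = sum(1.0 - 2.0 * abs(p - 0.5) for p in probs) / len(probs)
        scored.append((uncertainty, index))
    # Most uncertain first; ties are broken by the smaller index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:budget]]


probs = {
    0: [0.99] * 6,                       # confident on every label
    1: [0.5] * 6,                        # maximally uncertain
    2: [0.8, 0.2, 0.9, 0.1, 0.7, 0.3],   # somewhere in between
}
print(select_for_purchase(probs, budget=2, time_available=60))  # [1, 2]
```

Pure uncertainty sampling ignores the combinatorial value of data mentioned in the introduction (it may buy many near-duplicate images), so it is best seen as a baseline to compare other selection rules against.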

How much labelling budget do I have?

In Round-2 of the competition, your submissions have to be able to perform well across multiple labelling budget and compute constraint pairs. Submissions will be evaluated based on five purchasing budget-compute constraint pairs (different numbers of images to be labelled and different runtime limits). This means that there are five runs of your purchasing functions under different purchasing budget-compute constraint pairs. The five pairs will be the same for all submissions. They will be randomly drawn from the intervals [500 labels, 2,000 labels] for the purchasing budget and [15 min, 60 min] for the compute constraint. In all cases, your code will be executed on a node with 4 CPUs, 16 GB RAM, and 1 NVIDIA T4 GPU.

CHANGELOG | Round 2 | March 1st, 2022 : Please refer to the CHANGELOG.md for more details on everything that changed between Round 1 & Round 2.

💾 Dataset

The datasets for this challenge can be accessed in the Resources Section.

  • training-v0.2-rc4.tar.gz: The training set containing 1,000 images with their associated labels. During your local experiments you are allowed to use the data as you please.
  • unlabelled-v0.2-rc4.tar.gz: The unlabelled set containing 10,000 images, and their associated labels. During your local experiments you are only allowed to access the labels through the provided purchase_label function.
  • validation-v0.2-rc4.tar.gz: The validation set containing 3,000 images, and their associated labels. During your local experiments you are only allowed to use the labels of the validation set to measure the performance of your models and experiments.
  • debug-v0.2-rc4.tar.gz: A small set of 100 images with their associated labels, which you can use for integration testing and for trying out the provided starter kit.
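To keep local experiments honest, it can help to route every label lookup on the unlabelled set through a wrapper that enforces the budget. The sketch below is a hypothetical stand-in written for illustration. It is not the starter kit's purchase_label implementation, and the class name `LabelOracle`, the exception `BudgetExceededError`, and the rule that re-reading a purchased label is free are all assumptions.

```python
class BudgetExceededError(Exception):
    """Raised when more labels are requested than the budget allows."""


class LabelOracle:
    """Hypothetical local stand-in for a budget-limited labelling service."""

    def __init__(self, hidden_labels, budget):
        self._hidden_labels = hidden_labels  # image index -> 0/1 label vector
        self._budget = budget
        self.purchased = {}                  # labels bought so far

    @property
    def remaining(self):
        """Number of label purchases still available."""
        return self._budget - len(self.purchased)

    def purchase_label(self, index):
        """Return the label for `index`, charging the budget on first access."""
        if index in self.purchased:
            return self.purchased[index]     # assumption: re-reading is free
        if self.remaining <= 0:
            raise BudgetExceededError("labelling budget exhausted")
        label = self._hidden_labels[index]
        self.purchased[index] = label
        return label


oracle = LabelOracle({0: [1, 0, 0, 0, 0, 0], 1: [0, 0, 0, 0, 0, 1]}, budget=1)
print(oracle.purchase_label(0))  # [1, 0, 0, 0, 0, 0]
print(oracle.remaining)          # 0
```

Any further `purchase_label` call for a new index now raises `BudgetExceededError`, which makes accidental label leakage in local experiments easy to spot.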

NOTE: The public dataset on which you run your local experiments might not be sampled from the same distribution as the private dataset on which the actual evaluations and scoring are performed.

👥 Participation

The participation flow looks as follows:

Quick description of all the phases:

  • Runtime Setup
    You can use requirements.txt for all your Python package requirements. If you are an advanced developer and need more freedom, check out all the other supported runtime configurations here.
  • Pre-Training Phase
    This is your typical training phase. You need to implement the pre_training_phase function, which will have access to training_dataset (an instance of ZEWDPCBaseDataset). Learn more about it by referring to the inline documentation here.
  • Purchase Phase
    In this phase you also have access to the unlabelled dataset, which you can probe for labels until your budget runs out. Learn more about it by referring to the inline documentation here.
  • Post Purchase Training Phase
    In this phase, we collect the purchased labels from the purchase_phase, and train an EfficientNet_b4 model after combining the purchased labels with the training set. We use the ZEWDPCTrainer class to train and evaluate the models.


  • Prediction Phase
    In this phase, your code has access to a test set, and we expect the prediction_phase interface to make predictions on the test set using your trained models. Learn more about it by referring to the inline documentation here. Although, starting with Round 2, we no longer use the results generated by your prediction_phase to compute your final scores, having a healthy and functioning prediction_phase interface in your code is still important to us. This challenge is part of a larger research project, in which we would like to be able to analyze the predictions made by your models and compare them across submissions. Hence, the evaluation interface will continue to test the functionality of your prediction_phase interface against a small test set.

🚀 Submission

🖊 Evaluation Criteria

The challenge will use the macro-weighted F1 Score, Accuracy Score, and the Hamming Loss during evaluation. The primary score will be the macro-weighted F1 Score.
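For local validation, the three metrics can be approximated in a few lines. The sketch below rests on assumptions the text does not settle: it treats the F1 score as a plain macro average (the unweighted mean of the per-label F1 scores), treats accuracy as exact-match (subset) accuracy over all 6 labels, and counts a label with no positives and no predicted positives as F1 = 0. The evaluator's exact definitions may differ.

```python
def f1_for_label(y_true, y_pred, k):
    """F1 score of label column k; returns 0.0 when the label never occurs."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 0 and p[k] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores (assumed averaging)."""
    n_labels = len(y_true[0])
    return sum(f1_for_label(y_true, y_pred, k) for k in range(n_labels)) / n_labels


def subset_accuracy(y_true, y_pred):
    """Fraction of images whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label entries that are predicted wrongly."""
    wrong = sum(tk != pk for t, p in zip(y_true, y_pred) for tk, pk in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))


y_true = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 1]]
y_pred = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]]
print(macro_f1(y_true, y_pred))
print(subset_accuracy(y_true, y_pred))
print(hamming_loss(y_true, y_pred))
```

In this toy example a single missed 'discoloration' flag halves the subset accuracy but changes only one of twelve label entries, which is why the three metrics can rank models differently.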

📅 Timeline

This challenge has two Rounds.

  • Round 1 : Feb 4th – Feb 28th, 2022

    • The first round submissions will be evaluated based on one budget-compute constraint pair (max. of 3,000 images to be labelled and 3 hours runtime).
    • Labelled Dataset : 5,000 images
    • Unlabelled Dataset : 10,000 images
    • Labelling Budget : 3,000 images
    • Test Set : 3,000 images
    • GPU Runtime : 3 hours
  • Round 2 : March 3rd – April 7th, 2022

    • Labelled Dataset : 1,000 images

    • Unlabelled Dataset : 10,000 images

    • Labelling Budget : [500 labels, 2000 labels] (with associated compute constraints in the range of [15min, 60min])

    • Test Set : 3,000 images

    • GPU Runtime : [15min, 60min] combined time available for the pre-training phase and purchase-phase.

    • NOTE: At the end of Round-2, the winners will be decided based on a private leaderboard, which is computed using a dataset sampled from a different distribution and evaluated on 5 different budget-compute constraint pairs. In Round-2 of the competition, your submissions have to be able to perform well across multiple Purchasing Budget & Compute Budget pairs. Submissions will be evaluated based on 5 purchasing-compute budget pairs. The five pairs will be the same for all submissions. They will be drawn from the intervals [500 labels, 2,000 labels] for the purchasing budget and [15 min, 60 min] for the compute budget.

      The Public Leaderboard (visible throughout the Round-2) will be computed using the following purchasing-compute budget pairs :

      Purchasing Budget | Compute Budget
      ------------------+---------------
      621 labels        | 17 min
      621 labels        | 51 min
      1,889 labels      | 17 min
      1,889 labels      | 51 min
      1,321 labels      | 34 min

      The Private Leaderboard (computed at the end of Round-2) will use a different set of purchasing-compute budget pairs. Hence, the winning submissions are expected to generalize well across the Purchasing Budget space of [500 labels, 2,000 labels] and the Compute Budget space of [15 min, 60 min]. A form for selecting the submissions for the Private Leaderboard will be circulated at the end of Round-2, and every participant can select up to 3 submissions.

      NOTE: The final scores for each of the submissions for both the Public Leaderboard and the Private Leaderboard are computed as the mean of the scores of the said submission across all the purchasing-compute budget pairs for the specific leaderboard.
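The averaging rule in the note above can be sketched as follows. The function name `leaderboard_score` is hypothetical, the pairs are the Public Leaderboard pairs listed above, and the per-pair scores in the example are placeholders rather than real results. Requiring a score for every pair mirrors the rule that a timeout on any pair fails the whole evaluation.

```python
from statistics import mean

# (purchasing budget in labels, compute budget in minutes), Public Leaderboard.
PUBLIC_PAIRS = [(621, 17), (621, 51), (1889, 17), (1889, 51), (1321, 34)]


def leaderboard_score(scores_by_pair, pairs=PUBLIC_PAIRS):
    """Mean of a submission's scores over all purchasing-compute budget pairs."""
    missing = [pair for pair in pairs if pair not in scores_by_pair]
    if missing:
        # A run that failed or timed out on one pair has no valid final score.
        raise ValueError(f"no score for pairs: {missing}")
    return mean(scores_by_pair[pair] for pair in pairs)


# Placeholder scores for illustration only; these are not real results.
placeholder_scores = {
    (621, 17): 0.1,
    (621, 51): 0.2,
    (1889, 17): 0.3,
    (1889, 51): 0.4,
    (1321, 34): 0.5,
}
print(leaderboard_score(placeholder_scores))
```

Because every pair carries equal weight, a strategy that only works well at large budgets or long runtimes is penalized as much as one that only works at small ones.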

🏆 Prizes

This challenge has both Leaderboard Prizes and Community Contribution Prizes.

Leaderboard Prizes

These prizes will be given away to the top performing teams/participants of the second round of this challenge.

  • 1st Place : USD 6,000
  • 2nd Place : USD 4,500
  • 3rd Place : USD 3,000

The Community Contribution Prizes will be awarded at the discretion of the organizers, taking into account the popularity of the posts (or activity) in the community (based on the number of likes ❤️), so share your post widely to spread the word!

The prizes typically go to individuals or teams who are extremely active in the community, share resources - or even answer questions - that benefit the whole community greatly!

You can make multiple submissions, but you are only eligible for the Community Contribution Prize once. In case of resources that are created, your work needs to be published under a license of your choice, and on a platform that allows other participants to access and use it.

Notebooks, Blog Posts, Tutorials, Screencasts, YouTube Videos, or even your active responses on the challenge forums: everything is eligible for the Community Contribution Prizes. We are looking forward to seeing everything you create!

🔗 Links

  • 💪 Challenge Page: https://www.aicrowd.com/challenges/data-purchasing-challenge-2022
  • 🗣️ Discussion Forum: https://discourse.aicrowd.com/c/data-purchasing-challenge-2022/2136
  • 🏆 Leaderboard: https://www.aicrowd.com/challenges/data-purchasing-challenge-2022/leaderboards
  • 🙋 Frequently Asked Questions (FAQs): https://discourse.aicrowd.com/t/frequently-asked-questions-faqs/7298/1

📱 Contact

🀝 Organizers