Activity
Ratings Progression
Challenge Categories
Challenges Entered
Generate Synchronised & Contextually Accurate Videos
Latest submissions
graded | 271590
graded | 271581
graded | 271570
Improve RAG with Real-World Benchmarks
Latest submissions
graded | 251985
failed | 251936
failed | 251933
Evaluate Natural Conversations
Latest submissions
graded | 249059
submitted | 249058
submitted | 249057
Multi-Agent Dynamics & Mixed-Motive Cooperation
Latest submissions
graded | 244563
graded | 237014
Advanced Building Control & Grid-Resilience
Latest submissions
failed | 238874
failed | 238665
failed | 238660
Specialize and Bargain in Brave New Worlds
Latest submissions
failed | 237925
failed | 237924
Small Object Detection and Classification
Latest submissions
graded | 237160
failed | 237147
graded | 236860
Understand semantic segmentation and monocular depth estimation from downward-facing drone images
Latest submissions
graded | 214611
failed | 214573
failed | 214570
Audio Source Separation using AI
Latest submissions
graded | 236127
graded | 236125
failed | 236121
Identify user photos in the marketplace
Latest submissions
graded | 209775
failed | 209210
graded | 208987
A benchmark for image-based food recognition
Latest submissions
Using AI For Building's Energy Management
Latest submissions
failed | 235438
failed | 235434
failed | 205233
Learning From Human-Feedback
Latest submissions
What data should you label to get the most value for your money?
Latest submissions
Interactive embodied agents for Human-AI collaboration
Latest submissions
graded | 199453
graded | 199452
graded | 198521
Specialize and Bargain in Brave New Worlds
Latest submissions
Amazon KDD Cup 2022
Latest submissions
Behavioral Representation Learning from Animal Poses.
Latest submissions
graded | 198630
graded | 197504
graded | 197503
Airborne Object Tracking Challenge
Latest submissions
ASCII-rendered single-player dungeon crawl game
Latest submissions
graded | 158823
failed | 158209
failed | 158208
Latest submissions
Training sample-efficient agents in Minecraft
Latest submissions
Machine Learning for detection of early onset of Alzheimers
Latest submissions
Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments
Latest submissions
Self-driving RL on DeepRacer cars - From simulation to real world
Latest submissions
graded | 165209
failed | 165208
failed | 165206
Robustness and teamwork in a massively multiagent environment
Latest submissions
Latest submissions
5 Puzzles 21 Days. Can you solve it all?
Latest submissions
Multi-Agent Reinforcement Learning on Trains
Latest submissions
Latest submissions
graded | 143804
graded | 125756
graded | 125751
5 Problems 15 Days. Can you solve it all?
Latest submissions
Learn to Recognise New Behaviors from limited training examples.
Latest submissions
graded | 125756
graded | 125589
Reinforcement Learning, IIT-M, assignment 1
Latest submissions
graded | 125767
submitted | 125747
graded | 125006
IIT-M, Reinforcement Learning, DP, Taxi Problem
Latest submissions
graded | 125767
graded | 125006
graded | 124921
Latest submissions
graded | 128400
submitted | 128365
Latest submissions
failed | 131869
graded | 130090
graded | 128401
Latest submissions
failed | 131869
graded | 130090
graded | 128401
Latest submissions
graded | 135842
graded | 130545
Round 1 - Completed
Latest submissions
Round 1 - Completed
Latest submissions
Identify Words from silent video inputs.
Latest submissions
Round 2 - Active | Claim AWS Credits by beating the baseline
Latest submissions
graded | 198630
graded | 182252
graded | 178951
Round 2 - Active | Claim AWS Credits by beating the baseline
Latest submissions
graded | 197504
graded | 197503
graded | 182254
Use an RL agent to build a structure with natural language inputs
Latest submissions
graded | 199453
graded | 199452
graded | 198521
Language assisted Human - AI Collaboration
Latest submissions
graded | 196399
graded | 196379
failed | 196363
Latest submissions
submitted | 245759
submitted | 245757
graded | 200962
Estimate depth in aerial images from monocular downward-facing drone
Latest submissions
graded | 214522
graded | 214521
failed | 214517
Perform semantic segmentation on aerial images from monocular downward-facing drone
Latest submissions
graded | 214611
failed | 214573
failed | 214570
Music source separation of an audio signal into separate tracks for vocals, bass, drums, and other
Latest submissions
graded | 236127
graded | 236125
failed | 236121
Source separation of a cinematic audio track into dialogue, sound-effects and misc.
Latest submissions
failed | 236027
failed | 236026
failed | 236024
Latest submissions
graded | 206453
submitted | 206452
submitted | 206337
Predicting Temperature around Nuclear Waste Canisters
Latest submissions
Latest submissions
graded | 223574
graded | 223573
graded | 223572
Commonsense Dialogue Response Generation
Latest submissions
graded | 249059
submitted | 249058
submitted | 249057
Participant | Rating |
---|---|
nachiket_dev_me18b017 | 0 |
cadabullos | 0 |
alina_porechina | 0 |
ryan811 | 0 |
1774452411aqqcom | 0 |
XIAOhe | 0 |
- Random-walk Airborne Object Tracking Challenge
- R2D2 NeurIPS 2022: CityLearn Challenge
- dipam_chakraborty Testing Net
meta-kdd-cup-24-staging
Task 1: Commonsense Dialogue Response Generation
Updates to Task 1 Metrics
10 months ago
@saidinesh_pola I can't share all the details about the GPT metric, but we do handle the cases where it does not produce a valid score. For final submissions, each team will get to select any 2 successful submissions, regardless of whether they are on the GPU track or not.
Updates to Task 1 Metrics
10 months ago
@unnikrishnan.r I don't see why gpt3.5 would naturally score prompt track submissions higher. Yes, that is what is currently occurring, but there is no natural reason for it. The metrics were decided based on actual human evaluations done on a blindly selected subset of Round 1 conversations, and gpt3.5 scores were the most correlated with the human evaluations.
Updates to Task 1 Metrics
10 months ago
@saidinesh_pola Yes, the GPT3.5 score is GPT-3.5-turbo generating a score based on a modified prompt similar to G-eval. It scores every utterance with the conversation history as context.
Updates to Task 1 Metrics
10 months ago
We are updating the metrics used to compute the leaderboards during the challenge.
Noting that the final prizes and standings will be decided based on the outcomes of the human evaluations, we explored how closely the current metrics correlated with the overall score of the human evaluation results.
After a thorough investigation, we noticed that Word Level F1 and 4-gram BLEU did not accurately reflect the performance of the submitted models.
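For readers unfamiliar with the metric, word-level F1 is commonly computed from the token overlap between a generated response and a reference. The sketch below is a minimal illustration assuming whitespace tokenization and lowercasing; the tokenization and normalization actually used by the challenge are not described in this post, and the function name `word_f1` is hypothetical.

```python
from collections import Counter


def word_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (whitespace tokens, lowercased)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty strings count as a match; one empty string counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Because the score depends only on shared words, a fluent paraphrase that uses different wording can score low, which is one way an overlap metric can diverge from human judgment.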
In light of this observation, and with the motivation to provide more accurate feedback to the teams about their performance, we have decided to incorporate three new metrics: CPDScore, USEScore, and BERTScore. CPDScore and USEScore have demonstrated superior accuracy compared to the previously used metrics. Moving forward, CPDScore will be the primary metric for leaderboard rankings, while BERTScore will serve as an additional metric for reference due to its widespread use in automatic evaluation benchmarks.
CPDScore is an LLM-based metric that uses a prompt similar to G-EVAL. The metric focuses on "Humanness", whose criteria are described in the prompt. For Round 2, it employs GPT-3.5-turbo-0125, while for the final leaderboard we might use a stronger model.
USEScore calculates similarity using the Universal Sentence Encoder.
BERTScore is a commonly used metric in automatic evaluation benchmarks.
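Embedding-based metrics are often built on the cosine similarity between the sentence embedding of a response and that of a reference. The sketch below shows only that similarity step on plain Python sequences. It assumes the embeddings have already been produced by an encoder, and it is not a statement of how USEScore is computed for the leaderboard, which this post does not specify.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)
```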
We have added the scores to all submissions of Round 2 - Task 1.
Changelog for Round 1
About 1 year ago
Hi Everyone,
We've launched a new round of the challenge. As you know, this comes with an updated dataset and leaderboard. Along with this, we have introduced some major updates, listed below.
- Prompt Engineering Track - We have introduced the option to use the OpenAI API for prompt engineering based submissions. This will significantly lower the barrier to entry for making submissions. Check out the latest starter kit documentation to find out how to make submissions with access to the OpenAI API.
- Changes in function format - The generate_responses function in the agent class has an updated interface to support the prompt engineering track and the GPU track simultaneously. Please make the changes to the agent class according to the new format. Check this starter kit commit for the updates to the documentation. It's recommended that you pull the latest updates made on the starter kit to your submission repos.
- Removal of persona A information - The organizers have asked for the removal of persona A information from the input data. This brings the benchmark closer to a real-world setting where the persona of the conversation partner may not be known. If your agent used persona A information, please update it accordingly.
Generative Interior Design Challenge 2024
How to know my submission was successful and what is the right way to make a successful submission?
11 months ago
Hi @kcy4,
Unfortunately your submission failed, you can see the logs here
Apologies for the confusion in the documentation; we will fix it in the starter kit. You should change the model name in models/user_config.py, and the function name should be generate_design. Please ignore any other conflicting names in the starter kit.
Commonsense Persona-Grounded Dialogue Chall-459c12
Request for More Comprehensive Error Reports for Recent Submission Failures Task1
11 months ago
Hi @nazlicanto, unfortunately most of these seem to be caused by an intermittent issue in the OpenAI API when handling a large number of API calls. I've made changes to try and fix this, have resubmitted your submission 248051, and will keep an eye on it so that it passes.
Prize of Task1
About 1 year ago
Hello, the tracks are meant to share the prizes. In fact, the leaderboard and dataset for both tracks are also shared; the only difference is the trade-off between using a single onboard GPU with a smaller finetuned model and using a larger model via API with only retrieval and prompt engineering.
So to answer your question, no, there will be only a single set of winners who will receive the prize.
Task 2: Commonsense Persona Knowledge Linking
MeltingPot Challenge 2023-ca9cce
Evaluation Score Update
About 1 year ago
TL;DR: We are re-calculating scores using the correct normalization. We don't expect any significant changes in the ranking, only in the numerical values.
Dear Participants - We are making an amendment to the evaluation score shown on the leaderboard. Unfortunately we noticed that the score computation had a normalization discrepancy with that of Melting Pot 2.0. We don't expect the ranking of any of the submissions to change, only the numerical score. The Melting Pot 2.0 tech report defines the normalized return for a scenario to be zero for the baseline with the lowest focal per-capita return, and one for the baseline with the highest. The normalization currently reported used the exploiter return from Melting Pot 2.0 as the upper bound and random performance as the lower bound. For most scenarios these are the same, but there are some for which this is not the case.
Going forward we are updating the leaderboard score to use the normalization from Melting Pot 2.0. We will also use this updated normalization for the generalization score to be displayed from Nov 1.
Please note: This change acts only as a scale factor on the overall score. Your performance on some scenarios may decrease slightly, but by the same scale for all teams, so we do not expect any significant change in the current leaderboard ranking. We will use the new score going forward for the development and generalization phases and for the final evaluation.
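The normalization described above amounts to a min-max rescaling per scenario. The following is a minimal sketch, assuming the baseline focal per-capita returns for a scenario are available as a list; the function name and the sample numbers are illustrative and are not taken from the Melting Pot codebase.

```python
from typing import Sequence


def normalize_return(focal_return: float, baseline_returns: Sequence[float]) -> float:
    """Rescale a return so the worst baseline maps to 0 and the best to 1."""
    low = min(baseline_returns)
    high = max(baseline_returns)
    if high == low:
        raise ValueError("baseline returns must span a non-zero range")
    return (focal_return - low) / (high - low)


# Illustrative numbers only: baselines scored 0, 4 and 10 on a scenario.
print(normalize_return(5.0, [0.0, 4.0, 10.0]))
```

Under this definition a submission can score above one or below zero if it does better than the best baseline or worse than the worst one.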
MosquitoAlert Challenge 2023
Select submissions for final leaderboard
About 1 year ago
Hi Everyone,
All teams need to select up to 3 submissions for their final leaderboard entries using the form on this page.
Git Clone Runs infinitely
About 1 year ago
Hi Everyone,
We're looking into what's wrong with the submission script. In the meantime, please use the advice from @hca97. The same is also described in the starter kit. Please note that you need to push to the correct remote repo. I think for most participants this would be set to aicrowd instead of origin; please check the name of your remote repo using git remote -v before trying to push the tag.
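As a quick check before pushing, the output of git remote -v can be inspected to see which remote names exist. The sketch below parses that output in Python. The function name and the sample URL are hypothetical, and the function only reads text passed to it rather than running git itself.

```python
from typing import Dict


def parse_remotes(remote_output: str) -> Dict[str, str]:
    """Map remote name -> URL from the text printed by `git remote -v`."""
    remotes: Dict[str, str] = {}
    for line in remote_output.splitlines():
        parts = line.split()
        # Each line looks like "<name> <url> (fetch)" or "<name> <url> (push)".
        if len(parts) >= 2:
            remotes.setdefault(parts[0], parts[1])
    return remotes


# Hypothetical output copied from a terminal.
sample = (
    "aicrowd\tgit@example.com:user/repo.git (fetch)\n"
    "aicrowd\tgit@example.com:user/repo.git (push)\n"
)
print(parse_remotes(sample))
```

If the result lists aicrowd but not origin, the tag should be pushed to aicrowd.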
Submission failed during Inference
About 1 year ago
@saidinesh_pola @gavinwangshuo, both of these failed due to timeout.
Submission failed during Inference
About 1 year ago
@hca97 No, it is not the first prediction; I see many print statements before this with varying time values.
Submission failed during Inference
About 1 year ago
@fkemeth Can you give me the submission ids where you get the failure? I'll check if it's due to timeout or something else.
NeurIPS 2023 Citylearn Challenge
Support for central_agent=False
About 1 year ago
Hi Everyone,
We now support setting central_agent=False in the Citylearn Environment. Since the creation of the environment is not directly controlled by participants, you need to provide a separate key in the aicrowd.json file in your starter kits to let us know to use non-central agents.
Things to keep in mind:
- The actions you provide will be passed to the environment as is, so make sure to use the correct action format when using non central agent.
- Make sure the aicrowd.json is formatted correctly as a valid json file.
- Make sure to use "False" in quotes when setting central agent to False in the json file.
All the best for the challenge.
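The formatting points above can be checked locally before submitting. The sketch below is a minimal illustration: it verifies that a JSON document parses and that a given key holds the string "False" rather than a JSON boolean. The key name is passed as a parameter because the actual key is not stated in this post; example_key in the usage is a placeholder, not the real key.

```python
import json


def has_quoted_false(json_text: str, key: str) -> bool:
    """True if json_text is a valid JSON object whose `key` is the string "False"."""
    try:
        config = json.loads(json_text)
    except json.JSONDecodeError:
        # Not valid JSON at all.
        return False
    # A JSON boolean false loads as Python False, which is not equal to "False".
    return isinstance(config, dict) and config.get(key) == "False"


# "example_key" is a placeholder for whichever key the starter kit documents.
print(has_quoted_false('{"example_key": "False"}', "example_key"))
print(has_quoted_false('{"example_key": false}', "example_key"))
```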
Control Track: CityLearn Challenge
How to load a trained model during AIcrowd evaluation?
About 1 year ago
@skywuuuu You can use git-lfs using the instructions provided in the starter kit.
Notebooks
- MosquitoAlert - YoloV5 Baseline Submission · dipam · Over 1 year ago
- [Getting Started] ETH PSC Summer School Hackathon This is a Baseline Code to get you started with the challenge. · dipam · Over 2 years ago
- Baseline - BERT Classifier - BM25 Ranker Official baseline that uses BERT based classifier and BM25 ranker · dipam · Over 2 years ago
- Unsupervised model - SimCLR - Ant-Beetles Video Data Unsupervised model training using contrastive learning with modified SimCLR · dipam · Over 2 years ago
- Unsupervised model - SimCLR - Mouse Video Data Unsupervised model training using contrastive learning with modified SimCLR · dipam · Over 2 years ago
- Getting Started - Mouse-Triplets Video Data Initial data exploration and a basic embedding using a vision model · dipam · Over 2 years ago
- Getting Started - Ant-Beetles Video Data Initial data exploration and a basic embedding using a vision model · dipam · Over 2 years ago
- BSuite Challenge Starter Kit IITM RL Final Project Bsuite starter kit with random baseline · dipam · Over 3 years ago
- Solution for submission 128367 A detailed solution for submission 128367 submitted for challenge IIT-M RL-ASSIGNMENT-2-GRIDWORLD · dipam · Over 3 years ago
- Solution for submission 130090 A detailed solution for submission 130090 submitted for challenge IIT-M RL-ASSIGNMENT-2-GRIDWORLD · dipam · Over 3 years ago
- Solution for submission 128401 A detailed solution for submission 128401 submitted for challenge IIT-M RL-ASSIGNMENT-2-GRIDWORLD · dipam · Over 3 years ago
- Solution for submission 128400 A detailed solution for submission 128400 submitted for challenge IIT-M RL-ASSIGNMENT-2-TAXI · dipam · Over 3 years ago
- Taxi Notebook IITM RL Assignment 2 Notebook to be filled for IITM RL Assignment 2 Taxi · dipam · Over 3 years ago
- Gridworld Notebook IITM RL Assignment 2 Notebook to be filled for IITM RL Assignment 2 Gridworld · dipam · Over 3 years ago