Activity
Challenge Categories
Challenges Entered
Generate Synchronised & Contextually Accurate Videos
Latest submissions
graded | 271590 |
graded | 271581 |
graded | 271570 |
Improve RAG with Real-World Benchmarks
Latest submissions
graded | 251985 |
failed | 251936 |
failed | 251933 |
Evaluate Natural Conversations
Latest submissions
graded | 249059 |
submitted | 249058 |
submitted | 249057 |
Multi-Agent Dynamics & Mixed-Motive Cooperation
Latest submissions
graded | 244563 |
graded | 237014 |
Advanced Building Control & Grid-Resilience
Latest submissions
failed | 238874 |
failed | 238665 |
failed | 238660 |
Specialize and Bargain in Brave New Worlds
Latest submissions
failed | 237925 |
failed | 237924 |
Small Object Detection and Classification
Latest submissions
graded | 237160 |
failed | 237147 |
graded | 236860 |
Understand semantic segmentation and monocular depth estimation from downward-facing drone images
Latest submissions
graded | 214611 |
failed | 214573 |
failed | 214570 |
Audio Source Separation using AI
Latest submissions
graded | 236127 |
graded | 236125 |
failed | 236121 |
Identify user photos in the marketplace
Latest submissions
graded | 209775 |
failed | 209210 |
graded | 208987 |
A benchmark for image-based food recognition
Latest submissions
Using AI For Building's Energy Management
Latest submissions
failed | 235438 |
failed | 235434 |
failed | 205233 |
Learning From Human-Feedback
Latest submissions
What data should you label to get the most value for your money?
Latest submissions
Interactive embodied agents for Human-AI collaboration
Latest submissions
graded | 199453 |
graded | 199452 |
graded | 198521 |
Specialize and Bargain in Brave New Worlds
Latest submissions
Amazon KDD Cup 2022
Latest submissions
Behavioral Representation Learning from Animal Poses.
Latest submissions
graded | 198630 |
graded | 197504 |
graded | 197503 |
Airborne Object Tracking Challenge
Latest submissions
ASCII-rendered single-player dungeon crawl game
Latest submissions
graded | 158823 |
failed | 158209 |
failed | 158208 |
Latest submissions
Training sample-efficient agents in Minecraft
Latest submissions
Machine Learning for detection of early onset of Alzheimers
Latest submissions
Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments
Latest submissions
Self-driving RL on DeepRacer cars - From simulation to real world
Latest submissions
graded | 165209 |
failed | 165208 |
failed | 165206 |
Robustness and teamwork in a massively multiagent environment
Latest submissions
Latest submissions
5 Puzzles 21 Days. Can you solve it all?
Latest submissions
Multi-Agent Reinforcement Learning on Trains
Latest submissions
Latest submissions
graded | 143804 |
graded | 125756 |
graded | 125751 |
5 Problems 15 Days. Can you solve it all?
Latest submissions
Learn to Recognise New Behaviors from limited training examples.
Latest submissions
graded | 125756 |
graded | 125589 |
Reinforcement Learning, IIT-M, assignment 1
Latest submissions
graded | 125767 |
submitted | 125747 |
graded | 125006 |
IIT-M, Reinforcement Learning, DP, Taxi Problem
Latest submissions
graded | 125767 |
graded | 125006 |
graded | 124921 |
Latest submissions
graded | 128400 |
submitted | 128365 |
Latest submissions
failed | 131869 |
graded | 130090 |
graded | 128401 |
Latest submissions
failed | 131869 |
graded | 130090 |
graded | 128401 |
Latest submissions
graded | 135842 |
graded | 130545 |
Round 1 - Completed
Latest submissions
Round 1 - Completed
Latest submissions
Identify Words from silent video inputs.
Latest submissions
Round 2 - Active | Claim AWS Credits by beating the baseline
Latest submissions
graded | 198630 |
graded | 182252 |
graded | 178951 |
Round 2 - Active | Claim AWS Credits by beating the baseline
Latest submissions
graded | 197504 |
graded | 197503 |
graded | 182254 |
Use an RL agent to build a structure with natural language inputs
Latest submissions
graded | 199453 |
graded | 199452 |
graded | 198521 |
Language assisted Human - AI Collaboration
Latest submissions
graded | 196399 |
graded | 196379 |
failed | 196363 |
Latest submissions
submitted | 245759 |
submitted | 245757 |
graded | 200962 |
Estimate depth in aerial images from monocular downward-facing drone
Latest submissions
graded | 214522 |
graded | 214521 |
failed | 214517 |
Perform semantic segmentation on aerial images from monocular downward-facing drone
Latest submissions
graded | 214611 |
failed | 214573 |
failed | 214570 |
Music source separation of an audio signal into separate tracks for vocals, bass, drums, and other
Latest submissions
graded | 236127 |
graded | 236125 |
failed | 236121 |
Source separation of a cinematic audio track into dialogue, sound-effects and misc.
Latest submissions
failed | 236027 |
failed | 236026 |
failed | 236024 |
Latest submissions
graded | 206453 |
submitted | 206452 |
submitted | 206337 |
Predicting Temperature around Nuclear Waste Canisters
Latest submissions
Latest submissions
graded | 223574 |
graded | 223573 |
graded | 223572 |
Commonsense Dialogue Response Generation
Latest submissions
graded | 249059 |
submitted | 249058 |
submitted | 249057 |
Participant | Rating |
---|---|
nachiket_dev_me18b017 | 0 |
cadabullos | 0 |
alina_porechina | 0 |
ryan811 | 0 |
1774452411aqqcom | 0 |
XIAOhe | 0 |
- Random-walk Airborne Object Tracking Challenge
- R2D2 NeurIPS 2022: CityLearn Challenge
- dipam_chakraborty Testing Net
Meta Comprehensive RAG Benchmark: KDD Cup 2-524854
Meta Comprehensive RAG Benchmark: KDD Cup 2-35606c
About the Meta Comprehensive RAG Benchmark: KDD Cup 2-35606c category
Yesterday
(Replace this first paragraph with a brief description of your new category. This guidance will appear in the category selection area, so try to keep it below 200 characters.)
Use the following paragraphs for a longer description, or to establish category guidelines or rules:
- Why should people use this category? What is it for?
- How exactly is this different than the other categories we already have?
- What should topics in this category generally contain?
- Do we need this category? Can we merge with another category, or subcategory?
Meta Comprehensive RAG Benchmark: KDD Cup 2025
meta-kdd-cup-24-staging
Task 1: Commonsense Dialogue Response Generation
Updates to Task 1 Metrics
11 months ago
@saidinesh_pola I can't share all the details about the GPT metric, but we do handle the cases where it does not return a valid score. For final submissions, each team will get to select any 2 successful submissions; it doesn't matter whether they are on the GPU track or not.
Updates to Task 1 Metrics
11 months ago
@unnikrishnan.r I don't see why GPT-3.5 would naturally score prompt track submissions higher. Yes, that is what is currently occurring, but there is no inherent reason for it. The metrics were decided based on actual human evaluations done on a blindly selected subset of Round 1 conversations, and the GPT-3.5 scores were the most correlated with the human evaluations.
Updates to Task 1 Metrics
11 months ago
@saidinesh_pola, yes, the GPT-3.5 score is produced by GPT-3.5-turbo using a modified prompt similar to G-Eval. It scores every utterance with the conversation history as context.
Updates to Task 1 Metrics
11 months ago
We are updating the metrics used to compute the leaderboards during the challenge.
Noting that the final prizes and standings will be decided based on the outcomes of the human evaluations, we explored how closely the current metrics correlate with the overall scores from the human evaluations.
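As a rough illustration of this kind of check, the sketch below computes a Spearman rank correlation between an automatic metric and human ratings in plain Python. The post does not say which correlation statistic was actually used, so the choice of Spearman, the function names, and any scores passed in are assumptions for illustration only.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores, human_scores):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

A metric whose ordering of submissions matches the human ordering gives a value near 1, while an unrelated metric gives a value near 0.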
After a thorough investigation, we noticed that Word Level F1 and 4-gram BLEU did not accurately reflect the performance of the submitted models.
In light of this observation, and to give teams more accurate feedback about their performance, we have decided to incorporate three new metrics: CPDScore, USEScore, and BERTScore. CPDScore and USEScore have demonstrated superior accuracy compared to the previously used metrics. Moving forward, CPDScore will be the primary metric for leaderboard rankings, while BERTScore will serve as an additional metric for reference due to its widespread use in automatic evaluation benchmarks.
CPDScore is an LLM-based metric that uses a prompt similar to G-EVAL. The metric focuses on "Humanness", whose criteria are described in the prompt. For Round 2, it employs GPT-3.5-turbo-0125, while for the final leaderboard we might use a stronger model.
USEScore calculates similarity using the Universal Sentence Encoder.
BERTScore is a commonly used automatic evaluation metric.
We have added the scores to all submissions of Round 2 - Task 1.
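For reference, a word-level F1 of the kind being replaced is usually computed from token overlap between a generated response and a reference response. The sketch below is a minimal version under assumed choices (lowercasing and whitespace tokenisation); it is not the challenge's actual evaluation code, and the function name is hypothetical.

```python
from collections import Counter


def word_f1(prediction, reference):
    """Token-overlap F1 between a prediction and a reference string."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Because the score depends only on shared surface tokens, a fluent paraphrase of the reference can score low, which is one plausible reason such a metric can diverge from human judgement.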
Generative Interior Design Challenge 2024
How to know my submission was successful and what is the right way to make a successful submission?
12 months ago
Hi @kcy4,
Unfortunately your submission failed; you can see the logs here.
Apologies for the confusion in the documentation; we will fix it in the starter kit. You should change the model name in models/user_config.py, and the function name should be generate_design. Please ignore any other conflicting names in the starter kit.
Commonsense Persona-Grounded Dialogue Chall-459c12
Request for More Comprehensive Error Reports for Recent Submission Failures Task1
12 months ago
Hi @nazlicanto, unfortunately most of these seem to be caused by an intermittent issue in the OpenAI API when handling a large number of API calls. I've made changes to try to fix this, have resubmitted your submission 248051, and will keep an eye on it so that it passes.
Task 2: Commonsense Persona Knowledge Linking
Notebooks
- MosquitoAlert - YoloV5 Baseline Submission (dipam · Over 1 year ago)
- [Getting Started] ETH PSC Summer School Hackathon: This is a Baseline Code to get you started with the challenge. (dipam · Over 2 years ago)
- Baseline - BERT Classifier - BM25 Ranker: Official baseline that uses BERT based classifier and BM25 ranker (dipam · Over 2 years ago)
- Unsupervised model - SimCLR - Ant-Beetles Video Data: Unsupervised model training using contrastive learning with modified SimCLR (dipam · Almost 3 years ago)
- Unsupervised model - SimCLR - Mouse Video Data: Unsupervised model training using contrastive learning with modified SimCLR (dipam · Almost 3 years ago)
- Getting Started - Mouse-Triplets Video Data: Initial data exploration and a basic embedding using a vision model (dipam · Almost 3 years ago)
- Getting Started - Ant-Beetles Video Data: Initial data exploration and a basic embedding using a vision model (dipam · Almost 3 years ago)
- BSuite Challenge Starter Kit IITM RL Final Project: Bsuite starter kit with random baseline (dipam · Almost 4 years ago)
- Solution for submission 128367: A detailed solution for submission 128367 submitted for challenge IIT-M RL-ASSIGNMENT-2-GRIDWORLD (dipam · Almost 4 years ago)
- Solution for submission 130090: A detailed solution for submission 130090 submitted for challenge IIT-M RL-ASSIGNMENT-2-GRIDWORLD (dipam · Almost 4 years ago)
- Solution for submission 128401: A detailed solution for submission 128401 submitted for challenge IIT-M RL-ASSIGNMENT-2-GRIDWORLD (dipam · Almost 4 years ago)
- Solution for submission 128400: A detailed solution for submission 128400 submitted for challenge IIT-M RL-ASSIGNMENT-2-TAXI (dipam · Almost 4 years ago)
- Taxi Notebook IITM RL Assignment 2: Notebook to be filled for IITM RL Assignment 2 Taxi (dipam · Almost 4 years ago)
- Gridworld Notebook IITM RL Assignment 2: Notebook to be filled for IITM RL Assignment 2 Gridworld (dipam · Almost 4 years ago)