6 Followers
0 Following
dipam
Dipam Chakraborty

Organization

ML Engineer at AIcrowd

Location

Kolkata, IN

Badges

7
5
3


Challenges Entered

Improve RAG with Real-World Benchmarks

Latest submissions

See All
graded 251985
failed 251936
failed 251933

Latest submissions

See All
failed 248132
failed 247994

Latest submissions

See All
graded 249059
submitted 249058
submitted 249057

Multi-Agent Dynamics & Mixed-Motive Cooperation

Latest submissions

See All
graded 244563
graded 237014

Latest submissions

See All
failed 238874
failed 238665
failed 238660

Latest submissions

See All
failed 237925
failed 237924

Latest submissions

See All
graded 220243
failed 220238
failed 220005

Latest submissions

See All
graded 227169
graded 227166
graded 227164

Small Object Detection and Classification

Latest submissions

See All
graded 237160
failed 237147
graded 236860

Understand semantic segmentation and monocular depth estimation from downward-facing drone images

Latest submissions

See All
graded 214611
failed 214573
failed 214570

Latest submissions

See All
graded 236127
graded 236125
failed 236121

Latest submissions

See All
graded 209775
failed 209210
graded 208987

A benchmark for image-based food recognition

Latest submissions

No submissions made in this challenge.

Latest submissions

See All
failed 235438
failed 235434
failed 205233

Latest submissions

No submissions made in this challenge.

What data should you label to get the most value for your money?

Latest submissions

No submissions made in this challenge.

Interactive embodied agents for Human-AI collaboration

Latest submissions

See All
graded 199453
graded 199452
graded 198521

Latest submissions

No submissions made in this challenge.

Latest submissions

No submissions made in this challenge.

Behavioral Representation Learning from Animal Poses.

Latest submissions

See All
graded 198630
graded 197504
graded 197503

Airborne Object Tracking Challenge

Latest submissions

No submissions made in this challenge.

ASCII-rendered single-player dungeon crawl game

Latest submissions

See All
graded 158823
failed 158209
failed 158208

Latest submissions

No submissions made in this challenge.

Latest submissions

See All
graded 152892
graded 152891
failed 152884

Machine Learning for detection of early onset of Alzheimers

Latest submissions

No submissions made in this challenge.

Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments

Latest submissions

No submissions made in this challenge.

Self-driving RL on DeepRacer cars - From simulation to real world

Latest submissions

See All
graded 165209
failed 165208
failed 165206

Robustness and teamwork in a massively multiagent environment

Latest submissions

No submissions made in this challenge.

Latest submissions

No submissions made in this challenge.

5 Puzzles 21 Days. Can you solve it all?

Latest submissions

No submissions made in this challenge.

Multi-Agent Reinforcement Learning on Trains

Latest submissions

No submissions made in this challenge.

Latest submissions

See All
graded 143804
graded 125756
graded 125751

5 Problems 15 Days. Can you solve it all?

Latest submissions

No submissions made in this challenge.

Learn to Recognise New Behaviors from limited training examples.

Latest submissions

See All
graded 125756
graded 125589

Reinforcement Learning, IIT-M, assignment 1

Latest submissions

See All
graded 125767
submitted 125747
graded 125006

IIT-M, Reinforcement Learning, DP, Taxi Problem

Latest submissions

See All
graded 125767
graded 125006
graded 124921

Latest submissions

See All
graded 128400
submitted 128365

Latest submissions

See All
failed 131869
graded 130090
graded 128401

Latest submissions

See All
failed 131869
graded 130090
graded 128401

Latest submissions

See All
graded 135842
graded 130545

Latest submissions

No submissions made in this challenge.

Round 1 - Completed

Latest submissions

No submissions made in this challenge.

Identify Words from silent video inputs.

Latest submissions

No submissions made in this challenge.

Round 2 - Active | Claim AWS Credits by beating the baseline

Latest submissions

See All
graded 198630
graded 182252
graded 178951

Round 2 - Active | Claim AWS Credits by beating the baseline

Latest submissions

See All
graded 197504
graded 197503
graded 182254

Use an RL agent to build a structure with natural language inputs

Latest submissions

See All
graded 199453
graded 199452
graded 198521

Language assisted Human - AI Collaboration

Latest submissions

See All
graded 196399
graded 196379
failed 196363

Latest submissions

See All
submitted 245759
submitted 245757
graded 200962

Estimate depth in aerial images from monocular downward-facing drone

Latest submissions

See All
graded 214522
graded 214521
failed 214517

Perform semantic segmentation on aerial images from monocular downward-facing drone

Latest submissions

See All
graded 214611
failed 214573
failed 214570

Music source separation of an audio signal into separate tracks for vocals, bass, drums, and other

Latest submissions

See All
graded 236127
graded 236125
failed 236121

Source separation of a cinematic audio track into dialogue, sound-effects and misc.

Latest submissions

See All
failed 236027
failed 236026
failed 236024

Latest submissions

See All
graded 206453
submitted 206452
submitted 206337

Latest submissions

See All
graded 227169
graded 211920
failed 211916

Latest submissions

See All
graded 227164
graded 211921

Latest submissions

See All
graded 227166
graded 212284

Latest submissions

See All
graded 223574
graded 223573
graded 223572

Latest submissions

No submissions made in this challenge.

Latest submissions

See All
graded 249059
submitted 249058
submitted 249057
Participant Rating
nachiket_dev_me18b017 0
cadabullos 0
alina_porechina 0
ryan811 0
1774452411aqqcom 0
XIAOhe 0

Task 1: Commonsense Dialogue Response Generation

Updates to Task 1 Metrics

About 1 month ago

@saidinesh_pola I can’t share all the details about the GPT metric, but we do handle the cases where it does not produce a valid score. For final submissions, each team will get to select any 2 successful submissions, regardless of whether they are on the GPU track.

Updates to Task 1 Metrics

About 2 months ago

@unnikrishnan.r I don’t see why gpt3.5 would naturally score prompt track submissions higher. Yes, that is what is currently occurring, but there is no inherent reason for it. The metrics were decided based on actual human evaluations done on a blindly selected subset of Round 1 conversations, and gpt3.5 scores were the most correlated with the human evaluations.

Updates to Task 1 Metrics

About 2 months ago

@saidinesh_pola Yes, the GPT3.5 score is produced by GPT-3.5-turbo using a modified prompt similar to G-eval. It scores every utterance with the conversation history as context.

Updates to Task 1 Metrics

About 2 months ago

We are updating the metrics used to compute the leaderboards during the challenge.

Since the final prizes and standings will be decided by the outcomes of the human evaluations, we explored how closely the current metrics correlate with the overall scores from human evaluation.

After a thorough investigation, we noticed that Word Level F1 and 4-gram BLEU did not accurately reflect the performance of the submitted models.

In light of this observation, and to give teams more accurate feedback about their performance, we have decided to incorporate three new metrics: CPDScore, USEScore, and BERTScore. CPDScore and USEScore have demonstrated superior accuracy compared to the previously used metrics. Moving forward, CPDScore will be the primary metric for leaderboard rankings, while BERTScore will serve as an additional metric for reference due to its widespread use in automatic evaluation benchmarks.

  • CPDScore is an LLM-based metric that uses a prompt similar to G-EVAL. The metric focuses on “Humanness”, whose criteria are described in the prompt. For Round 2, it employs GPT-3.5-turbo-0125, while for the final leaderboard we might use a stronger model.
  • USEScore calculates similarity using the Universal Sentence Encoder.
  • BERTScore is a commonly used automatic evaluation metric.

We have added the scores to all submissions of Round 2 - Task 1.
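
As a rough illustration of what "correlated with the human evaluations" means in practice, here is a minimal sketch that ranks candidate metrics by how well they track human scores. The posts do not say which correlation coefficient was used, so Pearson correlation is an assumption here, and the metric names and numbers are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def rank_metrics(human_scores, metric_scores):
    """Return (metric name, correlation) pairs, best-correlated first."""
    correlations = {
        name: pearson(scores, human_scores)
        for name, scores in metric_scores.items()
    }
    return sorted(correlations.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for five conversations (illustration only).
human = [1.0, 2.0, 3.0, 4.0, 5.0]
metrics = {
    "metric_a": [0.1, 0.2, 0.3, 0.4, 0.5],  # moves together with the human scores
    "metric_b": [0.5, 0.1, 0.4, 0.2, 0.3],  # mostly unrelated to the human scores
}
ranking = rank_metrics(human, metrics)
for name, corr in ranking:
    print(f"{name}: {corr:.3f}")
```

A metric that ends up at the top of such a ranking is the more natural choice for a leaderboard that is meant to approximate the final human evaluation.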

Changelog for Round 1

4 months ago

Hi Everyone,

We’ve launched the new round of the challenge. As you know, this comes with an updated dataset and leaderboard. Along with this, we have introduced some major updates, listed below.

  1. Prompt Engineering Track - We have introduced the option to use the OpenAI API for prompt engineering based submissions. This will significantly lower the barrier to entry for making submissions. Check out the latest documentation in the starter kit to find out how to make submissions with access to the OpenAI API.

  2. Changes in function format - The generate_responses function in the agent class has an updated interface to support the prompt engineering track and the GPU track simultaneously. Please update your agent class according to the new format. Check this starter kit commit for the updates to the documentation. We recommend pulling the latest starter kit updates into your submission repos.

  3. Removal of persona A information - The organizers have asked for the removal of persona A information from the input data. This brings the benchmark closer to a real world setting where the persona of the conversation partner may not be known. If your agent used persona A information, please update it accordingly.

Generative Interior Design Challenge 2024

How to know my submission was successful and what is the right way to make a successful submission?

2 months ago

Hi @kcy4 ,

Unfortunately, your submission failed. You can see the logs here.

Apologies for the confusion in the documentation; we will fix it in the starter kit. You should change the model name in models/user_config.py, and the function name should be generate_design. Please ignore any other conflicting names in the starter kit.

Can't open baseline starter kit

2 months ago

Thanks for letting us know, @Camaro. It’s public now.

Commonsense Persona-Grounded Dialogue Chall-1f6f43

Request for More Comprehensive Error Reports for Recent Submission Failures Task1

2 months ago

Hi @nazlicanto, unfortunately most of these seem to be caused by an intermittent issue in the OpenAI API when handling a large number of API calls. I’ve made changes to try and fix this, have resubmitted your submission 248051, and will keep an eye on it so that it passes.

Prize of Task1

4 months ago

Hello, the tracks are meant to share the prizes. In fact, the leaderboard and dataset are also shared between both tracks. The only difference is the trade-off between using a single onboard GPU with a smaller finetuned model and using a larger model via API with only retrieval and prompt engineering.

So to answer your question, no, there will be only a single set of winners who will receive the prize.

Task 2: Commonsense Persona Knowledge Linking

Submissions getting failed

3 months ago

Okay, I’ll create it in a few days.

Submissions getting failed

3 months ago

Hi @saidinesh_pola, I’ve replied to the GitLab issue.

MeltingPot Challenge 2023-ca9cce

Evaluation Score Update

6 months ago

TL;DR: We are re-calculating scores using the correct normalization. We don’t expect any significant changes in the ranking, only on the numerical values.

Dear Participants - We are making an amendment to the evaluation score shown on the leaderboard. Unfortunately, we noticed that our score computation used a different normalization from that of Melting Pot 2.0. We don’t expect the ranking of any of the submissions to change, only the numerical score. The Melting Pot 2.0 tech report defines the normalized return for a scenario to be zero for the baseline with the lowest focal per-capita return, and one for the baseline with the highest. The normalization currently reported used the exploiter return from Melting Pot 2.0 as the upper bound and random performance as the lower bound. For most scenarios these are the same, but there are some for which this is not the case.

Going forward we are updating the leaderboard score to use the normalization from Melting Pot 2.0. We will use this updated normalization for the generalization score to be displayed from Nov 1 too.

Please note: This change will appear only as a scale factor on the overall score. Your numerical score on some scenarios may decrease slightly, but the same scaling applies to all teams, so we do not expect any significant change in the current leaderboard ranking. We will use the new score going forward for both the development and generalization phases, and also for the final evaluation.
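
To make the difference between the two normalizations concrete, here is a minimal sketch for a single scenario. All returns below are made-up numbers, and the "other_baseline" entry is a hypothetical name used only to show a case where the exploiter is not the best baseline.

```python
def normalize(value, low, high):
    """Linearly rescale value so that low maps to 0.0 and high maps to 1.0."""
    if high == low:
        raise ValueError("low and high must differ")
    return (value - low) / (high - low)


# Hypothetical focal per-capita returns of the baselines on one scenario.
baseline_returns = {"random": 2.0, "exploiter": 10.0, "other_baseline": 12.0}
team_return = 7.0  # hypothetical return of one submission

# Previously reported score: random performance as the lower bound,
# exploiter return as the upper bound.
old_score = normalize(
    team_return, baseline_returns["random"], baseline_returns["exploiter"]
)

# Melting Pot 2.0 style: the lowest and highest baseline returns as bounds.
new_score = normalize(
    team_return,
    min(baseline_returns.values()),
    max(baseline_returns.values()),
)

print(old_score, new_score)
```

Within one scenario both versions are increasing linear functions of the raw return, so the order of submissions on that scenario is the same under either normalization; only the numerical values differ.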

MosquitoAlert Challenge 2023

Select submissions for final leaderboard

6 months ago

Hi Everyone,

All teams need to select up to 3 submissions for their final leaderboard entries using the form on this page.

Git Clone Runs infinitely

6 months ago

Hi Everyone,

We’re looking into what’s wrong with the submission script. In the meantime, please use the advice from @hca97, which is also described in the starter kit. Please note that you need to push to the correct remote repo. For most participants, I think this would be named aicrowd instead of origin; please check the name of your remote using git remote -v before trying to push the tag.
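
If you want to check the remote name from a script, a minimal sketch is shown below. It only parses the text that git remote -v prints; the host names and URLs in the sample are placeholders, not the real AIcrowd addresses, and the helper names are made up for this example.

```python
import subprocess


def parse_remotes(remote_output):
    """Map remote name -> push URL from the text printed by `git remote -v`."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        # Each line looks like: <name> <url> (fetch) or <name> <url> (push)
        if len(parts) == 3 and parts[2] == "(push)":
            remotes[parts[0]] = parts[1]
    return remotes


def remotes_matching(remote_output, host_fragment):
    """Names of remotes whose push URL contains host_fragment."""
    return [
        name
        for name, url in parse_remotes(remote_output).items()
        if host_fragment in url
    ]


def read_git_remotes(repo_dir="."):
    """Run `git remote -v` in repo_dir and return its output as text."""
    result = subprocess.run(
        ["git", "remote", "-v"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Placeholder output; in a real repo, use read_git_remotes() instead.
sample = (
    "aicrowd\tgit@gitlab.example.com:user/submission.git (fetch)\n"
    "aicrowd\tgit@gitlab.example.com:user/submission.git (push)\n"
    "origin\thttps://github.example.com/user/starter-kit.git (fetch)\n"
    "origin\thttps://github.example.com/user/starter-kit.git (push)\n"
)
print(remotes_matching(sample, "gitlab.example.com"))
```

The name it reports is the one to use when pushing the submission tag.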

Submission failed during Inference

6 months ago

@saidinesh_pola @gavinwangshuo , both of these failed due to timeout.

Submission failed during Inference

7 months ago

@hca97 No, it is not the first prediction. I see many print statements before this with varying time values.

Submission failed during Inference

7 months ago

@fkemeth Can you give me the submission IDs where you get the failure? I’ll check whether it’s due to a timeout or something else.

Submission failed during Inference

7 months ago

@hca97

The inference time on these submissions does indeed exceed 2 seconds for one instance. I believe these are your print statements?

Totla Time  2018.3565616607666 ms

Unfortunately, we can’t relax the timing criteria. If possible, try to reduce the time taken by your model.
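
One way to find out locally which instances are at risk is to time each prediction separately. The sketch below assumes a 2-second limit per instance, as mentioned above; predict_fn and dummy_predict are stand-ins for your own model code, not part of the starter kit, and the evaluation hardware may be slower or faster than your machine.

```python
import time

TIME_LIMIT_MS = 2000.0  # per-instance limit discussed in this thread


def timed_predict(predict_fn, instance):
    """Run predict_fn on one instance; return (prediction, elapsed milliseconds)."""
    start = time.perf_counter()
    prediction = predict_fn(instance)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return prediction, elapsed_ms


def find_slow_instances(predict_fn, instances, limit_ms=TIME_LIMIT_MS):
    """Return (index, elapsed_ms) for every instance that exceeds limit_ms."""
    slow = []
    for index, instance in enumerate(instances):
        _, elapsed_ms = timed_predict(predict_fn, instance)
        if elapsed_ms > limit_ms:
            slow.append((index, elapsed_ms))
    return slow


def dummy_predict(delay_seconds):
    """Stand-in model: sleeps for the given time, then returns a fixed label."""
    time.sleep(delay_seconds)
    return "label"


# A small limit keeps the demo fast; only the second instance is slow.
slow = find_slow_instances(dummy_predict, [0.0, 0.05, 0.0], limit_ms=20.0)
print([index for index, _ in slow])
```

Checking every instance matters because a single slow prediction is enough to fail the submission, even when the average time is well under the limit.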

NeurIPS 2023 Citylearn Challenge

Support for central_agent=False

6 months ago

Hi Everyone,

We now support setting central_agent=False in the CityLearn environment. Since the creation of the environment is not directly controlled by participants, you need to provide a separate key in the aicrowd.json file in your starter kit to tell us to use non-central agents.

Things to keep in mind:

  • The actions you provide will be passed to the environment as is, so make sure to use the correct action format when using a non-central agent.
  • Make sure the aicrowd.json is formatted correctly as a valid json file.
  • Make sure to use “False” in quotes when setting central agent to False in the json file.
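
A quick local check of the file before submitting could look like the sketch below. The post does not state the name of the key, so "central_agent" is only an assumed placeholder here; check the starter kit for the real key name. The sample file contents are also made up.

```python
import json


def wants_non_central_agent(config_text, key="central_agent"):
    """Check an aicrowd.json string; True if key is set to the string "False".

    The default key name is an assumption made for this example.
    """
    config = json.loads(config_text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(config, dict):
        raise ValueError("aicrowd.json must contain a JSON object")
    if key not in config:
        return False  # key absent, so nothing is requested
    value = config[key]
    if not isinstance(value, str):
        # A bare JSON false (without quotes) ends up here.
        raise TypeError(f'{key} must be a quoted string such as "False"')
    return value == "False"


# Made-up file contents for illustration.
good = '{"challenge_id": "example-challenge", "central_agent": "False"}'
unquoted = '{"challenge_id": "example-challenge", "central_agent": false}'

print(wants_non_central_agent(good))
try:
    wants_non_central_agent(unquoted)
except TypeError as error:
    print("invalid value:", error)
```

This catches the two mistakes listed above: a file that is not valid JSON, and a False value written without quotes.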

All the best for the challenge.

Control Track: CityLearn Challenge

How to load a trained model during AIcrowd evaluation?

7 months ago

@skywuuuu You can use git-lfs by following the instructions provided in the starter kit.
