Mitchell DeHaven (mitchelldehaven)

0 Followers · 0 Following
Location: US
Badges: 0

Challenges Entered

Improve RAG with Real-World Benchmarks
Latest submissions: graded 267139 · graded 267109 · graded 267108

Music source separation of an audio signal into separate tracks for vocals, bass, drums, and other
Latest submissions: no submissions made in this challenge

Testing RAG Systems with Limited Web Pages
Latest submissions: graded 266980 · graded 266965 · graded 266414

Evaluating RAG Systems With Mock KGs and APIs
Latest submissions: graded 267109 · graded 267108 · graded 266489

Enhance RAG Systems With Multiple Web Sources & Mock API
Latest submissions: graded 267139 · graded 266751 · failed 266738
Team: md_dh (Meta Comprehensive RAG Benchmark: KDD Cup 2024)

Meta Comprehensive RAG Benchmark: KDD Cup 2024

Is there an estimate for when the due diligence will be done?

6 months ago

From other discussions, it sounds like results are still being verified. Is there an estimate for how much longer this will take? I understand things need to be done to verify the results, but it has been a week and we are essentially in the dark as to what is going on or when it will be done.

Could 'Evaluation timed out' submission stuck at the final time be re-run?

7 months ago

Yeah, unfortunately my task 3 submission ran for 30.64 seconds on one instance, so it failed. I queued up the submission ~10 hours before the deadline, but it failed after the deadline, so I couldn't fix the issue. It would be nice if we could submit up to the selection deadline given yesterday's backlog, but that's probably unlikely. I guess it's a nice lesson not to wait until the last day to make a submission.

EDIT: To add, the problem with "Evaluation timed out" seems to be related to the speed at which git lfs can check out your repo. I had the same problem yesterday; after cleaning up my repo, my submissions started running as expected.
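
For anyone hitting the same thing, this is roughly the kind of cleanup I mean (a sketch, assuming git and git-lfs are on your PATH and you run it from the repo root; the checkpoint path is hypothetical):

    # Sketch: find what is bloating the submission repo, since large stale
    # LFS objects are what slow down the evaluator's checkout.
    import subprocess

    def run(cmd):
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # List every LFS-tracked file with its size.
    run(["git", "lfs", "ls-files", "--size"])

    # Then remove anything you no longer need (old model weights, scratch
    # data), commit, and push so the evaluator's clone stays small, e.g.:
    # run(["git", "rm", "models/old_checkpoint.bin"])  # hypothetical path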

Will submissions be allowed to finish beyond deadline?

7 months ago

@aicrowd_team @snehananavati @mohanty

Is there any way to increase the available compute? Submission evaluation is running ~6 hours behind the current submissions. Additionally, the strain is causing previously working commits of mine to time out (at least, as far as I can tell, that's the problem with:

Is this submission ID?

7 months ago

Yes, that is the submission ID.

Submission Fail due to Private Test: Evaluation timed out 😒

7 months ago

I recently had this happen. My guess is that with the recent submissions, there is strain somewhere, and my recently working commits no longer work. I looked at the debug log and saw that it took ~40 minutes for the git repo to get fully downloaded. I cleaned up my LFS objects and am trying again; I will update if it works.

Will submissions be allowed to finish beyond deadline?

7 months ago

Yeah, having to wait multiple hours for the submission to even get assigned a node is problematic. Hopefully they increase the resources here shortly so people can get their submissions in before the end.

How exactly is the number of submissions counted ten times a week?

7 months ago

I think they didn't configure it correctly. It looks like, for teams, each member can submit 10 times; at least, that's the only way I can see that team with 70+ submissions managing it.

Will submissions be allowed to finish beyond deadline?

7 months ago

There is quite a queue of submissions currently, meaning that it takes several hours before a submission even begins evaluation. If a submission is started before the end of the competition, will its results be allowed on the leaderboard, or does it need to finish prior to the deadline to be considered?

Has phase-2 started?

8 months ago

So I think they extended phase 1 with phase 1b by ~10 days. Given the radio silence, hopefully that means they are also postponing the phase 2 start by ~10 days, which would have it starting soon. Otherwise, if phase 2 has already started, there is going to be a mess, since many teams have likely already exceeded the 6-submission limit set for phase 2.

Whether the task test phase can link to the Internet

8 months ago

If I remember correctly, there is no internet access during eval.

Inference failed without helpful log

8 months ago

I ran into a similar problem; reducing the batch size resolved it.
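
In case it helps: when the failure is a GPU out-of-memory error that dies without a useful log, a backoff loop along these lines makes it visible and retries with a smaller batch (a sketch assuming PyTorch; run_inference is a hypothetical stand-in for your own batch loop):

    # Sketch: halve the batch size on CUDA OOM instead of failing silently.
    import torch

    def run_with_backoff(run_inference, batch_size=32, min_batch_size=1):
        while batch_size >= min_batch_size:
            try:
                return run_inference(batch_size)
            except torch.cuda.OutOfMemoryError:
                torch.cuda.empty_cache()
                print(f"OOM at batch_size={batch_size}; retrying with {batch_size // 2}")
                batch_size //= 2
        raise RuntimeError("Out of memory even at the minimum batch size")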

Data Quality Collection V2, Task-1

8 months ago

At least on this one, the Oscars are a bit weird, and this may be where some ambiguities creep in. When the Oscars ceremony is held, it gives awards to films released in the previous year. So while King Kong was released in 2005, it won its Oscar in 2006. From the Oscars website:

The 78th Academy Awards | 2006
Kodak Theatre at Hollywood & Highland Center
Sunday, March 5, 2006
Honoring movies released in 2005

However, I agree with your analysis in other areas, and I have run into a variety of instances where manually inspecting the evidence provided for a query indicates either that the provided answer is incorrect or that the correct answer cannot be retrieved from the evidence.
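
In other words, the ceremony year is the release year plus one, which is an easy off-by-one to make when writing or grading ground truth. As a toy illustration (a hypothetical helper, not anything from the challenge code):

    # The Academy Awards ceremony honors films released in the previous
    # calendar year, so award year = release year + 1.
    def ceremony_year(release_year: int) -> int:
        return release_year + 1

    assert ceremony_year(2005) == 2006  # King Kong: released 2005, awarded 2006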

Failed submission (failed to download aicrowd_gym)

9 months ago

Can someone from the AIcrowd team take a look at one of my submissions (my most recent one)? I didn't use the starter kit and made my own project structure. I am able to run docker_run.sh locally and do a small round of validation to ensure that Docker and everything get set up properly.

However, when I submitted my code, I got the following error at the top of the log during the inference section:

An error occurred when installing aicrowd-gym:

  error: subprocess-exited-with-error

  × Building wheel for pyzmq (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [178 lines of output]
...
...

Traceback (most recent call last):
  File "/aicrowd_source/run.py", line 1, in <module>
    from client_launcher import start_test_client
  File "/aicrowd_source/client_launcher.py", line 3, in <module>
    from aicrowd_gym.clients.zmq_oracle_client import ZmqOracleClient
ModuleNotFoundError: No module named 'aicrowd_gym'

The rest of the log makes it look like the Docker image gets built properly, but then it just starts unloading without running inference.

I would try to do more debugging, but failed submissions count against our total, so I would prefer not to burn through all of mine, since they just reset.

Problems with API on billboard queries

9 months ago

For context, I am using the Docker container, but running from source.

Problems with API on billboard queries

9 months ago

Several of the billboard queries in the API allow specifying a date (e.g. { "date": "2024-02-28", "rank": 1 }, which is the example given in the documentation for the endpoint /music/get_billboard_rank_date). However, sending a query for an adjacent date, like 2024-02-27, returns an empty array. I tried a handful of dates, and they all return empty lists except for the example date, 2024-02-28. This affects the other billboard endpoints as well.
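
A minimal way to reproduce it (a sketch: it assumes the mock API is reachable over HTTP at a local address and accepts JSON bodies, which may not match the actual harness; the base URL is hypothetical):

    # Sketch: probe /music/get_billboard_rank_date on dates around the
    # documented example and print what comes back.
    import json
    import requests

    BASE_URL = "http://localhost:8000"  # hypothetical address for the mock API

    for date in ["2024-02-26", "2024-02-27", "2024-02-28", "2024-02-29"]:
        resp = requests.post(
            f"{BASE_URL}/music/get_billboard_rank_date",
            json={"date": date, "rank": 1},
        )
        print(date, resp.status_code, json.dumps(resp.json())[:120])
    # In my testing, only the documented example date (2024-02-28) returned
    # non-empty results; adjacent dates came back as empty lists.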

Sound Demixing Challenge 2023

Is evaluation run on mono or dual channel

Almost 2 years ago

I'm currently setting up a system and had a question I wanted to sort out before making any more decisions. The MUSDB18 dataset has dual-channel audio for the input audio files. It would obviously be simpler to just downmix this to a single channel, but that may not make sense if the evaluation checks the ability to reproduce the demixed dual-channel audio. So, is the evaluation run on the dual-channel audio?
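
For concreteness, the downmix I am describing is just averaging the two channels, along these lines (a sketch assuming soundfile and numpy; "mixture.wav" is a hypothetical path to one of the stereo inputs):

    # Sketch: load a stereo MUSDB18-style track and downmix it to mono.
    import numpy as np
    import soundfile as sf

    audio, sr = sf.read("mixture.wav")   # shape (num_samples, 2) for stereo
    mono = np.mean(audio, axis=1)        # average the two channels
    print(audio.shape, "->", mono.shape, "at", sr, "Hz")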

