Challenges Entered
Multi-Agent Dynamics & Mixed-Motive Cooperation
Small Object Detection and Classification
Latest submissions
graded | 241075
graded | 241074
graded | 241073
A benchmark for image-based food recognition
What data should you label to get the most value for your money?
Latest submissions
graded | 179179
failed | 179174
graded | 179153
Image-based plant identification at global scale
Participant | Rating |
---|---|
cadabullos | 0 |
Sudhakar37 | 0 |
AIcrowd
Scoring Announcement: Public vs. Private
Over 1 year ago
Thanks for the clarification! I think this is quite fair and reasonable.
MosquitoAlert Challenge 2023
External datasets used by participants
Over 1 year ago
@MPWARE
iNaturalist is a separate app from Mosquito Alert, so any image taken directly with it will differ from images taken with the Mosquito Alert app. That said, it's also possible to upload files you have stored locally, probably in both apps. It would be difficult to rule out images that users have uploaded to both apps this way, especially if you don't have the Mosquito Alert image dataset. I guess if this happens rarely it will not be a big deal.
External datasets used by participants
Over 1 year ago
@MPWARE Really? That's impressive if you got such a high score with only the provided data! You've got to tell us how you achieved that after the competition!
External datasets used by participants
Over 1 year ago
As required by the competition rules, I am sharing here the external data I used for my competition entries. Other participants may want to share their data in this thread as well.
I have used the following external datasets:
iNaturalist 2021 dataset:
A custom subset of iNaturalist images, including many mosquito species as well as other species, downloaded from inaturalist-open-data:
A csv file containing path, url and species name:
For downloading, use e.g. a download manager like aria2. Careful, a lot of space is required (440 GB), which is why I'm sharing the links rather than reuploading the data.
The license is image-specific but is generally either public domain or some form of Creative Commons. The bulk of the images have CC BY-NC and CC BY-SA licenses. I'm not a lawyer, but I assume using them for non-commercial machine learning models is fair use.
Justification for selection:
To address the small number of samples in some minority classes and to learn more discriminative features for insect classification.
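For anyone who wants to fetch the images from a CSV of path, url and species name like the one mentioned above, here is a minimal sketch that turns such a CSV into an aria2 input file. The column names `url` and `path` and the example rows are assumptions for illustration; the actual file may use different headers.

```python
import csv
import io


def rows_to_aria2_input(csv_text, url_col="url", path_col="path"):
    """Build aria2 input-file text: one URL line, then an indented out= option."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.append(row[url_col])
        # Option lines in an aria2 input file must start with whitespace.
        lines.append(f"  out={row[path_col]}")
    return "\n".join(lines) + "\n"


# Hypothetical rows; the real CSV is much larger and its headers are assumed.
example_csv = (
    "path,url,species\n"
    "aedes/1.jpg,https://example.org/photos/1/medium.jpg,Aedes albopictus\n"
    "culex/2.jpg,https://example.org/photos/2/medium.jpg,Culex pipiens\n"
)
aria2_input = rows_to_aria2_input(example_csv)
print(aria2_input)
```

Saving the result to a file such as `downloads.txt` and running `aria2c -i downloads.txt` should then download each URL to its listed path.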
🚨 Important Updates for Round 2
Over 1 year ago
Can you explain how the calculation of the private leaderboard is done, please?
So here's how I understand it:
All submissions are run on the private test set and ranked by score.
This means one candidate may have submission x rank high on public leaderboard and submission y rank high on private leaderboard.
Another way it could be implemented would be that whatever solution is ranked high on the public leaderboard is then evaluated on the private test set and other submissions are ignored.
Which one is it?
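To make the difference between the two readings concrete, here is a small sketch. The scores and the dict layout are made up for illustration; this is not how the organizers' evaluator is known to work.

```python
def rank_all_on_private(submissions):
    """Reading A: score every submission on the private set and keep the best."""
    return max(submissions, key=lambda s: s["private"])


def carry_over_best_public(submissions):
    """Reading B: take the best public submission and report its private score."""
    return max(submissions, key=lambda s: s["public"])


# Made-up scores for one participant with two submissions.
submissions = [
    {"id": "x", "public": 0.80, "private": 0.70},
    {"id": "y", "public": 0.75, "private": 0.78},
]

final_a = rank_all_on_private(submissions)
final_b = carry_over_best_public(submissions)
print("reading A:", final_a["id"], final_a["private"])
print("reading B:", final_b["id"], final_b["private"])
```

With these numbers, reading A ranks the participant by submission y (private 0.78), while reading B ranks them by submission x (private 0.70).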
Submissions are quite unstable
Over 1 year ago
I think it's pretty tricky to get this right. Besides the nondeterminism and caching issues that naturally occur, on cloud instances you additionally have to deal with things like noisy neighbors or "steal time".
See Understanding CPU Steal Time - when should you be worried? | Scout APM Blog
While you say the container gets the full node, it's not quite clear whether that means it gets the full bare-metal server. You are probably using EC2 instances with 2 cores, which are VMs on a bigger machine, so you have to deal with the problems mentioned.
Increasing the time limit to 2 sec doesn't solve the problem, as people may just deploy bigger models and then run over the limit again. IMO only averaging can prevent the issue.
Submissions are quite unstable
Over 1 year ago
Not sure how the performance is measured, but if the rule is "NO image is allowed to take longer than 1 sec", it could be relaxed to "ON AVERAGE no image is allowed to take longer than 1 sec".
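The difference between the two rules can be sketched as follows. The function names and the timings are hypothetical, and the real evaluator may measure time differently.

```python
from statistics import fmean


def passes_strict_limit(times_sec, limit_sec=1.0):
    """Strict rule: every single image must finish within the limit."""
    return all(t <= limit_sec for t in times_sec)


def passes_average_limit(times_sec, limit_sec=1.0):
    """Relaxed rule: only the mean per-image time must stay within the limit.

    Expects a non-empty list of timings.
    """
    return fmean(times_sec) <= limit_sec


# Made-up per-image timings with one spike, e.g. caused by a noisy neighbor.
timings = [0.40, 0.50, 1.30, 0.45]
print(passes_strict_limit(timings))
print(passes_average_limit(timings))
```

Here the single 1.30 sec spike fails the strict rule, while the mean of about 0.66 sec passes the averaged one.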
About submissions
Over 1 year ago
@harshitsheoran For the stats on the current submission, you should be able to click the "View" button next to the submission trend on the leaderboard. About past submissions you are right, those are missing. In another competition it was possible to see them.
📢 Announcement: Important Updates to Challenge Rules!
Over 1 year ago
The updated rules just say no public MosquitoAlert data may be used. This implies other data may still be used.
About submissions
Over 1 year ago
Yes, there's a bug here. I can't see the tab, and the page I land on after submitting something is empty.
Data Purchasing Challenge 2022
[Announcement] Leaderboard Winners
Almost 3 years ago
Congrats to the winners! Quite a shakeup in the final leaderboard. I'm curious about your solutions; it would be cool if you'd explain them.
[Update] Round 2 of Data Purchasing Challenge is now live!
Almost 3 years ago
I agree. There's now an incentive to buy not the most useful images, but images that can be learned quickly and improve a model in the first few epochs. This would probably rule out "difficult" images. It's quite likely that this is of little practical relevance. While that's OK for the competition's sake, it would still be good if the results here had some practical relevance.
While I would appreciate the training pipeline being made more realistic, I hope this will not be a change implemented a week before the deadline that forces us to make big changes.
Which submission is used for private LB scoring?
Almost 3 years ago
Good question!
If it were 1., there would be an incentive to run many variations covering many potential distributions in the hope that one fits best. So this seems bad.
2. seems plausible. But it has the danger that this submission is overfitted to the public leaderboard. It would incentivize not trying out many submissions.
3. seems best, but there is no feature currently where you can specify this.
Why there is no GaussianBlur in test transform?
Almost 3 years ago
Does GaussianBlur even make sense with the small particles? Someone should look at what an image looks like with it applied. Let's assume it totally washes out the small particles, so the image is still recognizable but just looks different; this could explain the worse scores in eval. Or maybe it only affects the speed of convergence.
📹 Town Hall Recording & Resources from top participants
Almost 3 years ago
I tried this method in round 1 (locally) and it worked pretty well:
It's sold as an active learning method, but it really selects the labels in one go. However, it is essential that it uses a model that was trained in an unsupervised fashion, like Facebook's DINO. I tried using the vision transformer that came with torchvision and an EfficientNet that was fine-tuned on the given data. Neither worked. Since DINO is not among the supported pretrained weights, it's not an option in this competition.
I also think that while it may work, it's likely not the best-performing method.
📹 Town Hall Recording & Resources from top participants
Almost 3 years ago
Thanks for putting this online! I totally didn't assume the labels were noisy. When looking at some images I did wonder where, for example, some dents were supposed to be, but because the data was generated synthetically I just assumed the labels would be 100% correct. Definitely going to take this into account now.
[Update] Round 2 of Data Purchasing Challenge is now live!
Almost 3 years ago
In the first round I hit a wall with efnet b1, but didn't with efnet b4. I.e. using active learning I got an improvement with b4, but not with b1. This is not a totally conclusive argument, but it is some evidence. However, with frozen layers and only 10 epochs at a fixed learning rate, it's a different situation.
A big issue I see is that the variance of the final scores seems too high and too dependent on random seeds.
For example, with a modified starter kit (batch size=64, aggregated_dataset used), a purchase budget of 500 that always buys the first 500 images, and different seeds, I measured these F1 scores:
[0.23507449716686799, 0.17841491812405716, 0.19040294167615202, 0.17191250777735645, 0.16459303242037562]
mean: 0.188
std: 0.025
In the first round the improvements I observed with active learning were between 0.7% and 1.5%. If results fluctuate by up to 7 percentage points (0.165 to 0.235 above) based on the random seed alone, this is pretty bad. I think the winner should not be decided by luck or by their skill at fighting random number generators.
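For reference, the summary numbers above can be recomputed from the five listed scores with the standard library. This is only a check of the arithmetic; the reported std of 0.025 matches the population standard deviation (the NumPy default), while the sample standard deviation for five runs comes out slightly larger.

```python
import statistics

# F1 scores from the five seeds listed above.
f1_scores = [
    0.23507449716686799,
    0.17841491812405716,
    0.19040294167615202,
    0.17191250777735645,
    0.16459303242037562,
]

mean_f1 = statistics.fmean(f1_scores)
population_std = statistics.pstdev(f1_scores)  # divides by n
sample_std = statistics.stdev(f1_scores)       # divides by n - 1
spread = max(f1_scores) - min(f1_scores)       # best seed minus worst seed

print(f"mean: {mean_f1:.3f}")
print(f"population std: {population_std:.3f}")
print(f"sample std: {sample_std:.3f}")
print(f"max - min: {spread:.3f}")
```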
You do run multiple runs, but even then it's still not great, I guess. It would be better to bring the variance of individual runs down as much as possible.
I guess some experiments should be run to see what improves this. Training for longer, averaging more runs, using weight averaging, not freezing layers, using efnet b1 or b0, different learning rate schedules or dropout would be some of the parameters that are worth experimenting with.
Here's a paper I just googled (haven't read it yet) about this issue:
ACCOUNTING FOR VARIANCE IN MACHINE LEARNING BENCHMARKS
And another one:
Torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision
[Update] Round 2 of Data Purchasing Challenge is now live!
Almost 3 years ago
Bug: in local_evaluation.py, in the post-purchase training phase:
trainer.train(
training_dataset, num_epochs=10, validation_percentage=0.1, batch_size=5
)
It should be aggregated_dataset instead, otherwise none of the purchased labels have an effect! This bug may also be present in your server side evaluation scripts.
Another thing:
run.purchase_phase returns a dict. Should it be a dict? And is it allowed to fill in labels for indices you didn't purchase, for example with pseudo-labeling?
In instantiate_purchased_dataset the type hint says it's supposed to be a set, which is inconsistent and also wouldn't work. In theory it would even be possible to return some other type from purchase_phase that has the dict interface, i.e. supports .keys(), but allows repeated keys. This would be a hack to grow the dataset to as many images as you want, which is surely an unwanted exploit. I suggest you convert whatever purchase_phase returns to a dict and, depending on whether pseudo-labeling is allowed, validate it further.
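A minimal sketch of the suggested conversion and validation is below. The function name, its parameters, and the idea that the evaluator keeps a set of actually purchased indices are assumptions for illustration; they are not taken from the starter kit.

```python
def validate_purchases(returned, purchased_indices, budget, allow_pseudo_labels=False):
    """Convert the return value of purchase_phase to a plain dict and check it.

    All names here are hypothetical. purchased_indices is assumed to be the
    set of indices the evaluator itself recorded as bought.
    """
    # dict() collapses repeated keys, which closes the duplicate-key exploit.
    purchases = dict(returned)
    if len(purchased_indices) > budget:
        raise ValueError("more labels purchased than the budget allows")
    extra = set(purchases) - set(purchased_indices)
    if extra and not allow_pseudo_labels:
        raise ValueError(f"labels given for unpurchased indices: {sorted(extra)}")
    return purchases


class RepeatingKeys:
    """Dict-like object whose keys() repeats an index, as in the exploit described."""

    def keys(self):
        return [1, 1, 2]

    def __getitem__(self, key):
        return [0, 1, 0]


cleaned = validate_purchases(RepeatingKeys(), purchased_indices={1, 2}, budget=500)
print(len(cleaned))
```

After conversion the repeated index 1 appears only once, so the dataset cannot grow beyond the distinct indices that were bought.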
It would be good if you tested whether your training pipeline can actually achieve good scores under ideal conditions (say, when buying all labels).
[Update] Round 2 of Data Purchasing Challenge is now live!
Almost 3 years ago
I also noticed that the feature layers are frozen during training of the efnet4 model. Is that intentional? Seems like this will guarantee low scores.
Scoring Announcement: Public vs. Private
Over 1 year ago
Two questions:
The designation of the top three submissions is per team and not per participant, right? Otherwise a team with multiple members would have a big advantage, as scores fluctuate widely and a best of 9 would be much more likely to win than a best of 3.
The calculation of the private scores was already done at the time of submission, right? So we will not have to worry that a solution may fail on the private test set because it goes over the 2 sec limit per image.
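To put a rough number on the best-of-9 versus best-of-3 concern in the first question: under the simplifying assumption that all 12 submissions are equally good and their private scores are independent draws from the same continuous distribution, the top score is equally likely to be any of the 12, so the side with 9 picks wins with probability 9/12 = 75%. The sketch below checks this with a small simulation; the normal score distribution is an arbitrary choice.

```python
import random
from fractions import Fraction


def exact_win_probability(n_team, n_solo):
    """Chance that the overall best of n_team + n_solo i.i.d. scores is a team score."""
    return Fraction(n_team, n_team + n_solo)


def simulated_win_probability(n_team, n_solo, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability using normal noise as scores."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        team_best = max(rng.gauss(0.0, 1.0) for _ in range(n_team))
        solo_best = max(rng.gauss(0.0, 1.0) for _ in range(n_solo))
        wins += team_best > solo_best
    return wins / trials


exact = exact_win_probability(9, 3)
estimate = simulated_win_probability(9, 3)
print(exact, round(estimate, 3))
```

Real submissions differ in quality, so this only illustrates how much pure score fluctuation would favor the side with more picks.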