Challenges Entered
5 Problems 21 Days. Can you solve it all?
Latest submissions
graded | 124481 |
graded | 122093 |
graded | 122080 |
5 Puzzles, 3 Weeks | Can you solve them all?
Latest submissions
graded | 67522 |
graded | 67521 |
failed | 67519 |
Immitation Learning for Autonomous Driving
Latest submissions
5 PROBLEMS 3 WEEKS. CAN YOU SOLVE THEM ALL?
Latest submissions
graded | 80476 |
graded | 80473 |
graded | 80472 |
Latest submissions
Participant | Rating |
---|---|
bhuvanesh_sridharan | 0 |
- BayesianMechanics AIcrowd Blitz - May 2020
- BayesianMechanics AI for Good - AI Blitz #3
- BayesianMechanics AI Blitz #6
AI Blitz #6
sigma_g has not provided any information yet.
About the new datasets for WinPrediction
Almost 4 years ago
Hi, I think it does not matter whether these game positions came from real human players, grandmasters, or even the TCEC. Given a board position and which side's turn it is, there is a clear, unique evaluation that Stockfish 12+ will give, which is the evaluation assuming best play from both sides.
Now, in such positions, when giving the win prediction, we have to assume best play from both sides. We cannot assume human play because it's irregular. The human could be a 1200 Elo player or a 2100 Elo player, and we have no way to account for that. Even a 2100 Elo player can have a bad day and perform 100 rating points below their usual level.
Now that we have established that there is one unique answer, we come back to the position pictured above, and similarly to another position in this post: the dataset contains labels that contradict what Stockfish gives when evaluating the position. And this is not rare. In the first 100 training samples we observed 20 with the opposite win prediction. Even if we assume our OCR is wrong on half of them, that's still a 10% error rate in the training dataset.
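To make the arithmetic behind that 10% explicit, here is a minimal sketch. The function name and its parameters are my own, purely for illustration; the only inputs taken from our check are the 20 disagreements, the 100 samples, and the assumption that half of the disagreements are our own OCR mistakes.

```python
def estimated_label_error_rate(disagreements: int,
                               samples: int,
                               assumed_ocr_error_fraction: float) -> float:
    """Estimate the fraction of training labels that are genuinely wrong.

    disagreements: samples whose label contradicts the engine evaluation.
    samples: number of samples inspected.
    assumed_ocr_error_fraction: share of disagreements blamed on our own
        board-reading (OCR) mistakes rather than on the dataset. This is
        an assumption, not a measured quantity.
    """
    if samples <= 0:
        raise ValueError("samples must be positive")
    genuine = disagreements * (1 - assumed_ocr_error_fraction)
    return genuine / samples


# Figures from the first 100 training samples, with half of the
# disagreements written off as OCR errors.
rate = estimated_label_error_rate(20, 100, 0.5)
print(f"{rate:.0%}")  # prints "10%"
```

If the OCR is more reliable than assumed, the estimate moves towards the raw 20%.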
Another issue is that not all positions are a few moves before checkmate, as the problem statement on the main page says. Several positions are already checkmate, where it makes no sense to say whose turn it is. On the other hand, several positions are far from mate: as you can see in the linked post, the evaluation there is a meagre +3 or so. However, any position near checkmate will certainly get a ±Mx evaluation from Stockfish, which means mate in x moves by either White or Black.
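For anyone who wants to check this on their own samples, here is a rough sketch of how the two kinds of evaluation can be told apart from the engine's raw output. It relies on the UCI convention that an `info` line carries either `score cp <centipawns>` or `score mate <moves>`, both from the point of view of the side to move. The function name `interpret_score` and the example `info` lines are made up for illustration, and treating `mate 0` as "side to move is already mated" is my assumption about how such positions are reported.

```python
import re
from typing import Optional

# Matches the score field of a UCI "info" line, e.g. "score cp 300"
# or "score mate -2". The score is relative to the side to move.
_SCORE_RE = re.compile(r"\bscore (cp|mate) (-?\d+)")


def interpret_score(info_line: str, white_to_move: bool) -> Optional[str]:
    """Turn a UCI info line into a readable verdict, or None if no score."""
    match = _SCORE_RE.search(info_line)
    if match is None:
        return None
    kind, raw = match.group(1), int(match.group(2))
    mover, other = ("white", "black") if white_to_move else ("black", "white")
    if kind == "mate":
        # Positive: the side to move delivers mate. Zero or negative:
        # the side to move is the one being mated (assumption for zero).
        winner = mover if raw > 0 else other
        return f"mate in {abs(raw)} for {winner}"
    # Centipawn score: convert to pawns and flip the sign for Black
    # so the number is always from White's point of view.
    pawns = raw / 100 if white_to_move else -raw / 100
    return f"{pawns:+.2f} pawns from white's view"


# Hypothetical engine output lines, not taken from the dataset.
print(interpret_score("info depth 20 score cp 300 nodes 1000", True))
print(interpret_score("info depth 18 score mate 3 pv e2e4", False))
```

A position that is really "a few moves before checkmate" should land in the `mate` branch; one that comes back as a plain centipawn score, like the +3 example, does not fit the problem statement.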
Let me know if any part is unclear and I will explain again. But I hope that, if the dataset is revised once more, these issues are taken care of, because as it stands it is almost impossible to achieve a better score if we follow standard chess evaluation metrics.