0 Followers · 0 Following
bjoern.holzhauer
Björn

Organization: Novartis Biostatistics Respiratory
Location: CH

Badges: 2 · 1 · 1

Activity

[Contribution activity heatmap (Nov–Nov) not shown]

Ratings Progression

[Participant Rating chart not loaded]

Challenge Categories

[Participant Rating chart not loaded]

Novartis DSAI Challenge

Urgent: Request ASAP if you need access to your VM this week - VMs start being deleted Tuesday morning

Almost 5 years ago

Can you clarify what the difference is? If we need the VM to run some model interpretability packages in R or Python, then it would really be helpful to still have access this week.

Randomly failing image builds - what is going on?

Almost 5 years ago

I keep getting “Unable to build image from the repository.” in a seemingly completely random way, i.e. it fails a few times, I keep resubmitting, and eventually it works despite zero changes to the code. This is just wasting a lot of time.

Is this just the evaluation server being unreliable/unstable or not being able to deal with the number of people submitting at once? Can you stabilize it? Or is there some change to the original instructions that we should be aware of? Anything you can do, AIcrowd team (@shivam, @kelleni2)?

Examples:
https://gitlab.aicrowd.com/bjoern.holzhauer/dsai-challenge-solution/issues/133
https://gitlab.aicrowd.com/bjoern.holzhauer/dsai-challenge-solution/issues/136

Unable to push file that "exceeds maximum limit"

Almost 5 years ago

Yes, I did, and it took ages to fix. The problem was that the commits were still there, as pointed out by @lukas.widmer (have a look at git log). Forcibly resetting the repo to a previous state (after making sure to save any changes you want to retain in a different location) solved it. @josephmkahnnvs pinged me about the same problem last night - did git checkout -B master origin/master actually solve your issue, @josephmkahnnvs?
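
For reference, the rough sequence I mean (a sketch, assuming the remote is called origin and anything you still want has been copied somewhere safe first):

    git log --oneline                      # the oversized commits are still in the history here
    git fetch origin                       # refresh the local view of the remote
    git checkout -B master origin/master   # force local master back to the last good remote state
    # re-add your changes without the oversized file, commit, and push again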

Submissions get killed without any error message

Almost 5 years ago

Hi @shivam, the one linked above is the last one. I’ve profiled my code and it definitely exceeds 5.5 GB by a wide margin in several places (various string processing & some full_joins). Is there a possibility to get more RAM? The same as on the training server would be the obvious choice.
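
For what it’s worth, this is roughly how one can check the peak memory of an individual step in base R (a sketch; do_string_processing and dat are placeholders for an actual pipeline step and its input):

    gc(reset = TRUE)                   # reset the "max used" counters
    res <- do_string_processing(dat)   # placeholder for one pipeline step
    gc()                               # the "max used" column now shows the peak for that step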

Submissions get killed without any error message

Almost 5 years ago

@shivam, @kelleni2 any updates? I have not heard anything back despite reaching out via multiple channels, and have not received any information on the GitLab issue either. Unless this gets resolved, the feature engineering part of our pipeline will not run on the evaluation server and you will get a solution with that part done on the training server instead.

Unable to push file that "exceeds maximum limit"

Almost 5 years ago

I think the answer is to use Git Large File Storage (git-lfs), but I had the same problem and could not get it to work with the instructions provided by the AIcrowd team.
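
For completeness, the standard git-lfs steps look roughly like this (the tracking pattern and file name are just placeholders); this is what I tried, without success against the challenge GitLab:

    git lfs install              # one-time setup on the machine
    git lfs track "*.rds"        # example pattern for large model objects
    git add .gitattributes       # the tracking rule itself needs to be committed
    git add big_model.rds        # placeholder large file
    git commit -m "Track large file via git-lfs"
    git push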

Aridhia hangs when attempting to start desktop

Almost 5 years ago

Hi Joseph, no, not currently for me (but have had that in the past).

Submissions get killed without any error message

Almost 5 years ago

On my latest attempts to get things running on the evaluation server, I ran into a number of issues, some of which I managed to fix, but the current error baffles me. Things now get killed without any interpretable error message (at least none that is visible to me in the agent log). Everything works fine on the training server, but the agent log on the evaluation server says:
2019-12-31T23:58:12.255439177Z /home/aicrowd/run.sh: line 9: 11 Killed Rscript predict.R

Is this a memory limit issue like the one described in another thread, @shivam, or something else? Any help getting this working would be appreciated.

PS: It would be good to make an official announcement about the changes to the AICROWD_TRAIN_DATA_PATH environment variable on the evaluation server. Additionally, warning people about the changes to the training data file on the evaluation server would have been good (it seems to no longer match the training server - at least read_csv now gives different results than before the latest changes, and different results than in the training environment). Those changes took me completely by surprise - I had expected the only change vs. before to be a scrambled/randomized row_id column. It’s quite tedious to trouble-shoot those things via debug submissions (it took me 8 submissions, roughly 2 hours, due to the slowness of the submission process).
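
In case it helps others, a minimal sketch of reading the training data location from the environment variable rather than hard-coding it (the fallback path here is hypothetical, for local runs only):

    # read the training data location from the environment, not a hard-coded path
    train_path <- Sys.getenv("AICROWD_TRAIN_DATA_PATH", unset = "data/train.csv")  # fallback is hypothetical
    train <- readr::read_csv(train_path)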

How to use conda-forge or CRAN for packages in evaluation?

Almost 5 years ago

Yes, adding r-glmnet=2.0_16 (and conda-forge in the channels) works for me.
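
For anyone else hitting this, the relevant part of the conda environment file looks roughly like this (a sketch assuming an environment.yml-style file; other dependencies omitted, r-base shown only as an example of what else might be listed):

    channels:
      - conda-forge
      - defaults
    dependencies:
      - r-base              # example of an existing dependency
      - r-glmnet=2.0_16     # the pinned version that worked for me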

Different results from debug vs non-debug mode

Almost 5 years ago

Yes, as far as I understand, the test set for debug mode is a small subset of the full public leaderboard test set (which is itself a subset of the full test set).

Is the scoring function F1 or logloss?

Almost 5 years ago

Completely agree.

  • Scoring the submission with the best partial score would be absurd, because teams have no control over designating what they think should be scored and can be penalized for an early attempt that happened to do well on the public leaderboard.

  • Taking the best one out of anything ever submitted of course just encourages an absurd shotgun approach.

  • Taking the last one submitted or the best one out of the last 5 or 10 submitted might be reasonable.

It would be really good to know what will be done and to know that it is some sensible approach.

Submitting Solution: Push Error and No code with current Tag

Almost 5 years ago

The hint is in the bit about git pull. Have a look at whether there is something (e.g. edits) in the repository on GitLab that you do not have locally. This might have happened if, e.g., the AIcrowd team edited your repository in GitLab.
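
If there are such remote edits, one way to reconcile before pushing again (a sketch, assuming you want to keep the GitLab changes):

    git pull --rebase origin master   # fetch the GitLab edits and replay your local commits on top
    # resolve any conflicts git reports, then:
    git push origin master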

Test file changed

Almost 5 years ago

Yes, even better: it looks like some exclusion criteria strings spilled over into numeric columns (like decMinAge). Why is anyone even messing with the data?
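
A quick way to spot such spill-over (a sketch; test_path is a placeholder, decMinAge is the column from this thread):

    # read everything as character first, then look for values that do not parse as numbers
    raw <- readr::read_csv(test_path, col_types = readr::cols(.default = readr::col_character()))
    bad <- raw$decMinAge[!is.na(raw$decMinAge) & is.na(suppressWarnings(as.numeric(raw$decMinAge)))]
    head(bad)   # any exclusion-criteria text that ended up in the numeric column shows up here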

R submission with custom packages in yaml fails

Almost 5 years ago

Have you tried with debug = true in the aicrowd.json?
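
I.e. something along these lines in aicrowd.json (keep whatever other keys your file already has; the debug flag is the only point here):

    {
      "debug": true
    }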

Test data matrix available

Almost 5 years ago

We’d like to clarify whether it is now acceptable to just submit predictions instead of running any models on the evaluation server. Is that the case?

Any information on how the final insights should be submitted?

Almost 5 years ago

Is there any information yet on how the final insights (presentation, notebooks, etc.) should be submitted, and whether the timeline for this differs from the one for the predictions? We tried to find this information somewhere, but could not locate it.

Is the scoring function F1 or logloss?

Almost 5 years ago

When we sat down together as a team, we realized that we are not at all sure whether it will be the logLoss of the final submission or the best logLoss of any submission. Obviously, that makes a difference for how one approaches submissions. Could you clarify?
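
(For reference, by logLoss we mean the usual binary log loss; a rough R sketch, with an eps to avoid log(0) - the organizers’ exact definition may of course differ:)

    log_loss <- function(y, p, eps = 1e-15) {
      p <- pmin(pmax(p, eps), 1 - eps)          # clip predictions away from 0 and 1
      -mean(y * log(p) + (1 - y) * log(1 - p))  # average negative log-likelihood
    }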

Test data matrix available

Almost 5 years ago

Why on earth, again, do we have the super-complicated, cobbled-together AIcrowd submission setup if we could all just submit a CSV with row_id and predicted probability? I thought that whole mess was only necessary to avoid giving us the test data? Quite frankly, it’s bizarre to do a U-turn on this with 2 weeks to go.

I guess you did consider the unavoidable leakage that will result from everyone seeing such a small test dataset, but decided that was okay (maybe that’s not too bad, but many will immediately recognize a lot of the approved drugs without even having to try to look something up).
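
If it really is just a CSV now, then presumably something as simple as this is all that is needed (a sketch; the column names and the pred data frame are my assumption based on this thread):

    # `pred` is a placeholder data frame with one row per test record
    submission <- data.frame(row_id = pred$row_id, prediction = pred$prob)
    readr::write_csv(submission, "submission.csv")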

How to use conda-forge or CRAN for packages in evaluation?

Almost 5 years ago

Thank you, great news! This seems to have fully solved this problem, I’m still testing more of what we had in mind, but at least in debug mode it seems to work now.

I guess this illustrates why CRAN may be a better choice for R packages (given its extensive testing of submitted packages).

I am a machine learning enthusiast and biostatistician with 15 years of drug development experience. I do stuff like Bayesian methods, deep learning for audio, clinical event prediction, (network-)meta-analysis, estimands/missing data imputation and dose finding. My profile picture was done using neural style transfer from "The Scream" / "Der Schrei" to a picture of my favorite pet goose. My current main project... [Start of auto-completion using GPT-2] ...is building a predictive validation toolkit for "the data science version of Lego", backlog analysis and t-SNE optimisation. I enjoy the full and challenging challenge of machine learning. [Thank you https://talktotransformer.com/]