Novartis DSAI Challenge

Challenge Rules

Rules:

The challenge is divided into the Leaderboard methods competition and subjectively judged prizes in areas such as insights, innovation, and data wrangling.

Below are the rules for the Leaderboard portion and guidance for the insights portion.

Leaderboard Evaluation:

All participants will use the provided core data set as the starting point for training and submissions.

The data is split by time:

  • Training data 2015 and earlier
  • Test data 2016 and later
  • Note: the public leaderboard will be scored on only a portion of the test data set, but the entire test set will be used for final scoring at the end of the challenge to prevent gaming of the system
  • Likewise, there will be a limit on the number of submissions per day
  • For the Leaderboard competition, no phase 3 data may be used, including information on whether a phase 3 trial exists
  • Recreating and/or using test data to train on or game the submission system for the Leaderboard competition is prohibited
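The time-based split above can be sketched as follows. This is a minimal illustration, not the organizers' pipeline: it assumes each trial is a dict with a hypothetical `start_date` field, and the real core data set may use a different date field (or a different date, such as completion date) to decide which side of the cutoff a trial falls on.

```python
from datetime import date

# Last day of the training period: 2015 and earlier is training, 2016 and later is test.
TRAIN_CUTOFF = date(2015, 12, 31)

def time_split(trials, date_field="start_date"):
    """Split trials into training (on or before the cutoff) and test (after it).

    `date_field` is a hypothetical field name; adjust it to the actual data.
    """
    train = [t for t in trials if t[date_field] <= TRAIN_CUTOFF]
    test = [t for t in trials if t[date_field] > TRAIN_CUTOFF]
    return train, test

# Hypothetical example records
trials = [
    {"trial_id": "A", "start_date": date(2014, 5, 1)},
    {"trial_id": "B", "start_date": date(2015, 12, 31)},
    {"trial_id": "C", "start_date": date(2016, 1, 1)},
]
train, test = time_split(trials)
print([t["trial_id"] for t in train], [t["trial_id"] for t in test])  # ['A', 'B'] ['C']
```

Using a single inclusive cutoff constant keeps the boundary day (31 December 2015) unambiguous: it belongs to training.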

Your Leaderboard submission model may use additional variables from the raw Informa historical trials data from which we produced the core data matrix, but you will have to wrangle those variables for the test data as well.
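One way to wrangle an extra variable consistently is to learn any encoding from the training data only and then apply that same encoding to the test data. The sketch below does this for a categorical variable; the variable (a "sponsor type"), the `min_count` threshold, and the choice of `-1` for unseen or rare categories are all hypothetical, not part of the challenge specification.

```python
from collections import Counter

def fit_category_map(train_values, min_count=2):
    """Learn a category vocabulary from training data only.

    Categories seen fewer than `min_count` times are left out of the map,
    so they are encoded as 'other' later.
    """
    counts = Counter(train_values)
    kept = sorted(v for v, c in counts.items() if c >= min_count)
    return {v: i for i, v in enumerate(kept)}

def apply_category_map(values, mapping):
    """Encode values with the training-derived map; unknown values become -1 ('other')."""
    return [mapping.get(v, -1) for v in values]

# Hypothetical raw variable pulled from the historical trials data
train_sponsor = ["industry", "industry", "academic", "academic", "government"]
test_sponsor = ["industry", "cooperative", "academic"]

mapping = fit_category_map(train_sponsor)
print(mapping)                                    # {'academic': 0, 'industry': 1}
print(apply_category_map(test_sponsor, mapping))  # [1, -1, 0]
```

Fitting on training data only means the test set never influences the encoding, which also avoids a subtle form of leakage.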

Additionally, you may explore other data sources in your workspace and then request that a new data source be made available to all participants in the challenge.

IMPORTANT: When adding additional data, it is up to the participants to ensure that no data from after the 2015 cutoff is “leaked” back into model training, especially data on trial outcomes. Information leakage would invalidate your score.
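A simple guard against leakage is to filter an external source by the date its information became available, and to fail loudly if anything later slips through. This is a sketch under assumptions: the `reported_on` field name and the record layout are hypothetical, and for outcome data the relevant date is when the outcome was known, not when the trial started.

```python
from datetime import date

TRAIN_CUTOFF = date(2015, 12, 31)

def restrict_to_cutoff(records, date_field="reported_on", cutoff=TRAIN_CUTOFF):
    """Keep only records whose information was available on or before the cutoff."""
    return [r for r in records if r[date_field] <= cutoff]

def assert_no_leakage(records, date_field="reported_on", cutoff=TRAIN_CUTOFF):
    """Raise if any record is dated after the cutoff."""
    late = [r for r in records if r[date_field] > cutoff]
    if late:
        raise ValueError(f"{len(late)} record(s) dated after {cutoff} would leak into training")

# Hypothetical external data source with outcome announcements
external = [
    {"drug": "X", "outcome": "success", "reported_on": date(2015, 6, 1)},
    {"drug": "X", "outcome": "failure", "reported_on": date(2017, 3, 15)},
]
safe = restrict_to_cutoff(external)
assert_no_leakage(safe)  # passes: only the 2015 record remains
print(len(safe))         # 1
```

Running `assert_no_leakage` on the unfiltered `external` list would raise a `ValueError`, which is the intended behavior.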

Insights:

When demonstrating your model's performance on other challenge questions or with new data sets, please keep information leakage in mind; you are, of course, free to devise new ways of validating your algorithm.

We strongly encourage participants to show the value or additional insights gained by bringing in new data, to help us quantify the new information it potentially provides.
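One way to quantify what new data adds is to score a baseline model and a model with the new data on the same held-out set and report the difference. The sketch below assumes ROC AUC as the metric, which is an assumption since this page does not name the scoring metric. It also assumes the held-out labels come from a validation slice of the pre-2016 training data, because test outcomes are not available to participants. All predictions shown are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. O(n_pos * n_neg), fine for small illustrations.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical validation outcomes (1 = success) and predicted probabilities
y_val = [1, 0, 1, 0, 1, 0]
baseline = [0.6, 0.5, 0.4, 0.3, 0.7, 0.65]
with_new_data = [0.7, 0.4, 0.6, 0.3, 0.8, 0.5]

auc_base = roc_auc(y_val, baseline)
auc_new = roc_auc(y_val, with_new_data)
print(f"baseline={auc_base:.3f} with_new_data={auc_new:.3f} lift={auc_new - auc_base:+.3f}")
# baseline=0.667 with_new_data=1.000 lift=+0.333
```

Comparing both models on the identical validation set isolates the contribution of the new data from differences in evaluation.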