Challenge Rules

  1. Entries to the MineRL challenge must be “open.” 
    • Teams will be expected to reveal most details of their method including source-code (special exceptions may be made for pending publications).
  2. For a team to be eligible to move to Round 2, each member must satisfy the following:
    • (1) be at least 18 and at least the age of majority in place of residence; 
    • (2) not reside in any region or country subject to U.S. Export Regulations; and 
    • (3) not be an organizer of this competition nor a family member of a competition organizer. 
    • In addition, to receive any awards from our sponsors, competition winners must attend the NeurIPS workshop.
  3. The submission must train a machine learning model without relying on human domain knowledge. 
    • The reward function may not be changed (shaped) based on manually engineered, hard-coded functions of the state. For example, additional rewards for approaching tree-like objects are not permitted, but rewards for encountering novel states (“curiosity rewards”) are permitted.
    • Actions/meta-actions/sub-actions/sub-policies may not be manually specified in any way. For example, though a learned hierarchical controller is permitted, meta-controllers may not choose between two policies based on a manually specified condition, such as whether the agent has a certain item in its inventory. This restriction includes the composition of actions (e.g., adding an additional action which is equivalent to performing “walk forward for 2 seconds” or “break a log and then place a crafting table”).
    • State processing/pre-processing cannot exploit Minecraft-specific techniques / domain knowledge. For example, the agent can act every even-numbered timestep based on the last two observations, but a manually specified edge detector may not be applied to the observation. As another example, the agent’s observations may be normalized to be “zero-mean, variance one” based on an observation history or the dataset.
    • To ensure that the semantic meaning attached to action and observation labels is not exploited, the labels assigned to actions and observations have been obfuscated (in both the dataset and the environment). Actions and observations (with the exception of POV observations) have been embedded into a different space. Furthermore, during Round 2 submissions, the actions will be re-embedded. Any attempt to bypass these obfuscations will constitute a violation of the rules.
    • Models may only be trained against the competition environments (MineRL environments ending with “Comp”). All of the MineRL environments have specific competition versions which incorporate action and observation space obfuscation (for example, MineRLObtainDiamondComp-v0 and MineRLTreechopComp-v0). They all share a similar observation and action space embedding, which is changed in Round 2, as is the texture pack of the environment.
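
Rule 3 permits normalizing observations to “zero-mean, variance one” based on an observation history or the dataset. A minimal sketch of one way to do this is shown below, assuming observations arrive as flat lists of floats. The class name `RunningNormalizer` and its methods are hypothetical and not part of MineRL; the statistics come only from the data seen so far, with no Minecraft-specific features. Whether a given pipeline complies remains the organizers' call.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: per-feature running mean/variance (Welford's algorithm)."""

    def __init__(self, size):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations from the mean

    def update(self, obs):
        """Fold one observation vector into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs, eps=1e-8):
        """Rescale an observation to roughly zero mean and unit variance."""
        if self.count < 2:
            # Not enough history to estimate a variance; pass through unchanged.
            return list(obs)
        out = []
        for i, x in enumerate(obs):
            std = math.sqrt(self.m2[i] / self.count)  # population standard deviation
            out.append((x - self.mean[i]) / (std + eps))
        return out
```

In use, `update` would be called on each observation from the dataset or the agent's own history, and `normalize` applied before the observation is passed to the model.
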
  4. There are two tracks, each with a different sample budget:
    • The primary track is “Demonstrations and Environment.” Eight million (8,000,000) interactions with the environment may be used in addition to the provided dataset. If stacking observations / repeating actions, then each skipped frame still counts against this budget.
    • The secondary track is “Demonstrations Only.” No environment interactions may be used in addition to the provided dataset. Competitors interested in learning solely from demonstrations can compete in this track without being disadvantaged compared to those who also use reinforcement learning. 
    • A team can submit separate entries to both tracks; performance in the tracks will be evaluated separately (i.e., submissions between the two tracks are not linked in any way).
  5. Participants may only use the provided dataset; no additional datasets may be included in the source file submissions nor may be downloaded during training evaluation, but pre-trained models which are publicly available by June 5th are permitted. 
    • During the evaluation of submitted code, the individual containers will not have access to any external network in order to avoid any information leak. Relevant exceptions are added to ensure participants can download and use the pre-trained models included in popular frameworks like PyTorch and TensorFlow. Participants can request to add network exceptions for any other publicly available pre-trained models, which will be validated by AICrowd on a case-by-case basis.
    • All submitted code repositories will be scrubbed to remove any files larger than 30MB to ensure participants are not checking in any model weights pre-trained on the released training dataset.
    • Pre-trained models are not allowed to have been trained on MineRL or any related or unrelated Minecraft data. The intent of this rule is to allow participants to use models which are, for example, trained on ImageNet or similar datasets. Don't abuse this.
  6. The procedure for Round 1 is as follows:
    • During Round 1, teams submit their trained models for evaluation at most twice a week and receive the performance of their models.
    • At the end of Round 1, teams must submit source code to train their models. This code must terminate within four days on the specified platform. 
    • For teams with the highest evaluation scores, this code will be inspected for rule compliance and used to re-train the models with the validation dataset and environment. 
    • For those submissions whose end-of-round and organizer-run performance distributions disagree, the offending teams will be contacted for appeal. Unless a successful appeal is made, the organizers will remove those submissions from the competition and then evaluate additional submissions until each track is at capacity.
    • The top 15 teams in the main (RL+Demonstration) track and the top 5 teams in the secondary (Demonstration Only) track will progress to Round 2.
  7. The procedure for Round 2 is as follows:
    • During Round 2, teams will submit their source code at most once every two weeks.
    • After each submission, the model will be trained for four days on a re-rendered, private dataset and domain, and the teams will receive the final performance of their model. The dataset and domain will contain matching perturbations to the action space and the observation space.
    • At the end of the round, final standings are based on the best-performing submission of each team during Round 2.
  8. Official rule clarifications will be made in the FAQ.