INFO:root:Training run: nmmo_20231022_225107 (/content/drive/MyDrive/nmmo/nmmo_20231022_225107)
INFO:root:Training args: Namespace(attend_task='none', attentional_decode=True, bptt_horizon=8, checkpoint_interval=30, clip_coef=0.1, death_fog_tick=None, device='cuda', early_stop_agent_num=8, encode_task=True, eval_batch_size=32768, eval_mode=False, eval_num_policies=2, eval_num_rounds=1, eval_num_steps=1000000, explore_bonus_weight=0.01, extra_encoders=True, heal_bonus_weight=0.03, hidden_size=256, input_size=256, learner_weight=1.0, local_mode=True, map_size=128, maps_path='maps/train/', max_episode_length=1024, max_opponent_policies=0, meander_bonus_weight=0.02, num_agents=128, num_buffers=1, num_cores=None, num_envs=1, num_lstm_layers=0, num_maps=128, num_npcs=256, policy_store_dir=None, ppo_learning_rate=0.00015, ppo_training_batch_size=128, ppo_update_epochs=3, resilient_population=0.2, rollout_batch_size=1024, run_name='nmmo_20231022_225107', runs_dir='/content/drive/MyDrive/nmmo/', seed=1, spawn_immunity=20, sqrt_achievement_rewards=False, task_size=4096, tasks_path='reinforcement_learning/curriculum_with_embedding.pkl', track='rl', train_num_steps=10000000, use_serial_vecenv=True, wandb_entity=None, wandb_project=None)
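The `Namespace(...)` dump above is the repr that Python's `argparse` produces for parsed arguments. As a hedged sketch of how a few of these flags could be declared (flag names and defaults copied from the logged values; the real trainer's parser may define them differently):

```python
import argparse

# Sketch only: a handful of the flags visible in the Namespace above,
# with defaults copied from the logged values. The actual trainer's
# argument parser may declare these differently.
parser = argparse.ArgumentParser()
parser.add_argument("--num_agents", type=int, default=128)
parser.add_argument("--bptt_horizon", type=int, default=8)
parser.add_argument("--ppo_learning_rate", type=float, default=0.00015)
parser.add_argument("--train_num_steps", type=int, default=10_000_000)
parser.add_argument("--device", type=str, default="cuda")

args = parser.parse_args([])  # empty list -> fall back to the defaults
print(args)
```

Logging this namespace alongside the run, as done above, makes every hyperparameter of the checkpoint reproducible from the log alone.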
INFO:root:Using policy store from /content/drive/MyDrive/nmmo/nmmo_20231022_225107/policy_store
INFO:root:Generating 128 maps
Allocated 93.30 MB to environments. Only accurate for Serial backend.
PolicyPool sample_weights: [128]
Allocated to storage - Pytorch: 0.00 GB, System: 0.11 GB
INFO:root:PolicyPool: Updated policies: dict_keys(['learner'])
Allocated during evaluation - Pytorch: 0.01 GB, System: 1.53 GB
Epoch: 0 - 1K steps - 0:01:20 Elapsed
Steps Per Second: Env=759, Inference=185, Train=430
Allocated during training - Pytorch: 0.07 GB, System: 0.24 GB
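The three throughput figures reported each epoch can be read as one rate per phase. A minimal sketch, assuming each rate is the epoch's step count divided by the wall-clock seconds spent in that phase (function and timing names are hypothetical, not the trainer's internals):

```python
def steps_per_second(num_steps: int, phase_seconds: dict) -> dict:
    # One rate per phase: steps collected this epoch divided by the
    # wall-clock seconds spent in that phase (env stepping, inference,
    # training). Names are illustrative only.
    return {
        phase: num_steps / seconds
        for phase, seconds in phase_seconds.items()
        if seconds > 0  # skip phases with no measured time
    }

# Example with assumed phase timings for a 1024-step rollout batch.
print(steps_per_second(1024, {"Env": 1.35, "Inference": 5.53, "Train": 2.38}))
```

Note that the epoch-0 rates above are depressed by one-time startup costs (map generation, CUDA initialization); later epochs are more representative.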
INFO:root:Saving policy to /content/drive/MyDrive/nmmo/nmmo_20231022_225107/policy_store/nmmo_20231022_225107.000001
INFO:root:PolicyPool: Updated policies: dict_keys(['learner'])
Allocated during evaluation - Pytorch: 0.00 GB, System: 0.01 GB
Epoch: 1 - 2K steps - 0:01:26 Elapsed
Steps Per Second: Env=565, Inference=3752, Train=610
Allocated during training - Pytorch: 0.01 GB, System: 0.03 GB
INFO:root:PolicyPool: Updated policies: dict_keys(['learner'])
Allocated during evaluation - Pytorch: 0.00 GB, System: 0.02 GB
Epoch: 2 - 3K steps - 0:01:31 Elapsed
Steps Per Second: Env=438, Inference=3722, Train=651
Allocated during training - Pytorch: 0.01 GB, System: 0.04 GB
INFO:root:PolicyPool: Updated policies: dict_keys(['learner'])
Allocated during evaluation - Pytorch: 0.00 GB, System: -0.04 GB
Epoch: 3 - 4K steps - 0:01:35 Elapsed
Steps Per Second: Env=736, Inference=5234, Train=732
Allocated during training - Pytorch: 0.01 GB, System: 0.01 GB
INFO:root:PolicyPool: Updated policies: dict_keys(['learner'])
Allocated during evaluation - Pytorch: 0.00 GB, System: 0.00 GB
Epoch: 4 - 5K steps - 0:01:40 Elapsed
Steps Per Second: Env=438, Inference=4811, Train=738
Allocated during training - Pytorch: 0.01 GB, System: 0.01 GB
INFO:root:PolicyPool: Updated policies: dict_keys(['learner'])
Allocated during evaluation - Pytorch: 0.00 GB, System: 0.00 GB
Epoch: 5 - 6K steps - 0:01:44 Elapsed
Steps Per Second: Env=719, Inference=4496, Train=637
Allocated during training - Pytorch: 0.01 GB, System: 0.00 GB
INFO:root:PolicyPool: Updated policies: dict_keys(['learner'])
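Logs in this shape can be mined for throughput trends across epochs. A small sketch (the regex patterns are assumptions fitted to the lines above and may need adjusting for other runs or formats):

```python
import re

# Sketch: extract epoch headers and per-phase SPS values from log text
# shaped like the output above. Patterns are fitted to these lines only.
EPOCH_RE = re.compile(r"Epoch: (\d+) - (\d+)K steps - (\S+) Elapsed")
SPS_RE = re.compile(r"(Env|Inference|Train)=(\d+)")

def parse_log(text: str):
    epochs = [
        {"epoch": int(e), "ksteps": int(k), "elapsed": t}
        for e, k, t in EPOCH_RE.findall(text)
    ]
    sps = [(phase, int(v)) for phase, v in SPS_RE.findall(text)]
    return epochs, sps

sample = (
    "Epoch: 5 - 6K steps - 0:01:44 Elapsed\n"
    "Steps Per Second: Env=719, Inference=4496, Train=637"
)
epochs, sps = parse_log(sample)
print(epochs, sps)
```

Aggregating the tuples by phase would give, for example, the mean `Train` SPS over the run, which is often the quickest way to spot a slowdown between checkpoints.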