karolisram
Karolis Ramanauskas
GB

0 Followers · 0 Following
#### Challenges Entered

##### NeurIPS 2022: MineRL BASALT Competition
By MineRL Labs

Learning from Human Feedback

#### Latest submissions

No submissions made in this challenge.
##### NeurIPS 2022 - The Neural MMO Challenge
By Parametrix.ai, MIT, THU_SIGS, AIcrowd

Specialize and Bargain in Brave New Worlds

#### Latest submissions

| Status | Submission | Submitted |
|--------|------------|-----------|
| graded | 197092 | Tue, 23 Aug 2022 10:13:39 |
| graded | 196866 | Fri, 19 Aug 2022 09:52:23 |
| graded | 196863 | Fri, 19 Aug 2022 09:44:49 |
##### NeurIPS 2021 - The NetHack Challenge
By AIcrowd

ASCII-rendered single-player dungeon crawl game

#### Latest submissions

| Status | Submission | Submitted |
|--------|------------|-----------|
| graded | 149038 | Wed, 30 Jun 2021 08:40:25 |
| graded | 148738 | Mon, 28 Jun 2021 11:10:09 |
| graded | 147238 | Fri, 18 Jun 2021 11:09:10 |
##### IJCAI 2022 - The Neural MMO Challenge
By Parametrix.ai, MIT, THU_SIGS, AIcrowd

#### Latest submissions

No submissions made in this challenge.
##### NeurIPS 2021: MineRL Diamond Competition
By MineRL Labs - Carnegie Mellon University

Training sample-efficient agents in Minecraft

#### Latest submissions

| Status | Submission | Submitted |
|--------|------------|-----------|
| graded | 149687 | Mon, 5 Jul 2021 10:15:23 |
##### NeurIPS 2021: MineRL BASALT Competition
By C.H.A.I. - UC Berkeley

Sample Efficient Reinforcement Learning in Minecraft

#### Latest submissions

No submissions made in this challenge.
##### NeurIPS 2020: Procgen Competition
By OpenAI

Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments

#### Latest submissions

| Status | Submission | Submitted |
|--------|------------|-----------|
| graded | 93478 | Thu, 29 Oct 2020 12:57:57 |
| graded | 93477 | Thu, 29 Oct 2020 12:57:25 |
| graded | 93390 | Wed, 28 Oct 2020 12:56:30 |
##### Flatland
By SNCF, SBB, Deutsche Bahn

Multi-Agent Reinforcement Learning on Trains

#### Latest submissions

No submissions made in this challenge.
##### NeurIPS 2020: MineRL Competition
By MineRL Labs - Carnegie Mellon University

Sample-efficient reinforcement learning in Minecraft

#### Latest submissions

| Status | Submission | Submitted |
|--------|------------|-----------|
| graded | 120617 | Thu, 11 Feb 2021 09:56:57 |
| graded | 120492 | Wed, 10 Feb 2021 13:34:54 |
| failed | 120483 | Wed, 10 Feb 2021 12:26:29 |
##### NeurIPS 2019 : MineRL Competition
By MineRL Labs - Carnegie Mellon University

Sample-efficient reinforcement learning in Minecraft

#### Latest submissions

| Status | Submission | Submitted |
|--------|------------|-----------|
| graded | 25413 | Tue, 26 Nov 2019 09:31:38 |
| graded | 25412 | Tue, 26 Nov 2019 09:29:38 |
| graded | 25075 | Sat, 23 Nov 2019 11:00:13 |
##### Flatland Challenge
By SBB

Multi Agent Reinforcement Learning on Trains.

#### Latest submissions

No submissions made in this challenge.
##### Unity Obstacle Tower Challenge
By Unity Technologies

A new benchmark for Artificial Intelligence (AI) research in Reinforcement Learning

#### Latest submissions

| Status | Submission | Submitted |
|--------|------------|-----------|
| graded | 8563 | Thu, 11 Jul 2019 06:16:27 |
| graded | 8534 | Wed, 10 Jul 2019 21:01:07 |
| failed | 8533 | Wed, 10 Jul 2019 20:52:11 |
##### Spotify Sequential Skip Prediction Challenge
By Spotify

Predict if users will skip or listen to the music they're streamed

#### Latest submissions

No submissions made in this challenge.
##### Flatland AMLD 2021
By AIcrowd

Multi-Agent Reinforcement Learning on Trains

#### Latest submissions

No submissions made in this challenge.
Participant Rating

### MineRL self._actions

Over 1 year ago

Ah, I see the issue now. I think the confusion comes from line 121 in RL_plus_script.py:

```python
[('forward', 1), ('jump', 1)]
```

This line doesn't mean two actions (forward on the first tick, then jump on the next tick). Instead it means that the forward and jump keys are both pressed for a single tick.

You can see that by printing out `act = env.action_space.noop()`:

```python
OrderedDict([('attack', 0),
             ('back', 0),
             ('camera', array([0., 0.], dtype=float32)),
             ('forward', 0),
             ('jump', 0),
             ('left', 0),
             ('right', 0),
             ('sneak', 0),
             ('sprint', 0)])
```


This is a single action that does nothing, because none of the keys are pressed. If you then do:

```python
act['forward'] = 1
act['jump'] = 1
```

`act` will become an action with those two buttons pressed. This is what the ActionShaping() wrapper does. To create meta actions that perform 5 attacks in a row and the like, you will need to do something else; frame skipping might be an easier way to achieve that.
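To illustrate the difference between "several keys in one tick" and "one action repeated over several ticks", here is a minimal plain-Python sketch (no MineRL required; `combo` and `meta_action` are hypothetical helper names, and the key names mirror the noop() printout above):

```python
from collections import OrderedDict

def noop():
    # Mimics env.action_space.noop(): every key unpressed for one tick.
    return OrderedDict([
        ("attack", 0), ("back", 0), ("camera", [0.0, 0.0]),
        ("forward", 0), ("jump", 0), ("left", 0),
        ("right", 0), ("sneak", 0), ("sprint", 0),
    ])

def combo(**pressed):
    # One tick with several keys held down at once, e.g. forward + jump.
    act = noop()
    for key, value in pressed.items():
        act[key] = value
    return act

def meta_action(act, repeats):
    # A "meta action": the same single-tick action repeated for
    # several consecutive ticks (i.e. frame skipping).
    return [act] * repeats

jump_forward = combo(forward=1, jump=1)         # one tick, two keys pressed
five_attacks = meta_action(combo(attack=1), 5)  # 5 ticks of attacking
```

Stepping the env once per element of `five_attacks` would give the frame-skipped behaviour.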

### MineRL self._actions

Over 1 year ago

The docstring of the ActionShaping() class should be enough to figure out how to adjust the actions for the RL part of the algorithm. What changes do you want to make, and what have you tried?
Maybe playing Minecraft for a bit or watching a YouTube guide would help with the Minecraft knowledge?

### Questions about the environment that can be used to train the model

Over 1 year ago

Yes, you can use the *DenseVectorObf environments in the Research track of the competition.

### Discord invite invalid

Almost 2 years ago

Good catch, thank you! The links have been fixed.

### Obfuscated actions + KMeans analysis

Here's some analysis our team did on the whole obfuscated action + KMeans thing:

A teaser: sometimes the agents don't have a single action to look up.

### Error using gym.make

Over 2 years ago

Working Colab example (credit to @tviskaron):

```shell
!java -version
!sudo apt-get purge openjdk-*
!java -version
!sudo apt-get install openjdk-8-jdk

!sudo apt-get install xvfb xserver-xephyr vnc4server
!sudo pip install pyvirtualdisplay
```

```python
from pyvirtualdisplay import Display
display = Display(visible=0, size=(640, 480))
display.start()

import minerl
import gym
env = gym.make('MineRLNavigateDense-v0')

obs = env.reset()
done = False
net_reward = 0

for _ in range(100):
    action = env.action_space.noop()

    action['camera'] = [0, 0.03 * obs["compassAngle"]]
    action['back'] = 0
    action['forward'] = 1
    action['jump'] = 1
    action['attack'] = 1

    obs, reward, done, info = env.step(action)

    net_reward += reward
    print("Total reward: ", net_reward)

env.close()
```

### How to find subtle implementation details

Over 2 years ago

It could be the weight initialization, as pytorch uses he_uniform by default and tensorflow uses glorot_uniform. Using tensorflow with glorot_uniform I get 42 score on starpilot, while using tensorflow with he_uniform I get 19.
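For anyone comparing the two frameworks, the initializers differ only in their sampling bound. A numpy sketch of the standard Glorot/He uniform formulas (an illustration, not the frameworks' actual code):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    # Glorot/Xavier uniform: limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_uniform(fan_in, fan_out, rng):
    # He uniform: limit = sqrt(6 / fan_in); depends on fan_in only,
    # so it is wider than Glorot for square layers.
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
w_glorot = glorot_uniform(256, 256, rng)  # bound ~0.108
w_he = he_uniform(256, 256, rng)          # bound ~0.153
```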

### Round 2 is open for submissions 🚀

Over 2 years ago

Sounds good, thanks @shivam. Could you please also give us the normalization factors (Rmin, Rmax) for the 4 private envs?

### Round 2 is open for submissions 🚀

Over 2 years ago

Will we be able to choose which submission to use for the final 16+4 evaluation? It might be the case that our best solution that was tested locally on 16 envs is not the same as the best one for the 6+4 envs on public LB.

### Human score

Over 2 years ago

So I was a little bored and decided to see how well I could play the procgen games myself.

Setup:

```shell
python -m procgen.interactive --distribution-mode easy --vision agent --env-name coinrun
```

First I tried each game for 5-10 episodes to figure out what the keys do, how the game works, etc.
Then I played each game 100 times and logged the rewards. Here are the results:

| Environment | Mean reward | Mean normalized reward |
|-------------|-------------|------------------------|
| bigfish | 29.40 | 0.728 |
| bossfight | 10.15 | 0.772 |
| caveflyer | 11.69 | 0.964 |
| chaser | 11.23 | 0.859 |
| climber | 12.34 | 0.975 |
| coinrun | 9.80 | 0.960 |
| dodgeball | 18.36 | 0.963 |
| fruitbot | 25.15 | 0.786 |
| heist | 10.00 | 1.000 |
| jumper | 9.20 | 0.911 |
| leaper | 9.90 | 0.988 |
| maze | 10.00 | 1.000 |
| miner | 12.27 | 0.937 |
| ninja | 8.60 | 0.785 |
| plunder | 29.46 | 0.979 |
| starpilot | 33.15 | 0.498 |

The mean normalized score over all games was 0.882. It stayed relatively constant throughout the 100 episodes, i.e. I didn’t improve much while playing.

I’m not sure how useful this result would be as a “human benchmark” though - I could easily achieve ~1.000 score given enough time to think on each frame. Also, human visual reaction time is ~250ms, which at 15 fps would translate to us being at least 4 frames behind on our actions, which can be important for games like starpilot, chaser and some others.
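For context, the normalized score here is a min-max rescaling with per-environment constants. A small sketch (the (r_min, r_max) arguments below are placeholders, not the official per-env constants), plus a check that the normalized column of the table averages to 0.882:

```python
def normalize(r, r_min, r_max):
    # Min-max rescaling of a raw episode return to roughly [0, 1].
    return (r - r_min) / (r_max - r_min)

# Example with placeholder constants (not the official values):
norm_example = normalize(29.40, 1.0, 40.0)

# Mean of the "Mean normalized reward" column from the table above:
scores = [0.728, 0.772, 0.964, 0.859, 0.975, 0.960, 0.963, 0.786,
          1.000, 0.911, 0.988, 1.000, 0.937, 0.785, 0.979, 0.498]
mean_norm = sum(scores) / len(scores)  # ~0.882
```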

### How to save rollout video / render?

Over 2 years ago

That worked, thank you!

### How to save rollout video / render?

Over 2 years ago

Does it work properly for everyone else? When I run it for 100 episodes it only saves episodes number 0, 1, 8, 27, 64.
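The episode numbers 0, 1, 8, 27, 64 are exactly the perfect cubes below 100, which matches gym Monitor's default "capped cubic" video schedule; passing `video_callable=lambda episode_id: True` to the Monitor wrapper should record every episode instead. A sketch of the default schedule:

```python
def capped_cubic_video_schedule(episode_id):
    # gym Monitor's default: record perfect-cube episodes (0, 1, 8, 27,
    # 64, ...) until episode 1000, then every 1000th episode.
    if episode_id < 1000:
        return round(episode_id ** (1.0 / 3)) ** 3 == episode_id
    return episode_id % 1000 == 0

recorded = [ep for ep in range(100) if capped_cubic_video_schedule(ep)]
# recorded == [0, 1, 8, 27, 64]
```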

### Same marks on the testing video

Over 2 years ago

It's the paint_vel_info flag that you can find under env_config in the .yaml files. There are also some flags that are not in the .yaml files but that people are using (use_monochrome_assets, use_backgrounds). You can find all of them if you scroll down here: https://github.com/openai/procgen.
Should we actually be allowed to change the environment? Maybe these settings should be reset when doing evaluation?

### Submissions are stuck

Over 3 years ago

There was a mention that the final standings for round 2 would be based on more than 5 seeds, to get a proper average performance. Is that going to happen? I didn't try to repeatedly submit similar models to overfit the 5 seeds for that reason.

### Is there any due date of GCP credit?

Almost 4 years ago

Mine says it expires 28 May 2020; not sure if that's a set date or if it depends on when you redeem. I can't find the date of when I redeemed.

### Successful submissions do not appear on the leaderboard

Almost 4 years ago

Is the debug option off?

### What reward receives the agent for collecting a key?

Almost 4 years ago

0.1, same as a single door (there are 2 doors in each doorway).

And I was thinking I was going mad when my previously working submission suddenly broke after "disabling" debug.

### Submission Failed: Evaluation Error

Can't wait! I've been trying to get my dopamine-trained agent scored (only 5-7 floors so far), but the only response I get after every change is:

```
The following containers terminated prematurely. : agent
```

which is not very helpful. It builds fine, but gets stuck in the evaluation phase.

### Human Performance

In the Obstacle Tower paper there is a section on human performance: 15 people tried it multiple times, and the highest floor reached was 22. Am I reading this right? I finished all 25 floors on my very first try without much trouble.
How far did everyone else get and how many runs did you do? We could try collecting more data and make a more accurate human benchmark this way.
