0 Follower
0 Following
Adrian Egli


Swiss Federal Railways (SBB CFF FFS)








Ratings Progression


Challenge Categories


Challenges Entered

Latest submissions

See All
failed 172376
failed 172375
failed 172374

Multi-Agent Reinforcement Learning on Trains

Latest submissions

See All
graded 109332
graded 109022
graded 109020

Multi Agent Reinforcement Learning on Trains.

Latest submissions

See All
graded 24345
failed 24340
graded 24238

Multi-Agent Reinforcement Learning on Trains

Latest submissions

See All
graded 149984
graded 128658
graded 128649
Participant Rating
Participant Rating
adrian_egli has not joined any teams yet...


🚅 Accelerate the learning – increase agents behavior at higher speed!

About 3 years ago

An extended version of the fast-tree-obs can be found:

Or just fork the ppo-baseline /
AMLD2021 baseline with score 80.579

Submission error : no module flatland

Over 3 years ago

I fixed submission error no module flatland:

My apt.txt has following entries:


🚅 Accelerate the learning – increase agents behavior at higher speed!

Over 3 years ago

Adrian Egli, Computer Scientist, SBB


  • I improved the performance of the Flatland environment by optimizing critical code paths using a profiler. These changes have been included in the latest Flatland release.
  • I introduce a new observation builder, FastTreeObs, which performs many time faster than the default TreeObs and reaches better performance for RL-based agents. See the comparison below: FastTreeObs in red and green vs default TreeObs in blue and orange.
  • This new observation builder has already been added in the Starter kit in the fast-tree-obs branch! use --use_extra_observation

I would like to share some of the things I’ve learnt while working with FLATland. I would also like to share some working experience in the field of mathematical optimization where I have been developing solutions for more than a decade, in the hope that these experiences might help to solve similarly complex problems.

One key issue with reinforcement learning is that to train agents we have to show hundreds of thousands of different variants. To do this in a reasonable amount of time we must have a fast environment. So to shorten the training time we can accelerate the FLATland environment by using faster computers or we can change the optimization model or the implementation of FLATland. Based on my experience you should not invest too much time accelerating code such as the FLATland core when working with early, naive optimisation models, such as RL agents and their observations. Doing core optimisations in general increases code complexity and it doesn’t make a big difference towards optimisation quality. Of course core optimisations allow you to train more episodes in fixed time. But I think this will not make a major difference to agents’ behavior or optimisation quality.

Nevertheless, I decided as a first step using a profiler to get a first insight into whether there are some low-hanging-fruits to optimize. This is cheap in doing and could make a first improvement. But I was aware that this can not make a big improvement in agents behavior after training. What I did was simply to run FLATland with a profiler. I profiled FLATland with yappi and analyzed the call graph. Within a very short time I found some expensive function calls in the core and as well in the build-in tree observation. I just focused on the most expensive ones. Then I found a better implementation and so I replaced just five numpy function calls. I used only pure python methods and I got the FLATland core accelerated by factors 3-4.

Which numpy functions are extremely expensive and should not be used in FLATland? The most expensive one is np.isclose. Thus I suggested replacing np.isclose with fast_isclose which is my pure python replacement. Next you take by using np.clip or np.argmax. Beside those three methods I worked out that two other methods have a major impact on the overall performance, thus I suggested also to replace equal for position comparison and count_nonzero for counting the number of possible transitions.

def fast_isclose(a, b, rtol):
    return (a < (b + rtol)) or (a < (b - rtol))

def fast_clip(position: (int, int), min_value: (int, int), max_value: (int, int)) -> bool:
    return (
        max(min_value[0], min(position[0], max_value[0])),
        max(min_value[1], min(position[1], max_value[1]))

def fast_argmax(possible_transitions: (int, int, int, int)) -> bool:
    if possible_transitions[0] == 1:
        return 0
    if possible_transitions[1] == 1:
        return 1
    if possible_transitions[2] == 1:
        return 2
    return 3

def fast_position_equal(pos_1: (int, int), pos_2: (int, int)) -> bool:
    return pos_1[0] == pos_2[0] and pos_1[1] == pos_2[1]

def fast_count_nonzero(possible_transitions: (int, int, int, int)):
    return possible_transitions[0] + possible_transitions[1] + possible_transitions[2] + possible_transitions[3]

Those methods are now part of the latest FLATland release. Many thanks to the community to integrate them into FLATland core.

from time import time

import numpy as np
from flatland.envs.rail_env import fast_isclose

def print_timing(label, start_time, end_time):
    print("{:>10.4f}ms".format(1000 * (end_time - start_time)) + "\t" + label)

def check_isclose(nbr=100000):
    s = time()
    for x in range(nbr):
        fast_isclose(x, 0.0, rtol=1e-03)
    e = time()
    print_timing("fast_isclose", start_time=s, end_time=e)

    s = time()
    for x in range(nbr):
        np.isclose(x, 0.0, rtol=1e-03)
    e = time()
    print_timing("np.isclose", start_time=s, end_time=e)

if __name__ == "__main__":

#21.9455ms	fast_isclose
#2421.8295ms	np.isclose

This optimization makes it possible to train a reinforcement learning agent on FLATland much faster. But it made no difference in agents’ behavior! I think the whole community can now benefit from these code accelerations. That’s great. So the wave evaluation strategy with these optimizations enables to execute more scenarios in the limited time and so you all get higher reward.

But then I switch to the more interesting work – I like to understand what the agents can do and I like to understand how I can use these problem understanding to rewrite the observation model. So I focus most time on the model and the question of how we can reformulate the tree observation to improve agents behavior at higher calculation speed. This work is fun!

What observation I made while analyzing FLATland? I got that the agent’s action space is not at each cell the same. Thus the action space is locally different and might be used to better formulate the observation. The key question was to understand where an agent can freely choose the full action space and where an agent can only walk forward to optimize the reward.

Where agents can choose an action from full action space is where they can make a real decision. This is in general on a cell before a switch and on a cell which is a switch.
One cell before a switch (agent is looking toward the switch which is joining two tracks) the agent can only choose to wait or walk forward. All other actions don’t optimize the reward function. What is going on there? Two trains can change their ordering, A before B or B before A. For all other switches the agent has to walk at least one step and has to decide on the switch which track I will walk on. Thus on a switch the agent can choose to take the path (track). In most cases the agent has to choose to continue walking on the shortest path or not. But he has to ensure that the chosen path will end at the target. Otherwise the reward functions will dramastically be bad.


What conceptual issue I believe is in the current observation:


At the red box the agent can choose an action. This real action space is very spare.

I used this insight to implement a very small observation builder which can be computed very fast but the agents behavior will not be worse than the built-in tree observation based agents.

My observation contains only 26 values:

# all values are [0,1]
# observation[0]  : 1 path towards target (direction 0) / otherwise 0 -> path is longer or there is no path
# observation[1]  : 1 path towards target (direction 1) / otherwise 0 -> path is longer or there is no path
# observation[2]  : 1 path towards target (direction 2) / otherwise 0 -> path is longer or there is no path
# observation[3]  : 1 path towards target (direction 3) / otherwise 0 -> path is longer or there is no path
# observation[4]  : int(agent.status == RailAgentStatus.READY_TO_DEPART)
# observation[5]  : int(agent.status == RailAgentStatus.ACTIVE)
# observation[6]  : int(agent.status == RailAgentStatus.DONE or agent.status == RailAgentStatus.DONE_REMOVED)
# observation[7]  : current agent is located at a switch, where it can take a routing decision
# observation[8]  : current agent is located at a cell, where it has to take a stop-or-go decision
# observation[9]  : current agent is located one step before/after a switch
# observation[10] : 1 if there is a path (track/branch) otherwise 0 (direction 0)
# observation[11] : 1 if there is a path (track/branch) otherwise 0 (direction 1)
# observation[12] : 1 if there is a path (track/branch) otherwise 0 (direction 2)
# observation[13] : 1 if there is a path (track/branch) otherwise 0 (direction 3)
# observation[14] : If there is a path with step (direction 0) and there is a agent with opposite direction -> 1
# observation[15] : If there is a path with step (direction 1) and there is a agent with opposite direction -> 1
# observation[16] : If there is a path with step (direction 2) and there is a agent with opposite direction -> 1
# observation[17] : If there is a path with step (direction 3) and there is a agent with opposite direction -> 1
# observation[18] : If there is a path with step (direction 0) and there is a agent with same direction -> 1
# observation[19] : If there is a path with step (direction 1) and there is a agent with same direction -> 1
# observation[20] : If there is a path with step (direction 2) and there is a agent with same direction -> 1
# observation[21] : If there is a path with step (direction 3) and there is a agent with same direction -> 1
# observation[22] : If there is a switch on the path which agent can not use -> 1
# observation[23] : If there is a switch on the path which agent can not use -> 1
# observation[24] : If there is a switch on the path which agent can not use -> 1
# observation[25] : If there is a switch on the path which agent can not use -> 1

At a switch I tried to explore the local agent environment by exploring similar to the tree observation. But I don’t branch at all switch the exploration. I branch only at real decision points, and stop the exploration when it reaches an opposite agent (agents which are traveling towards the agent of interest) or stop the exploration when it reaches any other agents. And of course it stops the exploration when the maximal branch depth is reached. Then I just aggregate the observations made during the exploration.

Here are some experimental results with FastTreeObs: using the starter kit DQN implementation, we can see that they get better results when looking at sample efficiency

However it’s when you look at the time efficiency that you see it’s true potential :rocket::rocket:

🚃🚃 Train Close Following

Almost 4 years ago

@hagrid67: Do you also have a illustrative (screenshot) image with agents travelling in different direction? And with more then one junction (join and split)?

🚃🚃 Train Close Following

Almost 4 years ago

Very nice - thanks @hagrid67 . This graphic looks excellent. It could help to improve RL. Because we can see “flows” of agents in a very precise way. And we may be able to include this information into tree observation. The current tree observation just scans forward (in the direction of travel), but now we can also get information about the incoming traffic at switchs in very accurate way. This might be important information to avoid overfilling which can lead into “unpredictable” deadlocks (traffic jam issue).

🚃🚃 Train Close Following

Almost 4 years ago

Thank you @ hagrid67 very good job. I think the graph (network) can also be used to create new RL observations. How can i run a demostration of the graphic structure, do you have an examples with illustrative rendering?

🚉 Questions about the Flatland Environment

Almost 4 years ago

I don’t exactly know what the manual generator is doint - but if you put an agent into the environment you just need to set the agent status: READY_TO_DEPART and as well its position = None. The agent is now located outside of the environment. Thus you have to further set the initial_position to the entry position. Means where the agent will entring the world.

The ending position is the agent’s target. Where the agent can be automatically removed or not. In the challenge setup it will automatically removed. RailEnv(…, remove_agents_at_target = True) -> remove_agents_at_target is set to true thus it reqeust the env to remove agents at target. It will be done as soon as the agent reached its target.

🚉 Questions about the Flatland Environment

Almost 4 years ago

Hi Zain

In the RailEnv.step FLATland checks whether there is a mal function for given agent. If so the agent can not move. See the comment in the rail_env.py:

        # Induce malfunction before we do a step, thus a broken agent can't move in this step

When an agent is not yet active (READY_TO_DEPART) and the agent’s apply action MOVE_LEFT,MOVE_RIGHT,MOVE_FORWARD it become active and the agent is set to its initial position but only if the cell is empty


Flatland Challenge

The train goes off the rails

Over 4 years ago

Hi @xcilog

What version of FLATland are you using - it might be a bug which was reported and fixed few days ago.

Best regards

Loading pre-create env

Over 4 years ago

Hi Filip

I hope the following method does answer your question.

def load_flatland_environment_from_file(file_name):
    rail_generator = rail_from_file(file_name)
    schedule_generator = schedule_from_file(file_name)

    environment = RailEnv(width=1,

    return environment

I tested this method on lastest master trunk. We found yesterday a bug in the load method. The bug is now fixed.


Call for participate

Over 4 years ago

Hello everybody

We are working hard on the next release of FLATland. We just launched FLATland 2.0 a few days ago. Unfortunately, we found many difficulties in the current codebase. The core team is working hard to fix all reported bugs, but we have restricted ressource. Thus it takes quite long time.

Beside fixing bugs, we also spent quite a lot of time in improving code quality and code readability. We did a refactoring of most of the FLATland core. Furthermore FLATland contains a new city generator, which is unfortunatelly not yet documented. You see, we give our best to make FLATland more easy to use and more robust with increased functionality. But we are still not there we like to be.

Since you all have made first expierence with FLATland. We believe that you are the most valuable persons in further improving the great FLATland environment.

You might can help in writting documentations, user manuals, and examples. We think you can quite easy support us, be writing down how to work with FLATland based on your own expirience. It might be more easy from your side to describe how to set up the environment and what is important to know before starting using the software. Our goal is getting a very easy to use environment. But as we know the environment from first line of code, it’s pretty hard to understand where the documentation has to focus on.

So we’re excited to get your feedback and your own effort in improving the FLATland story. Write down what ideas you have and what you like to get improved. We might can improve the whole API together. You might can implement some improvement directly into the public open codebase.

Do you like to help us with the documentations, with examples, with new features or even better with further improvement? FLATland should be a community product and it should be shaped by the whole community.


Reporting Bugs in Flatland

Over 4 years ago

Hi Filip

I just merged the latest change into master branch. Can you check if the bug is resolved, do you have a test python script?

kind regards

Reporting Bugs in Flatland

Over 4 years ago

Hi Filip

Thank you for reporting this bug. I am working on improving the sparse railway generator, respectively we might replace this with a new one or we will introduce a new generator.

But the bug has to be fixed anywhy.

To track the bug fix:


Reporting Bugs in Flatland

Over 4 years ago

Thanks for reporting. I opened an issue.

Torch learning fails on latest baseline/flatland repo

Over 4 years ago

Thank you for reporting the bug.

Training 1 Agents on (20,20). Episode 1771 Average Score: -0.401 Dones: 44.00%
Epsilon: 0.03 Action Probabilities: [0.0964351 0.19218464 0.15082267 0.22097806 0.33957952]/home/xarthur/Software/miniconda3/envs/flatland-rl/lib/python3.6/site-packages/flatland/envs/rail_generators.py:703: UserWarning: sparse_rail_generator: num_agents > nr_start_goal, changing num_agents
warnings.warn(“sparse_rail_generator: num_agents > nr_start_goal, changing num_agents”)
/home/xarthur/Software/miniconda3/envs/flatland-rl/lib/python3.6/site-packages/flatland/envs/schedule_generators.py:66: UserWarning: Too many agents! Changes number of agents.
warnings.warn(“Too many agents! Changes number of agents.”)
Traceback (most recent call last):
File “training_navigation.py”, line 226, in
File “training_navigation.py”, line 195, in main
done_window.append(tasks_finished / env.get_num_agents())
ZeroDivisionError: division by zero

This bug should be fixed. Just checkout latest version from master trunk.

adrian_egli has not provided any information yet.


Create Notebook