
NeurIPS 2022: CityLearn Challenge

Baby Steps

Getting Started with the CityLearn Environment


BABY STEPS - Getting Started

Author: Chia E Tungom
Email: bamtungom@protonmail.com

This notebook demonstrates the basic facets of the CityLearn environment. You can play with it to get familiar with the environment. Important aspects of the environment covered here include:

  1. Observation Space (dataset)

  2. Action Space (discrete or continuous)

  3. Model (Policy)

  4. Action (steps)

  5. Evaluation (reward)

We use general purpose functions common to most RL environments for illustration.

Note: To run this notebook, place it in the root directory of your CityLearn Phase 1 repository (the same directory as requirements.txt).

Let's Goooooo!!!

In [3]:
import numpy as np
import time

"""
Please do not make changes to this file. 
This is only a reference script provided to allow you 
to do local evaluation. The evaluator **DOES NOT** 
use this script for orchestrating the evaluations. 
"""

# to avoid crashes but might cause results to be different 
# https://github.com/dmlc/xgboost/issues/1715

# import os
# os.environ['KMP_DUPLICATE_LIB_OK']='True'


from agents.orderenforcingwrapper import OrderEnforcingAgent
from citylearn.citylearn import CityLearnEnv
In [4]:
# Custom environment configuration
class Constants:
    episodes = 3
    schema_path = './data/citylearn_challenge_2022_phase_1/schema.json'

def action_space_to_dict(aspace):
    """ Only for box space """
    return { "high": aspace.high,
             "low": aspace.low,
             "shape": aspace.shape,
             "dtype": str(aspace.dtype)
    }

def env_reset(env):
    observations = env.reset()
    action_space = env.action_space
    observation_space = env.observation_space
    building_info = env.get_building_information()
    building_info = list(building_info.values())
    action_space_dicts = [action_space_to_dict(asp) for asp in action_space]
    observation_space_dicts = [action_space_to_dict(osp) for osp in observation_space]
    obs_dict = {"action_space": action_space_dicts,
                "observation_space": observation_space_dicts,
                "building_info": building_info,
                "observation": observations }
    return obs_dict

1. Define Environment

The first thing we need to do is create a CityLearn environment. The environment is defined using a JSON schema and a dataset, both of which can be found in the data directory.

In [5]:
# Understand CityLearn Environment

env = CityLearnEnv(schema=Constants.schema_path)

2. OBSERVATION SPACE

The observation space is the data of the environment: it is what the agent sees in order to decide which action to take.

In our environment, the observation space has 5 entries, corresponding to the number of buildings. Each building has its own observation, which is a 28-dimensional 1D array representing an observation at one point in time. A full observation of the environment is therefore a 5x28 array.

  1. Use env.observation_space to explore the entire environment
  2. Use env.observation_space[index] to explore the observation space of a particular building (index 0 for building 1)
In [6]:
# There is an observation space for every building
# print(f' OBSERVATION SPACES {env.observation_space}')
# print(f' OBSERVATION SPACE for Building ONE is {env.observation_space[0]}')

# sample an observation for each building
for building in range(5):
    obs = env.observation_space[building].sample()
    print(f' SAMPLE OBSERVATION for Building {building + 1} >>> {len(obs), obs}')

# we can see each observation is a 28-dimensional 1D numpy array, with every dimension bounded by the range given in the space's Box
 SAMPLE OBSERVATION for Building 1 >>> (28, array([ 3.72611475e+00,  5.90002871e+00,  3.38330746e+00,  1.85652428e+01,
        1.10114441e+01,  3.14579144e+01,  2.62133846e+01,  1.56115332e+01,
        7.45751266e+01,  7.29705200e+01,  5.58529778e+01,  2.11469635e+02,
        4.79321472e+02,  8.91451477e+02,  7.35224609e+02,  4.92095062e+02,
        4.25108917e+02,  1.49664703e+02,  2.14000732e+02, -8.54978502e-01,
        1.06144376e-01,  5.88945862e+02, -6.58616662e-01,  7.11129333e+02,
       -7.65464902e-01, -5.47531724e-01,  5.26078463e-01,  5.96680045e-02],
      dtype=float32))
 SAMPLE OBSERVATION for Building 2 >>> (28, array([ 4.0389609e+00,  2.8432424e+00,  9.9948273e+00,  2.4518454e+01,
        3.1930521e+01,  2.2542482e+01,  2.2376108e+01,  1.8229486e+01,
        7.0618614e+01,  1.5685252e+01,  8.3693321e+01,  5.7827277e+02,
        5.3332483e+02,  1.6341650e+01,  1.5506995e+02,  5.1457767e+01,
        3.0998364e+02,  5.2089398e+02,  9.4451117e+02, -5.8035105e-01,
        7.4431238e+00,  1.7595216e+02,  1.6815170e+00,  1.2750994e+02,
        4.0032055e-02, -4.9875233e-01,  5.7025725e-01,  2.1613620e-01],
      dtype=float32))
 SAMPLE OBSERVATION for Building 3 >>> (28, array([ 1.17186365e+01,  3.18270946e+00,  6.14653444e+00,  3.25422440e+01,
        2.62085438e+01,  1.44684963e+01,  7.41451883e+00,  2.83720951e+01,
        2.80027790e+01,  2.28040695e+01,  1.80689659e+01,  1.69564529e+02,
        5.42111740e+01,  3.38281250e+02,  2.67779053e+02,  2.25956726e+01,
        1.03206261e+02,  8.85538513e+02,  2.78913879e+02,  5.94975911e-02,
        8.67778808e-02,  4.45558319e+02, -2.55736768e-01, -4.52406555e+02,
        2.86849767e-01,  1.04062557e+00, -5.00637412e-01,  8.92925680e-01],
      dtype=float32))
 SAMPLE OBSERVATION for Building 4 >>> (28, array([ 1.4095416e+00,  4.7277827e+00,  2.3003006e+01,  1.3902643e+01,
        8.7876978e+00,  2.8277906e+01,  1.1951243e+01,  8.7856026e+01,
        1.3805855e+01,  5.7982998e+01,  8.7259422e+01,  7.4579242e+02,
        7.5220001e+02,  4.5045831e+02,  3.6321094e+02,  7.0574365e+02,
        2.4187749e+02,  2.1142715e+02,  4.1702780e+02,  7.8608274e-01,
        6.3703117e+00,  5.1346136e+02, -3.7115738e-01, -4.0904335e+01,
       -4.3367735e-01, -6.4988054e-02, -3.2038456e-01,  1.3591734e+00],
      dtype=float32))
 SAMPLE OBSERVATION for Building 5 >>> (28, array([ 7.6544585e+00,  6.4193420e+00,  8.9196646e-01,  2.5086798e+01,
        2.2238373e+01,  2.5978731e+01,  1.6386499e+01,  1.6177193e+01,
        5.1696659e+01,  2.0904518e+01,  5.2468597e+01,  9.8244214e+02,
        6.7750659e+02,  7.9618781e+02,  1.5748044e+02,  4.2071628e+02,
        4.4860941e+02,  6.3910620e+02,  1.4609628e+02, -9.1379322e-03,
        2.3886821e+00,  5.3471820e+02, -9.1612226e-01, -1.3235274e+02,
        8.9882714e-01,  3.9591041e-01, -5.5951768e-01,  8.2208580e-01],
      dtype=float32))
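
The bounds themselves can also be read from each Box directly rather than sampled. The snippet below is a small sketch (not part of the original notebook) that prints the shape and the first few lower and upper bounds for every building:

for building, space in enumerate(env.observation_space):
    print(f' Building {building + 1}: shape={space.shape}, low[:3]={space.low[:3]}, high[:3]={space.high[:3]}')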

3. ACTION SPACE

This shows us the type of actions we can take, along with the dimension and property (discrete or continuous) of each action. In the CityLearn challenge, the actions are continuous and one-dimensional in the range [-1, 1] for each building: 1 means charging and -1 means discharging.

  • Based on our environment, the action space is a list of 5 entries, each corresponding to the action space of one building.
  • One entry is of the form [(-1,1), (1,), float32], which corresponds to [(lower bound, upper bound), (dimension,), datatype].
  • The lower bound is the smallest value an action can take, while the upper bound is the largest.
  • Dimension is the dimensionality of our action, which here is 1 (use action_space[index].sample() to see an action).
  • Datatype is the data type of our action, which here is float32.

The cell below illustrates the action space(s). Play with it to understand the actions.

Calling sample() on a building's action space produces a random action.

Note: You must pick the action space of a given building in order to sample (use an index, e.g. action_space[0]).

In [7]:
# There is an action space for every building
print(f' ACTION SPACES {env.action_space}')
print(f' ACTION SPACE for Building ONE is {env.action_space[0]}')

# sample some actions
for _ in range(5):
    print(f' SAMPLE ACTION for Building ONE >>> {env.action_space[0].sample()}')

# we can observe the actions are continuous in the range [-1,1]
 ACTION SPACES [Box(-1.0, 1.0, (1,), float32), Box(-1.0, 1.0, (1,), float32), Box(-1.0, 1.0, (1,), float32), Box(-1.0, 1.0, (1,), float32), Box(-1.0, 1.0, (1,), float32)]
 ACTION SPACE for Building ONE is Box(-1.0, 1.0, (1,), float32)
 SAMPLE ACTION for Building ONE >>> [0.00192693]
 SAMPLE ACTION for Building ONE >>> [-0.11414329]
 SAMPLE ACTION for Building ONE >>> [0.46518373]
 SAMPLE ACTION for Building ONE >>> [0.9710998]
 SAMPLE ACTION for Building ONE >>> [-0.9744655]
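
Since every building shares the same bounds, a "do nothing" action list can be built directly from the action space. The snippet below is an illustrative sketch (not part of the original notebook):

# one zero action per building, shaped to match that building's action space
no_op_actions = [[0.0] * space.shape[0] for space in env.action_space]
print(f' NO-OP ACTIONS >>> {no_op_actions}')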

4. Define A Model or Agent

The agent is the policy that decides which action to take given an observation. We can use rule-based agents. The CityLearn setting is built for multi-agent systems, but a single agent can also be used.

Here we just show how to load an agent; a minimal rule-based sketch follows the import below.

In [8]:
from citylearn.agents.sac import SAC

# SAC??
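
For comparison, a rule-based agent needs no learning at all. The class below is a hypothetical minimal example (DoNothingAgent is not part of CityLearn or the starter kit); it mirrors the register_reset / compute_action interface used by the local-evaluation script later in this notebook:

class DoNothingAgent:
    """Hypothetical rule-based agent: always returns a zero action for every building."""

    def register_reset(self, obs_dict):
        # obs_dict has the same structure produced by env_reset() above
        self.num_buildings = len(obs_dict["action_space"])
        return self.compute_action(obs_dict["observation"])

    def compute_action(self, observations):
        # ignore the observations and neither charge nor discharge any battery
        return [[0.0] for _ in range(self.num_buildings)]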

5. TAKING AN ACTION

As already explained with the action spaces, $n$ buildings will have $n$ actions, with each action corresponding to one building. Therefore our actions should appear as follows:

  • The action should be a list with one entry per building; each entry is itself a list containing the action to be taken for that building.
  • For example, for a five-building environment we could have:
Actions = [ ([0.0]), ([0.0]), ([0.0]), ([0.0]), ([0.0]) ]

A list of lists is also acceptable:

Actions = [ [0.0], [0.0], [0.0], [0.0], [0.0] ]

We take an action when we want to move the environment one step ahead. We can do this using env.step(action).

When we take an action, env.step returns a tuple containing:

  1. Next state
  2. Reward
  3. Whether the episode has reached a terminal state
  4. Information about the environment
In [10]:
# print(env_reset(env)["action_space"])
# env_reset(env)["observation_space"]
# env.reset()[0]

import random
Actions = [([random.uniform(-1,1)]) for _ in range(5)]
print(f' WE are about to take {Actions} \n')
next_state, reward, terminal, info = env.step(Actions)

print(f' NEXT STATE \n {next_state} \n')
print(f' REWARDS {reward} \n')
print(f' TERMINAL OR NOT >> {terminal} \n')
print(f' INFO {info}')


# obs_dict = env_reset(env)
# agent = OrderEnforcingAgent()
# print(agent.register_reset(obs_dict))
# env.step(agent.register_reset(obs_dict))
 WE are about to take [[0.6012819693791092], [-0.31427934930108825], [0.9886604954903087], [-0.4034571694315179], [0.46058412170506324]] 

 NEXT STATE 
 [[8, 1, 2, 19.7, 21.1, 22.2, 19.4, 78.0, 73.0, 73.0, 87.0, 0.0, 420.0, 683.0, 0.0, 0.0, 592.0, 291.0, 0.0, 0.1545025601953125, 0.8346000000000005, 0.0, 0.8299186694781067, 4.785508273152418, 0.22, 0.22, 0.22, 0.22], [8, 1, 2, 19.7, 21.1, 22.2, 19.4, 78.0, 73.0, 73.0, 87.0, 0.0, 420.0, 683.0, 0.0, 0.0, 592.0, 291.0, 0.0, 0.1545025601953125, 1.1012500000000005, 0.0, 0.0, -0.5955546875415629, 0.22, 0.22, 0.22, 0.22], [8, 1, 2, 19.7, 21.1, 22.2, 19.4, 78.0, 73.0, 73.0, 87.0, 0.0, 420.0, 683.0, 0.0, 0.0, 592.0, 291.0, 0.0, 0.1545025601953125, 1.0083516438802096e-07, 0.0, 0.7501442104720216, 5.207332165081885, 0.22, 0.22, 0.22, 0.22], [8, 1, 2, 19.7, 21.1, 22.2, 19.4, 78.0, 73.0, 73.0, 87.0, 0.0, 420.0, 683.0, 0.0, 0.0, 592.0, 291.0, 0.0, 0.1545025601953125, 0.4758166666666665, 0.0, 0.0, 0.4758166666666665, 0.22, 0.22, 0.22, 0.22], [8, 1, 2, 19.7, 21.1, 22.2, 19.4, 78.0, 73.0, 73.0, 87.0, 0.0, 420.0, 683.0, 0.0, 0.0, 592.0, 291.0, 0.0, 0.1545025601953125, 0.5030500000000001, 0.0, 0.44618232936462265, 3.545934943922044, 0.22, 0.22, 0.22, 0.22]] 

 REWARDS [-1.7921851  -0.         -1.95015923 -0.17819456 -1.32796171] 

 TERMINAL OR NOT >> False 

 INFO {}
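
To move more than one step ahead we simply call env.step in a loop. The cell below is an illustrative sketch (not an original cell): it rolls the environment forward for one simulated day of hourly steps with random actions and accumulates the per-building rewards:

total_rewards = np.zeros(len(env.action_space))
for _ in range(24):  # 24 hourly steps = one simulated day
    actions = [[random.uniform(-1, 1)] for _ in range(len(env.action_space))]
    observations, rewards, done, info = env.step(actions)
    total_rewards += np.array(rewards)
    if done:  # stop early if the episode happens to end
        break
print(f' ACCUMULATED REWARDS PER BUILDING >>> {total_rewards}')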

6. Evaluating Actions

After taking actions we can evaluate the performance of our agent or agents.

Evaluation is done using the final metrics, which are the price cost and the emission cost.

In [11]:
env.evaluate()
Out[11]:
(1.890623421222926, 1.880297274737686)
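
The returned tuple can be unpacked into its two costs. A small sketch, assuming the order used in the local-evaluation script below (price cost first, emission cost second):

price_cost, emission_cost = env.evaluate()
print(f' PRICE COST >>> {price_cost:.4f} | EMISSION COST >>> {emission_cost:.4f}')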

SAMPLE RUN or LOCAL EVALUATION

Some modifications have been made to the original code. For instance:

  • We can run a test for a shorter period, e.g. one month, i.e. $30 \times 24$ steps, to quickly evaluate our agent.

We add the following code in the evaluation section:

# Skipping to shorten training time
    days = 30*5
    training_steps = 24*days
    skipping = False
In [12]:
import numpy as np
import time

"""
Please do not make changes to this file. 
This is only a reference script provided to allow you 
to do local evaluation. The evaluator **DOES NOT** 
use this script for orchestrating the evaluations. 
"""

from agents.orderenforcingwrapper import OrderEnforcingAgent
from citylearn.citylearn import CityLearnEnv

class Constants:
    episodes = 5
    schema_path = './data/citylearn_challenge_2022_phase_1/schema.json'

def action_space_to_dict(aspace):
    """ Only for box space """
    return { "high": aspace.high,
             "low": aspace.low,
             "shape": aspace.shape,
             "dtype": str(aspace.dtype)
    }

def env_reset(env):
    observations = env.reset()
    action_space = env.action_space
    observation_space = env.observation_space
    building_info = env.get_building_information()
    building_info = list(building_info.values())
    action_space_dicts = [action_space_to_dict(asp) for asp in action_space]
    observation_space_dicts = [action_space_to_dict(osp) for osp in observation_space]
    obs_dict = {"action_space": action_space_dicts,
                "observation_space": observation_space_dicts,
                "building_info": building_info,
                "observation": observations }
    return obs_dict


def evaluate():
    print("Starting local evaluation")
    
    env = CityLearnEnv(schema=Constants.schema_path)
    agent = OrderEnforcingAgent()

    obs_dict = env_reset(env)

    agent_time_elapsed = 0

    step_start = time.perf_counter()
    actions = agent.register_reset(obs_dict)
    agent_time_elapsed += time.perf_counter()- step_start

    episodes_completed = 0
    num_steps = 0
    interrupted = False
    episode_metrics = []
    
    # Skipping to shorten training time
    days = 30*5
    training_steps = 24*days
    skipping = False
    
    try:
        while True:
            
            ### This is only a reference script provided to allow you 
            ### to do local evaluation. The evaluator **DOES NOT** 
            ### use this script for orchestrating the evaluations. 

            observations, _, done, _ = env.step(actions)
            if done or skipping:
                episodes_completed += 1
                metrics_t = env.evaluate()
                metrics = {"price_cost": metrics_t[0], "emmision_cost": metrics_t[1]}
                if np.any(np.isnan(metrics_t)):
                    raise ValueError("Episode metrics are nan, please contant organizers")
                episode_metrics.append(metrics)
                print(f"Episode complete: {episodes_completed} | Latest episode metrics: {metrics}", )

                obs_dict = env_reset(env)

                step_start = time.perf_counter()
                actions = agent.register_reset(obs_dict)
                agent_time_elapsed += time.perf_counter()- step_start
            else:
                step_start = time.perf_counter()
                actions = agent.compute_action(observations)
                agent_time_elapsed += time.perf_counter()- step_start
            
            num_steps += 1
            if num_steps % 1000 == 0:
                print(f"Num Steps: {num_steps}, Num episodes: {episodes_completed}")
            
            ### End training after a set number of steps; note that once skipping
            ### is set, every subsequent step ends its episode immediately, so the
            ### remaining episodes are skipped through quickly
            if num_steps % training_steps == 0:
                print(f"Num Steps: {num_steps}, Num episodes: {episodes_completed}")
                if num_steps == training_steps:
                    print(f'ENDING TRAINING AFTER {training_steps} STEPS')
                    skipping = True

            if episodes_completed >= Constants.episodes:
                break
    except KeyboardInterrupt:
        print("========================= Stopping Evaluation =========================")
        interrupted = True
    
    if not interrupted:
        print("=========================Completed=========================")

    if len(episode_metrics) > 0:
        print("Average Price Cost:", np.mean([e['price_cost'] for e in episode_metrics]))
        print("Average Emmision Cost:", np.mean([e['emmision_cost'] for e in episode_metrics]))
    print(f"Total time taken by agent: {agent_time_elapsed}s")

if __name__ == '__main__':
    evaluate()
Starting local evaluation
Num Steps: 1000, Num episodes: 0
Num Steps: 2000, Num episodes: 0

Setting Up the Environment: requirements.txt and yml files

Follow this link: https://stackoverflow.com/questions/48787250/set-up-virtualenv-using-a-requirements-txt-generated-by-conda
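
If you prefer to do this from Python rather than the shell, the snippet below is a minimal sketch (not from the original notebook; the environment name and paths are assumptions) that creates a virtual environment and installs the starter-kit requirements into it:

import subprocess
import venv

# create ./citylearn-env with pip available inside it
venv.EnvBuilder(with_pip=True).create("./citylearn-env")

# install the challenge dependencies into the new environment
# (on Windows the pip executable lives in .\citylearn-env\Scripts\pip.exe)
pip = "./citylearn-env/bin/pip"
subprocess.run([pip, "install", "-r", "requirements.txt"], check=True)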

