Round 1: Completed Weight: 1.0

## 🕵️ Introduction

Consider the problem of a taxi driver who serves six cities (0, 1, 2, 3, 4 and 5) located on a circular highway. In each city, the driver can choose one of the following actions.

1. Cruise the streets looking for a passenger.
2. Go to the nearest taxi stand and wait in line.
3. Head back to headquarters (city 0).

Any passenger will only go to a preceding or succeeding city, and the driver never cancels a ride. For a given city and either of the first two actions, there is a probability that the driver gets a passenger and goes to the succeeding or preceding city, and a probability that the driver gets no passenger and stays in the same city. Refer to the table for the transition probabilities $p^k_{ij}$ (from city $i$ to city $j$ under action $k$), which are written as $p^k_c$ where $c = j - i$.
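As a rough illustration of the $c = j - i$ notation, the sketch below expands per-offset probabilities into a full city-to-city matrix. It assumes that offsets wrap around the circular highway (the city preceding city 0 is city 5). The helper name `ring_transition_matrix` and all numeric values are hypothetical placeholders, not the values from the table.

```python
import numpy as np

N_CITIES = 6  # cities 0..5 on the circular highway


def ring_transition_matrix(p_minus, p_stay, p_plus, n=N_CITIES):
    """Return an n x n matrix P with P[i, j] = probability of moving from city i to city j.

    Each argument is a length-n sequence giving, for every city i, the
    probability of offset c = -1, c = 0 and c = +1 respectively.
    Assumption: offsets wrap around, so j = (i + c) mod n.
    """
    P = np.zeros((n, n))
    for i in range(n):
        P[i, (i - 1) % n] += p_minus[i]  # passenger to the preceding city
        P[i, i] += p_stay[i]             # no passenger, stay put
        P[i, (i + 1) % n] += p_plus[i]   # passenger to the succeeding city
    return P


# Placeholder probabilities for one action (NOT the values from the table).
P_cruise = ring_transition_matrix([0.25] * 6, [0.5] * 6, [0.25] * 6)
```

Each row of the resulting matrix should sum to 1, which is a cheap sanity check to run after loading the real parameters.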

The rewards for outcomes c = 1 and c = −1 under actions 1 and 2 are shown in the figure. When the driver doesn’t get a passenger in a city (c = 0), the reward is 0. The reward is also 0 if the driver decides to head back to headquarters.

Suppose 1 − γ is the probability that the taxi will break down before the next trip. The driver’s goal is to maximize the expected total reward collected until the taxi breaks down.
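One way to see why γ then plays the role of a discount factor, assuming breakdowns are independent of the city and of the chosen action: the taxi is still running at trip $t$ with probability $\gamma^t$, so the expected total reward until breakdown equals the usual discounted return. Here $r_t$ (the reward of trip $t$), $T$ (the index of the last trip before breakdown) and $\pi$ (the driver's policy) are notation introduced for this sketch.

$$
\mathbb{E}_\pi\!\left[\sum_{t=0}^{T} r_t\right]
= \sum_{t=0}^{\infty} \Pr(T \ge t)\,\mathbb{E}_\pi[r_t]
= \sum_{t=0}^{\infty} \gamma^{t}\,\mathbb{E}_\pi[r_t]
$$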

Implement the following: (2+1.5 marks)

• Find an optimal policy using policy iteration, starting with a policy that always cruises regardless of the city and a zero value vector. Let γ = 0.9.
• Run policy iteration for discount factors γ ranging from 0 to 0.95 in steps of 0.05 and display the results.
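The two tasks above can be sketched roughly as follows. This is a minimal illustration, not the starter notebook's interface: the function name `policy_iteration`, the array layout (`P` of shape actions × cities × cities, `R` of shape actions × cities) and every number are assumptions, and the probabilities and rewards are placeholders rather than the values in the data files. The sketch evaluates each policy exactly by solving a linear system, so the zero value vector is not used; an iterative evaluation would start from it instead.

```python
import numpy as np


def policy_iteration(P, R, gamma, policy0, max_iter=100):
    """Policy iteration with exact policy evaluation.

    P: (A, S, S) array, P[a, s, s2] = probability of s -> s2 under action a.
    R: (A, S) array, expected immediate reward of action a in state s.
    Returns (policy, v) with 0-based action indices.
    """
    n_states = P.shape[1]
    states = np.arange(n_states)
    policy = np.array(policy0, dtype=int)
    for _ in range(max_iter):
        # Evaluation: solve (I - gamma * P_pi) v = r_pi for the current policy.
        P_pi = P[policy, states]  # (S, S): row s is P[policy[s], s, :]
        r_pi = R[policy, states]  # (S,)
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Improvement: act greedily with respect to v.
        q = R + gamma * (P @ v)   # (A, S)
        new_policy = q.argmax(axis=0)
        # On ties, keep the current action so the loop cannot cycle.
        keep = np.isclose(q[policy, states], q.max(axis=0))
        new_policy[keep] = policy[keep]
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, v


# Placeholder environment (hypothetical numbers, NOT the assignment data).
N = 6
idx = np.arange(N)
P = np.zeros((3, N, N))
R = np.zeros((3, N))
probs = {0: (0.25, 0.5, 0.25), 1: (0.4, 0.2, 0.4)}  # (c=-1, c=0, c=+1)
rewards = {0: (4.0, 4.0), 1: (5.0, 5.0)}            # (c=-1, c=+1)
for a in (0, 1):
    pm, ps, pp = probs[a]
    P[a, idx, (idx - 1) % N] = pm  # assumed wrap-around on the ring
    P[a, idx, idx] = ps
    P[a, idx, (idx + 1) % N] = pp
    R[a] = pm * rewards[a][0] + pp * rewards[a][1]
P[2, :, 0] = 1.0                   # action 3: go to city 0, reward 0

start = np.zeros(N, dtype=int)     # always cruise (action 1 has index 0)
policy, v = policy_iteration(P, R, gamma=0.9, policy0=start)

# Sweep gamma = 0, 0.05, ..., 0.95 and report 1-based actions per city.
results = {}
for g in np.linspace(0.0, 0.95, 20):
    pol, val = policy_iteration(P, R, gamma=g, policy0=start)
    results[round(float(g), 2)] = (pol + 1, val)
    print(f"gamma={g:.2f} policy={pol + 1} values={np.round(val, 2)}")
```

Keeping the current action on ties is a small design choice that guarantees termination when two actions have equal value in some city.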

Answer the following (based on the data given above): (1+0.5 mark)

• How do different values of γ affect policy iteration? Explain your findings.
• Give alternate transition probabilities for action 2 (if any exist) such that the optimal policy includes action 2. Explain your answer.

You will be writing your solutions & making a submission through a notebook. You can follow the instructions in the starter notebook.

## 💾 Dataset

Under the Resources section you will find data files that contain the environment parameters for this problem.

## 🚀 Submission

• Submissions will be made through a notebook following the instructions in the starter notebook.
• Each team can make 5 successful submissions and 5 failed submissions per day. Once the limit on failed submissions is reached, any further failed submission counts toward the successful-submission limit.
• The submission limit will reset at 5:30 AM IST every day.
• At the end of the challenge, you will have to select 1 submission as the final one. You can select that here.

## 📱 Contact

• RL TAs

#### Notebooks

• Starter notebook, by siddhartha