GridWorld

Overview

class GridWorld(origin_shape=None)

Base class for gridworld environments.

A gridworld environment consists of a world, objects, and one agent. The world consists of multiple connected rectangular areas, each represented by a two-dimensional grid graph in which every node is connected to its four nearest neighbors. Each node represents a state in the world and has an altitude attribute. When the agent moves from one state to another, it receives a reward equal to the altitude change (if any), i.e.

\[R_{move} = A_s - A_{s + 1}\]

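where \(R_{move}\) is the movement reward, and \(A_s\) and \(A_{s + 1}\) are the altitudes of the current state \(s\) and the next state \(s + 1\). Moving downhill therefore yields a positive reward, and moving uphill a negative one.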
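The formula can be checked by hand with a small standalone sketch. It does not use the GridWorld class; the function name `movement_reward` and the nested-list altitude matrix are illustrative assumptions, not part of the documented API.

```python
def movement_reward(altitude, state, next_state):
    """Return A_s - A_{s+1} for a move inside a single area.

    `altitude` is a nested list indexed as altitude[row][col];
    `state` and `next_state` are (row, col) pairs.
    """
    a_from = altitude[state[0]][state[1]]
    a_to = altitude[next_state[0]][next_state[1]]
    return a_from - a_to


# Hypothetical altitudes for one 2 x 2 area.
altitude = [[0.0, 1.0],
            [2.0, 3.0]]

uphill = movement_reward(altitude, (0, 0), (0, 1))    # 0.0 - 1.0 = -1.0
downhill = movement_reward(altitude, (1, 1), (0, 1))  # 3.0 - 1.0 = 2.0
stay = movement_reward(altitude, (1, 0), (1, 0))      # 2.0 - 2.0 = 0.0
```

Staying in place, or moving between states of equal altitude, gives a movement reward of zero.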

At each position (state), the agent chooses one of five actions: move UP, DOWN, LEFT, or RIGHT, or STAY in the same state. If a move would take the agent out of the world, the agent is forced to stay in its current state.

Objects from which the agent can obtain rewards are placed at different states, and each state can hold at most one object. Each object has its own adjustable probability (prob) of yielding a reward (reward) when the agent reaches its state; if the agent fails to get the reward, it receives a punishment (punish) instead.

\[\begin{split}R_{object} = \left \{ \begin{aligned} & reward, P=p \\ & punish, P=1-p \end{aligned} \right.\end{split}\]
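The two-outcome rule above amounts to a single Bernoulli draw. The sketch below is a standalone illustration using only the standard library; the function `object_reward` and the default `punish=0.0` are assumptions, since the default punishment used by the library is not stated here.

```python
import random


def object_reward(reward, prob, punish=0.0, rng=random):
    """Return `reward` with probability `prob`, otherwise `punish`."""
    # rng.random() is uniform on [0, 1), so the test succeeds with
    # probability `prob`.
    return reward if rng.random() < prob else punish


rng = random.Random(0)  # seeded generator for reproducibility
outcomes = [object_reward(1, 0.3, punish=-10, rng=rng) for _ in range(1000)]
# Every outcome is either the reward (1) or the punishment (-10).
```

With `prob=1.0` the object always pays the reward, and with `prob=0.0` it always gives the punishment.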

In this case, the total reward for the step is the sum of the movement reward and the object reward.

\[R_{total} = R_{move} + R_{object}\]
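As a worked example, the sketch below combines the two terms for a single step and also computes the expected object reward \(p \cdot reward + (1 - p) \cdot punish\). It is a standalone illustration; both function names are hypothetical, and treating the object term as zero when the target state holds no object is an assumption.

```python
def expected_object_reward(reward, prob, punish):
    """Expected value of the object reward: p * reward + (1 - p) * punish."""
    return prob * reward + (1.0 - prob) * punish


def total_reward(altitude_from, altitude_to, object_outcome=0.0):
    """R_total = R_move + R_object, with R_move = A_s - A_{s+1}."""
    return (altitude_from - altitude_to) + object_outcome


# Object with reward=1, prob=0.3, punish=-10, reached by descending
# from altitude 2.0 to altitude 0.5.
expected = expected_object_reward(reward=1, prob=0.3, punish=-10)  # about -6.7
if_rewarded = total_reward(2.0, 0.5, object_outcome=1)             # 1.5 + 1 = 2.5
if_punished = total_reward(2.0, 0.5, object_outcome=-10)           # 1.5 - 10 = -8.5
```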

As soon as the agent reaches an object (regardless of whether it receives the reward or the punishment), the trial ends and the agent is sent back to the trial's start state.

Parameters: origin_shape (tuple of ints, optional, default: None) – Shape of the world origin. If not provided, the origin is initialized as a single state (0, 0, 0); otherwise it is a rectangular area of shape origin_shape.

Examples

Initialize a gridworld environment with only an origin state.

>>> W = GridWorld()

W can be extended in several ways.

Areas:

Add one area of shape (2, 2).

>>> W.add_area((2, 2))

Remove areas.

>>> W.remove_area(1)

Set area altitude.

>>> import numpy as np
>>> W.add_area((3, 3))
>>> W.set_altitude(1, altitude_mat=np.random.randn(3, 3))

Paths:

Add inter-area paths.

>>> W = GridWorld()
>>> W.add_area((2, 2))
>>> W.add_path((0, 0, 0), (1, 0, 0))
>>> W.add_area((2, 2))
>>> W.add_path((1, 1, 1), (2, 1, 0), register_action=(0, 1))

Remove paths.

>>> W.remove_path((1, 1, 1), (2, 1, 0))

Objects:

Add objects.

>>> W = GridWorld()
>>> W.add_area((2, 2))
>>> W.add_object((0, 0, 0), reward=1, prob=0.7)
>>> W.add_object((1, 0, 0), reward=1, prob=0.3, punish=-10)

Remove objects.

>>> W.remove_object((1, 0, 0))

Update object attributes.

>>> W.update_object((0, 0, 0), reward=10)
>>> W.update_object((0, 0, 0), reward=1, prob=0.8)

Agent:

Initialize an agent in the world.

>>> W = GridWorld()
>>> W.init_agent()

The agent's initial state can also be set manually.

>>> W = GridWorld()
>>> W.add_area((2, 2))
>>> W.add_path((0, 0, 0), (1, 0, 0))
>>> W.init_agent(init_coord=(1, 1, 1))

Once the agent is initialized, it can move in the world and collect rewards.

>>> next_state, reward, done = W.step(action=(0, 1))

Reset:

To reset the environment, first set a reset checkpoint.

>>> W.set_reset_checkpoint()

Then the environment can be reset if needed.

>>> W.reset()
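The checkpoint-then-reset pattern can be illustrated with a toy class that snapshots its state using `copy.deepcopy`. This is a sketch of the general idea only: `CheckpointedEnv` is a hypothetical stand-in, not the GridWorld class, and both the deep-copy approach and the error raised when no checkpoint exists are assumptions about behavior that this page does not specify.

```python
import copy


class CheckpointedEnv:
    """Toy environment (not GridWorld) showing checkpoint and reset."""

    def __init__(self):
        self.state = {"agent": (0, 0, 0), "time": 0}
        self._checkpoint = None

    def set_reset_checkpoint(self):
        # Snapshot the current state; deepcopy keeps later mutations
        # from leaking into the saved copy.
        self._checkpoint = copy.deepcopy(self.state)

    def reset(self):
        # Assumed behavior: resetting without a checkpoint is an error.
        if self._checkpoint is None:
            raise RuntimeError("no reset checkpoint has been set")
        self.state = copy.deepcopy(self._checkpoint)


env = CheckpointedEnv()
env.set_reset_checkpoint()
env.state["agent"] = (1, 1, 1)  # the agent moves
env.state["time"] = 5           # time advances
env.reset()                     # back to the checkpointed state
```

Restoring from a copy rather than from the stored object itself means the same checkpoint can be reused for any number of resets.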

Methods

Configuring the environment

`GridWorld.__init__`([origin_shape])	Initialize a gridworld environment.
`GridWorld.add_area`(shape[, name])	Add a new area to the world.
`GridWorld.remove_area`(area)	Remove an area from the world.
`GridWorld.set_area_name`(area, name)	Set an alias name for an area.
`GridWorld.add_path`(coord_from, coord_to[, ...])	Add a new inter-area connection.
`GridWorld.remove_path`(coord_from, coord_to)	Remove one inter-area connection from the world.
`GridWorld.add_object`(coord, reward, prob[, ...])	Add one object to the world.
`GridWorld.remove_object`(coord)	Remove one object from the world.
`GridWorld.update_object`(coord, **attr)	Reset object attributes.
`GridWorld.set_altitude`(area, altitude_mat)	Set the altitude of each state for one area.
`GridWorld.block`(coord)	Block one state.
`GridWorld.unblock`(coord)	Unblock one state.
`GridWorld.init_agent`([init_coord, overwrite])	Initialize an agent in the world.
`GridWorld.set_reset_checkpoint`([overwrite])	Set environment checkpoint for reset.
`GridWorld.reset`()	Reset the environment to the checkpoint state.

Get environment information

`GridWorld.world`	A copy of `world` attribute of the gridworld environment.
`GridWorld.time`	Gridworld environment time.
`GridWorld.num_area`	Number of areas in the `world` of gridworld environment.
`GridWorld.actions`	Action space of the gridworld environment.
`GridWorld.has_reset_checkpoint`	Whether there is a reset checkpoint for the gridworld environment.
`GridWorld.get_area_name`(area_idx)	Get the alias name of an area using area index.
`GridWorld.get_area_index`(area_name)	Get the index of an area with its alias name.
`GridWorld.get_area_shape`(area)	Get the shape of one area.
`GridWorld.get_area_altitude`(area)	Get the altitude of each state in one area.
`GridWorld.get_object_attribute`(coord, attr)	Get the value of object attribute.
`GridWorld.get_agent_state`([when])	Get state of the agent.

Moving the agent

GridWorld.step(action)

Move the agent in the direction given by action.