Policies¶
Base Classes¶
All agents have a Policy, which nominally converts an observation to an action. Agents whose decision-making is completely internal to the environment have an InternalPolicy; those whose decision-making occurs at least partially externally have an ExternalPolicy.
class gym_collision_avoidance.envs.policies.Policy.Policy(str='NoPolicy')¶

Each Agent has one of these, which nominally converts an observation to an action.

Parameters:
- is_still_learning (bool) – whether this policy is still being learned (i.e., its weights change during execution)
- is_external (bool) – whether the Policy computes its own actions or relies on an external process to provide an action
near_goal_smoother(dist_to_goal, pref_speed, heading, raw_action)¶

Linearly ramp down speed/turning as the agent nears its goal; stop once close enough. This may exist only for convenience and may be unused in this repo, though it was used on the Jackal hardware.
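A minimal sketch of the kind of ramp this method describes; the thresholds (near_goal_dist, stop_dist) and the exact scaling are illustrative assumptions, not the repo's actual constants:

    import numpy as np

    def near_goal_smoother(dist_to_goal, pref_speed, heading, raw_action,
                           near_goal_dist=0.5, stop_dist=0.1):
        # Hypothetical smoother: thresholds and scaling are assumptions.
        speed, heading_delta = raw_action
        if dist_to_goal < stop_dist:
            # Close enough to the goal: stop entirely.
            return np.array([0.0, 0.0])
        if dist_to_goal < near_goal_dist:
            # Linearly scale speed and turning down to zero with remaining distance.
            scale = dist_to_goal / near_goal_dist
            return np.array([min(speed, scale * pref_speed), scale * heading_delta])
        return np.array([speed, heading_delta])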
class gym_collision_avoidance.envs.policies.InternalPolicy.InternalPolicy(str='Internal')¶

Converts an observation to an action completely within the environment (for model-based or pre-trained, simulated agents).

Please see the possible subclasses at Internal Policies.
class gym_collision_avoidance.envs.policies.ExternalPolicy.ExternalPolicy(str='External')¶

Please see the possible subclasses at External Policies.

external_action_to_action(agent, external_action)¶

Dummy method, to be re-implemented by subclasses.

find_next_action(obs, agents, i)¶

External policies don't compute a commanded action [heading delta, speed] themselves.

Returns: None
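As a rough sketch of the subclassing pattern, a hypothetical external policy might convert a raw command from an outside process into this env's [speed, heading delta] format (the class name and command format below are assumptions for illustration):

    import numpy as np
    from gym_collision_avoidance.envs.policies.ExternalPolicy import ExternalPolicy

    class MyExternalPolicy(ExternalPolicy):
        def __init__(self):
            super().__init__(str="MyExternal")

        def external_action_to_action(self, agent, external_action):
            # Suppose the external process sends [speed_fraction, heading_delta];
            # scale the speed fraction by the agent's preferred speed.
            speed = np.clip(external_action[0], 0.0, 1.0) * agent.pref_speed
            return np.array([speed, external_action[1]])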
Internal Policies¶
Simple Policies¶
StaticPolicy¶
class gym_collision_avoidance.envs.policies.StaticPolicy.StaticPolicy¶

For an agent that never moves; useful for confirming that algorithms can avoid static objects too.

find_next_action(obs, agents, i)¶

Static agents do not move, so just set the goal to the current position and the action to zero.

Parameters:
- obs (dict) – ignored
- agents (list) – of Agent objects
- i (int) – this agent's index in that list

Returns: np array of shape (2,), [spd, delta_heading]; both are zero.
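A sketch of this trivial behavior, assuming Agent attributes named pos_global_frame and goal_global_frame (names assumed for illustration):

    import numpy as np

    def find_next_action(self, obs, agents, i):
        # Pin the goal to the current position so the agent never "wants" to move...
        agents[i].goal_global_frame = agents[i].pos_global_frame
        # ...and command zero speed and zero heading change.
        return np.array([0.0, 0.0])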
NonCooperativePolicy¶
class gym_collision_avoidance.envs.policies.NonCooperativePolicy.NonCooperativePolicy¶

Non-cooperative agents simply drive at their preferred speed toward their goal, ignoring other agents.

find_next_action(obs, agents, i)¶

Go at pref_speed, applying a change in heading that zeros out the current ego heading error (i.e., steers straight toward the goal).

Parameters:
- obs (dict) – ignored
- agents (list) – of Agent objects
- i (int) – this agent's index in that list

Returns: np array of shape (2,), [spd, delta_heading]
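A minimal sketch of this behavior, assuming the agent exposes pref_speed and an ego-frame heading error toward its goal (the attribute name heading_ego_frame is an assumption for illustration):

    import numpy as np

    def find_next_action(self, obs, agents, i):
        # Preferred speed, plus a heading change that cancels the current
        # ego-frame heading error, i.e., point straight at the goal.
        return np.array([agents[i].pref_speed, -agents[i].heading_ego_frame])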
Model-Based Policies¶
Learned Policies¶
CADRLPolicy¶
class gym_collision_avoidance.envs.policies.CADRLPolicy.CADRLPolicy¶

Re-purposed from: Socially Aware Motion Planning with Deep Reinforcement Learning

Loads a pre-trained SA-CADRL 4-agent network (with no LHS/RHS social-norm preference). Provides some methods to convert the gym agent representation to the numpy arrays used in the old code.
convert_host_agent_to_cadrl_state(agent)¶

Convert this repo's state representation format into the legacy CADRL format for the host agent.

Parameters: agent (Agent) – this agent

Returns: 10-element np array describing the current state
convert_other_agents_to_cadrl_state(host_agent, other_agents)¶

Convert this repo's state representation format into the legacy CADRL format for the other agents in the environment. Filtering the other agents' velocities was crucial to replicating SA-CADRL results.

Parameters:
- host_agent (Agent) – this agent
- other_agents (list) – of the other Agent objects

Returns:
- (3 x 10) np array (this CADRL network can handle 3 other agents), one 10-element state vector per agent
- (3 x 2) np array of the other agents' filtered velocities
find_next_action(obs, agents, i)¶

Converts the environment's agent representation to CADRL format, then queries the NN.

Parameters:
- obs (dict) – ignored
- agents (list) – of Agent objects
- i (int) – index of the agents list corresponding to this agent

Returns: commanded [heading delta, speed]
find_next_action_and_value(obs, agents, i)¶

Same as find_next_action, but also queries the value function.
parse_agents(agents, i)¶

Convert from the gym env representation of agents to CADRL's representation.

Parameters:
- agents (list) – of Agent objects
- i (int) – index of the agents list corresponding to this agent

Returns:
- host_agent (Agent) – this agent
- agent_state (np array) – CADRL representation of this agent's state
- other_agents_state (np array) – CADRL representation of the other agents' states
- other_agents_actions (np array) – CADRL representation of the other agents' current actions
query_and_rescale_action(host_agent, agent_state, other_agents_state, other_agents_actions)¶

If there's nobody around, just go straight to the goal; otherwise, query the DNN and make the heading action an offset from the current heading.
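Based on the method descriptions above, the pieces plausibly compose like this (a sketch of the control flow, not the repo's exact code):

    def find_next_action(self, obs, agents, i):
        # Convert the gym representation into legacy CADRL arrays.
        host_agent, agent_state, other_agents_state, other_agents_actions = \
            self.parse_agents(agents, i)
        # Query the network (or head straight to goal if nobody is around),
        # rescaling the heading output to an offset from the current heading.
        return self.query_and_rescale_action(
            host_agent, agent_state, other_agents_state, other_agents_actions)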
GA3CCADRLPolicy¶
class gym_collision_avoidance.envs.policies.GA3CCADRLPolicy.GA3CCADRLPolicy¶

Pre-trained policy from Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning.

By default, loads a pre-trained LSTM network (GA3C-CADRL-10-LSTM from the paper). There are 11 discrete actions with a maximum heading angle change of $\pm \pi/6$.
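One plausible way such a discrete action set could be constructed; the speed fractions and angle spacing here are assumptions for illustration, not necessarily the paper's values:

    import numpy as np

    # 11 candidate actions: combinations of speed fraction and heading change,
    # with heading changes capped at +/- pi/6.
    speeds = [1.0, 0.5]                                  # fractions of pref_speed (assumed)
    heading_deltas = np.linspace(-np.pi/6, np.pi/6, 5)   # 5 heading options (assumed)
    actions = [(s, dh) for s in speeds for dh in heading_deltas]  # 10 actions
    actions.append((0.0, 0.0))                           # plus "stop", for 11 total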
find_next_action(obs, agents, i)¶

Using only the dictionary obs, convert this to the vector needed for the GA3C-CADRL network, query the network, and adjust the actions for this env.

Parameters:
- obs (dict) – this agent's observation
- agents (list) – of Agent objects (unused; only obs is needed)
- i (int) – this agent's index in that list

Returns: [spd, heading change] command
initialize_network(**kwargs)¶

Load the model parameters from a default file or, if provided through kwargs, from a specific path and/or tensorflow checkpoint.

Parameters:
- kwargs['checkpt_name'] (str) – name of checkpoint file to load (without file extension)
- kwargs['checkpt_dir'] (str) – path to the checkpoint
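Hypothetical usage (the directory and checkpoint names are placeholders):

    policy = GA3CCADRLPolicy()
    # Call initialize_network() with no kwargs to load the packaged default,
    # or point at a specific checkpoint:
    policy.initialize_network(checkpt_dir='/path/to/checkpoints',
                              checkpt_name='my_checkpoint')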
DRLLongPolicy¶
class gym_collision_avoidance.envs.policies.DRLLongPolicy.DRLLongPolicy¶

Wrapper for an implementation of Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning. Based on this open-source implementation.

Note

This policy is not fully working with this version of the code.
find_next_action(obs, agents, i)¶

Normalize the laserscan, grab the goal position, query the NN, and return the action.

TODO
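For illustration only, one common way to normalize a laserscan before feeding it to a DRL policy (not necessarily what this wrapper does; max_range is an assumed sensor limit):

    import numpy as np

    def normalize_scan(scan, max_range=6.0):
        # Replace NaN/inf readings with the max range, clip, and scale to [0, 1].
        scan = np.nan_to_num(np.asarray(scan, dtype=float),
                             nan=max_range, posinf=max_range)
        return np.clip(scan, 0.0, max_range) / max_range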
External Policies¶
Note
TODO
Still Being Trained¶
LearningPolicy¶
class gym_collision_avoidance.envs.policies.LearningPolicy.LearningPolicy¶

An RL policy that is still being trained, or that is otherwise fed actions from an external script, but that still needs to convert those external actions to this env's format.
external_action_to_action(agent, external_action)¶

Convert the external_action into an action for this environment, using properties of the agent.

For instance, an RL network might have continuous outputs in [0, 1], which this method could scale to a speed in [0, pref_speed], without the RL network needing to know the agent's preferred speed.

Parameters:
- agent (Agent) – the agent who has this policy
- external_action (int, array, ..) – what the learning system returned for an action

Returns: [speed, heading_change] command
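A sketch of the scaling described above, assuming external_action is a 2-element array with entries in [0, 1] and an assumed heading-change range of +/- pi/6:

    import numpy as np

    def external_action_to_action(self, agent, external_action):
        # Scale the [0, 1] speed output by the agent's preferred speed.
        speed = external_action[0] * agent.pref_speed
        # Map the [0, 1] heading output onto [-pi/6, +pi/6] (assumed range).
        heading_change = (2.0 * external_action[1] - 1.0) * np.pi / 6.0
        return np.array([speed, heading_change])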
LearningPolicyGA3C¶
class gym_collision_avoidance.envs.policies.LearningPolicyGA3C.LearningPolicyGA3C¶

The GA3C-CADRL policy while it is still being trained (an external process provides a discrete action input).

external_action_to_action(agent, external_action)¶

Convert the discrete external_action into an action for this environment, using properties of the agent.

Parameters:
- agent (Agent) – the agent who has this policy
- external_action (int) – discrete action index (one of the 11 discrete actions, i.e., 0-10) directly from the network output

Returns: [speed, heading_change] command
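A sketch of the index-to-command mapping, assuming a precomputed table of (speed fraction, heading change) pairs like the one sketched under GA3CCADRLPolicy above (self.actions is an assumed attribute name):

    import numpy as np

    def external_action_to_action(self, agent, external_action):
        # Look up the discrete action and scale its speed fraction by the
        # agent's preferred speed.
        speed_frac, heading_change = self.actions[external_action]
        return np.array([speed_frac * agent.pref_speed, heading_change])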
Pre-trained, but still external¶
CARRLPolicy¶
class gym_collision_avoidance.envs.policies.CARRLPolicy.CARRLPolicy¶

Wrapper for the policy from Certified Adversarial Robustness for Deep Reinforcement Learning.

Note

None of the interesting aspects of the policy are implemented here, as that software is currently under IP protection.
convert_to_action(discrete_action)¶

The external CARRL code provides the index of the desired action (without needing to know what that index means in this environment), so this method converts that index to an environment-specific action.

Parameters: discrete_action (int) – index corresponding to the desired element of self.actions

Returns: [speed, heading delta] corresponding to the provided index
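Hypothetical usage, mirroring the lookup pattern sketched for LearningPolicyGA3C above (the constructor call is an assumption):

    policy = CARRLPolicy()
    # The external CARRL code picked index 3; convert it to this env's
    # [speed, heading delta] command.
    action = policy.convert_to_action(3)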