Policies¶
Base Classes¶
All agents have a Policy, which nominally converts an observation to an action. Agents whose decision-making is completely internal to the environment have an InternalPolicy; those whose decision-making occurs at least partially externally have an ExternalPolicy.
class gym_collision_avoidance.envs.policies.Policy.Policy(str='NoPolicy')¶

Each Agent has one of these, which nominally converts an observation to an action.

Parameters:
- is_still_learning (bool) – whether this policy is still being learned (i.e., its weights change during execution)
- is_external (bool) – whether the Policy computes its own actions or relies on an external process to provide an action
near_goal_smoother(dist_to_goal, pref_speed, heading, raw_action)¶

Linearly ramp down speed/turning as the agent nears its goal; stop once close enough. This may exist only for convenience and may be unused in this repo, though it was used on the Jackal hardware.
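A minimal sketch of the kind of ramp this method describes; the thresholds (near_goal_dist, stop_dist) and the exact scaling are illustrative assumptions, not the repo's actual constants:

    import numpy as np

    def near_goal_smoother(dist_to_goal, pref_speed, heading, raw_action,
                           near_goal_dist=0.5, stop_dist=0.1):
        # Hypothetical smoother: thresholds and scaling are assumptions.
        speed, heading_delta = raw_action
        if dist_to_goal < stop_dist:
            # Close enough to the goal: stop entirely.
            return np.array([0.0, 0.0])
        if dist_to_goal < near_goal_dist:
            # Linearly scale speed and turning down to zero with remaining distance.
            scale = dist_to_goal / near_goal_dist
            return np.array([min(speed, scale * pref_speed), scale * heading_delta])
        return np.array([speed, heading_delta])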
class gym_collision_avoidance.envs.policies.InternalPolicy.InternalPolicy(str='Internal')¶

Converts an observation to an action completely within the environment (for model-based or pre-trained, simulated agents).

Please see the possible subclasses at Internal Policies.
class gym_collision_avoidance.envs.policies.ExternalPolicy.ExternalPolicy(str='External')¶

Please see the possible subclasses at External Policies.

external_action_to_action(agent, external_action)¶

Dummy method, to be re-implemented by subclasses.

find_next_action(obs, agents, i)¶

External policies don't compute a commanded action [heading delta, speed] themselves.

Returns: None
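As a rough sketch of the subclassing pattern, a hypothetical external policy might convert a raw command from an outside process into this env's [speed, heading delta] format (the class name and command format below are assumptions for illustration):

    import numpy as np
    from gym_collision_avoidance.envs.policies.ExternalPolicy import ExternalPolicy

    class MyExternalPolicy(ExternalPolicy):
        def __init__(self):
            super().__init__(str="MyExternal")

        def external_action_to_action(self, agent, external_action):
            # Suppose the external process sends [speed_fraction, heading_delta];
            # scale the speed fraction by the agent's preferred speed.
            speed = np.clip(external_action[0], 0.0, 1.0) * agent.pref_speed
            return np.array([speed, external_action[1]])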
Internal Policies¶
Simple Policies¶
StaticPolicy¶
class gym_collision_avoidance.envs.policies.StaticPolicy.StaticPolicy¶

For an agent that never moves; useful for confirming that algorithms can avoid static objects too.

find_next_action(obs, agents, i)¶

Static agents do not move, so just set the goal to the current position and the action to zero.

Parameters:
- obs (dict) – ignored
- agents (list) – of Agent objects
- i (int) – this agent's index in that list

Returns: np array of shape (2,), [spd, delta_heading]; both are zero.
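A sketch of this trivial behavior, assuming Agent attributes named pos_global_frame and goal_global_frame (names assumed for illustration):

    import numpy as np

    def find_next_action(self, obs, agents, i):
        # Pin the goal to the current position so the agent never "wants" to move...
        agents[i].goal_global_frame = agents[i].pos_global_frame
        # ...and command zero speed and zero heading change.
        return np.array([0.0, 0.0])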
NonCooperativePolicy¶
class gym_collision_avoidance.envs.policies.NonCooperativePolicy.NonCooperativePolicy¶

Non-cooperative agents simply drive at their preferred speed toward their goal, ignoring other agents.

find_next_action(obs, agents, i)¶

Go at pref_speed, applying a change in heading that zeros out the current ego heading error (i.e., steers straight toward the goal).

Parameters:
- obs (dict) – ignored
- agents (list) – of Agent objects
- i (int) – this agent's index in that list

Returns: np array of shape (2,), [spd, delta_heading]
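A minimal sketch of this behavior, assuming the agent exposes pref_speed and an ego-frame heading error toward its goal (the attribute name heading_ego_frame is an assumption for illustration):

    import numpy as np

    def find_next_action(self, obs, agents, i):
        # Preferred speed, plus a heading change that cancels the current
        # ego-frame heading error, i.e., point straight at the goal.
        return np.array([agents[i].pref_speed, -agents[i].heading_ego_frame])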
Model-Based Policies¶
Learned Policies¶
CADRLPolicy¶
class gym_collision_avoidance.envs.policies.CADRLPolicy.CADRLPolicy¶

Re-purposed from: Socially Aware Motion Planning with Deep Reinforcement Learning

Loads a pre-trained SA-CADRL 4-agent network (with no LHS/RHS social-norm preference). Provides some methods to convert the gym agent representation to the numpy arrays used in the old code.
convert_host_agent_to_cadrl_state(agent)¶

Convert this repo's state representation format into the legacy CADRL format for the host agent.

Parameters: agent (Agent) – this agent

Returns: 10-element np array describing the current state
convert_other_agents_to_cadrl_state(host_agent, other_agents)¶

Convert this repo's state representation format into the legacy CADRL format for the other agents in the environment. Filtering the other agents' velocities was crucial to replicating SA-CADRL results.

Parameters:
- host_agent (Agent) – this agent
- other_agents (list) – of the other Agent objects

Returns:
- (3 x 10) np array (this CADRL network can handle 3 other agents), one 10-element state vector per agent
- (3 x 2) np array of the other agents' filtered velocities
find_next_action(obs, agents, i)¶

Converts the environment's agent representation to CADRL format, then queries the NN.

Parameters:
- obs (dict) – ignored
- agents (list) – of Agent objects
- i (int) – index of the agents list corresponding to this agent

Returns: commanded [heading delta, speed]
find_next_action_and_value(obs, agents, i)¶

Same as find_next_action, but also queries the value function.
parse_agents(agents, i)¶

Convert from the gym env representation of agents to CADRL's representation.

Parameters:
- agents (list) – of Agent objects
- i (int) – index of the agents list corresponding to this agent

Returns:
- host_agent (Agent) – this agent
- agent_state (np array) – CADRL representation of this agent's state
- other_agents_state (np array) – CADRL representation of the other agents' states
- other_agents_actions (np array) – CADRL representation of the other agents' current actions
query_and_rescale_action(host_agent, agent_state, other_agents_state, other_agents_actions)¶

If there's nobody around, just go straight to the goal; otherwise, query the DNN and make the heading action an offset from the current heading.
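Based on the method descriptions above, the pieces plausibly compose like this (a sketch of the control flow, not the repo's exact code):

    def find_next_action(self, obs, agents, i):
        # Convert the gym representation into legacy CADRL arrays.
        host_agent, agent_state, other_agents_state, other_agents_actions = \
            self.parse_agents(agents, i)
        # Query the network (or head straight to goal if nobody is around),
        # rescaling the heading output to an offset from the current heading.
        return self.query_and_rescale_action(
            host_agent, agent_state, other_agents_state, other_agents_actions)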
GA3CCADRLPolicy¶
class gym_collision_avoidance.envs.policies.GA3CCADRLPolicy.GA3CCADRLPolicy¶

Pre-trained policy from Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning.

By default, loads a pre-trained LSTM network (GA3C-CADRL-10-LSTM from the paper). There are 11 discrete actions with a maximum heading angle change of $\pm \pi/6$.
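One plausible way such a discrete action set could be constructed; the speed fractions and angle spacing here are assumptions for illustration, not necessarily the paper's values:

    import numpy as np

    # 11 candidate actions: combinations of speed fraction and heading change,
    # with heading changes capped at +/- pi/6.
    speeds = [1.0, 0.5]                                  # fractions of pref_speed (assumed)
    heading_deltas = np.linspace(-np.pi/6, np.pi/6, 5)   # 5 heading options (assumed)
    actions = [(s, dh) for s in speeds for dh in heading_deltas]  # 10 actions
    actions.append((0.0, 0.0))                           # plus "stop", for 11 total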
find_next_action(obs, agents, i)¶

Using only the dictionary obs, convert this to the vector needed for the GA3C-CADRL network, query the network, and adjust the actions for this env.

Parameters:
- obs (dict) – this agent's observation
- agents (list) – of Agent objects (unused; only obs is needed)
- i (int) – this agent's index in that list

Returns: [spd, heading change] command
initialize_network(**kwargs)¶

Load the model parameters from a default file or, if provided through kwargs, from a specific path and/or tensorflow checkpoint.

Parameters:
- kwargs['checkpt_name'] (str) – name of checkpoint file to load (without file extension)
- kwargs['checkpt_dir'] (str) – path to the checkpoint
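Hypothetical usage (the directory and checkpoint names are placeholders):

    policy = GA3CCADRLPolicy()
    # Call initialize_network() with no kwargs to load the packaged default,
    # or point at a specific checkpoint:
    policy.initialize_network(checkpt_dir='/path/to/checkpoints',
                              checkpt_name='my_checkpoint')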
DRLLongPolicy¶
class gym_collision_avoidance.envs.policies.DRLLongPolicy.DRLLongPolicy¶

Wrapper for an implementation of Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning. Based on this open-source implementation.

Note

This policy is not fully working with this version of the code.
find_next_action(obs, agents, i)¶

Normalize the laserscan, grab the goal position, query the NN, and return the action.

TODO
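For illustration only, one common way to normalize a laserscan before feeding it to a DRL policy (not necessarily what this wrapper does; max_range is an assumed sensor limit):

    import numpy as np

    def normalize_scan(scan, max_range=6.0):
        # Replace NaN/inf readings with the max range, clip, and scale to [0, 1].
        scan = np.nan_to_num(np.asarray(scan, dtype=float),
                             nan=max_range, posinf=max_range)
        return np.clip(scan, 0.0, max_range) / max_range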
External Policies¶
Note
TODO
Still Being Trained¶
LearningPolicy¶
class gym_collision_avoidance.envs.policies.LearningPolicy.LearningPolicy¶

An RL policy that is still being trained, or that is otherwise fed actions from an external script, but that still needs to convert those external actions to this env's format.
external_action_to_action(agent, external_action)¶

Convert the external_action into an action for this environment, using properties of the agent.

For instance, an RL network might have continuous outputs in [0, 1], which this method could scale to a speed in [0, pref_speed], without the RL network needing to know the agent's preferred speed.

Parameters:
- agent (Agent) – the agent who has this policy
- external_action (int, array, ..) – what the learning system returned for an action

Returns: [speed, heading_change] command
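A sketch of the scaling described above, assuming external_action is a 2-element array with entries in [0, 1] and an assumed heading-change range of +/- pi/6:

    import numpy as np

    def external_action_to_action(self, agent, external_action):
        # Scale the [0, 1] speed output by the agent's preferred speed.
        speed = external_action[0] * agent.pref_speed
        # Map the [0, 1] heading output onto [-pi/6, +pi/6] (assumed range).
        heading_change = (2.0 * external_action[1] - 1.0) * np.pi / 6.0
        return np.array([speed, heading_change])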
LearningPolicyGA3C¶
class gym_collision_avoidance.envs.policies.LearningPolicyGA3C.LearningPolicyGA3C¶

The GA3C-CADRL policy while it is still being trained (an external process provides a discrete action input).

external_action_to_action(agent, external_action)¶

Convert the discrete external_action into an action for this environment, using properties of the agent.

Parameters:
- agent (Agent) – the agent who has this policy
- external_action (int) – discrete action index (one of the 11 discrete actions, i.e., 0-10) directly from the network output

Returns: [speed, heading_change] command
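A sketch of the index-to-command mapping, assuming a precomputed table of (speed fraction, heading change) pairs like the one sketched under GA3CCADRLPolicy above (self.actions is an assumed attribute name):

    import numpy as np

    def external_action_to_action(self, agent, external_action):
        # Look up the discrete action and scale its speed fraction by the
        # agent's preferred speed.
        speed_frac, heading_change = self.actions[external_action]
        return np.array([speed_frac * agent.pref_speed, heading_change])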
Pre-trained, but still external¶
CARRLPolicy¶
class gym_collision_avoidance.envs.policies.CARRLPolicy.CARRLPolicy¶

Wrapper for the policy from Certified Adversarial Robustness for Deep Reinforcement Learning.

Note

None of the interesting aspects of the policy are implemented here, as that software is currently under IP protection.
convert_to_action(discrete_action)¶

The external CARRL code provides the index of the desired action (without needing to know what that index means in this environment), so this method converts that index to an environment-specific action.

Parameters: discrete_action (int) – index corresponding to the desired element of self.actions

Returns: [speed, heading delta] corresponding to the provided index
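Hypothetical usage, mirroring the lookup pattern sketched for LearningPolicyGA3C above (the constructor call is an assumption):

    policy = CARRLPolicy()
    # The external CARRL code picked index 3; convert it to this env's
    # [speed, heading delta] command.
    action = policy.convert_to_action(3)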