From 01455321037dbec72d8ad9c6a7cb3d695fe19052 Mon Sep 17 00:00:00 2001
From: Marek Wolan
Date: Sun, 9 Jul 2023 20:23:53 +0100
Subject: [PATCH] Update docs

---
 docs/source/about.rst            | 86 +++++++++++++++-----------------
 docs/source/custom_agent.rst     | 76 ++++++++++++++++++++++++++--
 docs/source/primaite_session.rst |  2 +-
 3 files changed, 111 insertions(+), 53 deletions(-)

diff --git a/docs/source/about.rst b/docs/source/about.rst
index 1f4669fe..a4a92b92 100644
--- a/docs/source/about.rst
+++ b/docs/source/about.rst
@@ -10,11 +10,11 @@ PrimAITE provides the following features:
 
 * A flexible network / system laydown based on the Python networkx framework
 * Nodes and links (edges) host Python classes in order to present attributes and methods (and hence, a more representative model of a platform / system)
-* A ‘green agent’ Information Exchange Requirement (IER) function allows the representation of traffic (protocols and loading) on any / all links. Application of IERs is based on the status of node operating systems and services
-* A ‘green agent’ node Pattern-of-Life (PoL) function allows the representation of core behaviours on nodes (e.g. Hardware state, Software State, Service state, File System state)
+* A 'green agent' Information Exchange Requirement (IER) function allows the representation of traffic (protocols and loading) on any / all links. Application of IERs is based on the status of node operating systems and services
+* A 'green agent' node Pattern-of-Life (PoL) function allows the representation of core behaviours on nodes (e.g. changing the Hardware state, Software state, Service state, or File System state)
 * An Access Control List (ACL) function, mimicking the behaviour of a network firewall, is applied across the model, following standard ACL rule format (e.g. DENY/ALLOW, source IP, destination IP, protocol and port).
   Application of IERs adheres to any ACL restrictions
 * Presents an OpenAI Gym interface to the environment, allowing integration with any OpenAI Gym compliant defensive agents
-* Red agent activity based on ‘red’ IERs and ‘red’ PoL
+* Red agent activity based on 'red' IERs and 'red' PoL
 * Defined reward function for use with RL agents (based on nodes status, and green / red IER success)
 * Fully configurable (network / system laydown, IERs, node PoL, ACL, episode step period, episode max steps) and repeatable to suit the training requirements of agents. Therefore, not bound to a representation of any particular platform, system or technology
 * Full capture of discrete metrics relating to agent training (full system state, agent actions taken, average reward)
@@ -201,7 +201,7 @@ An example observation space is provided below:
    * -
      - ID
      - Hardware State
-     - SoftwareState
+     - Software State
      - File System State
      - Service / Protocol A
      - Service / Protocol B
@@ -250,48 +250,35 @@ An example observation space is provided below:
 
 For the nodes, the following values are represented:
 
-  * ID
-  * Hardware State:
+.. code-block::
 
-    * 1 = ON
-    * 2 = OFF
-    * 3 = RESETTING
-    * 4 = SHUTTING_DOWN
-    * 5 = BOOTING
-
-  * SoftwareState:
-
-    * 1 = GOOD
-    * 2 = PATCHING
-    * 3 = COMPROMISED
-
-  * Service State:
-
-    * 1 = GOOD
-    * 2 = PATCHING
-    * 3 = COMPROMISED
-    * 4 = OVERWHELMED
-
-  * File System State:
-
-    * 1 = GOOD
-    * 2 = CORRUPT
-    * 3 = DESTROYED
-    * 4 = REPAIRING
-    * 5 = RESTORING
+
+    [
+        ID
+        Hardware State (1=ON, 2=OFF, 3=RESETTING, 4=SHUTTING_DOWN, 5=BOOTING)
+        Operating System State (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
+        File System State (0=none, 1=GOOD, 2=CORRUPT, 3=DESTROYED, 4=REPAIRING, 5=RESTORING)
+        Service1/Protocol1 state (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
+        Service2/Protocol2 state (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
+    ]
 
 (Note that each service available in the network is provided as a column, although not all nodes may utilise all services)
 
 For the links, the following statuses are represented:
 
-  * ID
-  * Hardware State = N/A
-  * SoftwareState = N/A
-  * Protocol = loading in bits/s
+.. code-block::
+
+    [
+        ID
+        Hardware State (0=not applicable)
+        Operating System State (0=not applicable)
+        File System State (0=not applicable)
+        Service1/Protocol1 state (Traffic load from this protocol on this link)
+        Service2/Protocol2 state (Traffic load from this protocol on this link)
+    ]
 
 NodeStatus component
 ----------------------
-This is a MultiDiscrete observation space that can be though of as a one-dimensional vector of discrete states, represented by integers.
+This is a MultiDiscrete observation space that can be thought of as a one-dimensional vector of discrete states.
 The example above would have the following structure:
 
 .. code-block::
@@ -307,9 +294,9 @@ Each ``node_info`` contains the following:
 
 .. code-block::
 
     [
-    hardware_state (0=none, 1=ON, 2=OFF, 3=RESETTING, 4=SHUTTING_DOWN, 5=BOOTING) 
+    hardware_state (0=none, 1=ON, 2=OFF, 3=RESETTING, 4=SHUTTING_DOWN, 5=BOOTING)
     software_state (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
-    file_system_state (0=none, 1=GOOD, 2=CORRUPT, 3=DESTROYED, 4=REPAIRING, 5=RESTORING) 
+    file_system_state (0=none, 1=GOOD, 2=CORRUPT, 3=DESTROYED, 4=REPAIRING, 5=RESTORING)
     service1_state (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
     service2_state (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
     ]
@@ -320,10 +307,18 @@ In a network with three nodes and two services, the full observation space would
 
     gym.spaces.MultiDiscrete([4,5,6,4,4,4,5,6,4,4,4,5,6,4,4])
 
+.. note::
+    The NodeStatus observation component provides information only about nodes. Links are not considered.
+
 LinkTrafficLevels
 -----------------
 This component is a MultiDiscrete space showing the traffic flow levels on the links in the network, after applying a threshold to convert it from a continuous to a discrete value.
-The number of bins can be customised with 5 being the default. It has the following strucutre:
+There are two configurable parameters:
+
+* ``quantisation_levels`` determines how many discrete bins are used when converting the continuous traffic value to a discrete one (default is 5).
+* ``combine_service_traffic`` determines whether traffic is reported separately for each network protocol or combined into a single overall value for the link (default is ``True``).
+
+For example, with default parameters and a network with three links, the structure of this component would be:
+
 .. code-block::
 
     [
@@ -337,16 +332,13 @@ Each ``link_status`` is a number from 0-4 representing the network load in relat
 .. code-block::
 
     0 = No traffic (0%)
-    1 = low traffic (<33%)
-    2 = medium traffic (<66%)
-    3 = high traffic (<100%)
+    1 = low traffic (1%-33%)
+    2 = medium traffic (33%-66%)
+    3 = high traffic (66%-99%)
     4 = max traffic/ overwhelmed (100%)
 
-If the network has three links, the full observation space would have 3 elements. It can be written with ``gym`` notation to indicate the number of discrete options for each of the elements of the observation space. For example:
+Using ``gym`` notation, the observation space is ``gym.spaces.MultiDiscrete([5,5,5])``.
 
-.. code-block::
-
-    gym.spaces.MultiDiscrete([5,5,5])
 
 Action Spaces
 **************
diff --git a/docs/source/custom_agent.rst b/docs/source/custom_agent.rst
index ed1d35c7..53594a8f 100644
--- a/docs/source/custom_agent.rst
+++ b/docs/source/custom_agent.rst
@@ -4,12 +4,78 @@
 
 **Integrating a user defined blue agent**
 
-Integrating a blue agent with PrimAITE requires some modification of the code within the main.py file. The main.py file
-consists of a number of functions, each of which will invoke training for a particular agent. These are:
+PrimAITE integrates with Ray RLlib and Stable Baselines3 agents. All agents interface with PrimAITE through an :py:class:`primaite.agents.agent.AgentSessionABC`, which handles the saving and loading of agent files, as well as capturing and plotting performance metrics during training. If you wish to integrate a custom blue agent, it is recommended to create a subclass of :py:class:`primaite.agents.agent.AgentSessionABC` and implement the ``__init__()``, ``_setup()``, ``_save_checkpoint()``, ``learn()``, ``evaluate()``, ``_get_latest_checkpoint()``, ``load()``, ``save()``, and ``export()`` methods. You will also need to modify the :py:class:`primaite.primaite_session.PrimaiteSession` class so that it recognises your new agent identifier.
+
+Below is a barebones example of a custom agent implementation:
+
+.. code:: python
+
+    from primaite.agents.agent import AgentSessionABC
+    from primaite.common.enums import AgentFramework, AgentIdentifier
+    from primaite.environment.primaite_env import Primaite
+
+    class CustomAgent(AgentSessionABC):
+        def __init__(self, training_config_path, lay_down_config_path):
+            super().__init__(training_config_path, lay_down_config_path)
+            assert self._training_config.agent_framework == AgentFramework.CUSTOM
+            assert self._training_config.agent_identifier == AgentIdentifier.MY_AGENT
+            self._setup()
+
+        def _setup(self):
+            super()._setup()
+            self._env = Primaite(
+                training_config_path=self._training_config_path,
+                lay_down_config_path=self._lay_down_config_path,
+                session_path=self.session_path,
+                timestamp_str=self.timestamp_str,
+            )
+            self._agent = ...  # your code to set up the agent
+
+        def _save_checkpoint(self):
+            checkpoint_num = self._training_config.checkpoint_every_n_episodes
+            episode_count = self._env.episode_count
+            save_checkpoint = False
+            if checkpoint_num:
+                save_checkpoint = episode_count % checkpoint_num == 0
+            # save a checkpoint only when the episode count is non-zero and falls on the configured interval
+            if episode_count and save_checkpoint:
+                ...
+                # your code to save a checkpoint goes here.
+                # The path should start with self.checkpoints_path and include the episode number.
+
+        def learn(self):
+            ...
+            # call your agent's learning function here.
+
+            super().learn()  # this will finalise learning and output session metadata
+            self.save()
+
+        def evaluate(self):
+            ...
+            # call your agent's evaluation function here.
+
+            self._env.close()
+            super().evaluate()
+
+        def _get_latest_checkpoint(self):
+            ...
+            # load the agent from the most recent file in self.checkpoints_path.
+
+        @classmethod
+        def load(cls, path):
+            ...
+            # load a saved agent session from the given path.
+
+        def save(self):
+            ...
+            # call your agent's function that saves it to a file.
+
+        def export(self):
+            ...
+            # call your agent's function that exports it to a transportable file format.
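The cadence guard inside ``_save_checkpoint()`` above can be exercised on its own. The sketch below isolates that logic; ``should_save_checkpoint`` is a hypothetical helper name, not part of PrimAITE:

```python
def should_save_checkpoint(episode_count, checkpoint_every_n_episodes):
    """Mirror the guard in _save_checkpoint: save only on a configured, non-zero cadence."""
    if not checkpoint_every_n_episodes:
        # a cadence of None or 0 disables checkpointing entirely
        return False
    # episode 0 never triggers a save; afterwards, every n-th episode does
    return episode_count != 0 and episode_count % checkpoint_every_n_episodes == 0

# With checkpoint_every_n_episodes = 10, the first 25 episodes save twice:
print([e for e in range(25) if should_save_checkpoint(e, 10)])  # [10, 20]
```

Keeping the guard separate from the file-writing code makes the cadence easy to unit test before wiring it into a session.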
+
-* Generic (run_generic)
-* Stable Baselines 3 PPO (:func:`~primaite.main.run_stable_baselines3_ppo)
-* Stable Baselines 3 A2C (:func:`~primaite.main.run_stable_baselines3_a2c)
 
 The selection of which agent type to use is made via the training config file. In order to train a user generated
 agent, the run_generic function should be selected, and should be modified (typically) to be:
 
diff --git a/docs/source/primaite_session.rst b/docs/source/primaite_session.rst
index a59b2361..1b48494a 100644
--- a/docs/source/primaite_session.rst
+++ b/docs/source/primaite_session.rst
@@ -78,9 +78,9 @@ PrimAITE automatically creates two sets of results from each session:
 
     * Timestamp
     * Episode number
     * Step number
-    * Initial observation space (what the blue agent observed when it decided its action)
     * Reward value
     * Action taken (as presented by the blue agent on this step). Individual elements of the action space are presented in the format AS_X
+    * Initial observation space (what the blue agent observed when it decided its action)
 
 **Diagrams**
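The per-step session results described above are written as CSV rows with one record per step. A minimal sketch of reading such a file with Python's standard library follows; the header names and the sample row are invented for illustration and may not match PrimAITE's exact output format:

```python
import csv
import io

# Invented sample mirroring the columns listed above (real PrimAITE output may differ).
sample = io.StringIO(
    "Timestamp,Episode,Step,Reward,AS_0,Initial Observation\n"
    "20230709_202353,1,1,-0.5,2,\"[1, 1, 1, 1, 1]\"\n"
)

rows = list(csv.DictReader(sample))
# csv.DictReader maps each column header to that step's recorded value (as a string).
print(rows[0]["Reward"], rows[0]["Step"])  # -0.5 1
```

Reading the file this way keeps each step's action elements (the AS_X columns) addressable by name rather than by position.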