From c6ed921643e2f205ffec9f86a3c01cdb19eb5db5 Mon Sep 17 00:00:00 2001
From: Marek Wolan <marek.wolan@methods.co.uk>
Date: Sun, 9 Jul 2023 20:23:53 +0100
Subject: [PATCH 1/9] Update docs

---
 docs/source/about.rst            | 86 +++++++++++++++-----------------
 docs/source/custom_agent.rst     | 76 ++++++++++++++++++++++++++--
 docs/source/primaite_session.rst |  2 +-
 3 files changed, 111 insertions(+), 53 deletions(-)

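The LinkTrafficLevels changes in this patch describe two things: a quantisation step that turns continuous link load into discrete levels, and a ``combine_service_traffic`` switch. Below is a minimal Python sketch of both. It assumes equal-width bins between "no traffic" and "at capacity", which reproduces the 0-4 table for the default of five levels. The helper names ``quantise_link_load`` and ``link_traffic_nvec`` are hypothetical and are not part of the PrimAITE API.

```python
from typing import List


def quantise_link_load(load: float, bandwidth: float, quantisation_levels: int = 5) -> int:
    """Map a link's traffic load to a level in [0, quantisation_levels - 1].

    Level 0 means no traffic, the top level means the link is at (or over) capacity,
    and the levels in between split the range (0%, 100%) into equal-width bins.
    """
    if quantisation_levels < 3:
        raise ValueError("need at least 3 levels: none, partial and max traffic")
    if load <= 0:
        return 0
    if load >= bandwidth:
        return quantisation_levels - 1
    # The (quantisation_levels - 2) middle bins cover 0% < load < 100%.
    return 1 + int(load * (quantisation_levels - 2) // bandwidth)


def link_traffic_nvec(
    num_links: int,
    num_protocols: int,
    quantisation_levels: int = 5,
    combine_service_traffic: bool = True,
) -> List[int]:
    """MultiDiscrete nvec: one entry per link, or one per (link, protocol) pair."""
    entries_per_link = 1 if combine_service_traffic else num_protocols
    return [quantisation_levels] * (num_links * entries_per_link)
```

With the defaults, three links give ``[5, 5, 5]``, which matches the ``gym.spaces.MultiDiscrete([5,5,5])`` example. Setting ``combine_service_traffic=False`` with two protocols would give six entries instead. PrimAITE's own observation code may place the bin boundaries slightly differently.
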
diff --git a/docs/source/about.rst b/docs/source/about.rst
index 1f4669fe..a4a92b92 100644
--- a/docs/source/about.rst
+++ b/docs/source/about.rst
@@ -10,11 +10,11 @@ PrimAITE provides the following features:
 
 * A flexible network / system laydown based on the Python networkx framework
 * Nodes and links (edges) host Python classes in order to present attributes and methods (and hence, a more representative model of a platform / system)
-* A ‘green agent’ Information Exchange Requirement (IER) function allows the representation of traffic (protocols and loading) on any / all links. Application of IERs is based on the status of node operating systems and services
-* A ‘green agent’ node Pattern-of-Life (PoL) function allows the representation of core behaviours on nodes (e.g. Hardware state, Software State, Service state, File System state)
+* A 'green agent' Information Exchange Requirement (IER) function allows the representation of traffic (protocols and loading) on any / all links. Application of IERs is based on the status of node operating systems and services
+* A 'green agent' node Pattern-of-Life (PoL) function allows the representation of core behaviours on nodes (e.g. changing the hardware state, software state, service state, or file system state)
 * An Access Control List (ACL) function, mimicking the behaviour of a network firewall, is applied across the model, following standard ACL rule format (e.g. DENY/ALLOW, source IP, destination IP, protocol and port). Application of IERs adheres to any ACL restrictions
 * Presents an OpenAI Gym interface to the environment, allowing integration with any OpenAI Gym compliant defensive agents
-* Red agent activity based on ‘red’ IERs and ‘red’ PoL
+* Red agent activity based on 'red' IERs and 'red' PoL
 * Defined reward function for use with RL agents (based on nodes status, and green / red IER success)
 * Fully configurable (network / system laydown, IERs, node PoL, ACL, episode step period, episode max steps) and repeatable to suit the training requirements of agents. Therefore, not bound to a representation of any particular platform, system or technology
 * Full capture of discrete metrics relating to agent training (full system state, agent actions taken, average reward)
@@ -201,7 +201,7 @@ An example observation space is provided below:
    * -
      - ID
      - Hardware State
-     - SoftwareState
+     - Software State
      - File System State
      - Service / Protocol A
      - Service / Protocol B
@@ -250,48 +250,35 @@ An example observation space is provided below:
 
 For the nodes, the following values are represented:
 
- * ID
- * Hardware State:
+.. code-block::
 
-    * 1 = ON
-    * 2 = OFF
-    * 3 = RESETTING
-    * 4 = SHUTTING_DOWN
-    * 5 = BOOTING
-
- * SoftwareState:
-
-    * 1 = GOOD
-    * 2 = PATCHING
-    * 3 = COMPROMISED
-
- * Service State:
-
-    * 1 = GOOD
-    * 2 = PATCHING
-    * 3 = COMPROMISED
-    * 4 = OVERWHELMED
-
- * File System State:
-
-    * 1 = GOOD
-    * 2 = CORRUPT
-    * 3 = DESTROYED
-    * 4 = REPAIRING
-    * 5 = RESTORING
+  [
+    ID
+    Hardware State            (1=ON,   2=OFF,  3=RESETTING,  4=SHUTTING_DOWN, 5=BOOTING)
+    Operating System State    (0=none, 1=GOOD, 2=PATCHING,   3=COMPROMISED)
+    File System State         (0=none, 1=GOOD, 2=CORRUPT,    3=DESTROYED,  4=REPAIRING, 5=RESTORING)
+    Service1/Protocol1 state  (0=none, 1=GOOD, 2=PATCHING,   3=COMPROMISED)
+    Service2/Protocol2 state  (0=none, 1=GOOD, 2=PATCHING,   3=COMPROMISED)
+  ]
 
 (Note that each service available in the network is provided as a column, although not all nodes may utilise all services)
 
 For the links, the following statuses are represented:
 
- * ID
- * Hardware State = N/A
- * SoftwareState = N/A
- * Protocol = loading in bits/s
+.. code-block::
+
+  [
+    ID
+    Hardware State            (0=not applicable)
+    Operating System State    (0=not applicable)
+    File System State         (0=not applicable)
+    Service1/Protocol1 state  (Traffic load from this protocol on this link)
+    Service2/Protocol2 state  (Traffic load from this protocol on this link)
+  ]
 
 NodeStatus component
 ----------------------
-This is a MultiDiscrete observation space that can be though of as a one-dimensional vector of discrete states, represented by integers.
+This is a MultiDiscrete observation space that can be thought of as a one-dimensional vector of discrete states.
 The example above would have the following structure:
 
 .. code-block::
@@ -307,9 +294,9 @@ Each ``node_info`` contains the following:
 .. code-block::
 
   [
-    hardware_state    (0=none, 1=ON, 2=OFF, 3=RESETTING, 4=SHUTTING_DOWN, 5=BOOTING)
+    hardware_state    (0=none, 1=ON,   2=OFF,      3=RESETTING, 4=SHUTTING_DOWN, 5=BOOTING)
     software_state    (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
-    file_system_state (0=none, 1=GOOD, 2=CORRUPT, 3=DESTROYED, 4=REPAIRING, 5=RESTORING)
+    file_system_state (0=none, 1=GOOD, 2=CORRUPT,  3=DESTROYED, 4=REPAIRING, 5=RESTORING)
     service1_state    (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
     service2_state    (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
   ]
@@ -320,10 +307,18 @@ In a network with three nodes and two services, the full observation space would
 
   gym.spaces.MultiDiscrete([4,5,6,4,4,4,5,6,4,4,4,5,6,4,4])
 
+.. note::
+  The NodeStatus observation component provides information about nodes only; links are not included.
+
 LinkTrafficLevels
 -----------------
 This component is a MultiDiscrete space showing the traffic flow levels on the links in the network, after applying a threshold to convert it from a continuous to a discrete value.
-The number of bins can be customised with 5 being the default. It has the following strucutre:
+
+* The ``quantisation_levels`` parameter sets how many discrete bins are used to convert the continuous traffic value into a discrete one (default ``5``).
+* The ``combine_service_traffic`` parameter sets whether traffic is reported separately for each network protocol or combined into a single value per link (default ``True``).
+
+For example, with default parameters and a network with three links, the structure of this component would be:
+
 .. code-block::
 
   [
@@ -337,16 +332,13 @@ Each ``link_status`` is a number from 0-4 representing the network load in relat
 .. code-block::
 
   0 = No traffic (0%)
-  1 = low traffic (<33%)
-  2 = medium traffic (<66%)
-  3 = high traffic (<100%)
+  1 = low traffic (1%-33%)
+  2 = medium traffic (33%-66%)
+  3 = high traffic (66%-99%)
   4 = max traffic/ overwhelmed (100%)
 
-If the network has three links, the full observation space would have 3 elements. It can be written with ``gym`` notation to indicate the number of discrete options for each of the elements of the observation space. For example:
+In ``gym`` notation, the observation space for this three-link example is ``gym.spaces.MultiDiscrete([5,5,5])``.
 
-.. code-block::
-
-  gym.spaces.MultiDiscrete([5,5,5])
 
 Action Spaces
 **************
diff --git a/docs/source/custom_agent.rst b/docs/source/custom_agent.rst
index ed1d35c7..53594a8f 100644
--- a/docs/source/custom_agent.rst
+++ b/docs/source/custom_agent.rst
@@ -4,12 +4,78 @@
 
 **Integrating a user defined blue agent**
 
-Integrating a blue agent with PrimAITE requires some modification of the code within the main.py file. The main.py file
-consists of a number of functions, each of which will invoke training for a particular agent. These are:
+PrimAITE has integration with Ray RLLib and StableBaselines3 agents. All agents interface with PrimAITE through an :py:class:`primaite.agents.agent.AgentSessionABC<Agent Session>` which provides Input/Output of agent savefiles, as well as capturing and plotting performance metrics during training. If you wish to integrate a custom blue agent, it is recommended to create a subclass of the :py:class:`primaite.agents.agent.AgentSessionABC` and implement the ``__init__()``, ``_setup()``,  ``_save_checkpoint()``, ``learn()``, ``evaluate()``, ``_get_latest_checkpoint``, ``load()``, ``save()``, and ``export()`` methods. You will also need to modify :py:class:`primaite.primaite_session.PrimaiteSession<PrimaiteSession>` class to capture your new agent identifier.
+
+Below is a barebones example of a custom agent implementation:
+
+.. code:: python
+
+    from primaite.agents.agent import AgentSessionABC
+    from primaite.common.enums import AgentFramework, AgentIdentifier
+
+    class CustomAgent(AgentSessionABC):
+        def __init__(self, training_config_path, lay_down_config_path):
+            super().__init__(training_config_path, lay_down_config_path)
+            assert self._training_config.agent_framework == AgentFramework.CUSTOM
+            assert self._training_config.agent_identifier == AgentIdentifier.CUSTOM_AGENT
+            self._setup()
+
+        def _setup(self):
+            super()._setup()
+            self._env = Primaite(
+                training_config_path=self._training_config_path,
+                lay_down_config_path=self._lay_down_config_path,
+                session_path=self.session_path,
+                timestamp_str=self.timestamp_str,
+            )
+            self._agent = ...  # your code to set up the agent
+
+        def _save_checkpoint(self):
+            checkpoint_num = self._training_config.checkpoint_every_n_episodes
+            episode_count = self._env.episode_count
+            save_checkpoint = False
+            if checkpoint_num:
+                save_checkpoint = episode_count % checkpoint_num == 0
+            # saves checkpoint if the episode count is not 0 and save_checkpoint flag was set to true
+            if episode_count and save_checkpoint:
+                ...
+                # your code to save checkpoint goes here.
+                # The path should start with self.checkpoints_path and include the episode number.
+
+        def learn(self):
+            ...
+            # call your agent's learning function here.
+
+            super().learn()  # this will finalise learning and output session metadata
+            self.save()
+
+        def evaluate(self):
+            ...
+            # call your agent's evaluation function here.
+
+            self._env.close()
+            super().evaluate()
+
+        def _get_latest_checkpoint(self):
+            ...
+            # Load an agent from file.
+
+        @classmethod
+        def load(cls, path):
+            ...
+            #
+
+        def save(self):
+            ...
+            # Call your agent's function that saves it to a file
+
+        def export(self):
+            ...
+            # Call your agent's function that exports it to a transportable file format.
+
+
+
 
-* Generic (run_generic)
-* Stable Baselines 3 PPO (:func:`~primaite.main.run_stable_baselines3_ppo)
-* Stable Baselines 3 A2C (:func:`~primaite.main.run_stable_baselines3_a2c)
 
 The selection of which agent type to use is made via the training config file. In order to train a user generated agent,
 the run_generic function should be selected, and should be modified (typically) to be:
diff --git a/docs/source/primaite_session.rst b/docs/source/primaite_session.rst
index a59b2361..1b48494a 100644
--- a/docs/source/primaite_session.rst
+++ b/docs/source/primaite_session.rst
@@ -78,9 +78,9 @@ PrimAITE automatically creates two sets of results from each session:
     * Timestamp
     * Episode number
     * Step number
-    * Initial observation space (what the blue agent observed when it decided its action)
     * Reward value
     * Action taken (as presented by the blue agent on this step). Individual elements of the action space are presented in the format AS_X
+    * Initial observation space (what the blue agent observed when it decided its action)
 
 **Diagrams**
 

From cc5e31c9b5483ae1a5a427963abeceb59dc295c8 Mon Sep 17 00:00:00 2001
From: Marek Wolan <marek.wolan@methods.co.uk>
Date: Mon, 10 Jul 2023 11:19:47 +0100
Subject: [PATCH 2/9] Changed order of text in custom agent docs

---
 docs/source/custom_agent.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/source/custom_agent.rst b/docs/source/custom_agent.rst
index 53594a8f..74b6a607 100644
--- a/docs/source/custom_agent.rst
+++ b/docs/source/custom_agent.rst
@@ -4,7 +4,7 @@
 
 **Integrating a user defined blue agent**
 
-PrimAITE has integration with Ray RLLib and StableBaselines3 agents. All agents interface with PrimAITE through an :py:class:`primaite.agents.agent.AgentSessionABC<Agent Session>` which provides Input/Output of agent savefiles, as well as capturing and plotting performance metrics during training. If you wish to integrate a custom blue agent, it is recommended to create a subclass of the :py:class:`primaite.agents.agent.AgentSessionABC` and implement the ``__init__()``, ``_setup()``,  ``_save_checkpoint()``, ``learn()``, ``evaluate()``, ``_get_latest_checkpoint``, ``load()``, ``save()``, and ``export()`` methods. You will also need to modify :py:class:`primaite.primaite_session.PrimaiteSession<PrimaiteSession>` class to capture your new agent identifier.
+PrimAITE has integration with Ray RLLib and StableBaselines3 agents. All agents interface with PrimAITE through an :py:class:`primaite.agents.agent.AgentSessionABC<Agent Session>` which provides Input/Output of agent savefiles, as well as capturing and plotting performance metrics during training. If you wish to integrate a custom blue agent, it is recommended to create a subclass of the :py:class:`primaite.agents.agent.AgentSessionABC` and implement the ``__init__()``, ``_setup()``,  ``_save_checkpoint()``, ``learn()``, ``evaluate()``, ``_get_latest_checkpoint``, ``load()``, ``save()``, and ``export()`` methods.
 
 Below is a barebones example of a custom agent implementation:
 
@@ -74,6 +74,9 @@ Below is a barebones example of a custom agent implementation:
             # Call your agent's function that exports it to a transportable file format.
 
 
+You will also need to modify :py:class:`primaite.primaite_session.PrimaiteSession<PrimaiteSession>` class to capture your new agent identifier.
+
+
 
 
 

From 31703c54e25a05e8aa02a18a223f5e73496c1a1d Mon Sep 17 00:00:00 2001
From: Marek Wolan <marek.wolan@methods.co.uk>
Date: Mon, 10 Jul 2023 14:56:06 +0100
Subject: [PATCH 3/9] Finished writing custom agent example.

---
 docs/source/custom_agent.rst | 106 ++++++++++++++++++-----------------
 1 file changed, 55 insertions(+), 51 deletions(-)

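The custom agent example in this patch leaves checkpoint saving and ``_get_latest_checkpoint()`` to the user. Below is one possible sketch of that bookkeeping. It assumes checkpoints are plain files under ``self.checkpoints_path``, named with a hypothetical ``<agent_name>_<episode>.ckpt`` scheme. Only the every-n-episodes rule is taken from the example; the helper names are illustrative.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def should_save_checkpoint(episode_count: int, checkpoint_every_n_episodes: Optional[int]) -> bool:
    """Same rule as the example's _save_checkpoint(): every n-th episode, never episode 0."""
    if not checkpoint_every_n_episodes:  # None or 0 disables checkpointing
        return False
    return episode_count > 0 and episode_count % checkpoint_every_n_episodes == 0


def checkpoint_file(checkpoints_path: Path, agent_name: str, episode_count: int) -> Path:
    """Hypothetical naming scheme: <agent_name>_<episode>.ckpt inside checkpoints_path."""
    return checkpoints_path / f"{agent_name}_{episode_count}.ckpt"


def latest_checkpoint(checkpoints_path: Path, agent_name: str) -> Optional[Path]:
    """Return the checkpoint with the highest episode number, or None if there is none."""
    found: List[Tuple[int, Path]] = []
    for path in checkpoints_path.glob(f"{agent_name}_*.ckpt"):
        episode = path.stem[len(agent_name) + 1:]
        if episode.isdigit():  # skip files that do not follow the naming scheme
            found.append((int(episode), path))
    # Compare numerically so that episode 15 beats episode 5 (a plain string sort would not).
    return max(found)[1] if found else None
```

The latest checkpoint is chosen by numeric episode number rather than by file name. This matters once episode numbers reach two digits.
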
diff --git a/docs/source/custom_agent.rst b/docs/source/custom_agent.rst
index 74b6a607..45d1c5a4 100644
--- a/docs/source/custom_agent.rst
+++ b/docs/source/custom_agent.rst
@@ -2,14 +2,21 @@
 =============
 
 
-**Integrating a user defined blue agent**
+Integrating a user defined blue agent
+*************************************
 
-PrimAITE has integration with Ray RLLib and StableBaselines3 agents. All agents interface with PrimAITE through an :py:class:`primaite.agents.agent.AgentSessionABC<Agent Session>` which provides Input/Output of agent savefiles, as well as capturing and plotting performance metrics during training. If you wish to integrate a custom blue agent, it is recommended to create a subclass of the :py:class:`primaite.agents.agent.AgentSessionABC` and implement the ``__init__()``, ``_setup()``,  ``_save_checkpoint()``, ``learn()``, ``evaluate()``, ``_get_latest_checkpoint``, ``load()``, ``save()``, and ``export()`` methods.
+.. note::
+
+    If you plan to integrate custom RL agents into PrimAITE, you must work from a clone of the project repository. Custom agents are not supported when PrimAITE is installed as a Python package from a wheel.
+
+PrimAITE has integrations with Ray RLlib and StableBaselines3 agents. All agents interface with PrimAITE through an :py:class:`Agent Session <primaite.agents.agent.AgentSessionABC>`, which handles input/output of agent save files and captures and plots performance metrics during training and evaluation. If you wish to integrate a custom blue agent, it is recommended that you subclass :py:class:`primaite.agents.agent.AgentSessionABC` and implement the ``__init__()``, ``_setup()``, ``_save_checkpoint()``, ``learn()``, ``evaluate()``, ``_get_latest_checkpoint()``, ``load()``, and ``save()`` methods.
 
 Below is a barebones example of a custom agent implementation:
 
 .. code:: python
 
+    # src/primaite/agents/my_custom_agent.py
+
     from primaite.agents.agent import AgentSessionABC
     from primaite.common.enums import AgentFramework, AgentIdentifier
 
@@ -63,72 +70,69 @@ Below is a barebones example of a custom agent implementation:
         @classmethod
         def load(cls, path):
             ...
-            #
+            # Create a CustomAgent object which loads model weights from file.
 
         def save(self):
             ...
             # Call your agent's function that saves it to a file
 
-        def export(self):
-            ...
-            # Call your agent's function that exports it to a transportable file format.
 
+You will also need to modify :py:class:`PrimaiteSession <primaite.primaite_session.PrimaiteSession>` and :py:mod:`primaite.common.enums` to register your new agent identifier.
 
-You will also need to modify :py:class:`primaite.primaite_session.PrimaiteSession<PrimaiteSession>` class to capture your new agent identifier.
+.. code-block:: python
+    :emphasize-lines: 17, 18
 
+    # src/primaite/common/enums.py
 
+    class AgentIdentifier(Enum):
+        """The Red Agent algo/class."""
+        A2C = 1
+        "Advantage Actor Critic"
+        PPO = 2
+        "Proximal Policy Optimization"
+        HARDCODED = 3
+        "The Hardcoded agents"
+        DO_NOTHING = 4
+        "The DoNothing agents"
+        RANDOM = 5
+        "The RandomAgent"
+        DUMMY = 6
+        "The DummyAgent"
+        CUSTOM_AGENT = 7
+        "Your custom agent"
 
+.. code-block:: python
+    :emphasize-lines: 3, 11, 12
 
+    # src/primaite/primaite_session.py
 
-The selection of which agent type to use is made via the training config file. In order to train a user generated agent,
-the run_generic function should be selected, and should be modified (typically) to be:
+    from primaite.agents.my_custom_agent import CustomAgent
 
-.. code:: python
+    # ...
 
-    agent = MyAgent(environment, num_steps)
-    for episode in range(0, num_episodes):
-        agent.learn()
-    env.close()
-    save_agent(agent)
+        def setup(self):
+            """Performs the session setup."""
+            if self._training_config.agent_framework == AgentFramework.CUSTOM:
+                _LOGGER.debug(f"PrimaiteSession Setup: Agent Framework = {AgentFramework.CUSTOM}")
+                if self._training_config.agent_identifier == AgentIdentifier.CUSTOM_AGENT:
+                    self._agent_session = CustomAgent(self._training_config_path, self._lay_down_config_path)
+                elif self._training_config.agent_identifier == AgentIdentifier.HARDCODED:
+                    _LOGGER.debug(f"PrimaiteSession Setup: Agent Identifier =" f" {AgentIdentifier.HARDCODED}")
+                    if self._training_config.action_type == ActionType.NODE:
+                        # Deterministic Hardcoded Agent with Node Action Space
+                        self._agent_session = HardCodedNodeAgent(self._training_config_path, self._lay_down_config_path)
 
-Where:
+Finally, specify your agent in your training config.
 
-* *MyAgent* is the user created agent
-* *environment* is the :class:`~primaite.environment.primaite_env.Primaite` environment
-* *num_episodes* is the number of episodes in the session, as defined in the training config file
-* *num_steps* is the number of steps in an episode, as defined in the training config file
-* the *.learn()* function should be defined in the user created agent
-* the *env.close()* function is defined within PrimAITE
-* the *save_agent()* assumes that a *save()* function has been defined in the user created agent. If not, this line can
-  be ommitted (although it is encouraged, since it will allow the agent to be saved and ported)
+.. code-block:: yaml
 
-The code below provides a suggested format for the learn() function within the user created agent.
-It's important to include the *self.environment.reset()* call within the episode loop in order that the
-environment is reset between episodes. Note that the example below should not be considered exhaustive.
+    # ~/primaite/config/path/to/your/config_main.yaml
 
-.. code:: python
+    # Training Config File
 
-    def learn(self) :
+    agent_framework: CUSTOM
+    agent_identifier: CUSTOM_AGENT
+    random_red_agent: False
+    # ...
 
-    # pre-reqs
-
-    # reset the environment
-    self.environment.reset()
-    done = False
-
-    for step in range(max_steps):
-        # calculate the action
-        action = ...
-
-        # execute the environment step
-        new_state, reward, done, info = self.environment.step(action)
-
-        # algorithm updates
-        ...
-
-        # update to our new state
-        state = new_state
-
-        # if done, finish episode
-        if done == True:
-            break
+Now you can :ref:`run a PrimAITE session <run a primaite session>` with your custom agent by passing in your custom ``config_main``.

From 1297d61a7abd8a046bd9249be0b77c8a79bc5a8e Mon Sep 17 00:00:00 2001
From: Marek Wolan <marek.wolan@methods.co.uk>
Date: Tue, 11 Jul 2023 09:56:52 +0100
Subject: [PATCH 4/9] Added glossary

---
 docs/index.rst           |  1 +
 docs/source/glossary.rst | 76 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)
 create mode 100644 docs/source/glossary.rst

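The glossary added in this patch defines the user data directory as ``~/primaite`` on Linux/macOS and ``C:\Users\<username>\primaite`` on Windows. Below is a small sketch of resolving that location with pathlib. It assumes the directory is simply a ``primaite`` folder inside the user's home directory. The function name is hypothetical, and PrimAITE may compute the path differently.

```python
from pathlib import Path, PurePath
from typing import Optional


def primaite_user_data_dir(home: Optional[PurePath] = None) -> PurePath:
    """Return the PrimAITE user data directory: a 'primaite' folder in the home directory.

    ``home`` defaults to Path.home(); it is a parameter only so the result can be
    checked for other platforms with PurePosixPath / PureWindowsPath.
    """
    base = home if home is not None else Path.home()
    return base / "primaite"
```
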
diff --git a/docs/index.rst b/docs/index.rst
index 17dae2c9..4be73154 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -38,6 +38,7 @@ The best place to start is :ref:`about`
    PrimAITE API <source/_autosummary/primaite>
    PrimAITE Tests <source/_autosummary/tests>
    source/dependencies
+   source/glossary
 
 .. toctree::
    :caption: Project Links:
diff --git a/docs/source/glossary.rst b/docs/source/glossary.rst
new file mode 100644
index 00000000..6ebf99f9
--- /dev/null
+++ b/docs/source/glossary.rst
@@ -0,0 +1,76 @@
+Glossary
+=============
+
+.. glossary::
+
+    Network
+        The network in primaite is a logical representation of a computer network containing :term:`Node<Nodes>` and :term:`Link<Links>`.
+
+    Node
+        A Node represents a network endpoint. For example a computer, server, switch, or an actuator.
+
+    Link
+        A Link represents the connection between two Nodes. For example, a physical wire between a computer and a switch or a wireless connection.
+
+    Agent
+        An agent is a representation of a user of the network. Typically this would be a user that is using one of the computer nodes, though it could be an autonomous agent.
+
+    Red Agent
+        An agent that is aiming to attack the network in some way, for example by executing a Denial-Of-Service attack or stealing data.
+
+    Blue Agent
+        A defensive agent that protects the network from Red Agent attacks to minimise disruption to green agents and protect data.
+
+    Green agent
+        Simulates typical benign activity on the network, such as real users using computers and servers.
+
+    Information Exchange Request (IER)
+        ...
+
+    Pattern-of-Life (PoL)
+        ...
+
+    Protocol
+        ...
+
+    Service
+        ...
+
+    Gym
+        ...
+
+    Reward
+        ...
+
+    Access Control List
+        ...
+
+    Observation
+        ...
+
+    Action
+        ...
+
+    StableBaselines3
+        ...
+
+    Ray RLLib
+        ...
+
+    Episode
+        ...
+
+    Step
+        ...
+
+    Reference environment
+        ...
+
+    Transaction
+        ...
+
+    Laydown
+        ...
+
+    User data directory
+        PrimAITE supports upgrading software version while retaining user data. The user data directory is where configs, notebooks, and results are stored, this location is `~/primaite` on linux/darwin and `C:\Users\<username>\primaite` on Windows.

From 35263ee1406f356f118d13f93d7c57cc5b62f056 Mon Sep 17 00:00:00 2001
From: Marek Wolan <marek.wolan@methods.co.uk>
Date: Tue, 11 Jul 2023 11:13:28 +0100
Subject: [PATCH 5/9] Completed glossary

---
 docs/source/glossary.rst | 44 ++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

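Several glossary entries completed in this patch (Episode, Step, Observation, Action, Reward, Transaction) describe parts of a single loop. Below is a toy sketch of that loop. It assumes the classic Gym ``reset()``/``step()`` API, where ``step()`` returns ``(observation, reward, done, info)``. ``ToyEnv`` and ``run_episodes`` are hypothetical stand-ins, not PrimAITE classes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Transaction:
    """One transaction: what the agent observed, the action it took, and the reward."""

    episode: int
    step: int
    observation: Any
    action: int
    reward: float


class ToyEnv:
    """Hypothetical environment using the classic Gym 4-tuple step() API."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self._step = 0

    def reset(self) -> int:
        self._step = 0
        return 0  # initial observation: the step counter

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self._step += 1
        reward = 1.0 if action == 0 else -1.0  # toy reward: action 0 is "good"
        done = self._step >= self.max_steps  # terminal state after max_steps
        return self._step, reward, done, {}


def run_episodes(env: ToyEnv, num_episodes: int, choose_action: Callable[[Any], int]) -> List[Transaction]:
    log: List[Transaction] = []
    for episode in range(1, num_episodes + 1):
        obs = env.reset()  # each episode starts from the initial state
        done, step = False, 0
        while not done:
            step += 1
            action = choose_action(obs)
            new_obs, reward, done, _info = env.step(action)
            # Record the observation the action was chosen from, then move on.
            log.append(Transaction(episode, step, obs, action, reward))
            obs = new_obs
    return log
```

Each transaction stores the observation from which the action was chosen. This corresponds to the "initial observation space" column described in primaite_session.rst. The real transaction files also record a timestamp.
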
diff --git a/docs/source/glossary.rst b/docs/source/glossary.rst
index 6ebf99f9..796b6aa1 100644
--- a/docs/source/glossary.rst
+++ b/docs/source/glossary.rst
@@ -4,7 +4,7 @@ Glossary
 .. glossary::
 
     Network
-        The network in primaite is a logical representation of a computer network containing :term:`Node<Nodes>` and :term:`Link<Links>`.
+        The network in primaite is a logical representation of a computer network containing :term:`Nodes<Node>` and :term:`Links<Link>`.
 
     Node
         A Node represents a network endpoint. For example a computer, server, switch, or an actuator.
@@ -24,53 +24,53 @@ Glossary
     Green agent
         Simulates typical benign activity on the network, such as real users using computers and servers.
 
-    Information Exchange Request (IER)
-        ...
+    Information Exchange Requirement (IER)
+        Simulates network traffic by sending data from one network node to another via links for a specified amount of time. IERs can be part of green agent behaviour or red agent behaviour. PrimAITE can be configured to apply a penalty for green agents' IERs being blocked and a reward for red agents' IERs being blocked.
 
     Pattern-of-Life (PoL)
-        ...
+        PoLs allow agents to change the current hardware, OS, file system, or service statuses of nodes during the course of an episode. For example, a green agent may restart a server node to represent scheduled maintainance. A red agent's Pattern-of-Life can be used to attack nodes by changing their states to CORRUPTED or COMPROMISED.
 
     Protocol
-        ...
+        Protocols are used by links to separate different types of network traffic. Common examples would be HTTP, TCP, and UDP.
 
     Service
-        ...
+        A service represents a piece of software that is installed on a node, such as a web server or a database.
 
     Gym
-        ...
+        PrimAITE uses the Gym reinforcement learning framework API to create a training environment and interface with RL agents. Gym defines a common way of creating observations, actions, and rewards.
 
     Reward
-        ...
+        The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current state of the environment and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.
 
     Access Control List
-        ...
+        PrimAITE blocks or allows certain traffic on the network by simulating firewall rules, which are defined in the Access Control List.
 
     Observation
-        ...
+        An observation is a representation of the current state of the environment that is given to the learning agent so it can decide on which action to perform. If the environment is 'fully observable', the observation contains information about every possible aspect of the environment. More commonly, the environment is 'partially observable' which means the learning agent has to make decisions without knowing every detail of the current environment state.
 
     Action
-        ...
-
-    StableBaselines3
-        ...
-
-    Ray RLLib
-        ...
+        The learning agent decides on an action to take on every step in the simulation. The action has the chance to positively or negatively impact the environment state. Over time, the agent aims to learn which actions to take when to maximise the expected reward.
 
     Episode
-        ...
+        When an episode starts, the network simulation is reset to an initial state. The agents take actions on each step of the episode until it reaches a terminal state, which usually happens after a predetermined number of steps. After the terminal state is reached, a new episode starts and the RL agent has another opportunity to protect the network.
+
+    Training
+        During training, an RL agent is placed in the simulated network and it learns which actions to take in which scenarios to obtain maximum reward.
+
+    Evaluation
+        During evaluation, an RL agent acts on the simulated network but is not allowed to update its behaviour. Evaluation is used to assess how successful agents are at defending the network.
 
     Step
-        ...
+        The agents can only act in the environment at discrete intervals. The time step is the basic unit of time in the simulation. At each step, the RL agent has an opportunity to observe the state of the environment and decide an action. Steps are also used for updating states for time-dependent activities such as rebooting a node.
 
     Reference environment
-        ...
+        While the network simulation is unfolding, a parallel simulation takes place which is identical to the main one except that blue and red agent actions are not applied. This reference environment essentially shows what would be happening to the network if there had been no cyberattack or defense. The reference environment is used to calculate rewards.
 
     Transaction
-        ...
+        PrimAITE records the decisions of the learning agent by saving its observation, action, and reward at every time step. During each session, this data is saved to disk to allow for full inspection.
 
     Laydown
-        ...
+        The laydown is a file which defines the training scenario. It contains the network topology, firewall rules, services, protocols, and details about green and red agent behaviours.
 
     User data directory
         PrimAITE supports upgrading software version while retaining user data. The user data directory is where configs, notebooks, and results are stored, this location is `~/primaite` on linux/darwin and `C:\Users\<username>\primaite` on Windows.

From 4f36ffd90922900f22c0105f4d16ae6548f31ac3 Mon Sep 17 00:00:00 2001
From: Marek Wolan <marek.wolan@methods.co.uk>
Date: Tue, 11 Jul 2023 11:31:29 +0100
Subject: [PATCH 6/9] Improved order of glossary terms

---
 docs/source/glossary.rst | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/docs/source/glossary.rst b/docs/source/glossary.rst
index 796b6aa1..34e3c8a3 100644
--- a/docs/source/glossary.rst
+++ b/docs/source/glossary.rst
@@ -2,6 +2,7 @@ Glossary
 =============
 
 .. glossary::
+    :sorted:
 
     Network
         The network in primaite is a logical representation of a computer network containing :term:`Nodes<Node>` and :term:`Links<Link>`.
@@ -12,48 +13,42 @@ Glossary
     Link
         A Link represents the connection between two Nodes. For example, a physical wire between a computer and a switch or a wireless connection.
 
+    Protocol
+        Protocols are used by links to separate different types of network traffic. Common examples would be HTTP, TCP, and UDP.
+
+    Service
+        A service represents a piece of software that is installed on a node, such as a web server or a database.
+
+    Access Control List
+        PrimAITE blocks or allows certain traffic on the network by simulating firewall rules, which are defined in the Access Control List.
+
     Agent
         An agent is a representation of a user of the network. Typically this would be a user that is using one of the computer nodes, though it could be an autonomous agent.
 
+    Green agent
+        An agent that simulates typical benign activity on the network, such as real users working on computers and servers.
+
     Red Agent
         An agent that is aiming to attack the network in some way, for example by executing a Denial-Of-Service attack or stealing data.
 
     Blue Agent
         A defensive agent that protects the network from Red Agent attacks to minimise disruption to green agents and protect data.
 
-    Green agent
-        Simulates typical benign activity on the network, such as real users using computers and servers.
-
     Information Exchange Requirement (IER)
         Simulates network traffic by sending data from one network node to another via links for a specified amount of time. IERs can be part of green agent behaviour or red agent behaviour. PrimAITE can be configured to apply a penalty for green agents' IERs being blocked and a reward for red agents' IERs being blocked.
 
     Pattern-of-Life (PoL)
         PoLs allow agents to change the current hardware, OS, file system, or service statuses of nodes during the course of an episode. For example, a green agent may restart a server node to represent scheduled maintainance. A red agent's Pattern-of-Life can be used to attack nodes by changing their states to CORRUPTED or COMPROMISED.
 
-    Protocol
-        Protocols are used by links to separate different types of network traffic. Common examples would be HTTP, TCP, and UDP.
-
-    Service
-        A service represents a piece of software that is installed on a node, such as a web server or a database.
-
-    Gym
-        PrimAITE uses the Gym reinforcement learning framework API to create a training environment and interface with RL agents. Gym defines a common way of creating observations, actions, and rewards.
-
     Reward
         The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current state of the environment and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.
 
-    Access Control List
-        PrimAITE blocks or allows certain traffic on the network by simulating firewall rules, which are defined in the Access Control List.
-
     Observation
         An observation is a representation of the current state of the environment that is given to the learning agent so it can decide on which action to perform. If the environment is 'fully observable', the observation contains information about every possible aspect of the environment. More commonly, the environment is 'partially observable' which means the learning agent has to make decisions without knowing every detail of the current environment state.
 
     Action
         The learning agent decides on an action to take on every step in the simulation. The action has the chance to positively or negatively impact the environment state. Over time, the agent aims to learn which actions to take when to maximise the expected reward.
 
-    Episode
-        When an episode starts, the network simulation is reset to an initial state. The agents take actions on each step of the episode until it reaches a terminal state, which usually happens after a predetermined number of steps. After the terminal state is reached, a new episode starts and the RL agent has another opportunity to protect the network.
-
     Training
         During training, an RL agent is placed in the simulated network and it learns which actions to take in which scenarios to obtain maximum reward.
 
@@ -63,6 +58,9 @@ Glossary
     Step
         The agents can only act in the environment at discrete intervals. The time step is the basic unit of time in the simulation. At each step, the RL agent has an opportunity to observe the state of the environment and decide an action. Steps are also used for updating states for time-dependent activities such as rebooting a node.
 
+    Episode
+        When an episode starts, the network simulation is reset to an initial state. The agents take actions on each step of the episode until it reaches a terminal state, which usually happens after a predetermined number of steps. After the terminal state is reached, a new episode starts and the RL agent has another opportunity to protect the network.
+
     Reference environment
         While the network simulation is unfolding, a parallel simulation takes place which is identical to the main one except that blue and red agent actions are not applied. This reference environment essentially shows what would be happening to the network if there had been no cyberattack or defense. The reference environment is used to calculate rewards.
 
@@ -72,5 +70,8 @@ Glossary
     Laydown
         The laydown is a file which defines the training scenario. It contains the network topology, firewall rules, services, protocols, and details about green and red agent behaviours.
 
+    Gym
+        PrimAITE uses the Gym reinforcement learning framework API to create a training environment and interface with RL agents. Gym defines a common way of creating observations, actions, and rewards.
+
     User data directory
         PrimAITE supports upgrading software version while retaining user data. The user data directory is where configs, notebooks, and results are stored, this location is `~/primaite` on linux/darwin and `C:\Users\<username>\primaite` on Windows.

From 42e8a6522701fabe874ec8e028ca9ce8017f5cb4 Mon Sep 17 00:00:00 2001
From: Marek Wolan <marek.wolan@methods.co.uk>
Date: Tue, 11 Jul 2023 12:01:48 +0100
Subject: [PATCH 7/9] Added draft migration guide.

---
 docs/index.rst                      |  1 +
 docs/source/migration_1.2_-_2.0.rst | 43 +++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)
 create mode 100644 docs/source/migration_1.2_-_2.0.rst

diff --git a/docs/index.rst b/docs/index.rst
index 02baa695..fed65919 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -39,6 +39,7 @@ The best place to start is :ref:`about`
    PrimAITE Tests <source/_autosummary/tests>
    source/dependencies
    source/glossary
+   source/migration_1.2_-_2.0
 
 .. toctree::
    :caption: Project Links:
diff --git a/docs/source/migration_1.2_-_2.0.rst b/docs/source/migration_1.2_-_2.0.rst
new file mode 100644
index 00000000..99cb891b
--- /dev/null
+++ b/docs/source/migration_1.2_-_2.0.rst
@@ -0,0 +1,43 @@
+v1.2 to v2.0 Migration guide
+============================
+
+**1. Running a training session**
+
+    In version 1.2 of PrimAITE, the main entry point for training or evaluating agents was the ``src/primaite/main.py`` file. v2.0.0 introduced managed 'sessions' which are responsible for reading configuration files, performing training, and writing outputs.
+
+    The ``main.py`` file still runs a training session, but it now uses the new ``PrimaiteSession`` and requires you to provide the paths to your config files:
+
+    .. code-block:: bash
+
+        python src/primaite/main.py --tc path/to/training-config.yaml --ldc path/to/laydown-config.yaml
+
+    Alternatively, you can start a session from the command line by running:
+
+    .. code-block:: bash
+
+        primaite session --tc path/to/training-config.yaml --ldc path/to/laydown-config.yaml
+
+**2. Location of configs**
+
+    In version 1.2, training configs and laydown configs were all stored in the project repository under ``src/primaite/config``. Version 2.0.0 introduced user data directories, so when you install and set up PrimAITE, config files are stored in your user data location. On Linux/macOS, this is ``~/primaite/config``. On Windows, this is ``C:\Users\<your username>\primaite\config``. Upon first setup, the config folder is populated with some default yaml files. It is recommended that you store all your custom configuration files here.
+
+**3. Contents of configs**
+
+    Some items that were previously part of the laydown config are now part of the training config:
+
+    * Actions
+
+    If you have custom configs that define these items, you will need to move that configuration from the laydown config to the training config.
+
+    There are also new configurable items in the training config:
+
+    * Observations
+    * Agent framework
+    * Agent
+    * Deep learning framework
+    * Random red agents
+    * Seed
+    * Deterministic
+    * Hard-coded agent view
+
+    Each of these items has a default value designed so that PrimAITE behaves as it did in v1.2.0, so you do not need to specify them.

From a01984b0ac7dddd34f21433ea93185f83ee90b82 Mon Sep 17 00:00:00 2001
From: Marek Wolan <marek.wolan@methods.co.uk>
Date: Tue, 11 Jul 2023 12:10:20 +0100
Subject: [PATCH 8/9] Updated migration guide

---
 docs/source/migration_1.2_-_2.0.rst | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/docs/source/migration_1.2_-_2.0.rst b/docs/source/migration_1.2_-_2.0.rst
index 99cb891b..2adf9656 100644
--- a/docs/source/migration_1.2_-_2.0.rst
+++ b/docs/source/migration_1.2_-_2.0.rst
@@ -1,7 +1,15 @@
 v1.2 to v2.0 Migration guide
 ============================
 
-**1. Running a training session**
+**1. Installing PrimAITE**
+
+    As before, you can install PrimAITE from the repository by running ``pip install -e .``. However, there is now an additional setup step which does several things, such as setting up user directories and copying default configs and notebooks. Once you have installed PrimAITE in your virtual environment, run this command to finalise setup:
+
+    .. code-block:: bash
+
+        primaite setup
+
+**2. Running a training session**
 
     In version 1.2 of PrimAITE, the main entry point for training or evaluating agents was the ``src/primaite/main.py`` file. v2.0.0 introduced managed 'sessions' which are responsible for reading configuration files, performing training, and writing outputs.
 
@@ -17,11 +25,11 @@ v1.2 to v2.0 Migration guide
 
         primaite session --tc path/to/training-config.yaml --ldc path/to/laydown-config.yaml
 
-**2. Location of configs**
+**3. Location of configs**
 
     In version 1.2, training configs and laydown configs were all stored in the project repository under ``src/primaite/config``. Version 2.0.0 introduced user data directories, so when you install and set up PrimAITE, config files are stored in your user data location. On Linux/macOS, this is ``~/primaite/config``. On Windows, this is ``C:\Users\<your username>\primaite\config``. Upon first setup, the config folder is populated with some default yaml files. It is recommended that you store all your custom configuration files here.
 
-**3. Contents of configs**
+**4. Contents of configs**
 
     Some items that were previously part of the laydown config are now part of the training config:
 

From 5d5d70c0b63d4a64629aa86a2843239cfe10e69d Mon Sep 17 00:00:00 2001
From: Marek Wolan <marek.wolan@methods.co.uk>
Date: Wed, 12 Jul 2023 09:16:40 +0100
Subject: [PATCH 9/9] Add better hyperlinks

---
 docs/source/custom_agent.rst     | 2 +-
 docs/source/glossary.rst         | 2 +-
 docs/source/primaite_session.rst | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/source/custom_agent.rst b/docs/source/custom_agent.rst
index 45d1c5a4..b4552d64 100644
--- a/docs/source/custom_agent.rst
+++ b/docs/source/custom_agent.rst
@@ -135,4 +135,4 @@ Finally, specify your agent in your training config.
     random_red_agent: False
     # ...
 
-Now you can `Run a PrimAITE Session<run a primaite session>` with your custom agent by passing in the custom ``config_main``.
+Now you can :ref:`run a primaite session<run a primaite session>` with your custom agent by passing in the custom ``config_main``.
diff --git a/docs/source/glossary.rst b/docs/source/glossary.rst
index 34e3c8a3..58b4cd5e 100644
--- a/docs/source/glossary.rst
+++ b/docs/source/glossary.rst
@@ -41,7 +41,7 @@ Glossary
         PoLs allow agents to change the current hardware, OS, file system, or service statuses of nodes during the course of an episode. For example, a green agent may restart a server node to represent scheduled maintainance. A red agent's Pattern-of-Life can be used to attack nodes by changing their states to CORRUPTED or COMPROMISED.
 
     Reward
-        The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current state of the environment and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.
+        The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current states of the environment and the :term:`reference environment`, and is impacted positively by things like green IERs running successfully and negatively by things like nodes being compromised.
 
     Observation
         An observation is a representation of the current state of the environment that is given to the learning agent so it can decide on which action to perform. If the environment is 'fully observable', the observation contains information about every possible aspect of the environment. More commonly, the environment is 'partially observable' which means the learning agent has to make decisions without knowing every detail of the current environment state.
diff --git a/docs/source/primaite_session.rst b/docs/source/primaite_session.rst
index 1b48494a..a393093c 100644
--- a/docs/source/primaite_session.rst
+++ b/docs/source/primaite_session.rst
@@ -1,3 +1,5 @@
+.. _run a primaite session:
+
 Run a PrimAITE Session
 ======================