Core User Guide Feedback Implemented.

Feedback issues left:

1. The red-agent image is not embedded correctly in the data_manipulation_e2e notebook.
2. _autosummary/tests.unit_tests.html is still completely blank.
3. _index.html is not updated with the new 2-pager.
4. _dependencies is not updated to include only tier-1 and primary for v3.
5. The definition of user_app_home is not confirmed.
2024-05-31 13:55:20 +01:00
parent 4c9b2334da
commit efc4e3e9b0
7 changed files with 67 additions and 99 deletions
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -55,23 +55,6 @@ PrimAITE provides a training and evaluation capability to AI agents in the conte

 Use of PrimAITE default scenarios within ARCD is supported by a “Use Case Profile” tailored to the scenario.

-AI Assessment Capability
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-PrimAITE includes the capability to support in-depth assessment of cyber defence AI by outputting logs of the environment state and AI behaviour throughout both training and evaluation sessions. These logs include the following data:
-
- Timestamp;
- Episode and step number;
- Agent identifier;
- Observation space;
- Action taken (by defensive AI);
- Reward value.
-
-Logs are available in CSV format and provide coverage of the above data for every step of every episode.
-
-
-
-
 What is PrimAITE built with
 ---------------------------

@@ -109,6 +92,7 @@ Head over to the :ref:`getting-started` page to install and setup PrimAITE!
   source/config
   source/environment
   source/customising_scenarios
+   source/varying_config_files

 .. toctree::
   :caption: Notebooks:
@@ -125,14 +109,4 @@ Head over to the :ref:`getting-started` page to install and setup PrimAITE!
   source/state_system
   source/request_system
   PrimAITE API <source/_autosummary/primaite>
-   PrimAITE Tests <source/_autosummary/tests>
-
-
-.. toctree::
-   :caption: Project Links:
-   :hidden:
-
-   Code <https://github.com/Autonomous-Resilient-Cyber-Defence/PrimAITE>
-   Issues <https://github.com/Autonomous-Resilient-Cyber-Defence/PrimAITE/issues>
-   Pull Requests <https://github.com/Autonomous-Resilient-Cyber-Defence/PrimAITE/pulls>
-   Discussions <https://github.com/Autonomous-Resilient-Cyber-Defence/PrimAITE/discussions>
+   PrimAITE Tests <source/_autosummary/tests>
--- a/docs/source/getting_started.rst
+++ b/docs/source/getting_started.rst
@@ -107,7 +107,9 @@ Clone & Install PrimAITE for Development
 To be able to extend PrimAITE further, or to build wheels manually before install, clone the repository to a location
 of your choice:

-1. Clone the repository
+1. Clone the repository. 
+
+For example:

 .. code-block:: bash

--- a/docs/source/glossary.rst
+++ b/docs/source/glossary.rst
@@ -38,14 +38,11 @@ Glossary
    Blue Agent
        A defensive agent that protects the network from Red Agent attacks to minimise disruption to green agents and protect data.

-    Information Exchange Requirement (IER)
-        Simulates network traffic by sending data from one network node to another via links for a specified amount of time. IERs can be part of green agent behaviour or red agent behaviour. PrimAITE can be configured to apply a penalty for green agents' IERs being blocked and a reward for red agents' IERs being blocked.
-
    Pattern-of-Life (PoL)
        PoLs allow agents to change the current hardware, OS, file system, or service statuses of nodes during the course of an episode. For example, a green agent may restart a server node to represent scheduled maintainance. A red agent's Pattern-of-Life can be used to attack nodes by changing their states to CORRUPTED or COMPROMISED.

    Reward
-        The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current states of the environment / :term:`reference environment` and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.
+        The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current states of the environment and is impacted positively by things like green PoL running successfully and negatively by things like nodes being compromised.

    Observation
        An observation is a representation of the current state of the environment that is given to the learning agent so it can decide on which action to perform. If the environment is 'fully observable', the observation contains information about every possible aspect of the environment. More commonly, the environment is 'partially observable' which means the learning agent has to make decisions without knowing every detail of the current environment state.
@@ -65,17 +62,8 @@ Glossary
    Episode
        When an episode starts, the network simulation is reset to an initial state. The agents take actions on each step of the episode until it reaches a terminal state, which usually happens after a predetermined number of steps. After the terminal state is reached, a new episode starts and the RL agent has another opportunity to protect the network.

-    Reference environment
-        While the network simulation is unfolding, a parallel simulation takes place which is identical to the main one except that blue and red agent actions are not applied. This reference environment essentially shows what would be happening to the network if there had been no cyberattack or defense. The reference environment is used to calculate rewards.
-
-    Transaction
-        PrimAITE records the decisions of the learning agent by saving its observation, action, and reward at every time step. During each session, this data is saved to disk to allow for full inspection.
-
    Laydown
        The laydown is a file which defines the training scenario. It contains the network topology, firewall rules, services, protocols, and details about green and red agent behaviours.

    Gymnasium
-        PrimAITE uses the Gymnasium reinforcement learning framework API to create a training environment and interface with RL agents. Gymnasium defines a common way of creating observations, actions, and rewards.
-
-    User app home
-        PrimAITE supports upgrading software version while retaining user data. The user data directory is where configs, notebooks, and results are stored, this location is `~/primaite<version>` on linux/darwin and `C:\\Users\\<username>\\primaite\\<version>` on Windows.
+        PrimAITE uses the Gymnasium reinforcement learning framework API to create a training environment and interface with RL agents. Gymnasium defines a common way of creating observations, actions, and rewards.
--- a/docs/source/varying_config_files.rst
+++ b/docs/source/varying_config_files.rst
@@ -0,0 +1,49 @@
+.. only:: comment
+
+    © Crown-owned copyright 2023, Defence Science and Technology Laboratory UK
+
+Defining variations in the config files
+=======================================
+
+PrimAITE can use different variations of a scenario in different episodes. This can be used to increase domain randomisation to prevent overfitting, or to set up curriculum learning to train agents to perform more complicated tasks.
+
+When using a fixed scenario, a single YAML config file is used. However, to use episode schedules, PrimAITE uses a directory with several config files that work together.
+
+Base scenario
+*************
+
+The base scenario is essentially the same as a fixed YAML configuration, but it can contain placeholders that are populated with episode-specific data at runtime. The base scenario contains any network, agent, or settings that remain fixed for the entire training/evaluation session.
+
+The placeholders are defined as YAML Aliases and they are denoted by an asterisk (``*placeholder``).
+
+Variations
+**********
+
+For each variation that could be used in a placeholder, there is a separate YAML file that contains the data that should populate the placeholder.
+
+The data that fills the placeholder is defined as a YAML Anchor in a separate file, denoted by an ampersand ``&anchor``.
+
+Learn more about YAML Aliases and Anchors here.
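+
+As a minimal sketch of how an anchor and an alias relate, the snippet below shows both in a single YAML document. The key names (``greens``, ``agents``) and the agent entry are hypothetical and are not taken from a PrimAITE scenario. In PrimAITE the anchor would sit in a variation file and the alias in the base scenario.
+
+.. code-block:: yaml
+
+    # Hypothetical variation data: the anchor &greens names the list that follows.
+    greens: &greens
+      - ref: example_green_agent
+
+    # Hypothetical base scenario entry: the alias *greens resolves to the anchored list.
+    agents: *greens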
+
+Schedule
+********
+
+Users must define which combination of scenario variations should be loaded in each episode. This takes the form of a YAML file with a relative path to the base scenario and a list of paths to be loaded in during each episode.
+
+It takes the following format:
+
+.. code-block:: yaml
+
+    base_scenario: base.yaml
+    schedule:
+      0: # list of variations to load in at episode 0 (before the first call to env.reset() happens)
+        - laydown_1.yaml
+        - attack_1.yaml
+      1: # list of variations to load in at episode 1 (after the first env.reset() call)
+        - laydown_2.yaml
+        - attack_2.yaml
+
+For more information, refer to the ``Using Episode Schedules`` notebook in :ref:`Executed Notebooks`, or run the notebook interactively from ``notebooks/example_notebooks/``.
+
+For further information about notebooks in general, refer to :ref:`Example Jupyter Notebooks`.