Merged PR 633: #3110 Final user guide comments.

## Summary
Feedback following James' comments

## Test process

## Checklist
- [x] PR is linked to a **work item**
- [x] **acceptance criteria** of linked ticket are met
- [x] performed **self-review** of the code
- [x] written **tests** for any new functionality added with this PR
- [x] updated the **documentation** if this PR changes or adds functionality
- [x] written/updated **design docs** if this PR implements new functionality
- [x] updated the **change log**
- [x] ran **pre-commit** checks for code style
- [x] attended to any **TO-DOs** left in the code

#3110 Final user guide comments.

Related work items: #3110
This commit is contained in:
Archer Bowen
2025-03-17 09:09:59 +00:00
committed by Charlie Crane
3 changed files with 143 additions and 33 deletions

View File

@@ -13,7 +13,9 @@
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
| kaleido | ==0.2.1 | 0.2.1 | MIT | Static image export for web-based visualization libraries with zero dependencies | https://github.com/plotly/Kaleido |
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
| matplotlib | >=3.7.1 | 3.10.1 | Python Software Foundation License | Python plotting package | https://matplotlib.org |
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
| matplotlib-inline | >=0.1.7 | 0.1.7 | BSD License | Matplotlib Inline Back-end for IPython and Jupyter | https://github.com/ipython/matplotlib-inline |
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
| networkx | 3.1 | 3.1 | BSD License | Python package for creating and manipulating graphs and networks | https://networkx.org/ |
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
@@ -29,7 +31,7 @@
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
| pydantic | 2.7.0 | 2.7.0 | MIT License | Data validation using Python type hints | https://github.com/pydantic/pydantic |
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
| PyYAML | >=6.0 | 6.0.2 | MIT License | YAML parser and emitter for Python | https://pyyaml.org/ |
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
| ray | >=2.20, <2.33 | 2.32.0 | Apache 2.0 | Ray provides a simple, universal API for building distributed applications. | https://github.com/ray-project/ray |
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
@@ -37,9 +39,9 @@
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
| tensorflow | ~=2.12 | 2.12.0 | Apache Software License | TensorFlow is an open source machine learning framework for everyone. | https://www.tensorflow.org/ |
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
| typer | >=0.9 | 0.15.2 | MIT License | Typer, build great CLIs. Easy to code. Based on Python type hints. | https://github.com/tiangolo/typer |
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
| Deepdiff | >=8.0.1 | 8.3.0 | MIT License | Deep difference of dictionaries, iterables, strings, and any other object objects. | https://github.com/seperman/deepdiff |
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
| sb3_contrib | 2.1.0 | 2.1.0 | MIT License | Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code (Action Masking) | https://github.com/Stable-Baselines-Team/stable-baselines3-contrib |
+-------------------+---------------------+---------------+--------------------------------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+
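The requirement specifiers in the table above (`==`, `>=`, `~=`, compound ranges) are standard PEP 440 version specifiers. As a quick sketch of how to check that each "Newest version" actually satisfies its requirement — assuming the third-party `packaging` library is available (it is not itself listed in the table):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# (requirement specifier, newest observed version) pairs taken from the table above.
requirements = {
    "matplotlib": (">=3.7.1", "3.10.1"),
    "PyYAML": (">=6.0", "6.0.2"),
    "ray": (">=2.20, <2.33", "2.32.0"),
    "tensorflow": ("~=2.12", "2.12.0"),  # ~=2.12 is a compatible-release clause: >=2.12, <3.0
}

for name, (spec, newest) in requirements.items():
    # SpecifierSet membership checks a Version against the full (possibly compound) specifier.
    satisfied = Version(newest) in SpecifierSet(spec)
    print(f"{name}: {newest} satisfies '{spec}' -> {satisfied}")
```

This is how tools like pip evaluate the requirement column; a `False` here would indicate the pinned "newest" entry has drifted outside the declared range.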

View File

@@ -34,7 +34,10 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"import yaml\n", "import yaml\n",
"from prettytable import PrettyTable\n",
"from primaite.session.environment import PrimaiteGymEnv\n", "from primaite.session.environment import PrimaiteGymEnv\n",
"from primaite.game.agent.scripted_agents.random_agent import PeriodicAgent\n",
"from primaite.game.agent.interface import ProxyAgent\n",
"from primaite.simulator.network.hardware.nodes.host.computer import Computer\n", "from primaite.simulator.network.hardware.nodes.host.computer import Computer\n",
"from primaite.simulator.network.hardware.nodes.host.server import Server\n", "from primaite.simulator.network.hardware.nodes.host.server import Server\n",
"from primaite.simulator.network.hardware.nodes.network.router import Router\n", "from primaite.simulator.network.hardware.nodes.network.router import Router\n",
@@ -546,7 +549,7 @@
"\n", "\n",
"Additionally, `database-client` green agents are *Periodic* meaning they will attempt to use the database based on game time-steps. Specifically, these agents will begin on the time-step given in their `start_step` setting and will then will reattempt on each subsequence timestep based on the `Frequency` setting. These settings are then randomised using the remaining `start_variance` and `variance` options (also given in timesteps). These values are used to *±* their respective base settings to ensure the green agents achieve a moderate amount of domain randomisation in each PrimAITE episode.\n", "Additionally, `database-client` green agents are *Periodic* meaning they will attempt to use the database based on game time-steps. Specifically, these agents will begin on the time-step given in their `start_step` setting and will then will reattempt on each subsequence timestep based on the `Frequency` setting. These settings are then randomised using the remaining `start_variance` and `variance` options (also given in timesteps). These values are used to *±* their respective base settings to ensure the green agents achieve a moderate amount of domain randomisation in each PrimAITE episode.\n",
"\n", "\n",
"For example, take a *Periodic* green agent set with a `start_step` of 4 and a `frequency` of **4** with a `start_variance` and a `variance` of **4** will cause a green agent to make it's first action on timestep $4 \\pm 1$ and then any subsequent actions every $4 \\pm 1$ timesteps afterwards.\n" "For example, take a *Periodic* green agent set with a `start_step` of **4** and a `frequency` of **4** with a `start_variance` of **1** and a `variance` of **1** will cause a green agent to make its first action on timestep $4 \\pm 1$ and then any subsequent actions every $4 \\pm 1$ timesteps afterwards.\n"
] ]
}, },
{ {
@@ -581,7 +584,7 @@
"\n", "\n",
"Unlike the `database-client` green agents, the `web-browser` green agents are *probabilistic*. These agents are quite simple; on every timestep a probability roll is made to determine whenever the agent acts. On a successful outcome the agent will attempt to execute the `web-browser` application which will then attempt to connect to the `ST-DMZ-PUB-SRV-WEB` host. On a unsuccessful outcome then the green agent will simply perform not action on this timestep.\n", "Unlike the `database-client` green agents, the `web-browser` green agents are *probabilistic*. These agents are quite simple; on every timestep a probability roll is made to determine whenever the agent acts. On a successful outcome the agent will attempt to execute the `web-browser` application which will then attempt to connect to the `ST-DMZ-PUB-SRV-WEB` host. On a unsuccessful outcome then the green agent will simply perform not action on this timestep.\n",
"\n", "\n",
"For example, a `web-browser` green agent with a `20%` chance has a $\\frac{1}{5}$ chance of actioning it's host's `web-browser` to access the `ST-DMZ-PUB-SRV-WEB` web-server. " "For example, a `web-browser` green agent with a `20%` chance has a $\\frac{1}{5}$ chance of actioning its host's `web-browser` to access the `ST-DMZ-PUB-SRV-WEB` web-server. "
] ]
}, },
{ {
@@ -616,9 +619,9 @@
"source": [ "source": [
"### AGENTS | Red Agents\n", "### AGENTS | Red Agents\n",
"\n", "\n",
"For UC7, two new red agents have been developed which introduce a much more complex and realistic attacks in comparison to UC2's [data manipulation red agent](./Data-Manipulation-Customising-Red-Agent.ipynb) for the blue agent to defend against. These new red agents, or more commonly referred to `Threat Actor Profiles` (*TAPS*), utilise a series of different green, blue and red actions to simulate the different steps of a real-world attack.\n", "For UC7, two new red agents have been developed which introduce much more complex and realistic attacks in comparison to UC2's [data manipulation red agent](./Data-Manipulation-Customising-Red-Agent.ipynb) for the blue agent to defend against. These new red agents, or more commonly referred to `Threat Actor Profiles` (*TAPS*), utilise a series of different green, blue and red actions to simulate the different steps of a real-world attack.\n",
"\n", "\n",
"This notebook does not cover the red agents in much detail, hence it is highly recommended that readers should check out the respective TAP notebooks for a much more in-depth look at each TAP and their impacts.\n" "This notebook does not cover the red agents in much detail, hence its highly recommended that readers should check out the respective TAP notebooks for a much more in-depth look at each TAP and their impacts.\n"
] ]
}, },
{ {
@@ -627,11 +630,11 @@
"source": [ "source": [
"### AGENTS | RED AGENT | Threat Actor Profile 001 (`TAP001`)\n", "### AGENTS | RED AGENT | Threat Actor Profile 001 (`TAP001`)\n",
"\n", "\n",
"This TAP aims to exfiltrate and then encrypt the `database.db` file on `ST_DATA-PRV-SRV-DB` host, whilst leaving the functionality of the database intact. Configured by default to start on the `ST_PROJ-A-PRV-PC-1` host, `TAP001` must first embed itself on the host, locate the target (`ST_DATA-PRV-SRV-DB`) through a series of [`nmap`](/PrimAITE/docs/source/simulation_components/system/applications/nmap.rst) scans, establish a connection to it's [`c2-server`](./Command-and-Control-E2E-Demonstration.ipynb)(`ISP-PUB-SRV-DNS` by default) and then finally attempt to exfiltrate and encrypt. \n", "This TAP aims to exfiltrate and then encrypt the `database.db` file on `ST_DATA-PRV-SRV-DB` host, whilst leaving the functionality of the database intact. Configured by default to start on the `ST_PROJ-A-PRV-PC-1` host, `TAP001` must first embed itself on the host, locate the target (`ST_DATA-PRV-SRV-DB`) through a series of [`nmap`](/PrimAITE/docs/source/simulation_components/system/applications/nmap.rst) scans, establish a connection to its [`c2-server`](./Command-and-Control-E2E-Demonstration.ipynb)(`ISP-PUB-SRV-DNS` by default) and then finally attempt to exfiltrate and encrypt. \n",
"\n", "\n",
"If successful, the blue agent is configured to receive a serve negative reward and thus must prevent `TAP001` from ever reaching the target database. This could be through blocking it's connection to the target or it's `c2-server` via a carefully crafted ACL or perhaps through more a forceful approach such as shutting down the starting host.\n", "If successful, the blue agent is configured to receive a serve negative reward and thus must prevent `TAP001` from ever reaching the target database. This could be through blocking its connection to the target or its `c2-server` via a carefully crafted ACL or perhaps through more a forceful approach such as shutting down the starting host.\n",
"\n", "\n",
"For more information on `TAP001` and it's impacts, [please refer to the TAP001 E2E notebook](./UC7-TAP001-Kill-Chain-E2E.ipynb) or for more blue agent involved demonstration refer to the [UC7 attack variants notebook](./UC7-attack-variants.ipynb) " "For more information on `TAP001` and its impacts, [please refer to the TAP001 E2E notebook](./UC7-TAP001-Kill-Chain-E2E.ipynb) or for more blue agent involved demonstration refer to the [UC7 attack variants notebook](./UC7-attack-variants.ipynb) "
] ]
}, },
{ {
@@ -688,9 +691,9 @@
"source": [ "source": [
"### AGENTS | RED AGENT | Threat Actor Profile 003 (`TAP003`)\n", "### AGENTS | RED AGENT | Threat Actor Profile 003 (`TAP003`)\n",
"\n", "\n",
"Unlike `TAP001`'s more traditional representation of a threat actor, `TAP003` represents a malicious insider which leverages it's pre-existing knowledge to covertly add malicious access control lists (ACLs) to three different routers each of which affecting green agent traffic in a different way causing the blue agent to receive negative rewards. Thus, the blue agent must learn to leverage it's ability to remove rules and change credentials throughout the network to rectify the impacts of `TA003` and re-establish green POL and prevent `TAP003` from accessing additional routers.\n", "Unlike `TAP001`'s more traditional representation of a threat actor, `TAP003` represents a malicious insider which leverages its pre-existing knowledge to covertly add malicious access control lists (ACLs) to three different routers each of which affecting green agent traffic in a different way causing the blue agent to receive negative rewards. Thus, the blue agent must learn to leverage its ability to remove rules and change credentials throughout the network to rectify the impacts of `TA003` and re-establish green POL and prevent `TAP003` from accessing additional routers.\n",
"\n", "\n",
"The table below is a brief summary of the malicious acls added by `TAP003`\n", "The table below is a brief summary of the malicious ACLs added by `TAP003`\n",
"\n", "\n",
"|Target Router | Impact |\n", "|Target Router | Impact |\n",
"|----------------------|--------|\n", "|----------------------|--------|\n",
@@ -1280,7 +1283,7 @@
"\n", "\n",
"|Action Num | Action Type | Options|\n", "|Action Num | Action Type | Options|\n",
"|:---------:|:-----------:|:------:|\n", "|:---------:|:-----------:|:------:|\n",
"|0|**donothing**|*n/a*|\n", "|0|**do-nothing**|*n/a*|\n",
"|1|**node-os-scan**|*node_name: ST_PROJ-A-PRV-PC-1*|\n", "|1|**node-os-scan**|*node_name: ST_PROJ-A-PRV-PC-1*|\n",
"|2|**node-shutdown**|*node_name: ST_PROJ-A-PRV-PC-1*|\n", "|2|**node-shutdown**|*node_name: ST_PROJ-A-PRV-PC-1*|\n",
"|3|**node-startup**|*node_name: ST_PROJ-A-PRV-PC-1*|\n" "|3|**node-startup**|*node_name: ST_PROJ-A-PRV-PC-1*|\n"
@@ -1355,7 +1358,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Set by the `node_scan_duration` option in the simulation `defaults` section, it takes **8** timesteps before the results of `node-os-scan` impact the blue agent's observation space." "Set by the `node_scan_duration` option in the simulation `defaults` section, the results of `node-os-scan` take **8** timesteps before it impacts the blue agent's observation space."
] ]
}, },
{ {
@@ -1471,14 +1474,14 @@
" reward_function:\n", " reward_function:\n",
" reward_components:\n", " reward_components:\n",
" - type: database-file-integrity\n", " - type: database-file-integrity\n",
" weight: *HIGH_WEIGHT_IMPACT\n", " weight: *HIGH_WEIGHT_IMPACT # Equal to 0.95 (Reward Anchors defined at lines 960 - 980 in the uc7_config.yaml)\n",
" options: \n", " options: \n",
" node_hostname: ST_DATA-PRV-SRV-DB \n", " node_hostname: ST_DATA-PRV-SRV-DB \n",
" folder_name: database\n", " folder_name: database\n",
" file_name: database.db\n", " file_name: database.db\n",
"```\n", "```\n",
"\n", "\n",
"The blue agent's remaining reward function is comprised of **32** different ``shared-reward`` components. These rewards will grant the blue agent a positive or negative reward based on the current reward of the **32** green agents. The next code snippets The code snippets below demonstrate how the blue agent's reward is affected by simulation state." "The blue agent's remaining reward function is comprised of **32** different ``shared-reward`` components. These rewards will grant the blue agent a positive or negative reward based on the current reward of the **32** green agents. "
] ]
}, },
{ {
@@ -1487,11 +1490,128 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"table = PrettyTable()\n",
"table.field_names = [\"Reward Type\", \"Reward Option\", \"Reward Weight\"]\n",
"for i in range(len(defender.reward_function.reward_components)):\n", "for i in range(len(defender.reward_function.reward_components)):\n",
" reward_type = defender.reward_function.reward_components[i][0].config.type\n",
" try:\n", " try:\n",
" print(f\"Simulation State Reward: {defender.reward_function.reward_components[i][0].location_in_state}\")\n", " reward_option = defender.reward_function.reward_components[i][0].config.file_name\n",
" except:\n", " except:\n",
" print(f\"Green Agent Shared Reward: {defender.reward_function.reward_components[i][0].config.agent_name}\")\n" " reward_option = defender.reward_function.reward_components[i][0].config.agent_name\n",
" reward_weight = defender.reward_function.reward_components[i][1]\n",
" table.add_row(row=[reward_type, reward_option, reward_weight])\n",
"print(table)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, each of the `shared-reward` component is configured with a equal reward `weight` of `0.03125` which totals a blue agent reward weight of `1`. \n",
"\n",
"It's worth noting that `shared-reward` components are **not** required to have a equal weight or total a weight value under `1`. \n",
"\n",
"Users are recommended to alter the `weights` of these rewards when creating their own scenarios.\n",
"\n",
"```yaml\n",
"\n",
"# UC7 Shared Reward Component Green Agents (32 Green Agents each contributing 0.03125 of blue reward)\n",
"\n",
"# Blue Shared Reward | HOME_WORKER-1-DB\n",
"- type: shared-reward\n",
" weight: 0.03125\n",
" options:\n",
" agent_name: HOME_WORKER-1-DB\n",
"\n",
"# Green agent HOME_WORKER-1-DB's reward function:\n",
"  reward_function:\n",
"    reward_components:\n",
"      - type: green-admin-database-unreachable-penalty\n",
"        weight: *MEDIUM_WEIGHT_IMPACT # Equal to 0.5 (reward anchors defined at lines 960-980 of uc7_config.yaml)\n",
"        options:\n",
"          node_hostname: HOME-PUB-PC-1\n",
"\n",
"```\n",
"\n",
"The `weight` option in a `shared-reward` component acts as a multiplier on the reward of the agent given in `agent_name`:\n",
"\n",
"    shared_reward = agent_reward x shared_reward_weight\n",
"\n",
"This can be a little difficult to picture, so the following code snippets demonstrate how one of these rewards is calculated during a live episode."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Readers running this notebook natively can use edit this to test out different reward weight combinations\n",
"BLUE_AGENT_SHARED_REWARD_WEIGHT = 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, if a user wished to configure the blue agent to place more value on the head office green agents such as the `CEO` then the blue agent's `shared-reward` components could be altered to reflect this by increasing the `weight` of the `shared-reward` configured to the `CEO` green agent."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Reloads the UC7 config and removes all of other reward-components. \n",
"BLUE_AGENT_INDEX = 33\n",
"with open(_EXAMPLE_CFG/\"uc7_config.yaml\", mode=\"r\") as uc7_config:\n",
" cfg = yaml.safe_load(uc7_config)\n",
"\n",
" # Removing all the other blue agent rewards and adding a custom blue reward\n",
" blue_shared_reward_ceo = {'type': 'shared-reward', 'weight': BLUE_AGENT_SHARED_REWARD_WEIGHT, 'options': {'agent_name': 'CEO'}}\n",
"\n",
" # Add the new custom blue agent shared rewards\n",
" blue_shared_reward_home_worker = cfg['agents'][BLUE_AGENT_INDEX]['reward_function']['reward_components'].pop(1)\n",
" cfg['agents'][BLUE_AGENT_INDEX]['reward_function']['reward_components'].clear() # Remove all blue agent rewards\n",
" cfg['agents'][BLUE_AGENT_INDEX]['reward_function']['reward_components'].append(blue_shared_reward_ceo) \n",
" cfg['agents'][BLUE_AGENT_INDEX]['reward_function']['reward_components'].append(blue_shared_reward_home_worker) \n",
"\n",
"\n",
"env = PrimaiteGymEnv(env_config=cfg)\n",
"env.reset()\n",
"\n",
"# Run the episode 10 times and record the results\n",
"table = PrettyTable()\n",
"table.field_names = [\"Time Step\", \"Home Worker Reward\", \"CEO Reward\", \"Blue Agent Total Reward\"]\n",
"for _ in range(10):\n",
" env.step(0)\n",
" home_worker = env.game.agents.get('HOME_WORKER-1-DB')\n",
" ceo = env.game.agents.get('CEO')\n",
" defender = env.game.agents.get(\"defender\")\n",
" table.add_row([env.game.step_counter,home_worker.reward_function.current_reward, ceo.reward_function.current_reward, defender.reward_function.current_reward])\n",
"print(table)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see from the table above, because we increased the `shared-reward` weightings the blue agent's reward is nearly all comprised of the CEO's reward - `4.75`:\n",
"\n",
"ceo_reward_contribution = 0.95 x 5\n",
"\n",
"We can see that the remote worker agent only contributes `0.015625` to the blue agent's total reward:\n",
"\n",
"remote_worked_shared_reward = 0.5 x 0.03125\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, the final few code snippets demonstrate how the default UC7 blue agent's reward is affected by simulation state within an episode."
]
},
{
@@ -1612,18 +1732,6 @@
"display_name": "Python 3 (ipykernel)", "display_name": "Python 3 (ipykernel)",
"language": "python", "language": "python",
"name": "python3" "name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
} }
}, },
"nbformat": 4, "nbformat": 4,

View File

@@ -1547,7 +1547,7 @@
"|probability|Action Probability - The chance of successfully carrying out this stage in the kill_chain.|str|_Required_|\n", "|probability|Action Probability - The chance of successfully carrying out this stage in the kill_chain.|str|_Required_|\n",
"|malicious_acls|The configurable ACL that the TAP003 agent adds to the target node.|dict|_Required_|\n", "|malicious_acls|The configurable ACL that the TAP003 agent adds to the target node.|dict|_Required_|\n",
"\n", "\n",
"The malicious ACL is configured identically to the other ACLs. except from the target router/firewall. \n", "The malicious ACL is configured identically to the other ACLs except from the target router/firewall. \n",
"This option is set to the TAP003's configured target host automatically.\n", "This option is set to the TAP003's configured target host automatically.\n",
"\n", "\n",
"TAP003 intends to leverage these ACL's for malicious purposes. The default configuration is to deny all traffic from and towards the 0.0.0.255 subnet. \n", "TAP003 intends to leverage these ACL's for malicious purposes. The default configuration is to deny all traffic from and towards the 0.0.0.255 subnet. \n",
@@ -1640,7 +1640,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Unlike the blue agent, TAP003 does not need to use it's action space options for indexing different options, meaning that ACL's are a lot easier to configure.\n", "Unlike the blue agent, TAP003 does not need to use its action space options for indexing different options, meaning that ACLs are a lot easier to configure.\n",
"\n", "\n",
"The sandbox below can be used to try out different configuration options and their impact on the simulation." "The sandbox below can be used to try out different configuration options and their impact on the simulation."
] ]
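The `probability` option above behaves like the probabilistic green agents described earlier: each stage attempt is a single Bernoulli roll. A minimal sketch of that mechanic — illustrative only, assuming nothing about TAP003's internals:

```python
import random


def attempt_succeeds(probability: float, rng: random.Random) -> bool:
    """Single Bernoulli roll: True with the given probability."""
    return rng.random() < probability


# Over many rolls, a 20% probability succeeds roughly a fifth of the time.
rng = random.Random(42)  # seeded for repeatability
successes = sum(attempt_succeeds(0.2, rng) for _ in range(10_000))
print(successes / 10_000)  # close to 0.2
```

Raising a stage's `probability` therefore shortens the expected number of timesteps (geometrically distributed, mean `1/p`) before the kill chain advances.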