diff --git a/src/primaite/notebooks/UC7-E2E-Demo.ipynb b/src/primaite/notebooks/UC7-E2E-Demo.ipynb index cd3c2f8f..2710893b 100644 --- a/src/primaite/notebooks/UC7-E2E-Demo.ipynb +++ b/src/primaite/notebooks/UC7-E2E-Demo.ipynb @@ -34,7 +34,10 @@ "outputs": [], "source": [ "import yaml\n", + "from prettytable import PrettyTable\n", "from primaite.session.environment import PrimaiteGymEnv\n", + "from primaite.game.agent.scripted_agents.random_agent import PeriodicAgent\n", + "from primaite.game.agent.interface import ProxyAgent\n", "from primaite.simulator.network.hardware.nodes.host.computer import Computer\n", "from primaite.simulator.network.hardware.nodes.host.server import Server\n", "from primaite.simulator.network.hardware.nodes.network.router import Router\n", @@ -546,7 +549,7 @@ "\n", "Additionally, `database-client` green agents are *Periodic*, meaning they will attempt to use the database based on game time-steps. Specifically, these agents will begin on the time-step given in their `start_step` setting and will then reattempt on each subsequent timestep based on the `frequency` setting. These settings are then randomised using the remaining `start_variance` and `variance` options (also given in timesteps). 
These values are applied as a *±* adjustment to their respective base settings to ensure the green agents achieve a moderate amount of domain randomisation in each PrimAITE episode.\n", "\n", - "For example, take a *Periodic* green agent set with a `start_step` of 4 and a `frequency` of **4** with a `start_variance` and a `variance` of **4** will cause a green agent to make it's first action on timestep $4 \pm 1$ and then any subsequent actions every $4 \pm 1$ timesteps afterwards.\n" + "For example, a *Periodic* green agent set with a `start_step` of **4** and a `frequency` of **4**, with a `start_variance` of **1** and a `variance` of **1**, will make its first action on timestep $4 \pm 1$ and then take any subsequent actions every $4 \pm 1$ timesteps afterwards.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### AGENTS | Red Agents\n", "\n", - "For UC7, two new red agents have been developed which introduce a much more complex and realistic attacks in comparison to UC2's [data manipulation red agent](./Data-Manipulation-Customising-Red-Agent.ipynb) for the blue agent to defend against. These new red agents, or more commonly referred to `Threat Actor Profiles` (*TAPS*), utilise a series of different green, blue and red actions to simulate the different steps of a real-world attack.\n", + "For UC7, two new red agents have been developed which introduce much more complex and realistic attacks in comparison to UC2's [data manipulation red agent](./Data-Manipulation-Customising-Red-Agent.ipynb) for the blue agent to defend against. 
These new red agents, more commonly referred to as `Threat Actor Profiles` (*TAPs*), utilise a series of different green, blue and red actions to simulate the different steps of a real-world attack.\n", "\n", - "This notebook does not cover the red agents in much detail, hence it is highly recommended that readers should check out the respective TAP notebooks for a much more in-depth look at each TAP and their impacts.\n" + "This notebook does not cover the red agents in much detail, so it is highly recommended that readers check out the respective TAP notebooks for a more in-depth look at each TAP and its impacts.\n" ] }, { @@ -690,7 +693,7 @@ "\n", "Unlike `TAP001`'s more traditional representation of a threat actor, `TAP003` represents a malicious insider which leverages its pre-existing knowledge to covertly add malicious access control lists (ACLs) to three different routers, each of which affects green agent traffic in a different way, causing the blue agent to receive negative rewards. 
Thus, the blue agent must learn to leverage its ability to remove rules and change credentials throughout the network to rectify the impacts of `TAP003`, re-establish green POL, and prevent `TAP003` from accessing additional routers.\n", "\n", - "The table below is a brief summary of the malicious acls added by `TAP003`\n", + "The table below is a brief summary of the malicious ACLs added by `TAP003`.\n", "\n", "|Target Router | Impact |\n", "|----------------------|--------|\n", @@ -1280,7 +1283,7 @@ "\n", "|Action Num | Action Type | Options|\n", "|:---------:|:-----------:|:------:|\n", - "|0|**donothing**|*n/a*|\n", + "|0|**do-nothing**|*n/a*|\n", "|1|**node-os-scan**|*node_name: ST_PROJ-A-PRV-PC-1*|\n", "|2|**node-shutdown**|*node_name: ST_PROJ-A-PRV-PC-1*|\n", "|3|**node-startup**|*node_name: ST_PROJ-A-PRV-PC-1*|\n" ] }, @@ -1355,7 +1358,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Set by the `node_scan_duration` option in the simulation `defaults` section, it takes **8** timesteps before the results of `node-os-scan` impact the blue agent's observation space." + "Set by the `node_scan_duration` option in the simulation `defaults` section, the results of `node-os-scan` take **8** timesteps before they impact the blue agent's observation space." ] }, { @@ -1471,14 +1474,14 @@ " reward_function:\n", " reward_components:\n", " - type: database-file-integrity\n", - " weight: *HIGH_WEIGHT_IMPACT\n", + " weight: *HIGH_WEIGHT_IMPACT # Equal to 0.95 (Reward Anchors defined at lines 960 - 980 in the uc7_config.yaml)\n", " options: \n", " node_hostname: ST_DATA-PRV-SRV-DB \n", " folder_name: database\n", " file_name: database.db\n", "```\n", "\n", - "The blue agent's remaining reward function is comprised of **32** different ``shared-reward`` components. These rewards will grant the blue agent a positive or negative reward based on the current reward of the **32** green agents. 
The next code snippets The code snippets below demonstrate how the blue agent's reward is affected by simulation state." + "The blue agent's remaining reward function is comprised of **32** different ``shared-reward`` components. These rewards will grant the blue agent a positive or negative reward based on the current reward of the **32** green agents. " ] }, { @@ -1487,11 +1490,128 @@ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ + "table = PrettyTable()\n", + "table.field_names = [\"Reward Type\", \"Reward Option\", \"Reward Weight\"]\n", "for i in range(len(defender.reward_function.reward_components)):\n", + " reward_type = defender.reward_function.reward_components[i][0].config.type\n", " try:\n", - " print(f\"Simulation State Reward: {defender.reward_function.reward_components[i][0].location_in_state}\")\n", + " reward_option = defender.reward_function.reward_components[i][0].config.file_name\n", " except:\n", - " print(f\"Green Agent Shared Reward: {defender.reward_function.reward_components[i][0].config.agent_name}\")\n" + " reward_option = defender.reward_function.reward_components[i][0].config.agent_name\n", + " reward_weight = defender.reward_function.reward_components[i][1]\n", + " table.add_row(row=[reward_type, reward_option, reward_weight])\n", + "print(table)\n" ] }, { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "By default, each of the `shared-reward` components is configured with an equal reward `weight` of `0.03125`, which totals a blue agent reward weight of `1`. \n", + "\n", + "It's worth noting that `shared-reward` components are **not** required to have an equal weight or to total a weight value under `1`. 
\n", + "\n", + "Users are recommended to alter the `weights` of these rewards when creating their own scenarios.\n", + "\n", + "```yaml\n", + "\n", + "# UC7 Shared Reward Component Green Agents (32 Green Agents each contributing 0.03125 of blue reward)\n", + "\n", + "# Blue Shared Reward | HOME_WORKER-1-DB\n", + "- type: shared-reward\n", + " weight: 0.03125\n", + " options:\n", + " agent_name: HOME_WORKER-1-DB\n", + "\n", + "# Green Agent HOME_WORKER-1-DB's reward function:\n", + " reward_function:\n", + " reward_components:\n", + " - type: green-admin-database-unreachable-penalty\n", + " weight: *MEDIUM_WEIGHT_IMPACT # Equal to 0.5 (Reward Anchors defined at lines 960 - 980 in the uc7_config.yaml)\n", + " options:\n", + " node_hostname: HOME-PUB-PC-1\n", + "\n", + "```\n", + "\n", + "The `weight` option in a `shared-reward` component acts as a multiplier on the reward of the agent given in `agent_name`:\n", + "\n", + "$\text{shared\_reward} = \text{agent\_reward} \times \text{shared\_reward\_weight}$\n", + "\n", + "\n", + "This can be a little difficult to understand intuitively, so the following code snippets demonstrate how one of these rewards is calculated during a live episode." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Readers running this notebook natively can edit this to test out different reward weight combinations\n", + "BLUE_AGENT_SHARED_REWARD_WEIGHT = 5" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For example, if a user wished to configure the blue agent to place more value on the head office green agents, such as the `CEO`, then the blue agent's `shared-reward` components could be altered to reflect this by increasing the `weight` of the `shared-reward` configured for the `CEO` green agent." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Reload the UC7 config and remove all other reward components. \n", + "BLUE_AGENT_INDEX = 33\n", + "with open(_EXAMPLE_CFG/\"uc7_config.yaml\", mode=\"r\") as uc7_config:\n", + " cfg = yaml.safe_load(uc7_config)\n", + "\n", + " # Create a custom blue agent shared reward for the CEO green agent\n", + " blue_shared_reward_ceo = {'type': 'shared-reward', 'weight': BLUE_AGENT_SHARED_REWARD_WEIGHT, 'options': {'agent_name': 'CEO'}}\n", + "\n", + " # Keep the home worker shared reward, clear the rest, then add the new rewards\n", + " blue_shared_reward_home_worker = cfg['agents'][BLUE_AGENT_INDEX]['reward_function']['reward_components'].pop(1)\n", + " cfg['agents'][BLUE_AGENT_INDEX]['reward_function']['reward_components'].clear() # Remove all blue agent rewards\n", + " cfg['agents'][BLUE_AGENT_INDEX]['reward_function']['reward_components'].append(blue_shared_reward_ceo) \n", + " cfg['agents'][BLUE_AGENT_INDEX]['reward_function']['reward_components'].append(blue_shared_reward_home_worker) \n", + "\n", + "\n", + "env = PrimaiteGymEnv(env_config=cfg)\n", + "env.reset()\n", + "\n", + "# Run 10 time-steps and record the results\n", + "table = PrettyTable()\n", + "table.field_names = [\"Time Step\", \"Home Worker Reward\", \"CEO Reward\", \"Blue Agent Total Reward\"]\n", + "for _ in range(10):\n", + " env.step(0)\n", + " home_worker = env.game.agents.get('HOME_WORKER-1-DB')\n", + " ceo = env.game.agents.get('CEO')\n", + " defender = env.game.agents.get(\"defender\")\n", + " table.add_row([env.game.step_counter, home_worker.reward_function.current_reward, ceo.reward_function.current_reward, defender.reward_function.current_reward])\n", + "print(table)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see from the table above, because we increased the `shared-reward` weightings, the blue agent's reward is almost entirely made up of the CEO's 
reward of `4.75`:\n", + "\n", + "$\text{ceo\_reward\_contribution} = 0.95 \times 5$ \n", + "\n", + "We can see that the home worker agent only contributes `0.015625` to the blue agent's total reward:\n", + "\n", + "$\text{home\_worker\_reward\_contribution} = 0.5 \times 0.03125$\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lastly, the final few code snippets demonstrate how the default UC7 blue agent's reward is affected by simulation state within an episode." ] }, { @@ -1612,18 +1732,6 @@ "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.12" } }, "nbformat": 4, diff --git a/src/primaite/notebooks/UC7-TAP003-Kill-Chain-E2E.ipynb b/src/primaite/notebooks/UC7-TAP003-Kill-Chain-E2E.ipynb index 0827228d..f2baf310 100644 --- a/src/primaite/notebooks/UC7-TAP003-Kill-Chain-E2E.ipynb +++ b/src/primaite/notebooks/UC7-TAP003-Kill-Chain-E2E.ipynb @@ -1547,7 +1547,7 @@ "|probability|Action Probability - The chance of successfully carrying out this stage in the kill_chain.|str|_Required_|\n", "|malicious_acls|The configurable ACL that the TAP003 agent adds to the target node.|dict|_Required_|\n", "\n", - "The malicious ACL is configured identically to the other ACLs. except from the target router/firewall. \n", + "The malicious ACL is configured identically to the other ACLs, except for the target router/firewall. \n", "This option is set to the TAP003's configured target host automatically.\n", "\n", "TAP003 intends to leverage these ACLs for malicious purposes. The default configuration is to deny all traffic from and towards the 0.0.0.255 subnet. 
\n", @@ -1640,7 +1640,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Unlike the blue agent, TAP003 does not need to use it's action space options for indexing different options, meaning that ACL's are a lot easier to configure.\n", + "Unlike the blue agent, TAP003 does not need to use its action space options for indexing different options, meaning that ACLs are a lot easier to configure.\n", "\n", "The sandbox below can be used to try out different configuration options and their impact on the simulation." ]
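As an aside to the reward changes in the first notebook above, the shared-reward arithmetic it describes can be checked outside the simulator. The sketch below is illustrative only and is not part of the diff or of PrimAITE's API: the `shared_reward` helper is a hypothetical stand-in for the `shared-reward` component's weighting behaviour, using the example values from the notebook (a CEO reward of `0.95` with a custom weight of `5`, and a home worker reward of `0.5` with the default weight of `0.03125`).

```python
# Hypothetical helper (not PrimAITE code): one shared-reward component's
# contribution is simply the green agent's reward scaled by the weight.
def shared_reward(agent_reward: float, weight: float) -> float:
    """Return a single shared-reward contribution to the blue agent's total."""
    return agent_reward * weight

# CEO green agent: reward 0.95 scaled by the increased weight of 5
ceo_contribution = shared_reward(0.95, 5)  # 4.75

# Home worker green agent: reward 0.5 scaled by the default weight of 0.03125
home_worker_contribution = shared_reward(0.5, 0.03125)  # 0.015625

# The blue agent's total reward is the sum of all shared-reward contributions
print(ceo_contribution + home_worker_contribution)  # 4.765625
```

This mirrors the notebook's claim that, with the inflated CEO weight, the CEO component dominates the blue agent's total reward.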