Merged PR 278: Enable the red agent to vary its start node

## Summary
- Made the data manipulation red agent able to choose between the two clients to start operating on.
- Changed the attacker name in the config to 'data_manipulation_attacker' because it is no longer tied to any client.
- Updated the documentation notebook accordingly.
- Fixed a bug where the database client made a new connection every time it sent a SQL query (it now tries to reuse its most recent one instead).
- Fixed a bug where link loads were not being cleared between episodes (?)

**Warning:** the green agents are not working properly after reset right now, but I'm going to fix this in the next ticket, where I refactor episode reset.

## Test process
- Unit tests pass.
- UC2 notebook passes with both clients. (Currently this doesn't work after an episode reset, but the very next thing I'm going to work on is refactoring the reset, so I don't want to waste time fixing this.)

## Checklist
- [x] PR is linked to a **work item**
- [x] **acceptance criteria** of linked ticket are met
- [x] performed **self-review** of the code
- [x] written **tests** for any new functionality added with this PR
- [ ] updated the **documentation** if this PR changes or adds functionality
- [ ] written/updated **design docs** if this PR implements new functionality
- [x] updated the **change log**
- [x] ran **pre-commit** checks for code style
- [n] attended to any **TO-DOs** left in the code

Related work items: #2232
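
For reviewers, here is a minimal standalone sketch of the start-node selection described in the summary. It is not the PrimAITE code: `StartNodeChooser` and its injected `rng` are hypothetical names used only for illustration, and the sketch makes the same assumption as the change itself, namely that every candidate node has the data manipulation application at index 0.

```python
import random
from typing import Dict, List, Tuple


class StartNodeChooser:
    """Hypothetical stand-in for the red agent's start-node logic."""

    def __init__(self, node_names: List[str], rng: random.Random) -> None:
        if not node_names:
            raise ValueError("at least one candidate node is required")
        self.node_names = node_names
        self.rng = rng  # injected so that tests can seed it
        self.starting_node_idx = 0
        self.reset_for_episode()

    def reset_for_episode(self) -> None:
        """Pick a fresh start node; intended to run once per episode."""
        # randrange(n) draws uniformly from 0..n-1
        self.starting_node_idx = self.rng.randrange(len(self.node_names))

    def attack_action(self) -> Tuple[str, Dict[str, int]]:
        """Return the attack as an (action name, options) pair."""
        # Assumption: the attack application sits at index 0 on every node.
        options = {"node_id": self.starting_node_idx, "application_id": 0}
        return "NODE_APPLICATION_EXECUTE", options


chooser = StartNodeChooser(["client_1", "client_2"], random.Random(0))
name, options = chooser.attack_action()
print(name, "from", chooser.node_names[options["node_id"]])
```

Injecting the random number generator is a design choice of this sketch; it lets a test seed the selection, whereas the change in this PR draws from the module-level `random`.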
2024-02-20 20:22:20 +00:00
parent 23a56ca59f 88f8e9cb42
commit 8f85555709
13 changed files with 105 additions and 31 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 ## [Unreleased]
+- Changed the red agent in the data manipulation scenario to randomly choose client 1 or client 2 to start its attack.
 - Changed the data manipulation scenario to include a second green agent on client 1.
 - Refactored actions and observations to be configurable via object name, instead of UUID.
 - Fixed a bug where ACL rules were not resetting on episode reset.
--- a/src/primaite/config/_package_data/example_config.yaml
+++ b/src/primaite/config/_package_data/example_config.yaml
@@ -85,7 +85,7 @@ agents:



-  - ref: client_1_data_manipulation_red_bot
+  - ref: data_manipulation_attacker
    team: RED
    type: RedDatabaseCorruptingAgent

@@ -106,6 +106,9 @@ agents:
        - node_name: client_1
          applications:
            - application_name: DataManipulationBot
+        - node_name: client_2
+          applications:
+            - application_name: DataManipulationBot
        max_folders_per_node: 1
        max_files_per_folder: 1
        max_services_per_node: 1
@@ -730,6 +733,13 @@ simulation:
        type: WebBrowser
        options:
          target_url: http://arcd.com/users/
+      - ref: data_manipulation_bot
+        type: DataManipulationBot
+        options:
+          port_scan_p_of_success: 0.8
+          data_manipulation_p_of_success: 0.8
+          payload: "DELETE"
+          server_ip: 192.168.1.14
      services:
      - ref: client_2_dns_client
        type: DNSClient
--- a/src/primaite/config/_package_data/example_config_2_rl_agents.yaml
+++ b/src/primaite/config/_package_data/example_config_2_rl_agents.yaml
@@ -54,7 +54,7 @@ agents:
        frequency: 4
        variance: 3

-  - ref: client_1_data_manipulation_red_bot
+  - ref: data_manipulation_attacker
    team: RED
    type: RedDatabaseCorruptingAgent

@@ -74,7 +74,10 @@ agents:
        nodes:
        - node_ref: client_1
          applications:
-            - application_ref: data_manipulation_bot
+            - application_name: DataManipulationBot
+        - node_name: client_2
+          applications:
+            - application_name: DataManipulationBot
        max_folders_per_node: 1
        max_files_per_folder: 1
        max_services_per_node: 1
--- a/src/primaite/game/agent/data_manipulation_bot.py
+++ b/src/primaite/game/agent/data_manipulation_bot.py
@@ -1,21 +1,20 @@
 import random
-from typing import Dict, List, Tuple
+from typing import Dict, Tuple

 from gymnasium.core import ObsType

 from primaite.game.agent.interface import AbstractScriptedAgent
-from primaite.simulator.system.applications.red_applications.data_manipulation_bot import DataManipulationBot


 class DataManipulationAgent(AbstractScriptedAgent):
    """Agent that uses a DataManipulationBot to perform an SQL injection attack."""

-    data_manipulation_bots: List["DataManipulationBot"] = []
    next_execution_timestep: int = 0
+    starting_node_idx: int = 0

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
-        self._set_next_execution_timestep(self.agent_settings.start_settings.start_step)
+        self.reset_agent_for_episode()

    def _set_next_execution_timestep(self, timestep: int) -> None:
        """Set the next execution timestep with a configured random variance.
@@ -44,9 +43,16 @@ class DataManipulationAgent(AbstractScriptedAgent):

        self._set_next_execution_timestep(current_timestep + self.agent_settings.start_settings.frequency)

-        return "NODE_APPLICATION_EXECUTE", {"node_id": 0, "application_id": 0}
+        return "NODE_APPLICATION_EXECUTE", {"node_id": self.starting_node_idx, "application_id": 0}

    def reset_agent_for_episode(self) -> None:
        """Set the next execution timestep when the episode resets."""
        super().reset_agent_for_episode()
+        self._select_start_node()
        self._set_next_execution_timestep(self.agent_settings.start_settings.start_step)
+
+    def _select_start_node(self) -> None:
+        """Set the starting node of the agent to be a random node from this agent's action manager."""
+        # we are assuming that every node in the node manager has a data manipulation application at idx 0
+        num_nodes = len(self.action_manager.node_names)
+        self.starting_node_idx = random.randint(0, num_nodes - 1)
--- a/src/primaite/game/game.py
+++ b/src/primaite/game/game.py
@@ -1,6 +1,6 @@
 """PrimAITE game - Encapsulates the simulation and agents."""
 from ipaddress import IPv4Address
-from typing import Dict, List
+from typing import Dict, List, Tuple

 from pydantic import BaseModel, ConfigDict

@@ -152,8 +152,14 @@ class PrimaiteGame:
            agent.update_reward(state)
            agent.reward_function.total_reward += agent.reward_function.current_reward

-    def apply_agent_actions(self) -> None:
-        """Apply all actions to simulation as requests."""
+    def apply_agent_actions(self) -> Dict[str, Tuple[str, Dict]]:
+        """
+        Apply all actions to simulation as requests.
+
+        :return: A recap of each agent's actions, in CAOS format.
+        :rtype: Dict[str, Tuple[str, Dict]]
+
+        """
        agent_actions = {}
        for agent in self.agents:
            obs = agent.observation_manager.current_observation
--- a/src/primaite/notebooks/uc2_demo.ipynb
+++ b/src/primaite/notebooks/uc2_demo.ipynb
@@ -55,7 +55,7 @@
   "source": [
    "## Red agent\n",
    "\n",
-    "The red agent waits a bit then sends a DELETE query to the database from client 1. If the delete is successful, the database file is flagged as compromised to signal that data is not available.\n",
+    "At the start of every episode, the red agent randomly chooses either client 1 or client 2 to log in to. It waits a bit then sends a DELETE query to the database from its chosen client. If the delete is successful, the database file is flagged as compromised to signal that data is not available.\n",
    "\n",
    "[<img src=\"_package_data/uc2_attack.png\" width=\"500\"/>](_package_data/uc2_attack.png)\n",
    "\n",
@@ -68,7 +68,7 @@
   "source": [
    "## Blue agent\n",
    "\n",
-    "The blue agent can view the entire network, but the health statuses of components are not updated until a scan is performed. The blue agent should restore the database file from backup after it was compromised. It can also prevent further attacks by blocking client 1 from sending the malicious SQL query to the database server. This can be done by implementing an ACL rule on the router."
+    "The blue agent can view the entire network, but the health statuses of components are not updated until a scan is performed. The blue agent should restore the database file from backup after it was compromised. It can also prevent further attacks by blocking the red agent client from sending the malicious SQL query to the database server. This can be done by implementing an ACL rule on the router."
   ]
  },
  {
@@ -84,7 +84,7 @@
   "source": [
    "## Scripted agents:\n",
    "### Red\n",
-    "The red agent sits on client 1 and uses an application called DataManipulationBot whose sole purpose is to send a DELETE query to the database.\n",
+    "The red agent sits on a client and uses an application called DataManipulationBot whose sole purpose is to send a DELETE query to the database.\n",
    "The red agent can choose one of two action each timestep:\n",
    "1. do nothing\n",
    "2. execute the data manipulation application\n",
@@ -92,6 +92,7 @@
    "- start time\n",
    "- frequency\n",
    "- variance\n",
+    "\n",
    "Attacks start at a random timestep between (start_time - variance) and (start_time + variance). After each attack, another is attempted after a random delay between (frequency - variance) and (frequency + variance) timesteps.\n",
    "\n",
    "The data manipulation app itself has an element of randomness because the attack has a probability of success. The default is 0.8 to succeed with the port scan step and 0.8 to succeed with the attack itself.\n",
@@ -290,10 +291,16 @@
    "- `9`: Scan the database file - this refreshes the health status of the database file\n",
    "- `13`: Patch the database service - This triggers the database to restore data from the backup server\n",
    "- `19`: Shut down client 1\n",
+    "- `20`: Start up client 1\n",
    "- `22`: Block outgoing traffic from client 1\n",
+    "- `23`: Block outgoing traffic from client 2\n",
    "- `26`: Block TCP traffic from client 1 to the database node\n",
+    "- `27`: Block TCP traffic from client 2 to the database node\n",
    "- `28-37`: Remove ACL rules 1-10\n",
    "- `42`: Disconnect client 1 from the network\n",
+    "- `43`: Reconnect client 1 to the network\n",
+    "- `44`: Disconnect client 2 from the network\n",
+    "- `45`: Reconnect client 2 to the network\n",
    "\n",
    "The other actions will either have no effect or will negatively impact the network, so the blue agent should avoid taking them."
   ]
@@ -375,7 +382,9 @@
    "    cfg = yaml.safe_load(f)\n",
    "    # set success probability to 1.0 to avoid rerunning cells.\n",
    "    cfg['simulation']['network']['nodes'][8]['applications'][0]['options']['data_manipulation_p_of_success'] = 1.0\n",
+    "    cfg['simulation']['network']['nodes'][9]['applications'][0]['options']['data_manipulation_p_of_success'] = 1.0\n",
    "    cfg['simulation']['network']['nodes'][8]['applications'][0]['options']['port_scan_p_of_success'] = 1.0\n",
+    "    cfg['simulation']['network']['nodes'][9]['applications'][0]['options']['port_scan_p_of_success'] = 1.0\n",
    "game = PrimaiteGame.from_config(cfg)\n",
    "env = PrimaiteGymEnv(game = game)\n",
    "# Don't flatten obs as we are not training an agent and we wish to see the dict-formatted observations\n",
@@ -389,7 +398,25 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "The red agent will start attacking at some point between step 20 and 30. When this happens, the reward will go from 1.0 to 0.0, and to -1.0 when the green agent tries to access the webpage."
+    "The red agent will start attacking at some point between step 20 and 30. When this happens, the reward will drop immediately, then drop to -1.0 when green agents try to access the webpage."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def friendly_output_red_action(info):\n",
+    "    # parse the info dict from step output and write out what the red agent is doing\n",
+    "    red_info = info['agent_actions']['data_manipulation_attacker']\n",
+    "    red_action = red_info[0]\n",
+    "    if red_action == 'DONOTHING':\n",
+    "        red_str = 'DO NOTHING'\n",
+    "    elif red_action == 'NODE_APPLICATION_EXECUTE':\n",
+    "        client = \"client 1\" if red_info[1]['node_id'] == 0 else \"client 2\"\n",
+    "        red_str = f\"ATTACK from {client}\"\n",
+    "    return red_str"
   ]
  },
  {
@@ -400,9 +427,9 @@
   },
   "outputs": [],
   "source": [
-    "for step in range(32):\n",
+    "for step in range(35):\n",
    "    obs, reward, terminated, truncated, info = env.step(0)\n",
-    "    print(f\"step: {env.game.step_counter}, Red action: {info['agent_actions']['client_1_data_manipulation_red_bot'][0]}, Blue reward:{reward}\" )"
+    "    print(f\"step: {env.game.step_counter}, Red action: {friendly_output_red_action(info)}, Blue reward:{reward}\" )"
   ]
  },
  {
@@ -468,7 +495,7 @@
   "source": [
    "obs, reward, terminated, truncated, info = env.step(13)  # patch the database\n",
    "print(f\"step: {env.game.step_counter}\")\n",
-    "print(f\"Red action: {info['agent_actions']['client_1_data_manipulation_red_bot'][0]}\" )\n",
+    "print(f\"Red action: {info['agent_actions']['data_manipulation_attacker'][0]}\" )\n",
    "print(f\"Green action: {info['agent_actions']['client_1_green_user'][0]}\" )\n",
    "print(f\"Green action: {info['agent_actions']['client_2_green_user'][0]}\" )\n",
    "print(f\"Blue reward:{reward}\" )"
@@ -480,7 +507,7 @@
   "source": [
    "The patching takes two steps, so the reward hasn't changed yet. Let's do nothing for another timestep, the reward should improve.\n",
    "\n",
-    "The reward will be 0 as soon as the file finishes restoring. Then, the reward will increase to 1 when the green agent makes a request. (Because the webapp access part of the reward does not update until a successful request is made.)\n",
+    "The reward will increase slightly as soon as the file finishes restoring. Then, the reward will increase to 1 when both green agents make successful requests.\n",
    "\n",
    "Run the following cell until the green action is `NODE_APPLICATION_EXECUTE`, then the reward should become 1. If you run it enough times, another red attack will happen and the reward will drop again."
   ]
@@ -495,7 +522,7 @@
   "source": [
    "obs, reward, terminated, truncated, info = env.step(0)  # patch the database\n",
    "print(f\"step: {env.game.step_counter}\")\n",
-    "print(f\"Red action: {info['agent_actions']['client_1_data_manipulation_red_bot'][0]}\" )\n",
+    "print(f\"Red action: {info['agent_actions']['data_manipulation_attacker'][0]}\" )\n",
    "print(f\"Green action: {info['agent_actions']['client_2_green_user'][0]}\" )\n",
    "print(f\"Green action: {info['agent_actions']['client_1_green_user'][0]}\" )\n",
    "print(f\"Blue reward:{reward}\" )"
@@ -505,7 +532,9 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "The blue agent can prevent attacks by implementing an ACL rule to stop client_1 from sending POSTGRES traffic to the database. (Let's also patch the database file to get the reward back up.)"
+    "The blue agent can prevent attacks by implementing an ACL rule to stop client_1 or client_2 from sending POSTGRES traffic to the database. (Let's also patch the database file to get the reward back up.)\n",
+    "\n",
+    "Let's block both clients from communicating directly with the database."
   ]
  },
  {
@@ -517,14 +546,17 @@
   "outputs": [],
   "source": [
    "env.step(13)  # Patch the database\n",
-    "print(f\"step: {env.game.step_counter}, Red action: {info['agent_actions']['client_1_data_manipulation_red_bot'][0]}, Blue reward:{reward}\" )\n",
+    "print(f\"step: {env.game.step_counter}, Red action: {info['agent_actions']['data_manipulation_attacker'][0]}, Blue reward:{reward}\" )\n",
    "\n",
    "env.step(26)  # Block client 1\n",
-    "print(f\"step: {env.game.step_counter}, Red action: {info['agent_actions']['client_1_data_manipulation_red_bot'][0]}, Blue reward:{reward}\" )\n",
+    "print(f\"step: {env.game.step_counter}, Red action: {info['agent_actions']['data_manipulation_attacker'][0]}, Blue reward:{reward}\" )\n",
+    "\n",
+    "env.step(27)  # Block client 2\n",
+    "print(f\"step: {env.game.step_counter}, Red action: {info['agent_actions']['data_manipulation_attacker'][0]}, Blue reward:{reward}\" )\n",
    "\n",
    "for step in range(30):\n",
    "    obs, reward, terminated, truncated, info = env.step(0)  # do nothing\n",
-    "    print(f\"step: {env.game.step_counter}, Red action: {info['agent_actions']['client_1_data_manipulation_red_bot'][0]}, Blue reward:{reward}\" )"
+    "    print(f\"step: {env.game.step_counter}, Red action: {info['agent_actions']['data_manipulation_attacker'][0]}, Blue reward:{reward}\" )"
   ]
  },
  {
@@ -538,7 +570,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Let's also have a look at the ACL observation to verify our new ACL rule at position 5."
+    "Let's also have a look at the ACL observation to verify our new ACL rule at positions 5 and 6."
   ]
  },
  {
@@ -549,6 +581,13 @@
   "source": [
    "obs['ACL']"
   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
  }
 ],
 "metadata": {
--- a/src/primaite/simulator/network/hardware/base.py
+++ b/src/primaite/simulator/network/hardware/base.py
@@ -548,6 +548,11 @@ class Link(SimComponent):
    def __str__(self) -> str:
        return f"{self.endpoint_a}<-->{self.endpoint_b}"

+    def apply_timestep(self, timestep: int) -> None:
+        """Apply a timestep to the simulation."""
+        super().apply_timestep(timestep)
+        self.current_load = 0.0
+

 class Node(SimComponent):
    """
--- a/src/primaite/simulator/system/applications/database_client.py
+++ b/src/primaite/simulator/system/applications/database_client.py
@@ -196,7 +196,11 @@ class DatabaseClient(Application):
            return False

        if connection_id is None:
-            connection_id = str(uuid4())
+            if self.connections:
+                connection_id = list(self.connections.keys())[-1]
+                # TODO: if the most recent connection dies, it should be automatically cleared.
+            else:
+                connection_id = str(uuid4())

        if not self.connections.get(connection_id):
            if not self.connect(connection_id=connection_id):
--- a/tests/assets/configs/bad_primaite_session.yaml
+++ b/tests/assets/configs/bad_primaite_session.yaml
@@ -46,7 +46,7 @@ agents:
        frequency: 20
        variance: 5

-  - ref: client_1_data_manipulation_red_bot
+  - ref: data_manipulation_attacker
    team: RED
    type: RedDatabaseCorruptingAgent

--- a/tests/assets/configs/eval_only_primaite_session.yaml
+++ b/tests/assets/configs/eval_only_primaite_session.yaml
@@ -51,7 +51,7 @@ agents:
        frequency: 20
        variance: 5

-  - ref: client_1_data_manipulation_red_bot
+  - ref: data_manipulation_attacker
    team: RED
    type: RedDatabaseCorruptingAgent

--- a/tests/assets/configs/multi_agent_session.yaml
+++ b/tests/assets/configs/multi_agent_session.yaml
@@ -57,7 +57,7 @@ agents:
        frequency: 20
        variance: 5

-  - ref: client_1_data_manipulation_red_bot
+  - ref: data_manipulation_attacker
    team: RED
    type: RedDatabaseCorruptingAgent

--- a/tests/assets/configs/test_primaite_session.yaml
+++ b/tests/assets/configs/test_primaite_session.yaml
@@ -55,7 +55,7 @@ agents:
        frequency: 20
        variance: 5

-  - ref: client_1_data_manipulation_red_bot
+  - ref: data_manipulation_attacker
    team: RED
    type: RedDatabaseCorruptingAgent

--- a/tests/assets/configs/train_only_primaite_session.yaml
+++ b/tests/assets/configs/train_only_primaite_session.yaml
@@ -58,7 +58,7 @@ agents:
        frequency: 20
        variance: 5

-  - ref: client_1_data_manipulation_red_bot
+  - ref: data_manipulation_attacker
    team: RED
    type: RedDatabaseCorruptingAgent