Merge remote-tracking branch 'origin/dev' into feature/901-change-functionality-acl-rules

This commit is contained in:
SunilSamra
2023-07-13 16:48:02 +01:00
33 changed files with 677 additions and 260 deletions


@@ -38,6 +38,8 @@ The best place to start is :ref:`about`
PrimAITE API <source/_autosummary/primaite>
PrimAITE Tests <source/_autosummary/tests>
source/dependencies
source/glossary
source/migration_1.2_-_2.0
.. toctree::
:caption: Project Links:


@@ -10,11 +10,11 @@ PrimAITE provides the following features:
* A flexible network / system laydown based on the Python networkx framework
* Nodes and links (edges) host Python classes in order to present attributes and methods (and hence, a more representative model of a platform / system)
* A green agent Information Exchange Requirement (IER) function allows the representation of traffic (protocols and loading) on any / all links. Application of IERs is based on the status of node operating systems and services
* A green agent node Pattern-of-Life (PoL) function allows the representation of core behaviours on nodes (e.g. Hardware state, Software State, Service state, File System state)
* A 'green agent' Information Exchange Requirement (IER) function allows the representation of traffic (protocols and loading) on any / all links. Application of IERs is based on the status of node operating systems and services
* A 'green agent' node Pattern-of-Life (PoL) function allows the representation of core behaviours on nodes (e.g. changing the Hardware state, Software State, Service state, or File System state)
* An Access Control List (ACL) function, mimicking the behaviour of a network firewall, is applied across the model, following standard ACL rule format (e.g. DENY/ALLOW, source IP, destination IP, protocol and port). Application of IERs adheres to any ACL restrictions
* Presents an OpenAI Gym interface to the environment, allowing integration with any OpenAI Gym compliant defensive agents
* Red agent activity based on red IERs and red PoL
* Red agent activity based on 'red' IERs and 'red' PoL
* Defined reward function for use with RL agents (based on node status, and green / red IER success)
* Fully configurable (network / system laydown, IERs, node PoL, ACL, episode step period, episode max steps) and repeatable to suit the training requirements of agents. Therefore, not bound to a representation of any particular platform, system or technology
* Full capture of discrete metrics relating to agent training (full system state, agent actions taken, average reward)
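The ACL feature listed above follows the standard rule format (DENY/ALLOW, source IP, destination IP, protocol, port). The minimal sketch below shows how such a rule list might be applied to a single traffic flow. It assumes first-match evaluation, an ``ANY`` wildcard and an implicit deny when no rule matches. These semantics and the ``AclRule`` / ``is_permitted`` names are illustrative assumptions, not PrimAITE's implementation.

```python
from dataclasses import dataclass
from typing import List

ANY = "ANY"  # hypothetical wildcard that matches any value in a field


@dataclass(frozen=True)
class AclRule:
    """Hypothetical ACL rule: permission, source IP, destination IP, protocol, port."""

    permission: str  # "ALLOW" or "DENY"
    source_ip: str
    dest_ip: str
    protocol: str
    port: str

    def matches(self, source_ip: str, dest_ip: str, protocol: str, port: str) -> bool:
        # A field matches if it equals the traffic value or is the ANY wildcard.
        pairs = [
            (self.source_ip, source_ip),
            (self.dest_ip, dest_ip),
            (self.protocol, protocol),
            (self.port, port),
        ]
        return all(rule_value in (ANY, value) for rule_value, value in pairs)


def is_permitted(rules: List[AclRule], source_ip: str, dest_ip: str, protocol: str, port: str) -> bool:
    """Apply the first matching rule; deny if nothing matches (assumed behaviour)."""
    for rule in rules:
        if rule.matches(source_ip, dest_ip, protocol, port):
            return rule.permission == "ALLOW"
    return False


# Example rule set: block one host entirely, allow HTTP-style traffic to a server.
rules = [
    AclRule("DENY", "192.168.1.5", ANY, ANY, ANY),
    AclRule("ALLOW", ANY, "192.168.1.10", "TCP", "80"),
]
```

Because rules are checked in order, the DENY for ``192.168.1.5`` takes precedence over the broader ALLOW that follows it.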
@@ -201,7 +201,7 @@ An example observation space is provided below:
* -
- ID
- Hardware State
- SoftwareState
- Software State
- File System State
- Service / Protocol A
- Service / Protocol B
@@ -250,48 +250,35 @@ An example observation space is provided below:
For the nodes, the following values are represented:
* ID
* Hardware State:
.. code-block::
* 1 = ON
* 2 = OFF
* 3 = RESETTING
* 4 = SHUTTING_DOWN
* 5 = BOOTING
* SoftwareState:
* 1 = GOOD
* 2 = PATCHING
* 3 = COMPROMISED
* Service State:
* 1 = GOOD
* 2 = PATCHING
* 3 = COMPROMISED
* 4 = OVERWHELMED
* File System State:
* 1 = GOOD
* 2 = CORRUPT
* 3 = DESTROYED
* 4 = REPAIRING
* 5 = RESTORING
[
ID
Hardware State (1=ON, 2=OFF, 3=RESETTING, 4=SHUTTING_DOWN, 5=BOOTING)
Operating System State (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
File System State (0=none, 1=GOOD, 2=CORRUPT, 3=DESTROYED, 4=REPAIRING, 5=RESTORING)
Service1/Protocol1 state (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
Service2/Protocol2 state (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
]
(Note that each service available in the network is provided as a column, although not all nodes may utilise all services)
For the links, the following statuses are represented:
* ID
* Hardware State = N/A
* SoftwareState = N/A
* Protocol = loading in bits/s
.. code-block::
[
ID
Hardware State (0=not applicable)
Operating System State (0=not applicable)
File System State (0=not applicable)
Service1/Protocol1 state (Traffic load from this protocol on this link)
Service2/Protocol2 state (Traffic load from this protocol on this link)
]
NodeStatus component
----------------------
This is a MultiDiscrete observation space that can be though of as a one-dimensional vector of discrete states, represented by integers.
This is a MultiDiscrete observation space that can be thought of as a one-dimensional vector of discrete states.
The example above would have the following structure:
.. code-block::
@@ -307,9 +294,9 @@ Each ``node_info`` contains the following:
.. code-block::
[
hardware_state (0=none, 1=ON, 2=OFF, 3=RESETTING, 4=SHUTTING_DOWN, 5=BOOTING)
software_state (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
file_system_state (0=none, 1=GOOD, 2=CORRUPT, 3=DESTROYED, 4=REPAIRING, 5=RESTORING)
service1_state (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
service2_state (0=none, 1=GOOD, 2=PATCHING, 3=COMPROMISED)
]
@@ -320,10 +307,18 @@ In a network with three nodes and two services, the full observation space would
gym.spaces.MultiDiscrete([4,5,6,4,4,4,5,6,4,4,4,5,6,4,4])
.. note::
NodeStatus observation component provides information only about nodes. Links are not considered.
LinkTrafficLevels
-----------------
This component is a MultiDiscrete space showing the traffic flow levels on the links in the network, after applying a threshold to convert it from a continuous to a discrete value.
The number of bins can be customised with 5 being the default. It has the following strucutre:
There are two configurable parameters:
* ``quantisation_levels`` determines how many discrete bins to use for converting the continuous traffic value to discrete (default is 5).
* ``combine_service_traffic`` determines whether to output traffic separately for each network protocol or to combine it into an overall value for the link (default is ``True``).
For example, with default parameters and a network with three links, the structure of this component would be:
.. code-block::
[
@@ -337,16 +332,13 @@ Each ``link_status`` is a number from 0-4 representing the network load in relat
.. code-block::
0 = No traffic (0%)
1 = low traffic (<33%)
2 = medium traffic (<66%)
3 = high traffic (<100%)
1 = low traffic (1%-33%)
2 = medium traffic (33%-66%)
3 = high traffic (66%-99%)
4 = max traffic / overwhelmed (100%)
If the network has three links, the full observation space would have 3 elements. It can be written with ``gym`` notation to indicate the number of discrete options for each of the elements of the observation space. For example:
Using ``gym`` notation, the shape of the obs space is: ``gym.spaces.MultiDiscrete([5,5,5])``.
.. code-block::
gym.spaces.MultiDiscrete([5,5,5])
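The traffic bins listed above can be reproduced with a short sketch. It assumes that a link's load is compared against its bandwidth, that level 0 means no traffic, that the top level means the link is at or above capacity, and that the levels in between split the remaining range evenly. The ``quantise_traffic`` function is hypothetical and is not PrimAITE's code.

```python
def quantise_traffic(load: float, bandwidth: float, quantisation_levels: int = 5) -> int:
    """Map a continuous link load onto a discrete traffic level (hypothetical helper).

    Level 0 is "no traffic", the top level is "max traffic / overwhelmed",
    and the levels in between split (0%, 100%) into equal bands.
    """
    if quantisation_levels < 3:
        raise ValueError("need at least 3 levels: none, one partial band, and max")
    if load <= 0:
        return 0
    top_level = quantisation_levels - 1
    if load >= bandwidth:
        return top_level
    partial_bands = quantisation_levels - 2
    fraction = load / bandwidth
    # int() truncates towards zero, so with 5 levels a 20% load falls in band 0 -> level 1.
    return 1 + min(int(fraction * partial_bands), partial_bands - 1)
```

With the default of 5 levels there are three partial bands of roughly 33% each, matching the 1 (low), 2 (medium) and 3 (high) levels above.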
Action Spaces
**************


@@ -83,13 +83,24 @@ The environment config file consists of the following attributes:
The other configurable item is ``flatten``, which is ``False`` by default. When set to ``True``, the observation space is flattened (turned into a 1-D vector). You should use this if your RL agent does not natively support observation space types like ``gym.spaces.Tuple``.
* **num_episodes** [int]
* **num_train_episodes** [int]
This defines the number of episodes that the agent will train or be evaluated over.
This defines the number of episodes that the agent will train for.
* **num_steps** [int]
Determines the number of steps to run in each episode of the session
* **num_train_steps** [int]
Determines the number of steps to run in each episode of the training session.
* **num_eval_episodes** [int]
This defines the number of episodes that the agent will be evaluated over.
* **num_eval_steps** [int]
Determines the number of steps to run in each episode of the evaluation session.
* **time_delay** [int]
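The training and evaluation lengths are now configured independently. The sketch below mirrors the four new attributes and the defaults given to them in ``TrainingConfig`` later in this commit (10 training episodes, 256 training steps, 1 evaluation episode, 256 evaluation steps). The ``SessionConfig`` class and its methods are hypothetical illustrations, not PrimAITE code.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class SessionConfig:
    """Hypothetical stand-in for the episode/step length settings."""

    num_train_episodes: int = 10
    num_train_steps: int = 256
    num_eval_episodes: int = 1
    num_eval_steps: int = 256

    @classmethod
    def from_dict(cls, config: Dict[str, Any]) -> "SessionConfig":
        # Keep only the keys this class knows about; missing keys use the defaults.
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in config.items() if key in known})

    def total_steps(self, evaluating: bool) -> int:
        """Total environment steps in a training or evaluation run."""
        if evaluating:
            return self.num_eval_episodes * self.num_eval_steps
        return self.num_train_episodes * self.num_train_steps
```

For example, ``SessionConfig.from_dict({"num_train_episodes": 5})`` trains for 5 x 256 steps but still evaluates for the default single episode of 256 steps.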


@@ -2,64 +2,137 @@
=============
**Integrating a user defined blue agent**
Integrating a user defined blue agent
*************************************
Integrating a blue agent with PrimAITE requires some modification of the code within the main.py file. The main.py file
consists of a number of functions, each of which will invoke training for a particular agent. These are:
.. note::
* Generic (run_generic)
* Stable Baselines 3 PPO (:func:`~primaite.main.run_stable_baselines3_ppo)
* Stable Baselines 3 A2C (:func:`~primaite.main.run_stable_baselines3_a2c)
If you are planning to implement custom RL agents in PrimAITE, you must work from a clone of the project repository. If you install PrimAITE as a Python package from a wheel, custom agents are not supported.
The selection of which agent type to use is made via the training config file. In order to train a user generated agent,
the run_generic function should be selected, and should be modified (typically) to be:
PrimAITE has integrations with Ray RLlib and Stable Baselines3 agents. All agents interface with PrimAITE through an :py:class:`Agent Session <primaite.agents.agent.AgentSessionABC>`, which handles input/output of agent save files and captures and plots performance metrics during training and evaluation. If you wish to integrate a custom blue agent, it is recommended to create a subclass of :py:class:`primaite.agents.agent.AgentSessionABC` and implement the ``__init__()``, ``_setup()``, ``_save_checkpoint()``, ``learn()``, ``evaluate()``, ``_get_latest_checkpoint()``, ``load()``, and ``save()`` methods.
Below is a barebones example of a custom agent implementation:
.. code:: python
agent = MyAgent(environment, num_steps)
for episode in range(0, num_episodes):
agent.learn()
env.close()
save_agent(agent)
# src/primaite/agents/my_custom_agent.py
Where:
from primaite.agents.agent import AgentSessionABC
from primaite.common.enums import AgentFramework, AgentIdentifier
from primaite.environment.primaite_env import Primaite
* *MyAgent* is the user created agent
* *environment* is the :class:`~primaite.environment.primaite_env.Primaite` environment
* *num_episodes* is the number of episodes in the session, as defined in the training config file
* *num_steps* is the number of steps in an episode, as defined in the training config file
* the *.learn()* function should be defined in the user created agent
* the *env.close()* function is defined within PrimAITE
* the *save_agent()* assumes that a *save()* function has been defined in the user created agent. If not, this line can
be ommitted (although it is encouraged, since it will allow the agent to be saved and ported)
class CustomAgent(AgentSessionABC):
def __init__(self, training_config_path, lay_down_config_path):
super().__init__(training_config_path, lay_down_config_path)
assert self._training_config.agent_framework == AgentFramework.CUSTOM
assert self._training_config.agent_identifier == AgentIdentifier.CUSTOM_AGENT
self._setup()
The code below provides a suggested format for the learn() function within the user created agent.
It's important to include the *self.environment.reset()* call within the episode loop in order that the
environment is reset between episodes. Note that the example below should not be considered exhaustive.
def _setup(self):
super()._setup()
self._env = Primaite(
training_config_path=self._training_config_path,
lay_down_config_path=self._lay_down_config_path,
session_path=self.session_path,
timestamp_str=self.timestamp_str,
)
self._agent = ... # your code to setup agent
.. code:: python
def _save_checkpoint(self):
checkpoint_num = self._training_config.checkpoint_every_n_episodes
episode_count = self._env.episode_count
save_checkpoint = False
if checkpoint_num:
save_checkpoint = episode_count % checkpoint_num == 0
# saves checkpoint if the episode count is not 0 and save_checkpoint flag was set to true
if episode_count and save_checkpoint:
...
# your code to save checkpoint goes here.
# The path should start with self.checkpoints_path and include the episode number.
def learn(self) :
def learn(self):
...
# call your agent's learning function here.
# pre-reqs
super().learn() # this will finalise learning and output session metadata
self.save()
# reset the environment
self.environment.reset()
done = False
def evaluate(self):
...
# call your agent's evaluation function here.
for step in range(max_steps):
# calculate the action
action = ...
self._env.close()
super().evaluate()
# execute the environment step
new_state, reward, done, info = self.environment.step(action)
def _get_latest_checkpoint(self):
...
# Load an agent from file.
# algorithm updates
...
@classmethod
def load(cls, path):
...
# Create a CustomAgent object which loads model weights from file.
# update to our new state
state = new_state
def save(self):
...
# Call your agent's function that saves it to a file
# if done, finish episode
if done == True:
break
You will also need to modify :py:class:`PrimaiteSession <primaite.primaite_session.PrimaiteSession>` and :py:mod:`primaite.common.enums` to register your new agent identifier.
.. code-block:: python
:emphasize-lines: 17, 18
# src/primaite/common/enums.py
class AgentIdentifier(Enum):
"""The Red Agent algo/class."""
A2C = 1
"Advantage Actor Critic"
PPO = 2
"Proximal Policy Optimization"
HARDCODED = 3
"The Hardcoded agents"
DO_NOTHING = 4
"The DoNothing agents"
RANDOM = 5
"The RandomAgent"
DUMMY = 6
"The DummyAgent"
CUSTOM_AGENT = 7
"Your custom agent"
.. code-block:: python
:emphasize-lines: 3, 11, 12
# src/primaite/primaite_session.py
from primaite.agents.my_custom_agent import CustomAgent
# ...
def setup(self):
"""Performs the session setup."""
if self._training_config.agent_framework == AgentFramework.CUSTOM:
_LOGGER.debug(f"PrimaiteSession Setup: Agent Framework = {AgentFramework.CUSTOM}")
if self._training_config.agent_identifier == AgentIdentifier.CUSTOM_AGENT:
self._agent_session = CustomAgent(self._training_config_path, self._lay_down_config_path)
if self._training_config.agent_identifier == AgentIdentifier.HARDCODED:
_LOGGER.debug(f"PrimaiteSession Setup: Agent Identifier =" f" {AgentIdentifier.HARDCODED}")
if self._training_config.action_type == ActionType.NODE:
# Deterministic Hardcoded Agent with Node Action Space
self._agent_session = HardCodedNodeAgent(self._training_config_path, self._lay_down_config_path)
Finally, specify your agent in your training config.
.. code-block:: yaml
# ~/primaite/config/path/to/your/config_main.yaml
# Training Config File
agent_framework: CUSTOM
agent_identifier: CUSTOM_AGENT
random_red_agent: False
# ...
Now you can :ref:`run a primaite session<run a primaite session>` with your custom agent by passing in the custom ``config_main``.
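The checkpoint condition in the ``_save_checkpoint()`` example above can be restated as a small standalone predicate, which makes it easy to test on its own. In the sketch below, a ``checkpoint_every_n_episodes`` of 0 disables checkpoints and episode 0 never triggers one, as in the example. The ``should_save_checkpoint`` and ``checkpoint_path`` helpers are hypothetical. The example only requires the path to start with the checkpoints directory and include the episode number, so the file name format here is an assumption.

```python
from pathlib import Path


def should_save_checkpoint(episode_count: int, checkpoint_every_n_episodes: int) -> bool:
    """Mirror the checkpoint condition from the CustomAgent example (hypothetical helper)."""
    if not checkpoint_every_n_episodes:
        # 0 (or None) means no checkpoints are required.
        return False
    # Skip episode 0, then save on every n-th episode.
    return episode_count != 0 and episode_count % checkpoint_every_n_episodes == 0


def checkpoint_path(checkpoints_dir: Path, agent_name: str, episode_count: int) -> Path:
    """Build a checkpoint path under the checkpoints directory (assumed naming scheme)."""
    return Path(checkpoints_dir) / f"{agent_name}_{episode_count}"
```

Inside ``_save_checkpoint()``, the first function would replace the inline flag logic, and the second would be called with ``self.checkpoints_path`` and the current episode count.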

77
docs/source/glossary.rst Normal file

@@ -0,0 +1,77 @@
Glossary
=============
.. glossary::
:sorted:
Network
The network in PrimAITE is a logical representation of a computer network containing :term:`Nodes<Node>` and :term:`Links<Link>`.
Node
A Node represents a network endpoint. For example a computer, server, switch, or an actuator.
Link
A Link represents the connection between two Nodes. For example, a physical wire between a computer and a switch or a wireless connection.
Protocol
Protocols are used by links to separate different types of network traffic. Common examples would be HTTP, TCP, and UDP.
Service
A service represents a piece of software that is installed on a node, such as a web server or a database.
Access Control List
PrimAITE blocks or allows certain traffic on the network by simulating firewall rules, which are defined in the Access Control List.
Agent
An agent is a representation of a user of the network. Typically this would be a user that is using one of the computer nodes, though it could be an autonomous agent.
Green agent
Simulates typical benign activity on the network, such as real users using computers and servers.
Red Agent
An agent that aims to attack the network in some way, for example by executing a Denial-of-Service attack or stealing data.
Blue Agent
A defensive agent that protects the network from Red Agent attacks to minimise disruption to green agents and protect data.
Information Exchange Requirement (IER)
Simulates network traffic by sending data from one network node to another via links for a specified amount of time. IERs can be part of green agent behaviour or red agent behaviour. PrimAITE can be configured to apply a penalty for green agents' IERs being blocked and a reward for red agents' IERs being blocked.
Pattern-of-Life (PoL)
PoLs allow agents to change the current hardware, OS, file system, or service states of nodes during the course of an episode. For example, a green agent may restart a server node to represent scheduled maintenance. A red agent's Pattern-of-Life can be used to attack nodes by changing their states to CORRUPT or COMPROMISED.
Reward
The reward is a single number used by the blue agent to understand whether it is performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current states of the environment / :term:`reference environment` and is impacted positively by things like green IERs running successfully and negatively by things like nodes being compromised.
Observation
An observation is a representation of the current state of the environment that is given to the learning agent so it can decide on which action to perform. If the environment is 'fully observable', the observation contains information about every possible aspect of the environment. More commonly, the environment is 'partially observable' which means the learning agent has to make decisions without knowing every detail of the current environment state.
Action
The learning agent decides on an action to take on every step in the simulation. The action has the chance to positively or negatively impact the environment state. Over time, the agent aims to learn which actions to take, and when, to maximise the expected reward.
Training
During training, an RL agent is placed in the simulated network and it learns which actions to take in which scenarios to obtain maximum reward.
Evaluation
During evaluation, an RL agent acts on the simulated network but is not allowed to update its behaviour. Evaluation is used to assess how successful agents are at defending the network.
Step
The agents can only act in the environment at discrete intervals. The time step is the basic unit of time in the simulation. At each step, the RL agent has an opportunity to observe the state of the environment and decide on an action. Steps are also used for updating states for time-dependent activities such as rebooting a node.
Episode
When an episode starts, the network simulation is reset to an initial state. The agents take actions on each step of the episode until it reaches a terminal state, which usually happens after a predetermined number of steps. After the terminal state is reached, a new episode starts and the RL agent has another opportunity to protect the network.
Reference environment
While the network simulation is unfolding, a parallel simulation takes place which is identical to the main one except that blue and red agent actions are not applied. This reference environment essentially shows what would be happening to the network if there had been no cyberattack or defence. The reference environment is used to calculate rewards.
Transaction
PrimAITE records the decisions of the learning agent by saving its observation, action, and reward at every time step. During each session, this data is saved to disk to allow for full inspection.
Laydown
The laydown is a file which defines the training scenario. It contains the network topology, firewall rules, services, protocols, and details about green and red agent behaviours.
Gym
PrimAITE uses the Gym reinforcement learning framework API to create a training environment and interface with RL agents. Gym defines a common way of creating observations, actions, and rewards.
User data directory
PrimAITE supports upgrading the software version while retaining user data. The user data directory is where configs, notebooks, and results are stored. This location is ``~/primaite`` on Linux/macOS and ``C:\Users\<username>\primaite`` on Windows.
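The glossary terms Episode, Step, Observation, Action, Reward and Transaction fit together in a single Gym-style loop. The toy sketch below illustrates that relationship only. ``ToyEnv`` is a hypothetical stand-in with the classic ``reset()`` / ``step()`` signatures (a four-value step return, as used by the agent code elsewhere in this commit), the "agent" picks random actions, and none of it is PrimAITE's environment.

```python
import random
from typing import List, Tuple


class ToyEnv:
    """Hypothetical environment whose episodes end after a fixed number of steps."""

    def __init__(self, episode_steps: int):
        self.episode_steps = episode_steps
        self.step_count = 0

    def reset(self) -> int:
        # Start a new episode from the initial state.
        self.step_count = 0
        return 0

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.step_count += 1
        reward = 1.0 if action == 0 else -1.0  # arbitrary toy reward
        done = self.step_count >= self.episode_steps  # terminal state after N steps
        return self.step_count, reward, done, {}


def run(env: ToyEnv, num_episodes: int, seed: int = 0) -> List[Tuple[int, int, int, int, float]]:
    """Run episodes, recording one (episode, step, observation, action, reward) transaction per step."""
    rng = random.Random(seed)
    transactions = []
    for episode in range(1, num_episodes + 1):
        obs = env.reset()
        done = False
        while not done:
            action = rng.choice([0, 1])
            new_obs, reward, done, _ = env.step(action)
            # Record the observation the agent saw when it chose its action.
            transactions.append((episode, env.step_count, obs, action, reward))
            obs = new_obs
    return transactions
```

Running ``run(ToyEnv(4), 3)`` produces 12 transactions: three episodes of four steps each.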


@@ -0,0 +1,51 @@
v1.2 to v2.0 Migration guide
============================
**1. Installing PrimAITE**
As before, you can install PrimAITE from the repository by running ``pip install -e .``. However, there is now an additional setup step that sets up the user directories and copies the default configs and notebooks, among other things. Once you have installed PrimAITE into your virtual environment, run this command to finalise setup:
.. code-block:: bash
primaite setup
**2. Running a training session**
In version 1.2 of PrimAITE, the main entry point for training or evaluating agents was the ``src/primaite/main.py`` file. v2.0.0 introduced managed 'sessions' which are responsible for reading configuration files, performing training, and writing outputs.
The ``main.py`` file still runs a training session, but it now uses the new ``PrimaiteSession`` and requires you to provide the paths to your config files:
.. code-block:: bash
python src/primaite/main.py --tc path/to/training-config.yaml --ldc path/to/laydown-config.yaml
Alternatively, the session can be invoked from the command line by running:
.. code-block:: bash
primaite session --tc path/to/training-config.yaml --ldc path/to/laydown-config.yaml
**3. Location of configs**
In version 1.2, training configs and laydown configs were all stored in the project repository under ``src/primaite/config``. Version 2.0.0 introduced user data directories: when you install and set up PrimAITE, config files are stored in your user data location. On Linux/macOS, this is ``~/primaite/config``. On Windows, this is ``C:\Users\<your username>\primaite\config``. Upon first setup, the config folder is populated with some default YAML files. It is recommended that you store all your custom configuration files here.
**4. Contents of configs**
Some items that were previously part of the laydown config are now part of the training config:
* Actions
If you have custom configs which use these, you will need to adapt them by moving the configuration from the laydown config to the training config.
Also, there are new configurable items in the training config:
* Observations
* Agent framework
* Agent
* Deep learning framework
* Random red agents
* Seed
* Deterministic
* Hard-coded agent view
Each of these items has a default value designed so that PrimAITE behaves as it did in 1.2.0, so you do not have to specify them.


@@ -1,3 +1,5 @@
.. _run a primaite session:
Run a PrimAITE Session
======================
@@ -78,9 +80,9 @@ PrimAITE automatically creates two sets of results from each session:
* Timestamp
* Episode number
* Step number
* Initial observation space (what the blue agent observed when it decided its action)
* Reward value
* Action taken (as presented by the blue agent on this step). Individual elements of the action space are presented in the format AS_X
* Initial observation space (what the blue agent observed when it decided its action)
**Diagrams**


@@ -162,12 +162,11 @@ class AgentSessionABC(ABC):
metadata_dict = json.load(file)
metadata_dict["end_datetime"] = datetime.now().isoformat()
if not self.is_eval:
metadata_dict["learning"]["total_episodes"] = self._env.episode_count # noqa
metadata_dict["learning"]["total_episodes"] = self._env.actual_episode_count # noqa
metadata_dict["learning"]["total_time_steps"] = self._env.total_step_count # noqa
else:
metadata_dict["evaluation"]["total_episodes"] = self._env.episode_count # noqa
metadata_dict["evaluation"]["total_episodes"] = self._env.actual_episode_count # noqa
metadata_dict["evaluation"]["total_time_steps"] = self._env.total_step_count # noqa
filepath = self.session_path / "session_metadata.json"
@@ -218,10 +217,11 @@ class AgentSessionABC(ABC):
:param kwargs: Any agent-specific key-word args to be passed.
"""
self._env.set_as_eval() # noqa
self.is_eval = True
self._plot_av_reward_per_episode(learning_session=False)
_LOGGER.info("Finished evaluation")
if self._can_evaluate:
self._plot_av_reward_per_episode(learning_session=False)
self._update_session_metadata_file()
self.is_eval = True
_LOGGER.info("Finished evaluation")
@abstractmethod
def _get_latest_checkpoint(self):
@@ -375,8 +375,8 @@ class HardCodedAgentSessionABC(AgentSessionABC):
self._env.set_as_eval() # noqa
self.is_eval = True
time_steps = self._training_config.num_steps
episodes = self._training_config.num_episodes
time_steps = self._training_config.num_eval_steps
episodes = self._training_config.num_eval_episodes
obs = self._env.reset()
for episode in range(episodes):
@@ -395,6 +395,7 @@ class HardCodedAgentSessionABC(AgentSessionABC):
time.sleep(self._training_config.time_delay / 1000)
obs = self._env.reset()
self._env.close()
super().evaluate()
@classmethod
def load(cls):
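The session metadata change above records episode and step totals under a ``learning`` or an ``evaluation`` key, depending on whether the session is evaluating. A minimal standard-library sketch of that update, operating on a plain dict rather than the ``session_metadata.json`` file. The ``update_session_metadata`` function is hypothetical.

```python
from datetime import datetime
from typing import Any, Dict


def update_session_metadata(
    metadata: Dict[str, Any], is_eval: bool, total_episodes: int, total_time_steps: int
) -> Dict[str, Any]:
    """Record the end time and totals under 'learning' or 'evaluation' (hypothetical helper)."""
    metadata["end_datetime"] = datetime.now().isoformat()
    section = "evaluation" if is_eval else "learning"
    # setdefault guards against metadata that does not yet contain the section.
    metadata.setdefault(section, {})
    metadata[section]["total_episodes"] = total_episodes
    metadata[section]["total_time_steps"] = total_time_steps
    return metadata
```

Keeping the two sections separate means an evaluation run no longer overwrites the training totals, which the previous single set of keys did.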


@@ -97,8 +97,12 @@ class RLlibAgent(AgentSessionABC):
metadata_dict = json.load(file)
metadata_dict["end_datetime"] = datetime.now().isoformat()
metadata_dict["total_episodes"] = self._current_result["episodes_total"]
metadata_dict["total_time_steps"] = self._current_result["timesteps_total"]
if not self.is_eval:
metadata_dict["learning"]["total_episodes"] = self._current_result["episodes_total"] # noqa
metadata_dict["learning"]["total_time_steps"] = self._current_result["timesteps_total"] # noqa
else:
metadata_dict["evaluation"]["total_episodes"] = self._current_result["episodes_total"] # noqa
metadata_dict["evaluation"]["total_time_steps"] = self._current_result["timesteps_total"] # noqa
filepath = self.session_path / "session_metadata.json"
_LOGGER.debug(f"Updating Session Metadata file: {filepath}")
@@ -122,13 +126,13 @@ class RLlibAgent(AgentSessionABC):
)
self._agent_config.seed = self._training_config.seed
self._agent_config.training(train_batch_size=self._training_config.num_steps)
self._agent_config.training(train_batch_size=self._training_config.num_train_steps)
self._agent_config.framework(framework="tf")
self._agent_config.rollouts(
num_rollout_workers=1,
num_envs_per_worker=1,
horizon=self._training_config.num_steps,
horizon=self._training_config.num_train_steps,
)
self._agent: Algorithm = self._agent_config.build(logger_creator=_custom_log_creator(self.learning_path))
@@ -150,8 +154,8 @@ class RLlibAgent(AgentSessionABC):
:param kwargs: Any agent-specific key-word args to be passed.
"""
time_steps = self._training_config.num_steps
episodes = self._training_config.num_episodes
time_steps = self._training_config.num_train_steps
episodes = self._training_config.num_train_episodes
_LOGGER.info(f"Beginning learning for {episodes} episodes @" f" {time_steps} time steps...")
for i in range(episodes):
@@ -162,9 +166,6 @@ class RLlibAgent(AgentSessionABC):
super().learn()
# save agent
self.save()
def evaluate(
self,
**kwargs,


@@ -65,11 +65,12 @@ class SB3Agent(AgentSessionABC):
session_path=self.session_path,
timestamp_str=self.timestamp_str,
)
self._agent = self._agent_class(
PPOMlp,
self._env,
verbose=self.sb3_output_verbose_level,
n_steps=self._training_config.num_steps,
n_steps=self._training_config.num_train_steps,
tensorboard_log=str(self._tensorboard_log_path),
seed=self._training_config.seed,
)
@@ -97,14 +98,14 @@ class SB3Agent(AgentSessionABC):
:param kwargs: Any agent-specific key-word args to be passed.
"""
time_steps = self._training_config.num_steps
episodes = self._training_config.num_episodes
time_steps = self._training_config.num_train_steps
episodes = self._training_config.num_train_episodes
self.is_eval = False
_LOGGER.info(f"Beginning learning for {episodes} episodes @" f" {time_steps} time steps...")
for i in range(episodes):
self._agent.learn(total_timesteps=time_steps)
self._save_checkpoint()
self._env.reset()
self._env._write_av_reward_per_episode() # noqa
self.save()
self._env.close()
super().learn()
@@ -121,8 +122,8 @@ class SB3Agent(AgentSessionABC):
:param kwargs: Any agent-specific key-word args to be passed.
"""
time_steps = self._training_config.num_steps
episodes = self._training_config.num_episodes
time_steps = self._training_config.num_eval_steps
episodes = self._training_config.num_eval_episodes
self._env.set_as_eval()
self.is_eval = True
if self._training_config.deterministic:
@@ -140,7 +141,7 @@ class SB3Agent(AgentSessionABC):
if isinstance(action, np.ndarray):
action = np.int64(action)
obs, rewards, done, info = self._env.step(action)
self._env.reset()
self._env._write_av_reward_per_episode() # noqa
self._env.close()
super().evaluate()


@@ -59,11 +59,19 @@ observation_space:
- name: NODE_LINK_TABLE
# - name: NODE_STATUSES
# - name: LINK_TRAFFIC_LEVELS
# Number of episodes to run per session
num_episodes: 10
# Number of time_steps per episode
num_steps: 256
# Number of episodes for training to run per session
num_train_episodes: 10
# Number of time_steps for training per episode
num_train_steps: 256
# Number of episodes for evaluation to run per session
num_eval_episodes: 1
# Number of time_steps for evaluation per episode
num_eval_steps: 256
# Sets how often the agent will save a checkpoint (every n time episodes).
# Set to 0 if no checkpoints are required. Default is 10


@@ -61,11 +61,17 @@ class TrainingConfig:
action_type: ActionType = ActionType.ANY
"The ActionType to use"
num_episodes: int = 10
"The number of episodes to train over"
num_train_episodes: int = 10
"The number of episodes to train over during a training session"
num_steps: int = 256
"The number of steps in an episode"
num_train_steps: int = 256
"The number of steps in an episode during a training session"
num_eval_episodes: int = 1
"The number of episodes to evaluate over during an evaluation session"
num_eval_steps: int = 256
"The number of steps in an episode during an evaluation session"
checkpoint_every_n_episodes: int = 5
"The agent will save a checkpoint every n episodes"
@@ -249,8 +255,17 @@ class TrainingConfig:
tc += f"{self.hard_coded_agent_view}, "
tc += f"{self.action_type}, "
tc += f"observation_space={self.observation_space}, "
tc += f"{self.num_episodes} episodes @ "
tc += f"{self.num_steps} steps"
if self.session_type is SessionType.TRAIN:
tc += f"{self.num_train_episodes} episodes @ "
tc += f"{self.num_train_steps} steps"
elif self.session_type is SessionType.EVAL:
tc += f"{self.num_eval_episodes} episodes @ "
tc += f"{self.num_eval_steps} steps"
else:
tc += f"Training: {self.num_train_episodes} episodes @ "
tc += f"{self.num_train_steps} steps, "
tc += f"Evaluation: {self.num_eval_episodes} episodes @ "
tc += f"{self.num_eval_steps} steps"
return tc
@@ -298,24 +313,27 @@ def convert_legacy_training_config_dict(
agent_framework: AgentFramework = AgentFramework.SB3,
agent_identifier: AgentIdentifier = AgentIdentifier.PPO,
action_type: ActionType = ActionType.ANY,
num_steps: int = 256,
num_train_steps: int = 256,
) -> Dict[str, Any]:
"""
Convert a legacy training config dict to the new format.
:param legacy_config_dict: A legacy training config dict.
:param agent_framework: The agent framework to use as legacy training configs don't have agent_framework values.
:param agent_identifier: The red agent identifier to use as legacy training configs don't have agent_identifier
values.
:param action_type: The action space type to set as legacy training configs don't have action_type values.
:param num_steps: The number of steps to set as legacy training configs don't have num_steps values.
:param agent_framework: The agent framework to use as legacy training
configs don't have agent_framework values.
:param agent_identifier: The red agent identifier to use as legacy
training configs don't have agent_identifier values.
:param action_type: The action space type to set as legacy training configs
don't have action_type values.
:param num_train_steps: The number of steps to set as legacy training configs
don't have num_train_steps values.
:return: The converted training config dict.
"""
config_dict = {
"agent_framework": agent_framework.name,
"agent_identifier": agent_identifier.name,
"action_type": action_type.name,
"num_steps": num_steps,
"num_train_steps": num_train_steps,
"sb3_output_verbose_level": SB3OutputVerboseLevel.INFO.name,
}
session_type_map = {"TRAINING": "TRAIN", "EVALUATION": "EVAL"}
@@ -336,7 +354,8 @@ def _get_new_key_from_legacy(legacy_key: str) -> str:
"""
key_mapping = {
"agentIdentifier": None,
"numEpisodes": "num_episodes",
"numEpisodes": "num_train_episodes",
"numSteps": "num_train_steps",
"timeDelay": "time_delay",
"configFilename": None,
"sessionType": "session_type",

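A minimal sketch of how the updated legacy key mapping behaves. It assumes a cut-down copy of the mapping containing only the keys visible in this hunk (the real table in `_get_new_key_from_legacy` continues beyond it) plus the `session_type_map` values shown earlier. The helper `convert_legacy_keys` is hypothetical, and passing unmapped keys through unchanged is a choice made for this sketch only. Keys mapped to `None` are dropped, and legacy `numEpisodes`/`numSteps` now land on the training-specific fields.

```python
from typing import Any, Dict, Optional

# Partial copy of the legacy-to-new key mapping visible in the hunk above.
# A value of None means the key has no equivalent and is dropped.
LEGACY_KEY_MAPPING: Dict[str, Optional[str]] = {
    "agentIdentifier": None,
    "numEpisodes": "num_train_episodes",
    "numSteps": "num_train_steps",
    "timeDelay": "time_delay",
    "configFilename": None,
    "sessionType": "session_type",
}

# Legacy session type names and their new equivalents
SESSION_TYPE_MAP = {"TRAINING": "TRAIN", "EVALUATION": "EVAL"}


def convert_legacy_keys(legacy: Dict[str, Any]) -> Dict[str, Any]:
    """Rename legacy keys; keys missing from the mapping pass through unchanged."""
    converted: Dict[str, Any] = {}
    for legacy_key, value in legacy.items():
        new_key = LEGACY_KEY_MAPPING.get(legacy_key, legacy_key)
        if new_key is None:
            continue  # no new-format equivalent
        if new_key == "session_type":
            value = SESSION_TYPE_MAP.get(value, value)
        converted[new_key] = value
    return converted
```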
View File

@@ -84,7 +84,12 @@ class Primaite(Env):
_LOGGER.info(f"Using: {str(self.training_config)}")
# Number of steps in an episode
self.episode_steps = self.training_config.num_steps
if self.training_config.session_type == SessionType.TRAIN:
self.episode_steps = self.training_config.num_train_steps
elif self.training_config.session_type == SessionType.EVAL:
self.episode_steps = self.training_config.num_eval_steps
else:
self.episode_steps = self.training_config.num_train_steps
super(Primaite, self).__init__()
@@ -259,6 +264,12 @@ class Primaite(Env):
self.episode_count = 0
self.step_count = 0
self.total_step_count = 0
self.episode_steps = self.training_config.num_eval_steps
def _write_av_reward_per_episode(self):
if self.actual_episode_count > 0:
csv_data = self.actual_episode_count, self.average_reward
self.episode_av_reward_writer.write(csv_data)
def reset(self):
"""
@@ -267,10 +278,7 @@ class Primaite(Env):
Returns:
Environment observation space (reset)
"""
if self.actual_episode_count > 0:
csv_data = self.actual_episode_count, self.average_reward
self.episode_av_reward_writer.write(csv_data)
self._write_av_reward_per_episode()
self.episode_count += 1
# Don't need to reset links, as they are cleared and recalculated every

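The step budget chosen in `__init__`, together with the counter reset that switches `episode_steps` to `num_eval_steps`, can be modelled in a few lines. `EpisodeCounter` and its method names are hypothetical (the name of the method holding the reset is not visible in the hunk). The sketch assumes, as the diff suggests, that a TRAIN_EVAL session starts on the training budget and switches to the evaluation budget, with counters zeroed, before evaluation. It also assumes that every episode runs to its full budget.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodeCounter:
    """Hypothetical model of the env's per-episode step budget and counters."""

    session_type: str  # "TRAIN", "EVAL" or "TRAIN_EVAL"
    num_train_steps: int
    num_eval_steps: int
    episode_count: int = 0
    total_step_count: int = 0
    episode_steps: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        # EVAL sessions use the evaluation budget; TRAIN and TRAIN_EVAL
        # sessions start with the training budget.
        if self.session_type == "EVAL":
            self.episode_steps = self.num_eval_steps
        else:
            self.episode_steps = self.num_train_steps

    def switch_to_eval(self) -> None:
        """Reset the counters and adopt the evaluation step budget."""
        self.episode_count = 0
        self.total_step_count = 0
        self.episode_steps = self.num_eval_steps

    def run_episode(self) -> None:
        """Run one episode to its full step budget."""
        self.episode_count += 1
        self.total_step_count += self.episode_steps
```

Zeroing the counters on the switch is what lets the evaluation totals be checked independently of the training totals.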
View File

@@ -90,7 +90,6 @@ def calculate_reward_function(
f"Penalty of {ier_reward} was NOT applied."
)
)
return reward_value

View File

@@ -15,5 +15,6 @@ def av_rewards_dict(av_rewards_csv_file: Union[str, Path]) -> Dict[int, float]:
:param av_rewards_csv_file: The average rewards per episode csv file path.
:return: The average rewards per episode csv as a dict.
"""
d = pl.read_csv(av_rewards_csv_file).to_dict()
return {v: d["Average Reward"][i] for i, v in enumerate(d["Episode"])}
df = pl.read_csv(av_rewards_csv_file).to_dict()
return {v: df["Average Reward"][i] for i, v in enumerate(df["Episode"])}
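`av_rewards_dict` turns the two-column average-reward CSV into an `{episode: average_reward}` mapping. The sketch below does the same with the standard-library `csv` module instead of polars. This is a deliberate swap so the example runs without extra dependencies, not PrimAITE's implementation. It assumes the column headers are exactly `Episode` and `Average Reward`, as the polars version does, and the name `av_rewards_from_csv_text` is hypothetical.

```python
import csv
import io
from typing import Dict


def av_rewards_from_csv_text(csv_text: str) -> Dict[int, float]:
    """Map each episode number to its average reward."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["Episode"]): float(row["Average Reward"]) for row in reader}


sample = "Episode,Average Reward\n1,-8.0\n2,-6.5\n"
print(av_rewards_from_csv_text(sample))  # {1: -8.0, 2: -6.5}
```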

View File

@@ -20,10 +20,12 @@ agent_identifier: PPO
# "ACL"
# "ANY" node and acl actions
action_type: ANY
# Number of episodes to run per session
num_episodes: 10
# Number of time_steps per episode
num_steps: 256
# Number of episodes for training to run per session
num_train_episodes: 10
# Number of time_steps for training per episode
num_train_steps: 256
# Time delay between steps (for generic agents)
time_delay: 10
# Type of session to be run (TRAINING or EVALUATION)

View File

@@ -22,11 +22,11 @@ agent_identifier: A2C
# "ACL"
# "ANY" node and acl actions
action_type: ANY
# Number of episodes to run per session
num_episodes: 1
# Number of time_steps per episode
num_steps: 5
# Number of episodes for training to run per session
num_train_episodes: 1
# Number of time_steps for training per episode
num_train_steps: 5
observation_space:
components:

View File

@@ -22,10 +22,11 @@ agent_identifier: RANDOM
# "ACL"
# "ANY" node and acl actions
action_type: ANY
# Number of episodes to run per session
num_episodes: 1
# Number of time_steps per episode
num_steps: 5
# Number of episodes for training to run per session
num_train_episodes: 1
# Number of time_steps for training per episode
num_train_steps: 5
observation_space:
components:

View File

@@ -22,10 +22,12 @@ agent_identifier: RANDOM
# "ACL"
# "ANY" node and acl actions
action_type: ANY
# Number of episodes to run per session
num_episodes: 1
# Number of time_steps per episode
num_steps: 5
# Number of episodes for training to run per session
num_train_episodes: 1
# Number of time_steps for training per episode
num_train_steps: 5
observation_space:
components:

View File

@@ -22,10 +22,11 @@ agent_identifier: RANDOM
# "ACL"
# "ANY" node and acl actions
action_type: ANY
# Number of episodes to run per session
num_episodes: 1
# Number of time_steps per episode
num_steps: 5
# Number of episodes for training to run per session
num_train_episodes: 1
# Number of time_steps for training per episode
num_train_steps: 5
# Time delay between steps (for generic agents)
time_delay: 1
# Type of session to be run (TRAINING or EVALUATION)

View File

@@ -18,11 +18,6 @@
- name: ftp
port: '21'
state: GOOD
- item_type: POSITION
positions:
- node: '1'
x_pos: 309
y_pos: 78
- item_type: RED_POL
id: '1'
start_step: 1

View File

@@ -22,10 +22,13 @@ agent_identifier: DUMMY
# "ACL"
# "ANY" node and acl actions
action_type: NODE
# Number of episodes to run per session
num_episodes: 1
# Number of time_steps per episode
num_steps: 15
# Number of episodes for evaluation to run per session
num_eval_episodes: 1
# Number of time_steps for evaluation per episode
num_eval_steps: 15
# Time delay between steps (for generic agents)
time_delay: 1

View File

@@ -60,10 +60,16 @@ observation_space:
# - name: NODE_STATUSES
# - name: LINK_TRAFFIC_LEVELS
# Number of episodes for training to run per session
num_episodes: 10
num_train_episodes: 10
# Number of time_steps for training per episode
num_steps: 256
num_train_steps: 256
# Number of episodes for evaluation to run per session
num_eval_episodes: 10
# Number of time_steps for evaluation per episode
num_eval_steps: 256
# Sets how often the agent will save a checkpoint (every n episodes).
# Set to 0 if no checkpoints are required. Default is 10

View File

@@ -60,10 +60,16 @@ observation_space:
# - name: NODE_STATUSES
# - name: LINK_TRAFFIC_LEVELS
# Number of episodes for training to run per session
num_episodes: 10
num_train_episodes: 10
# Number of time_steps for training per episode
num_steps: 256
num_train_steps: 256
# Number of episodes for evaluation to run per session
num_eval_episodes: 1
# Number of time_steps for evaluation per episode
num_eval_steps: 256
# Sets how often the agent will save a checkpoint (every n episodes).
# Set to 0 if no checkpoints are required. Default is 10

View File

@@ -22,10 +22,12 @@ agent_identifier: RANDOM
# "ACL"
# "ANY" node and acl actions
action_type: ANY
# Number of episodes to run per session
num_episodes: 1
# Number of time_steps per episode
num_steps: 15
# Number of episodes for training to run per session
num_train_episodes: 1
# Number of time_steps for training per episode
num_train_steps: 15
# Time delay between steps (for generic agents)
time_delay: 1
# Type of session to be run (TRAINING or EVALUATION)

View File

@@ -32,14 +32,6 @@
- name: TCP
port: '80'
state: COMPROMISED
- item_type: POSITION
positions:
- node: '1'
x_pos: 309
y_pos: 78
- node: '2'
x_pos: 200
y_pos: 78
- item_type: RED_IER
id: '3'
start_step: 2

View File

@@ -22,10 +22,17 @@ agent_identifier: RANDOM
# "ACL"
# "ANY" node and acl actions
action_type: ANY
# Number of episodes to run per session
num_episodes: 1
# Number of time_steps per episode
num_steps: 5
# Number of episodes for training to run per session
num_train_episodes: 10
# Number of time_steps for training per episode
num_train_steps: 256
# Number of episodes for evaluation to run per session
num_eval_episodes: 10
# Number of time_steps for evaluation per episode
num_eval_steps: 256
# Time delay between steps (for generic agents)
time_delay: 1
# Type of session to be run (TRAINING or EVALUATION)

View File

@@ -28,10 +28,17 @@ random_red_agent: True
# "ACL"
# "ANY" node and acl actions
action_type: NODE
# Number of episodes to run per session
num_episodes: 2
# Number of time_steps per episode
num_steps: 15
# Number of episodes for training to run per session
num_train_episodes: 2
# Number of time_steps for training per episode
num_train_steps: 15
# Number of episodes for evaluation to run per session
num_eval_episodes: 2
# Number of time_steps for evaluation per episode
num_eval_steps: 15
# Time delay between steps (for generic agents)
time_delay: 1

View File

@@ -0,0 +1,153 @@
# Training Config File
# Sets which agent algorithm framework will be used.
# Options are:
# "SB3" (Stable Baselines3)
# "RLLIB" (Ray RLlib)
# "CUSTOM" (Custom Agent)
agent_framework: SB3
# Sets which deep learning framework will be used (by RLlib ONLY).
# Default is TF (Tensorflow).
# Options are:
# "TF" (Tensorflow)
# "TF2" (Tensorflow 2.X)
# "TORCH" (PyTorch)
deep_learning_framework: TF2
# Sets which Agent class will be used.
# Options are:
# "A2C" (Advantage Actor Critic coupled with either SB3 or RLLIB agent_framework)
# "PPO" (Proximal Policy Optimization coupled with either SB3 or RLLIB agent_framework)
# "HARDCODED" (The HardCoded agents coupled with an ACL or NODE action_type)
# "DO_NOTHING" (The DoNothing agents coupled with an ACL or NODE action_type)
# "RANDOM" (primaite.agents.simple.RandomAgent)
# "DUMMY" (primaite.agents.simple.DummyAgent)
agent_identifier: PPO
# Sets whether the red agent PoL and IERs are randomised.
# Options are:
# True
# False
random_red_agent: False
# Sets what view of the environment the deterministic hardcoded agent has. The default is BASIC.
# Options are:
# "BASIC" (The current observation space only)
# "FULL" (Full environment view with actions taken and reward feedback)
hard_coded_agent_view: FULL
# Sets how the action space is defined:
# "NODE"
# "ACL"
# "ANY" node and acl actions
action_type: NODE
# observation space
observation_space:
# flatten: true
components:
- name: NODE_LINK_TABLE
# - name: NODE_STATUSES
# - name: LINK_TRAFFIC_LEVELS
# Number of episodes for training to run per session
num_train_episodes: 3
# Number of time_steps for training per episode
num_train_steps: 25
# Number of episodes for evaluation to run per session
num_eval_episodes: 1
# Number of time_steps for evaluation per episode
num_eval_steps: 17
# Sets how often the agent will save a checkpoint (every n episodes).
# Set to 0 if no checkpoints are required. Default is 10
checkpoint_every_n_episodes: 0
# Time delay (milliseconds) between steps for CUSTOM agents.
time_delay: 5
# Type of session to be run. Options are:
# "TRAIN" (Trains an agent)
# "EVAL" (Evaluates an agent)
# "TRAIN_EVAL" (Trains then evaluates an agent)
session_type: TRAIN_EVAL
# Environment config values
# The high value for the observation space
observation_space_high_value: 1000000000
# The Stable Baselines3 learn/eval output verbosity level:
# Options are:
# "NONE" (No Output)
# "INFO" (Info Messages (such as devices and wrappers used))
# "DEBUG" (All Messages)
sb3_output_verbose_level: NONE
# Reward values
# Generic
all_ok: 0
# Node Hardware State
off_should_be_on: -10
off_should_be_resetting: -5
on_should_be_off: -2
on_should_be_resetting: -5
resetting_should_be_on: -5
resetting_should_be_off: -2
resetting: -3
# Node Software or Service State
good_should_be_patching: 2
good_should_be_compromised: 5
good_should_be_overwhelmed: 5
patching_should_be_good: -5
patching_should_be_compromised: 2
patching_should_be_overwhelmed: 2
patching: -3
compromised_should_be_good: -20
compromised_should_be_patching: -20
compromised_should_be_overwhelmed: -20
compromised: -20
overwhelmed_should_be_good: -20
overwhelmed_should_be_patching: -20
overwhelmed_should_be_compromised: -20
overwhelmed: -20
# Node File System State
good_should_be_repairing: 2
good_should_be_restoring: 2
good_should_be_corrupt: 5
good_should_be_destroyed: 10
repairing_should_be_good: -5
repairing_should_be_restoring: 2
repairing_should_be_corrupt: 2
repairing_should_be_destroyed: 0
repairing: -3
restoring_should_be_good: -10
restoring_should_be_repairing: -2
restoring_should_be_corrupt: 1
restoring_should_be_destroyed: 2
restoring: -6
corrupt_should_be_good: -10
corrupt_should_be_repairing: -10
corrupt_should_be_restoring: -10
corrupt_should_be_destroyed: 2
corrupt: -10
destroyed_should_be_good: -20
destroyed_should_be_repairing: -20
destroyed_should_be_restoring: -20
destroyed_should_be_corrupt: -20
destroyed: -20
scanning: -2
# IER status
red_ier_running: -5
green_ier_blocked: -10
# Patching / Reset durations
os_patching_duration: 5 # The time taken to patch the OS
node_reset_duration: 5 # The time taken to reset a node (hardware)
service_patching_duration: 5 # The time taken to patch a service
file_system_repairing_limit: 5 # The time taken to repair the file system
file_system_restoring_limit: 5 # The time taken to restore the file system
file_system_scanning_limit: 5 # The time taken to scan the file system
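With `session_type: TRAIN_EVAL`, this file carries both the training and the evaluation counts. A hypothetical pre-flight check for that is sketched below. It works on an already-parsed dict (YAML loading is omitted to avoid a PyYAML dependency). PrimAITE's own `TrainingConfig` may instead fall back to default values for missing keys, so the required-key table is an assumption of the sketch, not documented behaviour.

```python
from typing import Any, Dict, List

# Episode/step keys each session type would need (an assumption for this sketch)
REQUIRED_COUNT_KEYS: Dict[str, List[str]] = {
    "TRAIN": ["num_train_episodes", "num_train_steps"],
    "EVAL": ["num_eval_episodes", "num_eval_steps"],
    "TRAIN_EVAL": [
        "num_train_episodes",
        "num_train_steps",
        "num_eval_episodes",
        "num_eval_steps",
    ],
}


def missing_count_keys(config: Dict[str, Any]) -> List[str]:
    """Return the count keys the config's session_type needs but lacks."""
    required = REQUIRED_COUNT_KEYS[config["session_type"]]
    return [key for key in required if key not in config]


# Values taken from the training config above
cfg = {
    "session_type": "TRAIN_EVAL",
    "num_train_episodes": 3,
    "num_train_steps": 25,
    "num_eval_episodes": 1,
    "num_eval_steps": 17,
}
print(missing_count_keys(cfg))  # []
```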

View File

@@ -1,17 +1,16 @@
# Crown Copyright (C) Dstl 2022. DEFCON 703. Shared in confidence.
import datetime
import json
import shutil
import tempfile
import time
from datetime import datetime
from pathlib import Path
from typing import Dict, Union
from typing import Any, Dict, Union
from unittest.mock import patch
import pytest
from primaite import getLogger
from primaite.common.enums import AgentIdentifier
from primaite.environment.primaite_env import Primaite
from primaite.primaite_session import PrimaiteSession
from primaite.utils.session_output_reader import av_rewards_dict
@@ -48,6 +47,11 @@ class TempPrimaiteSession(PrimaiteSession):
csv_file = f"average_reward_per_episode_{self.timestamp_str}.csv"
return av_rewards_dict(self.evaluation_path / csv_file)
def metadata_file_as_dict(self) -> Dict[str, Any]:
"""Read the session_metadata.json file and return as a dict."""
with open(self.session_path / "session_metadata.json", "r") as file:
return json.load(file)
@property
def env(self) -> Primaite:
"""Direct access to the env for ease of testing."""
@@ -58,6 +62,7 @@ class TempPrimaiteSession(PrimaiteSession):
def __exit__(self, type, value, tb):
shutil.rmtree(self.session_path)
shutil.rmtree(self.session_path.parent)
_LOGGER.debug(f"Deleted temp session directory: {self.session_path}")
@@ -129,58 +134,3 @@ def temp_session_path() -> Path:
session_path.mkdir(exist_ok=True, parents=True)
return session_path
def _get_primaite_env_from_config(
training_config_path: Union[str, Path],
lay_down_config_path: Union[str, Path],
temp_session_path,
):
"""Takes a config path and returns the created instance of Primaite."""
session_timestamp: datetime = datetime.now()
session_path = temp_session_path(session_timestamp)
timestamp_str = session_timestamp.strftime("%Y-%m-%d_%H-%M-%S")
env = Primaite(
training_config_path=training_config_path,
lay_down_config_path=lay_down_config_path,
session_path=session_path,
timestamp_str=timestamp_str,
)
config_values = env.training_config
config_values.num_steps = env.episode_steps
# TODO: This needs to be refactored to happen outside. It should be part of
# a main Session class.
if env.training_config.agent_identifier is AgentIdentifier.RANDOM:
run_generic(env, config_values)
return env
def run_generic(env, config_values):
"""Run against a generic agent."""
# Reset the environment at the start of the episode
# env.reset()
for episode in range(0, config_values.num_episodes):
for step in range(0, config_values.num_steps):
# Send the observation space to the agent to get an action
# TEMP - random action for now
# action = env.blue_agent_action(obs)
# action = env.action_space.sample()
action = 0
# Run the simulation step on the live environment
obs, reward, done, info = env.step(action)
# Break if done is True
if done:
break
# Introduce a delay between steps
time.sleep(config_values.time_delay / 1000)
# Reset the environment at the end of the episode
# env.reset()
# env.close()
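The new `metadata_file_as_dict` helper parses `session_metadata.json` from the session directory. The standalone sketch below first writes a throwaway metadata file to a temporary directory and then reads it back the same way. The function name `read_metadata` is hypothetical, and the metadata contents are illustrative values shaped like the `learning`/`evaluation` entries the new test asserts on, not output from a real run.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def read_metadata(session_path: Path) -> Dict[str, Any]:
    """Read session_metadata.json from a session directory."""
    with open(session_path / "session_metadata.json", "r") as file:
        return json.load(file)


with tempfile.TemporaryDirectory() as tmp:
    session_path = Path(tmp)
    # Illustrative metadata, shaped like the entries the tests assert on
    example = {
        "learning": {"total_episodes": 3, "total_time_steps": 75},
        "evaluation": {"total_episodes": 1, "total_time_steps": 17},
    }
    (session_path / "session_metadata.json").write_text(json.dumps(example))
    metadata = read_metadata(session_path)

print(metadata["learning"])  # {'total_episodes': 3, 'total_time_steps': 75}
```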

View File

@@ -1,7 +1,10 @@
import pytest
from primaite import getLogger
from tests import TEST_CONFIG_ROOT
_LOGGER = getLogger(__name__)
@pytest.mark.parametrize(
"temp_primaite_session",
@@ -45,6 +48,5 @@ def test_rewards_are_being_penalised_at_each_step_function(
"""
with temp_primaite_session as session:
session.evaluate()
session.close()
ev_rewards = session.eval_av_reward_per_episode_csv()
assert ev_rewards[1] == -8.0

View File

@@ -13,8 +13,8 @@ def run_generic_set_actions(env: Primaite):
# Reset the environment at the start of the episode
# env.reset()
training_config = env.training_config
for episode in range(0, training_config.num_episodes):
for step in range(0, training_config.num_steps):
for episode in range(0, training_config.num_train_episodes):
for step in range(0, training_config.num_train_steps):
# Send the observation space to the agent to get an action
# TEMP - random action for now
# action = env.blue_agent_action(obs)

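`run_generic_set_actions` now loops over `num_train_episodes` × `num_train_steps`. Below is a self-contained sketch of that fixed-action driver loop against a hypothetical `StubEnv`, which follows the 4-tuple `step()` return convention used in these tests. Unlike the test helper, whose `env.reset()` calls are commented out, the sketch resets the stub at the start of each episode. `time_delay` is treated as milliseconds and converted to seconds for `time.sleep`, matching the config comment.

```python
import time
from typing import Any, Dict, Tuple


class StubEnv:
    """Hypothetical env that signals done after max_steps steps."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> int:
        self.steps = 0
        return 0

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.steps += 1
        done = self.steps >= self.max_steps
        return 0, 0.0, done, {}


def run_fixed_action(
    env: StubEnv, num_episodes: int, num_steps: int, time_delay_ms: int, action: int = 0
) -> int:
    """Drive env with a constant action; return the total steps taken."""
    total = 0
    for _ in range(num_episodes):
        env.reset()
        for _ in range(num_steps):
            _, _, done, _ = env.step(action)
            total += 1
            if done:
                break
            time.sleep(time_delay_ms / 1000)  # time_delay is in milliseconds
    return total
```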
View File

@@ -0,0 +1,42 @@
import pytest
from primaite import getLogger
from primaite.config.lay_down_config import dos_very_basic_config_path
from tests import TEST_CONFIG_ROOT
_LOGGER = getLogger(__name__)
@pytest.mark.parametrize(
"temp_primaite_session",
[[TEST_CONFIG_ROOT / "train_episode_step.yaml", dos_very_basic_config_path()]],
indirect=True,
)
def test_eval_steps_differ_from_training(temp_primaite_session):
"""Uses the PrimaiteSession class to compare the number of episodes and steps used for training and evaluation.
train_episode_step.yaml main config:
num_train_steps = 25
num_train_episodes = 3
num_eval_steps = 17
num_eval_episodes = 1
"""
expected_learning_metadata = {"total_episodes": 3, "total_time_steps": 75}
expected_evaluation_metadata = {"total_episodes": 1, "total_time_steps": 17}
with temp_primaite_session as session:
# Run learning and check episode and step counts
session.learn()
assert session.env.actual_episode_count == expected_learning_metadata["total_episodes"]
assert session.env.total_step_count == expected_learning_metadata["total_time_steps"]
# Run evaluation and check episode and step counts
session.evaluate()
assert session.env.actual_episode_count == expected_evaluation_metadata["total_episodes"]
assert session.env.total_step_count == expected_evaluation_metadata["total_time_steps"]
# Load the session_metadata.json file and check that both the
# learning and evaluation values match those expected above
metadata = session.metadata_file_as_dict()
assert metadata["learning"] == expected_learning_metadata
assert metadata["evaluation"] == expected_evaluation_metadata
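The expected metadata in this test follows from multiplying episodes by steps for each phase (3 × 25 = 75 training steps; 1 × 17 = 17 evaluation steps). A hypothetical helper that derives those dicts from the config values is sketched below. It assumes no episode ends early, which holds only if the `done` signal never fires before the step budget is used up.

```python
from typing import Dict


def expected_totals(config: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Derive expected episode and time-step totals for each phase."""
    return {
        "learning": {
            "total_episodes": config["num_train_episodes"],
            "total_time_steps": config["num_train_episodes"] * config["num_train_steps"],
        },
        "evaluation": {
            "total_episodes": config["num_eval_episodes"],
            "total_time_steps": config["num_eval_episodes"] * config["num_eval_steps"],
        },
    }


# Values from train_episode_step.yaml
cfg = {"num_train_episodes": 3, "num_train_steps": 25, "num_eval_episodes": 1, "num_eval_steps": 17}
print(expected_totals(cfg))
```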