
.. only:: comment

    © Crown-owned copyright 2024, Defence Science and Technology Laboratory UK

PrimAITE Game layer
*******************

The PrimAITE codebase consists of two main modules:

* ``simulator``: The simulation logic including the network topology, the network state, and behaviour of various hardware and software classes.
* ``game``: The agent-training infrastructure, which helps reinforcement learning (RL) agents interface with the simulation. It covers observations, actions, and rewards for RL agents, and also supports scripted deterministic agents. The game layer orchestrates all interactions between modules.

The simulator and game layer communicate using the PrimAITE State API and the PrimAITE Request API.

The game layer is responsible for managing agents and getting them to interface with the simulator correctly. It consists of several components:


Agents
======

All agents inherit from the :py:class:`primaite.game.agent.interface.AbstractAgent` class, which requires each agent to have an ``ObservationManager``, an ``ActionManager``, and a ``RewardManager``. Agent behaviour depends on the type of agent; there are two main types:

* RL agents: the action taken at each step is decided by an appropriate RL algorithm. Within PrimAITE, the agent only formats and forwards the actions chosen by the RL policy.
* Deterministic agents: all decision making happens within the PrimAITE game layer. They typically follow a scripted policy, which always performs the same action, or a rule-based policy, which chooses actions based on the current state of the simulation. They can have a stochastic element, and their seed is settable.


Observations
============

An agent's observations are managed by the ``ObservationManager`` class. It generates observations from the current simulation state dictionary and provides the observation space during initial setup. The data is formatted to be compatible with ``gymnasium.spaces``. Observation spaces are composed of one or more components, each defined by the ``AbstractObservation`` base class.
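
As a rough illustration of how observation components might read from the simulation state dictionary, the sketch below uses plain Python only. The class names, the ``observe`` and ``update`` methods, and the layout of the state dictionary are all hypothetical; they are assumptions made for this example and do not reflect the actual ``AbstractObservation`` or ``ObservationManager`` API.

.. code-block:: python

    from abc import ABC, abstractmethod
    from typing import Any, Dict, List


    class SketchObservation(ABC):
        """Hypothetical stand-in for an observation component."""

        @abstractmethod
        def observe(self, state: Dict[str, Any]) -> Any:
            """Extract this component's observation from the simulation state."""


    class ValueAtPathObservation(SketchObservation):
        """Reads one value by walking a list of keys through the nested state."""

        def __init__(self, where: List[str], default: int = 0) -> None:
            self.where = where
            self.default = default

        def observe(self, state: Dict[str, Any]) -> Any:
            current: Any = state
            for key in self.where:
                # Fall back to the default if any part of the path is missing.
                if not isinstance(current, dict) or key not in current:
                    return self.default
                current = current[key]
            return current


    class SketchObservationManager:
        """Hypothetical manager that combines named components into one observation."""

        def __init__(self, components: Dict[str, SketchObservation]) -> None:
            self.components = components

        def update(self, state: Dict[str, Any]) -> Dict[str, Any]:
            return {name: comp.observe(state) for name, comp in self.components.items()}


    # Made-up state layout, for illustration only.
    example_state = {"nodes": {"web_server": {"services": {"web": {"health": 1}}}}}
    manager = SketchObservationManager(
        {
            "web_health": ValueAtPathObservation(["nodes", "web_server", "services", "web", "health"]),
            "db_health": ValueAtPathObservation(["nodes", "database_server", "services", "db", "health"]),
        }
    )
    observation = manager.update(example_state)

In this sketch a component whose path is absent from the state falls back to a default value, so the observation keeps the same shape on every step.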

Actions
=======

An agent's actions are managed by the ``ActionManager``. It converts actions selected by agents (which are typically integers chosen from a ``gymnasium.spaces.Discrete`` space) into simulation-friendly requests. It also provides the action space during initial setup. Action spaces are composed of one or more components which are defined by the ``AbstractAction`` base class.
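
The conversion from an integer action to a request can be pictured as a simple lookup. The following is a minimal sketch under stated assumptions: the ``ACTION_MAP`` contents, the request layout (a list of strings), and the ``action_to_request`` helper are hypothetical and are not taken from the ``ActionManager`` implementation.

.. code-block:: python

    from typing import Dict, List

    # Hypothetical mapping from discrete action indices to requests.
    # The request contents below are invented for illustration.
    ACTION_MAP: Dict[int, List[str]] = {
        0: ["do_nothing"],
        1: ["network", "node", "web_server", "service", "web", "restart"],
        2: ["network", "node", "database_server", "os", "scan"],
    }


    def action_to_request(action: int) -> List[str]:
        """Translate an integer chosen by the agent into a request."""
        if action not in ACTION_MAP:
            raise ValueError(f"Action {action} is outside the action space")
        # Return a copy so callers cannot mutate the shared map.
        return list(ACTION_MAP[action])


    # The size of the discrete action space is the number of mapped actions.
    num_actions = len(ACTION_MAP)
    request = action_to_request(1)

Keeping the mapping in one table means the size of the discrete action space and the set of valid requests cannot drift apart.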

Rewards
=======

An agent's reward function is managed by the ``RewardManager``. It calculates rewards based on the simulation state (in a way similar to observations). Rewards can be defined as a weighted sum of small reward components. For example, an agent's reward can be based on the uptime of a database service plus the loss rate of packets between clients and a web server.
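
The weighted sum described above can be sketched in a few lines of plain Python. The component functions, the state keys, and the weights here are assumptions chosen for illustration; they are not the reward components shipped with PrimAITE.

.. code-block:: python

    from typing import Any, Callable, Dict, List, Tuple

    RewardFn = Callable[[Dict[str, Any]], float]


    def database_uptime(state: Dict[str, Any]) -> float:
        """Hypothetical component: +1 while the database is up, -1 otherwise."""
        return 1.0 if state.get("database_up") else -1.0


    def packet_loss_penalty(state: Dict[str, Any]) -> float:
        """Hypothetical component: the negated packet loss rate (0.0 to 1.0)."""
        return -float(state.get("packet_loss_rate", 0.0))


    def total_reward(state: Dict[str, Any], components: List[Tuple[float, RewardFn]]) -> float:
        """Combine the components as a weighted sum."""
        return sum(weight * component(state) for weight, component in components)


    components = [(0.4, database_uptime), (0.6, packet_loss_penalty)]
    reward = total_reward({"database_up": True, "packet_loss_rate": 0.5}, components)
    # reward is 0.4 * 1.0 + 0.6 * (-0.5), i.e. approximately 0.1

Because each component is scaled by its own weight, the relative importance of each objective can be tuned without changing the component logic.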

Reward Components
-----------------

The currently implemented reward components are tailored to the data manipulation scenario. For the full API and a description of how they work, see :py:mod:`primaite.game.agent.rewards`.

Reward Sharing
--------------

An agent's reward can be based on the rewards of other agents. This is particularly useful for modelling a situation where the blue agent's job is to protect the ability of green agents to perform their pattern-of-life. This can be configured in the YAML file as follows:

.. code-block:: yaml

  green_agent_1: # this agent sometimes tries to access the webpage, and sometimes the database
      # actions, observations, and agent settings go here
      reward_function:
        reward_components:

          # When the webpage loads, the reward goes up by 0.25; when it fails to load, it goes down to -0.25
          - type: WEBPAGE_UNAVAILABLE_PENALTY
            weight: 0.25
            options:
              node_hostname: client_2

          # When the database is reachable, the reward goes up by 0.05; when it is unreachable, it goes down to -0.05
          - type: GREEN_ADMIN_DATABASE_UNREACHABLE_PENALTY
            weight: 0.05
            options:
              node_hostname: client_2

  blue_agent:
      # actions, observations, and agent settings go here
      reward_function:
        reward_components:

          # When the database file is in a good state, blue's reward is 0.4; when it's in a corrupted state, the reward is -0.4
          - type: DATABASE_FILE_INTEGRITY
            weight: 0.40
            options:
              node_hostname: database_server
              folder_name: database
              file_name: database.db

          # The green agent's reward is added to the blue agent's reward.
          - type: SHARED_REWARD
            weight: 1.0
            options:
              agent_name: client_2_green_user


When defining agent reward sharing, users must take care to avoid circular references, as these would lead to an infinite calculation loop. PrimAITE prevents circular dependencies and provides a helpful error message if any are detected in the YAML.
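
One common way to detect such circular references is a depth-first search over the "shares reward from" relationships. The sketch below illustrates that general technique only; the ``find_reward_cycle`` function and the dictionary layout are hypothetical, and this is not necessarily how PrimAITE performs its own check.

.. code-block:: python

    from typing import Dict, List, Optional, Set


    def find_reward_cycle(shares: Dict[str, List[str]]) -> Optional[List[str]]:
        """Return one cycle of agent names, or None if no cycle exists.

        ``shares`` maps each agent name to the agents whose reward it includes.
        """
        visiting: Set[str] = set()  # agents on the current search path
        done: Set[str] = set()  # agents already fully explored
        path: List[str] = []

        def visit(agent: str) -> Optional[List[str]]:
            if agent in done:
                return None
            if agent in visiting:
                # Reached an agent already on the path, so the path loops back.
                return path[path.index(agent):] + [agent]
            visiting.add(agent)
            path.append(agent)
            for other in shares.get(agent, []):
                cycle = visit(other)
                if cycle is not None:
                    return cycle
            path.pop()
            visiting.discard(agent)
            done.add(agent)
            return None

        for agent in shares:
            cycle = visit(agent)
            if cycle is not None:
                return cycle
        return None


    # Blue depends on green and green depends on nothing, so there is no cycle.
    acyclic = find_reward_cycle({"blue_agent": ["green_agent"], "green_agent": []})

    # Two agents that share each other's reward form a cycle.
    cyclic = find_reward_cycle({"agent_a": ["agent_b"], "agent_b": ["agent_a"]})

Returning the offending chain of agent names, rather than just a boolean, makes it possible to report exactly which entries in the configuration need to change.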