Add a bit of documentation
docs/source/config(v3).rst (new file, 15 lines):
Primaite v3 config
******************

The YAML file configures a cybersecurity scenario involving a computer network and multiple agents. There are three main sections: ``training_config``, ``game``, and ``simulation``.

The ``simulation`` section describes the simulated network environment with which the agents interact.

The ``game`` section describes the agents and their capabilities. Each agent has a unique type, is associated with a team (GREEN, RED, or BLUE), and has a configurable observation space, action space, and reward function.

The ``training_config`` section describes the training parameters for the learning agents: the number of episodes, the number of steps per episode, and the number of steps before the agents start learning. It also specifies the learning algorithm used by the agents, by name, together with its hyperparameters; the hyperparameters are specific to each algorithm and are described in that algorithm's documentation.
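
As an illustrative sketch only, a minimal config might look like the fragment below. The three top-level section names come from this page, and the ``training_config`` keys mirror the example config changed in this commit; the sub-keys under ``game`` and ``simulation`` are assumptions.

.. code-block:: yaml

    training_config:
      rl_framework: SB3
      rl_algorithm: PPO
      seed: 333
      n_learn_episodes: 20
      n_learn_steps: 128

    game:
      agents: []    # assumption: agent definitions live here

    simulation:
      network: {}   # assumption: network topology lives here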

.. only:: comment

   This needs a bit of refactoring, so I haven't written extensive documentation about the config yet.
docs/source/high_level_project_structure.rst (new file, 87 lines):
Primaite Codebase Documentation
===============================

High-level structure
--------------------

The Primaite codebase consists of two main modules: the agent-training infrastructure and the simulation logic. These modules are decoupled for flexibility and modularity, and the ``game`` module acts as an interface between agents and the simulation.

Simulation
----------

The simulation module purely simulates a computer network. It has no concept of agents acting upon it, but it can interact with agents by providing a 'state' dictionary (via the ``SimComponent.describe_state()`` method) and by accepting requests (lists of strings).
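
The state/request pattern can be sketched as follows. ``Node`` here is a hypothetical component, not Primaite's actual class; only the ``describe_state()``/request convention is taken from this page.

.. code-block:: python

    class Node:
        """Hypothetical simulation component following the SimComponent pattern."""

        def __init__(self, hostname):
            self.hostname = hostname
            self.operating = True

        def describe_state(self):
            # The simulation exposes its state as a plain dictionary.
            return {"hostname": self.hostname, "operating": self.operating}

        def apply_request(self, request):
            # Requests arrive as lists of strings, e.g. ["node", "shutdown"].
            if request == ["node", "shutdown"]:
                self.operating = False


    node = Node("server_1")
    node.apply_request(["node", "shutdown"])
    print(node.describe_state())  # {'hostname': 'server_1', 'operating': False}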

Game layer
----------

The game layer is responsible for managing agents and getting them to interface with the simulator correctly. It consists of several components:

Observations
^^^^^^^^^^^^

The ObservationManager generates observations from the simulator state dictionary, formatting the data so that it is compatible with ``gymnasium.spaces``. The AgentManager uses the ObservationManager to generate observations for each agent.

Actions
^^^^^^^

The ActionManager converts actions selected by agents (which comply with the ``gymnasium.spaces`` API) into simulation-friendly requests. The AgentManager uses the ActionManager to take actions for each agent.

Rewards
^^^^^^^

The RewardManager calculates rewards from the state dictionary (similarly to observations). The AgentManager uses the RewardManager to calculate rewards for each agent.

Agents
^^^^^^

The AgentManager manages agents and their interactions with the simulator. It uses the ObservationManager to generate observations, the ActionManager to take actions, and the RewardManager to calculate rewards for each agent.

PrimaiteSession
^^^^^^^^^^^^^^^

PrimaiteSession is the main entry point into Primaite; it coordinates a simulation and the agents that interact with it. It also sends messages to ARCD GATE to perform reinforcement learning. PrimaiteSession uses the AgentManager to manage agents and their interactions with the simulator.

Code snippets
-------------

Here's an example of how to create a PrimaiteSession object:

.. code-block:: python

    from primaite import PrimaiteSession

    session = PrimaiteSession()

To start the simulation, use the ``start()`` method:

.. code-block:: python

    session.start()

To stop the simulation, use the ``stop()`` method:

.. code-block:: python

    session.stop()

To get the current state of the simulation, use the ``describe_state()`` method. This is also used as input for generating observations and rewards:

.. code-block:: python

    state = session.sim.describe_state()

To get the current observation of an agent, use the ``get_observation()`` method:

.. code-block:: python

    observation = session.get_observation(agent_id)

To get the current reward of an agent, use the ``get_reward()`` method:

.. code-block:: python

    reward = session.get_reward(agent_id)

To take an action for an agent, use the ``take_action()`` method:

.. code-block:: python

    action = agent.select_action(observation)
    session.take_action(agent_id, action)
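
Putting the snippets above together, an interaction loop might look like the sketch below. ``_StubSession`` is a hypothetical stand-in for PrimaiteSession, so the sketch is self-contained; only the method names are taken from the snippets above.

.. code-block:: python

    class _StubSession:
        """Hypothetical stand-in for PrimaiteSession, for illustration only."""

        def start(self):
            self.running = True

        def stop(self):
            self.running = False

        def get_observation(self, agent_id):
            return {"node_statuses": [0, 0]}

        def get_reward(self, agent_id):
            return 0.0

        def take_action(self, agent_id, action):
            pass


    def run_episode(session, agent_id, n_steps, select_action):
        # Start the simulation, step it, and accumulate the agent's reward.
        session.start()
        total_reward = 0.0
        for _ in range(n_steps):
            observation = session.get_observation(agent_id)
            action = select_action(observation)
            session.take_action(agent_id, action)
            total_reward += session.get_reward(agent_id)
        session.stop()
        return total_reward


    print(run_episode(_StubSession(), "blue_agent_0", 8, lambda obs: 0))  # 0.0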

The commit also updates the ``training_config`` section of an example config:

.. code-block:: diff

    @@ -2,10 +2,10 @@ training_config:
       rl_framework: SB3
       rl_algorithm: PPO
       seed: 333
    -  n_learn_episodes: 1
    -  n_learn_steps: 8
    -  n_eval_episodes: 0
    -  n_eval_steps: 8
    +  n_learn_episodes: 20
    +  n_learn_steps: 128
    +  n_eval_episodes: 20
    +  n_eval_steps: 128


     game_config: