#2374 Remove primaite session
@@ -5,8 +5,7 @@
PrimAITE |VERSION| Configuration
********************************

PrimAITE uses a single configuration file to define everything needed to train and evaluate an RL policy in a custom cybersecurity scenario. This includes the configuration of the network, the scripted or trained agents that interact with the network, as well as settings that define how to perform training in Stable Baselines 3 or Ray RLLib.
The entire config is used by the ``PrimaiteSession`` object for users who wish to let PrimAITE handle the agent definition and training. If you wish to define custom agents and control the training loop yourself, you can use the config with the ``PrimaiteGame`` and ``PrimaiteGymEnv`` objects instead. That way, only the network configuration and agent setup parts of the config are used, and the training section is ignored.
PrimAITE uses a single configuration file to define everything needed to create the training environment for RL agents, including the network, the scripted agents, and the RL agent's action space, observation space, and reward function.

Example Configuration Hierarchy
###############################
@@ -14,8 +13,6 @@ The top level configuration items in a configuration file is as follows

.. code-block:: yaml

    training_config:
      ...
    io_settings:
      ...
    game:
@@ -33,7 +30,6 @@ Configurable items
.. toctree::
    :maxdepth: 1

    configuration/training_config.rst
    configuration/io_settings.rst
    configuration/game.rst
    configuration/agents.rst

@@ -13,42 +13,12 @@ This section configures how PrimAITE saves data during simulation and training.
.. code-block:: yaml

    io_settings:
      save_final_model: True
      save_checkpoints: False
      checkpoint_interval: 10
      # save_logs: True
      # save_transactions: False
      save_agent_actions: True
      save_step_metadata: False
      save_pcap_logs: False
      save_sys_logs: False

``save_final_model``
--------------------

Optional. Default value is ``True``.

Only used if training with PrimaiteSession.
If ``True``, the policy will be saved after the final training iteration.


``save_checkpoints``
--------------------

Optional. Default value is ``False``.

Only used if training with PrimaiteSession.
If ``True``, the policy will be saved periodically during training.


``checkpoint_interval``
-----------------------

Optional. Default value is ``10``.

Only used if training with PrimaiteSession and if ``save_checkpoints`` is ``True``.
Defines how often to save the policy during training.


``save_logs``
-------------

@@ -1,75 +0,0 @@
.. only:: comment

    © Crown-owned copyright 2023, Defence Science and Technology Laboratory UK

``training_config``
===================
Configuration items relevant to how the Reinforcement Learning agent(s) will be trained.

``training_config`` hierarchy
-----------------------------

.. code-block:: yaml

    training_config:
      rl_framework: SB3 # or RLLIB_single_agent or RLLIB_multi_agent
      rl_algorithm: PPO # or A2C
      n_learn_episodes: 5
      max_steps_per_episode: 200
      n_eval_episodes: 1
      deterministic_eval: True
      seed: 123


``rl_framework``
----------------
The RL (Reinforcement Learning) framework to use in the training session.

Options available are:

- ``SB3`` (Stable Baselines 3)
- ``RLLIB_single_agent`` (Single Agent Ray RLLib)
- ``RLLIB_multi_agent`` (Multi Agent Ray RLLib)

``rl_algorithm``
----------------
The Reinforcement Learning algorithm to use in the training session.

Options available are:

- ``PPO`` (Proximal Policy Optimisation)
- ``A2C`` (Advantage Actor Critic)

``n_learn_episodes``
--------------------
The number of episodes to train the agent(s).
This should be an integer value above ``0``.

``max_steps_per_episode``
-------------------------
The number of steps each episode will last for.
This should be an integer value above ``0``.


``n_eval_episodes``
-------------------
Optional. Default value is ``0``.

The number of evaluation episodes to run the trained agent for.
This should be an integer value above ``0``.

``deterministic_eval``
----------------------
Optional. By default this value is ``False``.

If this is set to ``True``, the agents will act deterministically instead of stochastically.



``seed``
--------
Optional.

The seed is used (alongside ``deterministic_eval``) to reproduce a previous instance of training and evaluation of an RL agent.
The seed should be an integer value.
Useful for debugging.
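
The value constraints documented for ``training_config`` above can be checked before any training starts. The following is a minimal sketch and not part of PrimAITE: ``check_training_config`` is a hypothetical helper that operates on an already-parsed dictionary. It assumes that ``n_eval_episodes`` may be ``0`` (its documented default) and that ``seed`` may be omitted.

.. code-block:: python

    from typing import Any, Dict, List

    # Option lists copied from the ``rl_framework`` and ``rl_algorithm`` sections.
    ALLOWED_FRAMEWORKS = ("SB3", "RLLIB_single_agent", "RLLIB_multi_agent")
    ALLOWED_ALGORITHMS = ("PPO", "A2C")


    def _is_int(value: Any) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)


    def check_training_config(cfg: Dict[str, Any]) -> List[str]:
        """Hypothetical helper: return a list of problems (empty if none found)."""
        problems: List[str] = []
        if cfg.get("rl_framework") not in ALLOWED_FRAMEWORKS:
            problems.append("rl_framework must be one of " + ", ".join(ALLOWED_FRAMEWORKS))
        if cfg.get("rl_algorithm") not in ALLOWED_ALGORITHMS:
            problems.append("rl_algorithm must be one of " + ", ".join(ALLOWED_ALGORITHMS))
        for key in ("n_learn_episodes", "max_steps_per_episode"):
            value = cfg.get(key)
            if not _is_int(value) or value <= 0:
                problems.append(f"{key} must be an integer above 0")
        # Assumption: 0 is accepted here because it is the documented default.
        n_eval = cfg.get("n_eval_episodes", 0)
        if not _is_int(n_eval) or n_eval < 0:
            problems.append("n_eval_episodes must be a non-negative integer")
        if not isinstance(cfg.get("deterministic_eval", False), bool):
            problems.append("deterministic_eval must be True or False")
        seed = cfg.get("seed")
        if seed is not None and not _is_int(seed):
            problems.append("seed must be an integer")
        return problems


    # Values taken from the example hierarchy above.
    example = {
        "rl_framework": "SB3",
        "rl_algorithm": "PPO",
        "n_learn_episodes": 5,
        "max_steps_per_episode": 200,
        "n_eval_episodes": 1,
        "deterministic_eval": True,
        "seed": 123,
    }
    print(check_training_config(example))

Returning a list of messages rather than raising on the first failure lets a caller report every problem in the file at once.
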
@@ -10,15 +10,6 @@ The simulator and game layer communicate using the PrimAITE State API and the Pr

The game layer is responsible for managing agents and getting them to interface with the simulator correctly. It consists of several components:

PrimAITE Session
================

.. admonition:: Deprecated
    :class: deprecated

    PrimAITE Session is being deprecated in favour of Jupyter Notebooks. The ``session`` command will be removed in future releases, but example notebooks will be provided to demonstrate the same functionality.

``PrimaiteSession`` is the main entry point into PrimAITE and it allows the simultaneous coordination of a simulation and agents that interact with it. ``PrimaiteSession`` keeps track of multiple agents of different types.

Agents
======

@@ -1,41 +0,0 @@
.. only:: comment

    © Crown-owned copyright 2023, Defence Science and Technology Laboratory UK

.. _run a primaite session:

.. admonition:: Deprecated
    :class: deprecated

    PrimAITE Session is being deprecated in favour of Jupyter Notebooks. The ``session`` command will be removed in future releases, but example notebooks will be provided to demonstrate the same functionality.

Run a PrimAITE Session
======================

``PrimaiteSession`` allows the user to train or evaluate an RL agent on the PrimAITE simulation with just a config file,
no code required. It manages the lifecycle of a training or evaluation session, including the setup of the environment,
policy, simulator, agents, and IO.

If you want finer control over the RL policy, you can interface with the :py:mod:`primaite.session.environment`
module directly without running a session.



Run
---

A PrimAITE session can be started either with the ``primaite session`` command from the CLI
(see :func:`primaite.cli.session`), or by calling :func:`primaite.main.run` from a Python terminal or Jupyter Notebook.

There are two parameters that can be specified:

- ``--config``: The path to the config file to use. If not specified, the default config file is used.
- ``--agent-load-file``: The path to the pre-trained agent to load. If not specified, a new agent is created.

Outputs
-------

Running a session creates a session output directory in your user data folder. The filepath looks like this:
``~/primaite/{VERSION}/sessions/YYYY-MM-DD/HH-MM-SS/``. This folder contains the simulation sys logs generated by each node,
the saved agent checkpoints, and the final model. The folder also contains a .json file for each episode step that
contains the action, reward, and simulation state. These can be found in
``~/primaite/{VERSION}/sessions/YYYY-MM-DD/HH-MM-SS/simulation_output/episode_<n>/step_metadata/step_<n>.json``.
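
The output layout described above can be reproduced with a few lines of path arithmetic, which is handy when post-processing step metadata. The following is a rough sketch rather than PrimAITE code: ``session_output_dir`` and ``step_metadata_file`` are hypothetical helpers, the version string ``3.0.0`` and the home directory are placeholders, and it is assumed that the date and time components come from the moment the session started.

.. code-block:: python

    from datetime import datetime
    from pathlib import PurePosixPath


    def session_output_dir(home: PurePosixPath, version: str, started: datetime) -> PurePosixPath:
        """Hypothetical helper: <home>/primaite/<version>/sessions/YYYY-MM-DD/HH-MM-SS."""
        return (
            home
            / "primaite"
            / version
            / "sessions"
            / started.strftime("%Y-%m-%d")
            / started.strftime("%H-%M-%S")
        )


    def step_metadata_file(session_dir: PurePosixPath, episode: int, step: int) -> PurePosixPath:
        """Hypothetical helper: path of the .json file written for one episode step."""
        return (
            session_dir
            / "simulation_output"
            / f"episode_{episode}"
            / "step_metadata"
            / f"step_{step}.json"
        )


    # Placeholder values, chosen only for illustration.
    session_dir = session_output_dir(
        PurePosixPath("/home/user"), "3.0.0", datetime(2024, 1, 31, 9, 5, 7)
    )
    print(step_metadata_file(session_dir, episode=1, step=2))

``PurePosixPath`` keeps the example independent of the host operating system; code that actually reads the files would use ``pathlib.Path.home()`` instead.
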