#2374 Remove primaite session

Marek Wolan
2024-04-16 11:26:17 +01:00
parent 72dd84886b
commit 8d0d323e0b
33 changed files with 122 additions and 1700 deletions


@@ -13,42 +13,12 @@ This section configures how PrimAITE saves data during simulation and training.
.. code-block:: yaml

    io_settings:
      save_final_model: True
      save_checkpoints: False
      checkpoint_interval: 10
      # save_logs: True
      # save_transactions: False
      save_agent_actions: True
      save_step_metadata: False
      save_pcap_logs: False
      save_sys_logs: False

``save_final_model``
--------------------

Optional. Default value is ``True``.
Only used if training with ``PrimaiteSession``.
If ``True``, the policy will be saved after the final training iteration.

``save_checkpoints``
--------------------

Optional. Default value is ``False``.
Only used if training with ``PrimaiteSession``.
If ``True``, the policy will be saved periodically during training.

``checkpoint_interval``
-----------------------

Optional. Default value is ``10``.
Only used if training with ``PrimaiteSession`` and if ``save_checkpoints`` is ``True``.
Defines how often to save the policy during training.
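Putting ``save_checkpoints`` and ``checkpoint_interval`` together, a sketch of an ``io_settings`` block that checkpoints periodically as well as saving the final model might look like this (the interval value is illustrative, not a recommendation):

.. code-block:: yaml

    io_settings:
      save_final_model: True   # keep the policy from the final training iteration
      save_checkpoints: True   # also save the policy periodically during training
      checkpoint_interval: 5   # illustrative value; the default is 10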

``save_logs``
-------------


@@ -1,75 +0,0 @@
.. only:: comment

    © Crown-owned copyright 2023, Defence Science and Technology Laboratory UK

``training_config``
===================

Configuration items relevant to how the Reinforcement Learning agent(s) will be trained.

``training_config`` hierarchy
-----------------------------
.. code-block:: yaml

    training_config:
      rl_framework: SB3  # or RLLIB_single_agent or RLLIB_multi_agent
      rl_algorithm: PPO  # or A2C
      n_learn_episodes: 5
      max_steps_per_episode: 200
      n_eval_episodes: 1
      deterministic_eval: True
      seed: 123

``rl_framework``
----------------

The RL (Reinforcement Learning) framework to use in the training session.
Options available are:

- ``SB3`` (Stable Baselines 3)
- ``RLLIB_single_agent`` (Single Agent Ray RLLib)
- ``RLLIB_multi_agent`` (Multi Agent Ray RLLib)

``rl_algorithm``
----------------

The Reinforcement Learning algorithm to use in the training session.
Options available are:

- ``PPO`` (Proximal Policy Optimisation)
- ``A2C`` (Advantage Actor Critic)

``n_learn_episodes``
--------------------

The number of episodes to train the agent(s) for.
This should be an integer value above ``0``.

``max_steps_per_episode``
-------------------------

The number of steps each episode will last for.
This should be an integer value above ``0``.

``n_eval_episodes``
-------------------

Optional. Default value is ``0``.
The number of evaluation episodes to run the trained agent for.
This should be an integer value of ``0`` or above.

``deterministic_eval``
----------------------

Optional. Default value is ``False``.
If set to ``True``, the agents will act deterministically instead of stochastically during evaluation.

``seed``
--------

Optional.
The seed is used (alongside ``deterministic_eval``) to reproduce a previous training and evaluation run of an RL agent.
The seed should be an integer value.
Useful for debugging.
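Combining ``seed`` with ``deterministic_eval``, a sketch of a ``training_config`` intended to make a training and evaluation run reproducible might look like this (all values are illustrative):

.. code-block:: yaml

    training_config:
      rl_framework: SB3
      rl_algorithm: PPO
      n_learn_episodes: 5
      max_steps_per_episode: 200
      n_eval_episodes: 1
      deterministic_eval: True  # act deterministically during evaluation
      seed: 123                 # fixed seed so the run can be reproduced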