Files
PrimAITE/docs/source/configuration/training_config.rst
2024-02-21 14:49:59 +00:00

76 lines
1.9 KiB
ReStructuredText

.. only:: comment
© Crown-owned copyright 2023, Defence Science and Technology Laboratory UK
``training_config``
===================
Configuration items relevant to how the Reinforcement Learning agent(s) will be trained.
``training_config`` hierarchy
-----------------------------
.. code-block:: yaml
training_config:
rl_framework: SB3 # or RLLIB_single_agent or RLLIB_multi_agent
rl_algorithm: PPO # or A2C
n_learn_episodes: 5
max_steps_per_episode: 200
n_eval_episodes: 1
deterministic_eval: True
seed: 123
``rl_framework``
----------------
The RL (Reinforcement Learning) Framework to use in the training session
Options available are:
- ``SB3`` (Stable Baselines 3)
- ``RLLIB_single_agent`` (Single Agent Ray RLLib)
- ``RLLIB_multi_agent`` (Multi Agent Ray RLLib)
``rl_algorithm``
----------------
The Reinforcement Learning Algorithm to use in the training session
Options available are:
- ``PPO`` (Proximal Policy Optimisation)
- ``A2C`` (Advantage Actor Critic)
``n_learn_episodes``
--------------------
The number of episodes to train the agent(s).
This should be an integer value above ``0``
``max_steps_per_episode``
-------------------------
The number of steps each episode will last for.
This should be an integer value above ``0``.
``n_eval_episodes``
-------------------
Optional. Default value is ``0``.
The number of evaluation episodes to run the trained agent for.
This should be an integer value above ``0``.
``deterministic_eval``
----------------------
Optional. By default this value is ``False``.
If this is set to ``True``, the agents will act deterministically instead of stochastically.
``seed``
--------
Optional.
The seed is used (alongside ``deterministic_eval``) to reproduce a previous instance of training and evaluation of an RL agent.
The seed should be an integer value.
Useful for debugging.