.. only:: comment

    © Crown-owned copyright 2023, Defence Science and Technology Laboratory UK

``training_config``
===================
Configuration items relevant to how the Reinforcement Learning agent(s) will be trained.

``training_config`` hierarchy
-----------------------------

.. code-block:: yaml

    training_config:
        rl_framework: SB3 # or RLLIB_single_agent or RLLIB_multi_agent
        rl_algorithm: PPO # or A2C
        n_learn_episodes: 5
        max_steps_per_episode: 200
        n_eval_episodes: 1
        deterministic_eval: True
        seed: 123


``rl_framework``
----------------
The RL (Reinforcement Learning) Framework to use in the training session

Options available are:

- ``SB3`` (Stable Baselines 3)
- ``RLLIB_single_agent`` (Single Agent Ray RLLib)
- ``RLLIB_multi_agent`` (Multi Agent Ray RLLib)

``rl_algorithm``
----------------
The Reinforcement Learning Algorithm to use in the training session

Options available are:

- ``PPO`` (Proximal Policy Optimisation)
- ``A2C`` (Advantage Actor Critic)

``n_learn_episodes``
--------------------
The number of episodes to train the agent(s).
This should be an integer value above ``0``

``max_steps_per_episode``
-------------------------
The number of steps each episode will last for.
This should be an integer value above ``0``.


``n_eval_episodes``
-------------------
Optional. Default value is ``0``.

The number of evaluation episodes to run the trained agent for.
This should be an integer value above ``0``.

``deterministic_eval``
----------------------
Optional. By default this value is ``False``.

If this is set to ``True``, the agents will act deterministically instead of stochastically.


``seed``
--------
Optional.

The seed is used (alongside ``deterministic_eval``) to reproduce a previous instance of training and evaluation of an RL agent.
The seed should be an integer value.
Useful for debugging.