Add better hyperlinks

2023-07-12 09:16:40 +01:00
parent 0ec2f79ac3
commit c7547f715e
3 changed files with 4 additions and 2 deletions
--- a/docs/source/custom_agent.rst
+++ b/docs/source/custom_agent.rst
@@ -135,4 +135,4 @@ Finally, specify your agent in your training config.
    random_red_agent: False
    # ...

-Now you can `Run a PrimAITE Session<run a primaite session>` with your custom agent by passing in the custom ``config_main``.
+Now you can :ref:`run a primaite session<run a primaite session>` with your custom agent by passing in the custom ``config_main``.
--- a/docs/source/glossary.rst
+++ b/docs/source/glossary.rst
@@ -41,7 +41,7 @@ Glossary
        PoLs allow agents to change the current hardware, OS, file system, or service statuses of nodes during the course of an episode. For example, a green agent may restart a server node to represent scheduled maintenance. A red agent's Pattern-of-Life can be used to attack nodes by changing their states to CORRUPTED or COMPROMISED.

    Reward
-        The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current state of the environment and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.
+        The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current states of the environment / :term:`reference environment` and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.

    Observation
        An observation is a representation of the current state of the environment that is given to the learning agent so it can decide on which action to perform. If the environment is 'fully observable', the observation contains information about every possible aspect of the environment. More commonly, the environment is 'partially observable' which means the learning agent has to make decisions without knowing every detail of the current environment state.
--- a/docs/source/primaite_session.rst
+++ b/docs/source/primaite_session.rst
@@ -1,3 +1,5 @@
+.. _run a primaite session:
+
 Run a PrimAITE Session
 ======================