Add better hyperlinks

2023-07-12 09:16:40 +01:00
parent fdec452e7a
commit 5d5d70c0b6
3 changed files with 4 additions and 2 deletions
--- a/docs/source/glossary.rst
+++ b/docs/source/glossary.rst
@@ -41,7 +41,7 @@ Glossary
        PoLs allow agents to change the current hardware, OS, file system, or service statuses of nodes during the course of an episode. For example, a green agent may restart a server node to represent scheduled maintenance. A red agent's Pattern-of-Life can be used to attack nodes by changing their states to CORRUPTED or COMPROMISED.

    Reward
-        The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current state of the environment and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.
+        The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current states of the environment / :term:`reference environment` and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.

    Observation
        An observation is a representation of the current state of the environment that is given to the learning agent so it can decide on which action to perform. If the environment is 'fully observable', the observation contains information about every possible aspect of the environment. More commonly, the environment is 'partially observable' which means the learning agent has to make decisions without knowing every detail of the current environment state.