diff --git a/docs/source/custom_agent.rst b/docs/source/custom_agent.rst
index 45d1c5a4..b4552d64 100644
--- a/docs/source/custom_agent.rst
+++ b/docs/source/custom_agent.rst
@@ -135,4 +135,4 @@ Finally, specify your agent in your training config.
     random_red_agent: False
     # ...
 
-Now you can `Run a PrimAITE Session` with your custom agent by passing in the custom ``config_main``.
+Now you can :ref:`run a primaite session` with your custom agent by passing in the custom ``config_main``.
diff --git a/docs/source/glossary.rst b/docs/source/glossary.rst
index 34e3c8a3..58b4cd5e 100644
--- a/docs/source/glossary.rst
+++ b/docs/source/glossary.rst
@@ -41,7 +41,7 @@ Glossary
         PoLs allow agents to change the current hardware, OS, file system, or service statuses of nodes during the course of an episode. For example, a green agent may restart a server node to represent scheduled maintainance. A red agent's Pattern-of-Life can be used to attack nodes by changing their states to CORRUPTED or COMPROMISED.
 
     Reward
-        The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current state of the environment and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.
+        The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current states of the environment / :term:`reference environment` and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.
 
     Observation
         An observation is a representation of the current state of the environment that is given to the learning agent so it can decide on which action to perform. If the environment is 'fully observable', the observation contains information about every possible aspect of the environment. More commonly, the environment is 'partially observable' which means the learning agent has to make decisions without knowing every detail of the current environment state.
diff --git a/docs/source/primaite_session.rst b/docs/source/primaite_session.rst
index 1b48494a..a393093c 100644
--- a/docs/source/primaite_session.rst
+++ b/docs/source/primaite_session.rst
@@ -1,3 +1,5 @@
+.. _run a primaite session:
+
 Run a PrimAITE Session
 ======================
 