Add better hyperlinks
@@ -135,4 +135,4 @@ Finally, specify your agent in your training config.
random_red_agent: False
# ...
-Now you can `Run a PrimAITE Session<run a primaite session>` with your custom agent by passing in the custom ``config_main``.
+Now you can :ref:`run a primaite session<run a primaite session>` with your custom agent by passing in the custom ``config_main``.
@@ -41,7 +41,7 @@ Glossary
PoLs allow agents to change the current hardware, OS, file system, or service statuses of nodes during the course of an episode. For example, a green agent may restart a server node to represent scheduled maintenance. A red agent's Pattern-of-Life can be used to attack nodes by changing their states to CORRUPTED or COMPROMISED.
Reward
-The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current state of the environment and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.
+The reward is a single number used by the blue agent to understand whether it's performing well or poorly. RL agents change their behaviour in an attempt to increase the expected reward each episode. The reward is generated based on the current states of the environment / :term:`reference environment` and is impacted positively by things like green IERS running successfully and negatively by things like nodes being compromised.
Observation
An observation is a representation of the current state of the environment that is given to the learning agent so it can decide on which action to perform. If the environment is 'fully observable', the observation contains information about every possible aspect of the environment. More commonly, the environment is 'partially observable' which means the learning agent has to make decisions without knowing every detail of the current environment state.
@@ -1,3 +1,5 @@
+.. _run a primaite session:
+
Run a PrimAITE Session
======================