Merge branch 'dev' into feature/1386-enable-a-repeatable-or-deterministic-baseline-test
@@ -28,6 +28,10 @@ The environment config file consists of the following attributes:
* STABLE_BASELINES3_PPO - Use a SB3 PPO agent
* STABLE_BASELINES3_A2C - use a SB3 A2C agent

* **random_red_agent** [bool]

Determines if the session should be run with a random red agent

* **action_type** [enum]

Determines whether a NODE, ACL, or ANY (combined NODE & ACL) action space format is adopted for the session

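To illustrate the two attributes documented in the hunk above, here is a minimal Python sketch of how a loaded config mapping might be validated. It is not PrimAITE's own loader: `ActionType`, `SessionConfig`, and `parse_session_config` are hypothetical names, and the `False` default for `random_red_agent` is an assumption. Only the attribute names and the NODE / ACL / ANY values come from the documentation text.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(Enum):
    """Action space formats named in the documentation text."""
    NODE = "NODE"
    ACL = "ACL"
    ANY = "ANY"  # combined NODE & ACL


@dataclass(frozen=True)
class SessionConfig:
    """Hypothetical container for illustration; not a PrimAITE class."""
    random_red_agent: bool
    action_type: ActionType


def parse_session_config(raw: dict) -> SessionConfig:
    """Validate the two attributes from an already-loaded mapping."""
    # Assumption: a missing random_red_agent is treated as False.
    red = raw.get("random_red_agent", False)
    if not isinstance(red, bool):
        raise TypeError("random_red_agent must be a bool")

    name = str(raw.get("action_type", "")).upper()
    if name not in ActionType.__members__:
        allowed = ", ".join(ActionType.__members__)
        raise ValueError(f"action_type must be one of: {allowed}")

    return SessionConfig(random_red_agent=red, action_type=ActionType[name])
```

For example, `parse_session_config({"random_red_agent": True, "action_type": "ANY"})` yields a config with the combined action space selected, while an unrecognised `action_type` raises a `ValueError` rather than falling back silently.
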
@@ -78,10 +78,9 @@ PrimAITE automatically creates two sets of results from each session:
* Timestamp
* Episode number
* Step number
* Initial observation space (before red and blue agent actions have been taken). Individual elements of the observation space are presented in the format OSI_X_Y
* Resulting observation space (after the red and blue agent actions have been taken) Individual elements of the observation space are presented in the format OSN_X_Y
* Initial observation space (what the blue agent observed when it decided its action)
* Reward value
* Action space (as presented by the blue agent on this step). Individual elements of the action space are presented in the format AS_X
* Action taken (as presented by the blue agent on this step). Individual elements of the action space are presented in the format AS_X

**Diagrams**

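The `AS_X` naming used for action elements in the hunk above can be sketched as follows. This is an illustrative Python example, not PrimAITE's transaction writer: `action_headers` and `write_transactions` are hypothetical names, and the zero-based value of `X` is an assumption. The observation columns are left out because the updated line does not state a column format for them.

```python
from __future__ import annotations

import csv
import io
from datetime import datetime


def action_headers(n_elements: int) -> list[str]:
    """Build one AS_X column name per action element."""
    # Assumption: X is a zero-based index into the action vector.
    return [f"AS_{i}" for i in range(n_elements)]


def write_transactions(rows, n_elements: int, out) -> None:
    """Write (timestamp, episode, step, reward, action) tuples as CSV."""
    writer = csv.writer(out)
    writer.writerow(
        ["Timestamp", "Episode number", "Step number", "Reward value"]
        + action_headers(n_elements)
    )
    for timestamp, episode, step, reward, action in rows:
        if len(action) != n_elements:
            raise ValueError("action length does not match the header")
        writer.writerow([timestamp.isoformat(), episode, step, reward, *action])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_transactions(
        [(datetime(2023, 1, 1, 12, 0, 0), 1, 0, 0.5, [2, 0, 1])],
        n_elements=3,
        out=buffer,
    )
    print(buffer.getvalue())
```

Spreading each action element into its own column keeps the file loadable by ordinary CSV tools without parsing a nested list out of a single cell.
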