Merge remote-tracking branch 'origin/dev' into feature/917_Integrate_with_RLLib
# Conflicts:
#	tests/test_reward.py
This commit is contained in:
@@ -21,19 +21,27 @@ def test_rewards_are_being_penalised_at_each_step_function(
When the initial state is OFF compared to the reference state, which is ON.

On different steps (of the 13 in total), the following rewards for config_6 are activated:
File System State: goodShouldBeCorrupt = 5 (between Steps 1 & 3)
Hardware State: onShouldBeOff = -2 (between Steps 4 & 6)
Service State: goodShouldBeCompromised = 5 (between Steps 7 & 9)
Software State: goodShouldBeCompromised = 5 (between Steps 10 & 12)

The config 'one_node_states_on_off_lay_down_config.yaml' has 15 steps:
On different steps, the laydown config has Patterns of Life (PoLs) which change the state of a node's attribute.
For example, turning the node's file system state to CORRUPT from its original state GOOD.
As a result, the following rewards are activated:
File System State: corrupt_should_be_good = -10 * 2 (on Steps 1 & 2)
Hardware State: off_should_be_on = -10 * 2 (on Steps 4 & 5)
Service State: compromised_should_be_good = -20 * 2 (on Steps 7 & 8)
Software State: compromised_should_be_good = -20 * 2 (on Steps 10 & 11)

Total Reward: -2 - 2 + 5 + 5 + 5 + 5 + 5 + 5 = 26
Step Count: 13
The Pattern of Life (PoLs) last for 2 steps, so the agent is penalised twice.

Note: This test run inherits from conftest.py, where the PrimAITE environment is run and the blue agent is hard-coded
to do NOTHING on every step.
We use Patterns of Life (PoLs) to change the nodes' states and show that the agent is penalised on all steps
where the live network node differs from the network reference node.

Total Reward: -10 + -10 + -10 + -10 + -20 + -20 + -20 + -20 = -120
Step Count: 15

For the 4 steps where this occurs, the average reward is:
Average Reward: 2 (26 / 13)
Average Reward: -8 (-120 / 15)
"""
with temp_primaite_session as session:
session.evaluate()
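The reward arithmetic quoted in the docstring can be checked with a short standalone tally. This is an illustrative sketch only: the values and step counts are taken from the text above, and no PrimAITE code is called.

```python
# Illustrative check of the reward totals and averages quoted in the
# docstring. All numbers come from the text above, not from PrimAITE.

# config_6 episode: 13 steps; the listed rewards fire as documented.
config_6_rewards = [-2, -2, 5, 5, 5, 5, 5, 5]
total_6 = sum(config_6_rewards)
assert total_6 == 26
assert total_6 / 13 == 2  # Average Reward: 2 (26 / 13)

# Laydown config episode: 15 steps; each PoL penalty lasts 2 steps,
# so -10 fires 4 times and -20 fires 4 times.
laydown_rewards = [-10] * 4 + [-20] * 4
total_laydown = sum(laydown_rewards)
assert total_laydown == -120
assert total_laydown / 15 == -8  # Average Reward: -8 (-120 / 15)
```

If any of these assertions failed, the documented totals would disagree with the listed per-step rewards.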