Merged PR 108: Divide default rewards by 10000

## Summary
As per the discussion this morning, this PR reimplements the changes made by ADSP to make the default rewards smaller. It also adds type hints that mark rewards as floats.

## Test process
I checked that sessions still run and that the reward values they report are similar to what we are used to, only smaller by a factor of 10000. I did not change the reward values in the integration test configs, and the tests still pass.
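
The factor-of-10000 relationship can be spot-checked with a short script. This is only a sketch: the two dictionaries hold a handful of old and new defaults copied from the config diff in this PR, and the names `OLD_DEFAULTS`, `NEW_DEFAULTS`, `SCALE`, and `is_scaled_by` are hypothetical and not part of the codebase.

```python
import math

# A few old and new default rewards copied from the config diff in this PR.
OLD_DEFAULTS = {
    "off_should_be_on": -10,
    "compromised": -20,
    "good_should_be_destroyed": 10,
    "restoring_should_be_corrupt": 1,
    "red_ier_running": -5,
}
NEW_DEFAULTS = {
    "off_should_be_on": -0.001,
    "compromised": -0.002,
    "good_should_be_destroyed": 0.001,
    "restoring_should_be_corrupt": 0.0001,
    "red_ier_running": -0.0005,
}

SCALE = 10000


def is_scaled_by(old: dict, new: dict, factor: float) -> bool:
    """Return True if every new value equals the old value divided by factor."""
    return all(
        # Compare with a tolerance because the new values are floats.
        math.isclose(new[key], old[key] / factor, rel_tol=1e-9, abs_tol=1e-12)
        for key in old
    )


print(is_scaled_by(OLD_DEFAULTS, NEW_DEFAULTS, SCALE))
```

The comparison uses `math.isclose` rather than `==` so that floating-point rounding in the division does not cause a false mismatch.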

## Checklist
- [x] This PR is linked to a **work item**
- [x] I have performed **self-review** of the code
- [x] I have written **tests** for any new functionality added with this PR
- [x] I have updated the **documentation** if this PR changes or adds functionality
- [x] I have run **pre-commit** checks for code style

Related work items: #889, #1586
Marek Wolan
2023-07-06 15:17:47 +00:00
7 changed files with 214 additions and 214 deletions

@@ -82,203 +82,203 @@ The environment config file consists of the following attributes:
Rewards are calculated based on the difference between the current state and reference state (the 'should be' state) of the environment.
* **Generic [all_ok]** [int]
* **Generic [all_ok]** [float]
The score to give when the current situation (for a given component) is no different from that expected in the baseline (i.e. as though no blue or red agent actions had been undertaken)
* **Node Hardware State [off_should_be_on]** [int]
* **Node Hardware State [off_should_be_on]** [float]
The score to give when the node should be on, but is off
* **Node Hardware State [off_should_be_resetting]** [int]
* **Node Hardware State [off_should_be_resetting]** [float]
The score to give when the node should be resetting, but is off
* **Node Hardware State [on_should_be_off]** [int]
* **Node Hardware State [on_should_be_off]** [float]
The score to give when the node should be off, but is on
* **Node Hardware State [on_should_be_resetting]** [int]
* **Node Hardware State [on_should_be_resetting]** [float]
The score to give when the node should be resetting, but is on
* **Node Hardware State [resetting_should_be_on]** [int]
* **Node Hardware State [resetting_should_be_on]** [float]
The score to give when the node should be on, but is resetting
* **Node Hardware State [resetting_should_be_off]** [int]
* **Node Hardware State [resetting_should_be_off]** [float]
The score to give when the node should be off, but is resetting
* **Node Hardware State [resetting]** [int]
* **Node Hardware State [resetting]** [float]
The score to give when the node is resetting
* **Node Operating System or Service State [good_should_be_patching]** [int]
* **Node Operating System or Service State [good_should_be_patching]** [float]
The score to give when the state should be patching, but is good
* **Node Operating System or Service State [good_should_be_compromised]** [int]
* **Node Operating System or Service State [good_should_be_compromised]** [float]
The score to give when the state should be compromised, but is good
* **Node Operating System or Service State [good_should_be_overwhelmed]** [int]
* **Node Operating System or Service State [good_should_be_overwhelmed]** [float]
The score to give when the state should be overwhelmed, but is good
* **Node Operating System or Service State [patching_should_be_good]** [int]
* **Node Operating System or Service State [patching_should_be_good]** [float]
The score to give when the state should be good, but is patching
* **Node Operating System or Service State [patching_should_be_compromised]** [int]
* **Node Operating System or Service State [patching_should_be_compromised]** [float]
The score to give when the state should be compromised, but is patching
* **Node Operating System or Service State [patching_should_be_overwhelmed]** [int]
* **Node Operating System or Service State [patching_should_be_overwhelmed]** [float]
The score to give when the state should be overwhelmed, but is patching
* **Node Operating System or Service State [patching]** [int]
* **Node Operating System or Service State [patching]** [float]
The score to give when the state is patching
* **Node Operating System or Service State [compromised_should_be_good]** [int]
* **Node Operating System or Service State [compromised_should_be_good]** [float]
The score to give when the state should be good, but is compromised
* **Node Operating System or Service State [compromised_should_be_patching]** [int]
* **Node Operating System or Service State [compromised_should_be_patching]** [float]
The score to give when the state should be patching, but is compromised
* **Node Operating System or Service State [compromised_should_be_overwhelmed]** [int]
* **Node Operating System or Service State [compromised_should_be_overwhelmed]** [float]
The score to give when the state should be overwhelmed, but is compromised
* **Node Operating System or Service State [compromised]** [int]
* **Node Operating System or Service State [compromised]** [float]
The score to give when the state is compromised
* **Node Operating System or Service State [overwhelmed_should_be_good]** [int]
* **Node Operating System or Service State [overwhelmed_should_be_good]** [float]
The score to give when the state should be good, but is overwhelmed
* **Node Operating System or Service State [overwhelmed_should_be_patching]** [int]
* **Node Operating System or Service State [overwhelmed_should_be_patching]** [float]
The score to give when the state should be patching, but is overwhelmed
* **Node Operating System or Service State [overwhelmed_should_be_compromised]** [int]
* **Node Operating System or Service State [overwhelmed_should_be_compromised]** [float]
The score to give when the state should be compromised, but is overwhelmed
* **Node Operating System or Service State [overwhelmed]** [int]
* **Node Operating System or Service State [overwhelmed]** [float]
The score to give when the state is overwhelmed
* **Node File System State [good_should_be_repairing]** [int]
* **Node File System State [good_should_be_repairing]** [float]
The score to give when the state should be repairing, but is good
* **Node File System State [good_should_be_restoring]** [int]
* **Node File System State [good_should_be_restoring]** [float]
The score to give when the state should be restoring, but is good
* **Node File System State [good_should_be_corrupt]** [int]
* **Node File System State [good_should_be_corrupt]** [float]
The score to give when the state should be corrupt, but is good
* **Node File System State [good_should_be_destroyed]** [int]
* **Node File System State [good_should_be_destroyed]** [float]
The score to give when the state should be destroyed, but is good
* **Node File System State [repairing_should_be_good]** [int]
* **Node File System State [repairing_should_be_good]** [float]
The score to give when the state should be good, but is repairing
* **Node File System State [repairing_should_be_restoring]** [int]
* **Node File System State [repairing_should_be_restoring]** [float]
The score to give when the state should be restoring, but is repairing
* **Node File System State [repairing_should_be_corrupt]** [int]
* **Node File System State [repairing_should_be_corrupt]** [float]
The score to give when the state should be corrupt, but is repairing
* **Node File System State [repairing_should_be_destroyed]** [int]
* **Node File System State [repairing_should_be_destroyed]** [float]
The score to give when the state should be destroyed, but is repairing
* **Node File System State [repairing]** [int]
* **Node File System State [repairing]** [float]
The score to give when the state is repairing
* **Node File System State [restoring_should_be_good]** [int]
* **Node File System State [restoring_should_be_good]** [float]
The score to give when the state should be good, but is restoring
* **Node File System State [restoring_should_be_repairing]** [int]
* **Node File System State [restoring_should_be_repairing]** [float]
The score to give when the state should be repairing, but is restoring
* **Node File System State [restoring_should_be_corrupt]** [int]
* **Node File System State [restoring_should_be_corrupt]** [float]
The score to give when the state should be corrupt, but is restoring
* **Node File System State [restoring_should_be_destroyed]** [int]
* **Node File System State [restoring_should_be_destroyed]** [float]
The score to give when the state should be destroyed, but is restoring
* **Node File System State [restoring]** [int]
* **Node File System State [restoring]** [float]
The score to give when the state is restoring
* **Node File System State [corrupt_should_be_good]** [int]
* **Node File System State [corrupt_should_be_good]** [float]
The score to give when the state should be good, but is corrupt
* **Node File System State [corrupt_should_be_repairing]** [int]
* **Node File System State [corrupt_should_be_repairing]** [float]
The score to give when the state should be repairing, but is corrupt
* **Node File System State [corrupt_should_be_restoring]** [int]
* **Node File System State [corrupt_should_be_restoring]** [float]
The score to give when the state should be restoring, but is corrupt
* **Node File System State [corrupt_should_be_destroyed]** [int]
* **Node File System State [corrupt_should_be_destroyed]** [float]
The score to give when the state should be destroyed, but is corrupt
* **Node File System State [corrupt]** [int]
* **Node File System State [corrupt]** [float]
The score to give when the state is corrupt
* **Node File System State [destroyed_should_be_good]** [int]
* **Node File System State [destroyed_should_be_good]** [float]
The score to give when the state should be good, but is destroyed
* **Node File System State [destroyed_should_be_repairing]** [int]
* **Node File System State [destroyed_should_be_repairing]** [float]
The score to give when the state should be repairing, but is destroyed
* **Node File System State [destroyed_should_be_restoring]** [int]
* **Node File System State [destroyed_should_be_restoring]** [float]
The score to give when the state should be restoring, but is destroyed
* **Node File System State [destroyed_should_be_corrupt]** [int]
* **Node File System State [destroyed_should_be_corrupt]** [float]
The score to give when the state should be corrupt, but is destroyed
* **Node File System State [destroyed]** [int]
* **Node File System State [destroyed]** [float]
The score to give when the state is destroyed
* **Node File System State [scanning]** [int]
* **Node File System State [scanning]** [float]
The score to give when the state is scanning
* **IER Status [red_ier_running]** [int]
* **IER Status [red_ier_running]** [float]
The score to give when a red agent IER is permitted to run
* **IER Status [green_ier_blocked]** [int]
* **IER Status [green_ier_blocked]** [float]
The score to give when a green agent IER is prevented from running
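
The attribute names above follow a `<current>_should_be_<reference>` pattern, which suggests a simple lookup from a pair of states to a reward. The sketch below illustrates that idea only. `REWARDS` is a small subset of the defaults from this PR, and `lookup_reward` is a hypothetical helper, not the project's `calculate_reward_function`, which may apply further conditions (for example, the single-state keys such as `resetting` are not handled here).

```python
from typing import Dict

# A subset of the default rewards from this PR, keyed as in the config file.
REWARDS: Dict[str, float] = {
    "all_ok": 0.0,
    "off_should_be_on": -0.001,
    "on_should_be_off": -0.0002,
    "resetting_should_be_on": -0.0005,
}


def lookup_reward(current: str, reference: str, rewards: Dict[str, float]) -> float:
    """Hypothetical helper: map a (current, reference) state pair to a reward.

    Matching states score `all_ok`; otherwise the key is built as
    `<current>_should_be_<reference>`, following the attribute names above.
    """
    if current == reference:
        return rewards["all_ok"]
    return rewards[f"{current}_should_be_{reference}"]


# The node should be on but is off, so the `off_should_be_on` reward applies.
print(lookup_reward("off", "on", REWARDS))
# Current and reference states match, so the `all_ok` reward applies.
print(lookup_reward("on", "on", REWARDS))
```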

@@ -83,58 +83,58 @@ sb3_output_verbose_level: NONE
# Generic
all_ok: 0
# Node Hardware State
off_should_be_on: -10
off_should_be_resetting: -5
on_should_be_off: -2
on_should_be_resetting: -5
resetting_should_be_on: -5
resetting_should_be_off: -2
resetting: -3
off_should_be_on: -0.001
off_should_be_resetting: -0.0005
on_should_be_off: -0.0002
on_should_be_resetting: -0.0005
resetting_should_be_on: -0.0005
resetting_should_be_off: -0.0002
resetting: -0.0003
# Node Software or Service State
good_should_be_patching: 2
good_should_be_compromised: 5
good_should_be_overwhelmed: 5
patching_should_be_good: -5
patching_should_be_compromised: 2
patching_should_be_overwhelmed: 2
patching: -3
compromised_should_be_good: -20
compromised_should_be_patching: -20
compromised_should_be_overwhelmed: -20
compromised: -20
overwhelmed_should_be_good: -20
overwhelmed_should_be_patching: -20
overwhelmed_should_be_compromised: -20
overwhelmed: -20
good_should_be_patching: 0.0002
good_should_be_compromised: 0.0005
good_should_be_overwhelmed: 0.0005
patching_should_be_good: -0.0005
patching_should_be_compromised: 0.0002
patching_should_be_overwhelmed: 0.0002
patching: -0.0003
compromised_should_be_good: -0.002
compromised_should_be_patching: -0.002
compromised_should_be_overwhelmed: -0.002
compromised: -0.002
overwhelmed_should_be_good: -0.002
overwhelmed_should_be_patching: -0.002
overwhelmed_should_be_compromised: -0.002
overwhelmed: -0.002
# Node File System State
good_should_be_repairing: 2
good_should_be_restoring: 2
good_should_be_corrupt: 5
good_should_be_destroyed: 10
repairing_should_be_good: -5
repairing_should_be_restoring: 2
repairing_should_be_corrupt: 2
repairing_should_be_destroyed: 0
repairing: -3
restoring_should_be_good: -10
restoring_should_be_repairing: -2
restoring_should_be_corrupt: 1
restoring_should_be_destroyed: 2
restoring: -6
corrupt_should_be_good: -10
corrupt_should_be_repairing: -10
corrupt_should_be_restoring: -10
corrupt_should_be_destroyed: 2
corrupt: -10
destroyed_should_be_good: -20
destroyed_should_be_repairing: -20
destroyed_should_be_restoring: -20
destroyed_should_be_corrupt: -20
destroyed: -20
scanning: -2
good_should_be_repairing: 0.0002
good_should_be_restoring: 0.0002
good_should_be_corrupt: 0.0005
good_should_be_destroyed: 0.001
repairing_should_be_good: -0.0005
repairing_should_be_restoring: 0.0002
repairing_should_be_corrupt: 0.0002
repairing_should_be_destroyed: 0.0000
repairing: -0.0003
restoring_should_be_good: -0.001
restoring_should_be_repairing: -0.0002
restoring_should_be_corrupt: 0.0001
restoring_should_be_destroyed: 0.0002
restoring: -0.0006
corrupt_should_be_good: -0.001
corrupt_should_be_repairing: -0.001
corrupt_should_be_restoring: -0.001
corrupt_should_be_destroyed: 0.0002
corrupt: -0.001
destroyed_should_be_good: -0.002
destroyed_should_be_repairing: -0.002
destroyed_should_be_restoring: -0.002
destroyed_should_be_corrupt: -0.002
destroyed: -0.002
scanning: -0.0002
# IER status
red_ier_running: -5
green_ier_blocked: -10
red_ier_running: -0.0005
green_ier_blocked: -0.001
# Patching / Reset durations
os_patching_duration: 5 # The time taken to patch the OS

@@ -37,60 +37,60 @@ observation_space_high_value: 1000000000
# Reward values
# Generic
all_ok: 0
all_ok: 0.0000
# Node Hardware State
off_should_be_on: -10
off_should_be_resetting: -5
on_should_be_off: -2
on_should_be_resetting: -5
resetting_should_be_on: -5
resetting_should_be_off: -2
resetting: -3
off_should_be_on: -0.001
off_should_be_resetting: -0.0005
on_should_be_off: -0.0002
on_should_be_resetting: -0.0005
resetting_should_be_on: -0.0005
resetting_should_be_off: -0.0002
resetting: -0.0003
# Node Software or Service State
good_should_be_patching: 2
good_should_be_compromised: 5
good_should_be_overwhelmed: 5
patching_should_be_good: -5
patching_should_be_compromised: 2
patching_should_be_overwhelmed: 2
patching: -3
compromised_should_be_good: -20
compromised_should_be_patching: -20
compromised_should_be_overwhelmed: -20
compromised: -20
overwhelmed_should_be_good: -20
overwhelmed_should_be_patching: -20
overwhelmed_should_be_compromised: -20
overwhelmed: -20
good_should_be_patching: 0.0002
good_should_be_compromised: 0.0005
good_should_be_overwhelmed: 0.0005
patching_should_be_good: -0.0005
patching_should_be_compromised: 0.0002
patching_should_be_overwhelmed: 0.0002
patching: -0.0003
compromised_should_be_good: -0.002
compromised_should_be_patching: -0.002
compromised_should_be_overwhelmed: -0.002
compromised: -0.002
overwhelmed_should_be_good: -0.002
overwhelmed_should_be_patching: -0.002
overwhelmed_should_be_compromised: -0.002
overwhelmed: -0.002
# Node File System State
good_should_be_repairing: 2
good_should_be_restoring: 2
good_should_be_corrupt: 5
good_should_be_destroyed: 10
repairing_should_be_good: -5
repairing_should_be_restoring: 2
repairing_should_be_corrupt: 2
repairing_should_be_destroyed: 0
repairing: -3
restoring_should_be_good: -10
restoring_should_be_repairing: -2
restoring_should_be_corrupt: 1
restoring_should_be_destroyed: 2
restoring: -6
corrupt_should_be_good: -10
corrupt_should_be_repairing: -10
corrupt_should_be_restoring: -10
corrupt_should_be_destroyed: 2
corrupt: -10
destroyed_should_be_good: -20
destroyed_should_be_repairing: -20
destroyed_should_be_restoring: -20
destroyed_should_be_corrupt: -20
destroyed: -20
scanning: -2
good_should_be_repairing: 0.0002
good_should_be_restoring: 0.0002
good_should_be_corrupt: 0.0005
good_should_be_destroyed: 0.001
repairing_should_be_good: -0.0005
repairing_should_be_restoring: 0.0002
repairing_should_be_corrupt: 0.0002
repairing_should_be_destroyed: 0.0000
repairing: -0.0003
restoring_should_be_good: -0.001
restoring_should_be_repairing: -0.0002
restoring_should_be_corrupt: 0.0001
restoring_should_be_destroyed: 0.0002
restoring: -0.0006
corrupt_should_be_good: -0.001
corrupt_should_be_repairing: -0.001
corrupt_should_be_restoring: -0.001
corrupt_should_be_destroyed: 0.0002
corrupt: -0.001
destroyed_should_be_good: -0.002
destroyed_should_be_repairing: -0.002
destroyed_should_be_restoring: -0.002
destroyed_should_be_corrupt: -0.002
destroyed: -0.002
scanning: -0.0002
# IER status
red_ier_running: -5
green_ier_blocked: -10
red_ier_running: -0.0005
green_ier_blocked: -0.001
# Patching / Reset durations
os_patching_duration: 5 # The time taken to patch the OS

@@ -94,64 +94,64 @@ class TrainingConfig:
# Reward values
# Generic
all_ok: int = 0
all_ok: float = 0
# Node Hardware State
off_should_be_on: int = -10
off_should_be_resetting: int = -5
on_should_be_off: int = -2
on_should_be_resetting: int = -5
resetting_should_be_on: int = -5
resetting_should_be_off: int = -2
resetting: int = -3
off_should_be_on: float = -0.001
off_should_be_resetting: float = -0.0005
on_should_be_off: float = -0.0002
on_should_be_resetting: float = -0.0005
resetting_should_be_on: float = -0.0005
resetting_should_be_off: float = -0.0002
resetting: float = -0.0003
# Node Software or Service State
good_should_be_patching: int = 2
good_should_be_compromised: int = 5
good_should_be_overwhelmed: int = 5
patching_should_be_good: int = -5
patching_should_be_compromised: int = 2
patching_should_be_overwhelmed: int = 2
patching: int = -3
compromised_should_be_good: int = -20
compromised_should_be_patching: int = -20
compromised_should_be_overwhelmed: int = -20
compromised: int = -20
overwhelmed_should_be_good: int = -20
overwhelmed_should_be_patching: int = -20
overwhelmed_should_be_compromised: int = -20
overwhelmed: int = -20
good_should_be_patching: float = 0.0002
good_should_be_compromised: float = 0.0005
good_should_be_overwhelmed: float = 0.0005
patching_should_be_good: float = -0.0005
patching_should_be_compromised: float = 0.0002
patching_should_be_overwhelmed: float = 0.0002
patching: float = -0.0003
compromised_should_be_good: float = -0.002
compromised_should_be_patching: float = -0.002
compromised_should_be_overwhelmed: float = -0.002
compromised: float = -0.002
overwhelmed_should_be_good: float = -0.002
overwhelmed_should_be_patching: float = -0.002
overwhelmed_should_be_compromised: float = -0.002
overwhelmed: float = -0.002
# Node File System State
good_should_be_repairing: int = 2
good_should_be_restoring: int = 2
good_should_be_corrupt: int = 5
good_should_be_destroyed: int = 10
repairing_should_be_good: int = -5
repairing_should_be_restoring: int = 2
repairing_should_be_corrupt: int = 2
repairing_should_be_destroyed: int = 0
repairing: int = -3
restoring_should_be_good: int = -10
restoring_should_be_repairing: int = -2
restoring_should_be_corrupt: int = 1
restoring_should_be_destroyed: int = 2
restoring: int = -6
corrupt_should_be_good: int = -10
corrupt_should_be_repairing: int = -10
corrupt_should_be_restoring: int = -10
corrupt_should_be_destroyed: int = 2
corrupt: int = -10
destroyed_should_be_good: int = -20
destroyed_should_be_repairing: int = -20
destroyed_should_be_restoring: int = -20
destroyed_should_be_corrupt: int = -20
destroyed: int = -20
scanning: int = -2
good_should_be_repairing: float = 0.0002
good_should_be_restoring: float = 0.0002
good_should_be_corrupt: float = 0.0005
good_should_be_destroyed: float = 0.001
repairing_should_be_good: float = -0.0005
repairing_should_be_restoring: float = 0.0002
repairing_should_be_corrupt: float = 0.0002
repairing_should_be_destroyed: float = 0.0000
repairing: float = -0.0003
restoring_should_be_good: float = -0.001
restoring_should_be_repairing: float = -0.0002
restoring_should_be_corrupt: float = 0.0001
restoring_should_be_destroyed: float = 0.0002
restoring: float = -0.0006
corrupt_should_be_good: float = -0.001
corrupt_should_be_repairing: float = -0.001
corrupt_should_be_restoring: float = -0.001
corrupt_should_be_destroyed: float = 0.0002
corrupt: float = -0.001
destroyed_should_be_good: float = -0.002
destroyed_should_be_repairing: float = -0.002
destroyed_should_be_restoring: float = -0.002
destroyed_should_be_corrupt: float = -0.002
destroyed: float = -0.002
scanning: float = -0.0002
# IER status
red_ier_running: int = -5
green_ier_blocked: int = -10
red_ier_running: float = -0.0005
green_ier_blocked: float = -0.001
# Patching / Reset durations
os_patching_duration: int = 5
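
A possible reason the unchanged integration test configs still pass is that Python does not enforce annotations at runtime. The sketch below assumes `TrainingConfig` is a plain dataclass (the diff does not show its decorator) and uses a hypothetical, trimmed-down stand-in named `RewardDefaults`.

```python
from dataclasses import dataclass


@dataclass
class RewardDefaults:
    """Hypothetical stand-in for the reward fields of `TrainingConfig`."""

    all_ok: float = 0
    off_should_be_on: float = -0.001
    green_ier_blocked: float = -0.001


# Dataclass annotations are not checked at runtime, so an older config that
# still supplies integer rewards loads without error.
legacy = RewardDefaults(off_should_be_on=-10, green_ier_blocked=-10)
print(type(legacy.off_should_be_on).__name__)  # int

# Adding an int to a float yields a float, so downstream totals stay numeric.
total = legacy.off_should_be_on + RewardDefaults().green_ier_blocked
print(type(total).__name__)  # float
```

If strict typing is wanted, the values would have to be converted explicitly (for example with `float(...)`) when the config is loaded; the annotation alone does not do this.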

@@ -142,10 +142,10 @@ class Primaite(Env):
self.step_info = {}
# Total reward
self.total_reward = 0
self.total_reward: float = 0
# Average reward
self.average_reward = 0
self.average_reward: float = 0
# Episode count
self.episode_count = 0
@@ -283,9 +283,9 @@ class Primaite(Env):
self._create_random_red_agent()
# Reset counters and totals
self.total_reward = 0
self.total_reward = 0.0
self.step_count = 0
self.average_reward = 0
self.average_reward = 0.0
# Update observations space and return
self.update_environent_obs()

@@ -20,7 +20,7 @@ def calculate_reward_function(
red_iers,
step_count,
config_values,
):
) -> float:
"""
Compares the states of the initial and final nodes/links to get a reward.
@@ -33,7 +33,7 @@ def calculate_reward_function(
step_count: current step
config_values: Config values
"""
reward_value = 0
reward_value: float = 0.0
# For each node, compare hardware state, SoftwareState, service states
for node_key, final_node in final_nodes.items():
@@ -94,7 +94,7 @@ def calculate_reward_function(
return reward_value
def score_node_operating_state(final_node, initial_node, reference_node, config_values):
def score_node_operating_state(final_node, initial_node, reference_node, config_values) -> float:
"""
Calculates score relating to the hardware state of a node.
@@ -104,7 +104,7 @@ def score_node_operating_state(final_node, initial_node, reference_node, config_
reference_node: The node if there had been no red or blue effect
config_values: Config values
"""
score = 0
score: float = 0.0
final_node_operating_state = final_node.hardware_state
reference_node_operating_state = reference_node.hardware_state
@@ -143,7 +143,7 @@ def score_node_operating_state(final_node, initial_node, reference_node, config_
return score
def score_node_os_state(final_node, initial_node, reference_node, config_values):
def score_node_os_state(final_node, initial_node, reference_node, config_values) -> float:
"""
Calculates score relating to the Software State of a node.
@@ -153,7 +153,7 @@ def score_node_os_state(final_node, initial_node, reference_node, config_values)
reference_node: The node if there had been no red or blue effect
config_values: Config values
"""
score = 0
score: float = 0.0
final_node_os_state = final_node.software_state
reference_node_os_state = reference_node.software_state
@@ -194,7 +194,7 @@ def score_node_os_state(final_node, initial_node, reference_node, config_values)
return score
def score_node_service_state(final_node, initial_node, reference_node, config_values):
def score_node_service_state(final_node, initial_node, reference_node, config_values) -> float:
"""
Calculates score relating to the service state(s) of a node.
@@ -204,7 +204,7 @@ def score_node_service_state(final_node, initial_node, reference_node, config_va
reference_node: The node if there had been no red or blue effect
config_values: Config values
"""
score = 0
score: float = 0.0
final_node_services: Dict[str, Service] = final_node.services
reference_node_services: Dict[str, Service] = reference_node.services
@@ -266,7 +266,7 @@ def score_node_service_state(final_node, initial_node, reference_node, config_va
return score
def score_node_file_system(final_node, initial_node, reference_node, config_values):
def score_node_file_system(final_node, initial_node, reference_node, config_values) -> float:
"""
Calculates score relating to the file system state of a node.
@@ -275,7 +275,7 @@ def score_node_file_system(final_node, initial_node, reference_node, config_valu
initial_node: The node before red and blue agents take effect
reference_node: The node if there had been no red or blue effect
"""
score = 0
score: float = 0.0
final_node_file_system_state = final_node.file_system_state_actual
reference_node_file_system_state = reference_node.file_system_state_actual

@@ -31,7 +31,7 @@ class Transaction(object):
"The observation space before any actions are taken"
self.obs_space_post = None
"The observation space after any actions are taken"
self.reward = None
self.reward: float = None
"The reward value"
self.action_space = None
"The action space invoked by the agent"
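
As a side note on the `self.reward: float = None` hint: a static type checker would normally expect `Optional[float]` for an attribute that starts as `None`. The sketch below shows that spelling on a hypothetical, trimmed-down `Transaction`; it is a possible refinement, not what this PR implements.

```python
from typing import Optional


class Transaction:
    """Hypothetical, trimmed-down version of the class in the diff above."""

    def __init__(self) -> None:
        # No reward is known until a step has been taken, hence Optional.
        self.reward: Optional[float] = None
        "The reward value"


tx = Transaction()
print(tx.reward is None)

# Once a step has been scored, the attribute holds a float.
tx.reward = -0.0005
print(tx.reward)
```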