Files
PrimAITE/tests/test_reward.py
Chris McCarthy 273876873e #915 - Created app dirs and set as constants in the top-level init.
- renamed _config_values_main to training_config.py and renamed the ConfigValuesMain class to TrainingConfig.
Moved training_config.py to src/primaite/config/training_config.py
- Renamed all training config yaml file keys to make creating an instance of TrainingConfig easier.
Moved action_type and num_steps over to the training config.
- Decoupled the training config and lay down config.
- Refactored main.py so that it can be ran from CLI and can take a training config path and a lay down config path.
- refactored all outputs so that they save to the session dir.
- Added some necessary setup scripts that handle creating app dirs, fronting example config files to the user, fronting demo notebooks to the user, performing clean-up in between installations etc.
- Added functions that attempt to retrieve the file path of users example config files that have been fronted by the primaite setup.
- Added logging config and a getLogger function in the top-level init.
- Refactored all logs entries logged to use a logger using the primaite logging config.
- Added basic typer CLI for doing things like setup, viewing logs, viewing primaite version, running a basic session.
- Updated test to use new features and config structures.
- Began updating docs. More to do here.
2023-06-07 22:40:16 +01:00

33 lines
1.2 KiB
Python

from tests import TEST_CONFIG_ROOT
from tests.conftest import _get_primaite_env_from_config
def test_rewards_are_being_penalised_at_each_step_function():
"""
Test that hardware state is penalised at each step.
When the initial state is OFF compared to reference state which is ON.
"""
env = _get_primaite_env_from_config(
training_config_path=TEST_CONFIG_ROOT
/ "one_node_states_on_off_main_config.yaml",
lay_down_config_path=TEST_CONFIG_ROOT
/ "one_node_states_on_off_lay_down_config.yaml",
)
"""
On different steps (of the 13 in total) these are the following rewards for config_6 which are activated:
File System State: goodShouldBeCorrupt = 5 (between Steps 1 & 3)
Hardware State: onShouldBeOff = -2 (between Steps 4 & 6)
Service State: goodShouldBeCompromised = 5 (between Steps 7 & 9)
Software State (Software State): goodShouldBeCompromised = 5 (between Steps 10 & 12)
Total Reward: -2 - 2 + 5 + 5 + 5 + 5 + 5 + 5 = 26
Step Count: 13
For the 4 steps where this occurs the average reward is:
Average Reward: 2 (26 / 13)
"""
print("average reward", env.average_reward)
assert env.average_reward == -8.0