This PR fixes some minor issues that I found in the main.py script. Namely:
1. The first observation was always all zeroes when using a generic agent. This is because the `update_environment_obs()` method is not called automatically and is only called by `env.reset()`.
2. The config yaml was never closed, because the file reader's `close` method was only referenced and never called.
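
To illustrate the second point, here is a minimal sketch of the difference between referencing `close` and calling it. The temporary file, its contents, and the helper name `read_config_text` are assumptions for the example and are not taken from main.py.

```python
import tempfile
from pathlib import Path


def read_config_text(path):
    """Hypothetical helper: read a file and guarantee the handle is closed."""
    with open(path, "r") as config_file:
        text = config_file.read()
    # Leaving the with-block closes the file, even if read() had raised.
    return text, config_file.closed


# Stand-in for the config yaml; the contents are made up for this example.
config_path = Path(tempfile.mkdtemp()) / "config.yaml"
config_path.write_text("key: value\n")

handle = open(config_path, "r")
handle.close                      # attribute reference only: the file stays open
still_open = not handle.closed
handle.close()                    # the parentheses perform the actual call
closed_after_call = handle.closed

text, closed_by_with = read_config_text(config_path)
```

Using a `with` block is one way to avoid this class of bug, since the close call no longer has to be written by hand.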
Related work items: #1441
Check out the linked bug ticket to understand the issue.
The fix was very simple: just changing which variable is passed to the reward calculation function.
Related work items: #1442
**Summary:**
This adds support for MultiDiscrete observation spaces, matching what exists in the ADSP branch. The observation space is now configurable in the same way as the action space: by selecting a config item within the laydown config yaml.
The 'box' option has the same behaviour as before.
**Test Process:**
I added two integration tests to ensure that creating the environment is possible with both types of observation space. I also checked that all existing unit tests run fine as long as I update the observation space in the yaml to box.
**Other comments:**
I also updated the documentation relating to observation spaces. Please check whether the explanation makes sense.
Related work items: #1463
I wanted to add this pull request template just as a checklist for everyone to ensure they add tests and update documentation.
Do you think it's necessary? Feel free to discuss in the comments of this PR or accept/reject the suggestion.
Related work items: #1467
In reward.py, the comparisons in the IF statements used when assigning config_values reward values currently compare the initial state to the reference state. However, they should compare the reference state (what the state would be without any blue/red agent interference) with the final state (the state after red and blue actions have taken effect).
Change the IF statement logic so that the outer IF statement checks `reference_node_os_state` and the following IF statement checks `final_node_os_state`, so that these two states are compared.
Do this for all reward functions.
Write tests to evaluate step rewards.
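
As a rough sketch of the intended comparison, the snippet below checks the reference state first and then the final state. Only the names `reference_node_os_state` and `final_node_os_state` come from this description. The function name `node_os_reward`, the state labels, and the reward keys and values are hypothetical and will differ from what reward.py actually uses.

```python
# Hypothetical state labels, for illustration only.
GOOD = "GOOD"
COMPROMISED = "COMPROMISED"


def node_os_reward(reference_node_os_state, final_node_os_state, config_values):
    """Score a node by comparing the reference state with the final state.

    The initial state is deliberately not an input: the reward depends only on
    how the final state differs from the interference-free reference state.
    """
    if reference_node_os_state == GOOD:
        if final_node_os_state == COMPROMISED:
            return config_values["good_to_compromised"]
    elif reference_node_os_state == COMPROMISED:
        if final_node_os_state == GOOD:
            return config_values["compromised_to_good"]
    # No difference between reference and final state: no reward or penalty.
    return 0.0


# Hypothetical reward values standing in for the real config_values.
example_config_values = {"good_to_compromised": -10.0, "compromised_to_good": 10.0}
penalty = node_os_reward(GOOD, COMPROMISED, example_config_values)
unchanged = node_os_reward(GOOD, GOOD, example_config_values)
```

A step-reward test could follow the same shape: build a reference state and a final state, call the reward function, and assert on the returned value.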
Related work items: #1443