.. only:: comment

    © Crown-owned copyright 2025, Defence Science and Technology Laboratory UK

.. _extensible_rewards:

Extensible Rewards
******************


Extensible rewards differ from the reward mechanism used in PrimAITE v3.x: new reward types can be
added without changing the `RewardFunction` class in `rewards.py` (PrimAITE core repository).

Changes to reward class structure.
==================================

Reward classes inherit from `AbstractReward` (a subclass of Pydantic's `BaseModel`).

Each reward class contains a `ConfigSchema` class that is responsible for ensuring the config file
data is in the correct format. This also means there is little, if any, need for an `__init__`
method. A `.from_config` method no longer needs to be written for each reward, as it is inherited
from `AbstractReward`. Each class requires a discriminator string, which is used by the
`ConfigSchema` class to verify that it hasn't previously been added to the registry.

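
The registry check described above can be pictured with a small, self-contained sketch. It is not
PrimAITE's implementation: `SketchReward`, `_registry`, `try_duplicate` and the error message are
hypothetical names chosen for illustration, and only the standard library is used. The idea is that
each subclass passes a `discriminator` keyword in its class statement, and the base class refuses a
discriminator that has already been registered.

.. code-block:: python

    from typing import Any, ClassVar, Dict, Type


    class SketchReward:
        """Hypothetical stand-in for a reward base class; not PrimAITE's AbstractReward."""

        # Maps each discriminator string to the class registered under it.
        _registry: ClassVar[Dict[str, Type["SketchReward"]]] = {}

        def __init_subclass__(cls, discriminator: str, **kwargs: Any) -> None:
            super().__init_subclass__(**kwargs)
            # Reject a discriminator that another class has already claimed.
            if discriminator in SketchReward._registry:
                raise ValueError(f"Discriminator {discriminator!r} is already registered")
            SketchReward._registry[discriminator] = cls


    class DatabaseFileIntegrity(SketchReward, discriminator="database-file-integrity"):
        """Registered under 'database-file-integrity' when the class is defined."""


    def try_duplicate() -> str:
        """Attempt to reuse a discriminator and return the resulting error message."""
        try:

            class Duplicate(SketchReward, discriminator="database-file-integrity"):
                pass

        except ValueError as err:
            return str(err)
        return "no error"

Because registration happens at class-definition time in this sketch, a clash is reported as soon as
the module containing the duplicate class is imported, rather than when a config file is loaded.
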

Inheriting from `BaseModel` removes the need for an `__init__` method but means that object
attributes need to be passed by keyword.


To add a new reward class, follow the example below. Note that the `type` attribute in the
`ConfigSchema` class should match the type used in the config file to define the reward.

.. code-block:: Python

    from typing import Dict, List

    # AbstractReward is defined in rewards.py (PrimAITE core repository).


    class DatabaseFileIntegrity(AbstractReward, discriminator="database-file-integrity"):
        """Reward function component which rewards the agent for maintaining the integrity of a database file."""

        config: "DatabaseFileIntegrity.ConfigSchema"
        location_in_state: List[str] = [""]
        reward: float = 0.0

        class ConfigSchema(AbstractReward.ConfigSchema):
            """ConfigSchema for DatabaseFileIntegrity."""

            type: str = "database-file-integrity"
            node_hostname: str
            folder_name: str
            file_name: str

        def calculate(self, state: Dict, last_action_response: "AgentHistoryItem") -> float:
            """Calculate the reward for the current state."""
            pass  # placeholder: the reward calculation goes here

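
The `calculate` body in the example above is left as a placeholder. As a rough illustration of what
such a method might do, the sketch below looks up a file's entry in a nested state dictionary and
maps a health value to a reward. The state layout, the `health_status` key, the numeric codes and
the helper names `lookup` and `file_integrity_reward` are all assumptions made for this example
rather than details taken from PrimAITE.

.. code-block:: python

    from typing import Any, Dict, List, Optional


    def lookup(state: Dict[str, Any], path: List[str]) -> Optional[Any]:
        """Walk `path` through nested dicts, returning None if any key is missing."""
        current: Any = state
        for key in path:
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
        return current


    def file_integrity_reward(state: Dict[str, Any], location_in_state: List[str]) -> float:
        """Return 1.0 for a healthy file, -1.0 for a corrupted one, otherwise 0.0."""
        file_state = lookup(state, location_in_state)
        if not isinstance(file_state, dict):
            return 0.0  # file entry not present in the state: no reward signal
        # Assumed encoding for this sketch: 1 = good, 2 = corrupted.
        health = file_state.get("health_status")
        if health == 1:
            return 1.0
        if health == 2:
            return -1.0
        return 0.0


    # Hypothetical state layout, keyed by the ConfigSchema fields of the example above.
    location = ["nodes", "database_server", "folders", "database", "files", "database.db"]
    state = {
        "nodes": {
            "database_server": {
                "folders": {"database": {"files": {"database.db": {"health_status": 2}}}}
            }
        }
    }
    print(file_integrity_reward(state, location))

Returning 0.0 when the entry is missing keeps the sketch from raising on states that do not yet
contain the file; a real reward may prefer to log or handle that case differently.
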

Changes to YAML file.
=====================

It is no longer necessary to provide `dns_server` as an option in the simulation section of the
config file.