Custom Agents

In addition to FQI, users can also define custom decision-making agents and use them in PyCFRL. A custom agent typically represents a pre-specified decision rule (e.g., making decisions randomly) or implements a reinforcement learning algorithm not provided by PyCFRL.

To ensure a custom agent is compatible with PyCFRL, it must inherit from the Agent class provided by the agents module. That is,

  • The custom agent should be a subclass of Agent.

  • The custom agent should have an act() method whose name, parameter names, parameter data types, parameter default values, and return type are exactly as defined in the Agent class, except that it might accept some additional arguments. The input and output lists or arrays should also follow the same Trajectory Array format or have the same shape as those defined in Agent.

For example, the following RandomAgent, though simple, is a valid custom agent that is compatible with PyCFRL.

import numpy as np

from pycfrl.agents import Agent  # the Agent base class from the agents module

class RandomAgent(Agent):
    def __init__(self, p: int | float = 0.5) -> None:
        # Probability of taking action 1 for each individual
        self.p = p

    def act(
        self,
        z: list | np.ndarray,
        xt: list | np.ndarray,
        xtm1: list | np.ndarray | None = None,
        atm1: list | np.ndarray | None = None,
        uat: list | np.ndarray | None = None,
        verbose: bool = False
    ) -> np.ndarray:
        if verbose:
            print("RandomAgent taking actions...")
        # One action per individual in the batch
        N = np.array(z).shape[0]
        u = np.random.uniform(0, 1, size=N)
        actions = (u < self.p).astype(int)
        return actions
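
As a quick sanity check, the snippet below calls act() on a small batch. This is a minimal sketch: the zero-valued z and xt arrays are made-up placeholder inputs with one column each, used only to illustrate the call signature.

agent = RandomAgent(p=0.5)

# Placeholder inputs: 5 individuals, one sensitive attribute
# and one state variable each
z = np.zeros((5, 1))
xt = np.zeros((5, 1))

actions = agent.act(z=z, xt=xt)
print(actions)  # e.g., [0 1 0 0 1] -- a length-5 array of 0/1 actions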

On the other hand, the following agent will not be compatible with PyCFRL because its act() method does not include uat in its argument list. To ensure compatibility, uat should be included in the argument list even though it is not used in the function.

class RandomAgent(Agent):
    def __init__(self, p: int | float = 0.5) -> None:
        self.p = p

    def act(
        self,
        z: list | np.ndarray,
        xt: list | np.ndarray,
        xtm1: list | np.ndarray | None = None,
        atm1: list | np.ndarray | None = None,
        verbose: bool = False
    ) -> np.ndarray:
        if verbose:
            print("RandomAgent taking actions...")
        N = np.array(z).shape[0]
        u = np.random.uniform(0, 1, size=N)
        actions = (u < self.p).astype(int)
        return actions

If an agent is a valid custom agent, then it can be used wherever an FQI agent can be used. For example, we can pass it to evaluate_fairness_through_model() to calculate its counterfactual fairness metric.

# Suppose e is a SimulatedEnvironment object that has already been trained.
# Also suppose zs, states, and actions form a trajectory from the environment
# on which e is trained.
agent = RandomAgent(p=0.2)
evaluate_fairness_through_model(env=e,
                                zs=zs,
                                states=states,
                                actions=actions,
                                policy=agent)