Example Workflows
This section demonstrates some of the major workflows that can be carried out with PyCFRL.
Preprocessing Only

In this workflow, PyCFRL takes in an offline trajectory and preprocesses it using
SyntheticPreprocessor. The final output of the workflow is the preprocessed (debiased)
offline trajectory. This workflow is appropriate when the user does not want to train policies using
PyCFRL; instead, the user can take the preprocessed trajectory and train a counterfactually fair policy
using another reinforcement learning library or application that better fits their needs.
Code: A detailed code demonstration of this workflow can be found here.
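For a quick orientation, the sketch below outlines this workflow in code. It uses a small placeholder offline trajectory generated with NumPy, and the import path, constructor arguments, and preprocess() method shown for SyntheticPreprocessor are illustrative assumptions rather than the exact PyCFRL interface; the linked demonstration shows the real usage.

```python
# Minimal sketch of the "Preprocessing Only" workflow.
# Import path, constructor, and preprocess() signature are assumptions.
import numpy as np
from pycfrl.preprocessor import SyntheticPreprocessor  # assumed import path

rng = np.random.default_rng(0)
n, T = 100, 5  # number of trajectories and decision points (placeholder sizes)
zs = rng.integers(0, 2, size=(n, 1))        # sensitive attributes
xs = rng.normal(size=(n, T + 1, 3))         # states
actions = rng.integers(0, 2, size=(n, T))   # actions
rewards = rng.normal(size=(n, T))           # rewards

preprocessor = SyntheticPreprocessor()              # assumed default construction
xs_tilde, rewards_tilde = preprocessor.preprocess(  # assumed method and signature
    zs=zs, xs=xs, actions=actions, rewards=rewards
)

# The debiased trajectory can be exported and reused with any other RL library.
np.savez("debiased_trajectory.npz", xs=xs_tilde, actions=actions, rewards=rewards_tilde)
```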
Preprocessing + Policy Learning

In this workflow, PyCFRL takes in an offline trajectory and preprocesses it using
SequentialPreprocessor. The preprocessed trajectory is then passed into FQI to train a
counterfactually fair policy, which is the final output of the workflow. This
workflow is appropriate if the user wants to train a policy using PyCFRL. The trained policy can be
further evaluated on its value and counterfactual fairness, as discussed in detail in the
“Assessing Policies Using Real Data” workflow later in this section.
Code: A detailed code demonstration of this workflow can be found here.
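The sketch below outlines this workflow. As before, the placeholder trajectory is generated with NumPy, and the import paths, constructors, and the preprocess() and train() method names are assumptions for illustration; the linked demonstration shows the exact PyCFRL API.

```python
# Minimal sketch of the "Preprocessing + Policy Learning" workflow.
# Import paths, constructors, and method names are assumptions.
import numpy as np
from pycfrl.preprocessor import SequentialPreprocessor  # assumed import path
from pycfrl.agents import FQI                           # assumed import path

rng = np.random.default_rng(0)
n, T = 100, 5  # placeholder sizes
zs = rng.integers(0, 2, size=(n, 1))
xs = rng.normal(size=(n, T + 1, 3))
actions = rng.integers(0, 2, size=(n, T))
rewards = rng.normal(size=(n, T))

# Debias the offline trajectory.
preprocessor = SequentialPreprocessor()             # assumed constructor
xs_tilde, rewards_tilde = preprocessor.preprocess(  # assumed method and signature
    zs=zs, xs=xs, actions=actions, rewards=rewards
)

# Train a counterfactually fair policy on the preprocessed trajectory.
agent = FQI()                                                     # assumed constructor
agent.train(xs=xs_tilde, actions=actions, rewards=rewards_tilde)  # assumed method

# The trained agent is the final output; it can later be assessed as in the
# "Assessing Policies Using Real Data" workflow.
```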
Assessing Preprocessors Using Synthetic Data

In this workflow, PyCFRL first uses sample_trajectory() to sample a trajectory from a
SyntheticEnvironment whose transition rules are pre-specified. It then preprocesses the
sampled trajectory using some custom preprocessor defined by the user.
After that, the preprocessed trajectory is passed into FQI to train a policy, which is then
assessed using synthetic data via evaluate_reward_through_simulation() and
evaluate_fairness_through_simulation(). The final output of the workflow is the policy trained
on the preprocessed data as well as its estimated value and counterfactual fairness metric. This
workflow is appropriate when the user wants to examine the impact of some trajectory preprocessing
method on the value and counterfactual fairness of the trained policy.
Code: A detailed code demonstration of this workflow can be found here.
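The sketch below outlines this workflow. The IdentityPreprocessor class stands in for whatever custom preprocessor the user wants to assess, and the import paths, constructors, and function signatures (including sample_trajectory() and the two evaluation functions) are assumptions for illustration; the linked demonstration shows the exact PyCFRL API.

```python
# Minimal sketch of the "Assessing Preprocessors Using Synthetic Data" workflow.
# Import paths, constructors, and function signatures are assumptions.
from pycfrl.environment import SyntheticEnvironment, sample_trajectory  # assumed import paths
from pycfrl.agents import FQI                                           # assumed import path
from pycfrl.evaluation import (                                         # assumed import path
    evaluate_fairness_through_simulation,
    evaluate_reward_through_simulation,
)


class IdentityPreprocessor:
    """Placeholder for a user-defined preprocessor; returns the data unchanged."""

    def preprocess(self, zs, xs, actions, rewards):
        return xs, rewards


# Sample a trajectory from a synthetic environment with pre-specified transition rules.
env = SyntheticEnvironment()                       # assumed constructor
zs, xs, actions, rewards = sample_trajectory(env)  # assumed signature

# Apply the custom preprocessor under assessment.
xs_tilde, rewards_tilde = IdentityPreprocessor().preprocess(
    zs=zs, xs=xs, actions=actions, rewards=rewards
)

# Train a policy on the preprocessed data, then assess it via simulation.
agent = FQI()                                                     # assumed constructor
agent.train(xs=xs_tilde, actions=actions, rewards=rewards_tilde)  # assumed method

value = evaluate_reward_through_simulation(agent, env)       # assumed signature
fairness = evaluate_fairness_through_simulation(agent, env)  # assumed signature
print("estimated value:", value, "counterfactual fairness metric:", fairness)
```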
Assessing Policies Using Real Data

In this workflow, PyCFRL takes in an offline trajectory and preprocesses it using
SequentialPreprocessor. The preprocessed trajectory is then passed into FQI to train a
counterfactually fair policy, which is assessed using evaluate_reward_through_fqe()
and evaluate_fairness_through_model() based on a SimulatedEnvironment that mimics the
transition rules of the true environment underlying the training trajectory. The final output
of the workflow is the policy trained on the preprocessed data as well as its estimated value and
counterfactual fairness metric. This workflow is appropriate when the user is interested in the value
and counterfactual fairness achieved by the trained policy when interacting with the true underlying
environment.
Code: A detailed code demonstration of this workflow can be found here.
Conceptual Explanation: A step-by-step conceptual explanation of this workflow can be found here.
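The sketch below outlines this workflow end to end. The placeholder trajectory is generated with NumPy, and the import paths, constructors, the fit() call on SimulatedEnvironment, and the signatures of the two evaluation functions are assumptions for illustration; the linked demonstration and conceptual explanation describe the exact API and each step in detail.

```python
# Minimal sketch of the "Assessing Policies Using Real Data" workflow.
# Import paths, constructors, and function signatures are assumptions.
import numpy as np
from pycfrl.preprocessor import SequentialPreprocessor  # assumed import path
from pycfrl.agents import FQI                           # assumed import path
from pycfrl.environment import SimulatedEnvironment     # assumed import path
from pycfrl.evaluation import (                         # assumed import path
    evaluate_fairness_through_model,
    evaluate_reward_through_fqe,
)

rng = np.random.default_rng(0)
n, T = 100, 5  # placeholder sizes
zs = rng.integers(0, 2, size=(n, 1))
xs = rng.normal(size=(n, T + 1, 3))
actions = rng.integers(0, 2, size=(n, T))
rewards = rng.normal(size=(n, T))

# Debias the offline trajectory and train a counterfactually fair policy.
preprocessor = SequentialPreprocessor()                           # assumed constructor
xs_tilde, rewards_tilde = preprocessor.preprocess(                # assumed method
    zs=zs, xs=xs, actions=actions, rewards=rewards
)
agent = FQI()                                                     # assumed constructor
agent.train(xs=xs_tilde, actions=actions, rewards=rewards_tilde)  # assumed method

# Learn a model of the true environment from the training trajectory, then assess
# the policy's value (via FQE) and counterfactual fairness against that model.
sim_env = SimulatedEnvironment()                             # assumed constructor
sim_env.fit(zs=zs, xs=xs, actions=actions, rewards=rewards)  # assumed method

value = evaluate_reward_through_fqe(agent, zs, xs, actions, rewards)  # assumed signature
fairness = evaluate_fairness_through_model(agent, sim_env)            # assumed signature
print("estimated value:", value, "counterfactual fairness metric:", fairness)
```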