Trajectory Arrays

In PyCFRL, a trajectory is the set of observed tuples \(\{(z_i, s_{i0}, a_{i0}, r_{i0}, s_{i1}, \dots, s_{i,T-1}, a_{i,T-1}, r_{i,T-1}, s_{iT}): i=1,\dots,N\}\) recording the sensitive attribute, states, actions, and rewards of each individual (or subject) over time. Here \(N\) is the total number of individuals and \(T\) is the number of transitions per individual. Each \(t=0,\dots,T\) is called a time step, and the observed tuple \((s_{it}, a_{it}, r_{it}, s_{i,t+1})\) is called the transition for individual \(i\) at time step \(t\), so each individual has \(T+1\) states but only \(T\) actions and rewards. The format examples below write the state as \(x\) rather than \(s\).

This section introduces Trajectory Arrays, the data structures PyCFRL uses to represent trajectories. Any trajectory satisfying the data requirements can be represented by Trajectory Arrays, and all trajectory inputs and outputs of PyCFRL functions and classes take this form. To convert trajectory data between a tabular format and Trajectory Arrays, see Tabular Trajectory Data.

The sensitive attributes, states, actions, and rewards in a trajectory are represented by Trajectory Arrays of different formats, which are introduced below.

Sensitive Attributes Format

A Trajectory Array in the Sensitive Attributes Format stores the observed sensitive attributes of each individual in the trajectory. It is a 2D list or array with shape (N, zdim), where zdim is the number of components in the sensitive attribute vector. The (i, j)-th entry represents the j-th component of the observed sensitive attribute of the i-th individual. Note that if the sensitive attribute is univariate, a Trajectory Array in the Sensitive Attributes Format should have shape (N, 1) rather than (N,).

For example, consider a trajectory dataset with 3 individuals where the sensitive attribute is bivariate. Then the sensitive attributes of this trajectory can be represented in the Sensitive Attributes Format as

\[
\begin{bmatrix}
z_1^1 & z_1^2 \\
z_2^1 & z_2^2 \\
z_3^1 & z_3^2
\end{bmatrix}
\]
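As a minimal sketch of this format, the snippet below builds such an array with NumPy. The attribute values and the variable names `zs`, `z_flat`, and `zs_univariate` are made up for illustration and are not part of PyCFRL. It also shows one way to turn a flat univariate vector of shape (N,) into the required (N, 1) shape. Note that NumPy indices start at 0, so individual 1 in the text is row 0 in the array.

```python
import numpy as np

# Hypothetical data: 3 individuals, bivariate sensitive attribute (zdim = 2).
# Row i holds the sensitive attribute vector of individual i + 1.
zs = np.array([[0, 1],
               [1, 0],
               [1, 1]])
print(zs.shape)  # (3, 2)

# A univariate sensitive attribute stored as a flat vector of shape (N,)
# needs an explicit second axis to reach shape (N, 1).
z_flat = np.array([0, 1, 1])
zs_univariate = z_flat.reshape(-1, 1)
print(zs_univariate.shape)  # (3, 1)
```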

Single-time States Format

A Trajectory Array in the Single-time States Format stores the state of each individual in the trajectory at a single time step. It is a 2D list or array with shape (N, xdim), where xdim is the number of components in the state vector. The (i, j)-th entry represents the j-th component of the state variable of the i-th individual at the given time step. Note that if the state vector is univariate, a Trajectory Array in the Single-time States Format should have shape (N, 1) rather than (N,).

For example, consider a trajectory dataset with 3 individuals where the state variable is bivariate. Then the states of this trajectory at some time step \(t\) can be represented in the Single-time States Format as

\[
\begin{bmatrix}
x_{1t}^1 & x_{1t}^2 \\
x_{2t}^1 & x_{2t}^2 \\
x_{3t}^1 & x_{3t}^2
\end{bmatrix}
\]

Full-trajectory States Format

A Trajectory Array in the Full-trajectory States Format stores the state of each individual in the trajectory at all time steps. It is a 3D list or array with shape (N, T+1, xdim), where xdim is the number of components in the state vector. The (i, j, k)-th entry represents the k-th component of the state variable of the i-th individual at the j-th time step. Note that if the state vector is univariate, a Trajectory Array in the Full-trajectory States Format should have shape (N, T+1, 1) rather than (N, T+1).

For example, consider a trajectory dataset with 3 individuals and 3 transitions where the state variable is bivariate. Then the states of this trajectory at all time steps can be represented in the Full-trajectory States Format as

\[
\begin{bmatrix}
[x_{10}^1, x_{10}^2] & [x_{11}^1, x_{11}^2] & [x_{12}^1, x_{12}^2] & [x_{13}^1, x_{13}^2] \\
[x_{20}^1, x_{20}^2] & [x_{21}^1, x_{21}^2] & [x_{22}^1, x_{22}^2] & [x_{23}^1, x_{23}^2] \\
[x_{30}^1, x_{30}^2] & [x_{31}^1, x_{31}^2] & [x_{32}^1, x_{32}^2] & [x_{33}^1, x_{33}^2]
\end{bmatrix}
\]
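The relationship between the two states formats can be sketched with NumPy slicing: fixing the time axis of a Full-trajectory States array yields an array in the Single-time States Format. The random values and the names `states`, `states_t`, and `states_univariate` are illustrative assumptions, not PyCFRL identifiers.

```python
import numpy as np

N, T, xdim = 3, 3, 2  # individuals, transitions, state components

# Placeholder states: T transitions give T + 1 time steps (t = 0, ..., T).
rng = np.random.default_rng(0)
states = rng.normal(size=(N, T + 1, xdim))
print(states.shape)  # (3, 4, 2)

# Fixing the time step yields the Single-time States Format, shape (N, xdim).
t = 1
states_t = states[:, t, :]
print(states_t.shape)  # (3, 2)

# A univariate state stored as (N, T+1) needs a trailing axis: (N, T+1, 1).
flat_states = rng.normal(size=(N, T + 1))
states_univariate = flat_states[..., np.newaxis]
print(states_univariate.shape)  # (3, 4, 1)
```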

Single-time Actions Format

A Trajectory Array in the Single-time Actions Format stores the action of each individual in the trajectory at a single time step. It is a 1D list or array with shape (N,). The i-th entry represents the action of the i-th individual at the given time step.

For example, consider a trajectory dataset with 3 individuals. Then the actions of this trajectory at some time step \(t\) can be represented in the Single-time Actions Format as

\[
\begin{bmatrix}
a_{1t} & a_{2t} & a_{3t}
\end{bmatrix}
\]

Full-trajectory Actions Format

A Trajectory Array in the Full-trajectory Actions Format is used to store the action of each individual in the trajectory at all time steps. It is a 2D list or array with shape (N, T). The (i, j)-th entry of the list or array represents the action of the i-th individual at the j-th time step.

For example, consider a trajectory dataset with 3 individuals and 3 transitions. Then the actions of this trajectory at all time steps can be represented in the Full-trajectory Actions Format as

\[
\begin{bmatrix}
a_{10} & a_{11} & a_{12} \\
a_{20} & a_{21} & a_{22} \\
a_{30} & a_{31} & a_{32}
\end{bmatrix}
\]
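Since the format accepts a list or an array, the sketch below starts from a nested Python list and converts it with NumPy to inspect its shape. The binary action values and the names `actions`, `actions_arr`, and `actions_t` are assumptions made for illustration. Selecting one column gives the Single-time Actions Format.

```python
import numpy as np

# Hypothetical binary actions: 3 individuals (rows), 3 transitions (columns).
actions = [[0, 1, 1],
           [1, 0, 0],
           [1, 1, 0]]
actions_arr = np.asarray(actions)
print(actions_arr.shape)  # (3, 3)

# Column t holds every individual's action at time step t, shape (N,).
t = 2
actions_t = actions_arr[:, t]
print(actions_t.shape)     # (3,)
print(actions_t.tolist())  # [1, 0, 0]
```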

Single-time Rewards Format

A Trajectory Array in the Single-time Rewards Format stores the reward of each individual in the trajectory at a single time step. It is a 1D list or array with shape (N,). The i-th entry represents the reward of the i-th individual at the given time step.

For example, consider a trajectory dataset with 3 individuals. Then the rewards of this trajectory at some time step \(t\) can be represented in the Single-time Rewards Format as

\[
\begin{bmatrix}
r_{1t} & r_{2t} & r_{3t}
\end{bmatrix}
\]

Full-trajectory Rewards Format

A Trajectory Array in the Full-trajectory Rewards Format is used to store the reward of each individual in the trajectory at all time steps. It is a 2D list or array with shape (N, T). The (i, j)-th entry of the list or array represents the reward of the i-th individual at the j-th time step.

For example, consider a trajectory dataset with 3 individuals and 3 transitions. Then the rewards of this trajectory at all time steps can be represented in the Full-trajectory Rewards Format as

\[
\begin{bmatrix}
r_{10} & r_{11} & r_{12} \\
r_{20} & r_{21} & r_{22} \\
r_{30} & r_{31} & r_{32}
\end{bmatrix}
\]
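The shape rules for the full-trajectory formats can be checked together before passing data to a library. The helper `check_trajectory_shapes` below is a hypothetical sketch written for this page, not a PyCFRL function. It assumes the four arrays describe the same trajectory and returns the inferred (N, T).

```python
import numpy as np

def check_trajectory_shapes(zs, states, actions, rewards):
    """Hypothetical helper (not part of PyCFRL): validate the array shapes."""
    zs, states = np.asarray(zs), np.asarray(states)
    actions, rewards = np.asarray(actions), np.asarray(rewards)
    if zs.ndim != 2:
        raise ValueError("sensitive attributes must have shape (N, zdim)")
    if states.ndim != 3:
        raise ValueError("states must have shape (N, T+1, xdim)")
    # States cover T + 1 time steps; actions and rewards cover T transitions.
    n, t = states.shape[0], states.shape[1] - 1
    if zs.shape[0] != n:
        raise ValueError("sensitive attributes and states disagree on N")
    if actions.shape != (n, t):
        raise ValueError("actions must have shape (N, T)")
    if rewards.shape != (n, t):
        raise ValueError("rewards must have shape (N, T)")
    return n, t

# Placeholder trajectory: 3 individuals, 3 transitions, univariate z, bivariate state.
rng = np.random.default_rng(0)
zs = rng.integers(0, 2, size=(3, 1))
states = rng.normal(size=(3, 4, 2))
actions = rng.integers(0, 2, size=(3, 3))
rewards = rng.normal(size=(3, 3))
print(check_trajectory_shapes(zs, states, actions, rewards))  # (3, 3)
```

Raising an error on the first mismatch keeps the sketch short. A fuller check might collect every problem before reporting.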