Support restrictions

Main idea

dualbounds also allows analysts to restrict the support of \(Y(1), Y(0), X\) to yield sharper partial identification bounds. That said, restricting the support of \(Y(1), Y(0), X\) is a real assumption: if it is false, the final bounds will not be valid.

In particular, the user can provide a boolean-valued function \(s(Y(1), Y(0), X) \in \{0,1\}\), where the support of \(Y(1), Y(0), X\) is assumed to be \(\{(y_1, y_0, x) : s(y_1, y_0, x) = 1\}\), i.e., the set of values for which \(s\) evaluates to True. Note that in code the function takes its arguments in the order (y0, y1, x). Below, we give some examples of support restrictions.

[1]:

# No support restriction: this does not make any assumptions
s1 = lambda y0, y1, x: True
# Assume y0 <= y1 holds a.s.
s2 = lambda y0, y1, x: y0 <= y1
# Assume y0 <= y1 whenever x[0] >= 0
s3 = lambda y0, y1, x: (y0 <= y1) | (x[0] < 0)
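
As a quick sanity check before fitting, one can evaluate a restriction by hand. The sketch below assumes the function may be called with NumPy arrays of candidate outcome values and a single covariate vector. That calling convention is an assumption made for illustration and is not stated here, and the sample values are made up.

```python
import numpy as np

# The restrictions from above, repeated so this block is self-contained
s2 = lambda y0, y1, x: y0 <= y1
s3 = lambda y0, y1, x: (y0 <= y1) | (x[0] < 0)

# Made-up candidate values of Y(0), Y(1) and two covariate vectors
y0 = np.array([0.0, 1.0, 2.0])
y1 = np.array([1.0, 0.5, 2.0])
x_neg = np.array([-1.0, 3.0])  # x[0] < 0: s3 imposes nothing
x_pos = np.array([2.0, 3.0])   # x[0] >= 0: s3 requires y0 <= y1

allowed_s2 = s2(y0, y1, x_pos)
allowed_s3_neg = s3(y0, y1, x_neg)
allowed_s3_pos = s3(y0, y1, x_pos)
print(allowed_s2)      # [ True False  True]
print(allowed_s3_neg)  # [ True  True  True]
print(allowed_s3_pos)  # [ True False  True]
```

The elementwise `|` is what lets s3 work on arrays; Python's `or` would raise an error on arrays with more than one element.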

Passing this function to a DualBounds or DeltaDualBounds object using the support_restriction argument will yield bounds that incorporate this structural assumption. For example, below, we show how to bound the variance \(\text{Var}(Y(1) - Y(0))\) under the assumption that \(Y(0) \le Y(1)\).

Note: the correct argument to use is the support_restriction argument, not the support argument (which is used to specify the marginal support of \(Y\)).

[2]:

# Import packages
import sys; sys.path.insert(0, "../../../")
import dualbounds as db
from dualbounds.varite import VarITEDualBounds

# Generate synthetic data from a linear model
data = db.gen_data.gen_regression_data(
    n=500, p=30, interactions=False, tau=1, sample_seed=123
)

# Common arguments
db_args = dict(
    outcome=data['y'],
    treatment=data['W'],
    covariates=data['X'],
    propensities=data['pis'],
    how_transform='identity',
    eps_dist='gaussian',
)
# Fit assumption-free dual bounds
vdb = VarITEDualBounds(**db_args).fit(verbose=False)
# Fit dual bounds assuming Y(0) <= Y(1)
vdb_monotone = VarITEDualBounds(
    **db_args,
    support_restriction=lambda y0, y1, x: y0 <= y1
).fit(verbose=True, ninterp=0, grid_size=0)

Cross-fitting the outcome model.

Estimating optimal dual variables.

[3]:

print("The assumption-free results are:")
print(vdb.results().to_markdown())
print("The results assuming monotonicity are:")
print(vdb_monotone.results().to_markdown())

The assumption-free results are:
|            |     Lower |   Upper |
|:-----------|----------:|--------:|
| Estimate   | 0         | 4.29261 |
| SE         | 0.0179071 | 0.26683 |
| Conf. Int. | 0         | 4.81558 |
The results assuming monotonicity are:
|            |      Lower |    Upper |
|:-----------|-----------:|---------:|
| Estimate   | 0          | 2.5705   |
| SE         | 0.00829492 | 0.205357 |
| Conf. Int. | 0          | 2.97299  |

Best practices and common problems

Ensuring the outcome model is compatible

It is important that the estimated outcome model is compatible with any assumed support restriction. For example, consider the following scenario:

You would like to compute bounds that incorporate the monotonicity assumption \(Y(0) \le Y(1)\).
Your outcome model predicts that the conditional average treatment effect \(E[Y(1) - Y(0) \mid X]\) is negative for certain \(X\).

Here, the estimated outcome model is incompatible with the monotonicity assumption. Note that this can happen even when the monotonicity assumption \(Y(0) \le Y(1)\) is accurate, e.g., because the outcome model has overfit. Mathematically, such an incompatibility yields completely vacuous bounds (i.e., a bound from \(-\infty\) to \(\infty\)).
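
One way to spot this kind of incompatibility early is to look at how often the fitted model predicts a negative treatment effect. The following is a minimal sketch, not part of dualbounds: the helper name `share_violating_monotonicity` and the prediction arrays are hypothetical, and in practice the predictions would come from whatever outcome model is being used.

```python
import numpy as np

def share_violating_monotonicity(mu0, mu1):
    """Fraction of units whose estimated CATE mu1 - mu0 is negative."""
    mu0 = np.asarray(mu0, dtype=float)
    mu1 = np.asarray(mu1, dtype=float)
    return float(np.mean(mu1 - mu0 < 0))

# Made-up predictions of E[Y(0) | X] and E[Y(1) | X] for five units
mu0_hat = np.array([0.2, 1.0, -0.5, 0.3, 2.0])
mu1_hat = np.array([1.1, 0.8, 0.4, 0.3, 2.5])

share = share_violating_monotonicity(mu0_hat, mu1_hat)
print(share)  # 0.2: one of the five units has a negative estimated CATE
```

A share well above zero suggests the outcome model conflicts with \(Y(0) \le Y(1)\) for a nontrivial part of the sample.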

Incompatible outcome models will not cause errors. Instead, dualbounds will automatically try to force the incompatible outcome model to become compatible with the support restriction. However, this has two consequences:

Computation speed: Forcing the outcome model to be compatible with the support restriction can be slow.
Numerical instability: This procedure can also be numerically unstable, leading to large standard errors and loose bounds.

Thus, although it is not strictly necessary, the best solution is to ensure the outcome model is compatible with the support restriction.

For example, the sklearn HistGradientBoostingRegressor has an argument (monotonic_cst) which, when applied to the treatment feature, can be used to guarantee that \(E[Y(1) - Y(0) \mid X] \ge 0\).
For bespoke support restrictions, we suggest that analysts implement custom outcome models wrapping the dist_reg.DistReg class.

Very important note: If you think your outcome model should be compatible but you are still getting numerical errors, try setting ninterp=0 and grid_size=0 when calling the DualBounds.fit() method. These technical arguments (described in the documentation to DualBounds.compute_dual_variables()) are used to ensure validity even when the data are very heavy-tailed. However, they can sometimes cause a compatible outcome model to become incompatible.

Large standard errors and numerical problems

If you cannot create a compatible outcome model, you may (or may not) have numerical problems and large standard errors. However, dualbounds has a few ways to address this problem:

Try setting ninterp=0 and grid_size=0 when calling the .fit() method.
Try increasing the value of nvals0 and nvals1 when calling the .fit() method.
Try changing the interp_fn input when calling .fit().
Try setting dual_strategy='se' when calling the .fit() method.
If the outcome variable is heavy-tailed, try transforming it to make it lighter-tailed, e.g., by using a np.arcsinh transformation. This need not change the estimand, since one can undo the transformation when specifying the estimand in the DualBounds class.
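
The last suggestion can be sketched with NumPy alone. The data below are made up, and the function `ite_original_scale` is a hypothetical illustration of an estimand written in terms of transformed outcomes. The exact way an estimand is passed to the DualBounds class is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up heavy-tailed outcomes (Student t with 2 degrees of freedom)
y = 10 * rng.standard_t(df=2, size=1000)

y_t = np.arcsinh(y)    # transformed, lighter-tailed outcome
y_back = np.sinh(y_t)  # np.sinh inverts np.arcsinh

# Hypothetical estimand on the original scale, written as a function of
# the transformed potential outcomes
def ite_original_scale(y0_t, y1_t, x):
    return np.sinh(y1_t) - np.sinh(y0_t)

# Largest absolute value before and after the transformation
print(np.abs(y).max(), np.abs(y_t).max())
```

Since np.arcsinh grows roughly logarithmically for large inputs, extreme outcomes are pulled in sharply while values near zero are left almost unchanged.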

Summary

In sum, incorporating support restrictions can substantially sharpen partial identification bounds. However, for optimal statistical and computational performance, we recommend the following:

Try to ensure that the outcome model is compatible with the support restriction.
Always inspect the diagnostic results (via the .diagnostics() method) to see if there are major numerical problems. If so, see the section “Large standard errors and numerical problems.”