dualbounds.generic.DualBounds

Computes dual bounds on \(E[f(Y(0), Y(1), X)]\).

Here, \(X\) are covariates and \(Y(0), Y(1)\) are potential outcomes.
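
For intuition, the weak-duality argument behind such bounds can be sketched as follows. The symbols \(\nu_0\) and \(\nu_1\) are notation assumed for this sketch and do not appear elsewhere on this page. Suppose two functions satisfy

\[ \nu_0(y_0, x) + \nu_1(y_1, x) \le f(y_0, y_1, x) \quad \text{for all } y_0, y_1, x. \]

Evaluating at \((Y(0), Y(1), X)\) and taking expectations gives

\[ E[\nu_0(Y(0), X)] + E[\nu_1(Y(1), X)] \le E[f(Y(0), Y(1), X)]. \]

Each term on the left involves only one potential outcome, so it does not depend on the unobserved joint law of \(Y(0)\) and \(Y(1)\). This is what makes a lower bound estimable even though the estimand itself is only partially identified. Reversing the inequality in the constraint gives an upper bound in the same way.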

Parameters:

f : function

Function which defines the partially identified estimand. Must be a function of three arguments: y0, y1, x (in that order). E.g., f = lambda y0, y1, x : y0 <= y1
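
As a rough illustration, a few candidate choices of f are sketched below. The function names are hypothetical. The sketch assumes f may be called on scalars or on arrays, so it uses only comparison and arithmetic operators, which behave the same way in both cases.

```python
def f_positive_effect(y0, y1, x):
    """Estimand P(Y(0) < Y(1)): the treatment strictly increases the outcome."""
    return y0 < y1


def make_effect_exceeds(threshold):
    """Build f for the estimand P(Y(1) - Y(0) > threshold).

    A closure keeps the required three-argument signature (y0, y1, x)
    while still letting the threshold vary.
    """
    def f(y0, y1, x):
        return y1 - y0 > threshold
    return f


# Hypothetical example: effect larger than half a unit
f_half = make_effect_exceeds(0.5)
```

Any of these could then be passed as the f argument, e.g. f=f_half.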

outcome : np.array | pd.Series

n-length array of outcome measurements (Y).

treatment : np.array | pd.Series

n-length array of binary treatment (W).

covariates : np.array | pd.Series

(n, p)-shaped array of covariates (X).

propensities : np.array | pd.Series

n-length array of propensity scores \(P(W=1 | X)\). If None, they are estimated from the data.

clusters : np.array | pd.Series

Optional n-length array of clusters, so clusters[i] = j indicates that observation i is in cluster j.
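
When cluster membership is recorded as arbitrary labels (for example, site names), one way to produce such an array is sketched below. The helper encode_clusters is hypothetical and not part of dualbounds. Whether non-integer labels are accepted directly is not stated here, so the sketch converts them to integer ids.

```python
def encode_clusters(labels):
    """Map hashable group labels to integer cluster ids.

    Ids are assigned 0, 1, 2, ... in order of first appearance, so
    result[i] == j means observation i belongs to cluster j.
    """
    ids = {}
    # setdefault returns the existing id, or stores and returns the next
    # unused id (the current dictionary size) for a label not seen before.
    return [ids.setdefault(label, len(ids)) for label in labels]


# Hypothetical example: four observations from three sites
cluster_ids = encode_clusters(["site_a", "site_b", "site_a", "site_c"])
```

The resulting list can be wrapped with np.asarray before being passed as the clusters argument.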

outcome_model : str | dist_reg.DistReg | list

The model for estimating the law of \(Y | X, W\). Three options:

A str identifier, e.g., 'ridge', 'lasso', 'elasticnet', 'randomforest', 'knn'.
An object inheriting from dist_reg.DistReg.
A list of dist_reg.DistReg objects to automatically choose between.

E.g., when outcome is continuous, the default is outcome_model=dist_reg.CtsDistReg(model_type='ridge').

propensity_model : str | sklearn classifier

How to estimate the propensity scores if they are not provided. Two options:

A str identifier, e.g., 'ridge', 'lasso', 'elasticnet', 'randomforest', 'knn'.
An sklearn classifier, e.g., sklearn.linear_model.LogisticRegressionCV().

model_selector : dist_reg.ModelSelector

A ModelSelector object which can choose between several outcome models. The default performs within-fold nested cross-validation. Note: this argument is ignored unless outcome_model is a list.

discrete : bool

If True, treats the outcome as a discrete variable. Defaults to None (inferred from the data).

support : np.array

Optional support of the outcome, if known and discrete. Defaults to None (inferred from the data).

support_restriction : function

Boolean-valued function of y0, y1, x, where support_restriction(y0, y1, x) = False asserts that (y0, y1, x) is not in the support of \((Y(0), Y(1), X)\). Defaults to None (no a priori support restrictions). See the user guide for important usage tips.
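
A minimal sketch of one such restriction follows, assuming the analyst is willing to impose that the treatment never lowers the outcome. The function name is hypothetical, and the assumption itself is illustrative only; it should be used only when it is credible for the application.

```python
def monotone_restriction(y0, y1, x):
    """Return False for any (y0, y1, x) with y1 < y0.

    This encodes the assumption Y(0) <= Y(1): points where the treated
    outcome is below the control outcome are declared outside the support.
    """
    return y0 <= y1
```

It would be passed as support_restriction=monotone_restriction.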

model_kwargs : dict

Additional kwargs for the outcome_model, e.g., feature_transform. See dualbounds.dist_reg.CtsDistReg or dualbounds.dist_reg.BinaryDistReg for more kwargs.

Notes

DualBounds performs limited preprocessing, e.g., creating dummies for discrete covariates. However, we recommend custom preprocessing for optimal results.

Examples

Here we compute dual bounds on \(P(Y(0) < Y(1))\) using synthetic regression data:

import dualbounds as db

# Generate synthetic data
data = db.gen_data.gen_regression_data(n=900, p=30)

# Initialize dual bounds object
dbnd = db.generic.DualBounds(
    f=lambda y0, y1, x: y0 < y1,
    covariates=data['X'],
    treatment=data['W'],
    outcome=data['y'],
    propensities=data['pis'],
    outcome_model='ridge',
)

# Compute dual bounds and observe output
dbnd.fit(alpha=0.05).summary()

Methods

`compute_dual_variables`([y0_dists, y0_vals, ...])	Estimates dual variables using the outcome model.
`cross_fit`([nfolds, suppress_warning, ...])	Cross-fits the outcome model.
`diagnostics`([plot, aipw])	Reports a set of technical diagnostics.
`eval_outcome_model`()	Thinly wraps `dist_reg._evaluate_model_predictions`.
`eval_treatment_model`()	Thinly wraps `dist_reg._evaluate_model_predictions`.
`fit`([nfolds, aipw, alpha, y0_dists, ...])	Main function which (1) performs cross-fitting, (2) computes optimal dual variables, and (3) computes final dual bounds.
`fit_propensity_scores`(nfolds[, clip, verbose])	Cross-fits the propensity scores.
`plot_dual_variables`([i])	Plots the estimated dual variables for the ith data-point.
`results`([minval, maxval])	Returns a dataframe of key inferential results.
`summary`([minval, maxval])	Prints a summary of main results from the class.