dualbounds.generic.DualBounds

class dualbounds.generic.DualBounds(f: callable, outcome: array | Series, treatment: array | Series, covariates: array | DataFrame | None = None, propensities: array | Series | None = None, clusters: array | Series | None = None, outcome_model: str | DistReg | list = 'ridge', propensity_model: str | BaseEstimator = 'ridge', model_selector: ModelSelector | None = None, discrete: array | None = None, support: array | None = None, support_restriction: callable | None = None, **model_kwargs)

Computes dual bounds on \(E[f(Y(0),Y(1), X)].\)

Here, \(X\) are covariates and \(Y(0), Y(1)\) are potential outcomes.
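To see why such an estimand is only partially identified, consider the special case of binary outcomes and \(f = 1\{Y(0) < Y(1)\}\). The marginals \(p_0 = P(Y(0)=1)\) and \(p_1 = P(Y(1)=1)\) can be learned from the data, but the joint law of \((Y(0), Y(1))\) cannot, since both potential outcomes are never observed for the same unit. The sketch below computes the classical Fréchet-Hoeffding bounds for this case. It is a standalone illustration with a hypothetical helper name, and it is not how DualBounds computes its bounds.

```python
def frechet_bounds_binary(p0, p1):
    """Bounds on P(Y(0) < Y(1)) for binary potential outcomes.

    p0 = P(Y(0) = 1) and p1 = P(Y(1) = 1) are the identified marginals.
    For binary outcomes, {Y(0) < Y(1)} is the event {Y(0) = 0, Y(1) = 1}.
    """
    if not (0.0 <= p0 <= 1.0 and 0.0 <= p1 <= 1.0):
        raise ValueError("p0 and p1 must be probabilities in [0, 1]")
    # Frechet-Hoeffding bounds on P(A and B) with P(A) = p1, P(B) = 1 - p0
    lower = max(0.0, p1 - p0)
    upper = min(p1, 1.0 - p0)
    return lower, upper


lower, upper = frechet_bounds_binary(p0=0.25, p1=0.5)
print(lower, upper)  # 0.25 0.5
```

Every value between the lower and upper bound is compatible with the same marginals, which is why the class reports bounds rather than a point estimate.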

Parameters:
f : function

Function which defines the partially identified estimand. Must be a function of three arguments: y0, y1, x (in that order). E.g., f = lambda y0, y1, x : y0 <= y1
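As a brief illustration of the required three-argument signature, here are two estimand functions written as plain Python functions. The names `f_positive_effect` and `make_threshold_estimand` are hypothetical and not part of the library. Both ignore x and use only arithmetic and comparison operators, so they should also work elementwise on NumPy arrays; whether DualBounds calls f on scalars or on arrays is not stated in this section.

```python
def f_positive_effect(y0, y1, x):
    """Indicator that the treated outcome strictly exceeds the control outcome."""
    return y0 < y1


def make_threshold_estimand(threshold):
    """Build f(y0, y1, x) = 1{y1 - y0 > threshold}.

    A closure keeps the returned function at exactly three arguments.
    """
    def f(y0, y1, x):
        return (y1 - y0) > threshold
    return f


f_large_effect = make_threshold_estimand(1.0)
print(f_positive_effect(0.0, 1.0, None))  # True
print(f_large_effect(0.0, 0.5, None))     # False
```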

outcome : np.array | pd.Series

n-length array of outcome measurements (Y).

treatment : np.array | pd.Series

n-length array of binary treatment (W).

covariates : np.array | pd.DataFrame

(n, p)-shaped array of covariates (X).

propensities : np.array | pd.Series

n-length array of propensity scores \(P(W=1 | X)\). If None, will be estimated from the data.

clusters : np.array | pd.Series

Optional n-length array of clusters, so clusters[i] = j indicates that observation i is in cluster j.
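If the grouping variable is stored as arbitrary identifiers (for example household or site names), one way to obtain an array in the clusters[i] = j form is to map each distinct identifier to an integer label. The helper below is a hypothetical sketch; whether DualBounds requires integer labels is an assumption, not something this section states.

```python
def encode_clusters(group_ids):
    """Map arbitrary group identifiers to integer cluster labels.

    Labels are assigned in order of first appearance, so
    clusters[i] == j means observation i belongs to the j-th distinct group.
    """
    labels = {}
    clusters = []
    for g in group_ids:
        if g not in labels:
            labels[g] = len(labels)
        clusters.append(labels[g])
    return clusters


clusters = encode_clusters(["hh_a", "hh_b", "hh_a", "hh_c", "hh_b"])
print(clusters)  # [0, 1, 0, 2, 1]
```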

outcome_model : str | dist_reg.DistReg | list

The model for estimating the law of \(Y | X, W\). Three options:

  • A str identifier, e.g., 'ridge', 'lasso', 'elasticnet', 'randomforest', 'knn'.

  • An object inheriting from dist_reg.DistReg.

  • A list of dist_reg.DistReg objects to automatically choose between.

E.g., when outcome is continuous, the default is outcome_model=dist_reg.CtsDistReg(model_type='ridge').

propensity_model : str | sklearn classifier

How to estimate the propensity scores if they are not provided. Two options:

  • A str identifier, e.g., 'ridge', 'lasso', 'elasticnet', 'randomforest', 'knn'.

  • An sklearn classifier, e.g., sklearn.linear_model.LogisticRegressionCV().

model_selector : dist_reg.ModelSelector

A ModelSelector object which can choose between several outcome models. The default performs within-fold nested cross-validation. Note: this argument is ignored unless outcome_model is a list.

discrete : bool

If True, treats the outcome as a discrete variable. Defaults to None (inferred from the data).

support : np.array

Optional support of the outcome, if known and discrete. Defaults to None (inferred from the data).
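When the outcome is discrete and its possible values are known, the support can be written down by hand. When only the observed values are available, one simple way to build a candidate support is to take the sorted distinct values, as sketched below. This is an illustration only and is not necessarily how the library infers the support when support=None.

```python
# Hypothetical discrete outcome measurements
outcome = [3, 1, 2, 3, 1, 1]

# Sorted distinct observed values as a candidate support
support = sorted(set(outcome))
print(support)  # [1, 2, 3]
```

Note that values which are possible but never observed in the sample would be missing from a support built this way.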

support_restriction : function

Boolean-valued function of y0, y1, x where support_restriction(y0, y1, x) = False asserts that y0, y1, x is not in the support of \(Y(0), Y(1), X\). Defaults to None (no a-priori support restrictions). See the user guide for important usage tips.
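For instance, a monotonicity assumption (treatment never lowers the outcome) can be written as a restriction that returns False whenever y0 > y1. The function name below is hypothetical, and the sketch only shows the expected calling convention; consult the user guide before relying on a restriction in practice.

```python
def monotone_restriction(y0, y1, x):
    """Return True only for (y0, y1, x) allowed under monotonicity.

    False asserts that the combination is outside the support of
    (Y(0), Y(1), X), here whenever the treated outcome is below control.
    """
    return y0 <= y1


print(monotone_restriction(0.0, 1.0, None))  # True
print(monotone_restriction(1.0, 0.0, None))  # False
```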

model_kwargs : dict

Additional kwargs for the outcome_model, e.g., feature_transform. See dualbounds.dist_reg.CtsDistReg or dualbounds.dist_reg.BinaryDistReg for more kwargs.

Notes

DualBounds will do limited preprocessing to (e.g.) create dummies for discrete covariates. However, we recommend doing custom preprocessing for optimal results.

Examples

Here we use DualBounds to bound \(P(Y(0) < Y(1))\) using synthetic regression data:

import dualbounds as db

# Generate synthetic data
data = db.gen_data.gen_regression_data(n=900, p=30)

# Initialize dual bounds object
dbnd = db.generic.DualBounds(
    f=lambda y0, y1, x: y0 < y1,
    covariates=data['X'],
    treatment=data['W'],
    outcome=data['y'],
    propensities=data['pis'],
    outcome_model='ridge',
)

# Compute dual bounds and observe output
dbnd.fit(alpha=0.05).summary()

Methods

compute_dual_variables([y0_dists, y0_vals, ...])

Estimates dual variables using the outcome model.

cross_fit([nfolds, suppress_warning, ...])

Cross-fits the outcome model.

diagnostics([plot, aipw])

Reports a set of technical diagnostics.

eval_outcome_model()

Thinly wraps dist_reg._evaluate_model_predictions.

eval_treatment_model()

Thinly wraps dist_reg._evaluate_model_predictions.

fit([nfolds, aipw, alpha, y0_dists, ...])

Main function which (1) performs cross-fitting, (2) computes optimal dual variables, and (3) computes final dual bounds.

fit_propensity_scores(nfolds[, clip, verbose])

Cross-fits the propensity scores.

plot_dual_variables([i])

Plots the estimated dual variables for the ith data-point.

results([minval, maxval])

Returns a dataframe of key inferential results.

summary([minval, maxval])

Prints a summary of main results from the class.
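Several of the methods above (cross_fit, fit, fit_propensity_scores) take an nfolds argument. As a rough picture of what cross-fitting means in general: the data are split into folds, and predictions for each fold come from a model trained only on the remaining folds. The sketch below builds such index splits in plain Python. It is a generic illustration with a hypothetical helper name, uses contiguous folds without shuffling, and is not the library's implementation.

```python
def kfold_indices(n, nfolds):
    """Split range(n) into nfolds contiguous (train, test) index pairs."""
    if not 2 <= nfolds <= n:
        raise ValueError("nfolds must be between 2 and n")
    base, extra = divmod(n, nfolds)
    folds = []
    start = 0
    for k in range(nfolds):
        # The first `extra` folds receive one additional observation
        size = base + (1 if k < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


folds = kfold_indices(n=7, nfolds=3)
for train, test in folds:
    # A model would be fit on `train` and used to predict on `test`
    print(train, test)
```

Each observation appears in exactly one test fold, so every prediction is made by a model that never saw that observation during training.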