dualbounds.generic.DualBounds¶
-
class dualbounds.generic.DualBounds(f: callable, outcome: array | Series, treatment: array | Series, covariates: array | DataFrame | None = None, propensities: array | Series | None = None, clusters: array | Series | None = None, outcome_model: str | DistReg | list = 'ridge', propensity_model: str | BaseEstimator = 'ridge', model_selector: ModelSelector | None = None, discrete: array | None = None, support: array | None = None, support_restriction: callable | None = None, **model_kwargs)¶
Computes dual bounds on \(E[f(Y(0), Y(1), X)]\).
Here, \(X\) are covariates and \(Y(0), Y(1)\) are potential outcomes.
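Only bounds, rather than a point estimate, are available because \(Y(0)\) and \(Y(1)\) are never observed together for the same unit: the data determine each marginal law but not the joint law. The sketch below is not how DualBounds computes its bounds. It is a minimal, covariate-free illustration for binary outcomes with \(f(y_0, y_1, x) = 1\{y_0 < y_1\}\), using the classical Fréchet-Hoeffding inequalities. The function name frechet_bounds_binary and its arguments are hypothetical.

```python
def frechet_bounds_binary(p0, p1):
    """Sharp bounds on P(Y(0) < Y(1)) for binary potential outcomes.

    p0 = P(Y(0) = 1) and p1 = P(Y(1) = 1) are the two marginals.
    For binary outcomes, {Y(0) < Y(1)} is the event {Y(0) = 0, Y(1) = 1}.
    """
    if not (0.0 <= p0 <= 1.0 and 0.0 <= p1 <= 1.0):
        raise ValueError("p0 and p1 must be probabilities in [0, 1]")
    # Frechet-Hoeffding: max(0, P(A) + P(B) - 1) <= P(A and B) <= min(P(A), P(B)),
    # applied with A = {Y(0) = 0} and B = {Y(1) = 1}.
    lower = max(0.0, (1.0 - p0) + p1 - 1.0)
    upper = min(1.0 - p0, p1)
    return lower, upper


lower, upper = frechet_bounds_binary(p0=0.25, p1=0.75)
print(lower, upper)
```

Even when both marginals are known exactly, the interval generally has positive width. For example, equal marginals p0 = p1 = 0.5 are compatible with any value of \(P(Y(0) < Y(1))\) between 0 and 0.5.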
- Parameters:¶
- f : function¶
Function which defines the partially identified estimand. Must be a function of three arguments: y0, y1, x (in that order). E.g.,
f = lambda y0, y1, x : y0 <= y1
- outcome : np.array | pd.Series¶
n-length array of outcome measurements (Y).
- treatment : np.array | pd.Series¶
n-length array of binary treatment (W).
- covariates : np.array | pd.DataFrame¶
(n, p)-shaped array of covariates (X).
- propensities : np.array | pd.Series¶
n-length array of propensity scores \(P(W=1 | X)\). If None, will be estimated from the data.
- clusters : np.array | pd.Series¶
Optional n-length array of clusters, so clusters[i] = j indicates that observation i is in cluster j.
- outcome_model : str | dist_reg.DistReg | list¶
The model for estimating the law of \(Y | X, W\). Three options:
A str identifier, e.g., ‘ridge’, ‘lasso’, ‘elasticnet’, ‘randomforest’, ‘knn’.
An object inheriting from dist_reg.DistReg.
A list of dist_reg.DistReg objects to automatically choose between.
E.g., when outcome is continuous, the default is outcome_model=dist_reg.CtsDistReg(model_type='ridge').
- propensity_model : str | sklearn classifier¶
How to estimate the propensity scores if they are not provided. Two options:
A str identifier, e.g., ‘ridge’, ‘lasso’, ‘elasticnet’, ‘randomforest’, ‘knn’.
An sklearn classifier, e.g., sklearn.linear_model.LogisticRegressionCV().
- model_selector : dist_reg.ModelSelector¶
A ModelSelector object which can choose between several outcome models. The default performs within-fold nested cross-validation. Note: this argument is ignored unless outcome_model is a list.
- discrete : bool¶
If True, treats the outcome as a discrete variable. Defaults to None (inferred from the data).
- support : np.array¶
Optional support of the outcome, if known and discrete. Defaults to None (inferred from the data).
- support_restriction : function¶
Boolean-valued function of y0, y1, x where support_restriction(y0, y1, x) = False asserts that (y0, y1, x) is not in the support of \((Y(0), Y(1), X)\). Defaults to None (no a priori support restrictions). See the user guide for important usage tips.
- model_kwargs : dict¶
Additional kwargs for the outcome_model, e.g., feature_transform. See dualbounds.dist_reg.CtsDistReg or dualbounds.dist_reg.BinaryDistReg for more kwargs.
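As an illustration of the f and support_restriction parameters, the sketch below defines two plain Python callables with the required (y0, y1, x) argument order. The names f_positive_effect and monotone_restriction are hypothetical, and the monotonicity assumption (treatment never lowers the outcome) is only one example of a restriction an analyst might impose; whether it is appropriate depends on the application.

```python
def f_positive_effect(y0, y1, x):
    # Integrand of the estimand: True when the treated outcome strictly
    # exceeds the control outcome, so E[f(Y(0), Y(1), X)] = P(Y(0) < Y(1)).
    return y0 < y1


def monotone_restriction(y0, y1, x):
    # Returns False for points assumed to lie outside the support of
    # (Y(0), Y(1), X): here, every point where treatment lowers the outcome.
    return y0 <= y1


# Both callables take y0, y1, x in that order; x is unused in this sketch.
print(f_positive_effect(0.0, 1.5, None))
print(monotone_restriction(2.0, 1.0, None))
```

Callables like these would be passed to the constructor as f=f_positive_effect and support_restriction=monotone_restriction.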
Notes
DualBounds will do limited preprocessing to (e.g.) create dummies for discrete covariates. However, we recommend doing custom preprocessing for optimal results.
Examples
Here we fit DualBounds on \(P(Y(0) < Y(1))\) based on synthetic regression data:
import dualbounds as db

# Generate synthetic data
data = db.gen_data.gen_regression_data(n=900, p=30)

# Initialize dual bounds object
dbnd = db.generic.DualBounds(
    f=lambda y0, y1, x: y0 < y1,
    covariates=data['X'],
    treatment=data['W'],
    outcome=data['y'],
    propensities=data['pis'],
    outcome_model='ridge',
)

# Compute dual bounds and observe output
dbnd.fit(alpha=0.05).summary()

Methods
compute_dual_variables([y0_dists, y0_vals, ...]): Estimates dual variables using the outcome model.
cross_fit([nfolds, suppress_warning, ...]): Cross-fits the outcome model.
diagnostics([plot, aipw]): Reports a set of technical diagnostics.
Thinly wraps dist_reg._evaluate_model_predictions.
Thinly wraps dist_reg._evaluate_model_predictions.
fit([nfolds, aipw, alpha, y0_dists, ...]): Main function which (1) performs cross-fitting, (2) computes optimal dual variables, and (3) computes final dual bounds.
fit_propensity_scores(nfolds[, clip, verbose]): Cross-fits the propensity scores.
plot_dual_variables([i]): Plots the estimated dual variables for the ith data-point.
results([minval, maxval]): Returns a dataframe of key inferential results.
summary([minval, maxval]): Prints a summary of main results from the class.
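The cross_fit and fit entries refer to cross-fitting. As a generic illustration of that idea (not the library's implementation), the sketch below splits the observations into folds, fits a model on all other folds, and predicts only on the held-out fold, so that no observation's prediction uses its own outcome. The helper names and the deliberately trivial model (an out-of-fold mean) are hypothetical.

```python
import random


def make_folds(n, nfolds, seed=0):
    """Randomly partition indices 0..n-1 into nfolds roughly equal folds."""
    if not 2 <= nfolds <= n:
        raise ValueError("nfolds must be between 2 and n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[k::nfolds] for k in range(nfolds)]


def cross_fit_mean(y, nfolds=5, seed=0):
    """Out-of-fold predictions from a toy model that predicts the training mean."""
    n = len(y)
    preds = [None] * n
    for held_out in make_folds(n, nfolds, seed):
        held_out_set = set(held_out)
        # Fit on every observation outside the held-out fold.
        train = [y[i] for i in range(n) if i not in held_out_set]
        fitted_mean = sum(train) / len(train)
        # Predict only for the held-out observations.
        for i in held_out:
            preds[i] = fitted_mean
    return preds


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
preds = cross_fit_mean(y, nfolds=3)
print(preds)
```

In practice the toy mean would be replaced by a real outcome model; the point of the sketch is only the fold structure, in which each prediction comes from a model that never saw that observation.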