dualbounds.dist_reg.CtsDistReg

class dualbounds.dist_reg.CtsDistReg(model_type: str | BaseEstimator = 'ridge', how_transform: str = 'interactions', eps_dist: str = 'empirical', eps_kwargs: dict | None = None, heterosked_model: str = 'none', heterosked_kwargs: dict | None = None, **model_kwargs)

Distributional regression for continuous outcomes.

Parameters:
model_type : str or sklearn class

Str specifying a sklearn model class to use; options include ‘ridge’, ‘lasso’, ‘elasticnet’, ‘randomforest’, ‘knn’. One can also directly pass an sklearn class, e.g., model_type=sklearn.neighbors.KNeighborsRegressor.

how_transform : str

Str specifying how to transform the features before fitting the underlying model. One of several options:

  • ‘identity’: does not transform the features

  • ‘intercept’: adds an intercept

  • ‘interactions’: adds treatment-covariate interactions

The default is ‘interactions’.
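The three options can be pictured with a small NumPy sketch. This is an illustration only, not the dualbounds implementation: the helper name transform_features, the column ordering, and the choice to keep the intercept column under ‘interactions’ are all assumptions made for the example.

```python
import numpy as np

def transform_features(W, X, how_transform="interactions"):
    """Hypothetical stand-in for a feature transform (not the library's code)."""
    W = np.asarray(W, dtype=float).reshape(-1, 1)  # treatment as a column
    X = np.asarray(X, dtype=float)
    if how_transform == "identity":
        cols = [W, X]                        # features passed through as-is
    elif how_transform == "intercept":
        cols = [np.ones_like(W), W, X]       # prepend a column of ones
    elif how_transform == "interactions":
        # Assumption: intercept, main effects, then W * X interaction columns
        cols = [np.ones_like(W), W, X, W * X]
    else:
        raise ValueError(f"Unknown how_transform: {how_transform!r}")
    return np.hstack(cols)

rng = np.random.default_rng(0)
W = rng.binomial(1, 0.5, size=5)
X = rng.standard_normal((5, 3))
features = transform_features(W, X)  # 1 + 1 + 3 + 3 = 8 columns
```

With interaction columns, a linear base model can fit a different covariate slope in each treatment arm, which is presumably why this option is the default.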

eps_dist : str

Str specifying the distribution of the residuals. Options include [‘empirical’, ‘gaussian’, ‘laplace’, ‘expon’, ‘tdist’, ‘skewnorm’]. Defaults to ‘empirical’, which uses the empirical law of the residuals of the training data.
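To make the difference between the empirical law and a parametric law concrete, here is a minimal NumPy sketch. It does not call dualbounds, and the data-generating process, variable names, and plain least-squares fit are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
y = 2.0 * x + rng.laplace(size=n)  # true noise is Laplace, not Gaussian

# Fit a simple mean model by least squares and compute training residuals
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef

# Predicted mean at a new covariate value
x_new = 1.0
mu_new = coef[0] + coef[1] * x_new

# Empirical law: the predictive distribution is the predicted mean
# plus a residual drawn uniformly from the training residuals
draws_empirical = mu_new + rng.choice(resid, size=1000, replace=True)

# Gaussian law: the predictive distribution is N(mu_new, sigma^2),
# with sigma estimated from the same residuals
sigma = resid.std(ddof=design.shape[1])
draws_gaussian = mu_new + sigma * rng.standard_normal(1000)
```

The empirical option makes no shape assumption about the noise, whereas a parametric option reduces the residual law to a few estimated parameters.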

eps_kwargs : dict

Keyword arguments passed to utilities.parse_dist, which constructs the scipy distribution used for the residuals.

heterosked_model : str or sklearn class

Str specifying a sklearn model class to use to estimate Var(Y | X) as a function of X. Options are the same as for model_type. Defaults to heterosked_model=‘none’, in which case homoskedasticity is assumed (although the final bounds will still be valid in the presence of heteroskedasticity).
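One common way to estimate Var(Y | X) is a two-stage fit: fit the mean model, then regress the squared residuals on the features. The NumPy sketch below illustrates that idea under stated assumptions; it is not necessarily what dualbounds does internally, and the simulated data, the quadratic variance design, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 2.0, size=n)
# Heteroskedastic noise: the standard deviation grows with x
y = 1.0 + 3.0 * x + (0.5 + x) * rng.standard_normal(n)

# Stage 1: fit the conditional mean by least squares
design = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ beta

# Stage 2: regress squared residuals on features to estimate Var(Y | X)
var_design = np.column_stack([np.ones(n), x, x ** 2])
gamma, *_ = np.linalg.lstsq(var_design, resid ** 2, rcond=None)
var_hat = np.clip(var_design @ gamma, 1e-8, None)  # keep variances positive

# Rescaled residuals should be closer to identically distributed
scale_hat = np.sqrt(var_hat)
standardized = resid / scale_hat
```

Under homoskedasticity the second stage is skipped and a single residual scale is used for every X.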

heterosked_kwargs : dict

Keyword arguments for the heteroskedasticity model. E.g., if heterosked_model=‘knn’, heterosked_kwargs could include n_neighbors.

**model_kwargs : dict

Keyword arguments for the sklearn base model. E.g., if model_type=‘knn’, model_kwargs could include n_neighbors.

Examples

Here we instantiate a model that assumes Gaussian residuals, uses ridge regression (RidgeCV) to make predictions, and uses a lasso (LassoCV) to estimate the heteroskedasticity pattern:

import numpy as np
import dualbounds
import sklearn.linear_model

# Instantiate dist_reg
cdreg = dualbounds.dist_reg.CtsDistReg(
        # Arguments for main model
        model_type=sklearn.linear_model.RidgeCV,
        fit_intercept=True,
        gcv_mode='auto',
        # How to estimate the law of the residuals
        eps_dist='gaussian',
        # How to estimate Var(Y | X)
        heterosked_model=sklearn.linear_model.LassoCV,
        heterosked_kwargs=dict(cv=5),
)

# Fit
n, p = 300, 20
W = np.random.binomial(1, 0.5, n)
X = np.random.randn(n, p)
y = np.random.randn(n)
cdreg.fit(W=W, X=X, y=y)

# Predict on new X
m = 10
Xnew = np.random.randn(m, p)
y0_preds = cdreg.predict(X=Xnew, W=np.zeros(m))

Methods

feature_transform(W, X[, Z])

Transforms the features before feeding them to the base model.

features_to_WX(features)

Inverse of feature_transform.

fit(W, X, y[, Z, sample_weight])

Fits model on the data.

predict(X, W[, Z])

Predicts the conditional law of the outcome.

predict_counterfactuals(X)

Predicts counterfactual distributions of Y (outcome).