dualbounds.gen_data.gen_regression_data¶
- dualbounds.gen_data.gen_regression_data(n: int, p: int, lmda_dist: str = 'constant', eps_dist: str = 'gaussian', heterosked: str = 'constant', tauv: float = 1, r2: float = 0.95, sparsity: float = 0, interactions: bool = True, tau: float = 3, betaW_norm: float = 0, covmethod: str = 'identity', dgp_seed: int = 1, sample_seed: int | None = None)¶
Samples a synthetic regression dataset.
- Parameters:¶
- n : int¶
Number of observations.
- p : int¶
Number of covariates.
- lmda_dist : str¶
str specifying the distribution of lmda_i, where X_i = lmda_i * N(0, Sigma), so the covariates are elliptically distributed.
- eps_dist : str¶
str specifying the distribution of the residuals. See utilities.parse_dist for the list of options.
- heterosked : str¶
str specifying the type of heteroskedasticity. Defaults to constant.
- tauv : float¶
Ratio Var(Y(1) | X) / Var(Y(0) | X) of the conditional variances of the treated and control potential outcomes.
- r2 : float¶
Population r^2, defined as 1 - E[Var(Y | X)] / Var(Y).
- sparsity : float¶
Proportion of covariates with zero coefficients. Defaults to zero (no sparsity).
- interactions : bool¶
If True (default), Y = X beta + W * X * beta_int + epsilon. Otherwise, the interactions between the treatment and the covariates are omitted.
- tau : float¶
Average treatment effect.
- betaW_norm : float¶
E[W | X] = logistic(X @ betaW). This parameter controls the norm of betaW and thus the variance of the propensity scores.
- covmethod : str¶
str identifier for how to generate the covariance matrix.
- dgp_seed : int¶
Random seed for the data-generating parameters.
- sample_seed : int | None¶
Random seed for the sampling randomness. Defaults to None.
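The model described by these parameters can be sketched in plain NumPy. This is an illustration under stated assumptions, not the dualbounds implementation: the function name `sketch_regression_data` is hypothetical, lmda_i is fixed at 1 with Sigma = I, the coefficient scalings are arbitrary, the average treatment effect enters as an additive `tau * W` term, and the noise scale is calibrated to `r2` only approximately, on the sampled signal.

```python
import numpy as np


def sketch_regression_data(n, p, tau=3.0, tauv=1.0, r2=0.95, betaW_norm=0.0, seed=0):
    """Illustrative stand-in for the documented model; NOT the library code."""
    rng = np.random.default_rng(seed)

    # Data-generating parameters (hypothetical scalings).
    beta = rng.standard_normal(p) / np.sqrt(p)
    beta_int = rng.standard_normal(p) / np.sqrt(p)
    betaW = rng.standard_normal(p)
    betaW *= betaW_norm / np.linalg.norm(betaW)  # rescale to the requested norm

    # Elliptical covariates: X_i = lmda_i * N(0, Sigma), here lmda_i = 1, Sigma = I.
    lmda = np.ones(n)
    X = lmda[:, None] * rng.standard_normal((n, p))

    # Propensity scores: E[W | X] = logistic(X @ betaW).
    pis = 1.0 / (1.0 + np.exp(-(X @ betaW)))
    W = rng.binomial(1, pis)

    # Signal with treatment-covariate interactions (assumed additive tau * W).
    mu = X @ beta + W * (tau + X @ beta_int)

    # Control-arm noise scale chosen so that signal variance / total variance
    # is roughly r2; treated-arm noise variance is tauv times larger.
    sigma0 = np.sqrt(np.var(mu) * (1.0 - r2) / r2)
    scale = np.where(W == 1, np.sqrt(tauv) * sigma0, sigma0)
    y = mu + scale * rng.standard_normal(n)

    return {"X": X, "y": y, "W": W, "pis": pis,
            "beta": beta, "beta_int": beta_int, "betaW": betaW}


data = sketch_regression_data(n=500, p=5)
print(data["X"].shape, data["y"].shape, data["W"].mean())
```

With `betaW_norm=0`, `betaW` is the zero vector, so every propensity score equals 0.5 and treatment is assigned by a fair coin flip; increasing the norm spreads the propensity scores toward 0 and 1.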
- Returns:¶
data – Dictionary with keys X (covariates), y (response), W (treatment), pis (true propensity scores), beta, beta_int, betaW, and more.
- Return type:¶
dict
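A minimal usage sketch follows, assuming the dualbounds package is installed and exposes the signature documented above. The helper `describe_dataset` is hypothetical and only reports the shape of each returned entry; the import is guarded so the snippet still runs when the package is absent.

```python
def describe_dataset(data):
    """Map each key to the shape of its value (None for values without a shape)."""
    return {key: getattr(val, "shape", None) for key, val in data.items()}


try:
    from dualbounds.gen_data import gen_regression_data
except ImportError:
    gen_regression_data = None  # dualbounds is not installed

if gen_regression_data is not None:
    # dgp_seed fixes the data-generating parameters;
    # sample_seed fixes the sampled observations.
    data = gen_regression_data(n=200, p=5, tau=3, r2=0.95, dgp_seed=1, sample_seed=123)
    print(describe_dataset(data))
```

Keeping `dgp_seed` fixed while varying `sample_seed` is one way to draw repeated datasets from the same data-generating parameters.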