skbonus.linear_model package

Module contents

Special linear regressors.

class skbonus.linear_model.ImbalancedLinearRegression(alpha: float = 0.0, l1_ratio: float = 0.0, fit_intercept: bool = True, copy_X: bool = True, positive: bool = False, overestimation_punishment_factor: float = 1.0)

Bases: skbonus.linear_model._scipy_regressors.BaseScipyMinimizeRegressor

Linear regression where overestimating is overestimation_punishment_factor times worse than underestimating.

A value of overestimation_punishment_factor=5 implies that overestimations by the model are penalized with a factor of 5, while underestimations keep the default factor of 1. The loss that is minimized is

1 / (2 * n_samples) * switch * ||y - Xw||_2 ** 2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2 ** 2

where switch is a vector with value overestimation_punishment_factor if y - Xw < 0, else 1.

ImbalancedLinearRegression fits a linear model to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation. Compared to normal linear regression, this approach allows for a different treatment of over- and underestimations.
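
For intuition, here is a minimal NumPy sketch of the unpenalized part of this loss; the function name and signature are illustrative only and not part of the package:

import numpy as np

def imbalanced_squared_loss(y, X, w, overestimation_punishment_factor=1.0):
    # Residuals are negative exactly when the model overestimates (Xw > y).
    residuals = y - X @ w
    switch = np.where(residuals < 0, overestimation_punishment_factor, 1.0)
    # 1 / (2 * n_samples) * sum(switch * (y - Xw) ** 2), without the penalty terms.
    return np.mean(switch * residuals ** 2) / 2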

Parameters
  • alpha (float, default=0.0) – Constant that multiplies the penalty terms.

  • l1_ratio (float, default=0.0) – The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.

  • positive (bool, default=False) – When set to True, forces the coefficients to be positive.

  • overestimation_punishment_factor (float, default=1) – Factor to punish overestimations more (if the value is larger than 1) or less (if the value is between 0 and 1).

coef_

Estimated coefficients of the model.

Type

np.ndarray of shape (n_features,)

intercept_

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

Type

float

Notes

This implementation uses scipy.optimize.minimize, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.
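
For illustration only, a loss of this form can be minimized with scipy.optimize.minimize along the following lines; this is a sketch of the general pattern, not the actual implementation of the class:

import numpy as np
from scipy.optimize import minimize

np.random.seed(0)
X = np.random.randn(100, 4)
y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + 2 * np.random.randn(100)

def loss(w, factor=5.0):
    # Imbalanced squared loss: overestimations (negative residuals) are weighted by factor.
    residuals = y - X @ w
    switch = np.where(residuals < 0, factor, 1.0)
    return np.mean(switch * residuals ** 2) / 2

# Minimize the imbalanced loss starting from the zero vector.
result = minimize(loss, x0=np.zeros(X.shape[1]))
coefficients = result.x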

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([1, 2, 3, 4]) + 2*np.random.randn(100)
>>> over_bad = ImbalancedLinearRegression(overestimation_punishment_factor=50).fit(X, y)
>>> over_bad.coef_
array([0.36267036, 1.39526844, 3.4247146 , 3.93679175])
>>> under_bad = ImbalancedLinearRegression(overestimation_punishment_factor=0.01).fit(X, y)
>>> under_bad.coef_
array([0.73519586, 1.28698197, 2.61362614, 4.35989806])
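
Continuing the example, the effect of the punishment factor can be inspected through the signs of the residuals, using only the documented coef_ and intercept_ attributes; a large factor should make overestimations rare, a small factor should make them common:

>>> share_high_factor = np.mean(y - (X @ over_bad.coef_ + over_bad.intercept_) < 0)  # expected to be small
>>> share_low_factor = np.mean(y - (X @ under_bad.coef_ + under_bad.intercept_) < 0)  # expected to be large
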
class skbonus.linear_model.LADRegression(alpha: float = 0.0, l1_ratio: float = 0.0, fit_intercept: bool = True, copy_X: bool = True, positive: bool = False)

Bases: skbonus.linear_model._scipy_regressors.BaseScipyMinimizeRegressor

Least absolute deviation regression.

LADRegression fits a linear model to minimize the residual sum of absolute deviations between the observed targets in the dataset, and the targets predicted by the linear approximation, i.e.

1 / (2 * n_samples) * ||y - Xw||_1 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2 ** 2

Compared to linear regression, this approach is robust to outliers. You can even optimize for the lowest MAPE (mean absolute percentage error) if you pass np.abs(1/y_train) as the sample_weight keyword when fitting the regressor.
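
A sketch of that trick (variable names are illustrative; the targets are shifted away from zero so that the weights np.abs(1/y_train) stay bounded):

>>> import numpy as np
>>> np.random.seed(0)
>>> X_train = np.random.randn(100, 4)
>>> y_train = X_train @ np.array([1, 2, 3, 4]) + 20  # keep targets away from zero
>>> mape_model = LADRegression().fit(X_train, y_train, sample_weight=np.abs(1 / y_train))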

Parameters
  • alpha (float, default=0.0) – Constant that multiplies the penalty terms.

  • l1_ratio (float, default=0.0) – The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.

  • positive (bool, default=False) – When set to True, forces the coefficients to be positive.

coef_

Estimated coefficients of the model.

Type

np.ndarray of shape (n_features,)

intercept_

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

Type

float

Notes

This implementation uses scipy.optimize.minimize, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([1, 2, 3, 4])
>>> l = LADRegression().fit(X, y)
>>> l.coef_
array([1., 2., 3., 4.])
>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([-1, 2, -3, 4])
>>> l = LADRegression(positive=True).fit(X, y)
>>> l.coef_
array([8.44480086e-17, 1.42423304e+00, 1.97135192e-16, 4.29789588e+00])
class skbonus.linear_model.QuantileRegression(alpha: float = 0.0, l1_ratio: float = 0.0, fit_intercept: bool = True, copy_X: bool = True, positive: bool = False, quantile: float = 0.5)

Bases: skbonus.linear_model._scipy_regressors.BaseScipyMinimizeRegressor

Quantile regression. This can be used to compute confidence intervals for linear regressions.

QuantileRegression fits a linear model to minimize a weighted residual sum of absolute deviations between the observed targets in the dataset and the targets predicted by the linear approximation, i.e.

1 / (2 * n_samples) * switch * ||y - Xw||_1 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2 ** 2

where switch is a vector with value quantile if y - Xw > 0 (i.e. for underestimations), else 1 - quantile. For the default value of quantile=0.5, the regressor reduces to LADRegression.
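
A minimal NumPy sketch of the unpenalized part of this loss, following the switch definition above (the function name is illustrative and not part of the package):

import numpy as np

def quantile_loss(y, X, w, quantile=0.5):
    residuals = y - X @ w
    # Underestimations (y - Xw > 0) are weighted by quantile, overestimations by 1 - quantile.
    switch = np.where(residuals > 0, quantile, 1 - quantile)
    # 1 / (2 * n_samples) * sum(switch * |y - Xw|), without the penalty terms.
    return np.mean(switch * np.abs(residuals)) / 2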

Compared to linear regression, this approach is robust to outliers.

Parameters
  • alpha (float, default=0.0) – Constant that multiplies the penalty terms.

  • l1_ratio (float, default=0.0) – The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.

  • positive (bool, default=False) – When set to True, forces the coefficients to be positive.

  • quantile (float, between 0 and 1, default=0.5) – The line output by the model will have approximately a share of quantile of the data points under it. For example, quantile=1 outputs a line that lies above each data point, and quantile=0.5 corresponds to LADRegression.

coef_

Estimated coefficients of the model.

Type

np.ndarray of shape (n_features,)

intercept_

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

Type

float

Notes

This implementation uses scipy.optimize.minimize, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([1, 2, 3, 4])
>>> l = QuantileRegression().fit(X, y)
>>> l.coef_
array([1., 2., 3., 4.])
>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([-1, 2, -3, 4])
>>> l = QuantileRegression(quantile=0.8).fit(X, y)
>>> l.coef_
array([-1.,  2., -3.,  4.])
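
The coverage behaviour described for the quantile parameter can be checked on noisy data, using only the documented coef_ and intercept_ attributes (the exact share varies with the sample):

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(500, 4)
>>> y = X @ np.array([1, 2, 3, 4]) + np.random.randn(500)
>>> q80 = QuantileRegression(quantile=0.8).fit(X, y)
>>> share_under = np.mean(y < X @ q80.coef_ + q80.intercept_)  # approximately 0.8, per the quantile description above
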
fit(X: numpy.ndarray, y: numpy.ndarray, sample_weight: Optional[numpy.ndarray] = None) → skbonus.linear_model._scipy_regressors.QuantileRegression

Fit the model using the SLSQP algorithm.

Parameters
  • X (np.ndarray of shape (n_samples, n_features)) – The training data.

  • y (np.ndarray, 1-dimensional) – The target values.

  • sample_weight (Optional[np.ndarray], default=None) – Individual weights for each sample.

Returns

Fitted regressor.

Return type

QuantileRegression
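
Since fit returns the fitted regressor, calls can be chained in the usual scikit-learn style, for example:

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([1, 2, 3, 4])
>>> coefficients = QuantileRegression().fit(X, y).coef_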