skbonus.linear_model package

Module contents

Special linear regressors.

class skbonus.linear_model.ImbalancedLinearRegression(alpha: float = 0.0, l1_ratio: float = 0.0, fit_intercept: bool = True, copy_X: bool = True, positive: bool = False, overestimation_punishment_factor: float = 1.0)

Bases: skbonus.linear_model._scipy_regressors.BaseScipyMinimizeRegressor

Linear regression where overestimating is overestimation_punishment_factor times worse than underestimating.

A value of overestimation_punishment_factor=5 implies that overestimations by the model are penalized with a factor of 5, while underestimations keep the default factor of 1. The loss that is minimized is

1 / (2 * n_samples) * switch * ||y - Xw||_2 ** 2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2 ** 2

where switch is a vector with value overestimation_punishment_factor if y - Xw < 0, else 1.

ImbalancedLinearRegression fits a linear model to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation. Compared to normal linear regression, this approach allows for a different treatment of over- and underestimations.
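
For intuition, here is a minimal NumPy sketch of the unpenalized part of this loss; the function name and signature are illustrative only and not part of the package:

import numpy as np

def imbalanced_squared_loss(y, X, w, overestimation_punishment_factor=1.0):
    # Residuals are negative exactly when the model overestimates (Xw > y).
    residuals = y - X @ w
    switch = np.where(residuals < 0, overestimation_punishment_factor, 1.0)
    # 1 / (2 * n_samples) * sum(switch * (y - Xw) ** 2), without the penalty terms.
    return np.mean(switch * residuals ** 2) / 2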

Parameters
  • alpha (float, default=0.0) – Constant that multiplies the penalty terms.

  • l1_ratio (float, default=0.0) – The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.

  • positive (bool, default=False) – When set to True, forces the coefficients to be positive.

  • overestimation_punishment_factor (float, default=1) – Factor to punish overestimations more (if the value is larger than 1) or less (if the value is between 0 and 1).

coef_

Estimated coefficients of the model.

Type

np.ndarray of shape (n_features,)

intercept_

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

Type

float

Notes

This implementation uses scipy.optimize.minimize, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.
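
For illustration only, a loss of this form can be minimized with scipy.optimize.minimize along the following lines; this is a sketch of the general pattern, not the actual implementation of the class:

import numpy as np
from scipy.optimize import minimize

np.random.seed(0)
X = np.random.randn(100, 4)
y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + 2 * np.random.randn(100)

def loss(w, factor=5.0):
    # Imbalanced squared loss: overestimations (negative residuals) are weighted by factor.
    residuals = y - X @ w
    switch = np.where(residuals < 0, factor, 1.0)
    return np.mean(switch * residuals ** 2) / 2

# Minimize the imbalanced loss starting from the zero vector.
result = minimize(loss, x0=np.zeros(X.shape[1]))
coefficients = result.x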

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([1, 2, 3, 4]) + 2*np.random.randn(100)
>>> over_bad = ImbalancedLinearRegression(overestimation_punishment_factor=50).fit(X, y)
>>> over_bad.coef_
array([0.36267036, 1.39526844, 3.4247146 , 3.93679175])
>>> under_bad = ImbalancedLinearRegression(overestimation_punishment_factor=0.01).fit(X, y)
>>> under_bad.coef_
array([0.73519586, 1.28698197, 2.61362614, 4.35989806])
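
Continuing the example, the effect of the punishment factor can be inspected through the signs of the residuals, using only the documented coef_ and intercept_ attributes; a large factor should make overestimations rare, a small factor should make them common:

>>> share_high_factor = np.mean(y - (X @ over_bad.coef_ + over_bad.intercept_) < 0)  # expected to be small
>>> share_low_factor = np.mean(y - (X @ under_bad.coef_ + under_bad.intercept_) < 0)  # expected to be large
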
class skbonus.linear_model.LADRegression(alpha: float = 0.0, l1_ratio: float = 0.0, fit_intercept: bool = True, copy_X: bool = True, positive: bool = False)

Bases: skbonus.linear_model._scipy_regressors.BaseScipyMinimizeRegressor

Least absolute deviation regression.

LADRegression fits a linear model to minimize the residual sum of absolute deviations between the observed targets in the dataset, and the targets predicted by the linear approximation, i.e.

1 / (2 * n_samples) * ||y - Xw||_1 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2 ** 2

Compared to linear regression, this approach is robust to outliers. You can even optimize for the lowest MAPE (mean absolute percentage error) if you pass np.abs(1/y_train) as the sample_weight keyword when fitting the regressor.
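
A sketch of that trick (variable names are illustrative; the targets are shifted away from zero so that the weights np.abs(1/y_train) stay bounded):

>>> import numpy as np
>>> np.random.seed(0)
>>> X_train = np.random.randn(100, 4)
>>> y_train = X_train @ np.array([1, 2, 3, 4]) + 20  # keep targets away from zero
>>> mape_model = LADRegression().fit(X_train, y_train, sample_weight=np.abs(1 / y_train))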

Parameters
  • alpha (float, default=0.0) – Constant that multiplies the penalty terms.

  • l1_ratio (float, default=0.0) – The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.

  • positive (bool, default=False) – When set to True, forces the coefficients to be positive.

coef_

Estimated coefficients of the model.

Type

np.ndarray of shape (n_features,)

intercept_

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

Type

float

Notes

This implementation uses scipy.optimize.minimize, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([1, 2, 3, 4])
>>> l = LADRegression().fit(X, y)
>>> l.coef_
array([1., 2., 3., 4.])
>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([-1, 2, -3, 4])
>>> l = LADRegression(positive=True).fit(X, y)
>>> l.coef_
array([8.44480086e-17, 1.42423304e+00, 1.97135192e-16, 4.29789588e+00])
class skbonus.linear_model.QuantileRegression(alpha: float = 0.0, l1_ratio: float = 0.0, fit_intercept: bool = True, copy_X: bool = True, positive: bool = False, quantile: float = 0.5)

Bases: skbonus.linear_model._scipy_regressors.BaseScipyMinimizeRegressor

Quantile regression. This can be used to compute confidence intervals for linear regressions.

QuantileRegression fits a linear model to minimize a weighted residual sum of absolute deviations between the observed targets in the dataset and the targets predicted by the linear approximation, i.e.

1 / (2 * n_samples) * switch * ||y - Xw||_1 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2 ** 2

where switch is a vector with value quantile if y - Xw > 0 (i.e. for underestimations), else 1 - quantile. For the default value of quantile=0.5, the regressor reduces to LADRegression.
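
A minimal NumPy sketch of the unpenalized part of this loss, following the switch definition above (the function name is illustrative and not part of the package):

import numpy as np

def quantile_loss(y, X, w, quantile=0.5):
    residuals = y - X @ w
    # Underestimations (y - Xw > 0) are weighted by quantile, overestimations by 1 - quantile.
    switch = np.where(residuals > 0, quantile, 1 - quantile)
    # 1 / (2 * n_samples) * sum(switch * |y - Xw|), without the penalty terms.
    return np.mean(switch * np.abs(residuals)) / 2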

Compared to linear regression, this approach is robust to outliers.

Parameters
  • alpha (float, default=0.0) – Constant that multiplies the penalty terms.

  • l1_ratio (float, default=0.0) – The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • copy_X (bool, default=True) – If True, X will be copied; else, it may be overwritten.

  • positive (bool, default=False) – When set to True, forces the coefficients to be positive.

  • quantile (float, between 0 and 1, default=0.5) – The line output by the model will have approximately a share of quantile of the data points under it. For example, quantile=1 outputs a line that lies above each data point, and quantile=0.5 corresponds to LADRegression.

coef_

Estimated coefficients of the model.

Type

np.ndarray of shape (n_features,)

intercept_

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

Type

float

Notes

This implementation uses scipy.optimize.minimize, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([1, 2, 3, 4])
>>> l = QuantileRegression().fit(X, y)
>>> l.coef_
array([1., 2., 3., 4.])
>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([-1, 2, -3, 4])
>>> l = QuantileRegression(quantile=0.8).fit(X, y)
>>> l.coef_
array([-1.,  2., -3.,  4.])
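
The coverage behaviour described for the quantile parameter can be checked on noisy data, using only the documented coef_ and intercept_ attributes (the exact share varies with the sample):

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(500, 4)
>>> y = X @ np.array([1, 2, 3, 4]) + np.random.randn(500)
>>> q80 = QuantileRegression(quantile=0.8).fit(X, y)
>>> share_under = np.mean(y < X @ q80.coef_ + q80.intercept_)  # approximately 0.8, per the quantile description above
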
fit(X: numpy.ndarray, y: numpy.ndarray, sample_weight: Optional[numpy.ndarray] = None) → skbonus.linear_model._scipy_regressors.QuantileRegression

Fit the model using the SLSQP algorithm.

Parameters
  • X (np.ndarray of shape (n_samples, n_features)) – The training data.

  • y (np.ndarray, 1-dimensional) – The target values.

  • sample_weight (Optional[np.ndarray], default=None) – Individual weights for each sample.

Returns

Fitted regressor.

Return type

QuantileRegression
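
Since fit returns the fitted regressor, calls can be chained in the usual scikit-learn style, for example:

>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.random.randn(100, 4)
>>> y = X @ np.array([1, 2, 3, 4])
>>> coefficients = QuantileRegression().fit(X, y).coef_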