skbonus.preprocessing package

Submodules

skbonus.preprocessing.saturation module

Saturation classes.

class skbonus.preprocessing.saturation.AdbudgSaturation(exponent: float = 1.0, denominator_shift: float = 1.0)

Bases: skbonus.preprocessing.saturation.Saturation

Apply the Adbudg saturation.

The formula is x ** exponent / (denominator_shift + x ** exponent).

Parameters
  • exponent (float, default=1.0) – The exponent.

  • denominator_shift (float, default=1.0) – The shift in the denominator.

Notes

This version produces saturated values in the interval [0, 1]. You can use LinearShift from the shift module to map them onto another interval [a, b].

Examples

>>> import numpy as np
>>> from skbonus.preprocessing.saturation import AdbudgSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> AdbudgSaturation().fit_transform(X)
array([[0.5       , 0.999001  ],
       [0.66666667, 0.999001  ],
       [0.75      , 0.999001  ]])
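
The formula and the rescaling mentioned in the Notes can be sketched in plain Python. This is only an illustration of the arithmetic, not the library's implementation: the helper names adbudg and rescale are hypothetical, and rescale stands in for the idea of a linear shift without claiming anything about the LinearShift API.

```python
def adbudg(x: float, exponent: float = 1.0, denominator_shift: float = 1.0) -> float:
    """Adbudg formula from the docstring, applied to one non-negative value."""
    powered = x ** exponent
    return powered / (denominator_shift + powered)


def rescale(s: float, a: float, b: float) -> float:
    """Hypothetical helper: map a saturated value s in [0, 1] linearly onto [a, b]."""
    return a + (b - a) * s


column = [1, 2, 3]
saturated = [adbudg(x) for x in column]                 # same values as the first column above
rescaled = [rescale(s, 10.0, 20.0) for s in saturated]  # now between 10 and 20
print(saturated)
print(rescaled)
```

With the default parameters the first column of the example is reproduced, and the rescaled values stay inside the chosen interval.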

class skbonus.preprocessing.saturation.BoxCoxSaturation(exponent: float = 1.0, shift: float = 1.0)

Bases: skbonus.preprocessing.saturation.Saturation

Apply the Box-Cox saturation.

The formula is ((x + shift) ** exponent - 1) / exponent if exponent != 0, else ln(x + shift).

Parameters
  • exponent (float, default=1.0) – The exponent.

  • shift (float, default=1.0) – The shift.

Examples

>>> import numpy as np
>>> from skbonus.preprocessing.saturation import BoxCoxSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> BoxCoxSaturation(exponent=0.5).fit_transform(X)
array([[ 0.82842712, 61.27716808],
       [ 1.46410162, 61.27716808],
       [ 2.        , 61.27716808]])
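
Both branches of the formula can be checked with a small standard-library sketch. The function name box_cox is hypothetical and the code only mirrors the documented formula for a single value; it makes no claim about how the class handles arrays or invalid inputs.

```python
import math


def box_cox(x: float, exponent: float = 1.0, shift: float = 1.0) -> float:
    """Box-Cox formula from the docstring, applied to one value with x + shift > 0."""
    if exponent == 0:
        return math.log(x + shift)
    return ((x + shift) ** exponent - 1) / exponent


first_column = [box_cox(x, exponent=0.5) for x in (1, 2, 3)]
print(first_column)

# The logarithm is the limit of the power branch as the exponent approaches 0.
almost_log = box_cox(1, exponent=1e-6)
exact_log = box_cox(1, exponent=0)
print(almost_log, exact_log)
```

The second part shows why the logarithm is used for exponent 0: the power branch approaches ln(x + shift) as the exponent shrinks.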

class skbonus.preprocessing.saturation.ExponentialSaturation(exponent: float = 1.0)

Bases: skbonus.preprocessing.saturation.Saturation

Apply exponential saturation.

The formula is 1 - exp(-exponent * x).

Parameters

exponent (float, default=1.0) – The exponent.

Notes

This version produces saturated values in the interval [0, 1]. You can use LinearShift from the shift module to map them onto another interval [a, b].

Examples

>>> import numpy as np
>>> from skbonus.preprocessing.saturation import ExponentialSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> ExponentialSaturation().fit_transform(X)
array([[0.63212056, 1.        ],
       [0.86466472, 1.        ],
       [0.95021293, 1.        ]])
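
As a rough illustration of how the exponent controls the speed of saturation, the formula can be evaluated for single values with the standard library. The function name exponential_saturation is hypothetical; this is a sketch of the documented formula, not the class itself.

```python
import math


def exponential_saturation(x: float, exponent: float = 1.0) -> float:
    """1 - exp(-exponent * x), as given in the docstring."""
    return 1.0 - math.exp(-exponent * x)


default_curve = [exponential_saturation(x) for x in (1, 2, 3)]
print(default_curve)

# Doubling the exponent reaches at x = 1 the level the default reaches at x = 2.
faster = exponential_saturation(1, exponent=2.0)
print(faster)
```

Because only the product exponent * x enters the formula, scaling the exponent is equivalent to rescaling the input.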

class skbonus.preprocessing.saturation.HillSaturation(exponent: float = 1.0, half_saturation: float = 1.0)

Bases: skbonus.preprocessing.saturation.Saturation

Apply the Hill saturation.

The formula is 1 / (1 + (half_saturation / x) ** exponent).

Parameters
  • exponent (float, default=1.0) – The exponent.

  • half_saturation (float, default=1.0) – The point of half saturation, i.e. Hill(half_saturation) = 0.5.

Examples

>>> import numpy as np
>>> from skbonus.preprocessing.saturation import HillSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> HillSaturation().fit_transform(X)
array([[0.5       , 0.999001  ],
       [0.66666667, 0.999001  ],
       [0.75      , 0.999001  ]])
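
The half-saturation property, and the reason the default output matches the Adbudg example, can be verified with a short sketch. The function names hill and adbudg are hypothetical single-value versions of the two documented formulas, valid for x > 0.

```python
def hill(x: float, exponent: float = 1.0, half_saturation: float = 1.0) -> float:
    """Hill formula from the docstring, for one positive value."""
    return 1.0 / (1.0 + (half_saturation / x) ** exponent)


def adbudg(x: float, exponent: float = 1.0, denominator_shift: float = 1.0) -> float:
    """Adbudg formula, for comparison."""
    powered = x ** exponent
    return powered / (denominator_shift + powered)


# Hill(half_saturation) = 0.5, whatever the exponent is.
at_half = hill(4.0, exponent=3.0, half_saturation=4.0)
print(at_half)

# Algebraically, Hill equals Adbudg with denominator_shift = half_saturation ** exponent.
x, e, h = 2.5, 1.7, 4.0
print(hill(x, e, h), adbudg(x, e, h ** e))
```

Multiplying the numerator and denominator of the Hill formula by x ** exponent gives x ** exponent / (x ** exponent + half_saturation ** exponent), which is why the two parametrizations agree.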

class skbonus.preprocessing.saturation.Saturation

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin, abc.ABC

Base class for all saturations, such as Box-Cox, Adbudg, …

fit(X: numpy.ndarray, y: None = None) → skbonus.preprocessing.saturation.Saturation

Fit the transformer.

In this special case, nothing is done.

Parameters
  • X (Ignored) – Not used, present here for API consistency by convention.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns

Fitted transformer.

Return type

Saturation

transform(X: numpy.ndarray) → numpy.ndarray

Apply the saturation effect.

Parameters

X (np.ndarray) – Data to be transformed.

Returns

Data with saturation effect applied.

Return type

np.ndarray

skbonus.preprocessing.time module

class skbonus.preprocessing.time.CyclicalEncoder(cycles: Optional[List[Tuple[float, float]]] = None)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Break each cyclic feature into two new features, corresponding to the representation of this feature on a circle.

For example, take the hours from 0 to 23. On a normal, round analog clock, these features are perfectly aligned on a circle already. You can do the same with days, months, …

Notes

This method has the advantage that close points in time stay close together. See the examples below.

An algorithm that sees only the raw hour value cannot know that 0 and 23 are actually close. One-hot encoding the hour is another possibility, but it discards the distances between hours: hours 5 and 16 end up exactly as far apart as hours 0 and 23.
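
The three representations can be compared numerically. The sketch below uses only the standard library; the helpers on_circle and one_hot are hypothetical and assume a 24-hour cycle, so it illustrates the argument rather than the encoder's internals.

```python
import math


def on_circle(hour: int, period: int = 24) -> tuple:
    """Place an hour on the unit circle."""
    angle = 2 * math.pi * hour / period
    return (math.cos(angle), math.sin(angle))


def one_hot(hour: int, period: int = 24) -> tuple:
    """One-hot vector with a single 1 at the position of the hour."""
    return tuple(1.0 if i == hour else 0.0 for i in range(period))


for a, b in [(0, 23), (5, 16)]:
    raw = abs(a - b)
    hot = math.dist(one_hot(a), one_hot(b))
    circle = math.dist(on_circle(a), on_circle(b))
    print(f"hours {a} and {b}: raw={raw}, one-hot={hot:.3f}, circle={circle:.3f}")
```

The raw values make 0 and 23 look far apart, the one-hot vectors make every pair equally distant, and only the circle representation keeps 0 and 23 close while leaving 5 and 16 far apart.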

Parameters

cycles (Optional[List[Tuple[float, float]]], default=None) –

Define the ranges of the cycles in the format [(col_1_min, col_1_max), (col_2_min, col_2_max), …]. For example, use [(0, 23), (1, 7)] if your dataset consists of two columns, the first one containing hours and the second one the day of the week.

If None, the encoder infers the ranges from the data, i.e. it uses the minimum and maximum value of each column.

Examples

>>> import numpy as np
>>> from skbonus.preprocessing.time import CyclicalEncoder
>>> df = np.array([[22], [23], [0], [1], [2]])
>>> CyclicalEncoder().fit_transform(df)
array([[ 0.8660254 , -0.5       ],
       [ 0.96592583, -0.25881905],
       [ 1.        ,  0.        ],
       [ 0.96592583,  0.25881905],
       [ 0.8660254 ,  0.5       ]])
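
The output above is consistent with mapping each value to (cos, sin) of an angle on a circle whose period is max - min + 1. That period is an assumption read off the example output, not something the documentation states, and encode_column is a hypothetical single-column helper written with the standard library only.

```python
import math


def encode_column(values, cycle=None):
    """Encode one column as (cos, sin) pairs; infer the range if no cycle is given."""
    low, high = cycle if cycle is not None else (min(values), max(values))
    period = high - low + 1  # assumption inferred from the documented example
    rows = []
    for value in values:
        angle = 2 * math.pi * (value - low) / period
        rows.append((math.cos(angle), math.sin(angle)))
    return rows


hours = [22, 23, 0, 1, 2]
encoded = encode_column(hours)  # range inferred as (0, 23)
for row in encoded:
    print(row)

# If the data covers only part of the cycle, the inferred range is too narrow.
partial = [5, 6, 7]
inferred = encode_column(partial)
explicit = encode_column(partial, cycle=(0, 23))
```

Under this assumption, a column that contains only a few distinct hours would be wrapped around a much shorter cycle, so passing cycles explicitly is the safer choice whenever the data may not span the full range.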

fit(X: numpy.ndarray, y=None) → skbonus.preprocessing.time.CyclicalEncoder

Fit the estimator. If cycles was not provided during initialization, the ranges are inferred from X; otherwise nothing is done.

Parameters
  • X (np.ndarray) – Used for inferring the ranges of the data, if not provided during initialization.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns

Fitted transformer.

Return type

CyclicalEncoder

transform(X: numpy.ndarray) → numpy.ndarray

Apply the cyclical encoding to the data.

Parameters

X (np.ndarray) – The data with cyclical features in the columns.

Returns

The encoded data with twice as many columns as the original.

Return type

np.ndarray

Module contents

Module for preprocessing data.