skbonus.preprocessing package

Submodules

skbonus.preprocessing.saturation module

Saturation classes.

class skbonus.preprocessing.saturation.AdbudgSaturation(exponent: float = 1.0, denominator_shift: float = 1.0)

Bases: skbonus.preprocessing.saturation.Saturation

Apply the Adbudg saturation.

The formula is x ** exponent / (denominator_shift + x ** exponent).
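To make the formula concrete, here is a direct NumPy transcription that reproduces the values in the Examples section of this class. `adbudg` is a hypothetical helper for illustration, not part of skbonus.

```python
import numpy as np


def adbudg(x: np.ndarray, exponent: float = 1.0, denominator_shift: float = 1.0) -> np.ndarray:
    """Adbudg saturation: x ** exponent / (denominator_shift + x ** exponent)."""
    powered = np.asarray(x, dtype=float) ** exponent
    return powered / (denominator_shift + powered)


X = np.array([[1, 1000], [2, 1000], [3, 1000]])
print(adbudg(X))
```

With the defaults the formula reduces to x / (1 + x), so x = 1 gives 0.5 and x = 1000 gives 1000 / 1001.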

Parameters

exponent (float, default=1.0) – The exponent.
denominator_shift (float, default=1.0) – The shift in the denominator.

Notes

This version produces saturated values in the interval [0, 1]. Use LinearShift from the shift module to map them to an arbitrary interval [a, b].
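The API of LinearShift is not documented in this section, so the sketch below only shows the affine map we assume it performs, a + (b - a) * s, in plain NumPy. `rescale` is a hypothetical name; check the shift module for LinearShift's actual parameters.

```python
import numpy as np


def rescale(saturated: np.ndarray, a: float, b: float) -> np.ndarray:
    """Map values from [0, 1] to [a, b] with the affine map a + (b - a) * s.

    Assumption: this mirrors what LinearShift is meant to do.
    """
    return a + (b - a) * np.asarray(saturated, dtype=float)


s = np.array([0.0, 0.5, 1.0])
print(rescale(s, a=10.0, b=20.0))  # [10. 15. 20.]
```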

Examples

>>> import numpy as np
>>> from skbonus.preprocessing.saturation import AdbudgSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> AdbudgSaturation().fit_transform(X)
array([[0.5       , 0.999001  ],
       [0.66666667, 0.999001  ],
       [0.75      , 0.999001  ]])

class skbonus.preprocessing.saturation.BoxCoxSaturation(exponent: float = 1.0, shift: float = 1.0)

Bases: skbonus.preprocessing.saturation.Saturation

Apply the Box-Cox saturation.

The formula is ((x + shift) ** exponent - 1) / exponent if exponent != 0, else ln(x + shift).
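A NumPy transcription of the formula follows, with `box_cox` as a hypothetical helper name. The `exponent == 0` branch is the limit of the general expression as the exponent goes to zero, which is why the logarithm appears there; the sketch reproduces the values in the Examples section of this class.

```python
import numpy as np


def box_cox(x: np.ndarray, exponent: float = 1.0, shift: float = 1.0) -> np.ndarray:
    """Box-Cox saturation of x + shift; the logarithm is the exponent -> 0 limit."""
    shifted = np.asarray(x, dtype=float) + shift
    if exponent == 0:
        return np.log(shifted)
    # Note the grouping: (shifted ** exponent) - 1, not shifted ** (exponent - 1).
    return (shifted ** exponent - 1) / exponent


X = np.array([[1, 1000], [2, 1000], [3, 1000]])
print(box_cox(X, exponent=0.5))
```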

Parameters

exponent (float, default=1.0) – The exponent.
shift (float, default=1.0) – The shift.

Examples

>>> import numpy as np
>>> from skbonus.preprocessing.saturation import BoxCoxSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> BoxCoxSaturation(exponent=0.5).fit_transform(X)
array([[ 0.82842712, 61.27716808],
       [ 1.46410162, 61.27716808],
       [ 2.        , 61.27716808]])

class skbonus.preprocessing.saturation.ExponentialSaturation(exponent: float = 1.0)

Bases: skbonus.preprocessing.saturation.Saturation

Apply exponential saturation.

The formula is 1 - exp(-exponent * x).
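Solving 1 - exp(-exponent * x) = 0.5 gives x = ln(2) / exponent, which is a handy way to pick the exponent: it is the input at which half of the maximum effect is reached. The NumPy sketch below, with the hypothetical helper `exponential_saturation`, reproduces the Examples values and uses `np.expm1`, which computes exp(z) - 1 accurately for small z.

```python
import numpy as np


def exponential_saturation(x: np.ndarray, exponent: float = 1.0) -> np.ndarray:
    """Exponential saturation: 1 - exp(-exponent * x)."""
    # -expm1(-z) equals 1 - exp(-z) but stays accurate when z is tiny.
    return -np.expm1(-exponent * np.asarray(x, dtype=float))


X = np.array([[1, 1000], [2, 1000], [3, 1000]])
print(exponential_saturation(X))
print(exponential_saturation(np.log(2) / 3.0, exponent=3.0))  # half saturation
```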

Parameters: exponent (float, default=1.0) – The exponent.

Notes

This version produces saturated values in the interval [0, 1]. Use LinearShift from the shift module to map them to an arbitrary interval [a, b].

Examples

>>> import numpy as np
>>> from skbonus.preprocessing.saturation import ExponentialSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> ExponentialSaturation().fit_transform(X)
array([[0.63212056, 1.        ],
       [0.86466472, 1.        ],
       [0.95021293, 1.        ]])

class skbonus.preprocessing.saturation.HillSaturation(exponent: float = 1.0, half_saturation: float = 1.0)

Bases: skbonus.preprocessing.saturation.Saturation

Apply the Hill saturation.

The formula is 1 / (1 + (half_saturation / x) ** exponent).

Parameters

exponent (float, default=1.0) – The exponent.
half_saturation (float, default=1.0) – The point of half saturation, i.e. Hill(half_saturation) = 0.5.
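For x > 0, multiplying numerator and denominator by x ** exponent rewrites the Hill formula as x ** exponent / (half_saturation ** exponent + x ** exponent). That is the Adbudg saturation with denominator_shift = half_saturation ** exponent, which explains why both classes print identical arrays in their examples with default parameters. The helpers `hill` and `adbudg` below are hypothetical NumPy transcriptions used only to check this identity.

```python
import numpy as np


def hill(x: np.ndarray, exponent: float = 1.0, half_saturation: float = 1.0) -> np.ndarray:
    """Hill saturation: 1 / (1 + (half_saturation / x) ** exponent), for x > 0."""
    x = np.asarray(x, dtype=float)
    return 1 / (1 + (half_saturation / x) ** exponent)


def adbudg(x: np.ndarray, exponent: float = 1.0, denominator_shift: float = 1.0) -> np.ndarray:
    """Adbudg saturation: x ** exponent / (denominator_shift + x ** exponent)."""
    powered = np.asarray(x, dtype=float) ** exponent
    return powered / (denominator_shift + powered)


x = np.array([0.5, 1.0, 2.0, 10.0])
# Hill(exponent, half_saturation) equals Adbudg with denominator_shift = half_saturation ** exponent.
print(np.allclose(hill(x, 2.0, 3.0), adbudg(x, 2.0, 3.0 ** 2)))  # True
print(hill(3.0, exponent=2.0, half_saturation=3.0))  # 0.5
```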

Examples

>>> import numpy as np
>>> from skbonus.preprocessing.saturation import HillSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> HillSaturation().fit_transform(X)
array([[0.5       , 0.999001  ],
       [0.66666667, 0.999001  ],
       [0.75      , 0.999001  ]])

class skbonus.preprocessing.saturation.Saturation

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin, abc.ABC

Base class for all saturations, such as Box-Cox, Adbudg, …

fit(X: numpy.ndarray, y: None = None) → skbonus.preprocessing.saturation.Saturation

Fit the transformer.

Saturations have nothing to learn from the data, so in this special case nothing is done.

Parameters

X (Ignored) – Not used, present here for API consistency by convention.
y (Ignored) – Not used, present here for API consistency by convention.

Returns

Fitted transformer.

Return type

Saturation

transform(X: numpy.ndarray) → numpy.ndarray

Apply the saturation effect.

Parameters: X (np.ndarray) – Data to be transformed.
Returns: Data with saturation effect applied.
Return type: np.ndarray
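The base class pattern, a no-op fit plus a shared transform that subclasses specialize with their formula, can be sketched without dependencies. This is an assumption about how the work is divided, not the library's code: `SaturationSketch`, `ExponentialSketch`, and the `_transformation` hook are hypothetical names, and the real class additionally inherits from scikit-learn's `BaseEstimator` and `TransformerMixin` (the latter supplies `fit_transform`, reimplemented here by hand).

```python
from abc import ABC, abstractmethod

import numpy as np


class SaturationSketch(ABC):
    """Hypothetical stand-in for the Saturation base class (no scikit-learn)."""

    def fit(self, X: np.ndarray, y: None = None) -> "SaturationSketch":
        # Stateless: nothing is learned, fit only returns the transformer itself.
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # Delegate the element-wise formula to the subclass.
        return self._transformation(np.asarray(X, dtype=float))

    def fit_transform(self, X: np.ndarray, y: None = None) -> np.ndarray:
        # In the real class, scikit-learn's TransformerMixin provides this.
        return self.fit(X, y).transform(X)

    @abstractmethod
    def _transformation(self, X: np.ndarray) -> np.ndarray:
        """Hypothetical hook holding the saturation formula."""


class ExponentialSketch(SaturationSketch):
    def __init__(self, exponent: float = 1.0):
        self.exponent = exponent

    def _transformation(self, X: np.ndarray) -> np.ndarray:
        return 1 - np.exp(-self.exponent * X)


X = np.array([[1, 1000], [2, 1000], [3, 1000]])
print(ExponentialSketch().fit_transform(X))
```

Keeping fit as a no-op still matters: it lets the saturations sit inside scikit-learn pipelines, which call fit on every step.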

skbonus.preprocessing.time module

class skbonus.preprocessing.time.CyclicalEncoder(cycles: Optional[List[Tuple[float, float]]] = None)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Break each cyclic feature into two new features, corresponding to the representation of this feature on a circle.

For example, take the hours from 0 to 23: on a round 24-hour analog clock face, they already lie on a circle. The same idea applies to days, months, …

Notes

The advantage of this encoding is that points close in time stay close together in feature space. See the examples below.

If an algorithm works with the raw hour values instead, it cannot know that hours 0 and 23 are actually close. Another option is one-hot encoding the hour, but this discards the distances between hours: with one-hot encoding, hours 5 and 16 are exactly as far apart as hours 0 and 23.
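The distance argument can be checked numerically. The sketch below places hours on the unit circle with period 24 (the same angles that produce the documented example output) and compares Euclidean distances with those of a one-hot encoding; `encode_hours` and `distance` are hypothetical helpers.

```python
import numpy as np


def encode_hours(hours: np.ndarray) -> np.ndarray:
    """Place hours 0..23 on the unit circle as (cos, sin) pairs."""
    angles = 2 * np.pi * np.asarray(hours, dtype=float) / 24
    return np.column_stack([np.cos(angles), np.sin(angles)])


def distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two encoded points."""
    return float(np.linalg.norm(a - b))


enc = encode_hours(np.arange(24))
one_hot = np.eye(24)

# Circle encoding: 23 -> 0 is as close as 0 -> 1, while 5 -> 16 is far apart.
print(distance(enc[23], enc[0]), distance(enc[0], enc[1]), distance(enc[5], enc[16]))
# One-hot encoding: every pair of distinct hours is sqrt(2) apart.
print(distance(one_hot[5], one_hot[16]), distance(one_hot[0], one_hot[23]))
```

Adjacent hours end up 2 * sin(pi / 24) apart on the circle, whichever side of midnight they fall on.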

Parameters

cycles (Optional[List[Tuple[float, float]]], default=None) –

Define the ranges of the cycles in the format [(col_1_min, col_1_max), (col_2_min, col_2_max), …]. For example, use [(0, 23), (1, 7)] if your dataset consists of two columns, the first containing hours and the second the day of the week.

If None, the encoder infers the ranges from the data passed to fit, using the minimum and maximum value of each column.
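The encoding can be reproduced in a few lines of NumPy. This is a sketch, not the library's implementation: `cyclical_encode` is a hypothetical helper, the angle formula 2 * pi * (x - min) / (max - min + 1) is inferred from the documented example output (it maps hour 23 to 345 degrees for the range (0, 23)), and placing each column's cos/sin pair side by side for several features is an assumption.

```python
from typing import List, Optional, Tuple

import numpy as np


def cyclical_encode(
    X: np.ndarray, cycles: Optional[List[Tuple[float, float]]] = None
) -> np.ndarray:
    """Encode each column as (cos, sin) of its position within its cycle.

    Assumptions inferred from the documented example: the angle is
    2 * pi * (x - min) / (max - min + 1), and each column's cos/sin pair
    is placed side by side in the output.
    """
    X = np.asarray(X, dtype=float)
    if cycles is None:
        # Infer each column's range from the observed minimum and maximum.
        cycles = list(zip(X.min(axis=0), X.max(axis=0)))
    encoded = []
    for column, (low, high) in zip(X.T, cycles):
        angles = 2 * np.pi * (column - low) / (high - low + 1)
        encoded.extend([np.cos(angles), np.sin(angles)])
    return np.column_stack(encoded)


hours = np.array([[22], [23], [0], [1], [2]])
print(cyclical_encode(hours))  # matches the CyclicalEncoder example

hours_and_weekdays = np.array([[23, 7], [0, 1]])
print(cyclical_encode(hours_and_weekdays, cycles=[(0, 23), (1, 7)]))
```

Because inference relies on the observed minimum and maximum, a training set containing only hours 8 to 17 would be treated as a cycle running from 8 to 17; passing cycles explicitly avoids this.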

Examples

>>> import numpy as np
>>> from skbonus.preprocessing.time import CyclicalEncoder
>>> X = np.array([[22], [23], [0], [1], [2]])
>>> CyclicalEncoder().fit_transform(X)
array([[ 0.8660254 , -0.5       ],
       [ 0.96592583, -0.25881905],
       [ 1.        ,  0.        ],
       [ 0.96592583,  0.25881905],
       [ 0.8660254 ,  0.5       ]])

fit(X: numpy.ndarray, y=None) → skbonus.preprocessing.time.CyclicalEncoder

Fit the estimator. If cycles was not provided during initialization, the ranges are inferred from X; otherwise, nothing is done.

Parameters

X (np.ndarray) – Used for inferring the ranges of the data, if they were not provided during initialization.
y (Ignored) – Not used, present here for API consistency by convention.

Returns

Fitted transformer.

Return type

CyclicalEncoder

transform(X: numpy.ndarray) → numpy.ndarray

Replace each cyclic feature with its two-dimensional representation on a circle.

Parameters: X (np.ndarray) – The data with cyclical features in the columns.
Returns: The encoded data with twice as many columns as the original.
Return type: np.ndarray

Module contents

Module for preprocessing data.