skbonus.preprocessing package
Submodules
skbonus.preprocessing.saturation module
Saturation classes.
class skbonus.preprocessing.saturation.AdbudgSaturation(exponent: float = 1.0, denominator_shift: float = 1.0)
Bases: skbonus.preprocessing.saturation.Saturation
Apply the Adbudg saturation.
The formula is x ** exponent / (denominator_shift + x ** exponent).
Parameters
exponent (float, default=1.0) – The exponent.
denominator_shift (float, default=1.0) – The shift in the denominator.
Notes
This version produces saturated values in the interval [0, 1]. You can use LinearShift from the shift module to map them onto an arbitrary interval [a, b].
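The note names LinearShift but does not document its signature, so the sketch below does not call it. It only shows the affine map a + (b - a) * s that rescaling from [0, 1] to [a, b] amounts to, assuming that is what such a shift does; rescale_to_interval is a hypothetical helper, not part of skbonus.

```python
import numpy as np


def rescale_to_interval(saturated, a: float, b: float) -> np.ndarray:
    """Affinely map values from [0, 1] onto [a, b] (hypothetical helper)."""
    # 0 maps to a, 1 maps to b, and everything in between scales linearly.
    return a + (b - a) * np.asarray(saturated, dtype=float)


saturated = np.array([0.0, 0.5, 0.75, 1.0])
rescaled = rescale_to_interval(saturated, a=10.0, b=20.0)
# rescaled holds 10.0, 15.0, 17.5 and 20.0
```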
Examples
>>> import numpy as np
>>> from skbonus.preprocessing.saturation import AdbudgSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> AdbudgSaturation().fit_transform(X)
array([[0.5       , 0.999001  ],
       [0.66666667, 0.999001  ],
       [0.75      , 0.999001  ]])
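To make the formula concrete, here is a plain NumPy sketch of x ** exponent / (denominator_shift + x ** exponent). The function adbudg is a hypothetical stand-in, not the class's internal code; it reproduces the numbers of the example above and shows that a larger denominator_shift slows the saturation down.

```python
import numpy as np


def adbudg(x, exponent: float = 1.0, denominator_shift: float = 1.0) -> np.ndarray:
    """Elementwise x ** exponent / (denominator_shift + x ** exponent)."""
    powered = np.asarray(x, dtype=float) ** exponent
    return powered / (denominator_shift + powered)


X = np.array([[1, 1000], [2, 1000], [3, 1000]])
result = adbudg(X)  # same values as the AdbudgSaturation example
# A larger denominator_shift pushes the curve to the right (slower saturation).
slower = adbudg(X, denominator_shift=10.0)
```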
class skbonus.preprocessing.saturation.BoxCoxSaturation(exponent: float = 1.0, shift: float = 1.0)
Bases: skbonus.preprocessing.saturation.Saturation
Apply the Box-Cox saturation.
The formula is ((x + shift) ** exponent - 1) / exponent if exponent != 0, else ln(x + shift).
Parameters
exponent (float, default=1.0) – The exponent.
shift (float, default=1.0) – The shift.
Examples
>>> import numpy as np
>>> from skbonus.preprocessing.saturation import BoxCoxSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> BoxCoxSaturation(exponent=0.5).fit_transform(X)
array([[ 0.82842712, 61.27716808],
       [ 1.46410162, 61.27716808],
       [ 2.        , 61.27716808]])
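A NumPy sketch of the two-branch formula follows; box_cox is a hypothetical helper mirroring the documented formula, not the library's code. The logarithm is the limit of ((x + shift) ** exponent - 1) / exponent as the exponent goes to 0, so the two branches join continuously, which the last line checks numerically.

```python
import numpy as np


def box_cox(x, exponent: float = 1.0, shift: float = 1.0) -> np.ndarray:
    """Box-Cox saturation following the formula documented above."""
    shifted = np.asarray(x, dtype=float) + shift
    if exponent == 0:
        # The exponent == 0 branch is the natural logarithm.
        return np.log(shifted)
    return (shifted ** exponent - 1) / exponent


X = np.array([[1, 1000], [2, 1000], [3, 1000]])
result = box_cox(X, exponent=0.5)  # same values as the BoxCoxSaturation example
log_case = box_cox(X, exponent=0.0)
near_log = box_cox(X, exponent=1e-8)  # tiny exponents approach the log branch
```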
class skbonus.preprocessing.saturation.ExponentialSaturation(exponent: float = 1.0)
Bases: skbonus.preprocessing.saturation.Saturation
Apply exponential saturation.
The formula is 1 - exp(-exponent * x).
Parameters
exponent (float, default=1.0) – The exponent.
Notes
This version produces saturated values in the interval [0, 1]. You can use LinearShift from the shift module to map them onto an arbitrary interval [a, b].
Examples
>>> import numpy as np
>>> from skbonus.preprocessing.saturation import ExponentialSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> ExponentialSaturation().fit_transform(X)
array([[0.63212056, 1.        ],
       [0.86466472, 1.        ],
       [0.95021293, 1.        ]])
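The formula 1 - exp(-exponent * x) can be sketched directly in NumPy; exponential_saturation is a hypothetical helper, not the library's implementation. Here the exponent behaves like a rate: doubling it makes every value approach 1 at least as fast.

```python
import numpy as np


def exponential_saturation(x, exponent: float = 1.0) -> np.ndarray:
    """Elementwise 1 - exp(-exponent * x)."""
    return 1 - np.exp(-exponent * np.asarray(x, dtype=float))


X = np.array([[1, 1000], [2, 1000], [3, 1000]])
result = exponential_saturation(X)  # same values as the ExponentialSaturation example
# A larger exponent acts as a rate: the curve approaches 1 faster.
faster = exponential_saturation(X, exponent=2.0)
```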
class skbonus.preprocessing.saturation.HillSaturation(exponent: float = 1.0, half_saturation: float = 1.0)
Bases: skbonus.preprocessing.saturation.Saturation
Apply the Hill saturation.
The formula is 1 / (1 + (half_saturation / x) ** exponent).
Parameters
exponent (float, default=1.0) – The exponent.
half_saturation (float, default=1.0) – The point of half saturation, i.e. Hill(half_saturation) = 0.5.
Examples
>>> import numpy as np
>>> from skbonus.preprocessing.saturation import HillSaturation
>>> X = np.array([[1, 1000], [2, 1000], [3, 1000]])
>>> HillSaturation().fit_transform(X)
array([[0.5       , 0.999001  ],
       [0.66666667, 0.999001  ],
       [0.75      , 0.999001  ]])
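The output matches the Adbudg example, and that is no coincidence: multiplying numerator and denominator of 1 / (1 + (h / x) ** k) by x ** k gives x ** k / (h ** k + x ** k), i.e. the Adbudg formula with denominator_shift = h ** k. The sketch below checks this and the half-saturation property for x > 0; hill is a hypothetical helper, not the library's code.

```python
import numpy as np


def hill(x, exponent: float = 1.0, half_saturation: float = 1.0) -> np.ndarray:
    """Elementwise 1 / (1 + (half_saturation / x) ** exponent), for x > 0."""
    x = np.asarray(x, dtype=float)
    return 1 / (1 + (half_saturation / x) ** exponent)


X = np.array([[1, 1000], [2, 1000], [3, 1000]])
result = hill(X)  # same values as the HillSaturation example

# The half-saturation point maps to 0.5 for any exponent.
at_half = hill(4.0, exponent=3.0, half_saturation=4.0)

# Rewritten form x**k / (h**k + x**k): Adbudg with denominator_shift = h**k.
k, h = 2.0, 3.0
rewritten = X.astype(float) ** k / (h ** k + X.astype(float) ** k)
```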
class skbonus.preprocessing.saturation.Saturation
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin, abc.ABC
Base class for all saturations, such as Box-Cox, Adbudg, …
fit(X: numpy.ndarray, y: None = None) → skbonus.preprocessing.saturation.Saturation
Fit the transformer.
In this special case, nothing is done.
Parameters
X (Ignored) – Not used, present here for API consistency by convention.
y (Ignored) – Not used, present here for API consistency by convention.
Returns
Fitted transformer.
Return type
skbonus.preprocessing.saturation.Saturation
transform(X: numpy.ndarray) → numpy.ndarray
Apply the saturation effect.
Parameters
X (np.ndarray) – Data to be transformed.
Returns
Data with saturation effect applied.
Return type
np.ndarray
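The base class pairs a no-op fit with a transform that applies the saturation. A minimal sketch of that pattern follows, assuming each subclass only has to supply its formula. SaturationSketch, ExponentialSketch and the hook name _saturate are hypothetical; sklearn is left out to keep the example dependency-free, so fit_transform is written by hand where the real class inherits it from TransformerMixin.

```python
from abc import ABC, abstractmethod

import numpy as np


class SaturationSketch(ABC):
    """Hypothetical stand-in for the Saturation base class, without sklearn."""

    def fit(self, X, y=None):
        # Nothing is learned: each saturation has fixed parameters.
        return self

    def transform(self, X) -> np.ndarray:
        return self._saturate(np.asarray(X, dtype=float))

    def fit_transform(self, X, y=None) -> np.ndarray:
        # In the real class, sklearn's TransformerMixin provides this method.
        return self.fit(X, y).transform(X)

    @abstractmethod
    def _saturate(self, X: np.ndarray) -> np.ndarray:
        """Hypothetical hook: apply the concrete saturation formula."""


class ExponentialSketch(SaturationSketch):
    """Exponential saturation 1 - exp(-exponent * x) built on the sketch."""

    def __init__(self, exponent: float = 1.0):
        self.exponent = exponent

    def _saturate(self, X: np.ndarray) -> np.ndarray:
        return 1 - np.exp(-self.exponent * X)


X = np.array([[1, 1000], [2, 1000], [3, 1000]])
result = ExponentialSketch().fit_transform(X)
```

Keeping the formula in a single abstract hook means the fitting and validation logic lives in one place, and a new saturation only needs a constructor and one method.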
skbonus.preprocessing.time module
class skbonus.preprocessing.time.CyclicalEncoder(cycles: Optional[List[Tuple[float, float]]] = None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Break each cyclic feature into two new features, corresponding to the representation of this feature on a circle.
For example, take the hours from 0 to 23. Like the hours on an analog clock face, these values naturally lie on a circle. The same works for days, months, …
Notes
This method has the advantage that close points in time stay close together. See the examples below.
Otherwise, an algorithm working with the raw hour values cannot know that 0 and 23 are actually close. Another option is one-hot encoding the hour, but this discards the distances between hours entirely: hours 5 and 16 end up exactly as far apart as hours 0 and 23.
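The distance argument can be checked numerically. The sketch below compares Euclidean distances under raw values, one-hot encoding, and a (cos, sin) encoding with a period of 24 hours; one_hot, on_circle and dist are hypothetical helpers, not part of skbonus.

```python
import numpy as np

PERIOD = 24  # hours in a day


def one_hot(hour: int) -> np.ndarray:
    vec = np.zeros(PERIOD)
    vec[hour] = 1.0
    return vec


def on_circle(hour: float) -> np.ndarray:
    angle = 2 * np.pi * hour / PERIOD
    return np.array([np.cos(angle), np.sin(angle)])


def dist(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.linalg.norm(u - v))


raw_0_23 = abs(0 - 23)                           # 23: looks far apart
one_hot_0_23 = dist(one_hot(0), one_hot(23))     # sqrt(2)
one_hot_5_16 = dist(one_hot(5), one_hot(16))     # also sqrt(2)
circle_0_23 = dist(on_circle(0), on_circle(23))  # small: one step apart
circle_0_1 = dist(on_circle(0), on_circle(1))    # same as 0 to 23
circle_0_12 = dist(on_circle(0), on_circle(12))  # about 2.0, opposite sides
```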
Parameters
cycles (Optional[List[Tuple[float, float]]], default=None) –
Define the ranges of the cycles in the format [(col_1_min, col_1_max), (col_2_min, col_2_max), …]. For example, use [(0, 23), (1, 7)] if your dataset consists of two columns, the first one containing hours and the second one the day of the week.
If None, the encoder infers the ranges from the data during fit, i.e. it uses the minimum and maximum value of each column.
Examples
>>> import numpy as np
>>> from skbonus.preprocessing.time import CyclicalEncoder
>>> X = np.array([[22], [23], [0], [1], [2]])
>>> CyclicalEncoder().fit_transform(X)
array([[ 0.8660254 , -0.5       ],
       [ 0.96592583, -0.25881905],
       [ 1.        ,  0.        ],
       [ 0.96592583,  0.25881905],
       [ 0.8660254 ,  0.5       ]])
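The numbers in this example are consistent with mapping a value x in a cycle (min, max) to the angle 2π(x - min)/(max - min + 1) and emitting its cosine and sine: 23 lands at 345°, one 15° step before 0. The reimplementation below is reverse-engineered from that output, not taken from the skbonus source. cyclical_encode is hypothetical, and both the "+ 1" period and the (cos, sin) column order for several input columns are assumptions.

```python
from typing import List, Optional, Tuple

import numpy as np


def cyclical_encode(
    X, cycles: Optional[List[Tuple[float, float]]] = None
) -> np.ndarray:
    """Hypothetical reimplementation of the cyclical encoding."""
    X = np.asarray(X, dtype=float)
    if cycles is None:
        # Infer (min, max) per column, as described for cycles=None.
        cycles = list(zip(X.min(axis=0), X.max(axis=0)))
    columns = []
    for j, (low, high) in enumerate(cycles):
        # Assumption: the cycle has high - low + 1 evenly spaced steps,
        # so high sits one step before low on the circle.
        angle = 2 * np.pi * (X[:, j] - low) / (high - low + 1)
        # Assumption: each input column becomes (cos, sin), in place.
        columns.extend([np.cos(angle), np.sin(angle)])
    return np.column_stack(columns)


hours = np.array([[22], [23], [0], [1], [2]])
encoded = cyclical_encode(hours)  # matches the CyclicalEncoder example

# Explicit ranges: hours in column 0, day of week (1 to 7) in column 1.
hours_and_days = np.array([[0, 1], [23, 7]])
encoded_two = cyclical_encode(hours_and_days, cycles=[(0, 23), (1, 7)])
```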
fit(X: numpy.ndarray, y=None) → skbonus.preprocessing.time.CyclicalEncoder
Fit the estimator. If cycles was not provided during initialization, the ranges are inferred from X; otherwise, nothing is done.
Parameters
X (np.ndarray) – Used for inferring the ranges of the data, if they were not provided during initialization.
y (Ignored) – Not used, present here for API consistency by convention.
Returns
Fitted transformer.
Return type
skbonus.preprocessing.time.CyclicalEncoder
transform(X: numpy.ndarray) → numpy.ndarray
Replace each cyclical feature with its two-feature circle encoding.
Parameters
X (np.ndarray) – The data with cyclical features in the columns.
Returns
The encoded data with twice as many columns as the original.
Return type
np.ndarray
Module contents
Module for preprocessing data.