skbonus.pandas.time package
Module contents
Module for time series utilities with a focus on pandas compatibility.
class skbonus.pandas.time.DateIndicator(name: str, dates: List[str])
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Enrich a pandas dataframe with a new column indicating whether each date is a special date.
This new column will contain a one for each date specified in the dates keyword, zero otherwise.
- Parameters
name (str) – The name of the new column. Usually a holiday name such as Easter, Christmas, Black Friday, …
dates (List[str]) – A list containing the dates of the holiday. You have to state every holiday explicitly, i.e. Christmas from 2018 to 2020 can be encoded as ["2018-12-24", "2019-12-24", "2020-12-24"].
Examples
>>> import pandas as pd
>>> from skbonus.pandas.time import DateIndicator
>>> df = pd.DataFrame({"A": range(7)}, index=pd.date_range(start="2019-12-29", periods=7))
>>> DateIndicator("around_new_year_2020", ["2019-12-31", "2020-01-01", "2020-01-02"]).fit_transform(df)
            A  around_new_year_2020
2019-12-29  0                     0
2019-12-30  1                     0
2019-12-31  2                     1
2020-01-01  3                     1
2020-01-02  4                     1
2020-01-03  5                     0
2020-01-04  6                     0
fit(X: pandas.core.frame.DataFrame, y=None) → skbonus.pandas.time._simple.DateIndicator
Fit the estimator. In this special case, nothing is done.
- Parameters
X (Ignored) – Not used, present here for API consistency by convention.
y (Ignored) – Not used, present here for API consistency by convention.
- Returns
Fitted transformer.
- Return type
DateIndicator
transform(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Add the new date feature to the dataframe.
- Parameters
X (pd.DataFrame) – A pandas dataframe with a DatetimeIndex.
- Returns
The input dataframe with an additional column named self.name, containing 1 on the specified dates and 0 otherwise.
- Return type
pd.DataFrame
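The behaviour described above can be approximated in plain pandas. The following is a minimal sketch, not the library's implementation: the helper name add_date_indicator is hypothetical, and it assumes the dataframe has a DatetimeIndex.

```python
import pandas as pd


def add_date_indicator(df: pd.DataFrame, name: str, dates: list) -> pd.DataFrame:
    """Hypothetical helper: mark rows whose index is one of the given dates."""
    special = pd.to_datetime(dates)
    # isin yields a boolean array; cast it to 0/1 as in the documented example.
    indicator = df.index.isin(special).astype(int)
    # assign returns a copy, so the input dataframe is left untouched.
    return df.assign(**{name: indicator})


df = pd.DataFrame({"A": range(7)}, index=pd.date_range(start="2019-12-29", periods=7))
out = add_date_indicator(
    df, "around_new_year_2020", ["2019-12-31", "2020-01-01", "2020-01-02"]
)
print(out)
```

The printed frame should match the DateIndicator example output shown earlier in this entry.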
class skbonus.pandas.time.ExponentialDecaySmoother(frequency: Optional[str] = None, window: int = 1, strength: float = 0.0, peak: float = 0.0, exponent: float = 1.0)
Bases: skbonus.pandas.time._continuous.Smoother
Smooth the columns of a data frame by applying a convolution with an exponentially decaying curve.
This class can be used for modelling carryover effects in marketing mix models.
- Parameters
frequency (Optional[str], default=None) – A pandas time frequency. Can take values like "d" for day or "m" for month. A full list can be found on https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases. If None, the transformer tries to infer it during fit time.
window (int, default=1) – Size of the sliding window, i.e. the number of periods over which a value is spread. As the examples below show, the carryover starts at the date of the original value and extends forward in time.
strength (float, default=0.0) – Fraction of the spending effect that is carried over.
peak (float, default=0.0) – Where the carryover effect peaks.
exponent (float, default=1.0) – To further widen or narrow the carryover curve. A value of 1.0 yields a normal exponential decay. With values larger than 1.0, a super exponential decay can be achieved.
Examples
>>> import pandas as pd
>>> from skbonus.pandas.time import ExponentialDecaySmoother
>>> df = pd.DataFrame({"A": [0, 0, 0, 1, 0, 0, 0]}, index=pd.date_range(start="2019-12-29", periods=7))
>>> ExponentialDecaySmoother().fit_transform(df)
              A
2019-12-29  0.0
2019-12-30  0.0
2019-12-31  0.0
2020-01-01  1.0
2020-01-02  0.0
2020-01-03  0.0
2020-01-04  0.0
>>> ExponentialDecaySmoother(frequency="d", window=3, strength=0.5).fit_transform(df)
                   A
2019-12-29  0.000000
2019-12-30  0.000000
2019-12-31  0.000000
2020-01-01  0.571429
2020-01-02  0.285714
2020-01-03  0.142857
2020-01-04  0.000000
>>> ExponentialDecaySmoother(window=3, strength=0.5, peak=1).fit_transform(df)
               A
2019-12-29  0.00
2019-12-30  0.00
2019-12-31  0.00
2020-01-01  0.25
2020-01-02  0.50
2020-01-03  0.25
2020-01-04  0.00
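The numbers in these examples are consistent with a kernel whose weight at lag i is strength ** (abs(i - peak) ** exponent), normalised to sum to one. The sketch below is a reconstruction from the example outputs under that assumption, not the library's code. The function names are hypothetical, and the treatment of the series boundaries is a guess.

```python
def exponential_decay_kernel(window=1, strength=0.0, peak=0.0, exponent=1.0):
    """Assumed kernel: weight for lag i is strength ** (|i - peak| ** exponent)."""
    raw = [strength ** (abs(i - peak) ** exponent) for i in range(window)]
    total = sum(raw)
    # Normalise so that the carried-over shares add up to one.
    return [w / total for w in raw]


def carry_over(values, kernel):
    """Causal convolution: position t receives kernel[lag] times the input `lag` steps earlier."""
    out = [0.0] * len(values)
    for t in range(len(values)):
        for lag, weight in enumerate(kernel):
            if t - lag >= 0:  # lags reaching before the start of the series are dropped
                out[t] += weight * values[t - lag]
    return out


impulse = [0, 0, 0, 1, 0, 0, 0]
decayed = carry_over(impulse, exponential_decay_kernel(window=3, strength=0.5))
peaked = carry_over(impulse, exponential_decay_kernel(window=3, strength=0.5, peak=1))
print([round(v, 6) for v in decayed])  # [0.0, 0.0, 0.0, 0.571429, 0.285714, 0.142857, 0.0]
print(peaked)                          # [0.0, 0.0, 0.0, 0.25, 0.5, 0.25, 0.0]
```

With peak=0 the largest share stays on the original date. With peak=1 the largest share lands one period later, which reproduces the third example.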
class skbonus.pandas.time.GeneralGaussianSmoother(frequency: Optional[str] = None, window: int = 1, p: float = 1, sig: float = 1, tails: str = 'both')
Bases: skbonus.pandas.time._continuous.Smoother
Smooth the columns of a data frame by applying a convolution with a generalized Gaussian curve.
- Parameters
frequency (Optional[str], default=None) – A pandas time frequency. Can take values like "d" for day or "m" for month. A full list can be found on https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases. If None, the transformer tries to infer it during fit time.
window (int, default=1) – Size of the sliding window. With tails="both", the effect of a value will reach from approximately date - window/2 * frequency to date + window/2 * frequency, i.e. it is centered around the date of the value.
p (float, default=1) – Parameter for the shape of the curve. p=1 yields a typical Gaussian curve while p=0.5 yields a Laplace curve, for example.
sig (float, default=1) – Parameter for the standard deviation of the bell-shaped curve.
tails (str, default="both") – Which tails to use. Can be one of "left", "right", or "both".
Examples
>>> import pandas as pd
>>> from skbonus.pandas.time import GeneralGaussianSmoother
>>> df = pd.DataFrame({"A": [0, 0, 0, 1, 0, 0, 0]}, index=pd.date_range(start="2019-12-29", periods=7))
>>> GeneralGaussianSmoother().fit_transform(df)
              A
2019-12-29  0.0
2019-12-30  0.0
2019-12-31  0.0
2020-01-01  1.0
2020-01-02  0.0
2020-01-03  0.0
2020-01-04  0.0
>>> GeneralGaussianSmoother(frequency="d", window=5, p=1, sig=1).fit_transform(df)
                   A
2019-12-29  0.000000
2019-12-30  0.054489
2019-12-31  0.244201
2020-01-01  0.402620
2020-01-02  0.244201
2020-01-03  0.054489
2020-01-04  0.000000
>>> GeneralGaussianSmoother(window=7, tails="right").fit_transform(df)
                   A
2019-12-29  0.000000
2019-12-30  0.000000
2019-12-31  0.000000
2020-01-01  0.570459
2020-01-02  0.346001
2020-01-03  0.077203
2020-01-04  0.006337
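The example outputs are consistent with a generalized Gaussian window of the form exp(-0.5 * abs(n / sig) ** (2 * p)), evaluated at integer offsets n around the center, restricted to the requested tails, and normalised to sum to one. The sketch below rests on that assumption. The function name is hypothetical and the library may compute the window differently.

```python
import math


def general_gaussian_kernel(window=1, p=1.0, sig=1.0, tails="both"):
    """Assumed kernel: exp(-0.5 * |n / sig| ** (2 * p)) at offsets n around the center."""
    center = (window - 1) / 2
    offsets = [i - center for i in range(window)]  # e.g. -2, -1, 0, 1, 2 for window=5
    raw = [math.exp(-0.5 * abs(n / sig) ** (2 * p)) for n in offsets]
    if tails == "right":
        # Keep only the weights at and after the original date.
        raw = [w if n >= 0 else 0.0 for w, n in zip(raw, offsets)]
    elif tails == "left":
        # Keep only the weights at and before the original date.
        raw = [w if n <= 0 else 0.0 for w, n in zip(raw, offsets)]
    total = sum(raw)
    return [w / total for w in raw]


both = general_gaussian_kernel(window=5, p=1, sig=1)
right = general_gaussian_kernel(window=7, tails="right")
print([round(w, 6) for w in both])
print([round(w, 6) for w in right])
```

The weights in both correspond to the values around 2020-01-01 in the second example. The non-zero weights in right correspond to the values from 2020-01-01 onwards in the third example.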
class skbonus.pandas.time.PowerTrend(power: float = 1.0, frequency: Optional[str] = None, origin_date: Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]] = None)
Bases: skbonus.pandas.time._continuous.BaseContinuousTransformer
Add a power trend column to a pandas dataframe.
For example, it can create a new column with numbers increasing quadratically in the index.
- Parameters
power (float, default=1.0) – Exponent to use for the trend, e.g. linear (power=1.), square root (power=0.5), or cubic (power=3.).
frequency (Optional[str], default=None) – A pandas time frequency. Can take values like "d" for day or "m" for month. A full list can be found on https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases. If None, the transformer tries to infer it during fit time.
origin_date (Optional[Union[str, pd.Timestamp]], default=None) – A date the trend originates from, i.e. the value of the trend column is zero for this date. If None, the transformer uses the smallest date of the training set during fit time.
Examples
>>> import pandas as pd
>>> from skbonus.pandas.time import PowerTrend
>>> df = pd.DataFrame(
...     {"A": ["a", "b", "c", "d"]},
...     index=pd.date_range(start="1988-08-08", periods=4)
... )
>>> PowerTrend(power=2., frequency="d", origin_date="1988-08-06").fit_transform(df)
            A  trend
1988-08-08  a    4.0
1988-08-09  b    9.0
1988-08-10  c   16.0
1988-08-11  d   25.0
fit(X: pandas.core.frame.DataFrame, y: None = None) → skbonus.pandas.time._continuous.PowerTrend
Fit the model.
The point of origin and the frequency are determined here, if not provided during initialization.
- Parameters
X (pd.DataFrame) – Used for inferring the frequency and the origin date, if not provided during initialization.
y (Ignored) – Not used, present here for API consistency by convention.
- Returns
Fitted transformer.
- Return type
PowerTrend
transform(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Add the trend column to the input dataframe.
- Parameters
X (pd.DataFrame) – A pandas dataframe with a DatetimeIndex.
- Returns
The input dataframe with an additional trend column.
- Return type
pd.DataFrame
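The trend values in the PowerTrend example can be reproduced by counting the periods between each date and the origin date and raising that count to power. The following is a minimal sketch for a daily frequency only, using the standard library. The function name power_trend is hypothetical, and other frequencies are not covered.

```python
from datetime import date


def power_trend(dates, origin, power=1.0):
    """Hypothetical helper: (number of days since origin) ** power, daily frequency assumed."""
    return [float((d - origin).days) ** power for d in dates]


dates = [date(1988, 8, 8 + i) for i in range(4)]  # 1988-08-08 to 1988-08-11
trend = power_trend(dates, origin=date(1988, 8, 6), power=2.0)
print(trend)  # [4.0, 9.0, 16.0, 25.0]
```

The first date lies two days after the origin, so its trend value is 2 ** 2 = 4, in line with the example output.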
class skbonus.pandas.time.SimpleTimeFeatures(second: bool = False, minute: bool = False, hour: bool = False, day_of_week: bool = False, day_of_month: bool = False, day_of_year: bool = False, week_of_month: bool = False, week_of_year: bool = False, month: bool = False, year: bool = False)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Enrich pandas dataframes with new columns that are simple derivations from their DatetimeIndex, such as the day of week or the month.
- Parameters
second (bool, default=False) – Whether to extract the second from the index and add it as a new column.
minute (bool, default=False) – Whether to extract the minute from the index and add it as a new column.
hour (bool, default=False) – Whether to extract the hour from the index and add it as a new column.
day_of_week (bool, default=False) – Whether to extract the day of week from the index and add it as a new column.
day_of_month (bool, default=False) – Whether to extract the day of month from the index and add it as a new column.
day_of_year (bool, default=False) – Whether to extract the day of year from the index and add it as a new column.
week_of_month (bool, default=False) – Whether to extract the week of month from the index and add it as a new column.
week_of_year (bool, default=False) – Whether to extract the week of year from the index and add it as a new column.
month (bool, default=False) – Whether to extract the month from the index and add it as a new column.
year (bool, default=False) – Whether to extract the year from the index and add it as a new column.
Examples
>>> import pandas as pd
>>> from skbonus.pandas.time import SimpleTimeFeatures
>>> df = pd.DataFrame(
...     {"A": ["a", "b", "c"]},
...     index=[
...         pd.Timestamp("1988-08-08"),
...         pd.Timestamp("2000-01-01"),
...         pd.Timestamp("1950-12-31"),
...     ])
>>> SimpleTimeFeatures(day_of_month=True, month=True, year=True).fit_transform(df)
            A  day_of_month  month  year
1988-08-08  a             8      8  1988
2000-01-01  b             1      1  2000
1950-12-31  c            31     12  1950
fit(X: pandas.core.frame.DataFrame, y: Optional[Any] = None) → skbonus.pandas.time._simple.SimpleTimeFeatures
Fit the estimator. In this special case, nothing is done.
- Parameters
X (Ignored) – Not used, present here for API consistency by convention.
y (Ignored) – Not used, present here for API consistency by convention.
- Returns
Fitted transformer.
- Return type
SimpleTimeFeatures
transform(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Insert all chosen time features as new columns into the dataframe and output it.
- Parameters
X (pd.DataFrame) – A pandas dataframe with a DatetimeIndex.
- Returns
The input dataframe with additional time feature columns.
- Return type
pd.DataFrame
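Several of these features correspond directly to attributes of a pandas DatetimeIndex. The sketch below reproduces the three columns from the SimpleTimeFeatures example with plain pandas. It illustrates the idea only and is not the library's implementation. Features without a direct index attribute, such as week_of_month, are left out.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["a", "b", "c"]},
    index=pd.to_datetime(["1988-08-08", "2000-01-01", "1950-12-31"]),
)

# Each feature is read from the DatetimeIndex and added as a new column.
features = df.assign(
    day_of_month=df.index.day,
    month=df.index.month,
    year=df.index.year,
)
print(features)
```

The resulting columns should hold the same values as the example output above.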