skbonus.pandas.time package

Module contents

Module for time series utilities with a focus on pandas compatibility.

class skbonus.pandas.time.DateIndicator(name: str, dates: List[str])

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Enrich a pandas dataframe with a new column indicating whether each row falls on a special date.

The new column contains a one for each date listed in the dates parameter and a zero otherwise.

Parameters
  • name (str) – The name of the new column. Usually a holiday name such as Easter, Christmas, Black Friday, …

  • dates (List[str]) – A list containing the dates of the holiday. Every occurrence has to be stated explicitly, e.g. Christmas Eve from 2018 to 2020 can be encoded as ["2018-12-24", "2019-12-24", "2020-12-24"].

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"A": range(7)}, index=pd.date_range(start="2019-12-29", periods=7))
>>> DateIndicator("around_new_year_2020", ["2019-12-31", "2020-01-01", "2020-01-02"]).fit_transform(df)
            A  around_new_year_2020
2019-12-29  0                     0
2019-12-30  1                     0
2019-12-31  2                     1
2020-01-01  3                     1
2020-01-02  4                     1
2020-01-03  5                     0
2020-01-04  6                     0

fit(X: pandas.core.frame.DataFrame, y=None) → skbonus.pandas.time._simple.DateIndicator

Fit the estimator. In this special case, nothing is done.

Parameters
  • X (Ignored) – Not used, present here for API consistency by convention.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns

Fitted transformer.

Return type

DateIndicator

transform(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Add the new date feature to the dataframe.

Parameters

X (pd.DataFrame) – A pandas dataframe with a DatetimeIndex.

Returns

The input dataframe with an additional indicator column named self.name (1 for the specified dates, 0 otherwise).

Return type

pd.DataFrame
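
The indicator logic described above can be sketched with plain pandas. This is a minimal illustration, not skbonus's implementation; the helper name date_indicator is hypothetical, and the sketch assumes the index holds dates without a time-of-day component, since it matches exact timestamps.

```python
import pandas as pd


def date_indicator(df: pd.DataFrame, name: str, dates: list) -> pd.DataFrame:
    """Hypothetical helper: add a 0/1 column marking the given dates."""
    # Parse the date strings so they can be compared with the DatetimeIndex.
    special = pd.to_datetime(dates)
    # isin gives a boolean array; casting to int yields ones and zeros.
    return df.assign(**{name: df.index.isin(special).astype(int)})


df = pd.DataFrame({"A": range(7)}, index=pd.date_range(start="2019-12-29", periods=7))
result = date_indicator(df, "around_new_year_2020", ["2019-12-31", "2020-01-01", "2020-01-02"])
print(result)
```

For an index with a time-of-day component, normalizing it first (for example with df.index.normalize()) would be needed for the dates to match.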

class skbonus.pandas.time.ExponentialDecaySmoother(frequency: Optional[str] = None, window: int = 1, strength: float = 0.0, peak: float = 0.0, exponent: float = 1.0)

Bases: skbonus.pandas.time._continuous.Smoother

Smooth the columns of a data frame by applying a convolution with an exponentially decaying curve.

This class can be used for modelling carryover effects in marketing mix models.

Parameters
  • frequency (Optional[str], default=None) – A pandas time frequency. Can take values like “d” for day or “m” for month. A full list can be found on https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases. If None, the transformer tries to infer it during fit time.

  • window (int, default=1) – Size of the sliding window, i.e. the number of time steps (in units of frequency) over which each value is spread, starting at its original date.

  • strength (float, default=0.0) – Fraction of the effect that is carried over from one time step to the next.

  • peak (float, default=0.0) – Number of time steps after the original date at which the carryover effect peaks.

  • exponent (float, default=1.0) – Further widens or narrows the carryover curve. A value of 1.0 yields a standard exponential decay; values larger than 1.0 yield a super-exponential decay.

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"A": [0, 0, 0, 1, 0, 0, 0]}, index=pd.date_range(start="2019-12-29", periods=7))
>>> ExponentialDecaySmoother().fit_transform(df)
              A
2019-12-29  0.0
2019-12-30  0.0
2019-12-31  0.0
2020-01-01  1.0
2020-01-02  0.0
2020-01-03  0.0
2020-01-04  0.0
>>> ExponentialDecaySmoother(frequency="d", window=3, strength=0.5).fit_transform(df)
                   A
2019-12-29  0.000000
2019-12-30  0.000000
2019-12-31  0.000000
2020-01-01  0.571429
2020-01-02  0.285714
2020-01-03  0.142857
2020-01-04  0.000000
>>> ExponentialDecaySmoother(window=3, strength=0.5, peak=1).fit_transform(df)
               A
2019-12-29  0.00
2019-12-30  0.00
2019-12-31  0.00
2020-01-01  0.25
2020-01-02  0.50
2020-01-03  0.25
2020-01-04  0.00
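
The following is a minimal NumPy sketch of how such a decay kernel could be built and applied. It is not skbonus's implementation: the weight formula strength ** (|t - peak| ** exponent) for t = 0, ..., window - 1, normalized to sum to one and applied forward in time, is an assumption chosen because it reproduces the example outputs above. The helper names are hypothetical, and the sketch ignores frequency by assuming the index is already regular.

```python
import numpy as np
import pandas as pd


def exponential_decay_weights(window: int, strength: float, peak: float, exponent: float) -> np.ndarray:
    # Offsets 0, 1, ..., window - 1 time steps after the original date.
    t = np.arange(window)
    # The weight decays with the distance from the peak; exponent > 1 decays faster.
    w = strength ** (np.abs(t - peak) ** exponent)
    # Normalize so that the total effect of each value is preserved.
    return w / w.sum()


def exponential_decay_smooth(series: pd.Series, window=1, strength=0.0, peak=0.0, exponent=1.0) -> pd.Series:
    w = exponential_decay_weights(window, strength, peak, exponent)
    # The full convolution spreads each value forward in time; cut it to the input length.
    smoothed = np.convolve(series.to_numpy(dtype=float), w)[: len(series)]
    return pd.Series(smoothed, index=series.index, name=series.name)


df = pd.DataFrame({"A": [0, 0, 0, 1, 0, 0, 0]}, index=pd.date_range(start="2019-12-29", periods=7))
print(df.apply(exponential_decay_smooth, window=3, strength=0.5))
```

With window=3 and strength=0.5 the unnormalized weights are 1, 0.5 and 0.25, which sum to 1.75; dividing gives 0.571429, 0.285714 and 0.142857.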

class skbonus.pandas.time.GeneralGaussianSmoother(frequency: Optional[str] = None, window: int = 1, p: float = 1, sig: float = 1, tails: str = 'both')

Bases: skbonus.pandas.time._continuous.Smoother

Smooth the columns of a data frame by applying a convolution with a generalized Gaussian curve.

Parameters
  • frequency (Optional[str], default=None) – A pandas time frequency. Can take values like “d” for day or “m” for month. A full list can be found on https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases. If None, the transformer tries to infer it during fit time.

  • window (int, default=1) – Size of the sliding window. The effect of each value reaches from approximately date - window/2 * frequency to date + window/2 * frequency, i.e. the window is centered around the original date.

  • p (float, default=1) – Parameter for the shape of the curve. p=1 yields a typical Gaussian curve while p=0.5 yields a Laplace curve, for example.

  • sig (float, default=1) – Parameter for the standard deviation of the bell-shaped curve.

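  • tails (str, default="both") –

    Which tails of the curve to use. Can be one of

    • "left": spread the effect only to the original and earlier dates

    • "right": spread the effect only to the original and later dates

    • "both": spread the effect symmetrically in both directions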

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"A": [0, 0, 0, 1, 0, 0, 0]}, index=pd.date_range(start="2019-12-29", periods=7))
>>> GeneralGaussianSmoother().fit_transform(df)
              A
2019-12-29  0.0
2019-12-30  0.0
2019-12-31  0.0
2020-01-01  1.0
2020-01-02  0.0
2020-01-03  0.0
2020-01-04  0.0
>>> GeneralGaussianSmoother(frequency="d", window=5, p=1, sig=1).fit_transform(df)
                   A
2019-12-29  0.000000
2019-12-30  0.054489
2019-12-31  0.244201
2020-01-01  0.402620
2020-01-02  0.244201
2020-01-03  0.054489
2020-01-04  0.000000
>>> GeneralGaussianSmoother(window=7, tails="right").fit_transform(df)
                   A
2019-12-29  0.000000
2019-12-30  0.000000
2019-12-31  0.000000
2020-01-01  0.570459
2020-01-02  0.346001
2020-01-03  0.077203
2020-01-04  0.006337
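
A minimal sketch of a centered generalized Gaussian kernel follows. It is an assumption, not skbonus's code: the weights exp(-0.5 * |n / sig| ** (2 * p)) over symmetric offsets n (the same shape as scipy.signal.windows.general_gaussian), with one half zeroed out for a single tail and renormalized, are used because they reproduce the example outputs above. Helper names are hypothetical, and frequency handling is omitted.

```python
import numpy as np
import pandas as pd


def general_gaussian_weights(window: int, p: float = 1, sig: float = 1, tails: str = "both") -> np.ndarray:
    # Offsets centered on the original date, e.g. -2, ..., 2 for window=5.
    n = np.arange(window) - (window - 1) / 2
    # Generalized Gaussian shape: p=1 is Gaussian, p=0.5 is Laplace-like.
    w = np.exp(-0.5 * np.abs(n / sig) ** (2 * p))
    if tails == "left":
        w[n > 0] = 0.0  # keep the center and earlier dates only
    elif tails == "right":
        w[n < 0] = 0.0  # keep the center and later dates only
    return w / w.sum()


def general_gaussian_smooth(series: pd.Series, window=1, p=1, sig=1, tails="both") -> pd.Series:
    w = general_gaussian_weights(window, p, sig, tails)
    x = series.to_numpy(dtype=float)
    # Full convolution, then slice so that the kernel center lines up with each value.
    center = (window - 1) // 2
    smoothed = np.convolve(x, w)[center : center + len(x)]
    return pd.Series(smoothed, index=series.index, name=series.name)


df = pd.DataFrame({"A": [0, 0, 0, 1, 0, 0, 0]}, index=pd.date_range(start="2019-12-29", periods=7))
print(df.apply(general_gaussian_smooth, window=5, p=1, sig=1))
```

Slicing the full convolution instead of using mode="same" keeps the output length equal to the input length even when the window is longer than the series.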

class skbonus.pandas.time.PowerTrend(power: float = 1.0, frequency: Optional[str] = None, origin_date: Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]] = None)

Bases: skbonus.pandas.time._continuous.BaseContinuousTransformer

Add a power trend column to a pandas dataframe.

For example, it can create a new column with numbers increasing quadratically with the index.

Parameters
  • power (float, default=1.0) – Exponent to use for the trend, e.g. linear (power=1.), square root (power=0.5), or cubic (power=3.).

  • frequency (Optional[str], default=None) – A pandas time frequency. Can take values like “d” for day or “m” for month. A full list can be found on https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases. If None, the transformer tries to infer it during fit time.

  • origin_date (Optional[Union[str, pd.Timestamp]], default=None) – A date the trend originates from, i.e. the value of the trend column is zero for this date. If None, the transformer uses the smallest date of the training set during fit time.

Examples

>>> import pandas as pd
>>> df = pd.DataFrame(
...     {"A": ["a", "b", "c", "d"]},
...     index=pd.date_range(start="1988-08-08", periods=4)
... )
>>> PowerTrend(power=2., frequency="d", origin_date="1988-08-06").fit_transform(df)
            A  trend
1988-08-08  a    4.0
1988-08-09  b    9.0
1988-08-10  c   16.0
1988-08-11  d   25.0

fit(X: pandas.core.frame.DataFrame, y: None = None) → skbonus.pandas.time._continuous.PowerTrend

Fit the model.

The origin date and the frequency are inferred here if they were not provided during initialization.

Parameters
  • X (pd.DataFrame) – Used for inferring the frequency and the origin date, if not provided during initialization.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns

Fitted transformer.

Return type

PowerTrend

transform(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Add the trend column to the input dataframe.

Parameters

X (pd.DataFrame) – A pandas dataframe with a DatetimeIndex.

Returns

The input dataframe with an additional trend column.

Return type

pd.DataFrame
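
The trend computation described above, including the documented fallbacks (infer the frequency, use the smallest date as origin), can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper power_trend is hypothetical, and it only supports fixed-length frequencies such as "D" or "h" that pd.to_timedelta accepts as a unit; calendar frequencies like months have no fixed length and are not handled.

```python
import pandas as pd


def power_trend(df: pd.DataFrame, power=1.0, frequency=None, origin_date=None) -> pd.DataFrame:
    """Hypothetical helper: add a 'trend' column equal to (steps since origin) ** power."""
    # Fallbacks mirroring the documented fit behaviour.
    if frequency is None:
        frequency = pd.infer_freq(df.index)
    origin = df.index.min() if origin_date is None else pd.Timestamp(origin_date)
    # Distance from the origin, measured in units of the frequency.
    steps = (df.index - origin) / pd.to_timedelta(1, unit=frequency)
    return df.assign(trend=steps ** power)


df = pd.DataFrame({"A": ["a", "b", "c", "d"]}, index=pd.date_range(start="1988-08-08", periods=4))
print(power_trend(df, power=2.0, frequency="D", origin_date="1988-08-06"))
```

Here the first row lies two days after the origin, so its trend value is 2 ** 2 = 4.0, matching the class example.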

class skbonus.pandas.time.SimpleTimeFeatures(second: bool = False, minute: bool = False, hour: bool = False, day_of_week: bool = False, day_of_month: bool = False, day_of_year: bool = False, week_of_month: bool = False, week_of_year: bool = False, month: bool = False, year: bool = False)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Enrich pandas dataframes with new columns that are easily derived from their DatetimeIndex, such as the day of week or the month.

Parameters
  • second (bool, default=False) – Whether to extract the second from the index and add it as a new column.

  • minute (bool, default=False) – Whether to extract the minute from the index and add it as a new column.

  • hour (bool, default=False) – Whether to extract the hour from the index and add it as a new column.

  • day_of_week (bool, default=False) – Whether to extract the day of week from the index and add it as a new column.

  • day_of_month (bool, default=False) – Whether to extract the day of month from the index and add it as a new column.

  • day_of_year (bool, default=False) – Whether to extract the day of year from the index and add it as a new column.

  • week_of_month (bool, default=False) – Whether to extract the week of month from the index and add it as a new column.

  • week_of_year (bool, default=False) – Whether to extract the week of year from the index and add it as a new column.

  • month (bool, default=False) – Whether to extract the month from the index and add it as a new column.

  • year (bool, default=False) – Whether to extract the year from the index and add it as a new column.

Examples

>>> import pandas as pd
>>> df = pd.DataFrame(
...     {"A": ["a", "b", "c"]},
...     index=[
...         pd.Timestamp("1988-08-08"),
...         pd.Timestamp("2000-01-01"),
...         pd.Timestamp("1950-12-31"),
...     ])
>>> SimpleTimeFeatures(day_of_month=True, month=True, year=True).fit_transform(df)
            A  day_of_month  month  year
1988-08-08  a             8      8  1988
2000-01-01  b             1      1  2000
1950-12-31  c            31     12  1950

fit(X: pandas.core.frame.DataFrame, y: Optional[Any] = None) → skbonus.pandas.time._simple.SimpleTimeFeatures

Fit the estimator.

In this special case, nothing is done.

Parameters
  • X (Ignored) – Not used, present here for API consistency by convention.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns

Fitted transformer.

Return type

SimpleTimeFeatures

transform(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Add all chosen time features as new columns to the dataframe and return it.

Parameters

X (pd.DataFrame) – A pandas dataframe with a DatetimeIndex.

Returns

The input dataframe with additional time feature columns.

Return type

pd.DataFrame
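
Most of these features map directly onto DatetimeIndex attributes. The sketch below is a hypothetical helper, not skbonus's implementation; in particular, the week-of-month rule (days 1 to 7 form week 1, days 8 to 14 week 2, and so on) and the use of ISO 8601 week numbers for week_of_year are assumptions, since the documentation does not define them.

```python
import pandas as pd


def simple_time_features(df: pd.DataFrame, **flags: bool) -> pd.DataFrame:
    """Hypothetical helper: add the time features whose flag is True."""
    idx = pd.DatetimeIndex(df.index)
    # Lambdas keep the extraction lazy, so only requested features are computed.
    extractors = {
        "second": lambda: idx.second,
        "minute": lambda: idx.minute,
        "hour": lambda: idx.hour,
        "day_of_week": lambda: idx.dayofweek,  # Monday=0, ..., Sunday=6
        "day_of_month": lambda: idx.day,
        "day_of_year": lambda: idx.dayofyear,
        # Assumption: 7-day blocks counted from the 1st of the month.
        "week_of_month": lambda: (idx.day - 1) // 7 + 1,
        # Assumption: ISO 8601 week number, converted to a plain integer array.
        "week_of_year": lambda: idx.isocalendar().week.astype(int).to_numpy(),
        "month": lambda: idx.month,
        "year": lambda: idx.year,
    }
    unknown = set(flags) - set(extractors)
    if unknown:
        raise ValueError(f"Unknown time features: {sorted(unknown)}")
    # Columns are added in the order of the extractor dict.
    new_columns = {name: get() for name, get in extractors.items() if flags.get(name)}
    return df.assign(**new_columns)


df = pd.DataFrame(
    {"A": ["a", "b", "c"]},
    index=[pd.Timestamp("1988-08-08"), pd.Timestamp("2000-01-01"), pd.Timestamp("1950-12-31")],
)
print(simple_time_features(df, day_of_month=True, month=True, year=True))
```

Raising on unknown flag names is a design choice of this sketch to catch typos such as day_of_weak early.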