
Probabilistic Forecasting with sktime#

originally presented at PyData Berlin 2022; see there for the video presentation

Overview of this notebook#

  • quick start - probabilistic forecasting

  • disambiguation - types of probabilistic forecasts

  • details: probabilistic forecasting interfaces

  • metrics for, and evaluation of probabilistic forecasts

  • advanced composition: pipelines, tuning, reduction

  • extender guide

  • contributor credits

[ ]:
import warnings

warnings.filterwarnings("ignore")

Quick Start - Probabilistic Forecasting with sktime#

… works exactly like the basic forecasting workflow: simply replace predict by a probabilistic prediction method!

[ ]:
from sktime.datasets import load_airline
from sktime.forecasting.arima import ARIMA

# step 1: data specification
y = load_airline()
# step 2: specifying forecasting horizon
fh = [1, 2, 3]
# step 3: specifying the forecasting algorithm
forecaster = ARIMA()
# step 4: fitting the forecaster
forecaster.fit(y, fh=fh)
# step 5: querying predictions
y_pred = forecaster.predict()

# for probabilistic forecasting:
#   call a probabilistic forecasting method after or instead of step 5
y_pred_int = forecaster.predict_interval(coverage=0.9)
y_pred_int

probabilistic forecasting methods in ``sktime``:

  • forecast intervals - predict_interval(fh=None, X=None, coverage=0.90)

  • forecast quantiles - predict_quantiles(fh=None, X=None, alpha=[0.05, 0.95])

  • forecast variance - predict_var(fh=None, X=None, cov=False)

  • distribution forecast - predict_proba(fh=None, X=None, marginal=True)
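
For example, continuing the quick start above (forecaster is the ARIMA fitted in the previous cell), the other probabilistic methods can be called in the same way; a minimal sketch:

[ ]:
# quantile forecasts at the 5%, 50% and 95% quantile points
y_pred_quantiles = forecaster.predict_quantiles(alpha=[0.05, 0.5, 0.95])

# variance forecasts
y_pred_var = forecaster.predict_var()

y_pred_quantiles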

To check which forecasters in sktime support probabilistic forecasting, use the registry.all_estimators utility and search for estimators which have the capability:pred_int tag (value True).

For composites such as pipelines, a positive tag means that logic is implemented if (some or all) components support it.

[ ]:
from sktime.registry import all_estimators

all_estimators(
    "forecaster", filter_tags={"capability:pred_int": True}, as_dataframe=True
)

What is probabilistic forecasting?#

Intuition#

  • produce low/high scenarios of forecasts

  • quantify uncertainty around forecasts

  • produce expected range of variation of forecasts

Interface view#

Want to produce “distribution” or “range” of forecast values,

at time stamps defined by forecasting horizon fh

given past data y (series), and possibly exogeneous data X

Input, to fit or predict: fh, y, X

Output, from a probabilistic predict method: some “distribution” or “range” object

Big caveat: there are multiple possible ways to model “distribution” or “range”!

Multiple such notions are used in practice and are easily confused (and, in practice, they often are confused)!

Formal view (endogeneous, one forecast time stamp)#

Let \(y(t_1), \dots, y(t_n)\) be observations at fixed time stamps \(t_1, \dots, t_n\).

(we consider \(y\) as an \(\mathbb{R}^n\)-valued random variable)

Let \(y'\) be a (true) value, which will be observed at a future time stamp \(\tau\).

(we consider \(y'\) as an \(\mathbb{R}\)-valued random variable)

We have the following “types of forecasts” of \(y'\):

| Name | param | prediction/estimate of | sktime |
|---|---|---|---|
| point forecast |  | conditional expectation \(\mathbb{E}[y'\|y]\) | predict |
| variance forecast |  | conditional variance \(Var[y'\|y]\) | predict_var |
| quantile forecast | \(\alpha\in (0,1)\) | \(\alpha\)-quantile of \(y'\|y\) | predict_quantiles |
| interval forecast | \(c\in (0,1)\) | \([a,b]\) s.t. \(P(a\le y' \le b\| y) = c\) | predict_interval |
| distribution forecast |  | the law/distribution of \(y'\|y\) | predict_proba |

Notes:

  • different forecasters have different capabilities!

  • metrics, evaluation & tuning are different by “type of forecast”

  • compositors can “add” type of forecast! Example: bootstrap

More formal details & intuition:#

  • a “point forecast” is a prediction/estimate of the conditional expectation \(\mathbb{E}[y'|y]\). Intuition: “out of many repetitions/worlds, this value is the arithmetic average of all observations”.

  • a “variance forecast” is a prediction/estimate of the conditional variance \(Var[y'|y]\). Intuition: “out of many repetitions/worlds, this value is the average squared distance of the observation to the perfect point forecast”.

  • a “quantile forecast”, at quantile point \(\alpha\in (0,1)\) is a prediction/estimate of the \(\alpha\)-quantile of \(y'|y\), i.e., of \(F^{-1}_{y'|y}(\alpha)\), where \(F^{-1}\) is the (generalized) inverse cdf = quantile function of the random variable y’|y. Intuition: “out of many repetitions/worlds, a fraction of exactly \(\alpha\) will have equal or smaller than this value.”

  • an “interval forecast” or “predictive interval” with (symmetric) coverage \(c\in (0,1)\) is a prediction/estimate pair of lower bound \(a\) and upper bound \(b\) such that \(P(a\le y' \le b| y) = c\) and \(P(y' > b| y) = P(y' < a| y) = (1 - c)/2\). Intuition: “out of many repetitions/worlds, a fraction of exactly \(c\) will be contained in the interval \([a,b]\), and being above is equally likely as being below”.

  • a “distribution forecast” or “full probabilistic forecast” is a prediction/estimate of the distribution of \(y'|y\), e.g., “it’s a normal distribution with mean 42 and variance 1”. Intuition: exhaustive description of the generating mechanism of many repetitions/worlds.
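
To make the definitions concrete, consider the normal distribution forecast with mean 42 and variance 1 from the example above; the other forecast types then follow directly. A minimal sketch using scipy (not sktime API):

[ ]:
from scipy.stats import norm

# hypothetical distribution forecast: y'|y ~ N(42, 1)
dist = norm(loc=42, scale=1)

point_forecast = dist.mean()  # conditional expectation: 42
variance_forecast = dist.var()  # conditional variance: 1
quantile_090 = dist.ppf(0.9)  # 0.9-quantile forecast
interval_90 = dist.ppf([0.05, 0.95])  # symmetric 90% interval forecast [a, b]

point_forecast, variance_forecast, quantile_090, interval_90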

Notes:

  • lower/upper of interval forecasts are quantile forecasts at quantile points \(0.5 - c/2\) and \(0.5 + c/2\) (as long as forecast distributions are absolutely continuous).

  • all other forecasts can be obtained from a full probabilistic forecast; a full probabilistic forecast can be obtained from all quantile forecasts or all interval forecasts.

  • there is no exact relation between the other types of forecasts (point or variance vs quantile)

  • in particular, the point forecast does not need to be the median forecast, aka the 0.5-quantile forecast. It can coincide with the \(\alpha\)-quantile for any \(\alpha\)!

Frequent confusions in the literature & in python packages:

  • coverage \(c\) vs quantile point \(\alpha\)

  • coverage \(c\) vs significance \(p = 1-c\)

  • quantile point of the lower interval bound, \(p/2\), vs \(p\)

  • interval forecasts vs related, but substantially different, concepts: confidence interval on the predictive mean; Bayesian posterior or credibility interval of the predictive mean

  • all of the above forecasts can be Bayesian; common confusion: “posteriors are different” or “have to be evaluated differently”
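
A quick way to keep coverage \(c\) and quantile points \(\alpha\) apart in sktime: for a symmetric interval at coverage \(c\), predict_interval returns the same numbers as predict_quantiles at the quantile points \((1-c)/2\) and \((1+c)/2\). A minimal check (any forecaster with the capability:pred_int tag works here, e.g., ThetaForecaster):

[ ]:
import numpy as np

from sktime.datasets import load_airline
from sktime.forecasting.theta import ThetaForecaster

y = load_airline()
forecaster = ThetaForecaster(sp=12)
forecaster.fit(y, fh=[1, 2, 3])

# coverage 0.9 corresponds to the 0.05- and 0.95-quantile points
pred_int = forecaster.predict_interval(coverage=0.9)
pred_q = forecaster.predict_quantiles(alpha=[0.05, 0.95])

# same numbers, different column labelling
np.allclose(pred_int.values, pred_q.values)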


Probabilistic forecasting interfaces in sktime#

This section:

  • walkthrough of probabilistic predict methods

  • use in update/predict workflow

  • multivariate and hierarchical data

All forecasters with tag capability:pred_int provide the following:

  • forecast intervals - predict_interval(fh=None, X=None, coverage=0.90)

  • forecast quantiles - predict_quantiles(fh=None, X=None, alpha=[0.05, 0.95])

  • forecast variance - predict_var(fh=None, X=None, cov=False)

  • distribution forecast - predict_proba(fh=None, X=None, marginal=True)

Generalities:

  • methods do not change state, multiple can be called

  • fh is optional in these methods if it was already passed in fit; otherwise, it is passed late, in the predict call

  • exogeneous data X can be passed

[ ]:
import numpy as np

from sktime.datasets import load_airline
from sktime.forecasting.theta import ThetaForecaster

# until fit, identical with the simple workflow
y = load_airline()

fh = np.arange(1, 13)

forecaster = ThetaForecaster(sp=12)
forecaster.fit(y, fh=fh)
predict_interval produces interval forecasts. Inputs:

  • fh - forecasting horizon (not necessary if seen in fit)

  • coverage, float or list of floats, default=0.9 - nominal coverage(s) of the prediction interval(s) queried

Output: pandas.DataFrame

  • row index is fh

  • column index is a multi-index:

    • 1st level = variable name from y in fit

    • 2nd level = coverage fractions in coverage

    • 3rd level = string “lower” or “upper”

  • entries = forecasts of the lower/upper interval bound, at nominal coverage in 2nd level, for the variable in 1st level, for the time in the row index

[ ]:
coverage = 0.9
y_pred_ints = forecaster.predict_interval(coverage=coverage)
y_pred_ints

pretty-plotting the predictive interval forecasts:

[ ]:
from sktime.utils import plotting

# also requires predictions
y_pred = forecaster.predict()

# variable name of y = 1st level of the interval forecast column index
var_name = y_pred_ints.columns.get_level_values(0)[0]

fig, ax = plotting.plot_series(y, y_pred, labels=["y", "y_pred"])
ax.fill_between(
    ax.get_lines()[-1].get_xdata(),
    y_pred_ints[(var_name, coverage, "lower")],
    y_pred_ints[(var_name, coverage, "upper")],
    alpha=0.2,
    color=ax.get_lines()[-1].get_c(),
    label=f"{coverage} cov.pred.intervals",
)
ax.legend();

multiple coverages:

[ ]:
coverage = [0.5, 0.9, 0.95]
y_pred_ints = forecaster.predict_interval(coverage=coverage)
y_pred_ints
predict_quantiles produces quantile forecasts. Inputs:

  • fh - forecasting horizon (not necessary if seen in fit)

  • alpha, float or list of floats, default=[0.05, 0.95] - quantile points at which quantiles are queried

Output: pandas.DataFrame

  • row index is fh

  • column index is a multi-index:

    • 1st level = variable name from y in fit

    • 2nd level = quantile points in alpha

  • entries = quantile forecasts at the quantile point in 2nd level, for the variable in 1st level, for the time in the row index

[ ]:
alpha = [0.1, 0.25, 0.5, 0.75, 0.9]
y_pred_quantiles = forecaster.predict_quantiles(alpha=alpha)
y_pred_quantiles

pretty-plotting the quantile forecasts:

[ ]:
from sktime.utils import plotting

_, columns = zip(*y_pred_quantiles.items())
fig, ax = plotting.plot_series(y[-50:], *columns)
predict_var produces variance forecasts. Inputs:

  • fh - forecasting horizon (not necessary if seen in fit)

  • cov, boolean, default=False - whether covariance forecasts should also be returned (not all estimators support this)

Output, for cov=False: pandas.DataFrame

  • row index is fh

  • columns are equal to the column index of y (variables)

  • entries = variance forecast for the variable in the column, for the time in the row index

[ ]:
y_pred_variance = forecaster.predict_var()
y_pred_variance

with covariance, using a forecaster which can return covariance forecasts:

the return for cov=True is a pandas.DataFrame with fh indexing both rows and columns;
entries are the forecast covariance between the row time and the column time
(the diagonal contains the forecast variances)
[ ]:
from sktime.forecasting.naive import NaiveVariance

forecaster_with_covariance = NaiveVariance(forecaster)
forecaster_with_covariance.fit(y=y, fh=fh)
forecaster_with_covariance.predict_var(cov=True)
predict_proba produces full distribution forecasts. Inputs:

  • fh - forecasting horizon (not necessary if seen in fit)

  • marginal, bool, optional, default=True - whether the returned distribution is marginal over time points (True), or joint over time points (False); not all forecasters support marginal=False

Output: tensorflow-probability Distribution object (requires tensorflow installed)

  • if marginal=True: batch shape is 1D of length len(fh) (time); event shape is 1D of length len(y.columns) (variables)

  • if marginal=False: event shape is 2D, [len(fh), len(y.columns)]
[ ]:
y_pred_dist = forecaster.predict_proba()
y_pred_dist
[ ]:
# obtaining quantiles
y_pred_dist.quantile([0.1, 0.9])
[ ]:
# obtaining distribution parameters

y_pred_dist.parameters

Outputs of predict_interval, predict_quantiles, predict_var, predict_proba are typically but not necessarily consistent with each other!

Consistency is a weak interface requirement, but it is not strictly enforced.

Using probabilistic forecasts with update/predict stream workflow#

Example:

  • data observed monthly

  • make probabilistic forecasts for an entire year ahead

  • update forecasts every month

  • start in Dec 1950

[ ]:
# 1949 and 1950
y_start = y[:24]
# Jan 1951 etc
y_update_batch_1 = y.loc[["1951-01"]]
y_update_batch_2 = y.loc[["1951-02"]]
y_update_batch_3 = y.loc[["1951-03"]]
[ ]:
# now = Dec 1950

# 1a. fit to data available in Dec 1950
#   fh = [1, 2, ..., 12] for all 12 months ahead
forecaster.fit(y_start, fh=1 + np.arange(12))

# 1b. predict 1951, in Dec 1950
forecaster.predict_interval()
# or other proba predict functions
[ ]:
# time passes, now = Jan 1951

# 2a. update forecaster with new data
forecaster.update(y_update_batch_1)

# 2b. make new prediction - year ahead = Feb 1951 to Jan 1952
forecaster.predict_interval()
# forecaster remembers relative forecasting horizon

repeat the same commands with further data batches:

[ ]:
# time passes, now = Feb 1951

# 3a. update forecaster with new data
forecaster.update(y_update_batch_2)

# 3b. make new prediction - year ahead = Mar 1951 to Feb 1952
forecaster.predict_interval()
[ ]:
# time passes, now = Mar 1951

# 4a. update forecaster with new data
forecaster.update(y_update_batch_3)

# 4b. make new prediction - year ahead = Apr 1951 to Mar 1952
forecaster.predict_interval()

… and so on.

[ ]:
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.utils.plotting import plot_windows

cv = ExpandingWindowSplitter(step_length=1, fh=fh, initial_window=24)
plot_windows(cv, y.iloc[:50])

Probabilistic forecasting for multivariate and hierarchical data#

multivariate data: the first column index level distinguishes the different variables

[ ]:
from sktime.datasets import load_longley
from sktime.forecasting.var import VAR

_, y = load_longley()

mv_forecaster = VAR()

mv_forecaster.fit(y, fh=[1, 2, 3])
# mv_forecaster.predict_var()

hierarchical data: probabilistic forecasts per level are row-concatenated with a row hierarchy index

[ ]:
from sktime.forecasting.arima import ARIMA
from sktime.utils._testing.hierarchical import _make_hierarchical

y_hier = _make_hierarchical()
y_hier
[ ]:
forecaster = ARIMA()
forecaster.fit(y_hier, fh=[1, 2, 3])
forecaster.predict_interval()

(more about this in the hierarchical forecasting notebook)


Metrics for probabilistic forecasts and evaluation#

overview - theory#

Predicted y has a different form from the true y, so probabilistic metrics have the form

metric(y_true: series, y_pred: proba_prediction) -> float

where proba_prediction is the specific “probabilistic prediction type”.

I.e., we have the following function signature for a loss/metric \(L\):

| Name | param | prediction/estimate of | general form |
|---|---|---|---|
| point forecast |  | conditional expectation \(\mathbb{E}[y'\|y]\) | metric(y_true, y_pred) |
| variance forecast |  | conditional variance \(Var[y'\|y]\) | metric(y_true, y_pt, y_var) (requires the point forecast y_pt too) |
| quantile forecast | \(\alpha\in (0,1)\) | \(\alpha\)-quantile of \(y'\|y\) | metric(y_true, y_quantiles, alpha) |
| interval forecast | \(c\in (0,1)\) | \([a,b]\) s.t. \(P(a\le y' \le b\| y) = c\) | metric(y_true, y_interval, c) |
| distribution forecast |  | the law/distribution of \(y'\|y\) | metric(y_true, y_distribution) |

metrics: general signature and averaging#

intro using the example of the quantile loss aka interval loss aka pinball loss, in the univariate case.

For one quantile value \(\alpha\), the (per-sample) pinball loss function is defined as
\(L_{\alpha}(\widehat{y}, y) := \alpha \cdot (y - \widehat{y}) \cdot \Theta (y - \widehat{y}) + (1-\alpha) \cdot (\widehat{y} - y) \cdot \Theta (\widehat{y} - y)\),
where \(\Theta (x) := [1\) if \(x\ge 0\) and \(0\) otherwise \(]\) is the Heaviside function.

This can be used to evaluate:

  • multiple quantile forecasts \(\widehat{\bf y}:=\widehat{y}_1, \dots, \widehat{y}_k\) for quantiles \(\bm{\alpha} = \alpha_1,\dots, \alpha_k\) via \(L_{\bm{\alpha}}(\widehat{\bf y}, y) := \frac{1}{k}\sum_{i=1}^k L_{\alpha_i}(\widehat{y}_i, y)\)

  • interval forecasts \([\widehat{a}, \widehat{b}]\) at symmetric coverage \(c\) via \(L_c([\widehat{a},\widehat{b}], y) := \frac{1}{2} L_{\alpha_{low}}(\widehat{a}, y) + \frac{1}{2}L_{\alpha_{high}}(\widehat{b}, y)\) where \(\alpha_{low} = \frac{1-c}{2}, \alpha_{high} = \frac{1+c}{2}\)

(all are known to be strictly proper losses for their respective prediction object)
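
As a worked illustration of the per-sample formula, here is a minimal standalone numpy sketch of the pinball loss (for illustration only; sktime's implementation is used further below):

[ ]:
import numpy as np


def pinball_loss(y_hat, y, alpha):
    """Per-sample pinball loss L_alpha(y_hat, y), in the Heaviside form above."""
    pos = np.heaviside(y - y_hat, 1.0)  # 1 if y >= y_hat, else 0
    neg = np.heaviside(y_hat - y, 1.0)  # 1 if y_hat >= y, else 0
    return alpha * (y - y_hat) * pos + (1 - alpha) * (y_hat - y) * neg


# example: true value 10, forecast of the 0.9-quantile is 8 -> loss = 0.9 * 2 = 1.8
pinball_loss(8.0, 10.0, alpha=0.9)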

There are three things we can choose to average over:

  • quantile values, if multiple are predicted - elements of alpha in predict_quantiles(fh, alpha)

  • time stamps in the forecasting horizon fh - elements of fh in fit(fh) resp. predict_quantiles(fh, alpha)

  • variables in y, in case of multivariate (later, first we look at univariate)

We will show quantile values and time stamps first:

  1. averaging by fh time stamps only -> one number per quantile value in alpha

  2. averaging over nothing -> one number per quantile value in alpha and fh time stamp

  3. averaging over both fh and quantile values in alpha -> one number

  4. averaging over quantile values in alpha only -> one number per fh time stamp

First, we generate some quantile predictions. pred_quantiles will then contain the quantile forecasts:
formally, forecasts \(\widehat{y}_j(t_i)\), where \(\widehat{y}_j\) is the forecast at quantile point \(\alpha_j\), with range \(i=1\dots N, j=1\dots k\);
the \(\alpha_j\) are the elements of alpha, and the \(t_i\) are the future time stamps indexed by fh.
[ ]:
import numpy as np

from sktime.datasets import load_airline
from sktime.forecasting.theta import ThetaForecaster

y_train = load_airline()[0:24]  # train on 24 months, 1949 and 1950
y_test = load_airline()[24:36]  # ground truth for 12 months in 1951

# try to forecast 12 months ahead, from y_train
fh = np.arange(1, 13)

forecaster = ThetaForecaster(sp=12)
forecaster.fit(y_train, fh=fh)

pred_quantiles = forecaster.predict_quantiles(alpha=[0.1, 0.25, 0.5, 0.75, 0.9])
pred_quantiles
  1. computing the loss by quantile point (or interval end), averaged over fh time stamps, i.e., \(\frac{1}{N} \sum_{i=1}^N L_{\alpha}(\widehat{y}(t_i), y(t_i))\) for the \(t_i\) in fh and every \(\alpha\) in alpha; this is one number per quantile value in alpha

[ ]:
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss

loss = PinballLoss(score_average=False)
loss(y_true=y_test, y_pred=pred_quantiles)
  2. computing the individual loss values by sample, with no averaging, i.e., \(L_{\alpha}(\widehat{y}(t_i), y(t_i))\) for every \(t_i\) in fh and every \(\alpha\) in alpha; this is one number per quantile value \(\alpha\) in alpha and time point \(t_i\) in fh

[ ]:
loss.evaluate_by_index(y_true=y_test, y_pred=pred_quantiles)
  3. computing the loss for a multiple quantile forecast, averaged over fh time stamps and quantile values in alpha, i.e., \(\frac{1}{Nk} \sum_{j=1}^k\sum_{i=1}^N L_{\alpha_j}(\widehat{y}_j(t_i), y(t_i))\) for \(t_i\) in fh and quantile values \(\alpha_j\); this is a single number that can be used in tuning (e.g., grid search) or for overall evaluation

[ ]:
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss

loss_multi = PinballLoss(score_average=True)
loss_multi(y_true=y_test, y_pred=pred_quantiles)
  4. computing the loss for a multiple quantile forecast, averaged over quantile values in alpha, for individual time stamps, i.e., \(\frac{1}{k} \sum_{j=1}^k L_{\alpha_j}(\widehat{y}_j(t_i), y(t_i))\) for each \(t_i\) in fh, over quantile values \(\alpha_j\); this is a univariate time series at the fh times \(t_i\); it can be used for tuning or evaluation by horizon index

[ ]:
loss_multi.evaluate_by_index(y_true=y_test, y_pred=pred_quantiles)

Question: why is score_average a constructor flag, and evaluate_by_index a method?

  • not all losses are “by index”, so evaluate_by_index logic can vary (e.g., pseudo-samples)

  • constructor args define the “mathematical object” of scientific signature: series -> non-temporal object

  • methods define the action or “way to apply”, e.g., as used in tuning or reporting

Compare score_average to multioutput arg in scikit-learn metrics and sktime.

metrics: interval vs quantile metrics#

Interval and quantile metrics can be used interchangeably:

internally, these are easily convertible to each other;
recall: lower/upper interval bounds are the quantiles at quantile points \(\frac{1}{2} - \frac{c}{2}\) and \(\frac{1}{2} + \frac{c}{2}\), for coverage \(c\)
[ ]:
pred_interval = forecaster.predict_interval(coverage=0.8)
pred_interval

loss object recognizes input type automatically and computes corresponding interval loss

[ ]:
loss(y_true=y_test, y_pred=pred_interval)
[ ]:
loss_multi(y_true=y_test, y_pred=pred_interval)

evaluation by backtesting#

[ ]:
from sktime.datasets import load_airline
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss

# 1. define data
y = load_airline()

# 2. define splitting/backtesting regime
fh = [1, 2, 3]
cv = ExpandingWindowSplitter(step_length=12, fh=fh, initial_window=72)

# 3. define loss to use
loss = PinballLoss()
# default is score_average=True and multioutput="uniform_average", so this gives a single number

forecaster = ThetaForecaster(sp=12)
results = evaluate(
    forecaster=forecaster, y=y, cv=cv, strategy="refit", return_data=True, scoring=loss
)
results.iloc[:, :5].head()
  • each row is one train/test split in the walkforward setting

  • first col is the loss on the test fold

  • last two columns summarize length of training window, cutoff between train/test
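
For example, to summarize the backtesting results into a single number, average the per-fold losses; this assumes the default column naming, test_ followed by the metric name (test_PinballLoss here):

[ ]:
# mean pinball loss across all backtesting folds
results["test_PinballLoss"].mean()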

roadmap items:

implementing further metrics

  • distribution prediction metrics - may need tfp extension

  • advanced evaluation set-ups

  • variance loss

contributions are appreciated!


Advanced composition: pipelines, tuning, reduction, adding proba forecasts to any estimator#

composition = constructing “composite” estimators out of multiple “component” estimators

  • reduction = building estimator type A using estimator type B

    • special case: adding proba forecasting capability to non-proba forecaster

    • special case: using proba supervised learner for proba forecasting

  • pipelining = chaining estimators, here: transformers to a forecaster

  • tuning = automated hyper-parameter fitting, usually via internal evaluation loop

    • special case: grid parameter search and random parameter search tuning

    • special case: “Auto-ML”, optimizing not just estimator hyper-parameters but also the choice of estimator

Adding probabilistic forecasts to non-probabilistic forecasters#

start with a forecaster that does not produce probabilistic predictions:

[ ]:
from sktime.forecasting.exp_smoothing import ExponentialSmoothing

my_forecaster = ExponentialSmoothing()

# does the forecaster support probabilistic predictions?
my_forecaster.get_tag("capability:pred_int")

adding probabilistic predictions is possible via reduction wrappers:

[ ]:
# NaiveVariance adds intervals & variance via collecting past residuals
from sktime.forecasting.naive import NaiveVariance

# create a composite forecaster like this:
my_forecaster_with_proba = NaiveVariance(my_forecaster)

# does it support probabilistic predictions now?
my_forecaster_with_proba.get_tag("capability:pred_int")

the composite can now be used like any probabilistic forecaster:

[ ]:
y = load_airline()

my_forecaster_with_proba.fit(y, fh=[1, 2, 3])
my_forecaster_with_proba.predict_interval()

roadmap items:

more compositors to enable probabilistic forecasting

  • bootstrap forecast intervals

  • reduction to probabilistic supervised learning

  • popular “add probabilistic capability” wrappers

contributions are appreciated!

Tuning and AutoML#

tuning and AutoML with probabilistic forecasters work exactly like with “ordinary” forecasters,
via ForecastingGridSearchCV or ForecastingRandomizedSearchCV:

  • change the metric to tune to, to a probabilistic metric

  • i.e., use a corresponding probabilistic metric or loss function (e.g., PinballLoss) as the scoring argument

Internally, evaluation will be done using the probabilistic metric, via backtesting evaluation.

important: to evaluate the tuned estimator, use it in evaluate or a separate benchmarking workflow.
Using internal metric/loss values amounts to in-sample evaluation, which is over-optimistic.
[ ]:
from sktime.forecasting.model_selection import (
    ForecastingGridSearchCV,
    SlidingWindowSplitter,
)
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss

# forecaster we want to tune
forecaster = ThetaForecaster()

# parameter grid to search over
param_grid = {"sp": [1, 6, 12]}

# evaluation/backtesting regime for *tuning*
fh = [1, 2, 3]  # fh for tuning regime, does not need to be same as in fit/predict!
cv = SlidingWindowSplitter(window_length=36, fh=fh)
scoring = PinballLoss()

# construct the composite forecaster with grid search compositor
gscv = ForecastingGridSearchCV(
    forecaster, cv=cv, param_grid=param_grid, scoring=scoring, strategy="refit"
)
[ ]:
from sktime.datasets import load_airline

y = load_airline()[:60]

gscv.fit(y, fh=fh)

inspect hyper-parameter fit obtained by tuning:

[ ]:
gscv.best_params_

obtain predictions:

[ ]:
gscv.predict_interval()
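
As noted above, the tuned composite should be evaluated in a separate backtesting loop; a minimal sketch, passing the tuned gscv to evaluate (the outer splitter settings below are an illustrative choice, distinct from the inner tuning splitter):

[ ]:
from sktime.datasets import load_airline
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter

y_eval = load_airline()  # full series for the outer evaluation loop

# outer backtesting regime, separate from the inner tuning loop inside gscv
outer_cv = ExpandingWindowSplitter(step_length=12, fh=fh, initial_window=72)
results_tuned = evaluate(forecaster=gscv, y=y_eval, cv=outer_cv, scoring=PinballLoss())
results_tuned.iloc[:, :3].head()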

for AutoML, use the MultiplexForecaster to select among multiple forecasters:

[ ]:
from sktime.forecasting.compose import MultiplexForecaster
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.naive import NaiveForecaster, NaiveVariance

forecaster = MultiplexForecaster(
    forecasters=[
        ("naive", NaiveForecaster(strategy="last")),
        ("ets", ExponentialSmoothing(trend="add", sp=12)),
    ],
)

forecaster_param_grid = {"selected_forecaster": ["ets", "naive"]}
gscv = ForecastingGridSearchCV(forecaster, cv=cv, param_grid=forecaster_param_grid)

gscv.fit(y)
gscv.best_params_

Pipelines with probabilistic forecasters#

sktime pipelines are compatible with probabilistic forecasters:

  • ForecastingPipeline applies transformers to the exogeneous data X before passing it on to the forecaster

  • TransformedTargetForecaster transforms y and back-transforms forecasts, including interval or quantile forecasts

ForecastingPipeline takes a chain of transformers and a forecaster, say,

[t1, t2, ..., tn, f],

where the t[i] are transformers that pre-process X, and f is a forecaster

fit(y, X, fh) does:#

X1 = t1.fit_transform(X)
X2 = t2.fit_transform(X1)
etc
X[n] = tn.fit_transform(X[n-1])

f.fit(y=y, X=X[n])

predict_[sth](X, fh) does:#

X1 = t1.transform(X)
X2 = t2.transform(X1)
etc
X[n] = tn.transform(X[n-1])

f.predict_[sth](fh, X=X[n])

[ ]:
from sktime.datasets import load_macroeconomic
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.compose import ForecastingPipeline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.transformations.series.impute import Imputer
[ ]:
data = load_macroeconomic()
y = data["unemp"]
X = data.drop(columns=["unemp"])

y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
[ ]:
forecaster = ForecastingPipeline(
    steps=[
        ("imputer", Imputer(method="mean")),
        ("forecaster", ARIMA(suppress_warnings=True)),
    ]
)
forecaster.fit(y=y_train, X=X_train, fh=X_test.index[:5])
forecaster.predict_interval(X=X_test[:5])

TransformedTargetForecaster takes a chain of transformers, a forecaster, and post-processing transformers, say,

[t1, t2, ..., tn, f, tp1, tp2, ..., tpk],

where the t[i] are transformers that pre-process y, f is a forecaster, and the tp[i] are transformers that post-process the forecasts

fit(y, X, fh) does:#

y1 = t1.fit_transform(y)
y2 = t2.fit_transform(y1)
y3 = t3.fit_transform(y2)
etc
y[n] = tn.fit_transform(y[n-1])

f.fit(y[n])

yp1 = tp1.fit_transform(y[n])
yp2 = tp2.fit_transform(yp1)
yp3 = tp3.fit_transform(yp2)
etc

predict_quantiles(y, X, fh) does:#

y1 = t1.transform(y)
y2 = t2.transform(y1)
etc
y[n] = tn.transform(y[n-1])

y_pred = f.predict_quantiles(y[n])

y_pred = t[n].inverse_transform(y_pred)
y_pred = t[n-1].inverse_transform(y_pred)
etc
y_pred = t1.inverse_transform(y_pred)
y_pred = tp1.transform(y_pred)
y_pred = tp2.transform(y_pred)
etc
y_pred = tp[k].transform(y_pred)

Note: the remaining proba predictions are inferred from predict_quantiles.

[ ]:
from sktime.datasets import load_macroeconomic
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.transformations.series.detrend import Deseasonalizer, Detrender
[ ]:
data = load_macroeconomic()
y = data[["unemp"]]
[ ]:
forecaster = TransformedTargetForecaster(
    [
        ("deseasonalize", Deseasonalizer(sp=12)),
        ("detrend", Detrender()),
        ("forecast", ARIMA()),
    ]
)

forecaster.fit(y, fh=[1, 2, 3])
forecaster.predict_interval()
[ ]:
forecaster.predict_quantiles()

quick creation is also possible via the * dunder method, resulting in the same pipeline:

[ ]:
forecaster = Deseasonalizer(sp=12) * Detrender() * ARIMA()
[ ]:
forecaster.fit(y, fh=[1, 2, 3])
forecaster.predict_interval()

Building your own probabilistic forecaster#

Getting started:

Extension template = python “fill-in” template with to-do blocks that allow you to implement your own, sktime-compatible forecasting algorithm.

Check estimators using check_estimator

For probabilistic forecasting:

  • implement at least one of predict_quantiles, predict_interval, predict_var, predict_proba

  • ideally, implement all of them, unless identical with the defaulting behaviour described below

  • if only one is implemented, the others use the following defaults (in this sequence, depending on availability):

    • predict_interval uses quantiles from predict_quantiles and vice versa

    • predict_var uses variance from predict_proba, or variance of normal with IQR as obtained from predict_quantiles

    • predict_interval or predict_quantiles uses quantiles from predict_proba distribution

    • predict_proba returns normal with mean predict and variance predict_var

  • so, if predictive residuals are not normal, implement predict_proba or predict_quantiles

  • if interfacing another package, implement the methods for which the least “conversion” is necessary

  • ensure to set the capability:pred_int tag to True

[ ]:
# estimator checking on the fly using check_estimator

# suppose this is your new estimator
from sktime.forecasting.naive import NaiveForecaster
from sktime.utils.estimator_checks import check_estimator

# check the estimator like this:
check_estimator(NaiveForecaster)
# this prints any failed tests, and returns dictionary with
#   keys of test runs and results from the test run
# run individual tests using the tests_to_run arg or the fixtures_to_run arg
#   these need to be identical to test or test/fixture names, see docstring
[ ]:
# to raise errors for use in traceback debugging:
check_estimator(NaiveForecaster, return_exceptions=False)
# this does not raise an error since NaiveForecaster is fine, but would if it weren't

Summary#
  • unified API for probabilistic forecasting and probabilistic metrics

  • integrating other packages (e.g. scikit-learn, statsmodels, pmdarima, prophet)

  • interface for composite model building is same, proba or not (pipelining, ensembling, tuning, reduction)

  • easily extensible with custom estimators

Useful resources#

Credits#

notebook creation: fkiraly

probabilistic forecasting framework: fkiraly, kejsitake
probabilistic metrics, tuning: eenticott-shell, fkiraly
probabilistic estimators: aiwalter, fkiraly, ilyasmoutawwakil, k1m190r, kejsitake
