ForecastingBenchmark#
- class ForecastingBenchmark(id_format: str | None = None, backend=None, backend_params=None, return_data=False)[source]#
Forecasting benchmark.
Run a series of forecasters against a series of tasks defined via dataset loaders, cross-validation splitting strategies, and performance metrics, and return the results as a pandas DataFrame (as well as saving them to file).
- Parameters:
- id_format: str, optional (default=None)
A regex used to enforce that task/estimator IDs match a given format.
- backend: str, optional (default=None)
Parallelization backend to use for runs. See the construction sketch after this parameter list.
  - "None": executes loop sequentially, via simple list comprehension
  - "loky", "multiprocessing" and "threading": uses joblib.Parallel loops
  - "joblib": custom and 3rd party joblib backends, e.g., spark
  - "dask": uses dask, requires dask package in environment
  - "dask_lazy": same as "dask", but changes the return to a (lazy) dask.dataframe.DataFrame
  - "ray": uses ray, requires ray package in environment
Recommendation: use "dask" or "loky" for parallel evaluation. "threading" is unlikely to see speed-ups due to the GIL, and the serialization backend (cloudpickle) used by "dask" and "loky" is generally more robust than the standard pickle library used by "multiprocessing".
- backend_params: dict, optional
Additional parameters passed to the backend as config. Directly passed to utils.parallel.parallelize. Valid keys depend on the value of backend:
  - "None": no additional parameters, backend_params is ignored
  - "loky", "multiprocessing" and "threading": default joblib backends. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs, with the exception of backend, which is directly controlled by the backend parameter. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
  - "joblib": custom and 3rd party joblib backends, e.g., spark. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs; backend must be passed as a key of backend_params in this case. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
  - "dask": any valid keys for dask.compute can be passed, e.g., scheduler
  - "ray": the following keys can be passed:
    - "ray_remote_args": dictionary of valid keys for ray.init
    - "shutdown_ray": bool, default=True; False prevents ray from shutting down after parallelization
    - "logger_name": str, default="ray"; name of the logger to use
    - "mute_warnings": bool, default=False; if True, suppresses warnings
- return_data: bool, optional (default=False)
Whether to return the prediction and the ground truth data in the results.
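A minimal construction sketch (the import path sktime.benchmarking.forecasting follows recent sktime releases; this is an illustration, not an authoritative recipe):

```python
from sktime.benchmarking.forecasting import ForecastingBenchmark

# Sequential execution (backend=None, the default):
benchmark = ForecastingBenchmark()

# Parallel execution via joblib's "loky" backend; n_jobs is a
# standard joblib.Parallel key forwarded through backend_params:
benchmark_parallel = ForecastingBenchmark(
    backend="loky",
    backend_params={"n_jobs": -1},
)
```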
Methods

- add_estimator(estimator[, estimator_id]): Register an estimator to the benchmark.
- add_task(dataset_loader, cv_splitter, scorers): Register a forecasting task to the benchmark.
- run(output_file[, force_rerun]): Run the benchmarking for all tasks and estimators.
- add_estimator(estimator: BaseEstimator, estimator_id: str | None = None)[source]#
Register an estimator to the benchmark.
- Parameters:
- estimator: Dict, List or BaseEstimator object
Estimator to add to the benchmark. If a Dict, keys are the estimator_ids used as identifiers and values are the estimators. If a List, each element is an estimator, and estimator_ids are generated automatically from each estimator's class name. Accepted forms are shown in the sketch below.
- estimator_id: str, optional (default=None)
Identifier for estimator. If none given then uses estimator’s class name.
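A sketch of the accepted estimator forms, continuing the benchmark object from the construction sketch above; the explicit IDs are illustrative:

```python
from sktime.forecasting.naive import NaiveForecaster

# Single estimator; the ID defaults to the class name, "NaiveForecaster":
benchmark.add_estimator(NaiveForecaster(strategy="mean", sp=12))

# Explicit ID, useful to distinguish configurations of the same class:
benchmark.add_estimator(
    estimator=NaiveForecaster(strategy="last"),
    estimator_id="NaiveForecaster-last-v1",
)

# Dict form: keys become the estimator IDs:
benchmark.add_estimator(
    {
        "naive-mean": NaiveForecaster(strategy="mean", sp=12),
        "naive-drift": NaiveForecaster(strategy="drift"),
    }
)
```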
- add_task(dataset_loader: Callable | tuple, cv_splitter: BaseSplitter, scorers: list[BaseMetric], task_id: str | None = None, cv_global: BaseSplitter | None = None, error_score: str = 'raise', strategy: str = 'refit')[source]#
Register a forecasting task to the benchmark.
- Parameters:
- dataset_loader: Union[Callable, tuple]
Can be:
  - a function which returns a dataset, like those in sktime.datasets,
  - a tuple containing two data containers that are sktime compatible,
  - a single data container that is sktime compatible (only endogenous data).
- cv_splitter: BaseSplitter object
Splitter used for generating validation folds.
- scorers: a list of BaseMetric objects
Each BaseMetric output will be included in the results.
- task_id: str, optional (default=None)
Identifier for the benchmark task. If none given then uses dataset loader name combined with cv_splitter class name.
- cv_global: sklearn splitter, or sktime instance splitter, default=None
If cv_global is passed, then global benchmarking is applied, as follows:
  1. The cv_global splitter is used to split data at instance level, into a global training set y_train and a global test set y_test_global.
  2. The estimator is fitted to the global training set y_train.
  3. cv_splitter then splits the global test set y_test_global temporally, to obtain temporal splits y_past, y_true.
Overall, with y_train, y_past, y_true as above, the following evaluation will be applied:
    forecaster.fit(y=y_train, fh=cv.fh)
    y_pred = forecaster.predict(y=y_past)
    metric(y_true, y_pred)
- error_score: "raise" or numeric, default=np.nan
Value to assign to the score if an exception occurs in estimator fitting. If set to “raise”, the exception is raised. If a numeric value is given, FitFailedWarning is raised.
- strategy: {"refit", "update", "no-update_params"}, optional, default="refit"
Defines the ingestion mode when the forecaster sees new data as the window expands:
  - "refit" = forecaster is refitted to each training window
  - "update" = forecaster is updated with training window data, in the sequence provided
  - "no-update_params" = fit to first training window, re-used without fit or update
- Returns:
- A dictionary of benchmark results for that forecaster
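A task-registration sketch, continuing the example above; the import paths (sktime.split, sktime.performance_metrics.forecasting) follow recent sktime versions:

```python
from sktime.datasets import load_airline
from sktime.performance_metrics.forecasting import MeanSquaredPercentageError
from sktime.split import ExpandingWindowSplitter

# Expanding-window validation: first window of 24 observations,
# growing by 12 per fold, forecasting 12 steps ahead in each fold.
cv_splitter = ExpandingWindowSplitter(initial_window=24, step_length=12, fh=12)
scorers = [MeanSquaredPercentageError()]

# task_id defaults to the loader name combined with the splitter class name.
benchmark.add_task(load_airline, cv_splitter, scorers)
```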
- run(output_file: str, force_rerun: str | list[str] = 'none')[source]#
Run the benchmarking for all tasks and estimators.
- Parameters:
- output_file: str
Path to save the results to.
- force_rerun: Union[str, list[str]], optional (default="none")
If “none”, will skip validation if results already exist. If “all”, will run validation for all tasks and models. If list of str, will run validation for tasks and models in list.
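A usage sketch for run, continuing the example above; the output path is illustrative:

```python
# Writes results to the given file and returns them as a pandas DataFrame;
# existing results are skipped unless force_rerun says otherwise.
results_df = benchmark.run("./forecasting_results.csv")

# Force a full re-run of all tasks and estimators:
results_df = benchmark.run("./forecasting_results.csv", force_rerun="all")
```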