ClassificationBenchmark
- class ClassificationBenchmark(id_format: str | None = None, backend=None, backend_params=None, return_data=False)
Classification benchmark.
Run a series of classifiers against a series of tasks defined via dataset loaders, cross-validation splitting strategies and performance metrics, and return the results as a DataFrame (as well as saving them to file).
- Parameters:
- id_format: str, optional (default=None)
A regex that task and estimator IDs must match.
- backend: string, by default “None”.
Parallelization backend to use for runs.
  - “None”: executes loop sequentially, simple list comprehension
  - “loky”, “multiprocessing” and “threading”: uses joblib.Parallel loops
  - “joblib”: custom and 3rd party joblib backends, e.g., spark
  - “dask”: uses dask, requires dask package in environment
  - “dask_lazy”: same as “dask”, but changes the return to (lazy) dask.dataframe.DataFrame.
  - “ray”: uses ray, requires ray package in environment
Recommendation: Use “dask” or “loky” for parallel evaluate. “threading” is unlikely to see speed ups due to the GIL, and the serialization backend (cloudpickle) for “dask” and “loky” is generally more robust than the standard pickle library used in “multiprocessing”.
- backend_params: dict, optional
Additional parameters passed to the backend as config. Directly passed to utils.parallel.parallelize. Valid keys depend on the value of backend:
  - “None”: no additional parameters, backend_params is ignored
  - “loky”, “multiprocessing” and “threading”: default joblib backends. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs, with the exception of backend, which is directly controlled by backend. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
  - “joblib”: custom and 3rd party joblib backends, e.g., spark. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs; backend must be passed as a key of backend_params in this case. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
  - “dask”: any valid keys for dask.compute can be passed, e.g., scheduler
  - “ray”: The following keys can be passed:
    - “ray_remote_args”: dictionary of valid keys for ray.init
    - “shutdown_ray”: bool, default=True; False prevents ray from shutting down after parallelization.
    - “logger_name”: str, default=“ray”; name of the logger to use.
    - “mute_warnings”: bool, default=False; if True, suppresses warnings
- return_data: bool, optional (default=False)
Whether to return the prediction and the ground truth data in the results.
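The rules for the joblib-style backends can be sketched in plain Python. This is only an illustration of the documented behaviour, not sktime's internal code: the helper name resolve_joblib_kwargs and the choice of ValueError are assumptions made for this sketch.

```python
# Sketch only: mirrors the documented rules, not sktime's implementation.
JOBLIB_BUILTIN = {"loky", "multiprocessing", "threading"}


def resolve_joblib_kwargs(backend, backend_params=None):
    """Return keyword arguments one might hand to joblib.Parallel (hypothetical helper)."""
    params = dict(backend_params or {})

    if backend in JOBLIB_BUILTIN:
        # "backend" is controlled by the benchmark argument, not by the dict.
        params["backend"] = backend
    elif backend == "joblib":
        # Custom or third-party joblib backends must be named in the dict.
        if "backend" not in params:
            raise ValueError('backend_params must contain the key "backend"')
    else:
        raise ValueError(f"not a joblib-style backend: {backend!r}")

    # Documented default: n_jobs falls back to -1 when not passed.
    params.setdefault("n_jobs", -1)
    return params
```

Under these assumptions, resolve_joblib_kwargs("loky", {"n_jobs": 4}) keeps n_jobs at 4 and sets backend to "loky", while resolve_joblib_kwargs("joblib", {}) fails because no backend key was supplied.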
Methods
add(*args): Add estimators, task components, full task tuples, or catalogues.
add_estimator(estimator[, estimator_id]): Register an estimator to the benchmark.
add_task(dataset_loader, cv_splitter, scorers): Register a classification task to the benchmark.
Register stored tasks from global DATASETS, METRICS, CV_SPLITTERS.
run([output_file, force_rerun]): Run the benchmarking for all tasks and estimators.
- add_task(dataset_loader: Callable | tuple, cv_splitter: Any, scorers: list, task_id: str | None = None, error_score: str = 'raise')
Register a classification task to the benchmark.
- Parameters:
- dataset_loader: Union[Callable, tuple]
Can be:
  - a function which returns a dataset, like from sktime.datasets.
  - a tuple containing two data containers that are sktime compatible.
  - a single data container that is sktime compatible (only endogenous data).
- cv_splitter: BaseSplitter object
Splitter used for generating validation folds.
- scorers: a list of BaseMetric objects
Each BaseMetric output will be included in the results.
- task_id: str, optional (default=None)
Identifier for the benchmark task. If None, the dataset loader name combined with the cv_splitter class name is used.
- error_score: “raise” or numeric, default=“raise”
Value to assign to the score if an exception occurs in estimator fitting. If set to “raise”, the exception is raised. If a numeric value is given, FitFailedWarning is raised.
- Returns:
- A dictionary of benchmark results for that classifier
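The two error_score policies described above can be illustrated with a small standard-library sketch. It is not the library's code: score_or_fallback, failing_fit and FitFailedWarningSketch are hypothetical names, and the warning class is a stand-in for the FitFailedWarning mentioned in the parameter description.

```python
import warnings


class FitFailedWarningSketch(UserWarning):
    """Hypothetical stand-in for the FitFailedWarning named in the docs."""


def score_or_fallback(fit_and_score, error_score="raise"):
    """Call fit_and_score(); on failure, apply the error_score policy."""
    try:
        return fit_and_score()
    except Exception as exc:
        if error_score == "raise":
            # Policy 1: propagate the original exception unchanged.
            raise
        # Policy 2: warn and record the numeric fallback as the score.
        warnings.warn(
            f"fit failed, using score {error_score}: {exc!r}",
            FitFailedWarningSketch,
        )
        return error_score


def failing_fit():
    """Simulate an estimator whose fit step raises."""
    raise RuntimeError("estimator could not be fitted")
```

With error_score="raise" a single broken estimator stops the whole run, which is useful while debugging; a numeric value lets a long benchmark finish and marks the failed cell instead.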
- add(*args)
Add estimators, task components, full task tuples, or catalogues.
Objects are interpreted based on their scitype and added to the benchmark accordingly. Multiple objects can be provided in a single call.
Supported inputs include estimators, datasets, metrics, CV splitters, task tuples, and catalogues.
- Parameters:
- *args: object
Objects to add. Supported patterns are:
- estimator
Estimator with scitype “classifier” or “forecaster”.
- dict
Dictionary of estimators where keys are custom `estimator_id`s and values are the estimators.
- list
List of estimators. `estimator_id`s are generated automatically using the estimator’s class name.
- dataset
Object with scitype dataset_classification or dataset_forecasting.
- metric
Object with scitype metric_forecasting, metric_tabular, or metric_proba_tabular.
- cv_splitter
Object with scitype “splitter” or “splitter_tabular”.
- (dataset, metric, splitter)
Tuple specifying a full task. Must contain exactly one dataset, one metric, and one splitter.
- catalogue
Instance of BaseCatalogue. All contained objects are added recursively.
- Raises:
- TypeError
If:
- a tuple has unsupported length (e.g., not length 3 for task tuples)
- a task tuple does not contain exactly one dataset, metric, and splitter
- duplicate scitypes are present in a task tuple
- an object has an unrecognized scitype
Notes
Task tuples are order-invariant; roles are inferred via scitype.
Duplicate datasets, metrics, and splitters are ignored.
Examples
>>> benchmark = ClassificationBenchmark()
Add an estimator:
>>> benchmark.add(DummyClassifier())
Add components individually:
>>> benchmark.add(ArrowHead())
>>> benchmark.add(accuracy_score)
>>> benchmark.add(KFold(n_splits=3))
Add a task tuple (order does not matter):
>>> benchmark.add((accuracy_score, ArrowHead(), KFold(n_splits=3)))
Add a dictionary of estimators with custom IDs:
>>> benchmark.add(
...     {
...         "dummy": DummyClassifier(),
...         "knn": KNeighborsClassifier(),
...     }
... )
Add a list of estimators (IDs generated automatically):
>>> benchmark.add([DummyClassifier(), KNeighborsClassifier()])
Add multiple objects:
>>> benchmark.add(
...     {"dummy_1": DummyClassifier()},
...     (ArrowHead(), accuracy_score, KFold(n_splits=3)),
... )
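How an order-invariant task tuple might be resolved by scitype can be sketched without sktime. Everything in this block is illustrative: Tagged, TASK_ROLES and as_task are hypothetical names, the scitype strings are taken from the list above, and the real method's internals may differ.

```python
from dataclasses import dataclass

# Map a few of the scitypes listed above to the role they play in a task.
TASK_ROLES = {
    "dataset_classification": "dataset",
    "metric_tabular": "metric",
    "splitter_tabular": "splitter",
}


@dataclass(frozen=True)
class Tagged:
    """Hypothetical object that carries a scitype label."""
    name: str
    scitype: str


def as_task(objs):
    """Turn a 3-tuple given in any order into a role -> object mapping."""
    if len(objs) != 3:
        raise TypeError(f"task tuples need 3 elements, got {len(objs)}")
    task = {}
    for obj in objs:
        role = TASK_ROLES.get(obj.scitype)
        if role is None:
            raise TypeError(f"unrecognized scitype: {obj.scitype!r}")
        if role in task:
            raise TypeError(f"duplicate {role} in task tuple")
        task[role] = obj
    return task


data = Tagged("ArrowHead", "dataset_classification")
metric = Tagged("accuracy_score", "metric_tabular")
cv = Tagged("KFold", "splitter_tabular")
```

Because roles come from the label rather than the position, as_task((metric, data, cv)) and as_task((data, metric, cv)) produce the same mapping, and a tuple with two datasets raises TypeError.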
- add_estimator(estimator: BaseEstimator, estimator_id: str | None = None)
Register an estimator to the benchmark.
- Parameters:
- estimator: dict, list or BaseEstimator object
Estimator to add to the benchmark.
  - If BaseEstimator, single estimator. estimator_id is generated as the estimator’s class name if not provided.
  - If dict, keys are the estimator_id values used to customise the identifier and values are the estimators.
  - If list, each element is an estimator. estimator_id values are generated automatically using the estimator’s class name.
- estimator_id: str, optional (default=None)
Identifier for estimator. If none given then uses estimator’s class name.
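The default-ID rule, combined with the id_format regex accepted by the constructor, could look roughly like the sketch below. It rests on assumptions: resolve_estimator_id and MyClassifier are hypothetical, and matching the whole ID with re.fullmatch is a guess at the intended check, which the documentation does not spell out.

```python
import re


class MyClassifier:
    """Hypothetical placeholder estimator, not an sktime class."""


def resolve_estimator_id(estimator, estimator_id=None, id_format=None):
    """Pick an ID (class name by default) and check it against id_format."""
    if estimator_id is None:
        # Documented default: fall back to the estimator's class name.
        estimator_id = type(estimator).__name__
    # Assumption: the entire ID must match; the real check may differ.
    if id_format is not None and re.fullmatch(id_format, estimator_id) is None:
        raise ValueError(f"ID {estimator_id!r} does not match {id_format!r}")
    return estimator_id
```

Here resolve_estimator_id(MyClassifier()) yields "MyClassifier", whereas the same call with id_format=r"[a-z_]+" is rejected because the class name contains upper-case letters.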
- run(output_file: str = None, force_rerun: str | list[str] = 'none')
Run the benchmarking for all tasks and estimators.
If output_file is provided, results will be saved to a file or location, in a format inferred from the file extension.
The exact format is determined by the storage backend used, see documentation on storage handlers in sktime.benchmarking._storage_handlers.get_storage_backend.
- Parameters:
- output_file: str or None (default)
Path to save the results to. If None, results will not be saved.
- force_rerun: Union[str, list[str]], optional (default=“none”)
If “none”, will skip validation if results already exist.
If “all”, will run validation for all tasks and models.
If list of str, will run validation for tasks and models in list.
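The three force_rerun modes amount to a filter over the task and model IDs. The sketch below is an interpretation, not the library's code: ids_to_run is a hypothetical helper, and it assumes that a list forces the named IDs to rerun while anything without stored results is still run as usual.

```python
def ids_to_run(all_ids, finished_ids, force_rerun="none"):
    """Return the IDs that need validation, in their original order."""
    finished = set(finished_ids)
    if force_rerun == "none":
        forced = set()            # reuse every stored result
    elif force_rerun == "all":
        forced = set(all_ids)     # ignore stored results entirely
    elif isinstance(force_rerun, list):
        forced = set(force_rerun)  # rerun only the named IDs
    else:
        raise ValueError(f"unsupported force_rerun value: {force_rerun!r}")
    # Run an ID if it has no stored result or if a rerun was forced.
    return [i for i in all_ids if i not in finished or i in forced]
```

With all_ids = ["knn", "dummy", "rocket"] and stored results for "knn" and "dummy", the default mode runs only "rocket", "all" runs all three, and ["knn"] runs "knn" and "rocket".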