Implementing Estimators#
This page describes how to implement sktime compatible estimators, and how to ensure and test compatibility.
There are additional steps for estimators that are contributed to sktime directly.
Implementing an sktime compatible estimator#
The high-level steps to implement sktime compatible estimators are as follows:
1. identify the type of the estimator: forecaster, classifier, etc.
2. copy the extension template for that kind of estimator to its intended location
3. complete the extension template
4. run the sktime test suite on the implemented estimator
5. if the test suite highlights bugs or issues, fix them and go to 4
What is my learning task?#
sktime is structured along modules encompassing specific learning tasks,
e.g., forecasting or time series classification.
For brevity, we define an estimator’s scientific type or “scitype” by the formal learning task that it solves.
For example, the scitype of an estimator that solves the forecasting task is “forecaster”.
The scitype of an estimator that solves the time series classification task is “time series classifier”.
Estimators for a given scitype should be located in the respective module.
The estimator scitypes also map onto the different extension templates found in the extension_templates
directory of sktime.
Usually, the scitype of a given estimator is directly determined by what the estimator does. This is also often explicitly signposted in publications related to the estimator. For instance, most textbooks mention ARIMA in the context of forecasting, so in that case it makes sense to consider the “forecaster” template. Then, inspect the template and check whether the methods of the class map clearly onto routines of the estimator. If not, another template might be more appropriate.
The most common point of confusion here is between transformers and other estimator types, since transformers are often used as parts of algorithms of other types.
If unsure, feel free to post your question on one of sktime’s social channels.
Don’t panic - it is not uncommon for academic publications to be unclear about the type of an estimator,
and correct categorization may be difficult even for experts.
What are sktime extension templates?#
Extension templates are convenient “fill-in” templates for implementers of new estimators.
They fit into sktime’s unified interface as follows:
for each scitype, there is a public user interface, defined by the respective base class. For instance, BaseForecaster defines the fit and predict interfaces for forecasters. All forecasters will implement fit and predict the same way, by inheritance from BaseForecaster. The public interface follows the “strategy” object orientation pattern.
for each scitype, there is a private extender interface, defined by the extension contract in the extension template. For instance, the forecaster.py extension template for forecasters explains what to fill in for a concrete forecaster inheriting from BaseForecaster. In most extension templates, users should implement private methods (“inner” methods), e.g., _fit and _predict for forecasters. Boilerplate code rests within the public part of the interface, in fit and predict. The extender interface follows the “template” object orientation pattern.
Extenders familiar with scikit-learn extension should note the following difference to scikit-learn:
the public interface, e.g., fit and predict, is never overridden in sktime (concrete) estimators.
Implementation happens in the private, extender sided interface, e.g., _fit and _predict.
This avoids boilerplate replication, such as check_X etc. in scikit-learn.
This also allows richer boilerplate, such as automated vectorization functionality or input conversion.
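The split between the public and the private interface can be illustrated with a small, self-contained sketch. It does not use sktime: ToyBaseForecaster and ToyMeanForecaster are hypothetical stand-ins, and the real BaseForecaster does considerably more in its public methods (for example input checks and conversion).

```python
class ToyBaseForecaster:
    """Hypothetical stand-in for a base class; not sktime's BaseForecaster."""

    def fit(self, y):
        # public method: shared boilerplate is written once, here
        if len(y) == 0:
            raise ValueError("y must not be empty")
        self._fit(list(y))  # delegate to the extender-side method
        self._is_fitted = True
        return self

    def predict(self, n_steps):
        # public method: shared state check, then delegation
        if not getattr(self, "_is_fitted", False):
            raise RuntimeError("call fit before predict")
        return self._predict(n_steps)

    def _fit(self, y):
        raise NotImplementedError("extenders implement _fit")

    def _predict(self, n_steps):
        raise NotImplementedError("extenders implement _predict")


class ToyMeanForecaster(ToyBaseForecaster):
    """Concrete estimator: implements only the private methods."""

    def _fit(self, y):
        self.mean_ = sum(y) / len(y)

    def _predict(self, n_steps):
        return [self.mean_] * n_steps


forecast = ToyMeanForecaster().fit([1.0, 2.0, 3.0]).predict(2)
```

The concrete class never overrides fit or predict, so every estimator built this way gets the same checks and the same public behaviour.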
How to use sktime extension templates#
To use the sktime extension templates, copy them to the intended location of the estimator.
Inside the extension templates, necessary actions are marked with todo.
The typical workflow goes through the extension template by searching for todo, and carrying out
the action described next to the todo.
Extension templates typically have the following todo items:
choosing name and parameters for the estimator
filling in the __init__: writing parameters to self, calling super’s __init__
filling in docstrings of the module and the estimator. This is recommended as soon as parameters have been settled on, since the docstring tends to be useful as a specification to follow in implementation.
filling in the tags for the estimator. Some tags are “capabilities”, i.e., what the estimator can do, e.g., dealing with nans. Other tags determine the format of inputs seen in the “inner” methods _fit etc.; these tags are usually called X_inner_mtype or similar. This is useful in case the inner functionality assumes numpy.ndarray or pandas.DataFrame, and helps avoid conversion boilerplate. The type strings can be found in datatypes.MTYPE_REGISTER. For a tutorial on data type conventions, see examples/AA_datatypes_and_datasets.
filling in the “inner” methods, e.g., _fit and _predict. The docstrings and comments in the extension template should be followed here. The docstrings also describe the guarantees on the inputs to the “inner” methods, which are typically stronger than the guarantees on inputs to the public methods, and determined by values of tags that have been set. For instance, setting the tag y_inner_mtype to pd.DataFrame for a forecaster guarantees that the y seen by _fit will be a pandas.DataFrame, complying with additional data container specifications in sktime (e.g., index types).
filling in testing parameters in get_test_params. The selection of parameters should cover major estimator internal case distinctions to achieve good coverage.
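The role of the inner-type tag and of get_test_params can be sketched without sktime as follows. ToyBase, ToyWindowMeanForecaster, and the tag values "list" and "tuple" are hypothetical; in sktime the base class handles conversion, and the valid type strings are those in datatypes.MTYPE_REGISTER.

```python
class ToyBase:
    """Hypothetical base class that mimics the idea of tags; not sktime code."""

    _tags = {"y_inner_mtype": "list"}  # default, overridden by subclasses
    _converters = {"list": list, "tuple": tuple}  # toy stand-ins for mtypes

    def fit(self, y):
        # convert the input to the container type declared in the tag
        convert = self._converters[self._tags["y_inner_mtype"]]
        self._fit(convert(y))
        return self

    def predict(self, n_steps):
        return self._predict(n_steps)


class ToyWindowMeanForecaster(ToyBase):
    """Forecasts the mean of the last `window` observations."""

    _tags = {"y_inner_mtype": "tuple"}

    def __init__(self, window=1):
        self.window = window  # written to self, never changed afterwards

    def _fit(self, y):
        # the tag guarantees that y arrives as a tuple
        assert isinstance(y, tuple)
        tail = y[-self.window:]
        self.level_ = sum(tail) / len(tail)

    def _predict(self, n_steps):
        return [self.level_] * n_steps

    @classmethod
    def get_test_params(cls, parameter_set="default"):
        # two settings that exercise different window lengths
        return [{"window": 1}, {"window": 3}]


fitted = [
    ToyWindowMeanForecaster(**params).fit([1.0, 2.0, 6.0])
    for params in ToyWindowMeanForecaster.get_test_params()
]
levels = [est.level_ for est in fitted]
```

Each dictionary returned by get_test_params is used to construct one instance, so the list should contain one entry per code path worth testing.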
Some common caveats, also described in extension template text:
__init__ parameters should be written to self and never be changed
special case of this: estimator components, i.e., parameters that are estimators, should generally be cloned (via sklearn.clone), and methods should be called only on the clones
methods should generally avoid side effects on arguments
non-state changing methods should not write to self in general
typically, implementing get_params and set_params is not needed, since sktime’s BaseEstimator inherits from sklearn’s. Custom get_params, set_params are typically needed only for complex cases, such as heterogeneous composites, e.g., pipelines with parameters that are nested structures containing estimators.
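The first few caveats can be illustrated with a minimal sketch. ToyScaler and ToyComposite are hypothetical classes, and copy.deepcopy is used here only as a stand-in for sklearn.clone so that the example has no dependencies; unlike deepcopy, sklearn.clone returns an unfitted copy with the same parameters.

```python
import copy


class ToyScaler:
    """Hypothetical component estimator with one parameter."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, values):
        self.max_ = max(values)  # fitted state, trailing underscore
        return self

    def transform(self, values):
        return [self.factor * v / self.max_ for v in values]


class ToyComposite:
    """Hypothetical composite that takes another estimator as a parameter."""

    def __init__(self, scaler, offset=0.0):
        # parameters are written to self exactly as passed
        self.scaler = scaler
        self.offset = offset

    def fit(self, values):
        # fit a copy, so that the caller's component is left untouched
        self.scaler_ = copy.deepcopy(self.scaler)
        self.scaler_.fit(values)
        return self

    def transform(self, values):
        # non-state-changing method: reads from self, writes nothing
        return [v + self.offset for v in self.scaler_.transform(values)]


scaler = ToyScaler(factor=2.0)
composite = ToyComposite(scaler, offset=1.0).fit([1.0, 2.0, 4.0])
result = composite.transform([2.0, 4.0])
```

After fitting, the object passed in as scaler is still unfitted, and composite.scaler is still the very object that was passed to __init__.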
How to test interface conformance#
Usually, the simplest way to test interface conformance with sktime is via the
check_estimator function in the utils.estimator_checks module.
When invoked, this will collect tests in sktime relevant for the estimator type and
run them on the estimator.
If the target location of the estimator is within sktime, then the sktime test
suite can be run instead. The sktime test suite (and CI/CD) will automatically
collect all estimators of a certain type and run relevant tests on them.
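A minimal sketch of calling check_estimator is given below. The helper failed_checks is hypothetical. The sketch assumes that check_estimator accepts a raise_exceptions argument and returns a dictionary mapping test names to either the string "PASSED" or the exception raised; this should be verified against the installed sktime version.

```python
def failed_checks(estimator):
    """Run sktime's interface checks and return only the failures.

    Hypothetical helper; assumes the check_estimator behaviour
    described in the text above.
    """
    # imported lazily, so this module can be imported without sktime
    from sktime.utils.estimator_checks import check_estimator

    # with raise_exceptions=False, failures are collected, not raised
    results = check_estimator(estimator, raise_exceptions=False)
    return {name: res for name, res in results.items() if res != "PASSED"}
```

For instance, failed_checks(MyForecaster) would return an empty dictionary if all collected tests pass, where MyForecaster is a placeholder for the estimator class under test.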
Testing within a third party extension package#
For third party extension packages to sktime (open or closed),
or third party modules that aim for interface compliance with sktime,
the sktime test suite can be imported and extended in two ways:
importing check_estimator; this will carry out the tests defined in sktime
importing test classes, e.g., test_all_estimators.TestAllEstimators or test_all_forecasters.TestAllForecasters. The imports will be discovered directly by pytest. The test suite can also be extended by inheriting from the test classes.
Adding an sktime compatible estimator to sktime#
When adding an sktime compatible estimator to sktime itself, a number of
additional things need to be done:
ensure that code also meets sktime's documentation standards.
add the estimator to the sktime API reference. This is done by adding a reference to the estimator in the correct rst file inside docs/source/api_reference.
authors of the estimator should add themselves to CODEOWNERS, as owners of the contributed estimator.
if the estimator relies on soft dependencies, or adds new soft dependencies, the steps in the “dependencies” developer guide should be followed
ensure that the estimator passes the entire local test suite of sktime, with the estimator in its target location. To run tests only for the estimator, the command pytest -k "EstimatorName" can be used (or VS Code GUI filter functionality)
ensure that test parameters in get_test_params are chosen such that the runtime of estimator specific tests remains on the order of seconds on sktime remote CI/CD
Don’t panic - when contributing to sktime, core developers will give helpful pointers on the above in their PR reviews.
It is recommended to open a draft PR to get feedback early.