temporal_train_test_split
- temporal_train_test_split(y: Series | DataFrame | ndarray | Index, X: DataFrame | None = None, test_size: float | None = None, train_size: float | None = None, fh=None, anchor: str = 'start') -> tuple[Series, Series] | tuple[Series, Series, DataFrame, DataFrame]
Split time series data containers into a single train/test split.
Creates a single train/test split of an endogenous time series y and, optionally, an exogenous time series X.
Splits time series y into a single temporally ordered train and test split. The split is based on the test_size and train_size parameters, each of which can be a fraction of the total number of indices or an absolute integer number of indices to cut.
If the data contains multiple time series (Panel or Hierarchical), fractions and train-test sets are computed per individual time series.
If X is provided, the function also produces a single train/test split of X, at the same loc indices as y. If non-pandas based containers are used, the iloc index is used instead.
- Parameters:
- y : time series in sktime compatible data container format
endogenous time series
- X : time series in sktime compatible data container format, optional, default=None
exogenous time series
- test_size : float, int or None, optional (default=None)
If float, must be between 0.0 and 1.0, and is interpreted as the proportion of the dataset to include in the test split. Proportions are rounded to the next higher integer count of samples (ceil). If int, is interpreted as total number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.
- train_size : float, int, or None, optional (default=None)
If float, must be between 0.0 and 1.0, and is interpreted as the proportion of the dataset to include in the train split. Proportions are rounded to the next lower integer count of samples (floor). If int, is interpreted as total number of train samples. If None, the value is set to the complement of the test size.
- fh : ForecastingHorizon
A forecast horizon to use for splitting, as an alternative specification of the test set. If given, test_size and train_size cannot also be specified and must be None. If fh is passed, the test set will be:
  - if fh.is_relative: the last possible indices to match fh within y
  - if not fh.is_relative: the indices at the absolute index of fh
- anchor : str, "start" (default) or "end"
Determines behaviour if train and test sizes do not sum up to all data. Used only if fh=None and both test_size and train_size are not None.
  - if "start", cuts train and test set from start of available series
  - if "end", cuts train and test set from end of available series
- Returns:
- splitting : tuple, length = 2 * len(arrays)
Tuple containing train-test split of y, and X if given. If X is None, returns (y_train, y_test). Else, returns (y_train, y_test, X_train, X_test).
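The size and anchor rules described in the parameters can be sketched in plain Python. This is a minimal illustration of the documented behaviour, not the library's implementation. The helper names resolve_split_sizes and split_positions are hypothetical. The sketch also assumes that with the "start" anchor the test window directly follows the train window, and that with the "end" anchor the train window directly precedes the test window.

```python
import math


def resolve_split_sizes(n, test_size=None, train_size=None):
    """Hypothetical helper: convert size arguments into integer counts."""
    if test_size is None and train_size is None:
        test_size = 0.25  # documented default when neither size is given

    n_test = n_train = None
    if test_size is not None:
        # float -> proportion, rounded up (ceil); int -> absolute count
        n_test = (
            math.ceil(test_size * n) if isinstance(test_size, float) else test_size
        )
    if train_size is not None:
        # float -> proportion, rounded down (floor); int -> absolute count
        n_train = (
            math.floor(train_size * n) if isinstance(train_size, float) else train_size
        )

    # a size that was not given is the complement of the other one
    if n_test is None:
        n_test = n - n_train
    if n_train is None:
        n_train = n - n_test
    return n_train, n_test


def split_positions(n, n_train, n_test, anchor="start"):
    """Hypothetical helper: integer positions of the train and test windows."""
    if anchor == "start":
        # assumption: train starts at position 0, test follows directly
        train = range(0, n_train)
        test = range(n_train, n_train + n_test)
    elif anchor == "end":
        # assumption: test ends at the last position, train precedes it directly
        test = range(n - n_test, n)
        train = range(n - n_test - n_train, n - n_test)
    else:
        raise ValueError("anchor must be 'start' or 'end'")
    return list(train), list(test)


print(resolve_split_sizes(144))                   # (108, 36)
print(resolve_split_sizes(10, test_size=0.25))    # (7, 3)
print(resolve_split_sizes(10, train_size=0.25))   # (2, 8)
print(split_positions(10, 4, 3, anchor="start"))  # ([0, 1, 2, 3], [4, 5, 6])
print(split_positions(10, 4, 3, anchor="end"))    # ([3, 4, 5, 6], [7, 8, 9])
```

With 10 samples and a proportion of 0.25, the test count rounds up to 3 while the train count rounds down to 2, so which argument is passed matters for series whose length is not a multiple of the fraction.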
References
[1] originally adapted from alkaline-ml/pmdarima
Examples
>>> from sktime.datasets import load_airline, load_osuleaf
>>> from sktime.split import temporal_train_test_split
>>> from sktime.utils._testing.panel import _make_panel
>>> # univariate time series
>>> y = load_airline()
>>> y_train, y_test = temporal_train_test_split(y, test_size=36)
>>> y_test.shape
(36,)
>>> # panel time series
>>> y = _make_panel(n_instances=2, n_timepoints=20)
>>> y_train, y_test = temporal_train_test_split(y, test_size=5)
>>> # last 5 timepoints for each instance
>>> y_test.shape
(10, 1)
The function can also be applied to panel or hierarchical data. In this case, the split is applied per individual time series:
>>> from sktime.utils._testing.hierarchical import _make_hierarchical
>>> y = _make_hierarchical()
>>> y_train, y_test = temporal_train_test_split(y, test_size=0.2)