ltsm.data_provider.tokenizer package

Submodules

ltsm.data_provider.tokenizer.standard_scaler module

class ltsm.data_provider.tokenizer.standard_scaler.StandardScaler[source]

Bases: BaseProcessor

Represents a standard-scaler object that wraps scikit-learn's StandardScaler for data processing.

module_id

The identifier for this processor.

Type:

str

inverse_process(data)[source]

Scales the data back to its original representation.

Parameters:

data (np.ndarray) – The data to scale back.

Returns:

The data scaled back to its original representation.

Return type:

np.ndarray

load(save_dir)[source]

Loads a scaler previously saved in the save_dir directory.

Parameters:

save_dir (str) – The directory where the scaler was saved.

module_id = 'standard_scaler'
process(raw_data, train_data, val_data, test_data, fit_train_only=False, do_anomaly=False)[source]

Standardizes the training, validation, and test sets by removing the mean and scaling to unit variance.

Parameters:
  • raw_data (np.ndarray) – The raw data.

  • train_data (List[np.ndarray]) – The list of training sequences.

  • val_data (List[np.ndarray]) – The list of validation sequences.

  • test_data (List[np.ndarray]) – The list of test sequences.

  • fit_train_only (bool) – If True, the scaler is fit on the training data only; otherwise it is fit on the full dataset.

Returns:

A tuple of three lists containing the processed training, validation, and test data.

Return type:

Tuple[List[np.ndarray], List[np.ndarray], List[np.ndarray]]

save(save_dir)[source]

Saves the scaler to the save_dir directory as a pickle file named processor.pkl.

Parameters:

save_dir (str) – The directory in which to store the scaler.
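The process/inverse_process pair above amounts to the familiar z-score transform: fit a mean and standard deviation, standardize every split with them, and invert the transform to recover original values. A minimal numpy sketch of the idea (an illustration only, not the library's implementation, which delegates to scikit-learn's StandardScaler):

```python
import numpy as np

def fit_scaler(train_seqs):
    # Fit mean and std on the concatenated training sequences.
    stacked = np.concatenate(train_seqs)
    return stacked.mean(), stacked.std()

def process(seqs, mean, std):
    # Standardize: remove the mean and scale to unit variance.
    return [(s - mean) / std for s in seqs]

def inverse_process(seqs, mean, std):
    # Scale back to the original representation.
    return [s * std + mean for s in seqs]

train = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]
mean, std = fit_scaler(train)
scaled = process(train, mean, std)
restored = inverse_process(scaled, mean, std)
```

With fit_train_only=True, validation and test splits would be transformed with the statistics fit on the training data alone, mirroring the `process` call above.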

ltsm.data_provider.tokenizer.tokenizer_processor module

class ltsm.data_provider.tokenizer.tokenizer_processor.ChronosTokenizer[source]

Bases: object

A ChronosTokenizer defines how time series are mapped into token IDs and back.

For details, see the input_transform and output_transform methods, which concrete classes must implement.

input_transform(context)[source]

Turn a batch of time series into token IDs, attention map, and scale.

Parameters:

context (Tensor) – A tensor shaped (batch_size, time_length), containing the time series to forecast. Use left-padding with torch.nan to align time series of different lengths.

Return type:

Tuple[Tensor, Tensor, Any]

Returns:

  • token_ids – A tensor of integers, shaped (batch_size, time_length + 1) if config.use_eos_token and (batch_size, time_length) otherwise, containing token IDs for the input series.

  • attention_mask – A boolean tensor, same shape as token_ids, indicating which input observations are not torch.nan (i.e. neither missing nor padding).

  • tokenizer_state – An object that will be passed to output_transform. Contains the relevant context to decode output samples into real values, such as location and scale parameters.

output_transform(samples, tokenizer_state)[source]

Turn a batch of sample token IDs into real values.

Parameters:
  • samples (Tensor) – A tensor of integers, shaped (batch_size, num_samples, time_length), containing token IDs of sample trajectories.

  • tokenizer_state (Any) – An object returned by input_transform containing relevant context to decode samples, such as location and scale. The nature of this depends on the specific tokenizer.

Returns:

forecasts – A real tensor, shaped (batch_size, num_samples, time_length), containing forecasted sample paths.

Return type:

Tensor

class ltsm.data_provider.tokenizer.tokenizer_processor.MeanScaleUniformBins(low_limit, high_limit, config)[source]

Bases: ChronosTokenizer

input_transform(context)[source]

Turn a batch of time series into token IDs, attention map, and scale.

Parameters:

context (Tensor) – A tensor shaped (batch_size, time_length), containing the time series to forecast. Use left-padding with torch.nan to align time series of different lengths.

Return type:

Tuple[Tensor, Tensor, Tensor]

Returns:

  • token_ids – A tensor of integers, shaped (batch_size, time_length + 1) if config.use_eos_token and (batch_size, time_length) otherwise, containing token IDs for the input series.

  • attention_mask – A boolean tensor, same shape as token_ids, indicating which input observations are not torch.nan (i.e. neither missing nor padding).

  • tokenizer_state – An object that will be passed to output_transform. Contains the relevant context to decode output samples into real values, such as location and scale parameters.

output_transform(samples, scale)[source]

Turn a batch of sample token IDs into real values.

Parameters:
  • samples (Tensor) – A tensor of integers, shaped (batch_size, num_samples, time_length), containing token IDs of sample trajectories.

  • scale (Tensor) – The scale tensor returned by input_transform, used to decode samples back into real values.

Returns:

forecasts – A real tensor, shaped (batch_size, num_samples, time_length), containing forecasted sample paths.

Return type:

Tensor
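The mean-scale uniform-binning technique behind this class can be sketched in a few lines of numpy: scale each series by the mean of its absolute values, bucketize the scaled values into uniformly spaced bins, and decode by mapping token IDs back through the bin centers and multiplying by the scale. This is an illustration of the technique only; the library operates on torch tensors and additionally handles padding, EOS tokens, and special-token offsets:

```python
import numpy as np

def input_transform(context, low=-10.0, high=10.0, n_bins=100):
    # Scale each series by the mean of its absolute values.
    scale = np.abs(context).mean(axis=1, keepdims=True)
    scale[scale == 0] = 1.0
    scaled = context / scale
    # Bucketize scaled values into uniform bins over [low, high].
    edges = np.linspace(low, high, n_bins + 1)
    token_ids = np.clip(np.digitize(scaled, edges) - 1, 0, n_bins - 1)
    attention_mask = ~np.isnan(context)
    return token_ids, attention_mask, scale

def output_transform(samples, scale, low=-10.0, high=10.0, n_bins=100):
    # Decode token IDs back to real values via bin centers.
    edges = np.linspace(low, high, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[samples] * scale[:, None, :]

context = np.array([[1.0, 2.0, 3.0]])
token_ids, mask, scale = input_transform(context)
# Treat the tokenized input as a single sample trajectory and decode it.
decoded = output_transform(token_ids[:, None, :], scale)
```

Round-tripping a series through the two transforms recovers it up to quantization error of at most half a bin width times the scale.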

class ltsm.data_provider.tokenizer.tokenizer_processor.TokenizerConfig(tokenizer_class, tokenizer_kwargs, n_tokens, n_special_tokens, pad_token_id, eos_token_id, use_eos_token, model_type, context_length, prediction_length, num_samples, temperature, top_k, top_p)[source]

Bases: object

This class holds all the configuration parameters to be used by ChronosTokenizer and ChronosModel.

context_length: int
create_tokenizer()[source]
Return type:

ChronosTokenizer

eos_token_id: int
model_type: Literal['causal', 'seq2seq']
n_special_tokens: int
n_tokens: int
num_samples: int
pad_token_id: int
prediction_length: int
temperature: float
tokenizer_class: str
tokenizer_kwargs: Dict[str, Any]
top_k: int
top_p: float
use_eos_token: bool
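A hedged sketch of the config-plus-factory pattern that create_tokenizer implies: hold the tokenizer class name and its constructor kwargs in the config, then instantiate the named class on demand. The field names follow the class above, but the registry, the DummyTokenizer class, and the construction logic are illustrative assumptions, not the library's code:

```python
from dataclasses import dataclass
from typing import Any, Dict

# Stand-in tokenizer class and registry; the real library resolves
# tokenizer_class names such as "MeanScaleUniformBins".
class DummyTokenizer:
    def __init__(self, low_limit, high_limit, config):
        self.low_limit = low_limit
        self.high_limit = high_limit
        self.config = config

TOKENIZER_CLASSES = {"DummyTokenizer": DummyTokenizer}

@dataclass
class TokenizerConfig:
    tokenizer_class: str
    tokenizer_kwargs: Dict[str, Any]
    n_tokens: int
    use_eos_token: bool

    def create_tokenizer(self):
        # Instantiate the named tokenizer class with its kwargs,
        # passing the config along for shared parameters.
        cls = TOKENIZER_CLASSES[self.tokenizer_class]
        return cls(**self.tokenizer_kwargs, config=self)

config = TokenizerConfig(
    tokenizer_class="DummyTokenizer",
    tokenizer_kwargs={"low_limit": -15.0, "high_limit": 15.0},
    n_tokens=4096,
    use_eos_token=True,
)
tokenizer = config.create_tokenizer()
```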

Module contents

ltsm.data_provider.tokenizer.register_processor(module)[source]
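register_processor presumably adds a processor class to a registry keyed by its module_id (e.g. 'standard_scaler'), so processors can be looked up by name. A minimal sketch of that pattern, assuming a dict-based registry; the registry name and decorator usage here are illustrative, not the library's internals:

```python
# Hypothetical module-level registry mapping module_id -> class.
processor_registry = {}

def register_processor(module):
    # Map the class's module_id to the class itself.
    processor_registry[module.module_id] = module
    return module  # returning it allows use as a class decorator

@register_processor
class StandardScaler:
    module_id = "standard_scaler"

cls = processor_registry["standard_scaler"]
```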