ltsm.data_provider.tokenizer package
Submodules
ltsm.data_provider.tokenizer.standard_scaler module
- class ltsm.data_provider.tokenizer.standard_scaler.StandardScaler[source]
Bases:
BaseProcessor
Represents a Standard Scaler object that wraps scikit-learn's StandardScaler for data processing.
- module_id
The identifier for base processor objects.
- Type:
str
- inverse_process(data)[source]
Scales back the data to its original representation.
- Parameters:
data (np.ndarray) – The data to scale back.
- Returns:
The scaled back data.
- Return type:
np.ndarray
- load(save_dir)[source]
Loads the scaler saved at the save_dir directory.
- Parameters:
save_dir (str) – The directory where the scaler was saved.
- module_id = 'standard_scaler'
- process(raw_data, train_data, val_data, test_data, fit_train_only=False, do_anomaly=False)[source]
Standardizes the training, validation, and test sets by removing the mean and scaling to unit variance.
- Parameters:
raw_data (np.ndarray) – The raw data.
train_data (List[np.ndarray]) – The list of training sequences.
val_data (List[np.ndarray]) – The list of validation sequences.
test_data (List[np.ndarray]) – The list of test sequences.
fit_train_only (bool) – If True, the scaler is fit on the training data only, and the same statistics are applied to all splits.
- Returns:
A tuple of three lists containing the processed training, validation, and test data.
- Return type:
Tuple[List[np.ndarray], List[np.ndarray], List[np.ndarray]]
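The scaling semantics described above can be sketched with plain NumPy. This is an illustrative sketch, not the class's implementation: it assumes fit_train_only=True behavior, where statistics come from the training sequences only and every split is transformed with those same statistics (matching scikit-learn's StandardScaler, which the class wraps).

```python
import numpy as np

# Hypothetical splits; each split is a list of sequences, as in process().
train_data = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]
val_data = [np.array([2.5, 3.5])]

# Fit statistics on the training sequences only.
stacked = np.concatenate(train_data)
mean, std = stacked.mean(), stacked.std()

# Standardize every split with the training statistics.
train_scaled = [(seq - mean) / std for seq in train_data]
val_scaled = [(seq - mean) / std for seq in val_data]

# inverse_process corresponds to undoing the transform:
restored = train_scaled[0] * std + mean
```

After scaling, the concatenated training data has zero mean and unit variance, and the inverse transform recovers the original values exactly.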
ltsm.data_provider.tokenizer.tokenizer_processor module
- class ltsm.data_provider.tokenizer.tokenizer_processor.ChronosTokenizer[source]
Bases:
object
A ChronosTokenizer defines how time series are mapped into token IDs and back. For details, see the input_transform and output_transform methods, which concrete classes must implement.
- input_transform(context)[source]
Turn a batch of time series into token IDs, attention map, and scale.
- Parameters:
context (Tensor) – A tensor shaped (batch_size, time_length), containing the time series to forecast. Use left-padding with torch.nan to align time series of different lengths.
- Return type:
Tuple[Tensor, Tensor, Any]
- Returns:
token_ids – A tensor of integers, shaped (batch_size, time_length + 1) if config.use_eos_token and (batch_size, time_length) otherwise, containing token IDs for the input series.
attention_mask – A boolean tensor, same shape as token_ids, indicating which input observations are not torch.nan (i.e. not missing nor padding).
tokenizer_state – An object that will be passed to output_transform. Contains the relevant context to decode output samples into real values, such as location and scale parameters.
- output_transform(samples, tokenizer_state)[source]
Turn a batch of sample token IDs into real values.
- Parameters:
samples (Tensor) – A tensor of integers, shaped (batch_size, num_samples, time_length), containing token IDs of sample trajectories.
tokenizer_state (Any) – An object returned by input_transform containing relevant context to decode samples, such as location and scale. The nature of this depends on the specific tokenizer.
- Returns:
forecasts – A real tensor, shaped (batch_size, num_samples, time_length), containing forecasted sample paths.
- Return type:
Tensor
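The input_transform / output_transform contract above can be illustrated with a minimal hypothetical subclass. This sketch is not part of the library: it uses NumPy arrays in place of torch tensors to stay dependency-light, and a naive fixed-edge quantizer instead of a real tokenization scheme, but it follows the same interface (token IDs plus attention mask plus state out, real values back in).

```python
import numpy as np

class NaiveBinTokenizer:
    """Hypothetical tokenizer following the ChronosTokenizer contract,
    mapping values to uniform-bin token IDs and back to bin centers."""

    def __init__(self, n_bins=10, low=-5.0, high=5.0):
        self.edges = np.linspace(low, high, n_bins + 1)

    def input_transform(self, context):
        # NaN left-padding marks missing observations; mask them out.
        attention_mask = ~np.isnan(context)
        filled = np.nan_to_num(context, nan=0.0)
        token_ids = np.digitize(filled, self.edges)  # integer token IDs
        tokenizer_state = None  # this toy scheme needs no scale information
        return token_ids, attention_mask, tokenizer_state

    def output_transform(self, samples, tokenizer_state):
        # Decode each token ID to the center of its bin.
        centers = (self.edges[:-1] + self.edges[1:]) / 2
        idx = np.clip(samples - 1, 0, len(centers) - 1)
        return centers[idx]

tok = NaiveBinTokenizer()
ctx = np.array([[np.nan, 0.1, 1.2, -0.7]])     # (batch_size, time_length)
ids, mask, state = tok.input_transform(ctx)
vals = tok.output_transform(ids[None, :, :], state)  # (batch, samples, time)
```

The round trip is lossy up to the bin width, which is why real tokenizers such as MeanScaleUniformBins carry a scale through tokenizer_state.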
- class ltsm.data_provider.tokenizer.tokenizer_processor.MeanScaleUniformBins(low_limit, high_limit, config)[source]
Bases:
ChronosTokenizer
- input_transform(context)[source]
Turn a batch of time series into token IDs, attention map, and scale.
- Parameters:
context (Tensor) – A tensor shaped (batch_size, time_length), containing the time series to forecast. Use left-padding with torch.nan to align time series of different lengths.
- Return type:
Tuple[Tensor, Tensor, Tensor]
- Returns:
token_ids – A tensor of integers, shaped (batch_size, time_length + 1) if config.use_eos_token and (batch_size, time_length) otherwise, containing token IDs for the input series.
attention_mask – A boolean tensor, same shape as token_ids, indicating which input observations are not torch.nan (i.e. not missing nor padding).
tokenizer_state – An object that will be passed to output_transform. Contains the relevant context to decode output samples into real values, such as location and scale parameters.
- output_transform(samples, scale)[source]
Turn a batch of sample token IDs into real values.
- Parameters:
samples (Tensor) – A tensor of integers, shaped (batch_size, num_samples, time_length), containing token IDs of sample trajectories.
scale (Tensor) – The scale returned by input_transform, used to map samples back to real values.
- Returns:
forecasts – A real tensor, shaped (batch_size, num_samples, time_length), containing forecasted sample paths.
- Return type:
Tensor
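The mean-scale-plus-uniform-bins idea behind this class can be sketched in plain NumPy: each series is divided by the mean of its absolute values, the scaled values are snapped to uniformly spaced bins, and decoding maps token IDs back to bin centers and multiplies by the stored scale. The bin limits and bin count below are illustrative assumptions, not the class's actual defaults (those come from low_limit, high_limit, and the config).

```python
import numpy as np

def mean_scale_tokenize(context, low=-15.0, high=15.0, n_bins=4094):
    # Per-series scale: mean absolute value (guard against all-zero series).
    scale = np.nanmean(np.abs(context), axis=-1, keepdims=True)
    scale = np.where(scale == 0.0, 1.0, scale)
    scaled = context / scale
    edges = np.linspace(low, high, n_bins + 1)
    token_ids = np.digitize(np.nan_to_num(scaled), edges)
    return token_ids, scale

def mean_scale_detokenize(token_ids, scale, low=-15.0, high=15.0, n_bins=4094):
    # Decode token IDs to bin centers, then undo the mean-abs scaling.
    edges = np.linspace(low, high, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    idx = np.clip(token_ids - 1, 0, n_bins - 1)
    return centers[idx] * scale

series = np.array([[10.0, 20.0, 30.0]])
ids, scale = mean_scale_tokenize(series)
approx = mean_scale_detokenize(ids, scale)
```

Because decoding returns bin centers, the reconstruction is exact only up to half a bin width times the series scale, which is what makes a large bin count useful.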
- class ltsm.data_provider.tokenizer.tokenizer_processor.TokenizerConfig(tokenizer_class, tokenizer_kwargs, n_tokens, n_special_tokens, pad_token_id, eos_token_id, use_eos_token, model_type, context_length, prediction_length, num_samples, temperature, top_k, top_p)[source]
Bases:
object
This class holds all the configuration parameters to be used by ChronosTokenizer and ChronosModel.
- context_length: int
- eos_token_id: int
- model_type: Literal['causal', 'seq2seq']
- n_special_tokens: int
- n_tokens: int
- num_samples: int
- pad_token_id: int
- prediction_length: int
- temperature: float
- tokenizer_class: str
- tokenizer_kwargs: Dict[str, Any]
- top_k: int
- top_p: float
- use_eos_token: bool
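Constructing a configuration from these fields can be sketched as follows. The dataclass below is a local stand-in mirroring the fields listed above (the real class lives in ltsm.data_provider.tokenizer.tokenizer_processor), and every value passed in, including the tokenizer_kwargs keys, is a hypothetical example rather than a documented default.

```python
from dataclasses import dataclass
from typing import Any, Dict, Literal

@dataclass
class TokenizerConfig:
    """Stand-in mirroring the documented TokenizerConfig fields."""
    tokenizer_class: str
    tokenizer_kwargs: Dict[str, Any]
    n_tokens: int
    n_special_tokens: int
    pad_token_id: int
    eos_token_id: int
    use_eos_token: bool
    model_type: Literal['causal', 'seq2seq']
    context_length: int
    prediction_length: int
    num_samples: int
    temperature: float
    top_k: int
    top_p: float

# Example values only; choose these to match your model and data.
config = TokenizerConfig(
    tokenizer_class="MeanScaleUniformBins",
    tokenizer_kwargs={"low_limit": -15.0, "high_limit": 15.0},
    n_tokens=4096,
    n_special_tokens=2,
    pad_token_id=0,
    eos_token_id=1,
    use_eos_token=True,
    model_type="seq2seq",
    context_length=512,
    prediction_length=64,
    num_samples=20,
    temperature=1.0,
    top_k=50,
    top_p=1.0,
)
```

Grouping the sampling knobs (num_samples, temperature, top_k, top_p) with the tokenizer fields in one config keeps tokenization and generation settings consistent across a run.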