ltsm.data_provider.tokenizer package
Submodules
ltsm.data_provider.tokenizer.standard_scaler module
- class ltsm.data_provider.tokenizer.standard_scaler.StandardScaler[source]
Bases: BaseProcessor
Represents a Standard Scaler object that uses scikit-learn's StandardScaler for data processing.
- module_id
The identifier for base processor objects.
- Type:
str
- inverse_process(data)[source]
Scales back the data to its original representation.
- Parameters:
data (np.ndarray) – The data to scale back.
- Returns:
The scaled back data.
- Return type:
np.ndarray
- load(save_dir)[source]
Loads the scaler saved at the save_dir directory.
- Parameters:
save_dir (str) – The directory the scaler was saved.
- module_id = 'standard_scaler'
- process(raw_data, train_data, val_data, test_data, fit_train_only=False, do_anomaly=False)[source]
Standardizes the training, validation, and test sets by removing the mean and scaling to unit variance.
- Parameters:
raw_data (np.ndarray) – The raw data.
train_data (List[np.ndarray]) – The list of training sequences.
val_data (List[np.ndarray]) – The list of validation sequences.
test_data (List[np.ndarray]) – The list of test sequences.
fit_train_only (bool) – If True, the scaler is fit on the training data only and then applied to all three splits.
- Returns:
A tuple of three lists containing the processed training, validation, and test data.
- Return type:
Tuple[List[np.ndarray], List[np.ndarray], List[np.ndarray]]
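To illustrate the process/inverse_process contract, here is a minimal NumPy sketch of the same idea: fit mean and standard deviation on the training split, standardize every split, and invert the transform. It is a simplified stand-in (the raw_data and do_anomaly arguments are omitted, and the real class delegates to scikit-learn's StandardScaler), not the library implementation.

```python
import numpy as np

class StandardScalerSketch:
    """Sketch of the StandardScaler processor: fit on the training
    sequences, standardize all splits, and invert the transform."""

    def process(self, train_data, val_data, test_data, fit_train_only=True):
        # Fit mean/std on the concatenated training sequences only.
        stacked = np.concatenate(train_data)
        self.mean_ = stacked.mean()
        self.std_ = stacked.std()
        scale = lambda seqs: [(s - self.mean_) / self.std_ for s in seqs]
        return scale(train_data), scale(val_data), scale(test_data)

    def inverse_process(self, data):
        # Scale back to the original representation.
        return data * self.std_ + self.mean_

scaler = StandardScalerSketch()
train, val, test = scaler.process(
    [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])],
    [np.array([2.0, 3.0])],
    [np.array([3.0, 4.0])],
)
restored = scaler.inverse_process(train[0])
```

After processing, the concatenated training data has zero mean and unit variance, and inverse_process recovers the original values exactly.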
ltsm.data_provider.tokenizer.tokenizer_processor module
- class ltsm.data_provider.tokenizer.tokenizer_processor.ChronosTokenizer[source]
Bases: object
A ChronosTokenizer defines how time series are mapped into token IDs and back. For details, see the input_transform and output_transform methods, which concrete classes must implement.
- input_transform(context)[source]
Turn a batch of time series into token IDs, attention mask, and scale.
- Parameters:
context (Tensor) – A tensor shaped (batch_size, time_length), containing the time series to forecast. Use left-padding with torch.nan to align time series of different lengths.
- Return type:
Tuple[Tensor, Tensor, Any]
- Returns:
token_ids – A tensor of integers, shaped (batch_size, time_length + 1) if config.use_eos_token and (batch_size, time_length) otherwise, containing token IDs for the input series.
attention_mask – A boolean tensor, same shape as token_ids, indicating which input observations are not torch.nan (i.e. not missing nor padding).
tokenizer_state – An object that will be passed to output_transform. Contains the relevant context to decode output samples into real values, such as location and scale parameters.
- output_transform(samples, tokenizer_state)[source]
Turn a batch of sample token IDs into real values.
- Parameters:
samples (Tensor) – A tensor of integers, shaped (batch_size, num_samples, time_length), containing token IDs of sample trajectories.
tokenizer_state (Any) – An object returned by input_transform containing relevant context to decode samples, such as location and scale. The nature of this depends on the specific tokenizer.
- Returns:
A real tensor, shaped (batch_size, num_samples, time_length), containing forecasted sample paths.
- Return type:
forecasts
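The interface contract above can be sketched with a toy concrete tokenizer. The class below is hypothetical (the real subclasses, such as MeanScaleUniformBins, operate on torch tensors); it uses NumPy only to show how input_transform produces token IDs plus an attention mask that flags non-NaN observations, and how output_transform decodes token IDs back to values.

```python
import numpy as np

class IdentityTokenizer:
    """Toy ChronosTokenizer-style class (hypothetical): token IDs are
    rounded values and the tokenizer state carries no information."""

    def input_transform(self, context):
        # attention_mask marks observations that are not NaN
        # (i.e. neither missing nor left-padding).
        attention_mask = ~np.isnan(context)
        token_ids = np.where(attention_mask, np.round(context), 0).astype(int)
        return token_ids, attention_mask, None

    def output_transform(self, samples, tokenizer_state):
        # Decode token IDs back to real values; here a plain cast.
        return samples.astype(float)

tok = IdentityTokenizer()
context = np.array([[np.nan, 1.0, 2.0],   # left-padded shorter series
                    [3.0, 4.0, 5.0]])
ids, mask, state = tok.input_transform(context)
decoded = tok.output_transform(ids, state)
```

The padding position yields attention_mask False, so downstream models can ignore it while still processing fixed-shape batches.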
- class ltsm.data_provider.tokenizer.tokenizer_processor.MeanScaleUniformBins(low_limit, high_limit, config)[source]
Bases: ChronosTokenizer
- input_transform(context)[source]
Turn a batch of time series into token IDs, attention mask, and scale.
- Parameters:
context (Tensor) – A tensor shaped (batch_size, time_length), containing the time series to forecast. Use left-padding with torch.nan to align time series of different lengths.
- Return type:
Tuple[Tensor, Tensor, Tensor]
- Returns:
token_ids – A tensor of integers, shaped (batch_size, time_length + 1) if config.use_eos_token and (batch_size, time_length) otherwise, containing token IDs for the input series.
attention_mask – A boolean tensor, same shape as token_ids, indicating which input observations are not torch.nan (i.e. not missing nor padding).
tokenizer_state – An object that will be passed to output_transform. Contains the relevant context to decode output samples into real values, such as location and scale parameters.
- output_transform(samples, scale)[source]
Turn a batch of sample token IDs into real values.
- Parameters:
samples (Tensor) – A tensor of integers, shaped (batch_size, num_samples, time_length), containing token IDs of sample trajectories.
scale (Tensor) – The scale returned by input_transform, providing the context needed to decode samples back to real values.
- Returns:
A real tensor, shaped (batch_size, num_samples, time_length), containing forecasted sample paths.
- Return type:
forecasts
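The idea behind mean-scale uniform binning can be sketched in a few lines: each series is normalized by the mean of its absolute values, the normalized values are bucketed into uniform bins between low_limit and high_limit, and decoding maps token IDs back through the bin centers and the stored scale. This is a NumPy sketch of the scheme, not the library's torch implementation (which also handles special tokens such as PAD/EOS).

```python
import numpy as np

def mean_scale_uniform_bins(context, low=-15.0, high=15.0, n_bins=100):
    """Sketch: normalize each series by the mean of its absolute
    values, then bucket the result into uniform bins."""
    scale = np.nanmean(np.abs(context), axis=-1, keepdims=True)
    scaled = context / scale
    edges = np.linspace(low, high, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    # digitize returns 1-based bin indices; shift and clip into range.
    token_ids = np.clip(np.digitize(scaled, edges) - 1, 0, n_bins - 1)
    return token_ids, scale, centers

def decode(token_ids, scale, centers):
    # Map token IDs back to real values via bin centers,
    # undoing the mean-abs scaling.
    return centers[token_ids] * scale

context = np.array([[1.0, 2.0, 3.0]])
ids, scale, centers = mean_scale_uniform_bins(context)
approx = decode(ids, scale, centers)
```

Decoding is lossy by at most half a bin width (in scaled units) per value, which is the quantization error inherent to the scheme.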
- class ltsm.data_provider.tokenizer.tokenizer_processor.TokenizerConfig(tokenizer_class, tokenizer_kwargs, n_tokens, n_special_tokens, pad_token_id, eos_token_id, use_eos_token, model_type, context_length, prediction_length, num_samples, temperature, top_k, top_p)[source]
Bases: object
This class holds all the configuration parameters to be used by ChronosTokenizer and ChronosModel.
- context_length: int
- eos_token_id: int
- model_type: Literal['causal', 'seq2seq']
- n_special_tokens: int
- n_tokens: int
- num_samples: int
- pad_token_id: int
- prediction_length: int
- temperature: float
- tokenizer_class: str
- tokenizer_kwargs: Dict[str, Any]
- top_k: int
- top_p: float
- use_eos_token: bool
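A configuration with these fields can be mirrored as a plain dataclass. The sketch below uses only the field names and types documented above; the example values (bin limits, vocabulary size, etc.) are illustrative assumptions, not defaults of the library.

```python
from dataclasses import dataclass
from typing import Any, Dict, Literal

@dataclass
class TokenizerConfigSketch:
    """Sketch mirroring the documented TokenizerConfig fields."""
    tokenizer_class: str
    tokenizer_kwargs: Dict[str, Any]
    n_tokens: int
    n_special_tokens: int
    pad_token_id: int
    eos_token_id: int
    use_eos_token: bool
    model_type: Literal["causal", "seq2seq"]
    context_length: int
    prediction_length: int
    num_samples: int
    temperature: float
    top_k: int
    top_p: float

# Illustrative values only (not library defaults).
config = TokenizerConfigSketch(
    tokenizer_class="MeanScaleUniformBins",
    tokenizer_kwargs={"low_limit": -15.0, "high_limit": 15.0},
    n_tokens=4096,
    n_special_tokens=2,
    pad_token_id=0,
    eos_token_id=1,
    use_eos_token=True,
    model_type="seq2seq",
    context_length=512,
    prediction_length=64,
    num_samples=20,
    temperature=1.0,
    top_k=50,
    top_p=1.0,
)
```

Grouping the tokenizer, model, and sampling parameters in one object lets the tokenizer (e.g. via use_eos_token and pad_token_id) and the sampling loop (temperature, top_k, top_p, num_samples) read from a single shared source.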