ltsm.data_provider package
Subpackages
- ltsm.data_provider.tokenizer package
- Submodules
- ltsm.data_provider.tokenizer.standard_scaler module
- ltsm.data_provider.tokenizer.tokenizer_processor module
ChronosTokenizer
MeanScaleUniformBins
TokenizerConfig
TokenizerConfig.context_length
TokenizerConfig.create_tokenizer()
TokenizerConfig.eos_token_id
TokenizerConfig.model_type
TokenizerConfig.n_special_tokens
TokenizerConfig.n_tokens
TokenizerConfig.num_samples
TokenizerConfig.pad_token_id
TokenizerConfig.prediction_length
TokenizerConfig.temperature
TokenizerConfig.tokenizer_class
TokenizerConfig.tokenizer_kwargs
TokenizerConfig.top_k
TokenizerConfig.top_p
TokenizerConfig.use_eos_token
- Module contents
Submodules
ltsm.data_provider.data_factory module
- class ltsm.data_provider.data_factory.DatasetFactory(data_paths, prompt_data_path, data_processing, seq_len, pred_len, train_ratio, val_ratio, model=None, scale_on_train=False, downsample_rate=10, split_test_sets=True, do_anomaly=False)[source]
Bases:
object
A factory class for time-series datasets.
- createTorchDS(data, prompt_data, downsample_rate, do_anomaly)[source]
Creates a PyTorch Dataset from a list of sequences and a list of their corresponding prompts.
- Parameters:
data (List[np.ndarray]) – A list of sequences.
prompt_data (List[List[np.float64]]) – A list of prompts.
downsample_rate (int) – The downsampling rate.
- Returns:
A time-series dataset.
- Return type:
- fetch(data_path)[source]
Retrieves data from the filesystem at the specified data_path.
Selects the appropriate BaseReader implementation based on the file’s extension or location.
- Parameters:
data_path (str) – The file path to the source data.
- Returns:
A Pandas DataFrame containing the data at data_path.
- Return type:
pd.DataFrame
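The extension-based reader selection described for fetch can be sketched as follows. This is a minimal illustration, not the library's implementation: the reader classes, the READERS registry, and the standalone fetch function are hypothetical stand-ins, and rows are returned as plain dictionaries rather than a pd.DataFrame so that the sketch needs only the standard library.

```python
import csv
import json
import os
import tempfile


class CsvReader:
    """Hypothetical reader: parses a .csv file into a list of row dicts."""

    def read(self, path):
        with open(path, newline="") as f:
            return list(csv.DictReader(f))


class JsonReader:
    """Hypothetical reader: parses a .json file holding a list of row dicts."""

    def read(self, path):
        with open(path) as f:
            return json.load(f)


# Hypothetical registry mapping file extensions to reader classes.
READERS = {".csv": CsvReader, ".json": JsonReader}


def fetch(data_path):
    """Pick a reader from the file extension and return the parsed rows."""
    ext = os.path.splitext(data_path)[1].lower()
    if ext not in READERS:
        raise ValueError(f"No reader registered for extension {ext!r}")
    return READERS[ext]().read(data_path)


# Write a tiny CSV file and read it back through the dispatcher.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "demo.csv")
    with open(csv_path, "w", newline="") as f:
        f.write("t,value\n0,1.5\n1,2.5\n")
    rows = fetch(csv_path)
```

Keeping the extension-to-reader mapping in one registry means a new file format can be supported by adding a single entry rather than editing the dispatch logic.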
- getDatasets()[source]
Loads, splits, and scales the time-series data. Loads the prompts and creates TSDatasets for training, validation, and testing.
- Returns:
A tuple consisting of the time-series datasets for training, validation, and testing. The training and validation datasets each combine all data sources and sequences into a single dataset. Test data is kept separate and is returned as a list of time-series datasets, where each dataset corresponds to one of the data sources.
- Return type:
- loadPrompts(data_path, prompt_data_path, buff)[source]
Loads the prompt data from prompt_data_path.
- Parameters:
data_path (str) – The file path to the source data.
prompt_data_path (str) – The file path to the directory where the prompt data files are stored.
buff (List[Union[int, str]]) – The list of row labels of the data.
- Returns:
A dictionary mapping each data index to its prompt data, together with a list (List[Union[int, str]]) of the indices with missing prompt data.
- Return type:
Dict[Union[int, str], List[np.float64]]
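The pairing of row labels with prompts, including the bookkeeping for labels that have no prompt, can be sketched as below. This is an assumption-laden illustration: the function name load_prompts is hypothetical, and an in-memory dictionary (prompt_store) stands in for the prompt files that the real method reads from prompt_data_path.

```python
def load_prompts(prompt_store, row_labels):
    """Return (prompts found per label, labels that have no prompt)."""
    prompts = {}
    missing = []
    for label in row_labels:
        if label in prompt_store:
            prompts[label] = prompt_store[label]
        else:
            # Track labels without prompt data so the caller can drop them.
            missing.append(label)
    return prompts, missing


# Hypothetical prompt data keyed by row label (int or str).
prompt_store = {0: [0.1, 0.2], "OT": [0.3, 0.4]}
found, missing = load_prompts(prompt_store, [0, "OT", 7])
```

Returning the missing labels separately lets the caller decide whether to skip those sequences or fail loudly.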
ltsm.data_provider.data_loader module
- class ltsm.data_provider.data_loader.Dataset_Custom(data_path, split='train', size=None, features='S', target='OT', scale=True, timeenc=0, freq='h', percent=10, max_len=-1, train_all=False)[source]
Bases:
Dataset
- class ltsm.data_provider.data_loader.Dataset_Custom_List(data_path=[], split='train', size=None, features='M', target='OT', scale=True, timeenc=0, freq='h', percent=10, max_len=-1, train_all=False)[source]
Bases:
Dataset
- class ltsm.data_provider.data_loader.Dataset_Custom_List_TS(data_path=[], split='train', size=None, features='M', target='OT', scale=True, timeenc=0, freq='h', percent=10, max_len=-1, train_all=False)[source]
Bases:
Dataset
- class ltsm.data_provider.data_loader.Dataset_Custom_List_TS_TSF(data_path=[], split='train', size=None, features='M', target='OT', scale=True, timeenc=0, freq='h', percent=10, max_len=-1, train_all=False)[source]
Bases:
Dataset
- class ltsm.data_provider.data_loader.Dataset_ETT_hour(data_path, split='train', size=None, features='S', target='OT', scale=True, timeenc=0, freq='h', percent=100, max_len=-1, train_all=False)[source]
Bases:
Dataset
- class ltsm.data_provider.data_loader.Dataset_ETT_minute(data_path, split='train', size=None, features='S', target='OT', scale=True, timeenc=0, freq='t', percent=100, max_len=-1, train_all=False)[source]
Bases:
Dataset
- class ltsm.data_provider.data_loader.Dataset_Pred(data_path, split='pred', size=None, features='S', target='OT', scale=True, inverse=False, timeenc=0, freq='15min', cols=None, percent=None, train_all=False)[source]
Bases:
Dataset
- class ltsm.data_provider.data_loader.Dataset_TSF(data_path, split='train', size=None, features='S', target='OT', scale=True, timeenc=0, freq='Daily', percent=10, max_len=-1, train_all=False)[source]
Bases:
Dataset
ltsm.data_provider.data_splitter module
- class ltsm.data_provider.data_splitter.SplitterByTimestamp(seq_len, pred_len, train_ratio, val_ratio)[source]
Bases:
DataSplitter
Data splitter class that splits time-series data by timestamp.
- get_csv_splits(df_data, do_anomaly=False)[source]
Splits the .csv data into training, validation, and test sets.
- Parameters:
df_data (pd.DataFrame) – A Pandas DataFrame containing the data to be split.
- Returns:
A tuple containing four lists. The first three hold the sequences for the training, validation, and test sets. The last list contains the row labels of these sequences.
- Return type:
Tuple[List[np.ndarray], List[np.ndarray], List[np.ndarray], List[np.ndarray]]
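A chronological split driven by train_ratio and val_ratio can be sketched as follows. The function split_by_time is hypothetical and deliberately simplified: it ignores seq_len and pred_len, so it does not show any context overlap between adjacent splits that the real splitter may apply.

```python
def split_by_time(series, train_ratio, val_ratio):
    """Split one sequence into train/val/test segments in time order."""
    n = len(series)
    train_end = int(n * train_ratio)
    val_end = train_end + int(n * val_ratio)
    # Earlier timestamps go to training, later ones to validation and test.
    return series[:train_end], series[train_end:val_end], series[val_end:]


series = list(range(8))
train, val, test = split_by_time(series, train_ratio=0.5, val_ratio=0.25)
```

Splitting by position rather than by random sampling keeps future observations out of the training segment.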
ltsm.data_provider.dataset module
- class ltsm.data_provider.dataset.TSDataset(data, seq_len, pred_len, do_anomaly=False)[source]
Bases:
Dataset
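The source gives no description for TSDataset beyond its signature. One common way to use seq_len and pred_len is a sliding window that pairs each input window with the values that follow it. The sketch below assumes that behaviour; the class WindowDataset is hypothetical, uses plain Python lists instead of a PyTorch Dataset, and omits do_anomaly.

```python
class WindowDataset:
    """Hypothetical sliding-window dataset over a list of sequences."""

    def __init__(self, data, seq_len, pred_len):
        self.data = data
        self.seq_len = seq_len
        self.pred_len = pred_len
        # Index every (sequence, start) pair that leaves room for a full window.
        self.index = []
        for seq_id, seq in enumerate(data):
            n_windows = len(seq) - seq_len - pred_len + 1
            for start in range(max(n_windows, 0)):
                self.index.append((seq_id, start))

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        seq_id, start = self.index[i]
        seq = self.data[seq_id]
        mid = start + self.seq_len
        # Input window, followed by the prediction target.
        return seq[start:mid], seq[mid:mid + self.pred_len]


# The second sequence is too short for a full window and contributes nothing.
ds = WindowDataset([[0, 1, 2, 3, 4, 5], [10, 11, 12]], seq_len=3, pred_len=2)
first_input, first_target = ds[0]
```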
ltsm.data_provider.hf_train_data_loader module
ltsm.data_provider.prompt_generator module
- ltsm.data_provider.prompt_generator.data_import(path, root_path, format='feather', anomaly=False)[source]
- ltsm.data_provider.prompt_generator.load_data(data_path, save_format)[source]
Loads the prompt data in the given format from the input path. The data should be a pd.Series.
- Parameters:
data_path (str) – The input path.
save_format (str) – The format in which the data was saved.
- ltsm.data_provider.prompt_generator.mean_std_export_ds(root_path, output_path, data_path_buf, normalize_param_fname, save_format='pth.tar')[source]
Exports the mean and standard deviation of the prompt data to the output path.
- Parameters:
root_path (str) – The root path of the input.
output_path (str) – The output path.
data_path_buf (list) – The list of input paths.
normalize_param_fname (str) – The output path.
save_format (str) – The format of the saved data.
- ltsm.data_provider.prompt_generator.prompt_generate_split(root_path, output_path, save_format, dataset_name=None, ifTest=False)[source]
Generates prompt data for the input time-series data.
- Parameters:
root_path (str) – Path to the dataset.
output_path (str) – Path to save the prompt data.
save_format (str) – Format to save the prompt data.
dataset_name (Optional[str]) – Name of the dataset.
ifTest (bool) – If True, tests whether the saved prompt data can be loaded back. Can be used while generating data.
- Return type:
None
- ltsm.data_provider.prompt_generator.prompt_generation(ts, ts_name)[source]
Generates prompt data for the input time-series data.
- Parameters:
ts (pd.DataFrame) – Input time-series data.
ts_name (str) – Name of the time-series data.
- ltsm.data_provider.prompt_generator.prompt_generation_single(ts)[source]
Generates prompt data for the input time-series data.
- Parameters:
ts (pd.Series) – Input time-series data.
- ltsm.data_provider.prompt_generator.prompt_normalization_split(mode, save_format, root_path_train, output_path_train, root_path_val, output_path_val, root_path_test, output_path_test, dataset_root_path, dataset_name=None)[source]
Normalizes the prompt data for the input time-series data.
- Parameters:
mode (str) – Mode to run, “fit” or “transform”.
save_format (str) – Format to save the prompt data.
root_path_train (str) – Path to the train dataset.
output_path_train (str) – Path to save the train prompt data.
root_path_val (str) – Path to the val dataset.
output_path_val (str) – Path to save the val prompt data.
root_path_test (str) – Path to the test dataset.
output_path_test (str) – Path to save the test prompt data.
dataset_root_path (str) – Path to the dataset root.
dataset_name (Optional[str]) – Name of the dataset.
- Return type:
None
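The two modes, “fit” and “transform”, suggest a two-step normalization: estimate parameters once, then apply them to every split. The sketch below assumes mean and standard deviation as the parameters (in line with mean_std_export_ds and standardscale_export) and works on in-memory lists. The functions fit and transform are hypothetical and do not reproduce the file handling of the real function.

```python
import statistics


def fit(train_values):
    """Estimate normalization parameters from the training split only."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values)
    # Guard against division by zero for constant features.
    return {"mean": mean, "std": std if std > 0 else 1.0}


def transform(values, params):
    """Standardize values with previously fitted parameters."""
    return [(v - params["mean"]) / params["std"] for v in values]


train = [2.0, 4.0, 6.0, 8.0]
val = [5.0, 9.0]

params = fit(train)                  # "fit" step
train_norm = transform(train, params)  # "transform" step, reused per split
val_norm = transform(val, params)
```

Fitting on the training split alone and reusing the parameters for validation and test keeps statistics from later data out of the normalization.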
- ltsm.data_provider.prompt_generator.prompt_save(prompt_buf, output_path, data_name, save_format='pth.tar', ifTest=False)[source]
Saves prompts to three different files in the output path.
- Parameters:
prompt_buf (dict) – Dictionary containing prompts for the train, val, and test splits.
output_path (str) – Path to save the prompt data.
data_name (str) – Name of the dataset.
save_format (str) – Format to save the prompt data.
ifTest (bool) – If True, tests whether the saved prompt data can be loaded back. Can be used while generating data.
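Writing one file per split, with an optional load-back check in the spirit of ifTest, can be sketched as follows. Everything here is an assumption made for illustration: the function save_prompt_splits, the directory layout, the file naming, and the use of pickle are hypothetical and are not taken from the library.

```python
import os
import pickle
import tempfile


def save_prompt_splits(prompt_buf, output_path, data_name, if_test=False):
    """Write one file per split; optionally verify each file loads back."""
    paths = {}
    for split in ("train", "val", "test"):
        split_dir = os.path.join(output_path, split)
        os.makedirs(split_dir, exist_ok=True)
        # Hypothetical naming scheme for the per-split prompt file.
        path = os.path.join(split_dir, f"{data_name}_prompt.pkl")
        with open(path, "wb") as f:
            pickle.dump(prompt_buf[split], f)
        if if_test:
            # Round-trip check: reload and compare with what was written.
            with open(path, "rb") as f:
                loaded = pickle.load(f)
            if loaded != prompt_buf[split]:
                raise ValueError(f"Round trip failed for split {split!r}")
        paths[split] = path
    return paths


prompt_buf = {"train": [0.1, 0.2], "val": [0.3], "test": [0.4]}
with tempfile.TemporaryDirectory() as tmp:
    paths = save_prompt_splits(prompt_buf, tmp, "demo", if_test=True)
    files_exist = all(os.path.exists(p) for p in paths.values())
```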
- ltsm.data_provider.prompt_generator.save_data(data, data_path, save_format)[source]
Saves the final prompt data to the output path.
- Parameters:
data (pd.DataFrame) – The final prompt data.
data_path (str) – The output path.
save_format (str) – The format to save the data.
- ltsm.data_provider.prompt_generator.standardscale_export(data_path_buf, params_fname, output_path, root_path, save_format='pth.tar')[source]
Exports the standardized prompt data to the output path.
- Parameters:
data_path_buf (list) – The list of input paths.
params_fname (str) – The output path of the mean and std.
output_path (str) – The output path of the standardized prompt data.
root_path (str) – The root path of the input.