Module: data_management
| parse_and_store_ts_data | Parses raw time series data from a single file or archive and loads metadata from header file (if applicable). |
| parse_headerfile | Parse header file containing classes/targets and meta-feature information. |
| parse_ts_data | Parses raw time series data file and returns an (n, 3) array of values. |
| TimeSeries | Class representing a single time series of measurements and metadata. |
parse_and_store_ts_data
- cesium.data_management.parse_and_store_ts_data(data_path, output_dir, header_path=None, cleanup_archive=True, cleanup_header=True, sep=',')
- Parses raw time series data from a single file or archive and loads metadata from a header file (if applicable). Data is stored as files within output_dir, and the list of these paths is returned.
- Parameters:
- data_path : str
- Path to an individual time series file or tarball of multiple time series files to be used for feature generation.
- output_dir : str
- Directory in which time series files will be saved.
- header_path : str, optional
- Path to header file containing file names, labels/targets, and meta_features.
- cleanup_archive : bool, optional
- Boolean specifying whether to delete the uploaded data file/archive (defaults to True).
- cleanup_header : bool, optional
- Boolean specifying whether to delete the uploaded header file (defaults to True).
- sep : str, optional
- Separator of columns in data file; defaults to ','.
 
- Returns:
- List of paths to time series files
 
 
parse_headerfile
- cesium.data_management.parse_headerfile(headerfile_path, files_to_include=None)
- Parse header file containing classes/targets and meta-feature information.
- Parameters:
- headerfile_path : str
- Path to header file.
- files_to_include : list, optional
- If provided, only return the subset of rows from the header corresponding to the given filenames.
 
- Returns:
- pandas.Series
- Class labels/targets from header file (if missing, all values are None) 
- pandas.DataFrame
- Feature data from other columns besides filename, label (can be empty) 
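The split between labels and meta-features can be illustrated with a minimal standard-library sketch. It is not the library's implementation: the function name split_header is hypothetical, the column names filename and label are assumed from the description above, and plain dicts stand in for the pandas.Series and pandas.DataFrame that parse_headerfile returns.

```python
import csv
import io


def split_header(text, files_to_include=None):
    """Split header text into per-file labels and per-file meta-features.

    Hypothetical helper for illustration only; assumes the header has a
    'filename' column and, optionally, a 'label' column.
    """
    labels = {}
    meta_features = {}
    for row in csv.DictReader(io.StringIO(text)):
        name = row.pop("filename")
        # Keep only the requested files, if a subset was given.
        if files_to_include is not None and name not in files_to_include:
            continue
        # A missing label column yields None for every file.
        labels[name] = row.pop("label", None)
        # Whatever columns remain are treated as meta-features.
        meta_features[name] = dict(row)
    return labels, meta_features


header_text = "filename,label,meta1\na.dat,class_A,0.5\nb.dat,class_B,1.5\n"
labels, meta = split_header(header_text, files_to_include=["a.dat"])
```

Values are left as strings here; converting them to numeric types is omitted to keep the sketch short.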
 
 
parse_ts_data
- cesium.data_management.parse_ts_data(filepath, sep=',')
- Parses raw time series data file and returns an (n, 3) array of values.
- Data is expected as text in tabular format with separator sep. The output will always have three columns (time, measurement, error), even if the data file contains two or fewer:
- For data containing three columns (time, measurement, error), all three are returned.
- For data containing two columns (time, measurement), a dummy error column is added with value time_series.DEFAULT_ERROR_VALUE.
- For data containing one column (measurement), a dummy error column is added as above, and a time column is also added with values evenly spaced from 0 to time_series.DEFAULT_MAX_TIME.
- Parameters:
- filepath : str
- Path to raw time series data to be parsed.
- sep : str, optional
- Separator of columns in data file; defaults to ','.
 
- Returns:
- np.ndarray
- 3-column array of (time, measurement, error) values. 
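The column-padding rules above can be sketched in pure Python. This is an illustration under stated assumptions, not the library's code: pad_to_three_columns is a hypothetical name, the two constants are placeholders for time_series.DEFAULT_ERROR_VALUE and time_series.DEFAULT_MAX_TIME (whose actual values are not given here), the evenly spaced times are assumed to include both endpoints, and tuples stand in for the rows of the returned np.ndarray.

```python
import csv
import io

# Placeholder values for illustration; the library defines its own
# constants in time_series.DEFAULT_ERROR_VALUE / DEFAULT_MAX_TIME.
ASSUMED_ERROR_VALUE = 1e-4
ASSUMED_MAX_TIME = 1.0


def pad_to_three_columns(text, sep=","):
    """Return a list of (time, measurement, error) tuples from tabular text."""
    rows = [
        [float(field) for field in row]
        for row in csv.reader(io.StringIO(text), delimiter=sep)
        if row  # skip blank lines
    ]
    if not rows:
        return []
    n_cols = len(rows[0])
    if n_cols >= 3:
        # Time, measurement, and error are all present.
        return [(r[0], r[1], r[2]) for r in rows]
    if n_cols == 2:
        # Time and measurement present; add a dummy error column.
        return [(r[0], r[1], ASSUMED_ERROR_VALUE) for r in rows]
    # Only measurements present; add evenly spaced times and dummy errors.
    n = len(rows)
    step = ASSUMED_MAX_TIME / (n - 1) if n > 1 else 0.0
    return [(i * step, r[0], ASSUMED_ERROR_VALUE) for i, r in enumerate(rows)]


three = pad_to_three_columns("1,2,3\n4,5,6")
two = pad_to_three_columns("0,5\n1,6")
one = pad_to_three_columns("5\n6\n7")
```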
 
 
TimeSeries
- class cesium.data_management.TimeSeries(t=None, m=None, e=None, label=None, meta_features={}, name=None, path=None, channel_names=None)
- Bases: object
- Class representing a single time series of measurements and metadata.
- A TimeSeries object encapsulates a single set of time-domain measurements, along with any metadata describing the observation. Typically the observations will consist of times, measurements, and (optionally) measurement errors. The measurements can be scalar- or vector-valued (i.e., “multichannel”); for multichannel measurements, the times and errors can also be vector-valued, or they can be shared across all channels of measurement.
- Attributes:
- time : (n,) or (p, n) array or list of (n,) arrays
- Array(s) of times corresponding to measurement values. If measurement is two-dimensional, this can be one-dimensional (same times for each channel) or two-dimensional (different times for each channel). If time is one-dimensional then it will be broadcast to match measurement.shape.
- measurement : (n,) or (p, n) array or list of (n,) arrays
- Array(s) of measurement values; can be two-dimensional for multichannel data. In the case of multichannel data with different numbers of measurements for each channel, measurement will be a list of arrays instead of a single two-dimensional array.
- error : (n,) or (p, n) array or list of (n,) arrays
- Array(s) of measurement errors for each value. If measurement is two-dimensional, this can be one-dimensional (same errors for each channel) or two-dimensional (different errors for each channel). If error is one-dimensional then it will be broadcast to match measurement.shape.
- label : str, float, or None
- Class label or regression target for the given time series (if applicable).
- meta_features : dict
- Dictionary of feature names/values specified independently of the featurization process in featurize.
- name : str or None
- Identifying name for the given time series (if applicable). Typically the name of the raw data file from which the time series was created.
- path : str or None
- Path to the file where the time series is stored on disk (if applicable).
- channel_names : list of str
- List of names of channels of measurement; by default these are simply channel_{i}, but can be arbitrary depending on the nature of the different measurement channels.
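The broadcasting of shared times or errors across channels can be pictured with a small pure-Python sketch. The helper broadcast_to_channels is hypothetical and uses nested lists in place of NumPy arrays; it only illustrates the shape rule described above, not how TimeSeries implements it.

```python
def broadcast_to_channels(values, measurement):
    """Repeat a 1-D sequence once per channel of a 2-D measurement.

    Hypothetical helper for illustration; nested lists stand in for arrays.
    """
    is_multichannel = bool(measurement) and isinstance(
        measurement[0], (list, tuple)
    )
    values_is_1d = not (values and isinstance(values[0], (list, tuple)))
    if is_multichannel and values_is_1d:
        # One shared copy of the values for each channel.
        return [list(values) for _ in measurement]
    # Single-channel data, or values already given per channel.
    return values


shared_times = [0.0, 1.0, 2.0]
two_channel_measurement = [[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
times = broadcast_to_channels(shared_times, two_channel_measurement)
```

After the call, times has one row per channel, so it has the same (p, n) shape as the measurement.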
 
- Methods
- channels(): Iterates over measurement channels (whether one or multiple).
- save([path]): Store TimeSeries object as a single .npz file.
- sort(): Sort times, measurements, and errors by time.
- __init__(t=None, m=None, e=None, label=None, meta_features={}, name=None, path=None, channel_names=None)
- Create a TimeSeries object from measurement values/metadata.
- See TimeSeries documentation for parameter values.
 - channels()
- Iterates over measurement channels (whether one or multiple). 
 - save(path=None)
- Store TimeSeries object as a single .npz file.
- Attributes are stored in the following arrays:
- time 
- measurement 
- error 
- meta_feat_names 
- meta_feat_values 
- name 
- label 
 
 - If path is omitted then the path attribute from the TimeSeries object is used. 
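The array names above suggest that the meta_features dictionary is stored as two parallel arrays, meta_feat_names and meta_feat_values. A minimal sketch of that round trip, using plain lists and the hypothetical helpers split_meta_features and join_meta_features (neither is part of the library), might look like this:

```python
def split_meta_features(meta_features):
    """Split a dict into parallel lists of names and values."""
    names = list(meta_features.keys())
    values = [meta_features[name] for name in names]
    return names, values


def join_meta_features(names, values):
    """Rebuild the dict from the two parallel lists."""
    return dict(zip(names, values))


original = {"redshift": 0.12, "survey_id": 3}  # made-up example values
meta_feat_names, meta_feat_values = split_meta_features(original)
restored = join_meta_features(meta_feat_names, meta_feat_values)
```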
 - sort()
- Sort times, measurements, and errors by time.
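Sorting by time means reordering all three sequences with the same permutation so that each measurement stays paired with its time and error. The sketch below shows the idea for a single channel with plain lists; sort_by_time is a hypothetical function, not the method's actual implementation.

```python
def sort_by_time(t, m, e):
    """Return copies of t, m, and e reordered by increasing time."""
    # Indices that would put the times in ascending order.
    order = sorted(range(len(t)), key=t.__getitem__)
    return (
        [t[i] for i in order],
        [m[i] for i in order],
        [e[i] for i in order],
    )


t_sorted, m_sorted, e_sorted = sort_by_time(
    [2.0, 0.0, 1.0], [20.0, 0.5, 10.0], [0.3, 0.1, 0.2]
)
```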