Module: data_management
parse_and_store_ts_data: Parses raw time series data from a single file or archive and loads metadata from header file (if applicable).
parse_headerfile: Parse header file containing classes/targets and meta-feature information.
parse_ts_data: Parses raw time series data file and returns an (n, 3) array of values.
TimeSeries: Class representing a single time series of measurements and metadata.
parse_and_store_ts_data
- cesium.data_management.parse_and_store_ts_data(data_path, output_dir, header_path=None, cleanup_archive=True, cleanup_header=True, sep=',')
Parses raw time series data from a single file or archive and loads metadata from header file (if applicable). Data is stored as files within output_dir, and the list of these paths is returned.
- Parameters:
- data_path : str
Path to an individual time series file or tarball of multiple time series files to be used for feature generation.
- output_dir : str
Directory in which time series files will be saved.
- header_path : str, optional
Path to header file containing file names, labels/targets, and meta_features.
- cleanup_archive : bool, optional
Boolean specifying whether to delete the uploaded data file/archive (defaults to True).
- cleanup_header : bool, optional
Boolean specifying whether to delete the uploaded header file (defaults to True).
- sep : str, optional
Separator of columns in data file; defaults to ','.
- Returns:
- List of paths to time series files
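As a hedged illustration of the archive input described above, the following sketch builds a small tarball of comma-separated time series files using only the standard library. The file names, archive name, and values are made up for the example, and the final call to parse_and_store_ts_data is left as a comment because it requires cesium to be installed.

```python
import os
import tarfile
import tempfile


def make_example_archive(work_dir, n_files=2):
    """Write n_files small CSV time series and bundle them into a tarball."""
    archive_path = os.path.join(work_dir, "ts_data.tar.gz")  # hypothetical name
    with tarfile.open(archive_path, "w:gz") as archive:
        for i in range(n_files):
            csv_path = os.path.join(work_dir, f"series_{i}.dat")
            with open(csv_path, "w") as f:
                # Columns: time, measurement, error (separator ',')
                f.write("0.0,1.0,0.1\n1.0,2.0,0.1\n2.0,1.5,0.1\n")
            # Store each file under its bare name, without the directory prefix
            archive.add(csv_path, arcname=os.path.basename(csv_path))
    return archive_path


work_dir = tempfile.mkdtemp()
archive_path = make_example_archive(work_dir)

# List the archive members to confirm what would be parsed
with tarfile.open(archive_path) as archive:
    member_names = sorted(archive.getnames())

# With cesium installed, the archive could then be passed as data_path:
# from cesium import data_management
# ts_paths = data_management.parse_and_store_ts_data(
#     archive_path, output_dir, cleanup_archive=False)
```

Passing cleanup_archive=False in the commented call keeps the input archive on disk, since the documented default deletes it.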
parse_headerfile
- cesium.data_management.parse_headerfile(headerfile_path, files_to_include=None)
Parse header file containing classes/targets and meta-feature information.
- Parameters:
- headerfile_path : str
Path to header file.
- files_to_include : list, optional
If provided, only return the subset of rows from the header corresponding to the given filenames.
- Returns:
- pandas.Series
Class labels/targets from header file (if missing, all values are None)
- pandas.DataFrame
Feature data from other columns besides filename, label (can be empty)
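The split between labels and meta-features can be sketched with the standard library alone. This is not cesium's implementation: it assumes a comma-separated header with columns named filename and label, treats every other column as a numeric meta-feature, and returns plain dictionaries where the real function returns a pandas.Series and a pandas.DataFrame. The function name split_header and the sample rows are hypothetical.

```python
import csv
import io


def split_header(header_text, files_to_include=None):
    """Split header rows into labels and meta-features, keyed by file name.

    Assumes columns named 'filename' and 'label' (hypothetical layout);
    every other column is treated as a numeric meta-feature.
    """
    labels, meta_features = {}, {}
    for row in csv.DictReader(io.StringIO(header_text)):
        fname = row.pop("filename")
        if files_to_include is not None and fname not in files_to_include:
            continue  # keep only the requested subset of rows
        labels[fname] = row.pop("label", None)  # None when the column is absent
        meta_features[fname] = {k: float(v) for k, v in row.items()}
    return labels, meta_features


header_text = (
    "filename,label,period,amplitude\n"
    "a.dat,class_A,0.5,1.2\n"
    "b.dat,class_B,2.0,0.3\n"
)
labels, meta = split_header(header_text, files_to_include=["a.dat"])
```

Here only the row for a.dat survives the files_to_include filter, mirroring the subset behavior described for that parameter.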
parse_ts_data
- cesium.data_management.parse_ts_data(filepath, sep=',')
Parses raw time series data file and returns an (n, 3) array of values.
Data is expected as text in tabular format with separator sep. The output always has three columns (time, measurement, error), even if the data file contains fewer:
- For data containing three columns (time, measurement, error), all three are returned.
- For data containing two columns, a dummy error column is added with value time_series.DEFAULT_ERROR_VALUE.
- For data containing one column, a dummy error column is added as above, and a time column is also added with values evenly spaced from 0 to time_series.DEFAULT_MAX_TIME.
- Parameters:
- filepath : str
Path to raw time series data to be parsed.
- sep : str, optional
Separator of columns in data file; defaults to ','.
- Returns:
- np.ndarray
3-column array of (time, measurement, error) values.
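The column-padding rules for one-, two-, and three-column input can be sketched as follows. This is a standard-library approximation rather than cesium's code: it returns a list of tuples instead of an np.ndarray, the two constants are placeholder values that may differ from those in cesium's time_series module, and it assumes the evenly spaced times include both endpoints. The function name pad_to_three_columns is hypothetical.

```python
import csv
import io

# Placeholder values: the real constants live in cesium's time_series module
# and may differ from these.
DEFAULT_ERROR_VALUE = 1e-4
DEFAULT_MAX_TIME = 1.0


def pad_to_three_columns(text, sep=","):
    """Return rows of (time, measurement, error) from 1-, 2- or 3-column text."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    rows = [[float(x) for x in row] for row in reader if row]
    if not rows:
        raise ValueError("no data rows found")
    n = len(rows)
    n_cols = len(rows[0])
    if n_cols == 1:
        # Only measurements: add evenly spaced times from 0 to DEFAULT_MAX_TIME
        step = DEFAULT_MAX_TIME / (n - 1) if n > 1 else 0.0
        rows = [[i * step, r[0]] for i, r in enumerate(rows)]
    if n_cols <= 2:
        # No error column in the input: fill with the default error value
        rows = [r + [DEFAULT_ERROR_VALUE] for r in rows]
    return [tuple(r[:3]) for r in rows]


one_col = pad_to_three_columns("5\n6\n7\n")
two_col = pad_to_three_columns("0,1\n1,2\n")
three_col = pad_to_three_columns("0;1;0.5\n", sep=";")
```

In every case the result has exactly three values per row, which is the invariant the (n, 3) return shape expresses.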
TimeSeries
- class cesium.data_management.TimeSeries(t=None, m=None, e=None, label=None, meta_features={}, name=None, path=None, channel_names=None)
Bases: object
Class representing a single time series of measurements and metadata.
A TimeSeries object encapsulates a single set of time-domain measurements, along with any metadata describing the observation. Typically the observations will consist of times, measurements, and (optionally) measurement errors. The measurements can be scalar- or vector-valued (i.e., “multichannel”); for multichannel measurements, the times and errors can also be vector-valued, or they can be shared across all channels of measurement.
- Attributes:
- time : (n,) or (p, n) array or list of (n,) arrays
Array(s) of times corresponding to measurement values. If measurement is two-dimensional, this can be one-dimensional (same times for each channel) or two-dimensional (different times for each channel). If time is one-dimensional then it will be broadcast to match measurement.shape.
- measurement : (n,) or (p, n) array or list of (n,) arrays
Array(s) of measurement values; can be two-dimensional for multichannel data. In the case of multichannel data with different numbers of measurements for each channel, measurement will be a list of arrays instead of a single two-dimensional array.
- error : (n,) or (p, n) array or list of (n,) arrays
Array(s) of measurement errors for each value. If measurement is two-dimensional, this can be one-dimensional (same errors for each channel) or two-dimensional (different errors for each channel). If error is one-dimensional then it will be broadcast to match measurement.shape.
- label : str, float, or None
Class label or regression target for the given time series (if applicable).
- meta_features : dict
Dictionary of feature names/values specified independently of the featurization process in featurize.
- name : str or None
Identifying name for the given time series (if applicable). Typically the name of the raw data file from which the time series was created.
- path : str or None
Path to the file where the time series is stored on disk (if applicable).
- channel_names : list of str
List of names of channels of measurement; by default these are simply channel_{i}, but can be arbitrary depending on the nature of the different measurement channels.
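The broadcasting of a shared one-dimensional time array and the default channel_{i} names can be illustrated with plain lists. This is only a sketch of the idea: the helper broadcast_times is hypothetical, and on NumPy arrays the same effect would normally come from np.broadcast_to rather than from copying lists.

```python
def broadcast_times(time, measurement):
    """Give every channel of a (p, n) measurement its own copy of a shared (n,) time list."""
    shared = not isinstance(time[0], (list, tuple))
    if shared:
        # One time list for all channels: repeat it once per channel
        return [list(time) for _ in measurement]
    # Times were already given per channel: leave them untouched
    return time


measurement = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # p = 2 channels, n = 3 samples
time = [0.0, 0.5, 1.0]                            # shared across both channels

time_2d = broadcast_times(time, measurement)
channel_names = [f"channel_{i}" for i in range(len(measurement))]
```

After broadcasting, time_2d has one row per channel, so it lines up element by element with measurement.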
Methods
channels(): Iterates over measurement channels (whether one or multiple).
save([path]): Store TimeSeries object as a single .npz file.
sort(): Sort times, measurements, and errors by time.
- __init__(t=None, m=None, e=None, label=None, meta_features={}, name=None, path=None, channel_names=None)
Create a TimeSeries object from measurement values/metadata.
See TimeSeries documentation for parameter values.
- channels()
Iterates over measurement channels (whether one or multiple).
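One way to picture iteration over channels is a generator that treats single-channel data as a one-element sequence of channels. The sketch below is an assumption about the general shape of such an iterator, not the library's implementation: the function iter_channels is hypothetical, it works on plain lists, it expects time and error to be given per channel for multichannel data, and the (time, measurement, error) triple it yields is an illustrative choice.

```python
def iter_channels(time, measurement, error):
    """Yield one (time, measurement, error) triple per channel."""
    multichannel = isinstance(measurement[0], (list, tuple))
    if not multichannel:
        # Single-channel data: yield the three sequences once, as they are
        yield time, measurement, error
        return
    # Multichannel data: walk the channels in parallel; each channel may
    # have a different number of samples
    for t_i, m_i, e_i in zip(time, measurement, error):
        yield t_i, m_i, e_i


single = list(iter_channels([0.0, 1.0], [5.0, 6.0], [0.1, 0.1]))
multi = list(iter_channels(
    [[0.0, 1.0], [0.0, 1.0, 2.0]],
    [[5.0, 6.0], [7.0, 8.0, 9.0]],
    [[0.1, 0.1], [0.2, 0.2, 0.2]],
))
```

Calling code can then loop over channels the same way whether the data has one channel or several.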
- save(path=None)
Store TimeSeries object as a single .npz file.
- Attributes are stored in the following arrays:
time
measurement
error
meta_feat_names
meta_feat_values
name
label
If path is omitted then the path attribute from the TimeSeries object is used.
- sort()
Sort times, measurements, and errors by time.
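Sorting by time means applying one ordering, derived from the times, to all three sequences so that each measurement stays paired with its time and error. A minimal single-channel sketch on plain lists is shown below; the function sort_by_time is hypothetical and returns new lists, whereas the method above presumably updates the object's own attributes.

```python
def sort_by_time(time, measurement, error):
    """Return copies of the three sequences reordered by increasing time."""
    # Indices of the samples in increasing order of time
    order = sorted(range(len(time)), key=lambda i: time[i])
    return (
        [time[i] for i in order],
        [measurement[i] for i in order],
        [error[i] for i in order],
    )


t, m, e = sort_by_time([2.0, 0.0, 1.0], [20.0, 0.5, 10.0], [0.3, 0.1, 0.2])
```

Computing the index order once and reusing it for every sequence is the list equivalent of indexing NumPy arrays with the result of an argsort.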