Module: data_management

cesium.data_management.parse_and_store_ts_data(...)

Parses raw time series data from a single file or archive and loads metadata from a header file (if applicable).

cesium.data_management.parse_headerfile(...)

Parse header file containing classes/targets and meta-feature information.

cesium.data_management.parse_ts_data(filepath[, sep])

Parses a raw time series data file and returns an (n, 3) array of values.

cesium.data_management.TimeSeries([t, m, e, ...])

Class representing a single time series of measurements and metadata.

parse_and_store_ts_data

cesium.data_management.parse_and_store_ts_data(data_path, output_dir, header_path=None, cleanup_archive=True, cleanup_header=True, sep=',')

Parses raw time series data from a single file or archive and loads metadata from a header file (if applicable). The data are stored as files within output_dir, and the list of paths to those files is returned.

Parameters:
data_path : str

Path to an individual time series file or tarball of multiple time series files to be used for feature generation.

output_dir : str

Directory in which time series files will be saved.

header_path : str, optional

Path to header file containing file names, labels/targets, and meta_features.

cleanup_archive : bool, optional

Boolean specifying whether to delete the uploaded data file/archive (defaults to True).

cleanup_header : bool, optional

Boolean specifying whether to delete the uploaded header file (defaults to True).

sep : str, optional

Separator of columns in data file; defaults to ‘,’.

Returns:
List of paths to time series files
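
The "single file or archive" handling can be illustrated with the standard library alone. This is a minimal sketch, not cesium's implementation: `collect_ts_files` is a hypothetical helper, and it only locates the raw files. The real function also parses each file and stores the result in output_dir.

```python
import os
import tarfile
import tempfile


def collect_ts_files(data_path, output_dir):
    """Hypothetical helper: return paths of the time series files in data_path."""
    if not tarfile.is_tarfile(data_path):
        return [data_path]  # a single time series file
    paths = []
    with tarfile.open(data_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            # Flatten to the base name so every file stays inside output_dir.
            target = os.path.join(output_dir, os.path.basename(member.name))
            with archive.extractfile(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            paths.append(target)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    # Build a small archive containing two time series files.
    archive_path = os.path.join(tmp, "ts.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in ["a.dat", "b.dat"]:
            src_path = os.path.join(tmp, name)
            with open(src_path, "w") as f:
                f.write("0.0,1.0,0.1\n")
            archive.add(src_path, arcname=name)
    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)
    extracted = [os.path.basename(p) for p in collect_ts_files(archive_path, out_dir)]
    single = collect_ts_files(os.path.join(tmp, "a.dat"), out_dir)
```

Either branch yields a list of paths, which mirrors the documented return value: one entry for a single file, one entry per member for an archive.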

parse_headerfile

cesium.data_management.parse_headerfile(headerfile_path, files_to_include=None)

Parse header file containing classes/targets and meta-feature information.

Parameters:
headerfile_path : str

Path to header file.

files_to_include : list, optional

If provided, only return the subset of rows from the header corresponding to the given filenames.

Returns:
pandas.Series

Class labels/targets from header file (if missing, all values are None)

pandas.DataFrame

Feature data from other columns besides filename, label (can be empty)
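
The split between labels and meta-features can be sketched as follows. The example assumes a comma-separated header whose columns are named `filename` and `label` (the names used in the return description above) and uses plain dictionaries where the real function returns a pandas.Series and a pandas.DataFrame. `split_header` is a hypothetical name, not part of cesium.

```python
import csv
import io


def split_header(text, files_to_include=None):
    """Split header rows into labels and meta-features, keyed by filename."""
    labels, meta_features = {}, {}
    for row in csv.DictReader(io.StringIO(text)):
        fname = row.pop("filename")
        if files_to_include is not None and fname not in files_to_include:
            continue
        # A missing label column yields None for every file.
        labels[fname] = row.pop("label", None)
        # Whatever columns remain are treated as meta-features.
        meta_features[fname] = {k: float(v) for k, v in row.items()}
    return labels, meta_features


header = (
    "filename,label,period,amplitude\n"
    "a.dat,class_A,2.5,0.3\n"
    "b.dat,class_B,4.0,0.1\n"
)
labels, meta = split_header(header, files_to_include=["a.dat"])
unlabeled, _ = split_header("filename,period\na.dat,2.5\n")
```

Passing `files_to_include` keeps only the matching rows, and a header with no label column produces None for each file, matching the two behaviours described above.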

parse_ts_data

cesium.data_management.parse_ts_data(filepath, sep=',')

Parses a raw time series data file and returns an (n, 3) array of values.

Data is expected as text in tabular format with separator sep. The output will always have three columns (time, measurement, error), even if the data file contains two or fewer:

  • For data containing three columns (time, measurement, error), all three are returned.

  • For data containing two columns, a dummy error column is added with value time_series.DEFAULT_ERROR_VALUE.

  • For data containing one column, a time column is also added with values evenly spaced from 0 to time_series.DEFAULT_MAX_TIME.

Parameters:
filepath : str

Path to raw time series data to be parsed.

sep : str, optional

Separator of columns in data file; defaults to ‘,’.

Returns:
np.ndarray

3-column array of (time, measurement, error) values.
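
The column-padding rules described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: the two constants are placeholders (the actual values of time_series.DEFAULT_ERROR_VALUE and time_series.DEFAULT_MAX_TIME are not given here), the helper names are hypothetical, and lists of tuples stand in for the NumPy array.

```python
import csv
import io

# Placeholder values: the real constants live in cesium's time_series module,
# and their actual values are not stated in this document.
DEFAULT_ERROR_VALUE = 1e-4
DEFAULT_MAX_TIME = 1.0


def pad_to_three_columns(rows):
    """Return rows as (time, measurement, error) tuples, padding as needed."""
    n_cols = len(rows[0])
    if n_cols == 3:
        return [tuple(r) for r in rows]
    if n_cols == 2:
        # Two columns: add a dummy error column.
        return [(t, m, DEFAULT_ERROR_VALUE) for t, m in rows]
    if n_cols == 1:
        # One column: add evenly spaced times from 0 to DEFAULT_MAX_TIME
        # as well as the dummy error column.
        step = DEFAULT_MAX_TIME / (len(rows) - 1) if len(rows) > 1 else 0.0
        return [(i * step, r[0], DEFAULT_ERROR_VALUE) for i, r in enumerate(rows)]
    raise ValueError("expected between one and three columns")


def parse_text(text, sep=","):
    """Parse separator-delimited text into rows of floats, then pad."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    rows = [[float(x) for x in row] for row in reader if row]
    return pad_to_three_columns(rows)


three_col = parse_text("0;1;2\n", sep=";")
two_col = parse_text("0.0,1.5\n1.0,2.5\n")
one_col = parse_text("3.0\n4.0\n5.0\n")
```

With these placeholder constants, the one-column input receives times 0.0, 0.5, and 1.0, and every padded row carries the dummy error value in its third position.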

TimeSeries

class cesium.data_management.TimeSeries(t=None, m=None, e=None, label=None, meta_features={}, name=None, path=None, channel_names=None)

Bases: object

Class representing a single time series of measurements and metadata.

A TimeSeries object encapsulates a single set of time-domain measurements, along with any metadata describing the observation. Typically the observations will consist of times, measurements, and (optionally) measurement errors. The measurements can be scalar- or vector-valued (i.e., “multichannel”); for multichannel measurements, the times and errors can also be vector-valued, or they can be shared across all channels of measurement.

Attributes:
time : (n,) or (p, n) array or list of (n,) arrays

Array(s) of times corresponding to measurement values. If measurement is two-dimensional, this can be one-dimensional (same times for each channel) or two-dimensional (different times for each channel). If time is one-dimensional then it will be broadcast to match measurement.shape.

measurement : (n,) or (p, n) array or list of (n,) arrays

Array(s) of measurement values; can be two-dimensional for multichannel data. In the case of multichannel data with different numbers of measurements for each channel, measurement will be a list of arrays instead of a single two-dimensional array.

error : (n,) or (p, n) array or list of (n,) arrays

Array(s) of measurement errors for each value. If measurement is two-dimensional, this can be one-dimensional (same errors for each channel) or two-dimensional (different errors for each channel). If error is one-dimensional then it will be broadcast to match measurement.shape.

label : str, float, or None

Class label or regression target for the given time series (if applicable).

meta_features : dict

Dictionary of feature names/values specified independently of the featurization process in featurize.

name : str or None

Identifying name for the given time series (if applicable). Typically the name of the raw data file from which the time series was created.

path : str or None

Path to the file where the time series is stored on disk (if applicable).

channel_names : list of str

List of names of channels of measurement; by default these are simply channel_{i}, but can be arbitrary depending on the nature of the different measurement channels.

Methods

channels()

Iterates over measurement channels (whether one or multiple).

save([path])

Store TimeSeries object as a single .npz file.

sort()

Sort times, measurements, and errors by time.

__init__(t=None, m=None, e=None, label=None, meta_features={}, name=None, path=None, channel_names=None)

Create a TimeSeries object from measurement values/metadata.

See TimeSeries documentation for parameter values.

channels()

Iterates over measurement channels (whether one or multiple).
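
Iterating over channels "whether one or multiple" might look like the sketch below. It assumes that each channel is yielded as a (times, measurements, errors) triple, which this document does not state, and it uses nested lists in place of NumPy arrays. `iter_channels` is a hypothetical stand-in for the method.

```python
def iter_channels(t, m, e):
    """Yield (times, measurements, errors) for each channel, using plain lists."""
    multichannel = isinstance(m[0], (list, tuple))
    if not multichannel:
        # Single-channel data is treated as one channel.
        yield t, m, e
        return
    for i, m_i in enumerate(m):
        # Shared (one-dimensional) times or errors are reused for every channel.
        t_i = t[i] if isinstance(t[0], (list, tuple)) else t
        e_i = e[i] if isinstance(e[0], (list, tuple)) else e
        yield t_i, m_i, e_i


times = [0.0, 1.0, 2.0]  # shared across both channels
measurements = [[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
errors = [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]]
channels = list(iter_channels(times, measurements, errors))
single = list(iter_channels([0.0, 1.0], [5.0, 6.0], [0.1, 0.1]))
```

Reusing the shared time list for each channel plays the role of the broadcasting described under Attributes, so callers can treat single-channel and multichannel data uniformly.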

save(path=None)

Store TimeSeries object as a single .npz file.

Attributes are stored in the following arrays:
  • time

  • measurement

  • error

  • meta_feat_names

  • meta_feat_values

  • name

  • label

If path is omitted then the path attribute from the TimeSeries object is used.
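
The array layout listed above can be reproduced directly with `numpy.savez`. This sketch assumes only the array names given in the list. How cesium encodes each value (for example a missing label) is not specified here, so the example sticks to simple strings and floats and writes to an in-memory buffer instead of a path.

```python
import io

import numpy as np

meta_features = {"period": 2.5, "amplitude": 0.3}

buffer = io.BytesIO()  # stands in for a file at `path`
np.savez(
    buffer,
    time=np.array([0.0, 1.0, 2.0]),
    measurement=np.array([10.0, 11.0, 12.0]),
    error=np.array([0.1, 0.1, 0.1]),
    meta_feat_names=list(meta_features.keys()),
    meta_feat_values=list(meta_features.values()),
    name="a.dat",
    label="class_A",
)

# Read the archive back and rebuild the metadata dictionary.
buffer.seek(0)
with np.load(buffer) as stored:
    array_names = sorted(stored.files)
    restored_meta = dict(
        zip(stored["meta_feat_names"].tolist(), stored["meta_feat_values"].tolist())
    )
    restored_name = stored["name"].item()
```

Storing the meta-feature names and values as two parallel arrays keeps everything in a single .npz archive, and the dictionary can be rebuilt by zipping the two arrays on load.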

sort()

Sort times, measurements, and errors by time.
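
Sorting all three sequences by time amounts to computing one ordering from the times and applying it to each sequence. The sketch below shows this for single-channel data with plain lists. `sort_by_time` is a hypothetical helper that returns new lists, whereas the method operates on the TimeSeries object itself.

```python
def sort_by_time(t, m, e):
    """Return copies of t, m, e reordered so that times are ascending."""
    # Indices of t in ascending time order (an argsort).
    order = sorted(range(len(t)), key=lambda i: t[i])
    return (
        [t[i] for i in order],
        [m[i] for i in order],
        [e[i] for i in order],
    )


t, m, e = sort_by_time([2.0, 0.0, 1.0], [12.0, 10.0, 11.0], [0.3, 0.1, 0.2])
```

Applying the same index order to measurements and errors keeps each value paired with its original time.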