# Module: `features.graphs`

- `amplitude`: Half the difference between the maximum and minimum magnitude.
- `anderson_darling`: Anderson-Darling test statistic.
- `cad_prob`: Given the observed distribution of time lags cads, compute the probability that the next observation occurs within time minutes of an arbitrary epoch.
- `delta_t_hist`: Build histogram of all possible `|t_i - t_j|`'s.
- `double_to_single_step`: Ratios `(t[i+2] - t[i]) / (t[i+1] - t[i])`.
- `find_sorted_peaks`: Find peaks, i.e. local maxima, of an array.
- `flux_percentile_ratio`: A ratio of ((50+x) flux percentile - (50-x) flux percentile) / (95 flux percentile - 5 flux percentile), where x = percentile_range/2.
- `get_fold2P_slope_percentile`: Get alphath percentile of slopes of period-folded model.
- `get_lomb_amplitude`: Get the amplitude of the jth harmonic of the ith frequency from a fitted Lomb-Scargle model.
- `get_lomb_amplitude_ratio`: Get the ratio of the amplitudes of the first harmonic for the ith and first frequencies from a fitted Lomb-Scargle model.
- `get_lomb_frequency`: Get the ith frequency from a fitted Lomb-Scargle model.
- `get_lomb_frequency_ratio`: Get the ratio of the ith and first frequencies from a fitted Lomb-Scargle model.
- `get_lomb_lambda`: Get the regularization parameter of a fitted Lomb-Scargle model.
- `get_lomb_rel_phase`: Get the relative phase of the jth harmonic of the ith frequency from a fitted Lomb-Scargle model.
- `get_lomb_signif`: Get the significance (in sigmas) of the first frequency from a fitted Lomb-Scargle model.
- `get_lomb_signif_ratio`: Get the ratio of the significances (in sigmas) of the ith and first frequencies from a fitted Lomb-Scargle model.
- `get_lomb_trend`: Get the linear trend of a fitted Lomb-Scargle model.
- `get_lomb_varrat`: Get the fraction of the variance explained by the first frequency of a fitted Lomb-Scargle model.
- `get_lomb_y_offset`: Get the y-intercept of a fitted Lomb-Scargle model.
- `get_max_delta_mags`: Largest value minus second largest value of fitted Lomb-Scargle model.
- `get_medperc90_2p_p`: Get ratio of 90th percentiles of residuals for data folded by twice the estimated period and the estimated period, respectively.
- `get_min_delta_mags`: Second smallest value minus smallest value of fitted Lomb-Scargle model.
- `get_model_phi1_phi2`: Ratio of distances between the second minimum and first maximum, and the second minimum and second maximum, of the fitted Lomb-Scargle model.
- `get_p2p_scatter_2praw`: Get ratio of variability (sum of squared differences of consecutive values) of folded and unfolded models.
- `get_p2p_scatter_over_mad`: Get ratio of variability of folded and unfolded models.
- `get_p2p_scatter_pfold_over_mad`: Get ratio of median of period-folded data over median absolute deviation of observed values.
- `get_p2p_ssqr_diff_over_var`: Get sum of squared differences of consecutive values as a fraction of the variance of the data.
- `get_qso_log_chi2_qsonu`: Natural log of goodness of fit of qso-model given fixed parameters.
- `get_qso_log_chi2nuNULL_chi2nu`: Natural log of expected chi2/nu for non-qso variable.
- `kurtosis`: Kurtosis of a dataset.
- `lomb_scargle_fast_period`: Fits a simple sinusoidal model.
- `lomb_scargle_model`: Simultaneous fit of a sum of sinusoids by weighted least squares.
- `max_slope`: Compute the largest rate of change in the observed data.
- `maximum`: Maximum observed value.
- `median`: Median of observed values.
- `median_absolute_deviation`: Median absolute deviation (from the median) of the observed values.
- `minimum`: Minimum observed value.
- `normalize_hist`: Normalize histogram such that integral from `t_min` to `t_max` equals 1.
- `num_alias`: Here we check for "1-day" aliases in ASAS / Deboss sources.
- `p2p_model`: Compute features that compare the residuals of data folded by estimated period from Lomb-Scargle model with residuals folded by twice the estimated period.
- `peak_bin`: Return the (bin) index of the ith largest peak.
- `peak_ratio`: Compute the ratio of the values of the ith and jth largest peaks.
- `percent_amplitude`: Returns the largest distance from the median value, measured as a percentage of the median.
- `percent_beyond_1_std`: Percentage of values more than 1 std. dev. from the weighted average.
- `percent_close_to_median`: Percentage of values within window_frac*(max(x)-min(x)) of median.
- `percent_difference_flux_percentile`: Difference between the 95th and 5th percentiles of the data, expressed as a percentage of the median value.
- `period_folding`: Used to calculate Dubath (10. Percentile90:2P/P).
- `periodic_model`: Compute features related to the extreme points of the fitted Lomb-Scargle model.
- `qso_fit`: Best-fit qso model determined for Sesar Strip82, ugriz-bands (default g).
- `scatter_res_raw`: From the Dubath et al. paper, arXiv:1101.2406v1 (2011-01-12).
- `shapiro_wilk`: Shapiro-Wilk test statistic.
- `skew`: Skewness of a dataset.
- `std`: Standard deviation of observed values.
- `stetson_j`: Robust covariance statistic between pairs of observations x,y whose uncertainties are dx,dy.
- `stetson_k`: A robust kurtosis statistic.
- `weighted_average`: Arithmetic mean of observed values, weighted by measurement errors.

## amplitude

cesium.features.graphs.amplitude(x)

Half the difference between the maximum and minimum magnitude.
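As a rough illustration of the definition, the statistic can be sketched in plain Python. The name `amplitude_sketch` is hypothetical and this is not the library's implementation.

```python
def amplitude_sketch(x):
    """Half the peak-to-peak range of the values in x."""
    return (max(x) - min(x)) / 2.0


mags = [14.2, 14.8, 15.1, 14.5]
print(amplitude_sketch(mags))
```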

## anderson_darling

cesium.features.graphs.anderson_darling(x, e)

Anderson-Darling test statistic.

## cad_prob

cesium.features.graphs.cad_prob(cads, time)

Given the observed distribution of time lags `cads`, compute the probability that the next observation occurs within `time` minutes of an arbitrary epoch.
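One way to read this probability is as an empirical CDF of the observed lags. The sketch below assumes the lags are given in days and the threshold in minutes, and counts lags less than or equal to the threshold; both choices are assumptions, and `cad_prob_sketch` is a hypothetical name rather than the library function.

```python
def cad_prob_sketch(cads, time_minutes):
    """Fraction of observed lags (assumed in days) no longer than time_minutes."""
    threshold_days = time_minutes / (24.0 * 60.0)
    return sum(1 for c in cads if c <= threshold_days) / len(cads)


lags_days = [0.01, 0.02, 0.5, 1.0]
print(cad_prob_sketch(lags_days, 60))  # share of lags of at most one hour
```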

## delta_t_hist

cesium.features.graphs.delta_t_hist(t, nbins=50, conv_oversample=50)

Build histogram of all possible `|t_i - t_j|`’s.

For efficiency, we construct the histogram via a convolution of the PDF rather than by actually computing all the differences. For better accuracy we use a factor conv_oversample more bins when performing the convolution and then aggregate the result to have nbins total values.
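For reference, the quantity that the convolution approximates can be computed directly by enumerating all pairs. This brute-force sketch assumes equal-width bins spanning 0 to `max(t) - min(t)`, counts each unordered pair once, and needs at least two distinct times; the library's binning may differ. The function name is hypothetical.

```python
from itertools import combinations


def delta_t_hist_bruteforce(t, nbins=50):
    """Histogram of |t_i - t_j| over all pairs i < j, computed directly."""
    diffs = [abs(a - b) for a, b in combinations(t, 2)]
    width = (max(t) - min(t)) / nbins
    hist = [0] * nbins
    for d in diffs:
        # Clamp so the largest difference falls in the last bin.
        idx = min(int(d / width), nbins - 1)
        hist[idx] += 1
    return hist


print(delta_t_hist_bruteforce([0.0, 1.0, 2.0, 4.0], nbins=4))
```

The direct version costs O(n^2) in the number of observations, which is the cost the convolution trick avoids.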

## double_to_single_step

Ratios `(t[i+2] - t[i]) / (t[i+1] - t[i])`.
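A literal reading of the formula, applied to a sorted list of observation times, might look like the following. The function name is hypothetical, and the library may expect a different input (for example, precomputed lags).

```python
def double_to_single_step_sketch(t):
    """Ratio of each two-step gap to the preceding one-step gap."""
    return [(t[i + 2] - t[i]) / (t[i + 1] - t[i]) for i in range(len(t) - 2)]


print(double_to_single_step_sketch([0.0, 1.0, 3.0, 6.0]))  # [3.0, 2.5]
```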

## find_sorted_peaks

cesium.features.graphs.find_sorted_peaks(x)

Find peaks, i.e. local maxima, of an array. Interior points are peaks if they are greater than both their neighbors, and edge points are peaks if they are greater than their only neighbor. In the case of ties, we (arbitrarily) choose the first index in the sequence of equal values as the peak. Returns a list of tuples `(i, x[i])` of peak indices `i` and values `x[i]`, sorted in decreasing order by peak value.
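The rules above can be sketched in plain Python as follows. This is an illustrative reimplementation under the stated rules, not the library's code; in particular, treating a run of equal values as a single peak at its first index is one interpretation of the tie rule.

```python
def find_sorted_peaks_sketch(x):
    """Return (index, value) tuples for local maxima, largest value first."""
    peaks = []
    n = len(x)
    i = 0
    while i < n:
        # Find the end j of the run of values equal to x[i].
        j = i
        while j + 1 < n and x[j + 1] == x[i]:
            j += 1
        # A run is a peak when each existing neighbor is strictly smaller.
        left_ok = i == 0 or x[i - 1] < x[i]
        right_ok = j == n - 1 or x[j + 1] < x[i]
        if left_ok and right_ok:
            peaks.append((i, x[i]))  # first index of the run
        i = j + 1
    # sorted() is stable, so equal-valued peaks keep their index order.
    return sorted(peaks, key=lambda p: p[1], reverse=True)


print(find_sorted_peaks_sketch([1, 3, 2, 5, 5, 1, 4]))  # [(3, 5), (6, 4), (1, 3)]
```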

## flux_percentile_ratio

cesium.features.graphs.flux_percentile_ratio(x, percentile_range, base=10.0, exponent=-0.4)

A ratio of ((50+x) flux percentile - (50-x) flux percentile) / (95 flux percentile - 5 flux percentile), where x = percentile_range/2.

Assumes data is log-scaled; by default we assume inputs are scaled as x=10^(-0.4*y), corresponding to units of magnitudes. Computations are performed on the corresponding linear-scale values.
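A minimal sketch of the computation, assuming linear interpolation between ranks for the percentiles (the library's percentile convention may differ). Both function names are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def flux_percentile_ratio_sketch(mags, percentile_range, base=10.0, exponent=-0.4):
    """Central flux percentile spread relative to the 5-95 percentile spread."""
    # Convert log-scaled inputs (magnitudes) to linear-scale flux.
    flux = [base ** (exponent * m) for m in mags]
    half = percentile_range / 2.0
    numerator = percentile(flux, 50 + half) - percentile(flux, 50 - half)
    denominator = percentile(flux, 95) - percentile(flux, 5)
    return numerator / denominator


mags = [15.0, 15.2, 15.4, 15.6, 15.8, 16.0]
print(flux_percentile_ratio_sketch(mags, 20))
```

With `percentile_range=90` the numerator and denominator coincide, so the ratio is 1 by construction.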

## get_fold2P_slope_percentile

cesium.features.graphs.get_fold2P_slope_percentile(model, alpha)

Get alphath percentile of slopes of period-folded model.

## get_lomb_amplitude

cesium.features.graphs.get_lomb_amplitude(lomb_model, i, j)

Get the amplitude of the jth harmonic of the ith frequency from a fitted Lomb-Scargle model.

## get_lomb_amplitude_ratio

cesium.features.graphs.get_lomb_amplitude_ratio(lomb_model, i)

Get the ratio of the amplitudes of the first harmonic for the ith and first frequencies from a fitted Lomb-Scargle model.

## get_lomb_frequency

cesium.features.graphs.get_lomb_frequency(lomb_model, i)

Get the ith frequency from a fitted Lomb-Scargle model.

## get_lomb_frequency_ratio

cesium.features.graphs.get_lomb_frequency_ratio(lomb_model, i)

Get the ratio of the ith and first frequencies from a fitted Lomb-Scargle model.

## get_lomb_lambda

cesium.features.graphs.get_lomb_lambda(lomb_model)

Get the regularization parameter of a fitted Lomb-Scargle model.

## get_lomb_rel_phase

cesium.features.graphs.get_lomb_rel_phase(lomb_model, i, j)

Get the relative phase of the jth harmonic of the ith frequency from a fitted Lomb-Scargle model.

## get_lomb_signif

cesium.features.graphs.get_lomb_signif(lomb_model)

Get the significance (in sigmas) of the first frequency from a fitted Lomb-Scargle model.

## get_lomb_signif_ratio

cesium.features.graphs.get_lomb_signif_ratio(lomb_model, i)

Get the ratio of the significances (in sigmas) of the ith and first frequencies from a fitted Lomb-Scargle model.

## get_lomb_trend

cesium.features.graphs.get_lomb_trend(lomb_model)

Get the linear trend of a fitted Lomb-Scargle model.

## get_lomb_varrat

cesium.features.graphs.get_lomb_varrat(lomb_model)

Get the fraction of the variance explained by the first frequency of a fitted Lomb-Scargle model.

## get_lomb_y_offset

cesium.features.graphs.get_lomb_y_offset(lomb_model)

Get the y-intercept of a fitted Lomb-Scargle model.

## get_max_delta_mags

cesium.features.graphs.get_max_delta_mags(model)

Largest value minus second largest value of fitted Lomb Scargle model.

## get_medperc90_2p_p

cesium.features.graphs.get_medperc90_2p_p(model)

Get ratio of 90th percentiles of residuals for data folded by twice the estimated period and the estimated period, respectively.

## get_min_delta_mags

cesium.features.graphs.get_min_delta_mags(model)

Second smallest value minus smallest value of fitted Lomb Scargle model.

## get_model_phi1_phi2

cesium.features.graphs.get_model_phi1_phi2(model)

Ratio of distances between the second minimum and first maximum, and the second minimum and second maximum, of the fitted Lomb-Scargle model.

## get_p2p_scatter_2praw

cesium.features.graphs.get_p2p_scatter_2praw(model)

Get ratio of variability (sum of squared differences of consecutive values) of folded and unfolded models.

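## get_p2p_scatter_over_mad

cesium.features.graphs.get_p2p_scatter_over_mad(model)

Get ratio of variability of folded and unfolded models.

## get_p2p_scatter_pfold_over_mad

cesium.features.graphs.get_p2p_scatter_pfold_over_mad(model)

Get ratio of median of period-folded data over median absolute deviation of observed values.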

## get_p2p_ssqr_diff_over_var

cesium.features.graphs.get_p2p_ssqr_diff_over_var(model)

Get sum of squared differences of consecutive values as a fraction of the variance of the data.

## get_qso_log_chi2_qsonu

cesium.features.graphs.get_qso_log_chi2_qsonu(qso_model)

Natural log of goodness of fit of qso-model given fixed parameters.

## get_qso_log_chi2nuNULL_chi2nu

cesium.features.graphs.get_qso_log_chi2nuNULL_chi2nu(qso_model)

Natural log of expected chi2/nu for non-qso variable.

## kurtosis

cesium.features.graphs.kurtosis(x)

Kurtosis of a dataset. Approximately 0 for Gaussian data.
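The "approximately 0 for Gaussian data" remark corresponds to excess (Fisher) kurtosis. The sketch below assumes the biased population-moment estimator; the library may apply a different bias correction, and the function name is hypothetical.

```python
import random


def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (population moments)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(20000)]
print(excess_kurtosis(sample))  # small in magnitude for Gaussian draws
```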

## lomb_scargle_fast_period

cesium.features.graphs.lomb_scargle_fast_period(t, m, e)

Fits a simple sinusoidal model

y(t) = A sin(2*pi*w*t + phi) + c

and returns the estimated period 1/w. Much faster than fitting the full multi-frequency model used by features.lomb_scargle.

## lomb_scargle_model

cesium.features.graphs.lomb_scargle_model(time, signal, error, sys_err=0.05, nharm=8, nfreq=3, tone_control=5.0, normalize=False, detrend_order=1, freq_grid=None)

Simultaneous fit of a sum of sinusoids by weighted least squares.

y(t) = Sum_k Ck*t^k + Sum_i Sum_j A_ij sin(2*pi*j*fi*(t-t0)+phi_j), i=[1,nfreq], j=[1,nharm]
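To make the model form concrete, the following sketch evaluates the sum for given coefficients. All argument names are hypothetical, and it assumes one phase per (frequency, harmonic) pair, which is one reading of the `phi_j` term; it does not perform any fitting.

```python
import math


def evaluate_model(t, trend, freqs, amps, phases, t0=0.0):
    """Polynomial trend plus a sum of harmonics of each frequency, at time t."""
    # Polynomial part: trend[k] is the coefficient C_k of t^k.
    y = sum(c * t ** k for k, c in enumerate(trend))
    for f, amp_row, phase_row in zip(freqs, amps, phases):
        # Harmonics are numbered j = 1..nharm for each frequency f.
        for j, (a, phi) in enumerate(zip(amp_row, phase_row), start=1):
            y += a * math.sin(2 * math.pi * j * f * (t - t0) + phi)
    return y


# One frequency, one harmonic, linear trend 1 + 0.5 t.
print(evaluate_model(0.25, [1.0, 0.5], [1.0], [[2.0]], [[0.0]]))
```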

Parameters:

- `time` (array_like): Array containing time values.
- `signal` (array_like): Array containing data values.
- `error` (array_like): Array containing measurement error values.
- `sys_err` (float, optional): Defaults to 0.05.
- `nharm` (int, optional): Number of harmonics to fit for each frequency. Defaults to 8.
- `nfreq` (int, optional): Number of frequencies to fit. Defaults to 3.
- `tone_control` (float, optional): Defaults to 5.0.
- `normalize` (boolean, optional): Whether to normalize the time series before fitting. This can help with instabilities seen at large values of `nharm`. Defaults to False.
- `detrend_order` (int, optional): Order of polynomial to fit to the data while fitting the dominant frequency. Defaults to 1.
- `freq_grid` (dict or None, optional): Grid parameters to use. If None, the frequency grid is calculated automatically. To supply the grid, the keys "f0" and "fmax" are expected. "f0" is the smallest frequency in the grid and "df" is the difference between grid points. If "df" (grid spacing) is given, the number of frequencies is calculated; if "numf" is given, "df" is inferred.

Returns:

- dict: Dictionary containing fitted parameter values. Parameters specific to a fitted frequency are stored in a list of dicts at `model_dict['freq_fits']`, each of which contains the output of `fit_lomb_scargle(…)`.

## max_slope

cesium.features.graphs.max_slope(t, x)

Compute the largest rate of change in the observed data.
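A sketch of one plausible reading: the largest absolute slope between consecutive observations. Taking the absolute value and assuming strictly increasing times are both assumptions here, and the function name is hypothetical.

```python
def max_slope_sketch(t, x):
    """Largest |dx/dt| between consecutive observations."""
    return max(
        abs((x[i + 1] - x[i]) / (t[i + 1] - t[i])) for i in range(len(t) - 1)
    )


print(max_slope_sketch([0.0, 1.0, 3.0], [0.0, 2.0, -4.0]))  # 3.0
```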

## maximum

cesium.features.graphs.maximum(x)

Maximum observed value.

## median

cesium.features.graphs.median(x)

Median of observed values.

## median_absolute_deviation

cesium.features.graphs.median_absolute_deviation(x)

Median absolute deviation (from the median) of the observed values.
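The definition translates directly into a few lines of standard-library Python. The function name `mad` is hypothetical.

```python
from statistics import median


def mad(x):
    """Median of absolute deviations from the median."""
    med = median(x)
    return median(abs(v - med) for v in x)


print(mad([1, 2, 3, 4, 100]))  # 1; the outlier barely matters
```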

## minimum

cesium.features.graphs.minimum(x)

Minimum observed value.

## normalize_hist

cesium.features.graphs.normalize_hist(hist, total_time)

Normalize histogram such that integral from `t_min` to `t_max` equals 1. cf. `np.histogram(..., density=True)`.
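A minimal sketch of the normalization, assuming equal-width bins that together span `total_time = t_max - t_min`. The function name is hypothetical.

```python
def normalize_hist_sketch(hist, total_time):
    """Scale bin counts so that sum(density * bin_width) equals 1."""
    bin_width = total_time / len(hist)
    area = sum(hist) * bin_width
    return [h / area for h in hist]


density = normalize_hist_sketch([1, 3, 4], total_time=6.0)
print(sum(d * 2.0 for d in density))  # bin width is 6.0 / 3 = 2.0
```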

## num_alias

cesium.features.graphs.num_alias(lomb_model)

Here we check for “1-day” aliases in ASAS / Deboss sources.

## p2p_model

cesium.features.graphs.p2p_model(x, y, frequency)

Compute features that compare the residuals of data folded by estimated period from Lomb-Scargle model with residuals folded by twice the estimated period.

## peak_bin

cesium.features.graphs.peak_bin(peaks, i)

Return the (bin) index of the ith largest peak. Peaks is a list of tuples `(i, x[i])` of peak indices `i` and values `x[i]`, sorted in decreasing order by peak value.

## peak_ratio

cesium.features.graphs.peak_ratio(peaks, i, j)

Compute the ratio of the values of the ith and jth largest peaks. Peaks is a list of tuples `(i, x[i])` of peak indices `i` and values `x[i]`, sorted in decreasing order by peak value.
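Given a peaks list in the format described, `peak_bin` and `peak_ratio` amount to simple lookups. The sketch below assumes zero-based `i` and `j` and returns NaN when a requested peak does not exist; both behaviors are assumptions, and the function names are hypothetical.

```python
def peak_bin_sketch(peaks, i):
    """Index of the i-th largest peak, or NaN if there are too few peaks."""
    return peaks[i][0] if i < len(peaks) else float("nan")


def peak_ratio_sketch(peaks, i, j):
    """Value of the i-th largest peak divided by that of the j-th largest."""
    if i < len(peaks) and j < len(peaks):
        return peaks[i][1] / peaks[j][1]
    return float("nan")


peaks = [(3, 5.0), (6, 4.0), (1, 3.0)]  # sorted by decreasing value
print(peak_bin_sketch(peaks, 0), peak_ratio_sketch(peaks, 1, 0))  # 3 0.8
```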

## percent_amplitude

cesium.features.graphs.percent_amplitude(x, base=10.0, exponent=-0.4)

Returns the largest distance from the median value, measured as a percentage of the median.

Assumes data is log-scaled; by default we assume inputs are scaled as x=10^(-0.4*y), corresponding to units of magnitudes. Computations are performed on the corresponding linear-scale values.
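A sketch of the computation on linear-scale values. It returns a fraction of the median (multiply by 100 for a percentage); whether the library returns a fraction or a percentage is not stated here, so treat that as an assumption. The function name is hypothetical.

```python
from statistics import median


def percent_amplitude_sketch(mags, base=10.0, exponent=-0.4):
    """Largest absolute deviation from the median flux, relative to the median."""
    # Convert log-scaled inputs (magnitudes) to linear-scale flux.
    flux = [base ** (exponent * m) for m in mags]
    med = median(flux)
    return max(abs(f - med) for f in flux) / med


print(percent_amplitude_sketch([15.0, 15.1, 15.2, 14.0]))
```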

## percent_beyond_1_std

cesium.features.graphs.percent_beyond_1_std(x, e)

Percentage of values more than 1 std. dev. from the weighted average.
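One way to sketch this, assuming inverse-variance weights (`1/e**2`) for the weighted average and the unweighted population standard deviation of `x` as the threshold. Both are assumptions, the result is expressed as a fraction, and the function name is hypothetical.

```python
from statistics import pstdev


def percent_beyond_1_std_sketch(x, e):
    """Fraction of values farther than one std. dev. from the weighted mean."""
    weights = [1.0 / err ** 2 for err in e]
    wavg = sum(w * v for w, v in zip(weights, x)) / sum(weights)
    std = pstdev(x)
    return sum(1 for v in x if abs(v - wavg) > std) / len(x)


print(percent_beyond_1_std_sketch([0, 0, 0, 0, 10], [1, 1, 1, 1, 1]))  # 0.2
```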

## percent_close_to_median

cesium.features.graphs.percent_close_to_median(x, window_frac=0.1)

Percentage of values within window_frac*(max(x)-min(x)) of median.
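A sketch of the definition, returning a fraction and using a strict inequality at the window edge (the edge handling is an assumption). The function name is hypothetical.

```python
from statistics import median


def percent_close_to_median_sketch(x, window_frac=0.1):
    """Fraction of values within window_frac * (max - min) of the median."""
    window = window_frac * (max(x) - min(x))
    med = median(x)
    return sum(1 for v in x if abs(v - med) < window) / len(x)


print(percent_close_to_median_sketch([0, 4.5, 5, 5.5, 10]))  # 0.6
```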

## percent_difference_flux_percentile

cesium.features.graphs.percent_difference_flux_percentile(x, base=10.0, exponent=-0.4)

Difference between the 95th and 5th percentiles of the data, expressed as a percentage of the median value. See Eyer (2005) arXiv:astro-ph/0511458v1, Evans & Belokurov (2005) (there the 98th and 2nd percentiles are used).

Assumes data is log-scaled; by default we assume inputs are scaled as x=10^(-0.4*y), corresponding to units of magnitudes. Computations are performed on the corresponding linear-scale values.

## period_folding

cesium.features.graphs.period_folding(x, y, dy, lomb_model, sys_err=0.05)

Used to calculate Dubath (10. Percentile90:2P/P), which requires regenerating a model using 2P, where P is the originally found period.

NOTE: this essentially runs everything a second time, so makes feature generation take roughly twice as long.

## periodic_model

cesium.features.graphs.periodic_model(lomb_model)

Compute features related to the extreme points of the fitted Lomb Scargle model.

## qso_fit

cesium.features.graphs.qso_fit(time, data, error, filter='g', mag0=19.0, sys_err=0.0, return_model=False)

Best-fit qso model determined for Sesar Strip82, ugriz-bands (default g). See additional notes for underlying code qso_engine.

Input:

- `time`: measurement times [days]
- `data`: measured magnitudes in single filter (also specified)
- `error`: uncertainty in measured magnitudes

Output:

- `chi^2/nu`: classical variability measure
- `chi^2_qso/nu`: fit statistic
- `chi^2_qso/nu_NULL`: expected fit statistic for non-qso variable
- `signif_qso`: significance chi^2/nu<chi^2/nu_NULL (rule out false alarm)
- `signif_not_qso`: significance chi^2/nu>1 (rule out qso)
- `signif_vary`: significance that source is variable at all
- `class`: source type (ambiguous, not_qso, qso)
- `model`: time series prediction for each datum given all others (iff return_model==True)
- `dmodel`: model uncertainty, including uncertainty in data

Note on use (i.e., how class is defined):

1. signif_vary < 3: ambiguous, else

2. signif_qso > 3: qso, else

3. signif_not_qso > 3: not_qso
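The three rules above can be written as a small decision function. This is a sketch of the stated rules only; the fall-through to "ambiguous" when no rule fires is an assumption, and `classify_source` is a hypothetical name.

```python
def classify_source(signif_vary, signif_qso, signif_not_qso):
    """Apply the class rules in order and return the source type."""
    if signif_vary < 3:
        return "ambiguous"
    if signif_qso > 3:
        return "qso"
    if signif_not_qso > 3:
        return "not_qso"
    # Assumption: a variable source matching neither test stays ambiguous.
    return "ambiguous"


print(classify_source(signif_vary=5.0, signif_qso=4.2, signif_not_qso=0.3))  # qso
```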

## scatter_res_raw

cesium.features.graphs.scatter_res_raw(t, m, e, lomb_model)

From the Dubath et al. paper, arXiv:1101.2406v1 (2011-01-12).

Scatter: res/raw. Median absolute deviation (MAD) of the residuals (obtained by subtracting model values from the raw light curve) divided by the MAD of the raw light-curve values around the median.
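The ratio can be sketched as follows, assuming the fitted model has already been evaluated at the observation times (the library derives it from `lomb_model`; here it is passed in as a plain list). Function names are hypothetical.

```python
from statistics import median


def mad(x):
    """Median of absolute deviations from the median."""
    med = median(x)
    return median(abs(v - med) for v in x)


def scatter_res_raw_sketch(raw, model_values):
    """MAD of the residuals divided by the MAD of the raw values."""
    residuals = [r - m for r, m in zip(raw, model_values)]
    return mad(residuals) / mad(raw)


raw = [0.0, 2.0, 4.0, 6.0, 8.0]
model_values = [-1.0, 3.0, 2.0, 8.0, 8.0]
print(scatter_res_raw_sketch(raw, model_values))  # 0.5
```

Values well below 1 suggest that the model accounts for most of the scatter in the light curve.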

## shapiro_wilk

cesium.features.graphs.shapiro_wilk(x, e)

Shapiro-Wilk test statistic.

## skew

cesium.features.graphs.skew(x)

Skewness of a dataset. Approximately 0 for Gaussian data.

## std

cesium.features.graphs.std(x)

Standard deviation of observed values.

## stetson_j

cesium.features.graphs.stetson_j(x, y=[], dx=0.1, dy=0.1)

Robust covariance statistic between pairs of observations x,y whose uncertainties are dx,dy. If y is not given, calculates a robust variance for x.

## stetson_k

cesium.features.graphs.stetson_k(x, dx=0.1)

A robust kurtosis statistic.

## weighted_average

cesium.features.graphs.weighted_average(x, e)

Arithmetic mean of observed values, weighted by measurement errors.
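A minimal sketch, assuming inverse-variance weights `1/e**2` (the weighting scheme is an assumption here, and the function name is hypothetical).

```python
def weighted_average_sketch(x, e):
    """Mean of x with weights 1/e**2, so precise points count more."""
    weights = [1.0 / err ** 2 for err in e]
    return sum(w * v for w, v in zip(weights, x)) / sum(weights)


print(weighted_average_sketch([0.0, 10.0], [1.0, 2.0]))  # 2.0
```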