Data Comparisons
This section applies and compares methods on empirical datasets.
Overview
In the previous sections, we used simulated data to investigate properties of the aperiodic methods. While simulations have the benefit of known ground-truth parameters and allow individual parameters to be varied systematically and in isolation, they are limited in that there is no guarantee they reflect the realities of empirical data.
In this section, we compare the methods on empirical data, to examine how they relate to each other in real recordings.
Contents
The following analyses and comparisons are applied to empirical datasets:
51-RestingEEGData: analyzing a small sample of resting state EEG data
A small dataset of resting state EEG data collected in the VoytekLab
Young adult subjects (n=29, ages 18-28), with eyes-closed resting data
52-DevelopmentalEEGData: analyzing a large EEG dataset of developmental data
The MIPDB dataset, from the ChildMind Institute
Cross-sectional developmental data (n=126, ages 6-44), with eyes-open & eyes-closed resting data
53-iEEGData: analyzing a large dataset of intracranial EEG data
The open iEEG Atlas, from the MNI
Clinical iEEG data, cortical electrodes (n=106, average age: 33)
Applied Methods
The following methods are applied to the empirical datasets (an illustrative example of how they can be organized follows this list):
SpecParam
IRASA
AutoCorrelation Decay Time
DFA
Higuchi Fractal Dimension
Lempel-Ziv Complexity
Hjorth Complexity
Sample Entropy
Permutation Entropy
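For a sense of how these methods are organized for analysis, the sketch below collects a few of them into the kind of measures dictionary used throughout this section. This is an illustrative assumption: it uses functions from the antropy package (antropy.perm_entropy, antropy.sample_entropy, antropy.higuchi_fd), whereas the project itself may use its own wrappers and different parameter settings.
# Example measures dictionary for a few of the listed methods
# NOTE: sketch only - functions from `antropy` are used here for illustration;
# the project may wrap these differently or use other parameter values
import antropy
example_measures = {
    antropy.perm_entropy : {'order' : 3, 'normalize' : True},
    antropy.sample_entropy : {'order' : 2},
    antropy.higuchi_fd : {'kmax' : 10},
}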
Code Approach
The following general strategy is taken:
data files are loaded and organized
measures of interest are computed on the empirical data
results of the measures are compared to each other
The overarching function used to compute measures on data is the run_measures function.
This function allows for:
taking a collection of data and a list of methods
applying each measure across the data
returning the collection of results
# Setup notebook state
from nbutils import setup_notebook; setup_notebook()
import numpy as np
from neurodsp.sim import sim_powerlaw, sim_multiple
# Import custom project code
from apm.run import run_measures
from apm.analysis import compute_all_corrs
from apm.plts import plot_dots
from apm.sim.settings import N_SECONDS, FS
from apm.utils import format_corr
Run Measures
To run multiple measures across datasets, we will use the run_measures function.
# Check the documentation for `run_measures`
print(run_measures.__doc__)
Compute multiple measures on empirical recordings - 2D array input.
Parameters
----------
data : 2d array
Data to run measures on, organized as [channels, timepoints].
measures : dict
Functions to apply to the data.
The keys should be functions to apply to the data.
The values should be a dictionary of parameters to use for the method.
Returns
-------
results : dict
Output measures.
The keys are labels for each applied method.
The values are the computed measures for each method.
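Conceptually, run_measures loops each measure function over the rows of the data array and collects the outputs under a label per method. A minimal sketch of this behavior is shown below; it is not the project's actual implementation, which may handle labels, parameters, and edge cases differently.
# Minimal sketch of the `run_measures` logic (illustrative, not the project code)
import numpy as np
def run_measures_sketch(data, measures):
    """Apply each measure to each channel of a [channels, timepoints] array."""
    results = {}
    for func, params in measures.items():
        # Label each set of results by the name of the applied function
        results[func.__name__] = np.array([func(channel, **params) for channel in data])
    return results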
Next, we can run an example of using run_measures.
To do so, we will define an example analysis that applies some measures of interest (here, computing the mean and the median). To mimic a real dataset, we will use some example simulated time series.
# Simulate some mock data
params = {'n_seconds' : N_SECONDS, 'fs' : FS, 'exponent' : -1}
data = sim_multiple(sim_powerlaw, params, 10)
# Define measures to apply
measures = {
np.mean : {},
np.median : {},
}
# Run measures across the data
results = run_measures(data.signals, measures)
# Check output values of computed measures
results
{'mean': array([ 0.00000000e+00, 1.51582450e-17, 1.51582450e-17, -1.51582450e-17,
7.57912251e-18, -3.03164901e-17, 1.51582450e-17, 7.57912251e-18,
0.00000000e+00, 4.54747351e-17]),
'median': array([ 0.02077007, -0.02495609, -0.01602544, -0.0013445 , 0.00428904,
0.02038833, -0.00150607, 0.00802099, 0.01070455, 0.00688729])}
# Compute correlations across all pairs of methods
all_corrs = compute_all_corrs(results)
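For reference, comparing measures to each other amounts to correlating each pair of result vectors across recordings. The sketch below shows one way to do this, using Spearman correlations from scipy; this is an assumption, and the project's compute_all_corrs may use a different correlation measure or add bootstrapped confidence intervals, as suggested by the CI reported in the output below.
# Rough sketch of pairwise correlations between measures (illustrative only)
from itertools import combinations
from scipy.stats import spearmanr
def compute_all_corrs_sketch(results):
    """Correlate each pair of computed measures across recordings."""
    corrs = {}
    for label_a, label_b in combinations(results.keys(), 2):
        r_val, p_val = spearmanr(results[label_a], results[label_b])
        corrs.setdefault(label_a, {})[label_b] = (r_val, p_val)
    return corrs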
Examine Results
# Plot the comparison of the different measure estimates
plot_dots(results['mean'], results['median'], xlabel='Mean', ylabel='Median')
# Check the correlation
print('Mean & Median: ', format_corr(*all_corrs['mean']['median']))
Mean & Median: r=-0.599 CI[-0.952, +0.096], p=0.067