Note
Go to the end to download the full example code or to run this example in your browser via Binder
Functional connectivity predicts age group#
This example compares different kinds of functional connectivity between regions of interest : correlation, partial correlation, and tangent space embedding.
The resulting connectivity coefficients can be used to discriminate children from adults. In general, the tangent space embedding outperforms the standard correlations: see Dadi et al.[1] for a careful study.
Note
If you are using Nilearn with a version older than 0.9.0
,
then you should either upgrade your version or import maskers
from the input_data
module instead of the maskers
module.
That is, you should manually replace in the following example all occurrences of:
from nilearn.maskers import NiftiMasker
with:
from nilearn.input_data import NiftiMasker
try:
import matplotlib.pyplot as plt
except ImportError:
raise RuntimeError("This script needs the matplotlib library")
Load brain development fMRI dataset and MSDL atlas#
We study only 60 subjects from the dataset, to save computation time.
from nilearn import datasets
development_dataset = datasets.fetch_development_fmri(n_subjects=60)
We use probabilistic regions of interest (ROIs) from the MSDL atlas.
from nilearn.maskers import NiftiMapsMasker
msdl_data = datasets.fetch_atlas_msdl()
msdl_coords = msdl_data.region_coords
masker = NiftiMapsMasker(
msdl_data.maps,
resampling_target="data",
t_r=2,
detrend=True,
low_pass=0.1,
high_pass=0.01,
memory="nilearn_cache",
memory_level=1,
standardize="zscore_sample",
standardize_confounds="zscore_sample",
).fit()
masked_data = [
masker.transform(func, confounds)
for (func, confounds) in zip(
development_dataset.func, development_dataset.confounds
)
]
What kind of connectivity is most powerful for classification?#
we will use connectivity matrices as features to distinguish children from adults. We use cross-validation and measure classification accuracy to compare the different kinds of connectivity matrices.
# prepare the classification pipeline
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from nilearn.connectome import ConnectivityMeasure
kinds = ["correlation", "partial correlation", "tangent"]
pipe = Pipeline(
[
(
"connectivity",
ConnectivityMeasure(
vectorize=True,
standardize="zscore_sample",
),
),
(
"classifier",
GridSearchCV(LinearSVC(dual=True), {"C": [0.1, 1.0, 10.0]}, cv=5),
),
]
)
param_grid = [
{"classifier": [DummyClassifier(strategy="most_frequent")]},
{"connectivity__kind": kinds},
]
We use random splits of the subjects into training/testing sets. StratifiedShuffleSplit allows preserving the proportion of children in the test set.
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.preprocessing import LabelEncoder
groups = [pheno["Child_Adult"] for pheno in development_dataset.phenotypic]
classes = LabelEncoder().fit_transform(groups)
cv = StratifiedShuffleSplit(n_splits=30, random_state=0, test_size=10)
gs = GridSearchCV(
pipe,
param_grid,
scoring="accuracy",
cv=cv,
verbose=1,
refit=False,
n_jobs=2,
)
gs.fit(masked_data, classes)
mean_scores = gs.cv_results_["mean_test_score"]
scores_std = gs.cv_results_["std_test_score"]
Fitting 30 folds for each of 4 candidates, totalling 120 fits
display the results
plt.figure(figsize=(6, 4))
positions = [0.1, 0.2, 0.3, 0.4]
plt.barh(positions, mean_scores, align="center", height=0.05, xerr=scores_std)
yticks = ["dummy"] + list(gs.cv_results_["param_connectivity__kind"].data[1:])
yticks = [t.replace(" ", "\n") for t in yticks]
plt.yticks(positions, yticks)
plt.xlabel("Classification accuracy")
plt.gca().grid(True)
plt.gca().set_axisbelow(True)
plt.tight_layout()
This is a small example to showcase nilearn features. In practice such comparisons need to be performed on much larger cohorts and several datasets. Dadi et al.[1] showed that across many cohorts and clinical questions, the tangent kind should be preferred.
plt.show()
References#
Total running time of the script: (2 minutes 28.122 seconds)
Estimated memory usage: 1143 MB