2.1. An introduction to decoding

This section gives an introduction to the main concept of decoding: predicting from brain images.

The discussion and examples are articulated on the analysis of the Haxby 2001 dataset, showing how to predict from fMRI images the stimuli that the subject is viewing. However the process is the same in other settings predicting from other brain imaging modalities, for instance predicting phenotype or diagnostic status from VBM (Voxel Based Morphometry) maps (as illustrated in a more complex example), or from FA maps to capture diffusion mapping.

2.1.1. Loading and preparing the data

2.1.1.1. The Haxby 2001 experiment

In the Haxby experiment, subjects were presented visual stimuli from different categories. We are going to predict which category the subject is seeing from the fMRI activity recorded in masks of the ventral stream. Significant prediction shows that the signal in the region contains information on the corresponding category.

../_images/sphx_glr_plot_haxby_stimuli_0041.png

Face stimuli

../_images/sphx_glr_plot_haxby_stimuli_0021.png

Cat stimuli

../_images/sphx_glr_plot_haxby_masks_0011.png

Masks

../_images/sphx_glr_plot_haxby_full_analysis_0011.png

Decoding scores per mask


fMRI: using beta maps of a first-level analysis

The Haxby experiment is unusual because the experimental paradigm is made of many blocks of continuous stimulation. Most cognitive experiments have a more complex temporal structure with rich sequences of events.

The standard approach to decoding consists in fitting a first-level GLM to retrieve one response map (a beta map) per trial. This is sometimes known as “beta-series regressions” (see Mumford et al, Deconvolving bold activation in event-related designs for multivoxel pattern classification analyses, NeuroImage 2012). These maps can then be input to the decoder as below, predicting the conditions associated to trial.

For simplicity, we will work on the raw time-series of the data. However, it is strongly recomended that you fit a first level to include an HRF model and isolate the responses from various confounds.

2.1.1.2. Loading the data into nilearn

Full code example

The documentation here just gives the big idea. A full code example, with explanation, can be found on A introduction tutorial to fMRI decoding

  • Starting an environment: Launch IPython via “ipython –matplotlib” in a terminal, or use the Jupyter notebook.
  • Retrieving the data: In the tutorial, we load the data using nilearn data downloading function, nilearn.datasets.fetch_haxby. However, all this function does is to download the data and return paths to the files downloaded on the disk. To input your own data to nilearn, you can pass in the path to your own files (more on data input).
  • Loading the behavioral labels: Behavioral information is often stored in a text file such as a CSV, and must be load with numpy.recfromcsv or pandas
  • Extracting the fMRI data: we then use the nilearn.input_data.NiftiMasker: we extract only the voxels on the mask of the ventral temporal cortex that comes with the data, applying the mask_vt mask to the 4D fMRI data. The resulting data is then a matrix with a shape that is (n_timepoints, n_voxels) (see Masking data: from 4D Nifti images to 2D data arrays for a discussion on using masks).
  • Sample mask: Masking some of the time points may be useful to restrict to a specific pair of conditions (eg cats versus faces).

Note

Seemingly minor data preparation can matter a lot on the final score, for instance standardizing the data.

2.1.2. Performing a simple decoding analysis

2.1.2.1. The prediction engine

2.1.2.1.1. An estimator object

To perform decoding we need to use an estimator from the scikit-learn <http://scikit-learn.org> machine-learning library. This object can predict a condition label y given a set X of imaging data.

A simple and yet performant choice is the Support Vector Classifier (or SVC) with a linear kernel. The corresponding class, sklearn.svm.SVC, needs to be imported from the scikit-learn.

Note that the documentation of the object details all parameters. In IPython, it can be displayed as follows:

In [10]: svc?
Type:             SVC
Base Class:       <class 'sklearn.svm.libsvm.SVC'>
String Form:
SVC(kernel=linear, C=1.0, probability=False, degree=3, coef0=0.0, tol=0.001,
cache_size=200, shrinking=True, gamma=0.0)
Namespace:        Interactive
Docstring:
    C-Support Vector Classification.
    Parameters
    ----------
    C : float, optional (default=1.0)
        penalty parameter C of the error term.
...

2.1.2.1.2. Applying it to data: fit (train) and predict (test)

The prediction objects have two important methods:

  • a fit function that “learns” the parameters of the model from the data. Thus, we need to give some training data to fit.
  • a predict function that “predicts” a target from new data. Here, we just have to give the new set of images (as the target should be unknown):

Warning

Do not predict on data used by the fit: this would yield misleadingly optimistic scores.

2.1.2.2. Measuring prediction performance

2.1.2.2.1. Cross-validation

We cannot measure a prediction error on the same set of data that we have used to fit the estimator: it would be much easier than on new data, and the result would be meaningless. We need to use a technique called cross-validation to split the data into different sets, called “folds”, in a K-Fold strategy.

There is a specific function, sklearn.cross_validation.cross_val_score that computes for you the score for the different folds of cross-validation:

>>> from sklearn.cross_validation import cross_val_score  
>>> cv_scores = cross_val_score(svc, fmri_masked, target, cv=5)  

cv=5 stipulates a 5-fold cross-validation. Note that this function is located in sklearn.model_selection.cross_val_score in the newest version of scikit-learn.

You can speed up the computation by using n_jobs=-1, which will spread the computation equally across all processors (but might not work under Windows):

>>> cv_scores = cross_val_score(svc, fmri_masked, target, cv=5, n_jobs=-1, verbose=10) 

Prediction accuracy: We can take a look at the results of the cross_val_score function:

>>> print(cv_scores)  
[0.72727272727272729, 0.46511627906976744, 0.72093023255813948, 0.58139534883720934, 0.7441860465116279]

This is simply the prediction score for each fold, i.e. the fraction of correct predictions on the left-out data.

2.1.2.2.2. Choosing a good cross-validation strategy

There are many cross-validation strategies possible, including K-Fold or leave-one-out. When choosing a strategy, keep in mind that:

  • The test set should be as litte correlated as possible with the train set
  • The test set needs to have enough samples to enable a good measure of the prediction error (a rule of thumb is to use 10 to 20% of the data).

In these regards, leave one out is often one of the worst options (see Varoquaux et al, Assessing and tuning brain decoders: cross-validation, caveats, and guidelines, Neuroimage 2017).

Here, in the Haxby example, we are going to leave a session out, in order to have a test set independent from the train set. For this, we are going to use the session label, present in the behavioral data file, and sklearn.cross_validation.LeaveOneLabelOut.

Note

Full code for the above can be found on A introduction tutorial to fMRI decoding


Exercise

Compute the mean prediction accuracy using cv_scores.

Solution

>>> classification_accuracy = np.mean(cv_scores)  
>>> classification_accuracy  
0.76851...

For discriminating human faces from cats, we measure a total prediction accuracy of 77% across the different sessions.

2.1.2.2.3. Choice of the prediction accuracy measure

The default metric used for measuring errors is the accuracy score, i.e. the number of total errors. It is not always a sensible metric, especially in the case of very imbalanced classes, as in such situations choosing the dominant class can achieve a low number of errors.

Other metrics, such as the AUC (Area Under the Curve, for the ROC: the Receiver Operating Characteristic), can be used:

>>> cv_scores = cross_val_score(svc, fmri_masked, target, cv=cv,  scoring='roc_auc')  

2.1.2.2.4. Measuring the chance level

Dummy estimators: The simplest way to measure prediction performance at chance, is to use a “dummy” classifier, sklearn.dummy.DummyClassifier (purely random):

>>> from sklearn.dummy import DummyClassifier
>>> null_cv_scores = cross_val_score(DummyClassifier(), fmri_masked, target, cv=cv)  

Permutation testing: A more controlled way, but slower, is to do permutation testing on the labels, with sklearn.cross_validation.permutation_test_score:

>>> from sklearn.cross_validation import permutation_test_score
>>> null_cv_scores = permutation_test_score(svc, fmri_masked, target, cv=cv)  

Putting it all together

The ROI-based decoding example does a decoding analysis per mask, giving the f1-score of the prediction for each object.

It uses all the notions presented above, with for loop to iterate over masks and categories and Python dictionaries to store the scores.

../_images/sphx_glr_plot_haxby_masks_0011.png

Masks

../_images/sphx_glr_plot_haxby_full_analysis_0011.png

2.1.2.3. Visualizing the decoder’s weights

We can visualize the weights of the decoder:

  • we first inverse the masking operation, to retrieve a 3D brain volume of the SVC’s weights.
  • we then create a figure and plot as a background the first EPI image
  • finally we plot the SVC’s weights after masking the zero values
../_images/sphx_glr_plot_decoding_tutorial_0022.png

Note

Full code for the above can be found on A introduction tutorial to fMRI decoding

2.1.3. Decoding without a mask: Anova-SVM

2.1.3.1. Dimension reduction with feature selection

If we do not start from a mask of the relevant regions, there is a very large number of voxels and not all are useful for face vs cat prediction. We thus add a feature selection procedure. The idea is to select the k voxels most correlated to the task.

For this, we need to import the sklearn.feature_selection module and use sklearn.feature_selection.f_classif, a simple F-score based feature selection (a.k.a. Anova), that we will put before the SVC in a pipeline (sklearn.pipeline.Pipeline):

# ------------------
# Define the prediction function to be used.
# Here we use a Support Vector Classification, with a linear kernel
from sklearn.svm import SVC
svc = SVC(kernel='linear')

# Define the dimension reduction to be used.
# Here we use a classical univariate feature selection based on F-test,
# namely Anova. When doing full-brain analysis, it is better to use
# SelectPercentile, keeping 5% of voxels
# (because it is independent of the resolution of the data).
from sklearn.feature_selection import SelectPercentile, f_classif
feature_selection = SelectPercentile(f_classif, percentile=5)

# We have our classifier (SVC), our feature selection (SelectPercentile),and now,
# we can plug them together in a *pipeline* that performs the two operations
# successively:
from sklearn.pipeline import Pipeline
anova_svc = Pipeline([('anova', feature_selection), ('svc', svc)])

#############################################################################
# Fit the decoder and predict
# ----------------------------
anova_svc.fit(X, conditions)
y_pred = anova_svc.predict(X)

#############################################################################
# Obtain prediction scores via cross validation
# -----------------------------------------------
from sklearn.cross_validation import LeaveOneLabelOut, cross_val_score

# Define the cross-validation scheme used for validation.
# Here we use a LeaveOneLabelOut cross-validation on the session label
# which corresponds to a leave-one-session-out
cv = LeaveOneLabelOut(session)

# Compute the prediction accuracy for the different folds (i.e. session)
cv_scores = cross_val_score(anova_svc, X, conditions)

# Return the corresponding mean prediction accuracy
classification_accuracy = cv_scores.mean()

# Print the results
print("Classification accuracy: %.4f / Chance level: %f" %
      (classification_accuracy, 1. / len(np.unique(conditions))))
# Classification accuracy: 0.8009 / Chance level: 0.5000


#############################################################################

We can use our anova_svc object exactly as we were using our svc object previously.

2.1.3.2. Visualizing the results

To visualize the results, we need to:

  • first get the support vectors of the SVC and inverse the feature selection mechanism
  • then, as before, inverse the masking process to retrieve the weights and plot them.
# ----------------------
# Look at the SVC's discriminating weights
coef = svc.coef_
# reverse feature selection
coef = feature_selection.inverse_transform(coef)
# reverse masking
weight_img = masker.inverse_transform(coef)


# Use the mean image as a background to avoid relying on anatomical data
from nilearn import image
mean_img = image.mean_img(func_filename)

# Create the figure
from nilearn.plotting import plot_stat_map, show
plot_stat_map(weight_img, mean_img, title='SVM weights')

../_images/sphx_glr_plot_haxby_anova_svm_0011.png

Final script

The complete script to do an SVM-Anova analysis can be found as an example.

2.1.4. Going further with scikit-learn

We have seen a very simple analysis with scikit-learn, but it may be interesting to explore the wide variety of supervised learning algorithms in the scikit-learn.

2.1.4.1. Changing the prediction engine

We now see how one can easily change the prediction engine, if needed. We can try Fisher’s Linear Discriminant Analysis (LDA)

Import the module:

>>> from sklearn.lda import LDA

Construct the new estimator object and use it in a pipeline:

>>> from sklearn.pipeline import Pipeline
>>> lda = LDA()
>>> anova_lda = Pipeline([('anova', feature_selection), ('LDA', lda)])

and recompute the cross-validation score:

>>> cv_scores = cross_val_score(anova_lda, fmri_masked, target, cv=cv, verbose=1)  
>>> classification_accuracy = np.mean(cv_scores)  
>>> n_conditions = len(set(target))  # number of target classes
>>> print("Classification accuracy: %.4f / Chance Level: %.4f" % \
...    (classification_accuracy, 1. / n_conditions))  
Classification accuracy: 0.7846 / Chance level: 0.5000

2.1.4.2. Changing the feature selection

Let’s start by defining a linear SVM as a first classifier:

>>> clf = LinearSVC()

Let’s say that you want a more sophisticated feature selection, for example a Recursive Feature Elimination (RFE)

Import the module:

>>> from sklearn.feature_selection import RFE

Construct your new fancy selection:

>>> rfe = RFE(SVC(kernel='linear', C=1.), 50, step=0.25)

and create a new pipeline, composing the two classifiers rfe and clf:

>>> rfe_svc = Pipeline([('rfe', rfe), ('svc', clf)])

and recompute the cross-validation score:

>>> cv_scores = cross_val_score(rfe_svc, fmri_masked, target, cv=cv,
...     n_jobs=-1, verbose=1)  

But, be aware that this can take A WHILE


See also

  • The scikit-learn documentation has very detailed explanations on a large variety of estimators and machine learning techniques. To become better at decoding, you need to study it.