.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_07_advanced_plot_advanced_decoding_scikit.py:


Advanced decoding using scikit-learn
====================================

This tutorial opens the box of decoding pipelines to bridge the integrated
functionality provided by the :class:`nilearn.decoding.Decoder` object with
more advanced use cases. It reproduces the functionality of basic examples
with direct calls to scikit-learn functions and gives pointers to more
advanced objects. If some concepts seem unclear, please refer to the
:ref:`documentation on decoding ` and in particular to the
:ref:`advanced section `.

As in many other examples, we perform decoding of the visual category of a
stimulus on the Haxby 2001 dataset, focusing on distinguishing two
categories: face and cat images.

    * J.V. Haxby et al. "Distributed and Overlapping Representations of Faces
      and Objects in Ventral Temporal Cortex", Science vol 293 (2001),
      pp. 2425-2430.

.. contents:: **Contents**
    :local:
    :depth: 1

Retrieve and load the fMRI data from the Haxby study
----------------------------------------------------

First download the data
.......................

.. code-block:: default

    # The :func:`nilearn.datasets.fetch_haxby` function downloads the Haxby
    # dataset: fMRI images as a Niimg, a spatial mask, and a text document
    # with the label of each image.
    from nilearn import datasets
    haxby_dataset = datasets.fetch_haxby()
    mask_filename = haxby_dataset.mask_vt[0]
    fmri_filename = haxby_dataset.func[0]

    # Load the behavioral labels
    import pandas as pd
    behavioral = pd.read_csv(haxby_dataset.session_target[0], delimiter=' ')
    behavioral
.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

         labels  chunks
    0      rest       0
    1      rest       0
    2      rest       0
    3      rest       0
    4      rest       0
    ...     ...     ...
    1447   rest      11
    1448   rest      11
    1449   rest      11
    1450   rest      11
    1451   rest      11

    [1452 rows x 2 columns]
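As a quick sanity check (a small addition to the original example, using only
standard pandas methods), we can count how many volumes belong to each
experimental condition:

.. code-block:: default

    # ``value_counts`` counts the occurrences of each label; 'rest' dominates
    # because rest periods separate the stimulation blocks.
    print(behavioral['labels'].value_counts())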
We keep only the images from a pair of conditions (cats versus faces).

.. code-block:: default

    from nilearn.image import index_img
    conditions = behavioral['labels']
    condition_mask = conditions.isin(['face', 'cat'])
    fmri_niimgs = index_img(fmri_filename, condition_mask)
    conditions = conditions[condition_mask]
    # Convert to numpy array
    conditions = conditions.values
    session_label = behavioral['chunks'][condition_mask]

Performing decoding with scikit-learn
-------------------------------------

.. code-block:: default

    # Importing a classifier
    # ......................
    # We can import many predictive models from scikit-learn that can be used
    # in decoding pipelines. They are all used with the same `fit()` and
    # `predict()` functions.
    # Let's define a `Support Vector Classifier `_ (or SVC).
    from sklearn.svm import SVC
    svc = SVC()

Masking the data
................

To use a scikit-learn estimator on brain images, you should first mask the
data using a :class:`nilearn.input_data.NiftiMasker`: it extracts only the
voxels inside the mask of interest, and transforms 4D input fMRI data into
2D arrays (`shape=(n_timepoints, n_voxels)`) that estimators can work on.

.. code-block:: default

    from nilearn.input_data import NiftiMasker
    masker = NiftiMasker(mask_img=mask_filename, sessions=session_label,
                         smoothing_fwhm=4, standardize=True,
                         memory="nilearn_cache", memory_level=1)
    fmri_masked = masker.fit_transform(fmri_niimgs)

Cross-validation with scikit-learn
..................................

To train and test the model in a meaningful way, we use cross-validation with
:func:`sklearn.model_selection.cross_val_score`, which computes for you the
score for each fold of the cross-validation.

.. code-block:: default

    from sklearn.model_selection import cross_val_score
    cv_scores = cross_val_score(svc, fmri_masked, conditions, cv=5)
    # Here `cv=5` stipulates a 5-fold cross-validation

Tuning cross-validation parameters
..................................

You can change many parameters of the cross-validation here, for example:

* use a different cross-validation scheme, for example ``LeaveOneGroupOut()``;

* speed up the computation by using ``n_jobs=-1``, which will spread the
  computation equally across all processors;

* use a different scoring function, as a keyword or imported from
  scikit-learn: ``scoring='roc_auc'``.

.. code-block:: default

    from sklearn.model_selection import LeaveOneGroupOut
    cv = LeaveOneGroupOut()
    cv_scores = cross_val_score(svc,
                                fmri_masked,
                                conditions,
                                cv=cv,
                                scoring='roc_auc',
                                groups=session_label,
                                n_jobs=-1)
    print("SVC accuracy: {:.3f}".format(cv_scores.mean()))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    SVC accuracy: 0.858
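Beyond these cross-validation options, you may also want to tune the
estimator's own hyperparameters. This is an addition to the original example:
a minimal sketch with :class:`sklearn.model_selection.GridSearchCV`, where
the grid of ``C`` values is purely illustrative.

.. code-block:: default

    from sklearn.model_selection import GridSearchCV

    # Illustrative grid of SVC regularization values (not from the original
    # example). Note that for unbiased test scores, the search should ideally
    # be nested inside an outer cross-validation loop.
    grid = GridSearchCV(SVC(), param_grid={'C': [0.1, 1., 10.]},
                        cv=cv, scoring='roc_auc')
    grid.fit(fmri_masked, conditions, groups=session_label)
    print("Best C: {}".format(grid.best_params_['C']))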
Measuring the chance level
--------------------------

:class:`sklearn.dummy.DummyClassifier` (purely random) estimators are the
simplest way to measure prediction performance at chance. A more controlled,
but slower, way is to do permutation testing on the labels, with
:func:`sklearn.model_selection.permutation_test_score`.

Dummy estimator
...............

.. code-block:: default

    from sklearn.dummy import DummyClassifier
    null_cv_scores = cross_val_score(
        DummyClassifier(), fmri_masked, conditions, cv=cv, groups=session_label)
    print("Dummy accuracy: {:.3f}".format(null_cv_scores.mean()))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Dummy accuracy: 0.500

Permutation test
................

.. code-block:: default

    from sklearn.model_selection import permutation_test_score
    null_cv_scores = permutation_test_score(
        svc, fmri_masked, conditions, cv=cv, groups=session_label)[1]
    print("Permutation test score: {:.3f}".format(null_cv_scores.mean()))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Permutation test score: 0.502
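Note that :func:`sklearn.model_selection.permutation_test_score` actually
returns three values, and the snippet above keeps only the second one (the
scores obtained under permuted labels). A minimal variant, using the same
objects as above, that also reports the permutation p-value:

.. code-block:: default

    # Unpack all three return values: the score on the true labels, the
    # array of scores under permuted labels, and the resulting p-value.
    score, permuted_scores, pvalue = permutation_test_score(
        svc, fmri_masked, conditions, cv=cv, groups=session_label)
    print("Score: {:.3f}, p-value: {:.3f}".format(score, pvalue))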
Decoding without a mask: Anova-SVM in scikit-learn
--------------------------------------------------

We can also implement feature selection before decoding as a scikit-learn
`pipeline` (:class:`sklearn.pipeline.Pipeline`). For this, we need to import
the :mod:`sklearn.feature_selection` module and use
:func:`sklearn.feature_selection.f_classif`, a simple F-score based feature
selection (a.k.a. `Anova `_).

.. code-block:: default

    from sklearn.pipeline import Pipeline
    from sklearn.feature_selection import SelectPercentile, f_classif
    from sklearn.model_selection import cross_validate
    from sklearn.svm import LinearSVC
    feature_selection = SelectPercentile(f_classif, percentile=10)
    anova_svc = Pipeline([('anova', feature_selection), ('svc', LinearSVC())])

    # We can use our ``anova_svc`` object exactly as we were using our ``svc``
    # object previously.
    # As we want to investigate our model, we use scikit-learn's `cross_validate`
    # function with `return_estimator=True` instead of `cross_val_score`, to
    # save the fitted estimators.
    fitted_pipeline = cross_validate(anova_svc, fmri_masked, conditions,
                                     cv=cv, groups=session_label,
                                     return_estimator=True)
    print("ANOVA+SVC test score: {:.3f}".format(
        fitted_pipeline["test_score"].mean()))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ANOVA+SVC test score: 0.801

Visualize the ANOVA + SVC's discriminating weights
..................................................

.. code-block:: default

    # Retrieve the pipeline fitted on the first cross-validation fold and its
    # SVC coefficients.
    first_pipeline = fitted_pipeline["estimator"][0]
    svc_coef = first_pipeline.named_steps['svc'].coef_
    print("After feature selection, the SVC is trained only on {} features".format(
        svc_coef.shape[1]))

    # We invert the feature selection step to put these coefficients back in
    # their right place in the full voxel space.
    full_coef = first_pipeline.named_steps['anova'].inverse_transform(svc_coef)
    print("After inverting feature selection, we have {} features back".format(
        full_coef.shape[1]))

    # We apply inverse masking on these coefficients to make a brain image
    # that we can plot.
    from nilearn.plotting import plot_stat_map
    weight_img = masker.inverse_transform(full_coef)
    plot_stat_map(weight_img, title='Anova+SVC weights')

.. image:: /auto_examples/07_advanced/images/sphx_glr_plot_advanced_decoding_scikit_001.png
    :alt: plot advanced decoding scikit
    :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    After feature selection, the SVC is trained only on 47 features
    After inverting feature selection, we have 464 features back

Going further with scikit-learn
-------------------------------

Changing the prediction engine
..............................

To change the prediction engine, we just need to import it and use it in our
pipeline instead of the SVC. We can try Fisher's `Linear Discriminant
Analysis (LDA) `_.

.. code-block:: default

    # Construct the new estimator object and use it in a new pipeline after
    # the ANOVA feature selection.
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    feature_selection = SelectPercentile(f_classif, percentile=10)
    lda = LinearDiscriminantAnalysis()
    anova_lda = Pipeline([('anova', feature_selection), ('LDA', lda)])

    # Recompute the cross-validation score:
    import numpy as np
    cv_scores = cross_val_score(anova_lda, fmri_masked, conditions,
                                cv=cv, verbose=1, groups=session_label)
    classification_accuracy = np.mean(cv_scores)
    n_conditions = len(set(conditions))  # number of target classes
    print("ANOVA + LDA classification accuracy: %.4f / Chance Level: %.4f"
          % (classification_accuracy, 1. / n_conditions))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
    [Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:    0.2s finished
    ANOVA + LDA classification accuracy: 0.8009 / Chance Level: 0.5000

Changing the feature selection
..............................

Let's say that you want a more sophisticated feature selection, for example
Recursive Feature Elimination (RFE) before an SVC. We follow the same
principle.

.. code-block:: default

    # Import it and define your fancy objects
    from sklearn.feature_selection import RFE
    svc = SVC()
    # Passing ``n_features_to_select`` as a keyword avoids the FutureWarning
    # about positional arguments raised by recent scikit-learn versions.
    rfe = RFE(SVC(kernel='linear', C=1.), n_features_to_select=50, step=0.25)

    # Create a new pipeline, composing the feature selector `rfe` with the
    # classifier `svc`.
    rfe_svc = Pipeline([('rfe', rfe), ('svc', svc)])

    # Recompute the cross-validation score
    # cv_scores = cross_val_score(rfe_svc, fmri_masked, conditions, cv=cv,
    #                             n_jobs=-1, verbose=1)
    # But be aware that this can take *A WHILE*...


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 1 minutes  9.626 seconds)


.. _sphx_glr_download_auto_examples_07_advanced_plot_advanced_decoding_scikit.py:


.. only:: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: binder-badge

    .. image:: https://mybinder.org/badge_logo.svg
      :target: https://mybinder.org/v2/gh/nilearn/nilearn.github.io/master?filepath=examples/auto_examples/07_advanced/plot_advanced_decoding_scikit.ipynb
      :width: 150 px


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_advanced_decoding_scikit.py `


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_advanced_decoding_scikit.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_