.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/07_advanced/plot_advanced_decoding_scikit.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_07_advanced_plot_advanced_decoding_scikit.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_07_advanced_plot_advanced_decoding_scikit.py:


Advanced decoding using scikit learn
====================================

This tutorial opens the box of decoding pipelines to bridge integrated
functionalities provided by the :class:`nilearn.decoding.Decoder` object
with more advanced usecases. It reproduces basic examples functionalities with
direct calls to scikit-learn function and gives pointers to more advanced
objects. If some concepts seem unclear,
please refer to the :ref:`documentation on decoding <decoding_intro>`
and in particular to the :ref:`advanced section <going_further>`.
As in many other examples, we perform decoding of the visual category of a
stimuli on :footcite:t:`Haxby2001` dataset,
focusing on distinguishing two categories:
face and cat images.

.. include:: ../../../examples/masker_note.rst

.. GENERATED FROM PYTHON SOURCE LINES 22-28

Retrieve and load the :term:`fMRI` data from the Haxby study
------------------------------------------------------------

First download the data
.......................


.. GENERATED FROM PYTHON SOURCE LINES 28-44

.. code-block:: Python


    # The :func:`nilearn.datasets.fetch_haxby` function will download the
    # Haxby dataset composed of fMRI images in a Niimg,
    # a spatial mask and a text document with label of each image
    from nilearn import datasets

    haxby_dataset = datasets.fetch_haxby()
    mask_filename = haxby_dataset.mask_vt[0]
    fmri_filename = haxby_dataset.func[0]

    # Loading the behavioral labels
    import pandas as pd

    behavioral = pd.read_csv(haxby_dataset.session_target[0], delimiter=" ")
    behavioral


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>labels</th>
          <th>chunks</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>rest</td>
          <td>0</td>
        </tr>
        <tr>
          <th>1</th>
          <td>rest</td>
          <td>0</td>
        </tr>
        <tr>
          <th>2</th>
          <td>rest</td>
          <td>0</td>
        </tr>
        <tr>
          <th>3</th>
          <td>rest</td>
          <td>0</td>
        </tr>
        <tr>
          <th>4</th>
          <td>rest</td>
          <td>0</td>
        </tr>
        <tr>
          <th>...</th>
          <td>...</td>
          <td>...</td>
        </tr>
        <tr>
          <th>1447</th>
          <td>rest</td>
          <td>11</td>
        </tr>
        <tr>
          <th>1448</th>
          <td>rest</td>
          <td>11</td>
        </tr>
        <tr>
          <th>1449</th>
          <td>rest</td>
          <td>11</td>
        </tr>
        <tr>
          <th>1450</th>
          <td>rest</td>
          <td>11</td>
        </tr>
        <tr>
          <th>1451</th>
          <td>rest</td>
          <td>11</td>
        </tr>
      </tbody>
    </table>
    <p>1452 rows × 2 columns</p>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 45-46

We keep only a images from a pair of conditions(cats versus faces).

.. GENERATED FROM PYTHON SOURCE LINES 46-56

.. code-block:: Python

    from nilearn.image import index_img

    conditions = behavioral["labels"]
    condition_mask = conditions.isin(["face", "cat"])
    fmri_niimgs = index_img(fmri_filename, condition_mask)
    conditions = conditions[condition_mask]
    # Convert to numpy array
    conditions = conditions.values
    run_label = behavioral["chunks"][condition_mask]


.. GENERATED FROM PYTHON SOURCE LINES 57-59

Performing decoding with scikit-learn
-------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 59-72

.. code-block:: Python


    # Importing a classifier
    # ......................
    # We can import many predictive models from scikit-learn that can be used in a
    # decoding pipelines. They are all used with the same `fit()` and `predict()`
    # functions.
    # Let's define a Support Vector Classifier
    # (or `SVC <https://scikit-learn.org/stable/modules/svm.html >`_).

    from sklearn.svm import SVC

    svc = SVC()


.. GENERATED FROM PYTHON SOURCE LINES 73-80

Masking the data
................
To use a scikit-learn estimator on brain images, you should first mask the
data using a :class:`nilearn.maskers.NiftiMasker` to extract only the
voxels inside the mask of interest,
and transform 4D input :term:`fMRI` data to 2D arrays
(`shape=(n_timepoints, n_voxels)`) that estimators can work on.

.. GENERATED FROM PYTHON SOURCE LINES 80-92

.. code-block:: Python

    from nilearn.maskers import NiftiMasker

    masker = NiftiMasker(
        mask_img=mask_filename,
        runs=run_label,
        smoothing_fwhm=4,
        standardize="zscore_sample",
        memory="nilearn_cache",
        memory_level=1,
    )
    fmri_masked = masker.fit_transform(fmri_niimgs)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/himanshu/Desktop/nilearn_work/nilearn/nilearn/image/resampling.py:492: UserWarning: The provided image has no sform in its header. Please check the provided file. Results may not be as expected.
      warnings.warn(


.. GENERATED FROM PYTHON SOURCE LINES 93-98

Cross-validation with scikit-learn
..................................
To train and test the model in a meaningful way we use cross-validation with
the function :func:`sklearn.model_selection.cross_val_score` that computes
for you the score for the different folds of cross-validation.

.. GENERATED FROM PYTHON SOURCE LINES 98-104

.. code-block:: Python

    from sklearn.model_selection import cross_val_score

    # Here `cv=5` stipulates a 5-fold cross-validation
    cv_scores = cross_val_score(svc, fmri_masked, conditions, cv=5)
    print(f"SVC accuracy: {cv_scores.mean():.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    SVC accuracy: 0.823


.. GENERATED FROM PYTHON SOURCE LINES 105-116

Tuning cross-validation parameters
..................................
You can change many parameters of the cross_validation here, for example:

* use a different cross - validation scheme, for example LeaveOneGroupOut()

* speed up the computation by using n_jobs = -1, which will spread the
  computation equally across all processors.

* use a different scoring function, as a keyword or imported from
  scikit-learn : scoring = 'roc_auc'

.. GENERATED FROM PYTHON SOURCE LINES 116-130

.. code-block:: Python

    from sklearn.model_selection import LeaveOneGroupOut

    cv = LeaveOneGroupOut()
    cv_scores = cross_val_score(
        svc,
        fmri_masked,
        conditions,
        cv=cv,
        scoring="roc_auc",
        groups=run_label,
        n_jobs=2,
    )
    print(f"SVC accuracy (tuned parameters): {cv_scores.mean():.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    SVC accuracy (tuned parameters): 0.858


.. GENERATED FROM PYTHON SOURCE LINES 131-137

Measuring the chance level
--------------------------
:class:`sklearn.dummy.DummyClassifier` (purely random) estimators are the
simplest way to measure prediction performance at chance. A more controlled
way, but slower, is to do permutation testing on the labels, with
:func:`sklearn.model_selection.permutation_test_score`.

.. GENERATED FROM PYTHON SOURCE LINES 139-141

Dummy estimator
...............

.. GENERATED FROM PYTHON SOURCE LINES 141-149

.. code-block:: Python

    from sklearn.dummy import DummyClassifier

    null_cv_scores = cross_val_score(
        DummyClassifier(), fmri_masked, conditions, cv=cv, groups=run_label
    )

    print(f"Dummy accuracy: {null_cv_scores.mean():.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Dummy accuracy: 0.500


.. GENERATED FROM PYTHON SOURCE LINES 150-152

Permutation test
................

.. GENERATED FROM PYTHON SOURCE LINES 152-159

.. code-block:: Python

    from sklearn.model_selection import permutation_test_score

    null_cv_scores = permutation_test_score(
        svc, fmri_masked, conditions, cv=cv, groups=run_label
    )[1]
    print(f"Permutation test score: {null_cv_scores.mean():.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Permutation test score: 0.502


.. GENERATED FROM PYTHON SOURCE LINES 160-168

Decoding without a mask: Anova-SVM in scikit-lean
-------------------------------------------------
We can also implement feature selection before decoding as a scikit-learn
`pipeline`(:class:`sklearn.pipeline.Pipeline`). For this, we need to import
the :mod:`sklearn.feature_selection` module and use
:func:`sklearn.feature_selection.f_classif`, a simple F-score
based feature selection (a.k.a.
`Anova <https://en.wikipedia.org/wiki/Analysis_of_variance#The_F-test>`_).

.. GENERATED FROM PYTHON SOURCE LINES 168-193

.. code-block:: Python

    from sklearn.feature_selection import SelectPercentile, f_classif
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    feature_selection = SelectPercentile(f_classif, percentile=10)
    linear_svc = LinearSVC(dual=True)
    anova_svc = Pipeline([("anova", feature_selection), ("svc", linear_svc)])

    # We can use our ``anova_svc`` object exactly as we were using our ``svc``
    # object previously.
    # As we want to investigate our model, we use sklearn `cross_validate` function
    # with `return_estimator = True` instead of cross_val_score,
    # to save the estimator
    from sklearn.model_selection import cross_validate

    fitted_pipeline = cross_validate(
        anova_svc,
        fmri_masked,
        conditions,
        cv=cv,
        groups=run_label,
        return_estimator=True,
    )
    print(f"ANOVA+SVC test score: {fitted_pipeline['test_score'].mean():.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ANOVA+SVC test score: 0.801


.. GENERATED FROM PYTHON SOURCE LINES 194-196

Visualize the :term:`ANOVA` + SVC's discriminating weights
..........................................................

.. GENERATED FROM PYTHON SOURCE LINES 196-220

.. code-block:: Python


    # retrieve the pipeline fitted on the first cross-validation fold and its SVC
    # coefficients
    first_pipeline = fitted_pipeline["estimator"][0]
    svc_coef = first_pipeline.named_steps["svc"].coef_
    print(
        "After feature selection, "
        f"the SVC is trained only on {svc_coef.shape[1]} features"
    )

    # We invert the feature selection step to put these coefs in the right 2D place
    full_coef = first_pipeline.named_steps["anova"].inverse_transform(svc_coef)

    print(
        "After inverting feature selection, "
        f"we have {full_coef.shape[1]} features back"
    )

    # We apply the inverse of masking on these to make a 4D image that we can plot
    from nilearn.plotting import plot_stat_map

    weight_img = masker.inverse_transform(full_coef)
    plot_stat_map(weight_img, title="Anova+SVC weights")


.. image-sg:: /auto_examples/07_advanced/images/sphx_glr_plot_advanced_decoding_scikit_001.png
   :alt: plot advanced decoding scikit
   :srcset: /auto_examples/07_advanced/images/sphx_glr_plot_advanced_decoding_scikit_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    After feature selection, the SVC is trained only on 47 features
    After inverting feature selection, we have 464 features back

    <nilearn.plotting.displays._slicers.OrthoSlicer object at 0x7214eb7af110>


.. GENERATED FROM PYTHON SOURCE LINES 221-223

Going further with scikit-learn
-------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 225-231

Changing the prediction engine
..............................
To change the prediction engine, we just need to import it and use in our
pipeline instead of the SVC.
We can try Fisher's
`Linear Discriminant Analysis (LDA) <https://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_vs_lda.html>`_ # noqa

.. GENERATED FROM PYTHON SOURCE LINES 231-252

.. code-block:: Python


    # Construct the new estimator object and use it in a new pipeline after anova
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    feature_selection = SelectPercentile(f_classif, percentile=10)
    lda = LinearDiscriminantAnalysis()
    anova_lda = Pipeline([("anova", feature_selection), ("LDA", lda)])

    # Recompute the cross-validation score:
    import numpy as np

    cv_scores = cross_val_score(
        anova_lda, fmri_masked, conditions, cv=cv, verbose=1, groups=run_label
    )
    classification_accuracy = np.mean(cv_scores)
    n_conditions = len(set(conditions))  # number of target classes
    print(
        "ANOVA + LDA classification accuracy: %.4f / Chance Level: %.4f"
        % (classification_accuracy, 1.0 / n_conditions)
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ANOVA + LDA classification accuracy: 0.8009 / Chance Level: 0.5000


.. GENERATED FROM PYTHON SOURCE LINES 253-258

Changing the feature selection
..............................
Let's say that you want a more sophisticated feature selection, for example a
Recursive Feature Elimination(RFE) before a svc.
We follows the same principle.

.. GENERATED FROM PYTHON SOURCE LINES 258-278

.. code-block:: Python


    # Import it and define your fancy objects
    from sklearn.feature_selection import RFE

    svc = SVC()
    rfe = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=50, step=0.25)

    # Create a new pipeline, composing the two classifiers `rfe` and `svc`

    rfe_svc = Pipeline([("rfe", rfe), ("svc", svc)])

    # Recompute the cross-validation score
    # cv_scores = cross_val_score(rfe_svc,
    #                             fmri_masked,
    #                             target,
    #                             cv=cv,
    #                             n_jobs=2,
    #                             verbose=1)
    # But, be aware that this can take * A WHILE * ...


.. GENERATED FROM PYTHON SOURCE LINES 279-283

References
----------

 .. footbibliography::


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 19.913 seconds)

**Estimated memory usage:**  916 MB


.. _sphx_glr_download_auto_examples_07_advanced_plot_advanced_decoding_scikit.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/nilearn/nilearn/0.10.4?urlpath=lab/tree/notebooks/auto_examples/07_advanced/plot_advanced_decoding_scikit.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_advanced_decoding_scikit.ipynb <plot_advanced_decoding_scikit.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_advanced_decoding_scikit.py <plot_advanced_decoding_scikit.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
	labels	chunks
0	rest	0
1	rest	0
2	rest	0
3	rest	0
4	rest	0
...	...	...
1447	rest	11
1448	rest	11
1449	rest	11
1450	rest	11
1451	rest	11