This page gives a few simple considerations on the choice of an estimator. It is slightly oriented towards a decoding application, that is the prediction of external variables such as behavior or clinical traits from brain images. For a didactic introduction to decoding with nilearn, see the dedicated section of the nilearn documentation.
A regression problem is a learning task in which the variable to predict, often called y, is a continuous value, such as age. Encoding models typically call for regressions.
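As a minimal sketch of such a regression task, the snippet below predicts a continuous target from synthetic data standing in for subject-level brain features; the data, the Ridge estimator, and the alpha value are illustrative choices, not prescriptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(100, 50)                      # 100 subjects, 50 voxel features
y = X[:, 0] * 2.0 + rng.randn(100) * 0.5    # continuous target, e.g. age

# Cross-validated R^2 of a linear regression estimator
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print(scores.mean())
```

Cross-validated R^2 (rather than accuracy) is the natural score here, since the prediction target is continuous.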
Naselaris et al., "Encoding and decoding in fMRI", NeuroImage, 2011. http://www.ncbi.nlm.nih.gov/pubmed/20691790
A classification task consists in predicting a class label for each observation. In other words, the variable to predict is categorical.
Often classification is performed between two classes, but it may well be applied to multiple classes, in which case it is known as a multi-class problem. It is important to keep in mind that the larger the number of classes, the harder the prediction problem.
Some estimators support multi-class prediction out of the box, but many work by dividing the multi-class problem into a set of two-class problems. There are two noteworthy strategies:
One versus All (sklearn.multiclass.OneVsRestClassifier): an estimator is trained to distinguish each class from all the others. During prediction, the final decision is taken by a vote across the different estimators.

One versus One (sklearn.multiclass.OneVsOneClassifier): an estimator is trained to distinguish each pair of classes. During prediction, the final decision is taken by a vote across the different estimators.
The "One vs One" strategy is more computationally costly than "One vs All": the former scales as the square of the number of classes, whereas the latter is linear in the number of classes.
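The scaling difference is easy to see from the number of fitted estimators. Below, a binary estimator is wrapped with the two strategies on synthetic 4-class data (the base estimator and the dataset parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           n_classes=4, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One vs All fits one estimator per class; One vs One fits one per pair
print(len(ovr.estimators_))   # 4 classes -> 4 estimators
print(len(ovo.estimators_))   # 4 * 3 / 2 -> 6 estimators
```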
Most estimators have parameters that can be set to optimize their performance. Importantly, this must be done via nested cross-validation.
Indeed, there is noise in the cross-validation score, and when we vary the parameter, the curve showing the score as a function of the parameter will have bumps and peaks due to this noise. These will not generalize to new data and chances are that the corresponding choice of parameter will not perform as well on new data.
With scikit-learn, nested cross-validation is done via sklearn.model_selection.GridSearchCV (sklearn.grid_search in older releases). It is unfortunately time consuming, but the n_jobs argument can spread the load over multiple CPUs.
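A minimal sketch of the nested scheme on synthetic data: the inner loop, inside GridSearchCV, selects the parameter, while the outer cross_val_score loop measures generalization. The estimator and the C grid are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# Inner loop: parameter selection; n_jobs=-1 would use all available CPUs
grid = GridSearchCV(SVC(kernel="linear"),
                    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                    cv=3, n_jobs=1)

# Outer loop: unbiased estimate of prediction performance
outer_scores = cross_val_score(grid, X, y, cv=5)
print(outer_scores.mean())
```

Reporting the best inner-loop score instead of the outer-loop score would give an optimistic estimate, for the reasons given above.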
There is a wide variety of classifiers available in scikit-learn (see the scikit-learn documentation on supervised learning). Here we apply a few linear models to fMRI data:
Note that what is done to the data before applying the estimator is often more important than the choice of estimator. Typically, standardizing the data is important, smoothing can often be useful, and confounding effects, such as session effects, must be removed.
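One way to sketch the standardization step is with a Pipeline, so that the scaling parameters are learned on the training folds only and never leak from the test data (the estimator choice here is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=30, random_state=0)

# StandardScaler is fit on each training fold inside cross-validation
model = make_pipeline(StandardScaler(), LinearSVC())
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

Standardizing X outside the cross-validation loop, by contrast, would use test-set statistics during training.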
The corresponding weight maps (below) differ widely from one estimator to the other, although the prediction scores are fairly similar. In other words, an estimator that performs well in terms of prediction error gives us little guarantee on the brain maps.
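This can be probed directly on synthetic data: the coef_ attribute of a fitted linear estimator holds the weight vector that would be turned into a brain map, and two estimators with comparable scores need not agree on it. The two estimators below are illustrative stand-ins for the models applied above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=50, n_informative=10,
                           random_state=0)

svc = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
logreg = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)

# Correlation between the two weight vectors: related, but not identical
corr = np.corrcoef(svc.coef_.ravel(), logreg.coef_.ravel())[0, 1]
print(corr)
```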