MLR Plugin Usage

MLR data and model preparation

When calling the MLR plugin, the GUI guides the user through the selection of the input data sets and the anatomical space (tab 1), the cross-validation method (tab 2) and the regression methods (tab 3). In addition, a number of settings concerning the organization of the predictors as well as the scaling of the data can be changed in the last tab (tab 4).

In the first tab ("Input Data"), one or more pairs of VTC-SDM files can be selected and added to the list. Each SDM file contains the predictors associated with the data, and its format should comply with the SDM text format. When the SDM file is interactively created via the single-study GLM dialog, a constant predictor may be added as the last predictor; by default this predictor is ignored by the plugin. If this behaviour is not desired, an option in the last tab changes the way the last SDM predictor is treated (see below). In order to restrict the MLR analysis to the desired anatomical space, a VTC mask can optionally be selected in the first tab. However, because the mask file is defined on the resolution and bounding box of the VTC, it is up to the user to make sure that the provided mask correctly applies to all VTCs in the list. As an alternative, a volume of interest (VOI) can be selected from a VOI file. In this case, because the VOIs are defined on the current VMR, there is no risk that they will not fit the VTC space.
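Since the plugin leaves the mask/VTC consistency check to the user, a quick programmatic check can help before starting a long analysis. The sketch below is a hypothetical NumPy illustration (the plugin does not expose this API); it assumes the VTC data are available as a 4-D array (time, x, y, z) and the mask as a 3-D boolean array on the same bounding box:

```python
import numpy as np

def apply_mask(vtc_data, mask):
    """Restrict a 4-D time series (t, x, y, z) to the voxels inside a 3-D mask.

    Raises if the mask's bounding box does not match the VTC's spatial
    dimensions -- the consistency the plugin leaves to the user.
    """
    if vtc_data.shape[1:] != mask.shape:
        raise ValueError(
            f"mask shape {mask.shape} does not match VTC spatial "
            f"dimensions {vtc_data.shape[1:]}"
        )
    # Result: (time points, voxels inside the mask)
    return vtc_data[:, mask.astype(bool)]

# Toy example: 10 time points over a 4x4x4 volume, mask covering 32 voxels
rng = np.random.default_rng(0)
vtc = rng.standard_normal((10, 4, 4, 4))
mask = np.zeros((4, 4, 4), dtype=bool)
mask[:2] = True
masked = apply_mask(vtc, mask)
```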


In the second tab, the cross-validation approach can be chosen. The default choice is N-fold cross-validation, in which all runs are concatenated and the resulting time course is split into N time courses of equal length (folds). This produces N model estimations and N predictions, in each of which one fold is set aside for performance evaluation. N-fold cross-validation is the only option available when working with a single VTC-SDM pair. As an alternative, if more than one run is available, the leave-N-runs-out approach can be used with N=1. In this case, each fold corresponds to one run and there are as many model estimations as there are runs.
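The N-fold splitting described above can be sketched as follows. This is an illustrative NumPy version of the splitting logic, not the plugin's actual code: the concatenated time course is cut into contiguous folds of near-equal length, and each fold is held out once.

```python
import numpy as np

def nfold_splits(n_timepoints, n_folds):
    """Cut the concatenated time course into contiguous folds of
    near-equal length; each fold is held out once for evaluation."""
    bounds = np.linspace(0, n_timepoints, n_folds + 1, dtype=int)
    for k in range(n_folds):
        test = np.arange(bounds[k], bounds[k + 1])
        train = np.concatenate((np.arange(0, bounds[k]),
                                np.arange(bounds[k + 1], n_timepoints)))
        yield train, test

# 300 concatenated time points, 3 folds -> 3 estimations and 3 predictions
splits = list(nfold_splits(300, 3))
```

With leave-one-run-out, the fold boundaries would instead coincide with the run boundaries.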

In the third tab, the regression method can be chosen (see the theoretical background for more information). The default choice is the KLS solution, but this may often lead to inaccurate solutions, especially when the number of voxels is much higher than the number of time points (high dimensionality). In these circumstances, switching to the KRR solution will likely improve performance. For KRR solutions, the regularization parameter can either be set to a desired value (specified in logarithmic scale and interpreted as 10^x) or chosen automatically with generalized cross-validation (KRR-GCV). In the latter case, no single value of lambda is set; instead an entire (logarithmic) range is spanned, starting from a desired minimum value, with a desired step and a desired total number of points. During model estimation, for each predictor and each split of the data, the optimal lambda value is chosen automatically within this range as the one corresponding to the minimum of the GCV function. When the KRR-GCV solution is selected, the GCV plots are also shown so that the user can check whether the minimum is reasonably approximated in the defined range (in which case the lambda value giving the ridge regression solution with the best generalization performance has been chosen for the model) or whether the lambda range should be refined in a new run.
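As an illustration of how KRR-GCV operates, the following hypothetical NumPy sketch (not the plugin's implementation) evaluates the standard GCV function on a logarithmic lambda grid, with lambda = 10^x as in the GUI, and returns the minimizing value together with the curve so that its shape can be inspected:

```python
import numpy as np

def krr_gcv(K, y, log_lambdas):
    """Pick the ridge parameter by generalized cross-validation.

    K: (n, n) kernel (Gram) matrix; y: (n,) target time course;
    log_lambdas: grid of exponents, lambda = 10**x.
    Returns the best lambda, the dual weights and the GCV curve.
    """
    n = K.shape[0]
    gcv = np.empty(len(log_lambdas))
    for i, x in enumerate(log_lambdas):
        lam = 10.0 ** x
        H = K @ np.linalg.inv(K + lam * np.eye(n))       # hat matrix
        resid = y - H @ y
        gcv[i] = n * (resid @ resid) / (n - np.trace(H)) ** 2
    lam = 10.0 ** log_lambdas[np.argmin(gcv)]
    alpha = np.linalg.solve(K + lam * np.eye(n), y)      # dual weights
    return lam, alpha, gcv

# Toy high-dimensional case: 60 time points, 500 voxels
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 500))
y = X[:, 0] + 0.1 * rng.standard_normal(60)
K = X @ X.T / X.shape[1]                                 # linear kernel
lam, alpha, gcv = krr_gcv(K, y, np.arange(-4.0, 2.0, 0.5))
```

If `np.argmin(gcv)` falls at either end of the grid, the range should be extended or refined, which is exactly what the plugin's GCV plots let the user verify.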


The RVM regression method is also available from the third tab. Here the automatic relevance determination process can be controlled by setting the maximum number of iterations (default: 1000), the minimum change (delta) in the alpha hyperparameter estimation, and the maximum value of the alpha hyperparameter above which alpha is considered infinite and the corresponding weight parameter is pruned away. When choosing the RVM solution, it is important to check the log tab to see how effective the optimization process has been. During the RVM iterations, the log information is updated every 10 iterations and the number of non-zero parameters (nz) can be inspected: depending on the data, the pruning of the parameters can be more or less effective, with the final number of non-zero parameters ideally becoming much smaller than the initial number of parameters (equal to the number of time points). When this is the case, increasing the number of iterations may be one possible way to improve the estimation. When this is not the case, one may consider either increasing the tolerance on the hyperparameter change (delta) or decreasing the "infinity" threshold for the hyperparameter (alpha). Other displayed quantities are the marginal likelihood of the data (L), the "well-determinedness" of the parameters (gamma) and the sum of the non-zero parameters.
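The ARD loop can be sketched as follows. This is a heavily simplified, hypothetical NumPy version of Tipping-style relevance vector regression, not the plugin's implementation (in particular, the noise precision is held fixed here): weights whose alpha exceeds the "infinity" threshold are pruned, which mirrors the shrinking nz count reported in the log.

```python
import numpy as np

def rvm_fit(Phi, y, max_iter=1000, delta=1e-3, alpha_inf=1e6):
    """Simplified relevance vector regression with ARD pruning.

    Phi: (n, m) kernel/design matrix; y: (n,) targets. Alphas above
    `alpha_inf` are treated as infinite and the corresponding weights
    are pruned (the shrinking 'nz'). Noise precision is kept fixed,
    unlike a full implementation.
    """
    n, m = Phi.shape
    alpha = np.ones(m)
    beta = 1.0 / max(np.var(y), 1e-12)               # fixed noise precision
    keep = np.arange(m)
    for _ in range(max_iter):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(np.diag(alpha[keep]) + beta * P.T @ P)
        mu = beta * Sigma @ P.T @ y
        gamma = np.clip(1.0 - alpha[keep] * np.diag(Sigma), 0.0, 1.0)
        new_alpha = gamma / np.maximum(mu ** 2, 1e-12)
        change = np.max(np.abs(new_alpha - alpha[keep]))
        alpha[keep] = new_alpha
        keep = keep[alpha[keep] < alpha_inf]         # prune "infinite" alphas
        if keep.size == 0 or change < delta:
            break
    P = Phi[:, keep]
    Sigma = np.linalg.inv(np.diag(alpha[keep]) + beta * P.T @ P)
    mu = beta * Sigma @ P.T @ y                      # posterior mean, survivors only
    return keep, mu

# Toy problem: 40 time points, 300 voxels, linear kernel
rng = np.random.default_rng(3)
X = rng.standard_normal((40, 300))
y = X[:, 0] + 0.05 * rng.standard_normal(40)
K = X @ X.T / X.shape[1]
keep, mu = rvm_fit(K, y)
pred = K[:, keep] @ mu
```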

Important remarks on RVM: depending on the dimensionality of the data, RVMs may sometimes run into numerical problems. In such cases, it might be worth applying a different scale factor to the data prior to the kernel calculations. This scaling is controlled as an inverse function of the number of voxels. While the default choice is a linear function, i.e. scaling by the inverse of the number of voxels, it is sometimes convenient to apply a stronger scaling, e.g. by the inverse of the squared number of voxels. This can be set in the kernel options of the regression method (tab 3). Moreover, when choosing the RVM method it should be kept in mind that, due to the computational complexity of the algorithm, the iterative process may become considerably slow on some machines when the total number of time points (of all runs concatenated) exceeds 1000.
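The two scaling options amount to different denominators in the kernel computation, as in this hypothetical NumPy sketch (names are not the plugin's):

```python
import numpy as np

def linear_kernel(X, scaling="linear"):
    """Linear kernel with inverse-voxel-count scaling.

    X: (time points, voxels). 'linear' divides by the number of voxels;
    'quadratic' divides by its square -- the stronger scaling that can
    help when RVM runs into numerical problems.
    """
    n_voxels = X.shape[1]
    denom = n_voxels if scaling == "linear" else n_voxels ** 2
    return (X @ X.T) / denom

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 2000))
K_lin = linear_kernel(X, "linear")
K_sq = linear_kernel(X, "quadratic")
```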



Results

Once all settings have been made, the "GO" button starts the calculations. At the end of the calculations, a set of maps and associated time courses will be displayed, together with bar graphs reporting the correlations and the mean squared errors of the predictions. There will be one prediction for each study (or fold) and for each predictor. For instance, the figure below shows the correlation coefficients between the predictions and the provided ratings after training on study 2 and testing on study 1 and vice versa, as well as the mean squared errors of the predictions.
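The two reported scores are the Pearson correlation and the mean squared error between each prediction and the corresponding true predictor time course, as in this illustrative NumPy sketch:

```python
import numpy as np

def prediction_scores(y_true, y_pred):
    """Pearson correlation and mean squared error for one fold and predictor."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    mse = np.mean((y_true - y_pred) ** 2)
    return r, mse

# Toy ratings vs. predictions for one test fold
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
r, mse = prediction_scores(y_true, y_pred)
```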

The weight map will be overlaid on the current VMR with no lower threshold applied, so the actual voxels included in the calculations are immediately visible. From the overlay map dialog, it is possible both to set any (arbitrary) lower or upper threshold (to best display the voxels most relevant for the prediction, as well as the sign of their contribution) and to show the time course with the predictions associated with the estimated model (as many as the number of folds or runs). The map and time-course information are stored with the special map extension "*.mlr" and can be loaded from the plugins menu.



Other options

By default, both the prediction time courses and the predictive maps are normalized to their standard deviation. This can optionally be changed in the last tab (Options) of the MLR plugin GUI. In addition, the exclusion of the last N predictors from the SDM, as well as the scaling of the predictors, the training and testing data, and the predictions and maps, can also be set from this tab.
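Normalization to the standard deviation simply divides a time course or map by its standard deviation, e.g.:

```python
import numpy as np

def std_normalize(a):
    """Divide a time course or map by its standard deviation (the default)."""
    sd = a.std()
    return a / sd if sd > 0 else a

z = std_normalize(np.array([2.0, 4.0, 6.0, 8.0]))
```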

MLR Results interpretation and remarks

While the prediction performances allow the models to be evaluated, exploring the predictive maps makes it possible to investigate the role of different brain regions in the prediction. In fact, as long as the MLR methods are based on a linear kernel, the maps express the weighted contribution of each voxel to the prediction, and the absolute value of this weight is therefore proportional to the relevance of that voxel for the prediction. On the other hand, it must be said that the areas most relevant for the prediction will not be limited to the specialized regions highlighted by conventional univariate regression methods (e.g. the GLM), but may include a number of other regions whose contribution only becomes relevant as part of the obtained multi-voxel pattern. In other words, the MLR may produce patterns of brain activity that, on the one hand, predict perceptual and behavioral experience in relation to certain categories of stimulation and, on the other hand, may reveal the participation of regions not implicated at all in the processing of the same categories.

Even if the plugin places no restriction on the selection of the input data, it is generally advised to perform the analyses using runs from single, separate subjects. This is mainly motivated by the computational complexity of the methods (training simultaneously on multiple subjects considerably increases the memory and time required to manipulate the kernel); in addition, training simultaneously on multiple subjects without properly accounting for inter-subject variability in the spatial patterns may substantially reduce the performance of the trained models.