Note: There have been significant changes to this tutorial since the initial 0.1 and 0.2 releases. Even if you already went through the tutorial once, it is highly recommended that you at least glance through it again now.
This tutorial will guide you in a step-by-step fashion through some of the analysis necessary to recreate one of the Princeton EBC Team's submissions to the competition. This tutorial is geared for those who have never seen the MVPA toolbox before; if you already feel perfectly comfortable setting up and running an experiment in the toolbox from start to finish, you will want to skim through this and then start the advanced tutorial (ebc_tutorial_adv.m).
At this point, you should have the MVPA toolbox installed, along with
the EBC extension, as well as the tutorial data in MATLAB form from
the EBC website. For installation instructions,
click here. Once you're ready to proceed, start up Matlab and (if
you're using the GUI) load up ebc_tutorial_intro.m into
the editor. (Don't worry if you don't have the GUI: you can enter the
commands from this tutorial as we come to them, or you can copy from
ebc_tutorial_intro.m as a reference.) Then head to the
next section.
There should be three .mat files in the data archive that you downloaded from the ebc website:
tutorial_ebc1.mat - subject 1's datasettutorial_ebc2.mat - subject 2's datasettutorial_ebc3.mat - subject 3's datasetepi - a matrix containing the fMRI voxel activations
of some ~38,000 intra-cranial voxels. This matrix was created by the
Princeton EBC team based on data downloaded from the EBC website.
Rows of the matrix correspond to voxels, and columns correspond to
TRs. To create this matrix, we imported the raw functional DICOM data into AFNI. The
temporal offset of each slice within a volume was corrected using the
'tshift' option in AFNI's 3dvolreg script. This script was also used
to motion correct by aligning all volumes from one subject to a
single 'template' volume from that subject. Next, AFNI's 3dAutomask
script was used to automatically create a brain mask so that only
voxels with intensities likely to have come from inside the brain
would be included in subsequent analysis. The data was then imported
into Matlab using the Princeton MVPA toolbox and Ziad Saad's
AFNI-Matlab library.wholebrain - a binary, three-dimensional matrix that
relates the epi matrix to the 3D fMRI voxel grid; when
considered in sequential order, the n'th index with value 1
in wholebrain corresponds to the voxel in the n'th
row of epi.baseregs - a matrix containing the values of the
subject-determined feature ratings for each TR. Rows of the matrix
correspond to different features, and columns correspond to TRs.condnames - a cell array holding the English names of
the features in baseregs. For instance,
codenames{1} is 'Amusement'.movies - a row vector that records which phase of the
experiment (movie #1 or movie #2) each TR corresponds to.movies_noblank - a binary matrix that indicates which
TRs we consider 'rest' or 'blank', so that they may be excluded from
our analysis. A value of 0 indicates a blank TR.
Let's assume you've extracted the data into the subdirectory
data of your current working directory. The first step
in our analysis is to load this data into RAM for the first subject.
>> load 'data/tutorial_ebc1'; >> whos Name Size Bytes Class baseregs 30x1726 414240 double array condnames 1x30 3366 cell array epi 37828x1726 522329024 double array movies 1x1726 13808 double array movies_noblank 1x1726 13808 double array wholebrain 64x64x34 1114112 double array Grand total is 65485897 elements using 523888358 bytes
In addition to these .mat files, there is also a
.mat file included in the ebc subdirectory
that will find useful: ebc_params.mat This contains a
cell array of the optimized parameters obtained by the Princeton EBC
Team from a parameter search. In our case, we will only need a small
subset of these parameters, which we found provided the best performance.
Let's load these parameters now.
>> load 'ebc_params.mat'
>> default_params = ebc_params{1, 3, 2};
>> default_params
default_params =
N: [300 750 200 50 200 1500 400 100 100 200 1000 300 400]
penalty: [0.0500 0 0 0.4500 0 0.0500 0 2 0.0500 0.2000 0 0.0500 0]
time_average_window: [3 2 4 2 4 3 2 3 2 3 3 5 5]
subject: 1
fselect: 2
noblanks: 1
Inside the ebc_params.mat file, there is a cell array of
default_param objects called ebc_params.
Each index in the array corresponds to the result of one of our
parameter searches: index {1, 3, 1} means that these are
the parameters for subject 1, where spatial averaging was performed
after feature selection, and where blanks were excluded from the
training set. Don't worry too much about this now; all that matters
is that we now have the optimal parameters for our experiment for each
of the thirteen regressors of interest.
Next, we start loading the matrix data into the data structures used by MVPA, so we can take advantage of MVPA's automation.
In the MVPA toolbox, all data relating to the analysis for a given
subject in a given experiment is wrapped in a single object,
idiomatically named subj. We initialize an empty
subject with the experiment 'ebc';
>> subj = init_subj('ebc', 'subject1');
Next we insert the data we want into the subject structure, so that we
can take advantage of the MVPA toolbox's automation. MVPA has four
basic data types: pattern, regressors,
selector, and mask.
A mask object is used to specify which voxels in a particular pattern
should be used for analysis. This allows us to easily select
subsets of voxels for analysis based on whatever criteria we
choose. Furthermore, since masks are 3D matrices, they also
provide the grid locations of each voxel in the pattern they
mask. The mask for our data is the matrix 'wholebrain' we loaded
earlier.
To create a new mask object, we use the
initset_object function to INIT a new 'pattern' object
named 'epi', and to SET the object with the matrix 'epi' we loaded
already. Note that like almost all MVPA commands, the output is the
modified subj structure, and the first input parameter is
the subj structure on which to act.
>> subj = initset_object(subj, 'mask', 'wholebrain', wholebrain);
A pattern object is used to hold any dataset that we will use as
the basis of our analysis. Typically these are voxel timeseries, but
they can also be statistics of interest or some transformed or
dimensionality reduced form of the original data, as well. In any
case, now that we have our mask, we load the voxel pattern that we
will be analyzing, and and associate it with the mask. We can use the
same initset_object function as before, but we pass in an
optional object field, masked_by, that specifies
the mask associated with a given pattern.
>> subj = initset_object(subj, 'pattern', 'epi', epi, 'masked_by', 'wholebrain');
A regressors object is used to store any set of data
that is a label for the training or testing phases of the
experiment, i.e. a variable that is trying to be predicted. (Yes,
the name can be confusing -- technically, in the prediction
experiment, the voxels are the regressors -- but it is the best we
have for now.) The baseregs and condnames
data are inserted into subj as a new
regressors object, where condnames is an
optional object field like masked_by was. First we
take only the first thirteen regressors (these are the only ones of
interest), and then we create our new object:
>> baseregs = baseregs(1:13, :); >> condnames = condnames(1:13); >> subj = initset_object(subj, 'regressors', 'baseregs', baseregs, 'condnames', condnames);
Finally, a selector object is used to label each
timepoint with information about the experiment: integers represent
different runs of the fMRI scanner, or a binary selector indicates
rest TRs to be excluded from analysis. MVPA will automatically set
up an N-1 cross-validation experiment (isolate one run for test
data, and assess performance by training on the rest, rotated across
all runs) for you later on using these selectors. Thus the
movies and movies_noblank vectors are
inserted as selector objects.
>> subj = initset_object(subj, 'selector', 'movies', movies); >> subj = initset_object(subj, 'selector', 'movies_noblank', movies_noblank);
We now have our completed subject structure. We use the command
summarize to print out the information on what we have
just created.
>> summarize(subj)
Subject 'subject1' in 'ebc' experiment
Patterns - [ nVox x nTRs]
1) epi - [37828 x 1726]
Regressors - [nCond x nTRs]
1) baseregs - [ 13 x 1726]
Selectors - [nCond x nTRs]
1) movies - [ 1 x 1726]
2) movies_noblank - [ 1 x 1726]
Masks - [ X x Y x Z ] [ nVox]
1) wholebrain - [ 64 x 64 x 34] [37828]
Lastly, we clean up the unneeded raw data we loaded just before:
>> clear epi baseregs movies movies_noblank condnames wholebrain;
Although many times removing linear trends from the data can be done
in other programs (such as AFNI), our data has not yet been processed
in this way. Thus the first step is to run Matlab's
detrend function on the data. However, it's important that we
detrend each run separately; while doing such a simple operation might
be relatively easy (maybe 5-10 lines of Matlab code) to code right on
the spot, MVPA makes it easy for us to apply any arbitrary
function to specific runs of a pattern. We use the
apply_to_runs function to do this using the following command.
>> subj = apply_to_runs(subj, 'epi', 'movies', 'apply_detrend');
Beginning apply_to_runs, function 'apply_detrend': # runs = 2
1 2
Pattern epi_detrend created by apply_to_runs
apply_to_runs takes in four mandatory arguments: the
subj structure, the pattern object to be processed, the
selector object designating runs, and then the name of the function
to be called on each run. Then apply_to_runs creates a
new pattern for us that contains the individually processed runs
concatenated back together.
The only requirement for a function to be applied to runs is that
it must take as input the pattern associated with a single run and
return some processed form of that run. More arguments can be passed
in optionally (again, more on this later.) We have already
implemented for you three useful functions:
apply_detrend, apply_zscore, and
apply_filt, which detrend, z-score, and filter the data
respectively. We're planning on adding denoising techniques (such as
shrinkage on discrete wavelet transform coefficients) in the future,
but feel free to make your own apply_ functions for any
preprocessing you might wish to perform on the data. (We'd like to see
them, too!)
Finally, now that we have our new pattern 'epi_detrend',
we won't need the original pattern epi anymore. To
conserve RAM (each full 38000x1726 pattern takes up about 500MB of
memory), we tell MVPA to move the pattern epi from RAM to
the harddrive, where it is automatically saved to a folder and
managed. This is accomplished via the move_pattern_to_hd
function.
>> subj = move_pattern_to_hd(subj, 'epi');
>> summarize(subj);
Subject 'subject1' in 'ebc' experiment
Patterns - [ nVox x nTRs]
1) epi - [37828 x 1726] [HD]
2) epi_detrend - [37828 x 1726]
Regressors - [nCond x nTRs]
1) baseregs - [ 13 x 1726]
Selectors - [nCond x nTRs]
1) movies - [ 1 x 1726]
2) movies_noblank - [ 1 x 1726]
Masks - [ X x Y x Z ] [ nVox]
1) wholebrain - [ 64 x 64 x 34] [37828]
Summarizing, we see that pattern epi now has the
notation [HD] next to it: this indicates that the pattern
is currently stored on the hard drive.
Now that we have de-trended, Z-Scoring the data (divide by standard
deviation, subtract the mean) is an essential step of the
analysis. This is again an operation we want performed on an
individual run basis; we again will use the
apply_to_runs, but this time we specify
apply_zscore instead. Additionally, we want to specify a
name for the new pattern this time: instead of creating a new pattern
called 'epi_detrend_zscore', we will have
apply_to_runs create a new pattern simply named
epi_z for simplicity.
To pass in optional parameters to any MVPA function, we typically use property-value pairs: first a string naming the optional variable we are providing, and then the value that we wish to assign to that variable. We demonstrate this in the next code sample.
>> subj = apply_to_runs(subj, 'epi_detrend', 'movies', 'apply_zscore', 'new_patname', 'epi_z');
Beginning apply_to_runs, function 'apply_zscore': # runs = 2
1 2
Pattern epi_z created by apply_to_runs
Once again, we now have another pattern residing in memory that we're not going to use. We'll move it back to the hard drive, like we did before:
>> subj = move_pattern_to_hd(subj, 'epi_detrend');
Another crucial pre-processing step is to remove uninformative voxels from the analysis. This involves choosing and calculating a test statistic for our pattern, and then either removing all voxels beneath a certain threshold (say, p<0.05 from an ANOVA) or taking some sorted subset of the voxels (say, the lowest 500 p-values). But before we do any feature selection, we want to ensure that we avoid 'peeking' at all times: if any data point is going to be used in the testing phase of our cross validation experiment, we cannot include it in our statistical calculations when selecting important voxels. This would artifically boost the performance of our training and could lead us to conclude that we have better generalization performance than is actually the case. Thus, even though we are not ready to perform the experiment, we need to set up the data partitioning of the cross validation experiment in order to rigorously complete our pre-processing.
Luckily, this is quite easy within the MVPA toolbox. The
create_xval_indices function will automatically
partition a single selector object intoa group of
leave-out-one cross-validation trials. Furthermore, if we specifiy
an actives selector, a binary selector indicating blank or
rest TRs, create_xval_indices will also remove these
rest TRs from our experimental trials too. Thus setting up the
cross validation selectors is reduced to a single command.
>> subj = create_xvalid_indices(subj, 'movies', 'actives_selname','movies_noblank');
Selector group 'movies_xval' created by create_xvalid_indices
>> summarize(subj)
Subject 'subject1' in 'ebc' experiment
Patterns - [ nVox x nTRs]
1) epi - [37828 x 1726] [HD]
2) epi_detrend - [37828 x 1726] [HD]
3) epi_z - [37828 x 1726]
Regressors - [nCond x nTRs]
1) baseregs - [ 13 x 1726]
Selectors - [nCond x nTRs]
1) movies - [ 1 x 1726]
2) movies_noblank - [ 1 x 1726]
3) movies_xval_1 - [GRP size 2] [ 1 x 1726]
4) movies_xval_2 - [GRP size 2] [ 1 x 1726]
Masks - [ X x Y x Z ] [ nVox]
1) wholebrain - [ 64 x 64 x 34] [37828]
We now have two new selector objects corresponding to the
two iterations of our cross-validation experiment:
movies_xval_1 and movies_xval_2. These are
both part of a group named movies_xval.
Now we are ready to extract the important voxels from the data. We
will select a subset of the most informative voxels from the 38,000
in 'epi_z'. In the MVPA toolbox, this is implemented as the
feature_select function: feature_select
takes in pattern, regressors, and
selector objects and optionally creates a new
mask selecting voxels that match some desirability
metric: since MVPA was originally designed to facilitate
classification experiments (predicting discrete conditions), the
default is to select voxels that vary significantly across
conditions using an ANOVA with a threshold of p<0.05.
However, we are performing regression, not classification, and our
regressors are continuous variables and not discrete labels; we
cannot perform an ANOVA, so we instead use the cross-correlation
coefficient between each voxel and each feature regressor. These
values will be stored in a type of pattern object called a
statmap: a single value for each voxel that corresponds
to some statistic that is useful in the future. To perform feature
selection, we will calculate the cross-correlation coeffficients for
each voxel and each regressor variable, store these in a statmap
pattern, and then finally create a new mask object to select only
the highest N voxels from that statmap, where N is a
parameter we take from the default_params structure. We
will use these masks when we run our learning and prediction
algorithms. (Note: MVPA allows you to easily create and plug in your
own statmap functions for use in
feature_select.)
Because this is an introductory tutorial, we will limit our analysis
to a single regressor of interest. However, so that we can consider each
regressor separately, we use MVPA's separate_regressors
function to break a regressors matrix into a group of
individually named row vectors.
>> subj = separate_regressors(subj, 'baseregs');
Regressors group 'baseregs_grp' created by separate_regressors.m
>> summarize(subj)
Subject 'subject1' in 'ebc' experiment
Patterns - [ nVox x nTRs]
1) epi - [37828 x 1726] [HD]
2) epi_detrend - [37828 x 1726] [HD]
3) epi_z - [37828 x 1726]
Regressors - [nCond x nTRs]
1) baseregs - [ 13 x 1726]
2) Amusement - [GRP size 13] [ 1 x 1726]
3) Attention - [GRP size 13] [ 1 x 1726]
4) Arousal - [GRP size 13] [ 1 x 1726]
5) Body Parts - [GRP size 13] [ 1 x 1726]
6) Environmental Sounds - [GRP size 13] [ 1 x 1726]
7) Faces - [GRP size 13] [ 1 x 1726]
8) Food - [GRP size 13] [ 1 x 1726]
9) Language - [GRP size 13] [ 1 x 1726]
10) Laughter - [GRP size 13] [ 1 x 1726]
11) Motion - [GRP size 13] [ 1 x 1726]
12) Music - [GRP size 13] [ 1 x 1726]
13) Sadness - [GRP size 13] [ 1 x 1726]
14) Tools - [GRP size 13] [ 1 x 1726]
Selectors - [nCond x nTRs]
1) movies - [ 1 x 1726]
2) movies_noblank - [ 1 x 1726]
3) movies_xval_1 - [GRP size 2] [ 1 x 1726]
4) movies_xval_2 - [GRP size 2] [ 1 x 1726]
Masks - [ X x Y x Z ] [ nVox]
1) wholebrain - [ 64 x 64 x 34] [37828]
We now have thirteen new regressors objects corresponding to the
individual rows of our baseregs matrix. We only need one
of them, but it's okay to leave the rest in, because none of these
objects takes up close to the amount of RAM that our single
epi_z pattern does. Furthermore, if you find that the output from
summarize is getting too crowded, you can always turn off
the display of individual group members:
>> summarize(subj, 'display_groups', false);
Subject 'subject1' in 'ebc' experiment
Patterns - [ nVox x nTRs]
1) epi - [37828 x 1726] [HD]
2) epi_detrend - [37828 x 1726] [HD]
3) epi_z - [37828 x 1726]
Regressors - [nCond x nTRs]
1) baseregs - [ 13 x 1726]
2-14) baseregs_grp * [GRP size 13] [ 1 x 1726]
Selectors - [nCond x nTRs]
1) movies - [ 1 x 1726]
2) movies_noblank - [ 1 x 1726]
3- 4) movies_xval * [GRP size 2] [ 1 x 1726]
Masks - [ X x Y x Z ] [ nVox]
1) wholebrain - [ 64 x 64 x 34] [37828]
Now it's time to run feature_select. In this example,
we're choosing the 'Amusement' regressor, but you can use whichever you
want in your own experiments. Because feature_select was
originally designed for classification, we need to pass in a bunch of
optional arguments to specify that we want to use our own
statmap function and to avoid automatically creating a
threshold mask. Thus, the optional arguments
statmap_funct and statmap_arg become
statmap_xcorr (our cross-correlation statistic, written
in statmap_xcorr.m) and
[] (it needs no extra arguments.) Furthermore, we set
the thresh argument to [](empty) to prevent
feature_select from automatically creating threshold
masks for us.
>> subj = feature_select(subj, 'epi_z', 'Amusement', 'movies_xval',
'statmap_funct', 'statmap_xcorr',
'statmap_arg', [],
'new_map_patname', 'stat_Amusement',
'thresh', []);
Starting 2 statmap_xcorr iterations
1 2
Pattern statmap group 'stat_Amusement' and mask group 'epi_z_thresh' created by feature_select
>> summarize(subj)
Subject 'subject1' in 'ebc' experiment
Patterns - [ nVox x nTRs]
1) epi - [37828 x 1726] [HD]
2) epi_detrend - [37828 x 1726] [HD]
3) epi_z - [37828 x 1726]
4) stat_Amusement_1 - [GRP size 2] [37828 x 1]
5) stat_Amusement_2 - [GRP size 2] [37828 x 1]
Regressors - [nCond x nTRs]
1) baseregs - [ 13 x 1726]
2) Amusement - [GRP size 13] [ 1 x 1726]
3) Attention - [GRP size 13] [ 1 x 1726]
4) Amusement - [GRP size 13] [ 1 x 1726]
5) Body Parts - [GRP size 13] [ 1 x 1726]
6) Environmental Sounds - [GRP size 13] [ 1 x 1726]
7) Faces - [GRP size 13] [ 1 x 1726]
8) Food - [GRP size 13] [ 1 x 1726]
9) Language - [GRP size 13] [ 1 x 1726]
10) Laughter - [GRP size 13] [ 1 x 1726]
11) Motion - [GRP size 13] [ 1 x 1726]
12) Music - [GRP size 13] [ 1 x 1726]
13) Sadness - [GRP size 13] [ 1 x 1726]
14) Tools - [GRP size 13] [ 1 x 1726]
Selectors - [nCond x nTRs]
1) movies - [ 1 x 1726]
2) movies_noblank - [ 1 x 1726]
3) movies_xval_1 - [GRP size 2] [ 1 x 1726]
4) movies_xval_2 - [GRP size 2] [ 1 x 1726]
Masks - [ X x Y x Z ] [ nVox]
1) wholebrain - [ 64 x 64 x 34] [37828]
Ignore the 'epi_z_thresh' message output; there was no mask created,
and that message is a minor bug that will be fixed in the next release. We
now have have a group of two statmap patterns for our
'Amusement' regressor, one for each iteration of the cross validation
experiment (remember, we need to avoid 'peeking.') We next create a
group of mask objects named 'Amusement' and containig the
N ``best'' voxels using the created_sorted_mask
function. In the case of ''Amusement'', N equals 400.
>> subj = create_sorted_mask(subj, 'stat_Amusement', 'Amusement', 400);
>> summarize(subj, 'display_groups', false)
Subject 'subject1' in 'ebc' experiment
Patterns - [ nVox x nTRs]
1) epi - [37828 x 1726] [HD]
2) epi_detrend - [37828 x 1726] [HD]
3) epi_z - [37828 x 1726]
4- 5) stat_Amusement * [GRP size 2] [37828 x 1]
Regressors - [nCond x nTRs]
1) baseregs - [ 13 x 1726]
2-14) baseregs_grp * [GRP size 13] [ 1 x 1726]
Selectors - [nCond x nTRs]
1) movies - [ 1 x 1726]
2) movies_noblank - [ 1 x 1726]
3- 4) movies_xval * [GRP size 2] [ 1 x 1726]
Masks - [ X x Y x Z ] [ nVox]
1) wholebrain - [ 64 x 64 x 34] [37828]
2- 3) Amusement * [GRP size 2] [ 64 x 64 x 34] [ 350]
Now we have two new mask objects, again, one for each
iteration of the cross validation experiment. At this point, we will
not perform any more pre-processing of the data. In future versions of
this tutorial (soon to be released), we will perform the averaging
optimizations used by Denis Chirigev and Greg Stephens in their EBC submission.
There are two major optimizations we can perform at this point. The first is to spatially average values of the intra-cranial voxels. Note that we are performing this spatial smoothing after already selecting for the most important voxels based on non-smoothed data. This is because it seems to provide the best performance empirically, and this is the way that Denis & Greg implemented their smoothing when they were under time pressure. Once you've finished the tutorials, you should be able to try experimenting with various smoothing configurations to see if you can improve performance even more.
To perform the spatial filtering, we use the function
create_spatial_avg_pat, which takes in a pattern name and
a mask name and spatially smooths the voxels that survive the mask
(thus eliminating the need to smooth extra-cranial voxels or the need
to pad borders with zeros.) This is very computational expensive to
write within Matlab, and is currently the longest operation of this
toolbox. Depending on the speed of your processor, this should take
somewhere between 10 and 45 minutes to complete. Note that once
again, once the operation is complete, we will move the old pattern
out of memory to conserve RAM.
>> subj = create_spatial_avg_pat(subj, 'epi_z', 'wholebrain');
Starting create_spatial_avg_pat on 'epi_z', 37828 voxels
progress: 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90
Pattern 'epi_z_savg' created by create_spatial_avg_pat
>> subj = move_pattern_to_hd(subj, 'epi_z');
Another important optimization is to temporally average individual
voxels. This generally boosts generalization performance by at least
0.1 in the correlation coefficient. At this point, we're going to
temporally filter the spatially smoothed pattern generated by the
previous step, 'epi_z_savg'. However, this time, we can
save on memory and computation time by only smoothing those voxels
that we know we will be using in the regression at the end of this
experiment; we will use the two masks in the mask group
Amusement (generated from the statmap
stat_Amusement) and filter only voxels that are allowed
through the mask. We also want to filter each run separately as well.
Once again we can use apply_to_runs: as optional
parameters, apply_to_runs will take in a mask name or the
name of a mask group, and for each mask in th group, it will apply the
processing to individual runs of a masked pattern. Thus, running
apply_to_runs with the mask group Amusement
will produce two new filtered patterns. Finally, we need to create
the filter based on the default parameters we loaded before. We see
that the default time_average_window parameter is 3.
The first step to begin filtering is to make the averaging box filter the data will be filtered by. This is just a row vector of the desired size that sums to one.
>> filt = ones(1, 3); >> filt = filt./sum(filt);
Now we pass this filter into apply_filt through
apply_to_runs as an optional argument,
'filt'. All the optional arguments to
apply_filt are passed to the interior 'apply' function as well.
Furthermore, we specify that this pattern group should be named
''epi_z_savg_tavg_Amusement'', so that it is clear how the
pattern was created: spatial and temporal averaging, with the
Amusement mask.
>> subj = apply_to_runs(subj, 'epi_z_savg', 'movies', 'apply_filt',
'maskname', 'Amusement', 'new_patname',
'epi_z_savg_tavg_Amusement',
'filt', filt);
Beginning apply_to_runs, function 'apply_filt': # runs = 2
1 2
Pattern group epi_z_savg_tavg_Amusement created by apply_to_runs
If we now summarize, we see the new
epi_z_savg_tavg_Amusement group of patterns, just as we wanted. We're
now finished all pre-processing of the data, and can continue on to
the cross validation stage of the experiment.
>> summarize(subj, 'display_groups', false)
Subject 'subject1' in 'ebc' experiment
Patterns - [ nVox x nTRs]
1) epi - [37828 x 1726] [HD]
2) epi_detrend - [37828 x 1726] [HD]
3) epi_z - [37828 x 1726] [HD]
4- 5) stat_Amusement * [GRP size 2] [37828 x 1]
6) epi_z_savg - [37828 x 1726]
7- 8) epi_z_savg_tavg_Amusement * [GRP size 2] [ 400 x 1726]
Regressors - [nCond x nTRs]
1) baseregs - [ 13 x 1726]
2-14) baseregs_grp * [GRP size 13] [ 1 x 1726]
Selectors - [nCond x nTRs]
1) movies - [ 1 x 1726]
2) movies_noblank - [ 1 x 1726]
3- 4) movies_xval * [GRP size 2] [ 1 x 1726]
Masks - [ X x Y x Z ] [ nVox]
1) wholebrain - [ 64 x 64 x 34] [37828]
2- 3) Amusement * [GRP size 2] [ 64 x 64 x 34] [ 350]
At this point performing the actual cross validation experiment and
getting our predictions is very easy. The
cross_validation function will perform an entire cross
validation experiment for a given pattern or group of
pattern, given a set of regressors,
selector objects, and mask objects. By
default, the MVPA toolbox is configured for classification
experiments, which calculate performance by tallying 'errors' made
when the prediction of the algorithm doesn't match the label in the
regressors object. Thus, we will need to use our own
performance metric function if we want to calculate the correlation
coefficient of the predictions with the real regressor values.
This custom metric is already written in
perfmet_xcorr.m (Note: MVPA makes it easy to write your
own performance metrics and plug them into
cross_validation; see the manual for details.)
In this tutorial, we'll be using the ridge regression algorithm as
implemented by Greg Stephens. This is set using the
class_args structure, which is required by
cross_validation and which is passed to the specified
training and testing algorithms. As a penalty parameter, we choose
the value taken from default_params, which in this case
is 0.05, multiplied by the number of voxels included int
the analysis (400). A good rule of thumb for ridge
regression is that the penalty parameter should increase linearly as
the number of voxels included increases. We will also substitute
some variables for values that one might normally hard-code, so that
the syntax of cross_validation will be more clear to
the reader.
>> class_args.train_funct_name = 'train_ridge'; >> class_args.test_funct_name = 'test_ridge'; >> class_args.penalty = 0.05 * 400; >> regsname = 'Amusement'; >> maskname = 'Amusement';
Finally, like feature_select, we want to override many
of the default classification-based settings of
cross_validation: we specify
perfmet_functs to be perfmet_xcorr and
perfmet_args to be {[]} (we're using cross
correlation, which takes no arguments.) All iterations of the
experiment will be run automatically, and all relevant data to the
experiment will be recorded in a results structure that
is returned by cross_validation. (We're not going to
go into the details results structure here, but feel free to explore
it on your own.)
>> [subj results] = cross_validation(subj, 'epi_z_savg_tavg_Amusement', regsname, 'movies_xval', maskname, class_args,
'perfmet_functs', 'perfmet_xcorr',
'perfmet_args', {[]});
Starting 2 cross-validation classification iterations - train_ridge
1 0.11
2 0.36
060623_1457: Cross-validation using train_ridge and test_ridge - got total_perfs - 0.23561
>> results
results =
header: [1x1 struct]
iterations: [1x2 struct]
total_perf: 0.2356
Although the cumulative score across both iterations is only 0.2356, and the score on the first iteration is still dismal, the score of .36 on the second iteration (generalizing from movie 1 onto movie 2) is indicative of the advantages to this approach. In the advanted tutorial, several more steps towards optimization are taken: leaving blanks in the training runs, using slightly different parameters, and averaging the predictions of all three subjects together. When these optimizations are all performed, the final EBC (transformed) score is .476, much higher than one might anticipate.
This concludes the introductory tutorial. If you're confused and
need more detailed explanations about any particular part of the
toolbox, please see the appropriate section in the manual. If you want to continue the
analysis and get results for each regression and every subject,
please continue to the advanced tutorial by loading and examining
ebc_tutorial_adv.m Right now, there is no HTML writeup
for the advanced tutorial, but the code is quite similar (just with
more loops and Matlab tricks) and there should be comments to help
you along the way.
Thank you for downloading the MVPA toolbox, and we hope that you will find it useful.