Applying (with as few loops as possible) a function to given elements/voxels (x,y,z) taken from subfields of multiple structs (nifti's) in MATLAB? - matlab

I have a dataset of n nifti (.nii) images. Ideally, I'd like to be able to get the value of the same voxel/element from each image, and apply a function to the n data points. I'd like to do this for each voxel/element across the whole image, so that I can reconvert the result back into .nii format.
I've used the Tools for NIfTI and ANALYZE image toolbox to load my images:
data(1)=load_nii('C:\file1.nii');
data(2)=load_nii('C:\file2.nii');
...
data(n)=load_nii('C:\filen.nii');
From which I obtain a struct object with each sub-field containing one loaded nifti. Each of these has a subfield 'img' corresponding to the image data I want to work on. The problem comes from trying to select a given xyz within each img field of data(1) to data(n). As I discovered, it isn't possible to select in this way:
data(:).img(x,y,z)
or
data(1:n).img(x,y,z)
because matlab doesn't support it. The contents of the first brackets have to be scalar for the call to work. The solution from googling around seems to be a loop that creates a temporary variable:
for z = 1:nz
for x = 1:nx
for y = 1:ny
for i=1:n;
points(i)=data(i).img(x,y,z);
end
[p1(x,y,z,:),~,p2(x,y,z)] = fit_data(a,points,b);
end
end
end
which works, but takes too long (several days) for a single set of images given the size of nx, ny, nz (several hundred each).
I've been looking for a solution to speed up the code, which I believe depends on removing those loops by vectorisation, preselecting the img fields (via getfield?) and concatenating them, and applying something like arrayfun/cellfun/structfun, but I'm frankly a bit lost on how to do it. I can only think of ways to pre-select which themselves require loops, which seems to defeat the purpose of the exercise (though a solution with fewer loops, or fewer nested loops at least, might do it), or which run into the same problem that calls like data(:).img(x,y,z) don't work. Googling around again turns up ways to select and concatenate fields within a struct, or a given field across multiple structs. But I can't find anything for my problem: select an element from a non-scalar sub-field in a sub-struct of a struct object (with the minimum of loops). Finally, I need the output to be in the form of a matrix that the toolbox above can turn back into a nifti.
Any and all suggestions, clues, hints and help greatly appreciated!

You can concatenate images as a 4D array and use linear indexes to speed up calculations:
img = cat(4,data.img);
p1 = zeros(nx,ny,nz,n);
p2 = zeros(nx,ny,nz);
sz = ny*nx*nz;
for k = 1 : sz
points = img(k:sz:end);
[p1(k:sz:end),~,p2(k)] = fit_data(a,points,b);
end
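To see why the strided linear index picks out the same voxel from every image, here is a small sketch on synthetic data. The sizes, the random images and the names viaStride, viaLoop and V are assumptions for illustration only; fit_data is not called. It also shows a reshape to a voxels-by-images matrix, which may be convenient if the fitting function can work on whole rows.

```matlab
% Synthetic stand-in for the loaded NIfTI structs (sizes are arbitrary).
nx = 3; ny = 4; nz = 2; n = 5;
for i = n:-1:1
    data(i).img = rand(nx, ny, nz);
end

img = cat(4, data.img);            % nx-by-ny-by-nz-by-n array
sz  = nx*ny*nz;                    % number of voxels in one image

% Collect the n values of one voxel in two ways.
x = 2; y = 3; z = 1;
k = sub2ind([nx ny nz], x, y, z);  % linear index of (x,y,z) within one image
viaStride = img(k:sz:end);         % 1-by-n, one value per image
viaLoop = zeros(1, n);
for i = 1:n
    viaLoop(i) = data(i).img(x, y, z);
end

% Alternative layout: one row per voxel, one column per image.
V = reshape(img, sz, n);
```

With this layout, V(k,:) holds the same n values as img(k:sz:end), and a result vector of length sz can be turned back into an image with reshape(result, nx, ny, nz).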

Related

How to define multiple objects under a class in a for loop in matlab? [duplicate]

I have 200 time points. For each time point, there is an image, the size of which is 40*40 double, corresponds to this time point. For example, image 1 corresponds to time point 1; image k corresponds to time point k (k = 1,2,...,200).
The time points are T = 1:200, with the images named as Image_T, thus Image_1, Image_2 etc.
I want to put all these 200 images together. The final size is 40*40*200 double. The final image looks like fMRI image (fmri_szX = 40, fmri_szY = 40 and fmri_szT = 200). How to achieve that?
Thanks!
Dynamic variables
Note that whilst this is possible, it's considered to be bad programming (see for instance here, or this blog by Loren and even the Mathworks in their documentation tell you not to do this). It would be much better to load your images directly into either a 3D array or a cell structure, avoiding dynamic variable names. I just posted this for completeness; if you ever happen to have to use this solution, you should change to a (cell-) array immediately.
The gist of the linked articles as to why eval is such a bad idea is that MATLAB can no longer predict what the outcome of the operation will be. For instance, A=3*(2:4) is recognised by MATLAB to output a double array. If you eval stuff, MATLAB can no longer do this. MATLAB is an interpreted language, i.e. each line of code is read and then run, without compiling the entire code beforehand. This means that each time MATLAB encounters eval, it has to stop, evaluate the expression, check the output, store it, and continue. Most of the speed engines employed by MATLAB (JIT/MAGMA etc.) can't work without predicting the outcome of statements, and will therefore shut down during the eval evaluation, rendering your code very slow.
Also there's a security aspect to the usage of eval. Consider the following:
var1 = 1;
var2 = 2;
var3 = 3;
varnames = {'var1', 'var2; disp(''GOTCHA''); %', 'var3'};
accumvar = [];
for k = 1:numel(varnames)
vname = varnames{k};
disp(['Reading from variable named ' vname]);
eval(['accumvar(end+1) = ' vname ';']);
end
Now accumvar will contain the values of the desired variables, but the disp call smuggled in through the second name has been executed as well. Instead of a harmless disp, the injected string could just as well be something like system('rm -rf ~/*'), which would delete the contents of your home directory without even telling you it's doing so.
Loop approach
for ii = 200:-1:1
str = sprintf('Image_%d',ii);
A(:, :, :, ii) = eval(str);
end
This creates your matrix. Note that I let the for loop run backwards, so as to initialise A in its largest size.
Semi-vectorised approach
str = strsplit(sprintf('Image_%d ',1:200),' '); % Create all your names (variable names are case-sensitive)
str(end) = []; % Delete the last entry (empty)
%Problem: eval cannot handle cells, loop anyway:
for ii = 200:-1:1
A(:, :, :, ii) = eval(str{ii});
end
eval does not support arrays, so you cannot directly plug the cell array str in.
Dynamic file names
Despite the similar heading, this section is about structured file names on disk (as seen in the file browser), not variable names in MATLAB. I'm assuming .jpg files here, but you can use any supported image extension. Also, be sure to have all images in a single folder with no other images of that extension, or you will have to modify the dir() call to include only the desired images.
filenames = dir('*.jpg');
for ii = length(filenames):-1:1
A(:,:,:,ii) = imread(filenames(ii).name); % dir returns a struct array, so index with () and use the name field
end
Images are usually read as m*n*3 files, where m*n is your image size in pixels and the 3 stems from the fact that they're read as RGB by imread. Therefore A is now a 4D matrix, structured as m*n*3*T, where the last index corresponds to the time of the image, and the first three are your image in RGB format.
Since you do not specify how you obtain your 40*40 double, I have left the 4D matrix. Could be you read them and then switch to using a uint16 interpretation of RGB, which is a single number, which would result in a m*n*1*T variable, which you can reduce to a 3D variable by calling A = squeeze(A(:,:,1,:));
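If the images can be produced or loaded in a loop in the first place, the dynamic variable names are not needed at all. A minimal sketch, assuming each image is a 40-by-40 double; the rand call is only a placeholder for however image k is actually obtained, and the names images and fmri are made up for this example:

```matlab
T = 200;                       % number of time points
images = cell(1, T);           % one cell per time point
for k = 1:T
    images{k} = rand(40, 40);  % placeholder: replace with the real source of image k
end

% Stack along the third dimension: 40-by-40-by-200.
fmri = cat(3, images{:});
```

fmri(:,:,k) is then the image at time point k, and no eval is involved.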

Read large number of .h5 datasets

I'm working with these h5 files that have tens of thousands of datasets that contains vectors of numerical values and all of the same size. My goal is to read the datasets and create one large matrix from these vectors. The datasets are named from "0" to "xxxxx" (some large number) I was able to read them and get the matrix but it takes forever to do so. I was wondering if you can take a look at my code and suggest a way to make it run faster
here is how I do it right now
t =[];
for i = 0:40400 % there are 40401 datasets in this particular file
j = int2str(i);
p = '/mesh/'; % The parent group
s = strcat(p,j); % to create the full path of a dataset e.g. '/mesh/0'
r = h5read('temp.h5',s); % the file name is temp and s has the dataset path
t = [t;r];
end
In this particular case there are 40401 datasets, each holding an 80802x1 vector of numerical values, so eventually I want to create an 80802x40401 matrix. This code takes over a day to finish. I think one of the reasons it is slow is that MATLAB accesses the h5 file in every iteration. I would appreciate any tips for speeding up the code.
When I copied your code into the editor, I got a red underline beneath the t, with the warning:
The variable t appears to change size on every loop iteration. Consider preallocating for speed.
You should allocate the final memory of t before starting the loop, with the function zeros:
t = zeros(80802,40401);
You should also read this: Programming Patterns: Maximizing Code Performance by Optimizing Memory Access:
Preallocate arrays before accessing them within loops
Store and access data in columns
Avoid creating unnecessary variables
Maybe p = '/mesh/'; is useless inside the loop and can be done outside the loop, since it doesn't change. It could be even better to not have p and directly do s = strcat('/mesh/',j);
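Putting these points together, the loop could look roughly like the sketch below. To keep it self-contained it first writes a tiny demo file, so the file name, the number of datasets (5) and the vector length (4) are made up; with the real file, only the second loop is needed, using the real sizes.

```matlab
% Build a small demo file (assumption: stands in for the real temp.h5).
fname = [tempname '.h5'];
nSets = 5; len = 4;
for i = 0:nSets-1
    ds = sprintf('/mesh/%d', i);          % dataset path, e.g. '/mesh/0'
    h5create(fname, ds, [len 1]);
    h5write(fname, ds, (1:len)' + 10*i);
end

% Preallocate once, then fill one column per dataset.
t = zeros(len, nSets);
for i = 0:nSets-1
    t(:, i+1) = h5read(fname, sprintf('/mesh/%d', i));
end

delete(fname);                            % remove the demo file
```

Writing each vector into its own column also gives the len-by-nSets layout directly, whereas t = [t;r] stacks the vectors into one long column.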

Create a loop using part of variable name in MATLAB

I am a beginner in Matlab and have not been able to find an answer to my question so far. Your help will definitely be very much appreciated.
I have 70 matrices (100x100), named SUBJ_1, SUBJ_2 etc. I would like to create a loop so that I would calculate some metrics (i.e. max and min values) for each matrix, and save the output in a 70x2 result matrix (where each row would correspond to the consecutively named SUBJ_ matrix).
I am struggling with both stages - how to use the names of individual variables in a 'for' loop and how to properly save individual outputs in a combined array.
Many thanks and all the best!
Don't use such variable names, create a big cell array named SUBJ and put each Matrix in it.
r=zeros(numel(SUBJ),2);
for idx=1:numel(SUBJ)
r(idx,1)=min(min(SUBJ{idx}));
r(idx,2)=max(max(SUBJ{idx}));
end
min and max are each called twice because, for a matrix, the first call returns the minimum (or maximum) of every column as a row vector, and the second call reduces that row vector to a single value.
Even though this is in principle possible in Matlab, I would not recommend it: too slow and cumbersome to implement.
You could instead use a 3-D matrix (100x100x70) SUBJ which would contain all the SUBJ_1 etc. in one matrix. This would allow you to calculate min/max etc. with just one line of code. Matlab will take care of the loops internally:
OUTPUT(:,1) = squeeze(min(min(SUBJ,[],1),[],2)); % squeeze turns the 1x1x70 result into a 70x1 column
OUTPUT(:,2) = squeeze(max(max(SUBJ,[],1),[],2));
Like this, OUTPUT(1,1) contains min(min(SUBJ(:,:,1))) and so on...
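If the matrices already sit in a cell array, as suggested in the other answer, they can be stacked into the 3-D matrix in one call. A small sketch with random data; the cell array C and the intermediate matrix M are names made up for this example. It reshapes each subject into one column, which is just another way of writing the nested min/max calls:

```matlab
% Hypothetical cell array standing in for SUBJ_1 ... SUBJ_70 (random data).
C = cell(1, 70);
for k = 1:70
    C{k} = rand(100, 100);
end

SUBJ = cat(3, C{:});                    % 100-by-100-by-70
M = reshape(SUBJ, [], size(SUBJ, 3));   % 10000-by-70, one column per subject

% 70-by-2 result: column 1 holds the minima, column 2 the maxima.
OUTPUT = [min(M, [], 1).', max(M, [], 1).'];
```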
As to how to use the names of individual variables in a 'for' loop, here is an example:
SUBJ = [];
for idx = 1:70
term = eval(['SUBJ_',num2str(idx)]);
SUBJ = [SUBJ; max(max(term)),min(min(term))];
end

Import multiple tab delimited files into matlab from different subdirectories

Sorry I am new to matlab.
What I have: A folder containing about 80 subfolders, labeled Day01, Day02, Day03, etc. Each subfolder has a file called "sample_ids.txt" It is a n x m matrix in a tab delimited format.
What I need: 1 data structure that is an array of matrices, where each matrix is the data from "sample_ids.txt" and it should be in the alphabetical order of Day01, Day02, Day03, etc.
I have no idea how to get from point A to point B. Any guidance would be greatly appreciated.
You can decompose this problem into two parts: finding the files, and reading them into memory.
Finding the files is pretty easy, and has already been covered on StackOverflow.
For loading them into memory, you want a multidimensional array, which is as simple as creating a regular array and then using more index dimensions: A = ones(2); A(:,:,2) = ones(2); will, for example, give you a 3-dimensional array of size 2-by-2-by-2, with ones all over.
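What you want is probably something like this:
A = []; % No preallocation. Fix for speed-up.
files = dir('./Day*/sample_ids.txt'); % a wildcard in the folder name may need a recent MATLAB release
for file = files' % dir returns a column struct array; transpose it to loop over single entries
temp = load(fullfile(file.folder, file.name)); % file.name alone lacks the DayXX folder
A = cat(3, A, temp); % append as the next layer
end
disp(A) % display the contents of A afterwards...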
I haven't tested this code extensively, but it should work OK.
A few important points:
All files must contain matrices of the exact same dimensions - MATLAB can't handle arrays that have different dimensions in different layers (at least not with regular arrays - you could use cell arrays, but that quickly becomes more complicated...). Think of it as trying to build a matrix from vectors of different lengths.
If you have a lot of data, and you know how much, you can save a lot of time by pre-allocating A. This is as easy as A = zeros(k,l,m) for m datafiles with k rows and l columns in each. If you do this, you'll also need the index of the current file, so that each matrix can be written into its own layer of A inside the loop instead of being appended. I leave this as an internet research exercise :)
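For what it's worth, a preallocated version might look like the sketch below. It builds a throwaway folder tree first so that it can be run as-is; the root folder, the three DayXX folders and the 4-by-4 test matrices are all made up, and with real data only the second half is needed. It lists the Day folders rather than the files, which avoids relying on wildcards inside folder names.

```matlab
% Create a throwaway folder tree with three DayXX folders (demo data only).
root = tempname;
mkdir(root);
for d = 1:3
    folder = fullfile(root, sprintf('Day%02d', d));
    mkdir(folder);
    M = magic(4) + d;
    save(fullfile(folder, 'sample_ids.txt'), 'M', '-ascii', '-tabs');
end

% List the Day folders and sort them by name.
days = dir(fullfile(root, 'Day*'));
days = days([days.isdir]);
names = sort({days.name});

% Read the first file to learn the matrix size, then preallocate.
first = load(fullfile(root, names{1}, 'sample_ids.txt'));
A = zeros([size(first), numel(names)]);
A(:, :, 1) = first;
for k = 2:numel(names)
    A(:, :, k) = load(fullfile(root, names{k}, 'sample_ids.txt'));
end

rmdir(root, 's');   % clean up the demo folders
```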

Matlab: Query complicated structures

I am using structures in Matlab to organize my results in an intuitive way. My analysis is quite complex and hierarchical, so this works well---logically. For example:
resultObj.multivariate.individual.distributed.raw.alpha10(1).classification(1). Each level of the structure has several fields. Each alpha field is a structured array, indexed for each dataset, and classification is also a structured array, one for each cross validation run on the data.
To simplify, consider the classification field:
>> classification
ans =
1x8 struct array with fields:
bestLambda
bestBetas
scores
statObj
fitObj
In which statObj has fields (for example):
dprime: 6.5811
hit: 20
miss: 0
falseAlarms: 0
correctRejections: 30
Of course, the fields have different values for each subject and cross validation run. Given this structure, is there a good way to find the mean of dprime over cross validation runs (i.e. the elements of classification) without needing to construct a for loop to extract, store, and finally compute on?
I was hoping that reshape(struct2array(classification.statObj),5,8) would work, so I could construct a matrix with stats as rows and cross validations runs as columns, but this won't work. I put these items in their own structure specifically because the fields of classification hold elements of various types (matrices, structures, integers).
I am not opposed to restructuring my output entirely, but I'd like it to be done in such a way that the organization is fairly self-commenting, and I could say return to this structure a year from now and remember what and where everything is.
I came up with the following, although I'm not sure if it is what you are looking for:
%# create a structure hierarchy similar to yours
%# (I ignore everything before alpha10, and only create a part of it)
alpha10 = struct();
for a=1:5
alpha10(a).classification = struct();
for c=1:8
alpha10(a).classification(c).statObj = struct('dprime',rand());
end
end
%# matrix of 'dprime' for each alpha across each cross-validation run
st = [alpha10.classification];
st = [st.statObj];
dp = reshape([st.dprime], 8, 5)' %# result is 5-by-8 matrix
Next you can compute mean across the second dimension of this matrix dp
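As a sketch of that last step, the snippet below rebuilds a small hierarchy of the same shape, with made-up dprime values (10*a + c) in place of rand() so the result is easy to check by hand, and then averages over the cross-validation runs. The name meanDp is an assumption for this example.

```matlab
% Same shape as above: 5 datasets, 8 cross-validation runs each.
alpha10 = struct();
for a = 1:5
    alpha10(a).classification = struct();
    for c = 1:8
        alpha10(a).classification(c).statObj = struct('dprime', 10*a + c);
    end
end

st = [alpha10.classification];        % 1-by-40 struct array
st = [st.statObj];                    % 1-by-40 struct array of statObj
dp = reshape([st.dprime], 8, 5)';     % 5-by-8: datasets by runs

meanDp = mean(dp, 2);                 % 5-by-1: mean dprime per dataset
```

For the first dataset the values are 11 to 18, so meanDp(1) is 14.5.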
For anyone who happens across this post, and is wrestling with something similar, it is worth asking yourself if such a nested structure-of-structures is really your best option. It may be easier to flatten the hierarchy and include descriptive fields as labels. For instance
resultObj.multivariate.individual.distributed.raw.alpha10(1).classification(1)
might instead be
resultObj(1).
AnalysisType = 'multivariate'
GroupSolution = false
SignalType = 'distributed'
Processing = 'raw'
alpha = 10
crossvalidation = 1
dprime = 6.5811
bestLambda = []
bestBetas = []
scores = []
fitObj = []
That's not valid Matlab syntax there, but it gets the point across. Rather than building a hierarchy out of nested structures, create a 1xN structure with labels and data. It is a more general solution that is easier to query and work with.
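In valid syntax, a flat structure of this kind and a query on it might look like the following sketch. Only a few of the fields are included, the dprime values are made up, and the names sel and meanDprime are assumptions for this example.

```matlab
% Build a flat 1-by-N struct array: one element per (alpha, run) pair.
k = 0;
for a = [10 20]
    for cv = 1:4
        k = k + 1;
        resultObj(k).AnalysisType    = 'multivariate';
        resultObj(k).alpha           = a;
        resultObj(k).crossvalidation = cv;
        resultObj(k).dprime          = a/10 + cv;   % made-up value
    end
end

% Query: mean dprime over all multivariate results with alpha == 10.
sel = strcmp({resultObj.AnalysisType}, 'multivariate') & [resultObj.alpha] == 10;
meanDprime = mean([resultObj(sel).dprime]);
```

The labels act as filters, so the same two lines answer most questions about the results without walking a hierarchy.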