I am using structures in Matlab to organize my results in an intuitive way. My analysis is quite complex and hierarchical, so this works well, logically speaking. For example:
resultObj.multivariate.individual.distributed.raw.alpha10(1).classification(1). Each level of the structure has several fields. Each alpha field is a struct array, indexed by dataset, and classification is also a struct array, with one element for each cross-validation run on the data.
To simplify, consider the classification field:
>> classification
ans =
1x8 struct array with fields:
bestLambda
bestBetas
scores
statObj
fitObj
In which statObj has fields (for example):
dprime: 6.5811
hit: 20
miss: 0
falseAlarms: 0
correctRejections: 30
Of course, the fields have different values for each subject and cross-validation run. Given this structure, is there a good way to find the mean of dprime over cross-validation runs (i.e. the elements of classification) without needing a for loop to extract and store the values before computing on them?
I was hoping that reshape(struct2array(classification.statObj),5,8) would let me construct a matrix with stats as rows and cross-validation runs as columns, but it doesn't work. I put these items in their own structure specifically because the fields of classification hold elements of various types (matrices, structures, integers).
I am not opposed to restructuring my output entirely, but I'd like it to be done in such a way that the organization is fairly self-commenting, and I could, say, return to this structure a year from now and remember what and where everything is.
I came up with the following, although I'm not sure if it is what you are looking for:
%# create a structure hierarchy similar to yours
%# (I ignore everything before alpha10, and only create a part of it)
alpha10 = struct();
for a = 1:5
    alpha10(a).classification = struct();
    for c = 1:8
        alpha10(a).classification(c).statObj = struct('dprime',rand());
    end
end

%# matrix of 'dprime' for each alpha across each cross-validation run
st = [alpha10.classification];
st = [st.statObj];
dp = reshape([st.dprime], 8, 5)'    %# result is 5-by-8 matrix
Next you can compute the mean across the second dimension of this matrix dp:
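For example (meanDprime is just an illustrative name):
meanDprime = mean(dp, 2);    %# 5-by-1 vector: one mean over the 8 cross-validation runs per alpha element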
For anyone who happens across this post and is wrestling with something similar, it is worth asking yourself whether such a nested structure-of-structures is really your best option. It may be easier to flatten the hierarchy and include descriptive fields as labels. For instance,
resultObj.multivariate.individual.distributed.raw.alpha10(1).classification(1)
might instead be
resultObj(1).
AnalysisType = 'multivariate'
GroupSolution = false
SignalType = 'distributed'
Processing = 'raw'
alpha = 10
crossvalidation = 1
dprime = 6.5811
bestLambda = []
bestBetas = []
scores = []
fitObj = []
That's not valid Matlab syntax there, but it gets the point across. Rather than building a hierarchy out of nested structures, create a 1xN structure with labels and data. It is a more general solution that is easier to query and work with.
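A minimal sketch of that flat layout in valid Matlab, using the example values above (the query at the end is illustrative):
resultObj(1) = struct(...
    'AnalysisType','multivariate', 'GroupSolution',false, ...
    'SignalType','distributed', 'Processing','raw', ...
    'alpha',10, 'crossvalidation',1, 'dprime',6.5811, ...
    'bestLambda',[], 'bestBetas',[], 'scores',[], 'fitObj',[]);
% mean dprime over all cross-validation runs of one analysis:
mask = strcmp({resultObj.AnalysisType},'multivariate') & [resultObj.alpha] == 10;
meanDprime = mean([resultObj(mask).dprime]);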
I have a dataset of n nifti (.nii) images. Ideally, I'd like to be able to get the value of the same voxel/element from each image, and apply a function to the n data points. I'd like to do this for each voxel/element across the whole image, so that I can reconvert the result back into .nii format.
I've used the Tools for NIfTI and ANALYZE image toolbox to load my images:
data(1)=load_nii('C:\file1.nii');
data(2)=load_nii('C:\file2.nii');
...
data(n)=load_nii('C:\filen.nii');
From which I obtain a struct array in which each element contains one loaded nifti. Each element has a field 'img' holding the image data I want to work on. The problem comes from trying to select a given xyz voxel within the img field of each of data(1) to data(n). As I discovered, it isn't possible to select in this way:
data(:).img(x,y,z)
or
data(1:n).img(x,y,z)
because Matlab doesn't support it: the index in the first parentheses has to be scalar for the chained call to work. The solution from googling around seems to be a loop that creates a temporary variable:
for z = 1:nz
    for x = 1:nx
        for y = 1:ny
            for i = 1:n
                points(i) = data(i).img(x,y,z);
            end
            [p1(x,y,z,:),~,p2(x,y,z)] = fit_data(a,points,b);
        end
    end
end
which works, but takes too long (several days) for a single set of images given the size of nx, ny, nz (several hundred each).
I've been looking for a solution to speed up the code, which I believe depends on removing those loops by vectorisation: preselecting the img fields (via getfield?), concatenating them, and applying something like arrayfun/cellfun/structfun. But I'm frankly a bit lost on how to do it. I can only think of ways to preselect which themselves require loops, which seems to defeat the purpose of the exercise (though a solution with fewer loops, or at least fewer nested loops, might do it), or which run into the same problem that calls like data(:).img(x,y,z) don't work. Googling around again throws up ways to select and concatenate fields within a struct, or a given field across multiple structs, but I can't find anything for my problem: selecting an element from a non-scalar sub-field in a sub-struct of a struct object (with the minimum of loops). Finally, I need the output to be in the form of a matrix that the toolbox above can turn back into a nifti.
Any and all suggestions, clues, hints and help greatly appreciated!
You can concatenate the images into a 4D array and use linear indices to speed up the calculations:
img = cat(4,data.img);      % nx-by-ny-by-nz-by-n array: all images stacked
p1 = zeros(nx,ny,nz,n);
p2 = zeros(nx,ny,nz);
sz = ny*nx*nz;              % number of voxels in one volume
for k = 1 : sz
    % k:sz:end picks voxel k out of each of the n volumes
    points = img(k:sz:end);
    [p1(k:sz:end),~,p2(k)] = fit_data(a,points,b);
end
I searched a lot on Google but didn't find an answer that helped me without reducing my performance.
I have two matrices, A and B, of the same size with different values. First I want to filter:
indices=find(A<5 & B>3)
A(indices)=
B(indices)=
Now I want to apply a function on the complementary indices, indices_2 = find(A>=5 | B<=3), without using the find function on the whole matrices A and B again. Plain logical operations are not enough in this case because I need the indices, not a mask of 0s and 1s.
Something like:
A(~indices)=
B(~indices)=
instead of:
indices_2=find(A>=5 | B<=3)
A(indices_2)=
B(indices_2)=
And after that I want to split these sets once again... just filtering.
I used indices_2 = setdiff(indices, size(A)), but it hurt my computation performance. Is there any other method to split the matrices into subsets without using find twice?
Hope you understand my problem and that it fits the regulations.
I don't understand why you can't just use find again, nor why you can't use logical indexing in this case, but I suppose if you are going to restrict yourself like this, then you could accomplish it using setdiff:
indices_2 = setdiff(1:numel(A), indices)
However, if you are worried about performance, you should stick to logical indexing:
indices = A<5 & B>3
A(indices)=...
B(indices)=...
A(~indices)=...
B(~indices)=...
I think you may be looking for something like this:
% Split your data in two and keep track of which element numbers you have
ranks = 1:numel(A);
indices = A<5 & B>3;   % keep this as a logical mask so ~indices works below
% Update the numbers list to contain the set of numbers you are interested in
ranks_2 = ranks(~indices);
% Operate on the remaining set and find the relevant ranks
indices_2 = A(~indices)>=5 | B(~indices)<=3;
ranks_2 = ranks_2(indices_2)
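ranks_2 now holds the original linear indices of the second subset, so you can keep splitting without another find over the full matrices, e.g.:
A(ranks_2) = ...
B(ranks_2) = ...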
I wanted to ask how to generate a data set in Matlab. I need it to test feature selection algorithms on high-dimensional data... The data set should be synthetic, multivariate, and contain INTERACTING features.
Synthetic data sets like the MONK's problems are available at http://archive.ics.uci.edu/ml/datasets/MONK%27s+Problems .... unfortunately I have no clue how to visualize/generate and modify such data according to my needs. The goal is to run an algorithm which detects interacting features.
I will be very thankful for a kind reply.
I'm not sure this is what you are looking for, but if I needed to do this, I would start by generating anonymous functions and generic variable names that I could apply randomly within a dataset.
For example, you could generate a dataset:
myData = rand(100,6);
and create a few functions which include interdependencies
interact = @(x) x.*x;      % elementwise, since x will be a block of rows
interact2 = @(x) x.*(x-1);
then create a random logical distribution
y = round(rand(100,1)); %(100 rows of random 0's or 1's)
go through the data and use the interact function only on rows where y is true
myData(y == 1,:) = interact(myData(y == 1,:));
Repeat the above with the other interaction functions you define, if you desire. It would probably be useful to do this so that you can avoid row dependencies (see below), so generating a few datasets could be in order, i.e.
myData2 = myData;
myData2(y == 1,:) = interact2(myData(y == 1,:));
A similar approach might be taken with variables (in the example set it shows some categorical variables).
myVariable = repmat('data', 100, 1);
listofvariables = genvarname(cellstr(myVariable)); % {'data','data1','data2',...}; newer releases recommend matlab.lang.makeValidName/makeUniqueStrings
y = round(rand(100,1)); % logical index for the data
randomly select a generic variable to repeat
applyvar = randi(100); % randi avoids the index 0 that round(rand(1,1)*100) could produce
selectedVariable = listofvariables(applyvar);
replace indices of the variable list with your repeated variable
listofvariables(y == 1) = selectedVariable;
put together the dataset(s) in some order of your choosing
[cellstr(num2str(myData(:,1))) listofvariables cellstr(num2str(myData(:,2))) cellstr(num2str(myData2(:,2)))]
I am writing a solution that manages data from an eye tracker. I currently hold the data in a N x 5 matrix, with the following columns:
X Position, Y Position, timestamp, Velocity, Acceleration
Each row represents a single sample from the eye tracker (which runs at 1000Hz).
At present, I access the data in the form of a matrix - e.g. if I want to access the velocity for sample #600, I use 'dataStream(600,4)'.
This is fine, but I'd prefer my code to be more readable. The '4' could be confusing; something like dataStream.velocity(600) would be ideal. I understand that this would be a simple use of a struct. However, there are situations in which I need to copy an entire sample (i.e. all columns from one row of my matrix). As I understand it, this is not easily achieved with a struct of arrays, as the arrays stored under the different fields are not intrinsically linked. I would have to (I think) copy each field separately: for example, to copy sample #100, I believe I would need to copy dataStream.xPos(100), dataStream.yPos(100), dataStream.timestamp(100), and so on.
Is there something I'm missing with regard to management of structs, or would I be better off saving the hassle and sticking with the matrix approach?
If it is just for increased readability, I would not use structs, but rather a quite simple approach: define variables for the different columns of your data matrix. For instance:
xPosition = 1;
yPosition = 2;
timestamp = 3;
Velocity = 4;
Acceleration = 5;
With these variables you can write quite meaningful queries; for instance, instead of dataStream(600,1) you would write:
dataStream(600, xPosition)
Note that you could also define more complex queries, for instance
position = [1 2];
wholeSample = 1:5;
to query multiple columns at once.
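For example, copying a whole sample (the situation described in the question) stays a one-liner:
sample600 = dataStream(600, wholeSample); % all five columns of sample #600
positionOnly = dataStream(600, position); % just the X and Y position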
You can copy a struct easily; structs are value types in Matlab, so plain assignment makes a copy:
s = another_struct;
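This also works per element of a struct array. If you instead stored one struct per sample, in a hypothetical 1xN array samples with fields xPos, yPos, timestamp, and so on, copying sample #100 would be a single assignment:
sample = samples(100); % copies all fields of that sample together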
In terms of performance, structs will be slower than a plain matrix. Use readable constants to replace your numerical indices, as suggested by @H.Muster.
I'm writing a simulation in Matlab.
I will eventually run this simulation hundreds of times.
In each simulation run, there are millions of simulation cycles.
In each of these cycles, I calculate a very complex function, which takes ~0.5 sec to finish.
The function input is a long bit array (>1000 bits) - which is an array of 0 and 1.
I hold the bit arrays in a matrix of 0s and 1s, and for each one of them I only run the function once: I save the result in a separate array (res) and check whether the bit array is already in the matrix before running the function:
for i = 1:1000000000
    %pick a bit array somehow
    [~,indx] = ismember(bit_array,bit_matrix,'rows');
    if indx == 0
        indx = length(res) + 1;
        bit_matrix(indx,:) = bit_array;
        res(indx) = complex_function(bit_array);
    end
    result = res(indx);
    %do something with result
end
I have two questions, really:
Is there a more efficient way to find the index of a row in a matrix than ismember?
Since I run the simulation many times, and there is a big overlap in the bit-arrays I'm getting, I want to cache the matrix between runs so that I don't recalculate the function over the same bit-arrays over and over again. How do I do that?
The answer to both questions is to use a map. There are a few steps to do this.
First you will need a function to turn your bit_array into either a number or a string. For example, turn [0 1 1 0 1 0] into '011010'. (Matlab only supports scalar or string keys, which is why this step is required.)
Define a map object
cachedRunMap = containers.Map; %See edit below for more on this
To check if a particular case has been run, use isKey.
cachedRunMap.isKey('011010');
To add the results of a run use the appending syntax
cachedRunMap('011010') = [0 1 1 0 1]; %Or whatever your result is.
To retrieve cached results, use parenthesis indexing (the values method also works, but wraps the result in a cell array)
tmpResult = cachedRunMap('011010');
This should efficiently store and retrieve values until you run out of system memory.
Putting this together, now your code would look like this:
%Hacky magic function to convert an array into a string of '0' and '1'
strFromBits = @(x) char((x(:)'~=0)+48);
%Initialize the map
cachedRunMap = containers.Map;
%Loop, computing and storing results as needed
for i = 1:1000000000
    %pick a bit array somehow
    strKey = strFromBits(bit_array);
    if cachedRunMap.isKey(strKey)
        result = cachedRunMap(strKey);
    else
        result = complex_function(bit_array);
        cachedRunMap(strKey) = result;
    end
    %do something with result
end
If you want a key which is not a string, that needs to be declared at step 2. Some examples are:
cachedRunMap = containers.Map('KeyType', 'char', 'ValueType', 'any');
cachedRunMap = containers.Map('KeyType', 'double', 'ValueType', 'any');
cachedRunMap = containers.Map('KeyType', 'uint64', 'ValueType', 'any');
cachedRunMap = containers.Map('KeyType', 'uint64', 'ValueType', 'double');
Setting a KeyType of 'char' sets the map to use strings as keys. All other types must be scalars.
Regarding issues as you scale this up (per your recent comments)
Saving data between sessions: There should be no issues saving this map to a *.mat file, up to the limits of your system's memory.
Purging old data: I am not aware of a straightforward way to add LRU features to this map. If you can find a Java implementation you can use it within Matlab pretty easily. Otherwise it would take some thought to determine the most efficient method of keeping track of the last time a key was used.
Sharing data between concurrent sessions: As you indicated, this probably requires a database to perform efficiently. The DB table would have two columns (three if you want LRU features): the key, the value, and the last-used time if desired. If your "result" is not a type which fits easily into SQL (e.g. a non-uniform-size array, or a complex structure), then you will need to put additional thought into how to store it. You will also need a method to access the database (e.g. the Database Toolbox, or various tools on the MathWorks File Exchange). Finally, you will need to actually set up a database on a server (e.g. MySQL if you are cheap, like me, or whatever you have the most experience with, or can find the most help with). This is not actually that hard, but it takes a bit of time and effort the first time through.
Another approach to consider (much less efficient, but not requiring a database) would be to break up the data store into a large number (e.g. thousands or millions) of maps. Save each into a separate *.mat file, with a filename based on the keys contained in that map (e.g. the first N characters of your string key), and then load/save these files between sessions as needed. This will be pretty slow ... depending on your usage it may be faster to recalculate from the source function each time ... but it's the best way I can think of without setting up the DB (clearly a better answer).
For a large list, a hand-coded binary search can beat ismember, if maintaining the list in sorted order isn't too expensive and if that's really your bottleneck. Use the profiler to see how much the ismember call is really costing you. If there aren't too many distinct values, you could also store them in a containers.Map by packing the bit_matrix into a char array and using it as the key.
If it's small enough to fit in memory, you could store it in a MAT file using save and load. They can store any basic Matlab datatype. Have the simulation save the accumulated res and bit_matrix at the end of its run, and re-load them the next time it's called.
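A minimal sketch of that pattern (the cache file name is made up):
if exist('bitcache.mat','file')
    load('bitcache.mat','res','bit_matrix'); % resume from the previous run
else
    res = [];
    bit_matrix = []; % first run: start empty
end
% ... run the simulation loop, growing res and bit_matrix ...
save('bitcache.mat','res','bit_matrix'); % persist for the next run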
I think that you should use containers.Map() for the purpose of speedup.
The general idea is to hold a map that contains all hash values. If your bit arrays have uniform distribution under the hash function, most of the time you won't need the call to ismember.
Since the key type cannot be an array in Matlab, you can calculate some hash function on your array of bits.
For example:
function s = GetHash(bitArray)
    s = mod( sum(bitArray), intmax('uint32'));
end
This is a lousy hash function, but enough to understand the principle.
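If collisions turn out to be frequent, here is a sketch of a slightly less collision-prone hash (still illustrative, not a production hash) that folds the bits in one at a time, so different arrangements of the same bits hash differently:
function s = GetHash(bitArray)
    % treat the bits as a big binary number, reduced modulo a prime
    s = uint32(0);
    p = uint32(2^31 - 1);
    for k = 1:numel(bitArray)
        s = mod(s*2 + uint32(bitArray(k) ~= 0), p);
    end
end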
Then the code would look like:
map = containers.Map('KeyType','uint32','ValueType','any');
for i = 1:1000000000
    %pick a bit array somehow
    s = GetHash(bit_array);
    if map.isKey(s)
        %hash seen before: do the slow check for a real match
        [~,indx] = ismember(bit_array,bit_matrix,'rows');
    else
        %new hash, so the bit array is certainly new
        map(s) = 1;
        indx = 0;
    end
    if indx == 0
        indx = length(res) + 1;
        bit_matrix(indx,:) = bit_array;
        res(indx) = complex_function(bit_array);
    end
    result = res(indx);
    %do something with result
end