I am trying to vectorize the following data structure in Matlab but I cannot find/code an efficient way.
A = 1x2 struct array with fields: [a , b , c]
A(1) = a: 1 , b: 2 , c: [1x1 struct]
A(1).c = key: 5
A(2) = a: 1 , b: [] , c: [1x3 struct]
A(2).c = 1x3 struct array with fields: [key , key2]
A(2).c(1).key = 3
A(2).c(2).key = 4
A(2).c(3).key = 7
A(2).c(1).key2 = 10
A(2).c(2).key2 = []
A(2).c(3).key2 = 17
I know, this is a highly inefficient data structure. That's why I am trying to vectorize it with indices, so that the final structure looks like
A = 1x1 structure with fields [a , b , c , b_index , c_index]
A.a = [1 1]
A.b = [2]
A.b_index = [1]
A.c = 1x1 structure with fields [key key2 key2_index]
A.c_index = [1 2 2 2]
A.c.key = [5 3 4 7]
A.c.key2 = [10 17]
A.c.key2_index = [2 4]
My attempt 1:
I first tried parfor at each level (three levels in this example: A, c, key), surveying each field first to see whether it is empty, what data it contains, and whether I need to index it, and then calling vertcat(x.(fieldname)) if the field is not a structure. If it is, I package it up as a cell and recursively push it down to be vectorized.
That works, but it unfortunately takes too long. When I profiled it, the MEX distribution function was taking up all the time. I'm guessing that's because I am doing parfor at every level, so MATLAB has to index and distribute to each worker very frequently.
My attempt 2:
I tried to do a complete parfor survey of the structure first, using a uint8 value for each field. Then, at the combination stage, I use vertcat on the survey results to check whether I need to index and whether I need to do cat(3,...) for the data field. But that is memory inefficient and slow at the survey stage, and it doesn't speed up the combination stage much, though indexing becomes much easier.
I guess my questions are
1. How can I code it so that parfor only indexes and distributes the whole array once, making my first attempt more efficient? Or is my second attempt a better idea?
2. What is a good general approach to the problem?
My two cents on your second question: Matlab's parfor works faster on simple arrays/matrices. This is because arrays are allocated contiguously in memory, which enables faster access and computation. So, instead of complex structures, I would suggest using simpler arrays if you're more concerned with the performance of your program than with its readability.
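To make the target layout concrete, here is a minimal sketch of the flattening for the top level and for c.key only, written without parfor. The variable name out is my own, repelem needs R2015a or later, and key2 is left out because it is missing from A(1).c and would need an isfield check first.

```matlab
% Rebuild the example from the question (values copied from the listing above).
A = struct('a', {1, 1}, 'b', {2, []}, ...
           'c', {struct('key', 5), ...
                 struct('key', {3, 4, 7}, 'key2', {10, [], 17})});

out.a       = [A.a];                               % every element has a value
out.b       = [A.b];                               % empty entries vanish when concatenated
out.b_index = find(~cellfun(@isempty, {A.b}));     % which elements supplied a b

% One c_index entry per nested c element, pointing back to its parent.
nC          = arrayfun(@(s) numel(s.c), A);
out.c_index = repelem(1:numel(A), nC);

% The c arrays have different field sets, so [A.c] would error;
% pull key out of each one separately instead.
keys        = cellfun(@(c) [c.key], {A.c}, 'UniformOutput', false);
out.c.key   = [keys{:}];
```

Everything here works on plain arrays and comma-separated lists, so it may be worth timing this serial version before reaching for parfor at all.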
I am trying to convert a structure array to a matrix. Each field of the structure stores a vector that can reach up to 520000 rows. There can be up to 20 fields in a single structure array, but number of rows is the same across the fields.
As a down scaled example consider the structure s, where each field is an integer:
s=struct('a',1,'b',2);
s(2)=struct('a',3,'b',4);
s=s';
In the desired output, each field will correspond to a column. a values will be in the first column, while b values will be in the second:
desiredOutput = [1 2; 3 4];
I have approached this in an indirect way:
cell2mat(struct2cell(s))'
However, this involves two transformations, which I find unnecessary given the well-behaved nature of my structure.
I have also approached this using a for loop:
fields = fieldnames(s);
nrows = size(s,1);
ncols = numel(fields);
desiredOutput = nan(nrows,ncols);
for jj=1:ncols
desiredOutput(:,jj) = [s.(fields{jj})]';
end
I hoped to find a struct2mat function but it does not exist. Is there a simpler way to accomplish this task that I am not aware of?
I had something similar to this written out. So, if you don't mind, I will 'kinda' copy that out over here.
data(1,1).val = 1;
data(1,2).val = 2;
data(2,1).val = 3;
data(2,2).val = 4;
This gives a 2x2 struct with field val.
A = reshape([data.val],size(data))
Now, A looks like this: [1 2; 3 4]
A =
1 2
3 4
Does that help?
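Applied to the 2x1 struct s from the question, the same comma-separated-list idea gives one column per field. This is only a sketch that assumes every field holds a scalar in every element; M and cols are names I made up.

```matlab
s = struct('a', {1; 3}, 'b', {2; 4});      % 2x1 struct array, same contents as in the question

% Stack each field into a column, then place the columns side by side.
cols = cellfun(@(f) vertcat(s.(f)), fieldnames(s), 'UniformOutput', false);
M    = [cols{:}];                          % numel(s)-by-numel(fieldnames(s))
```

The columns follow the order returned by fieldnames(s), so a ends up in column 1 and b in column 2.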
Assume that we have three vectors say a, b and c including increasing real-valued numbers as follows:
a=[3 4 19 22];
b=[1 10 15];
c=[3 5 11];
What is the most efficient way (without using loops) to find increasing sequence of numbers in such a problem in MATLAB?
For the above example the output should be like this:
[3 10 11]
[4 10 11]
Which both have their first element from a, their second one from b and their third one from c, so they have three increasing elements as it should be.
Note: The first number has to come from a, the second from b and the third from c.
Using loops is not a good choice for this problem because the vectors may be longer and there may be more of them in the general case, so the run time would be too long.
Any help would be gratefully appreciated…
Thanks in advance
There are several ways of approaching this problem, and one should always consider the time vs. space payoffs.
Approach 1 (Requires a lot of space (n^3 where n is the length of the arrays) but has no direct loops!):
[A,B,C] = meshgrid(a,b,c);
idxs = find((A<B)&(B<C));
[ib,ia,ic] = ind2sub(size(A),idxs);
answer = [a(1,ia);b(1,ib);c(1,ic)];
Approach 2 (requires less space, n^2 where n is the length of the arrays; it does contain a loop, but it seems to be faster for big values of n, and it is much faster if you only need to find how many solutions there are):
ab = double(bsxfun(@lt,a',b));
bc = double(bsxfun(@lt,b',c));
abc = ab*bc;
numberOfAnswers = sum(abc(:));
[idx] = find(abc);
cumnum = [0;cumsum(abc(idx))];
[ia,ic] = ind2sub(size(abc),idx);
answer = zeros(3,numberOfAnswers);
for i = 1:numel(idx)
answer(1,(cumnum(i)+1):cumnum(i+1)) = a(ia(i));
answer(2,(cumnum(i)+1):cumnum(i+1)) = b((b>a(ia(i)))&(b<c(ic(i))));
answer(3,(cumnum(i)+1):cumnum(i+1)) = c(ic(i));
end
EDIT: Both methods will give a 3-by-m-matrix where m is the number of solutions, and the solutions will be the columns of the matrix.
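For reference, a sketch of Approach 1 run on the vectors from the question. I swapped meshgrid for ndgrid here so that the subscripts come back in the order a, b, c; that substitution is mine, and the memory cost is the same.

```matlab
a = [3 4 19 22];
b = [1 10 15];
c = [3 5 11];

% ndgrid puts a along dimension 1, b along dimension 2, c along dimension 3.
[A, B, C] = ndgrid(a, b, c);
[ia, ib, ic] = ind2sub(size(A), find(A < B & B < C));

% Index with row vectors so each line of the result is a row.
answer = [a(ia(:).'); b(ib(:).'); c(ic(:).')];
```

Each column of answer is one increasing triple; for these inputs the columns are [3 10 11] and [4 10 11].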
Matlab's syntax is infuriating, specifically with structs. In the Bioinformatics Toolbox, there is a function called jcampread(File) which is described here.
In the description, the method jcampread() takes a Filepath and outputs into a struct called jcampStruct. From my understanding, in Matlab, you don't declare return variable types like you do in C: you just give the return variable a name and it somehow knows that the return of jcampread() method will be a jcampStruct. How it does, I have no idea, but it does.
I put in the code exactly how their example shows in step 4 of the Example section, and I get the following error message back from Matlab:
Incorrect number of right hand side elements in
dot name assignment. Missing [] around left hand
side is a likely cause.
Error in jcampread>ntupleRead (line 510)
dataBlock.ZName = name{Zidx};
Error in jcampread (line 192)
dataBlocks = ntupleRead(fid);
This site says the problem occurs "when f has more than one matrix element." Code is below:
»f.a = [1 0]
f =
a: [1 0]
»f.b = [1 2]
f =
a: [1 0]
b: [1 2]
»f = setfield(f,'a',[2 2])
f =
a: [2 2]
b: [1 2]
»f(2).a=1
f =
1x2 struct array with fields:
a
b
»f = setfield(f,'a',[2 2])
??? Error using ==> setfield
Incorrect number of right hand side elements in dot name assignment.
Missing [] around left hand side is a likely cause.
I assume this means the matrix f looks like this:
f = [ [a1; b1]; [a2; b2]; ]
f = [ [[2 2]; [1 2]]; [[1]; []]; ]
When they tried to update f.a which was set to
f.a = [[2 2]; [1]]
...to a single element [2 2], it doesn't like that because f.a is currently a matrix with 2 vector elements. Basically if you are going to reassign f.a (all elements of the attribute a of matrix f), you have to reassign f.a to have the same number of elements as it currently has.
I think that is why this error is occurring in the setfield example.
My question: how does this apply to jcampread()? jcampStruct is literally a structure with the same attributes, and those attributes are assigned only once. I do not understand:
a. How matlab knows the return value of jcampread() is a jcampStruct, and
b. Why (given that it knows (a)), the 'Incorrect number of right hand..' error message is firing here.
Can anyone clear this up for me?
You are creating a non-scalar structure, and there is no way to assign at once, i.e. without a loop, a different value to the same field of each sub-structure. What does that mean?
Scalar structure
s.a = 1;
size(s)
ans =
1 1
Now, adding fields doesn't change the size of the structure:
s.b = 2;
size(s)
ans =
1 1
Non-scalar structure
However, assigning a value to the same field, but into a position > 1 of the structure, will grow it into a non-scalar one:
s(2).a = 3
size(s)
ans =
1 2
Also, notice how the sub-structure in position 2 replicates/pre-allocates the fields of the initial structure even though you assigned to a alone:
s(2)
ans =
a: 3
b: []
Pointers
Additionally, the field s(2).b is just an empty pointer:
whos s
Name Size Bytes Class
s 1x2 496 struct
and by adding a scalar double (8 bytes), we get
s(2).b = 4;
whos s
Name Size Bytes Class
s 1x2 608 struct
Pro of non-scalar structure
What you can do with a non-scalar structure is retrieve one field across all sub-structures (provided you don't run into concatenation issues):
for ii = 1:100
s(ii).a = rand(1,2);
end
cat(1,s.a)
The last command concatenates all values of a single field from all sub-structures into a 100-by-2 array.
Cons
To assign different values across the sub-structures, even if the same field, you need to loop (as above in the for loop).
At most you could deal() the same values into one field across all sub-structures:
clear s
[s(1:100).a] = deal([1, 2]);
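Relating this back to the setfield session quoted in the question, here is a small sketch. The try/catch is only there so the script keeps running, and the exact error text varies between MATLAB releases.

```matlab
f = struct('a', {[2 2], 1}, 'b', {[1 2], []});   % 1x2 struct array, as in the quoted session

try
    f.a = [2 2];            % f has two elements, so MATLAB cannot tell which one is meant
    failed = false;
catch
    failed = true;          % assignment rejected for a non-scalar struct
end

f(1).a = [3 3];             % naming the element makes the assignment unambiguous
[f.a]  = deal([2 2]);       % or write one value into field a of every element
```

So the message in jcampread most likely means that dataBlock had already grown into a struct array by the time line 510 ran, although that is a guess without seeing the file being read.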
I am reading 5 columns from a .txt file to 5 vectors.
Sometimes some vectors are one element larger than others, so I need to check if they are all of equal length, and if not, I have to find which ones are the largest and delete their last element. I think I should be able to do this without loops. I was originally thinking of using find in combination with isequal but isequal only returns a logical, and does not provide any information on which vectors are the largest.
[Seconds,Sensor1VStatic,Sensor2VPulsed,TemperatureC,RelativeHumidity] = importfile(path);
Then depending on what vectors are longer by one element, I will do, for example
Seconds(end) = [];
Sensor1VStatic(end) = [];
If Seconds and Sensor1VStatic are longer than the other vectors by one element
Assume your vectors are in a cell array A:
A = {[1 2 3], [1 2 3 4], [1 2 3 4 5]};
You can get the size of each vector with
sz = cellfun(@(x)size(x,2), A);
Which will return (for the above example)
sz = [ 3 4 5]
Now you can find the shortest vector:
minLength = min(sz);
And finally, make all vectors this length:
B = cell2mat(cellfun(@(x)x(1:minLength), A, 'uniformoutput', false))';
There may be more elegant ways (and note that cellfun really is doing "implicit looping")
Applying this to your (now expanded) example, you could probably assign the output of importfile directly to the cell array - or you can do it as a separate line:
A = {Seconds,Sensor1VStatic,Sensor2VPulsed,TemperatureC,RelativeHumidity};
But it all becomes a lot of work. Instead you could do:
minLength = min([size(Seconds,1), size(Sensor1VStatic,1), size(Sensor2VPulsed,1), ...
Seconds = Seconds(1:minLength);
...
There is scope for some cleverness but it won't make things more readable, and it won't save time in the long run...
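A compact sketch of the cell-array route for column vectors, which is what size(...,1) above suggests importfile returns. The sample values below are made up, and only three of the five vectors are shown.

```matlab
% Hypothetical data standing in for the importfile output (column vectors).
Seconds        = (1:5).';
Sensor1VStatic = [0.1; 0.2; 0.3; 0.4];
TemperatureC   = [20; 21; 22; 23; 24];

A = {Seconds, Sensor1VStatic, TemperatureC};
n = min(cellfun(@numel, A));                            % shortest length
A = cellfun(@(x) x(1:n), A, 'UniformOutput', false);    % trim each vector to n

[Seconds, Sensor1VStatic, TemperatureC] = A{:};         % unpack back into the variables
M = [A{:}];                                             % n-by-3 matrix, one column per vector
```

Trimming to the shortest length drops the extra last element from whichever vectors are longer, without having to work out which ones they are.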
My question has two parts:
Split a given matrix into its columns
These columns should be stored into an array
eg,
A = [1 3 5
3 5 7
4 5 7
6 8 9]
Now, I know the solution to the first part:
the columns are obtained via
tempCol = A(:,iter), where iter = 1:end
Regarding the second part of the problem, I would like to have (something like this, maybe a different indexing into arraySplit array), but one full column of A should be stored at a single index in splitArray:
arraySplit(1) = A(:,1)
arraySplit(2) = A(:,2)
and so on...
for the example matrix A,
arraySplit(1) should give me [ 1 3 4 6 ]'
arraySplit(2) should give me [ 3 5 5 8 ]'
I am getting the following error when I try to assign the column vector to my array.
In an assignment A(I) = B, the number of elements in B and I must be the same.
I am doing the allocation and access of arraySplit wrongly, please help me out ...
Really it sounds like A is already what you want; I can't imagine a scenario where you gain anything by splitting the columns up. But if you do, then your best bet is likely a cell array, i.e.
C = cell(1,3);
for i=1:3
C{i} = A(:,i);
end
Edit: See @EitanT's comment below for a more elegant way to do this. Also accessing the vector uses the same syntax as setting it, e.g. v = C{2}; will put the second column of A into v.
In a Matlab array, each element must have the same type. In most cases, that is a float type. In your example A(:, 1) is a 4 by 1 array. If you assign it to, say, B(:, 2) then B(:, 1) must also be a 4 by 1 array.
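One loop-free way to build the same cell array is num2cell with a dimension argument. I am not claiming this is what the comment suggested; it is just a common idiom.

```matlab
A = [1 3 5;
     3 5 7;
     4 5 7;
     6 8 9];

C = num2cell(A, 1);   % 1x3 cell; each cell holds one full column of A
v = C{2};             % second column of A
```

Note the curly braces when reading a column back out: C(2) would return a 1x1 cell rather than the vector itself.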
One common error that may be biting you is that a 4 by 1 array and a 1 by 4 array are not the same thing. One is a column vector and one is a row vector. Try transposing A(:, 1) to get a 1 by 4 row array.
You could try something like the following:
A = [1 3 5;
3 5 7;
4 5 7;
6 8 9]
arraySplit = zeros(4,1,3);
for i =1:3
arraySplit(:,:,i) = A(:,i);
end
and then call arraySplit(:,:,1) to get the first vector, but that seems to be an unnecessary step, since you can readily do that by accessing the exact same values as A(:,1).