MATLAB: is fieldnames' order defined?

For the same input structure, will fieldnames always return the same cell array, even on different computers, different OS's, and different MATLAB versions? Or could it order the field names differently? E.g.:
myStructure = load('myStructure');
x = fieldnames(myStructure);
% days later, diff computer, diff OS, and diff version of MATLAB...
y = fieldnames(myStructure);
isequal(x, y) % ?
The documentation for fieldnames does not seem to promise that the same order is returned every time. But on the other hand, the existence of orderfields seems to imply that fieldnames predictably returns an underlying, normally unchanging order.

I believe structure fields are ordered as they were created. If you save the structure to a MAT-file and open it later in another MATLAB, the order is kept. You can always reorder fields with the ORDERFIELDS function. It can order in several ways (alphabetically, or following a cell array, another structure, or a permutation vector); see the documentation for more details.
By the way, field order does not affect structure comparison.
s1 = struct('a',0,'b',1)
s1 =
a: 0
b: 1
s2 = struct('b',1,'a',0)
s2 =
b: 1
a: 0
isequal(s1,s2)
ans =
1
s1=orderfields(s1,s2)
s1 =
b: 1
a: 0
UPDATE:
Here is the quote from the MATLAB documentation for structure data type under "Listing the Fields of a Structure" subtitle:
The fields appear in the order in which they were created.
Hope this answers your question.
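A quick way to check this behavior on your own installation is sketched below. The field names are made up for illustration, and the asserts only confirm what the quoted documentation describes.

```matlab
% Build a struct whose fields are created in a non-alphabetical order.
s = struct();
s.zeta  = 1;
s.alpha = 2;
s.mid   = 3;

% fieldnames returns an N-by-1 cell array in creation order.
names = fieldnames(s);

% Cell arrays cannot be compared with ==, so use isequal.
assert(isequal(names, {'zeta'; 'alpha'; 'mid'}));

% orderfields with a single argument sorts the fields by name.
sorted = fieldnames(orderfields(s));
assert(isequal(sorted, {'alpha'; 'mid'; 'zeta'}));
```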

Related

Matlab: How to load a file as array of structs

I am trying to convert data file (here string representing file with three lines) into a structure array like this:
cel = textscan('1 1.1 2 2.2 3 3.3', '%u %f');
str = cell2struct(cel, {'f1', 'f2'}, 2);
However, now I have a struct array of dimension 1x1, where I can only access the columns using array's fields, but not the whole rows (like 'str(2)' for the second row).
What I need is to have an array of structs (or how it can be called) like this:
str = struct('f1', {1, 2, 3}, 'f2', {1.1, 2.2, 3.3});
because now I can (for instance) filter it like this:
subStr = str(find([str.f1] > 1))
which I could not do in the first case.
Any idea how to get there?
At the end I was able to do it by:
cel = textscan('1 1.1 2 2.2 3 3.3', '%u %f');
[f1, f2] = cel{:};
str = struct('f1', num2cell(f1'), 'f2', num2cell(f2'));
But it does not feel right and I am afraid it will be expensive (the files are quite large).
EDIT:
My solution is indeed too memory demanding, therefore not usable.
Typical files have header, footer, and c. 5e6 lines of data in six columns.
Thanks
It's easier if you're actually working with a file that contains lines. For example, if data.txt contains:
1 1.1
2 2.2
3 3.3
And now you can simply load this using:
tbl = readtable('data.txt');
tbl.Properties.VariableNames = {'f1', 'f2'};
Which results in much nicer (imho) filtering syntax:
subTbl = tbl(tbl.f1 > 1, :);
I suggest you read a bit about tables in MATLAB, to learn about their (many) capabilities.
Finally, if you insist on working with struct arrays, you can do:
str = table2struct(tbl)
str =
3×1 struct array with fields:
f1
f2
Each element of cel is an array. Using cellfun and num2cell they can be converted to cell arrays:
names = {'f1', 'f2'};
cel = textscan('1 1.1 2 2.2 3 3.3', '%u %f');
cel2 = cellfun(@num2cell, cel, 'UniformOutput', 0);
prep = [names;cel2];
str = struct(prep{:}).';
I wish I had read those more carefully sooner, but according to this and this, storing large datasets the way I was trying to is discouraged, because
Structures with many fields and small contents have a large overhead and should be avoided. A large array of structures with numeric scalar fields requires much more memory than a structure with fields containing large numeric arrays.
and
For structures and cell arrays, MATLAB creates a header not only for each array, but also for each field of the structure and for each cell of a cell array. Because of this, the amount of memory required to store a structure or cell array depends not only on how much data it holds, but also on how it is constructed.
So an array of structs, str(1:N).f, requires (for larger N) much more memory than a single struct with array fields, str.f(1:N).
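The difference can be measured directly with whos. The following is a minimal sketch with made-up data and an illustrative N that is much smaller than the 5e6 rows mentioned in the question; the variable names soa, aos and sub are assumptions, not part of the original posts.

```matlab
N  = 1e5;                 % illustrative size only
f1 = (1:N).';
f2 = rand(N, 1);

% Layout 1: one struct whose fields hold whole columns.
soa.f1 = f1;
soa.f2 = f2;

% Layout 2: N structs with scalar fields (the layout built with num2cell).
aos = struct('f1', num2cell(f1), 'f2', num2cell(f2));

% whos reports the memory used by each variable.
infoSoa = whos('soa');
infoAos = whos('aos');
fprintf('struct of arrays: %d bytes\n', infoSoa.bytes);
fprintf('array of structs: %d bytes\n', infoAos.bytes);

% Row filtering still works on the compact layout through a logical mask.
keep   = soa.f1 > 1;
sub.f1 = soa.f1(keep);
sub.f2 = soa.f2(keep);
```

The logical mask replaces the str(find([str.f1] > 1)) idiom from the question, at the cost of applying it to each field separately.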

Fastest type to use for comparing hashes in matlab

I have a table in Matlab with some columns representing 128 bit hashes.
I would like to match rows, to one or more rows, based on these hashes.
Currently, the hashes are represented as hexadecimal strings, and compared with strcmp(). Still, it takes many seconds to process the table.
What is the fastest way to compare two hashes in matlab?
I have tried turning them into categorical variables, but that is much slower. Matlab as far as I know does not have a 128 bit numerical type. nominal and ordinal types are deprecated.
Are there any others that could work?
The code below is analogous to what I am doing:
nodetype = { 'type1'; 'type2'; 'type1'; 'type2' };
hash = {'d285e87940fb9383ec5e983041f8d7a6'; 'd285e87940fb9383ec5e983041f8d7a6'; 'ec9add3cf0f67f443d5820708adc0485'; '5dbdfa232b5b61c8b1e8c698a64e1cc9' };
entries = table(categorical(nodetype),hash,'VariableNames',{'type','hash'});
%nodes to match. filter by type or some other way so rows don't match to
%themselves.
A = entries(entries.type=='type1',:);
B = entries(entries.type=='type2',:);
%pick a node/row with a hash to find all counterparts of
row_to_match_in_A = A(1,:);
matching_rows_in_B = B(strcmp(B.hash,row_to_match_in_A.hash),:);
% do stuff with matching rows...
disp(matching_rows_in_B);
The hash strings are faithful representations of what I am using, but they are not necessarily read or stored as strings in the original source. They are just converted for this purpose because it's the fastest way to do the comparison.
Optimization is nice, if you need it. Try it out yourself and measure the performance gain for relevant test cases.
Some suggestions:
Sorted arrays are easier/faster to search
Matlab's default numbers are double, but you can also construct integers. Why not use 2 uint64's instead of the 128bit column? First search for the upper 64bit, then for the lower; or even better: use ismember with the row option and put your hashes in rows:
A = uint64([0 0;
0 1;
1 0;
1 1;
2 0;
2 1]);
srch = uint64([1 1;
0 1]);
[ismatch, loc] = ismember(srch, A, 'rows')
> loc =
4
2
Look into the comparison functions you use (e.g. edit ismember) and strip out unnecessary operations (e.g. sort) and safety checks that you know in advance won't pose a problem, like this solution does. Or, if you intend to call a search function multiple times, sort in advance and skip the check/sort in the search function later on.
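To apply the row-wise ismember idea to the hexadecimal strings from the question, the strings first have to become integer rows. The sketch below is one possible conversion, not the approach of the original answer: it splits each hash into four uint32 words instead of two uint64 values, because hex2dec returns doubles and is only exact for chunks well below 64 bits. The helper name toWords is hypothetical.

```matlab
% Hashes taken from the example table in the question.
hashA   = 'd285e87940fb9383ec5e983041f8d7a6';
hashesB = {'d285e87940fb9383ec5e983041f8d7a6'; ...
           '5dbdfa232b5b61c8b1e8c698a64e1cc9'};

% Split a 32-character hex string into four 8-character chunks
% (one per row), convert each chunk, and return a 1-by-4 uint32 row.
toWords = @(h) uint32(hex2dec(reshape(h, 8, []).')).';

keyA  = toWords(hashA);                                   % 1-by-4
keysB = cell2mat(cellfun(toWords, hashesB, ...
                         'UniformOutput', false));        % N-by-4

% Compare whole rows at once.
isMatch = ismember(keysB, keyA, 'rows');
```

Doing the conversion once when the table is loaded, rather than per comparison, is what makes this worthwhile; whether it beats strcmp for a given table is something to measure.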

Print the name of a variable upon a plot/figure

Is it possible to refer back to/access the names of variables (say nx1 arrays) that make up a matrix? I wish to access them to insert their names into a plot or figure (as text) that I have created. Here is an example:
A = [supdamp, clgvlv,redamp,extfanstat,htgvlv,occupied,supfanspd]
%lots of code here but not changing A, just using A(:,:)'s
%drawn figure
text(1,1,'supdamp')
...
text(1,n,'supfanspd')
I tried, and failed, to create a string array named a holding their names, so that I could loop through a(i,1) and then use something like text(1,n,'a(i,1)').
Depending on your problem, it might make sense to use structures with dynamic field names,
especially if the data in your array have some meaning beyond being entries of a matrix in the linear-algebra sense.
% name your variables so that your grandma could understand what they store
A.('supdamp') = supdamp;
A.('clgvlv') = clgvlv;
% ...
fieldsOfA = fieldnames(A);
for n = 1 : numel(fieldsOfA)
text(1, n, fieldsOfA{n})
end
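If the rest of the code still needs the numeric matrix, it can be rebuilt from the struct while the names stay available for labeling. This is a minimal sketch with random placeholder data standing in for the real signals; only three of the variable names from the question are used.

```matlab
% Placeholder data; in practice these would be the measured n-by-1 columns.
S.supdamp = rand(5, 1);
S.clgvlv  = rand(5, 1);
S.redamp  = rand(5, 1);

names = fieldnames(S);     % column names, in creation order
cols  = struct2cell(S);    % the column vectors themselves
A     = [cols{:}];         % 5-by-3 matrix, same column order as names

figure;
plot(A);
for n = 1:numel(names)
    % Label each line at its first data point; 'none' keeps any
    % underscores in names from being read as subscripts.
    text(1, A(1, n), names{n}, 'Interpreter', 'none');
end
```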

output a structure from a input cell

I have a cell that has different data types (cell, logical, double, char) except structure. Now I have to write a function that will sort out different data types and output a structure with the field of those data types. The fields have to appear according to their appearance in the cell. So, if the first 'n' element(s) of the cell is double and the (n+1)th element is a char then the first field of the output structure will be double and second field will be char.
Below is an example where buildStructure is the function header. sa is the output structure.
ca = {'Moriarty', [true, false], false, {'Pink Suitcase'}}
sa = buildStructure(ca)
sa=>
char: {'Moriarty'}
logical: {[true, false] [false]}
cell: {{'Pink Suitcase'}}
I tried it writing a for loop to store different data types in different cells. However, then I am feeling so lost. How can I figure out which data type appeared when? To do that I stored all the classes in a huge string then used 'strfind' to find the place (thus time) of particular data type. But it is making things only complex. Any help will be appreciated! Thanks.
There are tests for all the data types; see iscell, ischar, islogical, and so on. Their results can be used to index the input.
you can complete this example code:
function out = magicfun(varargin)
il = cellfun(@islogical,varargin);
out = struct('logical',{varargin(il)});
You can use class(), isa(), and unique() to do it generically. It's like bdecaf's approach, but that'll require you to write a test for every type and use a variety of functions. Using class and isa will generalize to data of any type using a single test, and will be shorter to write.
Exact Types Only
By comparing class names from class(), you can partition the input in to types based on the exact (most specific) type of each input. The 'stable' option for unique() keeps the output fields in the order of the first occurrences of the types in the input. (In production code I would probably omit the 'stable' so the output ordering is canonicalized based on the type name, but it depends on your requirements.)
function out = break_types(in)
%BREAK_TYPES Partition a cell array based on the types of its contents
inTypes = cellfun(@class, in, 'UniformOutput',false);
[types,~,bx] = unique(inTypes, 'stable');
out = struct;
for i = 1:numel(types)
ix = (bx == i);
out.(types{i}) = in(ix);
end
This is pretty complete and should work with anything that didn't do something silly like override class() or isa().
>> ca = {'Moriarty', [true, false], false, {'Pink Suitcase'}};
>> break_types(ca)
ans =
char: {'Moriarty'}
logical: {[1 0] [0]}
cell: {{1x1 cell}}
>>
Considering Inheritance
If you use isa(), you'll also pick up inheritance relationships for classes. For basic Matlab types, this will give you the same answer as the other implementation. But for classes that inherit from other types, it will categorize them in to all the types they match in the input and required lists.
function out = break_types(in)
%BREAK_TYPES Partition a cell array based on the types of its contents
inTypes = cellfun(@class, in, 'UniformOutput',false);
types = unique(inTypes, 'stable');
out = struct;
for i = 1:numel(types)
ix = cellfun(@(x) isa(x, types{i}), in);
out.(types{i}) = in(ix);
end
If you want to ensure that the output struct has an entry for some types even if there are no inputs of that type (so its field would contain an empty array), just append those type names to types before passing them to unique:
requiredTypes = { 'cell' 'int8', 'double', 'float' };
types = unique([inTypes requiredTypes], 'stable');

What are some efficient ways to combine two structures in MATLAB?

I want to combine two structures with differing fields names.
For example, starting with:
A.field1 = 1;
A.field2 = 'a';
B.field3 = 2;
B.field4 = 'b';
I would like to have:
C.field1 = 1;
C.field2 = 'a';
C.field3 = 2;
C.field4 = 'b';
Is there a more efficient way than using "fieldnames" and a for loop?
EDIT: Let's assume that in the case of field name conflicts we give preference to A.
Without collisions, you can do
M = [fieldnames(A)' fieldnames(B)'; struct2cell(A)' struct2cell(B)'];
C=struct(M{:});
And this is reasonably efficient. However, struct errors on duplicate fieldnames, and pre-checking for them using unique kills performance to the point that a loop is better. But here's what it would look like:
M = [fieldnames(A)' fieldnames(B)'; struct2cell(A)' struct2cell(B)'];
% 'last' keeps B's value on a conflict; use 'first' to give preference to A
[tmp, rows] = unique(M(1,:), 'last');
M=M(:, rows);
C=struct(M{:});
You might be able to make a hybrid solution by assuming no conflicts and using a try/catch around the call to struct to gracefully degrade to the conflict handling case.
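The hybrid idea could look roughly like the sketch below. It assumes preference for A on conflicts, as the question's edit asks, and uses sample structs with one deliberately colliding field; it is an illustration rather than a benchmarked solution.

```matlab
% Sample inputs; field2 exists in both so the fallback path is exercised.
A = struct('field1', 1, 'field2', 'a');
B = struct('field2', 'z', 'field3', 2);

M = [fieldnames(A)' fieldnames(B)'; struct2cell(A)' struct2cell(B)'];
try
    % Fast path: works whenever there are no duplicate field names.
    C = struct(M{:});
catch
    % Fallback: keep the first occurrence of each name (A wins), then
    % sort the column indices to restore the original field order.
    [~, cols] = unique(M(1, :), 'first');
    M = M(:, sort(cols));
    C = struct(M{:});
end
```

One caveat applies to both paths: if a field value is itself a cell array, struct expands it into a struct array, so such values need to be wrapped in an extra cell first.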
Short answer: setstructfields (if you have the Signal Processing Toolbox).
The official solution is posted by Loren Shure on her MathWorks blog, and demonstrated by SCFrench here and in Eitan T's answer to a different question. However, if you have the Signal Processing Toolbox, a simple undocumented function does this already - setstructfields.
help setstructfields
setstructfields Set fields of a structure using another structure
setstructfields(STRUCTIN, NEWFIELDS) Set fields of STRUCTIN using
another structure NEWFIELDS fields. If fields exist in STRUCTIN
but not in NEWFIELDS, they will not be changed.
Internally it uses fieldnames and a for loop, so it is a convenience function with error checking and recursion for fields that are themselves structs.
Example
The "original" struct:
% struct with fields 'color' and 'count'
s = struct('color','orange','count',2)
s =
color: 'orange'
count: 2
A second struct containing a new value for 'count', and a new field, 'shape':
% struct with fields 'count' and 'shape'
s2 = struct('count',4,'shape','round')
s2 =
count: 4
shape: 'round'
Calling setstructfields:
>> s = setstructfields(s,s2)
s =
color: 'orange'
count: 4
shape: 'round'
The field 'count' is updated. The field 'shape' is added. The field 'color' remains unchanged.
NOTE: Since the function is undocumented, it may change or be removed at any time.
I have found a nice solution on File Exchange: catstruct.
Without testing the performance I can say that it did exactly what I wanted.
It can deal with duplicate fields of course.
Here is how it works:
a.f1 = 1;
a.f2 = 2;
b.f2 = 3;
b.f4 = 4;
s = catstruct(a,b)
Will give
s =
f1: 1
f2: 3
f4: 4
I don't think you can handle conflicts well w/o a loop, nor do I think you'd need to avoid one. (although I suppose efficiency could be an issue w/ many many fields...)
I use a function I wrote a few years back called setdefaults.m, which combines one structure with the values of another structure, where one takes precedence over the other in case of conflict.
% SETDEFAULTS sets the default structure values
% SOUT = SETDEFAULTS(S, SDEF) reproduces in S
% all the structure fields, and their values, that exist in
% SDEF that do not exist in S.
% SOUT = SETDEFAULTS(S, SDEF, OVERRIDE) does
% the same function as above, but if OVERRIDE is 1,
% it copies all fields of SDEF to SOUT.
function sout = setdefaults(s,sdef,override)
if (not(exist('override','var')))
override = 0;
end
sout = s;
for f = fieldnames(sdef)'
cf = char(f);
if (override || not(isfield(sout,cf)))
sout = setfield(sout,cf,getfield(sdef,cf));
end
end
Now that I think about it, I'm pretty sure that the "override" input is unnecessary (you can just switch the order of the inputs) though I'm not 100% sure of that... so here's a simpler rewrite (setdefaults2.m):
% SETDEFAULTS2 sets the default structure values
% SOUT = SETDEFAULTS2(S, SDEF) reproduces in S
% all the structure fields, and their values, that exist in
% SDEF that do not exist in S.
function sout = setdefaults2(s,sdef)
sout = sdef;
for f = fieldnames(s)'
sout = setfield(sout,f{1},getfield(s,f{1}));
end
and some samples to test it:
>> S1 = struct('a',1,'b',2,'c',3);
>> S2 = struct('b',4,'c',5,'d',6);
>> setdefaults2(S1,S2)
ans =
b: 2
c: 3
d: 6
a: 1
>> setdefaults2(S2,S1)
ans =
a: 1
b: 4
c: 5
d: 6
In C, a struct can have another struct as one of its members. While this isn't exactly the same as what you're asking, you could end up either with a situation where one struct contains another, or one struct contains two structs, both of which hold parts of the info that you wanted.
pseudocode: I don't remember the actual syntax.
A.field1 = 1;
A.field2 = 'a';
A.field3 = struct B;
to access:
A.field3.field4;
or something of the sort.
Or you could have struct C hold both an A and a B:
C.A = struct A;
C.B = struct B;
with access then something like
C.A.field1;
C.A.field2;
C.B.field3;
C.B.field4;
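In MATLAB the same nesting idea can be sketched as follows, using the field names from the question; a struct is simply assigned as the value of a field.

```matlab
A.field1 = 1;
A.field2 = 'a';
B.field3 = 2;
B.field4 = 'b';

% Keep both structs side by side inside a container struct.
C.A = A;
C.B = B;

% Nested fields are reached by chaining dots.
value = C.B.field4;
```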
hope this helps!
EDIT: both of these solutions avoid naming collisions.
Also, I didn't see your matlab tag. By convention, you may want to edit the question to include that piece of info.