Matlab: associating an ID with a dataset (e.g. struct)?

I am developing a certain feature for a high-order finite element simulation algorithm in Matlab and I am wondering what is a good way of implementing a certain task. I believe I am facing a somewhat common problem, but after doing some digging, I'm not really finding a good solution.
Basically, I have a long list of IDs (corresponding to certain nodes on my mesh), where each ID is associated with a small data set. Then, when I am running my solver, I need to access the data associated with these nodes and update it (multiple times).
So, for example, let's say that this is my list of these specific nodes:
nodelist = [3 27 38] % (these are my node IDs)
Then for each node I have the following associated dataset:
a (scalar)
b (5x5 double matrix)
c (10x1 double vector)
(a total of 36 double values associated with each node ID)
In reality, I will of course have a much, much longer list of node IDs and a somewhat larger data set associated with each node (but still only double scalars, matrices and vectors; no characters, strings, etc.).
Approach 1
So one approach I cooked up is just to store everything in a 2D double matrix and then do some relatively complex indexing to access my data when needed. For the example above, the size of my 2D matrix (mat2D, say) would be
size(mat2D) = [length(nodelist), 36]
Say I wanted to access b(3,3) for node ID 27: that node is the second entry of nodelist, and b(3,3) is the 13th element of b in column-major order, giving column 1 + 13 = 14, so I would access mat2D(2,14).
In principle, this works, but the code is just not very clean and readable because of this complex indexing (not to mention, when I change something in the way the data set is set up, I need to re-adjust the whole indexing code).
Approach 2
Another approach would be to use some sort of struct for each node in the node list:
a = 4.4;
b = rand(5,5);
c = rand(10,1);
s = struct('a',a,'b',b,'c',c)
And then I can access the data via, e.g., s.b(3,3) etc. But I just don't know how to associate a struct with the node ID?
Approach 3
The last thing I could think of would be to set up some sort of SQL database, but this seems like overkill. Besides, I need my code to be as fast as possible: I access the fields in the datasets of these chosen nodes many, many times, and I imagine queries into a database would slow things down.
Note that ultimately I will convert the code from Matlab to C/C++, so I would prefer to implement something that doesn't rely too heavily on Matlab-specific features.
So, any thoughts on how to implement this functionality in a clean way? I hope my question makes sense and thanks in advance!

Approach 2 is the cleanest, and readily translates to C++. For each node you have a struct s, then:
data(nodeID) = s;
is what is called a struct array. You index as
data(id).b(3,3) = 0.0;
This assumes that the IDs are contiguous, or at least that there are no huge gaps in their values. But this can always be ensured: it is easy to renumber node IDs if necessary.
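A minimal sketch of this approach (field sizes and sample values taken from the question; the preallocation idiom is my own choice, not the only one):
nodelist = [3 27 38];
N = max(nodelist);   % assumes node IDs are renumbered so there are no huge gaps
data(N) = struct('a', 0, 'b', zeros(5,5), 'c', zeros(10,1));  % preallocates elements 1..N
for id = nodelist
    data(id) = struct('a', 4.4, 'b', rand(5,5), 'c', rand(10,1));
end
data(27).b(3,3) = 0.0;   % update one entry in place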
In C++, you’d have a vector of structs:
struct Bla {
    double a;
    double b[5][5];  // 5x5 to match the dataset in the question
    double c[10];
};
std::vector<Bla> data(N);
Or in C:
struct Bla *data = malloc(sizeof(struct Bla) * N);
(and don’t forget free(data) when you’re done with it).
Then, in either C or C++, you access an element this way:
data[id].b[2][2] = 0.0;
The translation is obvious, except that indexing starts at 0 in C and C++ and at 1 in MATLAB.
Note that this method has a larger memory overhead than Approach 1 in MATLAB, but not in C or C++.
Approach 3 is a bad idea: it will just slow down your code without any benefit.

I think the cleanest solution, given a non-contiguous set of node IDs, would be Approach 2 combined with a map container, where your node ID is the key (i.e. index) into the map. This can be implemented in MATLAB using a containers.Map object, and in C++ using the std::map container. For example, here's how you can create and add values to a node map in MATLAB:
>> nodeMap = containers.Map('KeyType', 'double', 'ValueType', 'any');
>> nodelist = [3 27 38];
>> nodeMap(nodelist(1)) = struct('a', 4.4, 'b', rand(5, 5), 'c', rand(10, 1));
>> nodeMap(3)
ans =
struct with fields:
a: 4.400000000000000
b: [5×5 double]
c: [10×1 double]
>> nodeMap(3).b(3,3)
ans =
0.646313010111265
In C++, you would need to define a structure or class (e.g. Node) for the data type to be stored in the map. Here's an example (... denotes arguments passed to the Node constructor):
#include <map>
#include <tuple>
#include <utility>

class Node { /* ... */ };             // Define Node class
typedef std::map<int, Node> NodeMap;  // Using int for the key type

int main()
{
    NodeMap map1;
    map1[3] = Node(/* ... */);        // Initialize and assign a Node object
    map1.emplace(std::piecewise_construct,
                 std::forward_as_tuple(27),
                 std::forward_as_tuple(/* ... */)); // Create a Node object in place
}
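One caveat when updating: containers.Map supports only a single level of indexing in assignments, so you cannot write nodeMap(27).b(3,3) = 0.0 directly. A minimal sketch of the read-modify-write pattern (names follow the MATLAB example above):
s = nodeMap(27);   % read the whole struct out of the map
s.b(3,3) = 0.0;    % modify the field locally
nodeMap(27) = s;   % write the struct back under the same key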

Related

Matlab: How can I call object properties using a string?

I am currently working on a data analysis program that contains two objects: Experiment and RunSummary. The Experiment object contains multiple instances of the RunSummary object. Each RunSummary object contains multiple properties (row matrices) each containing different data points for a given run.
For example: Experiment.RunSummary(5).Tmean is row matrix containing all of the average torque values for run 5 in my experiment.
I am currently trying to find a way to combine selected common properties from specific runs into a single matrix that can be used for further analysis. The current way I have had to do this is:
X(:,1) = [Drilling.Runs(1).Tmean,...
Drilling.Runs(2).Tmean,...
Drilling.Runs(3).Tmean,...
Drilling.Runs(5).Tmean]';
X(:,2) = [Drilling.Runs(1).Fmean,...
Drilling.Runs(2).Fmean,...
Drilling.Runs(3).Fmean,...
Drilling.Runs(5).Fmean]';
This code takes the average torque (Tmean) and average force (Fmean) from runs 1, 2, 3, and 5 and combines them in a single matrix, X, with Tmean for all runs in the first column and Fmean in the second. Although this method works, I have over 20 different properties and 15 different runs making this coding very tedious.
I have tried using code such as get(Experiment.RunSummary(i),'Tmean') to try and retrieve these property matrices, but was met with the error:
Conversion to double from RunSummary is not possible.
Is there a way to easily combine all of these different properties
into a single matrix using strings to determine which properties are used?
Thanks,
metro
Edit: Drilling is the name of the Experiment object. Runs is the name of the RunSummary object.
You can use dynamic field names. The documentation is for structs, but the same principle works for classes (at least on my R2012a install).
You can also use the comma-separated list nature of object array indexing to compress the code.
Example:
I = [1,2,3,5];
props = {'Tmean','Fmean'};
Nprops = length(props);
X = zeros(length(I), Nprops);
for k = 1:Nprops
    X(:,k) = [Drilling.Runs(I).(props{k})]';
end

Matlab: Query complicated structures

I am using structures in Matlab to organize my results in an intuitive way. My analysis is quite complex and hierarchical, so this works well---logically. For example:
resultObj.multivariate.individual.distributed.raw.alpha10(1).classification(1). Each level of the structure has several fields. Each alpha field is a structured array, indexed for each dataset, and classification is also a structured array, one for each cross validation run on the data.
To simplify, consider the classification field:
>> classification
ans =
1x8 struct array with fields:
bestLambda
bestBetas
scores
statObj
fitObj
In which statObj has fields (for example):
dprime: 6.5811
hit: 20
miss: 0
falseAlarms: 0
correctRejections: 30
Of course, the fields have different values for each subject and cross validation run. Given this structure, is there a good way to find the mean of dprime over cross-validation runs (i.e. over the elements of classification) without constructing a for loop to extract, store, and finally compute on the values?
I was hoping that reshape(struct2array(classification.statObj),5,8) would work, so I could construct a matrix with stats as rows and cross validations runs as columns, but this won't work. I put these items in their own structure specifically because the fields of classification hold elements of various types (matrices, structures, integers).
I am not opposed to restructuring my output entirely, but I'd like it to be done in such a way that the organization is fairly self-commenting, and I could say return to this structure a year from now and remember what and where everything is.
I came up with the following, although I'm not sure if it is what you are looking for:
% create a structure hierarchy similar to yours
% (I ignore everything before alpha10, and only create a part of it)
alpha10 = struct();
for a = 1:5
    alpha10(a).classification = struct();
    for c = 1:8
        alpha10(a).classification(c).statObj = struct('dprime', rand());
    end
end
% matrix of 'dprime' for each alpha across each cross-validation run
st = [alpha10.classification];
st = [st.statObj];
dp = reshape([st.dprime], 8, 5)'   % result is 5-by-8 matrix
Next you can compute the mean across the second dimension of this matrix dp.
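For instance:
meanDprime = mean(dp, 2);   % 5-by-1 vector: one mean per alpha, averaged over the 8 runs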
For anyone who happens across this post, and is wrestling with something similar, it is worth asking yourself if such a nested structure-of-structures is really your best option. It may be easier to flatten the hierarchy and include descriptive fields as labels. For instance
resultObj.multivariate.individual.distributed.raw.alpha10(1).classification(1)
might instead be
resultObj(1).
AnalysisType = 'multivariate'
GroupSolution = false
SignalType = 'distributed'
Processing = 'raw'
alpha = 10
crossvalidation = 1
dprime = 6.5811
bestLambda = []
bestBetas = []
scores = []
fitObj = []
That's not valid Matlab syntax there, but it gets the point across. Rather than building a hierarchy out of nested structures, create a 1xN structure with labels and data. It is a more general solution that is easier to query and work with.
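In valid MATLAB, the flattened layout might look like the following sketch (field names taken from above; note the {[]} idiom, which stops struct from interpreting an empty array as a request for an empty struct array):
resultObj(1) = struct( ...
    'AnalysisType', 'multivariate', 'GroupSolution', false, ...
    'SignalType', 'distributed', 'Processing', 'raw', ...
    'alpha', 10, 'crossvalidation', 1, 'dprime', 6.5811, ...
    'bestLambda', {[]}, 'bestBetas', {[]}, 'scores', {[]}, 'fitObj', {[]});
% e.g. mean dprime over every raw, multivariate entry:
sel = strcmp({resultObj.Processing}, 'raw') & strcmp({resultObj.AnalysisType}, 'multivariate');
meanDprime = mean([resultObj(sel).dprime]);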

Struct management

I am writing a solution that manages data from an eye tracker. I currently hold the data in an N x 5 matrix, with the following columns:
X Position, Y Position, timestamp, Velocity, Acceleration
Each row represents a single sample from the eye tracker (which runs at 1000Hz).
At present, I access the data as a matrix - e.g. if I want the velocity of sample #600, I use dataStream(600,4).
This is fine, but I'd prefer my code to be more readable: the '4' could be confusing; something like dataStream.velocity(600) would be ideal. I understand that this would be a simple use of STRUCT. However, there are situations in which I need to copy an entire sample (i.e. all columns of one row of my matrix). As I understand it, this is not easily achieved with a STRUCT, because the arrays under the various field names are not intrinsically linked: to copy sample #100, I believe I would need to copy dataStream.xPos(100), dataStream.yPos(100), dataStream.timestamp(100) and so on separately.
Is there something I'm missing with regards to management of STRUCTs, or would I be better off saving the hassle and sticking with the matrix approach?
If it is just for increased readability, I would not use structs, but rather a quite simple approach: define variables for the different columns of your data matrix. For instance:
xPosition = 1;
yPosition = 2;
timestamp = 3;
Velocity = 4;
Acceleration = 5;
With these variables you can write quite meaningful queries; for instance, instead of dataStream(600,1) you would write:
dataStream(600, xPosition)
Note that you also could define more complex queries, for instance
position = [1 2];
wholeSample = 1:5;
to query multiple columns at once.
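This also covers the copying concern, since a whole sample is just one row of the matrix. For example (indices from the question):
oneSample = dataStream(100, wholeSample);  % copy all five columns of sample #100
xy600 = dataStream(600, position);         % x and y position of sample #600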
You can copy a struct easily:
s = another_struct;   % struct assignment in MATLAB copies by value
In terms of performance, a struct will be slower than a matrix. Use readable constants to replace your numerical indices, as suggested by @H.Muster.
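If you do use a struct with one array per field and need to copy a whole sample, a short sketch of the per-field copy (a generic pattern, not from the question):
fields = fieldnames(dataStream);
sample = struct();
for f = 1:numel(fields)
    sample.(fields{f}) = dataStream.(fields{f})(100);  % grab sample #100 from every field
end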

In-Place Quicksort in matlab

I wrote a small quicksort implementation in matlab to sort some custom data. Because I am sorting a cell-array and I need the indexes of the sort-order and do not want to restructure the cell-array itself I need my own implementation (maybe there is one available that works, but I did not find it).
My current implementation works by partitioning into a left and a right array and then passing these arrays to the recursive call. Because I do not know the sizes of left and right in advance, I just grow them inside a loop, which I know is horribly slow in Matlab.
I know you can do an in place quicksort, but I was warned about never modifying the content of variables passed into a function, because call by reference is not implemented the way one would expect in matlab (or so I was told). Is this correct? Would an in-place quicksort work as expected in matlab or is there something I need to take care of? What other hints would you have for implementing this kind of thing?
Implementing a sort on complex data in user M-code is probably going to be a loss in terms of performance due to the overhead of M-level operations compared to Matlab's builtins. Try to reframe the operation in terms of Matlab's existing vectorized functions.
Based on your comment, it sounds like you're sorting on a single-value key that's inside the structs in the cells. You can probably get a good speedup by extracting the sort key to a primitive numeric array and calling the builtin sort on that.
% An example cell array of structs that I think looks like your input
c = num2cell(struct('foo', {'a','b','c','d'}, 'bar', {6 1 3 2}))
% Let's say the "bar" field is what you want to sort on.
key = cellfun(@(s) s.bar, c)   % extract the sort key using cellfun
[sortedKey, ix] = sort(key)    % sort on just the key using the fast numeric sort() builtin
sortedC = c(ix);               % ix is a reordering index into c; apply the sort with a single indexing operation
reordering = cellfun(@(s) s.foo, sortedC)   % for human readability of results
If you're sorting on multiple field values, extract all m key values from the n cells into an n-by-m array, with columns in descending order of precedence, and use sortrows on it.
% Multi-key sort
keyCols = {'bar','baz'};
key = NaN(numel(c), numel(keyCols));
for i = 1:numel(keyCols)
    keyCol = keyCols{i};
    key(:,i) = cellfun(@(s) s.(keyCol), c);
end
[sortedKey, ix] = sortrows(key);
sortedC = c(ix);
reordering = cellfun(@(s) s.foo, sortedC)
One of the keys to performance in Matlab is to get your data in primitive arrays, and use vectorized operations on those primitive arrays. Matlab code that looks like C++ STL code with algorithms and references to comparison functions and the like will often be slow; even if your code is good in O(n) complexity terms, the fixed cost of user-level M-code operations, especially on non-primitives, can be a killer.
Also, if your structs are homogeneous (that is, they all have the same set of fields), you can store them directly in a struct array instead of a cell array of structs, and it will be more compact. If you can do more extensive redesign, rearranging your data structures to be "planar-organized" - a struct of arrays, where reading across the ith element of all the fields gives one record, instead of an array of structs with scalar fields - could be a good efficiency win. Either of these reorganizations would make constructing the sort key array cheaper.
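A brief sketch of the planar organization, reusing the example data from above:
planar.foo = {'a','b','c','d'};  % the i-th entry of every field forms one record
planar.bar = [6 1 3 2];
[~, ix] = sort(planar.bar);      % the sort key is already a primitive numeric array
planar.foo = planar.foo(ix);     % apply the same ordering to every field
planar.bar = planar.bar(ix);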
In this post I only explain the MATLAB function-calling convention; I am not discussing the quicksort algorithm implementation.
When calling functions, MATLAB passes built-in data types by-value, and any changes made to such arguments are not visible outside the function.
function y = myFunc(x)
    x = x .* 2;   % pass-by-value, changes only visible inside the function
    y = x;
end
This could be inefficient for large data especially if they are not modified inside the functions. Therefore MATLAB internally implements a copy-on-write mechanism: for example when a vector is copied, only some meta-data is copied, while the data itself is shared between the two copies of the vector. And it is only when one of them is modified, that the data is actually duplicated.
function y = myFunc(x)
    % x was never changed, thus passed by reference, avoiding a copy
    y = x .* 2;
end
Note that for cell-arrays and structures, only the cells/fields modified are passed-by-value (this is because cells/fields are internally stored separately), which makes copying more efficient for such data structures. For more information, read this blog post.
In addition, versions R2007 and upward (I think) detect in-place operations on data and optimize such cases.
function x = myFunc(x)
    x = x .* 2;
end
Obviously when calling such function, the LHS must be the same as the RHS (x = myFunc(x);). Also in order to take advantage of this optimization, in-place functions must be called from inside another function.
In MEX-functions, although it is possible to change input variables without making copies, doing so is not officially supported and might yield unexpected results...
For user-defined types (OOP), MATLAB introduced the concept of value object vs. handle object supporting reference semantics.

Hash tables in MATLAB

Does MATLAB have any support for hash tables?
Some background
I am working on a problem in Matlab that requires a scale-space representation of an image. To do this I create 2-D Gaussian filters with variance sigma*s^k for k in some range, and then I use each one in turn to filter the image. Now, I want some sort of mapping from k to the filtered image.
If k were always an integer, I'd simply create a 3D array such that:
arr[k] = <image filtered with k-th Gaussian>
However, k is not necessarily an integer, so I can't do this. What I thought of doing was keeping an array of ks such that:
arr[find(array_of_ks == k)] = <image filtered with k-th Gaussian>
Which seems pretty good at first thought, except I will be doing this lookup potentially a few thousand times with about 20 or 30 values of k, and I fear that this will hurt performance.
I wonder if I wouldn't be better served doing this with a hash table of some sort so that I would have a lookup time that is O(1) instead of O(n).
Now, I know that I shouldn't optimize prematurely, and I may not have this problem at all, but remember, this is just the background, and there may be cases where this is really the best solution, regardless of whether it is the best solution for my problem.
Consider using MATLAB's map class: containers.Map. Here is a brief overview:
Creation:
>> keys = {'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', ...
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Annual'};
>> values = {327.2, 368.2, 197.6, 178.4, 100.0, 69.9, ...
32.3, 37.3, 19.0, 37.0, 73.2, 110.9, 1551.0};
>> rainfallMap = containers.Map(keys, values)
rainfallMap =
containers.Map handle
Package: containers
Properties:
Count: 13
KeyType: 'char'
ValueType: 'double'
Methods, Events, Superclasses
Lookup:
x = rainfallMap('Jan');
Assign:
rainfallMap('Jan') = 0;
Add:
rainfallMap('Total') = 999;
Remove:
rainfallMap.remove('Total')
Inspect:
values = rainfallMap.values;
keys = rainfallMap.keys;
sz = rainfallMap.size;
Check key:
if rainfallMap.isKey('Today')
...
end
Matlab R2008b (7.7)’s new containers.Map class is a scaled-down Matlab version of the java.util.Map interface. It has the added benefit of seamless integration with all Matlab types (Java Maps cannot handle Matlab structs for example) as well as the ability since Matlab 7.10 (R2010a) to specify data types.
Serious Matlab implementations requiring key-value maps/dictionaries should still use Java’s Map classes (java.util.EnumMap, HashMap, TreeMap, LinkedHashMap or Hashtable) to gain access to their larger functionality if not performance. Matlab versions earlier than R2008b have no real alternative in any case and must use the Java classes.
A potential limitation of using Java Collections is their inability to contain non-primitive Matlab types such as structs. To overcome this, either down-convert the types (e.g., using struct2cell or programmatically), or create a separate Java object that will hold your information and store this object in the Java Collection.
You may also be interested to examine a pure-Matlab object-oriented (class-based) Hashtable implementation, which is available on the File Exchange.
You could use Java for it.
In Matlab:
dict = java.util.Hashtable;
dict.put('a', 1);
dict.put('b', 2);
dict.put('c', 3);
dict.get('b')
But you would have to do some profiling to see if it gives you a speed gain I guess...
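A quick sketch of such a profiling comparison (illustrative only; the containers.Map variant assumes R2008b or later, and timings will vary):
n = 1e4;
tic
for k = 1:n
    dict.put(k, k);   % java.util.Hashtable insertions
end
toc
m = containers.Map('KeyType', 'double', 'ValueType', 'double');
tic
for k = 1:n
    m(k) = k;         % containers.Map insertions
end
toc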
Matlab does not have support for hashtables. EDIT: until R2010a, that is; see @Amro's answer.
To speed up your look-ups, you can drop the find, and use LOGICAL INDEXING.
arr{array_of_ks==k} = <image filtered with k-th Gaussian>
or
arr(:,:,array_of_ks==k) = <image filtered with k-th Gaussian>
However, in all my experience with Matlab, I've never had a lookup be a bottleneck.
To speed up your specific problem, I suggest either using incremental filtering
arr{i} = GaussFilter(arr{i-1},sigma*s^(array_of_ks(i)) - sigma*s^(array_of_ks(i-1)))
assuming array_of_ks is sorted in ascending order, and GaussFilter calculates the filter mask size based on the variance (and uses two 1D filters, of course); or you can filter in Fourier space, which is especially useful for large images and if the variances are spaced evenly (which they most likely aren't, unfortunately).
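A hedged sketch of the Fourier-space variant (it assumes the Image Processing Toolbox for fspecial and accepts periodic boundary handling; names reuse those above):
F = fft2(img);                                     % transform the image once
for i = 1:numel(array_of_ks)
    v = sigma * s^array_of_ks(i);                  % variance for this scale
    h = fspecial('gaussian', size(img), sqrt(v));  % image-sized Gaussian kernel
    arr{i} = real(ifft2(F .* fft2(ifftshift(h)))); % multiply spectra instead of convolving
end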
It's a little kludgey, but I'm surprised nobody has suggested using structs. You can access any struct field by variable name as struct.(var), where var can be any variable holding a valid field name, and it will resolve appropriately.
dict.a = 1;
dict.b = 2;
var = 'a';
display( dict.(var) ); % prints 1
You can also take advantage of the newer table type. You can store different kinds of data in it and get statistics out of it really easily.
See http://www.mathworks.com/help/matlab/tables.html for more info.
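A minimal sketch of a table keyed by row name (the IDs and values here are hypothetical):
T = table([4.4; 5.1], [20; 18], 'VariableNames', {'a','hits'}, 'RowNames', {'3','27'});
T{'27','a'}   % look up one value by ID-as-string
mean(T.a)     % column statistics are one call away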