MATLAB tables let you index into any column/field using the row name, e.g., MyTable.FourthColumn('SecondRowName'). Compared to this, dictionaries (containers.Map) seem primitive: a Map serves the role of a 1-column table. It also has its own dedicated syntax, which slows down thinking about how to code.
I'm beginning to think that I can forget the use of dictionaries. Are there typical situations for which that would not be advisable?
TL;DR: No, don't forget about them. containers.Map has uses that a table cannot replace, and I would not choose a table as a dictionary.
containers.Map and table have many differences worth noting. They each have their use. A third container we can use to create a dictionary is a struct.
To use a table as a dictionary, you'd define only one column, and specify row names:
T = table(data,'VariableNames',{'value'},'RowNames',names);
Here are some notable differences between these containers when used as a dictionary:
Speed: The struct has the fastest access by far (10x). containers.Map is about twice as fast as a table when used in an equivalent way (i.e. a single-column table with row names).
Keys: A struct is limited to keys that are valid variable names, the other two can use any string as a key. The containers.Map keys can be scalar numbers as well (floating-point or integer).
Data: They all can contain heterogeneous data (each value has a different type), but a table changes how you index if you do this (T.value(name) for homogeneous data, T.value{name} for heterogeneous data).
Syntax: To look up a key, containers.Map provides the most straightforward syntax: M(name). A table turned into a dictionary requires the pointless use of the column name: T.value(name). A struct, if the key is given by the contents of a variable, looks a little awkward: S.(name).
Construction: (See the code below.) containers.Map has the most straightforward method for building a dictionary from given data. The struct is not meant for this purpose, and therefore it gets complicated.
Memory: This is hard to compare, as containers.Map is implemented in Java and therefore whos reports only 8 bytes (i.e. a pointer). A table can be more memory efficient than a struct, if the data is homogeneous (all values have the same type) and scalar, as in this case all values for one column are stored in a single array.
Other differences:
A table obviously can contain multiple columns, and has lots of interesting methods to manipulate data.
A struct is actually a struct array, and can be indexed as S(i,j).(name). Of course name can be fixed, rather than a variable, leading to S(i,j).name. Of the three, this is the only built-in type, which is the reason it is so much more efficient.
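As a minimal sketch of the key and data differences listed above: the keys and values here are made up for illustration, and matlab.lang.makeValidName assumes R2014a or later.

```matlab
% Hypothetical keys: neither is a valid MATLAB variable name
keySet = {'AF7-24M', 'two words'};
valSet = {1, 'text'};            % heterogeneous values

% containers.Map: any char key works, and values may differ in type
M = containers.Map(keySet, valSet);
m = M('two words');              % look up by key

% containers.Map also accepts numeric keys
N = containers.Map([8 230 755], {'a', 'b', 'c'});
n = N(230);

% table: heterogeneous data lives in a cell column, so lookup uses {}
T = table(valSet.', 'VariableNames', {'value'}, 'RowNames', keySet);
t = T.value{'two words'};

% struct: keys must be valid variable names, so convert them first
ok = isvarname(keySet{1});                       % false for 'AF7-24M'
fieldNames = matlab.lang.makeValidName(keySet);  % assumes no name collisions
S = cell2struct(valSet.', fieldNames, 1);
s = S.(fieldNames{2});
```

Note that converting keys for the struct changes them, so the original key can no longer be used directly for lookup.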
Here is some code that shows the difference between these three containers for constructing a dictionary and looking up a value:
% Create names
names = cell(1,100);
for ii=1:numel(names)
names{ii} = char(randi(+'az',1,20));
end
name = names{1};
% Create data
values = rand(1,numel(names));
% Construct
M = containers.Map(names,values);
T = table(values.','VariableNames',{'value'},'RowNames',names);
S = num2cell(values);
S = [names;S];
S = struct(S{:});
% Lookup
M(name)
T.value(name)
S.(name)
% Timing lookup
timeit(@()M(name))
timeit(@()T.value(name))
timeit(@()S.(name))
Timing results (microseconds):
M: 16.672
T: 23.393
S: 2.609
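The code above builds each container in a single call. For a dictionary that changes over time, adding, testing, and removing entries could look roughly like this (a sketch with made-up keys 'alpha' and 'beta', not a benchmark):

```matlab
% containers.Map: entries are added, tested, and removed one at a time
M = containers.Map('KeyType', 'char', 'ValueType', 'double');
M('alpha') = 1;
M('beta')  = 2;
hasAlphaM = isKey(M, 'alpha');
remove(M, 'beta');              % modifies M in place (Map is a handle object)
nM = M.Count;

% struct: the same operations use dynamic field names
S = struct();
S.('alpha') = 1;
S.('beta')  = 2;
hasAlphaS = isfield(S, 'alpha');
S = rmfield(S, 'beta');         % returns a modified copy
nS = numel(fieldnames(S));

% table: new rows are appended by concatenation
T = table(1, 'VariableNames', {'value'}, 'RowNames', {'alpha'});
T = [T; table(2, 'VariableNames', {'value'}, 'RowNames', {'beta'})];
hasAlphaT = any(strcmp('alpha', T.Properties.RowNames));
T('beta', :) = [];              % delete a row by name
nT = height(T);
```

Because containers.Map is a handle object, a copy made with plain assignment refers to the same underlying map, whereas struct and table copies are independent values.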
You can go simpler: structs can be accessed using a field name held in a string:
clear
% define
mydata.('vec')=[2 4 1];
mydata.num=12.58;
% get
select1='num';
value1=mydata.(select1); %method 1
select2='vec';
value2=getfield(mydata,select2) %method 2
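Building on the dynamic field access above, here is a small sketch of looping over all keys and guarding against a missing one. The key 'missing' and the NaN default are arbitrary choices for illustration.

```matlab
mydata = struct('vec', [2 4 1], 'num', 12.58);

% Loop over all keys and count the stored elements
keysFound = fieldnames(mydata);
total = 0;
for k = 1:numel(keysFound)
    total = total + numel(mydata.(keysFound{k}));
end

% Guard against a key that does not exist
select = 'missing';
if isfield(mydata, select)
    value = mydata.(select);
else
    value = NaN;                 % fallback when the key is absent
end
```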
Suppose I have some data obtained by a SQL database query as below (of course my real case is bigger, with thousands of rows and many columns).
key_names   header1   header2   header3
---------------------------------------
key1        a         1         bar
key2        b         2         foo
key3        c         3         bla
My goal is to organize the data in Matlab (at work I must use it) in a smart and efficient way to get the following results:
Access data by key obtaining the whole row, like dataset(key, :)
Access data by key plus header getting back a single value dataset.header(key)
If possible, getting a whole column (for all keys).
First of all, I used the dataset class provided by the Statistics Toolbox because it has all these features, but I decided to move away from it because it is really slow (from what I gathered, it is basically a wrapper around cell arrays): the bottleneck of my code was getting the data rather than performing computations. In fact, I read that it is better to avoid it as much as possible.
The newer class table looks more efficient but still not very much: from what I have understood, it is the new version of dataset as explained in the official documentation.
I also considered using containers.Map, but it does not seem to support access by both key and column.
Therefore, struct seems to be the best choice as it is really fast and it has all the features I'm looking for.
So here are my questions:
Did someone face my same problem? Which way to organize data is the best one?
Let me suppose struct is the best. How can I efficiently create and fill a structure like this: mystruct.key.header?
I'd like to get something like this:
mystruct.key1.header1
ans = a
Of course I could loop, but there must be a better way. I found this good starting point, but the struct it creates is empty:
fn1 = {'a', 'b', 'c'}; %first level
fn2 = {'d', 'e', 'f'}; %second level
s2 = cell2struct(cell(size(fn2(:))),fn2(:));
s = cell2struct(repmat({s2},size(fn1(:))),fn1(:))
In the cell2struct documentation all the examples do not rename all the levels. The deal help is a good way to fill the data (depending on the Matlab version as from 7.0 it was substituted with a new coding style) but I'm still missing how to combine the parts of creating the structure with the filling one.
Any suggestion or code example is really appreciated.
If you think, or are sure, that structs are the best option for you, you can use table2struct. First, import all the data into Matlab as a table, and then convert it to a structure.
mystruct = table2struct(data);
To access your data, use the following syntax (key here is a numeric row index, since table2struct does not carry over row names):
mystruct(key).header
If key is an array of indices, you need to collect all the values into a list, using either a cell array:
values = {mystruct(key).header}
or different variables:
[v1, v2, v3] = mystruct(key).header
but the latter option is problematic if you are not sure how many outputs to expect.
I'm not sure what will be more convenient to you, but you can also convert to a scalar structure by setting 'ToScalar' argument to true.
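To tie this back to the question, here is a hedged sketch using the example data from the question. It shows key-based access on a table with row names, and one possible way to build the nested mystruct.key.header layout without an explicit loop (via num2cell and cell2struct). It assumes the keys are valid field names.

```matlab
% Example data from the question
keyNames = {'key1'; 'key2'; 'key3'};
header1  = {'a'; 'b'; 'c'};
header2  = [1; 2; 3];
header3  = {'bar'; 'foo'; 'bla'};
data = table(header1, header2, header3, 'RowNames', keyNames);

% Table access by key
row = data('key2', :);            % whole row
val = data.header2('key2');       % single value
col = data.header3;               % whole column

% Struct array: one element per row (row names are dropped)
rows = table2struct(data);
nums = [rows([1 3]).header2];     % collect numeric values
strs = {rows([1 3]).header1};     % collect text values in a cell array

% Nested struct: one field per key, each holding that row's struct
mystruct = cell2struct(num2cell(rows), keyNames, 1);
a = mystruct.key1.header1;
```

Whether the nested struct is actually faster than the table for a given access pattern is worth checking with timeit on real data.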
I have code in which there are many objects, where each is referenced by its own numeric ID/pointer. I wish to store these objects in some sort of structure where I can reference the objects from the structure using their numeric ID. However, the IDs are not sequential, and I don't want to allocate space for all of the non-existent IDs. This rules out simply creating an object array.
I'm currently using the containers.Map class which stores values/objects with lookup keys, but it's rather slow. Are there any alternatives?
As an example, this code would create a containers.Map object, map filled with fictional objects:
%create object storage container which uses uint32 keys and can store values of any class
map = containers.Map('KeyType','uint32','ValueType','Any');
%construct objects with ID property, and store in map
for ID = [8 230 755 67 102]
map(ID) = example_obj(ID)
end
Is there anything that could replace the containers.Map map object in this code which wouldn't allocate space for all of the non-present IDs from 1 to 755?
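One possible alternative, offered only as an untested-for-speed sketch, is to keep the objects in a dense cell array and use a sparse column vector to translate an ID into a position. The sparse vector stores only the IDs that exist. Here struct('ID', ...) is a stand-in for the question's example_obj, and any speed advantage over containers.Map would need to be confirmed with timeit.

```matlab
ids = [8 230 755 67 102];

% Dense storage: one cell per existing object
objs = cell(numel(ids), 1);
for k = 1:numel(ids)
    objs{k} = struct('ID', ids(k));   % stand-in for example_obj(ids(k))
end

% Sparse ID -> position lookup; only the 5 existing IDs use memory
lookup = sparse(ids, 1, 1:numel(ids), max(ids), 1);

% Retrieve the object with ID 230
pos = full(lookup(230));
obj = objs{pos};

% A zero entry means the ID is not present
isMissing = full(lookup(9)) == 0;
```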
I have a table with 100+ values corresponding to each row, so I'm exploring different ways to store them.
Without any indexes, would I lose anything if I store these 100 values in an integer[] column in postgresql? As compared to storing them in separate columns.
Plus, since we can add indexes to array elements,
CREATE INDEX test_index on test ((foo[1]));
Would there be a performance difference for queries using such an index as compared to a regular index on a column?
As far as I've read, this performance difference would come into picture in arrays with variable length elements; but I'm not sure about fixed length ones.
Don't go for the lazy way.
If you need to store 100 or more values as an array, that is fine, provided an array makes sense for your application and your data.
If you need to query for a specific element of the array, then this design is not good, regardless of performance, and you should use columns. This will also help when you need to delete a "column" in the middle or redesign it.
Anyway, as Frank wrote in the comments, if the values are all of the same type, consider modeling them in another table (if their meaning is also the same).
I have a reasonably large cell array in MATLAB called sales, containing very mixed data. One column is a store identifier, which is a mix of letters and numbers (e.g. AF7-24M). I want to grab all the rows in sales where the store identifier is equal to a particular store identifier. I tried doing some logical indexing but I'm having trouble getting it to work...
I also would rather not just loop over all the rows, because I need to do this multiple times and it's quite a slow process.
You can use strcmp, for example:
strcmp(sales,'AF7-24M')
For case insensitive string comparison, use strcmpi instead of strcmp.
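To get whole rows rather than a logical array, the strcmp result on the identifier column can be used as a row mask. A small sketch, assuming the identifier is in column 1 and using made-up sample data:

```matlab
% Hypothetical sample data: identifier, quantity, item
sales = {'AF7-24M', 10, 'widget';
         'ZX1-99Q',  5, 'gadget';
         'af7-24m',  7, 'gizmo'};

% Logical mask over the identifier column only
isStore = strcmp(sales(:, 1), 'AF7-24M');
rows = sales(isStore, :);            % matching rows, all columns

% Case-insensitive variant
isStoreAnyCase = strcmpi(sales(:, 1), 'AF7-24M');
rowsAnyCase = sales(isStoreAnyCase, :);
```

strcmp returns false for cells that do not contain text, so mixed columns do not cause an error.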
I'm just mulling over the best data structure for storing data that has several rows and columns. Should I store it as:
1. an array of arrays?
2. NSDictionary?
Or is there any grid-like data structure in iOS from which I can easily fetch any row/column? For example, I must be able to fetch the value in the 3rd column of row 5. Currently, say, I store each row as an array and then store these arrays in another array (so an array of arrays, say), then to fetch the value in column 3 in row 5, I need to fetch the 5th row in the array of arrays, and then in the resulting array, I need to fetch the 3rd object. Is there a better way to do this? Thoughts please?
"then to fetch the value in column 3 in row 5, I need to fetch the 5th row in the array of arrays, and then in the resulting array, I need to fetch the 3rd object. Is there a better way to do this?"
An array of arrays is fine for the implementation, and the collection subscripting that was recently added to Objective-C makes this easier -- you can use an expression like
NSString *s = myData[m][n];
to get the string at the nth column of the mth row.
That said, it may still be a good idea to create a separate class for your data structure, so that the rest of your code is protected from needing to know about how the data is stored. That would also simplify the process of changing the implementation from, say, an array of arrays to a SQLite table or something else.
Your data storage class doesn't need to be fancy or complicated. Here's a first pass:
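@interface DataTable : NSObject
// Returns the object stored at the given row and column.
- (id)objectAtRow:(NSInteger)row column:(NSInteger)column;
// Stores an object at the given row and column.
- (void)setObject:(id)object atRow:(NSInteger)row column:(NSInteger)column;
@end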
I'm sure you can see how to implement those in terms of an array of arrays. You'll have to do a little work to add rows and/or columns when the caller tries to set a value outside the current bounds. You might also want to add support for things like fast enumeration and writing to and reading from property lists, but that can come later.
There are other ways of doing it, but there's nothing wrong with the method you are using. You could use an NSDictionary with a key of type NSIndexPath, for example, or even a string key of the form "row,col", but I don't see any advantage in those except for sparse matrices.
You can either use an array of arrays, as you're doing, or an array of dictionaries. Either is fine, and I don't think there's any preference for one over the other. It all depends on which way is most convenient for you to set up the data structure in the first place. Accessing the data for the table view is equally easy using either method.