I have code in which there are many objects, where each is referenced by its own numeric ID/pointer. I wish to store these objects in some sort of structure where I can reference the objects from the structure using their numeric ID. However, the IDs are not sequential, and I don't want to allocate space for all of the non-existent IDs. This rules out simply creating an object array.
I'm currently using the containers.Map class which stores values/objects with lookup keys, but it's rather slow. Are there any alternatives?
As an example, the following code creates a containers.Map object, map, and fills it with fictional objects:
%create object storage container which uses uint32 keys and can store values of any class
map = containers.Map('KeyType','uint32','ValueType','Any');
%construct objects with ID property, and store in map
for ID = [8 230 755 67 102]
map(ID) = example_obj(ID);
end
Is there anything that could replace the containers.Map map object in this code which wouldn't allocate space for all of the non-present IDs from 1 to 755?
I'm having trouble understanding what the Hash Function does and doesn't do, as well as what exactly a Bucket is.
From my understanding:
A HashTable is a data structure that maps keys to values using a Hash Function.
A HashFunction is meant to map data from an array of arbitrary/unknown size to a data array of fixed size.
There can be duplicate Values in the original data array, but this is irrelevant.
Each Value will have a unique Key. Thus, each Key has exactly 1 Value.
The HashFunction will generate a HashCode for each (Value, Key) pair. However, Collisions can occur in which multiple (Value, Key) pairs map to the same HashCode.
This can be remedied by using either Chaining/Open Addressing methods.
The HashCode is the index value indicating the position of a particular entry from the original data array within the Bucket array.
The Bucket array is the fixed data array constructed that will contain the entries from the original array.
My questions:
How are the Keys generated for each value? Is the HashFunction meant to generate both Key and HashCode values for each entry? Does each Bucket thus contain only one entry (assuming a Chaining implementation to remedy Collision)?
How are the Keys generated for each value?
The key is not generated; you provide it, and it serves as the input to the hash function, which in turn converts that key into an index into the hash table. Simply put:
H(key)=index
so the value you are looking for is:
hash_table[index] = value
Is the HashFunction meant to generate HashCode values for each entry?
It depends on how the hash function and the hash table are implemented. Some hash functions first generate a hash code from the provided key and then reduce it to an index, for example by taking hashcode mod size, where size is the size of the hash table. Others convert the key directly into an index. In either case, the ultimate goal of the hash function is to locate the requested data within the hash table in constant time.
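As a rough illustration of the "hash code, then modulo" variant, here is a minimal Python sketch. The names hash_code, bucket_index and SIZE are made up for this example, and the polynomial hash is only a toy stand-in, not the function any particular library uses.

```python
def hash_code(key: str) -> int:
    """Toy polynomial hash over the characters of the key (illustrative only)."""
    code = 0
    for ch in key:
        # Multiply-and-add, truncated to 32 bits so the code stays bounded.
        code = (code * 31 + ord(ch)) & 0xFFFFFFFF
    return code


def bucket_index(key: str, size: int) -> int:
    """Reduce the hash code to a valid position in a table with `size` slots."""
    return hash_code(key) % size


SIZE = 8                      # assumed table size for the example
table = [None] * SIZE

# Store: H(key) = index, then hash_table[index] = value
index = bucket_index("apple", SIZE)
table[index] = 42

# Lookup: recomputing the hash of the same key lands on the same slot
found = table[bucket_index("apple", SIZE)]
```

This sketch ignores collisions entirely: two different keys that reduce to the same index would overwrite each other here.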
Does each Bucket thus contain only one entry (assuming a Chaining implementation to remedy Collision)?
Not necessarily. Ideally each key would be mapped to a unique index, but that is usually not the case, because the number of possible keys is far larger than the number of buckets (i.e. indices), so collisions are unavoidable. With chaining, a bucket holds a chain of every entry that hashes to its index, and the average chain length per bucket is (number of stored keys) / (number of buckets).
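To make the chaining idea concrete, here is a small Python sketch in which each bucket is a list of (key, value) pairs. The class name ChainedHashTable and its methods are hypothetical, and the demo assumes small non-negative integer keys, for which CPython's built-in hash() returns the integer itself.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (one list per bucket)."""

    def __init__(self, num_buckets: int = 4) -> None:
        self.buckets = [[] for _ in range(num_buckets)]
        self.count = 0

    def _index(self, key) -> int:
        # Hash code reduced to a bucket position.
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)   # key already present: overwrite
                return
        chain.append((key, value))        # new key: extend the chain
        self.count += 1

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)

    def load_factor(self) -> float:
        # Average chain length: stored keys / buckets.
        return self.count / len(self.buckets)


table = ChainedHashTable(num_buckets=4)
for key in [8, 230, 755, 67, 102]:
    table.put(key, f"obj{key}")

chain_lengths = [len(chain) for chain in table.buckets]
```

With five keys in four buckets the average chain length is 5 / 4, so at least one bucket has to hold more than one entry; lookups then scan only the chain of the bucket the key hashes to.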
MATLAB tables let you index into any column/field using the row name, e.g., MyTable.FourthColumn('SecondRowName'). Compared to this, dictionaries (containers.Map) seem primitive, e.g., it serves the role of a 1-column table. It also has its own dedicated syntax, which slows down the thinking about how to code.
I'm beginning to think that I can forget the use of dictionaries. Are there typical situations for which that would not be advisable?
TL;DR: No. containers.Map has uses that cannot be replaced with a table. And I would not choose a table for a dictionary.
containers.Map and table have many differences worth noting. They each have their use. A third container we can use to create a dictionary is a struct.
To use a table as a dictionary, you'd define only one column, and specify row names:
T = table(data,'VariableNames',{'value'},'RowNames',names);
Here are some notable differences between these containers when used as a dictionary:
Speed: The struct has the fastest access by far (roughly 6x faster than containers.Map and 9x faster than a table in the timings below). containers.Map is in turn about 1.4x faster than a table when used in an equivalent way (i.e. a single-column table with row names).
Keys: A struct is limited to keys that are valid variable names, the other two can use any string as a key. The containers.Map keys can be scalar numbers as well (floating-point or integer).
Data: They all can contain heterogeneous data (each value has a different type), but a table changes how you index if you do this (T.value(name) for homogeneous data, T.value{name} for heterogeneous data).
Syntax: To lookup the key, containers.Map provides the most straight-forward syntax: M(name). A table turned into a dictionary requires the pointless use of the column name: T.value(name). A struct, if the key is given by the contents of a variable, looks a little awkward: S.(name).
Construction: (See the code below.) containers.Map has the most straight-forward method for building a dictionary from given data. The struct is not meant for this purpose, and therefore it gets complicated.
Memory: This is hard to compare, as containers.Map is implemented in Java and therefore whos reports only 8 bytes (i.e. a pointer). A table can be more memory efficient than a struct, if the data is homogeneous (all values have the same type) and scalar, as in this case all values for one column are stored in a single array.
Other differences:
A table obviously can contain multiple columns, and has lots of interesting methods to manipulate data.
A struct is actually a struct array, and can be indexed as S(i,j).(name). Of course name can be fixed, rather than a variable, leading to S(i,j).name. Of the three, this is the only built-in type, which is the reason it is so much more efficient.
Here is some code that shows the difference between these three containers for constructing a dictionary and looking up a value:
% Create names
names = cell(1,100);
for ii=1:numel(names)
names{ii} = char(randi(+'az',1,20));
end
name = names{1};
% Create data
values = rand(1,numel(names));
% Construct
M = containers.Map(names,values);
T = table(values.','VariableNames',{'value'},'RowNames',names);
S = num2cell(values);
S = [names;S];
S = struct(S{:});
% Lookup
M(name)
T.value(name)
S.(name)
% Timing lookup
timeit(@()M(name))
timeit(@()T.value(name))
timeit(@()S.(name))
Timing results (microseconds):
M: 16.672
T: 23.393
S: 2.609
You can go simpler: a struct's fields can be accessed with the field name given as a string:
clear
% define
mydata.('vec')=[2 4 1];
mydata.num=12.58;
% get
select1='num';
value1=mydata.(select1); %method 1
select2='vec';
value2=getfield(mydata,select2) %method 2
Here is a sample reduce function from the MATLAB documentation:
function MeanDistReduceFun(intermKey, intermValIter, outKVStore)
    sumLen = [0 0];
    % Accumulate every value associated with this key
    while hasnext(intermValIter)
        sumLen = sumLen + getnext(intermValIter);
    end
    add(outKVStore, 'Mean', sumLen(1)/sumLen(2));
end
This creates a final dataset tagged by the key Mean. However, I would like to dynamically generate the key based off the unique keys from the map stage. Can I simply use intermKey in place of 'Mean' in the add function, or should I include the key in intermValIter somehow and extract it?
The short answer: Yes, you can use intermKey to use the unique keys passed by the map function.
The long answer: Between the map and reduce stages, all of the unique keys and their associated values are stored as ValueIterator objects:
http://www.mathworks.com/help/matlab/ref/valueiterator-object.html
Each ValueIterator object is associated with a single key. So once this intermediate grouping is done, the reduce function is called a single time for each unique key passed by the mapper, and intermValIter contains all of the values associated with the unique key intermKey. Therefore, specifying intermKey will use each of the unique keys passed by the mapper.
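The grouping described above can be pictured with a small, language-agnostic sketch in Python. This is only a conceptual model, not MATLAB's mapreduce API: map_reduce, mapper, mean_reducer and the flights data are all hypothetical names invented for the illustration.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Conceptual model: group mapped values by key, then reduce once per key."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)        # intermediate grouping by key

    output = {}
    for key, values in grouped.items():
        # The reducer is called a single time per unique key and receives
        # an iterator over all values collected for that key.
        for out_key, out_value in reduce_fn(key, iter(values)):
            output[out_key] = out_value
    return output


def mapper(record):
    carrier, distance = record
    yield carrier, (distance, 1)              # (sum, count) contribution


def mean_reducer(key, values):
    total, count = 0.0, 0
    for distance, n in values:
        total += distance
        count += n
    yield key, total / count                  # reuse the incoming key for the output


flights = [("AA", 100.0), ("UA", 300.0), ("AA", 200.0)]
means = map_reduce(flights, mapper, mean_reducer)
```

Because the reducer emits its result under the key it was called with, the output contains one entry per unique key from the map stage rather than a single fixed tag.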
As the above doc link mentions, the only interaction you have with the ValueIterator object is by using hasnext and getnext to loop through the values contained in intermValIter.
I'm just mulling over the best way, i.e. the best data structure, to store data that has several rows and columns. Should I store it as:
1. an array of arrays?
2. NSDictionary?
Or is there any grid-like data structure in iOS from which I can easily fetch any row/column? For example, I must be able to fetch the value in the 3rd column of row 5. Currently, say, I store each row as an array and then store these arrays in another array (so an array of arrays). To fetch the value in column 3 of row 5, I need to fetch the 5th row from the array of arrays, and then fetch the 3rd object from the resulting array. Is there a better way to do this? Thoughts please?
then to fetch the value in column 3 in row 5, I need to fetch the 5th
row in the array of arrays, and then in the resulting array, I need to
fetch the 3rd object. Is there a better way to do this?
An array of arrays is fine for the implementation, and the collection subscripting that was recently added to Objective-C makes this easier -- you can use an expression like
NSString *s = myData[m][n];
to get the string at the nth column of the mth row.
That said, it may still be a good idea to create a separate class for your data structure, so that the rest of your code is protected from needing to know about how the data is stored. That would also simplify the process of changing the implementation from, say, an array of arrays to a SQLite table or something else.
Your data storage class doesn't need to be fancy or complicated. Here's a first pass:
@interface DataTable : NSObject
- (id)objectAtRow:(NSInteger)row column:(NSInteger)column;
- (void)setObject:(id)object atRow:(NSInteger)row column:(NSInteger)column;
@end
I'm sure you can see how to implement those in terms of an array of arrays. You'll have to do a little work to add rows and/or columns when the caller tries to set a value outside the current bounds. You might also want to add support for things like fast enumeration and writing to and reading from property lists, but that can come later.
There are other ways of doing it, but there's nothing wrong with the method you are using. You could use an NSDictionary with a key of type NSIndexPath, for example, or even a string key of the form "row,col", but I don't see any advantage in those except for sparse matrices.
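To show why a dictionary keyed by (row, column) only pays off for sparse data, here is a language-agnostic sketch in Python rather than Objective-C. SparseGrid and its method names are hypothetical; the tuple key plays the role an NSIndexPath or a "row,col" string would play in an NSDictionary.

```python
class SparseGrid:
    """Grid that stores only the cells that were actually set."""

    def __init__(self, default=None):
        self._cells = {}          # maps (row, column) -> value
        self._default = default

    def set(self, row, column, value):
        self._cells[(row, column)] = value

    def get(self, row, column):
        # Cells that were never set fall back to the default value.
        return self._cells.get((row, column), self._default)

    def stored_cells(self):
        return len(self._cells)


grid = SparseGrid()
grid.set(5, 3, "x")
grid.set(1000, 2000, "far away")

value = grid.get(5, 3)
missing = grid.get(0, 0)
```

Only two entries are stored even though the largest coordinates are (1000, 2000); a dense array of arrays covering the same range would need millions of slots. For a small, fully populated table, that saving disappears and the array of arrays is simpler.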
You can either use an array of arrays, as you're doing, or an array of dictionaries. Either is fine, and I don't think there's any preference for one over the other. It all depends on which way is most convenient for you to set up the data structure in the first place. Accessing the data for the table view is equally easy using either method.
I want to store a collection of data in an ArrayList or a Hashtable, and data retrieval should be efficient and fast. I want to know which data structure lies behind ArrayList and Hashtable (e.g. linked list, doubly linked list).
An ArrayList is a dynamic array that grows as new items are added that go beyond the current capacity of the list. Items in ArrayList are accessed by index, much like an array.
The Hashtable is a hashtable behind the scenes. The underlying data structure is typically an array but instead of accessing via an index, you access via a key field which maps to a location in the hashtable by calling the key object's GetHashCode() method.
In general, ArrayList and Hashtable are discouraged in .NET 2.0 and above in favor of List<T> and Dictionary<TKey, TValue> which are much better generic versions that perform better and don't have boxing costs for value types.
I've got a blog post that compares the various benefits of each of the generic containers here that may be useful:
http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
While it talks about the generic collections in particular, ArrayList has similar complexity costs to List<T>, and Hashtable to Dictionary<TKey, TValue>.
A hashtable maps keys (for example, strings) to values. An ArrayList stores a bunch of items in numbered order.
// requires: using System.Collections;
Hashtable ht = new Hashtable();
ht["examplenum"] = 5;
ht["examplenum2"] = 7;
//Then to retrieve (values are stored as object, so cast back)
int i = (int)ht["examplenum"]; //value of 5
ArrayList al = new ArrayList();
al.Add(2);
al.Add(3);
//Then to retrieve
int j = (int)al[0]; //value of 2
As its name implies, an ArrayList (or a List) is implemented with an array, and in fact a Hashtable is built on the same data structure. So both of them have constant access cost (on average, in the Hashtable's case), which is the best possible.
What you have to think about is what kind of key you need. If your data must be accessed by an arbitrary key (for example, a string), you will not be able to use an ArrayList. A Hashtable should also be your preferred choice if the keys are not (more or less) consecutive integers.
Hope it helps.