I would like to construct a hash table in Matlab, the keys of which are matrices of different sizes, and the values of which are also matrices. The containers.Map class only allows strings as keys. I can certainly just use a cell for the keys, a cell for the value and match the indices of the two cells. Is there a better way to construct the hash table and the associated hash function?
I just played around with containers.Map a little, and it seems that you can use char arrays of any length as keys.
>> a = containers.Map;
>> a(repmat('bla',50,500)) = 1;
>> a(repmat('bla',50,500))
ans =
1
You can also convert any numeric array into a char array as follows:
>> x = randn(4)
x =
-0.7371 -0.0799 0.1129 -1.1667
-1.7499 0.8985 0.4400 -1.8543
0.9105 0.1837 0.1017 -1.1407
0.8671 0.2908 2.7873 -1.0933
>> s = char(typecast(x(:),'uint8')')
s =
''uÔ_þ翼qÿû¿/å\¬"í?éúè#¿ë?.YðjÛs´¿Ó¶Ó·PÀì?+Ç? Õ9NÒ?Üéñé¼?
°À9-(Ü?ç¥ìƺ?NsivL#V*aó¨ªò¿{Ò5«ý¿Q8ß:#ò¿í=µU~ñ¿'
Or using the full 16-bit Unicode values allowed by char:
>> s = char(typecast(x(:),'uint16')')
s =
'疺㓦쁁뿛쓆遫뿅䅀庲뿋ꁰ頳劜㿡礋쮼㿘旈帡㿨ﮢ电玼㿼譍醪㿳랝趚蠷뿴瞶ꆲ쀂伴愹?㿬ꑨ廆뿽㼝ὧ㾱?ﺳ⩝㾢棑罓턽䀁ᕾ統렆뾱'
So putting these together, it is possible to use any array (properly converted to a char array) as key into a hash table:
>> a(s) = 5;
>> a(s)
ans =
5
And, given the numeric array cast to char, it is possible to cast it back to numeric array as well (though the shape of the array will get lost):
x = randn(1,20);
s = char(typecast(x,'uint8'));
y = typecast(uint8(s),'double');
assert(isequal(x,y)) % does not throw an error
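If the shape matters (for example, so that a 2x3 and a 3x2 array holding the same numbers do not map to the same key), one option is to prepend the size to the byte stream before casting to char. The following is only a sketch, assuming 2-D double arrays; the names header, body and key are illustrative and not part of the original answer.

```matlab
x = magic(3);                                    % example 2-D double array

% Encode: 16 bytes for the two size values, then the data bytes.
header = typecast(double(size(x)), 'uint8');     % 1x16 uint8 for a 2-D array
body   = typecast(x(:).', 'uint8');              % 1x(8*numel(x)) uint8
key    = char([header, body]);                   % char row vector

% Decode: split the bytes again and restore the shape.
raw = uint8(key);
sz  = typecast(raw(1:16), 'double');             % original size vector
y   = reshape(typecast(raw(17:end), 'double'), sz);
```

The resulting char row could then be used as a key in the same way as s above.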
There is another alternative. As stated in the documentation, containers.Map also accepts keys of a type other than string. Keys can be either char arrays or numeric scalars; they cannot be numeric arrays:
>> a = containers.Map('KeyType','double','ValueType','double');
>> a(5) = 10;
>> a([5,3]) = 5;
Error using containers.Map/subsasgn
Specified key type does not match the type expected for this container.
Thus, you could compute a hash value (as a double or a 64-bit integer) from your arrays. I don't know how best to do this; maybe the dot product with a set of random values? There are some suggestions at this related question, and some functions on the MATLAB File Exchange would also be helpful (e.g. here and here).
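As a rough illustration of the numeric-key idea, here is a sketch that folds the bytes of an array into a single double below 2^32 and uses it as the key. The multiplier 31 and the modulus are arbitrary choices made for this example, not a vetted hash function; collisions are possible, so the original array is stored alongside the value so that it can be checked on lookup.

```matlab
x = [1 2; 3 4];                                  % array to be used as a key

% Fold the raw bytes into one number; every intermediate stays below 2^53,
% so the arithmetic is exact in double precision.
bytes = double(typecast(x(:).', 'uint8'));
h = 0;
for b = bytes
    h = mod(h * 31 + b, 2^32);
end

% Store the original array together with the value to detect collisions.
m = containers.Map('KeyType', 'double', 'ValueType', 'any');
m(h) = struct('key', x, 'value', 'payload for x');

entry = m(h);
found = isequal(entry.key, x);                   % true only if the key matches
```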
I have a vector of cells (say, of size 50x1, called tokens), each of which is a struct with fields x, f1 and f2, which are strings representing numbers. For example, tokens{15} gives:
x: "-1.4343429"
f1: "15.7947111"
f2: "-5.8196158"
and I am trying to put those numbers into 3 vectors (each is also 50x1) whose type is float. So I create 3 vectors:
x = zeros(50,1,'single');
f1 = zeros(50,1,'single');
f2 = zeros(50,1,'single');
and that works fine (why wouldn't it?). But then when I try to populate those vectors: (L is a for loop index)
x(L)=tokens{L}.x;
.. also for the other 2
I get :
The following error occurred converting from string to single:
Conversion to single from string is not possible.
Which I can understand; implicit conversion doesn't work for single. It does work if x, f1 and f2 are of type 50x1 double.
The reason I am doing it with floats is that the data comes from a C program which writes some floats into a file to be read by MATLAB. If I try to convert the values into doubles in the C program I get rounding errors...
So, (after what I hope is a good question,) how might I be able to get the numbers in those strings, at the right precision? (all the strings have the same number of decimal places: 7).
The MCVE:
filedata = fopen('fname1.txt','rt');
%fname1.txt is created by a C program. I am quite sure that the problem isn't there.
scanned = textscan(filedata,'%s','Delimiter','\n');
raw = scanned{1};
stringValues = strings(50,1);
for K=1:length(raw)
stringValues(K)=raw{K};
end
clear K %purely for convenience
regex = 'x=(?<x>[\-\.0-9]*),f1=(?<f1>[\-\.0-9]*),f2=(?<f2>[\-\.0-9]*)';
tokens = regexp(stringValues,regex,'names');
x = zeros(50,1,'single');
f1 = zeros(50,1,'single');
f2 = zeros(50,1,'single');
for L=1:length(tokens)
x(L)=tokens{L}.x;
f1(L)=tokens{L}.f1;
f2(L)=tokens{L}.f2;
end
Use the function str2double before assigning into your arrays (and then cast the result to single if you want). Strings and char arrays must be explicitly converted to numbers before you can use them as numbers.
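A minimal sketch of that conversion, using a one-element stand-in for the tokens cell array from the question (the struct here is built by hand for illustration):

```matlab
% Stand-in for the regexp output: one struct with string fields.
tokens = {struct('x', "-1.4343429", 'f1', "15.7947111", 'f2', "-5.8196158")};

n  = numel(tokens);
x  = zeros(n, 1, 'single');
f1 = zeros(n, 1, 'single');
f2 = zeros(n, 1, 'single');

for L = 1:n
    % Convert text to double first, then narrow to single.
    x(L)  = single(str2double(tokens{L}.x));
    f1(L) = single(str2double(tokens{L}.f1));
    f2(L) = single(str2double(tokens{L}.f2));
end
```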
I have a table in Matlab with some columns representing 128 bit hashes.
I would like to match rows, to one or more rows, based on these hashes.
Currently, the hashes are represented as hexadecimal strings, and compared with strcmp(). Still, it takes many seconds to process the table.
What is the fastest way to compare two hashes in matlab?
I have tried turning them into categorical variables, but that is much slower. Matlab as far as I know does not have a 128 bit numerical type. nominal and ordinal types are deprecated.
Are there any others that could work?
The code below is analogous to what I am doing:
nodetype = { 'type1'; 'type2'; 'type1'; 'type2' };
hash = {'d285e87940fb9383ec5e983041f8d7a6'; 'd285e87940fb9383ec5e983041f8d7a6'; 'ec9add3cf0f67f443d5820708adc0485'; '5dbdfa232b5b61c8b1e8c698a64e1cc9' };
entries = table(categorical(nodetype),hash,'VariableNames',{'type','hash'});
%nodes to match. filter by type or some other way so rows don't match to
%themselves.
A = entries(entries.type=='type1',:);
B = entries(entries.type=='type2',:);
%pick a node/row with a hash to find all counterparts of
row_to_match_in_A = A(1,:);
matching_rows_in_B = B(strcmp(B.hash,row_to_match_in_A.hash),:);
% do stuff with matching rows...
disp(matching_rows_in_B);
The hash strings are faithful representations of what I am using, but they are not necessarily read or stored as strings in the original source. They are just converted for this purpose because it's the fastest way to do the comparison.
Optimization is nice, if you need it. Try it out yourself and measure the performance gain for relevant test cases.
Some suggestions:
Sorted arrays are easier/faster to search
Matlab's default numbers are double, but you can also construct integers. Why not use 2 uint64's instead of the 128bit column? First search for the upper 64bit, then for the lower; or even better: use ismember with the row option and put your hashes in rows:
A = uint64([0 0;
0 1;
1 0;
1 1;
2 0;
2 1]);
srch = uint64([1 1;
0 1]);
[ismatch, loc] = ismember(srch, A, 'rows')
> loc =
4
2
Look into the compare functions you use (e.g. edit ismember) and strip out unnecessary operations (e.g. sort) and safety checks that you know in advance won't pose a problem, as this solution does. Or, if you intend to call a search function multiple times, sort in advance and skip the check/sort in the search function later on.
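To apply the two-uint64 suggestion to the hex strings from the question, each 32-character hash has to be converted first. The sketch below is one possible way, not the only one: it parses the string in four 8-character chunks (hex2dec is exact for values below 2^32) and packs them into two uint64 values with typecast. The byte order produced by typecast does not matter here, because the keys are only compared for equality.

```matlab
hash = {'d285e87940fb9383ec5e983041f8d7a6'; ...
        'd285e87940fb9383ec5e983041f8d7a6'; ...
        'ec9add3cf0f67f443d5820708adc0485'; ...
        '5dbdfa232b5b61c8b1e8c698a64e1cc9'};

n = numel(hash);
keys = zeros(n, 2, 'uint64');
for k = 1:n
    chunks = reshape(hash{k}, 8, []).';          % 4x8 char, one chunk per row
    words  = uint32(hex2dec(chunks));            % 4x1 uint32
    keys(k, :) = typecast(words.', 'uint64');    % 1x2 uint64
end

% Logical index of all rows whose hash equals that of the first row.
matches = ismember(keys, keys(1, :), 'rows');
```

Doing the conversion once up front means every later comparison works on two integers per row instead of a 32-character string.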
Suppose I use containers map to create a dictionary in MATLAB which has the following map:
1-A;
2-B;
3-C;
Denote the dictionary as D.
Now I have an input list [2,1,3], and what I am expecting is [B,A,C]. The problem is, I can't just use [2,1,3] as the input list for D, but only input 2,1 and 3 one by one for D and get B, A, C each time.
This can get the job done but as you can see, it's a bit less efficient.
So my question is: is there anything else I can do to let the dictionary return the whole list at the same time?
As far as I can find there is no one-step solution like Python's dict.items. You can, however, get it in a few lines. mydict.keys() gives you the keys of the dict as a cell array, and mydict.values() gives you the values as a cell array, so you can (in theory) combine those:
>> mykeys = mydict.keys();
>> myvals = mydict.values();
>> mypairs = [mykeys',myvals']
mypairs =
3×2 cell array
'A' [1]
'B' [2]
'C' [3]
However, in principle maps are unordered, and I can't find anything in the MATLAB documentation that says that the order returned by keys and the order returned by values are necessarily consistent (unlike Python). So if you want to be extra safe, you can call values with a cell array of the keys you want, which in this case would be all the keys:
>> mykeys = mydict.keys();
>> myvals = mydict.values(mykeys);
>> mypairs = [mykeys',myvals']
mypairs =
3×2 cell array
'A' [1]
'B' [2]
'C' [3]
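The same values call also covers the case in the question, where a list of keys should be looked up in one go. A small sketch, assuming the map D from the question has numeric keys 1, 2, 3 and char values:

```matlab
% Map from the question: 1-A, 2-B, 3-C.
D = containers.Map({1, 2, 3}, {'A', 'B', 'C'});

% values expects the keys as a cell array, so wrap the numeric list.
wanted = [2 1 3];
out = values(D, num2cell(wanted));
```

out is a cell array holding the values in the order the keys were requested.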
I have a cell_array which shows as a 29136x1 cell in the workspace panel. I also have a map new_labels, shown as a 4x1 Map in the workspace panel. Printing new_labels at the prompt gives
new_labels =
Map with properties:
Count: 4
KeyType: char
ValueType: double
Each entry in the cell_array is a key in the map, but the problem is that there is a type mismatch: the KeyType of the map is char and the entries of cell_array are of type cell.
Because of this I cannot access the map, and hence something like the following:
arrayfun(@(x) new_labels(x), cell_array, 'un',0);
gives the error Specified key type does not match the type expected for this container.
I tried converting to char type using char_cell_array = char(cell_array) but that converts the array to be of size 29136x4 which means every entry is just one char and not really a string.
Any help appreciated.
If you want to use the iterative way, you have to use cellfun. arrayfun iterates over the elements of an array, so for a cell array it hands each element to your function as a 1-by-1 cell rather than as the char array inside it, which is why the key type does not match. cellfun passes the contents of each cell instead.
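A small sketch of the cellfun variant, with a made-up two-entry map standing in for new_labels:

```matlab
% Hypothetical stand-ins for the variables in the question.
new_labels = containers.Map({'cat', 'dog'}, [1 2]);
cell_array = {'dog'; 'cat'; 'dog'};

% cellfun hands the char contents of each cell to the function,
% so the key type matches the map's KeyType.
out = cellfun(@(k) new_labels(k), cell_array);
```

Because the values are numeric scalars, cellfun can return a plain numeric array here; for non-scalar values, pass 'UniformOutput', false to get a cell array.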
However, what you're really after is specifying more than one key into the dictionary to get the associated values. Don't use arrayfun/cellfun for that. There is a dedicated MATLAB function designed to take in multiple keys. Use the values method for that which is built-in to the containers.Map interface:
out = values(new_labels, cell_array);
By just using values(new_labels), this retrieves all of the values in the dictionary. If you want to retrieve specific values based on input keys, supply a second input parameter that is a cell array which contains all of the keys you want to access in the containers.Map object. Because you already have this cell array, you simply use this as the second input into values.
Running Example
>> A = containers.Map({1,2,3,4}, {'a','b','c','d'})
A =
Map with properties:
Count: 4
KeyType: double
ValueType: char
>> cell_array = {1,2,2,3,3,4,1,1,1,2,2};
>> out = values(A, cell_array)
out =
'a' 'b' 'b' 'c' 'c' 'd' 'a' 'a' 'a' 'b' 'b'
Say I have an array that contains the following elements:
1.0e+14 *
1.3325 1.6485 2.0402 1.0485 1.2027 2.0615 1.7432 1.9709 1.4807 0.9012
Now, is there a way to grab 1.0e+14 * (base and exponent) individually?
If I do arr(10), then this will return 9.0120e+13 instead of 0.9012e+14.
Suppose the goal is to grab the elements of the array whose coefficient is less than one. Is there a way to obtain 1.0e+14, so that I could just do arr(i) < 1.0e+14?
I assume you want string output.
Let a denote the input numeric array. You can do it this way, if you don't mind using evalc (a variant of eval, which is considered bad practice):
s = evalc('disp(a)');
s = regexp(s, '[\de+-\.]+', 'match');
This produces a cell array with the desired strings.
Example:
>> a = [1.2e-5 3.4e-6]
a =
1.0e-04 *
0.1200 0.0340
>> s = evalc('disp(a)');
>> s = regexp(s, '[\de+-\.]+', 'match')
s =
'1.0e-04' '0.1200' '0.0340'
Here is the original answer from Alain.
Basic math can tell you that:
floor(log10(N))
The log base 10 of a number tells you approximately how many digits before the decimal are in that number.
For instance, 99987123459823754 is 9.998E+016
log10(99987123459823754) is 16.9999441, the floor of which is 16 - which can basically tell you "the exponent in scientific notation is 16, very close to being 17".
Now you have the exponent of the scientific notation. This should allow you to get to whatever your goal is ;-).
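Applied to the array from the question, this could look like the sketch below. It assumes that the common factor MATLAB displays corresponds to the exponent of the largest magnitude in the array; that matches the example here, but it is an assumption about the display format rather than a documented rule.

```matlab
arr = 1e14 * [1.3325 1.6485 2.0402 1.0485 1.2027 ...
              2.0615 1.7432 1.9709 1.4807 0.9012];

% Exponent of the largest element, used as the common scale.
e     = floor(log10(max(abs(arr))));
scale = 10^e;

coeff = arr / scale;            % coefficients relative to the common scale
idx   = find(arr < scale);      % elements whose coefficient is below one
```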
And depending on what you want to do with your exponent and the number, you could also define your own method. An example is described in this thread.