Possible Duplicate:
Hash tables in MATLAB
General Question
Is there any way to get a hashset or hashmap structure in Matlab?
I often find myself in situations where I need to find unique entries or check membership in vectors. Commands like unique() or logical indexing seem to search through the whole vector, which is really slow for large sets of values. What is the best way to do this in Matlab?
Example
Say, for example, that I have a list of primes and want to check if 3 is prime:
primes = [2,3,5,7,11,13];
if primes(primes==3)
disp('yes!')
else
disp('no!')
end
If I do this many times with long vectors, things get really slow.
In other languages
So basically, are there any equivalents to Python's set() and dict(), or Java's java.util.HashSet and java.util.HashMap, in Matlab? And if not, is there a good way of doing lookups in large vectors?
Edit: reflection on the answers
These are the running times I got for the suggestions in the answers:
>> b = 1:1000000;
>> tic; for i=1:100000, any(b==i); end; toc
Elapsed time is 125.925922 seconds.
>> s = java.util.HashSet();
>> for i=1:1000000, s.add(i); end
>> tic; for i=1:100000, s.contains(i); end; toc
Elapsed time is 25.618276 seconds.
>> m = containers.Map(1:1000000,ones(1,1000000));
>> tic; for i=1:100000, m(i); end; toc
Elapsed time is 2.715635 seconds
Constructing the Java set was quite slow as well, though, so depending on the problem that approach may not pay off either. I'm really glad about the containers.Map tip: it blows the other approaches away, and it was instant to set up too.
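A minimal sketch of the containers.Map approach applied to the primes example from the question, assuming the map is used purely as a set: the values are dummies and only the keys matter. The name primeList is hypothetical, chosen so the built-in primes function is not shadowed; isKey is the documented membership test for containers.Map.

```matlab
% Hypothetical data: the primes from the question, renamed so the
% built-in primes() function is not shadowed.
primeList = [2 3 5 7 11 13];

% Use containers.Map as a set: keys are the elements, values are dummies.
primeSet = containers.Map(primeList, ones(size(primeList)));

% isKey looks the key up in the map instead of scanning a vector.
if isKey(primeSet, 3)
    disp('yes!')
else
    disp('no!')
end
```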
Like this?
>> m = java.util.HashMap;
>> m.put(1,'hello, world');
>> m.get(1)
ans =
hello, world
Alternatively, if you want a Matlab-native implementation, try
>> m = containers.Map;
>> m('one') = 1;
>> m('one')
ans =
1
This is actually typed: by default the only keys it will accept are those of type char. You can specify the key and value type when you create the map:
>> m = containers.Map('KeyType','int32','ValueType','double');
>> m(1) = 3.14;
>> m(1)
ans =
3.14
You will now get errors if you try to put any key other than an int32 and any value other than a double.
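Using a typed map like a Python dict: a sketch that counts how often each value occurs in a vector, assuming 'double' keys and values (the data in v is made up for illustration). keys and values return cell arrays, with numeric keys in ascending order and the values in matching order.

```matlab
v = [4 2 4 7 2 4];   % hypothetical data

counts = containers.Map('KeyType', 'double', 'ValueType', 'double');
for x = v
    if isKey(counts, x)
        counts(x) = counts(x) + 1;   % seen before: increment
    else
        counts(x) = 1;               % first occurrence
    end
end

k = cell2mat(keys(counts));    % distinct values, ascending
c = cell2mat(values(counts));  % their counts, same order as k
```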
You also have Sets available to you:
>> s = java.util.HashSet;
>> s.add(1);
>> s.contains(1)
ans =
1
>> s.contains(2)
ans =
0
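A sketch of filling a java.util.HashSet from a MATLAB vector, assuming MATLAB is running with the JVM. Each double is passed to Java as a java.lang.Double, so it is safest to add and query values of the same MATLAB numeric class; mixing, say, int32 and double values may not match. The name primeList is hypothetical.

```matlab
primeList = [2 3 5 7 11 13];   % hypothetical data from the question

s = java.util.HashSet();
for x = primeList
    s.add(x);                  % each double is stored as a java.lang.Double
end

isPrime3 = s.contains(3);      % Java boolean comes back as a MATLAB logical
isPrime4 = s.contains(4);
```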
Depending on how literal your example is, the disp will be a massive overhead (I/O is very slow).
That aside, I believe the quickest way to do a check like this is:
if find(primes==3,1,'first')
disp('yes');
else
disp('no');
end
Edit: you could also use any(primes==3). A quick speed test shows they're approximately equivalent:
>> biglist = 1:100000;
>> tic;for i=1:10000
find(biglist==i,1,'first');
end
toc
Elapsed time is 1.055928 seconds.
>> tic;for i=1:10000
any(biglist==i);
end
toc
Elapsed time is 1.054392 seconds.
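Not one of the answers above, but when many values have to be checked at once, the built-in ismember tests a whole vector of queries in a single call, and its second output gives the position of each match (0 where there is none). A sketch with made-up queries; primeList is a hypothetical name:

```matlab
primeList = [2 3 5 7 11 13];   % hypothetical data from the question
queries = [3 4 11 12];         % values to look up

% tf(k) is true if queries(k) occurs in primeList;
% loc(k) is its index in primeList, or 0 if it is absent.
[tf, loc] = ismember(queries, primeList);
```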
Related
For example, if I want to read the middle value from magic(5), I can do so like this:
M = magic(5);
value = M(3,3);
to get value == 13. I'd like to be able to do something like one of these:
value = magic(5)(3,3);
value = (magic(5))(3,3);
to dispense with the intermediate variable. However, MATLAB complains about "Unbalanced or unexpected parenthesis or bracket" at the first parenthesis before the 3.
Is it possible to read values from an array/matrix without first assigning it to a variable?
It actually is possible to do what you want, but you have to use the functional form of the indexing operator. When you perform an indexing operation using (), you are actually making a call to the subsref function. So, even though you can't do this:
value = magic(5)(3, 3);
You can do this:
value = subsref(magic(5), struct('type', '()', 'subs', {{3, 3}}));
Ugly, but possible. ;)
In general, you just have to change the indexing step to a function call so you don't have two sets of parentheses immediately following one another. Another way to do this would be to define your own anonymous function to do the subscripted indexing. For example:
subindex = @(A, r, c) A(r, c); % An anonymous function for 2-D indexing
value = subindex(magic(5), 3, 3); % Use the function to index the matrix
However, when all is said and done the temporary local variable solution is much more readable, and definitely what I would suggest.
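The struct passed to subsref can also be built with the documented substruct helper, which reads a little better. A sketch, where the wrapper name idx is hypothetical, that accepts any number of subscripts, including ':':

```matlab
% substruct('()', {3, 3}) builds the same struct as
% struct('type', '()', 'subs', {{3, 3}}).
value = subsref(magic(5), substruct('()', {3, 3}));

% Hypothetical wrapper taking any number of subscripts.
idx = @(A, varargin) subsref(A, substruct('()', varargin));
secondColumn = idx(magic(5), ':', 2);   % same as M(:,2)
```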
There was just a good blog post on Loren on the Art of MATLAB a couple of days ago with a couple of gems that might help. In particular, using helper functions like:
paren = @(x, varargin) x(varargin{:});
curly = @(x, varargin) x{varargin{:}};
where paren() can be used like
paren(magic(5), 3, 3)
which returns
ans = 13
I would also surmise that this will be faster than gnovice's answer, but I haven't checked (Use the profiler!!!). That being said, you also have to include these function definitions somewhere. I personally have made them independent functions in my path, because they are super useful.
These functions and others are now available in the Functional Programming Constructs add-on which is available through the MATLAB Add-On Explorer or on the File Exchange.
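A short usage sketch of the two helpers, assuming they are defined as anonymous functions in the current workspace. Because they forward varargin, they accept any number of subscripts, including ':'; the cell array c is made up for illustration.

```matlab
paren = @(x, varargin) x(varargin{:});
curly = @(x, varargin) x{varargin{:}};

middle = paren(magic(5), 3, 3);      % same as M(3,3) with M = magic(5)
firstRow = paren(magic(5), 1, ':');  % same as M(1,:)

c = {'alpha', 'beta', 'gamma'};      % hypothetical cell array
second = curly(c, 2);                % same as c{2}
```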
How do you feel about using undocumented features:
>> builtin('_paren', magic(5), 3, 3) %# M(3,3)
ans =
13
or for cell arrays:
>> builtin('_brace', num2cell(magic(5)), 3, 3) %# C{3,3}
ans =
13
Just like magic :)
UPDATE:
Bad news, the above hack doesn't work anymore in R2015b! That's fine, it was undocumented functionality and we cannot rely on it as a supported feature :)
For those wondering where to find this type of thing, look in the folder fullfile(matlabroot,'bin','registry'). There's a bunch of XML files there that list all kinds of goodies. Be warned that calling some of these functions directly can easily crash your MATLAB session.
At least in MATLAB 2013a you can use getfield like:
a=rand(5);
getfield(a,{1,2}) % etc
to get the element at (1,2)
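Since getfield receives the array as an argument, it should also work directly on a function's return value, assuming (as the answer above reports) that it accepts a non-struct first argument. It can also step through a struct field before indexing; the struct s below is hypothetical:

```matlab
value = getfield(magic(5), {3, 3});       % same as M(3,3)

s.data = magic(5);                        % hypothetical struct
fromField = getfield(s, 'data', {3, 3});  % same as s.data(3,3)
```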
Unfortunately, syntax like magic(5)(3,3) is not supported by MATLAB. You need to use temporary intermediate variables, and you can free up the memory after use, e.g.
tmp = magic(3);
myVar = tmp(3,3);
clear tmp
Note that if you compare the running times with the standard way (assign the result and then access the entries), they are essentially the same.
>> subs = @(M,i,j) M(i,j);
>> for nit=1:10;tic;subs(magic(100),1:10,1:10);tlap(nit)=toc;end;mean(tlap)
ans =
0.0103
>> for nit=1:10,tic;M=magic(100); M(1:10,1:10);tlap(nit)=toc;end;mean(tlap)
ans =
0.0101
In my opinion, the bottom line is: MATLAB does not have pointers, and you have to live with it.
It could be simpler to make a new function:
function [ element ] = getElem( matrix, index1, index2 )
element = matrix(index1, index2);
end
and then use it:
value = getElem(magic(5), 3, 3);
Your initial notation is the most concise way to do this:
M = magic(5); %create
value = M(3,3); % extract useful data
clear M; %free memory
If you are doing this in a loop you can just reassign M every time and ignore the clear statement as well.
To complement Amro's answer, you can use feval instead of builtin. There is no difference, really, unless you try to overload the operator function:
BUILTIN(...) is the same as FEVAL(...) except that it will call the
original built-in version of the function even if an overloaded one
exists (for this to work, you must never overload
BUILTIN).
>> feval('_paren', magic(5), 3, 3) % M(3,3)
ans =
13
>> feval('_brace', num2cell(magic(5)), 3, 3) % C{3,3}
ans =
13
What's interesting is that feval seems to be just a tiny bit quicker than builtin (by ~3.5%), at least in Matlab 2013b, which is weird given that feval needs to check if the function is overloaded, unlike builtin:
>> tic; for i=1:1e6, feval('_paren', magic(5), 3, 3); end; toc;
Elapsed time is 49.904117 seconds.
>> tic; for i=1:1e6, builtin('_paren', magic(5), 3, 3); end; toc;
Elapsed time is 51.485339 seconds.
Possible Duplicate:
Faster way to initialize arrays via empty matrix multiplication? (Matlab)
Edit: See here for an interesting discussion of the topic. Thanks @Dan
Using a(m,n) = 0 appears to be faster than a = zeros(m,n), depending on the size of matrix a. Are both variants the same when it comes to pre-allocation before a loop?
They are definitely not the same.
Though there are ways to beat the performance of a = zeros(m,n), simply doing a(m,n) = 0 is not a safe way to do it: if a already exists, its existing entries will be kept.
See this for some nice options. Also consider running the loop backwards if you don't mind the risk.
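A small demonstration of the pitfall, assuming a variable a is left over from earlier code: growing it with a(m,n) = 0 keeps the old entries, whereas zeros(m,n), or clearing the variable first, gives an all-zero array.

```matlab
a = ones(2, 2);   % stale data left over from earlier code
a(3, 3) = 0;      % grows a to 3x3, but the old ones survive
% a is now [1 1 0; 1 1 0; 0 0 0]

clear b
b(3, 3) = 0;      % all zeros only because b no longer existed

c = zeros(3, 3);  % always all zeros
```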
I think it depends on your m and n. You can check the timing yourself:
tic; b(2000,2000) = 0; toc;
Elapsed time is 0.004719 seconds.
tic; a = zeros(2000,2000); toc;
Elapsed time is 0.004399 seconds.
tic; a = zeros(2,2); toc;
Elapsed time is 0.000030 seconds.
tic; b(2,2) = 0; toc;
Elapsed time is 0.000023 seconds.
I have the following vector v = [r1 r2 r3 r4 r5 ... rn], where each r is an integer.
I want to check:
if r1 not equal to r2 not equal to r3 ... not equal to rn (all different from each other):
print v
else (some elements are equal and other not equal):
print the index of the equal elements.
I recommend using the unique command. It returns all the unique values.
If you want to check that all values in your matrix v are unique, I would use the following command:
everything_is_unique = length(unique(v))==length(v);
You can also return the indices of the equal elements.
See the documentation on unique for more information.
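A sketch of how the third output of unique can produce the indices of the equal elements asked for in the question, with accumarray counting how often each unique value occurs. The example vector is made up.

```matlab
v = [3 1 4 1 5 3];              % example data (assumed)

[~, ~, j] = unique(v);          % j(k) is the index of v(k) among the unique values
counts = accumarray(j(:), 1);   % how many times each unique value occurs
isDup = counts(j) > 1;          % true where v(k) occurs more than once
dupIdx = find(isDup(:)).';      % row vector of positions of repeated values

if isempty(dupIdx)
    disp(v)                     % all elements are different
else
    disp(dupIdx)                % indices of the equal elements
end
```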
As an alternative to unique, you can also sort the elements and check whether they are all different:
all(diff(sort(v)))
By calling sort with more output arguments you can get the indices you are looking for.
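A sketch of that idea, assuming the second output of sort (the permutation that sorts v) is what is meant: equal neighbours in the sorted vector are mapped back to their original positions. The example vector is made up.

```matlab
v = [3 1 4 1 5 3];                      % example data (assumed)

[s, idx] = sort(v(:).');                % force a row; s = v(idx)
same = diff(s) == 0;                    % true between equal neighbours
isDup = [false, same] | [same, false];  % mark both members of each equal pair
dupIdx = sort(idx(isDup));              % original positions of repeated values

allDifferent = ~any(same);              % same test as all(diff(sort(v)))
```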
I'll give one more solution, and a comparison of all methods tried so far.
My solution is based on the observation that when using builtin functions like sort() or unique(), you lose opportunities for early escapes. That is, the sort() will have to sort the vector completely before you can continue with your algorithm, even though this is not required if two equal values are already detected inside the sort.
Therefore, I simply iterate through the array, and compare the current value to all following values using any(). This works around some of these issues, and works well enough for a lot of cases.
However, the worst case complexity is O(N²), which is a hell of a lot worse than sort(), which has only O(N·log(N)). So as usual, it all depends on context :)
Trying this:
clc
N = 1e4;
% Zigzag's solution
tic
for ii = 1:1e4
v = randi(N, N,1);
length(unique(v))==length(v);
end
toc
% Dennis Jaheruddin's solution
tic
for ii = 1:1e4
v = randi(N, N,1);
all(diff(sort(v)));
end
toc
% My solution
tic
for ii = 1:1e4
v = randi(N, N,1);
cond = true;
for jj = 1:numel(v)
if any(v(jj) == v(jj+1:end))
cond = false;
break;
end
end
end
toc
The random numbers are generated inside the loops to ensure that a variety of different cases come up. Results on my PC:
Elapsed time is 16.787976 seconds. % unique
Elapsed time is 14.284696 seconds. % sort + diff
Elapsed time is 5.376655 seconds. % loop + any
So explicit looping (provided feature accel is on) with early exit is actually almost three times faster than the standard "vectorized" approach :)
PS - I also tried nesting another loop to avoid having to compare all values before detecting equal ones (first v(jj)==v(jj+1:end) is evaluated completely, before any() can start doing its job), but here the overheads really start to get in the way (or the JIT is not coping well with this sort of thing, I don't know). In theory this should be even faster, of course, but unfortunately not in MATLAB :)
However, change the random number generation
v = randi(N, N,1);
into
v = randi(N*N, N,1);
and the results are quite different:
Elapsed time is 0.162625 seconds. % unique
Elapsed time is 0.147369 seconds. % sort + diff
Elapsed time is 30.767247 seconds. % loop + any
Here I used only 100 iterations instead of 10,000, for obvious reasons :)
I have two big matrices in two files, A (21,000 x 80,000) and B (3,000 x 80,000), that I want to multiply:
C = A*B_transposed
Currently I have the following script:
A = dlmread('fileA')
B = dlmread('fileB')
C = A*(B')
dlmwrite('result', C)
exit
However, reading the matrices (the first two lines) takes a very long time, and Matlab prints each matrix after every dlmread. Do you know how to disable this printing and make the process faster?
To suppress printing you merely need to put a semicolon after each line:
A = dlmread('fileA');
B = dlmread('fileB');
dlmwrite('result', A * B');
One way to speed up the read is to tell Matlab what delimiter you are using, so that it doesn't need to be inferred. For example, if the file is tab delimited you could use
A = dlmread('fileA','\t');
or if it's comma delimited you could use:
A = dlmread('fileA',',');
Other than that, you could consider using a different file format. Where are the files generated? If they're generated by another Matlab process, then you could save them in Matlab's binary format, which is accessed using load and save:
A = [1 2; 3 4];
save('file.mat','A');
clear A;
load('file.mat','A');
For a quick benchmark, I wrote the following matrix to two files:
>> A = [1 2 3; 4 5 6; 7 8 9];
>> dlmwrite('test.txt',A);
>> save('test.mat','A');
I then ran two benchmarks:
>> tic; for i=1:1000; dlmread('test.txt',','); end; toc
Elapsed time is 0.506136 seconds.
>> tic; for i=1:1000; load('test.mat','A'); end; toc
Elapsed time is 0.260381 seconds.
Here the version using load came in at half the time of the dlmread version. You could do your own benchmarking for matrices of the appropriate size and see what works best for you.
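For matrices of the size in the question, the default MAT-file format is not enough: a 21,000 x 80,000 double matrix takes roughly 13 GB (21,000 × 80,000 × 8 bytes), and variables larger than 2 GB need the '-v7.3' format. A sketch with small random stand-ins for A and B; the file name matrices.mat is hypothetical.

```matlab
% Small random stand-ins; the real A is 21,000 x 80,000 and B is 3,000 x 80,000.
A = rand(200, 50);
B = rand(30, 50);

% '-v7.3' is required for variables larger than 2 GB.
save('matrices.mat', 'A', 'B', '-v7.3');
clear A B

% Loading into a struct makes it explicit which variables come from the file.
S = load('matrices.mat', 'A', 'B');
C = S.A * S.B';   % 200 x 30, i.e. A times B transposed
```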