I have two big matrices in two files, A (21,000 x 80,000) and B(3,000 x 80,000) that I want to multiply:
C = A*B_transposed
Currently I have the following script:
A = dlmread('fileA')
B = dlmread('fileB')
C = A*(B')
dlmwrite('result', C)
exit
However, reading the matrices (first two lines) takes very long and Matlab (after each dlmread) proceeds to print these matrices. Do you know how to disable this printing and make the process faster?
To suppress printing you merely need to put a semicolon after each line:
A = dlmread('fileA');
B = dlmread('fileB');
dlmwrite('result', A * B');
One way to speed up the read is to tell Matlab what delimiter you are using, so that it doesn't need to be inferred. For example, if the file is tab delimited you could use
A = dlmread('fileA','\t');
or if it's comma delimited you could use:
A = dlmread('fileA',',');
Other than that, you could consider using a different file format. Where are the files generated? If they're generated by another Matlab process, then you could save them in Matlab's binary format, which is accessed using load and save:
A = [1 2; 3 4];
save('file.mat','A');
clear A;
load('file.mat','A');
For a quick benchmark, I wrote the following matrix to two files:
>> A = [1 2 3; 4 5 6; 7 8 9];
>> dlmwrite('test.txt',A);
>> save('test.mat','A');
I then ran two benchmarks:
>> tic; for i=1:1000; dlmread('test.txt',','); end; toc
Elapsed time is 0.506136 seconds.
>> tic; for i=1:1000; load('test.mat','A'); end; toc
Elapsed time is 0.260381 seconds.
Here the version using load came in at half the time of the dlmread version. You could do your own benchmarking for matrices of the appropriate size and see what works best for you.
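If the text files have to come from some other tool, one option is to parse each file once and keep a binary copy for later runs. The sketch below is only an illustration: it assumes a tab-delimited file, and the small stand-in matrix and temporary file names are made up so that it runs on its own. For matrices as large as the ones in the question, the -v7.3 flag matters because the default MAT-file format cannot hold a variable larger than 2 GB.

```matlab
% Hypothetical stand-ins for 'fileA' and its binary copy
txtFile = [tempname '.txt'];
matFile = [tempname '.mat'];
dlmwrite(txtFile, magic(4), '\t');   % small tab-delimited test file

% One-time conversion: parse the text once, then save a binary copy
A = dlmread(txtFile, '\t');
save(matFile, 'A', '-v7.3');         % -v7.3 allows variables larger than 2 GB
clear A

% Later runs: load the binary copy instead of re-parsing the text
S = load(matFile, 'A');
A = S.A;

delete(txtFile);
delete(matFile);
```

Whether this pays off depends on how often the same file is read; it only helps if the conversion cost is paid once and the binary copy is reused.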
Old Title: *Small matrix multiplication much slower in R2016b than R2016a*
(update below)
I find that multiplication of small matrices seems much slower in R2016b than in R2016a. Here's a minimal example:
r = rand(50,100);
s = rand(100,100);
tic; r * s; toc
This takes about 0.0012s in R2016a and 0.018s in R2016b.
Creating an artificial loop to make sure this isn't just some initial overhead or something leads to the same loss factor:
tic; for i = 1:1000, a = r*s; end, toc
This takes about 0.18s in R2016a and 2.1s in R2016b.
Once I make the matrices much bigger, say r = rand(500,1000); and s = rand(1000,1000);, the versions behave similarly (R2016b even seems to be ~15% faster). Does anyone have any insight as to why this is, or can anyone verify this behavior on another system?
I wonder if it has to do with the new arithmetic expansions implementation (if this feature has some cost for small matrix multiplication): http://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/
update
After many tests, I discovered that this difference was not between MATLAB versions (my apologies). Instead, it seems to be a difference of what's in my base workspace... and worse, the type of variable that's in the base workspace.
I cleared a huge workspace (which had many large cell arrays with many small, differently sized matrix entries). If I clear the variables and do the timing of r*s, I get much faster runtime (x10-x100) than before the workspace was loaded.
So the question is, why does having variables in the workspace affect the matrix multiplication of two small variables? And even more, why does having certain types of variables in the workspace slow things down so dramatically?
Here's an example where a large variable in cell form in the workspace affects the runtime of the matrix multiplication of two unrelated matrices. If I collapse this cell to a matrix, the effect goes away.
clear;
ticReps = 10000;
nCells = 100;
aa = rand(50,100);
bb = rand(100, 100);
% test original timing
tic; for i = 1:ticReps, aa * bb; end
fprintf('original: %3.3f\n', toc);
% make some matrices inside a large number of cells
q = cell(nCells, nCells);
for i = 1:nCells * nCells
q{i} = sprand(10000,10000, 0.0001);
end
% the timing again
tic; for i = 1:ticReps, aa * bb; end
fprintf('after large q cell: %3.3f\n', toc);
% make q into a matrix
q = cat(2, q{:});
% the timing again
tic; for i = 1:ticReps, aa * bb; end
fprintf('after large q matrix: %3.3f\n', toc);
clear q
% the timing again
tic; for i = 1:ticReps, aa * bb; end
fprintf('after clear q: %3.3f\n', toc);
In both stages, q takes up about 2 GB. Result:
original: 0.183
after large q cell: 0.320
after large q matrix: 0.175
after clear q: 0.184
I've received an update from MathWorks.
As far as I understand it, they say that this is the fault of the Windows memory manager, which allocates memory for large cell arrays in a fairly fragmented manner. Since the (unrelated) multiplication needs memory (for the output), getting this piece of memory now takes longer because of the memory fragmentation caused by the cell array. Linux (as tested) does not have this issue.
Can you use functions inside a matlab parfor loop? for instance I have a code that looks like:
matlabpool open 2
Mat=zeros(100,8);
parfor(i=1:100)
Mat(i,:)=foo();
end
Inside the function I have a bunch of other variables. In particular there is a snippet of code that looks like this:
function z=foo()
err=1;
a=zeros(10000,1);
p=1;
while(err>.0001)
%statements to update err
% .
% .
% .
p=p+1;
%if out of memory allocate more
if(p>length(a))
a=[a;zeros(length(a),1)];
end
end
%trim output after while loop
if(p<length(a))
a(p+1:end)=[];
end
%example output
z=1:8;
end
I read somewhere that all variables that grow inside of a for loop nested inside of a matlab parfor loop must be preallocated, but in this case I have a variable that is preallocated, but might grow later. matlab did not give me any errors when i used mlint, but I was wondering if there are issues that I should be aware of.
Thanks,
-akt
According to Mathworks' documentation, your implementation of the matrix Mat is a sliced variable. That means you update different "slices" of the same matrix in different iterations, but the iterations do not affect each other. There is no data dependency between the loops. So you are ok to go.
Growing a inside function foo does not affect the parfor, because a is a regular variable located in foo's stack. You can do anything with a.
There are, however, several issues worth noting:
Don't use i and j as iteration counters
Defining i or j for any purpose is bad.
I never tire of referring people to this post - Using i and j as variables in Matlab.
Growing a is bad
Every time you do a=[a;zeros(length(a),1)]; the variable is copied as a whole into a new empty place in RAM. As its size doubles each time, it could be a disaster. Not hard to imagine.
A lighter way to "grow" -
% initialize a list of pointers
p = 1;
cc = 1;
c{1} = zeros(1000,1);
% use it
while (go_on)
% do something
disp(c{cc})
% ...
p=p+1;
if (p>1000)
cc = cc+1;
c{cc} = zeros(1000,1);
p = 1;
end
end
Here, you grow a list of pointers, a cell array c. It's smaller, faster, but still it needs copying in memory.
Use minimal amount of memory
Suppose you only need a small part of a, that is, a(end-7:end), as the function output. (This assumption is based on the caller Mat(i,:)=foo(); where size(Mat, 2)=8.)
Suppose err is not related to previous elements of a, that is, a(p-1), a(p-2), .... (I will loosen this assumption later.)
You don't have to keep all previous results in memory. If a is used up, just throw it away.
% if out of memory, don't allocate more; flush it
if (p>1000)
p = 1;
a = zeros(1000,1);
end
The second assumption may be loosened: you only need a certain number of previous elements, and this number is known in advance (hopefully it's small). For example,
% if out of memory, flush it, but keep the last 3 results
if (p>1000)
a = [a(end-2:end); zeros(997,1)];
p = 4;
end
Trimming is not that complicated
% trim output after while loop
a(p+1:end)=[];
Proof:
>> a=1:10
a =
1 2 3 4 5 6 7 8 9 10
>> a(3:end)=[]
a =
1 2
>> a=1:10
a =
1 2 3 4 5 6 7 8 9 10
>> a(11:end)=[]
a =
1 2 3 4 5 6 7 8 9 10
>>
The reason is that end is 10 here (although you can't use it as a standalone variable), and 11:10 gives an empty array.
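As a self-contained illustration of the pattern from the question (pre-allocate, double the buffer when it runs out, trim once at the end), here is a minimal sketch. The fixed for loop and the stored values are stand-ins for the real while loop and its results, and the tiny initial buffer is chosen only so that the growth branch actually runs.

```matlab
a = zeros(4, 1);                 % deliberately small initial buffer
p = 0;                           % number of values stored so far
for v = 1:10                     % stand-in for the while(err > tol) loop
    p = p + 1;
    if p > numel(a)
        a = [a; zeros(numel(a), 1)];   % double the capacity
    end
    a(p) = v^2;                  % stand-in for the real result
end
a(p+1:end) = [];                 % trim the unused tail; a no-op if the buffer is exactly full
```

Because the capacity doubles, the buffer is only copied a handful of times even for long runs, and the unconditional trim relies on the empty-range behaviour shown above.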
The short answer is yes, you can call a function inside a parfor.
The long answer is that parfor only works if each iteration inside the parfor is independent of the other iterations. Matlab has checks to catch when this isn't the case, though I don't know how foolproof they are. In your example, each foo() call can run independently and store its return value in a specific location of Mat that won't be written to or read by any other parfor iteration, so it should work.
This breaks down if values in Mat are being read by foo(). For example, if the parfor ran 4 iterations simultaneously, and inside each iteration, foo() was reading from Mat(1) and then writing a new value to Mat(1) that was based on what it read, the timing of the reads/writes would change the output values, and matlab should flag that.
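To make the independent case concrete, here is a minimal sketch of a loop that parfor can slice. The built-in linspace is used purely as a stand-in for foo(), and the loop variable is named k rather than i. Running in parallel needs the Parallel Computing Toolbox; as far as I know, without it the same loop simply runs serially.

```matlab
n = 100;
Mat = zeros(n, 8);
parfor k = 1:n
    % Each iteration calls a function and writes only to its own row,
    % so Mat is a sliced output variable and no iteration reads
    % another iteration's result.
    Mat(k, :) = linspace(k, k + 1, 8);
end
```

The problematic case described above would be a loop body that reads and then rewrites one fixed element, such as Mat(1), in every iteration; that is the kind of dependency parfor is meant to reject.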
I have noticed many individual questions on SO but no one good guide to MATLAB optimization.
Common Questions:
Optimize this code for me
How do I vectorize this?
I don't think that these questions will stop, but I'm hoping that the ideas presented here will give them something centralized to refer to.
Optimizing Matlab code is kind of a black art; there is always a better way to do it. And sometimes it is straight-up impossible to vectorize your code.
So my question is: when vectorization is impossible or extremely complicated, what are some of your tips and tricks to optimize MATLAB code? Also if you have any common vectorization tricks I wouldn't mind seeing them either.
Preface
All of these tests are performed on a machine that is shared with others, so it is not a perfectly clean environment. Between each test I clear the workspace to free up memory.
Please don't pay attention to the individual numbers, just look at the differences between the before and after optimisation times.
Note: The tic and toc calls I have placed in the code are to show where I am measuring the time taken.
Pre-allocation
The simple act of pre-allocating arrays in Matlab can give a huge speed advantage.
tic;
for i = 1:100000
my_array(i) = 5 * i;
end
toc;
This takes 47 seconds
tic;
% n rather than length, to avoid shadowing the built-in length function
n = 100000;
my_array = zeros(1, n);
for i = 1:n
my_array(i) = 5 * i;
end
toc;
This takes 0.1018 seconds
47 seconds to 0.1 seconds for a single line of code added is an amazing improvement. Obviously in this simple example you could vectorize it to my_array = 5 * (1:100000) (which took 0.000423 seconds; the parentheses matter, because 5 * 1:100000 is parsed as (5 * 1):100000), but I am trying to represent the more complicated cases where vectorization isn't an option.
I recently found that the zeros function (and others of the same nature) are not as fast at pre-allocating as simply setting the last value to 0:
tic;
n = 100000;
my_array(n) = 0;
for i = 1:n
my_array(i) = 5 * i;
end
toc;
This takes 0.0991 seconds
Now obviously this tiny difference doesn't prove much, but you'll have to believe me that over a large file with many of these optimisations the difference becomes a lot more apparent.
Why does this work?
The pre-allocation methods allocate a chunk of memory for you to work with. This memory is contiguous and can be pre-fetched, just like an array in C++ or Java. However, if you do not pre-allocate, MATLAB has to find more and more memory for you as the array grows. A MATLAB array is always stored contiguously, so each time it outgrows its current block a larger block has to be allocated and the existing contents copied into it; the behaviour is closer to repeatedly reallocating a C array than to a Java LinkedList.
This repeated allocation and copying is what makes writing to a growing array so slow (47 seconds!), and it can leave memory fragmented for whatever runs afterwards. If you absolutely CAN'T pre-allocate, it can still be useful to copy your matrix to a new pre-allocated one before you start using it.
What if I don't know how much space to allocate?
This is a common question and there are a few different solutions:
Overestimation - It is better to grossly overestimate the size of your matrix and allocate too much space, than it is to under-allocate space.
Deal with it and fix later - I have seen this a lot where the developer has put up with the slow population time, and then copied the matrix into a new pre-allocated space. Usually this is saved as a .mat file or similar so that it could be read quickly at a later date.
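The overestimation approach can be sketched as follows. The upper bound, the filter condition, and the variable names are made up for illustration; the point is only that the array is allocated once at a safe maximum size and cut back to the number of elements actually used.

```matlab
maxN = 1000;                 % assumed safe upper bound on the result size
out = zeros(1, maxN);        % over-allocate once
count = 0;                   % number of elements actually stored
for k = 1:maxN
    if mod(k, 7) == 0        % stand-in for "this iteration produced a result"
        count = count + 1;
        out(count) = k;
    end
end
out = out(1:count);          % trim to the used part
```

The final trim copies the used part once, which is usually far cheaper than letting the array grow inside the loop.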
How do I pre-allocate a complicated structure?
Pre-allocating space for simple data-types is easy, as we have already seen, but what if it is a very complex data type such as a struct of structs?
I could never work out how to explicitly pre-allocate these (I am hoping someone can suggest a better method) so I came up with this simple hack:
tic;
n = 100000;
% Forward loop: the struct array grows by one element on every iteration
for i = 1:n
complicated_structure(i) = read_from_file(i);
end
toc;
This takes 1.5 minutes
tic;
n = 100000;
% Reverse the for-loop to start from the last element, so the first
% assignment allocates the whole array at its final size
for i = n:-1:1
complicated_structure(i) = read_from_file(i);
end
toc;
This takes 6 seconds
This is obviously not perfect pre-allocation, but the time improvements speak for themselves. Because every element is assigned at its own index, the array ends up in the normal order and does not need to be flipped afterwards. I'm hoping someone has a better way to do this, but this is a pretty good hack in the meantime.
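One commonly used alternative, when the field names are known in advance, is to build a single template struct and replicate it with repmat. This is only a sketch: the field names id and values are hypothetical, and it assumes every element has the same fields, which may not hold for structs read from a file.

```matlab
n = 1000;
template = struct('id', 0, 'values', []);   % hypothetical field names
records = repmat(template, 1, n);           % 1-by-n struct array, allocated once
for k = 1:n
    records(k).id = k;
    records(k).values = k * [1 2 3];        % stand-in for data read from a file
end
```

The nested contents (here the values vectors) are still allocated as they are filled in; what this avoids is regrowing the outer struct array on every iteration.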
Data Structures
In terms of memory usage, an Array of Structs is considerably worse than a Struct of Arrays:
% Array of Structs
a(1).a = 1;
a(1).b = 2;
a(2).a = 3;
a(2).b = 4;
Uses 624 Bytes
% Struct of Arrays
a.a(1) = 1;
a.b(1) = 2;
a.a(2) = 3;
a.b(2) = 4;
Uses 384 Bytes
As you can see, even in this simple/small example the Array of Structs uses a lot more memory than the Struct of Arrays. Also the Struct of Arrays is in a more useful format if you want to plot the data.
Each Struct has a large header, and as you can see an array of structs repeats this header multiple times where the struct of arrays only has the one header and therefore uses less space. This difference is more obvious with larger arrays.
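The byte counts can be checked on your own installation with whos. The sketch below builds the same two layouts under different variable names (aos and soa are names I made up) and reads the bytes field; the exact numbers vary between MATLAB versions, so none are quoted here.

```matlab
% Array of Structs: a 1x2 struct array, each element holding scalars
aos = struct('a', {1, 3}, 'b', {2, 4});
% Struct of Arrays: one struct whose fields hold the full vectors
soa = struct('a', [1 3], 'b', [2 4]);

infoAos = whos('aos');
infoSoa = whos('soa');
fprintf('array of structs: %d bytes\n', infoAos.bytes);
fprintf('struct of arrays: %d bytes\n', infoSoa.bytes);

% The struct of arrays can be passed straight to plot(soa.a, soa.b);
% the array of structs has to be gathered first:
gatheredA = [aos.a];
```

Repeating the measurement with a few thousand elements shows how the per-element overhead of the array of structs scales.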
File Reads
The fewer freads (or any system calls, for that matter) you have in your code, the better.
tic;
for i = 1:100
fread(fid, 1, '*int32');
end
toc;
The previous code is a lot slower than the following:
tic;
fread(fid, 100, '*int32');
toc;
You might think that's obvious, but the same principle can be applied to more complicated cases:
tic;
for i = 1:100
val1(i) = fread(fid, 1, '*float32');
val2(i) = fread(fid, 1, '*float32');
end
toc;
This problem is no longer simple because in the file the floats are interleaved like this:
val1 val2 val1 val2 etc.
However you can use the skip value of fread to achieve the same optimizations as before:
tic;
% Get the current position in the file
initial_position = ftell(fid);
% Read 100 float32 values, and skip 4 bytes after each one
val1 = fread(fid, 100, '*float32', 4);
% Set the file position back to the start (plus the size of the initial float32)
fseek(fid, initial_position + 4, 'bof');
% Read 100 float32 values, and skip 4 bytes after each one
val2 = fread(fid, 100, '*float32', 4);
toc;
So this file read was accomplished using two freads instead of 200, a massive improvement.
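For anyone who wants to try this without a real data file, here is a self-contained sketch that writes 200 interleaved float32 values to a temporary file (the file name and contents are made up) and reads them back with the skip argument. It also shows an alternative not taken from the text above: a single fread into a 2-by-100 matrix, which relies on fread filling the output column by column.

```matlab
fname = [tempname '.bin'];                   % hypothetical test file
fid = fopen(fname, 'w');
fwrite(fid, single(1:200), 'float32');       % val1 val2 val1 val2 ...
fclose(fid);

fid = fopen(fname, 'r');
start = ftell(fid);
val1 = fread(fid, 100, '*float32', 4);       % every other value, starting at the first
fseek(fid, start + 4, 'bof');
val2 = fread(fid, 100, '*float32', 4);       % every other value, starting at the second

% Alternative: one read, then split by row
fseek(fid, start, 'bof');
raw = fread(fid, [2, 100], '*float32');      % row 1 holds val1, row 2 holds val2
fclose(fid);
delete(fname);
```

The single-read variant makes one call instead of two, at the cost of holding both channels in one matrix until they are split.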
Function Calls
I recently worked on some code that used many function calls, all of which were located in separate files. So let's say there were 100 separate files, all calling each other. By "inlining" this code into one function I saw a 20% improvement in execution speed from 9 seconds.
Obviously you would not do this at the expense of re-usability, but in my case the functions were automatically generated and not reused at all. But we can still learn from this and avoid excessive function calls where they are not really needed.
External MEX functions incur an overhead for being called. Therefore one call to a large MEX function is a lot more efficient than many calls to smaller MEX functions.
Plotting Many Disconnected Lines
When plotting disconnected data such as a set of vertical lines, the traditional way to do this in Matlab is to iterate multiple calls to line or plot using hold on. However if you have a large number of individual lines to plot, this becomes very slow.
The technique I have found uses the fact that you can introduce NaN values into data to plot and it will cause a break in the data.
The below contrived example converts a set of x_values, y1_values, and y2_values (where the line is from [x, y1] to [x, y2]) to a format appropriate for a single call to plot.
For example:
% Where x is 1:1000, draw vertical lines from 5 to 10.
x_values = 1:1000;
y1_values = ones(1, 1000) * 5;
y2_values = ones(1, 1000) * 10;
% Set x_plot_values to [1, 1, NaN, 2, 2, NaN, ...];
x_plot_values = zeros(1, length(x_values) * 3);
x_plot_values(1:3:end) = x_values;
x_plot_values(2:3:end) = x_values;
x_plot_values(3:3:end) = NaN;
% Set y_plot_values to [5, 10, NaN, 5, 10, NaN, ...];
y_plot_values = zeros(1, length(x_values) * 3);
y_plot_values(1:3:end) = y1_values;
y_plot_values(2:3:end) = y2_values;
y_plot_values(3:3:end) = NaN;
figure; plot(x_plot_values, y_plot_values);
I have used this method to print thousands of tiny lines and the performance improvements were immense. Not only in the initial plot, but the performance of subsequent manipulations such as zoom or pan operations improved as well.
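The same NaN-separated vectors can also be built without the strided assignments, by stacking the three rows and flattening them in column order. This is just an equivalent sketch of the construction above; the variable gap is a name I introduced.

```matlab
x_values = 1:1000;
y1_values = 5 * ones(1, 1000);
y2_values = 10 * ones(1, 1000);
gap = nan(1, numel(x_values));               % one NaN per line segment

% Stack as 3-by-n, then read out column by column: [x1 x1 NaN x2 x2 NaN ...]
x_plot_values = reshape([x_values; x_values; gap], 1, []);
y_plot_values = reshape([y1_values; y2_values; gap], 1, []);

figure; plot(x_plot_values, y_plot_values);
```

Both versions produce identical vectors, so the choice between them is a matter of readability.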
I am trying to concatenate an array of numbers from 1->(a-1) + (a+1)->n.
I was using the cat function
cat(2, 1:a-1, a+1:n)
but I am getting the error
Index exceeds matrix dimensions.
Unless I am completely mistaken, I am just trying to concatenate two matrices of numbers so I'm not quite sure why I'm getting this error.
I'm trying to accomplish this:
>> a = 3;
>> n = 10;
>> cat(2, 1:a-1, a+1:n)
ans =
[1,2,4,5,6,7,8,9,10]
Is this the wrong way to do it? Any idea why this error is coming up?
Do you have a variable called cat in your workspace?
>> cat(2, 2:3, 4:6) % this works fine
ans =
2 3 4 5 6
>> cat = 1:3; % introduce the variable 'cat'
>> cat(2, 2:3, 4:6) % now it breaks
??? Index exceeds matrix dimensions.
It looks like you have a variable named cat in the workspace. The clean way is, of course, to rename the variable: If you have a sufficiently recent version of Matlab (R2012x, I think), you can replace cat in the first line it gets assigned (select the variable to see the gray ticks to the right of the window, indicating where the variable occurs in the function), and use shift+enter to replace all occurrences. Or you can use the Find/Replace all function (make sure you only replace words, not substrings, though).
If you cannot replace the existing variable name, you can use square brackets for catenation along the first and/or second dimension:
cat(2,a,b)
is equivalent to
[a,b]
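To check whether a name is currently shadowed by a variable, exist with the 'var' option can be used. The sketch below shadows cat on purpose, confirms that brackets still work while it is shadowed, and then clears the variable; the names shadowed, joined and stillShadowed are made up for illustration.

```matlab
cat = 1:3;                                   % shadow the built-in on purpose
shadowed = exist('cat', 'var') == 1;         % true while the variable exists
joined = [2:3, 4:6];                         % brackets do not depend on the name cat
clear cat                                    % remove the variable again
stillShadowed = exist('cat', 'var') == 1;    % false after clear
```

At the command line, which cat gives the same information in readable form.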
Just for completeness, the concatenation you're trying to accomplish can also be achieved like so:
R = 1:n;
R = R(R ~= a)
I personally think this looks cleaner than
R = [1:a-1 a+1:n]
but that's personal; I always feel a little confusion towards something like 1:a-1>5 (is it ((1:a)-1)>5 or (1:(a-1))>5 or (1:a)-(1>5) or ...). I just always have to think for a second, whereas I understand my solution instantly.
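For reference, the colon operator binds more loosely than arithmetic and more tightly than the relational operators, so 1:a-1>5 is read as (1:(a-1))>5. A short sketch that confirms this and checks that both ways of building the range agree (R1, R2 and the mask names are mine):

```matlab
a = 3;
n = 10;

R1 = [1:a-1, a+1:n];          % bracket form
R2 = 1:n;
R2 = R2(R2 ~= a);             % logical-indexing form

sameResult = isequal(R1, R2);

% Arithmetic is applied first, then the colon, then the comparison
mask = 1:a-1 > 5;
explicitMask = (1:(a-1)) > 5;
sameParse = isequal(mask, explicitMask);
```

When in doubt, adding the parentheses explicitly costs nothing and removes the need to remember the precedence table.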
General Question
Is there any way to get a hashset or hashmap structure in Matlab?
I often find myself in situations where I need to find unique entries or check membership in vectors and using commands like unique() or logical indexing seems to search through the vectors and is really slow for large sets of values. What is the best way to do this in Matlab?
Example
Say, for example, that I have a list of primes and want to check if 3 is prime:
primes = [2,3,5,7,11,13];
if primes(primes==3)
disp('yes!')
else
disp('no!')
end
If I do this with long vectors and many times, things get really slow.
In other languages
So basically, is there any equivalents to python's set() and dict(), or similarly Java's java.util.HashSet and java.util.HashMap, in Matlab? And if not, is there any good way of doing lookups in large vectors?
Edit: reflection on the answers
This is the running time I got for the suggestions in the answers.
>> b = 1:1000000;
>> tic; for i=1:100000, any(b==i); end; toc
Elapsed time is 125.925922 seconds.
>> s = java.util.HashSet();
>> for i=1:1000000, s.add(i); end
>> tic; for i=1:100000, s.contains(i); end; toc
Elapsed time is 25.618276 seconds.
>> m = containers.Map(1:1000000,ones(1,1000000));
>> tic; for i=1:100000, m(i); end; toc
Elapsed time is 2.715635 seconds.
The construction of the Java set was quite slow as well, so depending on the problem this might be a bottleneck too. Really glad about the containers.Map tip. It really destroys the other examples, and it was instant to set up too.
Like this?
>> m = java.util.HashMap;
>> m.put(1,'hello,world');
>> m.get(1)
ans =
hello,world
Alternatively, if you want a Matlab-native implementation, try
>> m = containers.Map;
>> m('one') = 1;
>> m('one')
ans =
1
This is actually typed - the only keys it will accept are those of type char. You can specify the key and value type when you create the map:
>> m = containers.Map('KeyType','int32','ValueType','double');
>> m(1) = 3.14;
>> m(1)
ans =
3.14
You will now get errors if you try to put any key other than an int32 or any value other than a double.
You also have Sets available to you:
>> s = java.util.HashSet;
>> s.add(1);
>> s.contains(1)
ans =
1
>> s.contains(2)
ans =
0
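If you would rather stay with Matlab-native types, a containers.Map whose values are ignored can serve as a set, with isKey as the membership test. This is a sketch rather than a built-in set type; primeList and primeSet are names I chose (primeList also avoids shadowing the built-in primes function). For checking many values in one go, ismember on the plain vector is often enough.

```matlab
primeList = [2 3 5 7 11 13];

% Use the keys of a map as the set; the stored values are just placeholders
primeSet = containers.Map('KeyType', 'double', 'ValueType', 'logical');
for k = primeList
    primeSet(k) = true;
end

has3 = isKey(primeSet, 3);    % membership test for a single value
has4 = isKey(primeSet, 4);

% Vectorized alternative for testing several values at once
batch = ismember([3 4 13], primeList);
```

Which of the two is faster depends on the set size and on how many lookups are made, so it is worth timing both on realistic data.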
Depending on how literal your example is, the disp will be a massive overhead (I/O is very slow).
That aside, I believe the quickest way to do a check like this is:
if find(primes==3,1,'first')
disp('yes');
else
disp('no');
end
Edit, you could also use any(primes==3) - a quick speed test shows they're approximately equivalent:
>> biglist = 1:100000;
>> tic;for i=1:10000
find(biglist==i,1,'first');
end
toc
Elapsed time is 1.055928 seconds.
>> tic;for i=1:10000
any(biglist==i);
end
toc
Elapsed time is 1.054392 seconds.