Dataset array indexing is very slow with Statistics Toolbox - matlab

Why is indexing into a dataset array so slow? A peak into the dataset.subsref function shows that all the columns of the dataset are stored in a cell array. However, cell indexing is much, much faster than dataset indexing, which is just indexing into a cell array under the hood. My guess is that this has to do with some overhead with MATLAB OOP. Any ideas on how to speed this up?
%% Using R2011a, PCWIN64
feature accel off; % turn off JIT
dat = (1:1e6)';
dat2 = repmat({'abc'}, 1e6, 1);
celldat = {dat dat2};
ds = dataset(dat, dat2);
N = 1e2;
tic;
for j = 1:N
tmp = celldat{2};
end
toc;
tic;
for j = 1:N
tmp2 = ds.dat2; % 2.778sec spent on line 262 of dataset.subsref
end
toc;
feature accel on; % turn JIT back on
Elapsed time is 0.000165 seconds.
Elapsed time is 2.778995 seconds.
EDIT: I've updated the example to be more like the problem I'm seeing. A huge amount of time is spent on line 262 of dataset.subsref - "b = a.data{varIndex};". It's very strange to me since it is a simple cell dereference. I'm wondering if there is a OOP trick that will allow me to index into "a.data" without the strange overhead.
EDIT2: As per Andrew's suggestion, I've submitted this as a bug to MatWorks. Will update if I hear anything from them.
EDIT3: Matlab responded and said they are aware of the problem now and will fix it in a future release. They noted that the problem is specific to cell arrays, and to try to avoid them if possible.

Yes, you are most likely seeing the overhead of Matlab OOP method calls. They are expensive compared to cell indexing, or method calls in some other languages. Your .513872 seconds / 1e4 ~= 51 microseconds per call, which is the approximate cost of a few MCOS method calls; they're ~5-15 microsececonds each on machines I've seen. So that looks like method overhead of the subsref() call itself and other methods and property accesses it's calling in turn.
For some details and discussion, see: Is MATLAB OOP slow or am I doing something wrong?
I don't know of a way to make this faster, aside from structuring your code to minimize calls to "ds.dat" or other methods. If possible, when working with the data set, call "ds.dat" once, keep it in a local variable and work with it there, and then push it back in to the ds object.
Caveat: I don't know what "feature accel" does or how it could affect these timings.
Edit: I threw it in the profiler like Richie suggested. On my R2009b, looks like about half the time is method call overhead, and the rest in find(), strcmp(), and other operations inside subsref; subsref doesn't call any other methods in turn.
Edit 2: The revised example is showing much higher timings. Method call overhead doesn't account for all that.

Related

Matlab break iteration if line execution requires too much time

I run a Matlab code with an iteration loop, and during each iteration it sample random numbers and uses the function intlinprog. My issue is that, due to the large amount of data I provide to the intlinprog function and to the stochastic values I assign to part of its variables, some of the iterations take a really long time.
My code is more or less like this:
rounds = 1E3;
Total_PF = zeros(rounds,4893);
for i=1:rounds
i
cT = zeros (4894,1);
cT(4894,1) = 1;
xint = linspace(1,4893,4893);
xint = xint';
AT = rand(4,4894);
beT = ones(4,1);
lb = zeros(4894,1);
ub = ones (4894,1);
ub(4894,1) = Inf;
[x] = intlinprog(cT,xint,AT,beT,[],[],lb,ub);
Total_PF(i,:)= (x(1:length(x)-1)');
end
Now in the minimal working example I provided, all the iterations are quite fast, but in my real code, sometimes intlinprog takes really long time ( I mean hours) to do a single iteration.
Therefore, I was wondering: is there a way to break the intlinprog while the intlinprog line is being executed? I was thinking that it may be done by modifying the matlab function but first of all I do not know if I am allowed to do it, secondly I am afraid that may be very dangerous.
This is very difficult to do effectively.
You could try using a timer object to watch the value. However, since you are using inherent Matlab functions and not executing external functions, you could set a time limit value and utilize tic and toc in a while loop while you execute the intlinprog and checking the value of toc against your time limit and break your code if toc exceeds the limit.

advice with pointers in matlab

I am running a very large meta-simulation where I go through two hyperparameters (lets say x and y) and for each set of hyperparameters (x_i & y_j) I run a modest sized subsimulation. Thus:
for x=1:I
for y=1:j
subsimulation(x,y)
end
end
For each subsimulation however, about 50% of the data is common to every other subsimulation, or subsimulation(x_1,y_1).commondata=subsimulation(x_2,y_2).commondata.
This is very relevant since so far the total simulation results file size is ~10Gb! Obviously, I want to save the common subsimulation data 1 time to save space. However, the obvious solution, being to save it in one place would screw up my plotting function, since it directly calls subsimulation(x,y).commondata.
I was wondering whether I could do something like
subsimulation(x,y).commondata=% pointer to 1 location in memory %
If that cant work, what about this less elegant solution:
subsimulation(x,y).commondata='variable name' %string
and then adding
if(~isstruct(subsimulation(x,y).commondata)),
subsimulation(x,y).commondata=eval(subsimulation(x,y).commondata)
end
What solution do you guys think is best?
Thanks
DankMasterDan
You could do this fairly easily by defining a handle class. See also the documentation.
An example:
classdef SimulationCommonData < handle
properties
someData
end
methods
function this = SimulationCommonData(someData)
% Constructor
this.someData = someData;
end
end
end
Then use like this,
commonData = SimulationCommonData(something);
subsimulation(x, y).commondata = commonData;
subsimulation(x, y+1).commondata = commonData;
% These now point to the same reference (handle)
As per my comment, as long as you do not modify the common data, you can pass it as third input and still not copy the array in memory on each iteration (a very good read is Internal Matlab memory optimizations). This image will clarify:
As you can see, the first jump in memory is due to the creation of common and the second one to the allocation of the output c. If the data were copied on each iteration, you would have seen many more memory fluctuations. For instance, a third jump, then a decrease, then back up again and so on...
Follows the code (I added a pause in between each iteration to make it clearer that no big jumps occur during the loop):
function out = foo(a,b,common)
out = a+b+common;
end
for ii = 1:10; c = foo(ii,ii+1,common); pause(2); end

Timing program execution in MATLAB; weird results

I have a program which I copied from a textbook, and which times the difference in program execution runtime when calculating the same thing with uninitialized, initialized array and vectors.
However, although the program runs somewhat as expected, if running several times every once in a while it will give out a crazy result. See below for program and an example of crazy result.
clear all; clc;
% Purpose:
% This program calculates the time required to calculate the squares of
% all integers from 1 to 10000 in three different ways:
% 1. using a for loop with an uninitialized output array
% 2. Using a for loop with a pre-allocated output array
% 3. Using vectors
% PERFORM CALCULATION WITH AN UNINITIALIZED ARRAY
% (done only once because it is so slow)
maxcount = 1;
tic;
for jj = 1:maxcount
clear square
for ii = 1:10000
square(ii) = ii^2;
end
end
average1 = (toc)/maxcount;
% PERFORM CALCULATION WITH A PRE-ALLOCATED ARRAY
% (averaged over 10 loops)
maxcount = 10;
tic;
for jj = 1:maxcount
clear square
square = zeros(1,10000);
for ii = 1:10000
square(ii) = ii^2;
end
end
average2 = (toc)/maxcount;
% PERFORM CALCULATION WITH VECTORS
% (averaged over 100 executions)
maxcount = 100;
tic;
for jj = 1:maxcount
clear square
ii = 1:10000;
square = ii.^2;
end
average3 = (toc)/maxcount;
% Display results
fprintf('Loop / uninitialized array = %8.6f\n', average1)
fprintf('Loop / initialized array = %8.6f\n', average2)
fprintf('Vectorized = %8.6f\n', average3)
Result - normal:
Loop / uninitialized array = 0.195286
Loop / initialized array = 0.000339
Vectorized = 0.000079
Result - crazy:
Loop / uninitialized array = 0.203350
Loop / initialized array = 973258065.680879
Vectorized = 0.000102
Why is this happening ?
(sometimes the crazy number is on vectorized, sometimes on loop initialized)
Where did MATLAB "find" that number?
That is indeed crazy. Don't know what could cause it, and was unable to reproduce on my own Matlab R2010a copy over several runs, invoked by name or via F5.
Here's an idea for debugging it.
When using tic/toc inside a script or function, use the "tstart = tic" form that captures the output. This makes it safe to use nested tic/toc calls (e.g. inside called functions), and lets you hold on to multiple start and elapsed times and examine them programmatically.
t0 = tic;
% ... do some work ...
te = toc(t0); % "te" for "time elapsed"
You can use different "t0_label" suffixes for each of the tic and toc returns, or store them in a vector, so you preserve them until the end of your script.
t0_uninit = tic;
% ... do the uninitialized-array test ...
te_uninit = toc(t0_uninit);
t0_prealloc = tic;
% ... test the preallocated array ...
te_prealloc = toc(t0_prealloc);
Have the script break in to the debugger when it finds one of the large values.
if any([te_uninit te_prealloc te_vector] > 5)
keyboard
end
Then you can examine the workspace and the return values from tic, which might provide some clues.
EDIT: You could also try testing tic() on its own to see if there's something odd with your system clock, or whatever tic/toc is calling. tic()'s return value looks like a native timestamp of some sort. Try calling it many times in a row and comparing the subsequent values. If it ever goes backwards, that would be surprising.
function test_tic
t0 = tic;
for i = 1:1000000
t1 = tic;
if t1 <= t0
fprintf('tic went backwards: %s to %s\n', num2str(t0), num2str(t1));
end
t0 = t1;
end
On Matlab R2010b (prerelease), which has int64 math, you can reproduce a similar ridiculous toc result by jiggering the reference tic value to be "in the future". Looks like an int rollover effect, as suggested by gary comtois.
>> t0 = tic; toc(t0+999999)
Elapsed time is 6148914691.236258 seconds.
This suggests that if there were some jitter in the timer that toc were using, you might get rollover if it occurs while you're timing very short operations. (I assume toc() internally does something like tic() to get a value to compare the input to.) Increasing the number of iterations could make the effect go away because a small amount of clock jitter would be less significant as part of longer tic/toc periods. Would also explain why you don't see this in your non-preallocated test, which takes longer.
UPDATE: I was able to reproduce this behavior. I was working on some unrelated code and found that on one particular desktop with a CPU model we haven't used before, a Core 2 Q8400 2.66GHz quad core, tic was giving inaccurate results. Looks like a system-dependent bug in tic/toc.
On this particular machine, tic/toc will regularly report bizarrely high values like yours.
>> for i = 1:50000; t0 = tic; te = toc(t0); if te > 1; fprintf('elapsed: %.9f\n', te); end; end
elapsed: 6934787980.471930500
elapsed: 6934787980.471931500
elapsed: 6934787980.471899000
>> for i = 1:50000; t0 = tic; te = toc(t0); if te > 1; fprintf('elapsed: %.9f\n', te); end; end
>> for i = 1:50000; t0 = tic; te = toc(t0); if te > 1; fprintf('elapsed: %.9f\n', te); end; end
elapsed: 6934787980.471928600
elapsed: 6934787980.471913300
>>
It goes past that. On this machine, tic/toc will regularly under-report elapsed time for operations, especially for low CPU usage tasks.
>> t0 = tic; c0 = clock; pause(4); toc(t0); fprintf('Wall time is %.6f seconds.\n', etime(clock, c0));
Elapsed time is 0.183467 seconds.
Wall time is 4.000000 seconds.
So it looks like this is a bug in tic/toc that is related to particular CPU models (or something else specific to the system configuration). I've reported the bug to MathWorks.
This means that tic/toc is probably giving you inaccurate results even when it doesn't produce those insanely large numbers. As a workaround, on this machine, use etime() instead, and time only longer chunks of work to compensate for etime's lower resolution. You could wrap it in your own tick/tock functions that use the for i=1:50000 test to detect when tic is broken on the current machine, use tic/toc normally, and have them warn and fall back to using etime() on broken-tic systems.
UPDATE 2012-03-28: I've seen this in the wild for a while now, and it's highly likely due to an interaction with the CPU's high resolution performance timer and speed scaling, and (on Windows) QueryPerformanceCounter, as described here: http://support.microsoft.com/kb/895980/. It is not a bug in tic/toc, the issue is in the OS features that tic/toc is calling. Setting a boot parameter can work around it.
Here's my theory about what might be happening, based on these two pieces of data I found:
There is a function maxNumCompThreads which controls the maximum number of computational threads used by MATLAB to perform tasks. Quoting the documentation:
By default, MATLAB makes use of the
multithreading capabilities of the
computer on which it is running.
Which leads me to think that perhaps multiple copies of your script are running at the same time.
This newsgroup thread discusses a bug in an older version of MATLAB (R14) "in the way that MATLAB accelerates M-code with global structure variables", which it appears the TIC/TOC functions may use. The solution there was to disable the accelerator using the undocumented FEATURE function:
feature accel off
Putting these two things together, I'm wondering if the multiple versions of your script that are running in the workspace may be simultaneously resetting global variables used by the TIC/TOC functions and screwing one another up. Maybe this isn't a problem when converting your script to a function as Amro did since this would separate the workspaces that the two programs are running in (i.e. they wouldn't both be running in the main workspace).
This could also explain the exceedingly large numbers you get. As gary and Andrew have pointed out, these numbers appear to be due to an integer roll-over effect (i.e. an integer overflow) whereby the starting time (from TIC) is larger than the ending time (from TOC). This would result in a huge number that is still positive because TIC/TOC are internally using unsigned 64-bit integers as time measures. Consider the following possible scenario with two scripts running at the same time on different threads:
The first thread calls TIC, initializing a global variable to a starting time measure (i.e. the current time).
The first thread then calls TOC, and the immediate action the TOC function is likely to make is to get the current time measure.
The second thread calls TIC, resetting the global starting time measure to the current time, which is later than the time just measured by the TOC function for the first thread.
The TOC function for the first thread accesses the global starting time measure to get the difference between it and the measure it previously took. This difference would result in a negative number, except that the time measures are unsigned integers. This results in integer overflow, giving a huge positive number for the time difference.
So, how might you avoid this problem? Changing your scripts to functions like Amro did is probably the best choice, as that seems to circumvent the problem and keeps the workspace from becoming cluttered. An alternative work-around you could try is to set the maximum number of computational threads to one:
maxNumCompThreads(1);
This should keep multiple copies of your script from running at the same time in the main workspace.
There are at least two possible error sources. Can you try to differentiate between 'tic/toc' and 'fprintf' by just looking at the computed values without formatting them.
I don't understand the braces around 'toc' but they shouldn't do any harm.
Here is a hypothesis which is testable. Matlab's tic()/toc() have to be using some high-resolution timer. On Windows, because their return value looks like clock cycles, I think they're using the Win32 QueryPerformanceCounter() call, or maybe something else hitting the CPU's RDTSC time stamp counter. These apparently have glitches on some multiprocessor systems, mentioned in the linked articles. Perhaps your machine is one of those, getting different results if the Matlab process is moved from core to core by the process scheduler.
http://msdn.microsoft.com/en-us/library/ms644904(VS.85).aspx
http://www.virtualdub.org/blog/pivot/entry.php?id=106
This would be hardware and system configuration dependent, which would explain why other posters haven't been able to reproduce it.
Try using Windows Task Manager to set the affinity on your Matlab.exe process to a single CPU. (On the Processes tab, right-click MATLAB.exe, "Set affinity...", un-check all but CPU 0.) If the crazy timing goes away while affinity is set, looks like you found the cause.
Regardless, the workaround looks like to just increase maxcount so you're timing longer pieces of work, and the noise you're apparently getting in tic()/toc() is small compared to the measured value. (You don't want to have to muck around with CPU affinity; Matlab is supposed to be easy to run.) If there's a problem in there that's causing int overflow, the other small positive numbers are a bit suspect too. Besides, hi-res timing in a high level language like Matlab is a bit problematic. Timing workloads down to a couple hundred microseconds subjects them to noise from other transient conditions in your machine's state.

Growable data structure in MATLAB

I need to create a queue in matlab that holds structs which are very large. I don't know how large this queue will get. Matlab doesn't have linked lists, and I'm worried that repeated allocation and copying is really going to slow down this code which must be run thousands of times. I need some sort of way to use a growable data structure. I've found a couple of entries for linked lists in the matlab help but I can't understand what's going on. Can someone help me with this problem?
I posted a solution a while back to a similar problem. The way I tried it is by allocating the array with an initial size BLOCK_SIZE, and then I keep growing it by BLOCK_SIZE as needed (whenever there's less than 10%*BLOCK_SIZE free slots).
Note that with an adequate block size, performance is comparable to pre-allocating the entire array from the beginning. Please see the other post for a simple benchmark I did.
Just create an array of structs and double the size of the array when it hits the limit. This scales well.
If you're worried that repeated allocation and copying is going to slow the code down, try it. It may in fact be very slow, but you may be pleasantly surprised.
Beware of premature optimization.
Well, I found the easy answer:
L = java.util.LinkedList;
I think the built-in cell structure would be suitable for storing growable structures.
I made a comparison among:
Dynamic size cell, size of the cell changes every loop
Pre-allocated cell
Java LinkedList
Code:
clear;
scale = 1000;
% dynamic size cell
tic;
dynamic_cell = cell(0);
for ii = 1:scale
dynamic_cell{end + 1} = magic(20);
end
toc
% preallocated cell
tic;
fixed_cell = cell(1, scale);
for ii = 1:scale
fixed_cell{ii} = magic(20);
end
toc
% java linked list
tic;
linked_list = java.util.LinkedList;
for ii = 1:scale
linked_list.add(magic(20));
end
toc;
Results:
Elapsed time is 0.102684 seconds. % dynamic
Elapsed time is 0.091507 seconds. % pre-allocated
Elapsed time is 0.189757 seconds. % Java LinkedList
I change scale and magic(20) and find the dynamic and pre-allocated versions are very close on speed. Maybe cell only stores pointer-like structures and is efficient on resizing.
The Java way is slower. And I find it sometimes unstable (it crashes my MATLAB when the scale is very large).

Turning a binary matrix into a vector of the last nonzero index in a fast, vectorized fashion

Suppose, in MATLAB, that I have a matrix, A, whose elements are either 0 or 1.
How do I get a vector of the index of the last non-zero element of each column in a faster, vectorized way?
I could do
[B, I] = max(cumsum(A));
and use I, but is there a faster way? (I'm assuming cumsum would cost a bit of time even suming 0's and 1's).
Edit: I guess that I vectorized even more than I need fast - Mr. Fooz' loop is great but each loop in MATLAB seems to cost me a lot in debugging time even if it is fast.
Fast is what you should worry about, not necessarily full vectorization. Recent versions of Matlab are much smarter about handling loops efficiently. If there's a compact vectorized way of expressing something, it's usually faster, but loops should not (always) be feared like they used to be.
clc
A = rand(5000)>0.5;
A(1,find(sum(A,1)==0)) = 1; % make sure there is at least one match
% Slow because it is doing too much work
tic;[B,I1]=max(cumsum(A));toc
% Fast because FIND is fast and it runs the inner loop
tic;
I3=zeros(1,5000);
for i=1:5000
I3(i) = find(A(:,i),1,'last');
end
toc;
assert(all(I1==I3));
% Even faster because the JIT in Matlab is smart enough now
tic;
I2=zeros(1,5000);
for i=1:5000
I2(i) = 0;
for j=5000:-1:1
if A(j,i)
I2(i) = j;
break;
end
end
end
toc;
assert(all(I1==I2));
On R2008a, Windows, x64, the cumsum version takes 0.9 seconds. The loop and find version takes 0.02 seconds. The double loop version takes a mere 0.001 seconds.
EDIT: Which one is fastest depends on the actual data. The double-loop takes 0.05 seconds when you change the 0.5 to 0.999 (because it takes longer to hit the break; on average). cumsum and the loop&find implementation have more consistent speeds.
EDIT 2: gnovice's flipud solution is clever. Unfortunately, on my test machine it takes 0.1 seconds, so it's much faster than cumsum, but slower than the looped versions.
As shown by Mr Fooz, for loops can be pretty fast now with newer versions of MATLAB. However, if you really want to have compact vectorized code, I would suggest trying this:
[B,I] = max(flipud(A));
I = size(A,1)-I+1;
This is faster than your CUMSUM-based answer, but still not quite as fast as Mr Fooz's looping options.
Two additional things to consider:
What results do you want to get for a column that has no ones in it at all? With the above option I gave you, I believe you will get an index of size(A,1) (i.e. the number of rows in A) in such a case. For your option, I believe you will get a 1 in such a case, while the nested-for-loops option from Mr Fooz will give you a 0.
The relative speed of these different options will likely vary based on the size of A and the number of non-zeroes you expect it to have.