I am measuring the CPU time taken by the kmeans algorithm for each iteration using the cputime function. However, some iterations return a CPU time of 0. Here's my implementation:
load fisheriris;
[~,C] = kmeans(meas, 3, 'options', statset('MaxIter',1), 'Display', 'off');
results = [];
for i = 1:15
    t = cputime;
    [~,C] = kmeans(meas, 3, 'options', statset('MaxIter',1), 'Start', C, 'Display', 'off');
    elapsedCPUTime = cputime - t;
    results = [results; elapsedCPUTime];
end
These are the results I got for 15 iterations: 0, 0, 0.046875, 0, 0, 0, 0, 0, 0.03125, 0, 0, 0, 0, 0, 0.03125. My first thought is that the computation was too quick, hence 0 seconds. Is that true? If so, how can we get a more precise CPU time measurement?
Thanks a lot.
From the documentation:
To measure performance, it is recommended that you use the timeit or
tic and toc functions. For more information, see Using tic and toc Versus the cputime Function.
Following that link:
Time a significant enough portion of code. Ideally, the code you are timing should take more than 1/10 second to run.
Try running kmeans in a loop.
load fisheriris;
[~,C] = kmeans(meas, 3, 'options', statset('MaxIter',1), 'Display', 'off');
results = [];
for ii = 1:15
    t = cputime;
    for k = 1:100
        [~,Ctemp] = kmeans(meas, 3, 'options', statset('MaxIter',1), 'Start', C, 'Display', 'off');
    end
    elapsedCPUTime = (cputime - t)/100;
    C = Ctemp;
    results = [results; elapsedCPUTime];
end
Using i as the loop variable can shadow the imaginary unit and yield unexpected results, which is why I changed it to ii.
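Alternatively, timeit takes care of the repetition for you; note that it measures elapsed (wall-clock) time rather than CPU time. A minimal sketch with the same kmeans call:
load fisheriris;
[~,C] = kmeans(meas, 3, 'options', statset('MaxIter',1), 'Display', 'off');
f = @() kmeans(meas, 3, 'options', statset('MaxIter',1), 'Start', C, 'Display', 'off');
t = timeit(f)   % median time per call, in seconds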
I am learning MATLAB and am now using the function chirp.
freq = 1/11025; duration = 1.5; c = 0:freq:duration;
y = chirp(c,0,150,duration)
The problem is that it doesn't stop at 1.5; instead it stops at 1.65, but I don't know why.
Your interpretation of the chirp() function is not correct: in chirp(t,f0,t1,f1) the third argument is the time at which the sweep reaches the target frequency f1, so chirp(c,0,150,duration) asks for a sweep that reaches 1.5 Hz at t = 150 s. Here is how you can create a fully customizable chirp via the dsp.Chirp System object:
hChirp = dsp.Chirp(...
'TargetFrequency', 10, ...
'InitialFrequency', 0,...
'TargetTime', 10, ...
'SweepTime', 10, ...
'SamplesPerFrame', 10000, ...
'SampleRate', 1000);
plot(hChirp()); set(gcf, 'color', 'w'), grid on;
title('Chirp to 10 Hz')
Plotting hChirp() shows the resulting signal, which sweeps from 0 Hz up to 10 Hz over the 10 s sweep time.
You can refer to the documentation for further detail. This should be a more rigorous way of defining your signal.
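If the goal is simply a sweep from 0 Hz to 150 Hz over the 1.5 s vector, a minimal sketch with the built-in chirp and the corrected argument order chirp(t, f0, t1, f1):
fs = 11025;                 % sample rate implied by freq = 1/11025 in the question
t = 0:1/fs:1.5;             % 1.5 s time vector
y = chirp(t, 0, 1.5, 150);  % instantaneous frequency is 0 Hz at t = 0 and 150 Hz at t = 1.5 s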
I'm trying to solve a cone program in Matlab by calling MOSEK while varying the bound on one of the constraints.
I would like to do so in parallel to take advantage of all the cores that I have. Here is a modified example to illustrate my point.
testBounds=[0.1, 0.15, 0.2, 0.25, 0.3];
clear prob;
[r, res] = mosekopt('symbcon');
prob.c = [0 0 0 1 1 1];
% Specify the non-conic part of the problem.
prob.a = sparse([1 1 2 0 0 0]);
prob.buc = 1;
prob.blx = [0 0 0 -inf -inf -inf];
prob.bux = inf*ones(6,1);
% Specify the cones.
prob.cones.type = [res.symbcon.MSK_CT_QUAD, res.symbcon.MSK_CT_RQUAD];
prob.cones.sub = [4, 1, 2, 5, 6, 3];
prob.cones.subptr = [1, 4];
for i = 1:5
    % Specify the changing bound
    prob.blc = testBounds(i);
    % Optimize the problem.
    [r, res] = mosekopt('minimize', prob);
    % Display the primal solution.
    res.sol.itr.xx'
end
I tried to do this with parfor, but it isn't permitted. Unfortunately, the MOSEK documentation doesn't go into detail about parallelizing. How can I carry out the above in parallel?
The problem with your code is your use of the variable prob. While on an algorithmic level each iteration is independent, because it uses its own setting for blc and does not use any data from previous iterations, parfor does not support this usage. The easiest solution is not to modify the variable prob but to copy it in each iteration, making prob a broadcast variable and prob2 a local variable:
parfor ii = 1:5
    % Copy the broadcast variable prob to a temporary variable prob2;
    % this way the iteration has its own writable copy.
    prob2 = prob;
    % Specify the changing bound.
    prob2.blc = testBounds(ii);
    % Optimize the problem.
    [r, res] = mosekopt('minimize', prob2);
    % Display the primal solution.
    res.sol.itr.xx'
end
Another issue with your code is the way you return data. parfor does not process iterations in any particular order, so just displaying results to the console will not give you anything useful, and it is also slow. I don't know exactly what you need or what the datatypes are, so I have not touched that part of the code. The code in my answer does the calculations but does not return any results, because res and r are both temporary variables.
N = 5;
r = cell(1,N);
res = cell(1,N);
parfor ii = 1:N
    % Specify the changing bound
    prob2 = prob;
    prob2.blc = testBounds(ii);
    % Optimize the problem.
    [r{ii}, res{ii}] = mosekopt('minimize', prob2);
    % res{ii}.sol.itr.xx'  % better not to display at all during the calculation
end
To get results out of parfor you cannot simply create output variables inside the loop, so I preallocated r and res as cell arrays, which can act as the output storage. See the prob/prob2 issue in @Daniel's answer.
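A minimal sketch of collecting the primal solutions afterwards, assuming each run returns an interior-point solution res{ii}.sol.itr.xx with six entries, as in the problem above:
xx = zeros(6, N);            % one column of primal values per tested bound
for ii = 1:N
    xx(:, ii) = res{ii}.sol.itr.xx;
end
disp(xx)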
Yesterday I implemented my first bootstrap in MATLAB (and yes, I know, for loops are evil):
% data is an m-by-n matrix; the data should be sampled per column, but there
% can be NaN elements
% from the array (a column of data), n values are sampled nReps times
function result = bootstrap_std(data, n, nReps, quantil)
result = zeros(1, size(data,2));
for i = 1:size(data,2)
    bootstrap_data = zeros(n, nReps);
    values = find(~isnan(data(:,i)));
    if isempty(values)
        bootstrap_data(:,:) = NaN;
    else
        for k = 1:nReps
            bootstrap_data(:,k) = datasample(data(values,i), n);
        end
    end
    stat = zeros(1, nReps);
    for k = 1:nReps
        stat(k) = nanstd(bootstrap_data(:,k));
    end
    sort(stat);
    result(i) = quantile(stat, quantil);
end
end
As one can see, this version works column-wise. The algorithm does what it should, but it is really slow when the data size increases. My question is now: is it possible to implement this logic without using for loops? My problem here is that I could not find a version of datasample which does the sampling column-wise. Or is there a better function to use?
I am happy for any hint or idea on how I can speed up this implementation.
Thanks and best regards!
stephan
The bottlenecks in your implementation are:
The function spends a lot of time inside nanstd which is unnecessary since you exclude NaN values from your sample anyway.
There are a lot of functions that operate column-wise, but you spend time looping over the columns and calling them many times.
You make many calls to datasample which is a relatively slow function. It's much faster to create a random vector of indices using randi and use that instead.
Here's how I would write the function (actually I probably wouldn't put in this many comments, and I wouldn't use so many temp variables, but I'm doing it now so you can see what all the steps of the computation are).
function result = bootstrap_std_new(data, n, nRep, quantil)
result = zeros(1, size(data,2));
for i = 1:size(data,2)
    isbad = isnan(data(:,i));                  % Vector of NaN values
    if all(isbad)
        result(i) = NaN;
    else
        data0 = data(~isbad, i);               % Temp copy of this column for indexing
        index = randi(size(data0,1), n, nRep); % Create the indexing vector
        bootstrapdata = data0(index);          % Sample the data
        stdevs = std(bootstrapdata);           % Stdev of sampled data
        result(i) = quantile(stdevs, quantil); % Find the correct quantile
    end
end
end
Here are some timings
>> data = randn(100,10);
>> data(randi(1000, 50, 1)) = NaN;
>> tic, bootstrap_std(data, 50, 1000, 0.5); toc
Elapsed time is 1.359529 seconds.
>> tic, bootstrap_std_new(data, 50, 1000, 0.5); toc
Elapsed time is 0.038558 seconds.
So this gives you about a 35x speedup.
Your main issue seems to be that you may have varying numbers/positions of NaN values in each column, so you can't work on the full matrix unless you're okay with also sampling NaNs. However, some of the inner loops could be simplified.
for k=1:nReps
bootstrap_data(:,k) = datasample(data(values,i),n);
end
Since you're sampling with replacement, you should be able to just do:
bootstrap_data = datasample(data(values,i), n*nReps);
bootstrap_data = reshape(bootstrap_data, [n nReps]);
Also nanstd can work on a full matrix so no need to loop:
stat = nanstd(bootstrap_data); % or nanstd(x,0,2) to change dimension
It would also be worth just looking over your code with profile to see where the bottlenecks are.
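A minimal profiling sketch, using example arguments:
profile on
bootstrap_std(data, 50, 1000, 0.5);   % data as in your workspace; n, nReps, quantil are examples
profile viewer                        % opens a report showing the time spent on each line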
I have a problem with training a model using the PASCAL dev kit with the Discriminatively Trained Deformable Part Models system developed by P. Felzenszwalb, D. McAllester, and D. Ramanan, which is implemented in MATLAB.
Currently I get this error when I try to train a 1-component model for 'cat' using 10 positive and 10 negative images.
Error:
??? Index exceeds matrix dimensions.
Error in ==> pascal_train at 48
models{i} = train(cls, models{i}, spos{i}, neg(1:maxneg),
0, 0, 4, 3, ...
Error in ==> pascal at 28
model = pascal_train(cls, n, note);
Here is the pascal_train file:
function model = pascal_train(cls, n, note)
% model = pascal_train(cls, n, note)
% Train a model with 2*n components using the PASCAL dataset.
% note allows you to save a note with the trained model
% example: note = 'testing FRHOG (FRobnicated HOG)

% At every "checkpoint" in the training process we reset the
% RNG's seed to a fixed value so that experimental results are
% reproducible.
initrand();

if nargin < 3
  note = '';
end

globals;
[pos, neg] = pascal_data(cls, true, VOCyear);
% split data by aspect ratio into n groups
spos = split(cls, pos, n);

cachesize = 24000;
maxneg = 200;

% train root filters using warped positives & random negatives
try
  load([cachedir cls '_lrsplit1']);
catch
  initrand();
  for i = 1:n
    % split data into two groups: left vs. right facing instances
    models{i} = initmodel(cls, spos{i}, note, 'N');
    inds = lrsplit(models{i}, spos{i}, i);
    models{i} = train(cls, models{i}, spos{i}(inds), neg, i, 1, 1, 1, ...
                      cachesize, true, 0.7, false, ['lrsplit1_' num2str(i)]);
  end
  save([cachedir cls '_lrsplit1'], 'models');
end

% train root left vs. right facing root filters using latent detections
% and hard negatives
try
  load([cachedir cls '_lrsplit2']);
catch
  initrand();
  for i = 1:n
    models{i} = lrmodel(models{i});
    models{i} = train(cls, models{i}, spos{i}, neg(1:maxneg), 0, 0, 4, 3, ...
                      cachesize, true, 0.7, false, ['lrsplit2_' num2str(i)]);
  end
  save([cachedir cls '_lrsplit2'], 'models');
end

% merge models and train using latent detections & hard negatives
try
  load([cachedir cls '_mix']);
catch
  initrand();
  model = mergemodels(models);
48:  model = train(cls, model, pos, neg(1:maxneg), 0, 0, 1, 5, ...
                   cachesize, true, 0.7, false, 'mix');
  save([cachedir cls '_mix'], 'model');
end

% add parts and update models using latent detections & hard negatives.
try
  load([cachedir cls '_parts']);
catch
  initrand();
  for i = 1:2:2*n
    model = model_addparts(model, model.start, i, i, 8, [6 6]);
  end
  model = train(cls, model, pos, neg(1:maxneg), 0, 0, 8, 10, ...
                cachesize, true, 0.7, false, 'parts_1');
  model = train(cls, model, pos, neg, 0, 0, 1, 5, ...
                cachesize, true, 0.7, true, 'parts_2');
  save([cachedir cls '_parts'], 'model');
end

save([cachedir cls '_final'], 'model');
I have highlighted the line of code where the error occurs at line 48.
I am pretty sure that the system is reading in both the positive and negative images for training correctly. I have no idea where this error is occurring, since MATLAB does not indicate precisely which index is exceeding the matrix dimensions.
I have tried to tidy up the code as much as possible, so do guide me if I have done something wrong somewhere.
Any suggestions where I should start looking at?
OK, I used disp to check the variables in use in pascal_train:
disp(i);
disp(size(models));
disp(size(spos));
disp(length(neg));
disp(maxneg);
The results returned were:
i            : 1
size(models) : 1 1
size(spos)   : 1 1
length(neg)  : 10
maxneg       : 200
Just replace:
models{i} = train(cls, models{i}, spos{i}, neg(1:maxneg),
with
models{i} = train(cls, models{i}, spos{i}, neg(1:min(length(neg),maxneg)),
There are several similar statements at other places in this script; you should revise them all.
The reason is that your training sample set is small, so your list neg is shorter than maxneg (200).
I don't have an answer to your question, but here is a suggestion that might help you debug this problem yourself.
In the MATLAB menu, go to Debug -> Stop if Errors/Warnings... and select "Always stop if error (dbstop if error)". Now run your script again; this time, when you get the error, MATLAB will stop at the line where the error occurred, as if you had put a breakpoint there. At that point you have the whole workspace at your disposal and can check all variables and matrix sizes to see which variable is giving you the error you are seeing.
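The same thing can be done from the command line; a minimal sketch (the pascal call arguments here are just illustrative):
dbstop if error       % break into the debugger on any uncaught error
pascal('cat', 1, ''); % illustrative: class 'cat', n = 1, empty note
% when it stops at the failing train(...) call, inspect e.g. size(neg) and maxneg, then:
dbclear if error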
Given a MATLAB uint32 to be interpreted as a bit string, what is an efficient and concise way of counting how many nonzero bits are in the string?
I have a working, naive approach which loops over the bits, but that's too slow for my needs. (A C++ implementation using std::bitset count() runs almost instantly).
I've found a pretty nice page listing various bit counting techniques, but I'm hoping there is an easy MATLAB-esque way.
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
Update #1
Just implemented the Brian Kernighan algorithm as follows:
w = 0;
while ( bits > 0 )
    bits = bitand( bits, bits-1 );
    w = w + 1;
end
Performance is still crappy, over 10 seconds to compute just 4096^2 weight calculations. My C++ code using count() from std::bitset does this in subsecond time.
Update #2
Here is a table of run times for the techniques I've tried so far. I will update it as I get additional ideas/suggestions.
Vectorized Scheiner algorithm => 2.243511 sec
Vectorized Naive bitget loop => 7.553345 sec
Kernighan algorithm => 17.154692 sec
length( find( bitget( val, 1:32 ) ) ) => 67.368278 sec
nnz( bitget( val, 1:32 ) ) => 349.620259 sec
Justin Scheiner's algorithm, unrolled loops => 370.846031 sec
Justin Scheiner's algorithm => 398.786320 sec
Naive bitget loop => 456.016731 sec
sum(dec2bin(val) == '1') => 1069.851993 sec
Comment: The dec2bin() function in MATLAB seems to be very poorly implemented; it runs extremely slowly.
Comment: The "Naive bitget loop" algorithm is implemented as follows:
w = 0;
for i = 1:32
    if bitget( val, i ) == 1
        w = w + 1;
    end
end
Comment: The loop-unrolled version of Scheiner's algorithm looks as follows:
function w=computeWeight( val )
w = val;
w = bitand(bitshift(w, -1), uint32(1431655765)) + ...
bitand(w, uint32(1431655765));
w = bitand(bitshift(w, -2), uint32(858993459)) + ...
bitand(w, uint32(858993459));
w = bitand(bitshift(w, -4), uint32(252645135)) + ...
bitand(w, uint32(252645135));
w = bitand(bitshift(w, -8), uint32(16711935)) + ...
bitand(w, uint32(16711935));
w = bitand(bitshift(w, -16), uint32(65535)) + ...
bitand(w, uint32(65535));
I'd be interested to see how fast this solution is:
function r = count_bits(n)
shifts = [-1, -2, -4, -8, -16];
masks = [1431655765, 858993459, 252645135, 16711935, 65535];
r = n;
for i=1:5
r = bitand(bitshift(r, shifts(i)), masks(i)) + ...
bitand(r, masks(i));
end
Going back, I see that this is the 'parallel' solution given on the bithacks page.
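As a quick sanity check of the function above, an all-ones word should give 32 and a single power of two should give 1:
count_bits(uint32(intmax('uint32')))   % expected: 32
count_bits(uint32(2^20))               % expected: 1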
Unless this is a MATLAB implementation exercise, you might want to just take your fast C++ implementation and compile it as a mex function, once per target platform.
EDIT: NEW SOLUTION
It appears that you want to repeat the calculation for every element in a 4096-by-4096 array of UINT32 values. If this is what you are doing, I think the fastest way to do it in MATLAB is to use the fact that BITGET is designed to operate on matrices of values. The code would look like this:
numArray = ...your 4096-by-4096 matrix of uint32 values...
w = zeros(4096,4096,'uint32');
for iBit = 1:32,
w = w+bitget(numArray,iBit);
end
If you want to make vectorized versions of some of the other algorithms, I believe BITAND is also designed to operate on matrices.
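For example, a minimal sketch (assuming numArray as above) of vectorizing the Kernighan loop from the question over a whole uint32 matrix, since BITAND accepts matrix inputs:
bits = numArray;                                % uint32 matrix of values to count
w = zeros(size(bits), 'uint32');
while any(bits(:))
    nz = bits > 0;                              % elements with bits still set
    bits(nz) = bitand(bits(nz), bits(nz) - 1);  % clear the lowest set bit of each
    w(nz) = w(nz) + 1;
end
The while loop runs at most 32 times, regardless of the matrix size.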
The old solution...
The easiest way I can think of is to use the DEC2BIN function, which gives you the binary representation (as a string) of a non-negative integer:
w = sum(dec2bin(num) == '1'); % Sums up the ones in the string
It's slow, but it's easy. =)
I implemented the "Best 32 bit Algorithm" from the Stanford link at the top. The improved algorithm reduced processing time by 6%. I also optimized the segment size and found that 32K is stable and improves the time by 15% over 4K. Expect the 4K-by-4K time to be about 40% of the vectorized Scheiner algorithm.
function w = Ham(w)
% Input uint32
% Output vector of Ham wts
for i = 1:32768:length(w)
    w(i:i+32767) = Ham_seg(w(i:i+32767));
end
end

% Segmentation reduced time by 50%
function w = Ham_seg(w)
% speed
b1 = uint32(1431655765);
b2 = uint32(858993459);
b3 = uint32(252645135);
b7 = uint32(63); % working orig binary mask
w = bitand(bitshift(w, -1), b1) + bitand(w, b1);
w = bitand(bitshift(w, -2), b2) + bitand(w, b2);
w = bitand(w + bitshift(w, -4), b3);
w = bitand(bitshift(w, -24) + bitshift(w, -16) + bitshift(w, -8) + w, b7);
end
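A minimal usage sketch; note the segmentation above assumes length(w) is a multiple of 32768, which holds for a 4096*4096-element test vector:
vt = uint32(randi([0, 2^32-1], 4096*4096, 1));   % random 32-bit test words
tic, wts = Ham(vt); toc                          % per-element Hamming weights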
I did some timing comparisons on MATLAB Cody and determined that a segmented, modified, vectorized Scheiner algorithm gives optimum performance: a >50% time reduction, based on the Cody time dropping from 1.30 sec to 0.60 sec for an L = 4096*4096 vector.
function w = Ham(w)
% Input uint32
% Output vector of Ham wts
b1 = uint32(1431655765); % evaluating these here saves 15% of the time, 1.30 to 1.1 sec
b2 = uint32(858993459);
b3 = uint32(252645135);
b4 = uint32(16711935);
b5 = uint32(65535);
for i = 1:4096:length(w)
    w(i:i+4095) = Ham_seg(w(i:i+4095), b1, b2, b3, b4, b5);
end
end

% Segmentation reduced time by 50%
function w = Ham_seg(w, b1, b2, b3, b4, b5)
% Passing variables in; alternatively b1:b5 could be evaluated here
w = bitand(bitshift(w, -1), b1) + bitand(w, b1);
w = bitand(bitshift(w, -2), b2) + bitand(w, b2);
w = bitand(bitshift(w, -4), b3) + bitand(w, b3);
w = bitand(bitshift(w, -8), b4) + bitand(w, b4);
w = bitand(bitshift(w, -16), b5) + bitand(w, b5);
end
vt = randi(2^32, [4096*4096,1]) - 1;
% If vt were uint32, the floor method below would give unexpected values,
% because integer division rounds instead of truncating.
tic
v = num_ones(mod(vt,65536)+1) + num_ones(floor(vt/65536)+1); % 0.85 sec
toc
% A corrected method that also works for uint32 input:
tic
v = num_ones(mod(vt,65536)+1) + num_ones(floor(double(vt)/65536)+1);
toc
A fast approach is counting the bits in each byte using a lookup table, then summing these values; indeed, it's one of the approaches suggested on the web page given in the question. The nice thing about this approach is that both lookup and sum are vectorizable operations in MATLAB, so you can vectorize this approach and compute the hamming weight / number of set bits of a large number of bit strings simultaneously, very quickly. This approach is implemented in the bitcount submission on the MATLAB File Exchange.
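For reference, a minimal sketch of that byte-wise lookup-table idea (just an illustration of the technique, not the bitcount submission itself):
lut = uint8(sum(dec2bin(0:255) == '1', 2));       % bit counts for every possible byte value
vals = uint32(randi([0, 2^32-1], 4096*4096, 1));  % example input
% Split each uint32 into four bytes, look up each byte's count, and sum.
w = lut(bitand(vals, 255) + 1) + ...
    lut(bitand(bitshift(vals, -8), 255) + 1) + ...
    lut(bitand(bitshift(vals, -16), 255) + 1) + ...
    lut(bitshift(vals, -24) + 1);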
Try splitting the job into smaller parts. My guess is that if you process all the data at once, MATLAB performs each operation on all the integers before moving on to the next step, and the processor's cache is invalidated at each step.
for i=1:4096,
«process bits(i,:)»
end
I'm reviving an old thread here, but I ran across this problem and I wrote this little bit of code for it:
distance = sum(bitget(bits, 1:32));
Looks pretty concise, but I'm scared that bitget is implemented in O(n) bitshift operations. The code works for what I'm doing, but my problem set doesn't rely on Hamming weight.
num_ones = uint8(zeros(intmax('uint32')/2^6, 1));
% one-time load of array not implemented here
tic
for i = 1:4096*4096
    %v = num_ones(rem(i,64)+1) + num_ones(floor(i/64)+1); % 1.24 sec
    v = num_ones(mod(i,64)+1) + num_ones(floor(i/64)+1);  % 1.20 sec
end
toc
tic
num_ones = uint8(zeros(65536,1));
for i = 0:65535
    num_ones(i+1) = length( find( bitget( i, 1:32 ) ) );
end
toc
% 0.43 sec to load
% smaller array to initialize
% one-time load of array
tic
for i = 1:4096*4096
    v = num_ones(mod(i,65536)+1) + num_ones(floor(i/65536)+1);   % 0.95 sec
    %v = num_ones(mod(i,65536)+1) + num_ones(bitshift(i,-16)+1); % 16 sec for 4K*1K
end
toc
% vectorized
tic
num_ones = uint8(zeros(65536,1));
for i = 0:65535
    num_ones(i+1) = length( find( bitget( i, 1:32 ) ) );
end % 0.43 sec
toc
vt = randi(2^32, [4096*4096,1]) - 1;
tic
v = num_ones(mod(vt,65536)+1) + num_ones(floor(vt/65536)+1); % 0.85 sec
toc