clc;clear all;close all;
fileID = fopen('H:\dictionary.txt');
S = textscan(fileID,'%s','Delimiter','\n') ;
fclose(fileID);
S = S{1} ;
% remove empty cells
S = S(~cellfun('isempty',S));
n=length(S);
k=0;
for i=1:n
for j=1:n
k=k+1;
y(k,1)=strcat(S(i),S(j))
end
end
This is my code for SHA-1 hashing. I am having a problem with the for loop that generates all possible combinations, at the line
y(k,1)=strcat(S(i),S(j)).
It runs correctly, but it takes too long. I have been running this code for 2 days and it still has not finished, as my dictionary contains over 5000 words. Please suggest a faster and better way to generate the combinations and crack it.
Since you did not provide any data to test the code, I created my own test data, which is a cell array containing 400 words:
% create cell array with a lot of words
S = repmat({'lalala','bebebebe','ccececeece','ddededde'},1,100);
Here is the code with some small changes but with huge impact on the performance.
Note that the variable 'y' is here named 'yy' so that you can just copy and paste the code to compare it with your existing code:
% Preallocate memory by specifying variable and cell size
yy = cell(n^2,1);
% Measure time
tic
k = 0;
for i=1:n
for j=1:n
k=k+1;
% Replace strcat() with [] and index cell content with S{i} instead
% of indexing the cell itself with S(i)
yy{k}=[S{i},S{j}];
end
end
% Stop and output time measurement
toc
With my example data, your code took 7.78s to run and the proposed code took 0.23s on my computer.
I recommend reading the MATLAB documentation on preallocation.
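The double loop can also be avoided entirely. The following is a minimal sketch, assuming the words are held in a cell array S as above; the index variables ii and jj and the three-word list are illustrative only. It builds every index pair with ndgrid and lets strcat concatenate the two cell arrays element by element, which gives the same ordering as the nested loops.

```matlab
% Small stand-in word list (hypothetical data; replace with the real dictionary)
S = {'alpha'; 'beta'; 'gamma'};
S = S(:);                      % force a column so indexing returns n^2-by-1 cells
n = numel(S);

% jj varies fastest, ii slowest: the same order as "for i ... for j ..."
[jj, ii] = ndgrid(1:n, 1:n);

% Element-wise concatenation of two n^2-by-1 cell arrays
yy = strcat(S(ii(:)), S(jj(:)));
```

Note that 5000 words still yield 25 million combinations, so memory, and not only run time, may become the limiting factor.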
I am experimenting with MATLAB SPMD. However, I have the following problem to solve:
I am running a quite long algorithm and I would like to save the progress along the way in case the power gets cut, someone unplugs the power plug, or a memory error occurs.
The loop has 144 iterations that take each around 30 minutes to complete => 72h. A lot of problems can occur in that interval.
Of course, I have the distributed computing toolbox on my machine. The computer has 4 physical cores. I run MATLAB R2016a.
I do not really want to use a parfor loop because I concatenate results and have dependency across iterations. I think SPMD is the best choice for what I want to do.
I'll try to describe what I want as best as I can:
I want to be able to save at a set iteration of the loop the results so far, and I want to save the results by worker.
Below is a Minimum (non)-Working Example. The last four lines should be put in a different .m file. This function, called within a parfor loop, allows to save intermediate iterations. It is working properly in other routines that I use. The error is at line 45 (output_save). Somehow, I would like to "pull" the composite object into a "regular" object (cell/structure).
My hunch is that I do not quite understand how Composite objects work and especially how they can be saved into "regular" objects (cells, structures, etc).
% SPMD MWE
% Clear necessary things
clear output output2 output_temp iter kk
% Useful thing that will be used later on
Rorder=perms(1:4);
% Stem of the file to save the data to
stem='MWE_MATLAB_spmd';
% Create empty cells where the results of the kk loop will be stored
output1{1,1}=[];
output2{1,2}=[];
% Start the parpool
poolobj=gcp;
% Define which worker/lab will do which iteration
iterperworker=ceil(size(Rorder,1)/poolobj.NumWorkers);
for i=1:poolobj.NumWorkers
if i<poolobj.NumWorkers
itertodo{1,i}=1+(iterperworker)*(i-1):iterperworker*i;
else
itertodo{1,i}=1+(iterperworker)*(i-1):size(Rorder,1);
end
end
%Start the spmd
% try
spmd
iter=1;
for kk=itertodo{1,labindex}
% Print which iteration is done at the moment
fprintf('\n');
fprintf('Ordering %d/%d \r',kk,size(Rorder,1));
for j=1:size(Rorder,2)
output_temp(1,j)=Rorder(kk,j).^j; % just to populate a structure
end
output.output1{1,1}=cat(2,output.output1{1,1},output_temp); % Concatenate the results
output.output2{1,2}=cat(2,output.output1{1,2},0.5*output_temp); % Concatenate the results
labindex_save=labindex;
if mod(iter,2)==0
output2.output=output; % manually put output in a structure
dosave(stem,labindex_save,output2); % Calls the function that allows me to save in parallel computing
end
iter=iter+1;
end
end
% catch me
% end
% Function to paste in another m-file
% function dosave(stem,i,vars)
% save(sprintf([stem '%d.mat'],i),'-struct','vars')
% end
A Composite is created only outside an spmd block. In particular, variables that you define inside an spmd block exist as a Composite outside that block. When the same variable is used back inside an spmd block, it is transformed back into the original value. Like so:
spmd
x = labindex;
end
isa(x, 'Composite') % true
spmd
isa(x, 'Composite') % false
isequal(x, labindex) % true
end
So, you should not be transforming output using {:} indexing - it is not a Composite. I think you should simply be able to use
dosave(stem, labindex, output);
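For the other half of the question, pulling a Composite into a regular object on the client, a minimal sketch could look like the following. It assumes the Parallel Computing Toolbox with an available pool; the names result and resultCell are hypothetical and not part of the original code.

```matlab
% Requires Parallel Computing Toolbox and an open (or auto-started) pool
spmd
    result = labindex * 10;    % each worker defines its own value
end

% On the client, result is a Composite; copy each worker's value into a cell
resultCell = cell(1, length(result));
for w = 1:length(result)
    resultCell{w} = result{w}; % brace indexing fetches the value from worker w
end
```

After the loop, resultCell is an ordinary cell array that can be saved or concatenated like any other variable.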
I have to read in hundreds of TIFF files, perform some mathematical operation, and output a few things. This is being done for thousands of instances. And the biggest bottleneck is imread. Using PixelRegion, I read in only parts of the file, but it is still very slow.
Currently, the reading part is here.
Can you suggest how I can speed it up?
for m = 1:length(pfile)
if ~exist(pfile{m}, 'file')
continue;
end
pConus = imread(pfile{m}, 'PixelRegion',{[min(r1),max(r1)],[min(c1),max(c1)]});
pEvent(:,m) = pConus(tselect);
end
General Speedup
The pixel region does not appear to change at each iteration. I'm not entirely sure if Matlab will optimize the min and max calls (though I'm pretty sure it won't). If you don't change them at each iteration, move them outside the for loop and calculate them once.
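Hoisting the region computation out of the loop could look like this. It is a self-contained sketch, not the original program: it writes a small temporary TIFF so it can run on its own, and r1, c1, pfile and blocks are stand-ins for the variables in the question.

```matlab
% Write a small test image so the sketch is self-contained (hypothetical file)
testFile = [tempname '.tif'];
imwrite(uint8(magic(10)), testFile);
pfile = {testFile, testFile};

% Stand-ins for the r1 and c1 index vectors from the question
r1 = [3 5 4];
c1 = [2 4 3];

% Compute the region once, outside the loop
region = {[min(r1), max(r1)], [min(c1), max(c1)]};

blocks = cell(1, numel(pfile));
for m = 1:numel(pfile)
    blocks{m} = imread(pfile{m}, 'PixelRegion', region);
end
delete(testFile);
```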
Parfor
The following solution assumes you have access to the Parallel Computing Toolbox. I tested it with 10,840 TIFFs; each image was originally 1000x1000, but I only read in a 300x300 section of each. I am not sure how big pConus(tselect) is, so I just stored the whole 300x300 image.
P.S. Sorry about the formatting. It refuses to format it as a block of code.
Results based on my 2.3 GHz i7 w/ 16GB of ram
for: 130s
parfor: 26s + time to start pool
% Setup
clear;clc;
n = 12000;
% Would be faster to preallocate this, but negligible compared to the
% time it takes imread to complete.
fileNames = {};
for i = 1:n
name = sprintf('in_%i.tiff', i);
% I do the exist check here, assuming that the file won't be touched
% until the program advances a few lines.
if exist(name, 'file')
fileNames{end+1} = name;
end
end
rows = [200, 499];
cols = [200, 499];
pics = cell(1, length(fileNames));
tic;
parfor i = 1:length(fileNames)
% I don't know why using the temp variable is faster, but it is
temp = imread(fileNames{i}, 'PixelRegion', {rows, cols});
pics{i} = temp;
end
toc;
I want to create a frontend where the user can browse pictures forward by pressing Enter.
Pseudo-Code
hFig=figure
nFrames=5;
k=1;
while k < nFrames
u=signal(1*k,100*k,'data.wav'); % 100 length
subplot(2,2,1);
plot(u);
subplot(2,2,2);
plot(sin(u));
subplot(2,2,3);
plot(cos(u));
subplot(2,2,4);
plot(tan(u));
% not necessary but for heading of overall figure
fprintf('Press Enter for next slice\n');
str=sprintf('Slice %d', k);
mtit(hFig, str);
k=k+1;
keyboard
end
function u=signal(a,b,file)
[fs,smplrt]=audioread(file);
u=fs(a:b,1);
end
where
something is wrong in updating the data because pressing CMD+Enter increases k by one but does not update the data. Sometimes (rarely), the data does update on the next iteration.
something is wrong with the while condition because k can become bigger than nFrames. keyboard just keeps asking for more input.
My mistake earlier in Error-Checking
I had a problem earlier where closing the window led to a crash of the application. I include this here because I mentioned it in a comment on one answer. I avoid the problem now by
hFig=figure;
nFrames=5;
k=1;
while k<nFrames
% for the case, the user closes the window but starts new iteration
if(not(ishandle(hFig)))
hFig=figure;
end
...
end
which creates a new Figure if the earlier was closed by the user.
I tried unsuccessfully putting hFig=figure; inside the while loop's if clause earlier to avoid repetition in the code.
Please, let me know if you know why you cannot have the handle hFig in the while loop's if clause.
How can you loop subplots with updated outputs in Matlab?
To stop the script waiting for an input from the user you should use input instead of keyboard.
keyboard actually puts your script into debug mode. It stops the execution of the script (like a breakpoint), allowing the user to, for example, check the value of a variable.
You can modify your script as follows (the modifications are at the end of your script, identified by "UPDATED SECTION"):
hFig=figure
nFrames=5;
k=1;
while k < nFrames
u=signal(1*k,100*k,'handel.wav'); % 100 length
subplot(2,2,1);
plot(u);
subplot(2,2,2);
plot(sin(u));
subplot(2,2,3);
plot(cos(u));
subplot(2,2,4);
plot(tan(u));
% not necessary but for heading of overall figure
%
% UPDATED SECTION
%
% Use the string "Press Enter for next slice\n" as the prompt for the
% call to "input"
%
% fprintf('Press Enter for next slice\n');
% str=sprintf('Slice %f', k);
% Use %d instead of "%f" to print integer data
str=sprintf('Slice %d', k);
mtit(hFig, str);
k=k+1;
% Use "input" instead of "keyboard"
% keyboard
input('Press Enter for next slice\n');
end
Hope this helps.
Qapla'
This is more a question to understand a behavior rather than a specific problem.
MathWorks states that numerical arrays are stored contiguously, which makes preallocation important. This is not the case for cell arrays.
Are they something similar to a vector or an array of pointers in C++?
This would mean that preallocation is not so important, since a pointer is half the size of a double (according to whos, although there is surely overhead somewhere to store the data type of the mxArray).
Running this code:
clear all
n = 1e6;
tic
A = [];
for i=1:n
A(end + 1) = 1;
end
fprintf('Numerical without preallocation %f s\n',toc)
clear A
tic
A = zeros(1,n);
for i=1:n
A(i) = 1;
end
fprintf('Numerical with preallocation %f s\n',toc)
clear A
tic
A = cell(0);
for i=1:n
A{end + 1} = 1;
end
fprintf('Cell without preallocation %f s\n',toc)
tic
A = cell(1,n);
for i=1:n
A{i} = 1;
end
fprintf('Cell with preallocation %f s\n',toc)
returns:
Numerical without preallocation 0.429240 s
Numerical with preallocation 0.025236 s
Cell without preallocation 4.960297 s
Cell with preallocation 0.554257 s
There is no surprise for the numerical values. But the cell results did surprise me, since only the container of the pointers, and not the data itself, would need reallocation. That should (since the pointer is smaller than a double) lead to a difference of <.2s. Where does this overhead come from?
A related question: what if I would like to make a data container for heterogeneous data in Matlab (preallocation is not possible since the final size is not known in the beginning)? I think handle classes are not good since they also have huge overhead.
already looking forward to learn something
magu_
Edit:
I tried out the linked list proposed by Eitan T, but I think the overhead from MATLAB is still rather big. I tried something with a double array as data (rand(200000,1)).
I made a little plot to illustrate:
Code for the graph (I used the dlnode class from the MATLAB homepage, as stated in the answering post):
D = rand(200000,1);
s = linspace(10,20000,50);
nC = zeros(50,1);
nL = zeros(50,1);
for i = 1:50
a = cell(0);
tic
for ii = 1:s(i)
a{end + 1} = D;
end
nC(i) = toc;
a = list([]);
tic
for ii = 1:s(i)
a.insertAfter(list(D));
end
nL(i) = toc;
end
figure
plot(s,nC,'r',s,nL,'g')
xlabel('#iter')
ylabel('time (s)')
legend({'cell' 'list'})
Don't get me wrong, I love the idea of linked lists, since they are rather flexible, but I think the overhead might be too big.
Are cell arrays something similar to a vector or an array of pointers in C++?
Cell arrays allow storing data of different types and sizes indeed, but each cell also adds a constant overhead of 112 bytes (see this other answer of mine). This is far more than an 8-byte double, and this is non-negligible, especially when dealing with large cell arrays as in your example.
It is reasonable to assume that a cell array is implemented as a contiguous array of pointers, each pointing to the actual content of the cell.
This means that you can modify the content of each cell individually without actually resizing the cell array container itself. However, this also means that adding new cells to the cell array requires dynamic storage allocation and this is why preallocating memory for a cell array improves performance.
A related question would be, if I would like to make a data container for heterogeneous data in Matlab (preallocation is not possible since the final size is not known in the beginning)
Not knowing the final size may indeed be a problem, but you could always preallocate a cell array with the maximum supported size necessary (if there is one), and remove the empty cells in the end. I also suggest that you look into implementing linked lists in MATLAB.
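The preallocate-then-trim idea can be sketched as follows. The upper bound maxItems, the loop length and the keep condition are assumptions made up for illustration; a counter tracks how many cells are actually filled so the unused tail can be cut off afterwards.

```matlab
maxItems = 1000;               % assumed upper bound on the number of items
C = cell(1, maxItems);         % preallocate the container once
count = 0;

for v = 1:25                   % stand-in for a loop of unknown length
    if mod(v, 2) == 1          % hypothetical condition deciding what is kept
        count = count + 1;
        C{count} = rand(1, v); % contents may differ in size and type
    end
end

C = C(1:count);                % drop the unused cells at the end
```

Trimming by count rather than by testing for empty cells keeps any cells that were deliberately set to an empty value.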
clc
clear all
ii=1;
S =cell(size(30,1)); % cell size.
for ii=1:1:3
rand_id= rand(1,1) *3; % Randomly generate a number between 1 and 3.
if (rand_id<1)
rand_id=1; % 0 is omitted.
else rand_id=floor(rand_id);
end
% rand_id will be used to open a previously saved file randomly.
if (rand_id==1)
f_id_1=fopen('C1.txt','r'); % Open and read a file.
elseif (rand_id==2)
f_id_1=fopen('C2.txt','r'); % Open and read a file.
end
% scanning the file to read the text.
events_1=textscan(f_id_1, '%s', 'Delimiter', '\n');
fclose(f_id_1);
events_1=events_1{1}; % saving the text.
rand_event=events_1{randi(numel(events_1))}; % selects one text randomly.
S{ii}=rand_event;
end
I wrote the above code to randomly select a file. The file contains a number of sentences. My aim is to randomly pick a sentence, which I did. Now my problem is that I can't save all the picked sentences inside the loop.
When I write S(ii)=rand_event, it shows an error. When I try S(ii)=rand_event(ii), it only returns the 1st, 2nd and 3rd characters in the three iterations.
Please help.
S(ii)
is considered to be a matrix with well defined dimensions. I guess that your 'sentences' have different length. One solution might be to use a cell array.
S{ii}=rand_event
Cell arrays use curly braces.
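The difference between the two kinds of indexing can be seen in a short sketch. The sentences below are made-up stand-ins for the lines read from the files.

```matlab
% Hypothetical sentences of different lengths
sentences = {'Short one.', 'A somewhat longer sentence.', 'Mid length.'};
S = cell(1, numel(sentences));     % one cell per picked sentence

for ii = 1:numel(sentences)
    S{ii} = sentences{ii};         % curly braces store the whole string
end

firstSentence = S{1};              % curly braces return the char array
firstCell = S(1);                  % parentheses return a 1-by-1 cell
```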