Profiler inconsistency - matlab

I ran some code under the Profiler. The code works with DICOM files and uses MATLAB's dicom... functions.
In the main Profiler window, I see that dicominfo>parseSequence takes almost all of the running time. Inside this function, the breakdown looks like this:
Here dicominfo>parseSequence takes a total of 241.488 seconds (shown in the title), and it calls a function named dicominfo>processMetadata that takes 240.801 seconds, 99.7% of the time.
But when I click on it to see its contents, the Profiler says that processMetadata takes only 50.391 seconds:
How can that be? Where does all my time go?
EDIT
I really think this is a Profiler issue, but since #Tokkot asked, here is the piece of code that I profiled:
cd([Fname '\' seq(m).name]);
files = dir; % Names of all files in current sequence
for n = 4:length(files)
    info = dicominfo([pwd '\' files(n).name]);
    info.PatientName = PatientName; % convert the field Name to initials
    info.PatientID = ''; % delete ID
    [X, ~] = dicomread([pwd '\' files(n).name]);
    dicomwrite(X, sprintf('anon%s', files(n).name), info, 'createmode', 'copy');
    delete([pwd '\' files(n).name]);
end

Related

How can I make a saving code faster? -MatLab

I'm running a short piece of code that opens a list of files one by one and saves back only one of the variables contained in each file. The process seems much slower than I expected, and it gets slower over time. I don't fully understand why, or how I could make it run faster. I always struggle with optimization, so I'd appreciate any suggestions.
The code is the following (the ... replaces the actual path, just for the example):
main_dir = dir(strcat('\\storage2-...\Raw\DAQ5\'));
filename = {};
for m = 7:size(main_dir,1)
    m
    second_dir = dir([main_dir(m).folder '\' main_dir(m).name '\*.mat']);
    for mm = 1:numel(second_dir)
        filename{end+1} = [second_dir(mm).folder '\' second_dir(mm).name];
        for mmm = 1:numel(filename)
            namefile = sprintf(second_dir(mm,1).name);
            load(string(filename(1,mmm)));
            save(['\\storage2-...\DAQ5\Ch1_',namefile(end-18:end-4),'.mat'], 'Ch_1_y')
        end
    end
end
The original file is about 17 MB and once the single variable is saved it is about 6 MB in size.
The Matlab load function takes an optional additional argument to specify just a selected variable to read from the input file.
s = load('path/to/file.mat', 'Ch_1_y');
That way you don't have to spend time loading in all the other variables from those input .mat files that you're just going to immediately throw away.
And using save to save MAT-files over SMB shares can be slow. You might want to call save to write it to a temporary local file first, and then copy the completed file to the final destination. Sounds like more I/O, but it can actually be a net win, depending on your particular system and network. Measure it both ways to see if it's a win in your particular situation.
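The two suggestions above can be combined roughly as follows. This is only a sketch under stated assumptions: the real paths are elided in the question, so srcFile and destDir are hypothetical stand-ins created under tempdir, and the demo data is made up so the snippet runs on its own.

```matlab
% Hypothetical stand-ins for one raw file and for the network destination.
srcFile = fullfile(tempdir, 'demo_src.mat');
destDir = fullfile(tempdir, 'demo_dest');
if ~exist(destDir, 'dir')
    mkdir(destDir);
end

% Create a demo input file holding the wanted variable plus an unwanted one.
Ch_1_y = rand(1000, 1);
other  = rand(1000, 1);
save(srcFile, 'Ch_1_y', 'other');

% Read only the variable that is needed.
s = load(srcFile, 'Ch_1_y');

% Write to a local temporary file first, then copy the finished file over.
tmpFile  = [tempname '.mat'];
destFile = fullfile(destDir, 'Ch1_demo.mat');
save(tmpFile, '-struct', 's', 'Ch_1_y');
[ok, msg] = copyfile(tmpFile, destFile);
if ~ok
    error('Copy failed: %s', msg);
end
delete(tmpFile);
```

Whether the local-then-copy step pays off depends on the network, so timing both variants with tic and toc on a handful of real files is the way to decide.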

How to generate new numbers if I run the program from the beginning every time?

I am working on CATScript in optimization of a part.
Every time I run the script, it should produce numbers in ascending order.
For example, if I run the program for the first time it should output " 1 ",
and if I run the program again it should output " 2 ", and so on.
I am stuck on this and cannot figure out the logic to use here.
Looking forward to your help.
Thank you!!
One option (MATLAB based) would be to save a counter variable to a .mat file at the end of the script and load it again at the beginning of the script.
That would allow you to keep track of how many times the script has been run.
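A minimal sketch of that counter idea, assuming a hypothetical counter file under tempdir and a hypothetical variable name runCount. For brevity it loads, increments, and saves in one place rather than splitting the load and save between the start and end of the script.

```matlab
% Hypothetical location for the persistent counter.
counterFile = fullfile(tempdir, 'run_counter.mat');

if exist(counterFile, 'file')
    % Later runs: read the stored value and increment it.
    s = load(counterFile, 'runCount');
    runCount = s.runCount + 1;
else
    % First run: no counter file exists yet.
    runCount = 1;
end

% Store the new value for the next run.
save(counterFile, 'runCount');
disp(runCount)
```

Deleting the counter file resets the sequence to 1.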
In CATIA if it is being run multiple times on the same part/product, you could add a hidden, integer parameter to the specification tree and increment it each time the macro is run.
Another, more generic way would be to create a text file on the user's local machine and update the number in that file on each run.

fopen error - works for a while but then gives an error

I have a script that runs a series of for loops. Within these loops, a file is created and then passed to an external program via the system command. In summary it looks like this:
for i=1:n1
    for j=1:n2
        for k=1:n3
            fid=fopen('file.txt','w');
            fprintf(fid,'Some commands to pass to external program depending on i j k');
            fclose(fid);
            system('program file.txt');
        end
    end
end
The script has in total about 500k cases (n1xn2xn3), and runs fine for a small scenario (about 100 runs), but for the entire script it runs for a while and then returns an error for no apparent reason, giving this error:
fopen invalid file identifier object
There is no obvious reason for this, and I'm wondering if anyone could point out what is wrong.
Just a guess: an instance of your external program is still reading file.txt while the next iteration of your nested loop tries to open file.txt for writing. The more instances of your external program run at the same time, and the slower your machine, the more likely this scenario becomes. (This is called a 'race condition'.)
Possible solution for this: use a separate text file per case with a unique file name
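A rough sketch of the unique-file-name idea, with an explicit check on the fopen result so a failure reports its cause immediately. The folder outDir, the file naming pattern, and the small loop sizes are assumptions for illustration; the external call is left as a comment because 'program' is only the placeholder from the question.

```matlab
% Small illustrative sizes; the real script has about 500k cases.
n1 = 2; n2 = 2; n3 = 2;

% Hypothetical scratch folder for the per-case input files.
outDir = tempname;
mkdir(outDir);

for i = 1:n1
    for j = 1:n2
        for k = 1:n3
            % One file per case, so no two cases share a file name.
            fname = fullfile(outDir, sprintf('case_%d_%d_%d.txt', i, j, k));
            [fid, msg] = fopen(fname, 'w');
            if fid < 0
                % fopen returns -1 on failure; msg holds the system reason.
                error('Could not open %s: %s', fname, msg);
            end
            fprintf(fid, 'commands for case %d %d %d\n', i, j, k);
            fclose(fid);
            % system(sprintf('program "%s"', fname));
        end
    end
end
```

The second output of fopen is what turns a vague "invalid file identifier" error from fprintf into a message that names the file and the operating system's reason.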
You should also consider other ways of calling your external program, because file handling for 500k cases is likely to be very slow.
Hope that helps,
Eli

Can I change the script from one Matlab session while the other one is running that script?

I have two Matlab sessions running in parallel.
For convenience, I just change the parameters that are hard-coded into the scripts for each run.
So my question is: can I change the script while the first Matlab session is running that script? After I have changed and saved that very script, will the first Matlab session still run according to the original version of the script?
I have multiple scripts that call each other. Will it be more complicated in this situation?
If the answer is YES, it would appear to me that for each run, Matlab makes an ad hoc copy of all the scripts and runs that copy regardless of changes on the hard disk.
MATLAB's first step after you press "run" is to parse all the script/function's M-code and all of its dependencies into something akin to "byte code". That means that whatever MATLAB is running, is entirely in memory and thus not coupled anymore to what's in the M-file(s).
Therefore, you may indeed use another MATLAB session to change parameters in an M-file, save it, and run it in the new session, without affecting what the outcomes of the first session are.
Be sure to save or print the values of those variables though; working this way is a sure way to forget what values of those parameters belong to which session again :)
Note that this is NOT true for:
data files, or other files explicitly read during runtime
MEX files
A better workflow would be to convert those scripts into modular functions that receive configurable parameters as input, as opposed to hardcoding the values in the code.
That way you call the same function in each MATLAB session without making any changes to the M-files, only each session passes different input arguments as needed.
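A minimal sketch of that workflow, assuming a hypothetical function runExperiment with made-up parameters stepSize and nSteps standing in for the hard-coded values. It is written as a local function at the end of a script, which needs R2016b or later; in practice the function would more likely live in its own runExperiment.m file.

```matlab
% Each session calls the same function with its own arguments,
% so no M-file has to be edited between runs.
resultA = runExperiment(0.1, 100);   % e.g. what session 1 would run
resultB = runExperiment(0.5, 100);   % e.g. what session 2 would run

function result = runExperiment(stepSize, nSteps)
% Hypothetical stand-in for the body of the original script.
    result = stepSize * (1:nSteps);
end
```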
To learn more about how MATLAB detects changes in M-files, run the following:
>> help changeNotification
>> help changeNotificationAdvanced
You might also want to read about the following functions: rehash and clear functions
EDIT:
One way to find out which scripts/functions are currently "loaded in memory" is to use inmem. Say for example we have the following script saved in a file available on the path (the same works for functions):
testScript.m
x = 10;
disp(x)
Now starting with a clean session, the script is initially not loaded. After calling the script, the file is loaded and remains in memory:
% initially not loaded
>> ismember('testScript', inmem())
ans =
0
% execute script
>> testScript
10
% file is cached in memory
>> ismember('testScript', inmem())
ans =
1
Immediately continuing with the same session, make an edit to the file (for example change x to 99). By checking the list of loaded functions/scripts again, you will see that MATLAB has already detected the change, and invalidated the cached version by removing it from memory:
>> % .. make changes to testScript.m file
% file is automatically unloaded
>> ismember('testScript', inmem())
ans =
0
% execute the new script
>> testScript
99
% the result is cached once more
>> ismember('testScript', inmem())
ans =
1
I tested the above on my Windows machine, but I can't guarantee this behavior is cross-platform. You'll have to test it on Mac/Linux and see if it works the same.
The script can definitely be altered without influencing an ongoing run. However, if your flow gets more complicated, it can be problematic to depend on what will happen:
Here are some flows you will not likely want:
main1 calls sub
sub is edited
main2 calls sub
main1 continues to run and calls sub for a second time
In the above case I would expect the second run of main1 to be calling the altered version of sub, but I would not depend on it.
main1 calls sub
sub is edited
sub called by main1 hits a breakpoint
I am not even sure what will happen, but I believe that you will stop on the original line, but will see the edited code. So the line you find may not even be the line with the breakpoint anymore.
So to conclude: Don't alter your script frequently to change the output, rather give it inputs that will determine the output.

In MATLAB exist( x, 'file' ) takes forever

I am using exist(x, 'file') to check for the existence of a file on my machine. The execution of this command takes FOREVER (over 10 seconds per call!).
My matlabpath is not too long (about 200 entries) and all folders on path are on my local drive (no network).
Why does exist take forever?
Is there a way to make it run FASTER?
PS,
This call to exist is part of Matlab's execution of loadlibrary. So, if you are calling loadlibrary and you don't know why it takes forever - this question is also for you.
Here's one idea. You could put the directory containing those header files up at the front of the MATLAB path, so when exist() goes looking through the path, it finds them quickly and doesn't have to search through the rest of the entries. If it's spending its time stepping through your path, that may help.
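Moving a folder to the front of the path can be sketched as below. headerDir is a hypothetical stand-in for the directory that holds the header files; here it is just a fresh temporary folder so the snippet runs on its own, and it is removed from the path again at the end.

```matlab
% Hypothetical folder standing in for the one with the header files.
headerDir = tempname;
mkdir(headerDir);

% '-begin' puts the folder at the front of the search path.
addpath(headerDir, '-begin');

% Show the first path entry to confirm where the folder landed.
p = strsplit(path, pathsep);
disp(p{1})

% Undo the change for this demo.
rmpath(headerDir);
```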
Wow! That was a tough one. Bottom line: Delete %TEMP% files!
I had a few thousands files lying around in %TEMP%. It appears MATLAB really likes to go over and over the TEMP directory.
After clearing the TEMP folder, exist runs in no time!
(Thanks Andrew for the Process Monitor advice!)
exist is a built-in Matlab function. It is designed to check for the existence of other types of objects (such as variables in Matlab) as well as files. Because it is built in, it is not simple to see how it is coded. At least on Windows, when you call exist('filename','file') it seemingly makes only one API call to the operating system to check for the file. So either the operating system is taking a long time, or there is some bloat in the exist function making it run slowly. See the solutions from the other posters for ideas on how to make the operating system return its result more quickly.
People sometimes complain that running exist('filename','file') in a loop makes the loop very slow. This happens because each call takes perhaps milliseconds and the loop runs a few thousand times. The solution here is to replace
if exist('filename','file')
    % your code
end
with
if java.io.File('filename').exists
    % your code
end
For 372 files
Matlab: Elapsed time is 40.207266 seconds. (time to get a cup of tea)
Java: Elapsed time is 0.122165 seconds. (the blink of an eye)
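The sketch below puts the two checks side by side on a throwaway file, together with isfile, which is available from R2017b. It assumes MATLAB was started with the JVM. It uses an absolute path on purpose: exist also searches the MATLAB path, while java.io.File only looks at the path it is given and resolves relative names against the Java working directory, which is not necessarily MATLAB's current folder. The file name is hypothetical.

```matlab
% Create a hypothetical test file with an absolute path.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fclose(fid);

% Three ways of asking whether the file exists.
viaExist = exist(fname, 'file') == 2;   % 2 means "is a file"
f = java.io.File(fname);
viaJava = f.exists();                   % requires the JVM
viaIsfile = isfile(fname);              % R2017b or newer

% Clean up the test file.
delete(fname);
```

Wrapping each check in tic and toc over your own file list is the simplest way to see whether the difference reported above holds on your system.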