I am using exist(x, 'file') to check for the existence of a file on my machine. The execution of this command takes FOREVER (over 10 seconds per call!).
My matlabpath is not too long (about 200 entries), and all the folders on it are on my local drive (no network).
Why does exist take forever?
Is there a way to make it run FASTER?
PS,
This call to exist is part of Matlab's execution of loadlibrary. So, if you are calling loadlibrary and you don't know why it takes forever - this question is also for you.
Here's one idea. You could put the directory containing those header files up at the front of the MATLAB path, so when exist() goes looking through the path, it finds them quickly and doesn't have to search through the rest of the entries. If it's spending its time stepping through your path, that may help.
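If you want to try that, here's a minimal sketch (the folder name is just a placeholder for wherever your header files actually live):
% addpath prepends to the MATLAB path by default,
% so exist() will search this folder before the other ~200 entries
addpath('C:\my_headers');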
Wow! That was a tough one. Bottom line: Delete %TEMP% files!
I had a few thousand files lying around in %TEMP%. It appears MATLAB really likes to go over and over the TEMP directory.
After clearing the TEMP folder, exist runs in no time!
(Thanks Andrew for the Process Monitor advice!)
exist is a built-in Matlab function. It is designed to check the existence of other types of objects (such as Matlab variables) as well as files. Being a built-in function, it's not simple to see how it is coded. At least on Windows, when you call exist('filename','file'), it seemingly makes only one API call to the operating system to check the file's existence. So either the operating system is taking a long time, or there is some bloat in the exist function making it run slowly. See the solutions from the other posters for ideas on how to make the operating system return its result more quickly.
People sometimes complain that running exist('filename','file') in a loop makes the loop very slow; this is because each call takes perhaps a millisecond and the loop repeats a few thousand times. The solution here is to replace
if exist('filename', 'file')
    % your code
end
with
if java.io.File('filename').exists
    % your code
end
For 372 files:
Matlab: Elapsed time is 40.207266 seconds. (go get a cup of tea)
Java: Elapsed time is 0.122165 seconds. (an eye blink)
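If you want to reproduce the comparison, here is a rough benchmark sketch (the file list is dummy data, not the actual 372 files measured above):
% build a dummy list of 372 file names, purely for timing
files = arrayfun(@(k) sprintf('file%d.txt', k), 1:372, 'UniformOutput', false);
tic
for k = 1:numel(files)
    exist(files{k}, 'file');
end
toc
tic
for k = 1:numel(files)
    java.io.File(files{k}).exists;
end
toc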
Related
I'm running a short script that opens a list of files one by one and saves back only one of the variables contained in each file. The process is much slower than I expected, and it gets slower over time; I don't fully understand why, or how I could make it run faster. I always struggle with optimization, so I'd appreciate any suggestions.
The code is the following (the ... stands in for the actual path, just for this example):
main_dir=dir(strcat('\\storage2-...\Raw\DAQ5\'));
filename={};
for m=7:size(main_dir,1)
    m
    second_dir=dir([main_dir(m).folder '\' main_dir(m).name '\*.mat']);
    for mm=1:numel(second_dir)
        filename{end+1}=[second_dir(mm).folder '\' second_dir(mm).name];
        for mmm=1:numel(filename)
            namefile=sprintf(second_dir(mm,1).name);
            load(string(filename(1,mmm)));
            save(['\\storage2-...\DAQ5\Ch1_',namefile(end-18:end-4),'.mat'], 'Ch_1_y')
        end
    end
end
The original file is about 17 MB and once the single variable is saved it is about 6 MB in size.
The Matlab load function takes an optional additional argument to specify just a selected variable to read from the input file.
s = load('path/to/file.mat', 'Ch_1_y');
That way you don't have to spend time loading in all the other variables from those input .mat files that you're just going to immediately throw away.
And using save to save MAT-files over SMB shares can be slow. You might want to call save to write it to a temporary local file first, and then copy the completed file to the final destination. Sounds like more I/O, but it can actually be a net win, depending on your particular system and network. Measure it both ways to see if it's a win in your particular situation.
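A minimal sketch of the save-locally-then-copy idea, reusing the variable from the question (the destination path is a placeholder, since the original one is truncated):
% save to a fast local temporary file first...
tmp = [tempname '.mat'];
save(tmp, 'Ch_1_y');
% ...then move the finished file to the network share in one go
movefile(tmp, ['\\server\share\DAQ5\Ch1_' namefile(end-18:end-4) '.mat']);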
I'm using MATLAB and calling an .exe via the system command.
[status,cmdout] = system(command_s);
where command_s is a command string that is formatted earlier in my script to pass all the desired options to the .exe. The .exe would normally write to a .csv file via the > redirection operator in Windows/DOS. Instead, this output is going to cmdout where I use it later in the MATLAB script. It is working correctly and as expected. I'm doing it this way so that the process just uses memory and does not write a very large file to the disk, which would then have to be read from the disk and then deleted after I'm done with it. In the end, it saves a .mat file that's usually in hundreds of KB instead of 10s/100s of MBs as the .csv file would be (some unneeded data is thrown out in the end).
The issue I'm having is that, since I'm dealing with large files, the executable can take a significant amount of time; I typically have to wait about 2 minutes after executing this command. In the meantime, I have no feedback to know it is progressing and that my system hasn't frozen. I know I could append & to my command string, command_s, and run MATLAB code while the .exe runs in the background (or asynchronously, as some would say), but that brings up an external window AND leaves cmdout empty - so I cannot use the output - forcing me to sit there wondering for 2 minutes each time it executes.
Is there any way to run in the background AND get the stdout from the command?
Maybe you could try system(command_s,'-echo')?
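With '-echo', system still blocks until the .exe finishes, but it streams the output to the command window as it is generated while also returning it in cmdout at the end - live feedback without giving up the captured output:
[status, cmdout] = system(command_s, '-echo');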
My problem is as described. My script downloads files through an external call to cmd (using the system function and then .NET to make keypresses). The issue is that when it tries to fopen these files I downloaded (filenames from a text file I write as I download), it doesn't find them, causing an error. When I run the script again after seeing it fail, it works but only up to the point where it's trying to download/call new files again, where it runs into the same problem.
Are new files downloaded while a script is running somehow not visible to the search path? The folder is most definitely in my search path (seeing as it works outside of during-script downloads). It's not that it isn't getting the files fast enough either, because they appear in my folder almost instantly, and I've tried adding a delay to allow MATLAB to recognize them, but that didn't work either.
I'm not sure if it's important to note that the script calls an external function which tries to read the files from the .txt list I create in the main script.
Any ideas?
The script to download the files looks like so:
NET.addAssembly('System.Windows.Forms');
sendkey = @(strkey) System.Windows.Forms.SendKeys.SendWait(strkey);
system('start cygwinbatch.bat')
pause(.1)
sendkey(callStr1)
sendkey('{ENTER}')
pause(.1)
sendkey(callStr2)
sendkey('{ENTER}')
pause(.1)
sendkey('exit')
pause(2)
sendkey('{ENTER}')
But that is not the main reason I am asking: I am confident that the downloads occur when the script calls for them, because I see the files appearing in my folder. I am more confused as to why MATLAB doesn't seem to know they are there while the script is running, and why I have to stop it and run it again for it to recognize the ones I've already downloaded.
Thank you,
Aaron
The answer here is probably to run the 'rehash' function. Matlab does not look for new files while executing an operation, and in some environments misses new files even during interactive activity.
Running the rehash function forces Matlab to search through its full path and determine if there are any new files.
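For example:
rehash   % force MATLAB to re-scan the folders on its path for new files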
I've never tried to run rehash in the middle of an operation though. ...
My guess is that the MATLAB interpreter is trying to look ahead and is throwing errors based on a snapshot of what the filesystem looked like before the files were downloaded. Do you get different behavior if you run it one line at a time using F9? If that's the case, then you may be able to prevent the interpreter from looking ahead by using eval().
I have a script that is running a series of for loops, and within these for loops a file is created that is then run using an external program using the script command. In summary it looks like this:
for i=1:n1
    for j=1:n2
        for k=1:n3
            fid=fopen('file.txt','w');
            fprintf(fid,'Some commands to pass to external program depending on i j k');
            fclose(fid);
            system('program file.txt');
        end
    end
end
The script covers about 500k cases in total (n1 x n2 x n3). It runs fine for a small scenario (about 100 runs), but on the full set it runs for a while and then fails with this error:
fopen invalid file identifier object
There is no obvious reason for this, and I'm wondering if anyone could point out what is wrong.
Just a guess: an instance of your external program is still reading file.txt while the next iteration of your nested loop wants to open file.txt for writing. The more instances of your external program are running at the same time, and the slower your machine, the more likely this scenario becomes (it is called a 'race condition').
Possible solution: use a separate text file per case, with a unique file name - see the sketch below.
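A sketch of that fix applied to the loop from the question (the command text is still the placeholder from the original):
for i=1:n1
    for j=1:n2
        for k=1:n3
            % one uniquely named file per case, so a still-running instance
            % of the external program never races with the next fopen
            fname = sprintf('file_%d_%d_%d.txt', i, j, k);
            fid = fopen(fname, 'w');
            fprintf(fid, 'Some commands to pass to external program depending on i j k');
            fclose(fid);
            system(['program ' fname]);
        end
    end
end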
You should also consider other ways of passing data to your external program, because opening, writing, and closing files for 500k cases will be slow.
Hope that helps,
Eli
I have noticed that MATLAB (R2011b on Windows 7, 64 bit) tends to slow down if I am in debugging mode for a long period of time (e.g. 3 hours). I don't recall this happening on previous versions of MATLAB.
The slow down is small, but significant enough to have an impact on my productivity (sometimes MATLAB needs to wait for up to 1 sec before I can type on the command line or on the editor).
I usually spend hours in debugging mode (e.g. after stopping at a keyboard statement), coding full projects this way. I find working in debugging mode convenient for organically growing my code while being able to inspect it at any point during execution.
The odd thing is my machine has 16 GB of RAM and the total size of all workspaces while in debugging mode is usually less than 4 GB. I don't have any other large process running in the background, and my system reports ~8GB of free RAM.
Also, unfortunately, MATLAB does not let me call pack from debugging mode; it complains with:
Warning: PACK can only be used from the MATLAB command line.
I have reproduced this behavior after restarting MATLAB, after rebooting my system, and on different days. With that, my questions are:
Has anybody else noticed this? Is there anything I could do to prevent this slowdown without exiting debugging mode?
Are there any technical notes or statements from Mathworks addressing this issue?
In case it matters, my code is on a network drive, so I added the following to my startup.m file, which should alleviate any performance impact resulting from that:
system_dependent('RemoteCWDPolicy', 'None');
system_dependent('RemotePathPolicy', 'None');
system_dependent('DirChangeHandleWarn','Never');
I have experienced some similar issues. The problem ended up being that Mathworks changed how Matlab caches files: for some users, it now stores data in the TMP folder as defined by the environment variables. That folder was being scanned by our antivirus software, causing major performance problems, and of course IT wouldn't let us exclude the TMP folder from scans. So we added a line to our startup script that points the TMP environment variable at a location inside an excluded folder.
You don't have to worry about changing the variable back or messing up other programs. When an application launches, it copies the environment variables into its own local instance; any changes you make affect only that local copy, not the system copy.
Here is the function you will need.
setenv('TEMP', 'C:\TEMP');
I'm not sure if it was TMP or TEMP. Check your environment variables to be sure.
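You can check which one your installation picks up, then redirect it (the excluded folder here is a placeholder - use whatever location your IT department allows):
getenv('TMP')    % see where the variables currently point
getenv('TEMP')
setenv('TMP', 'C:\excluded\TEMP');   % placeholder path in an AV-excluded folder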
I am using MATLAB R2011 on Linux 10 and on Windows 7 (32 bit).
I experienced MATLAB slowing down while printing simple variables in the command window.
It turned out that there was one .m file loaded in my Editor.
It was a big file with 10,000 lines. These lines were simple data that should have been saved as a .mat file. When I closed this file, the editor was back to its normal speed.