Matlab read from fifo with fopen timeout

I'm working with named pipes (FIFOs) to communicate between Python and MATLAB. The MATLAB code that reads from the pipe works, but it hangs if nothing has been written to the FIFO yet. I would prefer that it time out gracefully when no data is available.
If the pipe exists (in bash):
$ mkfifo pipe_read
but has no data, the MATLAB open command:
>> fid = fopen('pipe_read', 'r');
hangs until data is available:
$ echo "test data" >> pipe_read
Rather than blocking forever, I would like fopen to return a file id that indicates an error (similar to the -1 it returns when the file does not exist) if no data is available.
Could there be a solution similar to the asynchronous reads available in the commands for reading from and writing to serial instruments: http://www.mathworks.com/help/matlab/ref/readasync.html ?
Or possibly fopen could be embedded into a matlab timer object that enables a timeout?
This has been asked before but without an answer:
Matlab read from named pipe (fifo)

I'm pretty sure the issue is not actually with Matlab's fopen, but with the underlying open system call. Generally, the use of a pipe or FIFO only makes sense when both a reader and a writer exist, so by default open(2) blocks until the other end of the FIFO has been opened as well.
I don't think it will work to embed the fopen call in any other Matlab object. As far as I'm aware, the only way to circumvent this is to write your own version of fopen, as a specialized Mex function. In this case, you can make a call to open(2) with the O_NONBLOCK flag or'd with whatever read/write flag you'd like. But digging around in man 2 open, under the ERRORS section, you can see that ENXIO is returned if "O_NONBLOCK and O_WRONLY are set, the file is a FIFO, and no process has it open for reading". That means you need to make sure that Python has opened the FIFO for reading before Matlab tries to open for writing (or vice versa).
As a final point, keep in mind that Matlab's fopen returns a handle to a file descriptor. Your Mex function should probably mirror that, so you can pass it around to fread/fscanf/etc without issues.
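To make that concrete, here is a minimal sketch of such a Mex function (the name fifo_open_nonblock is my own invention, not an existing API). Note that opening read-only with O_NONBLOCK succeeds even when no writer is present; per the man page excerpt above, it is the write side that fails with ENXIO. Also note this returns a raw OS descriptor, not a MATLAB fid, so in practice you would pass it to matching Mex read/close wrappers rather than to fread/fclose directly:
/* fifo_open_nonblock.c - hypothetical Mex wrapper around open(2) with
 * O_NONBLOCK, as described above. Build with: mex fifo_open_nonblock.c */
#include <fcntl.h>
#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    char path[1024];

    if (nrhs < 1 || mxGetString(prhs[0], path, sizeof(path)) != 0)
        mexErrMsgTxt("expected a FIFO path as the first argument");

    /* O_NONBLOCK makes open(2) return immediately instead of waiting
     * for the other end of the FIFO to be opened. */
    int fd = open(path, O_RDONLY | O_NONBLOCK);

    /* Mirror fopen's convention: -1 signals failure. */
    plhs[0] = mxCreateDoubleScalar((double)fd);
}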

On Linux, a system() call to the timeout utility does the trick. For example:
timeout = 5;  % timeout in seconds
pipe = 'pipe_read';
[exit_code, str] = system(sprintf('timeout %ds cat %s', timeout, pipe));
switch exit_code
    case 0      % found data
        doSomething(str);
    case 124    % timed out
        doTimeout();
end
macOS has gtimeout (the GNU coreutils version of timeout), which I assume behaves the same way.

Related

Possible to see tracing when using cat or vi opening a text file

Is it possible to trace what is being read from a text file using eBPF? There are ways to see the amount of memory being used and to count reads and writes, but I would also like to output the user data itself, using bpf_trace_printk if possible.
I think this would require tracing the open() (or openat()) system call and correlating it (the fd in particular) with traced read() calls.
/sys/kernel/debug/tracing/events/syscalls/sys_enter_read/format defines which syscall arguments can be accessed. What may interest you is the char *buf buffer pointer, where read() places the bytes it has read.
However, at sys_enter_read the trace fires before any bytes have been read into the buffer. So the more reliable way is to hook read()'s return, e.g. with the sys_exit_read tracepoint or a raw tracepoint (BPF_PROG_TYPE_RAW_TRACEPOINT).
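A minimal sketch of that enter/exit correlation, written against libbpf's vmlinux.h conventions (kernel side only; the map size and the 64-byte dump length are arbitrary choices of mine):
// read_snoop.bpf.c - remember the user buffer pointer at sys_enter_read,
// dump its contents once sys_exit_read reports bytes were actually read.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);    /* thread id */
    __type(value, u64);  /* userspace buf pointer */
} bufs SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_read")
int on_enter_read(struct trace_event_raw_sys_enter *ctx)
{
    u32 tid = (u32)bpf_get_current_pid_tgid();
    u64 buf = (u64)ctx->args[1];          /* args: fd, buf, count */
    bpf_map_update_elem(&bufs, &tid, &buf, BPF_ANY);
    return 0;
}

SEC("tracepoint/syscalls/sys_exit_read")
int on_exit_read(struct trace_event_raw_sys_exit *ctx)
{
    u32 tid = (u32)bpf_get_current_pid_tgid();
    u64 *bufp = bpf_map_lookup_elem(&bufs, &tid);
    char data[64] = {0};

    /* The buffer is only valid after the syscall returns with ret > 0. */
    if (bufp && ctx->ret > 0) {
        bpf_probe_read_user(data, sizeof(data), (void *)*bufp);
        bpf_printk("read: %s", data);
    }
    if (bufp)
        bpf_map_delete_elem(&bufs, &tid);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";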

Invoke matlab script from command line multiple times on the same Matlab instance [duplicate]

Is there a way to call Matlab functions from outside, in particular from the Windows cmd (but also the Linux terminal, Lua scripts, etc.), WITHOUT opening a new instance of Matlab each time?
for example in cmd:
matlab -sd myCurrentDirectory -r "function(parameters)" -nodesktop -nosplash -nojvm
opens a new instance of Matlab relatively fast and executes my function. Opening and closing this reduced matlab prompt takes about 2 seconds (without computations), hence more than 2 hours for 4000 executions. I'd like to avoid this, as the called function is always located in the same workspace. Can it always be done in the same instance?
I already did some research and found the possibility of the MATLAB COM Automation Server, but it seems quite complicated to me and I don't see the essential steps to make it work for my case. Any advice on that?
I'm not familiar with C/C++/C#, but I'm thinking about using Python (though only as a last resort).
Based on the not-working but well-thought-out idea of @Ilya Kobelevskiy, here is the final workaround:
function pipeConnection(numIterations, inputFile)
for i = 1:numIterations
    while exist(inputFile, 'file')
        inputData = load(inputFile);   % read inputFile -> inputData
        output = myFunction(inputData);
        delete(inputFile);
    end
    % Write output to file
    % Call external application to process output data
    % generate new inputFile
end
Another convenient solution would be to compile an executable of the Matlab function:
mcc -m myfunction
and run the resulting executable from cmd:
cd myCurrentDirectory && myfunction.exe parameter1 parameter2
Be aware that the parameters are now passed as strings and the original .m-file needs to be adjusted accordingly.
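For example, a minimal sketch of that adjustment (str2double is one common way to convert the incoming strings):
function myfunction(parameter1, parameter2)
% When run as a compiled executable, the arguments arrive as strings,
% so convert them back to numbers before use.
if ischar(parameter1), parameter1 = str2double(parameter1); end
if ischar(parameter2), parameter2 = str2double(parameter2); end
% ... original numeric function body ...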
Further remarks:
- I guess Matlab still needs to be installed on the system, though it is not necessary to run it.
- I don't know how far this method is limited with respect to the complexity of the underlying function.
- The speed-up compared to the initial approach given in the question is relatively small.
Amongst the several methods presented here, there is one workaround that should reduce the execution time of your multiple matlab calls. The idea is to run a custom function multiple times within one matlab session.
For example, the function myRand.m is defined as
function r = myRand(a,b)
r = a + (b-a).*rand;
Within the matlab command window, we generate the single-line command like this:
S = [1:5; 1:5; 101:105];
cmd_str = sprintf('B(%d) = myRand(%d,%d);', S)
It generates the following command string B(1) = myRand(1,101);B(2) = myRand(2,102);B(3) = myRand(3,103);B(4) = myRand(4,104);B(5) = myRand(5,105); that is executed within a single matlab session with
matlab -nojvm -nodesktop -nosplash -r "copy_the_command_string_here";
One limitation is that you need to run your 4000 function calls in a row.
I like the approach proposed by Magla, but given the constraints stated in your comment on it, it can be improved to still run a single function in one matlab session.
The idea is to pipe your inputs and outputs. For inputs, you can check whether a certain input file exists; if it does, read your function's input from it, do the work, and write the output to another file to signal the external script that the matlab function is done and is waiting for the next input.
This is very straightforward to implement using disk files; with some effort it could probably also be done through a memory disk (i.e., keeping the input/output files in RAM).
function pipeConnection(numIterations, inputFile, outputFile)
for i = 1:numIterations
    while ~exist(inputFile, 'file')
        pause(0.05);                 % poll every 50 ms until input appears
    end
    % Read inputs (assumes the writer saved variables x, y, z into inputFile)
    in = load(inputFile);
    output = YourFunction(in.x, in.y, in.z);
    delete(inputFile);               % consume the input
    save(outputFile, 'output');      % write output, go to next iteration
end
If the number of iterations is unknown when you start, you can also encode an exit condition in the input file rather than specifying the number of iterations up front.
If you're starting up MATLAB from the command line with the -r option in the way you describe, then it will always start a new instance as you describe. I don't believe there's a way around this.
If you are calling MATLAB from a C/C++ application, MATLAB provides the MATLAB engine interface, which would connect to any running instance of MATLAB.
Otherwise the MATLAB Automation Server interface that you mention is the right way to go. If you're finding it complicated, I would suggest posting a separate question detailing what you've tried and what difficulties you're having.
For completeness, I'll mention that MATLAB also has an undocumented interface that can be called directly from Java - however, as it's undocumented it's very difficult to get right, and is subject to change across versions so you shouldn't rely on it.
Edit: As of R2014b, MATLAB makes available the MATLAB Engine for Python, via which you can automate MATLAB from a Python script. And as of R2016b, there is also the MATLAB Engine for Java. If anyone was previously considering the undocumented Java techniques mentioned above, this would now be the way to go.
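For instance, with the Python engine you can keep one MATLAB session alive and call into it repeatedly (a minimal sketch; myFunction stands in for your own .m file, and matlab.engine.shareEngine must have been run in the target MATLAB session first):
# In a MATLAB session you start once, run: matlab.engine.shareEngine
import matlab.engine

# Attach to the already-running shared session instead of starting a new one.
eng = matlab.engine.connect_matlab()
result = eng.myFunction(1.0, 2.0, nargout=1)  # call your own .m function
print(result)
eng.quit()  # disconnect from the engine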


When do you need to `END { close STDOUT }` in Perl?

In tchrist's boilerplate I found this explicit closing of STDOUT in the END block:
END { close STDOUT }
I know END and close, but I don't see why it is needed.
When I started searching about it, I found the following in perlfaq8:
For example, you can use this to make sure your filter program managed to finish its output without filling up the disk:
END {
    close(STDOUT) || die "stdout close failed: $!";
}
and I still don't understand it. :(
Can someone explain (maybe with some code examples):
- why and when it is needed
- how and in what cases my perl filter can fill up the disk
- when things go wrong without it
- etc.?
A lot of systems implement "optimistic" file operations. By this I mean that a call to, for instance, print, which should add some data to a file, can return successfully before the data is actually written to the file, or even before enough space has been reserved on disk for the write to succeed.
In these cases, if your disk is nearly full, all your prints can appear successful, but when it is time to close the file and flush it out to disk, the system realizes that there is no room left. You then get an error when closing the file.
This error means that all the output you thought you saved might actually not have been saved at all (or partially saved). If that was important, your program needs to report an error (or try to correct the situation, or ...).
All this can happen on the STDOUT filehandle if it is connected to a file, e.g. if your script is run as:
perl script.pl > output.txt
If the data you're outputting is important, and you need to know if all of it was indeed written correctly, then you can use the statement you quoted to detect a problem. For example, in your second snippet, the script explicitly calls die if close reports an error; tchrist's boilerplate runs under use autodie, which automatically invokes die if close fails.
(This will not guarantee that the data is stored persistently on disk though, other factors come into play there as well, but it's a good error indication. i.e. if that close fails, you know you have a problem.)
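On Linux you can provoke exactly this failure mode with the /dev/full device, which accepts writes into the stdio buffer but fails every flush with ENOSPC (a minimal sketch):
#!/usr/bin/perl
# demo.pl - the buffered prints "succeed"; the error only surfaces at close.
use strict;
use warnings;

END { close(STDOUT) || die "stdout close failed: $!" }

print "some output\n" for 1 .. 10;   # lands in the stdio buffer, no error yet

# $ perl demo.pl > /dev/full
# stdout close failed: No space left on device ...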
I believe Mat is mistaken.
Both Perl and the system have buffers. close causes Perl's buffers to be flushed to the system. It does not necessarily cause the system's buffers to be written to disk as Mat claimed. That's what fsync does.
Now, this would happen anyway on exit, but calling close gives you a chance to handle any error it encountered flushing the buffers.
The other thing close does is report earlier errors in attempts by the system to flush its buffers to disk.

Why does writing to an unconnected socket send SIGPIPE first?

There are so many possible errors in the POSIX environment. Why do some of them (like writing to an unconnected socket in particular) get special treatment in the form of signals?
This is by design, so that simple programs producing text (e.g. find, grep, cat) used in a pipeline die when their consumer dies. That is, if you're running a chain like find | grep | sed | head, head will exit as soon as it reads enough lines. That will kill sed with SIGPIPE, which will kill grep with SIGPIPE, which will kill find with SIGPIPE. If there were no SIGPIPE, naively written programs would continue running and producing content that nobody needs.
If you don't want to get SIGPIPE in your program, just ignore it with a call to signal(). After that, syscalls like write() that hit a broken pipe will return with errno=EPIPE instead.
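A minimal C sketch of that behaviour, using a pipe whose read end has been closed (which is "broken" in the same sense as the unconnected socket):
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Without this, the write below would kill the process with SIGPIPE. */
    signal(SIGPIPE, SIG_IGN);

    int fds[2];
    if (pipe(fds) == -1)
        return 1;
    close(fds[0]);                      /* no reader left: pipe is broken */

    if (write(fds[1], "x", 1) == -1)
        fprintf(stderr, "write: %s\n", strerror(errno));  /* EPIPE */
    return 0;
}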
See this SO answer for a detailed explanation of why writing a closed descriptor / socket generates SIGPIPE.
Why is writing a closed TCP socket worse than reading one?
SIGPIPE isn't specific to sockets; as the name suggests, it is also sent when you try to write to a pipe (anonymous or named). I guess the reason for having separate error-handling behaviour is that broken pipes shouldn't always be treated as an error (whereas, for example, trying to write to a file that doesn't exist should always be treated as an error).
Consider the program less. This program reads input from stdin (unless a filename is specified) and only shows part of it at a time. If the user scrolls down, it will try to read more input from stdin, and display that. Since it doesn't read all the input at once, the pipe will be broken if the user quits (e.g. by pressing q) before the input has all been read. This isn't really a problem, though, so the program that's writing down the pipe should handle it gracefully.
It's up to the design.
Early on, signals were the mechanism for delivering event notifications to user space; later they became less necessary as more popular patterns such as polling took over, which don't require the caller to install a signal handler.