perl msgrcv() errno 22 (EINVAL) Ubuntu

I have two perl processes that communicate over System V IPC message Q on Ubuntu.
The receiver runs successfully for a time, receives messages in a poller function like this
sub getCompleteRecord {
    while( msgrcv($q, $buff, $size, $msgType, IPC_NOWAIT) ) {
        # assemble record from messages and return
    }
    # here when nothing is ready (errno 42 = ENOMSG) or on a real error; log $!
}
After some time I eventually get error code 22 (EINVAL), which means invalid argument; from then on every call to msgrcv() fails with 22, and the separate sender process also cannot msgsnd(), again getting EINVAL.
When I restart the processes the queue can again be used.
Any suggestions for reasons, or how to approach diagnosing this?
As noted in the comments, error code 22 means that either the msqid ($q) or the buffer size ($size) is invalid. However, this is all happening in a loop, and those two values never change. I log the values before each call and see many successes, then suddenly a failure for seemingly the same values.
masterQueue 360448 read buffer size:5000 msgType1
Message received
--- many similar successes, then:
masterQueue 360448 read buffer size:5000 msgType1
read error 22=Invalid argument
And from this point on, both the reader process and the writer process fail. If I restart, everything works for about 30 minutes and then fails again.

Related

How to recover from EAGAIN error on a non-blocking socket.send?

I'm writing a simple script for the LoPy4 module. The script is supposed to send a PING message every X seconds. I'm doing this with socket.send() on a non-blocking socket, but every once in a while I get an EAGAIN error.
Comparing the size of the data that was supposed to be sent with the return value of send() shows that everything was sent. So I'd expect the buffer to be empty, and still I get this error, which takes about 10 seconds to recover from (or a restart of the LoPy).
How can I recover from (or avoid) this error, skipping just one interval?
s = socket.socket(socket.AF_LORA, socket.SOCK_RAW)
s.setblocking(False)

while True:
    try:
        nbytes = s.send(MY_DATA)
    except OSError as e:
        if e.args[0] == 11:  # EAGAIN: send buffer not ready yet
            print('EAGAIN error occurred, skipping this interval')
        else:
            raise
    time.sleep(INTERVAL)  # sleep even after EAGAIN, so only one interval is skipped

How to run a background process with mod perl

I am using Perl to return data sets as XML. Now I have come across a situation where I need to run some cleanup after sending a dataset to the client. But somewhere in the chain of mod_perl and Apache, the output is held onto until my method returns.
I have attempted to flush the buffers with commands like:
$| =1;
STDOUT->flush(); # flush the buffer so the content is sent to the client and the finish hook can carry on, with out delaying the return.
if ($mod_perl_io){
$mod_perl_io->rflush;
}
Yet I still get no output until my method returns. I then found out that my browser may be waiting for the connection to close, and that setting the content type in the header should fix this.
print $cgi->header(-type => "text/plain; charset=UTF-8", -cookie => $config->{'cookie'});
Still no luck; in fact, I had always been sending the correct headers.
So I thought the best option is to simply start a new thread and let my method return. But when I create a new thread:
use threads ('yield',
             'stack_size' => 64*4096,
             'exit' => 'threads_only',
             'stringify');

my $thr = threads->create('doRebuild', $dbconnect, $dbusername, $dbpassword, $bindir);

sub doRebuild {
    my ($dbconnect, $dbusername, $dbpassword, $bindir) = @_;
    1;
}
I get a segfault
[Fri Feb 22 10:16:47 2013] [notice] child pid 26076 exit signal Segmentation fault (11)
From what I have read this is done by mod perl to ensure thread safe operation. Not sure if this is correct.
So I thought I'd try exec:
{ exec 'perl', "$bindir/rebuild_needed_values.pl", $dbconnect, $dbusername, $dbpassword; }
From what I gather, this takes over the process from mod_perl and never lets it return anything.
I know this isn't as specific as a Stack Overflow question should be, but this sort of thing must be a common problem; how have others solved it?
You could use fork(), but I like to recommend http://gearman.org/ for background processing.
A solution like Gearman is much better, because your background process is not in Apache's process chain.
Your process will survive an Apache restart if implemented with Gearman. It is also more secure, as the Gearman environment can be run in a chroot jail.
A nice side effect of using Gearman is that your background process becomes callable from other machines and even other languages.
Gearman makes it easy to collect the data from your process at a later time as well, and you can feed back progress information to your web app rather easily.

trapping SIGABRT from perl on VMS

Given kill.pl:
$SIG{INT} = sub { print "int\n" };
$SIG{TERM} = sub { print "term\n" };
$SIG{ABRT} = sub { print "abort\n" };
print "sleeping...\n";
sleep 60;
And kill.com:
$ perl kill.pl
And launching+aborting like so:
submit /log_file=kill.log kill.com
delete /entry=XXXXXX/noconfirm
The signal handlers do not get called. Similar code works on Linux when the process is killed.
kill.log just shows:
(19:58)$ perl kill.pl
sleeping...
%JBC-F-JOBABORT, job aborted during execution
I read the vmsperl documentation and tried some things from http://perldoc.perl.org/sigtrap.html. Is there a way to do this?
Note that if I call:
@kill.com
And do a CTRL+C, SIGINT is handled by kill.pl.
I added the perl tag in case someone knows if there is a way to tell perl to trap every signal which might be the one I'm interested in. My attempt was:
$SIG{$_} = \&subroutine for keys(%SIG);
You're not sending a signal to the process -- you're instructing the queue manager to delete the process, which it does. I think the easiest way to do what you want is to use Perl to send the signal. Submit your job as before and use:
$ show system/batch
to find the pid of the job. You'll see something like this when the queue manager has assigned an entry of 572:
Pid Process Name State Pri I/O CPU Page flts Pages
00003EA1 BATCH_572 HIB 1 259 0 00:00:00.05 511 626 B
Send your signal like so to pid 0x3ea1, noting that the job notification indicates it completed rather than aborted:
$ perl -e "kill 'ABRT', 0x3ea1;"
$
Job KILL (queue SYS$BATCH, entry 572) completed
Look at your log file and you'll see these two lines at the end:
sleeping...
abort
Is this a VAX or Alpha system? I believe your 'delete' call may not be throwing an abort signal to your running job. It's been too long since I've used it, but I can't remember a tool that would throw a specific signal to a batch job; LIB$SIGNAL went from a process, not to it. You should try trapping the remaining signals from the 'error-signals' list in the sigtrap docs.

Weird Winsock recv() slowdown

I'm writing a little VoIP app like Skype, which works quite well right now, but I've run into a very strange problem.
In one thread, inside a while(true) loop, I call the Winsock recv() function twice per iteration to get data from a socket.
The first call reads 2 bytes, which are cast to a (short); the second call reads the rest of the message, whose length is given by that 2-byte header:
Complete Message: [2 Byte Header | Message, length determined by the 2Byte Header]
These packets arrive at roughly 49/sec, which works out to roughly 3000 bytes/sec.
The content of these packets is audio data that gets converted to wave.
With ioctlsocket() I check whether there is more data on the socket after each "message" (2-byte header + data) I receive. If something is already waiting on the socket right after a message has been received within the while(true) loop, that message is received but thrown away, to keep latency from building up.
This concept works very well, but here's the problem:
While my VoIP program is running and I download a file in parallel (e.g. via browser), too much data piles up on the socket, because the recv() loop actually seems to slow down while the download runs. This happens in every download/upload situation except the VoIP traffic itself.
I don't know where this behaviour comes from, but when I cancel every upload/download besides my application's VoIP traffic, my app works perfectly again.
If the program runs perfectly, the ioctlsocket() function writes 0 into the bytesLeft var, defined within the class where the receive function comes from.
Does somebody know where this comes from? I'll attach my receive function down below:
std::string D_SOCKETS::receive_message() {
    recv(ClientSocket, (char*)&val, sizeof(val), MSG_WAITALL);    // 2-byte length header
    receivedBytes = recv(ClientSocket, buffer, val, MSG_WAITALL); // payload
    if (receivedBytes != val) {
        printf("SHORT: %d PAKET: %d ERROR: %d", val, receivedBytes, WSAGetLastError());
        exit(128);
    }
    ioctlsocket(ClientSocket, FIONREAD, &bytesLeft);
    cout << "Bytes left on the socket: " << bytesLeft << endl;
    if (bytesLeft > 20) {
        // message was received but is thrown away to limit latency build-up
        return std::string();
    }
    return std::string(buffer, receivedBytes);
}
There is no need to use ioctlsocket() to discard data. That would indicate a bug in your protocol design. Assuming you are using TCP (you did not say), there should not be any left over data if your 2byte header is always accurate. After reading the 2byte header and then reading the specified number of bytes, the next bytes you receive after that constitute your next message and should not be discarded simply because it exists.
The fact that ioctlsocket() reports more bytes available means that you are receiving messages faster than you are reading them from the socket. Make your reading code run faster, don't throw away good data due to your slowness.
Your reading model is not efficient. Instead of reading 2 bytes, then X bytes, then 2 bytes, and so on, you should instead use a larger buffer to read more raw data from the socket at one time (use ioctlsocket() to know how many bytes are available, and then read at least that many bytes at one time and append them to the end of your buffer), and then parse as many complete messages are in the buffer before then reading more raw data from the socket again. The more data you can read at a time, the faster you can receive data.
To help speed up the code even more, don't process the messages inside the loop directly, either. Do the processing in another thread instead. Have the reading loop put complete messages in a queue and go back to reading, and then have a processing thread pull from the queue whenever messages are available for processing.

Select and read sockets (Unix)

I have an intermittent problem with a telnet based server on Unix (the problem crops up on both AIX and Linux).
The server opens two sockets, one to a client telnet session, and one to a program running on the same machine as the server. The idea is that the data is passed through the server to and from this program.
The current setup has a loop using select to wait for a "read" file descriptor to become available, then uses select to wait for a "write" file descriptor to become available.
Then the program reads from the incoming file descriptor, then processes the data before writing to the outgoing descriptor.
The snippet below shows what is going on. The problem is that very occasionally the read fails, with errno set to ECONNRESET or ETIMEDOUT. Neither of these codes is documented for read, so where are they coming from?
The real question is, how can I either stop this happening, or handle it gracefully?
Could doing two selects in a row be the problem?
The current handling behaviour is to shut down and restart. One point to note is that once this happens it normally happens three or four times, then clears up. The system load doesn't really seem to be that high (it's a big server).
if (select(8, &readset, NULL, NULL, NULL) < 0)
{
    break;
}

if (select(8, NULL, &writeset, NULL, NULL) < 0)
{
    break;
}

if (FD_ISSET(STDIN_FILENO, &readset)
    && FD_ISSET(fdout, &writeset))
{
    if ((nread = read(STDIN_FILENO, buff, BUFFSIZE)) < 0)
    {
        /* This sometimes fails with errno =
           ECONNRESET or ETIMEDOUT */
        break;
    }
}
Look at the comments in http://lxr.free-electrons.com/source/arch/mips/include/asm/errno.h on lines 85 and 98: these basically say there was a network connection reset or timeout. Check whether there are timeouts you can adjust on the remote network program, or send periodic filler bytes to ensure the connection stays alive consistently. You may just be a victim of an error in the network transit path between the remote client and your local server (this happens to me when my DSL line hiccups).
EDIT: not sure what the downvote is for. The man page for read explicitly says:
Other errors may occur, depending on the object connected to fd.
The error is probably occurring in the select, not in the read: you're not checking for errors after the select, you're just proceeding to the read, which will fail if the select returned an error. I'm betting that if you check errno right after the select call you'll see the errors: you don't need to wait for the read to see them.