Runtime.exec causes duplicate JVM to hang indefinitely until killed (Solaris 10) - solaris

All,
We are running a J2EE application on WebLogic server 9.2 MP2 with a jrockit 64-bit JVM (27.3.1) on Solaris 10.
We call use runtime.exec to call an executable called jfmerge to create PDF documents.
We have found that in Solaris, when runtime.exec is called, a duplicate JVM is temporarily spawned to kick off the jfmerge process. While this is inefficient (our JVM is 5 GB, thus the duplicated shell JVM is also 5 GB), the major problem lies in the fact that when there is heavy load on this functionality (PDF generation) in our application, sometimes the duplicated JVM never exits.
When the JVM hangs, the servers create large issues (extreme application slowness and terminated user sessions) as the entire duplicate JVM get's all of its 5 GB of process size written to disk swap.
We have noted the following hung thread correlated with a hung JVM process until the process is manually killed:
"[STUCK] ExecuteThread: '17' for queue: 'weblogic.kernel.Default
(self-tuning)'" id=3463 idx=0x158 tid=3460 prio=1 alive, in native,
daemon
at
jrockit/io/FileNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BII)I(Native
Method)
at jrockit/io/FileNativeIO.readBytes(FileNativeIO.java:30)
at java/io/FileInputStream.readBytes([BII)I(FileInputStream.java)
at java/io/FileInputStream.read(FileInputStream.java:194)
at
java/lang/UNIXProcess$DeferredCloseInputStream.read(UNIXProcess.java:227)
at java/io/BufferedInputStream.fill(BufferedInputStream.java:218)
at java/io/BufferedInputStream.read(BufferedInputStream.java:235)
^-- Holding lock:
java/io/BufferedInputStream#0xfffffffec6510470[thin lock]
at
gov/v3/common/formgeneration/sessionbean/FormsBean.getProcessStatus(FormsBean.java:809)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.createPDF(FormsBean.java:750)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.getTemplateDetails(FormsBean.java:450)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.generateSinglePDF(FormsBean.java:1371)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.generatePDF(FormsBean.java:263)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.endorseDocument(FormsBean.java:2377)
at
gov/v3/common/formgeneration/sessionbean/Forms_qaco28_EOImpl.endorseDocument(Forms_qaco28_EOImpl.java:214)
at
gov/v3/delegates/common/FormsAndNoticesDelegate.endorseDocument(FormsAndNoticesDelegate.java:128)
at
gov/v3/actions/common/EndorseDocumentAction.executeRequest(EndorseDocumentAction.java:68)
at
gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.dispatchToExecuteMethod(V3CommonDispatchAction.java:532)
at
gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.executeBaseAction(V3CommonDispatchAction.java:336)
at
gov/v3/fwk/controller/struts/action/V3BaseDispatchAction.execute(V3BaseDispatchAction.java:69)
at
org/apache/struts/action/RequestProcessor.processActionPerform(RequestProcessor.java:484)
at
gov/v3/fwk/controller/struts/requestprocessor/V3TilesRequestProcessor.processActionPerform(V3TilesRequestProcessor.java:384)
at
org/apache/struts/action/RequestProcessor.process(RequestProcessor.java:274)
at
org/apache/struts/action/ActionServlet.process(ActionServlet.java:1482)
at
org/apache/struts/action/ActionServlet.doGet(ActionServlet.java:507)
at
gov/v3/fwk/controller/struts/servlet/V3ControllerServlet.doGet(V3ControllerServlet.java:110)
at javax/servlet/http/HttpServlet.service(HttpServlet.java:743)
at javax/servlet/http/HttpServlet.service(HttpServlet.java:856)
at
weblogic/servlet/internal/StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
at
weblogic/servlet/internal/StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
at
weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:283)
at
weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:175)
at
weblogic/servlet/internal/WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3231)
at
weblogic/security/acl/internal/AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at
weblogic/security/service/SecurityManager.runAs(SecurityManager.java:121)
at
weblogic/servlet/internal/WebAppServletContext.securedExecute(WebAppServletContext.java:2002)
at
weblogic/servlet/internal/WebAppServletContext.execute(WebAppServletContext.java:1908)
at
weblogic/servlet/internal/ServletRequestImpl.run(ServletRequestImpl.java:1362)
at weblogic/work/ExecuteThread.execute(ExecuteThread.java:209)
at weblogic/work/ExecuteThread.run(ExecuteThread.java:181)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace
We would like to do a couple of things:
1.) Prevent the spawning of a duplicate JVM, as we do not need any of it's functions when executing the simple jfmerge executable, and it creates massive overhead.
2.) In the short term at least prevent this duplicate JVM from handing indefinitely.

This answer is late, but we have the same problem, and the problem for us is how Solaris manage the memory.
The problem is when we have a application server, using a lot of memory 10GB in my case, and we want to run a simple "ls", the new process needs 10GB to run.
Solaris needs the 10GB extra available in our server, Linux use a feature known as “copy-on-write” This feature reduces the overhead of forking a new process
http://developers.sun.com/solaris/articles/subprocess/subprocess.html
Historical Background and Problem Description
Traditionally, Unix has had only one way to create a new process: using a fork() system call, often followed by an exec() system call. The fork() call makes a copy of the entire parent process' address space, and exec() turns that copy into a new process.
(Note: In the Solaris OS, the term swap space is used to describe a combination of physical memory and disk swap space configured for the system. However, with other Unix systems this term may mean swap space on disk, also known as backing store. To avoid any confusion, I'll use the term Virtual Memory (VM) to mean physical memory plus disk swap space.)
Generally, the fork/exec method has worked quite well. However, it has disadvantages in some cases, such as running out of memory without a good reason and poor fork performance.
Out of Memory: For a large-memory process, the fork() system call can fail due to an inadequate amount of VM, because fork() requires twice the amount of the parent memory. This can happen even when fork() is immediately followed by an exec() call that would release most of that extra memory. When this happens, the application will usually terminate.
For example, suppose a 64-bit application is consuming 6 gigabytes (Gbytes) of VM at the moment, and it needs to create a subprocess to run the ls(1) command. The parent process issues a fork() call that will succeed only if there is another 6 Gbytes of VM available at the moment. If the system doesn't have that much VM available (which is a frequent situation), fork() will fail with ENOMEM. Obviously, the ls(1) command doesn't need anywhere near 6 Gbytes of memory to run, but fork() doesn't know that.
Not only applications, but also Sun's own tools can suffer from the same problem. For example, the following Sun RFE (request for enhancement) has been filed for dbx: "4748951 dbx shell should use posix_spawn() for non-builtin commands rather than fork(2)".
RFE 4748951 came about when a customer's utility invoked dbx to read a huge core file using a script that also needed to run a cut(1) command from within dbx. They got a cannot fork - try again error message causing dbx to abort. An investigation revealed that dbx used fork/exec to execute that tiny cut(1) command and ran out of VM during the fork() call.
The Solaris Java Virtual Machine (JVM) is also suffering from the same problem currently, as described in this Sun RFE: "5049299 Use posix_spawn, not fork, on S10 to avoid swap exhaustion".
So you have 3 options.
1.- Execute the Runtime.exec function earlier.
2.- Create a inter process comunication with other java server, and ececute there the Runtime.exec instruccion.
3.- Create a JNI class to call a system C function. I take this option, and it work perfect.
I put my sample code here.
Java Code.
public class CallOS {
static {
System.loadLibrary("CallOS");
}
public native int exec(java.lang.String cmd);
public static void main(String[] args) {
int returnValue = 0;
returnValue = new CallOS().exec("ls -la");
System.out.println("- " + returnValue);
}
}
C header Code. This is generate with javah -jni CallOS
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class CallOS */
#ifndef _Included_CallOS
#define _Included_CallOS
#ifdef __cplusplus
extern "C" {
#endif
/*
* Class: CallOS
* Method: exec
* Signature: (Ljava/lang/String;)I
*/
JNIEXPORT jint JNICALL Java_CallOS_exec
(JNIEnv *, jobject, jstring);
#ifdef __cplusplus
}
#endif
#endif
C code.
#include "CallOS.h"
#include <stdlib.h>
JNIEXPORT jint JNICALL Java_CallOS_exec
(JNIEnv *env, jobject obj, jstring cmd)
{
jint retval;
jbyte *str;
str = (*env)->GetStringUTFChars(env, cmd, NULL);
if(str == NULL) return NULL;
retval = system(str);
(*env)->ReleaseStringUTFChars(env, cmd, str);
return retval;
};
I hope this help for you.

Are you handling the spawned process stdout/stderr properly ? You need to consume both in separate threads to reliably prevent blocking. See this answer for details. It may be the case that your process spawning works properly for some jobs, and for others (due to the quantity of stdout/err that triggers a hang).
On the subject of duplicate processes, I would expect the JVM to fork/exec. This duplicates the Java process (fork) and then it should replace it with the new process (exec). I wonder if that's what you're seeing ? Note also that I'd expect the OS to be implementing COW (copy-on-write) to duplicate only those memory pages that differ between images, so in normal circumstances the duplication of the JVM wouldn't consume as much memory as you may think.

As Brian implied, on unix, the standard way for another process to start another program is to fork into a parent process and a child process. The child process then calls exec to replace itself with the new program. The JVM has to do this to start your jfmerge program.
Normally, the memory size of the child process isn't an issue, because the OS uses copy-on-write to let the two processes share the same memory image until the child calls exec. It could be that the JVM's model for child processes requires it to fork twice, with the grandchild exec'ing jfmerge and the child process that manages the grandchild. That would explain why you see a duplicate JVM process that you are seeing. The stack trace shows a process blocked reading from an input stream. It may be that jfmerge is running slowly and the process is just hung waiting for jfmerge to produce some output.
What you could do is to get some other process to launch jfmerge, instead of your 5GB JVM. Write a standalone program which just runs jfmerge on demand, and have it communicate with the main process through some form of inter-process communication. This standalone jfmerge server wouldn't require as much memory to operate, so the impact of forked child processes wouldn't be so great.

Related

TestRealTime: How to test a realtime Operating System with Rational Test Real Time

On an AUTOSAR realtime Operating System (OS), the software architecture is layered separately (User space, systemcall interface, kernel space). Also, the switching between user context and kernel context is handled by hardware-specific infrastructure and typically the context switching handler is written in assembly code.
IBM® Rational® Test RealTime v8.0.1 (RTRT) currently treats embedded-assembly-code as mentioned in the below Q&A.
https://www.ibm.com/support/pages/how-treat-embedded-assembly-code ( ** )
RTRT tool using code insertion technololy (technically known as instrumentation process) to insert its own code to measure code coverage of the system under test.
In my case, OS with full pre-emptive design doesn't have the termination points. As the result, OS always runs unless loss of power supply. If there's no work, OS shall be in sleep (normally an idle state and do nothing). If any unexpected errors or exceptions occurs, OS shall be shutdown and run into an infinite loop. These indicated that OS is always running.
I learnt from ( ** ) and ensure context switching working correctly.
But I don't know how to teach RTRT to finish its postprocessing (consisting of attolcov and attolpostpro) in a right way. Note that OS has worked correctly throughout all my tasks already and was confirmed by debugger. SHUTDOWN OS procedure has been executed correctly and OS has been in INFINITE loop (such as while(1){};)
After RTRT ends all its processes, the coverage report of OS module is still empty.
Based on IBM guideline for RTRT
https://www.ibm.com/developerworks/community/forums/atom/download/attachment_14076432_RTRT_User_Guide.pdf?nodeId=de3b0048-968c-4111-897e-b73654af32af
RTRT provides two breakpoints to mark the logging point (priv_writeln) and termination point (priv_close) of its process.
I already tried to drive from INFINITE (my OS) to priv_close (RTRT) by interacting PC register and all Context Switching registers with the Lauterbach debugger but RTRT coverage report was empty even thougth none of errors happened. No error meant that the context switch from kernel space to user space is able to work well and main() function returned correctly.
Solved the problem.
It definitely came from context switching process of Operating System.
In my case, I did a RAM dump to see how user context memory (befor Starting OS) look like.
After that, I will backup all context areas and retore them from Sleep or Infiniteloop in the exact order.
Hereby, RTRT can know the return point and reach its own main() function's _exit to finish report generation progress.
Finally, the code coverage report has been generated.

Context switch by random system call

I know that an interrupt causes the OS to change a CPU from its current task and to run a kernel routine. I this case, the system has to save the current context of the process running on the CPU.
However, I would like to know whether or not a context switch occurs when any random process makes a system call.
I would like to know whether or not a context switch occurs when any random process makes a system call.
Not precisely. Recall that a process can only make a system call if it's currently running -- there's no need to make a context switch to a process that's already running.
If a process makes a blocking system call (e.g, sleep()), there will be a context switch to the next runnable process, since the current process is now sleeping. But that's another matter.
There are generally 2 ways to cause a content switch. (1) a timer interrupt invokes the scheduler that forcibly makes a context switch or (2) the process yields. Most operating systems have a number of system services that will cause the process to yield the CPU.
well I got your point. so, first I clear a very basic idea about system call.
when a process/program makes a syscall and interrupt the kernel to invoke syscall handler. TSS loads up Kernel stack and jump to syscall function table.
See It's actually same as running a different part of that program itself, the only major change is Kernel play a role here and that piece of code will be executed in ring 0.
now your question "what will happen if a context switch happen when a random process is making a syscall?"
well, nothing will happen. Things will work in same way as they were working earlier. Just instead of having normal address in TSS you will have address pointing to Kernel stack and SysCall function table address in that random process's TSS.

OS system calls in x86

While working on a educational simplistic RISC processor I was wondering about how system calls work when implementing my software interrupt function. For example, hypothetically lets say our program calls sys_end which ends the current process. Now I know this would go to a vector table and then to the code to end the current process.
My question is the code that ends the process ran in supervisor mode or user mode? No where I seem to look specifies this. I'm assuming if its in normal user mode that could pose a very significant problem as a user mode process could do say do something evil like:
for (i=0; i++; i<10000){
int sys_fork //creates child process
}
which could be very bad I thought the OS would have some say on how many times a process could repeat itself and not to mention what other harmful things a process could do by changing the code in the system call itself.
system calls run in supervisor mode for the duration of the system call. The supervisor mode is necessary for accessing hardware (the screen, the keyboard), and for keeping user processes isolated from each other.
There are (or can be configured) limits on the amount of cpu, number of processes, etc. a user process may use or request, which can offer some protection against the kind of runaway program you describe.
But the default linux configuration allows 10k processes to be created in a tight loop; I've done it myself (both intentionally and accidentally)

How does OS execute compiled binary files?

This questions came to my head when I was studying processes scheduling.
How does OS execute and control the execution of binary and compiled files? I thought maybe OS copies a part of the binary to some memory location, jumps there, comes back after executing that block and executes the next one. But then it wouldn't have any control on it (e.g. the program can do a jump anywhere and don't come back).
In JVM case it makes perfect sense, the VM is interpreting each instruction. But in the binary files case the instructions are real CPU executable instructions so I don't think that an OS acts like VM.
It does exactly that. The operating system, in some order,
creates an entry in the process table
creates a virtual memory space for the process
loads the program code in the process memory
points the process instruction pointer to the process entry point
creates an entry in the scheduler and sets the process thread ready for execution.
Concurrency is not handled by the program being split into blocks. Switching between tasks is done via interrupts: before a process is given CPU, a timer is set up. When the timer finishes, the CPU registers an interrupt, pushes the instruction pointer to the stack and jumps to the interrupt handler defined by the operating system. This handler stores the CPU state in memory, swaps a virtual memory table and restores some other thread that is ready for execution. The same swap occurs if the thread must pause for some other reason (waiting for user / disk / network...) or yields.
http://en.wikipedia.org/wiki/Multitasking#Preemptive_multitasking.2Ftime-sharing
Note that relying on the process yielding the CPU is possible but unreliable (the process might not yield, preventing other processes from running)
http://en.wikipedia.org/wiki/Multitasking#Cooperative_multitasking.2Ftime-sharing
Security is handled by switching the CPU into protected mode where the application code cannot run some instructions (so jumping around randomly is mostly harmless). See the link provided by #SkPhilipp
Note that modern JVM does not interpret each instruction (that would be slow). Instead it compiles into native code and runs the code or (in case of just-in-time compilation) interprets at first, but compiles the "hot spots" (the code that gets run often enough).

How can I avoid zombies in Perl CGI scripts run under Apache 1.3?

Various Perl scripts (Server Side Includes) are calling a Perl module with many functions on a website.
EDIT:
The scripts are using use lib to reference the libraries from a folder.
During busy periods the scripts (not the libraries) become zombies and overload the server.
The server lists:
319 ? Z 0:00 [scriptname1.pl] <defunct>
320 ? Z 0:00 [scriptname2.pl] <defunct>
321 ? Z 0:00 [scriptname3.pl] <defunct>
I have hundreds of instances of each.
EDIT:
We are not using fork, system or exec, apart form the SSI directive
<!--#exec cgi="/cgi-bin/scriptname.pl"-->
As far as I know, in this case httpd itself will be the owner of the process.
MaxRequestPerChild is set to 0 which should not let the parents die before the child process is finished.
So far we figured that temporarily suspending some of the scripts help the server coping with the defunct processes and prevent it from falling over however zombie processes are still forming without a doubt.
Apparently gbacon seems to be the closest to the truth with his theory that the server is not being able to cope with the load.
What could lead to httpd abandoning these processes?
Is there any best practice to prevent these from happening?
Thanks
Answer:
The point goes to Rob.
As he says, CGI scripts that generate SSI's will not have those SSI's handled. The evaluation of SSI's happens before the running of CGI's in the Apache 1.3 request cycle. This was fixed with Apache 2.0 and later so that CGI's can generate SSI commands.
Since we were running on Apache 1.3, for every page view the SSI's turned into defunct processes. Although the server was trying to clear them it was way too busy with the running tasks to be able to succeed. As a result, the server fell over and become unresponsive.
As a short term solution we reviewed all SSI's and moved some of the processes to client side to free up server resources and give it time to clean up.
Later we upgraded to Apache 2.2.
More Band-Aid than best practice, but sometimes you can get away with simple
$SIG{CHLD} = "IGNORE";
According to the perlipc documentation
On most Unix platforms, the CHLD (sometimes also known as CLD) signal has special behavior with respect to a value of 'IGNORE'. Setting $SIG{CHLD} to 'IGNORE' on such a platform has the effect of not creating zombie processes when the parent process fails to wait() on its child processes (i.e., child processes are automatically reaped). Calling wait() with $SIG{CHLD} set to 'IGNORE' usually returns -1 on such platforms.
If you care about the exit statuses of child processes, you need to collect them (commonly referred to as "reaping") by calling wait or waitpid. Despite the creepy name, a zombie is merely a child process that has exited but whose status has not yet been reaped.
If your Perl programs themselves are the child processes becoming zombies, that means their parents (the ones that are forking-and-forgetting your code) need to clean up after themselves. A process cannot stop itself from becoming a zombie.
I just saw your comment that you are running Apache 1.3 and that may be associated with your problem.
SSI's can run CGI's. But CGI scripts that generate SSI's will not have those SSI's handled. The evaluation of SSI's happens before the running of CGI's in the Apache 1.3 request cycle. This was fixed with Apache 2.0 and later so that CGI's can generate SSI commands.
As I'd suggested above, try running your scripts on their own and have a look at the output. Are they generating SSI's?
Edit: Have you tried launching a trivial Perl CGI script to simply printout a Hello World type HTTP response?
Then if this works add a trivial SSI directives such as
<!--#printenv -->
and see what happens.
Edit 2: Just realised what is probably happening. Zombies occur when a child process exits and isn't reaped. These processes are hanging around and slowly using up resources within the process table. A process without a parent is an orphaned process.
Are you forking off processes within your Perl script? If so, have you added a waitpid() call to the parent?
Have you also got the correct exit within the script?
CORE::exit(0);
As you have all the bits yourself, I'd suggest running the individual scripts one at a time from the command line to see if you can spot the ones that are hanging.
Does a ps listing show an inordinate number of instances of one particular script running?
Are you running the CGI's using mod_perl?
Edit: Just saw your comments regarding SSI's. Don't forget that SSI directives can run Perl scripts themselves. Have a look to see what the CGI's are trying to run?
Are they dependent on yet another server or service?