I'm implementing a HAL component for which I'm creating a test.
The test is actually about creating a mock of the HAL callbacks in a different process and registering it with the HAL.
In other words, I'm mocking the client of the HAL, then simulating a client crash that causes a transaction failure, to test the HAL's ability to handle errors.
The problem is that trying to use Binder IPC inside the forked child causes a segfault.
Reading this question and the documentation, I understood that the HwBinder thread pool is started automatically by linking against libbinder.
Given that fork() does not copy the parent's running threads, does this mean the forked process will not find the helper HwBinder thread, or does it assume the thread exists and try to use it anyway?
Either way, this produces a chain of errors ending in a SEGV_MAPERR.
My question is: is there any way to re-initialize the HwBinder thread pool inside my forked child without having to use exec?
Here is my code:
pid_t pid = fork();
if (pid == 0) {
    using namespace ::android::hardware;
    using namespace testing;

    /**
     * Crash happens here
     */
    android::sp<IMyHal> iMyHal = IMyHal::getService();
    iMyHal->deregisterMyHalCallback();

    // Register the mock so the HAL can call back into this process.
    sp<MockIMyHalCallback> callbackMock(new MockIMyHalCallback());
    iMyHal->registerMyHalCallback(callbackMock);

    Status hidlStatus;
    hidlStatus.setException(Status::EX_TRANSACTION_FAILED, "Transaction Failed");

    // Expect exactly one call, answering with the failure status
    // (gMock requires Times() before WillOnce()).
    EXPECT_CALL(*callbackMock, testFunction(Eq("1"), _))
        .Times(Exactly(1))
        .WillOnce(Return(ByMove(android::hardware::Return<void>(hidlStatus))));
} else if (pid > 0) {
    /**
     * This will trigger iMyHal to call callbackMock.testFunction();
     */
    mProxy->callTestFunction("1", 2);
}
This results in this error:
2020-10-30 19:18:31.440 18917-18917/? A/libc: Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xef400000 in tid 18917 (vendor.xxx.remo), pid 18917 (vendor.xxx.remo)
Do you intend to test your HAL, or the cross-process HwBinder machinery? In the first case, you may not need to create a whole separate process to test your HAL.
The usual way it's done is:
- providing a default HAL implementation, which may be a mock (example)
- implementing a VTS test for the HAL (example), which can be run against the real HAL as well
- running the default implementation at device start (or just from adb shell) and the VTS test separately
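For illustration, a minimal sketch of such a VTS-style test, reusing IMyHal and MockIMyHalCallback from the question above (registerMyHalCallback and callTestFunction are assumptions mirroring that snippet, not a verified API):

#include <gtest/gtest.h>
#include <gmock/gmock.h>

class MyHalTest : public ::testing::Test {
  protected:
    void SetUp() override {
        // getService() binds to whichever implementation is registered with
        // hwservicemanager, so the same test runs against a mock or the real HAL.
        myHal = IMyHal::getService();
        ASSERT_NE(myHal, nullptr);
    }
    android::sp<IMyHal> myHal;
};

TEST_F(MyHalTest, InvokesRegisteredCallback) {
    android::sp<MockIMyHalCallback> cb(new MockIMyHalCallback());
    EXPECT_CALL(*cb, testFunction(testing::Eq("1"), testing::_))
        .Times(1)
        .WillOnce(testing::Return(testing::ByMove(android::hardware::Void())));
    myHal->registerMyHalCallback(cb);
    myHal->callTestFunction("1", 2);  // should arrive at cb->testFunction()
}

Since the test binary and the HAL already live in separate processes, a client-crash scenario can be exercised by killing the test process, with no fork() needed.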
Related
Is it possible to catch signals received (specifically SIGSEGV, SIGABRT) by child processes of a program without actually modifying it (or with minimal modification)?
The program I'm talking about is a pretty complex tool whose low-level implementation details I don't know, though I do have access to its source code. I can start it using a command like:
$ ./tool_name start # tool_name is an executable created after compiling and building its source code
It forks many child processes and I want to see if those child processes are being killed by a signal or not.
What I have thought about is to create a simple C program and call the above command through it (using system()), write a handler for the signals I'm looking for, and do the rest of the work there (see the sketch below). Is this the right way to keep track of signals received by child processes? Is there a better way to do the same?
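A minimal sketch of that wrapper idea, with one caveat: a parent cannot catch signals delivered to its children; what it can observe, via waitpid(), is whether a direct child terminated because of a signal. Children that the tool forks itself are not visible this way:

#include <cstdio>
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    pid_t pid = fork();
    if (pid == 0) {
        // Child: run the tool (path and arguments as in the question).
        execl("./tool_name", "tool_name", "start", (char *)nullptr);
        _exit(127);                       // exec failed
    }
    int status;
    // Reap direct children and report how each one ended.
    while ((pid = waitpid(-1, &status, 0)) > 0) {
        if (WIFSIGNALED(status))
            printf("pid %d killed by signal %d\n", (int)pid, WTERMSIG(status));
        else if (WIFEXITED(status))
            printf("pid %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
    }
    return 0;
}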
I found the answer in Managing Signal Handling for daemons that fork() very helpful for what I'm doing. I'm unsure about how to solve this part:
"You will therefore need to install any signal handling in the execed process when it starts up"
I don't have control over the process that starts up. Is there any way for me to force some signal handlers onto the exec'd process from the parent side of the fork?
Edit:{
I'm writing a Perl module that monitors long-running processes. Instead of
system(<long-running cmd>);
you'd use
my_system(<ID>, <long-running cmd>);
I create a lock file for the <ID> and don't let another my_system(<ID>...) call through if there is one currently running with a matching ID.
The parent fork/execs <long-running cmd> and is in charge of cleaning up the lock file when it terminates. I'd like to have the child self-sufficient, so the parent can exit (or so the child can take care of itself if the parent gets a kill -9).
}
On Unix systems, you can make an exec'd process ignore signals (unless the process chooses to override what you say), but you can't force it to install a handler for them. The most you can do is leave the relevant signal handled by the default handler.
If you think about it, you'll see why. To install a signal handler, you have to provide a function pointer, but the process that calls exec() can't specify one of its own functions, because they won't exist in the exec'd process, and it can't specify one of the exec'd process's functions, because they don't exist in the exec'ing process. Similarly, you can't register atexit() handlers in the exec'ing process that will be honoured by the exec'd process.
As to your programming problem, there's a good reason that the lock file normally contains the process ID (pid) of the process that holds the lock: it allows you to check whether that process is still around, even if it isn't your child. You can read the pid from the lock file and then use kill(pid, 0), which tells you whether the process exists and whether you could signal it, without actually sending any signal.
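A sketch of that stale-lock check (the lock-file path and single-pid format are assumptions):

#include <errno.h>
#include <signal.h>
#include <stdio.h>

// Returns true if the pid recorded in the lock file still names a live process.
bool lock_holder_alive(const char *lockfile) {
    FILE *f = fopen(lockfile, "r");
    if (f == NULL) return false;        // no lock file: nothing holds the lock
    long pid = 0;
    int fields = fscanf(f, "%ld", &pid);
    fclose(f);
    if (fields != 1 || pid <= 0) return false;
    // Signal 0 performs the existence and permission checks without
    // delivering anything.
    if (kill((pid_t)pid, 0) == 0) return true;
    return errno == EPERM;              // alive, but owned by another user
}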
One approach would be to use two forks.
The first fork creates a child process responsible for cleaning up the lock file if the parent dies. That child also forks a grandchild, which execs the long-running command.
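A sketch of that two-fork approach (names are illustrative): the intermediate child outlives the parent and owns the lock-file cleanup, while the grandchild execs the command:

#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

void my_system_detached(const char *lockfile, const char *cmd) {
    if (fork() != 0)
        return;                          // parent carries on (and may exit)

    // Intermediate child: independent of the parent from here on.
    pid_t grandchild = fork();
    if (grandchild == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                      // exec failed
    }
    waitpid(grandchild, NULL, 0);        // wait out the long-running command
    unlink(lockfile);                    // clean up even if the parent is gone
    _exit(0);
}

The parent should still reap (or ignore SIGCHLD for) the intermediate child so it doesn't linger as a zombie.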
I'm designing a multi-threaded server with a thread pool. This system is designed to use persistent TCP connections, as clients will stay connected close to 24/7. The problem I run into is how to manage shutdowns. Currently, a connection comes in through "accept(listen_fd....)" and gets assigned to a work order struct. This struct is dumped onto the work queue and is picked up by a thread. From this point on, this thread is devoted to the current connection. My code inside the thread is:
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/socket.h>

extern volatile sig_atomic_t serv_shutdown;  /* set to 1 by the SIGINT handler */

/* Function which runs in a thread to handle a request */
void *
handle_req( void *in)
{
    ssize_t n;
    char c;

    /* Convert the input to a workorder_ptr */
    workorder_t *workorder_ptr = (workorder_t *)in;

    /* The assignment is parenthesized so n receives the byte count;
       recv() returns 0 on orderly shutdown and -1 on error. */
    while (!serv_shutdown
           && (n = recv(workorder_ptr->sock_fd, &c, 1, 0)) > 0)
    {
        printf("Read a character: %c\n", c);
    }

    printf("Peer has shutdown.\n");

    /* Free the workorder memory */
    close(workorder_ptr->sock_fd);
    free(workorder_ptr);

    return NULL;
}
Which simply listens to the socket and echos the characters indefinitely, and operates correctly when the client terminates the connection. You see the "!serv_shutdown" part in the while loop - this is my attempt to get the thread to break out of its loop on a shutdown signal. When a SIGINT is caught, the global variable is set to 1. Unfortunately, the program is currently blocking on the recv statement, and won't check this flag until another character is read. I want to avoid that, since it could be an arbitrary amount of time before another character is sent on this connection.
Also, I read in another post here that it's better to use "select" than to block in "accept" to wait for a socket connection, but I didn't quite understand. Would you do a select to wait, and then do an accept right after that? I'm not sure how select relates to creating a socket connection.
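For reference, the pattern being asked about: select() only reports that the listening socket has a pending connection; accept() still creates the per-connection socket. A minimal sketch, assuming listen_fd is already bound and listening:

#include <sys/select.h>
#include <sys/socket.h>

int accept_when_ready(int listen_fd) {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(listen_fd, &rfds);
    /* Blocks until a connection is pending; a timeout could go here instead. */
    if (select(listen_fd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;
    return accept(listen_fd, NULL, NULL);  /* will not normally block now */
}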
Also, how do I detect the case where a connection simply times out?
Thanks!
EDIT
I think I may have finally found a solution, after further digging:
Wake up thread blocked on accept() call
Basically, I could create a global pipe and have each thread select on its own sock_fd as well as on the read end of this global pipe. Then, when a signal is caught, I'll just write something to the pipe. All threads should be woken, no?
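That self-pipe idea works; a minimal sketch (names are illustrative):

#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

volatile sig_atomic_t serv_shutdown = 0;
int wake_pipe[2];                      /* created once at startup: pipe(wake_pipe) */

void on_sigint(int signo) {
    (void)signo;
    serv_shutdown = 1;
    char b = 0;
    write(wake_pipe[1], &b, 1);        /* write() is async-signal-safe */
}

/* Per-thread wait: returns 1 on shutdown, 0 when sock_fd is readable, -1 on error.
   No thread ever reads from the pipe, so one written byte keeps the read end
   readable and wakes every thread's select(). */
int wait_readable(int sock_fd) {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(sock_fd, &rfds);
    FD_SET(wake_pipe[0], &rfds);
    int maxfd = sock_fd > wake_pipe[0] ? sock_fd : wake_pipe[0];
    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;
    return FD_ISSET(wake_pipe[0], &rfds) ? 1 : 0;
}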
Well, on FreeBSD, Mac OS X, and maybe elsewhere there is the kevent() call, which allows listening for a broad range of system events, including connection requests, and signals when data arrives on the socket.
It would solve all of your problems in a neat way, but it's not portable. There are libraries such as libevent and libev that wrap OS-specific functionality like kevent() on the BSDs, epoll() on Linux, and so on. Maybe one of those would help you.
You can use the recv() primitive. If it returns 0, that means that the socket has been closed.
More information: http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#recvman
I want to write a robust daemon in Perl that will run on Linux, and I am following the template described in this excellent answer. However, there are a few differences in my situation: first, I am using Parallel::ForkManager's start() and next; to fork on an event, immediately followed by exec('handle_event.pl').
In such a situation, I have the following questions:
Where should I define my signal handlers? Should I define them in the parent (the daemon) and assume that they will be inherited by the children?
If I run exec('handle_event.pl'), will the handlers be inherited across the exec (I know that they are inherited across the fork)?
If I redefine a new signal handler in handle_event.pl, will this definition override the one defined in the parent?
What are best practices in a situation like this?
Thank you
When you fork, the child process has the same signal handlers as the parent. When you exec, any ignored signals remain ignored; any handled signals are reset back to the default handler.
The exec replaces the whole process image with the code of the program being executed. Since signal handlers are code in the old process image, they cannot be carried across an exec, so exec resets the dispositions of handled signals to their defaults (ignored signals remain ignored). You will therefore need to install any signal handling in the execed process when it starts up.
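A small demonstration of that rule (Linux-specific, since it re-execs itself via /proc/self/exe; purely illustrative):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void handler(int signo) { (void)signo; }

int main(int argc, char **argv) {
    if (argc > 1) {                 /* we are the exec'd image */
        struct sigaction sa;
        sigaction(SIGUSR1, NULL, &sa);
        printf("SIGUSR1: %s\n",
               sa.sa_handler == SIG_IGN ? "still ignored" : "not ignored");
        sigaction(SIGUSR2, NULL, &sa);
        printf("SIGUSR2: %s\n",
               sa.sa_handler == SIG_DFL ? "reset to default" : "still handled");
        return 0;
    }
    signal(SIGUSR1, SIG_IGN);       /* ignored: survives exec */
    signal(SIGUSR2, handler);       /* handled: cannot survive exec */
    execl("/proc/self/exe", argv[0], "child", (char *)NULL);
    return 1;                       /* exec failed */
}

Running it prints "still ignored" and "reset to default", matching the behaviour described above.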
All,
We are running a J2EE application on WebLogic Server 9.2 MP2 with a JRockit 64-bit JVM (27.3.1) on Solaris 10.
We use Runtime.exec to call an executable named jfmerge to create PDF documents.
We have found that on Solaris, when Runtime.exec is called, a duplicate JVM is temporarily spawned to kick off the jfmerge process. While this is inefficient (our JVM is 5 GB, so the duplicated shell JVM is also 5 GB), the major problem is that under heavy load on this functionality (PDF generation), the duplicated JVM sometimes never exits.
When the JVM hangs, it causes serious trouble on the servers (extreme application slowness and terminated user sessions), as the entire duplicate JVM gets all 5 GB of its process size written to disk swap.
We have noted the following stuck thread, correlated with a hung JVM process, which remains until the process is manually killed:
"[STUCK] ExecuteThread: '17' for queue: 'weblogic.kernel.Default
(self-tuning)'" id=3463 idx=0x158 tid=3460 prio=1 alive, in native,
daemon
at
jrockit/io/FileNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BII)I(Native
Method)
at jrockit/io/FileNativeIO.readBytes(FileNativeIO.java:30)
at java/io/FileInputStream.readBytes([BII)I(FileInputStream.java)
at java/io/FileInputStream.read(FileInputStream.java:194)
at
java/lang/UNIXProcess$DeferredCloseInputStream.read(UNIXProcess.java:227)
at java/io/BufferedInputStream.fill(BufferedInputStream.java:218)
at java/io/BufferedInputStream.read(BufferedInputStream.java:235)
^-- Holding lock:
java/io/BufferedInputStream#0xfffffffec6510470[thin lock]
at
gov/v3/common/formgeneration/sessionbean/FormsBean.getProcessStatus(FormsBean.java:809)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.createPDF(FormsBean.java:750)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.getTemplateDetails(FormsBean.java:450)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.generateSinglePDF(FormsBean.java:1371)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.generatePDF(FormsBean.java:263)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.endorseDocument(FormsBean.java:2377)
at
gov/v3/common/formgeneration/sessionbean/Forms_qaco28_EOImpl.endorseDocument(Forms_qaco28_EOImpl.java:214)
at
gov/v3/delegates/common/FormsAndNoticesDelegate.endorseDocument(FormsAndNoticesDelegate.java:128)
at
gov/v3/actions/common/EndorseDocumentAction.executeRequest(EndorseDocumentAction.java:68)
at
gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.dispatchToExecuteMethod(V3CommonDispatchAction.java:532)
at
gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.executeBaseAction(V3CommonDispatchAction.java:336)
at
gov/v3/fwk/controller/struts/action/V3BaseDispatchAction.execute(V3BaseDispatchAction.java:69)
at
org/apache/struts/action/RequestProcessor.processActionPerform(RequestProcessor.java:484)
at
gov/v3/fwk/controller/struts/requestprocessor/V3TilesRequestProcessor.processActionPerform(V3TilesRequestProcessor.java:384)
at
org/apache/struts/action/RequestProcessor.process(RequestProcessor.java:274)
at
org/apache/struts/action/ActionServlet.process(ActionServlet.java:1482)
at
org/apache/struts/action/ActionServlet.doGet(ActionServlet.java:507)
at
gov/v3/fwk/controller/struts/servlet/V3ControllerServlet.doGet(V3ControllerServlet.java:110)
at javax/servlet/http/HttpServlet.service(HttpServlet.java:743)
at javax/servlet/http/HttpServlet.service(HttpServlet.java:856)
at
weblogic/servlet/internal/StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
at
weblogic/servlet/internal/StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
at
weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:283)
at
weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:175)
at
weblogic/servlet/internal/WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3231)
at
weblogic/security/acl/internal/AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at
weblogic/security/service/SecurityManager.runAs(SecurityManager.java:121)
at
weblogic/servlet/internal/WebAppServletContext.securedExecute(WebAppServletContext.java:2002)
at
weblogic/servlet/internal/WebAppServletContext.execute(WebAppServletContext.java:1908)
at
weblogic/servlet/internal/ServletRequestImpl.run(ServletRequestImpl.java:1362)
at weblogic/work/ExecuteThread.execute(ExecuteThread.java:209)
at weblogic/work/ExecuteThread.run(ExecuteThread.java:181)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace
We would like to do a couple of things:
1.) Prevent the spawning of a duplicate JVM, as we do not need any of its functionality when executing the simple jfmerge executable, and it creates massive overhead.
2.) In the short term, at least prevent this duplicate JVM from hanging indefinitely.
This answer is late, but we had the same problem, and for us the cause is how Solaris manages memory.
The problem is that when an application server is using a lot of memory (10 GB in my case) and wants to run even a simple "ls", the new process needs another 10 GB of virtual memory to fork.
Solaris requires that extra 10 GB to be available on the server, whereas Linux uses a feature known as "copy-on-write" that reduces the overhead of forking a new process:
http://developers.sun.com/solaris/articles/subprocess/subprocess.html
Historical Background and Problem Description
Traditionally, Unix has had only one way to create a new process: using a fork() system call, often followed by an exec() system call. The fork() call makes a copy of the entire parent process' address space, and exec() turns that copy into a new process.
(Note: In the Solaris OS, the term swap space is used to describe a combination of physical memory and disk swap space configured for the system. However, with other Unix systems this term may mean swap space on disk, also known as backing store. To avoid any confusion, I'll use the term Virtual Memory (VM) to mean physical memory plus disk swap space.)
Generally, the fork/exec method has worked quite well. However, it has disadvantages in some cases, such as running out of memory without a good reason and poor fork performance.
Out of Memory: For a large-memory process, the fork() system call can fail due to an inadequate amount of VM, because fork() requires twice the amount of the parent memory. This can happen even when fork() is immediately followed by an exec() call that would release most of that extra memory. When this happens, the application will usually terminate.
For example, suppose a 64-bit application is consuming 6 gigabytes (Gbytes) of VM at the moment, and it needs to create a subprocess to run the ls(1) command. The parent process issues a fork() call that will succeed only if there is another 6 Gbytes of VM available at the moment. If the system doesn't have that much VM available (which is a frequent situation), fork() will fail with ENOMEM. Obviously, the ls(1) command doesn't need anywhere near 6 Gbytes of memory to run, but fork() doesn't know that.
Not only applications, but also Sun's own tools can suffer from the same problem. For example, the following Sun RFE (request for enhancement) has been filed for dbx: "4748951 dbx shell should use posix_spawn() for non-builtin commands rather than fork(2)".
RFE 4748951 came about when a customer's utility invoked dbx to read a huge core file using a script that also needed to run a cut(1) command from within dbx. They got a cannot fork - try again error message causing dbx to abort. An investigation revealed that dbx used fork/exec to execute that tiny cut(1) command and ran out of VM during the fork() call.
The Solaris Java Virtual Machine (JVM) is also suffering from the same problem currently, as described in this Sun RFE: "5049299 Use posix_spawn, not fork, on S10 to avoid swap exhaustion".
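For reference, the posix_spawn() route that these RFEs point at avoids duplicating the parent's address space; a minimal sketch:

#include <spawn.h>
#include <stdio.h>
#include <sys/wait.h>

extern char **environ;

int main(void) {
    pid_t pid;
    char *args[] = { (char *)"ls", (char *)"-la", (char *)0 };
    /* Spawns the child directly, without fork()'s transient copy of the
       parent's virtual memory. */
    if (posix_spawnp(&pid, "ls", NULL, NULL, args, environ) != 0) {
        perror("posix_spawnp");
        return 1;
    }
    int status;
    waitpid(pid, &status, 0);
    return 0;
}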
So you have three options:
1.- Execute the Runtime.exec calls early, while the JVM is still small.
2.- Set up inter-process communication with another, smaller Java server, and execute the Runtime.exec instruction there.
3.- Create a JNI class to call the C system() function. I took this option, and it works perfectly.
I put my sample code here.
Java code:
public class CallOS {
    static {
        System.loadLibrary("CallOS");
    }

    public native int exec(java.lang.String cmd);

    public static void main(String[] args) {
        int returnValue = 0;
        returnValue = new CallOS().exec("ls -la");
        System.out.println("- " + returnValue);
    }
}
C header code, generated with javah -jni CallOS:
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class CallOS */
#ifndef _Included_CallOS
#define _Included_CallOS
#ifdef __cplusplus
extern "C" {
#endif
/*
* Class: CallOS
* Method: exec
* Signature: (Ljava/lang/String;)I
*/
JNIEXPORT jint JNICALL Java_CallOS_exec
(JNIEnv *, jobject, jstring);
#ifdef __cplusplus
}
#endif
#endif
C code:
#include "CallOS.h"
#include <stdlib.h>

JNIEXPORT jint JNICALL Java_CallOS_exec
  (JNIEnv *env, jobject obj, jstring cmd)
{
    jint retval;
    const char *str;

    str = (*env)->GetStringUTFChars(env, cmd, NULL);
    if (str == NULL) return -1; /* allocation failed; OutOfMemoryError pending */

    retval = system(str);

    (*env)->ReleaseStringUTFChars(env, cmd, str);
    return retval;
}
I hope this helps.
Are you handling the spawned process's stdout/stderr properly? You need to consume both in separate threads to reliably prevent blocking. See this answer for details. It may be that your process spawning works properly for some jobs and hangs for others, depending on the quantity of stdout/stderr they produce.
On the subject of duplicate processes, I would expect the JVM to fork/exec: this duplicates the Java process (fork) and then replaces it with the new process (exec). I wonder if that's what you're seeing? Note also that I'd expect the OS to implement COW (copy-on-write), duplicating only those memory pages that differ between the images, so in normal circumstances duplicating the JVM wouldn't consume as much memory as you might think.
As Brian implied, on Unix the standard way for a process to start another program is to fork into a parent process and a child process. The child process then calls exec to replace itself with the new program. The JVM has to do this to start your jfmerge program.
Normally, the memory size of the child process isn't an issue, because the OS uses copy-on-write to let the two processes share the same memory image until the child calls exec. It could be that the JVM's model for child processes requires it to fork twice, with the grandchild exec'ing jfmerge and the child managing the grandchild. That would explain the duplicate JVM process you are seeing. The stack trace shows a thread blocked reading from an input stream; it may be that jfmerge is running slowly and the process is simply waiting for jfmerge to produce some output.
What you could do is have some other process launch jfmerge, instead of your 5 GB JVM. Write a standalone program which just runs jfmerge on demand, and have it communicate with the main process through some form of inter-process communication. This standalone jfmerge server wouldn't require much memory to operate, so the impact of its forked child processes would be far smaller.
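One possible shape for that standalone launcher (the FIFO path and the one-command-per-line protocol are assumptions; a real version would validate its input):

#include <cstdlib>
#include <fstream>
#include <string>

int main() {
    for (;;) {
        // Opening a FIFO for reading blocks until a writer (the JVM) appears;
        // getline() sees EOF when all writers close, so reopen and keep serving.
        std::ifstream fifo("/tmp/jfmerge.fifo");  // created with mkfifo(1)
        std::string line;
        while (std::getline(fifo, line)) {
            // This process stays tiny, so the fork hidden inside system()
            // is cheap compared to forking the 5 GB JVM.
            std::system(("jfmerge " + line).c_str());
        }
    }
}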