X10 code does not recognize the second core of the CPU - multicore

I wrote a canonical "Hello, World" demo class in X10:
class Hello {
    public static def main(args:Rail[String]):void {
        finish for (p in Place.places()) {
            at (p) async Console.OUT.println(here + " says hello");
        }
        Console.OUT.println("Goodbye");
    }
}
My computer has a 2-core CPU, but the X10 program does not recognize both processing cores. As far as I can tell, it recognizes only one core, so the console output is as follows:
Place(0) says hello
Goodbye
instead of
Place(0) says hello
Place(1) says hello
Goodbye
as might be expected.
How can I force X10 to recognize all the processing cores available on my laptop?
My laptop is equipped with an Intel Core 2 Duo CPU. The operating system is Windows 7.
Thank you in advance.

By default, X10 starts only a single place, regardless of how many physical cores are available. (In fact, depending on your application, it may be better to run a single place per multi-core CPU and use async to take advantage of multithreaded parallelism within a place; a sketch follows the commands below.)
To request a different number of places, either set the environment variable X10_NPLACES or use the -np option of the x10 launcher, e.g.
X10_NPLACES=2 x10 Hello
x10 -np 2 Hello
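To illustrate the parenthetical above, here is a minimal, untested sketch of intra-place parallelism: even with one place, async creates activities that the runtime schedules across the place's worker threads (the worker count is commonly controlled with the X10_NTHREADS environment variable, if I remember the variable name correctly):
class HelloAsync {
    public static def main(args:Rail[String]):void {
        // four activities run concurrently inside the one place
        finish for (i in 1..4) async {
            Console.OUT.println("activity " + i + " running in " + here);
        }
        Console.OUT.println("Goodbye");
    }
}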

Related

How can I make KDB sleep for 5 seconds?

I've got a rate-limited endpoint I'm querying, and so I need to make KDB pause between requests. Is there a way I can block the current thread?
The solutions using system are good for the majority of use cases. If you want an OS-agnostic method, the below lambda can be used, passing the time to wait as an argument:
{t:.z.p;while[.z.p<t+x]} 00:00:05
This approach can also be used if you're making these requests on secondary/slave threads (for example, using peach), where attempting to use the system keyword would fail.
You can use
system"sleep 5"
on Linux, or
system"timeout 5"
on Windows.
For reference, system lets you run any command native to the OS, just like using \ in the q console, e.g. \timeout 5 is equivalent to system"timeout 5".
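If you use the busy-wait often, it can be given a name (the name sleep here is my choice, not a q built-in); note that it spins the CPU for the whole duration:
/ OS-agnostic busy-wait; also works on secondary threads, e.g. under peach
sleep:{[t] s:.z.p; while[.z.p < s+t]}
sleep 00:00:05    / blocks the calling thread for 5 seconds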

Sleep mode and duty cycling in UnetStack, and adding energy consumed in idle listening and sleeping modes to a simple energy model

I have two questions:
We want to use a very low transmission duty cycle in our underwater sensor network, as the power consumption in listening and sleep modes will dominate our network lifetime in practice.
I noticed the scheduler commands in the new version of the UnetStack simulator (3.2.0): addsleep, showsleep, etc. I downloaded the latest version of the simulator and tried to use those commands, both in the shell and inside Groovy scripts, and also tried to import org.arl.unet.scheduler, but none of the scheduler commands worked and I kept receiving errors.
For example, I tried addsleep 20.s.later, but the simulator does not recognise "later"; I also received errors for the import of org.arl.unet.scheduler.
I wonder if anyone can help me with that, i.e. how to use the addsleep command, for example.
Another question:
Besides consuming energy in transmitting and receiving, our modem draws 2.5 mA from a 5V supply while listening for the start of a packet, and can go to sleep and draw about 0.24 mA from a 5V supply, with the ability to wake up and return to the listening mode after a programmable time period.
So my question is, is there a way to consider energy consumed in idle listening and sleeping in a simple energy model?
We implemented a very simple energy model, something like the following (we found this example on Stack Overflow):
class MyHalfDuplexModem extends HalfDuplexModem {
    float energy = 1000
    @Override
    boolean send(Message m) {
        if (m instanceof TxFrameNtf) energy -= 10
        if (m instanceof RxFrameNtf) energy -= 1
        return super.send(m)
    }
}
How can we add the energy consumed in idle listening and sleeping to the above code? Do we need to use something like WakeFromSleepNtf?
Thanks and any help is much appreciated.
Marwa
The scheduler service is usually hardware dependent, as it requires interaction with the specific single board computer (SBC) to put it into a sleep state and allow it to be woken up. On modems, this is usually the modem driver agent.
The HalfDuplexModem simulated modem doesn't provide this service, so it won't work out of the box. Since HalfDuplexModem doesn't have an energy model built into it, "sleep" doesn't mean much to it. If you wanted to simulate networks where nodes sleep and consume less energy while asleep, you could extend HalfDuplexModem to implement the SCHEDULER service. The service is quite simple, with just 4 messages (AddScheduledSleepReq, RemoveScheduledSleepReq, GetSleepScheduleReq and WakeFromSleepNtf). Your implementation could keep track of the energy used by each node based on whether it is sleeping, listening or transmitting, since you can keep track of the sleep schedule and hence know how much time the node has been awake or asleep.
The addsleep, showsleep etc. commands are simply convenience shortcuts in the shell extension that use the above 4 messages to do the actual work. They are enabled in the shell by loading SchedulerShellExt, and you can use the messages directly from agents or in simulation scripts. A sketch of how the energy accounting might look follows.
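For the energy part of the question, a rough, untested sketch of one way to extend the simple model: charge for the time spent listening or sleeping whenever a notification passes through, using the figures from the question (5 V supply, 2.5 mA listening, 0.24 mA sleeping). The sleeping flag and the time source are placeholders; in a real implementation you would toggle the flag from your SCHEDULER service implementation (AddScheduledSleepReq/WakeFromSleepNtf) and use the simulator's clock:
class MyEnergyModem extends HalfDuplexModem {
    double energy = 1000          // remaining energy budget, in joules (say)
    boolean sleeping = false      // to be toggled by your SCHEDULER implementation
    long lastUpdate = 0           // ms; replace with the simulator clock in a discrete-event run

    // drain energy for the time elapsed in the current mode: P = V * I, E = P * t
    void drainIdle(long now) {
        double watts = sleeping ? 5.0 * 0.00024 : 5.0 * 0.0025
        energy -= watts * (now - lastUpdate) / 1000.0
        lastUpdate = now
    }

    @Override
    boolean send(Message m) {
        drainIdle(System.currentTimeMillis())   // stand-in time source
        if (m instanceof TxFrameNtf) energy -= 10
        if (m instanceof RxFrameNtf) energy -= 1
        return super.send(m)
    }
}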

Theoretical embedded linux requirements

I come from a programmer background using Java, C#, C++, Javascript
I got myself a Raspberry Pi (Model 1 A, the one without Ethernet) and played around with it for a while. I used Raspbian and Arch Linux ARM (since it is said to be small and customizable). Unfortunately, I didn't manage to configure them the way I want.
I am trying to build a nice looking (embedded) system whose only goal is to boot the Raspberry Pi fast and autostart a test application, which will be written in C# (Mono), C++ (Qt), Java (Java Runtime) or something in JavaScript/HTML.
Since I was not able to get rid of all the log messages (I got rid of most), the tty login screen, and the attempts to connect to the network (although the Model 1 A has no Ethernet at all), booting was ugly and slow (over a minute in some cases).
It seems I will have to build a minimal embedded Linux, but I lack the theory of embedded Linux components and how they fit together.
My question: what are the theoretically required parts of an embedded Linux running Mono, Qt, or a Java runtime on a Raspberry Pi?
So far I know the following parts:
the hardware (raspberry pi model 1 A) + sd card
the sd card holds 2 partitions, 1 boot partition (fat32), 1 data partition (ext4)
a boot loader
a linux kernel (which can be optimized to the needs of a raspi)
But what then? My research got lost at "use a distro", which is not what I want. What are the missing pieces between the kernel and starting an application?
An Embedded Linux system is composed of many different parts that work together toward the same goal of making things work efficiently.
Ideally, it is not much different from a regular GNU/Linux system, but let's look in detail at the building blocks of a generic embedded system.
For the following explanation, I am assuming an ARM architecture. What is written below may differ slightly from implementation to implementation, but it is the usual pattern for commercial embedded systems.
Blocks of a GNU/Linux Embedded System
Hardware
SoC
The SoC is where all the processing takes place. It is the main processing unit of the whole system and the only place that has "intelligence". It is in charge of driving the other hardware and running your software.
It is made of various and heterogeneous sub-blocks:
Core + Caches + MMU - the "real" processor, e.g. ARM Cortex-A9. It's the main thing you will notice when choosing a SoC.
It may be assisted by a SIMD coprocessor such as NEON.
Internal RAM - generally very small. Used in the first phase of the boot sequence.
Various "Peripherals" - connected via some interconnect
fabric/bus to the Core. These can span from a simple ADC to a 3D Graphics Accelerator. Examples of such IP cores are: USB, PCI-E, SGX, etc.
A low power/real time coprocessor - some systems offer one or more coprocessors intended either to help the main Core with real time tasks (e.g. industrial communication buses) or to handle low power states. Its architecture may or may not be related to the main Core's.
External RAM
It is used by the SoC to store temporary data after the system has bootstrapped and during the bootstrap itself. It's usually the memory your embedded system uses during regular operation.
Non-Volatile Memory - optional
May or may not be present. In your case it's the SD card you mentioned. In other cases it could be a NAND, NOR or SPI dataflash memory (or any combination of them).
When present, it is often the regular source of data the SoC will read from, and it usually stores all the software components needed for the system to work.
It may not be necessary or useful in some kinds of applications.
External Peripherals
Anything not strictly related to the above.
Could be a MAC ID EEPROM, some relays, a webcam or whatever you can possibly imagine.
Software
First of all, let's introduce the bootchain, which is what happens as soon as you power up your SoC and - in some way - tell it to start running. In the following list, the bootchain is the succession of stages from point 1 to point 4.
Apart from specific/exotic implementations, it is more or less always the same:
Boot ROM code - a small (usually masked, i.e. factory-programmed) memory contained in the SoC. The first thing the SoC will do when powered up is execute the code in it.
This code will - generally according to external configuration pins - decide the so-called "boot strategy" or "boot order": where (and in what order) to look for additional code to be executed. The suitable media are disparate: USB storage devices, USB hosts, SD cards, NANDs, NORs, SPI dataflashes, Ethernet, UARTs, and so on.
If none of the above contains something valid, the Boot ROM will usually issue a soft reset of the SoC, and try again.
The code in the medium is not, of course, executed in place: it gets copied into the Internal RAM and then executed.
[The following two are contained in what we will call bootloader medium]
1st stage bootloader - it has just been copied by the Boot ROM into the SoC's Internal RAM. It must be tiny enough to fit that memory (usually well under 100 kB). It is needed because the Boot ROM isn't big enough and does not know what kind of External RAM the SoC is attached to. Its most important function is to initialize the External RAM and the SoC's external memory interface, as well as other peripherals it may be interested in (e.g. disabling watchdog timers). Once done, it copies the next stage to the External RAM and executes it. Depending on the context, it may be called MLO, SPL or something else.
2nd stage bootloader - the "main" bootloader. Bigger (could be 10x) than the 1st stage one, it completes the initialization of the relevant peripherals (e.g. Ethernet, additional storage media, LCD displays). It allows much more complicated logic for what to do next and offers - depending on the level of sophistication - high level facilities (filesystem/volume handling, data copy-move-interpretation, LCD output, an interactive console, failsafe policies). Most of the time it loads a Linux kernel (and related data) into memory from some medium and passes relevant information to it (e.g. if not embedded in the image, for newer kernels the DTB physical address is put in the r2 register - the Kernel then reads the register and retrieves the DTB).
Linux Kernel - the core of the operating system. Depending on the hardware platform, it may or may not be a mainline ("official") version. It is usually completed by built-in or loadable (from an external source - free or not) modules. It initializes all the hardware needed for the complete system to work according to hardcoded configuration and the DT, enables the MMU, orchestrates the whole system and accesses the hardware exclusively. According to the boot arguments (cmdline - usually passed by the previous stage) and/or to compiled-in options, the Kernel tries to mount a root file system. From the rootfs, it will try to load an init (namely /sbin/init, where / is the just-mounted rootfs).
Init and rootfs - init is the first non-Kernel task to be run, and it has PID 1. It initializes literally everything you need to use your system. In production embedded systems, it also starts the main application. In such systems it is either BusyBox or a custom crafted application.
More on rootfs and distros
The rootfs contains everything in your GNU/Linux system that is not the Kernel (apart from /lib/modules and other bits).
It contains all the applications that manage peripherals like Ethernet, WiFi, or external UMTS modems.
It contains the interactive part of the system and the user interface - everything you see when you boot a GNU/Linux system, embedded or not.
A "distro" is just a particular collection of userspace (non-Kernel) programs and libraries, (usually) verified to work well with one another, put together by a particular group of people.
Desktop distros usually also ship with a custom-tailored kernel and a bootloader. Examples are Fedora, Ubuntu, Debian, etc.
In the general sense of the term, nothing stops you from creating your own distro, which is what happens every time a custom embedded system goes into production: through tools like Yocto or Buildroot (or by hand), you decide the very particular collection (hence distro, distribution) of software fit for the purpose of the system.
To sum up and answer your question exactly: the missing pieces you are looking for are init and the process of mounting the rootfs. The Kernel mounts - i.e. makes available to itself, via its drivers and the passed/built-in parameters - a given volume/partition (the ext4 data partition you mention) at the "/" mount point.
In this volume/partition there is a /sbin/init executable, which the Kernel executes.
This is the "Big Bang" of our GNU/Linux userspace system: the place where everything visible starts. Depending on the configuration scripts (usually located under /etc/init.d) the "application" you mention is either run automatically by init or by the user via a terminal/ssh/whatever that - again - init made you possible to use.

Specman beginner's questions

I am new to Specman.
I have a couple of questions:
I am trying to use the agent methodology. After writing the env, agent, BFM etc. - what is the recommended way to create clock and reset? By writing a tb.v (instantiating the top Verilog module), or is there a better way?
How do I link the Specman env file to the TB? Or is it maybe enough to link the ports of the different Specman files to the Verilog signals with a signal map?
Most important: how do I run the environment with irun?
I was thinking of creating a file listing all the Verilog files, e.g. veri.lst;
the Specman top shall import all the Specman files, e.g. spec_top.e:
irun -access +wrc veri.lst spec_top.e
Would that be OK?
Should I mention the top level module in the command?
Should I put the test name in the command in a special way?
Thanks a lot for all the help!
Cadence recommends driving clocks from inside an HDL testbench (i.e. written in Verilog in your case). This is because every time the simulator yields control to Specman it wastes processor time, and you want to minimize the number of switches as much as possible.
Linking the env to the TB is done by connecting the Verilog signals of interest to the corresponding Specman ports (using hdl_path()).
W.r.t. running it, there are two things to keep in mind: e code can be executed in compiled or in interpreted mode. Compiled code is faster, but can't be debugged. You have to tell irun what you want compiled and what you want interpreted:
irun -f veri.lst \
compiled_top.e \
-snload interpreted_top.e
What you typically compile are files which you don't expect to change (verification components that you buy or reuse from other projects, for example). The rest of your files you'd load interpreted to be able to easily debug.
Adding to Tudor's great answer -
First - yes, connecting the e TB to the DUT is done using hdl_path() and binding the ports to external. You would usually have one unit designated for the interface, so configuring it would look something like this:
extend signal_map {
    // name of the instance of the verilog module you interface
    keep hdl_path() == "sub_system_a";
    keep bind(sig_clock, external);
    // name of the clock signal
    keep sig_clock.hdl_path == "clk";
};
Please take a look in the IES release, at the UVM Examples.
They are in
specman/uvm/uvm_examples
For example, check out specman/uvm/uvm_examples/xserial/e/xserial_collector_h.e.
And about the clock -
Connecting a clock in the e TB to the design is very simple. Something like this -
unit synch {
    sig_clock : in simple_port of bit is instance;
    keep bind(sig_clock, external);
    event clock is rise(sig_clock$) @sim;
    // can also define on fall or change
};
Now the clock event can be used as the sampling event for TCMs and temporals. This is a simple, fast way to use the clock in the TB; a small sketch follows.
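For instance, an untested sketch of a TCM sampled at that clock (the TCM name is mine, not from any library):
extend synch {
    count_cycles() @clock is {
        while TRUE {
            wait cycle;  // advances one clock edge
            // sample or drive signals here
        };
    };
    run() is also {
        start count_cycles();
    };
};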
Another way to use the clock is more "acceleration ready". In this methodology, you implement a clock agent in Verilog, and it provides "clock services" to the TB. According to this methodology, the TB does not contain any "wait cycles"; instead, it calls the clock agent task wait_cycles() and waits for an indication that the required number of clock cycles has passed.
This is a rather new methodology, oriented toward being Acceleration Ready.
It will be demonstrated in the UVM examples in the next IES release, 15.1.
/efrat

Runtime.exec causes duplicate JVM to hang indefinitely until killed (Solaris 10)

All,
We are running a J2EE application on WebLogic server 9.2 MP2 with a jrockit 64-bit JVM (27.3.1) on Solaris 10.
We use Runtime.exec to call an executable called jfmerge to create PDF documents.
We have found that on Solaris, when Runtime.exec is called, a duplicate JVM is temporarily spawned to kick off the jfmerge process. While this is inefficient (our JVM is 5 GB, so the duplicated shell JVM is also 5 GB), the major problem is that under heavy load on this functionality (PDF generation), the duplicated JVM sometimes never exits.
When the JVM hangs, it creates large issues on the servers (extreme application slowness and terminated user sessions), as the entire duplicate JVM gets all of its 5 GB of process size written to disk swap.
We have noted the following hung thread, correlated with a hung JVM process, persisting until the process is manually killed:
"[STUCK] ExecuteThread: '17' for queue: 'weblogic.kernel.Default
(self-tuning)'" id=3463 idx=0x158 tid=3460 prio=1 alive, in native,
daemon
at
jrockit/io/FileNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BII)I(Native
Method)
at jrockit/io/FileNativeIO.readBytes(FileNativeIO.java:30)
at java/io/FileInputStream.readBytes([BII)I(FileInputStream.java)
at java/io/FileInputStream.read(FileInputStream.java:194)
at
java/lang/UNIXProcess$DeferredCloseInputStream.read(UNIXProcess.java:227)
at java/io/BufferedInputStream.fill(BufferedInputStream.java:218)
at java/io/BufferedInputStream.read(BufferedInputStream.java:235)
^-- Holding lock:
java/io/BufferedInputStream#0xfffffffec6510470[thin lock]
at
gov/v3/common/formgeneration/sessionbean/FormsBean.getProcessStatus(FormsBean.java:809)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.createPDF(FormsBean.java:750)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.getTemplateDetails(FormsBean.java:450)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.generateSinglePDF(FormsBean.java:1371)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.generatePDF(FormsBean.java:263)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.endorseDocument(FormsBean.java:2377)
at
gov/v3/common/formgeneration/sessionbean/Forms_qaco28_EOImpl.endorseDocument(Forms_qaco28_EOImpl.java:214)
at
gov/v3/delegates/common/FormsAndNoticesDelegate.endorseDocument(FormsAndNoticesDelegate.java:128)
at
gov/v3/actions/common/EndorseDocumentAction.executeRequest(EndorseDocumentAction.java:68)
at
gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.dispatchToExecuteMethod(V3CommonDispatchAction.java:532)
at
gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.executeBaseAction(V3CommonDispatchAction.java:336)
at
gov/v3/fwk/controller/struts/action/V3BaseDispatchAction.execute(V3BaseDispatchAction.java:69)
at
org/apache/struts/action/RequestProcessor.processActionPerform(RequestProcessor.java:484)
at
gov/v3/fwk/controller/struts/requestprocessor/V3TilesRequestProcessor.processActionPerform(V3TilesRequestProcessor.java:384)
at
org/apache/struts/action/RequestProcessor.process(RequestProcessor.java:274)
at
org/apache/struts/action/ActionServlet.process(ActionServlet.java:1482)
at
org/apache/struts/action/ActionServlet.doGet(ActionServlet.java:507)
at
gov/v3/fwk/controller/struts/servlet/V3ControllerServlet.doGet(V3ControllerServlet.java:110)
at javax/servlet/http/HttpServlet.service(HttpServlet.java:743)
at javax/servlet/http/HttpServlet.service(HttpServlet.java:856)
at
weblogic/servlet/internal/StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
at
weblogic/servlet/internal/StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
at
weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:283)
at
weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:175)
at
weblogic/servlet/internal/WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3231)
at
weblogic/security/acl/internal/AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at
weblogic/security/service/SecurityManager.runAs(SecurityManager.java:121)
at
weblogic/servlet/internal/WebAppServletContext.securedExecute(WebAppServletContext.java:2002)
at
weblogic/servlet/internal/WebAppServletContext.execute(WebAppServletContext.java:1908)
at
weblogic/servlet/internal/ServletRequestImpl.run(ServletRequestImpl.java:1362)
at weblogic/work/ExecuteThread.execute(ExecuteThread.java:209)
at weblogic/work/ExecuteThread.run(ExecuteThread.java:181)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace
We would like to do a couple of things:
1.) Prevent the spawning of a duplicate JVM, as we do not need any of its functions when executing the simple jfmerge executable, and it creates massive overhead.
2.) In the short term, at least prevent this duplicate JVM from hanging indefinitely.
This answer is late, but we had the same problem, and for us the problem was how Solaris manages memory.
The problem is that when an application server uses a lot of memory (10 GB in my case) and wants to run a simple "ls", the new process needs another 10 GB to fork.
Solaris needs the extra 10 GB to be available on the server; Linux uses a feature known as "copy-on-write" that reduces the overhead of forking a new process:
http://developers.sun.com/solaris/articles/subprocess/subprocess.html
Historical Background and Problem Description
Traditionally, Unix has had only one way to create a new process: using a fork() system call, often followed by an exec() system call. The fork() call makes a copy of the entire parent process' address space, and exec() turns that copy into a new process.
(Note: In the Solaris OS, the term swap space is used to describe a combination of physical memory and disk swap space configured for the system. However, with other Unix systems this term may mean swap space on disk, also known as backing store. To avoid any confusion, I'll use the term Virtual Memory (VM) to mean physical memory plus disk swap space.)
Generally, the fork/exec method has worked quite well. However, it has disadvantages in some cases, such as running out of memory without a good reason and poor fork performance.
Out of Memory: For a large-memory process, the fork() system call can fail due to an inadequate amount of VM, because fork() requires twice the amount of the parent memory. This can happen even when fork() is immediately followed by an exec() call that would release most of that extra memory. When this happens, the application will usually terminate.
For example, suppose a 64-bit application is consuming 6 gigabytes (Gbytes) of VM at the moment, and it needs to create a subprocess to run the ls(1) command. The parent process issues a fork() call that will succeed only if there is another 6 Gbytes of VM available at the moment. If the system doesn't have that much VM available (which is a frequent situation), fork() will fail with ENOMEM. Obviously, the ls(1) command doesn't need anywhere near 6 Gbytes of memory to run, but fork() doesn't know that.
Not only applications, but also Sun's own tools can suffer from the same problem. For example, the following Sun RFE (request for enhancement) has been filed for dbx: "4748951 dbx shell should use posix_spawn() for non-builtin commands rather than fork(2)".
RFE 4748951 came about when a customer's utility invoked dbx to read a huge core file using a script that also needed to run a cut(1) command from within dbx. They got a cannot fork - try again error message causing dbx to abort. An investigation revealed that dbx used fork/exec to execute that tiny cut(1) command and ran out of VM during the fork() call.
The Solaris Java Virtual Machine (JVM) is also suffering from the same problem currently, as described in this Sun RFE: "5049299 Use posix_spawn, not fork, on S10 to avoid swap exhaustion".
So you have 3 options.
1.- Execute the Runtime.exec call early, before the JVM's memory footprint has grown.
2.- Set up inter-process communication with another (small) Java server, and execute the Runtime.exec instruction there.
3.- Create a JNI class to call the C system() function. I took this option, and it works perfectly.
I put my sample code here.
Java Code.
public class CallOS {
    static {
        System.loadLibrary("CallOS");
    }

    public native int exec(java.lang.String cmd);

    public static void main(String[] args) {
        int returnValue = 0;
        returnValue = new CallOS().exec("ls -la");
        System.out.println("- " + returnValue);
    }
}
C header code. This is generated with javah -jni CallOS:
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class CallOS */
#ifndef _Included_CallOS
#define _Included_CallOS
#ifdef __cplusplus
extern "C" {
#endif
/*
* Class: CallOS
* Method: exec
* Signature: (Ljava/lang/String;)I
*/
JNIEXPORT jint JNICALL Java_CallOS_exec
(JNIEnv *, jobject, jstring);
#ifdef __cplusplus
}
#endif
#endif
C code.
#include "CallOS.h"
#include <stdlib.h>
JNIEXPORT jint JNICALL Java_CallOS_exec
(JNIEnv *env, jobject obj, jstring cmd)
{
jint retval;
jbyte *str;
str = (*env)->GetStringUTFChars(env, cmd, NULL);
if(str == NULL) return NULL;
retval = system(str);
(*env)->ReleaseStringUTFChars(env, cmd, str);
return retval;
};
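For completeness, a rough sketch of how such a library is typically built and run on Solaris; the exact compiler flags and include paths depend on your toolchain, so treat these as illustrative rather than exact:
javac CallOS.java
javah -jni CallOS
cc -G -KPIC -I${JAVA_HOME}/include -I${JAVA_HOME}/include/solaris CallOS.c -o libCallOS.so
java -Djava.library.path=. CallOS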
I hope this helps.
Are you handling the spawned process's stdout/stderr properly? You need to consume both in separate threads to reliably prevent blocking - see this answer for details. It may be that your process spawning works properly for some jobs and hangs for others, depending on the quantity of stdout/stderr they produce.
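By way of illustration, here is a minimal, untested sketch of that pattern; the class name and the jfmerge arguments are placeholders, not taken from the question:
import java.io.IOException;
import java.io.InputStream;

public class ExecDrain {
    public static void main(String[] args) throws Exception {
        // placeholder command line; use your real jfmerge invocation
        final Process p = Runtime.getRuntime().exec(new String[] { "jfmerge", "job.jfd" });
        drain(p.getInputStream());   // stdout
        drain(p.getErrorStream());   // stderr
        int rc = p.waitFor();
        System.out.println("jfmerge exited with " + rc);
    }

    // read a stream to EOF on its own thread so the child never blocks on a full pipe
    private static void drain(final InputStream in) {
        new Thread(new Runnable() {
            public void run() {
                try {
                    byte[] buf = new byte[4096];
                    while (in.read(buf) != -1) { /* discard, or log if needed */ }
                } catch (IOException ignored) {
                } finally {
                    try { in.close(); } catch (IOException e) { /* ignore */ }
                }
            }
        }).start();
    }
}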
On the subject of duplicate processes, I would expect the JVM to fork/exec. This duplicates the Java process (fork) and then replaces it with the new process (exec). I wonder if that's what you're seeing? Note also that I'd expect the OS to implement COW (copy-on-write), duplicating only those memory pages that differ between the images, so in normal circumstances duplicating the JVM wouldn't consume as much memory as you might think.
As Brian implied, on Unix the standard way for a process to start another program is to fork into a parent process and a child process; the child then calls exec to replace itself with the new program. The JVM has to do this to start your jfmerge program.
Normally, the memory size of the child process isn't an issue, because the OS uses copy-on-write to let the two processes share the same memory image until the child calls exec. It could be that the JVM's model for child processes requires it to fork twice, with the grandchild exec'ing jfmerge and the child managing the grandchild. That would explain the duplicate JVM process you are seeing. The stack trace shows a process blocked reading from an input stream; it may be that jfmerge is running slowly and the process is just hung waiting for jfmerge to produce some output.
What you could do is get some other process to launch jfmerge, instead of your 5 GB JVM. Write a standalone program which just runs jfmerge on demand, and have it communicate with the main process through some form of inter-process communication. This standalone jfmerge server wouldn't require much memory to operate, so the impact of forked child processes would be far smaller.