Update file descriptor pointing to /proc/self after fork() from python multiprocessing.Process - subprocess

I'm working on a C++ program that uses boost::python to provide a python wrapper/API for the user. The program tracks and limits its own memory usage by opening /proc/self/statm using a file descriptor. Every timestep it seeks to the beginning of that file and reads the vmsize from it.
proc_self_statm_fd = open( "/proc/self/statm", O_RDONLY );
However, this causes a problem when calling fork(). In particular, when a user writes a python script that does something like this:
proc = multiprocessing.Process(name="bkg_process",target=bkg_process,daemon=True)
The problem is that the forked process gets the file descriptor pointing to /proc/self/statm from the parent process, not its own, and this reports the wrong memory usage. Even worse, if the parent process exits, the child process will fail when trying to read from the file descriptor.
What's the correct solution for this? It needs to be handled at the C++ level because we don't have control over the user's python scripts. Is there a way to have the class auto detect that a fork has happened and grab a new file descriptor? In the worst case I can have it re-open the file for every update. I'm worried that would add runtime overhead though.

You could store the PID in the class, check it against the value of getpid() on each call, and reopen the file if the PID has changed. getpid() is typically much cheaper than open() - on some systems it doesn't even need a context switch (it just fetches the PID from a magic location in the process's own memory).
That said, you may also want to measure the cost of reopening the file each time - it may not actually be significant.
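A minimal sketch of that idea (hedged: the class name MemTracker and the statm parsing below are only for illustration, not your actual code):

#include <fcntl.h>
#include <unistd.h>
#include <cstdlib>

// Caches the fd and the PID that opened it; if getpid() no longer matches,
// we must be in a forked child, so reopen /proc/self/statm for this process.
class MemTracker {
    int fd_ = -1;
    pid_t owner_ = -1;

    void reopen() {
        if (fd_ != -1) close(fd_);
        fd_ = open("/proc/self/statm", O_RDONLY);
        owner_ = getpid();
    }

public:
    MemTracker() { reopen(); }
    ~MemTracker() { if (fd_ != -1) close(fd_); }

    // Returns the first statm field (total program size, in pages), or -1 on error.
    long vmsize_pages() {
        if (owner_ != getpid())          // a fork() happened since the last call
            reopen();
        if (fd_ == -1 || lseek(fd_, 0, SEEK_SET) == -1) return -1;
        char buf[128];
        ssize_t n = read(fd_, buf, sizeof(buf) - 1);
        if (n <= 0) return -1;
        buf[n] = '\0';
        return std::strtol(buf, nullptr, 10);
    }
};

The getpid() check costs next to nothing per call, and the reopen happens at most once per fork.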

Related

Files read immediately after watchman notify are empty

I'm integrating watchman via the socket/bser interface in a JVM program.
I'm seeing odd timing where:
A file is written to by the build system (a small text file)
I get a watchman notification on the bser interface
Thread A listening for bser subscription notifications puts the update onto a queue for a separate thread
Thread B reads the queue, reads the changed file, and then puts the file's data on the wire
However, somehow, Thread B is reading an empty file.
Which, I assume, is validly empty at some point; e.g. the IO/syscalls might be:
Clear the file contents
Write chunk 1
Write chunk 2
Close the file
And I assume my Thread B is reading the file between steps 1 and 2. Or maybe 1 and 4, if 4 is when the result is flushed.
My confusion is twofold:
1) I thought watchman's default 20ms wait would account for things like this, so that I'd only see an update on my thread A (let alone a read by my thread B) after step 4, when the data is done being written to the file.
2) Even if watchman did tell me "too soon" about the 1st syscall (say step 1), and I read the results while it was an empty file, there should be another syscall/watchman notification that "btw, the file has some content now".
FWIW/oddly enough, I was seeing this very same behavior when using the Java WatchService API, where I would get an inotify event, but read a file "too soon", and so get either empty or partial results, and then no follow up inotify event when the rest of the data was available.
I assumed this was a fluke/nuance of the WatchService, so I solved it at the time by checking the file mod time before reading it, and just waiting to ensure mod time >2 seconds old before assuming the file is "done" being written.
(Note that this also handled ~100mb+ files being written, where the build process might write a chunk of data every 100ms+, but with WatchService I was seeing 100s of inotify notifications for what was essentially a single continuous write.)
When I ported my WatchService code to watchman, I dropped this "ensureSettled" hack, because I assumed watchman's 20ms settle period (which is way lower than the 2s I was using, but hey, it's the default) plus its general robustness compared to the somewhat beta WatchService would mean it wouldn't be a problem.
But within ~a day of using the watchman-ported code, I'm seeing empty file reads, just like I was with the WatchService.
Any ideas about what I'm missing?
I can add back the ensureSettled hack, but at this point I'm curious about what is going on.
The docs aren't very clear on this, sorry!
Dispatching of subscription notifications is subject to the settle timeout, but since file updates are non-atomic it's likely that the default 20ms kicks in before the file contents are visible to you; under the covers, the kernel generates a series of notifications for the various mutations you're doing, so if more than 20ms elapse between the truncate and the write (or perhaps the flush) of the data, you'll likely get a notification "in the middle".
This stuff is also operating system dependent. Here's an example of a recently discovered and resolved issue: https://github.com/facebook/watchman/commit/bac383c751b248ae742a2a20df3e8272238c0ae2
It doesn't sound like quite the same thing as you're experiencing, but it adds some color to this discussion.
If you already have code to manage the settling in your client, then it may be easier for you to add that back; we do this in watchman-make for example.
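For reference, the client-side settling being described can be as simple as waiting until the file's size and mtime stop changing before reading it. A rough sketch, shown in C++ only for brevity (the helper name and the intervals are made up; the same logic ports straight to a JVM client):

#include <chrono>
#include <filesystem>
#include <thread>

namespace fs = std::filesystem;

// Wait until `p` has been quiet (same size and mtime) for one whole
// `quiet` interval, or give up after `timeout`.
bool ensure_settled(const fs::path& p,
                    std::chrono::milliseconds quiet = std::chrono::milliseconds(500),
                    std::chrono::seconds timeout = std::chrono::seconds(30)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    std::error_code ec1, ec2;
    auto last_size  = fs::file_size(p, ec1);
    auto last_mtime = fs::last_write_time(p, ec2);
    while (std::chrono::steady_clock::now() < deadline) {
        std::this_thread::sleep_for(quiet);
        const auto size  = fs::file_size(p, ec1);
        const auto mtime = fs::last_write_time(p, ec2);
        if (!ec1 && !ec2 && size == last_size && mtime == last_mtime)
            return true;               // no change for a full quiet interval
        last_size  = size;
        last_mtime = mtime;
    }
    return false;                      // still changing (or erroring) at timeout
}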
You may also wish to try setting https://facebook.github.io/watchman/docs/config.html#settle in a .watchmanconfig file in the root of the directory tree that you're watching and leave that to the watchman server. If/when you change this setting, you will need to delete and restart the watch.
Which you choose depends on how you want to trade ease of configuration against volume of code you want to maintain and (perhaps) volume of support questions from your user base if the .watchmanconfig isn't correctly configured for them.
Note that you can use the command invocation from https://facebook.github.io/watchman/docs/cmd/log-level.html to see the debug logging for the kernel notifications as they come in in real time; this may be helpful for you in understanding exactly which notifications are coming in and when.
Just curious, are you using https://github.com/facebook/watchman/tree/master/java to talk to the watchman server?

How to block until all file descriptors are ready? Use select()/poll()/epoll()?

I am in the situation where I would like a C program to block on a set of file descriptors until all files are ready. This differs from the traditional select(), poll(), and epoll() system calls that only block until any file descriptor is ready. Is there a standard function that will block until all files are ready? Or perhaps there are some other clever tricks?
Obviously, I could call select() in a loop until all file descriptors are ready, but I don't want to incur the overheads of context switches, preemptions, migrations, etc. I'd rather that the select()'ing task just sleep until all files are ready.
It's not thread-safe if other threads are operating on some of the same file descriptors at the same time (but you probably shouldn't be doing that anyway), but you can try this:
1) Initialize the poll set to all of the file descriptors you're interested in.
2) poll() for the current set of file descriptors.
3) When poll() returns, scan the revents and find all of the file descriptors that are ready. Remove them from the poll set.
4) If there are any file descriptors still in the set, go back to step 2.
5) poll() one last time with the full set of file descriptors to make sure they are all still ready.
6) If some are not ready anymore, go back to step 1.
7) Success.
It still may involve many poll() calls, but at least it doesn't busy-wait. I don't think there exists a more efficient way.
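A rough sketch of that loop (assuming POLLIN, i.e. readability, is the readiness you care about; the helper name is made up):

#include <poll.h>
#include <vector>

// Block until every fd in `fds` is readable at the same time.
// Returns false if poll() itself fails.
bool wait_for_all_readable(const std::vector<int>& fds) {
    for (;;) {
        // step 1: start with the full set
        std::vector<pollfd> pending;
        for (int fd : fds)
            pending.push_back({fd, POLLIN, 0});

        // steps 2-4: poll and remove ready fds until the set is empty
        while (!pending.empty()) {
            if (poll(pending.data(), pending.size(), -1) < 0)
                return false;
            std::vector<pollfd> still_waiting;
            for (const pollfd& p : pending)
                if (!(p.revents & (POLLIN | POLLERR | POLLHUP)))
                    still_waiting.push_back({p.fd, POLLIN, 0});
            pending.swap(still_waiting);
        }

        // step 5: one final, non-blocking poll of the full set to confirm
        std::vector<pollfd> all;
        for (int fd : fds)
            all.push_back({fd, POLLIN, 0});
        if (poll(all.data(), all.size(), 0) < 0)
            return false;
        bool all_ready = true;
        for (const pollfd& p : all)
            if (!(p.revents & (POLLIN | POLLERR | POLLHUP)))
                all_ready = false;

        if (all_ready)
            return true;               // step 7: success
        // step 6: something went un-ready again; back to step 1
    }
}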

What are the Perl techniques to detach just a portion of code to run independently?

I'm not well versed in close-to-the-OS programming techniques, but as far as I know, when it comes to doing something in parallel in Perl the weapon of choice is fork, probably with some useful modules built upon it. The doc page for fork says:
Does a fork(2) system call to create a new process running the same program at the same point.
As a consequence, having a big application that consumes a lot of memory and calling fork for a small task means there will be 2 big perl processes, and the second will waste resources just to do some simple work.
So, the question is: what to do (or how to use fork, if it's the only method) in order to have a detached portion of code running independently and consuming just the resources it needs?
Just a very simple example:
use strict;
use warnings;
my @big_array = ( 1 .. 2000000 ); # at least 80 MB memory
sleep 10; # to have time to easily inspect the memory usage
fork();
sleep 10; # to have time to easily inspect the memory usage
and the child process consumes 80+ MB too.
To be clear: it's not important to communicate with this detached code or to use its result somehow; I just want to be able to say "hey, run this simple task for me in the background and let me continue my heavy work meanwhile ... and don't waste my resources!" when running a heavy perl application.
fork() followed by exec() is your bunny here. You fork() to create a new process (which is a fairly cheap operation, see below), then exec() to replace the big perl you've got running with something smaller. It looks like this:
use strict;
use warnings;
use 5.010;
my @ary = (1 .. 10_000_000);
if (my $pid = fork()) {
    # parent
    say "Forked $pid from $$; sleeping";
    sleep 1_000;
}
else {
    # child
    exec('perl', '-e', 'sleep 1_000') or die "exec failed: $!";
}
(@ary was just used to fill up the original process's memory a bit.)
I said that fork()ing was relatively cheap, even though it does copy the entire original process. These statements are not in conflict; the guys who designed fork noticed this same problem. The copy is lazy, that is, only the bits that are actually changed are copied.
If you find you want the processes to talk to each other, you'll start getting into the more complex domain of IPC, about which a number of books have been written.
Your forked process is not actually using 80MB of resident memory. A large portion of that memory will be shared - 'borrowed' from the parent process until either the parent or child writes to it, at which point copy-on-write semantics will cause the memory to actually be copied.
If you want to drop that baggage completely, run exec in your fork. That will replace the child Perl process with a different executable, thus freeing the memory. It's also perfect if you don't need to communicate anything back to the parent.
There is no way to fork just a subset of your process's footprint, so the usual workarounds come down to:
fork before you run memory intensive code in the parent process
start a separate process with system or open HANDLE,'|-',.... Of course this new process won't inherit any data from its parent, so you will need to pass data to this child somehow.
fork() as implemented on most operating systems is nicely efficient. It commonly uses a technique called copy-on-write, meaning that pages are initially shared until one or the other process writes to them. Also, a lot of your process memory is going to be read-only mapped files anyway.
Just because one process uses 80MB before fork() doesn't mean that afterwards the two will use 160. To start with it will be only a tiny fraction more than 80MB, until each process starts dirtying more pages.

How to preserve data between executions of program

I am running a perl script on an HP-UX box. The script will execute every 15 minutes and will need to compare its results with the results of the last time it executed.
I will need to store two variables (IsOccuring and ErrorCount) between the executions. What is the best way to do this?
Edit clarification:
It only compares the most recent execution to the current execution.
It doesn't matter if the value is lost between reboots.
And touching the filesystem is pretty much off limits.
If you can't touch the file system, try using a shared memory segment. There are helper modules for that like IPC::ShareLite, or you can use the shmget and related functions directly.
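The shmget route named above looks like this at the C level (the hard-coded key and struct layout are made up, purely to illustrate the mechanism; the Perl modules provide the same facility):

#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstdio>

// The two values to carry over between runs; a freshly created segment is zeroed.
struct State { int is_occurring; int error_count; };

int main() {
    key_t key = 0x4D59534D;                  // made-up fixed key; ftok() is the usual route
    // Create the segment if needed; it outlives the process (until reboot or an
    // explicit shmctl(IPC_RMID)), which is what lets the next run see these values
    // without touching the filesystem.
    int id = shmget(key, sizeof(State), IPC_CREAT | 0600);
    if (id == -1) { perror("shmget"); return 1; }

    State* s = static_cast<State*>(shmat(id, nullptr, 0));
    if (s == reinterpret_cast<State*>(-1)) { perror("shmat"); return 1; }

    std::printf("last run: occurring=%d errors=%d\n", s->is_occurring, s->error_count);
    s->is_occurring = 1;                     // record this run's results for next time
    s->error_count += 1;

    shmdt(s);
    return 0;
}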
You'll have to store them in a file. This sort of file is often kept in /tmp, but any place where the user running the cron job has access would do. Make sure your script can handle the case where the file is missing.
You could create a separate process running a "remember stuff" service over your choice of IPC mechanism. This sounds like a rather tortured solution to "I don't want to touch the disk" but if it's important enough to offset a couple of days of development work (realistically, if you are new to IPC, and HP-SUX continues to live up to its name) then by all means read man perlipc for a start.
Does it have to be completely re-executed? Can you just have it running in a loop and sleeping for 15 minutes between iterations? Then you don't have to worry about saving the values externally; the program never stops.
I definitely think IPC is the way to go here.
I'd save off the data in a file. Then, inside the script I'd load the last results if the file exists.
Use the Storable module to serialize Perl data structures, save them anywhere you want, and deserialize them during the next script execution.

Perl: Installing signal handlers in forked child which execs

I found the answer in Managing Signal Handling for daemons that fork() very helpful for what I'm doing. I'm unsure how to address this part:
"You will therefore need to install any signal handling in the execed process when it starts up"
I don't have control over the process that starts up. Is there any way for me to force some signal handlers onto the exec'd process from the parent of the fork?
Edit:{
I'm writing a Perl module that monitors long-running processes. Instead of
system(<long-running cmd>);
you'd use
my_system(<ID>, <long-running cmd>);
I create a lock file for the <ID> and don't let another my_system(<ID>...) call through if there is one currently running with a matching ID.
The parent fork/execs <long-running cmd> and is in charge of cleaning up the lock file when it terminates. I'd like the child to be self-sufficient, so the parent can exit (or so the child can take care of itself if the parent gets a kill -9).
}
On Unix systems, you can make an exec'd process ignore signals (unless the process chooses to override what you say), but you can't force it to set a handler for it. The most you can do is leave the relevant signal being handled by the default handler.
If you think about it, you'll realize why. To install a signal handler, you have to provide a function pointer, but the process that does the exec() can't specify one of its own functions because they won't exist as part of the exec'd process, and it can't specify one of the exec'd process's functions because they don't exist as part of the exec'ing process. Similarly, you can't register atexit() handlers in the exec'ing process that will be honoured by the exec'd process.
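Concretely, an "ignore" disposition set before the exec() does survive it, while installed handlers are reset to the default. A tiny sketch (the exec'd sleep command is just a stand-in):

#include <signal.h>
#include <unistd.h>

int main() {
    if (fork() == 0) {
        // child: SIG_IGN set here is inherited across exec();
        // a custom handler installed here would be reset to SIG_DFL instead.
        signal(SIGINT, SIG_IGN);
        execlp("sleep", "sleep", "60", static_cast<char*>(nullptr));
        _exit(127);                    // only reached if exec fails
    }
    return 0;
}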
As to your programming problem, there's a good reason that the lock file normally contains the process ID (pid) of the process that holds the lock; it allows you to check whether that process is still around, even if it isn't your child. You can read the pid from the lock file, and then use kill(pid, 0), which will tell you whether the process exists and whether you can signal it, without actually sending any signal.
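That liveness check is tiny; something like this (hypothetical helper name):

#include <signal.h>
#include <sys/types.h>
#include <cerrno>

// True if `pid` (read from the lock file) still refers to a live process.
// kill(pid, 0) sends no signal; it only performs the existence/permission check.
bool lock_holder_alive(pid_t pid) {
    if (kill(pid, 0) == 0) return true;   // process exists and we may signal it
    return errno == EPERM;                // exists, but owned by another user
}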
One approach would be to use two forks.
The first fork creates a child process responsible for cleaning up the lock file if the parent dies. That child in turn forks a grandchild, which execs the long-running command.
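In outline it looks something like this (C-level sketch; the lock-file path and the exec'd command are placeholders, and the Perl version has the same shape):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    pid_t cleaner = fork();
    if (cleaner == 0) {
        // first child: owns the lock-file clean-up, independent of the parent's fate
        pid_t worker = fork();
        if (worker == 0) {
            // grandchild: becomes the long-running command
            execlp("sleep", "sleep", "60", static_cast<char*>(nullptr));
            _exit(127);                    // only reached if exec fails
        }
        waitpid(worker, nullptr, 0);       // outlive the long-running command
        unlink("/tmp/example.lock");       // placeholder lock-file path
        _exit(0);
    }
    // parent can exit (or be killed) without orphaning the clean-up
    return 0;
}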