How to preserve data between executions of program - perl

I am running a perl script on a HP-UX box. The script will execute every 15 minutes and will need to compare it's results with the results of the last time it executed.
I will need to store two variables (IsOccuring and ErrorCount) between the executions. What is the best way to do this?
Edit clarification:
It only compares the most recent execution to the current execution.
It doesn't matter if the value is lost between reboots.
And touching the filesystem is pretty much off limits.

If you can't touch the file system, try using a shared memory segment. There are helper modules for that like IPC::ShareLite, or you can use the shmget and related functions directly.

You'll have to store them in a file. This sort of file is often kept in /tmp, but any place where the user running the cron job has access would do. Make sure your script can handle the case where the file is missing.

You could create a separate process running a "remember stuff" service over your choice of IPC mechanism. This sounds like a rather tortured solution to "I don't want to touch the disk" but if it's important enough to offset a couple of days of development work (realistically, if you are new to IPC, and HP-SUX continues to live up to its name) then by all means read man perlipc for a start.

Does it have to be completely re-executed? Can you just have it running in a loop and sleeping for 15 minutes between iterations? Than you don't have to worry about saving the values externally, the program never stops.

I definitely think IPC is the way to go here.

I'd save off the data in a file. Then, inside the script I'd load the last results if the file exists.

Use module Storable to serialize Perl data structures, save them anywhere you want and deserialize them during next script execution.

Related

Update file descriptor pointing to /proc/self after fork() from python multiprocess.Process

I'm working on a C++ program that uses boost::python to provide a python wrapper/API for the user. The program tracks and limits its own memory usage by opening /proc/self/statm using a file descriptor. Every timestep it seeks to the beginning of that file and reads the vmsize from it.
proc_self_statm_fd = open( "/proc/self/statm", O_RDONLY );
However, this causes a problem when calling fork(). In particular, when a user writes a python script that does something like this:
proc = multiprocessing.Process(name="bkg_process",target=bkg_process,daemon=True)
The problem is that the forked process gets the file descriptor pointing to /proc/self/statm from the parent process, not its own, and this reports the wrong memory usage. Even worse, if the parent process exits, the child process will fail when trying to read from the file descriptor.
What's the correct solution for this? It needs to be handled at the C++ level because we don't have control over the user's python scripts. Is there a way to have the class auto detect that a fork has happened and grab a new file descriptor? In the worst case I can have it re-open the file for every update. I'm worried that would add runtime overhead though.
You could store the PID in the class, and check it against the value of getpid() on each call, and then reopen the file if the PID has changed. getpid() is typically much cheaper than open - on some systems it doesn't even need a context switch (it just fetches the PID from a magic location in the process's own memory).
That said, you may also want to actually measure the cost of reopening the file each time - it may not actually be significant.

Files read immediately after watchman notify are empty

I'm integrating watchman via the socket/bser interface in a JVM program.
I'm seeing odd timing where:
A file is written to by the build system (a small text file)
I get a watchman notification on the bser interface
Thread A listening for bser subscription notifications puts the update onto a queue for a separate thread
Thread B reads the queue, reads the changed file, and then puts the file's data on the wire
However, somehow, Thread B is reading an empty file.
Which, I assume is validly empty at some point, e.g. the IO/syscalls might be:
Clear the file contents
Write chunk 1
Write chunk 2
Close the file
And I assume my Thread B is reading the file between steps 1 and 2. Or maybe 1 and 4, if 4 is when the result is flushed.
My confusion is two fold:
1) I thought watchman's default 20ms wait would account for things like this, and I'd only see an update on my thread A, let alone when my thread B does a read, after step 4, and the data is done being written to the file.
2) Even if watchman did tell me "too soon" about the 1st syscall (say step 1), and I read the results while it was an empty file, there should be another syscall/watchman notification that "btw, the file has some content now".
FWIW/oddly enough, I was seeing this very same behavior when using the Java WatchService API, where I would get an inotify event, but read a file "too soon", and so get either empty or partial results, and then no follow up inotify event when the rest of the data was available.
I assumed this was a fluke/nuance of the WatchService, so I solved it at the time by checking the file mod time before reading it, and just waiting to ensure mod time >2 seconds old before assuming the file is "done" being written.
(Note that this also handled ~100mb+ files being written, where the build process might write a chunk of data every 100ms+, but with WatchService I was seeing 100s of inotify notifications for what was essentially a single continuous write.)
When I ported my WatchService code to watchman, I dropped this "ensureSettled" hack, because I assumed watchman's 20ms settle period (which is way lower than the 2s I was using, but hey it's the default) + it's general robustness compared to the somewhat beta WatchService would mean it wouldn't be a problem.
But within ~a day of using the watchman-ported code, I'm seeing empty file reads, just like I was with the WatchService.
Any ideas about what I'm missing?
I can add back the ensureSettled hack, but at this point I'm curious about what is going on.
The docs aren't very clear on this, sorry!
Dispatching of subscription notifications is subject to the settle timeout, but since file updates are non-atomic it's likely that the default 20ms kicks in before the file contents are visible to you; under the covers, the kernel generates a series of notifications for the various mutations that you're doing, so if the truncate takes 20ms before you write (or perhaps flush) the data out, you'll likely get a notification "in the middle".
This stuff is also operating system dependent. Here's an example of a recently discovered and resolved issue: https://github.com/facebook/watchman/commit/bac383c751b248ae742a2a20df3e8272238c0ae2
it doesn't sound like it is quite the same thing as you're experiencing, it just adds some color to this discussion.
If you already have code to manage the settling in your client, then it may be easier for you to add that back; we do this in watchman-make for example.
You may also wish to try setting https://facebook.github.io/watchman/docs/config.html#settle in a .watchmanconfig file in the root of the directory tree that you're watching and leave that to the watchman server. If/when you change this setting, you will need to delete and restart the watch.
Which you choose depends on how you want to trade ease of configuration against volume of code you want to maintain and (perhaps) volume of support questions from your user base if the .watchmanconfig isn't correctly configured for them.
Note that you can use the command invocation from https://facebook.github.io/watchman/docs/cmd/log-level.html to see the debug logging for the kernel notifications as they come in in real time; this may be helpful for you in understanding exactly which notifications are coming in and when.
Just curious, are you using https://github.com/facebook/watchman/tree/master/java to talk to the watchman server?

How can I copy load module using rexx?

I want to copy a load module from one pds to another using REXX.
You could invoke IEBCOPY from within a Rexx, allocating the appropriate datasets to the appropriate ddnames before invoking IEBCOPY.
I'm unable to provide an example as I don't have the facilities/access.
Note that doing so may tie up your terminal/session.
You could also go into a more elaborate solution to build and submit a batch job, perhaps even having a panel front end, driving file tailoring/skeletons.
As #cshneid said you can use IEBCOPY Using IEBCOPY in rexx is basically the same as in JCL but:
use TSO Alloc to allocate the files
Call/invoke the program
if running under ISPF you can use LMCOPY. Roughly the following should work, you may need to issue a LMOPEN / LMClose on the data-ids as well ???
Address ISPEXEC
'LMINIT DATAID(DIDFrom) Dataset(in.data.set)'
'LMINIT DATAID(DIDTo) Dataset(to.data.set)'
'LMCOPY FromId('DIDFrom') FROMMEM(mymem) toId('DIDTo') toMem(newMemberName)'
'LMFREE DATAID(DIDFrom)'
'LMFREE DATAID(DIDto)'
If running foreground, the ISPF services used to have the advantage as they "co-ordinated" there actions with all other ISPF users- Less likely to corrupt the PDS directory. Not sure if this is an advantage any more.
Using just REXX what you want to do is not possible, however, you can invoke IEBCOPY (or your site equivalent) to perform the task for you.
You may want to investigate calling programs like IEBCOPY and passing it the appropriate control cards to perform your task.

Is there a way to copy files in a non-blocking way in Scala?

I have checked java.nio.file.Files.copy but that blocks a thread until the copy is done. Are there any libraries that allow one to copy a file in a non-blocking way? I need to perform many of these operations simultaneously and cannot afford to have so many threads blocked.
While I could write something myself using non-blocking streams, I would rather use something tried and tested that would guarantee a correct copy every time (or detect if something went wrong).
Check this: Iterate over lines in a file in parallel (Scala)?
val chunkSize = 128 * 1024
val iterator = Source.fromFile(path).getLines.grouped(chunkSize)
iterator.foreach { lines =>
lines.par.foreach { line => process(line) }
}
Reading (copying) files by chunks in parallel. In this case "par" is used.
So it quite non-blocking in terms / scope of processors (cores).
But you may follow same idea of chunks, for example using Akka/Future/Promises to be even in wider scopes.
You may customize you chunk-size deepening on your performance characteristic, level of system load, etc..
One more link that explains possible way to do read / write data from (property) file in parallel using Akka Actors. This is not quite that you might be want, but it may give an idea.
Idea - you may build your own not-blocking way of reading / copying files.
--
And about your statement "While I could write something myself using non-blocking streams":
I would remind that each OS / File System (FS) may have its own vision about what and where to block. Like Windows blocks a file (write-block at leat) if one thread writes to it. On Linux is is configurable. So if you want to stick to something stable, I would suggest to think it out and go with your own wrapper (over FS) solution based on events, chunks, states.
I have used the Process class, issuing an operating system command to copy the file. Of course, one has to check under which OS the application is running, and issue the appropriate command, but this allows for fast and asynchronous copies.
As Marius rightly mentions in the comments, Scala Process blocks, so I run it wrapped in a Future.
Java 8 Process introduces a function isAlive(). A non-blocking alternative would be to use Java 8 processes and use the scheduler to poll at regular intervals to see if the process has finished. However, I did no need to go to this extent.
Have you checked out the async stuff in scala-io?
http://jesseeichar.github.io/scala-io-doc/0.4.2/index.html#!/core/async%20read%20write

perl File::Tail syncronization

im having this situation:
Im parsing some log files with perl daemon. This daemon writes data to mysql db.
Log file can:
be rotated ('solved by filesize and some logic')
doesnt exist ('ignore_nonexistant' parameter in Tail)
Daemon:
Can be killed
Can became dead by some reazon.
Im using File::Tail to tail tha file. For file rotation mechanism of date of creation or filesize can help. and what mechanism should i use to start tail from some position in file? (asume that there is a lot of such daemons, no write access to filesystem).
I've think about position variable in DB, but this wont help me.
Maybe some mechanism to pass position parameter to parrent process?
I just dont want to reinvent bicycle.
File::Tail already detects rotation and continues reading from the new file.
To deal with the daemon dying and restarting, can you query the database for the last record written when the daemon restarts, and just skip logfile lines until you get to a later line?
Try http://search.cpan.org/dist/Log-Unrotate/.
You'll have to implement your own Log::Unrotate::Cursor class if you wish to store position files in DB instead of local filesystem, but that should be trivial.
We wrote and used Log::Unrotate for 5 years in production and it tries really hard to never skip any data. (It tries so hard that it throws exception if your cursor becomes invalid, for example if log got rotated several times while reader didn't work for some reason. You may want to enable autofix_cursor option to change this behavior).
Also take a look at http://search.cpan.org/dist/File-LogReader/. I never used it but it's supposed to solve the same task.