I've been using Proc::Daemon in an attempt to make a start/stop daemon script, something that allows me to do:
X start
X stop
X status
etc. However, from the source code it looks like Proc::Daemon uses either a "pid" file or a search of the process table. I'm concerned about both of these approaches: firstly, "pid"s are reused, which may give the impression a service is up when it's actually down, and secondly, process table entries are easily faked, and the checking doesn't look particularly robust.
Is there any robust way to make a start/stop daemon script/program like I've described, or has someone already made one? Note that I haven't got root access, and I'm also on Solaris if that's important.
Although pids are reused, I believe that they round-robin through a (large) fixed-size set; e.g. on Solaris this used to be 30,000 (it may be different now). So 30,000 processes would have to start/finish before your pid was reused.
The approach used by Proc::Daemon doesn't look unreasonable and is a fairly common approach to this problem.
An approach I use is to have the daemon process obtain an exclusive (write) lock on a file.
You can test whether anyone is holding the lock by trying to obtain the lock yourself, and there are various ways of obtaining the PID of a process holding a lock on a file, e.g. fcntl (F_GETLK) and probably something in /proc.
Some words of advice:
Use local files (i.e. not NFS) for locks.
Make sure the lock file exists before the daemon is started.
Never delete the lock file.
The kernel associates locks with the inode number of the file, so you always want the lock file to have the same inode number throughout all time. Deleting and recreating the lock file will change the inode associated with the lock.
A simple keep-alive mechanism can be implemented as a cron job - the cron job just tries to spawn the daemon process every N minutes, and the daemon quietly exits if it can't obtain the exclusive lock.
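The lock-file idea above can be sketched roughly as follows. This is a minimal illustration in Python rather than Perl, and it uses flock-style locks as an assumption (the answer itself mentions fcntl-style locks); the function names try_acquire_lock and is_running are hypothetical. The file is opened without creating it, following the advice that the lock file should already exist and never be deleted.

```python
import fcntl
import os


def try_acquire_lock(path):
    """Try to take an exclusive, non-blocking lock on an existing file.

    Returns an open file descriptor that holds the lock, or None if
    some other holder already has it. The daemon keeps the descriptor
    open for its whole lifetime; the kernel drops the lock when the
    process exits, even after a crash.
    """
    # No O_CREAT: the lock file is expected to exist already, so its
    # inode stays the same for every start/stop/status call.
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def is_running(path):
    """'status' check: the daemon is up if we cannot get the lock."""
    fd = try_acquire_lock(path)
    if fd is None:
        return True
    os.close(fd)  # closing the descriptor releases our probe lock
    return False
```

A daemon started from cron would call try_acquire_lock first and exit quietly on None, which gives the keep-alive behaviour described above without relying on a pid file.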
The Area Creation process can take up to 24 hours. If something happens during that time which causes the process to stop, will it resume when I run it again or does it start back over from the beginning?
We can assume for this question that the files in $DB_DIR remain in place throughout the running/stopping/starting process.
It will start over from the beginning, assuming you're using areas.osm3s to define the area creation rules. This file contains a number of queries which are being executed to generate the areas. If you restart the process, it will execute those very same queries again from the beginning.
For performance reasons, we use areas_delta.osm3s and the accompanying rules_delta_loop.sh script on the production servers. This way, we can limit the workload to those areas that have changed since the last area creation run.
Recently, I have been reading about Operating Systems, and this bugs me a lot.
How is it really possible for one process to manage other processes?
Basically, a CPU simply executes instructions: it executes the instruction at the address pointed to by the IP, increments the IP, and then executes the next one.
Let me elaborate on my doubt with an example. Let's say I have a user process (or simply a process) which is being executed by the CPU. Let's say it has 'n' instructions and is currently executing the 'i'th instruction. The IP points to the (i+1)th instruction.
So, at this point, how can other OS processes like the scheduler, dispatcher, etc. come into play, since the CPU is already executing another process?
One solution (just a guess) I could think of is the use of interrupts and interrupt service routines.
But it's only a guess.
PS: I searched and couldn't find any satisfying answer.
With the help of the hardware, timer ticks (periodic interrupts) cause the CPU to execute operating system code. This code checks the system state and how much time has elapsed since the current process started running. At this point, the operating system can decide to schedule a different process. All it has to do is save the state of the running process and load the state of the process that is about to run (basically, saving the current register contents and then replacing them with the saved registers of the new process).
Eventually, the CPU is taken away even if the process doesn't want to yield it.
To address your concern: there are no operating system processes in the way you think. It isn't as if OS processes sit in the queue waiting among the other processes; the scheduler code simply runs whenever an interrupt hands the CPU to the operating system.
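As a rough analogy for the tick-driven switching described above, here is a toy round-robin simulation in Python. It is only a sketch: real preemption is forced by a hardware interrupt, whereas here each "process" is a generator whose suspended frame stands in for the saved registers, and the names process, run and quantum are made up for the illustration.

```python
from collections import deque


def process(name, steps):
    """A toy user process: each yield stands for one executed instruction."""
    for i in range(steps):
        yield f"{name}:{i}"


def run(processes, quantum):
    """Run the processes round-robin.

    After `quantum` instructions a simulated timer tick fires: the
    scheduler puts the current process back on the ready queue (its
    state is kept in the suspended generator) and resumes the next one.
    """
    ready = deque(processes)
    trace = []
    while ready:
        current = ready.popleft()
        for _ in range(quantum):
            try:
                trace.append(next(current))
            except StopIteration:
                break  # process finished; it is not requeued
        else:
            ready.append(current)  # tick: preempt and requeue
    return trace
```

For example, run([process("A", 3), process("B", 2)], quantum=2) interleaves the two processes: A runs two instructions, B runs two, then A finishes its last one.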
I've been using supervisord for a while -- outstanding tool. The one use case I haven't been able to figure out is, how to configure jobs to be restarted until a condition is met, then stop restarting.
Example: let's say you have a bunch of work to do, like scaling thousands of images, or servicing millions of requests on a queue. A useful pattern would be to run many workers in parallel to work on that backlog. You could set up a supervisord job that ensures 100 workers are running, and if any of them crash, supervisord will spin up replacements so the pool of workers won't shrink.
That's great until the work is done. Maybe when the backlog is gone, the number of workers should scale down to 1 or 0. Supervisord will keep spinning up the total to be 100 processes, even if each new process checks to see if there's work to be done, sees none, and shuts down very quickly.
Is there a way for a process instance or process family to communicate with supervisord to say that the autorestart behavior is no longer needed? Better yet, is there a way to scale the number of worker processes up and down based on some condition (like the number of files in a directory or ??).
I know it can be done by updating the supervisord.conf file and running supervisorctl reload, but I'd prefer something that's more declarative and self-managing if such a thing exists.
Is there a way for a process instance or process family to communicate with supervisord to say, the autorestart behavior is no longer needed?
You can wind down an activity by making sure your processes exit with different exitcode(s) when there is no work and making those the expected exitcodes with autorestart=unexpected in the configuration.
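A minimal sketch of that exit-code convention, assuming a hypothetical worker.py and program section (the section name, command and numprocs value in the comment are illustrative, not taken from the question). The worker returns a dedicated "no work left" code, while any crash leaves Python's usual non-zero status, which supervisord would treat as unexpected and restart.

```python
# Hypothetical supervisord section this worker is meant to pair with:
#
#   [program:worker]
#   command=python worker.py
#   numprocs=100
#   process_name=%(program_name)s_%(process_num)02d
#   autorestart=unexpected
#   exitcodes=0
#   startsecs=0   ; assumption: lets a worker that exits at once still
#                 ; count as started rather than as a failed start

EXIT_NO_WORK = 0  # must be one of the codes listed in `exitcodes`


def run_worker(fetch_task, handle_task):
    """Process tasks until the backlog is empty, then return EXIT_NO_WORK.

    fetch_task() returns the next task, or None when nothing is left.
    An exception raised by handle_task propagates and ends the process
    with a non-zero status, so supervisord spins up a replacement.
    """
    while True:
        task = fetch_task()
        if task is None:
            return EXIT_NO_WORK
        handle_task(task)
```

The script's entry point would then end with something like sys.exit(run_worker(fetch, handle)), where fetch and handle are whatever the real queue client provides.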
Better yet, is there a way to scale the number of worker processes up and down based on some condition (like number of files in a directory or ??).
The trouble is that the automatic state transitions don't allow for getting processes running again from an expected EXITED state. AFAIK the only way to do this is with the XML-RPC API's startProcess, so you would need to write or find an appropriate event listener that watches for your start condition and then uses the API.
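A simplified polling sketch of that idea, using supervisord's XML-RPC methods getProcessInfo and startProcess. It is not a full event listener; the URL, the backlog directory and the helper names connect and start_idle_workers are assumptions, and it presumes the inet_http_server section is enabled so the API is reachable over HTTP.

```python
import os
from xmlrpc.client import ServerProxy

# States from which a process can be started again.
RESTARTABLE = {"EXITED", "STOPPED"}


def connect(url="http://localhost:9001/RPC2"):
    """Build a proxy for supervisord's XML-RPC interface (assumed URL)."""
    return ServerProxy(url)


def start_idle_workers(proxy, backlog_dir, worker_names):
    """Start exited workers again if the backlog directory has files.

    Returns the list of process names that were started. The start
    condition (any file present) is only an example.
    """
    if not os.listdir(backlog_dir):
        return []
    started = []
    for name in worker_names:
        info = proxy.supervisor.getProcessInfo(name)
        if info["statename"] in RESTARTABLE:
            proxy.supervisor.startProcess(name)
            started.append(name)
    return started
```

This could be called periodically, for instance from an event listener subscribed to supervisord's TICK events. With numprocs greater than 1, the names passed in would take the group:process form, such as worker:worker_00.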
An alternate design is to wrap your worker process in an event handler watching PROCESS_COMMUNICATION events and have one normal subprocess communicating new tasks to a pool of event listeners. But that model doesn't currently eliminate a pool of waiting processes when there is no work; it just organizes the control task in a way that may make it easier to separate out task-related logic and resource usage.
Doing a regression on a large dataset, I have a huge read-only matrix that I'd like to share among several threads. I've looked at various ways of doing this and found the sharedmatrix toolkit to be exactly what I need. Reading through the tutorial, I came up with the following setup:
Session 0 - just loads up the matrix and makes it available
Session 1..n - worker sessions
The problem is that Session 0 has to finish only after all the other n sessions have finished. Do you know how to make a session wait? The best solution would be to make it wait until I kill it as I'm running the scripts on a remote linux system and am not connected to it all the time.
UPDATE:
In the end I've changed my approach to the problem, after reading this part of the tutorial:
The “free” directive marks the shared memory segment for deletion.
Note: it is not actually deleted until every attached session
explicitly detaches or is terminated. As soon as the last session
detaches, the system will reclaim the allocated segment.
This means that I've created one "master" session that loads up the matrix, makes it available and then starts its own computation, and several "slave" sessions that use the shared matrix. Even if the master session finishes earlier, it causes no problem to the slave sessions as the shared matrix remains in the memory until the last process that uses it is terminated.
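The same lifetime rule can be seen with POSIX shared memory outside of sharedmatrix. The sketch below uses Python's multiprocessing.shared_memory purely as an analogy (it is not the sharedmatrix API, and demo_segment_lifetime is a made-up name): the "master" handle unlinks the segment, yet a handle that attached earlier can still read the data until it detaches. Both handles live in one process here only to keep the example short.

```python
import struct
from multiprocessing import shared_memory


def demo_segment_lifetime(values):
    """Show that an unlinked segment stays readable for attached users."""
    fmt = f"{len(values)}d"  # the values stored as C doubles
    master = shared_memory.SharedMemory(create=True, size=struct.calcsize(fmt))
    struct.pack_into(fmt, master.buf, 0, *values)

    # A "slave" attaches by name while the segment still exists.
    worker = shared_memory.SharedMemory(name=master.name)

    # The master finishes early: detach, then mark for deletion.
    master.close()
    master.unlink()

    # The attached worker can still read the matrix data.
    seen = list(struct.unpack_from(fmt, worker.buf, 0))
    worker.close()  # last detach: the system can now reclaim the memory
    return seen
```

Note that once the segment is unlinked, no new session can attach by name, so the workers need to attach before the master frees it.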
If you really want to have it wait indefinitely, use Eitan's solution based on pause. More generally, something like labBarrier is what you should use to perform this kind of synchronization.
I have a Perl script that I'm attempting to set up using Perl Threads (use threads). When I run simple tests everything works, but when I do my actual script (which has the threads running multiple SQLPlus sessions), each SQLPlus session runs in order (i.e., thread 1's sqlplus runs steps 1-5, then thread 2's sqlplus runs steps 6-11, etc.).
I thought I understood that threads would do concurrent processing, but something's amiss. Any ideas, or should I be doing some other Perl magic?
A few possible explanations:
Are you running this script on a multi-core or multi-processor machine? If you only have one CPU, only one thread can use it at any given time.
Are there transactions or locks involved with steps 1-6 that would prevent it from being done concurrently?
Are you certain you are using multiple connections to the database and not sharing a single one between threads?
Actually, you have no way of guaranteeing in which order threads will execute. So the behavior (if not what you expect) is not really wrong.
I suspect you have some kind of synchronization going on here. Possibly SQL*Plus only lets itself be called once? Some programs do that...
Other possibilities:
Thread creation and process creation (you are creating subprocesses for SQL*Plus, aren't you?) take longer than running the thread, so thread 1 is finished before thread 2 even starts.
You are using transactions in your SQL scripts that force synchronization of database updates.
Check your database settings. You may find that it is set up in a conservative manner. That would cause even minor reads to block all access to that information.
You may also need to call threads::yield.