Doing a regression on a large dataset, I have a huge read-only matrix that I'd like to share among several threads. I've looked at various ways of doing this and found the sharedmatrix toolkit to be exactly what I need. Reading through the tutorial, I came up with the following setup:
Session 0 - just loads up the matrix and makes it available
Session 1..n - worker sessions
The problem is that Session 0 must not finish until all the other n sessions have finished. Do you know how to make a session wait? The best solution would be to make it wait until I kill it, as I'm running the scripts on a remote Linux system and am not connected to it all the time.
UPDATE:
In the end I've changed my approach to the problem, after reading this part of the tutorial:
The “free” directive marks the shared memory segment for deletion.
Note: it is not actually deleted until every attached session
explicitly detaches or is terminated. As soon as the last session
detaches, the system will reclaim the allocated segment.
This means that I've created one "master" session that loads up the matrix, makes it available and then starts its own computation, and several "slave" sessions that use the shared matrix. Even if the master session finishes earlier, it causes no problem to the slave sessions as the shared matrix remains in the memory until the last process that uses it is terminated.
If you really want to have it wait indefinitely, use Eitan's solution based on pause. More generally, something like labBarrier is what you should use to perform this kind of synchronization.
Recently, I have been reading about Operating Systems, and this bugs me a lot.
How is it really possible for one process to manage other processes?
Basically, a CPU simply executes instructions: after executing one instruction, it executes the instruction at the address pointed to by the IP and increments the IP.
Let me elaborate on my doubt with an example. Let's say I have a user process (or simply a process) which is being executed by the CPU. Let's say it has 'n' instructions and is currently executing the 'i'th instruction. The IP points to the (i+1)th instruction.
So, at this point, how can OS components like the scheduler, the dispatcher, etc. come into play, since the CPU is already executing another process?
One solution (just a guess) I could think of is the use of interrupts and interrupt service routines.
But it's only a guess.
PS: I searched and couldn't find any satisfying answer.
With the help of the hardware, timer ticks cause the CPU to execute operating system code. This code checks the system state and the time that has elapsed since the current process started executing. At this point, the operating system can decide to schedule a different process. All it has to do is save the state of the running process and restore the state of the process that is about to start running (basically saving the contents of the registers and then loading the registers of the new process).
Eventually, the CPU is taken away even if the process doesn't want to yield it.
To address your concern, there are no operating system processes in the way you think... it isn't like there are OS processes in the queue waiting among other processes....
When I try to start up my Ensemble production, I get the following error:
ERROR ErrCanNotAcquireRuntimeLock: Could not acquire Ensemble runtime
global lock within timeout '10'
I figured I would disable all the services, processes and operations and restart them individually to see which one is causing the error; however, any action I take on the production takes a very long time and then comes back with the same error.
Googling the issue did not yield much, any ideas?
You should check the contents of your lock table while the production is not running -- it's likely that you have a job (or multiple jobs) that still have locks on the core Ensemble runtime globals. If you can identify the OS-level process(es) and can work out what they are actually doing, you should be able to terminate the OS processes. In both cases, you should perform this detection and termination from within Ensemble. You should be able to use the System Management Portal for both actions, or you can use the ^LOCKTAB and ^JOBEXAM CHUI utilities in the %SYS namespace to track this down.
If you can restart the Ensemble server, the lock table should be cleared. This, however, doesn't help you find the cause of your problem.
As far as I know, there is no need for a synchronization technique in an interrupt handler. The interrupt handler cannot run concurrently. In short, pre-emption is disabled in an ISR. However, I have a doubt regarding tasklets. As far as I know, tasklets run in interrupt context. Thus, in my opinion, there is no need for a spin lock in a tasklet function. However, I am not sure about it. Can somebody please explain? Thanks for your replies.
If data is shared between the top half and the bottom half, then use a lock. Simple rules for locking: locks are meant to protect data, not code.
1. What to protect?
2. Why to protect?
3. How to protect?
Two tasklets of the same type do not ever run simultaneously. Thus, there is no need to protect data used only within a single type of tasklet. If the data is shared between two different tasklets, however, you must obtain a normal spin lock before accessing the data in the bottom half. You do not need to disable bottom halves because a tasklet never preempts another running tasklet on the same processor.
For synchronization between code running in process context (A) and code running in softirq context (B) we need to use special locking primitives. We must use spinlock operations augmented with deactivation of bottom-half handlers on the current processor in (A), and in (B) only basic spinlock operations. Using spinlocks makes sure that we don't have races between multiple CPUs, while deactivating the softirqs makes sure that we don't deadlock if the softirq is scheduled on the same CPU where we already acquired a spinlock. (c) Kernel docs
I have a large file I need to load into the cache (as a hash) which will be shared between Perl processes. Loading the data into the cache takes around 2 seconds, but we have over 10 calls per second.
Does using the compute method cause other processes to be locked out?
Otherwise, I would appreciate suggestions on how to manage the load process so that there's a guaranteed lock during load and only one load process happening!
Thanks!
Not sure about the guaranteed lock, but you could use memcached with a known key as a mutex substitute (as of this writing, you haven't said what your "cache" is). If the value for the key is false, set it to true, start loading, and then return the result.
For the requests happening during that time, you could either busy-wait, or try using a 503 "Service Unavailable" status with a few seconds in the Retry-After field. I'm not sure of the browser support for this.
As a bonus, using a time-to-live of, say, 5 minutes on the mutex key will cause the entire file to be reloaded every 5 minutes. Otherwise, if you have a different reason for reloading the file, you will need to manually delete the key.
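The key-as-mutex idea above can be sketched as follows. This is only an illustration in Python: `FakeCache` is a hypothetical in-memory stand-in, not a real client library, and the key names are made up. It relies on memcached's `add` command, which stores a value only if the key does not already exist, so the "check if false, then set to true" step happens atomically and only one caller wins.

```python
import threading
import time


class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry time or None)
        self._guard = threading.Lock()

    def add(self, key, value, ttl):
        # Like memcached "add": store only if the key is absent or expired.
        now = time.monotonic()
        with self._guard:
            entry = self._data.get(key)
            if entry is not None and (entry[1] is None or entry[1] > now):
                return False
            self._data[key] = (value, now + ttl)
            return True

    def set(self, key, value):
        with self._guard:
            self._data[key] = (value, None)  # no expiry in this sketch

    def get(self, key):
        with self._guard:
            entry = self._data.get(key)
            return None if entry is None else entry[0]


def get_table(cache, load_file, ttl=300):
    """Return the shared table, loading it at most once per TTL window.
    A None result means another caller is still loading: retry later."""
    if cache.add("table:loading", True, ttl):  # we won the mutex
        table = load_file()
        cache.set("table:data", table)
        return table
    return cache.get("table:data")
```

Callers that get `None` back are the ones that would busy-wait or receive the 503 response. When the mutex key expires after `ttl` seconds, the next caller wins `add` again and reloads the file, while the others keep reading the previous copy.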
I've been using Proc::Daemon in an attempt to make a start/stop daemon script, something that allows me to do:
X start
X stop
X status
etc. However, in the source code it looks like Proc::Daemon uses either a "pid" file or a search of the process table. I'm concerned with both of these approaches: firstly, "pid"s are reused, which may give the impression a service is up when it's actually down, and secondly, process table entries are easily faked, and the checking doesn't look particularly robust.
Is there any robust way to make a start/stop daemon script/program like I've described, or has someone already made one? Note that I haven't got root access, and I'm also on Solaris if that's important.
Although pids are reused, I believe that they round-robin through a (large) fixed-size set, e.g. on Solaris this used to be 30,000 (it may be different now). So 30,000 processes would have to start/finish before your pid was reused.
The approach used by Proc::Daemon doesn't look unreasonable and is a fairly common approach to this problem.
An approach I use is to have the daemon process obtain an exclusive (write) lock on a file.
You can test to see if anyone is holding the lock by trying to obtain the lock yourself, and there are various ways of obtaining the PID of a process holding a lock on a file - i.e. fcntl and probably something in /proc.
Some words of advice:
Use local files (ie. not NFS) for locks.
Make sure the lock file exists before the daemon is started.
Never delete the lock file.
The kernel associates locks with the inode number of the file, so you always want the lock file to have the same inode number throughout all time. Deleting and recreating the lock file will change the inode associated with the lock.
A simple keep alive mechanism can be implemented as a cron job - the cron job just tries to spawn the daemon process every N minutes, and then have the daemon quietly exit if it can't obtain the exclusive lock.
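The lock-file approach above could look roughly like this. It is a sketch in Python rather than Perl, assuming a Unix-like system and a lock file that already exists on a local disk; `try_lock` and `is_running` are hypothetical helper names.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on an existing file.
    Return the open file descriptor on success, or None if another
    process already holds the lock."""
    fd = os.open(path, os.O_RDWR)  # the file is never created or deleted here
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd  # keep this open for the daemon's whole lifetime


def is_running(path):
    """'status' check for a separate control script: the daemon is up
    if we cannot get the lock ourselves. Do not call this from inside
    the daemon, because closing the descriptor would drop its own lock."""
    fd = try_lock(path)
    if fd is None:
        return True
    os.close(fd)  # closing the descriptor releases the lock again
    return False
```

A daemon would call `try_lock` once at startup and exit quietly on `None`, which is also what makes the cron keep-alive safe. The kernel releases the lock when the process dies, so there is no stale state to clean up, unlike with a pid file.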