Most (or all?) POSIX named objects have an unlink function, e.g.:
shm_unlink
mq_unlink
They all have in common that they remove the name of the object from the system, causing subsequent opens to fail or to create a new object.
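For example, a minimal sketch of that behaviour with POSIX shared memory (the name /demo_shm and the thin error handling are only illustrative):

    #include <fcntl.h>     /* O_* constants */
    #include <stdio.h>
    #include <sys/mman.h>  /* shm_open, shm_unlink, mmap */
    #include <unistd.h>

    int main(void)
    {
        /* Create (or open) a named shared memory object. */
        int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
        if (fd == -1) { perror("shm_open"); return 1; }
        if (ftruncate(fd, 4096) == -1) { perror("ftruncate"); return 1; }

        /* Remove the name. The object itself survives as long as at least
           one descriptor or mapping still refers to it. */
        shm_unlink("/demo_shm");

        /* The already-open descriptor keeps working... */
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED)
            p[0] = 'x';

        /* ...but the name is gone, so a fresh open without O_CREAT fails. */
        if (shm_open("/demo_shm", O_RDWR, 0600) == -1)
            perror("second shm_open");   /* typically ENOENT */

        close(fd);
        return 0;
    }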
Why is it designed like this? I know this is connected to the "everything is a file" philosophy, but why not delete the file on close? Would you design it the same way if you were creating a new interface?
I think this has a big drawback. Say we have a server process and several client processes. If any process unlinks the object (by mistake), new clients will no longer find the server. (This can be prevented by the permissions on the corresponding file, but still...)
Would it not be better if it had reference counting, and the name were removed automatically when the last object is closed? Why would you want to keep it open?
Because these are low-level tools that may be used when performance matters. Deleting the object when it is no longer in use, only to create it again on the next use, carries a (slight) performance penalty compared with keeping it alive.
I once used a named semaphore to synchronize access to a spool with various producers and consumers. An init module, run as part of the boot process, created the named semaphore, and all other processes knew that the well-known semaphore should already exist.
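In rough outline, the pattern looked like this (a hedged sketch, not the original code; the name /spool_lock is made up):

    #include <fcntl.h>      /* O_CREAT, O_EXCL */
    #include <semaphore.h>  /* sem_open, sem_wait, sem_post */
    #include <stdio.h>

    #define SPOOL_SEM "/spool_lock"   /* the well-known name */

    /* Run once by the init module during boot: create the semaphore. */
    static int init_spool_sem(void)
    {
        sem_t *s = sem_open(SPOOL_SEM, O_CREAT | O_EXCL, 0600, 1);
        if (s == SEM_FAILED) { perror("sem_open (create)"); return -1; }
        sem_close(s);
        return 0;
    }

    /* Producers and consumers: open the well-known name and use it.
       They only ever sem_close(); nobody calls sem_unlink(). */
    static void use_spool(void)
    {
        sem_t *s = sem_open(SPOOL_SEM, 0);   /* must already exist */
        if (s == SEM_FAILED) { perror("sem_open (use)"); return; }

        sem_wait(s);
        /* ... access the spool ... */
        sem_post(s);

        sem_close(s);
    }

    int main(void)
    {
        if (init_spool_sem() == 0)   /* pretend we are the init module */
            use_spool();
        return 0;
    }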
If you want a more programmer-friendly interface that creates the object on demand and destroys it when it is no longer in use, you can build a higher-level library that encapsulates the create/unlink operations. But if the system call itself enforced that behavior, it would be impossible to build a user-level library that avoids it.
Would it not be better if it had reference counting, and the name were removed automatically when the last object is closed?
No.
Because unlink() can fail, and because automatically removing a resource that can be shared between processes just because all processes have merely closed it simply doesn't fit the paradigm of a shared resource.
You don't demolish a roller coaster just because there's no one waiting in line to ride it again at that moment in time.
Say we have a parent process with some arbitrary amount of data stored in memory, and we use fork to spawn a child process. I understand that, in order for the OS to perform copy-on-write, the page in memory that contains the data being modified has its read-only bit set, and the OS uses the exception raised when the child tries to modify the data to copy the entire page to another area of memory, so that the child gets its own copy. What I don't understand is: if that section of memory is marked read-only, then the parent process, to whom the data originally belonged, would not be able to modify the data either. So how can this whole scheme work? Does the parent lose ownership of its data, and does copy-on-write have to be performed even when the parent itself tries to modify the data?
Right, if either process writes a COW page, it triggers a page fault.
In the page fault handler, if the page is supposed to be writeable, it allocates a new physical page, does a memcpy(newpage, shared_page, pagesize), then updates the page table of whichever process faulted to map the new page at that virtual address. Then it returns to user space so the store instruction can re-run.
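From user space the whole mechanism is invisible: after fork, both the parent and the child can write to "the same" variable, and each transparently ends up with its own private copy. A minimal illustration (no error handling beyond fork):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int value = 1;        /* lives in a page that becomes COW after fork */

        pid_t pid = fork();
        if (pid == -1) { perror("fork"); return 1; }

        if (pid == 0) {       /* child: its store faults, gets a private page */
            value = 2;
            printf("child  sees value = %d\n", value);   /* prints 2 */
            _exit(0);
        }

        value = 3;            /* parent: its store also just works, on its copy */
        wait(NULL);
        printf("parent sees value = %d\n", value);       /* prints 3 */
        return 0;
    }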
This is a win for something like fork, because the child typically makes an execve system call right away, after touching only about one page (of stack memory). execve destroys all memory mappings for that process, effectively replacing it with a new process, and the parent once again has the only copy of every page. (Except pages that were already copy-on-write; e.g. memory allocated with mmap is typically COW-mapped to a single physical page of zeros, so reads can hit in L1d cache.)
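A sketch of that fork-then-execve pattern (the /bin/true target is just a placeholder):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();       /* every page is now shared copy-on-write */
        if (pid == -1) { perror("fork"); return 1; }

        if (pid == 0) {
            /* The child touches very little memory before execve replaces
               its whole address space, so almost nothing is ever copied and
               the parent is left holding the only copy of each page. */
            char *const argv[] = { "true", NULL };
            execv("/bin/true", argv);
            perror("execv");      /* reached only if execv fails */
            _exit(127);
        }

        wait(NULL);
        return 0;
    }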
A smart optimization would be for fork to actually copy the page containing the top of the stack, but still do lazy COW for all the other pages, on the assumption that the child process will normally execve right away and thus drop its references to all the other pages. It still costs a TLB invalidation in the parent to temporarily flip all the pages to read-only and back, though.
Some UNIX implementations share the program text between the two since that cannot be modified. Alternatively, the child may share all of the parent's memory, but in that case the memory is shared copy-on-write, which means that whenever either of the two wants to modify part of the memory, that chunk of memory is explicitly copied first to make sure the modification occurs in a private memory area.
Excerpted from: Modern Operating Systems (4th Edition), Tanenbaum
Two questions that need answering:
1) Why would a process need a resource that is being held by another process in an operating system?
2) Following up on question #1: why not avoid the deadlock problem altogether by putting the resources in the same place as the process, so there is no notion of "sharing" or "distributing" resources among processes?
So the question relates to the deadlock concept, where a process needs a resource held by another process.
A process may want to print on a printer, but that printer may already be printing data for another process. Or it may want to read keys from a keyboard that another process is already reading.
As seen in (1), resources need to be shared to utilize them effectively. You could of course have one keyboard for each process and one printer for each document, but that would be very costly.
I've searched all over the web for definitions, but I'm still confused. I've narrowed it all down to two different definitions:
"A data structure is persistent if it supports access to multiple versions" and "Persistence is the ability of an object to survive the lifetime of the OS process in which it resides".
To me, these mean different things, but maybe I'm just not getting it. Could someone please explain to me in a basic way what exactly persistence means?
This word means different things in different contexts:
Persistent data structures incorporate changes by creating new versions of themselves rather than mutating in place (all versions are accessible AND modifiable at any time); a minimal sketch follows this answer.
Persistence in your second definition refers to the ability of objects to be stored in non-volatile storage, such as a hard disk; otherwise they would be destroyed when the process in which they reside ends.
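To make the first sense concrete, here is a small illustrative sketch (in C) of a persistent singly linked list: an "update" builds a new version that shares structure with the old one, and every earlier version stays accessible:

    #include <stdio.h>
    #include <stdlib.h>

    /* Immutable list node: prepending never touches existing nodes. */
    struct node {
        int value;
        const struct node *next;
    };

    static const struct node *prepend(int value, const struct node *tail)
    {
        struct node *n = malloc(sizeof *n);
        if (n == NULL) abort();
        n->value = value;
        n->next  = tail;          /* the new version shares the old tail */
        return n;
    }

    static void print_list(const char *name, const struct node *n)
    {
        printf("%s:", name);
        for (; n != NULL; n = n->next)
            printf(" %d", n->value);
        printf("\n");
    }

    int main(void)
    {
        const struct node *v1 = prepend(1, NULL);   /* version 1: [1]    */
        const struct node *v2 = prepend(2, v1);     /* version 2: [2, 1] */
        const struct node *v3 = prepend(3, v1);     /* version 3: [3, 1] */

        /* All versions remain accessible; no "update" destroyed an old one. */
        print_list("v1", v1);
        print_list("v2", v2);
        print_list("v3", v3);
        return 0;
    }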
I am confused about a deadlock avoidance technique.
Could we achieve deadlock avoidance by adding more resources?
a) Yes
b) No
Deadlock does not equal deadlock; you have to be more specific. For a "classical" deadlock as described in textbooks (two processes each trying to access both the screen and the printer at the same time), adding resources does not count as an option, because the processes need those specific resources.
Of course, in this example, adding another printer would resolve the deadlock. But to extend the idea to software development, where a "resource" is something more abstract, such as access to a certain variable, adding resources is not considered a valid option. If two processes need to share access to one variable, you cannot introduce another variable without changing the behavior of the program.
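A hedged sketch of that software case: two threads that each need the same two locks but take them in opposite order. Adding more locks does not help, because both threads need exactly these two; the usual fix is a consistent lock ordering, not extra resources.

    #include <pthread.h>
    #include <stdio.h>

    /* Two shared "resources", each protected by its own lock. */
    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    static void *worker1(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock_a);   /* holds A ... */
        pthread_mutex_lock(&lock_b);   /* ... and waits for B */
        puts("worker1 got both locks");
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    static void *worker2(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock_b);   /* holds B ... */
        pthread_mutex_lock(&lock_a);   /* ... and waits for A: possible deadlock */
        puts("worker2 got both locks");
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker1, NULL);
        pthread_create(&t2, NULL, worker2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);   /* may never return if the two threads deadlock */
        return 0;
    }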
I'm very new to the Parallel::ForkManager module in Perl. It is well regarded, so I think it supports what I need and I just haven't figured out how yet.
What I need is for each child process to write some updates into a global hash, keyed by a value computed in that child process.
However, when I declare the hash outside the for loop and expect it to be updated after the loop, it turns out that the hash stays empty.
This means that although the update inside the loop succeeds (verified by printing out the value), outside the loop it is not visible.
Does anybody know how to write such a piece of code that does what I want?
This isn't really a Perl-specific problem, but a matter of understanding Unix-style processes. When you fork a new process, none of the memory is shared by default between processes. There are a few ways you can achieve what you want, depending on what you need.
One easy way would be to use something like BerkeleyDB to tie a hash to a file on disk. The tied hash can be initialized before you fork and then each child process would have access to it. BerkeleyDB files are designed to be safe to access from multiple processes simultaneously.
A more involved method would be to use some form of inter-process communication. For all the gory details of achieving such, see the perlipc manpage, which has details on several IPC methods supported by Perl. (A bare-bones C sketch of the pipe-based version of this idea follows this answer.)
A final approach, if your Perl supports it, is to use threads and share variables between them.
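Since this is really about Unix processes rather than Perl, the underlying pipe-based mechanism can be sketched in C (Perl's IPC modules and Parallel::ForkManager's callbacks wrap this kind of plumbing for you; the payload format here is made up):

    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) == -1) { perror("pipe"); return 1; }

        pid_t pid = fork();
        if (pid == -1) { perror("fork"); return 1; }

        if (pid == 0) {                   /* child: compute, then send back */
            close(fds[0]);
            const char *result = "key42=hello\n";   /* illustrative payload */
            write(fds[1], result, strlen(result));
            close(fds[1]);
            _exit(0);
        }

        close(fds[1]);                    /* parent: read the child's result */
        char buf[256];
        ssize_t n = read(fds[0], buf, sizeof buf - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("parent received: %s", buf);   /* merge into the hash here */
        }
        close(fds[0]);
        wait(NULL);
        return 0;
    }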
Each fork call generates a brand new process, so updates to a hash variable in a child process are not visible in the parent (and changes to the parent after the fork call are not visible in the child).
You could use threads (see also threads::shared) to have a change written in one thread be visible in another thread.
Another option is to use interprocess communication to pass messages between parent and child processes. The Forks::Super module (of which I am the author) can make this less of a headache.
Or your child processes could write some output to files. When the parent process reaps them, it could load the data from those files and update its global hash map accordingly.
Read the "RETRIEVING DATASTRUCTURES from child processes" section from man Parallel::ForkManager. There are callbacks, child's data can be sent and parent can retrieve them and populate data structures.