memory sharing -- between sytem call & interupt handler - linux-device-driver

I read following link
Linux Device Driver Program, where the program starts?
as per this all system calls operate independent to each other.
1> Then how to share common memory between different system call & interrupt handler.
but there should be some way to allocate memory ... so that they have common access to a block of memory.
2> Also which pointer to allocate the memory? so that it is accessiable by all ?
Is there some example which uses driver private data ?

Related

What exactly happens when an OS goes into kernel mode?

I find that neither my textbooks or my googling skills give me a proper answer to this question. I know it depends on the operating system, but on a general note: what happens and why?
My textbook says that a system call causes the OS to go into kernel mode, given that it's not already there. This is needed because the kernel mode is what has control over I/O-devices and other things outside of a specific process' adress space. But if I understand it correctly, a switch to kernel mode does not necessarily mean a process context switch (where you save the current state of the process elsewhere than the CPU so that some other process can run).
Why is this? I was kinda thinking that some "admin"-process was switched in and took care of the system call from the process and sent the result to the process' address space, but I guess I'm wrong. I can't seem to grasp what ACTUALLY is happening in a switch to and from kernel mode and how this affects a process' ability to operate on I/O-devices.
Thanks alot :)
EDIT: bonus question: does a library call necessarily end up in a system call? If no, do you have any examples of library calls that do not end up in system calls? If yes, why do we have library calls?
Historically system calls have been issued with interrupts. Linux used the 0x80 vector and Windows used the 0x2F vector to access system calls and stored the function's index in the eax register. More recently, we started using the SYSENTER and SYSEXIT instructions. User applications run in Ring3 or userspace/usermode. The CPU is very tricky here and switching from kernel mode to user mode requires special care. It actually involves fooling the CPU to think it was from usermode when issuing a special instruction called iret. The only way to get back from usermode to kernelmode is via an interrupt or the already mentioned SYSENTER/EXIT instruction pairs. They both use a special structure called the TaskStateSegment or TSS for short. These allows to the CPU to find where the kernel's stack is, so yes, it essentially requires a task switch.
But what really happens?
When you issue an system call, the CPU looks for the TSS, gets its esp0 value, which is the kernel's stack pointer and places it into esp. The CPU then looks up the interrupt vector's index in another special structure the InterruptDescriptorTable or IDT for short, and finds an address. This address is where the function that handles the system call is. The CPU pushes the flags register, the code segment, the user's stack and the instruction pointer for the next instruction that is after the int instruction. After the systemcall has been serviced, the kernel issues an iret. Then the CPU returns back to usermode and your application continues as normal.
Do all library calls end in system calls?
Well most of them do, but there are some which don't. For example take a look at memcpy and the rest.

Linux kernel flush_cache_range() call appears to do nothing

Introduction:
We have an application in which Linux running on an ARM accepts data from an external processor which DMA's the data into the ARM's memory space. The ARM then needs to access that data from user-mode code.
The range of addresses must be physically contiguous as the DMA engine in the external processor does not support scatter/gather. This memory range is initially allocated from the ARM kernel via a __get_free_pages(GFP_KERNEL | __GFP_DMA,order) call as this assures us that the memory allocated will be physically contiguous. Then a virt_to_phys() call on the returned pointer gives us the physical address that is then provided to the external processor at the beginning of the process.
This physical address is known also to the Linux user mode code which uses it (in user mode) to call the mmap() API to get a user mode pointer to this memory area. Our Linux kernel driver then sees a corresponding call to its mmap routine in the driver's file_operations structure. The driver then retains the vm_area_struct "vma" pointer that is passed to it in the call to its mmap routine for use later.
When the user mode code receives a signal that new data has been DMA'd to this memory address it then needs to access it from user mode via the user mode pointer we got from the call to mmap() mentioned above. Before the user mode code does this of course the cache corresponding to this memory range must be flushed. To accomplish this flush the user mode code calls the driver (via an ioctl) and in kernel mode a call to flush_cache_range() is made:
flush_cache_range(vma,start,end);
The arguments passed to the call above are the "vma" which the driver had captured when its mmap routine was called and "start" and "end" are the user mode addresses passed into the driver from the user mode code in a structure provided to the ioctl() call.
The Problem:
What we see is that the buffer does not seem to be getting flushed as we are seeing what appears to be stale data when accesses from user mode are made. As a test rather than getting the user mode address from a mmap() call to our driver we instead call the mmap() API to /dev/mem. In this case we get uncached access to the buffer (no flushing needed) and then everything works perfectly.
Our kernel version is 3.8.3 and it's running on an ARM 9. Is there a logical eror in the approach we are attempting?
Thanks!
I have a few question after which i might be able to answer:
1) How do you use "PHYSICAL" address in your mmap() call? mmap should have nothing to do with physical addresses.
2)What exactly do you do to get user virtual addresses in your driver?
3)How do you map these user virtual addresses to physical addresses and where do you do it?
4)Since you preallocate using get_free_pages(), do you map it to kernel space using ioremap_cache()?

Kernel Code vs User Code

Here's a passage from the book
When executing kernel code, the system is in kernel-space execut-
ing in kernel mode.When running a regular process, the system is in user-space executing
in user mode.
Now what really is a kernel code and user code. Can someone explain with example?
Say i have an application that does printf("HelloWorld") now , while executing this application, will it be a user code, or kernel code.
I guess that at some point of time, user-code will switch into the kernel mode and kernel code will take over, but I guess that's not always the case since I came across this
For example, the open() library function does little except call the open() system call.
Still other C library functions, such as strcpy(), should (one hopes) make no direct use
of the kernel at all.
If it does not make use of the kernel, then how does it make everything work?
Can someone please explain the whole thing in a lucid way.
There isn't much difference between kernel and user code as such, code is code. It's just that the code that executes in kernel mode (kernel code) can (and does) contain instructions only executable in kernel mode. In user mode such instructions can't be executed (not allowed there for reliability and security reasons), they typically cause exceptions and lead to process termination as a result of that.
I/O, especially with external devices other than the RAM, is usually performed by the OS somehow and system calls are the entry points to get to the code that does the I/O. So, open() and printf() use system calls to exercise that code in the I/O device drivers somewhere in the kernel. The whole point of a general-purpose OS is to hide from you, the user or the programmer, the differences in the hardware, so you don't need to know or think about accessing this kind of network card or that kind of display or disk.
Memory accesses, OTOH, most of the time can just happen without the OS' intervention. And strcpy() works as is: read a byte of memory, write a byte of memory, oh, was it a zero byte, btw? repeat if it wasn't, stop if it was.
I said "most of the time" because there's often page translation and virtual memory involved and memory accesses may result in switched into the kernel, so the kernel can load something from the disk into the memory and let the accessing instruction that's caused the switch continue.

How does OS execute compiled binary files?

This questions came to my head when I was studying processes scheduling.
How does OS execute and control the execution of binary and compiled files? I thought maybe OS copies a part of the binary to some memory location, jumps there, comes back after executing that block and executes the next one. But then it wouldn't have any control on it (e.g. the program can do a jump anywhere and don't come back).
In JVM case it makes perfect sense, the VM is interpreting each instruction. But in the binary files case the instructions are real CPU executable instructions so I don't think that an OS acts like VM.
It does exactly that. The operating system, in some order,
creates an entry in the process table
creates a virtual memory space for the process
loads the program code in the process memory
points the process instruction pointer to the process entry point
creates an entry in the scheduler and sets the process thread ready for execution.
Concurrency is not handled by the program being split into blocks. Switching between tasks is done via interrupts: before a process is given CPU, a timer is set up. When the timer finishes, the CPU registers an interrupt, pushes the instruction pointer to the stack and jumps to the interrupt handler defined by the operating system. This handler stores the CPU state in memory, swaps a virtual memory table and restores some other thread that is ready for execution. The same swap occurs if the thread must pause for some other reason (waiting for user / disk / network...) or yields.
http://en.wikipedia.org/wiki/Multitasking#Preemptive_multitasking.2Ftime-sharing
Note that relying on the process yielding the CPU is possible but unreliable (the process might not yield, preventing other processes from running)
http://en.wikipedia.org/wiki/Multitasking#Cooperative_multitasking.2Ftime-sharing
Security is handled by switching the CPU into protected mode where the application code cannot run some instructions (so jumping around randomly is mostly harmless). See the link provided by #SkPhilipp
Note that modern JVM does not interpret each instruction (that would be slow). Instead it compiles into native code and runs the code or (in case of just-in-time compilation) interprets at first, but compiles the "hot spots" (the code that gets run often enough).

Runtime.exec causes duplicate JVM to hang indefinitely until killed (Solaris 10)

All,
We are running a J2EE application on WebLogic server 9.2 MP2 with a jrockit 64-bit JVM (27.3.1) on Solaris 10.
We call use runtime.exec to call an executable called jfmerge to create PDF documents.
We have found that in Solaris, when runtime.exec is called, a duplicate JVM is temporarily spawned to kick off the jfmerge process. While this is inefficient (our JVM is 5 GB, thus the duplicated shell JVM is also 5 GB), the major problem lies in the fact that when there is heavy load on this functionality (PDF generation) in our application, sometimes the duplicated JVM never exits.
When the JVM hangs, the servers create large issues (extreme application slowness and terminated user sessions) as the entire duplicate JVM get's all of its 5 GB of process size written to disk swap.
We have noted the following hung thread correlated with a hung JVM process until the process is manually killed:
"[STUCK] ExecuteThread: '17' for queue: 'weblogic.kernel.Default
(self-tuning)'" id=3463 idx=0x158 tid=3460 prio=1 alive, in native,
daemon
at
jrockit/io/FileNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BII)I(Native
Method)
at jrockit/io/FileNativeIO.readBytes(FileNativeIO.java:30)
at java/io/FileInputStream.readBytes([BII)I(FileInputStream.java)
at java/io/FileInputStream.read(FileInputStream.java:194)
at
java/lang/UNIXProcess$DeferredCloseInputStream.read(UNIXProcess.java:227)
at java/io/BufferedInputStream.fill(BufferedInputStream.java:218)
at java/io/BufferedInputStream.read(BufferedInputStream.java:235)
^-- Holding lock:
java/io/BufferedInputStream#0xfffffffec6510470[thin lock]
at
gov/v3/common/formgeneration/sessionbean/FormsBean.getProcessStatus(FormsBean.java:809)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.createPDF(FormsBean.java:750)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.getTemplateDetails(FormsBean.java:450)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.generateSinglePDF(FormsBean.java:1371)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.generatePDF(FormsBean.java:263)
at
gov/v3/common/formgeneration/sessionbean/FormsBean.endorseDocument(FormsBean.java:2377)
at
gov/v3/common/formgeneration/sessionbean/Forms_qaco28_EOImpl.endorseDocument(Forms_qaco28_EOImpl.java:214)
at
gov/v3/delegates/common/FormsAndNoticesDelegate.endorseDocument(FormsAndNoticesDelegate.java:128)
at
gov/v3/actions/common/EndorseDocumentAction.executeRequest(EndorseDocumentAction.java:68)
at
gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.dispatchToExecuteMethod(V3CommonDispatchAction.java:532)
at
gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.executeBaseAction(V3CommonDispatchAction.java:336)
at
gov/v3/fwk/controller/struts/action/V3BaseDispatchAction.execute(V3BaseDispatchAction.java:69)
at
org/apache/struts/action/RequestProcessor.processActionPerform(RequestProcessor.java:484)
at
gov/v3/fwk/controller/struts/requestprocessor/V3TilesRequestProcessor.processActionPerform(V3TilesRequestProcessor.java:384)
at
org/apache/struts/action/RequestProcessor.process(RequestProcessor.java:274)
at
org/apache/struts/action/ActionServlet.process(ActionServlet.java:1482)
at
org/apache/struts/action/ActionServlet.doGet(ActionServlet.java:507)
at
gov/v3/fwk/controller/struts/servlet/V3ControllerServlet.doGet(V3ControllerServlet.java:110)
at javax/servlet/http/HttpServlet.service(HttpServlet.java:743)
at javax/servlet/http/HttpServlet.service(HttpServlet.java:856)
at
weblogic/servlet/internal/StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
at
weblogic/servlet/internal/StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
at
weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:283)
at
weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:175)
at
weblogic/servlet/internal/WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3231)
at
weblogic/security/acl/internal/AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at
weblogic/security/service/SecurityManager.runAs(SecurityManager.java:121)
at
weblogic/servlet/internal/WebAppServletContext.securedExecute(WebAppServletContext.java:2002)
at
weblogic/servlet/internal/WebAppServletContext.execute(WebAppServletContext.java:1908)
at
weblogic/servlet/internal/ServletRequestImpl.run(ServletRequestImpl.java:1362)
at weblogic/work/ExecuteThread.execute(ExecuteThread.java:209)
at weblogic/work/ExecuteThread.run(ExecuteThread.java:181)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace
We would like to do a couple of things:
1.) Prevent the spawning of a duplicate JVM, as we do not need any of it's functions when executing the simple jfmerge executable, and it creates massive overhead.
2.) In the short term at least prevent this duplicate JVM from handing indefinitely.
This answer is late, but we have the same problem, and the problem for us is how Solaris manage the memory.
The problem is when we have a application server, using a lot of memory 10GB in my case, and we want to run a simple "ls", the new process needs 10GB to run.
Solaris needs the 10GB extra available in our server, Linux use a feature known as “copy-on-write” This feature reduces the overhead of forking a new process
http://developers.sun.com/solaris/articles/subprocess/subprocess.html
Historical Background and Problem Description
Traditionally, Unix has had only one way to create a new process: using a fork() system call, often followed by an exec() system call. The fork() call makes a copy of the entire parent process' address space, and exec() turns that copy into a new process.
(Note: In the Solaris OS, the term swap space is used to describe a combination of physical memory and disk swap space configured for the system. However, with other Unix systems this term may mean swap space on disk, also known as backing store. To avoid any confusion, I'll use the term Virtual Memory (VM) to mean physical memory plus disk swap space.)
Generally, the fork/exec method has worked quite well. However, it has disadvantages in some cases, such as running out of memory without a good reason and poor fork performance.
Out of Memory: For a large-memory process, the fork() system call can fail due to an inadequate amount of VM, because fork() requires twice the amount of the parent memory. This can happen even when fork() is immediately followed by an exec() call that would release most of that extra memory. When this happens, the application will usually terminate.
For example, suppose a 64-bit application is consuming 6 gigabytes (Gbytes) of VM at the moment, and it needs to create a subprocess to run the ls(1) command. The parent process issues a fork() call that will succeed only if there is another 6 Gbytes of VM available at the moment. If the system doesn't have that much VM available (which is a frequent situation), fork() will fail with ENOMEM. Obviously, the ls(1) command doesn't need anywhere near 6 Gbytes of memory to run, but fork() doesn't know that.
Not only applications, but also Sun's own tools can suffer from the same problem. For example, the following Sun RFE (request for enhancement) has been filed for dbx: "4748951 dbx shell should use posix_spawn() for non-builtin commands rather than fork(2)".
RFE 4748951 came about when a customer's utility invoked dbx to read a huge core file using a script that also needed to run a cut(1) command from within dbx. They got a cannot fork - try again error message causing dbx to abort. An investigation revealed that dbx used fork/exec to execute that tiny cut(1) command and ran out of VM during the fork() call.
The Solaris Java Virtual Machine (JVM) is also suffering from the same problem currently, as described in this Sun RFE: "5049299 Use posix_spawn, not fork, on S10 to avoid swap exhaustion".
So you have 3 options.
1.- Execute the Runtime.exec function earlier.
2.- Create a inter process comunication with other java server, and ececute there the Runtime.exec instruccion.
3.- Create a JNI class to call a system C function. I take this option, and it work perfect.
I put my sample code here.
Java Code.
public class CallOS {
static {
System.loadLibrary("CallOS");
}
public native int exec(java.lang.String cmd);
public static void main(String[] args) {
int returnValue = 0;
returnValue = new CallOS().exec("ls -la");
System.out.println("- " + returnValue);
}
}
C header Code. This is generate with javah -jni CallOS
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class CallOS */
#ifndef _Included_CallOS
#define _Included_CallOS
#ifdef __cplusplus
extern "C" {
#endif
/*
* Class: CallOS
* Method: exec
* Signature: (Ljava/lang/String;)I
*/
JNIEXPORT jint JNICALL Java_CallOS_exec
(JNIEnv *, jobject, jstring);
#ifdef __cplusplus
}
#endif
#endif
C code.
#include "CallOS.h"
#include <stdlib.h>
JNIEXPORT jint JNICALL Java_CallOS_exec
(JNIEnv *env, jobject obj, jstring cmd)
{
jint retval;
jbyte *str;
str = (*env)->GetStringUTFChars(env, cmd, NULL);
if(str == NULL) return NULL;
retval = system(str);
(*env)->ReleaseStringUTFChars(env, cmd, str);
return retval;
};
I hope this help for you.
Are you handling the spawned process stdout/stderr properly ? You need to consume both in separate threads to reliably prevent blocking. See this answer for details. It may be the case that your process spawning works properly for some jobs, and for others (due to the quantity of stdout/err that triggers a hang).
On the subject of duplicate processes, I would expect the JVM to fork/exec. This duplicates the Java process (fork) and then it should replace it with the new process (exec). I wonder if that's what you're seeing ? Note also that I'd expect the OS to be implementing COW (copy-on-write) to duplicate only those memory pages that differ between images, so in normal circumstances the duplication of the JVM wouldn't consume as much memory as you may think.
As Brian implied, on unix, the standard way for another process to start another program is to fork into a parent process and a child process. The child process then calls exec to replace itself with the new program. The JVM has to do this to start your jfmerge program.
Normally, the memory size of the child process isn't an issue, because the OS uses copy-on-write to let the two processes share the same memory image until the child calls exec. It could be that the JVM's model for child processes requires it to fork twice, with the grandchild exec'ing jfmerge and the child process that manages the grandchild. That would explain why you see a duplicate JVM process that you are seeing. The stack trace shows a process blocked reading from an input stream. It may be that jfmerge is running slowly and the process is just hung waiting for jfmerge to produce some output.
What you could do is to get some other process to launch jfmerge, instead of your 5GB JVM. Write a standalone program which just runs jfmerge on demand, and have it communicate with the main process through some form of inter-process communication. This standalone jfmerge server wouldn't require as much memory to operate, so the impact of forked child processes wouldn't be so great.