Linux kernel flush_cache_range() call appears to do nothing

Introduction:
We have an application in which Linux running on an ARM accepts data from an external processor, which DMAs the data into the ARM's memory space. The ARM then needs to access that data from user-mode code.
The range of addresses must be physically contiguous because the DMA engine in the external processor does not support scatter/gather. This memory range is initially allocated from the ARM kernel via a __get_free_pages(GFP_KERNEL | __GFP_DMA, order) call, as this assures us that the allocated memory will be physically contiguous. A virt_to_phys() call on the returned pointer then gives us the physical address, which is provided to the external processor at the beginning of the process.
This physical address is also known to the Linux user-mode code, which uses it (in user mode) to call the mmap() API to get a user-mode pointer to this memory area. Our Linux kernel driver then sees a corresponding call to the mmap routine in its file_operations structure. The driver retains the vm_area_struct "vma" pointer passed to its mmap routine for later use.
When the user-mode code receives a signal that new data has been DMA'd to this memory address, it needs to access the data from user mode via the pointer obtained from the mmap() call mentioned above. Before the user-mode code does this, of course, the cache corresponding to this memory range must be flushed. To accomplish this flush, the user-mode code calls the driver (via an ioctl), and in kernel mode a call to flush_cache_range() is made:
flush_cache_range(vma, start, end);
The arguments are the "vma" that the driver captured when its mmap routine was called, plus "start" and "end", the user-mode addresses passed into the driver from the user-mode code in a structure provided to the ioctl() call.
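A simplified sketch of the driver paths just described (identifiers are illustrative; the remap_pfn_range() call is shown as the typical way such an mmap is implemented, since the exact mapping code is not reproduced here):

/* Kernel-side sketch; names are illustrative, not the exact driver. */
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/io.h>
#include <linux/uaccess.h>
#include <asm/cacheflush.h>

static struct vm_area_struct *saved_vma; /* captured in mmap, used in ioctl */
static unsigned long buf_kva;            /* from __get_free_pages(GFP_KERNEL | __GFP_DMA, order) */

static int drv_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long pfn = virt_to_phys((void *)buf_kva) >> PAGE_SHIFT;

    saved_vma = vma;                     /* retained for the later flush */
    return remap_pfn_range(vma, vma->vm_start, pfn,
                           vma->vm_end - vma->vm_start,
                           vma->vm_page_prot);
}

static long drv_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct flush_req { unsigned long start, end; } req;

    /* start/end are user-mode addresses supplied by the caller */
    if (copy_from_user(&req, (void __user *)arg, sizeof(req)))
        return -EFAULT;

    flush_cache_range(saved_vma, req.start, req.end);
    return 0;
}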
The Problem:
What we see is that the buffer does not seem to be getting flushed, as we see what appears to be stale data when accesses are made from user mode. As a test, rather than getting the user-mode address from a mmap() call to our driver, we instead call the mmap() API on /dev/mem. In this case we get uncached access to the buffer (no flushing needed), and then everything works perfectly.
Our kernel version is 3.8.3 and it's running on an ARM9. Is there a logical error in the approach we are attempting?
Thanks!

I have a few questions, after which I might be able to answer:
1) How do you use the "PHYSICAL" address in your mmap() call? mmap should have nothing to do with physical addresses.
2) What exactly do you do to get user virtual addresses in your driver?
3) How do you map these user virtual addresses to physical addresses, and where do you do it?
4) Since you preallocate using __get_free_pages(), do you map it into kernel space using ioremap_cache()?
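For reference on point 1, the usual user-side pattern is to hand the physical address to mmap() as the offset argument, which the driver then receives (page-shifted) as vma->vm_pgoff. A minimal sketch, assuming a hypothetical device node /dev/mydma:

/* User-space sketch; /dev/mydma, the address, and the size are
 * assumptions for illustration. The physical address is passed as the
 * mmap() offset; the kernel page-shifts it into vma->vm_pgoff. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    off_t phys = 0x30000000;   /* hypothetical physical address (page-aligned) */
    size_t len = 1 << 20;      /* hypothetical 1 MiB buffer */

    int fd = open("/dev/mydma", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* ... read the DMA'd data through p ... */

    munmap(p, len);
    close(fd);
    return 0;
}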

Related

Why can a user call a system call directly?

Before I ask the question, the following is what I know.
The system call is in the kernel area.
The kernel area cannot be used (accessed) directly by the user.
There are two ways to call a system call:
1. a direct call
2. a wrapping function (API) that contains the system call
(Process for 2: (user space) wrapping function -> system call interface -> (kernel space) system call.)
So, in case 1, how can the user use the kernel area directly?
Or is there something I'm mistaken about?
What I've tried: posting a question on SNS, searching the internet, and reading Operating System Concepts, 10th edition (p. 64).
The default is that nothing in user-space is able to execute anything in kernel space. How that works depends on the CPU and the OS, but likely involves some kind of "privilege level" that must be matched or exceeded before the CPU will allow software to access the kernel's part of virtual memory.
This default behavior alone would be horribly useless. For an OS to work there must be some way for user-space to transfer control/execution to (at least one) clearly marked and explicitly allowed kernel entry point. This also depends on the OS and CPU.
For example, for "all 80x86" (including all CPUs and CPU modes) an OS can choose between:
a software interrupt (interrupt gate or trap gate)
an exception (e.g. breakpoint exception)
a call gate
a task gate
the sysenter instruction
the syscall instruction
...and most modern operating systems now choose to use the syscall instruction.
All of these possibilities have two things in common:
a) There is an implied privilege level switch done by the CPU as part of the control transfer
b) The caller is unable to specify the address they're calling. Instead it's set by the kernel (e.g. during the kernel's initialization).
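To make the "clearly marked entry point" idea concrete, here is a minimal user-space sketch (Linux) that enters the kernel through the generic syscall(2) helper rather than a dedicated wrapper like write():

/* Minimal sketch: entering the kernel "directly" via syscall(2).
 * Even here, user code only selects a predefined entry; it cannot
 * choose the address it jumps to inside the kernel. */
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "hello via raw syscall\n";

    /* Same effect as write(1, msg, sizeof msg - 1). */
    syscall(SYS_write, 1, msg, sizeof msg - 1);
    return 0;
}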

Device Drivers vs the /dev + glibc Interface

I am looking to have the processor read from I2C and store the data in DDR in an embedded system. While looking at solutions, I have been introduced to Linux device drivers as well as the GNU C Library. It seems that many operations you can perform with the basic Linux drivers can also be performed with basic glibc system calls. I am somewhat confused about when one should be used over the other. Both interfaces can be accessed from user space.
When should I use a kernel driver to access a device like I2C or USB and when should I use the GNU C Library system functions?
The GNU C Library forwards function calls such as read, write, ioctl directly to the kernel. These functions are just very thin wrappers around the system calls. You could call the kernel all by yourself using inline assembly, but that is rarely helpful. So in this sense, all interactions with the kernel driver will go through these glibc functions.
If you have questions about specific interfaces and their trade-offs, you need to name them explicitly.
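As an illustration that the driver and glibc are layers rather than alternatives, talking to the kernel's I2C driver from user space goes through exactly these thin wrappers. A sketch, where the bus /dev/i2c-1 and the slave address 0x50 are assumptions:

/* Sketch: user space talking to the kernel I2C driver through the
 * thin glibc wrappers open()/ioctl()/read(). Bus number and slave
 * address are assumptions for illustration. */
#include <fcntl.h>
#include <linux/i2c-dev.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    unsigned char buf[16];
    int fd = open("/dev/i2c-1", O_RDWR);   /* glibc wrapper -> open syscall */
    if (fd < 0) { perror("open"); return 1; }

    /* Select the slave device; ioctl() is forwarded to the driver. */
    if (ioctl(fd, I2C_SLAVE, 0x50) < 0) { perror("ioctl"); return 1; }

    /* read() traps into the kernel, which runs the I2C driver code. */
    if (read(fd, buf, sizeof buf) < 0) { perror("read"); return 1; }

    close(fd);
    return 0;
}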
In ARM:
Privilege states are built into the processor and are changed via assembly commands. A memory protection unit, a part of the chip, is configured to disallow access to arbitrary ranges of memory depending on the privilege status.
In the case of the Linux kernel, ALL physical memory is privileged; memory addresses in userspace are virtual (fake) addresses, translated to real addresses once in privileged mode.
So, to access a privileged memory range, the mechanics are like a function call: you set the parameters indicating what you want, and then issue a supervisor call ('SVC'), an interrupt-like instruction that removes control of the program from userspace and gives it to the kernel. The kernel looks at your parameters and does what you need.
The standard library basically makes that whole process easier.
Drivers create interfaces to physical memory addresses and provide an API through the SVC call and whatever 'arguments' it's passed.
If physical memory is not reserved by a driver, the kernel generally won't allow anyone to access it.
Accessing physical memory you're not privileged to access will cause a "bus error".
BTW: you can use a driver like UIO to put physical memory into userspace, as sketched below.
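A minimal sketch of that UIO approach, assuming a device node /dev/uio0 and a 4096-byte first region (the real size is published in sysfs):

/* Sketch: mapping a UIO device's first memory region into user space.
 * /dev/uio0 and the 4096-byte size are assumptions; the real size is
 * published in /sys/class/uio/uio0/maps/map0/size. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/uio0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Region N is selected by passing N * getpagesize() as the offset. */
    volatile unsigned int *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first word of the region: 0x%x\n", regs[0]);

    munmap((void *)regs, 4096);
    close(fd);
    return 0;
}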

What exactly happens when an OS goes into kernel mode?

I find that neither my textbooks nor my googling skills give me a proper answer to this question. I know it depends on the operating system, but on a general note: what happens and why?
My textbook says that a system call causes the OS to go into kernel mode, given that it's not already there. This is needed because kernel mode is what has control over I/O devices and other things outside of a specific process' address space. But if I understand it correctly, a switch to kernel mode does not necessarily mean a process context switch (where you save the current state of the process somewhere other than the CPU so that some other process can run).
Why is this? I was kind of thinking that some "admin" process was switched in, took care of the system call, and sent the result to the process' address space, but I guess I'm wrong. I can't seem to grasp what is ACTUALLY happening in a switch to and from kernel mode and how this affects a process' ability to operate on I/O devices.
Thanks a lot :)
EDIT: bonus question: does a library call necessarily end up in a system call? If no, do you have any examples of library calls that do not end up in system calls? If yes, why do we have library calls?
Historically, system calls have been issued with interrupts. Linux used the 0x80 vector and Windows used the 0x2E vector to access system calls, storing the function's index in the eax register. More recently, we started using the SYSENTER and SYSEXIT instructions. User applications run in Ring 3, i.e. userspace/usermode. The CPU is very tricky here, and switching from kernel mode to user mode requires special care: it actually involves fooling the CPU into thinking it came from usermode when issuing a special instruction called iret. The only way to get back from usermode to kernelmode is via an interrupt or the already-mentioned SYSENTER/SYSEXIT instruction pair. Both use a special structure called the Task State Segment, or TSS for short. This allows the CPU to find where the kernel's stack is, so yes, it essentially requires a task switch.
But what really happens?
When you issue a system call, the CPU looks up the TSS, gets its esp0 value, which is the kernel's stack pointer, and places it into esp. The CPU then looks up the interrupt vector's index in another special structure, the Interrupt Descriptor Table (IDT for short), and finds the address of the function that handles the system call. The CPU pushes the flags register, the code segment, the user's stack pointer, and the instruction pointer of the instruction after the int instruction. After the system call has been serviced, the kernel issues an iret. The CPU then returns to usermode and your application continues as normal.
Do all library calls end in system calls?
Well, most of them do, but there are some which don't. For example, take a look at memcpy and the like.
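A quick way to see this for yourself is to run a tiny program under strace: the memcpy() below never enters the kernel, while the write() does. A sketch:

/* Sketch: memcpy() is pure user-space computation (no system call),
 * while write() traps into the kernel. Under strace, only the write()
 * shows up as a kernel crossing. */
#include <string.h>
#include <unistd.h>

int main(void)
{
    char src[] = "copied entirely in user space\n";
    char dst[sizeof src];

    memcpy(dst, src, sizeof src);   /* library call, never leaves user mode */
    write(1, dst, sizeof dst - 1);  /* thin wrapper around the write syscall */
    return 0;
}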

memory sharing -- between system call & interrupt handler

I read the following link:
Linux Device Driver Program, where the program starts?
As per this, all system calls operate independently of each other.
1) Then how does one share common memory between different system calls and an interrupt handler?
There should be some way to allocate memory ... so that they have common access to a block of memory.
2) Also, where should the pointer to the allocated memory live so that it is accessible by all?
Is there some example which uses driver private data?
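For illustration only, one common pattern (a sketch, not taken from any particular driver) is to keep the shared state in a driver-private structure and guard it with a spinlock, since the interrupt handler cannot sleep:

/* Sketch of driver-private data shared between the syscall path (here
 * an ioctl) and an interrupt handler; all names are illustrative. The
 * struct would be kzalloc()ed at probe time, passed to request_irq()
 * as dev_id, and stored in filp->private_data at open(). */
#include <linux/fs.h>
#include <linux/interrupt.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct mydev {
    spinlock_t lock;     /* guards the fields below */
    u32 last_sample;     /* written by the IRQ handler, read by ioctl */
};

static irqreturn_t mydev_irq(int irq, void *dev_id)
{
    struct mydev *dev = dev_id;          /* same object the ioctl sees */

    spin_lock(&dev->lock);
    dev->last_sample = 0xdeadbeef;       /* e.g. value read from hardware */
    spin_unlock(&dev->lock);
    return IRQ_HANDLED;
}

static long mydev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct mydev *dev = filp->private_data;
    unsigned long flags;
    u32 val;

    spin_lock_irqsave(&dev->lock, flags);   /* keep the IRQ path out */
    val = dev->last_sample;
    spin_unlock_irqrestore(&dev->lock, flags);

    return put_user(val, (u32 __user *)arg);
}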

Where is the mode bit?

I just read this in "Operating System Concepts" from Silberschatz, p. 18:
A bit, called the mode bit, is added to the hardware of the computer to indicate the current mode: kernel (0) or user (1). With the mode bit, we are able to distinguish between a task that is executed on behalf of the operating system and one that is executed on behalf of the user.
Where is the mode bit stored?
(Is it a register in the CPU? Can you read the mode bit? As far as I understand it, the CPU has to be able to read the mode bit. How does it know which program gets mode bit 0? Do programs at a special address get mode bit 0? Who sets the mode bit / how is it set?)
Please note that your question depends highly on the CPU itself; though it's uncommon, you might come across certain processors where this concept of user level/kernel level does not even exist.
The cs register has another important function: it includes a 2-bit field that specifies the Current Privilege Level (CPL) of the CPU. The value 0 denotes the highest privilege level, while the value 3 denotes the lowest one. Linux uses only levels 0 and 3, which are respectively called Kernel Mode and User Mode.
(Taken from "Understanding the Linux Kernel 3e", section 2.2.1)
Also note that, as you can clearly see, this depends on the CPU and will change from one to another, but the concept generally holds.
Who sets it? Typically the CPU/kernel, and a user process cannot change it, but let me explain something here.
**This is an over-simplification, do not take it at face value.**
Let's assume that the kernel is loaded and the first application has just started (the first shell). The kernel loads everything needed for this application to start, sets the bits in the cs register (if you are running x86), and then jumps to the code of the shell process.
The shell will continue to execute all of its instructions in this context. If the process contains some privileged instruction, the CPU will fetch it but won't execute it; instead it raises a (hardware) exception that tells the kernel someone tried to execute a privileged instruction, and the kernel code handles the job (the CPU sets cs to kernel mode and jumps to a known location that handles this type of error, maybe terminating the process, maybe something else).
So how can a process do something privileged? Talking to a certain device, for instance?
Here come the System Calls; the kernel will do this job for you.
What happens is the following:
You set what you want in a certain place (for instance, you set that you want to access a file, the file location is x, you are accessing for reading, etc.) in some registers (the kernel documentation will tell you about this), and then (on x86) you execute the int 0x80 instruction.
This interrupts the CPU, stops your work, sets the mode to kernel mode, and jumps the IP register to a known location that holds the code serving file-I/O requests; execution continues from there.
Once your data is ready, the kernel puts the data somewhere you can access it (a memory location or register; it depends on the CPU/kernel/what you requested), sets cs back to user mode, and jumps back to the instruction right after your int 0x80 instruction.
Finally, this happens whenever a switch occurs: the kernel gets notified that something happened, the CPU terminates your current instruction, changes the CPU status, and jumps to the code that handles it; the process explained above, roughly speaking, describes how a switch between kernel mode and user mode happens.
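As a concrete sketch of that sequence (32-bit x86 Linux only), here is the write system call issued by hand via int 0x80; the register assignments follow the i386 syscall convention:

/* Sketch (32-bit x86 Linux only; build with -m32): the write system
 * call issued via int 0x80 by hand. eax = syscall number (4 is
 * __NR_write on i386); ebx, ecx, edx carry the arguments. */
int main(void)
{
    const char msg[] = "hello from int 0x80\n";
    long ret;

    __asm__ volatile ("int $0x80"
                      : "=a" (ret)           /* return value comes back in eax */
                      : "a" (4),             /* __NR_write */
                        "b" (1),             /* fd = stdout */
                        "c" (msg),           /* buffer */
                        "d" (sizeof msg - 1) /* length */
                      : "memory");
    return (int)ret < 0;
}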
It's a CPU register. It's only accessible if you're already in kernel mode.
The details of how it gets set depend on the CPU design. In most common hardware, it gets set automatically when executing a special opcode that's used to perform system calls. However, there are other architectures where certain memory pages may have a flag set that indicates that they are "gateways" to the kernel -- calling a function on these pages sets the kernel mode bit.
These days it's given other names such as Supervisor Mode or a protection ring.