System call - validating addresses

Let's consider the following system call, declared in unistd.h:
ssize_t read(int fildes, void *buf, size_t nbytes);
As I understand it, the OS will validate that the process that made the system call is permitted to access the supplied address where the read data is to be placed, namely void *buf. But why is it so? Is the address provided a physical address, since the kernel needs to do this check? If yes, is the address translated by the MMU to a physical address before doing the system call?
And how could a process supply an address it was not permitted to access? I can't see how a virtual address in a process' address space can be translated to a physical address it is not permitted to access.
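As an illustration (a minimal sketch of my own, not authoritative): if a process passes an address it cannot access, the kernel's copy into user space fails and read() returns -1 with errno set to EFAULT, rather than the kernel crashing or writing anywhere.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char good[16];
    void *bad = (void *)1;        /* an address this process has no mapping for */
    int fd = open("/dev/zero", O_RDONLY);

    /* Valid buffer: the kernel copies data into our address space. */
    ssize_t n = read(fd, good, sizeof good);
    printf("read into valid buffer: %zd bytes\n", n);

    /* Invalid buffer: the kernel notices the bad user pointer while copying
       and fails the call with EFAULT instead of writing anywhere. */
    n = read(fd, bad, 16);
    if (n == -1)
        printf("read into bad buffer: %s\n", strerror(errno));

    close(fd);
    return 0;
}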

Related

Contents of Memory at very small Addresses

Can someone please tell me what the memory contains at very small addresses (0-100), such as address 7 for example, in a Linux-based operating system such as CentOS, and in Windows?
Low Virtual Addresses
For most operating systems, at least the lower half of a virtual address space depends on the process it belongs to (with "kernel space" in the upper portion). Typically, to catch dodgy pointers (which includes things like "int *pointer = NULL; foo = pointer[1234];" and "struct myStructure *pointer = NULL; foo = pointer->myField;", where the address that's accessed isn't the address that the pointer points to), the lowest virtual addresses are reserved for literally nothing; that way, if any software tries to access them, the CPU generates a page fault to inform the kernel that the software tried to do something very wrong.
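A minimal sketch (my own illustration, not from the original answer) of that guard area in action: dereferencing a NULL pointer lands in the reserved low addresses, the CPU page-faults, and the kernel delivers SIGSEGV to the process.

#include <signal.h>
#include <unistd.h>

static void handler(int sig)
{
    /* Async-signal-safe report, then exit; the faulting access would
       otherwise be retried forever after the handler returns. */
    static const char msg[] = "caught SIGSEGV: the low page is unmapped\n";
    (void)sig;
    write(STDOUT_FILENO, msg, sizeof msg - 1);
    _exit(0);
}

int main(void)
{
    struct myStructure { int myField; } *pointer = NULL;

    signal(SIGSEGV, handler);
    return pointer->myField;   /* reads near address 0: reserved, never mapped */
}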
Low Physical Addresses
What is at low physical addresses depends on which type of computer it is (80x86, ARM, MIPS, ...), what the firmware is (e.g. BIOS, UEFI), and other factors (how the chipset was configured). Without this information there can't be a specific answer (the only possible answer is "nobody can know").

Can I have more than 32 netlink sockets in kernelspace?

I have several kernel modules which need to interact with userspace. Hence, each module has a Netlink socket.
My problem is that these sockets interfere with each other. This is because all of them register to the same Netlink address family (because there aren't many available to begin with - the max is 32 and more than half are already reserved) and also because they all bind themselves to the same pid (the kernel pid - zero).
I wish there were more room for address families. Or, better yet, I wish I could bind my sockets to other pids. How come Netlink is the preferred user-kernel channel if only 32 sockets can be open at any one time?
libnl-3's documentation says
The netlink address (port) consists of a 32bit integer. Port 0 (zero) is reserved for the kernel and refers to the kernel side socket of each netlink protocol family. Other port numbers usually refer to user space owned sockets, although this is not enforced.
That last claim seems to be a lie right now. The kernel uses a constant pid and doesn't export more versatile functions:
if (netlink_insert(sk, 0))
        goto out_sock_release;
I guess I can recompile the kernel and increase the address family limit. But these are kernel modules; I shouldn't have to do that.
Am I missing something?
No.
Netlink's socket count limit is why Generic Netlink exists.
Generic Netlink is a layer on top of stock Netlink. Instead of opening a socket, you register a callback on an already established socket, and listen to messages directed to a "sub"-family there. Given there are more available family slots (1023) and no ports, I'm assuming they felt a separation between families and ports was unnecessary at this layer.
To register a listener in kernelspace, use genl_register_family() or its siblings. In userspace, Generic Netlink can be used via libnl-3's API (it's rather limited, but the code is readable and open).
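A rough kernel-side sketch of such a registration (the family name "myfam", the command/attribute numbering and the handler are made up; the exact struct genl_family fields vary between kernel versions, and this matches recent kernels where the ops array lives inside the family):

#include <linux/module.h>
#include <net/genetlink.h>

/* Hypothetical command and attribute numbering for this sketch. */
enum { MYFAM_CMD_UNSPEC, MYFAM_CMD_ECHO };
enum { MYFAM_ATTR_UNSPEC, MYFAM_ATTR_MSG, __MYFAM_ATTR_MAX };
#define MYFAM_ATTR_MAX (__MYFAM_ATTR_MAX - 1)

static int myfam_echo_doit(struct sk_buff *skb, struct genl_info *info)
{
        /* Handle a MYFAM_CMD_ECHO request sent to this "sub"-family. */
        return 0;
}

static const struct genl_ops myfam_ops[] = {
        { .cmd = MYFAM_CMD_ECHO, .doit = myfam_echo_doit },
};

static struct genl_family myfam_family = {
        .name    = "myfam",     /* userspace resolves the dynamic family id by name */
        .version = 1,
        .maxattr = MYFAM_ATTR_MAX,
        .ops     = myfam_ops,
        .n_ops   = ARRAY_SIZE(myfam_ops),
        .module  = THIS_MODULE,
};

static int __init myfam_init(void)
{
        return genl_register_family(&myfam_family);
}
module_init(myfam_init);
MODULE_LICENSE("GPL");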
You are confused by the MAX_LINKS variable name. It is not the "maximum number of links", it's the "maximum number of families". The things you listed are netlink families or, in other words, netlink groups. There are indeed 32 families, each dedicated to serving some particular purpose. For example, NETLINK_SELINUX is for SELinux notifications and NETLINK_KOBJECT_UEVENT is for kobject notifications (these are what udev handles).
But there are no restrictions on the number of sockets for each family.
When you call netlink_create(), it checks your protocol number, which in the case of a netlink socket is the netlink family, like NETLINK_SELINUX. Look at the code:
static int netlink_create(struct net *net, struct socket *sock, int protocol,
                          int kern)
{
        ...
        if (protocol < 0 || protocol >= MAX_LINKS)
                return -EPROTONOSUPPORT;
        ...
This is how MAX_LINKS is used.
Later, to actually create the socket, it invokes __netlink_create, which in turn calls sk_alloc, which in turn calls sk_prot_alloc. Now, in sk_prot_alloc it allocates the socket by kmallocing it (netlink doesn't have its own slab cache):
        slab = prot->slab;
        if (slab != NULL) {
                sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
                if (!sk)
                        return sk;
                if (priority & __GFP_ZERO) {
                        if (prot->clear_sk)
                                prot->clear_sk(sk, prot->obj_size);
                        else
                                sk_prot_clear_nulls(sk, prot->obj_size);
                }
        } else
                sk = kmalloc(prot->obj_size, priority);
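So nothing limits how many sockets of one family a process can open. A small user-space sketch of my own (NETLINK_ROUTE chosen only because it is always available); leaving nl_pid at 0 in bind() lets the kernel assign each socket its own unique port:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void)
{
    int i;

    for (i = 0; i < 100; i++) {
        struct sockaddr_nl addr;
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

        memset(&addr, 0, sizeof addr);
        addr.nl_family = AF_NETLINK;
        addr.nl_pid = 0;        /* 0: kernel picks a unique port for this socket */

        if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("netlink socket/bind");
            return 1;
        }
    }
    printf("opened 100 NETLINK_ROUTE sockets in a single process\n");
    return 0;
}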

What's the Allocation Base shown in the "address" command of windbg?

When you use the !address command to find the module that owns a memory address, it shows both an Allocation Base and a Base Address.
So the Allocation Base is where the DLL image gets loaded (same as the output of the lm command); what about the Base Address then?
AllocationBase refers to the start address of the allocated block in memory.
This block can hold segments of different types.
When checking a specific address, the allocation base tells you where the block it belongs to starts, and the base address points to the start address of the segment containing it.
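The same pair of values can be read programmatically with VirtualQuery(); a small sketch (the queried offset into the image is arbitrary):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    MEMORY_BASIC_INFORMATION mbi;
    /* Query an address somewhere inside this module's image. */
    void *addr = (char *)GetModuleHandle(NULL) + 0x2000;

    if (VirtualQuery(addr, &mbi, sizeof mbi)) {
        /* AllocationBase: start of the whole reserved block (the image base). */
        /* BaseAddress: start of the region/segment that contains 'addr'.      */
        printf("AllocationBase = %p\n", mbi.AllocationBase);
        printf("BaseAddress    = %p\n", mbi.BaseAddress);
    }
    return 0;
}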
Check this link, a great tutorial from MSDN:
Memory User Mode Tutorial

Linux kernel flush_cache_range() call appears to do nothing

Introduction:
We have an application in which Linux running on an ARM accepts data from an external processor which DMAs the data into the ARM's memory space. The ARM then needs to access that data from user-mode code.
The range of addresses must be physically contiguous as the DMA engine in the external processor does not support scatter/gather. This memory range is initially allocated from the ARM kernel via a __get_free_pages(GFP_KERNEL | __GFP_DMA, order) call, as this assures us that the memory allocated will be physically contiguous. A virt_to_phys() call on the returned pointer then gives us the physical address that is provided to the external processor at the beginning of the process.
This physical address is known also to the Linux user mode code which uses it (in user mode) to call the mmap() API to get a user mode pointer to this memory area. Our Linux kernel driver then sees a corresponding call to its mmap routine in the driver's file_operations structure. The driver then retains the vm_area_struct "vma" pointer that is passed to it in the call to its mmap routine for use later.
When the user mode code receives a signal that new data has been DMA'd to this memory address, it needs to access it from user mode via the user mode pointer we got from the call to mmap() mentioned above. Before the user mode code does this, of course, the cache corresponding to this memory range must be flushed. To accomplish this flush, the user mode code calls the driver (via an ioctl), and in kernel mode a call to flush_cache_range() is made:
flush_cache_range(vma,start,end);
The arguments passed to the call above are the "vma" which the driver had captured when its mmap routine was called and "start" and "end" are the user mode addresses passed into the driver from the user mode code in a structure provided to the ioctl() call.
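Roughly, the driver side looks like the following sketch (names such as mydrv_mmap, dma_buf_phys and struct flush_req are placeholders rather than our real code, and it assumes the mapping is established with remap_pfn_range(); details and error handling are omitted):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/uaccess.h>
#include <asm/cacheflush.h>

static struct vm_area_struct *saved_vma;   /* captured in mmap, used by the ioctl */
static unsigned long dma_buf_phys;         /* from virt_to_phys() on the buffer   */

struct flush_req {
        unsigned long start;               /* user mode start/end addresses       */
        unsigned long end;
};

static int mydrv_mmap(struct file *filp, struct vm_area_struct *vma)
{
        saved_vma = vma;                   /* retained for flush_cache_range()    */
        return remap_pfn_range(vma, vma->vm_start,
                               dma_buf_phys >> PAGE_SHIFT,
                               vma->vm_end - vma->vm_start,
                               vma->vm_page_prot);
}

static long mydrv_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
        struct flush_req req;

        if (copy_from_user(&req, (void __user *)arg, sizeof req))
                return -EFAULT;
        flush_cache_range(saved_vma, req.start, req.end);   /* the call in question */
        return 0;
}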
The Problem:
What we see is that the buffer does not seem to be getting flushed, as we are seeing what appears to be stale data when accesses are made from user mode. As a test, rather than getting the user mode address from an mmap() call to our driver, we instead call the mmap() API on /dev/mem. In this case we get uncached access to the buffer (no flushing needed) and then everything works perfectly.
Our kernel version is 3.8.3 and it's running on an ARM9. Is there a logical error in the approach we are attempting?
Thanks!
I have a few questions, after which I might be able to answer:
1) How do you use the "PHYSICAL" address in your mmap() call? mmap() should have nothing to do with physical addresses.
2) What exactly do you do to get user virtual addresses in your driver?
3) How do you map these user virtual addresses to physical addresses, and where do you do it?
4) Since you preallocate using get_free_pages(), do you map it to kernel space using ioremap_cache()?

Detect mprotected memory address

Is there a function to detect whether a given virtual address mapped by mmap is protected by mprotect? Accessing such an address will result in a segmentation fault if PROT_NONE is set. So I'd like to first detect whether it is protected or not.
It's better if I don't need to introduce signal handlers. If there isn't any such function, any other lightweight solution is also fine. Thanks.
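One lightweight, signal-free possibility (a Linux-specific sketch of my own, not a standard API) is to look the address up in /proc/self/maps and inspect the permission column; "---" in that column means the region is currently mapped with PROT_NONE:

#include <stdio.h>

/* Returns 1 if addr lies in a mapping whose permissions are "---" (PROT_NONE),
   0 if it is mapped with some access, and -1 if it is not mapped at all
   (or /proc/self/maps could not be read). */
static int is_prot_none(const void *addr)
{
    FILE *f = fopen("/proc/self/maps", "r");
    unsigned long a = (unsigned long)addr, start, end;
    char perms[8], line[512];
    int result = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "%lx-%lx %7s", &start, &end, perms) != 3)
            continue;
        if (a >= start && a < end) {
            result = (perms[0] == '-' && perms[1] == '-' && perms[2] == '-');
            break;
        }
    }
    fclose(f);
    return result;
}

Note that this is inherently racy if another thread may call mprotect() concurrently, and it costs a file read per query.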