Why file starting offset in mmap() must be multiple of the page size - offset

In mmap() manpage:
Its prototype is:
void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
and description:
The mmap() function asks to map 'length' bytes starting at offset 'offset'
from the file (or other object) specified by the file descriptor fd into
memory, preferably at address 'start'.
Sepcifically, for the last argument:
'offset' should be a multiple of the page size as returned by getpagesize(2).
From what I practised, offset MUST be a multiple of the page size, for example, 4096 on my Linux, otherwise, mmap() would return Invalid argument, offset is for file offset, why it must be multiple of virtual memory system's page size?
Thanks,

The simple answer: to make it fast. The more complex answer: whenever you access the memory at a location within the mapped memory, the OS must make sure that this location is filled with the contents of the file. But the OS can only detect whether you access a memory page - not a single location. What it does is, it creates a simple relation between offsets in the file and memory pages - and whenever you access a memory page, that part of the file is loaded. To make these calculations fast, it restricts you to start at certain offsets.

Related

Libfuzzer target for on-disk parsing

I'm currently integrating libFuzzer in a project which parses files on the hard drive. I have some prior experience with AFL, where a command line like this one was used:
afl-fuzz -m500 -i input/ -o output/ -t100 -- program_to_fuzz ##
...where ## was a path to the generated input.
Looking at libFuzzer however, I see that the fuzz targets look like this:
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
DoSomethingInterestingWithMyAPI(Data, Size);
return 0; // Non-zero return values are reserved for future use.
}
I understand that the input isn't provided in the form of a file, but as a buffer in-memory instead. The problem is that the program I'm trying to fuzz works with files and obtains its data through fread() calls. At no point in time is the whole input supposed to be loaded in memory (where, in the general case, it might not even fit); so there's not much I can do with a const uint8_t*.
Writing the buffer back to the hard drive to get back a file seems extremely inefficient. Is there a way around this?
You can do as in this example from google security team.
The buf_to_file defined here takes your buffer and returns a char* pathname you can then pass to you target:
(from https://github.com/google/security-research-pocs/blob/master/autofuzz/fuzz_utils.h#L27 )
// Write the data provided in buf to a new temporary file. This function is
// meant to be called by LLVMFuzzerTestOneInput() for fuzz targets that only
// take file names (and not data) as input.
//
// Return the path of the newly created file or NULL on error. The caller should
// eventually free the returned buffer (see delete_file).
extern "C" char *buf_to_file(const uint8_t *buf, size_t size);
Be sure to free the ressource with the delete_file function.
You could use LD_PRELOAD and override fread.

How to find the number of data mapped by mmap()?

if mmap() was used to read a file, how can I find the number of data mapped by mmap().
float *map = (float *)mmap(NULL, FILESIZE, PROT_READ, MAP_SHARED, fd, 0);
The mmap system call does not read data. It just maps the data in your virtual address space (by indirectly configuring your MMU), and that virtual address space is changed by a successful mmap. Later, your program will read that data (or not). In your example, your program might later read map[356] if mmap has succeeded (and you should test against its failure).
Read carefully the documentation of mmap(2). The second argument (in your code, FILESIZE) defines the size of the mapping (in bytes). You might check that it is a multiple of sizeof(float) and divide it by sizeof(float) to get the number of elements in map that are meaningful and obtained from the file. The size of the mapping is rounded up to a multiple of pages. The man page of mmap(2) says:
A file is mapped in multiples of the page size. For a file that is
not a multiple of the page size, the remaining memory is zeroed when
mapped, and writes to that region are not written out to the file.
Data is mapped in pages. A page is usually 4096 bytes. Read more about paging.
The page size is returned by getpagesize(2) or by sysconf(3) with _SC_PAGESIZE (which usually gives 4096).
Consider reading some book like Operating Systems: Three Easy Pieces (freely downloadable) to understand how virtual memory works and what is a memory mapped file.
On Linux, the /proc/ filesystem (see proc(5)) is very useful to understand the virtual address space of some process: try cat /proc/$$/maps in your terminal, and read more to understand its output. For a process of pid 1234, try also cat /proc/1234/maps
From inside your process, you could even read sequentially the /proc/self/maps pseudo-file to understand its virtual address space, like here.

reading data from sysfs

I am trying to provide DMA via PCI. For that purpose I have an example of sysfs driver. I succesfully stored data to RAM but unfortunately I cant read them. I have a functions store_dmaread and show_dmaread. I acces them via c code like this. The write function works fine but the show function which I open via read() works (reads the DMA data, prints them) but the user space buffer is not visible in that function.
char buf[2] = {3,3};
fw = open("/sys/bus/pci/devices/0000\:01\:00.0/dmaread", O_RDWR);
read (fw,buf, 2);
write (fw, buf, 2);
close(fw);
the function in the driver looks like this:
static ssize_t show_dmaread(struct device *dev, struct device_attribute *attr, char *buf)
{
printk("User space buffer value %d \n", buf[0]) // PRINTS 0
// MORE CODE WHICH WORKS
}
static ssize_t store_dmaread(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
{
// WORKS FINE THE ATTRIBUTE CHANGES ITS VALUE
}
Thanks a lot for help
From your question, it appears you are expecting that the char * buf passed to your show_dmaread function points directly to the userspace buffer passed to read (or at the very least has been populated with the data in the user-side buffer):
However, looking in Documentation/filesystem/sysfs.txt it says:
sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the
method. Sysfs will call the method exactly once for each read or
write. This forces the following behavior on the method
implementations:
On read(2), the show() method should fill the entire buffer. Recall that an attribute should only be exporting one value, or an
array of similar values, so this shouldn't be that expensive.
This allows userspace to do partial reads and forward seeks
arbitrarily over the entire file at will. If userspace seeks back to
zero or does a pread(2) with an offset of '0' the show() method will
be called again, rearmed, to fill the buffer.
Which leads me to believe you are getting a newly allocated buffer and that some other kernel code manages copying your buffer back over to userspace.

Very few write cycles in stm32f4

I'm using a STM32F401VCT6U "discovery" board, and I need to provide a way for the user to write addresses in memory at runtime.
I wrote what can be simplified to the following function:
uint8_t Write(uint32_t address, uint8_t* values, uint8_t count)
{
uint8_t index;
for (index = 0; index < count; ++index) {
if (IS_FLASH_ADDRESS(address+index)) {
/* flash write */
FLASH_Unlock();
if (FLASH_ProgramByte(address+index, values[index]) != FLASH_COMPLETE) {
return FLASH_ERROR;
}
FLASH_Lock();
} else {
/* ram write */
((uint8_t*)address)[index] = values[index]
}
}
return NO_ERROR;
}
In the above, address is the base address, values is a buffer of size at least count which contains the bytes to write to memory and count the number of bytes to write.
Now, my problem is the following: when the above function is called with a base address in flash and count=100, it works normally the first few times, writing the passed values buffer to flash. After those first few calls however, I cannot write just any value anymore: I can only reset bits in the values in flash, eg an attempt to write 0xFF to 0x7F will leave 0x7F in the flash, while writing 0xFE to 0x7F will leave 0x7E, and 0x00 to any value will be successful (but no other value will be writable to the address afterwards).
I can still write normally to other addresses in the flash by changing the base address, but again only a few times (two or three calls with count=100).
This behaviour suggests that the maximum write count of the flash has been reached, but I cannot imagine it can be so fast. I'd expect at the very least 10,000 writes before exhaustion.
So what am I doing wrong?
You have missunderstood how flash works - it is not for example as straight forward as writing EEPROM. The behaviour you are discribing is normal for flash.
To repeatidly write the same address of flash the whole sector must be first erased using FLASH_EraseSector. Generally any data that needs to preserved during this erase needs to be either buffered in RAM or in another flash sector.
If you are repeatidly writing a small block of data and are worried about flash burnout do to many erase write cycles you would want to write an interface to the flash where each write you move your data along the flash sector to unwriten flash, keeping track of its current offset from the start of sector. Only then when you run out of bytes in the sector would you need to erase and start again at start of sector.
ST's "right way" is detailed in AN3969: EEPROM emulation in STM32F40x/STM32F41x microcontrollers
This is more or less the process:
Reserve two Flash pages
Write the latest data to the next available location along with its 'EEPROM address'
When you run out of room on the first page, write all of the latest values to the second page and erase the first
Begin writing values where you left off on page 2
When you run out of room on page 2, repeat on page 1
This is insane, but I didn't come up with it.
I have a working and tested solution, but it is rather different from #Ricibob's answer, so I decided to make this an answer.
Since my user can write anywhere in select flash sector, my application cannot handle the responsability of erasing the sector when needed while buffering to RAM only the data that need to be preserved.
As a result, I transferred to my user the responsability of erasing the sector when a write to it doesn't work (this way, the user remains free to use another address in the sector to avoid too many write-erase cycles).
Solution
Basically, I expose a write(uint32_t startAddress, uint8_t count, uint8_t* values) function that has a WRITE_SUCCESSFUL return code and a CANNOT_WRITE_FLASH in case of failure.
I also provide my user with a getSector(uint32_t address) function that returns the id, start address and end address of the sector corresponding to the address passed as a parameter. This way, the user knows what range of address is affected by the erase operation.
Lastly, I expose an eraseSector(uint8_t sectorID) function that erase the flash sector whose id has been passed as a parameter.
Erase Policy
The policy for a failed write is different from #Ricibob's suggestion of "erase if the value in flash is different of FF", as it is documented in the Flash programming manual that a write will succeed as long as it is only bitreset (which matches the behavior I observed in the question):
Note: Successive write operations are possible without the need of an erase operation when
changing bits from ‘1’ to ‘0’.
Writing ‘1’ requires a Flash memory erase operation.
If an erase and a program operation are requested simultaneously, the erase operation is
performed first.
So I use the macro CAN_WRITE(a,b), where a is the original value in flash and b the desired value. The macro is defined as:
!(~a & b)
which works because:
the logical not (!) will transform 0 to true and everything else to false, so ~a & b must equal 0 for the macro to be true;
any bit at 1 in a is at 0 in ~a, so it will be 0 whatever its value in b is (you can transform a 1 in 1 or 0);
if a bit is 0 in a, then it is 1 in ~a, if b equals 1 then ~a & b != 0 and we cannot write, if bequals 0 it's OK (you can transform a 0 to 0 only, not to 1).
List of flash sector in STM32F4
Lastly and for future reference (as it is not that easy to find), the list of sectors of flash in STM32 can be found on page 7 of the Flash programming manual.

How would one transfer files larger than 2,147,483,646 bytes (~2 GiB) with Win32 TransmitFile()?

Quoted from MSDN entry for TransmitFile:
The maximum number of bytes that can be transmitted using a single call to the TransmitFile function is 2,147,483,646, the maximum value for a 32-bit integer minus 1. The maximum number of bytes to send in a single call includes any data sent before or after the file data pointed to by the lpTransmitBuffers parameter plus the value specified in the nNumberOfBytesToWrite parameter for the length of file data to send. If an application needs to transmit a file larger than 2,147,483,646 bytes, then multiple calls to the TransmitFile function can be used with each call transferring no more than 2,147,483,646 bytes. Setting the nNumberOfBytesToWrite parameter to zero for a file larger than 2,147,483,646 bytes will also fail since in this case the TransmitFile function will use the size of the file as the value for the number of bytes to transmit.
Alright. Sending a file of size 2*2,147,483,646 bytes (~ 4 GiB) with TransmitFile would then have to be divided into two parts at minimum (e.g. 2 GiB + 2 GiB in two calls to TransmitFile). But how exactly would one go about doing that, while preferably also keeping the underlying TCP connection alive in between?
When the file is indeed <=2,147,483,646 bytes in size, one could just write:
HANDLE fh = CreateFile(filename, GENERIC_READ, FILE_SHARE_READ, NULL,
OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN, NULL);
TransmitFile(SOCK_STREAM_socket, fh, 0, 0, NULL, NULL, TF_DISCONNECT);
to let Windows handle all the lower-level stuff (caching, chunking the data up into pieces for efficient transmission etc. However, unlike the comparable Linux sendfile() syscall, there is no immediately obvious offset argument in the call (although the fifth argument, LPOVERLAPPED lpOverlapped probably is exactly what I'm looking for). I suppose I could hack something together, but I'm also looking for a graceful, good practice Win32 solution from someone who actually knows about this stuff.
You can use the lpOverlapped parameter to specify a 64-bit offset within the file at which to start the file data transfer by setting the Offset and OffsetHigh member of the OVERLAPPED structure. If lpOverlapped is a NULL pointer, the transmission of data always starts at the current byte offset in the file.
So, for lack of a minimal example readily available on the net, which calls are necessary to accomplish such a task?
Managed to figure it out based on the comments.
So, if LPOVERLAPPED lpOverlapped is a null pointer, the call starts transmission at the current file offset of the file (much like the Linux sendfile() syscall and its off_t *offset parameter). This offset (pointer) can be manipulated with SetFilePointerEx, so one could write:
#define TRANSMITFILE_MAX ((2<<30) - 1)
LARGE_INTEGER total_bytes;
memset(&total_bytes, 0, sizeof(total_bytes));
while (total_bytes < filesize) {
DWORD bytes = MIN(filesize-total_bytes, TRANSMITFILE_MAX);
if (!TransmitFile(SOCK_STREAM_socket, fh, bytes, 0, NULL, NULL, 0))
{ /* error handling */ }
total_bytes.HighPart += bytes;
SetFilePointerEx(fh, total_bytes, NULL, FILE_BEGIN);
}
closesocket(SOCK_STREAM_socket);
to accomplish the task.
Not very elegant imo, but it works.