Why are mmap syscall flags not set - system-calls

I am trying to call mmap with a direct syscall.
#include <sys/mman.h>
int main() {
__asm__("mov $0x0, %r9;"
"mov $0xffffffffffffffff, %r8;"
"mov $0x32, %rcx;"
"mov $0x7, %rdx;"
"mov $0x1000, %rsi;"
"mov $0x303000, %rdi;"
"mov $0x9, %rax;"
"syscall;");
return 0;
}
I compiled the program statically:
$ gcc -static -o foo foo.c
But the syscall fails, as shown by strace:
$ strace ./foo
mmap(0x303000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, -1, 0) = -1 EBADF (Bad file descriptor)
We can see that mmap flags are wrongly set. 0x32 should be MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS.
The thing is, if I put another mmap call with mmap from the libc:
int main() {
mmap(0x202000, 4096, 0x7, 0x32, -1, 0);
__asm__("mov $0x0, %r9;"
"mov $0xffffffffffffffff, %r8;"
"mov $0x32, %rcx;"
"mov $0x7, %rdx;"
"mov $0x1000, %rsi;"
"mov $0x303000, %rdi;"
"mov $0x9, %rax;"
"syscall;");
return 0;
}
Then both mmap work:
$ strace ./foo
mmap(0x202000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x202000
mmap(0x303000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x303000
So it seems that using libc, mmap flags are "resolved" or something. But I can't really understand what is happening.
Why does the mmap syscall example only work if I put a libc mmap call before ?

Kernel syscall interface on AMD64 uses r10 register as a fourth argument, not rcx.
mov $0x32, %r10
See linked question for more details.

Related

Can't properly bind socket unless I pass a seemingly incorrect length

The code I've written doesn't work properly unless I pass a seemingly incorrect number to the length argument of bind. It kind of does work; after the accept call I can see it by running netstat, but the port is incorrect. I originally wrote the code in x86 assembly and I then wrote it in C. In both cases, it didn't work.
When calling the function socket, I passed AF_INET (2) as the domain, SOCK_STREAM (1) as the type, and TCP (6) as the protocol.
Here is the code, the one written in C:
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
int main(void) {
unsigned int fd = socket(AF_INET, SOCK_STREAM, 6);
struct sockaddr_in thing = (struct sockaddr_in) {
.sin_family = AF_INET, //2 bytes
.sin_port = 0x2923, //2 bytes
.sin_addr = 0 //4 bytes
};
printf("%zu\n", sizeof thing);
bind(fd, (struct sockaddr*) &thing, 8); //Using 16 instead of 8 fixes the port problem, why?
listen(fd, 1);
accept(fd, 0, 0);
}
To me, 8 looks like the correct length to use. However, 16 seems to be the correct number to use. Why?
Here is the original code written in x86 NASM Assembly:
section .text
global _start
_start:
push byte 6
push byte 1
push byte 2
mov ecx, esp
mov ebx, 1 ;socket
mov eax, 102
int 80h
mov edi, eax ;file descriptor
push dword 0x00000000 ;4 bytes
push word 0x2923 ;2 bytes
push word 2 ;2 bytes
mov ecx, esp
push byte 8 ;Length argument, byte turned into 4 bytes in stack
push ecx
push edi
mov ecx, esp
mov ebx, 2 ;bind
mov eax, 102
int 80h
push byte 1
push edi
mov ecx, esp
mov ebx, 4 ;listen
mov eax, 102
int 80h
push byte 0
push byte 0
push edi
mov ecx, esp
mov ebx, 5 ;accept
mov eax, 102
int 80h
bind(fd, (struct sockaddr*) &thing, 8); //Using 16 instead of 8 fixes the port problem, why?
This code will fail with EINVAL (checked with Linux), but this error is simply ignored by your program. This means no explicit bind will be done.
listen(fd, 1);
accept(fd, 0, 0);
Since the explicit bind failed an implicit bind will be done, effectively meaning that the socket will listen on a random address. That's why it does not seem to work. netstat should show the random port here, not the actually intended one.
Look at the examples in the man page: https://man7.org/linux/man-pages/man2/bind.2.html.
You have to pass the size of the sockaddr_in structure.
bind(fd, (struct sockaddr*) &thing, sizeof(thing));
It's very likely that bind makes an internal check, to ensure that the 3rd parameter is indeed the size of the structure in the 2nd parameter
For AF_INET sockets bind surely knows that the 2nd parameter is a sockaddr_in instance, so it knows what value the 3rd parameter must be.
Really, there's nothing mysterious going on here.

Is there a limit above max_dgram_qlen and wmem_{max,default}?

I am trying to determinate the different limits of the unix datagram sockets, as I am using it as IPC for my project.
The obscure thing I want to control is the size of my socket's internal buffer :
I want to know how many datagrams I can send before my socket would block.
I've understood that 2 differents limits affect the size of the socket's buffer :
/proc/sys/net/core/wmem_{max, default} sets the max (-default) size of a socket's writing buffer
/proc/sys/net/unix/max_dgram_qlen sets the maximum number of datagram the buffer can hold
I know that /proc/sys/net/core/rmem_{max, default} sets the max (-default) size of a socket's reading buffer but as I am working on local unix socket it doesn't seem to have a impact.
I have set wmem_{max, default} to 136314880 (130 MB) and max_dgram_qlen to 500000.
And wrote a small program where the sender socket only sends fixed size datagram to the receiver socket until is would block, I then print the size and number of datagram I was able to send.
Here is the code I used :
#include <err.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>
/* Payload size in bytes. */
#define PAYLOAD_SIZE 100
#define CALL_AND_CHECK(syscall) \
if (syscall < 0) { err(1, NULL); }
int main(void)
{
int receiver_socket_fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0);
if (receiver_socket_fd < 0)
err(1, NULL);
char* binding_path = "test_socket";
struct sockaddr_un addr;
memset(&addr, 0, sizeof(addr));
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, binding_path, sizeof(addr.sun_path));
/* Check if the file exists, if yes delete it ! */
if (access(binding_path, F_OK) != -1) {
CALL_AND_CHECK(unlink(binding_path));
}
CALL_AND_CHECK(bind(receiver_socket_fd, (struct sockaddr const*) &addr, sizeof(addr)));
int sender_socket_fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0);
if (sender_socket_fd < 0)
err(1, NULL);
CALL_AND_CHECK(connect(sender_socket_fd, (struct sockaddr const*) &addr, sizeof(addr)));
struct payload { char data[PAYLOAD_SIZE]; };
/* Create test payload with null bytes. */
struct payload test_payload;
memset(&test_payload.data, 0, PAYLOAD_SIZE);
ssize_t total_size_written = 0;
ssize_t size_written = 0;
do {
size_written = write(sender_socket_fd, (const void *) &test_payload, PAYLOAD_SIZE);
if (size_written > 0)
total_size_written += size_written;
} while (size_written > 0);
printf("socket_test: %zu bytes (%ld datagrams) were written before blocking, last error was :\n", total_size_written, total_size_written / PAYLOAD_SIZE);
perror(NULL);
CALL_AND_CHECK(unlink(binding_path));
CALL_AND_CHECK(close(sender_socket_fd));
CALL_AND_CHECK(close(receiver_socket_fd));
return 0;
}
I was expecting to reach either the max size in bytes of the socket (here 130MB) or the max number of datagram I set (500 000).
But the actual result is that I am only able to write 177494 datagrams before being blocked.
I can change the size of my payload it's always the same result (as long as I don't reach the maximum size in bytes first). So it seems that I am hitting a limit above max_dgram_qlen and wmem_{max, default} that I can't found.
I have of course tried to investigate ulimit or limits.conf without success. ulimit -b doesn't even work on my machine (says "options not found" and returns).
I am working on Debian 10 (buster) but have launched my test program on different OS with the same result : I hit a limit of datagram that I don't know about.
Do you have any idea of which limit I didn't see and I am reaching ? And if I can read or modify this limit ?

Segfault when running hello world shellcode in C program

sorry if this question sounds dumb but I am very new to shellcoding and I was trying to get a hello world example to work on a 32 bit linux machine.
As this is shellcoding, I used a few tricks to remove null bytes and shorten the code. Here it is:
section .data
section .text
global _start
_start:
;Instead of xor eax,eax
;mov al,0x4
push byte 0x4
pop eax
;xor ebx,ebx
push byte 0x1
pop ebx
;xor ecx,ecx
cdq ; instead of xor edx,edx
;mov al, 0x4
;mov bl, 0x1
mov dl, 0x8
push 0x65726568
push 0x74206948
;mov ecx, esp
push esp
pop ecx
int 0x80
mov al, 0x1
xor ebx,ebx
int 0x80
This code works fine when I compile and link it with the following commands:
$ nasm -f elf print4.asm
$ ld -o print4 -m elf_i386 print4.o
However, I tried running it within the following C code:
$ cat shellcodetest.c
#include
#include
char *shellcode = "\x04\x6a\x58\x66\x01\x6a\x5b\x66\x99\x66\x08\xb2\x68\x68\x68\x65\x69\x48\x54\x66\x59\x66\x80\xcd\x01\xb0\x31\x66\xcd\xdb\x80";
int main(void) {
( *( void(*)() ) shellcode)();
}
$ gcc shellcodetest.c –m32 –z execstack -o shellcodetest
$ ./shellcodetest
Segmentation fault (core dumped)
Could someone please explain what is happening there? I tried running the code in gdb and noticed something weird happening with esp. But as I said before, I still lack experience to really understand what is going on here.
Thanks in advance!
Your shellcode does not work, because it is not entered in the correct endianness. You did not state how you extracted the bytes from the file print4, but both objdump and xxd gives the bytes in correct order.
$ xxd print4 | grep -A1 here
0000060: 6a04 586a 015b 99b2 0868 6865 7265 6848 j.Xj.[...hherehH
0000070: 6920 7454 59cd 80b0 0131 dbcd 8000 2e73 i tTY....1.....s
$ objdump -d print4
print4: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
8048060: 6a 04 push $0x4
8048062: 58 pop %eax
8048063: 6a 01 push $0x1
...
The changes you need to do is to swap the byte order, '\x04\x6a' -> '\x6a\x04'.
When I run your code with this change, it works!
$ cat shellcodetest.c
char *shellcode = "\x6a\x04\x58\x6a\x01\x5b\x99\xb2\x08\x68\x68\x65\x72\x65\x68\x48\x69\x20\x74\x54\x59\xcd\x80\xb0\x01\x31\xdb\xcd\x80";
int main(void) {
( *( void(*)() ) shellcode)();
}
$ gcc shellcodetest.c -m32 -z execstack -o shellcodetest
$ ./shellcodetest
Hi there$

Analyzing binary taken from memory dump in IDA Pro

I'm having problems with analyzing a simple binary in IDA Pro.
When running a program, i dumped part of its memory (for example, unpacked code section in the memory) into a file, using WinDbg.
I would like to analyze it using IDA, but when trying to just load the binary - it will only show its raw data.
Of course the binary is not a full PE file, so I'm not expecting a deep analysis, just a nicer way to read the disassembly.
So the question is - How can i make IDA disassemble the binary?
Thanks! :)
select an appropriate address and press c
that is MakeCode(Ea); ida will convert the raw bytes to code and disassemble it
pasted below is a simple automation with an idc script but idas automation is imho subpar so you should stick with manual pressing of C in user interface
:dir /b
foo.dmp
foo.idc
:xxd foo.dmp
0000000: 6a10 6830 b780 7ce8 d86d ffff 8365 fc00 j.h0..|..m...e..
0000010: 64a1 1800 0000 8945 e081 7810 001e 0000 d......E..x.....
0000020: 750f 803d 0850 887c 0075 06ff 15f8 1280 u..=.P.|.u......
0000030: 7cff 750c ff55 0850 e8c9 0900 00 |.u..U.P.....
:type foo.idc
#include <idc.idc>
static main (void) {
auto len,temp,fhand;
len = -1; temp = 0;
while (temp < 0x3d && len != 0 ) {
len = MakeCode(temp);
temp = temp+len;
}
fhand = fopen("foo.asm","wb");
GenerateFile(OFILE_LST,fhand,0,0x3d,0x1F);
fclose(fhand);
Wait();
Exit(0);
}
:f:\IDA_FRE_5\idag.exe -c -B -S.\foo.idc foo.dmp
:head -n 30 foo.asm | tail
seg000:00000000 ; Segment type: Pure code
seg000:00000000 seg000 segment byte public 'CODE' use32
seg000:00000000 assume cs:seg000
seg000:00000000 assume es:nothing, ss:nothing, ds:nothing, fs:no thing, gs:nothing
seg000:00000000 push 10h
seg000:00000002 push 7C80B730h
seg000:00000007 call near ptr 0FFFF6DE4h
seg000:0000000C and dword ptr [ebp-4], 0
with windbg you can get the disassembly right from command line like this
:cdb -c ".dvalloc /b 60000000 2000;.readmem foo.dmp 60001000 l?0n61;u 60001000 60001040;q" calc
0:000> cdb: Reading initial command '.dvalloc /b 60000000 2000;.readmem foo.dmp 60001000 l?0n61;u 60001000 60001040;q'
Allocated 2000 bytes starting at 60000000
Reading 3d bytes.
60001000 6a10 push 10h
60001002 6830b7807c push offset kernel32!`string'+0x88 (7c80b730)
60001007 e8d86dffff call 5fff7de4
6000100c 8365fc00 and dword ptr [ebp-4],0
60001010 64a118000000 mov eax,dword ptr fs:[00000018h]
60001016 8945e0 mov dword ptr [ebp-20h],eax
60001019 817810001e0000 cmp dword ptr [eax+10h],1E00h
60001020 750f jne 60001031
60001022 803d0850887c00 cmp byte ptr [kernel32!BaseRunningInServerProcess (7c885008)],0
60001029 7506 jne 60001031
6000102b ff15f812807c call dword ptr [kernel32!_imp__CsrNewThread (7c8012f8)]
60001031 ff750c push dword ptr [ebp+0Ch]
60001034 ff5508 call dword ptr [ebp+8]
60001037 50 push eax
60001038 e8c9090000 call 60001a06
6000103d 0000 add byte ptr [eax],al
6000103f 0000 add byte ptr [eax],al
quit:
ollydbg 1.10 view-> file-> (mask any file) -> foo.dmp -> rightclick -> disassemble

Analog read from all 7 inputs using PRU and a host C program for beaglebone black

I'm kinda new to the beaglebone black world running on a AM335X Cortex A8 processor and I would like to use the PRU for fast analog read with the maximum sampling rate possible.
I would like to read all 7 inputs in a loop form like:
while( n*7 < sampling_rate){ //initial value for n = 0
read(AIN0); //and store it in shared memory(7*n + 0)
read(AIN1); //and store it in shared memory(7*n + 1)
read(AIN2); //and store it in shared memory(7*n + 2)
read(AIN3); //and store it in shared memory(7*n + 3)
read(AIN4); //and store it in shared memory(7*n + 4)
read(AIN5); //and store it in shared memory(7*n + 5)
read(AIN6); //and store it in shared memory(7*n + 6)
n++;
}
so that I can read them from a host program running on the main processor. Any idea how to do so? I tried using a ready code called ADCCollector.c from a package named AM335x_pru_package but I can't figure out how to get all the addresses and values of the registers used.
This is the code I was trying to modify (ADCCollector.p):
.origin 0 // offset of the start of the code in PRU memory
.entrypoint START // program entry point, used by debugger only
#include "ADCCollector.hp"
#define BUFF_SIZE 0x00000fa0 //Total buff size: 4kbyte(Each buffer has 2kbyte: 500 piece of data
#define HALF_SIZE BUFF_SIZE / 2
#define SAMPLING_RATE 1 //Sampling rate(16khz) //***//16000
#define DELAY_MICRO_SECONDS (1000000 / SAMPLING_RATE) //Delay by sampling rate
#define CLOCK 200000000 // PRU is always clocked at 200MHz
#define CLOCKS_PER_LOOP 2 // loop contains two instructions, one clock each
#define DELAYCOUNT DELAY_MICRO_SECONDS * CLOCK / CLOCKS_PER_LOOP / 1000 / 1000 * 3 //if sampling rate = 98000 --> = 3061.224
.macro DELAY
MOV r10, DELAYCOUNT
DELAY:
SUB r10, r10, 1
QBNE DELAY, r10, 0
.endm
.macro READADC
//Initialize buffer status (0: empty, 1: first buffer is ready, 2: second buffer is ready)
MOV r2, 0
SBCO r2, CONST_PRUSHAREDRAM, 0, 4
INITV:
MOV r5, 0 //Shared RAM address of ADC Saving position
MOV r6, BUFF_SIZE //Counting variable
READ:
//Read ADC from FIFO0DATA
MOV r2, 0x44E0D100
LBBO r3, r2, 0, 4
//Add address counting
ADD r5, r5, 4
//Write ADC to PRU Shared RAM
SBCO r3, CONST_PRUSHAREDRAM, r5, 4
DELAY
SUB r6, r6, 4
MOV r2, HALF_SIZE
QBEQ CHBUFFSTATUS1, r6, r2 //If first buffer is ready
QBEQ CHBUFFSTATUS2, r6, 0 //If second buffer is ready
QBA READ
//Change buffer status to 1
CHBUFFSTATUS1:
MOV r2, 1
SBCO r2, CONST_PRUSHAREDRAM, 0, 4
QBA READ
//Change buffer status to 2
CHBUFFSTATUS2:
MOV r2, 2
SBCO r2, CONST_PRUSHAREDRAM, 0, 4
QBA INITV
//Send event to host program
MOV r31.b0, PRU0_ARM_INTERRUPT+16
HALT
.endm
// Starting point
START:
// Enable OCP master port
LBCO r0, CONST_PRUCFG, 4, 4 //#define CONST_PRUCFG C4 taken from ADCCollector.hp
CLR r0, r0, 4
SBCO r0, CONST_PRUCFG, 4, 4
//C28 will point to 0x00012000 (PRU shared RAM)
MOV r0, 0x00000120
MOV r1, CTPPR_0
ST32 r0, r1
//Init ADC CTRL register
MOV r2, 0x44E0D040
MOV r3, 0x00000005
SBBO r3, r2, 0, 4
//Enable ADC STEPCONFIG 1
MOV r2, 0x44E0D054
MOV r3, 0x00000002
SBBO r3, r2, 0, 4
//Init ADC STEPCONFIG 1
MOV r2, 0x44E0D064
MOV r3, 0x00000001 //continuous mode
SBBO r3, r2, 0, 4
//Read ADC and FIFOCOUNT
READADC
Another question is: if I simply changed the #define Sampling_rate from 16000 to any other number below or equal to 200000 in the (.p) file, I will get that sampling rate? or should I change other things?
Thanks in advance.
I used the c wrappers from libpruio: http://www.freebasic.net/forum/viewtopic.php?f=14&t=22501
and then use this code to get all my ADC values:
#include "stdio.h"
#include "c_wrapper/pruio.h" // include header
#include "sys/time.h"
//! The main function.
int main(int argc, char **argv) {
struct timeval start, now;
long mtime, seconds, useconds;
gettimeofday(&start, NULL);
int i,x;
pruIo *io = pruio_new(PRUIO_DEF_ACTIVE, 0x98, 0, 1); //! create new driver structure
if (pruio_config(io, 1, 0x1FE, 0, 4)){ // upload (default) settings, start IO mode
printf("config failed (%s)\n", io->Errr);}
else {
do {
gettimeofday(&now, NULL);
seconds = now.tv_sec - start.tv_sec;
useconds = now.tv_usec - start.tv_usec;
mtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;
printf("%lu",mtime);
for(i = 1; i < 9; i++) {
printf(",%d", io->Adc->Value[i]); //0-66504 for 0-1.8v
}
printf("\n");
x++;
}while (mtime < 100);
printf("count: %d \n", x);
pruio_destroy(io); /* destroy driver structure */
}
return 0;
}
In your example you use libpruio in IO mode (synchronous) and therefore you have no control over the sampling rate, since the host CPU doesn't work in real-time.
To get the maximum sampling rate (as mentioned in the OP) you have to use either RB or MM mode. In those modes libpruio buffers the samples in memory and the host can access them asynchronously. See example rb_file.c (or triggers.bas) in the libpruio package.