I have a school project: recoding a strace-like command on x86_64 openSUSE (Intel i7).
For this we are, of course, using the ptrace system call, but we are forbidden to use PTRACE_SYSCALL. We have to use PTRACE_SINGLESTEP and detect system calls by reading the code with PTRACE_PEEKTEXT and checking for the opcodes of the system call instructions (0x80CD for int 0x80, 0x050F for syscall and 0x340F for sysenter).
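A minimal sketch of that opcode check, written in Python only to keep it short; a real tracer would do the same comparison in C on the word returned by PTRACE_PEEKTEXT at the tracee's RIP. The helper name classify_syscall_opcode and the returned labels are made up for illustration. x86 is little-endian, so the bytes CD 80 read back as the 16-bit value 0x80CD.

```python
# Hypothetical helper: classify the instruction at RIP from a peeked word.
SYSCALL_OPCODES = {
    0x80CD: "int 0x80",  # bytes CD 80
    0x050F: "syscall",   # bytes 0F 05
    0x340F: "sysenter",  # bytes 0F 34
}


def classify_syscall_opcode(peeked_word):
    """Return the instruction name if the low 16 bits match, else None."""
    return SYSCALL_OPCODES.get(peeked_word & 0xFFFF)


# Example: 8 bytes of code starting with 0F 05, as PEEKTEXT would return them.
word = int.from_bytes(bytes([0x0F, 0x05, 0x48, 0x89, 0xC7, 0x90, 0x90, 0x90]), "little")
print(classify_syscall_opcode(word))
```

Only the low two bytes of the peeked word are compared, because the word holds whatever instructions follow as well.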
Up to that point, I'm fine. But then we have to fetch the parameters of the system call. For syscall and int 0x80 it's fairly easy: I look in rax to find out which system call it is, then I look in rdi, rsi, rdx, etc.
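A rough sketch of the register lookup, with one caveat: on Linux, int 0x80 follows the 32-bit (i386) convention even when it is executed by a 64-bit process, so its arguments come from ebx, ecx, edx, esi, edi, ebp rather than rdi, rsi, rdx. The table and the function syscall_args are hypothetical names, the register dict stands in for struct user_regs_struct, and sysenter is deliberately left out.

```python
# Hypothetical lookup tables; register names follow struct user_regs_struct.
ARG_REGS = {
    "syscall": ("rdi", "rsi", "rdx", "r10", "r8", "r9"),     # x86-64 convention
    "int 0x80": ("rbx", "rcx", "rdx", "rsi", "rdi", "rbp"),  # i386 convention
}


def syscall_args(insn, regs, nargs):
    """Return (syscall number, first nargs arguments) from a register dict."""
    # The i386 convention only looks at the low 32 bits of each register.
    mask = 0xFFFFFFFF if insn == "int 0x80" else 0xFFFFFFFFFFFFFFFF
    number = regs["rax"] & mask
    args = [regs[name] & mask for name in ARG_REGS[insn][:nargs]]
    return number, args


# Registers set up the x86-64 way (rdi, rsi, rdx), as in a 64-bit test program.
regs = {"rax": 4, "rdi": 1, "rsi": 0x601000, "rdx": 30,
        "rbx": 0, "rcx": 0, "r10": 0, "r8": 0, "r9": 0, "rbp": 0}
print(syscall_args("int 0x80", regs, 3))
```

With these example values the int 0x80 view of write() sees fd 0 and a null buffer instead of the intended 1 and 0x601000, which is the kind of mismatch that can produce garbage output.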
But for sysenter I cannot find out how it really works. So I tried to write a little assembly program to test those 3 instructions.
global main
section .text
main:
push rbp
mov rbp, rsp
mov rdi, 1 ; fd 1 = stdout
mov rsi, FormatStr ; buffer
mov rdx, 30 ; length
mov rax, 1 ; write on x86-64
syscall
leave
ret
section .rodata
FormatStr db 'Hello World ! Sysenter Test !',0Ah,0
This works perfectly fine!
Now for the int 0x80 version I just change the system call number in rax from 1 to 4. (In 32-bit mode the system call numbers aren't the same; I don't know why.)
global main
section .text
main:
push rbp
mov rbp, rsp
mov rdi, 1
mov rsi, FormatStr
mov rdx, 30
mov rax, 4
int 0x80
leave
ret
section .rodata
FormatStr db 'Hello World ! Sysenter Test !',0Ah,0
This half works: a string is displayed, but it's garbage.
Now if I use sysenter instead, I get a SIGILL signal. I tried with both 1 and 4 in rax.
My project only has to run on my computer, but I have to be able to detect and analyse binaries that use sysenter.
Can someone give a little explanation of these things?
Thank you !
PS: sorry for my bad English


Using scanf in x86-64 NASM assembly gives SIGSEGV

When compiling below code:
global main
extern printf, scanf
section .data
msg: db "Enter a number: ",10,0
format:db "%d",0
section .bss
number resb 4
section .text
main:
mov rdi, msg
mov al, 0
call printf
mov rsi, number
mov rdi, format
mov al, 0
call scanf
mov rdi,format
mov rsi,[number]
inc rsi
mov rax,0
call printf
ret
nasm -f elf64 example.asm -o example.o
gcc -no-pie -m64 example.o -o example
and then running ./example,
it runs and prints: Enter a number:
but then crashes and prints:
Segmentation fault (core dumped)
So printf works fine but scanf does not.
What am I doing wrong with scanf?
Use sub rsp, 8 / add rsp, 8 at the start/end of your function to re-align the stack to 16 bytes before your function does a call.
Or better push/pop a dummy register, e.g. push rdx / pop rcx, or a call-preserved register like RBP you actually wanted to save anyway. You need the total change to RSP to be an odd multiple of 8 counting all pushes and sub rsp, from function entry to any call.
i.e. 8 + 16*n bytes for whole number n.
On function entry, RSP is 8 bytes away from 16-byte alignment because the call pushed an 8-byte return address. See Printing floating point numbers from x86-64 seems to require %rbp to be saved,
main and stack alignment, and Calling printf in x86_64 using GNU assembler. This is an ABI requirement which you used to be able to get away with violating when there weren't any FP args for printf. But not any more.
See also Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?
To put it another way, RSP % 16 == 8 on function entry, and you need to ensure RSP % 16 == 0 before you call a function. How you do this doesn't matter. (Not all functions will actually crash if you don't, but the ABI does require/guarantee it.)
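The arithmetic can be checked with a tiny sketch (Python, purely illustrative; the function names are made up). It assumes only what is stated above: RSP % 16 == 8 on function entry, and every push or sub rsp moves RSP down by the given number of bytes.

```python
def rsp_mod16_before_call(bytes_adjusted):
    """RSP % 16 at a call site, given bytes pushed/subtracted since function entry."""
    entry_mod = 8  # the caller's call pushed an 8-byte return address
    return (entry_mod - bytes_adjusted) % 16


def is_aligned_for_call(bytes_adjusted):
    """True when the ABI's 16-byte alignment holds right before a call."""
    return rsp_mod16_before_call(bytes_adjusted) == 0


for adjust in (0, 8, 16, 24):
    print(adjust, is_aligned_for_call(adjust))
```

An adjustment of 0 (the code in the question) or 16 leaves the stack misaligned; 8 or 24, i.e. 8 + 16*n, aligns it.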
gcc's code-gen for glibc scanf now depends on 16-byte stack alignment even when AL == 0.
It seems to have auto-vectorized copying 16 bytes somewhere in __GI__IO_vfscanf, which regular scanf calls after spilling its register args to the stack (see footnote 1). (The many similar ways to call scanf share one big implementation as a back end to the various libc entry points like scanf, fscanf, etc.)
I downloaded Ubuntu 18.04's libc6 binary package and extracted the files (with 7z x blah.deb and tar xf data.tar, because 7z knows how to extract a lot of file formats).
I can repro your bug with LD_LIBRARY_PATH=/tmp/bionic-libc/lib/x86_64-linux-gnu ./bad-printf, and also it turns out with the system glibc 2.27-3 on my Arch Linux desktop.
With GDB, I ran it on your program and did set env LD_LIBRARY_PATH /tmp/bionic-libc/lib/x86_64-linux-gnu then run. With layout reg, the disassembly window looks like this at the point where it received SIGSEGV:
│0x7ffff786b49a <_IO_vfscanf+602> cmp r12b,0x25 │
│0x7ffff786b49e <_IO_vfscanf+606> jne 0x7ffff786b3ff <_IO_vfscanf+447> │
│0x7ffff786b4a4 <_IO_vfscanf+612> mov rax,QWORD PTR [rbp-0x460] │
│0x7ffff786b4ab <_IO_vfscanf+619> add rax,QWORD PTR [rbp-0x458] │
│0x7ffff786b4b2 <_IO_vfscanf+626> movq xmm0,QWORD PTR [rbp-0x460] │
│0x7ffff786b4ba <_IO_vfscanf+634> mov DWORD PTR [rbp-0x678],0x0 │
│0x7ffff786b4c4 <_IO_vfscanf+644> mov QWORD PTR [rbp-0x608],rax │
│0x7ffff786b4cb <_IO_vfscanf+651> movzx eax,BYTE PTR [rbx+0x1] │
│0x7ffff786b4cf <_IO_vfscanf+655> movhps xmm0,QWORD PTR [rbp-0x608] │
>│0x7ffff786b4d6 <_IO_vfscanf+662> movaps XMMWORD PTR [rbp-0x470],xmm0 │
So it copied two 8-byte objects to the stack with movq + movhps to load and movaps to store. But with the stack misaligned, movaps [rbp-0x470],xmm0 faults.
I didn't grab a debug build to find out exactly which part of the C source turned into this, but the function is written in C and compiled by GCC with optimization enabled. GCC has always been allowed to do this, but only recently did it get smart enough to take better advantage of SSE2 this way.
Footnote 1: printf / scanf with AL != 0 has always required 16-byte alignment because gcc's code-gen for variadic functions uses test al,al / je to spill the full 16-byte XMM regs xmm0..7 with aligned stores in that case. __m128i can be an argument to a variadic function, not just double, and gcc doesn't check whether the function ever actually reads any 16-byte FP args.

How to load second stage boot loader from first stage?

I have written a simple first-stage bootloader which displays "Hello world" using a BIOS interrupt. The obvious next step is to write a second stage, but where should the code for it live, and how do I load it from the first stage?
Here is a program for first stage
[BITS 16] ;Tell the assembler that this is 16-bit code
[ORG 0x7C00] ;Origin: tell the assembler where the code will
;be in memory after it has been loaded
MOV SI, HelloString ;Store string pointer to SI
CALL PrintString ;Call print string procedure
JMP $ ;Infinite loop, hang it here.
PrintCharacter: ;Procedure to print character on screen
;Assume that ASCII value is in register AL
MOV AH, 0x0E ;Tell BIOS that we need to print one character on screen.
MOV BH, 0x00 ;Page no.
MOV BL, 0x07 ;Text attribute 0x07 is lightgrey font on black background
INT 0x10 ;Call video interrupt
RET ;Return to calling procedure
PrintString: ;Procedure to print string on screen
;Assume that string starting pointer is in register SI
next_character: ;Label to fetch next character from string
MOV AL, [SI] ;Get a byte from string and store in AL register
INC SI ;Increment SI pointer
OR AL, AL ;Check if value in AL is zero (end of string)
JZ exit_function ;If end then return
CALL PrintCharacter ;Else print the character which is in AL register
JMP next_character ;Fetch next character from string
exit_function: ;End label
RET ;Return from procedure
HelloString db 'Hello World', 0 ;HelloWorld string ending with 0
TIMES 510 - ($ - $$) db 0 ;Fill the rest of sector with 0
DW 0xAA55 ;Add boot signature at the end of bootloader
On x86 you would do the following (simplified):
Have the bootloader load the n-th sector of the disk/floppy (wherever you're booting from) into memory and execute it (i.e. load segment/offset and do retf). A better alternative is to search the filesystem for a certain filename (e.g. KERNEL.BIN) -- but you'd need to know the file system type (e.g. FAT12 if you're testing from a floppy image).
The kernel would then start in real mode. It sets up the GDT with code and data descriptors, enables the A20 line (so that addresses above 1 MiB are reachable) and finally enters protected mode. Then you need a far jump to a 32-bit code segment (the kernel file must be linked so that the 32-bit code is at an absolute position, e.g. at offset 512, right after the 16-bit real mode stuff).
The 32-bit kernel assembly, then, just defines EXTERN _mykernel (for example) and calls that symbol.
Then you can begin writing your kernel as C function mykernel.
Okay that was a short overview of what I did a few years ago (with lots of copy&paste from the Internet ;). If that isn't helpful, here are some good web resources on OS development: (wiki with many hobbyist OS developers, German only...)
Hope that helps ^^
Look at the GRUB implementation here (stage 1):
First notice the starting point at 0x7c00 and the end signature of 0xaa55 for this first sector. In the listing, you can see this:
copy_buffer:
movw ABS(stage2_segment), %es
/*
 * We need to save %cx and %si because the startup code in
 * stage2 uses them without initializing them.
 */
pusha
pushw %ds
movw $0x100, %cx
movw %bx, %ds
xorw %si, %si
xorw %di, %di
cld
rep
movsw
popw %ds
popa
/* boot stage2 */
jmp *(stage2_address)
/* END OF MAIN LOOP */
Essentially the logic is to copy the stage 2 code into another part of memory (0x100 words, i.e. 512 bytes, from %ds:0 to %es:0) and then jump directly there; that jump is "boot stage2". In other words, stage 1 is started by the BIOS after it has loaded the first sector into memory, whereas stage 2 is wherever you choose to jump to; it can be anywhere in memory.
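To make the copy concrete, here is a small Python simulation of what rep movsw does in real mode with the direction flag clear. It is only an illustration: the helper names and the segment values 0x7000 and 0x0800 in the example are assumptions, not taken from the GRUB source.

```python
def linear(segment, offset):
    """Real-mode physical address: segment * 16 + offset (20-bit wrap ignored)."""
    return (segment << 4) + offset


def rep_movsw(mem, ds, si, es, di, cx):
    """Copy cx 16-bit words from ds:si to es:di; return the final (si, di)."""
    for _ in range(cx):
        src, dst = linear(ds, si), linear(es, di)
        mem[dst:dst + 2] = mem[src:src + 2]
        si = (si + 2) & 0xFFFF
        di = (di + 2) & 0xFFFF
    return si, di


mem = bytearray(1 << 20)                      # 1 MiB of simulated memory
sector = bytes(i & 0xFF for i in range(512))  # stand-in for the loaded sector
mem[linear(0x7000, 0):linear(0x7000, 0) + 512] = sector

# cx = 0x100 words = 512 bytes, si = di = 0, as in copy_buffer.
final_si, final_di = rep_movsw(mem, 0x7000, 0, 0x0800, 0, 0x100)
print(hex(final_si), mem[0x8000:0x8200] == sector)
```

After the loop both index registers have advanced by 0x200, and the 512 bytes sit at the destination segment, ready for the jump.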
Minimal runnable NASM BIOS example that loads stage 2 and jumps to it
org 0x7C00
; You should do further initializations here
; like setup the stack and segment registers.
; Load stage 2 to memory.
mov ah, 0x02
; Number of sectors to read.
mov al, 1
; This may not be necessary as many BIOS set it up as an initial state.
mov dl, 0x80
; Cylinder number.
mov ch, 0
; Head number.
mov dh, 0
; Starting sector number. 2 because 1 was already loaded.
mov cl, 2
; Where to load to.
mov bx, stage2
int 0x13
jmp stage2
; Magic bytes.
times ((0x200 - 2) - ($ - $$)) db 0x00
dw 0xAA55
stage2:
; Print 'a'.
mov ax, 0x0E61
int 0x10
; Halt.
cli
hlt
; Pad image to multiple of 512 bytes.
times ((0x400) - ($ - $$)) db 0x00
Compile and run:
nasm -f bin -o main.img main.asm
qemu-system-i386 main.img
Expected outcome: a gets printed to the screen, and then the program halts.
Tested on Ubuntu 14.04.
Saner GAS example using a linker script and more correct initialization (segment registers, stack) on my GitHub.
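As a quick sanity check on an image like the one built by the nasm command above, the layout can be verified with a few lines of Python. This is only a sketch: check_boot_image is a made-up helper, and it checks just the two properties the example relies on (a 0x55 0xAA signature at offset 510 and a size that is a multiple of 512 bytes).

```python
SECTOR_SIZE = 512


def check_boot_image(image):
    """Return a list of problems found in a raw BIOS boot image (empty if none)."""
    problems = []
    if len(image) == 0 or len(image) % SECTOR_SIZE != 0:
        problems.append("size is not a positive multiple of 512 bytes")
    # dw 0xAA55 is stored little-endian, so the bytes on disk are 55 AA.
    if len(image) < SECTOR_SIZE or image[510:512] != b"\x55\xAA":
        problems.append("missing 0x55 0xAA signature at offset 510")
    return problems


# Synthetic two-sector image: stage 1 with signature, then an empty stage 2.
image = bytes(510) + b"\x55\xAA" + bytes(512)
print(check_boot_image(image))
```

For a real file, read it with open("main.img", "rb").read() and pass the bytes to the same function.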