RISC-V ecall syscall calling convention on pk/Linux

What is the calling convention for a syscall in a program that runs under the RISC-V pseudo-kernel (pk) or Linux?
Looking at the code generated by the riscv-gnu-toolchain the rules seem to be:
syscall number is passed in a7
syscall arguments are passed in a0 to a5
unused arguments are set to 0
return value is returned in a0
Is this it?
Is it really necessary to zero-out the unused arguments?
What about register a6? Can this be used for yet another syscall argument?
Example that calls the exit() syscall:
li a0, 1 # argument that is used by the syscall
li a1, 0 # unused arguments
li a2, 0
li a3, 0
li a4, 0
li a5, 0
li a7, 93 # exit syscall number
ecall # trap into the kernel (or pk) to perform the syscall

Yes, this is basically it.
No, it isn't necessary to zero out the unused arguments. The zeroing of unused arguments when using the riscv-gnu-toolchain (with the newlib C library) is just an artifact of the newlib syscall calling code. To keep it simple, the code has a single scall (old name for ecall) wrapper with 6 syscall arguments. Thus, the exit() implementation just calls that wrapper with some additional zeroes.
As of 2020, the maximum number of syscall arguments in Linux is 6. The same goes for the pseudo-kernel. Thus, a6 is always unused.
Both Linux and pk supply the syscall number in a7. And the syscall numbers used by pk follow the Linux standard.
The syscall(2) Linux man-page also summarizes the calling conventions on different architectures, including RISC-V. It specifies a1 as possibly used to return a second return value, but this doesn't match the code in glibc and newlib.
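As a rough illustration of the convention summarized above, the following C sketch issues write() through a raw ecall when compiled for RISC-V Linux. The helper name raw_write is hypothetical. The non-RISC-V branch, which falls back to the libc syscall() wrapper, is an assumption added only so the file also builds on other Linux targets. Note that the raw ecall path returns a negative errno value in a0 on failure, whereas syscall() returns -1 and sets errno.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: write(fd, buf, len) without going through libc's write(). */
static long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__riscv)
    /* Arguments go in a0..a5, the syscall number in a7. */
    register long a0 __asm__("a0") = fd;
    register long a1 __asm__("a1") = (long)buf;
    register long a2 __asm__("a2") = (long)len;
    register long a7 __asm__("a7") = SYS_write;
    /* Unused argument registers (a3..a5) are deliberately left untouched. */
    __asm__ volatile("ecall"
                     : "+r"(a0)                 /* result comes back in a0 */
                     : "r"(a1), "r"(a2), "r"(a7)
                     : "memory");
    return a0;
#else
    /* Assumption: on other architectures, let libc pick the registers. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

Calling raw_write(1, "ok\n", 3) should print the string and return the number of bytes written.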

How to get ioctl command value of the given driver?

How to get ioctl command value (integer value) of the given driver, which is not part of kernel source tree.
Example
#define ioctl_cmd _IOW('a', 1, struct example*)
I need an integer value of the command ioctl_cmd without actually modifying the driver.
The _IOW(type,nr,size) macro is defined for userspace code by #include <linux/ioctl.h>. The actual source of the macro is in "/usr/include/asm-generic/ioctl.h".
One way to get the integer value of the ioctl command is to print it to the terminal in a C program:
#include <stdio.h>
#include <linux/ioctl.h>
#include "your_driver_ioctls.h" // defines `ioctl_cmd`
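int main(void)
{
    /* _IOW() involves sizeof, so its value may be wider than unsigned int; cast for %u and %x */
    printf("ioctl_cmd = %u (0x%x)\n", (unsigned int)ioctl_cmd, (unsigned int)ioctl_cmd);
    return 0;
}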
Alternatively, you can look at the definition of _IOW in the source to see how the ioctl command code is composed:
Bits 31 to 30 indicate the direction of transfer of the memory pointed to by the optional third argument of the ioctl() call:
_IOC_NONE = 0 (no direction)
_IOC_WRITE = 1 (userland is writing to kernel)
_IOC_READ = 2 (userland is reading from kernel)
_IOC_WRITE | _IOC_READ = 3 (userland is writing to and reading from kernel)
The _IOW(type,nr,size) macro sets the direction to _IOC_WRITE.
Bits 29 to 16 indicate the 14-bit size of the memory pointed to by the optional third argument of the ioctl() call. The _IOW(type,nr,size) macro sets this to the size of the type specified in the third parameter of the macro call (sizeof(size)).
Bits 15 to 8 indicate the 8-bit "type number" of the ioctl command code. Historically, a single ASCII character value was used for the type number, but any unsigned number up to 255 can actually be used. All the ioctl command codes defined for a device generally use the same type number. The _IOW(type,nr,size) macro sets this to the first parameter of the macro call (type).
Bits 7 to 0 indicate the 8-bit "function number" of the ioctl command code. The _IOW(type,nr,size) macro sets this to the second parameter of the macro call (nr).
Note that the above way of defining ioctl command codes is mostly just a convention. In particular, older subsystems such as TTY use a simpler scheme consisting of just a "type number" and a "function number".
Your #define ioctl_cmd _IOW('a', 1, struct example*) is unusual because it says that the optional third argument of the ioctl() call points to a struct example* and the size of that would be 4 or 8 (depending on the size of pointers in userspace). More conventionally, it would be defined as _IOW('a', 1, struct example).
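The bit layout described above can be checked with a small sketch that builds the command code by hand and compares it with the macro. The function names compose_ioctl and macro_ioctl are hypothetical, struct example is only a forward declaration standing in for the driver's type, and the shifts assume the asm-generic layout (some architectures, such as PowerPC and MIPS, use a different split between the direction and size bits).

```c
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical stand-in for the driver's type; only the pointer size matters here. */
struct example;

/* Compose a command code from its four fields, assuming the asm-generic
 * layout: direction (2 bits) | size (14 bits) | type (8 bits) | nr (8 bits). */
static uint32_t compose_ioctl(uint32_t dir, uint32_t type, uint32_t nr, uint32_t size)
{
    return ((dir  & 0x3u)    << 30) |
           ((size & 0x3fffu) << 16) |
           ((type & 0xffu)   <<  8) |
            (nr   & 0xffu);
}

/* The value produced by the macro from the question, for comparison. */
static uint32_t macro_ioctl(void)
{
    return (uint32_t)_IOW('a', 1, struct example *);
}
```

On a target with the asm-generic layout, compose_ioctl(1, 'a', 1, sizeof(struct example *)) should equal macro_ioctl(); with 8-byte pointers that value is 0x40086101.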

windbg help missing kernel32 function

I am trying to follow this tutorial (https://www.microsoftpressstore.com/articles/article.aspx?p=2201303), specifically the part where it mentions x kernel32!writeprocessmemory
I am unable to find kernel32!WriteProcessMemory, even though the documentation mentions it, but I can find
kernel32!_imp__WriteProcessMemory and kernel32!WriteProcessMemoryStub. I am new to WinDbg and just following the tutorial, so I am not sure whether this function has been deprecated and, if so, what its substitute is and how to achieve similar functionality.
Thanks
The exported WriteProcessMemory function actually points to the stub kernel32!WriteProcessMemoryStub, which jumps through the import pointer kernel32!_imp_WriteProcessMemory into kernelbase.dll, the "real" location of this function.
Let's check with a link dump:
C:>link /dump /exports c:\windows\system32\kernel32.dll | findstr /I WriteProcess
1579 62A 00036C50 WriteProcessMemory
0x36C50 is the RVA where the function "WriteProcessMemory" resides in kernel32 (as given by the export table). Now in windbg:
0:007> ln kernel32 + 0x36c50
Browse module
Set bu breakpoint
(00007ff9`4a6e6c50) KERNEL32!WriteProcessMemoryStub | (00007ff9`4a6e6c60) KERNEL32!ZombifyActCtxStub
We have an exact match which is in fact the KERNEL32!WriteProcessMemoryStub function. If we look at it:
0:007> u KERNEL32!WriteProcessMemoryStub
KERNEL32!WriteProcessMemoryStub:
00007ff9`4a6e6c50 48ff2599150400 jmp qword ptr [KERNEL32!_imp_WriteProcessMemory (00007ff9`4a7281f0)]
00007ff9`4a6e6c57 cc int 3
We can see it's just a jump to KERNEL32!_imp_WriteProcessMemory (located somewhere in the .idata section of kernel32).
Now if we look at what is contained at this location, we have a pointer:
0:007> dp KERNEL32!_imp_WriteProcessMemory L1
00007ff9`4a7281f0 00007ff9`496f0ca0
If we ask WinDbg what this pointer refers to:
0:007> ln 00007ff9`496f0ca0
Browse module
Set bu breakpoint
(00007ff9`496f0ca0) KERNELBASE!WriteProcessMemory | (00007ff9`496f0dc4) KERNELBASE!OpenWow64CrossProcessWorkConnection
Exact matches:
KERNELBASE!WriteProcessMemory (void)
We can see that the "real" location of WriteProcessMemory is in fact kernelbase.dll.
note: you can actually do the last two commands in one with dps:
0:007> dps KERNEL32!_imp_WriteProcessMemory L1
00007ff9`4a7281f0 00007ff9`496f0ca0 KERNELBASE!WriteProcessMemory
WinDbg commands used:
ln (List Nearest Symbols): given an address, finds the nearest symbols.
u (Unassemble): disassembles the code at an address or symbol.
dp (Display Memory): displays memory as pointer-sized values.
dps (Display Words and Symbols): like dp, but with symbolic information.

Using scanf in x86-64 gas assembly gives sigsegv [duplicate]

When compiling below code:
global main
extern printf, scanf
section .data
msg: db "Enter a number: ",10,0
format:db "%d",0
section .bss
number resb 4
section .text
main:
mov rdi, msg
mov al, 0
call printf
mov rsi, number
mov rdi, format
mov al, 0
call scanf
mov rdi,format
mov rsi,[number]
inc rsi
mov rax,0
call printf
ret
using:
nasm -f elf64 example.asm -o example.o
gcc -no-pie -m64 example.o -o example
and then run
./example
it runs, print: enter a number:
but then crashes and prints:
Segmentation fault (core dumped)
So printf works fine but scanf not.
What am I doing wrong with scanf so?
Use sub rsp, 8 / add rsp, 8 at the start/end of your function to re-align the stack to 16 bytes before your function does a call.
Or better push/pop a dummy register, e.g. push rdx / pop rcx, or a call-preserved register like RBP you actually wanted to save anyway. You need the total change to RSP to be an odd multiple of 8 counting all pushes and sub rsp, from function entry to any call.
i.e. 8 + 16*n bytes for whole number n.
On function entry, RSP is 8 bytes away from 16-byte alignment because the call pushed an 8-byte return address. See "Printing floating point numbers from x86-64 seems to require %rbp to be saved", "main and stack alignment", and "Calling printf in x86_64 using GNU assembler". This is an ABI requirement which you used to be able to get away with violating when there weren't any FP args for printf. But not any more.
See also Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?
To put it another way, RSP % 16 == 8 on function entry, and you need to ensure RSP % 16 == 0 before you call a function. How you do this doesn't matter. (Not all functions will actually crash if you don't, but the ABI does require/guarantee it.)
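The arithmetic behind that rule can be written down as a tiny check. This is only a sketch of the bookkeeping, not something the ABI or any library provides, and the function name stack_aligned_before_call is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* On entry to a function RSP % 16 == 8, because the caller's `call`
 * pushed an 8-byte return address onto a 16-byte-aligned stack.
 * `adjust` is the total number of bytes this function has subtracted
 * from RSP (all pushes plus any `sub rsp, N`) before its own `call`. */
static bool stack_aligned_before_call(size_t adjust)
{
    return (8 + adjust) % 16 == 0;
}
```

An adjustment of 0 bytes (as in the question's main) fails the check, while 8 bytes (one push, or sub rsp, 8) and 24 bytes pass, matching the 8 + 16*n rule.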
gcc's code-gen for glibc scanf now depends on 16-byte stack alignment even when AL == 0.
It seems to have auto-vectorized copying 16 bytes somewhere in __GI__IO_vfscanf, which regular scanf calls after spilling its register args to the stack (see footnote 1). (The many similar ways to call scanf share one big implementation as a back end to the various libc entry points like scanf, fscanf, etc.)
I downloaded Ubuntu 18.04's libc6 binary package: https://packages.ubuntu.com/bionic/amd64/libc6/download and extracted the files (with 7z x blah.deb and tar xf data.tar, because 7z knows how to extract a lot of file formats).
I can repro your bug with LD_LIBRARY_PATH=/tmp/bionic-libc/lib/x86_64-linux-gnu ./bad-printf, and also it turns out with the system glibc 2.27-3 on my Arch Linux desktop.
With GDB, I ran it on your program and did set env LD_LIBRARY_PATH /tmp/bionic-libc/lib/x86_64-linux-gnu then run. With layout reg, the disassembly window looks like this at the point where it received SIGSEGV:
│0x7ffff786b49a <_IO_vfscanf+602> cmp r12b,0x25 │
│0x7ffff786b49e <_IO_vfscanf+606> jne 0x7ffff786b3ff <_IO_vfscanf+447> │
│0x7ffff786b4a4 <_IO_vfscanf+612> mov rax,QWORD PTR [rbp-0x460] │
│0x7ffff786b4ab <_IO_vfscanf+619> add rax,QWORD PTR [rbp-0x458] │
│0x7ffff786b4b2 <_IO_vfscanf+626> movq xmm0,QWORD PTR [rbp-0x460] │
│0x7ffff786b4ba <_IO_vfscanf+634> mov DWORD PTR [rbp-0x678],0x0 │
│0x7ffff786b4c4 <_IO_vfscanf+644> mov QWORD PTR [rbp-0x608],rax │
│0x7ffff786b4cb <_IO_vfscanf+651> movzx eax,BYTE PTR [rbx+0x1] │
│0x7ffff786b4cf <_IO_vfscanf+655> movhps xmm0,QWORD PTR [rbp-0x608] │
>│0x7ffff786b4d6 <_IO_vfscanf+662> movaps XMMWORD PTR [rbp-0x470],xmm0 │
So it copied two 8-byte objects to the stack with movq + movhps to load and movaps to store. But with the stack misaligned, movaps [rbp-0x470],xmm0 faults.
I didn't grab a debug build to find out exactly which part of the C source turned into this, but the function is written in C and compiled by GCC with optimization enabled. GCC has always been allowed to do this, but only recently did it get smart enough to take better advantage of SSE2 this way.
Footnote 1: printf / scanf with AL != 0 has always required 16-byte alignment because gcc's code-gen for variadic functions uses test al,al / je to spill the full 16-byte XMM regs xmm0..7 with aligned stores in that case. __m128i can be an argument to a variadic function, not just double, and gcc doesn't check whether the function ever actually reads any 16-byte FP args.

How to implement 16 bit->32 bit lookup table in ARM assembly using NEON?

In an iOS 6 project, I have a buffer containing two-byte words (16 bits) that need to be translated to four-byte words (32 bits) via a lookup table. I hard-code the values into the table, and then use the value of the two-byte buffer to determine which 32-bit table value to retrieve. Here's an example:
void map_values(uint32_t *dst,uint16_t *src,uint32_t *lut,int buf_length){
    int i=0;
    for(i=0;i<buf_length;i++){
        *dst = *(lut+(*src));
        dst++;
        src++;
    }
}
The problem is, it's too slow. Could this be sped up by processing 4 output bytes at a time using NEON? The thing is, I'm iffy on how to take the value from the src buffer and use that as an input to the lookup table to figure out what value to retrieve. Also, the word lengths are the same in the table and the output buffer, but not for the source. So, I can only read two 16 bit words as input, versus the four 32 bit word output I need. Any ideas? Is there a better way to approach this problem, perhaps?
Current asm output from clang (clang -O3 -arch armv7 lut.c -S):
.section __TEXT,__text,regular,pure_instructions
.section __TEXT,__textcoal_nt,coalesced,pure_instructions
.section __TEXT,__const_coal,coalesced
.section __TEXT,__picsymbolstub4,symbol_stubs,none,16
.section __TEXT,__StaticInit,regular,pure_instructions
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _map_values
.align 2
.code 16 # #map_values
.thumb_func _map_values
_map_values:
# BB#0:
cmp r3, #0
it eq
bxeq lr
LBB0_1: # %.lr.ph
# =>This Inner Loop Header: Depth=1
ldrh r9, [r1], #2
subs r3, #1
ldr.w r9, [r2, r9, lsl #2]
str r9, [r0], #4
bne LBB0_1
# BB#2: # %._crit_edge
bx lr
.subsections_via_symbols
Lookup tables are (nearly) unvectorizable. Very small lookup tables can be handled with the vtbl instruction, but your lookup table is far too big for that.
What are you using the lookup table for? If the values can be computed on the fly without too much work instead of looking them up, that may actually be a significant win for you.
My first thought is that you might get some luck out of vtablelookup in the vecLib portion of the Accelerate framework. The signature is:
vUInt32 vtablelookup (
vSInt32 Index_Vect,
uint32_t *Table
);
where vSInt32 and vUInt32 are 128 bit packed 32 bit signed/unsigned integers respectively. I believe the function is backed by NEON on ARM. The big problem will be converting your src array into 32 bit indices, which could well slow things down so much as to render the speed gains from vectorising the lookup pointless.

How to use command line arguments in Fortran?

GCC version 4.6
The Problem: To find a way to feed in parameters to the executable, say a.out, from the command line - more specifically feed in an array of double precision numbers.
Attempt: Using the READ(*,*) command, which is older in the standard:
Program test.f -
PROGRAM MAIN
REAL(8) :: A,B
READ(*,*) A,B
PRINT*, A+B, COMMAND_ARGUMENT_COUNT()
END PROGRAM MAIN
The execution -
$ gfortran test.f
$ ./a.out 3.D0 1.D0
This did not work. On a bit of soul-searching, found that
$./a.out
3.d0,1.d0
4.0000000000000000 0
does work, but the second line is an input prompt, and the objective of getting this done in one-line is not achieved. Also the COMMAND_ARGUMENT_COUNT() shows that the numbers fed into the input prompt don't really count as 'command line arguments', unlike PERL.
If you want to get the arguments fed to your program on the command line, use the intrinsic subroutine GET_COMMAND_ARGUMENT, which has been standard since Fortran 2003. Something like this might work:
PROGRAM MAIN
  REAL(8) :: A,B
  integer :: num_args, ix
  character(len=12), dimension(:), allocatable :: args
  num_args = command_argument_count()
  allocate(args(num_args)) ! I've omitted checking the return status of the allocation
  do ix = 1, num_args
    call get_command_argument(ix,args(ix))
  end do
  ! now parse the arguments as you wish; here the first two are read as reals
  if (num_args < 2) stop 'expected two numbers on the command line'
  read(args(1),*) A
  read(args(2),*) B
  PRINT*, A+B, COMMAND_ARGUMENT_COUNT()
END PROGRAM MAIN
Note:
The second argument to the subroutine get_command_argument is a character variable which you'll have to parse to turn into a real (or whatever). Note also that I've allowed only 12 characters in each element of the args array, you may want to fiddle around with that.
As you've already figured out, read isn't used for reading command line arguments in Fortran programs.
Since you want to read an array of real numbers, you might be better off using the approach you've already figured out, that is reading them from the terminal after the program has started, it's up to you.
The easiest way is to use a library. FLAP and f90getopt are both available, open source, and released under free licenses.
The latter is written by Mark Gates and me; it is just one module and can be learned in minutes, but contains everything needed to parse GNU- and POSIX-like command-line options. The former is more sophisticated and can be used even in closed-source projects. Check them out.
Further libraries are listed at https://fortranwiki.org/fortran/show/Command-line+arguments
What READ (*,*) does is that it reads from the standard input. For example, the characters entered using the keyboard.
As the question shows COMMAND_ARGUMENT_COUNT() can be used to get the number of the command line arguments.
The accepted answer by High Performance Mark shows how to retrieve the individual command line arguments, separated by blanks, as individual character strings using GET_COMMAND_ARGUMENT(). One can also get the whole command line using GET_COMMAND(). One then has to somehow parse that character-based information into the data in your program.
In very simple cases the program requires, for example, just two numbers, so you read one number from arg 1 and another from arg 2. That is simple. Or you can read a triplet of numbers from a single argument if they are comma-separated like 1,2,3 using a simple read(arg,*) nums(1:3).
For general, complicated command line parsing one uses libraries such as those mentioned in the answer by Hani. You have to set them up so that the library knows the expected syntax of the command line arguments and the data it should fill with the values.
There is a middle ground that is still relatively simple, for when there are multiple arguments that correspond to Fortran variables in the program and that may or may not be present. In that case one can use a namelist for both the syntax and the parsing.
Here is an example; the main point is the namelist /cmd/ name, point, flag:
implicit none
real :: point(3)
logical :: flag
character(256) :: name
character(1024) :: command_line
call read_command_line
call parse_command_line
print *, point
print *, "'",trim(name),"'"
print *, flag
contains
  subroutine read_command_line
    integer :: exenamelength
    integer :: io, io2
    command_line = ""
    call get_command(command = command_line,status = io)
    if (io==0) then
      call get_command_argument(0,length = exenamelength,status = io2)
      if (io2==0) then
        command_line = "&cmd "//adjustl(trim(command_line(exenamelength+1:)))//" /"
      else
        command_line = "&cmd "//adjustl(trim(command_line))//" /"
      end if
    else
      write(*,*) io,"Error getting command line."
    end if
  end subroutine
  subroutine parse_command_line
    character(256) :: msg
    namelist /cmd/ name, point, flag
    integer :: io
    if (len_trim(command_line)>0) then
      msg = ''
      read(command_line,nml = cmd,iostat = io,iomsg = msg)
      if (io/=0) then
        error stop "Error parsing the command line or cmd.conf " // msg
      end if
    end if
  end subroutine
end
Usage in bash:
> ./command flag=T name=\"data.txt\" point=1.0,2.0,3.0
1.00000000 2.00000000 3.00000000
'data.txt'
T
or
> ./command flag=T name='"data.txt"' point=1.0,2.0,3.0
1.00000000 2.00000000 3.00000000
'data.txt'
T
Escaping the quotes for the string is unfortunately necessary, because bash eats the first quotes.