I'm writing a small Kernel to learn more about Operating Systems.
I recently decided to start implementing User Mode, just for fun.
To achieve this, I followed this guide: https://blog.llandsmeer.com/tech/2019/07/21/uefi-x64-userland.html
Unfortunately, I've seen nothing but general protection faults, page faults and reboots in the last 24 hours. I tried and retried, following many different guides, from the OSDev Wiki to random blogs, and checking against Volume 2 of the AMD Programmer's Manual for x86-64, but nothing worked. It seems as though, instead of jumping to user_main, sysretq jumps back into kernel_main (indeed, running the function twice results in the same weird page fault, along with some text output that should only be displayed once, at boot). If I use sysret or o64 sysret instead of sysretq, QEMU outright resets.
I seriously don't know how to deal with this problem.
Links and references:
You can find my Kernel at: https://github.com/Alessandro-Salerno/SalernOS-Kernel
The code for entering User Mode can be found at src/User/Userspace/userspace.asm and the SCE (System Call Extension) code can be found at src/Syscall/sce.asm. The entry point is in src/kernel.c
The code I use in kernel.c to jump to Userspace.
...
#include "User/Userspace/userspace.h"
uint64_t user_stack[1024];
void user_main() {
while (TRUE);
}
void kernel_main(boot_t* __bootinfo) {
// init code (Up to line 74 in src/kernel.c)
kernel_userspace_enter(user_main, &user_stack[500]);
}
Nth Edit: I used log cpu_reset in QEMU to get some info when the system crashes:
CPU Reset (CPU 0)
RAX=0000000000006297 RBX=000000000ff1c2b0 RCX=0000000000002d60 RDX=00000000ff000000
RSI=0000000000009ff8 RDI=0000000000002d60 RBP=0000000000000000 RSP=0000000000009ff8
R8 =0000000000000000 R9 =000000000000a1f0 R10=cccccccccccccccd R11=0000000000000202
R12=000000000ff1c176 R13=000000000ff1c177 R14=0000000000006296 R15=0000000000000000
RIP=0000000000002d60 RFL=00000202 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =0010 0000000000000000 00000fff 00a09300 DPL=0 DS [-WA]
CS =002b 0000000000000000 ffffffff 00a0fb00 DPL=3 CS64 [-RA]
SS =0023 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
DS =0010 0000000000000000 00000fff 00a09300 DPL=0 DS [-WA]
FS =0010 0000000000000000 00000fff 00a09300 DPL=0 DS [-WA]
GS =0010 0000000000000000 00000fff 00a09300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0030 000000000000a000 00068fff 00a08900 DPL=0 TSS64-avl
GDT= 0000000000005000 00000fff
IDT= 000000000021f000 00000fff
CR0=80010033 CR2=fffffffffffffff8 CR3=0000000000100000 CR4=00000668
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000000 CCD=000000000ff1c150 CCO=EFLAGS
EFER=0000000000000d01
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=ff000000ff000000 ff000000ff000000 XMM01=0000000000000000 3ff0000000000000
XMM02=0000000000000000 0000000000000000 XMM03=0015001600400003 0038004000000000
XMM04=0000000000000000 0000000000000000 XMM05=0000000000000000 0000000000000000
XMM06=0000000000000000 0000000000000000 XMM07=0000000000000000 0000000000000000
XMM08=0000000000000000 0000000000000000 XMM09=0000000000000000 0000000000000000
XMM10=0000000000000000 0000000000000000 XMM11=0000000000000000 0000000000000000
XMM12=0000000000000000 0000000000000000 XMM13=0000000000000000 0000000000000000
XMM14=0000000000000000 0000000000000000 XMM15=0000000000000000 0000000000000000
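For reference when reading the dump: the CS=002b and SS=0023 values are consistent with how a 64-bit SYSRET derives its selectors from bits 63:48 of the STAR MSR (CS = base + 16, SS = base + 8, both with RPL 3). The sketch below only reproduces that arithmetic in plain C. The function names and the 0x08 / 0x18 selector values are assumptions for illustration, not taken from the kernel's source.

```c
#include <assert.h>
#include <stdint.h>

/* Selectors a 64-bit SYSRET loads, derived from STAR bits 63:48. */
typedef struct {
    uint16_t cs;
    uint16_t ss;
} sysret_selectors_t;

/* Build a STAR value: SYSCALL CS base in bits 47:32, SYSRET base in bits 63:48.
 * Both arguments are hypothetical selector values chosen by the kernel. */
uint64_t make_star(uint16_t syscall_cs, uint16_t sysret_base) {
    assert((syscall_cs & 3) == 0); /* the kernel code selector has RPL 0 */
    return ((uint64_t)sysret_base << 48) | ((uint64_t)syscall_cs << 32);
}

/* star: full 64-bit value of the STAR MSR (0xC0000081). */
sysret_selectors_t sysret64_selectors(uint64_t star) {
    uint16_t base = (uint16_t)(star >> 48);
    sysret_selectors_t s;
    s.cs = (uint16_t)((base + 16) | 3); /* 64-bit user code segment, RPL 3 */
    s.ss = (uint16_t)((base + 8) | 3);  /* user data/stack segment, RPL 3 */
    return s;
}
```

With an assumed SYSRET base of 0x18 this yields CS=0x2b and SS=0x23, matching the dump. It also shows why the user data descriptor has to sit directly before the 64-bit user code descriptor in the GDT.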
I solved it.
At least I think so...
When setting up your GDT, remember to take the size of the GDT instance, not of the GDT type, when computing the table's size. If I find something else, I will update this answer.
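A minimal sketch of what that can look like, assuming the table is an array of 8-byte descriptors. The type names, field names and the entry count of 8 are hypothetical and not taken from my kernel; the point is only that the limit is computed from the object that actually holds the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 8-byte segment descriptor (hypothetical field names). */
typedef struct __attribute__((packed)) {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_mid;
    uint8_t  access;
    uint8_t  flags_limit_high;
    uint8_t  base_high;
} gdt_entry_t;

static_assert(sizeof(gdt_entry_t) == 8, "a segment descriptor is 8 bytes");

/* Operand of LGDT in long mode: 16-bit limit followed by a 64-bit base. */
typedef struct __attribute__((packed)) {
    uint16_t limit;
    uint64_t base;
} gdt_descriptor_t;

/* The instance: 8 slots (hypothetical count). */
gdt_entry_t gdt_table[8];

gdt_descriptor_t make_gdt_descriptor(void) {
    gdt_descriptor_t d;
    /* Size of the instance (64 bytes here), not sizeof(gdt_entry_t) (8 bytes). */
    d.limit = (uint16_t)(sizeof(gdt_table) - 1);
    d.base  = (uint64_t)(uintptr_t)gdt_table;
    return d;
}
```

Using sizeof(gdt_entry_t) - 1 here would give a limit of 7, which covers only the null descriptor, so every other selector would fault when loaded.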
Related
I am running a bash script that launches many independent MATLAB processes in parallel in the background. However, some of them shut down, probably due to memory constraints. This is the report I get from the crash dump file. Do you have any idea how to prevent this from happening? Thanks
--------------------------------------------------------------------------------
abort() detected at Fri Feb 10 03:57:00 2023 +0100
--------------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled - No sandbox or build area path
Crash Mode : continue (default)
Default Encoding : UTF-8
Deployed : false
GNU C Library : 2.17 stable
Graphics Driver : Unknown software
Graphics card 1 : 0x102b ( 0x102b ) 0x533 Version 0.0.0.0 (0-0-0)
Java Version : Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
MATLAB Architecture : glnxa64
MATLAB Entitlement ID : 841490
MATLAB Root : /data/cad/Matlab/R2019b
MATLAB Version : 9.7.0.1319299 (R2019b) Update 5
OpenGL : software
Operating System : "CentOS Linux release 7.4.1708 (Core) "
Process ID : 30261
Processor ID : x86 Family 6 Model 79 Stepping 1, GenuineIntel
Session Key : 29ff05b5-c4c1-448e-b09a-c651244a5c8b
Static TLS mitigation : Disabled: Unnecessary
Window System : No active display
Fault Count: 1
Abnormal termination:
abort()
Register State (from fault):
RAX = 0000000000000000 RBX = 00007fbfcfc36c40
RCX = ffffffffffffffff RDX = 0000000000000006
RSP = 00007fbfcfc367d8 RBP = 0000000000000000
RSI = 0000000000007f51 RDI = 0000000000007635
R8 = 0000000000000038 R9 = 0000000000000038
R10 = 0000000000000008 R11 = 0000000000000206
R12 = 0000000000030005 R13 = 00007fbfcfc36c70
R14 = 0000000000000000 R15 = 0000000000000000
RIP = 00007fbfeeaff1f7 EFL = 0000000000000206
CS = 0033 FS = 0000 GS = 0000
Stack Trace (from fault):
[ 0] 0x00007fbfeeaff1f7 /lib64/libc.so.6+00217591 gsignal+00000055
[ 1] 0x00007fbfeeb008e8 /lib64/libc.so.6+00223464 abort+00000328
[ 2] 0x00007fbfd0593a53 /data/cad/Matlab/R2019b/bin/glnxa64/../../sys/os/glnxa64/libiomp5.so+00731731
[ 3] 0x00007fbfd057fe7f /data/cad/Matlab/R2019b/bin/glnxa64/../../sys/os/glnxa64/libiomp5.so+00650879
[ 4] 0x00007fbfd05c5805 /data/cad/Matlab/R2019b/bin/glnxa64/../../sys/os/glnxa64/libiomp5.so+00935941
[ 5] 0x0000000000001000 <unknown-module>+00000000
I am trying to run an SRIO interface on my custom p2020 board. I plugged an FPGA board with SRIO firmware into SRIO1 and configured SRIO as a host.
In uboot_config
#define CONFIG_SRIO1 /* SRIO port 1 */
#define CONFIG_SYS_SRIO1_MEM_VIRT 0xC0000000
#define CONFIG_SYS_SRIO1_MEM_BUS 0xC0000000
#define CONFIG_SYS_SRIO1_MEM_PHYS CONFIG_SYS_SRIO1_MEM_BUS
#define CONFIG_SYS_SRIO1_MEM_SIZE 0x10000000 /* 256M */
in tlb.c
SET_TLB_ENTRY(1, CONFIG_SYS_SRIO1_MEM_VIRT, CONFIG_SYS_SRIO1_MEM_PHYS,
MAS3_SX | MAS3_SW | MAS3_SR,
MAS2_I | MAS2_G,
0, 3, BOOKE_PAGESZ_256M, 1),
Trying to read SRIO memory from U-Boot:
=> md.l 0xc0000000
c0000000:
The p2020 hangs.
I can see a read request and a read response on the FPGA board.
Why can't I read the SRIO memory?
I set GPIO 'marks' for each interrupt vector in start.S. When I try to read SRIO memory, U-Boot hangs and no interrupt occurs. I cannot determine the cause of the error.
I tried to read the SRIO1 memory from Linux:
# devmem 0xc0000000 32
Disabling lock debugging due to kernel taint
Machine check in kernel mode.
Caused by (from MCSR=10008): Bus - Read Data Bus Error
Oops: Machine check, sig: 7 [#1]
SMP NR_CPUS=2 P2020 DS
Modules linked in:
CPU: 1 PID: 1578 Comm: devmem Tainted: G M 4.9.34 #28
task: eb161a80 task.stack: ef0ca000
NIP: 1000b5fc LR: 1000b510 CTR: c02e1108
MSR: 0202d000 <VEC,CE,EE,PR,ME> CR: 40000242 XER: 20000000
DEAR: b7e79000 ESR: 00000000
GPR00: 40000242 bfab4250 b7e81470 b7e79000 1000b510 40000242 b7e79000 b7d88444
GPR08: 0202d000 00000000 b7e79000 bfab4250 ef0ca000 100c8126 00000000 00000000
GPR16: 00000000 00000000 100a3560 100c0000 100c3fc5 00000000 100c0000 00000003
GPR24: 100c225c 100c0000 00000000 00001000 bfab4554 00000000 b7e79000 00000020
NIP [1000b5fc] 0x1000b5fc
LR [1000b510] 0x1000b510
Oops: Machine check, sig: 7 [#1]
arch/powerpc/kernel/traps.c +731
A machine check exception usually means a hardware problem. I connected the SRIO1 port to SRIO2 of my p2020 (the SRIO2 address range starts at 0xd0000000):
# devmem 0xc0000000
0x00710002
# devmem 0xd0000000
0x00710002
It works! I think the problem is in the FPGA board.
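For anyone repeating the test without devmem: as far as I understand, a tool like devmem maps the page containing the address from /dev/mem and does a single load. Below is a rough sketch of that idea, not the actual devmem source; the function names are my own. It assumes root access and a kernel that allows /dev/mem mappings of that range. A failing bus read will still raise the machine check; nothing here catches it.

```c
#define _FILE_OFFSET_BITS 64 /* 64-bit off_t on 32-bit targets such as the p2020 */

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a physical address down to the start of its page. */
uint64_t page_base(uint64_t addr, uint64_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Read one 32-bit word at physical address addr through /dev/mem.
 * Returns 0 on success, -1 on failure with errno set. */
int read_phys32(uint64_t addr, uint32_t *out) {
    if (addr & 3) {              /* keep the access naturally aligned */
        errno = EINVAL;
        return -1;
    }

    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t base = page_base(addr, page_size);

    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0)
        return -1;

    void *map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, (off_t)base);
    close(fd);                   /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    /* volatile forces exactly one 32-bit load from the mapped window */
    *out = *(volatile uint32_t *)((uint8_t *)map + (addr - base));
    munmap(map, page_size);
    return 0;
}
```

Calling read_phys32(0xc0000000, &value) would correspond to the devmem 0xc0000000 32 command above.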
I have some trouble creating a parpool in MATLAB under Slurm.
When I submit the job, it either gets stuck:
parpool size: 24
Starting parallel pool (parpool) using the 'local' profile ...
or fails with this error:
{Error using parpool (line 104)
Failed to start a parallel pool. (For information in addition to the causing
error, validate the profile 'local' in the Cluster Profile Manager.)
Error in run (line 86)
evalin('caller', [script ';']);
Caused by:
Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line
666)
Failed to initialize the interactive session.
Error using
parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus
(line 767)
The interactive communicating job failed with no message.
}
There is also a MATLAB crash dump file:
------------------------------------------------------------------------
Segmentation violation detected at Sun Apr 2 11:36:33 2017
------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled - No sandbox or build area path
Crash Mode : continue (default)
Default Encoding : UTF-8
GNU C Library : 2.23 stable
Host Name : wmc-slave-g2
MATLAB Architecture : glnxa64
MATLAB Root : /opt/matlab/R2017a
MATLAB Version : 9.2.0.538062 (R2017a)
Operating System : Linux 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64
Processor ID : x86 Family 6 Model 79 Stepping 1, GenuineIntel
Fault Count: 1
Abnormal termination:
Segmentation violation
Register State (from fault):
RAX = 00007f7410256900 RBX = 0000000000000000
RCX = 0000000000000000 RDX = 00007f7410256978
RSP = 00007f741e240868 RBP = 00007f741e240870
RSI = 0000000000000000 RDI = 00007f741e240870
R8 = 0000000000000000 R9 = 0000000000000000
R10 = 00000000000000ed R11 = 00007f743afade60
R12 = 00007f7410256978 R13 = 00007f741e2408a0
R14 = 00007f741e2409f0 R15 = 00007f7410258110
RIP = 00007f743afade60 EFL = 0000000000010202
CS = 0033 FS = 0000 GS = 0000
Stack Trace (from fault):
[ 0] 0x00007f743afade60 /opt/matlab/R2017a/bin/glnxa64/libboost_thread.so.1.56.0+00069216 _ZNK5boost6thread15get_thread_infoEv+00000000
[ 1] 0x0000000000000000 <unknown-module>+00000000
If this problem is reproducible, please submit a Service Request via:
A technical support engineer might contact you with further information.
Thanks
This is usually due to an incompatible Java version being used by MATLAB on Linux. Check with:
echo $MATLAB_JAVA
If this variable is set, you can unset it and let MATLAB use its own Java. You can run a script like this:
#!/bin/sh
unset MATLAB_JAVA
matlab -desktop
I'm trying to add a new (dummy) system call to the Linux kernel.
1) I added the system call code under linux-source/kernel/myfile.c and updated the Makefile accordingly.
2) Updated syscall.h, unistd.h and entry.S files to reflect the new system call (pedagogictime(int flag,struct timeval *time))
Then I compiled and installed the kernel and rebooted into the new image.
When I run: cat /proc/kallsyms | grep "pedag", this is the output I'm getting
0000000000000000 T sys_pedagogictime
0000000000000000 d event_exit__pedagogictime
0000000000000000 d event_enter__pedagogictime
0000000000000000 d __syscall_meta_pedagogictime
0000000000000000 d types_pedagogictime
0000000000000000 d args__pedagogictime
0000000000000000 t trace_init_flags_enter__pedagogictime
0000000000000000 t trace_init_flags_exit__pedagogictime
0000000000000000 t __event_exit__pedagogictime
0000000000000000 t __event_enter__pedagogictime
0000000000000000 t __p_syscall_meta__pedagogictime
0000000000000000 t __initcall_trace_init_flags_exit__pedagogictimeearly
0000000000000000 t __initcall_trace_init_flags_enter__pedagogictimeearly
which means the system call is registered correctly.
In my user space program, I'm writing:
#define __NR_pedagogictime 1326 //1326 is my system call number
struct timeval *now = (struct timeval *)malloc(sizeof(struct timeval));
long ret = syscall(__NR_pedagogictime,0,now);
if(ret)
perror("syscall ");
But I'm getting the error:
"syscall : Function not implemented"
I would really appreciate any help about this. Thanks.
Edit:
Btw, the assembly code for the syscall() looks like this (if it helps):
movl $6, %esi
movl $1326, %edi
movl $0, %eax
call syscall
cltq
You've chosen the wrong syscall number. Take a look at how the kernel checks the syscall number limits here. For example (x86, 32bit):
ENTRY(system_call)
    RING0_INT_FRAME             # can't unwind into user space anyway
    pushl_cfi %eax              # save orig_eax
    SAVE_ALL
    GET_THREAD_INFO(%ebp)
    # system call tracing in operation / emulation
    testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
    jnz syscall_trace_entry
    cmpl $(nr_syscalls), %eax
    jae syscall_badsys
syscall_call:
    call *sys_call_table(,%eax,4)
    movl %eax,PT_EAX(%esp)      # store the return value
So, you can see that this code compares %eax (syscall number) and nr_syscalls (sys_call_table size). Above or equal leads to syscall_badsys.
You'll need to modify the arch/x86/include/asm/unistd_32.h header too.
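To make the check concrete, here is a toy user-space model of it in C. This is not kernel code: the table, the two handlers and the names toy_dispatch / TOY_NR_SYSCALLS are made up for illustration. Only the shape of the bounds check and the -ENOSYS result mirror the assembly above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(long a0, long a1);

/* Two made-up handlers so the table has something in it. */
long sys_first(long a0, long a1)  { (void)a1; return a0; }
long sys_second(long a0, long a1) { return a0 + a1; }

/* Toy table; its length plays the role of nr_syscalls. */
static const syscall_fn toy_sys_call_table[] = { sys_first, sys_second };
#define TOY_NR_SYSCALLS \
    (sizeof(toy_sys_call_table) / sizeof(toy_sys_call_table[0]))

long toy_dispatch(unsigned long nr, long a0, long a1) {
    /* cmpl $(nr_syscalls), %eax ; jae syscall_badsys */
    if (nr >= TOY_NR_SYSCALLS)
        return -ENOSYS;

    /* call *sys_call_table(,%eax,4) */
    assert(toy_sys_call_table[nr] != NULL);
    return toy_sys_call_table[nr](a0, a1);
}
```

A number past the end of the table never reaches any handler, however the handler was registered. The C library turns the resulting -ENOSYS into errno = ENOSYS, which perror prints as "Function not implemented".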
I have a simple test program that causes an infinite wait on a lock.
using System;
using System.Threading;

public class SyncBlock
{
}

class Program
{
    public static SyncBlock sync = new SyncBlock();

    private static void ThreadProc()
    {
        try
        {
            // Acquired here and never released before the thread exits
            Monitor.Enter(sync);
        }
        catch (Exception)
        {
            //Monitor.Exit(sync);
            Console.WriteLine("3rd party code threw an exception");
        }
    }

    static void Main(string[] args)
    {
        Thread newThread = new Thread(ThreadProc);
        newThread.Start();
        Console.WriteLine("Acquiring lock");
        Monitor.Enter(sync);
        Console.WriteLine("Releasing lock");
        Monitor.Exit(sync);
    }
}
So the main thread basically gets blocked when it calls Monitor.Enter(sync). The !CLRStack output for the main thread shows this, which makes sense. But when I look at the native side of the stack, I expect to see some kind of wait on single/multiple objects call, and I don't see one. Can anyone explain this? Thanks
0:000> !CLRStack
PDB symbol for mscorwks.dll not loaded
OS Thread Id: 0x1e8 (0)
ESP EIP
0012f0a8 77455e74 [GCFrame: 0012f0a8]
0012f178 77455e74 [HelperMethodFrame_1OBJ: 0012f178] System.Threading.Monitor.Enter (System.Object)
0012f1d0 00a40177 ConsoleApplication1.Program.Main(System.String[])
0012f400 70fc1b4c [GCFrame: 0012f400]
0:000> kb
ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
0012eeb4 710afb92 0012ee68 002d6280 00000000 ntdll!KiFastSystemCallRet
0012ef1c 710af7c3 00000001 002d6280 00000000 mscorwks!StrongNameFreeBuffer+0x1b1f2
0012ef3c 710af8cc 00000001 002d6280 00000000 mscorwks!StrongNameFreeBuffer+0x1ae23
0012efc0 710af961 00000001 002d6280 00000000 mscorwks!StrongNameFreeBuffer+0x1af2c
0012f010 710afae1 00000001 002d6280 00000000 mscorwks!StrongNameFreeBuffer+0x1afc1
0012f06c 70fdc5ae ffffffff 00000001 00000000 mscorwks!StrongNameFreeBuffer+0x1b141
0012f080 710df68a ffffffff 00000001 00000000 mscorwks!LogHelp_NoGuiOnAssert+0x10562
0012f10c 710b1154 002aad90 ffffffff 002aad90 mscorwks!StrongNameFreeBuffer+0x4acea
0012f128 710b10d8 42b8b47d 00000000 002aad90 mscorwks!StrongNameFreeBuffer+0x1c7b4
0012f1e0 70fc1b4c 0012f1f0 0012f230 0012f270 mscorwks!StrongNameFreeBuffer+0x1c738
0012f1f0 70fd2219 0012f2c0 00000000 0012f290 mscorwks+0x1b4c
0012f270 70fe6591 0012f2c0 00000000 0012f290 mscorwks!LogHelp_NoGuiOnAssert+0x61cd
0012f3ac 70fe65c4 0023c038 0012f478 0012f444 mscorwks!CoUninitializeEE+0x2ead
0012f3c8 70fe65e2 0023c038 0012f478 0012f444 mscorwks!CoUninitializeEE+0x2ee0
0012f3e0 7103389d 0012f444 42b8b0f1 00000000 mscorwks!CoUninitializeEE+0x2efe
0012f544 710337bd 002332e0 00000001 0012f580 mscorwks!GetPrivateContextsPerfCounters+0xf546
0012f7ac 71033d0d 00000000 42b8b9c9 00000001 mscorwks!GetPrivateContextsPerfCounters+0xf466
0012fc7c 71033ef7 00ce0000 00000000 42b8979 mscorwks!GetPrivateContextsPerfCounters+0xf9b6
0012fccc 71033e27 00ce0000 42b8b8a1 00000000 mscorwks!CorExeMain+0x168
* ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Windows\Microsoft.NET\Framework\v4.0.30319\mscoreei.dll -
0012fd14 71cf55ab 71033d8f 0012fd30 71f37f16 mscorwks!CorExeMain+0x98
* ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Windows\system32\mscoree.dll -
0012fd20 71f37f16 00000000 71cf0000 0012fd44 mscoreei!CorExeMain+0x38
0012fd30 71f34de3 00000000 7723d0e9 7ffd8000 mscoree!CreateConfigStream+0x13f
0012fd44 774319bb 7ffd8000 084952f9 00000000 mscoree!CorExeMain+0x8
0012fd84 7743198e 71f34ddb 7ffd8000 00000000 ntdll!RtlInitializeExceptionChain+0x63
0012fd9c 00000000 71f34ddb 7ffd8000 00000000 ntdll!RtlInitializeExceptionChain+0x36
You have to point WinDbg to the Microsoft Windows symbol server to get a good stack trace.
Type the following in your WinDbg command window:
.sympath srv*c:\websymbols*http://msdl.microsoft.com/download/symbols
Also see this:
Using microsoft symbol server to get symbols
Also, to answer your original question about how to debug this, here is the cookbook:
0:000> !clrstack
OS Thread Id: 0x1358 (0)
ESP EIP
0012f328 7c90e514 [GCFrame: 0012f328]
0012f3f8 7c90e514 [HelperMethodFrame_1OBJ: 0012f3f8] System.Threading.Monitor.Enter(System.Object)
0012f450 00d10177 Program.Main(System.String[])
0012f688 79e71b4c [GCFrame: 0012f688]
In your original program, the background thread was started first. So, it acquired the lock. However it exited without releasing the lock. After that your main thread tried to acquire the lock and it is stuck because the lock is already owned.
How do you find out who owns it? First do a !threads followed by !syncblk.
0:000> !threads
ThreadCount: 3
UnstartedThread: 0
BackgroundThread: 1
PendingThread: 0
DeadThread: 1
Hosted Runtime: no
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
0 1 1358 0014bb00 200a020 Enabled 00000000:00000000 001540d0 0 MTA
2 2 1360 0015e320 b220 Enabled 00000000:00000000 001540d0 0 MTA (Finalizer)
XXXX 3 0 00175a98 9820 Enabled 00000000:00000000 001540d0 1 Ukn
0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
2 0017903c 3 1 00175a98 0 XXX 013503cc SyncBlock
-----------------------------
Total 2
CCW 0
RCW 0
ComClassFactory 0
Free 0
As you can see, !syncblk says that the owning thread object is 00175a98. From the !threads output, you can see that thread object 00175a98 is the dead thread that exited while owning the lock.
Hope this helps.