OS detection using only the open(), read(), write() and close() system calls

I'm writing some code that is going to run under a hypervisor which only allows the open, read, write and close syscalls to the external world.
Since part of the code depends on the platform it is being run on, I'd like to be able to automatically choose the appropriate code path at runtime.
What is the most robust way of detecting the operating system using only these syscalls? I'm primarily interested in detecting Windows and Linux, but OS X would be useful too.

The information uname reports is also exposed in /proc/sys/kernel/ostype, so you could open and read that file.
For Windows it's worse. In theory you should have C:\Windows\System32\kernel32.dll. The problem, however, is that the Windows installation root is not required to be C:\ (although it's very common), so I highly doubt an ordinary open() could be considered reliable.
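Putting those two probes together, here is a minimal sketch (assuming the hypervisor simply passes these paths through to the host, and that a failed open() returns a negative descriptor). The Windows path is just the common default and, as noted above, is not guaranteed:

    #include <fcntl.h>    /* open, O_RDONLY */
    #include <string.h>   /* strncmp */
    #include <unistd.h>   /* read, close */

    enum host_os { OS_UNKNOWN, OS_LINUX, OS_WINDOWS };

    /* Best-effort guess: probe paths that should only exist on one platform. */
    static enum host_os detect_os(void)
    {
        char buf[16];
        int fd = open("/proc/sys/kernel/ostype", O_RDONLY);
        if (fd >= 0) {
            ssize_t n = read(fd, buf, sizeof buf - 1);
            close(fd);
            if (n > 0) {
                buf[n] = '\0';
                if (strncmp(buf, "Linux", 5) == 0)   /* file contains "Linux\n" */
                    return OS_LINUX;
            }
        }
        /* Fall back to a well-known, but not guaranteed, Windows path. */
        fd = open("C:\\Windows\\System32\\kernel32.dll", O_RDONLY);
        if (fd >= 0) {
            close(fd);
            return OS_WINDOWS;
        }
        return OS_UNKNOWN;
    }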

Related

Why can't compiled machine code (EXE, PE, APP) work on all platforms?

When C code is compiled to an exe/pe/app, (from my knowledge) it is converted into machine code. This can then be run by a processor.
My question is, since this is very low level, it shouldn't make any calls to OS specific functions (as these will already have been compiled to machine code as well). So why can't it be run on different platforms, like Linux, Windows, OSX?
The premise of the question is based on a big misunderstanding about how computers work.
Compile a simple "hello world" executable. Disassemble it, or let the Godbolt Compiler Explorer do that for you.
Does it contain a copy of the library implementation of puts / printf? No. It's dynamically linked to libc so every program doesn't need its own copy of every library function it uses.
Does it contain graphics drivers that actually draw the text in video memory? No, of course not, and that wouldn't even be possible for a program that runs in a multi-tasking OS with memory protection: The OS can't let processes access the hardware directly; they'd be able to crash the computer or draw on each other's windows.
Instead, processes make system calls to interact with things outside of themselves.
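As a hedged illustration of that point (not part of the hello-world example above): on Linux with glibc, even "print some bytes" ultimately goes through a Linux-specific system call, which has no equivalent of the same shape on Windows:

    #include <string.h>
    #include <sys/syscall.h>   /* SYS_write (Linux-specific syscall numbers) */
    #include <unistd.h>        /* syscall() */

    int main(void)
    {
        const char msg[] = "hello via a raw Linux system call\n";

        /* On Linux, "write these bytes to file descriptor 1" is the write
         * system call.  Windows has no such call; its kernel interface
         * (ntdll / WriteFile) is entirely different, so this line is
         * OS-specific even though the surrounding machine code is plain x86. */
        syscall(SYS_write, 1, msg, strlen(msg));
        return 0;
    }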
Leaving all that aside, there are multiple architectures that don't understand each other's machine code. So even within the same OS, an x86 binary won't run natively on an ARM CPU.

What does sys_vm86old syscall do?

My question is quite simple.
I encountered this sys_vm86old syscall (when reverse engineering) and I am trying to understand what it does.
I found two sources that could give me something but I'm still not sure that I fully understand; these sources are
The Source Code and this page which gives me this paragraph (but it's more readable directly on the link):
config GRKERNSEC_VM86
bool "Restrict VM86 mode"
depends on X86_32
help:
If you say Y here, only processes with CAP_SYS_RAWIO will be able to
make use of a special execution mode on 32bit x86 processors called
Virtual 8086 (VM86) mode. XFree86 may need vm86 mode for certain
video cards and will still work with this option enabled. The purpose
of the option is to prevent exploitation of emulation errors in
virtualization of vm86 mode like the one discovered in VMWare in 2009.
Nearly all users should be able to enable this option.
From what I understood, it would ensure that the calling process has CAP_SYS_RAWIO enabled. But this doesn't help me a lot...
Can anybody tell me?
Thank you
The syscall is used to execute code in VM86 mode. This mode allows you to run old 16-bit "real mode" code (like that present in some BIOSes) inside a protected-mode OS.
See for example the Wikipedia article on it: https://en.wikipedia.org/wiki/Virtual_8086_mode
The setting you found means you need CAP_SYS_RAWIO to invoke the syscall.
I think X11 in particular uses it to call BIOS routines for switching the video mode. There are two syscalls; the one with the old suffix offers fewer operations but is retained for binary (ABI) compatibility.
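As a rough sketch of how you might poke at it yourself (assuming a Linux system with glibc; SYS_vm86old only exists on 32-bit x86), the errno you get back from a deliberately bad call tells you whether the kernel offers the syscall and whether you are allowed to use it:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
    #ifdef SYS_vm86old                     /* only defined on 32-bit x86 Linux */
        /* Deliberately pass a NULL vm86_struct: we only want to probe the
         * syscall, not actually enter VM86 mode. */
        long ret = syscall(SYS_vm86old, (void *)0);
        if (ret == -1)
            printf("vm86old probe failed: %s\n", strerror(errno));
        /* Typical results:
         *   EFAULT -> the syscall exists, it just rejected our bad pointer
         *   EPERM  -> blocked, e.g. missing CAP_SYS_RAWIO under GRKERNSEC_VM86
         *   ENOSYS -> this kernel was built without VM86 support */
    #else
        puts("SYS_vm86old is not defined here (e.g. on x86-64 or ARM).");
    #endif
        return 0;
    }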

Machine Independency

Let's think about a simple C program compiled in Windows.
I can compile the program on an Intel CPU machine and run it on an AMD CPU one (same operating system). So does it mean that the instruction sets of the CPUs are the same?
Why doesn't the same program run on a machine with different OS and the same CPU?
The binary layout of the object files is totally different, as are the libraries that are available and the way you call them.
Just compare the header of an ELF and an EXE file to see what I mean.
If you write a simple program like main(){ printf("Hello\n"); return 0; }, there is a lot going on behind the scenes, covered by the compiler, to get these lines printed. Running on the same CPU doesn't help: it could execute the assembly instructions, but it would fail horribly as soon as it called the first OS function.
To elaborate this a bit:
Just as an example, let's assume that we are running Amiga OS on a Motorola 68000 CPU.
If I remember correctly, the calling convention for a system library involved loading the pointers into, say, an address register of the CPU and then calling the OS function.
Now let's assume I write my own OS, also on a Motorola 68000 CPU. However, when I design my OS, I think it is a much better idea to use the stack for data exchange, so when you call a similar function in my own private OS, you don't pass the address in an address register; instead you push it on the stack.
Now if your executable were run on my OS (supposing it could be loaded, because I use the same object structure), it would put values in a register, and my OS would try to pop them from the stack, because it doesn't know that the values it was looking for were supposed to be somewhere else.
I hope this is a bit more detailed so you can understand it, but of course the problems go much deeper than this; this is just a tiny part of what's involved.
Both your Intel and AMD use the x86 (or x86-64) architecture. That's why you can run the same software on both. However, the compiled program contains more than just dependencies on the architecture, it also contains dependencies on the underlying operating system. Even the binary format of a Linux executable for example is different from a Windows one.
You can, however, take a simple C program which uses the C standard library and compile it across different operating systems and processor architectures. As long as your code does not contain operating-system-dependent code, it will port across operating systems. Similarly, if your code does not rely on the underlying architecture's endianness, for instance, it will port across architectures.
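To make the endianness caveat concrete, here is a small illustrative snippet (standard C, nothing platform-specific assumed): it compiles everywhere, but any code that reinterprets a multi-byte integer as raw bytes like this will see different values on little-endian and big-endian machines:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t value = 0x01020304;
        /* Inspecting the first byte of a multi-byte integer is exactly the
         * kind of code that is NOT architecture-independent: on x86 (little
         * endian) this prints 0x04, on a big-endian CPU it prints 0x01. */
        unsigned char first_byte = *(const unsigned char *)&value;
        printf("first byte in memory: 0x%02x\n", first_byte);
        printf("%s-endian\n", first_byte == 0x04 ? "little" : "big");
        return 0;
    }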
Johan.

Why should I minimize the use of system calls in my code?

I wanted to know if there is any reason to minimize the use of system calls in code, and what the alternative to making a system call is. One would say use an API, but an API in turn uses system calls.
Is that true?
Because most system calls have an inherent overhead. A system call is a means of tapping into the kernel, a controlled gateway towards obtaining some service.
When performing a system call, some actions are taken (warning, it's a simplification):
You invoke a library (wrapper) function
The function puts the arguments where they are expected. Also the function puts the number of the system call in eax
The function calls a trap (int 0x80 or whatever)
The processor is switched to kernel mode
The kernel invokes some system_call routine
The registers are saved onto the kernel stack
The arguments are checked to be valid
The action is performed
The registers are restored from the kernel stack
The processor is returned to user mode
The function (finally...) returns
And I probably forgot some of the steps. Doesn't this sound like a lot of work? All you wanted was the "The action is performed" step; the rest is overhead.
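If you want to see that overhead for yourself, here is a rough benchmark sketch (assuming a POSIX system with /dev/null and clock_gettime; exact numbers will vary): it compares many one-byte write() calls against a single bulk write() of the same data:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    static double seconds_since(const struct timespec *start)
    {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        return (now.tv_sec - start->tv_sec) + (now.tv_nsec - start->tv_nsec) / 1e9;
    }

    int main(void)
    {
        enum { N = 100000 };
        char buf[N];
        memset(buf, 'x', sizeof buf);

        int fd = open("/dev/null", O_WRONLY);
        if (fd < 0)
            return 1;

        struct timespec t0;

        /* N system calls, one byte each: N round trips into the kernel. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            write(fd, buf + i, 1);
        printf("%d one-byte write() calls: %.4f s\n", N, seconds_since(&t0));

        /* One system call for the same amount of data: one round trip. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        write(fd, buf, sizeof buf);
        printf("1 bulk write() call:       %.4f s\n", seconds_since(&t0));

        close(fd);
        return 0;
    }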
A system call requires that the system switches from User mode to Kernel mode. This makes system calls expensive.
An article to understand this better:
Understanding User and Kernel Mode - Jeff Atwood
First, if you use a framework or APIs (e.g. by using wxWidgets instead of rendering the windows manually, or the GNU C library), your code is portable between different operating systems.
Second, if you're using APIs you won't have problems when the manufacturer changes how the operating system works under the hood, as the APIs should stay the same as before.
The only reason that comes to my mind right now is portability. If you use system calls directly, your code will only run on that operating system, and if you need to compile the same source for another OS, you will be in trouble; the API may be completely different.

Why does an executable program for a specific CPU not work on both Linux and Windows?

An executable program like an EXE does not work on Linux (without Wine). When compiling source code, the compiler produces object code which is specific to a particular CPU architecture. But the same application does not work on another OS with the same CPU. I know the code may include OS-specific instructions that prevent the executable from running, but what about a simple program that computes 2 + 2? The confusing part is what exactly in the machine code prevents it from working. Machine code is specific to the CPU, right? If we stripped away the executable file format, would we see the same machine code (like 2 + 2) for both operating systems?
One more question: what about assembly language? Do Windows and Linux use different assembly language for the same CPU?
There are many differences. Among them:
Executable Format: Every OS requires the binaries to conform to a specific binary format. For Windows, this is Portable Executable (PE) format. For Linux, it's ELF most of the time (it supports other types too).
Application Binary Interface: Each OS defines a set of primary system functions and the way a program calls them. This is fundamentally different between Linux and Windows. While the instructions that compute 2 + 2 are identical on Linux and Windows in x86 architecture, the way the application starts, the way it prints out the output, and the way it exits differs between the operating systems.
Yes, both Linux and Windows programs on x86 architecture use the instruction set that the CPU supports which is defined by Intel.
It's due to the difference of how the program is loaded into memory and given resources to run. Even the simplest programs need to have code space, data space and the ability to acquire runtime memory and do I/O. The infrastructure to do these low-level tasks is completely different between the platforms, unless you have some kind of adaptation layer, like WINE or Cygwin.
Assuming, however, that you could just inject arbitrary assembled CPU instructions into the code segment of a running process and get that code to execute, then, yes, the same code would run on either platform. However, it would be quite restricted, and doing complex things like even jumps to external modules would fail, due to how these things are done differently on different platforms.
Problem 1 is the image format. When an application is launched into execution, the OS has to load the application image, find out its entry point and launch it from there. That means the OS must understand the image format, and the formats differ between operating systems.
Problem 2 is access to devices. Once launched, an application can read and write registers in the CPU and that's about it. To do anything interesting, like displaying a character on a console, it needs access to devices, and that means it has to ask the OS for such access. Each OS has a different API for accessing these devices.
Problem 3 is privileged instructions. The newly launched process will probably need a memory location to store something; it can't accomplish everything with registers. This means it needs to allocate RAM and set up the translation from virtual to physical addresses. These are privileged operations only the OS can do, and again, the API to access these services varies between OSs.
The bottom line is that applications are not written for a CPU, but for a set of primitive services the OS offers. The alternative is to write apps against a set of primitive services a virtual machine offers, and this leads to apps that are more or less portable, like Java apps.
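To make Problem 1 concrete, here is a small sketch (illustrative only) that looks at nothing but the first bytes of a file: Linux ELF images start with the magic bytes 0x7F 'E' 'L' 'F', while Windows PE executables start with the old DOS "MZ" header:

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <executable>\n", argv[0]);
            return 1;
        }

        FILE *f = fopen(argv[1], "rb");
        if (!f) {
            perror("fopen");
            return 1;
        }

        unsigned char magic[4] = {0};
        fread(magic, 1, sizeof magic, f);
        fclose(f);

        if (magic[0] == 0x7F && magic[1] == 'E' && magic[2] == 'L' && magic[3] == 'F')
            puts("ELF image (Linux and many other Unix-like systems)");
        else if (magic[0] == 'M' && magic[1] == 'Z')
            puts("MZ/PE image (Windows executable)");
        else
            puts("unknown image format");
        return 0;
    }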
Yes, but, the code invariably calls out to library functions to do just about anything -- like printing "4" to the terminal. And these libraries are platform-specific, and differ between Linux and Windows. This is why it's not portable -- not, indeed, an instruction-level problem.
Here are some of the reasons I can think of off the top of my head:
Different container formats (which so far seems to be the leading differentiator in this answer -- however it's not the only reason).
Different dynamic linker semantics.
Different ABI.
Different exception handling mechanisms -- Windows has SEH, upon which C++ exception handling is built.
Different system call semantics and different system calls -- hence different low-level libraries.
To the second question: Windows only runs on x86, x64, and IA64 (not sure about the mobile versions). For Linux, see here.