How to identify that you're running under a VM? - virtualization

Is there a way to identify, from within a VM, that your code is running inside a VM?
I guess there are more or less easy ways to identify specific VM systems, especially if the VM has the provider's guest extensions installed (such as for VirtualBox or VMware). But is there a general way to identify that you are not running directly on the CPU?

A lot of the research on this is dedicated to detecting so-called "blue pill" attacks, that is, a malicious hypervisor that is actively attempting to evade detection.
The classic trick to detect a VM is to populate the ITLB, run an instruction that must be virtualized (which necessarily flushes that processor state when it hands control to the hypervisor), then run some more code to detect whether the ITLB is still populated. The first paper on the technique is located here, and there is a rather colorful explanation on a researcher's blog (with an alternative Wayback Machine link to the article, since its images are broken).
Bottom line from discussions on this is that there is always a way to detect a malicious hypervisor, and it's much simpler to detect one that isn't trying to hide.

Red Hat has a program which detects which (if any) virtualization product it's being run under: virt-what.
Using a third-party-maintained tool such as this is a better long-term strategy than trying to roll your own detection logic: more eyes (testing against more virtualization products), etc.
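If you just want that answer from code, you can also shell out to virt-what and read what it prints. A minimal sketch in Java, assuming virt-what is installed, on the PATH, and running with enough privileges for its checks to work:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class VirtWhatCheck {
    public static void main(String[] args) throws Exception {
        // virt-what prints one facility name per line (e.g. "kvm", "vmware")
        // and prints nothing when it detects no virtualization.
        Process p = new ProcessBuilder("virt-what").start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        if (out.length() == 0) {
            System.out.println("No virtualization detected (or not detectable).");
        } else {
            System.out.println("Hypervisor facts reported:\n" + out);
        }
    }
}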

A more empirical approach is to check for known VM device drivers. You could write WMI queries to locate, say, the VMware display adapter, disk drive, network adapter, etc. This would be suitable if you knew you only had to worry about known VM host types in your environment. Here's an example of doing this in Perl, which could be ported to the language of your choice.
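A rough Java equivalent of that idea on Windows, using wmic to list Plug and Play device names and matching them against well-known VM vendor strings; the substring list below is only an illustrative starting point that you would tune for the host types in your environment:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class VmDeviceScan {
    // Substrings commonly seen in VM guest device names; adjust for your hosts.
    private static final String[] HINTS = {
        "vmware", "virtualbox", "vbox", "hyper-v", "qemu", "virtio"
    };

    public static void main(String[] args) throws Exception {
        // Lists friendly names of all PnP devices via the (deprecated but still common) wmic tool.
        Process p = new ProcessBuilder("wmic", "path", "Win32_PnPEntity", "get", "Name").start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                String name = line.trim().toLowerCase();
                for (String hint : HINTS) {
                    if (name.contains(hint)) {
                        System.out.println("VM-looking device: " + line.trim());
                    }
                }
            }
        }
        p.waitFor();
    }
}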

It depends on what you are after:
If the VM is not hiding from you on purpose, you can use some known hook, like looking for VMware drivers, the presence of certain strings in memory, or certain other tell-tale signs (see the sketch after this list for one example).
If the VM really wants you to do special things for it, it will have some obvious hook in place, like modifying the ID of the processor or adding some special registers that you can access to detect it, or a special device at a known location in memory (presuming you can get raw access to the physical memory space of your world). Note that modern machine designs like the IBM Power6 and Sun UltraSparc T1/T2 are designed to ALWAYS run a hypervisor, and never directly on raw hardware. The interface to the "hardware" that an OS uses is in fact the interface to a hypervisor software layer, with no way to get around it. In this case, detection is trivial since it is a constant "yes". This is the likely future direction for all computer systems that can afford the overhead; look at the support in recent designs like the Freescale QorIQ P4080 chip, for example (www.freescale.com/qoriq).
If the VM is intentionally trying to hide, and you are chasing its presence, it is a game of cat-and-mouse where the timing disturbances and different performance profile of a VM are almost always going to give it away. Obviously, this depends on how the VM is implemented and how much hardware support there is in the architecture (I think a zSeries mainframe is much better at hiding the presence of a VM or a stack of VMs under your particular OS than a regular x86 machine is, for example). See http://jakob.engbloms.se/archives/97 for some discussion on this topic. A VM can try to hide, but the detector is quite likely to win if it tries hard enough.
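As a concrete example of the first, cooperative case: on x86 Linux guests, the CPUID "hypervisor present" bit shows up as a hypervisor flag in /proc/cpuinfo, and a hypervisor that isn't hiding will normally leave it set. A minimal sketch in Java:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class CpuinfoHypervisorFlag {
    public static void main(String[] args) throws IOException {
        // Each logical CPU has a "flags" line; the "hypervisor" flag mirrors
        // CPUID leaf 1, ECX bit 31 (the hypervisor-present bit).
        List<String> lines = Files.readAllLines(Paths.get("/proc/cpuinfo"));
        boolean flagged = false;
        for (String l : lines) {
            if (l.startsWith("flags") && (" " + l + " ").contains(" hypervisor ")) {
                flagged = true;
                break;
            }
        }
        System.out.println(flagged
                ? "CPU reports the hypervisor bit: probably running in a VM."
                : "No hypervisor bit: probably bare metal (or a hypervisor hiding it).");
    }
}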

In most cases, you shouldn't try to. You shouldn't care if someone is running your code in a VM, except in a few specific cases.
If you need to, in Linux the most common way is to look at /sys/devices/virtual/dmi/id/product_name, which will list the name of the laptop/mainboard on most real systems, and the hypervisor on most virtual systems. dmidecode | grep Product is another common method, but I think that requires root access.
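A minimal Java version of that check, assuming an x86 Linux guest that exposes DMI information through sysfs; the example values in the comment are common ones, not an exhaustive list:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DmiProductName {
    public static void main(String[] args) {
        try {
            String product = new String(Files.readAllBytes(
                    Paths.get("/sys/devices/virtual/dmi/id/product_name"))).trim();
            // Typical VM values: "VirtualBox", "VMware Virtual Platform", "KVM";
            // on real hardware you get the machine model name instead.
            System.out.println("DMI product name: " + product);
        } catch (IOException e) {
            System.out.println("DMI info not available (non-x86, container, or restricted sysfs).");
        }
    }
}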

I once ran across an assembly code snippet that told you if you were in a VM. I googled but couldn't find the original article.
I did find this though: Detect if your program is running inside a Virtual Machine.
Hope it helps.

Here is a (Java + Windows) solution to identify whether the underlying machine is physical or virtual.
Examples of values reported by virtual machines:
Manufacturer: Xen; Microsoft Corporation; innotek GmbH; Red Hat; VMware, Inc.
Model: HVM domU; Virtual Machine; VirtualBox; KVM; VMware Virtual Platform
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public abstract class OSUtil {

    // Runs a Windows command through cmd and returns its non-empty output lines.
    public static List<String> readCmdOutput(String command) {
        List<String> result = new ArrayList<>();
        try {
            Process p = Runtime.getRuntime().exec("cmd /c " + command);
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!line.trim().isEmpty()) {
                        result.add(line);
                    }
                }
            }
            // Wait only after draining stdout, so a full pipe buffer cannot block the child process.
            p.waitFor();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

    // Returns the given (1-based) line of the command output, or null if there are fewer lines.
    public static String readCmdOutput(String command, int lineNumber) {
        List<String> result = readCmdOutput(command);
        if (result.size() < lineNumber) {
            return null;
        }
        return result.get(lineNumber - 1);
    }

    // WMIC prints a header row first, so the actual value is on line 2.
    public static String getBiosSerial() {
        return readCmdOutput("WMIC BIOS GET SERIALNUMBER", 2);
    }

    public static String getHardwareModel() {
        return readCmdOutput("WMIC COMPUTERSYSTEM GET MODEL", 2);
    }

    public static String getHardwareManufacturer() {
        return readCmdOutput("WMIC COMPUTERSYSTEM GET MANUFACTURER", 2);
    }

    public static void main(String[] args) {
        System.out.println("BIOS Serial: " + getBiosSerial());
        System.out.println("Hardware Model: " + getHardwareModel());
        System.out.println("Hardware Manufacturer: " + getHardwareManufacturer());
    }
}
You can use the output to decide whether it is a VM or a physical machine:
Physical machine output:
BIOS Serial: 2HC3J12
Hardware Model: Inspiron 7570
Hardware Manufacturer: Dell Inc.
Virtual machine output:
BIOS Serial: 0
Hardware Model: VirtualBox
Hardware Manufacturer: innotek GmbH

If the VM does its job well, it should be invisible to the guest that it's being virtualized. However, one can look for other clues.
I would imagine that looking for known drivers or software specific to the VM environment would be the best possible way.
For example, on a VMware guest running Windows, vmxnet.sys would be the network driver, displayed as "VMware accelerated AMD PCNet Adapter".
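One quick way to act on that: check whether VMware driver files are present under System32\drivers. This is just a heuristic sketch; the file list is an assumption covering a few drivers commonly installed in VMware guests and will miss other products:

import java.io.File;

public class VmwareDriverCheck {
    public static void main(String[] args) {
        String driverDir = System.getenv("SystemRoot") + "\\System32\\drivers\\";
        // Driver files commonly installed in VMware guests (not exhaustive).
        String[] candidates = { "vmxnet.sys", "vmxnet3.sys", "vmmouse.sys", "vmhgfs.sys" };
        boolean found = false;
        for (String name : candidates) {
            if (new File(driverDir + name).exists()) {
                System.out.println("Found VMware driver: " + name);
                found = true;
            }
        }
        if (!found) {
            System.out.println("No known VMware drivers found.");
        }
    }
}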

One good example: apparently, if a WMI query for the motherboard manufacturer returns "Microsoft", you're in a VM. Though I believe that particular value points at Microsoft's own Virtual PC / Hyper-V rather than VMware; there are likely different ways to tell for each VM host product.
This article, http://blogs.technet.com/jhoward/archive/2005/07/26/407958.aspx, has some good suggestions and links to a couple of ways to detect whether you are in a VM (VMware and Virtual PC at least).
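If you want to run that motherboard check from code, here is a minimal Java sketch using wmic; which manufacturer string each VM product actually reports is something you would want to verify against the hosts you care about:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class BaseboardCheck {
    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("wmic", "baseboard", "get", "manufacturer").start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                String value = line.trim();
                if (value.isEmpty() || value.equalsIgnoreCase("Manufacturer")) {
                    continue; // skip the header row and blank lines
                }
                System.out.println("Baseboard manufacturer: " + value);
            }
        }
        p.waitFor();
    }
}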

You might be able to identify whether you're in a virtual machine by looking at the MAC address of your network connection. Xen for example typically recommends using a specific range of addresses 00:16:3e:xx:xx:xx.
This isn't guaranteed, as the system administrator can assign whatever MAC address they like.
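In Java you can enumerate the interfaces and compare the first three bytes (the OUI) against known hypervisor prefixes. The prefix list below is a partial example (Xen's 00:16:3E plus a few VMware and VirtualBox OUIs) and, as noted above, can be overridden by the administrator:

import java.net.NetworkInterface;
import java.util.Collections;

public class MacOuiCheck {
    // A few OUIs commonly assigned by hypervisors (not exhaustive, and overridable by the admin).
    private static final String[] VM_OUIS = { "00:16:3E", "00:50:56", "00:0C:29", "00:05:69", "08:00:27" };

    public static void main(String[] args) throws Exception {
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            byte[] mac = nic.getHardwareAddress();
            if (mac == null || mac.length < 3) {
                continue; // loopback, tunnels, etc.
            }
            String oui = String.format("%02X:%02X:%02X", mac[0], mac[1], mac[2]);
            for (String vmOui : VM_OUIS) {
                if (oui.equals(vmOui)) {
                    System.out.println(nic.getName() + " has a VM-style MAC prefix: " + oui);
                }
            }
        }
    }
}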

On Linux systems, you can search for files in /proc that only exist under certain hypervisors.
For example, the existence of /proc/vz/ tells you it's OpenVZ.
Here's a full guide to detecting a VM environment under Linux without having to "drink pills" :)
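A small sketch of that approach; the path-to-product mapping is only a partial example (/proc/vz for OpenVZ comes from above, the others are common but not guaranteed to exist on every setup):

import java.io.File;

public class ProcVmHints {
    public static void main(String[] args) {
        // Paths that typically only exist inside certain virtualization environments.
        String[][] hints = {
            { "/proc/vz", "OpenVZ / Virtuozzo" },
            { "/proc/xen", "Xen (with xenfs mounted)" },
            { "/sys/bus/vmbus", "Hyper-V" },
            { "/dev/vboxguest", "VirtualBox with guest additions" }
        };
        for (String[] hint : hints) {
            if (new File(hint[0]).exists()) {
                System.out.println(hint[0] + " exists: looks like " + hint[1]);
            }
        }
    }
}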

TrapKIT provides ScoopyNG, a tool for VMware identification -- it attempts to work around evasion techniques, but doesn't necessarily target any virtualization software other than VMware. Both source and binaries are available.

Related

Contents of Memory at very small Addresses

Can someone please tell me what the memory contains at very small addresses (0-100), such as address 7, in a Linux-based operating system such as CentOS, and in Windows?
Low Virtual Addresses
For most operating systems, at least the lower half of a virtual address space depends on the process it belongs to (with "kernel space" in the upper portion). Typically, the lowest virtual addresses are reserved for literally nothing, in order to catch dodgy pointers: things like "int *pointer = NULL; foo = pointer[1234];" or "struct myStructure *pointer = NULL; foo = pointer->myField;", where the address actually accessed isn't the address the pointer points to. That way, if any software tries to access those addresses, the CPU generates a page fault to inform the kernel that the software tried to do something very wrong.
Low Physical Addresses
What is at low physical addresses depends on the type of computer (80x86, ARM, MIPS, ...), the firmware (e.g. BIOS, UEFI) and other factors (e.g. how the chipset was configured). Without this information there can't be a specific answer (the only possible answer is "nobody can know").

Device Drivers vs the /dev + glibc Interface

I am looking to have the processor read from I2C and store the data in DDR in an embedded system. While looking at solutions, I have been introduced to Linux device drivers as well as the GNU C Library. It seems like many operations you can perform with the basic Linux drivers can also be performed with basic glibc system calls. I am somewhat confused about when one should be used over the other. Both interfaces can be accessed from user space.
When should I use a kernel driver to access a device like I2C or USB and when should I use the GNU C Library system functions?
The GNU C Library forwards function calls such as read, write, ioctl directly to the kernel. These functions are just very thin wrappers around the system calls. You could call the kernel all by yourself using inline assembly, but that is rarely helpful. So in this sense, all interactions with the kernel driver will go through these glibc functions.
If you have questions about specific interfaces and their trade-offs, you need to name them explicitly.
In ARM:
Privilege states are built into the processor and are changed via dedicated instructions. A memory protection unit, part of the chip, is configured to disallow access to ranges of memory depending on the privilege state.
In the case of the Linux kernel, all physical memory is privileged: the addresses userspace sees are virtual (fake) addresses, translated to real addresses once in privileged mode.
So, to access a privileged memory range, the mechanics are like a function call: you set up parameters indicating what you want, then execute a supervisor call ('SVC'), an instruction that takes control away from the userspace program and hands it to the kernel. The kernel looks at your parameters and does what you need.
The standard library basically makes that whole process easier.
Drivers create interfaces to physical memory addresses and provide an API through the SVC call and whatever 'arguments' it's passed.
If physical memory is not reserved by a driver, the kernel generally won't allow anyone to access it.
Accessing physical memory you're not privileged to will cause a "bus error".
BTW: You can use a driver like UIO to put physical memory into userspace.

Theoretical embedded linux requirements

I come from a programming background in Java, C#, C++ and JavaScript.
I got myself a Raspberry Pi (Model 1 A, the one without Ethernet) and played around with it for a while. I used Raspbian and Arch Linux ARM (since it was said to be small and customizable). Unfortunately, I didn't manage to configure them the way I wanted.
I am trying to build a nice-looking (embedded) system with the only goal being to boot the Raspberry Pi fast and autostart a test application, which will be written in C# (Mono), C++ (Qt), Java (Java Runtime) or something in JavaScript/HTML.
Since I was not able to get rid of all the log messages (I got rid of most), the tty login screen, and the attempts to connect to the network (although the Model 1 A does not have Ethernet at all), booting was ugly and took a long time (over a minute in some cases).
It seems I will have to build a minimal embedded Linux, but I lack the theory of which elements an embedded Linux consists of and how they fit together.
My question: What are the theoretically required parts of an embedded Linux system capable of running Mono, Qt or a Java runtime on a Raspberry Pi?
So far I know the following parts:
the hardware (raspberry pi model 1 A) + sd card
the sd card holds 2 partitions, 1 boot partition (fat32), 1 data partition (ext4)
a boot loader
a linux kernel (which can be optimized to the needs of a raspi)
But what then? My research dead-ended at "use a distro", which is what I don't want. What are the missing pieces between the kernel and starting an application?
An embedded Linux system is composed of many different parts that work together towards the same goal of making things work efficiently.
Ideally, that is not much different from a regular GNU/Linux system, but let's look in detail at the building blocks of a generic embedded system.
For the following explanation I am assuming an ARM architecture. What is written below may differ slightly from implementation to implementation, but it is the usual path for commercial embedded systems.
Blocks of a GNU/Linux Embedded System
Hardware
SoC
The SoC is where all the processing takes place; it is the main processing unit of the whole system and the only place that has "intelligence". It is in charge of driving the other hardware and running your software.
It is made of various heterogeneous sub-blocks:
Core + Caches + MMU - the "real" processor, e.g. ARM Cortex-A9. It's the main thing you will notice when choosing a SoC.
It may be assisted by, e.g., a SIMD coprocessor like NEON.
Internal RAM - generally very small. Used in the first phase of the boot sequence.
Various "Peripherals" - connected to the Core via some interconnect fabric/bus. These can span from a simple ADC to a 3D graphics accelerator. Examples of such IP cores are USB, PCI-E, SGX, etc.
A low-power/real-time coprocessor - some systems offer one or more coprocessors intended either to help the main Core with real-time tasks (e.g. industrial communication buses) or to handle low-power states. Its architecture might or might not be related to the Core's.
External RAM
It is used by the SoC to store temporary data after the system has bootstrapped and during the bootstrap itself. It's usually the memory your embedded system uses during regular operation.
Non-Volatile Memory - optional
It may or may not be present. In your case it's the SD card you mentioned. In other cases it could be a NAND, NOR or SPI dataflash memory (or any combination of them).
When present, it is usually the regular source of data the SoC reads from, and it typically stores all the software components needed for the system to work.
It may not be necessary or useful in some kinds of applications.
External Peripherals
Anything not strictly related to the above.
Could be a MAC ID EEPROM, some relays, a webcam or whatever you can possibly imagine.
Software
First of all, we introduce what is called the bootchain, which is what happens as soon as you power up your SoC and, somehow, tell it to start running. In the following list, the bootchain is the sequence of steps from point 1 to point 4.
Apart from specific/exotic implementations, it is more or less always the same:
Boot ROM code - a small (usually mask-programmed, i.e. impressed at the factory) memory contained in the SoC. The first thing the SoC will do when powered up is execute the code in it.
This code will - generally according to external configuration pins - decide the so-called "boot strategy" or "boot order", i.e. where (and in what order) to look for additional code to be executed. The suitable media are disparate: USB storage devices, USB hosts, SD cards, NANDs, NORs, SPI dataflashes, Ethernet, UARTs, etc.
If none of the above contains something valid, the Boot ROM will usually issue a soft reset of the SoC, and so on.
The code in the medium is not, of course, executed in place: it gets copied into the Internal RAM and then executed.
[The following two are contained in what we will call the bootloader medium.]
1st stage bootloader - it has just been copied by the Boot ROM into the SoC's Internal RAM. It must be tiny enough to fit that memory (usually well under 100 kB). It is needed because the Boot ROM isn't big enough and does not know what kind of External RAM the SoC is attached to. Its most important function is initializing the External RAM and the SoC's external memory interface, as well as other peripherals that may be of interest (e.g. disabling watchdog timers). Once done, it copies the next stage to the External RAM and executes it. Depending on the context, it could be called MLO, SPL or something else.
2nd stage bootloader - the "main" bootloader. Bigger (could be 10x) than the 1st stage one, it completes the initialization of the relevant peripherals (e.g. Ethernet, additional storage media, LCD displays). It allows much more complicated logic for what to do next and offers, depending on the level of sophistication, high-level facilities (filesystem/volume handling, data copy-move-interpretation, LCD output, an interactive console, failsafe policies). Most of the time it loads a Linux kernel (and related pieces) into memory from some medium and passes relevant information to it (e.g., if not embedded in the image, for newer kernels the DTB physical address is put in the r2 register; the kernel then reads the register and retrieves the DTB).
Linux Kernel - the core of the operating system. Depending on the hardware platform it may or may not be a mainline ("official") version. It is usually completed by built-in or loadable modules (from an external source, free or not). It initializes all the hardware needed for the complete system to work according to hardcoded configuration and the DT, enables the MMU, orchestrates the whole system and accesses the hardware exclusively. According to the boot arguments (cmdline, usually passed by the previous stage) and/or to compile options, the kernel tries to mount a root file system. From the rootfs, it will try to load an init (namely /sbin/init, where / is the just-mounted rootfs).
Init and rootfs - init is the first non-kernel task to be run, and has PID 1. It initializes literally everything you need to use your system. In production embedded systems, it also starts the main application. In such systems it is either BusyBox or a custom-crafted application.
More on rootfs and distros
The rootfs contains everything in your GNU/Linux system that is not the kernel (apart from /lib/modules and a few other kernel-related bits that live there too).
It contains all the applications that manage peripherals like Ethernet, WiFi, or external UMTS modems.
It contains the interactive part of the system, the user interface, and everything else you see when you boot a GNU/Linux system, embedded or not.
A "distro" is just a particular collection of userspace (non-kernel) programs and libraries, (usually) verified to work well with each other, put together by a particular group of people.
Desktop distros usually also ship with a custom-tailored kernel and a bootloader. Examples are Fedora, Ubuntu, Debian, etc.
In the general sense of the term, nothing stops you from creating your own distro, which is what happens every time a custom embedded system goes into production: through tools like Yocto or Buildroot (or by hand), you decide the very particular collection (hence distro, distribution) of software fit for the purpose of the system.
To sum up and answer your question exactly: the missing part you are looking for is init and the process of mounting the rootfs. The kernel mounts (i.e., makes available to itself), via its drivers and the passed/built-in parameters, a given volume/partition (the ext4 data partition you mention) at the "/" mount point.
In this volume/partition there is a /sbin/init executable, which the kernel executes.
This is the "Big Bang" of our GNU/Linux userspace system: the place where everything visible starts. Depending on the configuration scripts (usually located under /etc/init.d), the "application" you mention is either run automatically by init or started by the user via a terminal/SSH/whatever that, again, init made it possible for you to use.

How can user-level executables interact with "protected memory" devices in Kernel?

When code is compiled to object files and those are linked together, statically or otherwise, they are placed in a VAD, at least as far as I know for sure on modern Windows operating systems dating back to the late 2000s. My guess, off the top of my head, is that a dynamically linked kernel library is placed in a TLB-paged virtual address space along with the main executable; dynamically linked libraries, like C's standard library, are linked together with the main executable, but static ones, like SDL, are not. So how does the executable access protected memory, like drivers, etc., through the linked kernel library?
Hope my question isn't too confusing.
Basically, what I want to ask, in the shortest question, is:
How does a compiled/linked executable and accompanying libraries/APIs actually reach or interact with the OS API, kernel API, or otherwise system software necessary for hardware manipulation in run-time?
I can only speak for Windows =)
Firstly, a thread has a context, which includes two stacks: a kernel-mode stack and a user-mode stack. The CPU has an instruction for this: SYSENTER. This instruction uses the IA32_SYSENTER_* MSRs, which describe the kernel-mode entry point. When executed, it switches the current privilege level to ring 0, switches to the kernel-mode stack and calls the kernel-mode entry point. On Windows this entry point is called KiFastCallEntry. It basically calls KiSystemService(), which saves the user-mode context on the stack (a KTRAP frame), copies the arguments and calls the appropriate system service (user mode provides an index into the System Service Descriptor Table). After that, KiSystemService restores the user-mode context from the KTRAP frame and executes SYSEXIT, which switches the current privilege level back to 3, switches back to the user-mode stack and transfers control to the caller (basically the ntdll stubs). There are some differences on old XP and 2000 (they use the int 2Eh trap from the IDT) and with AMD / x64.
This isn't a very precise description (e.g. there are several service descriptor tables, etc.). You can read "Windows Internals" or something like http://shift32.wordpress.com/2011/10/14/inside-kisystemservice/ and http://wiki.osdev.org/SYSENTER#Compatability_across_Intel_and_AMD

binary translation

The VMM traps privileged instructions and translates them using binary translation, but what are these special instructions actually translated into?
Thanks
Binary translation is a system virtualization technique.
The sensitive instructions in the guest OS's binary are replaced either by hypervisor calls that safely handle them, or by undefined opcodes that result in a CPU trap. Such a trap is handled by the hypervisor.
On classic x86 CPUs (without hardware virtualization support), some sensitive instructions are not virtualizable: they do not trap when executed unprivileged. Binary translation is a technique to overcome this limitation.
For example, if the guest wants to read or modify the CPU's processor status word, which contains important flags and control bit fields, the host program scans the guest binary for such instructions and replaces them with either a call into the hypervisor or some dummy opcode.
Para-virtualization, on the other hand, is a technique where the source code of the guest OS is modified: all code that accesses system resources is changed to use hypervisor APIs.
See VMware_paravirtualization.pdf, pages 3 and 4.
This approach, depicted in Figure 5, translates kernel code to replace nonvirtualizable instructions with new sequences of instructions that have the intended effect on the virtual hardware.
So the privileged instructions are translated into other instructions, which access the virtual BIOS, memory management, and devices provided by the Virtual Machine Monitor, instead of executing directly on the real hardware.
Exactly what these instructions are, is defined by the VM implementation. Vendors of proprietary virtualization software don't necessarily publish their binary translation techniques.