Why Does the BIOS INT 0x19 Load Bootloader at "0x7C00"? - operating-system

As we know the BIOS Interrupt (INT) 0x19 which searches for a boot signature (0xAA55). Loads and executes our bootloader at 0x7C00 if it found.
My Question : Why 0x7C00? What is the reason ? How to evaluate it through some methods?

Maybe because the MBR is loaded into the memory (by the BIOS) into the 0x7c00 address then int 0x19 searches for the MBR sector signature 0xAA55 on sector 0x7c00
about 0xAA55:
Not a checksum, but more of a signature. It does provide some simple
evidence that some MBR is present.
0xAA55 is also an alternating bit pattern: 1010101001010101
It's often used to help determine if you are on a little-endian or
big-endian system, because it will read as either AA55 or 55AA. I
suspect that is part of why it is put on the end of the MBR.
about 0x7c00:
Check this website out (this might help u in finding the answer): https://www.glamenv-septzen.net/en/view/6

This probably dead but I'm going to answer.
At the start of any bootloader when you set the origin of the segment to 0x7c00 then the registers jump address to that as well. So ideally if you check out some online resources that tell you how to use the int 0x19 command they will guide you on how to jump to another address.
To fix this you would ideally, reset the stack to 0 at the start of every jump to an new address.

( It seems like this duplicates following questions:
What is significance of memory at 0000:7c00 to booting sequence?
Does the BIOS copy the 512-byte bootloader to 0x7c00 )
Inspired by an answer to the former, I would quote two sources:
[Notes] Why is the MBR loaded to 0x7C00 on x86? (Full Edition)
So, first answer the question about the 16KB model from David Bradley's reply:
It had to boot on a 32KB machine. DOS 1.0 required a minimum of 32KB,
so we weren't concerned about attempting a boot in 16KB.
To execute DOS 1.0, at least 32KB is required, so the 16KB model has
not been considered.
Followed by the answer to "Why 32KB-1024B?":
We wanted to leave as much room as possible for the OS to load itself
within the 32KB. The 808x Intel architecture used up the first portion
of the memory range for software interrupts, and the BIOS data area
was after it. So we put the bootstrap load at 0x7C00 (32KB-1KB) to
leave all the room in between for the OS to load. The boot sector was
512 bytes, and when it executes it'll need some room for data and a
stack, so that's the other 512 bytes . So the memory map looks like
this after INT 19H executes:
Why BIOS loads MBR into 0x7C00 in x86 ?
No, that case was out of consideration. One of IBM PC 5150 ROM BIOS
Developer Team Members, Dr. David Bradley says:
"DOS 1.0 required a minimum of 32KB, so we weren't concerned about
attempting a boot in 16KB."
(Note: DOS 1.0 required 16KiB minimum ? or 32KiB ? I couldn't find out
which correct. But, at least, in 1981's early BIOS development, they
supposed that 32KiB is DOS minimum requirements.)
BIOS developer team decided 0x7C00 because:
They wanted to leave as much room as possible for the OS to load
itself within the 32KiB.
8086/8088 used 0x0 - 0x3FF for interrupts vector, and BIOS data area
was after it.
The boot sector was 512 bytes, and stack/data area for boot program
needed more 512 bytes.
So, 0x7C00, the last 1024B of 32KiB was chosen.
Once OS loaded and started, boot sector is never used until power reset.
So, OS and application can use the last 1024B of 32KiB freely.
I hope this answer is based enough to be sure why / how it happened so.

Related

List of all registers to be queried with `r`

I was looking for a way to get the contents of the MXCSR register in WinDbg. Looking up the help for the r command I found a lot of options. I thought I had covered all registers with the command
0:000> rM 0xfe7f
However, the MXCSR register was still not included. So I did a full search in WinDbg help, which did not give me any results (sorry for the German screenshot):
So I continued my search in the Internet and finally found
0:000> r mxcsr
mxcsr=00001f80
I am now wondering whether there are other registers that will not be displayed by rM 0xfe7f but are available anyways. I am especially interested in user mode and x86 and AMD64 architecture.
I had a look at dbgeng.dll (version 10.0.20153.1000) and found a few more registers by trying some strings around offset 7DC340. Based on some of that information, I found the MSDN websites x64 registers and x86 registers.
In addition I found
brto, brfrom, exto, exfrom
The registers zmm0 through zmm15 can be used as zmm0h, possibly for the high half.
The registers xmm0/ymm0 through xmm15/ymm15 can be used as ymm0h and ymm0l, likely for the high and low half.
some more which didn't work either because of my CPU model or because I tried it in user mode instead of kernel mode.

Intel Reset Vector

Possible duplicate: Software initialization code at 0xFFFFFFF0H
When the system boots up (Intel), reset vector is at address 0xFFFFFFF0 (16 bytes less than 4G) (as mentioned in above link). That address contains FAR JUMP to where the BIOS is. I read the answer, comments and referenced link, also did some searching, but still cannot understand how 32-bit address can be map to 16-bit (Real Mode)?
My confusion is that in this link: http://www.starman.vertcomp.com/asm/bios/index.html, author mentioned that address F000:FFF0 (16 bytes less than 1MB) contains JUMP to where the BIOS is.
How 0xFFFFFFF0 gets mapped to F000:FFF0? Does it even gets mapped?
If the computer doesn't have physical 4G of memory, let say it has only 1G, where is the 0xFFFFFFF0 address?
Thanks in advance for help.
When I have a chance I will edit this with references.
The 386 manual states that the address lines 31-20 are high on reset until a JMP is encountered, then they are low again. The mapping isn't really there its more of a hack.
The top if the address space where there is no RAM (in a system with say 1GB of RAM) the chipset will map ROM code rather than RAM to that address. It doesn't make sense to have RAM there since on first power on there would be no code there to execute, so it must be non volatile.

What are Ring 0 and Ring 3 in the context of operating systems?

I've been learning basics about driver development in Windows I keep finding the terms Ring 0 and Ring 3. What do these refer to? Are they the same thing as kernel mode and user mode?
Linux x86 ring usage overview
Understanding how rings are used in Linux will give you a good idea of what they are designed for.
In x86 protected mode, the CPU is always in one of 4 rings. The Linux kernel only uses 0 and 3:
0 for kernel
3 for users
This is the most hard and fast definition of kernel vs userland.
Why Linux does not use rings 1 and 2: CPU Privilege Rings: Why rings 1 and 2 aren't used?
How is the current ring determined?
The current ring is selected by a combination of:
global descriptor table: a in-memory table of GDT entries, and each entry has a field Privl which encodes the ring.
The LGDT instruction sets the address to the current descriptor table.
See also: http://wiki.osdev.org/Global_Descriptor_Table
the segment registers CS, DS, etc., which point to the index of an entry in the GDT.
For example, CS = 0 means the first entry of the GDT is currently active for the executing code.
What can each ring do?
The CPU chip is physically built so that:
ring 0 can do anything
ring 3 cannot run several instructions and write to several registers, most notably:
cannot change its own ring! Otherwise, it could set itself to ring 0 and rings would be useless.
In other words, cannot modify the current segment descriptor, which determines the current ring.
cannot modify the page tables: How does x86 paging work?
In other words, cannot modify the CR3 register, and paging itself prevents modification of the page tables.
This prevents one process from seeing the memory of other processes for security / ease of programming reasons.
cannot register interrupt handlers. Those are configured by writing to memory locations, which is also prevented by paging.
Handlers run in ring 0, and would break the security model.
In other words, cannot use the LGDT and LIDT instructions.
cannot do IO instructions like in and out, and thus have arbitrary hardware accesses.
Otherwise, for example, file permissions would be useless if any program could directly read from disk.
More precisely thanks to Michael Petch: it is actually possible for the OS to allow IO instructions on ring 3, this is actually controlled by the Task state segment.
What is not possible is for ring 3 to give itself permission to do so if it didn't have it in the first place.
Linux always disallows it. See also: Why doesn't Linux use the hardware context switch via the TSS?
How do programs and operating systems transition between rings?
when the CPU is turned on, it starts running the initial program in ring 0 (well kind of, but it is a good approximation). You can think this initial program as being the kernel (but it is normally a bootloader that then calls the kernel still in ring 0).
when a userland process wants the kernel to do something for it like write to a file, it uses an instruction that generates an interrupt such as int 0x80 or syscall to signal the kernel. x86-64 Linux syscall hello world example:
.data
hello_world:
.ascii "hello world\n"
hello_world_len = . - hello_world
.text
.global _start
_start:
/* write */
mov $1, %rax
mov $1, %rdi
mov $hello_world, %rsi
mov $hello_world_len, %rdx
syscall
/* exit */
mov $60, %rax
mov $0, %rdi
syscall
compile and run:
as -o hello_world.o hello_world.S
ld -o hello_world.out hello_world.o
./hello_world.out
GitHub upstream.
When this happens, the CPU calls an interrupt callback handler which the kernel registered at boot time. Here is a concrete baremetal example that registers a handler and uses it.
This handler runs in ring 0, which decides if the kernel will allow this action, do the action, and restart the userland program in ring 3. x86_64
when the exec system call is used (or when the kernel will start /init), the kernel prepares the registers and memory of the new userland process, then it jumps to the entry point and switches the CPU to ring 3
If the program tries to do something naughty like write to a forbidden register or memory address (because of paging), the CPU also calls some kernel callback handler in ring 0.
But since the userland was naughty, the kernel might kill the process this time, or give it a warning with a signal.
When the kernel boots, it setups a hardware clock with some fixed frequency, which generates interrupts periodically.
This hardware clock generates interrupts that run ring 0, and allow it to schedule which userland processes to wake up.
This way, scheduling can happen even if the processes are not making any system calls.
What is the point of having multiple rings?
There are two major advantages of separating kernel and userland:
it is easier to make programs as you are more certain one won't interfere with the other. E.g., one userland process does not have to worry about overwriting the memory of another program because of paging, nor about putting hardware in an invalid state for another process.
it is more secure. E.g. file permissions and memory separation could prevent a hacking app from reading your bank data. This supposes, of course, that you trust the kernel.
How to play around with it?
I've created a bare metal setup that should be a good way to manipulate rings directly: https://github.com/cirosantilli/x86-bare-metal-examples
I didn't have the patience to make a userland example unfortunately, but I did go as far as paging setup, so userland should be feasible. I'd love to see a pull request.
Alternatively, Linux kernel modules run in ring 0, so you can use them to try out privileged operations, e.g. read the control registers: How to access the control registers cr0,cr2,cr3 from a program? Getting segmentation fault
Here is a convenient QEMU + Buildroot setup to try it out without killing your host.
The downside of kernel modules is that other kthreads are running and could interfere with your experiments. But in theory you can take over all interrupt handlers with your kernel module and own the system, that would be an interesting project actually.
Negative rings
While negative rings are not actually referenced in the Intel manual, there are actually CPU modes which have further capabilities than ring 0 itself, and so are a good fit for the "negative ring" name.
One example is the hypervisor mode used in virtualization.
For further details see:
https://security.stackexchange.com/questions/129098/what-is-protection-ring-1
https://security.stackexchange.com/questions/216527/ring-3-exploits-and-existence-of-other-rings
ARM
In ARM, the rings are called Exception Levels instead, but the main ideas remain the same.
There exist 4 exception levels in ARMv8, commonly used as:
EL0: userland
EL1: kernel ("supervisor" in ARM terminology).
Entered with the svc instruction (SuperVisor Call), previously known as swi before unified assembly, which is the instruction used to make Linux system calls. Hello world ARMv8 example:
hello.S
.text
.global _start
_start:
/* write */
mov x0, 1
ldr x1, =msg
ldr x2, =len
mov x8, 64
svc 0
/* exit */
mov x0, 0
mov x8, 93
svc 0
msg:
.ascii "hello syscall v8\n"
len = . - msg
GitHub upstream.
Test it out with QEMU on Ubuntu 16.04:
sudo apt-get install qemu-user gcc-arm-linux-gnueabihf
arm-linux-gnueabihf-as -o hello.o hello.S
arm-linux-gnueabihf-ld -o hello hello.o
qemu-arm hello
Here is a concrete baremetal example that registers an SVC handler and does an SVC call.
EL2: hypervisors, for example Xen.
Entered with the hvc instruction (HyperVisor Call).
A hypervisor is to an OS, what an OS is to userland.
For example, Xen allows you to run multiple OSes such as Linux or Windows on the same system at the same time, and it isolates the OSes from one another for security and ease of debug, just like Linux does for userland programs.
Hypervisors are a key part of today's cloud infrastructure: they allow multiple servers to run on a single hardware, keeping hardware usage always close to 100% and saving a lot of money.
AWS for example used Xen until 2017 when its move to KVM made the news.
EL3: yet another level. TODO example.
Entered with the smc instruction (Secure Mode Call)
The ARMv8 Architecture Reference Model DDI 0487C.a - Chapter D1 - The AArch64 System Level Programmer's Model - Figure D1-1 illustrates this beautifully:
The ARM situation changed a bit with the advent of ARMv8.1 Virtualization Host Extensions (VHE). This extension allows the kernel to run in EL2 efficiently:
VHE was created because in-Linux-kernel virtualization solutions such as KVM have gained ground over Xen (see e.g. AWS' move to KVM mentioned above), because most clients only need Linux VMs, and as you can imagine, being all in a single project, KVM is simpler and potentially more efficient than Xen. So now the host Linux kernel acts as the hypervisor in those cases.
From the image we can see that when the bit E2H of register HCR_EL2 equals 1, then VHE is enabled, and:
the Linux kernel runs in EL2 instead of EL1
when HCR_EL2.TGE == 1, we are a regular host userland program. Using sudo can destroy the host as usual.
when HCR_EL2.TGE == 0 we are a guest OS (e.g. when you run an Ubuntu OS inside QEMU KVM inside the host Ubuntu. Doing sudo cannot destroy the host unless there's a QEMU/host kernel bug.
Note how ARM, maybe due to the benefit of hindsight, has a better naming convention for the privilege levels than x86, without the need for negative levels: 0 being the lower and 3 highest. Higher levels tend to be created more often than lower ones.
The current EL can be queried with the MRS instruction: what is the current execution mode/exception level, etc?
ARM does not require all exception levels to be present to allow for implementations that don't need the feature to save chip area. ARMv8 "Exception levels" says:
An implementation might not include all of the Exception levels. All implementations must include EL0 and EL1.
EL2 and EL3 are optional.
QEMU for example defaults to EL1, but EL2 and EL3 can be enabled with command line options: qemu-system-aarch64 entering el1 when emulating a53 power up
Code snippets tested on Ubuntu 18.10.
Intel processors (x86 and others) allow applications limited powers. To restrict (protect) critical resources like IO, memory, ports etc, CPU in liaison with the OS (Windows in this case) provides privilege levels (0 being most privilege to 3 being least) that map to kernel mode and user mode respectively.
So, the OS runs kernel code in ring 0 - highest privilege level (of 0) provided by the CPU - and user code in ring 3.
For more details, see http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection/

Does booting in EFI mode mean that I shall not have access to BIOS interrupts?

I am attempting to develop a simple OS. I have done some assembly programs before and have had to use INT 10h to display characters to the screen. I understand that UEFI has support for legacy BIOS and may still be able to use INT 10h services. However, If I choose to build a pure UEFI bootable OS, should I avoid using INT 10h? Or am I looking at things the wrong way?
In other words, does the drilled down printf to stdout (screen) end up calling the BIOS INT 10h? Or is the question - "Is SYS_WRITE function call based on INT 10h?" more appropriate?
Will I still have to create a boot sector with 512 bytes and place them as the zeroth sector on a disk (or disk image)? Does the location 0x7c00 have significance anymore?
If your bootloader is a UEFI bootloader (you will know if it is), then you may not use BIOS at all including int 0x10 - you must use UEFI bootservices which provide all of the functionality that BIOS would otherwise provide to legacy boot systems.
If you are not writing a UEFI bootloader, but your hardware is UEFI enabled, your bootloader will be loaded in "legacy" mode, and you will be able to use BIOS as before.
Or to put it another way, your boot image can either be a UEFI bootloader, or it can be a legacy BIOS image. Legacy BIOS images can't use UEFI, and UEFI bootloaders can't use BIOS.
In other words, does the drilled down printf to stdout (screen) end up calling the BIOS INT 10h? Or is the question - "Is SYS_WRITE function call based on INT 10h?" more appropriate?
Depends who wrote your printf function (you're the OS, there's no-one beneath you). If you call Int 0x10 and haven't set up the IDT to handle that as a call into UEFI to write a character to the screen, then you're just using undefined behaviour.
Will I still have to create a boot sector with 512 bytes and place them as the zeroth sector on a disk (or disk image)? Does the location 0x7c00 have significance anymore?
No, and no. UEFI supports much larger bootloaders, and are not loaded at 0x7C00. If you want to know which memory regions have special significance, you must ask UEFI to give you a memory map.
The PC BIOS is not part of the UEFI programming model, so you shouldn't use it in UEFI applications. For printing to the screen for instance you would use a function from the UEFI library.
Reading the first sector from a disk and loading it to 0x7C00 is a BIOS specific booting protocol. UEFI bootloaders are loaded from a filesystem. You can read more about it on the OSDev Wiki.

entering ring 0 from user mode

Most modern operating systems run in the protected mode. Now is it possible for the user programs to enter the "ring 0" by directly setting the corresponding bits in some control registers. Or does it have to go through some syscall.
I believe to access the hardware we need to go through the operating system. But if we know the address of the hardware device can we just write some assembly language code with reference to the location of the device and access it. What happens when we give the address of some hardware device in the assembly language code.
Thanks.
To enter Ring 0, you must perform a system call, and by its nature, the system controls where you go, because for the call you simply give an index to the CPU, and the CPU looks inside a table to know what to call. You can't really get around the security aspect (obviously) to do something else, but maybe this link will help.
You can ask the operating system to map the memory of the hardware device into the memory space of your program. Once that's done, you can just read and write that memory from ring 3. Whether that's possible to do, or how to do that, depends on the operating system or the device.
; set PE bit
mov cr0, eax
or eax, 1
mov eax, cr0
; far jump (cs = selector of code segment)
jmp cs:#pm
#pm:
; Now we are in PM
Taken from Wikipedia.
Basic idea is to set (to 1) 0th bit in cr0 control register.
But if you are already in protected mode (i.e. you are in windows/linux), security restricts you to do it (you are in ring 3 - lowest trust).
So be the first one to get into protected mode.