Intel Reset Vector - operating-system

Possible duplicate: Software initialization code at 0xFFFFFFF0H
When the system boots up (Intel), reset vector is at address 0xFFFFFFF0 (16 bytes less than 4G) (as mentioned in above link). That address contains FAR JUMP to where the BIOS is. I read the answer, comments and referenced link, also did some searching, but still cannot understand how 32-bit address can be map to 16-bit (Real Mode)?
My confusion is that in this link: http://www.starman.vertcomp.com/asm/bios/index.html, author mentioned that address F000:FFF0 (16 bytes less than 1MB) contains JUMP to where the BIOS is.
How 0xFFFFFFF0 gets mapped to F000:FFF0? Does it even gets mapped?
If the computer doesn't have physical 4G of memory, let say it has only 1G, where is the 0xFFFFFFF0 address?
Thanks in advance for help.

When I have a chance I will edit this with references.
The 386 manual states that the address lines 31-20 are high on reset until a JMP is encountered, then they are low again. The mapping isn't really there its more of a hack.
The top if the address space where there is no RAM (in a system with say 1GB of RAM) the chipset will map ROM code rather than RAM to that address. It doesn't make sense to have RAM there since on first power on there would be no code there to execute, so it must be non volatile.

Related

How does a microprocessor process an instruction set

For example if I have an 8085 microprocessor.
And below are the instructions.
MVI A, 52H : Store 32H in the accumulator
STA 4000H : Copy accumulator contents at address 4000H
HLT : Terminate program execution
How does the microprocessor understand the commands MVI, STA, HLT.
If I am correct, HLT has 76 as an opcode. In that case, how does a microprocessor recognize 76 as instruction rather than data?
It depends on the processor. Some have fixed-length instructions, in which case the instruction bytes are at every <n> locations, whereas some have variable-length instructions, so that which words/bytes are opcodes and which are arguments depends on what came before. To further complicate this, some processors have certain instructions which must be aligned or padded to certain addresses. Yikes.
The 8085 has variable length instructions. So you have to start at the PC and interpret each instruction based on its length to know where the next begins, and which bytes are data/arguments as opposed to opcodes.
A value of 76 can represent anything, it depends on how it is being interpreted.
In the case of a micro processor, there is a special register that contains the memory address of the next instruction to execute. This data is then loaded and interpreted as an instruction to execute. If the address of the next instruction contains the value 76, this will be interpreted as HLT (in your case). Obviously a different processor might interpret 76 as a different instruction.
On the other hand, if the data from this address is interpreted as a numerical value, it will just mean 76.
It's just that when the processor finds 76 as a part of a program that it is executing, that is, its "program counter" points to the place in memory where the 76 is, it will interpret it as an instruction.
If the processor is then told by its program to load that same 76, from some other place in memory or even from the same place in memory, into a register and use it for calculations, it is interpeted as data.
This is the so called Von Neumann architecture, where program and data are stored in the same computer memory. It all looks the same, but the processor is told by its program which content to treat as data.

Difference between machine language, binary code and a binary file

I'm studying programming and in many sources I see the concepts: "machine language", "binary code" and "binary file". The distinction between these three is unclear to me, because according to my understanding machine language means the raw language that a computer can understand i.e. sequences of 0s and 1s.
Now if machine language is a sequence of 0s and 1s and binary code is also a sequence of 0s and 1s then does machine language = binary code?
What about binary file? What really is a binary file? To me the word "binary file" means a file, which consists of binary code. So for example, if my file was:
010010101010010
010010100110100
010101100111010
010101010101011
010101010100101
010101010010111
Would this be a binary file? If I google binary file and see Wikipedia I see this example picture of binary file which confuses me (it's not in binary?....)
Where is my confusion happening? Am I mixing file encoding here or what? If I were to ask one to SHOW me what is machine language, binary code and binary file, what would they be? =) I guess the distinction is too abstract to me.
Thnx for any help! =)
UPDATE:
In Python for example, there is one phrase in a file I/O tutorial, which I don't understand: Opens a file for reading only in binary format. What does reading a file in binary format mean?
Machine code and binary are the same - a number system with base 2 - either a 1 or 0. But machine code can also be expressed in hex-format (hexadecimal) - a number system with base 16. The binary system and hex are very interrelated with each other, its easy to convert from binary to hex and convert back from hex to binary. And because hex is much more readable and useful than binary - it's often used and shown. For instance in the picture above in your question -uses hex-numbers!
Let say you have the binary sequence 1001111000001010 - it can easily be converted to hex by grouping in blocks - each block consisting of four bits.
1001 1110 0000 1010 => 9 14 0 10 which in hex becomes: 9E0A.
One can agree that 9E0A is much more readable than the binary - and hex is what you see in the image.
I'm honestly surprised to not see the information I was looking for, looking back though, I guess the title of this thread isn't fully appropriate to the question the OP was asking.
You guys all say "Machine Code is a bunch of numbers".
Sure, the "CODE" is a bunch of numbers, but what people are wondering (I'm guessing) is "what actually is happening physically?"
I'm quite a novice when it comes to programming, but I understand enough to feel confident in 'roughly' answering this question.
Machine code, to the actual circuitry, isn't numbers or values.
Machine code is a bunch of voltage gates that are either open or closed, and depending on what they're connected to, a certain light will flicker at a certain time etc.
I'm guessing that the "machine code" dictates the pathway and timing for specific electrical signals that will travel to reach their overall destination.
So for 010101, 3 voltage gates are closed (The 0's), 3 are open (The 1's)
I know I'm close to the right answer here, but I also know it's much more sophisticated - because I can imagine that which I don't know.
010101 would be easy instructions for a simple circuit, but what I can't begin to fathom is how a complex computer processes all of the information.
So I guess let's break it down?
x-Bit-processors tell how many bits the processor can process at once.
A bit is either 1 or 0, "On" or "Off", "Open" or "Closed"
so 32-bit processors process "10101010 10101010 10101010 10101010" - this many bits at once.
A processor is an "integrated circuit", which is like a compact circuit board, containing resistors/capacitors/transistors and some memory. I'm not sure if processors have resistors but I know you'll usually find a ton of them located around the actual processor on the circuit board
Anyways, a transistor is a switch so if it receives a 1, it sends current in one direction, or if it receives a 0, it'll send current in a different direction... (or something like that)
So I imagine that as machine code goes... the segment of code the processor receives changes the voltage channels in such a way that it sends a signal to another part of the computer (why do you think processors have so many pins?), probably another integrated circuit more specialized to a specific task.
That integrated circuit then receives a chunk of code, let's say 2 to 4 bits 01 or 1100 or something, which further defines where the final destination of the signal will end up, which might be straight back to the processor, or possibly to some output device.
Machine code is a very efficient way of taking a circuit and connecting it to a lightbulb, and then taking that lightbulb out of the circuit and switching the circuit over to a different lightbulb
Memory in a computer is highly necessary because otherwise to get your computer to do anything, you would need to type out everything (in machine code). Instead, all of the 1's and 0's are stored inside some storage device, either a spinning hard disk with a magnetic head pin that 'reads' 1's or 0's based on the charge of the disk, or a flash memory device that uses a series of transistors, where sending a voltage through elicits 1's and 0's (I'm not fully aware how flash memory works)
Fortunately, someone took the time to think up a different base number system for programming (hex), and a way to compile those numbers (translate them) back into binary. And then all software programs have branched out from there.
Each key on the keyboard creates a specific signal in binary that translates to
a bunch of switches being turned on or off using certain voltages, so that a current could be run through the specific individual pixels on your screen that create "1" or "0" or "F", or all the characters of this post.
So I wonder, how does a program 'program', or 'make' the computer 'do' something... Rather, how does a compiler compile a program of a code different from binary?
It's hard to think about now because I'm extremely tired (so I won't try) but also because EVERYTHING you do on a computer is because of some program.
There are actively running programs (processes) in task manager. These keep your computer screen looking the way you've become accustomed, and also allow for the screen to be manipulated as if to say the pictures on the screen were real-life objects. (They aren't, they're just pictures, even your mouse cursor)
(Ok I'm done. enough editing and elongating my thoughts, it's time for bed)
Also, what I don't really get is how 0's are 'read' by the computer.
It seems that a '0' must not be a 'lack of voltage', rather, it must be some other type of signal
Where perhaps something like 1 volt = 1, and 0.5 volts = 0. Some distinguishable difference between currents in a circuit that would still send a signal, but could be the difference between opening and closing a specific circuit.
If I'm close to right about any of this, serious props to the computer engineers of the world, the level of sophistication is mouthwatering. I hope to know everything about technology someday. For now I'm just trying to get through arduino.
Lastly... something I've wondered about... would it even be possible to program today's computers without the use of another computer?
Machine language is a low-level programming language that generally consists entirely of numbers. Because they are just numbers, they can be viewed in binary, octal, decimal, hexadecimal, or any other way. Dave4723 gave a more thorough explanation in his answer.
Binary code isn't a very well-defined technical term, but it could mean any information represented by a sequence of 1s and 0s, or it could mean code in a machine language, or it could mean something else depending on context.
Technically, all files are stored in binary, we just don't usually look at the binary when we view a file. However, the term binary file is usually used to refer to any non-text file; e.g. an .exe, a .png, etc.
You have to understand how a computer works in its basic principles and this will clear things up for you... Therefore I recommend on reading into stuff like Neumann Architecture
Basically in a very simple computer you only have one memory like an array
which has instructions for your processor, the data and everything is a binary numbers.
Your program starts at a certain place in your memory and reads the first number...
so here comes the twist: these numbers can be instructions or data.
Your processor reads these numbers and interprets them as instructions
Example: the start address is 0
in 0 is a instruction like "read value from address 120 into the ALU (Math-Unit)
then it steps to address 1
"read value from address 121 into ALU"
then it steps to address 2
"subtract numbers in ALU"
then it steps to address 3
"if ALU-Value is smaller than zero go to address 10"
it is not smaller than zero so it steps to address 4
"go to address 20"
you see that this is a basic if(a < b)
You can write these instructions as numbers and they can be run by your processor but because nobody wants to do this work (that was what they did with punchcards in the 60s)
assembler was invented...
that looks like:
add 10 ,11, 20 // load var from address 10 and 11; run addition and store into address 20
In Conclusion:
Assembler (processor instructions) can be called binary because it's stored in plain numbers
But everything else can be a Binary file, too.
In reality if you have a simple .exe file it is both... If you have variables in there like a = 10 and b = 20, these values can be stored some where between if clauses and for loops... It depends on the compiler where it put these
But if you have a complex 3D-model it can be stored in a separate file with no executable code in it...
I hope it helps to clear things up a little.

Why Does the BIOS INT 0x19 Load Bootloader at "0x7C00"?

As we know the BIOS Interrupt (INT) 0x19 which searches for a boot signature (0xAA55). Loads and executes our bootloader at 0x7C00 if it found.
My Question : Why 0x7C00? What is the reason ? How to evaluate it through some methods?
Maybe because the MBR is loaded into the memory (by the BIOS) into the 0x7c00 address then int 0x19 searches for the MBR sector signature 0xAA55 on sector 0x7c00
about 0xAA55:
Not a checksum, but more of a signature. It does provide some simple
evidence that some MBR is present.
0xAA55 is also an alternating bit pattern: 1010101001010101
It's often used to help determine if you are on a little-endian or
big-endian system, because it will read as either AA55 or 55AA. I
suspect that is part of why it is put on the end of the MBR.
about 0x7c00:
Check this website out (this might help u in finding the answer): https://www.glamenv-septzen.net/en/view/6
This probably dead but I'm going to answer.
At the start of any bootloader when you set the origin of the segment to 0x7c00 then the registers jump address to that as well. So ideally if you check out some online resources that tell you how to use the int 0x19 command they will guide you on how to jump to another address.
To fix this you would ideally, reset the stack to 0 at the start of every jump to an new address.
( It seems like this duplicates following questions:
What is significance of memory at 0000:7c00 to booting sequence?
Does the BIOS copy the 512-byte bootloader to 0x7c00 )
Inspired by an answer to the former, I would quote two sources:
[Notes] Why is the MBR loaded to 0x7C00 on x86? (Full Edition)
So, first answer the question about the 16KB model from David Bradley's reply:
It had to boot on a 32KB machine. DOS 1.0 required a minimum of 32KB,
so we weren't concerned about attempting a boot in 16KB.
To execute DOS 1.0, at least 32KB is required, so the 16KB model has
not been considered.
Followed by the answer to "Why 32KB-1024B?":
We wanted to leave as much room as possible for the OS to load itself
within the 32KB. The 808x Intel architecture used up the first portion
of the memory range for software interrupts, and the BIOS data area
was after it. So we put the bootstrap load at 0x7C00 (32KB-1KB) to
leave all the room in between for the OS to load. The boot sector was
512 bytes, and when it executes it'll need some room for data and a
stack, so that's the other 512 bytes . So the memory map looks like
this after INT 19H executes:
Why BIOS loads MBR into 0x7C00 in x86 ?
No, that case was out of consideration. One of IBM PC 5150 ROM BIOS
Developer Team Members, Dr. David Bradley says:
"DOS 1.0 required a minimum of 32KB, so we weren't concerned about
attempting a boot in 16KB."
(Note: DOS 1.0 required 16KiB minimum ? or 32KiB ? I couldn't find out
which correct. But, at least, in 1981's early BIOS development, they
supposed that 32KiB is DOS minimum requirements.)
BIOS developer team decided 0x7C00 because:
They wanted to leave as much room as possible for the OS to load
itself within the 32KiB.
8086/8088 used 0x0 - 0x3FF for interrupts vector, and BIOS data area
was after it.
The boot sector was 512 bytes, and stack/data area for boot program
needed more 512 bytes.
So, 0x7C00, the last 1024B of 32KiB was chosen.
Once OS loaded and started, boot sector is never used until power reset.
So, OS and application can use the last 1024B of 32KiB freely.
I hope this answer is based enough to be sure why / how it happened so.

entering ring 0 from user mode

Most modern operating systems run in the protected mode. Now is it possible for the user programs to enter the "ring 0" by directly setting the corresponding bits in some control registers. Or does it have to go through some syscall.
I believe to access the hardware we need to go through the operating system. But if we know the address of the hardware device can we just write some assembly language code with reference to the location of the device and access it. What happens when we give the address of some hardware device in the assembly language code.
Thanks.
To enter Ring 0, you must perform a system call, and by its nature, the system controls where you go, because for the call you simply give an index to the CPU, and the CPU looks inside a table to know what to call. You can't really get around the security aspect (obviously) to do something else, but maybe this link will help.
You can ask the operating system to map the memory of the hardware device into the memory space of your program. Once that's done, you can just read and write that memory from ring 3. Whether that's possible to do, or how to do that, depends on the operating system or the device.
; set PE bit
mov cr0, eax
or eax, 1
mov eax, cr0
; far jump (cs = selector of code segment)
jmp cs:#pm
#pm:
; Now we are in PM
Taken from Wikipedia.
Basic idea is to set (to 1) 0th bit in cr0 control register.
But if you are already in protected mode (i.e. you are in windows/linux), security restricts you to do it (you are in ring 3 - lowest trust).
So be the first one to get into protected mode.

Mixing 32 bit and 16 bit code with nasm

This is a low-level systems question.
I need to mix 32 bit and 16 bit code because I'm trying to return to real-mode from protected mode. As a bit of background information, my code is doing this just after GRUB boots so I don't have any pesky operating system to tell me what I can and can't do.
Anyway, I use [BITS 32] and [BITS 16] with my assembly to tell nasm which types of operations it should use, but when I test my code use bochs it looks like the for some operations bochs isn't executing the code that I wrote. It looks like the assembler is sticking in extras 0x66 and 0x67's which confuses bochs.
So, how do I get nasm to successfully assemble code where I mix 32 bit and 16 bit code in the same file? Is there some kind of trick?
­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­
The problem turned out to be that I wasn't setting up my descriptor tables correctly. I had one bit flipped wrong so instead of going to 16-bit mode I was going to 32-bit mode (with segments that happened to have a limit of one meg).
Thanks for the suggestions!
Terry
The 0x66 and 0x67 are opcodes that are used to indicate that the following opcode should be interpreted as a non-default bitness. More specifically, (and according to this link),
"When NASM is in BITS 16 mode, instructions which use 32-bit data are prefixed with an 0x66 byte, and those referring to 32-bit addresses have an 0x67 prefix. In BITS 32 mode, the reverse is true: 32-bit instructions require no prefixes, whereas instructions using 16-bit data need an 0x66 and those working on 16-bit addresses need an 0x67."
This suggests that it's bochs that at fault.
You weren't kidding about this being low-level!
Have you checked the generated opcodes / operands to make sure that nasm is honoring your BITS directives correctly? Also check to make sure the jump targets are correct - maybe nasm is using the wrong offsets.
If it's not a bug in nasm, maybe there is a bug in bochs. I can't imagine that people switch back to 16-bit mode from 32-bit mode very often anymore.
If you're in real mode your default size is implicitly 16 bits, so you should use BITS 16 mode. This way if you need a 32-bit operand size you add the 0x66 prefix, and for a 32-bit address size you add the 0x67 prefix.
Look at the Intel IA-32 Software Developer's Guide, Volume 3, Chapter 16 (MIXING 16-BIT AND 32-BIT CODE; the chapter number might change according to the edition of the book):
Real-address mode, virtual-8086 mode, and SMM are native 16-bit modes.
The BITS 32 directive will only confuse the assembler if you use it outside of Protected Mode or Long Mode.