Mixing 32-bit and 16-bit code with NASM

This is a low-level systems question.
I need to mix 32-bit and 16-bit code because I'm trying to return to real mode from protected mode. As a bit of background information, my code is doing this just after GRUB boots, so I don't have any pesky operating system to tell me what I can and can't do.
Anyway, I use [BITS 32] and [BITS 16] in my assembly to tell NASM which type of operations it should use, but when I test my code with Bochs it looks like for some operations Bochs isn't executing the code that I wrote. It looks like the assembler is sticking in extra 0x66 and 0x67 bytes, which confuses Bochs.
So, how do I get NASM to successfully assemble code where I mix 32-bit and 16-bit code in the same file? Is there some kind of trick?

The problem turned out to be that I wasn't setting up my descriptor tables correctly. I had one bit set wrong, so instead of going to 16-bit mode I was going to 32-bit mode (with segments that happened to have a limit of one megabyte).
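For anyone who hits the same thing: the bit in question is most likely the D/B (default operation size) flag in the code-segment descriptor. A minimal sketch of the two variants as NASM data definitions (hypothetical flat descriptors with base 0; only the flags nibble differs):

; 32-bit code segment: flags nibble 0xC (G=1, D/B=1) -> 4 GiB limit, 32-bit default
gdt_code32: dw 0xFFFF                   ; limit 15:0
            dw 0x0000                   ; base 15:0
            db 0x00, 0x9A, 0xCF, 0x00   ; base 23:16, access, flags+limit 19:16, base 31:24

; 16-bit code segment: flags nibble 0x0 (G=0, D/B=0) -> 64 KiB limit, 16-bit default
gdt_code16: dw 0xFFFF
            dw 0x0000
            db 0x00, 0x9A, 0x0F, 0x00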
Thanks for the suggestions!
Terry

The 0x66 and 0x67 bytes are prefixes used to indicate that the following instruction should be interpreted with a non-default operand or address size. More specifically (according to this link):
"When NASM is in BITS 16 mode, instructions which use 32-bit data are prefixed with an 0x66 byte, and those referring to 32-bit addresses have an 0x67 prefix. In BITS 32 mode, the reverse is true: 32-bit instructions require no prefixes, whereas instructions using 16-bit data need an 0x66 and those working on 16-bit addresses need an 0x67."
This suggests that it's Bochs that's at fault.
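As a quick sanity check, assembling a couple of instructions under each directive shows where the prefixes come from (a sketch; the bytes follow from the rule quoted above):

[BITS 16]
mov ax, 1            ; B8 01 00             - native 16-bit operand, no prefix
mov eax, 1           ; 66 B8 01 00 00 00    - 0x66 operand-size prefix added

[BITS 32]
mov eax, 1           ; B8 01 00 00 00       - native 32-bit operand, no prefix
mov ax, 1            ; 66 B8 01 00          - 0x66 operand-size prefix added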

You weren't kidding about this being low-level!
Have you checked the generated opcodes / operands to make sure that nasm is honoring your BITS directives correctly? Also check to make sure the jump targets are correct - maybe nasm is using the wrong offsets.
If it's not a bug in nasm, maybe there is a bug in bochs. I can't imagine that people switch back to 16-bit mode from 32-bit mode very often anymore.

If you're in real mode your default size is implicitly 16 bits, so you should use BITS 16 mode. That way, if an instruction needs a 32-bit operand size NASM adds the 0x66 prefix, and if it needs a 32-bit address size NASM adds the 0x67 prefix.
Look at the Intel IA-32 Software Developer's Manual, Volume 3, Chapter 16 (MIXING 16-BIT AND 32-BIT CODE; the chapter number may change between editions):
Real-address mode, virtual-8086 mode, and SMM are native 16-bit modes.
The BITS 32 directive will only cause trouble if you use it for code that actually runs outside of protected mode or long mode.
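For the switch back to real mode itself, the usual sequence looks something like the sketch below. It assumes hypothetical GDT selectors 0x18 and 0x20 describing 16-bit code and data segments with 64 KiB limits, that interrupts are disabled, and that a real-mode IVT and stack are already in place:

[BITS 32]
    jmp 0x18:pm16             ; far jump into a 16-bit protected-mode code segment

[BITS 16]
pm16:
    mov ax, 0x20              ; load 16-bit data segments so the cached
    mov ds, ax                ; descriptors are real-mode compatible
    mov es, ax
    mov ss, ax
    mov eax, cr0
    and eax, 0xFFFFFFFE       ; clear PE (bit 0) to leave protected mode
    mov cr0, eax
    jmp 0x0000:real_entry     ; far jump reloads CS with a real-mode segment

real_entry:
    xor ax, ax
    mov ds, ax
    mov ss, ax
    mov sp, 0x7C00            ; hypothetical real-mode stack pointer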

Related

Compiled COM file from an empty project is over 10 KiB in Turbo Pascal

I have a problem with the size of binaries produced by old Pascal versions.
We need very small, simple programs. We would like to use Turbo Pascal 2 under MS-DOS (higher versions have the same problem) to compile COM files. But the size is always 10 KiB or larger, even for an empty project like:
begin
end.
The compiled file is 10,052 bytes and I do not understand why. I tried compiler options and changed the stack/heap settings, with no results.
Compilation output:
Compiling --> c:emtpy.com
3 lines
code: 0002 paragraphs (32 bytes), 0D7B paragraphs free
data: 0000 paragraphs (0 bytes), 0FE7 paragraphs free
stack/heap: 0400 paragraphs (16384 bytes) (minimum)
4000 paragraphs (262144 bytes) (maximum)
Is it possible to get a smaller COM file, and is it possible to convert the Pascal code automatically into ASM code?
Any version of Turbo Pascal up to 3.02 will produce an executable file that includes the whole Run-Time Library. As you discovered, its size for TP2 on your target operating system is about 10,050 bytes.
We need very small simple programs.
... then Turbo Pascal 2 is not a good option to start with. Better to try any version from 4 up, if you want to stick with Pascal and are targeting MS-DOS. Or switch to C or assembly language, which will be able to produce smaller executables, at the cost of being more difficult to develop in.
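For scale, a complete "hello world" .COM program written directly in assembly (a sketch in NASM syntax, built with nasm -f bin hello.asm -o hello.com) comes out at around 20 bytes:

bits 16
org 0x100                 ; .COM images are loaded at offset 0x100
    mov dx, msg
    mov ah, 0x09          ; DOS int 21h, AH=09h: print $-terminated string
    int 0x21
    mov ax, 0x4C00        ; DOS int 21h, AH=4Ch: terminate with return code 0
    int 0x21
msg db 'Hello!$'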
[...] is it possible to convert the Pascal code automatically into ASM code.
It can be done using Turbo Pascal, but it is not practical (basically you need a disassembler; IDA is such a tool used nowadays, though the version you would need is not free). Also, you won't gain much by shaving some bytes off an already compiled application: you will end up much better off starting straight in assembly language.
Anyway, the best course is to drop Turbo Pascal and go to Free Pascal, whose compiler produces .s files written in assembly language (although maybe not in the same syntax you are used to). There is (was?) a sub-project targeting the 16-bit i8086 processor, which seems reasonably up to date (I never tried it).
Update
You mentioned in a comment that you really need the .COM format (which Turbo Pascal 4-7 does not support directly). The problem then is the memory model. .COM programs natively use the so-called tiny model (16-bit code and data segments overlapping at the same location), although that can be somewhat stretched for applications (not TSRs), which can grab all the available memory. TP 1-3 for MS-DOS uses a variant of the compact model (data pointers are 32-bit "far" but code pointers are 16-bit "near", which caps code at 64 KiB); TP 4-7 instead use the large model, where each unit has a separate code segment. It would be possible to rewrite the Run-Time Library to use only one code segment, then relink the TP-produced executables to convert the FAR CALLs into NEAR CALLs (that part is easy, since all the information is in the relocation table of the .EXE). However, you will be home sooner using Free Pascal directly: it natively supports the tiny memory model and can produce .COM executables, while still being highly compatible with Turbo Pascal.

Do instruction sets like x86 get updated? If so, how is backwards compatibility guaranteed?

How would an older processor know how to decode new instructions it doesn't know about?
New instructions use previously-unused opcodes, or other ways to find more "coding space" (e.g. prefixes that didn't previously mean anything for a given opcode).
How would an older processor know how to decode new instructions it doesn't know about?
It won't. A binary that wants to work on old CPUs as well as new ones has to either limit itself to a baseline feature set, or detect CPU features at run-time and set function pointers to select versions of a few important functions. (aka "runtime dispatching".)
x86 has a good mechanism (the cpuid instruction) for letting code query CPU features without any kernel support needed. Some other ISAs need CPU info hard-coded into the OS or detected via I/O accesses, so the only viable method is for the kernel to export the info to user-space in an OS-specific way.
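A minimal sketch of that run-time dispatch pattern in NASM (32-bit code; it checks the SSE2 bit, CPUID leaf 1, EDX bit 26, and assumes the CPU is new enough to have CPUID at all):

[BITS 32]
check_sse2:
    mov eax, 1
    cpuid                     ; clobbers EAX, EBX, ECX, EDX
    test edx, 1 << 26         ; CPUID.01H:EDX bit 26 = SSE2
    jz .baseline
    ; ... point the dispatch function pointer at the SSE2 implementation ...
    ret
.baseline:
    ; ... point it at the baseline implementation instead ...
    ret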
Or if you're building from source on a machine with a newer CPU and don't care about old CPUs, you can use gcc -O3 -march=native to let GCC use all the ISA extensions the current CPU supports, making a binary that will fault on old CPUs. (e.g. x86 #UD (UnDefined instruction) hardware exception, resulting in the OS delivering a SIGILL or equivalent to the process.)
Or in some cases, a new instruction may decode as something else on old CPUs, e.g. x86 lzcnt decodes as bsr with an ignored REP prefix on older CPUs, because x86 has basically no unused opcodes left (in 32-bit mode). Sometimes this "decode as something else" is actually useful as a graceful fallback to allow transparent use of new instructions, notably pause = rep nop = nop on old CPUs that don't know about it. So code can use it in spin loops without checking CPUID.
-march=native is common for servers where you're setting things up to just run on that server, not making a binary to distribute.
Most of the time, the old processor will raise an "undefined instruction" exception, because the instruction is simply not defined on the old CPU.
In rarer cases, the instruction will execute as a different instruction. This happens when the new instruction is encoded via a mandatory prefix. As an example, PAUSE is encoded as REP NOP, so it executes as a plain NOP (doing nothing) on older CPUs.
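In NASM terms (a small sketch), the two spellings of that instruction produce the same bytes:

[BITS 32]
    pause                     ; assembles to F3 90, i.e. a REP prefix on NOP
    db 0xF3, 0x90             ; same bytes written out by hand; old CPUs run this as NOP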

Why are there several encodings for one instruction in ARMv7?

I am currently trying to implement a disassembler for the ARM Cortex-A9, which implements the ARMv7 instruction set.
For that I am using the manual "DDI0406C_b_arm_architecture_reference_manual.pdf", which can be downloaded here (after registering on the ARM website):
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.set.architecture/index.html
In this manual, in part A8.8 with the instruction details, I can't understand why there are several encodings for one instruction (like A1, A2, ...), all of which seem to be implemented in ARMv7.
Also, as the ARM Cortex-A9 uses Thumb-2, does it implement the A1/A2/... encodings as well, or only the T1/T2/... ones?
I have read all the parts of this manual related to encodings, but I still don't understand how to know which encoding is used in a program.
Different encodings of an instruction do functionally different things.
One example of different encodings in use is A8.9.12 ADR:
This instruction adds an immediate value to the PC value to form a PC-relative address, and writes the result to the destination register.
If the instruction is encoded as A1, the offset is interpreted as zero or positive; if it is encoded as A2, the offset is negative.
Another example is A8.8.132 POP
If the list contains more than one register, the instruction is assembled to encoding A1. If the list contains exactly one register, the instruction is assembled to encoding A2.
I can imagine the different POP encodings exist to allow different microcode, probably for performance reasons.
For the second part of your question: the Cortex-A9 is an ARMv7-A architecture CPU, and it supports all the instructions specified in the manual you pointed to. Maybe you should also read the Cortex-A9 Technical Reference Manual.
There is no way to really distinguish between ARM and Thumb inside the instruction stream itself. You can only decide based on the way a function gets called (if the lowest bit of the target address is set to 1 then it's Thumb, otherwise ARM).
The ARM encodings are quite "stable": you'll mostly find just a single A1 encoding. BLX is an example where an A2 encoding is given, but this is mainly because the new ARM ARM is structured differently from the older ones. BL and BLX were two different instructions; BLX was added in additional instruction space (the upper 4 bits, which are normally used for conditions, are set to 1111, which in ARM prior to v5 meant "never execute").
For the Thumb encodings it's different: there are a lot of them, because they had to be squeezed into a more compressed instruction space. Page A6-220 has information about how to decide whether a Thumb instruction consists of two halfwords or just one.
The Ax encodings are ARM: when the processor is in ARM mode, it will decode the bits it finds using those encodings. If there is more than one (A1, A2, ...), it should be obvious that there is a different feature or reason for each; they can be considered separate instructions (look at the overuse of the mov mnemonic in x86, for example, which has many encodings). Treat each encoding as a separate "instruction".
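To illustrate that x86 comparison: the same mov mnemonic picks up quite different encodings depending on its operands (NASM, 32-bit, a sketch):

[BITS 32]
    mov eax, 1                ; B8 01 00 00 00   - mov r32, imm32
    mov ebx, eax              ; 89 C3            - mov r/m32, r32
    mov eax, [ebx]            ; 8B 03            - mov r32, r/m32
    mov eax, cr0              ; 0F 20 C0         - mov r32, CR0 (a different opcode entirely)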
Then there are the Tx variants; those are Thumb and Thumb-2 extensions. The Thumb encodings are all 16 bits (BL can be decoded as two separate 16-bit instructions), and the descriptions below them indicate "all Thumb variants" or "ARMv4T to the present" or some such language. The Thumb-2 extensions are all 32 bits, with the first 16 bits being an undefined instruction in the original Thumb world. These have more limitations on which architectures support them.
You are not going to be able to create a complete disassembler for one of these processors, for the same reason you can't for x86 or many other processors (all?). If you assume that all the instructions are in one mode (ARM, or Thumb, or Thumb+Thumb2) with no mode mixing (ARM+Thumb), then you can, because everything is a fixed instruction length: you can simply disassemble everything, data and code alike, and you won't run into any problems. To disassemble mixed-mode code you basically have to emulate/execute the instructions and follow the instruction flow (just like a disassembler for a variable-word-length instruction set) to try to find the transitions. The problem, of course, is that the transitions take at least two instructions: load a register, then BX to that register. Sometimes there is math involved in computing the target address, and there is no guarantee that the address computation or the load happens in the instruction right before the BX. So you could do some of that and still get a long way through disassembling the program.
If Thumb-2 is supported/allowed on the processor you are using, then you also have the variable-instruction-length problem whenever you detect an entry point into Thumb code, and unless you are already doing so you have to follow the execution of the code to determine where instructions start (elementary variable-instruction-length disassembly stuff).
The combination of the Technical Reference Manual and the Architecture Reference Manual will tell you whether the architecture, and the implementation of that architecture (TRM), allow ARM and Thumb modes. I would assume an A9 supports ARM, Thumb, and Thumb-2, all three.
The Cortex-M family is the only one so far that is limited to not supporting ARM mode, and its Thumb-2 support varies widely: the Cortex-M0 (and M1) are ARMv6-M while the M3 and M4 are ARMv7-M (from a few dozen (ARMv6-M) instructions to many dozen Thumb-2 extensions in ARMv7-M). There are separate Architecture Reference Manuals specifically for the -M variants, the ARMv7-M vs. the ARMv7-AR manuals for example.

Interested in VM for lisp-like languages on 8-bit system

I'm looking for recommended virtual machines that can run on an 8-bit microprocessor AND support dynamic languages. I'd like a VM solution because I perceive benefits in terms of code density, portability, and the ability to have a smaller interpreter, leaving more room for larger programs.
My goal is to run a complete LOGO interpreter, following "LOGO for the Apple II" syntax, on something like a 6502 microprocessor.
I've seen references to PyMite, Java "micro edition", and of course now the UCSD p-System sources from the 1970s are available.
Suggestions are welcome.
(Note: I've already +1'ed the FORTH answer.)
Since you mention the 6502: Steve Wozniak (!) wrote an article for Byte magazine in the late 1970s describing SWEET16, a partial VM for the 6502 that provided 16-bit integer arithmetic and was easily interspersed with 6502 assembly language. It was the basis for the original Integer BASIC, which (as I recall) was later replaced by the floating-point Applesoft BASIC.
FORTH implementation for 6502.
You might want to check out the PICOBIT system, which is a Scheme implementation that works on very very small systems, such as the PIC18. It has since been ported to ARM, and could almost certainly be ported to the 6502 or other processors.

Is there some kind of tool to look at the encoding of Intel x86 instructions?

Forgive me if this is a dumb question, but I'm in an assembly class that was mostly taught using an emulated CPU designed to teach the concepts of assembly code. We haven't even written an Intel program, so I'm trying to adjust. With our emulated CPU, we were able to generate a symbol table file that gave the byte equivalents for instructions:
http://imgur.com/tw5S8.png
Would I be able to do such a thing with Intel x86 instructions?
Try IDA. It has an option to show binary values of opcodes.
EDIT: Well.. it's a disassembler. Try opening a binary file, and set the number of opcode bytes to show (in Options/General/) to something that is not zero.
If you are looking for an IDE that shows you in real time the opcodes for the instructions you've used, then I don't think you'll find one, because of the lack of a "market". Can you explain why you need it? Do you want to know just their lengths, or do you want to learn them? There is a simple pattern for lengths, so by disassembling many binaries you'll pick it up. If it's the opcodes you want... well, there are lots of them, almost no rules, and practically no point in doing it.
I see... then you have to generate the listing file. Your assembler should have an option for that (for NASM it's -l listfile). Just put any instruction(s) in your .asm file and generate a listing for it; the listing contains the binary encoding of each instruction.
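For example (a sketch; the exact listing layout varies by assembler), assembling something like this with nasm -f bin encodings.asm -l encodings.lst puts the chosen encoding next to each line of the listing:

bits 32
    add ah, al                ; listed with its encoding, e.g. 00C4
    mov eax, 12345678h        ; B878563412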
First, get the Intel Instruction Set Reference, or, better, this link: http://siyobik.info/index.php?module=x86 . There you'll find that most instructions have several encodings. In your particular case, bit 1 of the opcode specifies the direction, and since both operands are registers, you can toggle the direction bit and swap the register codes and the result will be the same instruction. You usually have this freedom on most register-to-register arithmetic operations. To check this, try assembling this source file and disassembling it with IDA:
db 02h, 0E0h    ; 02 /r form: decodes as add ah, al
db 00h, 0C4h    ; 00 /r form: also decodes as add ah, al
There is a demo program shipped with fasm.dll which has an editor and hex-viewer: