iPhone dev: problem with inline asm

I have read the various topics that deal with this kind of problem, but I still have no answer.
Here is my problem:
In my header file, I have this:
int cl, ch, _a = a, _b = b;                               \
__asm__ ("smull %0, %1, %2, %3\n"                         \
         "mov %0, %0, lsr %4\n"                           \
         "orr %0, %0, %1, lsl %5\n"                       \
         : "=&r" (cl), "=&r" (ch)                         \
         : "r" (_a), "r" (_b), "i" (r), "i" (32-(r)));    \
cl; })
In my project settings, I verified that the relevant code generation options were checked.
But I get these console errors:
{standard input}:242: selected processor doesn't support -- `smull r0,r1,r2,r3'
{standard input}:244: unshifted register required -- `orr r0,r0,r1,lsl#20'
Could you help me?

Are you compiling the file for ARM? By default, code for the iPhone is compiled for Thumb (which is usually preferable unless you are doing floating-point math). The asm you listed is ARM. You will need to set any file that uses that header to compile as ARM, since GCC does not allow you to switch backends inside a single compilation unit. You can change this by deselecting "Compile for Thumb" under "GCC 4.x Code Generation."
Compiling your entire project as ARM will most likely have a significant (negative) impact on memory usage, and by proxy on performance. Including asm via a macro in a header like that is going to prove very messy on the iPhone. In general you are better off putting all your asm in a single file and compiling only that file as ARM, though for what is only a three-instruction sequence even that is probably not worthwhile.
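If you do go the separate-file route, a minimal sketch might look like this (the file name, function name, and the Q12 shift amount are my own illustration rather than anything from the question; the idea is that only this one file gets built as ARM, e.g. with a per-file -mno-thumb flag):

/* fixmul.c -- compile only this file as ARM; everything else can stay Thumb. */
int fixmul_q12(int a, int b)
{
    int lo, hi;
    __asm__ ("smull %0, %1, %2, %3\n\t"
             "mov   %0, %0, lsr #12\n\t"
             "orr   %0, %0, %1, lsl #20"
             : "=&r" (lo), "=&r" (hi)
             : "r" (a), "r" (b));
    return lo;   /* middle bits of the 64-bit product, i.e. (a * b) >> 12 */
}

The rest of the project keeps its Thumb setting and simply calls fixmul_q12() through an ordinary declaration in a header, so you avoid pulling ARM-only asm into files that are being built as Thumb.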

Related

GNU support for extended inline assembler for cortex m7 double precision FPU

The GCC suite of programs provides an extended inline assembler constraint for single-precision FPU code on Cortex M4 & M7 MPUs, so it is straightforward to code a check of FPU performance.
However, this facility is not available for double-precision FPUs (as available on the Cortex M7) and extra coding is required. Is anyone aware of whether GNU is working on the provision of such a facility?
My main problem was being able to output a double return value from my inline assembler. In the absence of the compiler recognising a double constraint (which GNU clearly documents as not being available for Thumb instructions) there was no apparent way of achieving this. However, by inspecting the emitted assembler for double-returning functions, I noticed that the compiler ALWAYS moves the D7 register to the D0 register (as one of its last instructions), and D0 is of course a function's output register by convention. So I modified my code to use the D7 register to store my returned double, and it works!
Subsequent to the above, Nate Eldridge drew my attention to an undocumented GCC modifier (%P) used in association with the "w" constraint (which is documented, but not supposed to work with Thumb instructions). The double-precision code now works. Interestingly, when using the %P modifier the compiler emits D7 as its choice of register for the double-precision arithmetic; the subsequent move of D7 to D0 remains, but this time to good effect.
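For reference, here is a minimal sketch of that %P approach (my own example, not the original poster's code, assuming a Cortex-M7 with the double-precision FPU and build flags along the lines of -mcpu=cortex-m7 -mfpu=fpv5-d16 -mfloat-abi=hard):

double asm_sqrt_f64(double x)
{
    double result;
    /* "w" asks for a VFP register; the %P modifier makes GCC print it as a
     * double-precision D register instead of the default S register name. */
    __asm__ ("vsqrt.f64 %P0, %P1" : "=w" (result) : "w" (x));
    return result;
}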

How to load symbol files to BCC profiler

With bcc tools' profile, I am getting mostly "[unknown]" in the profile output for my C program. This is, of course, expected because the program's symbols are not loaded. However, I am not sure how to properly load the symbols so that the "profile" program can pick them up. I have built my program with debugging enabled ("-g"), but how do I load the debug symbols into "profile"?
Please see the DEBUGGING section in bcc profile's manpage:
See "[unknown]" frames with bogus addresses? This can happen for different
reasons. Your best approach is to get Linux perf to work first, and then to
try this tool. Eg, "perf record -F 49 -a -g -- sleep 1; perf script", and
to check for unknown frames there.
The most common reason for "[unknown]" frames is that the target software has
not been compiled with frame pointers, and so we can't use that simple method
for walking the stack. The fix in that case is to use software that does have
frame pointers, eg, gcc -fno-omit-frame-pointer, or Java's
-XX:+PreserveFramePointer.
Another reason for "[unknown]" frames is JIT compilers, which don't use a
traditional symbol table. The fix in that case is to populate a
/tmp/perf-PID.map file with the symbols, which this tool should read. How you
do this depends on the runtime (Java, Node.js).
If you seem to have unrelated samples in the output, check for other
sampling or tracing tools that may be running. The current version of this
tool can include their events if profiling happened concurrently. Those
samples may be filtered in a future version.
In your case, since it's a C program, I would recommend compiling with -fno-omit-frame-pointer.
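As a concrete sketch (the file and binary names are made up, and the installed path of the profile tool can differ between distributions), rebuild with frame pointers and then point profile at the running process:

/* busy.c -- a toy workload to profile. Build and run it with, e.g.:
 *   gcc -g -O2 -fno-omit-frame-pointer -o busy busy.c
 *   ./busy &
 *   sudo /usr/share/bcc/tools/profile -F 99 -p $(pidof busy) 10
 * With frame pointers kept, the stacks should show spin() and main()
 * instead of "[unknown]". */
#include <stdio.h>

static double spin(long iterations)
{
    double acc = 0.0;
    for (long i = 1; i <= iterations; i++)
        acc += 1.0 / (double)i;   /* cheap work for the sampler to catch */
    return acc;
}

int main(void)
{
    for (;;)
        printf("%f\n", spin(50 * 1000 * 1000));
}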

Why are there several encodings for one instruction in ARMv7

I am currently trying to implement a disassembler for the ARM Cortex A9, which implements the ARMv7 instruction set.
For that I am using the manual "DDI0406C_b_arm_architecture_reference_manual.pdf", which can be downloaded here (after registering on the ARM website):
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.set.architecture/index.html
In this manual, in part A8.8 with the instruction details, I can't understand why there are several encodings for one instruction (like A1, A2, ...) that all seem to be implemented in ARMv7.
Also, as the ARM Cortex A9 uses Thumb-2, does it also implement the A1/A2/... encodings, or only the T1/T2/... ones?
I have really read all the parts of this manual that relate to encodings, but I still don't understand how we can know which encoding is used for a program.
Different encodings of an instruction do functionally different things.
One example of the use of different encodings is A8.9.12 ADR:
This instruction adds an immediate value to the PC value to form a PC-relative address, and writes the result to the
destination register.
If the instruction is encoded as A1, then the offset must be interpreted as zero or positive; if it is encoded as A2, then the offset is negative.
Another example is A8.8.132 POP
If the list contains more than one register, the instruction is assembled to encoding A1. If the list contains exactly one register, the instruction is assembled to encoding A2.
I can imagine the different POP encodings were created to allow different microcode, probably for performance reasons.
For the second part of your question: the Cortex-A9 is an ARMv7-A architecture CPU, and it supports all the instructions specified in the manual you pointed to. Maybe you should also read the Cortex-A9 Technical Reference Manual.
There is no way to really distinguish between ARM and Thumb inside the instruction stream itself. You can only decide based on the way a function gets called: if the lowest bit of the target address is set to 1, it's Thumb; otherwise it's ARM.
The ARM encodings are quite "stable"; you'll only find a few instructions with more than an A1 encoding. BLX is an example where an A2 encoding is given, but this is mainly because the new ARM ARM is structured differently from the older ones. BL and BLX were two different instructions; BLX was added in additional instruction space (the upper 4 bits, normally used for the condition, are set to 1111, which in ARM prior to v5 meant "never execute").
For the Thumb encodings it's different: there are a lot of them, because they had to be placed in a more compressed instruction space. Page A6-220 has information about how to decide whether a Thumb instruction consists of one halfword or two.
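A minimal sketch of both checks in C, assuming the bit patterns described in section A6.1 of the manual cited above:

#include <stdint.h>

/* If the lowest bit of a branch/call target is set, the code at that
 * address is Thumb; otherwise it is ARM. */
static int target_is_thumb(uint32_t target_address)
{
    return target_address & 1;
}

/* A Thumb instruction is 32 bits wide (a Thumb-2 encoding) when bits
 * [15:11] of its first halfword are 0b11101, 0b11110 or 0b11111; any
 * other pattern is a plain 16-bit Thumb instruction. */
static int thumb_insn_is_32bit(uint16_t first_halfword)
{
    unsigned top5 = first_halfword >> 11;
    return top5 == 0x1D || top5 == 0x1E || top5 == 0x1F;
}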
The Ax encodings are ARM; when the processor is in ARM mode it will decode the bits it finds using those encodings. If there is more than one (A1, A2, ...), it should be obvious that there is a different feature or reason for each, and they can be considered separate instructions (look at the overuse of the mov instruction in x86, for example; it has many encodings). Treat each encoding as a separate "instruction".
Then there are the Tx variants; those are Thumb and Thumb2 extensions. The Thumb ones are all 16 bit (the bl can be decoded as two separate 16-bit instructions), and the descriptions below them indicate "all Thumb variants" or "ARMv4T to the present" or some such language. The Thumb2 extensions are all 32 bit, with the first 16 bits being an undefined instruction in the Thumb world. These have more limitations on which architectures support them.
You are not going to be able to create a complete disassembler for one of these processors, for the same reason you can't do one for x86 or many other processors (all of them?). If you assume that all the instructions are in one mode (ARM, or Thumb, or Thumb+Thumb2) with no mode mixing (ARM+Thumb), then you can: everything is a fixed instruction length, so you can simply disassemble everything, data and code alike, and you won't run into any problems. To disassemble mixed-mode code you basically have to emulate/execute the instructions and follow the instruction flow (just like a disassembler for a variable-word-length instruction set) to try to find the transitions. The problem there is that the transitions span multiple instructions at a minimum: load a register, then bx that register; sometimes there is math involved in computing the address, and there is no guarantee that the address computation or load happens in the instruction immediately before the bx. So you could do some of that and get a long way through disassembling the program.
If Thumb2 is supported/allowed on the processor you are using, then you have the variable-instruction-length problem wherever you detect entry points into Thumb code. And unless you are already doing this, you have to follow the execution of the code to determine where instructions start (elementary variable-instruction-length disassembly stuff).
The combination of the technical reference manual (TRM) and the architecture reference manual will tell you whether the architecture, and that implementation of the architecture, allow ARM and Thumb modes. I would assume an A9 supports ARM, Thumb, and Thumb2, all three.
The Cortex-M family is the only one so far that is limited to not supporting ARM, and its Thumb2 support varies widely: the Cortex-M0 (and M1) are ARMv6-M while the M3 and M4 are ARMv7-M (from a few dozen ARMv6-M instructions to many dozen Thumb2 extensions in ARMv7-M). There are separate architecture reference manuals specifically for the -M variants, armv7-m versus the armv7-ar manuals for example.

"All programs are interpreted". How?

A computer scientist will correctly explain that all programs are
interpreted and that the only question is at what level. --perlfaq
How are all programs interpreted?
A Perl program is a text file read by the perl program which causes the perl program to follow a sequence of actions.
A Java program is a text file which has been converted into a series of byte codes which are then interpreted by the java program to follow a sequence of actions.
A C program is a text file which is converted via the C compiler into an assembly program which is converted into machine code by the assembler. The machine code is loaded into memory which causes the CPU to follow a sequence of actions.
The CPU is a jumble of transistors, resistors, and other electrical bits which is laid out by hardware engineers so that when electrical impulses are applied, it will follow a sequence of actions as governed by the laws of physics.
Physicists are currently working out what makes those rules and how they are interpreted.
Essentially, every computer program is interpreted by something else which converts it into something else which eventually gets translated into how the electrons in your local neighborhood fly around.
EDIT/ADDED: I know the above is a bit tongue-in-cheek, so let me add a slightly less goofy addition:
Interpreted languages are where you can go from a text file to something running on your computer in one simple step.
Compiled languages are where you have to take an extra step in the middle to convert the language text into machine- or byte-code.
The latter can easily be converted into the former by a simple transformation:
Make a program called interpreted-c, which can take one or more C files and can run a program which doesn't take any arguments:
#!/bin/sh
MYEXEC=/tmp/myexec.$$
gcc -o $MYEXEC ${1+"$@"} && $MYEXEC
rm -f $MYEXEC
Now which definition does your C program fall into? Compare & contrast:
$ perl foo.pl
$ interpreted-c foo.c
Machine code is interpreted by the processor at runtime: the same machine code supplied to a processor of a certain arch (x86, PowerPC, etc.) should theoretically work the same regardless of the specific model's 'internal wiring'.
EDIT:
I forgot to mention that an arch may add new instructions for things like accessing new registers, in which case code written to use them won't work on older processors in the range. It is much like using an old version of a library and then trying to use capabilities only found in newer libraries.
Example: many Linux distros are released as 686 only, despite the fact it's in the 'x86 family'. This is due to the use of new instructions.
My first thought was to look inside the CPU — see below — but that's not right. The answer is much, much simpler than that.
A high-level description of a CPU is:
1. execute the current op
2. grab the next op
3. goto 1
Compare it to Perl's interpreter:
while ((PL_op = op = op->op_ppaddr(aTHX))) {
}
(Yeah, that's the whole thing.)
There can be no doubt that the CPU is an interpreter.
It just goes to show how useless it is to classify something as interpreted or not.
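To make the parallel concrete, here is a toy dispatch loop in C with invented one-byte opcodes; structurally it is the same "execute the current op, grab the next op, repeat" cycle as both the CPU loop and the Perl loop above:

#include <stdio.h>

enum { OP_HALT, OP_PUSH, OP_ADD, OP_PRINT };   /* invented byte codes */

static void run(const unsigned char *pc)
{
    int stack[64], sp = 0;
    for (;;) {
        switch (*pc++) {                                  /* grab the next op... */
        case OP_PUSH:  stack[sp++] = *pc++;               break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];  break;
        case OP_PRINT: printf("%d\n", stack[--sp]);       break;
        case OP_HALT:  return;                            /* ...after executing it */
        }
    }
}

int main(void)
{
    const unsigned char program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(program);   /* prints 5 */
    return 0;
}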
Original answer:
Even at the CPU level, programs get rewritten into simpler instructions to allow the CPU to execute them more quickly. This is done by changing the order in which they are executed and by executing them in parallel. For example, Intel's Hyperthreading.
Even deeper, each instruction is considered a program of its own, one that routes electronic signals. See microcode.
The levels of interpretation are really easy to explain:
2: Runtime languages (CLR, Java Runtime, ...) & scripting languages (Python, Ruby, ...)
1: Assembly
0: Binary code
Edit: I changed the level of scripting languages to the same level as runtime languages. Thanks for the hint. :-)
I can write a Game Boy interpreter that works similarly to how the Java Virtual Machine works, treating the z80 machine instructions as byte code. Assuming the original was written in C1, does that mean C suddenly became an interpreted language just because I used it like one?
From another angle, gcc can compile C into machine code for a number of different processors. There's no reason the target machine has to be the same as the machine you're compiling on. In fact, this is a common way to compile C code for AVRs and other microcontrollers.
As a matter of abstraction, the compiler's job is to translate flat text into a structure, then translate that structure into something that can be executed somewhere. Whatever is doing the execution may have its own levels of breaking out the structure before really executing it.
A lot of power becomes available once you start thinking along these lines.
A good book on this is Structure and Interpretation of Computer Programs. Even if you only get through the first chapter (or half of the first chapter), I think you'll learn a lot.
1 I think most Game Boy stuff was hand coded ASM, but the principle remains.

Mixing 32 bit and 16 bit code with nasm

This is a low-level systems question.
I need to mix 32 bit and 16 bit code because I'm trying to return to real-mode from protected mode. As a bit of background information, my code is doing this just after GRUB boots so I don't have any pesky operating system to tell me what I can and can't do.
Anyway, I use [BITS 32] and [BITS 16] in my assembly to tell nasm which types of operations it should use, but when I test my code using bochs it looks like for some operations bochs isn't executing the code that I wrote. It looks like the assembler is sticking in extra 0x66 and 0x67 prefixes, which confuse bochs.
So, how do I get nasm to successfully assemble code where I mix 32 bit and 16 bit code in the same file? Is there some kind of trick?
The problem turned out to be that I wasn't setting up my descriptor tables correctly. I had one bit flipped wrong so instead of going to 16-bit mode I was going to 32-bit mode (with segments that happened to have a limit of one meg).
Thanks for the suggestions!
Terry
The 0x66 and 0x67 bytes are prefixes that indicate that the following opcode should be interpreted with a non-default operand or address size. More specifically (and according to this link),
"When NASM is in BITS 16 mode, instructions which use 32-bit data are prefixed with an 0x66 byte, and those referring to 32-bit addresses have an 0x67 prefix. In BITS 32 mode, the reverse is true: 32-bit instructions require no prefixes, whereas instructions using 16-bit data need an 0x66 and those working on 16-bit addresses need an 0x67."
This suggests that it's bochs that is at fault.
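To make the quoted rule concrete, here are the raw bytes of the same register-to-register mov in each mode (illustrative encodings that you can check with ndisasm -b16 / ndisasm -b32):

/* The same two bytes, 0x89 0xD8, decode as "mov ax, bx" in 16-bit code and
 * as "mov eax, ebx" in 32-bit code; prepending the 0x66 operand-size prefix
 * flips the operand size in either mode (0x67 does the same for addresses). */
static const unsigned char mov_without_prefix[] = { 0x89, 0xD8 };        /* BITS 16: mov ax,bx    BITS 32: mov eax,ebx */
static const unsigned char mov_with_prefix[]    = { 0x66, 0x89, 0xD8 };  /* BITS 16: mov eax,ebx  BITS 32: mov ax,bx   */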
You weren't kidding about this being low-level!
Have you checked the generated opcodes / operands to make sure that nasm is honoring your BITS directives correctly? Also check to make sure the jump targets are correct - maybe nasm is using the wrong offsets.
If it's not a bug in nasm, maybe there is a bug in bochs. I can't imagine that people switch back to 16-bit mode from 32-bit mode very often anymore.
If you're in real mode your default size is implicitly 16 bits, so you should use BITS 16 mode. This way if you need a 32-bit operand size you add the 0x66 prefix, and for a 32-bit address size you add the 0x67 prefix.
Look at the Intel IA-32 Software Developer's Guide, Volume 3, Chapter 16 (MIXING 16-BIT AND 32-BIT CODE; the chapter number might change according to the edition of the book):
Real-address mode, virtual-8086 mode, and SMM are native 16-bit modes.
The BITS 32 directive will only confuse the assembler if you use it outside of Protected Mode or Long Mode.