What happens to a program that contains certain instructions that are not supported by the cpu in question? - x86-64

Say for example in the case of x86-64 you have a program that makes use of avx-512 instructions or any other special purpose instruction, and the cpu in question does not support the instruction at the low level such as a haswell based cpu, how does this situation get resolved?
Is the program simply just not able to run (my first thought) or is there a way for the cpu to still run the program using the instructions it does support, however it will simply not receive the performance advantage offered by the instruction in question?

If the program checks CPU features and avoids actually run unsupported instructions, you just miss out on performance. Libraries like x264 and x265 do this, setting function pointers when they start to versions of functions that are the best supported version for this CPU.
If you do run an unsupported instruction (e.g. because you compiled with gcc -O3 -march=skylake-avx512 to tell the compiler it could assume AVX-512 support) there are 2 possibilities:
It happens to decode as something else (which won't be the case for AVX-512, but does happen for example with lzcnt decoding as bsr).
The CPU raises a #UD exception (UnDefined instruction). The OS handles it and delivers a SIGILL (Illegal Instruction) to the process, or equivalent on non-POSIX OSes like Windows.

Related

Do instruction sets like x86 get updated? If so, how is backwards compatibility guaranteed?

How would an older processor know how to decode new instructions it doesn't know about?
New instructions use previously-unused opcodes, or other ways to find more "coding space" (e.g. prefixes that didn't previously mean anything for a given opcode).
How would an older processor know how to decode new instructions it doesn't know about?
It won't. A binary that wants to work on old CPUs as well as new ones has to either limit itself to a baseline feature set, or detect CPU features at run-time and set function pointers to select versions of a few important functions. (aka "runtime dispatching".)
x86 has a good mechanism (the cpuid instruction) for letting code query CPU features without any kernel support needed. Some other ISAs need CPU info hard-coded into the OS or detected via I/O accesses so the only viable method is for the kernel to export info to user-space in an OS-specific way.
Or if you're building from source on a machine with a newer CPU and don't care about old CPUs, you can use gcc -O3 -march=native to let GCC use all the ISA extensions the current CPU supports, making a binary that will fault on old CPUs. (e.g. x86 #UD (UnDefined instruction) hardware exception, resulting in the OS delivering a SIGILL or equivalent to the process.)
Or in some cases, a new instruction may decode as something else on old CPUs, e.g. x86 lzcnt decodes as bsr with an ignored REP prefix on older CPUs, because x86 has basically no unused opcodes left (in 32-bit mode). Sometimes this "decode as something else" is actually useful as a graceful fallback to allow transparent use of new instructions, notably pause = rep nop = nop on old CPUs that don't know about it. So code can use it in spin loops without checking CPUID.
-march=native is common for servers where you're setting things up to just run on that server, not making a binary to distribute.
Most of the times, old processor will have "Undefined Instruction" exception. The instruction is not defined in old CPU.
In more rare cases, the instruction will execute as a different instruction. This happens when then new instruction is encoded via obligatory prefix. As an example, PAUSE is encoded as REP NOP, so it executed as nothing on older CPUs.

How to load symbol files to BCC profiler

With bcc tools' profile, I am getting mostly "[unknown]" in the profile output on my C program. This is, of course, expected because the program symbol are not loaded. However, I am not sure how to properly load the symbols so that "profile" program can pick it up. I have built my program with debug enabled "-g", but how do I load the debug symbols to "profile"?
Please see the DEBUGGING section in bcc profile's manpage:
See "[unknown]" frames with bogus addresses? This can happen for different
reasons. Your best approach is to get Linux perf to work first, and then to
try this tool. Eg, "perf record -F 49 -a -g -- sleep 1; perf script", and
to check for unknown frames there.
The most common reason for "[unknown]" frames is that the target software has
not been compiled
with frame pointers, and so we can't use that simple method for walking the
stack. The fix in that case is to use software that does have frame pointers,
eg, gcc -fno-omit-frame-pointer, or Java's -XX:+PreserveFramePointer.
Another reason for "[unknown]" frames is JIT compilers, which don't use a
traditional symbol table. The fix in that case is to populate a
/tmp/perf-PID.map file with the symbols, which this tool should read. How you
do this depends on the runtime (Java, Node.js).
If you seem to have unrelated samples in the output, check for other
sampling or tracing tools that may be running. The current version of this
tool can include their events if profiling happened concurrently. Those
samples may be filtered in a future version.
In your case, since it's a C program, I would recommend compiling with -fno-omit-frame-pointer.

Compiling Perl for Performance

What different methods can be used to compile Perl differently, in such a way that would actually improve the performance of Perl scripts run on that machine? Though outdated, http://dan.corlan.net/bench.html, seems to indicate that different performance results can be achieved by compiling things differently. Is that the case, or am I misunderstanding something?
Are there any performance gains from not using a default Perl package (or one that is installed by default in Linux)?
I never measured this but I was led to believe that perl compiled without threads is 10% faster. I am not sure if this is "on average" or on "certain operations" or if it is true at all.
The perl that comes with most (or all?) Linux distributions was compiled with threads.
Based on this, if you build your own perl without threads it should be faster. Incidentally this is what you get when you compile it with the default flags.
Steffen Schwingon has been doing some performance measurements and wrote about them here:
http://blogs.perl.org/users/steffen_schwigon/2012/01/perlformance.html
It would be nice if made some measurements and showed some results.

"All programs are interpreted". How?

A computer scientist will correctly explain that all programs are
interpreted and that the only question is at what level. --perlfaq
How are all programs interpreted?
A Perl program is a text file read by the perl program which causes the perl program to follow a sequence of actions.
A Java program is a text file which has been converted into a series of byte codes which are then interpreted by the java program to follow a sequence of actions.
A C program is a text file which is converted via the C compiler into an assembly program which is converted into machine code by the assembler. The machine code is loaded into memory which causes the CPU to follow a sequence of actions.
The CPU is a jumble of transistors, resistors, and other electrical bits which is laid out by hardware engineers so that when electrical impulses are applied, it will follow a sequence of actions as governed by the laws of physics.
Physicists are currently working out what makes those rules and how they are interpreted.
Essentially, every computer program is interpreted by something else which converts it into something else which eventually gets translated into how the electrons in your local neighborhood fly around.
EDIT/ADDED: I know the above is a bit tongue-in-cheek, so let me add a slightly less goofy addition:
Interpreted languages are where you can go from a text file to something running on your computer in one simple step.
Compiled languages are where you have to take an extra step in the middle to convert the language text into machine- or byte-code.
The latter can easily be easily be converted into the former by a simple transformation:
Make a program called interpreted-c, which can take one or more C files and can run a program which doesn't take any arguments:
#!/bin/sh
MYEXEC=/tmp/myexec.$$
gcc -o $MYEXEC ${1+"$#"} && $MYEXEC
rm -f $MYEXEC
Now which definition does your C program fall into? Compare & contrast:
$ perl foo.pl
$ interpreted-c foo.c
Machine code is interpreted by the processor at runtime, given that the same machine code supplied to a processor of a certain arch (x86, PowerPC etc), should theoretically work the same regardless of the specific model's 'internal wiring'.
EDIT:
I forgot to mention that an arch may add new instructions for things like accessing new registers, in which case code written to use it won't work on older processors in the range. Much like when you try to use an old version of a library and then try to use capabilities only found in newer libraries.
Example: many Linux distros are released as 686 only, despite the fact it's in the 'x86 family'. This is due to the use of new instructions.
My first thought was too look inside the CPU — see below — but that's not right. The answer is much much simpler than that.
A high-level description of a CPU is:
1. execute the current op
2. grab the next op
3. goto 1
Compare it to Perl's interpreter:
while ((PL_op = op = op->op_ppaddr(aTHX))) {
}
(Yeah, that's the whole thing.)
There can be no doubt that the CPU is an interpreter.
It just goes to show how useless it is to classify something is interpreted or not.
Original answer:
Even at the CPU level, programs get rewritten into simpler instructions to allow the CPU to execute more them more quickly. This is done by changing the order in which they are executed and executing them in parallel. For example, Intel's Hyperthreading.
Even deeper, each instruction is considered a program of its own, one that routes electronic signals. See microcode.
The Levels of interpretions are really easy to explain:
2: Runtimelanguage (CLR, Java Runtime...) & Scriptlanguage (Python, Ruby...)
1: Assemblies
0: Binary Code
Edit: I changed the level of Scriptinglanguages to the same level of Runtimelanguages. Thank's for the hint. :-)
I can write a Game Boy interpreter that works similarly to how the Java Virtual Machine works, treating the z80 machine instructions as byte code. Assuming the original was written in C1, does that mean C suddenly became an interpreted language just because I used it like one?
From another angle, gcc can compile C into machine code for a number of different processors. There's no reason the target machine has to be the same as the machine you're compiling on. In fact, this is a common way to compile C code for AVRs and other microcontrollers.
As a matter of abstraction, the compiler's job is to translate flat text into a structure, then translate that structure into something that can be executed somewhere. Whatever is doing the execution may have its own levels of breaking out the structure before really executing it.
A lot of power becomes available once you start thinking along these lines.
A good book on this is Structure and Interpretation of Computer Programs. Even if you only get through the first chapter (or half of the first chapter), I think you'll learn a lot.
1 I think most Game Boy stuff was hand coded ASM, but the principle remains.

What is INT 21h?

Inspired by this question
How can I force GDB to disassemble?
I wondered about the INT 21h as a concept. Now, I have some very rusty knowledge of the internals, but not so many details. I remember that in C64 you had regular Interrupts and Non Maskable Interrupts, but my knowledge stops here. Could you please give me some clue ? Is it a DOS related strategy ?
From here:
A multipurpose DOS interrupt used for various functions including reading the keyboard and writing to the console and printer. It was also used to read and write disks using the earlier File Control Block (FCB) method.
DOS can be thought of as a library used to provide a files/directories abstraction for the PC (-and a bit more). int 21h is a simple hardware "trick" that makes it easy to call code from this library without knowing in advance where it will be located in memory. Alternatively, you can think of this as the way to utilise the DOS API.
Now, the topic of software interrupts is a complex one, partly because the concepts evolved over time as Intel added features to the x86 family, while trying to remain compatible with old software. A proper explanation would take a few pages, but I'll try to be brief.
The main question is whether you are in real mode or protected mode.
Real mode is the simple, "original" mode of operation for the x86 processor. This is the mode that DOS runs in (when you run DOS programs under Windows, a real mode processor is virtualised, so within it the same rules apply). The currently running program has full control over the processor.
In real mode, there is a vector table that tells the processor which address to jump to for every interrupt from 0 to 255. This table is populated by the BIOS and DOS, as well as device drivers, and sometimes programs with special needs. Some of these interrupts can be generated by hardware (e.g. by a keypress). Others are generated by certain software conditions (e.g. divide by 0). Any of them can be generated by executing the int n instruction.
Programs can set/clear the "enable interrupts" flag; this flag affects hardware interrupts only and does not affect int instructions.
The DOS designers chose to use interrupt number 21h to handle DOS requests - the number is of no real significance: it was just an unused entry at the time. There are many others (number 10h is a BIOS-installed interrupt routine that deals with graphics, for instance). Also note that all this is for IBM PC compatibles only. x86 processors in say embedded systems may have their software and interrupt tables arranged quite differently!
Protected mode is the complex, "security-aware" mode that was introduced in the 286 processor and much extended on the 386. It provides multiple privilege levels. The OS must configure all of this (and if the OS gets it wrong, you have a potential security exploit). User programs are generally confined to a "minimal privilege" mode of operation, where trying to access hardware ports, or changing the interrupt flag, or accessing certain memory regions, halts the program and allows the OS to decide what to do (be it terminate the program or give the program what it seems to want).
Interrupt handling is made more complex. Suffice to say that generally, if a user program does a software interrupt, the interrupt number is not used as a vector into the interrupt table. Rather a general protection exception is generated and the OS handler for said exception may (if the OS is design this way) work out what the process wants and service the request. I'm pretty sure Linux and Windows have in the past (if not currently) used this sort of mechanism for their system calls. But there are other ways to achieve this, such as the SYSENTER instruction.
Ralph Brown's interrupt list contains a lot of information on which interrupt does what. int 21, like all others, supports a wide range of functionality depending on register values.
A non-HTML version of Ralph Brown's list is also available.
The INT instruction is a software interrupt. It causes a jump to a routine pointed to by an interrupt vector, which is a fixed location in memory. The advantage of the INT instruction is that is only 2 bytes long, as oposed to maybe 6 for a JMP, and that it can easily be re-directed by modifying the contents of the interrupt vector.
Int 0x21 is an x86 software interrupt - basically that means there is an interrupt table at a fixed point in memory listing the addresses of software interrupt functions. When an x86 CPU receives the interrupt opcode (or otherwise decides that a particular software interrupt should be executed), it references that table to execute a call to that point (the function at that point must use iret instead of ret to return).
It is possible to remap Int 0x21 and other software interrupts (even inside DOS though this can have negative side effects). One interesting software interrupt to map or chain is Int 0x1C (or 0x08 if you are careful), which is the system tick interrupt, called 18.2 times every second. This can be used to create "background" processes, even in single threaded real mode (the real mode process will be interrupted 18.2 times a second to call your interrupt function).
On the DOS operating system (or a system that is providing some DOS emulation, such as Windows console) Int 0x21 is mapped to what is effectively the DOS operating systems main "API". By providing different values to the AH register, different DOS functions can be executed such as opening a file (AH=0x3D) or printing to the screen (AH=0x09).
This is from the great The Art of Assembly Language Programming about interrupts:
On the 80x86, there are three types of events commonly known as
interrupts: traps, exceptions, and interrupts (hardware interrupts).
This chapter will describe each of these forms and discuss their
support on the 80x86 CPUs and PC compatible machines.
Although the terms trap and exception are often used synonymously, we
will use the term trap to denote a programmer initiated and expected
transfer of control to a special handler routine. In many respects, a
trap is nothing more than a specialized subroutine call. Many texts
refer to traps as software interrupts. The 80x86 int instruction is
the main vehicle for executing a trap. Note that traps are usually
unconditional; that is, when you execute an int instruction, control
always transfers to the procedure associated with the trap. Since
traps execute via an explicit instruction, it is easy to determine
exactly which instructions in a program will invoke a trap handling
routine.
Chapter 17 - Interrupt Structure and Interrupt Service Routines
(Almost) the whole DOS interface was made available as INT21h commands, with parameters in the various registers. It's a little trick, using a built-in-hardware table to jump to the right code. Also INT 33h was for the mouse.
It's a "software interrupt"; so not a hardware interrupt at all.
When an application invokes a software interrupt, that's essentially the same as its making a subroutine call, except that (unlike a subroutine call) the doesn't need to know the exact memory address of the code it's invoking.
System software (e.g. DOS and the BIOS) expose their APIs to the application as software interrupts.
The software interrupt is therefore a kind of dynamic-linking.
Actually, there are a lot of concepts here. Let's start with the basics.
An interrupt is a mean to request attention from the CPU, to interrupt the current program flow, jump to an interrupt handler (ISR - Interrupt Service Routine), do some work (usually by the OS kernel or a device driver) and then return.
What are some typical uses for interrupts?
Hardware interrupts: A device requests attention from the CPU by issuing an interrupt request.
CPU Exceptions: If some abnormal CPU condition happens, such as a division by zero, a page fault, ... the CPU jumps to the corresponding interrupt handler so the OS can do whatever it has to do (send a signal to a process, load a page from swap and update the TLB/page table, ...).
Software interrupts: Since an interrupt ends up calling the OS kernel, a simple way to implement system calls is to use interrupts. But you don't need to, in x86 you could use a call instruction to some structure (some kind of TSS IIRC), and on newer x86 there are SYSCALL / SYSENTER intructions.
CPUs decide where to jump to looking at a table (exception vectors, interrupt vectors, IVT in x86 real mode, IDT in x86 protected mode, ...). Some CPUs have a single vector for hardware interrupts, another one for exceptions and so on, and the ISR has to do some work to identify the originator of the interrupt. Others have lots of vectors, and jump directly to very specific ISRs.
x86 has 256 interrupt vectors. On original PCs, these were divided into several groups:
00-04 CPU exceptions, including NMI. With later CPUs (80186, 286, ...), this range expanded, overlapping with the following ranges.
08-0F These are hardware interrupts, usually referred as IRQ0-7. The PC-AT added IRQ8-15
10-1F BIOS calls. Conceptually, these can be considered system calls, since the BIOS is the part of DOS that depends on the concrete machine (that's how it was defined in CP/M).
20-2F DOS calls. Some of these are multiplexed, and offer multitude of functions. The main one is INT 21h, which offers most of DOS services.
30-FF The rest, for use by external drivers and user programs.