How does a computer distinguish whether a binary pattern is an instruction or just a number? - cpu-architecture

I am reading the book "Computer Organization and Embedded Systems" by Hamacher and my question is: "How does a computer distinguish whether a binary pattern is an instruction or just a number?"
Can anyone help me understand that concept?

A Von Neumann Processor (pretty much any processor out there) can not distinguish between code and data in memory. What ever the instruction pointer of the CPU points to will be loaded into the instruction decoder as an instruction. If it is not a valid instruction it will raise an exception in the CPU.
This enables a program to create new executable code in memory or even change its own code. On the other hand this enables many code injection attacks.

The way the computer distinguishes between instructions and numbers simply depends on what is reading the data and where. For example, a simple Arithmetic Logic Unit (ALU) will include an input for the operation to be done, and two inputs for the operands. The data going into the operand ports is read as numbers, whereas the data going into the operator input is read as an instruction.
It all depends on what computer architecture unit is reading the data, and on what input that unit is reading it.

Related

What is the difference between Program Status Word (PSW) and Program Counter (PC)?

In an Operating Systems course, the instructor introduced PSW and PC when he talked about Interrupt Handling.
His explanation was
PC holds the address of the next instruction to be fetched
PSW contains execution status information
But later I searched online and found that PSW = PC + status register. This makes me quite confused.
On the one hand, I am not sure what "execution status information" refers to. On the other hand, if PSW has the functions of a PC, why do we still need it?
Appreciate any explanation.
This isn't really standardized terminology. Most architectures have some register that plays the role of a status word, containing bits to indicate things like whether an add instruction caused a carry. But different architectures give it different names, and what exactly is included can vary widely. I'm not aware of any architecture that includes the program counter as part of their status word, but if they want to do that, well, who's going to stop them?
This is the kind of thing where you just have to look at the definition given by whatever book or article you are reading (or infer it from context), and realize that a different author may use the word differently.
In general, interrupts are hardware level subroutine calls. They do the same thing as a subroutine call (change the algorithm that the processor is executing) however they do it without warning the "executing code" that they are now operating.
In order to not damage the "executing code" all information that it was using must be stored. This includes the Program Counter (usually saved to the stack by the interrupt hardware in the same way that a subroutine call does) and all of the registers that the interrupt function will alter- these must be saved by pushing them onto the stack. The registers etc must be restored before the return from interrupt (RETI) instruction - the PC is restored by the RETI itself.
The PSW (often called the flag register) is a very important register and must generally be saved first. It contains bits like Zero (the last calculation resulted in a zero result) Carry (the last calculation resulted in a carry ie the result number is bigger than the register can hold) and several other flags. I suggest that you read the data sheet of an 8 bit microcontroller for an idea of what these flags might be. suffice it to say that these flags are needed in order to perform conditional jumps. And whilst they will often be ignored you can't take that chance.
You are probably correct in Your instructor using the term PSW to mean all all of the registers.
The subject of interrupts contains concepts that are common to subroutine calls in general (e.g. don't leave data that you don't want overwritten in a register before entering a subroutine). And later on in operating systems, the concept of context switches that occur during multi-tasking.
Peter

OracleSolaris 11.2 -- is /usr/kernel/drv/driver.conf required for PCI?

I'm implementing a small PCI driver for academic purposes, and one thing I'm not clear about if we actually have to provide driver.conf? Different materials which I read (including http://blog.csdn.net/hotsolaris/article/details/1763716), say that for PCI the driver config file is optional, however in my case it seems that pci_config_setup() is successful only with driver.conf provided:
name="mydrv" parent="/pci#0,0/pci8086,2e11"
Then I do:
% add_drv -i 'pciXXXX,YY' mydrv
and it adds in the system with no warning or error messages.
So I assume that some properties of a PCI device can't be derived automatically by the system, e.g. parent bus?
I would appreciate if anybody could shed some light on this. Thanks.
If you look at a random selection of very small files under /kernel/drv for actual physical hardware, you'll see that they almost always only contain the line
ddi_forceattach=1;
Pseudo drivers will have a driver.conf(4) file which reflects their parentage in the system. I really recommend reading that manpage, it goes into good detail about what's required here.

trying to know more about verilog language, vhdl,and assembly language

I would like to know what is the difference between verilog and assembly language.
Next semester we will be working with micro-controllers, but I would like to learn a little bit about it before the semester begins. I've been doing a lot of research about low-level programming, and so far I have gained a good understanding in assembly language, but I get confused trying to understand Verilog and VHDL?
Verilog and VHDL are completely different languages for describing hardware, for purposes of programming FPGAs.
FPGAs are devices that can be on-the-fly programmed to implement any sort of digital logic (and sometimes analog too).
So using verilog or VHDL, I can design a circuit that creates a couple latches, some twos-complement adders, a mux, and a clock source, and suddenly you've just designed a circuit that can calculate. You could then take the output from the VHDL compiler (or whatever its called), "download" it to the FPGA, and now you actually have some hardware that can be used to do calculation.
Of course, you can use FPGAs to implement all sorts of complicated stuff - even a full custom CPU. One uses verilog and VHDL to design the circuits that are programmed to FPGAs. Those circuits could implement something simple like a ripple counter, or something more complex like a LCD driver, or something even more complex like a USB transceiver. You can go from as simple as a few latches to as complicated as a fully operating CPU; as long as its digital hardware, you can make whatever you want with VHDL and some FPGAs.
To clarify further -
"Assembly language" typically refers to raw instructions given to some sort of CPU. Of course, there are many different types of CPUs (x86, ARM, SPARC, MIPS) and further many different variants of those types of CPUs. Each CPU has its own instruction set.
Machine code is complete, fully specified, ready to be executed instructions. Assembly languages allow you type instructions from your CPU's instruction set in plain text, use labels and such, and describe the memory layout structure of the program. Put the assembly through an assembler and out comes machine code in your CPUs machine instruction set.
You could design your own CPU from scratch using VHDL. As you're designing the CPU, you would have it implement your own custom instruction set. From there, you could take the VHDL for your CPU, compile it, write it to an FPGA and have your own custom CPU. Then you could start writing programs for your made-up CPU using your custom instruction set by writing a custom assembler. Some friends of mine in college did this for giggles.
For example, you know how most CPUs are load-store, register based CPUs? Instructions tend to go something like this:
Load the value '1' into register A
Load the value '2' into register B
Add register A and register B, storing result in register A
(You just added 1 + 2! Heh)
That sort of model of computation happens to be the most popular, but it's not the only way you could do computation. What if you had a stack based CPU, where you push values onto a hardware stack, and then computations work with the values on the top of the stack, pushing results back onto the stack.
For instance:
Push 1 onto the stack (stack current contains: 1)
Push 2 onto the stack (stack current contains: 2 1)
Push 3 onto the stack (stack currently contains: 3 2 1 )
Add
'Add' takes the top two elements on the stack, adds them together, and pushes the result on the top of the stack.
Stack now contains: 5 1
Add
Stack now contains: 6
Neat isn't it? As far as a computation model goes, it has its advantages - operands tend to be short, and need fewer bits. Smaller instructions means that the CPU can be faster.
The problem is that no such processor like this exists anymore.
But if you knew what you were doing, you could design one in VHDL, program it to an FGPA, and suddenly you have one of the only operating stack-based processors in existence.
Say, if you were doing a masters thesis, for instance, you might dig around and find out that virtual-machine-based programming languages like C# and Java compile down to a bytecode for a CPU that doesn't really exist, but the model for that CPU proves useful for making code portable. You might find out that the imaginary machines used by these languages are based on stack-based processor models. If you were looking for something interesting to do, perhaps you write in VHDL a processor that natively implements the Java bytecode language. Now you'd be the only person that has a computer that can directly run Java.
Verilog and VHDL are both HDLs (Hardware description languages) used mainly for describing digital electronics. Their targets may be FPGA or ASIC (custom silicon).
Assembly level on the other hand is using an processors instruction set to perform a series of calculations. Every thing executed on a computer eventually ends up as an assembly level instruction. One example of an instruction set would be the x86 ISA.
Summary: Verilog, VHDL describe hardware. Assembly is the low level program being executed on a processor.

Looking for the best equivalents of prefetch instructions for ia32, ia64, amd64, and powerpc

I'm looking at some slightly confused code that's attempted a platform abstraction of prefetch instructions, using various compiler builtins. It appears to be based on powerpc semantics initially, with Read and Write prefetch variations using dcbt and dcbtst respectively (both of these passing TH=0 in the new optional stream opcode).
On ia64 platforms we've got for read:
__lfetch(__lfhint_nt1, pTouch)
wherease for write:
__lfetch_excl(__lfhint_nt1, pTouch)
This (read vs. write prefetching) appears to match the powerpc semantics fairly well (with the exception that ia64 allows for a temporal hint).
Somewhat curiously the ia32/amd64 code in question is using
prefetchnta
Not
prefetchnt1
as it would if that code were to be consistent with the ia64 implementations (#ifdef variations of that in our code for our (still live) hpipf port and our now dead windows and linux ia64 ports).
Since we are building with the intel compiler I should be able to many of our ia32/amd64 platforms consistent by switching to the xmmintrin.h builtins:
_mm_prefetch( (char *)pTouch, _MM_HINT_NTA )
_mm_prefetch( (char *)pTouch, _MM_HINT_T1 )
... provided I can figure out what temporal hint should be used.
Questions:
Are there read vs. write ia32/amd64 prefetch instructions? I don't see any in the instruction set reference.
Would one of the nt1, nt2, nta temporal variations be preferred for read vs. write prefetching?
Any idea if there would have been a good reason to use the NTA temporal hint on ia32/amd64, yet T1 on ia64?
Are there read vs. write ia32/amd64 prefetch instructions? I don't see any in the instruction set reference.
Some systems support the prefetchw instructions for writes
Would one of the nt1, nt2, nta temporal variations be preferred for read vs. write prefetching?
If the line is exclusively used by the calling thread, it shouldn't matter how you bring the line, both reads and writes would be able to use it. The benefit for prefetchw mentioned above is that it will bring the line and give you ownership on it, which may take a while if the line was also used by another core. The hint level on the other hand is orthogonal with the MESI states, and only affects how long would the prefetched line survive. This matters if you prefetch long ahead of the actual access and don't want to prefetch to get lost in that duration, or alternatively - prefetch right before the access, and don't want the prefetches to thrash your cache too much.
Any idea if there would have been a good reason to use the NTA temporal hint on ia32/amd64, yet T1 on ia64?
Just speculating - perhaps the larger caches and aggressive memory BW are more vulnerable to bad prefetching and you'd want to reduce the impact through the non-temporal hint. Consider that your prefetcher is suddenly set loose to fetch anything it can, you'd end up swamped in junk prefetches that would through away lots of useful cachelines. The NTA hint makes them overrun each other, leaving the rest undamaged.
Of course this may also be just a bug, I can't tell for sure, only whoever developed the compiler, but it might make sense for the reason above.
The best resource I could find on x86 prefetching hint types was the good ol' article What Every Programmer Should Know About Memory.
For the most part on x86 there aren't different instructions for read and write prefetches. The exceptions seem to be those that are non-temporal aligned, where a write can bypass the cache but as far as I can tell, a read will always get cached.
It's going to be hard to backtrack through why the earlier code owners used one hint and not the other on a certain architecture. They could be making assumptions about how much cache is available on processors in that family, typical working set sizes for binaries there, long term control flow patterns, etc... and there's no telling how much any of those assumptions were backed up with good reasoning or data. From the limited background here I think you'd be justified in taking the approach that makes the most sense for the platform you're developing on now, regardless what was done on other platforms. This is especially true when you consider articles like this one, which is not the only context where I've heard that it's really, really hard to get any performance gain at all with software prefetches.
Are there any more details known up front, like typical cache miss ratios when using this code, or how much prefetches are expected to help?

How to determine SSE prefetch instruction size?

I am working with code which contains inline assembly for SSE prefetch instructions. A preprocessor constant determines whether the instructions for 32-, 64- or 128-bye prefetches are used. The application is used on a wide variety of platforms, and so far I have had to investigate in each case which is the best option for the given CPU. I understand that this is the cache line size. Is this information obtainable automatically? It doesn't seem to be explicitly present in /proc/cpuinfo.
I think your question is related to this question or this one. I think it is clear that - unless you can rely on a OS or library-function - you will want to use the CPUID instruction, but the question then becomes exactly what information you are looking for. - And of course, AMD's and Intel's implementations don't need to agree. This page suggests using Cpuid.1.EBX[15:8] (i.e., BH) for finding out on Intel and function 80000005h on AMD. In addition, on Intel, CPUID.2... seems to contain the relevant information, but it looks like a real pain to parse out the desired information.
I think, from what I've read, both AMD and Intel CPUID instructions will support CPUID.1.EBX[15:8], which returns the size of one cache line in QUADWORDs as used by the CLFLUSH instruction (which isn't present on all processors, so I don't know whether you'll always find something there). So, after executing CPUID.1, you'd have to multiply BH by 8 to get the cache line size in bytes. This hinges on my implicit assumption (please can anyone say whether it is really valid?) that the definition of one cache line size is always the same for CLFLUSH and PREFETCHh instructions.
Also, Intel's manuals states that PREFETCHh is only a hint, but that, if it prefetches anything, it will always be a minimum of 32 bytes.
EDIT1:
Another useful resource (even if not directly answering your question) for the optimised use of PREFETCHh is Intel's optimisation manual here.