which component manages or provides instructions to the control unit in a processor? - cpu-architecture

I am newbie to computer architecture and I have the following questions,
Which unit or component controls operations such as incrementing program counter, loading the instruction to the IR and the other Fetch- decode-execute-write cycle?
If it is the control unit, how does it know when to perform the operations?
Is the Operating system involved in any of these tasks except scheduling which program to execute?
Why does it matter if the OS is 32 bit or 64 bit? Shouldn't we worry about the compiler or interpreter in this case?

Which unit or component controls operations such as incrementing
program counter, loading the instruction to the IR and the other
Fetch- decode-execute-write cycle?
A processor can either have a centralized control unit or a distributed control unit. Processors that are not pipelined or that have a two-stage pipeline (i.e., fetch and execute) use a centralized control unit. More sophisticated processors use distributed control where each stage of the pipeline may generate control signals. The term control refers to operations such as fetching instructions, reading data from and writing data to memory, determining the execution unit that can execute a given instruction, and determining dependencies between instructions. This is in contrast to the term datapath, which refers to the part of the CPU that contains the execution units and registers.
Ancient CPUs consisted of two components called the control path (aka control unit) and the datapath. You might have seen these terms in computer architecture textbooks. An example of such CPUs is the Intel 8086. In the 8086, the control unit is called the bus interface unit (BIU) and is responsible for the following tasks:
Calculating the physical address of the next instruction to fetch.
Calculating the physical address of memory or I/O locations to read from or write to (i.e., to perform load and store operations).
Fetching instruction bytes from memory and placing them into a buffer.
Reading and writing to memory or I/O devices.
The datapath of the 8086 is also called the execution unit and is responsible for the following tasks:
Reading the values of the registers specified by the instruction to be executed.
Writing the result of an instruction to the specified register.
Generating branch results or memory or I/O requests to the BIU. At any given cycle, the BIU has to arbitrate between either performing a read/write operation or an instruction fetch operation.
Executing instructions in the ALU.
The 8086 can be described as a pipelined processor with two stages corresponding to the two units. There is basically no decoding; the instruction bytes are hardwired to the ALU and register file to perform the specified operation.
If it is the control unit, how does it know when to perform the
operations?
Every instruction has an opcode, which is just a bunch of bits that identify the operation that needs to be performed by the ALU (on the 8086). The opcode bits can be simply hardwired by design to the ALU so that it does the right thing automatically. Other bits of the instruction may specify the operands or register identifiers, which may be passed to the register file to perform a register read or write operation.
Is the Operating system involved in any of these tasks except
scheduling which program to execute?
No.
Why does it matter if the OS is 32 bit or 64 bit? Shouldn't we worry
about the compiler or interpreter in this case?
It depends on whether the processor supports a 32-bit operating mode and a 64-bit operating mode. The number and/or size of registers and the supported features are different in different modes, which is why the instruction encoding is different. For example, the 32-bit x86 instruction set defines 8 32-bit general-purpose architectural registers while the 64-bit x86 instruction set (also called x86-64) defines 16 64-bit general-purpose architectural registers. In addition, the size of a virtual memory address is 64-bit on x86-64 and 32-bit on x86, which not only impacts the instruction encoding but also the format of an executable binary (See Executable and Linkable Format). A modern x86 processor generally supports both modes of operation. Since the instruction encoding is different, then the binary executables that contain them are different for different modes and can only run in the mode they are compiled for. A 32-bit OS means that the binaries of the OS require the processor to operate in 32-bit. This also poses a restriction that any application to be run on the OS must also be 32-bit because a 32-bit OS doesn't know how to run a 64-bit application (because the ABI is different). On the other hand, typically, a 64-bit OS is designed to run both 32-bit and 64-bit applications.
The gcc compiler will by default emit 64-bit x86 binaries if the OS it is running on is 64-bit. However, you can override that by specifying the -m32 compiler switch so that it emits a 32-bit x86 binary instead. You can use objdump to observe the many differences between a 32-bit executable and a 64-bit executable.

Related

Multicore CPUs, Different types of CPUs and operating systems

An operating system should support CPU architecture and not specific CPU, for example if some company has Three types of CPUs all based of x86 architecture,
one is a single core processor, the other one a dual core and the last one has five cores, The operating system isn't CPU type based, it's architecture based, so how would the kernel know if the CPU it is running on supports multi-core processing or how many cores does it even have....
also for example Timer interrupts, Some versions of Intel's i386 processor family use PIT and others use the APIC Timer, to generate periodic timed interrupts, how does the operating system recognize that if it wants for example to config it... ( Specifically regarding timers I know they are usually set by the BIOS but the ISR handles for Timed interrupts should also recognize the timer mechanism it is running upon in order to disable / enable / modify it when handling some interrupt )
Is there such a thing as a CPU Driver that is relevant to the OS and not the BIOS?, also if someone could refer me to somewhere I could gain more knowledge about how Multi-core processing is triggered / implemented by the kernel in terms of "code" It would be great
The operating system kernel almost always has an abstraction layer called the HAL, which provides an interface above the hardware the rest of the kernel can easily use. This HAL is also architecture-dependent and not model-dependent. The CPU architecture has to define some invokation method that will allow the HAL to know about which features are present and which aren't present in the executing processor.
On the IA32/64 architecture, the is an instruction known as CPUID. You may ask another question here:
Was CPUID present from the beginning?
No, CPUID wasn't present in the earliest CPUs. In fact, it came a lot later with the developement in i386 processor. The 21st bit in the EFLAGS register indicates support for the CPUID instruction, according to Intel Manual Volume 2A.
PUSHFD
Using the PUSHFD instruction, you can copy the contents of the EFLAGS register on the stack and check if the 21st bit is set.
How does CPUID return information, if it is just an instruction?
The CPUID instruction returns processor identification and feature information in the EAX, EBX, ECX, and EDX registers. Its output depends on the values put into the EAX and ECX registers before execution.
Each value (which is valid for CPUID) that can be put in the EAX register is known as a CPUID leaf. Some leaves have subleaves, .i.e. they depend on an sub-leaf value in the ECX register.
How is multi-core support detected at the OS kernel level?
There is a standard known as ACPI (Advanced Configuration and Power Interface) which defines a set of ACPI tables. These include the MADT or multiple APIC descriptor table. This table contains entries that have information about local APICs, I/O APICs, Interrupt Redirections and much more. Each local APIC is associated with only one logical processor, as you should know.
Using this table, the kernel can get the APIC-ID of each local APIC present in the system (only those ones whose CPUs are working properly). The APIC id itself is divided into topological Ids (bit-by-bit) whose offsets are taken using CPUID. This allows the OS know where each CPU is located - its domain, chip, core, and hyperthreading id.

How are applications and data accessed by the CPU from RAM

I am having a bit of trouble understanding how applications and data are accessed by the CPU from RAM after the application has been loaded into RAM and a file opened (thus data for the file also stored in RAM).
By my understanding, a CPU just gets instructions from RAM as the program counter ticks or carries out tasks after an interrupt. How then does it access the application and data. Is it that it doesn't and still just gets instructions (for example to load a file on the hard drive to be opened in the application) and processes any requests made by the application which are stored in RAM as instructions thereafter (like saving a file). Or does the application and data relating to an opened file (for example) just stay in RAM and not get accessed by the CPU at all.
Similarly, after reading an article, it said that a copy of the operating system is stored in RAM. The CPU can then access the operating system. (I thought the CPU just worked with instructions from RAM). How does it then communicate with the operating system and how are interrupts sent to the CPU, from the copy of the OS in RAM or from the OS in the hard drive.
Sorry if this is really confusing, alot i didn't understand.
Root of your question: Lack of clear differentiation between Computer's Hardware and Computer's Software.
Components of a Computer System
Just so that we are clear about both of them and that we understand their nature, let me state as follows:
Hardware: It includes CPU, RAM, Disk, Register, Graphics Card, Network Card, Memory BUS and everything that you can touch and call to be the 'Computer'. It is the body.
Software: It includes Operating System, Program, CPU instruction, Compiler, Programming Language and almost everything intangible about the computer. It is the soul.
Firmware: It is that basic code which is absolutely essential for hardware's working. This is stored on a Read Only Memory installed in the hardware itself. This piece of software is vital for hardware therefore is considered in the mid of hardware and software and hence called Firmware.
We will start with understanding from the time when we say that the computer is up and running and is properly executing our instructions. But at that time you will say - How did I reach here? So I will mention a few points about the startup of the computer.
When the power button is pressed...
...the most primitive and basic input output system (therefore called BIOS), which is hard written on the computer hardware begins execution. This is written on Read Only Memory and this starts the process to get the machine to stand on its own. And it loads the software (Operating System) from one piece of hardware (disks) into another piece of hardware (RAM and CPU registers) enabling the software to work properly with hardware.
Now the body and soul are together and the individual (machine) can work.
Until now, OS is already in RAM and CPU. (Read When the power button is pressed if you doubt it.) Let's handle your question paragraph by paragraph now -
First Paragraph
I am having a bit of trouble understanding how applications and data
are accessed by the CPU from RAM after the application has been loaded
into RAM and a file opened (thus data for the file also stored in
RAM).
The explanation is as follows:
The exact issue here is your thinking that it is CPU and RAM that access the data. CPU and RAM are only executing units.
It is OS (software) that accesses the data by means of CPU and RAM (hardware). It is in the realm of OS where applications are executed.
This is why you can install Linux and Windows on same hardware but cannot execute .exe files in Linux because OS does the execution and not RAM/CPU.
Further, how do CPU and RAM and disk physically interact to bring in the data, execute it, save it back etc. is in the domain of hardware. That would require explanation which involves logic gates (AND, OR, NOT...), diodes, circuitry and a hell lot of other things which an Electronics guy can explain.
Second Paragraph
By my understanding, a CPU just gets instructions from RAM as the
program counter ticks or carries out tasks after an interrupt. How
then does it access the application and data. Is it that it doesn't
and still just gets instructions (for example to load a file on the
hard drive to be opened in the application) and processes any
requests made by the application which are stored in RAM as
instructions thereafter (like saving a file).
As you have guessed it - CPU doesn't get instructions, Operating System does it through CPU. Also, just the way brain doesn't directly instruct the hands and legs to move and instead uses nerves for interaction, the CPU doesn't tell the disks to give/take the data. CPU works with RAM and registers only. Multiple units of hardware work in conjunction to provide a path for data and instruction to travel. The important pieces of involved hardware are:
Processor (CPU and registers built in the CPU)
Cache
Memory (RAM)
Disk
Tape
I like the image provided in this answer. This image not only lists the hardware pieces but also illustrates the mammoth difference in the execution speed of these pieces.
Let's move on to the...
Third Paragraph
Similarly, after reading an article, it said that a copy of the
operating system is stored in RAM. The CPU can then access the
operating system. (I thought the CPU just worked with instructions
from RAM). How does it then communicate with the operating system and
how are interrupts sent to the CPU, from the copy of the OS in RAM or
from the OS in the hard drive.
By now you already know that indeed OS is present in RAM and CPU registers. That is where it lives. That is from where it tells the CPU how to work. If OS would be small enough (or if Registers and Caches would be big enough), the OS would live even closer to CPU.
The CPU does not communicate with the OS. It can't. It is the worker that is controlled by a boss. OS is that boss.
CPU cannot access Operating System. CPU is the body, OS is the soul. Soul tells the body what to do, not vice-versa.
CPU doesn't work with instructions from RAM. It merely executes the instructions given by the Operating System (which may be living in RAM). So even when there is an instruction to load some module of OS into the RAM, it is not RAM/CPU but OS itself that issues that instruction.
Interrupts are of two types - Hardware and Software - and your query is about the software interrupts. Since the executive part of OS is in the RAM, in simple words we can say that interrupts are sent to CPU from OS living in RAM.
Conclusions
The lack of distinction between hardware and software is the basic cause of your confusions. Take some course about Operating Systems on Coursera or Academic Earth for deeper understanding.
It is confusing indeed. Let me try to explain.
CPU and RAM
The CPU is hardwired to the RAM via the 'motherboard', and they work together. The CPU can perform many instructions, but it has to be told what to do by instructions in RAM. The CPU is basically in a loop: all it does it fetch the next instruction from RAM and execute it, over and over.
So how does this RAM get filled with instructions?
BIOS (basic input/output system)
When the computer first boots up, a portion of RAM is filled with data from a chip on the motherboard (the BIOS chip), and the CPU is turned on and starts processing. These are the factory settings.
The data from the BIOS chip that is copied to RAM consists of a library of instructions to access hardware devices (hard disks, CD/ROM, USB storage, network cards etc.),
and a program using that library to load what is called the bootsector, the first sector on the boot device, into RAM, and transfer control to it (with a jump instruction).
BOOTLOADER
The bootsector data that the BIOS program loaded from the boot device is very small - only 440 bytes - but with the help of the BIOS library, this is enough to be able to load more sectors and execute these. The bootsector and the data it loads is called the bootloader, which is in charge of loading the Operating System.
In effect, the bootloader is a more dynamic version of the BIOS: the BIOS program resides in flash memory, whereas the bootloader resides on hard disks, USB sticks, SSD drives etc., and thus can be larger and more complex.
OPERATING SYSTEM
In it's turn, The operating system (OS) is simply a more advanced version of the bootloader, as it can load and run multiple programs from multiple locations at the same time.
--
The BIOS knows about drives.
The Bootloader knows about drives and partitions.
The OS knows about drives, partitions, and file systems.
CPU,as you've noticed, reads the program from RAM, instruction by instruction. When an instruction is executed, it might refer to data stored in memory, which it either fetches explicitly to the registers (internal storage of the CPU, quite small - on x86_64 that's like several 64-bit registers + other stuff like segment registers, IP, SP etc) with a separate instruction, or the data read from the memory (we are talking about small amount of data). That's all it really does.
Loading a file from a disk would be done by asking the appropriate controller to fetch the data into a specific place in memory. CPU is connected to buses which will carry instructions to appropriate controllers.
As to interrupts these are special things - CPU has several interrupt lines which can be activated by various devices, for example your network card. When it receives such an interrupt, it is usually handled by an interrupt handler, which is just a program located in a well-known place in memory. They can be registered by, for example, operating system. Each interrupt line has its own interrupt handler. When interrupt happens, the CPU saves the current state of the program it happens to be executing, handles interrupt, restores the state and resumes the program.
You seem to be asking about addressing modes. At the risk of gross oversimplification (ignoring caching, segments, and logical memory), memory stored as a sequential array accessed by an integer address.
The CPU has a number of internal storage areas called registers. We will call them R0 to Rn. The processor assigns some registers dedicated purposes. One of those registers is the PC.
One common addressing mode is deferred. I indicate this mode as (Rn). An instruction like this:
MOV (R0), R1
uses the value contained in R0 as a memory address, fetches the value stored that memory location, and stores a copy of that value in R1.
An instruction sequence like this:
MOV (R0), R1
MOV (R2), R3
is stored in memory as data (ignoring protection), code, data, and variables all use the same type of memory. In other words, any memory location can be interpreted as code, data, or variable.
The CPU executes the next instruction located at (PC). After executing the instruction, the CPU automatically increments the PC to point to the next instruction.

What is a "Logical CPU Core"

I am reading some Operating Systems materials. I read this phrase that confused me a little:
"Multicore refers to a computer or processor that has more than one logical CPU core, and that can execute multiple instructions at the same time."
What is a "logical CPU core", is it a processor? Does it correspond to something physical, or is it the OS which sees logical CPU cores but in reality there is less physical processors than logical CPU cores?
A logical CPU core contains the complete architectural context of a uniprocessor. This is the unit for which the OS can do scheduling and control architectural state such as the address for exceptions (for an architecture that does not hardwire such).
There are two common cases where it will not correspond one-to-one with a physical core. First, a single physical core can implement multiple virtual processors, e.g., Intel's Hyper-Threading. In this case the OS scheduler should be aware that virtual processors may share various resources such as instruction fetch, instruction scheduling hardware, and execution units, which generally means that tasks should be scheduled to distinct physical cores to maximize performance. (This issue also applies to a lesser extent to distinct cores that share L2 cache. Such concerns are somewhat related to NUMA optimizations for multi-CPU computers.)
In the second case, a hypervisor's virtualization of the hardware can present an arbitrary number of cores to the OS. While a hypervisor would typically make visible to a guest OS no more logical processors than provided by hardware (i.e., including virtual processors associated with hardware multithreading), theoretically the hypervisor could present an arbitrary number of processors to the OS (just as an OS can present the impression of an arbitrary number of processors to the application layer by using time slicing). In such a software virtualization context, the hypervisor may not expose to the OS the nature of the processors, so the OS could only treat them as abstract units for scheduling.
Somewhat complicating this division, it is also possible for hardware to implement multithreading without providing a full virtual processor for each thread. E.g., the MIPS Multithreading Application Specific Extension makes a distinction between Virtual Processing Elements (which behave as distinct processors in terms of architectural state) and Thread Contexts (which share the system coprocessor among threads in the same VPE). As a further complication, it may be possible for Thread Contexts to be migrated among VPEs. E.g., a physical processor core might have two VPEs and five Thread Contexts and the OS might be allowed to assign a given TC to either VPE such that either VPE could have between one and four TCs. In addition, unprivileged software can FORK and YIELD threads without OS involvement if spare hardware threads are available (in the case of FORK) or at least one thread will still be active (in the case of YIELD).
For MIPS MT-ASE, the OS would generally only be concerned with Thread Contexts, but some optimizations are possible with a more complete knowledge of the actual hardware configuration and some correctness issues are possible if a Thread Context is treated as a Virtual Processing Element.
It might be helpful to have some background knowledge:
Processor
A processor could describe either a single execution core or a single physical multi-core chip. The context of use will define the meaning of the term. e.g Normal PC computer should only have one processor
Chips
A chip refers to a physical integrated circuit (IC) on a computer. A chip is usually referred to an execution unit that can be single- or multi-core technology.
Sockets
The socket refers to a physical connector on a computer motherboard that accepts a single physical chip. Many motherboards can have multiple sockets that can, in turn, accept multi-core chips. 
Cores
Since the advent of multi-core technology, such as dual-cores and quad-cores. Essentially a core comprises a logical execution unit containing an L1 cache and functional units. Cores are able to independently execute programs or threads. Supercomputers are listed as having thousands of cores.
Hyper-threading
Hyper-threading is an Intel technology that originally preceded multi-core systems, and was used to make a single core appear logically as multiple cores on the same chip. Hyper-threading improves performance by sharing the computational workload between multiple cores whenever possible, allowing the operating system to schedule more than one process at a time. For more, see Intel Hyper-Threading Technology. 
Physical/Logical Cores
sockets and cores
As shown in the picture, you have 2 sockets, and each socket has 4 cores, and each core can execute 4 threads currently(due to Hyper-threading). In this case, if you use command lscpu on Linux, you may see you have 32 CPUs. Actually, you have 1 chip, 2 sockets, 8 cores, and 32 CPUs (From Linux perspective)
I guess it referes to ALU (Arithmetic Logical Unit) of CPU.
The ALU unit of any proccessor is the part pf the CPU reasponsible for performing all the arithmetic and logical operations.

On x86-64, is the "movnti" instruction atomic?

On x86-64 CPUs (either Intel or AMD), is the "movnti" instruction that writes 4/8 bytes to a 32/64-bit aligned address atomic?
Yes, movnti is atomic on naturally-aligned addresses, just like all other naturally-aligned 8/16/32/64b stores (and loads) on x86. This applies regardless of memory-type (writeback, write-combining, uncacheable, etc.) See that link for the wording of the guarantees in Intel's x86 manual.
Note that atomicity is separate from memory ordering. Normal x86 stores are release-store operations, but movnt stores are "relaxed".
Fun fact: 32-bit code can use x87 (fild/fistp) or SSE/MMX movq to do atomic 64-bit loads/stores. gcc's std::atomic implementation actually does this. It's only SSE accesses larger than 8B (e.g. movaps or movntps 16B/32B/64B vector stores) that are not guaranteed atomic. (Even 16B operations are atomic are on some hardware, but there's no standard way to detect this).
seems clearly not:
Because the WC protocol uses a weakly-ordered memory consistency model, a fencing operation such as SFENCE should be used in conjunction with MOVNTI instructions if multiple processors might use different memory types to read/write the memory location.

How are the stack pointer and program status word maintained in multiprocessor architecture?

In a multi-processor architecture, how are registers organized?
For example, in a 4 cores processor, a minimum of 4 processes can run at a time.
How are stack pointer, program status registers and program counter organized?
What about other general purpose registers?
My guess is, each core will have a separate set of registers.
Imagine 4 completely separate computers, each with a single-core CPU. A 4-core computer is like that; except:
All CPUs share the same physical address space (and can all use the same RAM, PCI devices, etc)
Interrupt/IRQ controllers may be designed so the OS can tell it which CPU/s should be interrupted by the IRQ
CPUs are typically able to signal each other (e.g. "inter-processor interrupts")
Some CPUs may share some caches
Some CPUs may share some control registers (e.g. for things like power management, cache configuration, etc)
For modern CPUs, some CPUs may share some or all execution units (SMT, hyper-threading, etc)
For modern systems (where memory controller is built into the physical chip) some CPUs may share the same memory controller
Most of this is "invisible" to most software. Unless you're writing part of an OS that controls power management, you don't need to care if power management is shared between CPUs or not; unless you're writing an OSs/kernel's low level IRQ handling you don't need to care how IRQs reach device drivers, etc.
The same applies to how many CPUs actually exist. The OS/kernel normally ensures that applications only need to care about higher level abstractions (e.g. "threads"). How this higher level abstraction works depends on the OS - normally (for most OSs) the OS/kernel attempts to provide the illusion that all threads are running at the same time by switching between them "quickly enough" (where if there's only 4 CPUs a maximum of 4 threads actually do run at the exact same time), but it's usually far more complex than this (involving things like thread priorities, pre-emption rules, etc) and (even though it's relatively rare) it may be very different (e.g. for some systems the same thread may be run on multiple CPUs at the same time for fault tolerance/redundancy purposes; for some systems there might just be a queue of functions and their data, where multiple functions run at the same time; etc).
Multiprocessor means that there are at least two discrete processors on the same platform -- usually on the same motherboard
A subset is distributed multiprocessing, where two PC's for example are programmed to appear as a single system with two processors
Multicore means that the most or all of the CPU is replicated many times on single chip.
- this also means that stack, status, program counter and all generic purpose registers are replicated.
Hyperthreading is a technique, where each stage of the pipeline executes commands from different processes.
Multiprocessing means in OS level that everything a process consists of, is switched every now and then.
Multithreading is a lightweight variant of multiprocessing, where the threads e.g. share the same code segment and same data segment, same file descriptors etc. but have unique stacks (and of course unique status registers and program counters)
Also means multiprocessing in general (hardware architecture)