In segmented memory management the logical address is a pair (segment number, offset) which is then translated to physical address using segment table. I dont understand how this logical address is generated from instruction like JMP some_address.
Thanks
There are several different methods employed to implement segmented memory management. The entire concept of using segments was obsolete by the 1980's and only survived longer in legacy and poorly designed processors.
Normally, this is NOT correct:
In segmented memory management the logical address is a pair
In segmented memory, the address is a composed of an offset and a base location specified by a register.
Related
So I am currently learning Operating Systems and Programming.
I want how the registers work in detail.
All I know is there is the main memory and our CPU which takes address and instruction from the main memory by the help of the address bus.
And also there is something MCC (Memory Controller Chip which helps in fetching the memory location from RAM.)
On the internet, it shows register is temporary storage and data can be accessed faster than ram for registers.
But I want to really understand the deep-down process on how they work. As they are also of 32 bits and 16 bits something like that. I am really confused.!!!
I'm not a native english speaker, pardon me for some perhaps incorrect terminology. Hope this will be a little bit helpful.
Why does registers exists
When user program is running on CPU, it works in a 'dynamic' sense. That is, we should store incoming source data or any intermediate data, and do specific calculation upon them. Memory devices are needed. We have a choice among flip-flop, on-chip RAM/ROM, and off-chip RAM/ROM.
The term register for programmer's model is actually a D flip-flop in the physical circuit, which is a memory device and can hold a single bit. An IC design consists of standard cell part (including the register mentioned before, and and/or/etc. gates) and hard macro (like SRAM). As the technology node advances, the standard cells' delay are getting smaller and smaller. Auto Place-n-Route tool will place the register and the related surrounding logic nearby, to make sure the logic can run at the specified 3.0/4.0GHz speed target. For some practical reasons (which I'm not quite sure because I don't do layout), we tend to place hard macros around, leading to much longer metal wire. This plus SRAM's own characteristics, on-chip SRAM is normally slower than D flip-flop. If the memory device is off the chip, say an external Flash chip or KGD (known good die), it will be further slower since the signals should traverse through 2 more IO devices which have much larger delay.
how they work together with cpu
Each register is assigned a different 'address' (which maybe not open to programmer). That is implemented by adding address decode logic. For instance, when CPU is going to execute an instruction mov R1, 0x12, the address decode logic sees the binary code of R1, and selects only those flip-flops corresponding to R1. Then data 0x12 is stored (written) into those flip-flops. Same for read process.
Regarding "they are also of 32 bits and 16 bits something like that", the bit width is not a problem. Both flip-flops and a word in RAM can have a bit width of N, as long as the same address can select N flip-flops or N bits in RAM at one time.
Registers are small memories which resides inside the processor (what you called CPU). Their role is to hold the operands for fast processor calculations and to store the results. A register is usually designated by a name (AL, BX, ECX, RDX, cr3, RIP, R0, R8, R15, etc.) and has a size which is the number of bits it can store (4, 8, 16, 32, 64, 128 bits). Other registers have special meanings, and their bits control the state or provide information about the state of the processor.
There are not many registers (because they are very expensive). All of them have a capacity of only a few kilobytes, so they can't store all the code and data of your program, which can go up to gigabytes. This is the role of the central memory (what you call RAM). This big memory can hold gigabytes of data and each byte has its address. However, it only holds data when the computer is turned on. The RAM reside outside of the CPU Chip and interacts with him via Memory Controller Chip which stands as interface between CPU and RAM.
On top of that, there is the hard drive that stores your data when you turn off your computer.
That is a very simple view to get you started.
I am trying to understand both paradigms of memory management;however, I fail to see the big picture and the difference between both. Paging consists of taking fixed size pages from a secondary to a primary storage in order to do some task requested by a process. Segmentation consists of assigning to each unit in a process an address space, so they are allowed to grow. I don't quiet see how they are related and that's because there are still a lot of holes in my understanding. Can someone fill them up?
I think you have something confused. One problem you have is that the term "segment" had multiple meanings.
Segmentation is a method of memory management. Memory is managed in segments that are of variable or fixed length, depending upon the processor. Segments originated on 16-bit processors as a means to access more than 64K of memory.
On the PDP-11, programmers used segments to map different memory into the 64K address space. At any given time a process could only access 64K of memory but the memory that made up that 64K could change.
The 8086 and it successors used segments with base registers. Each segment could have 64K (that grew with the processors) but a process could have 4 segments (more in later processors).
Paging allows a process to have a larger address space than there is physical memory available.
The 8086's successors used the kludge of paging on top of segments. However, that bit of ugliness has finally gone away in 64-bit mode.
You got your answer right there, paging relates with fixed size pages in a storage while segmentation deals with units in a page. 'Segments' are objects in the class 'Page'
Most OSes use paging for virtual memory. Why is this? Why not use segmentation? Is it just because of a hardware issue? Is one better than the other in certain cases? Basically, if you had to choose one over the other, which one would you want to use and why?
Let's assume it's an x86 for argument's sake.
OS like windows and Linux use a combination of both segmentation and paging. The virtual memory of a process is first divided into segments and then each segment consists of a lot of pages. The OS first goes to the specific segment and in that segment it then locates the particular page to access an address
Taken from :operating systems concepts by galvin
one of the issues..
Segmentation permits the physical address space of a process to be non-
contiguous. Paging is another memory-management scheme that offers this
advantage. However, paging avoids external fragmentation and the need for compaction, whereas segmentation does not.
Segmentaion problem:
The problem arises because, when code fragments
or data residing in main memory need to be swapped out, space must be found
on the backing store. The backing store has the same fragmentation problems
but access is much slower, so compaction is impossible.
Paging solves it by:
The basic method for implementing paging involves breaking physical memory into fixed-sized blocks called frames and breaking logical memory into
blocks of the same size called pages.The backing store is divided into fixed-sized blocks that are the same size as the memory frames or clusters of multiple frames.
Since pages-frames-The backing store all are divided into same size so it doesn't lead to external fragmentation. But may have internal fragmentation.
So pagesize must be chosen correctly
Operating Systems concepts
Note, that Single-Address-Space Operating Systems sometimes use segmentation to isolate processes.
Register variables are a well-known way to get fast access (register int i). But why are registers on the top of hierarchy (registers, cache, main memory, secondary memory)? What are all the things that make accessing registers so fast?
Registers are circuits which are literally wired directly to the ALU, which contains the circuits for arithmetic. Every clock cycle, the register unit of the CPU core can feed a half-dozen or so variables into the other circuits. Actually, the units within the datapath (ALU, etc.) can feed data to each other directly, via the bypass network, which in a way forms a hierarchy level above registers — but they still use register-numbers to address each other. (The control section of a fully pipelined CPU dynamically maps datapath units to register numbers.)
The register keyword in C does nothing useful and you shouldn't use it. The compiler decides what variables should be in registers and when.
Registers are a core part of the CPU, and much of the instruction set of a CPU will be tailored for working against registers rather than memory locations. Accessing a register's value will typically require very few clock cycles (likely just 1), as soon as memory is accessed, things get more complex and cache controllers / memory buses get involved and the operation is going to take considerably more time.
Several factors lead to registers being faster than cache.
Direct vs. Indirect Addressing
First, registers are directly addressed based on bits in the instruction. Many ISAs encode the source register addresses in a constant location, allowing them to be sent to the register file before the instruction has been decoded, speculating that one or both values will be used. The most common memory addressing modes indirect through a register. Because of the frequency of base+offset addressing, many implementations optimize the pipeline for this case. (Accessing the cache at different stages adds complexity.) Caches also use tagging and typically use set associativity, which tends to increase access latency. Not having to handle the possibility of a miss also reduces the complexity of register access.
Complicating Factors
Out-of-order implementations and ISAs with stacked or rotating registers (e.g., SPARC, Itanium, XTensa) do rename registers. Specialized caches such as Todd Austin's Knapsack Cache (which directly indexes the cache with the offset) and some stack cache designs (e.g., using a small stack frame number and directly indexing a chunk of the specialized stack cache using that frame number and the offset) avoid register read and addition. Signature caches associate a register name and offset with a small chunk of storage, providing lower latency for accesses to the lower members of a structure. Index prediction (e.g., XORing offset and base, avoiding carry propagation delay) can reduce latency (at the cost of handling mispredictions). One could also provide memory addresses earlier for simpler addressing modes like register indirect, but accessing the cache in two different pipeline stages adds complexity. (Itanium only provided register indirect addressing — with option post increment.) Way prediction (and hit speculation in the case of direct mapped caches) can reduce latency (again with misprediction handling costs). Scratchpad (a.k.a. tightly coupled) memories do not have tags or associativity and so can be slightly faster (as well as have lower access energy) and once an access is determined to be to that region a miss is impossible. The contents of a Knapsack Cache can be treated as part of the context and the context not be considered ready until that cache is filled. Registers could also be loaded lazily (particularly for Itanium stacked registers), theoretically, and so have to handle the possibility of a register miss.
Fixed vs. Variable Size
Registers are usually fixed size. This avoids the need to shift the data retrieved from aligned storage to place the actual least significant bit into its proper place for the execution unit. In addition, many load instructions sign extend the loaded value, which can add latency. (Zero extension is not dependent on the data value.)
Complicating Factors
Some ISAs do support sub-registers, notable x86 and zArchitecture (descended from S/360), which can require pre-shifting. One could also provide fully aligned loads at lower latency (likely at the cost of one cycle of extra latency for other loads); subword loads are common enough and the added latency small enough that special casing is not common. Sign extension latency could be hidden behind carry propagation latency; alternatively sign prediction could be used (likely just speculative zero extension) or sign extension treated as a slow case. (Support for unaligned loads can further complicate cache access.)
Small Capacity
A typical register file for an in-order 64-bit RISC will be only about 256 bytes (32 8-byte registers). 8KiB is considered small for a modern cache. This means that multiplying the physical size and static power to increase speed has a much smaller effect on the total area and static power. Larger transistors have higher drive strength and other area-increasing design factors can improve speed.
Complicating Factors
Some ISAs have a large number of architected registers and may have very wide SIMD registers. In addition, some implementations add additional registers for renaming or to support multithreading. GPUs, which use SIMD and support multithreading, can have especially high capacity register files; GPU register files are also different from CPU register files in typically being single ported, accessing four times as many vector elements of one operand/result per cycle as can be used in execution (e.g., with 512-bit wide multiply-accumulate execution, reading 2KiB of each of three operands and writing 2KiB of the result).
Common Case Optimization
Because register access is intended to be the common case, area, power, and design effort is more profitably spent to improve performance of this function. If 5% of instructions use no source registers (direct jumps and calls, register clearing, etc.), 70% use one source register (simple loads, operations with an immediate, etc.), 25% use two source registers, and 75% use a destination register, while 50% access data memory (40% loads, 10% stores) — a rough approximation loosely based on data from SPEC CPU2000 for MIPS —, then more than three times as many of the (more timing-critical) reads are from registers than memory (1.3 per instruction vs. 0.4) and
Complicating Factors
Not all processors are design for "general purpose" workloads. E.g., processor using in-memory vectors and targeting dot product performance using registers for vector start address, vector length, and an accumulator might have little reason to optimize register latency (extreme parallelism simplifies hiding latency) and memory bandwidth would be more important than register bandwidth.
Small Address Space
A last, somewhat minor advantage of registers is that the address space is small. This reduces the latency for address decode when indexing a storage array. One can conceive of address decode as a sequence of binary decisions (this half of a chunk of storage or the other). A typical cache SRAM array has about 256 wordlines (columns, index addresses) — 8 bits to decode — and the selection of the SRAM array will typically also involve address decode. A simple in-order RISC will typically have 32 registers — 5 bits to decode.
Complicating Factors
Modern high-performance processors can easily have 8 bit register addresses (Itanium had more than 128 general purpose registers in a context and higher-end out-of-order processors can have even more registers). This is also a less important consideration relative to those above, but it should not be ignored.
Conclusion
Many of the above considerations overlap, which is to be expected for an optimized design. If a particular function is expected to be common, not only will the implementation be optimized but the interface as well. Limiting flexibility (direct addressing, fixed size) naturally aids optimization and smaller is easier to make faster.
Registers are essentially internal CPU memory. So accesses to registers are easier and quicker than any other kind of memory accesses.
Smaller memories are generally faster than larger ones; they can also require fewer bits to address. A 32-bit instruction word can hold three four-bit register addresses and have lots of room for the opcode and other things; one 32-bit memory address would completely fill up an instruction word leaving no room for anything else. Further, the time required to address a memory increases at a rate more than proportional to the log of the memory size. Accessing a word from a 4 gig memory space will take dozens if not hundreds of times longer than accessing one from a 16-word register file.
A machine that can handle most information requests from a small fast register file will be faster than one which uses a slower memory for everything.
Every microcontroller has a CPU as Bill mentioned, that has the basic components of ALU, some RAM as well as other forms of memory to assist with its operations. The RAM is what you are referring to as Main memory.
The ALU handles all of the arthimetic logical operations and to operate on any operands to perform these calculations, it loads the operands into registers, performs the operations on these, and then your program accesses the stored result in these registers directly or indirectly.
Since registers are closest to the heart of the CPU (a.k.a the brain of your processor), they are higher up in the chain and ofcourse operations performed directly on registers take the least amount of clock cycles.
I'm reading "Understanding Linux Kernel". This is the snippet that explains how Linux uses Segmentation which I didn't understand.
Segmentation has been included in 80 x
86 microprocessors to encourage
programmers to split their
applications into logically related
entities, such as subroutines or
global and local data areas. However,
Linux uses segmentation in a very
limited way. In fact, segmentation
and paging are somewhat redundant,
because both can be used to separate
the physical address spaces of
processes: segmentation can assign a
different linear address space to each
process, while paging can map the same
linear address space into different
physical address spaces. Linux prefers
paging to segmentation for the
following reasons:
Memory management is simpler when all
processes use the same segment
register values that is, when they
share the same set of linear
addresses.
One of the design objectives of Linux
is portability to a wide range of
architectures; RISC architectures in
particular have limited support for
segmentation.
All Linux processes running in User
Mode use the same pair of segments to
address instructions and data. These
segments are called user code segment
and user data segment , respectively.
Similarly, all Linux processes running
in Kernel Mode use the same pair of
segments to address instructions and
data: they are called kernel code
segment and kernel data segment ,
respectively. Table 2-3 shows the
values of the Segment Descriptor
fields for these four crucial
segments.
I'm unable to understand 1st and last paragraph.
The 80x86 family of CPUs generate a real address by adding the contents of a CPU register called a segment register to that of the program counter. Thus by changing the segment register contents you can change the physical addresses that the program accesses. Paging does something similar by mapping the same virtual address to different real addresses. Linux using uses the latter - the segment registers for Linux processes will always have the same unchanging contents.
Segmentation and Paging are not at all redundant. The Linux OS fully incorporates demand paging, but it does not use memory segmentation. This gives all tasks a flat, linear, virtual address space of 32/64 bits.
Paging adds on another layer of abstraction to the memory address translation. With paging, linear memory addresses are mapped to pages of memory, instead of being translated directly to physical memory. Since pages can be swapped in and out of physical RAM, paging allows more memory to be allocated than what is physically available. Only pages that are being actively used need to be mapped into physical memory.
An alternative to page swapping is segment swapping, but it is generally much less efficient given that segments are usually larger than pages.
Segmentation of memory is a method of allocating multiple chunks of memory (per task) for different purposes and allowing those chunks to be protected from each other. In Linux a task's code, data, and stack sections are all mapped to a single segment of memory.
The 32-bit processors do not have a mode bit for disabling
segmentation, but the same effect can be achieved by mapping the
stack, code, and data spaces to the same range of linear addresses.
The 32-bit offsets used by 32-bit processor instructions can cover a
four-gigabyte linear address space.
Aditionally, the Intel documentation states:
A flat model without paging minimally requires a GDT with one code and
one data segment descriptor. A null descriptor in the first GDT entry
is also required. A flat model with paging may provide code and data
descriptors for supervisor mode and another set of code and data
descriptors for user mode
This is the reason for having a one pair of CS/DS for kernel privilege execution (ring 0), and one pair of CS/DS for user privilege execution (ring 3).
Summary: Segmentation provides a means to isolate and protect sections of memory. Paging provides a means to allocate more memory that what is physically available.
Windows uses the fs segment for local thread storage.
Therefore, wine has to use it, and the linux kernel needs to support it.
Modern operating systems (i.e. Linux, other Unixen, Windows NT, etc.) do not use the segmentation facility provided by the x86 processor. Instead, they use a flat 32 bit memory model. Each user mode process has it's own 32 bit virtual address space.
(Naturally the widths are expanded to 64 bits on x86_64 systems)
Intel first added segmentation on the 80286, and then paging on the 80386. Unix-like OSes typically use paging for virtual memory.
Anyway, since paging on x86 didn't support execute permissions until recently, OpenWall Linux used segmentation to provide non-executable stack regions, i.e. it set the code segment limit to a lower value than the other segment's limits, and did some emulation to support trampolines on the stack.