Using an AT28C256 as non-volatile SRAM for a Z80 - eeprom

I've been using an AT28C256 as EEPROM 'ROM' for a Z80 project quite successfully. As the AT28C256 can be programmed at 5V using the /WE pin, I was thinking about also using it as a form of non-volatile SRAM, rather than adding another chip.
Yes, the AT28C256 is only 32kB in size, so I'm not using the whole 16-bit address space on the Z80 - but I wanted to know if this is possible?
Could I just OR the /MREQ and /WR lines on the Z80 together for the /WE on the AT28C256? Or am I missing something?
I could then set my Stack Pointer (SP) to the 32k boundary, rather than the usual 0xFFFF.

You can use an EEPROM like a RAM, but only if you take its behavior into account.
You can simply connect:
Z80-/MREQ to EEPROM-/CE, but you will need to gate this
Z80-/WR to EEPROM-/WE
Z80-/RD to EEPROM-/OE
Things to consider, consult the data sheet for details:
If you write a byte (or use the page write algorithm) the EEPROM will not output the stored values if you read it, until the self-timed write cycle has passed.
The write cycle is about some milliseconds long.
The EEPROM might fail after a few 10k write cycles (Thanks, Stefan Paul Noack).
You can't use it for the program that changes the chip's contents, because of point 1.
You can't use it for the stack or any other data that needs to be stored and retrieved quickly, because of point 2.
However, you can use it for the application's data. But you will need another memory for the program to run.
And if your program needs a stack or other variables to be written quickly, you will need an additional RAM. (Note: I remember a Z80 application that implemented a printer queue with just simple DRAM, using only the CPU's registers for the program's variables, and using the DRAM only for the data to buffer.)
To have multiple chip's as memory, you will need to gate the /CE-pins of these memories depending on their address range.

Related

What causes dma_map_page/dma_unmap_page() to take longer time on some hardware?

I've been programming a Linux kernel module for several years for a PCIe device. One of the main feature is to transfer data from the PCIe card to the host memory using DMA.
I'm using streaming DMA, i.e. it's the user program that allocates the memory, and my kernel module has to do the job of locking the pages and creating the scatter gather structure. It works correctly.
However, when used on some more recent hardware with Intel processors, the function calls dma_map_page() and dma_unmap_page() are taking much longer time to execute.
I've tried to use dma_map_sg() and dma_unmap_sg(), it takes approximately the same longer-time.
I've tried to split the dma_unmap_sg() into a first call to dma_sync_for_cpu(), followed by the call to dma_unmap_sg_attr() with attribute DMA_ATTR_SKIP_CPU_SYNC. It works correctly. And I can see the additional time is spend on the unmap operation, not for the sync.
I've tried to play with the Linux kernel command line parameters relating to the iommu (on, force, strict=0), and also intel_iommu, with no change in the behavior.
Some other hardware show a decent transfer rate, i.e. more than 6GB/s on PCIe3x8 (max 8GB/s).
The issue on some recent hardware is limiting transfer rate to ~3GB/s (I've checked that the card is correctly configured for PCIe3x8, and the programmer of the Windows device driver manages to achieve the 6GB/s on the same system. Things are more behind the curtains in Windows and I cannot get much information from it.)
On some hardware, the behavior is either normal or slowed, depending on the Linux distribution (and the Linux kernel version I guess). On some other hardware, the roles are reversed, i.e. the slow one becomes the fast one and vice-versa.
I cannot figure out the cause of this. Any clue?

What exactly is a machine instruction?

The user's program in main memory consists of machine instructions and
data. In contrast, the control memory holds a fixed microprogram that
cannot be altered by the occasional user. The microprogram consists of
microinstructions that specify various internal control signals for
execution of register microoperations. Each machine instruction
initiates a series of micro instructions in control memory. These
microsinstructions generates microoperations to fetch the instruction
for main memory; to evaluate the effective address, to execute the
operation specified by the instruction, and to return control the
fetch phase in order to repeat the cycle for the next instruction
I don't exactly understand here the difference between machine instruction, microinstruction and micropeerations. i certainly do understand that microinstructions according to the paragraph given are the intermediate level of instructions but which of the other 2 is the one that is more close to the machine language. Are CLA, ADD, STA, BUN, BSA, AND etc machine instructions or microoperations?
A CPU presents itself to the outside as a device capable of executing machine instructions. For example,
mov (%esi,%ebx,4), %edx
is a machine instruction that moves 4 bytes of data at address ESI+4*EBX into register EDX. Machine instructions are public - they are published by CPU manufacturer in a user manual. Compilers such as gcc will output files that contain machine instructions, and these will typically end up in EXE/DLL files.
If you look closely at the above instruction, you will see that it is a fairly complex operation. It involves some arithmetic (multiplying and addition) to get the memory address, then moving data from that address into a register. From CPU's perspective, it would also make sense to use the arithmetical unit that is already there. So it makes natural sense to break down this instruction into microinstructions. In essence, mov instruction is implemented internally by CPU as a microprogram written in microinstructions. This is, however, an implementation detail of a CPU. Microinstructions are internal to CPU and they are invisible to anybody except to CPU manufacturer.
Microinstructions have several benefits:
they simplify internal CPU architecture, design and testing, thus lowering cost per unit
they make it easy to create rich and powerful sets of machine instructions (you just have to combine microinstrcutions in different ways)
they provide a consistent machine language across different CPUs (e.g. Xeon and Pentium both implement basic x86_64 instruction set even though they are very different in hardware)
create optimizations (i.e. the same instruction on one CPU can be implemented by a hardware, the other can be emulated in microinstructions)
fix bugs (e.g. you can fix Spectre vulnerability while the machine is running and without buying a new CPU and opening your server)
For more information, see https://en.wikipedia.org/wiki/Micro-operation
I think the answer to your question is in these three sentences:
The user's program in main memory consists of machine instructions and data
Each machine instruction initiates a series of micro-instructions in control memory.
These micro-instructions generate micro-operations.
So:
The user supplies machine instructions
Those get translated into micro-instructions
Those get translated into micro-operations
The mnemonics you mentioned are what the user might use to write or read a list of machine instructions (the actual instructions just being patterns of bits understood by the processor). The "occasional user" (i.e. everyone other than the chip's designer) never needs to deal directly in micro-instructions or micro-operations, so would never know individual names for them.

Operating system file system block size?

I recently have this question for my homework and I have trouble figuring it out. I tried searching online, but I can't seem to find any answers.
" Some file systems use two block sizes for disk storage allocation,
for example, 4- Kbyte and 512-byte blocks. Thus, a 6 Kbytes file can
be allocated with a single 4- Kbyte block and four 512-byte blocks.
Discuss the advantage of this scheme compared to the file systems that
use one block size for disk storage allocation. "
So are more blocks better?
Any help? thanks in advance.
You can't have a big amount of different block sizes, that would be hell to implement and manage. I also think that some hardware limitations restrain what sizes you can use.
Now the thing is, unless the amount of data you wish to store fits exactly in all the blocks you are using, then some space is going to be wasted in the last block.
For example, if your block is one gygabyte long (hypothetically speaking), and you want to store a 1 or 2 bytes long file, you've just wasted nearly a gigabyte of disk space. All information is stored as blocks. You can't store half a block.
Long blocks make for better performance, though, since the disk may spend more time fetching information from a block before proceeding to the next one. Also it's less blocks to track and manage.
Linux is a fun operating system to play with because it can work with so many different file systems (as far as I remember you only get some variations of FAT and NTFS with Windows). You could read more about file system on this link:
Linux System Administrators Guide: Chapter 5. Using Disks and Other Storage Media
See section 5.10.5 for more info on advantages and disadvantages of small and big block sizes.
So back to your question: having different block sizes like that allows you to optimize storage. You can minimize wasted space by switching to smaller blocks by the end of the file, while having as few blocks as possible to reduce I/O times.

Writing to hard disk from contiguous physical memory

I have an ARM based device, running linux, which is connected to a camera, and I'm trying to store captured frames to HD efficiently.
I'm developing in user space, but can modify drivers at will
I'm coding in C
Frames which are written into memory using DMA, and I have their physical memory pointer.
I am able to control all the frame capturing flow, and I can tell when the frame buffers are stable (dqueued from the video4linux driver)
Linux version is 3.0.35
I'm familiar with kernel source code, not an expert, but I'm able to find my way in it and figure out things, as long as I get some hints...
I believe I have 2 alternatives:
Find the optimal configuration for my filesystem, for opening the file and writing into it. I'm now using ext4, and standard fopen() fwrite() functions. I understand I can also use mmap, or add O_DIRECT flag when calling open(), but didn't try it yet.
Find a way to pass the physical address of the buffer (I can get it
from my Video4Linux driver) directly to the filesystem/hard drive driver,
so the data will be transfered directly from there.
I found method 1 to be slow, having memory transactions as my bottleneck, since fwrite involves copying data from userspace to kernel space, and then again into some sort of cache, and then on to DMA. Too many memory transactions for a simple store...
Regarding method 2 - I don't know if that's possible, but if I was the one designing this system from scratch, this is what I would do.
Any thoughts?
Regarding method 1 (using open() and write(), mmap() and/or O_DIRECT)
can you recommend an optimal settings for my purpose?
Is method 2 (storing to HD directly from an existing DMA buffer) possible? If so - can you point me to an example?
the only problem with writing into a file via mmap on UNIXs, is that you either have to deal with signals in case of out-of-disk-space
or you have make certain that the file is not sparse
and thus all needed disk space is already allocated.
I think an uptodate G++ provides a method of converting signals into C++ exception handling,
but I'm not certain how supported this is on other systems than mac-os.

Is it true that CPU never fetches anything from memory directly?

I hear that cpu just fetches instruction from the EIP register,never fetches from memory directly.
But AFAIK,EIP just stores the address of the next instruction,the instruction itself is still in the memory.If CPU never fetches memory,how can it know what the next instruction actually is?
UPDATE
BTW,I know there're x86,x64,x87 architectures,but which does x86-64 belong to,x86 or x64??
The simple answer to your question is "no, it's not true".
The picture isn't very simple due to caching, instruction pipeline, branch prediction etc. However, the instruction pointer is just that, a pointer. It doesn't store opcodes.
EIP (Extended Instruction Pointer) should hold the address of the instruction. It's just a way to keep a tab of which instruction is being processed currently (or sometimes, which instruction to process next).
The instructions themselves are stored in the Memory (HDD, RAM, Cache) and need to be fetched by the CPU.
Maybe what you heard meant that since so many levels of caches are used generally it's quite rare that the fetch needs to access the RAM..
Well I don't know the point to your question.
Yes the CPU (in a broad sense of the word) does fetch from memory. It has a number of memory management devices (for cache line handling and pipelining). In fact, the 'pipeline' puts the instructions in L1 cache. Indeed, the instruction processor itself only fetches from there. The processor in reality probably never even looks at EIP (unless an instruction uses it directly as an operand).
So the real answer would be, find yourself a wikipedia articale on i86 processor design, and have a ball. You'll be able to know exactly what happens where.
Cheers
Not true in that way. CPU accesses memory thru the cache, so you can kinda say that it does not do it directly. (Also DMA cahnnel can transfer data between memory and IO without ever touching CPU).
Yes, CS:EIP points to the memory, to the next instruction to execute, but you can use direct addresses too for example (load the content of the address 0x0800 to the AX register, by default this is relative to DS segment):
MOV AX,[0x0800]