Considering that a processor runs at 100 MHz and data is coming to it from an external device/peripheral at a rate of 1000 Mbit/s (8 bits per clock cycle @ 125 MHz), what is the best way to handle traffic that arrives faster than the processor can run?
First off, you can't do it in software. There would be no way to sample the digital lines at a sufficient rate, or to do anything useful with the data.
You need to use a hardware FIFO buffer or memory cell. When a data burst comes in, it can be buffered in the high speed FIFO and then read out as needed by the processor.
Drop-in high-speed FIFO chips are surprisingly expensive (though most are dual-ported). To cut cost, you would be best off using an SRAM chip and a hardware adder to increment the address lines on incoming data.
This is not an uncommon situation for software; semaj used the right word: this is a system-engineering issue. Other folks have the right answer too. If you want to look at or process all of that data with the 100 MHz processor, it is not going to happen, so don't bother trying. You CAN look at snapshots of it, or have the hardware filter out the specific percentage of it that you are looking for. At the end of the day, though, it is a systems issue: what does the hardware provide, where does it put this data, and what is the software's task for this data? Does it see X buffers of data come in on the goesinta and then notify the goesouta hardware that there are X buffers ready to go? Does the hardware examine and align the buffers so that the software can look at a header and then decide where to route the data? Once you do your system engineering you will know whether you can use that processor or not, and if you can, what its job is and how to do it.
Your direct question: what is the best way to handle it? The best way is to have hardware (an FPGA, ASIC, etc.) move the data into and out of some storage device (RAM of some sort, probably), not necessarily the same RAM the processor runs out of (DMA into that is a good thing to avoid). The hardware is something the software can talk to, but you cannot examine all of that data, so don't try. What kind of data this is, what form it takes, what the software looks at, and how much work you are willing to force the hardware to do determine the rest of the answer. If you expect a certain (guaranteed) percentage to be bad, or not to belong to this processor, etc., have the hardware filter that out, and then you can process what is left.
Networking is a good example of this. PCs have GigE ports but cannot process GigE line-rate data. That is why we use switches now instead of hubs: the hardware slices out a percentage of the data so the PC can handle it, and the protocols take care of the data that cannot be processed by resending it later. The switches' processors don't look at all of the data either; the hardware slices it up so the software can examine just the header. Or sometimes the software simply manages tables that drive the hardware, and the hardware does all the work of processing the data.
Do your system engineering and the answers will simply fall out.
You buffer it. Typically, data from a device is written to a memory buffer (a circular queue) using DMA (no CPU involved). The CPU reads from the memory buffer at a constant rate. Usually devices send data in bursts, and draining the buffer between bursts keeps it from filling up. If too much data arrives, the buffer overflows.
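A minimal sketch of that circular-queue idea, assuming the hardware side (a DMA-complete interrupt or similar) calls a hypothetical dma_write_byte() for incoming data while the CPU drains the buffer at its own pace:

```c
#include <stdint.h>
#include <stdbool.h>

#define BUF_SIZE 4096u          /* power of two so the index wraps cheaply */

static volatile uint8_t  buf[BUF_SIZE];
static volatile uint32_t head;  /* advanced by the DMA/ISR side   */
static volatile uint32_t tail;  /* advanced by the CPU reader side */

/* Called from the (hypothetical) DMA-complete interrupt for each byte. */
bool dma_write_byte(uint8_t b)
{
    uint32_t next = (head + 1u) & (BUF_SIZE - 1u);
    if (next == tail)
        return false;           /* buffer full -> overflow, data is lost */
    buf[head] = b;
    head = next;
    return true;
}

/* Called from the main loop; returns true if a byte was available. */
bool cpu_read_byte(uint8_t *out)
{
    if (tail == head)
        return false;           /* buffer empty */
    *out = buf[tail];
    tail = (tail + 1u) & (BUF_SIZE - 1u);
    return true;
}
```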
DMA (direct memory access) is possibly the solution. However, it seems unlikely that the memory bus could run faster than the processor core, so the receiving peripheral would have to accept data into a register wider than 8 bits, because 125 MHz could not be sustained. For example, a 16-bit register would allow memory writes at 62.5 MHz, which may be achievable. The receiving device would also have to be able to accept an external clock that is both faster than and asynchronous to the core clock. And, of course, the receiving peripheral must have DMA support.
Unless you are more specific about your hardware and the communication protocol it is difficult to give anything other than a general answer.
I'm confused.
I have recently started working on building an operating system, using Bochs as an emulator and a certain manual online.
In the manual, to move the VGA framebuffer cursor I use the I/O ports with the 'out' command. I get how to control it, but I don't know what it is that I'm controlling, and after some reading it seems like everywhere it is addressed as an abstract thing that, for example, makes the cursor change its position on the screen.
What I want to know: what are they physically? Are they cables? If so, from where to where are they connected? Can I also input from them, as their name suggests? And why do I need the out command instead of writing directly to their place in memory?
If your answer can also include serial ports and the difference between them and the I/O ones, that would be amazing,
with respect,
revolution
(BTW, the operating system is 32-bit.)
An IO port is basically memory on the motherboard that you can read/write. The motherboard makes some memory available other than RAM. The CPU has a control bus which allows it to "tell" the motherboard that what it outputs on the data bus is to be written somewhere other than RAM. When you output to the VGA buffer, you write to video memory on the motherboard. The out/in instructions are used to write/read IO ports instead of RAM: they instruct the CPU to assert a certain line on its control bus, telling the motherboard to write/read a certain byte to/from an IO port instead of RAM.
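As a concrete illustration of the out instruction the question asks about, here is a minimal sketch of moving the text-mode cursor through the conventional VGA CRT controller ports (0x3D4/0x3D5). It assumes 32-bit x86 with GCC-style inline assembly; the outb helper is just an illustrative wrapper.

```c
#include <stdint.h>

/* Illustrative wrapper around the x86 `out` instruction (GCC inline asm). */
static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}

/* Move the text-mode cursor by programming the VGA CRT controller:
 * port 0x3D4 selects a register index, port 0x3D5 carries its data.
 * Index 0x0F holds the low byte of the cursor position, 0x0E the high byte. */
void move_cursor(uint8_t row, uint8_t col)
{
    uint16_t pos = (uint16_t)(row * 80 + col);   /* 80-column text mode */

    outb(0x3D4, 0x0F);
    outb(0x3D5, (uint8_t)(pos & 0xFF));
    outb(0x3D4, 0x0E);
    outb(0x3D5, (uint8_t)((pos >> 8) & 0xFF));
}
```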
Today, a lot of the RAM address space is used for hardware mapping instead of IO ports. This is often called the PCI hole. It is memory-mapped IO: you write to what looks like RAM, and the data is sent to hardware such as graphics memory. All of this is transparent to OS developers. You are simply using very abstract hardware interfaces which are either conventional (open source) or proprietary.
Serial ports, meanwhile, are simply ports which are serial in nature. A serial port is defined as a port where data is transferred one bit at a time. USB is serial (Universal Serial Bus), VGA is serial, and others are too. These ports are not like IO ports; you output to them indirectly, using IO ports.
IO ports offer various hardware interfaces which allow you to drive hardware. For example, if you have a VGA-compatible screen and set text mode, the motherboard will make certain IO ports available and, when you write to these IO ports, video memory will vary depending on what you output to them. Eventually, the VGA screen refreshes when the video controller outputs the data written to video memory through the actual VGA port. I'm not totally aware of how all of this works, since I'm not an electrical engineer and I never read about this stuff. From what I know, you can see the pins of the VGA port and what each of them does on Wikipedia. VGA works with RGBHV: RGB stands for red, green and blue, while HV stands for horizontal/vertical sync. As stated in the Wikipedia article on analog television:
Synchronizing pulses added to the video signal at the end of every scan line and video frame ensure that the sweep oscillators in the receiver remain locked in step with the transmitted signal so that the image can be reconstructed on the receiver screen. A sync separator circuit detects the sync voltage levels and sorts the pulses into horizontal and vertical sync.
The horizontal synchronization pulse (horizontal sync, or HSync), separates the scan lines. The horizontal sync signal is a single short pulse which indicates the start of every line. The rest of the scan line follows, with the signal ranging from 0.3 V (black) to 1 V (white), until the next horizontal or vertical synchronization pulse.
Memory itself takes various forms in hardware. Video memory is often called VRAM (video RAM) or the frame buffer, as you can read in a Wikipedia article. In itself, video memory is an array of DRAM. A DRAM cell today is one capacitor (which stores the data) and one MOSFET transistor (which controls the flow of the data). So you have special wiring on the motherboard between the data bus of the processor and the VRAM. When you output data to video memory, you write to VRAM on the motherboard. Where you write, and how, just depends on the video mode you set up.
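To make the memory-mapped side concrete, here is a minimal sketch of writing a character directly into the VGA colour text-mode frame buffer at the conventional address 0xB8000. It assumes a flat 32-bit address space where that region is identity-mapped, as in the kind of hobby OS described in the question.

```c
#include <stdint.h>

/* Conventional physical address of the VGA colour text-mode frame buffer.
 * Assumes a flat 32-bit address space with this region identity-mapped. */
#define VGA_TEXT_BUFFER ((volatile uint16_t *)0xB8000)

/* Each cell is 16 bits: low byte = ASCII character, high byte = attribute
 * (foreground/background colour). 0x0F is white on black. */
void put_char_at(char c, uint8_t row, uint8_t col)
{
    VGA_TEXT_BUFFER[row * 80 + col] = (uint16_t)((0x0F << 8) | (uint8_t)c);
}
```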
Most modern systems work with HDMI/DisplayPort along with a graphics card. These graphics cards are other hardware interfaces, often complex ones, and their details often cannot be known because the drivers for the cards are provided by the manufacturers. osdev.org has information on Intel HD Graphics, which has a special interface to interact with. It can be used to gather info on the monitor and to determine what RAM address to use to write to the monitor.
I am learning about the STM32 F4 microcontroller. I'm trying to find out about limitations for using DMA.
From my understanding and research, I know that if the data size is small (that is, the device uses DMA to generate or consume only a small amount of data), the overhead is increased, because the DMA transfer requires the DMA controller to perform setup operations, thereby unnecessarily increasing system cost.
I did some research and found the following:
Limitation of DMA
CPU puts all its lines at high impedance state so that the DMA controller can then transfer data directly between device and memory without CPU intervention. Clearly, it is more suitable for device with high data transfer rates like a disk.
Over a serial interface, data is transferred one bit at a time which makes it slow to use DMA.
Is that correct? What else do I need to know?
DMA - CPU puts all its lines at high impedance state
I do not know where you took that from, but you should not use this source any more.
The frequency of DMA transfers does not matter unless you reach the bus throughput limit. You can transfer one byte per week, month, year, decade... and it is absolutely OK.
In the STM32 microcontrollers it is a very important feature, as we can transfer data from/to external devices even when the uC is in a low-power mode with the core (CPU) sleeping. The DMA controller can even wake the core up when certain conditions are met.
As @Vinci and @0___________ (f.k.a. @P__J__) already pointed out,
A DMA controller works autonomously and doesn't create overhead on the CPU it supplements (at least not by itself). But:
The CPU/software must execute some instructions to configure the DMA controller and to trigger it, or have it triggered by some peripheral. For this, it needs CPU time and program memory space (usually ROM). Besides, it usually needs some additional RAM for the variables that manage the software around the DMA.
Hence, you are right: using a DMA comes with some overhead.
And furthermore,
The DMA transfers make use of the memory bus(es) that connect the involved memories/registers/peripherals to the DMA controller. That is, while the DMA controller does its own work, it may cause the CPU it is trying to offload to stall, at least for short moments while data words are transferred - and those moments add up over longer transfers.
On the other hand, a DMA controller doesn't only help you reduce CPU load (in terms of total CPU time spent implementing some feature). Used in a smart way, it also helps you reduce software latencies, because one part of the implementation can be "hidden" behind the DMA-driven data transfer of another part (unless both rely on the same bus resources - see above).
The information is right in that using a DMA requires some development work and some runtime to manage the DMA transfer itself (see also a related question here), which may not be worth the benefits of using DMA. That is, for small portions of data one doesn't gain as much performance (or latency) as with big transfers. On embedded systems, DMA controllers (and their channels) are limited resources, so it is important to consider which part of the system benefits most from such a resource. Therefore, one would usually prefer using DMA for data transfers to/from disks (if it is about "payload data" such as large files or video streams) over slow serial connections.
The information is wrong, however, in that DMA is not worth using on serial interfaces because those only transfer a single bit at a time. Please note that microcontrollers (such as your STM32F4) have built-in peripheral components that convert the serial bit-by-bit stream into a byte-by-byte or word-by-word stream, which can easily be transferred by DMA in a helpful way - especially if the size of the packets is known in advance and the software doesn't have to analyse an unformatted stream. Furthermore, not every serial connection is "slow" at all. If the project uses, e.g., an SPI flash chip, then the SPI serial connection is the one used for data transfer.
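As a rough illustration of the configuration overhead and the serial-peripheral case discussed above, here is a sketch of a DMA-driven UART receive, assuming an ST HAL based STM32F4 project where the UART handle (huart2) and its DMA stream have already been configured by the usual generated init code; the buffer name and length are illustrative.

```c
#include "stm32f4xx_hal.h"           /* ST HAL, assuming an STM32F4 project */

extern UART_HandleTypeDef huart2;    /* configured elsewhere (e.g. CubeMX)  */

#define RX_LEN 64u
static uint8_t rx_buf[RX_LEN];       /* illustrative receive buffer */

/* One-time "overhead": hand the buffer to the DMA controller and return.
 * The CPU is then free until the transfer-complete interrupt fires. */
void start_rx(void)
{
    HAL_UART_Receive_DMA(&huart2, rx_buf, RX_LEN);
}

/* HAL weak callback, invoked from the interrupt once RX_LEN bytes have
 * landed in rx_buf without any per-byte CPU involvement. */
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart == &huart2) {
        /* process rx_buf here, then re-arm the transfer */
        HAL_UART_Receive_DMA(&huart2, rx_buf, RX_LEN);
    }
}
```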
Especially when working with "faster" devices like the STM32F4xx/F7xx, we need to specify the number of flash wait cycles, based on the supply voltage and the sysclk frequency.
When the CPU fetches instructions or constants, this is done over the FLITF. Am I right in assuming that the FLITF holds a CPU request until it can provide the requested data, making it impossible for other bus masters to access flash in the meantime?
If this is true, why would any interface need to know the flash wait cycles? The cache preloads instructions anyway, regardless of whether it knows how long to wait, no?
Because the flash interface isn't magic.
It has to meet the necessary setup and hold times for addressing and reading out the flash cells, which will vary somewhat depending on voltage. Taking the STM32F411 as an example (because I have that TRM handy), doing some maths with the voltage/frequency/wait-state table implies that a read from flash on one of those takes on the order of ~30 ns above 2.7 V, rising to ~60 ns below 2.1 V.
Since the flash interface doesn't have its own asynchronous nanosecond-precision timekeeping ability (because that would be needlessly complicated, power-hungry, and silly), that translates to asserting its signals for n clock cycles, after which it can assume the data signals from the cells are stable enough to read back*. How does it know what the clock frequency is, and therefore what n should be? Simple: you, as the programmer who set the clock, tell it. Some hardware things are just infinitely easier to let software deal with.
* and then going through the further shenanigans of extracting the relevant 8, 16 or 32 bits out of the 128-bit line it's read, to finally spit that out the other side onto the AHB bus to the waiting CPU, obviously.
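In practice, "you, as the programmer, tell it" usually boils down to a register write before clocking up. Here is a minimal sketch, assuming the CMSIS device header's register and field names (FLASH->ACR, FLASH_ACR_LATENCY); the 3-wait-state value is just the example figure for ~100 MHz at 2.7-3.6 V on an STM32F411 - check the table for your part and voltage.

```c
#include "stm32f4xx.h"               /* CMSIS device header, assumed names */

/* Raise the flash wait states *before* switching to a faster clock.
 * 3 wait states is the example value for ~100 MHz at 2.7-3.6 V (STM32F411). */
void set_flash_latency_for_100mhz(void)
{
    uint32_t acr = FLASH->ACR;

    acr &= ~FLASH_ACR_LATENCY;       /* clear the LATENCY field              */
    acr |= 3u;                       /* 3 wait states (example value)        */
    FLASH->ACR = acr;

    /* The reference manual recommends reading the register back to confirm
     * the new wait-state count has taken effect before raising the clock. */
    while ((FLASH->ACR & FLASH_ACR_LATENCY) != 3u) { }
}
```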
I'm learning about the differences between polling and interrupts for I/O in my OS class, and one of the things my teacher mentioned was that the speed of the I/O device can make a difference in which method would be better. He didn't follow up on it, but I've been wracking my brain about it and I can't figure out why. I feel like using interrupts is almost always better, and I just don't see what the speed of the I/O device has to do with it.
The only advantage of polling comes when you don't care about every change that occurs.
Assume you have a real-time system that measures the temperature of a vat of molten plastic used for molding. Let's also say that your device can measure to a resolution of 1/1000 of a degree and can take a new temperature reading every 1/10,000 of a second.
However, you only need the temperature every second and you only need to know the temperature within 1/10 of a degree.
In that kind of environment, polling the device might be preferable. Make one polling request every second. If you used interrupts, you could get 10,000 interrupts a second as the temperature moved +/- 1/1000 of a degree.
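A minimal sketch of that once-per-second polling approach; read_temperature_millideg() and sleep_ms() are hypothetical helpers standing in for the real device access and timing code.

```c
#include <stdint.h>

/* Hypothetical device-access and timing helpers for this sketch. */
extern int32_t read_temperature_millideg(void);   /* 1/1000 of a degree */
extern void    sleep_ms(uint32_t ms);

/* Poll once a second and only act on changes of at least 1/10 of a degree,
 * instead of taking an interrupt for every 1/1000-degree fluctuation. */
void temperature_poll_loop(void)
{
    int32_t last = read_temperature_millideg();

    for (;;) {
        sleep_ms(1000);
        int32_t now = read_temperature_millideg();

        if (now - last >= 100 || last - now >= 100) {  /* 100 mdeg = 0.1 deg */
            /* react to the change here */
            last = now;
        }
    }
}
```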
Polling used to be common with certain I/O devices, such as joysticks and pointing devices.
That said, there is VERY little need for polling and it has pretty much gone away.
Generally you would want to use interrupts, because polling can waste a lot of CPU cycles. However, if events are frequent and synchronous (and other factors apply, e.g. short polling times), polling can be a good alternative, especially because an interrupt creates more overhead than a polling cycle.
You might want to take a look at this thread as well for more detail:
Polling or Interrupt based method
Having looked for a description of the multicore design, I keep finding several diagrams, but all of them look somewhat like this:
I know from looking at i7z command output that different cores can run at different frequencies.
This would suggest that the decisions about which core is given a new process, and about changing a core's frequency, are made either by the operating system or by the control block of the core itself.
My question is: what controls the frequency of each individual core? And is the job of associating a READY process with a specific core placed upon the operating system, or is it done by something within the processor?
Scheduling processes/threads to cores is purely up to the OS. The hardware has no understanding of tasks waiting to run. Maintaining the OS's list of processes that are runnable vs. waiting for I/O is completely a software thing.
Migrating a thread from one core to another is done by kernel code on the original core storing the architectural state to memory, then OS code on the new core restoring that saved state and resuming user-space execution.
Traditionally, frequency and voltage scaling decisions are made by the OS. Take Linux as an example: The decision-making code is called a governor (and also this arch wiki link came up high on google). It looks at things like how often processes have used their entire time slice on the current core. If the governor decides the CPU should run at a different speed, it programs some control registers to implement the change. As I understand it, the hardware takes care of choosing the right voltage to support the requested frequency.
As I understand it, the OS running on each core makes decisions independently. On hardware that allows each core to run at a different frequency, the decision-making code for one core doesn't need to coordinate with the others. If running a high frequency on one core requires a high voltage chip-wide, the hardware takes care of that. I think the modern implementation of DVFS (dynamic voltage and frequency scaling) is fairly high-level, with the OS just telling the hardware which of N choices it wants, and the onboard power microcontroller taking care of the details of programming oscillators / clock dividers and voltage regulators.
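On Linux you can watch the governor's choices through the cpufreq sysfs interface. A minimal sketch reading core 0's active governor and current frequency (paths as exposed on a typical Linux system):

```c
#include <stdio.h>

/* Print core 0's active cpufreq governor and its current frequency (kHz),
 * as exposed by Linux's cpufreq sysfs interface. */
int main(void)
{
    char line[128];
    FILE *f;

    f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "r");
    if (f && fgets(line, sizeof line, f))
        printf("governor: %s", line);
    if (f) fclose(f);

    f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
    if (f && fgets(line, sizeof line, f))
        printf("current frequency (kHz): %s", line);
    if (f) fclose(f);

    return 0;
}
```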
Intel's "Turbo" feature, which opportunistically boosts the frequency above the max sustainable frequency, does the decision making in hardware. Any time the OS requests the highest advertised frequency, the CPU uses turbo when power and cooling allow.
Intel's Skylake takes this a step further: The OS can hand full control over DVFS to the hardware, optionally with constraints. That lets it react from microsecond to microsecond, rather than on a timescale of milliseconds. This does actually allow better performance in bursty workloads, because more power budget is available for turbo when it's useful. A few benchmarks are bursty enough to observe this, like some browser / javascript ones IIRC.
There was a whole talk about Skylake's new power management at IDF2015, check out the slides and/or archived webcast. The old method is described in a lot of detail there, too, to illustrate the difference, so you should really check it out if you want more detail than my summary. (The list of other IDF talks is here, thanks to Agner Fog's blog for the link)
The core frequency is controlled by a given voltage applied to a core's "oscillator".
This voltage can be changed by the operating system, but it can also be changed by the BIOS itself if a high temperature is detected in the CPU.