How does a computer process input of a moving mouse?

Are there registers involved, or is it related to cache memory?
An illustrative example for my question, which is perhaps simple enough: I move my mouse across the screen I am currently typing on. I don't click on anything; I just move the arrow left to right and up and down. How does the CPU handle the position changes of my mouse in relation to the monitor's display, which seems instantaneous?
Edit: I understand that this is handled more by the operating system, since a mouse is an external device and the CPU just calculates values and does logic: when the mouse moves, the operating system gets an interrupt and handles it appropriately.

When you move or click your mouse, it generates an interrupt. An interrupt is basically a way to tell the CPU that an event has happened that needs to be processed. The kernel will then run its interrupt handler to process the mouse events.
For example, the PS/2 mouse communicates by means of a 3-byte packet:
       -------------------------------------------
Byte 1 | YV | XV | YS | XS |  1 | MB | RB | LB |
       -------------------------------------------
Byte 2 |               X movement               |
       -------------------------------------------
Byte 3 |               Y movement               |
       -------------------------------------------
The MB, RB, and LB flags represent the middle, right, and left button states.
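As a rough sketch of what an interrupt handler does with such a packet (illustrative only, not the actual Linux driver code; the struct and function names are made up), the three bytes can be decoded into button states and signed movement deltas like this:

    #include <stdint.h>

    /* Hypothetical decode of one PS/2 mouse packet.
       Byte 0: YV XV YS XS 1 MB RB LB; bytes 1-2: X and Y movement.
       XS/YS are the sign bits of 9-bit two's-complement movement values;
       the XV/YV overflow flags are ignored here for simplicity. */
    struct mouse_event {
        int dx, dy;
        int left, right, middle;
    };

    static struct mouse_event decode_ps2_packet(const uint8_t p[3])
    {
        struct mouse_event ev;

        ev.left   = p[0] & 0x01;        /* LB */
        ev.right  = (p[0] >> 1) & 0x01; /* RB */
        ev.middle = (p[0] >> 2) & 0x01; /* MB */

        ev.dx = p[1] - ((p[0] & 0x10) ? 256 : 0);  /* apply XS sign bit */
        ev.dy = p[2] - ((p[0] & 0x20) ? 256 : 0);  /* apply YS sign bit */

        return ev;
    }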
The kernel will then eventually pass these events on to the application that is running.
For example, on Linux, the X server is the process that handles mouse events. Individual graphical applications are informed of them through a generic X event protocol.
Registers and cache memory are always involved when running code. The kernel's interrupt handlers are optimized to process interrupts quickly and pass them on. The change appears nearly instantaneous because CPUs are extremely fast: processors work with nanosecond resolution, and there are a billion nanoseconds in every second.

What are buffers used for in the construction of the D latch?

I am reading the book Digital Design and Computer Architecture, and in the chapter on the D flip-flop at the transistor level it says "A compact D latch can be constructed from a single transmission gate", and the following is given as an example of building a latch from a transmission gate and buffers.
I have a few questions:
How is it that a latch can be built from a single transmission gate, if a latch is a memory cell that should consist of two cross-coupled elements and store state, rather than just passing a bit of information through on a clock pulse?
What are buffers used for when building a D flip-flop? I couldn't figure it out from what was written in the book. Can you explain this point in a little more detail? And why do they all invert the values passing through?
Figure 3.12 (a): D latch constructed from a single transmission gate
Figure 3.12 (b): 12-transistor D latch
Figure 3.13: D flip-flop
What are buffers used for when building a D flip-flop?
A latch is continuously open to respond to inputs that can change the value it stores.
However, many designs are clocked, which means we want to accept changes (requests to store a potentially new value) only at a clock edge boundary and otherwise hold the latch in its current state, keeping its output steady. Clocked designs are timed so that the combinational (non-sequential) circuitry between storage elements settles just before the next clock boundary, so its result can be recorded in registers and the next cycle can proceed. The general concept here is the edge-triggered latch, better known as the flip-flop used in clocked designs.
In order to limit the time period during which the latch is allowed to change, we add extra circuitry in front of it. The effect of this circuitry is to let inputs through during certain time periods and suppress them otherwise, allowing change only once per cycle, e.g. at a positive clock edge.
The extra circuitry added can be either a second latch or a pulse trigger. These operate differently and have different advantages and disadvantages.
The second-latch approach always has one latch in the accepting-change (transparent) state and the other in the opposite, storage state (i.e. ignoring input changes). The two latches swap states at every clock edge (i.e. every half clock cycle). Because of this, data moves from one latch to the other only at clock edge boundaries, and with two latches put together we can make a device that accepts input only on the rising edge (or only on the falling edge) of the clock, i.e. once per full clock cycle.
We might call the extra latch a buffer.
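To make the two-latch idea concrete, here is a small behavioral simulation in C (purely illustrative, not a hardware description): two gated D latches enabled on opposite clock phases together behave like a rising-edge-triggered flip-flop, so the output q only changes when the clock goes high.

    #include <stdio.h>

    /* A gated (transparent) D latch: while enable is high the stored value
       follows d; while enable is low it holds the old value. */
    typedef struct { int q; } d_latch;

    static int latch_step(d_latch *l, int enable, int d)
    {
        if (enable)
            l->q = d;   /* transparent */
        return l->q;    /* otherwise opaque: hold */
    }

    int main(void)
    {
        d_latch master = {0}, slave = {0};
        int d_values[] = {1, 1, 0, 0, 1, 1, 0, 0};

        for (int t = 0; t < 8; t++) {
            int clk = t & 1;                                 /* clock toggles each step */
            int m = latch_step(&master, !clk, d_values[t]);  /* master open while clk is low */
            int q = latch_step(&slave, clk, m);              /* slave open while clk is high */
            printf("t=%d clk=%d d=%d q=%d\n", t, clk, d_values[t], q);
        }
        return 0;
    }

Running this shows q updating only on the steps where the clock rises, taking the value d had while the clock was low.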
In pulse-triggered designs, as another approach to limiting the period of change, we clip the clock signal going to the latch so that it doesn't last a full half cycle of the clock; the latch sees only a quick blip instead.
For more information on the variations in flip-flop designs and their trade-offs, see this text:
http://bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s01/Lectures/lecture22-flipflops.pdf
The simple latch is actually acting like a sample-and-hold for analog signals. The memory is held by a capacitor that stores the voltage or logic level. The latch doesn't pass the input through when in the hold state, because the input is disconnected from the latching (holding) capacitor in that state.
The buffer is there to present a minimal load to the output of the latching (holding) capacitor. The buffer is inverting, and the input stage to the latch capacitor is also an inverter, so the two inversions cancel.

If the PC register is simultaneously read and written, does its read data contain the previous data or the newly-written data?

If the PC register is simultaneously read and written, does its read data contain the previous data or the newly written data? Based on my understanding of sequential circuits, the effect of the write does not take effect in the PC register instantly, due to propagation delay, so at the rising edge of the clock the read will get the old value. But, as a corollary to my question: if this is the case, shouldn't the read also have some delay, and so could it possibly read the newly written data?
A program counter is normally special enough that it's not part of a register file with other registers. You don't have a "read command"; the PC's output is just always wired up to the other parts that read it when appropriate (i.e. when its output is stable and has the value you want). See, for example, various block diagrams of MIPS pipelines, or of non-pipelined single-cycle or multi-cycle designs.
You'd normally build such a physical register out of edge-triggered flip-flops, I think (https://en.wikipedia.org/wiki/Flip-flop_(electronics)). Note that a D flip-flop latches the previous input as the current output on a clock edge, and the input is then allowed to change after that.
There's a timing window before the clock edge during which the input has to remain stable (the setup time); it can start to change a couple of gate delays afterwards. Note the example of a shift register built by chaining D flip-flops that all share the same clock signal.
If you have a problem arranging to capture a value before it starts changing, you could design in some intentional clock skew so the flip-flop reliably latches its input before you trigger the thing providing the input to change it. (But normally whatever you're triggering will itself have at least a couple gate delays before its output actually changes, hence the shift-register made of chained D flip-flops.)
That wiki article also mentions the master-slave edge-triggered D Flip-Flop that chains 2 gated (not clocked) D latches with an inverted clock, so capturing the input happens on the opposite clock edge from updating the output with the previously-captured data.
By comparison and for example, in register files for general-purpose registers in classic RISC pipelines like MIPS, IIRC it's common to build them so write happens in the first half-cycle and read happens in the second half-cycle of the ID stage. (So write-back can "forward" to decode/fetch through the register file, keeping the window of bypass-forwarding or hazards shorter than if you did it in the other order.)
This means the write data has a chance to stabilize before you need to read it.
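As a software analogy of that ordering (a sketch under the assumption above, not the actual hardware), a cycle-accurate simulator can simply apply the write-back before the reads within each simulated cycle, so a register written in this cycle reads back its new value:

    #include <stdint.h>

    /* Sketch of a register file where, within one simulated cycle, the write
       port is applied in the "first half" and the read ports in the "second
       half", so reads of the written register see the new value. */
    struct regfile { uint32_t r[32]; };

    static void regfile_cycle(struct regfile *rf,
                              int we, int waddr, uint32_t wdata,    /* write port */
                              int raddr1, int raddr2,
                              uint32_t *rdata1, uint32_t *rdata2)   /* read ports */
    {
        if (we && waddr != 0)          /* register 0 is hard-wired to zero on MIPS */
            rf->r[waddr] = wdata;      /* first half-cycle: write */

        *rdata1 = rf->r[raddr1];       /* second half-cycle: reads see the new value */
        *rdata2 = rf->r[raddr2];
    }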
Overall, it depends how you design it!
If you want the same clock edge to update a register with new inputs while also latching the old value onto the output, a master-slave flip-flop will do that (capture the old input into internal state, and latch the old internal state onto the outputs).
Or you could design it so the input is captured on the clock edge, and propagates to the output after a few gate delays and stays latched there for the rest of this clock cycle (or half cycle). That would be a single D flip-flop (per bit).
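As a software analogy of that single-D-flip-flop behaviour (just a sketch, not a hardware model; the names are made up), a cycle-by-cycle simulator reads the old PC everywhere during the cycle and only makes the new value visible at the "clock edge" at the end of the step:

    #include <stdint.h>
    #include <stdio.h>

    struct cpu_state { uint32_t pc; };

    /* One simulated clock cycle: all reads during the cycle see the old PC;
       the write only takes effect at the "edge" (the end of the function). */
    static void clock_cycle(struct cpu_state *s, int branch_taken, uint32_t target)
    {
        uint32_t old_pc  = s->pc;                       /* value read by fetch this cycle */
        uint32_t next_pc = branch_taken ? target : old_pc + 4;

        printf("fetching from 0x%08x, next pc will be 0x%08x\n",
               (unsigned)old_pc, (unsigned)next_pc);

        s->pc = next_pc;                                /* "clock edge": write happens last */
    }

    int main(void)
    {
        struct cpu_state s = { .pc = 0x1000 };
        clock_cycle(&s, 0, 0);        /* sequential fetch */
        clock_cycle(&s, 1, 0x2000);   /* taken branch */
        return 0;
    }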

ADXL345 Accelerometer data use on I2C (Beaglebone Black)

Background Information
I am trying to make sure I will be able to run two ADXL345 Accelerometers on the same I2C Bus.
To my understanding, the bus can transmit up to 400k bits/s in fast mode.
In order to send 1 byte of data, there are 20 extra bits of overhead.
There are 6 bytes per accelerometer reading (XLow, XHigh, YLow, YHigh, ZLow, ZHigh)
I need to do 1000 readings per second with both accelerometers
Thus, my total data used per second is 336k bits/s, which is within my limit of 400k bits/s.
I am not sure if I am doing these calculations correctly.
Question:
How much data am I transmitting per second with two accelerometers reading 1000 times per second on I2C?
Your math seems to be a bit off. For this accelerometer (see the datasheet: https://www.sparkfun.com/datasheets/Sensors/Accelerometer/ADXL345.pdf), in order to read the 6 bytes of XYZ sample data you need to perform a 6-byte burst read of the data registers. In terms of data transfer, that means a write of the register address (0x32, DATAX0) to the accelerometer, then a burst read of 6 bytes back to back. Each of these two transfers requires sending the I2C device address and the R/W bit first, plus an ACK/NAK per byte (including the address bytes), as well as START / REPEATED START / STOP conditions. So, overall, an individual transfer to get a single sample (i.e. a single XYZ acceleration vector) looks like this:
Start (*)
Device address: 0x1D    (7 bits)
Write bit: 0            (1 bit)
ACK                     (1 bit)
Register address: 0x32  (8 bits)
ACK                     (1 bit)
Repeated start (*)
Device address: 0x1D    (7 bits)
Read bit: 1             (1 bit)
ACK                     (1 bit)
DATA0                   (8 bits)
ACK                     (1 bit)
...
DATA5                   (8 bits)
NAK                     (1 bit)
Stop (*)
If we add all that up, we get 81 + 3 bits that need to be transmitted. Note first that the START, REPEATED START, and STOP might not actually take a bit's worth of time each, but for simplicity we can assume they do. Note also that while the device address is only 7 bits, you always need to append the READ/WRITE bit, so an I2C byte transfer is always 8 bits + ACK/NAK, i.e. 9 bits in total. Note also that the I2C maximum transfer rate really defines the maximum SCL speed the device can handle, so in fast mode SCL is at most 400 kHz (thus 400 kbps at most, but because of the protocol overhead you get less real data). Thus, 84 bits at 400 kHz means we can transfer a sample in 0.21 ms, or ~4700 samples/sec, assuming no gaps or breaks in transmission.
Since you need to read 2 samples every 1 ms (2 accelerometers, so 84 bits * 2 = 168 bits per sampling period, or 168 kbps at a 1 kHz sampling rate), this should at least be possible with fast-mode I2C. However, you will need to be careful that you are making full use of the I2C controller. Depending on the software layer you are working at, it might be difficult to issue the I2C burst reads fast enough (i.e. 2 burst-read transactions within every 1 ms). Using the FIFO on the accelerometer would significantly relax the latency requirement: instead of having 1 ms to issue two burst reads, you can wait up to 32 ms and then issue 64 burst reads (since you have 2 accelerometers, each with a 32-entry FIFO). But since you need to issue a new burst read for each sample, you'll still have to be careful about the delay introduced by software between calls to whatever API you're using to perform the I2C transactions.
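For what it's worth, a burst read like the one described above might look roughly like this on Linux (e.g. the BeagleBone Black) using the i2c-dev interface. This is only a sketch: the bus number in /dev/i2c-2, the 7-bit address 0x1D, and register 0x32 (DATAX0) are assumptions based on the datasheet and a typical wiring, and the device is assumed to already be configured in measurement mode.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/i2c.h>
    #include <linux/i2c-dev.h>

    #define ADXL345_ADDR   0x1D  /* 7-bit address with ALT ADDRESS pulled high */
    #define ADXL345_DATAX0 0x32  /* first of the six data registers */

    /* Write the register address, then burst-read 6 bytes, as one combined
       transaction (write + repeated start + read). */
    static int adxl345_read_xyz(int fd, int16_t *x, int16_t *y, int16_t *z)
    {
        uint8_t reg = ADXL345_DATAX0;
        uint8_t buf[6];
        struct i2c_msg msgs[2] = {
            { .addr = ADXL345_ADDR, .flags = 0,        .len = 1, .buf = &reg },
            { .addr = ADXL345_ADDR, .flags = I2C_M_RD, .len = 6, .buf = buf  },
        };
        struct i2c_rdwr_ioctl_data xfer = { .msgs = msgs, .nmsgs = 2 };

        if (ioctl(fd, I2C_RDWR, &xfer) < 0)
            return -1;

        *x = (int16_t)(buf[0] | (buf[1] << 8));  /* low byte first */
        *y = (int16_t)(buf[2] | (buf[3] << 8));
        *z = (int16_t)(buf[4] | (buf[5] << 8));
        return 0;
    }

    int main(void)
    {
        int fd = open("/dev/i2c-2", O_RDWR);     /* bus number depends on your setup */
        int16_t x, y, z;

        if (fd >= 0 && adxl345_read_xyz(fd, &x, &y, &z) == 0)
            printf("x=%d y=%d z=%d\n", x, y, z);
        if (fd >= 0)
            close(fd);
        return 0;
    }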

How is CR8 register used to prioritize interrupts in an x86-64 CPU?

I'm reading the Intel documentation on control registers, but I'm struggling to understand how the CR8 register is used. To quote the docs (Vol. 3A, 2-18):
Task Priority Level (bit 3:0 of CR8) — This sets the threshold value corresponding to the highest-priority interrupt to be blocked. A value of 0 means all interrupts are enabled. This field is available in 64-bit mode. A value of 15 means all interrupts will be disabled.
I have 3 quick questions, if you don't mind:
So bits 3 thru 0 of CR8 make up those 16 levels of priority values. But priority of what? A running "thread", I assume, correct?
But what is that priority value in CR8 compared to when an interrupt is received to see if it has to be blocked or not?
When an interrupt is blocked, what does it mean? Is it "delayed" until later time, or is it just discarded, i.e. lost?
CR8 indicates the current priority of the CPU. When an interrupt is pending, bits 7:4 of its interrupt vector number are compared to CR8. If that value is greater, the interrupt is serviced; otherwise it is held pending until CR8 is set to a lower value.
Assuming the APIC is in use, it has an IRR (Interrupt Request Register) with one bit per interrupt vector number. When that bit is set, the interrupt is pending. It can stay that way forever.
When an interrupt arrives, it is ORed into the IRR. If the interrupt is already pending (that is, the IRR bit for that vector is already set), the new interrupt is merged with the prior one. (You could say it is dropped, but I don't think of it that way; instead, I say the two are combined into one.) Because of this merging, interrupt service routines must be designed to process all the work that is ready, rather than expecting a distinct interrupt for each unit of work.
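A hypothetical sketch of the check described above (illustrative only; the function is made up, and real hardware performs this comparison itself):

    #include <stdint.h>

    /* An interrupt with a given vector is serviced only if its priority class
       (bits 7:4 of the vector) exceeds the TPR held in CR8[3:0];
       otherwise it stays pending in the IRR. */
    static int interrupt_can_be_serviced(uint8_t vector, uint8_t cr8_tpr)
    {
        uint8_t priority_class = vector >> 4;   /* bits 7:4 of the vector */
        uint8_t tpr = cr8_tpr & 0x0F;           /* bits 3:0 of CR8 */

        return priority_class > tpr;
    }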
Another related point is that Windows (and I assume Linux) tries to keep the IRQ level of each CPU as low as possible at all times. Interrupt service routines do as little work as possible at their elevated hardware interrupt level and then queue a deferred procedure call (DPC) to do the rest of their work at DPC IRQL. The DPC will normally be serviced immediately, unless another IRQ has arrived, because DPCs are at a higher priority than normal threads.
Once a CPU starts executing a DPC, it will execute all the DPCs in its per-CPU DPC queue before returning the CPU's IRQL to zero to allow normal threads to resume.
The advantage of doing it this way is that an incoming hardware IRQ of any priority can interrupt a DPC and get its own DPC onto the queue almost immediately, so it never gets missed.
I should also try to explain (as I understand it 😁) the difference between the IRQ level of a CPU and the priority of an IRQ.
Before Control Register 8 became available with x64, the CPU had no notion of an IRQ level.
The designers of Windows NT decided that every logical processor in a system should have a NOTIONAL IRQ level, stored in a per-CPU data structure called the processor control block (PCB). They decided there should be 32 levels, for no reason I know of 😁.
Software and hardware interrupts are also assigned a level, so if their level is above the level currently assigned to the CPU, they are allowed to proceed.
Windows does NOT make use of the interrupt priority assigned by the PIC/APIC hardware; instead it uses the interrupt mask bits in them. The various pins are assigned a vector number, and then they get a level.
When Windows raises the IRQL of a CPU in its PCB, it also reprograms the interrupt mask of the PIC/APIC. But not straight away.
Every interrupt that occurs causes the Windows trap dispatcher to execute and compare the interrupt's level with the CPU's IRQL; if the interrupt's level is higher, the interrupt goes ahead, and if not, THEN Windows reprograms the mask and returns to the executing thread instead.
The reason for that is that reprogramming the PIC takes time, and if no lower-level IRQ comes in, then Windows can save itself a job.
On x64 there IS CR8, and I am still looking at how that works.

Can a sub-microsecond clock resolution be achieved with current hardware?

I have a thread that needs to process a list of items every X nanoseconds, where X < 1 microsecond. I understand that with standard x86 hardware the clock resolution is at best 15 - 16 milliseconds. Is there hardware available that would enable a clock resolution < 1 microsecond? At present, the thread runs continuously as the resolution of nanosleep() is insufficient. The thread obtains the current time from a GPS reference.
You can get the current time with extremely high precision on x86 using the rdtsc instruction. It counts ticks of a fixed reference clock (not the actual, dynamically varying CPU core clock), so you can use it as a time source once you find the coefficients that map it to real GPS time.
This is the clock source Linux uses internally, on new enough hardware. (Older CPUs had the rdtsc clock pause when the CPU was halted while idle, and/or change frequency with CPU frequency scaling.) It was originally intended for measuring CPU time, but it turns out that a very precise clock with very cheap reads (~30 clock cycles) is valuable, hence the decoupling from CPU core clock changes.
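As a rough sketch of using it from C (assuming GCC or Clang on x86-64; the calibration of ticks to nanoseconds against your GPS reference is left out):

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>   /* __rdtsc() on GCC/Clang */

    int main(void)
    {
        /* Read the TSC before and after the work being timed.  Converting
           ticks to nanoseconds requires calibrating against a known clock,
           since the TSC counts at a fixed reference frequency. */
        uint64_t start = __rdtsc();

        volatile uint64_t sum = 0;
        for (int i = 0; i < 1000; i++)
            sum += i;                    /* placeholder for the real work */

        uint64_t end = __rdtsc();
        printf("elapsed: %llu reference ticks\n",
               (unsigned long long)(end - start));
        return 0;
    }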
It sounds like an accurate clock isn't your only problem, though: If you need to process a list every ~1 us, without ever missing a wakeup, you need a realtime OS, or at least realtime functionality on top of a regular OS (like Linux).
Knowing what time it is when you do eventually wake up doesn't help if you slept 10 ms too long because you read a page of memory that the OS decided to evict, and had to get from disk.