What are buffers used for in the construction of the D latch? - cpu-architecture

I am reading the book Digital Design and Computer Architecture. The chapter on the D flip-flop at the transistor level says "A compact D latch can be constructed from a single transmission gate", and what follows is an example of building a latch from a transmission gate and buffers.
I have a few questions:
How can a latch be built from a single transmission gate, if a latch is a memory cell that should consist of two elements connected in a loop and store state, rather than just pass a bit of information through on a clock pulse?
What are the buffers used for when building a D flip-flop? I couldn't figure it out from what was written in the book. Can you explain this point in a little more detail? And why do they all invert the values passing through them?
Figure 3.12 (a). D latch constructed from a single transmission gate
Figure 3.12 (b) 12 transistor D latch
Figure 3.13 D flip-flop

What are buffers used for when building a D flip-flop?
A basic latch is continuously open: it responds at any time to inputs that can change the value it stores.
However, many designs are timed and clocked, which means we want to accept changes (requests to store a potentially new value) only at a clock edge, and otherwise hold the latch in its current state and keep its output steady.  Clocked designs are timed so that the combinational (non-sequential) circuitry between storage elements finishes (just) before the next clock edge, so that its results can be recorded in registers and the next cycle can proceed.  The general concept here is called an edge-triggered latch, also known as a flip-flop, and it is used in clocked designs.
To limit the time period during which a latch may change, we add extra circuitry in front of it. This circuitry lets inputs through during certain time periods and suppresses them otherwise, allowing a change only once per cycle, e.g. at a positive clock edge.
The extra circuitry can be either a second latch or a pulse trigger.  These operate differently and have different advantages and disadvantages.
In the second-latch approach, one latch is in the accepting-change state while the other is in the opposite, storage state (i.e. ignoring input changes).  The states of the two latches swap at every clock edge (i.e. every half cycle).  Because of this, data moves from one latch to the other only at clock edges, and with two latches put together we can make devices that accept input only on the rising edge (or only on the falling edge) of the clock, i.e. once per full clock cycle.
We might call the extra latch a buffer.
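The two-latch arrangement can be illustrated with a small behavioural model. This is only a sketch under simplifying assumptions: the class names `DLatch` and `MasterSlaveFlipFlop` are made up for illustration, time advances in discrete samples, and gate delays are ignored.

```python
class DLatch:
    """Level-sensitive D latch: transparent while enable is true, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFlipFlop:
    """Two latches on opposite clock phases; Q can change only at a rising edge."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, clk):
        m = self.master.update(d, enable=not clk)  # master is open while clk is low
        return self.slave.update(m, enable=bool(clk))  # slave is open while clk is high


ff = MasterSlaveFlipFlop()
# (d, clk) samples over time; D changes while clk is high at the fourth sample
stimulus = [(1, 0), (0, 0), (0, 1), (1, 1), (1, 0), (1, 1), (0, 1)]
trace = [ff.update(d, clk) for d, clk in stimulus]
print(trace)
```

In this run the change of D while the clock is high does not reach the output; only the value the master latch held when the clock rose is passed on.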
In pulse-triggered designs, we shorten the clock signal going to the latch so that it doesn't last the full half cycle of the clock, and the latch sees only a quick blip instead. This is another way of limiting the period of change.
For more information on the variations in Flip Flop designs and their trade offs see this text:
http://bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s01/Lectures/lecture22-flipflops.pdf

The simple latch actually acts like a sample-and-hold circuit for analog signals. The memory is a capacitor that holds the voltage, i.e. the logic level. Nothing passes through in the hold state, because the input is disconnected from the holding capacitor in that state.
The buffer is there to ensure a minimal load on the output of the holding capacitor. The buffer is inverting, and the stage driving the holding capacitor is also an inverter, so the two inversions cancel.
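As a rough illustration of the description above, here is a behavioural sketch of that structure. The class name `TransmissionGateLatch` is hypothetical, the storage node is modelled as an ideal capacitor that never leaks, and the initial node value is an arbitrary assumption.

```python
class TransmissionGateLatch:
    """Input inverter -> transmission gate -> storage node -> output inverter."""

    def __init__(self):
        self.node = 1  # charge on the storage node (holds the inverted data)

    def update(self, d, clk):
        if clk:
            # Gate conducting: the input inverter drives the storage node.
            self.node = 1 - d
        # Gate off: the node is isolated and keeps its previous charge.
        # The output inverter buffers the node and undoes the first inversion.
        return 1 - self.node


latch = TransmissionGateLatch()
outputs = [latch.update(d, clk) for d, clk in [(1, 1), (0, 0), (0, 1), (1, 0)]]
print(outputs)
```

The output follows D while the clock is high and keeps the last value while it is low, with the two inverters cancelling each other.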

Related

If the PC register is simultaneously read and written, does its read data contain the previous data or the newly-written data?

If the PC register is simultaneously read and written, does its read data contain the previous data or the newly-written data? Based on my understanding of sequential circuits, the write command does not take effect instantly in the PC register because of propagation delay, so at the rising edge of the clock the read command will get the old value. But as a corollary to my question: if this is the case, shouldn't the read command also have a delay in some sense, so that it could possibly read the newly-written data?
A program counter is normally special enough that it's not part of a register file with other registers. You don't have a "read command"; its output is just always wired up to the other parts, which read it when appropriate (i.e. when its output is stable and has the value you want). E.g. see various block diagrams of MIPS pipelines, or non-pipelined single-cycle or multi-cycle designs.
You'd normally build such a physical register out of edge-triggered flip-flops, I think. (https://en.wikipedia.org/wiki/Flip-flop_(electronics)). Note that a D flip-flop does latch the previous input as the current output on a clock edge, and then the input is allowed to change after that.
There's a timing window before the clock edge during which the input has to remain stable; it can start to change a couple of gate delays after the edge. Note the example of a shift register built by chaining D flip-flops that all share the same clock signal.
If you have a problem arranging to capture a value before it starts changing, you could design in some intentional clock skew so the flip-flop reliably latches its input before you trigger the thing providing the input to change it. (But normally whatever you're triggering will itself have at least a couple gate delays before its output actually changes, hence the shift-register made of chained D flip-flops.)
That wiki article also mentions the master-slave edge-triggered D Flip-Flop that chains 2 gated (not clocked) D latches with an inverted clock, so capturing the input happens on the opposite clock edge from updating the output with the previously-captured data.
By comparison and for example, in register files for general-purpose registers in classic RISC pipelines like MIPS, IIRC it's common to build them so write happens in the first half-cycle and read happens in the second half-cycle of the ID stage. (So write-back can "forward" to decode/fetch through the register file, keeping the window of bypass-forwarding or hazards shorter than if you did it in the other order.)
This means the write data has a chance to stabilize before you need to read it.
Overall, it depends how you design it!
If you want the same clock edge to update a register with inputs while also latching the old value to the output, a master-slave flip-flop will do that (capture the old input into internal state, and latch the old internal state onto the outputs).
Or you could design it so the input is captured on the clock edge, and propagates to the output after a few gate delays and stays latched there for the rest of this clock cycle (or half cycle). That would be a single D flip-flop (per bit).
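The "old value is read while the new value is written" behaviour of an edge-triggered register can be sketched as follows. This is a simplified model, not any particular CPU: the `Register` class, its `d`/`q` fields and the `+ 4` increment are assumptions chosen for illustration.

```python
class Register:
    """Edge-triggered register: q changes only when tick() models a clock edge."""

    def __init__(self, value=0):
        self.q = value  # output, stable between clock edges
        self.d = value  # input, free to change before the next edge

    def tick(self):
        self.q = self.d  # capture whatever the input was at the edge


pc = Register(0)
fetched = []
for _ in range(3):
    fetched.append(pc.q)  # readers see the value stored at the previous edge
    pc.d = pc.q + 4       # combinational logic prepares the next value
    pc.tick()             # the clock edge makes the new value visible

print(fetched, pc.q)
```

Within one cycle every reader sees the value captured at the previous edge, even though the next value is already waiting at the input.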

Optimize power consumption with STM32L4 ADC

I'm developing firmware for an STM32L4. I need to sample an analog signal at around 200Hz, so basically one analog-to-digital conversion every 5ms.
Up to now, I have been starting the ADC in continuous conversion mode, triggered by a timer. However, this prevents putting the STM32 in Stop mode between conversions, which would be very efficient in terms of power consumption since 99%+ of the time the product has nothing to do.
So my idea is to use single conversion mode: use a low-power timer to wake the product up from Stop mode every 5ms, launch a single conversion in the LPTIM interrupt handler (polling for the ADC end of conversion), and go back to Stop mode.
Do you think this makes sense, or do you see problems with proceeding like this? I'm not sure about polling for a single ADC conversion inside a handler, what do you think? I think a single conversion on one channel should be pretty fast (I run at 80MHz, the datasheet mentions a maximum sampling time of 8us).
Do I have to disable/enable the ADC (the ADEN bit) between single conversions?
Also, I have to know how long a single conversion lasts to assess whether the solution is worthwhile. I'm confused about the sampling time (SMP bits). The reference manual states: "This sampling time must be enough for the input voltage source to charge the embedded capacitor to the input voltage level." How do I find the right SMP value?
There are no problems with the general idea; LPTIM1 can generate wakeup events through the EXTI controller even in Stop2 mode.
I'm not sure about polling for a single ADC conversion inside a handler, what do you think?
You might want to put the MCU in Sleep mode in the timer interrupt, and have the ADC trigger an interrupt when the conversion is complete. So disable SLEEPDEEP in the timer interrupt, and enable it in the ADC interrupt.
How do I find the right SMP value?
Empirical method: start with the longest sampling time, and start decreasing it. When the conversion result significantly changes, go one or two steps back.
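The empirical procedure can be written down as a small search. This is a sketch only: `measure` is a hypothetical callback that returns an averaged ADC reading taken with the given sampling time, `tolerance` is whatever you consider a significant change, and the cycle values in `SMP_CYCLES` are listed from memory and should be checked against the reference manual.

```python
# Sampling-time options in ADC clock cycles (assumed values, verify in the manual).
SMP_CYCLES = [2.5, 6.5, 12.5, 24.5, 47.5, 92.5, 247.5, 640.5]


def pick_sampling_time(measure, tolerance, back_steps=2):
    """Shorten the sampling time until the reading shifts, then step back."""
    last = len(SMP_CYCLES) - 1
    reference = measure(SMP_CYCLES[last])  # longest sampling time as baseline
    for i in range(last - 1, -1, -1):
        if abs(measure(SMP_CYCLES[i]) - reference) > tolerance:
            # The reading changed significantly: go one or two steps back.
            return SMP_CYCLES[min(i + back_steps, last)]
    return SMP_CYCLES[0]  # even the shortest setting matched the baseline
```

On a desktop the function can be tried with a fake measurement that mimics an input capacitor charging, e.g. `lambda cycles: 2000 * (1 - math.exp(-cycles / 10))`; on the target, `measure` would reconfigure SMP and average several conversions.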

Simulation: send packets according to exponential distribution

I am trying to build an ALOHA-like network simulation in which n nodes decide at any instant whether to send or not according to an exponential distribution (exponentially distributed arrival times).
What I have done so far is: I set a master clock in a for loop which ticks and any node will start sending at this instant (tick) only if a sample I draw from a uniform [0,1] for this instant is greater than 0.99999; i.e. at any time instant a node has 0.00001 probability of sending (very close to zero as the exponential distribution requires).
Can these arrival times be considered exponentially distributed at each node and if yes with what parameter?
What you're doing is called a time-step simulation, and can be terribly inefficient. Each tick in your master clock for loop represents a delta-t increment in time, and in each tick you have a laundry list of "did this happen?" possible updates. The larger the time ticks are, the lower the resolution of your model will be. Small time ticks will give better resolution, but really bog down the execution.
To answer your direct questions, you're actually generating a geometric distribution. That will provide a discrete time approximation to the exponential distribution. The expected value of a geometric (in terms of number of ticks) is 1/p, while the expected value of an exponential with rate lambda is 1/lambda, so effectively p corresponds to the exponential's rate per whatever unit of time a tick corresponds to. For instance, with your stated value p = 0.00001, if a tick is a millisecond then you're approximating an exponential with a rate of 1 occurrence per 100 seconds, or a mean of 100 seconds between occurrences.
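The correspondence can be checked numerically. The sketch below uses the question's p and assumes, as in the example above, that one tick is one millisecond; the function names are illustrative.

```python
import math

p = 0.00001          # per-tick send probability from the question
tick_seconds = 1e-3  # assumption: one tick represents one millisecond

mean_seconds = (1 / p) * tick_seconds  # mean of the geometric, in seconds
rate_per_second = p / tick_seconds     # rate of the approximating exponential


def geometric_survival(t_seconds):
    """P(no send within t seconds) under the tick-by-tick scheme."""
    ticks = round(t_seconds / tick_seconds)
    return (1 - p) ** ticks


def exponential_survival(t_seconds):
    """P(no send within t seconds) for an exponential with the matching rate."""
    return math.exp(-rate_per_second * t_seconds)


for t in (10, 100, 500):
    print(t, geometric_survival(t), exponential_survival(t))
```

For a p this small the two survival curves agree to several decimal places, which is why the tick-based scheme behaves like an exponential in practice.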
You'd probably do much better to adopt a discrete-event modeling viewpoint. If the time between network sends follows the exponential distribution, then once a send event occurs you can schedule when the next one will occur. You maintain a priority queue of pending events, and after handling the logic of the current event you poll the priority queue to see what happens next. Pull the event notice off the queue, update the simulation clock to the time of that event, and dispatch control to a method/function corresponding to the state update logic of that event. Since nothing happens between events, you can skip over large swaths of time. That makes the discrete-event paradigm much more efficient than the time-step approach unless the model state needs updating in pretty much every time step. If you want more information about how to implement such models, check out this tutorial paper.
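A minimal sketch of that event loop, using only the Python standard library, might look like this. The function name `simulate` and its parameters are assumptions for illustration, and collision handling and other ALOHA logic are left out.

```python
import heapq
import random


def simulate(num_nodes, rate, horizon, seed=0):
    """Return a time-ordered list of (time, node) send events up to `horizon`."""
    rng = random.Random(seed)
    # Schedule the first send of every node.
    queue = [(rng.expovariate(rate), node) for node in range(num_nodes)]
    heapq.heapify(queue)
    sends = []
    while queue:
        time, node = heapq.heappop(queue)  # jump the clock to the next event
        if time > horizon:
            break  # every remaining event is later still
        sends.append((time, node))  # state-update logic for a send goes here
        # Schedule this node's next send after an exponential gap.
        heapq.heappush(queue, (time + rng.expovariate(rate), node))
    return sends


sends = simulate(num_nodes=5, rate=0.01, horizon=10_000.0)
print(len(sends), sends[:3])
```

The loop does work only when something happens, so its cost depends on the number of events rather than on the number of clock ticks in the horizon.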

How do I compare two signals whose edges are almost in the same place?

I am verifying part of a design which generates pulses with precisely timed edges. I have a basic behavioral model which produces an output which is similar, but not exactly the same as the design. The differences between the two are smaller than the precision needed for the design, so my model is good enough. The problem is: how do I do a comparison between these two signals?
I tried:
assert(out1 == out1_behav);
But that fails since the two signals have edges which happen 1ps apart. The design only requires that the edges be placed with 100ps precision, so I want a pass in this situation.
I thought about using a specify block with $delay() timing checks, however this causes me other problems since I need to run with +no_timing_checks to keep my ram models from failing in this RTL sim.
Is there a simple way to check that these edges are "almost" the same?
Since the design requires the signals to match within 100ps, you could add compare logic with a 100ps transition delay to act as a filter.
bit match;
assign #100ps match = (out1 == out1_behav); // changes of the comparison shorter than 100ps are filtered out
always @*
assert #0 (match==1);
Verilog has different ways of assigning delay: transition and transport. Transition delays (more commonly called inertial delays) control the rise, fall, and indeterminate/high-Z timing. They act as a filter when the driving signal produces a pulse shorter than the delay. Transport delays always follow the driving signal with a time shift. When the driver's pulses are longer than the delay, transition and transport delays look the same.
assign #delay transition = driver; // Transition delay
always @(driver) transport <= #delay driver; // Transport delay
example: http://www.edaplayground.com/s/6/878, click the run button to see the waveform.
If you are using ModelSim/Questa, you can still use +notimingchecks and then use the Tcl command tcheck_set to turn individual timing checks, like $fullskew, back on.
Otherwise you will have to write a behavioral block that records the timestamps of the rising and falling edges of the two signals and checks the absolute value of the difference.
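The timestamp comparison itself is simple. The sketch below shows it as post-processing in Python, assuming the edges of both signals have been dumped from the simulation as time-ordered `(time_ps, new_value)` pairs; the function name and data layout are hypothetical, and inside the testbench the same logic would be written in SystemVerilog.

```python
def edges_match(edges_a, edges_b, tolerance_ps):
    """True if both signals have the same edges, each within tolerance_ps."""
    if len(edges_a) != len(edges_b):
        return False  # a missing or extra edge is always a mismatch
    return all(
        value_a == value_b and abs(time_a - time_b) <= tolerance_ps
        for (time_a, value_a), (time_b, value_b) in zip(edges_a, edges_b)
    )


dut = [(1000, 1), (2000, 0), (3000, 1)]
model = [(1001, 1), (1999, 0), (3000, 1)]
print(edges_match(dut, model, tolerance_ps=100))
```

Pairing edges by position keeps the check strict about edge count and direction while tolerating small placement differences.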

In FSMs does one State last one clock cycle or more?

Need to design a simple one for school.
More specifically, a Moore FSM. I'm not sure how state transitions happen: is it the next state on each clock?
I need to know because I'm wondering whether I can shift a register and add a value to it, all in the same state... Could I use clock edges?
EDIT:
I have to design the ALU part with registers as a gate-level schematic, so there is no target CPU.
I made the algorithm diagram, then assigned states to function blocks according to Moore FSM rules. Each block of operations gets one state.
For instance, in a state S1 I have the following operations: y0 = shift Reg1 left; y1 = Reg1 = Reg1 + Reg2. So the microcommand that the control part of the Moore FSM outputs would be 0000011 (yn...y1,y0). This microcommand should be the input to the ALU part, which I need to design. Now I have realized that y1 and y0 will conflict with each other, since both use Reg1.
It's problematic since I don't actually have the control part; I have to imagine the core FSM and design only the ALU with registers. This is why I was wondering whether I get more than one clock cycle, so that I can sequence y0 and y1, or whether I have to complete the entire operation in one clock.
I plan on making parallel-in, parallel-out non-shift registers, so obviously I can't do the two operations of the microcommand at the same time. So what can I do:
1. Make extra states? Which I really don't want to do.
2. Use the edges of a single clock? (Might cause problems?)
3. Assume I get a preset number of clock ticks to complete the microcommand?
This would make the most sense, but I don't know if it's the case.
The control unit does "know" the algorithm and thus how many operations need to be performed.
I have to note again that the control part is totally abstract and I have no idea how this is handled in practice.
An FSM itself has no inherent notion of time (although one can be defined). A Moore machine is a simplified model and lacks the ability to even formally represent an ever-progressing "time" (without, of course, implementing the counting entirely with states); remember, there is only a finite set of states.
In any case, time can be introduced as an implementation detail of a particular FSM, and the amount of time required to change between particular states might not be constant. (A particular FSM might also map differently to different implementations.) In the case of a clocked system it would require looking into how each "clock" is defined in the implementation; it might be the leading edge, the trailing edge, both, or something entirely different.
Instead of looking at the FSM here for guidance (it is just the logical progression of states), look at the opcodes (or whatever the implementation is) that the FSM represents, and how the CPU (or whatever the implementation is) in question "executes" them.
(What do the books say? ;-)
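For the clocked case, one possible implementation (by no means the only one) updates the state register once per rising edge, so each state lasts exactly one clock. The sketch below is purely illustrative: the state names, the transition table and the assignment of the shift and the add to separate states are assumptions, not something the question's design prescribes.

```python
# Hypothetical Moore machine: the output depends only on the current state.
NEXT_STATE = {"S0": "S1", "S1": "S2", "S2": "S0"}
OUTPUT = {"S0": 0b00, "S1": 0b01, "S2": 0b10}  # bits (y1, y0): S1 shifts, S2 adds


def run(clock_edges, state="S0"):
    """Return the microcommand seen during each clock cycle."""
    outputs = []
    for _ in range(clock_edges):
        outputs.append(OUTPUT[state])  # Moore output for the current state
        state = NEXT_STATE[state]      # state register updates once per edge
    return outputs


print(run(4))
```

Under this particular mapping y0 and y1 are never asserted in the same cycle; other implementations could hold a state for several clocks instead.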