How to cover latency between request and response - system-verilog

Let's say we have a protocol where a request req is asserted with req_id and the corresponding response rsp is asserted with rsp_id. Responses can arrive out of order. I want to cover the latency, in clock cycles, between a req with a particular req_id and the rsp with the same id. I tried something like this. Is this the correct way of doing it? Is there a more efficient way?
covergroup cg with function sample(int a);
coverpoint a {
a1: bins short_latency = {[0:10]};
a2: bins med_latency = {[11:100]};
a3: bins long_latency = {[101:1000]};
}
endgroup
// Somewhere in code
cg cg_inst = new();
sequence s;
int lat;
int id;
@(posedge clk) disable iff (~rst)
(req, id = req_id, lat = 0) |-> ##[1:$] ((1'b1, lat++) and (rsp && rsp_id == id, cg_inst.sample(lat)));
endsequence

You're trying to use the |-> operator inside a sequence, which is only allowed inside a property.
If rsp can come no earlier than one cycle after req, then this code should work:
property trans;
int lat, id;
(req, id = req_id, lat = 0) |=> (1, lat++) [*0:$] ##1 rsp && rsp_id == id
##0 (1, $display("lat = %0d", lat));
endproperty
The element after ##0 is there for debugging. You can omit it in production code.
I wouldn't mix assertions and coverage like this, though, as I've seen that the implication operators can cause issues with variable flow (i.e. lat won't get updated properly). You should have a property that just covers that you've seen a matching response after a request:
property cov_trans;
int lat, id;
(req, id = req_id, lat = 0) ##1 (1, lat++) [*0:$] ##1 rsp && rsp_id == id
##0 (1, $display("cov_lat = %0d", lat));
endproperty
cover property (cov_trans);
Notice that I've used ##1 to separate the request from the response.

Your idea is basically right, but it looks like the right-hand side of the sequence is evaluated only once, when the condition is true, so lat is incremented only once.
You need a looping mechanism to count the latency.
Below is a working example. You can change [1:$], ##1, etc. depending on how close together the signals are generated.
property ps;
int lat;
int id;
@(posedge clk)
disable iff (~rst)
(req, id = req_id, lat = 0) |=> (1'b1, lat++)[*1:$] ##1 (rsp && rsp_id == id, cg_inst.sample(lat));
endproperty
assert property (ps);

Alternatively...
Properties and sequences may look like small pieces of code, but in this case a separate process with its own counter is forked for every req that has not yet received a rsp. This results in many counters doing very similar work. With many requests in flight (and/or many instances of the property or sequence), this starts to add to simulation run time, even though it is only a small block of code.
So another approach is to keep the trigger simple and the processing linear.
int counter=0; // you can use a larger variablesize to avoid the roll-over issue
int arr1[int] ; // can use array[MAX_SIZE] if you know the max request id is small
always @( posedge clk ) counter <= counter + 1 ; // simple counter
function int latency (int type_set_get , int a ) ;
if ( type_set_get == 0 ) arr1[a] = counter; // set
//DEBUG $display(" req id %d latency %d",a,counter-arr1[a]);
// for roll-over - if ( arr1[a] > counter ) return ( MAX_VAL_SIZE - arr1[a] + counter ) ;
return (counter - arr1[a]); //return the difference between captured clock and current clock .
endfunction
property ps();
@(posedge clk)
disable iff (~rst)
##[0:$]( (req,latency(0,req_id) ) or (rsp,cg_inst.sample(latency(1,rsp_id))) );
endproperty
assert property (ps);
The property above does work only when a req or rsp is seen, and only one thread is active looking for them.
If needed, extra checks can be added to the function, but for latency counting this should be fine.
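The bookkeeping behind the latency function can also be modelled in a few lines of software. The sketch below is a Python model, not SystemVerilog; the class name LatencyTracker and its methods are hypothetical. It assumes at most one outstanding request per id, and it ignores counter roll-over because Python integers do not wrap.

```python
class LatencyTracker:
    """Software model of the free-running counter plus associative array."""

    def __init__(self):
        self.counter = 0    # models: always @(posedge clk) counter <= counter + 1
        self.start = {}     # models: int arr1[int], keyed by request id
        self.samples = []   # stands in for cg_inst.sample(latency)

    def tick(self):
        """Advance the model by one clock cycle."""
        self.counter += 1

    def on_req(self, req_id):
        """Record the cycle in which the request was seen (the 'set' call)."""
        self.start[req_id] = self.counter

    def on_rsp(self, rsp_id):
        """Return cycles elapsed since the matching request (the 'get' call)."""
        latency = self.counter - self.start.pop(rsp_id)
        self.samples.append(latency)
        return latency


tracker = LatencyTracker()
tracker.on_req(7)            # request id 7 in cycle 0
tracker.tick()
tracker.on_req(3)            # request id 3 in cycle 1
for _ in range(4):
    tracker.tick()
lat3 = tracker.on_rsp(3)     # response for id 3 in cycle 5, out of order
for _ in range(7):
    tracker.tick()
lat7 = tracker.on_rsp(7)     # response for id 7 in cycle 12
print(lat3, lat7)
```

The work per clock is a single increment, and the per-request cost is one store and one lookup, regardless of how many requests are in flight.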
Anecdote:
Dan, a Mentor AE, discovered an assertion that was slowing our simulations by as much as 40%. The poorly written assertion was part of our block-level testbench, where its effect went unnoticed because block-level run times were short. It then sneaked into our top-level testbench and caused untold run-time losses until it was discovered a year later :) [I guess we should have profiled our simulation runs earlier.]
For example, if the protocol above later added an abort, the req-rsp thread for an aborted transaction would keep waiting until the simulation ends. This would not affect functionality, but it would quietly hog processor resources while doing nothing useful, until a vendor AE finally steps in to save the day :)

Related

Verilog Pipelined Multiplier Intuition

I'm trying to understand how the following code works, but I'm struggling to put it together in my head. Could someone give me a more intuitive (visual) explanation of how this pipelined multiplier stage works?
// This is one stage of an 8 stage (9 depending on how you look at it)
// pipelined multiplier that multiplies 2 64-bit integers and returns
// the low 64 bits of the result. This is not an ideal multiplier but
// is sufficient to allow a faster clock period than straight *
module mult_stage(
input clock, reset, start,
input [63:0] product_in, mplier_in, mcand_in,
output logic done,
output logic [63:0] product_out, mplier_out, mcand_out
);
logic [63:0] prod_in_reg, partial_prod_reg;
logic [63:0] partial_product, next_mplier, next_mcand;
assign product_out = prod_in_reg + partial_prod_reg;
assign partial_product = mplier_in[7:0] * mcand_in;
assign next_mplier = {8'b0,mplier_in[63:8]};
assign next_mcand = {mcand_in[55:0],8'b0};
//synopsys sync_set_reset "reset"
always_ff @(posedge clock) begin
prod_in_reg <= #1 product_in;
partial_prod_reg <= #1 partial_product;
mplier_out <= #1 next_mplier;
mcand_out <= #1 next_mcand;
end
// synopsys sync_set_reset "reset"
always_ff @(posedge clock) begin
if(reset)
done <= #1 1'b0;
else
done <= #1 start;
end
endmodule
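One way to build intuition for the stage above is to ignore the pipeline registers and follow only the arithmetic. The sketch below is a Python model, not synthesizable code. The variable names mirror the signals in the module; pipelined_multiply, which chains eight stages starting from a product of 0, is an assumption about how the stages are connected.

```python
MASK = (1 << 64) - 1  # keep every value to 64 bits, like the 64-bit buses


def mult_stage(product_in, mplier_in, mcand_in):
    """Arithmetic of one stage, with the registers left out."""
    # Multiply the low 8 bits of the multiplier by the whole multiplicand.
    partial_product = ((mplier_in & 0xFF) * mcand_in) & MASK
    # Accumulate into the running product.
    product_out = (product_in + partial_product) & MASK
    # Shift the multiplier right so the next byte moves into the low 8 bits.
    next_mplier = mplier_in >> 8
    # Shift the multiplicand left so the next partial product has the right weight.
    next_mcand = (mcand_in << 8) & MASK
    return product_out, next_mplier, next_mcand


def pipelined_multiply(a, b, stages=8):
    """Chain the stages (assumed wiring); each one consumes 8 multiplier bits."""
    product, mplier, mcand = 0, a & MASK, b & MASK
    for _ in range(stages):
        product, mplier, mcand = mult_stage(product, mplier, mcand)
    return product


print(pipelined_multiply(3, 5))
```

This is long multiplication in base 256: eight stages times 8 bits covers all 64 multiplier bits, and truncating to 64 bits at each step yields the low 64 bits of the full product. In the hardware, the registers between stages let a new multiplication enter the first stage every clock.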

NBA assignment of $urandom

Can $urandom be NBA assigned in a for loop to an unpacked array of variables?
module tb();
logic clk [2];
initial clk[0] = 0;
always clk[0] = #1ns !clk[0];
for (genvar i = 1; i < 2; i++)
assign #(1ns/2) clk[i] = clk[i-1];
int tmp [2] [8];
always @ (posedge clk[0]) begin
foreach (tmp[0][i]) begin
/*int m;
m = $urandom(); // SECTION 1 - using this code works (commenting out SECTION 2)
tmp[0][i] <= m;*/
tmp[0][i] <= $urandom(); // SECTION 2
end
#1ns;
foreach (tmp[0][i]) begin
$display("%1d", tmp[0][i]);
end
$finish();
end
for (genvar i = 1; i < 2; i++) begin
always_ff @ (posedge clk[i]) begin
tmp[i] <= tmp[i-1]; // SECTION 3 (just removing this works too)
end
end
endmodule
Using Cadence tools (xrun 17.09-v002), I get all 8 of tmp[0] ints assigned the same value.
-2147414528
-2147414528
-2147414528
-2147414528
-2147414528
-2147414528
-2147414528
-2147414528
Can someone confirm whether this code is legal?
I have spoken to Cadence and been told this:
R&D’s response.
This use model of having $urandom call inside a non-blocking assignment is wrong.
The scheduling semantics of System Verilog dictates that the RHS is calculated and sampled once in the "inactive region" and then in the "NBA region" it's assigned the ALL of the elements of the foreach at the same time!
There is no difference between calling $urandom in a procedural loop and calling $urandom serially multiple times. Your code gives the desired results in several tools, including Cadence's on EDAPlayground.com. Perhaps something you are not showing is part of your problem. It always helps to show an MCVE, like
module top;
int tmp [2] [8];
bit clk;
initial begin
#1 clk=1;
#1 $display("%p",
tmp[0]);
end
always @ (posedge clk) begin
foreach (tmp[,i]) begin
tmp[0][i] <= $urandom();
end
end
endmodule
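The expected behaviour can be pictured with a small scheduling model: the right-hand side of a nonblocking assignment is evaluated when the statement executes, and only the update of the left-hand side is deferred. The sketch below is a Python model of that rule, not simulator code; run_clock_edge is a hypothetical name and random.Random stands in for $urandom.

```python
import random


def run_clock_edge(rng, size=8):
    """Model one execution of the foreach loop with nonblocking assignments."""
    tmp = [0] * size
    nba_queue = []
    # Procedural execution: every loop iteration evaluates its own RHS now.
    for i in range(size):
        rhs = rng.getrandbits(32)      # stands in for one $urandom() call
        nba_queue.append((i, rhs))     # the update itself is deferred
    # NBA region: apply the queued updates in the order they were scheduled.
    for index, value in nba_queue:
        tmp[index] = value
    return tmp


values = run_clock_edge(random.Random(1))
print(values)
```

Because each iteration makes its own call, the eight elements receive eight independently drawn values rather than one value copied eight times.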

Good Counter Design or Possible Metastability Issues?

Quick summary of my goal:
Design a counter triggered by a variable length auto-reload timer.
A little more verbose: There will be a register with a value that changes (predictably changes, and is latched before the EN signal for the AutoReloadTimer module) that sets the rate at which the counter increments.
Here's the auto-reload timer:
module AutoReloadTimer( clk, rst, EN, D, done );
input clk;
input rst;
input EN;
input [WIDTH-1:0] D;
output done;
parameter WIDTH = 8;
// OneShot EN -> load
wire load;
OneShotD OneShot_D(
.clk( clk ),
.rst( rst ),
.in( EN ),
.RE( load )
);
reg [WIDTH-1:0] counter, load_value;
always @( posedge clk ) begin
if ( rst ) begin
counter <= {WIDTH{1'b1}};
load_value <= {WIDTH{1'b1}};
end else if ( load ) begin
counter <= D;
load_value <= D;
end else if (counter == 0 ) begin
counter <= load_value;
load_value <= load_value;
end else begin
counter <= counter - 1'b1;
load_value <= load_value;
end
end
assign done = ( counter == 0 );
endmodule
And here is the counter triggered by the done signal from the AutoReloadTimer module:
module Counter( clk, rst, EN, CLR, Q );
input clk;
input rst;
input EN;
input CLR;
output [WIDTH-1:0] Q;
parameter WIDTH = 8;
reg [WIDTH-1:0] ctr;
always @( posedge clk ) begin
if ( rst ) begin
ctr <= {WIDTH{1'b0}};
end else if ( CLR ) begin
ctr <= {WIDTH{1'b0}};
end else if ( EN ) begin
ctr <= ctr + 1'b1;
end else begin
ctr <= ctr;
end
end
assign Q = ctr;
endmodule
And here is a portion of the waveform from a testbench:
What I'm curious about here is my counter's stability - is it an issue that the done signal goes low at the rising edge of the clock? I'm still fairly new to Verilog and digital design. I'm familiar with the term and somewhat the idea of metastability but I'm not fully comfortable with my understanding of it.
Looking for input, criticism, etc.
Edit
I forgot to include what configuration I had the modules in to produce that diagram:
wire ART_done;
AutoReloadTimer ART0 (
.clk( clk ),
.rst( rst ),
.EN( EN ),
.D( 4 ),
.done( ART_done )
);
Counter uut (
.clk(clk),
.rst(rst),
.EN(ART_done),
.CLR(CLR),
.Q(Q)
);
As long as your AutoReloadTimer and Counter modules, as well as any logic that uses the done signal, are on the same clock, you won't have any metastability issues. What you would have is a fully synchronous implementation. Naturally, you must also meet the timing requirements of the device you're using.
The done signal will actually change some small combinatorial path delay after the rising clock edge that causes the counter to hit 0. Any logic that uses the done signal has the rest of the clock period before the next rising edge, to do what it needs to do (more combinatorial logic) and still meet the setup time of any register input that is conditioned by the done signal.
The metastability issues will only arise if the input to any registers are transitioning right as the clock is transitioning. This can happen if the data that's being registered is coming from a register that uses an asynchronous clock, or if the register's setup or hold timing is violated.
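The synchronous hand-off can be checked with a small cycle-by-cycle model. This is a Python sketch, not HDL; simulate is a hypothetical name. It assumes D has already been loaded, rst and CLR stay low, and done is wired to the counter's EN as in the question.

```python
def simulate(d, cycles):
    """Model AutoReloadTimer driving Counter.EN on a shared clock."""
    counter = d        # timer register, assumed already loaded with D
    load_value = d
    ctr = 0            # Counter register
    done_trace = []
    for _ in range(cycles):
        # Combinational output, stable well before the next rising edge.
        done = 1 if counter == 0 else 0
        done_trace.append(done)
        # Rising edge: both registers sample the values present before the edge.
        next_counter = load_value if counter == 0 else counter - 1
        next_ctr = ctr + 1 if done else ctr
        counter, ctr = next_counter, next_ctr
    return done_trace, ctr


done_trace, ctr = simulate(4, 10)
print(done_trace, ctr)
```

In this model done is high for one cycle out of every D + 1, and the counter increments exactly once per done pulse, because it samples the value that done held before the edge.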

Removing the need to reset the device before using it

I'm having trouble implementing a controller block for an 8-bit multiplier. It works normally, but only if I turn the reset wire on, then off, such as in the following stimulus (which works fine):
`timescale 1ns / 100ps
module Controller_tb(
);
reg reset;
reg START;
reg clk;
reg LSB;
wire STOP;
wire ADD_cmd;
wire SHIFT_cmd;
wire LOAD_cmd;
Controller dut (.reset(reset),
.START(START),
.clk(clk),
.LSB(LSB),
.STOP(STOP),
.ADD_cmd(ADD_cmd),
.SHIFT_cmd(SHIFT_cmd),
.LOAD_cmd(LOAD_cmd)
);
always
begin
clk <= 0;
#25;
clk <= 1;
#25;
end
initial
begin
LSB <= 0;
START <= 0;
reset <= 1;
#55;
reset <= 0;
#10;
START <= 1;
#100;
START <= 0;
LSB <= 1;
#200;
#20;
#100;
end
initial
$monitor ("stop,shift_cmd,load_cmd, add_cmd: " , STOP,SHIFT_cmd,LOAD_cmd,ADD_cmd);
endmodule
Here's the simulation result for the working stimulus:
Now, when I set the reset to zero, without ever bringing it high, here's what happens:
Clearly, I'm using the reset wire to bring my Controller to the IDLE state. Here's the code for the controller block:
`timescale 1ns / 1ps
module Controller(
input reset,
input START,
output STOP,
input clk,
input LSB,
output ADD_cmd,
output SHIFT_cmd,
output LOAD_cmd
);
//Five states:
//IDLE : 000 , INIT: 001, TEST: 011, ADD: 010, SHIFT: 110
localparam [2:0] S_IDLE = 0;
localparam [2:0] S_INIT = 1;
localparam [2:0] S_TEST = 2;
localparam [2:0] S_ADD = 3;
localparam [2:0] S_SHIFT = 4;
reg [2:0] state,next_state;
reg [3:0] count;
// didn't assign the outputs to wire.. if not work, check this.
assign ADD_cmd = (state == S_ADD);
assign SHIFT_cmd = (state == S_SHIFT);
assign LOAD_cmd = (state == S_INIT);
assign STOP = (state == S_IDLE);
always @(*) begin
case(state)
S_INIT: begin
count = 3'b000;
end
S_SHIFT: begin
count = count + 1;
end
endcase
end
always @(*)
begin
next_state = state;
case (state)
S_IDLE: next_state = START ? S_INIT : S_IDLE;
S_INIT: next_state = S_TEST;
S_TEST: next_state = LSB ? S_ADD : S_SHIFT;
S_ADD: next_state = S_SHIFT;
S_SHIFT: next_state = (count == 8) ? S_IDLE : S_TEST;
endcase
end
always @(posedge clk)
begin
//state <= S_IDLE;
if(reset) state <= S_IDLE;
else state <= next_state;
end
reg [8*6-1:0] statename;
always @* begin
case( state )
S_IDLE: statename <= "IDLE";
S_INIT: statename <= "INIT";
S_TEST: statename <= "TEST";
S_ADD: statename <= "ADD";
S_SHIFT: statename <= "SHIFT";
default: statename <= "???";
endcase
end
endmodule
I don't know how to fix this. As you can see from the code above, there is a commented portion which is basically always initializing the state to IDLE. But even that doesn't work. Here's the simulation for the code above removing the comment from '//state <= S_IDLE;':
It's going into a different state than any listed above, and I have no idea why.
So I'd like to know:
Why is it going into an unknown state? Why doesn't my uncommented code work?
What can I change for it to work as I intend?
Your problem is that without a reset or initial value, state and next_state will be X. Your case statement assigning to statename will take the default branch and decode to ???. Since your process that assigns next_state does not handle cases where state is X it will get stuck in this state forever.
Your attempt to fix this will not work:
state <= S_IDLE;
if(reset) state <= S_IDLE;
else state <= next_state;
When reset is low you are making two assignments to state, the first as S_IDLE and the second as next_state. This is not a race condition. The Verilog standard states that:
Nonblocking assignments shall be performed in the order the statements were executed.
Since no re-ordering of the event queue occurs for sequential statements within a process this translates to last assignment wins. Therefore your state <= S_IDLE; is effectively optimised away since regardless of the value of reset the assignment will be overridden.
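The last-assignment-wins behaviour can be modelled as a queue of pending updates that is applied in order at the end of the time step. The sketch below is a Python model of that scheduling rule only; the function clock_edge and the string state names are illustrative, with "X" standing for an unknown value.

```python
def clock_edge(state, next_state, reset):
    """Model the clocked block with the extra 'state <= S_IDLE;' uncommented."""
    nba_queue = []
    nba_queue.append(("state", "S_IDLE"))          # state <= S_IDLE;
    if reset:
        nba_queue.append(("state", "S_IDLE"))      # if (reset) state <= S_IDLE;
    else:
        nba_queue.append(("state", next_state))    # else state <= next_state;
    # Updates are applied in the order they were queued, so the last one wins.
    regs = {"state": state}
    for name, value in nba_queue:
        regs[name] = value
    return regs["state"]


print(clock_edge("X", "X", reset=False))
print(clock_edge("X", "X", reset=True))
```

With reset low and next_state unknown, the first assignment is always overwritten, so the state stays unknown.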
There are two ways you could fix this so that you don't need a reset:
1. Use the default clause to make your state machine safe
always @(*)
begin
next_state = state;
case (state)
S_IDLE: next_state = START ? S_INIT : S_IDLE;
S_INIT: next_state = S_TEST;
S_TEST: next_state = LSB ? S_ADD : S_SHIFT;
S_ADD: next_state = S_SHIFT;
S_SHIFT: next_state = (count == 8) ? S_IDLE : S_TEST;
default: next_state = S_IDLE;
endcase
end
This will ensure that your state-machine is 'safe' and drops into S_IDLE if state is a non-encoded value (including X).
2. Initialise the variable
reg [2:0] state = S_IDLE;
For some synthesis targets (e.g. FPGAs) this will initialise the register to a specific value and can be used alongside or instead of a reset (see Altera Documentation on power-up values).
A couple of general points:
Depending on your synthesis tool it may be better to use an enumeration rather than explicitly defining values for your states. This allows the tool to optimise based on the overall design or use a global configuration for encodings (for example safe, one-hot).
Using a reset on registers holding state is standard practice, so you should carefully consider whether you really want to avoid using a reset.
The uncommented code is an example of poor coding practice because you are making 2 nonblocking assignments to state in the same timestep. Synthesis linting tools are likely to warn you of this situation.
Since using a reset is a common, good practice, I don't think you need to fix anything.

ONE clock period pulse based on trigger signal

I am making a MIDI interface. The UART works fine; it sends the 8-bit message along with a flag to a control unit. When the flag goes high, the unit stores the message in a register and drives clr_flag high in order to set the UART's flag low again. The problem is that I cannot make this clr_flag one period long. I need it to be ONE period long, because this signal also controls a state machine that indicates what kind of message is being stored (note_on -> key_note -> velocity, for example).
My question is: how can a signal (flag in this case) trigger a pulse for just one clk period? What I have now makes almost a one-clock-period pulse, but it does it twice, because the flag has not yet become 0. I've tried many ways and now I have this:
get_data:process(clk, flag)
begin
if reset = '1' then
midi <= (others => '0');
clr_flag <= '0';
control_flag <= '0';
elsif ((clk'event and clk='1') and flag = '1') then
midi <= data_in;
clr_flag <= '1';
control_flag <= '1';
elsif((clk'event and clk='0') and control_flag = '1') then
control_flag <= '0';
elsif((clk'event and clk='1') and control_flag = '0') then
clr_flag <= '0';
end if;
end process;
The problem with this double pulse, or a pulse longer than one period (before this, I had something that made clr_flag a two-period pulse), is that the system will go through two states instead of one per flag.
So in short: when one signal goes high (independent of when it goes low), a pulse lasting one clock period should be generated.
Thanks for your help.
The trick to making a single cycle pulse is realising that having made the pulse, you have to wait as long as the trigger input is high before getting back to the start. Essentially you are building a very simple state machine, but with only 2 states you can use a simple boolean to tell them apart.
Morten is correct about the need to adopt one of the standard patterns for a clocked process; I have chosen a different one that works equally well.
get_data:process(clk, reset)
variable idle : boolean;
begin
if reset = '1' then
idle := true;
elsif rising_edge(clk) then
clr_flag <= '0'; -- default action
if idle then
if flag = '1' then
clr_flag <= '1'; -- overrides default FOR THIS CYCLE ONLY
idle := false; -- idle is a variable, so it needs := rather than <=
end if;
else
if flag = '0' then
idle := true;
end if;
end if;
end if;
end process;
There are several issues to address in order to make the design for a one cycle
pulse using flip flops (registers).
First, the use of flip flops in hardware through VHDL constructions typically
follows a structure like:
process (clk, reset) is
begin
-- Clock
if rising_edge(clk) then
-- ... Flip flops to update at rising edge
end if;
-- Reset
if reset = '1' then
-- Flip flops to update at reset, which need not be all
end if;
end process;
So the get_data process should be updated accordingly, thus:
Sensitivity list should contain only clock (clk) and reset
The nested structure with if on event should be as above
Only rising edge of clk should be used, thus no check on clk = '0'
Making a one cycle pulse on clr_flag when flag goes high can be done with a
synchronous '0' to '1' detector on flag, using a version of flag that is
delayed a single cycle, called flag_ff below, and then checking for (flag =
'1') and (flag_ff = '0').
The resulting code may then look like:
get_data : process (clk, reset) is
begin
-- Clock
if rising_edge(clk) then
flag_ff <= flag; -- One cycle delayed version
clr_flag <= '0'; -- Default value with no clear
if (flag = '1') and (flag_ff = '0') then -- Detected flag going from '0' to '1'
midi <= data_in;
clr_flag <= '1'; -- Override default value making clr_flag asserted for a single cycle
end if;
end if;
-- Reset
if reset = '1' then
midi <= (others => '0');
clr_flag <= '0';
-- No need to reset flag_ff, since that is updated during reset anyway
end if;
end process;
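The behaviour of this detector can be traced cycle by cycle with a small software model. The sketch below is Python, not VHDL; rising_edge_pulses is a hypothetical name, and each list element represents the value of flag sampled at one rising clock edge.

```python
def rising_edge_pulses(flag_samples):
    """Model flag_ff and clr_flag as registers updated on each clock edge."""
    flag_ff = 0
    clr_flag = 0
    trace = []
    for flag in flag_samples:
        # Next values are computed from the register contents before the edge.
        next_clr = 1 if (flag == 1 and flag_ff == 0) else 0
        flag_ff = flag          # one-cycle delayed copy of flag
        clr_flag = next_clr     # defaults to 0 unless a 0 -> 1 change was seen
        trace.append(clr_flag)
    return trace


pulses = rising_edge_pulses([0, 1, 1, 1, 0, 0, 1, 0])
print(pulses)
```

However long flag stays high, clr_flag is asserted for exactly one cycle per 0-to-1 transition.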
Synchronisation and Edge Detection for FSM
The Rise, Edge and Fall outputs will strobe for one cycle when those events are detected. Inputs and outputs are synchronised for use with a Finite State Machine.
entity SynchroniserBit is
generic
(
REG_SIZE: natural := 3 -- Default number of bits in sync register.
);
port
(
clock: in std_logic;
reset: in std_logic;
async_in: in std_logic := '0';
sync_out: out std_logic := '0';
rise_out: out std_logic := '0';
fall_out: out std_logic := '0';
edge_out: out std_logic := '0'
);
end;
architecture V1 of SynchroniserBit is
constant MSB: natural := REG_SIZE - 1;
signal sync_reg: std_logic_vector(MSB downto 0) := (others => '0');
alias sync_in: std_logic is sync_reg(MSB);
signal rise, fall, edge, previous_sync_in: std_logic := '0';
begin
assert(REG_SIZE >= 2) report "REG_SIZE should be >= 2." severity error;
process (clock, reset)
begin
if reset then
sync_reg <= (others => '0');
previous_sync_in <= '0';
rise_out <= '0';
fall_out <= '0';
edge_out <= '0';
sync_out <= '0';
elsif rising_edge(clock) then
sync_reg <= sync_reg(MSB - 1 downto 0) & async_in;
previous_sync_in <= sync_in;
rise_out <= rise;
fall_out <= fall;
edge_out <= edge;
sync_out <= sync_in;
end if;
end process;
rise <= not previous_sync_in and sync_in;
fall <= previous_sync_in and not sync_in;
edge <= previous_sync_in xor sync_in;
end;
Below is a way of creating a signal (trg) that lasts exactly one clock period from a signal (flg) that lasts at least one clock period.
I don't program in VHDL, so here is what I usually do for the same purpose in Verilog:
always @(posedge clk or negedge rst) begin
if(~rst) flgD <= 1'b0;
else flgD <= flg;
end
assign trg = (flg^flgD)&flg; // high for one cycle after flg rises; use &flgD instead to pulse when flg falls
I am new to verilog and this is the sample code, which I used for triggering. Hope this serves your purpose. You can try same logic in VHDL.
module main(clk,busy,rd);
input clk,busy; // busy input condition
output rd; // trigger signal
reg rd,en;
always @(posedge clk)
begin
if(busy == 1)
begin
rd <= 0;
en <= 0;
end
else
begin
if (en == 0 )
begin
rd <= 1;
en <= 1;
end
else
rd <= 0;
end
end
endmodule
The Verilog code below generates, for each bit of the input bus, a pulse that lasts exactly one clock cycle when that bit goes high.
module PulseGen #(
parameter integer BUS_WIDTH = 32
)
(
input [BUS_WIDTH-1:0] i,
input clk,
output [BUS_WIDTH-1:0] o
);
reg [BUS_WIDTH-1:0] id_1 = 0 ;
reg [BUS_WIDTH-1:0] id_2 = 0 ;
always @(posedge clk)begin
id_1 <= i;
id_2 <= id_1;
end
assign o = (id_1 & ~id_2);
endmodule
The way to achieve this is to create a debounce circuit. If you need a D flip-flop to change from 0 to 1, only for the first clock, just add an AND gate before its input like the image below:
So here you can see a D flip-flop and its debounce circuit.
P.S. Circuit created using this.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--input of minimum 1 clock pulse will give output of wanted length.
--load number 5 to PL input and you will get a 5 clock pulse no matter how long input is.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
library ieee ;
use ieee.std_logic_1164.all ;
use ieee.std_logic_unsigned.all ;
entity fifth is
port (clk , resetN : in std_logic;
pdata : in integer range 0 to 5; --parallel data in. to choose how many clock the out pulse would be.
din : in std_logic;
dout : out std_logic
) ;
end fifth ;
architecture arc_fifth of fifth is
signal count : integer range 0 to 5;
signal pl : std_logic; --trigger detect output.
signal sample1 : std_logic;
signal sample2 : std_logic;
--trigger sync process.
begin
process(clk , resetN)
begin
if resetN = '0' then
sample1<='0';
sample2<='0';
elsif rising_edge(clk) then
sample1<=din;
sample2<=sample1;
end if;
end process;
pl <= sample1 and (not sample2); --trigger detect output. activate the counter.
--counter process.
process ( clk , resetN )
begin
if resetN = '0' then
count <= 0 ;
elsif rising_edge(clk) then
if pl='1' then
count<=pdata;
else
if count=0 then
count<=count;
else
count<=count-1;
end if;
end if;
end if ;
end process ;
dout<='1' when count>0 else '0';--output - get the wanted length pulse no matter how long the input is
end arc_fifth ;