System verilog simulation performance for uvm_hdl_read vs assign statement

System verilog simulation performance for uvm_hdl_read vs assign statement - system-verilog

I need to generate around 10,000 connectivity assertions to check that values driven at DUT interface at the beginning of simulation has reached (and is retained) at around 10,000 points inside the DUT throughout the simulation.
The 10k destination points inside DUT are available through xls, hence, if I can take the paths to an array, I could write a single assertion property and instantiate it multiple times while iterating through the array.
What is the better approach here for simulation performance?
Using assign :
logic[7:0] dest_vals[10000];
assign dest_vals[0] = dut_path0.sig0; //This assignment code can be script-generated
assign dest_vals[1] = dut_path1.sig1;
....
assign dest_vals[9999] = dut_path9999.sig9999;
property check_val(expected_val, dut_val);
#(posedge assert_clk) expected_val == dut_val;
endproperty
genvar i;
generate
for(i=0;i<9999;i++)
assert property check_val(exp_val[i], dest_vals[i]);
endgenerate
Using uvm_hdl_read :
string dest_val_path[10000] = '{
"dut_path0.sig0",
"dut_path1.sig1",
....
"dut_path9999.sig9999"
};
property check_val(expected_val, dut_val);
#(posedge assert_clk) expected_val == dut_val;
endproperty
genvar i;
generate
for(i=0;i<9999;i++) begin
initial forever begin
#(posedge assert_clk);
uvm_hdl_read(dest_val_path[i],dest_vals[i]);
end
assert property check_val(exp_val[i], dest_vals[i]);
end
endgenerate
Follow up question: Is there a way to change static array of size 10k to dynamic array/associative array/queue, so that if 10k destination points changes to 11k in future, array need not be re-sized manually?

Using uvm_hdl_read to access a signal by a string name is significantly slower than a direct hierarchical reference to a signal in an assign statement. A lot of code gets executed to lookup up that signal name in a string database and retrieve it from the simulator's internal database using the SystemVerilog's C based "VPI" routines. Plus, you can lose some optimizations because the value of the signal needs to be readily available at any point in time. In the case of the continuous assign statement, the simulator can work backwards and figure out it only needs the signal value on the assert_clk trigger.
How noticeable this performance difference is depends on a number of other factors
The overall size of the entire design's activity relative to additional code executed by uvm_hdl_read
How often the assert_clk triggers relative to the other activity in the design
How good a job you were able to limit VPI access to just the signals involved. Every signal you give VPI access to increases the size of the string database and prevents optimizations around that signal.
There is another option that does not involve assign statements or generate loops. You can use the bind construct to insert a module with your assertion. This works particularly well in your situation since you already have a script that produces the code for you.
module #(type T=logic [7:0]) mcheck(input logic clk, T expected_val, dut_val);
property check_val;
#(posedge clk) expected_val == dut_val;
endproperty
assert property(check_val);
endmodule
bind dut_path0 mcheck m(tb_path.assert_clk,tb_path.expected_val[0], sig0);
bind dut_path1 mcheck m(tb_path.assert_clk,tb_path.expected_val[1], sig1);
...
Note the port connections in the bind module are relative to the dut_path where mcheck is bound into.

Related

Can someone explain the control flow of modules in System Verilog

I know how to link modules but could someone explain the flow of calling the modules to be used when I want it to be used.
Like have a state machine and depending on the state I can call a module to activate, or like if I need to repeat a process how to go back to a module earlier in a state machine.
again I get the instantiating part like this
wire clk;
wire sig;
wire out;
A a(clk, sig, topout);
B b(clk, sig);
endmodule
but can someone explain how to call modules and how the control flow works in general for them?
(I am new to HDLs so I appreciate any help)

Verilog is a language specifically developed to simulate behavior of hardware. Hardware is a set of transistors and other elements which always statically presented and function in parallel. Functioning of such elements could be enabled or disabled, but the hardware is still present.
Verilog is similar to the hardware in the sense that all its elements are always present, intended for parallel functioning.
The basic functional elements of Verilog are gates, primitives and procedural blocks (i.e., always blocks). Those blocks are connected by wires.
All those elements are then grouped in modules. Modules are used to create logical partitioning of the hardware mode. They cannot be 'called'. They can be instantiated in a hierarchical manner to describe a logical structure of the model. They cannot be instantiated conditionally since they represent pieces of hardware. Different module instances are connected via wires to express hierarchical connectivity between lower-level elements.
There is one exception however, the contents of an always block is pure software. It describes an algorithmic behavior of it and as such, software flow constructs are allowed inside always block (specific considerations must be used to make it synthesizable).
As it comes to simulation, Verilog implements an event-driven simulation mode which is intended to mimic parallel execution of hardware. In other words, a Verilog low level primitive (gate or always block) is executed only if at least one of its inputs changes.
The only flow control which is usually used in such models is a sequence of input events and clocks. The latter are used to synchronize results of multiple parallel operations and to organize pipes or other sequential functions.
As I mentioned before, hardware elements can be enabled/disabled by different methods, so the only further control you can use by implementing such methods in your hardware description. For example, all hardware inside a particular module can be turned off by disabling clock signal which the module uses. There could be specific enable/disable signals on wires or in registers, and so on.
Now to your question: your code defines hierarchical instantiation of a couple of modules.
module top(out);
output wire out;
wire clk;
wire sig;
A a(clk, sig, out);
B b(clk, sig);
endmodule
Module 'top' (missing in your example) contains instances of two other modules, A and B. A and B are module definitions. They are instantiated as corresponding instances 'a' and 'b'. The instances are connected by signals 'clk', which is probably a clock signal, some signal 'sig' which is probably an output of one of the modules and input in another. 'out' is output of module top, which is probably connected to another module or an element in a higher level of hierarchy, not shown here.
The flow control in some sense is defined by the input/output relation between modules 'A' and 'B'. For example:
module A(input clk, input sig, output out);
assign out = sig;
...
endmodule
module B(input clk, output sig);
always#(posedge clk) sig <= some-new-value;
...
endmodule
However, in general it is defined by the input/output relation of the internal elements inside module (always blocks in the above example). input/output at the module port level is mostly used for semantic checking.
In the event-driven simulation it does not matter hardware of which module is executed first. However as soon as the value of the 'sig' changes in always#(posedge clk) of module 'B', simulation will cause hardware in module 'A' (the assign statement to be evaluated (or re-evaluated). This is the only way you can express a sequence in the flow at this level. Same as in hardware.

If you are like me you are looking at Verilog with the background of a software programmer. Confident in the idea that a program executes linearly. You think of ordered execution. Line 1 before line 2...
Verilog at its heart wants to execute all the lines simultaneously. All the time.
This is a very parallel way to program and until you get it, you will struggle to think the right way about it. It is not how normal software works. (I recall it took me weeks to get my head around it.)
You can prefix blocks of simultaneous execution with conditions, which are saying execute the lines in this block when the condition is true. All the time the condition is true. One class of such conditions is the rising edge of a clock: always #(posedge clk). Using this leads to a block of code that execute once every time the clk ticks (up).
Modules are not like subroutines. They are more like C macros - they effectively inline blocks of code where you place them. These blocks of code execute all the time any conditions that apply to them are true. Typically you conditionalize the internals of a module on the state of the module arguments (or internal register state). It is the connectivity of the modules through the shared arguments that ensures the logic of a system works together.

How to use System-Verilog Assertions on modules within top level DUT

I'm trying to write a memory scheduling testbench and to verify that I am accessing the correct addresses and writing the correct values at the right time I want to compare what is going on with the signals inside my top module with my schedule that I developed.
I've tried looking up "dot notation" and using my simulator (ModelSim) to access the signal (which I can do on the waveform fine) but I want to be able to use SVAs to check I have the correct values.
module top(input d, output q)
//wires
wire sub1_output
// Instantiate submodule
sub_module1 sub1(.sub1_output(sub1_output));
always begin
// logic
end
endmodule
module tb_top();
reg d;
wire q;
DUT top(.*);
initial begin
// This is where I want to access something within my DUT
assert(DUT.sub1_output == 1'b1)
end
endmodule
When I try this my code compiles and runs, but if I write the assertion such that it should fail and stop my code, nothing happens.

DUT.sub1_output
Is the correct format to use assertions on signals within top level instantiations.

Order in always_comb block

I have the impression that in an always_comb block, all the non-blocking assignment should work in parallel. That is, if I have
always_comb
begin
a = b;
b = c;
end
Then, a should be equal to c regardless of the order of above two lines in the always_comb block, as they are evaluated concurrently anyway. However, today I experienced an issue that change the order of above two lines, the results are different!!! Whay is that?

The statements within a begin/end block execute serially. It does not matter if you are using an always_comb or any other kind of always block. But you are using blocking assignments, not non-blocking assignments, which is the proper thing to do in an always_comb block. Non-blocking assignments are used to assign sequential logic, which implies storage of the current and next state.

This difference stems from that combinational always blocks cannot "self-trigger". The way the simulator works when a signal changes value is to locate all always blocks with that signal in the sensitivity list, then execute them one by one sequentially. But, only if that block is not already running! In your case the expected behavior would require the block to run twice, but instead only one iteration occurs for every update of c.
The situation is unfortunate since the sensitivity list is a simulator concept and generally ignored altogether for synthesis. Most synthesis tools would generate a wire from your code without producing any warning, creating a simulation-synthesis mismatch.
Note that an explicit sensitivity list (e.g., always #(b or c)) does not make any difference. One solution is to always ensure that the assignments are in the right order. Another is to use non-blocking assignments, but this is generally advised against since it slows down the simulator. (Note that VHDL does not have blocking assignments, and would thus always have this performance penalty. On the plus side you do not have problems like this.)

VHDL simulation in real time?

I've written some code that has an RTC component in it. It's a bit difficult to do proper emulation of the code because the clock speed is set to 50MHz so to see any 'real time' events take place would take forever. I did try to do simulation for 2 seconds in modelsim but it ended up crashing.
What would be a better way to do it if I don't have an evaluation board to burn and test using scope?

If you could provide a little more specific example of exactly what you're trying to test and what is chewing up your simulation cycles that would be helpful.
In general, if you have a lot of code that you need to test in simulation, it's helpful if you can create testbenches of the sub-modules and test them first. Often, if you simulate at the top (chip) level and try to stimulate sub-modules that are buried deep in the hierarchy of a design, it takes many clock ticks just to get data into and out of the sub-module. If you simulate the sub-module directly you have direct access to the modules I/O and can test the things you want to test in that module in fewer cycles than if you try to get to it from the top level.
If you are trying to test logic that has very deep fifos that you are trying to fill or a specific count of a large counter you're trying to hit, you can either add logic to your code to help create those conditions in fewer cycles (like a load instruction on the counter) or you can force the values of internal signals of your design from the testbench itself.
These are just a couple of general ideas. Again, if you provide more detail about what it is you're simulating there are probably people on this forum that can provide help that is more specific to your problem.

As already mentioned by Ciano, if you provided more information about your design we would be able to give more accurate answer. However, there are several tips that hardware designers should follow, specially for complex system simulation. Some of them (that I mostly use) are listed below:
Hierarchical simulation (as Ciano, already posted): instead of simulating the entire system, try to simulate smaller set of modules.
Selective configuration: most systems require some initialization processes such as reset initialization time, external chips register initialization, etc... Usually for simulation purposes a few of them are not require and you may use a global constant to jump these stages when simulating, like:
constant SIMULATION_ENABLE : STD_LOGIC := '1';
...;
-- in reset condition:
if SIMULATION_ENABLE = '1' then
currentState <= state_executeSystem; -- jump the initialization procedures
else
currentState <= state_initializeSystem;
end if;
Be careful, do not modify your code directly (hard coded). As the system increases, it becomes impossible to remember which parts of it you modified to simulate. Use constants instead, as the above example, to configure modules to simulation profile.
Scaled time/size constants: instead of using (everytime) the real values for time and sizes (such as time event, memory sizes, register file size, etc) use scaled values whenever possible. For example, if you are building a RTC that generates an interrupt to the main system every 60 seconds - scale your constants (if possible) to generate interrupts to about (6ms, 60us). Of course, the scale choice depends on your system. In my designs, I use two global configuration files. One of them I use for simulation and the other for synthesis. Most constant values are scaled down to enable lower simulation time.
Increase the abstraction: for bigger modules it might be useful to create a simplified and more abstract module, acting as a model of your module. For example, if you have a processor that has this RTC (you mentioned) as a peripheral, you may create a simplified module of this RTC. Pretending that you only need the its interrupt you may create a simplified model such as:
constant INTERRUPT_EVENTS array(1 to 2) of time := (
32 ns,
100 ms
);
process
for i in 1 to INTERRUPT_EVENTS'length loop
rtcInterrupt <= '0';
wait for INTERRUPT_EVENTS(i);
rtcInterrupt <= '1';
wait for clk = '1' and clk'event
end for
wait;
end process;

How can I create a task which drives an output across time without using globals?

I want to write some tasks in a package, and then import that package in some files that use those tasks.
One of these tasks toggles a reset signal. The task is reset_board and the code is as follows:
package tb_pkg;
task reset_board(output logic rst);
rst <= 1'b0;
#1;
rst <= 1'b1;
#1;
rst <= 1'b0;
#1;
endtask
endpackage
However, if I understand this correctly, outputs are only assigned at the end of execution, so in this case, the rst signal will just get set to 0 at the end of the task's execution, which is obviously not what I want.
If this task were declared locally in the module in which it is used, I could refer to the rst signal directly (since it is declared in the module). However, this would not allow me to put the task in a separate package. I could put the task in a file and then `include it in the module, but I'm trying to avoid the nasty complications that come with the way SystemVerilog handles includes (long-story-short, it doesn't work the way C does).
So, is there any way that the task can drive an output with different values across the duration of its execution without it having to refer to a global variable?

A quick solution is to use a ref that passes the task argument by reference instead of an output argument that is copied after returning from the task.
task reset_board(ref logic rst);
There are a few drawbacks of doing it this way. You can only pass variables of matching types by reference, so when you call reset_board(*signal*), signal cannot be a wire. Another problem is you cannot use an NBA <= to assign a variable passed by reference, you must use a blocking assignment =. This is because you are allowed to pass automatic variables by reference to a task, but automatic variable are not allowed to be assigned by NBAs. There is no way for the task to check the storage type of the argument passed to it.
Standard methodologies like the UVM recommend using virtual interfaces or abstract classes to create these kinds of connections from the testbench to the DUT. See my DVCon paper for more information.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse