If I have both alwas_comb and always_ff in a single module, which one executed first?.
for example, I have seen this code in a book but I am confused about the functionality. for example, if WE=0 what will be the value of Qout?
module SyncRAM #(parameter M = 4, N = 8)(output logic [N-1:0] Qout,
input logic [M-1:0] Address, input logic [N-1:0] Data, input logic clk, WE);
logic [N-1:0] mem [0:(1<<M)-1];
always_comb
Qout = mem[Address];
always_ff #(posedge clk)
if (~WE)
mem[Address] <= Data;
endmodule
Any help about the truth table of this code is appreciated,
regards
The specific answer to your question is that Qout will just track the value of mem[Address]. In other words, on the rising edge of the clock, if WE is 0, Qout will be driven with the value written to the memory. This is because the memory will behave like a bank of flip-flops, while the Qout output will behave as if it is directly connected to the Q output of a bank of flip-flops.
The order of the execution of the two always blocks is deterministic, because Qout is driven using a blocking assignment (=), whereas the memory is written to using a non-blocking assignment (<=). See the answer here for much more detail.
Related
I am currently developing an AES encryption core for a Pynq-Z1 FPGA board. I would like to see the routing of the logic in FPGA logic and timing summary of the design.
The project synthesises, but it results in a warning saying that I am using exceeding the number of IOB blocks on the package. This is understandable because the core takes in and outputs a 4 x 4 matrix.
Instead, I would like to have "internal I/O" in order to see the routing on FPGA fabric. How would I go about doing this? Currently, the device view shows an empty topology (shown below) but my synthesised design utilises 4148 LUT and 389 FF. I expect to see some CLBs highlighted.
design device view
I appreciate any feedback and reference to any application notes which might further progress my FPGA understanding.
Cheers
You can use a simple wrapper around your core with a serial interface. Something like:
entity wrapper is
port(clk, rst, dsi, dsi_core, shift_out: in std_ulogic;
di: in std_ulogic_vector(7 downto 0);
dso_core: out std_ulogic;
do: out std_ulogic_vector(7 downto 0)
);
end entity wrapper;
architecture rtl of wrapper is
signal di_core, do_core, do_buffer: std_ulogic_vector(127 downto 0);
begin
u0: entity work.core(rtl)
port map(clk, rst, dsi_core, di_core, dso_core, do_core);
input_process: process(clk)
begin
if rising_edge(clk) then
if rst = '1' then
di_core <= (others => '0');
elsif dsi = '1' then
di_core <= di & di_core(127 downto 8);
end if;
end if;
end process input_process;
output_process: process(clk)
begin
if rising_edge(clk) then
if rst = '1' then
do_buffer <= (others => '0');
elsif dso_core = '1' then
do_buffer <= do_core;
elsif shift_out = '1' then
do_buffer <= do_buffer(119 downto 0) & X"00";
end if;
end if;
end process output_process;
do <= do_buffer(127 downto 120);
end architecture rtl;
The wrapper just receives inputs, one byte at a time (when dsi = '1') and shifts them in a 128-bits register that is connected to the 128-bits input of your core. When 16 bytes have been entered the environment asserts dsi_core to instruct the core that the 128-bits input can be sampled and processed. The environment waits until the core asserts dso_core, signalling that the processing is over and the 128-bits output is available on the do_core output port of core. When dso_core is asserted the wrapper samples do_core in a 128-bits register (do_buffer). The environment can now read the leftmost byte of do_buffer which drives the do output port of the wrapper. The environment asserts shift_out to shift do_buffer one byte to the left and read the next byte...
This kind of wrapper is a very common practice when you want to test in the real hardware a sub-component of a larger system. As it frequently happens that the number of IOs of sub-components exceeds the number of available IOs, serial input-output solves this. Of course there is a significant latency overhead due to the IO operations but it is just for testing, isn't it?
Your demands are contradictory.
If the design can not place all the I/Os it can not show all the routing as it has not all the begin and/or endpoints. You should reduce your I/O.
The simplest way is to have a real or imaginary interface which much less pins.
An imaginary interface is one which is syntactically correct, reduces your I/Os but will never be used in real life so does not have to be functionally correct.
As it happens you are the third person to ask about reducing I/O in the last weeks and I posted an (untested) SPI interface which has a parameter to generate an arbitrary number of internal inputs and outputs. You can find it here: How can I assign a 256-bit std_logic_vector input
Here I have a simple example below.
module A(o,clk,rst,i);
output o;
input i,clk,rst;
...
endmodule
and here is an interface class definition below.
interface my_if(input bit clk);
logic o,rst,i;
wire clk;
clocking cb#(posedge clk);
input o; // why input here ?
output i,rst; // why output here ?
endclocking
...
endinterface
My question is how to decide the signal inside cb is input or output ??
Thank you !
There are many uses of input/output in SystemVerilog, which can be confusing.
For ports, they the flow of data across a boundary. For a clocking block, they represent whether a signal is passively being sampled, or actively driven. Depending on the situation, it is perfectly reasonable to have a port declared as an output, and the same signal declared as a clocking block input.
I would like to understand how tasks in System Verilog work. I thought that a task was just a way of naming and parametrising a bit of code that could otherwise appear enclosed between a begin and an end. However, the way that the parameters work is non-obvious.
Say I want to factor out instances of non-blocking assignments from a module. I might do something like the following, thus reaching the point where there are two instances of the same task that differ just in the parameters (ff_0 and ff_1).
module test_inlined;
bit clk;
int count = 0;
logic [7:0] x, y, z;
task automatic ff_0;
#(posedge clk);
y <= x;
endtask
task automatic ff_1; // really same task as ff_0 to within variable renaming
#(posedge clk);
z <= y;
endtask
always
ff_0;
always
ff_1;
always #(posedge clk)
$strobe("%d: x=%d, y=%d, z=%d", count, x, y, z);
always
#5 clk = !clk;
always #(posedge clk)
begin
x <= count;
count ++;
if (count > 20) $finish;
end
endmodule
It would be trivial to instead place the factored out assignments (aka flip-flops) into two instantiations of the same module, so it would make sense for it to be also possible to express the same functionality in terms of two instances of the same task.
The following does not work because out is supposedly automatic, or that is what Modelsim claims. I do not see why it would be since it is fairly obviously a reference to a static member of a module?
task automatic ff (ref logic [7:0] out, ref logic [7:0] inp, ref bit clk);
#(posedge clk);
out <= inp;
endtask
module test_broken;
bit clk;
int count = 0;
logic [7:0] x, y, z;
always
ff(y, x, clk);
always
ff(z, y, clk);
// .... same as before
endmodule
It does make sense that the tasks need to be automatic to use ref parameters because then there is no need to worry about their lifetime. It is less clear why only blocking assignments to an automatic variable would be allowed. It is not like there is any obvious need for auto variables to go away while there are pending non-blocking assignments?
How do I factor out non-blocking assignments into a task, please? Many thanks in advance.
The problem is that task cannot assume anything about the storage classification of the variable passed to it. The code generated for the task has to work for any kind of storage, so passing by ref has to take on the pessimistic set of restrictions.
I'm learning SystemVerilog and today my lecturer warned us against accidentally introducing memory into combinational systems. He used the following code as an example of this:
module gate(output logic y, input logic a);
always_comb
if(a)
y = '1;
endmodule
However, I don't understand why this presents a problem. As far as I can see, this is just a simple buffer. In what way does this code introduce memory into the system?
At the beginning of the simulation if a == 0 the value of y will be '0. If later a == 1'b1 then y will become '1. What value do you expect y to have when later on a == 0?
The answer to this question is that it will retain its previous value: '1. That is not the behaviour of combinational logic, whose output by definition only depends on the current state of the inputs, not on their previous states. In order to implement the behaviour you have described, the synthesiser will need a component with state, with storage, with memory. It will fulfill this by using a latch.
I been assigned to manually convert the below RTL into its structural equivalent. I don't understand how you'd convert it. What's the structural description for this code in verilog? What steps should I take?
module cou(
output reg [7:0] out,
input [7:0] in,
input iti,
input c,
input clock);
always #(posedge clock)
if (iti == 1)
out <= in;
else if (c == 1)
out <= out + 1;
endmodule
Here is the basic process:
always #(posedge clock) tells you you have positive-edge D-flip-flops without an asynchronous reset or set.
out is the only value being assigned within the always statment. The size of out tells you the number of flops needed.
Drawing a component level schematic diagram can help visualize the structural logic.
Now all that is needed to figure out is the combination logic to the flop's D pin. I'll give you a clue that it can be done using only muxes and an adder.