VHDL core synthesis and implementation in Vivado - aes

I am currently developing an AES encryption core for a Pynq-Z1 FPGA board. I would like to see the routing of the logic in the FPGA fabric and the timing summary of the design.
The project synthesises, but it results in a warning saying that I am exceeding the number of IOB blocks available on the package. This is understandable because the core takes in and outputs a 4 x 4 matrix.
Instead, I would like to have "internal I/O" in order to see the routing on FPGA fabric. How would I go about doing this? Currently, the device view shows an empty topology (shown below) but my synthesised design utilises 4148 LUT and 389 FF. I expect to see some CLBs highlighted.
design device view
I appreciate any feedback and reference to any application notes which might further progress my FPGA understanding.
Cheers

You can use a simple wrapper around your core with a serial interface. Something like:
entity wrapper is
  port(
    clk, rst, dsi, dsi_core, shift_out: in std_ulogic;
    di: in std_ulogic_vector(7 downto 0);
    dso_core: out std_ulogic;
    do: out std_ulogic_vector(7 downto 0)
  );
end entity wrapper;

architecture rtl of wrapper is
  signal di_core, do_core, do_buffer: std_ulogic_vector(127 downto 0);
begin
  u0: entity work.core(rtl)
    port map(clk, rst, dsi_core, di_core, dso_core, do_core);

  input_process: process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        di_core <= (others => '0');
      elsif dsi = '1' then
        di_core <= di & di_core(127 downto 8);
      end if;
    end if;
  end process input_process;

  output_process: process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        do_buffer <= (others => '0');
      elsif dso_core = '1' then
        do_buffer <= do_core;
      elsif shift_out = '1' then
        do_buffer <= do_buffer(119 downto 0) & X"00";
      end if;
    end if;
  end process output_process;

  do <= do_buffer(127 downto 120);
end architecture rtl;
The wrapper just receives inputs one byte at a time (when dsi = '1') and shifts them into a 128-bit register that is connected to the 128-bit input of your core. When 16 bytes have been entered, the environment asserts dsi_core to instruct the core that the 128-bit input can be sampled and processed. The environment then waits until the core asserts dso_core, signalling that processing is over and the 128-bit output is available on the do_core output port of the core. When dso_core is asserted, the wrapper samples do_core into a 128-bit register (do_buffer). The environment can now read the leftmost byte of do_buffer, which drives the do output port of the wrapper, then assert shift_out to shift do_buffer one byte to the left and read the next byte, and so on.
This kind of wrapper is very common practice when you want to test a sub-component of a larger system in real hardware. As the number of I/Os of a sub-component frequently exceeds the number of available I/Os, a serial input/output interface solves this. Of course there is a significant latency overhead due to the I/O operations, but it is just for testing, isn't it?

Your demands are contradictory.
If the design cannot place all the I/Os, it cannot show all the routing, as it does not have all the start and/or end points. You should reduce your I/O.
The simplest way is to have a real or imaginary interface with far fewer pins.
An imaginary interface is one which is syntactically correct and reduces your I/Os, but will never be used in real life, so it does not have to be functionally correct.
As it happens, you are the third person to ask about reducing I/O in the last few weeks, and I posted an (untested) SPI interface which has a parameter to generate an arbitrary number of internal inputs and outputs. You can find it here: How can I assign a 256-bit std_logic_vector input

Related

UART serial interface

I want to transfer 8 bits serially (1 bit/clock cycle) through a 1 bit serial interface of a UART. I created an 8 bit packet in the transaction class and drove the packet through the driver modport of the interface. Here is the code snippet below.
for (i = ($size(pkt.RXD)-1); i <= 0; i = i-1) begin
RXSD_vif.DRV.cb_RXSD_DRV.RXD <= RXSD_pkt[i];
end
RXSD_vif is the virtual interface handle.
DRV - modport
cb_RXSD_DRV is the clocking block where I'm sampling on the positive clock edge, with RXD declared as an output.
I'm getting a compile error saying "Too many indices going into RXSD_pkt".
I'm fairly new to this and would appreciate any help in telling me how to fix this. Thanks in advance
I think you're applying the index 'i' to the handle of the packet class itself. You need to index the 8-bit vector inside the class instead. Does this help in any way?
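For what it's worth, a sketch of what the drive loop might look like once you index the vector inside the class rather than the handle. I'm assuming the field is pkt.RXD as in your description and that the class type name is rxsd_pkt (both hypothetical); note the loop condition also needs to be i >= 0 rather than i <= 0, otherwise the loop body never executes:

// Hypothetical driver task, assuming pkt.RXD is the 8-bit field of the transaction class.
task drive_rxd(rxsd_pkt pkt);
  for (int i = $size(pkt.RXD)-1; i >= 0; i--) begin // i >= 0, not i <= 0
    @(RXSD_vif.DRV.cb_RXSD_DRV);                    // advance one clocking-block event per bit
    RXSD_vif.DRV.cb_RXSD_DRV.RXD <= pkt.RXD[i];     // index the vector inside the class, not the handle
  end
endtask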

implementing inertial delay in multiple ways

I'm trying to implement inertial delay in SystemVerilog to generate a signal valid_inputs with the following criteria:
1. valid_inputs should go to '1' after some delay (say 15 units) if no inputs are X/Z.
2. valid_inputs should go to '0' immediately if at least one input becomes X/Z.
I am trying the above with 2 implementations:
module test (a, b, y);
  input  a, b;
  output y;

  wire temp;
  assign temp = ^{a,b};

  bit  valid_inputs_temp, valid_inputs_2;
  wire valid_inputs;

  always @(temp)
  begin
    if (temp === 1'b1 || temp === 1'b0)
    begin
      valid_inputs_temp <= 1'b1;
    end
    else
    begin
      valid_inputs_temp <= 1'b0;
    end
  end

  assign #(15,0) valid_inputs = valid_inputs_temp;

  always @(temp)
  begin
    if (temp === 1'b1 || temp === 1'b0)
    begin
      #15 valid_inputs_2 <= 1'b1;
    end
    else
    begin
      valid_inputs_2 <= 1'b0;
    end
  end
endmodule
The signal valid_inputs works perfectly, but I'm not quite sure why valid_inputs_2 doesn't behave exactly the same. Is there a way I can implement the inertial delay using always-begin procedural code?
Please note that while I could modify the assign statement in the above code such that I totally eliminate the corresponding always-begin block, for some reason I need to use the always-begin style of coding.
Thanks,
Vinayak
The reason the second always block does not work is that you have a blocking #15 delay. That suspends the always process and misses a change on temp if it occurs in less than 15 time units. You need to move the delay to the other side of the <= so it becomes a non-blocking assignment delay.
valid_inputs_2 <= #15 1'b1;
And if you are confused about how an always block executes, see this link.
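Putting that fix into the second always block from the question (just a sketch, keeping the original names):

always @(temp)
begin
  if (temp === 1'b1 || temp === 1'b0)
    valid_inputs_2 <= #15 1'b1; // non-blocking assignment delay: the process is not suspended
  else
    valid_inputs_2 <= 1'b0;     // de-assert immediately on X/Z
end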
This is not a good example of inertial delay. The reason is that any 0->1 or 1->0 transition makes the output 1, so pulses do not logically cause any difference in the output.
So there is no way to distinguish it from transport delay.
Moreover, the output turns 0 immediately for X/Z, so there is no delay in that direction either.
Hence the signal valid_inputs_2 has a transport delay (<= #15).
To my knowledge there is no way to create inertial delay using always block.
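If it helps to see the distinction, here is a small standalone sketch (my own module and signal names) contrasting a continuous assignment with an inertial delay, which swallows a pulse narrower than the delay, against a non-blocking #15 assignment, which passes the pulse through 15 units later:

module delay_compare;
  reg  x = 0;
  wire y_inertial;
  reg  y_transport = 0;

  assign #15 y_inertial = x;   // inertial: a pulse shorter than 15 units is filtered out

  always @(x)
    y_transport <= #15 x;      // transport-like: every edge reappears 15 units later

  initial begin
    #100 x = 1;                // 5-unit pulse, narrower than the delay
    #5   x = 0;
    #50  $finish;
  end

  initial
    $monitor("%0t x=%b y_inertial=%b y_transport=%b", $time, x, y_inertial, y_transport);
endmodule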

Systemverilog rule 4.7 (nondeterminism) is interpreted differently by vcs vs iverilog/modelsim

I am still a bit confused about how SystemVerilog's 2012 rule 4.7 is implemented.
The rule states that in a situation like this:
module test;
  logic a;
  integer cnt;

  initial begin
    cnt = 0;
    #100;
    a <= 0;
    a <= 1;
    a <= 0;
    a <= 1;
    a <= 0;
  end

  always @(posedge a)
  begin
    cnt <= cnt + 1;
  end
endmodule
all assignments would be scheduled on the Non Blocking Assignment queue, and must then be executed in order. The last value wins.
Up to here, it's all clear.
What happens next though is not the same for all simulators.
iverilog and Modelsim (at least the Vivado 2016/3 edition) create one event on 'a', which causes cnt to increment. This seems to also match the behaviour as illustrated by Mr Cummings at SNUG 2000
VCS however filters out the intermediate values and applies only the last one, which by the way is also the way that real flip flops work.
In this case it is not a purely hypothetical discussion: the simulation results are different, and the iverilog/modelsim behaviour could cause bugs that are very difficult to catch, because the flop toggles but no value change is seen in the waveforms.
The other point is this: if iverilog/modelsim are correct, why then are they creating one event and not two?
EDIT:
Additional note.
The example above is indeed not very meaningful.
A more realistic case would be
always @(posedge clk)
begin
  clk2 <= 1'b1;
  if (somecondition)
    clk2 <= 1'b0;
end

always @(posedge clk2, negedge rst_n)
begin
  if (!rst_n)
    q <= 1'b0;
  else
    q <= ~q;
end
This is perfectly legal and in real hardware would never glitch.
The first always block is actually logically identical to
always @(posedge clk)
begin
  if (somecondition)
    clk2 <= 1'b0;
  else
    clk2 <= 1'b1;
end
However, if you simulate the first version with ModelSim, you'll see your q happily toggling away, with clk2 constant 0. This would be a debugging nightmare.
Your last question is easy to explain. It's not that simulators create only one event (they don't); it's that only the first event schedules the @(posedge) to resume the always process, and the other events happen in the NBA region before the always block resumes execution in the next active event region.
I can't justify the behavior of other simulators. You are not allowed to make multiple assignments to the same flip-flop in real hardware, so your analogy is not that simple. It's possible to have an un-timed description and get multiple @(posedge)s without time passing, so filtering would prevent that coding style.
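If you want to see where your simulator lands, a self-contained variant of the example with a bit of logging is easy to run (the $display and $finish are additions for observation only; whether cnt ends up 0 or 1 is exactly the behaviour that differs between tools):

module test;
  logic a;
  integer cnt;

  initial begin
    cnt = 0;
    #100;
    a <= 0;   // all five non-blocking updates are scheduled in the same time step
    a <= 1;
    a <= 0;
    a <= 1;
    a <= 0;
    #10 $display("cnt = %0d, a = %b", cnt, a);
    $finish;
  end

  always @(posedge a)
    cnt <= cnt + 1;
endmodule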

which procedural block executes first in SystemVerilog?

If I have both always_comb and always_ff in a single module, which one executes first?
For example, I have seen this code in a book, but I am confused about the functionality: if WE=0, what will be the value of Qout?
module SyncRAM #(parameter M = 4, N = 8)
  (output logic [N-1:0] Qout,
   input  logic [M-1:0] Address,
   input  logic [N-1:0] Data,
   input  logic         clk, WE);

  logic [N-1:0] mem [0:(1<<M)-1];

  always_comb
    Qout = mem[Address];

  always_ff @(posedge clk)
    if (~WE)
      mem[Address] <= Data;
endmodule
Any help about the truth table of this code is appreciated,
regards
The specific answer to your question is that Qout will just track the value of mem[Address]. In other words, on the rising edge of the clock, if WE is 0, Qout will be driven with the value written to the memory. This is because the memory will behave like a bank of flip-flops, while the Qout output will behave as if it is directly connected to the Q output of a bank of flip-flops.
The order of the execution of the two always blocks is deterministic, because Qout is driven using a blocking assignment (=), whereas the memory is written to using a non-blocking assignment (<=). See the answer here for much more detail.
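A small testbench sketch, if it helps to convince yourself (the testbench, instance, and stimulus names here are mine): with WE=0 the word written on the rising edge shows up on Qout in the same cycle, and with WE=1 nothing is written, so Qout keeps reflecting whatever mem[Address] already holds:

module tb_SyncRAM;
  logic clk = 0, WE;
  logic [3:0] Address;
  logic [7:0] Data;
  wire  [7:0] Qout;

  SyncRAM #(.M(4), .N(8)) dut (.Qout(Qout), .Address(Address), .Data(Data), .clk(clk), .WE(WE));

  always #5 clk = ~clk;

  initial begin
    Address = 4'h3; Data = 8'hAA; WE = 0;       // WE = 0 enables the write in this code
    @(posedge clk); #1;
    $display("after write : Qout = %h", Qout);  // expect AA: Qout tracks mem[Address]
    Data = 8'h55; WE = 1;                       // WE = 1 -> no write
    @(posedge clk); #1;
    $display("after no-op : Qout = %h", Qout);  // expect AA still
    $finish;
  end
endmodule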

Clock input ignored during waveform simulation in VHDL?

I'm new to Hardware Description Language Theory and VHDL. I need to design a 2421 up counter in VHDL. I built a synchronous binary up counter using T flip-flops and modified it to generate the last two desired counts of 8 and 9 by activating the preset and clear conditionally. When I try waveform simulation, the clock input gets ignored. I can't figure out what the problem is. Here is the code:
library ieee;
use ieee.std_logic_1164.all;

entity count2421 is
  port(clock: in std_logic; qq: buffer std_logic_vector(3 downto 0));
end count2421;

architecture arch of count2421 is
  component t_ff is
    port(clock, clear, preset, t: in std_logic; q: buffer std_logic);
  end component;
  signal p1, p2, p, t, u, a, b: std_logic;
begin
  t <= '1';
  u <= '0';
  qq(0) <= '0';
  qq(1) <= '0';
  qq(2) <= '0';
  qq(3) <= '0';

  process(clock) begin
    if(clock'event and clock='0') then
      p1 <= (not qq(3)) and qq(2) and qq(1) and qq(0);
      p2 <= qq(3) and qq(2) and qq(1) and (not qq(0));
      p  <= p1 or p2;
      a  <= qq(3) and qq(2);
      b  <= a and qq(1);
    end if;
  end process;

  stage0: t_ff port map(clock, u, p, t, qq(0));
  stage1: t_ff port map(clock, u, p, qq(0), qq(1));
  stage2: t_ff port map(clock, u, p, a, qq(2));
  stage3: t_ff port map(clock, p1, p2, b, qq(3));
end arch;
Here is the T flip flop code:
library ieee;
use ieee.std_logic_1164.all;

entity t_ff is
  port(clock, clear, preset, t: in std_logic; q: buffer std_logic);
end t_ff;

architecture arch of t_ff is
  signal temp: std_logic;
begin
  temp <= q;

  process(clock)
  begin
    if(clock'event and clock='0') then
      if(clear='1') then
        q <= '0';
      elsif(preset='1') then
        q <= '1';
      elsif(t='1') then
        q <= not q;
      end if;
    end if;
  end process;
end arch;
Buffer port qq has two drivers: the concurrent assignments qq(0) <= '0'; ... qq(3) <= '0'; at the top of the architecture, and the q outputs of the four t_ff instances. The two sets of drivers fight each other, so qq never shows the counter value and it looks as if the clock is being ignored.
Fix that first before worrying about anything else.