implementing inertial delay in multiple ways - system-verilog

I'm trying to implement inertial delay in SystemVerilog to generate a signal valid_inputs that meets the following criteria:
1. valid_inputs should go to '1' after some delay (say 15 time units) if no inputs are X/Z.
2. valid_inputs should go to '0' immediately if at least one input becomes X/Z.
I am trying this with two implementations:
module test (a, b, y);
  input a, b;
  output y;
  wire temp;
  assign temp = ^{a,b};
  bit valid_inputs_temp, valid_inputs_2;
  wire valid_inputs;
  // Implementation 1: flag in an always block, delay on the continuous assignment
  always @(temp)
  begin
    if (temp === 1'b1 || temp === 1'b0)
      valid_inputs_temp <= 1'b1;
    else
      valid_inputs_temp <= 1'b0;
  end
  assign #(15,0) valid_inputs = valid_inputs_temp;
  // Implementation 2: delay inside the always block
  always @(temp)
  begin
    if (temp === 1'b1 || temp === 1'b0)
      #15 valid_inputs_2 <= 1'b1;
    else
      valid_inputs_2 <= 1'b0;
  end
endmodule
The signal valid_inputs works perfectly, but I'm not sure why valid_inputs_2 doesn't behave exactly the same way. Is there a way to implement the inertial delay using always-begin procedural code?
Please note that I could modify the assign statement in the above code to eliminate the corresponding always-begin block entirely, but for some reason I need to use the always-begin style of coding.

The reason the second always block does not work is that you have a blocking #15 delay. That suspends the always process, so it misses any change on temp that occurs within those 15 time units. You need to move the delay to the other side of the <= so that it becomes an intra-assignment delay on the non-blocking assignment.
valid_inputs_2 <= #15 1'b1;
And if you are confused about how an always block executes, see this link.

This is not a good example of inertial delay, because any 0->1 or 1->0 transition on temp drives the output to 1. Pulses therefore make no logical difference to the output,
so there is no way to distinguish it from transport delay.
Moreover, the output goes to 0 immediately on X/Z, so there is no delay in that case either.
Hence the signal "valid_inputs_2" (with <= #15) has transport delay.
To my knowledge there is no way to create inertial delay using an always block.
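The difference between the two delay models discussed in these answers can be illustrated outside a simulator. The following is a minimal Python sketch, not SystemVerilog semantics in full: it assumes the input is a sorted list of (time, value) changes, and the function names transport_delay and inertial_delay are made up for this illustration.

```python
def transport_delay(events, delay):
    """Every input change appears on the output `delay` time units later."""
    return [(t + delay, v) for t, v in events]


def inertial_delay(events, delay, initial=0):
    """A pending change is dropped if the input changes again before it matures."""
    out = []
    current = initial
    for i, (t, v) in enumerate(events):
        next_t = events[i + 1][0] if i + 1 < len(events) else None
        if next_t is not None and next_t < t + delay:
            continue  # superseded before the delay elapsed
        if v != current:  # only real value changes produce an output event
            out.append((t + delay, v))
            current = v
    return out


# Input: a 5-unit pulse starting at t=0, then a 30-unit pulse starting at t=30.
events = [(0, 1), (5, 0), (30, 1), (60, 0)]
transport = transport_delay(events, 15)
inertial = inertial_delay(events, 15)
print(transport)
print(inertial)
```

With a delay of 15, the transport model reproduces the 5-unit pulse on the output, while the inertial model swallows it and passes only the long pulse.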


Pythagorean triplet in Matlab

I have been asked to obtain the first 15 triplets according to this series, and this code ought to work. However, it only produces a 15x3 table filled with zeros rather than the 15 Pythagorean triplets. Any help will be welcome.
A = zeros(15, 3);
ii = 1;
for c = 5:120
    for a = 1:c-1
        for b = a:c-1
            if c^2-(a^2+b^2) == 0
                A(ii,1) = a;
                A(ii,2) = b;
                A(ii,3) = c;
                ii = ii + 1;
            end
            if A(15, 1) ~= 0
                flag = 1;
                break;
            end
        end
        if flag == 1
            break;
        end
    end
    if flag == 1
        break;
    end
end
T1 = array2table(A);
Update: the code generated a correct table after I restarted the application, then failed on all subsequent attempts. I now notice that it runs successfully only the first time after every relaunch of the application. (Resolved, thanks Dan Pollard.)
I am also interested in knowing whether there is any way to avoid writing an upper limit (120) into the code.
I don't think your if statement is ever satisfied. For example, for c=5, you'd expect a=3, b=4 to be a triplet. But you're only letting a and b go up to floor(sqrt(c-1)), which is 2.
Do you mean to let a and b go up to floor(sqrt(c^2-1))?
Edit: as the question has changed.
When you run the code, Matlab creates all the variables which you assign, and stores them in the workspace. This can be useful, but here it's hurting you as you have the variable flag which is stored as 1. This means that when the code runs, it checks if flag==1 after the first run through b, which it is, so the code ends. Resolve this by placing clear; at the beginning of your script.
There isn't a practical way to remove the upper limit on c. Matlab has the built-in variable Inf but at best Matlab won't let you use it in that context. Realistically you could just replace the 120 with a really large number, but this will take more time and more memory as the number gets bigger. Computers have a finite RAM to store matlab arrays in though, and there are infinitely many pythagorean triples, so doing the calculation without an upper limit will fail in some way.
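When only the first N triples are needed, the search does terminate, so one option is an open-ended loop that stops as soon as N triples have been found (in Matlab a while loop would play that role). The sketch below is in Python rather than Matlab and is only an illustration of that idea; the function name first_triples is made up here, and it keeps the brute-force search order of the question (increasing c, then a, then b).

```python
from itertools import count


def first_triples(n):
    """Return the first n triples (a, b, c) with a <= b < c and a^2 + b^2 == c^2."""
    triples = []
    for c in count(5):  # open-ended: no hard-coded upper limit on c
        for a in range(1, c):
            for b in range(a, c):
                if a * a + b * b == c * c:
                    triples.append((a, b, c))
                    if len(triples) == n:
                        return triples  # stop as soon as n triples are found


triples = first_triples(15)
for row in triples:
    print(row)
```

Because the function returns from inside the loops, no flag variable is needed, which also sidesteps the stale-workspace problem described above.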

What is the meaning of "Algebraic loop involving integers or booleans" in the error message

I'm building a PI controller on the Dymola platform and I got the error message shown below.
Here is part of my code. It consists of a valve, which calculates disp, and a PI controller, which controls the amount of disp.
They communicate with each other using a flag.
//PI controller///
if flag_input==1 then //flag_input==1 : Stop control / flag_input==0 : Restart control//
end if;
if error<0 then // error<0 : flag to Valve to restart calculating the disp//
end if;
if (26/5)*(thetta/(2*pi))*0.001>0.026 and flag_input==0 then
//restart calculating the disp when received flag==1 from the PI controller//
elseif (26/5)*(thetta/(2*pi))*0.001<0 and flag_input==0 then
end if;
Can someone tell me what is the meaning of algebraic loop error and figure out the problem?
From your code snippet it's hard to tell where exactly the problem is.
Dymola tells you that you created a large algebraic loop over all the variables listed at the top under Unknowns and the equations listed below in the section Equations.
This can happen easily when you create if statements with variables which depend on each other. Often you just have to use pre() at the right place to break the loop.
Let's use another small example to explain the problem.
Suppose that, for some reason, we want to count the full milliseconds that have passed in the current simulation and stop once we reach 100.
model count_ms
  Integer y(start=0);
equation
  if y >= 100 then
    y = 100;
  else
    y = integer(1000*time);
  end if;
end count_ms;
This code will produce a similar error as yours:
An algebraic loop involving Integers or Booleans has been detected.
Unknowns: y
Equations: y = (if y >= 100 then 100 else integer(1000*time));
From the error message we see that y cannot be solved, due to the equation resulting from the if statement. The equation is not solvable because y depends on itself. To solve such problems pre() was introduced, which gives you access to the value a variable had immediately before the event was triggered.
To fix the code above, we simply have to use pre when we check y:
if pre(y) >= 100 then
and the model simulates as expected.
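The effect of pre() can be mimicked in an ordinary programming language by keeping the previous value in a separate variable. The following Python sketch is only an analogy for the fixed model, not how Dymola evaluates it: the function count_ms and the list of sample times are made up for this illustration, and int() stands in for Modelica's integer().

```python
def count_ms(times):
    """Evaluate y = if pre(y) >= 100 then 100 else integer(1000*time) at each sample."""
    y_prev = 0  # plays the role of pre(y); matches start=0 in the model
    trace = []
    for t in times:
        # The condition reads the previous value, so y no longer depends on itself.
        y = 100 if y_prev >= 100 else int(1000 * t)
        trace.append(y)
        y_prev = y
    return trace


trace = count_ms([0.0, 0.05, 0.1, 0.125, 0.25])
print(trace)
```

Each new y is computed from a value that is already known, which is what breaks the algebraic loop.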

VHDL core synthesis and implementation in Vivado

I am currently developing an AES encryption core for a Pynq-Z1 FPGA board. I would like to see the routing of the logic in FPGA logic and timing summary of the design.
The project synthesises, but it produces a warning saying that I am exceeding the number of IOB blocks on the package. This is understandable because the core takes in and outputs a 4 x 4 matrix.
Instead, I would like to have "internal I/O" in order to see the routing on FPGA fabric. How would I go about doing this? Currently, the device view shows an empty topology (shown below) but my synthesised design utilises 4148 LUT and 389 FF. I expect to see some CLBs highlighted.
design device view
I appreciate any feedback and reference to any application notes which might further progress my FPGA understanding.
You can use a simple wrapper around your core with a serial interface. Something like:
library ieee;
use ieee.std_logic_1164.all;

entity wrapper is
  port(clk, rst, dsi, dsi_core, shift_out: in std_ulogic;
       di: in std_ulogic_vector(7 downto 0);
       dso_core: out std_ulogic;
       do: out std_ulogic_vector(7 downto 0)
  );
end entity wrapper;

architecture rtl of wrapper is
  signal di_core, do_core, do_buffer: std_ulogic_vector(127 downto 0);
begin
  -- The core under test, with its full 128-bit input and output
  u0: entity work.core(rtl)
    port map(clk, rst, dsi_core, di_core, dso_core, do_core);

  -- Shift input bytes into the 128-bit input register
  input_process: process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        di_core <= (others => '0');
      elsif dsi = '1' then
        di_core <= di & di_core(127 downto 8);
      end if;
    end if;
  end process input_process;

  -- Capture the core output, then shift it out one byte at a time
  -- (reading the dso_core output port here requires VHDL-2008)
  output_process: process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        do_buffer <= (others => '0');
      elsif dso_core = '1' then
        do_buffer <= do_core;
      elsif shift_out = '1' then
        do_buffer <= do_buffer(119 downto 0) & X"00";
      end if;
    end if;
  end process output_process;

  do <= do_buffer(127 downto 120);
end architecture rtl;
The wrapper just receives inputs one byte at a time (when dsi = '1') and shifts them into a 128-bit register that is connected to the 128-bit input of your core. When 16 bytes have been entered, the environment asserts dsi_core to tell the core that the 128-bit input can be sampled and processed. The environment then waits until the core asserts dso_core, signalling that processing is over and the 128-bit output is available on the do_core output port of the core. When dso_core is asserted, the wrapper samples do_core into a 128-bit register (do_buffer). The environment can now read the leftmost byte of do_buffer, which drives the do output port of the wrapper. The environment asserts shift_out to shift do_buffer one byte to the left and read the next byte...
This kind of wrapper is a very common practice when you want to test in the real hardware a sub-component of a larger system. As it frequently happens that the number of IOs of sub-components exceeds the number of available IOs, serial input-output solves this. Of course there is a significant latency overhead due to the IO operations but it is just for testing, isn't it?
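The byte ordering implied by the two shift registers can be checked with a small software model. The Python sketch below is only an illustration of the wrapper's data path: the helper names (shift_in, shift_out, load_bytes, read_bytes) are made up here, and the 128-bit registers are modelled as plain integers.

```python
MASK128 = (1 << 128) - 1


def shift_in(di_core, byte):
    """Model of: di_core <= di & di_core(127 downto 8)."""
    return ((byte & 0xFF) << 120) | (di_core >> 8)


def shift_out(do_buffer):
    """Model of: do_buffer <= do_buffer(119 downto 0) & X"00"."""
    return (do_buffer << 8) & MASK128


def load_bytes(data):
    """Shift a sequence of bytes into an initially cleared input register."""
    reg = 0
    for b in data:
        reg = shift_in(reg, b)
    return reg


def read_bytes(do_buffer, n=16):
    """Read n bytes from the top of the output register, shifting after each read."""
    out = []
    for _ in range(n):
        out.append(do_buffer >> 120)  # do <= do_buffer(127 downto 120)
        do_buffer = shift_out(do_buffer)
    return out


reg = load_bytes(range(1, 17))
print(hex(reg))
print(read_bytes(reg))
```

In this model the first byte entered ends up in bits 7 downto 0 of the input register, while the output side presents the most significant byte first, so the test environment has to account for that ordering.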
Your demands are contradictory.
If the tool cannot place all the I/Os, it cannot show all the routing, because some of the start and/or end points are missing. You should reduce your I/O.
The simplest way is to have a real or imaginary interface with far fewer pins.
An imaginary interface is one that is syntactically correct and reduces your I/Os, but will never be used in real life and so does not have to be functionally correct.
As it happens, you are the third person to ask about reducing I/O in the last few weeks, and I posted an (untested) SPI interface which has a parameter to generate an arbitrary number of internal inputs and outputs. You can find it here: How can I assign a 256-bit std_logic_vector input

Systemverilog rule 4.7 (nondeterminism) is interpreted differently by vcs vs iverilog/modelsim

I am still a bit confused about how section 4.7 (nondeterminism) of the SystemVerilog 2012 standard is implemented.
The rule states that in a situation like this:
module test;
  logic a;
  integer cnt;
  initial begin
    cnt = 0;
    a <= 0;
    a <= 1;
    a <= 0;
    a <= 1;
    a <= 0;
  end
  always @(posedge a)
    cnt <= cnt + 1;
endmodule
all assignments would be scheduled on the Non Blocking Assignment queue, and must then be executed in order. The last value wins.
Up to here, it's all clear.
What happens next though is not the same for all simulators.
iverilog and Modelsim (at least the Vivado 2016/3 edition) create one event on 'a', which causes cnt to increment. This seems to also match the behaviour as illustrated by Mr Cummings at SNUG 2000
VCS however filters out the intermediate values and applies only the last one, which by the way is also the way that real flip flops work.
In this case it is not a purely hypothetical discussion, the simulation results are different, and the iverilog/modelsim behaviour could cause bugs that are very difficult to catch, because the flop toggles but no value change is seen in the waveforms.
The other point is this: if iverilog/modelsim are correct, why then are they creating one event and not two?
Additional note.
The example above is indeed not very meaningful.
A more realistic case would be
always @(posedge clk)
begin
  clk2 <= 1'b1;
  if (somecondition)
    clk2 <= 1'b0;
end
always @(posedge clk2, negedge rst_n)
begin
  if (!rst_n)
    q <= 1'b0;
  else
    q <= ~q;
end
This is perfectly legal and in real hardware would never glitch.
The first always block is actually logically identical to
always @(posedge clk)
begin
  if (somecondition)
    clk2 <= 1'b0;
  else
    clk2 <= 1'b1;
end
However, if you simulate the first version with ModelSim, you'll see your q happily toggling away, with clk2 constant 0. This would be a debugging nightmare.
Your last question is easy to explain. It's not that the simulators create only one event (they don't). It's that only the first event schedules the @(posedge) to resume the always process, and the other events happen in the NBA region before the always block resumes execution in the next active event region.
I can't justify the behavior of other simulators. You are not allowed to make multiple assignments to the same flip-flop in real hardware, so your analogy is not that simple. It's possible to have an untimed description and get multiple @(posedge) events without time passing, so filtering would prevent that coding style.
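The two behaviours described in this thread can be contrasted with a toy model of the NBA update queue. The Python sketch below is a deliberate simplification and not a description of how any particular simulator is implemented: the function names are made up, signal values are the characters "0", "1", "x" and "z", and a process waiting on @(posedge a) is assumed to resume at most once because it cannot re-arm until the NBA region has finished.

```python
def is_posedge(old, new):
    """Verilog posedge: 0 -> 1/x/z, or x/z -> 1."""
    return (old == "0" and new in "1xz") or (old in "xz" and new == "1")


def apply_nba_updates(initial, updates, filter_intermediate):
    """Apply queued non-blocking updates in order and count posedge events."""
    if filter_intermediate:
        updates = updates[-1:]  # keep only the last scheduled value
    value = initial
    posedges = 0
    for new in updates:
        if is_posedge(value, new):
            posedges += 1
        value = new
    return value, posedges


def wakeups(posedges):
    """A process blocked on @(posedge a) resumes once, however many edges occurred."""
    return 1 if posedges else 0


queued = ["0", "1", "0", "1", "0"]  # a <= 0; a <= 1; a <= 0; a <= 1; a <= 0;
unfiltered = apply_nba_updates("x", queued, filter_intermediate=False)
filtered = apply_nba_updates("x", queued, filter_intermediate=True)
print(unfiltered, wakeups(unfiltered[1]))
print(filtered, wakeups(filtered[1]))
```

In the unfiltered model there are two posedge events but a single resumption of the always block, so cnt would increment once; in the filtered model there is no posedge at all and cnt stays at 0.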

verilog behavioral RTL to structural

I have been assigned to manually convert the RTL below into its structural equivalent, and I don't understand how to convert it. What is the structural description for this code in Verilog? What steps should I take?
module cou(
  output reg [7:0] out,
  input [7:0] in,
  input iti,
  input c,
  input clock);
  always @(posedge clock)
  begin
    if (iti == 1)
      out <= in;
    else if (c == 1)
      out <= out + 1;
  end
endmodule
Here is the basic process:
always @(posedge clock) tells you that you have positive-edge-triggered D flip-flops without an asynchronous reset or set.
out is the only value being assigned within the always statement. The size of out tells you the number of flops needed.
Drawing a component-level schematic diagram can help visualize the structural logic.
All that remains is to figure out the combinational logic feeding each flop's D pin. I'll give you a clue: it can be done using only muxes and an adder.
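Before drawing gates, it can help to write down the D-input function as a composition of the building blocks named in the clue. The Python sketch below is one possible reading of the RTL, not the only valid structure: mux2, adder and next_out are names made up for this illustration, and the 8-bit wrap-around is modelled with a mask.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def mux2(sel, when_1, when_0):
    """2-to-1 multiplexer: returns when_1 if sel is true, else when_0."""
    return when_1 if sel else when_0


def adder(a, b):
    """8-bit adder; the carry out is discarded, as in out + 1 on a reg [7:0]."""
    return (a + b) & MASK


def next_out(out, in_, iti, c):
    """Value presented to the D pins, i.e. what out becomes at the next clock edge."""
    incremented = adder(out, 1)
    count_or_hold = mux2(c == 1, incremented, out)  # inner mux: count or hold
    return mux2(iti == 1, in_ & MASK, count_or_hold)  # outer mux: load has priority


print(next_out(10, 7, iti=1, c=1))   # load
print(next_out(10, 7, iti=0, c=1))   # count
print(next_out(10, 7, iti=0, c=0))   # hold
print(next_out(255, 7, iti=0, c=1))  # count with wrap-around
```

Each Python function corresponds to one kind of component to instantiate, and the nesting of the calls gives the wiring between them.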