I'm trying to synthesize an Altera circuit using as few logic elements as possible. Also, embedded multipliers do not count against logic elements, so I should be using them. So far the circuit looks correct in terms of functionality. However, the following module uses a large amount of logic elements. It uses 24 logic elements and I'm not sure why since it should be using 8 + a couple of combinational gates for the case block.
I suspect the adder but I'm not 100% sure. If my suspicion is correct however, is it possible to use multipliers as a simple adder?
module alu #(parameter N = 8)
(
output logic [N-1:0] alu_res,
input [N-1:0] a,
input [N-1:0] b,
input [1:0] op,
input clk
);
wire [7:0] dataa, datab;
wire [15:0] result;
// instantiate embedded 8-bit signed multiplier
mult mult8bit (.*);
// assign multiplier operands
assign dataa = a;
assign datab = b;
always_comb
unique case (op)
// LW
2'b00: alu_res = 8'b0;
// ADD
2'b01: alu_res = a + b;
// MUL
2'b10: alu_res = result[2*N-2:N-1]; // a is a fraction
// MOV
2'b11: alu_res = a;
endcase
endmodule
Your case statement will generate a 4-input mux with op as the select, which uses a minimum of 2 logic cells. However, since you're assigning an 8-bit variable in the case block, you will require 2 logic elements for each bit of the output. Therefore the total is 8*2 logic elements for the large mux and 8 for the adder, giving you 24.
I'm doing this project too, so I won't give too much away about how to optimise this. However, what I will tell you is that both the muxes and the adder can be implemented using multipliers, 8 at most. With that said, I don't think this architecture is optimal for a multiplier implementation.
I have a problem understanding how always_ff works in terms of creating a mesh of logic gates.
What do I mean? When I use always_comb like here:
module gray_koder_dekoder(i_data, i_oper, o_code);
parameter LEN = 4;
input logic [LEN-1:0] i_data;
input logic i_oper;
output logic [LEN-1:0] o_code;
int i;
always_comb
begin
o_code = '0;
i = LEN-1;
if (i_oper == 1'b1) // 1'b1 selects
begin // the encoding operation
o_code = i_data ^ (i_data >> 1);
end
else // for any other value
begin // perform decoding
o_code = i_data;
for (i=LEN-1; i>0; i=i-1)
begin
o_code[i-1] = o_code[i]
^ i_data[i-1];
end
end
end
endmodule
So here is how I see it.
At the beginning the program sees that the output is 0000.
Now if i_oper is equal to 1, it changes o_code to i_data ^ (i_data >> 1), so the program wants to build a combination of logic gates for this operation. But if i_oper is equal to 0, the program makes another set of logic gates to get a different o_code.
So always_comb gives the final result for every bit in i_data that results in o_code.
My teacher said that always_comb is "blocking" but always_ff is not "blocking". I don't get it...
So always_ff doesn't give the final result of logic gates for the input to get a specific output?
Another example of always_comb :
module gray_dekoder (i_gray, o_data);
parameter LEN = 4;
input logic [LEN-1:0] i_gray;
output logic [LEN-1:0] o_data;
always_comb
begin
o_data = i_gray;
for (int i=LEN-1; i>0; i=i-1)
o_data[i-1] = o_data[i] ^ i_gray[i-1];
end
endmodule
So at the beginning the program sees that the output is 0000, so it will make a set of logic to have 0 at the end. Then it sees the for loop that modifies the output, so the program checks every bit of the input (bit 3, then bit 2, then bit 1, etc.) and creates a specific output for every input. So now the output is not 0000 anymore but a set of instructions, produced by the for loop, that modifies the output.
So always_comb gives a final result from analyzing the whole code of the always_comb block from top to bottom, and creates a set of instructions/logic gates that implements it. This is because always_comb overwrites previous instructions: 0000 was the initial instruction, but then it was overwritten by the for loop.
But maybe my thinking is wrong, because what if an instruction doesn't overwrite the 0000 instruction, like here:
module replace(i_a, i_b, o_replaced, o_error);
parameter BITS = 4;
input logic signed [BITS-1:0] i_a, i_b;
output logic signed [BITS-1:0] o_replaced;
output logic o_error;
int i;
always_comb
begin
o_replaced = '0;
if(i_b < 0 || i_b > BITS)
begin
o_error = 1;
o_replaced = 'x;
end
else
begin
i = i_b;
o_replaced = i_a;
o_replaced[i-1] = 1;
o_error = 0;
end
end
endmodule
Here I have a 0000 output that isn't overwritten in the "else" branch, so I don't know what happens then.
I think of always_comb as a "final result" that gives a set of instructions for how to create logic gates. The final result is at the end, so if something changes then the initial result doesn't matter (it is overwritten), but with an if statement this doesn't fit my mental model.
As for always_ff, I heard that it doesn't give a final result and that it can stop at any point, unlike always_comb, which the program analyzes from top to bottom.
Verilog is designed to represent the behavior of hardware; it is not a regular programming language. It operates with different semantics.
At a very high level, hardware consists of combinational logic and flops (and latches). From another point of view, hardware is a set of parallel functions which are synchronized across the design by clocks. This means that at a clock edge a lot of hardware devices start working in parallel and they should produce results by the next clock edge. Those results can be used by other functions in the next clock cycle.
Roughly, combinational logic defines a function, flops provide synchronization.
In Verilog all those devices are described with always blocks. Use of edges, e.g., @(posedge clk), provides synchronization points and usually defines flops in the code. A simple function and a flop look like the following.
// combinational logic
always @* // you can use always_comb instead
val = in1 & in2; // a combinational function
// flop
always @(posedge clk) // you can use always_ff instead
out <= val; // synchronization
So, in the example, val is calculated by the combinational function, and a flop captures it at the clock edge and makes it available to other functions as out. You can see the progression of clocks and results in a waveform.
So, this is what always_ff is doing: just providing synchronization and expressing flops for synthesis.
In general, always, always_comb, always_ff and always_latch work the same way. The last three are SystemVerilog blocks and just provide additional hints to the compiler, which can run additional checks on them. I intentionally used plain always blocks in my example to show that. There are some other conditions which need to be programmed to cleanly express the intention. So, your assertion that always_ff works differently has no basis. It works the same as other always blocks.
What I think confuses you is the use of blocking (=) and non-blocking (<=) assignments. It does not matter for synthesis which one you use, but it matters for simulation. The difference is described in numerous documents and examples. To understand it properly you need to look into Verilog simulation scheduling semantics.
But the rule of thumb is that you should use non-blocking assignments (<=) in flops and use blocking (=) in combinational logic. In flops '<=' allows simulating real behavior of flops. Remember that hardware is a massively-parallel evaluation engine. Consider the following example:
always_ff @(posedge clk) begin
out1 <= in1;
out2 <= out1;
end
The above example defines at least two flops working at posedge clk. out1 and out2 must be synchronized at this clock. This means the flops have to catch the values which existed before the edge and present them after the edge. So, for out1 the value that existed before the edge is in1, evaluated by combinational logic. What would be the value of out2? Which value existed before the edge? Evidently, the value of out1 before it gets changed to the new value of in1.
clk ___|---|___|---|___
in1 0
out1 x 0 << new value of in1
out2 x << old value of out1
.
in1 1 .
out1 . 1 << new value of in1
out2 . 0 << old value of out1
So, after evaluation of the block, at the first edge the value of out2 will be 'x' (the previous value of out1); at the second clock edge it will finally get the value of in1 as it existed in the previous clock cycle.
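To see the one-cycle lag in a simulator, here is a minimal sketch that runs the same two statements once with non-blocking and once with blocking assignments. The module name nb_vs_b_tb, the signals nb1/nb2/b1/b2, the clock period and the stimulus times are all assumptions made up for illustration; they are not part of the example above.

```systemverilog
// Sketch: compare non-blocking and blocking assignments in a clocked block.
module nb_vs_b_tb;
  logic clk = 0;
  logic in1 = 0;
  logic nb1, nb2;   // written with non-blocking assignments (<=)
  logic b1, b2;     // written with blocking assignments (=)

  // Free-running clock, rising edges at t = 5, 15, 25, ...
  always #5 clk = ~clk;

  // Non-blocking: nb2 samples the value nb1 had before the edge,
  // so nb2 lags nb1 by one clock cycle.
  always_ff @(posedge clk) begin
    nb1 <= in1;
    nb2 <= nb1;
  end

  // Blocking: b1 is updated first, so b2 already sees the new b1
  // within the same clock edge.
  always @(posedge clk) begin
    b1 = in1;
    b2 = b1;
  end

  initial begin
    $monitor("t=%0t in1=%b nb1=%b nb2=%b b1=%b b2=%b",
             $time, in1, nb1, nb2, b1, b2);
    #12 in1 = 1;   // change the input between two rising edges
    #20 $finish;
  end
endmodule
```

In the printed trace, nb2 should pick up each new value one rising edge after nb1 does, while b2 changes together with b1.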
I hope this makes things a bit clearer.
I want a $urandom_range() that will not repeat a value once it has been picked in a simulation. If it has exhausted its supply of 'available' numbers, then perhaps it can repeat.
Is there any keyword in systemverilog which will help quickly to get around this ?
Not a SV expert here so an example would really help! Thanks
randc does exactly this. (cyclic randomization)
class A;
randc bit[7:0] m;
endclass
Each time you call randomize() on the same object, it will not repeat value for m until all possible values have been given.
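As a rough check of that behaviour, the sketch below calls randomize() 256 times on one object and counts how often a value comes back twice. Class A is repeated only so the block is self-contained; the module name randc_tb and the variables seen and repeats are assumptions for illustration.

```systemverilog
// Same class as above, repeated so this sketch compiles on its own.
class A;
  randc bit [7:0] m;
endclass

module randc_tb;
  A   a;
  bit seen [256];   // seen[v] is set once value v has been returned
  int repeats;      // number of values returned more than once

  initial begin
    a = new();
    for (int k = 0; k < 256; k++) begin
      if (!a.randomize()) $fatal(1, "randomize() failed");
      if (seen[a.m]) repeats++;
      seen[a.m] = 1;
    end
    // With randc, one full cycle of 256 calls should give repeats == 0.
    $display("repeats in 256 calls: %0d", repeats);
  end
endmodule
```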
Simulators have limits on how large the cyclic value can be, but the standard requires a minimum of 8-bits. If you have a larger value, then you can use the inside operator.
class A;
rand bit[23:0] r;
bit [23:0] list[$];
constraint c { !(r inside {list}); }
function void post_randomize();
list.push_back(r);
endfunction
endclass
If you really expect to cycle through the list, it might be simpler to build the list first, and then shuffle through the list.
bit [7:0] list[20];
for(int i=0;i<20;i++) list[i] = i+10; // range 10-29
list.shuffle();
// cycle through list[0] ... list[19]
list.shuffle();
// cycle through list[0] ... list[19]
You can declare a variable with randc identifier. This is called 'cyclical random' and will ensure exactly what you are requiring.
Note: This requires a license that supports randomization and random variables. Most commercial simulators do provide this but at a higher cost. If you are constrained by this and need to only use the system calls - $urandom or $urandom_range, I would implement something like a queue that tracks all the values returned.
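int vals[$]; // values already returned; declared outside the function so it persists across calls
function automatic int find_unique_num();
int c;
bit found;
if (vals.size() == 10) vals.delete(); // all 10 values used up: allow repeats again
do begin
found = 0;
c = $urandom_range(10, 1);
foreach(vals[i])
if (c == vals[i]) found = 1;
end
while (found); // retry while c has already been returned
vals.push_back(c);
return c;
endfunction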
I'm now trying to force a bit in an array of bits. The position of the bit to be "forced" depends on the variable i, while the other bits stay 0.
For example, if I have the array bit [2:0] A:
when i=0, I want A to be 3'b001
when i=1, A should be 3'b010
when i=2, A should be 3'b100
But I have to use the force statement, since I'm writing a testbench to test the path of signals.
Does anyone know how I can do that?
update 1: @Serge I have to use the force statement. The signal was declared like this:
bit [31:0] A
I tried this:
force A[31:0] = 32'd0;
for (int i=0;i<=31;i++) begin
force A[i]=1;
end
Obviously it doesn't work. Actually, I was testing different scenarios to see whether the path of the signal is properly done or not.
update2: I have now generated a script to force the signal one by one
However, I'm not sure whether I encounter a bug or not
when this statement is executed
a[31:0] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,`TOP_TB.clk_1T,0};
a[1] stays at 0. Instead, the 1T clk appears at a[0]. Does anyone know what's happening?
update3: Thanks for your help! I had actually thought that the 0s are interpreted as 32 bits, but I don't really know why the 1T clk appears at bit 0 instead of bit 1 (I'm using QuestaSim). By the way, I have solved this by generating a script and copying and pasting the content of a text file generated by the script.
You can use bit-indexing in verilog to do what you want. It's like indexing into an array in C. For example:
array_a[i] <= new_value;
In the code above, if i is 0, it will assign new_value to bit 0 and keep the rest unchanged.
From the Language Reference Manual (IEEE Std 1800-2017), section 10.6.2 ("The force and release procedural statements"):
The left-hand side of the assignment can be a reference to a singular variable, a net, a constant bit-select of a vector net, a constant part-select of a vector net, or a concatenation of these. It shall not be a bit-select or a part-select of a variable or of a net with a user-defined nettype.
It appears that it is not possible to do directly what you want.
My best bet for doing more or less what you want to do (I just forced the single bit and left the other ones evolving as normal), keeping in mind that the LHS of the force assignment should be constant, is something like this:
module dut (
input logic [31:0] a,
input logic [31:0] b,
output logic [31:0] z
);
always_comb z = a & b;
endmodule: dut
module tb;
logic [31:0] a;
logic [31:0] b;
logic [31:0] z;
dut dut (.*);
logic clk = 0;
initial forever
#(5ns) clk = !clk;
logic [5:0] sel;
initial forever begin
case (sel)
6'd0: force z[0] = clk;
6'd1: force z[1] = clk;
6'd2: force z[2] = clk;
6'd3: force z[3] = clk;
6'd4: force z[4] = clk;
6'd5: force z[5] = clk;
6'd6: force z[6] = clk;
6'd7: force z[7] = clk;
6'd8: force z[8] = clk;
6'd9: force z[9] = clk;
6'd10: force z[10] = clk;
6'd11: force z[11] = clk;
6'd12: force z[12] = clk;
6'd13: force z[13] = clk;
6'd14: force z[14] = clk;
6'd15: force z[15] = clk;
6'd16: force z[16] = clk;
6'd17: force z[17] = clk;
6'd18: force z[18] = clk;
6'd19: force z[19] = clk;
6'd20: force z[20] = clk;
6'd21: force z[21] = clk;
6'd22: force z[22] = clk;
6'd23: force z[23] = clk;
6'd24: force z[24] = clk;
6'd25: force z[25] = clk;
6'd26: force z[26] = clk;
6'd27: force z[27] = clk;
6'd28: force z[28] = clk;
6'd29: force z[29] = clk;
6'd30: force z[30] = clk;
6'd31: force z[31] = clk;
endcase
@(clk or sel);
release z[0];
release z[1];
release z[2];
release z[3];
release z[4];
release z[5];
release z[6];
release z[7];
release z[8];
release z[9];
release z[10];
release z[11];
release z[12];
release z[13];
release z[14];
release z[15];
release z[16];
release z[17];
release z[18];
release z[19];
release z[20];
release z[21];
release z[22];
release z[23];
release z[24];
release z[25];
release z[26];
release z[27];
release z[28];
release z[29];
release z[30];
release z[31];
end
initial begin
a = 32'haa55ff00; // 32-bit literal (the extra upper hex digits would be truncated anyway)
b = 32'h23456789;
sel = 6'd0;
#(98ns);
sel = 6'd6;
end
endmodule: tb
This works on my version of ModelSim (INTEL FPGA STARTER EDITION 10.6c).
As for why your code:
a[31:0] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,`TOP_TB.clk_1T,0};
does not work, my best guess is that every "0" is interpreted as an integer 0, i.e. 32'd0. You then effectively have something along the lines of:
a[31:0] = {960'd0, `TOP_TB.clk_1T, 32'd0};
and the RHS gets truncated to fit into just 32 bits. The truncation of course implies that anything left of "32'd0" is thrown away, but your compiler should really raise a warning about this. Something like:
a[31:0] = {30'b0,`TOP_TB.clk_1T,1'b0};
works for me. You can of course plug that construct in the "case" I used in my example too.
SystemVerilog does not let you force a bit slice of a vector, so you'll have to force the whole net. A good strategy here is to force the net to be:
force A = A ^ my_force_vector;
And set my_force_vector to the bits you'd like to force.
See this answer here: https://stackoverflow.com/a/50845703/6262513
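A minimal sketch of the whole-vector idea applied to the one-hot pattern from the question. It is a variation on the line above, not the linked answer's code: instead of XOR-ing with a mask, the full vector is forced to a one-hot value. The module name force_onehot_tb and the standalone signal A are assumptions for illustration; in a real testbench A would be a hierarchical path into the design.

```systemverilog
// Sketch: force the whole 32-bit vector to a one-hot value selected by i.
module force_onehot_tb;
  logic [31:0] A;   // stand-in for the signal under test (assumption)
  int          i;   // declared at module level so the force expression
                    // reads a static variable, not an automatic loop variable

  initial begin
    A = '0;
    for (i = 0; i < 32; i++) begin
      // The left-hand side is the whole variable, so no bit-select is forced.
      // Bit i is driven to 1 and every other bit to 0.
      force A = 32'b1 << i;
      #10;
      $display("i=%0d A=%b", i, A);
    end
    release A;
  end
endmodule
```

Forcing the complete vector keeps the left-hand side constant, which is what the restriction on bit-selects is about; only the right-hand side depends on i.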
I am using the following function in my System Verilog code. I wondered if there was an idiomatic way of achieving the same effect that perhaps would not require the width to be hardwired. I tried streaming operators, but could not quite get them to work. I need to use unpacked arrays. Many thanks.
function bit [63:0] cat8 (bit [7:0] a[8]);
return { a[7], a[6], a[5], a[4], a[3], a[2], a[1], a[0] };
endfunction;
Since you are reversing the array in the concat, there is no good way to express it.
you have:
bit [7:0] a[8];
which is equivalent to
bit [7:0] a[0:7];
In your concat you start with a[7] in the most significant bits, whereas 7 is the last (rightmost) index in the array.
This is the reason why the streaming operators did not work in your case.
So, if you really need to reverse the array, then you have what you have; otherwise you can find that these 2 things are equivalent:
{ a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7] }
and
{ >> {a}}
Of course you can declare your array as bit [7:0] a[7:0] and keep the index ordering in the concat as you have it. But then nothing gets reversed, just as in the case above.
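To make the two orderings concrete, here is a small sketch that packs the same unpacked array in both directions. The module name stream_order_tb, the variables fwd and rev, and the test values are assumptions for illustration.

```systemverilog
// Sketch: pack an unpacked byte array in both byte orders.
module stream_order_tb;
  bit [7:0]  a [8];      // same as a[0:7]
  bit [63:0] fwd, rev;

  initial begin
    a = '{8'h01, 8'h02, 8'h03, 8'h04, 8'h05, 8'h06, 8'h07, 8'h08};

    // Left-to-right streaming: a[0] lands in the most significant byte,
    // i.e. the same as { a[0], a[1], ..., a[7] }.
    fwd = { >> {a} };

    // Right-to-left streaming in 8-bit slices: the byte order is reversed,
    // i.e. the same as { a[7], a[6], ..., a[0] }.
    rev = { << 8 {a} };

    $display("fwd=%h rev=%h", fwd, rev);
  end
endmodule
```

With these values I would expect fwd to print as 0102030405060708 and rev as 0807060504030201. The streaming results are assigned to variables first because a streaming concatenation needs an assignment (or cast) context.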
you can define a new datatype through typedef.
typedef bit[7:0] octet;
typedef octet upack[7:0];
function bit [63:0] cat8 (upack a);
// your code
endfunction;
Below should work for you
module top;
function bit [63:0] cat8 (bit [7:0] a[8]);
return { <<8{a}};
endfunction;
bit [7:0] arr[8];
initial begin
arr= '{1,2,3,4,5,6,7,8};
foreach (arr[i])$display("%h", arr[i]);
$display("%h", cat8(arr));
end
endmodule
I wrote a 32-bit LFSR based on the taps from [1]. I want to ask if the following description is right for the 32-bit LFSR with the taps 32,22,2 and 1.
module lfsr (
input logic clk_i,
input logic rst_i,
output logic [31:0] rand_o
);
logic[31:0] lfsr_value;
assign rand_o = lfsr_value;
always_ff @(posedge clk_i, negedge rst_i) begin
if(~rst_i) begin
lfsr_value <= '0;
end else begin
lfsr_value[31:1] <= lfsr_value[30:0];
lfsr_value[0] <= ~(lfsr_value[31] ^ lfsr_value[21] ^ lfsr_value[1] ^ lfsr_value[0]);
end
end
endmodule
[1] http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
Looks OK. You could also use an XOR instead of an XNOR, as long as you reset to something else (the XOR version locks up at all 0's, the XNOR at all 1's).
For many apps the pseudo-random output is the single bit you shift out (31 in your case), rather than the entire register. It's also (more?) common to shift right, and put the XOR data in the top bit, and use the bit 0 output as your PR data.
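A minimal sketch of the XOR variant mentioned above, keeping the same taps and shift direction as the question's code. The module name lfsr_xor and the reset seed 32'h0000_0001 are assumptions; any non-zero seed avoids the all-zeros lock-up state.

```systemverilog
// Sketch: same taps (32, 22, 2, 1) with XOR feedback and a non-zero reset value.
module lfsr_xor (
  input  logic        clk_i,
  input  logic        rst_i,    // active-low asynchronous reset, as in the question
  output logic [31:0] rand_o
);
  always_ff @(posedge clk_i, negedge rst_i) begin
    if (!rst_i) begin
      rand_o <= 32'h0000_0001;  // must not be all zeros for the XOR form
    end else begin
      // Shift left and feed the XOR of the tap bits into bit 0.
      rand_o <= {rand_o[30:0],
                 rand_o[31] ^ rand_o[21] ^ rand_o[1] ^ rand_o[0]};
    end
  end
endmodule
```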
Your code is specifically SystemVerilog, and not Verilog, so I've removed the Verilog tag.