SystemVerilog not recognizing constant: Error: Range must be bounded by constant expression - constants

In this short example, I want to simplify writing signal widths. With one signal there is really no need to do this, but in my real code I have many such signals, and declaring them with the longer style wouldn't be appropriate.
Could you please enlighten me as to why I'm getting an error for signal_2?
module sample #(parameter BYTE_WIDTH = 4);
const int BIT_WIDTH = BYTE_WIDTH * 8;
logic [BYTE_WIDTH * 8 -1 : 0] signal_1; // works
logic [BIT_WIDTH -1 : 0] signal_2; // ** Error: Range must be bounded by constant expressions.
endmodule

A const variable is assigned its value at run time, which is too late: the width of your variable signal_2 needs to be fixed at compile time. So, what you need is a localparam, which (like a parameter) is fixed at compile time, but (unlike a parameter) cannot be overridden from outside:
module sample #(parameter BYTE_WIDTH = 4);
localparam BIT_WIDTH = BYTE_WIDTH * 8;
logic [BYTE_WIDTH * 8 -1 : 0] signal_1;
logic [BIT_WIDTH -1 : 0] signal_2;
endmodule
https://www.edaplayground.com/x/2fKU
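For instance, in a hypothetical instantiation that overrides the parameter, BIT_WIDTH tracks the override automatically while remaining impossible to override itself:
module top;
  // BYTE_WIDTH overridden to 8, so BIT_WIDTH becomes 64 inside this instance
  sample #(.BYTE_WIDTH(8)) u_sample ();
endmodule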

How to eliminate dead code in Dymola/Modelica

I am trying to slim down a very complex model to improve performance, and noticed big performance changes when I add or remove variables to the signal bus, especially multi-body frames.
I am wondering if there is any setting that can eliminate code that isn't involved in generating outputs from the model.
I tried setting the bus connector to "protected" to ensure it doesn't become an output, but the code to calculate those variables is still being generated.
I also tried these flags, but they don't eliminate the dead code:
Advanced.Embedded.OptimizeForOutputs=true;
Advanced.SubstituteVariablesUsedOnce=true;
Evaluate=true;
Advanced.EvaluateAlsoTop=true;
This is a simple model to replicate the scenario:
model TestBusConnector
extends Modelica.Icons.Example;
protected
Modelica.Blocks.Examples.BusUsage_Utilities.Interfaces.ControlBus controlBus
annotation (Placement(transformation(extent={{-20,-20},{20,20}})));
public
Modelica.Blocks.Sources.Sine sine(freqHz=1)
annotation (Placement(transformation(extent={{-40,-50},{-20,-30}})));
Modelica.Blocks.Sources.Constant const(k=0)
annotation (Placement(transformation(extent={{-10,50},{10,70}})));
Modelica.Blocks.Interfaces.RealOutput y
annotation (Placement(transformation(extent={{90,-10},{110,10}})));
equation
connect(y, const.y) annotation (Line(points={{100,0},{60,0},{60,60},{11,60}}, color={0,0,127}));
connect(sine.y, controlBus.testBusVariable)
annotation (Line(points={{-19,-40},{0,-40},{0,0}}, color={0,0,127}));
annotation (experiment(__Dymola_fixedstepsize=0.001, __Dymola_Algorithm="Euler"),
__Dymola_experimentFlags(Advanced(
InlineMethod=0,
InlineOrder=2,
InlineFixedStep=0.001)),
__Dymola_experimentSetupOutput(
states=false,
derivatives=false,
inputs=false,
outputs=false,
auxiliaries=false,
equidistant=false,
events=false));
end TestBusConnector;
Code generated from Dymola 2019 FD01 is shown below:
#include <dsblock6.c>
PreNonAliasNew(0)
StartNonAlias(0)
DeclareVariable("sine.amplitude", "Amplitude of sine wave", 1, 0.0,0.0,0.0,0,513)
DeclareVariable("sine.freqHz", "Frequency of sine wave [Hz]", 1, 0.0,0.0,0.0,0,513)
DeclareVariable("sine.phase", "Phase of sine wave [rad|deg]", 0, 0.0,0.0,0.0,0,513)
DeclareVariable("sine.offset", "Offset of output signal", 0, 0.0,0.0,0.0,0,513)
DeclareVariable("sine.startTime", "Output = offset for time < startTime [s]", 0,\
0.0,0.0,0.0,0,513)
DeclareVariable("sine.y", "Connector of Real output signal", 0.0, 0.0,0.0,0.0,0,512)
DeclareVariable("const.k", "Constant output value", 0, 0.0,0.0,0.0,0,513)
DeclareVariable("const.y", "Connector of Real output signal", 0, 0.0,0.0,0.0,0,513)
DeclareOutput("y", "", 0, 0.0, 0.0,0.0,0.0,0,513)
DeclareAlias2("controlBus.testBusVariable", "Connector of Real output signal", \
"sine.y", 1, 5, 5, 1028)
EndNonAlias(0)
#define DymolaHaveUpdateInitVars 1
#include <dsblock5.c>
DYMOLA_STATIC void UpdateInitVars(double*time, double* X_, double* XD_, double* U_, double* DP_, int IP_[], Dymola_bool LP_[], double* F_, double* Y_, double* W_, double QZ_[], double duser_[], int iuser_[], void*cuser_[],struct DYNInstanceData*did_,int initialCall) {
}
StartDataBlock
EndDataBlock
The translated modelica code (dsmodel.mof) still has the calculation for the sine block.
// Translated Modelica model generated by Dymola from Modelica model
// TEMP.TEST.TestBusConnector
// -----------------------------------------------------------------------------
// Initial Section
sine.amplitude := 1;
sine.freqHz := 1;
sine.phase := 0;
sine.offset := 0;
sine.startTime := 0;
const.k := 0;
const.y := 0;
y := 0.0;
// -----------------------------------------------------------------------------
// Conditionally Accepted Section
sine.y := (if time < 0 then 0 else sin(6.283185307179586*time));
// -----------------------------------------------------------------------------
// Eliminated alias variables
// To have eliminated alias variables listed, set
// Advanced.OutputModelicaCodeWithAliasVariables = true
// before translation. May give much output.
Ideally, I would like the model to translate to:
y := 0.0;
The reason the other answers don't work is that your model is not consistent with your question:
"I am wondering if there is any setting that can eliminate code that isn't involved in generating outputs from the model."
By connecting the control-bus to sine.y you implicitly create an output, and thus sine.y is involved in generating outputs from the model.
That can be avoided in one of the following ways (a minimal sketch of the first option follows the list):
Remove the connection between sine.y and controlBus
Change controlBus to be protected
Change so that controlBus isn't at the top-level
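For illustration, here is a hypothetical variant of the example model with the first option applied; the model name is made up, and the expectation that the sine equations are then dropped assumes Dymola's usual dead-code removal once nothing drives an output:
model TestBusConnectorNoBusConnection
  "Hypothetical variant of TestBusConnector with the sine-to-bus connection removed"
  extends Modelica.Icons.Example;
protected
  Modelica.Blocks.Examples.BusUsage_Utilities.Interfaces.ControlBus controlBus;
public
  Modelica.Blocks.Sources.Sine sine(freqHz=1);
  Modelica.Blocks.Sources.Constant const(k=0);
  Modelica.Blocks.Interfaces.RealOutput y;
equation
  connect(y, const.y);
  // The connect(sine.y, controlBus.testBusVariable) statement is gone:
  // sine.y no longer feeds an output of the model, so the translator is
  // free to drop the sine equations.
end TestBusConnectorNoBusConnection;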
It's not a direct answer to your question, but it could still help to improve performance. Part of the computational effort you are trying to avoid is generated by computing variables that are only stored in the result file. This can be avoided with the settings below, which can be set as an annotation in the model itself:
annotation (__Dymola_experimentSetupOutput(
states=false,
derivatives=false,
inputs=false,
auxiliaries=false));
There is another flag which could help. It does not give the result you expected, but it might still be useful:
Advanced.Define.AutoRemoveAuxiliaries = true;
The Dymola User Manual 2 describes the flag as follows:
Removes code for auxiliary variables that neither influences the
simulation state nor the outputs. This improves performance a bit.
From this description my expectation was that the code would be generated as you asked for, but unfortunately that is not the case.

CAPL - Converting 4 raw bytes into floating point

CAPL - Vector.
I receive message ID 0x110 which holds current information:
0x3E6978D5 -> 0.228
Currently I can read the data and save it into an Environment Variable to show it in a Panel using:
putValue(slow_current, this.long(4));
But I don't know how to convert the 4 hex bytes into a float variable, since I cannot use addresses or casts (float* x = (float *)&vBuffer;).
How can I make this conversion in a CAPL script? Thanks.
Typically your DBC file shall contain conversion info from the raw value (in your case a 4-byte long) to the physical value, in the form of a factor and offset definition.
So your physical value of current shall be calculated as follows:
phys_val = (raw_value * factor) + offset
Note: if you define a negative offset then you are actually subtracting it in the equation above.
But it seems you don't have a DBC file, so you need to figure out the factor and offset yourself (if you have 2 example raw values and know their physical equivalents, then it is as easy as finding linear equation parameters -> y = ax + b).
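For example, with hypothetical numbers: if raw value 1000 corresponds to 0 A and raw value 3000 corresponds to 2 A, then factor = (2 - 0) / (3000 - 1000) = 0.001 and offset = 0 - 0.001 * 1000 = -1.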
CAPL shall look like this:
variables
{
  float current_phys;
  /* adjust below values to your needs */
  float factor = 0.001;
  long offset = -1000;
}
on message 0x110
{
  current_phys = (this.long(4) * factor) + offset;
  write("current_phys = %f", current_phys);
}
An alternate solution, if you don't want to transform the value yourself:
You define a sysvar of type float (double) and use that sysvar in the panel (link to it) instead of the envVar, or you change the type of the envVar to float (double). The translation into float will then be done automatically.
Caveat: usually this trick requires that the input value is also 8 bytes, matching the 8-byte CAPL float type, but you have that thanks to the message payload length constraint of 8 bytes.
It does not look good, but it works:
received msg: 0x3E6978D5
putValue(float4byte, interpretAsFloat(this.long(4)));
float4byte = 0.23
I just reused Vinícius Oliveira's solution to avoid creating an environment variable. It worked:
float floatvalue;
floatvalue = interpretAsFloat(HexValue);
input (HexValue) = 0x3fe20e3a
output (floatvalue) = 1.76606

Parameterizable cross length

I've got a protocol that supports bursts, where each transaction is composed of N individual transfers. Each transfer has a unique index from 0 to N-1. I would like to cover that all transfer orders have been seen (i.e. 0 1 2 3, 1 2 3 0, 0 2 3 1, etc.). The value of N is variable (though in my case I only care about 4 and 8 for now).
The naive approach would be to cover each index individually and cross them, but this would mean I would need multiple covergroups, one per each value of N.
For N = 4:
covergroup some_cg;
coverpoint idxs[0];
coverpoint idxs[1];
coverpoint idxs[2];
coverpoint idxs[3];
cross idxs[0], idxs[1], idxs[2], idxs[3] {
ignore_bins non_unique_idxs[] = some_func_that_computes_bins_for_4_idxs();
}
endgroup
For N = 8:
covergroup some_cg;
coverpoint idx[0];
coverpoint idx[1];
coverpoint idx[2];
coverpoint idx[3];
coverpoint idx[4];
coverpoint idx[5];
coverpoint idx[6];
coverpoint idx[7];
cross idx[0], idx[1], idx[2], idx[3], idx[4], idx[5], idx[6], idx[7] {
ignore_bins non_unique_idxs[] = some_func_that_computes_bins_for_8_idxs();
}
endgroup
The two functions that generate the ignore bins each have to return different types (queue of struct with 4/8 fields), even though conceptually the operation for computing all illegal combinations is similar, regardless of the value of N. This could probably be solved with some clever use of the streaming operator, to stream the contents of the structs into arrays of N elements. The issue of redundancy in the covergroup definitions remains though. The only way I can think of solving this is by generating the code.
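As a rough sketch of that streaming idea (the type and function names below are hypothetical, not part of the original code), a packed struct of indexes can be streamed into a queue so the uniqueness check is written once, independent of N:
typedef struct packed {
  bit [1:0] idx0, idx1, idx2, idx3;  // first member is most significant
} idx4_t;

function automatic bit is_unique_order(idx4_t s);
  bit [1:0] q[$];
  bit [1:0] uniq[$];
  {>>{q}} = s;                       // q[0] == s.idx0, q[1] == s.idx1, ...
  uniq = q.unique();                 // keep only distinct values
  return (uniq.size() == q.size());  // 1 if all indexes differ
endfunction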
Another idea would be to pack all indexes into a packed array of the appropriate size and cover that:
covergroup some_other_cg;
coverpoint packed_idxs {
ignore_bins non_unique_idxs[] = some_func_that_computes_bins_for_N_idxs();
}
endgroup
// Probably won't compile, but it would also be possible to use the streaming op
// Idea is to pack into integral type
foreach (idxs[i])
packed_idxs[i * idx_len +: idx_len] = idxs[i];
It's a pain to debug coverage holes, as it's difficult to figure out what transfer order a certain packed value belongs to, especially if the values are shown in decimal radix. I believe the way values are displayed varies from tool to tool and it's not possible to control this. I also don't see any possibility to give names to individual bins using strings.
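A small helper along these lines (hypothetical, assuming the LSB-first packing of the loop above) can at least translate a packed bin value back into a transfer order while chasing holes:
function automatic string unpack_idxs(longint unsigned packed_val,
                                      int unsigned n,
                                      int unsigned idx_len);
  string s = "";
  for (int unsigned i = 0; i < n; i++) begin
    longint unsigned idx = (packed_val >> (i * idx_len)) & ((64'd1 << idx_len) - 1);
    s = {s, $sformatf("%0d ", idx)};
  end
  return s;  // e.g. "3 0 1 2" means the transfer with index 3 came first
endfunction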
I would welcome any input that would improve upon any of the two suggestions. The goal is to have one coverage wrapper class with a parameter for the number of transfers and to be able to instantiate that and get the coverage:
class transfer_orders_coverage #(int unsigned NUM_TRANSFERS);
// ...
endclass
Adding this as an 'Answer' since it was too long for a comment. Pardon me if I am not clear.
Can you add some logic before sampling the covergroup and use an input argument that denotes the array position of idxs? For the crosses, you can maintain a packed array of size N and make individual bits 1 when a particular pattern is detected. At the end of the sim, you can sample the coverage for that pattern in a different covergroup.
Basically the idea is to offload logic from inside the covergroups and add logic around the sample function. Here is a rough idea of what I was thinking.
class transfer_orders_coverage #(int unsigned N = 4);
int idxs[N];
bit [(N -1) : 0] pattern; // Make indexes HIGH according to sampled pattern
// ...
covergroup cross_cg;
mycp_cross: coverpoint pattern{
ignore_bins myignbn = {some_generic_function_for_N(pattern)};
}
endgroup
covergroup indiv_cg with function sample (int index);
mycp_indiv: coverpoint idxs[index]{
// some bins to be covered in ith position of idxs
}
endgroup
function new();
cross_cg = new;
indiv_cg = new;
endfunction
function bit [(N -1) : 0] some_generic_function_for_N(bit [(N -1) : 0] pattern);
// check which bits in "pattern" are to be covered and which are to be ignored
//return some_modified_pattern;
endfunction
function void start();
// Any logic for sampling ith position
foreach(idxs[i]) begin
indiv_cg.sample(i);
pattern[i] = 1'b1;
end
cross_cg.sample();
endfunction
endclass
module top();
transfer_orders_coverage #(4) tr;
initial begin
tr = new;
tr.start();
end
endmodule
Let me know if it seems feasible or not.
I guess, the following solution may work in this case.
covergroup some_cg (int num);
cp_4n : coverpoint ({idxs[0], idxs[1], idxs[2], idxs[3]}) iff (num == 4)
{
option.weight = (num == 4) ? 1 : 0; // Weight might change depending on other coverpoints
ignore_bins non_unique_index[] = <Your Function>;
}
cp_8n : coverpoint ({idxs[0], idxs[1], idxs[2], idxs[3], idxs[4], idxs[5], idxs[6], idxs[7]}) iff (num == 8)
{
option.weight = (num == 8) ? 1 : 0; // Weight might change depending on other coverpoints
ignore_bins non_unique_index[] = <Your Function>;
}
endgroup
// Where you instantiate covergroups
some_cg c1 = new(4);
Let me know how the above idea works.

Convert rnorm output of NumericVector with length of 1 to a double?

In the following code I am trying to generate a NumericVector of values from a normal distribution, where rnorm() is called each time with a different mean and variance.
Here is the code:
// [[Rcpp::export]]
NumericVector generate_ai(NumericVector log_var) {
int log_var_length = log_var.size();
NumericVector temp(log_var_length);
for(int i = 0; i < log_var_length; i++) {
temp[i] = rnorm(1, -0.5 * log_var[i], sqrt(log_var[i]));
}
return(temp);
}
The line that is giving me trouble is this one:
temp[i] = rnorm(1, -0.5 * log_var[i], sqrt(log_var[i]));
It is causing the error:
assigning to 'typename storage_type<14>::type' (aka 'double') from
incompatible type 'NumericVector' (aka 'Vector<14>')
Since I'm returning one number from rnorm, is there a way to convert this NumericVector return type to a double?
Rcpp provides two methods to access RNG sampling schemes. The first option is a single draw and the second enables n draws using some sweet sweet Rcpp sugar. Under your current setup, you are opting for the latter.
Option 1. Use just the scalar sampling scheme instead of sugar by accessing the RNG function through R::, e.g.
temp[i] = R::rnorm(-0.5 * log_var[i], sqrt(log_var[i]));
Option 2. Use the subset operator on the NumericVector to obtain the only element.
// C++ indices start at 0 instead of 1
temp[i] = Rcpp::rnorm(1, -0.5 * log_var[i], sqrt(log_var[i]))[0];
The former option will be faster and better. Why, you might ask?
Well, Option 2 creates a new NumericVector, fills it with a call to Option 1, and then requires a subset operation to retrieve the value before assigning it to the desired scalar.
In any case, RNG can be a bit confusing. Just make sure to always prefix the function call with the correct namespace (e.g. R:: or Rcpp::) so that you and perhaps future programmers avoid any ambiguity as to what kind of sampling scheme you've opted for.
(This is one of the downsides of using namespace Rcpp;)
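For completeness, here is a sketch of the full function with Option 1 applied, assuming the same signature as in the question:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector generate_ai(NumericVector log_var) {
  int log_var_length = log_var.size();
  NumericVector temp(log_var_length);
  for (int i = 0; i < log_var_length; i++) {
    // R::rnorm(mean, sd) returns a single double, so it assigns directly
    temp[i] = R::rnorm(-0.5 * log_var[i], sqrt(log_var[i]));
  }
  return temp;
}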

How to do bitwise operation decently?

I'm doing analysis on binary data. Suppose I have two uint8 data values:
a = uint8(0xAB);
b = uint8(0xCD);
I want to take the lower two bits from a, and the whole content of b, to make a 10-bit value. In C style, it should be like:
(a[2:1] << 8) | b
I tried bitget:
bitget(a,2:-1:1)
But this just gave me separate [1, 1] logical type values, which is not a scalar, and cannot be used in the bitshift operation later.
My current solution is:
Make a|b (a or b):
temp1 = bitor(bitshift(uint16(a), 8), uint16(b));
Left shift six bits to get rid of the higher six bits from a:
temp2 = bitshift(temp1, 6);
Right shift six bits to get rid of lower zeros from the previous result:
temp3 = bitshift(temp2, -6);
Putting all these on one line:
result = bitshift(bitshift(bitor(bitshift(uint16(a), 8), uint16(b)), 6), -6);
This doesn't seem efficient, right? I only want to get (a[2:1] << 8) | b, and it takes a long expression to get the value.
Please let me know if there's a well-known solution for this problem.
Since you are using Octave, you can make use of bitpack and bitunpack:
octave> A = bitunpack (uint8 (0xAB))
A =
1 1 0 1 0 1 0 1
octave> B = bitunpack (uint8 (0xCD))
B =
1 0 1 1 0 0 1 1
Once you have them in this form, it's dead easy to do what you want:
octave> [B A(1:2)]
ans =
1 0 1 1 0 0 1 1 1 1
Then simply pad with zeros accordingly and pack it back into an integer:
octave> postpad ([B A(1:2)], 16, false)
ans =
1 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0
octave> bitpack (ans, "uint16")
ans = 973
That OR is equivalent to an addition here, because the shifted bits from a and the bits of b do not overlap:
result = bitshift(bi2de(double(bitget(a,1:2))), 8) + double(b); % double() avoids uint8 saturation in the addition
e.g
a = 01010111
b = 10010010
result = 00000011 10010010
= a[2]*2^9 + a[1]*2^8 + b
an alternative method could be
result = mod(a,2^x)*2^y + b;
where x is the number of bits you want to extract from a and y is the number of bits in b; in your case:
result = mod(uint16(a), 4) * 256 + uint16(b); % uint16 casts avoid uint8 saturation
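For the question's values, mod(0xAB, 4) = 3, so the result is 3*256 + 0xCD = 768 + 205 = 973, matching the bitpack result above.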
an extra alternative solution close to the C solution:
result = bitor(bitshift(bitand(uint16(a), 3), 8), uint16(b)); % uint16 so the shifted bits are not dropped
I think it is important to explain exactly what "(a[2:1] << 8) | b" is doing.
In assembly, referencing individual bits is a single operation. But assume all operations take exactly the same time, and the 'efficient' a[2:1] starts looking extremely inefficient.
The convenient bit-slice notation actually does (a & 0x03).
If your compiler actually converts a uint8 to a uint16 based on how much it was shifted, this is not a 'free' operation, per se. Effectively, what your compiler will do is first clear the "memory" (a register) to the size of a uint16 and then copy "a" into that location. This requires an extra step (clearing the register) that wouldn't normally be needed.
This means your statement actually is (uint16(a & 0x03) << 8) | uint16(b).
Now yes, because you're doing a power-of-two shift, you could just move a into AH, move b into AL, AND AH with 0x03, and move it all out, but that's a compiler optimization and not what your C code said to do.
The point is that directly translating that statement into matlab yields
bitor(bitshift(uint16(bitand(a,3)),8),uint16(b))
But, it should be noted that while it is not as TERSE as (a[2:1] << 8) | b, the number of "high level operations" is the same.
Note that all scripting languages are going to be very slow upon initiating each instruction, but will complete said instruction rapidly. The terse nature of Python isn't because "terse is better" but to create simple structures that the language can recognize so it can easily go into vectorized operations mode and start executing code very quickly.
The point here is that you have an "overhead" cost for calling bitand; but when operating on an array it will use SSE and that "overhead" is only paid once. The JIT (just in time) compiler, which optimizes script languages by reducing overhead calls and creating temporary machine code for currently executing sections of code MAY be able to recognize that the type checks for a chain of bitwise operations need only occur on the initial inputs, hence further reducing runtime.
Very high level languages are quite different (and frustrating) from high level languages such as C. You are giving up a large amount of control over code execution for ease of code production; whether matlab actually has implemented uint8 or if it is actually using a double and truncating it, you do not know. A bitwise operation on a native uint8 is extremely fast, but to convert from float to uint8, perform bitwise operation, and convert back is slow. (Historically, Matlab used doubles for everything and only rounded according to what 'type' you specified)
Even now, Octave 4.0.3 has a compiled bitshift function that, for bitshift(ones('uint32'),-32), results in it wrapping back to 1. BRILLIANT! VHLLs place you at the mercy of the language; it isn't about how terse or how verbose you write the code, it's how the blasted language decides to interpret it and execute machine-level code. So instead of shifting, uint32(floor(ones / (2^32))) is actually FASTER and more accurate.