How to generate code for in-register named local values in LLVM IR? - code-generation

I am generating LLVM IR (.ll files) from a source language. This language doesn't have any mutable local variables, and I don't use any allocas yet; everything so far is in LLVM registers. It does have immutable local values, though. Currently, they work fine unless the initializer is a constant or another identifier. For example:
def fun(a: Int, b: Int) = {
  val n = a + b
  n + 2
}
This compiles fine, because a + b compiles to the instruction add i32 %a, %b and instructions can be optionally assigned to local values, so the line becomes: %n = add i32 %a, %b.
On the other hand, I have trouble generating code for the following:
def fun() = {
  val n = 1
  n
}
I could generate %n = bitcast i32 1 to i32 but bitcast doesn't work with all types and is not really intended for this. Well, I guess in LLVM there is really nothing specifically intended for this, otherwise I wouldn't have the question.
But is there a good solution without generating tons of different no-op instructions depending on the type of the value? bitcast will not work with tuples for example:
error: invalid cast opcode for cast from '{ i32, i32 }' to '{ i32, i32 }'
%n = bitcast {i32, i32} {i32 1, i32 2} to {i32, i32}
Then again, maybe because there are no 'copy' instructions in the IR, I shouldn't be trying to do this and should be replacing %n with the value everywhere it is used?

You have two possibilities:
Generate the code using allocas, loads and stores (check e.g. clang's or llvm-gcc's output at -O0), then run the -mem2reg optimization pass to promote all of it to LLVM registers
Use 1 instead of %n everywhere.
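A minimal sketch of the first option applied to the `val n = 1` example (register names are illustrative):

```llvm
define i32 @fun() {
entry:
  ; stack slot for the local value
  %n.addr = alloca i32
  store i32 1, i32* %n.addr
  %n = load i32, i32* %n.addr
  ret i32 %n
}
```

Running the mem2reg pass (e.g. `opt -passes=mem2reg`) folds the store/load pair away and substitutes the constant directly, which is effectively the second option done for you.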

Related

What is the difference between llvm::CallBase::getArgOperand() and llvm::UnaryInstruction::getOperand()?

I'm learning the LLVM core library. I know that getArgOperand(i) returns the i-th operand of a call instruction as an llvm::Value*. But what is the purpose of getOperand()? In the following code, the two functions seem to have similar functionality (I'm not sure). I can't find a detailed explanation in the official docs:
https://llvm.org/doxygen/classllvm_1_1CallBase.html#ab2caa29167597390ab2fc3cf30d70389
https://llvm.org/doxygen/classllvm_1_1UnaryInstruction.html#a71927e1ef55b2829d11000662d80c60b
// This is used for extracting the first argument of cudaMalloc(void**, size_t)
Value *MemAllocInfo::getTarget() {
  Value *ans = Alloc->getArgOperand(0);
  if (isa<LoadInst>(ans)) ans = dyn_cast<LoadInst>(ans)->getOperand(0);
  if (isa<BitCastInst>(ans)) ans = dyn_cast<BitCastInst>(ans)->getOperand(0);
  return ans;
}
My Questions are:
Is the functionality of getArgOperand() and getOperand() equivalent?
And, if possible, what is the purpose of the two if statements?
Update1:
I understand what you mean, @Nick Lewycky. But I wrote a function pass and demo.ll as follows.
llvm::PreservedAnalyses CountIRPass::run(
    llvm::Function &F, llvm::FunctionAnalysisManager &AM) {
  for (llvm::BasicBlock &BB : F) {
    for (llvm::Instruction &I : BB) {
      if (llvm::isa<llvm::CallInst>(I)) {
        llvm::CallInst &ci = llvm::cast<llvm::CallInst>(I);
        llvm::errs() << ci.getNumOperands() << '\n';
        llvm::errs() << ci.getArgOperand(0) << '\n';
        llvm::errs() << ci.getOperand(0) << '\n';
        llvm::errs() << *(ci.getArgOperand(0)) << '\n';
      }
    }
  }
  return llvm::PreservedAnalyses::all();
}
define internal i32 @special_func(i32 %a) {
  ret i32 0
}
define dso_local i32 @main(i32 %a) {
  %b = call i32 @special_func(i32 %a)
  ret i32 %a
}
The output is:
2
0x55be3770bf20
0x55be3770bf20
i32 %a
The output shows that the results of getArgOperand(0) and getOperand(0) are identical, namely i32 %a. My LLVM version is 15.0.0. Could this be a version issue?
The first operand to a call or invoke instruction is the callee, then comes the arguments. getArgOperand(0) returns the first argument to the call, while getOperand(0) would return the callee.
The function you quoted then checks whether the first argument to the function call is a load and if so switches out ans for the pointer that was being loaded. Then if ans is a bitcast, it peeks through the bitcast to replace ans with the casted value. Why it does this is not clear from the context (what about multiple bitcasts?), but that's what it does.
Update1: I think it's a version issue. It looks like right now all getArgOperand(i) does is return getOperand(i); the callee is now stored as the last operand, so getCalledOperand() effectively does getOperand(-1) from the end! Regardless, this is why getArgOperand() was originally added.

What is the difference between constant variable which is type of list and constant list

This is a basic question, but can't find elsewhere.
In dart we can declare a constant variable as
const foo = [1,2,3];
or
var foo = const [1,2,3];
Is there any performance-related difference between the two?
When you do
const foo = [1, 2, 3];
it means that foo will always be equal to [1, 2, 3], independently of the previously executed code, and cannot be reassigned later.
When you do
var foo = const [1, 2, 3];
it means that you are declaring a variable foo (and not a constant) which, at this moment, equals the constant [1, 2, 3] (not dependent on the previously executed code). But foo itself can change, and you could later do:
foo = const [1, 2];
which is legit since foo is a variable. You couldn't do that if foo were a constant (since it is constant).
Therefore, it is better when you can to write
const foo = [1, 2, 3];
because it indicates to the compiler that foo will never change its value.

If constants are called literals and literals are data represented directly in the code, how can constants be considered as literals?
The article from which you drew the quote is defining the word "constant" to be a synonym of "literal". The latter is the C++ standard's term for what it is describing. The former is what the C standard uses for the same concept.
I mean variables preceded with the const keyword are constants, but they are not literals, so how can you say that constants are literals?
And there you are providing an alternative definition of the term "constant" which, you are right, is inconsistent with the other. That's all. TutorialsPoint is using a different definition of the term than the one you are used to.
In truth, although the noun usage of "constant" appears in a couple of places in the C++ standard outside the defined term "null pointer constant", apparently with the meaning you propose here, I do not find an actual definition of that term, and especially not one matching yours. If anything, your definition is less plausible than TutorialsPoint's, because an expression having const-qualified type can nevertheless designate an object that is modifiable (via a different expression).
In
const int MEANING = 42;
the name MEANING is a constant and 42 is a literal. There is no real relationship between the two terms, as can be seen here:
int n = 42;
where n is not a constant, but 42 is still a literal.
The major difference is that a constant may have an address in memory (if you write some code that needs such an address), whereas a literal never has an address.

Why are macros based on abstract syntax trees better than macros based on string preprocessing?

I am beginning my journey of learning Rust. I came across this line in Rust by Example:
However, unlike macros in C and other languages, Rust macros are expanded into abstract syntax trees, rather than string preprocessing, so you don't get unexpected precedence bugs.
Why is an abstract syntax tree better than string preprocessing?
If you have this in C:
#define X(A,B) A+B
int r = X(1,2) * 3;
The value of r will be 7, because the preprocessor expands it to 1+2 * 3, which is 1+(2*3).
In Rust, you would have:
macro_rules! X { ($a:expr,$b:expr) => { $a+$b } }
let r = X!(1,2) * 3;
This will evaluate to 9, because the compiler will interpret the expansion as (1+2)*3. This is because the compiler knows that the result of the macro is supposed to be a complete, self-contained expression.
That said, the C macro could also be defined like so:
#define X(A,B) ((A)+(B))
This would avoid any non-obvious evaluation problems, including the arguments themselves being reinterpreted due to context. However, when you're using a macro, you can never be sure whether or not the macro has correctly accounted for every possible way it could be used, so it's hard to tell what any given macro expansion will do.
By using AST nodes instead of text, Rust ensures this ambiguity can't happen.
A classic example using the C preprocessor is
#define MUL(a, b) a * b
// ...
int res = MUL(x + y, 5);
The use of the macro will expand to
int res = x + y * 5;
which is very far from the expected
int res = (x + y) * 5;
This happens because the C preprocessor really just does simple text-based substitutions, it's not really an integral part of the language itself. Preprocessing and parsing are two separate steps.
If the preprocessor instead parsed the macro like the rest of the compiler, which happens for languages where macros are part of the actual language syntax, this is no longer a problem as things like precedence (as mentioned) and associativity are taken into account.

Chisel UInt negative value error

I have recently started working in Scala, and am required to create an implementation of MD5. It is my understanding that MD5 requires unsigned types, which Scala does not have. Since I will soon begin using Chisel, which does have unsigned types, I decided to use its library. Everything appears fine so far, except that when doing the bitwise operations below, my F value becomes -271733879, which causes the error "Caused by: java.lang.IllegalArgumentException: requirement failed: UInt literal -271733879 is negative", since UInts can't be negative.
if (i < 16) {
  F = ((B & C) | ((~B) & D))
  g = i
}
There is more to the error message, but it is just the stack trace through the various libraries and classes that failed because of this error, so I did not post it because I didn't think it was important. If it matters, I can edit this and post it all.
My B, C, and D values are equal to the lower-case equivalents listed below; since it is the first time through the for loop, they have not yet been updated.
var a0 : UInt = UInt(0x67452301)
var b0 : UInt = UInt(0xefcdab89)
var c0 : UInt = UInt(0x98badcfe)
var d0 : UInt = UInt(0x10325476)
Any Help would be greatly appreciated.
For the sake of my answer, I am using the Chisel 3 preferred 123.U style for specifying literals rather than the Chisel 2 UInt(123) style, but this answer works for either.
There are several ways you could do this:
Use Scala Long (put L at end of literal)
val myUInt = 0x98badcfeL.U
This obviously won't work for values larger than 64 bits
Use Scala BigInt
val myUInt = BigInt("98badcfe", 16).U
Use Chisel's shorthand for constructing BigInts from Strings
val myUInt = "x98badcfe".U
Prefixes: hex = x or h, dec = d, oct = o, bin = b

How does dereference work C++

I have trouble understanding what happens when calling &*pointer
int j=8;
int* p = &j;
When I print in my compiler I get the following
j = 8, &j = 00EBFEAC
p = 00EBFEAC, *p = 8, &p = 00EBFEA0
&*p = 00EBFEAC
cout << &*p gives &*p = 00EBFEAC which is p itself
& and * have the same operator precedence. I thought &*p would translate to &(*p) → &(8) and expected a compiler error.
How does compiler deduce this result?
You are stumbling over something interesting: variables, strictly speaking, are not values but refer to values. 8 is an integer value. After int i = 8, i refers to an integer value. The difference is that i could later refer to a different value.
In order to obtain the value, i must be dereferenced, i.e. the value stored in the memory location which i stands for must be obtained. This dereferencing is performed implicitly in C whenever a value of the type which the variable references is requested: i=8; printf("%d", i) results in the same output as printf("%d", 8). That is funny because variables are essentially aliases for addresses, while numeric literals are aliases for immediate values. In C these very different things are syntactically treated identically. A variable can stand in for a literal in an expression and will be automatically dereferenced. The resulting machine code makes that very clear. Consider the two functions below. Both have the same return type, int. But f has a variable in the return statement which must be dereferenced so that its value can be returned (in this case, it is returned in a register):
int i = 1;
int g(){ return 1; } // literal
int f(){ return i; } // variable
If we ignore the housekeeping code, each function translates into a single machine instruction. The corresponding assembler (from icc) for g is:
movl $1, %eax #5.17
That's pretty straightforward: put 1 in the register eax.
By contrast, f translates to
movl i(%rip), %eax #4.17
This puts the value at the address in register rip plus offset i in the register eax. It's refreshing to see how a variable name is just an address (offset) alias to the compiler.
The necessary dereferencing should now be obvious. It would be more logical to write return *i in order to return 1, and write return i only for functions which return references — or pointers.
In your example it is indeed illogical to a degree that
int j=8;
int* p = &j;
printf("%d\n", *p);
prints 8 (i.e, p is actually dereferenced twice); but that &(*p) yields the address of the object pointed to by p (which is the address value stored in p), and is not interpreted as &(8). The reason is that in the context of the address operator a variable (or, in this case, the L-value obtained by dereferencing p) is not implicitly dereferenced the way it is in other contexts.
When the attempt was made to create a logical, orthogonal language — Algol68 —, int i=8 indeed declared an alias for 8. In order to declare a variable the long form would have been refint m = loc int := 3. Consequently what we call a pointer or reference would have had the type ref ref int because actually two dereferences are needed to obtain an integer value.
j is an int with value 8 and is stored in memory at address 00EBFEAC.
&j gives the memory address of variable j (00EBFEAC).
int* p = &j Here you define a variable p which you define being of type int *, namely a value of an address in memory where it can find an int. You assign it &j, namely an address of an int -> which makes sense.
*p gives you the value associated with the address stored in p.
The address stored in p points to an int, so *p gives you the value of that int, namely 8.
&p is the address at which the variable p itself is stored
&*p gives you the address of the value the memory address stored in p points to, which is indeed p again. &(*p) -> &(j) -> 00EBFEAC
Think about &j itself (or even &(j)). According to your logic, shouldn't j evaluate to 8 and result in &8, as well? Dereferencing a pointer or evaluating a variable results in an lvalue, which is a value that you can assign to or take the address of.
The L in "lvalue" refers to the left in "left hand side of the assignment", such as j = 10 or *p = 12. There are also rvalues, such as j + 10, or 8, which obviously cannot be assigned to.
That's just a basic explanation. In C++ there's a lot more to it, with various classes of values (but that thread might be too advanced for your current needs).