Method frame - how is the local variables array created? - function-pointers

My question is most likely platform/compiler/language specific, but I will try to be as generic as possible.
When we call a method, a frame is created for it and we (usually, if not always) allocate some space for the local variables in an array, the return address, and probably some other things depending on the platform we're on. We would also have our PC pointing to the first item of the function's bytecode array. My question arises here...
That bytecode array only includes opcodes and their operands (right?). In that case, when the relevant method is called, the OS/runtime must already know how much space it needs to reserve for the local variable array. I think that information is probably part of the class file that was compiled earlier. So, where is that size information actually stored? Is it part of the method's bytecode array (in addition to the opcodes and operands), or is it kept somewhere else?
To make the question clearer, perhaps this example helps: when I obtain a function pointer/object, is what I'm given the address of the first opcode/instruction of the method, or the address of something that helps me initiate the method frame?
Multiple approaches are more than welcome.
I hope my question is clear.

It's unclear which language/platform you're talking about, but in the case of Java class files, the size of the local variable array is stored in the max_locals field of the Code attribute of each method that has code.
That being said, modern JVMs operate at a higher level of abstraction. They don't just blindly interpret the bytecode - they may analyze and optimize the bytecode, or even compile it into machine code.
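For reference, the Code attribute that carries a method's bytecode starts with a fixed header, and the max_locals field in that header is exactly the number the question asks about. A rough C-style sketch of the layout (u2/u4 stand for unsigned big-endian 2- and 4-byte fields; the variable-length parts are abbreviated):

#include <cstdint>

typedef std::uint16_t u2;
typedef std::uint32_t u4;

// Simplified sketch of the JVM class-file "Code" attribute header.
struct Code_attribute {
    u2 attribute_name_index;  // points at the string "Code" in the constant pool
    u4 attribute_length;
    u2 max_stack;             // operand-stack depth the frame needs
    u2 max_locals;            // size of the local variable array
    u4 code_length;           // length of the bytecode that follows
    // u1 code[code_length];  // the opcodes and operands themselves
    // exception table and nested attributes (LineNumberTable, ...) follow here
};

You can see these numbers for any compiled method with javap -v; they show up as stack=, locals= and args_size= right above each method's bytecode.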

How are CPU Register values updated?

I know this might be a silly question, but I'm really curious about it, since I don't have much knowledge of computer architecture.
Suppose I have a register R1 and I load the value of a variable, say LOCK = 5, into it, so R1 now holds the value 5. Now let's say I update LOCK to 10 some time later: will the register still hold 5, or will it be updated as well?
When it comes to register-based CPU architectures, I think Neo from The Matrix has a valuable lesson to teach: "There are no variables."
Variables, as you use them in a higher-level programming language, are an abstraction for describing to the compiler what operations to perform on a particular piece of data. That data may reside in system memory, or, for temporary values, never leave the register file.
However, once the program has been compiled to a binary, there no longer are variables! For debugging purposes the compiler may annotate the code with information of the kind "at this particular position in the code, what is referred to as variable 'x' right now happens to be held in …".
I think the best way to understand this is to compile some very simple programs and look at the resulting assembly, to see how things fit together. The Godbolt Compiler Explorer is a really valuable tool here.
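As a tiny illustration of "there are no variables" (a minimal sketch in C++; paste it into Compiler Explorer and compile with -O2 to see the assembly):

// The source-level "variable" lock need never exist in memory at all: with
// optimizations on, the compiler is free to keep it in a register or fold it
// away entirely.
int scaled(int x) {
    int lock = 5;     // assigned...
    lock = 10;        // ...then overwritten; the value 5 may never be emitted anywhere
    return lock * x;  // typically compiles down to a single multiply on a register
}

Whether lock ever touches memory is entirely up to the compiler. A register that once held it does not "watch" anything; it simply keeps whatever value was last written to it, until the compiler decides to reuse it for something else.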

Progress 4GL - What's the benefit of placing variable declarations at the top of the procedure?

I've been doing Progress 4GL for 8 years, though it's not my main responsibility; I do C++ and Java a lot more. When programming in other languages, it's suggested to keep declarations close to their usage. With 4GL, however, I see people place the declarations at the top of the file. It's even in the coding standard.
I think placing them at the top of the file leads to a 'vertical separation' problem. In most other languages it's even suggested to do the assignment on the same line as the declaration.
The question is: why is it suggested to do so in 4GL? What's the benefit? I know that it's possible to place a declaration anywhere in the file, as long as it appears before the variable is used.
I think the answer is to do with scoping, or the lack of it, within Progress 4GL.
If you are used to Java, say, and read a Progress 4GL program that looks like
DO:
    DEFINE VARIABLE x AS INTEGER INITIAL 4.
    DISPLAY x.
END.
then you wouldn't expect to be able to use this value of x anywhere else in the program, and you would expect that any changes made in the block wouldn't affect anything outside the block.
As I understand it, all Progress variables declared within the body of a program are scoped to the whole program, unless they are declared within an internal procedure or function, in which case they are scoped to that procedure or function.
(Incidentally, any default [i.e. undeclared] buffers you use within an internal procedure/function are scoped to the whole program, not just the procedure or function, so you need to be very careful to explicitly declare buffers in functions you intend to use recursively.)
I therefore think the convention of declaring variables at the beginning of a program reflects the fact that Progress will treat them as if they had been declared there, regardless of where you put the declaration.
There is absolutely no benefit in scoping anything to the program as a whole when it could be scoped smaller.
Smaller scopes are easier to test, give less possibility of namespace conflict, and less opportunity for error.
Tightly scoped named buffers are especially useful when writing to the database because they eliminate the possibility of some other part of your code ever using the same buffer and causing a share-lock, i.e., this fails to compile:
do for b-customer transaction:
    find b-customer where .... exclusive...
    ...
end.
...
find b-customer...
On the other hand, procedures and functions (and include files...) that share scope with the main body of code are a major source of bugs, because when you pick up your variable or whatever, you can never be entirely certain where it has been...
All of this is just basic Structured Programming, of course. It's true for every language and has been accepted since the '70s.
The "reason" that you usually see variables defined at the top is simple. Habit. That is just how things were done in the bad old days.
A lot of old code, or code written by old fossils, is written that way. No matter the language.
Some languages (COBOL springs to mind) even formalized it.
Is there any advantage to such an approach?
Not especially. I guess you could argue "they are all in one place and easy to find" but that isn't very compelling.
"Habit" is actually more compelling ;) If you are working with a team that expects a certain style or in an application where a particular style is prevalent then you should think twice before unilaterally throwing out a new way of doing things - the confusion could be a bigger problem than the advantages gained.

Object class members as pointers to avoid #include in headers - is it good practice?

This is really a question of precedence: which is more preferred in C++, avoiding pointers or avoiding #includes in header files?
"Don't Use #include in header files."
There seems to be some ambiguity based on my research. In this SO question, the top answer says "...make sure you actually need an include, [don't use one] when a forward declaration or even leaving it out completely will do." (From Header files and include best practice)
And this article explains the negative effect excess header inclusions can have on compile-time: http://blog.knatten.org/2012/11/09/another-reason-to-avoid-includes-in-headers/
As well as this tutorial, stating, "...you should try to put all of your code in the CPP class and only the class declaration in the HPP file.": https://github.com/LaurentGomila/SFML/wiki/Tutorial%3A-Basic-Game-Engine#wiki-declarations
"Don't Use Pointers."
But, there is also evidence that pointers should be avoided most often as well:
c++: when to use pointers?
https://softwareengineering.stackexchange.com/questions/56935/why-are-pointers-not-recommended-when-coding-with-c
Which preference takes precedence?
If my understanding about avoiding #includes in header files is correct, this can easily be done by changing things like class members to pointers so I can use a forward declaration instead, but is this a good idea for class members whose lifetime only lasts as long as the class itself?
It's not really a case of "one or the other". Both statements are true, but you need to understand the reasoning behind them.
tl;dr: Use forward declarations where possible to reduce compile time. Use stack objects or references as much as possible, and pointers only in rare cases.
"Don't Use #include in header files."
This is a rather general statement which, taken as is, would be wrong. The more important part behind it is actually: "Use forward declarations wherever possible." Includes in header files are not bad per se, but they often aren't needed either.
Forward declarations can be used if the included type/class/etc. is only used as a pointer (or reference) in the new type's declaration within the given header. A forward declaration just tells the compiler: "Somewhere along the way you'll find the full definition of type X." The include can even be removed entirely if the type isn't used in the declaration at all. The reason is that the compiler doesn't need to know anything about those types to calculate the memory layout of the new type; a pointer, for example, "always" has the same size. Including the file in the header as well would only waste processing power, since the compiler has to open and parse the file, adding expensive seconds to the compile time. So in most cases you'll do yourself a favor by removing unnecessary includes from header files and using forward declarations instead.
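A minimal sketch of what that looks like in practice (Engine and Widget are made-up names, purely for illustration):

// widget.h -- Engine is only used through a pointer here, so a forward
// declaration is enough; nothing that includes widget.h has to parse engine.h.
class Engine;                 // forward declaration instead of #include "engine.h"

class Widget {
public:
    explicit Widget(Engine& engine);
    void update();
private:
    Engine* engine_;          // pointer member: only the type's name must be known
};

// widget.cpp -- the definitions actually call into Engine, so the full
// #include "engine.h" belongs there, in exactly one translation unit.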
For the sake of completeness: forward declarations are strictly needed if you have circular references (class A depends on class B, which depends on class C, which depends on class A). However, that situation often also reveals bad design and/or old/outdated coding standards, which leads us to the second topic.
"Don't use pointers."
Again, the statement is a tiny bit too general. One might rather want to say: "Don't use raw pointers."
With C++11 and soon C++1y (C++14) the language itself has changed a lot. As many bad C++ books as the world has seen, even more outdated C++ books float around nowadays (here's a good list, however). While in the past we were mostly stuck with raw pointers, new and delete for memory management, we've since evolved to better, more readable, less risky and largely leak-free ways to manage data in memory. One of the magic words is RAII - since you linked something from SFML above, here's a nice demonstration of the power of RAII. I see many people use pointers, new and delete just because, or maybe because they are thinking in Java or C# terms, where objects are instantiated with the new keyword. In C++, however, objects don't need new to be created, and it's mostly preferable to put things on the stack instead of the heap. This works for many, many things, especially when using STL containers, which hide the dynamic memory management in the background. The heap is mostly only preferable if you need the data to be dynamically sized, to outlive the local scope, or if you simply need a lot of it. However, when you do use the heap, make sure to use smart pointers such as std::unique_ptr or std::shared_ptr depending on the use case, and certainly not raw pointers. In modern C++ raw pointers should never own an object anymore. There are cases where it's okay to return a raw pointer merely to reference an object, but there's really no reason in modern C++ to assign the result of new to a raw pointer.
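To make the contrast concrete, here's a small sketch (Texture is a made-up type; the point is ownership and lifetime, not the name):

#include <memory>
#include <vector>

struct Texture { /* ... */ };

void oldStyle() {
    Texture* t = new Texture();  // raw owning pointer: leaks on an early return or exception
    // ... use t ...
    delete t;
}

void modernStyle() {
    Texture local;                             // lifetime is local: just use the stack
    auto owned = std::make_unique<Texture>();  // heap when needed, ownership is explicit
    std::vector<Texture> many(16);             // containers manage dynamic memory for you
}   // everything above is released automatically here (RAII), no delete needed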
Let's get back to the original question though. "Don't use raw pointers" is essentially a design guideline and quite unrelated to the whole header issue. While there might be some cases where you'll have to fall back to pointers because of circular references, the use of forward declarations is otherwise just about compilation time (and maybe cleaner code); it's not essential to the programming itself.
In short: don't switch to raw pointers just to avoid includes in header files. Use forward declarations wherever possible, and when you do need the heap, use smart pointers.

Basic principle of auto complete

How do IDEs such as Eclipse perform auto-completion of code? What is the basic principle behind it?
You know how you have to explicitly attach source code to non-standard libraries you import in Eclipse? When you do that, a text-search index is built over that source, and that is how the IDE knows what to offer in its auto-complete feature. Roughly, I suppose it is something like an associative array where the key is the prefix of the method you typed and the value is the description of that method.
Now, what matters is that this functionality is implemented efficiently with regard to both time and memory consumption. It would be very inefficient to store the same entry for every possible prefix of some method (or even to store every prefix at all!).
One interesting structure that suits this problem well is the trie, which is inherently optimized for prefix search while keeping memory usage acceptable.
Take a look here for a simple example:
http://www.sarathlakshman.com/2011/03/03/implementing-autocomplete-with-trie-data-structure/
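To get a feel for how this works, here is a minimal trie sketch in C++ (it assumes lowercase ASCII names only and is purely illustrative; a real IDE index stores far more than names):

#include <array>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 26> child{};  // one slot per letter a-z
    bool isWord = false;
};

struct Trie {
    TrieNode root;

    void insert(const std::string& word) {
        TrieNode* node = &root;
        for (char c : word) {
            auto& next = node->child[c - 'a'];
            if (!next) next = std::make_unique<TrieNode>();
            node = next.get();
        }
        node->isWord = true;
    }

    // Collect every stored word that starts with the given prefix.
    std::vector<std::string> complete(const std::string& prefix) const {
        const TrieNode* node = &root;
        for (char c : prefix) {
            node = node->child[c - 'a'].get();
            if (!node) return {};                        // nothing starts with this prefix
        }
        std::vector<std::string> out;
        collect(node, prefix, out);
        return out;
    }

    static void collect(const TrieNode* node, const std::string& word,
                        std::vector<std::string>& out) {
        if (node->isWord) out.push_back(word);
        for (int i = 0; i < 26; ++i)
            if (node->child[i])
                collect(node->child[i].get(), word + char('a' + i), out);
    }
};

int main() {
    Trie t;
    for (const char* w : {"getname", "getnode", "getvalue", "setvalue"}) t.insert(w);
    for (const std::string& s : t.complete("get")) std::cout << s << '\n';  // getname, getnode, getvalue
}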
Besides tries, which cover the case where you have already typed the beginning of the name of a method/variable, I think the IDE also uses some sort of type comparison/analysis for the case where you are about to invoke a method and it suggests a local/global variable to pass as an argument to that call.
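That part is less about string search and more about the symbol table: the IDE knows the declared type of every variable in scope and the parameter types of the method being completed, so suggesting arguments is roughly a filter over the visible symbols. A deliberately naive sketch (everything here is invented for illustration; a real IDE also handles subtyping, conversions and ranking):

#include <iostream>
#include <string>
#include <vector>

struct Symbol {
    std::string name;
    std::string type;  // grossly simplified: types are compared by name only
};

// Suggest visible variables whose type matches the expected parameter type.
std::vector<std::string> suggestArguments(const std::vector<Symbol>& visible,
                                          const std::string& parameterType) {
    std::vector<std::string> candidates;
    for (const Symbol& s : visible)
        if (s.type == parameterType)
            candidates.push_back(s.name);
    return candidates;
}

int main() {
    std::vector<Symbol> scope = {{"count", "int"}, {"name", "String"}, {"total", "int"}};
    for (const std::string& c : suggestArguments(scope, "int"))
        std::cout << c << '\n';  // prints: count, total
}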

Do I conserve memory in MATLAB by declaring variables global instead of passing them as arguments?

I am new to MATLAB; it wasn't in the job description and I've been forced to take over for the person who wrote and maintained the code my company uses. Life's tough.
The guy I'm taking over from told me that he declared all the big data vectors as global to save memory - more specifically, so that when one function calls another, no copy of the data is created when it is passed along.
Is this true? I read Strategies for Efficient Use of Memory, and it says that
When working with large data sets, be aware that MATLAB makes a temporary copy of an input variable if the called function modifies its value. This temporarily doubles the memory required to store the array, which causes MATLAB to generate an error if sufficient memory is not available.
It says something very similar in Memory Allocation For Array #Function Arguments:
When you pass a variable to a function, you are actually passing a reference to the data that the variable represents. As long as the input data is not modified by the function being called, the variable in the calling function and the variable in the called function point to the same location in memory. If the called function modifies the value of the input data, then MATLAB makes a copy of the original array in a new location in memory, updates that copy with the modified value, and points the input variable in the called function to this new array.
So is it true that using global can be better? It seems a little sloppy to blithely declare all the large data as global, instead of making sure that none of the code modifies its input argument. Am I wrong? Does this really improve RAM usage?
In my experience, provided that none of the code modifies the large data, memory usage is the same, regardless of whether you use a global variable or an input argument, just like the Matlab docs say. Further information is in this blog post by a MathWorks employee.
There is quite a bit of folklore about performance issues in Matlab, and not all of it is right; the internals of Matlab have changed quite a bit. It may be that in a previous version it was better to use a global variable.
This answer may be somewhat tangential, but an additional topic that bears mention here is the use of nested functions to manage memory.
As has already been established in other answers, there is no need for global variables if the data you are passing to the function is not modified (since it will be passed by reference). If it is modified (and is thus passed by value), using a global variable instead will save you memory. However, global variables can be somewhat "uncouth" for the following reasons:
You have to make a declaration like global varName everywhere you need them.
It can be conceptually a little messy trying to keep track of when and how they are modified, especially if they are spread across multiple m-files.
The user can easily break your code with an ill-placed clear global, which clears all global variables.
An alternative to global variables was mentioned in the first set of documentation you cited: nested functions. Immediately following the quote you cited is a code example (which I've formatted slightly differently here):
function myfun
    A = magic(500);
    setrowval(400, 0);
    disp('The new value of A(399:401,1:10) is')
    A(399:401,1:10)

    function setrowval(row, value)
        A(row,:) = value;
    end
end
In this example, the function setrowval is nested inside the function myfun. The variable A in the workspace of myfun is accessible within setrowval (as if it had been declared global in each). The nested function modifies this shared variable, thus avoiding any additional memory allocation. You don't have to worry about the user inadvertently clearing anything and (in my opinion) it's a bit cleaner and easier to follow than declaring global variables.
The solution seems a bit strange to me. As you found out already, it shouldn't have a significant impact on memory usage if the called function does not modify the data array. However, if the called function does modify the data array, there's a functional difference: in one case (making the data array global), the change affects the rest of the code; in the other case (passing it as an argument), the modifications are only local and temporary.
I think you pretty much answered your own question, but a couple more references would be good here:
I made a video on this:
http://blogs.mathworks.com/videos/2008/09/16/new-location-and-memory-allocation/
Similar to what Loren spoke of here:
http://blogs.mathworks.com/loren/2006/05/10/memory-management-for-functions-and-variables/
-Dogu