Is the whole function block really stored in RETAIN memory? - plc

In the CodeSys manual we can read that:
"If you declare a local variable in a function block as RETAIN, CODESYS stores the complete instance of this function block in the Retain range (all data of the function block); however, only the declared RETAIN variable is treated as such."
But has anyone actually tested this? I created a function block with only the following variables:
VAR
Test1: ARRAY[1..50] OF UINT; //100 bytes
END_VAR
VAR RETAIN
Test2: ARRAY[1..50] OF DINT; //200 bytes
END_VAR
My program implements only one instance of this function block. Using SIZEOF at runtime shows a function block size of 312 bytes.
Now, if I right-click on the device and go to "Device Memory Info", the size of my Retain Data is only 203 bytes.
If the complete instance of the function block were stored in the retain range, I would expect the retain data size to equal the function block size (312 bytes), but it is only 203 bytes, roughly the size of the RETAIN array. Is the manual incorrect?
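As a rough cross-check of the numbers in the question, the declared variables can be mirrored in a C struct. This sketch assumes UINT maps to a 16-bit unsigned integer and DINT to a 32-bit signed integer (their IEC 61131-3 sizes). The type name fb_declared_data and the helper fb_hidden_overhead are hypothetical. The declarations account for 300 of the 312 bytes reported by SIZEOF, so about 12 bytes of the instance are data the compiler adds that does not appear in the declarations. This sketch does not try to model what that data is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C mirror of the declared FB variables (hypothetical type name). */
typedef struct {
    uint16_t test1[50]; /* VAR:        ARRAY[1..50] OF UINT -> 50 * 2 = 100 bytes */
    int32_t  test2[50]; /* VAR RETAIN: ARRAY[1..50] OF DINT -> 50 * 4 = 200 bytes */
} fb_declared_data;

/* Instance size reported by SIZEOF at runtime in the question. */
enum { FB_INSTANCE_SIZE = 312 };

/* Bytes of the FB instance not accounted for by the declared variables. */
size_t fb_hidden_overhead(void) {
    assert(sizeof(fb_declared_data) <= FB_INSTANCE_SIZE);
    return FB_INSTANCE_SIZE - sizeof(fb_declared_data);
}
```

The 200-byte RETAIN array on its own is close to the 203 bytes shown in Device Memory Info. That closeness is why the measurement looks at odds with the manual's wording.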

I can confirm it is true. In the first project where I needed retain variables inside an FB, I ran out of memory after creating many, many instances (the FBs were not “simple”). Once I removed the RETAIN from the FB and linked it to an external retained variable, the problem went away. That was the day I learned the documentation's warning was real.
That said, in that case I was building a solution with around 100 unique, complicated FBs, each storing a user entry. I have made several other, much simpler projects where I just let the compiler put the whole FB in retain.

Related

Behaviour of passing struct as a parameter to a CUDA kernel

I'm relatively new to CUDA programming, so I want to clarify the behaviour of a struct when I pass it into a kernel. I've defined the following struct to somewhat imitate the behavior of a 3D array that knows its own size:
struct protoarray {
    size_t dim1;
    size_t dim2;
    size_t dim3;
    float* data;
};
I create two variables of type protoarray, dynamically allocate space to data via malloc and cudaMalloc on the host and device side, and update dim1, dim2 and dim3 to reflect the size of array I want this struct to represent. I read in this thread that the struct should be passed via copy. So this is what I do in my kernel
__global__ void kernel(curandState_t *state, protoarray arr_device){
    const size_t dim1 = arr_device.dim1;
    const size_t dim2 = arr_device.dim2;
    for(size_t j(0); j < dim2; j++){
        for(size_t i(0); i < dim1; i++){
            // Do something
        }
    }
}
The struct is passed by copy, so all its contents are copied into shared memory of each block. This is where I'm getting bizarre behaviour, which I'm hoping you can help me with. Suppose I had set arr_device.dim1 = 2 on the host side. While debugging inside the kernel with a breakpoint at one of the for loops, checking the value of arr_device.dim1 yields something like 16776576. That is nowhere near large enough to cause overflow, yet the value is copied correctly into dim1 as 2, so the for loops execute as I intended. As a side question, is using size_t, which is essentially an unsigned long long int, bad practice, seeing as GPUs are made of 32-bit cores?
Generally, how safe is it to pass structs and classes into kernels as arguments? Is it bad practice that should be avoided at all costs? I imagine that passing pointers to classes to kernels is difficult if they contain members that point to dynamically allocated memory, and that they should be very lightweight if I want to pass them by value.
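The host-side half of the setup described in the question can be sketched in plain C. This sketch assumes dim1 is the fastest-varying dimension (matching the inner loop over i in the kernel) and that the buffer holds dim1 * dim2 * dim3 floats. protoarray_alloc and protoarray_at are hypothetical helpers, not part of any CUDA API. For the device-side variable, data would come from cudaMalloc instead of malloc/calloc and could only be dereferenced in device code.

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as the struct in the question. */
struct protoarray {
    size_t dim1;
    size_t dim2;
    size_t dim3;
    float *data;
};

/* Hypothetical helper: host-side allocation of a zero-filled dim1 x dim2 x dim3 array.
 * data is NULL if the allocation fails. */
struct protoarray protoarray_alloc(size_t dim1, size_t dim2, size_t dim3) {
    struct protoarray arr = { dim1, dim2, dim3, NULL };
    arr.data = calloc(dim1 * dim2 * dim3, sizeof *arr.data);
    return arr;
}

/* Hypothetical helper: address of element (i, j, k), assuming dim1 varies fastest. */
float *protoarray_at(struct protoarray arr, size_t i, size_t j, size_t k) {
    assert(i < arr.dim1 && j < arr.dim2 && k < arr.dim3);
    return &arr.data[i + arr.dim1 * (j + arr.dim2 * k)];
}
```

On a typical 64-bit host, size_t and float * are both 8 bytes, so this struct is 32 bytes with no padding.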
This is a partial answer, since without a proper program to look into, it is difficult/impossible to guess why you would see an invalid value in your arr_device.dim1.
The struct is passed by copy, so all its contents are copied into shared memory of each block.
Incorrect. Kernel arguments are stored in constant memory, which is device-global and not block-specific. They are not stored in shared memory (which is block-specific).
When a thread runs, it typically reads arguments from constant memory into registers (and again, not shared memory).
Generally, how safe is it to pass struct and class into kernels as arguments
My personal rule of thumb on this matter is: If the struct/class...
is trivially-copyable; and
all of its members are defined for both the host and the device side, or at least designed with GPU use in mind;
then it should be safe to pass to a kernel.
passing struct and class into kernels as arguments [ - ] is [it] bad practice that should be avoided at all cost?
No. But remember that most C++ libraries only provide host-side code and were not written with GPU use in mind. So I'd be wary of using non-trivial classes without a lot of scrutiny.
I imagine that passing pointers to classes to kernels is difficult in case they contain members which point to dynamically allocated memory
Yes, this can be problematic. However - if you used cuda::memory::managed::allocate(), cuda::memory::managed::make_unique() or cudaMallocManaged() - then this should "just work", i.e. the relevant memory pages will be fetched to the GPU or the CPU as necessary when accessed. See:
Unified Memory in CUDA for beginners
Beyond GPU Memory Limits with Unified Memory on Pascal
and that they should be very lightweight if I want to pass [objects to kernels] by value.
Yes, because each and every thread has to read each argument from constant memory before it can use that argument. And while constant memory allows this to happen relatively quickly, it's still a bunch of overhead that you want to minimize.
Also remember that you can't pass anything to kernels by (C++) reference; it's all "by-value" - the object itself or a pointer to it.
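The by-value rule can be illustrated with an ordinary C function standing in for a kernel. This is a host-only sketch, and modify_copy is a hypothetical name. Assigning to a member changes only the callee's copy. The float * member, however, is copied shallowly, so writes through it land in the caller's buffer. In real CUDA code that buffer must be memory the device can access, for example from cudaMalloc or cudaMallocManaged.

```c
#include <assert.h>
#include <stddef.h>

struct protoarray {
    size_t dim1;
    size_t dim2;
    size_t dim3;
    float *data;
};

/* Stand-in for a kernel: the struct arrives by value, as a kernel argument does. */
void modify_copy(struct protoarray arr) {
    assert(arr.data != NULL);
    arr.dim1 = 99;      /* changes only this function's copy of the struct */
    arr.data[0] = 1.0f; /* the pointer was copied, so this writes to the caller's buffer */
}
```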

Usage of VAR RETAIN PERSISTENT

I'm using a WAGO PFC200 PLC in my home automation project. I have plenty of POUs, one for each room. Each room implements the IRoom interface and uses a base POU for common logic like turning off all lights.
For lights management, I'm using
FbEvaluateShortLongPress from WagoAppBuilding to handle short and long press of buttons on the wall (it could also be a function block from OSCAT library)
FbLatchingRelay from WagoAppBuilding as a toggle for PLC digital output
I want to save the state of FbLatchingRelay in case of, e.g., a power drop: all lights that were turned on before the power drop should be turned on again when the power comes back.
I've solved it by declaring a FbLatchingRelay in the VAR RETAIN PERSISTENT area in my POU. But then after reading here that:
If you declare a local variable in a function block as RETAIN, CODESYS stores the complete instance of this function block in the Retain range (all data of the function block); however, only the declared RETAIN variable is treated as such.
I decided to change that so as not to waste RETAIN memory on a bunch of variables in the POU that do not need to be retained.
So right now I have something like that:
the VAR RETAIN PERSISTENT area is only declared in my main program
it stores structures for each room (each POU) only with needed data - FbLatchingRelay POU and few other variables
while initializing the room (POU) I'm passing those structures to my rooms using VAR_IN_OUT
each room (POU) uses then this data
PLC_PRG:
VAR RETAIN PERSISTENT
BathroomPersistentData: BathroomData;
END_VAR
Bathroom(PersistData := BathroomPersistentData, xMainLightSwitch := DI1_13, xMirrorLightSwitch := DI2_3, xMirrorLightSwitchActuator => DO2_1, xMainLightSwitchActuator => DO1_11);
Bathroom POU:
VAR_IN_OUT
PersistData: BathroomData;
END_VAR
Is this a good approach? What do you think? It complicates the project a bit, but I'm not wasting RETAIN memory on things that should not be there (whole POUs).
Yes, this is how my organization handles retain vars. This also lends itself to supporting “save to disk” solutions for other FB demands (not so much for your light states).
On the other hand, did you run out of memory with the original approach? Sometimes we worry about things that never happen. Yes, it is “wasteful” for the whole FB instance to be put in retain memory, but if your FBs are small and your device has plenty of retain memory, there is nothing to worry about until later.

Meaning of the columns in MATLAB memory profiling

I'm using the MATLAB profiler to observe memory with these commands:
profile -memory on
profile clear
% my code
profile report
and I got this table.
1. What do the columns Allocated Memory, Freed Memory, Self Memory, and Peak Memory mean?
2. What does a negative Self Memory value mean?
After a quick google, it would seem that no-one knows, except perhaps MathWorks and they aren't telling. (I jest, but in truth I found very little information on the subject).
Logically, however, I would interpret the column names as follows:
Allocated memory = the total amount of memory allocated within the function and any it calls.
Freed memory = the total amount of memory released within the function and any it calls.
Peak Memory = the maximum amount of memory in use at any one time during the execution of the function.
Self Memory = the amount of memory used by the function, but not including any functions it calls.
I would hypothesize that a negative 'Self Memory' would indicate that the function frees more memory than it allocates. This could be that it has ownership of a piece of data passed to it, which it then clears. E.g.:
function A()
    foo = B();
    clear foo
end
function foo = B()
    foo = rand(10000,10000);
end
In the example above, the data is created in the call to B, and since MATLAB uses lazy copying (copy-on-write), the return value behaves much like pass-by-reference. So B allocates the memory, and A frees it.
Indeed, running that code with the profiling method in the question produces output that supports my hypothesis.
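The hypothesis can be made concrete with a toy accounting model in C. This is not how the MATLAB profiler works internally. It assumes Self Memory is the bytes allocated minus the bytes freed by a function's own code, excluding callees. The functions a and b and the mem_stats counters are hypothetical. b allocates a buffer and hands it to its caller, and a frees it, so a ends up with negative self memory. In the MATLAB example, rand(10000,10000) is 10^8 doubles, i.e. 800,000,000 bytes; the sketch uses a much smaller buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Per-function counters for this toy model (hypothetical, not the MATLAB profiler). */
typedef struct {
    long long allocated;
    long long freed;
} mem_stats;

static mem_stats stats_a, stats_b;

/* Net memory attributed to a function's own code, excluding its callees. */
long long self_memory(const mem_stats *s) {
    return s->allocated - s->freed;
}

/* Like B(): allocates the result and hands ownership to the caller. */
double *b(size_t n) {
    double *foo = malloc(n * sizeof *foo);
    assert(foo != NULL);
    stats_b.allocated += (long long)(n * sizeof *foo);
    return foo;
}

/* Like A(): receives the buffer from b() and releases it ("clear foo"). */
void a(size_t n) {
    double *foo = b(n);
    long long bytes = (long long)(n * sizeof *foo);
    free(foo);               /* a frees memory that it did not allocate itself */
    stats_a.freed += bytes;
}
```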

Matlab: Free memory of class objects

I recently wrote some code using Matlab's OOP. In each class object I save some measurement data as a property and define the methods for evaluating them. With an average data set one single class object uses about 32 MB of memory.
Now I am writing a GUI that should process these objects.
In the first step I load a set of objects from a saved .mat-file (about 200 objects, 2GB on hard disk) and store them in the handles struct. They fill the RAM and use about 6-7 GB, when loaded. This is no problem.
But if I close the GUI, it seems that I can't free the used memory.
I tried different approaches with no success.
Setting the data fields to "empty" in the destructor of the class:
function delete(obj)
    obj.timeVector = [];
    obj.valueVector = [];
end
Trying to free it in the figure_CloseRequestFcn:
function figure_CloseRequestFcn(hObject, eventdata, handles)
    handles.data = [];
    handles = rmfield(handles,'data');
    guidata(hObject,handles);
    clear handles;
    pack; % MATLAB issues a warning that pack can only
          % be used from the command line, but that did
          % not work either
    delete(hObject);
end
Any ideas, besides closing Matlab every time after working with the GUI?
I found the answer in the MATLAB Bug Report Center. The bug seems to have existed since R2011b.
Summary
Storing objects in MAT-files can cause a memory leak and prevent the object class from being cleared
Description
After storing an instance of a class, 'MyClass', in a MAT-file, calling clear classes may result in the warning:
Warning: Objects of 'MyClass' class exist. Cannot clear this class or any of its superclasses.
This warning persists, even if you have cleared all instances of the class in the workspace.
The warning may occur for one MAT-file format, and not for another.
Workaround
Under some circumstances, switching to a different MAT-file format may eliminate the warning.
http://www.mathworks.ch/support/bugreports/857319
Edit:
I tried older formats for saving, but this does not work either. I get an "Error closing file" (http://www.mathworks.ch/matlabcentral/answers/18098-error-using-save-error-closing-file). So Matlab does not support saving class objects that well. I will have to live with the memory issues then and restart Matlab after every use of the GUI.
Based on your memory screenshots, there is definitely memory that is not being cleared. There is a small chance that you have found a fundamental flaw in MATLAB's garbage collection, but it is much more likely that the ~6 GB of resident data is still reachable through some chain of references. Based on personal experience, here are a few ways that memory you thought was cleared can still be reachable:
Timer objects: If one of a timer's callback functions references this data (or a copy), then that data is still reachable. You need to call delete(t) on that timer.
Persistent variables in functions: I often cache data in a persistent variable within a function, this clearly allows access to that data in the future, so it will not be cleared. You need to call clear FUNCTIONNAME to clear associated persistent variables.
In GUI objects, as either data or within callback functions: The figures and any persistents need to be cleared.
Any static methods or constant attributes in classes which can retain data. These can either be cleared individually within the class, or by force using clear CLASSNAME.
Some tips for finding stale links to data (again, based on personal mistakes):
Look at the exact number of bytes being lost after each call, using the x=memory; call to get an exact count. Is it consistent? Is it a number that you recognize? Sometimes I can find the leak after realizing that it is exactly 238263232 bytes, therefore a 29782904-element double array, which must be from function xyz.
See which objects are actually being deleted. Within your delete(obj) function, add a detailed display of which objects are being deleted and, by inference, which are not. For a given non-deleted object, where could it be referenced from? You should not need to clear data in the delete(obj) function as you are doing; MATLAB handles that for you. Use the delete function as a debugging tool instead.
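The arithmetic in the first tip is just the leaked byte count divided by the element size, which is 8 bytes for a MATLAB double. A tiny helper (the name elements_in_leak is hypothetical) makes the check explicit and flags counts that are not an exact multiple.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elem_size-byte elements in a leak of leaked_bytes,
 * or 0 if the leak is not an exact multiple of elem_size. */
size_t elements_in_leak(unsigned long long leaked_bytes, size_t elem_size) {
    assert(elem_size > 0);
    if (leaked_bytes % elem_size != 0) {
        return 0;
    }
    return (size_t)(leaked_bytes / elem_size);
}
```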
Matlab has a garbage collector so you don't need to manually manage memory. After closing the GUI, all the memory will be freed except for what is in your workspace. You can clear the workspace variables using clear.
One thing I've noticed on Windows (not sure about other platforms) is that Matlab's GUI sometimes retains extra memory (maybe 100 MB, but not multiple GB like you are seeing). Simply minimizing and then restoring the GUI will free this excess memory.

Objective-C: Calling and copying the same block from multiple threads

I'm dealing with neural networks here, but it's safe to ignore that, as the real question concerns blocks in Objective-C. Here is my issue. I found a way to convert a neural network into a big block that can be executed all at once. However, it runs really, really slowly compared to activating the network. This seems a bit counterintuitive.
If I gave you a group of nested functions like
CGFloat answer = sin(cos(gaussian(1.5*x + 2.5*y)) + (.3*d + bias))
//or in block notation
^(CGFloat x, CGFloat y, CGFloat d, CGFloat bias) {
return sin(cos(gaussian(1.5*x + 2.5*y)) + (.3*d + bias));
};
In theory, running that function multiple times should be easier/quicker than looping through a bunch of connections, and setting nodes active/inactive, etc, all of which essentially calculate this same function in the end.
However, when I create a block (see thread: how to create function at runtime) and run this code, it is slow as all hell for any moderately sized network.
Now, what I don't quite understand is:
When you copy a block, what exactly are you copying?
Let's say, I copy a block twice, copy1 and copy2. If I call copy1 and copy2 on the same thread, is the same function called? I don't understand exactly what the docs mean for block copies: Apple Block Docs
Now if I make that copy again, copy1 and copy2, but instead, I call the copies on separate threads, now how do the functions behave? Will this cause some sort of slowdown, as each thread attempts to access the same block?
When you copy a block, what exactly are you copying?
You are copying any state the block has captured. If that block captures no state -- which that block appears not to -- then the copy should be "free", in that the block will be a constant (similar to how @"" works).
Let's say, I copy a block twice, copy1 and copy2. If I call copy1 and copy2 on the same thread, is the same function called? I don't understand exactly what the docs mean for block copies: Apple Block Docs
When a block is copied, the code of the block is never copied. Only the captured state. So, yes, you'll be executing the exact same set of instructions.
Now if I make that copy again, copy1 and copy2, but instead, I call the copies on separate threads, now how do the functions behave? Will this cause some sort of slowdown, as each thread attempts to access the same block?
The data captured within a block is not protected from multi-threaded access in any way so, no, there would be no slowdown (but there will be all the concurrency synchronization fun you might imagine).
Have you tried sampling the app to see what is consuming the CPU cycles? Also, given where you are going with this, you might want to become acquainted with your friendly local disassembler (otool -TtVv binary/or/.o/file) as it can be quite helpful in determining how costly a block copy really is.
If you are sampling and seeing lots of time in the block itself, then that is just your computation consuming lots of CPU time. If the block were to consume CPU during the copy, you would see the consumption in a copy helper.
Try creating a source file that contains a bunch of different kinds of blocks (with parameters and without, with captured state and without, with captured blocks with and without captured state, etc.) and a function that calls Block_copy() on each.
Disassemble that and you'll gain a deep understanding on what happens when blocks are copied. Personally, I find x86_64 assembly to be easier to read than ARM. (This all sounds like good blog fodder -- I should write it up).