How to disable Perl's garbage collector?

I have a short-lived (fast to complete) Perl script that nevertheless uses enough memory to trigger the garbage collector. In turn, collection takes longer than the rest of the processing.
Is there a way to disable garbage collection and let the OS do the cleanup when the script exits?
Edit
The GC pauses the script in the middle of execution, not at the very end. KILLing the script does not help.

Two approaches to exiting your program quickly without executing global destruction:
POSIX::_exit
This is identical to the C function _exit(). It exits the program immediately, which means, among other things, that buffered I/O is not flushed.
Kill your main process with SIGKILL:
kill 'KILL', $$;
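A minimal sketch combining both approaches (the flush and exit status are illustrative, not part of the question's script):
use strict;
use warnings;
use POSIX ();
use IO::Handle;

# ... do the real work ...

STDOUT->flush;       # _exit() skips buffered-I/O flushing, END blocks,
                     # and DESTROY methods, so flush anything you need first
POSIX::_exit(0);     # exit immediately; no global destruction

# Or, equivalently drastic: have the OS tear the process down.
# kill 'KILL', $$;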

As mentioned in the comments, garbage collection in Perl is a reference-counting mechanism, triggered when a value is no longer referenced by anything: a variable it is stored in (which may go out of scope or be assigned a different value), an operation it is part of, a subroutine call stack it is being passed around on, or an actual reference.
So to prevent a value from being cleaned up until program exit, the easiest way is to do the opposite of the conventional memory-conscious wisdom: reference the value from the global stash.
our $foo = \$something_to_keep_alive;
Alternatively, you can (ab)use the fact that circular references will prevent refcounts from decrementing until global destruction.
$something->{self} = $something;
This will cause the value to reference itself, even if done through another layer, until one of the references in the cycle is weakened, removed, or global destruction is reached. And again, this is certainly something to be avoided in normal circumstances, as it is a by-design memory leak.
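A short sketch (variable names made up) of a cycle keeping a value alive, and Scalar::Util's weaken breaking it again if you change your mind:
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $something = { payload => 'keep me' };
$something->{self} = $something;   # cycle: refcount can no longer reach zero

# ... later, if you decide you want normal cleanup after all:
weaken $something->{self};         # break the cycle; normal refcounting resumes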

Related

Is the only way to disconnect WWW::Mechanize::Firefox from mozrepl destruction of the objects?

As the title says, I'm trying to write a Perl daemon which, being long-running, I want to be sane about resource usage.
None of the examples or documentation I've seen seems to mention a way to disconnect a session.
The best documentation on the topic I can find is WWW::Mechanize::Firefox::Troubleshooting,
where it's suggested the object (and connection?) be kept alive until global destruction.
In short, I've seen no 'disconnect' function, and wonder if I'm missing something.
Disconnection seems to be handled via destructors. Perl uses special DESTROY methods for this. It is not advisable to call these methods manually.
You need to decrease the refcount of your $mech object in order for it to be destroyed automatically. This happens when the variable goes out of scope, during the global destruction phase at the end of the process, or (in the case of objects) when you assign something different to your variable, e.g.
$mech = undef;
To completely deallocate any variable, you can also
undef $mech; # which btw is the answer provided in the FAQ you linked
The differences are subtle, and irrelevant in this case.
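A minimal sketch, using a stand-in class (since the real teardown is internal to WWW::Mechanize::Firefox), of exactly when DESTROY fires:
use strict;
use warnings;

package Session;                        # stand-in for the real object
sub new     { return bless {}, shift }
sub DESTROY { print "disconnected\n" }  # cleanup runs when refcount hits zero

package main;
{
    my $mech = Session->new;
}                                       # scope ends: "disconnected" is printed

my $mech = Session->new;
$mech = undef;                          # same effect: refcount drops to zero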

Does Perl safely defer INT signals during a Storable write?

I'm concerned about data corruption while executing a Storable::store operation. I'm writing about 100 MB to an NFS share to back up my calculations at particular checkpoints, but the process isn't exactly speedy.
To try to prevent corruption, I have a $SIG{INT} signal handler. Right before Storable::store is called, a global variable is set indicating that it isn't safe to terminate. As soon as Storable::store completes, that global variable is reset to a value indicating it's okay to interrupt.
That global variable is used to decide whether the signal handler calls die or just prints a statement saying "Can't stop yet."
But am I really helping things? I see now from reading perlipc that interrupting I/O is sometimes done safely, and sometimes it is not... That is, should my signal handler end up being called in the middle of my Storable::store operation, even a brief diversion into my signal handler subroutine may be enough to screw things up.
Does anyone know how Storable performs in such a situation? Or is my signal handling setup actually appropriate?
Since 5.8.1, Perl has used "safe signals" by default. When you set up a signal handler through %SIG, Perl actually installs a simple signal handler that does nothing but increment a counter. In between Perl ops, Perl checks whether the counter is non-zero and calls your signal handler if it is. That way, your signal handlers don't execute in the middle of a system call or library call.
There are only two things you need to worry about:
Modifying global vars (e.g. $!) in your signal handler.
System calls returning EINTR or EAGAIN because a signal came in during the call.
If you're really concerned that SIGINT could break store, try adding SA_RESTART to your signal handler. On Linux, this will force system calls to be automatically retried upon a signal. This will probably defer your signal handler indefinitely, since SIGINT will not be able to break out of the I/O operation. However, it should enable safe operation of Storable::store under any interruption circumstance.
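A sketch of installing such a handler with POSIX::sigaction and SA_RESTART (the checkpoint data and file name are made up; the flag variable mirrors the protocol described in the question):
use strict;
use warnings;
use POSIX qw(sigaction SIGINT SA_RESTART);
use Storable qw(store);

my $interrupted = 0;
sigaction(SIGINT, POSIX::SigAction->new(
    sub { $interrupted = 1 },   # record the signal, nothing more
    POSIX::SigSet->new,
    SA_RESTART,                 # ask the kernel to restart interrupted syscalls
)) or die "sigaction failed: $!";

my %checkpoint = (step => 42);
store(\%checkpoint, 'checkpoint.stor');   # runs to completion
exit 0 if $interrupted;                   # honour the signal afterwards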

Objective-C: Calling and copying the same block from multiple threads

I'm dealing with neural networks here, but it's safe to ignore that, as the real question has to do with blocks in Objective-C. Here is my issue: I found a way to convert a neural network into a big block that can be executed all at once. However, it runs really, really slowly relative to activating the network. This seems a bit counterintuitive.
If I gave you a group of nested functions like
CGFloat answer = sin(cos(gaussian(1.5*x + 2.5*y)) + (.3*d + bias));
// or in block notation
^(CGFloat x, CGFloat y, CGFloat d, CGFloat bias) {
    return sin(cos(gaussian(1.5*x + 2.5*y)) + (.3*d + bias));
};
In theory, running that function multiple times should be easier/quicker than looping through a bunch of connections, setting nodes active/inactive, etc., all of which essentially calculates this same function in the end.
However, when I create a block (see thread: how to create function at runtime) and run this code, it is slow as all hell for any moderately sized network.
Now, what I don't quite understand is:
When you copy a block, what exactly are you copying?
Let's say I copy a block twice, copy1 and copy2. If I call copy1 and copy2 on the same thread, is the same function called? I don't understand exactly what the docs mean for block copies: Apple Block Docs
Now if I make that copy again, copy1 and copy2, but instead, I call the copies on separate threads, now how do the functions behave? Will this cause some sort of slowdown, as each thread attempts to access the same block?
When you copy a block, what exactly are you copying?
You are copying any state the block has captured. If that block captures no state -- which that block appears not to -- then the copy should be "free" in that the block will be a constant (similar to how @"" works).
Let's say I copy a block twice, copy1 and copy2. If I call copy1 and copy2 on the same thread, is the same function called? I don't understand exactly what the docs mean for block copies: Apple Block Docs
When a block is copied, the code of the block is never copied. Only the captured state. So, yes, you'll be executing the exact same set of instructions.
Now if I make that copy again, copy1 and copy2, but instead, I call the copies on separate threads, now how do the functions behave? Will this cause some sort of slowdown, as each thread attempts to access the same block?
The data captured within a block is not protected from multi-threaded access in any way, so, no, there would be no slowdown (but there will be all the concurrency synchronization fun you might imagine).
Have you tried sampling the app to see what is consuming the CPU cycles? Also, given where you are going with this, you might want to become acquainted with your friendly local disassembler (otool -TtVv binary/or/.o/file) as it can be quite helpful in determining how costly a block copy really is.
If you are sampling and seeing lots of time in the block itself, then that is just your computation consuming lots of CPU time. If the block were to consume CPU during the copy, you would see the consumption in a copy helper.
Try creating a source file that contains a bunch of different kinds of blocks: with parameters, without, with captured state, without, with captured blocks with/without captured state, etc., and a function that calls Block_copy() on each.
Disassemble that and you'll gain a deep understanding on what happens when blocks are copied. Personally, I find x86_64 assembly to be easier to read than ARM. (This all sounds like good blog fodder -- I should write it up).

What can I do to find out what's causing my program to consume lots of memory over time?

I have an application using POE which has about 10 sessions doing various tasks. Over time, the app starts consuming more and more RAM and this usage doesn't go down even though the app is idle 80% of the time. My only solution at present is to restart the process often.
I'm not allowed to post my code here, so I realize it is difficult to get help, but maybe someone can tell me what I can do to find out myself?
Don't expect the process size to decrease. Memory isn't released back to the OS until the process terminates.
That said, might you have reference loops in data structures somewhere? AFAIK, the Perl garbage collector can't sort out reference loops.
Are you using any XS modules anywhere? There could be leaks hidden inside those.
A guess: your program executes a loop for as long as it is running; in this loop, memory may be allocated for a buffer (or more) each time some condition occurs. Since the scope is never exited, the memory remains and will never be cleaned up. I suggest you check for something like this. If it is the case, place the allocating code in a sub that you call from the loop; the memory will go out of scope, and get cleaned up, on return to the loop.
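A minimal sketch of that refactoring (the event size and loop count are made up):
use strict;
use warnings;

# The buffer is confined to handle_event(), so its refcount drops to zero
# on every return and Perl can reuse the memory, instead of the buffer
# staying alive in the long-running loop's scope.
sub handle_event {
    my ($size) = @_;
    my $buffer = "\0" x $size;   # large per-event allocation
    return length $buffer;       # $buffer is released here
}

handle_event(1024 * 1024) for 1 .. 1000;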
Looks like Test::Valgrind is a tool for searching for memory leaks. I've never used it myself though (but I used plain valgrind with C source).
One technique is to periodically dump the contents of $POE::Kernel::poe_kernel to a time- or sequence-named file. $poe_kernel is the root of a tree spanning all known sessions and the contents of their heaps. The snapshots should monotonically grow if the leaked memory is referenced. You'll be able to find out what's leaking by diff'ing an early snapshot with a later one.
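A sketch of that snapshotting, assuming Data::Dumper output is diffable enough for your data (code references dump as stubs):
use strict;
use warnings;
use Data::Dumper;
use POE;

my $seq = 0;
sub snapshot_kernel {
    local $Data::Dumper::Indent   = 1;
    local $Data::Dumper::Sortkeys = 1;    # stable key order, diff-friendly
    open my $fh, '>', sprintf('kernel-%04d.dump', $seq++) or die $!;
    print {$fh} Dumper($POE::Kernel::poe_kernel);
    close $fh;
}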
You can export POE_ASSERT_DATA=1 to enable POE's internal data consistency checks. I don't expect it to surface problems, but if it does I'd be very happy to receive a bug report.
Perl cannot resolve reference cycles. Either you have zombie processes (which you can detect via ps axl) or you have a memory leak (reference cycles).
There are a ton of tools to detect memory leaks:
strace, mtrace, Devel::LeakTrace::Fast, Devel::Cycle
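For the Perl-level reference cycles, a minimal sketch with Devel::Cycle (the cyclic structure is contrived for demonstration):
use strict;
use warnings;
use Devel::Cycle;

my $node = { name => 'session' };
$node->{self} = $node;    # deliberate cycle for demonstration

find_cycle($node);        # prints the reference chain of each cycle found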

An IOCP documentation interpretation question - buffer ownership ambiguity

Since I'm not a native English speaker, I might be missing something, so maybe someone here knows better than me.
Taken from WSASend's documentation at MSDN:
lpBuffers [in]
A pointer to an array of WSABUF structures. Each WSABUF structure contains a pointer to a buffer and the length, in bytes, of the buffer. For a Winsock application, once the WSASend function is called, the system owns these buffers and the application may not access them. This array must remain valid for the duration of the send operation.
Ok, can you see the part about the system owning the buffers and the application not accessing them? That's the unclear spot!
I can think of two interpretations of this line (it might be something else entirely; you name it):
Translation 1 - "buffers" refers to the OVERLAPPED structure that I pass to this function when calling it. I may reuse the object again only when getting a completion notification for it.
Translation 2 - "buffers" refers to the actual buffers, those holding the data I'm sending. If the WSABUF object points to one buffer, then I cannot touch this buffer until the operation is complete.
Can anyone tell me which interpretation of that line is right?
And..... if the answer is the second one - how would you resolve it?
Because to me it implies that for each and every buffer of data I send, I must retain a copy of it on the sender side - thus having MANY "pending" buffers (of different sizes) in a high-traffic application, which is really going to hurt scalability.
Statement 1:
In addition to the above paragraph (the "And...."), I thought that IOCP copies the data to be sent into its own buffer and sends from there, unless you set SO_SNDBUF to zero.
Statement 2:
I use stack-allocated buffers (you know, something like char cBuff[1024]; in the function body). If the answer to the main question is the second option (i.e. buffers must stay as they are until the send is complete), then... that really screws things up big-time! Can you think of a way to resolve it? (I know, I asked it in other words above.)
The answer is that the overlapped structure and the data buffer itself cannot be reused or released until the completion for the operation occurs.
This is because the operation completes asynchronously, so even if the data is eventually copied into operating-system-owned buffers in the TCP/IP stack, that may not occur until some time in the future, and the write completion tells you when it has. Note that write completions may be delayed for a surprising amount of time if you're sending without explicit flow control and relying on the TCP stack to do flow control for you (see here: some OVERLAPS using WSASend not returning in a timely manner using GetQueuedCompletionStatus?) ...
You can't use stack-allocated buffers unless you place an event in the overlapped structure and block on it until the async operation completes; there's not a lot of point in doing that, as you add complexity over a normal blocking call and you don't gain a great deal by issuing the call asynchronously and then waiting on it.
In my IOCP server framework (which you can get for free from here) I use dynamically allocated buffers which include the OVERLAPPED structure and which are reference counted. This means that the cleanup (in my case they're returned to a pool for reuse) happens when the completion occurs and the reference is released. It also means that you can choose to continue to use the buffer after the operation and the cleanup is still simple.
See also here: I/O Completion Port, How to free Per Socket Context and Per I/O Context?