Querying Python runtime for all objects in existence - python-c-api

I'm working on a C++ Python wrapper that attempts to encapsulate the awkwardness of reference counting, retaining, and releasing.
It has a set of unit tests.
However, I want to ensure that after each test, everything has been cleaned up properly, i.e. every object created during that test has had its reference count taken down to 0 and has consequently been deallocated.
Is there any way of querying the Python runtime for this information?
If I could just get the number of objects being stored, that would do. I could then make sure it doesn't change between tests.
EDIT: I believe it is possible to compile Python with a special flag producing a binary that has functions for monitoring reference counting. But this is as much as I know. Maybe more...

That depends on which implementation you use. I'm assuming you're using CPython. Since you're fiddling with the reference counting mechanism, I will further assume that using the garbage collector to find the remaining objects won't be sufficiently reliable for your purpose. (Otherwise, see here.)
The build flag you were thinking about is this one:
It is best to define these options in the EXTRA_CFLAGS make variable:
make EXTRA_CFLAGS="-DPy_REF_DEBUG".
Py_REF_DEBUG introduced in 1.4
named REF_DEBUG before 1.4
Turn on aggregate reference counting. This arranges that extern
_Py_RefTotal hold a count of all references, the sum of ob_refcnt across
all objects. [..]
Special gimmicks:
sys.gettotalrefcount()
Return current total of all refcounts.
(Source: Python git, SpecialBuilds.txt, Debugging builds from the C API reference.)
If you need a list of all pointers to live objects, use Py_TRACE_REFS, directly below that one in the SpecialBuilds file.
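As a rough sketch of how a test harness might use this from the Python side: sys.gettotalrefcount() is only present in builds compiled with Py_REF_DEBUG, so the helper below returns None when it is missing. The gc-based count is a weaker fallback, since it only sees objects tracked by the cyclic collector. The helper names are made up for illustration.

```python
import gc
import sys


def total_refcount():
    """Sum of all refcounts, or None if this is not a Py_REF_DEBUG build."""
    getter = getattr(sys, "gettotalrefcount", None)
    return getter() if getter is not None else None


def live_object_count():
    """Number of objects currently tracked by the cyclic garbage collector.

    This covers container objects only; plain ints or strings are not included.
    """
    gc.collect()  # drop unreachable cycles first so they are not counted
    return len(gc.get_objects())


# Snapshot before and after a test body and compare the two values.
before = live_object_count()
leaked = [[] for _ in range(100)]  # stands in for objects a test failed to release
after = live_object_count()
```

On a debug build, comparing total_refcount() before and after each test is the stricter check. Expect some noise from interned strings and caches, so a small warm-up run before taking the first snapshot may be needed.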

How to call Rust functions in Flutter (Dart) via FFI, but with convenience and safety?

I know we can call Rust from Flutter/Dart via FFI. But Flutter only allows the C ABI when doing FFI, so I have to write the boilerplate code by hand. That includes unsafe Rust code, since I have to deal with lots of raw pointers :(
Are there any approaches to do this in a safe way? We know Rust itself is very safe (thanks to its unique memory management approach), and Dart/Flutter itself is also very safe (thanks to GC). But I do not want the FFI call to be the Achilles heel that destroys the safety of my app!
There are several ways to do it.
a. JSON/Protobuf-based Approach
The first way, which I have used in a production environment for a year, is to use JSON or Protobuf to pass all the data between Rust and Dart/Flutter. By doing this, you do not need to write tons of boilerplate code to allocate/free a String, a List of bytes, a struct/class, etc. All you need to do is write one single function that accepts a byte array payload and outputs a byte array result. By "one" function, I mean that you can have an action field in your JSON/Protobuf, so calls to different Rust functions can all be routed through this one thin interface.
Despite its convenience (only a bit of unsafe boilerplate), the drawback is also evident. Serialization and deserialization do not come for free. You pay for them in CPU time and memory, which can be quite large sometimes. Moreover, you cannot easily pass around big objects. For example, if you have an image (at least megabytes in size), serializing it to Protobuf and then deserializing it from Protobuf can be quite a waste of both CPU and memory - useless copies! Even worse, since Flutter/Dart FFI does not support a convenient way of doing async FFI, you have to run it in a separate worker isolate - one more memory copy. You can see more here: https://github.com/dart-lang/language/issues/1862 (this is an issue that I opened).
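The "one function plus an action field" idea can be sketched as follows. This is only an illustration of the dispatch shape, written in Python rather than Rust/Dart; the action names, the "args" field and the response layout are all assumptions, not part of any real library.

```python
import json


def add(args):
    # Hypothetical action: add two numbers.
    return args["a"] + args["b"]


def shout(args):
    # Hypothetical action: upper-case a string.
    return args["text"].upper()


# Maps the "action" field of a request to the function that serves it.
HANDLERS = {"add": add, "shout": shout}


def handle(payload: bytes) -> bytes:
    """Single entry point: bytes in, bytes out, dispatching on 'action'."""
    request = json.loads(payload.decode("utf-8"))
    handler = HANDLERS.get(request.get("action"))
    if handler is None:
        response = {"ok": False, "error": "unknown action"}
    else:
        response = {"ok": True, "result": handler(request.get("args", {}))}
    return json.dumps(response).encode("utf-8")
```

Only handle() would need to cross the FFI boundary, so the unsafe pointer handling is written once. The cost is the encode/decode step on every call, which is the overhead described above.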
b. Code generator
The second way, which I have used more recently, is to write a code generator. The boilerplate follows several common patterns, such as "allocate - fill data - call FFI - free", so it is not that hard to write a generator that does this automatically. The idea is to mimic what a human would do when writing the boilerplate code by hand.
I did hope that some code generator already existed that I could use directly, but it seemed that none did... So, go and write it yourself.
c. Use existing open-source code generator
After I wrote the code generator, I guessed other people might have the same problem as me, so I open-sourced it: https://github.com/fzyzcjy/flutter_rust_bridge
My code generator not only solves the problem above, but also has rich type support and allows zero-copy, async programming, direct calls from the main isolate, etc. All of these can be implemented via a code generator but would require lots of boilerplate code if you did them by hand.
Disclaimer: This is a Q&A-style answer to show my thoughts and what I have done about this problem, which is critical to my own app in a production environment. I have used the JSON approach since last year and later refactored to the code generator approach. Hope it also helps other people who face the same situation!

Scala legacy code: how to access input parameters at different points in execution path?

I am working with a legacy Scala codebase, and as is always the case, modifying the code is quite difficult without touching many different parts.
One of my new requirements is to make several decisions based on some input parameters. The problem is that these decisions have to be made at various points along the execution path. One option is to encapsulate all those parameters in a case class instance and pass it along. But that means I would have to modify multiple method signatures, and I want to avoid this approach as much as possible.
Another approach would be to create a global object that contains all those input parameters and is accessible from different points in the execution. Is that a good approach in Scala?
No, using global mutable variables to pass “hidden” parameters is not a good idea, not in Scala and not in any other programming language. It makes the code hard to understand and modify, because a function's behaviour will now depend on which functions were invoked earlier. And it's extremely fragile, because you might forget setting one of those global parameters before invoking the function, which means that it will use whatever value was stored there before. This is the kind of thing that can appear to work for years, and then break when you modify a completely unrelated part of the program.
I can't stress this enough: do not use global mutable variables, period. The solution is to man up and change those method signatures. Depending on the details, dependency injection may or may not help in your particular case.
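To make the difference concrete, here is a minimal sketch of the "pass it along" shape. It is written in Python purely for illustration (a frozen dataclass plays the role of a Scala case class), and every name in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunParams:
    # Immutable bundle of the input parameters (hypothetical fields).
    region: str
    dry_run: bool


def apply_change(params: RunParams, change: str) -> str:
    # The decision depends only on the arguments, never on hidden state.
    if params.dry_run:
        return f"would apply {change} in {params.region}"
    return f"applied {change} in {params.region}"


def process(params: RunParams, changes):
    # Intermediate layers just forward the same object.
    return [apply_change(params, change) for change in changes]
```

Because the parameters are an explicit, immutable argument, a caller cannot forget to set them and a function cannot silently pick up a stale value left behind by an earlier call.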

PostgreSQL's plperlu interpreter's @INC and/or cached libraries: separate for different databases?

I have different versions of some libraries I'm developing, and want to, from within various plperl functions I've written, load a certain version based on current_database().
(IIRC using use rather than require is preferred, I think because it might cache the library?)
However, my fear is that different databases on the same server will have problems, in either way that I'm thinking of doing it:
1) use lib and then use--if more than one path gets stuck on @INC, it may not be the right one that gets used
2) require--even if this means the right one is always used in the current script, does it mean that the library gets reloaded every time? And either way, if libraries are kept loaded once used, is it possible that namespace pollution from different versions could result in bugs? (E.g. if I have something branch based on whether a variable is defined, and in one version it is by default, and in another it's not--will all the versions now act as if it is, unless I explicitly undefine it rather than just not defining it?)
If plperl is not loaded through shared_preload_libraries, each database session will have its own interpreter freshly initialized at first use, so the libraries included by one session cannot possibly interfere with another session.
See PL/Perl Under the Hood in the manual for more.

Basic principle of auto complete

How do they perform auto-completion of code in Eclipse or other IDEs? What is the basic principle behind it?
You know how you have to explicitly attach source code to non-standard libraries you import in Eclipse? When you do that, a text-search index is built over that source, and that is how the IDE knows what to offer for auto-complete. Roughly, I suppose it is something like an associative array where the key is the prefix of the method name you typed and the value is a description of that method.
What matters for this functionality is that it is implemented efficiently in terms of both time and memory consumption. It would be very inefficient to store the same entry for every possible prefix of a method name. (Or even to store every prefix!)
One interesting structure that could be suitable for this problem is the trie, which is inherently optimized for prefix search while keeping memory usage acceptable.
Take a look here for a simple example:
http://www.sarathlakshman.com/2011/03/03/implementing-autocomplete-with-trie-data-structure/
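For a self-contained illustration, a minimal trie with prefix completion might look like the sketch below. It is not how Eclipse actually implements its index; the class and method names are my own.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps one character to the next node
        self.is_word = False  # True if a stored name ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Store a name; shared prefixes reuse the same nodes."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix):
        """Return all stored names that start with prefix, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # nothing stored under this prefix
        results = []
        stack = [(node, prefix)]
        while stack:  # walk the whole subtree below the prefix
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


index = Trie()
for name in ["toString", "toArray", "trim", "equals"]:
    index.insert(name)
```

With this index, index.complete("to") yields the two names starting with "to", and each prefix is stored only once no matter how many names share it.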
Besides tries, which cover the case where you have already typed the beginning of a method or variable name, I think the IDE also uses some sort of type comparison/analysis for the case where you invoke a method and it suggests a local/global variable to pass as a parameter to that call.

How do you define 'unwanted code'?

How would you define "unwanted code"?
Edit:
IMHO, Any code member with 0 active calling members (checked recursively) is unwanted code. (functions, methods, properties, variables are members)
Here's my definition of unwanted code:
Code that does not execute is dead weight. (Unless it's a [malicious] payload for your actual code, but that's another story :-))
Code that is repeated multiple times increases the cost of the product.
Code that cannot be regression tested increases the cost of the product as well.
You can either remove such code or refactor it, but you don't want to keep it around as it is.
0 active calls and no possibility of use in the near future. And I prefer never to comment anything out in case I need it later, since I use SVN (source control).
Like you said in the other thread, code that is not used anywhere at all is pretty much unwanted. As for how to find it I'd suggest FindBugs or CheckStyle if you were using Java, for example, since these tools check to see if a function is used anywhere and marks it as non-used if it isn't. Very nice for getting rid of unnecessary weight.
Well, after thinking about it briefly, I came up with these three points:
it can be code that should be refactored
it can be code that is not called any more (leftovers from earlier versions)
it can be code that does not conform to your style guide and way of coding
I bet there is a lot more but, that's how I'd define unwanted code.
In Java I'd mark the method or class with @Deprecated.
Any PRIVATE code member with no active calling members (checked recursively). Otherwise you cannot know whether the code is used outside the scope of your analysis.
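The "checked recursively" part can be sketched as a reachability walk over a call graph. The graph format (a dict from each member to the members it calls) and the function name are assumptions for illustration; a real tool would have to extract the graph from the code first.

```python
def find_unreachable(call_graph, entry_points):
    """Return members that cannot be reached from any entry point.

    call_graph maps each member name to the names it calls.
    """
    reachable = set()
    stack = list(entry_points)
    while stack:
        name = stack.pop()
        if name in reachable:
            continue  # already visited, also guards against cycles
        reachable.add(name)
        stack.extend(call_graph.get(name, ()))
    return sorted(set(call_graph) - reachable)


# old_a and old_b call each other, so each has one caller,
# yet neither is reachable from main.
graph = {
    "main": ["parse", "run"],
    "parse": [],
    "run": ["helper"],
    "helper": [],
    "old_a": ["old_b"],
    "old_b": ["old_a"],
}
dead = find_unreachable(graph, ["main"])
```

Here dead contains old_a and old_b. Simply counting direct callers would miss them, which is why the check has to follow calls transitively from known entry points.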
Some things are already posted but here's another:
Functions that do almost the same thing. (Only one small variable differs, so the whole function was copy-pasted and that variable changed.)
Usually I tell my compiler to be as annoyingly noisy as possible, that picks 60% of stuff that I need to examine. Unused functions that are months old (after checking with the VCS) usually get ousted, unless their author tells me when they'll actually be used. Stuff missing prototypes is also instantly suspect.
I think trying to implement automated house cleaning is like trying to make a USB device that guarantees that you 'safely' play Russian Roulette.
The hardest part to check are components added to the build system, few people notice those and unused kludges are left to gather moss.
Beyond that, I typically WANT the code, I just want its author to refactor it a bit and make their style the same as the rest of the project.
Another helpful tool is doxygen, which does help you (visually) see relations in the source tree. However, if it's set not to extract static symbols / objects, it's not going to be very thorough.