Proper way to call a different method from the same C-extension module? - python-c-api

I'm converting a pure-Python module to a C-extension to familiarize myself with the C API.
The Python implementation is as follows:
_CRC_TABLE_ = [0] * 256
def initialize_crc_table():
if _CRC_TABLE_[1] != 0: # Safeguard against re-initialization
return
# snip
def calculate_crc(data: bytes, initial: int = 0) -> int:
if _CRC_TABLE_[1] == 0: # In case user forgets to initialize first
initialize_crc_table()
# snip
# additional non-CRC methods trimmed
My C-extension thus far works:
#include <Python.h>
static Py_ssize_t CRC_TABLE_LEN = 256;
PyObject *_CRC_TABLE_;
static PyObject *method_initialize_crc_table(PyObject *self, PyObject *args) {
// snip
}
static PyMethodDef module_methods[] = {
{"initialize_crc_table", method_initialize_crc_table, METH_VARARGS, NULL},
{NULL, NULL, 0, NULL}
};
void _allocate_table_() {
_CRC_TABLE = PyList_New(CRC_TABLE_LEN);
PyObject *zero = Py_BuildValue("i", 0);
for (int i = 0; i < CRC_TABLE_LEN; i++) {
PyList_SetItem(_CRC_TABLE_, i, zero);
}
}
#if PY_MAJOR_VERSION >= 3
static struct PyModuleDef module_utilities = {
PyModuleDef_HEAD_INIT,
"utilities",
NULL,
-1,
module_methods,
};
PyMODINIT_FUNC PyInit_utilities() {
PyObject *module = PyModule_Create(&module_utilities);
_allocate_table_();
PyModule_AddObject(module, "_CRC_TABLE", _CRC_TABLE_);
return module;
}
#else
PyMODINIT_FUNC initutilities() {
PyObject *module = Py_InitModule3("utilities", module_methods, NULL);
_allocate_table_();
PyModule_AddObject(module, "_CRC_TABLE", _CRC_TABLE_);
}
I am able to access utilities._CRC_TABLE_ from the C-extension in the interpreter and values match the Python-equivalent when invoking utilities.intialize_crc_table.
Now I'm trying to call initialize_crc_table at the start of calculate_crc, performing the same check as used in the Python implementation. I'm returning None for now:
static PyObject *method_calculate_crc(PyObject *self, PyObject *args) {
if (!(uint)PyLong_AsUnsignedLong(PyList_GetItem(_CRC_TABLE_, (Py_ssize_t) 1))) {
PyObject *call_initialize_crc_table = PyObject_GetAttrString(self, "initialize_crc_table");
PyObject_CallObject(call_initialize_crc_table, NULL);
Py_DECREF(call_initialize_crc_table);
}
Py_RETURN_NONE;
}
I've added this to module_methods[] and it compiles without warnings or errors. When I run this method within the interpreter, I get a segfault. I assume it's because self isn't the module as an object.
I can do this as an alternative, which appears to work without issue:
static PyObject *method_calculate_crc(PyObject *self, PyObject *args) {
if (!(uint)PyLong_AsUnsignedLong(PyList_GetItem(_CRC_TABLE_, (Py_ssize_t) 1))) {
method_initialize_crc_table(self, NULL);
}
Py_RETURN_NONE;
}
However, I am not certain if I should be passing self, NULL, or something else to the method.
What is the proper way of invoking method_initialize_crc_table from method_calculate_crc?

There was a "gotcha" here that I must clarify on. While the code was intended for Python 3, development was initially done in Python 2 as the development files were not yet available on the machine I was using. This shed some light on some differences in how each version handles things. David's comments helped lead to this clarification.
If a method is defined as METH_VARARGS but is defined for a module (versus a class), Python 2 does not pass anything for the PyObject *self parameter. This is noted in the documentation but is easy to overlook if you're not careful. Python 3, however, does pass a pointer to the module. As DavidW recommended, I implemented a global variable to hold a reference to the module. Assuming his claims of Python handling the de-referencing at exit are correct, we can safely use this for accessing module global attributes.
With our issue of PyObject *self solved, we no longer get a segfault. We can then address the question of which approach is (seemingly more) correct for calling a method within the local scope of the module. Do we do this:
if (/* conditional */)
PyObject_CallMethod(module, "initialize_crc_table", NULL);
Or this:
if (/* conditional */)
method_initialize_crc_table(self, args, kwargs);
Benchmarks seem to provide an answer here. Using Python's built-in timeit module, we can see a very clear difference in terms of performance. Note that so far in our implementation, .calculate_crc accesses ._CRC_TABLE_ and checks if it's initialized, but no processing occurs. Performance compared to Python 2 and 3 were identical and thus ignored.
The command is as follows:
python3 -m timeit "import utilities; utilities.calculate_crc(0)"
PyObject_CallMethod: 874 nsec per loop
method_initialize_crc_table: 44.3 usec per loop
Using the PyObject_ function is reported as 50x faster, quite a significant difference. Benchmarks alone do not facilitate what is "more correct" but with no clear guidance it may be a sufficient justification for our use. Therefore, I will be using PyObject_ calls for this project.

Related

How can I unit test a specific DML method?

I'm writing some common DML code that contains a fairly complex method, something like:
saved uint32 checksum_ini;
method calculate_checksum(bytes_t data) -> (uint32 sum) {
uint32 result = checksum_ini;
for (int i = 0; i < data.size; ++i) {
result = f(result, data.data[i]);
}
return result;
}
My device calls the function indirectly by reading and writing some registers, which makes it cumbersome to unit test all corner cases of the checksum algorithm.
How can I efficiently write a unit test for my checksum implementation?
One approach is to create a dedicated test module, say test-checksum, containing a test device, say test_checksum_dev, that imports only your common code, and exposes the calculate_checksum method to Python, where it is easy to write tests. This is done in two steps: First, expose the method to C:
dml 1.4;
device test_checksum_dev;
import "checksum-common.dml";
// Make DML method calculate_checksum available as extern C symbol "calculate_checksum"
// The signature will be:
// uint64 calculate_checksum(conf_object_t *obj, bytes_t data)
export calculate_checksum as "calculate_checksum";
The second step is to expose it to Python. Create checksum.h:
#ifndef CHECKSUM_H
#define CHECKSUM_H
#include <simics/base/types.h>
#include <simics/pywrap.h>
extern uint32 calculate_checksum(conf_object_t *obj, bytes_t data);
#endif /* CHECKSUM_H */
(if you also add header %{ #include "checksum.h" %} to the DML file, you will get a hard check that signatures stay consistent).
Now add the header file to IFACE_FILES in your module makefile to create a Python wrapping:
SRC_FILES = test-checksum.dml
IFACE_FILES = checksum.h
include $(MODULE_MAKEFILE)
You can now call the DML method directly from your test:
SIM_load_module('test-checksum')
from simmod.test_checksum.checksum import calculate_checksum
obj = SIM_create_object('test_checksum_dev', 'dev', checksum_ini=0xdeadbeef)
assert calculate_checksum(obj, b'hello world') == (0xda39ba47).to_bytes(4, 'little')

pcap_getnonblock() returns -3

I am quite new to using pcap lib, so please bear with me.
I am trying to use pcap_getnonblock function, the documentation says the following:
pcap_getnonblock() returns the current 'non-blocking' state of
the capture descriptor; it always returns 0 on 'savefiles' . If
there is an error, PCAP_ERROR is returned and errbuf is filled in
with an appropriate error message.
errbuf is assumed to be able to hold at least PCAP_ERRBUF_SIZE
chars.
I got -3 returned and the errbuf is an empty string, I couldn't understand the meaning of such result.
I believe this caused a socket error: 10065.
This problem happened only once and I could not reproduce it, but still it would be great to find its causing to prevent it in future executions.
Thanks in advance.
pcap_getnonblock() can return -3 - that's PCAP_ERROR_NOT_ACTIVATED. Unfortunately, that's not documented; I'll fix that.
Here's a minimal reproducible example that demonstrates this:
#include <pcap/pcap.h>
#include <stdio.h>
int
main(int argc, char **argv)
{
pcap_t *pcap;
char errbuf[PCAP_ERRBUF_SIZE];
if (argc != 2) {
fprintf(stderr, "Usage: this_program <interface_name>\n");
return 1;
}
pcap = pcap_create(argv[1], errbuf);
if (pcap == NULL) {
fprintf(stderr, "this_program: pcap_create(%s) failed: %s\n",
argv[1], errbuf);
return 2;
}
printf("pcap_getnonblock() returns %d on non-activated pcap_t\n",
pcap_getnonblock(pcap, errbuf));
return 0;
}
(yes, that's minimal, as 1) names of interfaces are OS-dependent, so it has to be a command-line argument and 2) if you don't run the program correctly, it should let you know what's happening, so you know what you have to do in order to reproduce the problem).
Perhaps pcap_getnonblock() and pcap_setnonblock() should be changed so that you can set non-blocking mode before activating the pcap_t, so that, when activated, it will be in non-blocking mode. It doesn't work that way currently, however.
I.e., you're allocating a pcap_t with pcap_create(), but you're not activating it with pcap_activate(). You need to do both in order to have a pcap_t on which you can capture.

std::lock_guard (mutex) produces deadlock

First: Thanks for reading this question and tryin' to help me out. I'm new to the whole threading topic and I'm facing a serious mutex deadlock bug right now.
Short introduction:
I wrote a game engine a few months ago, which works perfectly and is being used in games already. This engine is based on SDL2. I wanted to improve my code by making it thread safe, which would be very useful to increase performance or to play around with some other theoretical concepts.
The problem:
The game uses internal game stages to display different states of a game, like displaying the menu, or displaying other parts of the game. When entering the "Asteroid Game"-stage I recieve an exception, which is thrown by the std::lock_guard constructor call.
The problem in detail:
When entering the "Asteroid Game"-stage a modelGetDirection() function is being called to recieve a direction vector of a model. This function uses a lock_guard to make this function being thread safe. When debugging this code section this is where the exception is thrown. The program would enter this lock_guard constructor and would throw an exception. The odd thing is, that this function is NEVER being called before. This is the first time this function is being called and every test run would crash right here!
this is where the debugger would stop in threadx:
inline int _Mtx_lockX(_Mtx_t _Mtx)
{ // throw exception on failure
return (_Check_C_return(_Mtx_lock(_Mtx)));
}
And here are the actual code snippets which I think are important:
mutex struct:
struct LEMutexModel
{
// of course there are more mutexes inside here
mutex modelGetDirection;
};
engine class:
typedef class LEMoon
{
private:
LEMutexModel mtxModel;
// other mutexes, attributes, methods and so on
public:
glm::vec2 modelGetDirection(uint32_t, uint32_t);
// other methods
} *LEMoonInstance;
modelGetDirection() (engine)function definition:
glm::vec2 LEMoon::modelGetDirection(uint32_t id, uint32_t idDirection)
{
lock_guard<mutex> lockA(this->mtxModel.modelGetDirection);
glm::vec2 direction = {0.0f, 0.0f};
LEModel * pElem = this->modelGet(id);
if(pElem == nullptr)
{pElem = this->modelGetFromBuffer(id);}
if(pElem != nullptr)
{direction = pElem->pModel->mdlGetDirection(idDirection);}
else
{
#ifdef LE_DEBUG
char * pErrorString = new char[256 + 1];
sprintf(pErrorString, "LEMoon::modelGetDirection(%u)\n\n", id);
this->printErrorDialog(LE_MDL_NOEXIST, pErrorString);
delete [] pErrorString;
#endif
}
return direction;
}
this is the game function that uses the modelGetDirection method! This function would control a space ship:
void Game::level1ControlShip(void * pointer, bool controlAble)
{
Parameter param = (Parameter) pointer;
static glm::vec2 currentSpeedLeft = {0.0f, 0.0f};
glm::vec2 speedLeft = param->engine->modelGetDirection(MODEL_VERA, LEFT);
static const double INCREASE_SPEED_LEFT = (1.0f / VERA_INCREASE_LEFT) * speedLeft.x * (-1.0f);
// ... more code, I think that's not important
}
So as mentioned before: When entering the level1ControlShip() function, the programm will enter the modelGetDirection() function. When entering the modelGetDirection() function an exception will be thrown when tryin' to call:
lock_guard<mutex> lockA(this->mtxModel.modelGetDirection);
And as mentioned, this is the first call of this function in the whole application run!
So why is that? I appreciate any help here! The whole engine (not the game) is an open source project and can be found on gitHub in case I forgot some important code snippets (sorry! in that case):
GitHub: Lynar Moon Engine
Thanks for your help!
Greetings,
Patrick

Perl XS garbage collection

I had to deal with a really old codebase in my company which had C++ apis exposed via perl.
In on of the code reviews, I suggested it was necessary to garbage collect memory which was being allocated in c++.
Here is the skeleton of the code:
char* convert_to_utf8(char *src, int length) {
.
.
.
length = get_utf8_length(src);
char *dest = new char[length];
.
.
// No delete
return dest;
}
Perl xs definition:
PROTOTYPE: ENABLE
char * _xs_convert_to_utf8(src, length)
char *src
int length
CODE:
RETVAL = convert_to_utf8(src, length)
OUTPUT:
RETVAL
so, I had a comment that the memory created in the c++ function will not garbage collected by Perl. And 2 java developers think it will crash since perl will garbage collect the memory allocated by c++. I suggested the following code.
CLEANUP:
delete[] RETVAL
Am I wrong here?
I also ran this code and showed them the increasing memory utilization, with and without the CLEANUP section. But, they are asking for exact documentation which proves it and I couldn't find it.
Perl Client:
use ExtUtils::testlib;
use test;
for (my $i=0; $i<100000000;$i++) {
my $a = test::hello();
}
C++ code:
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "ppport.h"
#include <stdio.h>
char* create_mem() {
char *foo = (char*)malloc(sizeof(char)*150);
return foo;
}
XS code:
MODULE = test PACKAGE = test
char * hello()
CODE:
RETVAL = create_mem();
OUTPUT:
RETVAL
CLEANUP:
free(RETVAL);
I'm afraid that the people who wrote (and write) the Perl XS documentation probably consider it too obvious that Perl cannot magically detect memory allocation made in other languages (like C++) to document that explicitly. There's a bit in the perlguts documentation page that says that all memory to be used via the Perl XS API must use Perl's macros to do so that may help you argue.
When you write XS code, you're writing C (or sometimes C++) code. You still need to write proper C/C++, which includes deallocating allocated memory when appropriate.
The glue function you desire XS to create is the following:
void hello() {
dSP; // Declare and init SP, the stack pointer used by mXPUSHs.
char* mem = create_mem();
mXPUSHs(newSVpv(mem, 0)); // Create a scalar, mortalize it, and push it on the stack.
free(mem); // Free memory allocated by create_mem().
XSRETURN(1);
}
newSVpv makes a copy of mem rather than taking possession of it, so the above clearly shows that free(mem) is needed to deallocate mem.
In XS, you could write that as
void hello()
CODE:
{ // A block is needed since we're declaring vars.
char* mem = create_mem();
mXPUSHs(newSVpv(mem, 0));
free(mem);
XSRETURN(1);
}
Or you could take advantage of XS features such as RETVAL and CLEANUP.
SV* hello()
char* mem; // We can get rid of the block by declaring vars here.
CODE:
mem = create_mem();
RETVAL = newSVpv(mem, 0); // Values returned by SV* subs are automatically mortalized.
OUTPUT:
RETVAL
CLEANUP: // Happens after RETVAL has been converted
free(mem); // and the converted value has been pushed onto the stack.
Or you could also take advantage of the typemap, which defines how to convert the returned value into a scalar.
char* hello()
CODE:
RETVAL = create_mem();
OUTPUT:
RETVAL
CLEANUP:
free(RETVAL);
All three of these are perfectly acceptable.
A note on mortals.
Mortalizing is a delayed reference count decrement. If you were to decrement the SV created by hello before hello returns, it would get deallocated before hello returns. By mortalizing it instead, it won't be deallocated until the caller has a chance to inspect it or take possession of it (by increasing its reference count).

Page fault with newlib functions

I've been porting newlib to my very small kernel, and I'm stumped: whenever I include a function that references a system call, my program will page fault on execution. If I call a function that does not reference a system call, like rand(), nothing will go wrong.
Note: By include, I mean as long as the function, e.g. printf() or fopen(), is somewhere inside the program, even if it isn't called through main().
I've had this problem for quite some time now, and have no idea what could be causing this:
I've rebuilt newlib numerous times
Modified my ELF loader to load the
code from the section headers instead of program headers
Attempted to build newlib/libgloss separately (which failed)
Linked the libraries (libc, libnosys) through the ld script using GROUP, gcc and ld
I'm not quite sure what other information I should include with this, but I'd be happy to include what I can.
Edit: To verify, the page faults occurring are not at the addresses of the failing functions; they are elsewhere in the program. For example, when I call fopen(), located at 0x08048170, I will page fault at 0xA00A316C.
Edit 2:
Relevant code for loading ELF:
int krun(u8int *name) {
int fd = kopen(name);
Elf32_Ehdr *ehdr = kmalloc(sizeof(Elf32_Ehdr*));
read(fd, ehdr, sizeof(Elf32_Ehdr));
if (ehdr->e_ident[0] != 0x7F || ehdr->e_ident[1] != 'E' || ehdr->e_ident[2] != 'L' || ehdr->e_ident[3] != 'F') {
kfree(ehdr);
return -1;
}
int pheaders = ehdr->e_phnum;
int phoff = ehdr->e_phoff;
int phsize = ehdr->e_phentsize;
int sheaders = ehdr->e_shnum;
int shoff = ehdr->e_shoff;
int shsize = ehdr->e_shentsize;
for (int i = 0; i < pheaders; i++) {
lseek(fd, phoff + phsize * i, SEEK_SET);
Elf32_Phdr *phdr = kmalloc(sizeof(Elf32_Phdr*));
read(fd, phdr, sizeof(Elf32_Phdr));
u32int page = PMMAllocPage();
int flags = 0;
if (phdr->p_flags & PF_R) flags |= PAGE_PRESENT;
if (phdr->p_flags & PF_W) flags |= PAGE_WRITE;
int pages = (phdr->p_memsz / 0x1000) + 1;
while (pages >= 0) {
u32int mapaddr = (phdr->p_vaddr + (pages * 0x1000)) & 0xFFFFF000;
map(mapaddr, page, flags | PAGE_USER);
pages--;
}
lseek(fd, phdr->p_offset, SEEK_SET);
read(fd, (void *)phdr->p_vaddr, phdr->p_filesz);
kfree(phdr);
}
// Removed: code block that zeroes .bss: it's already zeroed whenever I check it anyways
// Removed: code block that creates thread and adds it to scheduler
kfree(ehdr);
return 0;
}
Edit 3: I've noticed that if I call a system call, such as write(), and then call printf() two or more times, I will get an unknown opcode interrupt. Odd.
Whoops! Figured it out: when I map the virtual address, I should allocate a new page each time, like so:
map(mapaddr, PMMAllocPage(), flags | PAGE_USER);
Now it works fine.
For those curious as to why it didn't work: when I wasn't including printf(), the size of the program was under 0x1000 bytes, so mapping with only one page was okay. When I include printf() or fopen(), the size of the program was much bigger so that's what caused the issue.