When profiling code with cprofile, I noticed that there appear a lot of numba entries referring to the numba compiler. I know that the computation time is high at the first run of the script due to compiling.
The attached screenshot however shows cprofile entries from a script which already should be compiled (The script was run at least twice and also the total computation time decreased).
Because the functions shown in the screenshot still consume a lot of time, I asked myself if I do anything wrong using numba. Additionally, it does not make sense to call a compiler function if the code is already compiled. Is there a way to find out if compiled functions are already in storage?
Related
I have a problem when try to debug the code which is copied to the SRAM and executed from there.
The code is overwriting the data - but it is done only during the system update. The sections where code is placed are correctly defined in the linker script file and the debuger correctly see the addresses. But when I step into the function (and the code in RAM is the correct one) it does not connect the source files with the code executed in the memory.
Do you know how can it be done. Debugging C code on the assembler level is not something which makes me happy :)
Any help appreciated.
The problem is a bit silly. When you call RAM function from the FLASH (the first call has to be done this way) it has to be done by the veneer. It was messing up the debugger. But having own calling macro (because of the distance it has to be done via the pointer) everything works fine
example calling macro.
#define RAMFCALL(func, ...) {unsigned (* volatile fptr)() = (unsigned (* volatile)())func; fptr(__VA_ARGS__);}
I'm working on a project based on the stm32f4discovery board using IAR Embedded Workbench (though I'm very close to the 32kb limit on the free version so I'll have to find something else soon). This is a learning project for me and so far I've been able to solve most of my issues with a few google searches and a lot of trial and error. But this is the first time I've encountered a run-time error that doesn't appear to be caused by a problem with my logic and I'm pretty stuck. Any general debugging strategy advice is welcome.
So here's what happens. I have an interrupt on a button; each time the button is pressed, the callback function runs my void cal_acc(uint16_t* data) function defined in stm32f4xx_it.c. This function gathers some data, and on the 6th press, it calls my void gn(float32_t* data, float32_t* beta) function. Eventually, two functions are called, gn_resids and gn_jacobian. The functions are very similar in structure. Both take in 3 pointers to 3 arrays of floats and then modify the values of the first array based on the second two. Unfortunately, when the second function gn_jacobian exits, I get the HardFault.
Please look at the link (code structure) for a picture showing how the program runs up to the fault.
Thank you very much! I appreciate any advice or guidance you can give me,
-Ben
Extra info that might be helpful below:
Running in debug mode, I can step into the function and run through all the lines click by click and it's OK. But as soon as I run the last line and it should exit and move on to the next line in the function where it was called, it crashes. I have also tried rearranging the order of the calls around this function and it is always this one that crashes.
I had been getting a similar crash on the first function gn_resids when one of the input pointers pointed to an array that was not defined as "static". But now all the arrays are static and I'm quite confused - especially since I can't tell what is different between the gn_resids function that works and the gn_jacobian function that does not work.
acc1beta is declared as a float array at the beginning of main.c and then also as extern float32_t acc1beta[6] at the top of stm32f4xx_it.c. I want it as a global variable; there is probably a better way to do this, but it's been working so far with many other variables defined in the same way.
Here's a screenshot of what I see when it crashes during debug (after I pause the session) IAR view at crash
EDIT: I changed the code of gn_step to look like this for a test so that it just runs gn_resids twice and it crashes as soon as it gets to the second call - I can't even step into it. gn_jacobian is not the problem.
void gn_step(float32_t* data, float32_t* beta) {
static float32_t resids[120];
gn_resids(resids, data, beta);
arm_matrix_instance_f32 R;
arm_mat_init_f32(&R, 120, 1, resids);
// static float32_t J_f32[720];
// gn_jacobian(J_f32, data, beta);
static float32_t J_f32[120];
gn_resids(J_f32, data, beta);
arm_matrix_instance_f32 J;
arm_mat_init_f32(&J, 120, 1, J_f32);
Hardfaults on Cortex M devices can be generated by various error conditions, for example:
Access of data outside valid memory
Invalid instructions
Division by zero
It is possible to gather information about the source of the hardfault by looking into some processor registers. IAR provides a debugger macro that helps to automate that process. It can be found in the IAR installation directory arm\config\debugger\ARM\vector_catch.mac. Please refer to this IAR Technical Note on Debugging Hardfaults for details on using this macro.
Depending on the type of the hardfault that occurs in your program you should try to narrow down the root cause within the debugger.
has anyone experience the following issue?
A stack variable getting changed/corrupted after calling ne10 assembly function such as ne10_len_vec2f_neon?
e.g
float gain = 8.0;
ne10_len_vec2f_neon(src, dst, len);
after the call to ne10_len_vec2f_neon, the value of gain changes as its memory is getting corrupted.
1. Note this only happens when the project is compiled in release build but not debug build.
2. Does Ne10 assembly functions preserve registers?
3. Replacing the assembly function call to c equivalent such as ne10_len_vec2f_c and both release and debug build seem to work OK.
thanks for any help on this. Not sure if there's an inherent issue within the program or it is really the call to ne10_len_vec2f_neon causing the corruption with release build.enter code here
I had a quick rummage through the master NEON code here:
https://github.com/projectNe10/Ne10/blob/master/modules/math/NE10_len.neon.s
... and it doesn't really touch address-based stack at all, so not sure it's a stack problem in memory.
However based on what I remember of the NEON procedure call standard q4-q7 (alias d8-d15 or s16-s31) should be preserved by the callee, and as far as I can tell that code is clobbering q4-6 without the necessary save/restore, so it does indeed look like it's clobbering the stack in registers.
In the failed case do you know if gain is still stored in FPU registers, and if yes which ones? If it's stored in any of s16/17/18/19 then this looks like the problem. It also seems plausible that a compiler would choose to use s16 upwards for things it needs to keep across a function call, as it avoids the need to touch in-RAM stack memory.
In terms of a fix, if you perform the following replacements:
s/q4/q8/
s/q5/q9/
s/q6/q10/
in that file, then I think it should work; no means to test here, but those higher register blocks are not callee saved.
I call from my C++ code a DLL that was written in MATLAB.
I observe a strange effect: the first call takes much more time that the next calls.
It takes 3-4 times more.
Is it normal?
Is it possible to do something with it?
Yes that is normal, the delay comes from starting up the MATLAB Runtime Compiler. This is what runs the MATLAB code from the dll that you created through MATLAB. The initial startup cannot be avoided AFAIK, but you can maybe add a dummy call to the DLL when your application begins in order to avoid the "cost" later.
Anecdotally, our builds seem slower after enabling these options. I've searched online a bit and tried to do some comparisons but found nothing conclusive. Wondering if anyone knows offhand.
A great way to answer your own question is to try and measure it. For instance, I tried to compile with SBT (which gives the build time in seconds). I took a medium sized project (78 scala source files) that I tried to compile with and without the flags. I started by doing 3 clean/compile invocations to warm-up the disks (be sure that everything is cached properly by the controller and the OS). Then I measured 3 time the build time to get an average.
For both cases (with and without the flags), the build time was identical. However, it is interesting to note that the first warm-up build was really slow: almost 7x slower ! Therefore it is very difficult to rely on impressions, because the build time will be dominated by the way you access your source files.
Unless your desktop is a teletype with particularly slow electromechanical relay switches, you're safe - it does the same work either way, so if there were a difference it'd be in how long it takes to display the deprecation/unchecked warnings.