STM32F103 RAM issue with FreeRTOS+Trace

I am just starting with FreeRTOS and I am having a problem with a task, so I thought it was the best time to start learning debugging.
Trying to use the Trace library to assess the situation, I got stuck on the compilation process.
I am using CooCox IDE with ST-LinkV2.
Target device is STM32F103C8T6.
FreeRTOS is V8.2.2.
Tracealyzer Recorder Library is v2.7.7.
Error is:
[cc] c:/arm_development/gcc-arm-none-eabi-4_9-2015q1-20150306-win32/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/bin/ld.exe: FreeRTOSDemo.elf section `.bss' will not fit in region `ram'
[cc] c:/arm_development/gcc-arm-none-eabi-4_9-2015q1-20150306-win32/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/bin/ld.exe: region ram overflowed with stack
[cc] c:/arm_development/gcc-arm-none-eabi-4_9-2015q1-20150306-win32/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/bin/ld.exe: region `ram' overflowed by 6000 bytes
[cc] collect2.exe: error: ld returned 1 exit status
BUILD FAILED
Total time: 11 seconds
Any hints on that matter would be helpful, thanks in advance.

This is a basic tools question, not a FreeRTOS or FreeRTOS+Trace question, although you can fix it by changing the FreeRTOS configuration and/or FreeRTOS+Trace configuration.
The error is telling you that you have tried to use more RAM than the part you are using actually has, or at least more than the amount of RAM you have told the linker your part has.
If you look at the map file for your application you will see which variables are consuming RAM. Probably the single largest will be the FreeRTOS heap. The FreeRTOS documentation tells you how to reduce that. Probably the second largest will be the trace buffer, and the trace configuration header file contains lots of documentation that will tell you how to reduce that.
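As a hedged sketch (the exact macro names and values must be checked against your own FreeRTOSConfig.h and the recorder's trcConfig.h; the numbers below are only illustrative), trimming the two biggest consumers usually comes down to a few defines:
/* FreeRTOSConfig.h: with the heap_1..heap_4 allocators the heap is a static array of this many bytes in .bss */
#define configTOTAL_HEAP_SIZE    ( ( size_t ) ( 8 * 1024 ) )
/* trcConfig.h (v2.x recorder): fewer buffered events and smaller tables shrink the trace recorder's RAM footprint */
#define EVENT_BUFFER_SIZE        1000
#define SYMBOL_TABLE_SIZE        400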

Related

MATLAB error message on startup

For the last couple of days I have constantly been receiving the same error while using MATLAB; it happens at some point inside dlopen. I am pretty new to MATLAB, which is why I don't know what to do, and Google doesn't seem to be helping me either. When I try to compute eigenvectors, I get this:
Error using eig
LAPACK loading error:
dlopen: cannot load any more object with static TLS
I also get this while making a multiplication:
Error using *
BLAS loading error:
dlopen: cannot load any more object with static TLS
I did of course look for solutions to this problem, but I don't understand much of it and don't know what to do. These are the threads I found:
How do I use the BLAS library provided by MATLAB?
http://www.mathworks.de/de/help/matlab/matlab_external/calling-lapack-and-blas-functions-from-mex-files.html
Can someone help me please?
Examples of function calls demonstrating this error
>> randn(3,3)
ans =
2.7694 0.7254 -0.2050
-1.3499 -0.0631 -0.1241
3.0349 0.7147 1.4897
>> eig(ans)
Error using eig
LAPACK loading error:
dlopen: cannot load any more object with static TLS
That's MATLAB bug no. 961964, known since R2012b (8.0). MATLAB dynamically loads some libs with static TLS (thread local storage; e.g. see the gcc compiler flag -ftls-model). Loading too many such libs => no space left.
Until now MathWorks' only workaround is to load the important(!) libs first by using them early (they suggest putting "ones(10)*ones(10);" in startup.m). I'd better not comment on this "solution strategy".
Since R2013b (8.2.0.701) with Linux x86_64 my experience is: don't use "doc" (the graphical help system)! I think this doc utility (libxul, etc.) uses a lot of static TLS memory.
Here is an update (2013/12/31)
All the following tests were done with Fedora 20 (with glibc-2.18-11.fc20) and Matlab 8.3.0.73043 (R2014a Prerelease).
For more information on TLS, see
Ulrich Drepper, ELF handling For Thread-Local Storage, Version 0.21, 2013,
currently available at Akkadia and Redhat.
What happens exactly?
MATLAB dynamically (with dlopen) loads several libraries that need TLS initialization. All those libs need a slot in the dtv (dynamic thread vector). Because MATLAB loads several of these libs dynamically at runtime, the linker (at MathWorks) had no chance at compile/link time to count the slots needed (that's the important part). Now it's the task of the dynamic lib loader to handle such a case at runtime. But this is not easy. To cite dl-open.c:
For static TLS we have to allocate the memory here and
now. This includes allocating memory in the DTV. But we
cannot change any DTV other than our own. So, if we
cannot guarantee that there is room in the DTV we don't
even try it and fail the load.
There is a compile-time constant (called DTV_SURPLUS; see glibc-source/sysdeps/generic/ldsodefs.h) in glibc's dynamic lib loader that reserves a number of additional slots for such a mess (dynamically loading libs with static TLS in a multithreading program). In the glibc version of Fedora 20 this value is 14.
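To make "a lib with static TLS" concrete, here is a hedged sketch (not MathWorks' code; the names are made up) of how such a library looks at the source level. The tls_model attribute, like the -ftls-model flag mentioned above, is what forces a static DTV slot:
/* tls_example.c - build e.g. with: gcc -shared -fPIC -ftls-model=initial-exec tls_example.c -o libtls_example.so */
/* The initial-exec model needs a static TLS slot at load time; global-dynamic would not. */
__thread int per_thread_counter __attribute__((tls_model("initial-exec"))) = 0;

int bump(void) {
    return ++per_thread_counter;   /* each thread sees its own copy */
}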
Here are the first libs (running MATLAB) that needed dtv slots in my case:
matlabroot/bin/glnxa64/libut.so
/lib64/libstdc++.so.6
/lib64/libpthread.so.0
matlabroot/bin/glnxa64/libunwind.so.8
/lib64/libuuid.so.1
matlabroot/sys/java/jre/glnxa64/jre/lib/amd64/server/libjvm.so
matlabroot/sys/java/jre/glnxa64/jre/lib/amd64/libfontmanager.so
matlabroot/sys/java/jre/glnxa64/jre/lib/amd64/libt2k.so
matlabroot/bin/glnxa64/mkl.so
matlabroot/sys/os/glnxa64/libiomp5.so
/lib64/libasound.so.2
matlabroot/sys/jxbrowser/glnxa64/xulrunner/xulrunner-linux-64/libxul.so
/lib64/libselinux.so.1
/lib64/libpixman-1.so.0
/lib64/libEGL.so.1
/lib64/libGL.so.1
/lib64/libglapi.so.0
Yes, more than 14 => too many => no slot left in the dtv. That's what the error message tries to tell us, and especially MathWorks.
For the record: in order not to violate MATLAB's license I didn't debug, decompile or disassemble any part of the binaries shipped with MATLAB. I only debugged the free and open glibc binaries of Fedora 20 that MATLAB was using to dynamically load the libs.
What can be done, to solve this problem?
There are 3 options:
(a)
Rebuild MATLAB and do not dynamically load those libs (with the initial-exec TLS model); instead, link against them (then the linker can count the required slots!).
(b)
Rebuild those libs and ensure they do NOT use the initial-exec TLS model.
(c)
Rebuild glibc and increase DTV_SURPLUS in
glibc/sysdeps/generic/ldsodefs.h
Obviously, options (a) and (b) can only be done by MathWorks.
For option (c) no MATLAB source is needed, so it can be done without MathWorks.
What is the status at mathworks?
I really tried to explain this to the "MathWorks Technical Support Department". But my impression is: they don't understand me. They closed my support ticket and suggested a telephone(!) conversation in January 2014 with a technical support manager.
I'll do my very best to explain this, but to be honest: I'm not very confident.
Update (2014/01/10): Currently MathWorks is trying option (b).
Update (2014/03/19): For the file libiomp5.so you can download a newly compiled version (without static TLS) from MathWorks, bug report 961964. And the other libs? No improvement there. So don't be surprised to get "dlopen: cannot load any more object with static TLS" with "doc"; e.g. see bug report 1003952.
Restarting Matlab solved the problem for me.
Long story short: in the directory that you start MATLAB from, create a file startup.m with the content ones(10)*ones(10);. Restart MATLAB and it will be taken care of.
This is, as I find, an age-old problem yet unsolved by MathWorks.
Here are my two cents, which worked for me (when I wanted IT++ external libraries, with MEX).
Say the library that you found to be the cause of the problem is "libXYZ.so", and that you know where it lies on your system.
The solution is to tell MATLAB to load that specific library at the very beginning of its startup. The reason for this error is apparently a lack of slots for thread local storage (TLS), because they have already been filled up.
Because the latest compilation suddenly requires a new library that was not loaded earlier during startup, MATLAB throws this error.
It is a pity that MathWorks has not cared to resolve this problem for so long.
Fortunately, the solution is a single, very simple terminal command.
Typical steps on a Linux machine are as follows:
Open command prompt (Ctrl+Alt+T in Ubuntu)
Issue the following command
export LD_PRELOAD=<PATH-TO-libxyz.so>
e.g.: export LD_PRELOAD=/usr/local/lib/libitpp.so
Start matlab from the same terminal
matlab &
Running your program now should resolve the issue, as it did in my case.
Good luck!
Reference:
[1] http://au.mathworks.com/matlabcentral/answers/125117-openmp-mex-files-static-tls-problem
http://www.mathworks.de/support/bugreports/961964 has been updated on 30/01/2014.
There is a zip file attached with libiomp5.so
I tested it on Mageia 4 x86_64 with Matlab R2013b.
I can now use the Documentation of Matlab to open a demo without any problem.
I had the same problem and I think I just solved it.
When installing MATLAB, use the custom installation (I did not do this the first time). Choose to create symbolic links to MATLAB scripts in the predefined folder (/usr/local/bin). This did the trick for me!
I had the same problem with both Matlab 2013b and Matlab 2014a. The fix provided by MathWorks for libiomp5.so only removed the problem of LAPACK not working. However, I could not use external libraries which use OpenMP (such as VL_FEAT): I still get the error
"dlopen: cannot load any more object with static TLS."
The only thing which worked for me was downgrading to Matlab 2012b.
I came across this problem after "bar" (for bar plots) with an array gave me just a single blue block, with no errors thrown. A reboot at first solved the problem. But after a memory error (after processing a very large file), I just cannot get past this blue block problem.
Using "hist" on a matrix input gives me the "BLAS loading error" problem and led me to this thread. The MathWorks workaround fixed the hist and bar problems.
Just wanted to bring recognition to the extent of this bug's influence.
I had the same problem and solved it by increasing my Java heap memory. Go to Preferences > General > Java Heap Memory and increase the allocated memory.
Increasing the Java heap memory (to 512 MB) also worked for me on R2013b/Ubuntu 12.04. The "BLAS loading error" began when I processed an 11 GB file (with 16 GB of RAM) and has not recurred after increasing the Java heap memory and restarting MATLAB.

Arduino flash memory limit

I have a project on an Arduino Uno, and I am building it from Eclipse. The AVR toolchain gives me this:
avrdude: 24348 bytes of flash written
avrdude: verifying flash memory against SunAngles.hex:
avrdude: load data flash data from input file SunAngles.hex:
avrdude: input file SunAngles.hex auto detected as Intel Hex
avrdude: input file SunAngles.hex contains 24348 bytes
avrdude: reading on-chip flash data:
Reading | ################################################## | 100%
3.45s
avrdude: verifying ...
avrdude: 24348 bytes of flash verified
avrdude done. Thank you.
The serial monitor does not print anything. If I make the project 23999 bytes, then the serial monitor works. I have checked both Eclipse's serial monitor and the Arduino IDE's serial monitor; they have the same problem. The Arduino site says that the Arduino Uno has 32 KB of flash memory and that 0.5 KB is used by the bootloader. What is happening?
In another question someone says to use Serial.print(F(something));, and they give a library for pgm. What should I do to solve this problem?
Don't forget the small size of RAM: the 328's 2 KB. You may just be running out of RAM. I learned that when it runs out, it just kind of sits there, and at first it really looked like a flash boundary problem, just like your symptom.
I suggest reading the readme of the library that provides freeMemory() to check the free RAM. It mentions how Serial.print can consume both RAM and ROM.
I always now use
Serial.print(F("HELLO"));
versus
Serial.print("HELLO");
as it saves RAM, and this should be true for lcd.print too. I always put a
Serial.println(freeMemory(), DEC); // Print how much RAM is available.
at the beginning of the code and pay attention to it, noting that there needs to be room to run the actual code and recurse into it.
The F() is now stock in Arduino 1.0 and replaces the need for the library function getPSTR().
The latest Arduino IDE also indicates a very rough estimate of expected RAM usage, so there is a switch for that in avr-gcc. You may also want to try avr-gcc 4.7.0 rather than 4.3.2 (stock for Arduino), as it is said to optimize better.
In case anyone still has similar issues, to equip yourself: please read the blog post Optimizing SRAM on managing the limited Arduino memory.
From there, you will get a few things to keep in mind as you develop your sketch.
Avoid global variables as much as you possibly can; keep them local to their functions (see the sketch below).
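A small illustrative sketch of that advice (hypothetical; the freeMemory() call assumes the free-memory library mentioned above is installed): string literals go to flash via F(), and the buffer stays local to the function instead of being a global.
#include <MemoryFree.h>            // assumption: provides freeMemory()

void printReport() {
  char line[32];                   // local buffer: only uses RAM while the function runs
  snprintf(line, sizeof(line), "millis=%lu", millis());
  Serial.print(F("Report: "));     // F() keeps the literal in flash, not SRAM
  Serial.println(line);
}

void setup() {
  Serial.begin(9600);
  Serial.println(freeMemory(), DEC);   // print how much RAM is available
}

void loop() {
  printReport();
  delay(1000);
}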

Determining whether Renderscript is running on CPU/GPU & Number of Threads

I can't seem to find any documentation on how to check if RenderScript is actually parallelizing code. I'd like to know whether the CPU or GPU is being utilized and the number of threads dispatched.
The only thing I've found is this bug report:
http://code.google.com/p/android/issues/detail?id=28662
The author mentions that putting rsForEach in the script resulted in it being serialized, pointing to the following debug output:
01-02 00:21:59.960: D/RenderScript(1256): = 0 0x0
01-02 00:21:59.976: D/RenderScript(1256): = 1 0x1
I tried searching for a similar string in LogCat, but I haven't been able to find a match.
Any thoughts?
Update:
Actually, I seem to have figured it out; it just seems that my LogCat-fu isn't as good as it should be. I filtered the debug output by my application's information and found lines like these:
02-26 22:30:05.657: V/RenderScript(26113): rsContextCreate dev=0x5cec0458
02-26 22:30:05.735: V/RenderScript(26113): 0x5d9f63b8 Launching thread(s), CPUs 2
This will only tell you how many CPUs could be used. It will not indicate how many threads are dispatched or which processor is being used; by design, RS avoids exposing this information.
In general RS will use all the available CPU cores unless you call a "serial" function such as the rsg* or time functions. As to what criteria will result in a script being punted from the GPU to the CPU, this will vary depending on the abilities of each vendor's GPU.
The bug you referenced has been fixed in 4.1.
I came across the same issue when I was working with RS. I used a Nexus 5 for my testing. I found that the initial launch of RS utilized the CPU instead of the GPU; this was verified using the Trepn 5.0s application. Later I found that the Nexus 5 GPU doesn't support double precision (link to Adreno 330), so by default it is ported onto the CPU. To overcome this I used #pragma rs_fp_relaxed at the top of my .rs file along with the header declarations.
So if you strictly want to port it onto the GPU, it may be best to find out your mobile GPU's specs, try the above trick, and measure GPU utilization using Trepn 5.0s or an equivalent application. As of now RS doesn't expose thread-level details, but during the implementation we can utilize the x and y arguments of our root kernel as thread indexes.
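For reference, a hedged sketch of a minimal .rs file using that pragma (the package name is a placeholder, not from the question):
// minimal_kernel.rs - sketch only
#pragma version(1)
#pragma rs java_package_name(com.example.rsdemo)   // placeholder package
#pragma rs_fp_relaxed   // relaxed float precision, so the kernel is not forced onto the CPU

// Simple per-element kernel; the x argument is the element index within the allocation.
float __attribute__((kernel)) scale(float in, uint32_t x) {
    return in * 2.0f;
}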
Debugging properties
RenderScript includes the debug.rs.default-CPU-driver and debug.rs.script debugging properties.
debug.rs.default-CPU-driver
Values = 0 or 1
Default value = 0
If set to 1, the Android Open Source Project (AOSP) implementation of RenderScript Compute
is used. This does not use any GPU features.
debug.rs.script
Values = 0 or 1
Default value = 0
If set to 1, additional diagnostic information is printed in the logcat. This information includes
the actual device a kernel is running on, either GPU or application processor.
If a kernel cannot be run on the GPU more detailed information is provided explaining why. For
example:
[RS-DIAG] No support for recursive calls on GPU
Source: Arm® Mali™ RenderScript Best Practices (PDF)
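As a usage sketch (assuming a device/build where the shell user may set debug.* properties), the two properties above are typically set over adb and the app restarted:
adb shell setprop debug.rs.default-CPU-driver 1    # force the AOSP CPU implementation
adb shell setprop debug.rs.script 1                # print per-kernel diagnostics to logcat
adb logcat | grep RenderScript                     # watch for lines such as [RS-DIAG] ...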

GDB error "invalid offset, value too big (0x00000400)": unable to build app in debug mode, need help

I have an app which was working fine a few days ago, but today I'm getting this error:
{standard input}:1948:invalid offset, value too big (0x00000400)
Command /Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/gcc-4.2 failed with exit code 1
OK guys, after a lot of troubleshooting I finally found the solution. A big switch-case was the problem. Converting it into an if-else statement solved the problem.
I had a similar issue today while I was writing an assembly routine for an ARM Cortex-M0 processor. In my case, the code that caused the error looked like this:
ldr r7 ,=PRTCNFG_10
This is a pseudo-instruction that loads the value of the constant PRTCNFG_10 (defined using the .equ directive) into register r7. The pseudo-instruction will be translated into
ldr r7 ,[pc, #immed8]
where #immed8 is an 8-bit immediate value that is scaled by 4, giving a reach of 4*2^8 = 1024 bytes (the 0x400 in the error message). The definition of PRTCNFG_10 must therefore not be placed more than about 1 KB past the pc, otherwise the assembler will throw an error.
I solved the issue by explicitly allocating PRTCNFG_10 in memory:
PRTCNFG_10:
.word 0x606
Just saw the same issue, which also turned out to be caused by a switch case. It wasn't even that big (26 cases), and it had compiled fine in the past, but for some reason it started to fail today. Replacing it with if-else solved the weird GCC error.
While this question is not strictly about assembler, it pops up in web searches about this specific error often enough that I'd like to add an answer that should be helpful to people programming in assembler.
The assembler syntax is LDR REG, =SOMETHING.
If that SOMETHING is >16 bits, we have a problem, because Thumb doesn't have 32-bit immediates. To fix this, the assembler remembers the constant and replaces the statement with a PC-relative load from a literal pool that is less than 0x400 bytes away (more than that doesn't fit in the instruction).
You then say
.ltorg
someplace convenient (e.g. right behind the next bx lr or pop {pc}) to direct the assembler to place these constants there.
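A hedged sketch in GNU assembler syntax (the label and constant are made up) of where that directive goes, so the PC-relative load stays within reach:
    .syntax unified
    .thumb
    .equ  MAGIC_VALUE, 0x12345678     @ made-up 32-bit constant that cannot be an immediate

get_magic:
    ldr   r0, =MAGIC_VALUE            @ assembler rewrites this as ldr r0, [pc, #offset]
    bx    lr
    .ltorg                            @ place the literal pool here, right behind the return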

The stack size used in kernel development

I'm developing an operating system, and rather than programming the kernel, I'm designing the kernel. This operating system is targeted at the x86 architecture, and my target is modern computers. The estimated amount of required RAM is 256 MB or more.
What is a good stack size for each thread running on the system? Should I try to design the system in such a way that the stack can be extended automatically if its maximum length is reached?
I think, if I remember correctly, that a page in RAM is 4 KB or 4096 bytes, and that just doesn't seem like a lot to me. I can definitely see times, especially when using lots of recursion, when I would want to have more than 1000 integers in RAM at once. Now, the real solution would be for the program to do this itself, using malloc and managing its own memory resources, but really I would like to know users' opinions on this.
Is 4 KB big enough for a stack with modern computer programs? Should the stack be bigger than that? Should the stack auto-expand to accommodate any size? I'm interested in this both from a practical developer's standpoint and from a security standpoint.
Is 4 KB too big for a stack? Considering normal program execution, especially from the point of view of classes in C++, I notice that good source code tends to malloc/new the data it needs when classes are created, to minimize the data being thrown around in a function call.
What I haven't even gotten into is the size of the processor's cache memory. Ideally, I think the stack would reside in the cache to speed things up, and I'm not sure whether I need to arrange this or whether the processor can handle it for me. I was just planning on using regular boring old RAM for testing purposes. I can't decide. What are the options?
Stack size depends on what your threads are doing. My advice:
make the stack size a parameter at thread creation time (different threads will do different things, and hence will need different stack sizes)
provide a reasonable default for those who don't want to be bothered with specifying a stack size (4K appeals to the control freak in me, as it will cause the stack-profligate to, er, get the signal pretty quickly)
consider how you will detect and deal with stack overflow. Detection can be tricky. You can put guard pages--empty--at the ends of your stack, and that will generally work. But you are relying on the behavior of the Bad Thread not to leap over that moat and start polluting what lies beyond. Generally that won't happen...but then, that's what makes the really tough bugs tough. An airtight mechanism involves hacking your compiler to generate stack-checking code. As for dealing with a stack overflow, you will need a dedicated stack somewhere else on which the offending thread (or its guardian angel, whoever you decide that is--you're the OS designer, after all) will run.
I would strongly recommend marking the ends of your stack with a distinctive pattern, so that when your threads run over the ends (and they always do), you can at least go in post-mortem and see that something did in fact run off its stack. A page of 0xDEADBEEF or something like that is handy; see the sketch just after this list.
By the way, x86 page sizes are generally 4 KB, but they do not have to be; with large pages you can go to 2 MB or 4 MB. The usual reason for larger pages is to avoid TLB misses. Again, I would make it a kernel configuration or run-time parameter.
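A hedged sketch of that stack-marking idea in C (all names are hypothetical, not from any particular kernel): paint a new thread's stack with the pattern at creation time, then scan for it post-mortem to see how close the thread came to the end.
#include <stddef.h>
#include <stdint.h>

#define STACK_PATTERN 0xDEADBEEFu      /* the distinctive fill value suggested above */

/* Fill the whole stack before the thread first runs (hypothetical helper). */
static void stack_paint(uint32_t *stack_base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        stack_base[i] = STACK_PATTERN;
}

/* Count untouched words at the low end; 0 means the thread ran off (or right up to) its stack. */
static size_t stack_headroom(const uint32_t *stack_base, size_t words)
{
    size_t i = 0;
    while (i < words && stack_base[i] == STACK_PATTERN)   /* the stack grows down toward stack_base */
        i++;
    return i;
}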
Search for KERNEL_STACK_SIZE in the Linux kernel source code and you will find that it is very much architecture dependent: PAGE_SIZE, 2*PAGE_SIZE, etc. (below are just some results; much intermediate output has been deleted).
./arch/cris/include/asm/processor.h:
#define KERNEL_STACK_SIZE PAGE_SIZE
./arch/ia64/include/asm/ptrace.h:
# define KERNEL_STACK_SIZE_ORDER 3
# define KERNEL_STACK_SIZE_ORDER 2
# define KERNEL_STACK_SIZE_ORDER 1
# define KERNEL_STACK_SIZE_ORDER 0
#define IA64_STK_OFFSET ((1 << KERNEL_STACK_SIZE_ORDER)*PAGE_SIZE)
#define KERNEL_STACK_SIZE IA64_STK_OFFSET
./arch/ia64/include/asm/mca.h:
u64 mca_stack[KERNEL_STACK_SIZE/8];
u64 init_stack[KERNEL_STACK_SIZE/8];
./arch/ia64/include/asm/thread_info.h:
#define THREAD_SIZE KERNEL_STACK_SIZE
./arch/ia64/include/asm/mca_asm.h:
#define MCA_PT_REGS_OFFSET ALIGN16(KERNEL_STACK_SIZE-IA64_PT_REGS_SIZE)
./arch/parisc/include/asm/processor.h:
#define KERNEL_STACK_SIZE (4*PAGE_SIZE)
./arch/xtensa/include/asm/ptrace.h:
#define KERNEL_STACK_SIZE (2 * PAGE_SIZE)
./arch/microblaze/include/asm/processor.h:
# define KERNEL_STACK_SIZE 0x2000
I'll throw my two cents in to get the ball rolling:
I'm not sure what a "typical" stack size would be. I would guess maybe 8 KB per thread, and if a thread exceeds this amount, just throw an exception. However, according to this, Windows has a default reserved stack size of 1MB per thread, but it isn't committed all at once (pages are committed as they are needed). Additionally, you can request a different stack size for a given EXE at compile-time with a compiler directive. Not sure what Linux does, but I've seen references to 4 KB stacks (although I think this can be changed when you compile the kernel and I'm not sure what the default stack size is...)
This ties in with the first point. You probably want a fixed limit on how much stack each thread can get. Thus, you probably don't want to automatically allocate more stack space every time a thread exceeds its current stack space, because a buggy program that gets stuck in an infinite recursion is going to eat up all available memory.
If you are using virtual memory, you do want to make the stack growable. Forcing static allocation of the stack size, as is common in user-level threading like Qthreads and Windows Fibers, is a mess: hard to use, easy to crash. All modern OSes do grow the stack dynamically, I think usually by having a write-protected guard page or two below the current stack pointer. A write there then tells the OS that the stack has stepped below its allocated space, and you allocate a new guard page below that and make the page that got hit writable. As long as no single function allocates more than a page of data, this works fine. Or you can use two or four guard pages to allow larger stack frames.
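A hedged sketch of that guard-page scheme (hypothetical types and helpers, not a real kernel's API): the page-fault handler checks whether the faulting address lies in the guard region just below the mapped stack and, if so, grows the stack by one page.
#include <stdint.h>

#define PAGE_SIZE   4096u
#define GUARD_PAGES 2u

struct thread {
    uintptr_t stack_low;     /* lowest mapped, writable stack address */
    uintptr_t stack_limit;   /* hard lower bound the stack may never grow below */
};

/* Hypothetical MMU helpers a real kernel would provide. */
void make_page_writable(uintptr_t page);
void map_guard_page(uintptr_t page);

/* Called from the page-fault handler with the faulting address. Returns 0 if the
   fault was a legitimate stack growth and the instruction should be retried. */
int handle_stack_fault(struct thread *t, uintptr_t vaddr)
{
    uintptr_t guard_top    = t->stack_low;                     /* guard pages sit just below here */
    uintptr_t guard_bottom = guard_top - GUARD_PAGES * PAGE_SIZE;

    if (vaddr < guard_bottom || vaddr >= guard_top)
        return -1;                                 /* not a stack-growth fault */
    if (t->stack_low - PAGE_SIZE < t->stack_limit)
        return -1;                                 /* thread hit its configured maximum stack size */

    make_page_writable(guard_top - PAGE_SIZE);     /* the page that was hit becomes real stack */
    map_guard_page(guard_bottom - PAGE_SIZE);      /* push the guard region one page further down */
    t->stack_low -= PAGE_SIZE;
    return 0;
}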
If you want a way to control stack size and your goal is a really controlled and efficient environment, but do not care about programming in the same style as Linux etc., go for a single-shot execution model where a task is started each time a relevant event is detected, runs to completion, and then stores any persistent data in its task data structure. In this way, all threads can share a single stack. Used in many slim real-time operating systems for automotive control and similar.
Why not make the stack size a configurable item, either stored with the program or specified when a process creates another process?
There are any number of ways you can make this configurable.
There's a guideline that states "0, 1 or n", meaning you should allow zero, one or any number (limited by other constraints such as memory) of an object - this applies to sizes of objects as well.