Arduino flash memory limit - eclipse

I have a project on Arduino Uno, and I am making it from Eclipse. The AVR compiler gives me this:
avrdude: 24348 bytes of flash written avrdude: verifying flash memory
against SunAngles.hex: avrdude: load data flash data from input file
SunAngles.hex: avrdude: input file SunAngles.hex auto detected as
Intel Hex avrdude: input file SunAngles.hex contains 24348 bytes
avrdude: reading on-chip flash data:
Reading | ################################################## | 100%
3.45s
avrdude: verifying ... avrdude: 24348 bytes of flash verified
avrdude done. Thank you.
The serial monitor does not print anything. If I make the project to be 23999 bytes then the serial monitor works. I have checked Eclipse's serial monitor and Arduino IDE's Serial monitor. They have the same problem. At the site it says that Arduino Uno has 32 KB flash memory and that 0.5 KB is used for the bootloader. What is happening?
In another question someone says to use serial.print(F(something));, and they give a library for pgm. What should I do to solve this problem?

Don't forget the small size of RAM, the 328's 2 KB. You may just be running out of RAM. I learned that when it runs out, it just kind of sits there. And to at first it really looked like a flash boundary problem. Just like your symptom.
I suggest reading the readme library to get the FreeRAM from this. It mentions how the "Serial.print" can consume both RAM and ROM.
I always now use
Serial.print(F("HELLO"));
versus
Serial.print("HELLO");
as it saves RAM, and this should be true for lcd.print. Where I always put a
Serial.println(freeMemory(), DEC); // Print how much RAM is available.
in the beginning of the code, and pay attention. Noting that there needs to be room to run the actual code and recurse into it.
The F() is now stock in Arduino 1.0 and replaces the need for the library function getPSTR().
The latest Arduino IDE also indicates a very rough estimate of expected RAM usage. So there is a switch for that in avr-gcc. You may also want to try using the avr-gcc 4.7.0 rather than 4.3.2 (stock for Arduino), as it claims to be more optimizing.

To equip yourself just in case anyone still has similar issues: Please read the blog post Optimizing SRAM on managing the limited Arduino memory.
From there, you will get a few things to keep in mind as you develop your sketch.
Avoid as much as you possibly can, any global variables. Keep them local to their functions.

Related

ThreadX RAM issue on STM32

I'm currently starting to use ThreadX on a STM32 Nucleo-H723ZG (STM32H723ZG MCU).
I noticed that when loading the Nx_TCP_Echo_Server / Nx_TCP_Echo_Client projects from CubeMX, the RAM gets filled up pretty much to the top, which makes me wonder, how I'm supposed to add my own code and data here.
Since I'm pretty new to RAM partitioning, RTOS and similar, I don't have a perfect feeling for what is wrong or right and how to proceed (and if it is a problem at all).
Nevertheless I wonder, if maybe using a different way of partitioning the RAM or by dropping some non-necessary code-parts, the RAM could be freed-up.
Or a different way of thinking:
Since RAM_D1 got filled, but _D2, _D3 and DTCMRAM are pretty much empty, is there a way to use the free RAM for my own purposes (I would like to let SPI and ADC processing run via DMA, so this needs a place to go ....)
Hope my questions are not too confusing ;)
The system has the following amount of RAM, according to STM:
"SRAM: total 564 Kbytes all with ECC, including 128 Kbytes of data TCM RAM for critical real-time data + 432 Kbytes of system RAM (up to 256 Kbytes can remap on instruction TCM RAM for critical real time instructions) + 4 Kbytes of backup SRAM (available in the lowest-power modes)" (see STMs STM32H723ZG MCU product page)
Down below you'll find screenshots of the current RAM usage, for RAM_D1 especially .tcp_sec eats up most of the RAM.
--> Can .tcp_sec be optimized or kicked-out?
If tcp means here the tcp protocol, maybe this can be a way to optimize this, since I'm not sure whether I need a handshake etc., maybe UDP is sufficient (and faster for the ADC data streaming) ... what do you say?
Edit:
The linker-file shows, that there .tcp_sec (NOLOAD) is written ... is NOLOAD maybe a hint on a "placebo" RAM occupation (pre-allocation / reservation, but no actual usage?)
Linker-script extract:
/* User_heap_stack section, used to check that there is enough RAM left */
._user_heap_stack :
{
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >RAM_D1
.tcp_sec (NOLOAD) : {
. = ABSOLUTE(0x24048000);
*(.RxDecripSection)
. = ABSOLUTE(0x24048060);
*(.TxDecripSection)
} >RAM_D1 AT> FLASH
For context:
I am developing a "system controller", where my plan is to have it running a RTOS, which manages the read-in of analog values, writing control messages via SPI to two other STMs of the same kind and communicating via Ethernet to my desktop application.
The desktop application is then in charge of post-processing the digitized analog values and sending control messages to the system controller. In the best case the system controller digitizes the analog signal on ADC3 with 5 MSPS (at probably 6 Bit resolution = 30 MBit/s) and sends that data hickup-free to my desktop application.
-> Is this plan possible on this MCU?
I tried to buy a higher (more RAM) version of the nucleo I've got, but due to shortages this one is the best one I was able to get.
For the RTOS I'd like to stick with ThreadX, since FreeRTOS support in STM32IDE seems to be phased out now, after ThreadX was employed as the RTOS by STM.
(I like the easy register configuration using CubeMX/STM32 IDE, hence my drive to use that SW universe ... if there are good reasons to use a different RTOS, tell me :) )
Thank you for your time!
I generated the same project on my side and took a look. I believe you should be able to implement what you want in this CPU. You will need to carefully use the available memory.
It seems there is a confusion about the section .tcp_sec. It contains DMA reception and transmission descriptors for the Ethernet controller/driver. These are constrained by the driver and hardware to be at a specific address. The descriptors are rather small, but the buffers are bigger. With some work these can be reduced. If you are using Ethernet you will need this, no mater if you use TCP or not. As I said, the name can be confusing.
The flash has still plenty of space available. In the debug configuration only about 11% is used. The rest is available for your application code.
You can locate you data in other memory regions. Depending on the toolchain you will use is how you will need to tell the compiler/linker where your data goes. You can look towards the top of the main.c file in that example to see how the DMA descriptors are assigned to a specific section for three different toolchains (IAR, ARM MDK, GCC).
In terms of how to most efficiently use and configure the microcontroller peripherals please get in touch with STMicro, they will know best.
This should get you started. Let us know if this helps!

Loading OS image from floppy disk without INT 13 of BIOS Service

How can I load OS image from floppy disk to memory without BIOS Service while booting my PC?
The only way I’ve used is calling int13h in real mode .
I got to know that I need to handle with ‘Disk controller’ .
Do I need to write kinda ‘Device driver’ in [BIT 16] real mode and is it possible?
As 0andriy has commented, you will have to communicate with the floppy controller directly, bypassing the BIOS. (Which BTW, why do you want to do such a thing? The BIOS was made specifically so you don't have to do this. Is it solely because you want to, maybe to learn how to program the FDC? I'm okay with that.)
The FDC (Floppy Disk Controller) is of the ISA (Industry Standard Architecture) era, back when I/O ports were hard coded to specific addresses. The FDC came in many variants, but most followed a standard rule. The original 756 was a common FDC, with later (still really old to today's standards) controllers following the 82077AA variant.
These controllers had twelve (12) registers using eight (8) I/O Byte addresses, Base + 00h to Base + 07h. (Please note that a single I/O address can be two registers if one is a read and one is a write.) You read and write to these registers to instruct the FDC to do things, such as start the motor for drive 1. (For fun: Did you know that the FDC was originally capable of handling four drives?)
This isn't to difficult to do, but now you have to have some way for the ISA bus to communicate with the FDC and the main memory. In comes the DMA (Direct Memory Access). Now you have to also program the DMA to make the transfers.
Here is the catch. If you don't have all of the FDC and DMA code within the first 512 bytes of the floppy, the 512 bytes the BIOS loaded for you already, there is no way to load the remaining sectors. For example, you can't have your DMA code in the second sector of your boot code expecting to call it, since you have to use that DMA to load that sector in the first place. All FDC and DMA code, at least a minimum read service, must be in the first sector of the disk. This is quite difficult to do, reliably.
I am not saying it is impossible to do, I am just saying it is improbable. For one thing, if you can do it (reliably) in 512 bytes, I would like to see it. It might be a fun experiment. Anyway, do a search for FDC, DMA, etc., things I wrote of here. There are many examples on the web. If you wish to read a book about it, I wrote such a book a while back with all the juicy details.

indexing problem when calling fit() function

I'm currently working on a project of a nn to play a game similar to atari games (more details in the link). I'm having trouble with the indexing. perhaps anyone knows what could be the problem? because I cant seem to find it. Thank you for your time. here's my code (click on the link) and here's the full traceback. the problem starts from the way I call
history = network.fit(state, epochs=10, batch_size=10) // in line 82
See this post: Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
As said in the correct answer,
Modern CPUs provide a lot of low-level instructions, besides the usual arithmetic and logic, known as extensions, e.g. SSE2, SSE4, AVX, etc. From the Wikipedia:
The warning states that your CPU does support AVX (hooray!).
Pretty much, AVX speeds up your training, etc. Sadly, tensorflow is saying that they aren't going to use it... Why?
Because tensorflow default distribution is built without CPU extensions, such as SSE4.1, SSE4.2, AVX, AVX2, FMA, etc. The default builds (ones from pip install tensorflow) are intended to be compatible with as many CPUs as possible. Another argument is that even with these extensions CPU is a lot slower than a GPU, and it's expected for medium- and large-scale machine-learning training to be performed on a GPU.
What should yo do?
If you have a GPU, you shouldn't care about AVX support, because most expensive ops will be dispatched on a GPU device (unless explicitly set not to). In this case, you can simply ignore this warning by:
# Just disables the warning, doesn't enable AVX/FMA
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
If you don't have a GPU and want to utilize CPU as much as possible, you should build tensorflow from the source optimized for your CPU with AVX, AVX2, and FMA enabled if your CPU supports them. It's been discussed in this question and also this GitHub issue. Tensorflow uses an ad-hoc build system called bazel and building it is not that trivial, but is certainly doable. After this, not only will the warning disappear, tensorflow performance should also improve.
You can find all the details and comments in this StackOverflow question.
NOTE: This answer is a product of my professional copy-and-pasting.
Happy coding,
Bobbay
Has the code been debugged line by line ? as this would trace to the line causing error.
I assume the index error crops up from the below one - where "i" and further targets[i] , outs[i] can be checked for the values they have -
per_sample_losses = loss_fn.call(targets[i], outs[i])

OracleSolaris 11.2 -- is /usr/kernel/drv/driver.conf required for PCI?

I'm implementing a small PCI driver for academic purposes, and one thing I'm not clear about if we actually have to provide driver.conf? Different materials which I read (including http://blog.csdn.net/hotsolaris/article/details/1763716), say that for PCI the driver config file is optional, however in my case it seems that pci_config_setup() is successful only with driver.conf provided:
name="mydrv" parent="/pci#0,0/pci8086,2e11"
Then I do:
% add_drv -i 'pciXXXX,YY' mydrv
and it adds in the system with no warning or error messages.
So I assume that some properties of a PCI device can't be derived automatically by the system, e.g. parent bus?
I would appreciate if anybody could shed some light on this. Thanks.
If you look at a random selection of very small files under /kernel/drv for actual physical hardware, you'll see that they almost always only contain the line
ddi_forceattach=1;
Pseudo drivers will have a driver.conf(4) file which reflects their parentage in the system. I really recommend reading that manpage, it goes into good detail about what's required here.

Basic question regarding ROM based executable

I have basic doubt regarding executable stored in ROM.
As I know the executable with text and RO attributes is stored in ROM. Question is as ROM is for Read Only Memory, what happens if there is situation where the code needs to write into memory?
I am not able to conjure up any example to cite here (probably I am ignorant of such situation or I am missing out basic stuff ;) but any light on this topic can greatly help me to understand! :)
Last off -
1. Is there any such situation?
2. In such a case is copying the code from ROM to RAM is the answer?
Answer with some example can greatly help..
Many thanks in advance!
/MS
Read-only memory is read only because of hardware restrictions. The program might be in an EEPROM, flash memory protected from writes, a CD-ROM, or anything where the hardware physically disallows writing. If software writes to ROM, the hardware is incapable of changing the stored data, so nothing happens.
So if a software program in ROM wants to write to memory, it writes to RAM. That's the only option. If a program is running from ROM and wants to change itself, it can't because it can't write to ROM. But yes, the program can run from RAM.
In fact, running from ROM is rare except in the smallest embedded systems. Operating systems copy executable code from ROM to RAM before running it. Sometimes code is compressed in ROM and must be decompressed into RAM before running. If RAM is full, the operating system uses paging to manage it. The reason running from ROM is so rare is because ROM is slower than RAM and sometimes code needs to be changed by the loader before running.
Note that if you have code that modifies itself, you really have to know your system. Many systems use data-execution prevention (DEP). Executable code goes in read+execute areas of RAM. Data goes in read+write areas. So on these systems, code can never change itself in RAM.
Normally only program code, constants and initialisation data are stored in ROM. A separate memory area in RAM is used for stack, heap, etc.
There are few legitimate reasons why you would want to modify the code section at runtime. The compiler itself will not generate code that requires that.
Your linker will have an option to generate a MAP file. This will tell you where all memory objects are located.
The linker chooses where to locate based on a linker script (which you can customise to organise memory as you require). Typically on a FLASH based microcontroller code and constant data will be placed in ROM. Also placed in ROM are the initialisation data for non-zero initialised static data, this is copied to RAM before main() is called. Zero initialised static data is simply cleared to zero before main().
It is possible to arrange for the linker to locate some or all of the code in ROM and have the run-time start-up code copy it to RAM in the same way as the non-zero static data, but the code must either be relocatable or be located to RAM in the first instance, you cannot usually just copy code intended to run from ROM to RAM and expect it to run since it may have absolute address references in it (unless perhaps your target has an MMU and can remap the address space). Locating in RAM on micro-controllers is normally done to increase execution speed since RAM is typically faster than FLASH when high clock speeds are used, producing fewer or zero wait states. It may also be used when code is loaded at runtime from a filesystem rather than stored in ROM. Even when loaded into RAM, if the processor has an MMU it is likely that the code section in RAM section will be marked read-only.
Harvard architecture microcontrollers
Many small microcontrollers (Microchip PIC, Atmel AVR, Intel 8051, Cypress PSoC, etc.) have a Harvard architecture.
They can only execute code from the program memory (flash or ROM).
It's possible to copy any byte from program memory to RAM.
However, (2) copying executable instructions from ROM to to RAM is not the answer -- with these small microcontrollers, the program counter always refers to some address in the program memory. It's not possible to execute code in RAM.
Copying data from ROM to RAM is pretty common.
When power is first applied, a typical firmware application zeros all the RAM and then copies the initial values of non-const global and static variables out of ROM into RAM just before main() starts.
Whenever the application needs to push a fixed string out the serial port, it reads that string out of ROM.
With early versions of these microcontrollers, an external "device programmer" connected to the microcontroller is the only way change the program.
In normal operation, the device was nowhere near a "device programmer".
If the software running on the microcontroller needed to write to program memory ROM -- sorry, too bad --
it was impossible.
Many embedded systems had non-volatile EEPROM that the code could write to -- but this was only for storing data values. The microcontroller could only execute code in the program ROM, not the EEPROM or RAM.
People did may wonderful things with these microcontrollers, including BASIC interpreters and bytecode Forth interpreters.
So apparently (1) code never needs to write to program memory.
With a few recent "self-programming" microcontrollers (from Atmel, Microchip, Cypress, etc.),
there's special hardware on the chip that allows software running on the microcontroller to erase and re-program blocks of its own program memory flash.
Some few applications use this "self-programming" feature to read and write data to "extra" flash blocks -- data that is never executed, so it doesn't count as self-modifying code -- but this isn't doing anything you couldn't do with a bigger EEPROM.
So far I have only seen two kinds of software running on Harvard-architecture microcontrollers that write new executable software to its own program Flash: bootloaders and Forth compilers.
When the Arduino bootloader (bootstrap loader) runs and detects that a new application firmware image is available, it downloads the new application firmware (into RAM), and writes it to Flash.
The next time you turn on the system it's now running shiny new version 16.98 application firmware rather than clunky old version 16.97 application firmware.
(The Flash blocks containing the bootloader itself, of course, are left unchanged).
This would be impossible without the "self-programming" feature of writing to program memory.
Some Forth implementations run on a small microcontroller, compiling new executable code and using the "self-programming" feature to store it in program Flash -- a process somewhat analogous to the JVM's "just-in-time" compiling.
(All other languages seem to require a compiler far too large and complicated to run on a small microcontroller, and therefore have a edit-compile-download-run cycle that takes much more wall clock time).