SD card lifetime optimization

Simple question:
Which approach is best in terms of prolonging the life expectancy of an SD card?
Writing 10-minute files with 10 Hz lines of data input (~700 kB each)
1) directly to the SD card
or
2) to the internal memory of the device, then moving the file to the SD card
?
The amount of data being written to the SD card remains the same. The question is simply whether a lot of tiny file operations (6000 lines written over the course of ten minutes, 100 ms apart) or one file operation moving the entire file containing the 6000 lines onto the card at once is better. Or does it even matter? Of course the card specifications are hugely important as well, but let's leave that out of the discussion.

1) You should write only in whole multiples of the flash page size, aligned to page boundaries, as discussed here:
https://electronics.stackexchange.com/questions/227686/sd-card-sector-size
2) Keeping fault-tolerant track of how much data has been written, and where, also requires writes of its own. Those count as write hits on the FAT and directory entries too, pages that already get more traffic than others. Where possible, avoid techniques (ie fdup/fclose/fopen append) which cause cached buffer and directory data to be flushed. That said, I would still use this trick every minute or so, so you never lose more than a minute of data on a crash or accidental removal. (A minimal sketch of points 1 and 2 follows this list.)
3) OS-supported wear leveling will solve the above, if properly implemented. I have read horror stories about flash memories being destroyed in days.
4) Calculate the expected life using the total wear-levelled lifetime-writes spec of that memory, usually given in TB. If the resulting lifetime comes out in decades, don't bother doing more than (1).
5) Which OS and file system you are using matters somewhat. For example, EXT3 is supposedly faster than EXT2 due to less drive access, at a slightly higher risk ratio. Since your question doesn't say which OS/FS you use, I'll leave the rest of that up to you.
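To make points (1) and (2) concrete, here is a minimal sketch in Python of the buffer-then-write-whole-pages idea (the same structure applies in C on a bare-metal logger). The 512-byte page size, the one-minute flush interval, the file name and the line source are all assumptions for illustration; check your card's actual page and erase-block sizes.

```python
import os
import time

PAGE_SIZE = 512          # assumed write-page size; real cards often use 512 B pages and 128 KiB erase blocks
FLUSH_INTERVAL_S = 60    # flush about once a minute, so a crash loses at most ~1 minute of data

def log_lines(line_source, path="datalog.csv"):
    """Buffer incoming lines in RAM and hand them to the card only in whole pages."""
    buf = bytearray()
    last_flush = time.monotonic()
    with open(path, "ab", buffering=0) as f:          # unbuffered: we decide when data is written
        for line in line_source:                      # e.g. one line every 100 ms
            buf += line.encode()
            while len(buf) >= PAGE_SIZE:              # write only page-sized chunks
                f.write(buf[:PAGE_SIZE])
                del buf[:PAGE_SIZE]
            if time.monotonic() - last_flush >= FLUSH_INTERVAL_S:
                os.fsync(f.fileno())                  # force data (and metadata) out to the card
                last_flush = time.monotonic()
```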

Related

MATLAB: Are there any problems with many (millions) small files compared to few (thousands) large files?

I'm working on a real-time test software in MATLAB. On user input I want to extract the value of one (or a few neighbouring) pixels from 50-200 high resolution images (~25 MB).
My problem is that the total image set is too big (~2000 images) to store in RAM; consequently I need to read each of the 50-200 images from disk after each user input, which of course is way too slow!
So I was thinking about splitting the images into sub-images (~100x100 pixels) and saving these separately. This would make the image-read process quick enough.
Are there any problems I should be aware of with this approach? For instance, I've read about people having trouble copying many small files; will this affect me, e.g. by making the image reads slower?
rahnema1 is right - imread(...,'PixelRegion') will speed up the read operation. If that is not enough for you, even if your files are not fragmented, maybe it is time to think about some database?
Disk operations are always the bottleneck. First we switch to disk caches, then distributed storage, then RAID, and after some more time, we end up with in-memory databases. You should decide which access speed is reasonable for your case.
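If the partial reads are still too slow, one alternative (a hedged sketch, not a specific recommendation) is to export the images once into a single uncompressed binary stack and memory-map it, so a pixel lookup touches only a few disk pages. Python/NumPy is used below just to show the idea; MATLAB's memmapfile can do the same thing. The file name, dtype and dimensions are made up for illustration.

```python
import numpy as np

# Assumed layout: all images exported once into one uncompressed binary stack
# of shape (n_images, height, width); the numbers here are invented.
N_IMAGES, HEIGHT, WIDTH = 2000, 4000, 3000

stack = np.memmap("image_stack.raw", dtype=np.uint16, mode="r",
                  shape=(N_IMAGES, HEIGHT, WIDTH))

def pixel_values(row, col, image_indices):
    """Fetch one pixel from each requested image; only the touched pages are read from disk."""
    return np.asarray(stack[image_indices, row, col])

# Example: value of pixel (120, 250) in images 50..199
vals = pixel_values(120, 250, np.arange(50, 200))
```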

Why do we need to specify the number of flash wait cycles?

Especially when working with "faster" devices like the STM32F4xx/F7xx, we need to specify the number of flash wait cycles, based on the supply voltage and the system clock frequency.
When the CPU fetches instructions or constants, this is done over the FLITF. Am I right in assuming that the FLITF stalls a CPU request until it can provide the requested data, making it impossible for other bus masters to access the flash in the meantime?
If that is true, why would any interface need to know the number of flash wait cycles? The cache prefetches instructions anyway, regardless of whether it knows how long it has to wait, no?
Because the flash interface isn't magic.
It has to meet the necessary setup and hold times for addressing and reading out the flash cells, which vary somewhat depending on voltage. Taking the STM32F411 as an example (because I have that TRM handy), doing some maths with the voltage/frequency/wait-state table implies that a read from flash on one of those takes on the order of ~30 ns above 2.7 V, down to ~60 ns below 2.1 V.
Since the flash interface doesn't have its own asynchronous nanosecond-precision timekeeping ability (because that would be needlessly complicated, power-hungry, and silly), that translates to asserting its signals for n clock cycles, after which it can assume the data signals from the cells are stable enough to read back*. How does it know what the clock frequency is, and therefore what n should be? Simple: you, as the programmer who set the clock, tell it. Some hardware things are just infinitely easier to let software deal with.
* and then going through the further shenanigans of extracting the relevant 8, 16 or 32 bits out of the 128-bit line it's read, to finally spit that out the other side onto the AHB bus to the waiting CPU, obviously.
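If it helps to see that relationship as numbers rather than a table, here is a back-of-the-envelope sketch in Python. The access times are the rough ~30 ns / ~60 ns figures estimated above, not datasheet values, so treat the output as an illustration only and take the real wait-state count from the reference manual's table, which rounds conservatively.

```python
import math

def flash_wait_states(hclk_mhz, t_access_ns):
    """Smallest n such that (n + 1) HCLK periods cover one flash access time."""
    period_ns = 1000.0 / hclk_mhz
    cycles_needed = math.ceil(t_access_ns / period_ns)
    return max(cycles_needed - 1, 0)

# Rough figures from the estimate above (illustrative only):
print(flash_wait_states(100, 30))   # 100 MHz, above 2.7 V -> 2 wait states
print(flash_wait_states(100, 60))   # 100 MHz, below 2.1 V -> 5 wait states
print(flash_wait_states(16, 30))    # 16 MHz               -> 0 wait states
```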

Optimizing compression using HDF5/H5 in Matlab

Using Matlab, I am going to generate several data files and store them in H5 format as 20x1500xN, where N is an integer that can vary, but typically around 2300. Each file will have 4 different data sets with equal structure. Thus, I will quickly achieve a storage problem. My two questions:
Is there any reason not to split the 4 different data sets, and just save as 4x20x1500xN instead? I would prefer having them split, since they are different signal modalities, but if there is any computational/compression advantage to not having them separated, I will join them.
Using Matlab's built-in compression, I set deflate=9 (and DataType=single). However, I have now realized that using deflate multiplies my computation time by 5. I realize this could have something to do with my ChunkSize, which I just set to 20x1500x5 - without any reasoning behind it. Is there a strategic way to optimize computational load w.r.t. deflation and compression time?
Thank you.
1- Splitting or merging? It won't make a difference in the compression procedure, since it is performed in blocks.
2- Your choice of chunk shape does indeed seem bad. The chunk size determines the shape and size of each block that will be compressed independently. The problem is that each chunk is 600 kB, which is much larger than the L2 cache, so your CPU is likely twiddling its thumbs, waiting for data to come in. Depending on the nature of your data and the usage pattern you will use the most (read the whole array at once, random reads, sequential reads...), you may want to target the L1 or L2 size, or something in between. Here are some experiments done with a Python library that may serve you as a guide.
Once you have selected your chunk size (how many bytes your compression blocks will have), you have to choose a chunk shape. I'd recommend the shape that most closely fits your reading pattern if you are doing partial reads, or filling in fastest-axis-first if you want to read the whole array at once. In your case, this will be something like 1x1500x10, I think (the second axis being the fastest, the last one the second fastest, and the first the slowest; change it if I am mistaken).
Lastly, keep in mind that the details are quite dependent on the specific machine you run it on: the CPU, the quality and load of the hard drive or SSD, the speed of the RAM... so the fine tuning will always require some experimentation.
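As a concrete illustration of the ~60 kB chunk suggestion, here is a hedged sketch using h5py, the kind of Python library the linked experiments use; in MATLAB the equivalent knobs are the 'ChunkSize' and 'Deflate' options of h5create. The file and dataset names, the deflate level 4 and the shuffle filter are assumptions to experiment with, not prescriptions, and note that the axis order is reversed between MATLAB (column-major) and h5py (row-major), so translate the chunk shape accordingly.

```python
import numpy as np
import h5py

N = 2300                                                 # typical size of the varying dimension
data = np.random.rand(20, 1500, N).astype(np.float32)   # stand-in for one signal modality

with h5py.File("signals.h5", "w") as f:
    f.create_dataset(
        "modality1",
        data=data,
        chunks=(1, 1500, 10),   # 1*1500*10*4 B = 60 kB per chunk, roughly cache-sized
        compression="gzip",
        compression_opts=4,     # try 3-5 before 9; the highest level costs far more CPU for little gain
        shuffle=True,           # byte-shuffling often helps gzip on float data
    )
```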

SD card write limit - data logging

I want to track/register when my system (a Raspberry Pi) was shut down, usually due to abrupt power loss.
I want to do it by recording a heartbeat every 10 minutes to an SD card - so every 10 mins it'd go to the SD and write the current time/date in a file. Would that damage the SD in the long run?
If it only has 100k write cycles per block, it'd develop a bad block within a couple of years. But I've read there's circuitry to prevent this - would it prevent the bad block? Would it be safer to distribute the log across several blocks?
Thanks
The general answer to this question is a strong "it depends". (Practical answer is what you already have; if your file system parameters are not wrong, you have a large margin in this case.) It depends on the following:
SD card type (SLC/MLC)
SD card controller (wear levelling)
SD card size
file system
luck
If we take a look at a flash chip, it is organised into sectors. A sector is an area which can be completely erased (actually reset to a state with only 1's), typically 128 KiB for SD cards. Zeros can be written bit-by-bit, but the only way to write ones is to erase the sector.
The number of sector erases is limited. The erase operation will take longer each time it is performed on the same sector, and there is more uncertainty in the values written to each cell. The write limit given to a card is really the number of erases for a single sector.
In order to avoid reaching this limit too fast, the SD card has a controller which takes care of wear levelling. The basic idea is that, transparently to the user, the card changes which sectors are used: if you request the same memory position, it may be mapped to different sectors at different times. In practice, the card keeps a list of empty sectors and, whenever one is needed, takes the one which has been used least.
There are other algorithms as well. The controller may track sector erase times or errors occurring on a sector. Unfortunately, the card manufacturers do not usually reveal much about their exact algorithms, but for an overview, see:
http://en.wikipedia.org/wiki/Wear_leveling
There are different types of flash chips available. SLC chips store only one bit per memory cell (it is either 0 or 1); MLC cells store two or three bits. Naturally, MLC chips are more sensitive to ageing. Three-bit (eight-level) cells may not endure more than 1000 writes. So, if you need reliability, take an SLC card despite its higher price.
As the wear levelling distributes the wear across the card, bigger cards endure more sector erases than small cards, as they have more sectors. In principle, a 4 GiB card with 100 000 write cycles will be able to carry 400 TB of data during its lifetime.
But to make things more complicated, the file system has a lot to do with this. When a small piece of data is written onto a disk, a lot of different things happen. At least the data is appended to the file, and the associated directory information (file size) is changed. With a typical file system this means at least two 4 KiB block writes, of which one may be just an append (no requirement for an erase). But a lot of other things may happen: write to a journal, a block becoming full, etc.
There are file systems which have been tuned to be used with flash devices (JFFS2 being the most common). They are all, as far as I know, optimised for raw flash and take care of wear levelling and use bit or octet level atomic operations. I am not aware of any file systems optimised for SD cards. (Maybe someone with academic interests could create one taking the wear levelling systems of the cards into account. That would result in a nice paper or even a few.) Fortunately, the usual file systems can be tuned to be more compatible (faster, less wear and tear) with the SD card by tweaking file system parameters.
Now that there are these two layers on top of the physical disk, it is almost impossible to track how many erases have been performed. One of the layers is very complicated (the file system), the other (wear levelling) completely opaque.
So, we can just make some rough estimates. Let's guess that a small write invalidates two 4 KiB blocks on average. That way, logging every 10 minutes consumes a 128 KiB erase sector every 160 minutes. If the card is an 8 GiB card, it has around 64k sectors, so the card is gone through once every 20 years. If the card endures 1000 write cycles, it will be good for 20 000 years...
The calculation above assumes perfect wear levelling and a very efficient file system. However, a safety factor of 1 000 should be enough.
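For reference, the arithmetic behind that estimate, written out as a small sketch you can rerun with your own card size, block sizes and logging interval (the inputs are the same guesses as in the text):

```python
# Reproducing the rough estimate above (all numbers are the same guesses as in the text).
blocks_per_write   = 2                 # assumed 4 KiB blocks invalidated per log append
block_size         = 4 * 1024          # bytes
erase_sector       = 128 * 1024        # bytes
write_interval_min = 10

writes_per_sector  = erase_sector / (blocks_per_write * block_size)   # 16 appends per erase sector
minutes_per_sector = writes_per_sector * write_interval_min           # 160 minutes per sector

card_size        = 8 * 1024**3         # 8 GiB
sectors          = card_size // erase_sector                          # 65536 sectors
minutes_per_pass = sectors * minutes_per_sector
years_per_pass   = minutes_per_pass / (60 * 24 * 365)                 # ~20 years per full pass

endurance_cycles = 1000
print(years_per_pass, years_per_pass * endurance_cycles)              # ~20, ~20 000
```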
Of course, this can be spoiled quite easily. One of the easiest ways is to forget to mount the disk with the noatime attribute. Then the file system will update file access times, which may result in a write every time a file is accessed (even read). Or the OS is swapping virtual memory onto the card.
Last but not least of the factors is luck. Modern SD cards have the unfortunate tendency to die from other causes. The number of lemons with even quite well-known manufacturers is not very small. If you kill a card, it is not necessarily because of the wear limit. If the card is worn out, it is still readable. If it is completely dead, it has died of something else (static electricity, small fracture somewhere).

Why do game developers put many images into one big image?

Over the years I've often asked myself why game developers place many small images into a big one. But not only game developers do that. I also remember the good old Winamp MP3 player had a user interface design file which was just one huge image containing lots of small ones.
I have also seen some big javascript GUI libraries like ext.js using this technique. In ext.js there is a big image containing many small ones.
One thing I noticed is this: No matter how small my PNG image is, the Finder on the Mac always tells me it consumes at least 4 kB. Which is a heck of a lot if you have just 10 pixels.
So is this done because storing 20 or more small images in a big one is much more memory efficient than having 20 separate files, each of them probably with its own header and metadata?
Is it because locating files on the file system is expensive and slow, and therefore much faster to simply locate only one big image and then split it up into smaller ones, once it is loaded into memory?
Or is it laziness, because it is tedious to think of so many file names?
Is there a name for this technique? And how are those small images separated from the big one at runtime?
This is called spriting - and there are various reasons to do it in different situations.
For web development, it means that only one web request is required to fetch the image, which can be a lot more efficient than several separate requests. That's more efficient in terms of having less overhead due to the individual requests, and the final image file may well be smaller in total than it would have been otherwise.
The same sort of effect may be visible in other scenarios - for example, it may be more efficient to store and load a single large image file than multiple small ones, depending on the file system. That's entirely aside from any efficiencies gained in terms of the raw "total file size", and is due to the per file overhead (a directory entry, block size etc). It's a bit like the "per request" overhead in the web scenario, but due to slightly different factors.
None of these answers are right. The reason we pack multiple images into one big "sprite sheet" or "texture atlas" is to avoid swapping textures during rendering.
OpenGL and Direct-X take a performance hit when you draw from one image (texture) and then switch to another, so we pack multiple images into one big image, and then we can draw several (or hundreds) of images and never switch textures. It has nothing to do with the 4K file size (or hasn't in 15 years).
Also, up until very recently, textures had to be powers of 2 (64, 128, 256), and if your game had lots of odd-sized images, that's a lot of wasted memory. Packing them into a single texture could save a lot of space.
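To illustrate the packing step itself, here is a hedged sketch of a very naive "shelf" packer in Python/Pillow: it pastes the small images into one atlas and records each sprite's sub-rectangle, which a renderer would later turn into texture coordinates. The file names and the 1024x1024 atlas size are made up; real pipelines use smarter bin-packing tools.

```python
from PIL import Image

def pack_atlas(paths, atlas_size=(1024, 1024)):
    """Paste small images into one atlas, row by row, and record their sub-rectangles."""
    atlas = Image.new("RGBA", atlas_size)
    rects = {}                      # name -> (x, y, w, h), used as texture coordinates later
    x = y = shelf_h = 0
    for path in paths:
        img = Image.open(path).convert("RGBA")
        w, h = img.size
        if x + w > atlas_size[0]:   # start a new shelf when the current row is full
            x, y = 0, y + shelf_h
            shelf_h = 0
        atlas.paste(img, (x, y))
        rects[path] = (x, y, w, h)
        x += w
        shelf_h = max(shelf_h, h)
    return atlas, rects

# atlas, rects = pack_atlas(["coin.png", "enemy.png", "player.png"])  # invented file names
# atlas.save("atlas.png")   # one texture bind at runtime; rects give each sprite's sub-region
```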
The 4kb usage is a side effect of how files are stored on disk. The smallest addressable unit of storage in a filesystem is a block, which is usually a fixed size of 512, 1024, 2048, etc... bytes. In your Mac's case, it's using 4k blocks. That means that even a 1-byte file will require at least 4 kbytes worth of physical space to store, as it's not possible for the file system to address any storage unit SMALLER than 4k.
The reasons for these "large" blocks vary, but the big one is that the more "granular" your addressing gets (the smaller the blocks), the more space you waste on indexes to list which blocks are assigned to which files. If you had 1-byte sized blocks, then for every byte of data you store in a file, you'd also need to store 1+ bytes worth of usage information in the file system's metadata, and you'd end up wasting at least HALF of your storage on nothing but indexes.
The converse is true - the bigger the blocks, the more space is wasted for every smaller-than-one-block sized file you store, so in the end it comes down to what tradeoff you're willing to live with.
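A quick way to see the slack-space effect described here: round every file up to whole 4 KiB blocks and compare many small files against one packed file (the file sizes below are invented for illustration):

```python
# Rough slack-space estimate: many tiny files vs. one packed file on a 4 KiB-block filesystem.
BLOCK = 4096
sizes = [300, 1200, 50, 4000, 700] * 4          # made-up sprite file sizes in bytes

def on_disk(size, block=BLOCK):
    return -(-size // block) * block            # round up to whole blocks

separate = sum(on_disk(s) for s in sizes)       # 81920 bytes across 20 files
packed   = on_disk(sum(sizes))                  # 28672 bytes for one packed file
print(separate, packed)
```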
The reasons are a bit different in different environments.
On the web the main reason is to reduce the number of requests to the web server. Each requests creates overhead, most notably a separate round trip over the network.
When fetching from good ol' mechanical hard drives, good read performance requires contiguous data. If you save data in lots of files you get extra seek time for each file. There is also the block size to consider. Files are made out of blocks, in your case 4 kB. When reading a file of one byte you need to read a whole block anyway. If you have many small images you can stuff a whole bunch of them into a single disk block and get them all in the same time as if you had only one small image in the block.
Another reason from days of yore was palettes.
If you did one image you could theme it with one palette: colour 14 = light grey with a hint of green.
If you did lots of little images you had to make sure you used the same palette for every one while designing them, or you got all sorts of artifacts.
Given you had one palette, you could then manipulate it, so everything currently green could be made red by flipping one value in the palette instead of trawling through every image.
Lots of simple animations like fire, smoke, running water are still done with this method.
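A small sketch of that palette trick, using Pillow's indexed ("P" mode) images: the pixels store palette indices, so recolouring means rewriting one palette entry, never touching the pixel data. Index 14 and the RGB values are placeholders.

```python
from PIL import Image

# Every pixel in this image is palette index 14; only the palette decides what that looks like.
img = Image.new("P", (64, 64), color=14)

palette = [0, 0, 0] * 256
palette[14*3:14*3+3] = [200, 220, 200]   # index 14: light grey with a hint of green
img.putpalette(palette)
img.save("flame_green.png")

palette[14*3:14*3+3] = [220, 60, 40]     # flip one entry: everything "green" turns red
img.putpalette(palette)
img.save("flame_red.png")
```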