I am working on a blog article about data recovery and I would be interested in how you would approach such a case. Let's say a client pops by with an HDD and needs his data recovered...
What exactly would you do?
Thanks!
I'd use a computer program that goes through all the bytes of the disk that are marked as unused and finds anything that looks like it might have been a file.
We know that Kafka uses memory-mapped files for its index files; however, its log files do not use memory mapping.
My question is: why do the index files use memory-mapped files while the log files don't?
Implementing both log and index appending with the mmap approach would introduce a data-consistency problem. mmap does not guarantee that data is flushed from memory to the file (assuming the flush is left to the OS rather than forced explicitly, e.g. via msync(2)/munmap(2)). If the index update gets flushed but the log data does not, for whatever reason, the data in the log can no longer be interpreted.
By the way, for append-only data, in the write direction we only need to care about the next block (buffer) to write, so the huge overall data size should not matter here.
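To make that ordering concern concrete, here is a minimal sketch (not Kafka's actual code; the file names and the 8-byte index entry layout are assumptions for illustration) of an append path that forces the log bytes to disk before the memory-mapped index is flushed:

import mmap
import os
import struct

IDX_SIZE = 8 * 1024                      # hypothetical fixed-size index file

# Create/size the (made-up) files so the index can be mapped.
with open("segment.index", "wb") as f:
    f.truncate(IDX_SIZE)
log = open("segment.log", "ab")
idx_file = open("segment.index", "r+b")
idx = mmap.mmap(idx_file.fileno(), IDX_SIZE)

def append(entry_no, payload):
    pos = log.seek(0, os.SEEK_END)       # byte position this record starts at
    log.write(payload)
    log.flush()
    os.fsync(log.fileno())               # the log bytes must be durable first...
    struct.pack_into(">ii", idx, entry_no * 8, entry_no, pos)
    idx.flush()                          # ...only then flush the mapped index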
How much can be mapped into memory at once depends on the address space. For example, a 32-bit architecture can only address 4 GB, or in practice even smaller portions of a file. Kafka logs are often much larger than that, so only parts of them could be mapped at a time, which complicates reading them.
Index files, however, are sparse, which means they are relatively small. Mapping them into memory speeds up the lookup process, and that is the primary benefit memory-mapped files offer.
Logs are where the messages are stored; the index files point to positions in the logs.
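As a rough illustration of that relationship (this is not Kafka's real index format; the 8-byte entry layout of (relative offset, byte position) pairs is an assumption), a sparse, memory-mapped index can be binary-searched to find where to start reading in the log:

import mmap
import struct

ENTRY = struct.Struct(">ii")    # assumed (relative offset, byte position) pairs

def find_position(index_path, target_offset):
    # Return the log byte position of the last entry at or before target_offset.
    with open(index_path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        lo, hi, best = 0, len(mm) // ENTRY.size - 1, 0
        while lo <= hi:
            mid = (lo + hi) // 2
            off, pos = ENTRY.unpack_from(mm, mid * ENTRY.size)
            if off <= target_offset:
                best, lo = pos, mid + 1
            else:
                hi = mid - 1
        mm.close()
        return best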
There is a nice, colorful blog post explaining what is going on.
Having a fast index to improve read performance is a common optimization in databases where writes are append-only (almost all LSM-tree databases do some form of this). Also, as others have pointed out:
Indexes are sparse, so they have a smaller memory footprint. Even the sparsity of the index is configurable, which is useful as data grows.
Append-only write patterns are faster than random seeks (especially true for SSDs), and therefore don't need a lot of attention for optimization.
If you mmap the log files, then because physical memory is limited you may trigger frequent page faults, which is a seriously expensive overhead. Using the sendfile system call is more suitable for serving the log data.
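As a hedged sketch of that idea (os.sendfile is available in Python on Linux; the socket, path, and byte range below are placeholders, not Kafka code), data can be pushed from a log segment to a client socket without copying it through user space:

import os

def serve_segment(conn, path, offset, count):
    # Copy `count` bytes of the segment file to the client socket.
    # On Linux this is zero-copy: the kernel moves the pages directly.
    with open(path, "rb") as segment:
        sent = 0
        while sent < count:
            n = os.sendfile(conn.fileno(), segment.fileno(),
                            offset + sent, count - sent)
            if n == 0:                   # hit end of file early
                break
            sent += n
        return sent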
I am writing a memory-mapped character device. I can read from and write to the device correctly, but my question is about the write behavior in the following case:
when the amount of data to write is much larger than the available memory.
What would the proper behavior be in this case? Should I write as much as I can and return the error on the next write, or fail from the beginning since the data is much larger than the device capacity?
To make the question more specific, let's take a filesystem on a hard disk (ext3) as an example: what happens if I try to write more data than the available space on the disk? Will it fail before it starts, or write as much data as it can and fail on the next write?
This pretty much depends on your application. Can your application live with writing partial data? Is partial data any good to it?
IMO, you should check the available capacity before writing anything and return an error if there is not enough room, since otherwise you won't be able to do any meaningful error recovery (if you are handling that at all).
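As a user-space analogue of that advice (a sketch only; it assumes a regular filesystem rather than your character device, and the check is advisory since free space can change between the check and the write):

import os
import shutil

def safe_write(path, data):
    # Fail up front instead of leaving partial data behind.
    free = shutil.disk_usage(os.path.dirname(path) or ".").free
    if len(data) > free:
        raise OSError("need %d bytes but only %d are free" % (len(data), free))
    with open(path, "wb") as f:
        f.write(data)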
I was going through MongoDB performance tuning and came across the following on this website:
http://www.scribd.com/fullscreen/56271132?access_key=key-1hnjbdbd1h36109o86zd&allow_share=true&view_mode=scroll
The above site has the following lines:
Read-before-write
Spend your time in read and out of write-lock scope
50% reduction in lock %
Could anybody please tell me what this actually means?
I think it refers to the fact that writing locks the collection, and you want to minimize that. I think it is saying you should read first, then write, so that the time spent reading does not happen inside the write-lock scope.
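A hedged sketch of that pattern with PyMongo (the database, collection, and field names here are made up for illustration): do the read and the computation first, then issue only a small targeted update, so the write lock is held as briefly as possible.

from pymongo import MongoClient

client = MongoClient()                    # placeholder connection
orders = client.shop.orders               # hypothetical database/collection

def apply_discount(order_id, rate):
    # Read and compute outside any write-lock scope...
    order = orders.find_one({"_id": order_id}, {"total": 1})
    discounted = round(order["total"] * (1 - rate), 2)
    # ...then hold the write lock only for this one small update.
    orders.update_one({"_id": order_id}, {"$set": {"total": discounted}})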
Generally you use a memcache system so your reads don't have to wait for collection writes to finish and unlock, and you avoid write locks altogether. Then again, if the information isn't in the memcache, it will be read from the actual collection, and it might have to wait for a write lock then.
Read more about memcache; there are memcache frameworks out there for servers that use MongoDB, for example for PHP and for Node.js.
I would like to test out some code I have in place to manage conditions where there's not enough free disk space to complete the operations.
However, I'm having trouble creating such a situation. I have tried to sync content from iTunes to fill up the device, but either I end up with too much free disk space, or the content exceeds the device capacity and iTunes won't allow the sync.
I'm sure there must be an easier and better strategy to test this situation on the device, but I can't figure it out. I would appreciate any tips or experiences you can share.
Fill up the device until it's nearly at the limit in iTunes, then set a loop to copy a largish file into your Documents directory. Each time you copy it, give it a unique name (use UUID). Activate the loop to run a number of times with a control in your interface, or with a timer.
Here's a stupid idea.
Jailbreak your device
ssh into root
Execute a script (Python, say) that essentially implements this algorithm:
def logbomb(tries=5):
    if tries <= 0:
        return
    try:
        for i in range(100):
            # write pow(2, i) bytes into a log file in /private/var/tmp
            # (the file name here is arbitrary)
            with open('/private/var/tmp/bomb_%d.log' % i, 'wb') as f:
                f.write(b'\0' * pow(2, i))
    except IOError:
        logbomb(tries - 1)
By the end you should get to a pretty stuffed private partition. Slightly increase the tries if that doesn't get close.
How can I write a program that can recover files in FAT32?
This is pretty complex, but FAT32 is very well documented.
I once wrote a tool for direct FAT32 access using only these resources:
http://en.wikipedia.org/wiki/File_Allocation_Table
http://support.microsoft.com/kb/154997/
http://www.microsoft.com/whdc/system/platform/firmware/fatgen.mspx
But I've never actually tried to recover files. Whether you can successfully recover a file depends on several factors:
The file's data must still physically exist on the disk (not yet overwritten)
You must know where the file starts
You must know what you are looking for (headers, ...)
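To give a feel for the on-disk layout those documents describe, here is a minimal sketch that parses a few boot-sector (BPB) fields from a raw FAT32 image; the field offsets follow the FAT specification, the image path is a placeholder, and real recovery of course needs far more than this:

import struct

def fat32_layout(image_path):
    # Read the boot sector and pull out the fields needed to locate
    # the FATs and the data region (offsets per the FAT32 spec).
    with open(image_path, "rb") as img:
        boot = img.read(512)
    bytes_per_sector, = struct.unpack_from("<H", boot, 11)
    sectors_per_cluster, = struct.unpack_from("<B", boot, 13)
    reserved_sectors, = struct.unpack_from("<H", boot, 14)
    num_fats, = struct.unpack_from("<B", boot, 16)
    sectors_per_fat, = struct.unpack_from("<I", boot, 36)   # FAT32-specific field
    root_cluster, = struct.unpack_from("<I", boot, 44)
    fat_offset = reserved_sectors * bytes_per_sector
    data_offset = fat_offset + num_fats * sectors_per_fat * bytes_per_sector
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "fat_offset": fat_offset,
        "data_offset": data_offset,        # cluster 2 begins here
        "root_dir_cluster": root_cluster,
    }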
It depends on what happened to the files you're trying to recover. The data may still be on the partition, or it could have been overwritten by now. There are a lot of pre-written solutions; a simple Google search should give you a plethora of software that can try to recover the data, but there is no guarantee of getting it back. If you really want to recover the files yourself, you'll need to write something that reads the raw partition and ignores the deleted-file markers.
Here is a program (written by Thomas Tempelman; this guy is great) that might help you out. You can make a copy of the partition, ignoring corrupt bits, and then operate on the copy so you don't mess anything up; you may also be able to recover the data directly with it.
I think you are referring to data carving, that is, reading the physical device and reconstructing previously unlinked files based on some knowledge of their format (e.g. when you find the two letters PK, it's highly probable that a ZIP archive follows; the same goes for JFIF for JPEG).
In this case, I suggest you study the source code of PhotoRec, a great (in my opinion, the best) open-source tool for data carving.
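To show the idea in miniature (a hedged sketch only; the image path is a placeholder, and this merely finds candidate starting offsets, which is a tiny fraction of what PhotoRec actually does):

import mmap

SIGNATURES = {
    b"PK\x03\x04": "zip",       # ZIP local file header
    b"\xff\xd8\xff": "jpeg",    # JPEG start-of-image marker
}

def find_candidates(image_path):
    # Yield (byte offset, type) for every signature hit in a raw image.
    with open(image_path, "rb") as img:
        mm = mmap.mmap(img.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            for sig, kind in SIGNATURES.items():
                pos = mm.find(sig)
                while pos != -1:
                    yield pos, kind
                    pos = mm.find(sig, pos + 1)
        finally:
            mm.close()
    # A real carver would then follow each format (ZIP central directory,
    # JPEG FFD9 end-of-image marker, ...) to decide where each file ends.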