What can I do to recover a UTF-8 binary file? - mongodb

I somehow had a script running on my company's server that basically did a mongodump and then for some reason used recode to encode all .bson files to UTF-8. Thanks to that, I can't use mongorestore, as it says every single .bson file has 268 Mb.
Is there anything one can do to get data back from a recoded to UTF-8 binary BSON file? There's apparently no way to recode it back. Thanks.

OK. This works only on MongoDB, probably, but I'll put it as an answer because it may work for people with this exact problem:
BSON files, while binary, are somewhat readable, depending on your need. In my case, I had a product collection, and most of what I had to update was descriptions and such.
While not a perfect solution, it is possible to just use Notepad++ to turn hex characters into new lines or anything else, and try to parse the resulting file, if you know what you are doing.
Since all fields (name, _id, description) are still there, I recommend turning those into XML headers, for example.
That solved my problem. Thanks.

Related

Troubleshooting "no writeable tags set" error

I'm trying to (ultimately) modify a batch of files but getting stuck in the basics as I try to modify a single file before running a batch command.
If someone could help me troubleshoot the command I'm inputting, that would be fantastic. I'm sure it's something very simple.
Thanks a lot for any help you can provide!
Here's the abbreviated image exif data:
-ExifToolVersion=10.10
-FileName=2018_11_13_1.jpeg
-Directory=.
-FileSize=2.8 MB
-FileModifyDate=2019:07:12 15:40:38-07:00
-FileAccessDate=2019:07:12 15:40:38-07:00
-FileInodeChangeDate=2019:07:23 10:38:02-07:00
-FilePermissions=rw-rw-r--
-FileType=JPEG
-FileTypeExtension=jpg
-MIMEType=image/jpeg
[...]
-ModifyDate=2018:11:13 12:00:53
[...]
-DateTimeOriginal=2018:11:13 12:00:53
-CreateDate=2018:11:13 12:00:53
My current input is: exiftool "-FileModifyDate<$filename00000" ./2018_11_13_1.jpeg
And the error message is:
Warning: No writable tags set from 2018_11_13_1.jpeg
0 image files updated
1 image files unchanged
And the exif data is, of course, unchanged.
I've confirmed that I can write a value to this tag, so there's definitely something going wrong in pulling from the filename.
( Continued from How to compensate for incomplete date/time info in filename )
The problem here is that you are trying to write from a tag named filename00000. If you check the example in the other post, you will see that there is a space after Filename. This sets it apart so that exiftool knows which is a tag name and which is other data.
There is possibly an additional problem here, though. Your filename has an extra number that is not the date. When exiftool tries to write the time stamp from the filename, it is going to end up with a value of "2018:11:13 10:00:00", which might become especially problematic if that last digit hits a value of 3 or more, resulting in a timestamp of "2018:11:13 30:00:00".
I would suggest using exiftool's Advanced Formatting Feature (a fancy way of saying that you can use perl code in the command) to strip the excess data. Something like
exiftool "-FileModifyDate<${filename;s/^(.*\d{4}_\d\d_\d\d).*/$1/} 000000" ./2018_11_13_1.jpeg
Though take note, if the filenames are in any other format, then it would require a different command.

Huge docx filled with <w:p> tags

My girlfriend is writing a Word document for a homework. She's using the old .doc format as required by her teacher ( :'( ).
At some point, the .doc file went from 150 kB to 2.6 MB with no noticeable change (seen in Dropbox history. Sadly, Word's comparison function fails because Word crashes). From that point, she was unable to save her document without crashing word...
I converted the .doc to docx, unzipped it, and found a 18 MB document.xml file !
I can't even format the xml properly because it crashes Notepad++, but I can see that the file is filled with the same xml tag repeating over and over :
<w:p w:rsidR="002A70E5" w:rsidRDefault="002A70E5" w:rsidP="00565ED9"/>
Do you have any idea what could cause this ?
EDIT: Here's the docx
EDIT2: The motivation for this question is more curiosity than looking for a fix. Thanks for your answers though.
If you're willing to edit the XML directly, you can just delete all the empty <w:p> tags and rezip.
If you're good with Python, you might give python-docx a try and use it to delete all empty paragraphs.
Hopefully that will at least recover the work she's done so far.
Not sure how this would happen, or whether it matters much. Only thing I can think of is a sticking Return key on the keyboard that would insert a huge number of carriage returns. Those each insert a new paragraph. I've actually had that happen occasionally on a Windows virtual machine running on my Mac. No clue why it does it though.
The tag you are talking about is the OpenXml format for building word documents. The openxml stores the document as a zipped file and I am afraid you are seeing the unzipped document.xml file. If you want to keep working with the doc just convert the doc file to docx. Dont unzip it.

How to replace a file inside a zip on iOS?

I need to replace a file on a zip using iOS. I tried many libraries with no results. The only one that kind of did the trick was zipzap (https://github.com/pixelglow/zipzap) but this one is no good for me, because what really do is re-zip the file again with the change and besides of this process be to slow for me, also do something that loads the whole file on memory and make my application crash.
PS: If this is not possible or way to complicated, I can settle for rename or delete an specific file.
You need to find a framework where you can modify how data is read and written. You would then use some form of mmap to essentially read and write small chunks. Searching on NSData and mmap resulted in this Post, however you can use mmap from the posix level too. Ps it will be slower than using pure memory no way around that.
Got it WORKING!! JXZip (https://github.com/JanX2/JXZip) has made exactly what I need, they link to libzip (http://www.nih.at/libzip/) that is a fully equiped library for working with ZIP files and JXZip have all the necessary Objective-C wrapper code. Thanks for all the replys.
For archive purposes, as the author of zipzap:
Actually zipzap does exactly what you want. If you replace an entry within a zip file, zipzap will do the minimum necessary to update it: it will skip writing all entries before the replaced entry, then write out the entry, then write out all entries after the replaced entry without recompressing. At the moment, it does require sufficient memory for the entries after the replaced entry though.

How can I force emacs (or any editor) to read a file as if it is in ASCII format?

I could not find this answer in the man or info pages, nor with a search here or on Google. I have a file which is, in essence, a text file, but it somehow got screwed up upon saving. (I think there are a few strange bytes at the front of the file accidentally.)
I am able to open the file, and it makes sense, using head or cat, but not using any sort of editor.
In the end, all I wish to do is open the file in emacs, delete the "messy" characters, and save it once cleaned up. The file, however, is huge, so I need something powerful like emacs to be able to open it.
Otherwise, I suppose I can try to create a script to read this in line by line, forcing the script to read it in text format, then write it. But I wanted something quick, since I won't be doing this over & over.
Thanks!
Mike
perl -i.bk -pe 's/[^[:ascii:]]//g;' file
Found this perl one liner here: http://www.perlmonks.org/?node_id=619792
Try M-xfind-file-literally in Emacs.
You could edit the file using hexl-mode, which lets you edit the file in hexadecimal. That would let you see precisely what those offending characters are, and remove them.
It sounds like you either got a different line ending in the file (eg: carriage returns on a *nix system) or it got saved in an unexpected encoding.
You could use strings to grab "printable characters in file". You might have to play with the --encoding though I have only ever used it to grab ascii strings from executable files.

using edeD3 to change encoding of mp3 tags

I found this question which is my exact starting point: Chinese-encoded metadata on mp3 files. I want to re-encode all my metadata as utf-8 so that Banshee can read it.
I can't figure out how to get eyeD3 to do that. I can decode individual tags as per that previous link, but I can't make eyeD3 change the actual text encoding of the mp3 file itself, so those tags can be rewritten in the proper encoding. I tried reading all the data into variables (below, 't' is the properly encoded title), then calling:
tag.clear()
tag.update(eyeD3.ID3_V2_4)
tag.setTitle(t)
That tells me: ValueError: ID3 vNone.None is not supported. Not what I was expecting.
I tried tag.setTextEncoding('utf-8'), but that tells me eyeD3.tag.TagException: Invalid encoding. All the other encodings I try give me the same error message.
eyeD3.TAGS2_2_TO_TAGS_2_3_AND_4 looks promising, but it's a dictionary of cryptic letter codes that mean nothing to me.
Can someone tell me how to change the version of the tags to something that supports utf-8, then change the file encoding to utf-8 and write the metadata back in?
Looks like somebody's already created something that does this:
http://code.google.com/p/id3-to-unicode/
It's pretty easy to use. Just download the latest version of the script from the website, make sure you have the eyeD3 and chardet python modules installed (a quick sudo apt-get install python-eyed3 python-chardet did the trick for me in ubuntu), and run the script with the -h flag to see how to use it.
My only complaint is that the script assumes that your music is organized like artist/album/01 track name.mp3, and uses path/file information to fill in missing tags. I disabled this in the latest version (http://id3-to-unicode.googlecode.com/files/id3_to_unicode_1.1.py) by commenting out lines 126-138.
Eric Abrahamsen figured out, that setting the text encoding should look like
tag.setTextEncoding(eyeD3.UTF_8_ENCODING) instead of
tag.setTextEncoding('utf-8').