Parse .strings file with Python

I'm trying to write a small Python script to parse the .strings file in my iPhone application project and determine which keys might not be in use. I'm also doing some string matching to filter out some of the results. This is where my problems start :). If I try something like
    for file_line in strings_file:
        if 'search_keyword' in file_line:
            ...
the search keyword will often not match, even though when I print every file line in the same for loop I seem to be reading the text correctly and my search keywords appear.
The problem is these .strings files are in some binary format. Does anyone know of a proper way to parse these files?

Use the correct encoding both when opening the .strings file and in your source code. According to the documentation, the encoding of your file could be UTF-16.
    # -*- coding: utf-8 -*-
    import codecs

    # Decode the file as UTF-16 while reading it line by line.
    for line in codecs.open(u'your_file.strings', encoding='utf-16'):
        if u'keyword' in line:
            pass  # process line
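A slightly fuller sketch of the same idea, assuming the file is text in the usual "key" = "value"; layout and is UTF-16 with a BOM (falling back to UTF-8 otherwise). The name parse_strings and the regular expression are illustrative, not an official parser, and multi-line values or quotes inside comments are not handled.

```python
import codecs
import re

# Matches one `"key" = "value";` entry; escaped quotes inside are allowed.
ENTRY = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=\s*"((?:[^"\\]|\\.)*)"\s*;')

def parse_strings(data):
    """Return {key: value} from the raw bytes of a text .strings file."""
    # Assumption: a UTF-16 file starts with a BOM; otherwise treat it as UTF-8.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = data.decode('utf-16')
    else:
        text = data.decode('utf-8')
    return dict(ENTRY.findall(text))

# Made-up sample content, encoded the way such a file might be stored.
sample = u'/* greeting */\n"hello_key" = "Hello";\n"bye_key" = "Bye";\n'.encode('utf-16')
entries = parse_strings(sample)
```

With a real file, the bytes would come from something like open(path, 'rb').read(), and the resulting keys could then be searched for in the project sources.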

No experience with those .strings files, but here is the reason why you don't find matches:
strings_file.read()
returns a string with the full content of the file. Iterating over a string iterates over single characters, i.e. in your for loop, file_line isn't a line, it's always just one single character (a string of length 1), which obviously can't contain a multi-character search word.
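A minimal sketch of the difference, using an in-memory stand-in for the file (the content is made up for illustration):

```python
import io

# Stand-in for an open file; the content is hypothetical.
strings_file = io.StringIO(u'"a_key" = "A";\n"search_keyword" = "B";\n')

content = strings_file.read()

# Iterating over a string yields single characters...
chars = [c for c in content]

# ...so split the text into lines before searching.
matches = [line for line in content.splitlines() if 'search_keyword' in line]
```

Here every element of chars has length 1, while matches holds the one full line that contains the keyword.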

It sounds like the strings file was saved as data. If Python can't read it as is, you can convert it to a plain text file in Objective-C.
Just: (1) read the strings file into a string with the proper encoding, (2) convert it to a dictionary, (3) write the dictionary to another file.
So:
    NSError *error = nil;
    NSString *strings = [NSString stringWithContentsOfFile:filePath encoding:NSUTF16StringEncoding error:&error];
    NSDictionary *dict = [strings propertyList];
    [dict writeToFile:anotherFilePath atomically:NO];

Related

Applescript: Save Word documents as plaintext while retaining accents

I'm trying to save Word documents as plain text docs. Currently, the accents sometimes turn into other symbols (usually the same ones, for example: é turns into a theta). Other times it works fine. How do I prevent this?
Currently using the line:
save as active document file name FullDocPath file format format Unicode text
When I encounter this error, I can save the document using the dialog and selecting the Western (Mac OS Roman) encoding, which fixes the problem.
The AppleScript Word dictionary mentions:
[text encoding unsigned integer] : Text encoding to use when saving out as text file
I have no idea if this is the piece I'm missing or how to utilize it (is there a set integer that designates Western Mac OS Roman encoding?)
Anyone have any ideas?

Try:
set wordDoc to choose file
do shell script "textutil -convert txt " & quoted form of POSIX path of (wordDoc as text)
Check out StefanK's solution using textutil

This is in response to your comment beginning "Thanks Stefan and bibadiak"
The problem with .txt files is that there is no universally used way to specify the encoding of a file inside the file, so either the application has to guess, or you have to know the encoding and the application has to let you tell it.
AFAIK if you do not specify an output encoding when you use textutil to convert from .doc or .docx format to text, you get UTF-8. But Mac Word just does not seem to recognise that when you try to open it, either programmatically or in the UI.
So I think you need to do some mix of the following:
a. save in, and work with, a format that uses 16-bit Unicode encoding. Word should recognise that, certainly if the BOM is preserved
b. save to UTF and work with UTF elsewhere, but use textutil to do the conversion back to (say) .docx before you re-open the document in Mac Word
c. if all your characters can be encoded using Mac OS Roman, use e.g.
textutil -convert txt -encoding 30
to save, ensure you work only with that character set, and re-open with Word. (30 is the value of Apple's NSString encoding constant NSMacOSRomanStringEncoding.) I think textutil will fail to convert documents that contain characters outside the Mac OS Roman set.
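To see why the encoding choice matters, here is a small Python sketch of the three situations. It assumes Python's built-in mac_roman codec matches what Word calls Western (Mac OS Roman); the sample text is made up.

```python
import codecs

text = u'caf\u00e9'  # "café", a made-up sample

# UTF-8 bytes misread as Mac OS Roman: the accent turns into other symbols.
garbled = text.encode('utf-8').decode('mac_roman')

# Option (a): UTF-16 with a BOM, so a reader can detect the encoding.
utf16_bytes = text.encode('utf-16')
has_bom = utf16_bytes[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Option (c): Mac OS Roman works only if every character is in that set.
roman_bytes = text.encode('mac_roman')
try:
    u'\u4e2d'.encode('mac_roman')  # a CJK character, not in Mac OS Roman
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

The single é becomes two unrelated characters in garbled, while the last step fails for a character outside the set, which is consistent with the caveat about textutil above.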

How to figure out what an encoded string contains

I have a string that looks like this
H4sIALYnhUsCA9VXW5aDIAz9zypcgiU8dDnTWtfQ5Q8kEgSR
ap05c+YnhxLyumBu2r/s2PUvO3nh+rCaw0oFob1Q+Z51HfjNZ1jexCSsLAYx
BGG6eATZGJYALIIzG9QOy4NeaPYAyyarKfQY7TgypTjGI3ogkxDahSTw7kX/
FQUHeIgxsoClQD1JGRKF7Jy4oXNeQFou5TvJzlkJoAUIMuGAOlePMTEGWQry
2liLCfHNJPEwuiU7jmzEhM6gnGawSO3ORMnqLQRsNgki7AV4jEI9xKRU65V6
q7UUZVetqsZQC13z3UzMXkkM24nlvs+B/EktqmsnC0dxelvLycTaN+QugYw/
DTJeeTD4iy/ZXQHZ/KuXjH/2kvFKYtfaBfXtaUtlVZCZiIxw5WPLLxkFQZ2D
mMBmUaQJYCKyyBlShVqMuHUFSzu5/UTY1sVMVpwzSnimpEFOz5G7nKSoheIt
yqjg+pxU54zE64jd3zzdrYmW6Ybic2mVvcjAUKfg0s0QMfAXDadyotuGxOdH
hwZIU4NPR2fqbApbVnirTRdFGc/cjr7KwhmV+m6GGbMnf+RetoNNGwiohW4D
AREJ1R0FAhqo7gDx4b18iBh/uWPeGkwc07mMmdtKbBe0WQy9PMpr6TpLZwhR
whmj8/8FjTEWsv8ckhimqgj9+2q0hfWH1WpFCXPYfX27mEMGupKe1QA+gkwd
PDVv/xO+AbHzd9RzDQAA
My initial guess was that this was a Base64-encoded file of some sort. Any ideas on how I can figure out if it is a file, and which type? It should contain MIME info, I guess, but how would I save it to a file without fragmenting it?

It's base64. When you decode it, you get a gzipped file, which consists of a boatload of hex characters (literally, as ASCII 0xNN hex characters). They're mostly in the A-Z,a-z range.
I'd paste it here, but from this, I suspect this is part of some exercise you're doing, so I think I'll leave it to you to figure out.
P.S. For edification, I determined the binary output was a gzipped file by using the unix file command, to identify the "magic" bytes, which showed that it was gzipped. Use your decode_base64 function or whatever it is, then dump the return value into a file and gunzip it.
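In Python, the same check can be sketched as follows. The helper name decode_payload is hypothetical, and the round trip uses made-up data rather than the string from the question.

```python
import base64
import gzip

# The first characters of the string in the question decode to the gzip
# magic bytes 1f 8b, followed by 08 (the "deflate" compression method).
prefix = base64.b64decode('H4sI')

def decode_payload(b64_text):
    """Base64-decode, then gunzip if the gzip magic bytes are present."""
    raw = base64.b64decode(b64_text)
    if raw[:2] == b'\x1f\x8b':
        return gzip.decompress(raw)
    return raw

# Round trip on made-up data, since the real payload is left to the reader.
sample = base64.b64encode(gzip.compress(b'0x41 0x42 0x43'))
decoded = decode_payload(sample)
```

Checking the first two bytes is only a heuristic; the unix file command recognises many more formats.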

Are there any special variations of uuencoding / uudecoding?

I have written a small program which can encode/decode a text with uuencode/uudecode. The code is based on the algorithm described on Wikipedia. It works fine when I encode/decode a string. But I have found a uuencoded file which I can't decode. This website can decode the file, but when I encode it again I don't get the same file. In addition, when I decode only one line of the file I don't get readable text (neither with my program nor with the decoder I linked before). But in uuencoding all lines are independent of each other, so this should be possible.
Does anyone know whether there are special variations of uuencoding which are not described on Wikipedia? I can decode some strings, so my decoder can't be totally wrong. Perhaps someone has written their own decoder, so I'm posting the whole file:
begin 666 Restricted.zip
M4$L#!!0````(`%T[="_]<LYX`P(``'0#```.````4F5S=')I8W1E9"YT>'1M
M4\MNVT`0NQOP/TSNM#PT0!/X4N16`RE0%.GC.I9&TE;2CKH/J_K[<E;IX]"+
M'UJ20W)6^]U3)SX=]KO][D*]SD(7XHD2CX/S'26EU`L%U_6)9#E1?46NQ4,7
MR?E6P\3)J:=%#ABZY7'$P2MO"0J1GGT3Z;B1YJ#?I4ZT:!X;N#KI34)%3Y%6
MS8#>A#I-&[;E`-H%'(EY#G[/(-I',=GI;XN"H49?''YXT#LE]BNU.<!&,*(W
M0&4Y7V#,F_&11NV<-TNU-!D!>HZP5"MF91^YE0-D&H2C5CAL\T&P:#/'A*<+
M#F6(!IEXW?Q?13Q=#P[XLBHJ>L[UX,;U8+`"X3I)0S^RJX=Q+3-28)##+IK:
MEAD#AQRM7DY)ICG%BK[:(,\=L$C>20*EUCR/8BP'&'H+.OT5:+`V>,*NK$%9
MZ<;>Q1X"1WJOBZ#_8HQ+`3?K%(U<1U-:7.HI6A]_+/V[\RU,J]DW!SMV#<37
M89W+>5QCL6/"MDHTQPV&UT5-<R!=?%D)MG^AR&Y3^>]::JP0H2MZ4>3UR?F,
M[>18,L'"..I2K'.,BP8TF<K)YT_/IG1S#<#VZ^,KX$QO'[\\WC_<W;V[?_-P
MW>^`/%.?TGP^G99EJ29MCC^K6JL\G%H78CJQC[CGU=S/V_M2KEN<A0?;A5U`
M[AC.U2*6OUOE0<KD#Q#\MM_]`E!+`0(4`!0````(`%T[="_]<LYX`P(``'0#
M```.``````````$`(`"V#0````!297-T<FEC=&5D+G1X=%!+!08``````0`!
+`#P````O`#``````
`
end

I found the solution! The problem was that I did not notice the first line. This line holds information about the data encoded - a file named Restricted.zip. So the decoded data is a ZIP file which I just had to unpack.
I got a text file named Restricted.txt which contains the readable data.
The problem was so easy, but it took me days to see its solution.
That's a good segue into packing algorithms: perhaps the next thing I do will be writing my own program which can pack and unpack ZIP files.
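For anyone hitting the same wall, a short Python sketch of the two things to look at: the "begin" header, and the first decoded bytes. The function names are illustrative. Each line does decode independently, but the result here is ZIP data (it starts with "PK"), so it is not readable text.

```python
import binascii

def parse_begin(line):
    """Split a 'begin <mode> <name>' header into (mode, filename)."""
    keyword, mode, name = line.split(' ', 2)
    if keyword != 'begin':
        raise ValueError('not a uuencode header')
    return mode, name

def uudecode_line(line):
    """Decode one line by hand: a length character, then 4 chars -> 3 bytes."""
    length = (ord(line[0]) - 32) & 0x3F
    out = bytearray()
    body = line[1:]
    for i in range(0, len(body) - 3, 4):
        a, b, c, d = [(ord(ch) - 32) & 0x3F for ch in body[i:i + 4]]
        out.append((a << 2 | b >> 4) & 0xFF)
        out.append((b << 4 | c >> 2) & 0xFF)
        out.append((c << 6 | d) & 0xFF)
    return bytes(out[:length])

mode, name = parse_begin('begin 666 Restricted.zip')

# '#' means 3 bytes follow; '4$L#' are the first data characters of the file.
first_bytes = uudecode_line('#4$L#')

# The standard library decoder should agree with the hand-written one.
same_as_stdlib = binascii.a2b_uu(b'#4$L#\n') == first_bytes
```

The bytes "PK" at the start are the ZIP signature, which matches the file name given in the header.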

Emoji icons are not displaying correctly when read from plist

I am trying to read some text from a plist file and display it to the users in alert box.
When I build the string using this code, everything works (the user sees Hello with a smiley icon):
NSString *hello = @"Hello \ue415";
but when I get the string from the plist using this code, the user sees "Hello \ue415":
NSString *hello = (NSString *)[pageLiteratureDic objectForKey:litratureKey];
Do I have to encode the string differently? Any help or pointers will be much appreciated... everyone loves emojis ;)

You shouldn't literally type "\ue415" as text into the plist file. \u.... is an escape sequence in the syntax of strings and characters in the C language. The string itself does not contain a backslash, a "u" and so on; it contains just 1 character, the Unicode character at the codepoint 0xe415. If you want to save that in a plist, you have to manually type that one Unicode character in there yourself, making sure to use whatever encoding is required of a plist (maybe UTF-8 or UTF-16, not sure). Alternatively, you can write a program that creates a plist from that string, and then copy and paste whatever is in that plist file over to your file.
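The same distinction exists in other languages. A small Python sketch (Python rather than Objective-C, purely for illustration) shows that the escape in source code is one character, while the same six characters read from a file stay six characters unless you resolve them yourself:

```python
import codecs

in_source = u'Hello \ue415'    # escape resolved by the interpreter: 7 characters
from_file = u'Hello \\ue415'   # backslash, u, e, 4, 1, 5 as stored text: 12 characters

# Assumption: the stored text is plain ASCII with \uXXXX escapes; resolve them by hand.
resolved = codecs.decode(from_file.encode('ascii'), 'unicode_escape')
```

After the manual step, resolved equals in_source. Storing the actual character in the plist avoids needing any such step.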

In the plist, instead of "Hello \ue415" try using the smiley face character explicitly as in "Hello :)". Just cut and paste the smiley character over the Unicode code. The reading of the plist is probably escaping the backslash and stopping the interpretation as a Unicode character.

Deleting a line from a txt file in objective-c

Is there a way to remove a line from the end of a .txt file from Objective-C? I can't seem to find anything on manipulating text files from Objective-C, only reading them into an NSString.

Have you considered changing your data model to a plist? Plists are more easily read/written into/from NSDictionaries.
Otherwise, I think the only way is to read the file into an NSString, separate it into an NSArray of components by splitting on \n, remove the object at index n, join the components back into a string with \n, and then write that string back to the file.
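The read, split, drop, join, write sequence can be sketched in Python as follows (shown in Python only to illustrate the steps; the function name and the demo content are made up):

```python
import os
import tempfile

def remove_last_line(path):
    """Rewrite the file at `path` without its final line."""
    with open(path, 'r') as f:
        lines = f.read().split('\n')
    # A trailing newline leaves an empty last element; drop it first.
    ends_with_newline = lines[-1] == ''
    if ends_with_newline:
        lines.pop()
    if lines:
        lines.pop()  # the actual last line
    text = '\n'.join(lines)
    if ends_with_newline and lines:
        text += '\n'
    with open(path, 'w') as f:
        f.write(text)

# Demo on a temporary file with made-up content.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('one\ntwo\nthree\n')
remove_last_line(demo_path)
with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
```

This rewrites the whole file, which is fine for small text files; for very large files, truncating at the last newline would avoid reading everything into memory.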