I am making a directory for an app and I need to parse the names, e-mail addresses, phone numbers, and offices for each entry that I want to display in a UITableView. I have a class made, but I have never really dealt with parsing anything beyond simple txt files.
I need to load a URL pointing to an XML file, which consists of the type of data shown at the bottom. It does not have XML tags, but it is saved as a .xml file.
I have read up on the NSXMLParsers, but I wasn't sure if that would be the correct way to do this or if there was an easier way.
Example of part of the .xml file below, this is just part of a few hundred lines that are organized in the same manner, by division, department, then person.
Thanks for any help!
http://cs.millersville.edu/School of Science and MathematicsDr.FirstH.LastRoddy Science CenterFirst.Last#millersville.edu872-3838Computer ScienceMrs.First.LastRoddy Science CenterFirst.Last#millersville.edu872-3858Computer ScienceDr.FirstH.LastRoddy Science CenterFirstH.LastRoddy#millersville.edu872-3470Computer ScienceDr.FirstH.LastRoddy Science CenterFirst.Last#millersville.edu872-3724Computer ScienceMs.FirstA.GilbertLast Science CenterFirst.Last#millersville.edu871-2214Computer ScienceDr.FirstH.LastRoddy Science CenterFirst.Last#millersville.edu872-3666
You can't use an XML parser for this file, because it contains no XML markup.
Instead you may try to use NSScanner to parse the text file. A couple of tutorials are listed here:
Parsing CSV Data
Writing a parser using NSScanner
Without the XML tags, your file is as good as a plain text file, with rows separated by newline characters and each line containing data separated by a dot (.) or something similar.
Figure out the pattern and parse it like you would parse any text file.
I need to programmatically find the fragments that are called by each rtftemplate.
So, for example in the figure, I would need to get the "GlossaryTermsAcronyms" fragment for the H2_terms_acronyms template.
I can't seem to find any query or script solution to do this. But this should be possible, right?
Unfortunately that is (almost) impossible.
The information is stored in the t_documents.bincontent column. It is binary encoded RTF.
Somewhere in that RTF there should be a reference to the template fragments that are used.
If you can figure out how to decode the bincontent to get to the actual RTF code of your template, you might have a chance.
Binary fields in EA are usually stored as a zipped text file.
In case the field is included in an xml file (or xml string in the database), it will be base64 encoded.
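As a rough illustration of the decoding described above, here is a minimal Python sketch. It assumes the value has already been extracted as a base64 string (the XML case), that the decoded bytes form an ordinary zip archive, and that the first member of that archive holds the RTF text. The function name `decode_bincontent` is hypothetical, and the UTF-8 decoding is a guess rather than anything documented.

```python
import base64
import io
import zipfile


def decode_bincontent(encoded):
    """Return the text of the first member of a base64-wrapped zip archive.

    Assumption: `encoded` is base64 text whose decoded bytes are a zip file.
    """
    raw = base64.b64decode(encoded)
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        # Assumption: the archive holds a single text member.
        first_member = archive.namelist()[0]
        data = archive.read(first_member)
    # Replace undecodable bytes instead of failing; the real encoding is unknown.
    return data.decode("utf-8", errors="replace")
```

If the bytes come straight from the database column rather than from XML, the base64 step would presumably be skipped and the raw bytes passed to `zipfile.ZipFile` directly.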
Does anyone have sample code for parsing the CEDICT file? CEDICT is a Chinese-English dictionary. For instance, currently, if I open it in a text editor, a line in the CEDICT file looks like:
不 不 [bu4] /(negative prefix)/not/no/
I would like to see it as:
不 不 [bu4] /(negative prefix)/not/no/
I found that TextWrangler does this for me as a text editor. What I now need is sample code that achieves the same.
The thing is, it's just an encoding problem. If the line looks like
不 不 [bu4] /(negative prefix)/not/no/
It's because the text editor doesn't know or realize that the text is encoded as UTF-8. TextWrangler, and its big brother BBEdit, are very good at guessing encodings, and can even be asked to display text in a specific encoding.
Since we don't know what you ultimately want to achieve, it's hard to tell you exactly what has to be done. What I can say is that your app (which language are you using anyway?) needs to be Unicode-aware and able to read and manipulate UTF-8 strings.
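To make the encoding point concrete, here is a small Python sketch (Python is used only for illustration, since the asker's language is unknown). It shows the same three bytes decoded with the right and the wrong codec, and a hypothetical helper `read_utf8` that opens a file with an explicit encoding instead of relying on the platform default.

```python
# The character as it is stored on disk: three UTF-8 bytes.
raw = "不".encode("utf-8")

# Decoding with the correct codec restores the original character.
as_utf8 = raw.decode("utf-8")

# Decoding with a single-byte codec yields three unrelated characters,
# which is the kind of garbage a misconfigured editor displays.
as_latin1 = raw.decode("latin-1")


def read_utf8(path):
    """Read a text file as UTF-8 and return its lines without newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]
```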
I wrote a couple of apps based on the CEDICT, one for Mac OS X, one for Android. Parsing and indexing the CEDICT is not very hard.
UPDATE
Regarding the parsing itself of the CEDICT, it's nothing complicated. I don't do Objective-C, never have, never will, but the process would be the same in any language:
Read a line. Say your own example: 不 不 [bu4] /(negative prefix)/not/no/
You have four fields: Trad. Ch., Simp. Ch., Reading, Meaning(s).
These fields are space separated. Of course the 4th field may contain spaces, so be careful.
Store the 4 fields in a database (I used an sqlite db).
You might want to remove the slashes from the definition field, replace them with something else.
Loop
You have now converted the CEDICT to a database. That's the easy part. As for tokenizing Chinese, good luck with that, mate. Better minds than mine are still banging their heads on this one.
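The steps above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the regular expression is one possible reading of the line format shown in the example, the table name `entries` and the function names are made up, and the meanings are stored joined by "; " in place of the slashes.

```python
import re
import sqlite3

# Traditional, simplified, [reading], /meaning/meaning/.../
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_line(line):
    """Split one dictionary line into its four fields, or return None."""
    match = ENTRY.match(line)
    if match is None:
        return None  # header, comment, or malformed line
    trad, simp, reading, meanings = match.groups()
    # The fourth field may contain spaces, so it is split on slashes only.
    return trad, simp, reading, meanings.split("/")


def load(lines, conn):
    """Insert every parsable line into a (hypothetical) entries table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(trad TEXT, simp TEXT, reading TEXT, meanings TEXT)"
    )
    rows = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            trad, simp, reading, meanings = parsed
            rows.append((trad, simp, reading, "; ".join(meanings)))
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

For a quick check, `load(open(path, encoding="utf-8"), sqlite3.connect(":memory:"))` would fill an in-memory database, where `path` points at the dictionary file.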
I have a PDF file containing quotes from some famous people. I want to store these quotes in an SQLite database. Any suggestions?
There is no direct relation between PDF and SQLite.
First you have to somehow decode the information/data from pdf file. Read the quotes in text format.
Then you can insert them in a sqlite database.
It's a tough job to decode information from a PDF file directly, because you need to know the structure of the PDF file format. You can get a description here: pdf file format
I think you can look for some pdf file converter to convert it into text, html, xml or csv. Then read that by your app.
I think the best way is through the command line:
awk '{printf"INSERT INTO Quotes VALUES (\x27%s\x27,\x27 Abraham Lincon\x27);\n",$0}' Quotes-AbrahamLincon.txt
where Quotes-AbrahamLincon.txt contains
A friend is one who has the same enemies as you have.
A house divided against itself cannot stand.
A woman is the only thing I am afraid of that I know will not hurt me.
output will be
INSERT INTO Quotes VALUES ('A friend is one who has the same enemies as you have.',' Abraham Lincon');
INSERT INTO Quotes VALUES ('A house divided against itself cannot stand.',' Abraham Lincon');
INSERT INTO Quotes VALUES ('A woman is the only thing I am afraid of that I know will not hurt me.',' Abraham Lincon');
Now you can run these statements against the SQLite db.
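As an alternative to generating SQL text with awk, here is a minimal Python sketch that inserts the lines directly using the standard `sqlite3` module. It is not the approach from the answer above, just a swapped-in technique: parameter placeholders mean a quote containing an apostrophe does not break the statement. The column names `quote` and `author` and the function name `load_quotes` are assumptions.

```python
import sqlite3


def load_quotes(conn, quotes, author):
    """Insert one row per non-empty quote and return the number inserted."""
    conn.execute("CREATE TABLE IF NOT EXISTS Quotes (quote TEXT, author TEXT)")
    rows = [(q.strip(), author) for q in quotes if q.strip()]
    # Placeholders let sqlite3 handle quoting of the values.
    conn.executemany("INSERT INTO Quotes VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

With a text file holding one quote per line, the call could look like `load_quotes(sqlite3.connect("quotes.db"), open("Quotes-AbrahamLincon.txt", encoding="utf-8"), "Abraham Lincoln")`, where `quotes.db` is a made-up database name.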
I am writing a game for iOS that uses .tmx map files. I am creating the maps in the application 'Tiled' and then at some point before they get to iOS, I'm parsing them with Perl.
When I save the files as straight XML, it's a cinch for perl to parse them. However, cocos2d insists that the files be base64-encoded. The 'Tiled' map editor has no problem saving files with this encoding scheme, and iOS reads them just fine, but it's presenting problems for my perl code.
For some reason, the standard MIME::Base64 decode_base64() method in perl is not cutting the mustard here- when I decode the strings, I get one or two binary characters-- question marks in diamond boxes and such.
And the vague documentation for the TMX file format makes it unclear whether there is some other encoding going on before or after the base64 encoding which might be causing these problems. I looked at the cpp source for the encoder, and I saw lots of references to Latin1, but I couldn't decipher what's going on in detail.
I noticed that when I tried doing my own tests with MIME::Base64, encoding and then decoding a test string, the encoded text looks dramatically different than that which I see coming out of the TMX files-- for instance, my base64-encoded text for a short string looks like this:
aGVyZSBpcyBhIHNlbnRlbmNl
But the base64-encoded text coming from the TMX files looks like this:
9QAAAAABAAANAQAAGAEAAA==
Any suggestions on what else I might try in attempts to decode a string that looks like that?
I think this page might be what you're looking for. It suggests that first you decode_base64, then (if the compression="gzip" attribute is present) use gunzip to uncompress it, and finally use unpack('V*', $data) to extract the list of 4-byte little-endian integers.
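The same three steps can be sketched in Python for comparison (the Perl equivalent would use decode_base64, gunzip, and unpack as described above). The function name `decode_layer` and its `compression` parameter are made up for this sketch; `"<I"` in `struct` is the counterpart of Perl's `V`, an unsigned 32-bit little-endian integer.

```python
import base64
import gzip
import struct


def decode_layer(data, compression=None):
    """Decode a base64 layer string into a list of 32-bit integers."""
    raw = base64.b64decode(data.strip())
    if compression == "gzip":
        raw = gzip.decompress(raw)
    count = len(raw) // 4
    # "<" means little-endian, "I" an unsigned 4-byte integer.
    return list(struct.unpack("<%dI" % count, raw[: count * 4]))
```

Applied to the string from the question, `decode_layer("9QAAAAABAAANAQAAGAEAAA==")` gives `[245, 256, 269, 280]`, which explains the "binary characters": the payload is packed integers, not text.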
I have written a small program which can encode and decode text with uuencode/uudecode. The code is based on the algorithm described on Wikipedia. It works fine when I encode and decode a string. But I have found a uuencoded file which I can't decode. This website can decode the file, but when I encode the result again I don't get the same file. In addition, when I decode only one line of the file I don't get readable text (neither with my program nor with the decoder I linked before). But in uuencoding all lines are independent of each other, so this should be possible.
Does anyone know whether there are special variations of uuencoding which are not described on Wikipedia? I can decode some strings, so my decoder can't be totally wrong. Perhaps someone has written their own decoder, so I am posting the whole file:
begin 666 Restricted.zip
M4$L#!!0````(`%T[="_]<LYX`P(``'0#```.````4F5S=')I8W1E9"YT>'1M
M4\MNVT`0NQOP/TSNM#PT0!/X4N16`RE0%.GC.I9&TE;2CKH/J_K[<E;IX]"+
M'UJ20W)6^]U3)SX=]KO][D*]SD(7XHD2CX/S'26EU`L%U_6)9#E1?46NQ4,7
MR?E6P\3)J:=%#ABZY7'$P2MO"0J1GGT3Z;B1YJ#?I4ZT:!X;N#KI34)%3Y%6
MS8#>A#I-&[;E`-H%'(EY#G[/(-I',=GI;XN"H49?''YXT#LE]BNU.<!&,*(W
M0&4Y7V#,F_&11NV<-TNU-!D!>HZP5"MF91^YE0-D&H2C5CAL\T&P:#/'A*<+
M#F6(!IEXW?Q?13Q=#P[XLBHJ>L[UX,;U8+`"X3I)0S^RJX=Q+3-28)##+IK:
MEAD#AQRM7DY)ICG%BK[:(,\=L$C>20*EUCR/8BP'&'H+.OT5:+`V>,*NK$%9
MZ<;>Q1X"1WJOBZ#_8HQ+`3?K%(U<1U-:7.HI6A]_+/V[\RU,J]DW!SMV#<37
M89W+>5QCL6/"MDHTQPV&UT5-<R!=?%D)MG^AR&Y3^>]::JP0H2MZ4>3UR?F,
M[>18,L'"..I2K'.,BP8TF<K)YT_/IG1S#<#VZ^,KX$QO'[\\WC_<W;V[?_-P
MW>^`/%.?TGP^G99EJ29MCC^K6JL\G%H78CJQC[CGU=S/V_M2KEN<A0?;A5U`
M[AC.U2*6OUOE0<KD#Q#\MM_]`E!+`0(4`!0````(`%T[="_]<LYX`P(``'0#
M```.``````````$`(`"V#0````!297-T<FEC=&5D+G1X=%!+!08``````0`!
+`#P````O`#``````
`
end
I found the solution! The problem was that I did not notice the first line. This line holds information about the data encoded - a file named Restricted.zip. So the decoded data is a ZIP file which I just had to unpack.
I got a text file named Restricted.txt which contains the readable data.
The problem was so easy, but it took me days to see its solution.
That's a good transition to packing algorithms - perhaps the next thing I do is write my own program which can pack and unpack zip files.
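For anyone hitting the same issue, here is a minimal Python sketch of a decoder that reads the `begin` header first and returns the file name together with the raw bytes, so it is obvious that the payload need not be text. It relies on `binascii.a2b_uu` from the standard library for the per-line decoding; the function name `uudecode` and the returned tuple layout are choices made for this sketch.

```python
import binascii


def uudecode(text):
    """Return (file name, mode, payload bytes) from uuencoded text."""
    lines = text.splitlines()
    # The header line looks like: begin <octal mode> <file name>
    start = next(i for i, line in enumerate(lines) if line.startswith("begin "))
    _, mode, name = lines[start].split(" ", 2)
    payload = bytearray()
    for line in lines[start + 1:]:
        if line.strip() == "end":
            break
        if not line.strip():
            continue  # skip blank lines rather than decoding them
        payload += binascii.a2b_uu(line)
    return name, int(mode, 8), bytes(payload)
```

When the returned name ends in `.zip`, as in the file above, the payload can be wrapped in `io.BytesIO` and handed to `zipfile.ZipFile` instead of being printed as text.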