HTML file encoding issue (Chinese) - encoding

I have an HTML file containing records shown in a table, and somewhere along the way parts of it got encoded the wrong way. A large part of the file is correct and shows the content as expected, but other parts seem to be wrongly encoded. The HTML itself is shown correctly (all the elements etc.), but the values within the cells of the table are sometimes garbled.
For example one cell contains:
<cell>»¿è²å¼æäºæ 线æ¥å¥ç½ç»ä¸­çæ³¢ææå½¢ææ¯ç 究</cell>
While it should contain:
<cell>绿色异构云无线接入网络中的波束成形技术研究</cell>
I already tried to figure out what exactly went wrong, but I can't seem to find a solution that fixes the whole file. I tried tools such as FTFY, which didn't give me any meaningful results.
These websites gave me some direction, and it seems that something went wrong between Windows-1252/1251 and UTF-8. The first website seems to fix the problem but still returns some unknown characters (UTF-8 displayed as Windows-1252).
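As far as I can tell, the repair for one broken cell would look something like the minimal Python sketch below, assuming the text really is UTF-8 that was decoded as Windows-1252 (the file names are placeholders, and lines that don't round-trip cleanly are left alone), but I'm not sure it covers every broken value:
from pathlib import Path
def repair(text):
    try:
        # Re-create the original bytes, then decode them as they were meant to be read.
        return text.encode("windows-1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Anything that does not round-trip cleanly is left untouched.
        return text
broken = Path("records.html").read_text(encoding="utf-8")
fixed = "".join(repair(line) for line in broken.splitlines(keepends=True))
Path("records.fixed.html").write_text(fixed, encoding="utf-8")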
Does anyone have an idea how to fix this for the whole file, or any tips to help me figure it out further on my own?
Thanks in advance.

Related

Find corrupt data in xlsx file

We are generating xlsx files using a Perl script. The files usually contain thousands of records, which makes spotting errors very difficult.
This process had been working for years without problems.
This week we got a request to check a file that contains errors. When opening it, Excel reported that the file contains errors and asked whether we wanted to repair them.
In fact we do not want to recover the data but to know which part of the file is corrupt. The errors should be coming from corrupt data, and we want to identify that data.
The log message shows the following:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error068200_01.xml</logFileName>
<summary>Errors were detected in file 'D:\Temp\20161020\file_name.xlsx'</summary>
<repairedRecords summary="Following is a list of repairs:"><repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1.xml part</repairedRecord>
</repairedRecords>
</recoveryLog>
The errors should come from corrupt data. Is there any tool or method that helps to spot this corrupt data?
I tried renaming the file to .zip, extracting it, and opening it in an XML editor, but I was not able to find any errors in the XML files.
We also checked that the structure of the different XML files is fine.
Thank you and best regards
As expected, the problem was coming from text cells containing numbers with an E in the middle. I used the following steps to identify the erroneous cells.
1. I wrote a small Java class to read the file. The class checked the cell type and then displayed the value. The Java program threw an exception at some line, "Cannot get a numeric value from a text cell", even though I was correctly checking the cell type before displaying the content.
2. I checked the opened Excel file at that line and found that the cell contains only 'inf'.
3. I opened the file using OpenOffice and looked at the same cells. They contain 0.
4. I debugged the program generating the data and found that these cells contain data like '914E5514'. It seems the E was interpreted by Excel as an exponent. We changed the program to use the format '#' for that cell, and this solved the issue (one way to scan a file for such cells is sketched below).
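Here is a rough sketch of such a scan in Python rather than Java (the file and sheet names are placeholders): it unzips the .xlsx, reads the raw sheet XML, and flags numeric cells whose stored value does not parse as a finite number, which is exactly what an accidental exponent like '914E5514' produces.
import math
import zipfile
import xml.etree.ElementTree as ET
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
with zipfile.ZipFile("file_name.xlsx") as zf:
    sheet = ET.fromstring(zf.read("xl/worksheets/sheet1.xml"))
for cell in sheet.iter(NS + "c"):
    if cell.get("t") in ("s", "str", "inlineStr", "b", "e"):
        continue  # shared strings, formula strings, booleans and errors are not plain numbers
    v = cell.find(NS + "v")
    if v is None or v.text is None:
        continue
    try:
        number = float(v.text)
    except ValueError:
        print(cell.get("r"), "does not parse as a number:", v.text)
        continue
    if not math.isfinite(number):
        print(cell.get("r"), "overflows (exponent too large?):", v.text)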
Thank you.
Thank you very much, you helped me a lot by saying that one particular content item may be the root of the problem.
My corrupted content was https://www.example.com XYZ ... ASDAS
Solution: www.example.com XYZ ... ASDAS
This is something Excel cannot handle. It would be nice to have a list of things that do not work.

Convert abf to atf file type. Where should I start?

I want to preface this question by first saying I am not 100% sure this is the correct place to ask.
Basically, I want to take files with the .abf (Axon Binary File) extension and convert them to .atf (Axon Text File). I was wondering if I could simply run a script that converts binary to text, or if it would be more complicated than that.
I'm making a script that takes files with the .abf extension and feeds them into the Clampex 9.2 program in order to save them as .atf files. However, I feel like there has to be a better way than manually feeding the files into a program and then resaving them with the correct extension.
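For what it's worth, a binary-to-text dump can be sketched with the third-party pyabf library (that library choice and the file names are my own assumptions). Note that this writes a plain tab-separated table, not a spec-complete .atf; the Axon header would still have to be added according to the ATF format description.
import pyabf  # third-party: pip install pyabf
abf = pyabf.ABF("recording.abf")      # placeholder file name
sweeps = []
for i in range(abf.sweepCount):
    abf.setSweep(i)
    sweeps.append(abf.sweepY.copy())  # samples for sweep i
with open("recording.txt", "w") as out:
    out.write("time_s\t" + "\t".join(f"sweep{i}" for i in range(len(sweeps))) + "\n")
    # Assumes every sweep shares the same time base (abf.sweepX of the last sweep loaded).
    for row, t in enumerate(abf.sweepX):
        out.write(f"{t}\t" + "\t".join(str(s[row]) for s in sweeps) + "\n")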
Again, if this is not the right forum for this type of question, I apologize but thank you in advance if anyone can help me with this problem!

What can I do to recover a UTF-8 binary file?

I somehow had a script running on my company's server that basically did a mongodump and then, for some reason, used recode to convert all the .bson files to UTF-8. Thanks to that, I can't use mongorestore, as it says every single .bson file is 268 MB.
Is there anything one can do to get data back from a recoded to UTF-8 binary BSON file? There's apparently no way to recode it back. Thanks.
OK, this probably only applies to MongoDB, but I'll put it as an answer because it may help people with this exact problem:
BSON files, while binary, are somewhat readable, depending on your need. In my case, I had a product collection, and most of what I had to update was descriptions and such.
While not a perfect solution, it is possible to just use Notepad++ to turn hex characters into new lines or anything else, and try to parse the resulting file, if you know what you are doing.
Since all fields (name, _id, description) are still there, I recommend turning those into XML headers, for example.
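If Notepad++ is not an option, the same salvage idea can be sketched in a few lines of Python (the file name is a placeholder): it only prints the readable ASCII runs so that fields like name, _id and description can be picked out by hand, and it does not reconstruct valid BSON.
import re
from pathlib import Path
data = Path("products.bson").read_bytes()   # placeholder file name
# Runs of four or more printable ASCII bytes; binary noise in between is dropped.
# Widen the byte class if the text you need is not plain ASCII.
for match in re.finditer(rb"[\x20-\x7e]{4,}", data):
    print(match.group().decode("ascii"))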
That solved my problem. Thanks.

Best way to get a database-friendly list of Veterans Affairs hospitals

I sincerely apologize if this isn't the proper forum to discuss this, but I wasn't sure where to go or what would be the best option.
Basically, I'm trying to find a database-friendly list of Veterans Affairs hospitals. The closest thing that I've been able to find is www.va.gov/ofcadmin/docs/CATB.pdf, as it has all the information I'm looking for:
Region
Address
City in a separate column
Zip Code in a separate column
State
Facility # (also known as StationID)
VISN
Symbol
I've tried exporting that PDF to CSV, but it's a complete nightmare to get working. So I was curious if anyone had any ideas or insights into how I could accomplish this task.
First, here's a CSV file containing the data found in CATB.pdf. The very first line contains the column headers, and the rest of the file contains the contents.
http://tmp.alexloney.com/CATB.csv
Now, for the more detailed explanation... I took the PDF you provided a link to, converted it to an HTML document using Adobe Acrobat, and then used a lot of regular expressions to parse the file and clean it up. Once the file was cleaned up enough, I was able to write a program to parse the remainder of the file, grab the state and region, and spit it all out as a nicely formatted CSV.
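For anyone who wants to reproduce that pipeline, here is a very rough Python sketch of the clean-and-parse step. The file names and the field pattern are purely illustrative and would have to be adapted to whatever the Acrobat export actually looks like.
import csv
import re
from pathlib import Path
html = Path("CATB.html").read_text(encoding="utf-8", errors="replace")
text = re.sub(r"<[^>]+>", " ", html)                      # strip the markup
lines = [re.sub(r"\s+", " ", ln).strip() for ln in text.splitlines()]
rows = []
for ln in lines:
    # Illustrative pattern only: "<address and city> <STATE> <zip> <facility #>"
    m = re.search(r"^(.+?)\s+([A-Z]{2})\s+(\d{5})(?:-\d{4})?\s+(\d{3,5})$", ln)
    if m:
        rows.append(m.groups())
with open("CATB.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["address_and_city", "state", "zip", "facility"])
    writer.writerows(rows)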
Hope that helps you!
I believe that PDFill has an option that will convert a PDF file to Excel. Once in Excel, you should have no problem converting to a CSV file.

Localizable.strings woes

My Localizable.strings file has somehow been corrupted and I don't know how to restore it.
If I open it as a plain text file, it starts with weird characters that I can't copy here.
If I leave the file alone, the app builds. If I make any changes, either the values aren't interpreted properly or I get an error at compile time.
Localizable.strings: Conversion of string failed. The string is empty.
Command /Developer/Library/Xcode/Plug-ins/CoreBuildTasks.xcplugin/Contents/Resources/copystrings failed with exit code 1
I suspect this is an encoding problem, but I don't know how it happened (maybe SVN is to blame?) nor how to solve it. Any tips will be much appreciated.
I have issues with the same file that sound very similar to yours. What happens for me is that Xcode doesn't know the correct file formatting. I often get this when rearranging the project and I remove and re-add this file to the Xcode project. When I re-add the file, its encoding gets set to something like Western Roman, which can't seem to render anything other than ASCII.
Here's what I do to fix the problem:
In Xcode select the Localizable.strings file in the Groups & Files panel.
Do a Get Info on that file.
On the info panel select the General tab.
In that tab go to the File Encoding and change its value.
The last step is where the trick lies, as you now have to guess the right encoding. I find that "Unicode (UTF-8)" works for most European languages, while for Asian languages "Unicode (UTF-16/32)" are the ones to try.
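If guessing in the Xcode UI gets tedious, one way to narrow it down is to try the candidate encodings programmatically. Here is a minimal Python sketch (the path is assumed to point at your copy of the file, and a clean decode is only a hint, not proof):
from pathlib import Path
raw = Path("Localizable.strings").read_bytes()
for enc in ("utf-8", "utf-16", "utf-16-le", "utf-16-be", "mac_roman"):
    try:
        text = raw.decode(enc)
    except UnicodeDecodeError:
        print(enc, "fails")
    else:
        # Single-byte encodings such as mac_roman almost never fail, so treat "ok" as a hint only.
        print(enc, "ok,", len(text.splitlines()), "lines")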
I just had that error because I forgot a semicolon. Took me a while to figure it out. Seems like a really ambiguous compiler error but the fix was simple.
Make sure, in File > Get Info, that UTF-16 is selected. If the encoding is set to None or UTF-8, you need to change it. If your characters have spaces between them, choose to "reinterpret" the file as UTF-16. If there are weird characters in the file, you need to remove them.
Apart from the UTF-8 problem, you sometimes still have to check the content for syntax problems.
Use the following regular expression to verify your text line by line; if any line does not match, there must be a problem.
"(.+?)"="(.+?)";
You can use the plutil command line tool. Without options or with the -lint option, it checks the syntax of the file given as argument. It will tell you more precisely where the error is.
This happens to me when there is a missing quote or something else not right with the file. Most commonly, since my language files are done by another team member, he tends to forget a quote or something. Usually Xcode shows an error on that line; sometimes it doesn't and just throws a "Corrupted data" error.
Double-check that all your strings are properly closed in quotes.
Open the file in Xcode.
Right-click it in the Project Navigator.
Select Open as -> ASCII Property List