Converting Danish special characters

Converting Danish special characters - matlab

I'm trying to read a .csv file for my script, but the script contains the special danish characters "æ","ø" and "å". So far my idea was to define it like this:
CellM = readtable(csvdata,'Format','%æ%ø%å%s');
but it's not working.

You can specify the file encoding when you call readtable:
readtable(..., 'Encoding', 'UTF-8');
This should solve your problem, assuming you didn't use another function on csvdata, earlier, which already messed your data up.

Related

Winjs, error reading file with FileIO.readTextAsync

I am reading a .json file from disk using Windows.Storage.FileIO.readTextAsync.
All is fine until I put some non english letters in the file, like Æ Å Ø
The error I get is (rough translation from Danish language):
WinRT: No mapping for the Unicode character exists in the target multi-byte code page.
any idea how to read those chars in WinJs?

I found the problem.
when I created the file manually with notepad I set it to type ANSII instead of utf8.
I reopened the file -> save as and the changed the type and overwrote it.

You may be able to solve this by changing the encoding from the default (Utf8) to Utf16. The readTextAsync method accepts a second parameter which is a UnicodeEncoding flag:
Windows.Storage.FileIO.readTextAsync(
file,
Windows.Storage.Streams.UnicodeEncoding.utf16LE
).done( ... );
Or if you need to, you can use utf16BE flag (see link above).

formatting text in a csv export

I'm having trouble with a .csv export which is being uploaded to a website. There are must be some hidden or illegal characters in a description field I have in the database. I'm having a tough time getting the text to format correctly and not break a php script.
If I use the GetAs(css) function in a calculation, the text works fine. Obviously this won't work as a working file but it at least validates there's something in the formatting of the description field that's breaking the export. I did use the excel clean(text) calculation and that fixes the issue as well. Just need to find a way in Filemaker to do this.
Any suggestions?? Maybe a custom function that strips out bad characters?

You can filter invalid characters out of text using the filter function. If you only want a minimal set of ASCII characters, use it like
filter(mytable::myfield; "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789.!?")

Strange character rendered correctly in notepad, but as a control character elsewhere

I have a .csv list of businesses. The file has some strange characters in. For example, in this field: Stocktonon-Tees, the first hyphen, between Stockton and on seems to be a character with the value 6 rather than a hyphen, with the value 45. Stack overflow will probably sanatize this so you can't see it, so here is a pastebin:
http://pastebin.com/NuyyaQy9
Can anyone explain why this could be? Is it some encoding issue that I have missed? Or a corruption in the dataset?

Yes, it's almost certainly an encoding issue. A file just consists of binary data - it's how you interpret that binary data that matters. It sounds like Notepad is guessing at the originally-intended encoding, but whatever else you're using isn't.
Unfortunately you haven't said anything about what software is trying to read the file or what wrote it in the first place - but you should look at what encoding Notepad thinks it is, and work from there.
If it's your code that wrote the file out, and you get to decide the encoding, I'd recommend UTF-8 as a good general purpose, platform-portable encoding.

problem while parsing the CDATA

<text><![CDATA[øCu·l es tu principal reto, objetivo o problema?]]></text>
while parsing the above tag, its crashing.
how to parse the CDATA
the same line is appearing in windows like this...
<text><![CDATA[¿Cuál es tu principal reto, objetivo o problema?]]></text>
due to the special chars the parser is crashing.
why they are converted into special chars in Mac..?
how to solve this?

Well for one, the string as you post it here looks like something has gone wrong with the encoding. "ø" is not a Spanish character.
What xml parser are you using? I would guess that somewhere in that string is a character, possibly hidden, or maybe it's "ø" which makes your parser crash.
Edit (in response to the OP's comment)
I will try to guess what is happening and hope you can use my guess to resolve what is actually happening. So when you created the xml file you used some editor. This editor used a particular encoding. This means that it transferred the characters on your screen into bytes on your disk using a particular mapping from character into bytes (it encoded the characters into bytes). There are many different encodings, one common encoding is called Latin-1. So let's assume the file was encoded using Latin-1. After creating it, you transferred the file onto another machine where you opened it in a different editor. Now, how does the new editor know the encoding of the file? The answer is that it probably tried to guess the encoding. Now here is where the problem arises: it guessed wrong and interpreted the bytes using an encoding other than Latin-1.
While you have your (garbled) file open in an editor try selecting different encodings from the menu. The one that displays all your special characters correctly is likely to be the one used when the file was created.
Edit 2
But my other question remains: what xml parser are you using?
Edit 3
Ok, so now when you write "crashing", do you actually mean crashing or does it just return? Do you get an error message? If yes, what? Can you do the following:
Remove the funny characters from this line and run your code on the following:
<text><![CDATA[l es tu principal reto, objetivo o problema?]]></text>
Does it still crash?

Apostrophe issue in RTF

I have a function within a custom CRM web application (old VB.Net circa 2003) that takes a set of fields from a database and merges them with palceholders in a set of RTF based template documents. These generate merged letters and documentation. The code essentially loops through each line of the RTF template file and replaces any instances of the placeholder values with text from a database record. The issue I'm having is that users have pasted a certain type of apostrophe into the web app (and therefore into the database) that is not rendering correctly in the resulting RTF file. It is rendering like this - â€™.
I need a way to spot this invalid apostrophe in the code and replace it with a valid one. Unfortunately when I paste the invalid apostrophe into the Visual Studio editor it gets converted into the correct one. So I need another way to express this invalid apostrophe's value. Unfortunately I do not know a great deal about unicode and other encodings so I'm calling out for help with this.
Any ideas?

If you really just want to figure out what the character is you might want to try and paste it into a text editor like ultraedit. It has a hex mode that you can flip to to see the actual underlying bytes.
In order to do the replace once you've figured out the character you'd do something like this in Vb,
text.Replace(ChrW(2001), "'")
Note that you might not be able to figure it out easily using the text editor because it might also get mangled by paste from the clipboard. You might want to either print some debug of the ascii values from code. You can use the AscW function to do that.
I can't help but think that it may actually simply be a case of specifying the correct encoding to use when you write out the stream though. Assuming you're using a StreamWriter you can specify it on the constructor. I'm guessing you actually want ASCII given your requirement.
oWriter = New System.IO.StreamWriter(path, False, System.Text.Encoding.ASCII)

It looks like you probably want to encode characters out of the 8 bit range (>255).
You can do that using \uNNNN according to the wikipedia article.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse