formatting text in a csv export - filemaker

I'm having trouble with a .csv export which is being uploaded to a website. There must be some hidden or illegal characters in a description field I have in the database. I'm having a tough time getting the text to format correctly without breaking a PHP script.
If I use the GetAsCSS function in a calculation, the text works fine. Obviously that won't do for a working file, but it at least confirms that something in the formatting of the description field is breaking the export. Running the text through Excel's CLEAN(text) function fixes the issue as well. I just need to find a way to do this in FileMaker.
Any suggestions?? Maybe a custom function that strips out bad characters?

You can strip invalid characters out of text with the Filter function, which keeps only the characters you list in its second argument. If you only want a minimal set of ASCII characters, use it like this:
filter(mytable::myfield; "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789.!?")
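For readers who would rather clean the file after export, the same allow-list idea can be sketched in Python. This is only an illustration of the technique, not FileMaker code; the helper name filter_text and the sample string are made up for the example.

```python
import string

# Allow-list mirroring the Filter() call above: letters, digits, space, and . ! ?
ALLOWED = set(string.ascii_letters + string.digits + " .!?")


def filter_text(text, allowed=ALLOWED):
    """Keep only allow-listed characters, preserving their original order."""
    return "".join(ch for ch in text if ch in allowed)


# Hypothetical sample containing an accented letter, a vertical tab,
# an invisible left-to-right mark, and some punctuation.
sample = "Caf\u00e9 menu:\x0b 2 items\u200e, OK?"
print(filter_text(sample))
```

Note that an allow-list this small also drops accented letters, commas, and colons, so extend it to cover whatever the description field legitimately contains.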

Related

PostgreSQL Escape Microsoft Special Characters In Select Query

PostgreSQL, DBvisualizer and Salesforce
I'm selecting records from a database table and exporting them to a csv file: comma-separated and UTF-8 encoded. I send the file to a user who uploads the data into Salesforce. I do not know Salesforce, so I'm totally ignorant on that side of this. She is reporting that some data in the file is showing up as gibberish (non-UTF-8) characters (see below).
It seems that some of our users are copy/pasting emails into a web form, which then inserts them into our db. The text showing up as gibberish is (I believe) the dates from the email headers.
11‎/‎17‎/‎2015‎ ‎7‎:‎26‎:‎26‎ ‎AM
becomes
â€Ž11â€Ž/â€Ž16â€Ž/â€Ž2015â€Ž â€Ž07â€Ž:â€Ž26â€Ž:â€Ž26â€Ž â€ŽAM
The text in the db field looks normal. The odd characters only appear after it is exported to a csv file and that file is viewed in a text editor like Wordpad or in Salesforce.
This only happens with dates from text that was copy/pasted into the form/db. I have no idea how to remove these "unseen" characters, or whether there is a way to.
It's the same three characters each time: â€Ž I did a regexp_replace() on these to strip them out, but it doesn't work. I think that since they are not seen in the db field, the regex does not see them either.
Even though I cannot see these characters, they must be there in some form that makes them show up in text editors like Wordpad or the Salesforce client after being exported to csv.
I can probably do a mass search/find/replace in the text editor, but it would be nice to do this in the SQL and avoid the extra step each time.
Hoping someone has seen this and knows an easy fix.
Thanks for any ideas or pointers that may help.
The sequence â€Ž is a left-to-right mark (U+200E), encoded in UTF-8 (as 0xE2 0x80 0x8E), but being read as if it were in Windows-1252.
A left-to-right mark is invisible, so the fact that you can't see it in the database suggests that it's encoded correctly, but without knowing precisely what path the data took after that, it's hard to guess exactly where it was misinterpreted.
In any case, you should be able to replace the character in your Postgres query by using its Unicode escape sequence: E'\u200E'
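The misreading described above can be reproduced in a few lines of Python, which also shows one way to strip the mark if the cleanup has to happen outside the database. This is a sketch, not part of the Postgres answer; strip_lrm is a hypothetical helper name.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, invisible when rendered

# Encode correctly as UTF-8, then decode with the wrong code page (Windows-1252).
utf8_bytes = LRM.encode("utf-8")
mojibake = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex())  # the three bytes of the UTF-8 encoding
print(mojibake)          # the three visible characters the user reported


def strip_lrm(text):
    """Remove every left-to-right mark from a correctly decoded string."""
    return text.replace(LRM, "")


print(strip_lrm("11\u200e/\u200e17\u200e/\u200e2015"))
```

The replacement has to target the single character U+200E in correctly decoded text. Searching the database for the three visible mojibake characters finds nothing, because they only exist after the bytes have been decoded with the wrong code page.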

Forcing a specific character encoding for an Excel doc in Python

I'm making Excel documents in Python, using PyExcel for now.
My specific problem is that when people using Excel open one of these documents, it opens in a weird character encoding, when I want it to decode in UTF-8.
Is there a way using PyExcel that I can give Excel a hint of what decoding I want to use? And if not PyExcel, something in Python? I'd rather not have to write a note to the users telling them to change their settings.

MS Access Convert from Unicode when Reading from Text File

So, I have an Access database where I import data from a text file. The file is semi-colon delimited. Occasionally (and will become more frequent) I receive a file from one of our affiliates from Russia. The file has unicode (I think) characters like "Ìèðîøíèêîâ" instead of "Мирошников". Ultimately, I'd like to translate those into English upon import, but for now, I'll accept the Russian characters.
How should I go about doing this? Currently, I'm reading each line of the file, using the Split function to separate each field on the ";" separator into an array, and sticking each array element into a table. Would changing the system keyboard layout to Russian before doing this work, or is it more complicated than that?
Does any of this make sense, or should I just bag it and go grab a beer (or some Vodka)?
Thanks!
You should be able to create an "Import Specification" that will tell Access how to convert the character data. Follow the procedure here...
Importing a text with separators using VBA
...and choose the appropriate character set from the "Code Page" combo box.
If you need to perform the imports from VBA code then you can save the specification (using the "Save As..." button) and then re-use that specification in a DoCmd.TransferText statement.
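To see why the code page choice matters, here is a small Python sketch. It assumes the affiliate's file is encoded as Windows-1251 (Cyrillic) and was read as Windows-1252 (Western European); that fits the sample in the question, but it is a guess about the real file. The function name repair_mojibake is made up for the example.

```python
def repair_mojibake(text, wrong="cp1252", right="cp1251"):
    """Undo a wrong decode: recover the original bytes, then decode them properly.

    Assumes the text was produced by decoding `right`-encoded bytes as `wrong`.
    """
    return text.encode(wrong).decode(right)


garbled = "Ìèðîøíèêîâ"
print(repair_mojibake(garbled))
```

Choosing the Cyrillic code page in the import specification has the same effect at import time, so no repair step is needed afterwards. The round trip can fail for the few byte values that Windows-1252 leaves undefined, which is another reason to decode correctly up front.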

Howto make JRequest::getVar filter correctly accented characters?

I want to filter some variables containing accented characters in a component for Joomla 1.5, for example:
$name = JRequest::getVar('name', '', 'post','WORD');
but the getVar function strips out áéíóú. I need these characters to come through intact for a Spanish-language form.
I'm new to Joomla development, but as far as I can see, it doesn't let me set any other parameter to configure this.
Is there a way to do this and keep the advantage of filtering with JRequest::getVar, or should I write a function myself that does so?
Do you mean JRequest::getVar() removes symbols like 'áéíóú'? That is very weird, because I've worked on Joomla with Danish and Hebrew characters, and they were passed through GET, POST, and SESSION successfully. Joomla works with UTF-8 and understands such characters. The problem could be in your file encoding: your files should be saved as UTF-8. Are they? If not, try changing it. This should help.

Apostrophe issue in RTF

I have a function within a custom CRM web application (old VB.Net circa 2003) that takes a set of fields from a database and merges them with placeholders in a set of RTF-based template documents. These generate merged letters and documentation. The code essentially loops through each line of the RTF template file and replaces any instances of the placeholder values with text from a database record. The issue I'm having is that users have pasted a certain type of apostrophe into the web app (and therefore into the database) that is not rendering correctly in the resulting RTF file. It is rendering like this - â€™.
I need a way to spot this invalid apostrophe in the code and replace it with a valid one. Unfortunately when I paste the invalid apostrophe into the Visual Studio editor it gets converted into the correct one. So I need another way to express this invalid apostrophe's value. Unfortunately I do not know a great deal about unicode and other encodings so I'm calling out for help with this.
Any ideas?
If you really just want to figure out what the character is, you might want to try pasting it into a text editor like UltraEdit. It has a hex mode that you can flip to in order to see the actual underlying bytes.
To do the replace once you've figured out the character, you'd do something like this in VB:
' U+2019 (right single quotation mark) is the usual pasted "smart" apostrophe; substitute the code you actually find
text = text.Replace(ChrW(&H2019), "'")
Note that you might not be able to figure it out easily using the text editor, because the character might also get mangled when pasted from the clipboard. Instead, you might want to print the character codes from code as debug output. You can use the AscW function to do that.
I can't help but think that it may simply be a case of specifying the correct encoding when you write out the stream, though. Assuming you're using a StreamWriter, you can specify it in the constructor. I'm guessing you actually want ASCII given your requirement.
oWriter = New System.IO.StreamWriter(path, False, System.Text.Encoding.ASCII)
It looks like you probably want to escape characters outside the 8-bit range (above 255).
You can do that in RTF using \uNNNN, according to the Wikipedia article.
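As a rough illustration of that escape, here is a minimal Python sketch (not the VB.Net code from the question). It relies on the RTF convention that \uN takes a signed 16-bit decimal number and is followed by a fallback character for readers that do not understand the control word. The function name rtf_escape and the choice of "?" as the fallback are assumptions made for the example.

```python
def rtf_escape(text):
    """Escape text for insertion into an RTF body, keeping the output pure ASCII."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN works on UTF-16 code units expressed as signed 16-bit decimals.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)  # "?" is the fallback character
    return "".join(out)


print(rtf_escape("It\u2019s here"))
```

Because every non-ASCII character becomes an ASCII escape, the merged file can then be written with an ASCII encoding without losing the apostrophe.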