How can I check if a character is allowed to be uploaded in Teradata?

Recently I was uploading (using JDBC) a .csv file that contained some weird SUB characters, and the upload failed. Later I found out that those weird characters were the older version of the end-of-file marker. So, where can I get a list of all allowed characters so that I could pre-clean my csv files and be sure that they get uploaded?
Thanks

Found this in the Teradata International Character Set Support documentation; it explains why you are encountering the error with the SUB characters in your file, and I believe it is what the other user linked to in his/her answer.
The characters 0x1A in LATIN/KANJI1/KANJISJIS and U+FFFD in UNICODE/GRAPHIC are used internally by Teradata as the error character; therefore, they are unusable as user data. The user cannot store or retrieve these values through Teradata.
The list of supported UNICODE characters can be found here: UNICODE Server Character Set (direct download of text file)
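As a pre-cleaning step before the JDBC upload, something like this rough Python sketch strips those two reserved code points from the csv (the file names are placeholders, and it assumes the csv is UTF-8):
# Remove the characters Teradata reserves internally as error characters:
# 0x1A (SUB) for LATIN/KANJI1/KANJISJIS and U+FFFD for UNICODE/GRAPHIC.
BANNED = {"\x1a", "\ufffd"}

with open("input.csv", encoding="utf-8") as src, \
        open("clean.csv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write("".join(ch for ch in line if ch not in BANNED))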

@Alex You could try looking here? :)
http://www.info.teradata.com/templates/eSrchResults.cfm?txtrelno=&prodline=all&frmdt=&srtord=Asc&todt=&txtsrchstring=character&rdsort=Title&txtpid=
Update! The link leads to multiple lists of Teradata's supported characters.

Related


PostgreSQL, DBvisualizer and Salesforce
I'm selecting records from a database table and exporting them to a csv file: comma-separated and UTF-8 encoded. I send the file to a user who is uploading the data into Salesforce. I do not know Salesforce, so I'm totally ignorant on that side of this. She is reporting that some data in the file is showing up as gibberish (non-UTF-8) characters (see below).
It seems that some of our users are copy/pasting emails into a web form which then inserts them into our db. Dates from the email headers (I believe) are the text that is showing up as gibberish.
11‎/‎17‎/‎2015‎ ‎7‎:‎26‎:‎26‎ ‎AM
becomes
‎11‎/‎16‎/‎2015‎ ‎07‎:‎26‎:‎26‎ ‎AM
The text in the db field looks normal. It's only when it is exported to a csv file, and that file is viewed in a text editor like Wordpad or in Salesforce, that she sees the odd characters.
This only happens with dates from the text that is copy/pasted into the form/db. I have no idea how, or if there is a way, to remove these "unseen" characters.
It's the same three characters each time: ‎. I did a regex_replace() on these to strip them out, but it doesn't work. I think that since they are not visible in the db field, the regex doesn't see them.
It seems like even though I cannot see these characters, they must be there in some form that makes them show up in text editors like Wordpad or the Salesforce client after being exported to csv.
I can probably do a mass search/find/replace in the text editor, but it would be nice to do this in the sql and avoid the extra step each time.
Hoping someone has seen this and knows an easy fix.
Thanks for any ideas or pointers that may help.
The sequence ‎ is a left-to-right mark, encoded in UTF-8 (as 0xE2 0x80 0x8E), but being read as if it were in Windows-1252.
A left-to-right mark is invisible, so the fact that you can't see it in the database suggests that it's encoded correctly, but without knowing precisely what path the data took after that, it's hard to guess exactly where it was misinterpreted.
In any case, you should be able to replace the character in your Postgres query by using its Unicode escape sequence: E'\u200E'
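For example, here is a minimal sketch of that replacement run from Python with psycopg2; the connection string, table, and column names are made up for illustration, so adjust them to your schema:
import psycopg2

# Strip the invisible U+200E left-to-right marks at query time so the
# csv export never contains them. E'\u200E' is the Postgres escape form.
conn = psycopg2.connect("dbname=mydb")
with conn, conn.cursor() as cur:
    cur.execute("SELECT regexp_replace(body, E'\\u200E', '', 'g') FROM messages")
    rows = cur.fetchall()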

Why can't Google Dataprep handle the encoding in my log files?

We are receiving big log files each month. Before loading them into Google BigQuery they need to be converted from fixed width to delimited. I found a good article on how to do that in Google Dataprep. However, there seems to be something wrong with the encoding.
Each time a Swedish Character appears in the log file, the Split function seems to add another space. This messes up the rest of the columns, as can be seen in the attached screenshot.
I can't determine the correct encoding of the log files, but I know they are being created by pretty old Windows servers in Poland.
Can anyone advise on how to solve this challenge?
Screenshot of the issue in Google Dataprep.
What is the exact recipe you are using? Do you use (split every x)?
When I used an ISO Latin-1 text in a test case and ingested it as ISO 8859-1, the output was as expected and only the display was off.
Can you try the same?
Would it be possible to share an example input file with one or two rows?
As a workaround you can use a RegEx split, which should work.
It's unfortunately a bit more complex, because you would have to use multiple regex splits. Here's an example for the first two splits of 10 characters each: /.{10}/ and split on //
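Outside of Dataprep, the same fixed-width-to-delimited idea can be sketched in a few lines of Python; the 10-character field widths and the cp1250 codepage (a guess for old Polish Windows servers) are assumptions to adapt:
import re

# Hypothetical layout: three fixed-width fields of 10 characters each.
# Decoding with the right legacy codepage first keeps accented characters
# from shifting the column boundaries.
pattern = re.compile(r"(.{10})(.{10})(.{10})")

with open("log.txt", encoding="cp1250") as src, \
        open("log.csv", "w", encoding="utf-8") as dst:
    for line in src:
        m = pattern.match(line)
        if m:
            dst.write(",".join(field.strip() for field in m.groups()) + "\n")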

Uploading Amazon Inventory UTF 8 Encoding

I am trying to upload my English inventory to various European Amazon sites. The issue I am having is that the accents found in certain languages are not displaying correctly when an "inventory file" is uploaded to Amazon. The inventory file is a tab-delimited text file.
current setup:
$type = 'text/tab-separated-values; charset=utf-8';
header('Content-Type:'.$type);
header('Content-Disposition: attachment; filename="inventory-'.$_GET['cc'].'.txt"');
header('Content-Length: ' . strlen($data));
header('Content-Encoding: UTF-8');
When the text file is output and saved it looks exactly how it should when opened in Windows (all the characters are correct), but for some reason Amazon doesn't see it as UTF-8 and re-encodes it with all of the characters found here:
http://www.i18nqa.com/debug/utf8-debug.html
I have tried adding the BOM to the top of the file but this just results in Amazon giving an error. Has anyone else experienced this?
As @fvu pointed out in his comment, Amazon is expecting the ISO-8859-1 format, not UTF-8. That's why you should use PHP's utf8_decode method when writing to your file.
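In PHP that is a single utf8_decode($data) call on the data before it is output; the same conversion sketched in Python (file names here are just examples) would look like this:
# Re-encode the UTF-8 inventory file as ISO-8859-1 for Amazon.
# Characters with no Latin-1 equivalent are replaced rather than raising
# an error, which mirrors how utf8_decode behaves in PHP.
with open("inventory-utf8.txt", encoding="utf-8") as src:
    data = src.read()

with open("inventory-latin1.txt", "w", encoding="iso-8859-1", errors="replace") as dst:
    dst.write(data)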
OK, so after a lot of trying, it turns out that the characters needed to be decoded. I opened the text files in Excel and they seemed to encode themselves as weird characters like ü; using PHP's utf8_decode turned them back into the correct characters, EVEN THOUGH the text file showed them as the right characters... very confusing.
To anyone out there having difficulties with UTF-8: try decoding first.
thanks for your help

Input UTF-8 characters in Management Studio

Hi,
[background]
We currently build files for many different companies. Our job as a company is basically to sit in between other companies and help with communication and data storage. We have begun to run into encoding issues where we are receiving data encoded in one format but need to send it out in another. All files were previously built using the .NET framework default of UTF-8. However, we've discovered that certain companies cannot read UTF-8 files. I assume it is because they have older systems that require something else. This becomes apparent when sending certain French characters in particular.
I have a solution in place where we can build a specific file for a specific member using a specific encoding. (While I understand that this may not be enough, unfortunately this is as far as I can go at the moment due to other issues.)
[problem]
Anyways, I'm at the testing stage and I want to input UTF-8 or other characters into Management Studio, perform an update on some data, and then verify that the file is built correctly from that data. I realize that this is not perfect. I've already tried programmatically reading the file and verifying the encoding by reading preambles etc., so this is what I'm stuck with. According to this website http://www.biega.com/special-char.html I can input characters by pressing ALT+&+#+"decimal representation of character" or ALT+"decimal representation of character", but when I use the data specified by the table I get completely different characters in Management Studio. I've even saved the file in a UTF-8 format using Management Studio by clicking the arrow on the save button in the save dialog and specifying the encoding. So my question is: how can I accurately specify a character that will end up being the character I'm trying to input, and actually put it in the data that will then be put in a file?
Thanks,
Kevin
I eventually found the solution. The website doesn't specify that you need to type ALT+0+"decimal character representation". The zero was left out. I'd been searching for this for ages.
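Separately, for the programmatic preamble check mentioned in the question, a rough sketch like this can confirm which BOM, if any, the exported file starts with (the file name is just an example):
# Look at the first bytes of the file to see which BOM, if any, it starts with.
PREAMBLES = [
    (b"\xef\xbb\xbf", "UTF-8 with BOM"),
    (b"\xff\xfe", "UTF-16 little-endian"),
    (b"\xfe\xff", "UTF-16 big-endian"),
]

with open("export.txt", "rb") as f:
    head = f.read(4)

label = next((name for bom, name in PREAMBLES if head.startswith(bom)),
             "no BOM (plain ASCII/UTF-8 or a legacy codepage)")
print(label)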

Toad unicode input problem

In Toad, I can see Unicode characters that are coming from the Oracle DB. But when I click one of the fields in the data grid to enter edit mode, the Unicode characters are converted to meaningless symbols; that is not the big issue, though.
While editing this field, the Unicode characters are displayed correctly as I type. But as soon as I press Enter and exit edit mode, they are converted to the nearest (most similar) non-Unicode character. So I cannot type Unicode characters in data grids. Copy & pasting one of the Unicode characters also does not work.
How can I solve this?
Edit: I am using Toad 9.0.0.160.
We never found a solution for the same problems with Toad. In the end most people used Enterprise Manager to get around the issues. Sorry I couldn't be of more help.
Quest officially states that they currently do not fully support Unicode, but they promise a full Unicode version of Toad in 2009: http://www.quest.com/public-sector/UTF8-for-Toad-for-Oracle.aspx
An excerpt from the known issues with Toad 9.6:
Toad's data layer does not support UTF8 / Unicode data. Most non-ASCII characters will display as question marks in the data grid and should not produce any conversion errors except in Toad Reports. Toad Reports will produce errors and will not run on UTF8 / Unicode databases. It is therefore not advisable to edit non-ASCII Unicode data in Toad's data grids. Also, some users are still receiving "ORA-01026: multiple buffers of size > 4000 in the bind list" messages, which also seem to be related to Unicode data.