Informatica PowerCenter data truncation/overflow error - Unicode

I'm trying to do a simple data load with an IPC mapping from one Oracle DB to another.
The source table structure is as follows:
ID NUMBER;
C_VALUE VARCHAR2 (16);
C_CODE VARCHAR2 (16);
SN NUMBER;
SU NUMBER;
The target table structure is as follows:
ID NUMBER
C_VALUE VARCHAR2 (20)
SSID NUMBER
LOADID NUMBER
LOADROWNUMBER NUMBER
DATEBEGIN DATE
DATEEND DATE
When I run the workflow, I get the following error:
8340||Error: Target table [TYPE_ACC_RRB] data truncation/overflow error.
When I debug the mapping, I can see that the input string in the c_value field is represented as Unicode characters and its length in bytes is doubled.
Does Informatica count characters or bytes as the length of its string fields?
How can I make it count characters, not bytes?
What I see from the session log is:
Server Mode: [UNICODE]
Server Code page: [UTF-8 encoding of Unicode]
The session sort order is [Binary].
Source database connection [RBO01] code page: [MS Windows Cyrillic (Slavic)]
Target database connection [STG1] code page: [MS Windows Cyrillic (Slavic)]
My mapping: (screenshot not preserved)

The solution was to define the environment variable NLS_LANG=russian_russia.cl8mswin1251 on the IPC server.
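For context on why the length doubles: Cyrillic characters take one byte each in CP1251 (the MS Windows Cyrillic code page of the source and target connections) but two bytes each in UTF-8 (the server code page). A minimal Python illustration (the sample string is made up):

# The same Cyrillic string measured in characters vs. bytes.
s = "Привет"                       # 6 characters

print(len(s))                      # 6  characters
print(len(s.encode("cp1251")))     # 6  bytes in CP1251 (MS Windows Cyrillic)
print(len(s.encode("utf-8")))      # 12 bytes in UTF-8, the doubled length seen while debugging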

Related

LONG VARCHAR - Read from table to Front (C#) then INSERT / UPDATE value to a table

I'm reading table info from a SqlBase DB using a DataAdapter.Fill in C#. It works perfectly for every variable type except LONG VARCHAR; in that case it converts to a String in C#, and if I add a watch on the object variable I see some weird chars in it, so later when I try to insert/update another table (in another database) it fails.
I know that even if the value were OK in C# I couldn't insert it as-is; the documentation says I should bind the value to a variable to be able to insert it into a table, but I'm not sure how to do that, since I'm creating the scripts in C# to be run in SqlBase rather than taking direct action from C#. Even if I could, I'm not able to read the value correctly, since it converts to a string with weird digits in it. Is this LONG VARCHAR like a VARBINARY in SQL Server? I assume so, because the column I have problems with is a LOGO, i.e. a picture.
So in short, is there any way to:
1. read a LONG VARCHAR from .NET, and then...
2. ...use it when inserting/updating values in a table?
(1) is .NET, but (2) is a SQL script to be run on SqlBase using SQLTalk.
Thanks!
Suggest you UNLOAD the Long data to a flat file using the SQLTalk UNLOAD command; that way you'll get readable data. Read the flat file using C# if you need to and do whatever you want with it, but to re-load the data into another table using SQLTalk you need to use specific syntax. Go here: SQLBase Manuals (all versions), extract the manual appropriate to the version of SQLBase you are using, and 1) read up on UNLOAD in the 'SQLBase Language Reference' to get the Long data out into a flat file (there are different syntaxes giving different results), then 2) read up on 'Examples of Bind Variables for Long data' in the 'SQLTalk Command Reference', as you have to set LONG VARCHAR data via bind variables.
When inserting long data into a LONG VARCHAR, LONG NVARCHAR, or LONG BINARY column, precede it with the $LONG keyword. You can then start entering data on the next line and continue on successive lines. To mark the end of the text, enter a double slash (//) on a new line. E.g.:
INSERT INTO BIO (NAME, BIO) VALUES (:1,:2)
\
SHAKESPEARE, $LONG
William Shakespeare was born in Stratford-on-Avon on
April 16, 1564. He was England's most famous poet and
dramatist. ...
He died in 1616, leaving his second best bed to his wife.
//
If the data for the LONG (N)VARCHAR or LONG VARBINARY column comes from a file, enter the name of the file after the $LONG keyword. E.g.:
INSERT INTO BIO (NAME, BIO) VALUES (:1,:2)
\
SHAKESPEARE, $LONG shakes.txt
JONSON,$LONG jonson.txt
O'NEILL,$LONG oneill.txt
/
To update Long data, e.g.:
UPDATE TABLE EXPENSES SET COMMENTS = :1 WHERE DATE = :2
\
"Beltran Tree Service", 1/1/94 "Hercules", 1/2/94
"Checkup", 1/3/94
/
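If you're generating the SQLTalk script from code, one possible pattern (a sketch only; the table, column, file, and script names are hypothetical) is to write each long value to its own flat file and emit an INSERT that references those files with $LONG, mirroring the file-based example above:

# Sketch: emit a SQLTalk script that loads long data from flat files via $LONG.
rows = [("SHAKESPEARE", "shakes.txt"), ("JONSON", "jonson.txt")]

with open("load_bio.sql", "w") as script:
    script.write("INSERT INTO BIO (NAME, BIO) VALUES (:1,:2)\n")
    script.write("\\\n")                        # backslash introduces bind data
    for name, long_file in rows:
        script.write(f"{name},$LONG {long_file}\n")
    script.write("/\n")                         # slash executes the statement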

BigQuery - create table via UI from cloud storage results in integer error

I am trying to test out BigQuery but am getting stuck on creating a table from data stored in Google Cloud Storage. I was able to reduce the data down to just one value, but the error still doesn't make sense.
I have a text file I uploaded to Google Cloud Storage with just one integer value in it: 177790884
I am trying to create a table via the BigQuery web UI, going through the wizard. When I get to the schema definition section, I enter...
ID:INTEGER
The load always fails with...
Errors:
File: 0 / Line:1 / Field:1: Invalid argument: 177790884 (error code: invalid)
Too many errors encountered. Limit is: 0. (error code: invalid)
Job ID trusty-hangar-120519:job_LREZ5lA8QNdGoG2usU4Q1jeMvvU
Start Time Jan 30, 2016, 12:43:31 AM
End Time Jan 30, 2016, 12:43:34 AM
Destination Table trusty-hangar-120519:.onevalue
Source Format CSV
Allow Jagged Rows true
Ignore Unknown Values true
Source URI gs:///onevalue.txt
Schema
ID: INTEGER
If I load with a schema of ID:STRING it works fine. The number 177790884 is not larger than a 64-bit signed int; I am really unsure what is going on.
Thanks,
Craig
Your input file likely contains a UTF-8 byte order mark (3 "invisible" bytes at the beginning of the file that indicate the encoding), which can cause BigQuery's CSV parser to fail.
https://en.wikipedia.org/wiki/Byte_order_mark
I'd suggest Googling for a platform-specific method to view and remove the byte order mark. (A hex editor would do.)
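One possible way to do this programmatically (a small Python sketch; the file name is taken from the question, and the file is assumed to fit in memory):

# Sketch: strip a UTF-8 byte order mark (EF BB BF) from the start of a file.
import codecs

with open("onevalue.txt", "rb") as f:
    data = f.read()

if data.startswith(codecs.BOM_UTF8):              # b'\xef\xbb\xbf'
    with open("onevalue.txt", "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])      # rewrite without the BOM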
The issue is definitely with the file's encoding. I was able to reproduce the error,
and then "fixed" it by saving the "problematic" file as ANSI (just for a test); after that it loaded successfully.

How can I change character code from Shift-JIS to UTF-8 when I copy data from DB2 to Postgres?

I'm trying to migrate data from DB2 to Postgres using Pentaho ETL.
The character encoding on DB2 is Shift-JIS (a Japanese-specific encoding) and on Postgres it is UTF-8.
I was able to migrate the data from DB2 to Postgres successfully, but the Japanese characters were not transformed properly (they were changed to strange characters...).
How can I convert the encoding from Shift-JIS to UTF-8 when I transfer the data?
It was a bit of a tough problem for me, but I finally solved it.
First, you need to choose the "Modified Java Script Value" step and write a script like the one below.
(I'm assuming that the value coming from the table is column1 and the new value is value1.)
Here is an example of the source code. (You can convert multiple values this way if you need to.)
var value1 = new Packages.java.lang.String(
    new Packages.java.lang.String(column1).getBytes("ISO8859_1"),
    "Shift-JIS").replaceAll(" ", "");
// you don't need to use replaceAll() if you don't need to trim the string
Finally, click "Get variables" and the value will be shown in the fields table below.
Then you can use "value1" in the next step, and it will have been converted to the correct encoding (the one you specified).
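The trick in the script above is to take the bytes that were mis-decoded as ISO-8859-1 and re-decode them as Shift-JIS. The same idea as a standalone Python sketch (the sample string is made up for illustration):

# Sketch: repair a string whose Shift-JIS bytes were mis-decoded as Latin-1.
original = "日本語".encode("shift_jis")     # raw Shift-JIS bytes, as stored in DB2
garbled  = original.decode("latin-1")       # the mojibake seen after migration

# Latin-1 round-trips all 256 byte values, so the raw bytes can be recovered
# and then decoded with the correct encoding.
repaired = garbled.encode("latin-1").decode("shift_jis")
print(repaired)                             # 日本語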

String contains invalid or unsupported UTF8 codepoints. Bad UTF8 hex sequence:

Team,
I am using Redshift version 8.0.2. While loading data using the COPY command, I get the error: "String contains invalid or unsupported UTF8 codepoints, Bad UTF8 hex sequence: bf (error 3)".
It seems COPY is trying to load the UTF-8 byte "bf" into a VARCHAR field. Per Amazon Redshift, error code 3 is defined as below:
Error code 3:
The UTF-8 single-byte character is out of range. The starting byte must not be 254, 255, or any character between 128 and 191 (inclusive).
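In other words, 0xbf is a continuation byte in UTF-8 and is never valid on its own, which a quick Python check (illustrative only) confirms:

# A lone 0xbf byte is a UTF-8 continuation byte and cannot start a character.
try:
    bytes([0xbf]).decode("utf-8")
except UnicodeDecodeError as e:
    print(e)   # 'utf-8' codec can't decode byte 0xbf in position 0: invalid start byte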
Amazon recommends this as the solution: replace the character with a valid UTF-8 code sequence, or remove the character.
Could you please help me with how to replace the character with a valid UTF-8 code?
When I checked the database properties in pgAdmin, it shows the encoding as UTF-8.
Please guide me on how to replace the character in the input delimited file.
Thanks...
I've run into this issue in Redshift while loading TPC-DS datasets for experiments.
Here is the documentation and forum chatter I found via AWS: https://forums.aws.amazon.com/ann.jspa?annID=2090
And here are the explicit commands you can use to solve data conversion errors: http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html#copy-acceptinvchars
You can explicitly replace the invalid UTF-8 characters, or disregard them altogether during the COPY phase, by specifying ACCEPTINVCHARS.
Try this:
copy table from 's3://my-bucket/my-path'
credentials 'aws_iam_role=<your role arn>'
ACCEPTINVCHARS
delimiter '|' region 'us-region-1';
Warnings:
Load into table 'table' completed, 500000 record(s) loaded successfully.
Load into table 'table' completed, 4510 record(s) were loaded with replacements made for ACCEPTINVCHARS. Check 'stl_replacements' system table for details.
0 rows affected
COPY executed successfully
Execution time: 33.51s
Sounds like the encoding of your file might not be UTF-8. You might try this technique that we use sometimes:
cat myfile.tsv | iconv -c -f ISO-8859-1 -t utf8 > myfile_utf8.tsv
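The same conversion can be done in code; here is a rough Python equivalent (a sketch only; the file names mirror the iconv example, and errors="ignore" stands in for iconv's -c flag):

# Sketch: re-encode an ISO-8859-1 file as UTF-8, mirroring
#   cat myfile.tsv | iconv -c -f ISO-8859-1 -t utf8 > myfile_utf8.tsv
# Latin-1 decodes every byte value, so errors="ignore" only matters
# if you swap in a stricter source encoding.
with open("myfile.tsv", "r", encoding="iso-8859-1", errors="ignore") as src, \
     open("myfile_utf8.tsv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line)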
For many people loading CSVs into databases, the files come from someone using Excel, or they have access to Excel themselves. If so, this problem is quickly solved by either:
1. Saving the file out of Excel using Save As and selecting the CSV UTF-8 (Comma delimited) (*.csv) format, and requesting/training those giving you the files to use this export format. Note that many people export to CSV using the default CSV (Comma delimited) (*.csv) format, and there is a difference.
2. Loading the CSV into Excel and then immediately saving it back out in the UTF-8 CSV format.
Of course this wouldn't work for files unusable by Excel, i.e. larger than 1 million rows, etc. In that case I would use the iconv suggestion by mike_pdb.
I noticed that an Athena external table is able to parse data that the Redshift COPY command is unable to load. We can use the alternative approach below when encountering: String contains invalid or unsupported UTF8 codepoints. Bad UTF8 hex sequence: 8b (error 3).
Follow the steps below if you want to load the data into Redshift database db2, table table2.
1. Have a Glue crawler IAM role ready which has access to S3.
2. Run the crawler.
3. Validate the table and database created in Athena by the Glue crawler, say external database db1_ext, table table1_ext.
4. Log in to Redshift and link it to the Glue Catalog by creating a Redshift external schema (db1_schema) using the command below.
CREATE EXTERNAL SCHEMA db1_schema
FROM DATA CATALOG
DATABASE 'db1_ext'
IAM_ROLE 'arn:aws:iam:::role/my-redshift-cluster-role';
Load from the external table:
INSERT INTO db2.table2 (SELECT * FROM db1_schema.table1_ext);

UTF-8 - Oracle issue

I set my NLS_LANG variable to 'AMERICAN_AMERICA.AL32UTF8' in the Perl file that connects to Oracle and tries to insert the data.
However, when I insert a record with one value containing the 'ñ' character, the SQL fails.
But if I use 'Ñ' it inserts just fine.
What am I doing wrong here?
Additional info:
If I change my NLS_LANG to 'AMERICAN_AMERICA.UTF8' I can insert 'ñ' just fine...
What does it fail with?
Generally, if there is a problem in character conversion, it fails quietly (e.g. recording a character with an inappropriate translation). Sometimes you get an error indicating that the column isn't large enough; this typically happens when trying to store, for example, a character that takes up two or three bytes in a column that only allows one byte.
The first step is to confirm the database settings:
select * from V$NLS_PARAMETERS where parameter like '%CHARACTERSET%';
Then check the byte composition of the strings with:
select dump('ñ',16), dump('Ñ',16) from dual;
The first query gives me:
1 NLS_CHARACTERSET AL32UTF8
2 NLS_NCHAR_CHARACTERSET AL16UTF16
The second query gives me:
1 Typ=96 Len=2: c3,b1 Typ=96 Len=2: c3,91
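Those dumps match what you'd expect for UTF-8, where both 'ñ' and 'Ñ' are two-byte characters. A quick Python check (illustrative only) shows the same bytes:

# Both characters encode to two bytes in UTF-8, matching the DUMP() output above.
print("ñ".encode("utf-8").hex(","))   # c3,b1
print("Ñ".encode("utf-8").hex(","))   # c3,91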
My exact DB and Perl settings are listed in this question:
https://stackoverflow.com/questions/3016128/dbdoracle-and-utf8-issue