String contains untranslatable characters - Informatica Teradata ODBC - Unicode

I'm trying to flow data from a staging table to a dimension table, and both tables are Unicode. When I load data from a file into my staging table, the data loads fine (Unicode). But when I load data into the dimension layer table (Unicode) from staging, the data is not inserted and the following error is thrown.
Note: I have changed the code page for the connection strings in Workflow Manager and in the ODBC Administrator, and the table also has a Unicode charset defined.
FnName: Execute -- [Teradata][ODBC Teradata Driver][Teradata Database] The string contains an untranslatable character.
LEGAL_ENTITY_NAME (LEGAL_ENTITY_NAME:UniChar.255:): "SCIENCES FARMAC->UTICA DO BRASIL LTDA"
EXTERNAL_VALUE (EXTERNAL_VALUE:UniChar.255:): "G_BR_501"
PARENT_LEGAL_ENTITY_NAME (PARENT_LEGAL_ENTITY_NAME:UniChar.255:): "GLOBAL (ALL COMPANIES)"
OTHER_PARENT_LEGAL_ENTITY_NAME1 (OTHER_PARENT_LEGAL_ENTITY_NAME1:UniChar.255:): "(NULL)"
OTHER_PARENT_LEGAL_ENTITY_NAME2 (OTHER_PARENT_LEGAL_ENTITY_NAME2:UniChar.255:): "(NULL)"
COUNTRY_CD (COUNTRY_CD:UniChar.10:): "BR
The original data in the staging table is shown below:
SCIENCES FARMACˆUTICA DO BRASIL LTDA,G_BR_501,(ALL COMPANIES),null,null,BR
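One way to see exactly which character the driver cannot translate is to inspect the staged value itself. Below is a purely illustrative Python sketch (the string is copied from the staging row above) that flags anything outside plain ASCII so it can be compared with what the target session code page supports:

value = "SCIENCES FARMACˆUTICA DO BRASIL LTDA"
for pos, ch in enumerate(value):
    if ord(ch) > 127:  # outside plain ASCII
        # print position, the character, its Unicode code point and its UTF-8 bytes
        print(pos, repr(ch), hex(ord(ch)), ch.encode("utf-8").hex())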
Help is very much appreciated

Related

Copy activity fails in Azure Data Factory - column delimiter issue?

I have a source CSV file which I am loading to a SQL DB using a copy activity. In the 45th row I have a cell with this kind of data, containing unwanted characters.
Atualmente, as solicitações de faturamento manual de serviços de mobilidade de clientes da Região
I tried loading the file. It throws an error at row 45 saying the row has a higher column count than expected. I tried removing the unwanted characters from this text, and then the copy activity executed. In the source, my delimiter is set to , by default. How can I handle this situation? The source CSV file is in UTF-8 format, and in the SQL DB I have set every column to varchar(max).
I reproduced this and got the same error when I had the same data in my 3rd row without any double quotes around the data.
If you want to use the default delimiter (,), then wrap the row values in double quotes (") so that commas inside a cell are not treated as delimiters.
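If the source file is produced by code you control, you can force the quoting when the file is written. A minimal, illustrative pandas sketch (names are placeholders), assuming the data is still in a DataFrame at the point it is written out:

import csv
import pandas as pd

# Hypothetical upstream step that writes the CSV consumed by the copy activity.
df = pd.DataFrame({"id": [45], "text": ["Atualmente, as solicitações de faturamento manual"]})
# Quote every field so embedded commas are not read as extra delimiters.
df.to_csv("source_quoted.csv", index=False, quoting=csv.QUOTE_ALL, encoding="utf-8")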
Target data after copy activity:

Azure Data Factory schema mapping not working with SQL sink

I have a simple pipeline that loads data from a csv file to an Azure SQL db.
I have added a data flow where I have ensured the whole schema matches the SQL table. I have a specific field which contains numbers with leading zeros. The data type in the source projection is set to string. The field is mapped to the SQL sink, showing as string data type. The field in SQL has the nvarchar(50) data type.
Once the pipeline is run, all the leading zeros are lost and the field appears to be treated as decimal:
Original data: 0012345
Inserted data: 12345.0
The CSV data shown in the data preview is showing correctly, however for some reason it loses its formatting during insert.
Any ideas how I can get it to insert correctly?
I reproduced this in my lab and was able to load the data as expected. Please see the repro details below.
Source file (CSV file):
Sink table (SQL table):
ADF:
Connect the data flow source to the CSV source file. As my file is in text format, all the source columns in the projection are strings.
Source data preview:
Connect sink to Azure SQL database to load the data to the destination table.
Data in Azure SQL database table.
Note: You can also add a derived column before the sink to convert the value to a string, since the sink data type is a string.
Thank you very much for your response.
As per your post, the data flow appears to be working correctly. I have finally discovered an issue with the transformation: I have an Azure Batch service which runs a Python script that does a basic transformation and saves the output to a CSV file.
Interestingly, when I preview the data in the dataflow, it looks as expected. However, the values stored in SQL are not.
For the sake of others having a similar issue: my existing Python script converted a float column to string type. Upon conversion, it retained one decimal place, and since all of my numbers are integers, they were ending up with .0.
The solution was to convert the values to integer and then to string:
df['col_name'] = df['col_name'].astype('Int64').astype('str')
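For anyone reading along, a small illustrative comparison (the column name is made up) of why the intermediate Int64 cast matters:

import pandas as pd

df = pd.DataFrame({"col_name": [12345.0, 67890.0]})
# Casting the float column straight to string keeps the trailing .0
print(df["col_name"].astype("str").tolist())                  # ['12345.0', '67890.0']
# Casting to the nullable integer type first drops the decimal part
print(df["col_name"].astype("Int64").astype("str").tolist())  # ['12345', '67890']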

Lift load date format issue from CSV file

We are migrating DB2 data to Db2 on Cloud. We are using the below lift CLI operations for the migration:
Extracting a database table to a CSV file using lift extract from the source database.
Then loading the extracted CSV file to Db2 on Cloud using lift load.
ISSUE:
We have created some tables using DDL on the target Db2 on Cloud which have some columns with the data type TIMESTAMP.
During the load operation (lift load), we are getting the below error:
"MESSAGE": "The field in row \"2\", column \"8\" which begins with
\"\"2018-08-08-04.35.58.597660\"\" does not match the user specified
DATEFORMAT, TIMEFORMAT, or TIMESTAMPFORMAT. The row will be
rejected.", "SQLCODE": "SQL3191W"
If you use db2 as a source database, then use either:
the following property during export (to export dates, times, timestamps as usual for db2 utilities - without double quotes):
source-database-type=db2
or the following property during load, if you have already exported timestamps surrounded by double quotes:
timestamp-format="YYYY-MM-DD-HH24.MI.SS.FFFFFF"
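As a quick sanity check, the rejected value does match that layout; here is an illustrative Python equivalent of the format string (lift itself only needs the property above):

from datetime import datetime

# "YYYY-MM-DD-HH24.MI.SS.FFFFFF" corresponds to this strptime pattern
ts = datetime.strptime("2018-08-08-04.35.58.597660", "%Y-%m-%d-%H.%M.%S.%f")
print(ts)  # 2018-08-08 04:35:58.597660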
If the data was extracted using lift extract, then you should definitely load the data with source-database-type=db2. Using this parameter will preconfigure all the necessary load details automatically.

JSON Data loading into Redshift Table

I am trying to load JSON data into a Redshift table. Below are the sample code, the table structure, and the JSON data.
I have gone through many posts on this site and on AWS; however, my issue is not yet resolved.
The JSON data is below; I copied it into test.json and uploaded it to S3.
{backslash: "a",newline: "ab",tab: "dd"}
The table structure is as below:
create table escapes (backslash varchar(25), newline varchar(35), tab varchar(35));
The copy command is as below:
copy escapes from 's3://dev/test.json'
credentials 'aws_access_key_id=******;aws_secret_access_key=$$$$$'
format as JSON 'auto';
However, it throws the below error:
Amazon Invalid operation: Load into table 'escapes' failed. Check 'stl_load_errors' system table for details.;
1 statement failed.
In the 'stl_load_errors' table, the error reason is given as "Invalid value."
It seems like the issue is with your JSON data: the keys need to be in double quotes. Ideally it should be:
{
"backslash": "a",
"newline": "ab",
"tab": "dd"
}
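A quick local check with Python's json module (purely illustrative) shows why the original file is rejected while the quoted version is fine:

import json

json.loads('{"backslash": "a", "newline": "ab", "tab": "dd"}')  # parses fine
json.loads('{backslash: "a",newline: "ab",tab: "dd"}')          # raises json.JSONDecodeError (unquoted keys)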
I hope this resolves your issue; if not, update your question and I can reattempt the answer.

String contains invalid or unsupported UTF8 codepoints. Bad UTF8 hex sequence:

Team,
I am using Redshift version 8.0.2. While loading data using the COPY command, I get the error: "String contains invalid or unsupported UTF8 codepoints, Bad UTF8 hex sequence: bf (error 3)".
It seems COPY is trying to load the UTF-8 byte "bf" into a VARCHAR field. As per Amazon Redshift, error code 3 is defined as below:
Error code 3: The UTF-8 single-byte character is out of range. The starting byte must not be 254, 255, or any character between 128 and 191 (inclusive).
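In other words, a byte in the 128-191 range is a UTF-8 continuation byte and cannot stand on its own, which is exactly what is being reported for bf. An illustrative Python check:

try:
    b"\xbf".decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xbf in position 0: invalid start byte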
Amazon recommends this as the solution: we need to replace the character with a valid UTF-8 code sequence or remove the character.
Could you please help me with how to replace the character with a valid UTF-8 code?
When I checked the database properties in pgAdmin, it shows the encoding as UTF-8.
Please guide me on how to replace the character in the input delimited file.
Thanks...
I've run into this issue in RedShift while loading TPC-DS datasets for experiments.
Here is the documentation and forum chatter I found via AWS: https://forums.aws.amazon.com/ann.jspa?annID=2090
And here are the explicit commands you can use to solve data conversion errors: http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html#copy-acceptinvchars
You can explicitly replace the invalid UTF-8 characters or disregard them altogether during the COPY phase by stating ACCEPTINVCHARS.
Try this:
copy table from 's3://my-bucket/my-path'
credentials 'aws_iam_role=<your role arn>'
ACCEPTINVCHARS
delimiter '|' region 'us-region-1';
Warnings:
Load into table 'table' completed, 500000 record(s) loaded successfully.
Load into table 'table' completed, 4510 record(s) were loaded with replacements made for ACCEPTINVCHARS. Check 'stl_replacements' system table for details.
0 rows affected
COPY executed successfully
Execution time: 33.51s
Sounds like the encoding of your file might not be UTF-8. You might try this technique that we use sometimes:
cat myfile.tsv | iconv -c -f ISO-8859-1 -t utf8 > myfile_utf8.tsv
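If you would rather do the same conversion in Python instead of iconv (file names are placeholders; this mirrors the -c -f ISO-8859-1 -t utf8 call above):

# Re-encode an ISO-8859-1 (Latin-1) file as UTF-8.
with open("myfile.tsv", encoding="ISO-8859-1") as src, \
     open("myfile_utf8.tsv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line)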
Many people loading CSVs into databases get their files from someone using Excel, or they have access to Excel themselves. If so, this problem is quickly solved by either:
First saving the file out of Excel using Save As and selecting the CSV UTF-8 (Comma delimited) (*.csv) format, by requesting/training those giving you the files to use this export format. Note that many people by default export to CSV using the CSV (Comma delimited) (*.csv) format, and there is a difference.
Or loading the CSV into Excel and then immediately saving it again in the UTF-8 CSV format.
Of course this wouldn't work for files unusable by Excel, i.e. larger than 1 million rows, etc. In that case I would use the iconv suggestion by mike_pdb.
I noticed that an Athena external table is able to parse data which the Redshift COPY command is unable to. We can use the below alternative approach when encountering "String contains invalid or unsupported UTF8 codepoints. Bad UTF8 hex sequence: 8b (error 3)".
Follow the steps below if you want to load data into Redshift database db2 and table table2.
Have a Glue crawler IAM role ready which has access to S3.
Run the crawler.
Validate the database and table created in Athena by the Glue crawler, say external database db1_ext and table table1_ext.
Log in to Redshift and link it to the Glue Catalog by creating a Redshift external schema (db1_schema) using the below command:
CREATE EXTERNAL SCHEMA db1_schema
FROM DATA CATALOG
DATABASE 'db1_ext'
IAM_ROLE 'arn:aws:iam:::role/my-redshift-cluster-role';
Load from the external table:
INSERT INTO db2.table2 (SELECT * FROM db1_schema.table1_ext)