"invalid input syntax for type json" - AWS DMS PostgreSQL to PostgreSQL migration

I'm running an AWS DMS task that migrates all data from a PostgreSQL 10.4 database to an RDS PostgreSQL 10.4 instance.
Tables that have a jsonb column fail to migrate: after the error, the whole table is suspended, even though the table contains only 449 rows.
I have set the following error policy, but the whole table is still suspended:
"DataErrorPolicy": "IGNORE_RECORD",
"DataTruncationErrorPolicy": "IGNORE_RECORD",
"DataErrorEscalationPolicy": "SUSPEND_TABLE",
"DataErrorEscalationCount": 1000,
My expectation is that the whole table should be transferred and that DMS should simply ignore a record if its JSON is invalid.
I don't understand why it reports 'invalid input syntax for type json'; I have checked the data and all the JSON values are valid.
After further debugging, I found that the error is treated as a TABLE error, but why? That is why the table is suspended, since TableErrorPolicy is 'SUSPEND_TABLE'.
Why is this error treated as a table error instead of a record error?
Is the jsonb column type not supported by DMS, and is that why I'm getting the error below?
Logs:
2020-09-01T12:10:04 I: Next table to load 'public'.'TEMP_TABLE' ID = 1, order = 0 (tasktablesmanager.c:1817)
2020-09-01T12:10:04 I: Start loading table 'public'.'TEMP_TABLE' (Id = 1) by subtask 1.
Start load timestamp 0005AE3F66381F0F (replicationtask_util.c:755)
2020-09-01T12:10:04 I: REPLICA IDENTITY information for table 'public'.'TEMP_TABLE': Query status='Success' Type='DEFAULT'
Description='Old values of the Primary Key columns (if any) will be captured.' (postgres_endpoint_unload.c:191)
2020-09-01T12:10:04 I: Unload finished for table 'public'.'TEMP_TABLE' (Id = 1). 449 rows sent. (streamcomponent.c:3485)
2020-09-01T12:10:04 I: Table 'public'.'TEMP_TABLE' contains LOB columns, change working mode to default mode (odbc_endpoint_imp.c:4775)
2020-09-01T12:10:04 I: Table 'public'.'TEMP_TABLE' has Non-Optimized Full LOB Support (odbc_endpoint_imp.c:4788)
2020-09-01T12:10:04 I: Load finished for table 'public'.'TEMP_TABLE' (Id = 1). 449 rows received. 0 rows skipped.
Volume transferred 190376. (streamcomponent.c:3770)
2020-09-01T12:10:04 E: RetCode: SQL_ERROR SqlState: 22P02 NativeError: 1 Message: ERROR: invalid input syntax for type json;
Error while executing the query (ar_odbc_stmt.c:2648)
2020-09-01T12:10:04 W: Table 'public'.'TEMP_TABLE' (subtask 1 thread 1) is suspended (replicationtask.c:2471)

The JSONB column must be nullable in the target database.
Note: in my case, the error disappeared after making the JSONB column nullable.
As mentioned in the AWS documentation:
"In this case, AWS DMS treats JSONB data as if it were a LOB column. During the full load phase of a migration, the target column must be nullable."
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html#CHAP_Source.PostgreSQL.Prerequisites
https://aws.amazon.com/premiumsupport/knowledge-center/dms-error-null-value-column/
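For reference, a minimal sketch of relaxing the constraint on the target table before the full load. The table name matches the log above, but the column name data_jsonb is a placeholder for your actual JSONB column:

-- On the TARGET database, before the full load:
ALTER TABLE public."TEMP_TABLE" ALTER COLUMN data_jsonb DROP NOT NULL;
-- Optionally restore the constraint once the full load has finished:
ALTER TABLE public."TEMP_TABLE" ALTER COLUMN data_jsonb SET NOT NULL;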

AWS DMS treats the JSON data type in PostgreSQL as a LOB data type column. This means that the LOB size limitation when you use limited LOB mode applies to JSON data. For example, suppose that limited LOB mode is set to 4,096 KB. In this case, any JSON data larger than 4,096 KB is truncated at the 4,096 KB limit and fails the validation test in PostgreSQL.
Reference: AWS DMS - JSON data types being truncated
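If truncation is a suspect, a quick check on the source is to look for JSONB values whose text form exceeds the limited LOB size. Table, key, and column names below are placeholders, and 4,096 KB matches the example limit above:

-- On the SOURCE database: list rows whose JSONB text form exceeds 4096 KB
SELECT id, octet_length(data_jsonb::text) AS json_bytes
FROM public."TEMP_TABLE"
WHERE octet_length(data_jsonb::text) > 4096 * 1024
ORDER BY json_bytes DESC;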
Update: You can tweak the error-handling task settings to skip erroneous rows by setting DataErrorPolicy to IGNORE_RECORD. This setting determines the action AWS DMS takes when there is an error related to data processing at the record level.
Some examples of data-processing errors include conversion errors, errors in transformation, and bad data. The default is LOG_ERROR. With IGNORE_RECORD, the task continues and the data for that record is ignored.
Reference: AWS DMS - Error handling task settings

You mentioned that you're migrating from PostgreSQL to PostgreSQL. Is there a specific reason to use AWS DMS?
AWS Docs: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html#CHAP_Source.PostgreSQL.Homogeneous
When you migrate from a database engine other than PostgreSQL to a PostgreSQL database, AWS DMS is almost always the best migration tool to use. But when you are migrating from a PostgreSQL database to a PostgreSQL database, PostgreSQL tools can be more effective.
...
We recommend that you use PostgreSQL database migration tools such as pg_dump under the following conditions:
You have a homogeneous migration, where you are migrating from a source PostgreSQL database to a target PostgreSQL database.
You are migrating an entire database.
The native tools allow you to migrate your data with minimal downtime.

Related

Migrate primary key LOB columns - CDC phase

I'm trying to do a full load + CDC replication from an RDS instance (postgres 11) to an RDS cluster (Aurora Postgres 11).
Some of the tables in my database have text columns as primary keys (which are considered LOBs).
During the full load phase, these columns are migrated inline (by configuring the InlineLobMaxSize parameter). But according to AWS:
InlineLobMaxSize – This value determines which LOBs AWS DMS transfers inline during a full load.
So inline transfer only applies to the full load, which causes the migration to fail during the CDC phase with errors like
ERROR: duplicate key value violates unique constraint "xxx_pkey"
DETAIL: Key (xxx_id)=() already exists.
It seems that during the CDC phase, DMS does a two-step migration (see the sketch below):
1. It copies the row with a null value for the LOB column.
2. It then copies the value itself.
How can I fix this?
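To make the failure mode concrete, here is a rough sketch, assuming the two-step behaviour described above, of why a text primary key that DMS treats as a LOB collides with itself during CDC. Table and column names are made up:

-- Hypothetical table whose primary key is a text (LOB) column
CREATE TABLE public.items (item_key text PRIMARY KEY, payload text);
-- Step 1 for the first captured change: the row arrives with the LOB/PK column empty
INSERT INTO public.items (item_key, payload) VALUES ('', 'first row');
-- Step 1 for the next change collides before step 2 (the UPDATE that fills in the key) runs:
--   ERROR: duplicate key value violates unique constraint "items_pkey"
--   DETAIL: Key (item_key)=() already exists.
INSERT INTO public.items (item_key, payload) VALUES ('', 'second row');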

How to handle NULL with AWS Glue bookmark

I have a table of 30GB in size. I am running an ETL with an AWS Glue job that copies the table to an S3 bucket.
I'm trying to use a bookmark with a combination of a couple of columns as the bookmark key. Some of those columns have rows with null values.
I get this error:
An error occurred while calling o97.getDynamicFrame. Incorrect DATETIME value: 'null'
Is there any way to give the column a default value?
The only alternative I can see is moving the entire table without a bookmark, which I don't think is efficient.
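If the source is a JDBC database you control, one possible workaround, sketched here and not Glue-specific, is to remove the NULLs at the source so the bookmark key is always populated. Table, column, and sentinel value are placeholders, and the ALTER syntax may need adjusting for your engine:

-- Backfill existing NULLs in the bookmark column with a sentinel value
UPDATE my_table SET updated_at = '1970-01-01 00:00:00' WHERE updated_at IS NULL;
-- Give new rows a default so new NULLs stop appearing
ALTER TABLE my_table ALTER COLUMN updated_at SET DEFAULT '1970-01-01 00:00:00';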

PostgreSQL error "relation already exists" while creating a partition

I created a partition programmatically (with Java, JPA/native query), then deleted it manually in pgAdmin with DROP TABLE my_partition. Now, when I try to re-create it programmatically, I get this error:
SQL Error: 0, SQLState: 42P07
ERROR: relation "partition_2020_12_08" already exists
CREATE TABLE "myschema.com".partition_2020_12_08 PARTITION OF "myschema.com".measurement FOR VALUES FROM (1607385600000) TO (1607471999999)
Interestingly, when I execute that SQL in pgAdmin, it works fine. It looks to me as if PostgreSQL caches some information when I'm using the JDBC/Java driver.
How can I debug this issue? I need to be able to re-create the same partitions when needed.
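One way to start debugging, a sketch assuming the names from the statement above, is to check from the same JDBC connection whether the relation really still exists and in which schema, and to make the partition DDL idempotent:

-- Is a relation with this name still present, and in which schema?
SELECT n.nspname AS schema_name, c.relname, c.relkind
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'partition_2020_12_08';

-- If partitions are expected to be re-created, IF NOT EXISTS avoids the 42P07 error
CREATE TABLE IF NOT EXISTS "myschema.com".partition_2020_12_08
  PARTITION OF "myschema.com".measurement
  FOR VALUES FROM (1607385600000) TO (1607471999999);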

SparkSQL/JDBC error com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or variable #7: Cannot find data type BLOB

Saving a DataFrame to a table with VARBINARY columns throws this error:
com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or
variable #7: Cannot find data type BLOB
If I try to use VARBINARY in the createTableColumnTypes option, I get "VARBINARY not supported".
The workaround is:
Change the TARGET schema to use VARCHAR.
Add .option("createTableColumnTypes", "Col1 varchar(500), Col2 varchar(500)")
While this workaround lets us go ahead and save the rest of the data, the actual binary data from the source table (from which the data is read) is not saved correctly for these two columns: we see NULL data.
We are using the MS SQL Server 2017 JDBC driver and Spark 2.3.2.
Any help or workaround that addresses this issue correctly, so that we don't lose data, is appreciated.
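One commonly suggested alternative, not verified against this exact setup, is to create the SQL Server table up front with VARBINARY(MAX) columns and have Spark append to the existing table, so Spark never generates DDL that maps BinaryType to BLOB. Table and column names below are illustrative:

-- Pre-create the target table on SQL Server with explicit binary column types
CREATE TABLE dbo.target_table (
    Col1 VARCHAR(500)   NULL,
    Col6 VARBINARY(MAX) NULL,
    Col7 VARBINARY(MAX) NULL
);

Writing with SaveMode.Append into this pre-created table should then use the existing column types instead of letting Spark derive them.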

Is it possible to change the chunk sequence in a TOAST table in PostgreSQL?

I have a table in a PostgreSQL database (version 9.6) that stores PDF documents. When I try to vacuum this table, the following error message comes up:
unexpected chunk number 57 (expected 25) for toast value 1047226 in pg_toast_1027390
If I look into the TOAST table itself, I can see that the chunk number is incorrect, but is there any way to correct it?
Or is there a way to identify the TOAST pointer in the normal relation, so I can tell which PDF document is affected?
So far I have reindexed and vacuumed the associated TOAST table, and both succeeded. Vacuuming the normal relation failed. I also tried to export and import all the data into a new database, without success: the error in the TOAST table remains, and pg_dump also fails.
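One common way to narrow down which row (and therefore which PDF) owns the broken TOAST value is to force detoasting row by row and log the keys that fail. A sketch, with documents, id, and pdf_data as placeholder names:

DO $$
DECLARE
  r record;
BEGIN
  FOR r IN SELECT id FROM public.documents LOOP
    BEGIN
      -- Forces the TOASTed bytea value to be read; fails on the corrupted chunks
      PERFORM length(pdf_data) FROM public.documents WHERE id = r.id;
    EXCEPTION WHEN OTHERS THEN
      RAISE NOTICE 'damaged TOAST data in row id = %: %', r.id, SQLERRM;
    END;
  END LOOP;
END $$;

Once the offending rows are known, the usual approach is to restore or delete those rows rather than try to edit the TOAST chunks directly.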