AWS DMS Task failed with error: Error executing source loop; Stream component failed at subtask 0 - postgresql

I want to migrate my Postgres DB hosted on the Citus Cloud service to AWS RDS Aurora Postgres.
I am using the AWS DMS service. I have created a task but am getting the following errors:
Last failure message Last Error Stream Component Fatal error. Task
error notification received from subtask 0, thread 0
[reptask/replicationtask.c:2860] [1020101] Error executing source
loop; Stream component failed at subtask 0, component
st_0_QOIS7XIGJDKNPY6RXMGYRLJQHY2P7IQBWIBA5NQ; Stream component
'st_0_QOIS7XIGJDKNPY6RXMGYRLJQHY2P7IQBWIBA5NQ' terminated
[reptask/replicationtask.c:2868] [1020101] Stop Reason FATAL_ERROR
Error Level FATAL
Frankly speaking, I am not able to understand what is wrong here, so any help is appreciated.
CloudWatch logs: (screenshot not included)

I changed the task type to Full load and it worked, so the failure is specific to ongoing replication: the Citus Cloud service doesn't support it.
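For anyone hitting this on a source they do control: DMS ongoing replication needs logical decoding enabled on the source. A minimal sketch to verify the prerequisites, assuming psycopg2 and placeholder connection details:

```python
import psycopg2

# Placeholder host/credentials; point this at your source database.
conn = psycopg2.connect(host="source-host", dbname="mydb",
                        user="dms_user", password="...")
with conn, conn.cursor() as cur:
    # DMS CDC requires logical decoding: wal_level must be 'logical'.
    cur.execute("SHOW wal_level;")
    print("wal_level:", cur.fetchone()[0])
    # DMS also needs a free replication slot on the source.
    cur.execute("SHOW max_replication_slots;")
    print("max_replication_slots:", cur.fetchone()[0])
```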

I had a similar error to this using Aurora PostgreSQL v14.5 and AWS DMS. I was using a DMS Full load + CDC job (using pglogical behind the scenes) to migrate from one table to another (on the same system).
The issue was resolved by rolling back my PostgreSQL version from 14.5 to 13.7.
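A quick way to confirm what is actually running before and after such a rollback (a sketch in the same psycopg2 style as above; host and credentials are placeholders):

```python
import psycopg2

conn = psycopg2.connect(host="aurora-host", dbname="mydb",
                        user="admin", password="...")
with conn, conn.cursor() as cur:
    cur.execute("SHOW server_version;")
    print("server_version:", cur.fetchone()[0])
    # pglogical is what the DMS job used behind the scenes here.
    cur.execute("SELECT name, default_version FROM pg_available_extensions "
                "WHERE name = 'pglogical';")
    print("pglogical:", cur.fetchall())
```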

Related

ADF Dataflow stuck in progress and fails with the errors below

The ADF pipeline's Data Flow task is stuck in progress. It was working seamlessly for the last couple of months, but suddenly the dataflow gets stuck in progress and times out after a certain time. We are using an IR with a managed Virtual Network. I am using a ForEach loop to run the data flow for multiple entities in parallel, and it always gets stuck randomly on the last entity.
What can I try to resolve this?
Error in Dev Environment:
Error code: 4508, Spark cluster not found
Error in Prod Environment:
Error code: 5000
Failure type: User configuration issue
Details: [plugins.*** ADF.adf-ir-001 WorkspaceType:<ADF> CCID:<f289c067-7c6c-4b49-b0db-783e842a5675>] [Monitoring] Livy Endpoint=[https://hubservice1.eastus.azuresynapse.net:8001/api/v1.0/publish/815b62a1-7b45-4fe1-86f4-ae4b56014311]. Livy Id=[0] Job failed during run time with state=[dead].
I tried the steps below:
Changing the IR configuration
DF retry and retry interval
Running the ForEach loop one batch at a time instead of 4 batches in parallel
None of the above troubleshooting steps worked. These pipelines have been running for the last 3-4 months without a single failure, but they suddenly started failing consistently 3 days ago. The data flow always gets stuck in progress randomly on a different entity and eventually times out, throwing the errors above.
Error Code 4508: Spark cluster not found.
This error can occur for two reasons:
The debug session is being closed before the dataflow finishes its transformation; in that case the recommendation is to restart the debug session.
The second reason is a resource problem, or an outage in that particular region.
Error code 5000, Failure type: User configuration issue, Details: [plugins.*** ADF.adf-ir-001 WorkspaceType:<ADF> CCID:<f289c067-7c6c-4b49-b0db-783e842a5675>] [Monitoring] Livy Endpoint=[https://hubservice1.eastus.azuresynapse.net:8001/api/v1.0/publish/815b62a1-7b45-4fe1-86f4-ae4b56014311]. Livy Id=[0] Job failed during run time with state=[dead].
"Livy job state dead caused by unknown error" is typically a temporary error. A Spark cluster is used at the backend of the dataflow, and this error is generated by that Spark cluster. To get more information about the error, look at the StdOut of the Spark pool execution.
The backend cluster may be experiencing a network problem, a resource problem, or an outage.
If the error persists, my suggestion is to raise a Microsoft support ticket.
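If you prefer to pull the Livy log programmatically rather than through the UI, the open-source Apache Livy REST API exposes a log endpoint. This is a hypothetical sketch against a generic Livy host; the managed Synapse endpoint shown in the error above is not guaranteed to expose the same surface:

```python
import requests

LIVY_BASE = "https://my-livy-host:8998"  # placeholder, not the Synapse URL
livy_id = 0                              # the Livy Id from the error

# GET /batches/{id}/log returns a window of the job's log lines.
resp = requests.get(f"{LIVY_BASE}/batches/{livy_id}/log",
                    params={"from": 0, "size": 100})
resp.raise_for_status()
for line in resp.json().get("log", []):
    print(line)
```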

AWS DMS task is failing in CDC with broken connection error

We are using AWS DMS to do a live migration from AWS RDS to AWS RDS. This task is in Full load + CDC mode. The full load completed successfully, but CDC is now failing with the error below:
Last Error Load utility network error. Task error notification received from subtask 0, thread 0 [reptask/replicationtask.c:2883] [1020458] Error executing source loop; Stream component failed at subtask 0, component st_0_ADWXVXURDV4UXYIGPH5US2PQW6XSQVFD5K4NFAY; Stream component 'st_0_ADWXVXURDV4UXYIGPH5US2PQW6XSQVFD5K4NFAY' terminated [reptask/replicationtask.c:2891] [1020458] Stop Reason RECOVERABLE_ERROR Error Level RECOVERABLE
In CloudWatch I can only see the error below:
WAL reader terminated with broken connection / recoverable error. [1020458].
I am not sure what might be happening here, and my only guess is that to fix this I may need to run CDC again from a custom checkpoint. Can anyone help me with this?
I tried debugging this issue at more verbose logging levels and also tested the connectivity. I looked into the CloudWatch metrics, but nothing seems suspicious. Also note that CDC did start successfully but has now entered a failed state.
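Acting on that guess, resuming CDC from a checkpoint can be done with boto3. A sketch with a placeholder task ARN; the task's last committed checkpoint is reported as RecoveryCheckpoint by describe_replication_tasks:

```python
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Look up the failed task and its last committed CDC checkpoint.
task_arn = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"
task = dms.describe_replication_tasks(
    Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
)["ReplicationTasks"][0]
print("RecoveryCheckpoint:", task.get("RecoveryCheckpoint"))

# Resume change processing from that (or a custom) checkpoint.
dms.start_replication_task(
    ReplicationTaskArn=task_arn,
    StartReplicationTaskType="resume-processing",
    CdcStartPosition=task.get("RecoveryCheckpoint"),
)
```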

System metrics not support for postgresql. Skipping processing

I am trying to get some sample data from all the tables in my Postgres DB. I am running the profiler job from Airflow. I am getting the error below:
{system.py:65} INFO - System metrics not support for postgresql. Skipping processing.
Any input would be great.
Thank you.
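If the underlying goal is just sample rows from every table, a direct sketch with psycopg2 works independently of whichever profiler emits that INFO line (host and credentials are placeholders):

```python
import psycopg2

conn = psycopg2.connect(host="db-host", dbname="mydb",
                        user="reader", password="...")
with conn, conn.cursor() as cur:
    # Enumerate ordinary tables, skipping the system schemas.
    cur.execute(
        "SELECT table_schema, table_name FROM information_schema.tables "
        "WHERE table_type = 'BASE TABLE' "
        "AND table_schema NOT IN ('pg_catalog', 'information_schema');"
    )
    for schema, table in cur.fetchall():
        # TABLESAMPLE SYSTEM (1) reads ~1% of pages; LIMIT caps the rows.
        cur.execute(
            f'SELECT * FROM "{schema}"."{table}" TABLESAMPLE SYSTEM (1) LIMIT 5;'
        )
        print(schema, table, cur.fetchall())
```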

Azure Data Factory: run a Databricks Python wheel in New Job Cluster mode

We are trying to install external libraries via Azure Data Factory, and after that we plan to execute our notebook. Inside the notebook we will be using many different libraries to implement the business logic.
In Azure Data Factory there is an Append Libraries option, from which it is possible to install new runtime libraries on the job cluster.
Our linked service always connects to a NEW JOB CLUSTER, but we are getting the error below while executing the ADF pipelines.
Run result unavailable: job failed with error message Library
installation failed for library due to user error for whl:
"dbfs:/FileStore/jars/ephem-4.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
. Error messages: Library installation attempted on the driver node of
cluster 1226-023738-9cm6lm7d and failed. Please refer to the following
error message to fix the library or contact Databricks support.
Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE.
Error Message:
java.util.concurrent.ExecutionException:
java.io.FileNotFoundException:
dbfs:/FileStore/jars/ephem-4.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
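The FileNotFoundException points at the wheel path itself rather than the installation step. A quick sanity check from a notebook in the same workspace (dbutils is provided by the Databricks runtime, so this only runs there):

```python
# List what is actually under the jars folder; if the ephem wheel is not
# listed, re-upload it or correct the path configured in the ADF activity.
for f in dbutils.fs.ls("dbfs:/FileStore/jars/"):
    print(f.path)
```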

AWS DMS task fails to retrieve tables

I'm trying to migrate existing data and replicate ongoing changes. The source database is PostgreSQL, managed by AWS; the target is Kafka.
I'm facing the issue below:
Last Error No tables were found at task initialization. Either the selected table(s) or schemas(s) no longer exist or no match was found for the table selection pattern(s). If you would like to start a Task that does not initially capture any tables, set Task Setting FailOnNoTablesCaptured to false and restart task. Stop Reason FATAL_ERROR Error Level FATAL
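The usual cause is a selection rule that matches nothing (for PostgreSQL, schema and table names in the mapping are typically lower-case unless the objects were created quoted), so checking the task's table mappings first is the better fix. If the pattern is intentional and you want the task to start anyway, the error message points at the FailOnNoTablesCaptured task setting; a sketch flipping it with boto3, with a placeholder ARN, assuming the task is stopped and that a partial settings JSON is merged into the existing settings:

```python
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")
dms.modify_replication_task(
    ReplicationTaskArn="arn:aws:dms:us-east-1:123456789012:task:EXAMPLE",
    # ErrorBehavior is the task-settings section that holds this flag.
    ReplicationTaskSettings=json.dumps(
        {"ErrorBehavior": {"FailOnNoTablesCaptured": False}}
    ),
)
```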