Spring Cloud Data Flow - PostgreSQL metadata issue

We are setting up SCDF on a Kubernetes cluster through kubectl, using PostgreSQL for the metadata store. The setup completed: we can see the various tables created in PostgreSQL, and we can access the SCDF UI. But when we try to add a task or import apps from a repository, we get this error message:
"could not extract ResultSet; SQL [n/a]; nested exception is org.hibernate.exception.SQLGrammarException: could not extract ResultSet"
Has anyone seen this error during the installation phase of SCDF?
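A SQLGrammarException at this stage means the database rejected the SQL Hibernate generated, which often points to the server resolving the wrong driver/dialect for the database it is actually connected to, or to a schema initialized for a different database type. For comparison, a minimal sketch of how the datasource is typically wired on the SCDF server container via environment variables (host, database name, and credentials here are placeholders):

SPRING_DATASOURCE_URL=jdbc:postgresql://postgres:5432/dataflow
SPRING_DATASOURCE_USERNAME=scdf
SPRING_DATASOURCE_PASSWORD=<password>
SPRING_DATASOURCE_DRIVER_CLASS_NAME=org.postgresql.Driver

If any of these point at a different database type than the one deployed, the UI can still load while individual queries fail with exactly this kind of error.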

Related

Dynamically change VPC inside Glue job

Hi, I am having issues with VPC settings. When I use a connection (with a VPC attached) in a Glue job, I can read data from SQL Server, but I then can't write the data to the target Snowflake server due to a "timeout": the job doesn't fail for any reason other than the timeout, and there are no errors in the logs.
If I remove the connection from the same job and replace the SQL Server DataFrame with a dummy Spark DataFrame, everything writes to Snowflake without issue.
For the connection to SQL Server I use the Data Catalog, and for Snowflake I use the Spark JDBC connector (two JAR files added to the job).
I am thinking about connecting to the VPC dynamically from the job script itself, pulling the data into a DataFrame, then disconnecting from the VPC and writing the DataFrame to the target. Does anyone think that is possible? I didn't find any mention of it in the documentation, to be honest.
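For reference, a minimal sketch of the read-then-write flow being described, as it would look in a Python Glue job; the catalog database/table, Snowflake account URL, credentials, and target table are placeholder assumptions:

# Sketch of the two-step flow (PySpark on Glue); all names below are placeholders.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Step 1: read from SQL Server through the Data Catalog connection,
# which places the job inside the connection's VPC subnet.
df = glue_context.create_dynamic_frame.from_catalog(
    database="my_catalog_db", table_name="my_sqlserver_table"
).toDF()

# Step 2: write to Snowflake over plain Spark JDBC. This call needs a
# network route out of that VPC (e.g. a NAT gateway); otherwise it
# hangs until it times out with no other error, as described above.
df.write.format("jdbc") \
    .option("url", "jdbc:snowflake://myaccount.snowflakecomputing.com/?db=MYDB&warehouse=MYWH") \
    .option("dbtable", "PUBLIC.TARGET_TABLE") \
    .option("user", "glue_user") \
    .option("password", "<password>") \
    .option("driver", "net.snowflake.client.jdbc.SnowflakeDriver") \
    .mode("append") \
    .save()

Note that the VPC attachment is decided by the job's connections before the script starts, so it cannot be switched on and off from inside the script; the route out of the VPC has to exist for the whole run.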

Azure Data Factory run New Job Cluster Mode Databricks Python Wheel

We are trying to install external libraries via Azure Data Factory and then execute our notebook; inside the notebook we use many different libraries to implement the business logic.
In Azure Data Factory there is an Append Libraries option from which it is possible to install new runtime libraries onto the job cluster.
Our linked service always connects to a NEW JOB CLUSTER, but we get the error below when executing the ADF pipelines.
Run result unavailable: job failed with error message Library
installation failed for library due to user error for whl:
"dbfs:/FileStore/jars/ephem-4.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
. Error messages: Library installation attempted on the driver node of
cluster 1226-023738-9cm6lm7d and failed. Please refer to the following
error message to fix the library or contact Databricks support.
Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE.
Error Message:
java.util.concurrent.ExecutionException:
java.io.FileNotFoundException:
dbfs:/FileStore/jars/ephem-4.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
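The root cause in the log is the java.io.FileNotFoundException: the driver could not find the wheel at the DBFS path ADF passed. A quick check, to be run in a Databricks notebook (dbutils is provided by the runtime), is to list what is actually under that path and compare the names character for character:

# List the uploaded wheels; the path is the one from the error message.
for f in dbutils.fs.ls("dbfs:/FileStore/jars/"):
    print(f.path)

If the file is missing or named differently (wheel file names are exact, including the platform tags), re-upload it to that path before pointing the Append Libraries entry at it.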

AWS DMS Task failed with error: Error executing source loop; Stream component failed at subtask 0

I want to migrate my Postgres database hosted in the Citus Cloud service to AWS RDS Aurora Postgres.
I am using the AWS DMS service. I have created the task but am getting the following errors:
Last failure message Last Error Stream Component Fatal error. Task
error notification received from subtask 0, thread 0
[reptask/replicationtask.c:2860] [1020101] Error executing source
loop; Stream component failed at subtask 0, component
st_0_QOIS7XIGJDKNPY6RXMGYRLJQHY2P7IQBWIBA5NQ; Stream component
'st_0_QOIS7XIGJDKNPY6RXMGYRLJQHY2P7IQBWIBA5NQ' terminated
[reptask/replicationtask.c:2868] [1020101] Stop Reason FATAL_ERROR
Error Level FATAL
Frankly speaking, I am not able to understand what is wrong here, so any help is appreciated.
Update (from the CloudWatch logs and further testing): when I changed the task type to Full load it worked, so it is ongoing replication that fails; the Citus Cloud service doesn't support it.
I had a similar error using Aurora PostgreSQL v14.5 and AWS DMS. I was using a DMS Full load + CDC task (using pglogical behind the scenes) to migrate from one table to another on the same system.
The issue was resolved by rolling back my PostgreSQL version from 14.5 to 13.7.
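Since DMS ongoing replication (CDC) on PostgreSQL depends on logical decoding, a quick sanity check on the source is whether logical replication is enabled at all. A minimal sketch, assuming psycopg2 and network access to the source endpoint (connection parameters are placeholders):

# Check the source's logical-replication prerequisites for DMS CDC.
import psycopg2

conn = psycopg2.connect(host="source-host", dbname="mydb", user="admin", password="<password>")
with conn.cursor() as cur:
    cur.execute("SHOW wal_level;")  # DMS CDC requires 'logical'
    print("wal_level =", cur.fetchone()[0])
    cur.execute("SELECT slot_name, plugin, active FROM pg_replication_slots;")
    print("replication slots:", cur.fetchall())  # slots DMS created, if any
conn.close()

If wal_level is not 'logical', or the managed service does not expose a supported decoding plugin (test_decoding or pglogical), the CDC phase of the task will fail the way the question describes.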

An error occurred while creating datasets: Dataset could not be created

I have a Kylin cluster running in Kubernetes, and Superset running in Kubernetes as well.
Kylin is already configured with a built cube, "kylin_sales_cube".
Superset is already configured with the Kylin driver and the connection is established.
While trying to create a dataset from a Kylin table I get the following error message:
An error occurred while creating datasets: Dataset could not be created.
On the other hand, I am able to run a query against the same table; but without a dataset, I cannot use charts.
Any ideas?
It seems to be a missing implementation of a method in kylinpy (or somewhere else), but until someone fixes it, I suggest anyone who has this problem implement the has_table method in sqla_dialect.py from the kylinpy plugin. You will find it in kylinpy/sqla_dialect.py. Change the method's return statement to the following line:
return table_name in self.get_table_names(connection, schema)
And everything will be back to normal.
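For reference, a sketch of the patched method as it would sit inside the dialect class in kylinpy/sqla_dialect.py (the surrounding class is omitted; the signature follows the standard SQLAlchemy dialect hook):

def has_table(self, connection, table_name, schema=None):
    # Report the table as present only when Kylin actually lists it;
    # Superset's dataset-creation flow relies on this check.
    return table_name in self.get_table_names(connection, schema)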

AWS Glue: Unable to process data from multiple sources (S3 bucket and PostgreSQL db) with AWS Glue using Scala-Spark

For my requirement, I need to join data in a PostgreSQL database (hosted in RDS) with a file in an S3 bucket. I have created a Glue job (Spark-Scala) which should connect to both PostgreSQL and the S3 bucket and complete the processing.
But the Glue job encounters a connection timeout while connecting to S3 (the error message is below); it successfully fetches data from PostgreSQL.
There is no permission-related issue with S3, because I am able to read/write the same S3 bucket/path from a different job. The exception happens only when I try to connect to both PostgreSQL and S3 in one Glue job/script.
In the Glue job, the GlueContext is created from a SparkContext object. I have tried creating two different SparkSessions, one for S3 and one for the PostgreSQL db, but this approach didn't work; the same timeout was encountered.
Please help me in resolving the issue.
Error/Exception from log:
ERROR[main] glue.processLauncher (Logging.scala:logError(91)):Exception in User Class
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to emp_bucket.s3.amazonaws.com:443
[emp_bucket.s3.amazonaws.com/] failed : connect timed out
This is fixed.
The issue was with the security group: only TCP traffic was allowed earlier.
As part of the fix, traffic was opened for all, and an HTTPS rule was added to the inbound rules as well.
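For anyone scripting the same change, a hedged sketch of the inbound HTTPS rule with boto3; the security group ID is a placeholder for the group attached to the Glue connection:

# Add the inbound HTTPS rule described in the fix above.
# The security group ID below is a placeholder.
import boto3

ec2 = boto3.client("ec2")
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS for S3"}],
    }],
)

The HTTPS part matters because the S3 endpoint in the error (emp_bucket.s3.amazonaws.com:443) is reached over port 443; with only the original TCP rules in place, that call timed out exactly as logged.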