Azure Data Factory: Python wheel installation fails on a Databricks new job cluster - pyspark

We are trying to install external libraries via Azure Data Factory and then execute our notebook, which uses several different libraries to implement the business logic.
Azure Data Factory has an Append libraries option through which it is possible to install additional runtime libraries on the job cluster.
Our linked service always connects to a NEW JOB CLUSTER, but we get the error below when executing the ADF pipelines.
Run result unavailable: job failed with error message Library
installation failed for library due to user error for whl:
"dbfs:/FileStore/jars/ephem-4.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
. Error messages: Library installation attempted on the driver node of
cluster 1226-023738-9cm6lm7d and failed. Please refer to the following
error message to fix the library or contact Databricks support.
Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE.
Error Message:
java.util.concurrent.ExecutionException:
java.io.FileNotFoundException:
dbfs:/FileStore/jars/ephem-4.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
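The FileNotFoundException suggests the new job cluster simply cannot find the wheel at that DBFS path, so before changing anything in ADF it is worth confirming the file really exists where the linked service expects it. A minimal sketch of that check, run from a notebook attached to an interactive cluster in the same workspace (dbutils is provided there automatically; the wheel path is copied from the error message):

```python
# Sanity check: confirm the wheel actually exists under dbfs:/FileStore/jars/.
# dbutils is injected by Databricks notebooks; no import is needed there.
wheel_path = "dbfs:/FileStore/jars/ephem-4.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"

try:
    paths = [f.path for f in dbutils.fs.ls("dbfs:/FileStore/jars/")]
    print("wheel found" if wheel_path in paths else "wheel NOT found")
    for p in paths:
        print(p)  # list everything so near-miss filenames are easy to spot
except Exception as exc:
    # If even the folder is missing, the wheel was never uploaded to this workspace.
    print(f"listing failed: {exc}")
```

If the file turns out to be missing, re-uploading it (for example with `databricks fs cp` from the Databricks CLI) and pointing the Append libraries entry at the confirmed path would be the first thing to try.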

Related

Spark Operator and jmx_exporter failing

I've just migrated k8s to 1.22, and spark-operator:1.2.3 doesn't work with this version.
Following the information I found online, I upgraded to 1.3.3; however, all my Spark apps are failing with the same error:
Caused by: java.io.FileNotFoundException: /etc/metrics/conf/prometheus.yaml (No such file or directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
    at java.base/java.io.FileReader.<init>(FileReader.java:75)
    at io.prometheus.jmx.shaded.io.prometheus.jmx.JmxCollector.<init>(JmxCollector.java:78)
    at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:29)
    ... 6 more
*** java.lang.instrument ASSERTION FAILED ***: "result" with message agent load/premain call failed at ./src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
FATAL ERROR in native method: processing of -javaagent failed, processJavaStart failed
It used to work on the previous version; unfortunately, I cannot downgrade k8s.
Can you please assist?
PS: there are no additional options passed to the executor, just the path to jmx_exporter_0.15.
I think your new application requires Prometheus to be running in your cluster, and it also expects to find the Prometheus configuration file at the path /etc/metrics/conf/prometheus.yaml. Such files are generally set up by creating a ConfigMap in your cluster and then mounting it into every pod that needs it.
My guess is that a step was missed during the upgrade: Prometheus needed to be installed in your cluster before the Spark applications that depend on it were installed. Since you are trying to use a Prometheus exporter, it will not work if a Prometheus installation doesn't already exist.
You can try going through the installation again, checking where Prometheus comes into play, and ensuring that this configuration file is provided to your applications.
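As a sketch of that last step, the official kubernetes Python client can create the ConfigMap that ends up mounted at /etc/metrics/conf. The ConfigMap name, the namespace, and the jmx_exporter rules below are assumptions for illustration:

```python
# Sketch: create a ConfigMap holding a minimal jmx_exporter configuration so
# that, once mounted at /etc/metrics/conf, the -javaagent finds prometheus.yaml.
# The name "spark-metrics-conf" and namespace "spark" are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod

prometheus_yaml = """\
lowercaseOutputName: true
rules:
  - pattern: ".*"
"""

cm = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(name="spark-metrics-conf", namespace="spark"),
    data={"prometheus.yaml": prometheus_yaml},
)
client.CoreV1Api().create_namespaced_config_map(namespace="spark", body=cm)
```

The ConfigMap then still has to be mounted into the driver and executor pods at /etc/metrics/conf so the agent can read it.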

Azure Databricks error - The output of the notebook is too large. Cause: rpc response

Error Message - job failed with error message The output of the notebook is too large. Cause: rpc response (of 20972488 bytes) exceeds limit of 20971520 bytes
Details:
We are using Databricks notebooks to run the job. The job runs on a job cluster and is a streaming job.
The job started failing with the above error.
We do not have any display(), show(), print(), or explain() calls in the job.
We are not using the awaitAnyTermination() method in the job either.
We also tried adding "spark.databricks.driver.disableScalaOutput true" to the job, but it still did not work; the job fails with the same error.
We have followed all the steps mentioned in this document - https://learn.microsoft.com/en-us/azure/databricks/kb/jobs/job-cluster-limit-nb-output
Do we have any option to resolve this issue, or to find out exactly which command's output is pushing it above the 20 MB limit?
See the docs regarding structured streaming in production.
I would recommend migrating to workflows based on jar jobs, because:
Notebook workflows are not supported with long-running jobs. Therefore we don't recommend using notebook workflows in your streaming jobs.
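The same reasoning applies if you prefer to stay in Python: a plain Python file task runs the stream without a notebook, so there is no notebook output to hit the 20 MB RPC limit. A minimal sketch, with a rate source and noop sink standing in for your real source and sink (the checkpoint path is a hypothetical placeholder):

```python
# Sketch: the streaming job as a standalone script (spark_python_task) rather
# than a notebook, so nothing is sent back as notebook output.
from pyspark.sql import SparkSession


def main() -> None:
    spark = SparkSession.builder.appName("streaming-job").getOrCreate()

    # Placeholder source: replace with your real readStream source.
    stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

    # Placeholder sink: "noop" discards rows; replace with your real sink.
    query = (
        stream.writeStream.format("noop")
        .option("checkpointLocation", "/tmp/checkpoints/streaming-job")
        .start()
    )
    query.awaitTermination()  # block for the lifetime of the long-running job


if __name__ == "__main__":
    main()
```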

How to re-deploy Azure Resources using AzDevOps Pipeline?

I have some Azure resources (Function App, Cosmos, etc.) that I have successfully deployed into a resource group using terraform init-plan-apply in an Azure DevOps pipeline. From my local CLI I can change the resources in main.tf and redeploy, presumably because I have the tf state locally. However, when I try to redeploy using the pipeline I get the usual error:
Error: A resource with the ID "/subscriptions/xxxxxx-xxxx-xxxx-xxxx/resourceGroups/my
-rg" already exists - to be managed via Terraform this resource needs to be imported into the State. Please see the resource documentation for "azurerm_resource_group" for more information.
When I try to import using the config described here, I get the unhelpful error:
##[error]Error: There was an error when attempting to execute the process '/usr/local/bin/terraform'. This may indicate the process failed to start. Error: spawn /usr/local/bin/terraform ENOENT
Am I thinking about pipelines with terraform in the correct way? Should I be trying to import the resource group, or is there a better way to redeploy resources using terraform?
You're right: the tf state is not saved on the Azure DevOps agents.
The common way is to use an Azure Storage account to hold the tf state.
You can find the official Microsoft tutorial about it here.
You can find more guides here, here, and here.
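For what that amounts to in practice: the `spawn /usr/local/bin/terraform ENOENT` error just means the terraform binary itself was not found on the agent (a Terraform installer task in the pipeline normally fixes that), and once a `backend "azurerm"` block exists in your configuration, `terraform init` is pointed at the shared state roughly as below. This is a sketch wrapped in Python purely for illustration; the resource group, storage account, container, and key names are hypothetical placeholders:

```python
# Sketch: initialise Terraform against a remote azurerm backend so the state
# lives in an Azure Storage account instead of on the ephemeral DevOps agent.
# All four values below are hypothetical placeholders.
import subprocess

subprocess.run(
    [
        "terraform", "init",
        "-backend-config=resource_group_name=tfstate-rg",
        "-backend-config=storage_account_name=mytfstatestore",
        "-backend-config=container_name=tfstate",
        "-backend-config=key=myproject.terraform.tfstate",
    ],
    check=True,  # raise CalledProcessError if terraform exits non-zero
)
```

After that, every pipeline run (and your local CLI, once you migrate your local state) reads and writes the same state file, so resources are updated in place rather than re-created.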

AWS DMS Task failed with error: Error executing source loop; Stream component failed at subtask 0

I want to migrate my Postgres DB hosted in the Citus Cloud service to AWS RDS Aurora Postgres.
I am using the AWS DMS service. I have created a task but am getting the following errors:
Last failure message Last Error Stream Component Fatal error. Task
error notification received from subtask 0, thread 0
[reptask/replicationtask.c:2860] [1020101] Error executing source
loop; Stream component failed at subtask 0, component
st_0_QOIS7XIGJDKNPY6RXMGYRLJQHY2P7IQBWIBA5NQ; Stream component
'st_0_QOIS7XIGJDKNPY6RXMGYRLJQHY2P7IQBWIBA5NQ' terminated
[reptask/replicationtask.c:2868] [1020101] Stop Reason FATAL_ERROR
Error Level FATAL
Frankly speaking, I'm not able to understand what is wrong here, so any help is appreciated.
Update: I changed the migration type to Full load and it worked, so the failure is specific to ongoing replication; the Citus Cloud service doesn't support it.
I had a similar error to this using Aurora PostgreSQL v14.5 and AWS DMS. I was using a DMS full load + CDC job (which uses pglogical behind the scenes) to migrate from one table to another on the same system.
The issue was resolved by rolling back my PostgreSQL version from 14.5 to 13.7.

Spring Cloud Data Flow - Postgresql Metadata issue

We are setting up SCDF on a Kubernetes cluster through kubectl, using PostgreSQL as the metadata store. The setup is done: we can see the various tables created in PostgreSQL and we can access the SCDF UI. But when we try to add any task or import a repository, we get this error message:
"could not extract ResultSet; SQL [n/a]; nested exception is org.hibernate.exception.SQLGrammarException: could not extract ResultSet"
Has anyone seen this error during the installation phase of SCDF?