Spring Task in Spring Cloud Dataflow on PCF can't find java - spring-cloud

I have a Spring Cloud Task fat jar that I have successfully deployed to SCDF running on PCF. I have created a definition for it and can therefore run it from the dashboard. FWIW, it reads from and writes to a database using Spring JDBC.
I'm now trying to set it up to run on a schedule and am running into issues. I created a stream with a triggertask source and a task-launcher-local sink, and configured the triggertask URI to point to the fat jar (via HTTP, using a staticfile PCF pushed app).
The dashboard shows the two PCF apps (one for triggertask, one for task-launcher-local) both starting successfully, and the stream runs, but the task fails every time with the error:
Caused by: java.io.IOException: Cannot run program "java" (in directory "/home/vcap/tmp/spring-cloud-dataflow-5903184636016162160/Task--582903409-1502669137014/Task--582903409"): error=2, No such file or directory
From what I can tell and surmise, the PCF app running the stream tries to fork and exec a java call, but since java is not on the path in PCF app containers, I get the error.
Am I right? Either way, how can I get the Spring Cloud Task (jar) to run successfully?
Spring Cloud Data Flow: Server
1.2.3 (using built spring-cloud-dataflow-server-cloudfoundry-1.2.3.BUILD-SNAPSHOT.jar)
Spring Cloud Data Flow: Shell
1.2.3 (using downloaded spring-cloud-dataflow-shell-1.2.3.RELEASE.jar)
Deployment Environment
PCF v1.11.6 (on Azure)
PCF Dev v0.26.0 (on Mac)
App Starters
http://bit-dot-ly/1-0-4-GA-stream-applications-rabbit-maven
Logs
link to log

The stream definition is missing from the post, but it is likely that you're using the task-launcher-local sink, which is compatible only with SCDF's local server and will fail with the attached error when running on Cloud Foundry. Please make sure you're using the task-launcher-cloudfoundry sink instead; a sketch of what such a definition could look like is included below. This application was added in the latest release of the app starters.
As pointed out in the previous SO thread, it is highly recommended that you use the latest release of the app starters (1.0.4 is at least 10 months old). The latest releases can be found at the project site.
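For reference, a minimal sketch of the corrected definition from the SCDF shell, assuming the task-launcher-cloudfoundry sink has been registered; the stream name, the staticfile app route, and the polling interval are placeholders, and the triggertask property names should be double-checked against your starter version:

stream create task-trigger --definition "triggertask --triggertask.uri=http://<staticfile-app-route>/task.jar --trigger.fixed-delay=60 | task-launcher-cloudfoundry" --deploy

The task-launcher-cloudfoundry sink launches the task as a short-lived app on Cloud Foundry itself rather than forking a local java process inside the stream app's container, which is what causes the error above.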

Related

Unable to use the google.cloud.sql.connector module in Google Composer

I am trying to schedule a Dataflow pipeline job to read content from a Cloud SQL SQL Server instance and write it to a BigQuery table. I'm using google.cloud.sql.connector[pytds] to set up the connection. The manual Dataflow job runs successfully when I run it through the Google Cloud Shell. The Airflow version (using Google Cloud Composer) fails with a NameError:
'NameError: name 'Connector' is not defined'
I have enabled the save_main_session option. Also, I have listed the connector module in the py_requirements option and it is being installed (as per the Airflow logs).
py_requirements=['apache-beam[gcp]==2.41.0','cloud-sql-python-connector[pytds]==0.6.1','pyodbc==4.0.34','SQLAlchemy==1.4.41','pymssql==2.2.5','sqlalchemy-pytds==0.3.4','pylint==2.15.4']
[2022-11-02 07:40:53,308] {process_utils.py:173} INFO - Collecting cloud-sql-python-connector[pytds]==0.6.1
[2022-11-02 07:40:53,333] {process_utils.py:173} INFO - Using cached cloud_sql_python_connector-0.6.1-py2.py3-none-any.whl (28 kB)
But it seems the import is not working.
You have to install the PyPI packages on the Cloud Composer nodes; there is a PyPI packages tab for this in the Composer page of the GUI.
Add all the packages needed by your Dataflow job in Composer via this page, except Apache Beam and Apache Beam GCP, because Beam and the Google Cloud dependencies are already installed in Cloud Composer.
Cloud Composer is the launcher of your Dataflow job, and the launcher instantiates the job. To be able to instantiate the job correctly, the launcher needs to have the dependencies installed.
Then the Dataflow job, at execution time, will use the given py_requirements or setup.py file on the workers.
py_requirements or setup.py must therefore also contain the packages needed to execute the Dataflow job, as in the sketch below.
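As a rough sketch of that last point, assuming the job is launched with BeamRunPythonPipelineOperator from the Apache Beam provider (the DAG id, bucket path, project, and region below are placeholders), the connector packages go into py_requirements so they are also installed on the Dataflow workers:

from airflow import DAG
import pendulum
from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowConfiguration

with DAG(
    dag_id="cloudsql_to_bq",                                    # placeholder DAG id
    start_date=pendulum.datetime(2022, 11, 1, tz="UTC"),
    schedule_interval=None,
    catchup=False,
) as dag:
    launch_pipeline = BeamRunPythonPipelineOperator(
        task_id="launch_cloudsql_to_bq",
        py_file="gs://my-bucket/pipelines/cloudsql_to_bq.py",   # placeholder pipeline file
        runner="DataflowRunner",
        pipeline_options={"save_main_session": True},
        # Same packages the pipeline imports at run time; pylint is a dev-only
        # tool and does not need to be shipped to the workers.
        py_requirements=[
            "apache-beam[gcp]==2.41.0",
            "cloud-sql-python-connector[pytds]==0.6.1",
            "pyodbc==4.0.34",
            "SQLAlchemy==1.4.41",
            "pymssql==2.2.5",
            "sqlalchemy-pytds==0.3.4",
        ],
        py_interpreter="python3",
        py_system_site_packages=False,
        dataflow_config=DataflowConfiguration(
            job_name="cloudsql-to-bq",                          # placeholder
            project_id="my-project",                            # placeholder
            location="us-central1",                             # placeholder
        ),
    )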

Failing while trying to deploy Talend Agent in Pivotal Cloud Foundry

I have been trying to deploy the Talend Agent as an app in PCF. I literally have no idea about Talend; for me, as the PCF guy, it is just a Java jar file that I got from the data team.
I am getting a "no buildpack supported" error. I also tried specifying the Java buildpack on the command line, but that failed again with an incompatible buildpack error.
Error: No container can run this application. Please ensure that you've pushed a valid JVM artifact or artifacts using the -p command line argument or path manifest entry. Information about valid JVM artifacts can be found at https://github.com/cloudfoundry/java-buildpack#additional-documentation.
Failed to compile droplet: Failed to run finalize script: exit status 1
I was expecting this to be deployed as an app that I can access.
Is there anyone who can help me with this?
The CF Java buildpack expects a Java jar file to have certain characteristics in order for it to know how to execute the code in the jar file. The most common characteristics are a self-executable Spring Boot app, an app containing a Main class, and an app containing Tomcat.
I don't know anything about the Talend Agent, but a typical Java agent jar file is not meant to be executed as a stand-alone app. An agent is meant to be installed into the JVM used to run an app, in order to instrument the JVM and/or the app. A typical agent jar file won't have any of the execution entry points recognized by the CF Java buildpack, and therefore the buildpack will reject it with an error message similar to the one you show.
The CF Java buildpack does understand how to install several specific agents (listed under Standard Frameworks in the buildpack docs) into the JVM when an app is deployed. The Talend Agent is not currently in this list. If it is in fact a typical Java agent jar file, you would have to modify the Java buildpack to add support for it.
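One quick way to see what the buildpack will find, assuming the artifact is called talend-agent.jar (the name is a placeholder), is to print the jar's manifest:

unzip -p talend-agent.jar META-INF/MANIFEST.MF

If there is no Main-Class entry (or Spring Boot's Main-Class/Start-Class pair) and no embedded Tomcat or WEB-INF structure, the buildpack has no recognized entry point to execute, which matches the error above.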

How to deploy a self-contained web app in a single instance without downtime?

I'm currently experimenting with building a self-contained web app that runs with an embedded server. Running the app is simply a matter of executing
java -Dserver.port=80 -jar application.jar
The thing is, starting the application, from executing the line above until it can actually listen for incoming requests, takes a lot of time. The sample above uses Spring Boot, but it could be another library or another language.
Given that, deploying by killing the current server PID, symlinking the new artifact, and then running the new artifact will cause a brief downtime.
Another constraint is that I can only have one instance at a time, so provisioning a new instance, deploying the new artifact to it, and then switching instances at the load balancer is out of the question.
So, how do I do this?

How to create a Spark Streaming jar that would work in AWS EMR?

I've been developing a Spark Streaming application with Eclipse, and I'm using sbt to run it locally.
Now I want to deploy the application on AWS using a jar, but when I use sbt's package command it creates a jar without any of the dependencies, so when I upload it to AWS it won't work because Scala is missing.
Is there a way to create an uber-jar with sbt? Am I doing something wrong with the deployment of Spark on AWS?
For creating an uber-jar with sbt, use the sbt-assembly plugin; for more details about creating an uber-jar with sbt-assembly, refer to the blog post.
After creating it, you can run the assembly jar with the java -jar command.
But from Spark 1.0.0 onwards, the spark-submit script in Spark's bin directory is used to launch applications on a cluster; for more details refer here. A minimal setup is sketched below.
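A minimal sketch of the sbt-assembly setup, assuming an sbt 0.13-era project; the plugin version, Spark version, artifact names, and main class are placeholders to adapt to your build:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt -- mark Spark itself as "provided" so it is not bundled into the uber-jar
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided"

Then build and submit to a Spark 1.x YARN cluster (as on EMR):

sbt assembly
spark-submit --class com.example.StreamingApp --master yarn-cluster target/scala-2.10/my-app-assembly-1.0.jar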
You should really be following Running Spark on EC2, which reads:
The spark-ec2 script, located in Spark’s ec2 directory, allows you to
launch, manage and shut down Spark clusters on Amazon EC2. It
automatically sets up Spark, Shark and HDFS on the cluster for you.
This guide describes how to use spark-ec2 to launch clusters, how to
run jobs on them, and how to shut them down. It assumes you’ve already
signed up for an EC2 account on the Amazon Web Services site.
I've only partially followed the document so I can't comment on how well it's written.
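For example, a launch command looks roughly like this (the key pair, identity file, slave count, and cluster name are placeholders):

./spark-ec2 -k my-keypair -i my-keypair.pem -s 2 launch my-spark-cluster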
Moreover, according to Shipping Code to the Cluster chapter in the other document:
The recommended way to ship your code to the cluster is to pass it
through SparkContext’s constructor, which takes a list of JAR files
(Java/Scala) or .egg and .zip libraries (Python) to disseminate to
worker nodes. You can also dynamically add new files to be sent to
executors with SparkContext.addJar and addFile.
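To make that concrete, a minimal Scala sketch: the jar list can be supplied through the SparkConf that the SparkContext constructor receives, or added afterwards with addJar/addFile (the app name, jar paths, and file path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object SubmitSketch {
  def main(args: Array[String]): Unit = {
    // Ship the assembly jar to the workers via the SparkConf used by the SparkContext.
    val conf = new SparkConf()
      .setAppName("streaming-app")
      .setJars(Seq("target/scala-2.10/my-app-assembly-1.0.jar"))
    val sc = new SparkContext(conf)

    // Jars and files can also be added after the context is created.
    sc.addJar("lib/extra-dependency.jar")
    sc.addFile("config/lookup.csv")

    sc.stop()
  }
}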

Deploy a JEE6 web application in GlassFish v3.1.1

What needs to be taken care of to deploy a web application (WAR) in GlassFish v3.1.1 (glassfish-3.1.1-web-windows.exe installer)? The application is developed using NetBeans 7.0.1, and I am using a PostgreSQL database. The development machine and the production machine are different and are not connected to each other. Are there any detailed step-by-step instructions?
It all depends on what resources your application would need to run successfully on the application server.
e.g. If your application uses container-managed persistence, then you have to make sure that you create the required JDBC connection pool and JDBC resource on the server before you deploy your application. If you check the persistence.xml file, you will see whether your application uses a jta-data-source (the value provided there is actually the JNDI name of the JDBC resource created on the server). You might also have to supply the required JDBC driver to the server if it is not packaged within the application; a sketch of the steps is below.
What you can do is install the same application server on your local machine, deploy the application there, and see if it fails. If it fails, you can check the stack trace to find out the reason for the failure.
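As a concrete sketch, assuming container-managed persistence against PostgreSQL (the pool name, JNDI name, and credentials are placeholders): copy the PostgreSQL JDBC driver jar into the domain's lib directory (glassfish/domains/domain1/lib by default), then create the pool and resource with asadmin:

asadmin create-jdbc-connection-pool --datasourceclassname org.postgresql.ds.PGSimpleDataSource --restype javax.sql.DataSource --property user=appuser:password=secret:databaseName=appdb:serverName=localhost:portNumber=5432 PostgresPool
asadmin create-jdbc-resource --connectionpoolid PostgresPool jdbc/appDS

The jta-data-source element in persistence.xml must then reference the same JNDI name, e.g. <jta-data-source>jdbc/appDS</jta-data-source>, and the same commands have to be repeated on the production server since the two machines are not connected.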