spring cloud stream app starter File Source to Spring Batch Cloud Task - spring-batch

I have a Spring Batch Boot app which takes a flat file as input. I converted the app into a cloud task and deployed it to the Spring Cloud Data Flow local server. Next, I created a stream starting with File Source -> tasklaunchrequest-transform -> task-launcher-local, which starts my batch cloud task app.
It looks like the file does not make it into the batch app, and I do not see anything in the logs to indicate that it does.
I checked the docs at https://github.com/spring-cloud-stream-app-starters/tasklaunchrequest-transform/tree/master/spring-cloud-starter-stream-processor-tasklaunchrequest-transform
It says
Any input type. (payload and header are discarded)
My question is: how do I pass the file as the payload from the File Source to the batch app? This seems like a very basic feature.
Any help is very much appreciated.

You'll need to write your own transformer that takes the data from the source and packages it up so your task can consume it.
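For example, a minimal sketch of such a processor (not the exact starter code): it receives the file reference from the File Source (running in ref mode) and wraps it in a TaskLaunchRequest for the task-launcher-local sink. The task URI, the argument name, and the five-argument TaskLaunchRequest constructor are assumptions; check them against the spring-cloud-task version you use.

import java.util.Collections;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.cloud.task.launcher.TaskLaunchRequest;
import org.springframework.integration.annotation.Transformer;

@EnableBinding(Processor.class)
public class FileToTaskLaunchRequestProcessor {

    // Receives the file location from the File Source (mode=ref) and packages it
    // as a command-line argument of a TaskLaunchRequest for task-launcher-local.
    @Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
    public TaskLaunchRequest toTaskLaunchRequest(String filePath) {
        return new TaskLaunchRequest(
                "maven://io.example:my-batch-task:1.0.0",                  // task artifact URI (placeholder)
                Collections.singletonList("input.file.name=" + filePath), // passed on to the batch job
                null,   // environment properties
                null,   // deployment properties
                null);  // application name
    }
}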


Spring batch integration using OutBoundGateway and ReplyingKafkaTemplate

My Goal
I need to read a file, turn each line into a message, and send those messages to Kafka from a Spring Batch project; another Spring Integration project will receive the messages and process them asynchronously. After processing, I want to return those messages to the batch project and create 4 different files out of them.
I am trying to use an outbound gateway and ReplyingKafkaTemplate here, but I am unable to configure it properly. Is there an example or reference guide for configuring this?
I have checked the Spring Batch integration samples GitHub repository; there is no sample for an outbound gateway or ReplyingKafkaTemplate.
Thanks in advance.
For ReplyingKafkaTemplate logic in Spring Integration there is a dedicated KafkaProducerMessageHandler, which can be configured with a ReplyingKafkaTemplate.
See the docs for more info:
https://docs.spring.io/spring-integration/docs/current/reference/html/kafka.html#kafka-outbound-gateway
And more about ReplyingKafkaTemplate:
https://docs.spring.io/spring-kafka/reference/html/#replying-template
On the other side, a KafkaInboundGateway probably has to be configured, respectively:
https://docs.spring.io/spring-integration/docs/current/reference/html/kafka.html#kafka-inbound-gateway
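Roughly along the lines of the configuration shown in those docs, a sketch of the outbound gateway side; the topic name and channel names are placeholders, and the producer factory and reply listener container are assumed to be configured elsewhere (e.g. by Spring Boot):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.expression.common.LiteralExpression;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.kafka.outbound.KafkaProducerMessageHandler;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.requestreply.ReplyingKafkaTemplate;

@Configuration
public class KafkaOutboundGatewayConfig {

    // Request-reply template: sends to the request topic and correlates the replies
    // received by the supplied listener container (listening on the reply topic).
    @Bean
    public ReplyingKafkaTemplate<String, String, String> replyingKafkaTemplate(
            ProducerFactory<String, String> producerFactory,
            ConcurrentMessageListenerContainer<String, String> repliesContainer) {
        return new ReplyingKafkaTemplate<>(producerFactory, repliesContainer);
    }

    // Outbound gateway: messages arriving on "kafkaRequests" are sent to Kafka, and the
    // correlated reply is passed on to "kafkaReplies".
    @Bean
    @ServiceActivator(inputChannel = "kafkaRequests", outputChannel = "kafkaReplies")
    public KafkaProducerMessageHandler<String, String> kafkaOutboundGateway(
            ReplyingKafkaTemplate<String, String, String> replyingKafkaTemplate) {
        KafkaProducerMessageHandler<String, String> handler =
                new KafkaProducerMessageHandler<>(replyingKafkaTemplate);
        handler.setTopicExpression(new LiteralExpression("requestTopic"));
        return handler;
    }
}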

How can I trigger composed task runner through processor application in spring cloud data flow stream?

I have a requirement where I have to trigger the composed task runner (the decision logic is written in a processor). How can I do this through a stream?
Requirement :
I have to poll a particular directory. Whenever there are two files present in that directory, my processor decides whether the composed task runner should be launched or not. If yes, the composed task runner is launched with certain parameters and processes both files one by one.
Can anybody please help me in writing stream definition for this scenario?
Currently I am trying to trigger it like below:
stream create example --definition "triggertask --triggertask.uri=file:///Users/batch/apps/timestamp-task-2.1.0.RELEASE.jar --trigger.fixed-delay=30 | trigger-task-processor | tasklaunchrequest-transform --graph='xyz-d1 && xyz-d2' --increment-instance-enabled=true --spring.datasource.url=... --composed-task-arguments='some arguments' | taskLauncher"
where triggertask is the trigger task source,
trigger-task-processor is a processor containing the business logic for the trigger event,
tasklaunchrequest-transform is a processor (a custom implementation around the composed task runner), and
taskLauncher is the task-launcher-local sink (Rabbit).
I believe you can simplify your design by using the tasklauncher-dataflow sink application, which acts as a REST client to the SCDF server and issues the task launch request. You can launch your composed task this way.
In your case, you can have something like this:
file | tasklauncher-dataflow
The file source is an out-of-the-box app, and you can customize it based on your needs to send its output to the tasklauncher-dataflow sink.
You can find some of the references related to it as follows:
https://content.pivotal.io/blog/need-24x7-etl-then-move-to-cloud-native-file-ingest-with-spring-cloud-data-flow
https://dataflow.spring.io/docs/recipes/file-ingest/sftp-to-jdbc/
https://github.com/spring-cloud-stream-app-starters/tasklauncher-dataflow/tree/master/spring-cloud-starter-stream-sink-task-launcher-dataflow
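For the "decide whether to launch" part, your processor would emit the launch-request payload that the tasklauncher-dataflow sink forwards to SCDF. A rough sketch follows; the field names (name, args, deploymentProps) and the composed task name are assumptions based on the file-ingest recipe above, so verify them against the version of the sink you use.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.integration.annotation.Transformer;

@EnableBinding(Processor.class)
public class ComposedTaskLaunchRequestProcessor {

    // Builds a launch request for the tasklauncher-dataflow sink.
    // "my-composed-task" must already be registered as a (composed) task definition in SCDF.
    @Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
    public Map<String, Object> toLaunchRequest(String filePath) {
        Map<String, Object> request = new HashMap<>();
        request.put("name", "my-composed-task");                                    // assumed field name
        request.put("args", Collections.singletonList("--input.file=" + filePath)); // assumed field name
        request.put("deploymentProps", Collections.emptyMap());                     // assumed field name
        return request;
    }
}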

Is it possible to use "Custom Sources and Sinks" to write/append file during Dataflow pipeline execution?

My program relies on local system storage to write a file that is generated by the program itself, hence I am executing the job in DirectPipelineRunner mode. Below is the flow:
One of my functions makes multiple REST API requests and creates/appends to a file (Output.txt) in local system storage.
Pipeline: a) upload the generated file to GCS, b) read the file from GCS, c) perform a transformation, d) write to BigQuery.
Since my program writes/appends the API responses to local system storage, I'm executing the pipeline in DirectPipelineRunner mode.
Is it possible to have temporary space in the cloud to remove the dependency on the local file system, so that I can execute the pipeline in DataflowPipelineRunner mode?
I guess Custom Sources and Sinks can be used here. Can someone shed some light on this problem statement?
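For reference, the GCS-based part of that flow (steps b, c, d) has no local-file dependency at all. A minimal sketch using the (newer) Apache Beam Java SDK follows; the bucket, table, and field names are placeholders, and the transform is just a stand-in for the real one:

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class GcsToBigQueryPipeline {

    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        TableSchema schema = new TableSchema().setFields(Collections.singletonList(
                new TableFieldSchema().setName("line").setType("STRING")));

        pipeline
                // b) read the previously uploaded file from GCS instead of the local file system
                .apply("ReadFromGcs", TextIO.read().from("gs://my-bucket/Output.txt"))
                // c) transformation: wrap each line in a BigQuery row (placeholder logic)
                .apply("ToTableRow", ParDo.of(new DoFn<String, TableRow>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) {
                        c.output(new TableRow().set("line", c.element()));
                    }
                }))
                // d) write to BigQuery
                .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
                        .to("my-project:my_dataset.my_table")
                        .withSchema(schema)
                        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

        pipeline.run();
    }
}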

when using the spring cloud data flow sftp source starter app file_name header is not found

The Spring Cloud Data Flow SFTP source starter app states that the file name should be in the headers (mode=contents). However, when I connect this source to a log sink, I see a few headers (like Content-Type) but not the file_name header. I want to use this header to upload the file to S3 with the same name.
spring server: Spring Cloud Data Flow Local Server (v1.2.3.RELEASE)
my apps are all imported from here
stream definition:
stream create --definition "sftp --remote-dir=/incoming --username=myuser --password=mypwd --host=myftp.company.io --mode=contents --filename-pattern=preloaded_file_2017_ --allow-unknown-keys=true | log" --name test_sftp_log
Configuring the log application with --expression=#root --level=debug doesn't make any difference. Also, when writing my own sink that tries to access the file_name header, I get an error saying that such a header does not exist.
Log snippets from the source and sink are in this gist.
Please follow the link below. You need to code your own source and populate such a header manually downstream, right after the FileReadingMessageSource, and only after that send the message with the content and the appropriate header to the target destination.
https://github.com/spring-cloud-stream-app-starters/file/issues/9
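A rough sketch of that idea as a custom source, assuming a recent Spring Integration Java DSL: the file name is copied into a file_name header while the payload is still a File, so it survives after the content is extracted. The directory, pattern, and poller settings are placeholders, and for SFTP the local file adapter would be replaced with the SFTP inbound adapter.

import java.io.File;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Source;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.file.dsl.Files;
import org.springframework.integration.file.transformer.FileToByteArrayTransformer;

@Configuration
@EnableBinding(Source.class)
public class FileWithNameHeaderSource {

    @Bean
    public IntegrationFlow fileSourceFlow() {
        return IntegrationFlows
                // Poll the local directory -- the payload is a java.io.File at this point.
                .from(Files.inboundAdapter(new File("/incoming"))
                                .patternFilter("preloaded_file_2017_*"),
                        e -> e.poller(Pollers.fixedDelay(5000)))
                // Capture the file name while the payload is still a File.
                .enrichHeaders(h -> h.headerExpression("file_name", "payload.name"))
                // Switch to "contents" mode: the payload becomes byte[] but the header is kept.
                .transform(new FileToByteArrayTransformer())
                .channel(Source.OUTPUT)
                .get();
    }
}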

Configuring Spring Batch jobs via database instead of xml

I am new to Spring Batch and need a few clarifications regarding Spring Batch Admin.
Can I keep the job configuration information in a database instead of uploading XML-based configuration files?