Error on creating Spring embedded Kafka instance - apache-kafka

This is related to the embedded Kafka server provided by Spring. On running my test case, the embedded Kafka instantiation (as shown below) fails.
public static KafkaEmbedded embeddedKafka = new KafkaEmbedded(1, true, KAFKA_TOPIC);
Instantiation fails with the following error:
1> C:\Users\r2dev\AppData\Local\Temp\kafka-1587343850239557903\version-2\log.1: The process cannot access the file because it is being used by another process
2> C:\Users\r2dev\AppData\Local\Temp\kafka-7315008084340411800.lock: The process cannot access the file because it is being used by another process.
I am using Spring Kafka version "1.2.0.RELEASE" and Java 8.
Has anyone faced this issue and been able to fix it? Please let me know.
Thanks in advance.
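For reference, the typical JUnit 4 usage declares the broker as a @ClassRule so that spring-kafka-test starts and stops it around the test class. A minimal sketch (the class name and topic constant are placeholders, not from the original post):

import org.junit.ClassRule;
import org.springframework.kafka.test.rule.KafkaEmbedded;

public class MyKafkaIntegrationTest {

    private static final String KAFKA_TOPIC = "test.topic"; // placeholder topic name

    // As a @ClassRule, JUnit starts the broker before the test class and
    // shuts it down afterwards, releasing the temp-dir log and lock files
    // that otherwise remain held by a still-running broker process.
    @ClassRule
    public static KafkaEmbedded embeddedKafka = new KafkaEmbedded(1, true, KAFKA_TOPIC);
}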

Related

Spring batch integration using OutBoundGateway and ReplyingKafkaTemplate

My Goal
I need to read a file, turn each line into a message, and send it to Kafka from a Spring Batch project; another Spring Integration project will receive the messages and process them asynchronously. After processing, I want to return those messages to the batch project and create 4 different files out of them.
Here I am trying to use an outbound gateway and ReplyingKafkaTemplate. I am unable to configure it properly... Is there any example or reference guide for configuring it?
I have checked the Spring Batch integration samples GitHub repository... There is no sample for an outbound gateway or ReplyingKafkaTemplate.
Thanks in Advance.
For ReplyingKafkaTemplate logic in Spring Integration there is a dedicated KafkaProducerMessageHandler which can be configured with a ReplyingKafkaTemplate.
See more info in docs:
https://docs.spring.io/spring-integration/docs/current/reference/html/kafka.html#kafka-outbound-gateway
And more about ReplyingKafkaTemplate:
https://docs.spring.io/spring-kafka/reference/html/#replying-template
Correspondingly, a KafkaInboundGateway probably must be configured on the other side:
https://docs.spring.io/spring-integration/docs/current/reference/html/kafka.html#kafka-inbound-gateway
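A minimal configuration sketch along the lines of those docs (the "requests"/"replies" topic names, the String types, and the auto-configured ConcurrentKafkaListenerContainerFactory bean are assumptions, not from the original question):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.expression.common.LiteralExpression;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.kafka.outbound.KafkaProducerMessageHandler;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.requestreply.ReplyingKafkaTemplate;
import org.springframework.messaging.MessageHandler;

@Configuration
public class KafkaGatewayConfig {

    // Listener container for the (assumed) "replies" topic.
    @Bean
    public ConcurrentMessageListenerContainer<String, String> repliesContainer(
            ConcurrentKafkaListenerContainerFactory<String, String> factory) {
        ConcurrentMessageListenerContainer<String, String> container =
                factory.createContainer("replies");
        container.getContainerProperties().setGroupId("repliesGroup");
        container.setAutoStartup(false);
        return container;
    }

    @Bean
    public ReplyingKafkaTemplate<String, String, String> replyingTemplate(
            ProducerFactory<String, String> pf,
            ConcurrentMessageListenerContainer<String, String> repliesContainer) {
        return new ReplyingKafkaTemplate<>(pf, repliesContainer);
    }

    // Outbound gateway: publishes to the (assumed) "requests" topic and
    // correlates the reply from the replies container.
    @Bean
    @ServiceActivator(inputChannel = "kafkaRequests")
    public MessageHandler outboundGateway(
            ReplyingKafkaTemplate<String, String, String> replyingTemplate) {
        KafkaProducerMessageHandler<String, String> handler =
                new KafkaProducerMessageHandler<>(replyingTemplate);
        handler.setTopicExpression(new LiteralExpression("requests"));
        return handler;
    }
}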

Spring Cloud Data Flow Custom Scala Processor unable to send/receive data from Starter Apps (SCDF 2.5.1 & Spring Boot 2.2.6)

I have been working on creating a simple custom processor in Scala for Spring Cloud Data Flow and have been running into issues with sending/receiving data from/to starter applications. I have been unable to see any messages propagating through the stream. The definition of the stream is time --trigger.time-unit=SECONDS | pass-through-log | log where pass-through-log is my custom processor.
I am using Spring Cloud Data Flow 2.5.1 and Spring Boot 2.2.6.
Here is the code used for the processor - I am using the functional model.
import java.util.function.Function

import org.springframework.boot.SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.context.annotation.Bean

@SpringBootApplication
class PassThroughLog {
  @Bean
  def passthroughlog(): Function[String, String] = {
    input: String => {
      println(s"Received input `$input`")
      input
    }
  }
}

object PassThroughLog {
  def main(args: Array[String]): Unit = SpringApplication.run(classOf[PassThroughLog], args: _*)
}
application.yml
spring:
  cloud:
    stream:
      function:
        bindings:
          passthroughlog-in-0: input
          passthroughlog-out-0: output
build.gradle.kts
// scala
implementation("org.scala-lang:scala-library:2.12.10")
// spring
implementation(platform("org.springframework.cloud:spring-cloud-dependencies:Hoxton.SR5"))
implementation(platform("org.springframework.cloud:spring-cloud-stream-dependencies:Horsham.SR5"))
implementation("org.springframework.boot:spring-boot-starter")
implementation("org.springframework.boot:spring-boot-starter-actuator")
implementation("org.springframework.cloud:spring-cloud-starter-function-web:3.0.7.RELEASE")
implementation("org.springframework.cloud:spring-cloud-starter-stream-kafka:3.0.5.RELEASE")
I have posted the entire project to GitHub in case the code samples here are lacking. I also posted the logs there, as they are quite long.
When I bootstrap a local Kafka cluster and push arbitrary data to the input topic, I am able to see data flowing through the processor. However, when I deploy the application on Spring Cloud Data Flow, this is not the case. I am deploying the app via Docker in Kubernetes.
Additionally, when I deploy a stream with the definition time --trigger.time-unit=SECONDS | log, I see messages in the log sink. This has convinced me the problem lies with the custom processor.
Am I missing something simple like a dependency or extra configuration? Any help is greatly appreciated.
When using Spring Cloud Stream 3.x version in SCDF, there's an additional property that you will have to set to let SCDF know what channel bindings are configured as input and output channels.
See: Functional Applications
Pay attention specifically to the following properties:
app.time-source.spring.cloud.stream.function.bindings.timeSupplier-out-0=output
app.log-sink.spring.cloud.stream.function.bindings.logConsumer-in-0=input
In your case, you will have to map passthroughlog-in-0 and passthroughlog-out-0 function bindings to input and output respectively.
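For this stream, that would look like the following deployment properties (assuming the processor is registered in SCDF as pass-through-log, matching the stream definition above):
app.pass-through-log.spring.cloud.stream.function.bindings.passthroughlog-in-0=input
app.pass-through-log.spring.cloud.stream.function.bindings.passthroughlog-out-0=output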
Turns out the problem was with my Dockerfile. For ease of configuration, I had a build argument to specify the jar file used in the ENTRYPOINT. To accomplish this, I used the shell form of ENTRYPOINT. Changing my ENTRYPOINT to the exec form solved my issue.
The shell form of ENTRYPOINT does not play well with image arguments (docker run <image> <args>), and hence SCDF could not pass the appropriate arguments to the container.
Changing my Dockerfile from:
FROM openjdk:11.0.5-jdk-slim as build
ARG JAR
ENV JAR $JAR
ADD build/libs/$JAR .
ENTRYPOINT java -jar $JAR
to:
FROM openjdk:11.0.5-jdk-slim as build
ARG JAR
ADD build/libs/$JAR program.jar
ENTRYPOINT ["java", "-jar", "program.jar"]
fixed the problem.

Spring Cloud Stream Rabbit Binder Routing Key always '#'

Version: Spring Boot: 1.4.2.RELEASE
Spring Cloud Deps: Brixton.SR7
Here is my application.properties of a processor app.
logging.level.=DEBUG
server.port=0
logging.file=traveller-events-processor.log
spring.cloud.stream.rabbit.bindings.input.consumer.bindingRoutingKey='aa'
spring.cloud.stream.rabbit.bindings.input.consumer.bindingRoutingKey=aa
spring.cloud.stream.rabbit.bindings.input.consumer.bindQueue=true
spring.cloud.stream.rabbit.bindings.input.consumer.routing-key='aa'
spring.cloud.stream.rabbit.bindings.input.consumer.routingKey='aa'
spring.cloud.stream.bindings.input.destination=events-exchange
spring.cloud.stream.bindings.input.group=eventconsumersgroup
spring.cloud.stream.bindings.output.destination=work.out
spring.cloud.stream.bindings.output.contentType=text/plain
spring.cloud.stream.bindings.output.binder=rabbit
spring.cloud.stream.bindings.output.group=traveller-events-output-group
When I start this app, events-exchange is created as expected and bound to a queue named events-exchange.eventconsumersgroup (which is also OK). But the routing key is always '#'. I've tried all the options I have fished out of various documentation. Am I missing something here?
I want this app to only subscribe to certain messages (which I want to achieve via the routing key).
I see that Brixton.SR7 uses Spring Cloud Stream 1.0.2.RELEASE, and I don't seem to find routingKey as a Rabbit consumer property there. Could you upgrade to the Spring Cloud Camden release, or the latest one, so that you can try using the consumer property bindingRoutingKey, as mentioned here?
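After upgrading, the property would be set like this (unquoted; the 'aa' value is the routing key from the question):
spring.cloud.stream.rabbit.bindings.input.consumer.bindingRoutingKey=aa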

IBM Integration Bus: The PIF data could not be found for the specified application

I'm using IBM Integration Bus v10 (previously called IBM Message Broker) to expose COBOL routines as SOAP Web Services.
COBOL routines are integrated into IIB through MQ queues.
We have imported some COBOL copybooks as DFDL schemas in IIB, and the mapping between SOAP messages and DFDL messages is working fine.
However, when the message reaches a node where a serialization of the message tree has to take place (for example, a FileOutput node or an MQ request), it fails with the following error:
"The PIF data could not be found for the specified application"
This is the last part of the stack trace of the exception:
RecoverableException
  File:CHARACTER:F:\build\slot1\S000_P\src\DataFlowEngine\TemplateNodes\ImbOutputTemplateNode.cpp
  Line:INTEGER:303
  Function:CHARACTER:ImbOutputTemplateNode::processMessageAssemblyToFailure
  Type:CHARACTER:ComIbmFileOutputNode
  Name:CHARACTER:MyCustomFlow#FCMComposite_1_5
  Label:CHARACTER:MyCustomFlow.File Output
  Catalog:CHARACTER:BIPmsgs
  Severity:INTEGER:3
  Number:INTEGER:2230
  Text:CHARACTER:Caught exception and rethrowing
  Insert
    Type:INTEGER:14
    Text:CHARACTER:Kcilmw20Flow.File Output
  ParserException
    File:CHARACTER:F:\build\slot1\S000_P\src\MTI\MTIforBroker\DfdlParser\ImbDFDLWriter.cpp
    Line:INTEGER:315
    Function:CHARACTER:ImbDFDLWriter::getDFDLSerializer
    Type:CHARACTER:ComIbmSOAPInputNode
    Name:CHARACTER:MyCustomFlow#FCMComposite_1_7
    Label:CHARACTER:MyCustomFlow.SOAP Input
    Catalog:CHARACTER:BIPmsgs
    Severity:INTEGER:3
    Number:INTEGER:5828
    Text:CHARACTER:The PIF data could not be found for the specified application
    Insert
      Type:INTEGER:5
      Text:CHARACTER:MyCustomProject
It seems like something is missing in my deployable BAR file. It's important to say that my application has the message flow and it depends on a shared library that has all the .xsd files (DFDLs).
I suppose that the schemas are OK, as I've generated them using the Toolkit wizard, and the message parsing works well. The problem is only with serialization.
Does anybody know what may be missing here?
OutputRoot.Properties.MessageType must contain the name of the message in the DFDL schema. Additionally when the DFDL schema is in a shared library, OutputRoot.Properties.MessageSet must contain the name of the library.
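In a Compute node, that looks something like the following ESQL sketch (the library and message names are hypothetical placeholders, not taken from the question):
-- 'MySharedLibrary' and 'myDFDLMessage' are placeholder names
SET OutputRoot.Properties.MessageSet = 'MySharedLibrary';
SET OutputRoot.Properties.MessageType = 'myDFDLMessage';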
Sounds as if OutputRoot.Properties is not pointing at the shared library. I cannot remember which subfield does that job - it is either OutputRoot.Properties.MessageType or OutputRoot.Properties.MessageSet.
You can easily check - just check the contents of InputRoot.Properties after an input node that has used the same shared library.
Faced a similar problem. In my case, a message flow with an HttpRequest node using a DFDL domain parser / format to parse an HTTP response from the remote system threw this error (PIF data could not be found for the specified application). "Re-selecting" the same parser domain & message type on the node followed by build / redeploy solved the problem. Seemed to be a project reference related issue within the IIB toolkit.
You need to create static libraries and reference them from the application.
In the Compute node, your coding is based on the DFDL body.

Triggering Spark jobs with REST

I have lately been trying out Apache Spark. My question is more specifically about triggering Spark jobs. Here I had posted a question on understanding Spark jobs. After getting my hands dirty with jobs, I moved on to my requirement.
I have a REST endpoint where I expose an API to trigger jobs; I have used Spring 4.0 for the REST implementation. Going ahead, I thought of implementing Jobs as a Service in Spring, where I would submit jobs programmatically: when the endpoint is triggered, I would trigger the job with the given parameters.
I now have a few design options.
Similar to the job written below, I need to maintain several jobs called by an abstract class, maybe a JobScheduler.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

/* Can this code be abstracted from the application and written as
   a separate job? My understanding is that the application code
   itself has to have the addJars embedded, which the SparkContext
   internally takes care of. */
SparkConf sparkConf = new SparkConf().setAppName("MyApp")
        .setJars(new String[] { "/path/to/jar/submit/cluster" })
        .setMaster("/url/of/master/node");
sparkConf.setSparkHome("/path/to/spark/");
sparkConf.set("spark.scheduler.mode", "FAIR");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
sc.setLocalProperty("spark.scheduler.pool", "test");
// Application with algorithm, transformations
Extending the above point: have multiple versions of jobs handled by the service.
Or else use a Spark Job Server to do this.
Firstly, I would like to know what the best solution is in this case, execution-wise and also scaling-wise.
Note: I am using a standalone Spark cluster.
Kindly help.
It turns out Spark has a hidden REST API to submit a job, check status, and kill it.
Check out full example here: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
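For illustration, a minimal submission sketch against the standalone master's REST port (6066). The host, jar path, class name, and Spark version below are assumptions; the JSON fields follow the blog post above. The sketch uses java.net.http and text blocks, so it needs Java 15+:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SparkRestSubmit {
    public static void main(String[] args) throws Exception {
        // CreateSubmissionRequest payload for the standalone master's REST
        // server. All hosts, paths, and names below are placeholders.
        String json = """
                {
                  "action": "CreateSubmissionRequest",
                  "appResource": "file:/path/to/my-app.jar",
                  "mainClass": "com.example.MyApp",
                  "appArgs": ["arg1"],
                  "clientSparkVersion": "1.5.0",
                  "environmentVariables": {"SPARK_ENV_LOADED": "1"},
                  "sparkProperties": {
                    "spark.app.name": "MyApp",
                    "spark.master": "spark://master-host:6066",
                    "spark.jars": "file:/path/to/my-app.jar"
                  }
                }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://master-host:6066/v1/submissions/create"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        // On success, the response JSON carries a submissionId that can be
        // used to poll /v1/submissions/status/<submissionId>.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}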
Just use the Spark JobServer
https://github.com/spark-jobserver/spark-jobserver
There are a lot of things to consider with making a service, and the Spark JobServer has most of them covered already. If you find things that aren't good enough, it should be easy to make a request and add code to their system rather than reinventing it from scratch.
Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.
Here is a good client that you might find helpful: https://github.com/ywilkof/spark-jobs-rest-client
Edit: this answer was given in 2015. There are options like Livy available now.
I had this requirement too, and I was able to do it using the Livy server, as one of the contributors (Josemy) mentioned. Following are the steps I took; hope it helps somebody:
Download livy zip from https://livy.apache.org/download/
Follow instructions: https://livy.apache.org/get-started/
Upload the zip to a client.
Unzip the file
Check for the following two parameters; if they don't exist, create them with the right paths:
export SPARK_HOME=/opt/spark
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
Enable port 8998 on the client
Update $LIVY_HOME/conf/livy.conf with the master details and any other stuff needed
Note: Templates are there in $LIVY_HOME/conf
E.g. livy.file.local-dir-whitelist = /home/folder-where-the-jar-will-be-kept/
Run the server
$LIVY_HOME/bin/livy-server start
Stop the server
$LIVY_HOME/bin/livy-server stop
UI: <client-ip>:8998/ui/
Submitting a job: POST : http://<your client ip goes here>:8998/batches
{
  "className": "<your class name goes here with package name>",
  "file": "your jar location",
  "args": ["arg1", "arg2", "arg3"]
}
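Once submitted, the batch state can be polled via the Livy REST API:
Checking job status: GET : http://<your client ip goes here>:8998/batches/<batch-id>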