Here is my business case: I receive a JMS message that contains a unique database ID, based on which I need to retrieve details from the DB. Here are the 2 options I'm exploring:
a) Write a POJO JMS receiver, then start the batch job. The challenge I'm facing is how to pass the JMS message to the ItemReader. The JobExecutionContext is not thread safe.
b) Write the JMS receiver as an ItemReader - is that possible?
Note: I don't plan to use Spring Integration.
For option a): You don't need the job execution context. Your JMS listener can start the job with the ID as a job parameter, rather than as an attribute in the job execution context.
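For example, a minimal sketch of that approach (the queue name, parameter names and the detailsJob bean are illustrative, not from the question):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;

@Component
public class JobLaunchingJmsListener {

    private final JobLauncher jobLauncher;
    private final Job detailsJob;

    public JobLaunchingJmsListener(JobLauncher jobLauncher, Job detailsJob) {
        this.jobLauncher = jobLauncher;
        this.detailsJob = detailsJob;
    }

    @JmsListener(destination = "details.queue")
    public void onMessage(String databaseId) throws Exception {
        // Pass the ID as a job parameter instead of an execution-context attribute
        JobParameters params = new JobParametersBuilder()
                .addString("databaseId", databaseId)
                .addLong("launchTime", System.currentTimeMillis()) // keeps each job instance unique
                .toJobParameters();
        jobLauncher.run(detailsJob, params);
    }
}

A step-scoped ItemReader can then pick the ID up with @Value("#{jobParameters['databaseId']}").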
For option b): Yes, that's possible; you can use the JmsItemReader for that.
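A rough sketch of that option, assuming a JmsTemplate pointed at the incoming queue (the queue name is illustrative; a finite receive timeout is needed so the reader eventually returns null and ends the step):

import javax.jms.ConnectionFactory;
import org.springframework.batch.item.jms.JmsItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.core.JmsTemplate;

@Configuration
public class ReaderConfig {

    @Bean
    public JmsItemReader<String> jmsItemReader(ConnectionFactory connectionFactory) {
        JmsTemplate jmsTemplate = new JmsTemplate(connectionFactory);
        jmsTemplate.setDefaultDestinationName("details.queue"); // illustrative queue name
        jmsTemplate.setReceiveTimeout(1000L); // finite timeout signals "no more input"

        JmsItemReader<String> reader = new JmsItemReader<>();
        reader.setJmsTemplate(jmsTemplate);
        reader.setItemType(String.class);
        return reader;
    }
}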
Hope this helps.
I have many transactional consumers with a ChainedKafkaTransactionManager based on a JpaTransactionManager and a KafkaTransactionManager (all @KafkaListener methods).
The JPA one needs a ThreadLocal variable to be set so that it knows which DB to connect to (the tenant id).
When the application starts, spring-kafka tries to create a chained transaction in the onPartitionsAssigned listener, and hence a JPA transaction, but no tenant is set yet, so it fails.
The tenant is set through an HTTP filter and/or Kafka interceptors (via event headers).
I tried using the auto-wired KafkaListenerEndpointRegistry with setAutoStartup(false), but I see that the consumers don't receive any events, probably because they aren't initialized yet (I thought they were initialized on-demand).
If I set a mock tenant id and call registry.start() when the application is ready, the initializations seem to be done in other threads (probably because I'm using a ConcurrentKafkaListenerContainerFactory), so it doesn't work.
Is there a way to avoid the JPA transaction on that initial onPartitionsAssigned listener, that is part of the consumer initialization?
If your chained TM has the KafkaTM first, followed by the JPA TM (which would be the normal case), you can achieve similar functionality by just injecting the Kafka TM into the container and using @Transactional (with just the JPA TM) on the listener to start the JPA transaction when the listener is called.
The time between the transaction commits will be marginally increased but it would provide similar functionality.
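A rough sketch of that arrangement, assuming a KafkaTransactionManager bean and a JPA transaction manager bean named jpaTransactionManager (topic and bean names are illustrative):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.transaction.KafkaTransactionManager;
import org.springframework.transaction.annotation.Transactional;

// Only the Kafka TM is wired into the container, so no JPA transaction is
// started during consumer initialization (onPartitionsAssigned).
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory,
        KafkaTransactionManager<String, String> kafkaTransactionManager) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.getContainerProperties().setTransactionManager(kafkaTransactionManager);
    return factory;
}

// The JPA transaction only starts when the listener is invoked, i.e. after the
// interceptor has had a chance to set the tenant ThreadLocal from the headers.
@KafkaListener(topics = "my-topic")
@Transactional(transactionManager = "jpaTransactionManager")
public void listen(ConsumerRecord<String, String> record) {
    // JPA work here runs in its own transaction, committed before the Kafka transaction
}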
If that won't work for you, open a GitHub issue; we can either disable the initial commit on assignment, or do it without a transaction at all (optionally).
I have an issue where I want to return my data to the queue when my service is down, after reading a batch of data. If I understood correctly, in AMQP I can use acknowledgements, but in the Spring Batch documentation I don't see any information about that. I also checked the source code of AmqpItemReader and I don't see any acknowledgement flow. Do I need to implement a custom ItemReader with this flow, or have I missed something?
The AmqpItemReader uses a simple RabbitTemplate.receive() operation which acks the message immediately, unless it is running in a transaction.
The only way to control the acks is to use transactions (with a RabbitTransactionManager).
The transaction manager will ack or requeue the message when the transaction is committed or rolled back, respectively.
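A rough sketch of that setup, using Spring Batch 4 style builders (MyItem, the queue name and bean names are placeholders):

import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.amqp.rabbit.transaction.RabbitTransactionManager;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.amqp.AmqpItemReader;
import org.springframework.context.annotation.Bean;

@Bean
public RabbitTransactionManager rabbitTransactionManager(ConnectionFactory connectionFactory) {
    return new RabbitTransactionManager(connectionFactory);
}

@Bean
public AmqpItemReader<MyItem> amqpItemReader(ConnectionFactory connectionFactory) {
    RabbitTemplate template = new RabbitTemplate(connectionFactory);
    template.setDefaultReceiveQueue("my.queue"); // placeholder queue
    template.setChannelTransacted(true); // receive() now participates in the transaction
    return new AmqpItemReader<>(template);
}

@Bean
public Step step(StepBuilderFactory steps,
                 AmqpItemReader<MyItem> reader,
                 ItemWriter<MyItem> writer,
                 RabbitTransactionManager rabbitTransactionManager) {
    return steps.get("amqpStep")
            .<MyItem, MyItem>chunk(10)
            .reader(reader)
            .writer(writer)
            // chunk commit acks the messages; a rollback requeues them
            .transactionManager(rabbitTransactionManager)
            .build();
}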
I have an outbound channel adapter (in this case SFTP, but it would be the same for JMS or WS) at the end of a Spring Integration flow. Because direct channels are used, every time a message flows through, it is sent out synchronously.
Now, I need to process messages all the way until they reach the outbound adapter, but wait for a predetermined interval before sending them out. In other words, batching the send operation.
I know the Spring Batch project might offer a solution to this, but I need to find a solution with Spring Integration components (in the int-* namespaces).
What would be a typical pattern to achieve this?
The Aggregator pattern is for you.
In your particular case I'd call it more of a window, because you don't have any specific correlation to group messages by; you just need to build a batch, as you call it.
So, I think your Aggregator config may look like:
<int:aggregator input-channel="input" output-channel="output"
                correlation-strategy-expression="1"
                release-strategy-expression="size() == 10"
                expire-groups-upon-completion="true"
                send-partial-result-on-expiry="true"/>
correlation-strategy-expression="1" means that all incoming messages are grouped together (they all share the same correlation key).
release-strategy-expression="size() == 10" forms and releases batches of 10 messages.
expire-groups-upon-completion="true" tells the aggregator to remove the released group from its store. That allows a new group to be formed for the same correlation key (1 in our case).
send-partial-result-on-expiry="true" specifies that the normal release operation (sending to the output-channel) should also be performed when the group expires and there aren't enough messages to build a whole batch (size 10 in our case). For these options, please refer to the Aggregator documentation.
I am new to Spring Batch and I am having an issue implementing my business use case with it.
Basically, I am reading data from a database, i.e. a list of subscribers to a newsletter. I then need to send an email to each subscriber as well as insert data into the database in order to know which subscriber the email was sent to.
I use an ItemProcessor implementation whose process method returns a MimeMessage and takes a subscriber as an argument; the writer associated with this processor is of type: org.springframework.batch.item.mail.javamail.MimeMessageItemWriter.
The issue is that I need another writer for the database inserts (possibly using a CompositeItemWriter) that takes a List of subscribers as an argument and all I have as input is a MimeMessage from the above ItemProcessor.
Can anyone please help?
From what you've said, using the ItemProcessor interface to save the message to the database is conceptually not right. You need to use an ItemWriter for that. You can implement writing to the DB as one ItemWriter and sending the mail message as another ItemWriter, and use a CompositeItemWriter to combine them.
The Subscriber is passed to both item writers.
The transformation of a Subscriber into a MimeMessage is done internally by the second writer before it delegates to the MimeMessageItemWriter (which is aggregated by this writer).
Sending the message to the subscriber should be done after saving to the DB, as the DB write can be rolled back if something goes wrong while sending the message (if you need that functionality), and your chunk size should be 1 (otherwise a rollback would wrongly discard records for notifications that have already been sent successfully).
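A rough sketch of that arrangement, using the Spring Batch 4 write(List) signature; Subscriber is the question's type, while the DB writer, the mail-building code and the bean names are placeholders:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.mail.internet.MimeMessage;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.mail.javamail.MimeMessageItemWriter;
import org.springframework.batch.item.support.CompositeItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.mail.javamail.JavaMailSender;

// Second writer: converts each Subscriber to a MimeMessage and delegates to MimeMessageItemWriter.
public class SubscriberMailItemWriter implements ItemWriter<Subscriber> {

    private final MimeMessageItemWriter delegate;
    private final JavaMailSender mailSender;

    public SubscriberMailItemWriter(MimeMessageItemWriter delegate, JavaMailSender mailSender) {
        this.delegate = delegate;
        this.mailSender = mailSender;
    }

    @Override
    public void write(List<? extends Subscriber> subscribers) throws Exception {
        List<MimeMessage> messages = new ArrayList<>();
        for (Subscriber subscriber : subscribers) {
            MimeMessage message = mailSender.createMimeMessage();
            // placeholder: set recipient, subject and body from the subscriber here
            messages.add(message);
        }
        delegate.write(messages);
    }
}

@Bean
public CompositeItemWriter<Subscriber> compositeWriter(ItemWriter<Subscriber> dbWriter,
                                                       SubscriberMailItemWriter mailWriter) {
    CompositeItemWriter<Subscriber> composite = new CompositeItemWriter<>();
    composite.setDelegates(Arrays.asList(dbWriter, mailWriter)); // DB insert first, mail second
    return composite;
}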
I am new to Flink. I have a requirement to read data from Kafka, conditionally enrich that data (if a record belongs to category X) by calling some API, and write it to S3.
I made a hello world Flink application with the above logic which works like a charm.
But, the API which I am using to enrich doesn't have 100% uptime SLA, so I need to design something with retry logic.
Following are the options that I found:
Option 1) Retry with exponential backoff until I get a response from the API, but this will block the queue, so I don't like it.
Option 2) Use one more topic (called topic-failure) and publish the record to topic-failure if the API is down. This way it won't block the actual main queue. I will need one more worker to process the data from topic-failure. Again, this queue has to be used as a circular queue if the API is down for a long time: read a message from topic-failure, try to enrich it, and if that fails, push it back to topic-failure and consume the next message.
I prefer option 2, but it doesn't look easy to accomplish. Is there any standard Flink approach available to implement option 2?
This is a rather common problem that occurs when migrating away from microservices. The proper solution would be to have the lookup data also in Kafka or some DB that could be integrated in the same Flink application as an additional source.
If you cannot do that (for example, the API is external or the data cannot easily be mapped to a data store), both approaches are viable and they have different advantages.
1) This allows you to retain the order of input events. If your downstream application expects ordering, then you need to retry.
2) The common term is dead letter queue (although it is more often used for invalid records). There are two easy ways to integrate it in Flink: either have a separate source, or use a topic pattern/list with one source.
Your topology would look like this:
Kafka Source      -\                                     /-> Filter good -> S3 sink
                    +-> Union -> Async IO with timeout -+
Kafka Source dead -/             (for API call!)         \-> Filter bad  -> Kafka sink dead
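A heavily simplified sketch of that topology; the topic names, EnrichAsyncFunction, EnrichResult and the source/sink factory methods are placeholders, not part of any Flink API:

import java.util.concurrent.TimeUnit;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EnrichmentJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Main topic and dead letter topic (kafkaSource() is a placeholder factory).
        DataStream<String> main = env.addSource(kafkaSource("topic-main"));
        DataStream<String> dead = env.addSource(kafkaSource("topic-failure"));

        // Union both streams and enrich asynchronously; the async function completes
        // each record with either the enriched value or a "failed" flag on API timeout.
        DataStream<EnrichResult> enriched = AsyncDataStream.unorderedWait(
                main.union(dead),
                new EnrichAsyncFunction(), // placeholder AsyncFunction<String, EnrichResult>
                5, TimeUnit.SECONDS,       // per-record timeout for the API call
                100);                      // max in-flight API requests

        // Good records go to S3; failed ones are written back to the dead letter topic.
        enriched.filter(r -> r.succeeded).addSink(s3Sink());          // placeholder S3 sink
        enriched.filter(r -> !r.succeeded).addSink(deadLetterSink()); // placeholder Kafka sink for topic-failure

        env.execute("enrichment-with-dead-letter-topic");
    }
}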