How to configure MessageChannelPartitionHandler to poll partition results using the job repository - spring-batch

Does anyone know how to configure MessageChannelPartitionHandler to poll partition results from the database instead of aggregating the JMS responses from each slave?
I am trying to use this because the current aggregator throws an out-of-memory error after the job has been running for a while. It looks like the master node is not releasing memory.

After looking at the code of MessageChannelPartitionHandler, all we need is a DataSource and a JobExplorer reference. You can also configure the polling interval; if you don't provide one, the default is 10 seconds.

Related

Event trigger using Mongo/Kafka

I have a MongoDB instance with a collection holding calendar events. This is fed by a Kafka application.
These events need to feed into other downstream systems using Kafka Streams, but what I'd like to investigate is whether it would be possible to trigger an event to a downstream system only when the event has just happened (rather than passing future events downstream).
So if an event is received and written to Mongo for a date in the future, the downstream system will only know about it once that date is reached, and not before.
I've looked at the normal connectors (MongoDB -> Kafka https://www.mongodb.com/kafka-connector) and that functionality isn't provided.
One of the ways I thought about doing this would be to write a custom application which queries the MongoDB collection on a schedule for the window between the "last run" and "now", gets all the events which occur within that window, and creates a downstream event in Kafka for each (with indexes set on the queried fields of the Mongo document).
Is there any other way?
Many thanks for reading.
Jill
Instead of querying MongoDB, I would suggest creating a consumer group on the original Kafka topic that the MongoDB data is ingested from. If the consumer sees that the date is in the future, it creates a Rundeck / Airflow scheduled task configured for that date, so your consumer logic stays simple.
Another solution you can try is to make some changes to the source connector that you found and try to get them merged.
Good luck! I'm here if you have any questions.
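For reference, the scheduled-query approach described in the question can be sketched as below. This is only a minimal in-memory illustration: the event shape, the `start` field name and the `due_events` helper are assumptions made for the example, not part of MongoDB or Kafka. Against a real collection the same window would be a range query such as `{"start": {"$gt": last_run, "$lte": now}}`.

```python
from datetime import datetime, timedelta, timezone


def due_events(events, last_run, now):
    """Return events whose start time falls in the half-open window (last_run, now]."""
    # Excluding last_run and including now means two consecutive runs
    # never emit the same event twice and never skip one on the boundary.
    return [e for e in events if last_run < e["start"] <= now]


# Hypothetical sample data: "start" is an assumed field name.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_run = now - timedelta(minutes=5)
events = [
    {"id": "past", "start": now - timedelta(hours=1)},    # handled by an earlier run
    {"id": "due", "start": now - timedelta(minutes=2)},   # became due since the last run
    {"id": "future", "start": now + timedelta(days=3)},   # held back until its date arrives
]

print([e["id"] for e in due_events(events, last_run, now)])  # ['due']
```

The scheduler would persist `now` as the next run's `last_run` only after the downstream events have been published, so a failed run is retried over the same window.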

High memory consumption in MessageChannelPartitionHandler with a large number of partitions

Our use case: we use remote partitioning. The job is divided into multiple partitions, and workers process these partitions via ActiveMQ.
The job fails with a memory issue in the handle method of MessageChannelPartitionHandler, where it holds a large number of StepExecutions in memory (we have around 20K StepExecutions/partitions in this case).
We overrode the message channel partition handler to submit messages to ActiveMQ in a controlled way. Even when we poll replies from the DB instead, we hit database connection timeouts, and after we increased the idle connection setting this approach also failed because it could not hold all those StepExecutions in memory.
With both our custom handler and the stock MessageChannelPartitionHandler we face similar issues, and these step executions are required for aggregation at the master. Is there an alternative way of achieving this?
Can someone help us understand a better way of handling these long-running, large-volume processing scenarios?

How can I complete a Druid Task?

I import some data into my Druid datasource. For that, I use NiFi and Tranquility for streaming ingestion with minute granularity (for my tests).
I use Ambari to check all my tasks and their status.
All my data is imported into my datasource correctly and I can query it with Hive.
When I look at my tasks in Ambari, all of them are running; they are never "Complete". If I want to complete one of them, I have to kill it, but then I lose my data and the task status is "FAILED".
I would like to understand what I can do to complete my tasks successfully.
Thanks.
I found the problem.
In my tranquility conf, I had declared a big value for the "WindowPeriod".
In fact, the task automatically ends when the "WindowPeriod" ends.
For example, "WindowPeriod":"PT10M" means that the task will end in 10 minutes.
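Values such as "PT10M" are ISO-8601 durations. As a rough illustration of how to read them, here is a small sketch; `parse_period` is a hypothetical helper written for this example (it is not part of Tranquility or Druid) and it only handles days, hours, minutes and seconds.

```python
import re
from datetime import timedelta

# Matches simple ISO-8601 durations such as "PT10M", "PT1H30M" or "P1DT2H".
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_period(text):
    """Parse a simple ISO-8601 duration (days, hours, minutes, seconds only)."""
    match = _DURATION.match(text)
    if not match or text in ("P", "PT"):
        raise ValueError(f"unsupported period: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


print(parse_period("PT10M"))  # 0:10:00
```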
Glad that you figured it out! Just want to call out for anyone reading this that Tranquility is deprecated. The streaming ingestion services such as https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion should be preferred for anyone starting a new deployment.

Master/Slave pattern on Google Cloud using Pub/Sub

We want to build a master/slave pattern on Google Cloud.
We planned to use Pub/Sub for that (similar to the JMS pattern), letting each worker grab a task from the queue and ack it when done.
But, it seems like a subscriber can't get messages sent before it started.
And we're not sure how to make sure each message will be processed by a single 'slave'.
Is there a way to do it? Or another mechanism on google cloud for that?
As far as I understand the master/slave pattern, the slaves do the tasks in parallel and the master harvests the results. I'd create a topic for queuing the tasks and a single subscription attached to this topic, so that all the slaves use this subscription to fetch tasks.
I'd also create another topic/subscription pair for publishing results from the slaves, from which the master harvests the results. Alternatively, the results can be stored in a shared datastore like Cloud Datastore.
You can do this by creating a single subscription which is then used by all the slaves. The Pub/Sub service delivers each new message to only one subscriber on a given subscription, so you can be sure that a given message will be processed by only one slave.
You should also set the acknowledgement deadline appropriately so that a delivery retry doesn't happen. If a retry happens, it will result in multiple slaves getting the same message.
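The "one shared subscription, many workers" idea from the answers above is the competing-consumers pattern. The sketch below simulates it with the Python standard library only: a `queue.Queue` stands in for the single subscription and a list stands in for the results topic. It does not use the Pub/Sub client API, and `run_workers` is a name made up for this example.

```python
import queue
import threading


def run_workers(tasks, worker_count=3):
    """Process every task exactly once using workers that share one queue."""
    shared = queue.Queue()   # stands in for the single shared subscription
    results = []             # stands in for the results topic / shared datastore
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            task = shared.get()
            if task is None:             # sentinel: no more work for this worker
                shared.task_done()
                return
            with lock:
                results.append((worker_id, task))
            shared.task_done()           # loosely analogous to acking the message

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for task in tasks:
        shared.put(task)
    for _ in threads:
        shared.put(None)                 # one sentinel per worker
    for t in threads:
        t.join()
    return results


results = run_workers(list(range(20)))
# Every task appears once, regardless of which worker picked it up.
print(sorted(task for _, task in results) == list(range(20)))  # True
```

Unlike this in-process queue, a real subscription can redeliver a message whose ack deadline expires, so workers should still tolerate seeing the same task twice.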

Oracle Service Bus Proxy Service Scheduler

I need to create a proxy service scheduler that receives messages from the queue every 5 minutes. That is, the queue may produce a single message or multiple messages, but the proxy should receive them only at 5-minute intervals. How can I achieve this using only Oracle Service Bus?
Kindly help me with this.
OSB does not provide scheduler capabilities out of the box. You can do either of the following:
For a JMS queue, allow infinite retries by not setting a retry limit, and set the retry interval to 5 minutes.
Create a scheduler. Check this post for the same: http://blogs.oracle.com/jamesbayer/entry/weblogic_scheduling_a_polling
Answer left for reference only: messages shouldn't be subject to complex computed selections in this way, only simple value comparison and pattern matching.
To fetch only old enough messages from the queue without modifying the queue or the messages, without introducing any new brokers between the queue and the consumer, and without prematurely consuming messages, use the Message Selector field of the OSB Proxy on the JMS Transport tab to set a boolean expression (SQL 92) that checks that the message's JMSTimestamp header is at least 5 minutes older than the current time.
... and I wasn't able to quickly produce a valid message selector from either the timestamp or the JMSMessageID (it contains the time in millis: 'ID:<465788.1372152510324.0>').
I guess somebody could still use it in some specific case.
You can use Quartz scheduler APIs to create schedulers across domains.
Regards,
Sajeev
I don't know whether this works for you, but it's working well for me. Maybe you can use it for what you need.
Go to the Transport Details of your Proxy Service and, under the Advanced Options tab, set the following fields:
Polling Frequency (enter your frequency: 300 sec, i.e. 5 min)
Physical Directory (here you may need to give your queue path)