I have a simple Talend standard job containing two Kafka inputs, as you can see in the picture. The problem is that when I run the job, only one of the Kafka inputs starts. What I expected is for both Kafka inputs to run at the same time. Is there any configuration that I am missing?
You can add the tParallelize component at the beginning of the Talend job and the subjobs will be executed at the same time; it works with multiple subjobs too.
I think a Talend job runs in serial by default; we just can't see which component runs first because the process is so fast.
I am trying to do a simple integration between our Control-M batch environment and our Kafka environment. What I want is to be able to publish to Kafka when certain jobs or jobnets are complete or have an issue, with extra information like start and end times.
That way we can implement a stream processor that abstracts the details away and tells our event processing system that the daily end-of-day processing is complete. This is in a financial/banking context.
I looked to see if there is an API of some sort, but I only see maintenance of reports, not reporting on the actual running of the batch.
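For reference, the Kafka side of this could look like the sketch below: a small producer that publishes a "job complete" event with start and end times. The topic name, the JSON payload layout, and the idea of invoking it from a post-job action in Control-M are all assumptions, not something Control-M provides out of the box.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: publish a job-completion event to a Kafka topic.
public class JobCompletionPublisher {

    public static void main(String[] args) {
        String jobName = args[0], startTime = args[1], endTime = args[2];

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Simple JSON payload; Avro plus a schema registry would work just as well.
        String payload = String.format(
                "{\"job\":\"%s\",\"status\":\"COMPLETE\",\"start\":\"%s\",\"end\":\"%s\"}",
                jobName, startTime, endTime);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("batch-job-events", jobName, payload));
            producer.flush();
        }
    }
}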
I am currently developing a Beam pipeline using the FlinkRunner and Apache Beam 2.29.
As suggested in the Beam File processing patterns documentation, I have an unbounded pipeline listening to a Kafka topic for a CSV filename; once a filename is received, the file is processed through TextIO readFile().
We end up with two PCollections: one from the file being processed and the other from a lookup against an external datastore. The PCollections are joined using the Join extension, which forces us to set up some triggering on these two PCollections. So I have defined something like the below for each PCollection, in the hope that the end result following the join would produce some new output every time a new filename arrives from the Kafka topic we are monitoring.
PCollection<KV<String, Map<String, AttributeValue>>> lookupTable = LookupTable.getPspLookupData(p, lookupTableName, lookupTableRegionFilter)
        .apply("WindowB", Window.<KV<String, Map<String, AttributeValue>>>into(new GlobalWindows())
            .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(15))))
            .withAllowedLateness(Duration.standardSeconds(5))
            .discardingFiredPanes()
        );
But it simply does not work more than once. It seems that if I send one or more Kafka messages before the 15 seconds defined in plusDelayOf() have elapsed, the data gets processed, but anything sent past those 15 seconds (from pipeline startup) is never processed and the pipeline is simply "stuck", despite the trigger being defined as Repeatedly.forever(...).
I have tried numerous combinations and I simply can't get it to work. I would welcome any ideas or suggestions to get this working. It feels like I am missing something basic, but I have been at this for hours.
Thanks,
Serge
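For comparison, a per-element repeated trigger on the same global window would look like the sketch below. This is only an illustration of a trigger that fires whenever a new element arrives, rather than after a processing-time delay measured from the first element; whether it suits the Join extension's semantics in this pipeline is an assumption.

// Sketch (Beam 2.29, same types as above): re-fire a pane whenever at least one new element arrives.
PCollection<KV<String, Map<String, AttributeValue>>> lookupTable = LookupTable.getPspLookupData(p, lookupTableName, lookupTableRegionFilter)
        .apply("WindowB", Window.<KV<String, Map<String, AttributeValue>>>into(new GlobalWindows())
            .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
            .withAllowedLateness(Duration.ZERO)
            .discardingFiredPanes()
        );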
I import some data into my Druid datasource. For that, I use NiFi and Tranquility for streaming ingestion with minute granularity (for my tests).
I have Ambari to check all my tasks and their status.
All my data is imported into my datasource correctly and I can query it with Hive.
But when I look at my tasks in Ambari, all of them are running; they are never "Complete". If I want to complete one of them, I have to kill it, but then I lose my data and the task status is "FAILED".
I would like to understand what I can do to complete my tasks successfully.
Thanks.
I found the problem.
In my Tranquility config, I had declared a big value for the "WindowPeriod".
In fact, the task ends automatically when the "WindowPeriod" ends.
For example, "WindowPeriod":"PT10M" means that the task will end after 10 minutes.
Glad that you figured it out! Just want to call out for anyone reading this that Tranquility is deprecated. The streaming ingestion services, such as https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion, should be preferred for anyone starting a new deployment.
Here is what I am trying to do; I am not sure if it is possible:
Autosys gets File1:10pm and starts DataStage Job1.1:10pm.
Job1.1:10pm is still running.
Autosys gets File1:20pm and needs to start the same Job1, but run it as Job1.1:20pm, even though Job1.1:10pm is still running; it should not wait for it to finish, but go ahead and run.
Can Autosys call the same DataStage job every time it gets a new file and run it with the new timestamp as the invocation ID, without waiting for the previous job to finish?
Thanks, y'all.
Yes - absolutely - this is possible. To enable different InvocationIds you have to check the "multiple instance" property in the job's properties. With this you allow multiple simultaneous runs of the job.
The invocation ID can also be a parameter when calling the job from a sequence.
When your (multiple instance) job writes to a file, make sure that each filename is unique to avoid side effects from the multiple runs happening at the same time. This can be done by specifying DSJobInvocationId as part of the filename. Note that it is a parameter provided by DataStage which needs to be written exactly as shown, with the upper and lower case letters. DataStage will then replace it with the content of your job invocation ID at runtime.
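For illustration only: if Autosys shells out to the DataStage dsjob command line, the invocation ID is appended to the job name in the run command. The project name, parameter name, and timestamp-style invocation ID below are made up:

dsjob -run -param InputFile=/landing/File1 MyProject Job1.20221003_1320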
I am invoking Spring Batch jobs based on events; however, I have a couple of jobs to execute on a specific event, and they could execute in parallel. Is there any utility class which can execute multiple jobs in parallel? Thanks.
We don't offer anything specific for launching multiple jobs based on a single message out of the box with Spring Batch. However, writing a message handler that can handle that scenario should be pretty trivial.
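For example, a handler along the lines of the sketch below would do it. The class and field names here are made up, and it assumes the injected JobLauncher is configured with an asynchronous TaskExecutor (for instance SimpleAsyncTaskExecutor), so that run() returns immediately and the jobs proceed in parallel.

import java.util.List;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

// Hypothetical handler: launches every job registered for an incoming event.
// Parallelism comes from the async JobLauncher, not from this class.
public class EventJobDispatcher {

    private final JobLauncher asyncJobLauncher; // assumed to be backed by an async TaskExecutor
    private final List<Job> jobsForEvent;       // the jobs mapped to this event (assumed wiring)

    public EventJobDispatcher(JobLauncher asyncJobLauncher, List<Job> jobsForEvent) {
        this.asyncJobLauncher = asyncJobLauncher;
        this.jobsForEvent = jobsForEvent;
    }

    public void onEvent(String eventId) throws Exception {
        for (Job job : jobsForEvent) {
            JobParameters params = new JobParametersBuilder()
                    .addString("eventId", eventId)
                    .addLong("timestamp", System.currentTimeMillis()) // keeps each job instance unique
                    .toJobParameters();
            // With an async launcher this call does not block, so all jobs start in parallel.
            asyncJobLauncher.run(job, params);
        }
    }
}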