I had developed a Job in Talend and built the job and automated to run the Windows Batch file from the below build
On the Execution of the Job Start Windows Batch file it will invoke the dimtableinsert job and then after it finishes it will invoke fact_dim_combine it is taking just minutes to run in the Talend Open Studio but when I invoke the batch file via the Task Scheduler it is taking hours for the process to finish
Time Taken
Manual -- 5 Minutes
Automation -- 4 hours (on invoking Windows batch file)
Can someone please tell me what is wrong with this Automation Process
The reason of the delay in the execution would be a latency issue. Talend might be installed in the same server where database instance is installed. And so whenever you execute the job in Talend, it will complete as expected. But the scheduler might be installed in the other server, when you call the job through scheduler, it would take some time to insert the data.
Make sure you scheduler and database instance is on the same server
Execute the job directly in the windows terminal and check if you have same issue
The easiest way to know what is taking so much time is to add some logs to your job.
First, add some tWarn at the start and finish of each of the subjobs (dimtableinsert and fact_dim_combine) to know which one is the longest.
Then add more logs before/after the components inside the jobs.
This way you should have a better idea of what is responsible for the slowdown (DB access, writing of some files, etc ...)
Related
I have a scheduled parallel Datastage (11.7) job.
This job has a Hive Connector with a Before and After Statement.
The before statement run ok but After statement remains in running state for several hours (on Hue Log i see this job finished in 1hour) and i have to manually abort it on Datastage Director.
Is there the way to "program an abort"?
For example i want schedule the interruption of the running job every morning at 6.
I hope I was clear :)
Even though you can kill the job - as per other responses - using dsjob to stop the job, this may have no effect because the After statement has been issued synchronously; the job is waiting for it to finish, and (probably) not processing kill signals and the like in the meantime. You would be better advised to work out why the After command is taking too long, and addressing that.
I have a job that uses the Kafka Connector Stage in order to read a Kafka queue and then load into the database. That job runs in Continuous Mode, which it has no time to conclude, since it keeps monitoring the Kafka queue in real time.
For unexpected reasons (say, server issues, job issues etc) that job may terminate with failure. In general, that happens after 300 running hours of that job. So, in order to keep the job alive I have to manually look to the job status and then to do a Reset and Run, in order to keep the job running.
The problem is that between the job termination and my manual Reset and Run can pass several hours, which is critical. So I'm looking for a way to eliminate the manual interaction and to reduce that gap by automating the job invocation.
I tried to use Control-M to daily run the job, but with no success: The first day the Control-M called the job, it ran it fine. But in the next day, when the Control-M did an attempt to instantiate the job again it failed (since it was already running). Besides, the Datastage will never tell back Control-M that a job was successfully concluded, since the job's nature won't allow that.
Said that, I would like to hear ideas from you that can light me up.
The first thing that came in mind is to create a intermediate Sequence and then schedule it in Control-M. Then, this new Sequence would call the continuous job asynchronously by using command line stage.
For the case where just this one job terminates unexpectedly and you want it to be restarted as soon as possible, have you considered calling this job from a sequence? The sequence could be setup to loop running this job.
Thus sequence starts job and waits for it to finish. When job finishes, the sequence will then loop and start the job again. You could have added conditions on job exit (for example, if the job aborted, then based on that job end status, you could reset the job before re-running it.
This would not handle the condition where the DataStage engine itself was shut down (such as for maintenance or possibly an error) in which case all jobs end including your new sequence. The same also applies for a server reboot or other situations where someone may have inadvertently stopped your sequence. For those cases (such as DataStage engine stop) your team would need to have process in place for jobs/sequences that need to be started up following a DataStage or System outage.
For the outage scenario, you could create a monitor script (regardless of whether running the job solo or from sequence) that sleeps/loops on 5-10 minute intervals and then checks the status of your job using dsjob command, and if not running can start that job/sequence (also via dsjob command). You can decide whether that script startup would occur at DataSTage startup, machine startup, or run it from Control M or other scheduler.
Hi i am beginer in Talend Open Studio 5.3.1 version.
currently i am facing issue in project i.e. schedule a job to run every 10 seconds and it monitor the other job and display output as status of another job which means the job is running or idle state.
Currently i am using Talend Open Studio 5.3.1 version by using this version it is possible or not .
explain me how to schelude a job for 10 seconds and display output as status of another job.
can anyone suggest and help me to solve my problem.
We should think a bit out of the box here. I'd solve this by using Project level logging: https://help.talend.com/display/TalendOpenStudioforBigDataUserGuide520EN/2.6+Customizing+project+settings
You'll have the jobs status stored in a database table, you just have to check whether the last execution of the job is still running or not. (Self join the stats table)
Monitoring jobs is not supported in Talend Open Studio, but there is some workaround:
Use a master job that launch the job to be monitored using tRunJob component, and your master job will have an idea whats going on.
Use empty files to synchronize your jobs, an empty file with a tricky name created by monitored jobs and the master job check them to get other jobs states.
Much easier is to use Quartz.
I don't understand why this isn't working, I set up a pgAgent job to send a NOTIFY from the database every hour
The steps
The schedule
Turns out that problem was that heroku doesn't support agAgent and the database was running on heroku, I ended making a work around the scheduling tasks using windows task scheduler - it's not the best solution but it does the job I needed to do...
Good day everyone.
I have single server for Chronos, Mesos and Zookeeper, and i want to use Chronos as something, what will run my scripts daily. Some scripts today, some tomorrow and so on..
The problem is when i'm trying to launch tasks one after another, only first one executes correctly, another one is lost somewhere. If i launch first then take a pause of 3-4 seconds and launch another - they both are launched, but sequentially.
And i need to run them in parallel.
Can someone provide a hint on this? Maybe there is some settings that i must change?
You should set a time in UTC time for both tasks to be launched with a repeating period of 24 hours. In this case, there is no reason why your tasks should not execute in parallel. Check the chronos logs and the tasks logs in sandbox on mesos for errors.
You can certainly run all of these components (Chronos, master, slave, and ZK) on the same machine, although ZK really becomes valuable once you have HA with multiple masters.
As user4103259 suggested, check the master and slave logs for that LOST/failed taskId to see what exactly happened to it. A task could go LOST/failed for numerous reasons, anywhere along the task launch/running/completing process.