I am working on a job design that triggers a child job based on a completion status that the job reads from the database.
If the return value from the database is Y, the child job should execute. If the return value is N, the job should sleep and check the database again after 15 minutes. If the value is then Y, the child job should execute; otherwise it sleeps again, and this continues in a loop.
I tried the following design approach, but the job gets stuck at tSleep:
tSetGlobalVariable -> tLoop -> tRunJob (Child job1) -> tSleep -> tRunJob (Child job2)
tRunJob (Child job1) fetches from the database the status of whether a particular job has completed, and sends that status back to drive the while loop. If the status is Y, Child job2 should run. If the status is N, the job should sleep until the status changes to Y.
You could use the tWaitForSqlData component, which combines the scan, sleep, and condition portions of your job.
That simplification should help you pinpoint exactly where your problem is.
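With it, the design could collapse to something like this (a sketch, assuming tWaitForSqlData is configured with your status query, a 15-minute wait between scans, and a condition matching the Y value; its Iterate link fires once the condition is met):
tWaitForSqlData -> Iterate -> tRunJob (Child job2)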
I have a table that acts like a queue (let's call it queue) and has a sequence from 1..N.
Some triggers insert into this queue (the triggers run inside transactions).
External machines keep the last sequence number they saw and ask the remote database: give me sequences greater than 10 (for example).
The problem:
In some cases transactions 1 and 2 begin (the numbers are examples), but transaction 2 commits before transaction 1. In between, a host has asked the queue for sequences greater than N, so the sequences inserted by transaction 1 are skipped.
How to prevent this?
I would proceed like this:
add a column state to the table that you change as soon as you process an entry
get the next entry with
SELECT ... FROM queuetab
WHERE state = 'new'
ORDER BY seq
LIMIT 1
FOR UPDATE SKIP LOCKED;
update state in the row you found and process it
As long as you do the last two actions in a single transaction, this makes sure that you are never blocked, always get the first available entry, and never skip an entry.
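A minimal sketch of the full cycle (the payload column and the 'done' state are illustrative; the seq in the UPDATE is whatever the SELECT returned):

BEGIN;

-- claim the oldest unprocessed entry; SKIP LOCKED makes concurrent
-- workers ignore rows already claimed instead of blocking on them
SELECT seq, payload
FROM queuetab
WHERE state = 'new'
ORDER BY seq
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- process the entry, then mark it so nobody picks it up again
UPDATE queuetab SET state = 'done' WHERE seq = 42;

COMMIT;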
I have two ADFv2 triggers.
One is set to execute every 3 minutes and the other every 20 minutes.
They execute different pipelines, but there is an overlap I want to prevent: both touch the same database table.
Is there a way to set them up so if one is already running and the other is scheduled to start, it is instead queued until the running trigger is finished?
Not natively AFAIK. You can use the pipeline's concurrency property setting to get this behaviour but only for a single pipeline.
Instead you could (we have):
Use a Validation activity to block if a sentinel blob exists, and have your other pipeline write and delete the blob when it starts and ends.
Likewise, have one pipeline set a flag in a control table on the database that the other can examine.
If you can tolerate changing your frequencies to have a common factor, create a master pipeline that calls your current two pipelines via Execute Pipeline activities; using MOD, make the longer one run only on every n-th execution. Then you can use the concurrency setting on the outer pipeline to make sure the next trigger gets queued until the current run ends.
Use the REST API (https://learn.microsoft.com/en-us/azure/data-factory/monitor-programmatically#rest-api) in one pipeline to check whether the other is running.
Jason's post gave me an idea for a simpler solution.
I have two triggers, each executing on a different schedule and a different pipeline.
On occasion the schedules of these triggers can overlap. In that circumstance the trigger that fires while the other is running should not run; only one should be running at any one time.
I did this using the following.
Create a control table with an IsJobRunning BIT (flag) column.
When a trigger fires, the pipeline associated with it executes a stored procedure that checks the control table.
If the value of IsJobRunning is 0, UPDATE it to 1 and continue executing;
if it is 1, RAISERROR (a dummy error) and stop executing.
DECLARE @ErrMsg NVARCHAR(500);
DECLARE @ErrorSeverity INT;

IF (SELECT IsJobRunning FROM [[Control table]]) = 1
BEGIN
    SET @ErrMsg = N'**INFORMATIONAL ONLY** Other ETL trigger job is running - so stop this attempt';
    SET @ErrorSeverity = 16;
    -- Note: this is only an INFORMATIONAL message and not an actual error.
    RAISERROR (@ErrMsg, @ErrorSeverity, 1) WITH NOWAIT;
    RETURN 1;
END
ELSE
BEGIN
    -- set IsJobRunning to RUNNING
    EXEC [[UPDATE IsJobRunning on Control table]];
END;
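One detail the snippet above does not show: the flag has to be reset when a pipeline run finishes, including on failure paths, or every later run will skip itself. A minimal sketch, reusing the same control-table placeholder:

-- run as the pipeline's final activity (and from its failure path)
UPDATE [[Control table]] SET IsJobRunning = 0;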
This logic is in both pipelines.
I have 4 streams:
A_STREAM, B_STREAM, C_STREAM, D_STREAM
I have a chain of tasks where A_TASK is the parent and it has 3 child tasks (B_TASK, C_TASK, D_TASK).
CREATE TASK A_TASK
WAREHOUSE = XYZ
SCHEDULE = '15 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('A_STREAM')
AS
DO Something;

CREATE TASK C_TASK
WAREHOUSE = XYZ
AFTER A_TASK
WHEN SYSTEM$STREAM_HAS_DATA('C_STREAM')
AS
DO SOMETHING;
Say A_TASK got triggered and completed, but when it was C_TASK's turn to execute, stream C_STREAM had no data, so the task didn't get triggered.
After 5 minutes C_STREAM got data.
The issue here is that the data will never get loaded from C_STREAM into the target table, since A_TASK won't get triggered next time (its WHEN condition on A_STREAM won't be met, so the child tasks never get a chance to run). How do we tackle this kind of scenario?
I can't separate these tasks since they operate on the same target table.
In Snowflake, do tasks have something like a child task waiting until its dependency is met?
I have a Spring Batch step whose reader query is complex and contains a join of several tables.
The job will run every day, looking for records that were added to table A based on the last-updated date.
In the scenario where no records were added, the query still takes a long time to return results. I would like to check whether any records were added to table A, and only then run the full query.
Example: select count(recordID) from table A where last_update_date >
If count > 0, then proceed with the step (reader, writer etc) joining the other tables.
If count = 0, then skip the reader, writer and set step status as COMPLETED and proceed with the next step of the job.
Is this possible in Spring Batch? If yes, how can this be done?
Use a StoredProcedureItemReader.
Or use a JobExecutionDecider that performs the fast query and then routes either to the processing step or to job termination.
I have a cron job that runs every 2 minutes. It takes 10 records from a Postgres table, works on them, then sets a flag when it is finished. I want to make sure that if the first cron run takes more than 2 minutes, the next one works on different data in the database, not on the same data.
Is there any way to handle this case?
This can be solved using a Database Transaction.
BEGIN;
SELECT
id,status,server
FROM
table_log
WHERE
(direction = '2' AND status_log = '1')
LIMIT 100
FOR UPDATE SKIP LOCKED;
What are we doing here?
We are selecting all rows that are available (not locked by other cron jobs that might be running) and selecting them FOR UPDATE. This means the query grabs only unlocked rows, and all its results are locked for this cron job alone.
How do we update the locked rows?
Simple: use a for loop in your processing language (Python, Ruby, PHP) and concatenate one UPDATE per row; remember, we are building one single batch of updates.
UPDATE table_log SET status_log = '6', server = '1' WHERE id = '1';
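For example, if the SELECT returned ids 1, 2, and 3, the concatenated batch sent within the same transaction would look like this (the values are illustrative):

UPDATE table_log SET status_log = '6', server = '1' WHERE id = '1';
UPDATE table_log SET status_log = '6', server = '1' WHERE id = '2';
UPDATE table_log SET status_log = '6', server = '1' WHERE id = '3';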
Finally we use
COMMIT;
And all the locked rows will be updated. This prevents other queries from touching the same data at the same time. Hope it helps.
Turn your "finished" flag from binary to ternary ("needs work", "in process", "finished"). You also might want to store the pid of the "in process" process, in case it dies and you need to clean it up, and a timestamp for when it started.
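A minimal sketch of that claim step in Postgres, combining the ternary flag with the SKIP LOCKED pattern from the previous answer (the jobs table and its columns are illustrative):

-- atomically claim up to 10 unclaimed rows for this worker;
-- pg_backend_pid() records the server backend handling this worker
-- (a client-side pid passed in as a parameter works just as well)
UPDATE jobs
SET status = 'in process',
    worker_pid = pg_backend_pid(),
    started_at = now()
WHERE id IN (
    SELECT id FROM jobs
    WHERE status = 'needs work'
    ORDER BY id
    LIMIT 10
    FOR UPDATE SKIP LOCKED
)
RETURNING id;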
Or use a queueing system that someone already wrote and debugged for you.