snowflake setting task dependencies - scheduled-tasks

I have a task which needs to be executed after the successful completion of several predecessor tasks.
Say, for example, the three tasks below trigger at the same time and each calls a different stored procedure.
CREATE TASK myschema.mytask_1
WAREHOUSE = mywh
schedule='USING CRON 0 8 * * MON America/Los_Angeles'
as call myschema.MY_PROC_1();
CREATE TASK myschema.mytask_2
WAREHOUSE = mywh
schedule='USING CRON 0 8 * * MON America/Los_Angeles'
as call myschema.MY_PROC_2();
CREATE TASK myschema.mytask_3
WAREHOUSE = mywh
schedule='USING CRON 0 8 * * MON America/Los_Angeles'
as call myschema.MY_PROC_3();
However, I want the 4th task below to be executed only after all three of the above complete successfully. If any one of them fails, the 4th shouldn't trigger. In short, the 4th task depends on the completion of all three tasks above.
I have read through some Snowflake docs and found that only one task can be set as a predecessor. For now I can only think of the below, chaining the tasks one after the other. I'm also not sure how to evaluate the successful completion of the prior task before proceeding. Can someone please help me achieve this in a better way? Any help on this is much appreciated.
CREATE TASK myschema.mytask_1
WAREHOUSE = mywh
schedule='USING CRON 0 8 * * MON America/Los_Angeles'
as call myschema.MY_PROC_1();
CREATE TASK myschema.mytask_2
WAREHOUSE = mywh
AFTER myschema.mytask_1
as call myschema.MY_PROC_2();
CREATE TASK myschema.mytask_3
WAREHOUSE = mywh
AFTER myschema.mytask_2
as call myschema.MY_PROC_3();
CREATE TASK myschema.mytask_4
WAREHOUSE = mywh
AFTER myschema.mytask_3
as call myschema.MY_PROC_4();

Although the solution with Streams suggested by Mike Walton is fascinating, implementing a single stored procedure (possibly within an explicit transaction so that it rolls back when an error occurs) might be a simpler and therefore more maintainable solution.
Having said that, if performance is key you may want to opt for the Streams option, because it lets the different pieces of code in each stored procedure run concurrently, whereas the single-SP approach would run them sequentially.

If you run the tasks in succession, then any failure will stop the rest from executing, and when the next scheduled execution comes around, it'll start from the beginning and execute again. Depending on your logic, this may not be the behavior you're looking for.
Regarding the first option, one possible solution here is to leverage streams to initiate the 4th task. It is a little bit unorthodox, but you can make it work. Here are the basic steps to look into trying:
Each of the 3 parallel tasks would need to insert a record into a separate table upon successful completion of the SP, so it'd have to be the last step in the SP.
Each of those 3 tables would need to have a STREAM object created.
You'd schedule the task to run every minute and use a WHEN clause that looked something like the below code.
You would then need to execute some additional tasks after the 4th task that run a DML statement against your streams, so that the streams get reset.
Step 3 example:
CREATE OR REPLACE TASK mytask_4
WAREHOUSE = xxxx
SCHEDULE = '1 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('mytask_1_stream')
AND SYSTEM$STREAM_HAS_DATA('mytask_2_stream')
AND SYSTEM$STREAM_HAS_DATA('mytask_3_stream')
AS CALL myschema.MY_PROC_4();
Step 4 example:
CREATE OR REPLACE TASK mytask_5
WAREHOUSE = xxxx
AFTER myschema.mytask_4
AS INSERT INTO log_table SELECT * FROM mytask_1_stream;
CREATE OR REPLACE TASK mytask_6
WAREHOUSE = xxxx
AFTER myschema.mytask_4
AS INSERT INTO log_table SELECT * FROM mytask_2_stream;
CREATE OR REPLACE TASK mytask_7
WAREHOUSE = xxxx
AFTER myschema.mytask_4
AS INSERT INTO log_table SELECT * FROM mytask_3_stream;
Like I said, it's a bit of a workaround, but it should work nicely in most cases. And one additional point, mytask_4 in this case will never use any compute unless all 3 of the streams contain data, which means all 3 of your previous tasks have executed successfully. Otherwise, the task will be skipped and wait for the next minute to "check again". If you are running the first 3 tasks less often, you can schedule mytask_4 to run less often, as well, if you wish.

Related

Routing agents through specific resources in AnyLogic

I am solving a job shop scheduling problem using AnyLogic. I have 20 jobs (agents) and 5 machines (resources), and each job has a specific order in which to visit the machines. My question is: how can I make sure that each job follows its order?
This is what I have done. One agent called 'jobs' and 5 agents, each one corresponding to a machine. One resource pool associated with each of the service blocks. In the collection enterblocks I selected the 5 enter blocks.
In the agent 'jobs' I have this: the parameters associated with each job, read from the database file, the collection 'enternames' where I selected the machine (1,2,3,4,5) parameters, and the collection 'ptimes' where I put the processing times of the job. (These two collections are where I am not sure I have done it correctly.)
My database file
I am not sure how to use the counter used here: How to store routings in job shop production in Anylogic. In the previous link the getNextService function is used in the exit blocks, but I am also not sure how to use it in my case because of the counter.
Firstly, to confirm: based on the Job agent and database view, the first line in the database will result in a Job agent with values such as:
machine1 = 1 and process1=23
machine2 = 0 and process2=82 and so on
If that is the intent, then a better way is to restructure the database, so there are two tables:
Table of jobs to machine sequence looking something like this:
job   op1        op2        op3        op4        op5
1     machine2   machine1   machine4   machine5   machine3
2     machine4   machine3   machine5   machine1   machine2
3     ...        ...        ...        ...        ...
Table of jobs to processing time
Then, add a collection of type ArrayList of String to Job (let's call this collection col_machineSequence), and when the Job agents get created, their "On startup" code should be:
for (String param : List.of("op1", "op2", "op3", "op4", "op5")) {
    col_machineSequence.add(getParameter(param));
}
As a result, col_machineSequence will contain the sequence of machines each job should visit, in the order defined in the database.
NOTE: Please see help on getParameter() here.
Also:
Putting a Queue in front of the Service isn't necessary
Repeating Enter-Queue-Service-Exit isn't necessary; this can be simplified using this method
Follow-up clarifications:
Collections - these will be enclosed in each Job agent
Queue sorting - Service block has Priorities / preemption which governs the ordering on the queue
Create another agent for the second table (call the agent ProcessingTime and the table processing_time), add it to the Job agent, and then load it from the database, filtering on p_jobid, as shown in the picture

How do I re-run a pipeline with only the failed activities/Datasets in Azure Data Factory V2?

I am running a pipeline where I am looping through all the tables in INFORMATION.SCHEMA.TABLES and copying them onto Azure Data Lake Store. My question is: how do I run this pipeline for the failed tables only, if any of the tables fail to copy?
Best approach I’ve found is to code your process to:
0. Yes, root cause the failure and identify if it is something wrong with the pipeline or if it is a “feature” of your dependency you have to code around.
1. Be idempotent. If your process ensures a clean state as the very first step, similar to Command Design pattern’s undo (but more naive), then your process can re-execute.
* with #1, you can safely use “retry” in your pipeline activities, along with sufficient time between retries.
* this is an ADFv1 or v2 compatible approach
2. If ADFv2, then you have more options and can have more complex logic to handle errors:
* for the activity that is failing, wrap this in an until-success loop, and be sure to include a bound on execution.
* you can add more activities in the loop to handle failure and log, notify, or resolve known failure conditions due to externalities out of your control.
3. You can also use asynchronous communication with future process executions by saving successes to a central store. Later executions then check "was I already successful?" and, if so, stop processing before the activity (see the sketch after this list).
* this is powerful for more generalized pipelines, since you can choose where to begin
4. Last resort I know of (and I would love to learn new ways to handle this) is manual re-execution of the failed activities.
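For points 1 and 3, here is a rough, language-agnostic sketch of the idea, shown in Java purely for illustration (this is not ADF code; the CopyStatusStore interface, its method names, and the copy helper are assumptions): skip tables already recorded as copied in a central store, and record each success so a re-run only touches the tables that failed.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdempotentCopyRunner {

    // Minimal stand-in for a central success store (e.g. a control table); assumed interface.
    interface CopyStatusStore {
        Set<String> loadSucceeded();       // tables already copied in this run window
        void markSucceeded(String table);  // record a completed copy
    }

    public static void run(List<String> allTables, CopyStatusStore store) {
        Set<String> done = new HashSet<>(store.loadSucceeded());
        for (String table : allTables) {
            if (done.contains(table)) {
                continue; // already copied on a previous attempt: skip (idempotent re-run)
            }
            try {
                copyTableToDataLake(table); // placeholder for the actual copy work
                store.markSucceeded(table); // only mark after a successful copy
            } catch (RuntimeException e) {
                // leave it unmarked so the next run retries just this table
                System.err.println("Copy failed for " + table + ": " + e.getMessage());
            }
        }
    }

    private static void copyTableToDataLake(String table) {
        // real copy logic would go here
    }
}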
Hope this helps,
J

ScheduledExecutorService: modify one or more running tasks

I have a program that loads a few tasks from a file prepared by the user and starts executing them according to the schedule defined in the file.
Example: taskFile.txt
Task1: run every hour
Task2: run every 2 seconds
...
TaskN: run every monday at 10:00
This first part is OK; I solved it using ScheduledExecutorService and I am very satisfied. The tasks are loaded and run as they should.
Now, let's imagine that the user, via the GUI (at runtime), decides that Task2 should run every minute, and wants to remove Task3.
I cannot find any way to access one specific task in the pool, in order to remove/modify it.
So I cannot update tasks at runtime. When the user changes a task, I can only modify taskFile.txt and restart the application, in order to reload all tasks according to the newly updated taskFile.txt.
Do you know any way to access a single task in order to modify/delete it?
Or even a way to remove one given task, so I can insert a new one into the pool with the modifications wanted by the user.
Thanks
This is not elegant, but works.
Let's suppose you need 10 threads, and sometimes you need to manage a specific thread.
Instead of having one pool with 10 threads, use 10 pools with one thread each, keep them in your favourite data structure, and act on pool_1 when you want to modify thread_1 (see the sketch below).
It's possible to remove the old Runnable from its pool and put in a new one with the needed changes.
Otherwise, anything put into a shared pool becomes anonymous and will not be directly manageable.
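A minimal sketch of this idea, assuming the task names and periods come from taskFile.txt (the TaskRegistry class and its method names are made up for illustration; error handling and cron-style schedules such as "every Monday at 10:00" are left out):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TaskRegistry {

    // One dedicated single-thread scheduler per task, keyed by the task name from taskFile.txt.
    private final Map<String, ScheduledExecutorService> pools = new ConcurrentHashMap<>();

    // Schedule (or replace) a task: shut down its old pool, start a fresh one with the new period.
    public void schedule(String name, Runnable task, long period, TimeUnit unit) {
        remove(name); // drop any previous version of this task
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        pool.scheduleAtFixedRate(task, 0, period, unit);
        pools.put(name, pool);
    }

    // Remove a task entirely by shutting down its dedicated pool.
    public void remove(String name) {
        ScheduledExecutorService old = pools.remove(name);
        if (old != null) {
            old.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TaskRegistry registry = new TaskRegistry();
        registry.schedule("Task2", () -> System.out.println("Task2 ran"), 2, TimeUnit.SECONDS);

        Thread.sleep(5000);

        // The user changes Task2 to run every minute, and removes Task3 (if present).
        registry.schedule("Task2", () -> System.out.println("Task2 ran"), 1, TimeUnit.MINUTES);
        registry.remove("Task3");
    }
}

An alternative worth considering is to keep the ScheduledFuture returned by schedule()/scheduleAtFixedRate() in a map and call cancel() on it when a task changes; that lets you stay with a single shared pool.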
If somebody has a better solution...

Spring Batch - execute a set of steps 'x' times based on a condition

I need to execute a sequence of steps a specific number of times. Any pointers on what is the best way to do this in Spring Batch? I am able to implement executing a single step 'x' times, but my requirement is to execute a set of steps, based on a condition, 'x' times. Any pointers will help.
Thanks
Lakshmi
You could put all the steps in a job and start the whole job several times. There are different ways a job can actually be launched in Spring Batch; have a look at JobOperator and JobLauncher, and then simply implement a loop around launching the job (see the sketch below).
You can do this after the whole Spring context is initialized, so there will be no overhead concerning that. But you must pay attention to the scope of your beans, especially the readers and writers.
Depending on your needs concerning failure handling and restart, you also have to pay attention to how you manage the execution context of your job and steps.
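A minimal sketch of that loop, assuming a Spring-managed JobLauncher and Job are already available (the class name, the "iteration" parameter key, and the number of runs are illustrative assumptions):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class JobLoopRunner {

    private final JobLauncher jobLauncher;
    private final Job myJob;

    public JobLoopRunner(JobLauncher jobLauncher, Job myJob) {
        this.jobLauncher = jobLauncher;
        this.myJob = myJob;
    }

    // Launch the same job 'x' times; a distinct "iteration" parameter makes each
    // run a new job instance instead of a restart of the previous one.
    public void runXTimes(int x) throws Exception {
        for (int i = 0; i < x; i++) {
            JobParameters params = new JobParametersBuilder()
                    .addLong("iteration", (long) i)
                    .toJobParameters();
            jobLauncher.run(myJob, params);
        }
    }
}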
You can simulate a loop within Spring Batch using a JobExecutionDecider:
Put it in front of all the steps.
Store x in the job execution context and check its value in the decider: move to 'END' if x equals the desired value, otherwise increment it and move to the first step of the set.
After the last step, move back to the start (the decider).
A sketch of such a decider is below.
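A rough sketch of that decider, assuming the counter lives under the key "loop.count" in the job execution context (the key, class name, and status names are assumptions, not fixed API):

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class LoopDecider implements JobExecutionDecider {

    private final int targetCount; // desired number of iterations

    public LoopDecider(int targetCount) {
        this.targetCount = targetCount;
    }

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // Read the current loop counter from the job execution context (defaults to 0).
        int count = jobExecution.getExecutionContext().getInt("loop.count", 0);
        if (count >= targetCount) {
            return new FlowExecutionStatus("END"); // stop looping
        }
        // Increment the counter and go around the set of steps once more.
        jobExecution.getExecutionContext().putInt("loop.count", count + 1);
        return new FlowExecutionStatus("CONTINUE");
    }
}

In the job definition, the "CONTINUE" transition would route to the first step of the set, the last step would transition back to this decider, and "END" would map to the end of the job.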

spring batch add job parameters in a step

I have a job with two steps. The first step creates a file in a folder with the following structure:
src/<timestamp>/file.zip
The next step needs to retrieve this file and process it.
I want to add the timestamp to the job parameters. Each job instance is differentiated by the timestamp, but I won't know the timestamp before the first step completes. If I add a timestamp to the job parameters at the beginning of the job, then a new job instance will be started each time, and any incomplete job will be ignored.
I think you can make use of JobExecutionContext instead.
Step 1 gets the current timestamp, uses it to generate the file, and puts it into the JobExecutionContext. Step 2 reads the timestamp from the JobExecutionContext and uses it to construct the input path for its processing (see the sketch below).
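A rough sketch of that hand-off using two tasklets (the key name "timestamp", the class names, and the path layout taken from the question are assumptions; the actual file creation and processing are left as comments):

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// Step 1: generate the file and record the timestamp in the job execution context.
public class CreateFileTasklet implements Tasklet {
    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        String timestamp = String.valueOf(System.currentTimeMillis());
        // ... create src/<timestamp>/file.zip here ...
        chunkContext.getStepContext().getStepExecution()
                .getJobExecution().getExecutionContext()
                .putString("timestamp", timestamp);
        return RepeatStatus.FINISHED;
    }
}

// Step 2: read the timestamp back and build the input path from it.
public class ProcessFileTasklet implements Tasklet {
    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        String timestamp = chunkContext.getStepContext().getStepExecution()
                .getJobExecution().getExecutionContext()
                .getString("timestamp");
        String inputPath = "src/" + timestamp + "/file.zip";
        // ... process inputPath here ...
        return RepeatStatus.FINISHED;
    }
}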
Just to add something on top of your approach of splitting the steps like this: you have to think twice about whether this is really what you want. If Step 1 finished and Step 2 failed, then when the job instance is re-run it will start from Step 2, which means the file is not going to be regenerated by Step 1 (because it is already complete). If that is what you are looking for, that's fine. If not, you may want to consider putting Step 1 and Step 2 into one step instead.