How can I swap elements in a valid sequence of a directed graph (e.g. swap jobs in a sequence of dependent jobs)

I have a directed graph of dependent jobs.
E.g. here job a has to be finished before job b, a before c, and c before d:
   -> c -> d
  /
a -> b
Now given a valid job order (e.g. [a,c,b,d]) I want to swap two directly adjacent jobs (e.g. b and c to get [a,b,c,d]).
For any valid sequence of a given directed graph of jobs, is it sufficient to check if the left job (here c) is a direct predecessor of the right job (here b) in the graph (and if not swap them), to get a valid job sequence?
Is there a scientific proof for this?
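The check described in the question can be sketched as follows. This is only an illustration under stated assumptions: the graph is stored as a map from each job to its direct successors, and the names AdjacentSwap, canSwap, isValid and exampleGraph are hypothetical. The sketch relies on the observation that two directly adjacent jobs have no job between them, so a dependency path from the left job to the right job can only be a direct edge. The brute-force isValid method is included so the result can be verified independently.

```java
import java.util.*;

class AdjacentSwap {
    // edges.get(u) holds the direct successors of u (u must finish before each of them).
    static boolean canSwap(Map<String, Set<String>> edges, List<String> order, int i) {
        String left = order.get(i);
        String right = order.get(i + 1);
        // The swap is allowed only if left is not a direct predecessor of right.
        return !edges.getOrDefault(left, Collections.<String>emptySet()).contains(right);
    }

    // Brute-force validity check: every edge u -> v must have u before v in the order.
    static boolean isValid(Map<String, Set<String>> edges, List<String> order) {
        Map<String, Integer> pos = new HashMap<>();
        for (int i = 0; i < order.size(); i++) pos.put(order.get(i), i);
        for (Map.Entry<String, Set<String>> e : edges.entrySet())
            for (String v : e.getValue())
                if (pos.get(e.getKey()) > pos.get(v)) return false;
        return true;
    }

    // The graph from the question: a -> b, a -> c, c -> d.
    static Map<String, Set<String>> exampleGraph() {
        Map<String, Set<String>> edges = new HashMap<>();
        edges.put("a", new HashSet<>(Arrays.asList("b", "c")));
        edges.put("c", new HashSet<>(Arrays.asList("d")));
        return edges;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> edges = exampleGraph();
        List<String> order = new ArrayList<>(Arrays.asList("a", "c", "b", "d"));
        if (canSwap(edges, order, 1)) Collections.swap(order, 1, 2);
        System.out.println(order + " valid=" + isValid(edges, order));
    }
}
```

Here swapping c and b is accepted because c is not a direct predecessor of b, while swapping a and c would be rejected because of the edge a -> c.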

Related

Airflow exclude task from downstream dependency or reference job outside of subdag

I am currently trying to build a data pipeline in airflow that has many sub-dependencies. I've created subdags (and subsubdags) to achieve the functionality that I want. One thing that I can't figure out, however, is whether I am able to reference a downstream task from within a subdag.
I've included a picture of the data pipeline (image: Data pipeline).
task e needs to be triggered by task c but has no downstream dependencies. I can't find a way to reference a task from a higher level within a subdag. Is there a way to do this?
As a work-around, for now I have just placed task e within the subdag with tasks a, b, c, and d. task f should be triggered when task a, task b, task c, and task d are successful, but it should not matter whether task e has succeeded or failed. Can I set task f to have the trigger one_failed and specify the task_id of the accepted failure?
Any help is appreciated! Thanks!

How do pages work if the DB is manipulated between next-page requests

The code below works as intended, but is there a better way to do it?
I am consuming a DB like a queue and processing it in batches of a maximum size. I'm thinking about how I can refactor it to use page.hasNext() and page.nextPageable().
However, I can't find any good tutorial or documentation on what happens if the DB is manipulated between getting a page and getting the next page.
    // Always fetch the first page: processed customers are removed, so the next batch moves up.
    List<Customer> toBeProcessedList = customerToBeProcessedRepo
            .findFirstXAsCustomer(new PageRequest(0, MAX_NR_TO_PROCESS));
    while (!toBeProcessedList.isEmpty()) {
        // do something with each customer and
        // remove the customer, and its duplicates, from the customersToBeProcessed
        toBeProcessedList = customerToBeProcessedRepo
                .findFirstXAsCustomer(new PageRequest(0, MAX_NR_TO_PROCESS));
    }
If you use the paging support, a new SQL statement gets executed for each page requested, and unless you do something fancy (and probably stupid), these statements run in different transactions. This can lead to seeing elements multiple times, or not seeing them at all, when the user moves from page to page.
Example: page size 3; elements at the start: A, B, C, D, E, F.
The user opens the first page and sees
A, B, C (total number of pages is 2).
Element X gets inserted after B; the user moves to the next page and sees
C, D, E (total number of pages is now 3), so C appears twice.
If, instead of adding X, C gets deleted, page 2 will show
E, F
since D moves to the first page and is never seen.
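The example above can be reproduced with a small in-memory simulation. This is only a sketch: a list stands in for the table, the hypothetical page method mimics an OFFSET/LIMIT query that is evaluated against whatever the table contains at request time, and none of it uses the actual Spring Data API.

```java
import java.util.*;

class OffsetPagingDemo {
    // Mimics an OFFSET/LIMIT query against the current table contents (pageIndex is 0-based).
    static List<String> page(List<String> rows, int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, rows.size());
        int to = Math.min(from + pageSize, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    static List<String> initialRows() {
        return new ArrayList<>(Arrays.asList("A", "B", "C", "D", "E", "F"));
    }

    public static void main(String[] args) {
        System.out.println("first page: " + page(initialRows(), 0, 3));

        // Case 1: X is inserted after B before the second page is requested.
        List<String> afterInsert = initialRows();
        afterInsert.add(2, "X");
        System.out.println("second page after insert: " + page(afterInsert, 1, 3));

        // Case 2: C is deleted before the second page is requested.
        List<String> afterDelete = initialRows();
        afterDelete.remove("C");
        System.out.println("second page after delete: " + page(afterDelete, 1, 3));
    }
}
```

In the first case C shows up on both pages; in the second case D is never shown, because it slid onto the page the user had already left.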
In theory one could have a long-running transaction with read stability (if supported by the underlying database) so that one gets consistent pages, but this opens up questions like:
When does this transaction end, so that the user gets to see new or changed data?
When does this transaction end if the user moves away?
This approach would have rather high resource costs, while the actual benefit is not at all clear.
So in 99 of 100 cases the default approach is pretty reasonable.
Footnote: I kind of assumed relational databases, but other stores should behave in basically the same way.

How to get Goal Funnel Step data such as "entered" and "proceeded" through Query API?

When looking at the Goal Funnel report on the Google Analytics website, I can see not only the number of goal starts and completions but also how many visits each step received.
How can I find the step data through the Google Analytics API?
I am testing with the Query Explorer on a goal with 3 steps, whose first step is marked as Required.
I was able to get the starts and completions by using goalXXStarts and goalXXCompletions:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3A90593258&start-date=2015-09-12&end-date=2015-10-12&metrics=ga%3Agoal7Starts%2Cga%3Agoal7Completions
However, I can't figure out a way to get the data for the second step of the goal.
I tried using ga:users or ga:uniquePageViews with the URL of step 2 and ga:previousPagePath set to step 1 (required = true), and adding to that the ga:users or ga:uniquePageViews from the next stage with ga:previousPagePath of step 1 (since it is required = true) for backfill.
I also tried other combinations, but could never reach the right number or close to it.
One technique that can be used to perform conversion funnel analysis with the Google Analytics Core Reporting API is to define a segment for each step in the funnel. If the first step of the funnel is a 'required' step, then that step must also be included in segments for each of the subsequent steps.
For example, if your funnel has three steps named A, B, and C, then you will need to define a segment for A, another for B, and another again for C.
If step A is required then:
Segment 1: viewed page A,
Segment 2: viewed page A and viewed page B,
Segment 3: viewed page A and viewed page C.
Otherwise, if step A is NOT required then:
Segment 1: viewed page A,
Segment 2: viewed page B,
Segment 3: viewed page C.
To obtain the count for each step in the funnel, you perform a query against each segment to obtain the number of sessions where that segment matches. Additionally, you can query the previous and next pages, including entrances and exits, for each step (if you need to); in which case, query previousPagePath and pagePath as dimensions along with metrics uniquePageviews, entrances and exits. Keep in mind the difference between 'hit-level' vs 'session-level' data when performing, constructing and interpreting the results of each query.
You can also achieve similar results by using sequential segmentation which will offer you finer control over how the funnel steps are counted, as well as allowing for non-sequential funnel analysis if required.
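The segment definitions above can be illustrated on in-memory data. This is a sketch only: it does not call the Google Analytics API, the sessions and page paths are made up, and the names FunnelSegments, countSessions and sampleSessions are hypothetical. Each session is a list of viewed pages, and a segment matches a session when every required page was viewed at some point in it (session-level matching, order ignored).

```java
import java.util.*;

class FunnelSegments {
    // Counts sessions that viewed every page in `required`, in any order.
    static long countSessions(List<List<String>> sessions, String... required) {
        long n = 0;
        for (List<String> pages : sessions)
            if (pages.containsAll(Arrays.asList(required))) n++;
        return n;
    }

    // Made-up sessions, each listing the page paths viewed.
    static List<List<String>> sampleSessions() {
        return Arrays.asList(
            Arrays.asList("/a", "/b", "/c"),
            Arrays.asList("/a", "/b"),
            Arrays.asList("/b", "/c"),   // never viewed the required step A
            Arrays.asList("/a"));
    }

    public static void main(String[] args) {
        List<List<String>> s = sampleSessions();
        // Step A is required, so it is part of every segment.
        System.out.println("segment 1 (A):       " + countSessions(s, "/a"));
        System.out.println("segment 2 (A and B): " + countSessions(s, "/a", "/b"));
        System.out.println("segment 3 (A and C): " + countSessions(s, "/a", "/c"));
    }
}
```

The session that skipped A is excluded from every step, which is the effect of including the required step in each segment.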

How to implement a content driven workflow in jBPM/activiti/YAWL?

I need a workflow with human tasks, where each node can be revisited at any point of execution.
              / A1 --- B1 \
             /             \
Start - AND                 AND - End
             \             /
              \ A2 --- B2 /
So even if the current execution is at B2, the user can go to A1 (assuming A1 is assigned to the same user and is already done).
How can I model this behavior in jBPM/Activiti - since the task once completed is deleted from the execution chain?
Is there any other workflow engine which allows me to do this?
I'm afraid this is difficult to achieve because BPMN is intentionally stateless.
Let's suppose you're at End and want to return to A1. It means that the process token somehow teleports to A1, flows to B1, to the second AND and... we're stuck here. As far as I understand, AND means a BPMN parallel gateway. And it does not allow us to go further because it needs two input tokens at once to produce an output token.
Probably you can adapt another approach, called a Finite State Machine. Imagine that your document (piece of content, token, et cetera) which flows through the workflow has a state attribute, which can be one of several values: Start, A1, etc. Besides this, you have a transition table of the following format:
from  | to
------+-----
Start | AND
AND   | A1
AND   | A2
Thus, your system knows that AND follows Start. This table does not actually specify a classical finite state machine because it defines a non-functional relation; in other words, a state may be followed by two or more states, as shown in your diagram. It's up to you how to implement this. Maybe you can create two copies of the instance, or use a combined state attribute which is a list of values.
In any case, with this approach 1) the system knows how to switch states automatically, and 2) you can build a UI to switch them manually.
The choice of workflow or state machine library, if you decide to use one, depends on your platform, programming language, and requirements.
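A minimal sketch of the transition-table idea, assuming the "combined state attribute" variant in which the document holds a set of active states. The class and method names (WorkflowStates, allow, next, advance, fromAnswerTable) are hypothetical and not taken from jBPM, Activiti, or any other library, and only the three table rows listed above are filled in.

```java
import java.util.*;

class WorkflowStates {
    // Transition table: one "from" state may map to several "to" states.
    private final Map<String, Set<String>> transitions = new HashMap<>();

    void allow(String from, String to) {
        transitions.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    Set<String> next(String from) {
        return transitions.getOrDefault(from, Collections.<String>emptySet());
    }

    // Automatic switch: replace each active state with all of its successors.
    // States without successors in the table simply drop out of the result.
    Set<String> advance(Set<String> active) {
        Set<String> result = new LinkedHashSet<>();
        for (String s : active) result.addAll(next(s));
        return result;
    }

    // Only the rows shown in the table above.
    static WorkflowStates fromAnswerTable() {
        WorkflowStates w = new WorkflowStates();
        w.allow("Start", "AND");
        w.allow("AND", "A1");
        w.allow("AND", "A2");
        return w;
    }

    public static void main(String[] args) {
        WorkflowStates w = fromAnswerTable();
        Set<String> active = new LinkedHashSet<>(Collections.singletonList("Start"));
        active = w.advance(active);   // automatic: Start -> AND
        active = w.advance(active);   // automatic: AND -> A1 and A2 in parallel
        System.out.println("active after two steps: " + active);

        // Manual switch from a UI: the user jumps back to A1 regardless of the table.
        active = new LinkedHashSet<>(Collections.singletonList("A1"));
        System.out.println("active after manual jump: " + active);
    }
}
```

Because the state is just data, revisiting a node is an assignment rather than a token that has to be routed back through a gateway.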

jBPM5 Task Instances Sequence for Display

Please assume the following scenario.
In a process flow there are three human tasks: humanTask1, humanTask2 and humanTask3. All three are assigned to the same user, userA.
Now assume there are two live process instances (p1, p2). Each process instance can be at a different task.
That is, for p1 the task status is humanTask3 in progress, and for p2 it is humanTask1 in progress.
To display the tasks for this userA in a web page, I want them to be ordered as they appear in workflow design like,
p2-humanTask1
p1-humanTask3
taskService.getTasksOwned() may not return the tasks list in this order.
How do I ensure the tasks are displayed in this sequence?
I am using jBPM 5.3; LocalTaskService;
AFAIK, there is no out-of-the-box solution for this. You will have to decide on your own order and then apply it, either by sorting the result of taskService.getTasksOwned() or by creating and executing a customized version of the query that taskService.getTasksOwned() uses.
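The first option, sorting the returned list yourself, could look roughly like this. It is a sketch under assumptions: TaskInfo is a hypothetical stand-in for whatever objects the task service returns (it is not a jBPM class), and the workflow order is supplied by hand as a list of task names.

```java
import java.util.*;

class TaskOrdering {
    // Hypothetical stand-in for the task objects returned by the task service.
    static final class TaskInfo {
        final String processInstance;
        final String taskName;

        TaskInfo(String processInstance, String taskName) {
            this.processInstance = processInstance;
            this.taskName = taskName;
        }

        @Override
        public String toString() {
            return processInstance + "-" + taskName;
        }
    }

    // Sorts a copy of the tasks by the position of their name in workflowOrder.
    // Names missing from workflowOrder get index -1 and therefore sort first.
    static List<TaskInfo> sortByWorkflowOrder(List<TaskInfo> tasks, List<String> workflowOrder) {
        List<TaskInfo> sorted = new ArrayList<>(tasks);
        sorted.sort(Comparator.comparingInt((TaskInfo t) -> workflowOrder.indexOf(t.taskName)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> workflowOrder = Arrays.asList("humanTask1", "humanTask2", "humanTask3");
        List<TaskInfo> owned = Arrays.asList(
            new TaskInfo("p1", "humanTask3"),
            new TaskInfo("p2", "humanTask1"));
        System.out.println(sortByWorkflowOrder(owned, workflowOrder));
    }
}
```

With real task objects, the same comparator would read the task name from whatever accessor they expose.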
Hope it helps,