What is the BPMN User Task equivalent in temporal.io, and how do I implement it?

I'm evaluating temporal.io as a modern workflow-as-code alternative to BPMN-based solutions such as Camunda.
In my scenario the workflow orchestrates activity workers, which call external microservices for business transactions. Business transactions may encounter business exceptions or require human action before the flow can proceed, and they raise the required user tasks. The workflow should block at certain points until there are no blocking tasks for that specific activity.
Should the blocking task logic reside inside activities and services, keeping the workflow definition more abstract and deterministic? I presume an activity should simply throw a runtime exception when there is a blocking task, is that right? Then, how do I continue the workflow when the task is completed?
Or should I use workflow signals to mimic BPMN user tasks, and if so, how do I send a signal from an external service to a specific workflow instance?

Probably the easiest way is to have an activity that notifies the external system responsible for interacting with human actors, and then use signals to notify the workflow once the human decision has been made.
With Temporal you can write workflow code that waits for multiple signals in case there are multiple actors or decisions involved.
Other options: store the list of tasks in an external system and have that system notify your workflow directly (again via signals); or run one workflow per "approver" that holds the list of assigned tasks, which you can query for state or which can send a notification when all tasks have been performed.
"how do I send a signal from an external service to a specific workflow instance?"
You would use the Temporal SDK client API to send a signal to a workflow execution, which is uniquely identified by its namespace and workflow ID (plus an optional run ID); the task queue is not part of that identity. Not sure which programming language you are using, but maybe this Go sample can help.
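To make the notify-then-wait-for-signal pattern concrete, here is a minimal sketch in plain asyncio. It is a simulation only and does not use the Temporal SDK: `ApprovalWorkflow`, `notify_task_system`, and `signal_approval` are hypothetical names standing in for a workflow, the notifying activity, and a signal handler. In a real workflow the wait would be a durable SDK primitive rather than an `asyncio.Event`.

```python
import asyncio


class ApprovalWorkflow:
    """Toy stand-in for a workflow that blocks on a human decision."""

    def __init__(self, workflow_id):
        self.workflow_id = workflow_id
        self.notified = []
        self._decision = None
        self._decided = asyncio.Event()

    async def notify_task_system(self, task):
        # Hypothetical "activity": tell the external task system a user task exists.
        self.notified.append(task)

    def signal_approval(self, decision):
        # Hypothetical "signal handler": called when the human completes the task.
        self._decision = decision
        self._decided.set()

    async def run(self):
        await self.notify_task_system("review-order")
        await self._decided.wait()  # block until the signal arrives
        return f"order {self._decision}"


async def demo():
    wf = ApprovalWorkflow("order-42")
    running = asyncio.create_task(wf.run())
    await asyncio.sleep(0)          # let the workflow reach its wait point
    wf.signal_approval("approved")  # the external service delivers the decision
    return await running


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The workflow body stays a straight-line sequence (notify, wait, continue), which is the property that makes this style a natural fit for user tasks.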

Related

What should I consider when using Cadence/Temporal to design a new project?

I am new to Cadence/Temporal and was wondering what the design review process is like. My team is ready to hold a formal design review, and I was wondering if there is a template available to capture Cadence/Temporal-specific information.
This is something I like to call "workflow-oriented architecture". I would suggest thinking about the aspects below:
Which part of the process in the design can be modeled as a workflow, and what are the different options/alternatives? Based on that:
What will the workflowID be, and with which IDReusePolicy? It is usually recommended to use a business ID to guarantee uniqueness, so that only one workflow is executing for a business entity at a time.
How is the workflow started, and with what information as input parameters?
Which Cadence/Temporal concepts are you planning to use, and how does a workflow interact with other systems?
A regular/local/long-running activity performs an action against an external system.
A durable timer (use workflow.Sleep or Workflow.Await) waits for a certain time and then wakes up. Unlike sleep in the native language, a durable timer is reliable: a host restart won't prevent it from firing.
A signal receives an event from an external system.
A query lets an external system read some workflow state.
Search attributes do two things: a) let an application search for workflows matching some conditions using the ListWorkflowExecutions API, and b) let an application get basic status via the DescribeWorkflowExecution API.
How do you handle failure, especially using Cadence/Temporal concepts such as activityRetry, workflowRetry, and reset?

Reuse Jobs in GitHub Actions Workflow

I’m migrating a pipeline from CircleCI to GitHub Actions and am finding it a bit weird that I can only run jobs once, instead of creating a job and then calling it from the workflow section, which would make it possible to call a job multiple times without duplicating the commands/scripts in that job.
My pipeline pushes out code to three environments, then runs a Lighthouse scan for each of them. In CircleCI I have 1 job to push the code to my envs and 1 job to run Lighthouse. Then from my workflow section, I just call the jobs 3 times, passing the env as a parameter. Am I missing something or is there no way to do this in GitHub Actions? Do I just have to write out my commands 3 times in each job?
There are three main approaches to reusing code in GitHub Actions:
Reusing workflows
The obvious option is using the "Reusable workflows" feature that allows you to extract some steps into a separate "reusable" workflow and call this workflow as a job in other workflows.
Takeaways:
Reusable workflows can't call other reusable workflows.
The strategy property is not supported in any job that calls a reusable workflow.
Env variables and secrets are not inherited.
It's not convenient if you need to extract and reuse several steps inside one job.
Since it runs as a separate job, you have to use build artifacts to share files between a reusable workflow and your main workflow.
You can call a reusable workflow in synchronous or asynchronous manner (managing it by jobs ordering using needs keys).
A reusable workflow can define outputs that extract outputs/outcomes from executed steps. They can be easily used to pass data to the "main" workflow.
Dispatched workflows
Another possibility that GitHub gives us is workflow_dispatch event that can trigger a workflow run. Simply put, you can trigger a workflow manually or through GitHub API and provide its inputs.
There are actions available on the Marketplace which allow you to trigger a "dispatched" workflow as a step of "main" workflow.
Some of them also allow doing it in a synchronous manner (waiting until the dispatched workflow is finished). It is worth noting that this is implemented by polling the status of the repo's workflow runs, which is not very reliable, especially in a concurrent environment. It is also bounded by GitHub API usage limits, so there is a delay in learning the status of the dispatched workflow.
Takeaways:
You can have multiple nested calls, triggering a workflow from another triggered workflow. If done carelessly, this can lead to an infinite loop.
You need a special token with "workflows" permission; your usual secrets.GITHUB_TOKEN doesn't allow you to dispatch a workflow.
You can trigger multiple dispatched workflows inside one job.
There is no easy way to get some data back from dispatched workflows to the main one.
Works better in a "fire and forget" scenario. Waiting for a dispatched workflow to finish has some limitations.
You can observe dispatched workflow runs and cancel them manually.
Composite Actions
In this approach we extract steps into a distinct composite action, which can be located in the same or a separate repository.
From your "main" workflow it looks like a usual action (a single step), but internally it consists of multiple steps, each of which can call its own actions.
Takeaways:
Supports nesting: each step of a composite action can use another composite action.
Poor visualisation of internal step runs: in the "main" workflow it is displayed as a usual step run. In the raw logs you can find details of the internal steps' execution, but they are not very readable.
Shares environment variables with a parent job, but doesn't share secrets, which should be passed explicitly via inputs.
Supports inputs and outputs. Outputs are prepared from outputs/outcomes of internal steps and can be easily used to pass data from composite action to the "main" workflow.
A composite action runs inside the job of the "main" workflow. Since they share a common file system, there is no need to use build artifacts to transfer files from the composite action to the "main" workflow.
You can't use continue-on-error option inside a composite action.
Source: my "DRY: reusing code in GitHub Actions" article
I'm currently in the exact same boat and just found an answer. You're looking for a Composite Action, as suggested in this answer.
Reusable workflows can't call other reusable workflows.
Actually, they can, since Aug. 2022:
GitHub Actions: Improvements to reusable workflows
Reusable workflows can now be called from a matrix and other reusable workflows.
You can now nest up to 4 levels of reusable workflows giving you greater flexibility and better code reuse.
Calling a reusable workflow from a matrix allows you to create richer parameterized builds and deployments.
Learn more about nesting reusable workflows.
Learn more about using reusable workflows with the matrix strategy.

What is the exact use-case for ContinueAsNew

Team,
What is the exact use case for continueAsNew?
Since we have support for CronSchedule to run periodic activities, I don't see the scenario where it would be used.
Do we have it only for backward compatibility?
There are many scenarios besides cron that require always-running workflows. For example, a workflow that listens for external events and keeps some aggregated state. Such a workflow will eventually exceed the history size limit. To let it process an unlimited number of events, it has to call continue-as-new periodically.
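The idea can be illustrated with a small stdlib-only simulation; this is not Cadence/Temporal SDK code. The names `run_once`, `run_forever`, and the `MAX_EVENTS_PER_RUN` threshold are hypothetical, with the threshold standing in for the history size limit. Each run processes a bounded number of events and then hands its aggregated state to a fresh run.

```python
from dataclasses import dataclass

MAX_EVENTS_PER_RUN = 3  # hypothetical threshold standing in for the history limit


@dataclass
class RunResult:
    total: int        # aggregated state carried into the next run
    continued: bool   # True if the run ended by "continuing as new"
    consumed: int     # number of events this run processed


def run_once(total, events):
    """Process events until the per-run budget is used up."""
    consumed = 0
    for value in events:
        if consumed == MAX_EVENTS_PER_RUN:
            # Budget reached: stop and hand the state to a fresh run.
            return RunResult(total, True, consumed)
        total += value
        consumed += 1
    return RunResult(total, False, consumed)


def run_forever(events):
    """Chain runs; each starts with an empty history and the previous total."""
    total, runs = 0, 0
    while True:
        result = run_once(total, events)
        runs += 1
        total, events = result.total, events[result.consumed:]
        if not result.continued:
            return total, runs
```

Seven events with a budget of three take three chained runs, and the aggregate survives each hand-over because it is passed as the input of the next run.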

Architecting a configurable user notification service

I am building an application which needs to send notifications to users at a fixed time of day. Users can choose which time of day they would like to be notified, and which days they would like to be notified. For example, a user might like to be notified at 6am every day, or 7am only on week days.
On the back-end, I am unsure how to architect the service that sends these notifications. The solution needs to handle:
concurrency, so I can scale my servers (notifications should not be duplicated)
system restarts
if a user changes their preferences, pending notifications should be rescheduled
Using a message broker such as RabbitMQ and task scheduler such as Celery may meet your requirements.
Asynchronous, or non-blocking, processing is a method of separating the execution of certain tasks from the main flow of a program. This provides you with several advantages, including allowing your user-facing code to run without interruption.
Message passing is a method which program components can use to communicate and exchange information. It can be implemented synchronously or asynchronously and can allow discrete processes to communicate without problems. Message passing is often implemented as an alternative to traditional databases for this type of usage because message queues often implement additional features, provide increased performance, and can reside completely in-memory.
Celery is a task queue built on an asynchronous message-passing system. It can be used as a bucket into which programming tasks are dumped. The program that passed the task can continue to execute and stay responsive, and later poll Celery to see if the computation is complete and retrieve the result.
While Celery is written in Python, its protocol can be implemented in any language; the standard worker is simply the Python implementation. If the language has an AMQP client, it shouldn’t take much work to create a worker in your language. A Celery worker is just a program connecting to the broker to process messages.
There is also another way to be language independent: use REST tasks, where your tasks are URLs instead of functions. With this approach you can even create simple web servers that enable preloading of code. Simply expose an endpoint that performs an operation, and create a task that just performs an HTTP request to that endpoint.
Here is the Python example from the official documentation:
    from celery import Celery
    from celery.schedules import crontab

    app = Celery()

    @app.on_after_configure.connect
    def setup_periodic_tasks(sender, **kwargs):
        # Calls test('hello') every 10 seconds.
        sender.add_periodic_task(10.0, test.s('hello'), name='add every 10')

        # Calls test('world') every 30 seconds.
        sender.add_periodic_task(30.0, test.s('world'), expires=10)

        # Executes every Monday morning at 7:30 a.m.
        sender.add_periodic_task(
            crontab(hour=7, minute=30, day_of_week=1),
            test.s('Happy Mondays!'),
        )

    @app.task
    def test(arg):
        print(arg)
As I see it, you need 3 types of entities: users (to store an email address or some other way to reach the user), notifications (to store what you want to send to the user, such as the text) and schedules (to store when the user wants to get notifications). You need to store entities of those types in some kind of database.
A schedule should be connected to a user; a notification should be connected to a user and a schedule.
Assume you have a cron job that starts some script every minute. This script fetches all notifications connected with a schedule for the current time (the job's start time). Don't forget to implement some type of overlap prevention.
The script then places tasks (with all needed data: type of notification, users to notify, etc.) in a queue (beanstalkd or something similar). You can create as many workers (even on different physical instances) as you want to serve this queue, without worrying about duplication, which gives you a lot of scalability.
If a user changes their schedule, it affects all of their notifications at the same moment. There are no pending notifications, as they are only created when they really should be sent.
This is a very high-level description. Many things depend on the language, database(s), queue server, and worker implementation.
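A minimal sketch of the per-minute script described above, using only the standard library. The `Schedule` fields, the `due_jobs` function, and the in-memory `already_sent` set are all assumptions made for illustration; in practice the duplicate guard would live in a shared store (for example a unique key in the database), and time zones, which this sketch ignores, would need handling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Schedule:
    user_id: int
    hour: int
    minute: int
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday, as in datetime.weekday()


def due_jobs(schedules, now, already_sent):
    """Return (user_id, minute_key) jobs due at `now`, skipping duplicates."""
    minute_key = now.strftime("%Y-%m-%dT%H:%M")
    jobs = []
    for s in schedules:
        matches = (
            (s.hour, s.minute) == (now.hour, now.minute)
            and now.weekday() in s.weekdays
        )
        job = (s.user_id, minute_key)
        if matches and job not in already_sent:
            # Overlap guard: a second tick in the same minute is a no-op.
            already_sent.add(job)
            jobs.append(job)
    return jobs
```

Each returned job would then be pushed onto the queue for the workers. Because schedules are read fresh on every tick, a changed preference takes effect at the next minute without any rescheduling step.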

Windows Workflow: Persistence and Polling

I'm currently learning the WF framework, so bear with me; mostly I'm looking for where to start looking, not necessarily a direct answer. I just can't seem to figure out how to begin researching what I'd like in The Google.
Let's say I have a simple one-step workflow (much more complicated than that, but for simplicity's sake). This workflow needs to watch a certain record in the database to see when it changes. I don't have the capability to "push" via a trigger from the database when the row changes, so I need to poll for it every so often.
This workflow needs to be persisted to the database to be durable against restarts and whatnot as this is a long-running workflow. I'm trying to figure out the best way to get it to check every 3 minutes or so and also persist to the database. Do the persistence capabilities of the framework allow for that? It seems to be time-based. And since the workflow won't be reawakened by an external event, how does it reload from the database and check the same step it did previously again? Does it attempt the last unfulfilled activity automatically upon reloading?
Do "while" activities with a delay attached to it work at all, or can it be handled solely through the persistence services?
I'm not sure what you mean by "handled solely through persistence services". Persistence refers only to the storing of an idle workflow.
You could have a Delay and a Code activity in a Sequence inside a While loop. While in the Delay, the workflow will go idle and may be persisted if necessary. However, depending on how much state has to be persisted and how many such workflows are running at any one time, a leaner approach may be necessary.
A leaner approach would be to externalise the DB watching and have a "DB watching" workflow service raise an event when the desired change has occurred. This service would be added to the Workflow runtime.
To that end you need a service contract, which is defined by an interface with the [ExternalDataExchange] attribute. This interface in turn defines an event that the service will raise when the desired DB change is detected. It also defines a method that a workflow can call to specify what change the service should be looking for. The method should accept an instance GUID so that the requesting instance can be found when the DB change is detected.
In the workflow you use a CallExternalMethodActivity to call this service's method. You then flow to a HandleExternalEventActivity, which listens for the event. At this point the workflow will go idle and can be persisted. It will remain there until the service raises the event.
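The shape of such a service can be sketched in a few lines of Python. This is only a language-neutral analogue and not Windows Workflow API: `DbWatcherService`, `watch`, and `poll` are hypothetical names, where `watch` plays the role of the method the workflow calls and the `on_changed` callback plays the role of the raised event.

```python
class DbWatcherService:
    """Toy analogue of the external "DB watching" service."""

    def __init__(self):
        # instance id -> (record key, last seen value, callback)
        self._watches = {}

    def watch(self, instance_id, record_key, current_value, on_changed):
        # Counterpart of the method a workflow calls to register interest.
        self._watches[instance_id] = (record_key, current_value, on_changed)

    def poll(self, read_record):
        # One polling pass, driven by a timer owned by the service,
        # so idle workflows do not have to wake up themselves.
        for instance_id, (key, last, on_changed) in list(self._watches.items()):
            value = read_record(key)
            if value != last:
                del self._watches[instance_id]
                on_changed(instance_id, value)  # counterpart of raising the event
```

A single polling loop serves every waiting instance, which is what makes this leaner than one Delay loop per workflow.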