Best practice to build & deploy Temporal workflows - deployment

I am using the Go SDK from Temporal, and I was wondering what the best-practice way is to package and deploy workflows.
Can I bundle all my workflows and activities into one worker service? Is there any limitation to doing this, or is it recommended to build/deploy each workflow separately?
Also, I would like to expose HTTP endpoints to trigger the workflows. What is the best practice for this if I deploy Temporal on Kubernetes (GKE): expose an Ingress/Service resource?
Thanks!

From a technical point of view, Temporal doesn't impose any specific requirements on packaging. It supports a single bundle that contains any number of workflows and activities, and it equally supports deploying a single activity or workflow type independently.
Treat workflows and activities as long-running operations and the unit of deployment as a microservice; the same reasoning that applies to microservices applies here. So if collocating workflows and activities makes sense from a code and operational point of view, do it.
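For illustration, here is a minimal sketch of a single worker process that hosts several workflow and activity types and also exposes a plain HTTP endpoint to start one of them. It assumes a recent Temporal Go SDK; the workflow/activity names, task queue, and frontend address are hypothetical placeholders rather than prescribed names.

```go
package main

import (
	"context"
	"log"
	"net/http"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

// Hypothetical workflow and activity stubs, only here to keep the sketch self-contained.
func OrderWorkflow(ctx workflow.Context, id string) error      { return nil }
func RefundWorkflow(ctx workflow.Context, id string) error     { return nil }
func ChargeCardActivity(ctx context.Context, id string) error  { return nil }
func SendReceiptActivity(ctx context.Context, id string) error { return nil }

func main() {
	// Connect to the Temporal frontend (the address is an assumption; point it at your cluster).
	c, err := client.Dial(client.Options{HostPort: "temporal-frontend.temporal:7233"})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// A single worker polling one task queue can host any number of workflow and activity types.
	w := worker.New(c, "example-task-queue", worker.Options{})
	w.RegisterWorkflow(OrderWorkflow)
	w.RegisterWorkflow(RefundWorkflow)
	w.RegisterActivity(ChargeCardActivity)
	w.RegisterActivity(SendReceiptActivity)

	// A plain HTTP endpoint that starts a workflow; expose it through a Service/Ingress as usual.
	http.HandleFunc("/orders", func(rw http.ResponseWriter, r *http.Request) {
		opts := client.StartWorkflowOptions{TaskQueue: "example-task-queue"}
		run, err := c.ExecuteWorkflow(r.Context(), opts, OrderWorkflow, r.URL.Query().Get("id"))
		if err != nil {
			http.Error(rw, err.Error(), http.StatusInternalServerError)
			return
		}
		rw.Write([]byte(run.GetID()))
	})
	go func() { log.Fatal(http.ListenAndServe(":8080", nil)) }()

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker exited:", err)
	}
}
```

Whether the HTTP handler lives in the same process as the worker (as above) or in a separate front-end service is again an ordinary microservice design decision.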

Related

Horizontal Scalability with Drools

Knowing that Drools works with in-memory data, is there a way to distribute horizontally across different Drools instances to enhance performance when performing CRUD operations on rules, fact types, etc.? I guess the instances would need to be in sync with each other in some way, so they all have the same data in memory or share a knowledge base somehow. I'm fairly new to Drools and trying to research a way to move a monolith to a cloud environment (GCP) so it can take advantage of load balancing, scaling, etc. I want to know if there is any feature in Drools itself that supports this, or any way to implement it myself. Thanks in advance for any information/documentation/use case on this matter.
I haven't tried anything yet, but my goal is to improve performance and availability through automatic scaling or by supporting multiple instances of my app.
I'm not sure what kind of "CRUD" you're doing on Drools (or how). But if you just want to deploy new rules (for example), then this is identical to pushing any data or application changes to your deployment in a distributed system -- either your nodes are gradually updated, so during the upgrade process you have some mix of old and new logic/code; or you deploy new instances with the new logic/code and then transition traffic to your new instances and away from the old ones -- either all at once or in a controlled blue/green (or similar) fashion.
If you want to split a monolith, I think the best approach for you would be to consider Kogito [1] and a microservice architecture. With microservices, you could even consider a Function-as-a-Service approach: small, immutable service instances that are just executed and disposed of. Kogito mainly targets the Quarkus platform, but there are also some Spring Boot examples. There is also an OpenShift operator available.
As for sharing the working memory, there was a project in the KIE community called HACEP [2]. Unfortunately it is now deprecated, and we are researching other solutions for persisting the working memory.
[1] https://kogito.kie.org/
[2] https://github.com/kiegroup/openshift-drools-hacep
The term "entry point" is related to the fact that we have multiple partitions in a Working Memory and you can choose which one you are inserting into. If you can organize your business logic to work with different entry points you can process 'logical partitions' on different machines in parallel safely. At a glance drools entry points gives you something like table partitioning in Oracle which implies the same options.
Use load balancer with sticky sessions if you can (from business point of view) partition 'by client'
Your question looks more like an architecture question.
As a start, I would have a look at the Kie Execution Server component provided with Drools, which helps you create decision microservices based on Drools rulesets.
Kie Execution Server (used in stateless mode by clients) could be embedded in different pods/instances/servers to ensure horizontal scalability.
As mentioned by @RoddyoftheFrozenPeas, one of the problems you'll face will be the simultaneous hot deployment of new rulesets on the "swarm" of KIE Servers that hosts your services.
That would have to be handled using a proper devops strategy.
Best
Emmanuel

How to trigger an Argo workflow from an API request?

What is the best way to trigger an Argo workflow from an API request?
The API request is handled by a web server; how does the server submit the workflow to the Argo server? Using the CLI? Using a REST request? What is the best/recommended approach here?
There's no one "right way." But here are some of the options, so you can pick the one that makes the most sense for your application:
Use the Argo API
with an SDK (Java, Go, Python)
If your API is written in Java, Go, or Python, and if your interactions with Argo are more complex than simply submitting a Workflow (for example, if you're also listing Workflows and would like a nice representation of those objects), an Argo Workflows SDK might be a good choice. In my experience the SDKs have quirks and bugs, so I'd only dive in if you need a more full-featured client.
directly with some HTTP client
If your use case is very simple (like submitting a small Workflow with a WorkflowTemplate reference), I would recommend using a direct HTTP call to the Argo or Kubernetes API.
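As a rough sketch of the direct-HTTP option in Go: the endpoint path, payload shape, and auth header below are assumptions based on the Argo server's REST API, so verify them against the API reference served by your Argo server version; the server address and template name are placeholders.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// submitFromTemplate posts a submit request to the Argo server.
// The path /api/v1/workflows/{namespace}/submit and the body fields are assumptions;
// check them against your server's API reference.
func submitFromTemplate(argoServer, namespace, templateName, token string) error {
	url := fmt.Sprintf("%s/api/v1/workflows/%s/submit", argoServer, namespace)
	body := fmt.Sprintf(`{"resourceKind":"WorkflowTemplate","resourceName":"%s"}`, templateName)

	req, err := http.NewRequest(http.MethodPost, url, bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token) // e.g. a service-account token, depending on your auth mode

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("argo server returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Placeholder values; all of these are assumptions about your environment.
	if err := submitFromTemplate("https://argo-server.argo:2746", "argo", "hello-template", "<token>"); err != nil {
		panic(err)
	}
}
```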
Use a webhook
The webhook endpoint is technically part of the API, but it's a bit different. The API is basically a specialized version of the Kubernetes API, tailored to the Argo CRDs. The events API endpoint provides some additional features specific to kicking off workflows.
Use the CLI
You'd have to fork the CLI process from your server code, so this probably isn't the "cleanest" approach.
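If you do go this route, the fork typically looks something like the sketch below. It assumes the argo binary is on the PATH and that a WorkflowTemplate named hello-template exists in an argo namespace; both names are placeholders.

```go
package main

import (
	"log"
	"os/exec"
)

func main() {
	// Shell out to the argo CLI from the server process; the template and namespace
	// names here are hypothetical.
	cmd := exec.Command("argo", "submit",
		"--from", "workflowtemplate/hello-template",
		"-n", "argo")
	out, err := cmd.CombinedOutput()
	log.Printf("argo submit output:\n%s", out)
	if err != nil {
		log.Fatalln("argo submit failed:", err)
	}
}
```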
Use Argo Events
Argo Events is a separate but closely-related project. It can accept a variety of inputs (webhooks, pub/sub messages, etc) and then trigger a Workflow.
Argo Events could make sense if, for example, you want an external record of all the workflows submitted. Pub/sub would give you that record.
Use the Kubernetes API or CLI
Workflows are just Kubernetes resources, so you can just submit them via Kubernetes mechanisms if you like. If your language has a robust Kubernetes SDK, that's a solid choice.
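For example, here is a minimal sketch using client-go's dynamic client to create a Workflow resource that references a WorkflowTemplate; the namespace and template name are hypothetical.

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config; use clientcmd instead when running outside the cluster.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalln(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatalln(err)
	}

	// Argo Workflows are custom resources in the argoproj.io/v1alpha1 group/version.
	gvr := schema.GroupVersionResource{Group: "argoproj.io", Version: "v1alpha1", Resource: "workflows"}

	wf := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "argoproj.io/v1alpha1",
		"kind":       "Workflow",
		"metadata":   map[string]interface{}{"generateName": "hello-"},
		"spec": map[string]interface{}{
			// Referencing an existing WorkflowTemplate keeps the payload small;
			// the template name and namespace below are hypothetical.
			"workflowTemplateRef": map[string]interface{}{"name": "hello-template"},
		},
	}}

	created, err := dyn.Resource(gvr).Namespace("argo").Create(context.Background(), wf, metav1.CreateOptions{})
	if err != nil {
		log.Fatalln(err)
	}
	log.Println("submitted workflow", created.GetName())
}
```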
As I'm sure you can tell, it really depends on the application. Let me know if any of these needs clarification.

Swagger multiple definitions best practices

I have an API and a consumer web app, both written in Node and Express. The API is defined by an OpenAPI Specification, implemented by swagger-ui-express.
The above web apps are Dockerised and managed in Kubernetes.
The API has a handful of endpoints for managing the lifecycle of a user's registration/application to the service.
Currently, when I need to clear down completed/abandoned applications or resubmit failed applications, I employ a periodically run cronjob to carry out a database query for the actions mentioned. The cronjob is defined by a Kubernetes config YAML file. This is quickly becoming unmanageable and hard to maintain.
I am looking into having a dedicated endpoint for each of the above tasks. A dedicated cronjob could then periodically send a request to the API endpoint to carry out the complex task. This moves the business logic back into the API and avoids duplication within a cronjob hosted elsewhere. I am ultimately asking whether this is a good approach, or whether there is a better workflow documented somewhere that I could implement.
My thinking is that I could add these new endpoints to the already-existing consumer API, but have the new (housekeeping/management) endpoints separated from the others.
To separate each (current) endpoint into its respective resource, I am defining tags within the specification. Tags don't seem to be sufficient for separating these new "housekeeping" endpoints.
Looking through the SwaggerUI documentation, I can see that I can define multiple definitions (via the urls property) to switch between, each powered by an individual specification document. This looks like a very clean way of separating the consumer API from the admin API; is this best practice?
Any input would be appreciated on this as I am struggling to find much documentation on this kind of issue.

Temporal workflow vs Cadence workflow

How is temporal.io related to cadenceworkflow.io? What should be used when starting a new project that depends on the Cadence workflow service?
Disclaimer: I'm the original co-founder and tech lead of the Cadence project and currently co-founder/CEO of the Temporal Technologies.
temporal.io is the fork of the Cadence project by the original founders and tech leads of the Cadence project Maxim Fateev and Samar Abbas. The fork is fully open source under the same MIT (with some SDKs under Apache 2.0) license as Cadence. We started Temporal Technologies and received VC funding as we believe that the programming model that we pioneered through AWS Simple Workflow, Durable Task Framework and the Cadence project has potential which goes far beyond a single company. Having a commercial entity to drive the project forward is essential for the longevity of the project.
The temporal.io fork has all the features of Cadence, as it constantly merges from it. It has also implemented multiple new features.
Here are some of the technical differences between Cadence and Temporal as of initial release of the Temporal fork.
All thrift structures are replaced by protobuf ones
All public APIs of Cadence rely on Thrift. Thrift objects are also stored in the DB in serialized form.
Temporal converted all these structures to Protocol Buffers. This includes objects stored in the DB.
Communication protocol switched from TChannel to gRPC
Cadence relies on TChannel, a TCP-based multiplexing protocol developed at Uber. TChannel has a lot of limitations, like not supporting any security and having a very limited number of language bindings. It is essentially deprecated even at Uber.
Temporal uses gRPC for all interprocess communication.
TLS Support
Cadence doesn't support any communication security, as that is a limitation of TChannel.
Temporal has support for mutual TLS and is going to support more advanced authentication and authorization features in the future.
Simplified configuration
Temporal has reworked the service configuration. Some of the most confusing parts of it have been removed. For example, the need to configure membership seeds is eliminated: in Temporal, each host registers itself with the database upon startup and uses the list from the database as the seed list.
Release pipelines
Cadence doesn't test any publicly released artifacts, including Docker images, as its internal release pipeline ensures the quality of internally built artifacts only. It also doesn't perform any release testing for dependencies that are not used within Uber. For example, MySQL integration is not tested beyond rather incomplete unit tests. The same applies to the CLI and other components.
Temporal is investing heavily in the release process. All artifacts, including a fully supported matrix of dependencies, are going to go through a full release pipeline which will include multi-day stress runs.
The other important part of the release process is the ability to generate patches for production issues. The ability to ensure quality of such patches and produce all the necessary artifacts in a timely manner is important for anyone running Temporal in production.
Payload Metadata
Cadence stores activity inputs and outputs and other payloads as binary blobs without any associated metadata.
Temporal allows associating metadata with every payload. It enables features like dynamically pluggable serialization mechanisms, seamless compression, and encryption.
Failure Propagation
In Cadence, activity and workflow failures are modeled as a single binary payload and a string reason field. Only the Java client supports chaining exceptions across workflow and activity boundaries, but this chaining relies on fragile GSON serialization and doesn't work with other languages.
Temporal activity and workflow failures are modeled as protobufs and can be chained across components implemented in different SDKs. For example, a single failure trace can contain a chain caused by an exception that originates in an activity written in Python, propagates through a Go child workflow up to a Java workflow, and later to the client.
Go SDK
Temporal implemented the following improvements over Cadence Go client:
Protobuf & gRPC
No global registration of activity and workflow types
Ability to register an activity struct instance with the worker, which greatly simplifies passing external dependencies to the activities (see the sketch after this list).
Workflow and activity interceptors which allow implementing features like configuring timeouts through external config files.
Activity and workflow type names do not include package names. This makes code refactoring without breaking changes much simpler.
Most of the timeouts which were required by Cadence are optional now.
workflow.Await method
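As a minimal illustration of the last two points, here is a hedged sketch of the activity-struct and workflow.Await patterns; the names (Activities, OrderWorkflow, the "approve" signal) are hypothetical, and a recent Temporal Go SDK is assumed.

```go
package sample

import (
	"context"
	"database/sql"
	"time"

	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

// Activities is a struct whose methods become activity types; its fields carry
// external dependencies (a *sql.DB here, purely as an illustration).
type Activities struct {
	DB *sql.DB
}

func (a *Activities) SaveOrder(ctx context.Context, id string) error {
	// ... use a.DB ...
	return nil
}

// OrderWorkflow (hypothetical) blocks with workflow.Await until a signal arrives,
// then invokes the activity method.
func OrderWorkflow(ctx workflow.Context, id string) error {
	approved := false

	// Flip the flag when the "approve" signal is received.
	workflow.Go(ctx, func(ctx workflow.Context) {
		workflow.GetSignalChannel(ctx, "approve").Receive(ctx, nil)
		approved = true
	})

	// Await re-evaluates the condition whenever the workflow state changes.
	if err := workflow.Await(ctx, func() bool { return approved }); err != nil {
		return err
	}

	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})
	var a *Activities // only the method name is used for dispatch here
	return workflow.ExecuteActivity(ctx, a.SaveOrder, id).Get(ctx, nil)
}

// RegisterAll wires the workflow and the activity struct (with its dependencies)
// into a worker, so every activity invocation shares the same instance.
func RegisterAll(w worker.Worker, db *sql.DB) {
	w.RegisterWorkflow(OrderWorkflow)
	w.RegisterActivity(&Activities{DB: db})
}
```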
Java SDK
Temporal implemented the following improvements over Cadence Java client:
Workflow and activity annotations to allow activity and workflow implementation objects to implement non-workflow and activity interfaces. This is important to play nice with AOP frameworks like Spring.
Polymorphic workflow and activity interfaces. This allows having a common interface among multiple activity and workflow types.
Dynamic registration of signal and query handlers.
Workflow and activity interceptors which allow implementing features like configuring timeouts through external config files.
Activity and workflow type name generation improved
SDKs not supported by Cadence
Typescript SDK, Python SDK, PHP SDK
SDKs under active development
.NET SDK, Ruby SDK
Temporal Cloud
Temporal Technologies monetizes the project by providing a hosted version of the Temporal service. There are dozens of companies (including SNAP) already using it in production.
Other
We have a lot of other features and client SDKs for other languages planned. You can find us at Temporal Community Forum.
Overview
Using iWF will let you switch between Cadence & Temporal easily.
In addition, iWF provides a nice abstraction on top of both and makes your life a lot easier.
The fact is that both Cadence and Temporal are under active development. You can see that they have somewhat different focuses if you look at their roadmaps. The two projects share the same vision of letting everyone rethink the programming model for long-running business logic.
Tasks across domain+clusters
If you have multiple Cadence clusters, this allows starting a child workflow across different clusters and domains.
Support for both Thrift & gRPC
gRPC support is complete on the server side. Internal traffic all uses gRPC, and we are working on letting users migrate from Thrift to gRPC.
Authorization
Permissions are based on domains but can be extended. Unlike Temporal, the permission policy can be stored within Cadence domain data storage, so you don't have to build another service/storage to manage it.
Note that the whole proposal was developed by a community member.
Workflow Shadower
Workflow Shadower is built on top of Workflow Replayer to address this problem. The basic idea of shadowing is: scan workflows based on the filters you define, fetch the history for each workflow in the scan result from the Cadence server, and run the replay test. It can run either as a test for local development purposes or as a workflow in your worker that continuously replays production workflows.
Graceful domain failover
This allows XDC (multi-cluster) mode to reduce the pain of rerunning some tasks during failover.
NoSQL plugin model
This allows implementing different NoSQL persistence options with minimal effort. At the time of writing this post, Temporal hasn't started working on it.
MongoDB support
On top of the NoSQL interfaces, MongoDB support is WIP.
Using multiple SQL instances as sharded SQL
This allows users to run a Cadence cluster at a much larger scale (and then use XDC to add more DB instances).
Configuration Storage for Dynamic config
This enables changing dynamic configuration (such as rate limiting) without making any deployment; a single CLI command can control the behavior of the system.
It's experimental and still a work in progress for production readiness.
Workflow notification
A WIP ecosystem project to allow getting notifications from Cadence. This is a benefit of Cadence using Kafka to deliver visibility messages; Temporal doesn't use Kafka, which will make it much harder to support this feature.
Periodic health checker (Canary), benchmark tool, and benchmark setup docs
More Documentation
Seamless Cluster Migration guidance
Dashboard/Monitoring
...
...
Other small improvements that Temporal is missing
TerminateIfRunning IDReusePolicy
All domain API forwarding policy
Better & cleaner XDC configuration
Tooling to deserialize database blob data
...
...
I'm from the Cadence team at Uber, and I wanted to let you know that Cadence continues to be developed actively by our team. Below is a section of the update that we shared with the Cadence community recently:
We want to reinforce that Uber's Cadence team is committed to the growth and open source development of the Cadence project. Today, Cadence powers 100+ different use cases within Uber and that number grows quickly. Collectively, there are 50M+ ongoing executions at any moment on average and our customers finish 3B+ executions per month. Outside of Uber, we also know that many engineering teams at various companies have already adopted Cadence for their business-critical workflows. We are excited to continue evolving Cadence as an open-source project in a backward-compatible way with an increased focus on reliability, scalability, and maintainability in the near term.
It's probably too early to compare Cadence and Temporal. Still, I have a few ideas around how we can systematically shed light on Cadence's roadmap to ensure all the necessary information is out there to enable such comparisons going forward. I'll update this post with links when we create a page with information about the roadmap.
In the meantime, please let me know if you need further information about Cadence that would be helpful in this context.
Temporal.io is a company that has forked the Cadence project and is now building on top of it, naming it Temporal.
It was founded by the authors of Cadence.
I would suggest using temporal.io, as it is under active development.

Designing Helm charts for a microservices-based application

I'm currently building an application that is composed of 4 microservices (a, b, c, d). We would like to make Kubernetes Helm part of our CI/CD pipeline.
We are at the point where we're discussing how best to define the charts, and we were wondering what the advice from the community would be.
Our current options appear to be:
1. a chart per microservice (so 4 charts)
2. a chart per "application flow" (service a calls b, service c calls d, so 2 charts in total)
3. a single chart that deploys all 4 microservices
4. some combination of options 1 and 3, where we leverage the dependencies feature of Helm
It might be worth calling out that:
we currently don't have a requirement to deploy any microservice in isolation e.g. make it available to a separate application.
it is likely that we will need to have the ability to scale any microservice independently i.e. not simply replicate all 4 services.
I'm mentioning these requirements because I feel that they may be relevant to the chart design.
Might be a late answer, but FWIW, it depends on how and where you develop your microservices app. If each microservice has its own repo and CI pipeline, then it makes sense to separate the charts as well (one per service). However, if all services are in the same repo and deployed with a single CI pipeline, then a single chart is a better fit.
All four of your options will work, and whether you deploy your app with separate charts or with one won't make any difference, as long as all your services are eventually deployed.
As for scaling services independently: if you use separate Deployments for your services in one big chart, you can scale them separately by using input values for each Deployment in your values.yaml, so that is not something which forces you to split your charts.
And for your image-change question in the comments, it only needs an upgrade of your installed release with the new image tag.
By the way, we use helmsman for deploying (and managing) Helm charts from code in our CI/CD pipelines. Might be useful for you ;)
We have a similar problem, and we chose the lean way: first simple and functional, then evolve.
We started with a simple chart deploying all services, because our main requirement is to have one installer. But we know that before long we are going to refactor to use 3rd-party charts, and even our own charts in our own repo, to handle different deployment strategies and independent evolution of services.
Probably useful resource: https://medium.com/faun/dry-helm-charts-for-micro-services-db3a1d6ecb80
We’ll create a chart for each service, an Umbrella chart, and a Common chart.
The punch here is the Common chart. This chart will contain templates for all the common resources of our services, viz. Deployment, Service, Ingress, ConfigMap and more.
Each service chart will only include the common templates required for the service and supply service-specific values.
Finally, the Umbrella chart will (surprisingly) just unify the services charts.