Is there any support on CloudBees for worker processes/threads, like Worker Dynos on Heroku or Queues on GAE?
Sounds like you're looking for some asynchronous task executor, aren't you?
Inside your application you could implement this as an ExecutorService thread pool. Unlike GAE, RUN@cloud doesn't impose thread restrictions. To distribute tasks across multiple nodes you will need a message queue service, which we don't provide. You can have a look at Amazon SQS or the RabbitMQ SaaS CloudAMQP.
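For illustration, here is a minimal sketch of the in-application thread pool idea using only java.util.concurrent. The class name BackgroundTasks and the squaring "task" are made up for the example; real work would go where the lambda is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; the name and the squaring "task" are illustrative only.
class BackgroundTasks {

    // Runs one task per input on a fixed-size pool and returns results in input order.
    static List<Integer> squareAll(List<Integer> inputs, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Integer n : inputs) {
                Callable<Integer> task = () -> n * n; // stand-in for real background work
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown(); // let worker threads exit once queued tasks are done
        }
    }

    public static void main(String[] args) {
        System.out.println(squareAll(Arrays.asList(1, 2, 3, 4), 2));
    }
}
```

In a long-running web app you would normally keep one pool for the lifetime of the application instead of creating it per call; the per-call pool here just keeps the sketch self-contained.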
Every app on CloudBees is alike, so there is no need to designate one as a worker-dyno type of thing. People often use that pattern: they have an app that is effectively a worker, but it is just an app, and any app can do that work.
You can run things like the Quartz scheduler as well.
I have a UI where I can start machine learning jobs. When a job is requested, a message is added to a pub/sub system (Kafka) and pulled by the service that will run the job.
I have a problem with this service design. I was thinking about creating a main service on Kubernetes that pulls messages from the pub/sub system. This main service would then create pods (or rather Jobs) to run the actual ML work.
However, I don't know how to make the main service monitor the "worker" jobs it creates. Do I have to do it manually by persisting the ID of each job somewhere and monitoring it? Also, how do I deal with a potential failure of the "main" service?
I feel like this is a "classic" use case but I can't find much about how to solve this.
Thanks for your help
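To make the "persist the job ID and monitor it" idea from the question concrete, here is a rough sketch. Everything in it is an assumption for illustration: JobTracker, its State enum, and the lookupStatus function are hypothetical and are not part of Kubernetes or any client library, and the Map stands in for a durable store such as a database table. The point is only that if all state lives in the store, a restarted "main" service can pick up where the failed one left off.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: names, states, and the status lookup are assumptions,
// not the API of Kubernetes or of any client library.
class JobTracker {
    enum State { RUNNING, SUCCEEDED, FAILED }

    // Stand-in for a durable store (database table, etc.), keyed by job ID.
    private final Map<String, State> store;

    JobTracker(Map<String, State> store) { this.store = store; }

    // Record the job as soon as it has been created for a queue message.
    void recordStarted(String jobId) { store.put(jobId, State.RUNNING); }

    // Re-check every job still marked RUNNING. Safe to call after a restart,
    // because all state lives in the store and not in this object.
    int reconcile(Function<String, State> lookupStatus) {
        int changed = 0;
        for (Map.Entry<String, State> e : store.entrySet()) {
            if (e.getValue() != State.RUNNING) continue;
            State current = lookupStatus.apply(e.getKey());
            if (current != State.RUNNING) {
                e.setValue(current);
                changed++;
            }
        }
        return changed;
    }

    State stateOf(String jobId) { return store.get(jobId); }

    public static void main(String[] args) {
        Map<String, State> store = new LinkedHashMap<>();
        new JobTracker(store).recordStarted("job-a");
        // Simulate a restart: a fresh tracker over the same store.
        JobTracker restarted = new JobTracker(store);
        restarted.reconcile(id -> State.SUCCEEDED);
        System.out.println(restarted.stateOf("job-a"));
    }
}
```

In a real setup lookupStatus would ask the cluster for the job's status, and reconcile would run periodically as well as once at startup.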
I was looking for a microservice orchestrator and came across Uber Cadence. I have gone through the documentation and also used it in the development setup.
I had a few questions for production scenarios:
Is it recommended to have a dedicated tasklist for the workflow and the different activities used by it? Or, should we use a single tasklist for all? Does this decision impact the scalability or performance?
When we add a new worker machine, is it common practice to run all the workers for different activities/workflows on the same machine? Example:
Worker.Factory factory = new Worker.Factory("samples-domain");
Worker helloWorkflowWorker = factory.newWorker("HelloWorkflowTaskList");
helloWorkflowWorker.registerWorkflowImplementationTypes(HelloWorkflowImpl.class);
Worker helloActivityWorker = factory.newWorker("HelloActivityTaskList");
helloActivityWorker.registerActivitiesImplementations(new HelloActivityImpl());
Worker upperCaseActivityWorker = factory.newWorker("UpperCaseActivityTaskList");
upperCaseActivityWorker.registerActivitiesImplementations(new UpperCaseActivityImpl());
factory.start();
Or should we run each activity/workflow worker on a dedicated machine?
On a single worker machine, how many workers can we create for a given activity? For example, if we have the activity HelloActivityImpl, should we create multiple workers for it on the same worker machine?
I have not found any documentation for a production setup. For example, how do I install and configure the Cadence Service in production? It would be great if someone could direct me to the right material for this.
In some of the video tutorials, it was mentioned that, for high availability, we can set up the Cadence Service across multiple data centers. How do I configure the Cadence Service for that?
Unless you need to have separate flow control and rate limiting for a set of activities there is no reason to use more than one task queue per worker process.
As I mentioned in the answer to your first question, I would rewrite your code as:
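// One factory per domain; a single worker polls a single task list.
Worker.Factory factory = new Worker.Factory("samples-domain");
Worker worker = factory.newWorker("HelloWorkflow");
// Register the workflow and both activities on the same worker.
worker.registerWorkflowImplementationTypes(HelloWorkflowImpl.class);
worker.registerActivitiesImplementations(new HelloActivityImpl(), new UpperCaseActivityImpl());
// Start polling for tasks.
factory.start();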
There is no reason to create more than one worker for the same activity.
Not sure about Cadence. Here is the Temporal documentation that shows how to deploy to Kubernetes.
This documentation is not yet available. We at Temporal are working on it.
You can also use the Cadence Helm chart: https://hub.helm.sh/charts/banzaicloud-stable/cadence
I am actively working with the Cadence team on operations documentation for the community. It will be useful for those who don't want to run on K8s, like myself. I will come back later as we make progress.
Current draft version: https://docs.google.com/document/d/1tQyLv2gEMDOjzFibKeuVYAA4fucjUFlxpojkOMAIwnA
It will be published to cadence-docs soon.
I am working on a cloud service platform that consists of getting tasks from users, executing them, and giving back the results.
TL;DR
Is there a way to have a "task queue" where tasks can be inserted via a REST API and pulled automatically by a Google Kubernetes Engine cluster that scales automatically?
Long description
Users can send tasks in parallel, and each task is time-consuming and needs to be performed on a GPU. So, setting up an auto-scaling GPU cluster is what I thought of.
More specifically, my idea is that users send tasks/data through a REST API, the REST API fills a task queue, and the task queue itself feeds tasks to workers on the auto-scaling GPU cluster. Of course, there are other details (authentication, database, storage, etc.) that have to be addressed, but they are not the point of my question.
For reasons I don't specify here, the project is already started on the Google Cloud Platform, so switching to AWS or other providers is not an option.
From what I understood, things seem a bit different from standard Docker-only clusters in AWS. That is, we have to use the Google Kubernetes Engine (GKE) to set up the auto-scaling cluster, even for "simple" GPU-enabled Docker containers.
By looking at the not-so-exhaustive documentation, I know that queues are used, but what I don't know is whether feeding of tasks to the cluster is automatically handled. Also, the so-called "Task Queue" service has been deprecated.
Thank you!
At first I thought Cloud Tasks queues might be the answer to your troubles, but this post seems to promote Cloud Pub/Sub as a better alternative.
After a quick chat with the batch developers, the current solution (before the batch service becomes public) is to adopt a third-party queue system like Slurm.
I couldn't find out whether we can replace RabbitMQ/ActiveMQ/SQS with a native Kubernetes message queue.
Or are they totally different in terms of features?
It is a totally different mechanism.
Kubernetes internal queues are not real "queues" that you can use from external applications. They are part of the internal messaging system and manage only objects that are part of Kubernetes.
Moreover, Kubernetes doesn't provide any message queue as a service for external apps (except when your app actually services one of the K8s objects).
If you are not sure which service is better for your app - try to check queues.io.
That is a list of almost all available MQ engines with some highlights.
If you are referring to the Parallel Processing Using a Work Queue approach, you can technically use any queuing system, because the main logic is in the code that gets the items from the queue. Kubernetes is used only to control the parallelism.
If the idea is to use the queue algorithm used internally by Kubernetes, it is not exposed as a service for external applications. You would have to copy the code and implement it in your application.
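As a rough illustration of "the main logic is in the code that gets the items from the queue", here is a sketch of the worker side. It uses an in-memory BlockingQueue as a stand-in for an external broker, and the parallelism argument plays the role that Kubernetes would play by running several worker pods. WorkQueueDemo and its method names are made up for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the in-memory queue stands in for an external broker.
class WorkQueueDemo {

    // One worker: pull items until the queue stays empty for the timeout, then exit.
    static Runnable worker(BlockingQueue<String> queue, AtomicInteger processed) {
        return () -> {
            try {
                String item;
                while ((item = queue.poll(100, TimeUnit.MILLISECONDS)) != null) {
                    processed.incrementAndGet(); // stand-in for real work on `item`
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    // Starts `parallelism` workers, waits for all of them, returns items processed.
    static int drain(BlockingQueue<String> queue, int parallelism) {
        AtomicInteger processed = new AtomicInteger();
        Thread[] workers = new Thread[parallelism];
        for (int i = 0; i < parallelism; i++) {
            workers[i] = new Thread(worker(queue, processed));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return processed.get();
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 10; i++) {
            queue.add("task-" + i);
        }
        System.out.println(drain(queue, 3));
    }
}
```

Each worker exits on its own once the queue is drained, which is what lets an outside controller decide how many of them run at once.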
I'm fairly new to Akka and new to distributed programming in general. Using Akka's Mist component, I've created supervised actors to handle HTTP requests asynchronously. Everything is currently running on one physical machine with local actors. What I don't understand is how to build a truly fault-tolerant system with more than one box. As stated in the Akka docs:
Also, you (usually) need to know if one box is down and/or the service you are talking to on the other box is down. Here actor supervision/linking is a critical tool for not only monitoring the health of remote services, but to actually manage the service, do something about the problem if the actor or node is down. Such as restarting actors on the same node or on another node.
How do I do this? I'm looking for an example or pointers on how to begin making my application distributed. Other services in our group use Apache gateways in front of multiple Tomcat instances, so the event of a Tomcat server going down is transparent to the user. I'm deploying my service to the Akka microkernel and need to achieve a similar level of high availability across more than one physical box.
I'm using Akka 1.1.3.
Remote supervision works only with client-managed remote actors for the Akka 1.x series.
Akka 2.0 that is currently under development will support transparent clustering, cluster-wide supervision and cluster-wide lifecycle monitoring.
You might consider putting an HTTP load balancer in front of Akka Microkernel instances running Mist; this would match what your group does with 'Apache gateways'.
Another approach would be to expose remote actors on a number of instances and then use Akka's LoadBalancer or Actor Pool to send messages around, see here
The second approach is a bit of a pain if you have a dynamic pool of machines, because the pool has to be specified programmatically. Akka 2.0 addresses this with cluster support that is set up in the akka.conf file.
As for the release date of 2.0: for what it's worth, 1.2 was released only recently, on 2011-09-19.
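To illustrate the second approach (a fixed list of instances with messages spread across them), here is a plain-Java sketch of round-robin dispatch with failover. This is not Akka's LoadBalancer or Actor Pool API; RoundRobinDispatcher is a made-up class, and each "endpoint" is just a function standing in for a remote actor on one box.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; this is not Akka's LoadBalancer API.
class RoundRobinDispatcher<Req, Res> {
    private final List<Function<Req, Res>> endpoints; // fixed list, one per instance
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinDispatcher(List<Function<Req, Res>> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("no endpoints");
        }
        this.endpoints = endpoints;
    }

    // Try each endpoint at most once, starting from the next one in rotation.
    Res send(Req request) {
        int start = Math.floorMod(next.getAndIncrement(), endpoints.size());
        RuntimeException last = null;
        for (int i = 0; i < endpoints.size(); i++) {
            Function<Req, Res> endpoint = endpoints.get((start + i) % endpoints.size());
            try {
                return endpoint.apply(request);
            } catch (RuntimeException e) {
                last = e; // treat as "instance down" and move on to the next one
            }
        }
        throw new IllegalStateException("all endpoints failed", last);
    }

    public static void main(String[] args) {
        Function<String, String> down = r -> { throw new IllegalStateException("down"); };
        Function<String, String> up = r -> "handled " + r;
        RoundRobinDispatcher<String, String> d =
            new RoundRobinDispatcher<>(Arrays.asList(down, up));
        System.out.println(d.send("ping"));
    }
}
```

The endpoint list is fixed at construction time, which is the limitation described above for a dynamic pool of machines.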