Using a single Celery with multiple Python daemons - celery

I have "project" written in Python with multiples components: there are several distinct Pyramid and Twisted apps running.
We're looking at using Celery to offload some of the work from Pyramid and Twisted. Just to be clear, we're looking at one Celery instance / config, that handles the work for multiple Pyramid and Twisted apps.
All the info I found online covers multiple Celery for one or more apps; not one Celery for multiple apps. Celery will be doing 4-5 functions that are common to all these apps.
Are there any recommended strategies / common pitfalls for this sort of setup, or should we be generally fine with having a standalone celery_tasks package that all the different projects import ?

Celery is a distributed system. By definition it doesn't matter where you call the tasks from, as long as they get executed by a worker and the caller is able to fetch the results.
You should be fine as long as each project is configured properly to send tasks and receive results. One shared module with the common tasks is going to be just fine.
The shared workers should import only that module.
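A minimal sketch of what such a shared package could look like (the package name, broker URL, and task below are illustrative placeholders, not anything specified in the question):

```python
# celery_tasks/app.py -- hypothetical shared package imported by every app.
# Assumes a Redis broker/result backend; any broker Celery supports works the same way.
from celery import Celery

app = Celery(
    "celery_tasks",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)

@app.task
def resize_image(path, width, height):
    # ...the common work shared by all the Pyramid/Twisted apps...
    return f"{path} resized to {width}x{height}"
```

Each Pyramid or Twisted app imports this package and calls resize_image.delay(...), while the worker is started once with "celery -A celery_tasks.app worker" and serves all of them.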

Related

Celery setup for multiple Django Projects running in Docker Swarm

We have multiple Django Projects (in dedicated GitHub repos) running in Docker Swarm. We want to set up Celery in such a way that it can be used across all the projects.
Is there a way to achieve this? I am looking for more ideas and considerations while architecting this.
I have tried setting it up on one of the projects and invoking its tasks from the other projects by using the send_task method. It kind of works in one direction.
How can I make it execute tasks whose definition is not present in the project in which it is running? In other words, how can I execute unregistered tasks from other projects?
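For reference, send_task dispatches a task by its registered name over the broker, so the calling project never needs the task's code; only the worker consuming that queue does. A minimal sketch, with a made-up task name and queue:

```python
from celery import Celery

# The calling project only needs the broker/backend URLs, not the task code.
app = Celery(broker="redis://localhost:6379/0", backend="redis://localhost:6379/0")

# "billing.generate_invoice" only has to be registered on whichever worker
# consumes the "billing" queue; this project never imports its definition.
result = app.send_task("billing.generate_invoice", args=[42], queue="billing")
print(result.get(timeout=30))
```

The direction that "kind of works" is this pattern; making it work both ways mostly comes down to each project's worker listening on a well-known queue and registering its tasks under stable names.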

How to reduce load on JBoss Fuse Karaf running Camel routes with ActiveMQ?

I have a broken system running JBoss Fuse 7.5 and 6.3 with Karaf, Camel routes, and ActiveMQ. The main purpose of the system is to move data from A to B using various protocols.
The system was designed over 10 years ago, when it worked fine handling a few dozen routes. However, the demand has grown from a few dozen routes to a thousand or more, and the load is too much. Routes are breaking and we are losing data, running out of memory, exhausting resources, etc.
I need to keep the system running without error until the replacement system comes online which will be a while. We are required to use the hardware we have. We cannot increase processing power by adding a VM.
Some ideas I have (I am not sure if these can be done):
Restrict the number of concurrent processes.
Configure Camel routes to run only during certain times of the day, or have a way to spread the load out over time.
Any suggestions on what can be done?

Scala Spark IntelliJ IDEA development process

I am currently using Spark to build my dimensional data model, and we are currently uploading the jar to an AWS EMR cluster to test. However, this is tedious and time-consuming for testing and building tables.
I would like to know what others are doing to speed up their development. One possibility I came across in my research is running Spark jobs directly from the IDE with IntelliJ IDEA, and I would like to hear about other development processes that make it faster to develop.
The approaches I have tried so far are:
Installing Spark and HDFS on two or three commodity PCs and testing the code there before submitting it to the cluster.
Running the code on a single node to catch obvious mistakes.
Submitting the jar file to the cluster.
The first and third approaches both require building the jar file, which can take a lot of time. The second is not suitable for finding and fixing the bugs and problems that only arise in a distributed running environment.
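For the run-directly-from-the-IDE option, the usual trick is to point the SparkSession at a local master so everything runs inside one process and you can iterate without building and uploading a jar. The question is about Scala, but the pattern is the same; here is a sketch in PySpark with made-up paths and column names, purely as an illustration:

```python
from pyspark.sql import SparkSession

# local[*]: driver and executors run as threads in this one process,
# so the job can be launched and debugged straight from the IDE.
spark = (
    SparkSession.builder
    .appName("dim-model-dev")       # hypothetical app name
    .master("local[*]")             # use all local cores
    .getOrCreate()
)

# Develop against a small local sample instead of the full EMR data set.
orders = spark.read.parquet("sample_data/orders")   # hypothetical sample path
dim_customer = orders.select("customer_id", "customer_name").dropDuplicates()
dim_customer.show(5)

spark.stop()
```

In Scala the equivalent is SparkSession.builder.master("local[*]"), often set only in a dev/test profile, which is what makes running from IntelliJ or from unit tests fast.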

How does a Spark application start when using sbt run?

I want to understand the underlying mechanism: how does the Spark application start when I simply execute sbt run?
What is the difference between this and running Spark in standalone mode and then deploying the application on it using spark-submit?
If someone can explain how the jar is submitted, and who creates and assigns the tasks in both cases, that would be great.
Please help me out with this, or point me to something I can read to clear up my doubts.
First, read this.
Once you are familiar with the terminology, the different roles, and their responsibilities, read the paragraphs below for a summary.
There are different ways to run a Spark application (a Spark app is nothing but a bunch of class files with an entry point).
You can run the Spark application as a single Java process (usually for development purposes). This is what happens when you run sbt run.
In this mode, all the services, such as the driver and the executors, run inside a single JVM.
But this way of running is only for development and testing purposes, as it won't scale; you won't be able to process a huge amount of data. This is where the other ways of running a Spark app come into the picture (Standalone, Mesos, YARN, etc.).
Now read this.
In these modes, there are dedicated JVMs for the different roles. The driver runs as a separate JVM, and there could be tens to thousands of executor JVMs running on different machines (crazy, right?).
The interesting part is that the same application that runs inside a single JVM can be distributed to run on thousands of JVMs. Distributing the application, managing the life cycle of these JVMs, keeping them fault-tolerant, etc. are all taken care of by Spark and the underlying cluster framework.
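To make the contrast concrete, here is a sketch of an app that does not hardcode a master, so the identical code runs either inside one local JVM or across a cluster's executor JVMs. It is written in PySpark purely as an illustration; the Scala builder API follows the same shape, and the host name below is hypothetical:

```python
from pyspark.sql import SparkSession

# No .master(...) hardcoded: the master is supplied by whatever launches us.
#
#   Single-process run (the sbt run / IDE equivalent):
#       spark-submit --master "local[*]" app.py
#   Cluster run (standalone master shown; YARN and Mesos work similarly):
#       spark-submit --master spark://master-host:7077 app.py
#
# In the first case the driver and executors are threads in one process;
# in the second, the driver asks the cluster manager for executors and the
# same code is shipped to, and run on, many executor JVMs.
spark = SparkSession.builder.appName("count-example").getOrCreate()

rdd = spark.sparkContext.parallelize(range(1_000_000))
print(rdd.map(lambda x: x * x).sum())

spark.stop()
```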

Simulating computer cluster on simple desktop to test parallel algorithms

I want to try and learn MPI as well as parallel programming.
Can a sandbox be created on my desktop PC?
How can this be done?
Linux and Windows solutions are welcome.
If you want to learn MPI, you can definitely do it on a single PC (most modern MPI implementations use shared-memory communication locally, so you don't need any additional configuration). So install a popular MPI implementation (MPICH / Open MPI) on a Linux box and get going! If your programs are going to be CPU-bound, I'd suggest only running job sizes that equal the number of processor cores on your machine.
Edit: Since you tagged it as a virtualization question, I wanted to add that you could also run MPI on multiple VMs (on VMware Player or VirtualBox, for example) and run your tests there. This would need inter-VM networking to be configured (which differs based on your virtualization software).
Whatever you choose (single PC vs. VMs), it won't change the way you write your MPI programs. Since this is for learning MPI, I'd suggest going with the first approach (running multiple MPI processes on a single PC).
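If you would rather sketch things out in Python first, mpi4py wraps the same MPI calls, so you can try the single-PC setup in a few lines (launching with mpiexec works the same way for a compiled C program):

```python
# hello_mpi.py -- a minimal mpi4py sketch (pip install mpi4py).
# Launch several copies on one PC with, for example:
#     mpiexec -n 4 python hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's id, 0..size-1
size = comm.Get_size()   # number of processes started by mpiexec

# A tiny collective operation: every rank contributes its rank number.
total = comm.reduce(rank, op=MPI.SUM, root=0)

if rank == 0:
    print(f"{size} processes, sum of ranks = {total}")
```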
You don't need to have VMs running to launch multiple copies of your application that communicate using MPI.
MPI can give you a virtual cluster on a single node by launching multiple copies of your application.
One benefit, though, of running it in VMs is that (as you already mentioned) it provides sandboxing. Thus any issues your application creates will remain limited to the VM running that copy of the app.