I have multiple tasks in different django apps using a RabbitMQ broker. This was set up with the standard django configuration and was working perfectly. I was using groups and chains and calling them from different modules.
As a standard practice, I had:
celery.py:
from celery import Celery
from django.conf import settings

app = Celery('<proj>')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
And in project/__init__.py:
from __future__ import absolute_import
from .celery import app as celery_app
All tasks inherit from celery.Task with run() overridden.
Now I got a requirement to call a different task on a different RabbitMQ broker.
So here's what I did where I had to call the different task:
diff_app = Celery('diff')
diff_app.config_from_object({'BROKER_URL':'<DIFF_BROKER_URL>'})
Now to call:
diff_app.send_task('<task_name>', (arg1, arg2,))
After I do this, when I call my previous tasks, they get routed to this new broker. The moment I comment out this code, everything is fine back again.
When I check the conf of celery_app (described above), the broker url is correct. But when I check any previous task -> app -> conf -> broker url, it has been updated to the new broker. How do I fix this?
I removed 'autodiscover_tasks' and associated '_app' with each 'Task' class. This resolved the issue.
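For anyone hitting the same thing: class-based tasks that are not explicitly bound follow Celery's "current app", which switches to the new Celery() instance the moment it is created. A minimal sketch of the binding described above (the module path and the MyTask name are placeholders, not the actual project code):

from celery import Task

from proj.celery import app as celery_app   # the original, Django-configured app


class MyTask(Task):
    # pin the task to the original app so it stops following the
    # "current app", which changes once diff_app = Celery('diff') exists
    _app = celery_app

    def run(self, *args, **kwargs):
        ...


# the second app is then used only for sending to the other broker:
# diff_app.send_task('<task_name>', (arg1, arg2,))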
I'm facing an issue with Laravel queued jobs.
I'm using Laravel v8.40.0 with Redis v6.2.5 and Horizon v5.7.14 for managing jobs.
I have a job class called MyJob which should write a message in log file.
If I use Queue::push(new MyJob()) everything seems to work fine: I see the job in Horizon and the new row in log file.
But if I use dispatch(new MyJob()) or MyJob::dispatch() it doesn't seem to push my job into the queue: I can't see the job in Horizon and I see no results in the log file.
I was following the docs (https://laravel.com/docs/8.x/queues#dispatching-jobs) to use queues correctly and I don't understand what I'm doing wrong.
Thank you
In my workflows and activities, I’d like to log some messages for debugging purposes.
I saw the cadence.GetLogger(ctx).Info() function, but don’t know where to find the logs.
Go Client:
You can use the following in the workflow code:
cadence.GetLogger(ctx).Info(...)
In the activity code, you should use the following:
cadence.GetActivityLogger(ctx).Info(...)
By default, the logger will write to console, which may be sufficient for development purposes. However, you should log to a file if you need the logs in production, too. Here is how to set up your cadence worker to do it:
workerOptions := cadence.WorkerOptions{
    Logger: myLogger,
}
worker := cadence.NewWorker(service, domain, taskList, workerOptions)
The Cadence client uses zap as the logging framework. You can create the zap logger and specify the log file path per your needs. Check out the zap documentation to learn more about configuring the logs.
Java Client
The Java client uses slf4j for logging. You can get the logger instance by calling Workflow.getLogger() and configure it in logback.xml as usual.
Is it possible to integrate a headless browser with locust? I need my load tests to process client side script that triggers additional requests on each page load.
That's an old question, but I came across it now, and found a solution in "realbrowserlocusts" (https://github.com/nickboucart/realbrowserlocusts) - it adds "Real Browser support for Locust.io load testing" using selenium.
If you use one of its classes (FirefoxLocust, ChromeLocust, PhantomJSLocust, HeadlessChromeLocust) instead of HttpLocust for your locust user class
class WebsiteUser(HeadlessChromeLocust):
then in your TaskSet self.client becomes an instance of selenium WebDriver.
One drawback for me was that the webdriver (unlike the built-in client in HttpLocust) doesn't know about the "host", which forces you to use absolute URLs in the TaskSet instead of relative ones; and relative URLs are really convenient when working with different environments (local, dev, staging, prod, etc.).
But there is an easy fix for this: inherit from one of realbrowserlocusts' locusts and pass the "host" to the WebDriver instance:
from locust import TaskSet, task, between
from realbrowserlocusts import HeadlessChromeLocust


class UserBehaviour(TaskSet):

    @task(1)
    def some_action(self):
        # self.client is a selenium WebDriver instance
        self.client.get(self.client.base_host + "/relative/url")
        # and then, for instance, using selenium methods:
        self.client.find_element_by_name("form-name").send_keys("your text")
        # etc.


class ChromeLocustWithHost(HeadlessChromeLocust):

    def __init__(self):
        super(ChromeLocustWithHost, self).__init__()
        self.client.base_host = self.host


class WebsiteUser(ChromeLocustWithHost):
    screen_width = 1200
    screen_height = 1200
    task_set = UserBehaviour
    wait_time = between(5, 9)
============
UPDATE from September 5, 2020:
I posted this solution in March 2020, when locust was on major version 0. Since then, in May 2020, they released version 1.0.0, in which some backward-incompatible changes were made (one of them being the renaming of StopLocust to StopUser). realbrowserlocusts has not been updated for a while and does not yet work with locust >= 1.
There is a workaround, though. When locust v1.0.0 was released, the previous versions were re-released under a new name, locustio, with 0.14.6 as the last version. So if you install "locustio==0.14.6" (or "locustio<1"), the solution with realbrowserlocusts still works (I checked just now; see https://github.com/nickboucart/realbrowserlocusts/issues/13).
You have to pin the locustio version, as it refuses to install without one:
pip install locustio
...
ERROR: Command errored out with exit status 1:
...
**** Locust package has moved from 'locustio' to 'locust'.
Please update your reference
(or pin your version to 0.14.6 if you dont want to update to 1.0)
In theory you could make a headless browser a Locust slave/worker. But the problem is that a browser is much more expensive in terms of CPU and memory, which would make it difficult to scale.
That is why Locust uses small greenlets to simulate users, since they are much cheaper to construct and run.
I would recommend breaking down your page's requests and encoding them as requests inside Locust. The Network tab in Chrome's Dev Tools is probably a good start. I've also heard of people capturing these by going through a proxy that logs all requests for you.
You could use something like browserless to take care of the hosting of Chrome (https://browserless.io). Depending on how brutal your load tests are, there are varying degrees of concurrency. Full disclaimer: I'm the maker of the browserless service.
I think locust is not designed for that purpose; it is for creating concurrent users to make http requests, so I haven't seen any integration between locust and a browser. However, you can simulate a browser by sending extra information in the headers; that way client-side scripts will also work.
r = self.client.get("/orders", headers = {"Cookie": self.get_user_cookie(user[0]), 'User-Agent': self.user_agent})
The locust way of solving this is to add more requests to your test that mimic the requests that the javascript code will make.
I structure my locust tests to parse the JSON response from an early request in the app's workflow. I then randomly pick some interesting piece of data from that JSON, and then issue more requests that mimic what would happen in the browser if the user had clicked on that piece of data.
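As a rough sketch of that approach (locust >= 1.0 API; the /api/orders endpoints and the "id" field are made-up examples, not taken from the question):

import random

from locust import HttpUser, task, between


class ApiUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def browse_orders(self):
        # first request: fetch the JSON listing that the browser's JS would load
        orders = self.client.get("/api/orders").json()
        if not orders:
            return
        # pick an interesting item and mimic the follow-up requests the
        # front-end JavaScript would make when the user clicks on it
        order_id = random.choice(orders)["id"]
        self.client.get("/api/orders/{}".format(order_id))
        self.client.get("/api/orders/{}/items".format(order_id))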
So, I have some plain python code which works perfectly in a normal python shell:
from pyramid_mailer.mailer import Mailer
from pyramid_mailer.message import Message
from pyramid_mailer.message import Attachment
mailer = Mailer(
    host="172.10.10.240",
    port="25")

message = Message(
    subject="Orders with invalid status",
    sender='r@example.com',
    recipients=['luke@example.com'],
    html="<p>Test</p>")

mailer.send_immediately(message)
But, if I create a celery beat task such as this:
from pyramid_celery import celery_app as app
from pyramid_mailer.mailer import Mailer
from pyramid_mailer.message import Message
from pyramid_mailer.message import Attachment
mailer = Mailer(
    host="172.10.10.240",
    port="25")


@app.task
def wronglines_celery():
    message = Message(
        subject="Orders with invalid status",
        sender='r@example.com',
        recipients=['luke@example.com'],
        html="<p>Test</p>")
    mailer.send_immediately(message)
This second example does not generate an email; it runs perfectly fine and throws no error at all, even with the log level set to DEBUG.
Running celery beat with:
celery beat -A pyramid_celery.celery_app --ini development.ini
I'm using the pyramid_celery plug-in, as referenced in the official documentation on the celery website. My development.ini file can be seen below (relevant parts):
[celery]
BROKER_URL = amqp://app_rmq:password@localhost:5672/myvhost
CELERY_IMPORTS = intranet.celery_tasks
# Check once a day for orders with wrong line status
[celerybeat:task1]
task = intranet.celery_tasks.wronglines_celery
type = crontab
schedule = {"hour": 16, "minute": 30}
[logger_celery]
level = DEBUG
handlers =
qualname = celery
# Begin logging configuration
[loggers]
keys = root, intranet, sqlalchemy, celery
EDIT:
If I launch celery (without beat) it works perfectly, e.g. if I launch with:
celery worker -A pyramid_celery.celery_app --ini development.ini
All tasks execute (over and over), all emails send, and nothing throws an error; it seems to be the introduction of beat that is causing issues.
Are you sure it's not working? The way you've configured your crontab, it says "only run once a day, at 4:30". So if you ran it until it hit 4:30, I would expect it to execute properly.
Can you change your schedule to be {} instead to have it run every minute as a basic test?
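As a sketch based on the [celerybeat:task1] block from the question, that every-minute test would only change the schedule line (an empty crontab dict means "run every minute"):

[celerybeat:task1]
task = intranet.celery_tasks.wronglines_celery
type = crontab
schedule = {}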
I've added a crontab example to the examples here:
https://github.com/sontek/pyramid_celery/blob/master/examples/scheduler_example/development.ini#L33-L36
If you can provide more code (maybe a sample repo or modification of the examples already in the repo) that shows it not working I can take a look and hopefully fix the bug.
So, after much googling and frustrating debugging, I found an old github issue which claimed celery tasks were only executed when launched with a worker, and not with beat. The user states:
Beat does not execute tasks, it just sends the messages. You need both a beat instance and a worker instance!
So the fix is to launch the worker and the beat instance with the same command, shown here:
celery worker --beat -A pyramid_celery.celery_app --ini development.ini
I will be sending a pull request today to fix the documentation with regards to the correct way to launch a worker and beat instance.
By default, Celery tasks fail silently on errors. It most likely throws an exception which you never saw.
To see what is failing, put a pdb (ipdb) breakpoint in the task code, start the celery worker in the foreground, and step through the code line by line.
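For example, a minimal sketch with the task from the question (the breakpoint line is the only addition; starting the worker in the foreground with a solo pool, e.g. celery worker -P solo, makes the debugger prompt usable):

@app.task
def wronglines_celery():
    import pdb; pdb.set_trace()   # execution stops here when the task runs
    message = Message(
        subject="Orders with invalid status",
        sender='r@example.com',
        recipients=['luke@example.com'],
        html="<p>Test</p>")
    mailer.send_immediately(message)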
I have of late been trying out apache spark. My question is more specifically about triggering spark jobs. Here I had posted a question on understanding spark jobs. After getting my hands dirty with jobs, I moved on to my requirement.
I have a REST endpoint where I expose an API to trigger jobs; I have used Spring 4.0 for the REST implementation. Going ahead, I thought of implementing Jobs as a Service in Spring, where I would submit the job programmatically, meaning that when the endpoint is triggered, I would trigger the job with the given parameters.
I now have a few design options.
Similar to the job written below, I need to maintain several Jobs called by an Abstract Class, maybe JobScheduler.
/* Can this code be abstracted from the application and written as
   a separate job? My understanding is that the application code
   itself has to have the addJars embedded, which the SparkContext
   internally takes care of. */
SparkConf sparkConf = new SparkConf().setAppName("MyApp")
        .setJars(new String[] { "/path/to/jar/submit/cluster" })
        .setMaster("/url/of/master/node");
sparkConf.setSparkHome("/path/to/spark/");
sparkConf.set("spark.scheduler.mode", "FAIR");

JavaSparkContext sc = new JavaSparkContext(sparkConf);
sc.setLocalProperty("spark.scheduler.pool", "test");

// Application with algorithm, transformations
// Application with Algorithm , transformations
Extending the above point, have multiple versions of jobs handled by the service.
Or else use a Spark Job Server to do this.
Firstly, I would like to know what the best solution is in this case, execution-wise and also scaling-wise.
Note: I am using a standalone cluster from spark.
Kindly help.
It turns out Spark has a hidden REST API to submit a job, check status, and kill jobs.
Check out the full example here: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
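For illustration, here is a rough Python sketch of such a submission, following the payload format shown in that post (the default REST port for a standalone master is 6066; the jar path, class name and Spark version are placeholders, and the field names may differ between Spark versions, so treat this as a sketch rather than a reference):

import requests

payload = {
    "action": "CreateSubmissionRequest",
    "appResource": "file:/path/to/my-app.jar",        # placeholder jar
    "mainClass": "com.example.MyApp",                  # placeholder main class
    "appArgs": ["arg1", "arg2"],
    "clientSparkVersion": "1.5.0",
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "sparkProperties": {
        "spark.master": "spark://master-host:6066",
        "spark.app.name": "MyApp",
        "spark.jars": "file:/path/to/my-app.jar",
        "spark.submit.deployMode": "cluster",
        "spark.driver.supervise": "false",
    },
}

# create the submission; the response contains a submissionId that can then be
# polled via /v1/submissions/status/<id> or killed via /v1/submissions/kill/<id>
resp = requests.post("http://master-host:6066/v1/submissions/create", json=payload)
print(resp.json())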
Just use the Spark JobServer
https://github.com/spark-jobserver/spark-jobserver
There are a lot of things to consider when making a service, and the Spark JobServer has most of them covered already. If you find things that aren't good enough, it should be easy to make a request and add code to their system rather than reinventing it from scratch.
Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.
Here is a good client that you might find helpful: https://github.com/ywilkof/spark-jobs-rest-client
Edit: this answer was given in 2015. There are options like Livy available now.
I had this requirement too, and I could do it using Livy Server, as one of the contributors, Josemy, mentioned. Following are the steps I took, hope it helps somebody:
Download livy zip from https://livy.apache.org/download/
Follow instructions: https://livy.apache.org/get-started/
Upload the zip to a client.
Unzip the file
Check for the following two parameters; if they don't exist, create them with the right paths:
export SPARK_HOME=/opt/spark
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
Enable port 8998 on the client
Update $LIVY_HOME/conf/livy.conf with the master details and any other settings needed
Note: Templates are there in $LIVY_HOME/conf
E.g. livy.file.local-dir-whitelist = /home/folder-where-the-jar-will-be-kept/
Run the server
$LIVY_HOME/bin/livy-server start
Stop the server
$LIVY_HOME/bin/livy-server stop
UI: <client-ip>:8998/ui/
Submitting a job: POST http://<your client ip goes here>:8998/batches
{
    "className" : "<your class name will come here with package name>",
    "file" : "your jar location",
    "args" : ["arg1", "arg2", "arg3"]
}
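If it helps, here is a rough Python sketch of the same submission over Livy's REST API, followed by polling the batch state. The client IP, jar location and class name are the same placeholders as in the JSON above, and the requests library is assumed:

import time

import requests

livy_url = "http://<client-ip>:8998"   # same client as the UI above

batch = {
    "className": "<your class name will come here with package name>",
    "file": "your jar location",
    "args": ["arg1", "arg2", "arg3"],
}

# POST /batches submits the job; the response carries the batch id
resp = requests.post(livy_url + "/batches", json=batch,
                     headers={"Content-Type": "application/json"})
batch_id = resp.json()["id"]

# GET /batches/<id>/state reports starting / running / success / dead
while True:
    state = requests.get(livy_url + "/batches/{}/state".format(batch_id)).json()["state"]
    print(state)
    if state in ("success", "dead", "killed"):
        break
    time.sleep(5)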