falcon python example with celery - celery

import falcon
import json
from tasks import add
from waitress import serve
class tasksresource:
def on_get(self, req, resp):
"""Handles GET requests"""
self.result = add.delay(1, 2)
self.context = {'ID': self.result.id, 'final result': self.result.ready()}
resp.body = json.dumps(self.context)
api = falcon.API()
api.add_route('/result', tasksresource())
# api.add_route('/result/task', taskresult())
if __name__ == '__main__':
serve(api, host='127.1.0.1', port=5555)
how do i get the Get the task id from json payload ( post data)
and add a route to it

Here a small example. Structure of files:
/project
__init__.py
app.py # routes, falcon etc.
tasks.py # celery
example.py # script for demonstration how it works
app.py:
import json
import falcon
from tasks import add
from celery.result import AsyncResult
class StartTask(object):
def on_get(self, req, resp):
# start task
task = add.delay(4, 4)
resp.status = falcon.HTTP_200
# return task_id to client
result = {'task_id': task.id}
resp.body = json.dumps(result)
class TaskStatus(object):
def on_get(self, req, resp, task_id):
# get result of task by task_id and generate content to client
task_result = AsyncResult(task_id)
result = {'status': task_result.status, 'result': task_result.result}
resp.status = falcon.HTTP_200
resp.body = json.dumps(result)
app = falcon.API()
# registration of routes
app.add_route('/start_task', StartTask())
app.add_route('/task_status/{task_id}', TaskStatus())
tasks.py:
from time import sleep
import celery
app = celery.Celery('tasks', broker='redis://localhost:6379/0', backend='redis://localhost:6379/0')
#app.task
def add(x, y):
"""
:param int x:
:param int y:
:return: int
"""
# sleep just for demonstration
sleep(5)
return x + y
Now we need to start celery application. Go to project folder and run:
celery -A tasks worker --loglevel=info
After this we need to start Falcon application. Go to project folder and run:
gunicorn app:app
Ok. Everything is ready.
example.py is small client side which can help to understand:
from time import sleep
import requests
# start new task
task_info = requests.get('http://127.0.0.1:8000/start_task')
task_info = task_info.json()
while True:
# check status of task by task_id while task is working
result = requests.get('http://127.0.0.1:8000/task_status/' + task_info['task_id'])
task_status = result.json()
print task_status
if task_status['status'] == 'SUCCESS' and task_status['result']:
print 'Task with id = %s is finished' % task_info['task_id']
print 'Result: %s' % task_status['result']
break
# sleep and check status one more time
sleep(1)
Just call python ./example.py and you should see something like this:
{u'status': u'PENDING', u'result': None}
{u'status': u'PENDING', u'result': None}
{u'status': u'PENDING', u'result': None}
{u'status': u'PENDING', u'result': None}
{u'status': u'PENDING', u'result': None}
{u'status': u'SUCCESS', u'result': 8}
Task with id = 76542904-6c22-4536-99d9-87efd66d9fe7 is finished
Result: 8
Hope this helps you.

The above example by Danila Ganchar is great and very helpful. I'm using celery version 4.3.0 with Python 3, and one of the errors I received from using the example above is on this line:
task_result = AsyncResult(task_id)
The error I would receive is:
AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for'
This may be a recent change, but result.AsyncResult (or just AsyncResult in this example because he imported it from celery.result) doesn't know the backend you are using. There are 2 solutions to solving this problem:
1) You can take the AsyncResult of the actual task itself add.AsyncResult(task_id) because the add task already has the backend defined through the #app.task decorator. The downside to this in this example is you want to be able to get the result for any task by just passing in the task_id via the Falcon endpoint, so this is limited
2) The preferred method is to just pass in the app parameter to the AsyncResult function:
task = result.AsyncResult(id, app=app)
Hope this helps!

Related

How can I de-duplicate responses for pytest?

The responses library provides mocks for requests. In my case, it looks typically like this:
import responses
#responses.activate
def test_foo():
# Add mocks for service A
responses.add(responses.POST, 'http://service-A/foo', json={'bar': 'baz'}, status=200)
responses.add(responses.POST, 'http://service-A/abc', json={'de': 'fg'}, status=200)
#responses.activate
def test_another_foo():
# Add mocks for service A
responses.add(responses.POST, 'http://service-A/foo', json={'bar': 'baz'}, status=200)
responses.add(responses.POST, 'http://service-A/abc', json={'de': 'fg'}, status=200)
How can I avoid this code duplication?
I would love to have a mock_service_a fixture or something similar.
Just as you suggest, creating a fixture solves these issues.
import pytest
import responses
import requests
#pytest.fixture(scope="module", autouse=True)
def mocked_responses():
with responses.RequestsMock() as rsps:
rsps.add(
responses.POST, "http://service-a/foo", json={"bar": "baz"}, status=200
)
rsps.add(
responses.POST, "http://service-a/abc", json={"de": "fg"}, status=200
)
yield rsps
def test_foo():
resp = requests.post("http://service-a/foo", json={"bar": "baz"})
assert resp.status_code == 200
def test_another_foo():
resp = requests.post("http://service-a/abc", json={"de": "fg"})
assert resp.status_code == 200
Running it returns:
==================================== test session starts =====================================
platform darwin -- Python 3.9.1, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: **
collected 2 items
tests/test_grab.py .. [100%]
===================================== 2 passed in 0.21s ======================================

Python Faust await agent.ask() doesn't return reply and hangs function calling it

I am new to Python, playing around with stuff, trying to communicate python services via Kafka using Faust.
So I have small PoC project.
Faust app definition:
# app.py
import faust as f
from models import ReadRequest, ReadResponse
app = f.App("faust-app", broker="kafka://localhost:9092", store="rocksdb://")
topics = {"read-request": app.topic("read-request", value_type=ReadRequest)}
def get_app() -> f.types.AppT:
return app
def get_topic(name: str) -> f.types.TopicT:
return topics[name]
My DB reader agent:
# reader.py
import pandas as pd
from pymongo import MongoClient
from app import get_app, get_topic
client = MongoClient()
app = get_app()
req_topic = get_topic("read-request")
#app.agent(req_topic)
async def read_request(requests):
async for request in requests:
db = client.test
coll = db[request["collection"]]
result = coll.find(request["query"])
df = pd.DataFrame(result)
response = {
"id": request["id"],
"data": list(df.loc[:, df.columns != "_id"].to_dict(orient="records")),
}
print(response) # debug <1>
yield response
Model definitions:
# models.py
import faust as f
class ReadRequest(f.Record):
id: int
collection: str
query: dict
Test agent.ask()
# test.py
import asyncio
from reader import read_request
from models import ReadRequest
async def run():
result = await read_request.ask(ReadRequest(id=1, collection="test", query={}))
print(result) # debug <2>
if __name__ == "__main__":
asyncio.run(run())
So I have zookeeper, kafka server, mongodb and faust worker reader running. Everything is using out of the box configs.
When I run python3 test.py I see debug <1> print output as expected, but debug <2> never goes off and execution hangs there.
Faust docs say
So I assume that I am doing everything right.
Anyone has any clues?

Apache Airflow - trigger/schedule DAG rerun on completion (File Sensor)

Good Morning.
I'm trying to setup a DAG too
Watch/sense for a file to hit a network folder
Process the file
Archive the file
Using the tutorials online and stackoverflow I have been able to come up with the following DAG and Operator that successfully achieves the objectives, however I would like the DAG to be rescheduled or rerun on completion so it starts watching/sensing for another file.
I attempted to set a variable max_active_runs:1 and then a schedule_interval: timedelta(seconds=5) this yes reschedules the DAG but starts queuing task and locks the file.
Any ideas welcome on how I could rerun the DAG after the archive_task?
Thanks
DAG CODE
from airflow import DAG
from airflow.operators import PythonOperator, OmegaFileSensor, ArchiveFileOperator
from datetime import datetime, timedelta
from airflow.models import Variable
default_args = {
'owner': 'glsam',
'depends_on_past': False,
'start_date': datetime.now(),
'provide_context': True,
'retries': 100,
'retry_delay': timedelta(seconds=30),
'max_active_runs': 1,
'schedule_interval': timedelta(seconds=5),
}
dag = DAG('test_sensing_for_a_file', default_args=default_args)
filepath = Variable.get("soucePath_Test")
filepattern = Variable.get("filePattern_Test")
archivepath = Variable.get("archivePath_Test")
sensor_task = OmegaFileSensor(
task_id='file_sensor_task',
filepath=filepath,
filepattern=filepattern,
poke_interval=3,
dag=dag)
def process_file(**context):
file_to_process = context['task_instance'].xcom_pull(
key='file_name', task_ids='file_sensor_task')
file = open(filepath + file_to_process, 'w')
file.write('This is a test\n')
file.write('of processing the file')
file.close()
proccess_task = PythonOperator(
task_id='process_the_file',
python_callable=process_file,
provide_context=True,
dag=dag
)
archive_task = ArchiveFileOperator(
task_id='archive_file',
filepath=filepath,
archivepath=archivepath,
dag=dag)
sensor_task >> proccess_task >> archive_task
FILE SENSOR OPERATOR
import os
import re
from datetime import datetime
from airflow.models import BaseOperator
from airflow.plugins_manager import AirflowPlugin
from airflow.utils.decorators import apply_defaults
from airflow.operators.sensors import BaseSensorOperator
class ArchiveFileOperator(BaseOperator):
#apply_defaults
def __init__(self, filepath, archivepath, *args, **kwargs):
super(ArchiveFileOperator, self).__init__(*args, **kwargs)
self.filepath = filepath
self.archivepath = archivepath
def execute(self, context):
file_name = context['task_instance'].xcom_pull(
'file_sensor_task', key='file_name')
os.rename(self.filepath + file_name, self.archivepath + file_name)
class OmegaFileSensor(BaseSensorOperator):
#apply_defaults
def __init__(self, filepath, filepattern, *args, **kwargs):
super(OmegaFileSensor, self).__init__(*args, **kwargs)
self.filepath = filepath
self.filepattern = filepattern
def poke(self, context):
full_path = self.filepath
file_pattern = re.compile(self.filepattern)
directory = os.listdir(full_path)
for files in directory:
if re.match(file_pattern, files):
context['task_instance'].xcom_push('file_name', files)
return True
return False
class OmegaPlugin(AirflowPlugin):
name = "omega_plugin"
operators = [OmegaFileSensor, ArchiveFileOperator]
Dmitris method worked perfectly.
I also found in my reading setting schedule_interval=None and then using the TriggerDagRunOperator worked equally as well
trigger = TriggerDagRunOperator(
task_id='trigger_dag_RBCPV99_rerun',
trigger_dag_id="RBCPV99_v2",
dag=dag)
sensor_task >> proccess_task >> archive_task >> trigger
Set schedule_interval=None and use airflow trigger_dag command from BashOperator to launch next execution at the completion of the previous one.
trigger_next = BashOperator(task_id="trigger_next",
bash_command="airflow trigger_dag 'your_dag_id'", dag=dag)
sensor_task >> proccess_task >> archive_task >> trigger_next
You can start your first run manually with the same airflow trigger_dag command and then trigger_next task will automatically trigger the next one. We use this in production for many months now and and it runs perfectly.

Celery broadcast task not working

I've tried to make a broadcast task but only one of my workers recieve it per each call. Would you please help me? (I'm using rabbitmq and node-celery)
default_exchange = Exchange('celery', type='direct')
celery.conf.update(
CELERY_RESULT_BACKEND = "amqp",
CELERY_RESULT_SERIALIZER='json',
CELERY_QUEUES = (
Queue('celery', default_exchange, routing_key='celery'),
Broadcast('broadcast_tasks'),
),
CELERY_ROUTES = (
{'my_tasks.sample_broadcast_task': {
'queue': 'broadcast_tasks',
}},
{'my_tasks.sample_normal_task': {
'queue': 'celery',
'exchange': 'celery',
'exchange_type': 'direct',
'routing_key': 'celery',
}}
),
)
I've also test following configurtion but not working.
celery.conf.update(
CELERY_RESULT_BACKEND = "amqp",
CELERY_RESULT_SERIALIZER='json',
CELERY_QUEUES=(
Queue('celery', Exchange('celery'), routing_key='celery'),
Broadcast('broadcast'),
),
)
#celery.task(ignore_result=True, queue='broadcast',
options=dict(queue='broadcast'))
def sample_broadcast_task():
print "test"
EDIT
after changing how to run worker by adding -Q broadcast, now i face to this error:
PreconditionFailed: Exchange.declare: (406) PRECONDITION_FAILED - inequivalent arg 'type' for exchange 'broadcast' in vhost '/': received 'direct' but current is 'fanout'
After trying many many many things, i finally find a solution. This work for me.
( celery 3.1.24 (Cipater) and Python 2.7.12 )
WORKER - tasks.py :
from celery import Celery
import celery_config
from kombu.common import Broadcast, Queue, Exchange
app = Celery()
app.config_from_object(sysadmin_celery_config)
#app.task
def print_prout(x):
print x
return x
WORKER - celery_config.py :
# coding=utf-8
from kombu.common import Broadcast, Queue, Exchange
BROKER_URL = 'amqp://login:pass#172.17.0.1//'
CELERY_RESULT_BACKEND = 'redis://:login#172.17.0.1'
CELERY_TIMEZONE = 'Europe/Paris'
CELERY_ENABLE_UTC = True
CELERY_TASK_SERIALIZER = 'pickle'
CELERY_RESULT_SERIALIZER = 'pickle'
CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
CELERY_DISABLE_RATE_LIMITS = True
CELERY_ALWAYS_EAGER = False
CELERY_QUEUES = (Broadcast('broadcast_tasks'), )
worker lauched with :
celery -A celery_worker.tasks worker --loglevel=info --concurrency=1 -n worker_name_1
On the client (another docker container for me).
from celery import Celery
from celery_worker import tasks
result = tasks.print_prout.apply_async(['prout'], queue='broadcast_tasks')
print result.get()
The next step for me is how to retrieve and display results returned by all the workers. The "print result.get()" seems to return only the result of the last worker.
It does not seem obvious ( Have Celery broadcast return results from all workers )
according to your description:
I've tried to make a broadcast task but only one of my workers recieve it per each call
you may be using direct type exchange.
Try this
from celery import Celery
from kombu.common import Broadcast
BROKER_URL = 'amqp://guest:guest#localhost:5672//'
class CeleryConf:
# List of modules to import when celery starts.
CELERY_ACCEPT_CONTENT = ['json']
CELERY_IMPORTS = ('main.tasks')
CELERY_QUEUES = (Broadcast('q1'),)
CELERY_ROUTES = {
'tasks.sampletask': {'queue': 'q1'}
}
celeryapp = Celery('celeryapp', broker=BROKER_URL)
celeryapp.config_from_object(CeleryConf())
#celeryapp.task
def sampletask(form):
print form
To send the message, do
d= sampletask.apply_async(['4c5b678350fc643'],serializer="json", queue='q1')

Flask, blueprints uses celery task and got cycle import

I have an application with Blueprints and Celery
the code is here:
config.py
import os
from celery.schedules import crontab
basedir = os.path.abspath(os.path.dirname(__file__))
class Config:
SECRET_KEY = os.environ.get('SECRET_KEY') or ''
SQLALCHEMY_COMMIT_ON_TEARDOWN = True
RECORDS_PER_PAGE = 40
SQLALCHEMY_DATABASE_URI = ''
CELERY_BROKER_URL = ''
CELERY_RESULT_BACKEND = ''
CELERY_RESULT_DBURI = ''
CELERY_TIMEZONE = 'Europe/Kiev'
CELERY_ENABLE_UTC = False
CELERYBEAT_SCHEDULE = {}
#staticmethod
def init_app(app):
pass
class DevelopmentConfig(Config):
DEBUG = True
WTF_CSRF_ENABLED = True
APP_HOME = ''
SQLALCHEMY_DATABASE_URI = 'mysql+mysqldb://...'
CELERY_BROKER_URL = 'sqla+mysql://...'
CELERY_RESULT_BACKEND = "database"
CELERY_RESULT_DBURI = 'mysql://...'
CELERY_TIMEZONE = 'Europe/Kiev'
CELERY_ENABLE_UTC = False
CELERYBEAT_SCHEDULE = {
'send-email-every-morning': {
'task': 'app.workers.tasks.send_email_task',
'schedule': crontab(hour=6, minute=15),
},
}
class TestConfig(Config):
DEBUG = True
WTF_CSRF_ENABLED = False
TESTING = True
SQLALCHEMY_DATABASE_URI = 'mysql+mysqldb://...'
class ProdConfig(Config):
DEBUG = False
WTF_CSRF_ENABLED = True
SQLALCHEMY_DATABASE_URI = 'mysql+mysqldb://...'
CELERY_BROKER_URL = 'sqla+mysql://...celery'
CELERY_RESULT_BACKEND = "database"
CELERY_RESULT_DBURI = 'mysql://.../celery'
CELERY_TIMEZONE = 'Europe/Kiev'
CELERY_ENABLE_UTC = False
CELERYBEAT_SCHEDULE = {
'send-email-every-morning': {
'task': 'app.workers.tasks.send_email_task',
'schedule': crontab(hour=6, minute=15),
},
}
config = {
'development': DevelopmentConfig,
'default': ProdConfig,
'production': ProdConfig,
'testing': TestConfig,
}
class AppConf:
"""
Class to store current config even out of context
"""
def __init__(self):
self.app = None
self.config = {}
def init_app(self, app):
if hasattr(app, 'config'):
self.app = app
self.config = app.config.copy()
else:
raise TypeError
init.py:
import os
from flask import Flask
from celery import Celery
from config import config, AppConf
def create_app(config_name):
app = Flask(__name__)
app.config.from_object(config[config_name])
config[config_name].init_app(app)
app_conf.init_app(app)
# Connect to Staging view
from staging.views import staging as staging_blueprint
app.register_blueprint(staging_blueprint)
return app
def make_celery(app=None):
app = app or create_app(os.getenv('FLASK_CONFIG') or 'default')
celery = Celery(__name__, broker=app.config.CELERY_BROKER_URL)
celery.conf.update(app.conf)
TaskBase = celery.Task
class ContextTask(TaskBase):
abstract = True
def __call__(self, *args, **kwargs):
with app.app_context():
return TaskBase.__call__(self, *args, **kwargs)
celery.Task = ContextTask
return celery
tasks.py:
from app import make_celery, app_conf
cel = make_celery(app_conf.app)
#cel.task
def send_realm_to_fabricdb(realm, form):
some actions...
and here is the problem:
The Blueprint "staging" uses task send_realm_to_fabricdb, so it makes: from tasks import send_realm_to_fabricdb
than, when I just run application, everything goes ok
BUT, when I'm trying to run celery: celery -A app.tasks worker -l info --beat, it goes to cel = make_celery(app_conf.app) in tasks.py, got app=None and trying to create application again: registering a blueprint... so I've got cycle import here.
Could you tell me how to break this cycle?
Thanks in advance.
I don't have the code to try this out, but I think things would work better if you move the creation of the Celery instance out of tasks.py and into the create_app function, so that it happens at the same time the app instance is created.
The argument you give to the Celery worker in the -A option does not need to have the tasks, Celery just needs the celery object, so for example, you could create a separate starter script, say celery_worker.py that calls create_app to create app and cel and then give it to the worker as -A celery_worker.cel, without involving the blueprint at all.
Hope this helps.
What I do to solve this error is that I create two Flask instance which one is for Web app, and another is for initial Celery instance.
Like #Miguel said, I have
celery_app.py for celery instance
manager.py for Flask instance
And in these two files, each module has it's own Flask instance.
So I can use celery.task in Views. And I can start celery worker separately.
Thanks Bob Jordan, you can find the answer from https://stackoverflow.com/a/50665633/2794539,
Key points:
1. make_celery do two things at the same time: create celery app and run celery with flask content, so you can create two functions to do make_celery job
2. celery app must init before blueprint register
Having the same problem, I ended up solving it very easily using shared_task (docs), keeping a single app.py file and not having to instantiate the flask app multiple times.
The original situation that led to the circular import:
from src.app import celery # src.app is ALSO importing the blueprints which are importing this file which causes the circular import.
#celery.task(bind=True)
def celery_test(self):
sleep(5)
logger.info("Task processed by Celery.")
The current code that works fine and avoids the circular import:
# from src.app import celery <- not needed anymore!
#shared_task(bind=True)
def celery_test(self):
sleep(5)
logger.info("Task processed by Celery.")
Please mind that I'm pretty new to Celery so I might be overseeing important stuff, it would be great if someone more experienced can give their opinion.