celeryd insists on checking amqp when it's configured to use redis - celery

I am trying to run celeryd + redis in my setup. My /etc/default/celeryd contains:
CELERYD_NODES="worker1"
CELERYD_NODES="worker1 worker2 worker3"
CELERY_BIN="/home/snijsure/.virtualenvs/mtest/bin/celery"
CELERYD_CHDIR="/home/snijsure/work/mytest/"
CELERYD_OPTS="--time-limit=300 --concurrency=8"
CELERYD_LOG_FILE="/var/log/celery/%N.log"
CELERYD_PID_FILE="/var/run/celery/%N.pid"
CELERYD_USER="celery"
CELERYD_GROUP="celery"
CELERY_CREATE_DIRS=1
export DJANGO_SETTINGS_MODULE="analytics.settings.local"
I have the following in my base.py:
BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
BROKER_HOST = "localhost"
BROKER_BACKEND="redis"
REDIS_PORT=6379
REDIS_HOST = "localhost"
BROKER_USER = ""
BROKER_PASSWORD =""
BROKER_VHOST = "0"
REDIS_DB = 0
REDIS_CONNECT_RETRY = True
CELERY_SEND_EVENTS=True
CELERY_RESULT_BACKEND='redis'
CELERY_TASK_RESULT_EXPIRES = 10
CELERYBEAT_SCHEDULER="djcelery.schedulers.DatabaseScheduler"
CELERY_ALWAYS_EAGER = False
import djcelery
djcelery.setup_loader()
However, when I start celeryd using /etc/init.d/celeryd start
I see the following messages in my log files:
[2014-08-14 23:16:41,430: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@127.0.0.1:5672//: [Errno 111] Connection refused.
Trying again in 32.00 seconds...
It seems like it's trying to connect to amqp. Any ideas on why that is? I have followed the procedure outlined here:
http://celery.readthedocs.org/en/latest/getting-started/brokers/redis.html
I am running version 3.1.13 (Cipater)
What am I doing wrong?
-Subodh

How do you start your celery worker? I encountered this error once because I didn't start it correctly. You should add the -A option when executing "celery worker" so that celery connects to the broker you configured in your Celery object. Otherwise celery will try to connect to the default broker.

Your /etc/default/celeryd file looks ok.
You are using djcelery, however. I'd recommend you drop that. If you look at the Django setup guide and example project you will notice that there are no longer any INSTALLED_APPS required for celery. It appears that djcelery is now only recommended if you want to use the Django SQL database as a backend.
https://github.com/celery/celery/tree/3.1/examples/django/
http://celery.readthedocs.org/en/latest/django/first-steps-with-django.html#using-celery-with-django
I've just rebuilt against that pattern and I can confirm that it works ok, at least in terms of connecting to Redis rather than trying to use RabbitMQ (amqp).
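When the log shows amqp:// although the settings say redis://, it can help to confirm which URL the settings module actually exposes before looking at the worker itself. Below is a minimal sketch using only the standard library; describe_broker is a hypothetical helper name, not part of Celery.

```python
from urllib.parse import urlsplit


def describe_broker(url):
    """Split a broker URL into the parts that matter when debugging."""
    parts = urlsplit(url)
    return {
        "transport": parts.scheme,        # e.g. "redis" or "amqp"
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path.lstrip("/"),   # redis db number or amqp vhost
    }
```

Calling describe_broker on the BROKER_URL from the settings module should report "redis" as the transport. If it does and the worker still logs amqp://, the worker is most likely not loading those settings at all, which is what the answers above point to.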


Need help setting BROKER_URL in Airflow's config and Celery Executor

Summary
I'm using Apache-Airflow for the first time. I've gotten the webserver, SequentialExecutor and LocalExecutor to work, but I'm running into issues when using the CeleryExecutor with rabbitmq-server. I currently have two AWS EC2 instances.
Error
To summarize: My worker cannot connect to the rabbitmq-server on my scheduler node. Whenever I run airflow worker on the worker instance, it gives:
- ** ---------- [config]
- ** ---------- .> app: airflow.executors.celery_executor:0x7f53a8dce400
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: disabled://
- *** --- * --- .> concurrency: 16 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> default exchange=default(direct) key=default
[2019-02-15 02:26:23,742: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@127.0.0.1:5672//: [Errno 111] Connection refused.
Configuration
I followed all of the directions I could find online. Both instances have the same airflow.cfg file, with
[core]
executor = CeleryExecutor
[celery]
broker_url = pyamqp://username:password@hostname:port/virtual_host
and result_backend pointing at the same MySQL database on RDS that airflow is working off of.
From what I could tell, no matter what, the worker node always tried connecting to a local rabbitmq-server and completely ignored that broker_url in my airflow.cfg file.
What I've Tried
I went spelunking in the source code, and noticed in celery/app/base.py, if I error log out the configurations it gets in _get_config() when it goes to create a connection, there are actually TWO values in the dictionary returned.
BROKER_URL = None
broker_url = pyamqp://username:password@hostname:port/virtual_host
and all of the connection logic seems to point at the BROKER_URL key.
I tried setting BROKER_URL and CELERY_BROKER_URL in airflow.cfg, but it seems to be case insensitive, and ignores the latter. Just to see if it would work, I modified the _get_config() method and hacked in:
s['BROKER_URL'] = s['broker_url']
return s
And, like I expected, everything started working.
Am I doing something wrong? I'd really rather not use this hack, but I can't understand why it's ignoring the configuration values.
Thanks!
From the error message, it seems like the hostname being passed in the URI is wrong:
If rabbitmq-server and the worker are on different machines: instead of localhost/127.0.0.1, the hostname should be the IP address of the rabbitmq machine
If rabbitmq-server and the worker are on the same machine as part of a Docker Compose application: the hostname should be the service name associated with the RabbitMQ image in docker-compose.yml, e.g. amqp://guest:guest@rabbitmq:5672/
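To check whether the worker machine can reach the broker at all, a plain TCP probe is often enough. This is a minimal sketch using only the standard library; can_connect is a hypothetical helper, and the host and port you pass in are whatever your broker_url points at.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name resolution failures.
        return False
```

Run it from the worker instance against the scheduler's address and port 5672. A False result means the problem is at the network or RabbitMQ level (security group, bind address, service not running) rather than in the Airflow configuration.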

Apache Airflow celery executor is not getting result backend

I am running Apache Airflow version 1.9.0 and when I try to run a task from UI, I get the following error in airflow scheduler console:
[2018-05-08 12:09:06,737] {jobs.py:1077} INFO - No tasks to consider for execution.
[2018-05-08 12:09:06,738] {jobs.py:1662} INFO - Heartbeating the executor
[2018-05-08 12:09:06,738] {celery_executor.py:101} ERROR - Error syncing the celery executor, ignoring it:
[2018-05-08 12:09:06,738] {celery_executor.py:102} ERROR - No result backend configured. Please see the documentation for more information.
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", line 83, in sync
state = async.state
File "/usr/local/lib/python2.7/dist-packages/celery/result.py", line 329, in state
return self.backend.get_status(self.id)
File "/usr/local/lib/python2.7/dist-packages/celery/backends/base.py", line 547, in _is_disabled
'No result backend configured. '
NotImplementedError: No result backend configured. Please see the documentation for more information.
In my airflow.cfg, I have the following variables in [celery] section:
celery_app_name = airflow.executors.celery_executor
celeryd_concurrency = 16
worker_log_server_port = 8795
broker_url = amqp://guest:guest@localhost:5672//
celery_result_backend = amqp://guest:guest@localhost:5672//
flower_host = 0.0.0.0
flower_port = 5555
default_queue = default
What am I doing wrong here?
You should not point celery_result_backend to a RabbitMQ instance since the purpose of this backend is to store information concerning the status of the tasks and RabbitMQ is not the right tool for that (Please correct me if I'm mistaken).
You can use Redis in case you want to keep using the same instance as broker and backend, or alternatively you can use postgres as the backend which I recommend. A sample configuration for Postgres would be the following:
celery_result_backend = db+postgresql://airflow:****@postgres/airflow
More info is available in the official docs.
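One thing to watch for with a db+postgresql:// URL: if the password contains characters such as @ or /, they have to be percent-encoded, otherwise the URL is parsed incorrectly. A minimal sketch of building such a URL follows; sqlalchemy_backend_url is a hypothetical helper, and the user, host and database names in the usage are just the ones from the sample above.

```python
from urllib.parse import quote


def sqlalchemy_backend_url(user, password, host, database, driver="postgresql"):
    """Build a db+<driver>:// result backend URL, escaping the credentials."""
    return "db+{driver}://{user}:{password}@{host}/{database}".format(
        driver=driver,
        user=quote(user, safe=""),          # safe="" so "/" and "@" are escaped too
        password=quote(password, safe=""),
        host=host,
        database=database,
    )
```

For example, sqlalchemy_backend_url("airflow", "p@ss/w", "postgres", "airflow") gives db+postgresql://airflow:p%40ss%2Fw@postgres/airflow, which can be pasted into celery_result_backend.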

uWSGI, Flask, sqlalchemy, and postgres: SSL error: decryption failed or bad record mac

I'm trying to set up an application webserver using uWSGI + Nginx, which runs a Flask application using SQLAlchemy to communicate with a Postgres database.
When I make requests to the webserver, every other response will be a 500 error.
The error is:
Traceback (most recent call last):
File "/var/env/argos/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 867, in _execute_context
context)
File "/var/env/argos/lib/python3.3/site-packages/sqlalchemy/engine/default.py", line 388, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL error: decryption failed or bad record mac
The above exception was the direct cause of the following exception:
sqlalchemy.exc.OperationalError: (OperationalError) SSL error: decryption failed or bad record mac
The error is triggered by a simple Flask-SQLAlchemy method:
result = models.Event.query.get(id)
uwsgi is being managed by supervisor, which has a config:
[program:my_app]
command=/usr/bin/uwsgi --ini /etc/uwsgi/apps-enabled/myapp.ini --catch-exceptions
directory=/path/to/my/app
stopsignal=QUIT
autostart=true
autorestart=true
and uwsgi's config looks like:
[uwsgi]
socket = /tmp/my_app.sock
logto = /var/log/my_app.log
plugins = python3
virtualenv = /path/to/my/venv
pythonpath = /path/to/my/app
wsgi-file = /path/to/my/app/application.py
callable = app
max-requests = 1000
chmod-socket = 666
chown-socket = www-data:www-data
master = true
processes = 2
no-orphans = true
log-date = true
uid = www-data
gid = www-data
The furthest that I can get is that it has something to do with uwsgi's forking. But beyond that I'm not clear on what needs to be done.
The issue ended up being uwsgi's forking.
When working with multiple processes with a master process, uwsgi initializes the application in the master process and then copies the application over to each worker process. The problem is if you open a database connection when initializing your application, you then have multiple processes sharing the same connection, which causes the error above.
The solution is to set the lazy configuration option for uwsgi, which forces a complete loading of the application in each process:
lazy
Set lazy mode (load apps in workers instead of master).
This option may have memory usage implications as Copy-on-Write semantics can not be used. When lazy is enabled, only workers will be reloaded by uWSGI’s reload signals; the master will remain alive. As such, uWSGI configuration changes are not picked up on reload by the master.
There's also a lazy-apps option:
lazy-apps
Load apps in each worker instead of the master.
This option may have memory usage implications as Copy-on-Write semantics can not be used. Unlike lazy, this only affects the way applications are loaded, not master’s behavior on reload.
This uwsgi configuration ended up working for me:
[uwsgi]
socket = /tmp/my_app.sock
logto = /var/log/my_app.log
plugins = python3
virtualenv = /path/to/my/venv
pythonpath = /path/to/my/app
wsgi-file = /path/to/my/app/application.py
callable = app
max-requests = 1000
chmod-socket = 666
chown-socket = www-data:www-data
master = true
processes = 2
no-orphans = true
log-date = true
uid = www-data
gid = www-data
# the fix
lazy = true
lazy-apps = true
As an alternative, you can dispose of the engine. This is how I solved the problem.
Such issues may happen if a query runs during the creation of the app, that is, in the module that creates the app itself. In that case, the engine allocates a pool of connections and then uwsgi forks, so every worker inherits the same connections.
Invoking 'engine.dispose()' closes the connection pool, and new connections are opened as soon as someone starts making queries again. So if you call it at the end of the module where you create your app, new connections will be created after the uwsgi fork.
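The idea behind both fixes is the same: every process should open its own connections instead of reusing ones inherited from the parent. Here is a minimal sketch of that pattern using only the standard library. PerProcess and the factory argument are hypothetical names, not part of uwsgi or SQLAlchemy; the factory stands in for whatever opens your connection.

```python
import os


class PerProcess:
    """Hold one resource per process, rebuilding it after a fork."""

    def __init__(self, factory):
        self._factory = factory   # hypothetical callable that opens a connection
        self._pid = None
        self._resource = None

    def get(self):
        pid = os.getpid()
        if self._pid != pid:
            # First use in this process, or we are a forked child:
            # never reuse what the parent opened.
            self._resource = self._factory()
            self._pid = pid
        return self._resource
```

Note that the child deliberately does not close the inherited resource, since closing a shared socket from one process can disturb the other processes still using it. It simply stops using it and opens its own.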
I am running a flask app using gunicorn on Heroku. My application started exhibiting this problem when I added the --preload option to my Procfile. When I removed that option, my application resumed functioning as normal.
Not sure whether to add this as an answer to this question or ask a separate question and put this as an answer there. I was getting this exact same error for reasons that are slightly different from those of the people who have posted and answered. In my setup, I am using gunicorn as the WSGI server for a Flask application. In this application, I was offloading some intense database operations to a celery worker. The error would come from the celery worker.
From reading a lot of the answers here and looking at the psycopg2 as well as the sqlalchemy session documentation, it became apparent to me that it is a bad idea to share an SQLAlchemy session between separate processes (the gunicorn worker and the celery worker in my case).
What ended up solving this for me was creating a new session in the celery worker function, so it used a new session each time it was called, and also destroying the session after every web request, so flask used a session per request. The overall solution looked like this:
Flask_app.py
@app.teardown_appcontext
def shutdown_session(exception=None):
    # Close the session at the end of every request.
    session.close()
celery_func.py
@celery_app.task(bind=True, throws=(IntegrityError,))
def access_db(self, entity_dict, tablename):
    # Open a fresh session on every call instead of sharing one across processes.
    with Session() as session:
        try:
            session.add(ORM_obj)
            session.commit()
        except IntegrityError:
            session.rollback()
            print('primary key violated')
            raise

python-memcache memcached -- I installed on centos virtualbox but it get/set never seem to work

I'm using python. I did a yum install memcached followed by an easy_install python-memcached
I used the simple test program from the Help(memcache). When I wasn't getting the proper answers I threw in some print statements:
[~/test]$ cat m2.py
import memcache
mc = memcache.Client(['127.0.0.1:11211'], debug=0)
x = mc.set("some_key", "Some value")
print 'Just set a key and value into the cache (suposedly)'
value = mc.get("some_key")
print 'Just retrieved that value from the cache using the key'
print 'X %s' % x
print 'Value %s' % value
[~/test]$ python m2.py
Just set a key and value into the cache (suposedly)
Just retrieved that value from the cache using the key
X 0
Value None
[~/test]$
The question now is, what have I failed to do in my installation? It appears to be working from an API perspective but it fails to put anything into the memcache share area.
I'm using a virtualbox vm running centos
[~]# cat /proc/version
Linux version 2.6.32-358.6.2.el6.i686 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Thu May 16 18:12:13 UTC 2013
Is there a daemon that is supposed to be running? I don't see an obvious named one when I do a ps.
I tried to get pylibmc installed on my vm but was unable to find a working installation so for now will see if I can get the above stuff working first.
I discovered that if I run it straight from the python console, I get a bit more output when I set debug=1:
>>> mc = memcache.Client(['127.0.0.1:11211'], debug=1)
>>> mc.stats
{}
>>> mc.set('test','value')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
0
>>> mc.get('test')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
When I try to use telnet to connect to the port, as in the example, I get a connection refused:
[root@~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
[root@~]#
I tried the instructions I found on the net for configuring telnet so localhost wouldn't be disabled:
vi /etc/xinetd.d/telnet
service telnet
{
flags = REUSE
socket_type = stream
wait = no
user = root
server = /usr/sbin/in.telnetd
log_on_failure += USERID
disable = no
}
And then ran the commands to restart the service(s):
service iptables stop
service xinetd stop
service iptables start
service xinetd start
service iptables stop
I ran with both cases (iptables started and stopped) but it had no effect. So I am out of ideas. What do I need to do so that the port will be allowed, if that is the problem?
Or is there a memcached service that needs to be running that needs to open up the port ?
Well, this is what it took to get it working (a series of manual steps):
1) su -
cd /var/run
mkdir memcached # this was missing
For steps 2 and 3, I added "-l 127.0.0.1" to the OPTIONS statement in the memcached file. It's apparently a listen option. I'm not certain which of the two files is actually used at runtime.
2) cd /etc/sysconfig
cp memcached memcached.old
vi memcached
3) cd /etc/init.d
cp memcached memcached.old
vi memcached
4) Try some commands to see if the server starts now
/etc/init.d/memcached start
/etc/init.d/memcached status
/etc/init.d/memcached stop
/etc/init.d/memcached restart
5) Open http://127.0.0.1:11211 in a browser. I tried this, but it never seemed to actually display anything, so I don't really know how valid this approach is. I'm not running apache or anything like that, so perhaps it's not relevant to my case. Perhaps I would have to supply a ?key=blah or something.
6) Now it should be ready to go. If you run the test shown below, it should work. At least it did for me. Calling help(memcache) displays a simple program; just paste that in and it should work fine.
[~]$ python
>>> import memcache
>>> help(memcache)
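A quicker way to tell whether the daemon is really up, without telnet or a browser, is to send the memcached text-protocol "version" command over a plain socket. This is a minimal sketch using only the standard library; memcached_version is a hypothetical helper, and the default host and port are simply the ones used in the question.

```python
import socket


def memcached_version(host="127.0.0.1", port=11211, timeout=3.0):
    """Return the server's version string, or None if it is unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"version\r\n")
            with conn.makefile("rb") as reader:
                reply = reader.readline()
    except OSError:
        # Connection refused, timeout, etc.: no daemon listening there.
        return None
    # A healthy server answers with a single line starting with "VERSION ".
    if reply.startswith(b"VERSION "):
        return reply[len(b"VERSION "):].strip().decode("ascii")
    return None
```

If this returns None, the daemon is not listening on that address and port, which matches the "Connection refused. Marking dead." output from the client. Once it returns a version string, mc.set and mc.get should start working.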

Hector test example not working on Cassandra 0.7.4

I have set up my single node Cassandra 0.7.4 and started the service with
bin/cassandra -f. Now I am trying to use the Hector API (v. 0.7.0) to manage the
DB.
The Cassandra CLI works fine and I can create keyspaces and so on.
I tried to run the test example and create a single keyspace:
Cluster cluster = HFactory.getOrCreateCluster("TestCluster",
new CassandraHostConfigurator("localhost:9160"));
Keyspace keyspace = HFactory.createKeyspace("Keyspace1", cluster);
But all I get is this:
2011-04-14 22:20:27,469 [main ] INFO me.prettyprint.cassandra.connection.CassandraHostRetryService - Downed Host Retry service started with queue size -1 and retry delay 10s
2011-04-14 22:20:27,492 [main ] DEBUG me.prettyprint.cassandra.connection.HThriftClient - Transport open status false for client CassandraClient<localhost:9160-1>
....this again about 20 times
me.prettyprint.cassandra.service.JmxMonitor - Registering JMX me.prettyprint.cassandra.service_TestCluster:ServiceType=hector,MonitorType=hector
2011-04-14 22:20:27,636 [Thread-0 ] INFO me.prettyprint.cassandra.connection.CassandraHostRetryService - Downed Host retry shutdown hook called
2011-04-14 22:20:27,646 [Thread-0 ] INFO me.prettyprint.cassandra.connection.CassandraHostRetryService - Downed Host retry shutdown complete
Can you please tell me what I'm doing wrong?
Thanks
When you connect via the CLI, do you specify "-h localhost -p 9160"?
Can you actually do stuff on the command line with the above?
The error from HThriftClient indicates it could not connect to the Cassandra Daemon.
FTR, you would get responses much faster via hector-users@googlegroups.com
If you are on a Linux machine, try starting up your cassandra server with this command:
/bin$ ./cassandra start -f
Then, for the CLI, use this command:
./cassandra-cli -h {hostname} -p 9160
Then make sure that the configuration is OK.