I am currently trying to run VAEs on my personal laptop, following the steps from: https://github.com/AntixK/PyTorch-VAE
I am only trying to run the simplest VAE with configs/vae.yaml.
Since my machine has only one GPU, I changed the gpus parameter in the config to [0], and I also added:
import os
os.environ["PL_TORCH_DISTRIBUTED_BACKEND"] = "gloo"
to run.py to make the file runnable. However, I am now getting the following error:
Global seed set to 1265
break point 3!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
break point 4!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======= Training VanillaVAE =======
Global seed set to 1265
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W ..\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [DESKTOP-NMQ53KV]:50425 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [DESKTOP-NMQ53KV]:50425 (system error: 10049 - The requested address is not valid in its context.).
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\core\datamodule.py:469: LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup.
rank_zero_deprecation(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-------------------------------------
0 | model | VanillaVAE | 3.9 M
-------------------------------------
3.9 M Trainable params
0 Non-trainable params
3.9 M Total params
15.751 Total estimated model params size (MB)
Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]break point 1!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
break point 2!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Global seed set to 1265
break point 3!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
break point 4!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======= Training VanillaVAE =======
Global seed set to 1265
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W ..\torch\csrc\distributed\c10d\socket.cpp:401] [c10d] The server socket has failed to bind to [DESKTOP-NMQ53KV]:50425 (system error: 10048 - Only one usage of each socket address (protocol/network address/port) is normally permitted.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:401] [c10d] The server socket has failed to bind to DESKTOP-NMQ53KV:50425 (system error: 10013 - An attempt was made to access a socket in a way forbidden by its access permissions.).
[E ..\torch\csrc\distributed\c10d\socket.cpp:435] [c10d] The server socket has failed to listen on any local network address.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1520.0_x64__qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1520.0_x64__qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1520.0_x64__qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1520.0_x64__qbz5n2kfra8p0\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1520.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 289, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1520.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1520.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\huklab\Desktop\odin\PyTorch-VAE\run.py", line 67, in <module>
runner.fit(experiment, datamodule=data)
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\trainer\trainer.py", line 737, in fit
self._call_and_handle_interrupt(
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\trainer\trainer.py", line 682, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\trainer\trainer.py", line 772, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\trainer\trainer.py", line 1132, in _run
self.accelerator.setup_environment()
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\accelerators\gpu.py", line 39, in setup_environment
super().setup_environment()
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\accelerators\accelerator.py", line 83, in setup_environment
self.training_type_plugin.setup_environment()
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\plugins\training_type\ddp.py", line 185, in setup_environment
self.setup_distributed()
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\plugins\training_type\ddp.py", line 272, in setup_distributed
init_dist_connection(self.cluster_environment, self.torch_distributed_backend)
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pytorch_lightning\utilities\distributed.py", line 386, in init_dist_connection
torch.distributed.init_process_group(
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\distributed\distributed_c10d.py", line 595, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\distributed\rendezvous.py", line 257, in _env_rendezvous_handler
store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
File "C:\Users\huklab\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\distributed\rendezvous.py", line 188, in _create_c10d_store
return TCPStore(
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [DESKTOP-NMQ53KV]:50425 (system error: 10048 - Only one usage of each socket address (protocol/network address/port) is normally permitted.). The server socket has failed to bind to DESKTOP-NMQ53KV:50425 (system error: 10013 - An attempt was made to access a socket in a way forbidden by its access permissions.).
Does anyone have any idea what is happening and how to fix it?
Thanks a lot for your help!
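For reference, the full extent of my changes: the gpus entry in configs/vae.yaml is set to [0], and the lines below were added at the very top of run.py (a sketch of my edit; everything else, including the runner.fit(experiment, datamodule=data) call, is unchanged from the repo):
import os

# Force torch.distributed to use the gloo backend, since NCCL is not available
# on Windows; this is the only change made to run.py itself.
os.environ["PL_TORCH_DISTRIBUTED_BACKEND"] = "gloo"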
Related
Getting the above exception while deploying DAGs in the pipeline.
The log is as follows:
************************************************************************************************************
* *
* *
* Deploying 'dags'... *
* *
* *
************************************************************************************************************
[2022-12-16 14:09:48,076] {io} INFO - Current directory: /artifacts/dags
[2022-12-16 14:09:48,076] {copy_deploy_tool} INFO - Deploy 'dags' by copying files...
[2022-12-16 14:09:48,083] {deploy_tool} INFO - saving values.yaml...
[2022-12-16 14:09:48,162] {copy_deploy_tool} INFO - Removing files from 'development:airflow-5db795dd7c-d586h:/root/airflow/dags'
[2022-12-16 14:09:48,264] {deploy} ERROR - Execution failed for project: dags
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/kubernetes/stream/ws_client.py", line 296, in websocket_call
client = WSClient(configuration, get_websocket_url(url), headers, capture_all)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/stream/ws_client.py", line 94, in __init__
self.sock.connect(url, header=header)
File "/usr/local/lib/python3.6/dist-packages/websocket/_core.py", line 253, in connect
self.handshake_response = handshake(self.sock, *addrs, **options)
File "/usr/local/lib/python3.6/dist-packages/websocket/_handshake.py", line 57, in handshake
status, resp = _get_resp_headers(sock)
File "/usr/local/lib/python3.6/dist-packages/websocket/_handshake.py", line 143, in _get_resp_headers
raise WebSocketBadStatusException("Handshake status %d %s", status, status_message, resp_headers)
websocket._exceptions.WebSocketBadStatusException: Handshake status 500 Internal Server Error
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/commands/deploy/deploy.py", line 65, in __try_execute
deployment=project[DEPLOYMENT],
File "/usr/local/lib/python3.6/dist-packages/tools/deploy/copy_deploy_tool.py", line 50, in run
namespace, container, pod_name, command, api_client=api_client
File "/usr/local/lib/python3.6/dist-packages/helpers/kubernetes.py", line 133, in run_pod_command
stdout=True,
File "/usr/local/lib/python3.6/dist-packages/kubernetes/stream/stream.py", line 35, in stream
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api/core_v1_api.py", line 841, in connect_get_namespaced_pod_exec
(data) = self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs) # noqa: E501
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api/core_v1_api.py", line 941, in connect_get_namespaced_pod_exec_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 345, in call_api
_preload_content, _request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 176, in __call_api
_request_timeout=_request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/stream/stream.py", line 30, in _intercept_request_call
return ws_client.websocket_call(config, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/stream/ws_client.py", line 302, in websocket_call
raise ApiException(status=0, reason=str(e))
kubernetes.client.rest.ApiException: (0)
Reason: Handshake status 500 Internal Server Error
[2022-12-16 14:09:49,631] {shell} INFO - doi: Deployment record uploaded successfully
[2022-12-16 14:09:49,631] {shell} INFO - OK
[2022-12-16 14:09:49,635] {io} INFO - Current directory: /artifacts
[2022-12-16 14:09:49,635] {pretty_info} INFO -
Usually this happens when the pod is in the Running state but none of its containers are running (0/1). If you then exec a command against that pod and container, you get a 500 Internal Server Error instead of an error describing the real issue (the container is not running).
Check that all containers are running:
if all([p.status.phase == "Running" for p in my_pods]) \
        and all([c.state.running for p in my_pods for c in p.status.container_statuses]):
    ...  # the pod and all of its containers are actually up
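A fuller version of that check, as a sketch using the official kubernetes Python client (the namespace name is just an example):
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()
my_pods = v1.list_namespaced_pod("development").items  # example namespace

all_running = all(p.status.phase == "Running" for p in my_pods) and \
    all(c.state.running
        for p in my_pods
        for c in (p.status.container_statuses or []))
print("all containers running:", all_running)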
Also refer to this Stack Overflow post and GitHub issue.
I am trying to install self-hosted sentry on CentOS (CentOS Linux release 7.9.2009 (Core)) according to this documentation - https://develop.sentry.dev/self-hosted/.
But when I run sudo ./install.sh I get this error:
Failed to connect to clickhouse:9000
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 260, in connect
return self._init_connection(host, port)
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 226, in _init_connection
self.socket = self._create_socket(host, port)
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 202, in _create_socket
for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
File "/usr/local/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
Connection to Clickhouse cluster clickhouse:9000 failed (attempt 9)
Traceback (most recent call last):
File "/usr/src/snuba/snuba/clickhouse/native.py", line 81, in execute
result: Sequence[Any] = conn.execute(
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/client.py", line 205, in execute
self.connection.force_connect()
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 180, in force_connect
self.connect()
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 279, in connect
raise err
clickhouse_driver.errors.NetworkError: Code: 210. Name or service not known (clickhouse:9000)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/src/snuba/snuba/migrations/connect.py", line 30, in check_clickhouse_connections
check_clickhouse(clickhouse)
File "/usr/src/snuba/snuba/migrations/connect.py", line 49, in check_clickhouse
ver = clickhouse.execute("SELECT version()")[0][0]
File "/usr/src/snuba/snuba/clickhouse/native.py", line 96, in execute
raise ClickhouseError(e.code, e.message) from e
snuba.clickhouse.errors.ClickhouseError: [210] Name or service not known (clickhouse:9000)
Failed to connect to clickhouse:9000
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 260, in connect
return self._init_connection(host, port)
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 226, in _init_connection
self.socket = self._create_socket(host, port)
File "/usr/local/lib/python3.8/site-packages/clickhouse_driver/connection.py", line 202, in _create_socket
for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
File "/usr/local/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
The error from docker-compose logs clickhouse:
clickhouse_1 | Processing configuration file '/etc/clickhouse-server/config.xml'.
clickhouse_1 | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
clickhouse_1 | Merging configuration file '/etc/clickhouse-server/config.d/sentry.xml'.
clickhouse_1 | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Exception: Failed to merge config with '/etc/clickhouse-server/config.d/sentry.xml': Access to file denied: /etc/clickhouse-server/config.d/sentry.xml, Stack trace (when copying this message, always include the lines below):
clickhouse_1 |
clickhouse_1 | 0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) # 0x105351b0 in /usr/bin/clickhouse
clickhouse_1 | 1. ? # 0xdefbd83 in /usr/bin/clickhouse
clickhouse_1 | 2. DB::ConfigProcessor::loadConfig(bool) # 0xdef9e97 in /usr/bin/clickhouse
clickhouse_1 | 3. BaseDaemon::reloadConfiguration() # 0x9157010 in /usr/bin/clickhouse
clickhouse_1 | 4. BaseDaemon::initialize(Poco::Util::Application&) # 0x91597d2 in /usr/bin/clickhouse
clickhouse_1 | 5. DB::Server::initialize(Poco::Util::Application&) # 0x8f96458 in /usr/bin/clickhouse
clickhouse_1 | 6. Poco::Util::Application::run() # 0x10457659 in /usr/bin/clickhouse
clickhouse_1 | 7. DB::Server::run() # 0x8f96045 in /usr/bin/clickhouse
clickhouse_1 | 8. mainEntryClickHouseServer(int, char**) # 0x8f8ce23 in /usr/bin/clickhouse
clickhouse_1 | 9. main # 0x8ee8799 in /usr/bin/clickhouse
clickhouse_1 | 10. __libc_start_main # 0x21b97 in /lib/x86_64-linux-gnu/libc-2.27.so
clickhouse_1 | 11. _start # 0x8ee802e in /usr/bin/clickhouse
clickhouse_1 | (version 20.3.9.70 (official build))
clickhouse_1 | Processing configuration file '/etc/clickhouse-server/config.xml'.
clickhouse_1 | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
clickhouse_1 | Merging configuration file '/etc/clickhouse-server/config.d/sentry.xml'.
clickhouse_1 | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Exception: Failed to merge config with '/etc/clickhouse-server/config.d/sentry.xml': Access to file denied: /etc/clickhouse-server/config.d/sentry.xml, Stack trace (when copying this message, always include the lines below):
There is an error - Access to file denied: /etc/clickhouse-server/config.d/sentry.xml - in the clickhouse logs (similar issue - https://forum.sentry.io/t/click-house-giving-permission-errors/12418). How can I set the correct permissions for this file?
Fixed it by giving 777 permissions to some folders of the Sentry installation. I know this may not be desirable for everyone, but it is an easy and fast solution.
sudo chmod -R 777 ./clickhouse
sudo chmod -R 777 ./relay
sudo chmod -R 777 ./sentry
sudo chmod -R 777 ./postgres
I have installed Odoo 12 on Ubuntu 18.04 LTS using WSL on Windows 10 Pro.
Everything seems fine and the service is running, but when I access localhost I get an Internal Server Error, with this error in the log:
2019-09-05 06:52:07,596 309 ERROR ? werkzeug: Error on request:
Traceback (most recent call last):
File "/opt/odoo/.local/lib/python3.6/site-packages/werkzeug/serving.py", line 303, in run_wsgi
execute(self.server.app)
File "/opt/odoo/.local/lib/python3.6/site-packages/werkzeug/serving.py", line 291, in execute
application_iter = app(environ, start_response)
File "/opt/odoo/odoo/odoo/service/server.py", line 409, in app
return self.app(e, s)
File "/opt/odoo/odoo/odoo/service/wsgi_server.py", line 128, in application
return application_unproxied(environ, start_response)
File "/opt/odoo/odoo/odoo/service/wsgi_server.py", line 117, in application_unproxied
result = odoo.http.root(environ, start_response)
File "/opt/odoo/odoo/odoo/http.py", line 1320, in __call__
return self.dispatch(environ, start_response)
File "/opt/odoo/odoo/odoo/http.py", line 1293, in __call__
return self.app(environ, start_wrapped)
File "/opt/odoo/.local/lib/python3.6/site-packages/werkzeug/middleware/shared_data.py", line 220, in __call__
return self.app(environ, start_response)
File "/opt/odoo/odoo/odoo/http.py", line 1453, in dispatch
self.setup_db(httprequest)
File "/opt/odoo/odoo/odoo/http.py", line 1376, in setup_db
httprequest.session.db = db_monodb(httprequest)
File "/opt/odoo/odoo/odoo/http.py", line 1537, in db_monodb
dbs = db_list(True, httprequest)
File "/opt/odoo/odoo/odoo/http.py", line 1504, in db_list
dbs = odoo.service.db.list_dbs(force)
File "/opt/odoo/odoo/odoo/service/db.py", line 375, in list_dbs
with closing(db.cursor()) as cr:
File "/opt/odoo/odoo/odoo/sql_db.py", line 657, in cursor
return Cursor(self.__pool, self.dbname, self.dsn, serialized=serialized)
File "/opt/odoo/odoo/odoo/sql_db.py", line 171, in __init__
self._cnx = pool.borrow(dsn)
File "/opt/odoo/odoo/odoo/sql_db.py", line 540, in _locked
return fun(self, *args, **kwargs)
File "/opt/odoo/odoo/odoo/sql_db.py", line 608, in borrow
**connection_info)
File "/opt/odoo/.local/lib/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? - - -
PostgreSQL is installed using port 5433:
Ver Cluster Port Status Owner Data directory Log file
10 main 5433 online postgres /var/lib/postgresql/10/main /var/log/postgresql/postgresql-10-main.log
I changed the port in the conf file /etc/odoo-server.conf from False to 5433:
[options]
; This is the password that allows database operations:
; admin_passwd = admin
db_host = False
db_port = 5433
db_user = odoo
db_password = False
logfile = /var/log/odoo/odoo-server.log
addons_path = /opt/odoo/addons,/opt/odoo/odoo/addons
However, I get the same error. It seems psycopg2 still uses the default port 5432, even though I specified 5433 in the .conf file.
Any help?
Thanks
Your distribution (like mine) probably creates the PostgreSQL socket in the /tmp directory.
A quick workaround I've been using is:
mkdir /var/run/postgresql
ln -s /tmp/.s.PGSQL.5432 /var/run/postgresql/   # use .s.PGSQL.5433 if your cluster listens on port 5433
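To double-check that the cluster is actually reachable on port 5433, a quick psycopg2 test over TCP may also help (a sketch; host, port, and credentials are placeholders for your own setup):
import psycopg2

# Connecting via 127.0.0.1 forces TCP and bypasses the unix-socket path that
# the error message complains about; adjust dbname/user/password to your setup.
conn = psycopg2.connect(host="127.0.0.1", port=5433, dbname="postgres",
                        user="odoo", password="odoo")
print(conn.get_dsn_parameters())
conn.close()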
I'm using Apache-airflow2. My DAGs were running smoothly on LocalExecutor up until now. Now I want to scale up and use CeleryExecutor (I'm still doing this on my local Mac). I've configured it to run on CeleryExecutor, and when the server starts, the log shows CeleryExecutor. But whenever I run airflow worker (so the same machine can also be used as a worker) or airflow flower, I run into an error where it tries to connect to MySQL and fails because the MySQLdb module is not found.
I've configured RabbitMQ locally and Airflow in a virtualenv. Find the updated lines in airflow.cfg below:
broker_url = amqp://myuser:mypassword@localhost/myvhost
result_backend = db+postgresql://localhost:5433/celery_space?user=celery_user&password=celery_user
sql_alchemy_conn = postgresql://localhost:5433/postgres?user=postgres&password=root
executor = CeleryExecutor
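For reference, which broker and result backend the Celery app actually resolves can be printed from the same virtualenv (a sketch; the import path is the one shown in the worker banner below):
from airflow.executors.celery_executor import app

# If this still prints the default sqla+mysql:// transport, the worker process
# is not reading the same airflow.cfg (e.g. AIRFLOW_HOME differs in that shell).
print(app.conf.broker_url)
print(app.conf.result_backend)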
Please find below worker error:
-------------- celery#superadmins-MacBook-Pro.local v4.2.1 (windowlicker)
---- **** -----
--- * *** * -- Darwin-18.2.0-x86_64-i386-64bit 2018-12-30 09:15:00
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: airflow.executors.celery_executor:0x10c682fd0
- ** ---------- .> transport: sqla+mysql://airflow:airflow#localhost:3306/airflow
- ** ---------- .> results: mysql://airflow:**#localhost:3306/airflow
- *** --- * --- .> concurrency: 16 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> default exchange=default(direct) key=default
[tasks]
. airflow.executors.celery_executor.execute_command
[2018-12-30 09:15:00,290: INFO/MainProcess] Connected to sqla+mysql://airflow:airflow#localhost:3306/airflow
[2018-12-30 09:15:00,304: CRITICAL/MainProcess] Unrecoverable error: ModuleNotFoundError("No module named 'MySQLdb'")
Traceback (most recent call last):
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/celery/worker/worker.py", line 205, in start
self.blueprint.start(self)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/celery/bootsteps.py", line 369, in start
return self.obj.start()
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 317, in start
blueprint.start(self)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/celery/worker/consumer/tasks.py", line 41, in start
c.connection, on_decode_error=c.on_decode_error,
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/celery/app/amqp.py", line 297, in TaskConsumer
**kw
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/messaging.py", line 386, in __init__
self.revive(self.channel)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/messaging.py", line 408, in revive
self.declare()
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/messaging.py", line 421, in declare
queue.declare()
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/entity.py", line 608, in declare
self._create_queue(nowait=nowait, channel=channel)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/entity.py", line 617, in _create_queue
self.queue_declare(nowait=nowait, passive=False, channel=channel)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/entity.py", line 652, in queue_declare
nowait=nowait,
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/transport/virtual/base.py", line 531, in queue_declare
self._new_queue(queue, **kwargs)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 82, in _new_queue
self._get_or_create(queue)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 70, in _get_or_create
obj = self.session.query(self.queue_cls) \
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 65, in session
_, Session = self._open()
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 56, in _open
engine = self._engine_from_config()
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 51, in _engine_from_config
return create_engine(conninfo.hostname, **transport_options)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/sqlalchemy/engine/__init__.py", line 425, in create_engine
return strategy.create(*args, **kwargs)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 81, in create
dbapi = dialect_cls.dbapi(**dbapi_args)
File "/Users/deepaksaroha/Desktop/apache_2.0/nb-atom-airflow/lib/python3.7/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 102, in dbapi
return __import__('MySQLdb')
ModuleNotFoundError: No module named 'MySQLdb'
[2018-12-30 09:15:00,599] {__init__.py:51} INFO - Using executor SequentialExecutor
Starting flask
Please let me know how the worker can be run in sync with the airflow.cfg settings. I appreciate any help; let me know if any further logs or configuration files are needed.
Celery is failing on one of my dotcloud deployments, and I'm not sure how to fix it. The deployment is almost identical to an existing dotcloud deployment (verified by doing a file diff), which seems to be working fine.
The error I get in the djcelery log:
dotcloud#hack-default-www-0:/var/log/supervisor$ more djcelery_error.log
/home/dotcloud/env/lib/python2.6/site-packages/django/conf/__init__.py:75: DeprecationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL instead.
"use STATIC_URL instead.", DeprecationWarning)
/home/dotcloud/env/lib/python2.6/site-packages/djcelery/loaders.py:108: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn("Using settings.DEBUG leads to a memory leak, never "
[2012-06-04 03:27:32,139: WARNING/MainProcess] -------------- celery#hack-default-www-0 v2.5.3
---- **** -----
--- * *** * -- [Configuration]
-- * - **** --- . broker: amqp://root#hack-OQVADQ2K.dotcloud.com:29210//
- ** ---------- . loader: djcelery.loaders.DjangoLoader
- ** ---------- . logfile: [stderr]#INFO
- ** ---------- . concurrency: 2
- ** ---------- . events: ON
- *** --- * --- . beat: OFF
-- ******* ----
--- ***** ----- [Queues]
-------------- . celery: exchange:celery (direct) binding:celery
[Tasks]
. experiments.tasks.pushMessageToIphone
. experiments.tasks.sendTestMessage
[2012-06-04 03:27:32,172: INFO/PoolWorker-1] child process calling self.run()
[2012-06-04 03:27:32,185: INFO/PoolWorker-2] child process calling self.run()
[2012-06-04 03:27:32,188: WARNING/MainProcess] celery#hack-default-www-0 has started.
[2012-06-04 03:27:35,315: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 2 seconds...
[2012-06-04 03:27:40,374: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 4 seconds...
[2012-06-04 03:27:47,479: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 6 seconds...
[2012-06-04 03:27:56,509: ERROR/MainProcess] Consumer: Connection Error: Socket
Interestingly, the error log of celerycam shows something a bit different. I'm not sure if this is a red herring.
/home/dotcloud/env/lib/python2.6/site-packages/django/conf/__init__.py:75: DeprecationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL instead.
"use STATIC_URL instead.", DeprecationWarning)
[2012-06-04 03:27:31,373: INFO/MainProcess] -> evcam: Taking snapshots with djcelery.snapshot.Camera (every 1.0 secs.)
Traceback (most recent call last):
File "hack/manage.py", line 14, in
execute_manager(settings)
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/__
init__.py", line 459, in execute_manager
utility.execute()
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/__
init__.py", line 382, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/base.
py", line 74, in run_from_argv
return super(CeleryCommand, self).run_from_argv(argv)
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/ba
se.py", line 196, in run_from_argv
self.execute(*args, **options.__dict__)
File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/base.
py", line 67, in execute
super(CeleryCommand, self).execute(*args, **options)
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/ba
se.py", line 232, in execute
output = self.handle(*args, **options)
File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/comma
nds/celerycam.py", line 26, in handle
ev.run(*args, **options)
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/bin/celeryev.py",
line 38, in run
detach=detach)
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/bin/celeryev.py",
line 70, in run_evcam
return cam()
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/snapshot.py
", line 116, in evcam
recv.capture(limit=None)
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py
", line 204, in capture
list(self.itercapture(limit=limit, timeout=timeout, wakeup=wakeup))
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py
", line 193, in itercapture
with self.consumer(wakeup=wakeup) as consumer:
File "/usr/lib/python2.6/contextlib.py", line 16, in __enter__
return self.gen.next()
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py
", line 185, in consumer
queues=[self.queue], no_ack=True)
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/messaging.py", line
279, in __init__
self.revive(self.channel)
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/messaging.py", line
286, in revive
channel = channel.default_channel
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", lin
e 581, in default_channel
self.connection
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", lin
e 574, in connection
self._connection = self._establish_connection()
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", lin
e 533, in _establish_connection
conn = self.transport.establish_connection()
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/transport/amqplib.p
y", line 279, in establish_connection
connect_timeout=conninfo.connect_timeout)
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/transport/amqplib.p
y", line 89, in __init__
super(Connection, self).__init__(*args, **kwargs)
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/connec
tion.py", line 144, in __init__
(10, 30), # tune
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/abstra
ct_channel.py", line 95, in wait
self.channel_id, allowed_methods)
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/connec
tion.py", line 202, in _wait_method
self.method_reader.read_method()
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/method
_framing.py", line 221, in read_method
raise m
IOError: Socket closed
My supervisord file:
[program:djcelery]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
[program:celerycam]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
As mentioned, I have nearly identical code deployed under a different dotcloud account that is working fine.
Status of the rabbitmq broker:
$ ./dotcloud info hack.broker
aliases:
- hackxxxx.dotcloud.com
config:
password: xxxx
rabbitmq_management: true
user: root
created_at: 1338702527.075196
datacenter: Amazon-us-east-1c
image_version: 924a079b622a (latest)
memory: 49M/512M (9%)
ports:
- name: ssh
url: ssh://dotcloud#hackxxx.dotcloud.com:29209
- name: amqp
url: amqp://root:xxxx#hackxxxx.dotcloud.com:29210
- name: http
url: http://root:xxx#hack1-xxxx.dotcloud.com/
state: running
type: rabbitmq
It looks like it is having an issue connecting to your broker. Have you confirmed that you can connect to your broker, and that it is up and running?
What are you using for a broker?
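One way to test that independently of Celery is to open a connection with kombu directly (a sketch; the URL is a placeholder for the amqp URL shown by dotcloud info):
from kombu import BrokerConnection

# Placeholder URL: use the amqp:// URL printed by `dotcloud info hack.broker`.
conn = BrokerConnection("amqp://root:xxxx@hackxxxx.dotcloud.com:29210//",
                        connect_timeout=5)
conn.connect()  # raises a socket/IOError if the broker is unreachable
print("broker reachable")
conn.release()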