Airflow scheduler is throwing out an error - 'DisabledBackend' object has no attribute '_get_task_meta_for' - celery

I am trying to install Airflow in distributed mode on WSL. I have set up the Airflow webserver, Airflow scheduler, Airflow worker, Celery (3.1) and RabbitMQ.
While running the Airflow scheduler it throws the error below, even though the result backend is configured.
ERROR
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/airflow/executors/celery_executor.py", line 92, in sync
state = task.state
File "/usr/local/lib/python3.6/dist-packages/celery/result.py", line 398, in state
return self._get_task_meta()['status']
File "/usr/local/lib/python3.6/dist-packages/celery/result.py", line 341, in _get_task_meta
return self._maybe_set_cache(self.backend.get_task_meta(self.id))
File "/usr/local/lib/python3.6/dist-packages/celery/backends/base.py", line 288, in get_task_meta
meta = self._get_task_meta_for(task_id)
AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for'
https://issues.apache.org/jira/browse/AIRFLOW-1840
This is the exact error I am getting, but I couldn't find a solution there.
Result Backend-
result_backend = db+postgresql://postgres:****@localhost:5432/postgres
broker_url = amqp://rabbitmq_user_name:rabbitmq_password@localhost/rabbitmq_virtual_host_name
Help please. I have gone through almost all the documentation but couldn't find a solution.

I was facing the same issue on Celery 3.1.26.post2 (with RabbitMQ, PostgreSQL and Airflow). The reason for this issue is that the settings dictionary built in Celery's base.py (lib/python3.5/site-packages/celery/app/base.py) does not store the Celery backend under the key CELERY_RESULT_BACKEND; it only stores it under the key result_backend.
So the workaround is to go to the _get_config function in that same base.py file and, at the end of the function, just before the dictionary s is returned, add the line below.
s['CELERY_RESULT_BACKEND'] = s['result_backend']  # line to be added
return s
This solved the problem.
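For orientation, here is a minimal sketch of what the tail of the patched _get_config function looks like; the rest of its body is elided and varies between Celery versions, so treat it as an illustration rather than the exact upstream source:

def _get_config(self):
    # ... Celery's own code that builds the settings mapping `s` ...
    # Added lines: mirror the lowercase result_backend setting under the
    # old uppercase key, so code paths that still read CELERY_RESULT_BACKEND
    # (and would otherwise fall back to DisabledBackend) can find it.
    if 'result_backend' in s:
        s['CELERY_RESULT_BACKEND'] = s['result_backend']
    return s

Bear in mind that editing files under site-packages is fragile and the change will be lost on upgrade, so pinning compatible versions is preferable where possible.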

Related

When I run the sentinel register tesla_ai.jir -set_active true -mode ir command in Jaseci after closing the terminal, I get a Python error

When I run the sentinel register tesla_ai.jir -set_active true -mode ir command in Jaseci after closing the terminal, I get a Python error. What could be causing this issue?
Here is the error message:
File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/usr/local/lib/python3.10/site-packages/urllib3/util/connection.py", line 95, in create_connection
raise err
File "/usr/local/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused
During handling of the above exception, another exception occurred:
This occurred while typing jaseci > sentinel register tesla_ai.jir -set_active true -mode ir
Three things are happening here:
First, we registered the jir we compiled earlier to a new sentinel. This means this new sentinel now has access to all of our walkers, nodes and edges. The -mode ir option specifies that a jir program is registered instead of a jac program.
Second, with -set_active true we set this new sentinel to be the active sentinel. In other words, this sentinel is the default one to be used when requests hit the Jac APIs, if no specific sentinel is specified.
Third, sentinel register automatically creates a new graph (if there is no currently active graph) and runs the init walker on that graph. This behavior can be customized with the options -auto_run and -auto_create_graph.
Restart the terminal, or the entire computer, to see whether the issue is temporary.
Check whether I have the latest version of Jaseci installed, and try updating it to see if that resolves the issue.
Ensure that my environment has the required dependencies and libraries installed for Jaseci to run correctly.
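Since the traceback ends in ConnectionRefusedError, nothing was listening on the host and port that the jsctl session tried to reach (typically the Jaseci server the session was connected to before the terminal was closed). As a minimal sketch, a plain socket probe can confirm whether anything is reachable again; the host and port below are assumptions and should be replaced with the endpoint actually used:

import socket

host, port = "localhost", 8000  # assumed endpoint; replace with the server jsctl connects to
try:
    with socket.create_connection((host, port), timeout=3):
        print(f"Something is listening on {host}:{port}")
except OSError as exc:
    print(f"Nothing reachable on {host}:{port}: {exc}")

If the probe fails, the server process needs to be started again before re-running the sentinel register command.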

Airflow 2.1.4 Composer V2 GKE kubernetes in custom VPC Subnet returning 404

So I have two V2 Composers running in the same project. The only difference between them is that in one I'm using the default subnet and default/autogenerated values for cluster-ipv4-cidr & services-ipv4-cidr. In the other I've created another subnet in the same (default) VPC, in the same region but with a different IP range, and I reference this subnet when creating the Composer environment; additionally I give it cluster-ipv4-cidr=xx.44.0.0/17 and services-ipv4-cidr=xx.45.4.0/22.
Everything else is the same between these two Composer environments. In the environment where I have a custom subnet I'm not able to run any KubernetesPodOperator jobs; they return the error:
ERROR - Exception when attempting to create Namespaced Pod:
Traceback (most recent call last):
File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 111, in run_pod_async
resp = self._client.create_namespaced_pod(
File "/opt/python3.8/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 6174, in create_namespaced_pod
(data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs) # noqa: E501
File "/opt/python3.8/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 6251, in create_namespaced_pod_with_http_info
return self.api_client.call_api(
File "/opt/python3.8/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 340, in call_api
return self.__call_api(resource_path, method,
File "/opt/python3.8/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 172, in __call_api
response_data = self.request(
File "/opt/python3.8/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 382, in request
return self.rest_client.POST(url,
File "/opt/python3.8/lib/python3.8/site-packages/kubernetes/client/rest.py", line 272, in POST
return self.request("POST", url,
File "/opt/python3.8/lib/python3.8/site-packages/kubernetes/client/rest.py", line 231, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (404)
and this pod does not appear if I go to GKE to check workloads. These two GKE environments use the same Composer service account, Kubernetes service account and namespaces, but from my understanding that is not an issue. Jobs outside of the KubernetesPodOperator work fine. I had a theory that the non-default subnet might need additional permissions, but I haven't been able to confirm or deny it yet.
From the log I can see that the KubernetesPodOperator can't locate the worker, even though I can find it from the UI, and non-KubernetesPodOperator jobs do this successfully.
I would appreciate some guidance on what to do and where to look.
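For reference, the failing task has roughly the shape of the sketch below; the DAG name, namespace and image are placeholders rather than the real values from the affected environment:

from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

with DAG("kpo_probe", start_date=datetime(2021, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    probe_pod = KubernetesPodOperator(
        task_id="probe_pod",
        name="probe-pod",
        namespace="default",  # assumption: use the namespace your environment actually targets
        image="ubuntu:20.04",  # any small image works for a probe
        cmds=["bash", "-cx"],
        arguments=["echo hello"],
        get_logs=True,
        is_delete_operator_pod=True,
    )

A 404 from create_namespaced_pod generally means the Kubernetes API answered but did not recognise the namespace or resource path, so it is worth checking which cluster endpoint and namespace the operator resolves to in the custom-subnet environment.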

Airflow DAG that manipulates DB2 data raises a jaydebeapi.Error

I followed the official Airflow documentation to build my Airflow DAG that connects to DB2. When I run a DAG that inserts or updates data, it raises a jaydebeapi.Error. Even though Airflow raises the jaydebeapi.Error, the data is still inserted/updated in DB2 successfully.
The DAG on the Airflow UI is marked FAILED. I don't know which step I'm missing.
My DAG code snippet:
with DAG("my_dag1", default_args=default_args,
schedule_interval="#daily", catchup=False) as dag:
cerating_table = JdbcOperator(
task_id='creating_table',
jdbc_conn_id='db2',
sql=r"""
insert into DB2ECIF.T2(C1,C1_DATE) VALUES('TEST',CURRENT DATE);
""",
autocommit=True,
dag=dag
)
DAG log:
[2022-06-20 02:16:03,743] {base.py:68} INFO - Using connection ID 'db2' for task execution.
[2022-06-20 02:16:04,785] {dbapi.py:213} INFO - Running statement:
insert into DB2ECIF.T2(C1,C1_DATE) VALUES('TEST',CURRENT DATE);
, parameters: None
[2022-06-20 02:16:04,842] {dbapi.py:221} INFO - Rows affected: 1
[2022-06-20 02:16:04,844] {taskinstance.py:1889} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/jdbc/operators/jdbc.py", line 76, in execute
return hook.run(self.sql, self.autocommit, parameters=self.parameters, handler=fetch_all_handler)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/hooks/dbapi.py", line 195, in run
result = handler(cur)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/jdbc/operators/jdbc.py", line 30, in fetch_all_handler
return cursor.fetchall()
File "/home/airflow/.local/lib/python3.7/site-packages/jaydebeapi/__init__.py", line 596, in fetchall
row = self.fetchone()
File "/home/airflow/.local/lib/python3.7/site-packages/jaydebeapi/__init__.py", line 561, in fetchone
raise Error()
jaydebeapi.Error
[2022-06-20 02:16:04,847] {taskinstance.py:1400} INFO - Marking task as FAILED. dag_id=my_dag1, task_id=creating_table, execution_date=20210101T000000, start_date=, end_date=20220620T021604
I have installed the required Python packages for Airflow. The list (package/system name and version) is below:
1. Airflow / 2.3.2
2. IBM DB2 / 11.5.7
3. OpenJDK / 15.0.2
4. JayDeBeApi / 1.2.0
5. JPype1 / 0.7.2
6. apache-airflow-providers-jdbc / 3.0.0
I have tried the latest versions of item 4 (JayDeBeApi 1.2.3) and item 5 (JPype1 1.4.0), but it still doesn't work.
I have also downgraded Airflow to 2.2.3 and 2.2.5 with the same result.
How can I solve this problem?
The error doesn't come from the original insert query itself but from a fetchall introduced in this PR - https://github.com/apache/airflow/pull/23817
Using apache-airflow-providers-jdbc/2.1.3 might be an easy workaround.
To get to the root cause, set the DEBUG logging level in Airflow and see why the fetchall causes the error. Having the full traceback will help.
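If pinning the provider is not an option, another possible workaround, sketched below on the assumption that only the statement execution matters and no rows need to come back, is to call the JdbcHook directly from a PythonOperator so that no result handler ever calls fetchall on a cursor without a result set:

from airflow.operators.python import PythonOperator
from airflow.providers.jdbc.hooks.jdbc import JdbcHook

def insert_row():
    # Executes the statement and commits; with no handler, nothing calls
    # fetchall() on a cursor that has no result set.
    hook = JdbcHook(jdbc_conn_id="db2")
    hook.run(
        "insert into DB2ECIF.T2(C1,C1_DATE) VALUES('TEST',CURRENT DATE)",
        autocommit=True,
    )

# define this inside the same `with DAG(...)` block as the original task
creating_table = PythonOperator(
    task_id="creating_table",
    python_callable=insert_row,
)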

Using Beautiful Soup on the heroku Application

I am trying to deploy a bot I made in Python using the following libraries:
requests, beautifulsoup4, discord.
This is deployed using, I believe, GitHub and Heroku. The bot deploys successfully; however, when I check the logs, the bot has crashed. Here is the error message:
2020-05-17T23:17:42.624634+00:00 app[api]: Deploy 83c32a30 by user ****************************
2020-05-17T23:17:42.624634+00:00 app[api]: Release v12 created by user ****************************
2020-05-17T23:17:43.134443+00:00 heroku[worker.1]: State changed from crashed to starting
2020-05-17T23:17:48.338694+00:00 heroku[worker.1]: State changed from starting to up
2020-05-17T23:17:51.764352+00:00 heroku[worker.1]: State changed from up to crashed
2020-05-17T23:17:51.660991+00:00 app[worker.1]: Traceback (most recent call last):
2020-05-17T23:17:51.661016+00:00 app[worker.1]: File "BocoBot_Version1.py", line 126, in <module>
2020-05-17T23:17:51.661182+00:00 app[worker.1]: soup = BeautifulSoup(source, 'lxml')
2020-05-17T23:17:51.661184+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/bs4/__init__.py", line 245, in __init__
2020-05-17T23:17:51.661401+00:00 app[worker.1]: % ",".join(features))
**2020-05-17T23:17:51.661423+00:00 app[worker.1]: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?**
2020-05-17T23:17:57.000000+00:00 app[api]: Build succeeded
I believe this is the issue in question:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
But I do not know what I need to do to resolve it. My guess is that it has to do with my requirements.txt file, where I tell it which packages to add. But no matter what changes I make to beautifulsoup4, it continues not to work.
Here is the requirements.txt file information:
git+https://github.com/Rapptz/discord.py
PyNaCl==1.3.0
pandas
beautifulsoup4
requests
discord
dnspython==1.16.0
async-timeout==3.0.1
Any suggestions would be greatly appreciated and I will be happy to add more information.
Try adding lxml to your requirements.txt.
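That keeps the original BeautifulSoup(source, 'lxml') call working once Heroku installs the parser. Alternatively, as a sketch, the extra dependency can be avoided entirely by falling back to the parser that ships with Python; the source string below is only a placeholder for whatever the bot downloads:

from bs4 import BeautifulSoup

source = "<html><body><p>hello</p></body></html>"  # placeholder; the bot gets this from requests
soup = BeautifulSoup(source, "html.parser")  # built-in parser, no lxml needed
print(soup.p.text)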

Apache Airflow celery executor is not getting result backend

I am running Apache Airflow version 1.9.0, and when I try to run a task from the UI, I get the following error in the Airflow scheduler console:
[2018-05-08 12:09:06,737] {jobs.py:1077} INFO - No tasks to consider for execution.
[2018-05-08 12:09:06,738] {jobs.py:1662} INFO - Heartbeating the executor
[2018-05-08 12:09:06,738] {celery_executor.py:101} ERROR - Error syncing the celery executor, ignoring it:
[2018-05-08 12:09:06,738] {celery_executor.py:102} ERROR - No result backend configured. Please see the documentation for more information.
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", line 83, in sync
state = async.state
File "/usr/local/lib/python2.7/dist-packages/celery/result.py", line 329, in state
return self.backend.get_status(self.id)
File "/usr/local/lib/python2.7/dist-packages/celery/backends/base.py", line 547, in _is_disabled
'No result backend configured. '
NotImplementedError: No result backend configured. Please see the documentation for more information.
In my airflow.cfg, I have the following variables in [celery] section:
celery_app_name = airflow.executors.celery_executor
celeryd_concurrency = 16
worker_log_server_port = 8795
broker_url = amqp://guest:guest@localhost:5672//
celery_result_backend = amqp://guest:guest@localhost:5672//
flower_host = 0.0.0.0
flower_port = 5555
default_queue = default
What am I doing wrong here?
You should not point celery_result_backend at a RabbitMQ instance, since the purpose of this backend is to store information about the status of the tasks, and RabbitMQ is not the right tool for that (please correct me if I'm mistaken).
You can use Redis if you want to keep using the same instance as both broker and backend, or alternatively you can use Postgres as the backend, which I recommend. A sample configuration for Postgres would be the following:
celery_result_backend = db+postgresql://airflow:****@postgres/airflow
More info in the official docs: Here
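Put together, the [celery] section would then look roughly like the sketch below, keeping RabbitMQ as the broker and Postgres as the result backend (host names and credentials are placeholders); restart the scheduler and the workers after changing it:

[celery]
celery_app_name = airflow.executors.celery_executor
celeryd_concurrency = 16
worker_log_server_port = 8795
broker_url = amqp://guest:guest@localhost:5672//
celery_result_backend = db+postgresql://airflow:airflow_password@localhost:5432/airflow
flower_host = 0.0.0.0
flower_port = 5555
default_queue = default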