I'm trying to get dask-kubernetes to work with my GKE account. The maddening thing is that it worked. But now it doesn't. I set up a cluster fine, and the worker pods get created fine as well. They run for 60 seconds and then time out with the following message (as shown by kubectl logs <podname>):
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/bin/dask-worker", line 8, in <module>
sys.exit(go())
File "/opt/conda/lib/python3.8/site-packages/distributed/cli/dask_worker.py", line 446, in go
main()
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/distributed/cli/dask_worker.py", line 432, in main
loop.run_sync(run)
File "/opt/conda/lib/python3.8/site-packages/tornado/ioloop.py", line 532, in run_sync
return future_cell[0].result()
File "/opt/conda/lib/python3.8/site-packages/distributed/cli/dask_worker.py", line 426, in run
await asyncio.gather(*nannies)
File "/opt/conda/lib/python3.8/asyncio/tasks.py", line 684, in _wrap_awaitable
return (yield from awaitable.__await__())
File "/opt/conda/lib/python3.8/site-packages/distributed/core.py", line 284, in _
raise TimeoutError(
asyncio.exceptions.TimeoutError: Nanny failed to start in 60 seconds
Which, I assume, means that the workers can't connect to the scheduler running on my laptop? However, I don't understand why; the port seems to be open. This is the script I'm running:
from dask_kubernetes import KubeCluster
from dask.distributed import Client
import dask.array as da

if __name__ == '__main__':
    cluster = KubeCluster.from_yaml('worker-spec-2.yml')
    cluster.scale(1)
    client = Client(cluster)
    array = da.ones((1000, 1000, 1000))
    print(array.mean().compute())
And the worker-spec-2.yml contains the following:
kind: Pod
metadata:
  labels:
    foo: bar
spec:
  restartPolicy: Never
  containers:
    - image: daskdev/dask:latest
      imagePullPolicy: IfNotPresent
      args: [dask-worker, --nthreads, '1', --no-dashboard, --memory-limit, 1GB, --death-timeout, '60']
      name: easyvvuq
      env:
        - name: EXTRA_PIP_PACKAGES
          value: git+https://github.com/dask/distributed
      resources:
        limits:
          cpu: "1"
          memory: 2G
        requests:
          cpu: 500m
          memory: 2G
Again, this or something similar has worked for me. I may have changed something in the worker-spec.yml but that is about it.
My question would be: how do I go about diagnosing this? I am not a Kubernetes expert by any means.
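Two things are worth checking before digging deeper into Kubernetes networking. First, the daskdev/dask image installs EXTRA_PIP_PACKAGES at container start, so pulling git+https://github.com/dask/distributed on every worker adds startup time and also puts the workers on the development version of distributed, which may not match the dask/distributed installed on the laptop that runs the client and scheduler. A minimal sketch of a version check, reusing the same setup as the script above (an illustration, not a guaranteed diagnosis):

# get_versions(check=True) raises if the client, the scheduler and any
# registered workers disagree on dask/distributed versions, a common
# reason for nannies failing to register before the 60-second timeout.
from dask_kubernetes import KubeCluster
from dask.distributed import Client

if __name__ == '__main__':
    cluster = KubeCluster.from_yaml('worker-spec-2.yml')
    cluster.scale(1)
    with Client(cluster) as client:
        print(client.get_versions(check=True))

Second, kubectl describe pod <podname> and the full kubectl logs <podname> output usually show whether the pip install at startup failed or whether the worker could not reach the scheduler address it was handed.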
Related
I'm having trouble getting my code to work in Docker. Could you please help me?
I'm running my application in one Docker container and MongoDB in another, but when I run the file inside the app container it can't connect to the Mongo container and fails with a timeout.
My docker-compose file:
version: "3.4"
services:
mongo_db:
image: mongo:6.0
ports:
- "27017:27017"
volumes:
- ./mongo_db:/data/db
container_name: mongo_db
mongo_app:
image: mongo_img_db:latest
links:
- mongo_db
command: python3 /app/main.py
container_name: mongo_app
Error:
### Connection: Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'app')
Traceback (most recent call last):
File "/app/main.py", line 216, in <module>
db_calc.update_storage_stats(db_calc.calculate_artifacts_size())
File "/app/main.py", line 43, in calculate_artifacts_size
artifacts_size = self.db_client.builds.aggregate(
File "/usr/local/lib/python3.9/site-packages/pymongo/collection.py", line 2428, in aggregate
with self.__database.client._tmp_session(session, close=False) as s:
File "/usr/local/lib/python3.9/contextlib.py", line 119, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1757, in _tmp_session
s = self._ensure_session(session)
File "/usr/local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1740, in _ensure_session
return self.__start_session(True, causal_consistency=False)
File "/usr/local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1685, in __start_session
self._topology._check_implicit_session_support()
File "/usr/local/lib/python3.9/site-packages/pymongo/topology.py", line 538, in _check_implicit_session_support
self._check_session_support()
File "/usr/local/lib/python3.9/site-packages/pymongo/topology.py", line 554, in _check_session_support
self._select_servers_loop(
File "/usr/local/lib/python3.9/site-packages/pymongo/topology.py", line 238, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 63542415a868649d12f2a966, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused')>]>
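The key detail in the error is localhost:27017: inside the mongo_app container, localhost is that container itself, not the mongo_db container, so the connection is refused. With Compose, containers on the same network reach each other by service name. A minimal sketch of the client side, assuming pymongo and the mongo_db service name from the compose file above:

# Connect to the Mongo container by its Compose service name, not localhost.
from pymongo import MongoClient

client = MongoClient("mongodb://mongo_db:27017/", serverSelectionTimeoutMS=5000)
client.admin.command("ping")   # raises ServerSelectionTimeoutError if unreachable
db = client["app"]
print(db.list_collection_names())

Reading the host from an environment variable set in docker-compose.yml keeps the same code working both inside and outside Docker.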
Setup: Airflow 2.0.1 with Kubernetes 1.18 and Python 3.8, Kubernetes Client: 18.17.x
Pod template file:
apiVersion: v1
kind: Pod
metadata:
  name: workerPod
spec:
  containers:
    - args: []
      command: []
      env:
        - name: <Key>
          value: "<value>"
      envFrom: []
      name: base
      image: "<image_name>"
      imagePullSecrets: [name: "<image_pull_secrets>"]
      imagePullPolicy: "Always"
      ports: []
      volumeMounts:
        - mountPath: "<path>"
          name: "<name>"
The default config set in airflow.cfg is as follows:
[kubernetes]
pod_template_file = <path to template file>
worker_container_repository = <base-default-image>
worker_container_tag = <tag>
namespace = airflow
delete_worker_pods = True
delete_worker_pods_on_failure = False
worker_pods_creation_batch_size = 1
multi_namespace_mode = False
in_cluster = True
kube_client_request_args =
delete_option_kwargs =
enable_tcp_keepalive = False
tcp_keep_idle = 120
tcp_keep_intvl = 30
tcp_keep_cnt = 6
dags_in_image = True
dags_volume_mount_point = <volume-mount-point>
image_pull_secrets = <default-pull-secrets>
The problem is that while certain keys are read correctly from the pod_template_file (for instance, all the env variables are set correctly, and imagePullPolicy is read correctly as well, which I validated by switching its value between "Always" and "IfNotPresent"), the imagePullSecrets key is not being picked up. I can tell because I get a "Base credentials not provided" error when the image is pulled from the ECR repo. I have verified that the credentials are correct and that I can create a pod with this image when I do so explicitly.
Even when I set image_pull_secrets in airflow.cfg directly, I still end up with the same error.
I have also tried to create the pod override using the V1 API explicitly, as follows:
start_task = PythonOperator(
    task_id=<start_task_id>, python_callable=<start_task_callabel>, op_args=[<args>], dag=dag,
    executor_config={
        "pod_template_file": "<path_to_template>",
        "pod_override": k8s.V1Pod(
            spec=k8s.V1PodSpec(
                containers=[
                    k8s.V1Container(
                        name="base",
                        image="<image_override>",
                        image_pull_policy="<pull_policy>"
                    ),
                ],
                image_pull_secrets=[k8s.V1LocalObjectReference('<image_pull_secrets>')],
            )
        ),
    },
)
In this case the Docker image is downloaded correctly without any authentication errors, but unfortunately the pod throws an error: AttributeError: 'V1Container' object has no attribute '_startup_probe'
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/airflow/__main__.py", line 40, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/airflow/utils/cli.py", line 89, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/airflow/cli/commands/task_command.py", line 234, in task_run
_run_task_by_selected_method(args, dag, ti)
File "/usr/local/lib/python3.8/dist-packages/airflow/cli/commands/task_command.py", line 64, in _run_task_by_selected_method
_run_task_by_local_task_job(args, ti)
File "/usr/local/lib/python3.8/dist-packages/airflow/cli/commands/task_command.py", line 120, in _run_task_by_local_task_job
run_job.run()
File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/base_job.py", line 237, in run
self._execute()
File "/usr/local/lib/python3.8/dist-packages/airflow/jobs/local_task_job.py", line 84, in _execute
if not self.task_instance.check_and_change_state_before_execution(
File "/usr/local/lib/python3.8/dist-packages/airflow/utils/session.py", line 65, in wrapper
return func(*args, session=session, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/airflow/models/taskinstance.py", line 1029, in check_and_change_state_before_execution
session.commit()
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 1046, in commit
self.transaction.commit()
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 504, in commit
self._prepare_impl()
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 483, in _prepare_impl
self.session.flush()
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 2540, in flush
self._flush(objects)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 2682, in _flush
transaction.rollback(_capture_exception=True)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.raise_(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 2642, in _flush
flush_context.execute()
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/unitofwork.py", line 586, in execute
persistence.save_obj(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/persistence.py", line 230, in save_obj
_emit_update_statements(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/persistence.py", line 885, in _emit_update_statements
for (
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/persistence.py", line 626, in _collect_update_commands
state.manager[propkey].impl.is_equal(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/sql/sqltypes.py", line 1738, in compare_values
return x == y
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/models/v1_pod.py", line 221, in __eq__
return self.to_dict() == other.to_dict()
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/models/v1_pod.py", line 196, in to_dict
result[attr] = value.to_dict()
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/models/v1_pod_spec.py", line 1004, in to_dict
result[attr] = list(map(
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/models/v1_pod_spec.py", line 1005, in <lambda>
lambda x: x.to_dict() if hasattr(x, "to_dict") else x,
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/models/v1_container.py", line 660, in to_dict
value = getattr(self, attr)
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/models/v1_container.py", line 458, in startup_probe
return self._startup_probe
AttributeError: 'V1Container' object has no attribute '_startup_probe'
I had a similar issue. The problem was that we changed our Airflow containers and upgraded the Kubernetes library in the new containers. There is not necessarily an issue with the new Kubernetes library, but Airflow had serialized some objects (in our case TaskInstance, which also seems to be the case for you according to the shared backtrace), and it deserializes them back into Python objects. So in your case it recreates a V1Container object from the serialized form it had. The current V1Container class sets an attribute _startup_probe in its initializer, but your serialized version doesn't have that attribute, so it appears to predate the commit that introduced it. The deserialization itself does not cause issues, but problems arise whenever the to_dict method is used. In your case it is used for a comparison (__eq__); for me it was triggered by logging, since __repr__ uses it.
The Airflow Slack community pointed me to a change that should resolve this issue. I haven't been able to test it yet, but I'm sharing it here in case someone else hits this.
I had a similar issue, which started occurring when we updated our Airflow version. The problem was that, while building the custom Kubernetes container (in the Dockerfile), we were installing an older version of the kubernetes library that was incompatible with the latest Airflow. Using the latest version of the kubernetes library in the Dockerfile fixed the issue.
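If it helps to pin down which side is out of date, a small check (a sketch; run it inside both the old and the new Airflow image and compare) is to print the installed kubernetes client version and see whether its V1Container model already knows about startup_probe:

# Shows which kubernetes client is installed and whether its V1Container
# model has the startup_probe attribute that the serialized object lacks.
import kubernetes
from kubernetes.client import models as k8s

print(kubernetes.__version__)
container = k8s.V1Container(name="base")
print(hasattr(container, "startup_probe"))   # False on older clients, True on newer ones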
I am trying to reboot a pod on a Rancher cluster through Ansible, and I am getting this error:
ERROR! couldn't resolve module/action 'k8s_exec'. This often indicates a misspelling, missing collection, or incorrect module path.
The error appears to be in '/home/ansible/ansible/GetKubectlPods': line 27, column 7, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
-
name: Reboot Machine
^ here
I don't know why I am getting this error, since I am also using k8s_info and it works just fine.
Here is the playbook I am running:
---
- hosts: localhost
  connection: local
  remote_user: root
  vars:
    ansible_python_interpreter: '{{ ansible_playbook_python }}'
  tasks:
    -
      name: Get the pods in the specific namespace
      k8s_info:
        kubeconfig: '/etc/ansible/RCCloudConfig'
        kind: Pod
        namespace: redmine
      register: pod_list
    -
      name: Print pod names
      debug:
        msg: "pod_list: {{ pod_list | json_query('resources[*].status.podIP') }} "
    - set_fact:
        pod_names: "{{pod_list|json_query('resources[*].metadata.name')}}"
    -
      name: Reboot Machine
      k8s_exec:
        kubeconfig: '/etc/ansible/RCCloudConfig'
        namespace: redmine
        pod: redmine_quick-testing-6c57cc5d65-lwkww  # pod name
        command: reboot
Ansible Version:
ansible 2.9.9
config file = /etc/ansible/ansible.cfg
configured module search path = ['/home/ansible/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.6/site-packages/ansible
executable location = /usr/bin/ansible
python version = 3.6.8 (default, Apr 16 2020, 01:36:27) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]
If I edit my playbook and add the community.kubernetes collection, I get the following error:
fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/__init__.py:12: DeprecationWarning: The package kubernetes.client.apis is renamed and deprecated, use kubernetes.client.api instead (please note that the trailing s was removed).\n DeprecationWarning\nTraceback (most recent call last):\n File \"/usr/local/lib/python3.6/site-packages/kubernetes/stream/ws_client.py\", line 296, in websocket_call\n client = WSClient(configuration, get_websocket_url(url), headers, capture_all)\n File \"/usr/local/lib/python3.6/site-packages/kubernetes/stream/ws_client.py\", line 94, in __init__\n self.sock.connect(url, header=header)\n File \"/usr/local/lib/python3.6/site-packages/websocket/_core.py\", line 226, in connect\n self.handshake_response = handshake(self.sock, *addrs, **options)\n File \"/usr/local/lib/python3.6/site-packages/websocket/_handshake.py\", line 80, in handshake\n status, resp = _get_resp_headers(sock)\n File \"/usr/local/lib/python3.6/site-packages/websocket/_handshake.py\", line 165, in _get_resp_headers\n raise WebSocketBadStatusException(\"Handshake status %d %s\", status, status_message, resp_headers)\nwebsocket._exceptions.WebSocketBadStatusException: Handshake status 404 Not Found\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/home/ansible/.ansible/tmp/ansible-tmp-1592530746.4685414-46123-84159696554463/AnsiballZ_k8s_exec.py\", line 102, in <module>\n _ansiballz_main()\n File \"/home/ansible/.ansible/tmp/ansible-tmp-1592530746.4685414-46123-84159696554463/AnsiballZ_k8s_exec.py\", line 94, in _ansiballz_main\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n File \"/home/ansible/.ansible/tmp/ansible-tmp-1592530746.4685414-46123-84159696554463/AnsiballZ_k8s_exec.py\", line 40, in invoke_module\n runpy.run_module(mod_name='ansible_collections.community.kubernetes.plugins.modules.k8s_exec', init_globals=None, run_name='__main__', alter_sys=True)\n File \"/usr/lib64/python3.6/runpy.py\", line 205, in run_module\n return _run_module_code(code, init_globals, run_name, mod_spec)\n File \"/usr/lib64/python3.6/runpy.py\", line 96, in _run_module_code\n mod_name, mod_spec, pkg_name, script_name)\n File \"/usr/lib64/python3.6/runpy.py\", line 85, in _run_code\n exec(code, run_globals)\n File \"/tmp/ansible_k8s_exec_payload_q6oom26c/ansible_k8s_exec_payload.zip/ansible_collections/community/kubernetes/plugins/modules/k8s_exec.py\", line 148, in <module>\n File \"/tmp/ansible_k8s_exec_payload_q6oom26c/ansible_k8s_exec_payload.zip/ansible_collections/community/kubernetes/plugins/modules/k8s_exec.py\", line 135, in main\n File \"/usr/local/lib/python3.6/site-packages/kubernetes/stream/stream.py\", line 35, in stream\n return func(*args, **kwargs)\n File \"/usr/local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py\", line 841, in connect_get_namespaced_pod_exec\n (data) = self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs) # noqa: E501\n File \"/usr/local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py\", line 941, in connect_get_namespaced_pod_exec_with_http_info\n collection_formats=collection_formats)\n File \"/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py\", line 345, in call_api\n _preload_content, _request_timeout)\n File \"/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py\", line 176, in __call_api\n 
_request_timeout=_request_timeout)\n File \"/usr/local/lib/python3.6/site-packages/kubernetes/stream/stream.py\", line 30, in _intercept_request_call\n return ws_client.websocket_call(config, *args, **kwargs)\n File \"/usr/local/lib/python3.6/site-packages/kubernetes/stream/ws_client.py\", line 302, in websocket_call\n raise ApiException(status=0, reason=str(e))\nkubernetes.client.rest.ApiException: (0)\nReason: Handshake status 404 Not Found\n\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
Solution:
Find your k8s_exec.py file and change its import from kubernetes.client.apis to kubernetes.client.api (the renamed package without the trailing s, as the deprecation warning suggests).
Thank you in advance!
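For anyone debugging the 404 handshake, the exec call that k8s_exec makes can be reproduced outside Ansible. This is a rough standalone sketch using the renamed kubernetes.client.api package from the solution above; the kubeconfig path, namespace, and pod name are copied from the playbook and are assumptions about your environment:

# Rough standalone equivalent of the k8s_exec task, useful for checking
# that the kubeconfig, namespace and pod name are valid outside Ansible.
from kubernetes import config
from kubernetes.client.api import core_v1_api   # renamed from kubernetes.client.apis
from kubernetes.stream import stream

config.load_kube_config(config_file='/etc/ansible/RCCloudConfig')
api = core_v1_api.CoreV1Api()

resp = stream(
    api.connect_get_namespaced_pod_exec,
    name='redmine_quick-testing-6c57cc5d65-lwkww',   # pod name from the playbook
    namespace='redmine',
    command=['reboot'],
    stderr=True, stdin=False, stdout=True, tty=False,
)
print(resp)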
I am using the https://github.com/helm/charts/tree/master/stable/airflow Helm chart, building a v1.10.8 puckle/docker-airflow image with kubernetes installed on it, and using that image in the Helm chart.
But I keep getting
File "/usr/local/bin/airflow", line 37, in <module>
args.func(args)
File "/usr/local/lib/python3.7/site-packages/airflow/bin/cli.py", line 1140, in initdb
db.initdb(settings.RBAC)
File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 332, in initdb
dagbag = models.DagBag()
File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 95, in __init__
executor = get_default_executor()
File "/usr/local/lib/python3.7/site-packages/airflow/executors/__init__.py", line 48, in get_default_executor
DEFAULT_EXECUTOR = _get_executor(executor_name)
File "/usr/local/lib/python3.7/site-packages/airflow/executors/__init__.py", line 87, in _get_executor
return KubernetesExecutor()
File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 702, in __init__
self.kube_config = KubeConfig()
File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 283, in __init__
self.kube_client_request_args = json.loads(kube_client_request_args)
File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.7/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
In my scheduler, as various sources advise, I tried setting:
AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: {"_request_timeout" : [60,60] }
in my Helm values. That also didn't work. Does anyone have any idea what I'm missing?
Here's my values.yaml
airflow:
  image:
    repository: airflow-docker-local
    tag: 1.10.8
  executor: Kubernetes
  service:
    type: LoadBalancer
  config:
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1.10.8
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never
    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
    AIRFLOW__KUBERNETES__NAMESPACE: airflow
    AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: {"_request_timeout" : [60,60] }
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://postgres:airflow@airflow-postgresql:5432/airflow
persistence:
  enabled: true
  existingClaim: ''
workers:
  enabled: false
postgresql:
  enabled: true
redis:
  enabled: false
EDIT :
Various attempts to set the environment variable in the Helm values.yaml didn't work. After that I added (pay attention to the double and single quotes)
ENV AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS='{"_request_timeout" : [60,60] }'
to the Dockerfile here: https://github.com/puckel/docker-airflow/blob/1.10.9/Dockerfile#L19
After that my airflow-scheduler pod starts, but then I keep getting the following error on the scheduler pod:
Process KubernetesJobWatcher-9:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 313, in recv_into
    return self.connection.recv_into(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1840, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "/usr/local/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1646, in _raise_ssl_error
    raise WantReadError()
OpenSSL.SSL.WantReadError
For the Helm value, the chart template uses a loop that places the airflow.config map inside double quotes ("). This means any " in a value needs to be escaped for the templated output YAML to be valid.
airflow:
  config:
    AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: '{\"_request_timeout\":60}'
That deploys and runs (but I haven't completed an end-to-end test).
According to this GitHub issue, the Python scheduler SSL timeout may not be a problem, as the watcher starts again after the 60-second connection timeout.
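The original JSONDecodeError is consistent with this: whatever finally lands in kube_client_request_args must be strict JSON, with property names in double quotes, after the Helm templating and any shell quoting are done. A quick way to sanity-check the rendered value (a sketch, assuming you can copy the string out of the scheduler pod's environment, e.g. with kubectl exec <scheduler-pod> -- env):

# kubernetes_executor.py does json.loads(kube_client_request_args), so the
# rendered value has to pass exactly this check.
import json

good = '{"_request_timeout": [60, 60]}'    # double-quoted keys: parses
print(json.loads(good))

bad = "{'_request_timeout': [60, 60]}"     # single-quoted keys: fails like the traceback
try:
    json.loads(bad)
except json.JSONDecodeError as exc:
    print(exc)   # Expecting property name enclosed in double quotes: line 1 column 2 (char 1)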
I recently installed Airflow on an AWS server using this guide for Ubuntu 16.04. After a painful but successful install, I started the webserver. I tried a sample DAG as follows:
from airflow.operators.python_operator import PythonOperator
from airflow.operators.dummy_operator import DummyOperator
from datetime import timedelta
from airflow import DAG
import airflow

# DEFAULT ARGS
default_args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(2),
    'depends_on_past': False}

dag = DAG('init_run', default_args=default_args, description='DAG SAMPLE',
          schedule_interval='@daily')

def print_something():
    print("HELLO AIRFLOW!")

with dag:
    task_1 = PythonOperator(task_id='do_it', python_callable=print_something)
    task_2 = DummyOperator(task_id='dummy')

    task_1 << task_2
But when I open the UI, the tasks in the DAG stay in "No Status" no matter how many times I trigger it manually or refresh the page.
Later I found out that the Airflow scheduler is not running and shows the following error:
{celery_executor.py:228} ERROR - Error sending Celery task:No module named 'MySQLdb'
Celery Task ID: ('init_run', 'dummy', datetime.datetime(2019, 5, 30, 18, 0, 24, 902499, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1)
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 118, in send_task_to_executor
result = task.apply_async(args=[command], queue=queue)
File "/usr/local/lib/python3.7/site-packages/celery/app/task.py", line 535, in apply_async
**options
File "/usr/local/lib/python3.7/site-packages/celery/app/base.py", line 728, in send_task
amqp.send_task_message(P, name, message, **options)
File "/usr/local/lib/python3.7/site-packages/celery/app/amqp.py", line 552, in send_task_message
**properties
File "/usr/local/lib/python3.7/site-packages/kombu/messaging.py", line 181, in publish
exchange_name, declare,
File "/usr/local/lib/python3.7/site-packages/kombu/connection.py", line 510, in _ensured
return fun(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/kombu/messaging.py", line 194, in _publish
[maybe_declare(entity) for entity in declare]
File "/usr/local/lib/python3.7/site-packages/kombu/messaging.py", line 194, in <listcomp>
[maybe_declare(entity) for entity in declare]
File "/usr/local/lib/python3.7/site-packages/kombu/messaging.py", line 102, in maybe_declare
return maybe_declare(entity, self.channel, retry, **retry_policy)
File "/usr/local/lib/python3.7/site-packages/kombu/common.py", line 121, in maybe_declare
return _maybe_declare(entity, channel)
File "/usr/local/lib/python3.7/site-packages/kombu/common.py", line 145, in _maybe_declare
entity.declare(channel=channel)
File "/usr/local/lib/python3.7/site-packages/kombu/entity.py", line 608, in declare
self._create_queue(nowait=nowait, channel=channel)
File "/usr/local/lib/python3.7/site-packages/kombu/entity.py", line 617, in _create_queue
self.queue_declare(nowait=nowait, passive=False, channel=channel)
File "/usr/local/lib/python3.7/site-packages/kombu/entity.py", line 652, in queue_declare
nowait=nowait,
File "/usr/local/lib/python3.7/site-packages/kombu/transport/virtual/base.py", line 531, in queue_declare
self._new_queue(queue, **kwargs)
File "/usr/local/lib/python3.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 82, in _new_queue
self._get_or_create(queue)
File "/usr/local/lib/python3.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 70, in _get_or_create
obj = self.session.query(self.queue_cls) \
File "/usr/local/lib/python3.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 65, in session
_, Session = self._open()
File "/usr/local/lib/python3.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 56, in _open
engine = self._engine_from_config()
File "/usr/local/lib/python3.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 51, in _engine_from_config
return create_engine(conninfo.hostname, **transport_options)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/__init__.py", line 443, in create_engine
return strategy.create(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 87, in create
dbapi = dialect_cls.dbapi(**dbapi_args)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 104, in dbapi
return __import__("MySQLdb")
ModuleNotFoundError: No module named 'MySQLdb'
Here is the setting in the config file (airflow.cfg):
sql_alchemy_conn = postgresql+psycopg2://airflow@localhost:5432/airflow
broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
result_backend = db+postgresql://airflow:airflow@localhost/airflow
I've been struggling with this issue for two days now. Please help.
In your airflow.cfg, there should also be a config option for celery_result_backend. Are you able to let us know what this value is set to? If it is not present in your config, set it to the same value as result_backend,
i.e.:
celery_result_backend = db+postgresql://airflow:airflow@localhost/airflow
And then restart the airflow stack to ensure the configuration changes apply.
(I wanted to leave this as a comment but don't have enough rep to do so)
I think the guide you are following didn't tell you to install MySQL, yet it seems you are using it in the broker URL.
You can install the MySQL client library and then configure it (for Python 3.5+):
pip install mysqlclient
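After installing, it is worth confirming, in the same Python environment that runs the scheduler, that the driver the sqla+mysql:// broker URL asks for is importable (a small sketch, not specific to Airflow):

# The sqla+mysql:// broker URL makes SQLAlchemy call __import__("MySQLdb"),
# so this is the import that has to succeed in the scheduler's environment.
try:
    import MySQLdb
    print("MySQLdb available:", MySQLdb.__name__)
except ModuleNotFoundError as exc:
    print("Driver still missing:", exc)   # install mysqlclient or change broker_url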
Alternatively, for a quick fix, you can also use RabbitMQ (RabbitMQ is a message broker, which you will need in order to run Airflow DAGs with Celery) with the guest user login,
and then your broker_url will be
broker_url = amqp://guest:guest@localhost:5672//
If not already installed, RabbitMQ can be installed with the following command:
sudo apt install rabbitmq-server
Change the configuration NODE_IP_ADDRESS=0.0.0.0 in the configuration file located at
/etc/rabbitmq/rabbitmq-env.conf
Then start the RabbitMQ service:
sudo service rabbitmq-server start
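If you go the RabbitMQ route, a quick connectivity check from the Airflow host can rule out broker problems before restarting the scheduler. A sketch using kombu, which is already installed as part of Airflow's Celery stack (per the traceback above); note that RabbitMQ's default guest account only accepts connections from localhost:

# Verifies that the broker from broker_url accepts connections.
from kombu import Connection

with Connection('amqp://guest:guest@localhost:5672//') as conn:
    conn.ensure_connection(max_retries=3)
    print("Broker reachable:", conn.as_uri())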