Ambassador pod fails in Kubernetes because the API server cluster IP is unreachable ([Errno 113] Host is unreachable)

Issue with Ambassador deployment starting in Kubernetes:
Kubernetes - v1.13
Ambassador - image: quay.io/datawire/ambassador:0.50.3
Container Runtime - Docker
Cluster networking - Flannel
The entire setup runs on two Oracle VMs on a single Windows 10 machine.
The VM network is Host-Only, with the master at 192.168.99.110 and a node at 192.168.99.101.
I am deploying Ambassador using kubectl apply -f https://getambassador.io/yaml/ambassador/ambassador-rbac.yaml. About 30 seconds after the pod starts and kubewatch begins running, the pod goes into a 'CrashLoopBackOff' state. I inspected the pod's logs, shown below; the last error states that 10.96.0.1 (the API server cluster IP) is unreachable:
" []# kubectl logs ambassador-76f644ddfb-vnj4d
2019-03-06 17:12:13 kubewatch [23 TMainThread] 0.50.3 INFO: kubewatch starting: mode 'cluster-id' ambassador_config_dir '/no/such/path' envoy_config_file '/dev/null' debug 'False' delay '1.0' pid 'None'
2019-03-06 17:12:13 kubewatch [23 TMainThread] 0.50.3 INFO: namespace default, watching all namespaces
2019-03-06 17:12:14,131 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f739506d748>: Failed to establish a new connection: [Errno 113] Host is unreachable',)': /api/v1/namespaces/default
2019-03-06 17:12:14 kubewatch [23 TMainThread] 0.50.3 WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f739506d748>: Failed to establish a new connection: [Errno 113] Host is unreachable',)': /api/v1/namespaces/default
2019-03-06 17:12:15,136 WARNING Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f739506d7f0>: Failed to establish a new connection: [Errno 113] Host is unreachable',)': /api/v1/namespaces/default
2019-03-06 17:12:15 kubewatch [23 TMainThread] 0.50.3 WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f739506d7f0>: Failed to establish a new connection: [Errno 113] Host is unreachable',)': /api/v1/namespaces/default
2019-03-06 17:12:16,140 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f739506d898>: Failed to establish a new connection: [Errno 113] Host is unreachable',)': /api/v1/namespaces/default
2019-03-06 17:12:16 kubewatch [23 TMainThread] 0.50.3 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f739506d898>: Failed to establish a new connection: [Errno 113] Host is unreachable',)': /api/v1/namespaces/default
2019-03-06 17:12:17 kubewatch [23 TMainThread] 0.50.3 WARNING: kubewatch failed!
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
OSError: [Errno 113] Host is unreachable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
conn.connect()
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 301, in connect
conn = self._new_conn()
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f739506d9b0>: Failed to establish a new connection: [Errno 113] Host is unreachable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/ambassador/kubewatch.py", line 527, in <module>
main()
File "/usr/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/ambassador/kubewatch.py", line 518, in main
watcher.run(id_only=True)
File "/ambassador/kubewatch.py", line 342, in run
self.get_cluster_id(v1)
File "/ambassador/kubewatch.py", line 407, in get_cluster_id
ret = v1.read_namespace(wanted)
File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 17572, in read_namespace
(data) = self.read_namespace_with_http_info(name, **kwargs)
File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 17657, in read_namespace_with_http_info
collection_formats=collection_formats)
File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
_request_timeout=_request_timeout)
File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 342, in request
headers=headers)
File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 231, in GET
query_params=query_params)
File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 205, in request
headers=headers)
File "/usr/lib/python3.6/site-packages/urllib3/request.py", line 68, in request
**urlopen_kw)
File "/usr/lib/python3.6/site-packages/urllib3/request.py", line 89, in request_encode_url
return self.urlopen(method, url, **extra_kw)
File "/usr/lib/python3.6/site-packages/urllib3/poolmanager.py", line 323, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 667, in urlopen
**response_kw)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 667, in urlopen
**response_kw)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 667, in urlopen
**response_kw)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.96.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces/default (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f739506d9b0>: Failed to establish a new connection: [Errno 113] Host is unreachable',))
AMBASSADOR: kubewatch cluster-id exited with status 1
AMBASSADOR: shutting down (1)"

An update: I disabled the firewalld service on both the master and the node, and the issue is now solved. I think the root cause was that pods on the node could not reach kube-dns on the master because of firewall restrictions.
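
To tell a firewall or routing problem ([Errno 113], host unreachable) apart from a service that is simply not listening ([Errno 111], connection refused), a plain TCP probe can help. The following is a minimal sketch, not part of Ambassador; the function name probe_tcp is made up for illustration, and it assumes a Python 3 interpreter is available on the node or in a pod scheduled on it.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt a plain TCP connection and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # No answer at all within the timeout (packets silently dropped).
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:  # [Errno 113] on Linux
            return "host unreachable"
        if exc.errno == errno.ECONNREFUSED:  # [Errno 111] on Linux
            return "connection refused"
        return "error: {}".format(exc)


# Example call, using the API server cluster IP from the log above:
# print(probe_tcp("10.96.0.1", 443))
```

Running the probe before and after stopping firewalld, against both the cluster IP and the master's own address, should show whether the firewall is what blocks the path.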

Related

Task gets stuck on Airflow when the DB is unreachable

When too many tasks run at the same time on Airflow, some tasks get a heartbeat exception and get stuck. Sometimes, even when the tasks have completed, Airflow cannot retrieve the completed log.
Our setup:
Airflow v2.3.4 in Docker
PostgreSQL 11.6 on x86_64-pc-linux-gnu
Example log:
[2022-10-19, 09:01:15 +03] {base_job.py:229} ERROR - LocalTaskJob heartbeat got an exception
....
File "/home/airflow/.local/lib/python3.7/site-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "xxx" (xxx), port xxx failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
....
return self.dbapi.connect(*cargs, **cparams)
File "/home/airflow/.local/lib/python3.7/site-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "xxx" (xxx), port xxx failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
(Background on this error at: https://sqlalche.me/e/14/e3q8)

I tried to deploy my script on PythonAnywhere but got the following exception

I made a script for Telegram that changes the avatar every minute. When it starts, the following error appears:
Attempt 1 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
Attempt 2 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
Attempt 3 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
Attempt 4 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
Attempt 5 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
Attempt 6 at connecting failed: ConnectionRefusedError: [Errno 111] Connect call failed ('149.154.167.51', 443)
Traceback (most recent call last):
File "/home/ainurfast/clocks/main.py", line 13, in <module>
client.start()
File "/home/ainurfast/.local/lib/python3.9/site-packages/telethon/client/auth.py", line 133, in start
else self.loop.run_until_complete(coro)
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/home/ainurfast/.local/lib/python3.9/site-packages/telethon/client/auth.py", line 140, in _start
await self.connect()
File "/home/ainurfast/.local/lib/python3.9/site-packages/telethon/client/telegrambaseclient.py", line 525, in connect
if not await self._sender.connect(self._connection(
File "/home/ainurfast/.local/lib/python3.9/site-packages/telethon/network/mtprotosender.py", line 127, in connect
await self._connect()
File "/home/ainurfast/.local/lib/python3.9/site-packages/telethon/network/mtprotosender.py", line 253, in _connect
raise ConnectionError('Connection to Telegram failed {} time(s)'.format(self._retries))
ConnectionError: Connection to Telegram failed 5 time(s)
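The MTProto protocol that Telegram uses does not work from a free account on PythonAnywhere: free accounts can only reach the outside world through PythonAnywhere's HTTP proxy, so the direct connection is refused. You can use the HTTP protocol or upgrade your account.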

Openstack Swift stat [Errno 111] Connection refused

When I type
swift stat
I get this error:
HTTPConnectionPool(host='controller', port=8080): Max retries exceeded with url: /v1/AUTH_d17698cf7bbf4dcc8fc59ed6f7b48052 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1c597f9910>: Failed to establish a new connection: [Errno 111] Connection refused'))

Ubuntu OpenStack Ocata - Discovering versions from the identity service failed

Command:
openstack --os-auth-url http://controller:5000/v3 \
--os-project-domain-name default --os-user-domain-name default \
--os-project-name demo --os-username demo token issue
Error:
Discovering versions from the identity service failed when creating
the password plugin. Attempting to determine version from URL.
Internal Server Error (HTTP 500)
Error in keystone.log:
2018-06-12 10:40:05.888577 mod_wsgi (pid=16170): Target WSGI script '/usr/bin/keystone-wsgi-admin' cannot be loaded as Python module.
2018-06-12 10:40:05.888611 mod_wsgi (pid=16170): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-admin'.
2018-06-12 10:40:05.888634 Traceback (most recent call last):
2018-06-12 10:40:05.888656 File "/usr/bin/keystone-wsgi-admin", line 51, in <module>
2018-06-12 10:40:05.888688 application = initialize_admin_application()
2018-06-12 10:40:05.888702 File "/usr/lib/python2.7/dist-packages/keystone/server/wsgi.py", line 129, in initialize_admin_application
2018-06-12 10:40:05.888726 config_files=_get_config_files())
2018-06-12 10:40:05.888739 File "/usr/lib/python2.7/dist-packages/keystone/server/wsgi.py", line 53, in initialize_application
2018-06-12 10:40:05.888759 common.configure(config_files=config_files)
2018-06-12 10:40:05.888772 File "/usr/lib/python2.7/dist-packages/keystone/server/common.py", line 30, in configure
2018-06-12 10:40:05.888792 keystone.conf.configure()
2018-06-12 10:40:05.888805 File "/usr/lib/python2.7/dist-packages/keystone/conf/__init__.py", line 126, in configure
2018-06-12 10:40:05.888826 help='Do not monkey-patch threading system modules.'))
2018-06-12 10:40:05.888839 File "/usr/lib/python2.7/dist-packages/oslo_config/cfg.py", line 2288, in __inner
2018-06-12 10:40:05.888860 result = f(self, *args, **kwargs)
2018-06-12 10:40:05.888872 File "/usr/lib/python2.7/dist-packages/oslo_config/cfg.py", line 2478, in register_cli_opt
2018-06-12 10:40:05.888892 raise ArgsAlreadyParsedError("cannot register CLI option")
2018-06-12 10:40:05.888915 ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option
error.log:
[Tue Jun 12 10:12:18.510745 2018] [mpm_event:notice] [pid 29892:tid 139804806121344] AH00491: caught SIGTERM, shutting down
[Tue Jun 12 10:12:29.674244 2018] [wsgi:warn] [pid 16158:tid 139690338350976] mod_wsgi: Compiled for Python/2.7.11.
[Tue Jun 12 10:12:29.674304 2018] [wsgi:warn] [pid 16158:tid 139690338350976] mod_wsgi: Runtime using Python/2.7.12.
[Tue Jun 12 10:12:29.676957 2018] [mpm_event:notice] [pid 16158:tid 139690338350976] AH00489: Apache/2.4.18 (Ubuntu) mod_wsgi/4.3.0 Python/2.7.12 configured -- resuming normal operations
[Tue Jun 12 10:12:29.676985 2018] [core:notice] [pid 16158:tid 139690338350976] AH00094: Command line: '/usr/sbin/apache2'
Can somebody please help me solve this issue?
Issue solved.
According to the log, the error was in mod_wsgi. The Web Server Gateway Interface (WSGI) middleware pipeline for the Identity service is configured in the keystone-paste.ini file, so I compared my file with the keystone-paste.ini from the OpenStack docs and changed the pipeline configuration, which solved the issue.
I edited the /etc/keystone/keystone-paste.ini file.
Under [pipeline:public_api] I had:
pipeline = healthcheck cors sizelimit http_proxy_to_wsgi osprofiler url_normalize request_id
I changed that line to:
pipeline = healthcheck cors sizelimit http_proxy_to_wsgi osprofiler url_normalize request_id build_auth_context token_auth json_body ec2_extension public_service
In the same way, under [pipeline:admin_api] I had:
pipeline = healthcheck cors sizelimit http_proxy_to_wsgi osprofiler url_normalize request_id
I changed that pipeline to:
pipeline = healthcheck cors sizelimit http_proxy_to_wsgi osprofiler url_normalize request_id build_auth_context token_auth json_body ec2_extension s3_extension admin_service
I also changed [pipeline:api_v3], where I had:
pipeline = healthcheck cors sizelimit http_proxy_to_wsgi osprofiler url_normalize request_id
I changed that line to:
pipeline = healthcheck cors sizelimit http_proxy_to_wsgi osprofiler url_normalize request_id build_auth_context token_auth json_body ec2_extension_v3 s3_extension service_v3
After these changes the issue was solved.
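
A truncated pipeline like the ones above can be spotted by listing which expected filters are missing from a section. This is a rough sketch using Python's standard configparser, not an OpenStack tool; the helper name missing_filters and the SAMPLE text are made up for illustration, and the list of required names is taken from the corrected public_api pipeline above.

```python
import configparser


def missing_filters(ini_text, section, required):
    """Return the names in `required` that are absent from the section's pipeline."""
    # Interpolation is disabled so that '%' characters in the file are not interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    pipeline = parser.get(section, "pipeline").split()
    return [name for name in required if name not in pipeline]


# The truncated public_api pipeline quoted above, used here as sample input.
SAMPLE = """
[pipeline:public_api]
pipeline = healthcheck cors sizelimit http_proxy_to_wsgi osprofiler url_normalize request_id
"""

REQUIRED = ["build_auth_context", "token_auth", "json_body", "ec2_extension", "public_service"]

print(missing_filters(SAMPLE, "pipeline:public_api", REQUIRED))
```

For a real check, the text of /etc/keystone/keystone-paste.ini would be read from disk and passed in place of SAMPLE.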

DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and must be a readable file path

I have two CoreOS stable (1185.5.0) servers at home, and I am trying to install a Kubernetes controller and worker on them.
I use the coreos-kubernetes install scripts from https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/generic, patched with https://github.com/kfirufk/coreos-kubernetes-multi-node-generic-install-script. I use rkt to run the relevant containers.
I use the following environment variable override file:
ETCD_AUTHORITY=coreos-3.tux-in.com:2379
ETCD_ENDPOINTS="https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379"
CONTROLLER_ENDPOINT=https://coreos-2.tux-in.com
K8S_VER=v1.5.0-beta.3_coreos.0
HYPERKUBE_IMAGE_REPO=quay.io/coreos/hyperkube
DNS_SERVICE_IP=10.3.0.10
USE_CALICO=true
CONTAINER_RUNTIME=rkt
OVERWRITE_ALL_FILES=true
ADVERTISE_IP=10.79.218.3
ETCD_CERT_FILE="/etc/ssl/etcd/etcd2.pem"
ETCD_KEY_FILE="/etc/ssl/etcd/etcd2-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/ssl/etcd/ca.pem"
ETCD_SCHEME="https"
IS_MASK_UPDATE_ENGINE=false
coreos-2.tux-in.com, which resolves to 10.79.218.2, is the controller node.
coreos-3.tux-in.com, which resolves to 10.79.218.3, is the worker node.
The controller script seems to install fine.
When I try to install the Kubernetes worker on the second server, the following error message keeps appearing in the kubelet log:
2016-12-12 12:24:08,171 6960 [kube-system/kubernetes-dashboard-v1.4.1-kjj0c] ERROR Unhandled Exception killed plugin
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: Traceback (most recent call last):
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: File "<string>", line 773, in main
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: File "<string>", line 64, in __init__
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: File "site-packages/pycalico/datastore.py", line 229, in __init__
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and must be a readable file path. Value provided:
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: 2016-12-12 12:24:08,171 6960 [kube-system/kubernetes-dashboard-v1.4.1-kjj0c] ERROR CNI Error:
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: {
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: "msg": "Unhandled Exception killed plugin",
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: "cniVersion": "0.1.0",
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: "code": 100,
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: "details": null
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: }
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: Traceback (most recent call last):
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: File "<string>", line 773, in main
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: File "<string>", line 64, in __init__
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: File "site-packages/pycalico/datastore.py", line 229, in __init__
Dec 12 12:24:08 coreos-3.tux-in.com kubelet-wrapper[1786]: DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and must be a readable file path. Value provided:
The Invalid ETCD_CA_CERT_FILE error message shows that the value provided is empty, which suggests that the ETCD_CA_CERT_FILE environment variable is not set for some reason. I tried editing /etc/systemd/system/kubelet.service and adding Environment=ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem under [Service], but the result is the same. Any ideas?
It appears there was a problem with parsing the ETCD_CA_CERT_FILE parameter in the calico-node container. I found a bug report about it but cannot find it again, so I cannot link it here.
In any case, using the latest calico-node version fixes the issue
(version v1.0.0-rc4 instead of 0.19.0).
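
When diagnosing a message like this one, it can help to reproduce the condition it describes: the variable must be set and must name a readable file. The sketch below is an illustration only and is not pycalico's actual validation code; the helper name check_ca_cert is hypothetical. It is only meaningful when run in the same environment (same container, same user) as the process that reports the error.

```python
import os


def check_ca_cert(env=None, name="ETCD_CA_CERT_FILE"):
    """Return None if the variable names a readable file, else a short reason."""
    # Allow a dict to be passed in for testing; default to the real environment.
    env = os.environ if env is None else env
    value = env.get(name, "")
    if not value:
        return "{} is empty or not set".format(name)
    if not os.path.isfile(value):
        return "{} is not a file: {}".format(name, value)
    if not os.access(value, os.R_OK):
        return "{} is not readable: {}".format(name, value)
    return None


# Example call against the current process environment:
# print(check_ca_cert() or "ETCD_CA_CERT_FILE looks fine")
```

An empty result for the variable inside the container, even though it is set in kubelet.service, would be consistent with the value being lost on the way to the plugin.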