Ansible k8s module fails to connect to cluster with 503, appends /version/openshift on a non-OpenShift cluster

I'm trying to use Ansible's new k8s module (based on k8s_raw from 2.6) to maintain an AKS Kubernetes cluster.
While I can work with the cluster using kubectl, any task that uses the k8s module fails with a 503 error.
For example, this task:
- name: deploy kured daemonset
  k8s:
    state: present
    context: "{{ cluster_name }}"
    host: "redacted"  # tried specifying this, but it does not help
    kubeconfig: "~/.kube/config"
    src: "aks/utils/kured-ds.yaml"
And failure:
Traceback (most recent call last):
File "/home/alonisser/.ansible/tmp/ansible-tmp-1549320815.98-157731551192134/AnsiballZ_k8s.py", line 113, in <module>
_ansiballz_main()
File "/home/alonisser/.ansible/tmp/ansible-tmp-1549320815.98-157731551192134/AnsiballZ_k8s.py", line 105, in _ansiballz_main
invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
File "/home/alonisser/.ansible/tmp/ansible-tmp-1549320815.98-157731551192134/AnsiballZ_k8s.py", line 48, in invoke_module
imp.load_module('__main__', mod, module, MOD_DESC)
File "/tmp/ansible_k8s_payload_IYmGFG/__main__.py", line 233, in <module>
File "/tmp/ansible_k8s_payload_IYmGFG/__main__.py", line 229, in main
File "/tmp/ansible_k8s_payload_IYmGFG/ansible_k8s_payload.zip/ansible/module_utils/k8s/raw.py", line 131, in execute_module
File "/tmp/ansible_k8s_payload_IYmGFG/ansible_k8s_payload.zip/ansible/module_utils/k8s/common.py", line 172, in get_api_client
File "/home/alonisser/.local/lib/python2.7/site-packages/openshift/dynamic/client.py", line 103, in __init__
self.__init_cache()
File "/home/alonisser/.local/lib/python2.7/site-packages/openshift/dynamic/client.py", line 113, in __init_cache
self.__resources.update(self.parse_api_groups())
File "/home/alonisser/.local/lib/python2.7/site-packages/openshift/dynamic/client.py", line 169, in parse_api_groups
new_group[version] = self.get_resources_for_api_version(prefix, group['name'], version, preferred)
File "/home/alonisser/.local/lib/python2.7/site-packages/openshift/dynamic/client.py", line 181, in get_resources_for_api_version
resources_response = load_json(self.request('GET', path))['resources']
File "/home/alonisser/.local/lib/python2.7/site-packages/openshift/dynamic/client.py", line 363, in request
_return_http_data_only=params.get('_return_http_data_only', True)
File "/home/alonisser/.local/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 321, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/home/alonisser/.local/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
_request_timeout=_request_timeout)
File "/home/alonisser/.local/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 342, in request
headers=headers)
File "/home/alonisser/.local/lib/python2.7/site-packages/kubernetes/client/rest.py", line 231, in GET
query_params=query_params)
File "/home/alonisser/.local/lib/python2.7/site-packages/kubernetes/client/rest.py", line 222, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (503)
Reason: Service Unavailable
Ansible version: 2.7 / 2.8 (dev)
What am I missing?
UPDATE:
After adding print statements to the libraries the module uses underneath, I found that somewhere in the pipeline /version/openshift is appended to the host name, which of course fails because this is not an OpenShift cluster.
Is there any workaround for this bug?

Answer: It turned out there were two failing requests. The first, to /version/openshift, is caught by the client and does not cause the crash. The crash actually came from an error in my cluster's metrics server: the Kubernetes client used by Ansible does not really need it, but it still sends a request to it, and that request fails.
Posting this in case anyone else bumps into it.
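The traceback shows the client walking every API group (parse_api_groups, then get_resources_for_api_version), so one unavailable aggregated API can surface as a 503. As a rough diagnostic sketch, assuming you have saved the output of `kubectl get apiservices -o json`, the function below lists APIService objects whose Available condition is not "True". The function name and the sample data are hypothetical and only illustrate the shape of the check.

```python
import json


def unavailable_apiservices(apiservice_list):
    """Return names of APIService objects whose Available condition is not "True"."""
    broken = []
    for item in apiservice_list.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        available = any(
            c.get("type") == "Available" and c.get("status") == "True"
            for c in conditions
        )
        if not available:
            broken.append(item["metadata"]["name"])
    return broken


# Hypothetical sample, shaped like `kubectl get apiservices -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "v1."},
   "status": {"conditions": [{"type": "Available", "status": "True"}]}},
  {"metadata": {"name": "v1beta1.metrics.k8s.io"},
   "status": {"conditions": [{"type": "Available", "status": "False"}]}}
]}
""")

print(unavailable_apiservices(sample))  # ['v1beta1.metrics.k8s.io']
```

If a metrics entry shows up in the result, fixing or removing that APIService is probably the place to start.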

Related

AttributeError: 'AuthorizedSession' object has no attribute 'configure_mtls_channel'

I was orchestrating two Dataflow jobs with Cloud Composer, and it worked fine for months. Suddenly the two jobs stopped working with the following error message:
in download_blob
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/client.py", line 399, in get_bucket
retry=retry,
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/bucket.py", line 1002, in reload
retry=retry,
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/_helpers.py", line 225, in reload
retry=retry,
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/_http.py", line 63, in api_request
return call()
File "/usr/local/lib/python3.6/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/usr/local/lib/python3.6/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/usr/local/lib/python3.6/site-packages/google/cloud/_http.py", line 479, in api_request
timeout=timeout,
File "/usr/local/lib/python3.6/site-packages/google/cloud/_http.py", line 337, in _make_request
method, url, headers, data, target_object, timeout=timeout
File "/usr/local/lib/python3.6/site-packages/google/cloud/_http.py", line 374, in _do_request
return self.http.request(
File "/usr/local/lib/python3.6/site-packages/google/cloud/_http.py", line 157, in http
return self._client._http
File "/usr/local/lib/python3.6/site-packages/google/cloud/client.py", line 187, in _http
self._http_internal.configure_mtls_channel(self._client_cert_source)
AttributeError: 'AuthorizedSession' object has no attribute 'configure_mtls_channel'
In these jobs I download a file from Google Cloud Storage with the storage client. I assumed the error was caused by a dependency issue. In the Composer environment I had installed google-cloud-storage without specifying a version. I tried pinning different versions of the package, but nothing seemed to work.
Thanks!
This seems to be related to this issue.
Try pinning google-cloud-core to 1.5.0. Once the jobs are working again, I highly recommend draining them (assuming they are streaming jobs).
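To confirm that a pin actually resolved the mismatch, one option is to check whether the session class exposes the method the traceback complains about. This is only a sketch under the assumption that the error comes from one Google library calling configure_mtls_channel on a session class from an older, incompatible install. The helper name and the two stand-in classes are hypothetical, so the snippet runs without any Google packages.

```python
def supports_mtls(session_cls):
    """True if the session class defines a callable configure_mtls_channel."""
    return callable(getattr(session_cls, "configure_mtls_channel", None))


# Hypothetical stand-ins so the sketch runs without google-auth installed.
class OldSession:
    pass


class NewSession:
    def configure_mtls_channel(self, client_cert_callback=None):
        return False


print(supports_mtls(OldSession), supports_mtls(NewSession))  # False True
```

In the Composer environment you would pass the real class instead, for example `google.auth.transport.requests.AuthorizedSession`. A result of False would suggest the installed packages are still out of step.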

Not able to run OpenDistro for Elastic in kubernetes as non-root -supervisord error

I am setting up OpenDistro for Elastic in Kubernetes. The cluster has pod security in place that does not allow privileged pods. When I start the cluster, the logs indicate a permission issue with /usr/share/supervisor/supervisord.log.
I have a securityContext set on the deployment:
securityContext:
  runAsUser: 1000
  fsGroup: 1000
The error message from kubectl logs es-master-0 is:
/usr/share/elasticsearch/config/elasticsearch.yml seems to be already configured for Security. Quit.
Traceback (most recent call last):
File "/usr/bin/supervisord", line 9, in <module>
load_entry_point('supervisor==4.0.2', 'console_scripts', 'supervisord')()
File "/usr/lib/python2.7/site-packages/supervisor-4.0.2-py2.7.egg/supervisor/supervisord.py", line 358, in main
go(options)
File "/usr/lib/python2.7/site-packages/supervisor-4.0.2-py2.7.egg/supervisor/supervisord.py", line 368, in go
d.main()
File "/usr/lib/python2.7/site-packages/supervisor-4.0.2-py2.7.egg/supervisor/supervisord.py", line 70, in main
self.options.make_logger()
File "/usr/lib/python2.7/site-packages/supervisor-4.0.2-py2.7.egg/supervisor/options.py", line 1472, in make_logger
backups=self.logfile_backups,
File "/usr/lib/python2.7/site-packages/supervisor-4.0.2-py2.7.egg/supervisor/loggers.py", line 417, in handle_file
handler = RotatingFileHandler(filename, 'a', maxbytes, backups)
File "/usr/lib/python2.7/site-packages/supervisor-4.0.2-py2.7.egg/supervisor/loggers.py", line 212, in __init__
FileHandler.__init__(self, filename, mode)
File "/usr/lib/python2.7/site-packages/supervisor-4.0.2-py2.7.egg/supervisor/loggers.py", line 159, in __init__
self.stream = open(filename, mode)
IOError: [Errno 13] Permission denied: '/usr/share/supervisor/supervisord.log'
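The traceback ends in open(filename, 'a') failing for the supervisord log file. As a small diagnostic sketch (not a fix), the helper below checks whether the current user could open a given path in append mode. It assumes you run it inside the container as the same UID the pod uses (1000 in the securityContext above). The function name is hypothetical.

```python
import os


def can_append(path):
    """True if the current user could open `path` in append mode."""
    if os.path.exists(path):
        # Existing file: it must be writable by this user.
        return os.access(path, os.W_OK)
    # Missing file: the parent directory must exist and allow creating entries.
    parent = os.path.dirname(path) or "."
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)


# Path taken from the error message above.
print(can_append("/usr/share/supervisor/supervisord.log"))
```

If this prints False for UID 1000, the log location is owned by another user in the image, and the likely options are changing ownership when building the image or pointing supervisord at a writable log path.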

Airflow scheduler failure

I followed this tutorial in an attempt to build an Airflow cluster on localhost with my own DAGs. When I ran airflow scheduler after setting executor = CeleryExecutor in the config file, I received the following traceback:
Traceback (most recent call last):
File "/home/yurii/Tools/anaconda3/bin/airflow", line 28, in <module>
args.func(args)
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/bin/cli.py", line 839, in scheduler
job.run()
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/jobs.py", line 200, in run
self._execute()
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/jobs.py", line 1309, in _execute
self._execute_helper(processor_manager)
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/jobs.py", line 1441, in _execute_helper
self.executor.heartbeat()
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/executors/base_executor.py", line 124, in heartbeat
self.execute_async(key, command=command, queue=queue)
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 80, in execute_async
args=[command], queue=queue)
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/celery/app/task.py", line 573, in apply_async
**dict(self._get_exec_options(), **options)
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/celery/app/base.py", line 354, in send_task
reply_to=reply_to or self.oid, **options
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/celery/app/amqp.py", line 310, in publish_task
**kwargs
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/kombu/messaging.py", line 172, in publish
routing_key, mandatory, immediate, exchange, declare)
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/kombu/connection.py", line 449, in _ensured
return fun(*args, **kwargs)
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/kombu/messaging.py", line 188, in _publish
mandatory=mandatory, immediate=immediate,
File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/librabbitmq/__init__.py", line 122, in basic_publish
mandatory or False, immediate or False,
TypeError: an integer is required (got type NoneType)
Some additional information:
I am using Airflow 1.8.0 with Celery 3.1.25 and RabbitMQ 3.5.7 as broker and backend, but I also tried Airflow 1.9.0 with Celery 4.2.
Airflow with the sequential executor works without any problems.
airflow test "dag_name" "task_name" "exec_date" runs successfully.
I am new to Airflow/Celery/RabbitMQ/SQL, so any help would be appreciated!
To add to the previous answer: switching to py-amqp means either changing broker_url = amqp://XXXXX to broker_url = pyamqp://XXXXX, or running pip uninstall librabbitmq.
Additionally, you may need to rename the celery_result_backend variable to result_backend in your airflow.cfg. The celery_ prefix has been removed from the variables in the [celery] section of airflow.cfg in recent versions.
It seems you are using librabbitmq as the AMQP client, which the Celery core team does not recommend. Use py-amqp to talk to RabbitMQ instead and you should get rid of this error.
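As I understand it, the amqp:// scheme lets kombu pick librabbitmq when it is installed, while pyamqp:// always selects py-amqp. The broker_url change mentioned above can be sketched as a tiny helper. The function name and the example URL are hypothetical, and this only rewrites the string; it does not touch airflow.cfg.

```python
def force_pyamqp(broker_url):
    """Rewrite an amqp:// broker URL to pyamqp://, leaving other schemes untouched."""
    scheme, sep, rest = broker_url.partition("://")
    if sep and scheme == "amqp":
        return "pyamqp://" + rest
    return broker_url


# Example URL is made up for illustration.
print(force_pyamqp("amqp://guest:guest@localhost:5672//"))
# pyamqp://guest:guest@localhost:5672//
```

Uninstalling librabbitmq has the same effect without changing the URL, since amqp:// then has only py-amqp to fall back on.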

Librabbitmq 2.0.0 with Python 3 gives TypeError: can't pickle memoryview objects

I am using the latest master branch of the git repo https://github.com/celery/librabbitmq and installing librabbitmq==2.0.0 for Python 3.6 by following the instructions in the readme
Using the development version
You can clone the repository by doing the following:
$ git clone git://github.com/celery/librabbitmq.git
Then install it by doing the following:
$ cd librabbitmq
$ make install # or make develop
This works fine (after installing certain packages needed for C compilation on the OS), but when I define a small a+b add task and call it with add.delay(2, 2), it fails with the following error. I looked it up and saw that Celery 4 uses JSON as its serializer, so clearly this is not caused by pickle serialization.
Switching the broker transport from librabbitmq to pyamqp works normally.
The exact same thing happens on both macOS and Ubuntu 16.
[2018-04-30 23:40:02,956: CRITICAL/MainProcess] Unrecoverable error: SystemError(' returned a result with an error set',)
Traceback (most recent call last):
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/messaging.py", line 624, in _receive_callback
return on_m(message) if on_m else self.receive(decoded, message)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 570, in on_task_received
callbacks,
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/strategy.py", line 145, in task_message_handler
handle(req)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/worker.py", line 221, in _process_task_sem
return self._quick_acquire(self._process_task, req)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/async/semaphore.py", line 62, in acquire
callback(*partial_args, **partial_kwargs)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/worker.py", line 226, in _process_task
req.execute_using_pool(self.pool)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/request.py", line 531, in execute_using_pool
correlation_id=task_id,
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/concurrency/base.py", line 155, in apply_async
**options)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/billiard/pool.py", line 1486, in apply_async
self._quick_put((TASK, (result._job, None, func, args, kwds)))
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/concurrency/asynpool.py", line 813, in send_job
body = dumps(tup, protocol=protocol)
TypeError: can't pickle memoryview objects
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/bootsteps.py", line 370, in start
return self.obj.start()
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 320, in start
blueprint.start(self)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 596, in start
c.loop(*c.loop_args())
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/loops.py", line 88, in asynloop
next(loop)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/async/hub.py", line 354, in create_loop
cb(*cbargs)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/transport/base.py", line 236, in on_readable
reader(loop)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/transport/base.py", line 218, in _read
drain_events(timeout=0)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/librabbitmq-2.0.0-py3.6-macosx-10.6-intel.egg/librabbitmq/__init__.py", line 227, in drain_events
self._basic_recv(timeout)
SystemError: returned a result with an error set
This library is not recommended as the RabbitMQ client for Celery. Please try py-amqp instead; it is better maintained and less buggy.
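On the pickle question: the failing frame is asynpool.send_job calling dumps(tup, protocol=protocol), which is the worker pool pickling the job to hand it to a child process. That step appears to be separate from the task serializer (JSON), so pickle can still be involved. A minimal sketch of the underlying Python behaviour, assuming the message body arrives as a memoryview (the helper name and sample payload are hypothetical):

```python
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False if it raises."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


body = memoryview(b'{"a": 2, "b": 2}')  # stand-in for a received message body

print(is_picklable(body))         # False: memoryview objects cannot be pickled
print(is_picklable(bytes(body)))  # True: a bytes copy pickles fine
```

This only illustrates why the TypeError appears; switching the transport to py-amqp, as suggested above, remains the practical fix.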

ERROR: gcloud crashed (CannotConnectToMetadataServerException): <urlopen error [Errno -2] Name does not resolve>

I am having issues configuring my container to point to my Kubernetes cluster with the command gcloud container clusters get-credentials. I get the following error:
ERROR: gcloud crashed (CannotConnectToMetadataServerException): <urlopen error [Errno -2] Name does not resolve>
If you would like to report this issue, please run the following command:
gcloud feedback
To check gcloud for common problems, please run the following command:
gcloud info --run-diagnostics
Enhanced logging:
CannotConnectToMetadataServerException: <urlopen error [Errno -2] Name does not resolve>
2018-04-10 18:00:42,625 ERROR ___FILE_ONLY___ BEGIN CRASH STACKTRACE
Traceback (most recent call last):
File "/google-cloud-sdk/lib/googlecloudsdk/gcloud_main.py", line 147, in main
gcloud_cli.Execute()
File "/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 818, in Execute
self._HandleAllErrors(exc, command_path_string, specified_arg_names)
File "/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 856, in _HandleAllErrors
exceptions.HandleError(exc, command_path_string, self.__known_error_handler)
File "/google-cloud-sdk/lib/googlecloudsdk/calliope/exceptions.py", line 526, in HandleError
core_exceptions.reraise(exc)
File "/google-cloud-sdk/lib/googlecloudsdk/core/exceptions.py", line 111, in reraise
six.reraise(type(exc_value), exc_value, tb)
File "/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 792, in Execute
resources = calliope_command.Run(cli=self, args=args)
File "/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 751, in Run
self._parent_group.RunGroupFilter(tool_context, args)
File "/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 692, in RunGroupFilter
self._parent_group.RunGroupFilter(context, args)
File "/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 693, in RunGroupFilter
self._common_type().Filter(context, args)
File "/google-cloud-sdk/lib/surface/container/__init__.py", line 71, in Filter
context['api_adapter'] = api_adapter.NewAPIAdapter('v1')
File "/google-cloud-sdk/lib/googlecloudsdk/api_lib/container/api_adapter.py", line 147, in NewAPIAdapter
return NewV1APIAdapter()
File "/google-cloud-sdk/lib/googlecloudsdk/api_lib/container/api_adapter.py", line 151, in NewV1APIAdapter
return InitAPIAdapter('v1', V1Adapter)
File "/google-cloud-sdk/lib/googlecloudsdk/api_lib/container/api_adapter.py", line 172, in InitAPIAdapter
api_client = core_apis.GetClientInstance('container', api_version)
File "/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/apis.py", line 297, in GetClientInstance
api_name, api_version, no_http, _CheckResponse, enable_resource_quota)
File "/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/apis_internal.py", line 153, in _GetClientInstance
http_client = http.Http(enable_resource_quota=enable_resource_quota)
File "/google-cloud-sdk/lib/googlecloudsdk/core/credentials/http.py", line 64, in Http
creds = store.LoadIfEnabled()
File "/google-cloud-sdk/lib/googlecloudsdk/core/credentials/store.py", line 281, in LoadIfEnabled
return Load()
File "/google-cloud-sdk/lib/googlecloudsdk/core/credentials/store.py", line 348, in Load
cred = STATIC_CREDENTIAL_PROVIDERS.GetCredentials(account)
File "/google-cloud-sdk/lib/googlecloudsdk/core/credentials/store.py", line 162, in GetCredentials
cred = provider.GetCredentials(account)
File "/google-cloud-sdk/lib/googlecloudsdk/core/credentials/store.py", line 214, in GetCredentials
if account in c_gce.Metadata().Accounts():
File "/google-cloud-sdk/lib/googlecloudsdk/core/credentials/gce.py", line 127, in Accounts
gce_read.GOOGLE_GCE_METADATA_ACCOUNTS_URI + '/')
File "/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 289, in DecoratedFunction
exceptions.reraise(to_reraise[1], tb=to_reraise[2])
File "/google-cloud-sdk/lib/googlecloudsdk/core/exceptions.py", line 111, in reraise
six.reraise(type(exc_value), exc_value, tb)
File "/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 159, in TryFunc
return func(*args, **kwargs), None
File "/google-cloud-sdk/lib/googlecloudsdk/core/credentials/gce.py", line 52, in _ReadNoProxyWithCleanFailures
raise CannotConnectToMetadataServerException(e)
CannotConnectToMetadataServerException: <urlopen error [Errno -2] Name does not resolve>
To give some color: we kick off a CircleCI build every time we push code to GitHub. We have a container, which we internally call belushi, that we use to run our entire infrastructure. This container has gcloud installed. CircleCI's infrastructure runs on AWS, and when it spins up the belushi container we run gcloud container clusters get-credentials to point the belushi container at our project in Google Cloud. That project has a Kubernetes cluster configured, and we run all of our functional CI testing in that cluster. So we need the belushi pod to be configured for the CI project before we can move forward.
The weird thing is that the belushi:latest image always configures properly. However, when we work on belushi we often branch and build a new image to run tests. For example, I create a branch in belushi that gets a new hash of 1234567, so we spin up the belushi:1234567 image and try to run things. The first thing we do is configure it to point to the CI project, and that is where we get the metadata resolution error.
I feel like it is DNS related, or maybe the metadata server isn't allowing the new belushi image to communicate with it right away. After I retry a bunch of times it eventually configures properly (without any code changes). So I wonder whether the metadata server is rejecting it for some reason, or whether the name is failing to resolve on the AWS side for some unknown reason.
The first thing you can do to troubleshoot, when you get this error, is to run:
curl -H "Metadata-Flavor:Google" http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/
The metadata server should respond straight away with your service account metadata.
Is your container behind any kind of HTTP proxy?
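Since the error is "[Errno -2] Name does not resolve", it may also help to separate the DNS step from the HTTP step. The sketch below checks only whether a host name resolves. The function name and the injectable resolver parameter are hypothetical conveniences. Note that metadata.google.internal is normally only resolvable from inside Google Cloud, so a False result on an AWS-hosted runner would not be surprising.

```python
import socket

METADATA_HOST = "metadata.google.internal"  # host name from the curl command above


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves, False on a DNS failure (socket.gaierror)."""
    try:
        resolver(hostname, 80)
        return True
    except socket.gaierror:
        return False
```

Calling `resolves(METADATA_HOST)` inside the belushi container right before get-credentials would show whether the failure is in name resolution or in the metadata server itself.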