Why shouldn't you run Kubernetes pods for longer than an hour from Composer?

The Cloud Composer documentation explicitly states that:
Due to an issue with the Kubernetes Python client library, your Kubernetes pods should be designed to take no more than an hour to run.
However, it doesn't provide any more context than that, and I can't find a definitively relevant issue on the Kubernetes Python client project.
To test it, I ran a pod for two hours and saw no problems. What issue creates this restriction, and how does it manifest?

I'm not deeply familiar with either the Cloud Composer or Kubernetes Python client library ecosystems, but sorting the GitHub issue tracker by most comments shows this open item near the top of the list: https://github.com/kubernetes-client/python/issues/492
It sounds like there is a token expiration issue:
@yliaog this is an issue for us, as we are running kubernetes pods as batch processes and tracking the state of the pods with a static client. Once the client object is initialized, it does no refresh, and therefore any job that takes longer than 60 minutes will fail. Looking through python-base, it seems like we could make a wrapper class that generates a new client (or refreshes the config) every n minutes, or checks status prior to every call (as @mvle suggested). The best fix would be in swagger-codegen, but a temporary solution would probably be very useful for a lot of people.
- @flylo, https://github.com/kubernetes-client/python/issues/492#issuecomment-376581140

https://issues.apache.org/jira/browse/AIRFLOW-3253 is the reason (and hopefully, my fix will be merged soon). As the others suggested, this affects anyone using the Kubernetes Python client with GCP auth. If you are authenticating with a Kubernetes service account, you should see no problem.
If you are authenticating via a GCP service account with gcloud (e.g. using the GKEPodOperator), you will generally see this problem with jobs that take longer than an hour because the auth token expires after an hour.

There are more insights here too.
Currently, long-running jobs on GKE always eventually fail with a 404 error (https://bitbucket.org/snakemake/snakemake/issues/932/long-running-jobs-on-kubernetes-fail). We believe that the problem is in the Kubernetes client, as we determined that although _refresh_gcp_token is being called when the token is expired, the next API call still fails with a 404 error.
You can see here that Snakemake uses the Kubernetes Python client.
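To make the wrapper-class workaround quoted above concrete, here is a minimal sketch, assuming you authenticate through a gcloud-generated kubeconfig. The class name, the refresh interval, and the single wrapped call are illustrative only, not part of the kubernetes library:

```python
import time
from kubernetes import client, config


class RefreshingCoreV1Api:
    """Illustrative wrapper that rebuilds the API client before the
    ~1 hour GCP auth token can expire (not an official API)."""

    def __init__(self, refresh_after_seconds=50 * 60):
        self._refresh_after = refresh_after_seconds
        self._built_at = 0.0
        self._api = None

    def _get_api(self):
        # Reload the kubeconfig (which should pick up a fresh GCP token)
        # whenever the current client is older than the refresh window.
        if self._api is None or time.time() - self._built_at > self._refresh_after:
            config.load_kube_config()
            self._api = client.CoreV1Api()
            self._built_at = time.time()
        return self._api

    def read_pod_status(self, name, namespace):
        return self._get_api().read_namespaced_pod_status(name, namespace)


# Usage: poll a long-running pod without holding one static client forever.
# api = RefreshingCoreV1Api()
# status = api.read_pod_status("my-pod", "default")
```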

Related

watch of *v1.Pod ended with: too old resource version

I updated my EKS from 1.16 to 1.17. All of a sudden I started getting this error:
pkg/mod/k8s.io/client-go@v0.0.0-20180806134042-1f13a808da65/tools/cache/reflector.go:99: watch of *v1.Pod ended with: too old resource version
I checked on GitHub and people were saying that it's not an error, but my question is: how do I stop getting these messages? I was not getting them on EKS 1.16.
Source.
This is a community wiki answer. Feel free to expand it.
In short, there is nothing to worry about when encountering these messages. They mean that newer versions of the watched resource exist than the version the client obtained when it last listed it for that watch. In other words: a watch against the Kubernetes API has timed out and is being restarted, which is intended behavior.
You can also see that being mentioned here:
this is perfectly expected, no worries. The messages are several hours apart.
When nothing happens in your cluster, the watches established by the Kubernetes client don't get a chance to get refreshed naturally, and eventually time out. These messages simply indicate that these watches are being re-created.
and here:
these are nothing to worry about. This is a known occurrence in Kubernetes and is not an issue [0]. The API server ends watch requests when they are very old. The operator uses a client-go informer, which takes care of automatically re-listing the resource and then restarting the watch from the latest resource version.
So answering your question:
my question is how to stop getting these messages
Simply, you don't because:
This is working as expected and is not going to be fixed.
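For reference, this is roughly what the client-go informer mentioned above does for you automatically. A sketch with the Kubernetes Python client, assuming the expired watch surfaces as a 410 ApiException (how exactly it surfaces depends on the client version): re-list to get a fresh resourceVersion, then restart the watch from it.

```python
from kubernetes import client, config, watch
from kubernetes.client.rest import ApiException

config.load_kube_config()
v1 = client.CoreV1Api()


def watch_pods(namespace="default"):
    # Initial list gives us a resourceVersion to start watching from.
    resource_version = v1.list_namespaced_pod(namespace).metadata.resource_version
    while True:
        try:
            stream = watch.Watch().stream(
                v1.list_namespaced_pod,
                namespace,
                resource_version=resource_version,
                timeout_seconds=300,
            )
            for event in stream:
                resource_version = event["object"].metadata.resource_version
                print(event["type"], event["object"].metadata.name)
        except ApiException as e:
            if e.status == 410:
                # "too old resource version": re-list and restart the watch.
                pods = v1.list_namespaced_pod(namespace)
                resource_version = pods.metadata.resource_version
            else:
                raise
```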

How to create a kubernetes job from a pod

I'm working on a cluster in which I'm performing a lot of scraping on Instagram to find valuable accounts and then message them to ask if they're interested in selling their account. This is what my application consists of:
Finding Instagram accounts by scraping for them with a lot of different accounts
Refining the retrieved accounts and sorting out the bad ones
Messaging the chosen accounts
In addition to this, I'm thinking of uploading the data from each step to a database (the whole chunk of accounts gathered in step 1, the refined accounts from step 2, and the messaged users from step 3) in separate collections. I'm also thinking of developing a Slack bot that handles errors by messaging me a report of the error, and eventually having it message me whenever a user responds.
As you can see, there are a lot of different parts of this application and that is the reason why I figured that using Kubernetes for this would be a good idea.
My initial approach was to make every pod in my node a REST API. Then I could send a request to each pod whenever I wanted it to run. But I figured that this would not be an optimal solution and not in any way a Kubernetes-style approach.
The only way to achieve it the way you describe it is to communicate with the Kubernetes API server from inside your pod. This requires several things (adding a service account and role binding, using the Kubernetes client, etc.) and I would not recommend it as a regular application flow (unless you are a devops engineer trying to provide some generic/utility solution).
From another angle: sharing volumes between pods and jobs should be avoided if possible (it adds complexity and restrictions).
You can read more on this here, as a starter: https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#accessing-the-api-from-within-a-pod
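If you do go that route anyway, the basic in-cluster pattern with the Python client looks roughly like this; note that the pod's service account still needs an RBAC Role/RoleBinding that permits whatever calls you make (listing pods here is just an example):

```python
from kubernetes import client, config

# Inside a pod, credentials come from the mounted service account token,
# so no kubeconfig is needed.
config.load_incluster_config()

v1 = client.CoreV1Api()
for pod in v1.list_namespaced_pod(namespace="default").items:
    print(pod.metadata.name, pod.status.phase)
```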
If I can suggest some solutions:
You can share an S3 bucket (or another shared volume) and have a CronJob scheduled to run periodically. If the CronJob finds data, it processes it, so you do not need to trigger a job from inside a pod (a rough sketch follows this list).
Two services, sending data via HTTP (if feasible): the second service does nothing until it is requested.
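As a rough illustration of the first option, this is the kind of script the CronJob could run on its schedule. It assumes the scraped data lands in an S3 bucket; the bucket name, prefix, and process() are placeholders:

```python
import boto3

# Hypothetical bucket/prefix where the scraping step drops its results.
BUCKET = "my-scraper-output"
PREFIX = "accounts/unprocessed/"


def process(data):
    # Application logic: refine the accounts, write to the database, etc.
    ...


def main():
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
    for obj in response.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        process(body)
        # Delete (or move) the object so the next run doesn't reprocess it.
        s3.delete_object(Bucket=BUCKET, Key=obj["Key"])


if __name__ == "__main__":
    main()
```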
If you share your use case in more detail, better answers can probably be provided.
Cheers
There is out-of-the-box support in kubectl for running a job from a cronjob (kubectl create job test-job --from=cronjob/a-cronjob), but there is no official support for running a job directly from a pod. You will need to get the pod resource from the cluster and then create a job using the pod's specification as part of the job specification.
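A hedged sketch of that approach with the Kubernetes Python client: read the pod, reuse its spec as the Job's pod template, and submit it. The pod and job names are placeholders, and fields such as restart_policy usually need adjusting because Jobs only allow Never or OnFailure:

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
core = client.CoreV1Api()
batch = client.BatchV1Api()

# Placeholder names for illustration.
pod = core.read_namespaced_pod(name="source-pod", namespace="default")

pod_spec = pod.spec
pod_spec.restart_policy = "Never"  # Jobs only accept Never or OnFailure
pod_spec.node_name = None          # don't pin the Job to the original node

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="job-from-pod"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"created-from": "source-pod"}),
            spec=pod_spec,
        ),
        backoff_limit=1,
    ),
)

batch.create_namespaced_job(namespace="default", body=job)
```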

AWS elastic search cluster becoming unresponsive

We have several AWS Elasticsearch domains which sometimes become unresponsive for no apparent reason. The ES endpoint as well as Kibana return bad gateway errors after a few minutes of trying to load the resources.
The node status message is the following (not that it's any help):
/_cluster/health: {"code":"ProxyRequestServiceException","message":"Unable to execute HTTP request: Read timed out (Service: null; Status Code: 0; Error Code: null; Request ID: null)"}
Error logs are activated for the cluster but do not show anything relevant for the time at which the cluster became inactive.
I would like to at least be able to restart the cluster but the status remains "processing" seemingly forever.
Unfortunately, if you are using the AWS Elasticsearch Service (as in not building it on your own EC2 instances), many... well... MOST... of the admin APIs and capabilities are restricted, so you cannot dig into it as much as you could if you had built it from the ground up.
I have found that AWS Support does a pretty good job in getting to the bottom of things when needed, so I would suggest you open a support ticket.
I wish this weren't the case: using their service is nice and easy (you don't have to build and maintain the infra yourself), but you lose a LOT of capability from an admin or troubleshooting perspective. :(

Why would running a container on GCE get stuck Metadata request unsuccessful forbidden (403)

I'm trying to run a container in a custom VM on Google Compute Engine. This is to perform a heavy ETL process so I need a large machine but only for a couple of hours a month. I have two versions of my container with small startup changes. Both versions were built and pushed to the same google container registry by the same computer using the same Google login. The older one works fine but the newer one fails by getting stuck in an endless list of the following error:
E0927 09:10:13 7f5be3fff700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected
Can anyone tell me exactly what's going on here? Can anyone please explain why one of my images doesn't have this problem (well, it gives a few of these messages but gets past them) and the other does (thousands of these messages, and it ran for over 24 hours before I killed it)?
If I ssh in to a GCE instance then both versions of the container pull and run just fine. I'm suspecting the INTEGRITY_RULE checking from the logs but I know nothing about how that works.
MORE INFO: this is down to "restart policy: never". Even a simple centos:7 container that says "hello world", deployed from the console, triggers this if the restart policy is never. At least in the short term I can fix this in the entrypoint script, as the instance will be destroyed when the monitor realises that the process has finished.
I suggest you try creating a third container focused on the metadata service functionality to isolate the issue. It may be that there's a timing difference between the two containers that's not being overcome.
Make sure you can curl the metadata service from the VM and that the request to the metadata service uses the VM's service account.
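For that first check, something along these lines from inside the VM (or curl with the same header) should return a token if the metadata service and the VM's service account are reachable; the Metadata-Flavor header is required:

```python
import requests

# GCE metadata server endpoint for the VM's default service account token.
URL = ("http://metadata.google.internal/computeMetadata/v1/"
       "instance/service-accounts/default/token")

resp = requests.get(URL, headers={"Metadata-Flavor": "Google"}, timeout=5)
print(resp.status_code)  # 200 means the metadata service answered
if resp.ok:
    token = resp.json()
    print("got access token, expires in", token.get("expires_in"), "seconds")
```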

Kubernetes etcd HighNumberOfFailedHTTPRequests QGET

I run a Kubernetes cluster in AWS on CoreOS-stable-1745.6.0-hvm (ami-401f5e38), all deployed by kops 1.9.1 / Terraform.
etcd_version = "3.2.17"
k8s_version = "1.10.2"
This Prometheus alert (method=QGET alertname=HighNumberOfFailedHTTPRequests) comes from the CoreOS kube-prometheus monitoring bundle. The alert started firing at the very beginning of the cluster's lifetime and has now been active for ~3 weeks without visible impact.
The QGET method fails for ~33% of requests.
NOTE: I have a second cluster in another region, built from scratch on the same versions, and it shows exactly the same behavior. So it's reproducible.
Does anyone know what the root cause might be, and what the impact is if it is ignored?
EDIT:
Later I found this GH issue which describes my case precisely: https://github.com/coreos/etcd/issues/9596
From CoreOS documentation:
For alerts to not appear on arbitrary events it is typically better not to alert directly on a raw value that was sampled, but rather by aggregating and defining a relative threshold rather than a hardcoded value. For example: send a warning if 1% of the HTTP requests fail, instead of sending a warning if 300 requests failed within the last five minutes. A static value would also require a change whenever your traffic volume changes.
Here you can find detailed information on how to Develop Prometheus alerts for etcd.
I got the explanation in the GitHub issue thread.
The HTTP metrics/alerts should be replaced with gRPC ones.