Unable to enter a pod in the gke cluster - kubernetes

We have our k8s cluster set up with our app, including a Neo4j DB deployment and other artifacts. Overnight, we started facing an issue in our GKE cluster whenever we try to enter or otherwise interact with any pod running in the cluster. Below is a sample of the error we get for the issued command:
error: unable to upgrade connection: Authorization error (user=kube-apiserver, verb=create, resource=nodes, subresource=proxy)
Our GKE cluster was created as Standard (not Autopilot), and the versions are shown below:
Node pool details
cluster basics
As mentioned, it had been working fine despite the warning about the versions. However, we haven't yet been able to identify what could have changed between the last time it worked and now.
Any clue about what authorization setup might have changed to make it incompatible now would be very welcome.
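The error indicates that the kube-apiserver user is being denied create access on the nodes/proxy subresource, which kubectl exec/attach/port-forward go through. As a hedged diagnostic sketch (nothing here is specific to this cluster beyond the user name taken from the error message), you could check that permission and look for the ClusterRoleBinding that normally grants kubelet API access:

$ kubectl auth can-i create nodes/proxy --as=kube-apiserver
$ kubectl get clusterrolebindings -o wide | grep -i kubelet

If the first command returns "no", comparing the ClusterRoleBindings against a freshly created GKE cluster may help reveal what changed.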

Related

AKS: Kubernetes CoreDNS fails to resolve headless services

We have multiple headless services running in our Azure AKS VMAS cluster. Sometimes (seemingly at random), we have observed that CoreDNS fails to resolve the headless services, with the following error logs:
E0909 09:31:22.241120 1 runtime.go:73] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
Please note that, while we are facing the above-mentioned issue, non-headless services (services that have cluster IPs) still resolve properly without any hassle.
To work around the issue in the dev/SVT environment, we terminate the CoreDNS pod in the kube-system namespace, and everything starts working fine again for a brief period of time (1-2 days).
This deletion operation cannot be performed in the customer deployment scenario.
We raised a ticket with the AKS team, but since coredns is a third-party project, it doesn't come under Azure's support domain.
Has anyone faced this issue with coredns?
What is the permanent solution for this issue ?
Maybe this will help someone: https://github.com/coredns/coredns/issues/4022
This is a known defect in CoreDNS; you need to upgrade the CoreDNS running inside AKS to a newer version (1.7.0 or later) that includes the fix.
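As a hedged sketch (assuming the standard coredns deployment name in kube-system that AKS uses), you can check which CoreDNS image the cluster is currently running before planning the upgrade:

$ kubectl -n kube-system get deployment coredns -o jsonpath='{.spec.template.spec.containers[0].image}'

Since AKS manages CoreDNS as an add-on, upgrading the AKS cluster version is typically what pulls in a newer CoreDNS image.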

FailedToUpdateEndpoint in Kubernetes

I have a Kubernetes cluster with some deployments and pods. I have experienced an issue with my deployments, with error messages like FailedToUpdateEndpoint and ReadinessProbeFailed.
These errors are unexpected and I have no idea what is causing them. When we analyse our logs, it looks as though someone may be trying to hack our cluster (we are not sure about this).
Things to clarify:
1. Is there any chance someone can illegally access our Kubernetes cluster without having the kubeconfig?
2. Is there any chance that, using the frontend IP, someone could access our apps and make changes to the cluster configuration (i.e. hack the cluster services via the web URL)?
3. Even if the cluster were accessed illegally via the frontend URL, is there any chance of changing the configuration in the cluster?
4. Is there any mechanism to detect whether the Kubernetes cluster is in a healthy state or has been hacked by someone?
The questions above focus on whether there are any security-related issues with the Kubernetes engine. If not,
Then,
5. I am still working on finding the reason for these errors. Please provide more information on what may be causing them.
Error Messages:
FailedToUpdateEndpoint: Failed to update endpoint default/job-store: Operation cannot be fulfilled on endpoints "job-store": the object has been modified; please apply your changes to the latest version and try again
The same error happens for all our pods in cluster.
Readiness probe failed: Error verifying datastore: Get https://API_SERVER: context deadline exceeded; Error reaching apiserver: taking a long time to check apiserver
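Regarding the FailedToUpdateEndpoint message itself: "the object has been modified" is an optimistic-concurrency conflict, which typically comes from concurrent updates to the same Endpoints object rather than from an intruder, though the question above leaves that open. A hedged diagnostic sketch, reusing the namespace and object name from the error:

$ kubectl describe endpoints job-store -n default
$ kubectl get events -n default --field-selector involvedObject.name=job-store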

Greenplum install on GKE

I am trying to install Greenplum on GKE using the directions here
I make it to step 12, but my operator pod is failing because it cannot pull the secret:
kubectl logs -l app=greenplum-operator -n greenplum
{"level":"INFO","ts":"2020-03-10T18:20:50.803Z","logger":"operator-setup","msg":"Go Info","Version":"go1.13.7","GOOS":"linux","GOARCH":"amd64"}
{"level":"INFO","ts":"2020-03-10T18:20:50.803Z","logger":"operator-setup","msg":"creating operator"}
W0310 18:20:50.803978 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
W0310 18:20:50.804036 1 client_config.go:546] error creating inClusterConfig, falling back to default config: open /var/run/secrets/kubernetes.io/serviceaccount/token: permission denied
It looks like a permissions issue pulling the image, but the image pull test earlier in the instructions succeeded:
job.batch/greenplum-operator-fetch-test created
GREENPLUM-OPERATOR TEST OK
job.batch "greenplum-operator-fetch-test" deleted
Has anyone else run into this issue?
There's a bug in the current documentation. You most likely did everything right. However, creating a GKE cluster with "Enable Kubernetes alpha features in this cluster", as listed on the prerequisites page (https://greenplum-kubernetes.docs.pivotal.io/1-12/prepare-gke.html), is no longer necessary. In fact, it's currently causing the exact issue you seem to be having. Try creating a GKE cluster following all of the documentation, except make sure NOT to enable GKE "alpha features".
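A minimal sketch of creating such a cluster from the command line; the cluster name, zone, and node count below are placeholders, and the point is simply that the --enable-kubernetes-alpha flag (or the corresponding console checkbox) is not used:

$ gcloud container clusters create greenplum-cluster --zone us-central1-a --num-nodes 3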

Azure Service Fabric Cluster returns nothing for code-versions and config-versions

In short: both "sfctl cluster code-versions" and "sfctl cluster config-versions" return empty arrays. Is this a symptom of a problem with the cluster?
Background: I am attempting to follow the Create a Linux container app tutorial, for learning about Service Fabric; but I have run into a problem when the application upload fails with a timeout.
On investigating this, I found that the other sfctl cluster commands (e.g. sfctl cluster health) all worked and returned useful data - except code-versions and config-versions, which both return an empty array:
$ sfctl cluster code-versions
[]
$ sfctl cluster config-versions
[]
I'm not sure if that's unhealthy, or what kind of data they might be returning.
Other notes:
The cluster is secured with a self-signed certificate; this is installed locally and works correctly, but both the above commands also log a warning:
~/.local/lib/python3.5/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning)
However, the same warning is logged for the other commands (e.g. sfctl cluster health) and doesn't stop them from working.
The cluster is at version 6.4.634.1, on Linux
Service Fabric Explorer shows everything as Healthy: Cluster Health State, System Application Health State, and the 3 nodes.
The Azure portal shows the cluster status as "Baseline upgrade"
Explorer shows the cluster as having Code Version "0.0.0.0"
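Given that the portal shows "Baseline upgrade" and Explorer reports code version 0.0.0.0, one hedged diagnostic sketch is to check whether the initial baseline upgrade has actually completed; whether that explains the empty arrays is an assumption on my part:

$ sfctl cluster upgrade-status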

GCE Image suddenly not found

I'm running Kubernetes on GCE. I used kube-up.sh to create the cluster, and the nodes and masters are all running the image gci-stable-56-9000-84-2. I deleted a few nodes today, which triggered the autoscaler to create new ones, but they failed with the following error:
Instance 'kubernetes-minion-30gb-20180131-9jwn' creation failed: The resource 'projects/google-containers/global/images/gci-stable-56-9000-84-2' was not found (when acting as 'REDACTED')
Is it possible this image was deleted somehow? I don't think I changed any access controls or permissions for any service accounts.
The image is present on this page:
https://cloud.google.com/container-optimized-os/docs/release-notes#cos-stable-56-9000-84-2
This error could be due to authentication issues. Re-authenticate to the gcloud command-line tool with the command 'gcloud auth login'.
It could also be that the Kubernetes Engine service account has been deleted or edited. Check this: https://cloud.google.com/kubernetes-engine/docs/troubleshooting#error_404
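As a hedged follow-up sketch, you can confirm whether the image still exists in the google-containers project and review the service accounts in your own project (YOUR_PROJECT_ID below is a placeholder):

$ gcloud compute images describe gci-stable-56-9000-84-2 --project google-containers
$ gcloud iam service-accounts list --project YOUR_PROJECT_ID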