Can't access pods behind service - kubernetes

I'm trying to access a WebSphere Application Server that I have running on Kubernetes, but I can't connect to it properly. The pod runs behind a service that should expose its admin console port on an external NodePort. The problem is that the connection is reset every time I try. I had the same issue with an SSH port for another pod, but that went away when I fixed a Weave networking error. At this point, I can't tell whether it's a Kubernetes error, a Weave error, or something else altogether. Any help would be appreciated.
I'm running Kubernetes v1.1.0 off the release-1.1 branch, Mesos 0.24.0, and Weave 1.0.1. My next step is to try a different version of K8s.

The Kubernetes release-1.1 branch was cut before the master branch started passing conformance tests on Mesos. A lot of work has been done since then to make Kubernetes on Mesos more usable, and not all of those changes/fixes have been backported/cherry-picked to the v1.1 release branch. It's probably a better idea to use the code from the master branch for the time being, at least while the kubernetes-mesos integration is in alpha. We hope future point releases will be more stable.
(I work on the Kubernetes-Mesos team at Mesosphere.)


Kubernetes cluster in Azure (AKS) upgrade 1.24.9 in failed state with pods facing intermittent DNS issues

I upgraded AKS from 1.23.5 to 1.24.9 using the Azure portal. This part finished properly (or so I assumed), based on the status shown in the Azure portal.
I then continued from 1.24.9 to 1.25.5. This time it only partly worked: the Azure portal shows 1.25.5 for the node pool with provisioning state "Failed", while the nodes are still at 1.24.9.
I found that some nodes had trouble connecting to the network, both to outside hosts (e.g. GitHub) and to internal services. For some reason the issue is intermittent: on the same node it sometimes works and sometimes doesn't. (I had pods running on each node with Python.)
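A check of this kind can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: `probe_dns` and its parameters are illustrative names that do not come from the cluster described here, and it only counts failed lookups from wherever it runs; it does not say why they fail.

```python
import socket
import time


def probe_dns(hostname, attempts=20, delay=0.5, resolve=socket.getaddrinfo):
    """Resolve `hostname` repeatedly and return (failure count, error messages).

    `resolve` is injectable so the loop can be exercised without a network.
    """
    errors = []
    for _ in range(attempts):
        try:
            resolve(hostname, None)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            errors.append(str(exc))
        time.sleep(delay)
    return len(errors), errors
```

Running `probe_dns("github.com")` and the same call with an internal service name from a pod on each node gives a per-node failure count, which makes it easier to see whether the failures are tied to particular nodes.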
Each node has the cluster IP in resolv.conf.
One of the questions on SO had a hint about ingress-nginx compatibility. I found that I had an incompatible version, so I upgraded it to 1.6.4, which is compatible with both 1.24 and 1.25.
But the network issue still persists. I am not sure whether this is because of the AKS provisioning state of "Failed". The connectivity check for this cluster in the Azure portal reports Success. The only issue reported in the Azure portal diagnostics is the node pool provisioning state.
Is there anything I need to do after the ingress-nginx upgrade for all nodes/pods to get the new config?
Or is there a way to re-trigger this upgrade? I am not sure why it would help, but I am assuming that it may reset the configs on all nodes and might work.

Synchronize and rollback independent deployments in kubernetes

I have a k8s setup that contains 2 deployments, client and server, deployed from different images. Both deployments have replica sets, with liveness and readiness probes defined. The client communicates with the server via a k8s service.
Currently, the deployment scripts for client and server are separate (separate YAML files applied via kustomization). Rollback works correctly for each part independently, but consider the following scenario:
1. deployment is starting
2. both deployment configurations are applied
3. k8s master starts replacing pods of server and client
4. server pods start correctly so new replica set has all the new pods up and running
5. client pods have an issue, so the old replica set is still running
In many cases this is not a problem, because client and server work independently, but there are situations when a breaking change to the server API is released and both client and server must be updated. In that case, if either of the two fails, both should be rolled back (it doesn't matter which one fails; both need to be rolled back to stay in sync).
Is there a way to achieve that in k8s? I spent quite a lot of time searching for a solution, but everything I have found so far describes deployments/rollbacks of one thing at a time, and that doesn't solve the issue above.
The problem here is one that Blue/Green deployments address.
Here is a good reference of Blue/Green deployments with k8s.
The basic idea is that you deploy the new version (Green deployment) while keeping the previous version (Blue deployment) up and running, and only allow traffic to the new version (Green deployment) once everything has gone fine.
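Applied to the client/server case, the cutover decision can be sketched like this. It is a minimal sketch, not a definitive implementation: it assumes the pods carry a hypothetical `version: blue` / `version: green` label, that each Service selects on that label, and that the readiness counts have already been read from the Green deployments. The function names and the status dictionary shape are illustrative.

```python
import json


def all_ready(statuses):
    """True only if every deployment has all of its desired replicas ready."""
    return all(s["ready"] >= s["desired"] > 0 for s in statuses.values())


def selector_patch(version):
    """Patch body that points a Service selector at one version label."""
    return json.dumps({"spec": {"selector": {"version": version}}})


def plan_cutover(green_statuses, services, green="green", blue="blue"):
    """Return {service name: patch}. All services switch together or none do."""
    target = green if all_ready(green_statuses) else blue
    return {name: selector_patch(target) for name in services}
```

Each resulting patch could then be applied with something like `kubectl patch service <name> -p '<patch>'`. Because the decision is made once for both deployments, a failing client keeps traffic on Blue for the server as well, so the two stay in sync.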

Chaos testing of Vert.x Application

Any pointers on tools for chaos testing a Vert.x application deployed on OpenShift? Will Chaos Monkey work, or is there another tool out there?
On a local installation of OpenShift (minishift or everything on a single machine), Pumba (https://github.com/alexei-led/pumba) works well, as it can connect to the local Docker socket and kill pods.
Just be careful to exclude the OC pods, otherwise the UI will suddenly stop working too.
If you would like to know more about openshift + pumba, please check this slide deck I wrote last year: https://www.jetdrone.xyz/presentations/codemotion-amsterdam-2018/

How to install Kubernetes manually?

While getting familiar with Kubernetes, I see tons of tools that should help me install Kubernetes anywhere, but I don't understand exactly what they do inside, and as a result I don't understand how to troubleshoot issues.
Can someone provide me a link to a tutorial on how to install Kubernetes without any tools?
There are two good guides on setting up Kubernetes manually:
Kelsey Hightower's Kubernetes the hard way
Kubernetes guide on getting started from scratch
Kelsey's guide assumes you are using GCP or AWS as the infrastructure, while the Kubernetes guide is a bit more agnostic.
I wouldn't recommend running either of these in production unless you really know what you're doing. However, they are great for learning what is going on under the hood. Even if you just read the guides and don't use them to set up any infrastructure, you should gain a better understanding of the pieces that make up a Kubernetes cluster. You can then use one of the helpful setup tools to create your cluster, but now you will understand what it is actually doing and can debug when things go wrong.
For simplicity, you can view k8s as three components:
etcd
k8s master, which includes kube-apiserver, kube-controller-manager and kube-scheduler
node, which runs kubelet
You can install etcd and the k8s master together on one machine. The procedure is:
1. Install etcd. Download the etcd package and run it, which is quite simple. Remember the port of the etcd service, e.g. 2379, 4001, or whichever one you set.
2. Git clone the Kubernetes project from GitHub and find the executable binaries. E.g. for k8s version 1.3, you can find kube-apiserver, kube-controller-manager and kube-scheduler in the src/k8s.io/kubernetes/_output/local/bin/linux/amd64 folder.
3. Run kube-apiserver, specifying the etcd IP and port (e.g. --etcd_servers=http://127.0.0.1:4001).
4. Run the scheduler and the controller manager, specifying the apiserver IP and port (e.g. --master=127.0.0.1:8080). These two can be started in either order.
The master is now running. Make sure these processes run without errors. If etcd exits, the apiserver will exit. If the apiserver exits, the scheduler and the controller manager will exit.
5. On another machine (a virtual machine is preferred; it must be connected to the network), run kubelet. Kubelet can be found in the same folder (src/k8s.io/kubernetes/_output/local/bin/linux/amd64); specify the apiserver IP and port (e.g. --api-servers=http://10.10.10.19:8080). You may also install Docker or another container runtime on the node, so that you can prove that a container can be created.
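The start order described above can be summarized in a short Python sketch. It only prints the command lines and does not run anything. The flags, addresses, and folder are the examples quoted in this answer for that era of Kubernetes, so treat them as assumptions and check `--help` on your own binaries; the function name `startup_commands` is illustrative.

```python
import shlex

# Folder quoted in the answer for k8s 1.3 binaries.
BIN_DIR = "src/k8s.io/kubernetes/_output/local/bin/linux/amd64"


def startup_commands(etcd="127.0.0.1:4001", apiserver="127.0.0.1:8080",
                     node_apiserver="10.10.10.19:8080"):
    """Return (machine, argv) pairs in the order the components are started."""
    return [
        ("master", ["etcd"]),
        ("master", [f"{BIN_DIR}/kube-apiserver", f"--etcd_servers=http://{etcd}"]),
        # Scheduler and controller manager can be started in either order.
        ("master", [f"{BIN_DIR}/kube-scheduler", f"--master={apiserver}"]),
        ("master", [f"{BIN_DIR}/kube-controller-manager", f"--master={apiserver}"]),
        ("node", [f"{BIN_DIR}/kubelet", f"--api-servers=http://{node_apiserver}"]),
    ]


for machine, argv in startup_commands():
    print(f"[{machine}] {shlex.join(argv)}")
```

Seeing the commands side by side makes the dependency chain explicit: the apiserver needs the etcd address, and everything else needs the apiserver address.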

How to restart unresponsive kubernetes master in GKE

The kubernetes master in one of my GKE clusters became unresponsive last night following the infrastructure issue in us-central1-a.
Whenever I run "kubectl get pods" in the default namespace I get the following error message:
Error from server: an error on the server has prevented the request from succeeding
If I run "kubectl get pods --namespace=kube-system", I only see the kube-proxy and the fluentd-logging daemon.
I have tried scaling the cluster down to 0 and then scaling it back up. I have also tried downgrading and upgrading the cluster, but that seems to apply only to the nodes (not the master). Is there any GKE/K8S API command to issue a restart to the Kubernetes master?
There is not a command that will allow you to restart the Kubernetes master in GKE (since the master is considered a part of the managed service). There is automated infrastructure (and then an oncall engineer from Google) that is responsible for restarting the master if it is unhealthy.
In this particular case, restarting the master had no effect on restoring it to normal behavior because Google Compute Engine Incident #16011 caused an outage on 2016-06-28 for GKE masters running in us-central1-a (even though that isn't indicated on the Google Cloud Status Dashboard). During the incident, many masters were unavailable.
If you had tried to create a GCE cluster using kube-up.sh during that time, you would have similarly seen that it would be unable to create a functional master VM due to the SSD Persistent disk latency issues.
I try to always keep at least one version upgrade available: if you upgrade the master, it will restart and be working again within a few minutes. Otherwise you have to wait around 3 days for the Google team to reboot it. They won't help you by e-mail or phone, and unless you have paid support (switching to which takes a few days), they won't care.