I have an issue installing k8s with kubespray. The problem is with the API server: on startup it complains about some timeout errors and goes down.
The bottom line of the long error message looks like this:
logging error output: "k8s\x00\n\f\n\x02v1\x12\x06Status\x12b\n\x04\n\x00\x12\x00\x12\aFailure\x1a9Timeout: request did not complete within allowed duration\"\aTimeout*\n\n\x00\x12\x00\x1a\x00(\x002\x000\xf8\x03\x1a\x00\"\x00"
This is also the result of the health check:
-> curl localhost:8080/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[-]poststarthook/bootstrap-controller failed: reason withheld
[+]poststarthook/extensions/third-party-resources ok
[-]poststarthook/ca-registration failed: reason withheld
[+]poststarthook/start-kube-apiserver-informers ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[-]autoregister-completion failed: reason withheld
healthz check failed
I've changed the API server manifest and set --v=5, but I still don't see any useful logs.
How can I debug the issue?
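For reference, the failing checks can also be queried one by one (a rough sketch, assuming the insecure port is still 8080; as far as I know the apiserver exposes each registered check under its own /healthz/<name> path):
curl localhost:8080/healthz/poststarthook/bootstrap-controller
curl localhost:8080/healthz/poststarthook/ca-registration
curl localhost:8080/healthz/autoregister-completion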
I ran into the same problem recently. The health check log was the same as yours.
etcd itself was fine, etcdctl could operate on it, and the apiserver reported etcd as OK too.
The apiserver log only showed etcd timeouts.
After checking the sockets between etcd and the apiserver, I found that the apiserver had no connection to etcd at all.
So I checked the client certificate files and found that they had expired, which meant the apiserver couldn't establish an SSL connection to etcd. But the apiserver never showed the actual error.
Hope this helps you.
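If it helps, these are the kinds of checks that pointed me to it (just a sketch; the certificate paths are examples, adjust them to whatever your apiserver's --etcd-certfile / --etcd-keyfile / --etcd-cafile flags point to):
# any established connections from the apiserver to etcd?
ss -tnp | grep 2379
# is the etcd client certificate still valid?
openssl x509 -noout -dates -in /etc/kubernetes/ssl/etcd-client.crt
# can etcd be reached over TLS with that client cert at all?
curl --cacert /etc/kubernetes/ssl/etcd-ca.crt --cert /etc/kubernetes/ssl/etcd-client.crt --key /etc/kubernetes/ssl/etcd-client.key https://127.0.0.1:2379/health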
I have an issue with my controller-manager, and hours of debugging have not been successful.
Log:
"controller_manager.go:232] error running controllers: failed to get api versions from server: failed to get supported resources from server: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1beta1: the server is currently unable to handle the request"
But if I run "kubectl api-versions", then servicecatalog.k8s.io/v1beta1 is listed.
Anyone here who can help me? :/
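In case it matters, this is how I've been checking whether the aggregated API behind that group is actually considered available (a sketch; the APIService object is named <version>.<group>):
# the Available condition and its message usually say why the group is unreachable
kubectl get apiservice v1beta1.servicecatalog.k8s.io -o yaml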
Getting the following exception in WildFly:
ERROR [org.jboss.modcluster] (UndertowEventHandlerAdapter - 1):
MODCLUSTER000042: Error MEM sending ENABLE-APP command to rod.de.mgg.dk/11.10.11.11:400, configuration will be reset: MEM: Can't update or insert host alias
Able to ping this server.
I deployed an application on the server, but even after undeploying and restarting I'm not able to fix it.
The WildFly server group has 3 nodes in it, which balance the load.
The only response to this I found on Google is the following:
This indicates a problem with the LB, please inspect apache/undertow logs for the cause.
But I can't deduce the cause from those logs. Could anyone please suggest more here?
You need to increase MaxHost in your httpd.conf according to your backend server requirements.
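Something along these lines (a sketch; the path is an example, the directive may live in a dedicated mod_cluster conf file on your setup):
# find the current limit (mod_cluster's default is fairly low)
grep -n MaxHost /etc/httpd/conf/httpd.conf
# raise it above the number of backend hosts/aliases, e.g. MaxHost 32,
# then restart Apache so mod_cluster rebuilds its tables
sudo systemctl restart httpd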
I deployed application 1 on service port 10101. It's an external-facing app with the label HAPROXY_0_VHOST=vhost1.xxx.xxx, and it works with no problems.
Then I deployed a similar application 2 on service port 10102, with HAPROXY_1_VHOST=vhost2.xxx.xxx. I read Marathon-LB's documentation, and this was my understanding of how to deploy 2 apps on different VHOSTs. However, curl http://vhost2.xxx.xxx returns HTTP/1.0 503 Service Unavailable.
I confirmed that application 2 is running normally by checking the result of curl marathon-lb.marathon.mesos:10102 on the DCOS master node.
Did I configure VHOST incorrectly? Or something else was wrong?
Figured this out: the app for vhost2 should be labeled HAPROXY_0_VHOST=vhost2.xxx.xxx instead of HAPROXY_1_VHOST=vhost2.xxx.xxx, because the index refers to the service-port index within each app rather than to which app it is. The documentation is not clear here.
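A quick way to double-check the labels on the second app (a sketch; assumes the DC/OS CLI and jq are installed and that the app's id is /app2):
dcos marathon app show /app2 | jq '.labels'
# expected something like: { "HAPROXY_GROUP": "external", "HAPROXY_0_VHOST": "vhost2.xxx.xxx" }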
I have one cluster working successfully without any problems, and I've tried to make a copy of it. The copy basically works, except for one issue: a token generated by the apiserver is not valid, failing with this error message:
6 handlers.go:37] Unable to authenticate the request due to an error: crypto/rsa: verification error
I have the API server started with the following parameters:
kube-apiserver --address=0.0.0.0 --admission_control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota --service-cluster-ip-range=10.116.0.0/23 --client_ca_file=/srv/kubernetes/ca.crt --basic_auth_file=/srv/kubernetes/basic_auth.csv --authorization-mode=AlwaysAllow --tls_cert_file=/srv/kubernetes/server.cert --tls_private_key_file=/srv/kubernetes/server.key --secure_port=6443 --token_auth_file=/srv/kubernetes/known_tokens.csv --v=2 --cors_allowed_origins=.* --etcd-config=/etc/kubernetes/etcd.config --allow_privileged=False
I think I'm missing something but can't find what exactly; any help will be appreciated!
So, apparently the wrong server.key was being used by the controller manager.
According to the Kubernetes documentation, the token is generated by the controller manager.
While I was copying all of my configuration, I had to change the IP address, and because of that I had to change the certificate as well. But the controller-manager had started with the "old" certificate, so after the change it was signing tokens with the wrong server.key.
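A quick sanity check for this kind of mismatch (a sketch, reusing the paths from the command line above): the cert and key the apiserver verifies tokens against must form a matching pair, and the controller-manager must actually be signing with that same key.
# the modulus of the verifying cert and of the signing key must be identical
openssl x509 -noout -modulus -in /srv/kubernetes/server.cert | md5sum
openssl rsa -noout -modulus -in /srv/kubernetes/server.key | md5sum
# after changing the key/cert, restart kube-controller-manager too, otherwise it
# keeps signing tokens with the key it loaded at startup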
You can see the flags below for the API server; they work for me. Check this:
--insecure-bind-address=${OS_PRIVATE_IPV4}
--bind-address=${OS_PRIVATE_IPV4}
--tls-cert-file=/srv/kubernetes/server.cert
--tls-private-key-file=/srv/kubernetes/server.key
--client-ca-file=/srv/kubernetes/ca.crt
--admission_control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ResourceQuota
--token-auth-file=/srv/kubernetes/known_tokens.csv
--basic-auth-file=/srv/kubernetes/basic_auth.csv
--etcd_servers=http://${OS_PRIVATE_IPV4}:4001
--service-cluster-ip-range=10.10.0.0/16
--logtostderr=true
--v=5
I reinstalled some nodes and a master. Now on the master I am getting:
Sep 15 04:53:58 master kube-apiserver[803]: I0915 04:53:58.413581 803 logs.go:41] http: TLS handshake error from $ip:54337: remote error: bad certificate
Where $ip is one of the nodes.
So I likely need to delete or recreate certificates. What would the location of those be? Any recommended commands to recreate or remove those or copy them from node to master or vice versa? Whatever gets me past this error message...
Take a look through the Creating Certificates section of authentication.md. It walks you through the certificates that you need to create and how to pass them to the system components, and you should be able to use that to re-generate certificates for your cluster.
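If you just want to see what is already on disk and whether it still matches, something like this can help (a sketch; the paths are examples and vary by install):
# where are the certificates the components are using?
sudo find /srv/kubernetes /etc/kubernetes /var/lib/kubelet \( -name '*.crt' -o -name '*.pem' -o -name '*.key' \) 2>/dev/null
# does a node's client certificate still verify against the CA on the master?
openssl verify -CAfile /srv/kubernetes/ca.crt /var/lib/kubelet/kubelet.crt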