Marathon proxy loop detected

Running Marathon in HA mode, version 1.1.1-1.0.472.el7.
One of the non-leader nodes is returning 502 "Detecting Proxy Loop".
When checking the logs for each request, I see the following:
Oct 17 11:13:10 marathon-2-server marathon[13362]: [2016-10-17 11:13:10,130] INFO Proxying request to GET http://marathon-1-server:8080/favicon.ico from marathon-2-server:8080 (mesosphere.marathon.api.JavaUrlConnectionRequestForwarder$:qtp1485208789-526699)
Oct 17 11:13:10 marathon-2-server marathon[13362]: [2016-10-17 11:13:10,131] INFO Proxying request to GET http://marathon-1-server:8080/favicon.ico from marathon-2-server:8080 (mesosphere.marathon.api.JavaUrlConnectionRequestForwarder$:qtp1485208789-530887)
Oct 17 11:13:10 marathon-2-server marathon[13362]: [2016-10-17 11:13:10,131] ERROR Prevent proxy cycle, rejecting request (mesosphere.marathon.api.JavaUrlConnectionRequestForwarder$:qtp1485208789-530887)

Marathon had some issues with leader proxy detection during leader election, but this should be resolved in 1.3 or newer.
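If upgrading to 1.3+ is not immediately possible, a quick check (assuming the standard Marathon REST API on port 8080 and the hostnames from your logs) is to ask each instance which leader it currently sees:

curl http://marathon-1-server:8080/v2/leader
curl http://marathon-2-server:8080/v2/leader

If the instances disagree about who the leader is, restarting the out-of-sync Marathon instance may clear the proxy loop until you can upgrade.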

Related

Why does Kafka crash?

I have Kafka 2.5.0.
My Kafka service crashes sometimes.
kafka/logs/server.log
[09:25:23,316] WARN Unable to read additional data from client sessionid 0x1000001a8fd0012, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn)
/var/log/messages
09:25:23 kafka1 systemd: kafka.service: main process exited, code=exited, status=1/FAILURE
09:25:23 kafka1 systemd: Unit kafka.service entered failed state.
09:25:23 kafka1 systemd: kafka.service failed.
How can I find out why this happens?
First, check whether ZooKeeper is running.
If it is running, try changing these settings in zoo.cfg:
autopurge.snapRetainCount=15 (at least)
autopurge.purgeInterval=1 or 2 (hours)
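For example, a minimal zoo.cfg snippet with those autopurge settings (the exact values are only suggestions, adjust them to your retention needs):

# keep at least 15 snapshots and purge old ones every hour
autopurge.snapRetainCount=15
autopurge.purgeInterval=1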
Some hints might be here:
zookeeper + Unable to read additional data from client session id
ZooKeeper keeps getting EndOfStreamException, causing a crash

Connection timeout on cluster.openBucket call with Couchbase / Kubernetes

I have deployed a 4 node Couchbase cluster using Google GKE.
The master node exposes ports 8091 and 8093 to the load balancer.
When connecting to the load balancer's external IP from a Java app to insert data, I get a timeout error with this stack trace:
Apr 04, 2017 3:32:15 PM com.couchbase.client.core.endpoint.AbstractEndpoint$2 operationComplete
WARNING: [null][ViewEndpoint]: Socket connect took longer than specified timeout.
Apr 04, 2017 3:32:15 PM com.couchbase.client.core.endpoint.AbstractEndpoint$2 operationComplete
WARNING: [null][KeyValueEndpoint]: Socket connect took longer than specified timeout.
Apr 04, 2017 3:32:15 PM com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise notifyListener0
WARNING: An exception was thrown by com.couchbase.client.core.endpoint.AbstractEndpoint$2.operationComplete()
rx.exceptions.OnErrorNotImplementedException: connection timed out: /10.4.0.3:8093
at rx.Observable$26.onError(Observable.java:7955)
at rx.observers.SafeSubscriber._onError(SafeSubscriber.java:159)
at rx.observers.SafeSubscriber.onError(SafeSubscriber.java:120)
at rx.internal.operators.OperatorMap$1.onError(OperatorMap.java:48)
What's puzzling is that the stack trace shows 10.4.0.3:8093, which is actually the IP of the Docker container.
I'd appreciate any suggestions.
Have you checked the firewall rules for the master node and the workers? You need to allow ingress for the ports you have set up.
See this answer
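For example, on GKE the client-facing Couchbase ports (8091-8093 for management/views/queries and 11210 for key-value traffic) can be opened with a firewall rule along these lines; the rule name, network, and source range below are placeholders for illustration:

gcloud compute firewall-rules create allow-couchbase-client \
  --network default \
  --allow tcp:8091-8093,tcp:11210 \
  --source-ranges <your-client-ip>/32

This may also explain the puzzling 10.4.0.3 in the stack trace: the SDK bootstraps through the address you give it but then connects directly to the node addresses advertised by the cluster, so the client has to be able to reach every node, not just the load balancer.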

Mesos-Marathon cluster issue: Could not determine the current leader

I am new to Mesos and Marathon. I have set up 3 master and 3 slave servers following the guide on www.digitalocean.com, configuring the masters and slaves exactly as described. I then set up Mesos, Marathon, ZooKeeper, and Chronos; Mesos is listening on 5050, Marathon on 8080, and Chronos on 4400. After a few hours, my Marathon instances show Error 503:
HTTP ERROR: 503
Problem accessing /. Reason:
Could not determine the current leader
Powered by Jetty:// 9.3.z-SNAPSHOT.
But Mesos keeps working fine. I run into this problem every time, and if I restart the Marathon and ZooKeeper services it works again.
Marathon
Jun 15 06:19:20 master3 marathon[1054]: INFO Waiting for consistent leadership state. Are we leader?: false, leader: Some(192.168.4.78:8080 (mesosphere.marathon.api.LeaderProxyFilter$:qtp522188921-35)
Jun 15 06:19:20 master3 marathon[1054]: INFO Waiting for consistent leadership state. Are we leader?: false, leader: Some(192.168.4.78:8080 (mesosphere.marathon.api.LeaderProxyFilter$:qtp522188921-35)
Zookeeper
2016-06-15 03:41:13,797 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#197] - Accepted socket connection from /192.168.4.78:38339
2016-06-15 03:41:13,798 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running

haproxy as loadbalancer for kube-apiserver

I've set up a Kubernetes cluster with three masters. The kube-apiserver should be stateless. To access it properly from the worker nodes, I've configured HAProxy to expose the apiserver port (8080):
frontend http_front_8080
bind *:8080
stats uri /haproxy?stats
default_backend http_back_8080
backend http_back_8080
balance roundrobin
server m01 192.168.33.21:8080 check
server m02 192.168.33.22:8080 check
server m03 192.168.33.23:8080 check
But when I point the nodes at the load balancer's IP as the apiserver address, I receive these errors:
Apr 20 12:35:07 n01 kubelet[3383]: E0420 12:35:07.308337 3383 reflector.go:271] pkg/kubelet/kubelet.go:240: Failed to watch *api.Service: too old resource version: 4001 (4041)
Apr 20 12:36:48 n01 kubelet[3383]: E0420 12:36:48.321021 3383 reflector.go:271] pkg/kubelet/kubelet.go:240: Failed to watch *api.Service: too old resource version: 4011 (4041)
Apr 20 12:37:31 n01 kube-proxy[3408]: E0420 12:37:31.381042 3408 reflector.go:271] pkg/proxy/config/api.go:47: Failed to watch *api.Service: too old resource version: 4011 (4041)
Apr 20 12:41:42 n01 kube-proxy[3408]: E0420 12:41:42.409604 3408 reflector.go:271] pkg/proxy/config/api.go:47: Failed to watch *api.Service: too old resource version: 4011 (4041)
If I change the load balancer's IP to one of the master nodes, it works as expected (without the error messages above).
Am I missing something in my HAProxy configuration that is vital for this setup?
I had the same issue. I assume the watch requires some sort of state on the API server side.
The solution is to change the configuration so that all requests from a given client go to the same server, using balance source. I assume you have multiple API servers only so that Kubernetes is highly available (rather than for load balancing).
frontend http_front_8080
bind *:8080
stats uri /haproxy?stats
default_backend http_back_8080
backend http_back_8080
balance source
server m01 192.168.33.21:8080 check
server m02 192.168.33.22:8080 check
server m03 192.168.33.23:8080 check
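After changing the config, you can sanity-check and reload HAProxy (the config path and systemd unit name assume a typical package install):

haproxy -c -f /etc/haproxy/haproxy.cfg
systemctl reload haproxy

balance source hashes the client's source IP, so each kubelet/kube-proxy keeps talking to the same apiserver and its watch resource versions stay consistent.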

Kubernetes starts giving errors after few hours of uptime

I have installed K8S on OpenStack following this guide.
The installation went fine and I was able to run pods, but after some time my applications stop working. I can still create pods, but requests won't reach the services from outside the cluster or from within the pods. Basically, something in the networking gets messed up. iptables -L -vnt nat still shows the proper configuration, but things won't work.
To get it working again I have to rebuild the cluster; removing all services and replication controllers doesn't help.
I tried to look into the logs. Below is the journal for kube-proxy:
Dec 20 02:12:18 minion01.novalocal systemd[1]: Started Kubernetes Proxy.
Dec 20 02:15:52 minion01.novalocal kube-proxy[1030]: I1220 02:15:52.269784 1030 proxier.go:487] Opened iptables from-containers public port for service "default/opensips:sipt" on TCP port 5060
Dec 20 02:15:52 minion01.novalocal kube-proxy[1030]: I1220 02:15:52.278952 1030 proxier.go:498] Opened iptables from-host public port for service "default/opensips:sipt" on TCP port 5060
Dec 20 03:05:11 minion01.novalocal kube-proxy[1030]: W1220 03:05:11.806927 1030 api.go:224] Got error status on WatchEndpoints channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested index is outdated and cleared (the requested history has been cleared [1433/544]) [2432] Reason: Details:<nil> Code:0}
Dec 20 03:06:08 minion01.novalocal kube-proxy[1030]: W1220 03:06:08.177225 1030 api.go:153] Got error status on WatchServices channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested index is outdated and cleared (the requested history has been cleared [1476/207]) [2475] Reason: Details:<nil> Code:0}
..
..
..
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.448570 1030 proxier.go:161] Failed to ensure iptables: error creating chain "KUBE-PORTALS-CONTAINER": fork/exec /usr/sbin/iptables: too many open files:
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: W1220 16:01:23.448749 1030 iptables.go:203] Error checking iptables version, assuming version at least 1.4.11: %vfork/exec /usr/sbin/iptables: too many open files
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.448868 1030 proxier.go:409] Failed to install iptables KUBE-PORTALS-CONTAINER rule for service "default/kubernetes:"
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.448906 1030 proxier.go:176] Failed to ensure portal for "default/kubernetes:": error checking rule: fork/exec /usr/sbin/iptables: too many open files:
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: W1220 16:01:23.449006 1030 iptables.go:203] Error checking iptables version, assuming version at least 1.4.11: %vfork/exec /usr/sbin/iptables: too many open files
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.449133 1030 proxier.go:409] Failed to install iptables KUBE-PORTALS-CONTAINER rule for service "default/repo-client:"
I found a few posts relating to "failed to install iptables", but they don't seem relevant, since everything works initially and only gets messed up after a few hours.
What version of Kubernetes is this? A long time ago (~1.0.4) we had a bug in the kube-proxy where it leaked sockets/file-descriptors.
If you aren't running a 1.1.3 binary, consider upgrading.
Also, you should be able to use lsof to figure out who has all of the files open.
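For example, a minimal sketch of how to inspect kube-proxy's open descriptors (assuming a standard Linux install where pidof and lsof are available):

# how many files does kube-proxy hold open?
lsof -p $(pidof kube-proxy) | wc -l
# what kinds of descriptors are they (sockets, pipes, ...)?
lsof -p $(pidof kube-proxy) | awk '{print $5}' | sort | uniq -c | sort -rn
# compare against the per-process limit
grep 'open files' /proc/$(pidof kube-proxy)/limits

If the count climbs steadily toward the limit, that points to the descriptor leak described above.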