I am trying to set up a Kubernetes cluster using the instructions at https://coreos.com/kubernetes/docs/latest/getting-started.html.
I am on step 2 (Deploy master): when I start the master service, it is in active status, but it cannot communicate with the API server. Six containers have been started, but their logs are empty. Please find the kubelet log below:
Jan 26 07:54:18 kubernetes-1.novalocal systemd[1]: Started kubelet.service.
Jan 26 07:54:20 kubernetes-1.novalocal kubelet[1115]: W0126 07:54:20.214551 1115 server.go:585] Could not load kubeconfig file /var/lib/kubelet/kubeconfig: stat /var/lib/kubelet/kubeconfig: no such file or directory. Trying auth path instead.
Jan 26 07:54:20 kubernetes-1.novalocal kubelet[1115]: W0126 07:54:20.214631 1115 server.go:547] Could not load kubernetes auth path /var/lib/kubelet/kubernetes_auth: stat /var/lib/kubelet/kubernetes_auth: no such file or directory. Continuing with defaults.
Jan 26 07:54:20 kubernetes-1.novalocal kubelet[1115]: I0126 07:54:20.217269 1115 plugins.go:71] No cloud provider specified.
Jan 26 07:54:20 kubernetes-1.novalocal kubelet[1115]: I0126 07:54:20.219217 1115 manager.go:128] cAdvisor running in container: "/system.slice/kubelet.service"
Jan 26 07:54:20 kubernetes-1.novalocal kubelet[1115]: I0126 07:54:20.672952 1115 fs.go:108] Filesystem partitions: map[/dev/vda9:{mountpoint:/ major:254 minor:9 fsType: blockSize:0} /dev/vda3:{mountpoint:/usr major:254 minor:3 fsType: blockSize:0} /dev/vda6:{mountpoi
Jan 26 07:54:20 kubernetes-1.novalocal kubelet[1115]: I0126 07:54:20.856238 1115 manager.go:163] Machine: {NumCores:2 CpuFrequency:1999999 MemoryCapacity:4149022720 MachineID:5a493caa9327449cabd050ac6cd2e065 SystemUUID:5A493CAA-9327-449C-ABD0-50AC6CD2E065 BootID:541d
Jan 26 07:54:20 kubernetes-1.novalocal kubelet[1115]: I0126 07:54:20.858067 1115 manager.go:169] Version: {KernelVersion:4.3.3-coreos-r2 ContainerOsVersion:CoreOS 899.5.0 DockerVersion:1.9.1 CadvisorVersion: CadvisorRevision:}
Jan 26 07:54:20 kubernetes-1.novalocal kubelet[1115]: I0126 07:54:20.862564 1115 server.go:798] Adding manifest file: /etc/kubernetes/manifests
Jan 26 07:54:20 kubernetes-1.novalocal kubelet[1115]: I0126 07:54:20.862655 1115 server.go:808] Watching apiserver
Jan 26 07:54:21 kubernetes-1.novalocal kubelet[1115]: I0126 07:54:21.165506 1115 plugins.go:56] Registering credential provider: .dockercfg
Jan 26 07:54:21 kubernetes-1.novalocal kubelet[1115]: E0126 07:54:21.171563 1115 kubelet.go:2284] Error updating node status, will retry: error getting node "192.168.111.32": Get http://127.0.0.1:8080/api/v1/nodes/192.168.111.32: dial tcp 127.0.0.1:8080: connection r
Jan 26 07:54:21 kubernetes-1.novalocal kubelet[1115]: E0126 07:54:21.172329 1115 kubelet.go:2284] Error updating node status, will retry: error getting node "192.168.111.32": Get http://127.0.0.1:8080/api/v1/nodes/192.168.111.32: dial tcp 127.0.0.1:8080: connection r
Jan 26 07:54:21 kubernetes-1.novalocal kubelet[1115]: E0126 07:54:21.173114 1115 kubelet.go:2284] Error updating node status, will retry: error getting node "192.168.111.32": Get http://127.0.0.1:8080/api/v1/nodes/192.168.111.32: dial tcp 127.0.0.1:8080: connection refused
Also, these are the containers that were launched:
2bf275350996 gcr.io/google_containers/podmaster:1.1 "/podmaster --etcd-se" 26 minutes ago Up 26 minutes k8s_controller-manager-elector.5b0f7cea_kube-podmaster-192.168.111.32_kube-system_3b8350635fe89ab366063da0be8969fd_1f370f8c
c64042286744 gcr.io/google_containers/podmaster:1.1 "/podmaster --etcd-se" 26 minutes ago Up 26 minutes k8s_scheduler-elector.bc3d71be_kube-podmaster-192.168.111.32_kube-system_3b8350635fe89ab366063da0be8969fd_c9ecb387
81bd74d0396a gcr.io/google_containers/hyperkube:v1.1.2 "/hyperkube proxy --m" 26 minutes ago Up 26 minutes k8s_kube-proxy.176f5569_kube-proxy-192.168.111.32_kube-system_8a987aa8c76c4d76bd80ccff5b65ffea_840d8228
39494ed8e814 gcr.io/google_containers/pause:0.8.0 "/pause" 27 minutes ago Up 27 minutes k8s_POD.6d00e006_kube-podmaster-192.168.111.32_kube-system_3b8350635fe89ab366063da0be8969fd_36b73b1d
632dc0a2f612 gcr.io/google_containers/pause:0.8.0 "/pause" 27 minutes ago Up 27 minutes k8s_POD.6d00e006_kube-apiserver-192.168.111.32_kube-system_86819bf93f678db0ee778b8c8bb658dc_815c6627
361b297b37f9 gcr.io/google_containers/pause:0.8.0 "/pause" 27 minutes ago Up 27 minutes k8s_POD.6d00e006_kube-proxy-192.168.111.32_kube-system_8a987aa8c76c4d76bd80ccff5b65ffea_7a6182ed
These are trying to talk to the insecure version of the API, which shouldn't work between machines. That will only work on the master. Additionally, the master isn't set up to accept work (register_node=false), so it is not expected to report back its status.
The key piece of information we're missing: which machine did that log come from?
Did you set the MASTER_HOST= parameter correctly?
The address of the master node. In most cases this will be the publicly routable IP of the node. Worker nodes must be able to reach the master node(s) via this address on port 443.
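To sanity-check that value, here is a minimal sketch (the address is the one appearing in your logs and the ports are the guide's defaults; adjust both if yours differ):
MASTER_HOST=192.168.111.32                     # your master's routable address
curl -k https://${MASTER_HOST}:443/version     # from a worker: the secure endpoint must answer
curl http://127.0.0.1:8080/version             # on the master only: the insecure local endpoint the kubelet is retrying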
Also, note this section of the docs:
Note that the kubelet running on a master node may log repeated attempts to post its status to the API server. These warnings are expected behavior and can be ignored. Future Kubernetes releases plan to handle this common deployment consideration more gracefully.
OS: RHEL 8.2
I am trying to create a systemd service for ZooKeeper. It fails to access the dataDir.
Here is my ZooKeeper config:
dataDir=/opt/zookeeper
maxClientCnxns=20
tickTime=2000
dataDir=/var/zookeeper/
initLimit=20
syncLimit=10
server.0=master:2888:3888
clientPort=2181
admin.serverPort=8082
The permissions on /opt/zookeeper are set to 777.
[user1@server1 opt]$ ls -lart
total 0
dr-xr-xr-x. 17 root root 244 Jul 3 10:56 ..
drwxr-xr-x 3 root root 27 Jul 10 10:29 rh
drw-r--r-- 2 user2 user2 6 Jul 17 08:48 hsluw_data
drw-r--r-- 2 user2 user2 6 Jul 17 08:58 hsluw_config
drwxr-xr-x. 6 root root 71 Jul 17 08:58 .
drwxrwxrwx 3 user2 user2 23 Jul 17 09:40 zookeeper
If I run the command,
./bin/zookeeper-server-start.sh config/zookeeper.properties
it gives me an "Unable to access datadir" error:
[2020-07-30 10:25:50,767] ERROR Invalid configuration, only one server specified (ignoring) (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2020-07-30 10:25:50,767] INFO Starting server (org.apache.zookeeper.server.ZooKeeperServerMain)
[2020-07-30 10:25:50,769] INFO zookeeper.snapshot.trust.empty : false (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
[2020-07-30 10:25:50,769] ERROR Unable to access datadir, exiting abnormally (org.apache.zookeeper.server.ZooKeeperServerMain)
org.apache.zookeeper.server.persistence.FileTxnSnapLog$DatadirException: Cannot write to data directory /var/zookeeper/version-2
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:132)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:124)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
Unable to access datadir, exiting abnormally
However, running the above command with sudo works:
sudo ./bin/zookeeper-server-start.sh config/zookeeper.properties
Now I have created a service at /etc/systemd/system/zookeeper.service with the following contents:
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=user2
ExecStart=/home/user2/kafka/bin/zookeeper-server-start.sh /home/user2/kafka/config/zookeeper.properties
ExecStop=/home/user2/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
SELinux is disabled:
user2@server1$ sestatus
SELinux status: disabled
Now if I do the following
sudo systemctl daemon-reload
sudo systemctl start zookeeper
sudo systemctl enable zookeeper
I am getting the same "Unable to access datadir" error, like the following:
[user2@server1 /]$ systemctl status zookeeper
● zookeeper.service
Loaded: loaded (/etc/systemd/system/zookeeper.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2020-07-30 10:13:19 CEST; 24s ago
Main PID: 12911 (code=exited, status=3)
Jul 30 10:13:19 server1.localdomain zookeeper-server-start.sh[12911]: org.apache.zookeeper.server.persistence.FileTxnSnapLog$Data>
Jul 30 10:13:19 server1.localdomain zookeeper-server-start.sh[12911]: at org.apache.zookeeper.server.persistence.FileTxnS>
Jul 30 10:13:19 server1.localdomain zookeeper-server-start.sh[12911]: at org.apache.zookeeper.server.ZooKeeperServerMain.>
Jul 30 10:13:19 server1.localdomain zookeeper-server-start.sh[12911]: at org.apache.zookeeper.server.ZooKeeperServerMain.>
Jul 30 10:13:19 server1.localdomain zookeeper-server-start.sh[12911]: at org.apache.zookeeper.server.ZooKeeperServerMain.>
Jul 30 10:13:19 server1.localdomain zookeeper-server-start.sh[12911]: at org.apache.zookeeper.server.quorum.QuorumPeerMai>
Jul 30 10:13:19 server1.localdomain zookeeper-server-start.sh[12911]: at org.apache.zookeeper.server.quorum.QuorumPeerMai>
Jul 30 10:13:19 server1.localdomain zookeeper-server-start.sh[12911]: Unable to access datadir, exiting abnormally
Jul 30 10:13:19 server1.localdomain systemd[1]: zookeeper.service: Main process exited, code=exited, status=3/NOTIMPLEMENTED
Jul 30 10:13:19 server1.localdomain systemd[1]: zookeeper.service: Failed with result 'exit-code'.
What am I missing here?
In the configuration file, the dataDir property is set twice (first to /opt/zookeeper, then to /var/zookeeper/), and the second value wins, so ZooKeeper tries to write to /var/zookeeper/, which user2 cannot write to. Removing the second line,
dataDir=/var/zookeeper/
solves the issue.
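A quick way to confirm this (a minimal sketch, assuming the config path from your unit file):
grep -n '^dataDir' /home/user2/kafka/config/zookeeper.properties
# Expect a single match; keep the one entry that points at a directory user2 can write to (e.g. /opt/zookeeper).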
When I try to add the Magento 2 varnish.vcl file by creating a symbolic link, the Varnish service stops working with a "permission denied" error, while if I use the default Varnish configuration file, Varnish works smoothly.
My stack is Ubuntu 16.04 with Varnish 4.1.
ls -al
drwxr-xr-x 2 root root 4096 Mar 21 13:14 .
drwxr-xr-x 96 root root 4096 Mar 21 12:56 ..
lrwxrwxrwx 1 root root 44 Mar 21 13:14 default.vcl -> /var/www/bazaar/varnish.vcl
-rw-r--r-- 1 root root 1225 Aug 22 2017 default.vcl_bak
-rw-r--r-- 1 root root 37 Mar 21 12:56 secret
Here is the status of the Varnish service:
● varnish.service - Varnish HTTP accelerator
Loaded: loaded (/lib/systemd/system/varnish.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/varnish.service.d
└─customexec.conf
Active: failed (Result: exit-code) since Wed 2018-03-21 13:59:08 UTC; 2s ago
Docs: https://www.varnish-cache.org/docs/4.1/
man:varnishd
Process: 3093 ExecStart=/usr/sbin/varnishd -j unix,user=vcache -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m (code=exited, status=2)
Main PID: 3093 (code=exited, status=2)
Mar 21 13:59:08 bazaar systemd[1]: Stopped Varnish HTTP accelerator.
Mar 21 13:59:08 bazaar systemd[1]: Started Varnish HTTP accelerator.
Mar 21 13:59:08 bazaar varnishd[3093]: Error: Cannot read -f file (/etc/varnish/default.vcl): Permission denied
Mar 21 13:59:08 bazaar systemd[1]: varnish.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 21 13:59:08 bazaar systemd[1]: varnish.service: Unit entered failed state.
Mar 21 13:59:08 bazaar systemd[1]: varnish.service: Failed with result 'exit-code'.
My current user for nginx is bazaar, and the permissions for varnish.vcl are as follows:
-rw-r--r-- 1 bazaar bazaar 7226 Mar 21 13:24 varnish.vcl
Any hint or help will be highly appreciated.
Thanks.
It is likely that the user Varnish runs as (vcache) does not have permission to traverse the parent directories of /var/www/bazaar, so it cannot follow the symlink to read varnish.vcl.
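A minimal sketch for checking and fixing this, assuming the paths from the question (adjust the chmod to your own policy):
sudo -u vcache cat /var/www/bazaar/varnish.vcl   # reproduces the "Permission denied" if traversal fails
namei -l /var/www/bazaar/varnish.vcl             # shows the permission bits on every path component
sudo chmod o+x /var/www /var/www/bazaar          # grant others search (traverse) permission on the parent directories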
I have a preemptible node pool of size 1 on GKE. I've been running this node pool with size 1 for almost a month now. Every day the node restarts after 24 hours and rejoins the cluster. Today it restarted but did not rejoin the cluster.
Instead, I noticed that according to gcloud compute instances list the underlying instance was running but not included in the output of kubectl get node. I increased the node pool size to 2, whereupon a second instance was launched. That node immediately joined my GKE cluster and pods were scheduled onto it. The first node is still running according to gcloud, but it won't join the cluster.
What's going on? How can I debug this problem?
Update:
I SSHed into the instance and was immediately greeted with this excellent error message:
Broken (or in progress) Kubernetes node setup! Check the cluster initialization status
using the following commands:
Master instance:
- sudo systemctl status kube-master-installation
- sudo systemctl status kube-master-configuration
Node instance:
- sudo systemctl status kube-node-installation
- sudo systemctl status kube-node-configuration
The results of sudo systemctl status kube-node-installation:
● kube-node-installation.service - Download and install k8s binaries and configurations
Loaded: loaded (/etc/systemd/system/kube-node-installation.service; enabled; vendor preset: disabled)
Active: active (exited) since Thu 2017-12-28 21:08:53 UTC; 6h ago
Process: 945 ExecStart=/home/kubernetes/bin/configure.sh (code=exited, status=0/SUCCESS)
Process: 941 ExecStartPre=/bin/chmod 544 /home/kubernetes/bin/configure.sh (code=exited, status=0/SUCCESS)
Process: 937 ExecStartPre=/usr/bin/curl --fail --retry 5 --retry-delay 3 --silent --show-error -H X-Google-Metadata-Request: True -o /home/kubernetes/bin/configure.sh http://metadata.google.internal/computeMetadata/v1/instance/attributes/configure-sh (code=exited, status=0/SUCCESS)
Process: 933 ExecStartPre=/bin/mount -o remount,exec /home/kubernetes/bin (code=exited, status=0/SUCCESS)
Process: 930 ExecStartPre=/bin/mount --bind /home/kubernetes/bin /home/kubernetes/bin (code=exited, status=0/SUCCESS)
Process: 925 ExecStartPre=/bin/mkdir -p /home/kubernetes/bin (code=exited, status=0/SUCCESS)
Main PID: 945 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 4915)
Memory: 0B
CPU: 0
CGroup: /system.slice/kube-node-installation.service
Dec 28 21:08:52 gke-cluster0-pool-d59e9506-g9sc configure.sh[945]: Downloading node problem detector.
Dec 28 21:08:52 gke-cluster0-pool-d59e9506-g9sc configure.sh[945]: % Total % Received % Xferd Average Speed Time Time Time Current
Dec 28 21:08:52 gke-cluster0-pool-d59e9506-g9sc configure.sh[945]: Dload Upload Total Spent Left Speed
Dec 28 21:08:52 gke-cluster0-pool-d59e9506-g9sc configure.sh[945]: [158B blob data]
Dec 28 21:08:52 gke-cluster0-pool-d59e9506-g9sc configure.sh[945]: == Downloaded https://storage.googleapis.com/kubernetes-release/node-problem-detector/node-problem-detector-v0.4.1.tar.gz (SHA1 = a57a3fe64cab8a18ec654f5cef0aec59dae62568) ==
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc configure.sh[945]: cni-0799f5732f2a11b329d9e3d51b9c8f2e3759f2ff.tar.gz is preloaded.
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc configure.sh[945]: kubernetes-manifests.tar.gz is preloaded.
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc configure.sh[945]: mounter is preloaded.
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc configure.sh[945]: Done for installing kubernetes files
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc systemd[1]: Started Download and install k8s binaries and configurations.
And the result of sudo systemctl status kube-node-configuration:
● kube-node-configuration.service - Configure kubernetes node
Loaded: loaded (/etc/systemd/system/kube-node-configuration.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2017-12-28 21:08:53 UTC; 6h ago
Process: 994 ExecStart=/home/kubernetes/bin/configure-helper.sh (code=exited, status=4)
Process: 990 ExecStartPre=/bin/chmod 544 /home/kubernetes/bin/configure-helper.sh (code=exited, status=0/SUCCESS)
Main PID: 994 (code=exited, status=4)
CPU: 33ms
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc systemd[1]: Starting Configure kubernetes node...
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[994]: Start to configure instance for kubernetes
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[994]: Configuring IP firewall rules
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[994]: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[994]: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[994]: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc systemd[1]: kube-node-configuration.service: Main process exited, code=exited, status=4/NOPERMISSION
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc systemd[1]: Failed to start Configure kubernetes node.
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc systemd[1]: kube-node-configuration.service: Unit entered failed state.
Dec 28 21:08:53 gke-cluster0-pool-d59e9506-g9sc systemd[1]: kube-node-configuration.service: Failed with result 'exit-code'.
So it looks like kube-node-configuration failed. I ran sudo systemctl restart kube-node-configuration and now the status output is:
● kube-node-configuration.service - Configure kubernetes node
Loaded: loaded (/etc/systemd/system/kube-node-configuration.service; enabled; vendor preset: disabled)
Active: active (exited) since Fri 2017-12-29 03:41:36 UTC; 3s ago
Main PID: 20802 (code=exited, status=0/SUCCESS)
CPU: 1.851s
Dec 29 03:41:28 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[20802]: Extend the docker.service configuration to set a higher pids limit
Dec 29 03:41:28 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[20802]: Docker command line is updated. Restart docker to pick it up
Dec 29 03:41:30 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[20802]: Start kubelet
Dec 29 03:41:35 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[20802]: Using kubelet binary at /home/kubernetes/bin/kubelet
Dec 29 03:41:35 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[20802]: Start kube-proxy static pod
Dec 29 03:41:35 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[20802]: Start node problem detector
Dec 29 03:41:35 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[20802]: Using node problem detector binary at /home/kubernetes/bin/node-problem-detector
Dec 29 03:41:36 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[20802]: Prepare containerized mounter
Dec 29 03:41:36 gke-cluster0-pool-d59e9506-g9sc configure-helper.sh[20802]: Done for the configuration for kubernetes
Dec 29 03:41:36 gke-cluster0-pool-d59e9506-g9sc systemd[1]: Started Configure kubernetes node.
...and the node joined the cluster :). But, my original question stands: what happened?
We were experiencing a similar problem on GKE with preemptible nodes, seeing error messages like these from the nodes:
Extend the docker.service configuration to set a higher pids limit
Docker command line is updated. Restart docker to pick it up
level=info msg="Processing signal 'terminated'"
level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
level=info msg="Daemon shutdown complete"
docker daemon exited
Start kubelet
After about a month of back-and-forth with Google Support, we learned that the nodes were getting preempted and replaced; the replacement node comes up with the same name, and it all happens without the normal pod disruption of a node being evicted.
Backstory: we ran into this because Jenkins was running its workers on those nodes, and during this roughly two-minute "restart" of the node going away and returning, the Jenkins master would lose the connection and fail the job.
tl;dr: don't use preemptible nodes for this kind of work.
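If you still want a preemptible pool for interruptible work, one possible approach (a sketch; the cluster and pool names are placeholders, not from the original post) is to add a non-preemptible pool and pin the sensitive pods to it:
gcloud container node-pools create stable-pool --cluster=cluster0 --num-nodes=1   # note: no --preemptible flag
# Then pin the Jenkins agent pods to that pool via the label GKE applies automatically:
#   nodeSelector:
#     cloud.google.com/gke-nodepool: stable-pool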
I installed RStudio Server on a CentOS 7 box.
The rstudio-server service could not start.
I ran the command
systemctl status rstudio-server.service
and it showed:
● rstudio-server.service - RStudio Server
Loaded: loaded (/etc/systemd/system/rstudio-server.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Thu 2016-01-28 20:18:20 ICT; 1min 6s ago
Process: 48820 ExecStart=/usr/lib/rstudio-server/bin/rserver (code=exited, status=203/EXEC)
Jan 28 20:18:20 localhost.localdomain systemd[1]: rstudio-server.service: control process exited, code=exited s...=203
Jan 28 20:18:20 localhost.localdomain systemd[1]: Failed to start RStudio Server.
Jan 28 20:18:20 localhost.localdomain systemd[1]: Unit rstudio-server.service entered failed state.
Jan 28 20:18:20 localhost.localdomain systemd[1]: rstudio-server.service failed.
Jan 28 20:18:20 localhost.localdomain systemd[1]: rstudio-server.service holdoff time over, scheduling restart.
Jan 28 20:18:20 localhost.localdomain systemd[1]: start request repeated too quickly for rstudio-server.service
Jan 28 20:18:20 localhost.localdomain systemd[1]: Failed to start RStudio Server.
Jan 28 20:18:20 localhost.localdomain systemd[1]: Unit rstudio-server.service entered failed state.
Jan 28 20:18:20 localhost.localdomain systemd[1]: rstudio-server.service failed.
I installed and ran an older version (rstudio-server-0.99.491-1.x86_64) on the same box without any problem.
How can I fix this issue?
Although you asked this question 3 years ago, I think it's still worth sharing my solution to this problem.
I encountered this problem after I updated R.
The reason you cannot restart rstudio-server is that port 8787 is still being used by a previous rserver process. Once you know this, the solution is easy.
First, find the PID that is using port 8787:
sudo netstat -anp | grep 8787
tcp 0 0 0.0.0.0:8787 0.0.0.0:* LISTEN pid/rserver
Second, kill that PID (use your own PID):
sudo kill -9 pid
Third, restart rstudio-server, or reinstall the RStudio Server package.
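On systems where netstat is not installed, an equivalent sketch (same idea, different tools) would be:
sudo ss -ltnp | grep 8787       # show the process listening on port 8787
sudo fuser -k 8787/tcp          # kill whatever is holding the port (use with care)
sudo systemctl restart rstudio-server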
Yesterday the service worked fine. But today when I checked the service's state, I saw:
Mar 11 14:03:16 coreos-1 systemd[1]: scheduler.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 11 14:03:16 coreos-1 systemd[1]: Unit scheduler.service entered failed state.
Mar 11 14:03:16 coreos-1 systemd[1]: scheduler.service failed.
Mar 11 14:03:16 coreos-1 systemd[1]: Starting Kubernetes Scheduler...
Mar 11 14:03:16 coreos-1 systemd[1]: Started Kubernetes Scheduler.
Mar 11 14:08:16 coreos-1 kube-scheduler[4659]: E0311 14:08:16.808349 4659 reflector.go:118] watch of *api.Service ended with error: very short watch
Mar 11 14:08:16 coreos-1 kube-scheduler[4659]: E0311 14:08:16.811434 4659 reflector.go:118] watch of *api.Pod ended with error: unexpected end of JSON input
Mar 11 14:08:16 coreos-1 kube-scheduler[4659]: E0311 14:08:16.847595 4659 reflector.go:118] watch of *api.Pod ended with error: unexpected end of JSON input
It's really confusing, because etcd, flannel, and the apiserver all work fine.
The only strange logs are from etcd:
Mar 11 20:22:21 coreos-1 etcd[472]: [etcd] Mar 11 20:22:21.572 INFO | aba44aa0670b4b2e8437c03a0286d779: warning: heartbeat time out peer="6f4934635b6b4291bf29763add9bf4c7" missed=1 backoff="2s"
Mar 11 20:22:48 coreos-1 etcd[472]: [etcd] Mar 11 20:22:48.269 INFO | aba44aa0670b4b2e8437c03a0286d779: warning: heartbeat time out peer="6f4934635b6b4291bf29763add9bf4c7" missed=1 backoff="2s"
Mar 11 20:48:12 coreos-1 etcd[472]: [etcd] Mar 11 20:48:12.070 INFO | aba44aa0670b4b2e8437c03a0286d779: warning: heartbeat time out peer="6f4934635b6b4291bf29763add9bf4c7" missed=1 backoff="2s"
So I'm really stuck and don't know what's wrong. How can I resolve this problem? Or, how can I check detailed logs for the scheduler? journalctl gives me the same logs as systemctl status.
Please see: https://github.com/GoogleCloudPlatform/kubernetes/issues/5311
It means the apiserver accepted the watch request but then immediately terminated the connection.
If you see it occasionally, it implies a transient error and is not alarming. If you see it repeatedly, it implies that apiserver (or etcd) is sick.
Is something actually not working for you?
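If it is happening repeatedly, a quick way to tell which side is sick (a sketch assuming the default ports and the etcd v2 tooling from this setup):
curl http://127.0.0.1:8080/healthz    # apiserver health endpoint on the master
etcdctl cluster-health                # etcd cluster health
# For more scheduler detail than systemctl status shows, raise the log verbosity,
# e.g. add --v=4 to the kube-scheduler flags and restart scheduler.service.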