I'm trying to configure a 3G USB dongle as a backup solution to WiFi.
I'm using a Raspberry Pi 4B with the latest Raspbian, which seems to switch USB dongles into modem mode automatically.
So I'm skipping the usb-modeswitch step and trying to connect through wvdial.
The connection seems successful, but it disconnects after a few seconds.
Any help is highly appreciated.
I'm using a Huawei E372 or a Huawei E397B; both give the same result.
The dongle already enumerates as a modem (1506):
pi@raspberrypi:~/3g $ lsusb
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 014: ID 12d1:1506 Huawei Technologies Co., Ltd. Modem/Networkcard
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
pi@raspberrypi:~/3g $ ls -al /dev/gsm*
-rw-r--r-- 1 root root 0 Dec 25 11:33 /dev/gsmmodem
lrwxrwxrwx 1 root root 7 Dec 25 11:48 /dev/gsmmodem2 -> ttyUSB0
wvdial config:
[Dialer Defaults]
Init1 = ATZ
Init2 = ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0
Init3 = AT+CGDCONT=1,"IP","internet",,0,0
Stupid Mode = 1
Modem Type = Analog Modem
ISDN = 0
Phone = *99***1#
Modem = /dev/gsmmodem
Username = { }
Password = { }
New PPPD = yes
Baud = 460800
connection:
pi@raspberrypi:~/3g $ sudo wvdial
--> WvDial: Internet dialer version 1.61
--> Initializing modem.
--> Sending: ATZ
OK
--> Sending: ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0
ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0
OK
--> Sending: AT+CGDCONT=1,"IP","internet",,0,0
AT+CGDCONT=1,"IP","internet",,0,0
OK
--> Modem initialized.
--> Sending: ATDT*99***1#
--> Waiting for carrier.
ATDT*99***1#
CONNECT
--> Carrier detected. Starting PPP immediately.
--> Starting pppd at Wed Dec 25 11:33:39 2019
--> Pid of pppd: 22588
--> Using interface ppp0
--> pppd: X[1f]�[01]X[1f]�[01]
--> pppd: X[1f]�[01]X[1f]�[01]
--> pppd: X[1f]�[01]X[1f]�[01]
--> pppd: X[1f]�[01]X[1f]�[01]
--> pppd: X[1f]�[01]X[1f]�[01]
--> pppd: X[1f]�[01]X[1f]�[01]
--> pppd: X[1f]�[01]X[1f]�[01]
--> Disconnecting at Wed Dec 25 11:33:42 2019
--> The PPP daemon has died: A modem hung up the phone (exit code = 16)
--> man pppd explains pppd error codes in more detail.
--> Try again and look into /var/log/messages and the wvdial and pppd man pages for more information.
--> Auto Reconnect will be attempted in 5 seconds
--> Cannot open /dev/gsmmodem: No such file or directory
--> Cannot open /dev/gsmmodem: No such file or directory
--> Cannot open /dev/gsmmodem: No such file or directory
--> Disconnecting at Wed Dec 25 11:33:43 2019
log:
Dec 25 11:33:47 raspberrypi kernel: [58307.025422] usb 1-1.2: Product: HUAWEI Mobile
Dec 25 11:33:47 raspberrypi kernel: [58307.025435] usb 1-1.2: Manufacturer: Huawei Technologies
Dec 25 11:33:47 raspberrypi kernel: [58307.028680] usb-storage 1-1.2:1.0: USB Mass Storage device detected
Dec 25 11:33:47 raspberrypi kernel: [58307.029145] scsi host0: usb-storage 1-1.2:1.0
Dec 25 11:33:47 raspberrypi mtp-probe: checking bus 1, device 13: "/sys/devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.2"
Dec 25 11:33:47 raspberrypi mtp-probe: bus: 1, device: 13 was not an MTP device
Dec 25 11:33:47 raspberrypi mtp-probe: checking bus 1, device 13: "/sys/devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.2"
Dec 25 11:33:47 raspberrypi mtp-probe: bus: 1, device: 13 was not an MTP device
Dec 25 11:33:47 raspberrypi usb_modeswitch: switch device 12d1:1505 on 001/013
Dec 25 11:33:47 raspberrypi kernel: [58307.818077] usb 1-1.2: USB disconnect, device number 13
Found an answer. Instead of Modem = /dev/gsmmodem or Modem = /dev/ttyUSB0 you need to set
Modem = /dev/ttyUSB2 - it's not obvious, and I guess it depends on the modem type.
Nevertheless, the gsmmodem symlink still points at ttyUSB0:
pi@raspberrypi:~ $ ls -al /dev/gsm*
lrwxrwxrwx 1 root root 7 Dec 25 20:32 /dev/gsmmodem -> ttyUSB0
To figure out which port to dial, just run the automatic wvdial configuration with sudo wvdialconf and pick the last port it reports; a corrected config follows the output below.
Editing `/etc/wvdial.conf'.
Scanning your serial ports for a modem.
ttyUSB0<*1>: ATQ0 V1 E1 -- OK
ttyUSB0<*1>: ATQ0 V1 E1 Z -- OK
ttyUSB0<*1>: ATQ0 V1 E1 S0=0 -- OK
ttyUSB0<*1>: ATQ0 V1 E1 S0=0 &C1 -- OK
ttyUSB0<*1>: ATQ0 V1 E1 S0=0 &C1 &D2 -- OK
ttyUSB0<*1>: ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0 -- OK
ttyUSB0<*1>: Modem Identifier: ATI -- Manufacturer: huawei
ttyUSB0<*1>: Speed 9600: AT -- OK
ttyUSB0<*1>: Max speed is 9600; that should be safe.
ttyUSB0<*1>: ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0 -- OK
ttyUSB1<*1>: ATQ0 V1 E1 -- failed with 2400 baud, next try: 9600 baud
ttyUSB1<*1>: ATQ0 V1 E1 -- failed with 9600 baud, next try: 9600 baud
ttyUSB1<*1>: ATQ0 V1 E1 -- and failed too at 115200, giving up.
ttyUSB2<*1>: ATQ0 V1 E1 -- OK
ttyUSB2<*1>: ATQ0 V1 E1 Z -- OK
ttyUSB2<*1>: ATQ0 V1 E1 S0=0 -- OK
ttyUSB2<*1>: ATQ0 V1 E1 S0=0 &C1 -- OK
ttyUSB2<*1>: ATQ0 V1 E1 S0=0 &C1 &D2 -- OK
ttyUSB2<*1>: ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0 -- OK
ttyUSB2<*1>: Modem Identifier: ATI -- Manufacturer: huawei
ttyUSB2<*1>: Speed 9600: AT -- OK
ttyUSB2<*1>: Max speed is 9600; that should be safe.
ttyUSB2<*1>: ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0 -- OK
Found a modem on /dev/ttyUSB0.
Modem configuration written to /etc/wvdial.conf.
ttyUSB0<Info>: Speed 9600; init "ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0"
ttyUSB2<Info>: Speed 9600; init "ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0"
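For reference, a working config under this fix would simply be the original one above with only the Modem line changed (the APN "internet", dial string and baud rate are carried over from the question and may need adjusting for your carrier):
[Dialer Defaults]
Init1 = ATZ
Init2 = ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0
Init3 = AT+CGDCONT=1,"IP","internet",,0,0
Stupid Mode = 1
Modem Type = Analog Modem
ISDN = 0
Phone = *99***1#
Modem = /dev/ttyUSB2
Username = { }
Password = { }
New PPPD = yes
Baud = 460800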
Related
Hi guys, I have an 11-node Kubernetes cluster with Cilium 1.12.1 (kubeProxyReplacement=strict), built on bare metal in our data center. Pods on 4 of the nodes (node05-node08) have issues communicating with pods or services that are not on the same node; the other 7 nodes don't have this issue. I can ping other pods' IPs, but when I telnet to a port, the packets never seem to arrive.
All 11 nodes have the same OS version and kernel installed, and the cluster was deployed with Kubespray; I made sure the 11 nodes had the same software environment as much as possible. (I'm not sure if it has anything to do with the hardware, but the 4 problematic nodes are servers with gigabit NICs, while the others all have 10-gigabit NICs.)
This is the node list:
❯ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master01 Ready control-plane 39h v1.24.4 10.252.55.22 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
master02 Ready control-plane 39h v1.24.4 10.252.54.44 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
master03 Ready control-plane 39h v1.24.4 10.252.55.39 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node05 Ready <none> 39h v1.24.4 10.252.34.27 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node06 Ready <none> 39h v1.24.4 10.252.33.44 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node07 Ready <none> 39h v1.24.4 10.252.33.52 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node08 Ready <none> 39h v1.24.4 10.252.33.45 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node01 Ready <none> 39h v1.24.4 10.252.144.206 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node02 Ready <none> 39h v1.24.4 10.252.145.13 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node03 Ready <none> 39h v1.24.4 10.252.145.163 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node04 Ready <none> 39h v1.24.4 10.252.145.226 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
This is what happens in a pod on node05 when communicating with an nginx pod running on master01:
# ping works fine
bash-5.1# ping 10.233.64.103
PING 10.233.64.103 (10.233.64.103) 56(84) bytes of data.
64 bytes from 10.233.64.103: icmp_seq=1 ttl=63 time=0.214 ms
64 bytes from 10.233.64.103: icmp_seq=2 ttl=63 time=0.148 ms
--- 10.233.64.103 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1026ms
rtt min/avg/max/mdev = 0.148/0.181/0.214/0.033 ms
# curl not working
bash-5.1# curl 10.233.64.103
curl: (28) Failed to connect to 10.233.64.103 port 80 after 3069 ms: Operation timed out
# hubble observe logs(hubble observe --to-ip 10.233.64.103 -f):
Sep 6 03:15:16.100: cilium-test/testubuntu-g2gv6 (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-overlay FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:15:16.100: cilium-test/testubuntu-g2gv6 (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-endpoint FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:15:22.026: cilium-test/testubuntu-g2gv6:33722 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: SYN)
This is what happens in a pod on node04 when communicating with the same nginx pod:
# ping works fine
bash-5.1# ping 10.233.64.103
PING 10.233.64.103 (10.233.64.103) 56(84) bytes of data.
64 bytes from 10.233.64.103: icmp_seq=1 ttl=63 time=2.33 ms
64 bytes from 10.233.64.103: icmp_seq=2 ttl=63 time=2.30 ms
# curl works fine as well
bash-5.1# curl 10.233.64.103
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
nginx.org.<br/>
Commercial support is available at
nginx.com.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
# hubble observe logs(hubble observe --to-ip 10.233.64.103 -f):
Sep 6 03:16:24.808: cilium-test/testubuntu-wcwfg (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-overlay FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:16:24.810: cilium-test/testubuntu-wcwfg (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-endpoint FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:16:27.043: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: SYN)
Sep 6 03:16:27.045: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: SYN)
Sep 6 03:16:27.045: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK)
Sep 6 03:16:27.045: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK, PSH)
Sep 6 03:16:27.047: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK)
Sep 6 03:16:27.047: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Sep 6 03:16:27.048: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK, FIN)
Sep 6 03:16:27.050: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Sep 6 03:16:27.050: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK)
Sep 6 03:16:27.051: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK)
This is the cilium-health status, which also shows the port connectivity issue on the 4 nodes:
❯ kubectl exec -it -n kube-system ds/cilium -- cilium-health status
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), clean-cilium-state (init)
Probe time: 2022-09-06T03:10:24Z
Nodes:
node01 (localhost):
Host connectivity to 10.252.144.206:
ICMP to stack: OK, RTT=341.295µs
HTTP to agent: OK, RTT=100.729µs
Endpoint connectivity to 10.233.67.53:
ICMP to stack: OK, RTT=334.224µs
HTTP to agent: OK, RTT=163.289µs
master01:
Host connectivity to 10.252.55.22:
ICMP to stack: OK, RTT=1.994728ms
HTTP to agent: OK, RTT=1.610932ms
Endpoint connectivity to 10.233.64.235:
ICMP to stack: OK, RTT=2.100332ms
HTTP to agent: OK, RTT=2.489126ms
master02:
Host connectivity to 10.252.54.44:
ICMP to stack: OK, RTT=2.33033ms
HTTP to agent: OK, RTT=2.34166ms
Endpoint connectivity to 10.233.65.225:
ICMP to stack: OK, RTT=2.101561ms
HTTP to agent: OK, RTT=2.067012ms
master03:
Host connectivity to 10.252.55.39:
ICMP to stack: OK, RTT=1.688641ms
HTTP to agent: OK, RTT=1.593428ms
Endpoint connectivity to 10.233.66.74:
ICMP to stack: OK, RTT=2.210915ms
HTTP to agent: OK, RTT=1.725555ms
node05:
Host connectivity to 10.252.34.27:
ICMP to stack: OK, RTT=2.383001ms
HTTP to agent: OK, RTT=2.48362ms
Endpoint connectivity to 10.233.70.87:
ICMP to stack: OK, RTT=2.194843ms
HTTP to agent: Get "http://10.233.70.87:4240/hello": dial tcp 10.233.70.87:4240: connect: connection timed out
node06:
Host connectivity to 10.252.33.44:
ICMP to stack: OK, RTT=2.091932ms
HTTP to agent: OK, RTT=1.724729ms
Endpoint connectivity to 10.233.71.119:
ICMP to stack: OK, RTT=1.984056ms
HTTP to agent: Get "http://10.233.71.119:4240/hello": dial tcp 10.233.71.119:4240: connect: connection timed out
node07:
Host connectivity to 10.252.33.52:
ICMP to stack: OK, RTT=2.055482ms
HTTP to agent: OK, RTT=2.037437ms
Endpoint connectivity to 10.233.72.47:
ICMP to stack: OK, RTT=1.853614ms
HTTP to agent: Get "http://10.233.72.47:4240/hello": dial tcp 10.233.72.47:4240: connect: connection timed out
node08:
Host connectivity to 10.252.33.45:
ICMP to stack: OK, RTT=2.461315ms
HTTP to agent: OK, RTT=2.369003ms
Endpoint connectivity to 10.233.74.247:
ICMP to stack: OK, RTT=2.097029ms
HTTP to agent: Get "http://10.233.74.247:4240/hello": dial tcp 10.233.74.247:4240: connect: connection timed out
node02:
Host connectivity to 10.252.145.13:
ICMP to stack: OK, RTT=372.787µs
HTTP to agent: OK, RTT=168.915µs
Endpoint connectivity to 10.233.73.98:
ICMP to stack: OK, RTT=360.354µs
HTTP to agent: OK, RTT=287.224µs
node03:
Host connectivity to 10.252.145.163:
ICMP to stack: OK, RTT=363.072µs
HTTP to agent: OK, RTT=216.652µs
Endpoint connectivity to 10.233.68.73:
ICMP to stack: OK, RTT=312.153µs
HTTP to agent: OK, RTT=304.981µs
node04:
Host connectivity to 10.252.145.226:
ICMP to stack: OK, RTT=375.121µs
HTTP to agent: OK, RTT=185.484µs
Endpoint connectivity to 10.233.69.140:
ICMP to stack: OK, RTT=403.752µs
HTTP to agent: OK, RTT=277.517µs
Any suggestions on where I should start troubleshooting?
Since version 1.12 they have changed the routing heavily.
Try enabling legacy routing.
In helm_values.yaml (if you are using Helm to deploy) you should add:
bpf:
  hostLegacyRouting: true
This setting configures whether direct routing mode should route traffic via the host stack (true) or directly and more efficiently out of BPF (false) if the kernel supports it. The latter has the implication that it will also bypass netfilter in the host namespace.
You can read more about BPF in the official docs. Pay attention to the compatibility of the node OS with BPF.
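A minimal sketch of applying this with Helm, assuming the release is named cilium in the kube-system namespace and the standard cilium/cilium chart is in use (adjust the names to your install):
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set bpf.hostLegacyRouting=true
# restart the agents so the new datapath configuration takes effect
kubectl -n kube-system rollout restart ds/cilium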
I'm trying to create a Redis cluster to use with Node.js (ioredis/cluster), but it doesn't seem to work.
It's v1.11.8-gke.6 on GKE.
I'm doing exactly what the redis-ha docs say:
~ helm install --set replicas=3 --name redis-test stable/redis-ha
NAME: redis-test
LAST DEPLOYED: Fri Apr 26 00:13:31 2019
NAMESPACE: yt
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
redis-test-redis-ha-configmap 3 0s
redis-test-redis-ha-probes 2 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
redis-test-redis-ha-server-0 0/2 Init:0/1 0 0s
==> v1/Role
NAME AGE
redis-test-redis-ha 0s
==> v1/RoleBinding
NAME AGE
redis-test-redis-ha 0s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
redis-test-redis-ha ClusterIP None <none> 6379/TCP,26379/TCP 0s
redis-test-redis-ha-announce-0 ClusterIP 10.7.244.34 <none> 6379/TCP,26379/TCP 0s
redis-test-redis-ha-announce-1 ClusterIP 10.7.251.35 <none> 6379/TCP,26379/TCP 0s
redis-test-redis-ha-announce-2 ClusterIP 10.7.252.94 <none> 6379/TCP,26379/TCP 0s
==> v1/ServiceAccount
NAME SECRETS AGE
redis-test-redis-ha 1 0s
==> v1/StatefulSet
NAME READY AGE
redis-test-redis-ha-server 0/3 0s
NOTES:
Redis can be accessed via port 6379 and Sentinel can be accessed via port 26379 on the following DNS name from within your cluster:
redis-test-redis-ha.yt.svc.cluster.local
To connect to your Redis server:
1. Run a Redis pod that you can use as a client:
kubectl exec -it redis-test-redis-ha-server-0 sh -n yt
2. Connect using the Redis CLI:
redis-cli -h redis-test-redis-ha.yt.svc.cluster.local
~ k get pods | grep redis-test
redis-test-redis-ha-server-0 2/2 Running 0 1m
redis-test-redis-ha-server-1 2/2 Running 0 1m
redis-test-redis-ha-server-2 2/2 Running 0 54s
~ kubectl exec -it redis-test-redis-ha-server-0 sh -n yt
Defaulting container name to redis.
Use 'kubectl describe pod/redis-test-redis-ha-server-0 -n yt' to see all of the containers in this pod.
/data $ redis-cli -h redis-test-redis-ha.yt.svc.cluster.local
redis-test-redis-ha.yt.svc.cluster.local:6379> set test key
(error) READONLY You can't write against a read only replica.
But in the end only one random pod that I connect to is writable. I checked the logs on a few containers and everything seems to be fine there. I tried to run cluster info in redis-cli, but I get ERR This instance has cluster support disabled everywhere.
Logs:
~ k logs pod/redis-test-redis-ha-server-0 redis
1:C 25 Apr 2019 20:13:43.604 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Apr 2019 20:13:43.604 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Apr 2019 20:13:43.604 # Configuration loaded
1:M 25 Apr 2019 20:13:43.606 * Running mode=standalone, port=6379.
1:M 25 Apr 2019 20:13:43.606 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 25 Apr 2019 20:13:43.606 # Server initialized
1:M 25 Apr 2019 20:13:43.606 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 25 Apr 2019 20:13:43.627 * DB loaded from disk: 0.021 seconds
1:M 25 Apr 2019 20:13:43.627 * Ready to accept connections
1:M 25 Apr 2019 20:14:11.801 * Replica 10.7.251.35:6379 asks for synchronization
1:M 25 Apr 2019 20:14:11.801 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c2827ffe011d774db005a44165bac67a7e7f7d85', my replication IDs are '8311a1ca896e97d5487c07f2adfd7d4ef924f36b' and '0000000000000000000000000000000000000000')
1:M 25 Apr 2019 20:14:11.802 * Delay next BGSAVE for diskless SYNC
1:M 25 Apr 2019 20:14:17.825 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 25 Apr 2019 20:14:17.825 * Background RDB transfer started by pid 55
55:C 25 Apr 2019 20:14:17.826 * RDB: 0 MB of memory used by copy-on-write
1:M 25 Apr 2019 20:14:17.926 * Background RDB transfer terminated with success
1:M 25 Apr 2019 20:14:17.926 # Slave 10.7.251.35:6379 correctly received the streamed RDB file.
1:M 25 Apr 2019 20:14:17.926 * Streamed RDB transfer with replica 10.7.251.35:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 25 Apr 2019 20:14:18.828 * Synchronization with replica 10.7.251.35:6379 succeeded
1:M 25 Apr 2019 20:14:42.711 * Replica 10.7.252.94:6379 asks for synchronization
1:M 25 Apr 2019 20:14:42.711 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c2827ffe011d774db005a44165bac67a7e7f7d85', my replication IDs are 'af453adde824b2280ba66adb40cc765bf390e237' and '0000000000000000000000000000000000000000')
1:M 25 Apr 2019 20:14:42.711 * Delay next BGSAVE for diskless SYNC
1:M 25 Apr 2019 20:14:48.976 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 25 Apr 2019 20:14:48.977 * Background RDB transfer started by pid 125
125:C 25 Apr 2019 20:14:48.978 * RDB: 0 MB of memory used by copy-on-write
1:M 25 Apr 2019 20:14:49.077 * Background RDB transfer terminated with success
1:M 25 Apr 2019 20:14:49.077 # Slave 10.7.252.94:6379 correctly received the streamed RDB file.
1:M 25 Apr 2019 20:14:49.077 * Streamed RDB transfer with replica 10.7.252.94:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 25 Apr 2019 20:14:49.761 * Synchronization with replica 10.7.252.94:6379 succeeded
~ k logs pod/redis-test-redis-ha-server-1 redis
1:C 25 Apr 2019 20:14:11.780 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Apr 2019 20:14:11.781 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Apr 2019 20:14:11.781 # Configuration loaded
1:S 25 Apr 2019 20:14:11.786 * Running mode=standalone, port=6379.
1:S 25 Apr 2019 20:14:11.791 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:S 25 Apr 2019 20:14:11.791 # Server initialized
1:S 25 Apr 2019 20:14:11.791 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:S 25 Apr 2019 20:14:11.792 * DB loaded from disk: 0.001 seconds
1:S 25 Apr 2019 20:14:11.792 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 25 Apr 2019 20:14:11.792 * Ready to accept connections
1:S 25 Apr 2019 20:14:11.792 * Connecting to MASTER 10.7.244.34:6379
1:S 25 Apr 2019 20:14:11.792 * MASTER <-> REPLICA sync started
1:S 25 Apr 2019 20:14:11.792 * Non blocking connect for SYNC fired the event.
1:S 25 Apr 2019 20:14:11.793 * Master replied to PING, replication can continue...
1:S 25 Apr 2019 20:14:11.799 * Trying a partial resynchronization (request c2827ffe011d774db005a44165bac67a7e7f7d85:6006176).
1:S 25 Apr 2019 20:14:17.824 * Full resync from master: af453adde824b2280ba66adb40cc765bf390e237:722
1:S 25 Apr 2019 20:14:17.824 * Discarding previously cached master state.
1:S 25 Apr 2019 20:14:17.852 * MASTER <-> REPLICA sync: receiving streamed RDB from master
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Flushing old data
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Finished with success
What am I missing or is there a better way to do clustering?
Not the best solution, but I figured I can just use Sentinel instead of finding another way (or maybe there is no other way). It is supported by most client libraries, so it shouldn't be very hard (except redis-cli; I can't figure out how to query the Sentinel server).
This is how I did it with ioredis (Node.js; sorry if you are not familiar with ES6 syntax):
import * as IORedis from 'ioredis';
import Redis from 'ioredis';

import { redisHost, redisPassword, redisPort } from './config';

export function getRedisConfig(): IORedis.RedisOptions {
  // I'm not sure how to set this properly.
  // ioredis/cluster automatically resolves all pods by hostname, but this doesn't,
  // so I have to list all pods explicitly (or resolve them all by hostname).
  return {
    sentinels: process.env.REDIS_CLUSTER.split(',').map(d => {
      const [host, port = 26379] = d.split(':');
      return { host, port: Number(port) };
    }),
    name: process.env.REDIS_MASTER_NAME || 'mymaster',
    ...(redisPassword ? { password: redisPassword } : {}),
  };
}

export async function initializeRedis() {
  if (process.env.REDIS_CLUSTER) {
    const cluster = new Redis(getRedisConfig());
    return cluster;
  }

  // For the dev environment
  const client = new Redis(redisPort, redisHost);
  if (redisPassword) {
    await client.auth(redisPassword);
  }
  return client;
}
In env:
env:
  - name: REDIS_CLUSTER
    value: redis-redis-ha-server-1.redis-redis-ha.yt.svc.cluster.local:26379,redis-redis-ha-server-0.redis-redis-ha.yt.svc.cluster.local:26379,redis-redis-ha-server-2.redis-redis-ha.yt.svc.cluster.local:26379
You may want to protect it with a password.
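As a side note on the redis-cli part: Sentinel itself speaks the Redis protocol on port 26379, so it can be queried with redis-cli too. A minimal sketch, assuming the service name from the chart's NOTES above and the default master name mymaster:
redis-cli -h redis-test-redis-ha.yt.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
redis-cli -h redis-test-redis-ha.yt.svc.cluster.local -p 26379 sentinel masters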
Any pointers on this issue? I've already tried tons of things, to no avail.
This command fails with the error Can't read superblock:
sudo mount -t ceph worker2:6789:/ /mnt/mycephfs -o name=admin,secret=AQAYjCpcAAAAABAAxs1mrh6nnx+0+1VUqW2p9A==
Some more info that may be helpful:
uname -a
Linux cephfs-test-admin-1 4.14.84-coreos #1 SMP Sat Dec 15 22:39:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Ceph status and ceph osd status all show no issues at all
dmesg | tail
[228343.304863] libceph: resolve 'worker2' (ret=0): 10.1.96.4:0
[228343.322279] libceph: mon0 10.1.96.4:6789 session established
[228343.323622] libceph: client107238 fsid 762e6263-a95c-40da-9813-9df4fef12f53
ceph -s
cluster:
id: 762e6263-a95c-40da-9813-9df4fef12f53
health: HEALTH_WARN
too few PGs per OSD (16 < min 30)
services:
mon: 3 daemons, quorum worker2,worker0,worker1
mgr: worker1(active)
mds: cephfs-1/1/1 up {0=mds-ceph-mds-85b4fbb478-c6jzv=up:active}
osd: 3 osds: 3 up, 3 in
data:
pools: 2 pools, 16 pgs
objects: 21 objects, 2246 bytes
usage: 342 MB used, 76417 MB / 76759 MB avail
pgs: 16 active+clean
ceph osd status
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | worker2 | 114M | 24.8G | 0 | 0 | 0 | 0 | exists,up |
| 1 | worker0 | 114M | 24.8G | 0 | 0 | 0 | 0 | exists,up |
| 2 | worker1 | 114M | 24.8G | 0 | 0 | 0 | 0 | exists,up |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
ceph -v
ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable)
Some of the syslog output:
Jan 04 21:24:04 worker2 kernel: libceph: resolve 'worker2' (ret=0): 10.1.96.4:0
Jan 04 21:24:04 worker2 kernel: libceph: mon0 10.1.96.4:6789 session established
Jan 04 21:24:04 worker2 kernel: libceph: client159594 fsid 762e6263-a95c-40da-9813-9df4fef12f53
Jan 04 21:24:10 worker2 systemd[1]: Started OpenSSH per-connection server daemon (58.242.83.28:36729).
Jan 04 21:24:11 worker2 sshd[12315]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.242.83.28 us>
Jan 04 21:24:14 worker2 sshd[12315]: Failed password for root from 58.242.83.28 port 36729 ssh2
Jan 04 21:24:15 worker2 sshd[12315]: Failed password for root from 58.242.83.28 port 36729 ssh2
Jan 04 21:24:18 worker2 sshd[12315]: Failed password for root from 58.242.83.28 port 36729 ssh2
Jan 04 21:24:18 worker2 sshd[12315]: Received disconnect from 58.242.83.28 port 36729:11: [preauth]
Jan 04 21:24:18 worker2 sshd[12315]: Disconnected from authenticating user root 58.242.83.28 port 36729 [preauth]
Jan 04 21:24:18 worker2 sshd[12315]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.242.83.28 user=root
Jan 04 21:24:56 worker2 systemd[1]: Started OpenSSH per-connection server daemon (24.114.79.151:58123).
Jan 04 21:24:56 worker2 sshd[12501]: Accepted publickey for core from 24.114.79.151 port 58123 ssh2: RSA SHA256:t4t9yXeR2yC7s9c37mdS/F7koUs2x>
Jan 04 21:24:56 worker2 sshd[12501]: pam_unix(sshd:session): session opened for user core by (uid=0)
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
So after digging, the problem turned out to be XFS partitioning issues.
I don't know how I missed it at first.
In short:
Creating an XFS partition was failing.
That is, running mkfs.xfs /dev/vdb1 would simply hang. The OS would still create and mark the partitions, but they'd be corrupt - a fact you only discover when trying to mount and getting that Can't read superblock error.
So ceph does this:
1. Run the deploy
2. Create the XFS partitions (mkfs.xfs ...)
3. The OS creates those faulty partitions
4. Since you can still read the status of the OSDs just fine, all status reports and logs show no problems (mkfs.xfs did not report errors, it just hung)
5. When you try to mount CephFS or use block storage, the whole thing bombs because of the corrupt partitions.
The root cause is still unknown, but I suspect something was not done right at the SSD disk level when provisioning/attaching the disks from my cloud provider. It now works fine.
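A quick way to sanity-check an XFS partition before handing it to Ceph (a sketch; /dev/vdb1 is the device from the example above and the timeout value is arbitrary):
# mkfs.xfs normally finishes in seconds on an empty partition; a hang is the red flag described above
sudo timeout 120 mkfs.xfs -f /dev/vdb1 || echo "mkfs.xfs hung or failed"
# confirm the filesystem signature and check the superblock read-only before mounting
sudo blkid /dev/vdb1
sudo xfs_repair -n /dev/vdb1   # -n: check only, no modifications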
I'm using a Kubernetes cluster, deployed on Azure using the ACS-engine. My cluster is composed of 5 nodes:
1 master (unix VM) (v1.6.2)
2 unix agents (v1.6.2)
2 windows agents (v1.6.0-alpha.1.2959+451473d43a2072)
I have created a unix pod, described as follows:
Name: ping-with-unix
Node: k8s-linuxpool1-25103419-0/10.240.0.5
Start Time: Fri, 30 Jun 2017 14:27:28 +0200
Status: Running
IP: 10.244.2.6
Controllers: <none>
Containers:
ping-with-unix-2:
Container ID:
Image: willfarrell/ping
Port:
State: Running
Started: Fri, 30 Jun 2017 14:27:29 +0200
Ready: True
Restart Count: 0
Environment:
HOSTNAME: google.com
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1nrh5 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-1nrh5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-1nrh5
Optional: false
QoS Class: BestEffort
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: <none>
Events: <none>
This pod does not have internet access.
2017-06-30T12:27:29.512885000Z ping google.com every 300 sec
2017-06-30T12:27:29.521968000Z PING google.com (172.217.17.78): 56 data bytes
2017-06-30T12:27:39.698081000Z --- google.com ping statistics ---
2017-06-30T12:27:39.698305000Z 1 packets transmitted, 0 packets received, 100% packet loss
I created a 2nd pod, targeting a Windows container, with a custom Docker image. The image instantiates an HttpClient and requests an endpoint. It also does not have internet access. I can get into the container and run an interactive PowerShell session, but I cannot reach any DNS server (due to the lack of internet access).
PS C:\app> ping github.com
Ping request could not find host github.com. Please check the name and try again.
PS C:\app> ping 192.30.253.112
Pinging 192.30.253.112 with 32 bytes of data:
Request timed out.
Request timed out.
Ping statistics for 192.30.253.112:
Packets: Sent = 2, Received = 0, Lost = 2 (100% loss),
Control-C
What do I have to configure to allow my containers to access the internet?
Remark: I have not defined any network policy.
I've updated my cluster using API version '2017-07-01' and Kubernetes version '1.6.6'. Both my unix and windows pods now have internet access.
Notes for Windows pods:
Internet is available 2 or 3 minutes after the pod starts.
I can't set the dnsPolicy to "Default"; only "ClusterFirst" or "ClusterFirstWithHostNet" works (see the sketch below).
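For illustration, a minimal pod spec along those lines; the image name is a placeholder for the custom HttpClient image, and the node selector assumes the beta.kubernetes.io/os label used by this cluster (it appears as =linux in the describe output above):
apiVersion: v1
kind: Pod
metadata:
  name: ping-with-windows
spec:
  nodeSelector:
    beta.kubernetes.io/os: windows
  dnsPolicy: ClusterFirst        # "Default" did not work here; ClusterFirstWithHostNet also worked
  containers:
    - name: httpclient-test
      image: myregistry/httpclient-test   # placeholder for the custom image
      env:
        - name: HOSTNAME
          value: github.com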
I have a Kubernetes cluster with one master and four nodes. kube-proxy was working fine on all four nodes, and I could access services through any of the nodes irrespective of where they were running; i.e. http://node1:30000 through http://node4:30000 gave the same response.
After restarting node4 by running shutdown -r now, it came back up, but I noticed that the node was no longer responding to requests. I am running the following command:
curl http://node4:30000
If I run it from my PC, or from any other node in the cluster -- node1 through node3, or master -- I get:
curl: (7) Failed to connect to node4 port 30000: Connection timed out
However, if I run it from node4, it responds successfully. This leads me to believe that kube-proxy is running fine, but something is preventing external connections.
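A couple of quick checks that help narrow this down (a sketch; kube-proxy in iptables mode programs NodePorts into the KUBE-NODEPORTS chain of the nat table):
# on node4: is forwarded traffic being filtered?
sudo iptables -S FORWARD | head -n 3
# on node4: did kube-proxy program the NodePort after the reboot?
sudo iptables -t nat -S KUBE-NODEPORTS | grep 30000
# from another node: does the port answer at all?
nc -zv node4 30000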
When I run kubectl describe node node4, my output looks normal:
Name: node4
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=node4
Taints: <none>
CreationTimestamp: Tue, 21 Feb 2017 15:21:17 -0400
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:28 -0400 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses: 10.6.81.64,10.6.81.64,node4
Capacity:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 2
memory: 4028748Ki
pods: 110
Allocatable:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 2
memory: 4028748Ki
pods: 110
System Info:
Machine ID: dbc0bb6ba10acae66b1061f958220ade
System UUID: 4229186F-AA5C-59CE-E5A2-258C1BBE9D2C
Boot ID: a3968e6c-eba3-498c-957f-f29283af1cff
Kernel Version: 4.4.0-63-generic
OS Image: Ubuntu 16.04.1 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.13.0
Kubelet Version: v1.5.2
Kube-Proxy Version: v1.5.2
ExternalID: node4
Non-terminated Pods: (27 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
<< application pods listed here >>
kube-system kube-proxy-0p3lj 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system weave-net-uqmr1 20m (1%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
20m (1%) 0 (0%) 0 (0%) 0 (0%)
Is there anything specific I need to do to bring a node back online after a system restart?
My team was able to solve this one by downgrading docker to 1.12. It appears that the problem is related to this issue:
https://github.com/kubernetes/kubernetes/issues/40182
After downgrading docker to 1.12, everything is working now.
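If downgrading isn't practical: the linked issue is about Docker 1.13 changing the default iptables FORWARD policy to DROP, which silently blocks forwarded pod/NodePort traffic coming in from other hosts. Assuming that is indeed the cause here, a commonly referenced interim workaround is to re-allow forwarding on the affected node:
# the first line of the output shows the chain policy (Docker 1.13+ sets it to DROP)
sudo iptables -S FORWARD | head -n 1
# re-allow forwarded traffic; persist via your distro's firewall mechanism if this helps
sudo iptables -P FORWARD ACCEPT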