I can connect to my DigitalOcean Ubuntu 20.04 LTS VM instance, which has PostgreSQL 14 installed, without issue, but I'm trying to make it more secure so that only specific IPs can connect to the database.
I heard that the way to do this is to modify the /etc/postgresql/14/main/postgresql.conf file.
When I have this line, I can connect to my database without issue.
listen_addresses='0.0.0.0'
However, if I change this line to:
listen_addresses='123.123.123.123'
I get this DataGrip error message: [08001] Connection to 111.111.111.111:12345 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
111.111.111.111:12345 is my (fake) VM's IP and port that I already set up.
123.123.123.123 is my (fake) computer's external IP, as reported by online IP-lookup services.
Any suggestions? Is there a log I can check that would give me a better understanding of what is going on?
Also of note: with listen_addresses='0.0.0.0', running ss -ptl gives this output:
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 127.0.0.53%lo:domain 0.0.0.0:*
LISTEN 0 128 0.0.0.0:ssh 0.0.0.0:*
LISTEN 0 244 0.0.0.0:12345 0.0.0.0:*
LISTEN 0 128 [::]:ssh [::]:*
With listen_addresses='123.123.123.123', running ss -ptl gives this output:
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 127.0.0.53%lo:domain 0.0.0.0:*
LISTEN 0 128 0.0.0.0:ssh 0.0.0.0:*
LISTEN 0 128 [::]:ssh [::]:*
Documentation that I used so far:
https://www.postgresql.org/docs/current/runtime-config-connection.html
https://www.postgresql.org/docs/current/auth-pg-hba-conf.html
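For context on how the two files divide the work, here is a minimal sketch (the addresses and auth method are placeholders, and this is not offered as the fix for the error above): listen_addresses in postgresql.conf names the local interface addresses the server binds to, while which clients may connect is normally restricted in pg_hba.conf.

# /etc/postgresql/14/main/postgresql.conf
# Addresses of the VM's own interfaces to bind to ('*' or '0.0.0.0' for all);
# client IPs do not belong here.
listen_addresses = '0.0.0.0'

# /etc/postgresql/14/main/pg_hba.conf
# Allow only 123.123.123.123 to connect over TCP, to any database as any user.
# TYPE  DATABASE  USER  ADDRESS              METHOD
host    all       all   123.123.123.123/32   scram-sha-256

Changes to listen_addresses take effect only after a restart of the postgresql service, while pg_hba.conf changes are picked up on a reload.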
I have a K8S pod. Inside the pod, I do a DNS lookup using nslookup and it works fine. But when I run tcpdump on the pod interface (eth0), it clearly shows that the received DNS responses have a bad UDP checksum. I checked the UDP counters with netstat, but I don't see the checksum error counter (InCsumErrors) being hit at all. Here are some relevant outputs.
IP config of pod:
root@node:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if10936: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether e2:22:5c:6c:53:bd brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.233.85.177/32 scope global eth0
valid_lft forever preferred_lft forever
Successful nslookup:
bash-4.4# nslookup google.com
Server: 169.254.25.10
Address: 169.254.25.10#53
Non-authoritative answer:
Name: google.com
Address: 216.58.207.238
Name: google.com
Address: 2a00:1450:400e:809::200e
tcpdump showing the bad UDP checksums for the above nslookup run:
root@node:~# tcpdump -ni eth0 -vvv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:02:24.267999 IP (tos 0x0, ttl 64, id 50356, offset 0, flags [none], proto UDP (17), length 82)
10.233.85.177.52764 > 169.254.25.10.53: [bad udp cksum 0x23f2 -> 0xd1bd!] 43806+ A? google.com.qaammuk.svc.cluster.local. (54)
16:02:24.269489 IP (tos 0x0, ttl 64, id 56987, offset 0, flags [DF], proto UDP (17), length 175)
169.254.25.10.53 > 10.233.85.177.52764: [bad udp cksum 0x244f -> 0x2c2a!] 43806 NXDomain*- q: A? google.com.qaammuk.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (147)
16:02:24.269847 IP (tos 0x0, ttl 64, id 50357, offset 0, flags [none], proto UDP (17), length 74)
10.233.85.177.39433 > 169.254.25.10.53: [bad udp cksum 0x23ea -> 0xac65!] 45029+ A? google.com.svc.cluster.local. (46)
16:02:24.270901 IP (tos 0x0, ttl 64, id 56988, offset 0, flags [DF], proto UDP (17), length 167)
169.254.25.10.53 > 10.233.85.177.39433: [bad udp cksum 0x2447 -> 0x06d2!] 45029 NXDomain*- q: A? google.com.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (139)
16:02:24.271206 IP (tos 0x0, ttl 64, id 50358, offset 0, flags [none], proto UDP (17), length 70)
10.233.85.177.59330 > 169.254.25.10.53: [bad udp cksum 0x23e6 -> 0xdaca!] 2633+ A? google.com.cluster.local. (42)
16:02:24.272262 IP (tos 0x0, ttl 64, id 56989, offset 0, flags [DF], proto UDP (17), length 163)
169.254.25.10.53 > 10.233.85.177.59330: [bad udp cksum 0x2443 -> 0x3537!] 2633 NXDomain*- q: A? google.com.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (135)
16:02:24.272527 IP (tos 0x0, ttl 64, id 50359, offset 0, flags [none], proto UDP (17), length 56)
10.233.85.177.53873 > 169.254.25.10.53: [bad udp cksum 0x23d8 -> 0x278c!] 52759+ A? google.com. (28)
16:02:24.272707 IP (tos 0x0, ttl 64, id 56990, offset 0, flags [DF], proto UDP (17), length 82)
169.254.25.10.53 > 10.233.85.177.53873: [bad udp cksum 0x23f2 -> 0xe468!] 52759* q: A? google.com. 1/0/0 google.com. [8s] A 216.58.211.110 (54)
16:02:24.272963 IP (tos 0x0, ttl 64, id 50360, offset 0, flags [none], proto UDP (17), length 56)
10.233.85.177.54691 > 169.254.25.10.53: [bad udp cksum 0x23d8 -> 0x370f!] 47943+ AAAA? google.com. (28)
16:02:24.273141 IP (tos 0x0, ttl 64, id 56991, offset 0, flags [DF], proto UDP (17), length 94)
169.254.25.10.53 > 10.233.85.177.54691: [bad udp cksum 0x23fe -> 0xf8e0!] 47943* q: AAAA? google.com. 1/0/0 google.com. [8s] AAAA 2a00:1450:400e:809::200e (66)
netstat output showing the UDP counters from the Linux stack; no InCsumErrors:
root@node:~# netstat -s -u
Udp:
18 packets received
0 packets to unknown port received
0 packet receive errors
18 packets sent
0 receive buffer errors
0 send buffer errors
UdpLite:
IpExt:
InOctets: 2130
OutOctets: 1101
InNoECTPkts: 18
I tried with checksum offload both enabled and disabled on eth0. Same behavior in both cases.
Shouldn't a bad UDP checksum detected by tcpdump mean that the kernel will at some point drop the UDP packets before handing them over to the socket bound by nslookup?
When you do nslookup google.com 8.8.8.8, everything looks fine. I think this is because you are using CoreDNS to resolve the domains, so the packets run through a Service.
A Service in k8s is a virtual entity. It appears as a forwarding rule in iptables. During the forwarding process, the source IP address is swapped out without recalculating the checksum, hence the error in tcpdump.
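If it helps to see those rules, the node's NAT table can be dumped and searched for the resolver address; which chains show up (if any) depends on how DNS is wired in the cluster (kube-proxy mode, node-local DNS cache, and so on):

root@node:~# iptables-save -t nat | grep 169.254.25.10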
According to RFC 768, the UDP checksum is defined as follows:
Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data
So as you can see, the IP header is also part of the checksum; the source IP that gets swapped out is included in it, and that is what invalidates the checksum.
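To make that concrete, here is a small, purely illustrative sketch of the RFC 768 arithmetic in Python (this is not how the kernel or the NIC implements it). Feeding the same UDP segment through it with a different source IP produces a different checksum, which is exactly what an un-fixed DNAT rewrite causes:

import struct

def ones_complement_sum(data: bytes) -> int:
    # Sum 16-bit big-endian words with end-around carry; pad to an even length first.
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total

def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    # RFC 768 pseudo-header: source IP, destination IP, zero byte, protocol 17, UDP length.
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    # The checksum field (bytes 6-7 of the UDP header) counts as zero while computing.
    segment = udp_segment[:6] + b"\x00\x00" + udp_segment[8:]
    return (~ones_complement_sum(pseudo + segment)) & 0xFFFF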
Calculating the checksum is usually offloaded to the NIC hardware when packets are sent from or received by a node. Recomputing the checksums of every packet going through iptables would take a lot of computation, and it would also be pointless: once a packet has been received on the node's network interface and confirmed valid, you can be sure it will stay valid within the node, even after iptables forwards it.
Does K8S set up some rules for the Linux kernel to ignore bad UDP checksums for pod interfaces?
I know that e.g. the loopback interface does not checksum packets (at least not by default). Maybe bridge interfaces (e.g. docker0 and veth*) don't checksum either. I tried to find strong evidence for this statement, but I found nothing to either prove or disprove it.
Try:
ethtool --offload eth0 rx off tx off
ethtool -K eth0 gso off
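Whether offload is actually in effect before and after those commands can be verified with ethtool's read-only listing (eth0 being whichever interface you are capturing on), e.g.:

root@node:~# ethtool -k eth0 | grep checksumming
rx-checksumming: on
tx-checksumming: on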
Filebeat is running on Machine B, reading logs and pushing them to the ELK Logstash on Machine A.
But the Filebeat log on Machine B shows an i/o timeout error:
2019-08-24T12:13:10.065+0800 ERROR pipeline/output.go:100 Failed to connect to backoff(async(tcp://example.com:5044)): dial tcp xx.xx.xx.xx:5044: i/o timeout
2019-08-24T12:13:10.065+0800 INFO pipeline/output.go:93 Attempting to reconnect to backoff(async(tcp://example.com:5044)) with 1 reconnect attempt(s)
I've checked Logstash on Machine A; it is running well and listening on 0.0.0.0:5044.
Here is the Logstash log:
[INFO ] 2019-08-24 12:09:35.217 [[main]-pipeline-manager] beats - Beats inputs: Starting input listener {:address=>"0.0.0.0:5044"}
And here is the netstat output:
$ sudo netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:5044 0.0.0.0:* LISTEN 20668/java
I also checked that the firewall on Machine A is off.
$ firewall-cmd --list-all
FirewallD is not running
$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy DROP)
target prot opt source destination
I also used telnet to connect to Machine A, but I get this:
$ telnet example.com 5044
Trying xx.xx.xx.xx...
telnet: connect to address xx.xx.xx.xx: Connection timed out
I ran Filebeat with the same config on Machine A (local) to check whether the Filebeat config on Machine B (remote) was wrong; it works fine there.
2019-08-24T14:17:35.195+0800 INFO pipeline/output.go:95 Connecting to backoff(async(tcp://localhost:5044))
2019-08-24T14:17:35.198+0800 INFO pipeline/output.go:105 Connection to backoff(async(tcp://localhost:5044)) established
In the end, I found it was caused by the VPS provider (Aliyun), which only opens a few common ports such as 22, 80, and 443 by default.
I needed to log in to the Aliyun VPS management page and open port 5044 so that the provider lets traffic through on it.
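Whether the provider-side rule is in place can be sanity-checked from Machine B with netcat (host and port as in the Filebeat output config); a succeeded/open message means the path is clear, while a timeout points back at a filter between the machines:

$ nc -vz example.com 5044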
Note: attached below are some other issues I encountered when configuring Filebeat with ELK.
Issue 1: Failed to connect to backoff(async(tcp://ip:5044)): dial tcp ip:5044: connect: connection refused
2019-08-26T10:25:41.955+0800 ERROR pipeline/output.go:100 Failed to connect to backoff(async(tcp://example.com:5044)): dial tcp xx.xx.xx.xx:5044: connect: connection refused
2019-08-26T10:25:41.955+0800 INFO pipeline/output.go:93 Attempting to reconnect to backoff(async(tcp://example:5044)) with 2 reconnect attempt(s)
Issue 2: Failed to publish events caused by: write tcp ip:46890->ip:5044: write: connection reset by peer
2019-08-26T10:28:32.274+0800 ERROR logstash/async.go:256 Failed to publish events caused by: write tcp xx.xx.xx.xx:46890->xx.xx.xx.xx:5044: write: connection reset by peer
2019-08-26T10:28:33.311+0800 ERROR pipeline/output.go:121 Failed to publish events: write tcp xx.xx.xx.xx:46890->xx.xx.xx.xx:5044: write: connection reset by peer
Issue 3: Filebeat error: lumberjack protocol error and Logstash error: OPENSSL_internal:WRONG_VERSION_NUMBER
Filebeat log error:
2019-08-26T08:49:09.505+0800 INFO pipeline/output.go:95 Connecting to backoff(async(tcp://example.com:5044))
2019-08-26T08:49:09.588+0800 INFO pipeline/output.go:105 Connection to backoff(async(tcp://example.com:5044)) established
2019-08-26T08:49:09.605+0800 ERROR logstash/async.go:256 Failed to publish events caused by: lumberjack protocol error
2019-08-26T08:49:09.606+0800 ERROR logstash/async.go:256 Failed to publish events caused by: client is not connected
Logstash log:
[INFO ] 2019-08-26 08:49:29.444 [defaultEventExecutorGroup-4-2] BeatsHandler - [local: 0.0.0.0:5044, remote: undefined] Handling exception: javax.net.ssl.SSLHandshakeException: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[WARN ] 2019-08-26 08:49:29.445 [nioEventLoopGroup-2-7] DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:472) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
...
All three issues were caused by misconfiguration; here is the working config.
Logstash version:
/usr/share/logstash/bin/logstash -V
logstash 7.3.1
Filebeat version:
/usr/share/filebeat/bin/filebeat version
filebeat version 7.3.1 (amd64), libbeat 7.3.1 [a4be71b90ce3e3b8213b616adfcd9e455513da45 built 2019-08-19 19:30:50 +0000 UTC]
Logstash conf file /etc/logstash/conf.d/beat.conf:
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate_authorities => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
    ssl_verify_mode => "peer"
  }
}
output {
  elasticsearch {
    hosts => "http://127.0.0.1:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
Filebeat conf file /etc/filebeat/filebeat.yml:
#=========================== Filebeat inputs =============================
filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /data/error_logs/Log_error_201908

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["example.com:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  ssl.certificate_authorities: ["/etc/pki/tls/certs/logstash-forwarder.crt"]

  # Certificate for SSL client authentication
  ssl.certificate: "/etc/pki/tls/certs/logstash-forwarder.crt"

  # Client Certificate Key
  ssl.key: "/etc/pki/tls/private/logstash-forwarder.key"
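For what it's worth, both sides can be sanity-checked from the command line before restarting the services: filebeat test config and filebeat test output validate the Filebeat side (including the TLS connection to Logstash), and Logstash can dry-run its pipeline config:

sudo filebeat test config -c /etc/filebeat/filebeat.yml
sudo filebeat test output -c /etc/filebeat/filebeat.yml
sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/beat.conf --config.test_and_exit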
I was able to enable IPv6 on MongoDB.
The /etc/mongod.conf file has net.ipv6 set to true.
I can see that MongoDB is listening on IPv6:
# netstat -anp | grep 27017
tcp 0 0 0.0.0.0:27017 0.0.0.0:* LISTEN 17967/mongod
tcp6 0 0 :::27017 :::* LISTEN 17967/mongod
unix 2 [ ACC ] STREAM LISTENING 19750206 17967/mongod /tmp/mongodb-27017.sock
#
ping6 to the IPv6 address is fine.
[root@tesla05 log]# ping6 -I eno33554952 tesla05-2-ipv6.ulticom.com
PING tesla05-2-ipv6.ulticom.com(tesla05) from fe80::250:56ff:feb4:7c43 eno33554952: 56 data bytes
64 bytes from tesla05: icmp_seq=1 ttl=64 time=0.101 ms
64 bytes from tesla05: icmp_seq=2 ttl=64 time=0.093 ms
64 bytes from tesla05: icmp_seq=3 ttl=64 time=0.091 ms
However, the mongo shell doesn't seem to understand the IPv6 address.
[root@tesla05 log]# mongo --ipv6 [fe80::250:56ff:feb4:7c43]:27017/admin
MongoDB shell version: 3.2.4
connecting to: [fe80::250:56ff:feb4:7c43]:27017/admin
2016-10-25T12:04:50.401-0400 W NETWORK [thread1] Failed to connect to fe80::250:56ff:feb4:7c43:27017, reason: errno:22 Invalid argument
2016-10-25T12:04:50.402-0400 E QUERY [thread1] Error: couldn't connect to server [fe80::250:56ff:feb4:7c43]:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:226:14
@(connect):1:6
exception: connect failed
[root@tesla05 log]# mongo --ipv6 tesla05-2-ipv6.ulticom.com:27017/admin
MongoDB shell version: 3.2.4
connecting to: tesla05-2-ipv6.ulticom.com:27017/admin
2016-10-25T12:15:17.861-0400 W NETWORK [thread1] Failed to connect to fe80::250:56ff:feb4:7c43:27017, reason: errno:22 Invalid argument
2016-10-25T12:15:17.861-0400 E QUERY [thread1] Error: couldn't connect to server tesla05-2-ipv6.ulticom.com:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:226:14
@(connect):1:6
exception: connect failed
You are trying to use a link-local IPv6 address. These are not valid without a scope, but you haven't provided one. Thus you get the error Invalid argument. For this reason, putting a link-local address in the DNS makes no sense, because the address is only valid on a particular LAN, and the scope may be different for every host on that LAN.
To use the address, append the scope to it, e.g. fe80::250:56ff:feb4:7c43%eno33554952
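Applied to the command from the question, that would look roughly like this (whether the shell parses the zone ID in this position may depend on the MongoDB version; a globally routable IPv6 address sidesteps the issue entirely):

mongo --ipv6 "[fe80::250:56ff:feb4:7c43%eno33554952]:27017/admin"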
I have set up SSH connection sharing on my local machine fine, but when I try to do this on our CI server it fails and I can't work out why.
The ~/.ssh/config is:
StrictHostKeyChecking=no
Host *
ControlMaster auto
ControlPath ~/.ssh/control:%h:%p:%r
ControlPersist 2h
The first connection fails but creates the socket; the second connection fails because the socket is stale.
The end of the verbose output from the first connection is:
$ ssh -vvvv -N user@domain.co.uk
....
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1
debug1: match: OpenSSH_6.6.1 pat OpenSSH_6.6.1* compat 0x04000000
....
Authenticated to domain.co.uk ([88.47.112.93]:22).
debug1: setting up multiplex master socket
debug3: muxserver_listen: temporary control path /home/rof/.ssh/control:domain.co.uk:22:user.3HfyjbhRCDHGwnrI
debug2: fd 4 setting O_NONBLOCK
debug3: fd 4 is O_NONBLOCK
debug3: fd 4 is O_NONBLOCK
debug1: channel 0: new [/home/rof/.ssh/control:domain.co.uk:22:user]
debug3: muxserver_listen: mux listener channel 0 fd 4
debug2: fd 3 setting TCP_NODELAY
debug3: packet_set_tos: set IP_TOS 0x08
debug1: control_persist_detach: backgrounding master process
debug2: control_persist_detach: background process is 84004
Control socket connect(/home/rof/.ssh/control:domain.co.uk:22:user): Connection refused
Failed to connect to new control master
debug1: forking to background
debug1: Entering interactive session.
debug2: set_control_persist_exit_time: schedule exit in 7200 seconds
If you run it without the -N option, the command input just hangs.
Any subsequent SSH connections report that the socket is stale and unlink it, so they do not use a shared connection.
Any ideas?
For anyone else with this issue: this was because the CI server we use runs overlayfs as its filesystem, which doesn't play nicely with Unix sockets.
To fix this, I saved the socket on a shared-memory (tmpfs) mount instead:
ControlPath /var/shm/control:%h:%p:%r
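With the ControlPath pointed there, the multiplexing master can be verified (and cleanly shut down) via ssh's control commands, for example:

$ ssh -O check user@domain.co.uk
Master running (pid=84004)
$ ssh -O exit user@domain.co.uk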