How to handle an IP address change in a Couchbase cluster? (Kubernetes)

I have a Couchbase cluster on Kubernetes (k8s) with Operator 1.2, and today I keep seeing the following error:
IP address seems to have changed. Unable to listen on 'ns_1@couchbase-cluster-couchbase-cluster-0001.couchbase-cluster-couchbase-cluster.default.svc'. (POSIX error code: 'nxdomain') (repeated 3 times)

The “IP address change” message is an alert message generated by Couchbase Server. The server checks for this situation as follows: it tries to listen on a free port on the interface that is the node’s address.
It does this every 3 seconds. If the hostname of the node can’t be resolved, you get an nxdomain error, which is the most common reason users see this alert message.
However, the alert would also fire if the user stopped the server, renamed the host and restarted: a much more serious configuration error that we would want to alert the user to right away. Because this check runs every three seconds, any flakiness in your DNS will likely make this alert appear every now and then.
As long as the DNS glitch doesn’t persist for long (a few seconds) there probably won’t be any adverse issues. However, it is an indication that you may want to take a look at your DNS to make sure it’s reliable enough to run a distributed system such as Couchbase Server against. In the worst case, DNS that is unavailable for a significant length of time could result in lack of availability or auto failover.
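To make the check described above concrete, here is a minimal Python sketch of the same idea. It is not Couchbase Server's actual implementation; the functions `check_node_address` and `watch` are hypothetical. It resolves the node's hostname and tries to listen on an OS-chosen free port on that address. A resolution failure corresponds to the `nxdomain` case, and a bind failure to the "host was renamed / address moved" case.

```python
import socket
import time
from typing import Callable, Optional


# Hypothetical helper: mimics the kind of check described above,
# not Couchbase Server's real code.
def check_node_address(
    hostname: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Return None if we can listen on the node's address, else an error message."""
    try:
        addr = resolve(hostname)  # DNS lookup; an unknown name raises gaierror
    except socket.gaierror as exc:
        return f"cannot resolve {hostname!r} (nxdomain-like): {exc}"
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((addr, 0))  # port 0 asks the OS for any free port
            sock.listen()
    except OSError as exc:  # e.g. the address is no longer assigned to this host
        return f"cannot listen on {addr}: {exc}"
    return None


def watch(hostname: str, interval: float = 3.0, rounds: int = 3) -> None:
    """Repeat the check every `interval` seconds and print any failure."""
    for _ in range(rounds):
        error = check_node_address(hostname)
        if error is not None:
            print("IP address seems to have changed:", error)
        time.sleep(interval)
```

Because each round is independent, a DNS hiccup lasting a second or two fails one round and passes the next, which is why brief flakiness shows up as an occasional alert rather than an outage.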
PS: Thanks to Dave Finlay, who actually answered this question for me.

If you receive the error "IP address seems to have changed. Unable to listen on 'ns_1@lxcobeccestg1.gcp.xxx.com'. (POSIX error code: 'nxdomain')", the hostname has changed, but the new hostname has not been written to the ip / ip_start file. To resolve this:
1. Go to /opt/couchbase/var/lib/couchbase/ and look for the ip or ip_start file (for example: vi ip_start).
2. Update the hostname in that file. In my case it still showed the old hostname lxcobeccestg1.gcp, so I changed it to lxcobeccestg2.gcp.
3. Start the server again:
sudo /etc/init.d/couchbase-server start
or
sudo systemctl restart couchbase-server
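If you need to do this on several nodes, the file edit can be scripted. The following is a minimal Python sketch, assuming the default Linux data directory from the steps above and that the server is stopped while the file is changed; `set_node_hostname` is a hypothetical helper, and writing the name followed by a newline (as `echo name > ip_start` would) is an assumption about the file format.

```python
import shutil
from pathlib import Path
from typing import List

# Default data directory on a Linux install, as used in the steps above.
COUCHBASE_VAR = Path("/opt/couchbase/var/lib/couchbase")


def set_node_hostname(new_name: str, var_dir: Path = COUCHBASE_VAR) -> List[Path]:
    """Write new_name into the `ip` and/or `ip_start` files, keeping a .bak copy.

    Assumption: couchbase-server is stopped while this runs and started afterwards.
    """
    changed = []
    for fname in ("ip", "ip_start"):
        path = var_dir / fname
        if not path.is_file():
            continue  # an install may have only one of the two files
        backup = path.with_name(fname + ".bak")
        shutil.copy2(path, backup)  # keep the old hostname for a rollback
        path.write_text(new_name + "\n")
        changed.append(path)
    return changed
```

For example, `set_node_hostname("lxcobeccestg2.gcp")` run as root would update whichever of the two files exists, after which you start the service as shown above.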

Related

How to Confirm PostgreSQL on Ubuntu VM is communicating with External Server for Updates

I have an Ubuntu VM installed on a client's VMware system. Recently, the client's IT informed us that his firewall has been detecting consistent potential port scans to our VM's internal IP address (coming from 87.238.57.227). He asked if this was part of a known package update process on our VM.
He sent us a firewall output where we can see several instances of the port scan, but there are also instances of our Ubuntu VM trying to communicate back to the external server on port 37258 (this is dropped by the firewall).
Based on a Google lookup, the hostname of the external IP address is "feris.postgresql.org", with the ASN pointing to a European company called Redpill-Linpro. As far as I can tell, they offer IT consulting services, specializing in open source software (like PostgreSQL, which is installed on our VM). I have never heard of them before, though, and have no idea why our VM would be communicating with them or vice versa. I'm also not sure if I'm interpreting the IP lookup information correctly: https://ipinfo.io/87.238.57.227
I'm looking for a way to confirm or disprove that this is just our VM pinging for a standard postgres update. If that's the case I'd like to restrict this behaviour. We would prefer to do these types of updates manually and limit the communication outside of the VM to what is strictly necessary for the functionality of our application.
Update
I sent an email to Redpill's abuse account. They responded quickly saying that the server should not be port scanning anyone and if it appears that way, something is wrong.
The server is part of a cluster of machines that serves apt.postgresql.org among other postgres download sites. I don't think we have anything like ansible or puppet installed that would automatically check for updates but I will look into that to make sure. I'm wondering if Ubuntu reaching out to update the MOTD with the number of available packages would explain why our VM is trying to reach out to the external postgres server?
The abuse rep said that, in any case, there should only be outgoing connections from the VM, not incoming ones. He asked for some additional info, so I will keep communicating with him and try to update this post accordingly.
My communication with the client's IT dropped off so I did not get a definitive answer on this, but I'll provide some new details:
I reached out to the abuse email for Redpill-Linpro. He got back to me and confirmed the server corresponding to the detected IP address is part of a cluster that hosts postgres download sites, including apt.postgresql.org. He was surprised to learn we had detected a port scan from their server and seems eager to figure out why that is happening.
He asked if the client's IT could pass along some information they need to set up tracking on that server, but the client's IT never got back to me. I think he was satisfied that it wasn't malicious and stopped pursuing it.
Here's one of the messages the abuse rep sent me that may be relevant:
That does look a lot like the TCP traffic to the apt download server, yes. It's strange that your firewall reports that many incoming connections, but they could be fallout from some connection tracking that's not operating as intended. The timing appears to match up more or less perfectly. And there should definitely not be any ping-back connections from it.
Since you appear to be using the HTTP version of the server (and not HTTPS), which brings the data in cleartext, they should be able to just dump the TCP connection contents and verify exactly what it does. But I bet they are going to see a number of HTTP requests initiated by the apt client that is checking for updates.
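One way to check the apt-client theory from the VM itself is to look for apt source entries that point at apt.postgresql.org: if one exists, any `apt-get update` (including scheduled runs such as Ubuntu's apt-daily timer) will contact that server. A minimal Python sketch, assuming the standard Ubuntu locations /etc/apt/sources.list and /etc/apt/sources.list.d/; the helper `apt_sources_mentioning` is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def apt_sources_mentioning(host: str, etc_apt: Path = Path("/etc/apt")) -> List[Tuple[Path, str]]:
    """Return (file, line) pairs for non-comment apt source lines that mention host."""
    files = [etc_apt / "sources.list"]
    sources_d = etc_apt / "sources.list.d"
    if sources_d.is_dir():
        # One-line format (*.list) and deb822 format (*.sources, e.g. "URIs: ...")
        files += sorted(sources_d.glob("*.list")) + sorted(sources_d.glob("*.sources"))
    hits = []
    for path in files:
        if not path.is_file():
            continue
        for raw in path.read_text(errors="replace").splitlines():
            line = raw.strip()
            if line and not line.startswith("#") and host in line:
                hits.append((path, line))
    return hits
```

Calling `apt_sources_mentioning("apt.postgresql.org")` on the VM lists the matching entries. The sketch does not interpret deb822 `Enabled: no` stanzas, so treat a hit in a *.sources file as "worth a look" rather than proof. Commenting out such an entry and running updates manually would match the goal stated in the question.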

AWS EC2 and Route 53 domain

I'm banging my head against the wall at the moment.
What am I doing wrong here?
Your help would be much appreciated!
I'm just getting started with AWS: I bought a domain with Route 53 and thought I could easily start using it.
I made an A record with the server IP [static IP].
This results in DNS_PROBE_FINISHED_NXDOMAIN: the domain can't be reached,
even after waiting for hours.
The next solution I found on the web was setting a CNAME record;
this doesn't seem to work either.
What am I doing wrong here, any suggestions?
Thank you for your input
I have been learning a lot about AWS and it's quite handy.
[update]
* I found the DNS name in the Elastic IP settings [public DNS]
Steps to do this:
1. Create an A record for the domain.
2. Point the A record at the EC2 instance's IP.
3. In the EC2 security group, open port 80 (and 443, if you use HTTPS) to all sources.
4. Check that the EC2 IP itself is reachable, e.g. over SSH (open the SSH port), or with ping if ICMP is allowed in the security group.
If you have done all this carefully, keep in mind that DNS changes can take some time to propagate.
To check whether the changes are reflected, you can test locally first.
On Ubuntu, open the /etc/hosts file and add a record for the domain:
sudo nano /etc/hosts
Add an entry like this:
xx.xxx.xxx.xxx www.xample.com
Save and close the file, then ping your domain and open it in a browser. If this works, revert the file changes and wait for Route 53 to reflect the A record change.
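While waiting, you can check what your machine currently resolves the domain to. This minimal Python sketch uses the local resolver, so it honours a temporary /etc/hosts entry exactly like ping and the browser do; after you revert the hosts file it reflects real DNS (subject to caching). The helpers `resolved_addresses` and `points_to` are hypothetical, and www.xample.com and the IPs in the usage are placeholders.

```python
import socket
from typing import Callable, List


def resolved_addresses(domain: str, lookup: Callable = socket.getaddrinfo) -> List[str]:
    """Return the distinct IPv4 addresses the local resolver returns for domain."""
    infos = lookup(domain, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def points_to(domain: str, expected_ip: str, lookup: Callable = socket.getaddrinfo) -> bool:
    """True if domain currently resolves to expected_ip (e.g. your EC2 Elastic IP)."""
    try:
        return expected_ip in resolved_addresses(domain, lookup)
    except socket.gaierror:  # name not found, as in DNS_PROBE_FINISHED_NXDOMAIN
        return False
```

Usage: `points_to("www.xample.com", "xx.xxx.xxx.xxx")` with your real domain and IP returns False for as long as the A record is not visible from your machine.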
I found the problem.
When you register a domain, Amazon sets its nameservers, and the nameservers shown on the registered-domain page differed from those in the Route 53 hosted zone. This is why I couldn't point the domain to my IP.
After making them the same, the domain pointed to my server.
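The mismatch described above is easy to miss by eye because nameserver names may differ only in case or a trailing dot. A minimal Python sketch that compares the two lists, assuming you copy them by hand from the registered domain's name server list and from the hosted zone's NS record in the Route 53 console; `nameserver_mismatch` is a hypothetical helper and the names in the usage are placeholders.

```python
from typing import Iterable, Set, Tuple


def _normalise(names: Iterable[str]) -> Set[str]:
    # DNS names are case-insensitive and may carry a trailing root dot.
    return {n.strip().rstrip(".").lower() for n in names if n.strip()}


def nameserver_mismatch(
    registrar_ns: Iterable[str], hosted_zone_ns: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (only_at_registrar, only_in_hosted_zone); both empty means they match."""
    reg, zone = _normalise(registrar_ns), _normalise(hosted_zone_ns)
    return reg - zone, zone - reg
```

If either returned set is non-empty, update the registered domain's name servers to the hosted zone's NS values, which is the fix described in the answer.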

Client is waiting forever for a remote server to return a web page

I have an application with a server written in F# that serves web files using Suave. I log in remotely with PowerShell to another machine on the network to run the application (the application itself is on one of the network drives). I do that because that machine has access to third-party APIs needed by the server. Now when I open [IPAddress_Of_Remote_Machine]/[html_file] or [name_of_pc]/[html_file], Chrome waits forever and never returns the web page. This wasn't happening before; I ran into this problem recently. I opened a different port and used it instead of the default port 80. This made things work, but the problem keeps coming back after a couple of days. I don't think it's a firewall issue, but I have no idea why this is happening.
When running netstat -an, this is what I get (I hid the IP address):
As you can see, all of the connections are in either CLOSE_WAIT or ESTABLISHED, but none are LISTENING. All of these TCP connections are probably there because I also have PhantomJS and two other APIs running in the application. However, the loopback address is also open on the same port, 5959:
I'm not sure what the difference between these two is, but when I use PortQryUI to query the remote server, it returns success!
I have already made an inbound rule for port 5959 on the server so it should be allowed. The web page is stuck at Waiting for [name_of_pc]. Also, sometimes this problem disappears and everything works fine.
What is the potential problem behind this? Why would this happen all of a sudden?
UPDATE:
I re-ran the application today and it's working correctly. Could something be set dynamically within the firewall? I'm not really sure what is going on. The machine I'm running the server on has a bunch of other applications running on it as well, so maybe an external process is affecting it?
I made a hello world app with Suave and deployed it on the network drive to test whether it would work. I opened an inbound rule for port 6001.
Then I ran the app:
However, it's still not working and this time it says the site cannot be reached when I do: http://[name_of_pc]:6001.
Moving this to an answer so that it can be closed:
Could you post the bindings section of your Suave config? I'm guessing you know where that is since you are using a non-standard port, but if you don't, search for HttpBinding. I suspect you will find it pointing to 127.0.0.1, which only accepts connections from the same machine and is not good enough for remote access. You could try changing it to 0.0.0.0 or to the server's actual IP address. I would try 0.0.0.0 first for the flexibility it provides.
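The difference between binding to 127.0.0.1 and 0.0.0.0 is not specific to Suave. Here is a minimal Python analogue (not F# or Suave code) of the hello world test on port 6001: a listener on a loopback address only accepts clients on the same machine, while 0.0.0.0 listens on every interface. The names `reachable_from_other_hosts`, `Hello` and `start` are hypothetical; in Suave the equivalent setting is the address in the config's HttpBinding entries, as the answer says.

```python
import ipaddress
import socketserver
import threading
from http.server import BaseHTTPRequestHandler


def reachable_from_other_hosts(bind_ip: str) -> bool:
    """A listener on a loopback address (127.x.x.x, ::1) only accepts local clients.

    0.0.0.0 (or a specific LAN address) is needed for remote browsers.
    """
    return not ipaddress.ip_address(bind_ip).is_loopback


class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello World!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def start(bind_ip: str = "0.0.0.0", port: int = 6001) -> socketserver.TCPServer:
    # Plain TCPServer: http.server.HTTPServer works too, but it also does a
    # reverse-DNS lookup of the bind address at startup.
    server = socketserver.TCPServer((bind_ip, port), Hello)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Started with `start("127.0.0.1")`, `http://[name_of_pc]:6001` from another machine fails even with the inbound firewall rule in place; with `start("0.0.0.0")`, netstat shows it LISTENING on 0.0.0.0:6001 and remote requests can reach it.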

What occurs when socket's IP changed, and how to deal with it?

I'm developing a client/server (C/S) program using Delphi 7 with the TServerSocket and TClientSocket controls. One problem is that I can only use my own PC as the server, and my PC connects through a virtual dialer, so my ISP changes my IP about once every one or two days.
Because I'm using a router, the ServerSocket is opened directly on my local IP (192.168.1.x) and just mapped to my public IP, so I assume the ServerSocket itself shouldn't crash when my public IP changes. What I expect is this: when my IP changes, all connected sockets become unusable, and if my application doesn't know that and keeps using a socket, the ServerSocket should receive some event such as OnClientError.
But I found a weird problem: when my IP changed, the server application shut down by itself. I don't know exactly what happened because it shut down in the afternoon while I was at my office, but I noticed something else: even though I use a heartbeat in my application-layer protocol, the server didn't catch the keep-alive failure. I record everything in a log file on the server and found no such entry. So the server must have shut down instantly when the IP changed, before the keep-alive logic even ran.
This seems very weird: how can a socket error (due to an IP change) directly cause the whole application to shut down? If anyone has an explanation, or knows how to deal with this problem, please let me know. Thanks!
Once the socket is opened, its bound IP address will never change. This cannot be "fixed" on the server side. I would recommend working on the server's stability. The clients should also detect that the server no longer exists at the given IP address and reconnect. (This is independent of why the server became unavailable: a restarting server is normal.)
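The client-side half of that advice is a reconnect loop. Below is a minimal Python sketch (not Delphi/TClientSocket code) with exponential backoff. It assumes the clients reach the server through a hostname that is kept up to date when the IP changes, for example a dynamic DNS name; that is an assumption, not part of the original answer. `connect_with_retry` is a hypothetical helper; `connect` and `sleep` are injectable only so the logic can be tested.

```python
import socket
import time
from typing import Callable, Optional


def connect_with_retry(
    host: str,
    port: int,
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    connect: Callable = socket.create_connection,
    sleep: Callable[[float], None] = time.sleep,
):
    """Connect to host:port, retrying with exponential backoff.

    socket.create_connection resolves `host` on every call, so if the name is
    kept up to date (e.g. by a dynamic DNS service), a retry after the
    server's IP changes reaches the new address.
    """
    last_error: Optional[OSError] = None
    delay = base_delay
    for attempt in range(attempts):
        try:
            return connect((host, port))
        except OSError as exc:  # refused, timed out, unreachable, DNS failure
            last_error = exc
            if attempt < attempts - 1:  # no pointless wait after the last try
                sleep(delay)
                delay = min(delay * 2, max_delay)
    raise ConnectionError(f"could not reach {host}:{port}") from last_error
```

A client would call this again whenever a read, write or heartbeat fails, instead of treating the failure as fatal; the backoff keeps many clients from hammering a server that is still restarting.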

Is it possible to see connection attempts to a Google Cloud SQL instance?

We are currently encountering the following error when trying to connect to a Cloud SQL instance: Lost connection to MySQL server at 'reading initial communication packet', system error: 0.
This is a familiar error and, as detailed here, usually means the IP address needs to be whitelisted. However, we believe we have done so.
Is there a way to see connection attempts and their IP addresses that have been made (and refused) to the Cloud SQL instance?
Currently we don't expose that information, but it is something we would like to fix. :-)
According to @Razvan, as of September 2014, this information isn't exposed.
We ended up using CIDR blocks to search the space and find the actual IP address. This is unsatisfying, obviously, but it's a way to pin down the problem.
If you want to sanity-check that the problem is your IP being refused, you can temporarily add 0.0.0.0/0 to accept all ranges and try to connect. If it works, you know what the problem is.
However, be absolutely sure to remove this accepted range after you are done!
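The "search the space with CIDR blocks" approach above can be done as a binary search: start from 0.0.0.0/0, keep the half that still connects, and repeat until you reach a single address, at most 32 rounds for IPv4. A minimal Python sketch, assuming exactly one IPv4 source address; `can_connect` is a hypothetical callback you would implement by setting the instance's authorized networks to just that block and attempting a connection.

```python
import ipaddress
from typing import Callable


def find_client_network(
    can_connect: Callable[[ipaddress.IPv4Network], bool],
    target_prefix: int = 32,
) -> ipaddress.IPv4Network:
    """Narrow 0.0.0.0/0 down to the block containing the connecting client's IP.

    can_connect(net) should authorize ONLY `net`, try to connect, and report success.
    """
    net = ipaddress.IPv4Network("0.0.0.0/0")
    if not can_connect(net):
        raise RuntimeError("refused even with 0.0.0.0/0: the allow-list is not the problem")
    while net.prefixlen < target_prefix:
        lower, upper = net.subnets(prefixlen_diff=1)  # split the block in half
        net = lower if can_connect(lower) else upper
    return net
```

Stopping early with `target_prefix=24` gives a /24 in 25 attempts. As the answer says, make sure 0.0.0.0/0 is removed from the accepted ranges when you are done.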
I figured this might help someone who stumbles upon this.
I had essentially the same issue trying to connect to a GCP SQL instance from a hosting provider.
I whitelisted the IP address shown in my cPanel, and it would not connect. (It used to, but the provider recently made some changes to their infrastructure and it stopped working.)
I put 0.0.0.0/0 in my Cloud Platform whitelist, and it connected without a problem.
So now I knew that my cPanel IP is not the IP trying to connect to GCP.
After some hair-pulling, I figured the bare-metal server might have a different IP than my cPanel IP. It did, but whitelisting it didn't work either.
Finally, I tried the IP addresses of the name servers that point to my domain and, bam, all is good.
If you are facing this issue, try your name servers' IPs (usually something like NS1.hostingprovider.com, etc.). I put both the NS1 and NS2 IPs in the whitelist and everything works fine.
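When you are juggling several candidate addresses like this (cPanel IP, bare-metal IP, name server IPs), a quick check of which ones your authorized networks actually cover can rule out typos in the whitelist. A minimal Python sketch using the standard ipaddress module; `covered_candidates` is a hypothetical helper, and the labels and documentation-range addresses in the usage are placeholders.

```python
import ipaddress
from typing import Dict, Iterable, List


def covered_candidates(candidates: Dict[str, str], authorized: Iterable[str]) -> List[str]:
    """Return the labels of candidate IPs that fall inside an authorized network."""
    nets = [ipaddress.ip_network(n, strict=False) for n in authorized]
    return [
        label
        for label, ip in candidates.items()
        if any(ipaddress.ip_address(ip) in net for net in nets)
    ]
```

For example, with candidates `{"cpanel": "198.51.100.10", "ns1": "203.0.113.5"}` and authorized networks `["203.0.113.0/29"]`, only `"ns1"` is covered. This only confirms what the whitelist allows; which address the traffic really comes from still has to be found by testing, as described above.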