Kubernetes API Server Fails To start / Looks Like Cannot connect to DNS Server - kubernetes

I have found some similar questions about the Kubernetes API server not starting, but the error message I am getting is different. I have had a working cluster for several months; I went to log in yesterday and it was offline. I looked through some log files and found the output below. It looks like it is trying to make a DNS query to my local DNS server, which has been working fine for the last few years and still works fine. I'm pretty frustrated because I don't know how to fix this. I have made no config changes and am hoping the community can help.
E0609 00:03:14.518792 1 controller.go:152] Unable to remove old endpoints from kubernetes service: StorageError: key not found, Code: 1, Key: /registry/masterleases/192.168.5.2, ResourceVersion: 0, AdditionalErrorMsg:
F0609 00:03:14.534558 1 controller.go:161] Unable to perform initial IP allocation check: unable to refresh the service IP block: Get https://localhost:6443/api/v1/services: dial tcp: lookup localhost on 172.16.0.1:53: no such host

In case anybody else comes across this issue: it was caused by a missing entry in my /etc/hosts file. There needs to be a line "127.0.0.1 localhost" for the API server to start correctly. If that line is missing, the lookup of localhost falls through to the DNS server, which does not make sense and fails with "no such host". Happy I have it working!
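The check described above can be sketched in a few lines of Python. This is only an illustration, not part of Kubernetes: the function name hosts_maps_localhost is made up here, and it only looks for the exact "127.0.0.1 localhost" mapping mentioned in the answer.

```python
def hosts_maps_localhost(hosts_text: str) -> bool:
    """Return True if the hosts-file text maps 'localhost' to 127.0.0.1.

    Hypothetical helper for illustration; it mirrors the fix in the answer
    (a "127.0.0.1 localhost" line) and nothing more.
    """
    for raw_line in hosts_text.splitlines():
        # Everything after '#' is a comment in /etc/hosts.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if address == "127.0.0.1" and "localhost" in names:
            return True
    return False
```

To run it against a real machine, pass in the contents of the file, for example hosts_maps_localhost(open("/etc/hosts").read()). A False result would match the situation in the log above, where the lookup of localhost went to 172.16.0.1:53.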

Related

K3s dial tcp lookup server misbehaving during letsencrypt staging

After successfully hosting a first service on a single-node cluster, I am trying to add a second service, each with its own dnsName.
The first service uses LetsEncrypt successfully, and now I am trying out the second service with a test certificate and the staging endpoint/ClusterIssuer.
The error I am seeing once I describe the Letsencrypt Order is:
Waiting for HTTP-01 challenge propagation: failed to perform self check GET request 'http://example.nl/.well-known/acme-challenge/9kdpAMRFKtp_t8SaCB4fM8itLesLxPkgT58RNeRCwL0': Get "http://example.nl/.well-known/acme-challenge/9kdpAMRFKtp_t8SaCB4fM8itLesLxPkgT58RNeRCwL0": dial tcp: lookup example.nl on 10.43.0.11:53: server misbehaving
The address in the error (10.43.0.11:53) is the internal IP of my service/kube-dns, which means the request gets past my service/traefik, I think.
The cluster is running on a VPS, and I have also checked that the example.nl domain name is in /etc/hosts with the VPS's IP, like so:
206.190.101.190 example1.nl
206.190.101.190 example.nl
The error is a bit vague to me because I do not know exactly what kube-dns is doing or why it thinks the server is misbehaving. Maybe I missed something now that it has two domain names to handle. Can anyone shed some light on it?
Feel free to ask for more ingress or other server config!
Everything was set up correctly. The issue did have to do with DNS resolution, though not internally in the k3s cluster but externally, at the domain registrar.
I found it by testing my domain with https://unboundtest.com, where I saw my old name servers still being used.
I contacted the registrar, and they had to change something for the domain in the DNS of the registry.
A pretty unique situation, but maybe helpful for people who also think the solution has to be found internally (inside k3s).
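When reading errors like the one in the question, it helps to separate the name being looked up from the DNS server that was asked. The following is a rough sketch that pulls those parts out of a Go-style "lookup ... on ...: ..." message; the names LookupFailure and parse_lookup_failure are invented for this example, and the pattern is only known to fit the messages quoted on this page.

```python
import re
from typing import NamedTuple, Optional


class LookupFailure(NamedTuple):
    host: str        # name that was being resolved
    dns_server: str  # resolver that was queried, as host:port
    reason: str      # trailing error text


# Assumed message shape: "... lookup <host> on <server:port>: <reason>"
_LOOKUP_RE = re.compile(r"lookup (\S+) on (\S+): (.+)$")


def parse_lookup_failure(message: str) -> Optional[LookupFailure]:
    """Extract host, DNS server and reason from a lookup error, if present."""
    match = _LOOKUP_RE.search(message)
    if match is None:
        return None
    return LookupFailure(*match.groups())
```

For the message in the question, this separates example.nl (the name) from 10.43.0.11:53 (the cluster DNS service that was queried). A "server misbehaving" reason says the resolver could not produce an answer, which is consistent with the problem sitting further upstream, as it did here.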

HAProxy running in PfSense returning 503 error with NextCloud and can't fix it

Good afternoon everyone,
I have the following setup in my home-lab:
ESXi
PfSense
NextCloud
TrueNAS
I am running HAProxy in a PfSense instance, and have a domain that I have set up to access my NAS locally (I have tested it and can make it work externally, though I do not want to do that). I can access it locally at an address like nas.homelab.com
I am trying to set up NextCloud the same way, this time externally; however, I keep getting a 503 error. I have this set up so I can see it from the internet as well, using a link similar to nc.homelab.com
I've gone through and set everything up as best I can using a Lawrence Systems video on the subject; however, I cannot figure out how to get rid of the 503 error.
I've seen other threads mentioning that I should make sure I have a default backend to eliminate this error. However, I have one set for the NAS, since I know it works, and nothing has changed.
Thank you all for your help!
Sam

Aws ec2 and Route 53 domain

I'm banging my head against the wall at the moment.
What am I doing wrong here?
Your help would be much appreciated!
I started with AWS, bought a domain with Route 53 and thought I could easily start using it.
I made an A record with the server IP [static IP].
This results in a DNS_PROBE_FINISHED_NXDOMAIN error; the domain can't be reached.
Even after waiting for hours.
Next solution I found on the web was setting a CNAME record;
This doesn't seem to work either.
What am I doing wrong here, any suggestions?
Thank you for your input
I have been learning a lot about AWS and it's quite handy.
[update]
* I found the dns name at the elastic IP settings [public DNS]
Steps to do this:
Create an A record for the domain.
Point the A record at the EC2 instance's IP.
Change the EC2 security group to allow ports 80 and 443 (if used) from all sources.
Also confirm that the EC2 IP itself is reachable, for example by opening the SSH port and connecting to it.
If you have done all of this carefully, keep in mind that DNS changes sometimes take a while to show up.
To check your setup before the change is visible (Ubuntu):
Open the /etc/hosts file and add a record for the domain:
sudo nano /etc/hosts
Add this entry to the file:
xx.xxx.xxx.xxx www.xample.com
Save and close.
Then ping your domain and open it in a browser. If this works, revert the file change and wait for Route 53 to reflect the A record change.
I found the problem.
When you register the domain, Amazon sets the nameservers. The nameservers on the registration page and in Route 53 were different, which is why I couldn't point the domain to my IP.
After setting them to be the same, the domain pointed to my server.
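The comparison that uncovered the problem can be written down as a small helper. This is a sketch only: it takes the two nameserver lists as plain strings that you copy in by hand, it does not query AWS or DNS, and the name nameserver_mismatch is made up for illustration.

```python
from typing import Iterable, List, Tuple


def _normalize(nameserver: str) -> str:
    # DNS names are case-insensitive and may carry a trailing root dot.
    return nameserver.strip().rstrip(".").lower()


def nameserver_mismatch(
    registrar_ns: Iterable[str], zone_ns: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (only at registrar, only in zone), each sorted.

    Two empty lists mean both sides list the same nameservers.
    """
    registrar = {_normalize(n) for n in registrar_ns}
    zone = {_normalize(n) for n in zone_ns}
    return sorted(registrar - zone), sorted(zone - registrar)
```

If either returned list is non-empty, the registration and the zone disagree, which is the situation described above where the A record never took effect.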

How to handle ip address change in couchbase cluster?

I have a Couchbase cluster on k8s with operator 1.2, and today I continuously see the following error:
IP address seems to have changed. Unable to listen on 'ns_1@couchbase-cluster-couchbase-cluster-0001.couchbase-cluster-couchbase-cluster.default.svc'. (POSIX error code: 'nxdomain') (repeated 3 times)
The “IP address change” message is an alert message generated by Couchbase Server. The server checks for this situation as follows: it tries to listen on a free port on the interface that is the node’s address.
It does this every 3 seconds. If the host name of the node can’t be resolved you get an nxdomain error which is the most common reason that users see this alert message.
However, the alert would also fire if the user stopped the server, renamed the host and restarted - a much more serious configuration error that we would want to alert the user to right away. Because this check runs every three seconds, if you have any flakiness in your DNS you are likely to see this alert message every now and then.
As long as the DNS glitch doesn’t persist for long (a few seconds) there probably won’t be any adverse issues. However, it is an indication that you may want to take a look at your DNS to make sure it’s reliable enough to run a distributed system such as Couchbase Server against. In the worst case, DNS that is unavailable for a significant length of time could result in lack of availability or auto failover.
PS: Thanks to Dave Finlay, who actually answered this question for me.
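The check described in this answer (resolve the node's name, then try to listen on a free port at that address) can be approximated in Python to test a host name by hand. This is not Couchbase's own code, only a sketch of the same idea; the function name can_listen_on is invented here.

```python
import socket
from typing import Tuple


def can_listen_on(hostname: str) -> Tuple[bool, str]:
    """Try to listen on a free TCP port at the address hostname resolves to.

    Returns (ok, error_text). Approximates the check described above;
    it is not taken from Couchbase Server.
    """
    try:
        # Port 0 asks the OS to pick any free port at bind time.
        infos = socket.getaddrinfo(hostname, 0, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Roughly the case reported as 'nxdomain' in the alert.
        return False, f"name resolution failed: {exc}"
    family, socktype, proto, _canonname, sockaddr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.bind(sockaddr)
            sock.listen(1)
    except OSError as exc:
        # The name resolved, but not to an address owned by this machine.
        return False, f"bind failed: {exc}"
    return True, ""
```

Run inside the pod or on the node with the name from the alert (the part after "ns_1@"). A resolution failure points at DNS, while a bind failure suggests the name now resolves to an address that is not on this host.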
If you received the above error, "IP address seems to have changed. Unable to listen on 'ns_1@lxcobeccestg1.gcp.xxx.com'. (POSIX error code: 'nxdomain')", then the hostname was changed but the new hostname was not updated in the ip / ip_start file.
To resolve this, go to /opt/couchbase/var/lib/couchbase/, look for the ip or ip_start file, and update it with the new hostname (for example, vi ip_start).
In my case it was still showing the wrong hostname, lxcobeccestg1.gcp; I changed it to lxcobeccestg2.gcp.
Then execute:
sudo /etc/init.d/couchbase-server start
or systemctl restart couchbase-server

Is it possible to see connection attempts to a Google Cloud SQL instance?

We are currently encountering the following error when trying to connect to a Cloud SQL instance: Lost connection to MySQL server at 'reading initial communication packet', system error: 0.
This is a familiar error, and as detailed here usually means the IP address needs to be whitelisted. However, we believe we have done so.
Is there a way to see connection attempts and their IP addresses that have been made (and refused) to the Cloud SQL instance?
Currently we don't expose that information, but it is something we would like to fix. :-)
According to @Razvan, as of September 2014, this information isn't exposed.
We ended up using CIDR blocks to search the space and find the actual IP address. This is unsatisfying, obviously, but it's a way to pin down the problem.
If other people want to sanity check whether the problem is that their IP is being refused, you can add 0.0.0.0/0 to accept all ranges and then try to connect. If it works, you know what the problem is.
Be absolutely sure to remove this accepted range after you are done, however!
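To reason about which ranges would admit a given client address, Python's standard ipaddress module is enough. This is a local sketch only; it does not talk to Cloud SQL, and the helper names is_authorized and halves are made up for illustration. The second helper shows one way to narrow a range step by step, in the spirit of the CIDR search described above.

```python
import ipaddress
from typing import Iterable, List


def is_authorized(client_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(cidr, strict=False)
        for cidr in allowed_cidrs
    )


def halves(cidr: str) -> List[str]:
    """Split a CIDR block into its two halves, for narrowing by bisection."""
    return [str(net) for net in ipaddress.ip_network(cidr).subnets()]
```

For example, halves("0.0.0.0/0") gives the two /1 blocks. Authorizing one half at a time and retrying the connection tells you which half contains the real client address, and repeating this narrows it down.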
I figured I might help someone who stumbles here.
I had essentially the same issue, trying to connect to a GCP SQL instance from a hosting provider.
I whitelisted the IP address shown in my cPanel and it would not connect. (It used to, but the provider recently made some changes to their infrastructure and it stopped working.)
I put 0.0.0.0/0 in my Cloud Platform whitelist and it connected with no problem.
So now I knew that my cPanel IP was not the IP trying to connect to GCP.
After some hair pulling, I figured the bare-metal server might have a different IP than my cPanel IP. It did, but whitelisting that also didn't work.
Finally I tried the IP addresses of the name servers that point to my domain and bam, all is good.
If you are facing this issue, try your name servers (usually something like NS1.hostingprovider.com, etc.). I put both the NS1 and NS2 IPs in the whitelist and we are working fine.