Getting "no such device or address" errors on Kubernetes pods

I have some .NET Core applications running as microservices on GKE (Google Kubernetes Engine).
Usually everything works fine, but sometimes, if a microservice isn't in use, something happens that shuts my application down (the same behavior as pressing CTRL + C in a terminal).
I know that this is Kubernetes behavior, but if I send a request to an application that is not running, the first request fails with the error "No such device or address" or a timeout error.
Here are some logs and my setup:

The key to what's happening is this logged error:
TNS: Connect timeout occured ---> OracleInternal.Network....
Since your application is not used, the Oracle database just shuts down its idle connection. To solve this problem, you can do a few things:
Handle the disconnection inside your application and simply reconnect.
Define a livenessProbe to restart the pod automatically once the application is down (see the sketch after this list).
Make your application do something with the connection from time to time; this can be done with a probe too.
Configure your Oracle database not to close idle connections.
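A minimal sketch of such a probe, assuming the application exposes an HTTP health endpoint (the pod name, image, path, and port below are hypothetical):

    # Hypothetical pod spec: restart the container when /health stops answering.
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-microservice            # placeholder name
    spec:
      containers:
      - name: my-microservice
        image: gcr.io/my-project/my-microservice:latest   # placeholder image
        livenessProbe:
          httpGet:
            path: /health              # assumes the app serves a health endpoint here
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 30
          failureThreshold: 3          # restart after 3 consecutive failures

If the health handler also touches the database connection, a periodic probe like this covers the third point as well, since the connection is exercised on every check.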

Related

Disconnect service fabric cluster connection

I know that we can connect to the Service Fabric cluster using Connect-ServiceFabricCluster, as mentioned on Microsoft Learn, and it works flawlessly.
I use this in a script; it prints the following every time it tries to connect to Service Fabric again:
WARNING: Cluster connection with the same name already existed, the old connection will be deleted
So, is there a way to safely disconnect from Service Fabric before executing the next steps or exiting, other than letting the connection time out?
To disconnect a Service Fabric cluster connection, there is the Remove-ServiceFabricCluster command.
WARNING: Cluster connection with the same name already existed, the old connection will be deleted
The warning indicates that you are trying to connect to a cluster that is already connected.
The warning itself says that the old connection will be deleted and a new one will be created.
AFAIK, you can continue without disconnecting or removing the connection.
Reference taken from the Microsoft docs.
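If the goal is just to make the warning go away, one hedged alternative is to reuse an existing connection instead of reconnecting on every run. This sketch assumes Get-ServiceFabricClusterConnection throws when no connection has been established in the session, and the endpoint is a placeholder:

    # Reconnect only if there is no usable cluster connection in this session.
    try {
        $null = Get-ServiceFabricClusterConnection
    }
    catch {
        # Placeholder endpoint; replace with your cluster's client endpoint.
        Connect-ServiceFabricCluster -ConnectionEndpoint 'mycluster.westus.cloudapp.azure.com:19000'
    }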

Why am I experiencing endless connection timeouts using the Quarkus MicroProfile reactive REST client

At some point in my Quarkus app's life (running under Kubernetes), it begins getting endless connection timeouts from multiple different hosts (the timeout is configured to be 1 second). From this point on, the app never recovers until I restart the k8s pod.
These endless connection timeouts are not due to the hosts, since other apps in the cluster do not suffer from this; also, a restart of my app fixes the problem.
I am declaring multiple hosts (base-uri) through the Quarkus application.properties, roughly as sketched below. (Maybe it's using a single Vert.x/Netty event loop and that's wrong?)
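For illustration, the configuration looks something like this (the client interface names and URLs are hypothetical; the keys follow the MicroProfile Rest Client convention):

    # application.properties - one base URI and connect timeout per REST client interface
    com.example.FirstClient/mp-rest/url=http://first-service:8080
    com.example.FirstClient/mp-rest/connectTimeout=1000
    com.example.SecondClient/mp-rest/url=http://second-service:8080
    com.example.SecondClient/mp-rest/connectTimeout=1000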

Azure WebSocket connection through Kubernetes, many disconnects with code 1006

A Node.js server on Kubernetes gets many WebSocket connections. All is fine, but from time to time an abrupt disconnect occurs (code 1006).
Then, every few minutes, the server disconnects from all clients (all disconnects have code 1006).
It is important to note that this happens to all replicas at the same time, indicating the cause is external to the servers (and the clients). Could it be the application gateway?
How can I debug this further?
Changing from the default Azure Application Gateway to NGINX solved this problem.
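For reference, once an NGINX ingress controller is installed in the cluster, pointing traffic at it is mostly a matter of the ingress class. A hedged sketch (resource, host, and service names are placeholders):

    # Route WebSocket traffic through the NGINX ingress controller
    # instead of the Application Gateway ingress.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: websocket-ingress          # placeholder name
    spec:
      ingressClassName: nginx
      rules:
      - host: ws.example.com           # placeholder host
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: websocket-server # placeholder backend service
                port:
                  number: 80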

Google Cloud SQL unable to connect and restart master instance

This morning my application could not connect to my MySQL master instance in Google Cloud SQL. The master instance has no further logs, but the replica instance's log shows that replication could not connect to the master either.
I tried to restart MySQL, but an hour later it still had not started.
What should I do?
There are several possible reasons for this issue. For instance, your master instance may have failed due to an error while a dump was being created, or the instance may have been under maintenance and now it cannot restart correctly, etc. If that were the case, you would need to get in touch with Google Cloud Platform Support to have your Cloud SQL instance manually restarted.
Alternatively, you can also check the documentation for instance connection issues and how to diagnose issues in connections.
If none of this applies to your case, you should consider adding more information to your question, since there could be a problem with the expiration of your SSL server certificate, with the Proxy, etc.
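Before contacting support, you can also retry the restart and check for stuck operations from the command line (the instance name below is a placeholder):

    # Attempt to restart the Cloud SQL instance.
    gcloud sql instances restart my-master-instance

    # List recent operations to see whether one is stuck or has failed.
    gcloud sql operations list --instance=my-master-instance --limit=5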

Why would running a container on GCE get stuck on "Metadata request unsuccessful: Forbidden (403)"

I'm trying to run a container in a custom VM on Google Compute Engine. This is to perform a heavy ETL process, so I need a large machine, but only for a couple of hours a month. I have two versions of my container with small startup changes. Both versions were built and pushed to the same Google Container Registry by the same computer using the same Google login. The older one works fine, but the newer one fails by getting stuck in an endless stream of the following error:
E0927 09:10:13 7f5be3fff700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected
Can anyone tell me exactly what's going on here? Can anyone explain why one of my images doesn't have this problem (well, it gives a few of these messages but gets past them) while the other does (thousands of these messages, running for over 24 hours before I killed it)?
If I SSH into a GCE instance, both versions of the container pull and run just fine. I suspect the INTEGRITY_RULE checking from the logs, but I know nothing about how that works.
MORE INFO: this is down to "restart policy: never". Even a simple CentOS 7 container that says "hello world", deployed from the console, triggers this if the restart policy is never. At least in the short term I can fix this in the entrypoint script, as the instance will be destroyed when the monitor realizes that the process has finished.
I suggest you try creating a third container that's focused on the metadata service functionality to isolate the issue. It may be that there's a timing difference between the two containers that's not being overcome.
Make sure you can curl the metadata service from the VM and that the request to the metadata service is using the VM's service account.
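For example, from inside the VM (this is the standard GCE metadata endpoint; note that the metadata server responds 403 Forbidden when the Metadata-Flavor header is missing, which may be related to the error above):

    # Query the metadata server for the VM's default service account.
    curl -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"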