Spark Driver died, but did not kill the application - scala

I have a streaming job that fails due to a network call timeout. The application keeps retrying for some time, but if I kill the driver in the meantime, the application does not die, and I have to kill the application manually through the UI.
My question is:
Does this happen because the network connection is made on a different thread, which does not let the application die?
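There is no accepted answer here, but the hypothesis in the question can be illustrated independently of Spark: a JVM does not exit while a non-daemon thread is still running, so a blocking network call on such a thread can keep the process alive after the rest of the work has finished. A minimal, Spark-free sketch (all names are illustrative):
import java.net.ServerSocket

// Illustration only: a non-daemon thread blocked on a network call
// keeps the JVM alive even after main() has returned.
object NonDaemonBlockDemo {
  def main(args: Array[String]): Unit = {
    val t = new Thread(new Runnable {
      def run(): Unit = {
        val server = new ServerSocket(0)  // bind to any free port
        server.accept()                   // blocks until a client connects
      }
    })
    t.setDaemon(false)                    // non-daemon: the JVM waits for it
    t.start()
    println("main() is done, but the process keeps running")
  }
}
Whether this is exactly what happens inside this particular job would need a thread dump of the driver to confirm.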

Related

SIGTERM signal arrives at Kuma first and stops all active application connections immediately

We have applications that work with Kafka (MSK). We noticed that once a pod starts shutting down (during autoscaling or a deployment), the app container loses all active connections: the SIGTERM signal causes Kuma to close all connections immediately, which causes data loss due to unfinished sessions that don't get closed gracefully on the app side, and after that we receive connection errors to the Kafka brokers.
Does anyone have an idea how to make Kuma wait some time after it gets the SIGTERM signal, so the sessions can close gracefully?
Or maybe a way to let the app know about the shutdown before Kuma does?
Or any other idea?
This is a known issue that is being fixed in the upcoming 1.7 release: https://github.com/kumahq/kuma/pull/4229

Getting error no such device or address on kubernetes pods

I have some .NET Core applications running as microservices on GKE (Google Kubernetes Engine).
Usually everything works fine, but sometimes, if my microservice isn't in use, something happens and my application shuts down (same behavior as Ctrl+C in a terminal).
I know that this is Kubernetes behavior, but if I make a request to an application that is not running, the first request returns the error "No such device or address" or a timeout error.
I will post some logs and setup:
The key to what's happening is this logged error:
TNS: Connect timeout occured ---> OracleInternal.Network....
Since your application is not being used, the Oracle database simply shuts down its idle connection. To solve this problem, you can do a few things:
Handle the disconnection inside your application and just reconnect (a sketch follows this list).
Define a livenessProbe to restart the pod automatically once the application is down.
Make your application do something with the connection from time to time - this can also be done with a probe.
Configure your Oracle database not to close idle connections.
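The application in the question is .NET, but the "just reconnect" idea is language-agnostic. Here is a minimal sketch in JVM/JDBC terms, assuming a plain DriverManager connection; the URL, credentials, retry count and backoff are placeholders, not taken from the question:
import java.sql.{Connection, DriverManager, SQLException}

object Reconnect {
  // Retry the connection with a simple backoff instead of failing the
  // first request after the database has closed an idle connection.
  def connectWithRetry(url: String, user: String, pass: String,
                       attemptsLeft: Int = 5): Connection =
    try DriverManager.getConnection(url, user, pass)
    catch {
      case _: SQLException if attemptsLeft > 1 =>
        Thread.sleep(2000)                                  // back off before retrying
        connectWithRetry(url, user, pass, attemptsLeft - 1)
    }
}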

Restart Akka Actor System after terminated

We have an Akka HTTP app with approximately 100+ APIs and 15+ actors. After Http().bindAndHandle(routes, host, port) I terminate the ActorSystem in a shutdown hook:
Http().bindAndHandle(corsHandler(routes), "0.0.0.0", 9090/*, connectionContext = https*/)
sys.addShutdownHook(actorSystem.terminate())
I don't want my application to stop. My questions are:
Does the actor system need to be terminated at all?
Does my application stop working after the actor system is terminated?
What if a user hits the API after the actor system is terminated? Does it restart again to handle API requests?
In short, what do I need to do if I want my application to always be listening for client requests?
Thanks.
You are looking for fault tolerance in your application. The actor system is terminated either when some error occurs or when we explicitly force it to terminate. You have to use a supervision strategy for your application to be fault tolerant. Please look into these links:
https://doc.akka.io/docs/akka/2.5/fault-tolerance.html
https://doc.akka.io/docs/akka/2.5/general/supervision.html
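For reference, a minimal supervision sketch in the Akka 2.5 classic-actor style those pages describe (the actor names, exception types and retry limits here are only examples):
import akka.actor.{Actor, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Restart, Stop}
import scala.concurrent.duration._

class Parent extends Actor {
  // Restart a failing child on recoverable errors, stop it otherwise;
  // the parent (and the actor system) keeps running either way.
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: java.io.IOException => Restart
      case _: Exception           => Stop
    }

  private val worker = context.actorOf(Props[Worker], "worker")

  def receive: Receive = {
    case msg => worker forward msg
  }
}

class Worker extends Actor {
  def receive: Receive = {
    case _ => () // placeholder for the real work
  }
}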
The purpose of a shutdown hook is to allow an orderly shutdown of the application when the JVM is about to shut down. It's not necessarily required in all circumstances, but an orderly shutdown can be useful if your ActorSystem needs to release resources in an orderly manner, or signal to other nodes in a cluster that it is being shut down.
When the actor system has terminated, there will be no more actors to handle HTTP requests, because actors cannot exist without a running actor system to be part of. So no, if your user hits the API after the actor system has terminated, the actor system will not be restarted; the request will simply be rejected (connection refused or similar).
You can't avoid that happening in your code, because a JVM shutdown cannot be cancelled.
However, the good news is that you can avoid it at the infrastructure level using various operational techniques, e.g. blue-green deployments with an HTTP load balancer can support downtime-free upgrades of stateless applications.
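As a concrete illustration of such an orderly shutdown, here is a sketch that unbinds before terminating. It assumes the actorSystem, routes and corsHandler from the question are in scope, and an Akka 2.5-era Akka HTTP API where bindAndHandle returns a Future[ServerBinding]:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.stream.ActorMaterializer
import scala.concurrent.Await
import scala.concurrent.duration._

implicit val system: ActorSystem = actorSystem       // the system from the question
implicit val materializer: ActorMaterializer = ActorMaterializer()
import system.dispatcher                             // ExecutionContext for the Future steps

val bindingFuture = Http().bindAndHandle(corsHandler(routes), "0.0.0.0", 9090)

sys.addShutdownHook {
  // Stop accepting new connections first, then terminate the actor system.
  val done = bindingFuture
    .flatMap(_.unbind())                             // release the port
    .flatMap(_ => system.terminate())
  Await.result(done, 30.seconds)
}
The design point is simply ordering: stop accepting new connections first, then take the actor system down, rather than terminating the system while requests may still be in flight.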

JDBC connection pool never shrinks

I run 3 processes at the same time, all of them using the same DB (RDS Postgres).
All of them are Java applications that use JDBC to connect to the DB.
I am using PGPoolingDataSource in every process as a connection pool for the DB.
Every request is handled by the book - it ends with:
finally {
    connection.close();
}
main problems:
1. I run out of connections really fast because I do massive work with the DB at the beginning (which is OK), but the pool never shrinks.
2. I get some exceptions in the code because there are not enough connections, and I wish I could extend the timeout when requesting a connection.
My insights:
The PGPoolingDataSource never shrinks, by definition! I couldn't find any documentation about it, but I assume this is the case. So I tried the Apache DBCP pool, and I have the same problem there.
I think there must be a timeout when waiting for a connection - I would guess that this timeout can be configured, but I couldn't find such a configuration in either pool.
My Questions:
Why does the pool never shrink?!
How do I determine how many connections to allocate for each pool/process (here every process has one pool)?
What happens if I don't close the pool (not the connections) and the app dies - are the connections in the pool still alive? This happens a lot when I update the application: I stop and start it, so I never close the pool.
What would be a good JDBC connection pool that works well with Postgres and has an option to set the timeout for getConnection?
Thanks
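No answer is recorded here, but since the question already tried Apache Commons DBCP, here is a minimal sketch of the DBCP2 settings that control the two behaviors being asked about: shrinking (the idle/eviction settings) and a timeout for getConnection (maxWaitMillis). All values are illustrative:
import org.apache.commons.dbcp2.BasicDataSource

object PooledDb {
  val ds = new BasicDataSource()
  ds.setUrl("jdbc:postgresql://my-rds-host:5432/mydb")   // placeholder URL
  ds.setUsername("app_user")                             // placeholder credentials
  ds.setPassword("secret")

  ds.setMaxTotal(20)           // hard cap on open connections for this process
  ds.setMaxIdle(5)             // idle connections above this are closed when returned
  ds.setMinIdle(2)             // the pool may shrink down to this many idle connections
  ds.setMaxWaitMillis(10000)   // getConnection() fails after waiting 10 s for a free one

  // Shrinking below maxIdle only happens if the evictor thread is enabled:
  ds.setTimeBetweenEvictionRunsMillis(30000)   // check idle connections every 30 s
  ds.setMinEvictableIdleTimeMillis(60000)      // close connections idle for more than 60 s
}
With settings like these, connections opened for the initial burst are closed once they sit idle past the eviction threshold, and a saturated pool makes getConnection() fail after the configured wait instead of blocking forever.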

Jobs in a queue is dropped unexpectedly in Gearman

I'm dealing with a very strange problem now.
Since I started queuing over 1,000 jobs at once, Gearman hasn't been working properly...
The problem is that when I submit the jobs in background mode, I can see they are correctly queued on the monitoring page (Gearman monitor),
but the queue is drained right after (within a few seconds) without the jobs being delivered to the worker.
In the end, the jobs are never executed by the worker; they just disappear from the queue (job server).
So I tried rebooting the server entirely and reinstalling Gearman as well as the PHP library. (I'm using one CentOS and one Ubuntu machine with the PHP Gearman library; the versions are 0.34 and 1.0.2.)
But no luck yet... The job server just misbehaves as I explained above.
What should I do for now?
Can I check the workers' state, or see and monitor the whole process from queuing the jobs to delivering them to the worker?
When I tried gearmand with an option like 'gearmand -vvvv', it never printed anything on the screen while I registered a worker with the server and ran a job with the client code (PHP).
Any comment will be appreciated.
For your information, I'm not considering a persistent queue using MySQL or SQLite for now, because it sometimes causes performance issues with slow execution.