How to properly detect and then remove jobs that cannot be recovered using Quartz.net? - quartz-scheduler

For various valid reasons, some jobs in the job store are old and can no longer be recovered. For instance, when the Job class is no longer part of the .NET assemblies after a refactor. I'm wondering how to gracefully catch these problems when the scheduler starts, and then delete the unrecoverable jobs.
When the app starts, I basically do this (abridged):
IScheduler scheduler = <create a scheduler and a jobstore object>
try { scheduler.Start(); } catch { }
try { scheduler.Start(); } catch { }
try { scheduler.Start(); } catch { }
If I call Start() three times, the scheduler eventually starts. I have to do this hack because Start() throws exceptions for unrecoverable, old jobs:
"Failure occured during job recovery." and "Could not load type 'MyOldClassName' from assembly 'MyAssembly'."
I want to gracefully remove the broken jobs and avoid these exceptions. In my actual code, I log these exceptions.
Is there a better way to do this?

I found one way to do this. Calling this before Start() cures the problem.
var jobKeys = this._scheduler.GetJobKeys(GroupMatcher<JobKey>.AnyGroup());
foreach (var jobKey in jobKeys)
{
    try
    {
        // Attempt to access the JobType. If this fails, the job's class can no longer be loaded.
        Type t = _scheduler.GetJobDetail(jobKey).JobType;
    }
    catch (JobPersistenceException ex)
    {
        if (ex.InnerException is TypeLoadException)
        {
            // The stored job class no longer exists in any loaded assembly; remove the job.
            _scheduler.DeleteJob(jobKey);
        }
        else
        {
            // log this
        }
    }
    catch (Exception ex)
    {
        // log this
    }
}
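With that cleanup in place, the repeated Start() calls are no longer needed. A rough sketch of how it could be wired together, assuming the synchronous Quartz.NET API used above (StartWithCleanup is an illustrative name, not a Quartz API):

using System;
using Quartz;
using Quartz.Impl.Matchers;

// Illustrative helper: purge jobs whose job class can no longer be loaded, then start the scheduler once.
static void StartWithCleanup(IScheduler scheduler)
{
    foreach (var jobKey in scheduler.GetJobKeys(GroupMatcher<JobKey>.AnyGroup()))
    {
        try
        {
            var ignored = scheduler.GetJobDetail(jobKey).JobType; // forces the job type to load
        }
        catch (JobPersistenceException ex) when (ex.InnerException is TypeLoadException)
        {
            scheduler.DeleteJob(jobKey); // the stored type no longer exists in any assembly
        }
    }

    scheduler.Start(); // no repeated Start() calls needed
}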

Related

Vert.x: How to wait for a future to complete

Is there a way to wait for a future to complete without blocking the event loop?
An example of a use case with querying Mongo:
Future<Result> dbFut = Future.future();
mongo.findOne("myusers", myQuery, new JsonObject(), res -> {
    if (res.succeeded()) {
        // ...
        dbFut.complete(res.result());
    } else {
        // ...
        dbFut.fail(res.cause());
    }
});
// Here I need the result of the DB query
if (dbFut.succeeded()) {
    doSomethingWith(dbFut.result());
} else {
    error();
}
I know the doSomethingWith(dbFut.result()); call can be moved into the handler, but if it is long, the code becomes unreadable (callback hell?). Is that the right solution? Is it the only solution without additional libraries?
I'm aware that RxJava simplifies the code, but since I don't know it, learning Vert.x and RxJava at the same time is just too much.
I also wanted to give a try to vertx-sync. I put the dependency in the pom.xml; everything got downloaded fine but when I started my app, I got the following error
maurice#mickey> java \
-javaagent:~/.m2/repository/co/paralleluniverse/quasar-core/0.7.5/quasar-core-0.7.5-jdk8.jar \
-jar target/app-dev-0.1-fat.jar \
-conf conf/config.json
Error opening zip file or JAR manifest missing : ~/.m2/repository/co/paralleluniverse/quasar-core/0.7.5/quasar-core-0.7.5-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
I know what the error means in general, but I don't know what it means in this context... I tried to google it but didn't find any clear explanation of which manifest to put where. And as before, unless it's mandatory, I prefer to learn one thing at a time.
So, back to the question: is there a way, with "basic" Vert.x, to wait for a future without disturbing the event loop?
You can set a handler for the future to be executed upon completion or failure:
Future<Result> dbFut = Future.future();
mongo.findOne("myusers", myQuery, new JsonObject(), res -> {
    if (res.succeeded()) {
        // ...
        dbFut.complete(res.result());
    } else {
        // ...
        dbFut.fail(res.cause());
    }
});
dbFut.setHandler(asyncResult -> {
    if (asyncResult.succeeded()) {
        // your logic here
    }
});
This is a pure Vert.x way that doesn't block the event loop.
I agree that you should not block in the Vert.x processing pipeline, but I make one exception to that rule: start-up. By design, I want to block while my HTTP server is initialising.
This code might help you:
/**
 * @return null when waiting on {@code Future<Void>}
 */
@Nullable
public static <T> T awaitComplete(Future<T> f)
    throws Throwable
{
    final Object lock = new Object();
    final AtomicReference<AsyncResult<T>> resultRef = new AtomicReference<>(null);
    synchronized (lock)
    {
        // We *must* be locked before registering a callback.
        // If the result is ready, the callback is called immediately!
        f.onComplete(
            (AsyncResult<T> result) ->
            {
                resultRef.set(result);
                synchronized (lock) {
                    lock.notify();
                }
            });
        do {
            // Nested sync on lock is fine. If we get a spurious wake-up before resultRef is set, we need to
            // reacquire the lock, then wait again.
            // Ref: https://stackoverflow.com/a/249907/257299
            synchronized (lock)
            {
                // @Blocking
                lock.wait();
            }
        }
        while (null == resultRef.get());
    }
    final AsyncResult<T> result = resultRef.get();
    @Nullable
    final Throwable t = result.cause();
    if (null != t) {
        throw t;
    }
    @Nullable
    final T x = result.result();
    return x;
}

Cancelling Fetch results in client being Disconnected

When cancelling the IMailFolder.Fetch method with the cancellationToken, I get an exception saying that the client has been disconnected.
I debugged MailKit and traced the issue to the ImapEngine.Iterate() method, which contains the following:
try {
    while (current.Step ()) {
        // more literal data to send...
    }

    if (current.Bye)
        Disconnect ();
} catch {
    Disconnect ();
    throw;
} finally {
    current = null;
}
Is it the right approach to disconnect the client on every exception type being caught?
Should this also apply when we are cancelling the operation ourselves, for example to prioritize another operation, and we do not want to disconnect?
How else would you cancel a command that is in progress, if not by disconnecting the socket?
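To illustrate, here is a rough sketch of how calling code might treat this as expected behaviour: catch the cancellation and reconnect before issuing the next command. The host and credentials are placeholders, and the exact Fetch overload may differ between MailKit versions.

using System;
using System.Threading;
using MailKit;
using MailKit.Net.Imap;

var client = new ImapClient();
client.Connect("imap.example.com", 993, true); // placeholder host
client.Authenticate("user", "password");       // placeholder credentials
client.Inbox.Open(FolderAccess.ReadOnly);

using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5)))
{
    try
    {
        // A long-running fetch that we may decide to abandon.
        var summaries = client.Inbox.Fetch(0, -1, MessageSummaryItems.Envelope, cts.Token);
    }
    catch (OperationCanceledException)
    {
        // The aborted command left the protocol stream in an unknown state, so the
        // client was disconnected; reconnect before issuing the next command.
        client.Connect("imap.example.com", 993, true);
        client.Authenticate("user", "password");
        client.Inbox.Open(FolderAccess.ReadOnly);
    }
}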

gRPC - use Rx observable

I am trying to expose an observable via a gRPC stream.
My simplified code looks like this:
public override async Task Feed(Request request, IServerStreamWriter<Response> responseStream, ServerCallContext context)
{
    var result = new Result();
    try
    {
        await Observable.ForEachAsync(async value =>
        {
            await responseStream.WriteAsync(value);
        });
    }
    catch (Exception ex)
    {
        Log.Info("Session ended:" + ex);
    }
}
I receive the following error:
W0123 14:30:59.709715 Grpc.Core.Internal.ServerStreamingServerCallHandler`2 Exception occured in handler.
System.ArgumentException: Value does not fall within the expected range.
   at Grpc.Core.Internal.ServerStreamingServerCallHandler`2.d__4.MoveNext()
W0123 14:30:59.732716 Grpc.Core.Server Exception while handling RPC.
System.InvalidOperationException: Operation is not valid due to the current state of the object.
   at Grpc.Core.Internal.AsyncCallServer`2.SendStatusFromServerAsync(Status status, Metadata trailers, Tuple`2 optionalWrite)
   at Grpc.Core.Internal.ServerStreamingServerCallHandler`2.d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Grpc.Core.Server.d__34.MoveNext()
How would you recommend handling this? I guess I would need to process the ForEachAsync callbacks on the same thread.
With the gRPC streaming API, you are only allowed to write one item at a time. If you start another WriteAsync() operation before the previous one finishes, you'll get an exception. You also need to finish all your writes before returning from the method handler (the Feed method in this case). The reason only one write is allowed at a time is to ensure gRPC's flow control works well.
In your case the Rx API doesn't seem to be capable of ensuring that, so one way to solve this would be to use an intermediate buffer.
How about this?
public override async Task Feed(Request request, IServerStreamWriter<Response> responseStream, ServerCallContext context)
{
    var result = new Result();
    try
    {
        await Observable.Scan(Task.CompletedTask, (preceding, value) =>
            preceding.ContinueWith(_ => responseStream.WriteAsync(value))
        );
    }
    catch (Exception ex)
    {
        Log.Info("Session ended:" + ex);
    }
}
I haven't tested it yet, but I think it works anyway.
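A more explicit way to apply the same "intermediate buffer" idea is to push the notifications into a queue and drain it sequentially. Below is an untested sketch using System.Threading.Channels; it assumes a runtime where the Channels package and await foreach are available, and observable stands in for whatever IObservable<Response> is being exposed:

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public override async Task Feed(Request request, IServerStreamWriter<Response> responseStream, ServerCallContext context)
{
    // An unbounded channel decouples the Rx notifications from the gRPC writes,
    // so only one WriteAsync is ever pending at a time.
    var channel = Channel.CreateUnbounded<Response>();

    using var subscription = observable.Subscribe(
        value => channel.Writer.TryWrite(value),
        error => channel.Writer.Complete(error),
        () => channel.Writer.Complete());

    try
    {
        // Drain the buffer sequentially; each write completes before the next begins.
        await foreach (var value in channel.Reader.ReadAllAsync(context.CancellationToken))
        {
            await responseStream.WriteAsync(value);
        }
    }
    catch (OperationCanceledException) { }
    catch (Exception ex)
    {
        Log.Info("Session ended:" + ex);
    }
}

The channel guarantees the one-write-at-a-time rule, and completing the writer from the observable's onError/onCompleted callbacks ends the read loop, so the handler returns only after all writes have finished.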
There are two issues preventing the OP's method from working as expected, both stemming from responseStream.WriteAsync's requirement that only one write can be pending at a time.
When the observable is subscribed to, each 'onNext' will be processed on the same thread that raised the notification. This means you won't be able to enforce the one-write-at-a-time rule. You can use the ObserveOn method to control how each notification is scheduled.
var eventLoop = new EventLoopScheduler();
var o = myObservable.ObserveOn(eventLoop);
ForEachAsync doesn't work the way one might expect with async lambdas. If you drop the async/await here and instead call WriteAsync like this, the write will be executed in the expected context.
responseStream.WriteAsync(value).Wait();
Here is a complete example...
public override async Task Feed(Request request, IServerStreamWriter<Response> responseStream, ServerCallContext context)
{
    var o = getMyObservable(request);
    try
    {
        var eventLoop = new EventLoopScheduler(); // just for example, use whatever scheduler makes sense for you

        await o.ObserveOn(eventLoop).ForEachAsync<Response>(value =>
        {
            responseStream.WriteAsync(value).Wait();
        },
        context.CancellationToken);
    }
    catch (TaskCanceledException) { }
    catch (Exception ex)
    {
        Log.Info("Session ended:" + ex);
    }

    Log.Info("Session ended normally");
}

Service Fabric Reliable Queues FabricNotReadableException

I have a Stateful service with 1000 partitions and 1 replica.
In its RunAsync method, this service has an infinite while loop where I call a Reliable Queue to get messages.
If there are no messages I wait 5 seconds, then retry.
I used to do exactly this with an Azure Storage Queue, with success.
But with Service Fabric I'm getting thousands of FabricNotReadableExceptions, the service becomes unstable, and I'm not able to update or delete it; I need to remove the entire cluster.
I tried to update it and after 18 hours it was still stuck, so there is something terribly wrong in what I'm doing.
This is the method code:
public async Task<QueueObject> DeQueueAsync(string queueName)
{
    var q = await StateManager.GetOrAddAsync<IReliableQueue<string>>(queueName);

    using (var tx = StateManager.CreateTransaction())
    {
        try
        {
            var dequeued = await q.TryDequeueAsync(tx);
            if (dequeued.HasValue)
            {
                await tx.CommitAsync();
                var result = dequeued.Value;
                return JSON.Deserialize<QueueObject>(result);
            }
            else
            {
                return null;
            }
        }
        catch (Exception e)
        {
            ServiceEventSource.Current.ServiceMessage(this, $"!!ERROR!!: {e.Message} - Partition: {Partition.PartitionInfo.Id}");
            return null;
        }
    }
}
This is the RunAsync method:
protected override async Task RunAsync(CancellationToken cancellationToken)
{
    while (true)
    {
        var message = await DeQueueAsync("MyQueue");
        if (message != null)
        {
            // process, takes around 500ms
        }
        else
        {
            Thread.Sleep(5000);
        }
    }
}
I also replaced Thread.Sleep(5000) with Task.Delay and got thousands of "A task was canceled" errors.
What am I missing here?
Is the loop too fast, so that SF cannot update the other replicas in time?
Should I remove all the replicas leaving just one?
Should I use the new ConcurrentQueue instead?
I have the problem both in production and locally, with 50 or 1000 partitions; it doesn't matter.
I'm stuck and confused.
Thanks
You need to honor the cancellationToken that is passed in to your RunAsync implementation. Service Fabric will cancel the token when it wants to stop your service for any reason - including upgrades - and it will wait indefinitely for RunAsync to return after cancelling the token. This could explain why you couldn't upgrade your application.
I would suggest checking cancellationToken.IsCancellationRequested inside your loop, and breaking out if it has been set.
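A minimal sketch of what that loop could look like, keeping the rest of your service unchanged (this is just the standard pattern, not tested against your cluster):

protected override async Task RunAsync(CancellationToken cancellationToken)
{
    // Exit the loop as soon as Service Fabric requests cancellation (e.g. during an upgrade).
    while (!cancellationToken.IsCancellationRequested)
    {
        var message = await DeQueueAsync("MyQueue");
        if (message != null)
        {
            // process, takes around 500ms
        }
        else
        {
            // Pass the token so the wait also ends promptly when cancellation is requested.
            // If the delay is cancelled it throws OperationCanceledException, which Service Fabric
            // treats as a graceful stop when it is the one that requested cancellation.
            await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
        }
    }
}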
FabricNotReadableException can happen for a variety of reasons - the answer to this question has a comprehensive explanation, but the takeaway is:
You can consider FabricNotReadableException retriable. If you see it, just try the call again and eventually it will resolve into either NotPrimary or Granted.
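Applied to the DeQueueAsync method above, a simple retry could look roughly like this (the retry count and delay are arbitrary; everything else mirrors the original code, and FabricNotReadableException comes from System.Fabric):

public async Task<QueueObject> DeQueueAsync(string queueName)
{
    var q = await StateManager.GetOrAddAsync<IReliableQueue<string>>(queueName);

    for (var attempt = 0; attempt < 3; attempt++) // arbitrary retry limit
    {
        try
        {
            using (var tx = StateManager.CreateTransaction())
            {
                var dequeued = await q.TryDequeueAsync(tx);
                if (!dequeued.HasValue)
                    return null;

                await tx.CommitAsync();
                return JSON.Deserialize<QueueObject>(dequeued.Value);
            }
        }
        catch (FabricNotReadableException)
        {
            // Transient: the replica is not readable right now; back off briefly and retry.
            await Task.Delay(TimeSpan.FromMilliseconds(200));
        }
    }

    return null;
}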

scheduler.properties file could not be found - Quartz scheduler

I have a problem getting an instance of the Quartz Scheduler: not on the first call, but on subsequent calls.
This is my piece of code:
public void getClusteredSchedulerInstance() {
    try {
        cluteredScheduler = new StdSchedulerFactory("scheduler.properties").getScheduler();
        if (!cluteredScheduler.isStarted()) {
            cluteredScheduler.start();
        }
    } catch (SchedulerException e) {
        logger.error("Error while starting clustered scheduler", e);
    }
}
When I call the method for the first time, it reads the properties file and returns the instance, but it fails to do so on subsequent calls.
Does anyone know why this happens?
Note: scheduler.properties is located in the current working directory.
Error message
org.quartz.SchedulerException: Properties file: 'scheduler.properties' could not be read. [See nested exception: java.io.FileNotFoundException: scheduler.properties (The system cannot find the file specified)]