How to kill a Marathon instance after a set duration?

I need a Marathon app (Docker container in this case) that is created on demand, then goes away (normal SIGTERM) after a configurable amount of time.
What is the best way to implement this? With a health check hack?
I initially turned to the API protos, but found nothing obvious there.

Related

How to run a monitoring application?

I want to build a simple application, which does the following:
Make an HTTP call to a specific endpoint.
If the endpoint doesn't respond, I want to reset a scheduled task on a remote machine (restarting the service that exposes the endpoint).
But once built, how do I run such an application?
Should I make it a continuously running application that just performs its logic every X minutes?
Should I make it a scheduled task that runs the application every X minutes?
Or is there a completely different way of doing it? If all options are viable, that's fine as well; I don't want to start a "which do you feel is better" debate. I just want to dodge any land mines I might have missed.
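For what it's worth, a minimal Node.js sketch of the first option (a continuously running checker) might look like the following; the endpoint URL, the interval, and the restart step are placeholders, since restarting a scheduled task on a remote machine would typically go through something like PowerShell remoting or schtasks.
// monitor.js: poll an endpoint every X minutes; if it doesn't respond,
// trigger a recovery action. URL, interval, and the recovery step are
// placeholders for illustration only.
const https = require('https');

const ENDPOINT = 'https://example.com/health'; // hypothetical endpoint
const INTERVAL_MS = 5 * 60 * 1000;             // "every X minutes"

function restartRemoteService(reason) {
  // Stub: in practice this would invoke PowerShell remoting or schtasks
  // against the remote machine to restart the service.
  console.log('Endpoint not responding, triggering remote restart:', reason);
}

function checkEndpoint() {
  const req = https.get(ENDPOINT, res => {
    res.resume(); // drain the body; only the status code matters here
    if (res.statusCode !== 200) restartRemoteService(`status ${res.statusCode}`);
  });
  req.setTimeout(10000, () => req.destroy(new Error('timeout')));
  req.on('error', err => restartRemoteService(err.message));
}

setInterval(checkEndpoint, INTERVAL_MS);
checkEndpoint(); // also run one check at startup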

Progressive Web App: skipWaiting() with multiple service worker versions

[CONTEXT]
I worked through Jake Archibald's fantastic Udacity course found here: Offline Web Applications. His work provides a toast dialog alerting the user that an update is available and inviting them to update:
[Screenshot: Refresh / Dismiss dialog]
While this dialog is available to the user, there's a corner case that I can't seem to resolve:
The service worker can be updated any number of times before the client updates the local instance, pushing the numbered version of the service worker past 'just one more'. For example, the current and active service worker is #821, while the service worker that is waiting is now #824.
[Screenshot: active and waiting service workers]
[PROBLEM]
I cannot find the right way to tell the browser that the next service worker to install needs to be #824 instead of #822; the dialog box and PWA tell me that the current worker is 'redundant', and that I can't get to service worker #824 without refreshing and then clicking the update button.
I can recreate this with any version of Jake's code once the service worker is set and skipWaiting() is introduced.
I just want to be able to cover the corner case where the service worker is updated two or more times before the user decides to update their local PWA.
You can find Jake's code on github: Jakearchibald/wittr
[ASK]: Has anyone found a solution for this corner case? If so, how do you solve it? What I'm seeing doesn't make sense, as the service worker lifecycle seems to be respected per Google's documentation: service-workers/lifecycle
I did quite a bit of additional reading/research and found the following discussion threads on Github:
- Provide an easier way to listen for waiting/activated/redundant Service Workers
- Immediate Service Worker
- Recommended Approach for Refreshing Page on new SW
- Provide a one-line way to listen for a waiting Service Worker
It looks like this idea was brought up in 2017 and has for the most part gone stale; the proposed one-liner navigator.serviceWorker.waiting never shipped as an API. However, you can get the same effect by checking the registration's waiting worker yourself (a one-time check shown here; the course code also listens for updatefound to catch workers that finish installing later):
navigator.serviceWorker.getRegistration().then(reg => {
  if (reg && reg.waiting && confirm('refresh now?')) {
    reg.waiting.postMessage('skipWaiting');
  }
});
That gives you a handle on whichever service worker is currently waiting: after service worker #1 activates and #2 becomes redundant, #3 moves into the waiting state, and reg.waiting points at it. It's obtuse and indirect, but at least you can now promote the #3 worker into the right slot.
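For completeness, here is a minimal sketch of the other half of that flow, along the lines of what the course code does (the 'skipWaiting' message string is just whatever your page and worker agree on): the service worker promotes itself when asked, and the page reloads once the new worker has taken control.
// In the service worker: promote this worker when the page asks for it.
self.addEventListener('message', event => {
  if (event.data === 'skipWaiting') {
    self.skipWaiting();
  }
});

// In the page: reload once the new service worker has taken control.
// The flag guards against reloading more than once.
let refreshing = false;
navigator.serviceWorker.addEventListener('controllerchange', () => {
  if (refreshing) return;
  refreshing = true;
  window.location.reload();
});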
A real shout-out to dfabulich, Matt Gaunt, and Beatrix Perez

Kubernetes Pod warm-up for load balancing

We have a Kubernetes service whose pods take some time to warm up on their first requests. Basically, the first incoming requests read some cached values from Redis, and these requests can take longer to process. When newly created pods become ready and receive full traffic, they can be fairly unresponsive for up to 30 seconds, until everything has been loaded from Redis and cached.
I know we should restructure the application to prevent this; unfortunately, that is not feasible in the near future (we are working on it).
It would be great if it were possible to reduce the weight of newly created pods, so they would receive 1/10 of the traffic in the beginning, with the weight increasing as time passes. This would also be great for newly deployed versions of our application, to see whether they behave correctly.
Why do you need the cache loading on the first call instead of in a heartbeat/warm-up step hooked to the readiness probe? One other option is to make use of init containers in Kubernetes.
Until the application can be restructured to do this "priming" internally...
When running on Kubernetes, look into Container Lifecycle Hooks and specifically into the PostStart hook (the Kubernetes documentation has both a description and an example).
It seems that the behavior "...The Container's status is not set to RUNNING until the postStart handler completes" is what can help you.
There are a few gotchas, like "...there is no guarantee that the hook will execute before the container ENTRYPOINT" because "...The postStart handler runs asynchronously relative to the Container's code", and "...No parameters are passed to the handler".
Perhaps a custom script can simulate that first request with some retry logic to wait for the application to be started?
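If you go that route, here is a minimal Node.js sketch of such a warm-up script, purely as an illustration; the localhost port (8080) and the /health path are assumptions, and how it gets wired in (postStart hook, init step, or sidecar) depends on your setup.
// warmup.js: keep hitting the app's own endpoint until it answers,
// so the expensive first-request caching happens before real traffic arrives.
// Port and path are assumptions; adjust to the actual service.
const http = require('http');

function ping() {
  return new Promise(resolve => {
    const req = http.get({ host: '127.0.0.1', port: 8080, path: '/health' }, res => {
      res.resume();                      // drain the response body
      resolve(res.statusCode === 200);   // treat 200 as "warmed up"
    });
    req.on('error', () => resolve(false)); // app not listening yet
  });
}

async function warmUp(retries = 60, delayMs = 1000) {
  for (let i = 0; i < retries; i++) {
    if (await ping()) return;                       // warm; exit cleanly
    await new Promise(r => setTimeout(r, delayMs)); // wait and retry
  }
  process.exit(1); // still cold after all retries; let the hook fail
}

warmUp();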

Unity hard real time synchronization

I need to synchronize a Unity app with a 3rd-party app where time synchronization is crucial (1-2 ms variance max).
The way this is done today (without Unity) is by getting priority from the OS scheduler with a dedicated app, which ensures a constant delay.
A constant delay is good enough, as it can be accounted for in the data analysis, which is not done in real time. Today the constant delay is measured once at the beginning.
Thanks in advance.
This kind of delay should be easy to achieve in a background thread.
Threads work well in Unity, despite common belief. The only thing you need to look out for is not to access Unity objects from the thread.
The easiest way to do this is to start a thread in MonoBehaviour.Start with its IsBackground property set to true (so you don't have to worry about it blocking your application's exit) and communicate to and from it with a message queue (for example, a List<Action> with locked access).

AppFabric Hosted Workflow does not always reload after delay/unload

I have a WCF Windows Workflow (4.5) Workflow Service hosted under IIS and using AppFabric 1.1. The workflow instances are long-running (up to about a week), but much of the time is spent in Delay activities.
This seemed to work fine at first, but when running multiple instances of the workflow at the same time (2+ instances causes this), some of them just never wake up once they've unloaded from memory during the Delay step. When I look at the logs, the errors I find all look like this:
System.OperationCanceledException: The execution of InstancePersistenceCommands has been canceled because the InstanceHandle was freed.
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Activities.Dispatcher.DurableInstanceManager.WaitAndHandleStoreEventsCallback(IAsyncResult result)
Unfortunately, I'm not finding any useful information on that error message.
The SuspensionExceptionName and SuspensionReason fields in the AppFabric Persisted Instances Table show System.NullReferenceException: Object reference not set to an instance of an object. But this doesn't happen inside my workflow, only outside.
Additional Info:
I'm running the activity as a Fire & Forget (receive activity, no send)
My workflow calls into other WCF services to fetch data.
I am running this on Server 2012 R2, IIS 8 (not azure)
Workflow Persistence is working. I can reset IIS, reboot... it's just when I run 2 instances that it has problems.
I'm definitely not hitting any kind of throttling limits. While the workflow deals with a few MB of data, this issue happens at 2+ instances.
Any idea what might be happening here?
Edit:
I realized I found more information on how the issue operates and never added it to the question. When the delay issue happens, it operates a lot like a static variable getting written by 2 threads.
Here's a visualization:
WF1 Start ---->Do Stuff--->Sleep------------*1----->Cancelled Exception at some point
------WF 2 Start---->Do Stuff------->Sleep->Wake up---------*2------>More Stuff---->End Successfully
*1 - When WF Instance 1 Should Wake up (Same time as WF 2 wakes)
*2 - When WF Instance 2 Should have woken up (Seems to be ignored)
Before anyone asks... I got rid of every static variable, method, class in my code. Nothing is static anymore.
I've been struggling with similar issues for quite a while. I use WFW4 and I find similar errors when a workflow instance is in a long delay.
I don't know what the cause of the problem is, but I have a work around that you might find helpful.
In my case, the errors I get are from Workflow Management Service and say:
Failed to invoke service management endpoint at 'net.pipe://.svc' to activate service '/Alerts/Workflows/.xamlx'. Exception: 'Access is denied.'
These errors start happening sometime between 6 and 30 hours after the instance goes into a long delay.
I have found that if I create a new instance of the workflow when the first instance is in delay and the errors are happening, then Workflow Management Service is able to resume interacting with the first sleeping instance.
So, I made a new workflow whose sole purpose is to periodically launch and then kill instances of the workflow that contains the long delay.
It actually gets a bit more complicated to make this work. I wanted this new workflow to also go to sleep between the times when it creates and kills an instance of the first workflow. But that going to sleep causes the instance of the new workflow to suffer the same problem as the first workflow. So, I modified the new workflow so it does the following:
-- delay for some rather short period, such as 30 minutes
-- create an instance of the first workflow
-- wait a minute
-- kill the just-created instance of the first workflow
-- create a new instance of this new error-preventing workflow
-- terminate
Since having done this, I no longer get the Access is Denied error from Workflow Management Service!
Hope this helps
Turns out my first answer was not correct, but I believe this answer is right, and solves the issue ChrisG is having.
My workaround did not actually work. It took a while for the problem to resurface: 29 hours to be precise, the default time it takes for an app pool to recycle.
So for me, the solution was to make my app pool not recycle. When an app pool recycles while a workflow instance is in a delay activity, the workflowManagementService is not able to wake up the instance and throws Access is Denied errors. If you create a new instance of the workflow after the app pool has recycled, the first instance will pick up where it left off, but sometimes still has problems, which is what I believe is happening to ChrisG.
ChrisG, looking at your visualization, is it possible that an app pool is recycling during the time WF1 is sleeping? I believe that is the cause of the exception. If you then launch a new WF instance after *2 has passed (and if an app pool recycle happened prior to *1), that will wake up both WF1 and WF2, but WF1 won't work properly (at least in my experience).
Also, this happens after iisresets and server reboots. To handle those, you need to use IIS 7, which allows the web application (as well as the web site) hosting the xamlx files to autostart after an iisreset or server reboot. This option is not available in IIS 6. See http://www.postseek.com/meta/991815402b369e71ce925cde47ac907d for details.
Hope this helps!