Azure Service Fabric does not cancel with lots of tasks

I'm trying to test out various performance aspects of Azure Service Fabric to understand how it all works, and I have hit some problems when cancelling a service.
In one particular test I create 101 tasks: 50 reading a queue, 50 writing a queue, and 1 showing progress and reporting it.
When the service is stopped, for example by re-deploying the application, I can see that the cancellation token receives the request and some tasks cancel, but I also see a lot of events in the event viewer basically saying:
AsyncCalloutAdapter-22542743: end delegate threw an exception
System.OperationCanceledException: Operation canceled. ---> System.Runtime.InteropServices.COMException: Operation aborted (Exception from HRESULT: 0x80004004 (E_ABORT))
at System.Fabric.Interop.NativeRuntime.IFabricStateReplicator2.EndReplicate(IFabricAsyncOperationContext context)
at System.Fabric.Interop.AsyncCallOutAdapter2`1.Finish(IFabricAsyncOperationContext context, Boolean expectedCompletedSynchronously)
--- End of inner exception stack trace ---
The only way to recover is to reset the local cluster.
With a smaller number of tasks it all seems to work OK.
This is all using a local development cluster.
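For reference, the kind of test described above might look roughly like this (a sketch only; the queue name, item type and task bodies are assumptions, not the actual code):

protected override async Task RunAsync(CancellationToken cancellationToken)
{
    var queue = await StateManager.GetOrAddAsync<IReliableQueue<long>>("testQueue");
    var tasks = new List<Task>();

    for (var i = 0; i < 50; i++)
    {
        // Writers: enqueue items until cancellation is requested.
        tasks.Add(Task.Run(async () =>
        {
            while (!cancellationToken.IsCancellationRequested)
            {
                using (var tx = StateManager.CreateTransaction())
                {
                    await queue.EnqueueAsync(tx, DateTime.UtcNow.Ticks);
                    await tx.CommitAsync();
                }
            }
        }, cancellationToken));

        // Readers: dequeue items until cancellation is requested.
        tasks.Add(Task.Run(async () =>
        {
            while (!cancellationToken.IsCancellationRequested)
            {
                using (var tx = StateManager.CreateTransaction())
                {
                    await queue.TryDequeueAsync(tx);
                    await tx.CommitAsync();
                }
            }
        }, cancellationToken));
    }

    // The 101st task reporting progress would be added in the same way.
    await Task.WhenAll(tasks);
}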

Related

Github Actions Concurrency Queue

Currently we are using GitHub Actions for CI for infrastructure.
The infrastructure is defined with Terraform, and a code change in a module triggers plan and deploy for the changed module only (so only the related modules are updated, e.g. one pod container).
Since an auto-update can be triggered by a push to another GitHub repository, several updates can arrive in roughly the same time frame, e.g. Pod A's image is updated and Pod B's image is updated.
Without any concurrency control in place, one of the actions will fail with a lock timeout, because Terraform holds the state lock.
After implementing concurrency, two pushes at the same time deploy fine, because the second one can wait for the first one to finish.
Yet if more pushes arrive, GitHub's concurrency only keeps the last push in the queue and cancels the other waiting runs (the in-progress one can still continue). This is logical from a single-application perspective, but since our infrastructure code relies on difference checks, a cancelled job means its deployment is silently skipped!
Is there a mechanism to queue workflows (or maybe even give a queue wait timeout) on GitHub Actions?
Eventually we wrote our own script in the workflow to wait for previous runs (a sketch of this loop is shown below):
Get information on the current run
Collect the previous, not yet completed runs
Wait in a loop until they are completed
Once the waiting loop exits, continue with the workflow
Tutorial on checking status of workflow jobs
https://www.softwaretester.blog/detecting-github-workflow-job-run-status-changes
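One possible shape of such a waiting step, sketched here in C# against the GitHub REST API. The environment variables are the ones GitHub Actions provides; the "earlier run" filter and the polling interval are assumptions, not the original script:

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

class WaitForPreviousRuns
{
    static async Task Main()
    {
        var token = Environment.GetEnvironmentVariable("GITHUB_TOKEN");
        var repo  = Environment.GetEnvironmentVariable("GITHUB_REPOSITORY"); // "owner/repo"
        var runId = long.Parse(Environment.GetEnvironmentVariable("GITHUB_RUN_ID"));

        using var http = new HttpClient();
        http.DefaultRequestHeaders.UserAgent.ParseAdd("wait-for-previous-runs");
        http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token);
        http.DefaultRequestHeaders.Accept.ParseAdd("application/vnd.github+json");

        while (true)
        {
            var earlierActiveRuns = 0;

            // Collect runs that are still queued or in progress and that were
            // started before the current run (lower run id).
            foreach (var status in new[] { "queued", "in_progress" })
            {
                var json = await http.GetStringAsync(
                    $"https://api.github.com/repos/{repo}/actions/runs?status={status}&per_page=100");
                using var doc = JsonDocument.Parse(json);
                foreach (var run in doc.RootElement.GetProperty("workflow_runs").EnumerateArray())
                {
                    if (run.GetProperty("id").GetInt64() < runId)
                        earlierActiveRuns++;
                }
            }

            // No earlier runs left: exit the waiting loop and let the workflow continue.
            if (earlierActiveRuns == 0)
                break;

            await Task.Delay(TimeSpan.FromSeconds(30));
        }
    }
}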

Error while deleting stateless service in Service Fabric Cluster

In a Service Fabric cluster I have a stateless service whose RunAsync method runs a while(true) loop continuously. Because of this loop I am finding it hard to delete the application from the cluster: every time I try to delete it, an error occurs saying the process cannot be detached. Normally I deploy the application once just to remove the code, so to redeploy the code on top of the application I have to deploy twice. Is there a workaround for this without removing the infinite while loop?
Update: RunAsync method
protected override async Task RunAsync(CancellationToken cancellationToken)
{
    // making sure the thread is active
    while (true)
    {
        // do something
    }
}
Thank you for the input.
During shutdown, the cancellation token passed to RunAsync is canceled.
You need to check the cancellation token's IsCancellationRequested property in your main loop and exit when it becomes true. Alternatively, call the token's ThrowIfCancellationRequested method, which throws an OperationCanceledException once cancellation has been requested.
If your service does not respond to these API calls in a reasonable amount of time, Service Fabric can forcibly terminate your service. Usually this only happens during application upgrades or when a service is being deleted. This timeout is 15 minutes by default.
See this document for a good reference: https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-reliable-services-lifecycle#stateless-service-shutdown
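For illustration, a minimal cancellation-aware version of the loop above might look like this (a sketch; the work inside the loop is a placeholder):

protected override async Task RunAsync(CancellationToken cancellationToken)
{
    while (true)
    {
        // Throws OperationCanceledException once Service Fabric requests shutdown,
        // which lets the runtime stop the service instead of killing the process.
        cancellationToken.ThrowIfCancellationRequested();

        // ... do the actual work here ...

        // Yield between iterations; this also observes cancellation.
        await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
    }
}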

The target server failed to respond for multiple iterations in Jmeter

In my JMeter script I am getting an error on the 2nd iteration.
With multiple users and a single iteration no errors were observed, but with multiple iterations I get an error with the message below:
Response code: Non HTTP response code: org.apache.http.NoHttpResponseException
Response message: Non HTTP response message: The target server failed to respond
Response data is The target server failed to respond
Could you please suggest what could be the reason behind this error?
Thanks in advance.
Most likely your server becomes overloaded. As for the possible reason, my expectation is that a single iteration does not deliver the full concurrency, because JMeter acts like this:
JMeter starts all the virtual users within the specified ramp-up period
Each virtual user starts executing samplers
When there are no more samplers to execute and no loops to iterate - the thread is being shut down
So with 1 iteration you may run into a situation where some threads have already finished their job while others have not been started yet. When you add more iterations, the "old" threads start over while "new" ones are still arriving. The situation is explained in the "JMeter Test Results: Why the Actual Users Number is Lower than Expected" article, and you can monitor the actual delivered load using the Active Threads Over Time chart of the HTML Reporting Dashboard or the Active Threads Over Time listener available via JMeter Plugins.
To get to the bottom of the failure I would recommend checking the following:
components logs on the application under test side (application logs, application/web server logs, database logs)
application under test baseline health metrics (CPU, RAM, disk, etc.). You can use the JMeter PerfMon Plugin; this way you will be able to correlate increasing load with resource consumption

Azure Service Fabric Actor Retry Logic on Exception

I'm working on an Azure Service Fabric solution.
I have implemented an Actor with some state, and everything works fine.
Can anyone explain what happens if a user exception is thrown in my Actor implementation? What does the Service Fabric environment do if a call to an actor throws an exception? Is there any default retry logic that forces the call to be made again?
If an actor throws an exception, it gets handled inside ActorRemotingExceptionHandler or another default implementation of IExceptionHandler. Currently, if the exception is an ordinary exception that is not related to network issues or cluster/node availability, it will be rethrown on the client side, where you will be able to handle it.
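Since there is no built-in retry for such exceptions, the caller decides how to react when the exception surfaces on the client side. A sketch, where IMyActor, DoWorkAsync, the caught exception type, and the application name fabric:/MyApp are all hypothetical:

using System;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Client;

// Hypothetical actor interface used only for this example.
public interface IMyActor : IActor
{
    Task DoWorkAsync();
}

public static class ActorCallExample
{
    public static async Task CallWithRetryAsync()
    {
        var proxy = ActorProxy.Create<IMyActor>(new ActorId("some-actor"), "fabric:/MyApp");

        for (var attempt = 1; attempt <= 3; attempt++)
        {
            try
            {
                await proxy.DoWorkAsync();
                return;
            }
            catch (InvalidOperationException) when (attempt < 3)
            {
                // An ordinary exception thrown inside the actor is rethrown here;
                // Service Fabric does not retry it for you, so the client decides.
                await Task.Delay(TimeSpan.FromSeconds(1));
            }
        }
    }
}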

Azure Service Fabric

Please help me to understand: is there any option in Azure Service Fabric to delay deprovisioning? I have a microservice application hosted in the fabric, distributed across different nodes as instances. If I try to disengage/deprovision a service from the portal, can Service Fabric internally check whether any transaction is in progress on any of the instances and, if so, wait for it to complete? I also want to know: if Microsoft does not provide such a feature, is there a PowerShell command to check the instance status?
Thanks
I assume that by "disengage/deprovision the service from portal" you are referring to deleting the service via the Service Fabric Explorer web app (perhaps via a link followed from the portal). Please correct me if this is wrong.
To answer your question directly, the framework will not wait for in-flight operations to complete during a service delete. Every replica for the service will lose its read and write permissions, causing all in-flight operations to fail. We do not offer a way to stall during this step in order to, for example, allow currently open transactions to be completed.
The reason we do not offer this semantic is that service deletion is expected to be rare and permanent, and delaying deletion for the final operation doesn't enable any additional scenarios. If a client is attempting operations on a service being deleted, then either:
The last client operation may fail because the delete races with it and revokes read/write permissions
Every subsequent client operation will fail because the service no longer exists
or
The last client operation will succeed because deletion was delayed
Every subsequent client operation will still fail because the service no longer exists
The expectation is that any client or dependent service should have already been updated or deleted prior to deleting the service they depend on, as you are making the permanent decision that this service should no longer exist.