In short: is there a way to specify a service/application startup dependency in Azure Service Fabric?
I have two services, say S1 and S2. S2 depends on S1 and must be started after S1 starts. Currently S1 and S2 are in different application packages, but I can put them into one application package if necessary.
It works if I start S1 first and then S2 during deployment. However, it seems that Service Fabric performs some maintenance work, during which services get restarted. The problem is that the order in which S1 and S2 start is then not guaranteed, which causes S2 to fail to read some configuration during initialization. S2 fails silently but keeps running.
In Service Fabric there is a way to specify a "SetupEntryPoint"; however, in this case S1 already has a "SetupEntryPoint", and besides, I feel it is not appropriate to put a long-running service under "SetupEntryPoint".
I'm also thinking about making S2 stop when it fails to read its configuration from S1; in that case Service Fabric will keep trying to restart S2 until S1 has started.
But is there any way to guarantee that S2 starts after S1 through Service Fabric configuration?
I also faced the same problem. I don't think it is possible via the Service Fabric config alone.
The solution I eventually came up with was to redesign the service so that it won't finish starting up unless the service it depends on is up. In my case, service A depends on service B. If service A boots first, it sends a message to the message bus and waits for service B to reply. If service B has already started, it replies directly; if it hasn't, it replies once it has started. When service A receives the reply, it simply continues with its startup. That way it is always able to finish starting properly.
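The handshake described above can be sketched in Python with asyncio. This is a language-agnostic illustration under stated assumptions, not Service Fabric code: in-memory asyncio.Queue objects stand in for the message bus, and service_a/service_b are hypothetical names. Because A's request waits in the queue until B consumes it, A finishes startup regardless of which service boots first.

```python
import asyncio

# Stand-in for a message bus (hypothetical): one shared queue of "are you up?"
# requests, each carrying a private reply queue. A real system would use a
# broker such as Service Bus or RabbitMQ instead.

async def service_b(requests: asyncio.Queue, start_delay: float, log: list) -> None:
    await asyncio.sleep(start_delay)          # simulate B's own startup work
    log.append("B started")
    while True:
        reply_to = await requests.get()       # requests sent before B was up are still queued
        await reply_to.put("B ready")

async def service_a(requests: asyncio.Queue, log: list, timeout: float = 5.0) -> None:
    log.append("A waiting for B")
    reply_to: asyncio.Queue = asyncio.Queue()
    await requests.put(reply_to)              # "fire a message to the message bus"
    reply = await asyncio.wait_for(reply_to.get(), timeout)
    log.append(f"A got '{reply}', finishing startup")

async def main(b_delay: float) -> list:
    log: list = []
    requests: asyncio.Queue = asyncio.Queue()
    b_task = asyncio.create_task(service_b(requests, b_delay, log))
    await service_a(requests, log)            # A blocks here until B has replied
    b_task.cancel()
    try:
        await b_task
    except asyncio.CancelledError:
        pass
    return log

if __name__ == "__main__":
    print(asyncio.run(main(b_delay=0.1)))
```

The timeout on wait_for is a design choice: if B never appears, A fails loudly instead of hanging forever, which also lets the host restart it.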
Another, simpler way: if service B is down, just throw an exception in service A. Service Fabric will try to start it again, possibly on another node, and hopefully by then service B is up. The downside is that the initial boot will be painfully slow (it keeps retrying until the right order occurs), and you'll also see a lot of errors (which eventually go away once all services have finished starting).
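A minimal sketch of this fail-fast alternative, assuming a hypothetical health check for service B. The supervise function only simulates a host's restart loop with exponential backoff so the example runs on its own; Service Fabric applies its own restart policy, and none of these names are Service Fabric APIs.

```python
import time
from typing import Callable, Tuple

class DependencyNotReady(Exception):
    """Raised by service A when service B is not reachable yet."""

def start_service_a(is_b_up: Callable[[], bool]) -> str:
    # Fail fast instead of running on in a half-initialized state.
    if not is_b_up():
        raise DependencyNotReady("service B is down")
    return "A running"

def supervise(start: Callable[[], str], max_attempts: int = 6,
              base_delay: float = 0.01,
              sleep: Callable[[float], None] = time.sleep) -> Tuple[str, int]:
    """Hypothetical stand-in for the host: retry start() with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return start(), attempt
        except DependencyNotReady as exc:
            print(f"attempt {attempt} failed: {exc}")
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))   # 0.01 s, 0.02 s, 0.04 s, ...
    raise AssertionError("unreachable")

if __name__ == "__main__":
    checks = iter([False, False, True])              # B comes up on the third check
    print(supervise(lambda: start_service_a(lambda: next(checks)), sleep=lambda s: None))
```

Exponential backoff keeps the retry storm, and with it the error noise, bounded while B is still starting.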
I have a backend (BE) service in NestJS that is deployed on Vercel.
I need several schedulers, so I have used the @nestjs/schedule lib, which is super easy to use.
Locally, everything works perfectly.
For some reason, the only thing that is not working in my production environment is the schedulers. Everything else works: endpoints, database access, etc.
Does anyone have an idea why? Is it something with my deployment? Maybe Vercel has some issue with that? Maybe this schedule library requires something that Vercel doesn't have?
I'm clueless.
Cold boot is the process of starting a computer from shutdown or a powerless state and setting it to normal working condition.
This means that code you deploy in a serverless manner runs when the endpoint is called. The platform spins up a virtual machine to execute your code and keeps it running for a certain period of time in case another API request arrives. It is cheaper and easier for the provider to keep the machine running for, say, 60 seconds or 5 minutes than to start it from scratch on every call after shutting it down when the function finishes executing.
So in your case, what is most likely happening is that the machine on which you set up the cron is killed after a period of time. Scheduled jobs like these run inside your application's process (@nestjs/schedule registers timers in the Node.js process; it does not hand them to the operating system), so when the machine is shut down, the jobs die with it. The only case in which a job would run is if its scheduled time came before the machine was shut down.
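To make the timing argument concrete, here is a small Python model. It is a deliberate simplification under assumptions: each warm period is a fixed [start, end) interval in minutes, and no instance exists outside those intervals. An hourly job only fires if some instance happens to be alive at that exact minute.

```python
from typing import List, Tuple

def fired_jobs(fire_times: List[int], warm_windows: List[Tuple[int, int]]) -> List[int]:
    """Return the scheduled minutes at which an in-process scheduler actually gets to run.

    A timer can only fire while some instance is alive, i.e. when the scheduled
    minute falls inside one of the [start, end) warm windows.
    """
    return [t for t in fire_times
            if any(start <= t < end for start, end in warm_windows)]

if __name__ == "__main__":
    hourly = list(range(0, 24 * 60, 60))   # a job scheduled every hour, in minutes
    bursts = [(125, 130), (610, 615)]      # two requests, each keeping an instance warm ~5 min
    print(fired_jobs(hourly, bursts))      # → []
```

With two short bursts of traffic, none of the 24 hourly runs land inside a warm window, which matches the "works locally, silent in production" symptom.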
Certain cloud providers give you the option to keep machines alive. I remember that Google Cloud used to work such that if a serverless function was called frequently, it shifted from a cold start to a warm start, which doesn't kill the machine entirely; as long as you have traffic, the machines stay alive.
From some quick research, Vercel isn't the best at handling crons due to the nature of its infrastructure, which explains what you are seeing. In general, crons aren't a good fit for serverless functions. You can run the scheduled jobs using queues, for example, or another third-party service; check out this link by Vercel.
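One way to follow that advice is to make each scheduled job an ordinary, stateless HTTP endpoint and let something outside the function (a queue consumer, a third-party cron service, or a platform-provided scheduler) call it on schedule. Below is a minimal Python sketch of such a handler. The /cron/<job> path layout and the Bearer-token check against a shared secret are assumptions for illustration, not any specific provider's contract; in a NestJS app the same logic would live in a controller method.

```python
import hmac
from typing import Callable, Dict, Mapping

def handle_cron_request(path: str, headers: Mapping[str, str], secret: str,
                        jobs: Dict[str, Callable[[], None]]) -> int:
    """Stateless handler that an external scheduler calls, e.g. GET /cron/cleanup.

    Returns an HTTP status code. Header names are assumed to be lower-cased
    by the web framework; the Bearer scheme is a hypothetical convention.
    """
    expected = f"Bearer {secret}".encode()
    provided = headers.get("authorization", "").encode()
    if not hmac.compare_digest(provided, expected):   # constant-time comparison
        return 401
    job = jobs.get(path.removeprefix("/cron/"))
    if job is None:
        return 404
    job()   # nothing persists between calls; must finish within the function timeout
    return 200

if __name__ == "__main__":
    jobs = {"cleanup": lambda: print("running cleanup")}
    print(handle_cron_request("/cron/cleanup", {"authorization": "Bearer s3cret"}, "s3cret", jobs))
```

Because the handler keeps no timers of its own, it does not matter whether the instance was warm or cold when the scheduler calls it.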
I have a stateless service that pulls messages from an Azure queue and processes them. The service also starts some threads in charge of cleanup operations. We recently ran into an issue where these threads, which should have been stopped when the service shut down, remained active (definitely a bug in our service shutdown process).
Looking further at the logs, it seemed that the RunAsync method's cancellation token received a cancellation request, and later, within the same process, a new instance of the stateless service that was registered with ServiceRuntime.RegisterServiceAsync was created.
Is it expected behavior that Service Fabric can reuse the same process to start a new instance of the stateless service after shutting down the current instance?
The Service Fabric documentation https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-hosting-model does seem to suggest that different services can be hosted in the same process, but I could not find the above behavior mentioned there.
In the shared process model, there's one process per ServicePackage instance on every node. Adding or replacing services will reuse the existing process.
This is done to save some (process-level) overhead on the node that runs the service, which makes hosting more cost-efficient. It also enables port sharing between services.
You can change this (default) behavior by configuring the 'exclusive process' mode in the application manifest, so that every replica/instance runs in a separate process:
<Service Name="VotingWeb" ServicePackageActivationMode="ExclusiveProcess">
  <!-- the existing StatelessService/StatefulService element stays here unchanged -->
</Service>
As you mentioned, you can monitor the CancellationToken passed to RunAsync and use it to stop the separate worker threads, so they end when the service instance stops.
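The lifecycle issue in the question can be modeled in Python. Here threading.Event stands in for the CancellationToken; this is an analogy, not the Service Fabric API, and StatelessInstance is a hypothetical name. The key point is that close() both signals and joins the worker threads, so when the host starts a new instance in the same shared process, no threads from the old instance are left running.

```python
import threading
from typing import List

class StatelessInstance:
    """Python analogue of a stateless service instance.

    `cancel` plays the role of the CancellationToken passed to RunAsync. Every
    worker thread must watch it, because the host may start a *new* instance in
    the same process after this one has been shut down.
    """

    def __init__(self, name: str, workers: int = 2) -> None:
        self.name = name
        self.cancel = threading.Event()
        self.threads: List[threading.Thread] = [
            threading.Thread(target=self._cleanup_loop, name=f"{name}-cleanup-{i}", daemon=True)
            for i in range(workers)
        ]

    def _cleanup_loop(self) -> None:
        # wait() returns True as soon as cancellation is requested, ending the loop
        while not self.cancel.wait(timeout=0.01):
            pass  # periodic cleanup work would go here

    def run(self) -> None:
        for t in self.threads:
            t.start()

    def close(self, timeout: float = 1.0) -> None:
        self.cancel.set()        # "token cancelled"
        for t in self.threads:
            t.join(timeout)      # don't return until the workers have actually stopped

if __name__ == "__main__":
    old = StatelessInstance("v1"); old.run(); old.close()
    new = StatelessInstance("v2"); new.run()       # same process, fresh instance
    print([t.name for t in threading.enumerate() if t.name.startswith("v1-")])
    new.close()
```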
More info about the application model here, and about configuring the process activation mode.
I'm going to write a quick little SF service to report endpoints from a service to a load balancer. That part is easy and well understood: FabricClient, find services, discover endpoints, do stuff with the load balancer API.
But I'd like to be able to handle a graceful drain-and-shutdown situation: basically, catch and block the shutdown of an SF service until my app has had a chance to drain connections to it from the pool.
There's no API I can find to accomplish this, but I bet one of the system services would let me do it: Resource Manager, Cluster Manager, whatever.
Is anybody familiar with how to pull this off?
From what I know this isn't possible in a way you've described.
A Service Fabric service can be shut down for multiple reasons: rebalancing, errors, outages, upgrades, etc. Depending on the type of service (stateful or stateless), the shutdown routine differs slightly (see more), but in general, if the service replica is shut down gracefully, the OnCloseAsync method is invoked. Inside this method the replica can perform a safe cleanup. There is also a second case: when the replica is forcibly terminated, the OnAbort method is called, and the documentation makes no clear statements about the guarantees you have inside OnAbort.
Going back to your case, I can suggest the following pattern:
When a replica is about to shut down, inside OnCloseAsync or OnAbort it calls lbservice and reports that it is going to shut down.
lbservice then reconfigures the load balancer to exclude this replica from request processing.
The replica completes all requests already in progress and shuts down.
Please note that you would need to implement a startup mechanism too, i.e. when a replica starts, it reports to lbservice that it is now active.
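The drain pattern from these steps can be sketched in Python. LbService and Replica are hypothetical stand-ins for the lbservice and a Service Fabric replica, and close() plays the role of OnCloseAsync. Because the forced OnAbort path may not get a chance to drain at all, the load balancer should still cope with a replica that disappears without reporting.

```python
import threading
from typing import Set

class LbService:
    """Hypothetical 'lbservice': the set of replicas the load balancer may route to."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active: Set[str] = set()

    def report_active(self, replica_id: str) -> None:
        with self._lock:
            self.active.add(replica_id)

    def report_closing(self, replica_id: str) -> None:
        with self._lock:
            self.active.discard(replica_id)   # reconfigure LB: no new requests to this replica

class Replica:
    def __init__(self, replica_id: str, lb: LbService) -> None:
        self.replica_id = replica_id
        self.lb = lb
        self._in_flight = 0
        self._idle = threading.Condition()

    def open(self) -> None:
        # startup: announce that this replica can take traffic
        self.lb.report_active(self.replica_id)

    def begin_request(self) -> None:
        with self._idle:
            self._in_flight += 1

    def end_request(self) -> None:
        with self._idle:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def close(self, drain_timeout: float = 5.0) -> bool:
        """Analogue of OnCloseAsync: deregister, then wait for in-flight requests.

        Returns True if fully drained, False if the timeout expired first.
        """
        self.lb.report_closing(self.replica_id)
        with self._idle:
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout=drain_timeout)

if __name__ == "__main__":
    lb = LbService(); r = Replica("r1", lb); r.open()
    r.begin_request()
    threading.Timer(0.05, r.end_request).start()   # the last request finishes shortly
    print(r.close(), lb.active)
```

The drain timeout is a design choice: the host will not wait indefinitely, so it is better to give up explicitly than to block shutdown forever.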
In the meantime, I'd like to note that Service Fabric already implements this mechanism. Here is an example of how API Management can be used with Service Fabric, and here is an example of how the Reverse Proxy can be used to access Service Fabric services from the outside.
EDIT 2018-10-08
To receive notifications about service endpoint changes in general, you can try the FabricClient.ServiceManagementClient.ServiceNotificationFilterMatched event.
There is a similar situation solved in this question.
I'm a bit confused by this configuration. My Spring Boot app with @EnableDiscoveryClient has spring.cloud.consul.host set to localhost. I'm running a Consul agent on the host where my Boot app is running, but I have a few questions (I can't seem to find the answers in the documentation).
Can this config accept multiple values?
If so, I'd prefer to set the values to a list of Consul server addresses (but then, what's the point of running Consul agents at all? So this doesn't seem practical, which means I'm not understanding something here.)
If not, are we expected to run a Consul agent on every node where a Boot app with @EnableDiscoveryClient is running? (This feels wrong as well; for one, it would seem like a single point of failure, even though one agent should be able to tell everything about the cluster. What if I can't contact this one agent?)
What's the best practice for this configuration?
Actually, this is exactly what Consul itself solves for you. An agent runs on every server to handle clustering, failures, data sharing, autodiscovery, etc., so that you don't need to know the other hosts in your Spring Boot configuration. The Spring Boot app always connects to the agent running on the same machine.
See https://www.consul.io/docs/agent/basics.html
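To illustrate what "always connect to the local agent" means, here is a Python sketch that queries the agent's HTTP API directly. It assumes the agent listens on Consul's default HTTP port 8500 and uses the /v1/health/service/<name> endpoint with the passing filter. It is not the Spring Cloud Consul API, which performs equivalent lookups for you. Because the local agent forwards queries to the Consul servers, the application needs only this one address.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

def health_url(service: str, host: str = "localhost", port: int = 8500,
               passing: bool = True) -> str:
    # Talk to the *local* agent; it forwards the query to the Consul servers.
    query = "?passing=true" if passing else ""
    return f"http://{host}:{port}/v1/health/service/{urllib.parse.quote(service)}{query}"

def healthy_instances(service: str, **kwargs) -> List[Tuple[str, int]]:
    """Return (address, port) pairs for instances whose health checks pass."""
    with urllib.request.urlopen(health_url(service, **kwargs), timeout=2) as resp:
        entries = json.load(resp)
    # An empty Service.Address means "use the node's address".
    return [(e["Service"]["Address"] or e["Node"]["Address"], e["Service"]["Port"])
            for e in entries]

if __name__ == "__main__":
    print(health_url("orders"))   # requires no running agent; healthy_instances() does
```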
Please help me understand: is there any option in Azure Service Fabric to delay deprovisioning? I have a microservice application hosted in Service Fabric whose instances are distributed across different nodes. If I try to disengage/deprovision the service from the portal, can Service Fabric internally check whether a transaction is in progress on any of the instances, and if so, will it wait for it to complete? I'd also like to know: if Microsoft does not provide such a feature, is there a PowerShell command to check the instance status?
Thanks
I assume that by "disengage/deprovision the service from portal" you are referring to deleting the service via the Service Fabric Explorer web app (perhaps via a link followed from the portal). Please correct me if this is wrong.
To answer your question directly, the framework will not wait for in-flight operations to complete during a service delete. Every replica for the service will lose its read and write permissions, causing all in-flight operations to fail. We do not offer a way to stall during this step in order to, for example, allow currently open transactions to be completed.
The reason we do not offer this semantic is that service deletion is expected to be rare or permanent, and delaying deletion for the final operation doesn't enable any additional scenarios. In either case, if a client is attempting operations on a service being deleted, either:
The last client operation may fail because the delete races with it and revokes read/write permissions
Every subsequent client operation will fail due to the service no longer existing
or
The last client operation will succeed due to deletion being delayed
Every subsequent client operation will fail due to the service no longer existing
The expectation is that any client or dependent service should have already been updated or deleted prior to deleting the service they depend on, as you are making the permanent decision that this service should no longer exist.