I have an event-sourced CQRS architecture hosted in Kubernetes. I have a single writer (the "denormalizer service") that listens to events and writes the denormalized views to a datastore. These views are then served by a separate view service. When the denormalizer image is updated through a deployment with new projections, it replays all events from the beginning and writes the new views to a different datastore.
So I need two instances of the denormalizer: one with the old code, and another replaying the events through the new code. When the new code has finished replaying I need to:
1) signal to the view service to switch to the newly written datastore and then,
2) bring down the old denormalizer deployment as it is no longer required.
The problem is that (to my limited knowledge) Kubernetes seems ill-equipped to deal with this scenario.
Any idea how I would do something like this?
I don't know the specifics of your system, but two solutions come to mind.
Using readinessProbe
You can define a readinessProbe for your writer service. Make it report that the service is ready only when the replay is done. Then the rolling update will know when to shut down the old version of the writer and start sending traffic to the new one. The only additional thing you'd need is to notify the viewer to switch to the new data source. This could be done by the writer calling some API on the viewer service.
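For illustration, a minimal sketch of what such a readiness endpoint inside the denormalizer might look like, assuming a plain JVM service; the class name and port are made up for the example:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical readiness endpoint for the denormalizer: it reports 503 while the
// replay is still running and 200 once the new views are fully written, so the
// Deployment's readinessProbe only marks the new pod ready when switching is safe.
public class ReadinessEndpoint {

    private static final AtomicBoolean replayFinished = new AtomicBoolean(false);

    public static void start() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/ready", exchange -> {
            int status = replayFinished.get() ? 200 : 503;
            byte[] body = (status == 200 ? "ready" : "replaying").getBytes();
            exchange.sendResponseHeaders(status, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }

    // Call this from the replay loop once the last event has been projected.
    public static void markReplayFinished() {
        replayFinished.set(true);
    }
}
```

The readinessProbe itself would then be an httpGet on /ready, with a generous failureThreshold or initialDelaySeconds, since a full replay can take a long time.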
Using a separate process
You can create a special process that executes the procedure you described using the Kubernetes API. It is more work than the first solution, but it gives you more control over the whole process. It would watch your image repository for new versions of the writer; when one appears, it would start a new service, wait for it to be ready, kill the old writer, and notify the viewer.
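For what it's worth, a rough sketch of the cut-over part of such a process, assuming the Fabric8 Kubernetes Java client; the namespace, the deployment names and the view service's /admin/switch-datastore endpoint are made up for the example. It follows the order from the question: wait for the new writer, switch the viewer, then retire the old writer.

```java
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.TimeUnit;

// Hypothetical cut-over controller: waits for the replaying denormalizer deployment
// to become ready, tells the view service to switch datastores, then scales the old
// denormalizer down to zero. All names and the switch endpoint are assumptions.
public class CutoverController {

    public static void main(String[] args) throws Exception {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // 1. Wait until the replaying denormalizer reports ready (its readinessProbe).
            client.apps().deployments()
                  .inNamespace("cqrs")
                  .withName("denormalizer-new")
                  .waitUntilReady(6, TimeUnit.HOURS);

            // 2. Tell the view service to switch to the newly written datastore.
            HttpRequest switchRequest = HttpRequest.newBuilder()
                    .uri(URI.create("http://view-service.cqrs.svc.cluster.local/admin/switch-datastore"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString("{\"datastore\":\"views-v2\"}"))
                    .build();
            HttpClient.newHttpClient().send(switchRequest, HttpResponse.BodyHandlers.ofString());

            // 3. Retire the old denormalizer now that it is no longer needed.
            client.apps().deployments()
                  .inNamespace("cqrs")
                  .withName("denormalizer-old")
                  .scale(0);
        }
    }
}
```

Running this as a Job or a small in-cluster controller keeps the whole switch automated and observable.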
Related
I've been looking at using Service Fabric as a platform for a new solution that we are building, and I am getting hung up on data / state management. I really like the concept of reliable services and the actor model, and as we have started to prototype out some things it seems to be working well.
With that being said, I am getting hung up on state management and how I would use it in a 'real' project. I am also a little concerned with how the data feels like a black box that I can't interrogate or manipulate directly if needed. A couple of scenarios I've thought about are:
How would I share state between two developers on a project? I have an Actor, and as long as I am debugging the actor my state is maintained, replicated, etc. However, when I shut it down, the state is all lost. More importantly, someone else on my team would need to set up the same data as I do; this is fine for transactional data, but certain 'master' data should just be constant.
Likewise, I am curious about how I would migrate data changes between environments. We periodically pull production data down from our SQL Azure instance today to keep our test environment fresh, and we also push changes up from time to time depending on the requirements of the release.
I have looked at the backup and restore process, but it feels cumbersome, especially in the development scenario. Asking someone to restore (or scripting the restore of) every partition of every stateful service seems like quite a bit of work.
I think that the answer to both of these questions is that I can use the stateful services, but I need to rely on an external data store for anything that I want to retain. The service would check for state when it was activated and use the stateful service almost as a write-through cache. I'm not suggesting that this needs to be a uniform design choice, more of a service-by-service decision, depending on the service's needs.
Does that sound right, am I overthinking this, missing something, etc?
Thanks
Joe
If you want to share Actor state between developers, you can use a shared cluster (in Azure or on-premises). Make sure you always do upgrade-style deployments, so state will survive. State is persisted if you configure the Actor to do so.
You can migrate data by doing a backup of all replicas of your service and restoring them on a different cluster (have the service running and trigger data loss). It's cumbersome, yes, but at this time it's the only way (or store state externally).
Note that state is safe in the cluster: it's stored on disk and replicated. There's no need to have an external store, provided you do regular state backups and keep them outside the cluster. Stateful services can be more than just caches.
First of all, this is a question regarding my thesis for school. I have done some research on this; it seems like a problem that hasn't been tackled yet (it might not be that common).
Before jumping right into the problem, I'll give a brief example of my use case.
I have multiple namespaces containing microservices that depend on a state X. To manage this, the microservices are put in a namespace named after the state (so namespaces state_A, state_B, ...).
It is important to know that each microservice needs this state at startup of the service; it will download the necessary files and so on according to the state. When launching it with state A version 1, it is very likely that the state gets updated every month. When this happens, it is important to let all the microservices that depend on state A upgrade whatever is necessary (databases, in-memory state, ...).
My current approach for this problem is simply using events: the microservices that need updates when the state changes can subscribe to the event and migrate/upgrade accordingly. The only problem I'm facing is that the service should keep working while it is upgrading. So somehow I should duplicate the service first, let the duplicate upgrade, and when the upgrade is successful, shut down the original. Because of this, the orchestration service used would have to be able to create duplicates (including duplicating the state).
My question is, are there already solutions for my problem (and if yes, which ones)? I have looked into Netflix Conductor (which seemed promising with its workflows and events), Amazon SWF, Marathon and Kubernetes, but none of them covers my problem.
Ideally, the existing solution should not be bound to a specific platform (Azure, GCE, ...).
For an uninterrupted upgrade you should use clusters of nodes providing your service and perform a rolling update, which takes out a single node at a time and upgrades it, leaving the rest of the nodes to continue serving. I recommend looking at the concept of virtual services (e.g. in Kubernetes) and rolling updates.
For initializing state I would recommend looking into container initialization mechanisms. For example, in Docker you can use entrypoint scripts, and in Kubernetes there is the concept of init containers. You should note, though, that today there is a trend to decouple services and state, meaning the state is kept in a DB that is separate from the service deployment, allowing you to view the service as a stateless component that can be replaced without losing state (provided the interface between the service and the required state did not change). This is good in scenarios where the service changes more frequently and the DB design less frequently.
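As a rough illustration of the entrypoint-style initialization (an init container would do the same work in a separate container before the main one starts), here is a minimal sketch of a service that pulls its state bundle before it starts serving; the STATE_URL environment variable and the local path are made up for the example:

```java
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical startup bootstrap: before the microservice starts serving, it downloads
// the state bundle for its namespace (state_A, state_B, ...) from a URL passed in via
// the environment, mirroring what an entrypoint script or init container would do.
public class StateBootstrap {

    public static void main(String[] args) throws Exception {
        String stateUrl = System.getenv("STATE_URL");        // e.g. set per namespace
        Path target = Path.of("/var/lib/app/state.bundle");  // assumed local path
        Files.createDirectories(target.getParent());

        try (InputStream in = URI.create(stateUrl).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }

        // Only start the actual service once the state is in place.
        startService(target);
    }

    private static void startService(Path stateBundle) {
        // Placeholder for the real service startup (HTTP server, message consumer, ...).
        System.out.println("Service starting with state from " + stateBundle);
    }
}
```

Keeping this step separate from the serving code is what makes the duplicate-then-switch upgrade you describe possible: a new replica can bootstrap the new state while the old one keeps serving.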
Another note - I am not sure that representing state in a namespace is a good idea. Typically a namespace is a static construct for organization (of code, services, etc.) that aims for stability.
I should be able to copy renditions of the asset from the worker instance to the master instance and then delete the asset in the worker instance
using the DAM Update Asset offloading workflow.
In my opinion it's not good practice to modify the Update Asset workflow on the worker instance -
This whole offloading mechanism is based on Sling Discovery and eventing, which requires the offloaded asset to be sent back (read: reverse replication) to the Leader instance.
Adding a step within the Update Asset workflow may cause issues with reverse replication of the asset.
You will have to build something independent of the offloading process to achieve this deletion. There are multiple ways to do it -
One possible way -
Have a JMS-based implementation to monitor reverse replication
If reverse replication is successful, either delete the asset or mark the asset for deletion (highly recommended)
If following the approach of marking assets for deletion, set up a cleanup task to run only on the worker instance (scheduled for a convenient time). This cleanup task identifies the assets marked for deletion and processes them.
IMHO marking assets for deletion is the better approach as it's more performant and efficient: all assets are processed at once during off-peak time.
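To make the cleanup-task idea concrete, here is a minimal sketch using a Sling scheduled OSGi component; the markedForDeletion flag on the asset node and the query are conventions invented for this example, not part of any AEM API:

```java
import java.util.Iterator;
import javax.jcr.query.Query;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceResolverFactory;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

// Hypothetical cleanup task for the worker instance: runs off-peak (02:00 here),
// finds assets flagged after successful reverse replication, and deletes them.
// The "markedForDeletion" property on the dam:Asset node is a made-up convention.
@Component(service = Runnable.class,
           property = { "scheduler.expression=0 0 2 * * ?", "scheduler.concurrent=false" })
public class MarkedAssetCleanupTask implements Runnable {

    @Reference
    private ResourceResolverFactory resolverFactory;

    @Override
    public void run() {
        ResourceResolver resolver = null;
        try {
            resolver = resolverFactory.getServiceResourceResolver(null);
            Iterator<Resource> marked = resolver.findResources(
                    "SELECT * FROM [dam:Asset] AS a "
                  + "WHERE ISDESCENDANTNODE(a, '/content/dam') AND a.[markedForDeletion] = true",
                    Query.JCR_SQL2);
            while (marked.hasNext()) {
                resolver.delete(marked.next());
            }
            resolver.commit();
        } catch (Exception e) {
            // Log and let the next scheduled run retry.
            e.printStackTrace();
        } finally {
            if (resolver != null) {
                resolver.close();
            }
        }
    }
}
```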
There are other ways to do this as well, but they would require a lot of custom code to be written.
Updates -
Tapping into Reverse Replication -
You need to get into the details of how reverse replication works.
Content to be reverse replicated is pushed to OUTBOX
If you look at /etc/replication/agents.publish/outbox/jcr:content on your local instance, look for the property transportUri, which by default is repo://var/replication/outbox, i.e. content to be reverse replicated is pushed to '/var/replication/outbox'.
Now look at /libs/cq/replication/components/revagent/revagent.jsp; this is the logic that runs on the receiving instance.
Going through the above will give you a deeper understanding of how reverse replication works.
Now you have two options to implement what you want -
To check the replication status, tap into the replication queue as the code in /libs/cq/replication/components/revagent/revagent.jsp does. This is the code that executes on the Author instance to which the content is reverse replicated; in your case that's the Leader instance. You will have to work around this code to make it run on the Worker instance. To be more specific on the implementation, your code will adapt the line Agent agent = agentMgr.getAgents().get(id); where id is the OUTBOX agent id.
Have an event listener monitor the outbox. Check the payload that comes in for replication and use it for your functionality (a rough sketch of this option follows below).
What I have mentioned is a crude approach that doesn't cover the failover/recovery use cases, i.e. how you would handle the deletion if your replication queue is blocked for any reason and the images have not been pushed back to the Leader.
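For the second option, a rough sketch of an outbox listener as a plain OSGi EventHandler registered for Sling resource-added events under /var/replication/outbox; what you do with the event (delete the asset or mark it for deletion) is up to you, and this is only a starting point, not a complete implementation:

```java
import org.apache.sling.api.SlingConstants;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.event.Event;
import org.osgi.service.event.EventConstants;
import org.osgi.service.event.EventHandler;

// Hypothetical listener for option 2: watches the reverse-replication outbox on the
// worker instance and reacts once an item (the offloaded asset) has been queued for
// the Leader. The handling logic here is just a placeholder.
@Component(service = EventHandler.class,
           property = {
               EventConstants.EVENT_TOPIC + "=" + SlingConstants.TOPIC_RESOURCE_ADDED,
               EventConstants.EVENT_FILTER + "=(path=/var/replication/outbox*)"
           })
public class OutboxListener implements EventHandler {

    @Override
    public void handleEvent(Event event) {
        String path = (String) event.getProperty(SlingConstants.PROPERTY_PATH);
        // From the outbox entry you could resolve the asset path it refers to and
        // either delete it or set the deletion marker for the off-peak cleanup task.
        System.out.println("Reverse replication queued: " + path);
    }
}
```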
Our org is planning on basing parts of our business model on the premise of recurring workflows in CRM 2011. However, we sometimes run into an issue with a backed up workflow queue, or for some reason need to restart the server (update rollups, etc.), or in some other way find we have to restart the CRM's async service.
What would happen to any workflows in the "waiting" phase in this scenario?
I see the workflow in the AsyncOperationBase table with the "waiting" statuscode; when the service comes back online, does it look at this table and resume accordingly?
In the above scenario, what would happen if the service was stopped and, in the interim, the workflow reached its PostponeUntil date? Does the service look at all non-complete future and backdated workflows and decide what to do with each? Or does the workflow just fail altogether?
Any fails in the process would obviously be a deal breaker for this element of the CRM system, and we'd have to develop an external component to handle recurring items.
I'd expect there to be some documentation on this; my best guess is that the WaitSubscription class has something to do with this topic, but it's for the most part undocumented.
For now, we've decided to go with an external service to manage this, due to the seemingly black-box nature of the async process. Tracing the calls to the database server does show a lot of calls to the AsyncOperationBase table, which tends to make me believe the service always checks to see if a job needs to be done, but in the absence of extensive testing, for now it's safer to use a separate service for this requirement.
I'm working on a Drools project which requires pausing a ruleflow (writing it to a database) and resuming the ruleflow (reading it from the database). I know that Drools provides out-of-the-box JPA/transaction-style persistence, but first, I couldn't get it running, and second, it persists in serialized form, which is not very useful for my use cases.
What I came up with is for my system to remember the node where the ruleflow is paused (can be done) and persist the node id and working facts in a database (can be done), then retrieve the persisted data when resuming the ruleflow, inject it into the knowledge session (can be done), and continue the ruleflow from the paused node (cannot be done). I have yet to find a way to start processing from a specific node.
Please help, thanks.
I'm having a similar issue (if I understood your question). As far as I know, Drools allows only one start node per ruleflow, so the only chance I see to start a ruleflow at an arbitrary node is to start right into a gateway (diverge) node. The gateway node should be connected to every node in the ruleflow (or as many as you need), and the conditions defined for it should allow you to start the workflow at any node. Sure, this workaround is not pretty, but it might be enough.
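To illustrate the workaround, here is a minimal sketch (assuming the Drools 5.x knowledge API) of resuming a session where the persisted node id is passed in as a process variable that the gateway's branch constraints evaluate; the file names, process id and variable name are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

// Hypothetical resume logic for the gateway workaround: the ruleflow's single start
// node leads straight into a diverging gateway whose branch constraints test the
// "resumeNode" process variable, so the flow continues at the persisted node.
public class RuleflowResumer {

    public StatefulKnowledgeSession resume(String persistedNodeId, Iterable<Object> persistedFacts) {
        KnowledgeBuilder builder = KnowledgeBuilderFactory.newKnowledgeBuilder();
        builder.add(ResourceFactory.newClassPathResource("rules.drl"), ResourceType.DRL);
        builder.add(ResourceFactory.newClassPathResource("flow.rf"), ResourceType.DRF);
        if (builder.hasErrors()) {
            throw new IllegalStateException(builder.getErrors().toString());
        }

        KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
        kbase.addKnowledgePackages(builder.getKnowledgePackages());
        StatefulKnowledgeSession session = kbase.newStatefulKnowledgeSession();

        // Re-inject the facts that were persisted when the flow was paused.
        for (Object fact : persistedFacts) {
            session.insert(fact);
        }

        // The gateway's branch conditions read this variable to pick the right path.
        Map<String, Object> params = new HashMap<>();
        params.put("resumeNode", persistedNodeId);
        session.startProcess("com.example.myflow", params);

        session.fireAllRules();
        return session;
    }
}
```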
By the way, if you find a better solution, please let me know.
Fuanka