Retaining and Migrating Actor / Service State - azure-service-fabric

I've been looking at using Service Fabric as a platform for a new solution that we are building, and I am getting hung up on data / state management. I really like the concept of reliable services and the actor model, and as we have started to prototype some things out, it seems to be working well.
With that being said, I am getting hung up on state management and how I would use it in a 'real' project. I am also a little concerned that the data feels like a black box that I can't interrogate or manipulate directly if needed. A couple of scenarios I've thought about are:
How would I share state between two developers on a project? I have an Actor, and as long as I am debugging the actor my state is maintained, replicated, etc. However, when I shut it down the state is all lost. More importantly, someone else on my team would need to set up the same data as I do; this is fine for transactional data, but certain 'master' data should just be constant.
Likewise, I am curious about how I would migrate data changes between environments. We periodically pull production data down from our SQL Azure instance today to keep our test environment fresh, and we also push changes up from time to time depending on the requirements of the release.
I have looked at the backup and restore process, but it feels cumbersome, especially in the development scenario. Asking someone to restore (or scripting a restore of) every partition of every stateful service seems like quite a bit of work.
I think that the answer to both of these questions is that I can use the stateful services, but I need to rely on an external data store for anything that I want to retain. The service would check for state when it was activated and use the stateful service almost as a write-through cache (a rough sketch of that pattern is below). I'm not suggesting that this needs to be a uniform design choice; it would be decided service by service, depending on each service's needs.
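For illustration only, here is a rough sketch of that write-through pattern. It is shown in TypeScript purely to keep it short; the ExternalStore and ReliableState interfaces are hypothetical stand-ins, not Service Fabric APIs (which are .NET), and the exact shape would depend on the service.

// Hypothetical sketch of the pattern described above: the replicated state
// acts as a write-through cache in front of an external system of record.
// These interfaces are illustrative only, not Service Fabric APIs.
interface ExternalStore {
  load(key: string): Promise<string | undefined>;
  save(key: string, value: string): Promise<void>;
}

interface ReliableState {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

class MasterDataService {
  constructor(
    private cache: ReliableState, // replicated state inside the cluster
    private store: ExternalStore, // e.g. SQL Azure, the durable source of truth
  ) {}

  // On activation, warm the replicated state from the external store.
  async activate(keys: string[]): Promise<void> {
    for (const key of keys) {
      const value = await this.store.load(key);
      if (value !== undefined) await this.cache.set(key, value);
    }
  }

  // Writes go to the external store first, then to the replicated state.
  async write(key: string, value: string): Promise<void> {
    await this.store.save(key, value);
    await this.cache.set(key, value);
  }

  // Reads are served from the replicated state, falling back to the store.
  async read(key: string): Promise<string | undefined> {
    return (await this.cache.get(key)) ?? (await this.store.load(key));
  }
}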
Does that sound right, am I overthinking this, missing something, etc?
Thanks
Joe

If you want to share Actor state between developers, you can use a shared cluster (in Azure or on-premises). Make sure you always do upgrade-style deployments, so state will survive. State is persisted if you configure the Actor to do so.
You can migrate data by taking a backup of all replicas of your service and restoring them on a different cluster (have the service running and trigger data loss). It's cumbersome, yes, but at this time it's the only way (or store state externally).
Note that state is safe in the cluster; it's stored on disk and replicated. There's no need to have an external store, provided you do regular state backups and keep them outside the cluster. Stateful services can be more than just caches.

Related

Handling database changes with Azure deployment slot

I'm currently configuring an environment for a web site. As I'm using Azure, I'd like to use deployment slots in order to ensure that users won't get any downtime. While I understand the goal of deployment slots, I have difficulty understanding how they could be usable in my case.
Basically, the site is using a database that will evolve as time goes on. In other words, most of my releases will alter the schema of the database, and I can't ensure that it will always be backward compatible (indeed, I might delete columns or something).
Therefore, there are two solutions in my mind: either both deployment slots share the same database, or they use different ones.
However, if they share the same database, then after the deployment is executed in the staging slot, the production one may start failing (because it's referencing a deleted column, for example). So I can't use deployment slots that way.
Using two separate databases seems to be a viable option... if they are synchronized. Indeed, if the staging slot's database is only there to validate a deployment, I can't swap the site to this slot while the production slot is updated, as the data in this database won't be up to date. Therefore I need to ensure that data are in sync BUT that when staging gets updated, this sync is somehow paused, as the schema won't be the same anymore...
All the articles I read talk about deployment slots without really mentioning the database issue, so I don't really understand how it is supposed to work. Can someone enlighten me a bit, please?

Why should I store kubernetes deployment configuration into source control if kubernetes already keeps track of it?

One of the documented best practices for Kubernetes is to store the configuration in version control. It is mentioned in the official best practices and also summed up in this Stack Overflow question. The reason is that this is supposed to speed up rollbacks if necessary.
My question is: why do we need to store this configuration if it is already stored by Kubernetes, and there are ways to easily go back to a previous version of the configuration using, for example, kubectl? An example is a command like:
kubectl rollout history deployment/nginx-deployment
Isn't storing the configuration an unnecessary duplication of a piece of information that we will then have to keep synchronized?
The reason I am asking this is that we are building a configuration service on top of Kubernetes. The user will interact with it to configure multiple deployments. I was wondering whether we should keep a history of the Kubernetes configuration and the content of ConfigMaps in a database for possible rollbacks, or whether we should just rely on Kubernetes to retrieve the current configuration and roll back to previous versions of the configuration.
You can use Kubernetes as your store of configuration, to your point; it's just that you probably shouldn't want to. By storing configuration as code, you get several benefits:
Configuration changes get regular code reviews.
They get versioned, are diffable, etc.
They can be tested, linted, and whatever else you desire.
They can be refactored, share code, and be documented.
And all this happens before actually being pushed to Kubernetes.
That may seem bad ("but then my configuration is out of date!"), but keep in mind that configuration is never really up to date anyway - just because you told Kubernetes you want 3 replicas running doesn't mean there are 3, or, even if there were, that one isn't temporarily down right now, and so on.
Configuration expresses intent. It takes a different process to actually notice when your intent changes or doesn't match reality, and make it so. For Kubernetes, that storage is etcd and it's up to the master to, in a loop forever, ensure the stored intent matches reality. For you, the storage is source control and whatever process you want, automated or not, can, in a loop forever, ensure your code eventually becomes reflected in Kubernetes.
The rollback command, then, is just a very fast shortcut to "please do this right now!". It's for when your configuration intent was wrong and you don't have time to fix it. As soon as you roll back, you should chase your configuration and update it there as well. In a sense, this is indeed duplication, but it's a rare event compared to the normal flow, and the overall benefits outweigh this downside.
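Concretely (using the deployment name from the example command above, and an assumed k8s/ directory of manifests), the normal flow is to change the manifests in source control and apply them, while the rollback command is only the emergency shortcut:
kubectl apply -f k8s/
kubectl rollout undo deployment/nginx-deployment
After an undo, you would then update the manifests in the repository so that source control again reflects what is actually running.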
A Kubernetes cluster doesn't store your configuration, it runs it, just as your server runs your application code.

RethinkDB - How to stream data to the browser

Context
Greetings,
One day I randomly found RethinkDB and I was really fascinated by the whole real-time changes thing. In order to learn how to use this tool, I quickly spun up a container running RethinkDB and started making a small project. I wanted to make something very simple, so I thought about creating a service in which speakers can create rooms and the audience can ask questions. Other users can upvote questions in order to let the speaker know which ones are the best. Obviously this project has a lot of real-time needs that I believe are best satisfied by using RethinkDB.
Design
I wanted to use a very specific set of tools for this. The backend would be made in Laravel Lumen, the frontend in Vue.js, and the database of course would be RethinkDB.
The problem
RethinkDB, it seems, is not designed to be exposed to the end user directly, even when no security concern exists.
Assuming that the user only needs to see the questions and the upvotes in real time, no write permissions are needed, and if a user changed the room ID nothing bad would happen, since the rooms are all publicly accessible.
Therefore, something is needed to wait for data updates and push them through a socket to the client (socket.io or Pusher, for example).
Given that the backend is written in PHP, I cannot tell Lumen to stay awake and wait for data updates. From what I have seen in the online tutorials, a secondary system should be used that listens for changes and then pushes them (let's say a Node.js service, for example).
This is understandable; however, I strongly believe that this way of transferring the data to the user is inefficient and defeats the purpose of RethinkDB.
If I have to send the action from the client's computer (user asks a question), save it to the database, have a script that listens for changes, then push the changes to socket.io and finally have the client (vue.js) act when a new event arrives, what is the point of having a real-time database in the first place?
I could avoid all this headache simply by having the Lumen app push the event directly to socket.io and use any other database system instead.
I really can't understand the point of all this. I am not experienced with NoSQL databases by any means, but I really want to experiment with them.
Thank you.
This is understandable; however, I strongly believe that this way of transferring the data to the user is inefficient and defeats the purpose of RethinkDB.
RethinkDB has no built-in mechanism to transfer data to end users, and it has no access control (in the conventional sense) either. The common way, like you said, is to spin up one or more Node.js instances running socket.io. On each instance you can listen on your RethinkDB change streams and use socket.io's broadcast functionality. That is the common approach, but as RethinkDB's change streams are fairly well optimized, you could also open a change stream for every incoming socket.io connection.
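A rough sketch of that approach in Node.js/TypeScript (using the rethinkdb driver and socket.io packages; the database name, the questions table, and the room_id field are assumptions taken from the question, not anything RethinkDB prescribes):

import * as r from "rethinkdb";
import { Server } from "socket.io";

// socket.io server that browsers connect to.
const io = new Server(3000, { cors: { origin: "*" } });

async function main() {
  // Assumed connection settings and schema: a "questions" table with a room_id field.
  const conn = await r.connect({ host: "localhost", port: 28015, db: "qna" });

  // Clients join the socket.io room for the talk they are watching.
  io.on("connection", (socket) => {
    socket.on("join", (roomId: string) => socket.join(roomId));
  });

  // Open a changefeed on the questions table; every insert/update/delete arrives here.
  const cursor = await r.table("questions").changes().run(conn);
  cursor.each((err, change) => {
    if (err) throw err;
    // Broadcast the change to everyone watching that room.
    const doc = change.new_val ?? change.old_val;
    io.to(doc.room_id).emit("question:changed", change);
  });
}

main().catch(console.error);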

How to manage state in microservices?

First of all, this is a question regarding my thesis for school. I have done some research about this; it seems like a problem that hasn't been tackled yet (it might not be that common).
Before jumping right into the problem, I'll give a brief example of my use case.
I have multiple namespaces containing microservices that depend on a state X. To manage this, the microservices are put in a namespace named after the state (so namespaces state_A, state_B, ...).
It is important to know that each microservice needs this state at startup; it will download the necessary files, etc., according to the state. When launching with state A version 1, it is very likely that the state gets updated every month. When this happens, it is important to let all the microservices that depend on state A upgrade whatever is necessary (databases, in-memory state, ...).
My current approach for this problem is simply to use events; the microservices that need updates when the state changes can subscribe to the event and migrate/upgrade accordingly. The only problem I'm facing is that while a service is upgrading, it should still work. So somehow I should duplicate the service first, let the duplicate upgrade, and when the upgrade is successful, shut down the original. Because of this, the orchestration service used would have to be able to create duplicates (including duplicating the state).
My question is, are there already solutions for my problem (and if yes, which ones)? I have looked into Netflix Conductor (which seemed promising with its workflows and events), Amazon SWF, Marathon and Kubernetes, but none of them covers my problem.
Best of all the existing solution should not be bound to a specific platform (Azure, GCE, ...).
For uninterrupted upgrades, you should use a cluster of nodes providing your service and perform a rolling update, which takes out a single node at a time, upgrades it, and leaves the rest of the nodes to continue serving. I recommend looking at the concept of virtual services (e.g. in Kubernetes) and rolling updates.
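In Kubernetes, for instance, the Deployment object already gives you this: its default strategy is a rolling update, so changing the image (the names below are made up) replaces pods a few at a time while the others keep serving traffic:
kubectl set image deployment/state-a-service app=state-a-service:2.0
kubectl rollout status deployment/state-a-service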
For providing the initial state, I would recommend looking into container initialization mechanisms. For example, in Docker you can use entrypoint scripts, and in Kubernetes there is the concept of init containers. You should note, though, that today there is a trend to decouple services and state, meaning the state is kept in a database that is separate from the service deployment, allowing you to view the service as a stateless component that can be replaced without losing state (given the interface between the service and the required state did not change). This is good in scenarios where the service changes more frequently and the database design less frequently.
Another note - I am not sure that representing state in a namespace is a good idea. Typically a namespace is a static construct for organization (of code, services, etc.) that aims for stability.

Reverting mechanisms in Salt

I'm using Salt for my project deployment. What if the deployment is unsuccessful? Does any mechanism for reverting exist in Salt?
By reverting, I mean that I want to be able to revert my database and my system to the most recent working state.
As far as I can tell, Salt does not provide virtualization and/or snapshot capabilities out of the box.
However, Salt state definitions are idempotent, meaning they describe the final state and are safe to apply multiple times. Note that this helps only when something has broken partway through; the fix is then simply to retry the state, e.g.
salt minion state.sls mystate
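Salt also has a built-in dry-run mode; adding test=True to the same call reports what would change without touching the system, which is handy before retrying a failed deployment:
salt minion state.sls mystate test=True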
Having said that, if you seek some kind of recovery/failover guarantee, there are several ways to skin the rabbit.
I have no experience in this particular area, so I might be wrong; however, I believe SaltStack integrates with Docker and other containerization technologies. So one way would be to build your failover procedure on top of that.
If your target platform is the cloud (AWS, DigitalOcean, etc.), you might adjust your topology so that you spin up new instances from snapshots, perform Salt state updates, perform smoke/sanity tests, register the instances with the load balancer, etc. If any of those steps fails, no big deal: restart later.