Node-specific configuration in JBoss clustering

I have two nodes in a cluster, and I allow users to have node-specific configuration such as logging level, local cache settings, etc. Over time it has become really difficult to manage these settings, because the user has to know or remember which configuration is applied on which node, or else check node after node until they find the right one. Is there any standard or known way to manage these nodes from a single place? For example, from the httpd server itself, or by having one node act as master and keep track of the others?

Well, there is no standard way of doing that, but one solution that works pretty well is to keep the configuration in SVN: do a checkout on each node, and every time you make a modification you can review it with svn diff or svn st. We use this approach to roll out our JBoss servers and it's working fine for us.
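For illustration, a minimal sketch of that workflow; the repository URL, paths, and commit message below are placeholders rather than a prescribed layout:

# check out this node's configuration into a working copy (URL and path are hypothetical)
svn checkout https://svn.example.com/repos/jboss-config/node1 node1-conf
cd node1-conf

# after editing log levels, cache settings, etc. on this node:
svn st        # which files were touched on this node
svn diff      # exactly what changed compared to the repository

# record the change so every node's configuration is visible from one place
svn commit -m "node1: raise logging level for org.example to DEBUG"

With one directory (or branch) per node in the repository, an svn diff between two node URLs also answers the "which node has which setting" question without logging into each server.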

Related

In a Kubernetes Deployment when should I use deployment strategy Recreate

Kubernetes gives us two deployment strategies: one is Rolling Update and the other is Recreate. I should use Rolling Update when I don't want to go off the air, but when should I use Recreate?
There are basically two reasons why one would want/need to use Recreate:
Resource issues. Some clusters simply do not have enough resources to schedule additional Pods, which leaves those Pods stuck in Pending and the update procedure stuck with them. This happens especially with local development clusters and/or applications that consume a large amount of resources.
Bad applications. Some applications (especially legacy or monolithic setups) simply cannot handle it when new Pods - that do the exact same thing as they do - spin up. There are too many reasons as to why this may happen to cover all of them here but essentially it means that an application is not suitable for scaling.
+1 to F1ko's answer; however, let me add a few more details and some real-world examples to what was already said.
In a perfect world, where every application could easily be updated with no downtime, we would be fully satisfied with the Rolling Update strategy alone.
But since the world isn't perfect and things don't always go as smoothly as we would wish, in certain situations there is also a need for the Recreate strategy.
Suppose we have a stateful application running in a cluster, where individual instances need to communicate with each other. Imagine our application has recently undergone a major refactoring, and the new version can no longer talk to instances running the old version. Moreover, we may not even want them to be able to form a cluster together, as we can expect this to cause some unpredictable mess, with the result that neither the old instances nor the new ones work properly when both are available at the same time. So sometimes it's in our best interest to first shut down every old replica, and only once we are sure none of them is running, spawn replicas that run the new version.
One such case is a major migration, say a major change in the database structure, where we want to make sure that no pod running the old version of our app can write any new data to the database while the migration is in progress.
So I would say that in the majority of cases it is a very application-specific scenario, involving major migrations, legacy applications, and the like, that requires accepting a certain downtime and recreating all the pods at once, rather than updating them one by one as the Rolling Update strategy does.
Another example that comes to mind: let's say you have an extremely old version of MongoDB running in a replica set of 3 members and you need to migrate it to a modern, currently supported version. As far as I remember, individual members of a replica set can only form a cluster if there is at most 1 major version difference between them. So if the difference is 2 or more major versions, old and new instances won't be able to keep running in the same cluster anyway. Imagine additionally that you only have enough resources to run 4 replicas at the same time, so a rolling update won't help you much in such a case. To have a quorum, so that the master can be elected, you need at least 2 of the 3 members available. If the new member can't form a cluster with the old replicas, it's much better to schedule a maintenance window, shut down all the old replicas, and have enough resources to start 3 replicas with the new version once the old ones are removed.
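For reference, opting into that behaviour is a one-line choice in the Deployment spec. The sketch below is only an illustration; the names and image are placeholders:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-app                               # placeholder name
spec:
  replicas: 3
  strategy:
    type: Recreate                               # terminate all old Pods before starting new ones
  selector:
    matchLabels:
      app: legacy-app
  template:
    metadata:
      labels:
        app: legacy-app
    spec:
      containers:
      - name: legacy-app
        image: registry.example.com/legacy-app:2.0   # placeholder image
EOF

With type: Recreate, Kubernetes scales the old ReplicaSet down to zero before scaling the new one up, which is exactly the "shut everything down first" behaviour described above, at the price of downtime in between.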

Why should I store kubernetes deployment configuration into source control if kubernetes already keeps track of it?

One of the documented best practices for Kubernetes is to store the configuration in version control. It is mentioned in the official best practices and also summed up in this Stack Overflow question. The reason is that this is supposed to speed-up rollbacks if necessary.
My question is, why do we need to store this configuration if this is already stored by Kubernetes and there are ways with which we can easily go back to a previous version of the configuration using for example kubectl? An example is a command like:
kubectl rollout history deployment/nginx-deployment
Isn't storing the configuration an unnecessary duplication of a piece of information that we will then have to keep synchronized?
The reason I am asking is that we are building a configuration service on top of Kubernetes. Users will interact with it to configure multiple deployments, and I was wondering whether we should keep a history of the Kubernetes configuration and the contents of ConfigMaps in a database for possible rollbacks, or whether we should just rely on Kubernetes to retrieve the current configuration and roll back to previous versions of it.
To your point, you can use Kubernetes as your store of configuration; it's just that you probably shouldn't want to. By storing configuration as code, you get several benefits:
Configuration changes get regular code reviews.
They get versioned, are diffable, etc.
They can be tested, linted, and whatever else you desire.
They can be refactored, share code, and be documented.
And all this happens before actually being pushed to Kubernetes.
That may seem bad ("but then my configuration is out of date!"), but keep in mind that configuration never exactly matches reality anyway - just because you told Kubernetes you want 3 replicas running doesn't mean there are 3, or that 1 isn't temporarily down right now, and so on.
Configuration expresses intent. It takes a different process to actually notice when your intent changes or doesn't match reality, and make it so. For Kubernetes, that storage is etcd and it's up to the master to, in a loop forever, ensure the stored intent matches reality. For you, the storage is source control and whatever process you want, automated or not, can, in a loop forever, ensure your code eventually becomes reflected in Kubernetes.
The rollback command, then, is just a very fast shortcut to "please do this right now!". It's for when your configuration intent was wrong and you don't have time to fix it. As soon as you roll back, you should chase your configuration and update it there as well. In a sense, this is indeed duplication, but it's a rare event compared to the normal flow, and the overall benefits outweigh this downside.
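As a rough sketch of that flow (the deployment name follows the example above; the manifest path and repository layout are hypothetical): roll back quickly with kubectl, then bring source control back in line and re-apply from it.

# emergency: go back to a previous revision right now
kubectl rollout history deployment/nginx-deployment
kubectl rollout undo deployment/nginx-deployment --to-revision=2

# afterwards: fix the manifest in source control and make it the source of truth again
git commit -am "revert nginx image to the known-good tag"   # hypothetical repository
kubectl apply -f k8s/nginx-deployment.yaml                  # hypothetical path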
A Kubernetes cluster doesn't store your configuration, it runs it, just as your server runs your application code.

How to manage state in microservices?

First of all, this is a question regarding my thesis for school. I have done some research about it, and it seems to be a problem that hasn't really been tackled yet (it might not be that common).
Before jumping right into the problem, I'll give a brief example of my use case.
I have multiple namespaces containing microservices that depend on a state X. To manage this, the microservices are put in a namespace named after the state (so namespaces state_A, state_B, ...).
Important to know is that each microservice needs this state at startup: it downloads the necessary files and so on according to the state. Once it has been launched with state A version 1, the state is then likely to be updated every month or so. When this happens, it is important that all the microservices depending on state A upgrade whatever is necessary (databases, in-memory state, ...).
My current approach to this problem is simply to use events: the microservices that need updates when the state changes can subscribe to the event and migrate/upgrade accordingly. The only problem I'm facing is that the service should keep working while it is upgrading. So somehow I should duplicate the service first, let the duplicate upgrade, and only shut down the original once the upgrade is successful. Because of this, the orchestration service used would have to be able to create duplicates (including duplicating the state).
My question is, are there already solutions for my problem (and if yes, which ones)? I have looked into Netflix Conductor (which seemed promising with its workflows and events), Amazon SWF, Marathon and Kubernetes, but none of them covers my problem.
Ideally, the solution should not be bound to a specific platform (Azure, GCE, ...).
For an uninterrupted upgrade you should run your service on a cluster of nodes and perform a rolling update, which takes out a single node at a time, upgrades it, and leaves the rest of the nodes serving in the meantime. I recommend looking at the concept of virtual services (e.g. in Kubernetes) and rolling updates.
For inducing state I would recommend looking into container initialization mechanisms. For example, in Docker you can use entrypoint scripts, and in Kubernetes there is the concept of init containers. Note, though, that today there is a trend towards decoupling services and state, meaning the state is kept in a DB separate from the service deployment. This lets you treat the service as a stateless component that can be replaced without losing state (provided the interface between the service and the required state did not change). This works well in scenarios where the service changes more frequently than the DB design.
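To illustrate the init-container idea, here is a minimal sketch, assuming the namespace-per-state layout from the question (all names, images, and the state URL are hypothetical): an init container fetches the state's files into a shared volume before the service container starts.

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
  namespace: state-a                  # hypothetical namespace-per-state layout
spec:
  replicas: 2
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      initContainers:
      - name: fetch-state
        image: curlimages/curl        # public curl image
        # download the files this state requires before the service starts (URL is a placeholder)
        command: ["sh", "-c", "curl -fsSL https://state-store.example.com/state-a.tar.gz | tar xz -C /state"]
        volumeMounts:
        - name: state
          mountPath: /state
      containers:
      - name: service-a
        image: registry.example.com/service-a:1.0   # placeholder image
        volumeMounts:
        - name: state
          mountPath: /state
      volumes:
      - name: state
        emptyDir: {}
EOF

When the state changes, rolling out a new revision of this Deployment re-runs the init container in each new Pod, so the duplicate-then-replace behaviour described in the question falls out of the rolling update for free.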
Another note - I am not sure that representing state in a namespace is a good idea. Typically a namespace is a static construct for organization (of code, services, etc.) that aims for stability.

Dataproc node setup

I understand that Google Dataproc clusters are equipped to handle initialization actions, which are executed on creation of every node. However, this is only reasonable for small actions and would not work well for creating nodes with tons of dependencies and software for large pipelines. So I was wondering: is there any way to load nodes from custom images, or to have an image spin up once the node is created that already has everything installed, so you don't have to download things again and again?
Good question.
As you note, initialization actions are currently the canonical way to install things on clusters when they are created. If you have a ton of dependencies, or need to do things like compile from source, those initialization actions may take a while.
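For completeness, initialization actions are attached when the cluster is created, along these lines (the cluster name, region, and bucket/script are placeholders):

gcloud dataproc clusters create my-cluster \
    --region us-central1 \
    --num-workers 2 \
    --initialization-actions gs://my-bucket/install-deps.sh \
    --initialization-action-timeout 10m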
We have support for a better method to handle customizations on our long-term roadmap. This may be via custom images or some other mechanism.
In the interim, scaling clusters up/down may provide some relief if you want to keep some of the customizations in place and split the difference between boot time and the persistence of your cluster. Likewise, if there are any precompiled packages, those always save time.

Is it safe to cloud sync TFS workspaces?

Please excuse a newbie question, but I've always used SVN and more recently, Git. Just now am touching TFS for the first time.
If I have two different machines that I work on regularly, can I safely keep the project files in sync using something like Dropbox/Sugarsync/Skydrive?
Are there any pros/cons to be aware of?
(I know that some of you might ask why not just check out on the other machine. I'm just trying to save a step. I want to be able to pick up the other machine and do what I need to do without having to check anything out.)
TFS workspaces contain information about the machine name and user that created them; however, if you're using local workspaces and you're not putting any server-side locks on files, then I suppose you could sync them via Dropbox and it should probably work just fine.
That said, I'd never recommend it.
You're not only going to sync all your code but also all the binaries you produce each and every time you compile. On top of that, you won't have any change history between machines, and you'll need to keep watching the Dropbox app to make sure things have fully synced before switching machines.
If you want to move changes between two machines I'd recommend using shelvesets. It only takes a few seconds to do and you'll have a more explicit update process between machines. You can be sure of what is happening in your code on each machine and you have an implicit rollback point if you realise you put something in the shelveset you didn't want.
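A rough sketch of that flow from the tf command line (the shelveset name and comment are arbitrary):

On the first machine, park your pending changes on the server:
tf shelve "wip-feature-x" /comment:"work in progress, switching machines"
On the second machine, pull them down into your workspace:
tf unshelve "wip-feature-x"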