Version Management in Large SOA/Microservice Architectures

We are about to embark on a large programme of work to migrate a small number of hugely monolithic 3-tier frameworks to a SOA/microservice architecture. However, there is one thing that I haven't really managed to nail down: version management (note the use of the word management, not control).
One of the core principles of this programme is that each component is absolutely independent, and is therefore designed, developed, built, versioned, deployed, operated, monitored and deprecated independently of all other consumers and services. This is the right principle, but it means that the future holds 15+ clients and 50+ services. In operation we need to know all the dependencies quickly and very reliably. In a world where a service may have 3 or 4 versions of its API in production and a consumer may use 20+ services, the dependency tree very quickly becomes large and complex.
So my question is how do you guys manage this? How do you maintain your "enterprise version matrix" (if that is even the correct terminology)?
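For concreteness, one minimal way to think of such an "enterprise version matrix" is as a queryable registry of consumer-to-service-version dependencies. The sketch below is only an illustration of the idea; every name in it (consumers, services, versions) is invented and it is not an established tool or format:

```python
# Minimal sketch of a dependency registry (an "enterprise version matrix").
# Every consumer declares which service API versions it depends on, and the
# registry answers questions like "who still depends on orders v2?".
# All names (consumers, services, versions) are purely illustrative.
from collections import defaultdict

class VersionMatrix:
    def __init__(self):
        # consumer -> set of (service, api_version) pairs it depends on
        self._deps = defaultdict(set)

    def declare(self, consumer, service, api_version):
        """Record that `consumer` uses `service` at `api_version`."""
        self._deps[consumer].add((service, api_version))

    def dependencies_of(self, consumer):
        return sorted(self._deps[consumer])

    def consumers_of(self, service, api_version=None):
        """Reverse lookup: who breaks if this API version is deprecated?"""
        return sorted({c for c, deps in self._deps.items()
                       for (s, v) in deps
                       if s == service and (api_version is None or v == api_version)})

matrix = VersionMatrix()
matrix.declare("web-portal", "orders", "v2")
matrix.declare("web-portal", "billing", "v1")
matrix.declare("mobile-app", "orders", "v3")

print(matrix.consumers_of("orders", "v2"))   # ['web-portal']
print(matrix.dependencies_of("web-portal"))  # [('billing', 'v1'), ('orders', 'v2')]
```

The interesting operational question is then how each component publishes its row of the matrix automatically at build or deploy time, rather than someone maintaining it by hand.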

Related

How do you track your current deployments?

Imagine there is an application consisting of a bunch of microservices. All of these microservices can be developed and deployed completely independently of each other. Each microservice can be "described" by several attributes - e.g. current API version, release version, commit hash, etc. Along with that, there are several environments used in the development process - e.g. a Testing environment (often called Sandbox), a Staging environment, a Pre-Release environment and, obviously, a Production environment.
Is there a convenient tool/way/approach to track, basically, which attributes are currently deployed to which environment? For instance, to get quick access to information like "what is the current version of the RESTful API in the Pre-Release environment?" Or a more complex query: "what was this version two months ago?" And, of course, to see the "global picture" as well?
There's no ready-to-use solution on the market yet, as far as I know.
Some teams are using GitOps (https://www.twistlock.com/2018/08/06/gitops-101-gitops-use/) to get ahead of the chaos that a large number of different microservices usually brings with it.
Another technology in a somewhat different yet related direction is the service mesh, Istio (https://istio.io/) being one example.
There are also testing approaches such as contract testing, or heavier integration tests, which are more expensive but also provide more confidence.
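Whatever tooling you pick, the core of the problem is an append-only record of "service X, version Y, deployed to environment Z at time T", which can answer both the "now" and the "two months ago" questions. A minimal sketch of that idea follows; the field names and the in-memory storage are illustrative only, and in practice the record would live in a database or a Git repository (the GitOps approach above):

```python
# Minimal sketch of an append-only deployment log that can answer both
# "what is deployed where right now?" and "what was deployed there at time T?".
# Field names and the in-memory list are illustrative only.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass(frozen=True)
class Deployment:
    service: str
    environment: str        # e.g. "sandbox", "staging", "pre-release", "production"
    api_version: str
    release_version: str
    commit: str
    deployed_at: datetime

LOG: list = []

def record(d: Deployment) -> None:
    LOG.append(d)

def deployed(service: str, environment: str, at: Optional[datetime] = None) -> Optional[Deployment]:
    """Latest deployment of `service` to `environment` as of `at` (default: now)."""
    at = at or datetime.utcnow()
    candidates = [d for d in LOG
                  if d.service == service and d.environment == environment
                  and d.deployed_at <= at]
    return max(candidates, key=lambda d: d.deployed_at, default=None)

# "What is the current version of the RESTful API at Pre-Release?"
record(Deployment("rest-api", "pre-release", "v3", "1.8.0", "a1b2c3d",
                  datetime.utcnow() - timedelta(days=90)))
record(Deployment("rest-api", "pre-release", "v4", "2.0.1", "d4e5f6a",
                  datetime.utcnow() - timedelta(days=10)))
print(deployed("rest-api", "pre-release").api_version)                 # v4 (now)
print(deployed("rest-api", "pre-release",
               at=datetime.utcnow() - timedelta(days=60)).api_version)  # v3 (two months ago)
```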

Should actors/services be split into multiple projects?

I'm testing out Azure Service Fabric and have started adding a lot of actors and services to the same project - is this okay to do, or will I lose any of the Service Fabric features such as failover, scalability, etc.?
My preference here is clearly 1 actor/1 service = 1 project. The big win with a platform like this is that it allows you to write proper microservice-oriented applications at close to no cost, at least compared to the implementation overhead you have when doing similar implementations on other, somewhat similar platforms.
I think it defies the point of an architecture like this to build services or actors that span multiple concerns. It makes sense (to me at least) to use these imaginary constraints to force you to keep the area of responsibility of these services as small as possible - and rather depend on/call other services in order to provide functionality outside of the responsibility of the project you are currently implementing.
As regards scaling, it seems you'll still be able to scale your services/actors independently even though they are part of the same project - at least, that's what the application manifest format implies. What you will not be able to do, though, is update services/actors within your project independently. As an example: if your project has two different actors and you make a change to one of them, you will still need to deploy an update to both, since they are part of the same code package and share a version number.

When does publishing all components of an application become problematic?

A common goal of software design seems to be to structure an application so that changes impact a minimal number of components (i.e. compiled assemblies), which can then be published individually. The Dependency Inversion Principle is applied so that stable components don't depend on volatile ones, and classes are packaged in a way that, again, limits a deployment to a minimal set of components.
So my question is, what is wrong with publishing an entire application in its entirety for each change? Especially if a publish is a completely automated 1-click solution. Surely deploying components individually means I then have to version and manage them individually as mini projects with their own dependencies? A large chunk of 'good design' seems to hang on this one principle that publishing each component separately is a good thing, but why?
What is wrong with publishing an entire application in its entirety for each change?
I don't think there is anything wrong if you can manage it. It seems this is the approach that Facebook takes:
Because Facebook's entire code base is compiled down to a single binary executable, the company's deployment process is quite different from what you'd normally expect in a PHP environment. Rossi told me that the binary, which represents the entire Facebook application, is approximately 1.5GB in size. When Facebook updates its code and generates a new build, the new binary has to be pushed to all of the company's servers.
Partial deployments can sometimes be useful to maintain a degree of autonomy between component teams, but they can result in unexpected behaviour if the particular version combination of components hasn't been tested.
However, the motivation for modular design is more around ease of change (evolvability, maintainability, low coupling, high cohesion) than the ability to do partial deployments.

How does "distributed computing" apply to web development or programming in general?

I am about to use Apache Hadoop, and the headline reads:
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.
I can relate "scalability" to programming, but I just don't know how this "distributing" can help me in my development. According to Wikipedia:
A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal.
So does this mean I can deploy my web apps across multiple computers and do some sort of "intense computing"? The terms that come to mind are Content Delivery Networks and Cloud Computing.
Web development has always been about distributed computing, since clients have been on different machines to the servers they talk to, web pages can pull in resources from many servers to build a page's content, and servers may talk to other machines to achieve their goals. CDNs make this more obvious than before, but really they're just an evolution, an introduction of a virtualization/indirection layer between what you ask for and the hardware used to provide it.
Clouds are about taking the concepts of virtualization and applying them to remote hosting, both of low-level OSes and higher-level software platforms. The really interesting thing about them is that this enables different business models on the part of customers (and with different risks too, but that's mostly not related to the fact that it's distributed computing but rather that it is not wholly under your control in your own jurisdiction).
I've found that the most effective use of distributed computing is when you think in terms of connecting together distinct services, each with different capabilities (which might be for technical reasons, or might not; sometimes it's for business or legal reasons that things have to be divided up), and where each of those services may be provided by many components in multiple locations. There are, and will continue to be, issues with balancing the need for performance (a force that brings components together) against the need for robustness (which tends to lead to distribution and replication) within the overall context of the general capabilities map.
My goodness! That paragraph sounds like terrible piffle! What I'm trying to say is that it's all trade-offs, and you should be prepared for not getting it right first time.
(Hadoop is a mechanism for doing a distributed file store, and for efficiently applying certain classes of operation – those that fit well with MapReduce or other similar scatter-gather algorithms – across that whole dataset. If that shoe fits, use it. But it doesn't solve all problems, and thank goodness for that! Things that can do everything tend to look very much like things that can't actually do anything at all, and usefulness and comprehensibility come in the restrictions.)
Hadoop is typically used to process massive data sets by distributing the processing of that data set across multiple machines.
What this means is you probably don't want to use it to "deploy an application". You might use it to process stats on your application, however. For instance, you might have very large logs of user data. This would happen if your user data grows to become too large to fit on a single hard drive, and/or would take too long for one machine to process stats on (using standard methods like an SQL query).
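To make the map/reduce pattern that Hadoop is built around concrete, here is a toy sketch in plain Python (no Hadoop APIs involved) that counts words across log chunks; on a real cluster each chunk would live on a different machine and the map phase would run in parallel:

```python
# Toy sketch of the map/reduce ("scatter-gather") pattern Hadoop is built
# around, using plain Python instead of a cluster. Each chunk would live on
# a different machine in a real Hadoop job; here they are just strings.
from collections import Counter
from functools import reduce

chunks = [
    "error timeout error",      # imagine each chunk is a huge slice of a log file
    "login error timeout",
    "login login purchase",
]

def map_phase(chunk: str) -> Counter:
    """Runs independently on each chunk (in parallel, on a real cluster)."""
    return Counter(chunk.split())

def reduce_phase(a: Counter, b: Counter) -> Counter:
    """Merges partial results into the final answer."""
    return a + b

partials = [map_phase(c) for c in chunks]           # scatter
totals = reduce(reduce_phase, partials, Counter())  # gather
print(totals.most_common(3))  # [('error', 3), ('login', 3), ('timeout', 2)]
```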
Ygam: the traditional roles of "clients" and "servers" were pretty stable from 1960 until about 2005.
I believe with every fiber of my being that distributed computing now means that we all carry processors around in our pockets.
Phones do computing work. Phones do NOT need centralized servers, but they DO benefit from them.
Phones, smartphones and tablets are an example of where distributed computation is going.
You can make a wifi base-station out of an Android device now. So a phone becomes a server of sorts, just for that instant in the coffee shop when you turn it on for that cute person next to you without internet... and now I digress...

What are common practices for deployment of large scale systems?

Given a large scale software project with several components written in different languages, configuration files, configuration scripts, environment settings and database migration scripts - what are the common practices for deployment to production?
What are the difficulties to consider? Can the process be simplified with tools like Ant or Maven? How can rollback and database management be handled? Is it advisable to use version control for the production environment?
As I see it, you're mostly asking about best practices and tools for release engineering AKA releng -- it's important to know the "term of art" for a subject, because it makes it much easier to search for more information.
A configuration management system (CMS -- aka revision control system or version control system) is indispensable for today's software development; if you use one or more IDEs, it's also nice to have good integration between them and the CMS, though that's more of an issue for purposes of development than for purposes of deployment / releng.
From a releng viewpoint, the key thing about a CMS is that it must have good support for "branching" (under whatever name), because releases must be made from a "release branch" where all the code under development and all of its dependencies (code and data) are in a stable "snapshot" from which the exact, identical configuration can be reproduced at will.
The need for good branching support may be more obvious if you have to maintain multiple branches (customized for different uses, platforms, whatever), but even if your releases are always, strictly in a single linear sequence, releng best practices still dictate making a release branch. "Good branching support" includes ease of merging (and "conflict resolution" when different changes are made to a file), "cherry-picking" (taking one patch or changeset from one branch, or the head/trunk, and applying it to another branch), and the like.
In practice, you start off the release process by making a release branch; then, you run exhaustive testing on that branch (typically MUCH more than what you run everyday in your continuous build -- including extensive regression testing, integration testing, load testing, performance verification, etc, and possibly even more costly quality assurance processes, depending). If and when the exhaustive testing and QA reveal defects in the release-candidate (including regressions, performance degradation, etc), they must be fixed; in a large team, development on head/trunk may be continuing while the QA is being done, whence the need for ease of cherry-picking / merging / etc (whether your practice is to perform the fix on head or on the release branch, it still needs to be merged to the other side;-).
Last but not least, you're NOT getting full releng value from your CMS unless you're somehow tracking with it "everything" that your releases depend on -- simplest would be to have copies or hard links to all the binaries for the tools you need to build your release, etc, but that may often be impractical; so at the very least track the exact release, version, bugfix &c numbers of those tools that are used (operating system, compilers, system libraries, tools that preprocess image, sound or video files into final form, etc, etc). The key is being able, at need, to exactly reproduce the environment required to rebuild the exact version that's proposed for release (otherwise you'll go mad tracking down subtle bugs that may depend on third party tools' changes as their versions change;-).
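As one hedged illustration of that last point: even a small script that snapshots the versions of every tool in the build chain into a file committed on the release branch goes a long way. The tool list below is only an example; substitute your own stack:

```python
# Sketch: snapshot the versions of every tool in the build chain into a file
# that gets committed to the CMS on the release branch, so the environment can
# be reproduced later. The tool list is an example only; use your own stack.
import subprocess

TOOLS = {                      # tool name -> command that prints its version
    "python": ["python", "--version"],
    "gcc": ["gcc", "--version"],
    "java": ["java", "-version"],
}

def tool_version(cmd):
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
        # some tools (java, notably) print their version on stderr
        lines = (result.stdout or result.stderr).splitlines()
        return lines[0].strip() if lines else "unknown"
    except (OSError, subprocess.TimeoutExpired):
        return "NOT FOUND"

with open("toolchain-versions.txt", "w") as f:
    for name, cmd in TOOLS.items():
        f.write(f"{name}: {tool_version(cmd)}\n")
```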
After a CMS, the second most important tool for releng is a good issue tracking system -- ideally one that's well integrated with the CMS. That's also important for the development process (and other aspects of product management), but in terms of the release process the importance of the issue tracker is the ability to easily document exactly what bugs have been fixed, what features have been added, removed, or changed, and what modifications in performance (or other user-observable characteristics) are expected in the new forthcoming release. For the purpose, a key "best practice" in development is that every changeset that gets committed to the CMS must be connected to one (or more) issue in the issue tracking system: after all, there's gotta be some purpose for that change (fix a bug, change a feature, optimize something, or some internal refactor that's supposed to be invisible to the software's user); similarly, every tracked issue that's marked as "closed" must be connected to one (or more) changesets (unless the closing is of the "won't fix / working as intended" kind; issues related to bugs &c in third-party components, which have been fixed by the third-party supplier, are easy to treat similarly if you do manage to keep track of all third-party components in the CMS too, see above; if you don't, at least there should be text files under CMS documenting third-party components and their evolution, again see above, and they need to be changed when some tracked issue on a 3rd party component gets closed).
Automating the various releng processes (including building, automated testing, and deployment tasks) is the third top priority -- automated processes are much more productive and repeatable than asking some poor individual to manually go through a list of steps (for sufficiently complex tasks, of course, the workflow of the automation may need to "get a human being in the loop"). As you surmise, tools such as Ant (and SCons, etc, etc) can help here, but inevitably (unless you're lucky enough to get away with very simple and straightforward processes) you'll find yourself enriching them with ad-hoc scripts &c (some powerful and flexible scripting language such as perl, python, ruby, &c, will help). A "workflow engine" can also be precious when your release workflow is sufficiently complex (e.g. involving specific human beings or groups thereof "signing off" on QA compliance, legal compliance, UI guidelines compliance, and so forth).
Some other specific issues you're asking about vary enormously depending on specifics of your environment. If you can afford programmed downtime, your life is relatively easy, even with a large database in play, as you can operate sequentially and deterministically: you shut the existing system down gracefully, ensure the current database is saved and backed up (easing rollback, in the hopefully VERY rare case it's needed), run the one-off scripts for schema migration or other "irreversible" environment changes, fire the system back up again in a mode that's still unaccessible to general users, run another extensive suite of automated tests -- and finally if everything's gone smoothly (including the saving and backup of the DB in its new state, if relevant) the system's opened up to general use again.
If you need to update a "live" system, without downtime, this may range anywhere from a slight inconvenience to a systematic nightmare. In the best case, transactions are reasonably short, and synchronization between the state set by transactions may be delayed a bit without damage... and you have a reasonable abundance of resources (CPUs, storage, &c). In this case, you run two systems in parallel -- the old one and the new one -- and just make sure all new transactions target the new system, while letting old ones complete on the old system. A separate task periodically syncs "new data in the old system" to the new system, as transactions on the old system terminate. Eventually you can determine that no transactions are running on the old system and all changes that happened there are synced up to the new one -- and at that time you can finally shut down the old system. (You need to be prepared to "reverse sync" too of course, in case a rollback of the change is needed).
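A highly simplified sketch of that "run both, drain the old one" idea follows; the router and system classes below stand in for whatever load balancer and services you actually run, and the data-sync step is only hinted at in the comments:

```python
# Highly simplified sketch of the "run old and new in parallel" cutover:
# new transactions go to the new system, in-flight ones finish on the old one,
# and the old system is retired once it has drained. The classes below stand
# in for real load balancers and services.
class System:
    def __init__(self, name):
        self.name = name
        self.in_flight = 0

    def begin_transaction(self):
        self.in_flight += 1
        return self

    def finish_transaction(self):
        self.in_flight -= 1

class CutoverRouter:
    def __init__(self, old, new):
        self.old, self.new, self.cut_over = old, new, False

    def route(self):
        # Before the cutover all traffic hits the old system; after it, the new one.
        return (self.new if self.cut_over else self.old).begin_transaction()

    def old_is_drained(self):
        return self.cut_over and self.old.in_flight == 0

old, new = System("v1"), System("v2")
router = CutoverRouter(old, new)
t1 = router.route()              # lands on v1
router.cut_over = True           # flip the switch: new traffic goes to v2
t2 = router.route()              # lands on v2
t1.finish_transaction()          # the last old transaction completes
print(router.old_is_drained())   # True -> sync the remaining data, then shut v1 down
```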
This is the "simple, sweet" extreme for live system updating; at the other extreme, you can find yourself in such an overconstrained situation that you can prove the task is impossible (you just cannot logically meet all the stated requirements with the given resources). Long sessions opened on the old system which just cannot be terminated -- scarce resources that make it impossible to run two systems in parallel -- core requirements for hard-real-time sync of every transaction -- etc, etc, can all make your life miserable (and, as I noted, at the extreme can make the stated task absolutely impossible). The two best things you can do about this: (1) ensure you have abundant resources (this will also save your skin when some server unexpectedly goes belly-up... you'll have another one to fire up to meet the emergency!-); (2) consider this predicament from the start, when initially defining the architecture of the overall system (e.g.: preferring short-lived transactions to long-lived sessions that just can't be "snapshot, closed down, and restarted seamlessly from the snapshot" is one good architectural pointer;-).
disclaimer: where I work we use something I wrote that I'm going to mention
I'll tell you how I do it.
For configuration settings, and general deployment of code and content, I use a combination of NAnt, a CI server, and dashy (an automatic deployment tool). You can replace dashy with any other 'thing' that automates the uploads to your server (perhaps Capistrano).
For DB scripts, we use RedGate SQL Compare to get the scripts, and then for most changes I actually make them manually, where appropriate. This is because some of the changes are slightly complex and I feel more comfortable doing them by hand. You can actually use this tool to do it for you (or at least generate the scripts).
Depending on your language, there are some tools that can script the DB updates for you (I think someone on this forum wrote one; hopefully he'll reply), but I have no experience with those. It's something I'd like to add though.
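For what it's worth, the core of such a DB-update tool is small: apply numbered SQL scripts in order and remember which ones have already run. A toy sketch follows, using SQLite purely as a stand-in for the real database; the file layout and table name are invented:

```python
# Toy sketch of a DB migration runner: apply numbered .sql scripts in order and
# record which ones have already been applied in a schema_version table.
# SQLite is only a stand-in for the real database; the file layout is invented.
import sqlite3
from pathlib import Path

def migrate(db_path: str, migrations_dir: str) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_version")}

    # scripts are named so lexical order is execution order:
    # 001_create_users.sql, 002_add_index.sql, ...
    for script in sorted(Path(migrations_dir).glob("*.sql")):
        if script.name in applied:
            continue
        conn.executescript(script.read_text())
        conn.execute("INSERT INTO schema_version (name) VALUES (?)", (script.name,))
        conn.commit()
        print(f"applied {script.name}")

    conn.close()

# migrate("app.db", "migrations/")
```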
Difficulties
I forgot to answer one of your questions.
The biggest problem in updating any significantly complicated/distributed site is DB synchronisation. You need to consider whether you will have any downtime, and if you do, what will happen to the DBs. Will you shut down everything, so that no transactions can be processed? Or will you shift everything to one server, update DB B, sync DB A and B, and then update DB A?
Whatever you choose, you need to make the decision explicitly and either say "Okay, for each update there will be X downtime", or whatever it is. Just have it documented.
The worst thing you can do is have someone's transaction fail mid-processing because you were updating that server, or somehow leave only part of your system operational.
I don't think you have any choice about whether to use versioning or not.
You cannot go without versioning (as mentioned in a comment).
I am talking from experience since I am currently working on a website where we have several items that have to work together:
The software itself, its functionalities at a given point in time
The external systems (6 of them) we are linked with (messages are versioned)
A database, which holds the configuration (and translations)
The static resources (images, CSS, JavaScript) hosted on an Apache server (several, in fact)
A Java applet, which has to be synchronized with the JavaScript and the software
This works because we use versioning, although I must admit that our database is pretty simple, and because the deployment is automated.
Versioning means that at any point in time we actually have several concurrent versions of the messages, the database, the static resources and the Java applet.
It is especially important in the event of a 'fallback'. If you discover a flaw when loading your new software and suddenly you cannot go ahead with the load, you'll have a crisis if you have not versioned; if you have, you can simply load the old software again.
Now as I said, the deployment is scripted:
- the static resources are deployed first (+ the Java applet)
- the database goes next; several configurations cohabit since they are versioned
- the software is queued for release in a 'cool' time window (when our traffic is at its lowest point, so at night)
And of course we take care of backward / forward compatibility issues with the external servers, which should not be underestimated.
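To illustrate the versioned-messages point: on the receiving side this usually boils down to dispatching on an explicit version field and tolerating fields you don't know, so that old and new formats can cohabit during a rollout. A minimal sketch follows; the message shapes and field names are invented for illustration:

```python
# Minimal sketch of handling versioned messages from external systems:
# dispatch on an explicit version field and ignore fields you don't know,
# so old and new message formats can cohabit during a rollout.
# The message shapes and field names are invented for illustration.
def handle_v1(msg: dict) -> dict:
    return {"customer": msg["customer_id"], "amount": msg["amount"]}

def handle_v2(msg: dict) -> dict:
    # v2 split "amount" into value + currency; default the currency defensively
    return {"customer": msg["customer_id"],
            "amount": msg["value"],
            "currency": msg.get("currency", "EUR")}

HANDLERS = {"1": handle_v1, "2": handle_v2}

def handle(msg: dict) -> dict:
    version = str(msg.get("version", "1"))   # very old senders may omit the field
    handler = HANDLERS.get(version)
    if handler is None:
        raise ValueError(f"unsupported message version: {version}")
    return handler(msg)

print(handle({"version": 1, "customer_id": 42, "amount": 10}))
print(handle({"version": 2, "customer_id": 42, "value": 10, "currency": "USD"}))
```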