Monorepo microservices architecture - Lerna powered by Nx - github

Background
We have a Lerna/NX monorepo with multiple packages/subrepos within it, and each one currently has it’s own associated workflows for GitHub.
We are now trying to build out some APIs using a microservice architecture within that monorepo, and have a few snagging queries regarding this.
Question 1 - shared dependencies
We will have a minimum of 5 microservices for our MVP (more planned in the future) all consumed via an API Gateway. These microservices are going to be sharing a lot of dependencies, with a few unique dependencies for specific microservices depending on its purpose, such as 3rd party SDKs/prisma client/etc.
Is there a way we can utilise a “shared” package.json for the shared dependencies, whilst also populating it on build with the unique dependencies which would be housed within the package.json within each microservice?
See below image for an example of the folder structure with a “root” package.json that houses the shared dependencies, and within each microservice “sub-package” we would house the microservice specific dependencies.
OSC-API microservices structure
The purpose of this is to avoid the tedious task of going through each package individually to update the shared dependencies. Instead we would just have to update the “shared” package.json.
If this is something that just screams bad practice or “SHOULDN'T be done” feel free to say so.
Question 2 - GitHub workflows
Is there a way we can easily create a singular workflow that would cater for all microservices in 1 .yml file?
This would need to individually build the affected packages based on commits in the PR triggering the workflow. I.e. if Microservice 1, has commits, but nothing else then only microservice 1 gets built and deployed. However, the caveat is if there IS a “shared” package.json then any changes to this should cause all microservices to be rebuilt with the new dependencies.
It is also worth noting here, we do not have a large degree of DevOps experience, so what we currently have is a lot of trial and error, and as such would like to keep it somewhat simplified. However, if there is ample explanation to more complex suggestions would not mind trying them out.
The reason we want a single file is because we already have 3/4 workflows for each package in our monorepo, and as soon as we start building out these microservices, the workflows folder is going to start getting very bulky very quickly if we have to target each microservice with an individual workflow.yml
We have considered using the “working-directory” flag (https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-set-the-default-shell-and-working-directory), but it does not seem to work with marketplace actions that do not allow you to specify subfolders
We have also considered moving the individual packages to the root of the monorepo, this results in multiple workflows, it seems to work alright, are there any negatives associated with doing it in this manner
eg.
- run: rm -v -r ./package.json ./node_modules
- run: mv -v -f packages/osc-package1/* .

Related

Semantic Versioning on multiple services in the same Github repository using GH Actions

Our team uses a mono-repo, with several microservices, and some common packages between them.
I am tasked with adding CI/CD automation, and traditionally I rely in Git tags for the sem-ver and utilize comments to decide on major/minor/patch. The semantic-release node library does an good job of automating this.
The problem here is that it is a mono-repo and thus commits and tags are only useful across a global sem-ver. However in my case I have multiple microservices that each will have their own sem-ver.
One thought I have is maintaining a json manifest to store the versions of the services. By blocking direct pushes to the main branch, I can guarentee this file would not be changed on master except by the CI/CD actions.
I also would like to get some ideas from the community on what they would do in this situation? Or what they have done similar to this in the past?

How to manage logical grouping of microservice based application to ensure version compatibility for CI/CD Pipeline?

For the MicroService Architecture based application, I'm trying to understand a standard process about how to logically group and manage correct version compatibility among independently deployable microservices. Let me elaborate with practical scenario :
Say, I am building a software application which is composed of 10 microservices. All the microservices have their independent repositories(branching workflow etc.) and their separate CI/CD Pipeline.
The CI/CD Pipeline gets triggered whenever any change pushed to 'master' branch for respective microservice.
Considering Helm chart and Kubernetes based deployment, all the microservices will get deployed with version 1.0 for the very first deployment and our system would work. For subsequent releases, we might have only couple of services that will get deploy. So after couple of production releases, each microservice will be at different version to constituent an application at that point of time.
My question is :
How to logically group independently deployable microservices in order to deploy or rollback to earlier release i.e. how to determine what was the version of different microservices for earlier releases?
Is there any existing tool or standard practice to track versions of each microservice for given release to seamlessly rollback to expected release?
If not automated solution, what would be the right approach to address such requirement?
Appreciate your thoughts and suggestion on this.
With consideration kuberenets:
1. Helm is nice tool to deploy and track.
2. Native k8s deployment works nice, you need to use deployment properly especially look --record flag in k8s commands eg check this link
With AWS ECS clusters:
1. they have task definations and tasks. I think that works for you.
Not have pointers for docker-compose, swarm, and other tools. But you can always use the power of git and some scripting.
the idea is make a file that lists all versions of services/containers/code . and commit that file in git with code. Make tag out of it for simplicity. your script should compare this state file and current state and apply specific changes only. Look at git submodules also. it is nothing but a group of many git projects and it tracks status of each project with help of commit id of each project. This helped us in the situation you mention.
This is a fairly new problem, we just launched a new tool Reliza Hub to solve that. Also here is my post on the subject: Microservices – Combinatorial Explosion of Versions. Currently, we are at the MVP stage and a lot of work is going on - see this video tutorial if our direction makes sense for you https://www.youtube.com/watch?v=yDlf5fMBGuI
If you decide to implement and have any questions or need help with integration, just tag me on SO and I'd be very much willing to make it work for you.
To sum up few things that we are doing - we denote developer facing projects (those that map to source code) as Projects and customer facing projects (bundles that customer sees) as Products.
And we say that Products are essentially composition of Projects and provide tooling how you can compile different versions of Projects into what's called a Product bundle. You can then integrate this into any CI or CD tool out there or start manually if you haven't configured CICD yet.
Other than that, yes - I highly recommend helm and kubernetes - this is what we use on newer projects. (And I can also add ArgoCD and Spinnaker to the existing tooling). But it is not enough to track permutations of different versions of microservices and establishing which configurations are good and which are not between different environments.

Single CI config for multiple repositories

seeking for advice about such problem.
We have stack of microservices written on NodeJs and running on Kubernetes cluster. We have separate GitHub repository for each of them and currently using Circleci for our CI/CD process. As of now we have about 25-30 repos, but their number will increase and problem that we faced now is that we need to have Circleci config yaml in each repository and if we need to change something globally in our ci/cd pipeline, we need to update this in each repository, which is obviously pretty painful process and Circleci doesn't support to have one config file for multiple repos.
I believe our situation/setup in terms of multiple repos is not unique, does anybody have experience/ideas of which CI tool support described scenario of having one config file for multiple repos?
Below are 2 approaches that I considered when had to deal with similar situation. You'd need to define for yourself what you want to optimize for and make a decision based on that
Optimizing for flexibility and isolation. In this scenario instead of making all repos use the same config file, you're keeping the file in each repo and automating how you manage this file.
For example: you'll have to create a CLI tool or a script to automate copying circle file and committing to appropriate repos (whenever a change needs to happen)
PROS: isolation - all repos have their own configuration, if you ever going to have a golang microservice or different config in one of your nodejs services, modifying CI pipeline wouldn't be an issue
CONS: a bit of extra work to write automation around managing this config separately
Optimizing for easier maintainability. Figure how to share single pipeline configuration across your repos.
For example: use git submodules for keeping circle.yml file, or use separate npm package with circle.yml file. Another alternative is to use a CI tool that supports templating, then define pipeline template and re-use it for each individual pipeline (one of the CI tools that supports it - Teamcity)
I personally picked approach #1 in similar situation. IMHO, this is a price one have to pay when one decides to go with microservices to not end up with a platform that is rather a distributed monolith :) also I really liked when all repos are descriptive and self contained and CI pipeline as code is one of the ways to help achieve that
In my mind you have 2 options - you could have a single CI job/config that can deploy any single/multiple services (if all the services are the same). Or if every service is different than you need a separate job/config for each. If it's somewhere in the middle it's a question of whether you want a single job that has a bunch of if/then statements e.g. "if repo = user then do this special thing." The if/then approach worked fine for me up to a point, but eventually, there were too many special cases at it was easier to just go with the unique config for each service.
I solved the issue of it "being hard to make a 1 line change across 30 git repos" by having a git superuser. Basically, normal users can only merge using PRs, but the superuser can commit directly. Since I'm only changing things like config files there are rarely merge conflicts or broken test cases so it works. Here's some sample code:
#!/usr/bin/env bash
for dir in /temp/*/
do
cd $dir
git pull
sed 's/Nick/John/g' report.txt > report_new.txt
git commit -m "CI change" && git push
cd ..
done

Release Management to different environments (Dev/QA/Integration/Stable)

I recently joined a company as Release Engineer where a large number of development teams develop numerous services, applications, web-apps in various languages with various inter-dependencies among them.
I am trying to find a way to simplify and preferably automate releases. Currently the release team is doing the following to "release" the software:
CURRENT PROCESS OF RELEASE
Diff the latest revision from SCM between QA and INTEGRATION branches.
Manually copy/paste "relevant" changes between those branches.
Copy the latest binaries to the right location (this is automated using a .cmd script).
Restart any services
MY QUESTION
I am hoping to avoid steps 1. and 2. altogether (obviously), but am running into issues where differences between the environments is causing the config files to be different for different environments (e.g. QA vs. INTEGRATION). Here is a sample:
IN THE QA ENVIRONMENT:
<setting name="ServiceUri" serializeAs="String">
<value>https://servicepoint.QA.domain.net/</value>
</setting>
IN THE INTEGRATION ENVIRONMENT:
<setting name="ServiceUri" serializeAs="String">
<value>https://servicepoint.integration.domain.net/</value>
</setting>
If you look closely then the only difference between the two <setting> tags above is the URL in the <value> tag. This is because the QA and INTEGRATION environments are in different data-centers and are ever so slightly not in sync (with them growing apart as development gets faster/better/stronger). Changes such as this where the URL/endpoint is different are TO BE IGNORED during "release" (i.e. these are not "relevant" changes to merge from QA to INTEGRATION).
Even in a regular release (about once a week) I have to deal with a dozen config files changes that have to released from QA to integration and I have to manually go through each config file and copy/paste non URL-related changes between the files. I can't simply take an entire package that the CI tool spits out from QA (or after QA), since the URL/endpoints are different.
Since there are multiple programming languages in use, the config file example above could be C#, C++ or Java. So am hoping any solution would be language agnostic.
SUMMARY OF ENVIRONMENTS/PROGRAMMING LANGUAGES/OS/ETC.
Multiple programming languages - C#, C++, Java, Ruby. Management is aware of this as one of the problems, since Release team is has to be king-of-all-trades and is addressing this.
Multiple OS - Windows 2003/2008/2012, CentOS, Red Hat, HP-UX. Management is addressing this too - starting to consolidate and limit to Windows 2012 and CentOS.
SCM - Perforce, TFS. Management is trying to move everyone to a single tool (likely TFS)
CI is being advocated, though not mandatory - Management is pushing change through but is taking time.
I have given example of QA and INTEGRATION, but in reality there is QA (managed by developers+testers), INTEGRATION (managed by my team), STABLE (releases to STABLE by my team but supported by Production Ops), PRODUCTION (supported by Production Ops). These are the official environments - others are currently unofficial, but devs or test teams have a few more. I would eventually want to start standardizing/consolidating these unofficial envs too, since devs+tests should not have to worry about doing this kind of stuff.
There is a lot of work being done to standardize how the binaries are being deployed using tools like DeployIT (http://www.xebialabs.com/products) which may provide some way to simplify these config changes.
The devs teams are agile and release often, but that just means more work diffing config files.
SOLUTIONS SUGGESTED BY TEAM MEMBERS:
Current mind-set is to use a LoadBalancer and standardize names across different environments, but I am not sure if "a process" such as this is the right solution. There must be a better way that can start with how devs write configs to how release environments meet dependencies.
Alternatively some team members are working on install-scripts (InstallShield / MSI) to automate find/replace or URLs/enpoints between envs. I am hoping this is not the solution, but it is doable.
If I have missed anything or should provide more information, please let me know.
Thanks
[Update]
References:
Managing complex Web.Config files between deployment environments - C# web.config specific, though a very good start.
http://www.hanselman.com/blog/ManagingMultipleConfigurationFileEnvironmentsWithPreBuildEvents.aspx - OK, though as a first look, this seems rather rudimentary, that may break easily.
Generally the problem isn't too difficult - you need branches for each of the environments and CI build setup for them. So a merge to the QA branch would trigger a build of that code and a custom deployment to QA. Simple.
Now managing multiple config files isn;t quite so easy (unless you have 1 for each environment, in which case you just call them Int.config, QA.config etc, store them all in the SCM, and pick the appropriate one to use in each branch's deployment script - eg, when the build for QA runs, it picks qa.config and copies it to the correct location and renames it to the correct name)(incidentally, this is the approach I tend to use as its very simple).
If you have multiple configs you need to use, then its always going to be a manual process - but you can help yourself by copying all the relevant configs to a build staging area that an admin will use to perform the deployment. Its a good first step in that the build they have in a staging directory will be the correct one for them, they just have to choose which config to use either during (eg as an option in the installer) or by manually copying the appropriate config over.
I would not try to manage some automated way of taking a single config file in source control and re-writing it with different data in the build, or pre-deploy steps. That way lies madness, and a lot of continual hassle trying to maintain the data and the tooling. Keep separate configs in place and make sure the devs know to update all of them when they make a change. (Or, you can hold 1 config in the SCM tree and make sure they know that merging their changes must not overwrite any existing modifications - multiple configs is easier)
I agree with #gbjbaanb. Have one config for each environment. Get your developers to write apps that read their properties (including their URLs) from config files and commit config files for each environment. Not only does this help you with deployment, but config files under revision control provides reproducibility, full transparency, and an audit trail of your environment specific settings.
Personally, I prefer to create a single deployable package that works on any environment by including all of the environment configs (even the ones you aren't using). You can then have some deployment automation that figures out which config files the apps should use and sets that up appropriately.
Thanks to #gman and #gbjbaanb for the the answers (https://stackoverflow.com/a/16310735/143189, https://stackoverflow.com/a/16246598/143189), but I felt that they didn't help me solve the underlying problem that I am facing, and restating just to make clear.
The code seems very aware of the environment in which they run. How to write environment-agnostic code?
The suggestions in the answers above are to store 1 config file for each environment (environment-config). This is possible, but any addition/deletion/edit of non-environment settings will have to be ported over to each environment-config.
After some study, I wonder if the following would work better?
Keep the config file's structure consistent/standardized e.g. XML. Try to keep the environment-specific endpoints in this config-file but store them in a way that allows easy access to the specific individual nodes/settings (e.g. using XPath).
When deploying to a specific environment, then your deployment tool should be able to parse (e.g. using XPath) and update the environment-specific endpoint to the value for the specific environment to which you are deploying.
The above is not a unique idea. There are some existing implementations that tackle the above solution already:
http://www.iis.net/learn/develop/windows-web-application-gallery/reference-for-the-web-application-package & http://www.iis.net/learn/publish/using-web-deploy/web-deploy-parameterization (WebDeploy)
http://docs.xebialabs.com/releases/3.9/deployit/packagingmanual.html#using-placeholders-in-ci-properties (DeployIt)
Home-spun solutions using XPath find and replace.
In short, while there are programming-language-specific solutions, and programming-language-agnostic solutions, I guess the big downfall is that Release Management needs to be considered during development too, else it will cause deployment headaches - I don't like that, since it sounds like "development should be aware of what tests will be designed". Is there a need AND a way to avoid this, is the big questions.
I'm working through the process of creating a "deployment pipeline" for a web application at the moment and am sifting my way through similar problems. Your environment sounds more complicated than ours, but I've got some thoughts.
First, read this book, I'm 2/3 the way through it and it's answering every question I ever had about software delivery, and many that I never thought to ask: http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912/ref=sr_1_1?s=books&ie=UTF8&qid=1371099379&sr=1-1
Version Control Systems are your best friend. Absolutely everything required to build a deployable package should be retrievable from your VCS.
Use a Continuous Integration server, we use TeamCity and are pretty happy with it so far.
The CI server builds packages that are totally agnostic to the eventual target environment. We still have a lot of code that "knows" about the target environments, which of course means that if we add a new environment, we have to modify all such code to make sure it will cope and then re-test it to make sure we didn't break anything in the process. I now see that this is error-prone and completely avoidable.
Tools like Visual Studio support config file transformation, which we looked at briefly but quickly realized that it depends on environment-specific config files being prepared with the code, by the developers in order to be added to the package. Instead, break out any settings that are specific to a particular environment into their own config mechanism (e.g. another xml file) and have your deployment tool apply this to the package as it deploys. Keep these files in VCS, but use a separate repository so that revisions to config don't trigger new builds and cause the build number to get falsely inflated.
This way, your environment-specific config files only contain things that change on a per-environment basis, and only if that environment needs something different to the default. Contrary to #gbjbaanb's recommendation, we are planning to do whatever is necessary to keep the package "pure" and the environment-specific config separate, even if it requires custom scripting etc. so I guess we're heading down the path of madness. :-)
For us, Powershell, XML and Web Deploy parameterization will be instrumental.
I'm also planning to be quite aggressive about refactoring the config files so that the same information isn't repeated several times in various places.
Good luck!

Version control vs. automatic distribution vs. dependency management

Imagine a distributed software system, installed on a group of a few hundred computers (nodes). Nodes are responsible for automatically running scheduled tasks. There are hundreds of tasks, and every task is scheduled to run on about 5-10 nodes. Nodes may stop for days, and may be removed from the system. Every task is defined by one or more source files, and node-specific config files. The code is developed and tested directly on nodes (using remote access), since only these are equipped with the special hardware and have the network context required to run the tasks (building a separate test system would be too expensive). The source files of every task refer to shared source files (libraries), and libraries may refer to other libraries. The dependency tree of tasks and libraries is complicated.
I don't have any experience with distributed version control systems, but I feel that this system could be built around a DVCS. Different libraries, and source files of different tasks, would have their own repository. Every node which runs a given task should have an instance of the repo of that task. The repo of every library, used by at least one task of a node, should also be present on that node. Developers would modify and commit code locally on nodes, and distribute the modifications to repos on other nodes using DVCS techniques.
Question #1
What would be the best approach to distribute code changes to other nodes?
Some possible scenarios:
Developers push their modifications to every other node which has an instance the same repo. (But they may forget/don't have time to do so.)
Nodes automatically pull every change from every other remote repo, and update themselves. (But there may be conflicts.)
For each repo, one of the instances is used as a "reference". Developers push their modifications to this instance, and every other node having an instance automatically pulls from here and updates itself. (But the node having the reference instance may stop.)
Question #2
What would be the best way to handle dependencies?
If more than one tasks (or libraries) refer to the same library, and the referred library has to be modified, one or more referring tasks (or libraries) may stop working (dependency hell). It would be better to stick with the originally referred version, and upgrade to the new one after proper testing. That is, more than one version of the same source file should be present in the same repo, which does not seem possible. Do I have to create a new branch for the new version of the referred library? If yes, how should I upgrade the referring repos?
Thank you for your help.
I don't have any experience with distributed version control systems,
but I feel that this system could be built around a DVCS.
Wrong feeling, in common. VCS (SCM) is Version Control System (Source Control Management), i.e - track
changes
in historical aspect mostly
as flat array without complex dependences (some dependences are still taken into account)
You have to see at another category of tools - configuration management software, which handle deploys, policies, complex dependences, conditions etc natively
You can get some iteration to CM with DVCS, but it will be hard work and pale semblance of existing well-established tools as a result