Test/QA/Prod setup with a Kafka Schema Registry - apache-kafka

We are working with Kafka and have three environments set up: a test environment for the test system, a QA environment and a production environment. Pretty standard. Now we are getting started with Avro and have set up a schema registry, with the same installation setup: test, QA and production. But now we are a bit uncertain about how to use the environments together. We have looked around a bit but haven't really found any examples of how to set up a Kafka test/QA/prod environment with a schema registry. These are the three approaches we have discussed internally:
Should we use the prod Schema Registry for all environments, just as we do with our other artifact repositories?
With Nexus, Artifactory, Harbor etc. we use one instance for handling both developer versions and release versions of artifacts, so our initial approach was to do the same with the Schema Registry. But there is a difference here: with our other artifact repositories we have SNAPSHOT support and different spaces (builds/releases etc.), which we have not seen people use with a Schema Registry. So even though this was our initial approach, and it should work since we plan on using FULL_TRANSITIVE compatibility, we are now getting doubtful about sending each development/test version of a contract all the way to production. For example, FULL_TRANSITIVE would make it impossible to make incompatible changes even during development.
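For reference, compatibility is configurable per subject as well as globally through the Schema Registry REST API. A minimal Java sketch, assuming a reachable registry (the URL and subject name are placeholders), of what pinning FULL_TRANSITIVE on a single subject looks like:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hedged sketch: set the compatibility level for one subject via the Schema
// Registry REST API (PUT /config/{subject}). Registry URL and subject name
// below are placeholders for illustration only.
public class SetSubjectCompatibility {
    public static void main(String[] args) throws Exception {
        String registryUrl = "http://schema-registry-test:8081"; // placeholder
        String subject = "orders-value";                         // placeholder

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(registryUrl + "/config/" + subject))
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .PUT(HttpRequest.BodyPublishers.ofString(
                        "{\"compatibility\": \"FULL_TRANSITIVE\"}"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}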
Use the Test schema registry for test environment, and prod schema registry for Prod environment.
Straightforward: use "Kafka Schema Registry Test" in our test environment, with our Kafka test installation, and "Kafka Schema Registry Prod" only in our production environment. Our build pipeline would then deploy the Avro schemas to the production schema registry at an appropriate stage.
Use snapshot schemas
This would be to try and mimic the setup with our other repositories. That is, we use one schema registry in all our environments (the "Schema Registry Prod"), but during development and test we use a "snapshot" version of the schema (perhaps by adding a "-snapshot" suffix to the subject name). In the build pipeline we switch to the non-snapshot subject when ready to release.
So we would like to hear how other people work with Avro and Schema Registry. What does your setup look like?

Adding "-snapshot" to the subject name would break the default TopicNameStrategy, which the serializer uses to derive the subject from the topic name.
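For context, the subject naming strategy is set in the serializer configuration; with the default TopicNameStrategy the subject is always "<topic>-value"/"<topic>-key", so a hand-registered "-snapshot" subject would never be consulted. A minimal Java sketch of where that setting lives (broker and registry addresses are placeholders):

import java.util.Properties;

// Hedged sketch: producer settings showing where the Confluent subject naming
// strategy is configured. Broker and registry addresses are placeholders.
public class SubjectStrategyExample {
    public static Properties producerConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-test:9092");                    // placeholder
        props.put("schema.registry.url", "http://schema-registry-test:8081"); // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        // Default: subject is derived from the topic as "<topic>-value", so a
        // registered "<topic>-value-snapshot" subject is never looked up.
        props.put("value.subject.name.strategy",
                "io.confluent.kafka.serializers.subject.TopicNameStrategy");
        // Alternative: subject = fully qualified record name, decoupled from the topic.
        // props.put("value.subject.name.strategy",
        //         "io.confluent.kafka.serializers.subject.RecordNameStrategy");
        return props;
    }
}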
In a primarily Java dev environment, we do have three Registries, but they are only at backward transitive compatibility, not full.
Where possible, we use the Maven plugins provided by Confluent to test schema compatibility before registering. This causes the CI pipeline to fail if the schema is not compatible, and we can use Maven profiles to override the registry URL per environment; the assumption is that if it fails in a lower environment, it will never reach a higher one. SNAPSHOT artifacts can be manually deployed to the test environment, but these are only referenced by code artifacts, not Registry subjects, so schemas need to be manually deleted if there are any mis-registrations. That being said, "staging/qa" and prod are generally exactly the same, so barring network connectivity around "production", you'd only need two Registries.
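For illustration of what that compatibility gate does under the hood, here is a rough Java sketch against the registry's REST API (the URL, subject and schema are placeholders, a first-time subject would need extra handling of the 404 case, and a real pipeline step would parse the JSON responses properly):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hedged sketch of the CI gate described above: test a candidate schema for
// compatibility against the latest registered version, and only register it
// if the check passes. Registry URL, subject and schema are placeholders.
public class SchemaCiGate {
    private static final String REGISTRY = "http://schema-registry-qa:8081"; // placeholder
    private static final String CONTENT_TYPE = "application/vnd.schemaregistry.v1+json";

    public static void main(String[] args) throws Exception {
        String subject = "orders-value";                                  // placeholder
        // The REST API expects the Avro schema embedded as a JSON string field.
        String body = "{\"schema\": \"{\\\"type\\\": \\\"string\\\"}\"}"; // placeholder schema

        HttpClient client = HttpClient.newHttpClient();

        // 1) Check compatibility against the latest registered version.
        HttpResponse<String> check = client.send(HttpRequest.newBuilder()
                .uri(URI.create(REGISTRY + "/compatibility/subjects/" + subject + "/versions/latest"))
                .header("Content-Type", CONTENT_TYPE)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build(), HttpResponse.BodyHandlers.ofString());
        // Crude check for {"is_compatible": true}; parse the JSON properly in real code.
        if (!check.body().replace(" ", "").contains("\"is_compatible\":true")) {
            throw new IllegalStateException("Incompatible schema, failing the build: " + check.body());
        }

        // 2) Register the new schema version.
        HttpResponse<String> register = client.send(HttpRequest.newBuilder()
                .uri(URI.create(REGISTRY + "/subjects/" + subject + "/versions"))
                .header("Content-Type", CONTENT_TYPE)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build(), HttpResponse.BodyHandlers.ofString());
        System.out.println("Registered: " + register.body());
    }
}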
For non-Java projects, we force them to use a standalone custom Maven archetype repo with the same plugins. That lets them version the schema artifacts in Maven, but also allows JVM consumers to still use their data. It also simplifies build pipeline support to a standard lifecycle.

Related

AWS Glue - version control and setting up for continuous integration

We are in the process of setting up the CI/CD process for an AWS Glue ETL process. The existing ETL process contains the following AWS Glue components: crawlers, registered tables in the catalog, jobs, triggers and workflows.
Obviously the first step is to set up a code repository and link the existing artifacts from the different components mentioned above to the repository, which would ideally let developers perform check-ins and pull requests from the tool (something similar to ADF and Databricks). However, as far as we have explored, AWS Glue does not have integration with any source code repository that can directly provide this feature, unless we are missing something.
Hence, what is the method to set up the environment for CI (I'm still not talking about CD)? The link below gives a reference for CI/CD:
https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-serverless-aws-glue-etl-applications-using-aws-developer-tools/
However, it mentions at the beginning that the AWS CloudFormation template file for deploying the ETL jobs is committed to version control - so it's not clear how this is done for the ongoing regular commits from the developers.
However as far as we have explored, AWS Glue does not have integration with any source code repository that can directly provide this feature, unless we are missing something.
Correct, Glue does not have version control integration.
I develop (Python and CloudFormation) locally in VS Code and use its Git integration plugin. I use a container if I want to test something locally, but Glue also has a Dev Endpoint for similar tasks.

How to manage logical grouping of microservice based application to ensure version compatibility for CI/CD Pipeline?

For a microservice-architecture-based application, I'm trying to understand a standard process for how to logically group and manage correct version compatibility among independently deployable microservices. Let me elaborate with a practical scenario:
Say I am building a software application composed of 10 microservices. All the microservices have their own independent repositories (branching workflow etc.) and their own separate CI/CD pipelines.
The CI/CD pipeline gets triggered whenever any change is pushed to the 'master' branch of the respective microservice.
Considering Helm-chart- and Kubernetes-based deployment, all the microservices will get deployed at version 1.0 for the very first deployment and our system will work. For subsequent releases, we might have only a couple of services that get deployed. So after a couple of production releases, each microservice will be at a different version, and together they constitute the application at that point in time.
My question is :
How do we logically group independently deployable microservices in order to deploy or roll back to an earlier release, i.e. how do we determine what the versions of the different microservices were for earlier releases?
Is there any existing tool or standard practice to track the version of each microservice for a given release, so we can seamlessly roll back to the expected release?
If there is no automated solution, what would be the right approach to address such a requirement?
Appreciate your thoughts and suggestion on this.
With Kubernetes in mind:
1. Helm is a nice tool to deploy and track.
2. Native k8s Deployments also work well; you need to use them properly, and in particular look at the --record flag in kubectl commands (e.g. check this link).
With AWS ECS clusters:
1. They have task definitions and tasks. I think that works for you.
I don't have pointers for docker-compose, Swarm, and other tools, but you can always use the power of Git and some scripting.
The idea is to make a file that lists all versions of services/containers/code, and commit that file to Git with the code. Make a tag out of it for simplicity. Your script should then compare this state file with the current state and apply only the specific changes. Also look at Git submodules: a submodule setup is nothing but a group of Git projects, and it tracks the status of each project via its commit id. This helped us in the situation you mention.
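As a concrete illustration of that state-file idea, here is a hedged Java sketch that diffs a committed manifest against the currently deployed versions (the file names and properties-style service=version format are assumptions for illustration):

import java.io.FileReader;
import java.util.Properties;

// Hedged sketch: compare a release manifest committed to Git (service=version
// per line) with the versions currently deployed, and print only the services
// that need to change. File names and format are assumptions for illustration.
public class ReleaseDiff {
    public static void main(String[] args) throws Exception {
        Properties desired = load("release-manifest.properties");  // committed and tagged in Git
        Properties current = load("deployed-versions.properties"); // exported from the cluster

        for (String service : desired.stringPropertyNames()) {
            String want = desired.getProperty(service);
            String have = current.getProperty(service);
            if (!want.equals(have)) {
                // A deploy script (helm upgrade, kubectl set image, etc.) would act here.
                System.out.printf("%s: %s -> %s%n", service, have, want);
            }
        }
    }

    private static Properties load(String path) throws Exception {
        Properties p = new Properties();
        try (FileReader reader = new FileReader(path)) {
            p.load(reader);
        }
        return p;
    }
}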
This is a fairly new problem; we just launched a new tool, Reliza Hub, to solve it. Also, here is my post on the subject: Microservices – Combinatorial Explosion of Versions. Currently we are at the MVP stage and a lot of work is going on - see this video tutorial if our direction makes sense for you: https://www.youtube.com/watch?v=yDlf5fMBGuI
If you decide to implement and have any questions or need help with integration, just tag me on SO and I'd be very much willing to make it work for you.
To sum up a few things that we are doing: we denote developer-facing projects (those that map to source code) as Projects and customer-facing projects (bundles that the customer sees) as Products.
And we say that Products are essentially compositions of Projects, and we provide tooling for compiling different versions of Projects into what's called a Product bundle. You can then integrate this into any CI or CD tool out there, or start manually if you haven't configured CI/CD yet.
Other than that, yes - I highly recommend Helm and Kubernetes - this is what we use on newer projects. (I can also add Argo CD and Spinnaker to the existing tooling.) But that alone is not enough to track the permutations of different versions of microservices and to establish which configurations are good and which are not across different environments.

How to Accomplish This Branching and Deployment Strategy Using TeamCity and Octopus

I have been researching and am trying to figure out the best branching and deployment strategy to accomplish the requirements below. Maybe I’m missing something but it is more complicated than it seems. Ideally, we’d just have one permanent branch, ‘master’, that could have specific commits tagged to mark releases to production.
Our current strategy is based on Git Flow and has permanent branches ‘master’ (only has releases to production) and ‘develop’. The primary thing that complicates using a multiple permanent-branches model is the concept of “promoting” the same build from the staging environment to production. Currently, this needs to be done in a separate source code branch (deployments to staging come from ‘develop’, deployments to prod come from ‘master’).
Tools: Git (VSTS), TeamCity, Octopus Deploy
Requirements (feature and hotfix lifecycles):
All code is reviewed via pull requests (enforced via branch policies)
All code gets deployed to a staging environment for testing
We can quickly go back to any snapshot of code that was deployed previously
If testing is successful, then the same build can be “promoted” from our staging environment to production (no need to build again)
Features accumulate over time before pushing out to production as a single release. Hotfixes have to be able to go through without getting caught up in the "all or nothing" next regular release.
I like the idea of having one permanent branch with tags (re: "The master/develop split is redundant", http://endoflineblog.com/gitflow-considered-harmful), but having additional permanent branches may better facilitate deploying different lifecycles/versions (feature and hotfix) to Octopus.
I have been wrestling with how best to pull this off and I may be over complicating things. Any feedback is appreciated.
It seems you have a number of questions and they are quite broad... I'll add some comments to each of your requirements as a conversation starter, but this whole thread might get blocked by moderators as it is definitely not the style of questions SO was made for.
All code is reviewed via pull requests (enforced via branch policies)
I haven't looked at VSTS for ages, but I'd expect they already support branch policies and pull-requests, so not sure if there's anything you need here other than configure settings in your repositories.
In case VSTS does not support that, you might consider moving to a tool that does e.g. BitBucket, GitHub, etc. Both of these have an on-premises version in case you can't (or don't want to) use the cloud hosted version.
All code gets deployed to a staging environment for testing
You achieve that by setting up lifecycles in Octopus Deploy, to make sure deployments/promotions follow the sequence you want.
We can quickly go back to any snapshot of code that was deployed previously
You already have source control, so all you need now is traceability from the code that is deployed in an environment, to the deployment version in Octopus Deploy, the build job in TeamCity, the branch and exact commit in your source control.
There are a few things you can do to achieve that:
1. Define a versioning scheme that works for you. I like to use semantic versioning: the "Major" and "Minor" versions are defined by the developers, and the "Patch" is the auto-incremented number from TeamCity (%build.number%). Every git push builds the code and generates a unique build version (%major%.%minor%.%build.number%).
2. As part of the build steps in TeamCity, before you compile the code, make sure your source files are patched with the version number assigned by each build, the commit hash from your source control, and the branch name. E.g. if you are using .NET, make sure all the AssemblyInfo.cs files are updated with that version, so that the version is embedded in the binaries. This allows anyone to query the version by looking at the properties of the binary files, and also allows you to display the app version in the app itself (status bar, footer, caption, about box, etc. - see the sketch after this list).
3. Have TeamCity tag your source control with the version number of every build, so you can quickly see it in your source control history. You probably only want to do that for the master branch, though, which is the one you care about.
4. Have Octopus tag your source control with the deployment version number and the environment name, so that you can quickly see (from your source control) what got deployed where.
Steps 1 and 2 are the most important ones, really. 3 and 4 are just nice-to-have. Most of the time you'll just open the app in the environment, check the commit hash in the "About" box, and do a git checkout of that commit hash...
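In a Java shop, an analogous pattern to the AssemblyInfo.cs patching in step 2 is to have the CI step write a small properties resource into the artifact and read it at runtime; a hedged sketch (the resource name and keys are assumptions for illustration):

import java.io.InputStream;
import java.util.Properties;

// Hedged sketch: read build metadata that CI stamped into the packaged artifact
// (e.g. a build-info.properties on the classpath containing version, commit hash
// and branch) so the running app can expose it in an "about"/status view.
// The resource name and keys are assumptions for illustration.
public final class BuildInfo {
    private final Properties props = new Properties();

    public BuildInfo() {
        try (InputStream in = BuildInfo.class.getResourceAsStream("/build-info.properties")) {
            if (in != null) {
                props.load(in);
            }
        } catch (Exception e) {
            // Missing build info should never break the app; fall back to "unknown".
        }
    }

    public String describe() {
        return String.format("version=%s commit=%s branch=%s",
                props.getProperty("build.version", "unknown"),
                props.getProperty("git.commit", "unknown"),
                props.getProperty("git.branch", "unknown"));
    }
}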
If testing is successful, then the same build can be "promoted" from our staging environment to production (no need to build again)
Again, Octopus Deploy lifecycles, and make sure anything different in each environment is defined in the configuration file of the application, which is updated during the Octopus deployment, using environment-specific variables.
In terms of branch workflow, this last requirement makes it mandatory to merge changes into master (or whatever your "production" branch is) before the deployment lifecycle can begin.

Deploy using artifact from artifactory

Is there a way in Bamboo to deploy artifacts from Artifactory, rather than only locally published artifacts? I've found the Artifactory plugin, but as far as I could see it only allows deploying things into Artifactory.
I'm using Bamboo 5.4.2
You can use your build server to deploy from Artifactory to your application server, but that's a very roundabout way to go. You already uploaded all the binaries to Artifactory; why would you want to download them to the build server again?
You have a number of ways to get the needed files to your application server straight from Artifactory, without involving the CI server, and the selection depends on how complicated your requirements are. If all you need is to get the latest version of some artifact from Artifactory to the app server, tools like LiveRebel are a great match. If you need to do more, e.g. deploy to a sophisticated topology of a clustered environment with a sharded data schema upgrade without downtime, you might need something more free-style like Puppet, Chef, Ansible, or Salt.
Either way, Artifactory properties and the REST API for working with them are your best friends. Using properties in your REST queries for artifacts lets you express queries like "give me all the artifacts that were produced by a certain Bamboo build, but only those which were staged, have a QA level of 'production', and match the target deployment environment".
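As a rough illustration of that kind of property-driven query, here is a hedged Java sketch against Artifactory's property search endpoint (the base URL, repository and property names are invented for illustration; match them to whatever properties your builds actually attach):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hedged sketch: query Artifactory's property search API for artifacts that
// carry specific properties (e.g. produced by a given Bamboo build and marked
// with a QA level). Base URL, repo and property names are placeholders.
public class ArtifactoryPropertySearch {
    public static void main(String[] args) throws Exception {
        String base = "https://artifactory.example.com/artifactory";   // placeholder
        String query = "/api/search/prop?qa.level=production&build.name=my-bamboo-plan&repos=libs-release-local";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(base + query))
                .header("X-Result-Detail", "properties")   // include each hit's properties in the response
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());   // JSON list of matching artifact URIs
    }
}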

Release Management to different environments (Dev/QA/Integration/Stable)

I recently joined a company as Release Engineer where a large number of development teams develop numerous services, applications, web-apps in various languages with various inter-dependencies among them.
I am trying to find a way to simplify and preferably automate releases. Currently the release team is doing the following to "release" the software:
CURRENT PROCESS OF RELEASE
1. Diff the latest revision from SCM between the QA and INTEGRATION branches.
2. Manually copy/paste "relevant" changes between those branches.
3. Copy the latest binaries to the right location (this is automated using a .cmd script).
4. Restart any services.
MY QUESTION
I am hoping to avoid steps 1. and 2. altogether (obviously), but am running into issues where differences between the environments cause the config files to be different for different environments (e.g. QA vs. INTEGRATION). Here is a sample:
IN THE QA ENVIRONMENT:
<setting name="ServiceUri" serializeAs="String">
<value>https://servicepoint.QA.domain.net/</value>
</setting>
IN THE INTEGRATION ENVIRONMENT:
<setting name="ServiceUri" serializeAs="String">
<value>https://servicepoint.integration.domain.net/</value>
</setting>
If you look closely, the only difference between the two <setting> tags above is the URL in the <value> tag. This is because the QA and INTEGRATION environments are in different data centers and are ever so slightly out of sync (and growing further apart as development gets faster/better/stronger). Changes such as this, where only the URL/endpoint differs, are TO BE IGNORED during "release" (i.e. these are not "relevant" changes to merge from QA to INTEGRATION).
Even in a regular release (about once a week) I have to deal with a dozen config file changes that have to be released from QA to INTEGRATION, and I have to manually go through each config file and copy/paste the non-URL-related changes between the files. I can't simply take an entire package that the CI tool spits out from QA (or after QA), since the URLs/endpoints are different.
Since there are multiple programming languages in use, the config file example above could be C#, C++ or Java. So am hoping any solution would be language agnostic.
SUMMARY OF ENVIRONMENTS/PROGRAMMING LANGUAGES/OS/ETC.
Multiple programming languages - C#, C++, Java, Ruby. Management is aware of this as one of the problems, since the Release team has to be a jack-of-all-trades, and is addressing it.
Multiple OSes - Windows 2003/2008/2012, CentOS, Red Hat, HP-UX. Management is addressing this too - starting to consolidate and limit to Windows 2012 and CentOS.
SCM - Perforce, TFS. Management is trying to move everyone to a single tool (likely TFS).
CI is being advocated, though not mandatory - management is pushing the change through, but it is taking time.
I have given the example of QA and INTEGRATION, but in reality there is QA (managed by developers+testers), INTEGRATION (managed by my team), STABLE (releases to STABLE are done by my team but supported by Production Ops), and PRODUCTION (supported by Production Ops). These are the official environments - others are currently unofficial, and dev or test teams have a few more. I would eventually want to start standardizing/consolidating these unofficial envs too, since devs+testers should not have to worry about doing this kind of stuff.
There is a lot of work being done to standardize how the binaries are being deployed using tools like DeployIT (http://www.xebialabs.com/products) which may provide some way to simplify these config changes.
The dev teams are agile and release often, but that just means more work diffing config files.
SOLUTIONS SUGGESTED BY TEAM MEMBERS:
The current mind-set is to use a load balancer and standardize names across the different environments, but I am not sure if "a process" such as this is the right solution. There must be a better way, starting with how devs write configs through to how release environments meet dependencies.
Alternatively, some team members are working on install scripts (InstallShield / MSI) to automate find/replace of URLs/endpoints between envs. I am hoping this is not the solution, but it is doable.
If I have missed anything or should provide more information, please let me know.
Thanks
[Update]
References:
Managing complex Web.Config files between deployment environments - C# web.config specific, though a very good start.
http://www.hanselman.com/blog/ManagingMultipleConfigurationFileEnvironmentsWithPreBuildEvents.aspx - OK, though at first look this seems rather rudimentary and may break easily.
Generally the problem isn't too difficult - you need branches for each of the environments and a CI build set up for each of them. So a merge to the QA branch would trigger a build of that code and a custom deployment to QA. Simple.
Now, managing multiple config files isn't quite so easy (unless you have one for each environment, in which case you just call them Int.config, QA.config etc., store them all in the SCM, and pick the appropriate one to use in each branch's deployment script - e.g. when the build for QA runs, it picks qa.config, copies it to the correct location and renames it to the correct name). Incidentally, this is the approach I tend to use, as it's very simple.
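A minimal sketch of that pick-and-rename step, assuming Java and made-up file and variable names (configs/<ENV>.config copied into place as app.config):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hedged sketch of the "one config per environment" approach: the deployment
// step picks <ENV>.config from source control and copies it into place under
// the name the application expects. File names and the DEPLOY_ENV variable
// are assumptions for illustration.
public class PickEnvironmentConfig {
    public static void main(String[] args) throws Exception {
        String env = System.getenv().getOrDefault("DEPLOY_ENV", "QA");   // e.g. Int, QA, Prod
        Path source = Path.of("configs", env + ".config");               // e.g. configs/QA.config
        Path target = Path.of("deploy", "app.config");                   // name the app expects

        Files.createDirectories(target.getParent());
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        System.out.println("Deployed " + source + " as " + target);
    }
}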
If you have multiple configs you need to use, then it's always going to be a manual process - but you can help yourself by copying all the relevant configs to a build staging area that an admin will use to perform the deployment. It's a good first step in that the build they have in the staging directory is the correct one for them; they just have to choose which config to use, either during deployment (e.g. as an option in the installer) or by manually copying the appropriate config over.
I would not try to manage some automated way of taking a single config file in source control and re-writing it with different data in the build or pre-deploy steps. That way lies madness, and a lot of continual hassle trying to maintain the data and the tooling. Keep separate configs in place and make sure the devs know to update all of them when they make a change. (Or you can hold one config in the SCM tree and make sure they know that merging their changes must not overwrite any existing modifications - multiple configs is easier.)
I agree with @gbjbaanb. Have one config for each environment. Get your developers to write apps that read their properties (including their URLs) from config files, and commit config files for each environment. Not only does this help you with deployment, but config files under revision control provide reproducibility, full transparency, and an audit trail of your environment-specific settings.
Personally, I prefer to create a single deployable package that works on any environment by including all of the environment configs (even the ones you aren't using). You can then have some deployment automation that figures out which config files the apps should use and sets that up appropriately.
Thanks to @gman and @gbjbaanb for the answers (https://stackoverflow.com/a/16310735/143189, https://stackoverflow.com/a/16246598/143189), but I felt that they didn't help me solve the underlying problem I am facing, so I am restating it to make it clear.
The code seems very aware of the environment in which it runs. How do we write environment-agnostic code?
The suggestions in the answers above are to store one config file for each environment (environment-config). This is possible, but any addition/deletion/edit of non-environment settings will have to be ported over to each environment-config.
After some study, I wonder if the following would work better?
Keep the config file's structure consistent/standardized, e.g. XML. Try to keep the environment-specific endpoints in this config file, but store them in a way that allows easy access to the specific individual nodes/settings (e.g. using XPath).
When deploying to a specific environment, the deployment tool should be able to parse the file (e.g. using XPath) and update the environment-specific endpoint to the value for the specific environment to which you are deploying.
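A hedged sketch of that parse-and-update step, reusing the <setting name="ServiceUri"> shape from the sample above (the file name and target URL are placeholders):

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.File;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hedged sketch: a deployment-time step that locates the environment-specific
// endpoint in a standard XML config via XPath and rewrites it for the target
// environment. File name, XPath and URL below are placeholders.
public class PatchServiceUri {
    public static void main(String[] args) throws Exception {
        File configFile = new File("app.config");                          // placeholder
        String targetUrl = "https://servicepoint.integration.domain.net/"; // per-environment value

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(configFile);

        Node valueNode = (Node) XPathFactory.newInstance().newXPath()
                .evaluate("//setting[@name='ServiceUri']/value", doc, XPathConstants.NODE);
        valueNode.setTextContent(targetUrl);

        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(configFile));
    }
}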
This approach is not a unique idea. There are some existing implementations that tackle it already:
http://www.iis.net/learn/develop/windows-web-application-gallery/reference-for-the-web-application-package & http://www.iis.net/learn/publish/using-web-deploy/web-deploy-parameterization (WebDeploy)
http://docs.xebialabs.com/releases/3.9/deployit/packagingmanual.html#using-placeholders-in-ci-properties (DeployIt)
Home-spun solutions using XPath find and replace.
In short, while there are programming-language-specific solutions and programming-language-agnostic solutions, I guess the big downside is that release management needs to be considered during development too, or else it will cause deployment headaches. I don't like that, since it sounds like "development should be aware of what tests will be designed". Whether there is a need AND a way to avoid this is the big question.
I'm working through the process of creating a "deployment pipeline" for a web application at the moment and am sifting my way through similar problems. Your environment sounds more complicated than ours, but I've got some thoughts.
First, read this book, I'm 2/3 the way through it and it's answering every question I ever had about software delivery, and many that I never thought to ask: http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912/ref=sr_1_1?s=books&ie=UTF8&qid=1371099379&sr=1-1
Version Control Systems are your best friend. Absolutely everything required to build a deployable package should be retrievable from your VCS.
Use a Continuous Integration server, we use TeamCity and are pretty happy with it so far.
The CI server builds packages that are totally agnostic to the eventual target environment. We still have a lot of code that "knows" about the target environments, which of course means that if we add a new environment, we have to modify all such code to make sure it will cope and then re-test it to make sure we didn't break anything in the process. I now see that this is error-prone and completely avoidable.
Tools like Visual Studio support config file transformation, which we looked at briefly, but we quickly realized that it depends on environment-specific config files being prepared along with the code by the developers in order to be added to the package. Instead, break out any settings that are specific to a particular environment into their own config mechanism (e.g. another XML file) and have your deployment tool apply this to the package as it deploys. Keep these files in VCS, but use a separate repository, so that revisions to config don't trigger new builds and cause the build number to be falsely inflated.
This way, your environment-specific config files only contain things that change on a per-environment basis, and only where that environment needs something different from the default. Contrary to @gbjbaanb's recommendation, we are planning to do whatever is necessary to keep the package "pure" and the environment-specific config separate, even if it requires custom scripting etc., so I guess we're heading down the path of madness. :-)
For us, Powershell, XML and Web Deploy parameterization will be instrumental.
I'm also planning to be quite aggressive about refactoring the config files so that the same information isn't repeated several times in various places.
Good luck!