How to scale buildbot in a company - buildbot

I've been looking into buildbot lately, and the lack of good documentation and sample configurations makes it hard to understand how buildbot is commonly used.
According to the buildbot manual, each buildmaster is responsible for one code base. That means a company that wants to use buildbot on, say, 10 projects needs to maintain 10 different buildbot installations (master/slave configurations, open ports, websites with output, etc.). Is this really the way things are done? Am I missing an option that creates a mash-up that is easy to maintain and monitor?
Thanks!

At my place of work we use Buildbot to test a single program over several architectures and versions of Python. I use one build master to oversee about 16 slaves. Each set of slaves pulls from a different repo and tests it against Python 2.X.
From my experience, it would be easy to configure a single build master to run a mash-up of projects. This might not be a good idea because the waterfall page (where the build slaves report results) can get very congested with more than a few slaves. If you are comfortable scrolling through a long waterfall page, then this will not be an issue.
EDIT:
The update command in master.cfg:
test_python26_linux.addStep(ShellCommand, name="update pygr",
    command=["/u/opierce/PygrBuildBot/update.sh", "000-buildbot", "ctb"],
    workdir=".")
000-buildbot and ctb are additional parameters that specify which branch and repo to pull from. The script update.sh is something I wrote to work around an unrelated git problem. If you wanted to run different projects, you could write something like:
builder1.addStep(ShellCommand, name="update project 1",
    command=["git", "pull", "git://github.com/your_id/project1.git"],
    workdir=".")
# (the rest of builder1's steps)

builder2.addStep(ShellCommand, name="update project 2",
    command=["git", "pull", "git://github.com/your_id/project2.git"],
    workdir=".")
# (the rest of builder2's steps)
The two projects don't have to be related. Buildbot creates a directory for each builder and runs all the steps in that directory.

FYI, BuildBot 0.8.x supports several repositories on one master, simplifying things a bit.
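For the curious, here is a rough sketch of what a two-project master.cfg could look like on 0.8.x. The repo URLs, slave name, and make target are invented for illustration, and the module paths are the 0.8.x-era ones, so check the manual for your version:

# master.cfg fragment (a sketch, not a drop-in config)
from buildbot.buildslave import BuildSlave
from buildbot.changes.gitpoller import GitPoller
from buildbot.changes.filter import ChangeFilter
from buildbot.schedulers.basic import SingleBranchScheduler
from buildbot.process.factory import BuildFactory
from buildbot.steps.source.git import Git
from buildbot.steps.shell import ShellCommand
from buildbot.config import BuilderConfig

c = BuildmasterConfig = {}
c['slaves'] = [BuildSlave('slave1', 'password')]

# One poller per repository, all feeding the same master.
c['change_source'] = [
    GitPoller('git://github.com/your_id/project1.git', branch='master'),
    GitPoller('git://github.com/your_id/project2.git', branch='master'),
]

def make_factory(repourl):
    f = BuildFactory()
    f.addStep(Git(repourl=repourl, mode='incremental'))
    f.addStep(ShellCommand(command=['make', 'test']))  # hypothetical test step
    return f

c['builders'] = [
    BuilderConfig(name='project1', slavenames=['slave1'],
                  factory=make_factory('git://github.com/your_id/project1.git')),
    BuilderConfig(name='project2', slavenames=['slave1'],
                  factory=make_factory('git://github.com/your_id/project2.git')),
]

# Route each repository's changes to its own builder.
c['schedulers'] = [
    SingleBranchScheduler(name='sched1', builderNames=['project1'],
        change_filter=ChangeFilter(repository='git://github.com/your_id/project1.git')),
    SingleBranchScheduler(name='sched2', builderNames=['project2'],
        change_filter=ChangeFilter(repository='git://github.com/your_id/project2.git')),
]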

Related

Single CI config for multiple repositories

Seeking advice on the following problem.
We have a stack of microservices written in Node.js and running on a Kubernetes cluster. We have a separate GitHub repository for each of them and currently use CircleCI for our CI/CD process. As of now we have about 25-30 repos, but their number will increase. The problem we face now is that we need a CircleCI config YAML in each repository, so if we need to change something globally in our CI/CD pipeline, we have to update it in every repository, which is obviously a pretty painful process, and CircleCI doesn't support one config file for multiple repos.
I believe our situation/setup in terms of multiple repos is not unique. Does anybody have experience with, or ideas about, a CI tool that supports the described scenario of one config file for multiple repos?
Below are two approaches I considered when I had to deal with a similar situation. You'll need to define for yourself what you want to optimize for and make a decision based on that.
Optimizing for flexibility and isolation. In this scenario, instead of making all repos use the same config file, you keep the file in each repo and automate how you manage it.
For example, you could create a CLI tool or a script that automates copying the CircleCI config and committing it to the appropriate repos whenever a change needs to happen (see the sketch at the end of this answer).
PROS: isolation - all repos have their own configuration; if you're ever going to have a Golang microservice, or a different config in one of your Node.js services, modifying the CI pipeline won't be an issue.
CONS: a bit of extra work to write automation around managing this config separately.
Optimizing for easier maintainability. Figure out how to share a single pipeline configuration across your repos.
For example, use git submodules to hold the circle.yml file, or a separate npm package containing it. Another alternative is to use a CI tool that supports templating: define a pipeline template and reuse it for each individual pipeline (one CI tool that supports this is TeamCity).
I personally picked approach #1 in a similar situation. IMHO, this is the price one has to pay when going with microservices, so as not to end up with a platform that is really a distributed monolith :) I also like all repos being descriptive and self-contained, and CI pipeline-as-code is one way to help achieve that.
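A minimal sketch of the kind of sync script approach #1 calls for, assuming all repos are already cloned under one parent directory and the canonical config lives at a path of your choosing (the paths and commit message below are invented):

#!/usr/bin/env python
# Push a canonical CircleCI config to every repo under REPOS_ROOT,
# committing and pushing only where the file actually changed.
import shutil
import subprocess
from pathlib import Path

CANONICAL = Path("canonical/.circleci/config.yml")  # hypothetical master copy
REPOS_ROOT = Path("/work/repos")                    # hypothetical clone location

for repo in REPOS_ROOT.iterdir():
    if not (repo / ".git").is_dir():
        continue                                    # skip non-repo entries
    target = repo / ".circleci" / "config.yml"
    target.parent.mkdir(exist_ok=True)
    shutil.copyfile(CANONICAL, target)
    subprocess.check_call(["git", "add", ".circleci/config.yml"], cwd=repo)
    if subprocess.call(["git", "diff", "--cached", "--quiet"], cwd=repo) == 0:
        continue                                    # nothing changed in this repo
    subprocess.check_call(["git", "commit", "-m", "Sync CI config"], cwd=repo)
    subprocess.check_call(["git", "push"], cwd=repo)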
In my mind you have two options: you could have a single CI job/config that can deploy any single service or several (if all the services are the same), or, if every service is different, you need a separate job/config for each. If it's somewhere in the middle, it's a question of whether you want a single job with a bunch of if/then statements, e.g. "if repo = user then do this special thing." The if/then approach worked fine for me up to a point, but eventually there were too many special cases and it was easier to just go with a unique config for each service. A sketch of the if/then style follows.
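As a toy sketch of that single-job-with-special-cases shape (the make targets are invented; the "user" repo name is the placeholder from the example above):

# One deploy script for every service, with per-repo special cases inline.
# Past a certain number of special cases, per-service configs win.
import subprocess
import sys

def deploy(repo):
    subprocess.check_call(["make", "build"], cwd=repo)        # common step
    if repo == "user":
        # "if repo = user then do this special thing"
        subprocess.check_call(["make", "migrate"], cwd=repo)  # hypothetical special step
    subprocess.check_call(["make", "deploy"], cwd=repo)       # common step

if __name__ == "__main__":
    deploy(sys.argv[1])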
I solved the issue of it "being hard to make a 1-line change across 30 git repos" by having a git superuser. Basically, normal users can only merge using PRs, but the superuser can commit directly. Since I'm only changing things like config files, there are rarely merge conflicts or broken test cases, so it works. Here's some sample code:
#!/usr/bin/env bash
# Apply the same small change to every repo checked out under /temp.
for dir in /temp/*/
do
    cd "$dir" || continue
    git pull
    # Example edit: rename Nick to John, in place.
    sed -i 's/Nick/John/g' report.txt
    git add report.txt
    git commit -m "CI change" && git push
    cd ..
done

Multiple SonarQube analyses on one pull-request

We used to have a big project that had SonarQube analysis run on it for every pull-request on GitHub. Everything worked fine.
Then we did some refactoring and split the code into separate projects. Since the code is related, the repo is still the same. But instead of running just one build+analysis, we run multiple ones per pull-request.
Everything else works fine, except that the SonarQube GitHub plugin writes the problems found in the first build, then removes them in the second build and so on. So I get an email about problems in the first build, but when I go and look at the PR in GitHub, it's all green and no messages anywhere.
Ideally, I would like to tell the SonarQube GH plugin that these builds should be handled separately in the PR, but I haven't found a way to do that yet.
What you are trying to achieve is not possible with the SonarQube GitHub plugin. If you want PR analysis back, there are two ways:
Either you gather those projects under the same umbrella, making them modules of a top project
Or you extract them into different repositories
The best solution depends on how your "new" projects are coupled to each other. If they have the same lifecycle (~ the same versioning scheme), then it's best to gather them under a top project. If not (i.e. they can be released independently with different versions), then moving them to dedicated repositories would be the best approach.
It is possible, but requires a complex setup:
- A SonarQube project for each language.
- A GitHub user for each language.
- In each SonarQube project, under General Settings -> Pull Requests, set a different access token to post back to GitHub for each project.
Now you will have two different commenters, one for each project.
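To make that concrete, here is a hedged sketch of driving the two analyses from a script. The sonar.github.* properties are the ones the (since-deprecated) GitHub plugin used; the tokens, repo, and PR number are placeholders, so verify everything against your plugin version:

# Run one preview analysis per sub-project, each posting to GitHub
# with its own token so the two commenters don't overwrite each other.
import subprocess

PR_NUMBER = "42"                    # placeholder pull-request number
GITHUB_REPO = "your_org/your_repo"  # placeholder repository

projects = [
    {"dir": "project-a", "token": "TOKEN_FOR_COMMENTER_A"},
    {"dir": "project-b", "token": "TOKEN_FOR_COMMENTER_B"},
]

for p in projects:
    subprocess.check_call([
        "sonar-scanner",
        "-Dsonar.analysis.mode=preview",
        "-Dsonar.github.pullRequest=" + PR_NUMBER,
        "-Dsonar.github.repository=" + GITHUB_REPO,
        "-Dsonar.github.oauth=" + p["token"],
    ], cwd=p["dir"])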

Build Flow vs Build Pipeline

I'm trying to split up a few Jenkins jobs using the Build Flow plugin so that instead of three monolithic jobs, we have three "starting points" that then use the DSL to trigger downstream jobs. I chose Build Flow over the Build Pipeline plugin because it seemed a lot harder with the latter to share jobs between different pipelines (i.e., sharing the workspace of the multiple starting jobs with a single compile job).
Previously, I had three jobs set up: Project-PR, Project-DEV, and Project-PROD.
Project-PR would build whenever a pull request happened in GitHub, and would just run a smaller subset of our unit tests, so that we could get quick verification that the PR is okay to merge.
Project-DEV would build whenever a feature branch was merged in GitHub into the main development branch, as well as having the ability to be manually triggered and given a different branch to pull. It would run the full suite of unit tests -- basically a sanity check that everything is still good. Then it would compile and minify, push to a QA environment for testing, and then run the full suite of integration tests against that QA environment. This step was configured as a parametrized build, with the parameter being the name of the branch to pull, test, and push. It would push to and set up a QA environment specific to that branch, so that we could QA multiple features without having to merge to development (e.g., feature-one.qa.example.com, feature-two.qa.example.com).
Project-PROD would only ever be manually triggered, and would do the full unit and integration test suite, compile and minify the front-end code (Less, JS, and CSS), and push the built code into a special "release branch" in GitHub that can then be deployed -- we haven't quite reached the point of Jenkins being in charge of deployment.
Now, what I wanted to set up was to split the subtasks into their own jobs, so that it'd be easy to set up new jobs without having to copy and paste all the build steps (or copy the job and change all the things that need to be unique). This would let us do things like create a copy of Project-DEV, but switch out the last job for one that deploys to a staging environment set up in the cloud. Or easily create a job that could report test results to a third-party source, e.g. copy the results to a shared network folder or something. Or any number of things. The goal is basically to use these subtask jobs as building blocks to let us build more complicated jobs, while also making it easier to update how one portion of the build works (for example, maybe we switch to a different technology for compiling, which might change how Jenkins would compile the code).
For example, the Project-PR would be split into the following:
Project-PULL (BuildFlow) -> Project-SetupBuildEnv (Normal Job) -> Project-PartialUnitTests (Normal Job)
The SetupBuildEnv would just pull down any NPM or Composer requirements, and set up the directories required for testing and building. PartialUnitTests then runs, and reports its results back up to the build flow.
The Project-DEV could be split up like so:
Project-DEV -> Project-SetupBuildEnv -> Project-FullUnitTests -> Project-Compile -> Project-Minify -> Project-DeployQA -> Project-FullIntegrationTests
This way, the parts of the build process that are shared (in this case, Project-SetupBuildEnv) can be easily shared between jobs, reducing duplication and making it easier to update a step in the build process without having to remember EVERY job that uses that step.
Right now, I'm using the Shared Workspace plugin so that all the steps use the same workspace. However, I'm running into an issue with it: it's not actually using one workspace. What's happening is that the Build Flow job will get a directory (e.g. /sharedspace/shared_one) and download the code from GitHub into it. Then it will trigger the DSL, which starts up the SetupBuildEnv job. But instead of working inside the same directory, that job gets a directory with a name like /sharedspace/shared_one#2, and runs the build setup task in there. Then when it goes to do the third step (unit testing), it fails, because now it's got a third directory (/sharedspace/shared_one#3), but that directory didn't have the setup run, so the required node and composer modules are missing. What's weird is that it looks like the Shared Workspace plugin is copying the first shared workspace to another directory, incrementing a counter (the #N part of the directory name), and giving that to the other jobs to work in.
So, question time:
is there a way to fix the Shared Workspace plugin so that it's actually only using one directory for each job?
if not, is it possible to have the Clone Workspace plugin take an argument, so I can specify which archived workspace to use instead of using the dropdown?
another possibility: would using the Shared Workspace plugin, but setting the "Local subdirectory for repo (optional)" option in the advanced git job options to specify the directory, work?
failing all that, is there some other way to set up a build pipeline that can share jobs with other pipelines that I've missed?
In my experience, even if you do get this working, it might not be a scalable way to go longer term. We've found the Shared Workspace plugin to be entirely a bad idea for long or complex builds (for reasons similar to yours, but also because scaling across dozens of slaves suddenly becomes hard). Arguably the idea is slightly against the spirit of modern scalable CI.
I'd instead delegate more to your build tools, be they Maven, Gradle, Ant, or even Grunt. If you want to keep these builds truly modular but can't afford to rebuild at each step (we decided full independence was worth wasting a few minutes per build), then perhaps look at creating useful artefacts at key stages -- in your case, minified-asset TARs, library JARs, maybe webjars, or whatever -- and deploy them to a (Maven?) repo.
Later build steps in your pipeline can quickly, easily, and repeatably pull the latest (or named version) assets from this centralised repo, and continue with the build process.
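As a toy illustration of that pull step (the repository URL and layout are entirely made up; a real setup would resolve Maven/Gradle coordinates instead):

# Fetch a named version of a previously published artifact so a later
# pipeline stage can continue without recompiling.
import urllib.request

REPO = "https://artifacts.example.com"  # hypothetical artifact repository

def pull_artifact(name, version, dest):
    # Assumed layout: /<name>/<version>/<name>-<version>.tar.gz
    url = "%s/%s/%s/%s-%s.tar.gz" % (REPO, name, version, name, version)
    urllib.request.urlretrieve(url, dest)

pull_artifact("minified-assets", "1.4.2", "assets.tar.gz")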
An alternative (with similarities) is to build one or more assets, but only promote them after increasing numbers of tests are run, which can be done in separate builds coordinated by your build flow, using the Promoted Builds plugin etc.

One job with different steps for different branches

I am currently in a weird situation. We have one repository with multiple branches (a CI branch and an official branch). Our current job structure has two jobs, one for each branch, for each platform we support. Our official build has different steps than our development build, all of which can be run using a script. My question is: can we set up one job per platform that looks at both branches and runs different steps according to which branch changed? I am aware of the environment variables that get set by the GitHub plugin. Is this something that would need to be utilized?
Thanks!
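One way to picture the asker's idea, assuming the Git plugin's GIT_BRANCH environment variable is available to the build (the script names below are invented):

# Branch-aware build driver: one Jenkins job per platform, with the
# step list chosen from the branch the Git plugin reports.
import os
import subprocess

branch = os.environ.get("GIT_BRANCH", "")  # e.g. "origin/official" or "origin/ci"

if branch.endswith("/official"):
    steps = [["./build.sh"], ["./package.sh"], ["./sign.sh"]]  # hypothetical official steps
else:
    steps = [["./build.sh"], ["./run_ci_tests.sh"]]            # hypothetical CI steps

for step in steps:
    subprocess.check_call(step)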

Release Management to different environments (Dev/QA/Integration/Stable)

I recently joined a company as Release Engineer where a large number of development teams develop numerous services, applications, web-apps in various languages with various inter-dependencies among them.
I am trying to find a way to simplify and preferably automate releases. Currently the release team is doing the following to "release" the software:
CURRENT PROCESS OF RELEASE
1. Diff the latest revision from SCM between the QA and INTEGRATION branches.
2. Manually copy/paste "relevant" changes between those branches.
3. Copy the latest binaries to the right location (this is automated using a .cmd script).
4. Restart any services.
MY QUESTION
I am hoping to avoid steps 1 and 2 altogether (obviously), but am running into issues where differences between the environments cause the config files to differ between environments (e.g. QA vs. INTEGRATION). Here is a sample:
IN THE QA ENVIRONMENT:
<setting name="ServiceUri" serializeAs="String">
    <value>https://servicepoint.QA.domain.net/</value>
</setting>
IN THE INTEGRATION ENVIRONMENT:
<setting name="ServiceUri" serializeAs="String">
    <value>https://servicepoint.integration.domain.net/</value>
</setting>
If you look closely, the only difference between the two <setting> tags above is the URL in the <value> tag. This is because the QA and INTEGRATION environments are in different data centers and are ever so slightly out of sync (and growing further apart as development gets faster/better/stronger). Changes such as this, where the URL/endpoint differs, are TO BE IGNORED during "release" (i.e. these are not "relevant" changes to merge from QA to INTEGRATION).
Even in a regular release (about once a week) I have to deal with a dozen config-file changes that have to be released from QA to INTEGRATION, and I have to manually go through each config file and copy/paste the non-URL-related changes between the files. I can't simply take the entire package that the CI tool spits out from QA (or after QA), since the URLs/endpoints are different.
Since there are multiple programming languages in use, the config file example above could be C#, C++ or Java. So am hoping any solution would be language agnostic.
SUMMARY OF ENVIRONMENTS/PROGRAMMING LANGUAGES/OS/ETC.
Multiple programming languages - C#, C++, Java, Ruby. Management is aware of this as one of the problems, since the Release team has to be a king-of-all-trades, and is addressing it.
Multiple OS - Windows 2003/2008/2012, CentOS, Red Hat, HP-UX. Management is addressing this too - starting to consolidate and limit to Windows 2012 and CentOS.
SCM - Perforce, TFS. Management is trying to move everyone to a single tool (likely TFS).
CI is being advocated, though not mandatory - Management is pushing the change through, but it is taking time.
I have given the example of QA and INTEGRATION, but in reality there is QA (managed by developers+testers), INTEGRATION (managed by my team), STABLE (releases to STABLE by my team but supported by Production Ops), and PRODUCTION (supported by Production Ops). These are the official environments; the dev and test teams currently have a few more unofficial ones. I would eventually want to start standardizing/consolidating these unofficial envs too, since devs+testers should not have to worry about doing this kind of stuff.
There is a lot of work being done to standardize how the binaries are being deployed using tools like DeployIT (http://www.xebialabs.com/products) which may provide some way to simplify these config changes.
The dev teams are agile and release often, but that just means more work diffing config files.
SOLUTIONS SUGGESTED BY TEAM MEMBERS:
The current mind-set is to use a load balancer and standardize names across the different environments, but I am not sure a "process" such as this is the right solution. There must be a better way, starting with how devs write configs through to how release environments meet dependencies.
Alternatively, some team members are working on install scripts (InstallShield / MSI) to automate find/replace of URLs/endpoints between envs. I am hoping this is not the solution, but it is doable.
If I have missed anything or should provide more information, please let me know.
Thanks
[Update]
References:
Managing complex Web.Config files between deployment environments - C# web.config specific, though a very good start.
http://www.hanselman.com/blog/ManagingMultipleConfigurationFileEnvironmentsWithPreBuildEvents.aspx - OK, though at first look this seems rather rudimentary and may break easily.
Generally the problem isn't too difficult - you need a branch for each environment and a CI build set up for each. A merge to the QA branch would then trigger a build of that code and a custom deployment to QA. Simple.
Now, managing multiple config files isn't quite so easy, unless you have one per environment, in which case you just call them Int.config, QA.config, etc., store them all in the SCM, and pick the appropriate one in each branch's deployment script - e.g., when the build for QA runs, it picks qa.config, copies it to the correct location, and renames it to the correct name. (Incidentally, this is the approach I tend to use, as it's very simple.)
If you have multiple configs you need to use, then it's always going to be a manual process - but you can help yourself by copying all the relevant configs to a build staging area that an admin will use to perform the deployment. It's a good first step in that the build they have in the staging directory will be the correct one; they just have to choose which config to use, either during deployment (e.g. as an option in the installer) or by manually copying the appropriate config over.
I would not try to manage some automated way of taking a single config file in source control and re-writing it with different data in the build or pre-deploy steps. That way lies madness, and a lot of continual hassle trying to maintain the data and the tooling. Keep separate configs in place and make sure the devs know to update all of them when they make a change. (Or you can hold one config in the SCM tree and make sure they know that merging their changes must not overwrite any existing modifications - multiple configs are easier.)
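A minimal sketch of that pick-and-rename step, assuming the per-environment files live in a configs/ directory of the tree (the layout and names are my own, not part of the answer):

# Deployment-script fragment: select the config for the target environment
# and drop it into place under the name the application expects.
import shutil
import sys

def deploy_config(environment, target_path="app.config"):
    source = "configs/%s.config" % environment.lower()  # e.g. configs/qa.config
    shutil.copyfile(source, target_path)                # copy and rename in one go

if __name__ == "__main__":
    deploy_config(sys.argv[1])  # e.g. python deploy_config.py QA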
I agree with @gbjbaanb. Have one config for each environment. Get your developers to write apps that read their properties (including their URLs) from config files, and commit config files for each environment. Not only does this help you with deployment, but config files under revision control provide reproducibility, full transparency, and an audit trail of your environment-specific settings.
Personally, I prefer to create a single deployable package that works on any environment by including all of the environment configs (even the ones you aren't using). You can then have some deployment automation that figures out which config files the apps should use and sets that up appropriately.
Thanks to @gman and @gbjbaanb for the answers (https://stackoverflow.com/a/16310735/143189, https://stackoverflow.com/a/16246598/143189), but I felt that they didn't help me solve the underlying problem I am facing, so I am restating it to make it clear.
The code seems very aware of the environment in which it runs. How do you write environment-agnostic code?
The suggestions in the answers above are to store 1 config file for each environment (environment-config). This is possible, but any addition/deletion/edit of non-environment settings will have to be ported over to each environment-config.
After some study, I wonder if the following would work better?
Keep the config file's structure consistent/standardized, e.g. XML. Try to keep the environment-specific endpoints in this config file, but store them in a way that allows easy access to the specific individual nodes/settings (e.g. using XPath).
When deploying to a specific environment, your deployment tool should then be able to parse the file (e.g. using XPath) and update the environment-specific endpoint to the value for the environment to which you are deploying (see the sketch after the list below).
The above is not a unique idea. There are some existing implementations that tackle the above solution already:
http://www.iis.net/learn/develop/windows-web-application-gallery/reference-for-the-web-application-package & http://www.iis.net/learn/publish/using-web-deploy/web-deploy-parameterization (WebDeploy)
http://docs.xebialabs.com/releases/3.9/deployit/packagingmanual.html#using-placeholders-in-ci-properties (DeployIt)
Home-spun solutions using XPath find and replace.
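For the home-spun variant, a rough sketch in Python, assuming the <setting>/<value> layout from the sample earlier; the lookup table and file name are mine:

# Rewrite the environment-specific endpoint in a config file at deploy
# time, leaving every other setting untouched.
import xml.etree.ElementTree as ET

ENDPOINTS = {
    "QA": "https://servicepoint.QA.domain.net/",
    "INTEGRATION": "https://servicepoint.integration.domain.net/",
}

def parameterize(config_path, environment):
    tree = ET.parse(config_path)
    # Locate the ServiceUri setting and point it at the target environment.
    for setting in tree.getroot().iter("setting"):
        if setting.get("name") == "ServiceUri":
            setting.find("value").text = ENDPOINTS[environment]
    tree.write(config_path)

parameterize("app.config", "INTEGRATION")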
In short, while there are programming-language-specific solutions and programming-language-agnostic solutions, I guess the big downfall is that release management needs to be considered during development too, or else it will cause deployment headaches. I don't like that, since it sounds like "development should be aware of what tests will be designed". Whether there is a need AND a way to avoid this is the big question.
I'm working through the process of creating a "deployment pipeline" for a web application at the moment and am sifting my way through similar problems. Your environment sounds more complicated than ours, but I've got some thoughts.
First, read this book. I'm 2/3 of the way through it and it's answering every question I ever had about software delivery, and many that I never thought to ask: http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912/ref=sr_1_1?s=books&ie=UTF8&qid=1371099379&sr=1-1
Version Control Systems are your best friend. Absolutely everything required to build a deployable package should be retrievable from your VCS.
Use a Continuous Integration server, we use TeamCity and are pretty happy with it so far.
The CI server builds packages that are totally agnostic to the eventual target environment. We still have a lot of code that "knows" about the target environments, which of course means that if we add a new environment, we have to modify all such code to make sure it will cope and then re-test it to make sure we didn't break anything in the process. I now see that this is error-prone and completely avoidable.
Tools like Visual Studio support config-file transformation, which we looked at briefly, but we quickly realized it depends on environment-specific config files being prepared alongside the code by the developers in order to be added to the package. Instead, break out any settings that are specific to a particular environment into their own config mechanism (e.g. another XML file) and have your deployment tool apply them to the package as it deploys. Keep these files in VCS, but use a separate repository so that revisions to config don't trigger new builds and falsely inflate the build number.
This way, your environment-specific config files only contain things that change on a per-environment basis, and only if that environment needs something different from the default. Contrary to @gbjbaanb's recommendation, we are planning to do whatever is necessary to keep the package "pure" and the environment-specific config separate, even if it requires custom scripting etc., so I guess we're heading down the path of madness. :-)
For us, Powershell, XML and Web Deploy parameterization will be instrumental.
I'm also planning to be quite aggressive about refactoring the config files so that the same information isn't repeated several times in various places.
Good luck!