Does GitHub provide hooks to set up scripts that run on every pull request (where, say, one could call a simple static code analyser script), and a provision to reject the pull request based on the results of that script run via the pull request hook?
I am trying to set up a pre-screener mechanism to catch trivial bugs/mistakes, so that reviewers are not bothered with them and can focus on the logic/feature. If the pre-screening script finds that the source in question doesn't fit the norms (typically when even the simplest of checks fail; e.g. a function with >5000 SLoC, an unsafe strcpy(), inclusion of deprecated header files, etc.), it should return a failure, and the pull request itself should fail unless the minimum gating criteria are met.
Since the code is on github rather than a local server, this seems to be kinda tricky.
I got a couple of pointers (here and here) but still couldn't gather the details fully. The codebase consists of multiple repositories on GitHub. Is there a better way to accomplish this? Please share your thoughts on possible approaches. Thanks!
Does GitHub provide hooks to set up scripts that run on every pull request (where, say, one could call a simple static code analyser script), and a provision to reject the pull request based on the results of that script run via the pull request hook?
This should be achievable through the GitHub APIs, by combining the creation of a hook for events of type pull_request with decorating the commits with a resulting status.
This is quite a low-level approach, but it lets you keep complete control over what is being done. For instance, you could automatically add comments to the pull requests, or even close them if they do not pass the analysis process.
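For instance, a minimal sketch of such a service (assuming Python with Flask and requests, a GITHUB_TOKEN environment variable, and a hypothetical run_static_checks() standing in for your analyser) could look like this:

```python
# Minimal sketch of a pull_request webhook receiver that runs a checker
# and reports the verdict through the Commit Status API.
# Assumptions: Flask and requests are installed, GITHUB_TOKEN is set,
# and run_static_checks() is a hypothetical stand-in for your analyser.
import os
import requests
from flask import Flask, request

app = Flask(__name__)
API = "https://api.github.com"
TOKEN = os.environ["GITHUB_TOKEN"]

def run_static_checks(clone_url, sha):
    # Placeholder: clone the repo at `sha` and run your checks here.
    # Return True when the gating criteria are met.
    return True

@app.route("/webhook", methods=["POST"])
def webhook():
    if request.headers.get("X-GitHub-Event") != "pull_request":
        return "ignored", 200
    event = request.get_json()
    pr = event["pull_request"]
    sha = pr["head"]["sha"]
    repo = event["repository"]["full_name"]
    ok = run_static_checks(pr["head"]["repo"]["clone_url"], sha)
    # Decorate the PR's head commit with a status; GitHub shows it on the PR.
    requests.post(
        f"{API}/repos/{repo}/statuses/{sha}",
        headers={"Authorization": f"token {TOKEN}"},
        json={
            "state": "success" if ok else "failure",
            "context": "pre-screener",
            "description": "static analysis gate",
        },
    )
    return "ok", 200
```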
Another, higher-level approach would be to leverage the Travis CI service through the addition of a .travis.yml file in your repository. Travis is free for open source projects and also offers paid services for private repositories.
Setting up Travis is quite easy and tweaking the build script is a breeze.
Below are two sample Travis setups for your inspiration:
LibGit2: a C library. Build with several compilers, run tests, run Valgrind. The build fails (and the PR is decorated as such) when the code doesn't compile or upon a test failure.
LibGit2Sharp: A C# binding for LibGit2. Build against Mono Xbuild compiler, run tests. The build fails (and the PR is decorated as such) when the code doesn't compile or upon a test failure.
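To give an idea of the scale involved, a gating .travis.yml can be just a handful of lines. Here is a minimal sketch for a C project; the make targets and the analyser script are assumptions for illustration, not taken from the repositories above:

```yaml
# Hypothetical .travis.yml: build with two compilers, run the tests,
# then run a pre-screener script whose non-zero exit fails the build
# (and hence marks the pull request as failing).
language: c
compiler:
  - gcc
  - clang
script:
  - make
  - make test
  - ./scripts/static-checks.sh
```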
The official announcement for the GitHub Commit Status services can be read in this blog post.
You may have use for this:
https://github.com/tomasbjerre/violation-comments-to-github-lib
It will parse the file system to find report files from static code analyzers, and then use those to comment the pull request in GitHub.
I am doing data analysis on GitHub projects and I want to filter projects having continuous integration (on GitHub).
There are two types of checks and statuses on GitHub: Checks and Statuses! Projects can use GitHub Apps to run checks, or mark their commits with external services (CI or other) [source]. My question here is: does having GitHub Checks (or Statuses) results available for a project mean that the project is using CI for sure? If not, what other factors should be present to say a project has continuous integration?
Possibly. But you can't be sure. It means that some check runs and some status is updated, but without looking at the automation itself there is no way to conclude that continuous integration takes place.
Maybe it checks whether the contributor has signed a contribution agreement.
Maybe it checks for the presence of an issue id or an attachment.
Maybe it updates some external system (like ServiceNow) so the issue can be tracked there as well...
Checks and statuses are used in many different ways.
And Continuous Integration looks different for different technologies. Some languages need to be compiled, others won't need that. They'd hopefully have some kind of tests to validate nothing broke during integration, but there is no surefire way to know as it may simply be running a script or using a test framework or something else.
You can probably conclude that the absence of checks and statuses likely means that CI isn't being performed (even that can't be said with 100% certainty, as an external system may be performing the CI and just not reporting back the status). The presence of checks and statuses means that something happens, but you likely need to dig a bit deeper to classify whether the thing that happens constitutes CI.
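For the data-gathering side, here is a minimal sketch (Python with the requests library; assumes a GITHUB_TOKEN environment variable) showing how both signals can be read from the REST API for a repository's head commit:

```python
# Sketch: probe whether a repo's latest default-branch commit carries
# any checks or statuses, as a *hint* (not proof) of CI.
import os
import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": f"token {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def has_checks_or_statuses(owner, repo):
    # Find the head commit of the default branch.
    info = requests.get(f"{API}/repos/{owner}/{repo}", headers=HEADERS)
    info.raise_for_status()
    branch = info.json()["default_branch"]
    sha = requests.get(
        f"{API}/repos/{owner}/{repo}/commits/{branch}", headers=HEADERS
    ).json()["sha"]
    # Statuses come from external services; check runs from GitHub Apps.
    status = requests.get(
        f"{API}/repos/{owner}/{repo}/commits/{sha}/status", headers=HEADERS
    ).json()
    checks = requests.get(
        f"{API}/repos/{owner}/{repo}/commits/{sha}/check-runs", headers=HEADERS
    ).json()
    return status.get("total_count", 0) > 0 or checks.get("total_count", 0) > 0
```

A True result only tells you that something reports on commits; classifying that something as CI still needs the deeper look described above.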
I am looking for some advice on the best practice for versioning software.
Background
Build automation with gradle.
Continuous integration with Jenkins
CVS as SCM
Semantic Versioning
Sonatype Nexus inhouse repo
Question
Let's say I make a change to some code. An automated CI job will pull it in and run some tests against it. If those tests pass, should Jenkins update the version of the code and push it to Nexus? Should it be pushed up as a "SNAPSHOT"? Should it be pushed to Nexus at all, or instead just left in the repository until I want to do a release?
Thanks in advance
I know you said you are using CVS, but first of all, have you checked the git-flow methodology?
http://nvie.com/posts/a-successful-git-branching-model/
I have little experience with CVS, but git-flow can be adapted to it, and a good versioning and CI procedure begins with having well-defined branches: basically at least one for the latest release, and one for the latest in-development version.
With this you can tell the CI application what it is working with.
I didn't have time for a more detailed answer before, so I will expand it now, trying to give a general answer.
Branches
Once you have clearly defined branches you can control your workflow. For example, it is usual to have 'master' and 'develop' branches, where master will contain the latest release, and develop will contain the next release.
This means you can always point to the latest release of the code (it is in the master branch) while the next version is in the develop branch. Of course, this can be more detailed, such as tagging the master branch for the various releases, or having an additional branch for each main feature, but it is enough to have these two.
Following this, if you need to change something in the code, you edit the develop branch and make sure it is all correct, then keep making changes until you are happy with the current version, and move this code to master.
Tests
Now, how to make sure all is correct and valid? By including tests in your project. There is a lot which can be read about testing, but let's keep it simple. There are two main types of tests:
White box tests, where you know the insides of the code, and prepare the tests for the specific implementation, making sure it is built as you want
Black box tests, where you don't know how the code is implemented (or at least, you act as if you didn't), and prepare more generic tests, meant to make sure it works as expected
Now, going to the next step, you won't hear much about these two types of tests; instead people will talk about the following ones:
Unit tests, where you test the smallest piece of code possible
Integration test, where you connect several pieces of code and test them
"The smallest piece of code possible" has a lot of different meanings, depending on the person and project. But keeping with the simplification, if you can't make a white box test of it, then you are creating an integration test.
Integration tests involve things like database access or running servers, and they take a long time, at least much longer than unit tests. Also, integration tests, due to their complexity, may require setting up a specific environment.
This means that while unit tests can be run locally with ease, integration tests may be so slow that people dislike running them, or may just be impossible to run in your machine.
So what do you do? Easy: separate the tests, so unit tests can be run locally after each change, while integration tests are run (after the unit tests) by the CI server after each commit.
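For example, a single marker is enough to split the two groups. This is a minimal sketch with Python and pytest (test runners differ per stack, and the functions under test are illustrative stubs):

```python
# Sketch: separate fast unit tests from slow integration tests with a
# pytest marker. The code under test is a stub for illustration.
import pytest

def parse_price(text):
    return int(float(text) * 100)  # price in cents

def test_parse_price():
    # Unit test: small and fast, run locally after each change.
    assert parse_price("1.50") == 150

@pytest.mark.integration
def test_checkout_end_to_end():
    # Integration test: would need a database or a running server,
    # so only the CI server runs it after each commit.
    pytest.skip("placeholder for a slow end-to-end scenario")
```

Locally you would run pytest -m "not integration", while the CI server runs the full suite (after registering the integration marker in pytest.ini).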
Additional tests
Just as a comment, don't stop at this simplified vision of tests. There are several ways of handling tests, and some tests don't fit neatly into the unit or integration categories. For example, it is always a good idea to validate code style rules, or you can make a test which just deploys the project onto a server, to make sure nothing breaks.
CI
Your CI server should be listening to commits, and if correctly configured it will know when this commit comes from a development version, a release or anything else. Allowing you to customise the process as you wish.
First of all it should run all the tests. No excuses, and don't worry if it takes two hours; it should run all the tests, as this is your shield against future problems.
If there are errors, then the CI server will stop and send a warning. Fix the code and start again. If all tests passed, then congratulations.
Now it is the time to deploy.
Deploying should be handled with care, always. The latest version available in the dependencies repository should always be the most current one.
It is nice having a script to deploy the releases into the repository after a commit, but unless you have some sort of final validation, a manual human-controlled one, you may end up releasing a bad version.
Of course, you may ignore this for development versions, as long as they are segregated from the actual releases, but otherwise it is best to handle the final deployment by hand.
Well, it may be with a script or whatever you prefer, but it should be you who begins the deployment of releases, not the machine.
CI customisation
Having a CI server allows for much more than just testing and building.
Do you like reports? Then generate a test coverage report, a quality metrics one, or whatever you prefer. Are you using an external service for this? Then send it the files and let it work.
Does your project contain files for generating documentation? Build it and store it somewhere.
Recently my fledgling team (just two devs) attempted to implement continuous delivery practices as described by Jez Humble.
That is, we ditched feature branches and pull requests (in git) and aimed to commit to the mainline branch at least every day.
We have a comprehensive unit and functional test suite for both the front and back end which is triggered automatically by Jenkins, when pushing to git.
We configured a feature switching app and resolved to use it for longer running features.
However, we encountered several problems and I'm curious to get a perspective from people who are successfully using this approach.
Delays due to Vetting/ Manual QA process
Often tasks were small enough that we didn't think they warranted configuring a feature switch, e.g. adding an extra field to a form, or changing some field labels. However, for various reasons such a ticket would become blocked (e.g. some unforeseen aspect of the task needing UX input).
This would mean mainline ended up in a compromised state whilst we waited for external dependencies to unblock the task. Often we'd be saying "we can't deploy anything until Thursday, as that's when we can get an IA review".
The answer here is probably much tighter vetting of which tasks are started. However, it was often difficult to completely anticipate every potential blocker. Maybe if a task becomes blocked, additional dev work should be done to add a feature switch, or the commits reverted? Tricky situation.
Issues with code review during integration on mainline branch
Branches and pull requests give a nice breakdown of the changes made for a single task. However, attempting CD we ended up with a mish-mash of unrelated commits on mainline, and the code reviewer had to somehow piece together the commits that related to the task he was reviewing. And often there'd be a number of additional minor bug fixes and changes-in-response-to-review type commits at the end of a task. Essentially we couldn't figure out a clean way to code review work with this set-up.
Generic code review issues
We used Phabricator for a bit to do post-commit code reviews, but found it flagged every single commit (some very minor) for code review, rather than showing us a list of changes per individual dev task. So it made reviewing the code onerous compared to git pull requests. Is there a better way?
We've now reverted back to short lived feature branching in git and raising pull requests to initiate code review and it's a nice set up, but if we could fix the issues we're having with non-feature branching CD, then we'd like to re-attempt that approach.
Could you automate the vetting process and/or run it before you integrate? If you automate the vetting process, for example adding a form/button etc., you just need a suite of tests to run post-integration to validate that your mainline is not broken.
You need to code review before integration, i.e. on the pull request. If issues are caught during a code review and fixed, the pull request is updated and the mainline is not messed up.
Code review tools are very specific to a group of developers and the team's needs. I suggest you play with a few code review tools to see which one suits your needs.
Based on most of your questions, I would recommend running all your vetting/code review etc. before you merge (you can do it in increments if the process is too cumbersome), and running an automated suite of tests for all the stuff that you want to do post-integration.
If the process set up in your team is too complicated to be finished in a day and can have multiple iterations, then it is worthwhile for you to evaluate a modified version of gitflow rather than a fork-based CI model.
If you use feature branches to work on tasks, then when you finish a task you can either merge it back to the integration branch or create a pull request for the merge back to the integration branch.
In both cases you get a merge commit, a summary of every change you made on the feature branch.
Do you need something more than this?
Our development department is growing and I want to enforce a stable master/trunk.
Up to now every developer can commit to master/trunk. In the future developers should commit to a staging area, and if all tests pass the code gets moved to trunk automatically. If a test fails, the developer gets a mail with the failed tests.
We have several repositories: One for the core product, several plugins and a repository for every customer.
Up to now we run SVN and git, but switching all repos to git could be done, if necessary.
Which software could help us to get this done?
There are some articles on the web which explain how to use Gerrit and Jenkins to enforce a stable branch.
I am unsure if I need both, or if it is better to use something else.
Environment: We are 10 developers, and use python and django.
Question: Which tool can help me enforce a stable master branch?
Update
I was on holiday, and now the bounty has expired. I am sorry. Thank you for your answers.
Question: Which tool can help me enforce a stable master branch?
Having been researching this particular aspect of CI quasi-pathologically since our ~20-person PHP/ZF1-based dev team made the switch from SVN to Git over the winter (and I became the de facto git mess-fixer), I can't help but share my experience with this particular aspect of continuous integration.
While obviously, having a "critical mass of unit tests running" in combination with a slew of conditionally parameterized Jenkins jobs, triggering infinitely more conditionally parameterized jobs, covering every imaginable circumstance would (perhaps) be the best and most proper way to move towards a Continuous Integration/Delivery/Deployment model, the meatspace resources required for such a migration are not insignificant.
Some questions:
Does your team have some kind of VCS workflow or, minimally, rules defined?
What percentage, roughly, of your codebase is under some kind of behavioural (e.g. Selenium), functional or unit testing?
Does your team ( / senior devs ) actually have the time / interest to get the most out of gerrit's peer-based code review functionality?
On average, how many times do you deploy to production in any given day / week / month?
If the answers to more than one of these questions are 'no', 'none', or 'very little/few', then I'd perhaps consider investing in some whiteboard time to think through your team's general workflow before throwing Jenkins into the mix.
Also, git-hooks. Seriously.
However, if you're super keen on having a CI/Jenkins server, or you have all those basics covered already, then I'd point you to this truly remarkable gem of a blog post:
http://twasink.net/2011/09/16/making-parallel-branches-meet-regularly-with-git-and-jenkins/
And its equally savvy cousin:
http://twasink.net/2011/09/20/git-feature-branches-and-jenkins-or-how-i-learned-to-stop-worrying-about-broken-builds/
Oh, and of course, the very necessary devopsreactions tumblr.
There are some articles on the web which explain how to use Gerrit and Jenkins to enforce a stable branch.
I am unsure if I need both, or if it is better to use something else.
Gerrit is for code review.
Jenkins is a job scheduler that can run any job you want, including one for:
compiling everything
launching the unit tests.
In each case, the idea is to do a guarded commit, i.e. pushing to an intermediate repo (Gerrit's, or one monitored by Jenkins), and only pushing to the final repo if the intermediate process (review or automatic build/test) passes successfully.
By adding intermediate repos, you can easily enforce one unique branch on the final "blessed" repo, to which those intermediate repos will push only if the commits are deemed worthy.
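To make the guarded-commit idea concrete, here is a minimal sketch of an update hook for such an intermediate repo (in Python; the worktree path and the test command are assumptions), refusing any update to master whose tests fail:

```python
#!/usr/bin/env python3
# Sketch of a git `update` hook: git calls it with the ref name and the
# old/new commit ids; a non-zero exit makes git refuse the update.
import subprocess
import sys

refname, old_sha, new_sha = sys.argv[1:4]

if refname == "refs/heads/master":
    # Check out the pushed commit in a temporary work tree and run tests.
    subprocess.run(
        ["git", "worktree", "add", "--detach", "/tmp/ci-check", new_sha],
        check=True,
    )
    tests = subprocess.run(["python", "-m", "pytest"], cwd="/tmp/ci-check")
    subprocess.run(["git", "worktree", "remove", "--force", "/tmp/ci-check"])
    if tests.returncode != 0:
        print(f"rejected: tests failed for {new_sha}", file=sys.stderr)
        sys.exit(1)  # refuse the update

sys.exit(0)
```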
It sounds like you are looking to establish a standard CI capability. You will need the following essential tools:
Source Version Control: SVN, git (you are already covered here)
CI server: Jenkins (you will need to build and run tests with each check-in, and report results; Jenkins is the de facto standard tool used for this)
Testing: PyUnit
Artifact Repository: you will need a mechanism for organizing and archiving the increments created with each build. This could be a simple home-grown directory-based system. I have also used Archiva, but there are other tools.
There are many additional tools that might be useful depending on your development process:
Code review: if you want to make code review a formal gate in your process, Gerrit is a good tool.
Code coverage analysis: I've used EMMA in the past for Java. I am sure there are some good tools for Python coverage.
Many others: a library of Jenkins plugins that provide a variety of useful tools is available to you. Taking some time to review the available plugins will definitely be time well spent.
In my experience, establishing the right culture is as important as finding the right tooling.
Testing: one of the 10 principles of CI is "self-testing builds". In other words, you must have a critical mass of unit tests running. Developers must become test-infected. Unit testing must become a natural, highly valued part of each developer's individual development process. In my experience, establishing a culture of test infection is the hardest part of deploying CI.
Frequent check-in: developers and managers must organize their work in a way that allows for frequent small check-ins. CI calls for daily check-ins. This is sometimes a difficult habit to establish.
Responsiveness to feedback: CI is about immediate feedback. The developers must be conditioned to respond to the immediate feedback. If unit tests fail, the build is broken. Within 15 minutes of a CI build breaking, the developer responsible should either have a fix checked in, or have the original bad check-in backed out.
Currently we are using SVN.
I would like to start using GitHub, but one absolute requirement is that we will need to have pre-commit (pre-merge) validation of the code like we currently have. Does GitHub support pre-commit hooks (pre-merge hooks)?
We're a team of 5 developers. We made an agreement that all code (JavaScript) should pass JSLint-like validation. Voluntary validation has proven not to work because it's easily forgotten. How can we be sure that code that becomes available to the others is guaranteed to validate against JSLint (or similar)?
The concept I was looking for was the pre-receive hook.
I don't believe GitHub supports pre-commit hooks. However, git core does. You could set up pre-commit hooks locally, or apply them as a test before merging branches into your main GitHub repository.
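To illustrate the local route, here is a minimal sketch of a .git/hooks/pre-commit script (written in Python here; it assumes a jshint binary on the PATH as the JSLint-like checker):

```python
#!/usr/bin/env python3
# Sketch of a local pre-commit hook that blocks a commit when staged
# .js files fail lint. Assumes a `jshint` binary on the PATH; swap in
# your JSLint-like tool of choice.
import subprocess
import sys

# List the staged files (added/copied/modified) for this commit.
staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
    capture_output=True, text=True, check=True,
).stdout.split()

js_files = [f for f in staged if f.endswith(".js")]
if js_files:
    lint = subprocess.run(["jshint", *js_files])
    if lint.returncode != 0:
        print("pre-commit: lint failed, commit aborted", file=sys.stderr)
        sys.exit(1)  # non-zero exit aborts the commit

sys.exit(0)
```

The catch is that local hooks are voluntary: each developer has to install them, so a server-side gate (pre-receive, or review on the pull request) remains the only real guarantee.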
I think you're missing something fundamental about git. It's not a centralized model (well, OK, it can be, but if you're going to use it that way then GitHub is probably the wrong approach). If you're using GitHub, the right way to do this is:
Host your main repo
Have your developers each create their own fork
Let them happily hack away, committing and pushing to their heart's content
When they think a feature is ready, they send a pull request to you (the maintainer) which you yourself verify on the side to ensure stability. Then you merge / rebase their changes into the main repo.
Naturally there are many ways to skin a cat. But when you're talking about "real git" (the kind employed by the open source community), the centralized "check-it-in-and-it-damned-well-better-work" model is kind of difficult, especially when it comes to larger projects.
I think this article describes a very good workflow that could be a basis for automation:
http://scottchacon.com/2011/08/31/github-flow.html
The main idea is that you use pull requests as mentioned above, but you can also have a service that uses the GitHub API to fetch or pull the branch making the request, then merge, test, and validate it before pushing to the target branch.
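A minimal sketch of that merge/test/validate/push loop, driving plain git from Python (the branch names and the test command are assumptions):

```python
# Sketch: gate a pull request by merging it, testing the result, and
# only pushing to the target branch when everything passes.
import subprocess

def git(*args):
    subprocess.run(["git", *args], check=True)

def gate_pull_request(pr_branch, target="master"):
    git("fetch", "origin", pr_branch, target)
    git("checkout", f"origin/{target}")             # detached HEAD at the target
    git("merge", "--no-ff", f"origin/{pr_branch}")  # a conflict raises an error
    # Validate the merged result before it ever reaches the target branch.
    subprocess.run(["python", "-m", "pytest"], check=True)
    git("push", "origin", f"HEAD:{target}")
```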
No, GitHub doesn't support pre-commit hooks. How would that even work? Committing happens on your computer; do you really want to allow GitHub to run arbitrary code on your machine?