We are using GitHub as our source repository, AWS CodeBuild to compile the code from GitHub, Elastic Beanstalk to host environments and CodePipeline to trigger a build on commit and to deploy the code to different environments, with production being the final environment.
What I would like to add as a final step in CodePipeline is a merge back to master after a build has been deployed to production. I did a brief search on Google but could not find any good references for how to initiate a git merge.
Does anybody have any experience with triggering a merge from CodePipeline?
Currently there isn't built-in support for merging.
Today most users run their pipeline on master, and merge into that before the code enters their pipeline. One advantage of this approach is that it ensures your pipeline is run on the exact merged version on mainline, rather than a pre-merge version.
However, we're aware that some workflows, such as a pull-request-based workflow, would benefit from being able to merge at the end of a pipeline.
The best workaround today is to use a Lambda function, custom action, or CodeBuild step to perform the merge.
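For example, a Lambda-backed action could call GitHub's merge API rather than shelling out to git. The sketch below is only illustrative: the GITHUB_REPO, RELEASE_BRANCH, and GITHUB_TOKEN environment variables are assumptions you would adapt to your own pipeline.

```python
# Minimal sketch of a Lambda handler that merges the deployed branch back into
# master via GitHub's merge API (POST /repos/{owner}/{repo}/merges).
# GITHUB_REPO, RELEASE_BRANCH, and GITHUB_TOKEN are placeholder env variables.
import json
import os
import urllib.request

def lambda_handler(event, context):
    repo = os.environ["GITHUB_REPO"]                       # e.g. "my-org/my-app" (placeholder)
    token = os.environ["GITHUB_TOKEN"]                     # token with repo scope
    payload = {
        "base": "master",                                  # branch to merge into
        "head": os.environ.get("RELEASE_BRANCH", "release"),  # branch that was deployed
        "commit_message": "Merge release back into master after production deploy",
    }
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/merges",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # 201 = merge commit created, 204 = base already contains head
        return {"status": resp.status}
```

In a real CodePipeline invoke action you would also report the outcome back to the pipeline with put_job_success_result / put_job_failure_result; those details are left out of the sketch.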
Related
For my code hosted on GitHub, we perform some tests and quite a bit of post-processing using GitHub Actions. Now we would like to (or, actually, have to) use Gitlab runners hosted by a supercomputing center to do some further testing and benchmarking. This cannot be done with self-hosted GitHub runners, because the choice of runners is the center's decision and I cannot influence it. We do not want to move the whole workflow and community over to some Gitlab instance either. So here's my (general) question: is there a way to use Gitlab runners from within GitHub Actions?
What I have tried and what kind of works is to mirror the repository over to the Gitlab instance and let the runners do their magic there. Using this neat approach, the GitHub Action will wait for the results of the runners and integrate them into its own results. However, this does not work if contributors fork the repository and make pull requests.
In principle, it looks like this could be doable if the contributors also have accounts and corresponding permissions at the Gitlab instance. This is fine for now, because the community is small and the Gitlab instance is accessible to external contributors. Note that manual action from the maintainers of the code (i.e., me) is required before contributors can execute code with the runners for the first time, so we should be fine concerning security.
However, I cannot get this to work for pull requests, because I fail to mirror them. As said, direct pushes are fine, but nothing else works. This leads me to the more specific questions: How can I mirror a pull request from GitHub to a Gitlab repository? How can I enable this for both pull requests and pushes (and do I need even more cases)?
Any help is appreciated! I'm really no expert on GitHub Actions, Gitlab runners or even git itself (beyond the basics). If there's a better way to achieve this, I'm happy to hear about it!
I can think of several workarounds:
1. Change what triggers your pipelines
Since you cannot mirror pull requests, but you can mirror branches, adapt the pipeline triggers in Gitlab so the pipelines are launched whenever there is a new commit, instead of a new PR.
You can always use a staging branch if you want to limit the pipeline executions.
2. Use webhooks
If the Gitlab instance is reachable from the internet, create a GitHub Action that triggers a Gitlab pipeline execution whenever there is a PR on GitHub, or even opens a merge request directly in Gitlab. Both are well documented:
Trigger a pipeline using curl
API to create merge request
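As a rough illustration of the first link, a GitHub Action step could run a small script against Gitlab's pipeline trigger API. The instance URL, project ID, and the GITLAB_TRIGGER_TOKEN / MIRROR_BRANCH variables below are placeholders to adapt to your instance.

```python
# Sketch of a script a GitHub Action step could run to kick off a pipeline on
# a self-hosted Gitlab instance via its trigger API
# (POST /api/v4/projects/:id/trigger/pipeline).
import os
import urllib.parse
import urllib.request

GITLAB = "https://gitlab.example.com"        # placeholder instance URL
PROJECT_ID = "1234"                          # placeholder numeric project ID

def trigger_pipeline(ref):
    data = urllib.parse.urlencode({
        "token": os.environ["GITLAB_TRIGGER_TOKEN"],  # pipeline trigger token
        "ref": ref,                                   # branch to run the pipeline on
    }).encode()
    url = f"{GITLAB}/api/v4/projects/{PROJECT_ID}/trigger/pipeline"
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        print("Triggered pipeline:", resp.status)

if __name__ == "__main__":
    # e.g. the branch your GitHub workflow mirrored over to Gitlab
    trigger_pipeline(os.environ.get("MIRROR_BRANCH", "main"))
```

The second link works the same way: a POST to /api/v4/projects/:id/merge_requests with source_branch, target_branch and title, authenticated with a PRIVATE-TOKEN header.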
I have two instances of Azure Data Factory. One is PROD and the other is DEV.
My DEV ADF is integrated with a Git repository, and all development is done in this ADF instance.
Once the code is ready for production deployment, I follow the CI/CD steps to deploy the DEV ADF into PROD.
This functionality is working fine.
Recently I made a few changes in my PROD ADF instance, upgrading ADLS Gen1 to Gen2 and altering a few pipelines. These changes were applied directly to the PROD instance of ADF.
Now I have to deploy these changes to the DEV instance to bring both instances back in sync before proceeding with further development.
To achieve this, I followed the steps below:
Removed the Git integration of the DEV ADF instance.
Integrated PROD ADF with a new Git repository and did a publish.
Executed the build and release pipelines, which deployed PROD into DEV.
After this, I could see that PROD and DEV were in sync.
Now I want to re-integrate the DEV ADF with Git in order to proceed with further development.
When I re-integrate the DEV ADF into the collaboration branch (master) of the existing DEV repository, I see discrepancies in the pipeline and linked service counts.
The pipelines and linked services that were deleted from PROD are still present in the DEV ADF master branch.
When I remove the Git integration from the DEV ADF, both DEV and PROD are in sync again.
I also tried integrating the DEV ADF into a new branch of the same DEV repository, but the pipelines and linked services that were deleted from production are still available in the DEV ADF.
It seems that the pipelines and linked services that were changed do get updated, but the deleted items are not removed from the DEV master branch.
Is there any way to clean up the master branch and import only the currently existing resources at the time of Git re-integration?
The only workaround I have found is to create a new repository instead of re-integrating with the existing one, but it is impractical to keep switching repositories, and the branches and changes already in the existing repository would be lost.
Is there any way to make ADF, when re-integrating with the repository, take only the currently existing resources into the master branch instead of merging them with the existing code in master?
These things happen. ADF Git integrations are a bit different, so there's a learning curve to getting the hang of them. I've been there. Never fear: there is a solution.
There are two things to address here:
Fixing your process so this doesn't happen again.
Fixing the current problem.
The first place you went wrong was making changes directly in PRD. You should have made these changes in DEV and promoted according to standard process.
The next places you went wrong were removing DEV from Git and then adding PRD to Git. PRD should not be connected to Git at any point, and you shouldn't be juggling Git integrations. It's dangerous and can lead to lost work.
Ensure that you do not repeat these mistakes, and you will prevent complicating things like this going forward.
In order to fix the current issues, it's worth pointing out that with ADF Git integrations you don't have to use the ADF editor for everything. You are totally able to manipulate Git repos cloned to your local file system with standard Git tools, and this is going to be the key to digging yourself out. (It's also what you should have done in the first place to retrofit PRD changes back into DEV.)
Basically, if your PRD master contains the objects as you want them, first clone that branch to your local file system. Elsewhere on your drive, clone a feature branch of your DEV repo. To bring these in sync, copy the PRD master contents into the DEV feature branch directory and push the changes. Now this DEV feature branch matches PRD master. A pull request and merge from this DEV feature branch to DEV master will then bring DEV master in sync with PRD master (assuming the merge is done correctly).
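A minimal sketch of that copy-and-push procedure, with placeholder repo URLs and branch names, might look something like this:

```python
# Sketch of syncing a DEV feature branch with PRD master using local clones
# and standard Git tooling. Repo URLs and branch names are placeholders.
import shutil
import subprocess
import tempfile
from pathlib import Path

PRD_REPO = "https://dev.azure.com/org/project/_git/adf-prod"  # placeholder
DEV_REPO = "https://dev.azure.com/org/project/_git/adf-dev"   # placeholder

def run(*cmd, cwd=None):
    subprocess.run(cmd, cwd=cwd, check=True)

work = Path(tempfile.mkdtemp())
prd = work / "prd"
dev = work / "dev"

# 1. Clone PRD master and the DEV repo side by side, and create a feature branch in DEV.
run("git", "clone", "--branch", "master", PRD_REPO, str(prd))
run("git", "clone", DEV_REPO, str(dev))
run("git", "checkout", "-b", "sync-from-prod", cwd=dev)

# 2. Replace the DEV working tree with the PRD contents (keep DEV's .git folder).
for item in dev.iterdir():
    if item.name != ".git":
        shutil.rmtree(item) if item.is_dir() else item.unlink()
for item in prd.iterdir():
    if item.name != ".git":
        if item.is_dir():
            shutil.copytree(item, dev / item.name)
        else:
            shutil.copy2(item, dev / item.name)

# 3. Commit and push the feature branch, then raise a PR to DEV master in the UI.
run("git", "add", "-A", cwd=dev)
run("git", "commit", "-m", "Sync DEV with PROD master", cwd=dev)
run("git", "push", "-u", "origin", "sync-from-prod", cwd=dev)
```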
Even when not having to do things like this, it can be helpful to have your ADF Git repo cloned locally so you have specific control over things. There are times when ADF orphans objects, and you can clean them up via the file system and Git tools without having to wrestle the ADF editor as such.
At work, we're now using GitHub, and with that GitHub flow. My understanding of GitHub flow is that there is a master branch and feature branches. Unlike git flow, there is no develop branch.
This works quite well on projects that we've done, and simplifies things.
However, for our products, we have a development and a production environment. For the production environment, we use the master branch, whereas for the development environment we're not sure what to do.
The only idea I can think of is:
When a branch is merged with master, redeploy master using GitHub actions.
Set up a GitHub Action so that when any branch other than master is pushed, it is deployed to the development environment.
Currently, for projects that require a development environment, we're essentially using git flow (features -> develop -> master).
Do you think my idea is sensible, and if not what would you recommend?
Edit:
Just to clarify, I'm asking the best way to implement development with GitHub Flow and not git flow.
In my experience, GitHub Flow with multiple environments works like this. Merging to master does not automatically deploy to production. Instead, merging to master creates a build artifact that can be promoted through environments using ChatOps tooling.
For example, pushing to master creates a build artifact named something like my-service-47cbd6c, which is a combination of the service name and the short commit hash. This is pushed to an artifact repository of some kind. The artifact can then be deployed to various environments using tooling such as ChatOps-style slash commands to trigger the deploy. This tooling could also have checks to make sure test environments are not skipped, for example. Finally, the artifact is promoted to production.
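As a rough sketch of that naming scheme (here as a Docker image tag rather than a file name; the registry and service name are placeholders), the build step might look like:

```python
# Illustrative sketch: tag the build artifact with the service name plus the
# short commit hash, so the exact same artifact can later be promoted
# unchanged from dev to production. Registry and service name are placeholders.
import subprocess

SERVICE = "my-service"                       # placeholder service name
REGISTRY = "registry.example.com/team"       # placeholder artifact registry

def short_sha():
    return subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

def build_and_push():
    tag = f"{REGISTRY}/{SERVICE}:{short_sha()}"   # name-plus-short-hash style tag
    subprocess.run(["docker", "build", "-t", tag, "."], check=True)
    subprocess.run(["docker", "push", tag], check=True)
    return tag

if __name__ == "__main__":
    print("Pushed artifact:", build_and_push())
```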
So for your use case with GitHub Actions, what I would suggest is this:
Pushing to master creates the build artifact and automatically deploys it to the development environment.
Test in development
Promote the artifact by deploying to production using a slash command. The action slash-command-dispatch would help you with this.
You might also consider the notion of environments (as illustrated here)
Recently (Feb. 2021), you can:
## Limit which branches can deploy to an environment
You can now limit which branches can deploy to an environment using Environment protection rules.
When a job tries to deploy to an environment with Deployment branches configured, Actions will check the value of github.ref against the configuration; if it does not match, the job will fail and the run will stop.
The Deployment branches rule can be configured to allow:
All branches – Any branch in the repository can deploy
Protected branches – Only branches with protection rules
Selected branches – Branches matching a set of name patterns
That means you can define a job that deploys to the dev environment, and that job will only run if it was triggered by a commit pushed to a given branch (master in your case).
For anyone facing the same question or wanting to simplify their process away from gitflow, I'd recommend taking a look at this article. Whilst it doesn't talk about GitHub flow explicitly, it does effectively provide one solution to the OP.
Purists may consider this not strictly Gitflow, but to my mind it's a simple tweak that makes the deployment and CI/CD strategy more explicit in Git. I prefer this approach rather than adding some magic to the tooling, which can make a process harder for devs to follow and understand.
I think the Gitflow intro is written fairly pragmatically as well:
Different teams may have different deployment strategies. For some, it may be best to deploy to a specially provisioned testing environment. For others, deploying directly to production may be the better choice...
The diagram in the article sums it up well:
So here we have master == Gitflow main, and the useful addition is the temporary release branch, from which you can deploy to other environments such as development. It is worth considering what you choose to call this temporary branch: in the article it's a release branch; in your process it may be a test branch, etc.
You can take or leave the squashing and tagging and the tooling will change between teams. Equally you may or may not care about actual version numbers.
This isn't a million miles away from VonC's answer; the difference is that the process is more tightly defined and geared towards having multiple developers merge into a single branch and apply fixes in order to get a new version ready for production. It may well be that you configure the deployment of this temporary branch via a naming convention, as in his answer.
The way I've implemented this flow is using PRs. I did it with Azure DevOps, but I'd say that the same can be achieved with GitHub Actions.
When you have a branch that you intend to test and eventually merge to master and release to production, you create a PR from that branch to master. The PR will trigger a pipeline, which will run your build, static analysis and tests. If that passes, the PR is deployed to a test environment where further automated and manual testing can happen. That PR can be reviewed and approved by other developers and, if you need to, by QA after manual testing. You can configure GitHub PR rules to enforce the approvals. Once approved, you can merge the PR to master.
What happens once in master is independent of the workflow above, but most likely a new pipeline will be triggered, which will build a release candidate and run the whole path to production (with or without manual intervention).
One of the tricks is how the PR pipeline decides which environment to deploy the PR to. I can think of three options:
Create an environment on the fly which will be killed once the PR is merged or closed. This is the most advanced and flexible option. This would require the system to publish the environment location to the PR.
Have a pool of environments and have the automation figure out which are free and automatically choose one. The environments could be stopped, so you find an environment which is stopped, start it up and deploy there. Once the PR is closed/merged, stop the environment again. You can publish the environment location to the PR.
Add a label to the PR indicating the environment (e.g. env-1, env-2, etc.). This is the simplest option, but it requires that developers look at the open PRs to see which environments are already in use in other PRs to avoid overwriting other people's code (see the sketch after this list).
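As a rough illustration of the label-based option, a small script could query the open PRs and pick a label from an assumed pool of environments; the repo name, pool, and token handling below are placeholders.

```python
# Sketch of picking a free test environment by inspecting the labels on open
# PRs via the GitHub REST API. Repo name, pool, and GITHUB_TOKEN are placeholders.
import json
import os
import urllib.request

REPO = "my-org/my-app"                       # placeholder
POOL = {"env-1", "env-2", "env-3"}           # assumed environment labels

def free_environment():
    req = urllib.request.Request(
        f"https://api.github.com/repos/{REPO}/pulls?state=open",
        headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}",
                 "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        pulls = json.load(resp)
    # Collect environment labels already claimed by open PRs.
    in_use = {label["name"] for pr in pulls for label in pr["labels"]}
    free = POOL - in_use
    return sorted(free)[0] if free else None

if __name__ == "__main__":
    print("Next free test environment:", free_environment())
```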
With all these options, once the PR is created, you can just push new commits to the branch and the environment will be updated.
You also need to decide what you want to do when a new commit is pushed to master. You most likely want to trigger a new PR build to update the environments with the latest master, but you can do this automatically or manually, depending on how busy your master is.
Nathan, adding a development branch is a good idea. You can work on development changes in the new branch and test them in the dev environment, and after getting sign-off to move to the production environment you can merge your changes into the master branch.
Don't forget to perform regression testing on the merged master branch to verify that both old and new features work before releasing your code for installation in production.
TL;DR
How do I avoid rebuilding artifacts on master when a feature is merged without creating multiple pipelines per project? Where do I access the information about which branch was merged?
More Info
I run Jenkins to build many projects stored in two different VCSs (Gitlab, Bitbucket). Auto-discovery for both VCSs work and create multi-branch pipelines for every project/branch/PR containing a Jenkinsfile (Gitlab Branch Source Plugin, Bitbucket Branch Source Plugin).
Build artifacts get produced and stored on every build (e.g. docker images pushed to registry).
As I follow a feature-branch workflow, these features eventually get merged into master; master is then deployed at irregular intervals.
When doing the merge, there is already an artifact built and stored for this code (see appendix 1). It was built for the feature branch the code originated from (e.g. the container mysuperapp:feat-add-better-things-3). I would like to take this artifact and promote it as the new master artifact (e.g. mysuperapp:master), avoiding a rebuild (and unit + integration testing everything).
But merging a feature branch just kicks off a new build pipeline on branch master without any information about the merged branch (see appendix 2). This is correct behavior concerning master (new commits were pushed), but it prevents me from reacting to the merged branch (e.g. the aforementioned promoting, or even just deleting unused artifacts). Is there any way to get the information about which branch was merged?
I am aware that I can create a new pipeline listening for PR webhooks from my VCSs, running a pipeline to do the promotion and ignoring builds on master completely. But this moves the visibility of this process to a different pipeline and requires additional pipelines per project, e.g. reducing the advantage of auto-discovery to 50% (I would have to create these merge pipelines for each project).
How can I keep the advantages of auto-discovery and visibility of executed steps while also executing something on a merge?
Ideas: tag artifacts differently, but how (cleanup still needs to work correctly)? Parameterize pipelines and set up a single merge pipeline that re-triggers the 'push on master' pipeline with parameters of the merged branch; but can this be done without having to set up webhooks for every project? Ask the VCSs via REST which branch a commit belonged to?
Greets, and thanks for your help! This may be a complicated one, but it would be so cool to get this to work. It's the last barrier for me to enable continuous delivery for a lot of projects!
Appendix:
1: I am also aware that, to have consistent builds, I have to enforce --ff-only merges. This question is not about the pitfalls of Git but rather about the way to go with Jenkins.
2: Git provides me with the parent commits, so I can easily find out which commit was merged. But using "Delete branch after merge" leaves me without the branch ref in Git. Tagging my Docker images with commits instead of branches means I have to backtrack the last commit of each build in order to delete the old, obsolete build.
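To make appendix 2 concrete, this is roughly the promotion step I imagine, assuming images are tagged with short commit SHAs; the registry/image name is just a placeholder for illustration:

```python
# Sketch: on a master build, find the tip of the merged branch (second parent
# of a merge commit, or HEAD itself after a fast-forward merge) and retag the
# image that was already built for it. Image name is a placeholder.
import subprocess

IMAGE = "registry.example.com/mysuperapp"    # placeholder image name

def git(*args):
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout.strip()

def merged_commit():
    try:
        # For a real merge commit, the merged branch tip is the second parent.
        return git("rev-parse", "--verify", "HEAD^2")
    except subprocess.CalledProcessError:
        # Fast-forward merge: HEAD itself is the feature tip that was built.
        return git("rev-parse", "HEAD")

def promote():
    sha = merged_commit()[:7]
    subprocess.run(["docker", "pull", f"{IMAGE}:{sha}"], check=True)
    subprocess.run(["docker", "tag", f"{IMAGE}:{sha}", f"{IMAGE}:master"], check=True)
    subprocess.run(["docker", "push", f"{IMAGE}:master"], check=True)

if __name__ == "__main__":
    promote()
```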
I want to create an automated deployment pipeline for Azure Data Factory.
For a single stream of development we can configure it using this doc:
https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment
But when it comes to deploying to two different test data factories for parallel feature development (in two different branches), it does not work, because the adf_publish branch that gets generated is specific to only one data factory.
Currently we are doing the deployment using PowerShell scripts, passing the list of objects that need to be deployed.
Our repo is in Azure DevOps.
I tried:
Linking the repo to multiple data factories, but this causes issues, perhaps when finding the deltas to publish.
Creating forks of the repo instead of branches, so that adf_publish can be separate for every data factory; but this approach does not work when there is a conflict that needs a manual merge, because testing would then be required again instead of moving to prod.
adf_publish gets generated whenever you publish. Publishing takes whatever you have in your repo and updates the data factory with it.
To develop multiple features in parallel, you just need to use "Save". Save commits your changes to the branch you are actually working on; other branches do the same. Whenever you want to publish, you first need to make a pull request from your branch to master, then publish. Any merge conflicts should be resolved when merging everything into the master branch. Then just publish; there shouldn't be any conflicts, and adf_publish will be generated after that.
Hope this helped!
Since a GitHub repository can be associated with only one data factory, and you are only allowed to publish to the Data Factory service from your collaboration branch (check this), it seems there is not a direct and easy way to accomplish this. If you fork the repo as a workaround, you may have to solve the conflicts before merging, as Martin suggested.