We develop enterprise software and we wish to promote more code reuse between our developers (to keep this problem simple, let's assume all .NET). We are about to move to a new VCS system (mostly likely mercurial) and I want to have a strategy in place for how we will share libraries.
What is the best process for managing shared libraries that meets the following use cases:
Black Box - only the public API of the library is known and there is no assumption that consuming developers will be able to "step into" or set breakpoints into the library. The library is a black box. Often a dev does not care about the details, just give me the version of the lib that has always "worked".
Debug - the developer should be able to at least "step into" the library during development. Setting breakpoints would be a bonus too.
Parallel Development - while most likely the minority, there are seemingly valid use cases for developing the library in parallel with the consuming application. Often the authors of the library and component are the same developer. For better or worse, the applications and libraries can often be tightly coupled. Being able to make changes and debug into both can be a very productive way for us to develop.
It should be noted that solving 3, may implicitly solve 2.
Solutions may involve additional tools (such as NuGet, etc.).
By sharing libraries, you must distinguish between:
source dependencies (you are sharing sources, implying a recompilation within your project)
binary dependencies (you are share the delivery, compiled from common sources) and link to it from your project.
Regarding both, NuGet (2.0) finally introduced the "Package Restore During Build", in order to not commit to source control whatever is build in Lib or ExternalDependencies folder.
NuGet (especially with its new hierarchical config, NuGet 2.1) is well suited for module management within a C# project, and will interface with both git and Mercurial.
Combine it with the Mercurial subrepos, and you should be able to isolate in its own repo the common code base you want to reuse.
I have 2 possible solutions to this problem, neither of which seems ideal (and therefore why I posted the question).
Use the VCS to manage the dependencies. Specifically, use mercurial subrepos and always share by source.
Advantages:
All 3 usecases are solved.
Only one tool is required for source control and dependency management
Disadvantages:
Subrepos feature is considered a feature of last resort by Mercurial developers and from experimentation and reading has the following issues:
Tags cannot be easily or atomically applied to multiple repos.
Root/Shell repos are inherently fragile (can be broken if the pathing to subrepos changes). Mercurial developers suggest mitigating this issue by including no content in the shell repo and only use it to define (and track the revision) of the subrepos. Therefor allowing a dev to manually recreate a moment in time even if the subrepo pathing is broken.
Branching cannot cross repo boundaries (most likely not a big issue as one could argue that branches should only occur in a given subrepo).
Use Ivy or NuGet to manage the dependencies. There are two ways this could work.
Dependencies/Packages can simply contain official binaries. A build server can be configured to publish a new dependency/package into the company repository when a developer submits a build for new version. This solves case 1. Nuget seems to support symbol packages that may solve case 2. Case 3 is not solved and leaves developers in that case out to dry and come up with there own solution (there is basically no way to commit applications to the VCS that include dependencies by source). This seems to be the traditional way that dependency management tools are used.
Dependencies/Packages can contain a script that gets the source from mercurial. The script could be automatically executed when the dependency/package is installed. Some magic has be performed to have the .NET solution include the reference by project (rather than by browsing the filesystem), but in theory this could happen in the NuGet install script and reversed in the uninstall script.
Switching between "source" and "binary" dependencies seems to be a manual step. I would argue devs should switch to binary dependencies for releases and perhaps this could be enforced on the build server when creating a release. This further complicated by the fact that the VS solution needs to be modified to reference a project vs a binary.
How many source packages exists? Does every binary package contain the script to fetch the source that it was built with? Or do we create separate source packages that use the install script magic to get the source? This leads to the question is there a source package for every tag in mercurial? Every changeset? Or simply 1 source package that just clones and updates to the tip and leaves the dev to update to a previous revision (but this creates the problem of knowing what revision to update to).
If the dev then uses mercurial to change the revision of the source, how can this be reflected in the consuming application? The dependency/package that was used to fetch the source has not changed, but the source itself has...
Related
I have a situation where we have multiple C# projects that use similar set of Nuget packages (ex. Newton Json, Microsoft Compilers and CodeDom, Owin, log4net, jQuery, EntityFramework etc.)
I am trying to see how we can have a shared location for all Nuget packages to reduce the footprint of those binaries in Git, having a single repo for them by centralizing them in one place.
One option I found is to use Nuget.config in each project with repositoryPath set to point at the shared location. This works great for adding/upgrading/restoring Nuget packages in the project but it is not very clean when a package gets removed from one project but is still required in a different one. Basically the package will get removed from the shared location and the change is committed to Git, then when the other project requires it, it would get restored and added back to Git. Not a perfect solution in my mind.
I have a two part question:
1. Is there a way to improve the above workflow when packages get removed?
2. What is the industry standard for handling third party libraries delivered via Nuget? Or if there is none, can you share your experience handling Nuget packages across multiple projects.
If the concern lies with the footprint/organization of the Git repository, maybe you can do a .git ignore for the dependencies folders to prevent git from committing them into the repositories. When you need to build the projects from source, just do a dotnet /nuget restore to get the dependencies from the source you configured in the Nuget.config
Not sure if it is the industry standard, but we host our own Nuget server to control the libraries that the different teams can use. Microsoft has an article on Hosting your own NuGet feeds
Hope it helps.
I'm actually thinking about the pro and cons about using NuGet. In our current software we're storing each external reference in a common reference folder (which is commited to our SW versioning system). Over time this approach becomes more and more painful because we've to store different versions to the same library.
Since our devs are sometimes at the customer site (where not all customers are offering internet connectivity ...) we won't use NuGet directly, because NuGet packages can't be restored.
Based on that I'm actually thinking about using NuGet and store the packages folder in our SW versioning system.
Does anybody know if there are some disadvantages about this solution? Does anybody have a better proposal?
Thx.
I would argue against storing external nuget packages in your version control system.
It's not your application's responsibility to archive third party packages. Should you need to take care of that risk then build a solution intended for such (for example: use private nuget repository that's properly backed up).
Avoid duplication in code base - provided you use properly released packages, then the packages.config file content is sufficient for reliably reproducing the exact dependencies your application needs.
Synchronization is an effort - keeping packages.config and packages folder in sync- once you start including them in source control every developer working with packages would monitor and add or remove packages to source control.
If devs ever forget to add then local build still fails.
If they forget to remove no longer necessary piece then your downloadable set would contain junk.
VCS dataset size - storing them would needlessly enlarge your version control storage. Quite often the packages contain N different platform dlls, tools and whatnot which add up quite fast. Should you keep your dependencies constantly up to date, then after 10 years your VCS history would contain huige amount of irrelevant junk. Storage is cheap, but still..
Instead, consider having a private nuget repository with the purpose of serving and archiving the packages your application needs and set up your project to check your project nuget repository first. If your developers need offline compile support then they can set up project repository mirrors on their build boxes and configure the following fallback structure for repos:
Developer local project repository (ex: folder)
Shared project repository (ex: Nuget.Server)
(nuget.org)
A guide how to configure multiple repositories can be found here: How to configure local Nuget Repository.
I recently joined a company as Release Engineer where a large number of development teams develop numerous services, applications, web-apps in various languages with various inter-dependencies among them.
I am trying to find a way to simplify and preferably automate releases. Currently the release team is doing the following to "release" the software:
CURRENT PROCESS OF RELEASE
Diff the latest revision from SCM between QA and INTEGRATION branches.
Manually copy/paste "relevant" changes between those branches.
Copy the latest binaries to the right location (this is automated using a .cmd script).
Restart any services
MY QUESTION
I am hoping to avoid steps 1. and 2. altogether (obviously), but am running into issues where differences between the environments is causing the config files to be different for different environments (e.g. QA vs. INTEGRATION). Here is a sample:
IN THE QA ENVIRONMENT:
<setting name="ServiceUri" serializeAs="String">
<value>https://servicepoint.QA.domain.net/</value>
</setting>
IN THE INTEGRATION ENVIRONMENT:
<setting name="ServiceUri" serializeAs="String">
<value>https://servicepoint.integration.domain.net/</value>
</setting>
If you look closely then the only difference between the two <setting> tags above is the URL in the <value> tag. This is because the QA and INTEGRATION environments are in different data-centers and are ever so slightly not in sync (with them growing apart as development gets faster/better/stronger). Changes such as this where the URL/endpoint is different are TO BE IGNORED during "release" (i.e. these are not "relevant" changes to merge from QA to INTEGRATION).
Even in a regular release (about once a week) I have to deal with a dozen config files changes that have to released from QA to integration and I have to manually go through each config file and copy/paste non URL-related changes between the files. I can't simply take an entire package that the CI tool spits out from QA (or after QA), since the URL/endpoints are different.
Since there are multiple programming languages in use, the config file example above could be C#, C++ or Java. So am hoping any solution would be language agnostic.
SUMMARY OF ENVIRONMENTS/PROGRAMMING LANGUAGES/OS/ETC.
Multiple programming languages - C#, C++, Java, Ruby. Management is aware of this as one of the problems, since Release team is has to be king-of-all-trades and is addressing this.
Multiple OS - Windows 2003/2008/2012, CentOS, Red Hat, HP-UX. Management is addressing this too - starting to consolidate and limit to Windows 2012 and CentOS.
SCM - Perforce, TFS. Management is trying to move everyone to a single tool (likely TFS)
CI is being advocated, though not mandatory - Management is pushing change through but is taking time.
I have given example of QA and INTEGRATION, but in reality there is QA (managed by developers+testers), INTEGRATION (managed by my team), STABLE (releases to STABLE by my team but supported by Production Ops), PRODUCTION (supported by Production Ops). These are the official environments - others are currently unofficial, but devs or test teams have a few more. I would eventually want to start standardizing/consolidating these unofficial envs too, since devs+tests should not have to worry about doing this kind of stuff.
There is a lot of work being done to standardize how the binaries are being deployed using tools like DeployIT (http://www.xebialabs.com/products) which may provide some way to simplify these config changes.
The devs teams are agile and release often, but that just means more work diffing config files.
SOLUTIONS SUGGESTED BY TEAM MEMBERS:
Current mind-set is to use a LoadBalancer and standardize names across different environments, but I am not sure if "a process" such as this is the right solution. There must be a better way that can start with how devs write configs to how release environments meet dependencies.
Alternatively some team members are working on install-scripts (InstallShield / MSI) to automate find/replace or URLs/enpoints between envs. I am hoping this is not the solution, but it is doable.
If I have missed anything or should provide more information, please let me know.
Thanks
[Update]
References:
Managing complex Web.Config files between deployment environments - C# web.config specific, though a very good start.
http://www.hanselman.com/blog/ManagingMultipleConfigurationFileEnvironmentsWithPreBuildEvents.aspx - OK, though as a first look, this seems rather rudimentary, that may break easily.
Generally the problem isn't too difficult - you need branches for each of the environments and CI build setup for them. So a merge to the QA branch would trigger a build of that code and a custom deployment to QA. Simple.
Now managing multiple config files isn;t quite so easy (unless you have 1 for each environment, in which case you just call them Int.config, QA.config etc, store them all in the SCM, and pick the appropriate one to use in each branch's deployment script - eg, when the build for QA runs, it picks qa.config and copies it to the correct location and renames it to the correct name)(incidentally, this is the approach I tend to use as its very simple).
If you have multiple configs you need to use, then its always going to be a manual process - but you can help yourself by copying all the relevant configs to a build staging area that an admin will use to perform the deployment. Its a good first step in that the build they have in a staging directory will be the correct one for them, they just have to choose which config to use either during (eg as an option in the installer) or by manually copying the appropriate config over.
I would not try to manage some automated way of taking a single config file in source control and re-writing it with different data in the build, or pre-deploy steps. That way lies madness, and a lot of continual hassle trying to maintain the data and the tooling. Keep separate configs in place and make sure the devs know to update all of them when they make a change. (Or, you can hold 1 config in the SCM tree and make sure they know that merging their changes must not overwrite any existing modifications - multiple configs is easier)
I agree with #gbjbaanb. Have one config for each environment. Get your developers to write apps that read their properties (including their URLs) from config files and commit config files for each environment. Not only does this help you with deployment, but config files under revision control provides reproducibility, full transparency, and an audit trail of your environment specific settings.
Personally, I prefer to create a single deployable package that works on any environment by including all of the environment configs (even the ones you aren't using). You can then have some deployment automation that figures out which config files the apps should use and sets that up appropriately.
Thanks to #gman and #gbjbaanb for the the answers (https://stackoverflow.com/a/16310735/143189, https://stackoverflow.com/a/16246598/143189), but I felt that they didn't help me solve the underlying problem that I am facing, and restating just to make clear.
The code seems very aware of the environment in which they run. How to write environment-agnostic code?
The suggestions in the answers above are to store 1 config file for each environment (environment-config). This is possible, but any addition/deletion/edit of non-environment settings will have to be ported over to each environment-config.
After some study, I wonder if the following would work better?
Keep the config file's structure consistent/standardized e.g. XML. Try to keep the environment-specific endpoints in this config-file but store them in a way that allows easy access to the specific individual nodes/settings (e.g. using XPath).
When deploying to a specific environment, then your deployment tool should be able to parse (e.g. using XPath) and update the environment-specific endpoint to the value for the specific environment to which you are deploying.
The above is not a unique idea. There are some existing implementations that tackle the above solution already:
http://www.iis.net/learn/develop/windows-web-application-gallery/reference-for-the-web-application-package & http://www.iis.net/learn/publish/using-web-deploy/web-deploy-parameterization (WebDeploy)
http://docs.xebialabs.com/releases/3.9/deployit/packagingmanual.html#using-placeholders-in-ci-properties (DeployIt)
Home-spun solutions using XPath find and replace.
In short, while there are programming-language-specific solutions, and programming-language-agnostic solutions, I guess the big downfall is that Release Management needs to be considered during development too, else it will cause deployment headaches - I don't like that, since it sounds like "development should be aware of what tests will be designed". Is there a need AND a way to avoid this, is the big questions.
I'm working through the process of creating a "deployment pipeline" for a web application at the moment and am sifting my way through similar problems. Your environment sounds more complicated than ours, but I've got some thoughts.
First, read this book, I'm 2/3 the way through it and it's answering every question I ever had about software delivery, and many that I never thought to ask: http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912/ref=sr_1_1?s=books&ie=UTF8&qid=1371099379&sr=1-1
Version Control Systems are your best friend. Absolutely everything required to build a deployable package should be retrievable from your VCS.
Use a Continuous Integration server, we use TeamCity and are pretty happy with it so far.
The CI server builds packages that are totally agnostic to the eventual target environment. We still have a lot of code that "knows" about the target environments, which of course means that if we add a new environment, we have to modify all such code to make sure it will cope and then re-test it to make sure we didn't break anything in the process. I now see that this is error-prone and completely avoidable.
Tools like Visual Studio support config file transformation, which we looked at briefly but quickly realized that it depends on environment-specific config files being prepared with the code, by the developers in order to be added to the package. Instead, break out any settings that are specific to a particular environment into their own config mechanism (e.g. another xml file) and have your deployment tool apply this to the package as it deploys. Keep these files in VCS, but use a separate repository so that revisions to config don't trigger new builds and cause the build number to get falsely inflated.
This way, your environment-specific config files only contain things that change on a per-environment basis, and only if that environment needs something different to the default. Contrary to #gbjbaanb's recommendation, we are planning to do whatever is necessary to keep the package "pure" and the environment-specific config separate, even if it requires custom scripting etc. so I guess we're heading down the path of madness. :-)
For us, Powershell, XML and Web Deploy parameterization will be instrumental.
I'm also planning to be quite aggressive about refactoring the config files so that the same information isn't repeated several times in various places.
Good luck!
Imagine a distributed software system, installed on a group of a few hundred computers (nodes). Nodes are responsible for automatically running scheduled tasks. There are hundreds of tasks, and every task is scheduled to run on about 5-10 nodes. Nodes may stop for days, and may be removed from the system. Every task is defined by one or more source files, and node-specific config files. The code is developed and tested directly on nodes (using remote access), since only these are equipped with the special hardware and have the network context required to run the tasks (building a separate test system would be too expensive). The source files of every task refer to shared source files (libraries), and libraries may refer to other libraries. The dependency tree of tasks and libraries is complicated.
I don't have any experience with distributed version control systems, but I feel that this system could be built around a DVCS. Different libraries, and source files of different tasks, would have their own repository. Every node which runs a given task should have an instance of the repo of that task. The repo of every library, used by at least one task of a node, should also be present on that node. Developers would modify and commit code locally on nodes, and distribute the modifications to repos on other nodes using DVCS techniques.
Question #1
What would be the best approach to distribute code changes to other nodes?
Some possible scenarios:
Developers push their modifications to every other node which has an instance the same repo. (But they may forget/don't have time to do so.)
Nodes automatically pull every change from every other remote repo, and update themselves. (But there may be conflicts.)
For each repo, one of the instances is used as a "reference". Developers push their modifications to this instance, and every other node having an instance automatically pulls from here and updates itself. (But the node having the reference instance may stop.)
Question #2
What would be the best way to handle dependencies?
If more than one tasks (or libraries) refer to the same library, and the referred library has to be modified, one or more referring tasks (or libraries) may stop working (dependency hell). It would be better to stick with the originally referred version, and upgrade to the new one after proper testing. That is, more than one version of the same source file should be present in the same repo, which does not seem possible. Do I have to create a new branch for the new version of the referred library? If yes, how should I upgrade the referring repos?
Thank you for your help.
I don't have any experience with distributed version control systems,
but I feel that this system could be built around a DVCS.
Wrong feeling, in common. VCS (SCM) is Version Control System (Source Control Management), i.e - track
changes
in historical aspect mostly
as flat array without complex dependences (some dependences are still taken into account)
You have to see at another category of tools - configuration management software, which handle deploys, policies, complex dependences, conditions etc natively
You can get some iteration to CM with DVCS, but it will be hard work and pale semblance of existing well-established tools as a result
I have a NSIS installer that we previously built using nAnt scripts that copy some files around and run makensis.exe via a exec task to build the installer exe. After the nant script completes, I have the compelte structure for our CD and also our download.
I was just doing a get from sourcesafe onto an unused desktop and using it as a build box, compiling there. Sometimes we would have a couple of files checked in that fix something critical. In those cases I would go to the build box, and very selectively get only those files, to avoid getting other changed files that we aren't ready to release yet. Basically I am able to allow development to continue and selectively include certain changed files into the installer for release.
Now we no longer have a free box, and need to build from our server. So I am setting up CI Factory so that the developer can kick the build off without remoting into the server. The one issue I am struggling with, is the best way to continue to allow this selective change control to occur. The default concept of CI that CI Factory implements is fine for internal development "head". However, I also want to setup a CCNet project that is run only on demand via a Force Build for this "public release" type of build.
This is what I've brain stormed so far, without being sure how well this will work, if at all(still figuring out what CCNet and CI Factory are all about). The "public release" CCNet project config/build would be setup such that it would not get latest. Modifications would not trigger a build. Since the other CCNet project that is using the default CI methodology(we'll call it the "CI project") of getting latest when changes are detected, then these two projects can't share the same working directory. So the "public release" would need a different working copy, so that its files won't get updated when the CI project's build is triggered. The developer would need to remote into the server, one VSS, selectively do a get into the "public release"'s working copy, and then force a build through CI Factory.
The disadvantage's I see with this is
1) Having to remote in to selectively do gets.
2) I have no idea how to allow a single CI Factory project to have two different working copies of the Product folder, so that each project configuration block has it's own.
3) I'm afraid of what kind of strangeness this might cause. I'm not quite sure yet how to specify a source control block in CCNet project config block, but prevent it from doing a get latest when it builds. I'm still gradually figuring out what things are in scripts and can be easily taken out without breaking other things, versus what is not meant to be mucked around with and/or is not configurable.
I would really like to hear about how others deal with this issue of selectively releasing changes, if you have a similar situation. I am constrained to VSS, so my immediate need is to solve this with that in mind, but at the same time I'd be interested in hearing how you manage this with other source control systems. I guess you would probably have a branch that is your latest developments branch, and then merge changes into the trunk whenever you want to release them? I really don't trust VSS for branching/merging, and I think the branching concepts might be a little too much overhead and learning curve for this shop. Like I said though, stories with other source control systems would be useful future knowledge for me.
Thanks in advance.
You need a branching structure in your repository to facilitate this. Something like the release branch method. Only select individuals can commit to this branch (or have a release/stable for that). Set up your manual CI launches to pull from the release branch as release nightly promote to milestone or final from there. I don't like the idea of manually modifying things on your build machine. Set up the changes in version control, in a safe place to prepare your release and let CI build from there, but manually triggered.
Check out these branching patterns. I suggested C3, codeline-per-release, often called release branching.
Heres an article on VSS branching that includes a link to merging.
This question looks similar.
Maybe you could move to another source control system with better support for this kind of thing. Any suggestions from MS people out there?