Automatically managing license/author/version header in source files

It is generally considered good practice to add some lines with author, version and license information to the top of source files. For instance, the GNU GPL v3 suggests adding
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms [SNIP]
I find it tedious to add it manually to each file, and to have to update them all every now and then when some of this information changes (new authors, copyright years, version bumps).
Is there a way to manage this automatically, so that I only have to edit this stuff in one place and it gets automagically copied around?
If needed, you may assume that I am using any modern revision control system.

"It is generally considered good practice to add some lines with author, version and license information to the top of source files."
That depends. First of all there are two (and more) ways to do this:
manage licensing information per file
manage licensing information in a central location
If you start a project from scratch, the per-file method is often easy to do while keeping things clear. As you write, it becomes more difficult to keep track of things over time, so more and more projects switch to the central location variant.
The file-by-file method has the benefit that the scope of a work is clear. Often you write the name of the application in the file-comment. If a single file is taken out for some reason, the information is still in there and the documentation chain is not broken.
With the central location method, the benefit is that it is normally supported by your version control software, for example Git. Commits can be signed by the committer, and an author can be recorded for each one. Who wrote which code is documented automatically, and that information is stored in a central location: the VCS.
Keep a COPYING file with your package where you provide the main information centrally. You can easily generate the list of authors via the VCS. In each file you can then add a header that just names the software and says where to look, as a bare outline:
/**
 * Flux Deluxe v3.2.0 - Vector Drawing Redefined
 *
 * Copyright 2010, 2012 by its authors.
 * Some rights reserved. See COPYING, AUTHORS.
 */
If you release a new version in a new year it's a no-brainer to update all files.
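As a rough illustration of keeping the header in one place, here is a minimal Python sketch that renders a bare-outline header from central values and stamps it onto source files. Nothing in it comes from a tool mentioned in this thread: the function names (`render_header`, `apply_header`, `update_tree`), the "See COPYING" marker used to recognise an existing header, and the list of file suffixes are all assumptions made for the example.

```python
from pathlib import Path

# Assumption: an existing header is recognised by this phrase inside a
# leading /** ... */ comment. Other leading comments are left untouched.
MARKER = "See COPYING"


def render_header(title: str, years: str) -> str:
    """Build the per-file header from centrally maintained values."""
    return (
        "/**\n"
        f" * {title}\n"
        " *\n"
        f" * Copyright {years} by its authors.\n"
        " * Some rights reserved. See COPYING, AUTHORS.\n"
        " */\n"
    )


def apply_header(source: str, header: str) -> str:
    """Return source text with the header on top, replacing an older one."""
    body = source
    if body.startswith("/**"):
        end = body.find("*/")
        if end != -1 and MARKER in body[:end]:
            body = body[end + 2:]  # drop the old header comment
    return header + body.lstrip("\n")


def update_tree(root: str, header: str, suffixes=(".java", ".c", ".h")) -> int:
    """Rewrite matching files under root; return how many were changed."""
    changed = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            old = path.read_text(encoding="utf-8")
            new = apply_header(old, header)
            if new != old:
                path.write_text(new, encoding="utf-8")
                changed += 1
    return changed
```

With something like this, a new release only needs a changed title or year passed to `render_header`, followed by one `update_tree` run and a commit. Restricting the run to known source suffixes avoids stamping headers onto files that should not carry one.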

Use the License Header Manager

If you are working with Visual Studio, you could use macros and attach a shortcut to them.
Then, when creating a new file, use the shortcut to add a header.
If you want to be sure that a header has been included in each file, you can use StyleCop.
The following links might be helpful:
http://abhijitjana.net/2010/12/05/add-document-header-for-files-automatically-in-visual-studio/
http://stylecop.codeplex.com/
In Eclipse, there is also macro support so you should be able to do the same as suggested for VS. However, I do not have any experience with that.
For Java, there is an alternative to StyleCop:
http://stylecop.codeplex.com/
I haven't heard of any SVN tools that adapt the files themselves.
Using macros in your editor is the closest thing to what you want.

Related

Eclipse Public License v1.0

I would like to use code that is under the EPL v1.0 license in a commercial project. As far as I know I can do so, but the problem is that I need to make some changes to this code.
Thus, I have two questions:
Can I change the EPL code and then use it in a commercial project without any restrictions?
If I am allowed to do so, should I remove the copyright notices in the files I changed, or should I add some additional notices?

Read Paragraph 4 in the official licence text.
You may use it in commercial products, but doing so must not create any liability for other (previous open source) contributors. In particular, you alone are responsible if any problems occur.

How should I add licence information to maven project and its source files

I have several (Java) projects under maven control, developed in Eclipse, repo under Mercurial/bitbucket that I licence under Apache2 (though this question applies to any licences). What is the best way to licence this?
I have included a verbatim copy of the (Apache) LICENCE.txt in the top directory of the project. However there is no licence in any of the source files so that if they are re-used in other projects (as I hope they can be) they may get separated from the licence info. [Source files can be configuration/data as well as code and are not Java-specific]. If there are any changes to the licence then all these files will have to be edited. Possible approaches are:
use a brief sentence to refer back to LICENSE.txt
use a Maven licence tool if there is one?
use an Eclipse licence tool if there is one?
use a Bitbucket licence tool if it has one?
[I am on Windows so I don't want a sed/awk/grep approach]
UPDATE - have accepted @Nicmancol's answer, as the first answer given worked for me
UPDATE2 - Hmm. It has added a licence to all sorts of files in the distrib. Not such a good idea

You can use the Maven License Plugin or the License Maven Plugin

There are Eclipse plugins for adding / maintaining copyright notices in source file headers; e.g. see this SO question: How to manage license banners in source files of Eclipse plug-in projects. (The answers are more general than the question ...)
With a Maven project you can / should also add license details to the POM file.
From a purely legal perspective, it probably doesn't matter if a file gets separated from the "bundle" containing the copyright notice. Copyright applies irrespective of whether there is a copyright notice.
I agree that copyright applies irrespective, but authorship and licenses do not. So in an area where software is likely to be re-used we need to give the re-users that information.
Both authorship and licensing also apply irrespective of whether this is stated in each file.
Authorship is simply a fact: "Richard Stallman wrote Emacs" remains true even if someone strips the source headers. But knowing who the author of some piece of software is has no bearing on how someone else may use it, so it probably isn't of much relevance.
Licenses derive from copyright, and the default license is as set out in the relevant copyright law. That is, the default is that you do NOT have the right to make a copy, or have a copy that was made illegally.
If a file becomes separated from the license information, then it is up to the user of the file to deal with the problem; i.e. HE needs to find out what the license is, because the default is that he has no license.
Basically, if the copyright and/or license are unclear, the obligation is on the copier to find out what the copyright / license status is ... not the copyright owner / licensor. And that is as it should be. It is not possible for the copyright owner / licensor to PREVENT the information from BECOMING separated, and penalizing the copyright holder / licensor for something (illegal) that someone else did to achieve that separation would be manifestly unfair.

Documentation and version control

Given a project I'm about to start there will be documentation produced.
What is the best practice for this?
Should the documents live with the code and assets or should there be a separate documentation store?
Edit
I'd like a wiki but I will need to print the documents etc... It's a university project.

It really depends on your team. Where I work, we keep documentation in a wiki which is linked in with our team website. For the purposes of shipping documentation, the wiki can be exported and we run it through a parser that "fancifies" the look and feel of the documentation for customer purposes.
Storing the documentation with the code (typically in your source repository) is not a bad idea. Just make sure to keep them separated. For example, keep a docs folder which is on the same level with your src folder in your repository. This way, you can quickly ship the current documentation, you can easily track revisions, and anybody new to the project can immediately jump in without having to go to multiple locations for information.

Storing it in source control is fine.

This is an interesting question. What others are saying about generated documentation is right: source files, templates, etc. should be stored in source control, and the documentation generated during your build process.
As far as requirements/specs/etc. documentation, I have worked both ways, and I very much prefer using SharePoint or a Wiki/document portal that is designed for document sharing/versioning. The reason is, most non-developer folks aren't comfortable working with source control systems, and you don't gain any of the advantages of intelligent merging if you are using a binary format like Word. Plus it's nice to have internet-based access so you can reference and work on the docs in a distributed team without people having to install extra software.

Here's a 2017 summary of the options and my experience:
(extreme 1) Completely external (e.g. a wiki, Google Docs, LaTeX, MS Word, MS OneDrive)
People aren't bothered about keeping it up to date (half of them don't even know where to find the page that needs updating since it's so out of the trenches).
Wiki platforms are "captive user interfaces": your data gets stored in their proprietary schemas and is not easy to examine with a simple text editor (Confluence is even worse in that you have no access to the plaintext content at all anymore).
(extreme 2) Completely internal (e.g. javadoc)
pollutes the source code, and is usually too low level to be of any use. Well-written source code is still the best form of low level documentation.
However, I feel package-info.java files are underutilized.
(balance) Colocated documentation (e.g. README.md)
A good half way solution, with the benefits of version control. If a single README.md file is not enough, consider a doc/ folder. The only drawback of this I've seen is whether to source control helpful graphics (e.g. png files) and risk bloating the repo.
One interesting way to avoid this problem is to use plaintext diagram tools (I find Grapheasy and Text Diagram to be a breath of fresh air).
Plaintext can be easily read even if your rendering engine changes as the years go by.
GitHub's success is in no small part thanks to its README.md located in the root of the project.
One tiny disadvantage of this approach though is that your continuous integration system will trigger a new build each time you make edits to the README.md file.

If you are writing versioned user documentation associated with each release of the product, then it makes sense to put the documentation in source control along with its associated product release.
If you are writing internal developer documentation, use automated internal source code documentation (javadoc, doxygen, .net annotations, etc) for source level documentation and a project wiki for design level documentation.

I think most of us in the industry are not really following best practices, and of course it also depends a lot on your situation.
In an agile environment where you would have a very iterative process of release, you will want to "travel light". In this particular case, Jason's suggestion of a separate Wiki really works great.
In a waterfall/big-bang model, you will have a better opportunity to have a decent documentation update with each new release. You will also need to clearly document which version of the requirements was agreed on and have loads of documentation for every tiny change you make to the requirements (due to the effects it has on subsequent stages). Here it is often best if the documentation lives together with the version-controlled source code.

Are you using any sort of auto-documentation or is it completely manual? Assuming that you are using an auto-documentation system, the documentation is more or less generated on the fly, and would be part of the code itself.
To me, (assuming that it's possible with whatever code you are using), this would be the preferred method of handling it, as you wouldn't need to maintain the documentation source at all.

Do you version "derived" files?

Using online interfaces to a version control system is a nice way to have a published location for the most recent versions of code. For example, I have a LaTeX package here (which is released to CTAN whenever changes are verified to actually work):
http://github.com/wspr/pstool/tree/master
The package itself is derived from a single file (in this case, pstool.tex) which, when processed, produces the documentation, the readme, the installer file, and the actual files that make up the package as it is used by LaTeX.
In order to make it easy for users who want to download this stuff, I include all of the derived files mentioned above in the repository itself as well as the master file pstool.tex. This means that I'll have double the number of changes every time I commit because the package file pstool.sty is a generated subset of the master file.
Is this a perversion of version control?
@Jon Limjap raised a good point:
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
That's really the crux of the matter in this case. Yes, released versions of the package can be obtained from elsewhere. So it does really make more sense to only version the non-generated files.
On the other hand, @Madir's comment that:
the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes
is also rather pertinent in that if a user finds a bug and I fix it immediately, they can then head over to the repository and grab the file that's necessary for them to continue working without having to run any "installation" steps.
And this, I think, is the more important use case for my particular set of projects.

We don't version files that can be automatically generated using scripts included in the repository itself. The reason is that after a checkout, these files can be rebuilt with a single click or command. In our projects we always try to make this as easy as possible, which removes the need to version these files.
One scenario I can imagine where this could be useful is 'tagging' specific releases of a product for use in a production environment (or any non-development environment) where the tools required for generating the output might not be available.
We also use targets in our build scripts that can create and upload archives with a released version of our products. These can be uploaded to a production server, or to an HTTP server for downloading by users of your products.

I am using Tortoise SVN for small system ASP.NET development. Most code is interpreted ASPX, but there are around a dozen binary DLLs generated by a manual compile step. Whilst it doesn't make a lot of sense to have these source-code versioned in theory, it certainly makes it convenient to ensure they are correctly mirrored from the development environment onto the production system (one click). Also - in case of disaster - the rollback to the previous step is again one click in SVN.
So I bit the bullet and included them in the SVN archive - the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes.

Not necessarily, although best practices for source control advise that you do not include generated files, for obvious reasons.
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?

Normally, derived files should not be stored in version control. In your case, you could build a release procedure that created a tarball that includes the derived files.
As you say, keeping the derived files in version control only increases the amount of noise you have to deal with.
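A release procedure of that kind can be quite small. The following Python sketch uses only the standard-library `tarfile` module to bundle the master file and the already generated files into one archive. The function name `make_release`, the archive naming scheme and the file list are hypothetical, and the step that actually generates the derived files is assumed to have run beforehand.

```python
import tarfile
from pathlib import Path


def make_release(root: str, version: str, files, name: str = "package") -> Path:
    """Pack the given files (relative to root) into <name>-<version>.tar.gz.

    The derived files must already exist; this sketch does not build them.
    """
    top = f"{name}-{version}"
    archive = Path(root) / f"{top}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        for rel in files:
            # Store every file under a versioned top-level directory.
            tar.add(str(Path(root) / rel), arcname=f"{top}/{rel}")
    return archive
```

For the package in the question, a call might look like `make_release(".", "1.0", ["pstool.tex", "pstool.sty"], name="pstool")`, with the version number chosen here purely for illustration. The repository then only needs to track `pstool.tex`, while users download the archive.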

In some cases we do, but it's more of a sysadmin type of use case, where the generated files (say, DNS zone files built from a script) have intrinsic interest in their own right, and the revision control is more linear audit trail than branching-and-tagging source control.

Solution deployment, CM, InstallShield

People,
We have 4 or 5 utilities that work in conjunction with our application. These utilities are either .bat files, or VB apps, PowerBuilder, etc. I am trying to manage these utils in source control, and am trying to figure out a better way to assign versions to them. Right now, the developers use the version control's meta-data -- specifically label -- to store the version number of the tool.
My goal is to have individual InstallShield packages for each utility, and an easy means to manage and assign version numbers to these packages.
Would you recommend a separate .ini file with the info, or store the info in InstallShield .ism file itself, or just use the meta-data info from version control tool?
UPDATE:
I like the idea, Orion. I have one concern, though: the script that increments the version number cannot be intelligent enough to increment the major number, etc., right? E.g. if one of the utils has version 1.2.3 and we are at a point where the new version is 2.0.0, the script may not be able to handle this.
I think this has to do a lot with our branching techniques -- we don't have any. The folks thought since the utils are so small, the source may not need branches.

PowerBuilder in particular has a nice trick you can do to incorporate the build number from an ini file into the compiled application.
Details here: http://www.pbdr.com/pbtips/ex/autorev.htm
We have an ini file inside source control that stores the build number, and its value is used in our build scripts to determine what label to apply to the source tree after a successful build. Works very nicely for our needs. When we branch, we do have to manually kick the file to increment the proper number though.
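For readers who want to see the shape of such a build step, here is a minimal Python sketch using the standard-library `configparser`. It is not the PowerBuilder technique from the link: the section name `version`, the key `build` and the label format `build-<n>` are assumptions chosen for the example.

```python
import configparser
import io
from typing import Tuple


def bump_build(ini_text: str) -> Tuple[str, str]:
    """Increment [version] build and return (new ini text, label to apply)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    build = parser.getint("version", "build") + 1
    parser.set("version", "build", str(build))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue(), f"build-{build}"
```

A build script would write the returned text back to the ini file, commit it, and then apply the returned label to the source tree with whatever labelling command the version control system provides.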

I managed our build system at my last job, which seemed to have some parallels to what you're asking.
There were ~30 C++ projects which needed compiling, and various .NET/Java things, and the odd Perl script.
This was all built on our build machine using NAnt. If I were doing it today I'd use rake, but the idea is the same.
We basically had an auto-incrementing build number which was stored in a version.txt file in the root of the repository.
Each time we did a build (automatically done each night, or on demand if necessary) the script would increment this number and check the file back into source control.
All the other apps referenced this file for their version number. For things which didn't support working like this, the script would set environment variables or perform other workarounds:
I'm pretty sure that our InstallShield programs referenced an environment variable for their version number, but we deprecated them in favour of WiX as InstallShield really did suck.
In the case of Visual Studio, grep/replace the number within the .csproj files, and check them back in.
Hope this gives you some ideas
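A minimal Python sketch of the version.txt idea follows, assuming the file holds a single `major.minor.patch` line. The function names `bump` and `bump_file` and the `part` argument are hypothetical. The script only does the arithmetic; a person (or a build parameter) still decides which part to raise, which is one way to handle the jump from 1.2.3 to 2.0.0.

```python
from pathlib import Path


def bump(version: str, part: str = "patch") -> str:
    """Return the next version; raising a part resets the lower parts to 0."""
    major, minor, patch = (int(x) for x in version.strip().split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def bump_file(path: str, part: str = "patch") -> str:
    """Read version.txt, write the bumped version back, and return it."""
    file = Path(path)
    new_version = bump(file.read_text(encoding="utf-8"), part)
    file.write_text(new_version + "\n", encoding="utf-8")
    return new_version
```

A nightly build would call `bump_file("version.txt")` and check the file back in; a release manager would run it once with `part="major"` when 2.0.0 is due.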

Using the metadata from your version control system should keep things simpler. It's how your developers already use the system. There is no additional file to maintain. My personal experience has taught me to version the satellite applications with the same version as the main app. K.I.S.S.
