Pros and cons of versioning javadoc - version-control

I am wondering whether or not to commit Javadoc files to my project's SVN repository.
I have read about SVN good practices, including several interesting questions on SO, but none of them specifically addresses Javadoc handling.
At first I agreed with the arguments that only source code should be versioned, and I thought that Javadoc was really easy to re-build with Eclipse, or from a javadoc.xml Ant file for example (see the command sketch after these points), but I also thought of these points:
Javadoc files are light, text-encoded, and changes to these files are easily trackable with diff tools.
It seems useful to be able to track changes to the Javadoc easily, since in the case of a "public" Javadoc, any change to it probably reflects a change in the API.
People who want to look at the Javadoc do not necessarily want to get the whole project and compile it, so putting it in the repo seems as good a way as any to allow for efficient sharing/tracking.
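For example, rebuilding the Javadoc outside the IDE is roughly a one-liner; something along these lines (the package name is just an illustration):
javadoc -d doc -sourcepath src -subpackages com.mycompany.myproject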
What are your thoughts about this? Please answer with constructive, non-subjective arguments. I am interested in understanding which scenarios encourage versioning the Javadoc, and which make it a bad choice.

One argument against would be merge conflicts; as a former SVN user I hate merging with SVN. Even with Git, resolving them is just extra work when they occur. And if you're in a bigger team, regular merges are daily work.
Another argument against: if some people don't want the whole source tree, put the project under a CI system like Hudson, trigger the creation of the Javadocs on a regular basis (for example on each commit), and publish them somewhere.
Conclusion for me: don't version Javadocs.

I recently added some Javadoc output to a version control system (since GitHub shows the contents of the gh-pages branch as a website, this was the easiest way to put it on the web).
One problem here is that javadoc embeds the date/time of the run in every file, so every file changes from one commit to the next. So don't expect a useful diff that shows what documentation really changed between your versions, unless you manage to somehow ignore these comment lines when diffing.
(Actually, due to another question I found out how to omit the timestamp.)
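For the record, javadoc has a -notimestamp option for this; a command along these lines (paths and package names are only illustrative) keeps the generated pages timestamp-free:
javadoc -notimestamp -d doc -sourcepath src -subpackages com.example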
And of course, you should always be able to regenerate your javadoc from a checkout of the old sources. And for released libraries, publish the javadoc of the released version with it.
For third-party libraries which you are using inside your project as jar files (or anything that you don't compile yourself), it might be useful to store the Javadoc corresponding to the version used inside the source tree (and thus have it versioned, too). (When using Maven, you can usually download a javadoc jar together with the runnable jar of the library, so this doesn't apply there.)
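With Maven, fetching the javadoc jars of your dependencies typically comes down to asking the dependency plugin for the javadoc classifier, roughly:
mvn dependency:resolve -Dclassifier=javadoc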

Short answer: no, don't version your Javadocs.
The Javadocs are generated from your code, analogous to your .class files. If your API changes and you need to release a new version of the docs, you can always revert to that revision (or do a fresh checkout) and generate the Javadocs from there.
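For instance, with Subversion that boils down to something like this (revision number, URL and paths are placeholders):
svn checkout -r 1234 http://example.com/svn/myproject/trunk myproject-r1234
cd myproject-r1234
javadoc -d doc -sourcepath src -subpackages com.example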

My two cents...
I don't know how many times I've WISHED I had old versions of Javadocs available. For example, I may be using an old version of a third-party library, but the API docs for that library are based on the current code. All well and good if you're using the current code, but if you're on an older version, the javadocs could actually mislead you on how to use the class in question.
This is probably more of an issue with libraries you distribute than with your own internal code, but it is something I've run into now and again.

Related

How to prove to the management the futility of saving IDE specific files in GitHub

I cannot convince management of the futility, and even harm, of saving IDE-specific files and folders in GitHub.
There is also the problem that two different issues are being deliberately conflated:
I want to use Eclipse;
If we use Eclipse, then its files should be stored in the repository;
I tried everything:
from "files not related to the project should not be there"
to "every developer knows how to configure their IDE for the project based on the pom.xml"
and "if two programmers use two different IDEs, should their files also be saved?"
and so on... like "the purpose-built gitignore.io provides a recommended .gitignore based on what you are using."
What arguments can be given besides "nobody has done that for a long time, because it's obvious"?
PS. I am not going to start another holy war, I need arguments that are clear to the management.
The rule is not "do not share IDE-specific files"; rather, as long as tool-specific files are maintained, they should be shared, even if they are not used by everyone.
This applies to specific files of GitHub, Jenkins, FindBugs/SpotBugs, Eclipse and other tools. The presence of these files does no harm (files and folders starting with a dot are meant to be hidden). This is well documented (e.g. here for Eclipse) and, after all, the tools do not place these files in the project directory for no reason, although it would be possible to do otherwise, but because they are meant to be shared.
However, there are still people who believe that there should be only one Maven-specific pom.xml file, which is focused on building only. But since none of them is a tool developer and none of them has ever convinced the tool developers, it is very unlikely that you will convince your management.
Also be aware that Eclipse-specific files are not specific to the Eclipse IDE, as they are also used by e.g. VS Code. They are not even IDE-specific, since, for example, the Eclipse compiler for Java (ecj) can be used as a linter inside a build that runs on a server.

Need a sensible version-control scheme for shared library code

Using Matlab for development and Mercurial for version-control, how do I properly version all code for each of my projects, when they share some common classes and functions?
My current scheme addresses this imperfectly; I have a repository for each project and a repository for the common library. This necessitates a manual manipulation protocol, including:
Manually referencing the project name/version in the commit description for the library
Manually updating the changeset for both the project and library, if reverting to a previous state
This has worked reasonably well so far, but does run the risk of human error in following the protocol and unintended consequences of a library modification on another project. The latter can be addressed with hg update -r on the library, but is error-prone since I have to remember to go back, as I move between projects.
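To make that protocol concrete, the commands involved look roughly like this (repository names and changeset IDs are placeholders):
cd ../common-library
hg identify --id                                # note the library changeset, e.g. 3f2a1c9
cd ../project-a
hg commit -m "Feature X (uses common-library 3f2a1c9)"
hg update -r 42                                 # later: revert the project to an older state...
cd ../common-library && hg update -r 3f2a1c9    # ...and manually restore the matching library state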
Searching here (and elsewhere), I thought I had found salvation in sub-repository branches, only to discover the practice is basically frowned upon and considered a feature of last resort.
I then found that some folks eschew direct versioning with the project in favor of treating the library as a package for the build software to manage. Taking the library off the Matlab path, creating version clones and telling the builder which one to use, for any particular project, is a brilliant idea - except that I also use the Matlab interpreter to run/debug my code, as well as use the library in various scripts, so I need the library on the Matlab path - which means the builder will automatically pick up the library version that's on the path.
The only other scheme I can think of is to copy the library dependencies into the Project folder for revision control by the project repository. A change protocol would have to include copying the affected library class/function back to the library folder and typing an appropriate commit message. The trick here would be in manually updating project copies of library files, unless there's a Mercurial command to selectively pull from a foreign repo.
Does anybody have a better, more robust way to manage shared library code among projects in both interpreter and build environments?
Thanks to everyone who commented and to those who took the time to read my question. I am loath to ask questions, since I never think my queries are so novel as to be previously un-asked. But in this case, I was finding it hard to come up with the right search terms/phrase; hence the less-than-concise phrasing of my question.
I still don't know if there's a standard approach to managing software configuration for a project, when it includes non-project-specific dependencies, but the scheme I've decided to adopt is outlined below. I should say that the development framework I'm using is Matlab, which may well be argued isn't a terribly good framework for developing a GUI application, but it's the only one I have for now. Should I move to .Net, or some other framework, then maybe some of the issues I'm having would be much more readily resolved.
I decided the ability to version a project in its entirety took precedence and so I copied all of the project-agnostic dependencies (that is, functions and classes that I've developed) from a central library repo to a folder within the project repo.
It just means I have to be disciplined in managing the Matlab search path, as well as the protocol for copying changes made to these dependencies back to the central library - and for polling the library for any changes that originated from another project.
This doesn't seem elegant, but it does make me think more carefully about the functions and interfaces that I put into the library, which should be a good thing.

Checkstyle source control integration

I've been looking into checkstyle recently as part of some research into standard coding conventions. Though it seems like it is perfectly suitable for brand new projects, it seems to have a huge barrier to adoption for already existing projects as it doesn't seem to supply a method of only checking new or edited code. Maybe I'm wrong?
If you have a codebase that has never had a coding standard it could be a massive effort to get the whole codebase inline with a standard all at once. Allowing it to be done incrementally over time as code naturally evolves seems like a more reasonable approach. But it doesn't seem like a possibility with checkstyle.
I assume this would have to be a tie in with a source control system in order to be possible. Is that possible with Checkstyle or is there another tool that can provide this functionality?
As far as I know, Checkstyle is meant to analyze source without considering its history or revisions.
Adding that kind of feature would mean scripting the Checkstyle analysis to feed it the exact subset of files representing the delta.
But then, certain kinds of checks would be likely to fail or to miss things in their analysis, like the duplicate-code check.
So for that kind of incremental analysis, you not only need to restrict the set of sources, but also the set of rules you want to enforce, since some of those rules only make sense on the full set of sources.
So, why couldn't you run a full check on each file and then filter the results based on the changes managed by your source control system? Does anything like that exist?
Not to my knowledge, especially with plugins like eclipse-cs for Eclipse: if they analyze a file, they will display all warnings, even though source control says the file has not changed since a given revision.
Only an external script would be able to do this:
The principle is simple (although it could be a bit slow at execution time):
for each file, do a diff to check if modifications have been made
if yes,
do an svn blame to annotate the lines with the revision number that contained the last change,
then analyze the file with Checkstyle.
The script can then filter the warnings for the lines currently being modified (or for all the lines modified after a given revision).
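A rough shell sketch of the simpler, file-level variant of this idea (no per-line blame matching; the Checkstyle jar and ruleset names are placeholders):
# run Checkstyle only on files Subversion reports as locally modified
svn status | awk '$1 == "M" { print $2 }' | while read f; do
    java -jar checkstyle-all.jar -c my_checks.xml "$f"
done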
We developed a Checkstyle plugin for SCM-Manager, a tool for managing Git, Subversion and Mercurial repositories. If activated, it is possible to check committed source code against your Checkstyle rules. If the check finds errors, the commit is aborted.

What to put under version control?

Almost any IDE creates lots of files that have nothing to do with the application being developed; they are generated and maintained by the IDE so it knows how to build the application, where the version control repository is, and so on.
Should those files be kept under version control along with the files that really have something to do with the application (source code, the application's configuration files, ...)?
The thing is: in some IDEs, if you create a new project and then import it into the version-control repository using the version-control client/commands embedded in the IDE, all those files are sent to the repository. And I'm not sure that's right: what if two different developers working on the same project want to use two different IDEs?
I want to keep this question agnostic avoiding references to any particular IDE, programming language or version control system. So this question is not exactly the same as these:
SVN and binaries - but this talks about binaries and SVN
Do you keep your build tools in version control? - but this talks about build tools (e.g. putting the jdk under version control)
What project files shouldn’t be checked into SVN - but this talks about SVN and dll's
Do you keep your project files under version control? - very similar (haven't found it before), thanks VonC
Rules of thumb:
Include everything which has an influence on the build result (compiler options, file encodings, ASCII/binary settings, etc.)
Include everything to make it possible to open the project from a clean checkout and being able to compile/run/test/debug/deploy it without any further manual intervention
Don't include files which contain absolute paths
Avoid including personal preferences (tab size, colors, window positions)
Follow the rules in this order.
[Update] There is always the question what should happen with generated code. As a rule of thumb, I always put those under version control. As always, take this rule with a grain of salt.
My reasons:
Versioning generated code seems like a waste of time. It's generated, right? I can get it back at the push of a button!
Really?
If you had to bite the bullet and generate the exact same version of some previous release without fail, how much effort would it be? When generating code, you not only have to get all the input files right, you also have to turn back time for the code generator itself. Can you do that? Always? As easy as it would be to check out a certain version of the generated code if you had put it under version control?
And even if you could, could you ever be sure you didn't miss something?
So on one hand, putting generated code under version control makes sense since it makes it dead easy to do what VCSs are meant for: go back in time.
Also it makes it easy to see the differences. Code generators are buggy, too. If I fix a bug and have 150'000 files generated, it helps a lot when I can compare them to the previous version to see that a) the bug is gone and b) nothing else changed unexpectedly. It's the unexpected part which you should worry about. If you don't, let me know and I'll make sure you never work for my company ever :-)
The major pain point of code generators is stability. It won't do if your code generator just spits out a random mess of bytes every time you run it (well, unless you don't care about quality). Code generators need to be stable and deterministic. You run them twice with the same input and the output must be identical down to the least significant bit.
So if you can't check in generated code because every run of the generator creates spurious differences, then your code generator has a bug. Fix it. Sort the code when you have to. Use hash maps that preserve order. Do everything necessary to make the output non-random. Just like you do everywhere else in your code.
Generated code that I might not put under version control would be documentation. Documentation is somewhat of a soft target. It doesn't matter as much when I regenerate the wrong version of the docs (say, it has a few typos more or less). But for releases, I might do that anyway so I can see the differences between releases. Might be useful, for example, to make sure the release notes are complete.
I also don't check in JAR files. As I have full control over the whole build and full confidence that I can get back any version of the sources in a minute, plus I know that I have everything necessary to build it without any further manual intervention, what would I need the executables for? Again, it might make sense to put them into a special release repo, but then you'd do better to keep a copy of the last three years on your company's web server for download. Think: comparing binaries is hard and doesn't tell you much.
I think it's best to put anything under version control that helps developers to get started quickly, ignoring anything that may be auto-generated by an IDE or build tools (e.g. Maven's eclipse plugin generates .project and .classpath - no need to check these in). Especially avoid files that change often, that contain nothing but user preferences, or that conflict between IDEs (e.g. another IDE that uses .project just like eclipse does).
For eclipse users, I find it especially handy to add code style (.settings/org.eclipse.jdt.core.prefs - auto formatting on save turned on) to get consistently formatted code.
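If your repository is Subversion, one way to keep such generated files out of it is an ignore property on the project directory; a sketch (adjust the file list to taste):
svn propset svn:ignore ".project
.classpath" .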
Everything that can be automatically generated from the source and configuration files should not be under version control! It only causes problems and limitations (like the one you stated - two programmers using two different project files).
This is true not only for IDE "junk files" but also for intermediate files (like .pyc in Python, .o in C, etc.).
This is where build automation and build files come in.
For example, you can still build the project (the two developers will need the same build software obviously) but they then could in turn use two different IDE's.
As for the 'junk' that gets generated, I tend to ignore most of it. I know this is meant to be language-agnostic, but consider Visual Studio. It generates user files (user settings etc.) that should not be under source control.
On the other hand, project files (used by the build process) most certainly should. I should add that if you are on a team and have all agreed on an IDE, then checking in IDE-specific files is fine, provided they are global rather than user-specific or unneeded.
Those other questions do a good job of explaining what should and shouldn't be checked into source control, so I won't repeat them.
In my opinion it depends on the project and environment. In a company environment where everybody is using the same IDE, it can make sense to add the IDE files to the repository, though this depends a bit on the IDE, as some include absolute paths to things.
For a project which is developed in different environments it doesn't make sense and will be a pain in the long run, as the project files aren't maintained by all developers and make it harder to find the "relevant" things.
Anything that would be devastating if it were lost, should be under version control.
In my opinion, anything needed to build the project (code, make files, media, databases with required program info, etc.) should be in repositories. I realise that this is controversial, especially for media/database files, but to me, if you can't branch and then hit build, source control is not doing its job. This goes double for distributed systems with cheap branch creation/merging.
Anything else? Store it somewhere different. Developers should choose their own working environment as much as possible.
From what I have seen of version control, it seems that most things should go into it - e.g. source code and so on. However, the problem that many VCSs run into is handling large files, typically binaries, and at times things like audio and graphics files. Therefore, my personal way to do it is to put the source code under version control, along with generally small-sized graphics, and leave any binaries to other systems of management. If it is a binary that I created myself using the build system of the IDE, then it can definitely be ignored, because it is going to be regenerated every build. For dependency libraries, well, this is where dependency package managers come in.
As for IDE-generated files (I am assuming these are ones that aren't generated during the build process, such as the solution files for Visual Studio) - well, I think it would depend on whether or not you are working alone. If you are working alone, then go ahead and add them - they will allow you to revert settings in the solution or whatever you make. The same goes for other non-solution-like files as well. However, if you are collaborating, then my recommendation is no - most IDE-generated files tend to be, well, user-specific - i.e. they work on your machine, but not necessarily on others'. Hence, you may be better off not including IDE-generated files in that case.
tl;dr: you should put most things that relate to your program into version control, excluding dependencies (things like libraries, graphics and audio should be handled by some other dependency-management system). As for things directly generated by the IDE - well, it depends on whether you are working alone or with other people.

Version control of deliverables

We need to regularly synchronize many dozens of binary files (project executables and DLLs) between many developers at several different locations, so that every developer has an up-to-date environment to build and test against. Due to the nature of the project, updates must be done often and on demand (overnight updates are not sufficient). This is not pretty, but we are stuck with it for a time.
We settled on using a regular version (source) control system: put everything into it as binary files, get-latest before testing and check-in updated DLL after testing.
It works fine, but a version control client has a lot of features which don't make sense for us and people occasionally get confused.
Are there any tools better suited for the task? Or may be a completely different approach?
Update:
I need to clarify that it's not a tightly integrated project - it's more like an extensible system with a heap of "plugins", including third-party ones. We need to make sure those modules/plugins work nicely with recent versions of each other and of the core. A centralised build, as was suggested, was considered initially, but it's not an option.
I'd probably take a look at rsync.
Just create a .CMD file that contains the call to rsync with all the correct parameters and let people call that. rsync is very smart about deciding which parts of files need to be transferred, so it'll be very fast even when large files are involved.
What rsync doesn't do though is conflict resolution (or even detection), but in the scenario you described it's more like reading from a central place which is what rsync is designed to handle.
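The call that such a .CMD file wraps might look roughly like this (host and paths are placeholders):
rsync -avz --delete buildserver:/deploy/current/ ./bin/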
Another option is unison
You should look into continuous integration and having some kind of centralised build process. I can only imagine the kind of hell you're going through with your current approach.
Obviously that doesn't help with the keeping your local files in sync, but I think you have bigger problems with your process.
Building the project should be a centralized process in order to allow for better control; otherwise your solution will become chaos in the long run. Anyway, here is what I'd do.
Create the usual repositories for source files, resources, documentation, etc. for each project.
Create a repository for resources. It will hold the latest binary versions for each project as well as any required resources, files, etc. Keep a good folder structure for each project so developers can "reference" the files directly.
Create a repository for final builds, which will hold the actual stable releases. It will receive the stable files, produced in an automatic way (if possible) from the checked-in sources. This will hold the real product, the real version for integration testing and so on.
While far from perfect, you'll be able to define well-established protocols: check in your latest DLL here, generate the "real" version from the latest source there.
What about embedding a 'what' string in the executables and libraries? Then you can synchronise the desired list of versions against a manifest.
We tend to use CVS id strings as a part of the what string.
const char cvsid[] = "@(#)INETOPS_filter_ip_$Revision: 1.9 $";
Entering the command
what filter_ip | grep INETOPS
returns
INETOPS_filter_ip_$Revision: 1.9 $
We do this for all deliverables so we can see whether the versions in a bundle of libraries and executables match the list in an associated manifest.
HTH.
cheers,
Rob
Subversion handles binary files really well, is pretty fast, and scriptable. VisualSVN and TortoiseSVN make dealing with Subversion very easy too.
You could set up a folder that's checked out from Subversion with all your binary files (that all developers can commit to and update from), then just type "svn update" at the command line, or use TortoiseSVN: right-click on the folder, click "SVN Update" and it'll update all the files and tell you what's changed.
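The day-to-day commands for that folder are the usual ones; the URL and file names below are just placeholders:
svn checkout http://example.com/svn/binaries binaries
svn update
svn commit -m "Updated Foo.dll after testing"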