Choosing a version control package for document control

We have a legal requirement to ensure the latest version of documents (mainly Word and Excel) is readily accessible. We currently implement document control by manually updating the page footer with a new version number, but we want a better system.
I've played around with TortoiseSVN and the functionality is good; but the problem is that unless I've missed a configuration variable somewhere, Subversion applies version numbers to the whole project (i.e. every file in the repository), not to the individual files. What I want is to be able to create a folder in the repository and all our documents go in there, and the version numbers of the files are only changed when that particular document is changed. Currently if we had 30 files and each was printed and put on display including the version numbers, if someone went to the repository the version number would almost certainly have changed even if the document contents were identical. Not ideal.
The alternative to this would be to create a new repository for each and every document but the administrative overhead on that will be prohibitive. I'm essentially looking for something that does much of what TortoiseSVN does, but treats files as individual projects with their own independent version number.
Whatever solution we come up with we would want the version of the document to be automatically shown in the page footer of the document. Tortoise can do this with a macro http://insights.oetiker.ch/windows/SvnProperties4MSOffice/.
Appreciate any help, thanks.
Greg

I think what you're actually looking for is a document management system.
Version Control Systems (VCS), such as Subversion (the underlying technology behind TortoiseSVN), are fundamentally unsuited for your task due to their focus on tracking changesets among all files within a given project (i.e. one changeset/version can involve changes to many files within the project).
Another advantage of using a document management system is that they typically allow you to attach extensible metadata attributes to your documents in a much richer way than version control systems, as well as providing search capabilities.
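To illustrate the changeset point above, here is a minimal Python sketch of the difference between a repository-wide revision number and a per-file "last changed" revision. The commit log and file names are made up for the example, and the function is not part of any VCS API. Subversion's `svn info` output reports a comparable "Last Changed Rev" value for a file, which may be closer to what the question is after than the repository revision.

```python
from typing import Dict, List, Tuple

# Hypothetical commit log: (repository-wide revision number, paths changed in it).
Commit = Tuple[int, List[str]]


def last_changed_revisions(log: List[Commit]) -> Dict[str, int]:
    """Map each path to the highest revision in which it was actually changed."""
    last_changed: Dict[str, int] = {}
    # Walk the log in revision order so later changes overwrite earlier ones.
    for revision, changed_paths in sorted(log, key=lambda commit: commit[0]):
        for path in changed_paths:
            last_changed[path] = revision
    return last_changed


if __name__ == "__main__":
    # Made-up history: the spreadsheet changes three times, the policy once.
    log = [
        (1, ["policy.docx", "rota.xlsx"]),
        (2, ["rota.xlsx"]),
        (3, ["rota.xlsx"]),
    ]
    head = max(revision for revision, _ in log)
    for path, revision in sorted(last_changed_revisions(log).items()):
        print(f"{path}: last changed in r{revision} (repository is at r{head})")
```

In this model a printed copy stamped with the per-file value stays valid until that particular document is committed again, even though the repository revision keeps moving.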


How to manage multiple small parts of a library with version control

I'm new to source version control, so I don't want to make a mistake in choosing the wrong setup for my project.
I have a kind of "library" made up of many small "procedures" (written in a pseudo-language specific to a piece of third-party software). Each procedure is a small standalone "package" of two or three files (the procedure itself, its documentation, and maybe one or two sub-procedures that only the main one needs).
So I have like hundreds of those procedure-packages, archived in subfolders depending on the area of application, and some of them more complex may use others more basic.
I modify those procedures pretty often in the early stages to improve them, but of course the modifications sometimes break compatibility, since they involve adding or removing input/output parameters. So I suppose I must somehow "tag" versions of each procedure as if it were a single piece of software...
So I'm wondering what the best way is to manage them with version control (I'm using Mercurial): am I supposed to make hundreds of repositories? o_O Or keep everything in one big repository and tag it every time a procedure is revised? Or maybe learn and use subrepositories?
Thanks for your help.
Simone
It can be subrepositories (or GuestRepo, though it has had no updates since 2013) with tags.
Each changeset in the "main" repo is linked to specific changesets in all the subrepositories, i.e. when you update to an old changeset in the main repo, all subrepos are updated accordingly.
Tags in the main repository let you mark stable or functional combinations for re-use.
Subrepos make sense if each package can stand on its own without any connection to the master software. If this condition is not met, I would stay with a single repository. Especially since you said your packages contain few files with few changes, the subrepo approach does not seem to make sense here.

best practice in storing non-source files under version control

I have a project under version control, but the project has some images, videos and zip files that change every so often. I don't want to store these files under version control because they take up a lot of space and make updates and commits very slow.
What's a good way of dealing with this issue while still committing the non-source files that have changed? Is there a better way?
I'm currently using Subversion; if there is another version control client that is better at dealing with this issue, please recommend it!
I have lots of non-source files in SVN and the only time it slows down the commit is when I change them. I don't see how this is an issue if they're only changing "every so often". Also the size really shouldn't be a concern. If your repository is on a server and you're worried about how much space it's taking up you need to upgrade. Hard drives are cheap. Buy them.
Some people feel strongly that non-source files don't belong in source control, I say an entire project should be stored in source control. That way if my development system goes down I can switch to another and after a couple minutes of downloading the project I'm back to coding.
You added in a comment:
The problem I have with this is that I don't care about the version history of those zip/video files, as long as they're the newest ones, there is no problem.
That means you have a linear workflow of development, only working on the LATEST of one main branch.
You do not seem to deal with the phase "after release", where you have to:
maintain what runs in production
develop small evolutions...
...while doing massive refactoring to experiment with some big evolutions
In those last three cases, the question of which exact images, videos and zip files "I have to use" or "I was using at the time" might become important.
Anyhow, if you feel SVN does not handle them appropriately, I would still recommend having some way to remember that, for SVN revision xxx to yyy, you were using the version 'z' of your set of binaries.
For that, you could setup an external repository like Maven. See question "Is it acceptable/good to store binaries in SVN?" (my answer in that question is near the top of the page, but I link directly to Evan's answer as he mentions Maven).
While I find it a huge pain to deal with non-text files in version control, especially ones that change a lot, I've accepted the practice of "if it is needed for the build/installer it should go in version control". This of course is not a hard rule. I don't keep 3rd party libraries under version control (though know people who do).
I came to this opinion after setting up a Continuous Integration server at my shop. Having everything needed for the build that may change under version control makes it much easier. As previously mentioned I don't keep libs under version control, but that is because we rarely upgrade or add libraries. If this is not the case for your shop then you may consider doing so. Also, if your images/videos/zips change more than once a year then I'd recommend keeping them under version control.

Are there version control systems that allow you to permanently delete files?

I need to keep some large files (several gigabytes each) under version control.
I don't need to keep every version of these files, and I don't have the space to.
I want to be able to remove old versions of the large files from my VCS at some point.
The files that I want to keep under version control are big .zip files or ISO images.
These files may contain executable software or data (seismic data, SAR images, GNSS data), and they are provided by my company's software supplier.
Which version control system could I use?
In CVS you can do that by removing the files from the repo. Subversion allows it by dumping the content of the repo and filtering the dump to remove the files (that is a bit cumbersome). Perforce has an obliterate command for that. Many of the newer distributed VCSs make it rather difficult because they use hashes all over the place, and the fact that your repo may have been replicated elsewhere also complicates things. Hg has a strip command (part of the Mq extension); Git can also do that, I think.
I don't think there's any version control system that allows you to do that regularly, because that goes against everything version control systems stand for.
Perforce generally allows files to be stored in two ways: as head revision only (so you'd only ever have one copy) or with all revisions. Perforce does have the admin-level obliterate command, which can be used to delete revisions. It's up to you to query for a list of files, possibly by date or number of revisions, and to specify the revisions to the obliterate command. As the name suggests, obliterate deletes the revisions permanently from the database, so I always generate scripts to do this and review them before running them. If the obliterate command is NOT run with the -y flag, it only generates a list of what would be obliterated, which is also very useful.
Somehow I get the impression that you should not use a version control system at all. As said before, what you're trying to do goes against everything you would need a version control system for in the first place.
I suggest you create a file system directory structure that makes sense for what you're trying to accomplish, so that you can structure your data. And just make backups of those files.
TFS has a destroy command that you can use to permanently delete files or revisions as you see fit.
There is more information at this MSDN article.
Many version control systems allow you to configure them in a way so that they store only the differences between several versions of a file and save space through that.
For example if you have a 1Gig file committed, change a part of it and commit it again, only the changed part will be stored in the version control system.
There won't be 2Gigs used (initial and new file) but only 1Gig+sizeOfChanges.
There's just one downside: if you're storing files whose whole content changes from revision to revision, this can be counter-productive, as the changes take almost the same space as the original version. Archive files are an example of such files, where even a small change in the (real) content can lead to a completely different archive file.
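The point about archives can be tried out with a small Python sketch. It applies a one-byte edit to some made-up data and compares how much of the file changes before and after zlib compression. `changed_span` is a hypothetical helper, a crude stand-in for the size of a stored delta rather than the algorithm any real VCS uses.

```python
import zlib


def changed_span(old: bytes, new: bytes) -> int:
    """Count the bytes of `new` outside the prefix and suffix it shares with `old`.

    A crude stand-in for the size of a stored delta, not a real delta algorithm.
    """
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop the suffix scan before it overlaps the shared prefix.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return len(new) - prefix - suffix


if __name__ == "__main__":
    # Made-up, text-like sample data standing in for a large file.
    original = b"".join(b"record %d: value %d\n" % (i, i * i) for i in range(5000))
    middle = len(original) // 2
    # Overwrite a single byte in the middle ('#' does not occur in the data).
    edited = original[:middle] + b"#" + original[middle + 1:]

    print("raw file:  ", changed_span(original, edited), "of", len(edited), "bytes changed")

    packed_old, packed_new = zlib.compress(original), zlib.compress(edited)
    print("compressed:", changed_span(packed_old, packed_new), "of", len(packed_new), "bytes changed")
```

On typical input the compressed pair differs over a far larger share of its length than the raw pair does, which is why delta storage tends to save little for .zip files. The exact numbers depend on the data and the compressor.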
I'd suggest testing several version control systems yourself, with your specific needs and environment, and monitoring on the server side how the storage requirements of each system change.
Some distributed version control systems allow you to create "checkpoints", which let you use a given version as a kind of base revision and save you from pulling all the history before the checkpoint on every checkout. So you can remove the big files, create a checkpoint, and checkout/clone the repository from that checkpoint to a new directory. You then have a new, small repository without the history before the checkpoint. If you don't need that history you can burn the old repository to CD and use the new, partial one from now on.
I've only tested it in darcs, and there it works, but YMMV depending on version control system and use cases.
It sounds to me like you need an intelligent backup system, rather than version control.
I use SyncBackSE; it allows you to keep a number of previous versions, and can also do things like "ignore all files changed more than 30 days ago".
It's one of the few bits of paid-for software I use. I think it's worth checking out.
I think you're talking about something like AlienBrain's "bucket" system, aren't you? The ability to remove some revisions from version control.
If you want to destroy an item, it's normally called "obliterate" and it's supported by a number of systems out there.
Buckets, AFAIK are supported by:
AlienBrain
Accurev
PlasticSCM
I would save such files under a unique name (datestamped, perhaps), and perhaps additionally make a textual reference to the external file in the version control system.
Fossil allows you to do this via the "shun" mechanism. Fossil being a distributed SCM, however, means that this does not affect all repositories (for obvious reasons).

Source Control for multiple projects/solutions with shared libraries

I am currently working on a project to convert a number of Excel VBA powered workbooks to VSTO solutions. All of the workbooks will share a number of class libraries and third party assemblies, in fact most of the work is done in the class libraries. I currently have my folder structure laid out like this.
Base
    Libraries
    Assemblies
    Workbooks
        Workbook1
        Workbook2
Each of the workbooks will be its own solution, and the workbook solutions just reference the assemblies in the folder structure. My question is how would you lay out the source control? Would you start the repository at the base? Or would you create a repository for each workbook solution? Would you rearrange the folders?
Now that we have the initial development done, we're about to have a bunch of outside developers come on to the project to help us convert the rest of the workbooks, and I really like the idea of them being able to check out from the base directory and have all of the dependencies ready to go. I also worry that there are other concerns that come with having 20+ solutions/projects under one source control repository.
I want everything to be as simple as possible for people joining the project but I don't want to sacrifice long term usability. In my mind I've been going back and forth, what's simpler one repository or one repository per solution?
I'd appreciate any insight you have, because I'm fresh out.
Additional Information: Currently, I am using Mercurial personally, but the project will probably get moved to StarTeam unless I can make some convincing arguments for something else.
You don't mention in your question what source control you are using. As it doesn't sound like you need to limit your outside developers access to the rest of the repository I would not bother with setting up multiple repositories. I would assume that unless your code runs into the millions of lines size that repository size is not an issue.
It all depends on what functionality your revision control system supports. In Subversion you can declare other folders as externals (the svn:externals property) and provide a URL for the content of that folder. This causes Subversion to treat that folder as a separate repository even though it sits within your folder structure.

Is version control (ie. Subversion) applicable in document tracking? [closed]

I am in charge of 100+ documents (Word documents, not source code) that need revision by different people in my department. Currently all the documents are in a shared folder, where people retrieve them, revise them and save them back into the folder.
What I am doing now is looking up the "date modified" in the shared folder, opening the recently modified documents and using the "Track Changes" function in MS Word to apply the changes. I find this a bit tedious.
So will it be better and easier if I commit this in a version control database?
Basically I want to keep different versions of a file.
What I have learned from the answers:
Use Time Machine to save different versions (or Shadow Copy in Vista)
There is a difference between text and binary documents when you use a version control app. (I didn't know that)
Diff won't work on binary files
A notification system (i.e. email) for revisions is great
Google Docs revision feature.
Update :
I played around with Google Docs revision feature and feel that it is almost right for me. Just a bit annoyed with the too frequent versioning (autosaving).
But what feels right for me doesn't mean it feels right for my dept. Will they be okay with saving all these documents with Google?
I've worked with Word documents in SVN. With TortoiseSVN, you can easily diff Word documents (between working copy and repository, or between two repository revisions). It's really slick and definitely recommended.
The other thing to do if you're using Word documents in SVN is to add the svn:needs-lock property to the Word documents. This will prevent two people from trying to edit the same document at the same time, since unfortunately there's no good way to merge Word documents.
With the above two things, handling revision controlled Word documents is at least tolerable. It certainly beats the alternative of using a shared folder and track-changes.
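As a rough sketch of the svn:needs-lock suggestion above, the Python snippet below builds (but does not run) one `svn propset` command per Office document. The extension list and the file names are assumptions chosen for the example. The value `*` is the conventional placeholder, since Subversion only checks that the property is present.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed set of extensions to treat as unmergeable Office documents.
OFFICE_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx"}


def needs_lock_commands(paths: Iterable[str]) -> List[List[str]]:
    """Build one `svn propset svn:needs-lock` command per Office document."""
    commands = []
    for path in paths:
        # Compare case-insensitively so BUDGET.XLS is treated like budget.xls.
        if Path(path).suffix.lower() in OFFICE_SUFFIXES:
            commands.append(["svn", "propset", "svn:needs-lock", "*", path])
    return commands


if __name__ == "__main__":
    # Hypothetical working-copy files; only the Office documents get a command.
    for command in needs_lock_commands(["spec.docx", "notes.txt", "budget.XLS"]):
        # Shown for review only. Quote the * if you paste this into a shell.
        print(" ".join(command))
```

Each list can be passed to `subprocess.run` from inside a working copy, which avoids shell quoting of the `*`. The property change still has to be committed before other users see the files as read-only until locked.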
What on Earth are you all Word-is-binary-so-no-diff people talking about? TortoiseSVN, for example, integrates right out of the box with Word and enables you to use Word's built-in diff and merge functionality. It works just fine.
I have worked on projects that store documents in version control. It has worked out pretty well, although if people are unfamiliar with version control, they are probably going to have conceptual difficulties with things like "working copy" and "merge" and "conflict". Don't overestimate the users' capabilities when you plan your document management system.
I believe there exist big and powerful commercial solutions for all of this, as well. I'm sure if you have enough kilodollars, you can get something that fits your needs perfectly. Document management systems are a big business for big enterprise.
I guess one thing that nobody seems to have asked is if you have a legal requirement to store history of changes to the doc's?
Whether you do or don't is going to have an impact on what solutions you can consider.
Also a notification mechanism for out of date copies is also a bundle of fun. If engineer A has a copy of a document and engineer B then edits it and commits the changes you want engineer A to be notified that his copy is out of date.
Document control can become a real can of worms quite easily.
Maybe keep the doc's under CVS or SVN and set it up so that emails are generated to whoever has checked out a copy when updates for the same doc. are checked in to the repository?
Edit: I forgot to add: don't forget to use the binary switch, e.g. -kb for CVS, when adding a new doc. Otherwise, any byte sequences in the file that happen to match the ASCII for keyword strings will have the relevant config management data appended, corrupting your document.
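To see why keyword expansion is dangerous for binary files, here is a simplified Python model of it. This is not CVS's actual implementation: the keyword list is cut down, and the sample bytes and revision value are invented for the illustration.

```python
import re

# Simplified model of RCS/CVS-style keyword expansion; real CVS knows more keywords.
KEYWORD = re.compile(rb"\$(Revision|Date|Author)(:[^$\n]*)?\$")


def expand_keywords(data: bytes, values: dict) -> bytes:
    """Rewrite $Keyword$ or $Keyword: old $ as $Keyword: value $."""
    def substitute(match):
        name = match.group(1)
        # Unknown values are marked with "?" in this sketch.
        return b"$" + name + b": " + values.get(name, b"?") + b" $"
    return KEYWORD.sub(substitute, data)


if __name__ == "__main__":
    # Arbitrary binary bytes that happen to contain a keyword-like sequence.
    binary_doc = b"\x01\x02\x03\x04" + b"\x00" * 8 + b"$Revision$" + b"\x00" * 8
    stored = expand_keywords(binary_doc, {b"Revision": b"1.2"})
    print(len(binary_doc), "bytes before,", len(stored), "bytes after")
    print("unchanged:", stored == binary_doc)
```

Because the substitution changes the length of the data, every offset after the match shifts, which is enough to break a format that relies on fixed positions. Marking the file as binary turns this substitution off.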
Thinking out of the box, would migrating to a Wiki be out of the question?
Since you consider it feasible to force your users into Subversion (or something similar), a larger change seems acceptable.
Another migration target could be to use some kind of structured XML document format (DocBook comes to mind). This would enable you to indeed use diffs and source control, while getting all sorts of document formats for free.
Sharepoint also does a good (ok decent) job of versioning MS-specific documents.
How about trying Git? It seems Git can support Word .doc and OpenDocument .odf files if you configure it in the .gitattributes file.
Here is a reference; scroll down to "diffing binary files".
For what it's worth, there is also Google Docs. I guess it's not a perfect fit, but its versioning is very convenient.
ClearCase integrates with Word for revision tracking. I believe Telelogic DOORS does as well.
I use Mercurial with the TortoiseHg overlay. I can right-click a changeset, choose "Visual Diff", then choose the "docdiff" tool (comes bundled), which launches the document in Word with the Track Changes.
You can, but you will always compare the document versions with Word itself.
I haven't heard of a version control database that can track changes in Word documents.
However there are some tools which can compare Word documents, so if you set up your version control client to use these tools for comparison, you can have some fun.
Not necessarily. It depends on how often the new files are committed to the repo. If the files are edited several times before a commit, then you're precisely where you are now. The biggest benefit is if the file becomes corrupted.
You can version any file; this is how Time Machine in Mac OS X Leopard works, for example, and there is an interesting article by someone who committed his entire computing environment into CVS and then just maintained working copies on his home and work machines.
But "better" and "easier" are specific to your situation, and I'm not sure I completely understand your problem as things stand.
Subversion, CVS and all other source control systems are not good for Word documents and other office files (such as Excel spread sheets), since the files themselves are stored in a binary format. That means that you can never go back and annotate (or blame, or whatever you want to call it), or do diffs between documents.
There are revision control systems for Word documents out there, unfortunately I do not know any good ones. We use such control systems for Excel at my work, and unfortunately they all cost money.
The good thing is that they make life a lot easier, especially if you ever have to do an audit or due diligence.
If you use WinMerge it has added support for merging Word and Excel binary files.
Have a look at SharePoint. If cost is an issue, SharePoint portal services can also work for you. Read this for more info
You could use something like the Revisionator, which is like google docs but with built in revision control including diffs, forks, and 3 way merges. http://revisionator.com
UPDATE: It also fixes the problem of too frequent autosaving that you mention with Google Docs. It'll still autosave to prevent data loss, but it will only create a new version in the revision history and share with other users when you explicitly "release" your changes.
Just wanted to clarify an answer someone gave but I don't have enough points yet.
diff will work on binary files but it is only going to say something not really useful like "toto1 and toto2 binary files differ".
You could do that, but if those files are binary you should always lock them before editing. That way you won't get a conflict (which would be unresolvable).
Many of the new version control projects are better suited to entire directories, and not so much for single files.
Convincing someone that they need to get an entire project, when they only want to update an individual file can be a "fun" way to spend an afternoon.
Another option you have is a piece of software and cloud computing magic called dropbox. Or, you could ditch the word documents and make a locally shared mediawiki instead.
DropBox:
getdropbox DOT com
MediaWiki:
mediawiki DOT org
YES, it's applicable! I totally agree that the SVN+TortoiseSVN combo is well suited to tracking MS Office documents. You can lock a document for editing, write-protect all unlocked files to avoid conflicts (i.e. parallel modifications), diff two versions of the same file, see the history of all the modifications and of course roll back to an older revision.
I tried to describe all of those tips in a dedicated blog post. (disclaimer: I'm the blog owner)
All of this could even be accessible from the web with a SVN web client! (might need some software development)
But if you're not accustomed to version control systems in another context, this may not be the obvious choice. The work needed for good integration with documents gives dedicated tools an advantage: "electronic document management" systems are made just for that. A VCS like SVN may remain a good alternative for cost reasons :-)
Did you test the online service Simul? It looks promising, I personally like the GitHub-like orientation. Note that I'm not affiliated to Simul!