Is version control (e.g. Subversion) applicable to document tracking? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I am in charge of 100+ documents (Word documents, not source code) that need revision by different people in my department. Currently all the documents are in a shared folder, from which people retrieve them, revise them and save them back.
What I am doing now is looking up the "date modified" in the shared folder, opening the recently modified documents and using the "Track Changes" function in MS Word to apply the changes. I find this a bit tedious.
So would it be better and easier if I committed these documents to a version control system?
Basically I want to keep different versions of a file.
What I have learned from the answers:
Use Time Machine to save different versions (or Shadow Copy in Vista).
There is a difference between text and binary documents when you use a version control app. (I didn't know that.)
Diff won't work on binary files.
A notification system (e.g. email) for revisions is great.
Google Docs has a revision feature.
Update:
I played around with the Google Docs revision feature and feel that it is almost right for me. I'm just a bit annoyed by the too-frequent versioning (autosaving).
But what feels right for me may not feel right for my department. Will they be okay with storing all these documents with Google?

I've worked with Word documents in SVN. With TortoiseSVN, you can easily diff Word documents (between working copy and repository, or between two repository revisions). It's really slick and definitely recommended.
The other thing to do if you're using Word documents in SVN is to add the svn:needs-lock property to the Word documents. This will prevent two people from trying to edit the same document at the same time, since unfortunately there's no good way to merge Word documents.
With the above two things, handling revision controlled Word documents is at least tolerable. It certainly beats the alternative of using a shared folder and track-changes.
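For the svn:needs-lock step mentioned above, here is a minimal sketch of how the property could be set on every Word document in a working copy. It assumes the `svn` command-line client is installed and that the documents end in .doc or .docx; the function names are made up for illustration. The property value is conventionally `*`, and the change still has to be committed afterwards.

```python
import subprocess
from pathlib import Path

# Assumed set of extensions; adjust to whatever your department uses.
WORD_SUFFIXES = {".doc", ".docx"}


def needs_lock_commands(working_copy):
    """Return one `svn propset svn:needs-lock` command per Word document."""
    root = Path(working_copy)
    return [
        ["svn", "propset", "svn:needs-lock", "*", str(path)]
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.suffix.lower() in WORD_SUFFIXES
    ]


def apply_needs_lock(working_copy):
    """Run the commands; requires an svn client and a real working copy."""
    for command in needs_lock_commands(working_copy):
        subprocess.run(command, check=True)
```

Once the property is committed, the files are read-only in each working copy until someone takes the lock (for example with "Get lock" in TortoiseSVN).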

What on Earth are you all Word-is-binary-so-no-diff people talking about? TortoiseSVN, for example, integrates right out of the box with Word and enables you to use Word's built-in diff and merge functionality. It works just fine.
I have worked on projects that store documents in version control. It has worked out pretty well, although if people are unfamiliar with version control, they are probably going to have conceptual difficulties with things like "working copy" and "merge" and "conflict". Don't overestimate the users' capabilities when you plan your document management system.
I believe there exist big and powerful commercial solutions for all of this, as well. I'm sure if you have enough kilodollars, you can get something that fits your needs perfectly. Document management systems are a big business for big enterprise.

I guess one thing that nobody seems to have asked is whether you have a legal requirement to store the history of changes to the documents.
Whether you do or don't is going to have an impact on what solutions you can consider.
A notification mechanism for out-of-date copies is also a bundle of fun. If engineer A has a copy of a document and engineer B then edits it and commits the changes, you want engineer A to be notified that his copy is out of date.
Document control can become a real can of worms quite easily.
Maybe keep the documents under CVS or SVN and set it up so that emails are sent to whoever has checked out a copy when updates to the same document are checked in to the repository?
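A rough sketch of the notification idea, assuming you maintain your own list of who is working on which document (neither CVS nor SVN hands you such a list directly, so the `watchers` mapping and the sender address here are hypothetical). It only builds the messages; sending them, for example from a post-commit hook via `smtplib`, is left out.

```python
from email.message import EmailMessage


def build_notifications(changed_path, committer, revision, watchers):
    """Build one email per watcher of changed_path, skipping the committer.

    `watchers` is a hypothetical dict mapping document paths to the
    email addresses of people who hold a copy of that document.
    """
    messages = []
    for address in watchers.get(changed_path, []):
        if address == committer:
            continue  # the person who committed already has the latest copy
        msg = EmailMessage()
        msg["From"] = "vcs-notify@example.com"  # placeholder sender
        msg["To"] = address
        msg["Subject"] = f"{changed_path} changed in revision {revision}"
        msg.set_content(
            f"{committer} committed a new version of {changed_path} "
            f"(revision {revision}). Your copy may be out of date."
        )
        messages.append(msg)
    return messages
```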
Edit: I forgot to add: don't forget to use the binary switch (e.g. -kb for CVS) when adding a new document. Otherwise, any byte sequences in the file that happen to match the ASCII for keyword strings will have the relevant configuration management data expanded into them, corrupting your document.
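To see why keyword expansion is harmful to binary files, here is a toy imitation (not CVS's actual code) that expands `$Id$`-style keywords in a byte string. The keyword list and the expansion text are simplified assumptions. The point is only that bytes get inserted into the middle of the data, which a binary format like .doc will not tolerate.

```python
import re

# Simplified pattern for a few CVS-style keywords, e.g. $Id$ or $Id: ... $
KEYWORD = re.compile(rb"\$(Id|Revision|Date)(?::[^$\n]*)?\$")


def expand_keywords(data, values):
    """Replace each keyword with '$Name: value $', as a text-mode add would."""
    def replace(match):
        name = match.group(1)
        return b"$" + name + b": " + values.get(name, b"") + b" $"
    return KEYWORD.sub(replace, data)


# A made-up binary blob that happens to contain the bytes "$Id$".
blob = b"\x00\x01PK$Id$\xff\xfe"
expanded = expand_keywords(blob, {b"Id": b"report.doc,v 1.2"})
print(len(blob), len(expanded))  # the file has silently grown
```

Adding the file with `cvs add -kb` tells CVS to skip this expansion for that file.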

Thinking out of the box, would migrating to a Wiki be out of the question?
Since you consider it feasible to force your users into Subversion (or something similar), a larger change seems acceptable.
Another migration target could be to use some kind of structured XML document format (DocBook comes to mind). This would enable you to indeed use diffs and source control, while getting all sorts of document formats for free.

SharePoint also does a good (OK, decent) job of versioning MS-specific documents.

How about trying Git? It seems Git can support Word .doc and OpenDocument .odf files if you configure it in a .gitattributes file.
Here is a reference; scroll down to "diffing binary files".
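As a sketch of what that configuration could look like: Git lets you assign a named diff driver to a file pattern in .gitattributes and then give that driver a `textconv` program that turns the file into plain text. The converters below (`catdoc`, `odt2txt`) and the file patterns are assumptions on my part; any program that prints the document's text to stdout should do. The helper names are made up.

```python
# driver name -> (file pattern, text converter); both columns are assumptions.
CONVERTERS = {
    "word": ("*.doc", "catdoc"),
    "odt": ("*.odt", "odt2txt"),
}


def gitattributes_text(converters):
    """Lines for .gitattributes that attach a diff driver to each pattern."""
    return "".join(
        f"{pattern} diff={driver}\n"
        for driver, (pattern, _tool) in sorted(converters.items())
    )


def git_config_commands(converters):
    """`git config` commands that tell each driver how to extract text."""
    return [
        ["git", "config", f"diff.{driver}.textconv", tool]
        for driver, (_pattern, tool) in sorted(converters.items())
    ]


print(gitattributes_text(CONVERTERS), end="")
```

With this in place `git diff` shows a text diff of the converted output. It is a diff of the extracted text only, so formatting changes will not show up.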

For what it's worth, there is also Google Docs. I guess it's not a perfect fit, but its versioning is very convenient.

ClearCase integrates with Word for revision tracking. I believe Telelogic DOORS does as well.

I use Mercurial with the TortoiseHg overlay. I can right-click a changeset, choose "Visual Diff", then choose the "docdiff" tool (comes bundled), which launches the document in Word with the Track Changes.

You can, but you will always compare the document versions with Word itself.
I haven't heard of a version control system that can track changes inside Word documents.
However, there are some tools that can compare Word documents, so if you set up your version control client to use these tools for comparison, you can have some fun.

Not necessarily. It depends on how often the new files are committed to the repo. If the files are edited several times before a commit, then you're precisely where you are now. The biggest benefit is if the file becomes corrupted.
You can version any file; this is how Time Machine in Mac OS X Leopard works, for example, and there is an interesting article by someone who committed his entire computing environment into CVS and then just maintained working copies on his home and work machines.
But "better" and "easier" are specific to your situation, and I'm not sure I completely understand your problem as things stand.

Subversion, CVS and all other source control systems are not good for Word documents and other office files (such as Excel spreadsheets), since the files themselves are stored in a binary format. That means that you can never go back and annotate (or blame, or whatever you want to call it), or do diffs between documents.
There are revision control systems for Word documents out there; unfortunately, I do not know any good ones. We use such control systems for Excel at my work, and unfortunately they all cost money.
The good thing is that they make life a lot easier, especially if you ever have to do an audit or due diligence.

If you use WinMerge it has added support for merging Word and Excel binary files.

Have a look at SharePoint. If cost is an issue, SharePoint portal services can also work for you. Read this for more info

You could use something like the Revisionator, which is like Google Docs but with built-in revision control including diffs, forks, and 3-way merges. http://revisionator.com
UPDATE: It also fixes the problem of too frequent autosaving that you mention with Google Docs. It'll still autosave to prevent data loss, but it will only create a new version in the revision history and share with other users when you explicitly "release" your changes.

Just wanted to clarify an answer someone gave but I don't have enough points yet.
diff will work on binary files but it is only going to say something not really useful like "toto1 and toto2 binary files differ".
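A small sketch of that behaviour, written as a guess at the general approach rather than a copy of any real diff tool: treat a file as binary if its first few thousand bytes contain a NUL byte, and in that case report only that the files differ. The 8000-byte sample size and the function names are my assumptions.

```python
import difflib


def looks_binary(data, sample_size=8000):
    """Heuristic: a NUL byte near the start suggests a binary file."""
    return b"\x00" in data[:sample_size]


def describe_difference(name_a, data_a, name_b, data_b):
    """Return a line-by-line diff for text, or a one-line note for binaries."""
    if data_a == data_b:
        return ""
    if looks_binary(data_a) or looks_binary(data_b):
        return f"Binary files {name_a} and {name_b} differ"
    lines_a = data_a.decode("utf-8", errors="replace").splitlines(keepends=True)
    lines_b = data_b.decode("utf-8", errors="replace").splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(lines_a, lines_b, fromfile=name_a, tofile=name_b)
    )
```

For two Word files this gives you the one-line answer and nothing else, which is why the other answers suggest letting Word itself do the comparison.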

You could do that, but if the files are binary you should always lock a file before editing it. That way you won't get a conflict (which would be unresolvable).

Many of the new version control projects are better suited to entire directories, and not so much for single files.
Convincing someone that they need to get an entire project, when they only want to update an individual file can be a "fun" way to spend an afternoon.

Another option you have is a piece of software and cloud computing magic called Dropbox. Or, you could ditch the Word documents and make a locally shared MediaWiki instead.
DropBox:
getdropbox DOT com
MediaWiki:
mediawiki DOT org

YES, it's applicable! I totally agree that the SVN+TortoiseSVN combo is well suited to tracking MS Office documents. You can lock a document for editing, write-protect all unlocked files to avoid conflicts (i.e. parallel modifications), diff two versions of the same file, see the history of all modifications and, of course, roll back to an older revision.
I tried to describe all of those tips in a dedicated blog post. (disclaimer: I'm the blog owner)
All of this could even be accessible from the web with a SVN web client! (might need some software development)
But if you're not accustomed to version control systems in another context, this may not be the obvious choice. The work needed for good integration with documents gives dedicated tools an advantage: "electronic document management" systems are made just for that. A VCS like SVN may remain a good alternative for cost reasons :-)
Did you test the online service Simul? It looks promising, I personally like the GitHub-like orientation. Note that I'm not affiliated to Simul!

Related

How should I start with tracking file changes/versions?

I've been working with a lot of my files on the go recently, and in the process I have often accumulated several copies of files in different stages of completion/revision. I'm working on any number of projects at a given time, so it's not always easy to remember or figure out quickly which version I should continue working on.
What type of options would you recommend that allow me to track changes locally and if possible with files I work on while at a remote location? I've never worked with file versioning or tracking systems, so not sure what direction I should be looking in. I work mostly with HTML, CSS, and PHP.
Any help is awesomely appreciated! Thanks.
PS. Don't know if I should have this in a separate question but what options are available for the same type of thing, change tracking/logging for files on server? Preferably something that not only vaguely notes a file has been changed, but that tracks specific changes that have occurred in files.
It seems to me that GitHub is the perfect choice for your requirement. You can create a repository to maintain the history, it's easy to use and it is free.
https://github.com/

Choosing a version control package for document control

We have a legal requirement to ensure the latest version of documents (mainly Word and Excel) are readily accessible. We currently implement document control by manually updating the page footer with a new version number but want a better system.
I've played around with TortoiseSVN and the functionality is good; but the problem is that unless I've missed a configuration variable somewhere, Subversion applies version numbers to the whole project (i.e. every file in the repository), not to the individual files. What I want is to be able to create a folder in the repository and all our documents go in there, and the version numbers of the files are only changed when that particular document is changed. Currently if we had 30 files and each was printed and put on display including the version numbers, if someone went to the repository the version number would almost certainly have changed even if the document contents were identical. Not ideal.
The alternative to this would be to create a new repository for each and every document but the administrative overhead on that will be prohibitive. I'm essentially looking for something that does much of what TortoiseSVN does, but treats files as individual projects with their own independent version number.
Whatever solution we come up with we would want the version of the document to be automatically shown in the page footer of the document. Tortoise can do this with a macro http://insights.oetiker.ch/windows/SvnProperties4MSOffice/.
Appreciate any help, thanks.
Greg
I think what you're actually looking for is a document management system.
Version Control Systems (VCS), such as Subversion (the underlying technology behind TortoiseSVN), are fundamentally unsuited for your task due to their focus on tracking changesets among all files within a given project (i.e. one changeset/version can involve changes to many files within the project).
Another advantage of using a document management system is that they typically allow you to attach extensible metadata attributes to your documents in a much richer way than version control systems, as well as providing search capabilities.
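To make the numbering behaviour from the question concrete, here is a toy model (not Subversion's implementation) of a repository with one global revision counter. Every commit bumps the counter for the whole repository, but you can still record, per file, the last revision in which that file actually changed. The class and attribute names are invented for the example.

```python
class ToyRepository:
    """One global revision number, plus a per-file 'last changed' revision."""

    def __init__(self):
        self.revision = 0        # repository-wide, bumped by every commit
        self.last_changed = {}   # path -> revision in which it last changed

    def commit(self, paths):
        """Commit changes to the given paths and return the new revision."""
        self.revision += 1
        for path in paths:
            self.last_changed[path] = self.revision
        return self.revision


repo = ToyRepository()
repo.commit(["policy.doc", "form.xls"])  # revision 1
repo.commit(["form.xls"])                # revision 2
print(repo.revision)                     # 2: the repository moved on
print(repo.last_changed["policy.doc"])   # 1: policy.doc itself did not change
```

Subversion reports something similar as "Last Changed Rev" in the output of `svn info` for a path. Whether that number is good enough to print in a document footer for a legal requirement is a separate question, and a document management system may still be the better fit.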

Any thoughts on Surround scm? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
So looking at different version control systems: subversion, accurev, surround, tfs, bitkeeper/git/mercurial
Subversion: I see it's quite the popular standard
Accurev: There seems to be a love hate relationship around it.
Surround and TFS: I haven't seen many comments around them.
Bitkeeper/Git/Mercurial: Seem pretty popular, but I think "distributed" may scare my manager lol
For some reason he seems attracted to Surround, and it's not because of a sales pitch. We had originally downloaded it for evaluation and played around with it, but nothing came of it. So now we are back to looking at SCM and he wants to try it again. So far I haven't seen any buzz around it like there is around some other version control systems. Same for TFS.
I've been using Surround SCM at my job and I'll say it is what it is, but there are a few things that I find lacking. Though, I've heard that surround scm integrates well with surround's issue tracking system, but I can't comment on that because we don't use that.
I personally find the UI to be buggy and confusing.
The workflows are confusing and often present you with prompts that don't apply, so you get used to ignoring warnings.
e.g. "Are you sure you don't want to auto-merge?" "Are you sure you want to overwrite files?"
The UI always badgers you to use the auto-merge feature, but every time I've tried it, it ends up messing up my code (C#).
On top of that, the packaged diff tool (Guiffy) is buggy and doesn't display text properly.
Weird workflow quirks can result in your changes being overwritten.
It doesn't do directory syncing
...which means that every time you add a new file to your project you must by-hand go and add it to the SCM repository. If you don't, everything will look normal to you until one of your teammates emails you because you broke the build.
There's no good way to copy over revision histories when you are branching
... which means that you are less likely to branch when you should be. There's nothing more frustrating than to have to store code locally because you're making changes right before a release and your team refuses to branch the code into another repository.
There's no good way to blacklist certain files from being checked-in or from being overwritten during an update.
If there's a file that you don't want to check-in then you're left with the painful chore of scanning through a long list of files and deselecting those you don't want every time you want to check-in. Yuck.
Features aren't documented that well
Of course, they release a user's guide, but it's about as helpful as the Microsoft Windows help function. It tells you step by step how to do things in the UI (e.g. "click 'Create Shadow Directory', then click 'OK'"), but it doesn't tell you what those features are, how they are intended to be used, what actually happens server-side, etc.
Btw, if you know of any good way to get around these problems let me know :)
Danger! Danger, Will Robinson!
Surround is a data jail. Once you commit to it, you're stuck. There is no known way to get your history back out to another SCM. Don't get trapped!
This tends to be a problem with closed-source SCMs in general, but I have direct reports that it's especially bad with Surround.
Subversion, git, Mercurial, or Bazaar would be better choices.
I have used Surround at my job for about three years.
It does work well with their (Seapine's) test management and issue tracker program. If you are already using TestTrack, I would say Surround is a good choice.
In general I agree with @eremzeit, but the 'buggy and confusing' comment rarely applies to our workflow. The default diff tool (Guiffy) is bad, but often good enough.
One part I like is the easy ability to share files across repositories without needing to share a whole project/repository. Git does not have a mechanism to do this easily.
Last note: we have used Surround on Linux and Windows and it appears to work just as well on either. It is nice to have the same interface.
Surround SCM.
Pros:
Can apply a development work flow for all files. No two revisions of a file can be in the same status in the work flow.
Has a good UI.
Good licensing system.
Cons:
Stores all data in an RDBMS, so it is heading for a performance problem if the repo size is huge.
Does not support atomic commits. (You can do atomic commits, but the files are still individual revisions and cannot be referred to using the changelist #.)
My ideas about other tools
Subversion suits well for a corporate setup. Perforce is like subversion but faster and has a good UI, simple licensing terms and really super support system.
Recently Accurev has gained a strong footing with its innovative branching methodology.
IMHO, go for tool sets that interact well with your defect tracking, test case management and build management solutions. This would help you create a good developer ecosystem, thereby saving time.

best practice in storing non-source files under version control

I have a project under version control, but the project has some images, videos and zip files that change every so often. I don't want to store these files under version control because they take up a lot of space and make updates and commits very slow.
What's a good way of dealing with this issue while still committing non-source files that have changed? Is there a better way?
I'm currently using Subversion; if there is another version control client that is better for dealing with this issue, please recommend it!
I have lots of non-source files in SVN and the only time it slows down the commit is when I change them. I don't see how this is an issue if they're only changing "every so often". Also the size really shouldn't be a concern. If your repository is on a server and you're worried about how much space it's taking up you need to upgrade. Hard drives are cheap. Buy them.
Some people feel strongly that non-source files don't belong in source control; I say an entire project should be stored in source control. That way, if my development system goes down, I can switch to another and, after a couple of minutes of downloading the project, I'm back to coding.
You added in a comment:
The problem I have with this is that I don't care about the version history of those zip/video files, as long as they're the newest ones, there is no problem.
That means you have a linear workflow of development, only working on the LATEST of one main branch.
You do not seem to deal with the phase "after release", where you have to:
maintain what runs in production
develop small evolutions...
...while doing massive refactoring to experiment with some big evolutions
In the last three cases, the question of which exact images, videos and zip files "I have to use" or "I was using at the time" might become important.
Anyhow, if you feel SVN does not handle them appropriately, I would still recommend having some way to remember that, for SVN revisions xxx to yyy, you were using version 'z' of your set of binaries.
For that, you could set up an external repository like Maven. See the question "Is it acceptable/good to store binaries in SVN?" (my answer to that question is near the top of the page, but I link directly to Evan's answer as he mentions Maven).
While I find it a huge pain to deal with non-text files in version control, especially ones that change a lot, I've accepted the practice of "if it is needed for the build/installer it should go in version control". This of course is not a hard rule. I don't keep 3rd party libraries under version control (though know people who do).
I came to this opinion after setting up a Continuous Integration server at my shop. Having everything needed for the build that may change under version control makes it much easier. As previously mentioned, I don't keep libs under version control, but that is because we rarely upgrade or add libraries. If this is not the case for your shop then you may consider doing so. Also, if your images/videos/zips change more than once a year then I'd recommend keeping them under version control.

What is the best solution for maintaining backup and revision control on live websites?

What is the best solution for maintaining backup and revision control on live websites?
As part of my job I work with several live websites. We need an efficient means of maintaining backups of the live folders over time. Additionally, updating these sites can be a pain, especially if a change happens to break in the live environment for whatever reason.
What would be ideal would be hassle-free source control. I implemented SVN for a while which was great as a semi-solution for backup as well as revision control (easy reversion of temporary or breaking changes) etc.
Unfortunately SVN places hidden .svn directories everywhere, which causes problems, especially when other developers make folder structure changes or copy/move website directories. I've heard the argument that this is a matter of education etc., but the approach taken by SVN is simply not a practical solution for us.
I am thinking that maybe an incremental backup solution may be better.
Other possibilities include:
SVK, which is command-line only, which becomes a problem. Besides, I am unsure how appropriate this would be.
Mercurial, perhaps with some triggers to hide the distributed component which is not required in this case and would be unnecessarily complicated for other developers.
I experimented briefly with Mercurial but couldn't find a nice way to have the repository separate and kept constantly in sync with the live folder working copy. Maybe as a source control solution (making repository and live folder the same place) combined with another backup solution this could be the way to go.
One downside of Mercurial is that it doesn't place empty folders under source control which is problematic for websites which often have empty folders as placeholder locations for file uploads etc.
Rsync, which I haven't really investigated.
I'd really appreciate your advice on the best way to maintain backups of live websites, ideally with an easy means of retrieving past versions quickly.
Answer replies:
@Kibbee:
It's not so much about education as a lack of familiarity with anything but VSS and a lack of time/effort to learn anything else.
The xcopy/7-zip approach sounds reasonable, I guess, but it would quickly take up a lot of room, right?
As far as source control goes, I think I'd like the source control to just say "this is the state of the folder now, I'll deal with that, and if I can't match stuff up that's your fault, I'll just start new histories" rather than fail hard.
@Steve M:
Yeah, that's a nicer way of doing it but would require a significant cultural change. Having said that, I very much like this approach.
@mk:
Nice, I didn't think about using Rsync to deploy. Does this only upload the differences? Overwriting the entire live directory every time we make a change would be problematic due to site downtime.
I am still curious to see if there are any more traditional options
You can still use SVN, but instead of doing a checkout on your live environment, do an export, that way no .svn directories will be created. The downside, of course, is that no code changes on your live environment can take place. This is a good thing.
As a general rule, code changes on production systems should never be allowed. The change should be made and tested in a development/test/UAT environment, then once confirmed as OK, you can tag that code in SVN with something like RELEASE-x-x-x. Then, on the live system, export the code with that tag.
We use option 3. Rsync. I wrote a bash script to do this along with some extra checking, but here are the basics of what it does.
Make a tag for pushing to live.
Run svn export on that tag.
rsync to live.
So far it has been working out. We don't have to worry about user conflicts or have a separate user for running svn up on the production machine.
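The three steps above could be sketched roughly like this. This is not the author's bash script, just a guess at its core, and it assumes the conventional trunk/tags repository layout plus `svn` and `rsync` on the PATH. The function name, URLs and paths are placeholders.

```python
import subprocess


def deploy_commands(repo_url, tag, export_dir, live_target):
    """Build the tag, export and rsync commands for one release."""
    tag_url = f"{repo_url}/tags/{tag}"
    return [
        # 1. Make a tag for pushing to live.
        ["svn", "copy", f"{repo_url}/trunk", tag_url, "-m", f"Tag {tag} for release"],
        # 2. Export the tag: a clean tree without .svn directories.
        ["svn", "export", tag_url, export_dir],
        # 3. Copy the exported tree to the live location.
        ["rsync", "-az", f"{export_dir}/", live_target],
    ]


def deploy(repo_url, tag, export_dir, live_target):
    """Run the steps in order, stopping at the first failure."""
    for command in deploy_commands(repo_url, tag, export_dir, live_target):
        subprocess.run(command, check=True)
```

Note that `svn export` expects the export directory not to exist yet, and that rsync without `--delete` leaves files on the live site that are no longer in the tag, which is the safer default if the site stores user uploads.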
Any source control solution you pick is going to have problems if people are moving, deleting, or adding files and not telling the source control system about it. I'm not aware of any source control system that could solve this problem.
In the case where you just can't educate the people working on the project[1], then you may just have to go with daily snapshots. Something as simple as a batch file using xcopy to copy to a network drive, and possibly 7-zip on the command line to compress it so it doesn't take up too much space, would probably be the simplest solution.
[1] I highly doubt this; it is probably more a case of people being too stubborn and unwilling to learn or do "extra work". Never mind how much time source control could save them when they have to go back to previous versions, or when two people have edited the same file.
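A minimal sketch of the daily snapshot idea, using Python's standard library in place of the xcopy/7-zip batch file (so it is a swapped-in technique, not the same tools). The function name and the archive naming scheme are made up. The backup directory should live outside the site directory, for example on the network drive.

```python
import shutil
from datetime import date
from pathlib import Path


def snapshot(site_dir, backup_dir, today=None):
    """Zip site_dir into backup_dir as <site name>-<YYYY-MM-DD>.zip."""
    today = today or date.today()
    backup_root = Path(backup_dir)
    backup_root.mkdir(parents=True, exist_ok=True)
    base_name = backup_root / f"{Path(site_dir).name}-{today.isoformat()}"
    # make_archive appends ".zip" and returns the path of the archive.
    return shutil.make_archive(str(base_name), "zip", root_dir=site_dir)
```

Run it once a day from a scheduled task. Each snapshot is a full copy, so old archives need pruning at some point.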
rsync will only upload the differences. I haven't personally used it, but Mark Pilgrim wrote a long time ago about how it even handles binary diffs brilliantly.
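To give a feel for "only upload the differences", here is a heavily simplified sketch: split both versions of a file into fixed-size blocks, hash each block, and report which blocks of the new version differ. Real rsync is cleverer (it uses a rolling checksum so it can find matching blocks at any offset, not only at the same position), so treat this as an illustration of the idea and not as rsync's algorithm. The block size and function name are arbitrary.

```python
import hashlib


def changed_blocks(old, new, block_size=4096):
    """Indexes of fixed-size blocks in `new` that differ from `old`."""
    def digests(data):
        return [
            hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)
        ]

    old_digests, new_digests = digests(old), digests(new)
    return [
        index
        for index, digest in enumerate(new_digests)
        if index >= len(old_digests) or digest != old_digests[index]
    ]
```

Only the blocks listed would need to be sent, which is why a small edit to a large file transfers quickly.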
svn+rsync sounds like a fantastic solution. I'll have to try that in the future.