File history: in the source or let scm handle it? - version-control

I'm learning mercurial as my solo scm software. With other management software, you can put change comments into the file header through tags. With hg you comment the change set, and that doesn't get into the source. I'm more used to central control like VSS.
Why should I put the file history into the header of the source file? Should I let mercurial manage the history with my changeset comments?

Let the source control system handle it.
If you put change details in the header it will soon become unwieldy and overwhelm the actual code.
Additionally if the scm has the concept of changelists (where many files are grouped into a single change) then you'll be able to write the comment so that it applies to the whole change and not just the edits in the one file (if that makes sense), giving you a clearer picture of why the edit was required.

Yes; let the source control system handle your changeset comments. The rationale for this is that it makes considerably more sense when you're viewing the change log later, trying to work out what's going on between two versions of a file - the source control system can present the change comment to try and enlighten the situation.

There's no reason to manually maintain a file history when SCM software is much better suited to solve this problem. All too often I see partially-completed file histories in the source, which actually hurts, because people incorrectly assume it is accurate.

The difference is not whether it's a centralized or distributed VCS, it's more about what's being changed.
When I moved to .Net, the number of files updated for any individual change seemed to skyrocket. If I had to log the change in each file, I'd never get any real work done. By commenting on the set of changes, it doesn't matter how many files I had to update.
If I ever needed to identify all of the changes for a particular change, I can diff between the two versions of the project.
The biggest difference (and advantage) I saw when switching away from SourceSafe was the switch from file based to project based commits. As soon as I got used to that, I stopped adding change-log type comments to all of my files.
(As a side effect, I've found that my process description comments have gotten better)

I'm not a big proponent of littering the code with change comments. In the event they are needed they can be looked up in the SCM (at least for the SCM variants I have used). If you do want them in the file, consider putting them at the end instead of the beginning. That way you won't have to scroll down past the (uninteresting, to me at least) comments before you get to the actual code.

Another vote for letting the SCM system handle the checkin comments, but I do have one thing to add.
Some systems allow you to use RCS tags in your source code where the SCM can insert the change history directly into the source file being committed automatically. Sounds like a nice balance because the history is then in the SCM system and then automatically put into the source code itself.
The problem is that this process changes the source file. I think that's a bad idea because the file cannot be changed on disk until after you comment is inserted. If you were a good engineer, you should have built and tested changes before the commit. If your source changes after the commit, then you've essentially got a build that could be broken - but most engineers won't build after a commit - why should they?
But it's just a comment you say! True, but I did have a case where there was code in my source file that strangely enough had reason to look like an RCS header tag and that section of the code got replaced on checkin, thereby munging my code. Easy enough to fix, but bad that a build got broken for 20+ users

Much easier to forget to maintain history in the source, as one always (imo) should comment commits to source control system that problem dissappers. Also if changing lots of files before commit, changing history in every file will be annoying work. This is really one of the points with having scm.

I have experience with this. I've had the file history in the comments, it was awful. Nothing but garbage, sometimes you would have to scroll down almost 1k lines of code changes before you finally got to what you wanted. Not to mention, you're slowing down other aspects of your build process by adding more kb to your source code tree.

Related

Version control personally and simply?

Requirement
make history for web text/code source files.
login-worker is only me, i.e personal usage.
automatically save history for each updated files(no require at once but at least once per week)
It must be a simple way to start and work.
I have 3 work places so need to do async files.
(not must but hopefully for future working environment) Any other non-engineer can also understand the location of history file and can see it easily.
Current way:
I made history folder the day, download files in there for edit, copy files when I edit/creat new one.
Advantage of the current way:
Very quick and simple, no need to do additional task to make history
Disadvantage of the current way:
Messy. Whenever day I work, I create a new history folder to keep downloaded files, so that it is messy in Finder(or windows explore).
Also, I don't have a way to Doing Async files for sure with in other places.
I tested to use GIT before, I had Thought GIT automatically save files I edit and save with a editor, but that was not the case. Also GIT is too complicated to use/start. If you recommend GIT, you need to show me ways to deal with the problem I had, for instance, simple GIT GUI with limited options without merging/project/branch etc because of personal usage for maintaining just one website.
Do you know any way to do version control personally and simply?
Thanks.
Suppose you entered <form ...> in your HTML—without the closing tag—and saved the file; do you really think the commit created by our imaginary VCS picked up that file's update event would have any sense?
What I mean, is that as with writing programs¹,
the history of source code changes are there for humans to read,
and for that matter, a good history graph should really read like a prose:
each commit should be atomic in the sense it comprises one (small) but
internally integral feature or fixes a bug, and had to be properly annotated
so that the intent of the change captured by that commit is clear.
What you want instead is just some dumb stream of changes purely for backup purposes.
Well, if you're fully aware of the repercussions (the most glaring one is that the generated history is completely useless for doing development on
the project and can only be used for rollbacks in case of "oopsies"),
there are two ways to go:
Some IDEs (namely, Eclipse) save a backup copy of each file they manage
on each save—thus providing your with such a rollback functionality w/o
using any VCS.
Script around any VCS you like: say, on Linux,
you start something like inotifywait telling it to watch your
project's root directory, recurvively, for write events on files,
read whatever the tool prints to its stdout when these events happen,
and for each event, call to your VCS of choice to record a new commit
with these changes.
¹ «Programs must be written for people to read, and only incidentally for machines to execute.» — Abelson & Sussman, "Structure and Interpretation of Computer Programs", preface to the first edition.
I strongly suggest you to have a deeper look at git.
It may looks difficult at the beginning, but you should spend some time learning it, that's all. All the problems above could be easily solved if you spend some time to learn the basics. There is also a nice "tutorial" on github on how to use git, no need to install anything: https://try.github.io/levels/1/challenges/1.

DBML changes & source control

I'm having some issues with the DBML. Every time the team needs to synchronize changes into SVN, the DBML is changed which generates lots of conflicts. This seems to be related to some rearrangement in the dbml editor, because most of the associationConnector sections in the .dmbl.layout seem to change during development if you open the dbml file in the editor.
Do you have any best practices to avoid these layout rearrangements that can easily take more than one hour to fix?
Best regards,
Gustavo
If it is just the .layout being changed then I would actually ignore the request since the GUID's which link everything up will still match. They just won't be in the same place as I set them, something I could live with. If the .dbml changed as well, then I would accept both without merging.
If you willing to use KDiff3, then you can configure a pre-processor command to sort the dbml file before merging. I've posted a guide on my blog at http://blog.trumpi.co.za/the-one-tip-that-i-wish-i-knew-years-ago-that-merges-easier/. The instructions are for git, but I know that TortoiseSVN can be configured in a similar way to route dbml merging to KDiff3.

How do you "check out" code?

I've never really worked with a lot of people where we had to check out code and have repositories of old code, etc. I'm not sure I even know what these terms mean. If I want to to start a new project that involves more than myself that tracks all the code changes, does "check out" (again, don't know what that means), how do I get started? Is that what SVN is for? Something else? Do I download a program that keeps up with the code?
What do I do?
It will all be in house. No Internet for storing code.
I don't even know if what I am asking for is called source control. I see things about checking out, SVN, source control, and so on. I don't know if it is all talking about the same thing or not. I was hoping to use something open source.
So, a long time ago, in the bad old days of yore, source control used a library metaphor. If you wanted to edit a file, the only way to avoid conflicts was to make sure that you were the ONLY one editing the file. What you'd do is ask the source control system to "check out" that file, indicating that you were editing it and nobody else was allowed to edit it until you made your changes and the file was "checked in". If you needed to make a change to a checked out file, you had to go find that freakin' developer who'd had everythingImportant.conf checked out since last Tuesday..freakin' Bill...
Anyway, source control doesn't really work like that anymore, but the language stuck with us. Nowadays, "checking out" code means downloading a copy of the code from the code repository. The files will appear in a local directory, allowing you to use them, compile the code, and even make changes to the source that you could perhaps upload back to the repository later, should you need to. Even better, with just a single command, you can get all the changes that have been made by other developers since the last time you downloaded the code. Good stuff.
There are several major source control libraries, of which SVN (also called Subversion) is one (CVS, Git, HG, Perforce, ClearCase, etc are others). I recommend starting with SVN, Git, or HG, since they're all free and all have excellent documentation.
You might want to start using source control even if you're the only developer. There's nothing worse than realizing that last night the thousand lines of code you deleted as useless were actually critically important and are now lost forever. Source control allows you to zoom forwards and backwards in the history of your files, letting you easily recover stuff that you should not have removed, and giving you a lot more confidence about deleting useless stuff. Plus, fiddling around with it on your own is good practice.
Being comfortable with source/revision control software is a critical job skill of any serious software engineer. Mastering it will effectively level you up as a professional developer. Coming onto a project and finding that the team keeps all their source in a folder somewhere is an awful experience. Good luck! You're already on the right path just by being interested!
Check out Eric Sink's excellent series of articles:
Source Control HOWTO
I recommend Git and Subversion (SVN) both as free, open-source version control systems that work very well. Git has some nice features given that it can be easier to work decentralized.
Checkout means retrieving a file from a source control system. A source control system is a database (some, like CVS, use just specially marked up text files, but a file system is also a database) that holds all versions of your code (that are checked in after you make modifications).
Microsoft Visual SourceSafe uses a very proprietary database which is prone to corruption if it is not regularly maintained and uses reserved checkouts exclusively. Don't use it, for all those reasons.
The difference between a reserved checkout and an unreserved checkout is in an unreserved checkout; two people can be modifying the same file at once. The first one to check in gets in no problem, and the second one has to update their code to the latest version and merge the changes into theirs (which usually happens automatically, but if the same area of the file was changed, then there is a conflict, which has to be resolved before it can be checked in).
For some arguments for unreserved checkouts, see here.
Following this, you will be looking at a build process that independently checks out the code and builds the source code, so that everyone's changes are built and distributed together.
Are you creating the project that requires source control? If so, choose a source control system that meets your needs, and read the documentation for how to get it set up. If you are simply using a previously set up source control system for an existing project, ask a coworker who has been using it, or ask the person who set up the source control system.
For choosing a source control system that meets your needs, most source control systems have extensive descriptions of their features online, many provide evaluation or even completely free products, and there are many many many anecdotal descriptions of what working with each individual source control system is like, which can help.
Just don't use Microsoft Visual SourceSafe if you value your sanity and your code.

Best Practices for Comments on Code Commit

What template do you use for comments on code commit?
One example of a template is:
(change 1): (source file 1.1, 1.2): (details of change made), (why)
(change 2): (source file 2.1): (details of change made), (why)
Ideally each change should be mapped to an issue in the issue tracker. Is this template alright?
Here are my thoughts... all these will be open to interpretation depending on your particular development methodologies.
You should be committing fairly often, with a single focus for every commit, so based on that, comments should be short and detail what the focus of the commit was.
I'm a fan of posting the what in your comment, with the why and the how being detailed elsewhere (ideally in your bug tracking). The why should be the ticket, and upon closing the ticket, you should have some kind of note about how that particular issue was addressed.
A reference to your bug tracking system is good if it isn't handled otherwise (TRAC/SVN interaction, for example). The reason for this is to point other developers in the right direction if they're looking for more information on the commit.
Don't include specific file names unless the fix really complex and detail is needed. Even still, complex details probably belong in bug tracking with your implementation notes, not in version control. Files edited, diff details, etc, should hopefully be included with version control, and we don't want to spend time duplicating this.
Given these ideas, an example commit comment for me would be something like
Req3845: Updated validation to use the new RegEx validation developed in Req3831.
Short, communicates what was changed, and provides some kind of reference for others to get more info without hunting you down.
I prefix each paragraph with + - * or !
+ means its a new feature
- means feature is removed
* means feature is changed
! means bugfix
I don't think you should commit detailed description about what parts of the code are changed, because that's why every VC has diff :)
If you use a bug tracking system, include relevant ticket numbers.
You do not need to mention changed files, or your name. The source repository can figure that out by itself. Describing the changes also only makes sense if it is not non-trivially obvious from the diff.
Make sure you have a good first line, because this frequently appears in the change history view, and people need to find things by this (the bug tracking ticket number should go there, for example).
Try to commit related changes in a single changeset (and split unrelated changes into two commits, even if to the same file).
I try to follow the same rule as for code comments:
Explain the WHY, not the HOW.
IMO a comment should contain a reference to the issue (task tracker, or requirement). Which files are affected is already available from the version control system. Apart from that, it should be as short as possible, but still readable.
I try to keep my fixes in separate check-ins.
I don't use an actual template, but a mental one, and it's like this.
Issue - dev level summary of
issue.
The issue tracker has all the management details, and the changes/diffs can be reviewed for code changes, so the comment is for dev's to understand the why/what of the issue.
Here's what I've seen used successfully:
Reference to bug number or feature ID
Brief description of the change. What was changed.
Code reviewer (to ensure you have one) unless handled by the checkin system.
Name of tester or description of which tests were run (if late in the process and you are being extra careful)
I use the simple technique described by Chaosben on the JEDI Windows API blog.
In order to get a fast view on the
changes made to a repository, we
suggest to write brief concise
comments starting each line with one
of these chars:
+ if you added a feature/function/…
- if you removed a feature/function/bug/…
# if you changed something
Doing it this way, other developers may find the desired revision much better.
First, commits should solve one single problem (separate commits for logically separate changes). If you don't know what to write in the commit message, or the comit message is too long it might mean that you have multiple independent changes in your commit, and you should split it into smaller items.
I think that commit message conventions expected and used by git makes much sense:
The first line of commit message should be a short description
If appropriate, prefix mentioned above summary line with subsystem prefix, e.g. "docs:" or "contrib:"
In next paragraph or paragraphs describe the change, explaining why's and how's
Keep in mind that if someone needs details of what changed, they can get a diff. That said, I just usually write a sentence or two for each major change, and then lump any minor fixes at the end.
There is no hard and fast rule as its plain english. I try to explain the work done in minimum words possible. Anybody looking for history of changes just want to know what happened in a particular change. If anybody is after more details then its there in the code.
Second thing I follow is if there is any bug associated then stick that in or if its related to any dev task then associate that with the change.
If two files were changed for different reasons, they should be in different commits The only time you should commit more than one code file at a time is because they all belong to the same fix/change

The theory (and terminology) behind Source Control

I've tried using source control for a couple projects but still don't really understand it. For these projects, we've used TortoiseSVN and have only had one line of revisions. (No trunk, branch, or any of that.) If there is a recommended way to set up source control systems, what are they? What are the reasons and benifits for setting it up that way? What is the underlying differences between the workings of a centralized and distributed source control system?
Think of source control as a giant "Undo" button for your source code. Every time you check in, you're adding a point to which you can roll back. Even if you don't use branching/merging, this feature alone can be very valuable.
Additionally, by having one 'authoritative' version of the source control, it becomes much easier to back up.
Centralized vs. distributed... the difference is really that in distributed, there isn't necessarily one 'authoritative' version of the source control, although in practice people usually still do have the master tree.
The big advantage to distributed source control is two-fold:
When you use distributed source control, you have the whole source tree on your local machine. You can commit, create branches, and work pretty much as though you were all alone, and then when you're ready to push up your changes, you can promote them from your machine to the master copy. If you're working "offline" a lot, this can be a huge benefit.
You don't have to ask anybody's permission to become a distributor of the source control. If person A is running the project, but person B and C want to make changes, and share those changes with each other, it becomes much easier with distributed source control.
I recommend checking out the following from Eric Sink:
http://www.ericsink.com/scm/source_control.html
Having some sort of revision control system in place is probably the most important tool a programmer has for reviewing code changes and understanding who did what to whom. Even for single person projects, it is invaluable to be able to diff current code against previous known working version to understand what might have gone wrong due to a change.
Here are two articles that are very helpful for understanding the basics. Beyond being informative, Sink's company sells a great source control product called Vault that is free for single users (I am not affiliated in any way with that company).
http://www.ericsink.com/scm/source_control.html
http://betterexplained.com/articles/a-visual-guide-to-version-control/
Vault info at www.vault.com.
Even if you don't branch, you may find it useful to use tags to mark releases.
Imagine that you rolled out a new version of your software yesterday and have started making major changes for the next version. A user calls you to report a serious bug in yesterday's release. You can't just fix it and copy over the changes from your development trunk because the changes you've just made the whole thing unstable.
If you had tagged the release, you could check out a working copy of it and use it to fix the bug.
Then, you might choose to create a branch at the tag and check the bug fix into it. That way, you can fix more bugs on that release while you continue to upgrade the trunk. You can also merge those fixes into the trunk so that they'll be present in the next release.
The common standard for setting up Subversion is to have three folders under the root of your repository: trunk, branches and tags. The trunk folder holds your current "main" line of development. For many shops and situations, this is all they ever use... just a single working repository of code.
The tags folder takes it one step further and allows you to "checkpoint" your code at certain points in time. For example, when you release a new build or sometimes even when you simply make a new build, you "tag" a copy into this folder. This just allows you to know exactly what your code looked like at that point in time.
The branches folder holds different kinds of branches that you might need in special situations. Sometimes a branch is a place to work on experimental feature or features that might take a long time to get stable (therefore you don't want to introduce them into your main line just yet). Other times, a branch might represent the "production" copy of your code which can be edited and deployed independently from your main line of code which contains changes intended for a future release.
Anyway, this is just one aspect of how to set up your system, but I think giving some thought to this structure is important.