Something that will alert you or force you to commit after editing X number of files, modifying X number of lines of code, or writing X number of lines of code.
Edit:
There's clearly no feasible way for an automated system to determine whether some chunk of code is complete, but this would be good enough for me. I don't want to use this as an "autosave" feature but more as a brain jog to remind me to commit once I'm at a suitable point.
I completely agree with the other answers, but I think one can use this approach if, say, you want to ensure that you make small commits rather than large ones, especially in DVCSs like Git.
I think you can set up a Scheduled Task or cron job that hits your working directory and runs something like:
svn diff | grep -E "^\+ " | wc -l
and if the count is greater than the threshold at which you'd want to commit, it can give you a reminder. I don't think you can integrate such a thing into Eclipse.
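For example, something along these lines could run from cron or a Scheduled Task (a rough sketch that assumes a Subversion working copy with svn on the PATH; the threshold and the printed reminder are placeholders to adapt):

# Rough sketch of the reminder idea: count added lines in the working copy
# and nag above a threshold. Swap the svn command for "git diff" in a Git
# checkout; how you surface the reminder (email, desktop notification) is up to you.
import subprocess

THRESHOLD = 200  # arbitrary "time to commit" line count

def uncommitted_added_lines(path="."):
    diff = subprocess.run(["svn", "diff"], cwd=path,
                          capture_output=True, text=True).stdout
    # Lines starting with a single "+" are additions; "+++" is a file header.
    return sum(1 for line in diff.splitlines()
               if line.startswith("+") and not line.startswith("+++"))

if __name__ == "__main__":
    n = uncommitted_added_lines()
    if n > THRESHOLD:
        print(f"{n} uncommitted added lines -- maybe time to commit?")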
That's not what commits are for. They are not some sort of backup mechanism. You do a commit when a piece of work has reached some state that you want to remember, normally because you are happy with it. It makes no sense at all to do them every X hours or every N lines of code.
Usually it's just a workflow thing and a habit you should get into.
Commits should be related to the work you are doing, so that reverting is meaningful. It's pretty hard for anything other than you to detect that.
I'm working on a new weave-based data structure for storing version control history. This will undoubtedly cause some religious wars about whether it's The Right Way Of Doing Things when it comes out, but that isn't my question right now.
My question has to do with what output blame should give. When a line of code has been added, removed, and merged into itself a number of times, it isn't always clear what revision should get blame for it. Notably, this means that when a section of code is deleted, all record of it having been there is gone, and there is no blame for the removal. Everyone I've gone over this issue with has said that trying to do better simply isn't worth it. Sometimes people put in the hack that the line after the deleted section has its blame changed from whatever it actually was to the revision in which the section got deleted. Presumably, if the section is at the end of the file then the last line gets its blame changed, and if the file winds up empty then the blame really does disappear into the aether, because there's literally nowhere left to put blame information. For various technical reasons I won't be using this hack, but I assume that not following this completely undocumented but de facto standard practice will be uncontroversial (but feel free to flame me and get it out of your system).
Moving on to my actual question. Usually in blame, for each line you look at the complete history of where it was added and removed, and using three-way merge (or, in the case of criss-cross merges, random bullshit) you determine from the relationships between those revisions whether the line should have been there based on its history; if it shouldn't be there but is, you mark it as new with the current revision. In the case where a line occurs in multiple ancestors with different blames, it picks which one to inherit arbitrarily. Again, I assume that continuing with this completely undocumented but de facto standard practice will be uncontroversial.
Where my new system diverges is that rather than doing a complicated calculation over the whole history to decide whether a given line should be in the current revision, it simply looks at the immediate ancestors, and if the line is in any of them it picks an arbitrary one to inherit the blame from. I'm making this change for largely technical reasons (and it's entirely possible that other blame implementations do the same thing, for similar technical reasons and a lack of caring), but after thinking about it a bit, part of me actually prefers the new behavior as being more intuitive and predictable than the old one. What does everybody think?
I actually wrote one of the blame implementations out there (Subversion's current one, I believe, unless someone replaced it in the past year or two). I helped with some others as well.
At least most implementations of blame don't do what you describe:
Usually in blame, for each line you look at the complete history of where it was added and removed, and using three-way merge (or, in the case of criss-cross merges, random bullshit) you determine from the relationships between those revisions whether the line should have been there based on its history; if it shouldn't be there but is, you mark it as new with the current revision. In the case where a line occurs in multiple ancestors with different blames, it picks which one to inherit arbitrarily. Again, I assume that continuing with this completely undocumented but de facto standard practice will be uncontroversial.
Actually, most blame implementations are significantly less complex than this and don't bother trying to use the relationships at all. They just walk the parents in some arbitrary order, using simple delta structures (usually the same internal structure whatever diff algorithm they use produces before it is turned into textual output) to see whether the chunk changed, and if so, blame it and mark that line as done.
For example, Mercurial just does an iterative depth-first search until all lines are blamed. It doesn't try to take into account whether the relationships make it unlikely that it blamed the right revision.
Git does do something a bit more complicated, but still, not quite like you describe.
Subversion does what Mercurial does, but the history graph is very simple, so it's even easier.
So what you are suggesting is, in fact, what all of them really do:
Pick an arbitrary ancestor and follow that path down the rabbit hole until it's done; if that doesn't leave all the lines blamed, arbitrarily pick the next ancestor, and continue until all blame is assigned.
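For what it's worth, here is a minimal, self-contained sketch of that walk (purely illustrative, not any real VCS's code; it matches lines by content, whereas real implementations track line identity through deltas):

# A toy history is just {rev: {"parents": [...], "lines": [...]}}. A line
# inherits blame from the first listed ancestor that also contains it,
# recursively; if no ancestor has it, it is new in this revision.
def blame(rev, revs):
    return [(origin(rev, line, revs), line) for line in revs[rev]["lines"]]

def origin(rev, line, revs):
    for parent in revs[rev]["parents"]:      # arbitrary (listed) order
        if line in revs[parent]["lines"]:
            return origin(parent, line, revs)
    return rev                               # new in this revision

# Tiny example: r3 merges r1 and r2.
revs = {
    "r0": {"parents": [],           "lines": ["a"]},
    "r1": {"parents": ["r0"],       "lines": ["a", "b"]},
    "r2": {"parents": ["r0"],       "lines": ["a", "c"]},
    "r3": {"parents": ["r1", "r2"], "lines": ["a", "b", "c"]},
}
print(blame("r3", revs))   # [('r0', 'a'), ('r1', 'b'), ('r2', 'c')]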
On a personal level, I prefer your simplified option.
Reason: Blame isn't used very much anyway.
So I don't see a point in wasting a lot of time doing a comprehensive implementation of it.
It's true. Blame has largely turned out to be one of those "pot of gold at the end of the rainbow" features. It looked really cool to those of us standing on the ground, dreaming about a day when we could just click on a file and see who wrote which lines of code. But now that it's widely implemented, most of us have come to realize that it actually isn't very helpful. Check the activity on the blame tag here on Stack Overflow. It is underwhelmingly desolate.
I have run across dozens of "blame-worthy" scenarios in recent months alone, and in most cases I have attempted to use blame first, and found it either cumbersome or utterly unhelpful. Instead, I found the information I needed by doing a simple filtered changelog on the file in question. In some cases, I could have found the information using Blame as well, had I been persistent, but it would have taken much longer.
The main problem is code formatting changes. The first-tier blame for almost everything was listed as... me! Why? Because I'm the one responsible for fixing newlines and tabs, re-sorting function order, splitting functions into separate utility modules, fixing comment typos, and improving or simplifying code flow. And if it wasn't me, someone else had done a whitespace change or block move somewhere along the way as well. To get a meaningful blame on anything dating back further than I could already remember without blame's help, I had to roll back revisions and re-blame. And re-blame again. And again.
So for blame to actually be a useful time saver in more than the luckiest of situations, it has to be able to heuristically make its way past newline, whitespace, and ideally block copy/move changes. That sounds like a very tall order, especially since scouring the changelog for a single file usually won't yield that many diffs anyway, and you can just sift through them by hand fairly quickly. (The notable exception being, perhaps, badly engineered source trees where 90% of the code is stuffed into one or two ginormous files... but who in a collaborative coding environment does much of that anymore?)
Conclusion: Give it a bare-bones implementation of blame, because some people like to see "it can blame!" on the features list. And then move on to things that matter. Enjoy!
The line-merge algorithm is stupider than the developer. If they disagree, that just indicates that the merger is wrong rather than indicating a decision point. So, the simplified logic should actually be more correct.
The rebaseif Mercurial extension automates the process, when pulling, of doing a rebase only if the merge can be done automatically with no conflicts. (If there are conflicts to resolve manually, it does not rebase, leaving you ready to do a manual merge of the two branches.) This simplifies and linearizes the history when developers are working in different parts of the code, although any rebase does throw away some information about the state of the world when a developer was doing the work. I tend to agree with arguments like this and this that in the general case, rebasing is not a good idea, but I find the rebase-if philosophy appealing for the non-conflict case. I'm on the fence about it, even though I understand that there are still risks of logic errors when changes happen in different parts of the code (and the author of the rebaseif extension has come to feel it's a bad idea...).
I recently went through a complicated and painful bisect, and I think that having a large number of merges of short branches in our repository was the main reason the bisect did not live up to its implied O(lg n) promise. I found myself needing to run "bisect --extend" many times, to stretch the range beyond the merge, going by a couple of changesets at a time, essentially making bisect O(n). I also found it very complicated to keep track of how the bisect was going and to understand what information I'd gained so far, because I couldn't follow the branching when looking at graphs of the repository.
Are there better ways to use bisect (and to look at and understand the revision history), or am I right that the process would have been smoother if we had used rebaseif more during development? Alternatively, can you help me understand more concretely what can go wrong when rebasing in the non-conflict case: is it likely enough to cause problems that it should be avoided?
I'm tagging this more generally (not just Mercurial) since I think rebaseif matches a more typical Git workflow: Git users may have seen the gotchas.
I think the answer is simple: you have to decide between hard bisects and risky rebasing.
Or, something in between: only rebase if it is very unlikely that the rebase silently breaks things. If a rebase involves only a few changesets, which additionally are semantically distant from the changes they are rebased onto, it's usually safe to rebase.
Here's an example where a conflict-free merge breaks things:
Suppose two branches start from a file with this content:
def foo(a):
# do
# something
# with a (an integer)
...
foo(4)
In branch A, this is changed to:
def foo(a):
# now this function is 10 times faster, but only works with positive integers
assert a > 0
# do
# something with
# with a
...
foo(4)
In branch B, it is changed to:
def foo(a):
# do
# something
# with a (an integer)
...
foo(4)
...
foo(-1) # now we have a use case where we need to call foo with -1
Semantically, both edits conflict with each other. However, Mercurial happily merges them without conflicts (in both cases, when rebasing or when doing a regular merge):
def foo(a):
# now this function is 10 times faster, but only works with positive integers
assert a > 0
# do
# something with
# with a
...
foo(4)
...
foo(-1) # now we have a use case where we need to call foo with -1
The advantage of a merge is that it allows you to understand later what went wrong, so you can fix things accordingly. A rebase might throw away information you need to understand bugs caused by automatic merges.
The main argument against git rebase seems to be a philosophical one around "losing history", but if I really cared about that I'd make the final build step a checkin (or the first build step to track all the failed builds too!).
I'm not particularly familiar with Mercurial or bisecting (except that it's a bit like git), but in my month-and-a-bit with git I exclusively stuck to rebase. I also use git rebase -i --autosquash and git add -p a lot.
IME, there's also not that much difference between a rebase and a merge when it comes to fixing conflicts — the answer you linked to suggests "rebaseif" is bad because the "if" conditions on whether the merge proceeded without conflict, whereas it should be conditioned on whether the codebase builds and tests pass.
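That condition is easy enough to script yourself; here is a rough sketch (assuming a Git checkout, with "make test" standing in for whatever your project's real build-and-test command is, and "origin/main" for the upstream branch):

# Sketch of "rebase only if the result still builds and passes the tests".
# git rebase sets ORIG_HEAD, so a hard reset to it undoes a clean rebase.
import subprocess

def rebase_if_tests_pass(upstream="origin/main", test_cmd=("make", "test")):
    if subprocess.call(["git", "rebase", upstream]) != 0:
        subprocess.call(["git", "rebase", "--abort"])   # conflict: back out
        return False
    if subprocess.call(list(test_cmd)) != 0:
        subprocess.call(["git", "reset", "--hard", "ORIG_HEAD"])  # tests fail: undo
        return False
    return True

if __name__ == "__main__":
    print("rebased" if rebase_if_tests_pass() else "left alone for manual merging")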
Perhaps my thinking is skewed by an inherent weakness in git's design (it doesn't explicitly keep track of the history of a branch, i.e. the subset of commits that it's actually pointed to), or perhaps it's just how I work (check that the diff is sane and that it builds, although admittedly after a rebase I don't check that intermediate commits build).
(Aside: For personal projects I often would like to keep track of each build output and corresponding source snapshot, but I've yet to find anything which is good at doing so.)
Just trying to get diff to work better for certain kinds of documents. With LaTeX, for example, I might have a long paragraph that is strictly just one line, but I don't want to see that entire paragraph if just a sentence is changed. Particularly if I'm running some kind of version control and a co-author edits the same paragraph (but not the same sentence) as me. I wouldn't want that to show up as a conflict.
That's a secondary question. The main question is whether I can use diff to look sentence-by-sentence. Thanks.
Edit
wdiff is almost perfect. But is there a merge equivalent, as diff has with diff3?
wdiff will give you a word-by-word diff instead of line-by-line. I'm not aware of any sentence-by-sentence diff programs.
Preprocess the files before diffing them. Write a script to put one sentence per line, and any line-by-line diff program will work.
I have done this at the C token level for diffing C code, in order to make absolutely sure my CVS merge was correct.
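For the sentence case, here is a crude preprocessing sketch (the regex splitter is deliberately naive and will trip over abbreviations, so treat it only as a starting point):

# Write one sentence per line so an ordinary line-based diff/diff3 can
# compare and merge prose sentence by sentence.
import re
import sys

def one_sentence_per_line(text):
    flowed = re.sub(r"\s+", " ", text).strip()      # undo hard line wrapping
    return re.sub(r"([.!?])\s+", r"\1\n", flowed)   # break after ., ! or ?

if __name__ == "__main__":
    # e.g. python split_sentences.py chapter.tex > chapter.sentences
    with open(sys.argv[1]) as f:
        print(one_sentence_per_line(f.read()))

Run it over each version and diff (or diff3) the outputs; re-flowing the merged result back into wrapped LaTeX is then a separate step.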
I seem to get into an annual debate about the use of the $Log$ keyword. My point of view is this:
$Log$ is white hot death.
All it does is jam marginally relevant spam into your source files. Any information that anyone thinks they might be able to get from a $Log$ is more readily available from (and is likely to be more accurate in) your version control system.
So, here's the question: how would you explain to an "old school" coder (who thinks that $Log$ is the way to manage source code changes) that we have better tools now?
The CVSNT remarks on $Log$ are a good start, but they're just not pointed enough. To date, the best one-liner I've managed to come up with is "$Log$ is a wish. You're hoping that what gets spammed into your file has any relation to what really happened to this file."
PS for clarity: when I say "old school," I mean old in attitude, not old in years. My first programming paycheck (and a remarkably modest one it was, too) was sometime in 1986 and I never thought $Log$ was a good idea.
I think the Subversion FAQ also has a good explanation.
$Log$ is a total horror the moment you start merging changes between branches. You're practically guaranteed to get conflicts there, which -- because of the nature of this keyword -- simply cannot be resolved automatically.
In addition to what the others have said, try putting a comment (/* ... */) into a commit message :->.
The fraction of useful bits in a source file slowly decreases as changes are made to it with that $Log$ statement in it. We had it in some files that came from CVS, and the number of lines of $Log$ output was on the order of 10x the number of lines of executable code actually in the file. It also had a few groups of duplicates caused by bad merging from some branches.
You may consider (emphasis on may) embedding immutable meta-data in your file.
(See the debate between me and an "older schooler": Embedded Version Numbers - Good or Evil?).
Even though I have always considered that practice evil (mixing meta-data into data) and a source of "merge hell", one could argue that it could work, with the right merge manager, for immutable meta-data with a fixed format, like:
$Revision$ $Revision: 9.13 $
$Date$ $Date: 2009/03/06 06:52:26 $
$RCSfile$ $RCSfile: stderr.c,v $
But mutable meta-data like logs? With unknown format or content? That is bound to fail.
One I am aware of is Perl::Critic.
And my googling has turned up nothing on multiple attempts so far. :-(
Does anyone have any recommendations here?
Any resources for configuring Perl::Critic to match our coding standards and running it on our code base would be appreciated.
In terms of setting up a profile, have you tried perlcritic --profile-proto? This will emit to stdout all of your installed policies, with all their options and descriptions of both, including their default values, in perlcriticrc format. Save and edit it to match what you want. Whenever you upgrade Perl::Critic, you may want to run this command again and diff it against your current perlcriticrc so you can see any changes to existing policies and pick up any new ones.
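Once you've edited the prototype down, a minimal perlcriticrc might look something like this (the bracketed policy names are real shipped policies, but which ones you keep, promote, or disable here is purely illustrative):

# Global defaults: run at severity 3, with fairly chatty diagnostics
severity = 3
verbose  = 8

# Treat magic numbers as a top-severity problem in this code base
[ValuesAndExpressions::ProhibitMagicNumbers]
severity = 5

# Disable a policy you disagree with by prefixing its name with "-"
[-ControlStructures::ProhibitPostfixControls]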
In terms of running perlcritic regularly, set up a Test::Perl::Critic test along with the rest of your tests. This is good for new code.
For your existing code, use Test::Perl::Critic::Progressive instead. T::P::C::Progressive will succeed the first time you run it, but it saves counts of the number of violations; thereafter, T::P::C::Progressive will complain if any of the counts go up. One thing to look out for is reverting changes in your source control system. (You are using one, aren't you?) Say I check in a change, run the tests, and my change reduces the number of P::C violations. Later, it turns out my change was bad, so I revert to the old code. The T::P::C::Progressive test will now fail, because the actual counts exceed the lower ones it saved. The easiest thing to do at this point is to just delete the history file (default location t/.perlcritic-history) and run again. It should reproduce your old counts, and you can write new stuff to bring them down again.
Perl::Critic ships with a lot of policies, but there are also a bunch of add-on distributions of policies. Have a look at Task::Perl::Critic and Task::Perl::Critic::IncludingOptionalDependencies.
You don't need to have a single perlcriticrc handle all your code. Create separate perlcriticrc files for each set of files you want to test and then a separate test that points to each one. For an example, have a look at the author tests for P::C itself at http://perlcritic.tigris.org/source/browse/perlcritic/trunk/Perl-Critic/xt/author/. When author tests are run, there's a test that runs over all the code of P::C, a second test that applies additional rules just on the policies, and a third one that criticizes P::C's tests.
I personally think that everyone should run at the "brutal" severity level, but knock out the policies that they don't agree with. Perl::Critic isn't entirely self-compliant; even the P::C developers don't agree with everything Conway says. Look at the perlcriticrc files used on Perl::Critic itself, and search the Perl::Critic code for instances of "## no critic"; I count 143 at present.
(Yes, I'm one of the Perl::Critic developers.)
There is perltidy for most stylistic standards. perlcritic can be easily configured using a .perlcriticrc file. I personally use it at severity level one, but I've disabled a few policies.
In addition to 'automated frameworks', I highly recommend Damian Conway's Perl Best Practices. I don't agree with 100% of what he suggests, but most of the time he's bang on.
The post above mentioning Devel::Prof probably really means Devel::Cover (to get the code coverage of a test suite).
Like:
http://metacpan.org/pod/Perl::Critic
http://www.slideshare.net/joshua.mcadams/an-introduction-to-perl-critic/
Looks like a nice tool!
A nice combination is perlcritic with EPIC for Eclipse - hit CTRL-SHIFT-C (or your preferred configured shortcut) and your code is marked up with warning indicators wherever perlcritic has found something to complain about. Much nicer than remembering to run it before checkin. And as normal with perlcritic, it will pick up your .perlcriticrc so you can customise the rules. We keep our .perlcriticrc in version control so everyone gets the same standards.
In addition to the cosmetic best practices, I always find it useful to run Devel::Prof on my unit test suite to check test coverage.