Is the Mercurial .hgignore my only option for handling hundreds of temp files generated when compiling? - version-control

I've been all over google and SO looking for someone who has asked this question, but am coming up completely empty. I'll apologize in advance for the lengthy round-about way of asking the question. (If I was able to figure out how to encapsulate the problem, maybe I would have been successful in finding an answer.)
How are large projects managed in Mercurial when the act of building / compiling generates hundreds of temporary files in order to create the end result? Is .hgignore the only answer?
Example Scenario:
You have a project that wants to use some open source package for some feature, and needs to compile it from source. So you go get the package, un-.tgz it, and slap it into its own Mercurial repository so you can start tracking changes. Then you make all your changes and run a build.
You test your end result, are happy with the results, and are ready to commit back to your local clone of the repository. So you do an hg status to check your changes prior to committing. The hg status results cause you to immediately start using all those words that would make your mother ashamed, because you now have screens and screens of "build cruft".
For the sake of argument say this package is MySQL or Apache: something that
you don't control and will be changing regularly,
leaves a whole lot of cruft behind in a whole lot of places, and
there is no guarantee the cruft won't change each time you get a new version from the external source.
Now what? The particular project causing this angst is going to be worked on by multiple developers in multiple physical locations, so the workflow needs to be as straightforward as possible. If there is too much involved they're not going to do it, and we'll have a bigger problem on our hands. (Sadly, some old dogs are not keen on learning new tricks...)
One proposed solution was that they would just have to commit everything locally before doing a make, so they have a "clean slate" they would then have to clone from to actually do the build in. That got shot down as (a) too many steps, and (b) not wanting to cruft up the history with a bunch of "time to build now" changesets.
Someone else has proposed that all the cruft just be committed into the Mercurial repository. I am strongly against that because then the next time around those files will turn up as "modified" and therefore be included in the changeset's file list.
We can't possibly be the only people who have run into this problem. So what is the "right" solution? Is our only recourse to try to create a massively intelligent .hgignore file? This makes me uneasy, because if I tell Mercurial to "ignore everything in this directory I haven't already told you about", then what happens when the next applied patch adds files into that ignored directory? (Mercurial will never see the new file, right?)
Hopefully this is not a completely stupid question with an obvious answer. I've compiled things from source many times before, but have never needed to apply version control on top of that. Plus we're new to Mercurial.

Two options:
The best option is to do an out of tree build, if you can. This is a build where you place the object files outside of the source tree. Some build systems, such as CMake, support this directly. For other systems, you need to be lucky since the upstream project must have added support for this in their Makefile or similar.
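For example, with a CMake-based project an out-of-tree build is just a matter of generating into a separate directory (the directory names here are arbitrary):

mkdir build && cd build     # anywhere outside the tracked source tree
cmake ../myproject          # generate the build files out of tree
make                        # objects and binaries land in build/, not in the repo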
A more general option is to tell Mercurial to ignore specific types of files, not entire directories. This works well in my experience.
To test the second option, I wanted to compile Apache. However, it requires APR, so I tested with that instead. After checking in a clean unpacked apr-1.3.8.tar.bz2, I did ./configure; make and looked at the output of hg status. The first few patterns were easy:
syntax: glob
*~
*.o
*.lo
*.la
*.so
.libs/*
The remaining new files look like they are specific files generated by the build process. It's easy to add them too:
% hg status --unknown --no-status >> .hgignore
That also added .hgignore itself, since I hadn't yet scheduled it for addition. After removing that line I ended up with this .hgignore file:
syntax: glob
*~
*.o
*.lo
*.la
*.so
.libs/*
.make.dirs
Makefile
apr-1-config
apr-config.out
apr.exp
apr.pc
build/apr_rules.mk
build/apr_rules.out
build/pkg/pkginfo
config.log
config.nice
config.status
export_vars.c
exports.c
include/apr.h
include/arch/unix/apr_private.h
libtool
test/Makefile
test/internal/Makefile
I consider this quite a robust way to handle the problem in Mercurial, or any other revision control system for that matter.

The best solution would be to fix the build process so that it behaves in a 'nice' manner, namely allowing you to specify a separate directory to store intermediate files in (which could then be completely ignored via a very simple .hgignore entry, or kept outside the version-controlled directory structure altogether).

For what it's worth, I've found that in this situation a smart .hgignore is the only solution that has worked for me so far. With the inclusion of regular expression support, it's very powerful, but tricky, too, since a pattern that is cruft in one directory may well be source in another.
At least you can check in the .hgignore and share it with your developers. That way the work is only done once.
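For instance (a made-up snippet), switching to regexp syntax lets you anchor a pattern to one place so the same name elsewhere is still noticed:

syntax: regexp
# ignore the generated test/Makefile, but still notice a new test2/Makefile
^test/Makefile$
# object files anywhere in the tree
\.o$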
[Edit] It is, however, possible -- as noted above by Martin Geisler -- to have full path specifications in your .hgignore file; you can, therefore, have test/Makefile in the .hgignore and still have Mercurial notice a new test2/Makefile.
His process for creating the file should give you almost what you want, and you can tune it from there.

One option you have is to clean your working directory after verifying a build.
make clean
hg status
Of course you may not want to clean your project if it takes more than a few minutes to build.

If the files you want to track are already known to hg, you can hgignore everything else. Then you need to use hg import to apply patches, rather than the plain patch command (since hg needs to be aware of any new files that should be tracked).
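With hg import the patch goes through Mercurial, so file additions and removals get recorded (the patch file name is hypothetical):

hg import --no-commit ../upstream-fix.patch
hg status        # files created by the patch should show up as A, not ?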

How about a shell (or whatever) script that walks your build directory recursively, finds every file created after your build process started running, and moves all these files (of course, you can specify exceptions) into a cruft_dir subdirectory? Then you can just put cruft_dir/* in .hgignore.
EDIT: I forgot to add, though it is fairly obvious, that this script should run automatically as soon as your build finishes - maybe even called as the last command in your Makefile/ant/whatever file.
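A rough sketch of such a script (the cruft_dir name and the exception list are made up; note that this naive version flattens directory structure when moving):

#!/bin/sh
# stamp the start of the build, run it, then sweep up anything newer
touch .build-started
make
mkdir -p cruft_dir
find . -type f -newer .build-started \
    -not -path './cruft_dir/*' -not -path './.hg/*' \
    -exec mv {} cruft_dir/ \;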

Related

Version control personally and simply?

Requirement
Keep history for web text/code source files.
The only user is me, i.e. personal usage.
Automatically save history for each updated file (not necessarily instantly, but at least once per week).
It must be simple to start and to keep working with.
I have 3 work places, so I need to sync files between them.
(Not a must, but hopefully for a future working environment:) other non-engineers should also be able to understand where the history files are and view them easily.
Current way:
Each day I make a history folder, download files into it for editing, and copy files whenever I edit or create one.
Advantage of the current way:
Very quick and simple; no additional tasks are needed to keep history.
Disadvantage of the current way:
Messy. Every day I work I create a new history folder to keep the downloaded files, so it gets messy in Finder (or Windows Explorer).
Also, I don't have a reliable way to sync files with my other work places.
I tested Git before. I had thought Git would automatically save a history of the files I edit and save with an editor, but that was not the case. Also, Git was too complicated to start using. If you recommend Git, you need to show me ways to deal with the problems I had, for instance a simple Git GUI with limited options, without merging/projects/branches etc., because this is personal usage for maintaining just one website.
Do you know any way to do version control personally and simply?
Thanks.
Suppose you entered <form ...> in your HTML, without the closing tag, and saved the file; do you really think the commit created by our imaginary VCS on picking up that file's update event would make any sense?
What I mean is that, as with writing programs¹,
the history of source code changes is there for humans to read,
and for that matter, a good history graph should read like prose:
each commit should be atomic in the sense that it comprises one (small) but
internally integral feature or fixes one bug, and it should be properly annotated
so that the intent of the change captured by that commit is clear.
What you want instead is just some dumb stream of changes purely for backup purposes.
Well, if you're fully aware of the repercussions (the most glaring one is that the generated history is completely useless for doing development on
the project and can only be used for rollbacks in case of "oopsies"),
there are two ways to go:
Some IDEs (namely, Eclipse) save a backup copy of each file they manage
on each save, thus providing you with such rollback functionality without
using any VCS.
Script around any VCS you like: say, on Linux,
you start something like inotifywait, telling it to watch your
project's root directory, recursively, for write events on files,
read whatever the tool prints to its stdout when these events happen,
and for each event call your VCS of choice to record a new commit
with these changes.
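A bare-bones sketch of that loop, assuming Git as the VCS and the inotify-tools package installed (the project path is hypothetical):

#!/bin/sh
# watch the whole tree for completed writes; exclude the repo's own metadata
inotifywait -m -r -e close_write --exclude '/\.git/' --format '%w%f' ~/project |
while read -r file; do
    cd ~/project &&
    git add -A &&
    git commit -q -m "auto-snapshot: $file"
done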
¹ «Programs must be written for people to read, and only incidentally for machines to execute.» — Abelson & Sussman, "Structure and Interpretation of Computer Programs", preface to the first edition.
I strongly suggest you have a deeper look at Git.
It may look difficult at the beginning, but you just need to spend some time learning it. All the problems above can be solved easily once you learn the basics. There is also a nice interactive tutorial on GitHub on how to use Git, no need to install anything: https://try.github.io/levels/1/challenges/1.
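For the personal-website case above, the basics really are small; something like this (the remote name and address are placeholders) covers daily history plus syncing three work places:

git init                                  # once, inside the website folder
git add -A
git commit -m "first snapshot"
git remote add origin ssh://myserver/website.git   # once; any reachable machine works
git push origin master
# daily routine at any work place:
git pull origin master
# edit files, then:
git add -A && git commit -m "snapshot" && git push origin master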

Starting to version an already medium size project

I am about to start participating in the development of a medium-sized project (~50k lines) that was until now written by a single person, and not versioned; as a result folders are cluttered with different versions of the same file (named file1, file2, file3, etc.).
I proposed to start using a VCS for it (a priori Mercurial, which is the only one I've ever used, for my personal projects, but I'm open to suggestions), so I'm taking any good ideas as to how to "start" the repository. E.g., should I make an initial commit with all the existing files, and immediately make a new commit with the unused files removed? Or something else?
(Constructive remarks on Mercurial vs. Bazaar vs. Git vs. whatever are also welcome.)
Thanks for your tips.
E.g., should I make an initial commit with all the existing files, and immediately make a new commit with the unused files removed?
If the size of the repository is not a concern, then yes, that is a good starting point. Otherwise you can just commit what's actually used, and go from there.
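In Mercurial that starting point is only a few commands (file names and messages are just examples):

cd project
hg init
hg addremove                 # schedule every existing file for addition
hg commit -m "Initial import of existing, unversioned code"
hg rm file1 file2 file3      # now drop the stale copies
hg commit -m "Remove obsolete file versions"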
As for which system, all DVCSes stick to the same core principles. Which one you pick is entirely subjective — the only way to truly know which one you like is to try each one.
I would say use what you are most comfortable with and what meets your needs. As for where to start, I personally would seed the repo with the current source as is; that way you can verify that everything builds and runs as expected. You can make this initial seed a branch, so you can always go back to your starting point before refactoring.
My approach to this was:
create a Mercurial repository in the existing project folder ("existing")
commit all project files to "existing"
create an empty repository in a different location ("new")
As files are tested and QA'd (this was necessary because there was so much dross in "existing"), pull them from "existing" to "new".
Once files had been pulled into "new", delete the corresponding files from "existing". If access was needed to these files while the migration was under way, push them back from "new" to "existing".
This gave me the advantage of putting everything under some sort of control for recovery purposes, plus control over introducing the project to the DVCS. Eventually the "existing" project folder became completely tested and approved for the project moving forward. At that point "existing" could be deleted or changed into a working folder, and "new" became the actual project folder.
I think Mercurial is a good choice. Lightweight, fast, very simple to use and well-integrated with Windows (if that's the platform you're dealing with).
I would probably get rid of all the clutter before the first commit. Delete everything you don't care about, run all the necessary tests and only then do the commit.
Yes, I'm dead set against the 0-day cluttering of repos.
Granted, a 50K SLOC project isn't very big, but if you commit files you already know you won't need, they will make your repo slightly bigger.
Also, remember to check that the tree doesn't contain large binary files. If it does, get rid of them if at all possible.

What strategy for committing AppName.xcodeproj bundles to SVN?

Each time I do a commit in Xcode I notice that the AppName.xcodeproj file/bundle has been modified. The modifications are obviously important although I don't have enough experience with Xcode to understand them.
What strategy should I use for this? Do I simply commit these changes each time? It's no big deal, it's just that it will appear in the SVN history. I'm assuming that I shouldn't add an 'ignore' SVN property for this file/bundle, right?
That project folder contains the metadata for your project, so it certainly needs to be included in source control. There are some user-specific files you can leave out, though. My .gitignore includes these two entries:
*.mode2v3
*.pbxuser
But it won't hurt to leave them in, since they don't affect anything when other users open the project.
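Since the question is actually about SVN, the equivalent of those .gitignore entries is an svn:ignore property on the bundle directory (note this only affects files not already under version control):

svn propset svn:ignore "*.mode2v3
*.pbxuser" AppName.xcodeproj
svn commit -m "Ignore user-specific Xcode files" AppName.xcodeproj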
I have followed this article for every project and it helps me extremely well. You have to commit two files, .gitignore and .gitattributes, first in order for GitX to take effect.
You should also add the build directory to your ignore list, in order not to include temporary binaries in the repository.

Procedures before checking in to source control?

I am starting to get a reputation at work as the "guy who breaks the builds".
The problem is not that I am writing dodgy code, but when it comes to checking my fixes back into source control, it all goes wrong.
I am regularly doing stupid things like:
forgetting to add new files
accidentally checking in code for a half fixed bug along with another bug fix
forgetting to save the files in VS before checking them in
I need to develop some habits / tools to stop this.
What do you regularly do to ensure the code you check in is correct and is what needs to go in?
Edit
I forgot to mention that things can get pretty chaotic in this place. I quite often have two or three things that I'm working on in the same code base at any one time. When I check in, I will only really want to check in one of those things.
A few suggestions:
try to work on one issue at a time. It's easy to make unrelated changes to the codebase that then end up being committed as one big chunk with a poor log message. Git excels here, since you can so easily switch branches, stash changes, and cherry-pick.
run the status command before a commit to see which files you've touched and if you've created new files that need to be added to version control.
run the diff command to see what you've actually changed. Often you find that you've left in some debug logging that should be taken out, or made some unnecessary change that is just cluttering up the diff. Try to make your diffs as small and clean as possible (see the sketch after this list).
make sure your working copy builds with your changes in it
update before checking in and make sure that your working copy builds with other people's changes in it
run whatever smoke test suite you might have to make sure that your changes work correctly
make small and frequent commits. It's a lot easier to figure out what has broken the build when the breaking commit is small.
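For instance, the status/diff habits above look like this in Git (the file name and message are invented):

git status                 # spot new files that still need 'git add'
git add -N src/new_file.c  # hypothetical: track the new file so it shows in the diff
git diff                   # read the change as a reviewer would; prune stray debug logging
git commit -a -m "Fix null deref when the config file is missing"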
Another thing the team can do is set up a continuous integration server, like David M suggested, so that a broken build is discovered as soon as possible, automatically.
I always do a Get Latest first, then build. If the build is good, then I check in my code.
Here is what I have been doing. I have used ClearCase and CVS in the past for source control, and most recently I have been using Subversion and Visual Studio 2008 as my IDE.
Make my code changes and build on the local machine.
Make sure they do, in fact, fix the bug in question.
Run an SVN update on the local machine and repeat steps 1 and 2.
Run through the automated unit tests to verify that they pass.
If an automated smoke test is available, which automatically tests a lot of the system's capabilities, run it. Verify that the results are correct.
Then go to the build machine and run the build script.
If the project's configuration has changed, this could definitely break a build. Perform an SVN update on the build machine, whether the build script does that or not. Open the build machine's copy of the IDE, and do a complete rebuild. This will show you whether the build box has any problems that you have taken care of on your machine but not on the build box.
The suggestions to keep separate branches for each issue are also very good, if you can keep track of all of the issues you are working on.
First, use multiple working copies (a.k.a sandboxes) - one per issue. So, if you've been working on some complex feature for a while, and you need to deal with a quick bug fix on the same project, check out a new clean working copy and do the bug fix there. With independent working copies for each issue there is no confusion about which changes to commit from the working copy to the reposistory.
Second, before committing changes, always perform the following three steps:
Build the software.
Run a smoke test (does it start and run without crashing?).
Inspect the changes you're checking in by diffing your changes against the baseline.
These should be repeated after any merge operations (e.g. after an SVN update).
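With SVN, that routine (plus the post-merge repeat) might look like the following; the smoke-test command is a placeholder:

svn update                     # bring in everyone else's changes first
make                           # 1. build the software
./myapp --selftest             # 2. hypothetical smoke test: does it start and run?
svn status                     # anything forgotten or accidentally touched?
svn diff | less                # 3. inspect the changes against the baseline
svn commit -m "Fix ticket 42: handle empty input"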
At my workplace, the safety net for this is peer review. That is, get someone else to build, run, and reproduce your solution on their machine, on their view.
I cannot recommend this enough. It has caught so many omissions, would-be problems, and other accidental pieces of junk that it has become a valuable part of the process. Not to mention that the mere knowledge that you have to place your work in front of someone else before having it go on to the main branch means that you raise your own quality standards.
In the past I have used branching in ClearCase to help with this issue. The process I used is below. I've never used Source Depot, so I do not know how this can be adapted to work with it.
Create a branch for the bug fix
Code all changes on the branch
Code Review
Merge to stable branch in a different view (the different view is important)
BEFORE checking in: compile, test, and run
Check in code to stable branch
By creating the branch and then merging the changes to a different view (I use Merge Manager to do the merge) any files that were not included or checked in immediately cause issues. This way everything gets tested as it will be when checked in on the stable branch.
The best thing to avoid your problems is to use hooks, which are provided in most SCMs (they certainly exist in SVN and Mercurial, and I believe other advanced SCMs have them too). Attach your unit tests to the hook and make them run every time someone checks code in - exactly before it is checked in. This way you will achieve two things:
code in the SCM repo will always pass the tests,
you won't make the simplest mistakes, because they should be easily detectable if you have a decent test suite.
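In Mercurial, for example, such a hook is a couple of lines of configuration; the test command (make test here) is just an assumption about your build:

# in each developer's .hg/hgrc (or a shared hgrc)
[hooks]
# a non-zero exit status from the command aborts the commit
precommit.tests = make test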
I like having Tortoise plugins for Windows Explorer. The file icons are all badged with committed, modified or not added icons making it very easy to see what status the files are in. I also enable the meta data for Modified so I can sort changed files in the list (Details) view, where they bubble to the top so I can see them.
I bet there is a Tortoise* plugin for your SCM; I saw one for Mercurial and SVN (and CVS, ugh). I really wish Mac OS X's Finder would accept plugins like Tortoise, it's so much easier than having to pop open a dedicated app most of the time.
Get someone else to go through "every" change "before" you check in the code.

TFS - Branching for experimental development: Solution fails to load

Disclaimer: I'm stuck on TFS and I hate it.
My source control structure looks like this:
/dev
/releases
/branches
/experimental-upgrade
I branched from dev to experimental-upgrade and didn't touch it. I then did some more work in dev and merged to experimental-upgrade. Somehow TFS complained that I had changes in both source and target and I had to resolve them. I chose to "Copy item from source branch" for all 5 items.
I check out the experimental-upgrade to a local folder and try to open the main solution file in there. TFS prompts me:
"Projects have recently been added to this solution. Would you like to get them from source control?
If I say yes it does some stuff but ultimately comes back failing to load a handful of the projects. If I say no I get the same result.
Comparing my sln in both branches tells me that they are equal.
Can anyone let me know what I'm doing wrong? This should be a straightforward branch/merge operation...
TIA.
UPDATE:
I noticed that if I click "yes" on the above dialog, the projects are downloaded to the $/ root of source control... (i.e. out of the dev & branches folders)
If I open up the solution in the branch, remove the dead projects, and try to re-add them (by right-clicking the sln, Add Existing Project, choosing the project located in the branch folder), it gives me this error:
Cannot load the project c:\sandbox\my_solution\proj1\proj1.csproj, the file has been removed or deleted. The project path I was trying to add is this: c:\sandbox\my_solution\branches\experimental-upgrade\proj1\proj1.csproj
What in the world is pointing these projects outside of their local root? The solution file is identical to the one in the dev branch, and those projects load just fine. I also looked at the vspscc and vssscc files but didn't find anything.
Ideas?
#Ben
You can actually do a full delete in TFS, but it is strongly discouraged unless you know what you are doing. You have to do it from the command line with the command tf destroy:
tf destroy [/keephistory] itemspec1 [;versionspec]
[itemspec2...itemspecN] [/stopat:versionspec] [/preview]
[/startcleanup] [/noprompt]
Versionspec:
    Date/Time          Dmm/dd/yyyy
                       (or any .NET Framework-supported format,
                       or any of the date formats of the local machine)
    Changeset number   Cnnnnnn
    Label              Llabelname
    Latest version     T
    Workspace          Wworkspacename;workspaceowner
Just before you do this, make sure you try it out with /preview. Also, everybody has their own methodology for branching. Mine is to branch releases and do all development in the development or root folder. Also, it sounded like branching worked fine for you and just the solution file was screwed up, which may be because of a binding issue and the .vssscc file.
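For example, a dry run against the branch in question (the server path is assumed) would be:

tf destroy $/MyProject/branches/experimental-upgrade /preview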
#Nick: No changes have been made to this just yet. I may have to delete it and re-branch (however you really can't fully delete in TFS)
And I have to disagree... branching is absolutely a good practice for experimental changes. Shelving is just temporary storage that will get backed up if I don't want to check in yet. But this needs to be developed while we develop real features.
Without knowing more about your solution setup I can't be sure. But, if you have any project references that could explain it. Because you have the "experimental-upgrade" subfolder under "branches" your relative paths have changed.
This means when VS used to look for your referenced projects in ..\..\project\whatever it now has to look in ..\..\..\project\whatever. Note the extra ..\
To fix this you have to re-add your project references. I haven't found a better way. You can either remove them and re-add them, or go to the properties window and change the path to them, then reload them. Either way, you'll have to redo your references to them from any projects.
Also, check your working folders to make sure that it didn't download any of your projects into the wrong folders. This can happen sometimes...
A couple of things: are the folder structures the same? Can you delete and re-add the project references successfully?
If you create a solution and then manually add all of the projects, does that work? (That may not be feasible - we have solutions with over a hundred projects.)
One other thing (and it may be silly) - after you did the branch, did you commit it? I'm wondering if you branched and didn't check it in, then merged, and then when you tried to check in, TFS was mighty confused.
#Kevin:
This means when VS used to look for your referenced projects in ..\..\project\whatever it now has to look in ..\..\..\project\whatever. Note the extra ..\
You may be on to something here, however it doesn't explain why some projects load and others do not. I haven't found a correlation between them yet.
I think I'll try to re-add the projects and see if that works.
#Cory:
I think that's what I'm going to try... I have about 20 projects and 8 or so aren't loading. The folder structures are identical from the root... i.e. there aren't any references outside of DEV.