Should I put my output files in source control? - version-control

I've been asked to put every single file in my project under source control, including the database file (not the schema, the complete file).
This seems wrong to me, but I can't explain it. Every resource I find about source control tells me not to put generated output files in a source control system. And I understand that: they're not "source" files.
However, I've been presented with the following reasoning:
Who cares? We have plenty of bandwidth.
I don't mind having to resolve a conflict each time I get the latest revision, it's just one click
It's so much more convenient than having to think about good ignore files
Also, if I have to add an external DLL file in the bin folder now, I can't forget to put it in source control, as the bin folder is not being ignored now.
The simple solution for the last bullet-point is to add the file in a libraries folder and reference it from the project.
Please explain if and why putting generated output files under source control is wrong.

You haven't explained what "the database file" is.
I would certainly include 3rd party libraries in source control, as they're necessary for the build and it's good to have a way of reproducing a build at a later time with the library versions you used at that particular moment. But yes, those libraries should be included from a "libraries" folder rather than the output directory.
I wouldn't generally include my own libraries built from the sources elsewhere in the same repository - although I have been in situations where that's been worth doing, where some projects didn't use the "latest and greatest" version of a common library, but just occasionally updated.
The most important practical argument I'd give against including everything, in a world where disk, processor and network are considered free and instantaneous, is that it makes it harder to tell what really changed for any given commit. It's easier to look down a list of 3 source files than 3 source files and 150 binaries from the obj/bin directories.

Generated output files (in general) are "dangerous" in a VCS because:
what you need to version is how to regenerate them: the day you actually need to update them, chances are you won't remember how to do it
they can contain some private generated files which make them work on the committer's desktop, but not on a client's ("works on my machine" TM syndrome)
some generated files are not easily stored as deltas (binaries especially), so they consume lots of space (and the topic of cleaning up that space will come up someday...)
External libraries are not generated directly by your project, and can be put in a VCS, although external repositories like a public Maven repo are better at this kind of management.

Do we also put in compiled object files such as class files, executables, DLLs built from our source? What about when we're doing serious volume testing and that database becomes many gigabytes or terabytes in size?
The clue is in the name: it's a Source Code Management system.
I can understand the simplicity of putting everything in: it's less likely that a developer forgets some important file. But if you're doing regular automated builds then surely that gets picked up anyway?
I think the key phrase is here:
It's so much more convenient than having to think about good ignore files
Are you explicitly forbidden from having good ignore files? My guess is that you are already excluding .exe and .class (or whatever) files. Suppose you did take the trouble to exclude your database; would that be a problem? Why? It's a conscious action that you are choosing to take for the common good. In Eclipse it's a couple of seconds' work to add a new file type to the workspace's CVS ignore rules for all projects.
A rule of "No Ignore Files" is almost self-evidently absurd. Once you have the freedom to have some ignore files, then why not just use them intelligently to exclude the DB? Who is inconvenienced? Only yourself, if anyone, and you're prepared to do the extra work.
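To make the ignore-file argument concrete, here is a minimal sketch in Python of how a handful of patterns separates source files from build output. The patterns, the paths and the .mdf database extension are hypothetical examples, and the matching is simple shell-style globbing, not the exact semantics of any particular VCS's ignore files.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical ignore rules for illustration; adapt them to your project.
IGNORE_PATTERNS = ["bin/*", "obj/*", "*.exe", "*.class", "*.mdf"]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Return True if the path or its file name matches any ignore pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns)


def split_commit(paths):
    """Separate candidate files into (tracked, ignored) lists."""
    tracked = [p for p in paths if not is_ignored(p)]
    ignored = [p for p in paths if is_ignored(p)]
    return tracked, ignored
```

Note that a third-party DLL kept in a libraries folder stays tracked under these rules, while anything under bin or obj is skipped.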

Related

Importing source files and folders into IAR Workbench

I have a couple of source files in a certain folder structure in my file system. I want to use this structure for a project in the IAR Workbench. Thinking of Eclipse, that could be so easy! But in the IAR Workbench, the folders become "Groups", which are only a kind of virtual folder. The Workbench doesn't care about folders.
Is there some easy and fast way to import them?
Up to now I have had to add each group manually and then add the files to the groups, and that's really annoying!
Is there maybe a tool to generate a proper project file (*.ewp) out of a file/folder structure path?
This would help me a lot!
You should have a look at IAR Project/Add Project Connection command.
Although IAR doesn't seem to have any public documentation on the xml syntax, or at least I couldn't find any, you can find Infineon DAVE (Config.xml) and Freescale PE (ProjectInfo.xml) files if you search around. These can be used as examples to figure out the syntax on how to write your own xml files in one of these interfaces, to allow you to specify where all your c, h, assembly and library files are from where ever they may be in your file system. They also allow you to define preprocessor includes for compiler/assembler, and DAVE allows you to define a path variable, which is also very useful.
See: https://mcuoneclipse.com/2013/11/01/iar-arm-v6-7-comes-with-improved-processor-expert-support/
I have modified a DAVE Config.xml file and found it EXTREMELY useful for managing and migrating even just a handful of project files. For example to upgrade to a new release with all files having a new directory root, you just change a single line in the xml file (defining the new root), and all source files, compiler includes etc are all updated to the new level. No more manually editing the preprocessor includes or replacing all the files in the project. And no more fiddling around with ../../ file system hierarchy navigation stuff, you just specify directly (or indirectly via a path to) where the files are, no more relative from where your project happens to be. VERY NICE.
IAR should consider opening this up (documenting) for general users, as it is very useful for project management and migration. While at it they should also consider generalizing the xml syntax a little bit and allow for definition of IAR group heading names, specifying linker file name, and definitely allowing multiple xml files to be included (connected) (so that subprojects can be easily added or removed without affecting the other subproject definition files) and a few basic things like that.
If they were to do a bang-up job on this, they might consider allowing most/all aspects of IAR project configuration that might be required by the subproject to be defined in these xml files, and then entire (sub)projects could just be plopped down anywhere and be up and running extremely quickly (OK, just let me dream a bit :)
For anyone who happens upon this you may want to check out https://github.com/IARSystems/project-migration-tools. They have a tool for pulling in file trees here.

Project files into VCS or not?

In our company we have a discussion about whether to put project files into our Version Control System. What do you think? Consider an Eclipse project file for a C project that contains source and make files and other things. Would you put it into VCS?
If the project files meet the following criteria:
They only contain information for building the source quickly, checkout, commit and the basic routines (for developers)
Parts maybe for release can be separated from internal only (if you are a FOSS project or proprietary, for example)
They don't change anyone's IDE setup or personal preferences
They can be treated like source code for internal-only releases, and may have their own bugs and patches
I don't see a major reason why not. Makefiles/autotools defs usually go in the RCS (autotools inputs at least). Provided the data stored is relevant to everyone and to their machines (build output directories ...), give it a go.
I'd recommend checking them in unless they contain absolute paths (some ancient IDEs like Borland C++ Builder do that), or - like Aiden Bell wrote - they contain IDE setup info.
For example: with Eclipse, .project and .classpath are safe. With Visual Studio, *.csproj and *.sln are safe (whereas *.suo is not).
I'd recommend always checking them in. It won't cost you anything, and sometimes you run into situations where you will be happy to be able to check, e.g., different settings of project files.
If you're using RCS to mean a general revision control system, then, yes, check source and make files in, and in general pretty much anything that you can't easily recreate from what you've got checked in.
If you're using RCS to mean rcs, then please, PLEASE upgrade to something better. SVN would be a good choice, or Git or something like that.

What to put under version control?

Almost any IDE creates lots of files that have nothing to do with the application being developed. They are generated and maintained by the IDE so it knows how to build the application, where the version control repository is, and so on.
Should those files be kept under version control along with the files that really have something to do with the application (source code, application's configuration files, ...)?
The thing is: on some IDEs, if you create a new project and then import it into the version control repository using the version control client/commands embedded in the IDE, all those files are sent to the repository. And I'm not sure that's right: what if two different developers working on the same project want to use two different IDEs?
I want to keep this question agnostic avoiding references to any particular IDE, programming language or version control system. So this question is not exactly the same as these:
SVN and binaries - but this talks about binaries and SVN
Do you keep your build tools in version control? - but this talks about build tools (e.g. putting the jdk under version control)
What project files shouldn’t be checked into SVN - but this talks about SVN and dll's
Do you keep your project files under version control? - very similar (haven't found it before), thanks VonC
Rules of thumb:
Include everything which has an influence on the build result (compiler options, file encodings, ASCII/binary settings, etc.)
Include everything to make it possible to open the project from a clean checkout and being able to compile/run/test/debug/deploy it without any further manual intervention
Don't include files which contain absolute paths
Avoid including personal preferences (tab size, colors, window positions)
Follow the rules in this order.
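The "no absolute paths" rule above is easy to check mechanically before a project file is committed. The following Python sketch is only a heuristic under stated assumptions: the regular expression and the list of Unix root directories are illustrative choices, not a complete definition of an absolute path.

```python
import re

# Rough heuristic: a Windows drive prefix such as C:\ or C:/, or a Unix path
# under a few common root directories. The directory list is an assumption.
ABSOLUTE_PATH = re.compile(
    r"(?<![A-Za-z])[A-Za-z]:[\\/]|(?<![\w.:/])/(?:home|Users|usr|opt|var)/"
)


def lines_with_absolute_paths(text):
    """Return (line_number, line) pairs that appear to contain an absolute path."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if ABSOLUTE_PATH.search(line)
    ]
```

A check like this could run as a pre-commit step over project files; relative paths and URLs are left alone by the lookbehind guards.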
[Update] There is always the question what should happen with generated code. As a rule of thumb, I always put those under version control. As always, take this rule with a grain of salt.
My reasons:
Versioning generated code seems like a waste of time. It's generated right? I can get it back at a push of a button!
Really?
If you had to bite the bullet and generate the exact same version of some previous release without fail, how much effort would it be? When generating code, you not only have to get all the input files right, you also have to turn back time for the code generator itself. Can you do that? Always? As easy as it would be to check out a certain version of the generated code if you had put it under version control?
And even if you could, could you ever be sure that you didn't miss something?
So on one hand, putting generated code under version control makes sense since it makes it dead easy to do what VCS are meant for: Go back in time.
Also it makes it easy to see the differences. Code generators are buggy, too. If I fix a bug and have 150'000 files generated, it helps a lot when I can compare them to the previous version to see that a) the bug is gone and b) nothing else changed unexpectedly. It's the unexpected part which you should worry about. If you don't, let me know and I'll make sure you never work for my company ever :-)
The major pain point of code generators is stability. It won't do if your code generator spits out a random mess of bytes every time you run it (well, unless you don't care about quality). Code generators need to be stable and deterministic. You run them twice with the same input and the output must be identical down to the least significant bit.
So if you can't check in generated code because every run of the generator creates differences that aren't there, then your code generator has a bug. Fix it. Sort the code when you have to. Use hash maps that preserve order. Do everything necessary to make the output non-random. Just like you do everywhere else in your code.
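As an illustration of what "stable and deterministic" means in practice, here is a minimal Python sketch of a toy generator. The function names and the output format are hypothetical; the point is only that entries are emitted in sorted order and nothing run-specific (timestamps, host names) leaks into the output.

```python
import hashlib


def generate_constants(values):
    """Emit one constant per entry, sorted by name so the output is stable."""
    lines = ["# Generated file. Do not edit by hand."]
    for name in sorted(values):
        lines.append(f"{name} = {values[name]!r}")
    return "\n".join(lines) + "\n"


def fingerprint(text):
    """Hash the generated text so two runs can be compared byte for byte."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Feeding the same entries in a different insertion order yields the same text and therefore the same fingerprint, which is the property that keeps diffs of checked-in generated code meaningful.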
Generated code that I might not put under version control would be documentation. Documentation is somewhat of a soft target. It doesn't matter as much when I regenerate the wrong version of the docs (say, it has a few typos more or less). But for releases, I might do that anyway so I can see the differences between releases. Might be useful, for example, to make sure the release notes are complete.
I also don't check in JAR files. As I do have full control over the whole build and full confidence that I can get back any version of the sources in a minute, plus I know that I have everything necessary to build it without any further manual intervention, what would I need the executables for? Again, it might make sense to put them into a special release repo, but then it's better to keep a copy of the last three years on your company's web server to download. Think: Comparing binaries is hard and doesn't tell you much.
I think it's best to put anything under version control that helps developers to get started quickly, ignoring anything that may be auto-generated by an IDE or build tools (e.g. Maven's eclipse plugin generates .project and .classpath - no need to check these in). Especially avoid files that change often, that contain nothing but user preferences, or that conflict between IDEs (e.g. another IDE that uses .project just like eclipse does).
For eclipse users, I find it especially handy to add code style (.settings/org.eclipse.jdt.core.prefs - auto formatting on save turned on) to get consistently formatted code.
Everything that can be automatically generated from the source+configuration files should not be under the version control! It only causes problems and limitations (like the one you stated - using 2 different project files by different programmers).
It's true not only for IDE "junk files" but also for intermediate files (like .pyc in Python, .o in C, etc.).
This is where build automation and build files come in.
For example, you can still build the project (the two developers will need the same build software obviously) but they then could in turn use two different IDE's.
As for the 'junk' that gets generated, I tend to ignore most of it. I know this is meant to be language agnostic, but consider Visual Studio. It generates user files (user settings etc.); these should not be under source control.
On the other hand, project files (used by the build process) most certainly should. I should add that if you are on a team and have all agreed on an IDE, then checking in IDE specific files is fine providing they are global and not user specific and/or not needed.
Those other questions do a good job of explaining what should and shouldn't be checked into source control so I won't repeat them.
In my opinion it depends on the project and environment. In a company environment where everybody is using the same IDE it can make sense to add the IDE files to the repository. This depends a bit on the IDE, though, as some include absolute paths to things.
For a project which is developed in different environments it doesn't make sense and will be a pain in the long run, as the project files aren't maintained by all developers and make it harder to find "relevant" things.
Anything that would be devastating if it were lost, should be under version control.
In my opinion, anything needed to build the project (code, make files, media, databases with required program info, etc.) should be in repositories. I realise that especially for media/database files this is controversial, but to me if you can't branch and then hit build, the source control's not doing its job. This goes double for distributed systems with cheap branch creation/merging.
Anything else? Store it somewhere different. Developers should choose their own working environment as much as possible.
From what I have been looking at with version control, it seems that most things should go into it - e.g. source code and so on. However, the problem that many VCS's run into is when trying to handle large files, typically binaries, and at times things like audio and graphic files. Therefore, my personal way to do it is to put the source code under version control, along with general small sized graphics, and leave any binaries to other systems of management. If it is a binary that I created myself using the build system of the IDE, then that can definitely be ignored, because it is going to be regenerated every build. For dependency libraries, well, this is where dependency package managers come in.
As for IDE generated files (I am assuming these are ones that aren't generated during the build process, such as the solution files for Visual Studio) - well, I think it would depend on whether or not you are working alone. If you are working alone, then go ahead and add them - they will allow you to revert settings in the solution or whatever you make. Same goes for other non-solution-like files as well. However, if you are collaborating, then my recommendation is no - most IDE generated files tend to be, well, user specific - aka they work on your machine, but not necessarily on others. Hence, you may be better off not including IDE generated files in that case.
tl;dr you should put most things that relate to your program into version control, excluding dependencies (things like libraries, graphics and audio should be handled by some other dependency management system). As for things directly generated by the IDE - well, it would depend on if you are working alone or with other people.

Version control of deliverables

We need to regularly synchronize many dozens of binary files (project executables and DLLs) between many developers at several different locations, so that every developer has an up to date environment to build and test at. Due to nature of the project, updates must be done often and on-demand (overnight updates are not sufficient). This is not pretty, but we are stuck with it for a time.
We settled on using a regular version (source) control system: put everything into it as binary files, get-latest before testing and check-in updated DLL after testing.
It works fine, but a version control client has a lot of features which don't make sense for us and people occasionally get confused.
Are there any tools better suited for the task? Or may be a completely different approach?
Update:
I need to clarify that it's not a tightly integrated project - more like an extensible system with a heap of "plugins", including third-party ones. We need to make sure those modules-plugins work nicely with recent versions of each other and the core. A centralised build, as was suggested, was considered initially, but it's not an option.
I'd probably take a look at rsync.
Just create a .CMD file that contains the call to rsync with all the correct parameters and let people call that. rsync is very smart in deciding what part of files need to be transferred, so it'll be very fast even when large files are involved.
What rsync doesn't do though is conflict resolution (or even detection), but in the scenario you described it's more like reading from a central place which is what rsync is designed to handle.
Another option is unison
You should look into continuous integration and having some kind of centralised build process. I can only imagine the kind of hell you're going through with your current approach.
Obviously that doesn't help with the keeping your local files in sync, but I think you have bigger problems with your process.
Building the project should be a centralized process to allow for better control; otherwise your solution will be chaos in the long run. Anyway, here is what I'd do.
Create the usual repositories for source files, resources, documentation, etc. for each project.
Create a repository for resources. It will hold the latest binary versions for each project as well as any required resources, files, etc. Keep a good folder structure for each project so developers can "reference" the files directly.
Create a repository for final builds, which will hold the actual stable release. It will get the stable files, built automatically (if possible) from the checked-in sources. It will hold the real product, the real version for integration testing and so on.
While far from perfect, this lets you define well-established protocols: check in your latest DLL here, generate the "real" version from the latest source there.
What about embedding a 'what' string in the executables and libraries? Then you can synchronise the desired list of versions with a manifest.
We tend to use CVS id strings as a part of the what string.
const char cvsid[] = "@(#)INETOPS_filter_ip_$Revision: 1.9 $";
Entering the command
what filter_ip | grep INETOPS
returns
INETOPS_filter_ip_$Revision: 1.9 $
We do this for all deliverables so we can see if the versions in a bundle of libraries and executables match the list in an associated manifest.
HTH.
cheers,
Rob
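For teams without the what utility at hand, the manifest check described above can be approximated in a few lines of Python. This is a sketch under assumptions: it relies on the standard what convention of the marker @(#) followed by text up to the first ", >, newline, backslash or null character, and the function names and manifest layout are hypothetical.

```python
import re

# The `what` convention: an identification string starts after "@(#)" and ends
# at the first double quote, ">", newline, backslash or null byte.
WHAT_MARKER = re.compile(rb'@\(#\)([^\x00\n"\\>]*)')


def what_strings(data):
    """Return the identification strings embedded after each @(#) marker."""
    return [
        match.group(1).decode("ascii", errors="replace")
        for match in WHAT_MARKER.finditer(data)
    ]


def mismatches(binaries, manifest):
    """List deliverables whose embedded strings lack the manifest entry.

    binaries maps a deliverable name to its raw bytes; manifest maps the
    same name to the expected identification string.
    """
    return sorted(
        name
        for name, expected in manifest.items()
        if expected not in what_strings(binaries.get(name, b""))
    )
```

In use, each deliverable would be read with open(path, "rb").read() and compared against the manifest; an empty result means the bundle matches.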
Subversion handles binary files really well, is pretty fast, and scriptable. VisualSVN and TortoiseSVN make dealing with Subversion very easy too.
You could set up a folder that's checked out from Subversion with all your binary files (that all developers can push and update to) then just type "svn update" at the command line, or use TortoiseSVN: right click on the folder, click "SVN Update" and it'll update all the files and tell you what's changed.

Do you version "derived" files?

Using online interfaces to a version control system is a nice way to have a published location for the most recent versions of code. For example, I have a LaTeX package here (which is released to CTAN whenever changes are verified to actually work):
http://github.com/wspr/pstool/tree/master
The package itself is derived from a single file (in this case, pstool.tex) which, when processed, produces the documentation, the readme, the installer file, and the actual files that make up the package as it is used by LaTeX.
In order to make it easy for users who want to download this stuff, I include all of the derived files mentioned above in the repository itself as well as the master file pstool.tex. This means that I'll have double the number of changes every time I commit because the package file pstool.sty is a generated subset of the master file.
Is this a perversion of version control?
@Jon Limjap raised a good point:
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
That's really the crux of the matter in this case. Yes, released versions of the package can be obtained from elsewhere. So it does really make more sense to only version the non-generated files.
On the other hand, @Madir's comment that:
the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes
is also rather pertinent in that if a user finds a bug and I fix it immediately, they can then head over to the repository and grab the file that's necessary for them to continue working without having to run any "installation" steps.
And this, I think, is the more important use case for my particular set of projects.
We don't version files that can be automatically generated using scripts included in the repository itself. The reason for this is that after a checkout, these files can be rebuilt with a single click or command. In our projects we always try to make this as easy as possible, thus removing the need for versioning these files.
One scenario I can imagine where this could be useful is 'tagging' specific releases of a product, for use in a production environment (or any non-development environment) where tools required for generating the output might not be available.
We also use targets in our build scripts that can create and upload archives with a released version of our products. This can be uploaded to a production server, or an HTTP server for downloading by users of your products.
I am using Tortoise SVN for small system ASP.NET development. Most code is interpreted ASPX, but there are around a dozen binary DLLs generated by a manual compile step. Whilst it doesn't make a lot of sense to have these source-code versioned in theory, it certainly makes it convenient to ensure they are correctly mirrored from the development environment onto the production system (one click). Also - in case of disaster - the rollback to the previous step is again one click in SVN.
So I bit the bullet and included them in the SVN archive - the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes.
Not necessarily, although best practices for source control advise that you do not include generated files, for obvious reasons.
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
Normally, derived files should not be stored in version control. In your case, you could build a release procedure that created a tarball that includes the derived files.
As you say, keeping the derived files in version control only increases the amount of noise you have to deal with.
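A release procedure of the kind suggested above can be quite small. The Python sketch below assumes the derived files have already been generated by the build step; the function name and the file names in the usage note are illustrative only.

```python
import tarfile
from pathlib import Path


def build_release(source_dir, files, archive_path):
    """Pack the listed files (master and derived) into a gzipped tarball.

    files are paths relative to source_dir. Returns the member names as
    stored in the archive, so the caller can verify the contents.
    """
    root = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        for relative in sorted(files):
            archive.add(root / relative, arcname=relative)
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

For the package in the question, a call might look like build_release(".", ["pstool.tex", "pstool.sty", "README"], "pstool-release.tar.gz"), with only pstool.tex kept in the repository and the tarball published for download.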
In some cases we do, but it's more of a sysadmin type of use case, where the generated files (say, DNS zone files built from a script) have intrinsic interest in their own right, and the revision control is more linear audit trail than branching-and-tagging source control.