Should I store generated code in source control

Should I store generated code in source control - version-control

This is a debate I'm taking a part in. I would like to get more opinions and points of view.
We have some classes that are generated in build time to handle DB operations (in This specific case, with SubSonic, but I don't think it is very important for the question). The generation is set as a pre-build step in Visual Studio. So every time a developer (or the official build process) runs a build, these classes are generated, and then compiled into the project.
Now some people are claiming, that having these classes saved in source control could cause confusion, in case the code you get, doesn't match what would have been generated in your own environment.
I would like to have a way to trace back the history of the code, even if it is usually treated as a black box.
Any arguments or counter arguments?
UPDATE: I asked this question since I really believed there is one definitive answer. Looking at all the responses, I could say with high level of certainty, that there is no such answer. The decision should be made based on more than one parameter. Reading the answers below could provide a very good guideline to the types of questions you should be asking yourself when having to decide on this issue.
I won't select an accepted answer at this point for the reasons mentioned above.

Saving it in source control is more trouble than it's worth.
You have to do a commit every time you do a build for it to be any value.
Generally we leave generated code( idl, jaxb stuff, etc) outside source control where I work and it's never been a problem

Put it in source code control. The advantage of having the history of everything you write available for future developers outweighs the minor pain of occasionally rebuilding after a sync.

Every time I want to show changes to a source tree on my own personal repo, all the 'generated files' will show up as having changed and need comitting.
I would prefer to have a cleaner list of modifications that only include real updates that were performed, and not auto-generated changes.
Leave them out, and then after a build, add an 'ignore' on each of the generated files.

Look at it this way: do you check your object files into source control? Generated source files are build artifacts just like object files, libraries and executables. They should be treated the same. Most would argue that you shouldn't be checking generated object files and executables into source control. The same arguments apply to generated source.
If you need to look at the historical version of a generated file you can sync to the historical version of its sources and rebuild.
Checking generated files of any sort into source control is analogous to database denormalization. There are occasionally reasons to do this (typically for performance), but this should be done only with great care as it becomes much harder to maintain correctness and consistency once the data is denormalized.

I would say that you should avoid adding any generated code (or other artifacts) to source control. If the generated code is the same for the given input then you could just check out the versions you want to diff and generate the code for comparison.

I call the DRY principle. If you already have the "source files" in the repository which are used to generate these code files at build time, there is no need to have the same code committed "twice".
Also, you might avert some problems this way if for example the code generation breaks someday.

No, for three reasons.
Source code is everything necessary and sufficient to reproduce a snapshot of your application as of some current or previous point in time - nothing more and nothing less. Part of what this implies is that someone is responsible for everything checked in. Generally I'm happy to be responsible for the code I write, but not the code that's generated as a consequence of what I write.
I don't want someone to be tempted to try to shortcut a build from primary sources by using intermediate code that may or may not be current (and more importantly that I don't want to accept responsibility for.) And't it's too tempting for some people to get caught up in a meaningless process about debugging conflicts in intermediate code based on partial builds.
Once it's in source control, I accept responsibility for a. it being there, b. it being current, and c. it being reliably integratable with everything else in there. That includes removing it when I'm no longer using it. The less of that responsibility the better.

I really don't think you should check them in.
Surely any change in the generated code is either going to be noise - changes between environments, or changes as a result of something else - e.g. a change in your DB. If your DB's creation scripts (or any other dependencies) are in source control then why do you need the generated scripts as well?

The general rule is no, but if it takes time to generate the code (because of DB access, web services, etc.) then you might want to save a cached version in the source control and save everyone the pain.
Your tooling also need to be aware of this and handle checking-out from the source control when needed, too many tools decide to check out from the source control without any reason.
A good tool will use the cached version without touching it (nor modifying the time steps on the file).
Also you need to put big warning inside the generated code for people to not modify the file, a warning at the top is not enough, you have to repeat it every dozen lines.

We don't store generated DB code either: since it is generated, you can get it at will at any given version from the source files. Storing it would be like storing bytecode or such.
Now, you need to ensure the code generator used at a given version is available! Newer versions can generate different code...

There is a special case where you want to check in your generated files: when you may need to build on systems where tools used to generate the other files aren't available. The classic example of this, and one I work with, is Lex and Yacc code. Because we develop a runtime system that has to build and run on a huge variety of platforms and architectures, we can only rely on target systems to have C and C++ compilers, not the tools necessary to generate the lexing/parsing code for our interface definition translator. Thus, when we change our grammars, we check in the generated code to parse it.

Leave it out.
If you're checking in generated files you're doing something wrong. What's wrong may differ, it could be that your build process is inefficient, or something else, but I can't see it ever being a good idea. History should be associated with the source files, not the generated ones.
It just creates a headache for people who then end up trying to resolve differences, find the files that are no longer generated by the build and then delete them, etc.
A world of pain awaits those who check in generated files!

In some projects I add generated code to source control, but it really depends. My basic guideline is if the generated code is an intrinsic part of the compiler then I won't add it. If the generated code is from an external tool, such as SubSonic in this case, then I would add if to source control. If you periodically upgrade the component then I want to know the changes in the generated source in case bugs or issues arise.
As far as generated code needing to be checked in, a worst case scenario is manually differencing the files and reverting the files if necessary. If you are using svn, you can add a pre-commit hook in svn to deny a commit if the file hasn't really changed.

arriving a bit late ... anyway ...
Would you put compiler's intermediate file into source version control ?
In case of code generation, by definition the source code is the input of the generator while the generated code can be considered as intermediate files between the "real" source and the built application.
So I would say: don't put generated code under version control, but the generator and its input.
Concretely, I work with a code generator I wrote: I never had to maintain the generated source code under version control. I would even say that since the generator reached a certain maturity level, I didn't have to observe the contents of generated code although the input (for instance model description) changed.

The job of configuration management (of which version control is just one part) is to be able to do the following:
Know which changes and bug fixes have gone into every delivered build.
Be able to reproduce exactly any delivered build, starting from the original source code. Automatically generated code does not count as "source code" regardless of the language.
The first one ensures that when you tell the client or end user "the bug you reported last week is fixed and the new feature has been added" they don't come back two hours later and say "no it hasn't". It also makes sure they don't say "Why is it doing X? We never asked for X".
The second one means that when the client or end user reports a bug in some version you issued a year ago you can go back to that version, reproduce the bug, fix it, and prove that it was your fix has eliminated the bug rather than some perturbation of compiler and other fixes.
This means that your compiler, libraries etc also need to be part of CM.
So now to answer your question: if you can do all the above then you don't need to record any intermediate representations, because you are guaranteed to get the same answer anyway. If you can't do all the above then all bets are off because you can never guarantee to do the same thing twice and get the same answer. So you might as well put all your .o files under version control as well.

There are good arguments both for and against presented here.
For the record, I build the T4 generation system in Visual Studio and our default out-of-the-box option causes generated code to be checked in. You have to work a bit harder if you prefer not to check in.
For me the key consideration is diffing the generated output when either the input or generator itself is updated.
If you don't have your output checked in, then you have to take a copy of all generated code before upgrading a generator or modifying input in order to be able to compare that with the output from the new version. I think this is a fairly tedious process, but with checked in output, it's a simple matter of diffing the new output against the repository.
At this point, it is reasonable to ask "Why do you care about changes in generated code?" (Especially as compared to object code.)
I believe there are a few key reasons, which come down to the current state of the art rather than any inherent problem.
You craft handwritten code that meshes tightly with generated code. That's not the case on the whole with obj files these days. When the generated code changes, it's sadly quite often the case that some handwritten code needs to change to match. Folks often don't observe a high degree of backwards compatibility with extensibility points in generated code.
Generated code simply changes its behavior. You wouldn't tolerate this from a compiler, but in fairness, an application-level code generator is targeting a different field of problem with a wider range of acceptable solutions. It's important to see if assumptions you made about previous behavior are now broken.
You just don't 100% trust the output of your generator from release to release. There's a lot of value to be had from generator tools even if they aren't built and maintained with the rigor of your compiler vendor. Release 1.0 might have been perfectly stable for your application but maybe 1.1 has a few glitches for your use case now. Alternatively you change input values and find that you are exercisig a new piece of the generator that you hadn't used before - potentially you get surprised by the results.
Essentially all of these things come down to tool maturity - most business app code generators aren't close to the level that compilers or even lex/yacc-level tools have been for years.

Both side have valid and reasonable argument, and it's difficult to agree on something common. Version Control Systems (VCSs) tracks the files
developers put into it, and have the assumption that the files inside VCS are hand crafted by developers, and developers are interested in the history
and change between any revision of the files. This assumption equalize the two concepts, "I want to get this file when I do checkout." and "I am
interested in the change of this file."
Now, the arguments from both sides could be rephrase like this:
"I want to get all these generated files when I do checkout, because I don't have the tool to generate them in this machine."
"I should not put them into VCS, since I am not interested in the change of this file."
Fortunately, it seems that the two requirements are not conflicting fundamentally. With some extension of current VCSs, it should be possible to have
both. In other words, it's a false dilemma. If we ponder a while, it's not hard to realize that the problem stems from the assumption VCSs hold. VCSs
should distinguish the files, which are hand crafted by developers, from files which are not hand crafted by developers, but just happens to be inside
this VCS. For the first category of files, which we call source files (code) usually, VCSs have done great job now. For the latter category, VCSs have
not had such concept yet, as far as I know.
Summary
I will take git as one example to illustrate what I mean.
git status should not show generated files by default.
git commit should include generated files as snapshot.
git diff should not show generated files by default.
PS
Git hooks could be used as a workaround, but it would be great if git supports it natively. gitignore doesn't meet our requirement, for ignored
files won't go into VCSs.enter code here

I would argue for. If you're using a continuous integration process that checks out the code, modifies the build number, builds the software and then tests it, then it's simpler and easier to just have that code as part of your repository.
Additionally, it's part and parcel of every "snapshot" that you take of your software repository. If it's part of the software, then it should be part of the repository.

It really depends. Ultimately, the goal is to be able to reproduce what you had if need be. If you are able to regenerate your binaries exactly, there is no need to store them. but you need to remember that in order to recreate your stuff you will probably need your exact configuration you did it with in the first place, and that not only means your source code, but also your build environment, your IDE, maybe even other libraries, generators or stuff, in the exact configuration (versions) you have used.
I have run into trouble in projects were we upgraded our build environment to newer versions or even to another vendors', where we were unable to recreate the exact binaries we had before. This is a real pain when the binaries to be deplyed depend on a kind of hash, especially in secured environment, and the recreated files somehow differ because of compiler upgrades or whatever.
So, would you store generated code: I would say no. The binaries or deliverables that are released, including the tools that you reproduced them with I would store. And then, there is no need to store them in source control, just make a good backup of those files.

I (regretfully) wind up putting a lot of derived sources under source control because I work remotely with people who either can't be bothered to set up a proper build environment or who don't have the skills to set it up so that the derived sources are built exactly right. (And when it comes to Gnu autotools, I am one of those people myself! I can't work with three different systems each of which works with a different version of autotools—and only that version.)
This sort of difficulty probably applies more to part-time, volunteer, open-source projects than to paid projects where the person paying the bills can insist on a uniform build environment.
When you do this, you're basically committing to building the derived files only at one site, or only at properly configured sites. Your Makefiles (or whatever) should be set up to notice where they are running and should refuse to re-derive sources unless they know they are running at a safe build site.

The correct answer is "It Depends". It depends upon what the client's needs are.
If you can roll back code to a particular release and stand up to any external audit's without it, then you're still not on firm ground. As dev's we need to consider not just 'noise', pain and disk space, but the fact that we are tasked with the role of generating intellectual property and there may be legal ramifications. Would you be able to prove to a judge that you're able to regenerate a web site exactly the way a customer saw it two years ago?
I'm not suggesting you save or don't save gen'd files, whichever way you decide if you're not involving the Subject Matter Experts of the decision you're probably wrong.
My two cents.

I would say that yes you want to put it under source control. From a configuration management standpoint EVERYTHING that is used to produce a software build needs to be controlled so that it can be recreated. I understand that generated code can easily be recreated, but an argument can be made that it is not the same since the date/timestamps will be different between the two builds. In some areas such as government, they require a lot of times this is what's done.

In general, generated code need not be stored in source control because the revision history of this code can be traced by the revision history of the code that generated it!
However, it sounds the OP is using the generated code as the data access layer of the application instead of manually writing one. In this case, I would change the build process, and commit the code to source control because it is a critical component of the runtime code. This also removes the dependency on the code generation tool from the build process in case the developers need to use different version of the tool for different branches.
It seems that the code only needs to be generated once instead of every build. When a developer needs to add/remove/change the way an object accesses the database, the code should be generated again, just like making manual modifications. This speeds up the build process, allows manual optimizations to be made to the data access layer, and history of the data access layer is retained in a simple manner.

If it is part of the source code then it should be put in source control regardless of who or what generates it. You want your source control to reflect the current state of your system without having to regenerate it.

Absolutely have the generated code in source control, for many reasons. I'm reiterating what a lot of people have already said, but some reasons I'd do it are
With codefiles in source control, you'll potentially be able to compile the code without using your Visual Studio pre-build step.
When you're doing a full comparison between two versions, it would be nice to know if the generated code changed between those two tags, without having to manually check it.
If the code generator itself changes, then you'll want to make sure that the changes to the generated code changes appropriately. i.e. If your generator changes, but the output isn't supposed to change, then when you go to commit your code, there will be no differences between what was previously generated and what's in the generated code now.

I would leave generated files out of a source tree, but put it in a separate build tree.
e.g. workflow is
checkin/out/modify/merge source normally (w/o any generated files)
At appropriate occasions, check out source tree into a clean build tree
After a build, checkin all "important" files ("real" source files, executables + generated source file) that must be present for auditing/regulatory purposes. This gives you a history of all appropriate generated code+executables+whatever, at time increments that are related to releases / testing snapshots, etc. and decoupled from day-to-day development.
There's probably good ways in Subversion/Mercurial/Git/etc to tie the history of the real source files in both places together.

Looks like there are very strong and convincing opinions on both sides. I would recommend reading all the top voted answers, and then deciding what arguments apply to your specific case.
UPDATE: I asked this question since I really believed there is one definitive answer. Looking at all the responses, I could say with high level of certainty, that there is no such answer. The decision should be made based on more than one parameter. Reading the other answers could provide a very good guideline to the types of questions you should be asking yourself when having to decide on this issue.

Related

Automate build and developement pattern with VisualStudio

I'm currently working on a project that's been going on for several years straight. The development-team is small (less than 5 programmers), and source-control is virtually non-existent, and the deployment-process as is is just based on manually moving files from one server to another. The project is in classic ASP, so building isn't an issue, as both deployment and testing is just about getting the files to where they need to be and directing the browser at the correct location. Currently all development is done on a network-drive which is also the test-server. The test-server is only available when inside the the local network (can be accessed trough vpn), and is available on the address 'site.test' in the browser (requires editing to the hosts-file on all the clients, but since there are so few of us that hasn't proven to be any problem at all). All development is done in visual studio. Whenever a file is change the developer that changed the file is required to write the file he changed into a word-document and include a small description as of what was changed and why. Then, whenever there's supposed to be a version-bump (deployment), our lead-developer goes trough the word-document and copies every file (file by file) that has changed over to the production-server. Now, I don't think I need to tell you that this method is very error prone (a developer might for instance forget to add that he changed some dependency, and that might cause problems when deployed), and there's a lot of work involved with deploying.
And here comes the main question. I've been asked by the lead developer to use some time and see if I can come up with a simple solution that can simplify and automate the "version-control" and the deployment. Now, the important thing is that it's as easy as posible to use for the developers. Two of the existing developers have worked with computers for a long time, and are pretty stuck up in their routines, so for instance changing it into something like git bash wouldn't work at all. Don't get me wrong, I love git, but the first time one of them got a merge-conflict they wouldn't know what to do at all. Also, it would be ideal to change to a more distributed development-process where the developers wouldn't need to be logged into vpn (or need internet at all) to develop, and the changes they made offline could be synced up when they were done with them. Now, I've looked at Teem Development Server from Microsoft because of it's strong integration with Visual Studio. As far as I've tested it seems possible to make Visual Studio prompt the user if they want to check in changes whenever the user closes Visual Studio. Now, using TFS for source-control would probably eliminate most of the problems with the development, but how about deployment? Not to mention versioning? As far as I've understood (I've only looked briefly at TFS), TFS has a running number for every check-in, but is it possible to tell TFS that this check-in should be version 2.0.1 of the system (for example), and then have it deploy it to the web-server? And another problem, the whole solution consists of about 10 directories with hundreds of files in, though the system itself (without images and such) is only 5 directories, and only these 5 should be deployed to the server, is this possible to automate?
I know there's a lot of questions here, but what is most important is that I want to automate the development process (not the coding, but the managing of the code), and the deployment process, and I want to make it as simple as possible to use. I don't care if the setup is a bit of work, cause I got enough time at hand to setup whatever system that fits our needs, but the other devs should not have to do a lot of setup. If all of the machines that should use the system needs to be setup once, that's no problem at all, cause I can do that, but there shouldn't bee any need to do config and setups as we go.
Now, do any of you have any suggestions to what systems to use/how to use them, in order to simplify the described processes above? I've worked with several types of scm-systems before (GIT, HG and SubVersion), but I don't have any experience with build-systems at all (if that is needed). Articles, and discussion on how to efficiently setup systems like this would be greatly appreciated. In advance, thanks.

This is pretty subjective territory, but I think you need to get some easy wins first. The developers who are "stuck up in there ways" are the main roadblock here. They are going to see change as disruptive and not worth it. You need to slowly and carefully go for the easy wins.
First, TFS is probably not going to be a good choice. It's expensive, heavy, and the source control in TFS is pretty lousy. Go for Subversion: it's easy to setup and easy to use, and it's free. Get that in place first, and get the devs using it. Much easier said than done.
Later (possibly much later), once the devs are using it and couldn't imagine life without a VCS, then you could switch to Hg or Git if you need first class branching and all those other nice features.
Once you have Subversion in place, you can use something like JetBrains TeamCity or Jenkins, both of which are free and easy to use. However, I'm just assuming you don't have a lot of tests and build scripts that the CI server is really going to be running, so it's far more important that you get VCS first. In all things: keep it as simple as possible. Baby steps. Get some wins, build trust, repeat.

I can't even begin to think where to begin with this! Intending no offense directed at you, apart from the mention of git and HG, this post could have been written 10 years ago.
1) Source control - How can a team of developers possibly work effectively without some form of source control? Hell, even if it's Visual Source Safe (* shudder *) at least it would be something. You have to insist that the team implement source control. You know what's available so I won't get into preaching about that. (However, Subversion with TortoiseSVN has worked quite well for me.)
2)
"write the file he changed into a
word-document and include a small
description as of what was changed and
why"
You have got to be kidding... What happens if two developers change the same file? Does the lead then have to manually merge two changes that s/he extracts from the word doc? Please see #1 and explain to them how commit comments work.
Since your don't really need to "build" (i.e. compiled, etc.) anything, you should be able to solve most of your problems with some simple tools. First and foremost you need to use a source control solution. Yes, the developers would have to learn how to use another tool (EEEK!). You could do the initial leg work of getting the code into the repository. If you have file access to the other developers machines, you could even copy a checked-out working copy to their machines so they wouldn't have to do the checkout themselves (not really that hard). You could then use all the creamy goodness of version control to create version branches when each deployment needs to be done. You could write simple scripts using the command line SVN tools to check out said branches and automatically copy the files to the target server(s). Using a tool like BeyondCompare, the copy process could be restricted to only the files that are different (plus BC can handle an FTP target if that is an issue). By enforcing commit comments on the SVN repo, you'll guarantee that the developers provide comments, and for each set of changes between releases you could very easily generate a list of all those comments using the CSM log retrieval features.

Should I put included code under SCM?

I'm developing a web app.
If I include a jQuery plugin (or the jQuery file itself), this has to be put under my static directory, which is under SCM, to be served correctly.
Should I gitignore it, or add it, even if I don't plan on modifying anything from it?
And what about binary files (graphic resources) that might come with it?
Thanks in advance for any advice!

My view is that everything you need for your application to run correctly needs to be managed. This includes third-party code.
If you don't put it under SCM, how is it going to get deployed correctly on your production systems? If you have other ways of ensuring that, that's fine, but otherwise you run the risk that successful deployment is a matter of people remembering to do all the right things, rather than some automated low-risk "push the button" procedure.
If you don't manage it under SCM or something similar, how do you ensure that the versions you develop against and test against are the same? And that they're the same as production? Debugging an issue caused by a version difference you don't notice can be horrible.

I generally add external resources to my project directly. Doing so facilitates deployment and ensures that if someone changes the version of this file in your project, you have a clear audit history of what happened in case it causes issues in the code that you've written. Developers should know not to modify these external resources.
You could use something like git submodules, I suppose, but I haven't felt that this is worth the hassle in the past.
Binary files from external sources can be checked in to the project as well, although if they're extremely large you may want to consider a different approach.

There aren't a lot of reasons not to put external resources like jQuery into your repo:
If you pull it down from jQuery every time you check out or deploy, you have less control over which version you're using. This holds true for most third-party libraries; you probably don't want to upgrade your libraries without testing with your code to see if it breaks something.
You'll always have a complete copy of your site when you check out your repository and you won't need to go seeking resources that may have become unavailable.
For small (in terms of filesize) things like jQuery and images, I'd just add them unless you're really, really concerned about space.

It depends.
These arguments relate to having a copy of the library on your system and not pulling it from it's original location.
Arguments in favour:
It will ensure that everything needed for your project can be found in one place when someone else joins your development team. I've lost count of the number of times I've had to scramble around looking for the right versions of libraries in order to be able to get something working.
If you make any modifications to the library you can make these changes to the source controlled version so when a new version comes out you use the source control's merging tools to ensure your edits don't go missing.
Arguments against:
It could mean everyone has a copy of the library locally - unless you map the 3rd party tools to a central server.
Deploying could be problematical - again unless you map the 3rd party tools to a central server and don't include them in the deploy script.

Checkstyle source control intergration

I've been looking into checkstyle recently as part of some research into standard coding conventions. Though it seems like it is perfectly suitable for brand new projects, it seems to have a huge barrier to adoption for already existing projects as it doesn't seem to supply a method of only checking new or edited code. Maybe I'm wrong?
If you have a codebase that has never had a coding standard it could be a massive effort to get the whole codebase inline with a standard all at once. Allowing it to be done incrementally over time as code naturally evolves seems like a more reasonable approach. But it doesn't seem like a possibility with checkstyle.
I assume this would have to be a tie in with a source control system in order to be possible. Is that possible with Checkstyle or is there another tool that can provide this functionality?

As far as I known, Checkstyle is meant to analyze source, without considering its history or revisions.
To add that kind of feature means script checkstyle analysis to feed it with the exact sub-set of files representing the delta.
But then, certain kind of checks would be likely to fail or to miss in their analysis, like duplicate code check.
So for that kind of incremental analysis, you not only need to restrict the set of sources, but also the set of rules you want to enforce, for some of those rules only make sense on the all sources.
So, why couldn't you run a full check on each file and then filter results based on changes managed by your source control system? Anything like that exist?
Not to my knowledge, especially with plugin like eclipse-cs for eclipse: it they analyze a file, they will display all warnings, even though the source control mentions the file has not changed since a given revision.
Only an external script would be able to do this:
The principle is simple (although it could be a bit slow at execution time):
for each file, do a diff to check if modification have been made
if yes,
do a svn blame to annotate lines with the revision number which contained the last change.
Then analyze the file with checkstyle.
The script can then filter the warning for the line being currently modified (or for all the lines modified after a given revision).

We developed a Checkstyle plugin for SCM-Manager, a tool for managing git, subversion and mercurial repositories. If activated it is possible to check committed source code against your Checkstyle rules. If the check found errors, the commit is aborted.

What to put under version control?

Almost any IDE creates lots of files that have nothing to do with the application being developed, they are generated and mantained by the IDE so he knows how to build the application, where the version control repository is and so on.
Should those files be kept under version control along with the files that really have something to do with the aplication (source code, application's configuration files, ...)?
The things is: on some IDEs if you create a new project and then import it into the version-control repository using the version-control client/commands embedded in the IDE, then all those files are sent to the respitory. And I'm not sure that's right: what is two different developers working on the same project want to use two different IDEs?
I want to keep this question agnostic avoiding references to any particular IDE, programming language or version control system. So this question is not exactly the same as these:
SVN and binaries - but this talks about binaries and SVN
Do you keep your build tools in version control? - but this talks about build tools (e.g. putting the jdk under version control)
What project files shouldn’t be checked into SVN - but this talks about SVN and dll's
Do you keep your project files under version control? - very similar (haven't found it before), thanks VonC

Rules of thumb:
Include everything which has an influence on the build result (compiler options, file encodings, ASCII/binary settings, etc.)
Include everything to make it possible to open the project from a clean checkout and being able to compile/run/test/debug/deploy it without any further manual intervention
Don't include files which contain absolute paths
Avoid including personal preferences (tab size, colors, window positions)
Follow the rules in this order.
[Update] There is always the question what should happen with generated code. As a rule of thumb, I always put those under version control. As always, take this rule with a grain of salt.
My reasons:
Versioning generated code seems like a waste of time. It's generated right? I can get it back at a push of a button!
Really?
If you had to bite the bullet and generate the exact same version of some previous release without fail, how much effort would it be? When generating code, you not only have to get all the input files right, you also have to turn back time for the code generator itself. Can you do that? Always? As easy as it would be to check out a certain version of the generated code if you had put it under version control?
And even if you could, could you ever be sure that didn't miss something?
So on one hand, putting generated code under version control make sense since it makes it dead easy to do what VCS are meant for: Go back in time.
Also it makes it easy to see the differences. Code generators are buggy, too. If I fix a bug and have 150'000 files generated, it helps a lot when I can compare them to the previous version to see that a) the bug is gone and b) nothing else changed unexpectedly. It's the unexpected part which you should worry about. If you don't, let me know and I'll make sure you never work for my company ever :-)
The major pain point of code generators is stability. It doesn't do when your code generator just spits out a random mess of bytes every time you run (well, unless you don't care about quality). Code generators need to be stable and deterministic. You run them twice with the same input and the output must be identical down to least significant bit.
So if you can't check in generated code because every run of the generator creates differences that aren't there, then your code generator has a bug. Fix it. Sort the code when you have to. Use hash maps that preserve order. Do everything necessary to make the output non-random. Just like you do everywhere else in your code.
Generated code that I might not put under version control would be documentation. Documentation is somewhat of a soft target. It doesn't matter as much when I regenerate the wrong version of the docs (say, it has a few typos more or less). But for releases, I might do that anyway so I can see the differences between releases. Might be useful, for example, to make sure the release notes are complete.
I also don't check in JAR files. As I do have full control over the whole build and full confidence that I can get back any version of the sources in a minute plus I know that I have everything necessary to build it without any further manual intervention, why would I need the executables for? Again, it might make sense to put them into a special release repo but then, better keep a copy of the last three years on your company's web server to download. Think: Comparing binaries is hard and doesn't tell you much.

I think it's best to put anything under version control that helps developers to get started quickly, ignoring anything that may be auto-generated by an IDE or build tools (e.g. Maven's eclipse plugin generates .project and .classpath - no need to check these in). Especially avoid files that change often, that contain nothing but user preferences, or that conflict between IDEs (e.g. another IDE that uses .project just like eclipse does).
For eclipse users, I find it especially handy to add code style (.settings/org.eclipse.jdt.core.prefs - auto formatting on save turned on) to get consistently formatted code.

Everything that can be automatically generated from the source+configuration files should not be under the version control! It only causes problems and limitations (like the one you stated - using 2 different project files by different programmers).
Its true not only for IDE "junk files" but also for intermediate files (like .pyc in python, .o in c etc).

This is where build automation and build files come in.
For example, you can still build the project (the two developers will need the same build software obviously) but they then could in turn use two different IDE's.
As for the 'junk' that gets generated, I tend to ignore most if it. I know this is meant to be language agnostic but consider Visual Studio. It generates user files (user settings etc..) this should not be under source control.
On the other hand, project files (used by the build process) most certainly should. I should add that if you are on a team and have all agreed on an IDE, then checking in IDE specific files is fine providing they are global and not user specific and/or not needed.
Those other questions do a good job of explaining what should and shouldn't be checked into source control so I wont repeat them.

In my opinion it depends on the project and environment. In a company environment where everybody is using the same IDE it can make sense to add the IDE files to the repository. While this depends a bit on the IDE, as some include absolute paths to things.
For a project which is developed in different environments it doesn't make sense and will be pain in the long run as the project files aren't maintained by all developers and make it harder to find "relevant" things.

Anything that would be devastating if it were lost, should be under version control.

In my opinion, anything needed to build the project (code, make files, media, databases with required program info, etc) should be in repositories. I realise that especially for media/database files this is contriversial, but to me if you can't branch and then hit build the source control's not doing it's job. This goes double for distributed systems with cheap branch creation/merging.
Anything else? Store it somewhere different. Developers should choose their own working environment as much as possible.

From what I have been looking at with version control, it seems that most things should go into it - e.g. source code and so on. However, the problem that many VCS's run into is when trying to handle large files, typically binaries, and at times things like audio and graphic files. Therefore, my personal way to do it is to put the source code under version control, along with general small sized graphics, and leave any binaries to other systems of management. If it is a binary that I created myself using the build system of the IDE, then that can definitily be ignored, because it is going to be regenerated every build. For dependancy libraries, well this is where dependancy package managers come in.
As for IDE generated files (I am assuming these are ones that aren't generated during the build process, such as the solution files for Visual Studio) - well, I think it would depend on whether or not you are working alone. If you are working alone, then go ahead and add them - they will allow you to revert settings in the solution or whatever you make. Same goes for other non-solution like files as well. However, if you are collaborating, then my recomendation is no - most IDE generated files tend to be, well, user specific - aka they work on your machine, but not neccesarily on others. Hence, you may be better of not including IDE generated files in that case.
tl;dr you should put most things that relate to your program into version control, excluding dependencies (things like libraries, graphics and audio should be handled by some other dependancy management system). As for things directly generated by the IDE - well, it would depend on if you are working alone or with other people.

Do you version "derived" files?

Using online interfaces to a version control system is a nice way to have a published location for the most recent versions of code. For example, I have a LaTeX package here (which is released to CTAN whenever changes are verified to actually work):
http://github.com/wspr/pstool/tree/master
The package itself is derived from a single file (in this case, pstool.tex) which, when processed, produces the documentation, the readme, the installer file, and the actual files that make up the package as it is used by LaTeX.
In order to make it easy for users who want to download this stuff, I include all of the derived files mentioned above in the repository itself as well as the master file pstool.tex. This means that I'll have double the number of changes every time I commit because the package file pstool.sty is a generated subset of the master file.
Is this a perversion of version control?
#Jon Limjap raised a good point:
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?
That's really the crux of the matter in this case. Yes, released versions of the package can be obtained from elsewhere. So it does really make more sense to only version the non-generated files.
On the other hand, #Madir's comment that:
the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes
is also rather pertinent in that if a user finds a bug and I fix it immediately, they can then head over to the repository and grab the file that's necessary for them to continue working without having to run any "installation" steps.
And this, I think, is the more important use case for my particular set of projects.

We don't version files that can be automatically generated using scripts included in the repository itself. The reason for this is that after a checkout, these files can be rebuild with a single click or command. In our projects we always try to make this as easy as possible, and thus preventing the need for versioning these files.
One scenario I can imagine where this could be useful if 'tagging' specific releases of a product, for use in a production environment (or any non-development environment) where tools required for generating the output might not be available.
We also use targets in our build scripts that can create and upload archives with a released version of our products. This can be uploaded to a production server, or a HTTP server for downloading by users of your products.

I am using Tortoise SVN for small system ASP.NET development. Most code is interpreted ASPX, but there are around a dozen binary DLLs generated by a manual compile step. Whilst it doesn't make a lot of sense to have these source-code versioned in theory, it certainly makes it convenient to ensure they are correctly mirrored from the development environment onto the production system (one click). Also - in case of disaster - the rollback to the previous step is again one click in SVN.
So I bit the bullet and included them in the SVN archive - the convenience, which is real and repeated, outweighs cost, which is borne behind the scenes.

Not necessarily, although best practices for source control advise that you do not include generated files, for obvious reasons.
Is there another way for you to publish your generated files elsewhere for download, instead of relying on your version control to be your download server?

Normally, derived files should not be stored in version control. In your case, you could build a release procedure that created a tarball that includes the derived files.
As you say, keeping the derived files in version control only increases the amount of noise you have to deal with.

In some cases we do, but it's more of a sysadmin type of use case, where the generated files (say, DNS zone files built from a script) have intrinsic interest in their own right, and the revision control is more linear audit trail than branching-and-tagging source control.

Categories

HOME

kubernetes

nmap

devexpress-windows-ui

spring-authorization-s...

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse