What is a sensible structure for multiple-language project in source control? - version-control

At work we're developing a large-scale application with quite a few front-end, back-end and support components. Typically the front-end is developed in C# and the back-end is developed in Java, although parts of the back-end are also developed in C# and possibly later C++.
The choice of language and platform is not arbitrary; we try to weigh the relative merits of each in development time, tool-chain cost, familiarity with the language by the specific development team etc. What all these components have in common, though, is that they are all required for the complete operation of the product, and that they are being developed concurrently by independent (but highly communicative) teams.
Previously, we have used Team Foundation Server for our .NET code and Subversion for our Java code; because there was clear separation of the teams' responsibilities, this caused little problem beyond the inconvenience of placing binaries (WARs, in this case) generated from one source tree in another, and the high manual overhead of keeping the branches and revisions in sync. With this project, the degree of separation between the teams is intentionally much smaller, and the volume of branching/merging is expected to be considerably higher; as a result we're moving to a unified VCS, more specifically Subversion.
This brings me to the meat of the question: how does one mix Java and C# code effectively? In practice, we'll have .NET code dependent on a Java codebase; the Java binaries are required to run anything other than unit test code (integration tests already require the binaries, and QA, acceptance testing etc. certainly does as well). What we currently have in mind looks something like:
/trunk
/java
/component1
/component2
/library1
/library2
/net
/assembly1
/assembly2
/...
project.sln
The idea is that the entire source tree is placed under one branch; the .NET code is dependant on the Java code, so we'll add a post-build step to the solution which will (most likely) call the ant script for the Java components. This allows branching of the entire codebase (for .NET developers) or just the Java components (for Java developers).
The problems with this solution are:
What happens when one of the two codebases becomes so large that making copies of it for every branch gets impractical? (our thoughts: split to separate repositories for .NET and Java code and use svn:externals, any input on this would be greatly appreciated).
We use Eclipse for Java development. How do we manage the "shared" workspace (i.e. which projects are required for which components, the dependency graph etc.)? Up until now we've had relatively few Java components, so each developer could just keep all of them in the workspace at the same time. With the increase in Java components and Java developers I don't see how we can keep doing that; any suggestions on how to keep the workspace versioned (a la solution files) while still maintaining sync between the two code-bases?
I would love to hear your input!

1: I've found it best to group things by component, rather than langugage. If one component requires several languages for interface, you still need to develop, test and release them as one. So, splitting a component across several repos is not a good idea.
If one part of the code depends tightly on the other, keep it together. Better to split components across repos. (This even goes for internal structure, where, especially as things grow, it's difficult if you package things by type, rather than by function, i.e. in MVC, don't have three huge packages for each category, rather keep FooView, FooModel and FooController tight.)
svn:externals might work, and with the later versions I think you can use "internals", i.e. link to other dirs in the same repo. That is miles easier than managing separate repos, especially with tagging and branching. (shudder)
2: You could always have the developers setup different workspaces, or perhaps use working sets. Commercial Eclipse releases has better support for sharing workspace settings than the OS variant. (Haven't tried, only worked and been frustrated with the OS one)
I've done C++ (MSVS) and Java (Eclipse) in one repo, and it works pretty well. Also C++/Python similarly. Make sure your build system supports building and testing everything (even if your IDEs only build one part).

Related

Best practice for using Powershell to compile solution with multiple projects

I have an application that contains close to 90 different projects written in at least 3 different languages (C#, Visual C++, and Visual Basic) targetting .NET 3, 3.5, 4.0, and 4.5,, with 6 different configurations. We are about to start scripting our compilations due to some messy configuration hassles involving preprocessing out features for different clients. I have seen multiple methods of using Powershell scripts to compile such applications but they seem to boil down to two options: compile the entire solution or compile each project individually.
So my question is: is there an industry best practice for such things? And if so what is it?
The project lead seems to be leaning toward compiling and individually configuring each project, but it seems wasteful since we already have configurations in VS to configure out individual projects. If this question is too subjective, I'd be happy to take it down. And sorry if my information is kind of vague, but I have to walk on egg shells with this project. Thanks.
If you are dealing with a large number of projects, and you are working with the Microsoft technology stack, the "Official" way to go would be to use Microsoft's full Team Foundation Server technology and formalize the development lifecycle.
It does Source Control, Build management, Testing, etc. I would then combine it with Release Management to more fully automate things.
The move to TFS is a big leap, and will require some time to get set up and configured, etc., but I think it is the best thing out there once you start reaching dozens of projects. PowerShell, and other methods are good, but become unmanageable when you are starting to scale out to large numbers of projects.

New project with Maven2

What's the recommended way to use Maven for a project that might grow large in the future?
I work with Eclipse and I see different approaches. Some use one project with no sub modules, and some Like mahout for example, have different sub-projects for different modules (e.g., core, math, examples, etc.). You can see it in this link:
http://svn.apache.org/repos/asf/mahout/trunk/
Is there any advantage preferring one over the other?
Thanks.
The decision to split a project into modules should be driven by the way you want to design and maintain your application. Maven itself will work well whether you choose to create a multi-module project, or lump everything together in a single module. So then the question becomes, what are the advantages/disadvantages of splitting your application into multiple modules.
Some drivers for splitting your app are purely technical:
Generation of separate client and server artifacts
Generation a command line version and a web app version
Module dependency relationships
In other cases, it is more design related concerns will drive you to split up your application. This is especially important if your concern is an application that will grow over time. In these cases, you want to separate your application into specific areas of concern, and have defined service boundaries through which modules interact. This allows you to evolve individual modules over time, while minimizing the effect on the remaining parts of the application.
Note that maven is especially good with large multi-module projects, because of the support it provides for dependency management and resolution.
Modules are really a core concepts with Maven (and are well supported) and the obvious advantage of modular builds is... well, modularity.
Pros:
Allows better separation of concerns.
Promotes, enforces modular design of code.
Gives you finer grained control of what you "apply" to each parts.
Allows to work on subparts of the code in isolation of the other parts.
Using binary dependencies speeds up compilation (vs compiling a whole monolithic project).
Allows users to depend on subparts of your application only.
Cons:
If not needed, multiple modules would induce more maintenance than a monolithic project.
So, Maven supports and promotes modular builds, that's part of the design.
And in some case, you actually don't even have the choice because of the one (main) artifact per module golden rule: if you want to distribute an application as different parts, or to assemble several parts of an application (e.g. a client JAR and and a server JAR, or JARs in a WAR, or some EJB-JARs and WARs in a EAR), you must create dedicated modules for them (Maven can't produce a WAR and an EAR from a same module).
To sum up, modularity of your build is somehow driven by the nature of your application, it might just be required. But if your app is not modular by nature (say you develop a swing client), you can use a single module. Divide it as needed when things become too complex, too big.
It's recommended to use modules cause usually in a large project you have reltionships between the modules which are handled by Maven. On the other hand the related parts are related to each other so a reactor build is a good way and of course the release system of Maven give you many things to support you in your project ( branching etc.).

Can Xcode and iPhone handle hundreds of static libraries?

I'm trying keep xcode chaos under control. Namely, how to reuse my small components/classes between projects. One strategy is to make every class, or tightly coupled collection of classes, in to a static library, each being a spawn of a different sub project with a few targets like unit tests, demos, and, of course, the library.
The way it looks now, I could I see a final app being composed of some custom code, and, say, a couple hundred libraries. That scares me, but should it? Would performance suffer? Are there other limitations to the many-library approach that would make it impractical?
Having 100's of static libraries is not keeping chaos under control it's making it far worse. Grouping your code logically into static libraries is a great idea but one per class is far too fine grained. 100's of libraries equals hundreds of projects which is a lot of maintenance.
If your concern is manageability, have you considered using svn:externals or git submodules?
It's a subdirectory from a different repository than the rest of your tree, so you can have multiple projects all using the latest version of your shared code in addition to a project just for testing that shared code. The file hierarchy would look something like:
tests/ <-- svn checkout
shared-code/
test-code/
project 1/ <-- svn checkout
shared-code/ <-- svn:external to tests/shared-code/
p1-specific-code/
project 2/ <-- svn checkout
shared-code/ <-- svn:external to tests/shared-code/
p2-specific-code/
There's a little svn dance to be done when tagging with svn:externals, and I believe git submodules require a different dance when updating their contents to HEAD, but those are both far from the headache involved with keeping common code in sync across multiple projects.
After some offline-discussion, the consensus was that 100s of libraries would not slow down executions, but could be painful during linking.
The complexity of managing lots of libraries, could, of course, could make the solution worse than the disease.

One big executable or many small DLL's?

Over the years my application has grown from 1MB to 25MB and I expect it to grow further to 40, 50 MB. I don't use DLL's, but put everything in this one big executable.
Having one big executable has certain advantages:
Installing my application at the customer is really: copy and run.
Upgrades can be easily zipped and sent to the customer
There is no risk of having conflicting DLL's (where the customer has not version X of the EXE, but version Y of the DLL)
The big disadvantage of the big EXE is that linking times seem to grow exponentially.
Additional problem is that a part of the code (let's say about 40%) is shared with another application. Again, the advantages are that:
There is no risk on having a mix of incorrect DLL versions
Every developer can make changes on the common code which speeds up developments.
But again, this has a serious impact on compilation times (everyone compiles the common code again on his PC) and on linking times.
The question Grouping DLL's for use in Executable mentions the possibility of mixing DLL's in one executable, but it looks like this still requires you to link all functions manually in your application (using LoadLibrary, GetProcAddress, ...).
What is your opinion on executable sizes, the use of DLL's and the best 'balance' between easy deployment and easy/fast development?
A single executable has a huge positive impact on maintainability. It is easier to debug, deploy (size issues aside) and diagnose in the field. As you point out, it completely sidesteps DLL hell.
The most straightforward solution to your problem is to have two compilation modes, one that builds a single exe for production and one that builds lots of little DLLs for development.
The tenet is: reduce the number of your .NET assemblies to the strict minimum. Having a single assembly is the ideal number. This is for example the case for Reflector or NHibernate that both come as a very few assemblies. My company published free two white books on the topic One big executable or many small DLL's:
Partitioning code base through .NET assemblies and Visual Studio projects (8 pages)
Defining .NET Components with Namespaces (7 pages)
Arguments are developed in these white-books come with invalid/valid reasons to create an assembly and a case study on the code base of the tool NDepend.
The problem is that MS fosters(and is still fostering) the idea that assemblies are components while assemblies are just physical artifact to pack code. The notion of component is a logical artifact and typically an assemblies should contains several components. It is a good idea to partition component with the notion of namespaces although it is not always practicable (especially in the case of a framework with a public API where namespace are used to partition the API and not necessarily the components)
One big executable is definitely beneficial - you can have whole program optimization and less overhead and maintenance is much simpler.
As for the link time - you could have both the "many DLLs" and "one big executable" at the same time. For each DLL have a project configuration that builds a static library. So when you debug things you compile the "DLL" configuration of the project and when you need to ship you compile the "static library" configurations of your projects. Sometimes you will have different behavior in different configurations, but this will have to be addressed per incident.
An easier way to maintain large programs is to compose them into smaller manageable parts. A program can be composed into a shell and modules that add feature to the shell. Large programs like Visual Studio, outlook all use the same concepts. Try this approach to make a more maintainable and robust programs.

How do you organise your code library?

I am interested to know how people organise their code libraries, particularly with respect to reusable components. I am talking in OO terms below but I am interested in how your organise libraries for other types of language also.
For example:
Are you a stickler for class library projects for everything or do you prefer to keep everything in a single project?
Do you reuse your prebuilt DLLs or do you include individual classes from previous projects in your current work? If individual classes, do you share them between the projects to ensure all are kept up to date or do you permit branching?
How large are your reusable elements? How focussed are they? How are they focussed?
What level of reuse do you attain through your preferred practices?
etc.
EDIT
I am not looking for specific guidance here, I am just interested in people's thoughts and practices. I am particularly interested in the reuse of code between disparate projects, rather than within a single project. (Unfortunately the use of 'project' here is misleading - I mean reuse between real-world projects undertaken for customers, not projects in a Visual Studio sense.)
It generally can be guide by deployment considerations:
How will you deploy (i.e. what will you copy on your production machine) ?
If what you are deploying are packaged components (i.e. dll, jar, war, ...), it is wise to organize the "code library" as a collection of packaged set of files.
That way, you will develop directly with the -- dll, jar, war, ... -- which will be deployed on the production platform.
The idea being: if it works with those packaged files, it may still work in production.
the reuse of code between disparate projects, rather than within a single project.
I maintain that kind of reuse is easier in a "component" approach (like the one discussed in the question "Vendor Branches in GIT")
Over more than 40 current projects, we achieved:
technical reuse by systematically isolating any pure technical aspect into independent framework (typically, log framework, exception framework, KPI - Key Performance Indicator - framework, and so on).
Those technical components are reused into every other projects.
functional reuse by setting a clear applicative architecture in order to divide any functional domain (given the business and functional specifications) into well-defined applications. That would typically involve, for instance, a bus layer which is also a great candidate for exposing services reused by any other projects.
Summary:
For large functional domain, a single project being not manageable, a good applicative architecture will lead to natural code reuse.
We follow these principles:
The Release-Reuse Equivalency Principle: The granule of reuse is the granule of release.
The Common Closure Principle: The classes in a package should be closed together against the same kinds of changes.
The Common Reuse Principle: The classes in a package are reused together.
The Acyclic Dependencies Principle: Allow no cycles in the package dependency graph.
The Stable Dependency Principle: Depend in the direction of stability.
The Stable Abstraction Principle: A package should be as abstract as it is stable.
You can find out more over here and over here.
It depends on what platform you work. I'm a (proud) Java developer and we have nice tools to organise our dependencies such as Maven or Ivy
Whatever else you decide good source code control is crucial to this,as it allows you to implement your strategy whatever way you like without ending up with lots of unrelated copies of your libraries.good branching support is essential.