New project with Maven2 - Eclipse

What's the recommended way to use Maven for a project that might grow large in the future?
I work with Eclipse and I see different approaches. Some use one project with no submodules, and some, like Mahout for example, have different sub-projects for different modules (e.g., core, math, examples, etc.). You can see it at this link:
http://svn.apache.org/repos/asf/mahout/trunk/
Is there any advantage to preferring one over the other?
Thanks.

The decision to split a project into modules should be driven by the way you want to design and maintain your application. Maven itself will work well whether you choose to create a multi-module project or lump everything together in a single module. So the question becomes: what are the advantages and disadvantages of splitting your application into multiple modules?
Some drivers for splitting your app are purely technical:
Generation of separate client and server artifacts
Generation of a command-line version and a web app version
Module dependency relationships
In other cases, design-related concerns will drive you to split up your application. This is especially important if your concern is an application that will grow over time. In these cases, you want to separate your application into specific areas of concern and have defined service boundaries through which the modules interact. This allows you to evolve individual modules over time while minimizing the effect on the remaining parts of the application.
Note that Maven is especially good with large multi-module projects, because of the support it provides for dependency management and resolution.
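For illustration, here is a minimal sketch of such a multi-module setup (group and module names are invented for the example): a parent POM of packaging "pom" lists the modules, and each module declares ordinary dependencies on its siblings, which Maven resolves within the reactor.

<!-- parent pom.xml: aggregates the modules -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>myapp-parent</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <modules>
    <module>myapp-core</module>
    <module>myapp-webapp</module>
  </modules>
</project>

<!-- myapp-webapp/pom.xml: depends on the sibling core module -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.example</groupId>
    <artifactId>myapp-parent</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>
  <artifactId>myapp-webapp</artifactId>
  <packaging>war</packaging>
  <dependencies>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>myapp-core</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</project>

Running mvn install from the parent directory then builds the modules in dependency order.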

Modules are really a core concept in Maven (and are well supported), and the obvious advantage of modular builds is... well, modularity.
Pros:
Allows better separation of concerns.
Promotes and enforces a modular design of the code.
Gives you finer-grained control over what you "apply" to each part.
Allows you to work on subparts of the code in isolation from the other parts.
Using binary dependencies speeds up compilation (vs. compiling a whole monolithic project).
Allows users to depend on only a subset of your application.
Cons:
If not needed, multiple modules induce more maintenance overhead than a monolithic project.
So, Maven supports and promotes modular builds; that's part of the design.
And in some cases, you don't even have a choice because of the one (main) artifact per module golden rule: if you want to distribute an application as different parts, or to assemble several parts of an application (e.g. a client JAR and a server JAR, or JARs in a WAR, or some EJB-JARs and WARs in an EAR), you must create dedicated modules for them (Maven can't produce a WAR and an EAR from the same module).
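For example (module names invented), a dedicated EAR assembly module would look roughly like this; the WAR and EJB-JAR it packages are each produced by their own sibling modules:

<!-- myapp-ear/pom.xml: its only job is to assemble the EAR -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.example</groupId>
    <artifactId>myapp-parent</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>
  <artifactId>myapp-ear</artifactId>
  <packaging>ear</packaging>
  <dependencies>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>myapp-web</artifactId>
      <version>${project.version}</version>
      <type>war</type>
    </dependency>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>myapp-ejb</artifactId>
      <version>${project.version}</version>
      <type>ejb</type>
    </dependency>
  </dependencies>
</project>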
To sum up, the modularity of your build is to some extent driven by the nature of your application; it might simply be required. But if your app is not modular by nature (say you develop a Swing client), you can use a single module. Divide it as needed when things become too complex or too big.

It's recommended to use modules because in a large project you usually have relationships between the modules, which are handled by Maven. Since the related parts belong together, a reactor build is a good way to build them, and the Maven release system gives you many things to support you in your project (branching, etc.).

Related

Simultaneous main()s for SBT-based project

Say you have an sbt-based project targeting some distributed system. That is, your project contains one or more Play applications (with a hierarchy of subprojects) as well as a few other services and tools of your own, e.g. tools to fill in test data, generate load, and so on. To develop such a project you inevitably want to run many main()s simultaneously.
At the moment I have solved the problem this way: an sbt terminal session is used to run Play, and Scala IDE is used for the others. To eliminate any clashes I was forced to write my own template engine and router for Play (that is, eliminating managed sources, in Play's terms).
On the other hand, I don't want to be strongly tied to Scala IDE (or any IDE); I'd rather have the ability to start many main()s simultaneously (tracking the output of each of them) from sbt sessions themselves.
What does your sbt-based development environment for distributed systems look like?
While not immediately helpful, you may be interested in the backgroundRun task some of us are working on adding to sbt. For now the API is in this plugin code:
https://github.com/sbt/sbt-remote-control/blob/master/ui-interface/src/main/scala/sbt/BackgroundRun.scala
It's just a version of run that doesn't block, and you can manage jobs much as you can in bash.
Supporting Play is harder than supporting a generic run; it requires this PR: https://github.com/playframework/playframework/pull/3591
Anyway, none of this is published and working yet, but we're working on it.
The best short-term solution, I think, is to run multiple instances of sbt and be careful not to compile in more than one of them at once. If you get any weird build or classpath errors, do a clean and make sure they weren't caused by multiple compiles stomping on each other.

GWT multiple modules without redundant code

I'm trying to find a way to get rid of redundant compilation and JS in a client's GWT code. The problem is that they have a site with multiple EntryPoints and a massive model that gets compiled for every module. We're talking about 30 GWT modules and entry points, each compiling the entire model package of the app separately. It takes about 15 minutes on my 8-core monster just to GWT-compile this beast. And yes, compilation is parallelized and uses all cores (I can hardly move my mouse in Ubuntu :) )
Changing the architecture to a single module is not really an option, I think. Is there no way to have inherits be shared between modules? The modules aren't necessarily all that big, but the problem is again that all inherits are compiled redundantly for each module. This of course has negative effects for the end user as well, since every page basically has to load the entire model JS again and again.
According to
http://www.gwtproject.org/doc/latest/DevGuideOrganizingProjects.html#DevGuideModuleXml
the suggestion still seems to be to just make one great monolithic module. Isn't there any better way?
Any tips highly appreciated!
As is said in the GWT documentation you refer to, GWT's mechanism for avoiding redundant code is to merge all modules into a single super GWT module which inherits all the sub-modules you have in your application.
I suppose you are producing a module per page or feature of your website, so using a single module, as I said, implies that you will need a mechanism to run the appropriate application code per page, based on the URL or something similar.
You can take advantage of code splitting, so your modules will become RunAsyncCallbacks instead of EntryPoints, and each module will be compiled into one JS fragment which will be loaded asynchronously.
Note that you will include the same initial JavaScript fragment in all pages, and it will load the other fragments depending on the page.
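For instance (module and class names invented), the merged super-module's descriptor could simply inherit the existing sub-modules (via .gwt.xml variants that declare no EntryPoint) and declare a single entry point that decides per page which split point to load:

<!-- App.gwt.xml: hypothetical super-module replacing the 30 separate compiles -->
<module rename-to="app">
  <inherits name="com.google.gwt.user.User"/>
  <!-- former entry-point modules, inherited here as plain libraries -->
  <inherits name="com.example.Module1"/>
  <inherits name="com.example.Module2"/>
  <source path="client"/>
  <!-- the single entry point inspects the host page / URL and calls
       GWT.runAsync(...) to pull in the fragment for that page -->
  <entry-point class="com.example.client.AppEntryPoint"/>
</module>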
The advantages of this solution are many:
You only have one compilation process. It could take a long time, but it will certainly take much less time than compiling all modules individually, because redundant code is compiled only once.
You can maintain different .gwt.xml files: one to continue developing each individual module with its own EntryPoint, and another without an EntryPoint which will be inherited by your super-module.
Once compiled, the first fragment loaded (shared by all apps) would be very small, and it will be cached just once, so all apps would load very fast.
Much of the code shared by the modules (gwt core, JRE emulation, etc.) could go into the first fragment and would be shared by all the modules, decreasing the final download size of each app.
This works out of the box: the GWT compiler does a good job of splitting the code, merging shared code into intermediate fragments, and adding the methods to load fragments asynchronously on demand.
The Java ecosystem facilitates modular apps (dependencies, Maven, etc.).
Otherwise, if you still want individual modules, the way to compile all of them is what you are actually doing: executing the GWT compiler once per module (and permutation). You can improve your compilation time, though, by having a continuous integration cluster like Jenkins and running jobs in parallel, or by using more brute force (memory, CPU, ...).
As you probably know, GWT compiles each module into one big JavaScript file and optimizes everything based on all available information about everything in the whole module. This is why you need to compile everything for each module.
One solution might be to create one big module, but use code splitting along the lines of the existing module structure. Then you don't get one very large monolithic JavaScript file; instead, 'modules' are loaded as needed.
Did you try compiling with fewer localWorkers, instead of using all available cores? I've had the best results with localWorkers set to 4 (even on a 6-core machine).
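If you do stay with per-module compiles, the worker count is just a compiler argument; a hedged Ant sketch (classpath id and module name invented) would be:

<!-- build.xml fragment: cap the GWT compiler at 4 permutation workers -->
<target name="gwt-compile">
  <java classname="com.google.gwt.dev.Compiler" fork="true" failonerror="true">
    <classpath>
      <pathelement location="src"/>
      <path refid="gwt.classpath"/>
    </classpath>
    <jvmarg value="-Xmx1024m"/>
    <arg value="-localWorkers"/>
    <arg value="4"/>
    <arg value="com.example.App"/>
  </java>
</target>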

One big executable or many small DLLs?

Over the years my application has grown from 1 MB to 25 MB, and I expect it to grow further to 40 or 50 MB. I don't use DLLs, but put everything in this one big executable.
Having one big executable has certain advantages:
Installing my application at the customer's site is really just: copy and run.
Upgrades can be easily zipped and sent to the customer
There is no risk of conflicting DLLs (where the customer has version X of the EXE, but version Y of the DLL)
The big disadvantage of the big EXE is that linking times seem to grow exponentially.
An additional problem is that a part of the code (let's say about 40%) is shared with another application. Again, the advantages are that:
There is no risk of having a mix of incorrect DLL versions
Every developer can make changes to the common code, which speeds up development.
But again, this has a serious impact on compilation times (everyone compiles the common code again on his PC) and on linking times.
The question Grouping DLL's for use in Executable mentions the possibility of mixing DLLs into one executable, but it looks like this still requires you to link all functions manually in your application (using LoadLibrary, GetProcAddress, ...).
What is your opinion on executable sizes, the use of DLLs, and the best 'balance' between easy deployment and easy/fast development?
A single executable has a huge positive impact on maintainability. It is easier to debug, deploy (size issues aside) and diagnose in the field. As you point out, it completely sidesteps DLL hell.
The most straightforward solution to your problem is to have two compilation modes, one that builds a single exe for production and one that builds lots of little DLLs for development.
The tenet is: reduce the number of your .NET assemblies to the strict minimum; having a single assembly is ideal. This is, for example, the case for Reflector or NHibernate, which both come as very few assemblies. My company published two free white books on the topic of one big executable versus many small DLLs:
Partitioning code base through .NET assemblies and Visual Studio projects (8 pages)
Defining .NET Components with Namespaces (7 pages)
The arguments developed in these white books come with valid and invalid reasons to create an assembly, and a case study on the code base of the tool NDepend.
The problem is that MS fostered (and is still fostering) the idea that assemblies are components, while assemblies are just physical artifacts to package code. The notion of a component is a logical one, and typically an assembly should contain several components. It is a good idea to partition components with namespaces, although it is not always practicable (especially in the case of a framework with a public API, where namespaces are used to partition the API and not necessarily the components).
One big executable is definitely beneficial: you can have whole-program optimization and less overhead, and maintenance is much simpler.
As for the link time: you could have both the "many DLLs" and the "one big executable" setups at the same time. For each DLL, have a project configuration that also builds a static library. So when you debug things you compile the "DLL" configuration of the project, and when you need to ship you compile the "static library" configurations of your projects. Sometimes you will have different behavior in different configurations, but that will have to be addressed case by case.
An easier way to maintain a large program is to compose it from smaller, manageable parts. A program can be composed of a shell and modules that add features to the shell. Large programs like Visual Studio and Outlook all use the same concept. Try this approach to make your programs more maintainable and robust.

How do you organise your code library?

I am interested to know how people organise their code libraries, particularly with respect to reusable components. I am talking in OO terms below, but I am interested in how you organise libraries for other types of language as well.
For example:
Are you a stickler for class library projects for everything or do you prefer to keep everything in a single project?
Do you reuse your prebuilt DLLs or do you include individual classes from previous projects in your current work? If individual classes, do you share them between the projects to ensure all are kept up to date or do you permit branching?
How large are your reusable elements? How focussed are they? How are they focussed?
What level of reuse do you attain through your preferred practices?
etc.
EDIT
I am not looking for specific guidance here, I am just interested in people's thoughts and practices. I am particularly interested in the reuse of code between disparate projects, rather than within a single project. (Unfortunately the use of 'project' here is misleading - I mean reuse between real-world projects undertaken for customers, not projects in a Visual Studio sense.)
It can generally be guided by deployment considerations:
How will you deploy (i.e. what will you copy onto your production machine)?
If what you are deploying are packaged components (i.e. dll, jar, war, ...), it is wise to organize the "code library" as a collection of packaged sets of files.
That way, you will develop directly against the packaged files -- dll, jar, war, ... -- which will be deployed on the production platform.
The idea being: if it works with those packaged files, it may still work in production.
the reuse of code between disparate projects, rather than within a single project.
I maintain that this kind of reuse is easier with a "component" approach (like the one discussed in the question "Vendor Branches in GIT").
Over more than 40 current projects, we achieved:
technical reuse, by systematically isolating any purely technical aspect into an independent framework (typically a logging framework, an exception framework, a KPI - Key Performance Indicator - framework, and so on).
Those technical components are then reused in every other project (a small sketch of how such a component is consumed follows below).
functional reuse, by setting a clear application architecture in order to divide any functional domain (given the business and functional specifications) into well-defined applications. That would typically involve, for instance, a bus layer, which is also a great candidate for exposing services reused by other projects.
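In the Java/Maven world, for instance, such a shared technical framework simply becomes a versioned artifact that every consuming project declares as an ordinary dependency (coordinates below are invented for illustration):

<!-- fragment of a consuming project's pom.xml -->
<dependency>
  <groupId>com.mycompany.framework</groupId>
  <artifactId>logging-framework</artifactId>
  <version>2.3.1</version>
</dependency>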
Summary:
For a large functional domain, where a single project is not manageable, a good application architecture will lead to natural code reuse.
We follow these principles:
The Release-Reuse Equivalency Principle: The granule of reuse is the granule of release.
The Common Closure Principle: The classes in a package should be closed together against the same kinds of changes.
The Common Reuse Principle: The classes in a package are reused together.
The Acyclic Dependencies Principle: Allow no cycles in the package dependency graph.
The Stable Dependency Principle: Depend in the direction of stability.
The Stable Abstraction Principle: A package should be as abstract as it is stable.
You can find out more over here and over here.
It depends on what platform you work on. I'm a (proud) Java developer, and we have nice tools to organise our dependencies, such as Maven or Ivy.
Whatever else you decide, good source code control is crucial to this, as it allows you to implement your strategy whatever way you like without ending up with lots of unrelated copies of your libraries. Good branching support is essential.

What is a sensible structure for multiple-language project in source control?

At work we're developing a large-scale application with quite a few front-end, back-end and support components. Typically the front-end is developed in C# and the back-end is developed in Java, although parts of the back-end are also developed in C# and possibly later C++.
The choice of language and platform is not arbitrary; we try to weigh the relative merits of each in development time, tool-chain cost, familiarity with the language by the specific development team etc. What all these components have in common, though, is that they are all required for the complete operation of the product, and that they are being developed concurrently by independent (but highly communicative) teams.
Previously, we have used Team Foundation Server for our .NET code and Subversion for our Java code; because there was clear separation of the teams' responsibilities, this caused little problem beyond the inconvenience of placing binaries (WARs, in this case) generated from one source tree in another, and the high manual overhead of keeping the branches and revisions in sync. With this project, the degree of separation between the teams is intentionally much smaller, and the volume of branching/merging is expected to be considerably higher; as a result we're moving to a unified VCS, more specifically Subversion.
This brings me to the meat of the question: how does one mix Java and C# code effectively? In practice, we'll have .NET code dependent on a Java codebase; the Java binaries are required to run anything other than unit test code (integration tests already require the binaries, and QA, acceptance testing etc. certainly does as well). What we currently have in mind looks something like:
/trunk
    /java
        /component1
        /component2
        /library1
        /library2
    /net
        /assembly1
        /assembly2
        /...
        project.sln
The idea is that the entire source tree is placed under one branch; the .NET code is dependent on the Java code, so we'll add a post-build step to the solution which will (most likely) call the Ant script for the Java components. This allows branching of the entire codebase (for .NET developers) or just the Java components (for Java developers).
The problems with this solution are:
What happens when one of the two codebases becomes so large that making copies of it for every branch gets impractical? (Our thoughts: split into separate repositories for the .NET and Java code and use svn:externals; any input on this would be greatly appreciated.)
We use Eclipse for Java development. How do we manage the "shared" workspace (i.e. which projects are required for which components, the dependency graph etc.)? Up until now we've had relatively few Java components, so each developer could just keep all of them in the workspace at the same time. With the increase in Java components and Java developers I don't see how we can keep doing that; any suggestions on how to keep the workspace versioned (a la solution files) while still maintaining sync between the two code-bases?
I would love to hear your input!
1: I've found it best to group things by component, rather than by language. If one component requires several languages for its interface, you still need to develop, test and release them as one. So splitting a single component across several repos is not a good idea.
If one part of the code depends tightly on another, keep them together; if you split, split along component boundaries rather than across them. (This even goes for internal structure: especially as things grow, it's difficult if you package things by type rather than by function, e.g. in MVC, don't have three huge packages, one per category; rather keep FooView, FooModel and FooController close together.)
svn:externals might work, and with the later versions I think you can use "internals", i.e. link to other dirs in the same repo. That is miles easier than managing separate repos, especially with tagging and branching. (shudder)
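For example, assuming both trees stay in one repository (paths invented), the svn:externals definition on the /net directory could be as simple as the Subversion 1.5+ repository-relative form:

^/trunk/java java

which pulls the Java tree into any working copy or branch of the .NET tree; pinning it to a revision (-r 1234 ^/trunk/java java) keeps builds reproducible across branches.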
2: You could always have the developers set up different workspaces, or perhaps use working sets. Commercial Eclipse distributions have better support for sharing workspace settings than the open-source variant. (I haven't tried them; I've only worked with, and been frustrated by, the open-source one.)
I've done C++ (MSVS) and Java (Eclipse) in one repo, and it works pretty well. Also C++/Python similarly. Make sure your build system supports building and testing everything (even if your IDEs only build one part).