Over the years my application has grown from 1 MB to 25 MB, and I expect it to grow further to 40 or 50 MB. I don't use DLLs; I put everything in this one big executable.
Having one big executable has certain advantages:
Installing my application at the customer's site really is just: copy and run.
Upgrades can be easily zipped and sent to the customer
There is no risk of conflicting DLLs (where the customer has version X of the EXE but version Y of the DLL)
The big disadvantage of the big EXE is that linking times seem to grow exponentially.
An additional problem is that part of the code (say about 40%) is shared with another application. Again, the advantages are that:
There is no risk of having a mix of incorrect DLL versions
Every developer can make changes to the common code, which speeds up development.
But again, this has a serious impact on compile times (everyone compiles the common code again on their own PC) and on link times.
The question Grouping DLL's for use in Executable mentions the possibility of mixing DLLs into one executable, but it looks like this still requires you to bind all functions manually in your application (using LoadLibrary, GetProcAddress, ...).
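For reference, the manual binding looks roughly like this (a minimal Win32 sketch; the DLL name "plugin.dll" and the exported ComputeTotal function are made up for illustration):

    // Manual binding against a hypothetical plugin.dll that exports:
    //   extern "C" int __cdecl ComputeTotal(int, int);
    #include <windows.h>
    #include <cstdio>

    typedef int (__cdecl *ComputeTotalFn)(int, int);

    int main()
    {
        HMODULE dll = LoadLibraryA("plugin.dll");        // load the DLL at run time
        if (!dll) { std::printf("plugin.dll not found\n"); return 1; }

        // Every function you want to call has to be looked up by name like this.
        ComputeTotalFn computeTotal =
            reinterpret_cast<ComputeTotalFn>(GetProcAddress(dll, "ComputeTotal"));
        if (!computeTotal) { std::printf("export not found\n"); FreeLibrary(dll); return 1; }

        std::printf("result: %d\n", computeTotal(2, 3));
        FreeLibrary(dll);
        return 0;
    }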
What is your opinion on executable sizes, the use of DLL's and the best 'balance' between easy deployment and easy/fast development?
A single executable has a huge positive impact on maintainability. It is easier to debug, deploy (size issues aside) and diagnose in the field. As you point out, it completely sidesteps DLL hell.
The most straightforward solution to your problem is to have two compilation modes, one that builds a single exe for production and one that builds lots of little DLLs for development.
The tenet is: reduce the number of your .NET assemblies to the strict minimum; having a single assembly is the ideal. This is, for example, the case for Reflector and NHibernate, which both ship as very few assemblies. My company has published two free white papers on the topic of one big executable versus many small DLLs:
Partitioning code base through .NET assemblies and Visual Studio projects (8 pages)
Defining .NET Components with Namespaces (7 pages)
The arguments developed in these white papers come with valid and invalid reasons to create an assembly, and a case study on the code base of the tool NDepend.
The problem is that MS has fostered (and is still fostering) the idea that assemblies are components, whereas an assembly is just a physical artifact for packaging code. A component is a logical artifact, and typically an assembly should contain several components. It is a good idea to partition components with namespaces, although this is not always practicable (especially in the case of a framework with a public API, where namespaces are used to partition the API and not necessarily the components).
One big executable is definitely beneficial: you get whole-program optimization and less overhead, and maintenance is much simpler.
As for the link time: you could have both the "many DLLs" and the "one big executable" setups at the same time. For each DLL, have a project configuration that builds a static library. When you debug, you compile the "DLL" configurations of the projects; when you need to ship, you compile the "static library" configurations. Sometimes you will get different behavior in different configurations, but that will have to be addressed case by case.
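As a rough sketch of how the shared code can stay identical across both setups (the COMMON_BUILD_DLL / COMMON_USE_DLL macro names and the class are hypothetical; each project configuration would define the appropriate symbol in its preprocessor settings):

    // common_api.h -- lets the same code build either as a DLL or as a static library.
    #pragma once

    #if defined(COMMON_BUILD_DLL)      // set in the "DLL" configuration of the library
    #  define COMMON_API __declspec(dllexport)
    #elif defined(COMMON_USE_DLL)      // set in executables that consume the DLL
    #  define COMMON_API __declspec(dllimport)
    #else                              // "static library" configuration: no decoration
    #  define COMMON_API
    #endif

    // The shared code itself does not change between configurations.
    class COMMON_API OrderCalculator
    {
    public:
        double Total(double net, double taxRate) const { return net * (1.0 + taxRate); }
    };

The "DLL" configurations define COMMON_BUILD_DLL in the library and COMMON_USE_DLL in its consumers; the "static library" configurations define neither, so the ship build links everything into the single EXE.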
An easier way to maintain large programs is to compose them from smaller, manageable parts. A program can be decomposed into a shell and modules that add features to the shell. Large programs like Visual Studio and Outlook use the same concept. Try this approach to build more maintainable and robust programs.
Related
I have an application that contains close to 90 different projects written in at least 3 different languages (C#, Visual C++, and Visual Basic), targeting .NET 3.0, 3.5, 4.0, and 4.5, with 6 different configurations. We are about to start scripting our compilations due to some messy configuration hassles involving preprocessing out features for different clients. I have seen multiple methods of using PowerShell scripts to compile such applications, but they seem to boil down to two options: compile the entire solution or compile each project individually.
So my question is: is there an industry best practice for such things? And if so what is it?
The project lead seems to be leaning toward compiling and individually configuring each project, but that seems wasteful since we already have configurations in VS for the individual projects. If this question is too subjective, I'd be happy to take it down. And sorry if my information is kind of vague, but I have to walk on eggshells with this project. Thanks.
If you are dealing with a large number of projects, and you are working with the Microsoft technology stack, the "Official" way to go would be to use Microsoft's full Team Foundation Server technology and formalize the development lifecycle.
It does Source Control, Build management, Testing, etc. I would then combine it with Release Management to more fully automate things.
The move to TFS is a big leap and will require some time to get set up and configured, but I think it is the best thing out there once you start reaching dozens of projects. PowerShell and other methods are good, but they become unmanageable when you start scaling out to large numbers of projects.
I'm trying to find a way to get rid of redundant compilation and redundant JS in a client's GWT code. The problem is that they have a site with multiple EntryPoints and a massive model that gets compiled for every module. We're talking about 30 GWT modules and entry points, each compiling the entire model package of the app separately. It takes about 15 minutes on my 8-core monster just to GWT-compile this beast. And yes, compilation is parallelized and uses all cores (I can hardly move my mouse in Ubuntu :) )
Changing the architecture to a single module is not really an option, I think. Is there no way to have inherits shared between modules? Not all of the modules are that big, but the problem, again, is that all inherits are compiled redundantly for each module. This of course has negative effects for the end user as well, since every page basically has to load the entire model JS again and again.
According to
http://www.gwtproject.org/doc/latest/DevGuideOrganizingProjects.html#DevGuideModuleXml
the suggestion still seems to be to just make one great monolithic module. Isn't there any better way?
Any tips highly appreciated!
As stated in the GWT documentation you refer to, GWT's mechanism for avoiding redundant code is to merge all modules into a single super GWT module which inherits all the sub-modules you have in your application.
I suppose you are producing a module per page or feature of your website, so using a single module, as I say, implies that you will need a mechanism to run the appropriate application code per page, based on the URL or something similar.
You can take advantage of code splitting, so that your modules become RunAsyncCallbacks instead of EntryPoints, and each module is compiled into one JS fragment which is loaded asynchronously.
Note that you will include the same initial JavaScript fragment in all pages, and it will load the other fragments depending on the page.
The advantages of this solution are many:
You only have one compilation process. It could take a long time, but it will certainly take much less time than compiling all modules individually, because the redundant code is compiled only once.
You can maintain different .gwt.xml files: one per module, with its own EntryPoint, so you can continue developing the individual modules, and another one without an EntryPoint which is inherited by your super-module.
Once compiled, the first fragment loaded (shared by all apps) would be very small, and it will be cached just once, so all apps would load very fast.
Much of the code shared by the modules (GWT core, JRE emulation, etc.) could go into the first fragment and would be shared by all the modules, decreasing the final download size of each app.
This is an out-of-the-box solution: the GWT compiler does a good job of splitting the code, merging shared code into intermediate fragments, and adding the methods to load fragments asynchronously on demand.
The Java ecosystem facilitates modular apps (dependency management, Maven, etc.).
Otherwise, if you still want individual modules, the way to compile all of them is what you are actually doing: executing the GWT compiler once per module (and permutation). You can improve your compilation time, though, by having a continuous integration cluster like Jenkins run jobs in parallel, or by using more brute force (memory, CPU, ...).
As you probably know, GWT compiles each module into one big JavaScript file and optimizes everything based on all the information available about the whole module. This is why you need to compile everything for each module.
One solution might be to create one big module but use code splitting along the lines of the existing module structure. Then you don't get one very large monolithic JavaScript file; instead, 'modules' are loaded as needed.
Did you try compiling with fewer localWorkers instead of using all available cores? I've had the best results with localWorkers set to 4 (even on a 6-core machine).
I work in a small iPhone development team; in our office we have at least 4 copies of Xcode running on the network at any one time, and we're contemplating getting everyone to have it running.
We're networked together over standard Wi-Fi, so network speed and latency aren't as good as on a wired network...
Just wondering: is there any real time gain to be had from using distributed builds, once the relevant data has been passed back and forth over the network, at least for relatively small projects?
It depends on your project, its dependencies, and the amount of data that must be transferred.
15-20 seconds is not terrible. Certainly, there is more work overall to be done. It may be a good idea for everyone to farm builds out to a very fast Mac Pro rather than to each other, if you're all on dual-core machines (that info was not given).
As far as project configuration goes: if you have a bunch of dependent libraries in your projects, then it may help to disable precompiled headers. Much of the equation is the average number of dependencies and the number of object files to generate.
At 15-20 seconds, it would help many developers to restructure their code so that build times are optimized before farming builds out. If it were a few minutes, then you may want to jump straight into distributed builds with an 8- or 12-core machine.
One easily overlooked aspect of slow builds on small projects: disable static analysis per build, and just run it manually every hour or two, fixing every issue then.
Otherwise, your project could probably be divided into smaller projects/libraries. Chances are, you won't always be editing the same dependencies.
Assuming compilation, linking, etc. are where the time is spent at this point: much of the rest falls into the typical issues involved with building C and C++ programs. Minimize your dependencies and include graphs. This is actually quite easy to accomplish with Objective-C; since most of the interfaces use Objective-C types, you can use forward declarations.
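For illustration, the same idea expressed in C++ terms (in Objective-C the equivalent is an @class forward declaration); the class names here are made up:

    // OrderController.h -- the header only needs the *names* of these types,
    // not their definitions, so it does not #include their headers and does
    // not drag them into every file that includes OrderController.h.
    #pragma once

    class OrderModel;   // forward declaration instead of #include "OrderModel.h"
    class OrderView;    // forward declaration instead of #include "OrderView.h"

    class OrderController
    {
    public:
        OrderController(OrderModel& model, OrderView& view);
        void Refresh();

    private:
        OrderModel* m_model;   // pointers/references only need the forward declaration
        OrderView*  m_view;
    };

    // OrderController.cpp would #include "OrderModel.h" and "OrderView.h",
    // so the full definitions are needed in exactly one translation unit.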
If your library is small (e.g. fewer than 50 object files generated), then you may also gain a speedup by not using precompiled headers. If everything already depends on your inclusion of the 12 system frameworks pulled in by the PCH... then try it in the next project.
Of course, you could just try timing a clean rebuild, a build with generated PCH files, and several incremental builds in order to come to a conclusion.
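To make the PCH trade-off above concrete, a typical "kitchen sink" precompiled header looks something like this (hypothetical contents):

    // Prefix header (stdafx.h / Prefix.pch). Every source file in the target
    // includes this first, so a change to anything it pulls in invalidates the
    // PCH and forces a large rebuild; for a small library (fewer than ~50
    // object files) that cost can outweigh the per-file parsing it saves.
    #pragma once
    #include <vector>
    #include <string>
    #include <map>
    #include <memory>
    // ...plus, in many projects, a dozen system framework headers --
    // the "12 system frameworks" case mentioned above.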
I've noticed that pretty much every company I've worked at has a common library that is shared across a number of projects. More often than not it has been a single companyx-commons project that ends up as a dumping ground for common code, including:
Command Line Parsers
File Utilities
Framework Helpers
etc...
Some of these are well thought out and some duplicate functionality found in Apache commons-lang, commons-io, etc.
What are the things you have in your common library and more importantly how do you structure the common libraries to make them easy to improve and incorporate across other projects?
In my experience, the single biggest factor in the success of a common library is user buy-in, the users in this case being other developers; the culture of your workplace/team(s) will be a big factor here.
Separate libraries (projects/assemblies if you're in .NET) for different application tiers are essential (e.g. there's obviously no point in putting UI and data-access code together).
Keep things as simple as possible; what you don't put in a common library is often at least as important as what you do. Users of the library won't want to have to think, so usage needs to be super easy.
The golden rule we stuck to was keeping individual functions focused on a single task: do one thing and do it well (or very, very well). Don't try to provide something that takes every possibility into account; the more reusable you think you're making it, the less likely it is to be used. Code Complete (the book) has some excellent content on common libraries.
A good approach to setting up or improving a library is to do regular code reviews and retrospectives; find good candidates that you've already come up with and consider refactoring them into a library for future projects. A good candidate will be something that more than one developer has had to write on more than one project (for example).
Set up some sort of simple and clear governance of the libraries: someone who can 'own' a specific library and ensure its overall quality (such as a senior dev or team lead).
I have so far written most of the common libraries we use at our office.
We have certain button classes that are just slightly more useful to us than the standard buttons
A database management class that does some internal caching and can connect to ODBC, OLEDB, SQL, and Access databases without even the flip of a parameter
Some grid and list controls that are multithreaded, so we can add large amounts of data to them without the program slowing down and without having to write all the multithreading code every time there is a performance issue with a list box/combo box.
These classes make it easier for all of us to work on each other's code and know how exactly they work since we all use the exact same interfaces throughout our products.
As far as organization goes, all of the DLLs are stored along with their source code on a shared development drive in the office that we all have access to. (We're a pretty small shop.)
We split our libraries by function.
Common.Ui.dll has base classes for UI elements.
Common.Data.dll is sort of a wrapper around the Enterprise Library data access classes.
Common.Business is a dumping ground for other common classes that don't fit into either of those.
We create other specialized DLLs as needs arise.
At work we're developing a large-scale application with quite a few front-end, back-end and support components. Typically the front-end is developed in C# and the back-end is developed in Java, although parts of the back-end are also developed in C# and possibly later C++.
The choice of language and platform is not arbitrary; we try to weigh the relative merits of each in development time, tool-chain cost, familiarity with the language by the specific development team etc. What all these components have in common, though, is that they are all required for the complete operation of the product, and that they are being developed concurrently by independent (but highly communicative) teams.
Previously, we have used Team Foundation Server for our .NET code and Subversion for our Java code; because there was clear separation of the teams' responsibilities, this caused little problem beyond the inconvenience of placing binaries (WARs, in this case) generated from one source tree in another, and the high manual overhead of keeping the branches and revisions in sync. With this project, the degree of separation between the teams is intentionally much smaller, and the volume of branching/merging is expected to be considerably higher; as a result we're moving to a unified VCS, more specifically Subversion.
This brings me to the meat of the question: how does one mix Java and C# code effectively? In practice, we'll have .NET code dependent on a Java codebase; the Java binaries are required to run anything other than unit test code (integration tests already require the binaries, and QA, acceptance testing etc. certainly do as well). What we currently have in mind looks something like:
/trunk
    /java
        /component1
        /component2
        /library1
        /library2
    /net
        /assembly1
        /assembly2
        /...
        project.sln
The idea is that the entire source tree is placed under one branch; the .NET code is dependent on the Java code, so we'll add a post-build step to the solution which will (most likely) call the Ant script for the Java components. This allows branching of the entire codebase (for .NET developers) or of just the Java components (for Java developers).
The problems with this solution are:
What happens when one of the two codebases becomes so large that making copies of it for every branch gets impractical? (Our thoughts: split into separate repositories for .NET and Java code and use svn:externals; any input on this would be greatly appreciated.)
We use Eclipse for Java development. How do we manage the "shared" workspace (i.e. which projects are required for which components, the dependency graph etc.)? Up until now we've had relatively few Java components, so each developer could just keep all of them in the workspace at the same time. With the increase in Java components and Java developers I don't see how we can keep doing that; any suggestions on how to keep the workspace versioned (a la solution files) while still maintaining sync between the two code-bases?
I would love to hear your input!
1: I've found it best to group things by component rather than by language. If one component requires several languages for its interface, you still need to develop, test and release them as one, so splitting a single component across several repos is not a good idea.
If one part of the code depends tightly on another, keep it together; if you do split across repos, split along component boundaries. (This even goes for internal structure: especially as things grow, it's difficult if you package things by type rather than by function. E.g. in MVC, don't have three huge packages, one per category; rather, keep FooView, FooModel and FooController close together.)
svn:externals might work, and with later versions I think you can use "internals", i.e. link to other directories in the same repo. That is miles easier than managing separate repos, especially when tagging and branching. (shudder)
2: You could always have the developers set up different workspaces, or perhaps use working sets. The commercial Eclipse releases have better support for sharing workspace settings than the open-source variant. (Haven't tried them; I've only worked with, and been frustrated by, the open-source one.)
I've done C++ (MSVS) and Java (Eclipse) in one repo, and it works pretty well. Also C++/Python similarly. Make sure your build system supports building and testing everything (even if your IDEs only build one part).