Simultaneous main()s for SBT-based project - scala

Say you have an sbt-based project targeting a distributed system. That is, the project contains one or more Play applications (with a hierarchy of subprojects) as well as a few other services and tools of your own, e.g. for loading test data. To develop such a project you inevitably want to run many main()s at the same time.
At the moment I have settled on this approach: an sbt terminal session is used to run Play, and Scala IDE is used for everything else. To eliminate clashes I was forced to write my own template engine and router for Play (that is, to eliminate managed sources, in Play's terms).
On the other hand, I don't want to be tied to Scala IDE (or any IDE); I would like to be able to start many main()s simultaneously, and track the output of each of them, from sbt sessions themselves.
What does your sbt-based development environment for distributed systems look like?

While not immediately helpful, you may be interested in the backgroundRun task some of us are working on adding to sbt. For now the API is in this plugin code:
https://github.com/sbt/sbt-remote-control/blob/master/ui-interface/src/main/scala/sbt/BackgroundRun.scala
It's just a version of run that doesn't block, and you can manage jobs much as you can in bash.
Supporting Play is harder than supporting generic run; it requires this PR: https://github.com/playframework/playframework/pull/3591
Anyway, none of this is published and working yet, but we are working on it.
The best short-term solution, I think, is to run multiple instances of sbt and be careful not to compile in more than one of them at once. If you get any weird build or classpath errors, do a clean and make sure they weren't caused by multiple compiles stomping on each other.
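If juggling several terminals becomes tedious, the same workaround can be scripted: launch one sbt process per main() and tag each process's output so they can be told apart. Below is a minimal sketch in plain Java using ProcessBuilder; the project name and main-class names are made-up placeholders, and since this is just a wrapper around "multiple sbt instances", the caveat about concurrent compiles stomping on each other still applies.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;

public class MultiMainLauncher {

    public static void main(String[] args) throws IOException {
        // Placeholder commands: one sbt instance per main() you want running.
        List<List<String>> commands = List.of(
                List.of("sbt", "project web", "run"),
                List.of("sbt", "runMain com.example.tools.LoadTestData"),
                List.of("sbt", "runMain com.example.services.Worker"));

        for (List<String> command : commands) {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .start();
            String tag = command.get(command.size() - 1);
            // One thread per process, prefixing every output line with its tag.
            new Thread(() -> {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(process.getInputStream()))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        System.out.println("[" + tag + "] " + line);
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }).start();
        }
    }
}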

Related

How can I detect the frameworks and/or libraries used in any source code repository/directory programmatically?

Suppose I have a source code directory. I want to run a script that would scan the code in the directory and return the languages, frameworks and libraries used in it. I've tried github/linguist; it's a great tool which even GitHub uses to detect the programming languages used in a source tree, but I am not able to go beyond that and detect the frameworks exactly.
I even tried tools like it-depends to fetch the dependencies, but it gets messed up.
Could someone help me figure out how to do this, either with an existing tool or, if I have to build such a tool myself, how I should approach it?
Thanks in advance.
This is, in the general case, impossible. The halting problem precludes any program from being able to compute, in finite time, what other programs may or may not do - including what dependencies they require to run. Sure, you can make it work for some inputs - but never for all.
So you have to compromise:
Which languages do you need to support? it-depends does not try to support Java, for example. Different languages have different ways of pulling in dependencies from their source code. For example, if working with C, you will want to look at #includes.
Which build chains do you need to support? Parsing a standard Makefile for C is very different from, say, looking into a Maven pom.xml for Java. Additionally, build chains can perform arbitrary computation - and again, due to the halting problem, your dependency-detection program will not be able to "statically" figure out the intended behavior. It is entirely possible to link against one library or another (or none at all) depending on what is detected to exist. What should you output in this case? For programs that have no documented build process, you simply cannot know their dependencies. Often the build process is human-documented but not machine-readable.
What do you consider a library/framework? Long-lived libraries can evolve through many different versions, and the fact that one version is required and not another may not be explicit in the source code. If a code base depends on behavior found only in a specific, now superseded, version of a library, and no explicit mention of that version is found, your dependency-detection program will have no way to know about it (unless you code in library-version-specific detection; which is doable, but on a case-by-case basis, and requires deep knowledge of the differences between versions).
Therefore the answer to your question is that... it depends (the it-depends project goes into a fair amount of detail regarding these limitations). For the specific case of Java + Maven, which is not covered by it-depends, you can use Maven itself, via mvn dependency:tree. Choose a subset of the problem instead of trying to solve it all at once.
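If you do choose a subset, a pragmatic first step is to detect well-known build manifests rather than parse the source itself. The sketch below is plain Java (not it-depends or linguist), and the file-name-to-ecosystem mapping is only an illustrative assumption; each hit tells you which ecosystem-specific tool (e.g. mvn dependency:tree for Maven) to run next for the actual dependency list.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.stream.Stream;

public class BuildManifestScanner {

    // Illustrative mapping only; extend it for the ecosystems you care about.
    private static final Map<String, String> MANIFESTS = Map.of(
            "pom.xml", "Maven (Java)",
            "build.gradle", "Gradle (Java/Kotlin)",
            "build.sbt", "sbt (Scala)",
            "package.json", "npm (JavaScript)",
            "requirements.txt", "pip (Python)",
            "Makefile", "make (C/C++ and others)");

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> MANIFESTS.containsKey(p.getFileName().toString()))
                 .forEach(p -> System.out.println(
                         p + " -> " + MANIFESTS.get(p.getFileName().toString())));
        }
    }
}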

A way to leverage GWT 2.8 incremental compilation to add extra modules faster

GWT used to be able to produce and use *.gwtar files to speed up (at least a little bit) the transpilation - libraries could come with their *.gwtar files, and only modified code would need to be fully rebuilt.
I am working on a large GWT application that allows more modules to be added by the product's end user - kind of like plugins. These modules are specifically packaged and must obey certain contracts. During deployment we rebuild the app using the combined code (existing + module being added). The production build process already takes significant time, but we can live with it in development, where we use Super Dev Mode for the rest. However, end users see multi-minute builds (say 10-20!) when they add a module, and that is rather inconvenient. *.gwtar files no longer help/work - we noticed that they started breaking things with GWT 2.7, actually (the compiler reported errors with 3rd-party libraries, but only when *.gwtars were used).
GWT isn't really made to have independently compiled modules talk to one another without rebuilding (that would be a very welcome feature), but we are looking for a way to leverage GWT incremental compilation to speed up the process and improve our end users' experience. I have not been able to find documentation on where the intermediate artifacts are stored and whether/how they can be reused.
While there is a concern about the stability of this approach - i.e. the files may change from release to release - most of the time is taken by the base product, which also supplies the tooling. Thus we can change these artifacts with every release too, as needed; even if plugins don't come with them, the transpilation may still be faster.
Can anyone please help me figure out how to leverage GWT 2.8 incremental transpilation for the above use case?

gwt multiple modules without redundant code

I'm trying to find a way to get rid of redundant compilation and JS in a client's GWT code. The problem is that they have a site with multiple EntryPoints and a massive model that gets compiled for every module. We're talking about 30 GWT modules and entry points, each compiling the entire model package of the app separately. It takes about 15 minutes on my 8-core monster just to GWT-compile this beast. And yes, compilation is parallelized and uses all cores (I can hardly move my mouse in Ubuntu :) )
Changing the architecture to a single module is not really an option, I think. Is there no way to have inherits be shared between modules? The modules aren't necessarily all that big, but the problem is again that all inherits are compiled redundantly for each module. This of course has negative effects for the end user as well, since every page basically has to load the entire model JS again and again.
According to
http://www.gwtproject.org/doc/latest/DevGuideOrganizingProjects.html#DevGuideModuleXml
the suggestion still seems to be to just make one great monolithic module. Isn't there any better way?
Any tips highly appreciated!
As the GWT documentation you refer to says, GWT's mechanism for avoiding redundant code is to merge all modules into a single super GWT module which includes all the sub-modules you have in your application.
I suppose you are producing one module per page or feature of your website, so using a single module, as I say, implies that you will need a mechanism to run the appropriate application code per page, based on the URL or something similar.
You can take advantage of code splitting, so your modules will become RunAsyncCallbacks instead of EntryPoints, and each module will be compiled into one JS fragment which is loaded asynchronously.
Note that you will include the same initial JavaScript fragment in all pages, and it will load the other fragments depending on the page.
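To make that concrete, here is a minimal sketch of what the merged super-module's single EntryPoint could look like. The class, the "app" URL parameter and the startPageOne() method are hypothetical placeholders for your former per-page modules; GWT.runAsync and RunAsyncCallback are the actual API that creates the split points.

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.core.client.GWT;
import com.google.gwt.core.client.RunAsyncCallback;
import com.google.gwt.user.client.Window;
import com.google.gwt.user.client.ui.Label;
import com.google.gwt.user.client.ui.RootPanel;

public class SuperModuleEntryPoint implements EntryPoint {

    @Override
    public void onModuleLoad() {
        // Pick the former module to start, e.g. from a host-page URL parameter
        // ("app" is just an assumed convention, not a GWT requirement).
        String app = Window.Location.getParameter("app");
        if ("page-one".equals(app)) {
            GWT.runAsync(new RunAsyncCallback() {
                @Override
                public void onSuccess() {
                    // Code reachable only through this callback ends up in its own fragment.
                    startPageOne();
                }

                @Override
                public void onFailure(Throwable reason) {
                    Window.alert("Could not load page-one: " + reason);
                }
            });
        } else {
            // ... one more GWT.runAsync(...) block per former module / page.
        }
    }

    private void startPageOne() {
        // In a real app this would run what used to be the sub-module's onModuleLoad().
        RootPanel.get().add(new Label("page one loaded"));
    }
}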
The advantages of this solution are many:
You only have one compilation process. It could take a long time, but it will certainly take much less time than compiling all modules individually, because redundant code is compiled only once.
You can maintain different .gwt.xml files: one to continue developing each individual module with its own EntryPoint, and another without an EntryPoint which will be inherited by your super-module.
Once compiled, the first fragment loaded (shared by all apps) would be very small, and it is cached just once, so all apps would load very fast.
Much of the code shared by the modules (gwt-core, the JRE emulation, etc.) could go into the first fragment and would be shared by all the modules, decreasing the final download size of each app.
This is an out-of-the-box solution: the GWT compiler does a good job of splitting the code, merging shared code into intermediate fragments, and adding the methods to load fragments asynchronously on demand.
The Java ecosystem facilitates modular apps (dependencies, Maven, etc.).
Otherwise, if you still want individual modules, the way to compile all of them is what you are actually doing: executing the GWT compiler once per module (and permutation). You can improve your compilation time, though, by having a continuous integration cluster like Jenkins running jobs in parallel, or by using more brute force (memory, CPU, ...).
As you probably know, GWT compiles each module into one big JavaScript file and optimizes everything based on all available information about everything in the whole module. This is why you need to compile everything for each module.
One solution might be to create one big module but use code splitting that mirrors the module structure. Then you don't get one very large monolithic JavaScript file; instead, 'modules' are loaded as needed.
Did you try compiling with fewer localWorkers, instead of using all available cores? I've had the best results with localWorkers set to 4 (even on a 6-core machine).

New project with Maven2

What's the recommended way to use Maven for a project that might grow large in the future?
I work with Eclipse and I see different approaches. Some use one project with no sub-modules, and some, like Mahout for example, have different sub-projects for different modules (e.g., core, math, examples, etc.). You can see it at this link:
http://svn.apache.org/repos/asf/mahout/trunk/
Is there any advantage preferring one over the other?
Thanks.
The decision to split a project into modules should be driven by the way you want to design and maintain your application. Maven itself will work well whether you choose to create a multi-module project, or lump everything together in a single module. So then the question becomes, what are the advantages/disadvantages of splitting your application into multiple modules.
Some drivers for splitting your app are purely technical:
Generation of separate client and server artifacts
Generation of a command-line version and a web app version
Module dependency relationships
In other cases, it is more design-related concerns that will drive you to split up your application. This is especially important if your concern is an application that will grow over time. In these cases, you want to separate your application into specific areas of concern and have defined service boundaries through which modules interact. This allows you to evolve individual modules over time while minimizing the effect on the remaining parts of the application.
Note that Maven is especially good with large multi-module projects, because of the support it provides for dependency management and resolution.
Modules are really a core concept in Maven (and are well supported), and the obvious advantage of modular builds is... well, modularity.
Pros:
Allows better separation of concerns.
Promotes and enforces a modular design of the code.
Gives you finer-grained control over what you "apply" to each part.
Allows you to work on subparts of the code in isolation from the other parts.
Using binary dependencies speeds up compilation (vs compiling a whole monolithic project).
Allows users to depend on subparts of your application only.
Cons:
If not needed, multiple modules induce more maintenance overhead than a monolithic project.
So, Maven supports and promotes modular builds, that's part of the design.
And in some cases you don't even have the choice, because of the one (main) artifact per module golden rule: if you want to distribute an application as different parts, or to assemble several parts of an application (e.g. a client JAR and a server JAR, or JARs in a WAR, or some EJB-JARs and WARs in an EAR), you must create dedicated modules for them (Maven can't produce a WAR and an EAR from the same module).
To sum up, the modularity of your build is to some extent driven by the nature of your application; it might simply be required. But if your app is not modular by nature (say you develop a Swing client), you can use a single module. Divide it as needed when things become too complex or too big.
It's recommended to use modules because in a large project you usually have relationships between the modules, which are handled by Maven. On the other hand, since the parts are related to each other, a reactor build is a good approach, and of course the Maven release system gives you many things to support your project (branching, etc.).

What is a sensible structure for multiple-language project in source control?

At work we're developing a large-scale application with quite a few front-end, back-end and support components. Typically the front-end is developed in C# and the back-end is developed in Java, although parts of the back-end are also developed in C# and possibly later C++.
The choice of language and platform is not arbitrary; we try to weigh the relative merits of each in development time, tool-chain cost, familiarity with the language by the specific development team etc. What all these components have in common, though, is that they are all required for the complete operation of the product, and that they are being developed concurrently by independent (but highly communicative) teams.
Previously, we have used Team Foundation Server for our .NET code and Subversion for our Java code; because there was clear separation of the teams' responsibilities, this caused little problem beyond the inconvenience of placing binaries (WARs, in this case) generated from one source tree in another, and the high manual overhead of keeping the branches and revisions in sync. With this project, the degree of separation between the teams is intentionally much smaller, and the volume of branching/merging is expected to be considerably higher; as a result we're moving to a unified VCS, more specifically Subversion.
This brings me to the meat of the question: how does one mix Java and C# code effectively? In practice, we'll have .NET code dependent on a Java codebase; the Java binaries are required to run anything other than unit-test code (integration tests already require the binaries, and QA, acceptance testing etc. certainly do as well). What we currently have in mind looks something like:
/trunk
    /java
        /component1
        /component2
        /library1
        /library2
    /net
        /assembly1
        /assembly2
        /...
        project.sln
The idea is that the entire source tree is placed under one branch; the .NET code is dependent on the Java code, so we'll add a post-build step to the solution which will (most likely) call the Ant script for the Java components. This allows branching of the entire codebase (for .NET developers) or just the Java components (for Java developers).
The problems with this solution are:
What happens when one of the two codebases becomes so large that making copies of it for every branch gets impractical? (Our thoughts: split into separate repositories for .NET and Java code and use svn:externals; any input on this would be greatly appreciated.)
We use Eclipse for Java development. How do we manage the "shared" workspace (i.e. which projects are required for which components, the dependency graph etc.)? Up until now we've had relatively few Java components, so each developer could just keep all of them in the workspace at the same time. With the increase in Java components and Java developers I don't see how we can keep doing that; any suggestions on how to keep the workspace versioned (a la solution files) while still maintaining sync between the two code-bases?
I would love to hear your input!
1: I've found it best to group things by component rather than by language. If one component requires several languages for its interface, you still need to develop, test and release them as one. So splitting a component across several repos is not a good idea.
If one part of the code depends tightly on another, keep them together rather than splitting a component across repos. (This even goes for internal structure: especially as things grow, it's difficult if you package things by type rather than by function - i.e. in MVC, don't have three huge packages, one per category, but rather keep FooView, FooModel and FooController close together.)
svn:externals might work, and with the later versions I think you can use "internals", i.e. link to other dirs in the same repo. That is miles easier than managing separate repos, especially with tagging and branching. (shudder)
2: You could always have the developers set up different workspaces, or perhaps use working sets. The commercial Eclipse releases have better support for sharing workspace settings than the open-source variant. (I haven't tried it; I've only worked with, and been frustrated by, the open-source one.)
I've done C++ (MSVS) and Java (Eclipse) in one repo, and it works pretty well. Also C++/Python similarly. Make sure your build system supports building and testing everything (even if your IDEs only build one part).