Any Real-World Experience Using Software Transactional Memory? [closed] - scala

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
It seems that there has been a recent rising interest in STM (software transactional memory) frameworks and language extensions. Clojure in particular has an excellent implementation which uses MVCC (multi-version concurrency control) rather than a rolling commit log. GHC Haskell also has an extremely elegant STM monad which also allows transaction composition. Finally, so as to toot my own horn just a bit, I've recently implemented an STM framework for Scala which statically enforces reference restrictions.
All of these are interesting experiments, but they seem to be confined to that sphere alone (experimentation). So my question is: have any of you seen or used STM in the real world? If so, why? What sort of benefits did it bring? What about performance? (there seems to be a great deal of conflicting information on this point) Would you use STM again or would you prefer to use some other concurrency abstraction like actors?

I participated in the hobbyist development of the BitTorrent client in Haskell (named conjure). It uses STM quite heavily to coordinate different threads (1 per peer + 1 for storage management + 1 for overall management).
Benefits: less locks, readable code.
Speed was not an issue, at least not due to STM usage.
Hope this helps

We use it pretty routinely for high concurrency apps at Galois (in Haskell). It works, its used widely in the Haskell world, and it doesn't deadlock (though of course you can have too much contention). Sometimes we rewrite things to use MVars, if we've got the design right -- as they're faster.
Just use it. It's no big deal. As far as I'm concerned, STM in Haskell is "solved". There's no further work to do. So we use it.

The article "Software Transactional Memory: Why is it Only a Research Toy?" (Călin Caşcaval et al., Communications of the ACM, Nov. 2008), fails to look at the Haskell implementation, which is a really big omission. The problem for STM, as the article points out, is that implementations must chose between either making all variable accesses transactional unless the compiler can prove them safe (which kills performance) or letting the programmer indicate which ones are to be transactional (which kills simplicity and reliability). However the Haskell implementation uses the purity of Haskell to avoid the need to make most variable uses transactional, while the type system provides a simple model together with effective enforcement for the transactional mutation operations. Thus a Haskell program can use STM for those variables that are truly shared between threads whilst guaranteeing that non-transactional memory use is kept safe.

We, factis research GmbH, are using Haskell STM with GHC in production. Our server receives a stream of messages about new and modified "objects" from a clincal "data server", it transforms this event stream on the fly (by generating new objects, modifying objects, aggregating things, etc) and calculates which of these new objects should be synchronized to connected iPads. It also receives form inputs from iPads which are processed, merged with the "main stream" and also synchronized to the other iPads. We're using STM for all channels and mutable data structures that need to be shared between threads. Threads are very lightweight in Haskell so we can have lots of them without impacting performance (at the moment 5 per iPad connection). Building a large application is always a challenge and there were many lessons to be learned but we never had any problems with STM. It always worked as you'd naively expect. We had to do some serious performance tuning but STM was never a problem. (80% of the time we were trying to reduce short-lived allocations and overall memory usage.)
STM is one area where Haskell and the GHC runtime really shines. It's not just an experiment and not for toy programs only.
We're building a different component of our clincal system in Scala and have been using Actors so far, but we're really missing STM. If anybody has experience of what it's like to use one of the Scala STM implementations in production I'd love to hear from you. :-)

We have implemented our entire system (in-memory database and runtime) on top of our own STM implementation in C. Prior to this, we had some log and lock based mechanism to deal with concurrency, but this was a pain to maintain. We are very happy with STM since we can treat every operation the same way. Almost all locks could be removed. We use STM now for almost anything at any size, we even have a memory manager implement on top.
The performance is fine but to speed things up we now developed a custom operating system in collaboration with ETH Zurich. The system natively supports transactional memory.
But there are some challenges caused by STM as well. Especially with larger transactions and hotspots that cause unnecessary transaction conflicts. If for example two transactions put an item into a linked list, an unnecessary conflict will occur that could have been avoided using a lock free data structure.

I'm currently using Akka in some PGAS systems research. Akka is a Scala library for developing scalable concurrent systems using Actors, STM, and built-in fault tolerance capabilities modeled after Erlang's "Let It Fail/Crash/Crater/ROFL" philosophy. Akka's STM implementation is supposedly built around a Scala port of Clojure's STM implementation. An overview of Akka's STM module can be found here.

Related

Cost of Scala's immutable object creation [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I see posts like the for-comprehension in [1] and it really makes me wonder what the overall implication of using the immutable Map vs a Mutable one is. It seems like Scala developers are very comfortable with allowing mutations of immutable data structures to incur the cost of a new object- or maybe I'm just missing something. If every mutation operation on an immutable data structure is returning a new instance, though I understand it's good for thread safety, but what if i know how to fine-tune my mutable objects already to make these same guarantees?
[1] In Scala, how can I do the equivalent of an SQL SUM and GROUP BY?
In general, the only way to answer these kind of performance questions is to profile them in your real-world code. Microbenchmarks are often misleading (see e.g. this benchmarking tale) - and particularly if you're talking about concurrency the best strategy can be very different depending on how concurrent your use case is in practice.
In theory, a Sufficiently Smart Compiler™ should be able - perhaps with the help of a linear type system (inferred or otherwise) - to reproduce all the efficiency advantages of a mutable data structure. In fact, since it has more information available about the programmer's intent and is less constrained by incidental details that the programmer had to specify, such a compiler ought to be able to generate higher-performance code - and e.g. GCC rewrites code into immutable form (SSA) for optimization purposes. For an example that hits closer to home, many real-world Java programs have perfectly adequate throughput, but have issues with latency caused by Java's garbage collector stopping the world to compact the heap. A JVM that was aware that certain objects were immutable would be able to move them without stopping the world (you can simply copy the object, update all references to it, and then delete the old copy, since it doesn't matter if some threads see the old version while some of them see the new one).
In practice, it depends, and again the only way is to benchmark your specific case. In my experience, for the level of investment of programmer time that's available for most practical business problems, spending x hours on a (immutable) Scala version tends to yield a more performant program than spending the same time on a mutable Scala or Java version - indeed, in the amount of programmer time it takes to produce an acceptably-performing Scala version it would probably be impossible to complete a Java version at all (particularly if we require the same defect rate). On the other hand, if you have unlimited expert programmer time available and need to get the absolute best performance possible, you would probably want to use a very low-level mutable language (this is why LAPACK is still written in Fortran) - or even implement your algorithm directly on an FPGA as JP Morgan recently did.
But even in this case you probably want to have a prototype in a higher-level language so that you can write tests and compare the two to confirm that the high-performance implementation works correctly. Particularly if we're just talking about mutable vs. immutable in Scala, premature optimization is the root of all evil. Write your program, and then if performance is inadequate, profile it and look at the hotspots. If you really are spending too much time copying an immutable data structure, that's an appropriate time to replace it with a mutable version, and carefully check the thread safety guarantees by hand. If you're writing properly decoupled code then it should be easy to replace the performance-critical pieces as and when you need to, and until then you can reap the development time gains of code that's simpler and easier to reason about (particularly in concurrency cases). In my experience performance problems in well-written code are a lot less likely than people expect; most software performance issues are caused by a poor choice of algorithm or data structure rather than this kind of small overhead.
Your question starts with a wrong assumption, based on a misunderstanding of the cost incurring of using immutable objects.
Working with guaranteed immutable objects that are build form immutable objects allows you to use structural sharing, so you can create new objects based on the old ones without having to resort to a deep copy of the object and you can ,roughly spoken, reuse parts of the object the new on is based on.
So this mitigates the impact of using immutable objects greatly.
So what is the difference to fine-tuned, hand-crafted mutable objects ?
immutable objects fit better for the FP paradigma
compile time optimization and checks
lowers the chance of runtime exceptions
The question is very generic, so it is hard to give a definite answer. It seems that you are just uncomfortable with the amount of object allocation happening in idiomatic scala code using for comprehensions and the like.
The scala compiler does not do any special magic to fuse operations or to elide object allocations. It is up to the person writing the data structure to make sure that functional data structures reuse the as much as possible from previous versions (structural sharing). Many of the data structures used in scala collections do this reasonably well. See for example this talk about Functional Data Structures in Scala to give you a general idea.
If you are interested in the details, the book to get is Purely Functional Data Structures by Chris Okasaki. The material in this book applies also to other functional languages like Haskell and OCaml and Clojure.
The JVM is extremely good at allocating and collecting short-lived objects. So many things that seem outrageously inefficient to somebody accustomed to low level programming are actually surprisingly efficient. But there are definitely situations where mutable state has performance or other advantages. That is why scala does not forbid mutable state, but only has a preference towards immutability. If you find that you really need mutable state for performance reasons, it is usually a good idea to wrap your mutable state in an akka actor instead of trying to get low-level thread synchronization right.

What are the weaknesses in using Immutability + Actor model for concurrency programming?

While building a large multi threaded application for the financial services industry, I utilized immutable classes and an Actor model for workflow everywhere I could. I'm pretty pleased with the outcome. It uses a fair amount of heap space (its in Java btw) but the JVM's GC works pretty well with short lived immutable classes.
I am just wondering if there are any downsides to using this kind of pattern going forward? When debugging a team mates code, I often find myself recomending this pattern in one way or another. I guess once one has a hammer, everything looks like a nail. So the question is: When would this design pattern (paradigm?) work poorly?
My hunch is when memory usage is a big issue or when the project restrictions require something along the lines of low-level C, etc.
Many science simulations codes are really memory intensive. For example for cellular automata models fast memory access is more important than CPU power. In that case, accessing and modifying in place a mutable array is always faster (at least in all my trials).
All depends of your project design.
If you have some resource and lot of actors use it then the common pattern is to design accessor actor. Then when some other actor needs to ask about some resource, he ask about it accessor actor. Then the answer is copied through message channel.
Now imagine - you have really heavy resource (eg map[String, BigObject]) and other actors frequently ask about some BigObject then you waste your bandwidth.
Better idea would be to share the resource to all actors in readonly mode, and make one actor to perform writes.
Other example would be database connector which connect to database with a lot of blob data. When database connector is thread safe (as normally is) it's better to share the connector object reference to all actors, then design some actor which provides the access.
All you need to remember every that communication between actors is done by copying messages.

Clojure or Scala for bioinformatics/biostatistics/medical research [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I am not a professional programmer (my area is medical research), but I am quite capable in C/C++, and various scripting languages. A while back I got intrigued by Lisp, but I never got the time to seriously learn it. After a brief exposure to R I decided to invest more time in a functional programming language.
I would like the practicality of a JVM language and thus narrowed to Clojure and Scala. From what I understand, both can use already existing Java libraries and given at performance-critical code can be delegated to Java, have the potential to perform relatively equally well.
How do these languages compare in the application space I need them for?
Are There any real-life projects in bioinformatics using either?
Already existing code would be a serious plus, as would be good documentation and a fairly gentle learning curve. Also, how does the concurrency model of the two compare with each other?
Any significant advantages/disadvantages any one has?
I can personally vouch for Clojure as a great tool for this kind of work. (I believe Scala would be great too, I just have less experience with it).
My personal research is in the field of predictive modelling / machine learning and is very computationally intensive - so I think it has many parallels with bioinformatics or biostatistics.
My personal approach / setup includes:
Incanter used primarily as a data visualisation tool. Great for producing quick visualisations which are usually just 1-liners at the REPL. There are also lots of statistical and numerical processing tools which I believe use the Colt library under the hood. I'm not an expert in R but I understand that Incanter is roughly "R translated to Clojure/Lisp".
Exploiting quite a few Java libraries as needed. Some of these are my own, for example algorithms that I have written in Java in order to get the best possible fine-tuned performance out of the JVM. But you could equally easily use any of the other great Java libraries available, as calling Java from Clojure is very simple (.methodName object param1 param2)
Quite a lot of higher order functions to automate my workflow. For example I have a higher order function that will run an optimisation algorithm of any kind in a loop for a specified amount of time and then produce an Incanter graph of the improvement on each iteration. Not rocket science, but really easy to code up in a few lines of Clojure.
Never really having to worry about performance. You can make Clojure go pretty fast if you want to (e.g. with type hints, primitive arithmetic support etc.) but normally it's irrelevant as you're going to spend 99%+ of your cycles in well-optimised library code anyway. Hence a bit of overhead in the "glue" code is negligible - I feel I gain much more in terms of personal productivity by having a dynamic, high-level, functional language to work in.
Major use of Clojure's concurrency features - this has to be one of Clojure's strongest features. I tend to use the STM to code concurrent processes with transactions that can't interfere with each other, then kick off long-running calculations in a future so that I can get on with other tasks and wait for notification of the result.
A slowly growing collection of macros to "extend the language" when needed. I actually use macros less than I thought I would (higher order functions are often a better choice). But when you need them they are invaluable - this is where you really appreciate the value of a homoiconic language. Since they effectively allow you to add new syntax to the language itself, they are very powerful when used correctly to build the DSL that you need.
In short - I don't think you can go wrong with Clojure as a researcher.
The one thing I probably wouldn't use it for (yet) is actually writing a new numerical library - this would probably be better done in Scala or pure Java as you would probably want to adopt a more imperative / OOP style.
I am not sure about bioinformatics and biostatistics per se, but I do scientific data analysis frequently and I appreciate that Scala allows me to write as-fast-as-Java code with relative ease. I believe that it is often possible in Clojure now, but I haven't seen the benchmarks to back that up. For the time being, I think the prudent thing to assume is that they do not perform equally well. See, for example, the Computer Languages Benchmark Game, where Scala is faster than Clojure in every single test. (Ignore the horrible "pidigits" result for Clojure--Scala (and Java) are calling the GMP library written in C, which Clojure could do but because of a technical detail requiring a different wrapping for the library, isn't presently allowed in the game). Looking at multicore comparisons doesn't improve Clojure's showing, and note that the Clojure code is no shorter for these sorts of lowish-level algorithmic tasks.
Clojure is ahead for the time being with parallel collections, though the upcoming 2.9 release of Scala should make up much of the difference. Neither has a gentle learning curve when coming from C++; Scala is maybe a little easier given that the syntax outwardly looks a little more familiar. I believe there are good materials for learning each of them.
Edit: P.S. You can call R from Java (and therefore from either Clojure or Scala) using rJava (specifically the JRI interface). Edit to edit: and, these days, rScala.
Edit #2: Scala was faster than Clojure in everything at the time of writing; as of this edit, Clojure's a little ahead in one (at the cost of a huge amount of code)--but anyway, the overall point stands. (And the Scala implementation on that one test could be sped up.)
If you like R, give Incanter a try! It's R for Clojure.
Scala's is geared toward being syntactically easy for people coming from Java, which was intended to be syntactically easy for people coming from C though with two levels of indirection like this the advantage may be lost.
Clojure is getting a lot of traction in the Big Data space and maps very well onto Hadoop jobs for Huge Data. I think this would be a big advantage in the bioinformatics world.
Really, these things are largely personal taste so try both and see that makes you happy :)
If you are looking to get a feel for Clojure without a lot of "intellectual overhead" may I suggest using leiningen to get a test project started quickly?
To build on Rex's answer I would like to add some Scala libraries/products that may be of interest to you:
ADAM
Spark (sparkseq, 2)
Scala Map Reduce (SMR): http://scala-blogs.org/2008/09/scalable-language-and-scalable.html
SHadoop: http://jonhnny-weslley.blogspot.com/2008/05/shadoop.html
ScalaLab: MATLAB-like scientific computing in Scala
ScalaNLP: Collection of libraries for natural language processing (NLP), machine learning, and statistics.
Factorie: Toolkit for deployable probabilistic modeling
Gridgain: Compute cluster for Scala and Java
BioScala: Bioinformatics for the Scala programming language
I don't know Scala, so I can't offer a comparison, but I am actively using Clojure in bioinformatics projects.
The Java integration is excellent, and I have had no problem making use of the BioJava libraries.
Where Clojure's concurrency model shines is in the immutable default data types and functional programming with the seq abstraction.
In my bioinformatic work I very often find myself with a lot input data (say gene sequences) which need to be subjected to the same analysis. Once I have my analysis function I can map it over a sequence of inputs (with the results lazily generated). I have gotten full utilization of a large 48-core server simply by changing that map to a pmap.
Large scale parallelization with a single character change is hard to beat!
Of course pmap isn't a magic bullet and only helps when the analysis function computationally dominates, but the fact that map and pmap can just be plugged in and out shows the elegance and simplicity enabled by Clojure's design.
I am only passingly familiar with Scala, so the best I can do is evangelize a bit for Clojure. It's a great language, but take all this advice with a grain of salt as it's coming from an enthusiast.
If you are looking for concurrency, Clojure is fantastic both for ease of programming and for performance. The immutable data structures mean that it's trivial to work with a coherent snapshot of the world without any manual and error-prone locking; the STM makes it fairly simple to change data in a thread-sensitive way without breaking anyone else's snapshots.
My understanding is that Scala has a lot of the nice functional tools that Clojure does, but Clojure will always win syntactically by virtue of being a Lisp. If you're looking to do some specialized bioinformatics stuff, Clojure is able to hide the bits of Lisp that you don't want, and raise your own constructs to the same level as the built-in language constructs. I can't find the reference right now, but there's some well-known quote about Lisp that goes like:
Lisp is not the perfect language for any program. But it is the perfect language for building the perfect language for every program.
That's horribly paraphrased, but in my experience it has been true. It looks like you'll want a fairly specialized set of tools, and no language will make those feel as natural as a Lisp.
You have to ask yourself how important functional programming is for you. You know C++ so you probably know OO. I would say it's easier to do FP in Clojure (because you can't really drop back to OO-style) in Scala you will probebly end up dropping FP and do more OO style.
I can't really say anything about your application space.
Since you mentioned R, there is an R-like Clojure library for statistics called Incanter. I don't know about other existing projects in your application space.
There is a lot of information about both languages, so that should not be a problem. The learning curve is kind of steep with both languages. Clojure is a much smaller language and since you already know some lisp it should not be to hard to learn the important stuff. Scala has a type system that will be hard to pick up especially since your main experience is with C/C++.
Both languages have great concurrency models and you will probably be happy with both.
I have some experience in Scala and only little knowledge in Clojure, but I programmed Lisp many years ago.
Lisp is a beautiful language, but it never made it to the world, because it was too limited. I believe you need a statically-typed language to develop robust systems. The type system in Scala is not difficult to master to benefit from it. If you want to do very advanced things with it to make your libraries idiot-proof, you can, but then you will need to study the type system a little more.
Scala favours immutable types, but you can use mutables without any problem, which you sometimes do need. Concurrency in Scala is very well implemented and frameworks like akka extend and enhance these possibilities.
Scala stands a better chance to become a mainstream language since it's a fuller language. I'm afraid that Clojure is too much like Lisp (but reimplemented on the JVM). I liked Lisp a lot, but it had too many disadvantages for real-life programs. With Scala I think we have the best of both worlds (OO and functional) in a clean marriage. On top of that, Scala seems to really catch on in the market.
We have been working on some experimental code in the Rudolf/BioClojure project on GitHub. Also, look at Jan Aert's BioClojure project which is more structured.
Additionally, there is a BioCaml project in the works...

Why is Scala good for concurrency?

Are there any special concurrency operators, or is functional style programming good for concurrency? And why?
At the moment, Scala already supports two major strategies for concurrency - thread-based concurrency (derived from Java) and type-safe Actor-based concurrency (inspired by Erlang). In the nearest future (Scala 2.9), there will be two big additions to that:
Software Transactional Memory (the basis for concurrency in Clojure, and probably the second most popular concurrency-style in Haskell)
Parallel Collections (without going into detail, they allow for parallelizing basic collection transformers, like foreach or map across multiple threads).
Actor syntax (concurrency operators) is heavily influenced by Erlang (with some important additions) - with regards to the library you use (standard actors, Akka, Lift, scalaz), there will be different combinations of question and exclamation marks: ! (in the most cases for sending a message one-way), !!, !?, etc.
Besides that, first-class functions make your life way easier even when you work with old Java concurrency frameworks: ExecutorService, Fork-Join Framework, etc.
Above all of that stands immutability that simplifies concurrency a lot, making code more predictable and reliable.
There are a number of language features that make Scala good for concurrency. For example:
Most of the data structures are immutable and don't require anything special for concurrent access.
The functional style is a really good way to do highly concurrent operations.
Scala includes a really handy "actor" framework that helps with concurrent asynchronous operations.
Further reading:
http://www.ibm.com/developerworks/java/library/j-scala02049.html
http://blog.objectmentor.com/articles/2008/08/14/the-seductions-of-scala-part-iii-concurrent-programming
http://akka.io
Well, there is the hype and there is the reality. Scala got a fame for being good for concurrency because it is a functional language and because of its actors library. Functional languages are good for concurrency because they focus on immutability, which helps concurrent algorithms. Actors got their reputation because they are the base to Erlang's track record of massively concurrent systems.
So, in a sense, Scala's reputation is due to being a "me too" of successful techniques. Yet, there is something that Scala does bring to the table, which is its ability to support such additions to the language through libraries, which makes it able to adapt and adopt new techniques as they are devised.
Actors are not native to Scala, and yet there are already there different libraries in wide use that all seem to be. Neither is transactional memory, but, again, there are already libraries that look like they are.
They, these libraries, are even available for java, but there they are clunky to use.
So the secret is not so much what it can do, but that it makes it look easy.
The big keyword here is immutability. See this Wiki page. Since any variable can be defined as mutable or immutable, this is a big win for concurrency, since if an object cannot be changed, it is thread safe and therefore writing concurrent programs is easier.
Just to rain on everyone's parade a bit, see this question.
I've dual-coded a fair few multithreaded things in Scala (mainly using Futures, a bit with Actors too) and C++ (using TBB) since then (mostly Project Euler problems). General picture seems to be that Scala needs ~1/3 the number of lines of code of the C++ solution (and is quicker to write), but the C++ solution will be ~x10 faster runtime (unless you go to some efforts to avoid any "object churn", as shown in the fast version in the answer referenced above, but at that point you're losing much of the elegance of Scala). I'm still on 2.7.7 mind you; haven't tried Scala 2.8 yet.
Using Scala was my first real encounter with a language which strongly emphasised immutability and I'm impressed by its power to hugely simplify the mental model you have to maintain of object "state machines" while programming (and this in turn makes coding for concurrency easier). It's certainly influenced how I approach C++ coding.

Garbage-collectors for multi-core llvm?

I've been looking at LLVM for quite some time as a new back-end for the language I'm currently implementing. It seems to have good performance, rather high-level generation APIs, enough low-level support to optimize exotic optimizations. In addition, and although I haven't checked it myself, Apple seems to have successfully demonstrated the use of LLVM for garbage-collected multi-core programs.
So far, so good. As I'm interested in both garbage-collection and multi-core, the next step would be to choose a LLVM multi-core-able garbage-collector. Which brings me to the question: what is available? I'm aware of Jon Harrop's HLVM work, but that's about it.
Note that I need cross-platform, so Apple's GC is probably not what I'm looking for (unless there's a cross-platform version). Also note that I have nothing against stop-the-world garbage-collectors.
Thanks in advance,
Yoric
LLVM docs say that it does not support multi-threaded collectors yet.
As the matrix indicates, LLVM's
garbage collection infrastructure is
already suitable for a wide variety of
collectors, but does not currently
extend to multithreaded programs. This
will be added in the future as there
is interest.
The docs do say that to do multi-threaded garbage collection you need to stop the world and that this is a non-portable thing:
Threaded
Denotes a multithreaded mutator; the collector must still stop the
mutator ("stop the world") before
beginning reachability analysis.
Stopping a multithreaded mutator is a
complicated problem. It generally
requires highly platform specific code
in the runtime, and the production of
carefully designed machine code at
safe points.
However, shared state between threads is a nasty scaling issue. If your language communicates solely through message passing between 'tasks', and therefore there was no shared state between worker threads, then you could use a per-thread collector for the per-thread heap?
The quotes that Will gave are about LLVM's intrinsic support for GC, where you augment LLVM with C++ code telling it how to walk the stack, interpret stack frames, inject read and write barriers and so on. The primary goal of my HLVM project is to become useful with minimal effort and risk so I chose to use the shadow stack for an "uncooperative environment" in order to avoid hacking on immature internals of LLVM. Consequently, those statements about LLVM's intrinsic support for GC do not apply to HLVM's garbage collector because it does not use that infrastructure at all. My results are extremely compelling: you can achieve excellent performance with minimal effort (serial performance and parallel performance).
I believe HLVM already runs out-of-the-box across Unixs including Mac OS X because it requires only POSIX threads. I strongly disagree with the claim that writing a stop-the-world GC is difficult: it took me 5 days to write a 100-line multicore garbage collector and I barely know anything about computers. I cannot believe it would be difficult to port to Windows either.