I am looking for an implementation of an immutable priority queue for at least Scala 2.8, but preferably more current. Is there a good implementation somewhere?
Some links here: http://www.scala-lang.org/old/node/10374
In particular, see https://github.com/scalaz/scalaz/blob/master/core/src/main/scala/scalaz/FingerTree.scala and https://github.com/Sciss/FingerTree
I think you can trust the code in scalaz to be sound. If you want a lighter-weight library, you can examine the source code from Sciss and see what you think.
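The standard library's PriorityQueue lives in scala.collection.mutable. For comparison with the finger-tree libraries linked above, here is a minimal, dependency-free sketch of a different structure: a persistent leftist min-heap. It is not taken from scalaz or Sciss. The names (LeftistHeap, insert, peekMin, deleteMin) are illustrative, and it assumes an implicit Ordering for the element type. Every operation returns a new heap and leaves the old one usable.

```scala
import scala.annotation.tailrec

// A minimal persistent (immutable) leftist min-heap. Hand-rolled sketch,
// not the scalaz or Sciss FingerTree API.
object LeftistHeap {
  sealed trait Heap[A] { def rank: Int }
  final case class Empty[A]() extends Heap[A] { val rank = 0 }
  // rank = length of the right spine; the leftist invariant keeps it O(log n)
  final case class Node[A](rank: Int, min: A, left: Heap[A], right: Heap[A]) extends Heap[A]

  def empty[A]: Heap[A] = Empty[A]()

  // Put the child with the larger rank on the left to preserve the invariant.
  private def make[A](x: A, a: Heap[A], b: Heap[A]): Heap[A] =
    if (a.rank >= b.rank) Node(b.rank + 1, x, a, b)
    else Node(a.rank + 1, x, b, a)

  // Merge walks only the right spines; the input heaps are never modified.
  def merge[A](h1: Heap[A], h2: Heap[A])(implicit ord: Ordering[A]): Heap[A] =
    (h1, h2) match {
      case (Empty(), h) => h
      case (h, Empty()) => h
      case (n1 @ Node(_, x, l1, r1), n2 @ Node(_, y, l2, r2)) =>
        if (ord.lteq(x, y)) make(x, l1, merge(r1, n2))
        else make(y, l2, merge(n1, r2))
    }

  def insert[A](x: A, h: Heap[A])(implicit ord: Ordering[A]): Heap[A] =
    merge(Node(1, x, Empty[A](), Empty[A]()), h)

  def peekMin[A](h: Heap[A]): Option[A] = h match {
    case Node(_, x, _, _) => Some(x)
    case Empty()          => None
  }

  // Returns a new heap without the minimum; the original heap is unchanged.
  def deleteMin[A](h: Heap[A])(implicit ord: Ordering[A]): Heap[A] = h match {
    case Node(_, _, l, r) => merge(l, r)
    case Empty()          => h
  }

  def toSortedList[A](h: Heap[A])(implicit ord: Ordering[A]): List[A] = {
    @tailrec def loop(cur: Heap[A], acc: List[A]): List[A] = cur match {
      case Node(_, x, l, r) => loop(merge(l, r), x :: acc)
      case Empty()          => acc.reverse
    }
    loop(h, Nil)
  }
}
```

Insert, merge and deleteMin are O(log n) and peekMin is O(1). The finger trees linked above are more general; this heap only answers "what is the smallest element".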
I have read various old StackOverflow discussions on this general topic but there is still one part of the puzzle which appears, to me at least, to be missing.
It is simply this: what is the actual mechanism by which the anonymous function is serialized? And, where could we find its source code?
Or is it all just magic?
Other relevant SO articles (the third of these itself points to some useful articles outside StackOverflow):
Serialization of Scala Functions
Why Scala can serialize...
How to serialize functions in Scala
I'm going to answer my own question with what I believe is the correct answer. I'm doing it this way because this aspect of serialization never seems to be explained, and it does appear to work just by magic. I essentially confirmed the answer (to my satisfaction) as part of the research I did to make sure that my question above was indeed appropriate.
But the main reason I'm offering my own answer is that I invite knowledgeable users either to agree with it, to correct it, to expand upon it, or to destroy it. Here goes...
It's all magic. No, I'm just kidding. Essentially, once Scala has taken the step of representing the anonymous function as a class, the mechanism is provided entirely by Java. In addition, we as programmers need to keep an anonymous function as close to pure code as possible: it must not capture references to any objects that are not serializable. The secret sauce is to be found in the Java class ObjectStreamClass, which in turn is used by the Java serialization classes ObjectInputStream and ObjectOutputStream.
Essentially, the serialized bytes contain the fully qualified name of the class, its serialVersionUID, and the values of the instance's serializable fields (for a closure, its captured variables). When deserializing, the system simply looks up the class through the appropriate class loader and rebuilds the instance from those values. This obviously assumes that the deserializing system has the class on its classpath. How that is arranged is a little beyond the scope of my research, but it's clear that in a system like Spark it should be easy to arrange.
No (additional) compilation/decompilation of byte code is necessary as the classLoader has everything necessary. I'm slightly surprised to find the ObjectStreamClass in java.io rather than in the reflection package, but I suppose there's an argument for it being there, given the tight coupling with ObjectInputStream and ObjectOutputStream.
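One thing to keep in mind: while we think in terms of serializing and deserializing objects, what the stream mostly carries is a reference to a Class, namely the name of the anonymous function's class plus the values of any captured fields. The bytecode itself is never written to the stream.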
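A minimal round trip through ObjectOutputStream and ObjectInputStream, assuming a recent Scala version in which function literals are serializable by default. The helper names (roundTrip, adder, badAdder, NotSerializable) are illustrative. The captured Int travels in the byte stream, while the function body is found again through the class loader; capturing a non-serializable object makes serialization fail, which is the "pure code" caveat above in practice.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object FunctionSerialization {
  // Write an object with plain Java serialization and read it back.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // `n` is captured by the closure, so its value is written into the stream;
  // the code of the body is not written, it is located again via the class loader.
  def adder(n: Int): Int => Int = x => x + n

  final class NotSerializable(val n: Int)

  // Capturing a non-serializable object makes writeObject throw.
  def badAdder(ns: NotSerializable): Int => Int = x => x + ns.n
}
```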
One more thing to note: in Scala 2.12, anonymous functions are implemented differently, as Java 8 lambdas. This broke the mechanism described above in a rather serious way, so serious that, at the time of writing, Spark was having trouble supporting Scala 2.12. The holdup appeared to be this issue: SPARK-14540.
Is anyone aware of a standard API equivalent to Akka's ByteString: http://doc.akka.io/api/akka/2.3.5/index.html#akka.util.ByteString
This very convenient class has no dependency on any other Akka code, and it saddens me to have to import the whole Akka jar just to use it.
I found this fairly old discussion mentioning adding it to the standard API, but I don't know what happened to this project: https://groups.google.com/forum/#!msg/scalaz/ZFcjGpZswRc/0tCIdXvpGBAJ
Does anyone know of an equivalent piece of code in the standard API? Or in a very lightweight library?
You might want to check out scodec-bits. It provides two types, BitVector and ByteVector (API docs), supporting fast appends, take, drop, random access, etc. The library has zero dependencies. We split it out of scodec precisely because we thought it might be of general use outside of scodec, where it's used heavily.
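If even a zero-dependency library is too much, a standard-library approximation is possible, assuming Scala 2.13 or later: scala.collection.immutable.ArraySeq is an immutable sequence with a byte-specialized variant. The helper names below (ByteSeqDemo, fromBytes, fromString, decode) are hypothetical. It gives indexing, take, drop and ++, but concatenation copies the underlying arrays, so it lacks the fast appends described for ByteVector.

```scala
import java.nio.charset.StandardCharsets
import scala.collection.immutable.ArraySeq

object ByteSeqDemo {
  // Copy first, so later writes to the caller's array cannot leak into our value.
  def fromBytes(bytes: Array[Byte]): ArraySeq[Byte] =
    ArraySeq.unsafeWrapArray(bytes.clone())

  // getBytes already returns a fresh array, so wrapping it without a copy is safe.
  def fromString(s: String): ArraySeq[Byte] =
    ArraySeq.unsafeWrapArray(s.getBytes(StandardCharsets.UTF_8))

  def decode(bytes: ArraySeq[Byte]): String =
    new String(bytes.toArray, StandardCharsets.UTF_8)
}
```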
Does Scala have any well developed libraries in the spirit of Haskell's pipes, or at least iteratee?
I found Play's iteratee library first, but I couldn't make it work, and it seems tightly coupled with Play's concurrency primitive Promise, which could be inappropriate in many cases.
Scalaz has some iteratee support (like IterV), but it seems there are only core classes with no additional support functions, predefined iteratees/enumerators etc. Also I couldn't find any documentation, even scaladoc is very sparse, so it's quite difficult to use properly.
And I couldn't find anything similar to pipes.
Building on comments from Travis, there are currently:
Scalaz 7's iteratee package (IterV, which you mentioned, is a compatibility layer for Scalaz 6)
A port of the Conduit library
Runar's scala-machines library (presentation, Haskell version)
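To make the vocabulary of these libraries concrete, here is a toy iteratee in plain Scala. It sketches the general idea (a consumer that asks for input one element at a time and may stop early, fed by an enumerator); it is not the API of Scalaz 7, Play, or scala-machines, and all names are made up for illustration.

```scala
import scala.annotation.tailrec

// Toy iteratee for illustration only; not any library's API.
object MiniIteratee {
  sealed trait Input[+E]
  final case class El[+E](e: E) extends Input[E]
  case object EOF extends Input[Nothing]

  sealed trait Iteratee[E, A]
  final case class Done[E, A](result: A) extends Iteratee[E, A]
  final case class Cont[E, A](k: Input[E] => Iteratee[E, A]) extends Iteratee[E, A]

  // Consumer: sums every element until end of input.
  def sum: Iteratee[Int, Int] = {
    def step(acc: Int): Iteratee[Int, Int] = Cont[Int, Int] {
      case El(n) => step(acc + n)
      case EOF   => Done(acc)
    }
    step(0)
  }

  // Consumer: takes the first n elements, then stops asking for more.
  def takeN[E](n: Int): Iteratee[E, List[E]] = {
    def step(left: Int, acc: List[E]): Iteratee[E, List[E]] =
      if (left <= 0) Done(acc.reverse)
      else Cont[E, List[E]] {
        case El(e) => step(left - 1, e :: acc)
        case EOF   => Done(acc.reverse)
      }
    step(n, Nil)
  }

  // Enumerator: feeds a list until either the list or the consumer is done.
  @tailrec
  def enumerate[E, A](xs: List[E], it: Iteratee[E, A]): Iteratee[E, A] = it match {
    case Cont(k) if xs.nonEmpty => enumerate(xs.tail, k(El(xs.head)))
    case other                  => other
  }

  // Signal end of input and extract the result, if the consumer produced one.
  def run[E, A](it: Iteratee[E, A]): Option[A] = it match {
    case Done(a) => Some(a)
    case Cont(k) => k(EOF) match {
      case Done(a) => Some(a)
      case Cont(_) => None
    }
  }
}
```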
I have some Scala code that, for various reasons, contains a few lines that must not be executed by more than one thread at the same time.
How can I easily make it thread-safe? I know I could use the actor model, but that seems like overkill for a few lines of code.
I would use some kind of lock, but I cannot find any concrete examples on either Google or StackOverflow.
I think the simplest solution is to use synchronized for critical sections (just as in Java). Here is the Scala syntax for it:
someObj.synchronized {
// thread-safe part
}
It's easy to use, but it blocks, and it can cause deadlocks if several locks are acquired in inconsistent orders, so I encourage you to look at java.util.concurrent or Akka for solutions that are probably more complicated but better, some of them non-blocking.
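A runnable version of the synchronized approach: a counter whose two methods lock the same monitor, exercised by several threads. SyncCounter and hammer are illustrative names, and the Thread lambdas rely on SAM conversion, so this assumes Scala 2.12 or later. Without the synchronized blocks, concurrent `count += 1` calls could lose updates.

```scala
// A shared counter whose only mutable state is guarded by this.synchronized.
final class SyncCounter {
  private var count = 0

  // Both methods lock the same monitor (this), so reads also see the latest write.
  def increment(): Unit = this.synchronized { count += 1 }
  def get: Int = this.synchronized { count }
}

object SyncCounterDemo {
  // Starts nThreads threads that each call increment() perThread times.
  def hammer(counter: SyncCounter, nThreads: Int, perThread: Int): Int = {
    val threads = (1 to nThreads).map { _ =>
      new Thread(() => (1 to perThread).foreach(_ => counter.increment()))
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    counter.get
  }
}
```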
You can use any Java concurrency construct, such as semaphores, but I'd recommend against it, as semaphores are error-prone and clunky to use. Actors are really the best way to do it here.
Creating actors is not necessarily hard. There is a short but useful tutorial on actors over at scala-lang.org: http://www.scala-lang.org/node/242
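The core of the actor idea is confining state to one thread and sending that thread messages instead of sharing locks. The sketch below is not an actor library; it approximates that idea with only the JDK, using a single-threaded ExecutorService as the "mailbox". ConfinedCounter is a hypothetical name, and the lambda passed to execute assumes Scala 2.12 or later.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}

// Stdlib-only approximation of actor-style confinement, not an actor library.
final class ConfinedCounter {
  private val owner = Executors.newSingleThreadExecutor()
  private var count = 0 // read and written only on the owner thread

  // Enqueue the update; the caller does not wait for it to run.
  def increment(): Unit = owner.execute(() => count += 1)

  // Ask the owner thread for the value and block until it answers.
  def get: Int = owner.submit(new Callable[Int] { def call(): Int = count }).get()

  // The executor thread is non-daemon, so it must be shut down explicitly.
  def shutdown(): Unit = {
    owner.shutdown()
    owner.awaitTermination(5, TimeUnit.SECONDS)
    ()
  }
}
```

Because the single worker processes tasks in submission order, a get issued after all increments have been submitted sees all of them.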
If it is really very simple you can use synchronized: http://www.ibm.com/developerworks/java/library/j-scala02049/index.html
Or you could use some of the classes from the concurrent package in the jdk: http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/package-summary.html
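From java.util.concurrent, an explicit ReentrantLock is the closest analogue to synchronized, with extras such as a timed tryLock. A minimal sketch; LockedBuffer and withLock are illustrative names, not library APIs.

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock

// A small buffer guarded by an explicit java.util.concurrent lock.
final class LockedBuffer {
  private val lock = new ReentrantLock()
  private var items = Vector.empty[Int]

  // try/finally makes sure the lock is released even if the body throws.
  private def withLock[T](body: => T): T = {
    lock.lock()
    try body
    finally lock.unlock()
  }

  def add(x: Int): Unit = withLock { items = items :+ x }
  def snapshot: Vector[Int] = withLock { items }

  // Unlike synchronized, tryLock lets a caller give up instead of waiting forever.
  def tryAdd(x: Int, timeoutMs: Long): Boolean =
    if (lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
      try { items = items :+ x; true }
      finally lock.unlock()
    } else false
}
```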
If you want to use actors, you should use Akka actors (they will replace Scala actors in the future); see here: http://doc.akka.io/docs/akka/2.0.1/. They also support things like FSM (Finite State Machine) and STM (Software Transactional Memory).
In general, try to use pure functions or methods over immutable data structures; that helps with thread safety.
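The immutable-data advice combines well with java.util.concurrent.atomic: keep the shared state in an immutable Map and replace it atomically. A sketch, assuming Scala 2.12 or later for the lambda-to-UnaryOperator conversion; WordCounts is a hypothetical name. Note that updateAndGet may call the update function more than once under contention, which is harmless only because the function is pure.

```scala
import java.util.concurrent.atomic.AtomicReference

// Shared state is an immutable Map; threads swap in new versions atomically.
final class WordCounts {
  private val state = new AtomicReference(Map.empty[String, Int])

  // updateAndGet may retry this function under contention, so it must be pure.
  def add(word: String): Unit = {
    state.updateAndGet(m => m.updated(word, m.getOrElse(word, 0) + 1))
    ()
  }

  // Readers get a consistent immutable snapshot without any locking.
  def snapshot: Map[String, Int] = state.get
}
```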
I use a lot of Scala maps, and occasionally I want to pass one to a legacy Java API that expects a java.util.Map (I don't care if it throws away any changes).
An excellent library I have found that does a better job of this:
http://github.com/jorgeortiz85/scala-javautils
(bad name, awesome library). You explicitly invoke .asJava or .asScala depending on what direction you want to go. No surprises.
Scala provides wrappers for Java collections so that they can be used as Scala collections but not the other way around. That being said it probably wouldn't be hard to write your own wrapper and I'm sure it would be useful for the community. This question comes up on a regular basis.
This question and answer discuss this exact problem and the possible solutions. It advises against transparent conversions as they can have very strange side-effects. It advocates using scala-javautils instead. I've been using them in a large project for a few months now and have found them to be very reliable and easy to use.
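Later Scala versions ship the explicit asJava/asScala style in the standard library itself. A sketch assuming Scala 2.13, where the decorators come from scala.jdk.CollectionConverters (on 2.10 through 2.12 the same .asJava came from scala.collection.JavaConverters). legacyLookup is a hypothetical stand-in for the legacy Java API. One caveat relevant to the question: wrapping an immutable Map does not silently discard changes; the wrapper throws on put, so a Java API that mutates its argument needs a copy.

```scala
import scala.jdk.CollectionConverters._

object MapInterop {
  // Hypothetical stand-in for the legacy Java API: it only sees a java.util.Map.
  def legacyLookup(m: java.util.Map[String, String], key: String): String =
    m.getOrDefault(key, "<missing>")

  val settings: Map[String, String] = Map("host" -> "localhost", "port" -> "8080")

  // asJava wraps the Scala map instead of copying it. For an immutable Map the
  // wrapper rejects put with UnsupportedOperationException.
  val javaView: java.util.Map[String, String] = settings.asJava

  // If the Java side needs to mutate, hand it a real java.util.HashMap copy.
  def mutableCopy: java.util.Map[String, String] =
    new java.util.HashMap[String, String](javaView)
}
```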