Independent random number generators - scala

I need to create independent random number generators in scala from a given one, something similar to haskell's split function. Can I use the current number generator to produce ints or longs, and use these as seed to create new generators?
val rng: Random
val seed1 = rng.nextLong()
val seed2 = rng.nextLong()
val rng1 = new Random(seed1)
val rng2 = new Random(seed2)
Is this a good way to do it? Or does it screw the sequence of randomness (for example, by relying on Ints or Longs which may carry less information than the actual state of a random number generator)?

It depends on the specific algorithm used for generating pseudo-random numbers. If the documentation doesn't promise it gives independent streams (and Java's doesn't), assume it won't work well. Terms like "multiple streams", "parallel streams", or "splittable" can also be used.
I believe MRG32k3a is the most popular generator with multiple stream support. You can easily find Java implementations online, but the ones I did are GPL-licensed, so I refrained from including them. PCG also supports multiple streams and there is a Scala implementation in Spire. (And a Java implementation at https://github.com/alexeyr/pcg-java).

I think it depends on what your usecase is.
If you just want to make sure the rngs numbers advance independently (e.g., each thread or actor gets an rng and you want to stay deterministic), your approach is fine.

Related

Functional implementation of Tarjan's Strongly Connected Components algorithm

I went ahead and implemented the textbook version of Tarjan's SCC algorithm in Scala. However, I dislike the code - it is very imperative/procedural with lots of mutating states and book-keeping indices. Is there a more "functional" version of the algorithm? I believe imperative versions of algorithms hide the core ideas behind the algorithm unlike the functional versions. I found someone else encountering the same problem with this particular algorithm but I have not been able to translate his Clojure code into idomatic Scala.
Note: If anyone wants to experiment, I have a good setup that generates random graphs and tests your SCC algorithm vs running Floyd-Warshall
See Lazy Depth-First Search and Linear Graph Algorithms in Haskell by David King and John Launchbury. It describes many graph algorithms in a functional style, including SCC.
The following functional Scala code generates a map that assigns a representative to each node of a graph. Each representative identifies one strongly connected component. The code is based on Tarjan's algorithm for strongly connected components.
In order to understand the algorithm it might suffice to understand the fold and the contract of the dfs function.
def scc[T](graph:Map[T,Set[T]]): Map[T,T] = {
//`dfs` finds all strongly connected components below `node`
//`path` holds the the depth for all nodes above the current one
//'sccs' holds the representatives found so far; the accumulator
def dfs(node: T, path: Map[T,Int], sccs: Map[T,T]): Map[T,T] = {
//returns the earliest encountered node of both arguments
//for the case both aren't on the path, `old` is returned
def shallowerNode(old: T,candidate: T): T =
(path.get(old),path.get(candidate)) match {
case (_,None) => old
case (None,_) => candidate
case (Some(dOld),Some(dCand)) => if(dCand < dOld) candidate else old
}
//handle the child nodes
val children: Set[T] = graph(node)
//the initially known shallowest back-link is `node` itself
val (newState,shallowestBackNode) = children.foldLeft((sccs,node)){
case ((foldedSCCs,shallowest),child) =>
if(path.contains(child))
(foldedSCCs, shallowerNode(shallowest,child))
else {
val sccWithChildData = dfs(child,path + (node -> path.size),foldedSCCs)
val shallowestForChild = sccWithChildData(child)
(sccWithChildData, shallowerNode(shallowest, shallowestForChild))
}
}
newState + (node -> shallowestBackNode)
}
//run the above function, so every node gets visited
graph.keys.foldLeft(Map[T,T]()){ case (sccs,nextNode) =>
if(sccs.contains(nextNode))
sccs
else
dfs(nextNode,Map(),sccs)
}
}
I've tested the code only on the example graph found on the Wikipedia page.
Difference to imperative version
In contrast to the original implementation, my version avoids explicitly unwinding the stack and simply uses a proper (non tail-) recursive function. The stack is represented by a persistent map called path instead. In my first version I used a List as stack; but this was less efficient since it had to be searched for containing elements.
Efficiency
The code is rather efficient. For each edge, you have to update and/or access the immutable map path, which costs O(log|N|), for a total of O(|E| log|N|). This is in contrast to O(|E|) achieved by the imperative version.
Linear Time implementation
The paper in Chris Okasaki's answer gives a linear time solution in Haskell for finding strongly connected components. Their implementation is based on Kosaraju's Algorithm for finding SCCs, which basically requires two depth-first traversals. The paper's main contribution appears to be a lazy, linear time DFS implementation in Haskell.
What they require to achieve a linear time solution is having a set with O(1) singleton add and membership test. This is basically the same problem that makes the solution given in this answer have a higher complexity than the imperative solution. They solve it with state-threads in Haskell, which can also be done in Scala (see Scalaz). So if one is willing to make the code rather complicated, it is possible to implement Tarjan's SCC algorithm to a functional O(|E|) version.
Have a look at https://github.com/jordanlewis/data.union-find, a Clojure implementation of the algorithm. It's sorta disguised as a data structure, but the algorithm is all there. And it's purely functional, of course.

Scala Buffer: Size or Length?

I am using a mutable Buffer and need to find out how many elements it has.
Both size and length methods are defined, inherited from separate traits.
Is there any actual performance difference, or can they be considered exact synonyms?
They are synonyms, mostly a result of Java's decision of having size for collections and length for Array and String. One will always be defined in terms of the other, and you can easily see which is which by looking at the source code, the link for which is provided on scaladoc. Just find the defining trait, open the source code, and search for def size or def length.
In this case, they can be considered synonyms. You may want to watch out with some other cases such as Array - whilst length and size will always return the same result, in versions prior to Scala 2.10 there may be a boxing overhead for calling size (which is provided by a Scala wrapper around the Array), whereas length is provided by the underlying Java Array.
In Scala 2.10, this overhead has been removed by use of a value class providing the size method, so you should feel free to use whichever method you like.
As of Scala-2.11, these methods may have different performance. For example, consider this code:
val bigArray = Array.fill(1000000)(0)
val beginTime = System.nanoTime()
var i = 0
while (i < 2000000000) {
i += 1
bigArray.length
}
val endTime = System.nanoTime()
println(endTime - beginTime)
sys.exit(-1)
Running this on my amd64 computer gives about 2423834 nanos time (varies from time to time).
Now, if I change the length method to size, it will become about 70764719 nanos time.
This is more than 20x slower.
Why does it happen? I didn't dig it through, I don't know. But there are scenarios where length and size perform drastically different.
They are synonyms, as the scaladoc for Buffer.size states:
The size of this buffer, equivalent to length.
The scaladoc for Buffer.length is explicit too:
The length of the buffer. Note: xs.length and xs.size yield the same result.
Simple advice: refer to the scaladoc before asking a question.
UPDATE: Just saw your edit adding mention of performance. As Daniel C. Sobral aid, one is normally always implemented in term of the other, so they have the same performance.

Is it possible to implement F#'s infrastructure for Units of Measurement in Scala?

F# ships with special support for a unit of measurement system, which provides static type safety while compiling down to the numeric types instead of burdening the runtime with wrapping/unwrapping operations.
Is it possible to use some of Scala's type system magic to implement something comparable to that?
The answer is no.
Now, someone is bound to point me to Scalar, but that gives runtime checking. Perhaps, then, point to the efforts of Jesper Nordenberg's type-safe units or Jim McBeath's take on it, but these are cumbersome and awkward.
I'll point, instead to the Units compiler plugin. It gave Scala, back in 2008/2009, a pretty good system of units, as can be seen in this post. It did so, however, by extending the compiler, which would not be necessary if the type system was enough. Alas, it has not been maintained and it doesn't work anymore.
I don't know anything about it, but I just stumbled accross this talk at Scala Days: https://wiki.scala-lang.org/display/SW/ScalaDays+2011+Resources#ScalaDays2011Resources-ScalaUImplementingaScalalibraryforUnitsofMeasure
Kind of. You can encode the SI units quite easily using a type representation of integers in a tuple of exponents. See http://svn.assembla.com/svn/metascala/src/metascala/Units.scala for an example implementation.
It should also be possible to support an extensible units system if the units are encoded as a TList of pairs of a unit type and an integer (for example, ((M, _1), (S, _2)) where M <: Unit and S <: Unit). Calculating the types for quantity operations becomes a bit more complicated in this encoding.
Regarding performance there will always be a memory overhead for wrapping the value in a type containing the unit information. However there is probably no performance overhead in the actual operations as all unit checking is done at compile time.
Have a look at Units of Measure - A Scala Macro System. It seems to satisfy your requirements.

Restriction on Range

i'm surprised. Why was made restriction of implementation to type Range, is whose the size limited by Int.MaxValue?
Thanks.
From the NumericRange docs,
NumericRange is a more generic version of the Range class which works
with arbitrary types. It must be supplied with an Integral
implementation of the range type.
Factories for likely types include Range.BigInt, Range.Long, and
Range.BigDecimal. Range.Int exists for completeness, but the Int-based
scala.Range should be more performant.
val r1 = new Range(0, 100, 1)
val veryBig = Int.MaxValue.toLong + 1
val r2 = Range.Long(veryBig, veryBig + 100, 1)
assert(r1 sameElements r2.map(_ - veryBig))
In my opinion the other answer is just wrong.
It demonstrates that you can use other number types, but this doesn't change the fact that a Range can only hold 2³¹ elements, like every other collection in Scala/Java.
As far as I know there is no real rationale behind this design decision. Having 64-bit collections would be certainly nice and support for arrays with 64bit indices are common for Java, but it is hard to integrate that into the existing language/collection framework. Some people say that the JVM is limited to a total of 4 billion objects, but I couldn't verify that.

Implementing a Measured value in Scala

A Measured value consists of (typically nonnegative) floating-point number and unit-of-measure. The point is to represent real-world quantities, and the rules that govern them. Here's an example:
scala> val oneinch = Measure(1.0, INCH)
oneinch : Measure[INCH] = Measure(1.0)
scala> val twoinch = Measure(2.0, INCH)
twoinch : Measure[INCH] = Measure(2.0)
scala> val onecm = Measure(1.0, CM)
onecm : Measure[CM] = Measure(1.0)
scala> oneinch + twoinch
res1: Measure[INCH] = Measure(3.0)
scala> oneinch + onecm
res2: Measure[INCH] = Measure(1.787401575)
scala> onecm * onecm
res3: Measure[CMSQ] = Measure(1.0)
scala> onecm * oneinch
res4: Measure[CMSQ] = Measure(2.54)
scala> oncem * Measure(1.0, LITER)
console>:7: error: conformance mismatch
scala> oneinch * 2 == twoinch
res5: Boolean = true
Before you get too excited, I haven't implemented this, I just dummied up a REPL session. I'm not even sure of the syntax, I just want to be able to handle things like adding Measured quantities (even with mixed units), multiplying Measured quantities, and so on, and ideally, I like Scala's vaunted type-system to guarantee at compile-time that expressions make sense.
My questions:
Is there extant terminology for this problem?
Has this already been done in Scala?
If not, how would I represent concepts like "length" and "length measured in meters"?
Has this been done in some other language?
A $330-million Mars probe was lost because the contractor was using yards and pounds and NASA was using meters and newtons. A Measure library would have prevented the crash.
F# has support for it, see for example this link for an introduction. There has been some work done in Scala on Units, for example here and here. There is a Scala compiler plugin as well, as described in this blog post. I briefly tried to install it, but using Scala 2.8.1, I got an exception when I started up the REPL, so I'm not sure whether this plugin is actively maintained at the moment.
Well, this functionality exists in Java, meaning you can use it directly in Scala.
jsr-275, which was moved to google code. jscience implements the spec. Here's a good introduction. If you want a better interface, I'd use this as a base and build a wrapper around it.
Your question is fully answered with one word. You can thank me later.
FRINK. http://futureboy.us/frinkdocs/
FYI, I have developed a Scalar class in Scala to represent physical units. I am currently using it for my R&D work in air traffic control, and it is working well for me. It does not check for unit consistency at compile time, but it checks at run time. I have a unique scheme for easily substituting it with basic numeric types for efficiency after the application is tested. You can find the code and the user guide at
http://russp.us/scalar-scala.htm
Here is the summary from the website:
Summary-- A Scala class was designed to represent physical scalars and to eliminate errors involving implicit physical units (e.g., confusing radians and degrees). The standard arithmetic operators are overloaded to provide syntax identical to that for basic numeric types. The Scalar class itself does not define any units but is part of a package that includes a complete implementation of the standard metric system of units and many common non-metric units. The scalar package also allows the user to define a specialized or reduced set of physical units for any particular application or domain. Once an application has been developed and tested, the Scalar class can be switched off at compile time to achieve the execution efficiency of operations on basic numeric types, which are an order of magnitude faster. The scalar class can also be used for discrete units to enforce type checking of integer counts, thereby enhancing the static type checking of Scala with additional dynamic type checking.
Let me clarify my previous post. I should have said, "These kinds of errors ["meter/yard conversion errors"] are automatically AVOIDED (not "handled") by simply using my Scalar class. All unit conversions are done automatically. That's the easy part.
The harder part is the checking for unit inconsistencies, such as adding a length to a velocity. This is where the issue of dynamic vs. static type checking comes up. I agree that static checking is generally preferable, but only if it can be done without sacrificing usability and convenience.
I have seen at least two "projects" for static checking of units, but I have never heard of anyone actually using them for real work. If someone knows of a case where they were used, please let me know. Until you use software for real work, you don't know what sorts of issues will come up.
As I wrote above, I am currently using my Scalar class (http://russp.us/scalar-scala.htm) for my R&D work in ATC. I've had to make many tweaks along the way for usability and convenience, but it is working well for me. I would be willing to consider a static units implementation if a proven one comes along, but for now I feel that I have essentially 99% of the value of such a thing. Hey, the vast majority of scientists and engineers just use "Doubles," so cut me some slack!
"Yeah, ATC software with run-time type checking? I can see headlines now: "Flight 34 Brought Down By Meter/Yard Conversion"."
Sorry, but you don't know what you're talking about. ATC software is tested for years before it is deployed. That is enough time to catch unit inconsistency errors.
More importantly, meter/yard conversions are not even an issue here. These kinds of errors are automatically handled simply by using my Scalar class. For those kinds of errors, you need neither static nor dynamic checking. The issue of static vs. dynamic checking comes up only for unit inconsistencies, as in adding length to time. These kinds of errors are less common and are typically caught with dynamic checking on the first test run.
By the way, the interface here is terrible.