Restriction on Range - scala

i'm surprised. Why was made restriction of implementation to type Range, is whose the size limited by Int.MaxValue?
Thanks.

From the NumericRange docs,
NumericRange is a more generic version of the Range class which works
with arbitrary types. It must be supplied with an Integral
implementation of the range type.
Factories for likely types include Range.BigInt, Range.Long, and
Range.BigDecimal. Range.Int exists for completeness, but the Int-based
scala.Range should be more performant.
val r1 = new Range(0, 100, 1)
val veryBig = Int.MaxValue.toLong + 1
val r2 = Range.Long(veryBig, veryBig + 100, 1)
assert(r1 sameElements r2.map(_ - veryBig))

In my opinion the other answer is just wrong.
It demonstrates that you can use other number types, but this doesn't change the fact that a Range can only hold 2³¹ elements, like every other collection in Scala/Java.
As far as I know there is no real rationale behind this design decision. Having 64-bit collections would be certainly nice and support for arrays with 64bit indices are common for Java, but it is hard to integrate that into the existing language/collection framework. Some people say that the JVM is limited to a total of 4 billion objects, but I couldn't verify that.

Related

Scala | FilterNot vs diff

Given:
val mySet = Array(1,2,3).toSet
val myArr = Array(1,2,2)
Code snippet 1:
val difference = mySet.filterNot(myArr.toSet)
Code snippet 2:
val difference = mySet diff myArr.toSet
From above two ways of finding difference, which one will be faster for huge sets. I am new to scala. Is predicate for filterNot going to initialize new set for each value of mySet.
Once the size of a set is > 4 then it will be a HashSet.
I suspect diff will be faster because it is implemented for diffing two HashSets whereas filterNot is more general purpose.
Considering that we have no idea what kind of implementation is used underneath (Set might be HashSet or ListSet) I would be very careful of any guessing about the performance. One version might have one algorithm of picking it, next version might use a different one. I suggest that you pick an implementation explicitly (e.g. arr.to(HashSet) in 2.13) and do some actual benchmarks to check that performance is acceptable.
And if the type you use underneath is Int then probably you would benefit from using something like BitSet or other specialized data structure.

Independent random number generators

I need to create independent random number generators in scala from a given one, something similar to haskell's split function. Can I use the current number generator to produce ints or longs, and use these as seed to create new generators?
val rng: Random
val seed1 = rng.nextLong()
val seed2 = rng.nextLong()
val rng1 = new Random(seed1)
val rng2 = new Random(seed2)
Is this a good way to do it? Or does it screw the sequence of randomness (for example, by relying on Ints or Longs which may carry less information than the actual state of a random number generator)?
It depends on the specific algorithm used for generating pseudo-random numbers. If the documentation doesn't promise it gives independent streams (and Java's doesn't), assume it won't work well. Terms like "multiple streams", "parallel streams", or "splittable" can also be used.
I believe MRG32k3a is the most popular generator with multiple stream support. You can easily find Java implementations online, but the ones I did are GPL-licensed, so I refrained from including them. PCG also supports multiple streams and there is a Scala implementation in Spire. (And a Java implementation at https://github.com/alexeyr/pcg-java).
I think it depends on what your usecase is.
If you just want to make sure the rngs numbers advance independently (e.g., each thread or actor gets an rng and you want to stay deterministic), your approach is fine.

Scala Buffer: Size or Length?

I am using a mutable Buffer and need to find out how many elements it has.
Both size and length methods are defined, inherited from separate traits.
Is there any actual performance difference, or can they be considered exact synonyms?
They are synonyms, mostly a result of Java's decision of having size for collections and length for Array and String. One will always be defined in terms of the other, and you can easily see which is which by looking at the source code, the link for which is provided on scaladoc. Just find the defining trait, open the source code, and search for def size or def length.
In this case, they can be considered synonyms. You may want to watch out with some other cases such as Array - whilst length and size will always return the same result, in versions prior to Scala 2.10 there may be a boxing overhead for calling size (which is provided by a Scala wrapper around the Array), whereas length is provided by the underlying Java Array.
In Scala 2.10, this overhead has been removed by use of a value class providing the size method, so you should feel free to use whichever method you like.
As of Scala-2.11, these methods may have different performance. For example, consider this code:
val bigArray = Array.fill(1000000)(0)
val beginTime = System.nanoTime()
var i = 0
while (i < 2000000000) {
i += 1
bigArray.length
}
val endTime = System.nanoTime()
println(endTime - beginTime)
sys.exit(-1)
Running this on my amd64 computer gives about 2423834 nanos time (varies from time to time).
Now, if I change the length method to size, it will become about 70764719 nanos time.
This is more than 20x slower.
Why does it happen? I didn't dig it through, I don't know. But there are scenarios where length and size perform drastically different.
They are synonyms, as the scaladoc for Buffer.size states:
The size of this buffer, equivalent to length.
The scaladoc for Buffer.length is explicit too:
The length of the buffer. Note: xs.length and xs.size yield the same result.
Simple advice: refer to the scaladoc before asking a question.
UPDATE: Just saw your edit adding mention of performance. As Daniel C. Sobral aid, one is normally always implemented in term of the other, so they have the same performance.

Why do immutable objects enable functional programming?

I'm trying to learn scala and I'm unable to grasp this concept. Why does making an object immutable help prevent side-effects in functions. Can anyone explain like I'm five?
Interesting question, a bit difficult to answer.
Functional programming is very much about using mathematics to reason about programs. To do so, one needs a formalism that describe the programs and how one can make proofs about properties they might have.
There are many models of computation that provide such formalisms, such as lambda calculus and turing machines. And there's a certain degree of equivalency between them (see this question, for a discussion).
In a very real sense, programs with mutability and some other side effects have a direct mapping to functional program. Consider this example:
a = 0
b = 1
a = a + b
Here are two ways of mapping it to functional program. First one, a and b are part of a "state", and each line is a function from a state into a new state:
state1 = (a = 0, b = ?)
state2 = (a = state1.a, b = 1)
state3 = (a = state2.a + state2.b, b = state2.b)
Here's another, where each variable is associated with a time:
(a, t0) = 0
(b, t1) = 1
(a, t2) = (a, t0) + (b, t1)
So, given the above, why not use mutability?
Well, here's the interesting thing about math: the less powerful the formalism is, the easier it is to make proofs with it. Or, to put it in other words, it's too hard to reason about programs that have mutability.
As a consequence, there's very little advance regarding concepts in programming with mutability. The famous Design Patterns were not arrived at through study, nor do they have any mathematical backing. Instead, they are the result of years and years of trial and error, and some of them have since proved to be misguided. Who knows about the other dozens "design patterns" seen everywhere?
Meanwhile, Haskell programmers came up with Functors, Monads, Co-monads, Zippers, Applicatives, Lenses... dozens of concepts with mathematical backing and, most importantly, actual patterns of how code is composed to make up programs. Things you can use to reason about your program, increase reusability and improve correctness. Take a look at the Typeclassopedia for examples.
It's no wonder people not familiar with functional programming get a bit scared with this stuff... by comparison, the rest of the programming world is still working with a few decades-old concepts. The very idea of new concepts is alien.
Unfortunately, all these patterns, all these concepts, only apply with the code they are working with does not contain mutability (or other side effects). If it does, then their properties cease to be valid, and you can't rely on them. You are back to guessing, testing and debugging.
In short, if a function mutates an object then it has side effects. Mutation is a side effect. This is just true by definition.
In truth, in a purely functional language it should not matter if an object is technically mutable or immutable, because the language will never "try" to mutate an object anyway. A pure functional language doesn't give you any way to perform side effects.
Scala is not a pure functional language, though, and it runs in the Java environment in which side effects are very popular. In this environment, using objects that are incapable of mutation encourages you to use a pure functional style because it makes a side-effect oriented style impossible. You are using data types to enforce purity because the language does not do it for you.
Now I will say a bunch of other stuff in the hope that it helps this make sense to you.
Fundamental to the concept of a variable in functional languages is referential transparency.
Referential transparency means that there is no difference between a value, and a reference to that value. In a language where this is true, it makes it much simpler to think about a program works, since you never have to stop and ask, is this a value, or a reference to a value? Anyone who's ever programmed in C recognizes that a great part of the challenge of learning that paradigm is knowing which is which at all times.
In order to have referential transparency, the value that a reference refers to can never change.
(Warning, I'm about to make an analogy.)
Think of it this way: in your cell phone, you have saved some phone numbers of other people's cell phones. You assume that whenever you call that phone number, you will reach the person you intend to talk to. If someone else wants to talk to your friend, you give them the phone number and they reach that same person.
If someone changes their cell phone number, this system breaks down. Suddenly, you need to get their new phone number if you want to reach them. Maybe you call the same number six months later and reach a different person. Calling the same number and reaching a different person is what happens when functions perform side effects: you have what seems to be the same thing, but you try to use it, it turns out it's different now. Even if you expected this, what about all the people you gave that number to, are you going to call them all up and tell them that the old number doesn't reach the same person anymore?
You counted on the phone number corresponding to that person, but it didn't really. The phone number system lacks referential transparency: the number isn't really ALWAYS the same as the person.
Functional languages avoid this problem. You can give out your phone number and people will always be able to reach you, for the rest of your life, and will never reach anybody else at that number.
However, in the Java platform, things can change. What you thought was one thing, might turn into another thing a minute later. If this is the case, how can you stop it?
Scala uses the power of types to prevent this, by making classes that have referential transparency. So, even though the language as a whole isn't referentially transparent, your code will be referentially transparent as long as you use immutable types.
Practically speaking, the advantages of coding with immutable types are:
Your code is simpler to read when the reader doesn't have to look out for surprising side effects.
If you use multiple threads, you don't have to worry about locking because shared objects can never change. When you have side effects, you have to really think through the code and figure out all the places where two threads might try to change the same object at the same time, and protect against the problems that this might cause.
Theoretically, at least, the compiler can optimize some code better if it uses only immutable types. I don't know if Java can do this effectively, though, since it allows side effects. This is a toss-up at best, anyway, because there are some problems that can be solved much more efficiently by using side effects.
I'm running with this 5 year old explanation:
class Account(var myMoney:List[Int] = List(10, 10, 1, 1, 1, 5)) {
def getBalance = println(myMoney.sum + " dollars available")
def myMoneyWithInterest = {
myMoney = myMoney.map(_ * 2)
println(myMoney.sum + " dollars will accru in 1 year")
}
}
Assume we are at an ATM and it is using this code to give us account information.
You do the following:
scala> val myAccount = new Account()
myAccount: Account = Account#7f4a6c40
scala> myAccount.getBalance
28 dollars available
scala> myAccount.myMoneyWithInterest
56 dollars will accru in 1 year
scala> myAccount.getBalance
56 dollars available
We mutated the account balance when we wanted to check our current balance plus a years worth of interest. Now we have an incorrect account balance. Bad news for the bank!
If we were using val instead of var to keep track of myMoney in the class definition, we would not have been able to mutate the dollars and raise our balance.
When defining the class (in the REPL) with val:
error: reassignment to val
myMoney = myMoney.map(_ * 2
Scala is telling us that we wanted an immutable value but are trying to change it!
Thanks to Scala, we can switch to val, re-write our myMoneyWithInterest method and rest assured that our Account class will never alter the balance.
One important property of functional programming is: If I call the same function twice with the same arguments I'll get the same result. This makes reasoning about code much easier in many cases.
Now imagine a function returning the attribute content of some object. If that content can change the function might return different results on different calls with the same argument. => no more functional programming.
First a few definitions:
A side effect is a change in state -- also called a mutation.
An immutable object is an object which does not support mutation, (side effects).
A function which is passed mutable objects (either as parameters or in the global environment) may or may not produce side effects. This is up to the implementation.
However, it is impossible for a function which is passed only immutable objects (either as parameters or in the global environment) to produce side effects. Therefore, exclusive use of immutable objects will preclude the possibility of side effects.
Nate's answer is great, and here is some example.
In functional programming, there is an important feature that when you call a function with same argument, you always get same return value.
This is always true for immutable objects, because you can't modify them after create it:
class MyValue(val value: Int)
def plus(x: MyValue) = x.value + 10
val x = new MyValue(10)
val y = plus(x) // y is 20
val z = plus(x) // z is still 20, plus(x) will always yield 20
But if you have mutable objects, you can't guarantee that plus(x) will always return same value for same instance of MyValue.
class MyValue(var value: Int)
def plus(x: MyValue) = x.value + 10
val x = new MyValue(10)
val y = plus(x) // y is 20
x.value = 30
val z = plus(x) // z is 40, you can't for sure what value will plus(x) return because MyValue.value may be changed at any point.
Why do immutable objects enable functional programming?
They don't.
Take one definition of "function," or "prodecure," "routine" or "method," which I believe applies to many programming languages: "A section of code, typically named, accepting arguments and/or returning a value."
Take one definition of "functional programming:" "Programming using functions." The ability to program with functions is indepedent of whether state is modified.
For instance, Scheme is considered a functional programming language. It features tail calls, higher-order functions and aggregate operations using functions. It also has mutable objects. While mutability destroys some nice mathematical qualities, it does not necessarily prevent "functional programming."
I've read all the answers and they don't satisfy me, because they mostly talk about "immutability", and not about its relation to FP.
The main question is:
Why do immutable objects enable functional programming?
So I've searched a bit more and I have another answer, I believe the easy answer to this question is: "Because Functional Programming is basically defined on the basis of functions that are easy to reason about". Here's the definition of Functional Programming:
The process of building software by composing pure functions.
If a function is not pure -- which means receiving the same input, it's not guaranteed to always produce the same output (e.g., if the function relies on a global object, or date and time, or a random number to compute the output) -- then that function is unpredictable, that's it! Now exactly the same story goes about "immutability" as well, if objects are not immutable, a function with the same object as its input may have different results (aka side effects) each time used, and this will make it hard to reason about the program.
I first tried to put this in a comment, but it got longer than the limit, I'm by no means a pro so please take this answer with a grain of salt.

Implementing a Measured value in Scala

A Measured value consists of (typically nonnegative) floating-point number and unit-of-measure. The point is to represent real-world quantities, and the rules that govern them. Here's an example:
scala> val oneinch = Measure(1.0, INCH)
oneinch : Measure[INCH] = Measure(1.0)
scala> val twoinch = Measure(2.0, INCH)
twoinch : Measure[INCH] = Measure(2.0)
scala> val onecm = Measure(1.0, CM)
onecm : Measure[CM] = Measure(1.0)
scala> oneinch + twoinch
res1: Measure[INCH] = Measure(3.0)
scala> oneinch + onecm
res2: Measure[INCH] = Measure(1.787401575)
scala> onecm * onecm
res3: Measure[CMSQ] = Measure(1.0)
scala> onecm * oneinch
res4: Measure[CMSQ] = Measure(2.54)
scala> oncem * Measure(1.0, LITER)
console>:7: error: conformance mismatch
scala> oneinch * 2 == twoinch
res5: Boolean = true
Before you get too excited, I haven't implemented this, I just dummied up a REPL session. I'm not even sure of the syntax, I just want to be able to handle things like adding Measured quantities (even with mixed units), multiplying Measured quantities, and so on, and ideally, I like Scala's vaunted type-system to guarantee at compile-time that expressions make sense.
My questions:
Is there extant terminology for this problem?
Has this already been done in Scala?
If not, how would I represent concepts like "length" and "length measured in meters"?
Has this been done in some other language?
A $330-million Mars probe was lost because the contractor was using yards and pounds and NASA was using meters and newtons. A Measure library would have prevented the crash.
F# has support for it, see for example this link for an introduction. There has been some work done in Scala on Units, for example here and here. There is a Scala compiler plugin as well, as described in this blog post. I briefly tried to install it, but using Scala 2.8.1, I got an exception when I started up the REPL, so I'm not sure whether this plugin is actively maintained at the moment.
Well, this functionality exists in Java, meaning you can use it directly in Scala.
jsr-275, which was moved to google code. jscience implements the spec. Here's a good introduction. If you want a better interface, I'd use this as a base and build a wrapper around it.
Your question is fully answered with one word. You can thank me later.
FRINK. http://futureboy.us/frinkdocs/
FYI, I have developed a Scalar class in Scala to represent physical units. I am currently using it for my R&D work in air traffic control, and it is working well for me. It does not check for unit consistency at compile time, but it checks at run time. I have a unique scheme for easily substituting it with basic numeric types for efficiency after the application is tested. You can find the code and the user guide at
http://russp.us/scalar-scala.htm
Here is the summary from the website:
Summary-- A Scala class was designed to represent physical scalars and to eliminate errors involving implicit physical units (e.g., confusing radians and degrees). The standard arithmetic operators are overloaded to provide syntax identical to that for basic numeric types. The Scalar class itself does not define any units but is part of a package that includes a complete implementation of the standard metric system of units and many common non-metric units. The scalar package also allows the user to define a specialized or reduced set of physical units for any particular application or domain. Once an application has been developed and tested, the Scalar class can be switched off at compile time to achieve the execution efficiency of operations on basic numeric types, which are an order of magnitude faster. The scalar class can also be used for discrete units to enforce type checking of integer counts, thereby enhancing the static type checking of Scala with additional dynamic type checking.
Let me clarify my previous post. I should have said, "These kinds of errors ["meter/yard conversion errors"] are automatically AVOIDED (not "handled") by simply using my Scalar class. All unit conversions are done automatically. That's the easy part.
The harder part is the checking for unit inconsistencies, such as adding a length to a velocity. This is where the issue of dynamic vs. static type checking comes up. I agree that static checking is generally preferable, but only if it can be done without sacrificing usability and convenience.
I have seen at least two "projects" for static checking of units, but I have never heard of anyone actually using them for real work. If someone knows of a case where they were used, please let me know. Until you use software for real work, you don't know what sorts of issues will come up.
As I wrote above, I am currently using my Scalar class (http://russp.us/scalar-scala.htm) for my R&D work in ATC. I've had to make many tweaks along the way for usability and convenience, but it is working well for me. I would be willing to consider a static units implementation if a proven one comes along, but for now I feel that I have essentially 99% of the value of such a thing. Hey, the vast majority of scientists and engineers just use "Doubles," so cut me some slack!
"Yeah, ATC software with run-time type checking? I can see headlines now: "Flight 34 Brought Down By Meter/Yard Conversion"."
Sorry, but you don't know what you're talking about. ATC software is tested for years before it is deployed. That is enough time to catch unit inconsistency errors.
More importantly, meter/yard conversions are not even an issue here. These kinds of errors are automatically handled simply by using my Scalar class. For those kinds of errors, you need neither static nor dynamic checking. The issue of static vs. dynamic checking comes up only for unit inconsistencies, as in adding length to time. These kinds of errors are less common and are typically caught with dynamic checking on the first test run.
By the way, the interface here is terrible.