Scala Range.Double missing last element - scala

I am trying to create a list of numBins numbers evenly spaced in the range [lower,upper). Of course, there are floating point issues and this approach is not the best. The result of using Range.Double, however, surprises me as the element missing is not close to the upper bound at all.
Setup:
val lower = -1d
val upper = 1d
val numBins = 11
val step = (upper-lower)/numBins // step = 0.18181818181818182
Problem:
scala> Range.Double(lower, upper, step)
res0: scala.collection.immutable.NumericRange[Double] = NumericRange(-1.0, -0.8181818181818182, -0.6363636363636364, -0.45454545454545453, -0.2727272727272727, -0.0909090909090909, 0.09090909090909093, 0.27272727272727276, 0.4545454545454546, 0.6363636363636364)
Issue: The list seems to be one element short. 0.8181818181818183 is one step further, and is less than 1.
Workaround:
scala> for (bin <- 0 until numBins) yield lower + bin * step
res1: scala.collection.immutable.IndexedSeq[Double] = Vector(-1.0, -0.8181818181818181, -0.6363636363636364, -0.4545454545454546, -0.2727272727272727, -0.09090909090909083, 0.09090909090909083, 0.2727272727272727, 0.4545454545454546, 0.6363636363636365, 0.8181818181818183)
This result now contains the expected number of elements, including 0.818181..

I think the root cause of your problem lies in how toString is implemented for NumericRange:
override def toString() = {
  val endStr = if (length > Range.MAX_PRINT) ", ... )" else ")"
  take(Range.MAX_PRINT).mkString("NumericRange(", ", ", endStr)
}
UPD: It's not about toString. Some other methods, like map and foreach, cut the last elements from the returned collection.
Anyway, if you check the size of the collection you got, you'll find that all the elements are there.
What you've done in your workaround example is use a different underlying data type.
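The index-based workaround from the question can be wrapped in a small helper so that the number of elements never depends on how a floating-point range is iterated. This is only a sketch: the name binStarts is hypothetical and not part of the Scala library.

```scala
// Hypothetical helper (not part of the Scala library): numBins evenly spaced
// values in [lower, upper), each computed directly from its index.
def binStarts(lower: Double, upper: Double, numBins: Int): Vector[Double] = {
  require(numBins > 0, "numBins must be strictly positive")
  val step = (upper - lower) / numBins
  // lower + i * step avoids accumulating rounding error from repeated addition
  Vector.tabulate(numBins)(i => lower + i * step)
}

val bins = binStarts(-1d, 1d, 11)
println(bins.length) // number of elements actually produced
println(bins.last)   // last bin start, strictly below the upper bound
```

Because the element count comes from numBins rather than from comparing doubles against the upper bound, the result always has exactly numBins entries.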

Related

How to generate 15 digit random number using Scala

I am new to Scala programming and I want to generate a random number with 15 digits. Could you please share an example? I have tried the code below to get an alphanumeric string with 10 characters.
var ranstr = s"${(Random.alphanumeric take 10).mkString}"
print("ranstr", ranstr)
You need to pay attention to the return type. You cannot have a 15-digit Int because that type is a 32-bit signed integer, meaning that its maximum value is a little over 2 billion. Even getting a 10-digit number means you're at best getting a number between 1 billion and the maximum value of Int.
Other answers go into the details of how to get a 15-digit number using Long. In your comment you mentioned between, but because of the limitation I mentioned before, using Ints will not allow you to go beyond the 9 digits in your example. You can, however, explicitly annotate your numeric literals with a trailing L to make them Long and achieve what you want as follows:
Random.between(100000000000000L, 1000000000000000L)
Notice that the documentation for between says that the last number is exclusive.
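As a quick check of the points above, the following sketch prints the Int limit and confirms that the Long-based call yields 15 digits. It assumes Scala 2.13 or later, where Random.between is available; the value name fifteenDigits is made up for the example.

```scala
import scala.util.Random

// Int is a 32-bit signed integer: its largest value has only 10 digits.
println(Int.MaxValue) // 2147483647

// Long literals need the trailing L. The lower bound is inclusive and the
// upper bound is exclusive, so every result has exactly 15 digits.
val fifteenDigits: Long = Random.between(100000000000000L, 1000000000000000L)
println(fifteenDigits.toString.length) // 15
```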
If you're interested in generating arbitrarily large numbers, a String might get the job done, as in the following example:
import scala.util.Random
import scala.collection.View
def nonZeroDigit: Char = Random.between(49, 58).toChar
def digit: Char = Random.between(48, 58).toChar
def randomNumber(length: Int): String = {
require(length > 0, "length must be strictly positive")
val digits = View(nonZeroDigit) ++ View.fill(length - 1)(digit)
digits.mkString
}
randomNumber(length = 1)
randomNumber(length = 10)
randomNumber(length = 15)
randomNumber(length = 40)
Notice that when converting an Int to a Char what you get is the character encoded by that number, which isn't necessarily the same as the digit represented by the Int itself. The numbers you see in the functions come from the ASCII table (odds are it's good enough for what you want to do).
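The difference between a character code and the digit it displays can be seen in a few lines. The value names below are only illustrative.

```scala
// A Char converts to and from its character code, not the digit it displays.
val codeOfZero: Int = '0'.toInt // 48
val fromCode: Char = 49.toChar  // '1'
val notADigit: Char = 7.toChar  // a control character, not '7'

// Two ways to get the character for a decimal digit d (0 to 9):
val d = 7
val viaOffset: Char = ('0' + d).toChar        // '7'
val viaJava: Char = Character.forDigit(d, 10) // '7'
```

This is why the ranges 48 to 58 and 49 to 58 (upper bound exclusive) in the functions above produce the characters '0' to '9' and '1' to '9'.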
If you really need a numeric type, for arbitrarily large integers you will need to use BigInt. One of its constructors allows you to parse a number from a string, so you can re-use the code above as follows:
import scala.math.BigInt
BigInt(randomNumber(length = 15))
BigInt(randomNumber(length = 40))
You can play around with this code here on Scastie.
Notice that in my example, in order to keep it simple, I'm forcing the first digit of the random number to not be zero. This means that the number 0 itself will never be a possible output. If you want that to be the case if one asks for a 1-digit long number, you're advised to tailor the example to your needs.
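One possible way to tailor the example, sketched under the assumption that a 1-digit request should be allowed to return 0 while longer numbers must still not start with a zero. The name randomNumberAllowingZero is hypothetical, and Random.between assumes Scala 2.13 or later.

```scala
import scala.util.Random

def digit: Char = Random.between(48, 58).toChar        // '0' to '9'
def nonZeroDigit: Char = Random.between(49, 58).toChar // '1' to '9'

// Hypothetical variant: a 1-digit request may return "0"; longer numbers
// still never start with a zero.
def randomNumberAllowingZero(length: Int): String = {
  require(length > 0, "length must be strictly positive")
  if (length == 1) digit.toString
  else (nonZeroDigit +: Seq.fill(length - 1)(digit)).mkString
}

println(randomNumberAllowingZero(1))
println(randomNumberAllowingZero(15))
```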
Here is an approach similar to Alin's foldLeft, based on scanLeft instead: the intermediate random digits are first collected into a Vector and then concatenated into a BigInt, while ensuring that the first random digit (see the initial value passed to scanLeft) is greater than zero:
import scala.util.Random
import scala.math.BigInt
def randGen(n: Int): BigInt = {
val xs = (1 to n-1).scanLeft(Random.nextInt(9)+1) {
case (_,_) => Random.nextInt(10)
}
BigInt(xs.mkString)
}
Note that Random.nextInt(9) delivers a random value between 0 and 8, so we add 1 to shift the possible values to the range 1 to 9. Thus,
scala> (1 to 15).map(randGen(_)).foreach(println)
8
34
623
1597
28474
932674
5620336
66758916
186155185
2537294343
55233611616
338190692165
3290592067643
93234908948070
871337364826813
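To see why scanning over the range 1 to n-1 produces exactly n digits, it may help to look at scanLeft in isolation. This is a deterministic sketch and the value names are illustrative.

```scala
// scanLeft returns the initial value followed by every intermediate result,
// so scanning over n - 1 elements yields n values.
val running = (1 to 3).scanLeft(0)(_ + _)
println(running.mkString(", ")) // 0, 1, 3, 6

// With a function that ignores both inputs, as in randGen, only the number
// of elements matters: one initial value plus one value per range element.
val labels = (1 to 4).scanLeft("first") { case (_, _) => "rest" }
println(labels.mkString(", ")) // first, rest, rest, rest, rest

// For n = 1 the range 1 to 0 is empty and only the initial value remains.
val single = (1 to 0).scanLeft(42)(_ + _)
println(single.mkString(", ")) // 42
```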
There are a lot of ways to do this.
The most common way is to use Random.nextInt(10) to generate a digit between 0-9.
When building a number of a fixed size of digits, you have to make sure the first digit is never 0.
For that I'll use Random.nextInt(9) + 1, which guarantees generating a number between 1-9, a sequence with the other 14 generated digits, and a foldLeft operation with the first digit as accumulator to generate the number:
val number =
Range(1, 15).map(_ => Random.nextInt(10)).foldLeft[Long](Random.nextInt(9) + 1) {
(acc, cur_digit) => acc * 10 + cur_digit
}
Normally for such big numbers it's better to represent them as a sequence of characters instead of numbers, because numbers can easily overflow. But since a 15-digit number fits in a Long and you asked for a number, I used one instead.
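The same foldLeft idea can be generalised to any digit count that is guaranteed to fit in a Long. This is a sketch rather than part of the answer above: randomLong and its rng parameter are hypothetical names.

```scala
import scala.util.Random

// Long.MaxValue is 9223372036854775807 (19 digits), so every number with up
// to 18 digits is guaranteed to fit without overflow.
def randomLong(digits: Int, rng: Random = new Random): Long = {
  require(digits >= 1 && digits <= 18, "digits must be between 1 and 18")
  val first: Long = rng.nextInt(9) + 1 // leading digit, 1 to 9
  // append one random digit (0 to 9) for each remaining position
  (1 until digits).foldLeft(first)((acc, _) => acc * 10 + rng.nextInt(10))
}

println(randomLong(15))
```

For more than 18 digits, a String or BigInt representation avoids the overflow mentioned above.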
In Scala we have scala.util.Random to get a random value (not only numeric). For numeric values, Random has nextInt(n: Int), which returns a random number between 0 (inclusive) and n (exclusive). Read more about random
First example:
val random = new Random()
val digits = "0123456789".split("")
var result = ""
for (_ <- 0 until 15) {
val randomIndex = random.nextInt(digits.length)
result += digits(randomIndex)
}
println(result)
Here I create an instance of random and use a number from 0 to 9 to generate a random number of length 15
Second example:
val result2 = for (_ <- 0 until 15) yield random.nextInt(10)
println(result2.mkString)
Here I use the yield keyword to get a sequence of random integers from 0 to 9 and use mkString to combine the sequence into a string. Read more about yield

How to generate a random sequence of binary strings of fixed size ( say 36 bits ) in scala

I'm trying to generate a unique random sequence of 50 binary strings of 36 bits each. I tried nextInt followed by toBinaryString, which didn't solve my problem because nextInt doesn't support such big numbers. I also checked nextString, which generates a string of random characters (not 0/1). Is there any other way to achieve this?
One more requirement: I want all 36 bits to be present every time. For example, if the random generator produces the number 3, I want the output to be 000...(34)11.
I'm quite new to Scala; pardon me if my question seems irrelevant or redundant.
You can try
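import scala.util.Random
val r = Random
// 1 to 50 gives exactly 50 values (0 to 50 would give 51)
// note: values stay below 1000000, and duplicates are possible
val a: Seq[Int] = (1 to 50).map(_ => r.nextInt(1000000))
val y = a.map { x =>
  val bin = x.toBinaryString
  val zero = 36 - bin.length // number of leading zeros needed
  ("0" * zero) + bin
}
println(Random.shuffle(y))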
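If the strings must be unique and should cover the full 36-bit range, one option is to draw from nextLong, mask off the unwanted bits, and keep drawing until enough distinct strings have been collected. This is a sketch under those assumptions; toPaddedBinary and uniqueBinaryStrings are hypothetical helper names.

```scala
import scala.collection.mutable
import scala.util.Random

// Left-pad the binary form of a non-negative number to a fixed width.
def toPaddedBinary(n: Long, width: Int): String = {
  require(n >= 0, "n must be non-negative")
  val bin = java.lang.Long.toBinaryString(n)
  require(bin.length <= width, "n does not fit in the requested width")
  ("0" * (width - bin.length)) + bin
}

// Collect `count` distinct random strings of `width` bits each.
def uniqueBinaryStrings(count: Int, width: Int = 36): Vector[String] = {
  require(width > 0 && width < 63, "width must be between 1 and 62")
  require(BigInt(count) <= BigInt(2).pow(width), "not enough distinct values")
  val mask = (1L << width) - 1 // keeps only the lowest `width` bits
  val seen = mutable.LinkedHashSet.empty[String]
  while (seen.size < count) seen += toPaddedBinary(Random.nextLong() & mask, width)
  seen.toVector
}

println(toPaddedBinary(3L, 36))
uniqueBinaryStrings(50).foreach(println)
```

The set discards duplicates, so the loop only ends once 50 different strings exist; with 2^36 possible values, repeats are rare and the loop finishes almost immediately.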

What is the scala equivalent of Python's Numpy np.random.choice?(Random weighted selection in scala)

I am looking for Scala code equivalent to Python's np.random.choice (NumPy imported as np), or for the underlying theory. I have a similar implementation that uses Python's np.random.choice method to select random moves from a probability distribution.
Python's code
Input list: ['pooh', 'rabbit', 'piglet', 'Christopher'] and probabilities: [0.5, 0.1, 0.1, 0.3]
I want to select one of the values from the input list given the associated probability of each input element.
The Scala standard library has no equivalent to np.random.choice but it shouldn't be too difficult to build your own, depending on which options/features you want to emulate.
Here, for example, is a way to get an infinite Stream of submitted items, with the probability of any one item weighted relative to the others.
def weightedSelect[T](input :(T,Int)*): Stream[T] = {
val items :Seq[T] = input.flatMap{x => Seq.fill(x._2)(x._1)}
def output :Stream[T] = util.Random.shuffle(items).toStream #::: output
output
}
With this each input item is given with a multiplier. So to get an infinite pseudorandom selection of the characters c and v, with c coming up 3/5ths of the time and v coming up 2/5ths of the time:
val cvs = weightedSelect(('c',3),('v',2))
Thus the rough equivalent of the np.random.choice(aa_milne_arr,5,p=[0.5,0.1,0.1,0.3]) example would be:
weightedSelect("pooh"-> 5
,"rabbit" -> 1
,"piglet" -> 1
,"Christopher" -> 3).take(5).toArray
Or perhaps you want a better (less pseudo) random distribution that might be heavily lopsided.
def weightedSelect[T](items :Seq[T], distribution :Seq[Double]) :Stream[T] = {
assert(items.length == distribution.length)
assert(math.abs(1.0 - distribution.sum) < 0.001) // must be at least close
val dsums :Seq[Double] = distribution.scanLeft(0.0)(_+_).tail
val distro :Seq[Double] = dsums.init :+ 1.1 // close a possible gap
Stream.continually(items(distro.indexWhere(_ > util.Random.nextDouble())))
}
The result is still an infinite Stream of the specified elements but the passed-in arguments are a bit different.
val choices :Stream[String] = weightedSelect( List("this" , "that")
, Array(4998/5000.0, 2/5000.0))
// let's test the distribution
val (choiceA, choiceB) = choices.take(10000).partition(_ == "this")
choiceA.length //res0: Int = 9995
choiceB.length //res1: Int = 5 (not bad)
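For a single draw, or on Scala versions where Stream is deprecated, the cumulative-sum idea from the second function can be written without a Stream. This is a sketch, not a library API: weightedChoice and its rng parameter are hypothetical names.

```scala
import scala.util.Random

// Hypothetical helper: one weighted draw using the cumulative distribution.
def weightedChoice[T](items: Seq[T], probs: Seq[Double], rng: Random = new Random): T = {
  require(items.nonEmpty && items.length == probs.length, "need one probability per item")
  require(probs.forall(_ >= 0) && probs.sum > 0, "probabilities must be non-negative, not all zero")
  val cumulative = probs.scanLeft(0.0)(_ + _).tail
  // Scale by the total so the probabilities need not sum to exactly 1.
  val u = rng.nextDouble() * cumulative.last
  val idx = cumulative.indexWhere(u < _)
  items(if (idx >= 0) idx else items.length - 1) // guard against rounding at the top end
}

val names = Seq("pooh", "rabbit", "piglet", "Christopher")
val probs = Seq(0.5, 0.1, 0.1, 0.3)
val sample = Seq.fill(5)(weightedChoice(names, probs))
println(sample)
```

Unlike the shuffle-based version, each draw here is independent of the previous ones. Passing a seeded Random as rng makes a run reproducible.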

Different result returned using Scala Collection par in a series of runs

I have tasks that I want to execute concurrently and each task takes substantial amount of memory so I have to execute them in batches of 2 to conserve memory.
def runme(n: Int = 120) = (1 to n).grouped(2).toList.flatMap{tuple =>
tuple.par.map{x => {
println(s"Running $x")
val s = (1 to 100000).toList // intentionally to make the JVM allocate a sizeable chunk of memory
s.sum.toLong
}}
}
val result = runme()
println(result.size + " => " + result.sum)
The result I expected from the output was 120 => 84609924480, but the output was rather random. The size of the returned collection differed from execution to execution. Most of the time some elements were missing, even though the console showed that all the futures had been executed. I thought flatMap waits for the parallel executions in map to complete before returning the complete result. What should I do to always get the right result using par? Thanks
Just for the record: changing the underlying collection in this case shouldn't change the output of your program. The problem is related to this known bug. It's fixed from 2.11.6, so if you use that (or higher) Scala version, you should not see the strange behavior.
And about the overflow, I still think that your expected value is wrong. You can check that the sum is overflowing because the list is of integers (which are 32 bit) while the total sum exceeds the integer limits. You can check it with the following snippet:
val n = 100000
val s = (1 to n).toList // your original code
val yourValue = s.sum.toLong // your original code
val correctValue = 1L * n * (n + 1) / 2 // use math formula
var bruteForceValue = 0L // in case you don't trust math :) It's Long because of 0L
for (i <- 1 to n) bruteForceValue += i // iterate through range
println(s"yourValue = $yourValue")
println(s"correctvalue = $correctValue")
println(s"bruteForceValue = $bruteForceValue")
which produces the output
yourValue = 705082704
correctvalue = 5000050000
bruteForceValue = 5000050000
Cheers!
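To avoid the overflow in the task itself, the sum can be accumulated in a Long from the start. A minimal sketch follows; the names asInt, asLong and expectedTotal are illustrative.

```scala
val n = 100000

// Summing a List[Int] stays in Int and wraps around.
val asInt: Int = (1 to n).toList.sum // 705082704

// Starting the fold from 0L keeps the accumulator in Long.
val asLong: Long = (1 to n).foldLeft(0L)(_ + _) // 5000050000

// Total over 120 tasks when each task returns the non-overflowed sum.
val expectedTotal: Long = 120 * asLong // 600006000000

println(s"asInt = $asInt, asLong = $asLong, expectedTotal = $expectedTotal")
```

Calling toLong after sum, as in the original s.sum.toLong, is too late: the value has already wrapped around as an Int.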
Thanks @kaktusito.
It worked after I changed the grouped list to Vector or Seq i.e. (1 to n).grouped(2).toList.flatMap{... to (1 to n).grouped(2).toVector.flatMap{...
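For readers who cannot upgrade or would rather not depend on par, the same "two at a time" batching can be sketched with plain Futures. This swaps in a different technique from the one discussed above; task and runBatched are hypothetical names, and the 10-minute timeout is an arbitrary assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for one memory-hungry task; sums in Long to avoid overflow.
def task(x: Int): Long = (1 to 100000).toList.foldLeft(0L)(_ + _)

def runBatched(n: Int = 120, batchSize: Int = 2): List[Long] =
  (1 to n).grouped(batchSize).toList.flatMap { batch =>
    val running = batch.map(x => Future(task(x))) // start the whole batch
    // Block until every task in this batch has finished before starting the next.
    Await.result(Future.sequence(running), 10.minutes)
  }

val result = runBatched()
println(s"${result.size} => ${result.sum}")
```

Because each batch is awaited before the next one starts, at most batchSize tasks hold their memory at any time, and the result always contains one entry per task.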

Consolidating a data table in Scala

I am working on a small data analysis tool, and practicing/learning Scala in the process. However I got stuck at a small problem.
Assume data of type:
X Gr1 x_11 ... x_1n
X Gr2 x_21 ... x_2n
..
X GrK x_k1 ... x_kn
Y Gr1 y_11 ... y_1n
Y Gr3 y_31 ... y_3n
..
Y Gr(K-1) ...
Here I have entries (X,Y...) that may or may not exist in up to K groups, with a series of values for each group. What I want to do is pretty simple (in theory): I would like to consolidate the rows that belong to the same "entity" in different groups. So instead of multiple lines that start with X, I want to have one row with all values from x_11 to x_kn in columns.
What makes things complicated, however, is that not all entities exist in all groups. So wherever there's "missing data" I would like to pad with, for instance, zeroes, or some string that denotes a missing value. So if I have (X,Y,Z) in up to 3 groups, the type of table I want to have is as follows:
X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 N/A N/A y_31 y_32
Z N/A N/A z_21 z_22 N/A N/A
I have been stuck trying to figure this out, is there a smart way to use List functions to solve this?
I wrote this simple loop:
for {
(id, hitlist) <- hits.groupBy(_.acc)
h <- hitlist
} println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
to be able to generate tables that look like the example above. Note that my original data is of a different format and layout, but that has little to do with the problem at hand, so I have skipped all steps regarding parsing. I should be able to use groupBy in a better way that actually solves this for me, but I can't seem to get there.
Then I modified my loop mapping the hits to ratios and appending them to one another:
for ((id, hitlist) <- hits.groupBy(_.acc)){
val l = hitlist.map(_.ratios).foldRight(List[Double]()){
(l1: List[Double], l2: List[Double]) => l1 ::: l2
}
println(id + "\t" + l.mkString("\t"))
//println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
}
That gets me one step closer but still no cigar! Instead of a fully padded "matrix" I get a jagged table. Taking the example above:
X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 y_31 y_32
Z z_21 z_22
Any ideas as to how I can pad the table so that values from respective groups are aligned with one another? I should be able to use _.sampleId, which holds the "group membership" for each "hit", but I am not sure how exactly. `hits` is a List of type Hit, which is practically a wrapper for each row, giving convenience methods for getting individual values, so essentially a tuple which has "named indices" (such as .acc, .sampleId..)
(I would like to solve this problem without hardcoding the number of groups, as it might change from case to case)
Thanks!
This is a bit of a contrived example, but I think you can see where this is going:
case class Hit(acc:String, subAcc:String, value:Int)
val hits = List(Hit("X", "x_11", 1), Hit("X", "x_21", 2), Hit("X", "x_31", 3))
val kMax = 4
val nMax = 2
for {
(id, hitlist) <- hits.groupBy(_.acc)
k <- 1 to kMax
n <- 1 to nMax
} yield {
val subId = "x_%s%s".format(k, n)
val row = hitlist.find(h => h.subAcc == subId).getOrElse(Hit(id, subId, 0))
println(row)
}
//Prints
Hit(X,x_11,1)
Hit(X,x_12,0)
Hit(X,x_21,2)
Hit(X,x_22,0)
Hit(X,x_31,3)
Hit(X,x_32,0)
Hit(X,x_41,0)
Hit(X,x_42,0)
If you provide more information on your hits lists then we could probably come up with something a little more accurate.
I have managed to solve this problem with the following code, I am putting it here as an answer in case someone else runs into a similar problem and requires some help. The use of find() from Noah's answer was definitely very useful, so do give him a +1 in case this code snippet helps you out.
val samples = hits.groupBy(_.sampleId).keys.toList.sorted
for ((id, hitlist) <- hits.groupBy(_.acc)) {
val ratios =
for (sample <- samples)
yield hitlist.find(h => h.sampleId == sample).map(_.ratios)
.getOrElse(List(Double.NaN, Double.NaN, Double.NaN, Double.NaN, Double.NaN, Double.NaN))
println(id + "\t" + ratios.flatten.mkString("\t"))
}
I figure it's not a very elegant or efficient solution, as I have two calls to groupBy and I would be interested to see better solutions to this problem.
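One possible refinement, sketched under assumptions: the Hit case class below is a guess at the shape of the real type, based only on the fields mentioned in the question, and consolidate is a hypothetical name. It replaces the repeated find with a Map lookup per entity, uses a single groupBy, and derives the padding width from the data instead of hardcoding six NaNs.

```scala
// Assumed shape of Hit, based on the fields the question mentions.
case class Hit(acc: String, sampleId: String, ratios: List[Double])

def consolidate(hits: List[Hit]): Map[String, List[Double]] = {
  val samples = hits.map(_.sampleId).distinct.sorted
  // Pad with as many NaNs as the widest row has values (no hardcoded count).
  val width = hits.map(_.ratios.length).foldLeft(0)(_ max _)
  val missing = List.fill(width)(Double.NaN)
  hits.groupBy(_.acc).map { case (id, hitlist) =>
    // One lookup table per entity: sample id -> ratios
    val bySample = hitlist.map(h => h.sampleId -> h.ratios).toMap
    id -> samples.flatMap(s => bySample.getOrElse(s, missing))
  }
}

val exampleHits = List(
  Hit("X", "Gr1", List(1.0, 2.0)),
  Hit("X", "Gr2", List(3.0, 4.0)),
  Hit("Y", "Gr1", List(5.0, 6.0)))

for ((id, row) <- consolidate(exampleHits)) println(id + "\t" + row.mkString("\t"))
```

The columns only line up if every group carries the same number of ratios, which the layout in the question suggests but does not guarantee.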