How to generate a random sequence of binary strings of fixed size (say 36 bits) in Scala

I'm trying to generate a unique random sequence of 50 binary strings of 36 bits each. I tried nextInt followed by toBinaryString, which didn't solve my problem because nextInt doesn't support such big numbers, and I also checked nextString, which generates a string of arbitrary characters (not 0/1). Is there any other way to achieve this?
To add one more requirement: I want all 36 bits to be present every time. For example, if the random generator produces 3, I want the output to be 34 zeros followed by 11, i.e. 000...(34)11.
I'm quite new to Scala; pardon me if my question seems irrelevant or redundant.

You can try:
import scala.util.Random

val r = Random
// 50 random non-negative Ints; note that nextInt(1000000) only yields values
// below 2^20, so the top 16 of the 36 bits will always be zero
val a: Seq[Int] = (1 to 50).map(_ => r.nextInt(1000000))
val y = a.map { x =>
  val bin = x.toBinaryString
  val zeros = 36 - bin.length
  List.fill(zeros)(0).mkString ++ bin   // left-pad with zeros to 36 characters
}
println(Random.shuffle(y))
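If you want the strings to cover the full 36-bit range and to be unique (as the question asks), here is a minimal sketch of an alternative; it is not from the original answer, and the helper name random36BitStrings is made up. It masks nextLong down to 36 bits, left-pads with zeros, and collects into a LinkedHashSet until 50 distinct strings have been drawn:
import scala.util.Random
import scala.collection.mutable.LinkedHashSet

def random36BitStrings(n: Int): Seq[String] = {
  val mask = (1L << 36) - 1                     // keep only the low 36 bits
  val seen = LinkedHashSet.empty[String]        // preserves insertion order, enforces uniqueness
  while (seen.size < n) {
    val bin = (Random.nextLong() & mask).toBinaryString
    seen += ("0" * (36 - bin.length)) + bin     // left-pad to exactly 36 characters
  }
  seen.toSeq
}

random36BitStrings(50)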

Related

How to generate 15 digit random number using Scala

I am new to Scala programming and I want to generate a random number with 15 digits. Could you please share an example? I have tried the code below, which gives me a 10-character alphanumeric string.
var ranstr = s"${(Random.alphanumeric take 10).mkString}"
print("ranstr", ranstr)
You need to pay attention to the return type. You cannot have a 15-digit Int because that type is a 32-bit signed integer, meaning that its maximum value is a little over 2 billion. Even a 10-digit number means you're at best getting a value between 1 billion and the maximum value of Int.
Other answers go into the detail of how to get a 15-digit number using Long. In your comment you mentioned between, but because of the limitation mentioned above, using Ints will not let you go beyond the 9 digits in your example. You can, however, explicitly annotate your numeric literals with a trailing L to make them Long and achieve what you want as follows:
Random.between(100000000000000L, 1000000000000000L)
Notice that the documentation for between says that the last number is exclusive.
If you're interested in generating arbitrarily large numbers, a String might get the job done, as in the following example:
import scala.util.Random
import scala.collection.View
def nonZeroDigit: Char = Random.between(49, 58).toChar // 49-57 are the ASCII codes for '1'-'9'
def digit: Char = Random.between(48, 58).toChar        // 48-57 are the ASCII codes for '0'-'9'
def randomNumber(length: Int): String = {
  require(length > 0, "length must be strictly positive")
  // a non-zero first digit followed by length - 1 arbitrary digits, built lazily
  val digits = View(nonZeroDigit) ++ View.fill(length - 1)(digit)
  digits.mkString
}
randomNumber(length = 1)
randomNumber(length = 10)
randomNumber(length = 15)
randomNumber(length = 40)
Notice that when converting an Int to a Char, what you get is the character whose code point is that number, which isn't necessarily the digit the Int itself represents. The numbers used in the functions above come from the ASCII table (odds are it's good enough for what you want to do).
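A tiny illustration of that point (my own addition, not part of the original answer):
5.toChar    // '\u0005', an unprintable control character, not the digit '5'
53.toChar   // '5', because 53 is the ASCII/Unicode code point of the character '5'
'5' - '0'   // 5, the usual way to get back from a digit character to its numeric value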
If you really need a numeric type, for arbitrarily large integers you will need to use BigInt. One of its constructors allows you to parse a number from a string, so you can re-use the code above as follows:
import scala.math.BigInt
BigInt(randomNumber(length = 15))
BigInt(randomNumber(length = 40))
You can play around with this code on Scastie.
Notice that in my example, to keep it simple, I'm forcing the first digit of the random number to be non-zero. This means the number 0 itself will never be produced, even when asking for a 1-digit number; if you need that case, tailor the example to your needs.
A similar approach to Alin's foldLeft, based here on scanLeft: the intermediate random digits are first collected into a sequence and then concatenated into a BigInt, while the initialization value of scanLeft ensures that the first random digit is greater than zero.
import scala.util.Random
import scala.math.BigInt
def randGen(n: Int): BigInt = {
  // seed the scan with a first digit in 1-9, then append n - 1 digits in 0-9
  val xs = (1 to n - 1).scanLeft(Random.nextInt(9) + 1) {
    case (_, _) => Random.nextInt(10)
  }
  BigInt(xs.mkString)
}
Note that Random.nextInt(9) delivers a random value between 0 and 8, so we add 1 to shift the possible values to the range 1 to 9. Thus,
scala> (1 to 15).map(randGen(_)).foreach(println)
8
34
623
1597
28474
932674
5620336
66758916
186155185
2537294343
55233611616
338190692165
3290592067643
93234908948070
871337364826813
There are a lot of ways to do this.
The most common way is to use Random.nextInt(10) to generate a digit between 0 and 9.
When building a number with a fixed count of digits, you have to make sure the first digit is never 0.
For that I'll use Random.nextInt(9) + 1, which guarantees a digit between 1 and 9, a sequence with the other 14 generated digits, and a foldLeft operation with the first digit as the accumulator to build the number:
import scala.util.Random

val number =
  Range(1, 15).map(_ => Random.nextInt(10)).foldLeft[Long](Random.nextInt(9) + 1) {
    (acc, cur_digit) => acc * 10 + cur_digit
  }
Normally, for numbers this big it's better to represent them as a sequence of characters instead of a numeric type, because numeric types can easily overflow. But since a 15-digit number fits in a Long and you asked for a number, I used one instead.
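Building on that remark, here is a minimal sketch of the same idea for more than 18 digits, where a Long would overflow; it is my own addition and the helper name bigRandom is made up. The digits are assembled as a String and parsed into a BigInt:
import scala.util.Random
import scala.math.BigInt

def bigRandom(digits: Int): BigInt = {
  val head = (Random.nextInt(9) + 1).toString                    // first digit in 1-9
  val tail = Seq.fill(digits - 1)(Random.nextInt(10)).mkString   // remaining digits in 0-9
  BigInt(head + tail)
}

bigRandom(40)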
In Scala we have scala.util.Random to get random values (not only numeric ones); for a numeric value, Random has nextInt(n: Int), which returns a random number smaller than n. Read more about Random in the Scala documentation.
First example:
import scala.util.Random

val random = new Random()
val digits = "0123456789".split("")
var result = ""
for (_ <- 0 until 15) {
  val randomIndex = random.nextInt(digits.length)
  result += digits(randomIndex)
}
println(result)
Here I create an instance of Random and pick digits from 0 to 9 to build a random number of length 15.
Second example:
val result2 = for (_ <- 0 until 15) yield random.nextInt(10)
println(result2.mkString)
Here I use the yield keyword to get a collection of random integers from 0 to 9 and mkString to combine them into a string. Read more about yield in the Scala documentation.

Spark: increase the size of an RDD using sample with replacement

I have an RDD[(String, Array[String])] and I need to replicate the data inside it to increase its size.
I've read here https://stackoverflow.com/a/41787801/9759150 that with replacement you can get the same element in the sample twice.
For example:
If RDD.count() is, let's say, 35 elements, and I need to generate from it an RDD with 200 elements, how can I do this?
I saw that applying sample looks like this:
val sampledRDD = rdd.sample(true, fraction, seed)
I don't know how to choose the fraction parameter for my problem.
Thank you!
I was doing some tests and I figured out that .sample() is able to do what I wanted. The key is to keep withReplacement set to true (as I said in the question); the seed can be whatever number you like, but fraction should be:
val fraction = num_new.toDouble / rdd.count() // following my example: num_new is 200 and rdd.count() is 35
val sampledRDD = rdd.sample(true, fraction, seed)
In this case fraction ≈ 5.71, which means each element of the original RDD will appear in sampledRDD roughly fraction times on average.
You can see this answer for more information about the meaning of fraction in rdd.sample(). The short story is that, with replacement, it is the expected number of times each element is drawn, so the final RDD is not guaranteed to have exactly fraction * original size elements.
I would approach this in the opposite direction:
First, generate an RDD that is simply the original RDD, repeated several times
Now, sample out of that RDD down to the size you want.
Something like:
val rdds = (1 to 10).map(_ => originalRdd)
val bigRdd = sc.union(rdds)
val sampledRdd = bigRdd.sample(true, fraction, seed)
and set fraction such that the final RDD is the size you want:
val fraction = numResultsIWant.toDouble / (10 * originalRdd.count())
and we picked 10 there because that was the number of copies of the RDD we created.
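If you need exactly the requested number of elements and that number is small enough to collect to the driver, another option (a sketch under that assumption, not part of the answers above; resampleExact is a made-up helper name) is RDD.takeSample, which returns exactly num elements when sampling with replacement, followed by parallelize to get an RDD back:
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

def resampleExact[T: ClassTag](rdd: RDD[T], num: Int, seed: Long = 42L): RDD[T] = {
  // takeSample with replacement returns exactly `num` elements as a local Array
  val sampled = rdd.takeSample(withReplacement = true, num = num, seed = seed)
  rdd.sparkContext.parallelize(sampled.toSeq)
}

val exactSampledRdd = resampleExact(originalRdd, num = 200)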

Scala Range.Double missing last element

I am trying to create a list of numBins numbers evenly spaced in the range [lower,upper). Of course, there are floating point issues and this approach is not the best. The result of using Range.Double, however, surprises me as the element missing is not close to the upper bound at all.
Setup:
val lower = -1d
val upper = 1d
val numBins = 11
val step = (upper-lower)/numBins // step = 0.18181818181818182
Problem:
scala> Range.Double(lower, upper, step)
res0: scala.collection.immutable.NumericRange[Double] = NumericRange(-1.0, -0.8181818181818182, -0.6363636363636364, -0.45454545454545453, -0.2727272727272727, -0.0909090909090909, 0.09090909090909093, 0.27272727272727276, 0.4545454545454546, 0.6363636363636364)
Issue: The list seems to be one element short. 0.8181818181818183 is one step further, and is less than 1.
Workaround:
scala> for (bin <- 0 until numBins) yield lower + bin * step
res1: scala.collection.immutable.IndexedSeq[Double] = Vector(-1.0, -0.8181818181818181, -0.6363636363636364, -0.4545454545454546, -0.2727272727272727, -0.09090909090909083, 0.09090909090909083, 0.2727272727272727, 0.4545454545454546, 0.6363636363636365, 0.8181818181818183)
This result now contains the expected number of elements, including 0.818181..
I think the root cause of your problem is a quirk in the implementation of toString for NumericRange:
override def toString() = {
  val endStr = if (length > Range.MAX_PRINT) ", ... )" else ")"
  take(Range.MAX_PRINT).mkString("NumericRange(", ", ", endStr)
}
UPD: it's not about toString after all. Some other methods, like map and foreach, also cut the last element from the returned collection.
Anyway, if you check the size of the collection you've got, you'll find that all the elements are there.
What you've done in your workaround example is use a different underlying data type.
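A small helper based on the asker's own workaround (the name linspaceExclusive is mine; this is a sketch, not a library function): computing each bin as lower + i * step sidesteps NumericRange's element counting entirely.
def linspaceExclusive(lower: Double, upper: Double, numBins: Int): IndexedSeq[Double] = {
  val step = (upper - lower) / numBins
  (0 until numBins).map(i => lower + i * step)   // always exactly numBins elements
}

linspaceExclusive(-1d, 1d, 11)   // 11 elements, ending with 0.8181818181818183 as in the workaround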

Efficiently way to read binary files in scala

I'm trying to read a binary file (16 MB) that contains only integers encoded on 16 bits. I read it in chunks of 1 MB, which gives me an array of bytes, and for my own needs I convert each byte array to a short array with the following convert function. But reading this file with a buffer and converting it into short arrays takes 5 seconds. Is there a faster way than my solution?
def convert(in: Array[Byte]): Array[Short] = in.grouped(2).map {
  case Array(one)    => ((one << 8) | 0.toByte).toShort    // lone trailing byte becomes the high byte
  case Array(hi, lo) => ((hi << 8) | (lo & 0xFF)).toShort  // mask lo to avoid sign extension
}.toArray
import java.io.RandomAccessFile

val startTime = System.nanoTime()
val file = new RandomAccessFile("foo", "r")
val defaultBlockSize = 1 * 1024 * 1024
val byteBuffer = new Array[Byte](defaultBlockSize)
val chunkNums = (file.length / defaultBlockSize).toInt
for (i <- 1 to chunkNums) {
  val seek = (i - 1) * defaultBlockSize
  file.seek(seek)
  file.read(byteBuffer)
  val s = convert(byteBuffer)
  println(byteBuffer.length)
}
val stopTime = System.nanoTime()
println("Perf = " + ((stopTime - startTime) / 1000000000.0) + " s")
16 MB easily fits in memory unless you're running this on a feature phone or something. No need to chunk it and make the logic harder.
Just gulp the whole file at once with java.nio.file.Files.readAllBytes:
val buffer = java.nio.file.Files.readAllBytes(myfile.toPath)
assuming you are not stuck with Java 1.6. (If you are stuck with Java 1.6, pre-allocate your buffer using myfile.length, and use read on a FileInputStream to get it all in one go. It's not much harder, just don't forget to close it!)
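A minimal sketch of that Java 1.6 fallback, assuming myfile is a java.io.File that fits in memory (for a local regular file a single read call will normally fill the buffer, though strictly speaking read may return fewer bytes):
import java.io.FileInputStream

val buffer = new Array[Byte](myfile.length.toInt)   // pre-allocate from the file length
val in = new FileInputStream(myfile)
try in.read(buffer) finally in.close()              // don't forget to close it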
Then if you don't want to convert it yourself, you can
val bb = java.nio.ByteBuffer.wrap(buffer)
bb.order(java.nio.ByteOrder.nativeOrder)
val shorts = new Array[Short](buffer.length/2)
bb.asShortBuffer.get(shorts)
And you're done.
Note that this is all Java stuff; there's nothing Scala-specific here save the syntax.
If you're wondering why this is so much faster than your code, it's because grouped(2) boxes the bytes and places them in an array. That's three allocations for every short you want! You can do it yourself by indexing the array directly, and that will be fast, but why would you want to when ByteBuffer and friends do exactly what you need already?
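A sketch of that "index the array directly" alternative (my own code, not the answerer's): no boxing and no per-pair arrays.
def convertByIndex(in: Array[Byte]): Array[Short] = {
  val out = new Array[Short](in.length / 2)
  var i = 0
  while (i < out.length) {
    // big-endian pair: high byte first, low byte masked to avoid sign extension
    out(i) = (((in(2 * i) & 0xFF) << 8) | (in(2 * i + 1) & 0xFF)).toShort
    i += 1
  }
  out
}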
If you really, really care about that last (odd) byte, then you can use (buffer.length + 1)/2 for the size of shorts, and tack on if ((buffer.length & 1) == 1) shorts(shorts.length - 1) = ((bb.get(buffer.length - 1) & 0xFF) << 8).toShort to grab the last byte.
A couple of issues pop out:
If byteBuffer is always going to be 1024*1024 size then the case Array(one) in convert will never actually be used and therefore pattern matching is unnecessary.
Also, you can avoid the for loop with a tail recursive function. After the val byteBuffer = ... line you can replace the chunkNums and for loop with:
@scala.annotation.tailrec
def readAndConvert(b: List[Array[Short]], file: RandomAccessFile): List[Array[Short]] = {
  // read already advances the file position by the bytes read, so no explicit skip is needed
  if (file.read(byteBuffer) < 0)
    b
  else
    readAndConvert(convert(byteBuffer) +: b, file)
}

val sValues = readAndConvert(List.empty[Array[Short]], file)
Note: because prepending to a list is much faster than appending, the above loop gives you the converted values in reverse order relative to the reading order in the file.
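If the original file order matters, a one-line fix (sketch) is to reverse the accumulated list once at the end:
val sValuesInFileOrder = sValues.reverse   // restore reading order after all the prepending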

How do I get Scala BigDecimal to display a large number of digits?

Doing the following
val num = BigDecimal(1.0)
val den = BigDecimal(3.0)
println((num/den)(MathContext.DECIMAL128))
I only get
0.3333333333333333333333333333333333
Which is less than the 128 I want
The default context is MathContext.DECIMAL128, which is used in all computations, so in your example the result of num/den is already rounded to DECIMAL128's 34 significant digits. You need to set your context on all values first and then do your computations.
import java.math.MathContext

val mc = new MathContext(512)
val num = BigDecimal(1.0, mc)
val den = BigDecimal(3.0, mc)
println(num / den)
Don't try and use MathContext.UNLIMITED unless you know your arithmetic does not produce an unbounded decimal representation. It will blow up even before you try to print.
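For instance, here is a quick sketch of what that blow-up looks like (my own example; Try is used only to surface the failure):
import java.math.MathContext
import scala.util.Try

val unlimited = MathContext.UNLIMITED
// 1/3 has a non-terminating decimal expansion, so the division itself throws
Try(BigDecimal(1, unlimited) / BigDecimal(3, unlimited))
// expect Failure(java.lang.ArithmeticException: Non-terminating decimal expansion ...)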
MathContext.DECIMAL128 is the IEEE 754R Decimal128 format: 34 significant digits. So the output is correct (the 128 refers to the 128-bit storage format, not to 128 decimal digits).
I guess you can make your own MathContext with a much higher precision, e.g.:
val moreContext = new MathContext(512) // 512 significant decimal digits of precision
This works:
val mc = new java.math.MathContext(128)
val one_third = (BigDecimal(1, mc) / BigDecimal(3, mc)).toString
// 0. and a bunch of 3
one_third.filter(_ == '3').size // returns 128
If you use 512 you'll get 512 '3' digits.