Generate Random String/Letter in Scala

I'm trying to generate a random String, and these are the possibilities I've found:
Random.nextPrintableChar(), which prints letters, numbers, punctuation
Random.alphanumeric.take(size).mkString, which prints letters and numbers
Random.nextString(1), which prints Chinese chars almost every time lol
Random is scala.util.Random
size is an Int
The second option almost does the job, but I need the string to start with a letter. I also found Random.nextPrintableChar(), but it prints punctuation as well.
What's the solution?
My solution so far was:
val low = 65 // 'A'
val high = 90 // 'Z'
(Random.nextInt(high - low + 1) + low).toChar // +1 because nextInt's upper bound is exclusive; without it 'Z' is never produced
Inspired by the Random.nextPrintableChar implementation:
def nextPrintableChar(): Char = {
val low = 33
val high = 127
(self.nextInt(high - low) + low).toChar
}

Found a better solution:
Random.alphanumeric.filter(_.isLetter).head
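An even better solution, as jwvh commented: Random.alphanumeric.dropWhile(_.isDigit). It skips any leading digits, so the first character is guaranteed to be a letter; append .take(size).mkString to get a String of the desired length.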
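Combining jwvh's suggestion with take and mkString gives a complete answer to the question. A minimal sketch, assuming Scala 2.13+ (where alphanumeric returns a LazyList; older versions return a Stream, which supports the same calls). The object RandomStrings and the optional Random parameter (handy for seeding in tests) are illustrative names, not library API:

```scala
import scala.util.Random

object RandomStrings {
  // rnd.alphanumeric is an infinite lazy sequence of [A-Za-z0-9].
  // Dropping the leading digits guarantees the first character is a letter;
  // the remaining size - 1 characters may be letters or digits.
  def startingWithLetter(size: Int, rnd: Random = new Random()): String =
    rnd.alphanumeric.dropWhile(_.isDigit).take(size).mkString
}
```

Passing a seeded generator, e.g. RandomStrings.startingWithLetter(8, new Random(42)), makes the output reproducible.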

For better control of the contents, select the alphabet yourself:
val alpha = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
def randStr(n:Int) = (1 to n).map(_ => alpha(Random.nextInt(alpha.length))).mkString
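Choosing the alphabet yourself also covers the original requirement: draw the first character from a letters-only alphabet and the rest from the full one. A sketch under that assumption; the names letters, alphanumerics, pick and randStrStartingWithLetter are hypothetical:

```scala
import scala.util.Random

object AlphabetStrings {
  val letters: IndexedSeq[Char] = ('a' to 'z') ++ ('A' to 'Z')
  val alphanumerics: IndexedSeq[Char] = letters ++ ('0' to '9')

  // Uniformly picks one character from the given alphabet.
  private def pick(chars: IndexedSeq[Char], rnd: Random): Char =
    chars(rnd.nextInt(chars.length))

  // The first character is always a letter; the remaining n - 1 come from alphanumerics.
  def randStrStartingWithLetter(n: Int, rnd: Random = new Random()): String =
    if (n <= 0) ""
    else (pick(letters, rnd) +: Seq.fill(n - 1)(pick(alphanumerics, rnd))).mkString
}
```

Using an IndexedSeq keeps the random lookup constant-time, just like indexing into the String in randStr.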

A fast way to generate a random ASCII String is to fill a byte array directly:
import java.nio.charset.StandardCharsets
import scala.util.Random
val rand = new Random()
val Alphanumeric = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".getBytes(StandardCharsets.US_ASCII)
// Fills a byte array with bytes picked uniformly from `chars`, then decodes it as ASCII
def mkStr(chars: Array[Byte], length: Int): String = {
val bytes = new Array[Byte](length)
for (i <- 0 until length) bytes(i) = chars(rand.nextInt(chars.length))
new String(bytes, StandardCharsets.US_ASCII)
}
def nextAlphanumeric(length: Int): String = mkStr(Alphanumeric, length)


ScalaCheck number generator between 0 <= x < 2^64

I'm trying to write a good number generator that covers uint64_t in C. Here is what I have so far:
def uInt64s : Gen[BigInt] = Gen.choose(0,64).map(pow2(_) - 1)
It is a good start, but it only generates numbers 2^n - 1. Is there a more effective way to generate random BigInts while preserving the number range 0 <= n < 2^64?
Okay, maybe I am missing something here, but isn't it as simple as this?
def uInt64s : Gen[BigInt] = Gen.chooseNum(Long.MinValue,Long.MaxValue)
.map(x => BigInt(x) + BigInt(2).pow(63))
Longs already have the right number of bits (64), so you just add 2^63: Long.MinValue becomes 0 and Long.MaxValue becomes 2^64 - 1. The addition is of course done with BigInts, so it cannot overflow.
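The arithmetic behind that generator can be checked without ScalaCheck. In the sketch below, toUInt64 is a hypothetical helper holding just the mapping function; it shows that adding 2^63 sends the endpoints of the Long range to the endpoints of the UInt64 range. Since the shift is one-to-one, every Long lands somewhere in 0 <= x < 2^64.

```scala
object UInt64Mapping {
  val TwoPow63: BigInt = BigInt(2).pow(63)
  val TwoPow64: BigInt = BigInt(2).pow(64)

  // Shifts a signed 64-bit value into the unsigned range [0, 2^64).
  // The addition happens on BigInt, so nothing overflows.
  def toUInt64(x: Long): BigInt = BigInt(x) + TwoPow63
}
```

This only verifies the range; the distribution still depends on chooseNum, which favours special values such as the extremes.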
I was curious about the distribution of generated values. Apparently the distribution of chooseNum is not uniform, since it prefers special values, but the edge cases for Longs are probably also interesting for UInt64s:
/** Generates numbers within the given inclusive range, with
* extra weight on zero, +/- unity, both extremities, and any special
* numbers provided. The special numbers must lie within the given range,
* otherwise they won't be included. */
def chooseNum[T](minT: T, maxT: T, specials: T*)(
With ScalaCheck...
Generating a number from 0..Long.MaxValue is easy.
Generating an unsigned long from 0..Long.MaxValue..2^64-1 is not so easy.
Tried:
❌ Gen.chooseNum(BigInt(0),BigInt(2).pow(64)-1) Does not work: at the time of writing there is no implicit defined for BigInt.
❌ Arbitrary.arbBigInt.arbitrary Does not work: its type is BigInt, but it is still limited to the range of a signed Long.
✔ Generate a Long as a BigInt and shift it left by an arbitrary amount to make a UINT64. Works: using Rickard Nilsson's ScalaCheck code as a guide, this passed the test.
This is what I came up with:
// Generate a Long and map it to a BigInt
def genBigInt: Gen[BigInt] = Gen.chooseNum(0L, Long.MaxValue).map(x => BigInt(x))
// Take genBigInt and shift it left by chooseNum(0, 64) positions,
// keeping only results in the range 0 <= x < 2^64
def genUInt64: Gen[BigInt] = for {
bi <- genBigInt
n <- Gen.chooseNum(0, 64)
x = bi << n
if x >= 0 && x < BigInt(2).pow(64)
} yield x
...
// Use the generator: genUInt64
As noted in the answers to "ScalaCheck number generator between 0 <= x < 2^64", the distribution of the BigInts generated by genUInt64 is not even. The preferred generator is @stholzm's solution:
def genUInt64b : Gen[BigInt] =
Gen.chooseNum(Long.MinValue,Long.MaxValue) map (x =>
BigInt(x) + BigInt(2).pow(63))
It is simpler, the numbers fed to ScalaCheck are more evenly distributed, it is faster, and it passes the tests.
A simpler and more efficient alternative to stholzm's answer is as follows:
import org.scalacheck.{Arbitrary, Gen}
val myGen: Gen[BigInt] = {
val offset = -BigInt(Long.MinValue) // = 2^63
Arbitrary.arbitrary[Long].map { BigInt(_) + offset }
}
Generate an arbitrary Long;
Convert it to a BigInt;
Add the appropriate offset, i.e. -BigInt(Long.MinValue).
Tests in the REPL:
scala> myGen.sample
res0: Option[scala.math.BigInt] = Some(9223372036854775807)
scala> myGen.sample
res1: Option[scala.math.BigInt] = Some(12628207908230674671)
scala> myGen.sample
res2: Option[scala.math.BigInt] = Some(845964316914833060)
scala> myGen.sample
res3: Option[scala.math.BigInt] = Some(15120039215775627454)
scala> myGen.sample
res4: Option[scala.math.BigInt] = Some(0)
scala> myGen.sample
res5: Option[scala.math.BigInt] = Some(13652951502631572419)
Here is what I have so far; I'm not entirely happy with it:
/**
* Chooses a BigInt in the range 0 <= bigInt < 2^64
* @return
*/
def bigInts : Gen[BigInt] = for {
bigInt <- Arbitrary.arbBigInt.arbitrary
exponent <- Gen.choose(1,2)
} yield bigInt.pow(exponent)
def positiveBigInts : Gen[BigInt] = bigInts.filter(_ >= 0)
def bigIntsUInt64Range : Gen[BigInt] = positiveBigInts.filter(_ < (BigInt(1) << 64))
/**
* Generates a number in the range 0 <= x < 2^64
* then wraps it in a UInt64
* @return
*/
def uInt64s : Gen[UInt64] = for {
bigInt <- bigIntsUInt64Range
} yield UInt64(bigInt)
Since Arbitrary.arbBigInt.arbitrary appears to range only over -2^63 <= x <= 2^63, I square x some of the time to get numbers larger than 2^63.
Feel free to comment if you see a place where improvements can be made or a bug that needs fixing.

Scala - What type are the numbers in the List using x.toString.toList?

I have written a function in Scala that should calculate the sum of the squares of the digits of a number. E.g. 44 -> 32 (4^2 + 4^2 = 16 + 16 = 32)
Here it is:
def digitSum(x:BigInt) : BigInt = {
var sum = 0
val leng = x.toString.toList.length
var y = x.toString.toList
for (i<-0 until leng ) {
sum += y(i).toInt * y(i).toInt
}
return sum
}
However, when I call the function with, say, digitSum(44), I get 5408 instead of 32.
Why is this happening? Does it have to do with the list containing Strings? If so, why doesn't the .toInt method work?
Thanks!
Your question has already been covered in "Scala int value of String characters"; have a good read through it and you will have more information than you need ;)
Also, looking at your code, it could benefit from Scala's expressiveness and functional features. The same function can be written as follows:
def digitSum(x: BigInt) = x.toString
.map(_.asDigit)
.map(a => a * a)
.sum
In the future, try to avoid mutable variables and standard looping techniques if you can.
When you call toString.toList you get a List of Chars (not Strings or Ints), and calling .toInt on a Char returns its character code, not the digit it represents. This is what it looks like in the REPL:
scala> "1".toList.map(_.toInt)
res0: List[Int] = List(49)
What you want is probably something like this:
def digitSum(x:BigInt) : BigInt = {
var sum = 0
val leng = x.toString.toList.length
var y = x.toString.toList
for (i<-0 until leng ) {
sum += (y(i).toInt - 48) * (y(i).toInt - 48) //Subtract out char base
}
sum
}
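The 5408 from the question is exactly what Char.toInt predicts: '4' has character code 52, and 52 * 52 + 52 * 52 = 2704 + 2704 = 5408. The sketch below (object and method names are illustrative) contrasts the three conversions discussed in this thread: toInt, asDigit, and subtracting '0':

```scala
object DigitConversions {
  // '4'.toInt is the character code 52, not the digit 4.
  def wrongSquareSum(x: BigInt): Int =
    x.toString.map(c => c.toInt * c.toInt).sum

  // asDigit maps '0'..'9' to 0..9.
  def digitSquareSum(x: BigInt): Int =
    x.toString.map(_.asDigit).map(d => d * d).sum

  // Subtracting '0' (character code 48) gives the same digit value.
  def digitSquareSumViaOffset(x: BigInt): Int =
    x.toString.map(c => (c - '0') * (c - '0')).sum
}
```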

Scala finding more elegant way

I am new to Scala and functional programming.
I was solving a problem where you have to read a number, and then that many integers. After that you have to calculate the sum of all the digits in all the integers.
Here is my code
def sumDigits(line: String) =
line.foldLeft(0)(_ + _.toInt - '0'.toInt)
def main(args: Array[String]) {
val numberOfLines = Console.readInt
val lines = for (i <- 1 to numberOfLines) yield Console.readLine
println(lines.foldLeft(0)( _ + sumDigits(_)))
}
Is there more elegant or efficient way?
sumDigits() can be implemented easier with sum:
def sumDigits(line: String) = line.map(_.asDigit).sum
Second foldLeft() can also be replaced with sum:
lines.map(sumDigits).sum
Which brings us to the final version (notice there is no main method; the object extends App instead):
object Main extends App {
def sumDigits(line: String) = line.map(_.asDigit).sum
// Console.readInt/readLine were deprecated in favour of scala.io.StdIn
val lines = for (_ <- 1 to scala.io.StdIn.readInt()) yield scala.io.StdIn.readLine()
println(lines.map(sumDigits).sum)
}
Or if you really want to squeeze as much as possible in one line, inline sumDigits (not recommended):
lines.map(_.map(_.asDigit).sum).sum
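To check the logic without typing input at the console, the digit summing can be pulled out into pure functions over the lines that were read. A sketch, assuming each line contains only digits; DigitSums and sumAllDigits are illustrative names:

```scala
object DigitSums {
  // Sum of the decimal digits in one line, e.g. "123" -> 6.
  def sumDigits(line: String): Int = line.map(_.asDigit).sum

  // Sum of the digits over all lines; reading the lines
  // (for example with scala.io.StdIn.readLine()) is left to the caller.
  def sumAllDigits(lines: Seq[String]): Int = lines.map(sumDigits).sum
}
```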
I like compact code, so if I were really going for brevity I might write:
object Reads extends App {
import scala.io.StdIn._
println( Seq.fill(readInt()){readLine().map(_ - '0').sum}.sum )
}
which sets the number of lines inline and does the processing as you go. There is no error checking, though. You could add a .filter(_.isDigit) right after the readLine to at least discard non-digits. You might also define def p[A](a: A) = { println(a); a } and wrap the reads in p so you can see what was typed (on some platforms, at least, there is no echo to the screen by default).
One-liner Answer:
Iterator.continually(scala.io.StdIn.readLine()).take(scala.io.StdIn.readInt()).toList.flatten.map(_.asDigit).sum
To start with, you have to parse the line to split it into its decimal-integer substrings:
val numbers = "5 1 4 9 16 25"
val ints = numbers.split("\\s+").toList.map(_.toInt)
Then you want to pull off the first one as the count and keep the rest to decode and sum:
val count :: numbers = ints
Then use the built-in sum method:
val sum = numbers.sum
Altogether in the REPL:
scala> val numbers = "5 1 4 9 16 25"
numbers: String = 5 1 4 9 16 25
scala> val ints = numbers.split("\\s+").toList.map(_.toInt)
ints: List[Int] = List(5, 1, 4, 9, 16, 25)
scala> val count :: numbers = ints
count: Int = 5
numbers: List[Int] = List(1, 4, 9, 16, 25)
scala> val sum = numbers.sum
sum: Int = 55
If you want to do something with the leading number count, you could verify that it's correct:
scala> assert(count == numbers.length)
Which produces no output, since the assertion passes.

Scala Doubles, and Precision

Is there a function that can truncate or round a Double? At one point in my code I would like a number like 1.23456789 to be rounded to 1.23.
You can use scala.math.BigDecimal:
BigDecimal(1.23456789).setScale(2, BigDecimal.RoundingMode.HALF_UP).toDouble
There are a number of other rounding modes, which unfortunately aren't very well documented at present (although their Java equivalents are).
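A small sketch comparing a few of those modes; the helper roundWith is hypothetical, while HALF_UP, HALF_EVEN, DOWN and FLOOR are members of scala.math.BigDecimal.RoundingMode (its values mirror java.math.RoundingMode). HALF_UP and HALF_EVEN differ only on exact ties, and DOWN and FLOOR differ only on negative inputs:

```scala
import scala.math.BigDecimal.RoundingMode

object RoundingModes {
  // Rounds d to `places` decimal places with the given mode.
  // BigDecimal(d) uses the decimal text form of d (Double.toString),
  // so BigDecimal(0.125) is exactly 0.125 and the tie is visible.
  def roundWith(d: Double, places: Int, mode: RoundingMode.Value): Double =
    BigDecimal(d).setScale(places, mode).toDouble
}
```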
Here's another solution without BigDecimals
Truncate:
(math floor 1.23456789 * 100) / 100
Round (see rint):
(math rint 1.23456789 * 100) / 100
Or for any double n and precision p:
def truncateAt(n: Double, p: Int): Double = { val s = math pow (10, p); (math floor n * s) / s }
The same can be done for the rounding function, this time using currying:
def roundAt(p: Int)(n: Double): Double = { val s = math pow (10, p); (math round n * s) / s }
which is more reusable, e.g. when rounding money amounts the following could be used:
def roundAt2(n: Double) = roundAt(2)(n)
Since no one has mentioned the % operator yet, here it is. It only truncates, and you cannot rely on the result being free of floating-point inaccuracies, but sometimes it's handy:
scala> 1.23456789 - (1.23456789 % 0.01)
res4: Double = 1.23
How about:
val value = 1.4142135623730951
//3 decimal places
println((value * 1000).round / 1000.toDouble)
//4 decimal places
println((value * 10000).round / 10000.toDouble)
Edit: fixed the problem that @ryryguy pointed out. (Thanks!)
If you want it to be fast, Kaito has the right idea. math.pow is slow, though. For any standard use you're better off with a recursive function:
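def trunc(x: Double, n: Int) = {
// p10(k) = 10^k; the accumulator must start at 1, otherwise p10(0) would return 10
@annotation.tailrec
def p10(n: Int, pow: Long = 1): Long = if (n == 0) pow else p10(n - 1, pow * 10)
if (n < 0) {
val m = p10(-n).toDouble
math.round(x / m) * m // despite the name, this rounds (math.round) to the nearest 10^-n
}
else {
val m = p10(n).toDouble
math.round(x * m) / m // rounds to n decimal places
}
}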
This is about 10x faster if you stay within the range of Long (i.e. 18 digits), so you can round anywhere between 10^18 and 10^-18.
For those who are interested, here are some times for the suggested solutions (10,000 calls each, elapsed time in milliseconds):
Rounding
Java Formatter: Elapsed Time: 105
Scala Formatter: Elapsed Time: 167
BigDecimal Formatter: Elapsed Time: 27
Truncation
Scala custom Formatter: Elapsed Time: 3
Truncation is the fastest, followed by BigDecimal.
Keep in mind these tests were run with normal Scala execution, not with any benchmarking tools.
object TestFormatters {
val r = scala.util.Random
def textFormatter(x: Double) = new java.text.DecimalFormat("0.##").format(x)
def scalaFormatter(x: Double) = "%1.2f".format(x)
def bigDecimalFormatter(x: Double) = BigDecimal(x).setScale(2, BigDecimal.RoundingMode.HALF_UP).toDouble
def scalaCustom(x: Double) = {
val roundBy = 2
val w = math.pow(10, roundBy)
(x * w).toLong.toDouble / w
}
def timed(f: => Unit) = {
val start = System.currentTimeMillis()
f
val end = System.currentTimeMillis()
println("Elapsed Time: " + (end - start))
}
def main(args: Array[String]): Unit = {
print("Java Formatter: ")
val iters = 10000
timed {
(0 until iters) foreach { _ =>
textFormatter(r.nextDouble())
}
}
print("Scala Formatter: ")
timed {
(0 until iters) foreach { _ =>
scalaFormatter(r.nextDouble())
}
}
print("BigDecimal Formatter: ")
timed {
(0 until iters) foreach { _ =>
bigDecimalFormatter(r.nextDouble())
}
}
print("Scala custom Formatter (truncation): ")
timed {
(0 until iters) foreach { _ =>
scalaCustom(r.nextDouble())
}
}
}
}
It's actually very easy to handle using Scala's f interpolator (https://docs.scala-lang.org/overviews/core/string-interpolation.html). Note that it produces a String, not a Double.
Suppose we want to round to 2 decimal places:
scala> val sum = 1 + 1/4D + 1/7D + 1/10D + 1/13D
sum: Double = 1.5697802197802198
scala> println(f"$sum%1.2f")
1.57
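One caveat with format-based rounding: as far as I know, the f interpolator and "%.2f".format(...) both go through java.lang.String.format, which uses the JVM's default locale, so on a system whose locale uses a decimal comma the text could be "1,57" and a later .toDouble would fail. A sketch that pins the locale with StringOps.formatLocal; the helper roundViaFormat is hypothetical:

```scala
import java.util.Locale

object FormatRounding {
  // Locale.ROOT always uses '.' as the decimal separator, so the
  // formatted text can be parsed back with toDouble on any JVM.
  def roundViaFormat(x: Double, places: Int): Double =
    s"%.${places}f".formatLocal(Locale.ROOT, x).toDouble
}
```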
You may use implicit classes:
import scala.math._
object ExtNumber extends App {
implicit class ExtendedDouble(n: Double) {
// toLong truncates toward zero, so despite its name this truncates rather than rounds
def rounded(x: Int) = {
val w = pow(10, x)
(n * w).toLong.toDouble / w
}
}
// usage
val a = 1.23456789
println(a.rounded(2))
}
Recently, I faced a similar problem and solved it using the following approach:
def round(value: Either[Double, Float], places: Int): Double = { // explicit Double result type
if (places < 0) 0.0
else {
val factor = Math.pow(10, places)
value match {
case Left(d) => (Math.round(d * factor) / factor)
case Right(f) => (Math.round(f * factor) / factor)
}
}
}
def round(value: Double): Double = round(Left(value), 0)
def round(value: Double, places: Int): Double = round(Left(value), places)
def round(value: Float): Double = round(Right(value), 0)
def round(value: Float, places: Int): Double = round(Right(value), places)
I used this SO question. I have a couple of overloaded functions for both Float/Double and implicit/explicit options. Note that you need to explicitly specify the return type of overloaded functions.
There are great answers in this thread. To better show the difference, here is an example. I put it here because in my work the numbers are required NOT to be rounded half-up:
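// assumes spark-shell, where the SparkSession `spark` is in scope
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.{col, floor}
import spark.implicits._
val values = List(1.2345,2.9998,3.4567,4.0099,5.1231)
val df = values.toDF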
df.show()
+------+
| value|
+------+
|1.2345|
|2.9998|
|3.4567|
|4.0099|
|5.1231|
+------+
val df2 = df.withColumn("floor_val", floor(col("value"))).
withColumn("dec_val", col("value").cast(DecimalType(26,2))).
withColumn("floor2", (floor(col("value") * 100.0)/100.0).cast(DecimalType(26,2)))
df2.show()
+------+---------+-------+------+
| value|floor_val|dec_val|floor2|
+------+---------+-------+------+
|1.2345| 1| 1.23| 1.23|
|2.9998| 2| 3.00| 2.99|
|3.4567| 3| 3.46| 3.45|
|4.0099| 4| 4.01| 4.00|
|5.1231| 5| 5.12| 5.12|
+------+---------+-------+------+
The floor function returns the largest integer less than or equal to the current value. DecimalType by default uses HALF_UP rounding rather than just cutting to the precision you want. If you want to cut to a certain precision without HALF_UP rounding, you can use the solution above instead (or use scala.math.BigDecimal, where you have to define the rounding mode explicitly).
I wouldn't use BigDecimal if you care about performance. BigDecimal converts numbers to a string and then parses them back again:
/** Constructs a `BigDecimal` using the decimal text representation of `Double` value `d`, rounding if necessary. */
def decimal(d: Double, mc: MathContext): BigDecimal = new BigDecimal(new BigDec(java.lang.Double.toString(d), mc), mc)
I'm going to stick to math manipulations as Kaito suggested.
Since the question specified rounding for doubles specifically, this seems way simpler than dealing with big integer or excessive string or numerical operations.
"%.2f".format(0.714999999999).toDouble
A bit strange but nice. I use String and not BigDecimal
def round(x: Double)(p: Int): Double = {
var A = x.toString().split('.')
(A(0) + "." + A(1).substring(0, if (p > A(1).length()) A(1).length() else p)).toDouble
}
You can do: Math.round(<double precision value> * 100.0) / 100.0
Math.round is the fastest option, but it breaks down badly in corner cases with either a very high number of decimal places (e.g. round(1000.0d, 17)) or a large integer part (e.g. round(90080070060.1d, 9)).
Use BigDecimal instead: it is a bit inefficient, as it converts the values to strings, but it is more reliable:
BigDecimal(<value>).setScale(<places>, RoundingMode.HALF_UP).doubleValue()
Use your preferred rounding mode.
If you are curious and want to know in more detail why this happens, you can read this:
I think the previous answers are:
Plain wrong: using math.floor, for example, doesn't work for negative values.
Unnecessarily complicated.
Here is a suggestion based on @kaito's answer (I can't comment yet):
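def truncateAt(x: Double, p: Int): Double = {
val s = math.pow(10, p)
(x * s).toLong / s // toLong truncates toward zero; unlike toInt it does not overflow once x * s exceeds Int.MaxValue (about 2.1e9)
}
toLong (like toInt) truncates toward zero, so it works for both positive and negative values.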
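The difference between the two truncation styles only shows up for negative inputs: math.floor rounds toward negative infinity, while converting to an integral type truncates toward zero. A sketch comparing them; the object and method names are illustrative. Both still inherit binary floating-point quirks, since x * s can land just below an integer:

```scala
object Truncation {
  // Rounds toward negative infinity: -1.239 becomes -1.24.
  def floorAt(x: Double, p: Int): Double = {
    val s = math.pow(10, p)
    math.floor(x * s) / s
  }

  // Truncates toward zero: -1.239 becomes -1.23.
  def truncateTowardZero(x: Double, p: Int): Double = {
    val s = math.pow(10, p)
    (x * s).toLong / s
  }
}
```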

Scala: groupBy (identity) of List Elements

I develop an application that builds pairs of words in (tokenised) text and produces the number of times each pair occurs (even when same-word pairs occur multiple times, it's OK as it'll be evened out later in the algorithm).
When I use
elements groupBy()
I want to group by the elements' content itself, so I wrote the following:
def self(x: (String, String)) = x
/**
* Maps a collection of words to a map where key is a pair of words and the
* value is number of
* times this pair
* occurs in the passed array
*/
def producePairs(words: Array[String]): Map[(String,String), Double] = {
var table = List[(String, String)]()
words.foreach(w1 =>
words.foreach(w2 =>
table = table ::: List((w1, w2))))
val grouppedPairs = table.groupBy(self)
val size = int2double(grouppedPairs.size)
return grouppedPairs.mapValues(_.length / size)
}
Now, I fully realise that this self() trick is a dirty hack. So I thought a little and came up with:
grouppedPairs = table groupBy (x => x)
This way it produces what I want. However, I still feel that I'm clearly missing something and there should be an easier way of doing it. Any ideas at all, dear all?
Also, if you could help me improve the pairs extraction part, that would help a lot too: it looks very imperative, C++-ish, right now. Many thanks in advance!
I'd suggest this:
def producePairs(words: Array[String]): Map[(String,String), Double] = {
val table = for(w1 <- words; w2 <- words) yield (w1,w2)
val grouppedPairs = table.groupBy(identity)
val size = grouppedPairs.size.toDouble
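grouppedPairs.map { case (pair, occurrences) => pair -> occurrences.length / size } // map, because in Scala 2.13+ mapValues returns a MapView, not a Map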
}
The for comprehension is much easier to read, and there is already a predefined function, identity, which is a generalized version of your self.
You are creating a list of pairs of all words against all words by iterating over words twice, whereas I guess you just want the neighbouring pairs. The easiest way is to use a sliding view instead.
def producePairs(words: Array[String]): Map[(String, String), Int] = {
val pairs = words.sliding(2, 1).map(arr => arr(0) -> arr(1)).toList
val grouped = pairs.groupBy(t => t)
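grouped.map { case (pair, occurrences) => pair -> occurrences.size } // map, because in Scala 2.13+ mapValues returns a MapView, not a Map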
}
Another approach would be to fold the list of pairs, summing them up. Not sure, though, that this is more efficient:
def producePairs(words: Array[String]): Map[(String, String), Int] = {
val pairs = words.sliding(2, 1).map(arr => arr(0) -> arr(1))
pairs.foldLeft(Map.empty[(String, String), Int]) { (m, p) =>
m + (p -> (m.getOrElse(p, 0) + 1))
}
}
I see you are returning a relative number (Double). For simplicity I have just counted the occurrences, so you need to do the final division. I think you want to divide by the total number of pairs (words.size - 1), not by the number of unique pairs (grouped.size), so that the relative frequencies sum up to 1.0.
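Following that suggestion, a sketch that counts neighbouring pairs and divides by the total number of pairs (words.length - 1), so the frequencies sum to 1.0. It assumes Scala 2.13+ for groupMapReduce, and it zips the array with its tail instead of using sliding: for a single word, sliding(2) yields a one-element window and arr(1) would throw, whereas this version returns an empty map. The object name PairFrequencies is illustrative:

```scala
object PairFrequencies {
  // Relative frequency of each neighbouring pair of words.
  def producePairs(words: Array[String]): Map[(String, String), Double] =
    if (words.length < 2) Map.empty
    else {
      // Zipping with the tail gives the neighbouring pairs, e.g. a b a -> (a,b), (b,a).
      val pairs = words.toSeq.zip(words.toSeq.tail)
      val total = pairs.size.toDouble // equals words.length - 1
      pairs
        .groupMapReduce(identity)(_ => 1)(_ + _) // count occurrences of each pair
        .map { case (pair, count) => pair -> count / total }
    }
}
```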
An alternative approach, which is not O(num_words * num_words) but O(num_unique_words * num_unique_words) (or something like that):
def producePairs(words: Iterable[String]): Map[(String,String), Double] = {
val counts = words.groupBy(identity).map{case (w, ws) => (w -> ws.size)}
val size = (counts.size * counts.size).toDouble
for(w1 <- counts; w2 <- counts) yield {
((w1._1, w2._1) -> ((w1._2 * w2._2) / size))
}
}