Prettier way in scala to foldLeft a column? - scala

I have an SQL which I am trying to code in spark scala using dataframes
SELECT country,
Substr(substring_col, 1, 3) AS Code,
CASE
WHEN Substr(substring_col, 1, 9) = '238208700' THEN
'columnName1'
WHEN Substr(substring_col, 1, 9) = '240018000' THEN 'columnName2'
WHEN Substr(substring_col, 1, 9) = '240017531' THEN 'columnName3'
WHEN Substr(substring_col, 1, 9) = '240017001'
OR Substr(substring_col, 1, 9) = '240017301'
OR Substr(substring_col, 1, 9) = '240017302' THEN 'columnName4'
WHEN Substr(substring_col, 1, 9) = '240017211' THEN 'columnName5'
WHEN Substr(substring_col, 1, 9) = '248010160'
OR Substr(substring_col, 1, 9) = '248010241'
OR Substr(substring_col, 1, 9) = '248010420' THEN 'columnName6'
ELSE 'not_defined'
END AS custom_column_name
FROM t_filtered
Below is code which I am trying to make it look pretty but unable to achieve the desired result
val substring_col = substring(col("col_name"),1,9)
val mapper = Map("columnName1" -> List("238208700"),
"columnName2" -> List("240017301", "240017001")
)
myDf.withColumn("custom_column_name" ,mapper.foldLeft(lit(""))((accu, mapperMap) => {
when(substring_col isin mapperMap._2, mapperMap._1)
}))
I get the below error
Unsupported literal type class scala.collection.immutable.$colon$colon List(238208700)
java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(238208700)

Related

Validate that sequence of numbers are within allowed range

How can I validate that a sequence of numbers falls into an allowed range of numbers in a more scala/functional (not java like) way?
val valuesRight = Seq(1, 2, 3, 4, 5, 6, 7)
val valuesWrong = Seq(1, 2, 5, 6, 7, 8, 9)
val allowedValues = Range(1, 8)
def containsNotAllowedValues(allowed: Range, input: Seq[Int]): Boolean = {
var containsNotAllowedValue = false
// pseudo code of how to do it in java, But how in scala / functional way?
while next element is available
validate if it is contained in allowed
if not allowed set containsNotAllowedValue to true and break the loop early
}
containsNotAllowedValue
}
containsNotAllowedValues(allowedValues, valuesRight) // expected false as no wrong element contained
containsNotAllowedValues(allowedValues, valuesWrong) // expected true as at least single wrong element is contained
You can use the forall function on Seq. It checks whether the given predicate is true for all elements in the Seq. If that function returns false with the predicate allowed contains _, then your Seq contains an illegal value.
def containsNotAllowedValues(allowed: Range, input: Seq[Int]): Boolean =
!input.forall { allowed contains _ }
You could do something as simple as this:
def containsNotAllowedValues(allowed: Range, input: Seq[Int]): Boolean = {
!(allowed.head <= input.min && allowed.last >= input.max)
}
val valuesRight = Seq(1, 2, 3, 4, 5, 6, 7)
val valuesWrong = Seq(1, 2, 5, 6, 7, 8, 9)
val allowedValues = Range(1, 8)
containsNotAllowedValues(allowedValues,valuesRight) //false
containsNotAllowedValues(allowedValues,valuesWrong) //true
This will fail if the Range has only 1 value like Range(1,1)
If you want you can add
allowed.size>1
to catch those cases

Create sublists with elements 1 and 2 in alternate ways in Scala

Hi all I am new with Scala and I have a doubt:
Creation of sublists with elements 1 and 2 in alternate ways, never having two consecutive numbers repeated.
My function:
def functionAlternateElements(list : List[Int]): Option[List[Int]] = {
//What should be the solution?
}
Tests:
test("AlternateElementsTests") {
assert(ca1.functionAlternateElements(List(1)) === Some(List(1)))
assert(ca1.functionAlternateElements(List(1,2)) === Some(List(2)))
assert(ca1.functionAlternateElements(List(1,2,3)) === Some(List(1, 2)))
assert(ca1.functionAlternateElements(List(1,2,3,4)) === Some(List(1, 2, 1)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5)) === Some(List(2, 1, 2)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6)) === Some(List(1, 2, 1, 2)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6,7)) === Some(List(1, 2, 1, 2, 1)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6,7,8)) === Some(List(2, 1, 2, 1, 2)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6,7,8,9)) === Some(List(1, 2, 1, 2, 1, 2)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10)) === Some(List(1, 2, 1, 2, 1, 2, 1)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11)) === Some(List(2, 1, 2, 1, 2, 1, 2)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12)) === Some(List(1, 2, 1, 2, 1, 2, 1, 2)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12,13)) === Some(List(1, 2, 1, 2, 1, 2, 1, 2, 1)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14)) === Some(List(2, 1, 2, 1, 2, 1, 2, 1, 2)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)) === Some(List(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)) === Some(List(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1)))
assert(ca1.functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17)) === Some(List(2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2)))
}
How can I do this?
If you know the code can you give an explanation so I can understand it please?
Given that the assertions seem to be more or less arbitrary, I don't know how to write a program that can produce these strange outputs.
However, I know how to write a program that can write a program that can produce these strange outputs. The idea is to generate code that satisfies the requirements. We will generate it by supplying an ansatz and then searching for a bunch of magic numbers by brute-force.
First, let's build some infrastructure. We need a tiny framework for solving arbitrary problems by brute force. Here is a trait that is useful for very small problems:
trait BruteForce[X] {
def engageBruteForceAttack(constraint: X => Boolean): Option[X]
def zip[Y](other: BruteForce[Y]): BruteForce[(X, Y)] =
new ProductBruteForce[X, Y](this, other)
}
We need to handle only the finite case and the case of cartesian product:
class FiniteBruteForce[X](possibilities: List[X])
extends BruteForce[X] {
def engageBruteForceAttack(constraint: X => Boolean) = possibilities.find(constraint)
}
object FiniteBruteForce {
def apply[X](xs: X*) = new FiniteBruteForce[X](xs.toList)
}
class ProductBruteForce[A, B](a: BruteForce[A], b: BruteForce[B])
extends BruteForce[(A, B)] {
def engageBruteForceAttack(constraint: ((A, B)) => Boolean) = {
var solution: Option[(A, B)] = None
a.engageBruteForceAttack { x =>
b.engageBruteForceAttack { y =>
if (constraint((x, y))) {
solution = Some((x, y))
true
} else {
false
}
}.map(_ => true).getOrElse(false)
}
solution
}
}
Now we can extract the inputs and outputs from your test cases:
val mysteriousTestCases = List(
(List(1), List(1)),
(List(1,2), List(2)),
(List(1,2,3), List(1, 2)),
(List(1,2,3,4), List(1, 2, 1)),
(List(1,2,3,4,5), List(2, 1, 2)),
(List(1,2,3,4,5,6), List(1, 2, 1, 2)),
(List(1,2,3,4,5,6,7), List(1, 2, 1, 2, 1)),
(List(1,2,3,4,5,6,7,8), List(2, 1, 2, 1, 2)),
(List(1,2,3,4,5,6,7,8,9), List(1, 2, 1, 2, 1, 2)),
(List(1,2,3,4,5,6,7,8,9,10), List(1, 2, 1, 2, 1, 2, 1)),
(List(1,2,3,4,5,6,7,8,9,10,11), List(2, 1, 2, 1, 2, 1, 2)),
(List(1,2,3,4,5,6,7,8,9,10,11,12), List(1, 2, 1, 2, 1, 2, 1, 2)),
(List(1,2,3,4,5,6,7,8,9,10,11,12,13), List(1, 2, 1, 2, 1, 2, 1, 2, 1)),
(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14), List(2, 1, 2, 1, 2, 1, 2, 1, 2)),
(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15), List(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16), List(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1)),
(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17), List(2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2))
)
Now let's write a program that will write us the function that computes the strange 1,2,1,1,2,... sequence:
val (illuminatiPivot, (illuminatiOffset, illuminatiShift)) =
FiniteBruteForce(-1, -2, 0, 1, 2).zip(
FiniteBruteForce(-1, -2, 1, 2, 3).zip(
FiniteBruteForce(0, 1, 2)
)
).engageBruteForceAttack{ case (p, (o, s)) =>
mysteriousTestCases.forall { case (input, output) =>
val (start :: tail) = output
val n = input.size
val illuminatiNumber =
if (n < p) (n + o)
else List(1, 1, 2)((n + s) % 3)
start == illuminatiNumber
}
}.get
println(s"""|// The function that generates the start number
|// of the strange sequences
|def illuminatiNumber(n: Int): Int = {
| if (n < $illuminatiPivot) (n + $illuminatiOffset)
| else List(1, 1, 2)((n + $illuminatiShift) % 3)
|}
|""".stripMargin)
Now do the same for the lengths of the outputs. The ansatz is this time: it should be some affine-linear function that grows with factor 2/3 and is rounded in a weird way:
val (hl3ConfirmedConst, hl3ConfirmedOffset) =
FiniteBruteForce(-2, -1, 0, 1, 2).zip(
FiniteBruteForce(-2, -1, 0, 1, 2, 3)
).engageBruteForceAttack{ case (c, o) =>
mysteriousTestCases.forall { case (input, output) =>
val n = input.size
val halfLife3Confirmed = c + (n * 2 + o) / 3
output.size == halfLife3Confirmed
}
}.get
println(s"""|def halfLife3Confirmed(i: Int): Int = {
| $hl3ConfirmedConst + (i * 2 + $hl3ConfirmedOffset) / 3
|}
|""".stripMargin)
Finally, I didn't bother to think hard enough how to map 1 to sequence 1,2,1,2,... and 2 to 2,1,2,1,..., so I brute-forced this too:
val (tabulationOffset, tabulationShift) =
FiniteBruteForce(-1, 0, 1, 2).zip(FiniteBruteForce(0, 1)).engageBruteForceAttack{
case (x, y) =>
(0 to 2).map(i => x + (y + 1 + i) % 2).toList == List(1, 2, 1) &&
(0 to 2).map(i => x + (y + 2 + i) % 2).toList == List(2, 1, 2)
}.get
Now we can write out the initially seeked functionAlternateElements-method:
println(s"""|def functionAlternateElements(list : List[Int]): Option[List[Int]] = {
| val n = list.size // throw away everything but the size
| val resultStart = illuminatiNumber(n)
| val resultSize = halfLife3Confirmed(n)
| Some(List.tabulate(resultSize){ i => $tabulationOffset + ($tabulationShift + resultStart + i) % 2 })
|}
|""".stripMargin)
Let's also write out the assertions again:
for ((i, o) <- mysteriousTestCases) {
val input = i.mkString("List(", ",", ")")
val output = o.mkString("List(", ",", ")")
println(s"""assert(functionAlternateElements($input) == Some($output))""")
}
And also let's append a line that tells us that everything went well in the end:
println("""println("All assertions are true")""")
When run, the above contraption produces the following code:
// The function that generates the start number
// of the strange sequences
def illuminatiNumber(n: Int): Int = {
if (n < -1) (n + -1)
else List(1, 1, 2)((n + 0) % 3)
}
def halfLife3Confirmed(i: Int): Int = {
0 + (i * 2 + 1) / 3
}
def functionAlternateElements(list : List[Int]): Option[List[Int]] = {
val n = list.size // throw away everything but the size
val resultStart = illuminatiNumber(n)
val resultSize = halfLife3Confirmed(n)
Some(List.tabulate(resultSize){ i => 1 + (1 + resultStart + i) % 2 })
}
assert(functionAlternateElements(List(1)) == Some(List(1)))
assert(functionAlternateElements(List(1,2)) == Some(List(2)))
assert(functionAlternateElements(List(1,2,3)) == Some(List(1,2)))
assert(functionAlternateElements(List(1,2,3,4)) == Some(List(1,2,1)))
assert(functionAlternateElements(List(1,2,3,4,5)) == Some(List(2,1,2)))
assert(functionAlternateElements(List(1,2,3,4,5,6)) == Some(List(1,2,1,2)))
assert(functionAlternateElements(List(1,2,3,4,5,6,7)) == Some(List(1,2,1,2,1)))
assert(functionAlternateElements(List(1,2,3,4,5,6,7,8)) == Some(List(2,1,2,1,2)))
assert(functionAlternateElements(List(1,2,3,4,5,6,7,8,9)) == Some(List(1,2,1,2,1,2)))
assert(functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10)) == Some(List(1,2,1,2,1,2,1)))
assert(functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11)) == Some(List(2,1,2,1,2,1,2)))
assert(functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12)) == Some(List(1,2,1,2,1,2,1,2)))
assert(functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12,13)) == Some(List(1,2,1,2,1,2,1,2,1)))
assert(functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14)) == Some(List(2,1,2,1,2,1,2,1,2)))
assert(functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)) == Some(List(1,2,1,2,1,2,1,2,1,2)))
assert(functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)) == Some(List(1,2,1,2,1,2,1,2,1,2,1)))
assert(functionAlternateElements(List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17)) == Some(List(2,1,2,1,2,1,2,1,2,1,2)))
println("All assertions are true")
When we execute the generated code, we get only:
All assertions are true
and no AssertionErrors. Thus, we've managed to implement the strange function successfully.
Tip of the day: you know that your specification is bad when the only way to satisfy the tests is by brute-forcing a bunch of magic numbers.

Scala REPL not printing Range

When I try to print a Range in Scala REPL, how come it doesn't give me the list of numbers.
It displays Range(0 to 10) instead of printing Range(1,2,3,4,5,6,7,8,9,10).
Scala REPL range printout
It looks like the toString function for Range has changed in Scala 2.12.
Testing with 2.12.0:
scala> (1 to 10)
res0: scala.collection.immutable.Range.Inclusive = Range 1 to 10
Testing with 2.11.8:
scala> (0 to 10)
res0: scala.collection.immutable.Range.Inclusive = Range(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Source code for 2.12:
override def toString = {
val preposition = if (isInclusive) "to" else "until"
val stepped = if (step == 1) "" else s" by $step"
val prefix = if (isEmpty) "empty " else if (!isExact) "inexact " else ""
s"${prefix}Range $start $preposition $end$stepped"
}
Source code for 2.11:
override def toString() = {
val endStr =
if (numRangeElements > Range.MAX_PRINT || (!isEmpty && numRangeElements < 0)) ", ... )" else ")"
take(Range.MAX_PRINT).mkString("Range(", ", ", endStr)
}
An easy solution, when you're lost with the bounds of your Range and want to inspect its actual individual elements, consists in transforming it as a List:
scala> (0 until 10)
res0: scala.collection.immutable.Range = Range 0 until 10
scala> (0 until 10).toList
res1: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

Scala trying to count instances of a digit in a number

This is my first day using scala. I am trying to make a string of the number of times each digit is represented in a string. For instance, the number 4310227 would return "1121100100" because 0 appears once, 1 appears once, 2 appears twice and so on...
def pow(n:Int) : String = {
val cubed = (n * n * n).toString
val digits = 0 to 9
val str = ""
for (a <- digits) {
println(a)
val b = cubed.count(_==a.toString)
println(b)
}
return cubed
}
and it doesn't seem to work. would like some scalay reasons why and to know whether I should even be going about it in this manner. Thanks!
When you iterate over strings, which is what you are doing when you call String#count(), you are working with Chars, not Strings. You don't want to compare these two with ==, since they aren't the same type of object.
One way to solve this problem is to call Char#toString() before performing the comparison, e.g., amend your code to read cubed.count(_.toString==a.toString).
As Rado and cheeken said, you're comparing a Char with a String, which will never be be equal. An alternative to cheekin's answer of converting each character to a string is to create a range from chars, ie '0' to '9':
val digits = '0' to '9'
...
val b = cubed.count(_ == a)
Note that if you want the Int that a Char represents, you can call char.asDigit.
Aleksey's, Ren's and Randall's answers are something you will want to strive towards as they separate out the pure solution to the problem. However, given that it's your first day with Scala, depending on what background you have, you might need a bit more context before understanding them.
Fairly simple:
scala> ("122333abc456xyz" filter (_.isDigit)).foldLeft(Map.empty[Char, Int]) ((histo, c) => histo + (c -> (histo.getOrElse(c, 0) + 1)))
res1: scala.collection.immutable.Map[Char,Int] = Map(4 -> 1, 5 -> 1, 6 -> 1, 1 -> 1, 2 -> 2, 3 -> 3)
This is perhaps not the fastest approach because intermediate datatype like String and Char are used but one of the most simplest:
def countDigits(n: Int): Map[Int, Int] =
n.toString.groupBy(x => x) map { case (n, c) => (n.asDigit, c.size) }
Example:
scala> def countDigits(n: Int): Map[Int, Int] = n.toString.groupBy(x => x) map { case (n, c) => (n.asDigit, c.size) }
countDigits: (n: Int)Map[Int,Int]
scala> countDigits(12345135)
res0: Map[Int,Int] = Map(5 -> 2, 1 -> 2, 2 -> 1, 3 -> 2, 4 -> 1)
Where myNumAsString is a String, eg "15625"
myNumAsString.groupBy(x => x).map(x => (x._1, x._2.length))
Result = Map(2 -> 1, 5 -> 2, 1 -> 1, 6 -> 1)
ie. A map containing the digit with its corresponding count.
What this is doing is taking your list, grouping the values by value (So for the initial string of "15625", it produces a map of 1 -> 1, 2 -> 2, 6 -> 6, and 5 -> 55.). The second bit just creates a map of the value to the count of how many times it occurs.
The counts for these hundred digits happen to fit into a hex digit.
scala> val is = for (_ <- (1 to 100).toList) yield r.nextInt(10)
is: List[Int] = List(8, 3, 9, 8, 0, 2, 0, 7, 8, 1, 6, 9, 9, 0, 3, 6, 8, 6, 3, 1, 8, 7, 0, 4, 4, 8, 4, 6, 9, 7, 4, 6, 6, 0, 3, 0, 4, 1, 5, 8, 9, 1, 2, 0, 8, 8, 2, 3, 8, 6, 4, 7, 1, 0, 2, 2, 6, 9, 3, 8, 6, 7, 9, 5, 0, 7, 6, 8, 7, 5, 8, 2, 2, 2, 4, 1, 2, 2, 6, 8, 1, 7, 0, 7, 6, 9, 5, 5, 5, 3, 5, 8, 2, 5, 1, 9, 5, 7, 2, 3)
scala> (new Array[Int](10) /: is) { case (a, i) => a(i) += 1 ; a } map ("%x" format _) mkString
warning: there were 1 feature warning(s); re-run with -feature for details
res7: String = a8c879caf9
scala> (new Array[Int](10) /: is) { case (a, i) => a(i) += 1 ; a } sum
warning: there were 1 feature warning(s); re-run with -feature for details
res8: Int = 100
I was going to point out that no one used a char range, but now I see Kristian did.
def pow(n:Int) : String = {
val cubed = (n * n * n).toString
val cnts = for (a <- '0' to '9') yield cubed.count(_ == a)
(cnts map (c => ('0' + c).toChar)).mkString
}

How to take the first distinct (until the moment) elements of a list?

I am sure there is an elegant/funny way of doing it,
but I can only think of a more or less complicated recursive solution.
Rephrasing:
Is there any standard lib (collections) method nor simple combination of them to take the first distinct elements of a list?
scala> val s = Seq(3, 5, 4, 1, 5, 7, 1, 2)
s: Seq[Int] = List(3, 5, 4, 1, 5, 7, 1, 2)
scala> s.takeWhileDistinct //Would return Seq(3,5,4,1), it should preserve the original order and ignore posterior occurrences of distinct values like 7 and 2.
If you want it to be fast-ish, then
{ val hs = scala.collection.mutable.HashSet[Int]()
s.takeWhile{ hs.add } }
will do the trick. (Extra braces prevent leaking the temp value hs.)
This is a short approach in a maximum of O(2logN).
implicit class ListOps[T](val s: Seq[T]) {
def takeWhileDistinct: Seq[T] = {
s.indexWhere(x => { s.count(x==) > 1 }) match {
case ind if (ind > 0) => s.take(
s.indexWhere(x => { s.count(x==) > 1 }, ind + 1) + ind).distinct
case _ => s
}
}
}
val ex = Seq(3, 5, 4, 5, 7, 1)
val ex2 = Seq(3, 5, 4, 1, 5, 7, 1, 5)
println(ex.takeWhileDistinct.mkString(", ")) // 3, 4, 5
println(ex2.takeWhileDistinct.mkString(", ")) // 3, 4, 5, 1
Look here for live results.
Interesting problem. Here's an alternative. First, let's get the stream of s so we can avoid unnecessary work (though the overhead is likely to be greater than the saved work, sadly).
val s = Seq(3, 5, 4, 5, 7, 1)
val ss = s.toStream
Now we can build s again, but keeping track of whether there are repetitions or not, and stopping at the first one:
val newS = ss.scanLeft(Seq[Int]() -> false) {
case ((seen, stop), current) =>
if (stop || (seen contains current)) (seen, true)
else ((seen :+ current, false))
}
Now all that's left is take the last element without repetition, and drop the flag:
val noRepetitionsS = newS.takeWhile(!_._2).last._1
A variation on Rex's (though I prefer his...)
This one is functional throughout, using the little-seen scanLeft method.
val prevs = xs.scanLeft(Set.empty[Int])(_ + _)
(xs zip prevs) takeWhile { case (x,prev) => !prev(x) } map {_._1}
UPDATE
And a lazy version (using iterators, for moar efficiency):
val prevs = xs.iterator.scanLeft(Set.empty[Int])(_ + _)
(prevs zip xs.iterator) takeWhile { case (prev,x) => !prev(x) } map {_._2}
Turn the resulting iterator back to a sequence if you want, but this'll also work nicely with iterators on both the input AND the output.
The problem is simpler than the std lib function I was looking for (takeWhileConditionOverListOfAllAlreadyTraversedItems):
scala> val s = Seq(3, 5, 4, 1, 5, 7, 1, 2)
scala> s.zip(s.distinct).takeWhile{case(a,b)=>a==b}.map(_._1)
res20: Seq[Int] = List(3, 5, 4, 1)