Failed to print count result in for loop - Scala

I have been trying to count inside a for loop, but the result just comes out as a pair of parentheses. I am only printing out the key of the map here.
var count = 0
xs.foreach(x => (myMap += ((count+=1).toString+","+java.util.UUID.randomUUID.toString -> x)))
Output:
(),901e9926-be1e-4dc4-b3e3-6c3b2feea2c4
Expected output:
1,901e9926-be1e-4dc4-b3e3-6c3b2feea2c4

Within your foreach, count += 1 would be of type Unit. If I understand your question correctly, the example below (using an arbitrary xs collection) might be what you're looking for:
val xs = List("a", "b", "c", "d")
var count = 0
var myMap = Map[String, String]()
xs.foreach{ x =>
count += 1
myMap += ((count.toString + "," + java.util.UUID.randomUUID.toString) -> x)
}
myMap.keys
// res1: Iterable[String] = Set(
// 1,bd971c44-b9d0-41a0-b59f-3acbf2e0dee0, 2,5459eed9-309d-4f9c-afd7-10aced9df2a0,
// 3,5816ea42-d8ed-4beb-8b30-0376d0674700, 4,30f6f22f-1e6d-4eec-86af-5bc6734d5196
// )
In case you want a more idiomatic approach, using zip for the count and foldLeft for the Map aggregation would produce a similar result:
val myMap = Map[String, String]()
val resultMap = xs.zip(Stream from 1).foldLeft( myMap )(
(m, x) => m + ((x._2.toString + "," + java.util.UUID.randomUUID.toString) -> x._1)
)

What you are printing here is actually (count+=1).toString. In Scala, an assignment like this evaluates to Unit, whose only value is written (). That's why you print () and not the value of count. If you check the count variable afterwards, you will see that it has been incremented as expected.
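To see the Unit result in isolation, here is a minimal sketch; the object name UnitDemo and the helper run are made up for illustration and are not part of the original question:

```scala
object UnitDemo {
  // Returns how the assignment expression renders, plus the counter afterwards.
  def run(): (String, Int) = {
    var count = 0
    val result: Unit = (count += 1) // the assignment evaluates to Unit, not to the new value
    (result.toString, count)        // Unit's only value renders as "()"
  }

  def main(args: Array[String]): Unit = {
    val (rendered, countAfter) = run()
    println(rendered)   // prints ()
    println(countAfter) // prints 1
  }
}
```

The increment itself still happens; only the value of the expression is (), which is why incrementing on a separate line and then reading count gives the expected key.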
Additionally, what you are trying to do could be expressed in a better way, e.g., you could do:
val myMap = xs.zipWithIndex.map(x => (x._2 + 1) + "," + java.util.UUID.randomUUID -> x._1).toMap

Related

Scala - conditional product/join of two arrays with default values using for comprehensions

I have two Sequences, say:
val first = Array("B", "L", "T")
val second = Array("T70", "B25", "B80", "A50", "M100", "B50")
How do I get a product in which each element of the first array is joined with every element of the second array that starts with it, and which also yields a default result when no element in the second array meets the condition?
Effectively, I want this output:
expectedProductArray = Array("B-B25", "B-B80", "B-B50", "L-Default", "T-T70")
I tried doing,
val myProductArray: Array[String] = for {
f <- first
s <- second if s.startsWith(f)
} yield s"""$f-$s"""
and I get:
myProductArray = Array("B-B25", "B-B80", "B-B50", "T-T70")
Is there an idiomatic way of adding a default value for elements of the first sequence that have no matching element in the second sequence under the given criterion? I'd appreciate your thoughts.
Here's one approach: turn array second into a Map, then look up each element of array first in it with getOrElse:
val first = Array("B", "L", "T")
val second = Array("T70", "B25", "B80", "A50", "M100", "B50")
val m = second.groupBy(_(0).toString)
// m: scala.collection.immutable.Map[String,Array[String]] =
// Map(M -> Array(M100), A -> Array(A50), B -> Array(B25, B80, B50), T -> Array(T70))
first.flatMap(x => m.getOrElse(x, Array("Default")).map(x + "-" + _))
// res1: Array[String] = Array(B-B25, B-B80, B-B50, L-Default, T-T70)
In case you prefer using for-comprehension:
for {
x <- first
y <- m.getOrElse(x, Array("Default"))
} yield s"$x-$y"
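Note that the groupBy lookup keys on the first character only, so it matches the question's startsWith condition only while every prefix in first is a single character. If longer prefixes are possible, a sketch along these lines keeps the startsWith test itself; the object PrefixJoin, the method joinWithDefault and its default parameter are hypothetical names used for illustration:

```scala
object PrefixJoin {
  // For each prefix, keep every element of `second` that starts with it;
  // fall back to a single default entry when nothing matches.
  def joinWithDefault(first: Seq[String],
                      second: Seq[String],
                      default: String = "Default"): Seq[String] =
    first.flatMap { f =>
      val matches = second.filter(_.startsWith(f))
      val chosen  = if (matches.isEmpty) Seq(default) else matches
      chosen.map(s => s"$f-$s")
    }

  def main(args: Array[String]): Unit = {
    val first  = Seq("B", "L", "T")
    val second = Seq("T70", "B25", "B80", "A50", "M100", "B50")
    println(joinWithDefault(first, second))
    // prints List(B-B25, B-B80, B-B50, L-Default, T-T70)
  }
}
```

This scans second once per prefix, so it trades the Map lookup's speed for generality; for short arrays like the ones in the question the difference is unlikely to matter.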

How to set Map values in spark/scala

I am new to spark-scala development. I am trying to create map values in Spark using Scala, but nothing gets printed:
def createMap() : Map[String, Int] = {
var tMap:Map[String, Int] = Map()
val tDF = spark.sql("select a, b, c from temp")
for (x <- tDF) {
val k = x.getAs[Long](0) + "|" + x.getAs[Long](1)
val v = x.getAs[Int](2)
tMap += ( k -> v )
println( k -> v ) ///----------This print values
}
println("Hellllooooooooo1")
for ((k,v) <- tMap) println("key = " + k+ ", value= " + v) ////------This prints nothing
println("Hellllooooooooo2")
return tMap
}
Please suggest.
user8598832 shows how to do it properly (for some value of properly). Your approach doesn't work because you're adding (k, v) to the map in an executor, while the final println runs in the driver, which generally won't see the map(s) in the executor(s). To the extent that it might, that's just an artifact of running in local mode rather than in a distributed mode.
The "right" (if collecting to driver is ever right) way to do it:
import org.apache.spark.sql.functions._
import spark.implicits._ // supplies the encoder that .as[(String, Int)] needs
tDF.select(concat_ws("|", col("a"), col("b")), col("c")).as[(String, Int)].rdd.collectAsMap

Scala: Add a sequence number to duplicate elements in a list

I have a list and want to add a sequential number to duplicate elements.
val lst=List("a", "b", "c", "b", "c", "d", "b","a")
The result should be
List("a____0", "b____0", "c____0", "b____1", "c____1", "d____0", "b____2", "a____1")
preserving the original order.
What I have so far:
val lb=new ListBuffer[String]()
for(i<-0 to lst.length-2) {
val lbSplit=lb.map(a=>a.split("____")(0)).distinct.toList
if(!lbSplit.contains(lst(i))){
var count=0
lb+=lst(i)+"____"+count
for(j<-i+1 to lst.length-1){
if(lst(i).equalsIgnoreCase(lst(j))) {
count+=1
lb+= lst(i)+"____"+count
}
}
}
}
which results in :
res120: scala.collection.mutable.ListBuffer[String]
= ListBuffer(a____0, a____1, b____0, b____1, b____2, c____0, c____1, d____0)
messing up the order. Also if there is a more concise way that would be great.
This should work without any mutable variables.
val lst=List("a", "b", "c", "b", "c", "d", "b","a")
lst.foldLeft((Map[String,Int]().withDefaultValue(0),List[String]())){
case ((m, l), x) => (m + (x->(m(x)+1)), x + "__" + m(x) :: l)
}._2.reverse
// res0: List[String] = List(a__0, b__0, c__0, b__1, c__1, d__0, b__2, a__1)
Explanation:
lst.foldLeft - Take the List of items (in this case a List[String]) and fold them (starting on the left) into a single item.
(Map[String,Int]().withDefaultValue(0),List[String]()) - In this case the new item will be a tuple of type (Map[String,Int], List[String]). We'll start the tuple with an empty Map and an empty List.
case ((m, l), x) => - Every time an element from lst is passed in to the tuple calculation we'll call that element x. We'll also receive the tuple from the previous calculation. We'll call the Map part m and we'll call the List part l.
m + (x->(m(x)+1)) - The Map part of the new tuple is created by creating/updating the count for this String (the x) and adding it to the received Map.
x + "__" + m(x) :: l - The List part of the new tuple is created by pre-pending a new String at the head.
}._2.reverse - The fold is finished. Extract the List from the tuple (the 2nd element) and reverse it to restore the original order of elements.
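To watch the tuple evolve step by step, one option is to swap foldLeft for scanLeft, which keeps every intermediate accumulator instead of only the last one. This is only a tracing sketch on a shortened input; the object name FoldTrace and the three-element list are assumptions made for illustration:

```scala
object FoldTrace {
  val lst = List("a", "b", "a")

  // Same step function as the foldLeft answer, but scanLeft records each intermediate tuple.
  val steps: List[(Map[String, Int], List[String])] =
    lst.scanLeft((Map.empty[String, Int].withDefaultValue(0), List.empty[String])) {
      case ((m, l), x) => (m + (x -> (m(x) + 1)), (x + "__" + m(x)) :: l)
    }

  def main(args: Array[String]): Unit =
    steps.foreach { case (m, l) => println(s"$m  $l") }
}
```

The List part grows as List(), List(a__0), List(b__0, a__0), List(a__1, b__0, a__0): each new element is prepended, which is why the final result has to be reversed.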
I think a more concise way that preserves the order would just be to use a Map[String, Int] to keep a running count of how many times you've seen a particular string. Then you can map over lst directly and update the map each time you see a string:
var map = Map[String, Int]()
lst.map { str =>
val count = map.getOrElse(str, 0) //get current count if in the map, otherwise zero
map += (str -> (count + 1)) //update the count
str + "__" + count
}
which will give you the following for your example:
List(a__0, b__0, c__0, b__1, c__1, d__0, b__2, a__1)
I consider that easiest to read, but if you want to avoid var then you can use foldLeft with a tuple to hold the intermediate state of the map:
lst.foldLeft((List[String](), Map[String, Int]())) { case ((list, map), str) =>
val count = map.getOrElse(str, 0)
(list :+ (str + "__" + count), map + (str -> (count + 1)))
}._1

Applying operation to corresponding elements of Array

I want to sum the corresponding elements of the list and multiply the results while keeping the label associated with the array element so
("a",Array((0.5,1.0),(0.667,2.0)))
becomes :
(a , (0.5 + 0.667) * (1.0 + 2.0))
Here is my code to express this for a single array element :
val data = Array(("a",Array((0.5,1.0),(0.667,2.0))), ("b",Array((0.6,2.0), (0.6,2.0))))
//> data : Array[(String, Array[(Double, Double)])] = Array((a,Array((0.5,1.0),
//| (0.667,2.0))), (b,Array((0.6,2.0), (0.6,2.0))))
val v1 = (data(0)._1, data(0)._2.map(m => m._1).sum)
//> v1 : (String, Double) = (a,1.167)
val v2 = (data(0)._1, data(0)._2.map(m => m._2).sum)
//> v2 : (String, Double) = (a,3.0)
val total = (v1._1 , (v1._2 * v2._2)) //> total : (String, Double) = (a,3.5010000000000003)
I just want to apply this function to all elements of the array, so that val "data" above becomes:
Map[(String, Double)] = ((a,3.5010000000000003),(b,4.8))
But I'm not sure how to combine the above code into a single function that maps over all the array elements.
Update: the inner Array can be of variable length, so this is also valid:
val data = Array(("a",Array((0.5,1.0,2.0),(0.667,2.0,1.0))), ("b",Array((0.6,2.0), (0.6,2.0))))
Pattern matching is your friend! You can use it for tuples and arrays. If there are always two elements in the inner array, you can do it this way:
val data = Array(("a",Array((0.5,1.0),(0.667,2.0))), ("b",Array((0.6,2.0), (0.6,2.0))))
data.map {
case (s, Array((x1, x2), (x3, x4))) => s -> (x1 + x3) * (x2 + x4)
}
// Array[(String, Double)] = Array((a,3.5010000000000003), (b,4.8))
res6.toMap
// scala.collection.immutable.Map[String,Double] = Map(a -> 3.5010000000000003, b -> 4.8)
If the inner elements are variable length, you could do it this way (a for comprehension instead of explicit maps):
for {
(s, tuples) <- data
sum1 = tuples.map(_._1).sum
sum2 = tuples.map(_._2).sum
} yield s -> sum1 * sum2
Note that while this is a very clear solution, it's not the most efficient possible, because we're iterating over the tuples twice. You could use a fold instead, but it would be much harder to read (for me, anyway :).
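For comparison, a single-pass version with foldLeft might look like the sketch below. The object SinglePass and the method names sumProduct and sumProductAll are made up for illustration, and the sketch assumes the inner elements are pairs of Doubles:

```scala
object SinglePass {
  // Accumulate both column sums in one traversal, then multiply them.
  def sumProduct(tuples: Array[(Double, Double)]): Double = {
    val (sum1, sum2) = tuples.foldLeft((0.0, 0.0)) {
      case ((acc1, acc2), (x, y)) => (acc1 + x, acc2 + y)
    }
    sum1 * sum2
  }

  // Apply the helper to every labelled array.
  def sumProductAll(data: Array[(String, Array[(Double, Double)])]): Map[String, Double] =
    data.map { case (label, tuples) => label -> sumProduct(tuples) }.toMap

  def main(args: Array[String]): Unit = {
    val data = Array(("a", Array((0.5, 1.0), (0.667, 2.0))), ("b", Array((0.6, 2.0), (0.6, 2.0))))
    println(sumProductAll(data))
  }
}
```

Like .sum, this yields zero for an empty inner array, since the fold starts from (0.0, 0.0).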
Finally, note that .sum will produce zero on an empty collection. If that's not what you want, you could do this instead:
val emptyDefault = 1.0 // Or whatever, depends on your use case
for {
(s, tuples) <- data
sum1 = tuples.map(_._1).reduceLeftOption(_ + _).getOrElse(emptyDefault)
sum2 = tuples.map(_._2).reduceLeftOption(_ + _).getOrElse(emptyDefault)
} yield s -> sum1 * sum2
You can use the Algebird numeric library:
val data = Array(("a",Array((0.5,1.0),(0.667,2.0))), ("b",Array((0.6,2.0), (0.6,2.0))))
import com.twitter.algebird.Operators._
def sumAndProduct(a: Array[(Double, Double)]) = {
val sums = a.reduceLeft((m, n) => m + n)
sums._1 * sums._2
}
data.map{ case (x, y) => (x, sumAndProduct(y)) }
// Array((a,3.5010000000000003), (b,4.8))
It works fine for variable-size arrays as well:
val data = Array(("a",Array((0.5,1.0))), ("b",Array((0.6,2.0), (0.6,2.0))))
// Array((a,0.5), (b,4.8))
Like this? Does your array always have only 2 pairs?
val m = data map ({case (label,Array(a,b)) => (label, (a._1 + b._1) * (a._2 + b._2)) })
m.toMap

Strange results when using Scala collections

I have some tests with results that I can't quite explain.
The first test does a filter, map and reduce on a list containing 4 elements:
{
val counter = new AtomicInteger(0)
val l = List(1, 2, 3, 4)
val filtered = l.filter{ i =>
counter.incrementAndGet()
true
}
val mapped = filtered.map{ i =>
counter.incrementAndGet()
i*2
}
val reduced = mapped.reduce{ (a, b) =>
counter.incrementAndGet()
a+b
}
println("counted " + counter.get + " and result is " + reduced)
assert(20 == reduced)
assert(11 == counter.get)
}
The counter is incremented 11 times as I expected: once for each element during filtering, once for each element during mapping and three times to add up the 4 elements.
Using wildcards the result changes:
{
val counter = new AtomicInteger(0)
val l = List(1, 2, 3, 4)
val filtered = l.filter{
counter.incrementAndGet()
_ > 0
}
val mapped = filtered.map{
counter.incrementAndGet()
_*2
}
val reduced = mapped.reduce{ (a, b) =>
counter.incrementAndGet()
a+b
}
println("counted " + counter.get + " and result is " + reduced)
assert(20 == reduced)
assert(5 == counter.get)
}
I can't work out how to use wildcards in the reduce (the code doesn't compile), but now the counter is only incremented 5 times!
So, question #1: Why do wildcards change the number of times the counter is called and how does that even work?
Then my second, related question. My understanding of views was that they would lazily execute the functions passed to the monadic methods, but the following code doesn't show that.
{
val counter = new AtomicInteger(0)
val l = Seq(1, 2, 3, 4).view
val filtered = l.filter{
counter.incrementAndGet()
_ > 0
}
println("after filter: " + counter.get)
val mapped = filtered.map{
counter.incrementAndGet()
_*2
}
println("after map: " + counter.get)
val reduced = mapped.reduce{ (a, b) =>
counter.incrementAndGet()
a+b
}
println("after reduce: " + counter.get)
println("counted " + counter.get + " and result is " + reduced)
assert(20 == reduced)
assert(5 == counter.get)
}
The output is:
after filter: 1
after map: 2
after reduce: 5
counted 5 and result is 20
Question #2: How come the functions are being executed immediately?
I'm using Scala 2.10
You're probably thinking that
filter {
println
_ > 0
}
means
filter{ i =>
println
i > 0
}
but Scala has other ideas. The reason is that
{ println; _ > 0 }
is a block that first prints something and then returns the _ > 0 function. So Scala interprets what you're doing as a funny way to specify the function, equivalent to:
val p = { println; (i: Int) => i > 0 }
filter(p)
which in turn is equivalent to
println
val temp = (i: Int) => i > 0 // Temporary name, forget we did this!
val p = temp
filter(p)
which as you can imagine doesn't quite work out the way you want--you only print (or in your case do the increment) once at the beginning. Both your problems stem from this.
Make sure if you're using underscores to mean "fill in the parameter" that you only have a single expression! If you're using multiple statements, it's best to stick to explicitly named parameters.
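The difference can be checked directly with a counter. The sketch below uses made-up names (UnderscoreBlock and its three methods) and assumes the standard-library behaviour of List.filter and of views:

```scala
import java.util.concurrent.atomic.AtomicInteger

object UnderscoreBlock {
  // The block runs once while building the argument; only `_ > 0` becomes the predicate.
  def countWithUnderscore(xs: List[Int]): Int = {
    val counter = new AtomicInteger(0)
    xs.filter {
      counter.incrementAndGet()
      _ > 0
    }
    counter.get
  }

  // With a named parameter the whole body is the function, so it runs per element.
  def countWithNamedParam(xs: List[Int]): Int = {
    val counter = new AtomicInteger(0)
    xs.filter { i =>
      counter.incrementAndGet()
      i > 0
    }
    counter.get
  }

  // On a view, the named-parameter predicate is not run until the view is traversed.
  def countBeforeForcingView(xs: Seq[Int]): Int = {
    val counter = new AtomicInteger(0)
    val filtered = xs.view.filter { i =>
      counter.incrementAndGet()
      i > 0
    }
    counter.get // `filtered` has not been traversed at this point
  }

  def main(args: Array[String]): Unit = {
    println(countWithUnderscore(List(1, 2, 3, 4)))    // prints 1
    println(countWithNamedParam(List(1, 2, 3, 4)))    // prints 4
    println(countBeforeForcingView(Seq(1, 2, 3, 4)))  // prints 0
  }
}
```

The last method suggests that the views in the question were lazy all along: the counter moved right away only because the increment sat outside the function that was actually passed to filter and map.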