Suppose I have a list of tuples
('a', 1), ('b', 2)...
How would one go about converting it to a String in the format
a 1
b 2
I tried using collection.map(_.mkString('\t')). However, I'm getting an error, since I'm essentially applying the operation to a tuple instead of a list. Using flatMap didn't help either.
For Tuple2 you can use:
val list = List(("1", 4), ("dfg", 67))
list.map { case (str, int) => s"$str $int"}
For tuples of any arity, try this code:
val list = List[Product](("dfsgd", 234), ("345345", 345, 456456))
list.map { tuple =>
  tuple.productIterator.mkString("\t")
}
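Both answers above return a List[String], one entry per tuple, while the question asks for a single String. A minimal sketch of the last step, assuming the lines should be joined with newlines and the fields with tabs (the names pairs, mixed, tabSeparated and viaIterator are made up for this example):

```scala
// Fixed-arity case: destructure each pair and join the fields with a tab.
val pairs: List[(String, Int)] = List(("a", 1), ("b", 2))
val tabSeparated: String =
  pairs.map { case (key, value) => s"$key\t$value" }.mkString("\n")

// Mixed-arity case: every tuple is a Product, so productIterator
// walks its fields regardless of how many there are.
val mixed: List[Product] = List(("x", 1), ("y", 2, 3))
val viaIterator: String =
  mixed.map(_.productIterator.mkString("\t")).mkString("\n")
```

The first form keeps the element types, the second trades them for flexibility since productIterator yields Any.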
How do I combine some elements in a list when they have the same property?
E.g. let say I have the following:
case class Foo(year: Int, amount: Int)
val list = List(Foo(2015, 10), Foo(2015, 15), Foo(2019, 55))
How do I transform list into List(Foo(2015, 25), Foo(2019, 55)) the Scala way?
As you can see, both Foo(2015, 10) and Foo(2015, 15) are merged into Foo(2015, 25).
There is a similar question, "Combining elements in the same list", but that one is for C#/LINQ.
If you're on Scala 2.13+, consider using groupMapReduce:
list.groupMapReduce(_.year)(_.amount)(_ + _).
map{ case (y, a) => Foo(y, a) }
// res1: scala.collection.immutable.Iterable[Foo] = List(Foo(2019,55), Foo(2015,25))
Use groupBy to arrange list by year, then map over the results to get it in the proper shape and sum the amount of each Foo.
scala> list.groupBy(foo => foo.year).map(m => Foo(m._1, m._2.map(foo => foo.amount).sum))
res5: scala.collection.immutable.Iterable[Foo] = List(Foo(2015,25), Foo(2019,55))
Just a refactoring of Brian's answer. I use pattern matching to properly name the values.
I think it makes the code easier to read.
list
  .groupBy { case Foo(year, _) => year }
  .map { case (year, foos) =>
    Foo(year,
      foos.map { case Foo(_, amount) => amount }.sum)
  }
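Note that the two result listings above come back in different orders (2019 first in one, 2015 first in the other). Both approaches go through a Map, so the order of the merged elements should not be relied on. If a stable order matters, a sketch like the following sorts by year before rebuilding the Foo values. It assumes Scala 2.13+ for groupMapReduce; foos and merged are names chosen for this example:

```scala
case class Foo(year: Int, amount: Int)

val foos = List(Foo(2015, 10), Foo(2015, 15), Foo(2019, 55))

// Group by year, keep only the amounts, sum them per year,
// then sort the (year, total) pairs so the output order is deterministic.
val merged: List[Foo] =
  foos
    .groupMapReduce(_.year)(_.amount)(_ + _)
    .toList
    .sortBy { case (year, _) => year }
    .map { case (year, total) => Foo(year, total) }
```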
I have an array of Some() values wrapping Map[String, String], such as
Array[Option[Any]] = Array(Some(Map(String, String)))
I want to return it as
Array(Map(String, String))
I've tried a few different ways of extracting it.
Let's say
val x = Array(Some(Map(String, String)))
val x1 = for (i <- 0 until x.length) yield { x.apply(i) }
but this returns IndexedSeq(Some(Map)), which is not what I want.
I tried pattern matching,
x.foreach { i =>
  i match {
    case Some(value) => value
    case _ => println("nothing")
  }
}
Another thing I tried, which was somewhat successful, was
x.apply(0).get.asInstanceOf[Map[String, String]]
This does roughly what I want, but it only gets the element at index 0 of the array, and I want all the maps in the array.
How can I extract Map type out of Some?
If you want an Array[Any] from your Array[Option[Any]], you can use this for expression:
for {
  opt <- x
  value <- opt
} yield value
This will put the values of all the non-empty Options inside a new array.
It is equivalent to this:
x.flatMap(_.toArray[Any])
Here, all options will be converted to an array of either 0 or 1 element. All these arrays will then be flattened back to one single array containing all the values.
Generally, the pattern is to use transformations on the Option[T], such as map, flatMap, filter, etc.
The problem is that we need a type cast to retrieve the underlying Map[String, String] from Any. So we use flatten to drop any None values and unwrap the Options, and asInstanceOf to recover the type:
scala> val y = Array(Some(Map("1" -> "1")), Some(Map("2" -> "2")), None)
y: Array[Option[scala.collection.immutable.Map[String,String]]] = Array(Some(Map(1 -> 1)), Some(Map(2 -> 2)), None)
scala> y.flatten.map(_.asInstanceOf[Map[String, String]])
res7: Array[Map[String,String]] = Array(Map(1 -> 1), Map(2 -> 2))
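When the array really is typed Array[Option[Any]], as in the question, collect can do the unwrapping and the filtering in one pass. This is only a sketch: x and maps are example names, and because of type erasure the pattern can only check that the value is a Map, not that its keys and values are Strings, so the cast is still an assumption about the data.

```scala
// Sample data typed the way the question describes it.
val x: Array[Option[Any]] =
  Array(Some(Map("a" -> "1")), None, Some(Map("b" -> "2")))

// Keep only the Some values that hold a Map; None and anything else is skipped.
// The String/String parameters cannot be verified at runtime (erasure).
val maps: Array[Map[String, String]] = x.collect {
  case Some(m: Map[_, _]) => m.asInstanceOf[Map[String, String]]
}
```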
Also, if you are dealing with just a single value, you can try Some("test").head, and for null simply Some(null).flatten
I have a sample List as below
List[(String, Object)]
How can I loop through this list using for?
I want to do something like
for(str <- strlist)
but for the 2d list above. What would be the placeholder for str?
Here it is,
scala> val fruits: List[(Int, String)] = List((1, "apple"), (2, "orange"))
fruits: List[(Int, String)] = List((1,apple), (2,orange))
scala> fruits.foreach {
     |   case (id, name) => {
     |     println(s"$id is $name")
     |   }
     | }
1 is apple
2 is orange
Note: The expected type requires a one-argument function accepting a 2-Tuple.
Consider a pattern matching anonymous function, { case (id, name) => ... }
Easy to copy code:
val fruits: List[(Int, String)] = List((1, "apple"), (2, "orange"))
fruits.foreach {
  case (id, name) => {
    println(s"$id is $name")
  }
}
With for you can destructure the elements of the tuple directly in the generator:
for ( (s,o) <- list ) yield f(s,o)
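To make that concrete, here is a small self-contained version. The list contents and the names items and described are invented for the example, and the string interpolation stands in for whatever f does in your code:

```scala
// A list of (String, Any) pairs, similar in shape to List[(String, Object)].
val items: List[(String, Any)] = List(("a", 1), ("b", "two"))

// The pattern (name, value) on the left of <- takes the place of a single placeholder.
val described: List[String] =
  for ((name, value) <- items) yield s"$name -> $value"
```

Dropping the yield turns this into a plain loop that runs its body for side effects only.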
I would suggest using map, filter, fold, or foreach (whichever suits your need) rather than iterating over the collection with a loop.
Edit 1:
For example, if you want to apply some function foo(tuple) to each element:
val newList = oldList.map(tuple => foo(tuple))
val tupleStrings = tupleList.map(tuple => tuple._1) // in your situation
If you want to filter according to some boolean condition:
val newList = oldList.filter(tuple => someCondition(tuple))
Or, if you simply want to print your list:
oldList.foreach(tuple => println(tuple)) // assuming tuple is printable
You can find examples and similar functions here: https://twitter.github.io/scala_school/collections.html
If you just want to get the strings you could map over your list of tuples like this:
// Just some example object
case class MyObj(i: Int = 0)
// Create a list of tuples like you have
val tuples = Seq(("a", new MyObj), ("b", new MyObj), ("c", new MyObj))
// Get the strings from the tuples
val strings = tuples.map(_._1)
// Output: Seq[String] = List(a, b, c)
Note: Tuple members are accessed using the underscore notation (which
is indexed from 1, not 0)
This post is essentially about how to build joint and marginal histograms from a (String, String) RDD. I posted the code that I eventually used below as the answer.
I have an RDD containing tuples of type (String, String). Since they aren't unique, I want to see how many times each (String, String) combination occurs, so I use countByValue like so:
val PairCount = Pairs.countByValue().toSeq
which gives me output of the form ((String, String), Long), where the Long is the number of times that the (String, String) tuple appeared.
These Strings can be repeated in different combinations, and I essentially want to run word count on this PairCount variable, so I tried something like this to start:
PairCount.map(x => (x._1._1, x._2))
But the output this produces is String1->1, String2->1, String3->1, etc.
How do I output a key-value pair from a map job in this case, where the key is one of the String values from the inner tuple and the value is the Long value from the outer tuple?
Update:
@vitalii gets me almost there. The answer gets me to a Seq[(String, Long)], but what I really need is to turn that into a map so that I can run reduceByKey on it afterwards. When I run
PairCount.flatMap { case ((x, y), n) => Seq(x -> n) }.toMap
for each unique x I get x->1.
For example, the above line of code generates mom->1 dad->1 even if the tuples out of the flatMap included (mom,30) (dad,59) (mom,2) (dad,14), in which case I would expect toMap to provide mom->30, dad->59, mom->2, dad->14. However, I'm new to Scala, so I might be misinterpreting the functionality.
How can I convert the Tuple2 sequence to a map so that I can reduce on the map keys?
If I understand the question correctly, you need flatMap:
val pairCountRDD = pairs.countByValue() // RDD[((String, String), Int)]
val res : RDD[(String, Int)] = pairCountRDD.flatMap { case ((s1, s2), n) =>
  Seq(s1 -> n, s2 -> n)
}
Update: I didn't quite understand what your final goal is, but here are a few more examples that may help you. By the way, the code above is incorrect: I missed the fact that countByValue returns a Map, not an RDD.
val pairs = sc.parallelize(
  List(
    "mom" -> "dad", "dad" -> "granny", "foo" -> "bar", "foo" -> "baz", "foo" -> "foo"
  )
)
// don't use countByValue: if pairs is large you will run out of memory
val pairCountRDD = pairs.map(x => (x, 1)).reduceByKey(_ + _)
val wordCount = pairs.flatMap { case (a, b) => Seq(a -> 1, b -> 1) }.reduceByKey(_ + _)
wordCount.take(10)
// count in how many pairs each word occurs, keys and values:
val wordPairCount = pairs.flatMap { case (a, b) =>
  if (a == b) {
    Seq(a -> 1)
  } else {
    Seq(a -> 1, b -> 1)
  }
}.reduceByKey(_ + _)
wordPairCount.take(10)
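On the toMap confusion from the question's update: a Map holds one value per key, so calling toMap on a sequence with repeated keys keeps only the last pair for each key instead of keeping or combining them. The sketch below shows this with plain Scala collections rather than Spark, so it runs without a cluster. The counts are the ones quoted in the question, the second string of each pair is invented, and groupMapReduce (Scala 2.13+) plays the role that reduceByKey plays on an RDD:

```scala
// Local stand-in for the countByValue output: ((String, String), Long).
val pairCounts: Seq[((String, String), Long)] = Seq(
  (("mom", "dad"), 30L),
  (("dad", "granny"), 59L),
  (("mom", "foo"), 2L),
  (("dad", "bar"), 14L)
)

// toMap cannot hold duplicate keys: the last pair seen for each key wins.
val lastWins: Map[String, Long] =
  pairCounts.map { case ((x, _), n) => x -> n }.toMap

// Summing per key instead: group on the first string, keep the count, add them up.
val summed: Map[String, Long] =
  pairCounts.groupMapReduce(_._1._1)(_._2)(_ + _)
```

Here lastWins ends up with mom -> 2 and dad -> 14, while summed has mom -> 32 and dad -> 73.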
To get the histograms for the (String, String) RDD, I used this code:
val Hist_X = histogram.map(x => (x._1-> 1.0)).reduceByKey(_+_).collect().toMap
val Hist_Y = histogram.map(x => (x._2-> 1.0)).reduceByKey(_+_).collect().toMap
val Hist_XY = histogram.map(x => (x-> 1.0)).reduceByKey(_+_)
where histogram is the (String, String) RDD.
I have the following two lists:
List(("ABC",1,10),("PQR",1,10))
List((1,"abc",3940903,0.0),(2,"pqr",1234,3.0))
I want the following output:
List(("ABC",1,10,1,"abc",3940903,0.0),("PQR",1,10,2,"pqr",1234,3.0))
I tried concat and :::, but they didn't work for me.
How do I get the above output using Scala?
You cannot concatenate tuples directly in Scala 2. There are two ways to achieve it:
Using shapeless
val A = List(("ABC", 1, 10), ("PQR", 1, 10))
val B = List((1, "abc", 3940903, 0.0), (2, "pqr", 1234, 3.0))
val zippedList = A zip B
import shapeless.syntax.std.tuple._
zippedList.map { case (a, b) => a ++ b }
//List((ABC,1,10,1,abc,3940903,0.0), (PQR,1,10,2,pqr,1234,3.0))
This method works on tuples of arbitrary size.
Using no external library
zippedList.map { case ((a,b,c), (p,q,r,s)) => (a,b,c,p,q,r,s) }
//List((ABC,1,10,1,abc,3940903,0.0), (PQR,1,10,2,pqr,1234,3.0))
The tuples must have a fixed, known arity for this to work.
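For reference, here is the library-free variant as one self-contained snippet, since the line above reuses zippedList from the shapeless example. The names left, right and merged are chosen for this sketch. Keep in mind that zip pairs elements by position and stops at the end of the shorter list, so it assumes both lists are aligned and of equal length:

```scala
val left = List(("ABC", 1, 10), ("PQR", 1, 10))
val right = List((1, "abc", 3940903, 0.0), (2, "pqr", 1234, 3.0))

// Pair the tuples by position, then rebuild each pair as one flat 7-tuple.
val merged: List[(String, Int, Int, Int, String, Int, Double)] =
  left.zip(right).map { case ((a, b, c), (p, q, r, s)) =>
    (a, b, c, p, q, r, s)
  }
```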