Scala - Getting sequence id of elements during map - scala

Scala newbie question.
I want to map a list to another list but I want to every object to know its sequence number.
In the following simple code, what is the right alternative to the usage of var v?
class T (s: String, val sequence:Int)
val stringList = List("a","b","C")
var v = 0
val tList = stringList.map(s => { v=v+1; new T(s, v);})

You can use zipWithIndex to get a tuple for each element containing the actual element and the index, then just map that tuple to your object:
List("a", "b", "C")
.zipWithIndex
.map(e => new T(e._1, e._2))

val tList = List.tabulate(stringList.length)(idx => new T(stringList(idx), idx))

Related

Creating a Map by reading elements of List in Scala

I have some records in a List .
Now I want to create a new Map(Mutable Map) from that List with unique key for each record. I want to achieve this my reading a List and calling the higher order method called map in scala.
records.txt is my input file
100,Surender,2015-01-27
100,Surender,2015-01-30
101,Raja,2015-02-19
Expected Output :
Map(0-> 100,Surender,2015-01-27, 1 -> 100,Surender,2015-01-30,2 ->101,Raja,2015-02-19)
Scala Code :
object SampleObject{
def main(args:Array[String]) ={
val mutableMap = scala.collection.mutable.Map[Int,String]()
var i:Int =0
val myList=Source.fromFile("D:\\Scala_inputfiles\\records.txt").getLines().toList;
println(myList)
val resultList= myList.map { x =>
{
mutableMap(i) =x.toString()
i=i+1
}
}
println(mutableMap)
}
}
But I am getting output like below
Map(1 -> 101,Raja,2015-02-19)
I want to understand why it is keeping the last record alone .
Could some one help me?
val mm: Map[Int, String] = Source.fromFile(filename).getLines
.zipWithIndex
.map({ case (line, i) => i -> line })(collection.breakOut)
Here the (collection.breakOut) is to avoid the extra parse caused by toMap.
Consider
(for {
(line, i) <- Source.fromFile(filename).getLines.zipWithIndex
} yield i -> line).toMap
where we read each line, associate an index value starting from zero and create a map out of each association.

acces tuple inside a tuple for anonymous map job in Spark

This post is essentially about how to build joint and marginal histograms from a (String, String) RDD. I posted the code that I eventually used below as the answer.
I have an RDD that contains a set of tuples of type (String,String) and since they aren't unique I want to get a look at how many times each String, String combination occurs so I use countByValue like so
val PairCount = Pairs.countByValue().toSeq
which gives me a tuple as output like this ((String,String),Long) where long is the number of times that the (String, String) tuple appeared
These Strings can be repeated in different combinations and I essentially want to run word count on this PairCount variable so I tried something like this to start:
PairCount.map(x => (x._1._1, x._2))
But the output the this spits out is String1->1, String2->1, String3->1, etc.
How do I output a key value pair from a map job in this case where the key is going to be one of the String values from the inner tuple, and the value is going to be the Long value from the outter tuple?
Update:
#vitalii gets me almost there. the answer gets me to a Seq[(String,Long)], but what I really need is to turn that into a map so that I can run reduceByKey it afterwards. when I run
PairCount.flatMap{case((x,y),n) => Seq[x->n]}.toMap
for each unique x I get x->1
for example the above line of code generates mom->1 dad->1 even if the tuples out of the flatMap included (mom,30) (dad,59) (mom,2) (dad,14) in which case I would expect toMap to provide mom->30, dad->59 mom->2 dad->14. However, I'm new to scala so I might be misinterpreting the functionality.
how can I get the Tuple2 sequence converted to a map so that I can reduce on the map keys?
If I correctly understand question, you need flatMap:
val pairCountRDD = pairs.countByValue() // RDD[((String, String), Int)]
val res : RDD[(String, Int)] = pairCountRDD.flatMap { case ((s1, s2), n) =>
Seq(s1 -> n, s2 -> n)
}
Update: I didn't quiet understand what your final goal is, but here's a few more examples that may help you, btw code above is incorrect, I have missed the fact that countByValue returns map, and not RDD:
val pairs = sc.parallelize(
List(
"mom"-> "dad", "dad" -> "granny", "foo" -> "bar", "foo" -> "baz", "foo" -> "foo"
)
)
// don't use countByValue, if pairs is large you will run out of memmory
val pairCountRDD = pairs.map(x => (x, 1)).reduceByKey(_ + _)
val wordCount = pairs.flatMap { case (a,b) => Seq(a -> 1, b ->1)}.reduceByKey(_ + _)
wordCount.take(10)
// count in how many pairs each word occur, keys and values:
val wordPairCount = pairs.flatMap { case (a,b) =>
if (a == b) {
Seq(a->1)
} else {
Seq(a -> 1, b ->1)
}
}.reduceByKey(_ + _)
wordPairCount.take(10)
to get the histograms for the (String,String) RDD I used this code.
val Hist_X = histogram.map(x => (x._1-> 1.0)).reduceByKey(_+_).collect().toMap
val Hist_Y = histogram.map(x => (x._2-> 1.0)).reduceByKey(_+_).collect().toMap
val Hist_XY = histogram.map(x => (x-> 1.0)).reduceByKey(_+_)
where histogram was the (String,String) RDD

How to find tuple with different value in a list using scala?

I have following list:
val list = List(("name1",20),("name2",20),("name1",30),("name2",30),
("name3",40),("name3",30),("name3",20))
I want following output:
List(("name3",40))
I tried following:
val distElements = list.map(_._2).distinct
list.groupBy(_._1).map{ case(k,v) =>
val h = v.map(_._2)
if(distElements.equals(h)) List.empty else distElements.diff(h)
}.flatten
But this is not I am looking for.
Can anybody give answer/hint me to get expected output.
I understand the question as looking for the element of the list whose _2 (number) occurs only once.
val list = List(("name1",20),("name2",20),("name1",30),("name2",30),
("name3",40),("name3",30),("name3",20))
First you group by the _2 element, which gives you a map whose keys are lists of all elements with the same _2:
val g = list.groupBy(_._2) // Map[Int, List[(String, Int)]]
Now you can filter those entries that consists only of one element:
val opt = g.collectFirst { // Option[(String, Int)]
case (_, single :: Nil) => single
}
Or (if you are expecting possibly more than one distinct value)
val col = g.collect { // Map[String, Int]
case (_, single :: Nil) => single
}
Seems to me that you're looking to match against both the value of the left hand and the right hand at the same time while also preserving the type of collection you're looking at, a List. I would use collect:
val out = myList.collect{
case item # ("name3", 40) => item
}
which combines a PartialFunction with filter and map like qualities. In this case, it filters out any value for which the PartialFunction is not defined while mapping the values which match. Here, I've only allowed for a singular match.

Create Map in Scala using loop

I am trying to create a map after getting result for each items in the list. Here is what I tried so far:
val sourceList: List[(Int, Int)] = ....
val resultMap: Map[Int, Int] = for(srcItem <- sourceList) {
val result: Int = someFunction(srcItem._1)
Map(srcItem._1 -> result)
}
But I am getting type mismatch error in IntelliJ and I am definitely not writing proper syntax here. I don't think I can use yield as I don't want List of Map. What is correct way to create Map using for loop. Any suggestion?
The simplest way is to create the map out of a list of tuples:
val resultMap = sourceList.map(item => (item._1, someFunction(item._1))).toMap
Or, in the monadic way:
val listOfTuples = for {
(value, _) <- sourceList
} yield (value, someFunction(value))
val resultMap = listOfTuples.toMap
Alternatively, if you want to avoid the creation of listOfTuples you can make the transformation a lazy one by calling .view on sourceList and then call toMap:
val resultMap = sourceList.view
.map(item => (item._1, someFunction(item._1)))
.toMap
Finally, if you really want to avoid generating extra objects you can use a mutable Map instead and append the keys and values to it using += or .put

Filter a list by item index?

val data = List("foo", "bar", "bash")
val selection = List(0, 2)
val selectedData = data.filter(datum => selection.contains(datum.MYINDEX))
// INVALID CODE HERE ^
// selectedData: List("foo", "bash")
Say I want to filter a List given a list of selected indices. If, in the filter method, I could reference the index of a list item then I could solve this as above, but datum.MYINDEX isn't valid in the above case.
How could I do this instead?
How about using zipWithIndex to keep a reference to the item's index, filtering as such, then mapping the index away?
data.zipWithIndex
.filter{ case (datum, index) => selection.contains(index) }
.map(_._1)
It's neater to do it the other way about (although potentially slow with Lists as indexing is slow (O(n)). Vectors would be better. On the other hand, the contains of the other solution for every item in data isn't exactly fast)
val data = List("foo", "bar", "bash")
//> data : List[String] = List(foo, bar, bash)
val selection = List(0, 2)
//> selection : List[Int] = List(0, 2)
selection.map(index=>data(index))
//> res0: List[String] = List(foo, bash)
First solution that came to my mind was to create a list of pairs (element, index), filter every element by checking if selection contains that index, then map resulting list in order to keep only raw elementd (omit index). Code is self explanatory:
data.zipWithIndex.filter(pair => selection.contains(pair._2)).map(_._1)
or more readable:
val elemsWithIndices = data.zipWithIndex
val filteredPairs = elemsWithIndices.filter(pair => selection.contains(pair._2))
val selectedElements = filteredPairs.map(_._1)
This Works :
val data = List("foo", "bar", "bash")
val selection = List(0, 2)
val selectedData = data.filter(datum => selection.contains(data.indexOf(datum)))
println (selectedData)
output :
List(foo, bash)
Since you have a list of indices already, the most efficient way is to pick those indices directly:
val data = List("foo", "bar", "bash")
val selection = List(0, 2)
val selectedData = selection.map(index => data(index))
or even:
val selectedData = selection.map(data)
or if you need to preserve the order of the items in data:
val selectedData = selection.sorted.map(data)
UPDATED
In the spirit of finding all the possible algorithms, here's the version using collect:
val selectedData = data
.zipWithIndex
.collect {
case (item, index) if selection.contains(index) => item
}
The following is the probably most scalable way to do it in terms of efficiency, and unlike many answers on SO, actually follows the official scala style guide exactly.
import scala.collection.immutable.HashSet
val selectionSet = new HashSet() ++ selection
data.zipWithIndex.collect {
case (datum, index) if selectionSet.contains(index) => datum
}
If the resulting collection is to be passed to additional map, flatMap, etc, suggest turning data into a lazy sequence. In fact perhaps you should do this anyway in order to avoid 2-passes, one for the zipWithIndex one for the collect, but I doubt when benchmarked one would gain much.
There is actually an easier way to filter by index using the map method. Here is an example
val indices = List(0, 2)
val data = List("a", "b", "c")
println(indices.map(data)) // will print List("a", "c")