Filtering maps in iterator - scala

I have the following code:
val rows: Iterator[Map[String,String]] = CSVDictReader(file.getInputStream)
val parsedProducts = rows.map(x => Product(name = x.get("NAME"), id = x.get("ID")))
I would like to get rid of map entries whose value is an empty string. With a map alone I could use:
filter(_._2.trim.nonEmpty)
I cannot get my head around how to do this in a nice way without introducing some helper function that returns None when the value is an empty string.
Edit: In my example I have only name and id, but in the real code there are easily over ten columns of data. Also, I need None instead of an empty string value, so name=Option("") should be replaced with name=None.

You can filter Options as well.
Let's say your x.get("NAME") returns a Some("") or even Some(" ").
Then you may do something like this: x.get("NAME").filter(_.trim.nonEmpty)
Hope I understood your question correctly
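A minimal sketch of how this could scale to many columns without a per-field helper: drop the blank entries from each row once, so that a plain get already returns None for them. The shape of Product is an assumption (the question does not show its definition), and the in-memory rows stand in for the output of CSVDictReader.

```scala
// Assumed shape of Product; the question does not show the real definition.
case class Product(name: Option[String], id: Option[String])

// Stand-in for the rows produced by CSVDictReader.
val rows: Iterator[Map[String, String]] =
  Iterator(Map("NAME" -> " ", "ID" -> "42"), Map("NAME" -> "Widget"))

// Remove blank entries from each row first, then look the columns up as usual.
val parsedProducts: List[Product] = rows
  .map(_.filter { case (_, value) => value.trim.nonEmpty })
  .map(row => Product(name = row.get("NAME"), id = row.get("ID")))
  .toList
```

A blank value and a missing column both end up as None, which is what the edit to the question asks for.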

something like this?
val rows: Iterator[Map[String,String]] = CSVDictReader(file.getInputStream)
val parsedProducts = for {
row <- rows
name <- row.get("NAME")
id <- row.get("ID")
} yield Product(name, id)
Here, if row.get("NAME") or row.get("ID") returns None, the corresponding row is skipped entirely and nothing is yielded for it.
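Note that this approach drops rows rather than producing None fields. A small sketch of that behaviour, with guards added so that blank values are treated like missing ones; the sample rows and the name complete are made up for illustration, and a tuple is yielded instead of Product to keep the snippet self-contained:

```scala
// Made-up sample rows standing in for the CSV data.
val rows: Iterator[Map[String, String]] = Iterator(
  Map("NAME" -> "Widget", "ID" -> "7"),
  Map("NAME" -> " ", "ID" -> "8"), // blank name: row is dropped
  Map("NAME" -> "Gadget")          // missing id: row is dropped
)

// Only rows where both columns are present and non-blank survive.
val complete: List[(String, String)] = (for {
  row  <- rows
  name <- row.get("NAME") if name.trim.nonEmpty
  id   <- row.get("ID") if id.trim.nonEmpty
} yield (name, id)).toList
```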

I'm not sure if this is what you're looking for, but the following code snippet:
val rows: Iterator[Map[String,String]] = Iterator(Map("NAME" -> " ", "ID" -> "foo"), Map("NAME" -> " ", "ID" -> ""))
val fieldNames = List("NAME","ID","ANOTHER COLUMN")
val cleanedRows = rows map { row =>
fieldNames map { fieldName =>
Map ( fieldName -> row.get(fieldName).filter (_.trim.nonEmpty) )
}
}
while(cleanedRows.hasNext) {
println(cleanedRows.next)
}
Would print out:
List(Map(NAME -> None), Map(ID -> Some(foo)), Map(ANOTHER COLUMN -> None))
List(Map(NAME -> None), Map(ID -> None), Map(ANOTHER COLUMN -> None))
So at this point cleanedRows would have the entries you need to create your Product instances.
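A list of single-entry maps is a little awkward to read values back from. As a variation (not what the snippet above does), each row could be collapsed into one Map[String, Option[String]] by pairing the field names with their cleaned values and calling toMap. The sample data is illustrative only.

```scala
val fieldNames = List("NAME", "ID", "ANOTHER COLUMN")

// Illustrative sample rows.
val rows: Iterator[Map[String, String]] =
  Iterator(Map("NAME" -> " ", "ID" -> "foo"), Map("NAME" -> "bar", "ID" -> ""))

// One map per row: every expected field is present, blank or missing values become None.
val cleanedRows: List[Map[String, Option[String]]] = rows.map { row =>
  fieldNames.map(fieldName => fieldName -> row.get(fieldName).filter(_.trim.nonEmpty)).toMap
}.toList
```

With this shape, a field can be read with cleanedRows.head("NAME") when building each Product.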

Related

Scala map key consists of 2 comma-separated sets. How to extract the first key in a set?

I have a Scala map collection that looks something like this:
var collection = Map((A,B) -> 1)
The key is (A,B) and the value is 1.
My question: If I use collection.head._1, the result is (A,B) which is correct. But I want to extract A only, without B, as I need to compare A with some other variable. So the final result should be A stored in a different variable.
I tried collection.head._1(0), which results in the error:
Any does not take parameters
You can try:
val collection = Map(("A","B") -> 1)
collection.map{ case ((a, b),v) => a -> v}
You can use keySet to get all the keys as a Set[(String, String)] and then map it into the first element of each:
val coll: Map[(String, String), Int] =
Map(
("one", "elephant") -> 1,
("two", "elephants") -> 2,
("three", "elephants") -> 3
)
// Equivalent ways to write the same mapping:
// coll.keySet.map { case (x, _) => x }
// coll.keySet.map(tup => tup._1)
val myKeys = coll.keySet.map(_._1) // Set(one, two, three)
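If only the first component of the first key is needed, as in the question, a sketch along these lines should do. It assumes both key components are Strings; the names first, a and safeFirst are illustrative.

```scala
val collection: Map[(String, String), Int] = Map(("A", "B") -> 1)

// head is ((A, B), 1); _1 selects the key tuple, the second _1 its first component.
val first: String = collection.head._1._1

// The same with pattern matching, which avoids the chained accessors.
val ((a, _), _) = collection.head

// headOption avoids an exception when the map is empty.
val safeFirst: Option[String] = collection.headOption.map { case ((x, _), _) => x }
```

Tuple components are read with _1, _2 and so on; applying (0) to a tuple is not supported, which is why collection.head._1(0) fails.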

Scala create immutable nested map

I have a situation here.
I have two strings:
val keyMap = "androidApp,key1;iosApp,key2;xyz,key3"
val tentMap = "androidApp,tenant1; iosApp,tenant1; xyz,tenant2"
What I want is to create a nested immutable map like this:
tenant1 -> (androidApp -> key1, iosApp -> key2),
tenant2 -> (xyz -> key3)
So basically I want to group by tenant and create a map of keyMap entries.
Here is what I tried, but it uses a mutable map, which I do not want. Is there a way to create this using an immutable map?
case class TenantSetting() {
val requesterKeyMapping = new mutable.HashMap[String, String]()
}
val requesterKeyMapping = keyMap.split(";")
.map { keyValueList => keyValueList.split(',')
.filter(_.size==2)
.map(keyValuePair => (keyValuePair[0],keyValuePair[1]))
.toMap
}.flatten.toMap
val config = new mutable.HashMap[String, TenantSetting]
tentMap.split(";")
.map { keyValueList => keyValueList.split(',')
.filter(_.size==2)
.map { keyValuePair =>
val requester = keyValuePair[0]
val tenant = keyValuePair[1]
if (!config.contains(tenant)) config.put(tenant, new TenantSetting)
config.get(tenant).get.requesterKeyMapping.put(requester, requesterKeyMapping.get(requester).get)
}
}
The logic to break the strings into a map can be the same for both as it's the same syntax.
Your attempt for the first string was not quite right: the filter was applied to each string produced by the inner split rather than to the resulting array. That is also why you ended up using [] on keyValuePair, which was a String and not the Array[String] I think you were expecting (and in Scala, array elements are accessed with (), not []). You also needed a trim to cope with the spaces in the second string. You might also want to trim the key and value to avoid other whitespace issues.
Additionally, in this case the combination of map and filter can be written more succinctly with collect, as shown here:
How to convert an Array to a Tuple?
The pattern with two elements ensures that anything with a length other than 2 is filtered out, as you wanted.
The iterator makes the combination of map and collect more efficient by requiring only one pass over the collection returned from the first split.
With both strings turned into maps, it just needs groupBy to group the first map by the value the second map holds for the same key. Obviously this only works if every key of the first map is also present in the second.
def toMap(str: String): Map[String, String] =
str
.split(";")
.iterator
.map(_.trim.split(','))
.collect { case Array(key, value) => (key.trim, value.trim) }
.toMap
val keyMap = toMap("androidApp,key1;iosApp,key2;xyz,key3")
val tentMap = toMap("androidApp,tenant1; iosApp,tenant1; xyz,tenant2")
val finalMap = keyMap.groupBy { case (k, _) => tentMap(k) }
Printing out finalMap gives:
Map(tenant2 -> Map(xyz -> key3), tenant1 -> Map(androidApp -> key1, iosApp -> key2))
Which is what you wanted.
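If some keys might be missing from the tenant map, tentMap(k) would throw a NoSuchElementException. One possible way to guard against that, sketched here with an extra made-up entry named orphan, is to keep only the entries that have a tenant before grouping:

```scala
// "orphan" is a made-up entry with no tenant, added to show the guard.
val keyMap = Map("androidApp" -> "key1", "iosApp" -> "key2", "xyz" -> "key3", "orphan" -> "key4")
val tentMap = Map("androidApp" -> "tenant1", "iosApp" -> "tenant1", "xyz" -> "tenant2")

// Drop entries without a tenant first, so the lookup inside groupBy cannot fail.
val finalMap: Map[String, Map[String, String]] =
  keyMap
    .filter { case (k, _) => tentMap.contains(k) }
    .groupBy { case (k, _) => tentMap(k) }
```

Whether silently dropping such entries is acceptable depends on the data; failing fast, as the original version does, may be preferable.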

Using tuple as a key in scala

Question 1: Can I use a tuple as the key of a map in Scala?
Question 2: If yes, how can I create a map with a tuple as the key?
Question 3: I want to convert my Scala map to an RDD. How would I do that in the following case? I am trying it this way:
var mapRDD = sc.parallelize(map.toList)
Is this the right way to do it?
Question 4: For this particular code snippet, when I do a println on map, it has no values.
I have not included the whole code. Basically, mapAgainstValue contains the userId as key and the list of friends as value. I want to recreate a map RDD with the following transformation applied to the key.
What would be the reason for the empty map?
var mapAgainstValue = logData.map(x=>x.split("\t")).filter(x => x.length == 2).map(x => (x(0),x(1).split(",")))
var map:Map[String,List[String]] = Map()
var changedMap = mapAgainstValue.map{
line =>
var key ="";
for(userIds <- line._2){
if(line._1.toInt < userIds.toInt){
key =line._1.concat("-"+userIds);
}
else {
key = userIds.concat("-" + line._1);
}
map += (key -> line._2.toList)
}
}
changedMap.collect()
map.foreach(println)
Yes, you can use Tuple as a key in Map.
For example:
val userMap = Map(
(1, 25) -> "shankar",
(2, 35) -> "ramesh")
Then you can convert it to an RDD and print each element using foreach:
val userMapRDD = sparkContext.parallelize(userMap.toSeq, 2)
userMapRDD.foreach(element => {
println(element)
})
If you want to transform userMapRDD into something else, the following code returns only the age and name as a tuple:
val mappedRDD = userMapRDD.map {
case ((empId: Int, age: Int), name: String) => {
(age, name)
}
}
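The answer above does not cover Question 4. A likely reason for the empty map (an addition here, not something the original answer states) is that Spark runs the closure passed to map on executors against serialized copies of the driver's variables, so the driver-side var is never updated. Below is a sketch of the same key transformation without shared mutable state. It uses plain Scala collections as a stand-in for the RDD and a tuple key instead of the concatenated string; friendsByUser, pairs and byPair are hypothetical names.

```scala
// Stand-in for mapAgainstValue: (userId, friend ids).
val friendsByUser: List[(String, Array[String])] =
  List("1" -> Array("2", "3"), "4" -> Array("2"))

// One ((smallerId, largerId), friends) pair per friendship, returned from
// flatMap instead of being added to an outer var.
val pairs: List[((Int, Int), List[String])] = friendsByUser.flatMap {
  case (user, friends) =>
    friends.toList.map { friend =>
      val (a, b) = (user.toInt, friend.toInt)
      ((math.min(a, b), math.max(a, b)), friends.toList)
    }
}

// A map keyed by the tuple.
val byPair: Map[(Int, Int), List[String]] = pairs.toMap
```

The same idea should carry over to an RDD: return the pairs from the transformation and collect them, rather than mutating a variable defined outside it.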

Creating a Map by reading elements of List in Scala

I have some records in a List.
Now I want to create a new Map (mutable Map) from that List with a unique key for each record. I want to achieve this by reading the List and calling the higher-order method map in Scala.
records.txt is my input file
100,Surender,2015-01-27
100,Surender,2015-01-30
101,Raja,2015-02-19
Expected Output :
Map(0-> 100,Surender,2015-01-27, 1 -> 100,Surender,2015-01-30,2 ->101,Raja,2015-02-19)
Scala Code :
object SampleObject{
def main(args:Array[String]) ={
val mutableMap = scala.collection.mutable.Map[Int,String]()
var i:Int =0
val myList=Source.fromFile("D:\\Scala_inputfiles\\records.txt").getLines().toList;
println(myList)
val resultList= myList.map { x =>
{
mutableMap(i) =x.toString()
i=i+1
}
}
println(mutableMap)
}
}
But I am getting output like below
Map(1 -> 101,Raja,2015-02-19)
I want to understand why it keeps only the last record.
Could someone help me?
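val mm: Map[Int, String] = Source.fromFile(filename).getLines.toList
.zipWithIndex
.map({ case (line, i) => i -> line })(collection.breakOut)
Here (collection.breakOut) builds the Map directly and avoids the extra pass that toMap would make. It needs a strict collection such as List (the map method on an Iterator takes no builder), and it is only available up to Scala 2.12; it was removed in 2.13.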
Consider
(for {
(line, i) <- Source.fromFile(filename).getLines.zipWithIndex
} yield i -> line).toMap
where we read each line, associate an index value starting from zero and create a map out of each association.
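collection.breakOut is not available from Scala 2.13 onward. A minimal sketch that works without it, using an in-memory list as a stand-in for the lines of records.txt (lines, indexed and mutableCopy are illustrative names):

```scala
import scala.collection.mutable

// Stand-in for Source.fromFile(...).getLines().toList
val lines = List(
  "100,Surender,2015-01-27",
  "100,Surender,2015-01-30",
  "101,Raja,2015-02-19"
)

// Pair every line with its index, make the index the key, and build the map.
val indexed: Map[Int, String] =
  lines.iterator.zipWithIndex.map { case (line, i) => i -> line }.toMap

// If a mutable map is really required, copy the entries into one.
val mutableCopy: mutable.Map[Int, String] = mutable.Map.empty[Int, String] ++= indexed
```

Going through iterator means zipWithIndex and map do not each build an intermediate list.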

How to get a List of Maps from a List of Objects in Scala

I need some help with Scala. I really have trouble understanding how to deal with collections. What I have to do is traverse a List like this
List( MyObject(id, name, status), ..., MyObject(id, name, status) )
and getting another List like this one
List( Map("key", id1), Map("key", id2), ..., Map("key", idN) )
Notice that the 'key' element has to be the same in all the maps.
Thanks
You can use the map function to transform a list of MyObject into a list of Map:
val list = List( MyObject(id, name, status), ..., MyObject(id, name, status) )
val result = list map {o => Map("key" -> o.id)}
Scala School from Twitter is good reading for beginners, and if you want to know the architecture of the Scala collections framework in detail, please refer to the Scala documentation.
I think this should do it.
list map { x => Map( "key" -> x.id ) }
An example
scala> case class Tst (fieldOne : String, fieldTwo : String)
defined class Tst
scala> val list = List(Tst("x", "y"), Tst("z", "a"))
list: List[Tst] = List(Tst(x,y), Tst(z,a))
scala> list map { x => Map( "key" -> x.fieldOne ) }
res6: List[scala.collection.immutable.Map[String,String]] = List(Map(key -> x), Map(key -> z))