Initialize variables in one time with map containing nested IndexedSeq - scala

I search of method to resolve this initialization problem of tuple () with result of a map, like this :
//My current state of cities
val listOfCity = IndexedSeq(new City1(), new City2())
// Function which compute my new state
val (newCity,exchange) = listOfCity.map{ city => computeNewCity(city,listOfCity)}
The variable newCity contain the result ._1 of my tuple returned by computeNewCity() and the variable exchange contain the result ._2 of the same tuple.
The function computeNewCity() return a new version of my object city and an history of exchange, results of my exchange with other cities in listOfCity, it's a tuple of type (City, Exchange)
How can i make this with help of functionnal programming ?
Thanks !
Sr

The problem is listOfCity.map{ city => computeNewCity(city,listOfCity)} returns an IndexedSeq[(City, Exchange)] (one tuple for each city in listOfCity), and obviously you can't just assign it to a (City, Exchange) tuple. You could take first element or last:
val (firstCity,exchange) = listOfCity.map{ city => computeNewCity(city,listOfCity)}.first
val (lastCity,exchange) = listOfCity.map{ city => computeNewCity(city,listOfCity)}.last
or get a tuple of two sequences (cities and their corresponding exchanges)
val (cities,exchanges) = listOfCity.map{ city => computeNewCity(city,listOfCity)}.unzip

Is this what you’re trying to do?
scala> val Seq(a, b) = IndexedSeq(IndexedSeq(3.0,2.0), IndexedSeq(1.0))
a: IndexedSeq[Double] = Vector(3.0, 2.0)
b: IndexedSeq[Double] = Vector(1.0)

Related

Filter an Array using another Array

val value = Array["id","sd","cd"] -- List of columns
val cols_list = Array["cd","id","tm","no","in","ts","nm"] - -- List of columns
i want a list with columns not in cols_list.
code i tried as below :
val newcol = for (x <- cols_list if x.toString.toUpperCase() not in value )
it's throwing error as value not is not a member of String.
is there a way that we can achieve?
Please suggest.
simplest is filterNot method (Apart from diff) that returns all elements from a list for which your function returns false.
val value = Array("id","sd","cd") // List of columns
val cols_list = Array("cd","id","tm","no","in","ts","nm")
val finallist = cols_list.filterNot(value.contains(_)) //cols_list.par.filterNot also you can use
println(finallist.mkString(" "))
}
Result : tm no in ts nm
How it works...
filter creates a collection with those elements that do not satisfy the predicate p and discarding the rest. This is collection level and will work for all Collections API in scala
signature :
def filterNot(p: (A) => Boolean): Collection[A]
use .diff to get list of columns not in cols_list
val value = Array("id","sd","cd")
val cols_list = Array("cd","id","tm","no","in","ts","nm")
value.diff(cols_list)
//Array[String] = Array(sd)
//case insensitive
value.map(x => x.toUpperCase).diff(cols_list.map(x => x.toUpperCase)).map(x => x.toLowerCase)
//Array[String] = Array(sd)
UPDATE:
cols_list.diff(value)
//Array[String] = Array(tm, no, in, ts, nm)
cols_list.map(x => x.toUpperCase).diff(value.map(x => x.toUpperCase)).map(x => x.toLowerCase)
//Array[String] = Array(tm, no, in, ts, nm)
cols_list.map(x => x.toUpperCase).diff(value.map(x => x.toUpperCase)).map(x => "\"a." + x.toLowerCase + "\"").mkString(",")
//String = "a.tm","a.no","a.in","a.ts","a.nm"

How to use map / flatMap on a scala Map?

I have two sequences, i.e. prices: Seq[Price] and overrides: Seq[Override]. I need to do some magic on them yet only for a subset based on a shared id.
So I grouped them both into a Map each via groupBy:
I do the group by via:
val pricesById = prices.groupBy(_.someId) // Int => Seq[Cruise]
val overridesById = overrides.groupBy(_.someId) // // Int => Seq[Override]
I expected to be able to create my wanted sequence via flatMap:
val applyOverrides = (someId: Int, prices: Seq[Price]): Seq[Price] => {
val applicableOverrides = overridesById.getOrElse(someId, Seq())
magicMethod(prices, applicableOverrides) // returns Seq[Price]
}
val myPrices: Seq[Price] = pricesById.flatMap(applyOverrides)
I expected myPrices to contain just one big Seq[Price].
Yet I get a weird type mismatch within the flatMap method with NonInferedB I am unable to resolve.
In scala, maps are tuples, not a key-value pair.
The function for flatMap hence expects only one parameter, namely the tuple (key, value), and not two parameters key, value.
Since you can access first element of a tuple via _1, the second via _2 and so on, you can generate your desired function like so:
val pricesWithMagicApplied = pricesById.flatMap(tuple =>
applyOverrides(tuple._1, tuple._2)
Another approach is to use case matching:
val pricesWithMagicApplied: Seq[CruisePrice] = pricesById.flatMap {
case (someId, prices) => applyOverrides(someId, prices)
}.toSeq

Filter a list based on a parameter

I want to filter the employees based on name and return the id of each employee
case class Company(emp:List[Employee])
case class Employee(id:String,name:String)
val emp1=Employee("1","abc")
val emp2=Employee("2","def")
val cmpy= Company(List(emp1,emp2))
val a = cmpy.emp.find(_.name == "abc")
val b = a.map(_.id)
val c = cmpy.emp.find(_.name == "def")
val d = c.map(_.id)
println(b)
println(d)
I want to create a generic function that contains the filter logic and I can have different kind of list and filter parameter for those list
Ex employeeIdByName which takes the parameters
Updated
criteria for filter eg :_.name and id
list to filter eg:cmpy.emp value
for comparison eg :abc/def
Any better way to achieve the result
I have used map and find
If you really want a "generic" filter function, that can filter any list of elements, by any property of these elements, based on a closed set of "allowed" values, while mapping results to some other property - it would look something like this:
def filter[T, P, R](
list: List[T], // input list of elements with type T (in our case: Employee)
propertyGetter: T => P, // function extracting value for comparison, in our case a function from Employee to String
values: List[P], // "allowed" values for the result of propertyGetter
resultMapper: T => R // function extracting result from each item, in our case from Employee to String
): List[R] = {
list
// first we filter only items for which the result of
// applying "propertyGetter" is one of the "allowed" values:
.filter(item => values.contains(propertyGetter(item)))
// then we map remaining values to the result using the "resultMapper"
.map(resultMapper)
}
// for example, we can use it to filter by name and return id:
filter(
List(emp1, emp2),
(emp: Employee) => emp.name, // function that takes an Employee and returns its name
List("abc"),
(emp: Employee) => emp.id // function that takes an Employee and returns its id
)
// List(1)
However, this is a ton of noise around a very simple Scala operation: filtering and mapping a list; This specific usecase can be written as:
val goodNames = List("abc")
val input = List(emp1, emp2)
val result = input.filter(emp => goodNames.contains(emp.name)).map(_.id)
Or even:
val result = input.collect {
case Employee(id, name) if goodNames.contains(name) => id
}
Scala's built-in map, filter, collect functions are already "generic" in the sense that they can filter/map by any function that applies to the elements in the collection.
You can use Shapeless. If you have a employees: List[Employee], you can use
import shapeless._
import shapeless.record._
employees.map(LabelledGeneric[Employee].to(_).toMap)
To convert each Employee to a map from field key to field value. Then you can apply the filters on the map.

Extracting data from RDD in Scala/Spark

So I have a large dataset that is a sample of a stackoverflow userbase. One line from this dataset is as follows:
<row Id="42" Reputation="11849" CreationDate="2008-08-01T13:00:11.640" DisplayName="Coincoin" LastAccessDate="2014-01-18T20:32:32.443" WebsiteUrl="" Location="Montreal, Canada" AboutMe="A guy with the attention span of a dead goldfish who has been having a blast in the industry for more than 10 years.
Mostly specialized in game and graphics programming, from custom software 3D renderers to accelerated hardware pipeline programming." Views="648" UpVotes="337" DownVotes="40" Age="35" AccountId="33" />
I would like to extract the number from reputation, in this case it is "11849" and the number from age, in this example it is "35" I would like to have them as floats.
The file is located in a HDFS so it comes in the format RDD
val linesWithAge = lines.filter(line => line.contains("Age=")) //This is filtering data which doesnt have age
val repSplit = linesWithAge.flatMap(line => line.split("\"")) //Here I am trying to split the data where there is a "
so when I split it with quotation marks the reputation is in index 3 and age in index 23 but how do I assign these to a map or a variable so I can use them as floats.
Also I need it to do this for every line on the RDD.
EDIT:
val linesWithAge = lines.filter(line => line.contains("Age=")) //transformations from the original input data
val repSplit = linesWithAge.flatMap(line => line.split("\""))
val withIndex = repSplit.zipWithIndex
val indexKey = withIndex.map{case (k,v) => (v,k)}
val b = indexKey.lookup(3)
println(b)
So if added an index to the array and now I've successfully managed to assign it to a variable but I can only do it to one item in the RDD, does anyone know how I could do it to all items?
What we want to do is to transform each element in the original dataset (represented as an RDD) into a tuple containing (Reputation, Age) as numeric values.
One possible approach is to transform each element of the RDD using String operations in order to extract the values of the elements "Age" and "Reputation", like this:
// define a function to extract the value of an element, given the name
def findElement(src: Array[String], name:String):Option[String] = {
for {
entry <- src.find(_.startsWith(name))
value <- entry.split("\"").lift(1)
} yield value
}
We then use that function to extract the interesting values from every record:
val reputationByAge = lines.flatMap{line =>
val elements = line.split(" ")
for {
age <- findElement(elements, "Age")
rep <- findElement(elements, "Reputation")
} yield (rep.toInt, age.toInt)
}
Note how we don't need to filter on "Age" before doing this. If we process a record that does not have "Age" or "Reputation", findElement will return None. Henceforth the result of the for-comprehension will be None and the record will be flattened by the flatMap operation.
A better way to approach this problem is by realizing that we are dealing with structured XML data. Scala provides built-in support for XML, so we can do this:
import scala.xml.XML
import scala.xml.XML._
// help function to map Strings to Option where empty strings become None
def emptyStrToNone(str:String):Option[String] = if (str.isEmpty) None else Some(str)
val xmlReputationByAge = lines.flatMap{line =>
val record = XML.loadString(line)
for {
rep <- emptyStrToNone((record \ "#Reputation").text)
age <- emptyStrToNone((record \ "#Age").text)
} yield (rep.toInt, age.toInt)
}
This method relies on the structure of the XML record to extract the right attributes. As before, we use the combination of Option values and flatMap to remove records that do not contain all the information we require.
First, you need a function which extracts the value for a given key of your line (getValueForKeyAs[T]), then do:
val rdd = linesWithAge.map(line => (getValueForKeyAs[Float](line,"Age"), getValueForKeyAs[Float](line,"Reputation")))
This should give you an rdd of type RDD[(Float,Float)]
getValueForKeyAs could be implemented like this:
def getValueForKeyAs[A](line:String, key:String) : A = {
val res = line.split(key+"=")
if(res.size==1) throw new RuntimeException(s"no value for key $key")
val value = res(1).split("\"")(1)
return value.asInstanceOf[A]
}

How to find tuple with different value in a list using scala?

I have following list:
val list = List(("name1",20),("name2",20),("name1",30),("name2",30),
("name3",40),("name3",30),("name3",20))
I want following output:
List(("name3",40))
I tried following:
val distElements = list.map(_._2).distinct
list.groupBy(_._1).map{ case(k,v) =>
val h = v.map(_._2)
if(distElements.equals(h)) List.empty else distElements.diff(h)
}.flatten
But this is not I am looking for.
Can anybody give answer/hint me to get expected output.
I understand the question as looking for the element of the list whose _2 (number) occurs only once.
val list = List(("name1",20),("name2",20),("name1",30),("name2",30),
("name3",40),("name3",30),("name3",20))
First you group by the _2 element, which gives you a map whose keys are lists of all elements with the same _2:
val g = list.groupBy(_._2) // Map[Int, List[(String, Int)]]
Now you can filter those entries that consists only of one element:
val opt = g.collectFirst { // Option[(String, Int)]
case (_, single :: Nil) => single
}
Or (if you are expecting possibly more than one distinct value)
val col = g.collect { // Map[String, Int]
case (_, single :: Nil) => single
}
Seems to me that you're looking to match against both the value of the left hand and the right hand at the same time while also preserving the type of collection you're looking at, a List. I would use collect:
val out = myList.collect{
case item # ("name3", 40) => item
}
which combines a PartialFunction with filter and map like qualities. In this case, it filters out any value for which the PartialFunction is not defined while mapping the values which match. Here, I've only allowed for a singular match.