If I have a case class like below:
case class Student(name: String, activities: Seq[String], grade: String)
And I have a List like this:
val students = List(
  Student("John", List("soccer", "Video Games"), "9th"),
  Student("Jane", List("sword fighting", "debate"), "10th"),
  Student("Boy Wonder", List("1", "5", "2"), "5th")
)
How can I sort the contents based on name and activities attributes to form a string? In the scenario above the string would be:
boywonder_1_2_5_5th_jane_debate_swordfighting_10th_john_soccer_videogames_9th
The sorting in this case is done like this:
First, the elements are sorted by name, which is why boywonder comes first in the final string.
Then each element's activities are sorted as well, which is why Boy Wonder's activities appear as 1_2_5.
You need to:
Make everything lowercase.
Sort the inner list activities.
Sort the outer list students, by name.
Turn everything into a String.
Here is the code.
students
  .map { student =>
    student.copy(
      name = student.name.toLowerCase,
      // lowercase before sorting, otherwise "Video Games" sorts ahead of "soccer"
      activities = student.activities.map(activity => activity.toLowerCase).sorted
    )
  }.sortBy(student => student.name)
  // join name, activities and grade with the underscore separator from the question
  .map(student => ((student.name +: student.activities) :+ student.grade).mkString("_"))
  .mkString("_")
  .replaceAll("\\s", "")
// res: String = "boywonder_1_2_5_5th_jane_debate_swordfighting_10th_john_soccer_videogames_9th"
I am having some trouble with groupByKey in Scala and Spark.
I have 2 case classes :
case class Employee(id_employee: Long, name_emp: String, salary: String)
For the moment I use this 2nd case class:
case class Company(id_company: Long, employee:Seq[Employee])
However, I want to replace it with this new one:
case class Company(id_company: Long, name_comp: String, employee: Seq[Employee])
There is a parent Dataset (df1) that I use with groupByKey to create Company objects:
val companies = df1.groupByKey(v => v.id_company)
  .mapGroups {
    case (k, iter) =>
      Company(k, iter.map(x => Employee(x.id_employee, x.name_emp, x.salary)).toSeq)
  }
  .collect()
This code works; it returns objects like this one:
Company(1234,List(Employee(0987, John, 30000),Employee(4567, Bob, 50000)))
But I can't figure out how to add the company's name_comp to those objects (this field exists in df1), in order to get objects like this (using the new case class):
Company(1234, NYTimes, List(Employee(0987, John, 30000),Employee(4567, Bob, 50000)))
Since you want both the company id and name, what you can do is to use a tuple as the key when you group your data. This will make both values easily available when constructing the Company class:
df1.groupByKey(v => (v.id_company, v.name_comp))
  .mapGroups { case ((id, name), iter) =>
    Company(id, name, iter.map(x => Employee(x.id_employee, x.name_emp, x.salary)).toSeq)
  }
  .collect()
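The composite-key idea can also be tried out without a Spark session. The following is only a sketch on plain Scala collections, using groupBy in place of groupByKey/mapGroups; the FlatRow class and the sample rows are assumptions standing in for the rows of df1, which the question does not show.

```scala
// FlatRow is a hypothetical stand-in for one row of df1.
case class FlatRow(id_company: Long, name_comp: String,
                   id_employee: Long, name_emp: String, salary: String)
case class Employee(id_employee: Long, name_emp: String, salary: String)
case class Company(id_company: Long, name_comp: String, employee: Seq[Employee])

// Made-up sample data, only for illustration.
val rows = List(
  FlatRow(1234L, "NYTimes", 987L, "John", "30000"),
  FlatRow(1234L, "NYTimes", 4567L, "Bob", "50000"),
  FlatRow(5678L, "Acme", 1L, "Ann", "40000")
)

// Grouping on the (id, name) tuple makes both values available in the key.
val companies: List[Company] = rows
  .groupBy(r => (r.id_company, r.name_comp))
  .map { case ((id, name), group) =>
    Company(id, name, group.map(r => Employee(r.id_employee, r.name_emp, r.salary)))
  }
  .toList
  .sortBy(_.id_company) // groupBy returns a Map, so fix the order explicitly
```

This only behaves as intended if every id_company maps to a single name_comp; if the same id appeared with two different names, it would be split into two groups.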
I'm reading a .csv file that returns a list of String lists, recipiesList, in the following format:
List(List(Portuguese Green Soup, Portugal), List(Grilled Sardines, Portugal), List(Salted Cod with Cream, Portugal))
I have a class Recipe, which has been defined in the following manner:
case class Recipe(name: String, country: String)
Is there any immediate way that I can transform recipiesList into a list of type List[Recipe]? Such as with map or some sort of extractor?
You can transform elements of a List using the map method:
val input = List(List("Portuguese Green Soup", "Portugal"),
List("Grilled Sardines", "Portugal"),
List("Salted Cod with Cream", "Portugal"))
val output = input map { case List(name, country) => Recipe(name, country) }
The quick way would be:
recipiesList.map(s => Recipe(s(0), s(1)))
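Both versions above fail at runtime if a row does not have the expected shape: the pattern match throws a MatchError, and s(1) throws an IndexOutOfBoundsException on a one-element row. If the .csv might contain such rows, one possible variant is collect, which skips rows that do not match. The malformed row below is an assumption added for illustration.

```scala
case class Recipe(name: String, country: String)

val recipiesList: List[List[String]] = List(
  List("Portuguese Green Soup", "Portugal"),
  List("Grilled Sardines", "Portugal"),
  List("Malformed row") // hypothetical bad row with a single column
)

// collect applies the partial function only where the pattern matches,
// so rows that do not have exactly two fields are dropped.
val recipes: List[Recipe] = recipiesList.collect {
  case List(name, country) => Recipe(name, country)
}
```

Whether silently dropping bad rows is acceptable depends on the data; if it is not, the map-based versions that fail loudly may be the better choice.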
I have a list of Person objects with many fields and I can easily do:
list.map(person => person.getName)
This generates another collection with all the people's names.
How can you use the map function to create a new collection with all the fields of the Person class except the name?
In other words, how can you create a new collection out of a given collection which will contain all the elements of your initial collection with some of their fields removed?
You can use the unapply method of your case class to extract the members as a tuple, then remove the things you don't want from the tuple.
case class Person(name: String, age: Int, country: String)
// defined class Person
val personList = List(
  Person("person_1", 20, "country_1"),
  Person("person_2", 30, "country_2")
)
// personList: List[Person] = List(Person(person_1,20,country_1), Person(person_2,30,country_2))
val tupleList = personList.flatMap(person => Person.unapply(person))
// tupleList: List[(String, Int, String)] = List((person_1,20,country_1), (person_2,30,country_2))
val wantedTupleList = tupleList.map({ case (name, age, country) => (age, country) })
// wantedTupleList: List[(Int, String)] = List((20,country_1), (30,country_2))
// the above is easier to understand but traverses the list twice
// it is better to do it in a single pass, as follows
// (unapply returns an Option here, so map over it instead of matching on a bare tuple)
val yourList = personList.flatMap(person => {
  Person.unapply(person).map {
    case (name, age, country) => (age, country)
  }
})
// yourList: List[(Int, String)] = List((20,country_1), (30,country_2))
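As an alternative sketch, the same result can be obtained by pattern matching on the case class directly inside map, without calling unapply by hand. The name withoutNames is just an illustrative choice.

```scala
case class Person(name: String, age: Int, country: String)

val personList = List(
  Person("person_1", 20, "country_1"),
  Person("person_2", 30, "country_2")
)

// The underscore ignores the name; only age and country are kept.
val withoutNames: List[(Int, String)] =
  personList.map { case Person(_, age, country) => (age, country) }
```

Because the pattern always matches a Person, this is a single pass over the list with no intermediate Option.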
I am trying to figure out how to access particular elements from RDD myRDD with example entries below:
(600,List((600,111,7,1), (615,111,3,5)))
(601,List((622,112,2,1), (615,111,3,5), (456,111,9,12)))
I want to extract some data from a Redis DB using the 3rd field of each sub-list entry as the ID. For example, in the case of (600,List((600,111,7,1), (615,111,3,5))), the IDs are 7 and 3.
In the case of (601,List((622,112,2,1), (615,111,3,5), (456,111,9,12))), the IDs are 2, 3 and 9.
The problem is that I don't know how to collect values using multiple IDs. In the given code below, I use line._2(3), but it's not correct, because this way I access sublists instead of the fields inside these sublists.
Should I use flatMap or similar?
val newRDD = myRDD.mapPartitions(iter => {
  val redisPool = new Pool(new JedisPool(new JedisPoolConfig(), "localhost", 6379, 2000))
  iter.map({ line => (line._1,
    redisPool.withJedisClient { client =>
      val start_date: String = Dress.up(client).hget("id:"+line._2(3),"start_date")
      val end_date: String = Dress.up(client).hget("id:"+line._2(3),"end_date")
      val additionalData = List((start_date,end_date))
      Map(("base_data", line._2), ("additional_data", additionalData))
    })
  })
})
newRDD.collect().foreach(println)
If we assume that Redis DB contains some relevant data, then the result newRDD could be the following:
(600,Map("base_data" -> List((600,111,7,1), (615,111,3,5)), "additional_data" -> List((2014,2015),(2015,2016))))
(601,Map("base_data" -> List((622,112,2,1), (615,111,3,5), (456,111,9,12)), "additional_data" -> List((2010,2015),(2011,2016),(2014,2016))))
To get a list of the third elements of each tuple in line._2, use line._2.map(_._3) (assuming the type of line is (Int, List[(Int, Int, Int, Int)]), as it appears from your example, and types like Any aren't involved). Overall, it seems like your code should look like this:
iter.map({ case (first, second) => (first,
  redisPool.withJedisClient { client =>
    val additionalData = second.map { tuple =>
      val start_date: String = Dress.up(client).hget("id:"+tuple._3,"start_date")
      val end_date: String = Dress.up(client).hget("id:"+tuple._3,"end_date")
      (start_date, end_date)
    }
    Map(("base_data", second), ("additional_data", additionalData))
  })
})
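To see the extraction step in isolation, here is a small sketch on plain Scala values with no Spark or Redis involved. The lookup function is a hypothetical stand-in for the hget call and simply builds a string; it is not part of any real client API.

```scala
// Hypothetical stand-in for the Redis hash lookup, for illustration only.
def lookup(id: Int, field: String): String = s"$field-of-$id"

// One entry shaped like the RDD elements in the question.
val line: (Int, List[(Int, Int, Int, Int)]) =
  (600, List((600, 111, 7, 1), (615, 111, 3, 5)))

// Third field of every tuple in the sub-list.
val ids: List[Int] = line._2.map(_._3)

// One (start_date, end_date) pair per tuple, keyed by that third field.
val additionalData: List[(String, String)] =
  line._2.map(t => (lookup(t._3, "start_date"), lookup(t._3, "end_date")))
```

Here ids evaluates to List(7, 3), which matches the IDs named in the question for key 600.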
I have two different lists which contain different data.
Here is an example of the lists:
list1:[{"name":"name1","srno":"srno1"},{"name":"name2","srno":"srno2"}]
list2:[{"location":"location1","srno":"srno2"},{"location":"location2","srno":"srno1"}]
These two lists have a field in common, 'srno', which is of type string.
I want to match the lists on srno and merge them, so that the record for 'srno1' in list1 is combined with the record for 'srno1' in list2.
So the final list would look like this:
[{"name":"name1","srno":"srno1","location":"location2"},{"name":"name2","srno":"srno2","location":"location1"}]
How do I sort and merge these two lists to form a single list using scala?
Edit:
There will be a one-to-one correspondence, i.e. srno1 will be present exactly once in both lists.
Assuming you are converting your JSON to case classes, you can use a for comprehension to do this.
case class NameSrno(name: String, srno: String)
case class SrnoLoc(srno: String, location: String)
case class All(name: String, srno: String, location: String)
def merge(nsl: List[NameSrno], sll: List[SrnoLoc]): List[All] = {
  for {
    ns <- nsl
    sl <- sll
    if (ns.srno == sl.srno)
  } yield All(ns.name, ns.srno, sl.location)
}
Usage:
val nsl = List(NameSrno("item1", "1"), NameSrno("item2", "2"))
val sll = List(SrnoLoc("1", "London"), SrnoLoc("2", "Tokyo"))
merge(nsl, sll)
//> res0: List[test.SeqOps.All] = List(All(item1,1,London), All(item2,2,Tokyo))
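The for comprehension compares every element of one list with every element of the other. For larger lists, a possible variant is to index the second list by srno first and then look each key up. This is a sketch that relies on the one-to-one correspondence stated in the question; mergeByKey and locationBySrno are illustrative names, and the sample data mirrors the lists from the question.

```scala
case class NameSrno(name: String, srno: String)
case class SrnoLoc(srno: String, location: String)
case class All(name: String, srno: String, location: String)

def mergeByKey(nsl: List[NameSrno], sll: List[SrnoLoc]): List[All] = {
  // Build the srno -> location index once.
  val locationBySrno: Map[String, String] =
    sll.map(sl => sl.srno -> sl.location).toMap
  // Keep the order of the first list; entries with no match are dropped.
  nsl.flatMap { ns =>
    locationBySrno.get(ns.srno).map(loc => All(ns.name, ns.srno, loc))
  }
}

val merged = mergeByKey(
  List(NameSrno("name1", "srno1"), NameSrno("name2", "srno2")),
  List(SrnoLoc("srno2", "location1"), SrnoLoc("srno1", "location2"))
)
```

If an srno occurred more than once in the second list, toMap would keep only the last location for it, so this variant is not a drop-in replacement when the one-to-one assumption does not hold.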