Split key value in map in Scala

I don't know if it is possible, but inside my mapPartitions I'd like to split the variable a into two lists: a list l that stores all the numbers, and another list, say b, that stores all the words. Something like a.mapPartitions((p, v) => { val l = p.toList; val b = v.toList; ... }), so that in my for loop I would have, for example, l(i) = 1 and b(i) = "score".
import scala.io.Source
import org.apache.spark.rdd.RDD
import scala.collection.mutable.ListBuffer

val a = sc.parallelize(List(("score", 1), ("chicken", 2), ("magnacarta", 2)))
a.mapPartitions(p => {
  val l = p.toList
  val ret = new ListBuffer[Int]
  val words = new ListBuffer[String]
  for (i <- 0 to l.length - 1) {
    words += b(i) // b is not defined -- this is the list of words I am after
    ret += l(i)
  }
  ret.toList.iterator
})

Spark is a distributed computing engine: you perform operations on partitioned data across the nodes of the cluster, and you then need a reduce() step that combines the per-partition results into a summary. The following code should do what you want:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object SimpleApp {

  class MyResponseObj(var numbers: List[Int] = List[Int](), var words: List[String] = List[String]()) extends java.io.Serializable {
    def +=(str: String, int: Int) = {
      numbers = numbers :+ int
      words = words :+ str
      this
    }
    def +=(other: MyResponseObj) = {
      numbers = numbers ++ other.numbers
      words = words ++ other.words
      this
    }
  }

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val a = sc.parallelize(List(("score", 1), ("chicken", 2), ("magnacarta", 2)))
    val myResponseObj = a.mapPartitions[MyResponseObj] { it =>
      val myResponseObj = new MyResponseObj()
      it.foreach {
        case (str: String, int: Int) => myResponseObj += (str, int)
        case _                       => println("unexpected data")
      }
      Iterator(myResponseObj)
    }.reduce((myResponseObj1, myResponseObj2) => myResponseObj1 += myResponseObj2)
    println(myResponseObj.words)
    println(myResponseObj.numbers)
  }
}
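For this small example there is also a shorter route: unzip each partition's tuples into its two halves and concatenate the halves in reduce. A minimal sketch (the element order in the final lists depends on partitioning):

val (words, numbers) = a
  .mapPartitions(it => Iterator(it.toList.unzip))
  .reduce((x, y) => (x._1 ++ y._1, x._2 ++ y._2))
println(words)   // e.g. List(score, chicken, magnacarta)
println(numbers) // e.g. List(1, 2, 2)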

Related

Generate a list of case class instances with a non-repeating Int field

I want to generate a List of some class that contains several fields. One of them is of type Int, and its values must not repeat across the list. Could you help me write the code?
I tried the following:
import org.scalacheck.Gen
import org.scalacheck.Arbitrary.arbitrary

case class Person(name: String, age: Int)

implicit val genPerson: Gen[Person] =
  for {
    name <- arbitrary[String]
    age  <- Gen.posNum[Int]
  } yield Person(name, age)

implicit val genListOfPerson: Gen[scala.List[Person]] = Gen.listOfN(3, genPerson)
The problem is that I get persons with equal ages.
If you're requiring that no two Persons in the generated list have the same age, you can do something like this:
import org.scalacheck.Arbitrary

implicit def IntsArb: Arbitrary[Int] = Arbitrary(Gen.choose[Int](0, Int.MaxValue))
implicit val StringArb: Arbitrary[String] = Arbitrary(Gen.listOfN(5, Gen.alphaChar).map(_.mkString))
implicit val PersonGen = Arbitrary(Gen.resultOf(Person.apply _))
implicit val PersonsGen: Arbitrary[List[Person]] =
  Arbitrary(Gen.listOfN(3, PersonGen.arbitrary).map { persons =>
    val grouped: Map[Int, List[Person]] = persons.groupBy(_.age)
    grouped.values.map(_.head).toList // one Person per age; safe because of groupBy
  })
Note that this will return a List with no duplicate ages but there's no guarantee that the list will have size 3 (it is guaranteed that the list will be nonempty, with size at most 3).
If having a list of size 3 is important, at the risk of generation failing if the "dice are against you", you can have something like:
def uniqueAges(persons: List[Person], target: Int): Gen[List[Person]] = {
  val grouped: Map[Int, List[Person]] = persons.groupBy(_.age)
  val uniquelyAged = grouped.values.map(_.head).toList
  val n = uniquelyAged.size
  if (n == target) Gen.const(uniquelyAged)
  else {
    val existingAges = grouped.keySet
    val genPerson = PersonGen.arbitrary.retryUntil { p => !existingAges(p.age) }
    Gen.listOfN(target - n, genPerson) // listOfN: exactly target - n more are needed
      .flatMap(l => uniqueAges(l, target - n))
      .map(_ ++ uniquelyAged)
  }
}

implicit val PersonsGen: Arbitrary[List[Person]] =
  Arbitrary(Gen.listOfN(3, PersonGen.arbitrary).flatMap(l => uniqueAges(l, 3)))
Alternatively, you can make duplicate ages unlikely (though not strictly impossible) by drawing ages from the full positive Int range:

implicit def IntsArb: Arbitrary[Int] = Arbitrary(Gen.choose[Int](0, Int.MaxValue))
implicit val StringArb: Arbitrary[String] = Arbitrary(Gen.listOfN(5, Gen.alphaChar).map(_.mkString))
implicit val PersonGen = Arbitrary(Gen.resultOf(Person.apply _))
implicit val PersonsGen: Arbitrary[List[Person]] = Arbitrary(Gen.listOfN(3, PersonGen.arbitrary))
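Either way, a quick sanity check is to sample the generator (a sketch, assuming the ScalaCheck imports above; note that sample can occasionally return None):

arbitrary[List[Person]].sample.foreach { persons =>
  // true when all sampled ages are distinct
  println(persons.map(_.age).distinct.size == persons.size)
}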

Sum of Int elements in a List and a Vector using a single function in Scala

How to make this code work?
sealed abstract class Addable[A] {
  def sum(el: Seq[A]): A
}

class MyAddable[A]() extends Addable[A] {
  override def sum(el: Seq[A]): A = {
    el.sum
  }
}

val myvec = Vector(1, 2, 3)
val mylist = List(1, 2, 3)
val inst = new MyAddable
val res0 = inst.sum(mylist) // should return 6
val res1 = inst.sum(myvec)  // should return 6
println(s"res0 = $res0")
println(s"res1 = $res1")
I want to pass a generic collection (a Vector[Int] or a List[Int]) and get the sum of its elements using the signatures and code structure above. At the moment I am getting:

found   : immutable.this.List[scala.this.Int]
required: Seq[scala.this.Nothing]
The specific error is here:

val inst = new MyAddable

which should be

val inst = new MyAddable[Int]()

MyAddable is generic, but you are not specifying a type parameter, so the compiler assumes Nothing, hence the error message.
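As a minimal illustration of that inference (a sketch using a throwaway Box class, not from the original code):

class Box[A]() { def first(xs: Seq[A]): A = xs.head }
val bad = new Box           // A is inferred as Nothing
// bad.first(List(1, 2, 3)) // error: found List[Int], required Seq[Nothing]
val ok = new Box[Int]()     // type parameter supplied explicitly

Note that the corrected code below also adds a Numeric context bound (MyAddable[A: Numeric]), because el.sum itself needs evidence that the elements can be summed: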
sealed abstract class Addable[A] {
  def sum(el: Seq[A]): A
}

class MyAddable[A: Numeric]() extends Addable[A] {
  override def sum(el: Seq[A]): A = {
    el.sum
  }
}

val myvec = Vector(1, 2, 3)
val mylist = List(1, 2, 3)
val inst = new MyAddable[Int]()
val res0 = inst.sum(mylist)
val res1 = inst.sum(myvec)
println(s"res0 = $res0")
println(s"res1 = $res1")
import cats.Semigroup
import cats.implicits._

// A generic reduce function. The contravariant parameter lets the Seq
// instance be reused for derived types such as List and Vector.
trait Reduce[-F[_]] {
  def reduce[A](fa: F[A])(f: (A, A) => A): A
}

object Reduce {
  implicit val SeqReduce: Reduce[Seq] = new Reduce[Seq] {
    def reduce[A](data: Seq[A])(f: (A, A) => A): A = data reduce f
  }
  implicit val OptReduce: Reduce[Option] = new Reduce[Option] {
    def reduce[A](data: Option[A])(f: (A, A) => A): A = data reduce f
  }
}

// Generic sum function
def sum[A: Semigroup, F[_]](container: F[A])(implicit red: Reduce[F]): A =
  red.reduce(container)(Semigroup.combine(_, _))

val myvec = Vector(1, 2, 3)
val mylist = List(1, 2, 3)
val mymap = Map(1 -> "one", 2 -> "two", 3 -> "three")
val myopt = Some(1)

val res0 = sum(myvec)
val res1 = sum(mylist)
val res2 = sum(myopt)
println(s"res0 = $res0")
println(s"res1 = $res1")
println(s"res2 = $res2")
This gets a little more complicated for Maps.
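One simple workaround (a sketch, not from the original answer) is to reduce over the map's values, which form a plain collection; with cats' Semigroup[String] this concatenates the strings:

val res3 = sum(mymap.values.toList)
println(s"res3 = $res3") // onetwothree (value order follows insertion for small maps)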

Map Partition Iterator return

Can anyone help me make mapPartitions accept the Iterator returned by the listWords() method below?
import java.util
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object MapPartitionExample {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MapPartitionExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val input: RDD[String] = sc.parallelize(List("ABC", "DEF", "GHU", "YHG"))
    val x = input.mapPartitions(word => listWords(word)) // does not compile
  }

  def listWords(words: Iterator[String]): util.Iterator[String] = {
    val arrList = new util.ArrayList[String]()
    while (words.hasNext) {
      arrList.add(words.next())
    }
    arrList.iterator()
  }
}
The return type of the function used in mapPartitions should be scala.collection.Iterator, not java.util.Iterator. I don't see much point in your current code, but you can use Scala mutable collections:
import scala.collection.mutable.ArrayBuffer

def listWords(words: Iterator[String]): Iterator[String] = {
  val arr = ArrayBuffer[String]()
  while (words.hasNext) {
    arr += words.next()
  }
  arr.toIterator
}
Personally I'd just map:
def listWords(words: Iterator[String]): Iterator[String] = {
  // Some init code
  words.map(someFunction)
}
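For instance (a sketch; the lower-casing is just a placeholder for whatever someFunction should do):

def listWords(words: Iterator[String]): Iterator[String] =
  words.map(_.toLowerCase) // any per-word transformation goes here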
Here Iterable[NotInferU] is expected but you are returning a java.util.Iterator[String].
You would need to convert the java.util.Iterator to a Scala Iterator by importing scala.collection.JavaConversions._, as below:
def listWords(words: Iterator[String]): Iterator[String] = {
  val arrList = new util.ArrayList[String]()
  while (words.hasNext) {
    arrList.add(words.next())
  }
  import scala.collection.JavaConversions._
  arrList.toList.iterator // the import converts the ArrayList to a Scala collection
}
The rest of the code stays as it is. I hope the answer is helpful.
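As an aside, scala.collection.JavaConversions is deprecated since Scala 2.12. A minimal sketch of the same conversion with the explicit JavaConverters API instead (scala.jdk.CollectionConverters._ on 2.13+):

import scala.collection.JavaConverters._

def listWords(words: Iterator[String]): Iterator[String] = {
  val arrList = new java.util.ArrayList[String]()
  while (words.hasNext) {
    arrList.add(words.next())
  }
  arrList.iterator().asScala // wraps the Java iterator as a Scala one
}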

Add items to Future[List] inside recursion

I'm having an issue with a Future[List] inside a recursion.
When I implemented this method without Futures, I used a ListBuffer and added items to it:

val filtered = ListBuffer.empty[PostMD]
filtered ++= postMd.filter(_.fromID == userID)

Now I'm trying to implement it with Futures, but I can't find a similar solution. What is the best way to work with a Future[List]?
def getData(url: String, userID: String) = {
  val filtered: (List[PostMD]) => Future[List[PostMD]] = Future[List[PostMD]]
  def inner(url: String): Unit = {
    val chunk: Future[JsValue] = BusinessLogic.Methods.getJsonValue(url)
    val postMd: Future[List[PostMD]] = for {
      x <- chunk.map(_.\("data").as[List[JsValue]])
      y <- x.map(_.\("data").as[PostMD])
    } yield y
    filtered = postMd.map(_.filter(_.fromID == userID)) // <- returned Future[List[PostMD]]
    val next: String = (chunk.map(_.\("paging").\("next"))).toString
    if (next != null) inner(next)
  }
  inner(url)
  filtered
}
thanks,
miki
I tried to do what you want using random number generation:
import scala.concurrent.{Await, Future}
import scala.util.Random
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val RANDOM = new Random()

def futureRec(num: Int, f: Future[List[Int]]): Future[List[Int]] = {
  if (num == 0) {
    f
  } else {
    f.flatMap(l => futureRec(num - 1, Future.successful(RANDOM.nextInt() :: l)))
  }
}

val futureResult = futureRec(5, Future.successful(Nil))
Await.result(futureResult, 5.minutes)
So I would do what you want something like this:

def getData(url: String, userID: String): Future[List[PostMD]] = {
  def inner(url: String, f: Future[List[PostMD]]): Future[List[PostMD]] = {
    val chunk: Future[JsValue] = ???
    chunk.flatMap { ch =>
      val postMd = (ch \ "data").\\("data").map(_.as[PostMD]).toList
      val relatedPostMd = postMd.filter(_.fromID == userID)
      val next: String = (ch \ "paging" \ "next").as[String]
      if (next != null)
        inner(next, f.map(l => l ++ relatedPostMd))
      else
        f.map(l => l ++ relatedPostMd)
    }
  }
  inner(url, Future.successful(Nil))
}
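As an aside (a sketch, not part of the answer above): with play-json, (ch \ "paging" \ "next").as[String] throws when the path is missing rather than returning null, so the null check never fires; reading the link with asOpt is safer:

(ch \ "paging" \ "next").asOpt[String] match {
  case Some(next) => inner(next, f.map(_ ++ relatedPostMd))
  case None       => f.map(_ ++ relatedPostMd)
}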

Extending collection classes with extra fields in Scala

I'm looking to create a class that is basically a collection with an extra field. However, I keep running into problems and am wondering what the best way of implementing this is. I've tried to follow the pattern given in the Scala book. E.g.
import scala.collection.IndexedSeqLike
import scala.collection.mutable.Builder
import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.ArrayBuffer

class FieldSequence[FT, ST](val field: FT, seq: IndexedSeq[ST] = Vector())
    extends IndexedSeq[ST] with IndexedSeqLike[ST, FieldSequence[FT, ST]] {
  def apply(index: Int): ST = seq(index)
  def length = seq.length
  override def newBuilder: Builder[ST, FieldSequence[FT, ST]] =
    FieldSequence.newBuilder[FT, ST](field)
}

object FieldSequence {
  def fromSeq[FT, ST](field: FT)(buf: IndexedSeq[ST]) =
    new FieldSequence(field, buf)

  def newBuilder[FT, ST](field: FT): Builder[ST, FieldSequence[FT, ST]] =
    new ArrayBuffer mapResult fromSeq(field)

  implicit def canBuildFrom[FT, ST]:
      CanBuildFrom[FieldSequence[FT, ST], ST, FieldSequence[FT, ST]] =
    new CanBuildFrom[FieldSequence[FT, ST], ST, FieldSequence[FT, ST]] {
      def apply(): Builder[ST, FieldSequence[FT, ST]] =
        newBuilder[FT, ST]( _ ) // What goes here?
      def apply(from: FieldSequence[FT, ST]): Builder[ST, FieldSequence[FT, ST]] =
        from.newBuilder
    }
}
The problem is the CanBuildFrom that is implicitly defined needs an apply method with no arguments. But in these circumstances this method is meaningless, as a field (of type FT) is needed to construct a FieldSequence. In fact, it should be impossible to construct a FieldSequence, simply from a sequence of type ST. Is the best I can do to throw an exception here?
Then your class doesn't fulfill the requirements to be a Seq, and methods like flatMap (and hence for-comprehensions) can't work for it.
I'm not sure I agree with Landei about flatMap and map. If you replace with throwing an exception like this, most of the operations should work.
def apply(): Builder[ST,FieldSequence[FT,ST]] = sys.error("unsupported")
From what I can see in TraversableLike, map, flatMap and most other operations use the apply(repr) version, so for-comprehensions seemingly work. It also feels like it should follow the monad laws (the field is just carried across).
Given the code you have, you can do this:
scala> val fs = FieldSequence.fromSeq("str")(Vector(1,2))
fs: FieldSequence[java.lang.String,Int] = FieldSequence(1, 2)
scala> fs.map(1 + _)
res3: FieldSequence[java.lang.String,Int] = FieldSequence(2, 3)
scala> val fs2 = FieldSequence.fromSeq("str1")(Vector(10,20))
fs2: FieldSequence[java.lang.String,Int] = FieldSequence(10, 20)
scala> for (x <- fs if x > 0; y <- fs2) yield (x + y)
res5: FieldSequence[java.lang.String,Int] = FieldSequence(11, 21, 12, 22)
What doesn't work is the following:
scala> fs.map(_ + "!")
// does not return a FieldSequence
scala> List(1,2).map(1 + _)(collection.breakOut): FieldSequence[String, Int]
java.lang.RuntimeException: unsupported
// this is where the apply() is used
For breakOut to work you would need to implement the apply() method. I suspect you could generate a builder with some default value for field: def apply() = newBuilder[FT, ST](getDefault) with some implementation of getDefault that makes sense for your use case.
For the fact that fs.map(_ + "!") does not preserve the type, you need to modify your signature and implementation, so that the compiler can find a CanBuildFrom[FieldSequence[String, Int], String, FieldSequence[String, String]]
implicit def canBuildFrom[FT, ST_FROM, ST]:
    CanBuildFrom[FieldSequence[FT, ST_FROM], ST, FieldSequence[FT, ST]] =
  new CanBuildFrom[FieldSequence[FT, ST_FROM], ST, FieldSequence[FT, ST]] {
    def apply(): Builder[ST, FieldSequence[FT, ST]] =
      sys.error("unsupported")
    def apply(from: FieldSequence[FT, ST_FROM]): Builder[ST, FieldSequence[FT, ST]] =
      newBuilder[FT, ST](from.field)
  }
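With this widened instance in scope, an element-type-changing map keeps the wrapper and carries the field across; the expected session (a sketch):

scala> val fs = FieldSequence.fromSeq("str")(Vector(1,2))
fs: FieldSequence[String,Int] = FieldSequence(1, 2)

scala> fs.map(_ + "!") // the field comes from from.field, so fs.map(...).field == "str"
res0: FieldSequence[String,String] = FieldSequence(1!, 2!)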
In the end, my answer was very similar to that of a previous question. The differences from that question and from my original attempt are slight, but the result basically allows anything that has a sequence to be treated as a sequence.
import scala.collection.SeqLike
import scala.collection.mutable.Builder
import scala.collection.mutable.ArrayBuffer
import scala.collection.generic.CanBuildFrom

trait SeqAdapter[+A, Repr[+X] <: SeqAdapter[X, Repr]]
    extends Seq[A] with SeqLike[A, Repr[A]] {
  val underlyingSeq: Seq[A]
  def create[B](seq: Seq[B]): Repr[B]

  def apply(index: Int) = underlyingSeq(index)
  def length = underlyingSeq.length
  def iterator = underlyingSeq.iterator

  override protected[this] def newBuilder: Builder[A, Repr[A]] = {
    val sac = new SeqAdapterCompanion[Repr] {
      def createDefault[B](seq: Seq[B]) = create(seq)
    }
    sac.newBuilder(create)
  }
}

trait SeqAdapterCompanion[Repr[+X] <: SeqAdapter[X, Repr]] {
  def createDefault[A](seq: Seq[A]): Repr[A]

  def fromSeq[A](creator: (Seq[A]) => Repr[A])(seq: Seq[A]) = creator(seq)
  def newBuilder[A](creator: (Seq[A]) => Repr[A]): Builder[A, Repr[A]] =
    new ArrayBuffer mapResult fromSeq(creator)

  implicit def canBuildFrom[A, B]: CanBuildFrom[Repr[A], B, Repr[B]] =
    new CanBuildFrom[Repr[A], B, Repr[B]] {
      def apply(): Builder[B, Repr[B]] = newBuilder(createDefault)
      def apply(from: Repr[A]) = newBuilder(from.create)
    }
}
This fixes all the problems huynhjl brought up. For my original problem, to have a field and a sequence treated as a sequence, a simple class will now do.
trait Field[FT] {
  val defaultValue: FT

  class FieldSeq[+ST](val field: FT, val underlyingSeq: Seq[ST] = Vector())
      extends SeqAdapter[ST, FieldSeq] {
    def create[B](seq: Seq[B]) = new FieldSeq[B](field, seq)
  }

  object FieldSeq extends SeqAdapterCompanion[FieldSeq] {
    def createDefault[A](seq: Seq[A]): FieldSeq[A] =
      new FieldSeq[A](defaultValue, seq)
    override implicit def canBuildFrom[A, B] = super.canBuildFrom[A, B]
  }
}
This can be tested like so:
val StringField = new Field[String] { val defaultValue = "Default Value" }
StringField: java.lang.Object with Field[String] = $anon$1@57f5de73
val fs = new StringField.FieldSeq[Int]("str", Vector(1,2))
val fsfield = fs.field
fs: StringField.FieldSeq[Int] = (1, 2)
fsfield: String = str
val fm = fs.map(1 + _)
val fmfield = fm.field
fm: StringField.FieldSeq[Int] = (2, 3)
fmfield: String = str
val fs2 = new StringField.FieldSeq[Int]("str1", Vector(10, 20))
val fs2field = fs2.field
fs2: StringField.FieldSeq[Int] = (10, 20)
fs2field: String = str1
val ffor = for (x <- fs if x > 0; y <- fs2) yield (x + y)
val fforfield = ffor.field
ffor: StringField.FieldSeq[Int] = (11, 21, 12, 22)
fforfield: String = str
val smap = fs.map(_ + "!")
val smapfield = smap.field
smap: StringField.FieldSeq[String] = (1!, 2!)
smapfield: String = str
val break = List(1,2).map(1 + _)(collection.breakOut): StringField.FieldSeq[Int]
val breakfield = break.field
break: StringField.FieldSeq[Int] = (2, 3)
breakfield: String = Default Value
val x: StringField.FieldSeq[Any] = fs
val xfield = x.field
x: StringField.FieldSeq[Any] = (1, 2)
xfield: String = str