How to insert elements in list from another list using scala? - scala

I have two lists -
A = (("192.168.1.1","private","Linux_server","str1"),
("192.168.1.2","private","Linux_server","str2"))
B = ("A","B")
I want following output
outputList = (("192.168.1.1","private","Linux_server","str1", "A"),
("192.168.1.2","private","Linux_server","str2","B"))
I want to insert second list element into first list as list sequence.
Two lists size will be always same.
How do I get above output using scala??

The short answer:
A = (A zip B).map({ case (x, y) => x :+ y })
Some compiling code to be more explicit:
val a = List(
List("192.168.1.1", "private", "Linux_server", "str1"),
List("192.168.1.2", "private", "Linux_server", "str2")
)
val b = List("A", "B")
val c = List(
List("192.168.1.1", "private", "Linux_server", "str1", "A"),
List("192.168.1.2", "private", "Linux_server", "str2", "B")
)
assert((a zip b).map({ case (x, y) => x :+ y }) == c)

Related

Counting number of occurrences of Array element in a RDD

I have a RDD1 with Key-Value pair of type [(String, Array[String])] (i will refer to them as (X, Y)), and a Array Z[String].
I'm trying for every element in Z to count how many X instances there are that have Z in Y. I want my output as ((X, Z(i)), #ofinstances).
RDD1= ((A, (2, 3, 4), (B, (4, 4, 4)), (A, (4, 5)))
Z = (1, 4)
then i want to get:
(((A, 4), 2), ((B, 4), 1))
Hope that made sense.
As you can see over, i only want an element if there is atleast one occurence.
I have tried this so far:
val newRDD = RDD1.map{case(x, y) => for(i <- 0 to (z.size-1)){if(y.contains(z(i))) {((x, z(i)), 1)}}}
My output here is an RDD[Unit]
Im not sure if what i'm asking for is even possible, or if i have to do it an other way.
So it is just another word count
val rdd = sc.parallelize(Seq(
("A", Array("2", "3", "4")),
("B", Array("4", "4", "4")),
("A", Array("4", "5"))))
val z = Array("1", "4")
To make lookups efficient convert z to Set:
val zs = z.toSet
val result = rdd
.flatMapValues(_.filter(zs contains _).distinct)
.map((_, 1))
.reduceByKey(_ + _)
where
_.filter(zs contains _).distinct
filters out values that occur in z and deduplicates.
result.take(2).foreach(println)
// ((B,4),1)
// ((A,4),2)

Could Anyone explain about this code?

according this link: https://github.com/amplab/training/blob/ampcamp6/machine-learning/scala/solution/MovieLensALS.scala
I don't understand what is the point of :
val numUsers = ratings.map(_._2.user).distinct.count
val numMovies = ratings.map(_._2.product).distinct.count
_._2.[user|product] , what does that mean?
That is accessing the tuple elements: The following example might explain it better.
val xs = List(
(1, "Foo"),
(2, "Bar")
)
xs.map(_._1) // => List(1,2)
xs.map(_._2) // => List("Foo", "Bar")
// An equivalent way to write this
xs.map(e => e._1)
xs.map(e => e._2)
// Perhaps a better way is
xs.collect {case (a, b) => a} // => List(1,2)
xs.collect {case (a, b) => b} // => List("Foo", "Bar")
ratings is a collection of tuples:(timestamp % 10, Rating(userId, movieId, rating)). The first underscore in _._2.user refers to the current element being processed by the map function. So the first underscore now refers to a tuple (pair of values). For a pair tuple t you can refer to its first and second elements in the shorthand notation: t._1 & t._2 So _._2 is selecting the second element of the tuple currently being processed by the map function.
val ratings = sc.textFile(movieLensHomeDir + "/ratings.dat").map { line =>
val fields = line.split("::")
// format: (timestamp % 10, Rating(userId, movieId, rating))
(fields(3).toLong % 10, Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble))
}

How to merge adjacent, similar entries of a sorted `Stream` or `List`

Given a
large ( > 1,000,000 entries, don't expect it to fit into memory)
sorted (wrt. the first value of the tuple)
stream like
val ss = List( (1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0")).toStream
// just for demo
val xs = List( (1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0"))
I want to join adjacent entries such that the output of transformation becomes
List( (1, "2.5 5.0"), (2, "3.0 4.0 6.0"), (3, "6.0") )
The second tuple value is to be merged by some monoid function (string concatenation here)
Ideas / attempts / tries
groupBy
groupBy does not seem to be a valid alternative, as entries are collected in a map in memory.
scanLeft
val ss: Stream[(Int, String)] = List( (1, "2.5"), (1, "5.0"), (2, "3.0")).toStream
val transformed = ss.scanLeft(Joiner(0, "a"))( (j, t) => {
j.x match {
case t._1 => j.copy(y = j.y + " " + t._2)
case _ => Joiner(t._1, t._2)
}
})
println(transformed.toList)
which ends up in
List(Joiner(0,a), Joiner(1,2.5), Joiner(1,2.5 5.0), Joiner(2,3.0))
(please ignore wrapping Joiner)
but I didn't find a way to get rid of the "incomplete" entries.
Emit true to indicate the initial element (when the value switches), not the final one, that's easy, right? Then you can just collect those entries, that are followed by the initial one.
Something like this perhaps:
ss.scanLeft((0, "", true)) {
case ((a, str, _), (b, c)) if (str == "" || a == b) => (b, str + " " + c, false)
case (_, (b, c)) => (b, c.toString, true)
} .:+ (0, "", true)
.sliding(2)
.collect { case Seq(a, (_, _, true)) => (a._1, a._2) }
(note the .:+ thingy - it appends a "dummy" entry to the end of the stream, so that the last "real" element is also followed by the "true" entry, and does not get filtered out).
This seems to do okay.
def makeEm(s: Stream[(Int, String)]) = {
import Stream._
#tailrec
def z(source: Stream[(Int, String)], curr: (Int, List[String]), acc: Stream[(Int, String)]): Stream[(Int, String)] = source match {
case Empty =>
Empty
case x #:: Empty =>
acc :+ (curr._1 -> (x._2 :: curr._2).mkString(","))
case x #:: y #:: etc if x._1 != y._1 =>
val c = curr._1 -> (x._2 :: curr._2).mkString(",")
z(y #:: etc, (y._1, List[String]()), acc :+ c)
case x #:: etc =>
z(etc, (x._1, x._2 :: curr._2), acc)
}
z(s, (0, List()), Stream())
}
Tests:
val ss = List( (1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0")).toStream
makeEm(ss).toList.mkString(",")
val s = List().toStream
makeEm(s).toList.mkString(",")
val ss2 = List( (1, "2.5"), (1, "5.0")).toStream
makeEm(ss2).toList.mkString(",")
val s3 = List((1, "2.5"),(2, "4.0"),(3, "1.0")).toStream
makeEm(s3).toList.mkString(",")
Output
ss: scala.collection.immutable.Stream[(Int, String)] = Stream((1,2.5), ?)
res0: String = (1,5.0,2.5),(2,6.0,4.0,3.0),(3,1.0)
s: scala.collection.immutable.Stream[Nothing] = Stream()
res1: String =
ss2: scala.collection.immutable.Stream[(Int, String)] = Stream((1,2.5), ?)
res2: String = (1,5.0,2.5)
s3: scala.collection.immutable.Stream[(Int, String)] = Stream((1,2.5), ?)
res3: String = (0,2.5),(2,4.0),(3,1.0)

How to convert String Iterator into List of Tuples

How can
val s = Iterator("a|b|2","a|c|3")
be converted to
List( (("a" , "b") , 2) , (("a" , "c") , 3)))
This is my current progress :
val v = s.map(m => m.split("|")(0))
How can I parse the String into its constituent parts so can be converted to a List of Tuples ?
You can match on the array returned from split:
val v = s.map(_.split('|') match { case Array(a, b, n) => ((a, b), n.toInt) })

Multiple yields in sequence comprehension?

I'm trying to learn Scala and tried to write a sequence comprehension that extracts unigrams, bigrams and trigrams from a sequence. E.g., [1,2,3,4] should be transformed to (not Scala syntax)
[1; _,1; _,_,1; 2; 1,2; _,1,2; 3; 2,3; 1,2,3; 4; 3,4; 2,3,4]
In Scala 2.8, I tried the following:
def trigrams(tokens : Seq[T]) = {
var t1 : Option[T] = None
var t2 : Option[T] = None
for (t3 <- tokens) {
yield t3
yield (t2,t3)
yield (t1,t2,Some(t3))
t1 = t2
t2 = t3
}
}
But this doesn't compile as, apparently, only one yield is allowed in a for-comprehension (no block statements either). Is there any other elegant way to get the same behavior, with only one pass over the data?
You can't have multiple yields in a for loop because for loops are syntactic sugar for the map (or flatMap) operations:
for (i <- collection) yield( func(i) )
translates into
collection map {i => func(i)}
Without a yield at all
for (i <- collection) func(i)
translates into
collection foreach {i => func(i)}
So the entire body of the for loop is turned into a single closure, and the presence of the yield keyword determines whether the function called on the collection is map or foreach (or flatMap). Because of this translation, the following are forbidden:
Using imperative statements next to a yield to determine what will be yielded.
Using multiple yields
(Not to mention that your proposed verison will return a List[Any] because the tuples and the 1-gram are all of different types. You probably want to get a List[List[Int]] instead)
Try the following instead (which put the n-grams in the order they appear):
val basis = List(1,2,3,4)
val slidingIterators = 1 to 4 map (basis sliding _)
for {onegram <- basis
ngram <- slidingIterators if ngram.hasNext}
yield (ngram.next)
or
val basis = List(1,2,3,4)
val slidingIterators = 1 to 4 map (basis sliding _)
val first=slidingIterators head
val buf=new ListBuffer[List[Int]]
while (first.hasNext)
for (i <- slidingIterators)
if (i.hasNext)
buf += i.next
If you prefer the n-grams to be in length order, try:
val basis = List(1,2,3,4)
1 to 4 flatMap { basis sliding _ toList }
scala> val basis = List(1, 2, 3, 4)
basis: List[Int] = List(1, 2, 3, 4)
scala> val nGrams = (basis sliding 1).toList ::: (basis sliding 2).toList ::: (basis sliding 3).toList
nGrams: List[List[Int]] = ...
scala> nGrams foreach (println _)
List(1)
List(2)
List(3)
List(4)
List(1, 2)
List(2, 3)
List(3, 4)
List(1, 2, 3)
List(2, 3, 4)
I guess I should have given this more thought.
def trigrams(tokens : Seq[T]) : Seq[(Option[T],Option[T],T)] = {
var t1 : Option[T] = None
var t2 : Option[T] = None
for (t3 <- tokens)
yield {
val tri = (t1,t2,t3)
t1 = t2
t2 = Some(t3)
tri
}
}
Then extract the unigrams and bigrams from the trigrams. But can anyone explain to me why 'multi-yields' are not permitted, and if there's any other way to achieve their effect?
val basis = List(1, 2, 3, 4)
val nGrams = basis.map(x => (x)) ::: (for (a <- basis; b <- basis) yield (a, b)) ::: (for (a <- basis; b <- basis; c <- basis) yield (a, b, c))
nGrams: List[Any] = ...
nGrams foreach (println(_))
1
2
3
4
(1,1)
(1,2)
(1,3)
(1,4)
(2,1)
(2,2)
(2,3)
(2,4)
(3,1)
(3,2)
(3,3)
(3,4)
(4,1)
(4,2)
(4,3)
(4,4)
(1,1,1)
(1,1,2)
(1,1,3)
(1,1,4)
(1,2,1)
(1,2,2)
(1,2,3)
(1,2,4)
(1,3,1)
(1,3,2)
(1,3,3)
(1,3,4)
(1,4,1)
(1,4,2)
(1,4,3)
(1,4,4)
(2,1,1)
(2,1,2)
(2,1,3)
(2,1,4)
(2,2,1)
(2,2,2)
(2,2,3)
(2,2,4)
(2,3,1)
(2,3,2)
(2,3,3)
(2,3,4)
(2,4,1)
(2,4,2)
(2,4,3)
(2,4,4)
(3,1,1)
(3,1,2)
(3,1,3)
(3,1,4)
(3,2,1)
(3,2,2)
(3,2,3)
(3,2,4)
(3,3,1)
(3,3,2)
(3,3,3)
(3,3,4)
(3,4,1)
(3,4,2)
(3,4,3)
(3,4,4)
(4,1,1)
(4,1,2)
(4,1,3)
(4,1,4)
(4,2,1)
(4,2,2)
(4,2,3)
(4,2,4)
(4,3,1)
(4,3,2)
(4,3,3)
(4,3,4)
(4,4,1)
(4,4,2)
(4,4,3)
(4,4,4)
You could try a functional version without assignments:
def trigrams[T](tokens : Seq[T]) = {
val s1 = tokens.map { Some(_) }
val s2 = None +: s1
val s3 = None +: s2
s1 zip s2 zip s3 map {
case ((t1, t2), t3) => (List(t1), List(t1, t2), List(t1, t2, t3))
}
}