In the code below
test("duplicatedParamGetsFirst2") {
val str = "A=B&C" //"A=B&A=C"
val res = for {
x <- str.split("&")
y <- if(x.indexOf("=") == -1) "" else x.substring(x.indexOf("=") + 1)
} yield (if (x.indexOf("=") == -1) x else x.substring(0, x.indexOf("=")), y)
res.foreach(x => println(x))
}
I expected the result (A,B)(C,) but I got just (A,B). How do I fix it?
Your goal isn't completely clear. Maybe this gets close.
"A=B&C".split("&").map(_.split("="))
// res0: Array[Array[String]] = Array(Array(A, B), Array(C))
You can use .toList, or some other collection conversion, if you don't want the result in Arrays.
Leo C's solution works. Here is another snippet, generating an array of pairs, close in style to your original code:
val s = "A=B&C"
val res = for {
t <- s.split("&")
a = t.split("=")
} yield a(0) -> a.lift(1).getOrElse("")
res.foreach(println)
// (A,B)
// (C,)
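Not sure the result data type is what you want: your for-comprehension yields an Array of Tuple2[String, Char], because a String used as a generator is iterated character by character, so y is a Char. That is also why the (C,) pair is missing: for "C" the generator runs over the empty string and produces nothing. A simple way to generate your tuples would be to apply split twice as follows: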
val str = "A=B&C"
str.split("&").
map( x => if (x contains "=") x.split("=") else Array(x, "") ).
map{ case Array(a, b) => (a, b) }
// res1: Array[(String, String)] = Array((A,B), (C,""))
If you must use for-comprehension, here's one way to do it:
val res = for {
x <- str.split("&")
} yield if (x contains "=")
x.split("=") match { case Array(a, b) => (a, b) } else
(x, "")
// res2: Array[(String, String)] = Array((A,B), (C,""))
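To see why the original comprehension drops (C,), here is a small sketch of the same logic. List stands in for the array, and the name pairs is only illustrative. A String used as a generator is traversed one Char at a time, so an empty string contributes no elements.

```scala
// A String generator yields Chars; an empty String yields nothing at all.
val pairs = for {
  x <- List("A=B", "C")
  y <- if (x.contains("=")) x.substring(x.indexOf("=") + 1) else ""
} yield (x.takeWhile(_ != '='), y)

// pairs has type List[(String, Char)]; "C" contributes no element.
println(pairs)
```

This is why the answers above either bind the value with = instead of <-, or build the tuple in the yield.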
The code should be:
val str = "A=B&C" //"A=B&A=C"
val res = for {
x <- str.split("&")
} yield
{
val y = if(x.indexOf("=") == -1) "" else x.substring(x.indexOf("=") + 1)
(if (x.indexOf("=") == -1) x else x.substring(0, x.indexOf("=")), y)
}
res.foreach(x => println(x))
As for the for (exprA; exprB) form, I don't know how to express it that way.
I have the following issue
Given this list as input, I want to sum the integers of all lines that have the same title:
val listIn= List("TitleB,Int,11,0",
"TitleB,Int,1,0",
"TitleB,Int,1,0",
"TitleB,Int,3,0",
"TitleA,STR,3,0",
"TitleC,STR,4,5")
I wrote the following function
def sumB(list: List[String]): List[String] = {
val itemPattern = raw"(.*)(\d+),(\d+)\s*".r
list.foldLeft(ListMap.empty[String, (Int,Int)].withDefaultValue((0,0))) {
case (line, stri) =>
val itemPattern(k,i,j) = stri
val (a, b) = line(k)
line.updated(k, (i.toInt + a, j.toInt + b))
}.toList.map { case (k, (i, j)) => s"$k$i,$j" }
}
Expected output would be:
List("TitleB,Int,16,0",
"TitleA,STR,3,0",
"TitleC,STR,4,5")
Since you are looking to preserve the order of the titles as they appear in the input data, I would suggest using a LinkedHashMap with foldLeft, as below:
import scala.collection.mutable
import scala.util.Try
val finalResult = listIn.foldLeft(new mutable.LinkedHashMap[String, (String, String, Int, Int)]){ (x, y) => {
val splitted = y.split(",")
if(x.keySet.contains(Try(splitted(0)).getOrElse(""))){
val oldTuple = x(Try(splitted(0)).getOrElse(""))
x.update(Try(splitted(0)).getOrElse(""), (Try(splitted(0)).getOrElse(""), Try(splitted(1)).getOrElse(""), oldTuple._3+Try(splitted(2).toInt).getOrElse(0), oldTuple._4+Try(splitted(3).toInt).getOrElse(0)))
x
}
else {
x.put(Try(splitted(0)).getOrElse(""), (Try(splitted(0)).getOrElse(""), Try(splitted(1)).getOrElse(""), Try(splitted(2).toInt).getOrElse(0), Try(splitted(3).toInt).getOrElse(0)))
x
}
}}.mapValues(iter => iter._1+","+iter._2+","+iter._3+","+iter._4).values.toList
finalResult should then be your desired output:
List("TitleB,Int,16,0", "TitleA,STR,3,0", "TitleC,STR,4,5")
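For comparison, here is a shorter sketch of the same idea with immutable collections. The helper name sumByTitle is hypothetical, and it assumes every line has exactly four comma-separated fields with integers in the last two; malformed lines would throw rather than be defaulted as in the Try-based version.

```scala
val listIn = List(
  "TitleB,Int,11,0",
  "TitleB,Int,1,0",
  "TitleB,Int,1,0",
  "TitleB,Int,3,0",
  "TitleA,STR,3,0",
  "TitleC,STR,4,5")

// Hypothetical helper: sums the two trailing integers per (title, type) key,
// keeping keys in order of first appearance.
def sumByTitle(lines: List[String]): List[String] = {
  // Parse each line into a key and a pair of numbers.
  val parsed = lines.map { line =>
    val Array(title, kind, a, b) = line.split(",")
    ((title, kind), (a.toInt, b.toInt))
  }
  // Add up the number pairs for each key.
  val sums = parsed.groupBy(_._1).map { case (key, rows) =>
    key -> rows.map(_._2).reduce((x, y) => (x._1 + y._1, x._2 + y._2))
  }
  // distinct keeps the first occurrence of each key, preserving input order.
  parsed.map(_._1).distinct.map { case key @ (title, kind) =>
    val (a, b) = sums(key)
    s"$title,$kind,$a,$b"
  }
}

sumByTitle(listIn).foreach(println)
```

The order is taken from the parsed list rather than from the grouped map, since groupBy makes no ordering promise.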
When I try to filter an RDD using some condition, an exception is thrown because of bad records. I want to ignore those records rather than capture them. So, how can I add a try block when I use the filter method?
scala> val newRDD = mysc1.filter(_(3) == "NS3")
newRDD: org.apache.spark.rdd.RDD[Array[String]]
= MapPartitionsRDD[12] at filter at <console>:28
scala> newRDD.take(10)
Error:
java.lang.ArrayIndexOutOfBoundsException: 3
In this particular instance, it could be as simple as
mysc1.filter(arr => (arr.length > 3) && (arr(3) == "NS3"))
Alternatively, wrap the access in scala.util.Try:
mysc1.flatMap(x => Try(x(3)).filter(_ == "NS3").map(_ => x).toOption)
or, even better, using the Array as a PartialFunction
mysc1.flatMap(x => x.lift(3).filter(_ == "NS3").map(_ => x))
with a for comprehension
mysc1.flatMap(x => for(y <- Try(x(3)).toOption if y == "NS3") yield x)
and
mysc1.flatMap(x => for(y <- x.lift(3) if y == "NS3") yield x)
and finally the full for version
val newRDD = for {
x <- mysc1
y <- x.lift(3) if y == "NS3"
} yield x
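The lift idea can be tried without a Spark cluster. The sketch below uses a plain Seq[Array[String]] as a stand-in for the RDD (an assumption made only so that it runs on its own), and the names rows and kept are illustrative. lift(3) returns None for rows that are too short, so they are dropped without any exception.

```scala
// Stand-in data: one good row, one row that is too short, one non-matching row.
val rows: Seq[Array[String]] = Seq(
  Array("a", "b", "c", "NS3"),
  Array("short"),
  Array("a", "b", "c", "X"))

// lift turns the out-of-bounds access into None instead of throwing.
val kept = rows.filter(_.lift(3).contains("NS3"))

kept.foreach(r => println(r.mkString(",")))
```

The same predicate should be usable with filter on the RDD, since it only relies on Array and Option methods.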
scala> def filterFn[A](array: Array[A], valueToMatch: A): Boolean = array match {
| case Array(_, _, x, _*) if x == valueToMatch => true
| case _ => false
| }
filterFn: [A](array: Array[A], valueToMatch: A)Boolean
scala> filterFn(Array(1,2,3), 3)
res2: Boolean = true
scala> filterFn( Array(), "foobar" )
res4: Boolean = false
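Then, you could do something like the following. Note that the pattern above tests the third element (index 2); to test index 3 as in the question, add one more _ before x: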
mysc1.filter(xs => filterFn(xs, "NS3") )
I have a two-dimensional collection (say Vector[Vector[Int]]) and I want to find the index of an element in it. My solution is like:
def find(vec: Vector[Vector[Int]], target: Int) = {
def indexOfTarget(v: Vector[Int]) = v.indexOf(target)
val r = vec.indexWhere((v) => indexOfTarget(v) != -1)
val c = indexOfTarget(vec(r))
(r, c)
}
But it's just... ugly. And it invokes indexOfTarget one more time than necessary.
Is there a better way?
How about:
vec.view.zipWithIndex.map {
case (iv,i) => (i, iv.indexOf(target))
}.find(_._2 != -1)
Note that thanks to the view, the zipWithIndex and map are evaluated lazily and this hence does only the calculation that is absolutely required.
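The laziness can be observed with a counter. This is a minimal sketch; the variable scanned and the sample data are added purely for illustration.

```scala
val vec = Vector(Vector(1, 2), Vector(3, 4), Vector(5, 6))
var scanned = 0 // counts how many rows are actually searched

// Because of the view, rows are mapped one at a time, only as find asks for them.
val hit = vec.view.zipWithIndex.map { case (row, r) =>
  scanned += 1
  (r, row.indexOf(3))
}.find(_._2 != -1)

println(s"hit = $hit, rows scanned = $scanned")
```

With the target in the second row, the third row is never searched; dropping .view would map all three rows before find runs.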
You could try this:
def find(vec: Vector[Vector[Int]], target: Int) = {
val zipped = vec.zipWithIndex
val result = for{
tup <- zipped
index <- List(tup._1.indexOf(target))
if (index != -1)
} yield (tup._2, index)
result.headOption
}
The result type will be an Option[(Int,Int)] with it being None if no match.
An implementation that returns Some((x, y)) if the element is present, and None otherwise:
def find(vec: Vector[Vector[Int]], target: Int) = (for {
(xs, posX) <- vec.view.zipWithIndex
(`target`, posY) <- xs.view.zipWithIndex
} yield (posX, posY) ).headOption
scala> find(Vector(Vector(1, 2, 4), Vector(2, 2, 3)), 3)
res0: Option[(Int, Int)] = Some((1,2))
Is there any difference between this code:
for(term <- term_array) {
val list = hashmap.get(term)
...
}
and:
for(term <- term_array; val list = hashmap.get(term)) {
...
}
Inside the loop I'm changing the hashmap with something like this
hashmap.put(term, string :: list)
While checking for the head of list it seems to be outdated somehow when using the second code snippet.
The difference between the two is that a definition in the for header (your second snippet) is bound by pattern matching on an intermediate tuple, whereas a val in the loop body (your first snippet) is just a value inside a function literal. See Programming in Scala, Section 23.1 For Expressions:
for {
p <- persons // a generator
n = p.name // a definition
if (n startsWith "To") // a filter
} yield n
You see the real difference when you compile sources with scalac -Xprint:typer <filename>.scala:
object X {
val x1 = for (i <- (1 to 5); x = i*2) yield x
val x2 = for (i <- (1 to 5)) yield { val x = i*2; x }
}
After code transforming by the compiler you will get something like this:
private[this] val x1: scala.collection.immutable.IndexedSeq[Int] =
scala.this.Predef.intWrapper(1).to(5).map[(Int, Int), scala.collection.immutable.IndexedSeq[(Int, Int)]](((i: Int) => {
val x: Int = i.*(2);
scala.Tuple2.apply[Int, Int](i, x)
}))(immutable.this.IndexedSeq.canBuildFrom[(Int, Int)]).map[Int, scala.collection.immutable.IndexedSeq[Int]]((
(x$1: (Int, Int)) => (x$1: (Int, Int) @unchecked) match {
case (_1: Int, _2: Int)(Int, Int)((i @ _), (x @ _)) => x
}))(immutable.this.IndexedSeq.canBuildFrom[Int]);
private[this] val x2: scala.collection.immutable.IndexedSeq[Int] =
scala.this.Predef.intWrapper(1).to(5).map[Int, scala.collection.immutable.IndexedSeq[Int]](((i: Int) => {
val x: Int = i.*(2);
x
}))(immutable.this.IndexedSeq.canBuildFrom[Int]);
This can be simplified to:
val x1 = (1 to 5).map {i =>
val x: Int = i * 2
(i, x)
}.map {
case (i, x) => x
}
val x2 = (1 to 5).map {i =>
val x = i * 2
x
}
Defining variables inside the for header makes sense if you want to use that variable in the for statement itself, like:
for (i <- is; a = something; if (a)) {
...
}
And the reason why your list is outdated is that the second snippet translates to a map call followed by a foreach call, such as:
term_array.map { term =>
val list = hashmap.get(term)
(term, list)
}.foreach { case (term, list) =>
...
}
So by the time you reach ..., every list has already been looked up, before any of the updates made in the loop body. The other example translates to:
term_array.foreach {
term => val list= hashmap.get(term)
...
}
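The effect can be reproduced in a few lines. This sketch writes out by hand the map-then-foreach shape that Scala 2 produces for a definition in the for header (the form shown in the -Xprint:typer output above); the names terms, bodyLookup and headerLookup are illustrative only.

```scala
import scala.collection.mutable

val terms = Array("a", "a")

// Lookup inside the body: the second iteration sees the first one's update.
val bodyLookup = mutable.HashMap("a" -> List.empty[String])
terms.foreach { term =>
  val list = bodyLookup(term)
  bodyLookup.put(term, "x" :: list)
}

// Lookup as a header definition, desugared by hand: map runs every lookup
// first, so both iterations prepend to the original empty list.
val headerLookup = mutable.HashMap("a" -> List.empty[String])
terms.map { term => (term, headerLookup(term)) }.foreach { case (term, list) =>
  headerLookup.put(term, "x" :: list)
}

println(bodyLookup("a"))   // two elements
println(headerLookup("a")) // one element: the second put overwrote the first
```

If the list must reflect earlier iterations, keep the lookup in the loop body.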
I think this might be a common operation. So maybe it's inside the API but I can't find it. Also I'm interested in an efficient functional/simple solution if not.
Given a sequence of tuples ("a" -> 1, "b" ->2, "c" -> 3) I want to turn it into a map. That's easy using TraversableOnce.toMap. But I want to fail this construction if the resulting map "would contain a contradiction", i.e. different values assigned to the same key. Like in the sequence ("a" -> 1, "a" -> 2). But duplicates shall be allowed.
Currently I have this (very imperative) code:
def buildMap[A,B](in: TraversableOnce[(A,B)]): Option[Map[A,B]] = {
val map = new HashMap[A,B]
val it = in.toIterator
var fail = false
while(it.hasNext && !fail){ // stop at the first contradiction; otherwise a later entry would reset fail
val next = it.next()
val old = map.put(next._1, next._2)
fail = old.isDefined && old.get != next._2
}
if(fail) None else Some(map.toMap)
}
Side Question
Is the final toMap really necessary? I get a type error when omitting it, but I think it should work. The implementation of toMap constructs a new map which I want to avoid.
As always when working with Seq[A] the optimal solution performance-wise depends on the concrete collection type.
A general but not very efficient solution would be to fold over an Option[Map[A,B]]:
def optMap[A,B](in: Iterable[(A,B)]): Option[Map[A,B]] =
in.iterator.foldLeft(Option(Map[A,B]())) {
case (Some(m),e @ (k,v)) if m.getOrElse(k, v) == v => Some(m + e)
case _ => None
}
If you restrict yourself to using List[(A,B)]s, an optimized version would be:
import scala.annotation.tailrec
@tailrec
def rmap[A,B](in: List[(A,B)], out: Map[A,B] = Map[A,B]()): Option[Map[A,B]] = in match {
case (e @ (k,v)) :: tail if out.getOrElse(k,v) == v =>
rmap(tail, out + e)
case Nil =>
Some(out)
case _ => None
}
Additionally a less idiomatic version using mutable maps could be implemented like this:
def mmap[A,B](in: Iterable[(A,B)]): Option[Map[A,B]] = {
val dest = collection.mutable.Map[A,B]()
for (e @ (k,v) <- in) {
if (dest.getOrElse(k, v) != v) return None
dest += e
}
Some(dest.toMap)
}
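As a quick check of the required behaviour (identical duplicates allowed, contradictions rejected), here is the fold-based version restated so that the snippet runs on its own, followed by two sample inputs that are made up for illustration.

```scala
// Fold over Option[Map]: once a contradiction is seen, the accumulator stays None.
def optMap[A, B](in: Iterable[(A, B)]): Option[Map[A, B]] =
  in.iterator.foldLeft(Option(Map[A, B]())) {
    case (Some(m), e @ (k, v)) if m.getOrElse(k, v) == v => Some(m + e)
    case _ => None
  }

// A repeated identical pair is accepted ...
val ok = optMap(List("a" -> 1, "b" -> 2, "a" -> 1))
// ... but two different values for the same key are not.
val clash = optMap(List("a" -> 1, "a" -> 2))

println(ok)
println(clash)
```

The fold still walks the whole input after a contradiction, which is what the tail-recursive and mutable versions avoid.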
Here is a fail-slowly solution (if creating the entire map and then discarding it is okay):
def uniqueMap[A,B](s: Seq[(A,B)]) = {
val m = s.toMap
if (m.size == s.length) Some(m) else None
}
Here is a mutable fail-fast solution (bail out as soon as the error is detected):
def uniqueMap[A,B](s: Seq[(A,B)]) = {
val h = new collection.mutable.HashMap[A,B]
s.iterator.takeWhile(x => !(h contains x._1)).foreach(h += _)
if (h.size == s.length) Some(h) else None
}
And here's an immutable fail-fast solution:
def uniqueMap[A,B](s: Seq[(A,B)]) = {
def mapUniquely(i: Iterator[(A,B)], m: Map[A,B]): Option[Map[A,B]] = {
if (i.hasNext) {
val j = i.next
if (m contains j._1) None
else mapUniquely(i, m + j)
}
else Some(m)
}
mapUniquely(s.iterator, Map[A,B]())
}
Edit: and here's a solution using put for speed (hopefully):
def uniqueMap[A,B](s: Seq[(A,B)]) = {
val h = new collection.mutable.HashMap[A,B]
val okay = s.iterator.forall(x => {
val y = (h put (x._1,x._2))
y.isEmpty || y.get == x._2
})
if (okay) Some(h) else None
}
Edit: now tested, and on valid input (where it returns Some) it's about twice as fast as Moritz's or my straightforward solution.
Scala 2.9 is near, so why not take advantage of the combinations method (inspired by Moritz's answer):
def optMap[A,B](in: List[(A,B)]) = {
if (in.combinations(2).exists {
case List((a,b),(c,d)) => a == c && b != d
case _ => false
}) None else Some(in.toMap)
}
scala> val in = List(1->1,2->3,3->4,4->5,2->3)
in: List[(Int, Int)] = List((1,1), (2,3), (3,4), (4,5), (2,3))
scala> optMap(in)
res29: Option[scala.collection.immutable.Map[Int,Int]] = Some(Map(1 -> 1, 2 -> 3, 3 -> 4, 4 -> 5))
scala> val in = List(1->1,2->3,3->4,4->5,2->3,1->2)
in: List[(Int, Int)] = List((1,1), (2,3), (3,4), (4,5), (2,3), (1,2))
scala> optMap(in)
res30: Option[scala.collection.immutable.Map[Int,Int]] = None
You can also use groupBy as follows:
val pList = List(1 -> "a", 1 -> "b", 2 -> "c", 3 -> "d")
def optMap[A,B](in: Iterable[(A,B)]): Option[Map[A,B]] = {
Option(in.groupBy(_._1).map{case(_, list) => if(list.size > 1) return None else list.head})
}
println(optMap(pList))
Its efficiency is comparable to that of the above solutions.
In fact, if you examine the groupBy implementation you will see that it is very similar to some of the solutions suggested.
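The groupBy version above rejects any repeated key, even when the values agree. A variant that follows the question's rule (identical duplicates allowed) could look like the sketch below; the name optMapGrouped is hypothetical and this is one possible adaptation, not the answerer's code.

```scala
// Group by key, collect the distinct values per key, and fail if any key
// ends up with more than one distinct value.
def optMapGrouped[A, B](in: Iterable[(A, B)]): Option[Map[A, B]] = {
  val valuesByKey = in.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).toSet }
  if (valuesByKey.forall(_._2.size == 1))
    Some(valuesByKey.map { case (k, vs) => k -> vs.head })
  else None
}

println(optMapGrouped(List(1 -> "a", 2 -> "c", 1 -> "a")))
println(optMapGrouped(List(1 -> "a", 1 -> "b", 2 -> "c")))
```

Unlike the fail-fast versions, this always groups the whole input before deciding.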