scala pattern matching to drop some cases - scala

With Scala 2.12, I am looping an array with pattern matching to create a new array as below.
val arrNew=arrText.map {
case x if x.startsWith("A") =>x.substring(12, 20)
case x if x.startsWith("B") =>x.substring(21, 40)
case x => "0"
}.filter(_!="0")
If an element matches one of the two patterns, a new element is added to the new array arrNew. Elements that do not match are dropped. My code actually loops over arrText twice because of the filter. If I do not include case x => "0", I get errors complaining that some elements are not matched. Is the code below the only way to loop just once? Is there any way to loop only once with case matching?
map { x =>
if (condition1) (output1)
else if (condition2) (output2)
}

You can use collect. From the Scaladoc:
Builds a new collection by applying a partial function to all elements of this sequence on which the function is defined.
val arrNew=arrText.collect {
case x if x.startsWith("A") =>x.substring(12, 20)
case x if x.startsWith("B") =>x.substring(21, 40)
}
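As a minimal sketch of how collect drops unmatched elements in a single pass, here is a small runnable variant. The sample strings and the substring indices are made up for illustration (the originals need much longer input); only the shape of the collect call comes from the answer above.

```scala
// Hypothetical sample data; the real arrText is not shown in the question.
val arrText = Array("A-keep-this", "B-and-this", "C-dropped")

// collect applies the partial function only where a case matches,
// so "C-dropped" is skipped without a catch-all case or a second filter.
val arrNew = arrText.collect {
  case x if x.startsWith("A") => x.substring(2, 6)
  case x if x.startsWith("B") => x.substring(2, 5)
}

println(arrNew.mkString(", "))
```

Because the partial function is simply undefined for the third element, no placeholder value such as "0" is needed.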

Related

scala to check whether loop through all element in a vector when joining two vectors

I have 2 vectors as below.
val vecBase21=....sortBy(r=>(r._1,r._2))
vecBase21: scala.collection.immutable.Vector[(String, String, Double)] = Vector((036,20210624 0400,2.0), (036,20210624 0405,2.0), (036,20210624 0410,2.0), (036,20210624 0415,2.0), (036,20210624 0420,2.0),...)
val vecBase22=....sortBy(r=>(r._1,r._2))
vecBase22: scala.collection.immutable.Vector[(String, String, Double)] = Vector((036,20210625 0400,2.0), (036,20210625 0405,2.0), (036,20210625 0410,2.0), (036,20210625 0415,2.0), (036,20210625 0420,2.0),...)
Inside, x._1 is ID, x._2 is date time, and x._3 is value. Then I did this to create a third vector, as follows.
val vecBase30=vecBase21.map(x=>vecBase22.filter(y=>x._1==y._1 && x._2==y._2).map(y=>(x._1,x._2,x._3,y._3))).flatten
This is literally a join in SQL, a join b on a.id=b.id and a.date_time=b.date_time. It loops in vecBase22 to search one combination of ID and date_time from vecBase21. As each combination is unique in one vector and they are sorted, I want to find out whether the loop in vecBase22 stops once it finds a match or it loops till the end of vecBase22 anyway. I tried this
val vecBase30=vecBase21.map(x=>vecBase22.filter(y=>x._1==y._1 && x._2==y._2).map{y=>
println("x1="+x._1+" y1="+y._1+" x2="+x._2+" y2="+y._2)
(x._1,x._2,x._3,y._3)}).flatten
But it apparently prints only the matched results. Is there a way to print every combination from the two vectors that gets evaluated when checking for a match?
As each combination is unique in one vector and they are sorted, I want to find out whether the loop in vecBase22 stops once it finds a match or it loops till the end of vecBase22 anyway
When you call filter on vecBase22 you loop through every element of that collection to see if it matches the predicate. This returns a new collection and passes it to the map function. If you want to short-circuit the filtering process you could consider using the method collectFirst (Scala 2.12):
def collectFirst[B](pf: PartialFunction[A, B]): Option[B]
Finds the first element of the traversable or iterator for which the given partial function is defined, and applies the partial function to it.
Note: may not terminate for infinite-sized collections.
Note: might return different results for different runs, unless the underlying collection type is ordered.
pf: the partial function
returns: an option value containing pf applied to the first value for which it is defined, or None if none exists.
Example:
Seq("a", 1, 5L).collectFirst({ case x: Int => x*10 }) = Some(10)
So you could do something like:
val vecBase30: Vector[(String, String, Double, Double)] = vecBase21
.flatMap(x => vecBase22.collectFirst {
case matched: (String, String, Double) if x._1 == matched._1 && x._2 == matched._2 => (x._1, x._2, x._3, matched._3)
})
First off: yes, it loops through all items of vecBase22 for each item of vecBase21. That's what the map and filter do.
If the println doesn't work, it is probably because you are executing your code in an interpreter that loses the standard output. Some notebook, maybe?
Also, if you want it to stop once it finds a match, use Seq.find.
Finally, you can improve readability. Here are a couple of ideas:
use a case class instead of a tuple
add spaces around operators
add a new line before each monad operation if it doesn't fit on one line
use flatMap instead of map followed by flatten
add the val type (not necessary, but it helps when reading the code)
That gives:
case class Item(id: String, time: String, value: Double)
case class Joint(id: String, time: String, v1: Double, v2: Double)
val vecBase21: Vector[Item] = ....sortBy(item => (item.id, item.time))
val vecBase22: Vector[Item] = ....sortBy(item => (item.id, item.time))
val vecBase30: Vector[Joint] = vecBase21.flatMap( x =>
vecBase22
.filter( y => x.id == y.id && x.time == y.time)
.map( y => Joint(x.id, x.time, x.value, y.value))
)
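To see how many pairs are actually compared, one option is to put the side effect inside the predicate rather than in the map that follows it. The sketch below does that with find, which stops at the first match. The sample data and the names left, right and comparisons are hypothetical; Item and Joint repeat the case classes from the answer so the snippet runs on its own.

```scala
case class Item(id: String, time: String, value: Double)
case class Joint(id: String, time: String, v1: Double, v2: Double)

// Hypothetical sample vectors standing in for vecBase21 and vecBase22.
val left = Vector(
  Item("036", "20210624 0400", 2.0),
  Item("036", "20210624 0405", 3.0)
)
val right = Vector(
  Item("036", "20210624 0400", 5.0),
  Item("036", "20210624 0405", 7.0),
  Item("036", "20210624 0410", 9.0)
)

// Counts every evaluation of the predicate, matched or not.
var comparisons = 0

val joined: Vector[Joint] = left.flatMap { x =>
  right
    .find { y =>
      comparisons += 1
      x.id == y.id && x.time == y.time
    }
    .map(y => Joint(x.id, x.time, x.value, y.value))
}

println(s"comparisons = $comparisons, joined = $joined")
```

With this data, find evaluates three pairs in total, whereas filter would evaluate all six. Replacing the counter with a println inside the predicate prints every pair that is examined.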

Instantiating objects on both sides of assignment operator in Scala; how does it work

I want to understand the mechanism behind the following line:
val List(x) = Seq(1 to 10)
What is the name of this mechanism? Is this the same as typecasting, or is there something else going on? (Tested in Scala 2.11.12.)
The mechanism is called Pattern Matching.
Here is the official documentation: https://docs.scala-lang.org/tour/pattern-matching.html
This works also in for comprehensions:
for{
People(name, firstName) <- peoples
} yield s"$firstName $name"
To your example:
val List(x) = Seq(1 to 10)
x is the content of that List; in your case, the Range 1 to 10 (you have a list with one element).
If you really have a list with more than one element, the pattern does not match and an exception is thrown:
val List(x) = (1 to 10).toList // throws scala.MatchError
So the correct pattern matching would be:
val x::xs = (1 to 10).toList
Now x is the first element (head) and xs the rest (tail).
I suspect that your problem is actually the expression
Seq(1 to 10)
This does not create a sequence of 10 elements, but a sequence containing a single Range object. So when you do this
val List(x) = Seq(1 to 10)
x is assigned to that Range object.
If you want a List of numbers, do this:
(1 to 10).toList
The pattern List(x) will only match if the expression on the right is an instance of List containing a single element. It will not match an empty List or a List with more than one element.
In this case it happens to work because the Seq(...) factory method (Seq.apply) actually returns an instance of List.
This technique is called object destructuring. Haskell provides a similar feature. Scala uses pattern matching to achieve this.
The method used in this case is Seq#unapplySeq:
https://www.scala-lang.org/api/current/scala/collection/Seq.html
You can think of
val <pattern> = <value>
<next lines>
as
<value> match {
case <pattern> =>
<next lines>
}
The only exception is when <pattern> is just a variable, or a variable with a type annotation; that is a plain definition with no matching involved.
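The behaviours described in these answers can be checked with a short sketch. The names range, head, tail and failed are chosen here for illustration, and the failing pattern is wrapped in Try so that the script keeps running.

```scala
import scala.util.Try

// A one-element list whose single element is a Range: List(x) matches.
val List(range) = List(1 to 10)

// Head/tail destructuring works for any non-empty list.
val head :: tail = (1 to 10).toList

// List(only) requires exactly one element, so a ten-element list fails
// at runtime with scala.MatchError; Try captures the exception.
val failed = Try {
  val List(only) = (1 to 10).toList
  only
}

println(s"range = $range, head = $head, tail.size = ${tail.size}")
println(s"failed.isFailure = ${failed.isFailure}")
```

Recent compiler versions may warn that such val patterns are refutable, which is the same concern: the match is only checked at runtime.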

accumulate the element ahead of nth element in list

I want to create a list from the n+1th value till n.length , where n is the value passed to a function
def test(n:String) ={
val list = List("1","12","30","40","50")
list match{
case s::rest if s==n => Seq(rest).flatten
case _ => Nil
}
}
If "12" is passed, I get an empty list.
Expected output = List("30", "40", "50")
Putting it another way, you want to remove the first n values from the list. For this, you use drop:
list.drop(n)
If you want to drop values based on a condition, use dropWhile:
list.dropWhile(_ != "30")
To exclude the matching element, use another drop:
list.dropWhile(_ != "12").drop(1)
[ As noted in the comments, using tail could cause exception if the element is not found ]
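Put together, the dropWhile approach could look like the sketch below. The function name elementsAfter is hypothetical, and the list is passed as a parameter instead of being hard-coded as in the question.

```scala
// Returns the elements that follow the first occurrence of n,
// or an empty list when n is absent or is the last element.
def elementsAfter(list: List[String], n: String): List[String] =
  list.dropWhile(_ != n).drop(1)

val list = List("1", "12", "30", "40", "50")

println(elementsAfter(list, "12"))
println(elementsAfter(list, "99"))
```

drop(1) on an empty list returns an empty list, whereas tail would throw, which is why drop is the safer choice when the element may be missing.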

representation of values (x,y) vs x._1,y._1

I am new to Spark using Scala and am very confused by the notations (x, y) and x._1, y._1, especially when one is used in place of the other in Spark transformations.
Could someone explain whether there is a rule of thumb for when to use each of these syntaxes?
Basically there are 2 ways to access a tuple parameter in anonymous function. They're functionally equivalent, use whatever method you prefer.
Through the attributes _1, _2,...
Through pattern matching into variable with meaningful name
val tuples = Array((1, 2), (2, 3), (3, 4))
// Attributes
tuples.foreach { t =>
println(s"${t._1} ${t._2}")
}
// Pattern matching
tuples.foreach { t =>
t match {
case (first, second) =>
println(s"$first $second")
}
}
// Pattern matching can also be written as
tuples.foreach { case (first, second) =>
println(s"$first $second")
}
The notation (x, y) is a tuple of 2 elements, x and y. There are different ways to get access to the individual values in a tuple. You can use the ._1, ._2 notation to get at the elements:
val tup = (3, "Hello") // A tuple with two elements
val number = tup._1 // Gets the first element (3) from the tuple
val text = tup._2 // Gets the second element ("Hello") from the tuple
You can also use pattern matching. One way to extract the two values is like this:
val (number, text) = tup
Unlike a collection (for example, a List) a tuple has a fixed number of values (it's not always exactly two values) and the values can have different types (such as an Int and a String in the example above).
There are many tutorials about Scala tuples, for example: Scala tuple examples and syntax.

Logic on a recursive method

One of my exercises requires me to write a recursive method in which a list is given, and it returns the same list with only every other element on it.
for example : List {"a", "b", "c"} would return
List{"a","c"}
I am writing in Scala, and I understand that it has built-in library methods, but I am not supposed to use those. I can only use if/else, helper methods, and patterns.
How could I walk through a list using head and tail only?
so far I have this:
def removeLetter(list:List[String]):List[String]=list match{
case Nil => Nil
case n::rest=>
if (n == rest){ // I understand that this doesn't quite work.
tail
}
else
head::removeLetter(tail)
}
}
I am looking for the logic and not code.
Using pattern matching, you can also deconstruct a list on its first two elements in the same way you're doing with your n::rest construction. Just remember to also take lists of odd length into account.
You correctly stated one base-case to the recursion: In case of an empty list, the result is again the empty list. case Nil => Nil
There is a second base-case: A list containing a single element is again the list itself. case x :: Nil => x :: Nil
You can formulate the recursive step as follows: Given a list with at least two elements, the result is a list containing the first element, followed by every other element of the list after the second element. case x :: y :: z => x :: removeLetter(z) (Note that here, x and y are both of type String while z is of type List[String])
Remark: If you wanted to, you could also combine both base-cases, as in both cases, the input to the function is its output.
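For readers who do want to see the three cases assembled, a minimal sketch follows. The name everyOther is chosen here for illustration; the skipped second element is written as _ because its value is never used.

```scala
def everyOther(list: List[String]): List[String] = list match {
  case Nil            => Nil                      // base case: empty list
  case x :: Nil       => x :: Nil                 // base case: single element
  case x :: _ :: rest => x :: everyOther(rest)    // keep first, skip second, recurse
}

println(everyOther(List("a", "b", "c")))
println(everyOther(List("a", "b", "c", "d")))
```

The recursion is not tail-recursive, which is fine for an exercise but could overflow the stack on very long lists.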