Replace empty space in array of string in spark scala - scala

I have a scenario where an Array[String] contains an empty string. When I apply replace, it doesn't return the expected result. What is the mistake in my implementation?
scala> val chk2 =Array("8.0","60.0","")
chk2: Array[String] = Array(8.0, 60.0, "")
scala> val chk3 = chk2.map(x => (x.replace("", "0")))
chk3: Array[String] = Array(080.000, 06000.000, 0)

You can use map with pattern matching:
chk2.map{ case "" => "0"; case x => x }
// res2: Array[String] = Array(8.0, 60.0, 0)
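As for why the original attempt misbehaves: replace("", "0") treats the empty string as a match at every position of the input (before each character and at the end), so a "0" gets inserted everywhere instead of only in the empty element. A minimal sketch of two alternatives, assuming the goal is to substitute "0" only for empty elements. The object and helper names (FillEmpty, fillEmpty, fillBlank) are made up for illustration, and the trim variant assumes whitespace-only entries should count as empty too:

```scala
object FillEmpty {
  // Replace only elements that are exactly the empty string.
  def fillEmpty(xs: Array[String]): Array[String] =
    xs.map(x => if (x.isEmpty) "0" else x)

  // Assumption: whitespace-only elements should also become "0".
  def fillBlank(xs: Array[String]): Array[String] =
    xs.map(x => if (x.trim.isEmpty) "0" else x)
}
```

FillEmpty.fillEmpty(chk2) should give the same result as the pattern-matching version above.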

Related

How to define variables by list dropping extra parts?

This is working fine,
val Array(a,b) = "Hello,Bye".split(',')
But this is an error, because the extra information is not ignored:
val Array(a,b) = "Hello,Bye,etc".split(',')
// scala.MatchError: ...
How can I ignore the extra information?
The same error occurs when there are fewer items:
val Array(a,b) = "Hello".split(',')
IMPORTANT: is there no elegant way, like JavaScript's destructuring assignment?
Add a placeholder using underscore:
val Array(a,b, _) = "Hello,Bye,etc".split(',')
EDIT: Using match-case syntax is generally preferred and more flexible (and you can catch all possible outcomes):
val s = "Hello,Bye,etc"
s.split(',') match {
case Array(a) => //...
case Array(a, b) => //...
case Array(a, b, rest @ _*) => //...
case _ => //Catch all case to avoid MatchError
}
@ _* will cover both instances.
val Array(a,b,x @ _*) = "Hello,Bye,etc".split(',')
//a: String = Hello
//b: String = Bye
//x: Seq[String] = ArraySeq(etc)
val Array(c,d,z @ _*) = "Hello,Bye".split(',')
//c: String = Hello
//d: String = Bye
//z: Seq[String] = ArraySeq()
From your comments it looks like you want to default to "", an empty String. I found a way to do it with Stream, which has been deprecated in Scala 2.13, but so far it is the cleanest solution I've found.
val Stream(a,b,c,d,_*) = "one,two,etc".split(",").toStream ++ Stream.continually("")
//a: String = one
//b: String = two
//c: String = etc
//d: String = ""
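Since Stream is deprecated in 2.13, another option that avoids it is to pad the split result with padTo before destructuring. This is only a sketch: the object name PadSplit and the helper fields are made up here, and it assumes "" is the wanted default for missing fields:

```scala
object PadSplit {
  // Split on a comma and pad with "" up to n fields, so a pattern
  // expecting n leading elements always matches. Longer inputs are
  // left untouched (padTo never truncates).
  def fields(s: String, n: Int): Array[String] =
    s.split(',').padTo(n, "")

  // Extra fields are swallowed by _*, missing ones default to "".
  val Array(a, b, c, d, _*) = fields("one,two,etc", 4)
}
```

Here PadSplit.a, b and c hold the three fields and PadSplit.d is the empty-string default.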
I would consider making the result values of type Option[String] by lift-ing the split Array[String] (viewed as a partial function) into an Int => Option[String] function:
val opts = "Hello".split(",").lift
// opts: Int => Option[String] = <function1>
opts(0)
// res1: Option[String] = Some(Hello)
opts(1)
// res2: Option[String] = None
Or, if String values are preferred with None translated to "":
val strs = "Hello,world".split(",").lift.andThen(_.getOrElse(""))
// strs: Int => String = scala.Function1$$Lambda$...
strs(0)
// res3: String = Hello
strs(1)
// res4: String = world
strs(2)
// res5: String = ""
Note that with this approach, you can take as many opts(i) or strs(i), i = 0, 1, 2, ..., as wanted.
You can do this by converting to List first:
val a :: b :: _ = "Hello,Bye,etc".split(',').toList

scala - a better way to parse an array of strings into a single string

I have implemented a method that is supposed to convert an array of strings into a single string, but I get an exception when I wrap it in a UDF and apply the UDF to a column:
val concatUdf = udf(convertArray)
java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Ljava.lang.String;
What should be improved in my current implementation so that it returns a valid String? I'm new to Scala and this is probably not the most elegant solution.
def convertArray: Array[String] => String =
(strings: Array[String]) => {
Option(strings) match {
case Some(arr) => strings.mkString(", ")
case Some(Array()) => ""
case None => ""
}
}
I believe you could just do
def convertArray(strings: Array[String]): String =
if (strings.nonEmpty)
strings.mkString(", ")
else
""
In your code, the second case is unreachable, because the first case always matches, including for empty arrays. That said, your code seems to work fine for me on Scala 2.12.6 (apart from the warning about the unreachable code):
scala> def convertArray: Array[String] => String =
| (strings: Array[String]) => {
| Option(strings) match {
| case Some(arr) => strings.mkString(", ")
| case Some(Array()) => ""
| case None => ""
| }
| }
<console>:15: warning: unreachable code
case Some(Array()) => ""
^
convertArray: Array[String] => String
scala> convertArray(Array())
res1: String = ""
scala> convertArray(Array("bro"))
res2: String = bro
scala> convertArray(Array("bro", "dude"))
res3: String = bro, dude
Just use mkString, no need to re-invent the wheel:
println(Array().mkString(", "))
println(Array("hello").mkString(", "))
println(Array("hello", "world").mkString(", "))
Output:
//empty string
hello
hello, world
def convertArray(strings: Array[String]): String = Option(strings).getOrElse(Array()).mkString(", ")
Output :
scala> def convertArray(strings: Array[String]): String = Option(strings).getOrElse(Array()).mkString(", ")
convertArray: (strings: Array[String])String
scala> convertArray(Array("to", "to", "ta", "ta"))
res16: String = to, to, ta, ta
scala> convertArray(Array())
res17: String = ""
scala> convertArray(null)
res18: String = ""
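On the ClassCastException itself: the message says a WrappedArray could not be cast to a String array, which suggests that Spark hands array columns to a Scala UDF as a Seq rather than as an Array. A sketch under that assumption is to declare the parameter as Seq[String]. The names ConvertSeq and convertSeq are illustrative, and the Spark wiring is shown only as a comment because it was not run here:

```scala
object ConvertSeq {
  // Accepts any Seq[String] (a WrappedArray is one) and tolerates null.
  def convertSeq(strings: Seq[String]): String =
    Option(strings).map(_.mkString(", ")).getOrElse("")

  // Assumed Spark usage (not executed in this sketch):
  //   import org.apache.spark.sql.functions.udf
  //   val concatUdf = udf(convertSeq _)
}
```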

How to check if any element of list starts with any character from given range?

I've got a list of names and given range of characters A..F
I've tried this:
val r = x.filter(_.name.startsWith('A' to 'F'))
but it doesn't work, any suggestions?
If the first-character test is always a range, and not a collection of discrete characters, you can filter on the character's ASCII value.
val r = x.filter(y => y.name.head >= 'A' && y.name.head <= 'F')
"There exists a name from list 1 such that there exists a letter from list 2 such that the head of the name is the letter":
list1.exists(name => list2.exists(letter => name.headOption == Some(letter)))
Examples:
scala> List("Alice", "Bob").exists(name => ('A' to 'F').exists(letter => name.headOption == Some(letter)))
res1: Boolean = true
scala> List("Alice", "Bob").exists(name => ('X' to 'Z').exists(letter => name.headOption == Some(letter)))
res2: Boolean = false
You can use listOfExpectedChars.contains(firstLetter).
scala> val names = Seq("Architects", "Metallica", "Pink floyd", "Foo Fighters")
names: Seq[String] = List(Architects, Metallica, Pink floyd, Foo Fighters)
scala> names.filter(name => ('A' to 'F').contains(name(0)))
res1: Seq[String] = List(Architects, Foo Fighters)
name(0) is equivalent to name.head.
scala> names.filter(name => ('A' to 'F').contains(name.head))
res2: Seq[String] = List(Architects, Foo Fighters)
Note that .head on an empty string throws java.util.NoSuchElementException, so the safer way is to use .headOption:
scala> val names = Seq("Architects", "Metallica", "Pink floyd", "Foo Fighters", "")
names: Seq[String] = List(Architects, Metallica, Pink floyd, Foo Fighters, "")
scala> names.filter(name => ('A' to 'F').map(Option(_)).contains(name.headOption))
res3: Seq[String] = List(Architects, Foo Fighters)
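The range comparison and the headOption safety can also be combined without building a collection of Options. A small sketch, where the object name and the parameters lo and hi are hypothetical names for the range bounds:

```scala
object StartsInRange {
  // Keep names whose first character lies in [lo, hi]. Empty names
  // have no first character, so headOption.exists drops them.
  def startsInRange(names: Seq[String], lo: Char, hi: Char): Seq[String] =
    names.filter(_.headOption.exists(c => c >= lo && c <= hi))
}
```

Called with the names list above and the bounds 'A' and 'F', this keeps the same two entries.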

Scala: How to convert a Seq[Array[String]] into Seq[Double]?

I need to split up the data in Seq[Array[String]] type into two Seq[Double] type items.
Sample data : ([4.0|1492168815],[11.0|1491916394],[2.0|1491812028]).
I used
var action1, timestamp1 = seq.map(t =>
(t.split("|"))).flatten.asInstanceOf[Seq[Double]]
but didn't get the results as expected. Looking out for valuable suggestions.
Assuming your input is in format "[double1|double2]",
scala> Seq("[4.0|1492168815]","[11.0|1491916394]","[2.0|1491812028]")
res72: Seq[String] = List([4.0|1492168815], [11.0|1491916394], [2.0|1491812028])
Drop [ and ], then split on \\| (the | has to be escaped because it is a metacharacter in regex).
scala> res72.flatMap {_.dropRight(1).drop(1).split("\\|").toList}.map{_.toDouble}
res74: Seq[Double] = List(4.0, 1.492168815E9, 11.0, 1.491916394E9, 2.0, 1.491812028E9)
Or you can do
scala> val actTime = seq.flatMap(t => t.map(x => { val temp = x.split("\\|"); (temp(0), temp(1))}))
actTime: Seq[(String, String)] = List((4.0,1492168815), (11.0,1491916394), (2.0,1491812028))
And to separate them into two Seq[Double] you can do
scala> val action1 = actTime.map(_._1.toDouble)
action1: Seq[Double] = List(4.0, 11.0, 2.0)
scala> val timestamp1 = actTime.map(_._2.toDouble)
timestamp1: Seq[Double] = List(1.492168815E9, 1.491916394E9, 1.491812028E9)
If there could be non-double data in the input, you should use Try for safer Double conversion:
scala> Seq("[4.0|1492168815]","[11.0|1491916394]","[2.0|1491812028]", "[abc|abc]")
res75: Seq[String] = List([4.0|1492168815], [11.0|1491916394], [2.0|1491812028], [abc|abc])
scala> import scala.util.Success
import scala.util.Success
scala> import scala.util.Try
import scala.util.Try
scala> res75.flatMap {_.dropRight(1).drop(1).split("\\|").toList}
.map{d => Try(d.toDouble)}
.collect {case Success(x) => x }
res83: Seq[Double] = List(4.0, 1.492168815E9, 11.0, 1.491916394E9, 2.0, 1.491812028E9)
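To make the split("|") problem from the question concrete: an unescaped | is a regex alternation between two empty patterns, so it matches between characters instead of at the pipe. The sketch below compares the three ways of splitting. The object and value names are made up, and the Char overload is offered as an alternative that was not in the answers above:

```scala
object SplitPipe {
  val raw = "4.0|1492168815"

  // Unescaped "|" matches the empty string, so the input is cut
  // between characters rather than at the pipe.
  val wrong: Array[String] = raw.split("|")

  // Escaped regex, as in the answer above.
  val escaped: Array[String] = raw.split("\\|")

  // Char overload: the delimiter is taken literally, no regex involved.
  val byChar: Array[String] = raw.split('|')
}
```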
Extract each item in the input list with regular expression groups delimited with [, | and ],
val pat = "\\[(.*)\\|(.*)\\]".r
Hence if we suppose an input such as
val xs = List("[4.0|1492168815]","[11.0|1491916394]","[2.0|1491812028]")
consider
xs.map { v => val pat(a,b) = v; (a.toDouble, b.toLong) }.unzip
where we apply the regex defined in pat onto each item of the list, tuple each group for each item and finally unzip them so that we bisect the tuples into separate collections; viz.
(List(4.0, 11.0, 2.0),List(1492168815, 1491916394, 1491812028))
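The val pat(a,b) = v binding throws a MatchError when an item does not match the pattern. If malformed items are possible, a sketch that combines the regex with Try and skips bad entries could look like this. ParsePairs and parse are hypothetical names, and both fields are parsed as Double because the question asks for two Seq[Double]:

```scala
import scala.util.Try

object ParsePairs {
  private val Pair = """\[(.*)\|(.*)\]""".r

  // Items that do not match the pattern, or whose groups are not
  // numeric, are silently skipped.
  def parse(xs: Seq[String]): (Seq[Double], Seq[Double]) =
    xs.flatMap {
      case Pair(a, b) => Try((a.toDouble, b.toDouble)).toOption
      case _          => None
    }.unzip
}
```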

Scala reverse string

I'm a newbie to scala, I'm just writing a simple function to reverse a given string:
def reverse(s: String) : String
for(i <- s.length - 1 to 0) yield s(i)
The yield gives back a scala.collection.immutable.IndexedSeq[Char], which cannot be converted to a String (or is it something else?).
How do I write this function?
Note that there is an already defined function:
scala> val x = "scala is awesome"
x: java.lang.String = scala is awesome
scala> x.reverse
res1: String = emosewa si alacs
But if you want to do that by yourself:
def reverse(s: String) : String =
(for(i <- s.length - 1 to 0 by -1) yield s(i)).mkString
or (sometimes it is better to use until, but probably not in that case)
def reverse(s: String) : String =
(for(i <- s.length until 0 by -1) yield s(i-1)).mkString
Also, note that when you count down (from a larger value to a smaller one) you must specify a negative step, or you will get an empty range:
scala> for(i <- x.length until 0) yield i
res2: scala.collection.immutable.IndexedSeq[Int] = Vector()
scala> for(i <- x.length until 0 by -1) yield i
res3: scala.collection.immutable.IndexedSeq[Int] = Vector(16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
Here's a short version
def reverse(s: String) = ("" /: s)((a, x) => x + a)
Edit: or even shorter, we have the fantastically cryptic
def reverse(s: String) = ("" /: s)(_.+:(_))
but I wouldn't really recommend this...
You could also write this using a recursive approach (throwing this one in just for fun)
def reverse(s: String): String = {
if (s.isEmpty) ""
else reverse(s.tail) + s.head
}
As indicated by om-nom-nom, pay attention to the by -1 (otherwise you are not really iterating and your result will be empty). The other trick you can use is collection.breakOut.
It can also be provided to the for comprehension like this:
def reverse(s: String): String =
(for(i <- s.length - 1 to 0 by -1) yield s(i))(collection.breakOut)
reverse("foo")
// String = oof
The benefit of using breakOut is that it avoids creating an intermediate structure, as the mkString solution does.
Note: breakOut leverages CanBuildFrom and builders, which are part of the foundation of the redesigned collection library introduced in Scala 2.8.0.
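collection.breakOut was removed in Scala 2.13, so the snippet above only compiles on earlier versions. One way to keep the same loop without an intermediate collection is to append to a StringBuilder directly. This is a sketch, and the names ReverseBuilder and reverseSb are made up:

```scala
object ReverseBuilder {
  // Walk the indices from last to first and append each character
  // to a pre-sized builder; no intermediate Seq[Char] is created.
  def reverseSb(s: String): String = {
    val sb = new StringBuilder(s.length)
    for (i <- s.length - 1 to 0 by -1) sb.append(s(i))
    sb.toString
  }
}
```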
All the above answers are correct and here's my take:
scala> val reverseString = (str: String) => str.foldLeft("")((accumulator, nextChar) => nextChar + accumulator)
reverseString: String => java.lang.String = <function1>
scala> reverseString.apply("qwerty")
res0: java.lang.String = ytrewq
def rev(s: String): String = {
val str = s.toList
def f(s: List[Char], acc: List[Char]): List[Char] = s match {
case Nil => acc
case x :: xs => f(xs, x :: acc)
}
f(str, Nil).mkString
}
Here is my version of reversing a string.
scala> val sentence = "apple"
sentence: String = apple
scala> sentence.map(x => x.toString).reduce((x, y) => (y + x))
res9: String = elppa
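One caveat: reduce throws on an empty collection, so the version above fails for "". A small sketch that keeps the same idea but falls back to an empty string (the name reverseReduce is only for illustration):

```scala
object ReverseReduce {
  // Same prepend-while-reducing idea; reduceOption returns None for
  // an empty input instead of throwing.
  def reverseReduce(s: String): String =
    s.map(x => x.toString).reduceOption((x, y) => y + x).getOrElse("")
}
```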