Converting empty string to a None in Scala - scala

I have a requirement to concatenate two potentially empty address lines into one (with a space in between the two lines), but I need it to return a None if both address lines are None (this field is going into an Option[String] variable). The following command gets me what I want in terms of the concatenation:
Seq(myobj.address1, myobj.address2).flatten.mkString(" ")
But that gives me an empty string instead of a None in case address1 and address2 are both None.

This converts a single string to Option, converting it to None if it's either null or an empty-trimmed string:
(kudos to @Miroslav Machura for this simpler version)
Option(x).filter(_.trim.nonEmpty)
Alternative version, using collect:
Option(x).collect { case x if x.trim.nonEmpty => x }
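Applied to the original two-address-line problem, the same filter can be run on each line before joining. This is only a sketch: the helper name joinAddress and the sample addresses are made up for illustration and do not come from any of the answers.

```scala
// Hypothetical helper (the name is an assumption, not from the posts above).
// Trims each present line, drops blank ones, and joins the rest with a space.
def joinAddress(lines: Option[String]*): Option[String] = {
  val parts = lines.flatMap(_.map(_.trim).filter(_.nonEmpty))
  if (parts.isEmpty) None else Some(parts.mkString(" "))
}

val both = joinAddress(Some("12 Main St"), Some("Apt 4")) // both lines present
val one  = joinAddress(Some("12 Main St"), None)          // only the first line
val none = joinAddress(None, Some("  "))                  // nothing usable
```

Because blank lines are dropped before joining, a lone address line never picks up a leading or trailing space.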

Assuming:
val list1 = List(Some("aaaa"), Some("bbbb"))
val list2 = List(None, None)
Using plain Scala:
scala> Option(list1).map(_.flatten).filter(_.nonEmpty).map(_.mkString(" "))
res38: Option[String] = Some(aaaa bbbb)
scala> Option(list2).map(_.flatten).filter(_.nonEmpty).map(_.mkString(" "))
res39: Option[String] = None
Or using scalaz:
import scalaz._; import Scalaz._
scala> list1.flatten.toNel.map(_.toList.mkString(" "))
res35: Option[String] = Some(aaaa bbbb)
scala> list2.flatten.toNel.map(_.toList.mkString(" "))
res36: Option[String] = None

Scala has the Option[ T ] type, which is intended to eliminate the run-time problems caused by nulls.
An Option[ T ] holds one of two kinds of value, Some[ T ] or None. Here is how you use it:
// A nice string
var niceStr = "I am a nice String"
// A nice String option
var niceStrOption: Option[ String ] = Some( niceStr )
// A None option
var noneStrOption: Option[ String ] = None
Now coming to your part of problem:
// If myobj.address1 and myobj.address2 were plain Strings, you would not need to flatten them; this would work:
// var yourString = Seq(myobj.address1, myobj.address2).mkString(" ")
// But since both of them are Option[ String ], you have to flatten the Sequence[ Option[ String ] ] into a Sequence[ String ]
var yourString = Seq(myobj.address1, myobj.address2).flatten.mkString(" ")
//So... what really happens when you flatten a Sequence[ Option[ String ] ] ?
// Lets say we have Sequence[ Option [ String ] ], like this
var seqOfStringOptions = Seq( Some( "dsf" ), None, Some( "sdf" ) )
print( seqOfStringOptions )
// List( Some(dsf), None, Some(sdf))
//Now... lets flatten it out...
var flatSeqOfStrings = seqOfStringOptions.flatten
print( flatSeqOfStrings )
// List( dsf, sdf )
// So... basically all those Option[ String ] which were None are ignored and only Some[ String ] are converted to Strings.
// So... that means if both address1 and address2 were None... your flattened list would be empty.
// Now what happens when we create a String out of an empty list of Strings...
var emptyStringList: List[ String ] = List()
var stringFromEmptyList = emptyStringList.mkString( " " )
print( stringFromEmptyList )
// ""
// So... you get an empty String
// Which means we are sure that yourString will always be a String... though it can be empty (ie - "").
// Now that we are sure that yourString will always be a String, we can use pattern matching to get our Option[ String ] .
// Getting an appropriate Option for yourString
var yourRequiredOption: Option[ String ] = yourString match {
// In case yourString is "" give None.
case "" => None
// If case your string is not "" give Some[ yourString ]
case someStringVal => Some( someStringVal )
}

You might also use the reduce method here:
val mySequenceOfOptions = Seq(myAddress1, myAddress2, ...)
mySequenceOfOptions.reduce[Option[String]] {
case(Some(soFar), Some(next)) => Some(soFar + " " + next)
case(None, next) => next
case(soFar, None) => soFar
}
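Note that reduce throws an UnsupportedOperationException on an empty Seq. A shorter variant, offered here as an alternative sketch rather than as the answerer's method, flattens first and then uses reduceOption, which returns None when nothing is left to combine. The sample values are made up.

```scala
// Sample inputs (assumptions for illustration only).
val address1: Option[String] = Some("12 Main St")
val address2: Option[String] = None

// flatten drops the Nones; reduceOption joins what remains, or yields None.
val merged: Option[String] =
  Seq(address1, address2).flatten.reduceOption(_ + " " + _)

val nothing: Option[String] =
  Seq(Option.empty[String], None).flatten.reduceOption(_ + " " + _)
```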

Here's a function that should solve the original problem.
def mergeAddresses(addr1: Option[String],
addr2: Option[String]): Option[String] = {
val str = s"${addr1.getOrElse("")} ${addr2.getOrElse("")}"
if (str.trim.isEmpty) None else Some(str.trim) // trim so a single address has no stray space
}

The answer from @dk14 is incomplete: if list2 contains a Some(""), the result is not None, because the flattened list List("") is non-empty, so filter() keeps it and mkString produces an empty string (ScalaFiddle link)
val list2 = List(None, None, Some(""))
// this yields Some()
println(Option(list2).map(_.flatten).filter(_.nonEmpty).map(_.mkString(" ")))
But it's close. You just need to ensure that an empty string is converted to None, so we combine it with @juanmirocks' answer (ScalaFiddle link):
val list1 = List(Some("aaaa"), Some("bbbb"))
val list2 = List(None, None, Some(""))
// yields Some(aaaa bbbb)
println(Option(list1.map(_.collect { case x if x.trim.nonEmpty => x }))
.map(_.flatten).filter(_.nonEmpty).map(_.mkString(" ")))
// yields None
println(Option(list2.map(_.collect { case x if x.trim.nonEmpty => x }))
.map(_.flatten).filter(_.nonEmpty).map(_.mkString(" ")))

I was looking for a helper function like the one below in the standard library but have not found one yet, so in the meantime I defined my own:
def string_to_Option(x: String): Option[String] = {
if (x.nonEmpty)
Some(x)
else
None
}
with the help of the above you can then:
import scala.util.chaining.scalaUtilChainingOps
object TEST123 {
def main(args: Array[String]): Unit = {
val address1 = ""
val address2 = ""
val result =
Seq(
address1 pipe string_to_Option,
address2 pipe string_to_Option
).flatten.mkString(" ") pipe string_to_Option
println(s"The result is «${result}»")
// prints out: The result is «None»
}
}

With Scala 2.13:
Option.unless(address.isEmpty)(address)
For example:
val address = "foo"
Option.unless(address.isEmpty)(address) // Some("foo")
val address = ""
Option.unless(address.isEmpty)(address) // None

implicit class EmptyToNone(s: String):
  def toOption: Option[String] = if (s.isEmpty) None else Some(s)
Example:
scala> "".toOption
val res0: Option[String] = None
scala> "foo".toOption
val res1: Option[String] = Some(foo)
(tested with Scala 3.2.2)

Related

How to input and output an Seq of an object to a function in Scala

I want to parse a column to get split values using Seq of an object
case class RawData(rawId: String, rawData: String)
case class SplitData(
rawId: String,
rawData: String,
split1: Option[Int],
split2: Option[String],
split3: Option[String],
split4: Option[String]
)
def rawDataParser(unparsedRawData: Seq[SplitData]): Seq[SplitData] = {
unparsedRawData.map(rawData => {
val split = rawData.rawData.split(", ")
rawData.copy(
split1 = Some(split(0).toInt),
split2 = Some(split(1)),
split3 = Some(split(2)),
split4 = Some(split(3))
)
})
}
val rawDataDF= Seq[(String, String)](
("001", "Split1, Split2, Split3, Split4"),
("002", "Split1, Split2, Split3, Split4")
).toDF("rawDataID", "rawData")
val rawDataDS: Dataset[RawData] = rawDataDF.as[RawData]
I need to use the rawDataParser function to parse my rawData. However, the function's parameter is of type Seq, and I am not sure how to pass rawDataDS to it. Any guidance is appreciated.
Each Dataset is divided into partitions. You can use mapPartitions with a function Iterator[T] => Iterator[U] to convert a Dataset[T] into a Dataset[U].
So you can wrap your parser (called addressParser below) and pass it to mapPartitions:
val rawAddressDataDS =
spark.read
.option("header", "true")
.csv(csvFilePath)
.as[AddressRawData]
val addressDataDS =
rawAddressDataDS
.map { rad =>
AddressData(
addressId = rad.addressId,
address = rad.address,
number = None,
road = None,
city = None,
country = None
)
}
.mapPartitions { unparsedAddresses =>
addressParser(unparsedAddresses.toSeq).toIterator
}
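The function passed to mapPartitions is just a mapping from Iterator to Iterator, so the adapter can be tried without Spark. A minimal sketch, assuming a hypothetical helper name perPartition and a toy parser standing in for addressParser; neither is part of Spark or of the question:

```scala
// Hypothetical adapter: turns a Seq-based function into the
// Iterator[A] => Iterator[B] shape that mapPartitions expects.
def perPartition[A, B](f: Seq[A] => Seq[B]): Iterator[A] => Iterator[B] =
  partition => f(partition.toSeq).iterator

// Toy parser (an assumption): splits "a, b" style strings into fields.
def toyParser(rows: Seq[String]): Seq[Array[String]] =
  rows.map(_.split(", "))

// A plain Iterator plays the role of one partition here.
val parsed: List[List[String]] =
  perPartition(toyParser)(Iterator("1, Main St", "2, High St"))
    .map(_.toList)
    .toList
```

Keep in mind that calling toSeq hands the whole partition to the parser as one collection, which may not suit very large partitions.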

How to split a Scala list into sublists based on a list of indexes

I have a function which should take in a long string and separate it into a list of strings where each list element is a sentence of the article. I am going to achieve this by splitting on space and then grouping the elements from that split according to the tokens which end with a dot:
def getSentences(article: String): List[String] = {
val separatedBySpace = article
.map((c: Char) => if (c == '\n') ' ' else c)
.split(" ")
val splitAt: List[Int] = Range(0, separatedBySpace.size)
.filter(i => endsWithDot(separatedBySpace(i))).toList
// TODO
}
I have separated the string on space, and I've found each index that I want to group the list on. But how do I now turn separatedBySpace into a list of sentences based on splitAt?
Example of how it should work:
article = "I like donuts. I like cats."
result = List("I like donuts.", "I like cats.")
PS: Yes, I know that my algorithm for splitting the article into sentences has flaws; I just want a quick, naive method to get the job done.
I ended up solving this by using recursion:
def getSentenceTokens(article: String): List[List[String]] = {
val separatedBySpace: List[String] = article
.replace('\n', ' ')
.replaceAll(" +", " ") // regex
.split(" ")
.toList
val splitAt: List[Int] = separatedBySpace.indices
.filter(i => ( i > 0 && endsWithDot(separatedBySpace(i - 1)) ) || i == 0)
.toList
groupBySentenceTokens(separatedBySpace, splitAt, List())
}
def groupBySentenceTokens(tokens: List[String], splitAt: List[Int], sentences: List[List[String]]): List[List[String]] = {
if (splitAt.size <= 1) {
if (splitAt.size == 1) {
sentences :+ tokens.slice(splitAt.head, tokens.size)
} else {
sentences
}
}
else groupBySentenceTokens(tokens, splitAt.tail, sentences :+ tokens.slice(splitAt.head, splitAt.tail.head))
}
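The same grouping can also be written without explicit recursion by pairing each start index with the next one. A sketch, assuming the index list holds the start position of every sentence (as splitAt does above); the helper name sliceAt is hypothetical:

```scala
// Hypothetical helper: cuts `tokens` into consecutive chunks, where
// `starts` lists the index at which each chunk begins.
def sliceAt[A](tokens: List[A], starts: List[Int]): List[List[A]] =
  (starts :+ tokens.size)   // add the end of the list as the final boundary
    .sliding(2)             // consecutive (from, until) pairs
    .collect { case List(from, until) => tokens.slice(from, until) }
    .toList

val words = "I like donuts. I like cats.".split(" ").toList
val sentences = sliceAt(words, List(0, 3)).map(_.mkString(" "))
```

With an empty index list the sliding window has only one element, so collect matches nothing and the result is an empty list.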
val s: String = """I like donuts. I like cats
This is amazing"""
s.split("\\.|\n").map(_.trim).toList
//result: List[String] = List("I like donuts", "I like cats", "This is amazing")
To include the dots in the sentences:
val (a, b, _) = s.replace("\n", " ").split(" ")
.foldLeft((List.empty[String], List.empty[String], "")){
case ((temp, result, finalStr), word) =>
if (word.endsWith(".")) {
(List.empty[String], result ++ List(s"$finalStr${(temp ++ List(word)).mkString(" ")}"), "")
} else {
(temp ++ List(word), result, finalStr)
}
}
val result = b ++ List(a.mkString(" ").trim)
//with s = "I like donuts. I like cats.\nThis is amazing" (note the dot after "cats"):
//result = List("I like donuts.", "I like cats.", "This is amazing")

How to access individual value of a map in a Map[String,(String, String)]

How can I access the individual values in a Map of type Map[String,(String, String)]? Based on the input string, I want to return the first or the second element of the value tuple if the argument matches a key, or the argument itself if there is no match.
val mappeddata = Map("LOWES" -> ("Lowes1","Lowes2"))
Updated.
The code below works when none of the values are empty:
scala> mappeddata.find(_._1 == "LOWES").map(_._2._2).getOrElse("LOWES")
res135: String = Lowes2
scala> mappeddata.find(_._1 == "LOWES").map(_._2._1).getOrElse("LOWES")
res136: String = Lowes1
But if the value is empty, I want to return the input string itself; instead it returns an empty string:
scala> val mappeddata = Map("LOWES" -> ("Lowes1",""))
mappeddata: scala.collection.immutable.Map[String,(String, String)] = Map(LOWES -> (Lowes1,""))
scala> mappeddata.find(_._1 == "LOWES").map(_._2._2).getOrElse("LOWES")
res140: String = ""
what needs to be done to fix this?
Basically you are asking how to get the value part of a Map entry. In the example below, I am extracting Lowes2.
val m = Map("LOWES" -> ("Lowes1","Lowes2"), "Other" -> ("other1","other2"))
println(m.get("LOWES").get._2) // will print Lowes2
Not sure what you want but maybe this is helpful:
val m = Map[String, (String, String)]()
val value = m("first") // value if the key exists, otherwise throws NoSuchElementException
val valueOpt: Option[(String, String)] = m.get("first") // as an Option
val values: List[(String, String)] = m.map(_._2).toList // list of values
This works.
scala> if (mappeddata.get("LOWES").get._1.isEmpty) "LOWES" else mappeddata.get("LOWES").get._1
res163: String = Lowes1
scala> if (mappeddata.get("LOWES").get._2.isEmpty) "LOWES" else mappeddata.get("LOWES").get._2
res164: String = LOWES
//Updated
scala> if (mappeddata("LOWES")._1.isEmpty) "LOWES" else mappeddata("LOWES")._1
res163: String = Lowes1
scala> if (mappeddata("LOWES")._2.isEmpty) "LOWES" else mappeddata("LOWES")._2
res164: String = LOWES
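The same fallback can be expressed as one Option chain, which also covers a missing key instead of throwing. A sketch only; the helper name valueOrKey and the "TARGET" key are assumptions for illustration:

```scala
// Hypothetical helper: returns the picked tuple element, or the key itself
// when the key is missing or the picked element is empty.
def valueOrKey(m: Map[String, (String, String)], key: String)(
    pick: ((String, String)) => String): String =
  m.get(key).map(pick).filter(_.nonEmpty).getOrElse(key)

val sample  = Map("LOWES" -> (("Lowes1", "")))
val first   = valueOrKey(sample, "LOWES")(_._1)  // non-empty value is returned
val second  = valueOrKey(sample, "LOWES")(_._2)  // empty value falls back to the key
val missing = valueOrKey(sample, "TARGET")(_._1) // unknown key falls back to the key
```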

Trouble with Scala pattern matching with map - required String

I'm trying to turn my RDD into a pair RDD, but I'm having trouble with the pattern matching and have no idea what I'm doing wrong.
val test = sc.textFile("neighborhood_test.csv");
val nhead0 = test.first;
val test_split = test.map(line => line.split("\t"));
val nhead = test_split.first;
val test_neigh0 = test.filter(line => line!= nhead0);
//test_neigh0.first = 3335 Dunlap Seattle
val test_neigh1 = test_neigh0.map(line => line.split("\t"));
//test_neigh1.first = Array[String] = Array(3335, Dunlap, Seattle)
val test_neigh = test_neigh1.map({case (id, neigh, city) => (id, (neigh, city))});
Gives error:
found : (T1, T2, T3)
required: String
val test_neigh = test_neigh0.map({case (id, neigh, city) => (id, (neigh, city))});
EDIT:
The input file is tab-separated and looks like this:
id neighbourhood city
3335 Dunlap Seattle
4291 Roosevelt Seattle
5682 South Delridge Seattle
As output I want a pair RDD with id as key and (neigh, city) as value.
Neither test_neigh0.first nor test_neigh1.first is a triple, so you cannot pattern match it as such.
The elements in test_neigh1 are Array[String]. Under the assumption that these arrays are all of length 3, you can pattern match against them as { case Array(id, neigh, city) => ...}.
To make sure that you won't get a match error if one of the lines has more or fewer than 3 elements, you can collect on this pattern match instead of mapping on it.
val test_neigh: RDD[(String, (String, String))] = test_neigh1.collect{
case Array(id, neigh, city) => (id, (neigh, city))
}
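The difference between map and collect with an Array pattern can be seen on an ordinary List, without Spark. In this sketch the List merely stands in for the RDD, and the malformed line is made up for illustration:

```scala
// Plain List standing in for the RDD; the second entry has no tabs.
val rawLines = List(
  "3335\tDunlap\tSeattle",
  "malformed line",
  "5682\tSouth Delridge\tSeattle"
)

// collect keeps only the arrays that match the three-element pattern,
// so the malformed line is skipped instead of raising a MatchError.
val pairs: List[(String, (String, String))] =
  rawLines.map(_.split("\t")).collect {
    case Array(id, neigh, city) => (id, (neigh, city))
  }
```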
EDIT
The issues you experienced as described in your comment are related to RDD[_] not being a usual collection (such as List, Array or Set). To avoid those, you might need to fetch elements in the array without pattern matching:
val test_neigh: RDD[(String, (String, String))] = test_neigh0.map(line => {
val arr = line.split("\t")
(arr(0), (arr(1), arr(2)))
})
val baseRDD = sc.textFile("neighborhood_test.csv").filter { x => !x.contains("city") }
baseRDD.map { x =>
val split = x.split("\t")
(split(0), (split(1), split(2)))
}.groupByKey().foreach(println(_))
Result:
(3335,CompactBuffer((Dunlap,Seattle)))
(4291,CompactBuffer((Roosevelt,Seattle)))
(5682,CompactBuffer((South Delridge,Seattle)))

Count occurrences of word in file

Below code attempts to count the number of times "Apple" appears in an HTML file.
object Question extends App {
def validWords(fileSentancesPart: List[String], wordList: List[String]): List[Option[String]] =
fileSentancesPart.map(sentancePart => {
if (isWordContained(wordList, sentancePart)) {
Some(sentancePart)
} else {
None
}
})
def isWordContained(wordList: List[String], sentancePart: String): Boolean = {
for (word <- wordList) {
if (sentancePart.contains(word)) {
return true;
}
}
false
}
lazy val lines = scala.io.Source.fromFile("c:\\data\\myfile.txt" , "latin1").getLines.toList.map(m => m.toUpperCase.split(" ")).flatten
val vw = validWords(lines, List("APPLE")) .flatten.size
println("size is "+vw)
}
The count is 79 according to the Scala code, but when I open the file in a text editor it finds 81 words containing "Apple". The search is case-insensitive. Can you spot where the bug is? (I'm assuming the bug is in my code and not the text editor!)
I've written a couple of tests, but the code seems to behave as expected in these simple use cases:
import scala.collection.mutable.Stack;
import org.scalatest.FlatSpec;
import org.scalatest._;
class ConvertTes extends FlatSpec {
"Valid words" should "be returned" in {
val fileWords = List("this" , "is" , "apple" , "applehere")
val validWords = List("apple")
lazy val lines = scala.io.Source.fromFile("c:\\data\\myfile.txt" , "latin1").getLines.toList.map(m => m.toUpperCase.split(" ")).flatten
val l : List[String] = validWords(fileWords, validWords).flatten
l.foreach(println)
}
"Entire line " should "be returned for matched word" in {
val fileWords = List("this" , "is" , "this apple is an" , "applehere")
val validWords = List("apple")
val l : List[String] = validWords(fileWords, validWords).flatten
l.foreach(println)
}
}
The HTML file being parsed (referred to as "c:\data\myfile.txt") in code above :
https://drive.google.com/file/d/0B1TIppVWd0LSVG9Edl9OYzh4Q1U/view?usp=sharing
Any suggestions on alternatives to code above welcome.
I think my issue is as described in @Jack Leow's comment. For this code:
val fileWords = List("this", "is", "this appleisapple an", "applehere")
val validWords = List("apple")
val l: List[String] = validWords(fileWords, validWords).flatten
println("size : " + l.size)
size printed is 2, when it should be 3
I think you should do the following:
def validWords(
fileSentancesPart: List[String],
wordList: List[String]): List[Option[String]] =
fileSentancesPart /* add flatMap */ .flatMap(_.tails)
.map(sentancePart => {
if (isWordContained(wordList, sentancePart)) {
Some(sentancePart)
} else {
None
}
})
def isWordContained(
wordList: List[String],
sentancePart: String): Boolean = {
for (word <- wordList) {
//if (sentancePart.contains(word)) {
if (sentancePart.startsWith(word)) { // use startsWith
return true;
}
}
false
}
You could use a regular expression with a Source iterator:
import scala.io.Source
val regex = "(?i)apple".r // (?i) makes the match case-insensitive
val count = Source.fromFile("/test.txt").getLines().map(regex.findAllIn(_).length).sum
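To check the counting logic without a file, the regular-expression approach can be run on the sample list from the question. A small sketch, assuming a case-insensitive pattern is wanted (the question says the editor search is case-insensitive); the name appleRegex is made up:

```scala
// Sample tokens from the question; "appleisapple" contains two occurrences.
val fileWords = List("this", "is", "this appleisapple an", "applehere")

// (?i) turns on case-insensitive matching.
val appleRegex = "(?i)apple".r

// Count every occurrence in every token, not just tokens that contain a match.
val total = fileWords.map(w => appleRegex.findAllIn(w).length).sum
```

Counting matches rather than matching tokens is what accounts for a token that contains the word more than once.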