Splitting String using first entry - scala

Say I have a list:
val l = List("and", "or", "up to").
I want to check whether any of the entries in l is present in a string. If one is, the string should be split on the first entry of l that is found. If none is, the entire string should be returned.
For example, the string "1.5 litres of milk or 2 apples" should return List("1.5 litres of milk", "2 apples").
On the other hand a string like "1.5 litres of milk along with 2 apples" should return List("1.5 litres of milk along with 2 apples").

Presuming (from your comment) that "caseLine" contains the text to be split, you can try:
l.find(caseLine.contains).map(caseLine.split)
.map(_.toList.map(_.trim))
.getOrElse(caseLine :: Nil)
find returns the first element of l that occurs in caseLine, as an Option. That Option is mapped to caseLine split on that element, and each piece is then trimmed. Finally, getOrElse returns the original caseLine if find returned None. Both outcomes are converted to List[String] so that the result has a consistent type.

Try this:
def splitOnFirstMatch(splitters: List[String], s: String): List[String] = {
  splitters.find(s.contains) match {
    case Some(x) => s.split(x).map(_.trim).toList
    case None => List(s)
  }
}
This first finds the first string in splitters that is contained in s, then returns s split on that string. If no splitter is found, it returns a list containing only the input string.
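Both answers above rely on two behaviours worth knowing: contains matches substrings (so "or" is also found inside "fork"), and String.split treats its argument as a regular expression. The sketch below is one possible variant, not the answerers' code: the helper name splitOnFirstWord is hypothetical, and it assumes the splitters should only match as whole words.

```scala
import java.util.regex.Pattern

object SplitOnFirstMatch {
  // Hypothetical helper: like splitOnFirstMatch, but a splitter only counts
  // when it appears as a whole word, and it is quoted so that regex
  // metacharacters in a splitter are taken literally.
  def splitOnFirstWord(splitters: List[String], s: String): List[String] = {
    val patterns = splitters.map(w => ("\\b" + Pattern.quote(w) + "\\b").r)
    // As in the answers, "first" means first in the splitters list.
    patterns.find(_.findFirstIn(s).isDefined) match {
      case Some(re) => re.split(s).map(_.trim).toList
      case None     => List(s)
    }
  }
}
```

With this variant, "a fork for 2 apples" is returned unsplit, whereas the substring-based versions would split it on "or".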

Related

How can i split string to list using scala

I have the string str = "one,two,(three,four), five"
I want to split this string into a list, like this:
List[String] = ("one", "two", "(three, four)", "five")?
I have no idea how to do this.
Thanks.
We can try matching on the pattern \(.*?\)|[^, ]+:
val str = "one,two,(three,four), five"
val re = """\(.*?\)|[^, ]+""".r
for(m <- re.findAllIn(str)) println(m)
This prints:
one
two
(three,four)
five
This regex first tries to match a (...) term. Failing that, it matches a run of characters other than commas and spaces, consuming one CSV term at a time. Trying the parenthesized alternative first avoids splitting on the commas inside (...).
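Since the question asks for a List[String] rather than printed output, the matches can be collected directly. A minimal sketch, assuming the same pattern as above (the object and method names are only for illustration):

```scala
object ParenAwareSplit {
  // Same pattern as in the answer: a parenthesized group first,
  // otherwise a run of characters that are neither commas nor spaces.
  private val term = """\(.*?\)|[^, ]+""".r

  def splitTerms(str: String): List[String] =
    term.findAllIn(str).toList
}
```

Note that the parenthesized term is returned exactly as it appears in the input, so no space is inserted after the inner comma.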

Finding the average of option values containing object in a list

I'm new to Scala and I have been struggling with Option and Lists. I have the following object:
object Person {
  case class Person(fName: String,
                    lName: String,
                    jeans: Option[Jeans],
                    jacket: Option[Jacket],
                    location: List[Locations],
                    age: Int)
  case class Jeans(brand: String, price: Int, color: String)
  ...
}
And I'm trying to write a function that takes a list of Person as input and returns the average price of their jeans:
def avgPriceJeans(input: List[Person]): Int
When you have a list of values and want to reduce them all to a single value by applying some operation, you need a fold. The most common one is foldLeft.
As the scaladoc shows, this method receives an initial value and a combination function.
The initial value should be zero, and the combination function should take the current accumulator and add the price of the current jeans to it.
However, there is another problem: the jeans may or may not exist, which is why they are an Option. We need a way to say "if they exist, give me their price; if not, give me a default value" (here it makes sense for the default to be another zero).
That is precisely what Option.fold gives us.
Thus we end with something like:
val sum = input.foldLeft(0) {
  (acc, person) =>
    acc + person.jeans.fold(ifEmpty = 0)(_.price)
}
Since you need the average, you only have to divide that sum by the count.
However, we can compute the count in the same foldLeft, to avoid an extra iteration.
(I changed the return type, as well as the price property, to Double to ensure accurate results.)
def avgPriceJeans(input: List[Person]): Double = {
  val (sum, count) = input.foldLeft((0.0d, 0)) {
    case ((accSum, accCount), person) =>
      (
        accSum + person.jeans.fold(ifEmpty = 0.0d)(_.price),
        accCount + 1
      )
  }
  sum / count
}
As @SethTissue points out, this is relatively straightforward:
val prices = persons.flatMap(_.jeans.map(_.price))
prices.sum.toDouble / prices.length
The first line needs some unpicking:
Taking this from the inside out, the expression jeans.map(_.price) takes the value of jeans, which is Option[Jeans], and extracts the price field to give Option[Int]. The flatMap call is equivalent to map followed by flatten. The map call applies this inner expression to the jeans field of each element of persons. This turns the List[Person] into a List[Option[Int]]. The flatten call extracts all the Int values from Some[Int] and discards all the None values. This gives List[Int] with one element for each Person that had a non-empty jeans field.
The second line simply sums the values in the List, converts it to Double and then divides by the length of the list. Adding error checking is left as an exercise!
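One way the error checking could look is sketched below. It assumes that "no jeans at all" should be reported as None rather than as a number; the trimmed-down Person (without jacket, location and age) and the Option[Double] return type are choices made for this illustration, not part of the question.

```scala
object JeansAverage {
  final case class Jeans(brand: String, price: Int, color: String)
  // Trimmed-down Person for the example; the real class has more fields.
  final case class Person(fName: String, jeans: Option[Jeans])

  // Average over the persons who actually own jeans.
  // Returns None when there is nothing to average, instead of dividing by zero.
  def avgPriceJeans(persons: List[Person]): Option[Double] = {
    val prices = persons.flatMap(_.jeans.map(_.price))
    if (prices.isEmpty) None
    else Some(prices.sum.toDouble / prices.length)
  }
}
```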
One approach is to compute the jeans price for each element of the list.
After that you can sum all the values (with the sum method) and divide by the list size.
When jeans is None I use 0 as the price, so those persons still count towards the average.
Here is the code:
def avgPriceJeans(input: List[Person]): Int =
input.map(_.jeans.map(_.price).getOrElse(0)).sum / input.size
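The answers in this thread do not all compute the same average: some divide by the number of persons, others by the number of persons who own jeans. The sketch below puts the two definitions side by side. The function names, the trimmed-down Person and the choice to return 0.0 for an empty input are assumptions made for the illustration.

```scala
object AverageDenominators {
  final case class Jeans(brand: String, price: Int, color: String)
  // Trimmed-down Person for the example; the real class has more fields.
  final case class Person(fName: String, jeans: Option[Jeans])

  // Every person counts; a person without jeans contributes a price of 0.
  def avgOverAllPersons(input: List[Person]): Double =
    if (input.isEmpty) 0.0
    else input.map(_.jeans.fold(0)(_.price)).sum.toDouble / input.size

  // Only persons who own jeans count.
  def avgOverOwners(input: List[Person]): Double = {
    val prices = input.flatMap(_.jeans.map(_.price))
    if (prices.isEmpty) 0.0 else prices.sum.toDouble / prices.length
  }
}
```

For three persons with jeans priced 10, none, and 20, the first definition gives 10.0 and the second gives 15.0, so it is worth deciding which one the requirement actually means.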

Pre append regex pattern to split and map case class to splitted

I want to split the following string, which takes the form
val str = "X|blnk_1|blnk_2|blnk_3|blnk_4|time1|time2|blnk_5|blnk_6|blnk_7|blnk_8| |Z01|Str1|01|001|NE]|[HEX1|HEX2]|[NA|001:1000|123:456|[00]|]|Z01|Str2|02|002|NE]|[HEX3|HEX4]|[NA|002:1001|234:456|[01]|]|Z02|02|z2|Str|Str|"
This string always starts with X, and the split positions are Z01, Z02, ..., Z0D:
str.split("""\|Z0[1|2|3|4|5|6|7|8|9|A|B|C|D]{1}\|""") foreach println
Z01, ..., Z0D can appear in the string in any order.
The split gives the following result:
X|blnk_1|blnk_2|blnk_3|blnk_4|time1|time2|blnk_5|blnk_6|blnk_7|blnk_8|
Str1|01|001|NE]|[HEX1|HEX2]|[NA|001:1000|123:456|[00]|]
Str2|02|002|NE]|[HEX3|HEX4]|[NA|002:1001|234:456|[01]|]
02|z2|Str|Str|
However, I want to map X, Z01, ... to case classes. Since there is no ordering, there is no way to identify which case class a given split should map to (the length of the individual splits can't be used).
I expect my split to produce the following output:
X|blnk_1|blnk_2|blnk_3|blnk_4|time1|time2|blnk_5|blnk_6|blnk_7|blnk_8|
Z01|[Str1|01|001|NE]|[HEX1|HEX2]|[NA|001:1000|123:456|[00]|]|
Z01|[Str2|02|002|NE]|[HEX3|HEX4]|[NA|002:1001|234:456|[01]|]|
Z01|02|z2|Str|Str|
so that the result can be mapped to a case class with the help of the prepended pattern.
For example:
case class X( ....)
case class Z01(val1: String, val2: String, val3: String)
case class Z02(val1: Int, val2: String, val3: String,val4:String)
.................
X|blnk_1|blnk_2|blnk_3|blnk_4|time1|time2|blnk_5|blnk_6|blnk_7|blnk_8| maps to case class X
and
Z01|[Str1|01|001|NE]|[HEX1|HEX2]|[NA|001:1000|123:456|[00]|]| maps to case class Z01
and in the end the result should be ordered and grouped, so that each group can be taken as an array of a particular case class.
X
Array[Z01]
Array[Z02]
......
......
As an alternative, you could get your values by matching them instead of splitting:
(?:Z0[1-9A-D]|^X).*?(?=\|Z0[1-9A-D]|$)
This matches:
(?:Z0[1-9A-D]|^X) A non-capturing group that matches either Z0 followed by one character from the character class, or X at the start ^ of the line
.*? Match any character zero or more times, non-greedy
(?=\|Z0[1-9A-D]|$) Positive lookahead which asserts that what follows is either a pipe | followed by Z0 and a character from the character class, or the end of the line $
For example:
val re = """(?:Z0[1-9A-D]|^X).*?(?=\|Z0[1-9A-D]|$)""".r
val str = "X|blnk_1|blnk_2|blnk_3|blnk_4|time1|time2|blnk_5|blnk_6|blnk_7|blnk_8| |Z01|Str1|01|001|NE]|[HEX1|HEX2]|[NA|001:1000|123:456|[00]|]|Z01|Str2|02|002|NE]|[HEX3|HEX4]|[NA|002:1001|234:456|[01]|]|Z02|02|z2|Str|Str|"
re.findAllIn(str) foreach println
That will result in:
X|blnk_1|blnk_2|blnk_3|blnk_4|time1|time2|blnk_5|blnk_6|blnk_7|blnk_8|
Z01|Str1|01|001|NE]|[HEX1|HEX2]|[NA|001:1000|123:456|[00]|]
Z01|Str2|02|002|NE]|[HEX3|HEX4]|[NA|002:1001|234:456|[01]|]
Z02|02|z2|Str|Str|
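Once every record starts with its tag, the records can be grouped by that tag, which is the step the question needs before mapping each group to a case class. A minimal sketch, assuming the tag is everything before the first pipe (the object and method names are hypothetical, and the actual case-class construction is left out because the field layouts are not given):

```scala
object GroupByPrefix {
  // Same pattern as in the answer above.
  private val record = """(?:Z0[1-9A-D]|^X).*?(?=\|Z0[1-9A-D]|$)""".r

  // Groups the matched records by their leading tag ("X", "Z01", "Z02", ...).
  // Within each group the records keep their order of appearance.
  def groupRecords(str: String): Map[String, List[String]] =
    record.findAllIn(str).toList.groupBy(_.takeWhile(_ != '|'))
}
```

Each value in the resulting map can then be parsed into the matching case class, for example the "Z01" entries into an Array[Z01].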
How about this idea?
val delim = """\|Z0[1-9A-D]\|""" // inside [...] a | is a literal pipe, so the allowed characters are given as ranges
val x = str.split(delim) // the actual string splits
val y = delim.r.findAllIn(str).toArray // the delimiters, in order of appearance
// Drop the first split (the X record), then prepend each delimiter to the split that follows it.
val final_data = x.drop(1).zip(y).map { case (part, d) => d + part }.toList
/*
List(|Z01|Str1|01|001|NE]|[HEX1|HEX2]|[NA|001:1000|123:456|[00]|], |Z01|Str2|02|002|NE]|[HEX3|HEX4]|[NA|002:1001|234:456|[01]|], |Z02|02|z2|Str|Str|) */
The leading | of each element of final_data can be removed with substring(1) or drop(1).
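A possible alternative to re-attaching the delimiters is to split with a lookahead, so that the tag is never consumed in the first place. This is a different technique from the answer above, sketched here under the assumption that tags always have the form Z0 plus one of 1-9 or A-D and are surrounded by pipes (the names are only for illustration):

```scala
object LookaheadSplit {
  // Split at a pipe only when it is immediately followed by a Z0x tag and
  // another pipe. The lookahead (?=...) consumes nothing, so the tag stays
  // at the start of the following piece.
  def splitKeepingTags(str: String): List[String] =
    str.split("""\|(?=Z0[1-9A-D]\|)""").toList
}
```

The first piece is the X record, and every later piece begins with its own tag and has no leading pipe to strip.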

scala list read value of given string

First, the list:
val l1 = List(("A",12,13),("B",122,123),("C",1212,123))
The string to find:
val l2 = "A"
If the string "A" is present in the list, display the matching data. In the case above "A" matches, so the output will be
12
Otherwise, if the string does not match, show only 0.
Find the first match, then retrieve the second part of the tuple, or 0:
l1.find(_._1 == "A").map(_._2).getOrElse(0)
Scala pattern matching has a slightly nasty rule: an identifier in a pattern that starts with an upper-case letter is matched against its value rather than bound as a new variable. So if you rename val l2 = "A" to val L2 = "A", the following works:
scala> l1.collectFirst{ case (L2, i, _) => i }.getOrElse(0)
res0: Int = 12
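If renaming is not an option, a lower-case identifier can also be matched by value by wrapping it in backticks inside the pattern. A minimal sketch (the object and the lookup method are hypothetical wrappers around the list from the question):

```scala
object StableIdentifierLookup {
  val l1 = List(("A", 12, 13), ("B", 122, 123), ("C", 1212, 123))

  // `key` in backticks is compared against the value of key;
  // without the backticks it would be a fresh variable that matches anything.
  def lookup(key: String): Int =
    l1.collectFirst { case (`key`, i, _) => i }.getOrElse(0)
}
```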
l1.find(_._1 == l2).map(_._2).getOrElse(0)
or, more verbosely:
l1.find(a => a._1 == l2).map(a => a._2).getOrElse(0)
Using a for comprehension, the solution can be reformulated to return the second element of each matching tuple, or an empty list if no matches are found:
for ( (s,i,j) <- l1 if s == l2) yield i
which delivers
List(12)
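The for comprehension yields a list, while the question asks for a single value with 0 as the fallback. One way to bridge that, sketched with a hypothetical secondOrZero method, is to take the first element of the result if there is one:

```scala
object ForComprehensionLookup {
  val l1 = List(("A", 12, 13), ("B", 122, 123), ("C", 1212, 123))

  // Same comprehension as above, reduced to the first match or 0.
  def secondOrZero(key: String): Int =
    (for ((s, i, _) <- l1 if s == key) yield i).headOption.getOrElse(0)
}
```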

how would I map a list of strings with a known format to a list of tuples?

I have an array of strings. Each string has 2 parts and is separated by white space. Looks like:
x <white space> y
I want to turn it into an array of Tuples where each tuple has (x, y)
How can I write this in scala? I know it will need something similar to:
val results = listOfStrings.collect { str => (str.left, str.right) }
Not sure how I can break up each str into the left and right sides needed...
You could take advantage of the fact that in Scala, Regular expressions are also "extractors".
scala> val PairWithSpaces = "(\\w+)\\s+(\\w+)".r
PairWithSpaces: scala.util.matching.Regex = (\w+)\s+(\w+)
scala> val PairWithSpaces(l, r) = "1 17"
l: String = 1
r: String = 17
Now you can build your extractor into a natural looking "map":
scala> Array("a b", "1 3", "Z x").map{case PairWithSpaces(x,y) => (x, y) }
res10: Array[(String, String)] = Array((a,b), (1,3), (Z,x))
Perhaps overkill for you, but can really help readability if your regex gets fancy. I also like how this approach will fail fast if an illegal string is given.
Warning, not sure if the regex matches exactly what you need...
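The fail-fast behaviour comes from map with a partial pattern: a string that does not match throws a MatchError. If skipping bad input is preferable, the same extractor can be used with collect instead. A minimal sketch, assuming the \w+ pattern from above is what you want (the object and method names are only for illustration):

```scala
object PairParsing {
  private val PairWithSpaces = """(\w+)\s+(\w+)""".r

  // collect keeps only the strings the extractor matches in full;
  // everything else is dropped instead of raising a MatchError.
  def parsePairs(lines: Seq[String]): Seq[(String, String)] =
    lines.collect { case PairWithSpaces(x, y) => (x, y) }
}
```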
Assuming that you want to silently drop any string that doesn't fit the pattern, you could split and collect:
val results = listOfStrings.map(_.split("\\s+")).collect { case Array(l,r) => (l,r) }