I am trying to achieve the following in Scala:
var date="12/01/2021"
var a,b,c = date.split("/")
print(a,b,c)
//expected result
12,01,2021
There's no way to know for sure the size of the array after splitting, which is why you cannot destructure it like this.
However you can use pattern matching:
date.split("/") match {
case Array(a, b, c) => print(...)
case _ => print("invalid format")
}
Or just access the array by index (not safe):
val arr = date.split("/")
val (a, b, c) = (arr(0), arr(1), arr(2))
You can write
val Array(a,b,c) = date.split("/")
or
val Array(a,b,c) = date.split("/").take(3)
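However, the pattern match that @Gaël J suggested has the advantage of gracefully handling cases where the result doesn't have the three expected parts. Both val Array(...) forms above throw a MatchError when the split yields fewer than three parts, and the first one also when it yields more.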
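A minimal sketch of that graceful handling, wrapped in a helper that returns an Option instead of printing. The name splitDate is hypothetical and only illustrates the pattern match; it does not validate that the parts are numeric.

```scala
// Hypothetical helper: Some((day, month, year)) when there are exactly three parts, else None.
def splitDate(date: String): Option[(String, String, String)] =
  date.split("/") match {
    case Array(a, b, c) => Some((a, b, c)) // exactly three parts
    case _              => None            // any other shape is rejected
  }

println(splitDate("12/01/2021")) // Some((12,01,2021))
println(splitDate("12/01"))      // None
```

The caller can then decide what "invalid format" means, for example with getOrElse or a further match.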
You can pattern match directly on the date String with the s interpolator (available since Scala 2.13; based on Gaël's answer):
val date = "12/01/2021"
date match {
case s"$a/$b/$c" => print(a, b, c)
case _ => print("invalid format")
}
Related
I might have something like this:
val found = source.toCharArray.foreach{ c =>
// Process char c
// Sometimes (e.g. on newline) I want to emit a result to be
// captured in 'found'. There may be 0 or more captured results.
}
This shows my intent. I want to iterate over some collection of things. Whenever the need arises, I want to "emit" a result to be captured in found. It's not a direct 1-for-1 like map. collect() is a "pull", applying a partial function over the collection. I want a "push" behavior, where I visit everything but push out something when needed.
Is there a pattern or collection method I'm missing that does this?
Apparently, you have a Collection[Thing], and you want to obtain a new Collection[Event] by emitting a Collection[Event] for each Thing. That is, you want a function
(Collection[Thing], Thing => Collection[Event]) => Collection[Event]
That's exactly what flatMap does.
You can write it down with nested fors where the second generator defines what "events" have to be "emitted" for each input from the source. For example:
val input = "a2ba4b"
val result = (for {
c <- input
emitted <- {
if (c == 'a') List('A')
else if (c.isDigit) List.fill(c.toString.toInt)('|')
else Nil
}
} yield emitted).mkString
println(result)
prints
A||A||||
because each 'a' emits an 'A', each digit emits the right amount of tally marks, and all other symbols are ignored.
There are several other ways to express the same thing, for example, the above expression could also be rewritten with an explicit flatMap and with a pattern match instead of if-else:
println(input.flatMap{
case 'a' => "A"
case d if d.isDigit => "|" * (d.toString.toInt)
case _ => ""
})
I think you are looking for a way to build a Stream for your condition. Streams are lazy and are computed only when required (note that Stream is deprecated since Scala 2.13 in favor of LazyList).
val sourceString = "sdfdsdsfssd\ndfgdfgd\nsdfsfsggdfg\ndsgsfgdfgdfg\nsdfsffdg\nersdff\n"
val sourceStream = sourceString.toCharArray.toStream
def foundStreamCreator(source: Stream[Char], emitBoundaryFunction: Char => Boolean): Stream[String] = {
  def loop(sourceStream: Stream[Char], collector: List[Char]): Stream[String] =
    sourceStream.isEmpty match {
      case true => collector.mkString.reverse #:: Stream.empty[String]
      case false => {
        val char = sourceStream.head
        emitBoundaryFunction(char) match {
          case true =>
            // boundary reached: emit the collected chars and start a new segment
            collector.mkString.reverse #:: loop(sourceStream.tail, List.empty[Char])
          case false =>
            loop(sourceStream.tail, char :: collector)
        }
      }
    }
  loop(source, List.empty[Char])
}
val foundStream = foundStreamCreator(sourceStream, c => c == '\n')
val foundIterator = foundStream.toIterator
foundIterator.next()
// res0: String = sdfdsdsfssd
foundIterator.next()
// res1: String = dfgdfgd
foundIterator.next()
// res2: String = sdfsfsggdfg
It looks like foldLeft to me:
val found = ((List.empty[String], "") /: source.toCharArray) {case ((agg, tmp), char) =>
if (char == '\n') (tmp :: agg, "") // <- emit
else (agg, tmp + char)
}._1
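The idea is to keep collecting items in a temporary location and emit them when you run into a character signifying a boundary. Since I used a List and prepend to it, you'll have to reverse it at the end if you want the results in order. Note that whatever is still in the temporary string after the last boundary is discarded by ._1.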
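A minimal sketch of the same fold, with the reversal and the trailing segment handled explicitly. The helper name splitOn and its boundary parameter are hypothetical, and it uses foldLeft instead of the /: alias, which is deprecated in Scala 2.13.

```scala
// Hypothetical helper: split `source` on `boundary`, keeping a trailing unterminated segment.
def splitOn(source: String, boundary: Char): List[String] = {
  val (emitted, pending) =
    source.foldLeft((List.empty[String], "")) { case ((agg, tmp), char) =>
      if (char == boundary) (tmp :: agg, "") // emit the finished segment
      else (agg, tmp + char)                 // keep accumulating
    }
  // Emit whatever is still buffered, then restore source order.
  val all = if (pending.isEmpty) emitted else pending :: emitted
  all.reverse
}

println(splitOn("ab\ncd\nef", '\n')) // List(ab, cd, ef)
```

Whether an unterminated trailing segment should count as a result depends on the use case; dropping the `pending` handling restores the behavior of the original snippet.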
Example
(168,20874,List(, 33895, 2711))
to 168,20874| , 33895, 2711
Basically convert RDD[(Any, scala.collection.immutable.Iterable[String])] to String.
Thanks
The map method with a string-interpolation formatter can be used:
val rdd = sparkContext.parallelize(List((168, 20874, List(33895, 2711))))
val result = rdd.map { case (a, b, c) => s"$a,$b| ,${c.mkString(",")}" }
result.foreach(println)
Output:
168,20874| ,33895,2711
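The formatting step does not depend on Spark, so it can be pulled out and checked on its own. A minimal sketch, assuming rows shaped like the example above; the name formatRow is hypothetical, and the function could be passed to rdd.map in place of the inline case.

```scala
// Hypothetical helper: render one (a, b, values) row in the requested "a,b| ,v1,v2" layout.
def formatRow(row: (Int, Int, Seq[Int])): String = row match {
  case (a, b, c) => s"$a,$b| ,${c.mkString(",")}"
}

println(formatRow((168, 20874, List(33895, 2711)))) // 168,20874| ,33895,2711
```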
I have a Seq[String] in Scala, and if the Seq contains certain Strings, I append a relevant message to another list.
Is there a more 'scalaesque' way to do this, rather than a series of if statements appending to a list like I have below?
val result = new ListBuffer[Err]()
val malformedParamNames = // A Seq[String]
if (malformedParamNames.contains("$top")) result += IntegerMustBePositive("$top")
if (malformedParamNames.contains("$skip")) result += IntegerMustBePositive("$skip")
if (malformedParamNames.contains("modifiedDate")) result += FormatInvalid("modifiedDate", "yyyy-MM-dd")
...
result.toList
If you want to use some scala iterables sugar I would use
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err
val malformedParamNames = Seq[String]("$top", "aa", "$skip", "ccc", "ddd", "modifiedDate")
val result = malformedParamNames.map { v =>
v match {
case "$top" => Some(IntegerMustBePositive("$top"))
case "$skip" => Some(IntegerMustBePositive("$skip"))
case "modifiedDate" => Some(FormatInvalid("modifiedDate", "yyyy-MM-dd"))
case _ => None
}
}.flatten
result.toList
Be warned: if you ask for the Scala-esque way of doing things, there are many possibilities.
The map function combined with flatten can be simplified by using flatMap:
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err
val malformedParamNames = Seq[String]("$top", "aa", "$skip", "ccc", "ddd", "modifiedDate")
val result = malformedParamNames.flatMap {
case "$top" => Some(IntegerMustBePositive("$top"))
case "$skip" => Some(IntegerMustBePositive("$skip"))
case "modifiedDate" => Some(FormatInvalid("modifiedDate", "yyyy-MM-dd"))
case _ => None
}
result
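Another possible variant, sketched here as an assumption rather than as part of the answers above, is collect with a partial function: names without a matching case are dropped, so no Some/None wrapping is needed. The helper name errorsFor is hypothetical.

```scala
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err

// Hypothetical helper: keep only the names that have a matching case.
def errorsFor(names: Seq[String]): Seq[Err] = names.collect {
  case name @ ("$top" | "$skip") => IntegerMustBePositive(name)
  case "modifiedDate"            => FormatInvalid("modifiedDate", "yyyy-MM-dd")
}

println(errorsFor(Seq("$top", "aa", "$skip", "ccc", "ddd", "modifiedDate")))
```

As with the flatMap version, the errors come out in the order of the input names, not in the order the cases are written.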
Most 'scalesque' version I can think of while keeping it readable would be:
val map = scala.collection.immutable.ListMap(
"$top" -> IntegerMustBePositive("$top"),
"$skip" -> IntegerMustBePositive("$skip"),
"modifiedDate" -> FormatInvalid("modifiedDate", "yyyy-MM-dd"))
val result = for {
(k,v) <- map
if malformedParamNames contains k
} yield v
//or
val result2 = map.filterKeys(malformedParamNames.contains).values.toList
Benoit's is probably the most scala-esque way of doing it, but depending on who's going to be reading the code later, you might want a different approach.
// Some type definitions omitted
val malformations = Seq[(String, Err)](
("$top", IntegerMustBePositive("$top")),
("$skip", IntegerMustBePositive("$skip")),
("modifiedDate", FormatInvalid("modifiedDate", "yyyy-MM-dd"))
)
If you need a list and the order is significant:
val result = (malformations.foldLeft(List.empty[Err]) { (acc, pair) =>
if (malformedParamNames.contains(pair._1)) {
pair._2 :: acc // prepend a single element to the list for faster performance
} else acc
}).reverse // and reverse since we were prepending
If the order isn't significant (although if the order's not significant, you might consider wanting a Set instead of a List):
val result = (malformations.foldLeft(Set.empty[Err]) { (acc, pair) =>
if (malformedParamNames.contains(pair._1)) {
acc + pair._2 // add a single element to the set
} else acc
}).toList // omit the .toList if you're OK with just a Set
If the predicates in the repeated ifs are more complex/less uniform, then the type for malformations might need to change, as they would if the responses changed, but the basic pattern is very flexible.
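A minimal sketch of what such a change of type could look like, assuming each rule carries an arbitrary predicate over the parameter names. Rule, rules and check are hypothetical names, and the "modified" prefix test is only there to show a non-uniform predicate.

```scala
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err

// Hypothetical rule type: a predicate over all parameter names plus the error to report.
final case class Rule(applies: Seq[String] => Boolean, err: Err)

val rules: Seq[Rule] = Seq(
  Rule(_.contains("$top"), IntegerMustBePositive("$top")),
  Rule(_.contains("$skip"), IntegerMustBePositive("$skip")),
  Rule(ps => ps.exists(_.startsWith("modified")), FormatInvalid("modifiedDate", "yyyy-MM-dd"))
)

// Errors are reported in rule order, like the chain of ifs in the question.
def check(malformedParamNames: Seq[String]): List[Err] =
  rules.collect { case Rule(applies, err) if applies(malformedParamNames) => err }.toList

println(check(Seq("modifiedDate", "$top")))
```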
In this solution we define a list of mappings that take your IF condition and THEN statement in pairs and we iterate over the inputted list and apply the changes where they match.
// IF THEN
case class Operation(matcher :String, action :String)
def processInput(input :List[String]) :List[String] = {
val operations = List(
Operation("$top", "integer must be positive"),
Operation("$skip", "skip value"),
Operation("$modify", "modify the date")
)
input.flatMap { in =>
operations.find(_.matcher == in).map { _.action }
}
}
println(processInput(List("$skip","$modify", "$skip")));
A breakdown
operations.find(_.matcher == in) // find an operation in our
// list matching the input we are
// checking. Returns Some or None
.map { _.action } // if some, replace input with action
// if none, do nothing
input.flatMap { in => // inputs are processed, converted
// to some(action) or none and the
// flatten removes the some/none
// returning just the strings.
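When every matcher is a plain string comparison, as it is here, the find over a list can be swapped for a Map lookup. This is a sketch under that assumption; actions and processWithMap are hypothetical names.

```scala
// Hypothetical lookup table: matcher -> action.
val actions: Map[String, String] = Map(
  "$top"    -> "integer must be positive",
  "$skip"   -> "skip value",
  "$modify" -> "modify the date"
)

// Map#get returns Some(action) or None; flatMap keeps only the hits.
def processWithMap(input: List[String]): List[String] =
  input.flatMap(in => actions.get(in))

println(processWithMap(List("$skip", "$modify", "$skip")))
// List(skip value, modify the date, skip value)
```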
I have the following input string:
"0.3215,Some(0.5123)"
I would like to retrieve the tuple (0.3215,Some(0.5123)) with: (BigDecimal,Option[BigDecimal]).
Here is one of the things I have tried so far:
"\\d+\\.\\d+,Some\\(\\d+\\.\\d+".r findFirstIn iData match {
case None => Map[BigDecimal, Option[BigDecimal]]()
case Some(s) => {
val oO = s.split(",Some\\(")
BigDecimal.valueOf(oO(0).toDouble) -> Option[BigDecimal](BigDecimal.valueOf(lSTmp2(1).toDouble))
}
}
This uses a Map, which I then transform into a tuple.
When I try to build the tuple directly, I get an Equals or an Object.
I must be missing something here...
Your code has several issues, but the big one seems to be that the case None side of the match returns a Map but the Some(s) side returns a Tuple2. Map and Tuple2 unify to their lowest-common-supertype, Equals, which is what you're seeing.
I think this is what you're trying to achieve?
val Pattern = "(\\d+\\.\\d+),Some\\((\\d+\\.\\d+)\\)".r
val s = "0.3215,Some(0.5123)"
s match {
case Pattern(a,b) => Map(BigDecimal(a) -> Some(BigDecimal(b)))
case _ => Map[BigDecimal, Option[BigDecimal]]()
}
// Map[BigDecimal,Option[BigDecimal]] = Map(0.3215 -> Some(0.5123))
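If a tuple is wanted instead of a Map, as the question states, the same extractor can return an Option of the tuple so that both branches have the same type. A minimal sketch; parsePair is a hypothetical name, and like the pattern above it only accepts the exact "number,Some(number)" shape.

```scala
val Pattern = """(\d+\.\d+),Some\((\d+\.\d+)\)""".r

// Hypothetical helper: Some(tuple) on a match, None otherwise, so the result type stays precise.
def parsePair(s: String): Option[(BigDecimal, Option[BigDecimal])] = s match {
  case Pattern(a, b) => Some((BigDecimal(a), Some(BigDecimal(b))))
  case _             => None
}

println(parsePair("0.3215,Some(0.5123)")) // Some((0.3215,Some(0.5123)))
```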
I'm trying to 'group' a string into segments. I guess this example explains it more succinctly:
scala> val str: String = "aaaabbcddeeeeeeffg"
... (do something)
res0: List("aaaa","bb","c","dd","eeeeee","ff","g")
I can think of a few ways to do this in an imperative style (with vars and stepping through the string to find groups), but I was wondering if a better functional solution could be attained. I've been looking through the Scala API but there doesn't seem to be something that fits my needs.
Any help would be appreciated
You can split the string recursively with span:
def s(x : String) : List[String] = if(x.size == 0) Nil else {
val (l,r) = x.span(_ == x(0))
l :: s(r)
}
Tail recursive:
@annotation.tailrec def s(x : String, y : List[String] = Nil) : List[String] = {
if(x.size == 0) y.reverse
else {
val (l,r) = x.span(_ == x(0))
s(r, l :: y)
}
}
It seems that all the other answers concentrate on collection operations, but a pure string + regex solution is much simpler:
str split """(?<=(\w))(?!\1)""" toList
In this regex I use a positive lookbehind and a negative lookahead for the captured char, so the split happens at every position where the next char differs from the previous one.
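A small sketch wrapping that regex in a function, with method-call syntax instead of postfix operators. The name splitRuns is hypothetical. Because the lookbehind uses \w, boundaries are only detected after word characters (letters, digits, underscore).

```scala
// Hypothetical helper: split at every position preceded by a word char
// and not followed by that same char.
def splitRuns(str: String): List[String] =
  str.split("""(?<=(\w))(?!\1)""").toList

println(splitRuns("aaaabbcddeeeeeeffg"))
// List(aaaa, bb, c, dd, eeeeee, ff, g)
```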
def group(s: String): List[String] = s match {
case "" => Nil
case s => s.takeWhile(_==s.head) :: group(s.dropWhile(_==s.head))
}
Edit: Tail recursive version:
def group(s: String, result: List[String] = Nil): List[String] = s match {
case "" => result.reverse
case s => group(s.dropWhile(_==s.head), s.takeWhile(_==s.head) :: result)
}
It can be used just like the other one, because the second parameter has a default value and thus doesn't have to be supplied.
Make it one-liner:
scala> val str = "aaaabbcddddeeeeefff"
str: java.lang.String = aaaabbcddddeeeeefff
scala> str.groupBy(identity).map(_._2)
res: scala.collection.immutable.Iterable[String] = List(eeeee, fff, aaaa, bb, c, dddd)
UPDATE:
As @Paul mentioned the order, here is an updated version:
scala> str.groupBy(identity).toList.sortBy(_._1).map(_._2)
res: List[String] = List(aaaa, bb, c, dddd, eeeee, fff)
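One thing worth checking before relying on groupBy: it groups by character across the whole string, so it only matches the requested segmentation when each character occurs in a single run. The sketch below contrasts it with a span-based helper; runs is a hypothetical name.

```scala
val sample = "aabaa"

// groupBy merges both runs of 'a' into one entry.
val viaGroupBy: List[String] =
  sample.groupBy(identity).toList.sortBy(_._1).map(_._2)

// Hypothetical span-based helper that keeps consecutive runs separate.
def runs(s: String): List[String] =
  if (s.isEmpty) Nil
  else {
    val (l, r) = s.span(_ == s.head)
    l :: runs(r)
  }

println(viaGroupBy)   // List(aaaa, b)
println(runs(sample)) // List(aa, b, aa)
```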
You could use some helper functions like this:
val str = "aaaabbcddddeeeeefff"
def zame(chars:List[Char]) = chars.partition(_==chars.head)
def q(chars:List[Char]):List[List[Char]] = chars match {
case Nil => Nil
case rest =>
val (thesame,others) = zame(rest)
thesame :: q(others)
}
q(str.toList) map (_.mkString)
This should do the trick, right? No doubt it can be cleaned up into one-liners even further
A functional* solution using fold:
def group(s : String) : Seq[String] = {
s.tail.foldLeft(Seq(s.head.toString)) { case (carry, elem) =>
if ( carry.last(0) == elem ) {
carry.init :+ (carry.last + elem)
}
else {
carry :+ elem.toString
}
}
}
There is a lot of cost hidden in all those sequence operations performed on strings (via implicit conversion). I guess the real complexity heavily depends on the kind of Seq that strings are converted to.
(*) Afaik all/most operations in the collection library depend on iterators, which is imho an inherently unfunctional concept. But the code looks functional, at least.
Starting with Scala 2.13, List provides the unfold builder, which can be combined with String::span:
List.unfold("aaaabbaaacdeeffg") {
case "" => None
case rest => Some(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
or alternatively, coupled with Scala 2.13's Option#unless builder:
List.unfold("aaaabbaaacdeeffg") {
rest => Option.unless(rest.isEmpty)(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
Details:
Unfold (def unfold[A, S](init: S)(f: (S) => Option[(A, S)]): List[A]) is based on an internal state (init) which is initialized in our case with "aaaabbaaacdeeffg".
For each iteration, we span (def span(p: (Char) => Boolean): (String, String)) this internal state in order to find the prefix containing the same symbol and produce a (String, String) tuple which contains the prefix and the rest of the string. span is a very good fit in this context, as it produces exactly what unfold expects: a tuple containing the next element of the list and the new internal state.
The unfolding stops when the internal state is "" in which case we produce None as expected by unfold to exit.
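To make those three steps concrete, here is a hand-written unfold with the same signature. It is only a sketch of the semantics described above, not the standard library's implementation, and it also runs on Scala versions older than 2.13.

```scala
// Illustrative re-implementation of unfold; the library version may differ internally.
def unfold[A, S](init: S)(f: S => Option[(A, S)]): List[A] = {
  @annotation.tailrec
  def loop(state: S, acc: List[A]): List[A] = f(state) match {
    case Some((elem, next)) => loop(next, elem :: acc) // emit elem, continue with the new state
    case None               => acc.reverse             // f signals the end of the unfolding
  }
  loop(init, Nil)
}

val grouped = unfold("aaaabbaaacdeeffg") { rest =>
  if (rest.isEmpty) None else Some(rest.span(_ == rest.head))
}
println(grouped) // List(aaaa, bb, aaa, c, d, ee, ff, g)
```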
Edit: I should have read the question more carefully. The code below is not functional.
Sometimes, a little mutable state helps:
def group(s : String) = {
var tmp = ""
val b = Seq.newBuilder[String]
s.foreach { c =>
if ( tmp != "" && tmp.head != c ) {
b += tmp
tmp = ""
}
tmp += c
}
b += tmp
b.result
}
The runtime is O(n) if segments have at most constant length; tmp += c probably creates the most overhead because it builds a new string each time. Use a string builder instead for a strict O(n) runtime.
group("aaaabbcddeeeeeeffg")
> Seq[String] = List(aaaa, bb, c, dd, eeeeee, ff, g)
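A sketch of the string-builder variant mentioned above, so that appending a char no longer copies the current run. The name groupRuns is hypothetical; unlike the version above it emits nothing for an empty input.

```scala
// Hypothetical variant using a mutable StringBuilder for the current run.
def groupRuns(s: String): List[String] = {
  val out = List.newBuilder[String]
  val run = new StringBuilder
  s.foreach { c =>
    if (run.nonEmpty && run.charAt(0) != c) {
      out += run.toString // the run ended: emit it
      run.clear()
    }
    run += c // append without copying the whole run
  }
  if (run.nonEmpty) out += run.toString // emit the last run, if any
  out.result()
}

println(groupRuns("aaaabbcddeeeeeeffg"))
// List(aaaa, bb, c, dd, eeeeee, ff, g)
```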
If you want to use the Scala API, you can use the built-in function for that:
str.groupBy(c => c).values
Or, if you want it sorted and in a list:
str.groupBy(c => c).values.toList.sorted