I want to decode an optional query parameter in my Scala code. I'm using http4s. The parameter is of the form ?part=35/43. End goal is to store this fraction as a Type Part = (Int, Int) so that we can have (35, 43) as a tuple to use further in the code. I've created an Object like:
https://http4s.org/v0.18/dsl/#optional-query-parameters
object OptionalPartitionQueryParamMatcher
extends OptionalValidatingQueryParamDecoderMatcher[Part]("part")
Now OptionalValidatingQueryParamDecoderMatcher needs an implicit QueryParamDecoder[Part] in scope.
For this I created an implicit val, which needs to check if we actually have a valid fraction, which is to say, both the chars should be a digit (and not a/1 or b/c etc) and the fraction should be less than 1 (1/2, 5/8 etc):
implicit val ev: QueryParamDecoder[Part] =
new QueryParamDecoder[Part] {
def decode(
partition: QueryParameterValue
): ValidatedNel[ParseFailure, Part] = {
val partAndPartitions = partition.value.split("/")
Validated
.catchOnly[NumberFormatException] {
val part = partAndPartitions(0).toInt
val partitions = partAndPartitions(1).toInt
if (
partAndPartitions.length != 2 || part > partitions || part <= 0 || partitions <= 0
) {
throw new IllegalArgumentException
}
(part, partitions)
}
.leftMap(e => ParseFailure("Invalid query parameter part", e.getMessage))
.toValidatedNel
}
}
The problem with above code is, it only catches NumberFormatException (that too when it can't convert a string to Int using .toInt) but what if I input something like ?part=/1, it should then catch ArrayIndexOutOfBoundsException because I'm querying the first two values in the Array, or let's say IllegalArgumentException when the fraction is not valid at all. How can I achieve that, catching everything in a single pass? Thanks!
well, the simplest approach would be to use .catchOnly[Throwable] (or even QueryParamDecoder .fromUnsafeCast directly) and it will catch any error.
However, I personally would prefer to do something like this:
(I couldn't compile the code right now, so apologies if it has some typos)
implicit final val PartQueryParamDecoder: QueryParamDecoder[Part] =
QueryParamDecoder[String].emap { str =>
def failure(details: String): Either[ParseFailure, Part] =
Left(ParseFailure(
sanitized = "Invalid query parameter part",
details = s"'${str}' is not properly formttated: ${details}"
))
str.split('/').toList match {
case aRaw :: bRaw :: Nil =>
(aRaw.toIntOption, bRaw.toIntOption) match {
case (Some(a), Some(b)) =>
Right(Part(a, b))
case _ =>
failure(details = "Some of the fraction parts are not numbers")
}
case _ =>
failure(details = "It doesn't correspond to a fraction 'a/b'")
}
}
Related
I'm fresh with scala and udf, now I would like to write a udf which accept 3 parameters from a dataframe columns(one of them is array), for..loop current array, parse and return a case class which will be used afterwards. here's a my code roughly:
case class NewFeatures(dd: Boolean, zz: String)
val resultUdf = udf((arrays: Option[Row], jsonData: String, placement: Int) => {
for (item <- arrays) {
val aa = item.getAs[Long]("aa")
val bb = item.getAs[Long]("bb")
breakable {
if (aa <= 0 || bb <= 0) break
}
val cc = item.getAs[Long]("cc")
val dd = cc > 0
val jsonData = item.getAs[String]("json_data")
val jsonDataObject = JSON.parseFull(jsonData).asInstanceOf[Map[String, Any]]
var zz = jsonDataObject.getOrElse("zz", "").toString
NewFeatures(dd, zz)
}
})
when I run it, it will get exception:
java.lang.UnsupportedOperationException: Schema for type Unit is not supported
how should I modify above udf
First of all, try better naming for your variables, for instance in your case, "arrays" is of type Option[Row]. Here, for (item <- arrays) {...} is basically a .map method, using map on Options, you should provide a function, that uses Row and returns a value of some type (~= signature: def map[V](f: Row => V): Option[V], what you want in your case: def map(f: Row => NewFeatures): Option[NewFeature]). While you're breaking out of this map in some circumstances, so there's no assurance for the compiler that the function inside map method would always return an instance of NewFeatures. So it is Unit (it only returns on some cases, and not all).
What you want to do could be enhanced in something similar to this:
val funcName: (Option[Row], String, Int) => Option[NewFeatures] =
(rowOpt, jsonData, placement) => rowOpt.filter(
/* your break condition */
).map { row => // if passes the filter predicate =>
// fetch data from row, create new instance
}
I might have something like this:
val found = source.toCharArray.foreach{ c =>
// Process char c
// Sometimes (e.g. on newline) I want to emit a result to be
// captured in 'found'. There may be 0 or more captured results.
}
This shows my intent. I want to iterate over some collection of things. Whenever the need arrises I want to "emit" a result to be captured in found. It's not a direct 1-for-1 like map. collect() is a "pull", applying a partial function over the collection. I want a "push" behavior, where I visit everything but push out something when needed.
Is there a pattern or collection method I'm missing that does this?
Apparently, you have a Collection[Thing], and you want to obtain a new Collection[Event] by emitting a Collection[Event] for each Thing. That is, you want a function
(Collection[Thing], Thing => Collection[Event]) => Collection[Event]
That's exactly what flatMap does.
You can write it down with nested fors where the second generator defines what "events" have to be "emitted" for each input from the source. For example:
val input = "a2ba4b"
val result = (for {
c <- input
emitted <- {
if (c == 'a') List('A')
else if (c.isDigit) List.fill(c.toString.toInt)('|')
else Nil
}
} yield emitted).mkString
println(result)
prints
A||A||||
because each 'a' emits an 'A', each digit emits the right amount of tally marks, and all other symbols are ignored.
There are several other ways to express the same thing, for example, the above expression could also be rewritten with an explicit flatMap and with a pattern match instead of if-else:
println(input.flatMap{
case 'a' => "A"
case d if d.isDigit => "|" * (d.toString.toInt)
case _ => ""
})
I think you are looking for a way to build a Stream for your condition. Streams are lazy and are computed only when required.
val sourceString = "sdfdsdsfssd\ndfgdfgd\nsdfsfsggdfg\ndsgsfgdfgdfg\nsdfsffdg\nersdff\n"
val sourceStream = sourceString.toCharArray.toStream
def foundStreamCreator( source: Stream[Char], emmitBoundaryFunction: Char => Boolean): Stream[String] = {
def loop(sourceStream: Stream[Char], collector: List[Char]): Stream[String] =
sourceStream.isEmpty match {
case true => collector.mkString.reverse #:: Stream.empty[String]
case false => {
val char = sourceStream.head
emmitBoundaryFunction(char) match {
case true =>
collector.mkString.reverse #:: loop(sourceStream.tail, List.empty[Char])
case false =>
loop(sourceStream.tail, char :: collector)
}
}
}
loop(source, List.empty[Char])
}
val foundStream = foundStreamCreator(sourceStream, c => c == '\n')
val foundIterator = foundStream.toIterator
foundIterator.next()
// res0: String = sdfdsdsfssd
foundIterator.next()
// res1: String = dfgdfgd
foundIterator.next()
// res2: String = sdfsfsggdfg
It looks like foldLeft to me:
val found = ((List.empty[String], "") /: source.toCharArray) {case ((agg, tmp), char) =>
if (char == '\n') (tmp :: agg, "") // <- emit
else (agg, tmp + char)
}._1
Where you keep collecting items in a temporary location and then emit it when you run into a character signifying something. Since I used List you'll have to reverse at the end if you want it in order.
I am reading a TSV file and using using something like this:
case class Entry(entryType: Int, value: Int)
def filterEntries(): Iterator[Entry] = {
for {
line <- scala.io.Source.fromFile("filename").getLines()
} yield new Entry(line.split("\t").map(x => x.toInt))
}
Now I am both interested in filtering out entries whose entryType are set to 0 and ignoring lines with column count greater or lesser than 2 (that does not match the constructor). I was wondering if there's an idiomatic way to achieve this may be using pattern matching and unapply method in a companion object. The only thing I can think of is using .filter on the resulting iterator.
I will also accept solution not involving for loop but that returns Iterator[Entry]. They solutions must be tolerant to malformed inputs.
This is more state-of-arty:
package object liner {
implicit class R(val sc: StringContext) {
object r {
def unapplySeq(s: String): Option[Seq[String]] = sc.parts.mkString.r unapplySeq s
}
}
}
package liner {
case class Entry(entryType: Int, value: Int)
object I {
def unapply(s: String): Option[Int] = util.Try(s.toInt).toOption
}
object Test extends App {
def lines = List("1 2", "3", "", " 4 5 ", "junk", "0, 100000", "6 7 8")
def entries = lines flatMap {
case r"""\s*${I(i)}(\d+)\s+${I(j)}(\d+)\s*""" if i != 0 => Some(Entry(i, j))
case __________________________________________________ => None
}
Console println entries
}
}
Hopefully, the regex interpolator will make it into the standard distro soon, but this shows how easy it is to rig up. Also hopefully, a scanf-style interpolator will allow easy extraction with case f"$i%d".
I just started using the "elongated wildcard" in patterns to align the arrows.
There is a pupal or maybe larval regex macro:
https://github.com/som-snytt/regextractor
You can create variables in the head of the for-comprehension and then use a guard:
edit: ensure length of array
for {
line <- scala.io.Source.fromFile("filename").getLines()
arr = line.split("\t").map(x => x.toInt)
if arr.size == 2 && arr(0) != 0
} yield new Entry(arr(0), arr(1))
I have solved it using the following code:
import scala.util.{Try, Success}
val lines = List(
"1\t2",
"1\t",
"2",
"hello",
"1\t3"
)
case class Entry(val entryType: Int, val value: Int)
object Entry {
def unapply(line: String) = {
line.split("\t").map(x => Try(x.toInt)) match {
case Array(Success(entryType: Int), Success(value: Int)) => Some(Entry(entryType, value))
case _ =>
println("Malformed line: " + line)
None
}
}
}
for {
line <- lines
entryOption = Entry.unapply(line)
if entryOption.isDefined
} yield entryOption.get
The left hand side of a <- or = in a for-loop may be a fully-fledged pattern. So you may write this:
def filterEntries(): Iterator[Int] = for {
line <- scala.io.Source.fromFile("filename").getLines()
arr = line.split("\t").map(x => x.toInt)
if arr.size == 2
// now you may use pattern matching to extract the array
Array(entryType, value) = arr
if entryType == 0
} yield Entry(entryType, value)
Note that this solution will throw a NumberFormatException if a field is not convertible to an Int. If you do not want that, you'll have to encapsulate x.toInt with a Try and pattern match again.
I have to validate some variables manually for some reason and return a map with the sequance of the error messages for each variable. I've decided to use mutable collections for this because I think there is no other choise left:
val errors = collection.mutable.Map[String, ListBuffer[String]]()
//field1
val fieldToValidate1 = getData1()
if (fieldToValidate1 = "")
errors("fieldToValidate1") += "it must not be empty!"
if (validate2(fieldToValidate1))
errors("fieldToValidate1") += "validation2!"
if (validate3(fieldToValidate1))
errors("fieldToValidate1") += "validation3!"
//field2
val fieldToValidate2 = getData1()
//approximately the same steps
if (fieldToValidate2 = "")
errors("fieldToValidate2") += "it must not be empty!"
//.....
To my mind, it look kind of clumsy and there should other elegant solution. I'd also like to not use mutable collections if possible. Your ideas?
Instead of using mutable collections, you can define errors with var and update it in this way.
var errors = Map[String, List[String]]().withDefaultValue(Nil)
errors = errors updated ("fieldToValidate1", errors("fieldToValidate1") ++ List("it must not be empty!"))
errors = errors updated ("fieldToValidate1", errors("fieldToValidate1") ++ List("validation2"))
The code looks more tedious, but it gets out of mutable collections.
So what is a good type for your check? I was thinking about A => Option[String] if A is the type of your object under test. If your error messages do not depend on the value of the object under test, (A => Boolean, String) might be more convenient.
//for constructing checks from boolean test and an error message
def checkMsg[A](check: A => Boolean, msg: => String): A => Option[String] =
x => if(check(x)) Some(msg) else None
val checks = Seq[String => Option[String]](
checkMsg((_ == ""), "it must not be empty"),
//example of using the object under test in the error message
x => Some(x).filterNot(_ startsWith "ab").map(x => x + " does not begin with ab")
)
val objectUnderTest = "acvw"
val errors = checks.flatMap(c => c(objectUnderTest))
Error labels
As I just noted, you were asking for a map with a label for each check. In this case, you need to provide the check label, of course. Then the type of your check would be (String, A => Option[String]).
Although a [relatively] widespread way of doing-the-thing-right would be using scalaz's Validation (as #senia has shown), I think it is a little bit overwhelming approach (if you're bringing scalaz to your project you have to be a seasoned scala developer, otherwise it may bring you more harm than good).
Nice alternative could be using ScalaUtils which has Or and Every specifically made for this purpose, in fact if you're using ScalaTest you already have seen an example of them in use (it uses scalautils underneath). I shamefully copy-pasted example from their doc:
import org.scalautils._
def parseName(input: String): String Or One[ErrorMessage] = {
val trimmed = input.trim
if (!trimmed.isEmpty) Good(trimmed) else Bad(One(s""""${input}" is not a valid name"""))
}
def parseAge(input: String): Int Or One[ErrorMessage] = {
try {
val age = input.trim.toInt
if (age >= 0) Good(age) else Bad(One(s""""${age}" is not a valid age"""))
}
catch {
case _: NumberFormatException => Bad(One(s""""${input}" is not a valid integer"""))
}
}
import Accumulation._
def parsePerson(inputName: String, inputAge: String): Person Or Every[ErrorMessage] = {
val name = parseName(inputName)
val age = parseAge(inputAge)
withGood(name, age) { Person(_, _) }
}
parsePerson("Bridget Jones", "29")
// Result: Good(Person(Bridget Jones,29))
parsePerson("Bridget Jones", "")
// Result: Bad(One("" is not a valid integer))
parsePerson("Bridget Jones", "-29")
// Result: Bad(One("-29" is not a valid age))
parsePerson("", "")
// Result: Bad(Many("" is not a valid name, "" is not a valid integer))
Having said this, I don't think you can do any better than your current approach if you want to stick with core scala without any external dependencies.
In case you can use scalaz the best solution for aggregation errors is Validation:
def validate1(value: String) =
if (value == "") "it must not be empty!".failNel else value.success
def validate2(value: String) =
if (value.length > 10) "it must not be longer than 10!".failNel else value.success
def validate3(value: String) =
if (value == "error") "it must not be equal to 'error'!".failNel else value.success
def validateField(name: String, value: String): ValidationNel[(String, String), String] =
(
validate1(value) |#|
validate2(value) |#|
validate3(value)
).tupled >| value leftMap { _.map{ name -> _ } }
val result = (
validateField("fieldToValidate1", getData1()) |#|
validateField("fieldToValidate2", getData2())
).tupled
Then you could get optional errors Map like this:
val errors =
result.swap.toOption.map{
_.toList.groupBy(_._1).map{ case (k, v) => k -> v.map(_._2) }
}
// Some(Map(fieldToValidate2 -> List(it must not be equal to 'error'!), fieldToValidate1 -> List(it must not be empty!)))
Much like this question:
Functional code for looping with early exit
Say the code is
def findFirst[T](objects: List[T]):T = {
for (obj <- objects) {
if (expensiveFunc(obj) != null) return /*???*/ Some(obj)
}
None
}
How to yield a single element from a for loop like this in scala?
I do not want to use find, as proposed in the original question, i am curious about if and how it could be implemented using the for loop.
* UPDATE *
First, thanks for all the comments, but i guess i was not clear in the question. I am shooting for something like this:
val seven = for {
x <- 1 to 10
if x == 7
} return x
And that does not compile. The two errors are:
- return outside method definition
- method main has return statement; needs result type
I know find() would be better in this case, i am just learning and exploring the language. And in a more complex case with several iterators, i think finding with for can actually be usefull.
Thanks commenters, i'll start a bounty to make up for the bad posing of the question :)
If you want to use a for loop, which uses a nicer syntax than chained invocations of .find, .filter, etc., there is a neat trick. Instead of iterating over strict collections like list, iterate over lazy ones like iterators or streams. If you're starting with a strict collection, make it lazy with, e.g. .toIterator.
Let's see an example.
First let's define a "noisy" int, that will show us when it is invoked
def noisyInt(i : Int) = () => { println("Getting %d!".format(i)); i }
Now let's fill a list with some of these:
val l = List(1, 2, 3, 4).map(noisyInt)
We want to look for the first element which is even.
val r1 = for(e <- l; val v = e() ; if v % 2 == 0) yield v
The above line results in:
Getting 1!
Getting 2!
Getting 3!
Getting 4!
r1: List[Int] = List(2, 4)
...meaning that all elements were accessed. That makes sense, given that the resulting list contains all even numbers. Let's iterate over an iterator this time:
val r2 = (for(e <- l.toIterator; val v = e() ; if v % 2 == 0) yield v)
This results in:
Getting 1!
Getting 2!
r2: Iterator[Int] = non-empty iterator
Notice that the loop was executed only up to the point were it could figure out whether the result was an empty or non-empty iterator.
To get the first result, you can now simply call r2.next.
If you want a result of an Option type, use:
if(r2.hasNext) Some(r2.next) else None
Edit Your second example in this encoding is just:
val seven = (for {
x <- (1 to 10).toIterator
if x == 7
} yield x).next
...of course, you should be sure that there is always at least a solution if you're going to use .next. Alternatively, use headOption, defined for all Traversables, to get an Option[Int].
You can turn your list into a stream, so that any filters that the for-loop contains are only evaluated on-demand. However, yielding from the stream will always return a stream, and what you want is I suppose an option, so, as a final step you can check whether the resulting stream has at least one element, and return its head as a option. The headOption function does exactly that.
def findFirst[T](objects: List[T], expensiveFunc: T => Boolean): Option[T] =
(for (obj <- objects.toStream if expensiveFunc(obj)) yield obj).headOption
Why not do exactly what you sketched above, that is, return from the loop early? If you are interested in what Scala actually does under the hood, run your code with -print. Scala desugares the loop into a foreach and then uses an exception to leave the foreach prematurely.
So what you are trying to do is to break out a loop after your condition is satisfied. Answer here might be what you are looking for. How do I break out of a loop in Scala?.
Overall, for comprehension in Scala is translated into map, flatmap and filter operations. So it will not be possible to break out of these functions unless you throw an exception.
If you are wondering, this is how find is implemented in LineerSeqOptimized.scala; which List inherits
override /*IterableLike*/
def find(p: A => Boolean): Option[A] = {
var these = this
while (!these.isEmpty) {
if (p(these.head)) return Some(these.head)
these = these.tail
}
None
}
This is a horrible hack. But it would get you the result you wished for.
Idiomatically you'd use a Stream or View and just compute the parts you need.
def findFirst[T](objects: List[T]): T = {
def expensiveFunc(o : T) = // unclear what should be returned here
case class MissusedException(val data: T) extends Exception
try {
(for (obj <- objects) {
if (expensiveFunc(obj) != null) throw new MissusedException(obj)
})
objects.head // T must be returned from loop, dummy
} catch {
case MissusedException(obj) => obj
}
}
Why not something like
object Main {
def main(args: Array[String]): Unit = {
val seven = (for (
x <- 1 to 10
if x == 7
) yield x).headOption
}
}
Variable seven will be an Option holding Some(value) if value satisfies condition
I hope to help you.
I think ... no 'return' impl.
object TakeWhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else
seq(seq.takeWhile(_ == null).size)
}
object OptionLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T], index: Int = 0): T = if (seq.isEmpty) null.asInstanceOf[T] else
Option(seq(index)) getOrElse func(seq, index + 1)
}
object WhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else {
var i = 0
def obj = seq(i)
while (obj == null)
i += 1
obj
}
}
objects iterator filter { obj => (expensiveFunc(obj) != null } next
The trick is to get some lazy evaluated view on the colelction, either an iterator or a Stream, or objects.view. The filter will only execute as far as needed.