I am storing error code and its string message in this way:
object Error {
def FileNotFound(filename: String) = Error("ERR01", s"${filename} not found")
def UserNotExist(userName: String) = Error("ERR02", s"${userName} not exist")
}
case class Error(code: String, value: String) {}
Benefit of keeping it like this is that I can pass string value to error message.
And I am creating it like
def validate(data: SomeType): List[Error] = {
var errors = ListBuffer[Error]()
if (validation1("abc") == false) {
errors+= Error.FileNotFound("abc")
}
if (validation2("lmn") == false) {
errors+= Error.UserNotExist("lmn")
}
errors.toList
}
I am new to Scala and functional programming.
Is it right way to write error code like this?
Is it following functional programming paradigm?
Scala: 2.11
Here are my two cents with a full example.
I assume you want to do this in nice-and-easy functional style, so let's remove the mutable ListBuffer and see how to deal with it. The description of the steps are after the code.
UPDATE: I want to note that, in your code, the usage of ListBuffer is acceptable because it doesn't break rerefential transparency. This is because you don't expose its mutable nature outside the function and the output of the function depends only on its inputs.
Nonetheless I usually prefer to avoid using mutable data if not for specific reasons, like compactness or performance.
object StackOverflowAnswer {
/* Base type with some common fields */
class MyError( val code: String, val msg: String )
/* FileNotFound type, subtype of MyError, with its specific fields */
case class FileNotFound( filename: String ) extends MyError(
"ERR01", s"$filename not found"
)
/* UserDoesntExist type, subtype of MyError, with its specific fields */
case class UserDoesntExist( userName: String ) extends MyError(
"ERR01", s"$userName doesn't exist"
)
/*
* Validates the file. If it finds an error it returns a Some(MyError) with
* the error that's been found
* */
def checkForBadFile( data: String ): Option[MyError] =
if( data.contains("bad_file.txt") )
Some(FileNotFound("bad_file.txt"))
else
None
/*
* Validates the user. If it finds an error it returns a Some(MyError) with
* the error that's been found
* */
def checkForMissingUser( data: String ): Option[MyError] =
if( data.contains("bad_username") )
Some(UserDoesntExist("bad_username"))
else
None
/*
* Performs all the validations and returns a list with the errors.
*/
def validate( data: String ): List[MyError] = {
val fileCheck = checkForBadFile( data )
val userCheck = checkForMissingUser( data )
List( fileCheck, userCheck ).flatten
}
/* Run and test! */
def main( args: Array[String] ): Unit = {
val goodData = "This is a text"
val goodDataResult = validate( goodData ).map( _.msg )
println(s"The checks for '$goodData' returned: $goodDataResult")
val badFile = "This is a text with bad_file.txt"
val badFileResult = validate( badFile ).map( _.msg )
println(s"The checks for '$badFile' returned: $badFileResult")
val badUser = "This is a text with bad_username"
val badUserResult = validate( badUser ).map( _.msg )
println(s"The checks for '$badUser' returned: $badUserResult")
val badBoth = "This is a text with bad_file.txt and bad_username"
val badBothResult = validate( badBoth ).map( _.msg )
println(s"The checks for '$badBoth' returned: $badBothResult")
}
}
I start defining the type structure for the error, similar to your.
Then I have two functions that performs validations for each check you want. When they find an error, they return it using the Option type of Scala. If you're not familiar with Option, you can have a look at this link or do some googling.
Then I have the validation function that calls and store each the above single checks. The last bit is the use of flatten (doc here) that "flattens" List[Option[MyError]] into List[MyError], removing the middle Option.
Then there is the actual code that shows you some examples.
Related
I have a pipeline in a place where data is being sent from Flink to Kafka topic in a JSON format. I was also able to get it from the Kafka topic and was able to get the JSON attributes as well. Now, like scala reflect classes where I can also compare the data type at runtime, I was trying to do the same thing in Fink using TypeInformation where I can set some predefined format and whatever data is being read from topic should go under this Validation and should be passed or failed accordingly.
I have a data like below:.
{"policyName":"String", "premium":2400, "eventTime":"2021-12-22 00:00:00" }
For my problem, I came across a couple of examples in Flink's book where it is mentioned how to create a TypeInformation variable but there was nothing mentioned on how to use it so I tried my way:
val objectMapper = new ObjectMapper()
val tupleType: TypeInformation[(String, String, String)] =
Types.TUPLE[(String, Int, String)]
println(tupleType.getTypeClass)
src.map(v => v)
.map { x =>
val policyName: String = objectMapper.readTree(x).get("policyName").toString()
val premium: Int = objectMapper.readTree(x).get("premium").toString().toInt
val eventTime: String = objectMapper.readTree(x).get("eventTime").toString()
if ((policyName, premium, eventTime)== tupleType.getTypeClass) {
println("Good Record: " + (policyName, premium, eventTime))
}
else {
println("Bad Record: " + (id, category, eventTime))
}
}
Now if I pass the input as below to the flink kafka producer:
{"policyName":"whatever you feel like","premium":"4000","eventTime":"2021-12-20 00:00:00"}
It should give me the expected output as a "Bad record" and the tuple since the datatype of premium is String and not Long/Int.
If a pass the input as below:
{"policyName":"whatever you feel like","premium":4000,"eventTime":"2021-12-20 00:00:00"}
It should give me the output as "Good Record" and the tuple
But according to my code, it is always giving me the else part.
If I create a datastream variable and store the results of the above map and then compare like below then it gives me the correct result:
if (tupleType == datas.getType()) { //where 'datas' is a datastream
print("Good Records")
} else {
println("Bad Records")
}
But I want to send the good/bad records to a different stream or maybe can directly be inserted in the Cassandra table. So, that is why I am using loops for identifying the records one by one. Is my way correct? What would be the best practice considering what I am trying to achieve?
Based on Dominik's inputs, I tried creating my ow CustomDeserializer class:
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.flink.api.common.serialization.DeserializationSchema
import org.apache.flink.api.common.typeinfo.TypeInformation
import java.nio.charset.StandardCharsets
class sample extends DeserializationSchema[String] {
override def deserialize(message: Array[Byte]): Tuple3[Int, String, String] = {
val data = new String(message,
StandardCharsets.UTF_8)
val objectMapper = new ObjectMapper()
val id: Int = objectMapper.readTree(data).get("id").toString().toInt
val category: String = objectMapper.readTree(data).get("Category").toString()
val eventTime: String = objectMapper.readTree(data).get("eventTime").toString()
return (id, category, eventTime)
}
override def isEndOfStream(t: String): Boolean = ???
override def getProducedType: TypeInformation[String] = return TypeInformation.of(classOf[String])
}
I wanna try to implement something like below:
src.map(v => v)
.map { x =>
if (new sample().deserialize(x)==true) {
println("Good Record: " + (id, category, eventTime))
}
else {
println("Bad Record: " + (id, category, eventTime))
}
}
But the input is in Array[Bytes] form. So how can I implement it? Where exactly I am going wrong? What needs to be modified? This is my first ever attempt in Flink Scala custom classes.
Inputs Passed: Inputs
I don't really think that using TypeInformation to do what You want is best idea. You can simply use something like ProcessFunction that will accept a JSON String and then use the ObjectMapper to deserialize JSON to class with the expected structure. You can output the correctly deserialized objects from the ProcessFunction and the Strings that failed deserialization can be apassed as side output since they will be Your Bad Records.
This could look like below, note that this uses Jackson scala to perform deserialization to case class. You can find more info here
case class Premium(policyName: String, premium: Long, eventTime: String)
class Splitter extends ProcessFunction[String, Premium] {
val outputTag = new OutputTag[String]("failed")
def fromJson[T](json: String)(implicit m: Manifest[T]): Either[String, T] = {
Try {
lazy val mapper = new ObjectMapper() with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
mapper.readValue[T](json)
} match {
case Success(x) => Right(x)
case Failure(err) => {
Left(json)
}
}
}
override def processElement(i: String, context: ProcessFunction[String, Premium]#Context, collector: Collector[Premium]): Unit = {
fromJson(i) match {
case Right(data) => collector.collect(data)
case Left(json) => context.output(outputTag, json)
}
}
}
Then You can use the outputTag to get the side output data from the stream to get incorrect records.
I am trying to build a multi level Validator object that is fairly generic. The idea being you have levels of validations if Level 1 passes then you do Level 2, etc. but I am struggling with one specific area: creating a function call but not executing it until a later point.
Data:
case class FooData(alpha: String, beta: String) extends AllData
case class BarData(gamma: Int, delta: Int) extends AllData
ValidationError:
case class ValidationError(code: String, message: String)
Validator:
object Validator {
def validate(validations: List[List[Validation]]): List[ValidationError] = {
validations match {
case head :: nil => // Execute the functions and get the results back
// Recursively work down the levels (below syntax may be incorrect)
case head :: tail => validate(head) ... // if no errors then validate(tail) etc.
...
}
}
}
Sample Validator:
object CorrectNameFormatValidator extends Validation {
def validate(str: String): Seq[ValidationError] = {
...
}
}
How I wish to use it:
object App {
def main(args: Array[String]): Unit = {
val fooData = FooData(alpha = "first", beta = "second")
val levelOneValidations = List(
CorrectNameFormatValidator(fooData.alpha),
CorrectNameFormatValidator(fooData.beta),
SomeOtherValidator(fooData.beta)
)
// I don't want these to execute as function calls here
val levelTwoValidations = List(
SomeLevelTwoValidator (fooData.alpha),
SomeLevelTwoValidator(fooData.beta),
SomeOtherLevelValidator(fooData.beta),
SomeOtherLevelValidator(fooData.alpha)
)
val validationLevels = List(levelOneValidations, levelTwoValidations)
Validator.validate(validationLevels)
}
}
Am I doing something really convoluted when I don't need to be or am I just missing a component?
Essentially I want to define when a function will be called and with which parameters but I don't want the call to happen until I say within the Validator. Is this something that's possible?
You can use lazy val or def when defining levelOneValidation, levelTwoValidations and validationLevel:
object App {
def main(args: Array[String]): Unit = {
val fooData = FooData(alpha = "first", beta = "second")
lazy val levelOneValidations = List(
CorrectNameFormatValidator(fooData.alpha),
CorrectNameFormatValidator(fooData.beta),
SomeOtherValidator(fooData.beta)
)
// I don't want these to execute as function calls here
lazy val levelTwoValidations = List(
SomeLevelTwoValidator (fooData.alpha),
SomeLevelTwoValidator(fooData.beta),
SomeOtherLevelValidator(fooData.beta),
SomeOtherLevelValidator(fooData.alpha)
)
lazy val validationLevels = List(levelOneValidations, levelTwoValidations)
Validator.validate(validationLevels)
}
}
You also need to change validate method to get the validations ByName and not
ByValue using : => :
object Validator {
def validate(validations: => List[List[Validation]]): List[ValidationError] = {
validations match {
case head :: nil => // Execute the functions and get the results back
// Recursively work down the levels (below syntax may be incorrect)
case head :: tail => validate(head) ... // if no errors then validate(tail) etc.
...
}
}
}
Anyway, I think you can implement it differently by just using some OOP design patterns, like Chain of Responsibility.
I have a parameter
case class Envelope(subject: Option[String]) {
}
and I want to apply a require function only if subject is non null.
Something like below:
require(StringUtils.isNotBlank(subject))
You can try this also:
case class Envelope(subject: Option[String])
def check(en: Envelope): Boolean = {
require(en.subject.isDefined)
true
}
It will be finer and you don't have any need to import StringUtils.
For fetching the value from Option you should go for the getorElse. Here we can define the default value for the variable. Ex:
def check(str: Option[String]): String = {
str.getOrElse("")
}
scala> check(None)
res1: String = ""
scala> check(Some("Test"))
res2: String = Test
Only get will throw the exception when it will get None. Ex:
def check(str: Option[String]): String = {
str.get
}
scala> check(Some("Test"))
res2: String = Test
scala> check(None)
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at check(<console>:24)
... 48 elided
I think you can go that way:
subject.map(s => if(StringUtils.isNotBlank(s)) require(s) else s)
Options should, from a practical perspective, almost never be null in Scala (I'd argue never, but there are all sorts of compromises and exceptions, especially when working with Java dependencies where null checks are more common-place).
With some pre-processing and pattern-matching, you may find this approach acceptable:
// When working with `Option`s, naming convention usually is "[name_of_var]Opt"
case class Envelope( subjectOpt: Option[String] ) {
// Utility to check whether or not this `Envelope` has a subject
def hasSubject: Boolean = subjectOpt.isDefined
// Get the subject or throw an exception if it does not have one (None)
def getSubject: String = {
subjectOpt match {
case None => throw Exception( "Envelope has no subject!" )
case Some( subject ) => subject
}
}
}
object Envelope {
// Utility to create an `Envelope` with no subject
def apply(): Envelope = Envelope(None)
// Utility to do a bit of pre-processing, checking for `null` or empty strings
def apply( subject: String ): Envelope = {
if ( subject == null || subject.isEmpty ) Envelope()
else Envelope( Some( subject ) )
}
}
val nullString: String = null
// Example usages (will have no subject):
val envelopeWithNoSubject: Envelope = Envelope()
val envelopeWithEmptySubject: Envelope = Envelope( "" )
val envelopeWithNullSubject: Envelope = Envelope( nullString )
// envelopeWithNoSubject, ...EmptySubject, and ...NullSubject all return:
... .hasSubject // False
... .getSubject // Exception: Envelope has no subject!
// Example use with subject
val envelopeWithSubject: Envelope = Envelope( "Scala" )
envelopeWithSubject.hasSubject// True
envelopeWithSubject.getSubject // "Scala"
You should be changing the require function call to the following
require(StringUtils.isBlank(subject.get))
.get method returns the string that the subject Option carries.
I'm new to the concept of using the Option type but I've tried to use it multiple places in this class to avoid these errors.
The following class is used to store data.
class InTags(val tag35: Option[String], val tag11: Option[String], val tag_109: Option[String], val tag_58: Option[String])
This following code takes a string and converts it into a Int -> String map by seperating on an equals sign.
val message= FIXMessage("8=FIX.4.29=25435=D49=REDACTED56=REDACTED115=REDACTED::::::::::CENTRAL34=296952=20151112-17:11:1111=Order7203109=CENTRAL1=TestAccount63=021=155=CSCO48=CSCO.O22=5207=OQ54=160=20151112-17:11:1338=5000040=244=2815=USD59=047=A13201=CSCO.O13202=510=127
")
val tag58 = message.fields(Some(58)).getOrElse("???")
val in_messages= new InTags(message.fields(Some(35)), message.fields(Some(11)), message.fields(Some(109)), Some(tag58))
println(in_messages.tag_109.getOrElse("???"))
where the FIXMessage object is defined as follows:
class FIXMessage (flds: Map[Option[Int], Option[String]]) {
val fields = flds
def this(fixString: String) = this(FIXMessage.parseFixString(Some(fixString)))
override def toString: String = {
fields.toString
}
}
object FIXMessage{
def apply(flds: Map[Option[Int], Option[String]]) = {
new FIXMessage(flds)
}
def apply(flds: String) = {
new FIXMessage(flds)
}
def parseFixString(fixString: Option[String]): Map[Option[Int], Option[String]] = {
val str = fixString.getOrElse("str=???")
val parts = str.split(1.toChar)
(for {
part <- parts
p = part.split('=')
} yield Some(p(0).toInt) -> Some(p(1))).toMap
}
}
The error I'm getting is ERROR key not found: Some(58) but doesnt the option class handle this? Which basically means that the string passed into the FIXMessage object doesnt contain a substring of the format 58=something(which is true) What is the best way to proceed?
You are using the apply method in Map, which returns the value or throw NoSuchElementException if key is not present.
Instead you could use getOrElse like
message.fields.getOrElse(Some(58), Some("str"))
I have the below requirement where I am checking whether a value is greater than 10 or not and based on that I will break, otherwise I will return a String. Below is my code:
import scala.util.control.Breaks._
class BreakInScala {
val breakException = new RuntimeException("Break happened")
def break = throw breakException
def callMyFunc(x: Int): String = breakable(myFunc(x))
def myFunc(x: Int): String = {
if (x > 10) {
"I am fine"
} else {
break
}
}
}
Now what is the happening is that I am getting the error message saying "type mismatch; found : Unit required: String" The reason is :
def breakable(op: => Unit)
But then how I will write a function which can return value as well as break if required?
The Scala compiler can evaluate that a branch throws an exception and not use it to form a minimum bound for the return type, but not if you move the throwing code out in a method: since it can be overridden, the compiler cannot be sure it will actually never return.
Your usage of the Break constructs seems confused: it already provides a break method, there is no need to provide your own, unless you want to throw your exception instead, which would make using Break unnecessary.
You are left with a couple of options then, since I believe usage of Break is unnecessary in your case.
1) Simply throw an exception on failure
def myFunc(x: Int): String = {
if (x > 10) {
"I am fine"
} else {
throw new RuntimeException("Break happened")
}
}
def usemyFunc(): Unit = {
try {
println("myFunc(11) is " + myFunc(11))
println("myFunc(5) is " + myFunc(5))
} catch {
case e: Throwable => println("myFunc failed with " + e)
}
}
2) Use the Try class (available from Scala 2.10) to return either a value or an exception. This differs from the previous suggestion because it forces the caller to inspect the result and check whether a value is available or not, but makes using the result a bit more cumbersome
import scala.util.Try
def myFunc(x: Int): Try[String] = {
Try {
if (x > 10) {
"I am fine"
} else {
throw new RuntimeException("Break happened")
}
}
}
def useMyFunc(): Unit = {
myFunc match {
case Try.Success(s) => println("myFunc succeded with " + s)
case Try.Failure(e) => println("myFunc failed with " + e)
}
}
3) If the thrown exception isn't relevant, you can use the Option class instead.
You can see how the multiple ways of working with Options relate to each other in
this great cheat sheet.
def myFunc(x: Int): Option[String] = {
if (x > 10) {
Some("I am fine") /* Some(value) creates an Option containing value */
} else {
None /* None represents an Option that has no value */
}
}
/* There are multiple ways to work with Option instances.
One of them is using pattern matching. */
def useMyFunc(): Unit = {
myFunc(10) match {
case Some(s) => println("myFunc succeded with " + s)
case None => println("myFunc failed")
}
}
/* Another one is using the map, flatMap, getOrElse, etc methods.
They usually take a function as a parameter, which is only executed
if some condition is met.
map only runs the received function if the Option contains a value,
and passes said value as a parameter to it. It then takes the result
of the function application, and creates a new Option containing it.
getOrElse checks if the Option contains a value. If it does, it is returned
directly. If it does not, then the result of the function passed to it
is returned instead.
Chaining map and getOrElse is a common idiom meaning roughly 'transform the value
contained in this Option using this code, but if there is no value, return whatever
this other piece of code returns instead'.
*/
def useMyFunc2(): Unit = {
val toPrint = myFunc(10).map{ s =>
"myFunc(10) succeded with " + s
}.getOrElse{
"myFunc(10) failed"
}
/* toPrint now contains a message to be printed, depending on whether myFunc
returned a value or not. The Scala compiler is smart enough to infer that
both code paths return String, and make toPrint a String as well. */
println(toPrint)
}
This is a slightly odd way of doing things (throwing an exception), an alternative way of doing this might be to define a "partial function" (a function which is only defined only a specific subset of it's domain a bit like this:
scala> val partial = new PartialFunction[Int, String] {
| def apply(i : Int) = "some string"
| def isDefinedAt(i : Int) = i < 10
}
partial: PartialFunction[Int, String] = <function1>
Once you've defined the function, you can then "lift" it into an Option of type Int, by doing the following:
scala> val partialLifted = partial.lift
partialOpt: Int => Option[String] = <function1>
Then, if you call the function with a value outside your range, you'll get a "None" as a return value, otherwise you'll get your string return value. This makes it much easier to apply flatMaps/ getOrElse logic to the function without having to throw exceptions all over the place...
scala> partialLifted(45)
res7: Option[String] = None
scala> partialLifted(10)
res8: Option[String] = Some(return string)
IMO, this is a slightly more functional way of doing things...hope it helps