I have an XML document representing my model that I need to parse and save to a database. Some fields may hold NULL values, indicated by xsi:nil, like so:
<quantity xsi:nil="true"/>
For parsing I use the scala.xml DSL. The problem is that I can't find any way to determine whether an element is nil. This: (elem \ "quantity") just yields an empty string, which then blows up when I try to convert it to a number. Wrapping it in Option doesn't help either.
Is there any way to get None, Nil or even null from that XML piece?
In this case, you can call the attribute method with the namespace URI to read the value of the "xsi:nil" attribute.
Here is a working example:
scala> val xml = <quantity xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
xml: scala.xml.Elem = <quantity xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"></quantity>
scala> xml.attribute("http://www.w3.org/2001/XMLSchema-instance", "nil")
res0: Option[Seq[scala.xml.Node]] = Some(true)
If you treat an empty node as None, you don't even need to look at the attribute. Just filter out nodes that have no text inside them and use headOption to get the value.
scala> val s1 = <quantity xsi:nil="true">12</quantity>
s1: scala.xml.Elem = <quantity xsi:nil="true">12</quantity>
scala> val s2 = <quantity xsi:nil="true"/>
s2: scala.xml.Elem = <quantity xsi:nil="true"></quantity>
scala> s1.filterNot(_.text.isEmpty).headOption.map(_.text.toInt)
res10: Option[Int] = Some(12)
scala> s2.filterNot(_.text.isEmpty).headOption.map(_.text.toInt)
res11: Option[Int] = None
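The two ideas above can be combined. Below is a minimal sketch of a helper that returns None for xsi:nil elements as well as for missing, empty, or non-numeric ones. It assumes the scala-xml module is on the classpath; XsiNil, isNil and intField are names made up for this illustration and are not part of scala.xml.

```scala
import scala.util.Try
import scala.xml.Node

object XsiNil {
  // Namespace URI that the xsi prefix is conventionally bound to.
  val XsiNs = "http://www.w3.org/2001/XMLSchema-instance"

  // True when the node carries xsi:nil="true" (or "1", the other XML Schema spelling of true).
  def isNil(node: Node): Boolean =
    node.attribute(XsiNs, "nil").exists { value =>
      val text = value.map(_.text).mkString.trim
      text == "true" || text == "1"
    }

  // First child element with the given label, parsed as Int.
  // Yields None if the child is missing, nil, empty, or not a number.
  def intField(parent: Node, label: String): Option[Int] =
    (parent \ label).headOption
      .filterNot(isNil)
      .flatMap(n => Try(n.text.trim.toInt).toOption)
}
```

Note that the attribute lookup only matches when the xsi prefix is actually declared (xmlns:xsi=...) on the element or one of its ancestors.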
If you use xtract, you can do this with a combination of filter and optional:
(__ \ "quantity").read[Node]
.filter(_.attribute("http://www.w3.org/2001/XMLSchema-instance", "nil").isEmpty)
.map(_.text.toDouble).optional
See https://www.lucidchart.com/techblog/2016/07/12/introducing-xtract-a-new-xml-deserialization-library-for-scala/
Disclaimer: I work for Lucid Software and am a contributor to xtract.
I am working on Windows and I have to parse XML from a file.
The issue is that when I parse the root element and get its children via the child method, I get empty children.
XML.load("my_path\\sof.xml").child
res0: Seq[scala.xml.Node] = List(
, <b/>,
)
This is my xml file
sof.xml
<a>
<b></b>
</a>
But when I remove every \n and \r from the file, like this:
sof.xml
<a><b></b></a>
I get the following result, which is what I expect:
res0: Seq[scala.xml.Node] = List(<b/>)
My question is: is there an option to read it correctly from the indented form?
The issue is that the newlines and whitespace are treated as Text nodes. The scala.xml.Utility.trim(x: Node) method will remove the unnecessary whitespace:
scala> val a = XML.loadString("""<a>
| <b></b>
| </a>""")
a: scala.xml.Elem =
<a>
<b/>
</a>
scala> scala.xml.Utility.trim(a)
res0: scala.xml.Node = <a><b/></a>
Note that this differs from the .collect approach if you have actual Text nodes in between elements, e.g.:
scala> val a = XML.loadString("""<a>
| <b>Test </b> Foo
| </a>""")
a: scala.xml.Elem =
<a>
<b>Test </b> Foo
</a>
scala> scala.xml.Utility.trim(a).child
res0: Seq[scala.xml.Node] = List(<b>Test</b>, Foo)
scala> a.child.collect { case e: scala.xml.Elem => e }
res1: Seq[scala.xml.Elem] = List(<b>Test </b>)
With the .collect method, the "Foo" string is excluded from the list of children.
I checked that with this on Mac:
XML.loadString("""<a>
| <b></b>
|</a>""").child
This results in the same behavior, which I don't understand either.
However, you can fix this in your code:
XML.loadString("""<a>
| <b></b>
|</a>""").child
.collect { case e: scala.xml.Elem => e }
This eliminates the scala.xml.Text nodes.
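If you want something between the two approaches, i.e. dropping only the whitespace-only Text nodes while keeping real text such as "Foo" untouched, a filter on the children is one option. This is only a sketch, assuming scala-xml is on the classpath; meaningfulChildren is a hypothetical helper name, not a library method.

```scala
import scala.xml.{Node, Text}

object MeaningfulChildren {
  // Keeps every child except Text nodes that consist solely of whitespace.
  // Unlike Utility.trim, the surviving nodes are returned unchanged (not trimmed).
  def meaningfulChildren(parent: Node): Seq[Node] =
    parent.child.filter {
      case t: Text => t.text.trim.nonEmpty
      case _       => true
    }
}
```

For the indented sof.xml from the question this leaves just the b element, and for mixed content it keeps both the element and the text that follows it.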
I have an RDD with JSON rows:
val jsons = sc.textFile("hdfs://" + directory + "articles_json/*/*").flatMap(_.split("\n")).
map(x => JSON.parseFull(x))
Each JSON object has a "dc:title" field, and I want to create an RDD of these titles together with their indexes.
val titles_rdd = jsons.filter(x => x.isDefined).
map(x => x.get.asInstanceOf[Map[String, Any]].
get("dc:title").get.asInstanceOf[String]).zipWithIndex()
But I don't understand: should I use .get in x => x.get.asInstanceOf inside map, or just x => x.asInstanceOf? And the same question applies to the .get after get("dc:title").
Did you try sqlContext? Parsing is much simpler with it.
https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets
It would be great if you could give a sample of your JSON.
EDITED:
I assume this is your question.
You have a list:
scala> val a = List(Some(1),Some(2),Some(3),None,Some(4))
a: List[Option[Int]] = List(Some(1), Some(2), Some(3), None, Some(4))
and you want to know whether you should retrieve the values like this:
scala> val b = a.filter{_.isDefined}.map{x => x.get.asInstanceOf[Int]}
b: List[Int] = List(1, 2, 3, 4)
or like this:
scala> val b = a.filter{_.isDefined}.map{x => x.asInstanceOf[Int]}
If you run the code above, you'll get the exception below.
java.lang.ClassCastException: scala.Some cannot be cast to java.lang.Integer
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:105)
at $anonfun$2.apply(:8)
at $anonfun$2.apply(:8)
at scala.collection.immutable.List.map(List.scala:272)
... 33 elided
The reason is simple: you want the value that resides inside the Some, but a cast does not unwrap it; it only tries to treat the Some itself as your desired type.
In your example above, on line 2,
map (x => x...)
x is an Option (here a Some). If you want its value, you have to call get; otherwise you won't get the value.
The link below may be of some help:
http://www.scala-lang.org/api/current/index.html#scala.Some
Please let me know if anything is still unclear.
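If all the lines parse into JSON objects (i.e. no arrays), you could use a for comprehension:
val titles_rdd = (for {
json <- jsons
// parseFull returns Option[Any], so narrow it to a Map before calling get
jmap <- json.collect { case m: Map[_, _] => m.asInstanceOf[Map[String, Any]] }
jtitle <- jmap.get("dc:title")
} yield jtitle).zipWithIndex()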
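The same "never call .get" idea can be tried without Spark. The sketch below uses a plain List as a stand-in for the RDD of parsed rows; the sample rows are invented for illustration. On a real RDD, zipWithIndex yields Long indexes rather than Int.

```scala
object TitleExtraction {
  // Stand-in for the RDD of parsed rows: JSON.parseFull yields Option[Any].
  val parsed: List[Option[Any]] = List(
    Some(Map("dc:title" -> "First")),
    None,                              // a row that failed to parse
    Some(Map("other" -> 1)),           // an object without a title
    Some(Map("dc:title" -> "Second"))
  )

  val titles: List[(String, Int)] = parsed
    // keep only rows that parsed into a JSON object
    .collect { case Some(m: Map[_, _]) => m.asInstanceOf[Map[String, Any]] }
    // Map.get returns an Option, so rows without a title simply drop out
    .flatMap(_.get("dc:title"))
    // keep only titles that really are strings
    .collect { case title: String => title }
    .zipWithIndex
}
```

Rows that fail to parse, lack the field, or hold a non-string title are skipped instead of throwing.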
I have two RDDs that I joined using a left join. As a result, the fields of the right RDD are now typed as Option, since they might be None (null). When I write the result to a file, it looks something like Some(value), for example: Some('value1'), Some('Value2').
How can I remove the 'Some', i.e. remove the Option from the field definition?
If you have an Option[String] and turn this into a String, you still need to handle the case where your value is None.
For example, you can turn None into an empty string:
val myInput: Option[String] = ...
val myOutput: String = myInput.getOrElse("")
Or into null:
val myInput: Option[String] = ...
val myOutput: String = myInput.orNull
Or not write them at all:
val myInput: Option[String] = ...
// Does nothing if myInput is None
myInput.foreach(writeToFile)
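Applied to the shape a left join produces, this could look like the sketch below. It uses a plain Seq in place of the joined RDD; the keys, values and the CSV-like output format are assumptions made for illustration.

```scala
object JoinOutput {
  // Same shape as the result of a left outer join: (key, (leftValue, Option[rightValue])).
  val joined: Seq[(String, (Int, Option[String]))] = Seq(
    ("k1", (1, Some("value1"))),
    ("k2", (2, None))
  )

  // Unwrap the Option while formatting each line; None becomes an empty field.
  val lines: Seq[String] = joined.map { case (key, (left, right)) =>
    s"$key,$left,${right.getOrElse("")}"
  }
}
```

With an RDD the same map can be applied before writing the result out.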
post("/api/v1/multi_preview/create"){
val html = getParam("html").get
val subject = getParam("subject").get
}
I want to know what exactly the .get method does in Scala. getParam() already returns the parameters of the POST request. I know that .get makes things easier, as we don't have to "match" to check for null values, because it automatically throws an exception in that case.
Is there more to it than meets the eye?
It's usually a function on Options (i.e. Some or None). It gets you the contained element if it exists, otherwise it throws a NoSuchElementException.
https://www.scala-lang.org/api/current/scala/Option.html
scala> val x:Option[Int] = Some(42)
x: Option[Int] = Some(42)
scala> x.get
res2: Int = 42
scala> None.get
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:322)
... 32 elided
As a side note, you should try to avoid using get because it lands you back in the land of null-pointer exceptions. Instead, try to use getOrElse, or continue to use your Option value through higher-order functions like map, filter, fold, reduce etc.
Here is an example of how you can use it to your advantage:
scala> def foo(opt:Option[Int]) = opt map (_+2) filter (_%2 == 0) map (_+1)
foo: (opt: Option[Int])Option[Int]
scala> foo(Some(40))
res4: Option[Int] = Some(43)
scala> foo(Some(41))
res5: Option[Int] = None
scala> foo(None)
res6: Option[Int] = None
You can just pretend that the value is always specified if you don't "touch" it directly.
I suppose that's Scalatra-related code. If so, getParam returns an Option. Option is a wrapper around a value that lets you avoid checking for nulls (and offers other utilities too). A value wrapped in an Option can be a Some, in which case you can use get to access the value, e.g.
val someString = Option("some text")
println(someString.get) // prints "some text"
Or it can be a None, in which case calling get throws an exception. Whether a value is a Some or a None can be determined via pattern matching:
someOption match {
case Some(value) => doSomething(value)
case None => doSomethingElse()
}
Or by using isDefined, which returns true for a Some and false for a None.
Note that your code could throw exceptions, since you call get without knowing whether it's a Some or a None. You should use getOrElse instead, which returns the value the Option holds if there is one, or the default you pass in otherwise:
val someNone = Option(null)
println(someNone.getOrElse("some default")) // prints "some default"
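When several parameters are required at once, as in the handler from the question, a for comprehension over the Options avoids calling get altogether. The sketch below is only an illustration: this getParam is a hypothetical stand-in that reads from a Map, not the framework's method, and the error message is made up.

```scala
object ParamHandling {
  // Hypothetical stand-in for the framework's getParam: looks a parameter up in a Map.
  def getParam(params: Map[String, String], name: String): Option[String] =
    params.get(name)

  // Right((html, subject)) when both parameters are present, Left(message) otherwise.
  def preview(params: Map[String, String]): Either[String, (String, String)] = {
    val both = for {
      html    <- getParam(params, "html")
      subject <- getParam(params, "subject")
    } yield (html, subject)
    both.toRight("missing html or subject")
  }
}
```

The caller can then pattern match on the Either and answer with an error response instead of letting a NoSuchElementException escape.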
Let's say that I have a List (or the values in a Map), and I want to perform an operation on each item. Unfortunately, for whatever reason, this list of values can contain nulls.
scala> val players = List("Messi", null, "Xavi", "Iniesta", null)
players: List[java.lang.String] = List(Messi, null, Xavi, Iniesta, null)
In order to avoid blowing up with an NPE, I need to do the following:
scala> players.filterNot(_ == null ).map(_.toUpperCase)
res84: List[java.lang.String] = List(MESSI, XAVI, INIESTA)
Is there any better way of doing this?
Ideally something like:
players.safeMap(_.toUpperCase)
On the scala-language mailing list, Simon proposed this:
players.filter ( null !=).map(_.toUpperCase )
which is a shorter version of my original take, and as short as you can get without a dedicated method.
Even better, Stefan and Kevin proposed the method withFilter which will return a lazy proxy, so both operations can be merged.
players.withFilter ( null !=).map(_.toUpperCase )
If you can’t avoid nulls (e.g. if you get your list from Java code), another alternative is to use collect instead of map:
scala> players.collect { case player if player != null => player.toUpperCase }
res0: List[java.lang.String] = List(MESSI, XAVI, INIESTA)
I'd do this:
players flatMap Option map (_.toUpperCase)
But that's worse than collect. filter + map is always better done with collect.
You could convert to a list of Option[String]:
scala> val optionPlayers = players.map(Option(_))
optionPlayers: List[Option[java.lang.String]] = List(Some(Messi), None, Some(Xavi), Some(Iniesta), None)
Option is universally preferred to null and it gives you a lot of flexibility in how you can safely handle the data. Here are three easy ways to get the result you were looking for:
scala> optionPlayers.collect { case Some(s) => s.toUpperCase }
res0: List[java.lang.String] = List(MESSI, XAVI, INIESTA)
scala> optionPlayers.flatMap(_.map(_.toUpperCase))
res1: List[java.lang.String] = List(MESSI, XAVI, INIESTA)
scala> optionPlayers.flatten.map(_.toUpperCase)
res2: List[java.lang.String] = List(MESSI, XAVI, INIESTA)
You can find a lot more information about Option in other StackOverflow questions or by searching the web.
Or, you can always just define that safeMap method you wanted as an implicit on List:
implicit def enhanceList[T](list: List[T]) = new {
def safeMap[R](f: T => R) = list.filterNot(_ == null).map(f)
}
so you can do:
scala> players.safeMap(_.toUpperCase)
res4: List[java.lang.String] = List(MESSI, XAVI, INIESTA)
Though if you define an implicit, you might want to use a CanBuildFrom style like the basic collections do to make it work on more than just List. You can find more information about that elsewhere.
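On Scala 2.10 or later, the same enrichment can be written as an implicit class, which avoids the reflective call that the anonymous new { ... } in the version above relies on. This is a minimal sketch that deliberately sticks to List rather than the general CanBuildFrom style; NullSafe and SafeMapOps are names chosen for illustration.

```scala
object NullSafe {
  // Adds safeMap to List: null elements are skipped, the rest are mapped with f.
  implicit class SafeMapOps[T](list: List[T]) {
    def safeMap[R](f: T => R): List[R] =
      list.collect { case x if x != null => f(x) }
  }
}
```

After import NullSafe._ the call players.safeMap(_.toUpperCase) works as in the question.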