ruamel.yaml support for both none & null

I have a YAML file which is as follows.
Input:
a:
  test: null
  test12:
Expected Output: (no change in Input)
a:
  test: null
  test12:
Below is my code:
import ruamel.yaml

def my_represent_none(self, data):
    return self.represent_scalar(u'tag:yaml.org,2002:null', u'')

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.allow_duplicate_keys = True
yaml.default_flow_style = None
yaml.representer.ignore_aliases = lambda *data: True
yaml.representer.add_representer(type(None), my_represent_none)
However, this leads to test: null being changed to test: (the empty representation). This should not happen; there should be no change to the input values.
How can I achieve this?

First of all, there is no change in the input value. Whether the
Null Language-Independent Type is
represented as ~, null, Null, NULL, or as the empty string,
the loaded value is always the same.
The output you want cannot be produced in a generic way, as you have two values that will
both be loaded into Python as None and would normally be represented in the same way. Contrary
to e.g. string scalars, where the quoting information can be preserved by subclassing str,
giving you a useful instance to work with, you cannot subclass None and expect the typical
if data['a']['test'] is None: to work as expected. So the actual representation of the
null values in the YAML source is not preserved, and the output is standardised to one and the
same value.
However, if the first null value needs to be represented as null and all the
others as the empty string (as in your expected output), you can do this:
import sys
import ruamel.yaml
yaml_str = """\
a:
  test: null
  test12:
"""
class NullRepresenter:
    def __init__(self):
        self.count = 0

    def __call__(self, repr, data):
        ret_val = repr.represent_scalar(u'tag:yaml.org,2002:null',
                                        u'null' if self.count == 0 else u'')
        self.count += 1
        return ret_val
my_represent_none = NullRepresenter()
yaml = ruamel.yaml.YAML()
yaml.representer.add_representer(type(None), my_represent_none)
yaml.dump(yaml.load(yaml_str), sys.stdout)
which gives:
a:
  test: null
  test12:
Given that, you could, during loading, mark the actual representation of all
null values and store them (with some extra methods on NullRepresenter),
then retrieve them during dumping. This will however be non-trivial
if the number or the order of keys/elements with None values changes,
or if the context changes and e.g. an empty string can no longer be used as a representation.
In Python, None is one and the same object wherever it is used in your data structure.
Representing it inconsistently doesn't make much sense, and is
likely to confuse human readers of the YAML document.

Related

Pattern for generating negative Scalacheck scenarios: Using property based testing to test validation logic in Scala

We are looking for a viable design pattern for building Scalacheck Gen (generators) that can produce both positive and negative test scenarios. This will allow us to run forAll tests to validate functionality (positive cases), and also verify that our case class validation works correctly by failing on all invalid combinations of data.
Making a simple, parameterized Gen that does this on a one-off basis is pretty easy. For example:
def idGen(valid: Boolean = true): Gen[String] = Gen.oneOf(ID.values.toList).map(s => if (valid) s else Gen.oneOf(simpleRandomCode(4), "").sample.get)
With the above, I can get a valid or invalid ID for testing purposes. The valid one, I use to make sure business logic succeeds. The invalid one, I use to make sure our validation logic rejects the case class.
Ok, so -- the problem is that, on a large scale, this becomes very unwieldy. Let's say I have a data container with, oh, 100 different elements. Generating a "good" one is easy. But now I want to generate a "bad" one, and furthermore:
I want to generate a bad one for each data element, where a single data element is bad (so at minimum, at least 100 bad instances, testing that each invalid parameter is caught by validation logic).
I want to be able to override specific elements, for instance feeding in a bad ID or a bad "foobar." Whatever that is.
One pattern we can look to for inspiration is apply and copy, which allows us to easily compose new objects while specifying overridden values. For example:
val f = Foo("a", "b") // f: Foo = Foo(a,b)
val t = Foo.unapply(f) // t: Option[(String, String)] = Some((a,b))
Foo(t.get._1, "c") // res0: Foo = Foo(a,c)
Above we see the basic idea of creating a mutated copy from the template of another object. This is more easily expressed in Scala as:
val f = someFoo copy(b = "c")
Using this as inspiration we can think about our objectives. A few things to think about:
First, we could define a map or a container of key/values for the data element and generated value. This could be used in place of a tuple to support named value mutation.
Given a container of key/value pairs, we could easily select one (or more) pairs at random and change a value. This supports the objective of generating a data set where one value is altered to create failure.
Given such a container, we can easily create a new object from the invalid collection of values (using either apply() or some other technique).
Alternatively, perhaps we can develop a pattern that uses a tuple and then just apply() it, kind of like the copy method, as long as we can still randomly alter one or more values.
We can probably explore developing a reusable pattern that does something like this:
def thingGen(invalidValueCount: Int): Gen[Thing] = ???
def someTest = forAll(thingGen) { v => val invalidV = v.invalidate(1); validate(invalidV) must beFalse }
In the above code, we have a generator thingGen that returns (valid) Things. Then for all instances returned, we invoke a generic method invalidate(count: Int) which will randomly invalidate count values, returning an invalid object. We can then use that to ascertain whether our validation logic works correctly.
This would require defining an invalidate() function that, given a parameter (either by name or by position), can replace the identified parameter with a value that is known to be bad. This implies having an "anti-generator" for specific values: for instance, if an ID must be 3 characters, it knows to create a string that is anything but 3 characters long.
Of course to invalidate a known, single parameter (to inject bad data into a test condition) we can simply use the copy method:
def thingGen(invalidValueCount: Int): Gen[Thing] = ???
def someTest = forAll(thingGen) { v => val v2 = v.copy(id = "xxx"); validate(v2) must beFalse }
That is the sum of my thinking to date. Am I barking up the wrong tree? Are there good patterns out there that handle this kind of testing? Any commentary or suggestions on how best to approach this problem of testing our validation logic?
We can combine a valid instance and a set of invalid fields (such that every field, if copied in, would cause a validation failure) to get invalid objects, using the shapeless library.
Shapeless allows you to represent your class as a list of key-value pairs that are still strongly typed and support some high-level operations, and to convert back from this representation to your original class.
In the example below, I'll provide an invalid instance for each single field provided.
import shapeless._, record._
import shapeless.labelled.FieldType
import shapeless.ops.record.Updater
A detailed intro
Let's pretend we have a data class, and a valid instance of it (we only need one, so it can be hardcoded)
case class User(id: String, name: String, about: String, age: Int) {
  def isValid = id.length == 3 && name.nonEmpty && age >= 0
}
val someValidUser = User("oo7", "Frank", "A good guy", 42)
assert(someValidUser.isValid)
We can then define a class to be used for invalid values:
case class BogusUserFields(name: String, id: String, age: Int)
val bogusData = BogusUserFields("", "1234", -5)
Instances of such classes can be provided using ScalaCheck; it's much easier to write a generator where all fields would cause failure (see the sketch below). The order of the fields doesn't matter, but their names and types do. Here we excluded about from the User set of fields, so we can do what you asked for (feeding in only the subset of fields you want to test).
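For example, a generator for BogusUserFields might look like this minimal sketch (the name bogusUserFieldsGen and the particular invalid ranges are my assumptions, derived from the isValid rule above):

import org.scalacheck.Gen

// Each field is drawn only from values that violate User.isValid.
val bogusUserFieldsGen: Gen[BogusUserFields] = for {
  name <- Gen.const("")                           // empty name fails name.nonEmpty
  id   <- Gen.alphaNumStr.suchThat(_.length != 3) // fails id.length == 3
  age  <- Gen.negNum[Int]                         // negative age fails age >= 0
} yield BogusUserFields(name, id, age)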
We then use LabelledGeneric[T] to convert User and BogusUserFields to their corresponding record value (and later we will convert User back)
val userLG = LabelledGeneric[User]
val bogusLG = LabelledGeneric[BogusUserFields]
val validUserRecord = userLG.to(someValidUser)
val bogusRecord = bogusLG.to(bogusData)
Records are lists of key-value pairs, so we can use head to get a single mapping, and the + operator supports adding or replacing a field in another record. Let's merge each invalid field into our valid user record, one at a time. This also shows the conversion back in action:
val invalidUser1 = userLG.from(validUserRecord + bogusRecord.head)           // invalid name
val invalidUser2 = userLG.from(validUserRecord + bogusRecord.tail.head)      // invalid ID
val invalidUser3 = userLG.from(validUserRecord + bogusRecord.tail.tail.head) // invalid age
assert(List(invalidUser1, invalidUser2, invalidUser3).forall(!_.isValid))
Since we are basically applying the same function (validUserRecord + _) to every key-value pair in our bogusRecord, we can also use the map operator, except that we use it with an unusual, polymorphic, function. We can then easily convert the result to a List, because every element is of the same type now.
object polymerge extends Poly1 {
  implicit def caseField[K, V](implicit upd: Updater[userLG.Repr, FieldType[K, V]]) =
    at[FieldType[K, V]](upd(validUserRecord, _))
}
val allInvalidUsers = bogusRecord.map(polymerge).toList.map(userLG.from)
assert(allInvalidUsers == List(invalidUser1, invalidUser2, invalidUser3))
Generalizing and removing all the boilerplate
Now, the whole point of this was that we can generalize it to work for any two arbitrary classes. The encoding of all the relationships and operations is a bit cumbersome, and it took me a while to get it right through all the "implicit not found" errors, so I'll skip the details.
class Picks[A, AR <: HList](defaults: A)(implicit lgA: LabelledGeneric.Aux[A, AR]) {
  private val defaultsRec = lgA.to(defaults)

  object mergeIntoTemplate extends Poly1 {
    implicit def caseField[K, V](implicit upd: Updater[AR, FieldType[K, V]]) =
      at[FieldType[K, V]](upd(defaultsRec, _))
  }

  def from[B, BR <: HList, MR <: HList, F <: Poly](options: B)
          (implicit
            optionsLG: LabelledGeneric.Aux[B, BR],
            mapper: ops.hlist.Mapper.Aux[mergeIntoTemplate.type, BR, MR],
            toList: ops.hlist.ToTraversable.Aux[MR, List, AR]
          ) = {
    optionsLG.to(options).map(mergeIntoTemplate).toList.map(lgA.from)
  }
}
So, here it is in action:
val cp = new Picks(someValidUser)
assert(cp.from(bogusData) == allInvalidUsers)
Unfortunately, you cannot write new Picks(someValidUser).from(bogusData) because implicit for mapper requires a stable identifier. On the other hand, cp instance can be reused with other types:
case class BogusName(name: String)
assert(cp.from(BogusName("")).head == someValidUser.copy(name = ""))
And now it works for all types! The bogus data can be any subset of the class fields, so it will even work with the full class itself:
case class Address(country: String, city: String, line_1: String, line_2: String) {
  def isValid = Seq(country, city, line_1, line_2).forall(_.nonEmpty)
}
val acp = new Picks(Address("Test country", "Test city", "Test line 1", "Test line 2"))
val invalidAddresses = acp.from(Address("", "", "", ""))
assert(invalidAddresses.forall(!_.isValid))
You can see the code running at ScalaFiddle.

Scala HashMap get string value returns Some()

val vJsonLoc = new HashMap[String, String]();
def getPrevJson(s:String) = vJsonLoc.get(s)
val previousFile = getPrevJson(s"/${site.toLowerCase}/$languagePath/$channel/v$v/$segment")
This returns
Some(/Users/abc/git/abc-c2c/)
On trying to append a string, previousFile + "/" + index + ".json", the result is Some(/Users/abc/git/abc-c2c/)/0.json, while the desired result is /Users/abc/git/abc-c2c/0.json.
I guess this is some concept of Option that I have not understood. I am new to Scala.
As you pointed out, you're getting back an Option type, and not a direct reference to the String contained in your data structure. This is a very standard Scala practice, allowing you to better handle cases where an expected value might not be present in your data structure.
For example, in Java, this type of method typically returns the value if it exists and null if it doesn't. This means, however, subsequent code could be operating on the null value and thus you'd need further protection against exceptions.
In Scala, you're getting a reference to an object which may, or may not, have the value you expect. This is the Option type, and can be either Some (in which case the reference is accessible) or None (in which case you have several options for handling it).
Consider your code:
val vJsonLoc = new HashMap[String, String]();
def getPrevJson(s:String) = vJsonLoc.get(s)
val previousFile = getPrevJson(s"/${site.toLowerCase}/$languagePath/$channel/v$v/$segment")
If the HashMap returned String, your previousFile reference could point to either a null value or to a String value. You'd need to protect against a potential exception (regular practice in Java).
But in Scala, get is returning an Option type, which can be handled in a number of ways:
val previousFile = getPrevJson("your_string").getOrElse("")
//or
val previousFile = getPrevJson("your_string") match {
case Some(ref) => ref
case None => ""
}
The resulting reference previousFile will point to a String value: either the expected value ("get") or the empty string ("OrElse").
Scala's Map returns an Option from get. Use vJsonLoc(s) instead of vJsonLoc.get(s) if you want the bare value (but note that it throws an exception when the key is absent).
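To see the difference (a small sketch with an ordinary immutable Map and made-up data):

val m = Map("k" -> "/Users/abc/git/abc-c2c/")
m.get("k")                 // Some(/Users/abc/git/abc-c2c/) -- an Option[String]
m.get("missing")           // None -- absent key, no exception
m("k")                     // /Users/abc/git/abc-c2c/ -- the bare String
// m("missing")            // would throw NoSuchElementException
m.getOrElse("missing", "") // "" -- bare value with a safe default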

How do I get an object's type and pass it along to asInstanceOf in Scala?

I have a Scala class that reads formatting information from a JSON template file, and data from a different file. The goal is to format the data as a JSON object as specified by the template file. I'm getting the layout working, but now I want to set the type of my output to the type in my template (i.e. if I have a field's value as a String in the template, it should be a string in the output, even if it's an integer in the raw data).
Basically, I'm looking for a quick and easy way of doing something like:
output = dataValue.asInstanceOf[templateValue.getClass]
That line gives me an error that type getClass is not a member of Any. But I haven't been able to find any other member or method that gives me a variable's type at runtime. Is this possible, and if so, how?
Clarification
I should add, by this point in my code, I know I'm dealing with just a key/value pair. What I'd like is the value's type.
Specifically, given the JSON template below, I want the name to be cast to a string, age to be cast to an integer, and salary to be cast to a decimal on output, regardless of how they appear in the raw data file (they could all be strings, age and salary could both be ints, etc.). What I was hoping for is a simple cast that didn't require pattern matching to handle each data type specifically.
Example template:
people: [{
  name: "value",
  age: 0,
  salary: 0.00
}]
Type parameters must be known at compile time (as type symbols), and templateValue.getClass is just a plain value (of type Class), so it cannot be used as a type parameter.
What to do instead depends on your goal, which isn't yet clear to me... but it may look like
output = someMethod(dataValue, templateValue.getClass),
and inside that method you may do different computations depending on the second argument of type Class.
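Such a method could, for example, compare the Class value against the handful of types your templates use. This is only a sketch under that assumption (convert is a hypothetical name, not an existing API):

// Dispatch on the template value's runtime class and convert the raw data.
def convert(dataValue: Any, target: Class[_]): Any =
  if (target == classOf[String]) dataValue.toString
  else if (target == classOf[java.lang.Integer]) dataValue.toString.trim.toInt
  else if (target == classOf[java.lang.Double]) dataValue.toString.trim.toDouble
  else dataValue // unknown target: pass the value through unchanged

// usage: convert(rawAge, templateAge.getClass), where templateAge came from
// the parsed template and is boxed, so its class is java.lang.Integer

Some dispatch of this kind is hard to avoid, since each target type needs its own conversion.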
The method scala.reflect.api.JavaUniverse.typeOf[T] requires its type argument to be hard-coded by the caller or type-inferred. To type-infer, create a utility method like the following (it works for all types, even generics; it counteracts the Java runtime's type-argument erasure by augmenting T during compilation with type-tag metadata):
// http://www.scala-lang.org/api/current/index.html#scala.reflect.runtime.package
import scala.reflect.runtime.universe._
def getType[T: TypeTag](a: T): Type = typeOf[T]
There are 3 requirements here:
the type argument implements TypeTag (the previous implementation via Manifest is still available...)
one or more input arguments are typed T
the return type is Type (if you want the result to be usable outside the method)
You can invoke without specifying T (it's type-inferred):
import scala.reflect.runtime.universe._
def getType[T: TypeTag](a: T): Type = typeOf[T]
val ls = List[Int](1,2,3)
println(getType(ls)) // prints List[Int]
However, asInstanceOf will only cast the type to a (binary-compatible) type in the hierarchy, with no conversion of data or format; i.e. the data must already be in the correct binary format, so that won't solve your problem.
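A quick illustration of the difference between casting and converting:

val x: Any = 123
x.asInstanceOf[Int]      // 123 -- the value already is an Int, so the cast succeeds
// x.asInstanceOf[String] // would throw ClassCastException: nothing is converted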
Data Conversion
A few methods convert between integers and strings:
// defined on scala.Any:
123.toString // gives "123"
// implicitly defined for Int via scala.runtime.RichInt:
123.toHexString // gives "7b"
123.toOctalString // gives "173"
// implicitly defined for java.lang.String via scala.collection.immutable.StringOps:
"%d".format(123) // also gives "123"
"%5d".format(123) // gives " 123"
"%05d".format(123) // gives "00123"
"%01.2f".format(123.456789) // gives "123.46"
"%01.2f".format(0.456789) // gives "0.46"
" 123".toInt // gives 123
"00123".toInt // gives 123
"00123.4600".toDouble // gives 123.46
".46".toDouble // gives 0.46
Parsing directly from file to target type (no cast or convert):
Unfortunately, Scala doesn't have a method to read the next token in a stream as an integer/float/short/boolean/etc. But you can do this by obtaining a java.io.FileInputStream, wrapping it in a DataInputStream, and then calling readInt, readFloat, readShort, readBoolean, etc.
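In Scala that could look like the following sketch (data.bin is a placeholder; the bytes must have been written in the matching binary layout, e.g. by a java.io.DataOutputStream):

import java.io.{DataInputStream, FileInputStream}

val in = new DataInputStream(new FileInputStream("data.bin")) // placeholder path
try {
  val i = in.readInt()     // next 4 bytes as an Int
  val f = in.readFloat()   // next 4 bytes as a Float
  val b = in.readBoolean() // next byte as a Boolean
  println(s"$i $f $b")
} finally in.close()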
In a type-level context, value-level terms still have a few accessors. The first one, and the one you asked for, is the type of the value itself (.type):
output = dataValue.asInstanceOf[templateValue.type]
if the type of the value has inner members, those become available as well:
class A {
  class B {}
}
val a: A = new A
val b: a.B = new a.B
Notice b: a.B.
I must also mention how to access such members without a value-level term:
val b: A#B = new a.B

Why is it bad to use data types to initialize instances in Python?

Sorry if the answer to this is obvious, but I want to be sure. If you have (in Python):
class Name(object):
    def __init__(self):
        self.int_att1 = 0
        self.int_att2 = 0
        self.list_att = []
why can't you (I know you can, but I've never seen a programmer do this) initialize it in this way:
class Name(object):
    def __init__(self):
        self.int_att1 = int
        self.int_att2 = int
        self.list_att = list
Are there bad effects that come out of this?
What makes the former more Pythonic?
My best guess is that Python needs a value for instance.attribute. But if this is the case, then why does using a datatype work at all?
When you do self.int_att1 = int, the attribute int_att1 now refers to the type int; it is just another name for int. I'll prove this in a simple way:
>>> number_type = int
>>> i = number_type()
>>> i
0
>>> type(i)
<type 'int'>
In this code, number_type becomes a "synonym" for the type int, that's why when I asked the type of i, it returned int and not number_type.
Same for self.list_att = list.
Probably, what you wanted to do was:
class Name(object):
    def __init__(self):
        self.int_att1 = int()
        self.int_att2 = int()
        self.list_att = list()
This is totally different: you are no longer assigning the type to the attribute. Instead, you are assigning an initial value of that type. Let's see what those initial values are:
>>> initial_int = int()
>>> initial_int
0
>>> initial_list = list()
>>> initial_list
[]
So, we can clearly see that the initial value of the type int is 0 and the initial value of the type list is [], and that is exactly what you were looking for!
You can, but it has a totally different effect. In the second case, you're setting the attribute to the type object itself, which is usually not what you want. In Python, types are objects too, so this is legal.
It would let you do something like:
n = Name()
v = n.int_att1()
v would then be 0, because that's the value constructed by int's no-arg constructor.
From The Zen of Python: "Explicit is better than implicit."
You could certainly initialize with the default value of a type (not the type itself as in your example), but explicit is better than implicit. When you initialize with an actual value, there's no doubt about what you're getting.

Scala match/compare enumerations

I have an enumeration that I want to use in pattern matches in an actor. I'm not getting what I'd expect and, by now, I suspect I'm missing something simple.
My enumeration,
object Ops extends Enumeration {
  val Create = Value("create")
  val Delete = Value("delete")
}
Then, I create an Ops from a String:
val op = Ops.valueOf("create")
Inside my match, I have:
case (Ops.Create, ...)
But Ops.Create doesn't seem to equal Ops.valueOf("create"):
the former is just the atom 'create' and the latter is Some(create).
Hopefully, this is enough info for someone to tell me what I'm missing...
Thanks
If you are just trying to get a copy of Create, then you should refer to it directly in your code:
val op = Ops.Create
But if you are parsing it from a string, the string might contain junk, so valueOf returns an Option:
val op1 = Ops.valueOf("create") // Some(Ops.Create)
val op2 = Ops.valueOf("delete") // Some(Ops.Delete)
val op3 = Ops.valueOf("aljeaw") // None
Now, in your match you can just carry along the Option[Ops.Value] and look for:
case (Some(Ops.Create), ...)
and you have built-in robustness to junk as input.
Enumeration.valueOf returns None or Some because you may be asking for a value that doesn't exist. In your case, for example, Ops.valueOf("blah") would return None, since there is no matching enumeration value.
To be honest, in this case I'd use case objects instead of an Enumeration (they provide better type safety); a sketch of that approach follows.
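Such an alternative could look like this sketch (the fromString helper is hypothetical, mirroring valueOf's Option result):

sealed trait Op
case object Create extends Op
case object Delete extends Op

object Op {
  // hypothetical parser; returns None for junk, like Enumeration.valueOf
  def fromString(s: String): Option[Op] = s match {
    case "create" => Some(Create)
    case "delete" => Some(Delete)
    case _        => None
  }
}

With a sealed trait, the compiler also checks your pattern matches for exhaustiveness, which an Enumeration cannot do.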
It looks like I needed to use the get method of the returned Some to actually get what I wanted, e.g.
Ops.valueOf("create").get == Ops.Create
Seems neither intuitive nor friendly, but it works.