Prevent empty values in an array being inserted into Mongo collection - mongodb

I am trying to prevent empty values being inserted into my mongoDB collection. The field in question looks like this:
MongoDB Field
"stadiumArr" : [
"Old Trafford",
"El Calderon",
...
]
Sample of (mapped) case class
case class FormData(_id: Option[BSONObjectID], stadiumArr: Option[List[String]], ..)
Sample of Scala form
object MyForm {
val form = Form(
mapping(
"_id" -> ignored(Option.empty[BSONObjectID]),
"stadiumArr" -> optional(list(text)),
...
)(FormData.apply)(FormData.unapply)
)
}
I am also using the Repeated Values functionality in Play Framework like so:
Play Template
#import helper._
#(myForm: Form[models.db.FormData])(implicit request: RequestHeader, messagesProvider: MessagesProvider)
#repeatWithIndex(myForm("stadiumArr"), min = 5) { (stadium, idx) =>
#inputText(stadium, '_label -> ("stadium #" + (idx + 1)))
}
This ensures that whether there are at least 5 values or not in the array; there will still be (at least) 5 input boxes created. However if one (or more) of the input boxes are empty when the form is submitted an empty string is still being added as value in the array, e.g.
"stadiumArr" : [
"Old Trafford",
"El Calderon",
"",
"",
""
]
Based on some other ways of converting types from/to the database; I've tried playing around with a few solutions; such as:
implicit val arrayWrite: Writes[List[String]] = new Writes[List[String]] {
def writes(list: List[String]): JsValue = Json.arr(list.filterNot(_.isEmpty))
}
.. but this isn't working. Any ideas on how to prevent empty values being inserted into the database collection?

Without knowing specific versions or libraries you're using it's hard to give you an answer, but since you linked to play 2.6 documentation I'll assume that's what you're using there. The other assumption I'm going to make is that you're using reactive-mongo library. Whether or not you're using the play plugin for that library or not is the reason why I'm giving you two different answers here:
In that library, with no plugin, you'll have defined a BSONDocumentReader and a BSONDocumentWriter for your case class. This might be auto-generated for you with macros or not, but regardless how you get it, these two classes have useful methods you can use to transform the reads/writes you have to another one. So, let's say I defined a reader and writer for you like this:
import reactivemongo.bson._
case class FormData(_id: Option[BSONObjectID], stadiumArr: Option[List[String]])
implicit val formDataReaderWriter = new BSONDocumentReader[FormData] with BSONDocumentWriter[FormData] {
def read(bson: BSONDocument): FormData = {
FormData(
_id = bson.getAs[BSONObjectID]("_id"),
stadiumArr = bson.getAs[List[String]]("stadiumArr").map(_.filterNot(_.isEmpty))
)
}
def write(formData: FormData) = {
BSONDocument(
"_id" -> formData._id,
"stadiumArr" -> formData.stadiumArr
)
}
}
Great you say, that works! You can see in the reads I went ahead and filtered out any empty strings. So even if it's in the data, it can be cleaned up. That's nice and all, but let's notice I didn't do the same for the writes. I did that so I can show you how to use a useful method called afterWrite. So pretend the reader/writer weren't the same class and were separate, then I can do this:
val initialWriter = new BSONDocumentWriter[FormData] {
def write(formData: FormData) = {
BSONDocument(
"_id" -> formData._id,
"stadiumArr" -> formData.stadiumArr
)
}
}
implicit val cleanWriter = initialWriter.afterWrite { bsonDocument =>
val fixedField = bsonDocument.getAs[List[String]]("stadiumArr").map(_.filterNot(_.isEmpty))
bsonDocument.remove("stadiumArr") ++ BSONDocument("stadiumArr" -> fixedField)
}
Note that cleanWriter is the implicit one, that means when the insert call on the collection happens, it will be the one chosen to be used.
Now, that's all a bunch of work, if you're using the plugin/module for play that lets you use JSONCollections then you can get by with just defining play json Reads and Writes. If you look at the documentation you'll see that the reads trait has a useful map function you can use to transform one Reads into another.
So, you'd have:
val jsonReads = Json.reads[FormData]
implicit val cleanReads = jsonReads.map(formData => formData.copy(stadiumArr = formData.stadiumArr.map(_.filterNot(_.isEmpty))))
And again, because only the clean Reads is implicit, the collection methods for mongo will use that.
NOW, all of that said, doing this at the database level is one thing, but really, I personally think you should be dealing with this at your Form level.
val form = Form(
mapping(
"_id" -> ignored(Option.empty[BSONObjectID]),
"stadiumArr" -> optional(list(text)),
...
)(FormData.apply)(FormData.unapply)
)
Mainly because, surprise surprise, form has a way to deal with this. Specifically, the mapping class itself. If you look there you'll find a transform method you can use to filter out empty values easily. Just call it on the mapping you need to modify, for example:
"stadiumArr" -> optional(
list(text).transform(l => l.filter(_.nonEmpty), l => l.filter(_.nonEmpty))
)
To explain a little more about this method, in case you're not used to reading the signatures in the scaladoc.
def
transform[B](f1: (T) ⇒ B, f2: (B) ⇒ T): Mapping[B]
says that by calling transform on some mapping of type Mapping[T] you can create a new mapping of type Mapping[B]. In order to do this you must provide functions that convert from one to the other. So the code above causes the list mapping (Mapping[List[String]]) to become a Mapping[List[String]] (the type did not change here), but when it does so it removes any empty elements. If I break this code down a little it might be more clear:
def convertFromTtoB(list: List[String]): List[String] = list.filter(_.nonEmpty)
def convertFromBtoT(list: List[String]): List[String] = list.filter(_.nonEmpty)
...
list(text).transform(convertFromTtoB, convertFromBtoT)
You might wondering why you need to provide both, the reason is because when you call Form.fill and the form is populated with values, the second method will be called so that the data goes into the format the play form is expecting. This is more obvious if the type actually changes. For example, if you had a text area where people could enter CSV but you wanted to map it to a form model that had a proper List[String] you might do something like:
def convertFromTtoB(raw: String): List[String] = raw.split(",").filter(_.nonEmpty)
def convertFromBtoT(list: List[String]): String = list.mkString(",")
...
text.transform(convertFromTtoB, convertFromBtoT)
Note that when I've done this in the past sometimes I've had to write a separate method and just pass it in if I didn't want to fully specify all the types, but you should be able to work from here given the documentation and type signature for the transform method on mapping.
The reason I suggest doing this in the form binding is because the form/controller should be the one with the concern of dealing with your user data and cleaning things up I think. But you can always have multiple layers of cleaning and whatnot, it's not bad to be safe!

I've gone for this (which always seems obvious when it's written and tested):
implicit val arrayWrite: Writes[List[String]] = new Writes[List[String]] {
def writes(list: List[String]): JsValue = Json.toJson(list.filterNot(_.isEmpty).toIndexedSeq)
}
But I would be interested to know how to
.map the existing Reads rather than redefining from scratch
as #cchantep suggests

Related

How can I dynamically (runtime) generate a sorted collection in Scala using the java.lang.reflect.Type?

Given an array of items I need to generate a sorted collection in Scala for a java.lang.reflect.Type but I'm unable to do so. The following snippet might explain better.
def buildList(paramType: Type): SortedSet[_] = {
val collection = new Array[Any](5)
for (i <- 0 until 5) {
collection(i) = new EasyRandom().nextObject(paramType.asInstanceOf[Class[Any]])
}
SortedSet(collection:_*)
}
I'm unable to do as I get the error "No implicits found for parameter ord: Ordering[Any]". I'm able to work around this if I swap to an unsorted type such as Set.
def buildList(paramType: Type): Set[_] = {
val collection = new Array[Any](5)
for (i <- 0 until 5) {
collection(i) = new EasyRandom().nextObject(paramType.asInstanceOf[Class[Any]])
}
Set(collection:_*)
}
How can I dynamically build a sorted set at runtime? I've been looking into how Jackson tries to achieve the same but I couldn't quite follow how to get T here: https://github.com/FasterXML/jackson-module-scala/blob/0e926622ea4e8cef16dd757fa85400a0b9dcd1d3/src/main/scala/com/fasterxml/jackson/module/scala/introspect/OrderingLocator.scala#L21
(Please excuse me if my question is unclear.)
This happens because SortedSet needs a contextual (implicit) Ordering type class instance for a given type A
However, as Luis said on the comment section, I'd strongly advice you against using this approach and using a safer, strongly typed one, instead.
Generating random case classes (which I suppose you're using since you're using Scala) should be easy with the help of some libraries like magnolia. That would turn your code into something like this:
def randomList[A : Ordering : Arbitrary]: SortedSet[A] = {
val arb: Arbitrary[A] = implicitly[Arbitrary[A]]
val sampleData = (1 to 5).map(arb.arbitrary.sample)
SortedSet(sampleData)
}
This approach involves some heavy concepts like implicits and type classes, but is way safer.

Pattern for generating negative Scalacheck scenarios: Using property based testing to test validation logic in Scala

We are looking for a viable design pattern for building Scalacheck Gen (generators) that can produce both positive and negative test scenarios. This will allow us to run forAll tests to validate functionality (positive cases), and also verify that our case class validation works correctly by failing on all invalid combinations of data.
Making a simple, parameterized Gen that does this on a one-off basis is pretty easy. For example:
def idGen(valid: Boolean = true): Gen[String] = Gen.oneOf(ID.values.toList).map(s => if (valid) s else Gen.oneOf(simpleRandomCode(4), "").sample.get)
With the above, I can get a valid or invalid ID for testing purposes. The valid one, I use to make sure business logic succeeds. The invalid one, I use to make sure our validation logic rejects the case class.
Ok, so -- problem is, on a large scale, this becomes very unwieldly. Let's say I have a data container with, oh, 100 different elements. Generating a "good" one is easy. But now, I want to generate a "bad" one, and furthermore:
I want to generate a bad one for each data element, where a single data element is bad (so at minimum, at least 100 bad instances, testing that each invalid parameter is caught by validation logic).
I want to be able to override specific elements, for instance feeding in a bad ID or a bad "foobar." Whatever that is.
One pattern we can look to for inspiration is apply and copy, which allows us to easily compose new objects while specifying overridden values. For example:
val f = Foo("a", "b") // f: Foo = Foo(a,b)
val t = Foo.unapply(f) // t: Option[(String, String)] = Some((a,b))
Foo(t.get._1, "c") // res0: Foo = Foo(a,c)
Above we see the basic idea of creating a mutating object from the template of another object. This is more easily expressed in Scala as:
val f = someFoo copy(b = "c")
Using this as inspiration we can think about our objectives. A few things to think about:
First, we could define a map or a container of key/values for the data element and generated value. This could be used in place of a tuple to support named value mutation.
Given a container of key/value pairs, we could easily select one (or more) pairs at random and change a value. This supports the objective of generating a data set where one value is altered to create failure.
Given such a container, we can easily create a new object from the invalid collection of values (using either apply() or some other technique).
Alternatively, perhaps we can develop a pattern that uses a tuple and then just apply() it, kind of like the copy method, as long as we can still randomly alter one or more values.
We can probably explore developing a reusable pattern that does something like this:
def thingGen(invalidValueCount: Int): Gen[Thing] = ???
def someTest = forAll(thingGen) { v => invalidV = v.invalidate(1); validate(invalidV) must beFalse }
In the above code, we have a generator thingGen that returns (valid) Things. Then for all instances returned, we invoke a generic method invalidate(count: Int) which will randomly invalidate count values, returning an invalid object. We can then use that to ascertain whether our validation logic works correctly.
This would require defining an invalidate() function that, given a parameter (either by name, or by position) can then replace the identified parameter with a value that is known to be bad. This implies have an "anti-generator" for specific values, for instance, if an ID must be 3 characters, then it knows to create a string that is anything but 3 characters long.
Of course to invalidate a known, single parameter (to inject bad data into a test condition) we can simply use the copy method:
def thingGen(invalidValueCount: Int): Gen[Thing] = ???
def someTest = forAll(thingGen) { v => v2 = v copy(id = "xxx"); validate(v2) must beFalse }
That is the sum of my thinking to date. Am I barking up the wrong tree? Are there good patterns out there that handle this kind of testing? Any commentary or suggestions on how best to approach this problem of testing our validation logic?
We can combine a valid instance and an set of invalid fields (so that every field, if copied, would cause validation failure) to get an invalid object using shapeless library.
Shapeless allows you to represent your class as a list of key-value pairs that are still strongly typed and support some high-level operations, and converting back from this representation to your original class.
In example below I'll be providing an invalid instance for each single field provided
import shapeless._, record._
import shapeless.labelled.FieldType
import shapeless.ops.record.Updater
A detailed intro
Let's pretend we have a data class, and a valid instance of it (we only need one, so it can be hardcoded)
case class User(id: String, name: String, about: String, age: Int) {
def isValid = id.length == 3 && name.nonEmpty && age >= 0
}
val someValidUser = User("oo7", "Frank", "A good guy", 42)
assert(someValidUser.isValid)
We can then define a class to be used for invalid values:
case class BogusUserFields(name: String, id: String, age: Int)
val bogusData = BogusUserFields("", "1234", -5)
Instances of such classes can be provided using ScalaCheck. It's much easier to write a generator where all fields would cause failure. Order of fields doesn't matter, but their names and types do. Here we excluded about from User set of fields so we can do what you asked for (feeding only a subset of fields you want to test)
We then use LabelledGeneric[T] to convert User and BogusUserFields to their corresponding record value (and later we will convert User back)
val userLG = LabelledGeneric[User]
val bogusLG = LabelledGeneric[BogusUserFields]
val validUserRecord = userLG.to(someValidUser)
val bogusRecord = bogusLG.to(bogusData)
Records are lists of key-value pairs, so we can use head to get a single mapping, and the + operator supports adding / replacing field to another record. Let's pick every invalid field into our user one at a time. Also, here's the conversion back in action:
val invalidUser1 = userLG.from(validUserRecord + bogusRecord.head)// invalid name
val invalidUser2 = userLG.from(validUserRecord + bogusRecord.tail.head)// invalid ID
val invalidUser3 = userLG.from(validUserRecord + bogusRecord.tail.tail.head) // invalid age
assert(List(invalidUser1, invalidUser2, invalidUser3).forall(!_.isValid))
Since we basically are applying the same function (validUserRecord + _) to every key-value pair in our bogusRecord, we can also use map operator, except we use it with an unusual - polymorphic - function. We can also easily convert it to List, because every element will be of a same type now.
object polymerge extends Poly1 {
implicit def caseField[K, V](implicit upd: Updater[userLG.Repr, FieldType[K, V]]) =
at[FieldType[K, V]](upd(validUserRecord, _))
}
val allInvalidUsers = bogusRecord.map(polymerge).toList.map(userLG.from)
assert(allInvalidUsers == List(invalidUser1, invalidUser2, invalidUser3))
Generalizing and removing all the boilerplate
Now the whole point of this was that we can generalize it to work for any two arbitrary classes. The encoding of all relationships and operations is a bit cumbersome and it took me a while to get it right with all the implicit not found errors, so I'll skip the details.
class Picks[A, AR <: HList](defaults: A)(implicit lgA: LabelledGeneric.Aux[A, AR]) {
private val defaultsRec = lgA.to(defaults)
object mergeIntoTemplate extends Poly1 {
implicit def caseField[K, V](implicit upd: Updater[AR, FieldType[K, V]]) =
at[FieldType[K, V]](upd(defaultsRec, _))
}
def from[B, BR <: HList, MR <: HList, F <: Poly](options: B)
(implicit
optionsLG: LabelledGeneric.Aux[B, BR],
mapper: ops.hlist.Mapper.Aux[mergeIntoTemplate.type, BR, MR],
toList: ops.hlist.ToTraversable.Aux[MR, List, AR]
) = {
optionsLG.to(options).map(mergeIntoTemplate).toList.map(lgA.from)
}
}
So, here it is in action:
val cp = new Picks(someValidUser)
assert(cp.from(bogusData) == allInvalidUsers)
Unfortunately, you cannot write new Picks(someValidUser).from(bogusData) because implicit for mapper requires a stable identifier. On the other hand, cp instance can be reused with other types:
case class BogusName(name: String)
assert(cp.from(BogusName("")).head == someValidUser.copy(name = ""))
And now it works for all types! And bogus data is required to be any subset of class fields, so it will work even for class itself
case class Address(country: String, city: String, line_1: String, line_2: String) {
def isValid = Seq(country, city, line_1, line_2).forall(_.nonEmpty)
}
val acp = new Picks(Address("Test country", "Test city", "Test line 1", "Test line 2"))
val invalidAddresses = acp.from(Address("", "", "", ""))
assert(invalidAddresses.forall(!_.isValid))
You can see the code running at ScalaFiddle

Yield mutable.seq from mutable.traversable type in Scala

I have a variable underlying of type Option[mutable.Traversable[Field]]
All I wanted todo in my class was provide a method to return this as Sequence in the following way:
def toSeq: scala.collection.mutable.Seq[Field] = {
for {
f <- underlying.get
} yield f
}
This fails as it complains that mutable.traversable does not conform to mutable.seq. All it is doing is yielding something of type Field - in my mind this should work?
A possible solution to this is:
def toSeq: Seq[Field] = {
underlying match {
case Some(x) => x.toSeq
case None =>
}
}
Although I have no idea what is actually happening when x.toSeq is called and I imagine there is more memory being used here that actually required to accomplish this.
An explanation or suggestion would be much appreciated.
I am confused why you say that "I imagine there is more memory being used here than actually required to accomplish". Scala will not copy your Field values when doing x.toSeq, it is simply going to create an new Seq which will have pointers to the same Field values that underlying is pointing to. Since this new structure is exactly what you want there is no avoiding the additional memory associated with the extra pointers (but the amount of additional memory should be small). For a more in-depth discussion see the wiki on persistent data structures.
Regarding your possible solution, it could be slightly modified to get the result you're expecting:
def toSeq : Seq[Field] =
underlying
.map(_.toSeq)
.getOrElse(Seq.empty[Field])
This solution will return an empty Seq if underlying is a None which is safer than your original attempt which uses get. I say it's "safer" because get throws a NoSuchElementException if the Option is a None whereas my toSeq can never fail to return a valid value.
Functional Approach
As a side note: when I first started programming in scala I would write many functions of the form:
def formatSeq(seq : Seq[String]) : Seq[String] =
seq map (_.toUpperCase)
This is less functional because you are expecting a particular collection type, e.g. formatSeq won't work on a Future.
I have found that a better approach is to write:
def formatStr(str : String) = str.toUpperCase
Or my preferred coding style:
val formatStr = (_ : String).toUpperCase
Then the user of your function can apply formatStr in any fashion they want and you don't have to worry about all of the collection casting:
val fut : Future[String] = ???
val formatFut = fut map formatStr
val opt : Option[String] = ???
val formatOpt = opt map formatStr

Scala copy and reflection

In my project, there are many places where objects are picked out of a collection, copied with some values changed, and pushed back into the collection. I have been trying to create my own 'copy' method, that in addition to making a copy, also gives me a 'Diff' object. In other words, something that contains the arguments you just put into it.
The 'Diff' object should then be sent somewhere to be aggregated, so that someone else can get a report of all the changes since last time, without sending the actual entire object. This is all simple enough if one does it like this:
val user = new User(Some(23), true, "arne", None, None, "position", List(), "email", None, false)
val user0 = user.copy(position = "position2")
list ::= user0
val diff = new Diff[User](Map("position" -> "position2"))
However, there is some duplicate work there, and I would very much like to just have it in one method, like:
val (user, diff) = user.copyAndDiff(position = "position")
I haven't been able to figure out what form the arguments to 'copy' actually takes, but I would be able to work with other forms as well.
I made a method with a Type argument, that should make a copy and a diff. Something like this:
object DiffCopy[Copyable]{
def apply(original:Copyable, changes:Map[String, Any]){
original.copy(??uhm..
original.getAllTheFieldsAndCopyAndOverWriteSomeAccordingToChanges??
My first problem was that there doesn't seem to be any way to guarantee that the original object has a 'copy' method that I can overload to. The second problem appears when I want to actually assign the changes to their correct fields in the new, copied object. I tried to fiddle about with Reflection, and tried to find a way to set the value of a field with a name given as String. In which case I could keep my Diff as a a simple map, and simply create this diff-map first, and then apply it to my objects and also send them to where they needed to go.
However, I ended up deeper and deeper in the rabbit hole, and further and further away from what I actually wanted. I got to a point where I had an array of fields from an arbitrary object, and could get them by name, but I couldn't get it to work for a generic Type. So now I am here to ask if anyone can give me some advice on this situation?
The best answer I could get, would be if someone could tell me a simple way to apply a Map[String, Any] to something equivalent to the 'copy' method. I'm fairly sure this should be possible to implement, but it is simply currently beyond me...
A little bit overcomplicated but solve your original problem.
The best answer I could get, would be if someone could tell me a simple way to apply a Map[String, Any] to something equivalent to the 'copy' method. I'm fairly sure this should be possible to implement, but it is simply currently beyond me...
Take all fields from case class to map.
Update map with new values.
Create case class from new fields map.
Problems:
low performance
I'm pretty sure it can be done simpler...
case class Person(name: String, age: Int)
def getCCParams(cc: Any) =
(Map[String, Any]() /: cc.getClass.getDeclaredFields) {(a, f) =>
f.setAccessible(true)
a + (f.getName -> f.get(cc))
}
def enrichCaseClass[T](cc: T, vals : Map[String, Any])(implicit cmf : ClassManifest[T]) = {
val ctor = cmf.erasure.getConstructors().head
val params = getCCParams(cc.asInstanceOf[Any]) ++ vals
val args = cmf.erasure.getDeclaredFields().map( f => params(f.getName).asInstanceOf[Object] )
ctor.newInstance(args : _*).asInstanceOf[T]
}
val j = Person("Jack", 15)
enrichCaseClass(j, Map("age" -> 18))

Converting thunk to sequence upon iteration

I have a server API that returns a list of things, and does so in chunks of, let's say, 25 items at a time. With every response, we get a list of items, and a "token" that we can use for the following server call to return the next 25, and so on.
Please note that we're using a client library that has been written in stodgy old mutable Java, and doesn't lend itself nicely to all of Scala's functional compositional patterns.
I'm looking for a way to return a lazily evaluated sequence of all server items, by doing a server call with the latest token whenever the local list of items has been exhausted. What I have so far is:
def fetchFromServer(uglyStateObject: StateObject): Seq[Thing] = {
val results = server.call(uglyStateObject)
uglyStateObject.update(results.token())
results.asScala.toList ++ (if results.moreAvailable() then
fetchFromServer(uglyStateObject)
else
List())
}
However, this function does eager evaluation. What I'm looking for is to have ++ concatenate a "strict sequence" and a "lazy sequence", where a thunk will be used to retrieve the next set of items from the server. In effect, I want something like this:
results.asScala.toList ++ Seq.lazy(() => fetchFromServer(uglyStateObject))
Except I don't know what to use in place of Seq.lazy.
Things I've seen so far:
SeqView, but I've seen comments that it shouldn't be used because it re-evaluates all the time?
Streams, but they seem like the abstraction is supposed to generate elements at a time, whereas I want to generate a bunch of elements at a time.
What should I use?
I also suggest you to take a look at scalaz-strem. Here is small example how it may look like
import scalaz.stream._
import scalaz.concurrent.Task
// Returns updated state + fetched data
def fetchFromServer(uglyStateObject: StateObject): (StateObject, Seq[Thing]) = ???
// Initial state
val init: StateObject = new StateObject
val p: Process[Task, Thing] = Process.repeatEval[Task, Seq[Thing]] {
var state = init
Task(fetchFromServer(state)) map {
case (s, seq) =>
state = s
seq
}
} flatMap Process.emitAll
As a matter of fact, in the meantime I already found a slightly different answer that I find more readable (indeed using Streams):
def fetchFromServer(uglyStateObject: StateObject): Stream[Thing] = {
val results = server.call(uglyStateObject)
uglyStateObject.update(results.token())
results.asScala.toStream #::: (if results.moreAvailable() then
fetchFromServer(uglyStateObject)
else
Stream.empty)
}
Thanks everyone for