Merge nested JSON using Play Scala

I have two JSON objects with the same structure. The first one is
json1 = {a: {b: [{c: "x" , d: val1}, {c: "y" , d: val2}]} }
and the second is
json2 = {a: {b: [{c: "x" , d: val3}, {c: "y" , d: val4}]} }
Is there any way to merge these two objects into one, summing the d values wherever the c values match?
result = {a: { b: [{c: "x", d: (val1+val3) } , {c: "y", d: (val2+val4) }] } }
If instead
json2 = {a: {b: [{c: "y" , d: val3}, {c: "z" , d: val4}]} }
then the result should be
result = {a: { b: [{c: "x" , d: val1} , {c: "y", d: (val2+val3)},{c: "z" , d: val4}] } }
Is there any built-in method to do this? Thanks.

If you know your JSON structure, one way of doing this would be to turn them into case classes and compare them. Here's a way I found (which is by no means optimised):
// Going by your JSON structure of {"a": {"b": [{"c": String, "d": Int}]}}
import play.api.libs.json.{Json, OFormat}
case class A(a: B)
object A { implicit val format: OFormat[A] = Json.format[A] }
case class B(b: Seq[C])
object B { implicit val format: OFormat[B] = Json.format[B] }
case class C(c: Option[String], d: Option[Int])
object C { implicit val format: OFormat[C] = Json.format[C] }
val json1 = Json.parse("""{"a": {"b": [{"c": "x" , "d": 1}, {"c": "y" , "d": 2}]}}""").as[A]
val json2 = Json.parse("""{"a": {"b": [{"c": "x" , "d": 3}, {"c": "y" , "d": 4}]}}""").as[A]
val cSeq: Seq[C] = {
  (json1.a.b zip json2.a.b) map {
    // List((C(Some(x),Some(1)),C(Some(x),Some(3))), (C(Some(y),Some(2)),C(Some(y),Some(4))))
    c =>
      // assign a value to each element of the pair
      val (c1, c2) = c
      val BLANK_C = C(None, None)
      // if the "c" keys match, add the "d" values; if not, return an empty C model
      // still needs to handle the case where get fails (i.e. None instead of Some(whatever))
      if (c1.c.get == c2.c.get) C(c1.c, Some(c1.d.get + c2.d.get)) else BLANK_C
  }
}
val json3 = Json.toJson(A(B(cSeq)))
println(json3)
// {"a":{"b":[{"c":"x","d":4},{"c":"y","d":6}]}}
Currently, if the parts don't match then it returns an empty object. You didn't specify what you want to happen when they don't match so I'll leave that up to you to sort out.
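For the case in the question where the c values do not line up position by position, one option is to merge by key instead of zipping. The following is only a minimal sketch on plain case classes, without Play JSON; the names Entry and mergeEntries are hypothetical, and it assumes every element carries both a c and a d value.

```scala
// Hypothetical simplified model: every element has a key `c` and an Int value `d`.
final case class Entry(c: String, d: Int)

// Merge two sequences by key: sum `d` for equal `c`, and keep keys that occur
// on one side only. Keys come out in order of first occurrence (left, then right).
def mergeEntries(left: Seq[Entry], right: Seq[Entry]): Seq[Entry] = {
  val all  = left ++ right
  val sums = all.groupBy(_.c).map { case (key, es) => key -> es.map(_.d).sum }
  all.map(_.c).distinct.map(key => Entry(key, sums(key)))
}

val merged = mergeEntries(
  Seq(Entry("x", 1), Entry("y", 2)),
  Seq(Entry("y", 3), Entry("z", 4))
)
// merged == Seq(Entry("x", 1), Entry("y", 5), Entry("z", 4))
```

With the case classes from the answer, the same idea could be applied to json1.a.b and json2.a.b once the Option fields have been unwrapped.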

Related

Circe: Decode Each Member of Array to Case Class

I'm learning Circe and need help navigating json hierarchies.
Given a json defined as follows:
import io.circe._
import io.circe.parser._
val jStr = """
{
  "a1": ["c1", "c2"],
  "a2": [{"d1": "abc", "d2": "efg"}, {"d1": "hij", "d2": "klm"}]
}
"""
val j = parse(jStr).getOrElse(Json.Null)
val ja = j.hcursor.downField("a2").as[Json].getOrElse("").toString
ja
ja is now: [ { "d1" : "abc", "d2" : "efg" }, { "d1" : "hij", "d2" : "klm" } ]
I can now do the following to this list:
case class Song(id: String, title: String)
implicit val songDecoder: Decoder[Song] = (c: HCursor) => for {
  id <- c.downField("d1").as[String]
  title <- c.downField("d2").as[String]
} yield Song(id, title)
io.circe.parser.decode[List[Song]](ja).getOrElse("")
Which returns what I want: List(Song(abc,efg), Song(hij,klm))
My questions are as follows:
How do I add item a1.c1 from the original json (j) to each item retrieved from the array? I want to add it to Song modified as follows: case class Song(id: String, title: String, artist: String)
It seems wrong to turn the json object back into a String for the iterative step of retrieving id and title. Is there a way to do this without turning json into String?

Iterator of repeated words in a file

Suppose I'm writing a function to find "repeated words" in a text file. For example, in aaa aaa bb cc cc bb dd the repeated words are aaa and cc but not bb, because the two bb instances don't appear next to each other.
The function receives an iterator and returns an iterator, like this:
def foo(in: Iterator[String]): Iterator[String] = ???
foo(Iterator("aaa", "aaa", "bb", "cc", "cc", "bb")) // Iterator("aaa", "cc")
foo(Iterator("a", "a", "a", "b", "c", "b")) // Iterator("a")
How would you write foo? Note that the input is huge and all the words do not fit in memory (but the number of repeated words is relatively small).
P.S. I would also like to enhance foo later to return the positions of the repeated words, the number of repetitions, etc.
UPDATE:
OK then. Let's spell out a bit more precisely what you want:
input      | expected
           |
a          |
aa         | a
abc        |
aabc       | a
aaabbbbbbc | ab
aabaa      | aa
aabbaa     | aba
Is that right? If so, this is a working solution. I'm not sure about performance, but at least it is lazy (it doesn't load everything into memory).
// assumes there are no nulls in the iterator
def foo[T >: Null](it: Iterator[T]) = {
  (Iterator(null) ++ it).sliding(3, 1).collect {
    case x @ Seq(a, b, c) if b == c && a != b => c
  }
}
We need this ugly Iterator(null) ++ because we are looking at windows of 3 elements and need a way to detect that the first two elements of the input are the same.
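If the null sentinel is a concern, the same windowing idea can be sketched with Option, using None as the leading element. This is an illustrative variant rather than the original answer's code, and repeatedRuns is a hypothetical name.

```scala
// Same sliding-window idea, with None as the leading sentinel instead of null.
// `repeatedRuns` is a hypothetical name, not part of the original answer.
def repeatedRuns[T](in: Iterator[T]): Iterator[T] =
  (Iterator[Option[T]](None) ++ in.map(x => Some(x)))
    .sliding(3, 1)
    .collect {
      // x is followed by an equal element and preceded by something different,
      // so this window sits at the start of a run and the run is reported once.
      case Seq(prev, Some(x), Some(y)) if x == y && !prev.contains(x) => x
    }

val repeated = repeatedRuns(Iterator("aaa", "aaa", "bb", "cc", "cc", "bb", "dd")).toList
// repeated == List("aaa", "cc")
```

The result is still an Iterator built from sliding and collect, so the input is consumed lazily, three elements at a time.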
This is a pure implementation and it has some advantages over the imperative ones (e.g. in other answers). The most important one is that it is lazy:
import scala.util.Random
// infinite iterator!
val it = Iterator.iterate('a')(s => (s + (if (Random.nextBoolean()) 1 else 0)).toChar)
// foo consumes only as much input as it needs to produce these 10 items,
// so this should not blow up
foo(it).take(10)
// an imperative implementation (fooImp) will blow up in this situation
fooImp(it).take(10)
Here are all the implementations from this and the other posts in this thread:
https://scalafiddle.io/sf/w5yozTA/15
WITH INDEXES AND POSITIONS
In a comment you asked whether it would be easy to add the number of repeated words and their indices. I thought about it for a while and came up with something like this. I'm not sure it has great performance, but it should be lazy (i.e. it should work for big files).
/** Returns an Iterator that replaces runs of consecutive equal items with (item, index, count).
    It contains all items from the original iterator. */
def pack[T >: Null](it: Iterator[T]) = {
  // two nulls, one for each sliding(...)
  (Iterator(null: T) ++ it ++ Iterator(null: T))
    .sliding(2, 1).zipWithIndex
    // skip windows whose two items are the same
    .filter { case (x, _) => x(0) != x(1) }
    // calculate how many items were skipped
    .sliding(2, 1).collect {
      case Seq((a, idx1), (b, idx2)) => (a(1), idx1, idx2 - idx1)
    }
}
def foo[T >: Null](it: Iterator[T]) = pack(it).filter(_._3 > 1)
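For comparison, the same (item, index, count) triples can be produced without null sentinels by wrapping the input in a BufferedIterator. This is a rough sketch under that assumption, not a replacement for pack; the name runs is hypothetical.

```scala
// Lazily groups consecutive equal items into (item, startIndex, count).
// `runs` is a hypothetical helper; it reads one run of the input per call to next().
def runs[T](in: Iterator[T]): Iterator[(T, Int, Int)] = new Iterator[(T, Int, Int)] {
  private val src = in.buffered
  private var pos = 0 // index of the next unread element in the original input

  def hasNext: Boolean = src.hasNext

  def next(): (T, Int, Int) = {
    val item  = src.next()
    val start = pos
    var count = 1
    // consume the rest of the current run
    while (src.hasNext && src.head == item) { src.next(); count += 1 }
    pos += count
    (item, start, count)
  }
}

val repeatedWithPositions =
  runs(Iterator("aaa", "aaa", "bb", "cc", "cc", "bb")).filter(_._3 > 1).toList
// repeatedWithPositions == List(("aaa", 0, 2), ("cc", 3, 2))
```

Only the current run's item and two counters are held at any time, so memory use does not grow with the size of the input.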
OLD ANSWER (BEFORE THE QUESTION WAS UPDATED)
Another (simpler) solution could be something like this:
import scala.collection.immutable._
// create a new iterator each time we print it
def it = Iterator("aaa", "aaa", "bb", "cc", "cc", "bb", "dd", "dd", "ee", "ee", "ee", "ee", "ee", "aaa", "aaa", "ff", "ff", "zz", "gg", "aaa", "aaa")
// yep... this is the whole implementation :)
def foo(it: Iterator[String]) = it.sliding(2, 1).collect { case Seq(a, b) if a == b => a }
println(foo(it).toList) // don't care about duplication
// List(aaa, cc, dd, ee, ee, ee, ee, aaa, ff, aaa)
println(foo(it).toSet) // throws away duplicates but doesn't keep order
// Set(cc, aaa, ee, ff, dd)
println(foo(it).to[ListSet]) // throws away duplicates and keeps order (Scala 2.12 syntax; on 2.13 use .to(ListSet))
// ListSet(aaa, cc, dd, ee, ff)
// oh... and keep the result longer than 5 items while testing.
// Scala collections (e.g. Sets) behave a bit differently up to this limit (they keep order),
// so just test with slightly bigger sequences :)
https://scalafiddle.io/sf/w5yozTA/1
Here is a solution with an Accumulator:
case class Acc(word: String = "", count: Int = 0, index: Int = 0)
def foo(in: Iterator[String]) =
  in.zipWithIndex
    .foldLeft(List(Acc())) { case (Acc(w, c, i) :: xs, (word: String, index)) =>
      if (word == w) // keep counting
        Acc(w, c + 1, i) :: xs
      else
        Acc(word, 1, index) :: Acc(w, c, i) :: xs
    }.filter(_.count > 1)
    .reverse
val it = Iterator("aaa", "aaa", "bb", "cc", "cc", "bb", "dd", "aaa", "aaa", "aaa", "aaa")
foo(it) returns List(Acc(aaa,2,0), Acc(cc,2,3), Acc(aaa,4,7))
It also handles the case where the same word appears in more than one repeated group.
You get the index of each occurrence as well as the count.
Let me know if you need more explanation.
Here's a solution that uses only the original iterator. No intermediate collections. So everything stays completely lazy and is suitable for very large input data.
def foo(in: Iterator[String]): Iterator[String] =
  Iterator.unfold(in.buffered) { itr => // <--- Scala 2.13
    def loop: Option[String] =
      if (!itr.hasNext) None
      else {
        val str = itr.next()
        if (!itr.hasNext) None
        else if (itr.head == str) {
          while (itr.hasNext && itr.head == str) itr.next() // remove repeats
          Some(str)
        }
        else loop
      }
    loop.map(_ -> itr)
  }
testing:
val it = Iterator("aaa", "aaa", "aaa", "bb", "cc", "cc", "bb", "dd")
foo(it) // Iterator("aaa", "cc")
//pseudo-infinite iterator
val piIt = Iterator.iterate(8)(_+1).map(_/3) //2,3,3,3,4,4,4,5,5,5, etc.
foo(piIt.map(_.toString)) //3,4,5,6, etc.
It's somewhat more complex than the other answers, but it uses relatively little additional memory, and it is probably faster.
def repeatedWordsIndex(in: Iterator[String]): java.util.Iterator[String] = {
  val initialCapacity = 4096
  val res = new java.util.ArrayList[String](initialCapacity) // or mutable.Buffer or mutable.Set, if you want Scala
  var prev: String = null
  var next: String = null
  var prevEquals = false
  while (in.hasNext) {
    next = in.next()
    if (next == prev) {
      if (!prevEquals) res.add(prev)
      prevEquals = true
    } else {
      prevEquals = false
    }
    prev = next
  }
  res.iterator // you may need to call distinct
}
You could traverse the collection using foldLeft with its accumulator being a Tuple of Map and String to keep track of the previous word for the conditional word counts, followed by a collect, as shown below:
def foo(in: Iterator[String]): Iterator[String] =
  in.foldLeft((Map.empty[String, Int], "")) { case ((m, prev), word) =>
    val count = if (word == prev) m.getOrElse(word, 0) + 1 else 1
    (m + (word -> count), word)
  }._1.
    collect { case (word, count) if count > 1 => word }.
    iterator
foo(Iterator("aaa", "aaa", "bb", "cc", "cc", "bb", "dd")).toList
// res1: List[String] = List("aaa", "cc")
To also capture the repeated-word counts and indexes, just index the collection and apply a similar tactic for the conditional word count:
def bar(in: Iterator[String]): Map[(String, Int), Int] =
  in.zipWithIndex.foldLeft((Map.empty[(String, Int), Int], "", 0)) {
    case ((m, pWord, pIdx), (word, idx)) =>
      val idx1 = if (word == pWord) idx min pIdx else idx
      val count = if (word == pWord) m.getOrElse((word, idx1), 0) + 1 else 1
      (m + ((word, idx1) -> count), word, idx1)
  }._1.
    filter { case ((_, _), count) => count > 1 }
bar(Iterator("aaa", "aaa", "bb", "cc", "cc", "bb", "dd", "cc", "cc", "cc"))
// res2: Map[(String, Int), Int] = Map(("cc", 7) -> 3, ("cc", 3) -> 2, ("aaa", 0) -> 2)
UPDATE:
As per the revised requirement, to minimize memory usage, one approach would be to keep the Map to a minimal size by removing elements of count 1 (which would be the majority if few words are repeated) on-the-fly during the foldLeft traversal. Method baz below is a revised version of bar:
def baz(in: Iterator[String]): Map[(String, Int), Int] =
  (in ++ Iterator("")).zipWithIndex.
    foldLeft((Map.empty[(String, Int), Int], (("", 0), 0), 0)) {
      case ((m, pElem, pIdx), (word, idx)) =>
        val sameWord = word == pElem._1._1
        val idx1 = if (sameWord) idx min pIdx else idx
        val count = if (sameWord) m.getOrElse((word, idx1), 0) + 1 else 1
        val elem = ((word, idx1), count)
        val newMap = m + ((word, idx1) -> count)
        if (sameWord) {
          (newMap, elem, idx1)
        } else
          if (pElem._2 == 1)
            (newMap - pElem._1, elem, idx1)
          else
            (newMap, elem, idx1)
    }._1.
    filter { case ((word, _), _) => word != "" }
baz(Iterator("aaa", "aaa", "bb", "cc", "cc", "bb", "dd", "cc", "cc", "cc"))
// res3: Map[(String, Int), Int] = Map(("aaa", 0) -> 2, ("cc", 3) -> 2, ("cc", 7) -> 3)
Note that the dummy empty String appended to the input collection is to ensure that the last word gets properly processed as well.

Append auto-incrementing suffix to duplicated elements of a List

Given the following list:
val l = List("A", "A", "C", "C", "B", "C")
How can I add an auto-incrementing suffix to every element so that I end up with a list containing no duplicates, like the following (the ordering doesn't matter)?
List("A0", "A1", "C0", "C1", "C2", "B0")
I figured it out myself just after writing this question:
val l = List("A", "A", "C", "C", "B", "C")
l.groupBy(identity) // Map(A->List(A,A),C->List(C,C,C),B->List(B))
  .values.flatMap(_.zipWithIndex) // List((A,0),(A,1),(C,0),(C,1),(C,2),(B,0))
  .map { case (str, i) => s"$str$i" }
If there is a better solution (using foldLeft maybe), please let me know.
In a straightforward single pass:
import scala.collection.mutable
def transformList(list: List[String]): List[String] = {
  // next suffix to hand out for each element seen so far
  val buf: mutable.Map[String, Int] = mutable.Map.empty
  list.map {
    x => {
      val i = buf.getOrElseUpdate(x, 0)
      val result = s"${x.toString}$i"
      buf.put(x, i + 1)
      result
    }
  }
}
transformList(List("A", "A", "C", "C", "B", "C"))
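The same single pass can also be sketched with foldLeft and an immutable Map, which is roughly what the question hints at. The name suffixDuplicates is hypothetical; unlike the groupBy version, this sketch keeps the original element order.

```scala
// Single pass with an immutable accumulator: (next suffix per element, output built in reverse).
// `suffixDuplicates` is a hypothetical name used only for this illustration.
def suffixDuplicates(xs: List[String]): List[String] = {
  val (_, reversed) =
    xs.foldLeft((Map.empty[String, Int], List.empty[String])) {
      case ((seen, acc), x) =>
        val n = seen.getOrElse(x, 0) // how many times x has been seen so far
        (seen.updated(x, n + 1), s"$x$n" :: acc)
    }
  reversed.reverse
}

val suffixed = suffixDuplicates(List("A", "A", "C", "C", "B", "C"))
// suffixed == List("A0", "A1", "C0", "C1", "B0", "C2")
```

Prepending to the accumulator and reversing once at the end keeps the whole pass linear in the length of the list.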
Perhaps not the most readable solution, but...
def appendCount(l: List[String]): List[String] = {
  // Since we're doing zero-based counting, we need to use `getOrElse(e, -1) + 1`
  // to indicate a first-time element count as 0.
  val counts =
    l.foldLeft(Map[String, Int]())((acc, e) =>
      acc + (e -> (acc.getOrElse(e, -1) + 1))
    )
  val (appendedList, _) =
    l.foldRight((List[String](), counts)) { case (e, (li, m)) =>
      // Prepend the element with its count to the accumulated list.
      // Decrement that element's count within the map of element counts.
      (s"$e${m(e)}" :: li, m + (e -> (m(e) - 1)))
    }
  appendedList
}
The idea here is that you create a count of each element in the list. You then iterate from the back of the list of original values and append the count to the value while decrementing the count map.
You need to define a helper here because foldRight will require both the new List[String] and the counts as an accumulator (and, as such, will return both). You'll just ignore the counts at the end (they'll all be -1 anyway).
I'd say your way is probably more clear. You'll need to benchmark to see which is faster if that's a concern.

How to map based on multiple lists with arbitrary elements?

I have the following model:
case class Car(brand: String, year: Int, model: String, ownerId: String)
case class Person(firstName: String, lastName: String, id: String)
case class House(address: String, size: Int, ownerId: String)
case class Info(id: String, lastName: String, carModel: String, address: String)
I want to build a List[Info] based on the following lists:
val personL: List[Person] = List(Person("John", "Doe", "1"), Person("Jane", "Doe", "2"))
val carL: List[Car] = List(Car("Mercedes", 1999, "G", "1"), Car("Tesla", 2016, "S", "4"), Car("VW", 2015, "Golf", "2"))
val houseL: List[House] = List(House("Str. 1", 1000, "2"), House("Bvl. 3", 150, "8"))
The info should be gathered based on the personL, for example:
val info = personL.map { p =>
  val car = carL.find(_.ownerId.equals(p.id))
  val house = houseL.find(_.ownerId.equals(p.id))
  val carModel = car.map(_.model)
  val address = house.map(_.address)
  Info(p.id, p.lastName, carModel.getOrElse(""), address.getOrElse(""))
}
Result:
info: List[Info] = List(Info(1,Doe,G,), Info(2,Doe,Golf,Str. 1))
Now I am wondering if there's an expression which is more concise than my map construct which solves exactly my problem.
Here is one option: first build maps from ownerId to model and to address, then look up the info while looping through the person list:
val carMap = carL.map(car => car.ownerId -> car.model).toMap
// carMap: scala.collection.immutable.Map[String,String] = Map(1 -> G, 4 -> S, 2 -> Golf)
val addrMap = houseL.map(house => house.ownerId -> house.address).toMap
// addrMap: scala.collection.immutable.Map[String,String] = Map(2 -> Str. 1, 8 -> Bvl. 3)
personL.map(p => Info(p.id, p.lastName, carMap.getOrElse(p.id, ""), addrMap.getOrElse(p.id, "")))
// res3: List[Info] = List(Info(1,Doe,G,), Info(2,Doe,Golf,Str. 1))
I would say use for comprehensions. If you need exactly that result, which resembles a left join, then the for comprehension is still ugly:
for {
  person <- personL
  model <- carL.find(_.ownerId == person.id).map(_.model).orElse(Some("")).toList
  address <- houseL.find(_.ownerId == person.id).map(_.address).orElse(Some("")).toList
} yield Info(person.id, person.lastName, model, address)
Note that you can remove the .toList call in this exceptional case as the two Option generators appear after the collection generators.
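One way to keep the left-join behaviour without wrapping the defaults in Some(...) is to bind the looked-up values with = inside the comprehension. The following is a sketch that reuses the model and sample data from the question and uses Option.fold for the empty-string defaults.

```scala
case class Car(brand: String, year: Int, model: String, ownerId: String)
case class Person(firstName: String, lastName: String, id: String)
case class House(address: String, size: Int, ownerId: String)
case class Info(id: String, lastName: String, carModel: String, address: String)

val personL = List(Person("John", "Doe", "1"), Person("Jane", "Doe", "2"))
val carL = List(Car("Mercedes", 1999, "G", "1"), Car("Tesla", 2016, "S", "4"), Car("VW", 2015, "Golf", "2"))
val houseL = List(House("Str. 1", 1000, "2"), House("Bvl. 3", 150, "8"))

// Only personL is a generator; model and address are plain value bindings,
// so every person yields exactly one Info even when no car or house matches.
val info = for {
  p <- personL
  model = carL.find(_.ownerId == p.id).fold("")(_.model)
  address = houseL.find(_.ownerId == p.id).fold("")(_.address)
} yield Info(p.id, p.lastName, model, address)
// info == List(Info("1", "Doe", "G", ""), Info("2", "Doe", "Golf", "Str. 1"))
```

Each find still scans its list once per person, so for large lists the Map-based lookup from the other answer is likely the better fit.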
If you can sacrifice the default model / address values then it looks simple enough:
for {
  person <- personL
  car <- carL if car.ownerId == person.id
  house <- houseL if house.ownerId == person.id
} yield Info(person.id, person.lastName, car.model, house.address)
Hope that helps.
Maybe converting the individual lists into hash maps with a map function and looking up by key, instead of iterating over all those lists for every person, would help?

Add a set of strings to an existing set with casbah

I have a user object as follows:
{ user: "joe", acks: ["a", "b" ] }
I want to add a set of strings to the acks field. Here's my attempt to do this with one update:
def addSomeAcks(toBeAcked: Set[String]) {
  DB.getCollection("userAcks").update(
    MongoDBObject("user" -> "joe"),
    $addToSet("acks") $each toBeAcked
  )
}
def test() {
  addSomeAcks(Set("x", "y", "z"))
}
When I run this code I get an embedded set as follows:
{ user: "joe", acks: ["a", "b", ["x", "y", "z" ] ] }
but the result I want is:
{ user: "joe", acks: ["a", "b", "x", "y", "z" ] }
I can make it work by calling update for each item in toBeAcked, is there a way to do this in one call?
The problem is that $each takes a variable number of arguments, not a collection type like Traversable. Because of that, it treats the set that you pass as a single element and adds it to the array as such. This leads to the nesting you observe. You need to unwrap it this way: $each(toBeAcked: _*), or pass each element separately: $each("x", "y", "z").
Here is a complete example that works as you'd expect it to:
package com.example
import com.mongodb.casbah.Imports._
object TestApp extends App {
  val col = MongoConnection()("test")("userAcks")
  def printAll(): Unit =
    col.find().foreach(println)
  def insertFirst(): Unit =
    col.insert(MongoDBObject("user" -> "joe", "acks" -> List("a", "b")))
  def addSomeAcks(toBeAcked: Seq[String]): Unit =
    col.update(
      MongoDBObject("user" -> "joe"),
      $addToSet("acks") $each (toBeAcked: _*))
  printAll()
  insertFirst()
  printAll()
  addSomeAcks(Seq("x", "y", "z"))
  printAll()
}
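The nesting comes from how Scala treats varargs parameters in general, which can be seen without MongoDB at all. In the sketch below, each is a hypothetical stand-in for a varargs method; it is not Casbah's $each and only illustrates the difference between passing a collection and expanding it with : _*.

```scala
// Hypothetical stand-in for a varargs API; NOT Casbah's real $each.
def each(values: Any*): List[Any] = values.toList

val toBeAcked = Seq("x", "y", "z")

// The whole Seq is passed as ONE argument, so the result has a single, nested element.
val nested = each(toBeAcked)
// nested == List(Seq("x", "y", "z"))

// `: _*` expands the Seq into three separate arguments.
val flat = each(toBeAcked: _*)
// flat == List("x", "y", "z")
```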