MongoDB: Transform array of objects to array of arrays

I have a collection named "records" that contains documents in the following form:
{
"name": "a"
"items": [
{
"a": "5",
"b": "1",
"c": "2"
},
{
"a": "6",
"b": "3",
"c": "7"
}
]
}
I want to keep the data just as it is in the database (to make the data easy to read and interpret). But I'd like to run a query that returns the data in the following form:
{
"name": "a"
"items": [
["5", "1", "2"],
["6", "3", "7"],
]
}
Is this possible with pymongo? I know I can run a query and translate the documents using Python, but I'd like to avoid iterating over the query result if possible.

Any pointers on how to approach this would be super helpful!
I'd suggest using a view to transform your data during a query in MongoDB.
That way you get the transformed data, and you can still apply find to the already transformed data if you need to.
db.createCollection(
"view_name",
{"viewOn": "original_collection_name",
"pipeline": [{$unwind: "$items"},
{$project: {name: 1, items: {$objectToArray: "$items"}}},
{$project: {name: 1, items: {$concatArrays: ["$items.v"]}}},
{$group: {_id: "$_id", name: {$first: "$name"},
items: {$push: "$items"}}}]
}
)
> db.view_name.find({name: "a"})
{ "_id" : ObjectId("5fc3dbb69cb76f866582620f"), "name" : "a", "items" : [ [ "5", "1", "2" ], [ "6", "3", "7" ] ] }
> db.view_name.find({"items": {$in: [["5", "1", "2"]]}})
{ "_id" : ObjectId("5fc3dbb69cb76f866582620f"), "name" : "a", "items" : [ [ "5", "1", "2" ], [ "6", "3", "7" ] ] }
> db.view_name.find()
{ "_id" : ObjectId("5fc3dbb69cb76f866582620f"), "name" : "a", "items" : [ [ "5", "1", "2" ], [ "6", "3", "7" ] ] }
Query:
db.original_collection_name.aggregate([
{$unwind: "$items"},
{$project: {name: 1, items: {$objectToArray: "$items"}}},
{$project: {name: 1, items: {$concatArrays: ["$items.v"]}}},
{$group: {_id: "$_id", name: {$first: "$name"}, items: {$push: "$items"}}}])
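Since the question asks about pymongo specifically, here is a minimal sketch of creating that view and querying it from Python; the connection URI and the database name "test" are assumptions, and the pipeline mirrors the shell version above:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local mongod
db = client["test"]                                # assumed database name

pipeline = [
    {"$unwind": "$items"},
    {"$project": {"name": 1, "items": {"$objectToArray": "$items"}}},
    {"$project": {"name": 1, "items": {"$concatArrays": ["$items.v"]}}},
    {"$group": {"_id": "$_id", "name": {"$first": "$name"},
                "items": {"$push": "$items"}}},
]

# Create the view once; raises CollectionInvalid if it already exists
db.create_collection("view_name", viewOn="original_collection_name",
                     pipeline=pipeline)

# The view can then be queried like any normal collection
for doc in db["view_name"].find({"name": "a"}):
    print(doc)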

Using $objectToArray and $map transformations:
// { name: "a", items: [ { a: "5", b: "1", c: "2" }, { a: "6", b: "3", c: "7" } ] }
db.collection.aggregate([
{ $set: { items: { $map: { input: "$items", as: "x", in: { $objectToArray: "$$x" } } } } },
// {
// name: "a",
// items: [
// [ { k: "a", v: "5" }, { k: "b", v: "1" }, { k: "c", v: "2" } ],
// [ { k: "a", v: "6" }, { k: "b", v: "3" }, { k: "c", v: "7" } ]
// ]
// }
{ $set: { items: { $map: { input: "$items", as: "x", in: "$$x.v" } } } }
])
// { name: "a", items: [["5", "1", "2"], ["6", "3", "7"]] }
This maps each element of items to a key/value array, so that { field: "value" } becomes [ { k: "field", v: "value" } ]. That way, whatever the field name is, we can easily access the value via v, which is the role of the second $set stage: "$$x.v".
This has the benefit of avoiding heavy stages such as unwind/group.
Note that you could also nest the second $map inside the first, but that's probably less readable.
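For what it's worth, here is what that nested form looks like when driven from pymongo; a single $set stage does both steps (the connection URI and the database/collection names are assumptions):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
records = client["test"]["records"]  # assumed database/collection names

pipeline = [
    # One $set stage: the inner $map turns each sub-document into its list of values
    {"$set": {"items": {"$map": {
        "input": "$items", "as": "x",
        "in": {"$map": {
            "input": {"$objectToArray": "$$x"},
            "as": "kv",
            "in": "$$kv.v",
        }},
    }}}},
]

for doc in records.aggregate(pipeline):
    print(doc)  # e.g. {"name": "a", "items": [["5", "1", "2"], ["6", "3", "7"]], ...}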

Remove multiple objects from nested array

I am trying to clean up my collection with a single update query. I need to remove some deeply nested objects without breaking other objects. Here is a good solution provided by #rickhg12hs:
Remove multiple objects from deeply nested array
but it has a small drawback: it breaks the content of the _a._p object when there is no _a._p.s object inside it.
And the original solution provided by #nimrod serok:
Remove multiple elements from deep nested array with single update query
but it has another issue: when the "_a._p.s.c", "_a._p.s.d" or "_a._p.s.a" object is missing, it adds objects with null values instead, which of course is not expected.
Playground test
Here are two example original documents:
[
{
"_id": ObjectId("5c05984246a0201286d4b57a"),
f: "x",
"_a": [
{
"_onlineStore": {}
},
{
"_p": {
"s": {
"a": {
"t": [
{
id: 1,
"dateP": "20200-09-20",
did: "x",
dst: "y",
den: "z"
},
{
id: 2,
"dateP": "20200-09-20"
}
]
},
"c": {
"t": [
{
id: 3,
"dateP": "20300-09-22"
},
{
id: 4,
"dateP": "20300-09-23",
did: "x",
dst: "y",
den: "z"
},
{
id: 5,
"dateP": "20300-09-23"
}
]
}
}
}
}
]
},
{
"_id": ObjectId("5c05984246a0201286d4b57b"),
f: "x",
"_a": [
{
"_onlineStore": {}
},
{
"_p": {
_t: "Some field",
_x: "Some other field"
}
}
]
}
]
Expected result after update:
[
{
"_a": [
{
"_onlineStore": {}
},
{
"_p": {
"s": {
"a": {
"t": [
{
"dateP": "20200-09-20",
"den": "z",
"did": "x",
"dst": "y",
"id": 1
}
]
},
"c": {
"t": [
{
"dateP": "20300-09-23",
"den": "z",
"did": "x",
"dst": "y",
"id": 4
}
]
}
}
}
}
],
"_id": ObjectId("5c05984246a0201286d4b57a"),
"f": "x"
},
{
"_a": [
{
"_onlineStore": {}
},
{
"_p": {
_t: "Some field",
_x: "Some other field"
}
}
],
"_id": ObjectId("5c05984246a0201286d4b57b"),
"f": "x"
}
]
The goal is, with a single update query, to remove any objects under _a._p.s.[a|c|d].t where the fields did, dst and den are missing, but without breaking other _a._p objects where _a._p.s does not exist.
Looks like a small change to #rickhg12hs's answer can solve this:
db.collection.update({},
[
{$set: {
_a: {$map: {
input: "$_a",
as: "elem",
in: {$cond: [
{$or: [
{$eq: [{$type: "$$elem._p"}, "missing"]},
{$eq: [{$type: "$$elem._p.s"}, "missing"]}
]},
"$$elem",
{
_p: {s: {
$arrayToObject: {$map: {
input: {$objectToArray: "$$elem._p.s"},
as: "anyKey",
in: {
k: "$$anyKey.k",
v: {
t: {$filter: {
input: "$$anyKey.v.t",
as: "t",
cond: {$setIsSubset: [
["did", "dst", "den"],
{$map: {
input: {$objectToArray: "$$t"},
in: "$$this.k"
}}
]}
}}
}
}
}}
}
}}
]}
}}
}}
],
{
"multi": true
})
See how it works on the playground example
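The heart of that pipeline is the $filter whose condition checks, via $setIsSubset, that did, dst and den all appear among an element's key names. A stripped-down pymongo sketch of just that idiom on a flat array field t (the collection name, field name and document shape here are assumptions, not the original schema):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["test"]["demo"]  # assumed names; every doc is assumed to have an array field "t"

# Keep only the array elements that contain all of did, dst and den
coll.update_many({}, [
    {"$set": {"t": {"$filter": {
        "input": "$t",
        "as": "t",
        "cond": {"$setIsSubset": [
            ["did", "dst", "den"],
            # the list of this element's field names
            {"$map": {"input": {"$objectToArray": "$$t"}, "in": "$$this.k"}},
        ]},
    }}}},
])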

MongoDb score results based on simple matches

I'm trying to create a simple search algorithm that will try to match against a first name, last name, and/or set of tags, as an example:
[
{
"key": 1,
"fname": "Bob",
"lname": "Smith",
"tags": [
"a",
"b",
"c"
]
},
{
"key": 2,
"fname": "John",
"lname": "Jacob",
"tags": [
"c",
"d",
"e"
]
},
{
"key": 3,
"fname": "Will",
"lname": "Smith",
"tags": [
"a",
"b",
"c"
]
}
]
This works with the following, but I can only get the tag-match count. Basically, what I'm going for here is to match first name, last name, or tags, and store a "point" for each match:
db.collection.aggregate([
{
$match: {
$or: [
{
"fname": "Will"
},
{
"lname": "Smith"
},
{
tags: {
$in: [
"b",
"c"
]
}
}
]
}
},
{
$project: {
tagsMatchCount: {
$size: {
"$setIntersection": [
[
"b",
"c"
],
"$tags"
]
}
}
}
},
{
"$sort": {
tagsMatchCount: -1
}
}
])
Here's the sandbox I'm playing with: https://mongoplayground.net/p/DFJQZY-dfb5
Query:
Create a document that holds the matches, each in a separate field.
Add one extra field, total.
Keep only those documents with at least one match.
You can also sort afterwards by any of the three match types or by the total, e.g.
{"$sort": {"points.total": -1}}
If you have an index that can be used, remove my $match and add your own match as the first stage, as in your example.
Test code here
db.collection.aggregate(
[{"$set":
{"points":
{"fname":{"$cond":[{"$eq":["$fname", "Will"]}, 1, 0]},
"lname":{"$cond":[{"$eq":["$lname", "Smith"]}, 1, 0]},
"tags":{"$size":{"$setIntersection":["$tags", ["b", "c"]]}}}}},
{"$set":
{"points.total":
{"$add":["$points.fname", "$points.lname", "$points.tags"]}}},
{"$match":{"$expr":{"$gt":["$points.total", 0]}}}])

mongodb aggregate uniqueness and count at same time with higher level averages

Consider this dataset. For each name, we wish to find the average of x and the distinct set and count of game. For Steve, this is avg(x) ≈ 19.67, with game A appearing 2 times and game B once. For Bob, this is avg(x) = 60.5, with game B appearing 4 times:
{"name":"Steve", "game": "A", x:7},
{"name":"Steve", "game": "A", x:21},
{"name":"Steve", "game": "B", x:31},
{"name":"Bob", "game": "B", x:41},
{"name":"Bob", "game": "B", x:51},
{"name":"Bob", "game": "B", x:71},
{"name":"Bob", "game": "B", x:79},
{"name":"Jill", "game": "A", x:61},
{"name":"Jill", "game": "B", x:71},
{"name":"Jill", "game": "C", x:81},
{"name":"Jill", "game": "D", x:91}
EDIT: Answer is below but leaving this incomplete solution as a stepping stone.
I am really close with this. Note we cannot use $addToSet because it is "lossy". So instead, we group by player and game to get the full list, then in a second group, capture list size:
db.foo2.aggregate([
{$group: {_id:{n:"$name",g:"$game"}, z:{$push: "$x"} }}
,{$group: {_id:"$_id.n",
avgx: {$avg: "$z"},
games: {$push: {name: "$_id.g", num: {$size:"$z"}}}
}}
]);
which yields:
{
"_id" : "Steve",
"avgx" : null,
"games" : [ {"name":"A", "num":2 },
{"name":"B", "num":1 }
]
}
{
"_id" : "Bob",
"avgx" : null,
"games" : [ {"name":"B", "num":4 } ]
}
but I just cannot seem to get the avgx working properly. If I needed the average within the game type that would be easy but I need it across the player. $avg in the $group context does not work with array inputs.
Try this:
db.collection.aggregate([
{
$group: {
_id: "$name",
avg: {
$avg: "$x"
},
gamesUnFiltered: {
$push: {
name: "$game",
num: "$x"
}
}
}
},
{
$addFields: {
games: {
$reduce: {
input: "$gamesUnFiltered",
initialValue: [],
in: {
$cond: [
{
$not: [
{
$in: [
"$$this.name",
"$$value.name"
]
}
]
},
{
$concatArrays: [
[
"$$this"
],
"$$value"
]
},
"$$value"
]
}
}
}
}
},
{
$project: {
gamesUnFiltered: 0
}
}
])
Output:
[
{
"_id": "Bob",
"avg": 60.5,
"games": [
{
"name": "B",
"num": 41
}
]
},
{
"_id": "Steve",
"avg": 19.666666666666668,
"games": [
{
"name": "B",
"num": 31
},
{
"name": "A",
"num": 7
}
]
},
{
"_id": "Jill",
"avg": 76,
"games": [
{
"name": "D",
"num": 91
},
{
"name": "C",
"num": 81
},
{
"name": "B",
"num": 71
},
{
"name": "A",
"num": 61
}
]
}
]
Got it! You need an extra $unwind and a $first to "carry" the a field from stage to stage. I threw in total_games for extra info. In general, the "group-unwind-$first" pattern is a way to aggregate one or more things, then "reset" to an unaggregated state to perform additional operations, with the aggregate values traveling along with each doc.
db.foo2.aggregate([
{$group: {_id:"$name", a:{$avg:"$x"}, g:{$push: "$game"} }}
,{$unwind: "$g"}
,{$group: {_id:{name:"$_id",game:"$g"}, a:{$first:"$a"}, n:{$sum:1}}}
,{$group: {_id:"$_id.name",
a:{$first:"$a"},
total_games: {$sum:"$n"},
games: {$push: {name:"$_id.game",n:"$n"}}
}}
]);
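For reference, the same group-unwind-$first pipeline driven from pymongo (the connection URI and database name are assumptions):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
foo2 = client["test"]["foo2"]  # assumed database name

pipeline = [
    # Average x per player while collecting every game occurrence
    {"$group": {"_id": "$name", "a": {"$avg": "$x"}, "g": {"$push": "$game"}}},
    # "Reset" to one doc per (player, game); the average travels along via $first
    {"$unwind": "$g"},
    {"$group": {"_id": {"name": "$_id", "game": "$g"},
                "a": {"$first": "$a"}, "n": {"$sum": 1}}},
    # Reassemble one doc per player with per-game counts and the overall average
    {"$group": {"_id": "$_id.name",
                "a": {"$first": "$a"},
                "total_games": {"$sum": "$n"},
                "games": {"$push": {"name": "$_id.game", "n": "$n"}}}},
]

for doc in foo2.aggregate(pipeline):
    print(doc)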

Find percent in mongo

I have a collection with 2 docs like below.
{
_id: 1,
Score: 30,
Class: "A",
School: "X"
}
{
Score: 40,
Class: "A",
School: "Y"
}
I need help writing a query that finds the percentage of the score, like below:
{
School: "X",
Percent: "30/70"
}
{
School: "Y",
Percent: "40/70"
}
This input:
var r =
[
{"school":"X", "class":"A", "score": 30}
,{"school":"Y", "class":"A", "score": 40}
,{"school":"Z", "class":"A", "score": 20}
,{"school":"Y", "class":"B", "score": 50}
,{"school":"Z", "class":"B", "score": 17}
];
run through this pipeline:
db.foo.aggregate([
// Use $group to gather up the class and save the inputs via $push
{$group: {_id: "$class", tot: {$sum: "$score"}, items: {$push: {score:"$score",school:"$school"}}} }
// Now we have total by class, so just "reproject" that array and do some nice
// formatting as requested:
,{$project: {
items: {$map: { // overwrite input array $items; this is OK
input: "$items",
as: "z",
in: {
school: "$$z.school",
pct: {$concat: [ {$toString: "$$z.score"}, "/", {$toString:"$tot"} ]}
}
}}
}}
]);
produces this output, where _id is the Class:
{
"_id" : "A",
"items" : [
{"school" : "X", "pct" : "30/90"},
{"school" : "Y", "pct" : "40/90"},
{"school" : "Z", "pct" : "20/90"}
]
}
{
"_id" : "B",
"items" : [
{"school" : "Y", "pct" : "50/67"},
{"school" : "Z", "pct" : "17/67"}
]
}
From here you can $unwind if you wish.
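Run from pymongo, with the optional trailing $unwind mentioned above, it might look like this sketch (the connection URI and collection names are assumptions):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
foo = client["test"]["foo"]  # assumed database/collection names

pipeline = [
    # Total score per class, keeping every (school, score) pair
    {"$group": {"_id": "$class",
                "tot": {"$sum": "$score"},
                "items": {"$push": {"score": "$score", "school": "$school"}}}},
    # Rewrite each item as school + "score/total" string
    {"$project": {"items": {"$map": {
        "input": "$items", "as": "z",
        "in": {"school": "$$z.school",
               "pct": {"$concat": [{"$toString": "$$z.score"}, "/",
                                   {"$toString": "$tot"}]}},
    }}}},
    # Optional: one output document per school
    {"$unwind": "$items"},
]

for doc in foo.aggregate(pipeline):
    print(doc)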

Create mongodb view for sub-collections

I have some collections with sub-collections in them, and I need to be able to query those sub-collections as if they were not nested. Let's say I have a collection like this:
[
{author: "aa", books: [{title:"a", pages: 100}, {title: "b", pages: 200}]},
{author: "ab", books: [{title:"c", pages: 80}, {title: "d", pages: 150}]}
]
I want to be able to view this collection like this:
[
{author: "aa", books.title: "a", books.pages: 100},
{author: "aa", books.title: "b", books.pages: 200},
{author: "ab", books.title: "c", books.pages: 80},
{author: "ab", books.title: "d", books.pages: 150}
]
Is it possible to create a view like this and filter it through a web API?
Edit after #mickl's question:
What I want is to show every sub-collection in a new row. I have 2 records in the main collection and 2 sub-collections in every record, so I want to get 4 rows, and I want to do this on the DB side, not on the API side.
So the key thing here is the $unwind operator, which transforms a document with an array of n elements into n documents, each containing a single subdocument.
db.createView(
"yourview",
"yourcollection",
[ { $unwind: "$books" } ]
)
This will give you documents in the following format:
{ author: "aa", books: { title: "a", pages: 100 } },
{ author: "aa", books: { title: "b", pages: 200 } },
{ author: "ab", books: { title: "c", pages: 80 } },
{ author: "ab", books: { title: "d", pages: 150 } }
EDIT: to get keys with dots in their names, you can run the command below:
db.createView(
"yourview",
"yourcollection",
[
{ $unwind: "$books" },
{
$project: {
author: 1,
books2: {
$map: {
input: { $objectToArray: "$books" },
as: "book",
in: {
k: { $concat: [ "books.", "$$book.k" ] },
v: "$$book.v"
}
}
}
}
},
{
$replaceRoot: {
newRoot: { $mergeObjects: [ { author: "$author" }, { $arrayToObject: "$books2" } ] }
}
}
]
)
Basically it uses $objectToArray and $arrayToObject to "force" MongoDB to return fields with dots in their names. Outputs:
{ "author" : "aa", "books.title" : "a", "books.pages" : 100 }
{ "author" : "aa", "books.title" : "b", "books.pages" : 200 }
{ "author" : "ab", "books.title" : "c", "books.pages" : 80 }
{ "author" : "ab", "books.title" : "d", "books.pages" : 150 }