How to transform deeply nested data in the MongoDB aggregation framework?

I'm new to MongoDB aggregation and I'm struggling to get my data to look the way I want. I'm a student completing a bootcamp,
and we are doing a project where we seed a database of our choice with millions of lines of CSV that were extracted from a SQL database, though I'm not sure which one.
For context, the data is questions and answers from a mock retail application we built.
I was given three files: one with questions, one with answers, and one with photos that were uploaded to answers. I successfully used the $lookup and $out operators
to join these files on the appropriate index and export the result to a new collection. So now I have a collection of questions and a collection of ansPhotos.
The issue is that the data needs to be structurally transformed for different cases.
Suppose I want all the questions and answers for a particular product. Below is how the question data is structured, showing all questions for a product_id of 1:
db.questions.find({ product_id: 1 })
[
{
_id: ObjectId('61731a1cae4ca5aef1836b04'),
question_id: 4,
product_id: 1,
question_body: 'How long does it last?',
question_date: Long('1594341317010'),
asker_name: 'funnygirl',
asker_email: 'first.last#gmail.com',
reported: 0,
helpful: 6,
},
{
_id: ObjectId('61731a1cae4ca5aef1836b05'),
question_id: 3,
product_id: 1,
question_body: 'Does this product run big or small?',
question_date: Long('1608535907083'),
asker_name: 'jbilas',
asker_email: 'first.last#gmail.com',
reported: 0,
helpful: 8,
},
{
_id: ObjectId('61731a1cae4ca5aef1836b06'),
question_id: 6,
product_id: 1,
question_body: 'Is it noise cancelling?',
question_date: Long('1608855284662'),
asker_name: 'coolkid',
asker_email: 'first.last#gmail.com',
reported: 1,
helpful: 19,
},
{
_id: ObjectId('61731a1cae4ca5aef1836b08'),
question_id: 1,
product_id: 1,
question_body: 'What fabric is the top made of?',
question_date: Long('1595884714409'),
asker_name: 'yankeelover',
asker_email: 'first.last#gmail.com',
reported: 0,
helpful: 1,
},
{
_id: ObjectId('61731a1cae4ca5aef1836b0d'),
question_id: 5,
product_id: 1,
question_body: 'Can I wash it?',
question_date: Long('1608855284662'),
asker_name: 'cleopatra',
asker_email: 'first.last#gmail.com',
reported: 0,
helpful: 7,
},
{
_id: ObjectId('61731a1cae4ca5aef1836b13'),
question_id: 2,
product_id: 1,
question_body: 'HEY THIS IS A WEIRD QUESTION!!!!?',
question_date: Long('1613888219613'),
asker_name: 'jbilas',
asker_email: 'first.last#gmail.com',
reported: 1,
helpful: 4,
}
]
I now want to get all the answers for all these questions. For brevity, and because I'll be pasting a lot of context and examples, here is what a couple of answer documents from ansPhotos look like:
db.ansPhotos.find({question_id:4})
[
{
_id: ObjectId("61731c9c39b2df95b4573b3c"),
id: 65,
question_id: 4,
body: 'It runs small',
date: Long("1605784307205"),
answerer_name: 'dschulman',
answerer_email: 'first.last#gmail.com',
reported: 0,
helpful: 1,
photos: [
{
_id: ObjectId("61731edbbac3ef59b2a59b04"),
id: 15,
answer_id: 65,
url: 'https://images.unsplash.com/photo-1536922645426-5d658ab49b81?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1650&q=80'
},
{
_id: ObjectId("61731edbbac3ef59b2a59b0a"),
id: 14,
answer_id: 65,
url: 'https://images.unsplash.com/photo-1470116892389-0de5d9770b2c?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1567&q=80'
}
]
},
{
_id: ObjectId("61731c9c39b2df95b4573b54"),
id: 89,
question_id: 4,
body: 'Showing no wear after a few months!',
date: Long("1599089609530"),
answerer_name: 'sillyguy',
answerer_email: 'first.last#gmail.com',
reported: 0,
helpful: 8,
photos: []
}
]
Now for the part I’m struggling with.
The data needs to look different for different API calls. Basically, I need to nest every answer, with its photos, inside its question.
Here are the key challenges I'm facing and the transformations I have to make. There are other transformations I'm not discussing because they are easy to do, such as not returning the ObjectId for answers, transforming the date, etc.
Each question has an answers object that stores key-value pairs, with each answer's "id" as the key and the answer document as the value.
Each answer's photos must be an array of URL strings, instead of the array of objects (each with a url property) shown above for the answers to question_id 4.
Some questions do not have any answers; the question with question_id 3 below is one of them. I am still expected to return an empty object at the "answers" key when a question has no answers.
Here is the desired output:
[
{
"question_id": 4,
"question_body": "How long does it last?",
"question_date": "2020-07-10T00:35:17.010Z",
"asker_name": "funnygirl",
"reported": false,
"question_helpfullness": 6,
"answers": {
"65": {
"_id": "61731c9c39b2df95b4573b3c",
"id": 65,
"question_id": 4,
"body": "It runs small",
"date": 1605784307205,
"answerer_name": "dschulman",
"answerer_email": "first.last#gmail.com",
"helpful": 1,
"photos": ["https://images.unsplash.com/photo-1536922645426-5d658ab49b81?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1650&q=80",
"https://images.unsplash.com/photo-1470116892389-0de5d9770b2c?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1567&q=80"]
},
"89": {
"_id": "61731c9c39b2df95b4573b54",
"id": 89,
"question_id": 4,
"body": "Showing no wear after a few months!",
"date": 1599089609530,
"answerer_name": "sillyguy",
"answerer_email": "first.last#gmail.com",
"reported": 0,
"helpful": 8,
"photos": []
}
}
},
{
"question_id": 5,
"question_body": "Can I wash it?",
"question_date": "2020-12-25T00:14:44.662Z",
"asker_name": "cleopatra",
"reported": false,
"question_helpfullness": 7,
"answers": {
"46": {
"_id": "61731c9c39b2df95b4573b27",
"id": 46,
"question_id": 5,
"body": "I've thrown it in the wash and it seems fine",
"date": 1606022843272,
"answerer_name": "marcanthony",
"answerer_email": "first.last#gmail.com",
"reported": 0,
"photos": []
},
"64": {
"_id": "61731c9c39b2df95b4573b3b",
"id": 64,
"question_id": 5,
"body": "It says not to",
"date": 1588644950162,
"answerer_name": "ceasar",
"answerer_email": "first.last#gmail.com",
"helpful": 0,
"photos": []
}
}
},
{
"question_id": 3,
"question_body": "Does this product run big or small?",
"question_date": "2020-12-21T07:31:47.083Z",
"asker_name": "jbilas",
"reported": false,
"question_helpfullness": 8,
"answers": {}
},
// etc.
]
What I’ve tried in the pipeline:
Calling db.questions.aggregate([]) with the following stages.
Get all questions that have a product_id of 1 and are not reported:
Stage 1:
{
'$match': {
'product_id': 1,
'reported': 0
}
}
Stage 2:
Join each question document with its answers, in an array called "answers":
{
'$lookup': {
'from': 'ansPhotos',
'localField': 'question_id',
'foreignField': 'question_id',
'as': 'answers'
}
}
Sample output:
questions_answers> db.questions.aggregate([{$match:{product_id:1,reported:0}},{$lookup:{from:'ansPhotos',localField:'question_id',foreignField:'question_id',as:'answers'}}])
[
{
_id: ObjectId("61731a1cae4ca5aef1836b04"),
question_id: 4,
product_id: 1,
question_body: 'How long does it last?',
question_date: Long("1594341317010"),
asker_name: 'funnygirl',
asker_email: 'first.last#gmail.com',
reported: 0,
helpful: 6,
answers: [
{
_id: ObjectId("61731c9c39b2df95b4573b3c"),
id: 65,
question_id: 4,
body: 'It runs small',
date: Long("1605784307205"),
answerer_name: 'dschulman',
answerer_email: 'first.last#gmail.com',
reported: 0,
helpful: 1,
photos: [
{
_id: ObjectId("61731edbbac3ef59b2a59b04"),
id: 15,
answer_id: 65,
url: 'https://images.unsplash.com/photo-1536922645426-5d658ab49b81?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1650&q=80'
},
{
_id: ObjectId("61731edbbac3ef59b2a59b0a"),
id: 14,
answer_id: 65,
url: 'https://images.unsplash.com/photo-1470116892389-0de5d9770b2c?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1567&q=80'
}
]
},
{
_id: ObjectId("61731c9c39b2df95b4573b54"),
id: 89,
question_id: 4,
body: 'Showing no wear after a few months!',
date: Long("1599089609530"),
answerer_name: 'sillyguy',
answerer_email: 'first.last#gmail.com',
reported: 0,
helpful: 8,
photos: []
}
]
},
{
_id: ObjectId("61731a1cae4ca5aef1836b05"),
question_id: 3,
product_id: 1,
question_body: 'Does this product run big or small?',
question_date: Long("1608535907083"),
asker_name: 'jbilas',
asker_email: 'first.last#gmail.com',
reported: 0,
helpful: 8,
answers: []
},
//etc…
]
Stage 3:
Unwind each answers array, preserving null and empty arrays because I still need to return questions without answers:
{
'$unwind': {
'path': '$answers',
'preserveNullAndEmptyArrays': true
}
}
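To see what Stage 3 does to the question without answers, here is a minimal in-memory emulation in plain JavaScript of $unwind with preserveNullAndEmptyArrays: true for one top-level field. It is only an illustration. The helper name unwindPreserving and the trimmed sample documents are hypothetical, not part of any MongoDB driver. A non-empty array yields one document per element. An empty array yields the document with the field removed. A missing, null, or scalar value passes through unchanged.

```javascript
// Minimal in-memory emulation of
//   { $unwind: { path: "$<field>", preserveNullAndEmptyArrays: true } }
// for a single top-level field. Illustration only; not a MongoDB API.
function unwindPreserving(doc, field) {
  const value = doc[field];
  if (Array.isArray(value)) {
    if (value.length === 0) {
      // Empty array: the document is kept, but the field is dropped.
      const rest = { ...doc };
      delete rest[field];
      return [rest];
    }
    // Non-empty array: one output document per element.
    return value.map((element) => ({ ...doc, [field]: element }));
  }
  // Missing, null, or a non-array value: the document passes through unchanged.
  return [doc];
}

// Trimmed-down versions of two questions after the $lookup stage.
const questions = [
  { question_id: 4, answers: [{ id: 65 }, { id: 89 }] },
  { question_id: 3, answers: [] },
];

const unwound = questions.flatMap((q) => unwindPreserving(q, "answers"));
// unwound holds:
//   { question_id: 4, answers: { id: 65 } }
//   { question_id: 4, answers: { id: 89 } }
//   { question_id: 3 }            <- no answers field at all
```

The last document has no answers field. A later stage that writes to a dotted path such as "answers.photos" can therefore create an answers object with no id. That is one plausible route to a null k in $arrayToObject.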
I then have one document per answer and can manipulate the "answers.photos" array. Each answers field is now a single answer object.
Stage 4:
Things become muddy here.
For example, I've tried using $addFields, $set and $project to pull just the photos.url property of each answer into an array. I've had some success with this, but…
Stage 5:
Then I try to $group them back into arrays of objects, and have had some success with it. Note that the $ifNull is my attempt to give the next stage what it wants, but it is not working:
{
  $group: {
    _id: "$_id", // regroup the unwound documents by their parent question
    question_id: { $first: "$question_id" }, question_body: { $first: "$question_body" },
    question_date: { $first: "$question_date" }, asker_name: { $first: "$asker_name" },
    reported: { $first: "$reported" }, question_helpfullness: { $first: "$helpful" },
    answers: { $push: { $ifNull: ["$answers", { _id: "$_id", id: "noanswers", question_id: "$question_id" }] } }
  }
}
But I also need to do this at some point:
Stage 6 (or later):
{
'$addFields': {
'answers': {
'$arrayToObject': {
'$map': {
'input': '$answers',
'in': {
'k': {
'$toString': '$$this.id'
},
'v': '$$this'
}
}
}
}
}
}
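The Stage 6 expression can be mirrored in plain JavaScript to check the shape it produces. This sketch uses Object.fromEntries in place of $arrayToObject. It maps each photos array to its url values, the way an aggregation path such as "$$this.photos.url" resolves over an array of subdocuments. The helper name answersToObject is hypothetical, and the sample URLs are shortened from the ones above. An empty answers array becomes {}, which is what the question with question_id 3 needs.

```javascript
// Plain-JavaScript mirror of:
//   { $arrayToObject: { $map: { input: "$answers",
//       in: { k: { $toString: "$$this.id" }, v: "$$this" } } } }
// with photos reduced to their url strings. Illustration only.
function answersToObject(answers) {
  return Object.fromEntries(
    answers.map((answer) => [
      String(answer.id), // k must be a string, like $toString
      // Keep every answer field; missing photos are treated as empty here.
      { ...answer, photos: (answer.photos || []).map((photo) => photo.url) },
    ])
  );
}

// Sample answers for question_id 4 (URLs shortened).
const answersForQ4 = [
  {
    id: 65,
    question_id: 4,
    body: "It runs small",
    photos: [
      { id: 15, answer_id: 65, url: "https://images.unsplash.com/photo-1536922645426-5d658ab49b81" },
      { id: 14, answer_id: 65, url: "https://images.unsplash.com/photo-1470116892389-0de5d9770b2c" },
    ],
  },
  { id: 89, question_id: 4, body: "Showing no wear after a few months!", photos: [] },
];

const q4Answers = answersToObject(answersForQ4); // keys "65" and "89"
const q3Answers = answersToObject([]); // {} for a question with no answers
```

The mirror only goes wrong when an element without an id reaches it. In MongoDB, that element makes $toString yield null, which triggers the "Found type: null" error quoted below.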
This gives me the key-value pairs shown in the desired output.
This is where things get muddy. I have tried a TON of configurations over the last 5 days.
In most cases, if I manipulate the answers array directly after Stage 4, I get this error when I then try the $addFields stage:
PlanExecutor error during aggregation :: caused by :: $arrayToObject requires an object with keys 'k' and 'v', where the value of 'k' must be of type string. Found type: null
This happens because the question with question_id 3 has no answers, and with any of the methods from Stage 4 I've inadvertently given it an answers object that has no id.
As you can see, I've tried some $ifNull operations to give this question the key-value pairs the next stage expects, but they only work sometimes and usually have other odd side effects.
To summarize: is there a way to get only the url property out of the "answers.photos" array, handle the edge case of a question with no answers, and still structure the answers as the key-value pairs illustrated above?
Apologies if this is too long or difficult to read. If there's more formatting I can do to make it better, please let me know. Any help is very much appreciated.
Joe
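One possible approach skips $unwind and $group entirely. This is a sketch that has not been run against this data. $lookup already returns an empty array for a question with no answers, and $arrayToObject turns an empty array into an empty object. The photo URLs can be pulled out inside the same $map with the path "$$a.photos.url", and $mergeObjects overwrites the original photos array in each answer. The function name buildQuestionsPipeline is hypothetical. Field renames and date conversion are left out.

```javascript
// Sketch of a pipeline for db.questions.aggregate(...); untested assumption.
function buildQuestionsPipeline(productId) {
  return [
    // Questions for one product that are not reported.
    { $match: { product_id: productId, reported: 0 } },
    // Attach answers; a question with no answers gets answers: [].
    {
      $lookup: {
        from: "ansPhotos",
        localField: "question_id",
        foreignField: "question_id",
        as: "answers",
      },
    },
    // Reshape answers in place, with no $unwind/$group round trip.
    {
      $addFields: {
        answers: {
          $arrayToObject: {
            $map: {
              input: "$answers",
              as: "a",
              in: {
                k: { $toString: "$$a.id" },
                // Keep every answer field, but replace photos with its url values.
                v: { $mergeObjects: ["$$a", { photos: "$$a.photos.url" }] },
              },
            },
          },
        },
      },
    },
  ];
}

const pipeline = buildQuestionsPipeline(1);
// In mongosh: db.questions.aggregate(pipeline)
```

With this shape, question_id 3 should come back with answers: {} rather than triggering the null-k error, because no placeholder answer is ever created.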

Related

Apollo graphql returning entire subdocument collection with each document

I'm working on a MERN stack with ApolloServer/Client. I wrote a query that should return Building documents (98 documents) with many subfields including an array of their associated Room subdocuments (1-9 subdocuments each, 379 total).
The query works great except that each Building document is being returned with an array of all 379 Room subdocuments.
I think I've narrowed it down to an issue with ApolloServer. I've confirmed the data in my MongoDB is correct and that the Mongoose call in the resolver returns the correct data. When I run the query to the frontend or in Apollo playground, I get the extra documents.
Here is some paraphrased example code:
// MongoDB example
// buildings
{ "_id": 1, "address": "123 main st", "rooms": [{"_id": 12},{"_id": 15}]},
{ "_id": 2, "address": "125 main st", "rooms": [{"_id": 11}, {"_id": 16}, {"_id": 13}]},
{ "_id": 3, "address": "222 state st", "rooms": [{"_id": 14}]}
// rooms
{ "_id": 11, "number": "b7"},
{ "_id": 12, "number": "145"},
{ "_id": 13, "number": "12"},
// etc...
// Query
const resolvers = {
Query: {
Buildings: async () => await Buildings.find({}).populate('rooms') // this returns correctly
}
}
// Typedefs
type Building {
_id: ID!
address: String!
rooms: [Room]
}
type Room {
_id: ID!
number: String!
}
type Query {
Buildings: [Building]
}
/* Something about this is returning
{[
{_id: 1, address: whatever, rooms: [11, 12, 13, 14, 15, 16]},
{_id: 2, address: whatever, rooms: [11, 12, 13, 14, 15, 16]},
{_id: 3, address: whatever, rooms: [11, 12, 13, 14, 15, 16]}
]}
instead of
{[
{_id: 1, address: whatever, rooms: [ 12, 15, ]},
{_id: 2, address: whatever, rooms: [ 11, 13, 16 ]},
{_id: 3, address: whatever, rooms: [ 14 ]}
]}
*/
I'm working in a sizable code base. I've searched for any duplicated name functions in the typedefs like 'buildings' and 'rooms', but haven't found any. Suggestions?
Thanks!
You say that await Buildings.find({}).populate('rooms') returns the correct results.
I suspect there is a type resolver for Building.rooms that has an incorrect query tied to it. Your typeDefs are fine; the problem is going to be somewhere in your resolvers code.
Ideally, your Buildings query should just return [Building], and then the resolver for the rooms field under Building should work out which rooms belong to that particular building and return [Room].
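Here is a minimal sketch of that split. It assumes the Buildings query returns buildings whose rooms field holds room ids rather than populated documents. In-memory arrays stand in for the Mongoose models. Their names (buildingsData, roomsData) and the room numbers for ids 14 to 16 are made up. The point is that a field resolver receives the parent object as its first argument, so Building.rooms only returns that building's rooms.

```javascript
// Hypothetical in-memory stand-ins for the buildings and rooms collections.
const buildingsData = [
  { _id: 1, address: "123 main st", rooms: [12, 15] },
  { _id: 2, address: "125 main st", rooms: [11, 16, 13] },
  { _id: 3, address: "222 state st", rooms: [14] },
];
const roomsData = [
  { _id: 11, number: "b7" },
  { _id: 12, number: "145" },
  { _id: 13, number: "12" },
  { _id: 14, number: "201" }, // made-up room numbers from here on
  { _id: 15, number: "3a" },
  { _id: 16, number: "9" },
];

const resolvers = {
  Query: {
    // Return only the buildings; each keeps its own list of room ids.
    Buildings: () => buildingsData,
  },
  Building: {
    // Called once per building; `parent` is the building being resolved.
    rooms: (parent) => roomsData.filter((room) => parent.rooms.includes(room._id)),
  },
};
```

With Mongoose, the body of Building.rooms would typically become something like Room.find({ _id: { $in: parent.rooms } }), where Room is whatever your room model is called. A rooms resolver that ignores parent and returns every room would produce exactly the symptom described in the question.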

Get items of array by index in MongoDB

So I have a data structure in a Mongo collection (v. 4.0.18) that looks something like this…
{
"_id": ObjectId("242kl4j2lk23423"),
"name": "Doug",
"kids": [
{
"name": "Alice",
"age": 15,
},
{
"name": "James",
"age": 13,
},
{
"name": "Michael",
"age": 10,
},
{
"name": "Sharon",
"age": 8,
}
]
}
In Mongo, how would I get back a projection of this object with only the first two kids? I want the output to look like this:
{
"_id": ObjectId("242kl4j2lk23423"),
"name": "Doug",
"kids": [
{
"name": "Alice",
"age": 15,
},
{
"name": "James",
"age": 13,
}
]
}
It seems like I should easily be able to get them by index, but I'm not seeing anything in the docs about how to do that. The real-world problem I'm trying to solve has nothing to do with kids, and the array could be quite lengthy. I'm trying to break it up and process it in batches without having to load the whole thing into memory in my application.
EDIT (non-sequential indexes):
I noticed that since I asked about items 1 & 2, $slice would suffice… however, what if I wanted items 1 & 3? Is there a way I can specify specific array indexes to return?
Any ideas or pointers for how to accomplish that?
Thanks!
You are looking for the $slice projection operator if the elements you want are contiguous.
https://docs.mongodb.com/manual/reference/operator/projection/slice/
This returns the first two:
client.db.collection.find({"name":"Doug"}, { "kids": { "$slice": 2 } })
returns
{'_id': ObjectId('5f85f682a45e15af3a907f51'), 'name': 'Doug', 'kids': [{'name': 'Alice', 'age': 15}, {'name': 'James', 'age': 13}]}
This skips the first kid and returns the next two (second and third):
client.db.collection.find({"name":"Doug"}, { "kids": { "$slice": [1, 2] } })
returns
{'_id': ObjectId('5f85f682a45e15af3a907f51'), 'name': 'Doug', 'kids': [{'name': 'James', 'age': 13}, {'name': 'Michael', 'age': 10}]}
Edit:
Arbitrary selections such as items 1 and 3 probably need to go through an aggregation pipeline rather than a simple query. The performance shouldn't be much different, assuming you have an index on the $match field.
The steps of your pipeline should be fairly obvious, and you should be able to take it from here.
Hate to point to RTFM, but that's going to be super helpful here to at least be acquainted with the pipeline operations.
https://docs.mongodb.com/manual/reference/operator/aggregation/
Your pipeline should:
$match on your desired query
$addFields to set a new field, kid_selection, to element 1 (the second element) and element 3 (the fourth element), since counting starts at 0. ($set is an alias for $addFields, but it was only added in MongoDB 4.2, so it is not available on 4.0.18.) Notice the $ prefix on "kids" in the kid_selection expression: when referencing a field of the document you're working on, you need to prefix it with $.
$project the whole document, minus the original kids field we selected from.
client.db.collection.aggregate([
{"$match":{"name":"Doug"}},
{"$addFields": {"kid_selection": [
{ "$arrayElemAt": [ "$kids", 1 ] },
{ "$arrayElemAt": [ "$kids", 3 ] }
]}},
{ "$project": { "kids": 0 } }
])
returns
{
'_id': ObjectId('5f86038635649a988cdd2ade'),
'name': 'Doug',
'kid_selection': [
{'name': 'James', 'age': 13},
{'name': 'Sharon', 'age': 8}
]
}
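If the set of indexes changes from call to call, one option is to build the kid_selection expression from a list. That avoids writing one $arrayElemAt per index. Below is a sketch in JavaScript (mongosh or Node driver syntax); the same structure can be passed from PyMongo as dicts. The function name pickByIndexes is hypothetical. It uses $addFields rather than $set, and $map, $arrayElemAt and $addFields all predate 4.0, so it should also work on a 4.0.18 server.

```javascript
// Builds an aggregation pipeline that keeps only the elements of `arrayField`
// at the given 0-based positions, in the order listed, under `outputField`.
function pickByIndexes(matchFilter, arrayField, indexes, outputField) {
  return [
    { $match: matchFilter },
    {
      $addFields: {
        [outputField]: {
          $map: {
            input: indexes, // a literal array of integers, e.g. [1, 3]
            as: "i",
            in: { $arrayElemAt: [`$${arrayField}`, "$$i"] },
          },
        },
      },
    },
    // Drop the original array, as in the pipeline above.
    { $project: { [arrayField]: 0 } },
  ];
}

const pipeline = pickByIndexes({ name: "Doug" }, "kids", [1, 3], "kid_selection");
// In mongosh: db.collection.aggregate(pipeline)
```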

MongoDB aggregate query for values in an array

So I have data that looks like this:
{
_id: 1,
ranking: 5,
tags: ['Good service', 'Clean room']
}
Each of these documents represents a review. There can be multiple reviews with a ranking of 5. The tags field can contain up to 4 different tags.
The 4 tags are: 'Good service', 'Good food', 'Clean room', 'Need improvement'
I want a MongoDB aggregate query that says: for each ranking (1-5), give me the number of times each tag occurred.
So an example result might look like this, _id being the ranking:
[
{ _id: 5,
totalCount: 5,
tags: {
goodService: 1,
goodFood: 3,
cleanRoom: 1,
needImprovement: 0
}
},
{ _id: 4,
totalCount: 7,
tags: {
goodService: 0,
goodFood: 2,
cleanRoom: 3,
needImprovement: 0
}
},
...
]
I'm having trouble counting the occurrences of each tag. Any help would be appreciated.
You can try the aggregation below.
db.colname.aggregate([
{"$unwind":"$tags"},
{"$group":{
"_id":{
"ranking":"$ranking",
"tags":"$tags"
},
"count":{"$sum":1}
}},
{"$group":{
"_id":"$_id.ranking",
"totalCount":{"$sum":"$count"},
"tags":{"$push":{"tags":"$_id.tags","count":"$count"}}
}}
])
To get key-value pairs instead of an array, replace the $push accumulator with $mergeObjects (available from MongoDB 3.6):
"tags":{"$mergeObjects":{"$arrayToObject":[[["$_id.tags","$count"]]]}}

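Putting the pieces together, a sketch of the whole tag-counting pipeline with $mergeObjects (MongoDB 3.6+) might look like the following. The keys are the raw tag strings ("Good service" and so on) rather than the camelCase names in the question. The last stage goes beyond the answer above. It merges the counts over a document of zeros wrapped in $literal, so tags that never occur for a ranking still appear with 0. The list of four tags comes from the question. The names ALL_TAGS and zeroCounts are hypothetical.

```javascript
const ALL_TAGS = ["Good service", "Good food", "Clean room", "Need improvement"];

// Document of zeros, e.g. { "Good service": 0, ... }, used as the merge base.
const zeroCounts = Object.fromEntries(ALL_TAGS.map((tag) => [tag, 0]));

const pipeline = [
  // One document per (review, tag).
  { $unwind: "$tags" },
  // Count each tag per ranking.
  { $group: { _id: { ranking: "$ranking", tags: "$tags" }, count: { $sum: 1 } } },
  // Fold the per-tag counts into one document per ranking.
  {
    $group: {
      _id: "$_id.ranking",
      totalCount: { $sum: "$count" },
      tags: { $mergeObjects: { $arrayToObject: [[["$_id.tags", "$count"]]] } },
    },
  },
  // Later documents win in $mergeObjects, so real counts overwrite the zeros.
  { $addFields: { tags: { $mergeObjects: [{ $literal: zeroCounts }, "$tags"] } } },
];
// In mongosh: db.colname.aggregate(pipeline)
```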
Mongoid: search in an array inside an array of hashes

Say Object embeds_many searched_items
Here is the document:
{"_id": { "$oid" : "5320028b6d756e1981460000" },
"searched_items": [
{
"_id": { "$oid" : "5320028b6d756e1981470000" },
"hotel_id": 127,
"room_info": [
{
"price": 10,
"amenity_ids": [
1,
2
]
},
{
"price": 160,
"amenity_ids": null
}
]
},
{
"_id": { "$oid" : "5320028b6d756e1981480000" },
"hotel_id": 161,
"room_info": [
{
"price": 400,
"amenity_ids": [4,5]
}
]
}
]
}
I want to find the "searched_items" having room_info.amenity_ids IN [2,3].
I've tried
object.searched_items.where('room_info.amenity_ids' => [2, 3])
object.searched_items.where('room_info.amenity_ids' => {'$in' => [2,3]})
with no luck.
Mongoid provides an elem_match method for searching within arrays of objects,
for example:
class A
include Mongoid::Document
field :some_field, type: Array
end
A.create(some_field: [{id: 'a', name: 'b'}, {id: 'c', name: 'd'}])
A.elem_match(some_field: { :id.in => ["a", "c"] }) # => returns the document
Let me know if you have any other questions.
Update:
class SearchedHotel
include Mongoid::Document
field :hotel_id, type: String
field :room_info, type: Array
end
SearchedHotel.create(hotel_id: "1", room_info: [{id: 1, amenity_ids: [1,2], price: 600},{id: 2, amenity_ids: [1,2,3], price: 1000}])
SearchedHotel.create(hotel_id: "2", room_info: [{id: 3, amenity_ids: [1,2], price: 600}])
SearchedHotel.elem_match(room_info: {:amenity_ids.in => [1,2]})
Mongoid::Criteria
selector: {"room_info"=>{"$elemMatch"=>{"amenity_ids"=>{"$in"=>[1, 2]}}}}
options: {}
class: SearchedHotel
embedded: false
And it returns both records. Am I missing something in your question or requirements? If so, please let me know.
It's important to distinguish between top-level queries sent to the MongoDB server and
client-side operations on embedded-documents that are implemented by Mongoid.
This is the underlying confusion between the original question and the answer from @sandeep-kumar and the associated comments.
The original question is all about the where clause on embedded documents after the query result has already been fetched.
The answer from @sandeep-kumar and the comments are all about top-level queries.
The following test covers both, showing how answers from #sandeep-kumar do work on the examples in your comments,
and also what does and does not work on your original question.
To summarize, Sandeep's answers do work for top-level queries.
Please review your code; if problems remain, please post the exact Ruby code that reproduces the problem.
For your original question, please note that "object" has already been fetched from MongoDB,
and that you can verify this by looking at the log/test.log file.
The subsequent "where" operations are all client-side execution by Mongoid.
Simple "where" clauses do work at the embedded document level.
Complex "where" clauses involving nested array values don't seem to work -
I didn't really expect Mongoid to reimplement '$in' on the client-side.
Knowing that the "object" already has the query result,
and that the association "searched_items" gives you convenient access to the embedded documents,
you can write Ruby code to select what you want as in the following test.
Hope that this helps.
test/unit/my_object_test.rb
require 'test_helper'
require 'pp'
class MyObjectTest < ActiveSupport::TestCase
def setup
MyObject.delete_all
A.delete_all
SearchedHotel.delete_all
end
test "original question with client-side where operation on embedded documents" do
doc = {"_id"=>{"$oid"=>"5320028b6d756e1981460000"}, "searched_items"=>[{"_id"=>{"$oid"=>"5320028b6d756e1981470000"}, "hotel_id"=>127, "room_info"=>[{"price"=>10, "amenity_ids"=>[1, 2]}, {"price"=>160, "amenity_ids"=>nil}]}, {"_id"=>{"$oid"=>"5320028b6d756e1981480000"}, "hotel_id"=>161, "room_info"=>[{"price"=>400, "amenity_ids"=>[4, 5]}]}]}
MyObject.create(doc)
puts
object = MyObject.first
<<-EOT.split("\n").each{|line| puts "#{line}:"; eval "pp #{line}"}
object.searched_items.where('hotel_id' => 127).to_a
object.searched_items.where(:hotel_id.in => [127,128]).to_a
object.searched_items.where('room_info.amenity_ids' => {'$in' => [2,3]}).to_a
object.searched_items.where('room_info.amenity_ids'.to_sym.in => [2,3]).to_a
object.searched_items.select{|searched_item| searched_item.room_info.any?{|room_info| room_info['amenity_ids'] && !(room_info['amenity_ids'] & [2,3]).empty?}}.to_a
EOT
end
test "A comment - top-level queries" do
A.create(some_field: [{id: 'a', name: 'b', tag_ids: [6,7,8]}, {id: 'c', name: 'd'}, tag_ids: [5,6,7]])
A.create(some_field: [{id: 'a', name: 'b', tag_ids: [1,2,3]}, {id: 'c', name: 'd'}, tag_ids: [2,3,4]])
puts
pp A.where('some_field.tag_ids'.to_sym.in => [2,3]).to_a
pp A.elem_match(some_field: { :tag_ids.in => [2,3,4] }).to_a
end
test "SearchedHotel comment - top-level query" do
s = <<-EOT
[#<SearchedHotel _id: 53253c246d756e49a7030000, hotel_id: \"1\", room_info: [{\"id\"=>1, \"amenity_ids\"=>[1, 2], \"price\"=>600}, {\"id\"=>2, \"amenity_ids\"=>[1, 2, 3], \"price\"=>1000}]>, #<SearchedHotel _id: 53253c246d756e49a7040000, hotel_id: \"2\", room_info: [{\"id\"=>3, \"amenity_ids\"=>[1, 2], \"price\"=>600}]>]
EOT
a = eval(s.gsub('#<SearchedHotel ', '{').gsub(/>,/, '},').gsub(/>\]/, '}]').gsub(/_id: \h+, /, ''))
SearchedHotel.create(a)
puts
<<-EOT.split("\n").each{|line| puts "#{line}:"; eval "pp #{line}"}
SearchedHotel.elem_match(room_info: {:amenity_ids.in => [1,2]}).to_a
EOT
end
end
$ ruby -Ilib -Itest test/unit/my_object_test.rb
Run options:
# Running tests:
[1/3] MyObjectTest#test_A_comment_-_top-level_queries
[#<A _id: 5359329d7f11ba034b000002, some_field: [{"id"=>"a", "name"=>"b", "tag_ids"=>[1, 2, 3]}, {"id"=>"c", "name"=>"d"}, {"tag_ids"=>[2, 3, 4]}]>]
[#<A _id: 5359329d7f11ba034b000002, some_field: [{"id"=>"a", "name"=>"b", "tag_ids"=>[1, 2, 3]}, {"id"=>"c", "name"=>"d"}, {"tag_ids"=>[2, 3, 4]}]>]
[2/3] MyObjectTest#test_SearchedHotel_comment_-_top-level_query
SearchedHotel.elem_match(room_info: {:amenity_ids.in => [1,2]}).to_a:
[#<SearchedHotel _id: 5359329d7f11ba034b000003, hotel_id: "1", room_info: [{"id"=>1, "amenity_ids"=>[1, 2], "price"=>600}, {"id"=>2, "amenity_ids"=>[1, 2, 3], "price"=>1000}]>,
#<SearchedHotel _id: 5359329d7f11ba034b000004, hotel_id: "2", room_info: [{"id"=>3, "amenity_ids"=>[1, 2], "price"=>600}]>]
[3/3] MyObjectTest#test_original_question_with_client-side_where_operation_on_embedded_documents
object.searched_items.where('hotel_id' => 127).to_a:
[#<SearchedItem _id: 5359329d7f11ba034b000006, hotel_id: 127, room_info: [{"price"=>10, "amenity_ids"=>[1, 2]}, {"price"=>160, "amenity_ids"=>nil}]>]
object.searched_items.where(:hotel_id.in => [127,128]).to_a:
[#<SearchedItem _id: 5359329d7f11ba034b000006, hotel_id: 127, room_info: [{"price"=>10, "amenity_ids"=>[1, 2]}, {"price"=>160, "amenity_ids"=>nil}]>]
object.searched_items.where('room_info.amenity_ids' => {'$in' => [2,3]}).to_a:
[]
object.searched_items.where('room_info.amenity_ids'.to_sym.in => [2,3]).to_a:
[]
object.searched_items.select{|searched_item| searched_item.room_info.any?{|room_info| room_info['amenity_ids'] && !(room_info['amenity_ids'] & [2,3]).empty?}}.to_a:
[#<SearchedItem _id: 5359329d7f11ba034b000006, hotel_id: 127, room_info: [{"price"=>10, "amenity_ids"=>[1, 2]}, {"price"=>160, "amenity_ids"=>nil}]>]
Finished tests in 0.089544s, 33.5031 tests/s, 0.0000 assertions/s.
3 tests, 0 assertions, 0 failures, 0 errors, 0 skips

Is there a way to upsert a list with a single query?

I know this question has been asked before, but that's a different scenario.
I'd like to have a collection like this:
{
"_id" : ObjectId("4c28f62cbf8544c60506f11d"),
"pk": 1,
"forums": [{
"pk": 1,
"thread_count": 10,
"post_count": 20,
}, {
"pk": 2,
"thread_count": 5,
"post_count": 24,
}]
}
What I want to do is to upsert a "forum" item, incrementing counters or adding an item if it does not exist.
For example to do something like this (I hope it makes sense):
db.mycollection.update({
"pk": 3,
"forums.pk": 2
}, {
"$inc": {"forums.$.thread_count": 1, "forums.$.post_count": 1}
}, true)
and have:
{
"_id" : ObjectId("4c28f62cbf8544c60506f11d"),
"pk": 1,
"forums": [{
"pk": 1,
"thread_count": 10,
"post_count": 20,
}, {
"pk": 2,
"thread_count": 5,
"post_count": 24,
}]
},
{
"_id" : ObjectId("4c28f62cbf8544c60506f11e"),
"pk": 3,
"forums": [{
"pk": 2,
"thread_count": 1,
"post_count": 1,
}]
}
I can certainly do it in three steps:
Upsert a new document for pk 3
$addToSet the forum item to the list
Increment the forum item's counters with the positional operator
That's to say:
db.mycollection.update({pk:3}, {pk:3}, true)
db.mycollection.update({pk:3}, {$addToSet: {forums: {pk:2}}})
db.mycollection.update({pk:3, 'forums.pk': 2}, {$inc: {'forums.$.thread_count': 1, 'forums.$.post_count': 1}})
Are you aware of a more efficient way to do it?
TIA, Germano
As you may have discovered, the positional operator cannot be used in upserts:
The positional operator cannot be combined with an upsert since it requires a matching array element. If your update results in an insert then the "$" will literally be used as the field name.
So you won't be able to achieve the desired result in a single query.
You have to separate the creation of the document from the counter update. Your own solution is on the right track. It can be condensed into the following two queries:
// optionally create the document, including the array
db.mycollection.update({pk:3}, {$addToSet: {forums: {pk:2}}}, true)
// update the counters in the array item
db.mycollection.update({pk:3, 'forums.pk': 2}, {$inc: {'forums.$.thread_count': 1, 'forums.$.post_count': 1}})
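One caveat applies to the first of these two queries. $addToSet skips a value only when it is exactly equal to an existing element. For embedded documents, that means the same fields with the same values in the same order. After the counters have been incremented, the stored element is { pk: 2, thread_count: 1, post_count: 1 }. That no longer equals { pk: 2 }, so running the pair again would append a second { pk: 2 } element. The plain-JavaScript emulation below shows the effect. The helper addToSet is hypothetical, and JSON.stringify is only a rough, field-order-sensitive stand-in for BSON document equality.

```javascript
// Rough emulation of $addToSet for embedded documents: a value is added only
// if no existing element is identical (same fields, same values, same order).
function addToSet(array, value) {
  const key = JSON.stringify(value);
  return array.some((element) => JSON.stringify(element) === key) ? array : [...array, value];
}

// First run: the upsert creates forums: [{ pk: 2 }], then $inc adds both counters
// ($inc on a missing field sets it to the increment).
let forums = addToSet([], { pk: 2 });
forums = forums.map((f) => (f.pk === 2 ? { ...f, thread_count: 1, post_count: 1 } : f));

// Second run: { pk: 2 } no longer equals the stored element, so it is appended.
forums = addToSet(forums, { pk: 2 });
// forums is now [{ pk: 2, thread_count: 1, post_count: 1 }, { pk: 2 }]
```

The positional $inc still updates the first matching element. The array just accumulates extra { pk: 2 } entries on every call.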