How to filter on more than one field of MongoDB embedded documents - mongodb

This is my model:
order:{
_id: 88565,
activity:
[
{_id: 57235, content: "foo"},
{_id: 57236, content: "bar"}
]
}
This is my query:
db.order.find({
"$and": [
{
"activity.content "bar"
},
{
"activity._id": 57235
}
]
});
This query selects the order with _id 88565 even though the two conditions are satisfied only by two different embedded activities.
I would expect this query to return nothing.
I know that I can use $elemMatch to filter embedded documents with more precision, but this behaviour seems very confusing.
Is there a way to obtain a proper filtering where an AND clause has a single embedded document scope?
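For comparison, a minimal sketch of the $elemMatch form, which scopes both conditions to a single embedded document and therefore returns nothing for the document above:
db.order.find({
  activity: {
    $elemMatch: {
      content: "bar",
      _id: 57235
    }
  }
});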

Related

How to build a MongoDB query that combines two fields temporarily?

I have a schema with one field named ownerId and an array field named participantsIds. In the frontend, users can select participants. I'm using these ids to filter documents by querying participantsIds with the $all operator and the list of participant ids from the frontend. This works except that participantsIds in the document doesn't include the ownerId. I thought about using aggregate to add a new field consisting of a list like [participantsIds, ownerId], querying against this new field with $all, and afterwards deleting the field again since it isn't needed in the frontend.
What would such a query look like, or is there a better way to achieve this behavior? I'm really lost right now since I've been trying to implement this with mongo_dart for the last 3 hours.
This is what the schema looks like:
{
_id: ObjectId(),
title: 'Title of the Event',
startDate: '2020-09-09T00:00:00.000',
endDate: '2020-09-09T00:00:00.000',
startHour: 1,
durationHours: 1,
ownerId: '5f57ff55202b0e00065fbd10',
participantsIds: ['5f57ff55202b0e00065fbd14', '5f57ff55202b0e00065fbd15', '5f57ff55202b0e00065fbd13'],
classesIds: [],
categoriesIds: [],
roomsIds: [],
creationTime: '2020-09-10T16:42:14.966',
description: 'Some Desc'
}
Tl;dr I want to query documents with the $all operator on the participantsIds field but the ownerId should be included in this query.
What I want is instead of querying against:
participantsIds: ['5f57ff55202b0e00065fbd14', '5f57ff55202b0e00065fbd15', '5f57ff55202b0e00065fbd13']
I want to query against:
participantsIds: ['5f57ff55202b0e00065fbd14', '5f57ff55202b0e00065fbd15', '5f57ff55202b0e00065fbd13', '5f57ff55202b0e00065fbd10']
Having fun here, by the way. It's better to use Joe's answer if you are running the query frequently, or even better to maintain an "all" field on insertion.
Additional note: use projection at the start/end to keep only what you need (see the sketch after the pipeline below).
https://mongoplayground.net/p/UP_-IUGenGp
db.collection.aggregate([
{
"$addFields": {
"all": {
$setUnion: [
"$participantsIds",
[
"$ownerId"
]
]
}
}
},
{
$match: {
all: {
$all: [
"5f57ff55202b0e00065fbd14",
"5f57ff55202b0e00065fbd15",
"5f57ff55202b0e00065fbd13",
"5f57ff55202b0e00065fbd10"
]
}
}
}
])
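Following the note on projection above, a minimal sketch of the same pipeline with a trailing $project stage that drops the temporary field from the output:
db.collection.aggregate([
  { "$addFields": { "all": { $setUnion: [ "$participantsIds", [ "$ownerId" ] ] } } },
  { $match: { all: { $all: [ "5f57ff55202b0e00065fbd14", "5f57ff55202b0e00065fbd15", "5f57ff55202b0e00065fbd13", "5f57ff55202b0e00065fbd10" ] } } },
  { $project: { all: 0 } } // remove the computed field before returning documents
])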
Didn't fully understand what you want to do but maybe this helps:
db.collection.find({
ownerId: "5f57ff55202b0e00065fbd10",
participantsIds: {
$all: ['5f57ff55202b0e00065fbd14',
'5f57ff55202b0e00065fbd15',
'5f57ff55202b0e00065fbd13']
}
})
You could use the pipeline form of update to either add the owner to the participant list or add a new consolidated field:
db.collection.update({},[{$set:{
allParticipantsIds: {$setUnion: [
"$participantsIds",
["$ownerId"]
]}
}}])
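After that one-time update, the query becomes a plain $all against the consolidated field (assuming the allParticipantsIds name from the update above):
db.collection.find({
  allParticipantsIds: {
    $all: ["5f57ff55202b0e00065fbd14",
           "5f57ff55202b0e00065fbd15",
           "5f57ff55202b0e00065fbd13",
           "5f57ff55202b0e00065fbd10"]
  }
})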

MongoDB: update nested value in a collection based on existing field value

I want to update nested _ids over an entire collection if they are of type string.
If I have objects that look like this...
user : {
_id: ObjectId('234wer234wer234wer'),
occupation: 'Reader',
books_read: [
{
title: "Best book ever",
_id: "123qwe234wer345ert456rty"
},
{
title: "Worst book ever",
_id: "223qwe234wer345ert456rty"
},
{
title: "A Tail of Two Cities",
_id: ObjectId("323qwe234wer345ert456rty")
}
]
}
and I want to change the type of the _ids from string to ObjectId.
How would I do that?
I have done "this" in the past... but that works on a non-nested item - I need to change a nested value:
db.getCollection('users')
.find({
$or: [
{occupation:{$exists:false}},
{occupation:{$eq:null}}
]
})
.forEach(function (record) {
record.occupation = 'Reader';
db.users.save(record);
});
Any help? I am trying to avoid writing a series of loops on the app server to make db calls, so I am hoping for something directly in 'mongo'.
There isn't a way of doing (non-$rename) update operations on a document while referencing existing fields -- see MongoDB: Updating documents using data from the same document.
So you'll need to write a script (similar to the one you posted with find & forEach) to recreate those documents with the correct _id type. To find the subdocuments to update you can use the $type operator. A query like db.coll.find({'nestedField._id': {$type: 'string'}}) should find all the full documents that have bad subdocuments, or you can do an aggregation query with $match & $unwind to get only the subdocuments:
db.coll.aggregate([
{ $match: {'nestedField._id': {$type: 'string' }}}, // limiting to documents that have any bad subdocuments
{ $unwind: '$nestedField'}, // creating a separate document in the pipeline for each entry in the array
{ $match: {'nestedField._id': {$type: 'string' }}}, // limiting to only the subdocuments that have bad fields
{ $project: { nestedId: '$nestedField._id' }} // output will be: {_id: documentId, nestedId}
])
I am trying to avoid writing a series of loops on the app server to make db calls, so I am hoping for something directly in 'mongo'.
You can run js code directly in the mongo shell to avoid making API calls, but I don't think there's any way to avoid looping over the documents.
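For illustration, a minimal sketch of such a script (assuming the books_read array from the question, and that the stored strings are valid 24-character hex ObjectId strings):
db.users.find({ 'books_read._id': { $type: 'string' } }).forEach(function (doc) {
  // convert each string _id in the array to an ObjectId
  doc.books_read = doc.books_read.map(function (book) {
    if (typeof book._id === 'string') {
      book._id = ObjectId(book._id);
    }
    return book;
  });
  db.users.updateOne({ _id: doc._id }, { $set: { books_read: doc.books_read } });
});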

Can I use populate before aggregate in mongoose?

I have two models, one is user
userSchema = new Schema({
userID: String,
age: Number
});
and the other is the score, recorded several times every day for all users:
ScoreSchema = new Schema({
userID: {type: String, ref: 'User'},
score: Number,
created_date: Date,
....
})
I would like to do some query/calculation on the scores for users meeting a specific requirement; say, I would like to calculate the day-by-day average score for all users older than 20.
My thought was to first call populate on Scores to fill in the users' ages, and then run the aggregate on that.
Something like
Score.
populate('userID','age').
aggregate([
{$match: {'userID.age': {$gt: 20}}},
{$group: ...},
{$group: ...}
], function(err, data){});
Is it OK to use populate before aggregate? Or should I first find all the userIDs meeting the requirement, save them in an array, and then use $in to match the score documents?
No you cannot call .populate() before .aggregate(), and there is a very good reason why you cannot. But there are different approaches you can take.
The .populate() method works "client side" where the underlying code actually performs additional queries ( or more accurately an $in query ) to "lookup" the specified element(s) from the referenced collection.
In contrast .aggregate() is a "server side" operation, so you basically cannot manipulate content "client side", and then have that data available to the aggregation pipeline stages later. It all needs to be present in the collection you are operating on.
A better approach is available with MongoDB 3.2 and later, via the $lookup aggregation pipeline stage. It is also probably best to work from the User collection in this case, in order to narrow down the selection:
User.aggregate(
[
// Filter first
{ "$match": {
"age": { "$gt": 20 }
}},
// Then join
{ "$lookup": {
"from": "scores",
"localField": "userID",
"foriegnField": "userID",
"as": "score"
}},
// More stages
],
function(err,results) {
}
)
This is basically going to include a new field "score" within the User object as an "array" of items that matched on "lookup" to the other collection:
{
"userID": "abc",
"age": 21,
"score": [{
"userID": "abc",
"score": 42,
// other fields
}]
}
The result is always an array, as the general expected usage is a "left join" of a possible "one to many" relationship. If no result is matched then it is just an empty array.
To use the content, just work with the array in any way. For instance, you can use the $arrayElemAt operator to get just the first element of the array in any later stage, and then use the content like any normal embedded field:
{ "$project": {
"userID": 1,
"age": 1,
"score": { "$arrayElemAt": [ "$score", 0 ] }
}}
If you don't have MongoDB 3.2 available, then your other option to process a query limited by the relations of another collection is to first get the results from that collection and then use $in to filter on the second:
// Match the user collection
User.find({ "age": { "$gt": 20 } },function(err,users) {
// Get id list
var userList = users.map(function(user) {
return user.userID;
});
Score.aggregate(
[
// use the id list to select items
{ "$match": {
"userId": { "$in": userList }
}},
// more stages
],
function(err,results) {
}
);
});
So getting the list of valid users from the other collection to the client, and then feeding that list to a query on the second collection, is the only way to make this happen in earlier releases.

MongoDB query to find property of first element of array

I have the following data in MongoDB (simplified for what is necessary to my question).
{
_id: 0,
actions: [
{
type: "insert",
data: "abc, quite possibly very very large"
}
]
}
{
_id: 1,
actions: [
{
type: "update",
data: "def"
},{
type: "delete",
data: "ghi"
}
]
}
What I would like is to find the first action type for each document, e.g.
{_id:0, first_action_type:"insert"}
{_id:1, first_action_type:"update"}
(It's fine if the data structured differently, but I need those values present, somehow.)
EDIT: I've tried db.collection.find({}, {'actions.type': 1}), but obviously that returns all elements of the actions array.
NoSQL is quite new to me. Before, I would have stored all this in two tables in a relational database and done something like SELECT id, (SELECT type FROM action WHERE document_id = d.id ORDER BY seq LIMIT 1) action_type FROM document d.
You can use the $slice operator in projection. (But for what you're doing, I am not sure the order of the array remains the same when you update it - just keep that in mind.)
db.collection.find({},{'actions':{$slice:1},'actions.type':1})
You can also use the Aggregation Pipeline introduced in version 2.2:
db.collection.aggregate([
{ $unwind: '$actions' },
{ $group: { _id: "$_id", first_action_type: { $first: "$actions.type" } } }
])
Using the $arrayElemAt operator is actually the most elegant way, although the syntax may be unintuitive:
db.collection.aggregate([
{ $project: { first_action_type: { $arrayElemAt: [ "$actions.type", 0 ] } } }
])

Facet search using MongoDB

I am contemplating to use MongoDB for my next project. One of the core requirements for this application is to provide facet search. Has anyone tried using MongoDB to achieve a facet search?
I have a product model with various attributes like size, color, brand, etc. When searching for a product, this Rails application should show facet filters in the sidebar. The facet filters will look something like this:
Size:
XXS (34)
XS (22)
S (23)
M (37)
L (19)
XL (29)
Color:
Black (32)
Blue (87)
Green (14)
Red (21)
White (43)
Brand:
Brand 1 (43)
Brand 2 (27)
I think you get more flexibility and performance using Apache Solr or ElasticSearch, but facet search is supported using the Aggregation Framework.
The main problem with using MongoDB is that you have to query it N times: first to get the matching results, and then once per group; with a full-text search engine you get it all in one query.
Example
//'tags' filter simulates the search
//this query gets the products
db.products.find({tags: {$all: ["tag1", "tag2"]}})
//this query gets the size facet
db.products.aggregate(
{$match: {tags: {$all: ["tag1", "tag2"]}}},
{$group: {_id: "$size"}, count: {$sum:1}},
{$sort: {count:-1}}
)
//this query gets the color facet
db.products.aggregate(
{$match: {tags: {$all: ["tag1", "tag2"]}}},
{$group: {_id: "$color"}, count: {$sum:1}},
{$sort: {count:-1}}
)
//this query gets the brand facet
db.products.aggregate(
{$match: {tags: {$all: ["tag1", "tag2"]}}},
{$group: {_id: "$brand"}, count: {$sum:1}},
{$sort: {count:-1}}
)
Once the user filters the search using facets, you have to add this filter to the query predicate and to the $match predicate, as follows:
//user clicks on "Brand 1" facet
db.products.find({tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"})
db.products.aggregate(
{$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
{$group: {_id: "$size", count: {$sum: 1}}},
{$sort: {count:-1}}
)
db.products.aggregate(
{$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
{$group: {_id: "$color", count: {$sum: 1}}},
{$sort: {count:-1}}
)
db.products.aggregate(
{$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
{$group: {_id: "$brand", count: {$sum: 1}}},
{$sort: {count:-1}}
)
MongoDB 3.4 introduces faceted search:
The $facet stage allows you to create multi-faceted aggregations which
characterize data across multiple dimensions, or facets, within a
single aggregation stage. Multi-faceted aggregations provide multiple
filters and categorizations to guide data browsing and analysis.
Input documents are passed to the $facet stage only once.
Now you don't need to query N times to retrieve aggregations on N groups.
$facet enables various aggregations on the same set of input documents,
without needing to retrieve the input documents multiple times.
A sample query for the OP use-case would be something like
db.products.aggregate([
  {
    $facet: {
      "categorizedByColor": [
        { $match: { color: { $exists: 1 } } },
        { $sortByCount: "$color" }
      ],
      "categorizedBySize": [
        { $match: { size: { $exists: 1 } } },
        { $sortByCount: "$size" }
      ],
      "categorizedByBrand": [
        { $match: { brand: { $exists: 1 } } },
        { $sortByCount: "$brand" }
      ]
    }
  }
])
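The result is a single document with one array per facet; with the $sortByCount stages above, the output looks roughly like this (illustrative values):
{
  "categorizedByColor": [ { "_id": "Blue", "count": 87 }, { "_id": "White", "count": 43 }, ... ],
  "categorizedBySize": [ { "_id": "M", "count": 37 }, ... ],
  "categorizedByBrand": [ { "_id": "Brand 1", "count": 43 }, ... ]
}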
A popular option for more advanced search with MongoDB is to use ElasticSearch in conjunction with the community-supported MongoDB River Plugin. The MongoDB River plugin feeds a stream of documents from MongoDB into ElasticSearch for indexing.
ElasticSearch is a distributed search engine based on Apache Lucene, and features a RESTful JSON interface over HTTP. There is a Facet Search API and a number of other advanced features such as Percolate and "More like this".
You can do the query; the question is whether it is fast or not, i.e. something like:
find( { size:'S', color:'Blue', Brand:{$in:[...]} } )
The question is then how it performs. There isn't any special facility for faceted search in the product yet. Down the road there might be some set-intersection-like query plans that are good, but that is TBD/future.
If your properties are a predefined set and you know what they are, you could create an index on each of them. Only one of the indexes will be used in the current implementation, so this will help but only get you so far: if the data set is medium-plus in size it might be fine.
You could use compound indexes which compound two or more of the properties. If you have a small number of properties this might work pretty well. The index need not cover all the variables queried on, but in the query above a compound index on any two of the three is likely to perform better than an index on a single item; a sketch of both options follows.
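For illustration, a quick sketch of those two indexing options (field names taken from the question; the products collection name is an assumption):
db.products.createIndex({ size: 1 })   // one single-field index per property
db.products.createIndex({ color: 1 })
db.products.createIndex({ brand: 1 })
// or a compound index over two of the properties
db.products.createIndex({ size: 1, color: 1 })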
If you don't have too many SKUs, brute force would work; e.g. if you have 1MM SKUs, a table scan in RAM might be fast enough. In this case I would make a collection with just the facet values, make it as small as possible, and keep the full SKU docs in a separate collection, e.g.:
facets_collection:
{sz:1,brand:123,clr:'b',_id:}
...
If the number of facet dimensions isn't too high, you could instead make a highly compound index of the facet dimensions, and you would get the equivalent of the above without the extra work.
If you create quite a few indexes, it is probably best not to create so many that they no longer fit in RAM.
Given that the query runs and this is a performance question, one might start with just Mongo, and if it isn't fast enough, bolt on Solr.
The faceted solution (count-based) depends on your application design.
db.product.insert(
{
tags :[ 'color:green','size:M']
}
)
However, if you are able to feed data in the above format, where facets and their values are joined together to form a consistent tag, then you can use the query below:
db.product.aggregate(
[
{ $unwind : "$tags" },
{
$group : {
_id : '$tags',
count: { $sum: 1 }
}
}
]
)
See the result output below
{
"_id" : "color:green",
"count" : NumberInt(1)
}
{
"_id" : "color:red",
"count" : NumberInt(1)
}
{
"_id" : "size:M",
"count" : NumberInt(3)
}
{
"_id" : "color:yellow",
"count" : NumberInt(1)
}
{
"_id" : "height:5",
"count" : NumberInt(1)
}
Beyond this step, your application server can do a color/size grouping before sending the response back to the client (a sketch follows the note below).
Note - the approach of combining a facet and its values into a single tag gives you all facet values aggregated in one query, so you avoid the problem described in Garcia's answer: "The main problem with using MongoDB is that you have to query it N times: first to get the matching results, and then once per group; with a full-text search engine you get it all in one query."
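For completeness, a minimal sketch of that server-side regrouping in plain JavaScript (the groupFacets helper is hypothetical, assuming the {_id: "facet:value", count} result shape above):
// results, e.g.: [{_id: "color:green", count: 1}, {_id: "size:M", count: 3}]
function groupFacets(results) {
  var facets = {};
  results.forEach(function (r) {
    var parts = r._id.split(':');          // "color:green" -> ["color", "green"]
    var facet = parts[0];
    var value = parts.slice(1).join(':');  // tolerate values containing ':'
    facets[facet] = facets[facet] || {};
    facets[facet][value] = r.count;
  });
  return facets;  // {color: {green: 1}, size: {M: 3}}
}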