How to do left outer join and count in AWS DocumentDB? - aws-documentdb

I am trying to get the tenant count for each owner.
owner_collection:
[
{_id: ObjectId("123dddfffaaa7744"), owner_name:"Sam"},
{_id: ObjectId("243dddfffaaa7755"), owner_name:"Ray"}
]
tenant_collection:
[
{_id: ObjectId("2223dddfffaaa12233"), tenant_name:"tet", owner_id: ObjectId("123dddfffaaa7744")},
{_id: ObjectId("3343dddfffaaa12234"), tenant_name:"sothy", owner_id: ObjectId("123dddfffaaa7744")},
{_id: ObjectId("6583dddfffaaa1876"), tenant_name:"gill", owner_id: ObjectId("243dddfffaaa7755")},
{_id: ObjectId("2223dddfffaaa12233"), tenant_name:"tony", owner_id: ObjectId("123dddfffaaa7744")}
]
In MongoDB the query looks like this:
db.getCollection("owner_collection").aggregate([
{
$lookup:{from:"tenant_collection",localField:"_id",foreignField:"owner_id", as:"tenantCount"}
},{
$addFields:{
tenantCount:{$size:"$tenantCount"}
}
}
]);
But when I execute this in AWS DocumentDB (MongoDB-compatible), it throws the error
invalid $lookup namespace
Is there any way to achieve this in DocumentDB?

The error suggests that the collection you are joining from (tenant_collection) doesn't exist; that is the standard error message DocumentDB returns in that case. This differs from MongoDB, which doesn't complain when joining with a non-existent collection and simply returns the results from the left-hand-side collection.
To double-check this, create an empty collection in DocumentDB and run the $lookup again: db.createCollection("tenant_collection")
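For reference, the result the $lookup plus $size pipeline is expected to produce once tenant_collection exists can be sketched in plain JavaScript. This is only an in-memory emulation of left-outer-join semantics, not DocumentDB code. The ids are shortened to plain strings, and the third owner (with no tenants) is an assumption added to show the zero-count case.

```javascript
// Hypothetical in-memory copies of the two collections (string ids for brevity).
const owners = [
  { _id: "o1", owner_name: "Sam" },
  { _id: "o2", owner_name: "Ray" },
  { _id: "o3", owner_name: "Kim" }, // assumed extra owner without tenants
];
const tenants = [
  { _id: "t1", tenant_name: "tet", owner_id: "o1" },
  { _id: "t2", tenant_name: "sothy", owner_id: "o1" },
  { _id: "t3", tenant_name: "gill", owner_id: "o2" },
  { _id: "t4", tenant_name: "tony", owner_id: "o1" },
];

// Emulates $lookup (left outer join) followed by $addFields with $size:
// every owner is kept, and owners without matches get a count of 0.
function countTenantsPerOwner(ownerDocs, tenantDocs) {
  const counts = new Map();
  for (const t of tenantDocs) {
    counts.set(t.owner_id, (counts.get(t.owner_id) || 0) + 1);
  }
  return ownerDocs.map((o) => ({ ...o, tenantCount: counts.get(o._id) || 0 }));
}

const ownerCounts = countTenantsPerOwner(owners, tenants);
console.log(ownerCounts);
```

Owners with no matching tenants still appear in the output, which is what makes the join "left outer".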

Related

MongoDB: update nested value in a collection based on existing field value

I want to update nested _ids over an entire collection IF they are of a type string.
If I have object that look like this...
user : {
_id: ObjectId('234wer234wer234wer'),
occupation: 'Reader',
books_read: [
{
title: "Best book ever",
_id: "123qwe234wer345ert456rty"
},
{
title: "Worst book ever",
_id: "223qwe234wer345ert456rty"
},
{
title: "A Tail of Two Cities",
_id: ObjectId("323qwe234wer345ert456rty")
}
]
}
and I want to change the type of the nested _ids from string to ObjectId,
how would I do that?
I have done "this" in the past, but it works on a non-nested field; I need to change a nested value:
db.getCollection('users')
.find({
$or: [
{occupation:{$exists:false}},
{occupation:{$eq:null}}
]
})
.forEach(function (record) {
record.occupation = 'Reader';
db.users.save(record);
});
Any help - I am trying to avoid writing a series of loop on the app server to make db calls - so I am hoping for something directly in 'mongo'
There isn't a way of doing update operations (other than $rename) on a document while referencing existing fields; see MongoDB: Updating documents using data from the same document
So you'll need to write a script (similar to the one you posted, with find and forEach) to recreate those subdocuments with the correct _id type. To find the subdocuments to update you can use the $type operator. A query like db.coll.find({'nestedField._id': {$type: 'string'}}) should find all the full documents that have bad subdocuments, or you could do an aggregation query with $match and $unwind to get only the subdocuments:
db.coll.aggregate([
{ $match: {'nestedField._id': {$type: 'string' }}}, // limiting to documents that have any bad subdocuments
{ $unwind: '$nestedField'}, // creating a separate document in the pipeline for each entry in the array
{ $match: {'nestedField._id': {$type: 'string' }}}, // limiting to only the subdocuments that have bad fields
{ $project: { nestedId: '$nestedField._id' }} // output will be: {_id: documentId, nestedId}
])
I am trying to avoid writing a series of loop on the app server to make db calls - so I am hoping for something directly in 'mongo'
You can run JS code directly in the mongo shell to avoid making API calls, but I don't think there's any way to avoid looping over the documents.
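A minimal sketch of the per-document conversion step such a script would need. The helper name convertNestedIds and the stand-in converter are assumptions for illustration. The converter is passed in so the sketch runs outside the shell; in the mongo shell one would pass a function that calls ObjectId, which only accepts valid 24-character hex strings (the sample ids in the question are placeholders and would not qualify).

```javascript
// Returns a copy of the array in which every string _id has been converted.
// Entries whose _id is not a string are returned untouched.
function convertNestedIds(books, toObjectId) {
  return books.map((book) =>
    typeof book._id === "string" ? { ...book, _id: toObjectId(book._id) } : book
  );
}

// Stand-in converter for demonstration only (not a real ObjectId).
const fakeObjectId = (s) => ({ $oid: s });

const sampleUser = {
  occupation: "Reader",
  books_read: [
    { title: "Best book ever", _id: "123qwe234wer345ert456rty" },
    { title: "Already converted", _id: { $oid: "323qwe234wer345ert456rty" } },
  ],
};

const fixedBooks = convertNestedIds(sampleUser.books_read, fakeObjectId);
console.log(JSON.stringify(fixedBooks));

// In the mongo shell the loop could look roughly like this (untested sketch):
// db.users.find({ "books_read._id": { $type: "string" } }).forEach(function (doc) {
//   db.users.updateOne(
//     { _id: doc._id },
//     { $set: { books_read: convertNestedIds(doc.books_read, (s) => ObjectId(s)) } }
//   );
// });
```

Replacing the whole books_read array with $set keeps the update to one write per document.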

How to delete duplicates using MongoDB Aggregations in MongoDB Compass Community

I somehow created duplicates of every single entry in my database. Currently, there are 176039 documents and counting, half are duplicates. Each document is structured like so
_id : 5b41d9ccf10fcf0014fe8917
originName : "Hartsfield Jackson Atlanta International Airport"
destinationName : "Antigua"
totalDuration : 337
Inside the MongoDB Compass Community App for Mac under the Aggregations tab, I was able to find duplicates using this pipeline
[
{$group: {
_id: {originName: "$originName", destinationName: "$destinationName"},
count: {$sum: 1}}},
{$match: {count: {"$gt": 1}}}
]
I'm not sure how to move forward and delete the duplicates at this point. I'm assuming it has something to do with $out.
Edit: Something I didn't notice until now is that the values for totalDuration on each double are actually different.
Add
{$project:{_id:0, "originName":"$_id.originName", "destinationName":"$_id.destinationName"}},
{ $out: "collectionname" }
This will replace the documents in the target collection with the documents produced by the aggregation pipeline. If you need totalDuration in the collection, add that field in both the $group and $project stages before running the pipeline.
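An alternative that keeps the original _id and totalDuration values is to collect the _ids of the surplus copies and delete only those. The sketch below shows that selection logic in plain JavaScript on hypothetical sample documents. It keeps the first document seen for each origin/destination pair, which is an assumption, since the duplicates differ in totalDuration and another rule may suit better.

```javascript
// Returns the _ids of every document after the first one seen for each
// (originName, destinationName) pair.
function duplicateIds(docs) {
  const seen = new Set();
  const toDelete = [];
  for (const doc of docs) {
    const key = JSON.stringify([doc.originName, doc.destinationName]);
    if (seen.has(key)) {
      toDelete.push(doc._id);
    } else {
      seen.add(key);
    }
  }
  return toDelete;
}

// Hypothetical sample data with shortened ids.
const flights = [
  { _id: "a1", originName: "Atlanta", destinationName: "Antigua", totalDuration: 337 },
  { _id: "a2", originName: "Atlanta", destinationName: "Antigua", totalDuration: 340 },
  { _id: "b1", originName: "Atlanta", destinationName: "Aruba", totalDuration: 410 },
];

const idsToDelete = duplicateIds(flights);
console.log(idsToDelete);
```

In the shell, the resulting array could then be passed to deleteMany with an {_id: {$in: ...}} filter on the affected collection.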

Refine/Restructure data from Mongodb query

I'm using Node.js with the MongoDB native driver 2.0+.
The following query fetches one client document containing arrays of embedded staff and services.
db.collection('clients').findOne({_id: sessId}, {"services._id": 1, "staff": {$elemMatch: {_id: reqId}}}, callback)
It returns a result like this:
{
_id: "5422c33675d96d581e09e4ca",
staff:[
{
name: "Anders",
_id: "5458d0aa69d6f72418969428"
// More fields not relevant to the question...
}
],
services: [
{
_id: "54578da02b1c54e40fc3d7c6"
},
{
_id: "54578da42b1c54e40fc3d7c7"
},
{
_id: "54578da92b1c54e40fc3d7c9"
}
]
}
Note that each embedded object in services actually contains several fields, but _id is the only field returned by means of the projection of the query.
From this returned data I start by plucking all the ids from services and saving them in an array that is later used for validation. This is by no means a difficult operation, but I'm curious: is there an easy way to do some kind of aggregation instead of find, to get an array of already-plucked ObjectIds directly from the DB? Something like this:
{
_id: "5422c33675d96d581e09e4ca",
staff:[
{
name: "Anders",
_id: "5458d0aa69d6f72418969428"
// More fields not relevant to the question...
}
],
services: [
"54578da02b1c54e40fc3d7c6",
"54578da42b1c54e40fc3d7c7",
"54578da92b1c54e40fc3d7c9"
]
}
One way of doing it is as follows.
First, $unwind the document on the staff field in order to select the intended staff member. This step is required because the $elemMatch operator is unavailable in the aggregation framework.
There is an open ticket here: Jira
Once the document with the correct staff member is selected, $unwind on services.
Then $group, using $push to collect all the services _id values into an array.
This is followed by a $project stage to show the intended fields.
db.clients.aggregate([
{$match:{"_id":sessId}},
{$unwind:"$staff"},
{$match:{"staff._id":reqId}},
{$unwind:"$services"},
{$group:{"_id":"$_id","services_id":{$push:"$services._id"},"staff":{$first:"$staff"}}},
{$project:{"services_id":1,"staff":1}}
])
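To make the output shape of the pipeline above concrete, here is a plain JavaScript emulation on a sample document. It is a sketch only: the function name is hypothetical and the ids are kept as strings. The ids end up under services_id rather than services, and staff becomes a single object instead of an array.

```javascript
// Emulates: $match -> $unwind staff -> $match staff._id -> $unwind services
//           -> $group ($push services._id, $first staff) -> $project
function pluckServices(client, reqId) {
  const staff = client.staff.find((s) => s._id === reqId);
  // $unwind drops documents with no matching staff or an empty services array.
  if (!staff || client.services.length === 0) return null;
  return {
    _id: client._id,
    services_id: client.services.map((s) => s._id),
    staff,
  };
}

const sampleClient = {
  _id: "5422c33675d96d581e09e4ca",
  staff: [{ name: "Anders", _id: "5458d0aa69d6f72418969428" }],
  services: [
    { _id: "54578da02b1c54e40fc3d7c6" },
    { _id: "54578da42b1c54e40fc3d7c7" },
    { _id: "54578da92b1c54e40fc3d7c9" },
  ],
};

const plucked = pluckServices(sampleClient, "5458d0aa69d6f72418969428");
console.log(JSON.stringify(plucked, null, 2));
```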

How to filter on more than one record mongodb embedded documents

This is my model:
order:{
_id: 88565,
activity:
[
{_id: 57235, content: "foo"},
{_id: 57236, content: "bar"}
]
}
This is my query:
db.order.find({
"$and": [
{
"activity.content": "bar"
},
{
"activity._id": 57235
}
]
});
This query will select the order with id 88565 even if the conditions are satisfied together by 2 different embedded activities.
I would expect that this query returned nothing.
I know that I can use elemMatch to filter embedded documents with more precision but this behaviour seems very confusing.
Is there a way to obtain a proper filtering where an AND clause has a single embedded document scope?
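The difference between the two behaviours can be sketched in plain JavaScript. The two predicate functions below are illustrative stand-ins, not MongoDB APIs: the first mirrors dot-notation matching, where each condition may be satisfied by a different array element, and the second mirrors $elemMatch, where a single element must satisfy all conditions.

```javascript
const order = {
  _id: 88565,
  activity: [
    { _id: 57235, content: "foo" },
    { _id: 57236, content: "bar" },
  ],
};

// Dot-notation semantics: conditions are checked independently over the array.
function matchesAcrossElements(doc) {
  return (
    doc.activity.some((a) => a.content === "bar") &&
    doc.activity.some((a) => a._id === 57235)
  );
}

// $elemMatch semantics: one and the same element must satisfy every condition.
function matchesSingleElement(doc) {
  return doc.activity.some((a) => a.content === "bar" && a._id === 57235);
}

console.log(matchesAcrossElements(order)); // true
console.log(matchesSingleElement(order)); // false

// The corresponding query with single-element scope would be:
// db.order.find({ activity: { $elemMatch: { content: "bar", _id: 57235 } } })
```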

Facet search using MongoDB

I am contemplating using MongoDB for my next project. One of the core requirements for this application is to provide facet search. Has anyone tried using MongoDB to achieve a facet search?
I have a product model with various attributes like size, color, brand etc. On searching a product, this Rails application should show facet filters on sidebar. Facet filters will look something like this:
Size:
XXS (34)
XS (22)
S (23)
M (37)
L (19)
XL (29)
Color:
Black (32)
Blue (87)
Green (14)
Red (21)
White (43)
Brand:
Brand 1 (43)
Brand 2 (27)
I think you get more flexibility and performance with Apache Solr or ElasticSearch, but this is also supported in MongoDB using the Aggregation Framework.
The main problem with MongoDB is that you have to query it N times: first to get the matching results and then once per facet group, whereas with a full-text search engine you get it all in one query.
Example
// the 'tags' filter simulates the search
// this query gets the products
db.products.find({tags: {$all: ["tag1", "tag2"]}})
// this query gets the size facet
db.products.aggregate([
{$match: {tags: {$all: ["tag1", "tag2"]}}},
{$group: {_id: "$size", count: {$sum: 1}}},
{$sort: {count: -1}}
])
// this query gets the color facet
db.products.aggregate([
{$match: {tags: {$all: ["tag1", "tag2"]}}},
{$group: {_id: "$color", count: {$sum: 1}}},
{$sort: {count: -1}}
])
// this query gets the brand facet
db.products.aggregate([
{$match: {tags: {$all: ["tag1", "tag2"]}}},
{$group: {_id: "$brand", count: {$sum: 1}}},
{$sort: {count: -1}}
])
Once the user filters the search using facets, you have to add this filter to both the find predicate and the $match predicate, as follows.
// user clicks on the "Brand 1" facet
db.products.find({tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"})
db.products.aggregate([
{$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
{$group: {_id: "$size", count: {$sum: 1}}},
{$sort: {count: -1}}
])
db.products.aggregate([
{$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
{$group: {_id: "$color", count: {$sum: 1}}},
{$sort: {count: -1}}
])
db.products.aggregate([
{$match: {tags: {$all: ["tag1", "tag2"]}, brand: "Brand 1"}},
{$group: {_id: "$brand", count: {$sum: 1}}},
{$sort: {count: -1}}
])
MongoDB 3.4 introduces faceted search
The $facet stage allows you to create multi-faceted aggregations which
characterize data across multiple dimensions, or facets, within a
single aggregation stage. Multi-faceted aggregations provide multiple
filters and categorizations to guide data browsing and analysis.
Input documents are passed to the $facet stage only once.
Now you don't need to query N times to retrieve aggregations on N groups.
$facet enables various aggregations on the same set of input documents,
without needing to retrieve the input documents multiple times.
A sample query for the OP's use case would be something like the following. Because color, size and brand are categorical fields, each sub-pipeline counts documents per distinct value with $sortByCount; $bucket is meant for ranges and requires explicit boundaries.
db.products.aggregate([
{
$facet: {
"categorizedByColor": [
{ $match: { color: { $exists: true } } },
{ $sortByCount: "$color" }
],
"categorizedBySize": [
{ $match: { size: { $exists: true } } },
{ $sortByCount: "$size" }
],
"categorizedByBrand": [
{ $match: { brand: { $exists: true } } },
{ $sortByCount: "$brand" }
]
}
}
])
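A $facet stage returns a single document with one array per facet, each entry carrying the facet value in _id and its count. As a rough sketch, the application could turn that into the sidebar lines from the question like this. The toSidebar helper and the sample result document are assumptions for illustration, with the counts borrowed from the question.

```javascript
// Hypothetical result document, shaped like the output of the $facet stage.
const facetResult = {
  categorizedByColor: [
    { _id: "Blue", count: 87 },
    { _id: "White", count: 43 },
  ],
  categorizedBySize: [{ _id: "M", count: 37 }],
  categorizedByBrand: [{ _id: "Brand 1", count: 43 }],
};

// Converts { categorizedByX: [{_id, count}, ...] } into { X: ["value (count)", ...] }.
function toSidebar(facets) {
  const sidebar = {};
  for (const [name, buckets] of Object.entries(facets)) {
    const label = name.replace(/^categorizedBy/, "");
    sidebar[label] = buckets.map((b) => `${b._id} (${b.count})`);
  }
  return sidebar;
}

const sidebar = toSidebar(facetResult);
console.log(sidebar);
```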
A popular option for more advanced search with MongoDB is to use ElasticSearch in conjunction with the community supported MongoDB River Plugin. The MongoDB River plugin feeds a stream of documents from MongoDB into ElasticSearch for indexing.
ElasticSearch is a distributed search engine based on Apache Lucene, and features a RESTful JSON interface over HTTP. There is a Facet Search API and a number of other advanced features such as Percolate and "More like this".
You can do the query; the question is whether it is fast or not, i.e. something like:
find( { size:'S', color:'Blue', Brand:{$in:[...]} } )
The question is then how it performs. There isn't any special facility for faceted search in the product yet. Down the road there might be some set-intersection-like query plans that help, but that is TBD.
If your properties are a predefined set and you know what they are, you could create an index on each of them. Only one of the indexes will be used in the current implementation, so this will help but only get you so far: if the data set is medium-plus in size it might be fine.
You could use compound indexes that combine two or more of the properties. If you have a small number of properties this might work pretty well. The index need not cover all the fields queried on, but for the query above a compound index on any two of the three is likely to perform better than an index on a single field.
If you don't have too many SKUs, brute force would work; e.g. if you have 1MM SKUs, a table scan in RAM might be fast enough. In this case I would make a collection with just the facet values, make it as small as possible, and keep the full SKU docs in a separate collection, e.g.:
facets_collection:
{sz:1,brand:123,clr:'b',_id:}
...
If the number of facet dimensions isn't too high, you could instead make a highly compound index of the facet dimensions, and you would get the equivalent of the above without the extra work.
If you create quite a few indexes, it is probably best not to create so many that they no longer fit in RAM.
Given that the query runs and this is a performance question, one might just start with mongo and, if it isn't fast enough, then bolt on Solr.
The faceted solution (count based) depends on your application design.
db.product.insert(
{
tags :[ 'color:green','size:M']
}
)
However, if you are able to feed data in the format above, where each facet and its value are joined to form a consistent tag, then you can use the query below:
db.product.aggregate(
[
{ $unwind : "$tags" },
{
$group : {
_id : '$tags',
count: { $sum: 1 }
}
}
]
)
See the result output below
{
"_id" : "color:green",
"count" : NumberInt(1)
}
{
"_id" : "color:red",
"count" : NumberInt(1)
}
{
"_id" : "size:M",
"count" : NumberInt(3)
}
{
"_id" : "color:yellow",
"count" : NumberInt(1)
}
{
"_id" : "height:5",
"count" : NumberInt(1)
}
Beyond this step, your application server can do a color/size grouping before sending back to the client.
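The application-side grouping could look roughly like the sketch below. It assumes every tag has the form facet:value, and the helper name groupTagCounts is hypothetical. The input rows are the aggregation output shown above.

```javascript
// Groups rows like { _id: "color:green", count: 1 } into
// { color: { green: 1 }, ... }. Tags without a colon are skipped.
function groupTagCounts(rows) {
  const facets = {};
  for (const { _id, count } of rows) {
    const sep = _id.indexOf(":");
    if (sep < 0) continue;
    const facet = _id.slice(0, sep);
    const value = _id.slice(sep + 1);
    if (!facets[facet]) facets[facet] = {};
    facets[facet][value] = count;
  }
  return facets;
}

const tagRows = [
  { _id: "color:green", count: 1 },
  { _id: "color:red", count: 1 },
  { _id: "size:M", count: 3 },
  { _id: "color:yellow", count: 1 },
  { _id: "height:5", count: 1 },
];

const groupedFacets = groupTagCounts(tagRows);
console.log(JSON.stringify(groupedFacets, null, 2));
```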
Note: combining each facet with its value gives you all facet values aggregated in a single query, so you can avoid the problem described in Garcia's answer: "The main problem using MongoDB is you have to query it N Times: First for get matching results and then once per group; while using a full text search engine you get it all in one query."