I'm trying to grasp MongoDB concepts by translating some of our SQL queries into the MongoDB aggregation framework.
I have this SQL query:
select dbo.VisitNo(u.id) as visitNo , o.id, o.PatientId, u.VisitDate
from dbo.Observation o
join dbo.ProspectiveFollowUp u on u.rootid = o.Id
order by o.PatientId
The dbo.VisitNo is implemented as:
CREATE FUNCTION dbo.VisitNo(@Id int)
RETURNS INT
AS
BEGIN
DECLARE @VisitDate date, @RootId int
SELECT @VisitDate=VisitDate, @RootId=RootId FROM dbo.ProspectiveFollowUp WHERE Id=@Id
RETURN (SELECT COUNT(1) FROM dbo.ProspectiveFollowUp WHERE RootId = @RootId AND VisitDate <= @VisitDate)
END
My document in MongoDB has the following structure:
{
"_id",
"values":[
{
"Id",
"PatientId",
"ProspectiveFollowUp":[
"Id",
"RootId",
"VisitDate"
]
}
]
}
The values array always has exactly one element; that is just how the data was imported. ProspectiveFollowUp has at least one record.
Creating a query to retrieve the data was rather easy:
db.dbo_ObservationJSON.aggregate([
{ $unwind: '$values' },
{
$project: {
_id: 0,
Id: '$values.Id',
PatientId: '$values.PatientId',
VisitDate: '$values.ProspectiveFollowUp.VisitDate'
}
},
{ $unwind: '$VisitDate' },
{ $sort: { PatientId: 1 } }
])
The harder part is the custom function itself. I can't think outside of the T-SQL world yet, so I'm having a hard time getting this to work. I have translated the function into MongoDB the following way:
var id = 4
var result = db.dbo_ObservationJSON.aggregate([
{ $unwind: '$values' },
{ $unwind: '$values.ProspectiveFollowUp' },
{ $project: { Id: '$values.ProspectiveFollowUp.Id', RootId: '$values.ProspectiveFollowUp.RootId', VisitDate: '$values.ProspectiveFollowUp.VisitDate', _id:0 }},
{ $match: { Id: id }}
]).toArray()[0]
var totalResult = db.dbo_ObservationJSON.aggregate([{
$unwind: {
path: '$values'
}
}, {
$unwind: {
path: '$values.ProspectiveFollowUp'
}
}, {
$project: {
Id: '$values.ProspectiveFollowUp.Id',
RootId: '$values.ProspectiveFollowUp.RootId',
VisitDate: '$values.ProspectiveFollowUp.VisitDate'
}
}, {
$match: {
RootId: result.RootId,
VisitDate: {
$lte: result.VisitDate
}
}
},{$count: 'total'}]).toArray()[0]
But I don't know how to integrate it into the aggregation above.
Can I write the equivalent of the entire SQL query as one MongoDB aggregate expression?
I finally got it to work.
db.dbo_ObservationJSON.aggregate([
{ $unwind: '$values' },
{ $unwind: { path: '$values.ProspectiveFollowUp', "includeArrayIndex": "index" } },
{
$project: {
_id: 0,
VisitNo: { $add: ['$index', 1] },
RootId: '$values.ProspectiveFollowUp.RootId',
PatientId: '$values.PatientId',
VisitDate: '$values.ProspectiveFollowUp.VisitDate'
}
},
{
$sort: {
PatientId: 1
}
}
]);
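Note that includeArrayIndex numbers the visits by their position in the array, so it matches dbo.VisitNo only if ProspectiveFollowUp is already ordered by VisitDate. As a rough illustration, here is the counting rule of the SQL function in plain JavaScript, run on a hypothetical in-memory document. The sample values and the helper name visitNumbers are invented for this sketch, and it assumes VisitDate is stored as an ISO yyyy-mm-dd string:

```javascript
// Hypothetical sample shaped like one element of `values`; the data is invented.
const observation = {
  Id: 1,
  PatientId: 10,
  ProspectiveFollowUp: [
    { Id: 5, RootId: 1, VisitDate: '2020-03-01' },
    { Id: 4, RootId: 1, VisitDate: '2020-01-15' },
    { Id: 6, RootId: 1, VisitDate: '2020-06-30' }
  ]
};

// For each follow-up, count the follow-ups with the same RootId whose
// VisitDate is on or before its own, which is what dbo.VisitNo does.
// Assumption: ISO date strings, so plain string comparison is chronological.
function visitNumbers(obs) {
  const visits = obs.ProspectiveFollowUp;
  return visits.map(function (u) {
    const visitNo = visits.filter(function (v) {
      return v.RootId === u.RootId && v.VisitDate <= u.VisitDate;
    }).length;
    return { VisitNo: visitNo, Id: obs.Id, PatientId: obs.PatientId, VisitDate: u.VisitDate };
  });
}

const rows = visitNumbers(observation);
console.log(rows);
```

In this sample the array is deliberately out of date order, so the array index plus one would not give the same numbers as the count.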
I'm trying to query specific fields in my document and sort them by one of the fields, however, the engine seems to completely ignore the sort.
I use the query:
db.symbols.find({_id:'AAPL'}, {'income_statement.annual.totalRevenue':1,'income_statement.annual.fiscalDateEnding':1}).sort({'income_statement.annual.totalRevenue': 1})
This is the output:
[
{
_id: 'AAPL',
income_statement: {
annual: [
{
fiscalDateEnding: '2021-09-30',
totalRevenue: '363172000000'
},
{
fiscalDateEnding: '2020-09-30',
totalRevenue: '271642000000'
},
{
fiscalDateEnding: '2019-09-30',
totalRevenue: '256598000000'
},
{
fiscalDateEnding: '2018-09-30',
totalRevenue: '265595000000'
},
{
fiscalDateEnding: '2017-09-30',
totalRevenue: '229234000000'
}
]
}
}
]
I would expect to have the entries sorted by fiscalDateEnding, starting with 2017-09-30 ascending.
However, the order is fixed, even if I use -1 for sorting.
Any ideas?
The sort you are using controls the order of documents in the result set. It does not reorder the array elements inside a document.
If you are on MongoDB 5.2 or newer, you can use $sortArray:
db.symbols.aggregate([
{
$project: {
_id: 1,
annual: {
$sortArray: {
input: "$income_statement.annual",
sortBy: {
fiscalDateEnding: 1
}
}
}
}
}
])
If you are using an older version of MongoDB, you can sort the array like this:
db.collection.aggregate([
{
"$unwind": "$income_statement.annual"
},
{
$sort: {
"income_statement.annual.fiscalDateEnding": 1
}
},
{
$group: {
_id: "$_id",
annual: {
$push: "$income_statement.annual"
}
}
},
{
"$project": {
_id: 1,
income_statement: {
annual: "$annual"
}
}
}
])
Here is the Mongo Playground for your reference.
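If the array is small, sorting it in application code after fetching the document is another option. The following is only a sketch on a hypothetical in-memory copy of the document (three entries taken from the output in the question). It also shows that totalRevenue is stored as a string, so it has to be converted before a numeric comparison:

```javascript
// Hypothetical document shaped like the output in the question (shortened).
const doc = {
  _id: 'AAPL',
  income_statement: {
    annual: [
      { fiscalDateEnding: '2021-09-30', totalRevenue: '363172000000' },
      { fiscalDateEnding: '2019-09-30', totalRevenue: '256598000000' },
      { fiscalDateEnding: '2018-09-30', totalRevenue: '265595000000' }
    ]
  }
};

// Copy the array before sorting so the original document is left untouched.
const byDate = doc.income_statement.annual.slice().sort(function (a, b) {
  // ISO yyyy-mm-dd strings sort chronologically when compared as text.
  if (a.fiscalDateEnding < b.fiscalDateEnding) return -1;
  if (a.fiscalDateEnding > b.fiscalDateEnding) return 1;
  return 0;
});

// totalRevenue is a string, so convert it to a number before comparing.
const byRevenue = doc.income_statement.annual.slice().sort(function (a, b) {
  return Number(a.totalRevenue) - Number(b.totalRevenue);
});

console.log(byDate.map(function (e) { return e.fiscalDateEnding; }));
console.log(byRevenue.map(function (e) { return e.totalRevenue; }));
```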
I'm trying to analyse some data, and I thought my queries would ultimately be faster if I stored a relationship between my collections instead. So I wrote the following script to normalise the data:
var count = 0;
db.Interest.find({'PersonID':{$exists: false}, 'Data.DateOfBirth': {$ne: null}})
.toArray()
.forEach(function (x) {
if (null != x.Data.DateOfBirth) {
var peep = { 'Name': x.Data.Name, 'BirthMonth' :x.Data.DateOfBirth.Month, 'BirthYear' :x.Data.DateOfBirth.Year};
var person = db.People.findOne(peep);
if (null == person) {
peep._id = db.People.insertOne(peep).insertedId;
//print(peep._id);
}
db.Interest.updateOne({ '_id': x._id }, {$set: { 'PersonID':peep._id }})
++count;
if ((count % 1000) == 0) {
print(count + ' updated');
}
}
})
This script is just passed to mongo.exe.
Basically, I attempt to find an existing person, if they don't exist create them. In either case, link the originating record with the individual person.
However this is very slow! There's about 10 million documents and at the current rate it will take about 5 days to complete.
Can I speed this up simply? I know I can multithread it to cut it down, but have I missed something?
To insert new persons into the People collection, use this pipeline:
db.Interest.aggregate([
{
$project: {
Name: "$Data.Name",
BirthMonth: "$Data.DateOfBirth.Month",
BirthYear: "$Data.DateOfBirth.Year",
_id: 0
}
},
{
$merge: {
into: "People",
// requires a unique index on {Name: 1, BirthMonth: 1, BirthYear: 1}
on: ["Name", "BirthMonth", "BirthYear"]
}
}
])
To update PersonID in the Interest collection, use this pipeline:
db.Interest.aggregate([
{
$lookup: {
from: "People",
let: {
name: "$Data.Name",
month: "$Data.DateOfBirth.Month",
year: "$Data.DateOfBirth.Year"
},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ["$Name", "$$name"] },
{ $eq: ["$BirthMonth", "$$month"] },
{ $eq: ["$BirthYear", "$$year"] }
]
}
}
},
{ $project: { _id: 1 } }
],
as: "interests"
}
},
{
$set: {
PersonID: { $first: "$interests._id" },
interests: "$$REMOVE"
}
},
{ $merge: { into: "Interest" } }
])
Mongo Playground
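Both pipelines replace one findOne per document with a set-based operation keyed on (Name, BirthMonth, BirthYear). The idea can be sketched in plain JavaScript on a hypothetical in-memory array. The sample data, the nextId counter and the links array are made up for illustration and are not part of the original script:

```javascript
// Hypothetical Interest documents; field names follow the question, values are invented.
const interests = [
  { _id: 1, Data: { Name: 'Ann', DateOfBirth: { Month: 5, Year: 1980 } } },
  { _id: 2, Data: { Name: 'Bob', DateOfBirth: { Month: 7, Year: 1975 } } },
  { _id: 3, Data: { Name: 'Ann', DateOfBirth: { Month: 5, Year: 1980 } } }
];

// One People entry per distinct (Name, BirthMonth, BirthYear) key, kept in a Map
// so that each interest is resolved without a separate lookup query.
const peopleByKey = new Map();
const links = [];
let nextId = 1; // stand-in for the _id that MongoDB would generate

for (const x of interests) {
  const dob = x.Data.DateOfBirth;
  const key = JSON.stringify([x.Data.Name, dob.Month, dob.Year]);
  if (!peopleByKey.has(key)) {
    peopleByKey.set(key, {
      _id: nextId++,
      Name: x.Data.Name,
      BirthMonth: dob.Month,
      BirthYear: dob.Year
    });
  }
  // Record which person each interest document should point to.
  links.push({ interestId: x._id, PersonID: peopleByKey.get(key)._id });
}

console.log(Array.from(peopleByKey.values()));
console.log(links);
```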
const sellerSchema = Schema(
{
name: String,
url: String
}
);
const productSchema = Schema(
{
title: String,
sellerUrl: String
}
);
The query below returns the unique sellerUrl values from all products:
context.Product.aggregate([
{
$group: {
_id: "$sellerUrl",
}
}
]);
But I also want to exclude sellers that I have already saved from the aggregation. So if url == sellerUrl, the aggregation must exclude that seller.
Please help me
You can try the query below:
db.product.aggregate([
{
$group: {
_id: "", /** group on no condition & push all unique `sellerUrl` to sellerUrls array */
sellerUrls: { $addToSet: "$sellerUrl" }
}
},
{
$lookup: {
from: "seller",
let: { sellerUrls: "$sellerUrls" }, // creating local variable
pipeline: [
{ $group: { _id: "", urls: { $addToSet: "$url" } } }, /** group on no condition & push all unique `url` to urls array */
{ $project: { _id: 0, uniqueAndNotInSellerColl: { $setDifference: [ "$$sellerUrls", "$urls" ] } } } // get difference between two arrays
],
as: "data" // As we're grouping will always be one doc/element in an array
}
},
/** Create a new root doc from getting first element(though it will have only one) from `data` array */
{
$replaceRoot: { newRoot: { $arrayElemAt: [ "$data", 0 ] } }
}
])
Test : mongoplayground
Update:
Since you need a few other fields from the product collection and not just the sellerUrl field, try the query below:
db.product.aggregate([
{
$group: {
_id: "$sellerUrl",
docs: { $push: { title: "$title" } } // We're only retrieving `title` field from `product` docs, if every field is needed use `$$ROOT`
}
},
/** This uses a basic `$lookup` stage, which is fine if only a few `seller` docs match.
* If many `seller` docs match each `_id` (sellerUrl),
* then instead of fetching entire `seller` docs (which are not needed), use `$lookup` with an aggregation pipeline
* and fetch only the `_id`s of the seller docs for better performance; see the previous query.
*/
{
$lookup: {
from: "seller",
localField: "_id",
foreignField: "url",
as: "sellerDocs"
}
},
/** $match retains only the docs that have no matching doc in the seller collection */
{
$match: { sellerDocs: [] }
},
{
$project: { sellerDocs: 0 }
}
])
Test : mongoplayground
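To make the set-difference idea concrete, here is a small plain-JavaScript sketch of what the queries compute: the distinct sellerUrl values from products, minus the url values already present in sellers. It runs on hypothetical in-memory arrays; the URLs and names are invented for illustration:

```javascript
// Hypothetical data; field names follow the schemas in the question.
const products = [
  { title: 'Lamp', sellerUrl: 'a.example' },
  { title: 'Desk', sellerUrl: 'b.example' },
  { title: 'Chair', sellerUrl: 'b.example' },
  { title: 'Shelf', sellerUrl: 'c.example' }
];
const sellers = [
  { name: 'Seller A', url: 'a.example' }
];

// URLs of sellers that are already saved.
const knownUrls = new Set(sellers.map(function (s) { return s.url; }));

// Distinct seller URLs across all products (a Set keeps insertion order).
const distinctSellerUrls = Array.from(new Set(products.map(function (p) { return p.sellerUrl; })));

// Keep only the URLs that do not belong to a saved seller.
const unsavedSellerUrls = distinctSellerUrls.filter(function (url) {
  return !knownUrls.has(url);
});

console.log(unsavedSellerUrls);
```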
Documents in our collection have a structure like this:
Object: //below is object metadata from mongo
_id
created_at
lang
source
object: //this is real object data from our db
id
created_at
object_class
I ran the query below on this collection:
db.getCollection('foo').aggregate(
[
{
$match: {
lang: 'bar',
pushed_at:{
$gte: new ISODate("2015-11-09T00:00:00.000Z"),
$lt: new ISODate("2015-11-10T00:00:00.000Z")
}
}
},
{
$group: {
_id: "$object.id",
occurences: {$sum: 1}
}
},
{
$match: {
occurences: {$gt: 1}
}
}
])
Which returned:
It appears that we have duplicate entries in our collection. By duplicate I mean objects with the same Object.object.id.
I'd like to remove the redundant occurrences using the results of the aggregate query above. Note that I don't want to delete everything, just the redundant copies, so that the aggregate above would return occurences: 1 for each id.
How can I do this using the results of the aggregation?
I think you can try this in the shell:
db.foo.aggregate(
[
{
$match: {
lang: 'bar',
pushed_at:{
$gte: new ISODate("2015-11-09T00:00:00.000Z"),
$lt: new ISODate("2015-11-10T00:00:00.000Z")
}
}
},
{
$group: {
_id: "$object.id",
occurences: {$sum: 1}
}
},
{
$match: {
occurences: {$gt: 1}
}
}
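]).forEach(function(x) {
// aggregate() returns a cursor in current shells, so iterate it directly.
// Keep one document per object.id and remove the other (occurences - 1) copies.
for (var i = 0; i < x.occurences - 1; i++) {
db.foo.remove({"object.id": x._id}, true); // second argument true = justOne
}
}
);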
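If you want to control which copy survives, one option is to collect the _id values of the redundant documents first and delete exactly those. The following is a plain-JavaScript sketch of that selection step on a hypothetical in-memory array; the sample documents are invented, and it keeps the first document seen for each object.id:

```javascript
// Hypothetical documents; only the fields needed for the illustration are shown.
const docs = [
  { _id: 'a1', object: { id: 100 } },
  { _id: 'a2', object: { id: 100 } },
  { _id: 'a3', object: { id: 200 } },
  { _id: 'a4', object: { id: 100 } }
];

const seenObjectIds = new Set();
const redundantIds = [];

for (const d of docs) {
  if (seenObjectIds.has(d.object.id)) {
    // A document with this object.id was already kept, so this one is redundant.
    redundantIds.push(d._id);
  } else {
    seenObjectIds.add(d.object.id);
  }
}

console.log(redundantIds);
```

The resulting list could then be used in a delete filter such as { _id: { $in: redundantIds } }.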
I have this mongoose query that I'm running:
db.accounts.aggregate([{
$unwind: "$Publishers"
}, {
$group: {
_id: "$Profile._id",
reachTotal: {
$sum: "$Publishers.reach"
},
Publishers: {
$push: "$Publishers"
},
Profile: {
$first: "$Profile"
}
}
}, {
$sort: {
reachTotal: 1
}
}])
It works fine, but the problem is that some of the records don't have '$Publishers.reach'. Mongoose doesn't return those records with a sum of null, undefined or 0. Is there a way to have mongoose return them?
You'll have to tell MongoDB what to do using a $cond expression or an $ifNull expression, depending on your document structure.
http://docs.mongodb.org/manual/reference/operator/aggregation/cond/
http://docs.mongodb.org/manual/reference/operator/aggregation/ifNull/
...
reachTotal: { $sum: { $ifNull: ["$Publishers.reach", 0] } }
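For illustration, this is roughly what the $ifNull fallback does for each element before the values are summed, written as plain JavaScript on a hypothetical Publishers array (the sample values are invented):

```javascript
// Hypothetical Publishers array; some entries have no usable reach value.
const publishers = [
  { name: 'p1', reach: 120 },
  { name: 'p2' },              // reach is missing
  { name: 'p3', reach: null }, // reach is null
  { name: 'p4', reach: 30 }
];

// Treat a missing or null reach as 0, then add everything up.
const reachTotal = publishers.reduce(function (sum, p) {
  const reach = (p.reach === null || p.reach === undefined) ? 0 : p.reach;
  return sum + reach;
}, 0);

console.log(reachTotal);
```

Separately, by default $unwind drops documents whose Publishers array is missing or empty, so those accounts never reach the $group stage. If that applies to your data, the preserveNullAndEmptyArrays option of $unwind (MongoDB 3.2 and later) keeps them.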