Adding refs to another MongoDB collection by matching multiple criteria

I'm totally new to MongoDB and can't sort this out from the docs. I have two collections, TREES and PLOTS. Every tree is in a plot; every plot has, say, 4-150 trees. So far the collections look something like this:
// TREES
[{"_id": objId(), "Tree": "1", "Project": "Alpha", "Plot": "A", "Year": 1979, "Size": 20},
{"_id": objId(), "Tree": "1", "Project": "Alpha", "Plot": "A", "Year": 1986, "Size": 21},
...
{"_id": objId(), "Tree": "54", "Project": "Omega", "Plot": "Z", "Year": 2016, "Size": 17}]
// PLOTS
[{"_id": objId(), "Plot": "A", "Project": "Alpha", "Year": 1979},
{"_id": objId(), "Plot": "A", "Project": "Alpha", "Year": 1986},
...
{"_id": objId(), "Plot": "Z", "Project": "Omega", "Year": 2016}]
I want to add a reference field to all the Trees with the objId of the appropriate Plot document, matching on Project, Plot and Year. I'd also like to add a refs array to all the Plots to contain the objIds with all of each one's Trees [Edit: Although maybe that's really not necessary?]. The real schemas both have 30-40 fields so embedding would be mad. The application development will most likely be done in pymongo, if there's anything relevant there.
To Clarify:
My problem is in matching trees to plots on the three criteria -- it seems like $lookup is no use here and I've tried $unionWith but can't figure it out. The docs and tutorials are full of toy problems where you add inter-collection references by matching on one field... and I can't figure out how to generalize that. Best result has been from doing
db.TREES.aggregate([
{
'$lookup': {
'from': 'PLOTS',
'localField': 'Plot',
'foreignField': 'Plot',
'as': 'TooManyPlots'
}
}
])
That gives me an array of all Plots with the right name -- but from all Years and all Projects, and I can't figure out how to weed that out and arrive at an updated Tree document with just the one correct Plot.
I haven't yet developed the Mongo-vision to see the proper flow for this.
Could be I'm also having some XY trouble -- plus it could be that MongoDB isn't the best fit for our project anyway. It seems worth a try though.

Okay -- no great mystery to it except for understanding how to understand it. In hopes of giving someone else a hand, here's how I got it to work, closely matching the doc reference @D.SM noted in the comments, and my thanks to them.
db.TREES.aggregate([
  { $lookup: {
      from: 'PLOTS',  // collection names are case-sensitive
      // Expose the current tree's fields to the sub-pipeline as variables
      let: { tPlot: '$Plot', tProj: '$Project', tYear: '$Year' },
      pipeline: [
        { $match:
          { $expr:
            { $and: [
              { $eq: ['$Plot', '$$tPlot'] },
              { $eq: ['$Project', '$$tProj'] },
              { $eq: ['$Year', '$$tYear'] }
            ] }
          }
        },
        { $project: { _id: 1 } }
      ],
      as: 'Plot_id_A'
  } }, // That does what I wanted; from here it's just tidying things up
  { $unwind: { path: '$Plot_id_A' } },
  { $set: { Plot_id: '$Plot_id_A._id' } },
  { $project: { Plot_id_A: 0 } }
])
Note that in the let: line the right-hand names all come from the TREES collection we're working in, while in the $and: the same-named left-hand names all come from the PLOTS collection we're joining, so for the loss of clarity there I acknowledge D.SM's beef with my example field names. Anyway it works.
If anyone has an improvement to suggest though... lemme know!
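One possible tidy-up, as a sketch only: the multi-field $lookup above can be generated from a list of field names, and the computed Plot_id can be written back with a $merge stage. The helper name buildMultiKeyLookup and the temporary field plotMatch are made up for this illustration, and writing back into the collection being aggregated assumes MongoDB 4.4 or later. The code below only builds the pipeline as plain JavaScript objects; it does not talk to a database.

```javascript
// Hypothetical helper (not part of MongoDB): builds a $lookup stage that
// joins on several equally named fields at once.
function buildMultiKeyLookup(from, fields, as) {
  const letVars = {};
  const conditions = [];
  for (const field of fields) {
    // Variable names in `let` must start with a lowercase letter.
    const varName = "t_" + field;
    // "$Field" inside `let` is read from the local (TREES) document.
    letVars[varName] = "$" + field;
    // Inside the sub-pipeline, "$Field" is the joined (PLOTS) document
    // and "$$t_Field" is the variable defined above.
    conditions.push({ $eq: ["$" + field, "$$" + varName] });
  }
  return {
    $lookup: {
      from,
      let: letVars,
      pipeline: [
        { $match: { $expr: { $and: conditions } } },
        { $project: { _id: 1 } },
      ],
      as,
    },
  };
}

const pipeline = [
  buildMultiKeyLookup("PLOTS", ["Plot", "Project", "Year"], "plotMatch"),
  { $unwind: "$plotMatch" },
  { $set: { Plot_id: "$plotMatch._id" } },
  { $project: { plotMatch: 0 } },
  // Assumption: MongoDB 4.4+, where $merge may target the source collection.
  { $merge: { into: "TREES", on: "_id", whenMatched: "merge", whenNotMatched: "discard" } },
];

// In the mongo shell this would be run as: db.TREES.aggregate(pipeline)
console.log(JSON.stringify(pipeline[0], null, 2));
```

Trees without a matching plot are dropped by $unwind and are therefore left untouched by the $merge.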

Related

MongoDB $lookup with conditional foreignField

Playground: https://mongoplayground.net/p/OxMnsCFZpmQ
My MongoDB version: 4.2.
I have a collection car_parts and customers.
As the name suggests car_parts has car parts, where some of them can have a field sub_parts which is a list of car_parts._ids this part consists of.
Every customer that bought something from us is stored in customers. The parts field for a customer contains a list of parts the customer bought together on a certain date.
I would like to have an aggregate query in MongoDB that returns a mapping of which car parts were bought (bought_parts) from which customers. However, if the car_parts has the field sub_parts, the customer should show up for the subparts only.
So the query in the playground gives almost the correct result already, except for the sub_parts topic.
Example for customer_3:
{
"_id": "customer_3",
"parts": [
{
"bought_parts": [
3
],
date: "15.07.2020"
}
]
}
Since bought_parts has car_parts._id = 3:
{
"_id": 3,
"name": "steering wheel",
"sub_parts": [
1, // other car_parts._id s
2
]
}
The result should show customer_3 as a customer of car parts 1 and 2.
I'm not sure how to accomplish this, but I assume a "temporary" replacement of the id 3 in bought_parts with the actual ids [1,2] might solve it.
Expected output:
[
{
"_id": 1,
"customers": [
"customer_1",
"customer_2",
"customer_3" // <- since customer_3 bought car part 3 which has 1 in sub_parts
]
},
{
"_id": 2,
"customers": [
"customer_3" // <- since customer_3 bought car part 3 which has 2 in sub_parts
]
},
{
"_id": 3,
"customers": [
"customer_1", // <- since car_parts.id = 3 has [1, 2] in sub_parts, show customers of ids [1, 2]
"customer_2",
"customer_3"
]
},
{
"_id": 4,
"customers": [
"customer_1",
"customer_2"
]
}
]
Thanks a lot in advance!
EDIT: One way to do it is:
db.car_parts.aggregate([
{
$project: {
topLevel: {$concatArrays: [{$ifNull: ["$sub_parts", []]}, ["$_id"]]},
sub_parts: 1
}
},
{$unwind: "$topLevel"},
{
$group: {
_id: "$topLevel",
parts: {$push: "$_id"},
sub_parts: {$first: "$sub_parts"}
}
},
{
$project: {
parts: {$concatArrays: [{"$ifNull": ["$sub_parts", []]}, "$parts"]}
}
},
{
$lookup: {
from: "customers",
localField: "parts",
foreignField: "parts.bought_parts",
as: "customers"
}
},
{$project: {customers: "$customers._id"}}
])
You can see it working on this playground.
Since you said there is only one level of sub-parts, I used another idea: creating a top level before the $lookup. Since you want customers that bought part 3, for example, to be registered under parts 1 and 2, which are sub-parts of 3, the idea is to group them. Making this connection after the $lookup is a bit clumsy, but the car_parts collection already tells us, before the $lookup, that parts 1 and 2 are sub-parts of 3. Creating a temporary topLevel field lets us group, in advance, all the parts and sub-parts whose customers should be registered under this top-level part. This makes things much more elegant...
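The grouping idea can be checked outside the database with a plain JavaScript sketch. This is not the aggregation itself, only the same logic on in-memory arrays. The purchases of customer_1 and customer_2 are assumed here (the question only shows customer_3), chosen so that the result matches the expected output in the question.

```javascript
// Sample data: customer_3 is from the question; customer_1 and customer_2
// are assumed for illustration.
const carParts = [
  { _id: 1 },
  { _id: 2 },
  { _id: 3, sub_parts: [1, 2] },
  { _id: 4 },
];
const customers = [
  { _id: "customer_1", parts: [{ bought_parts: [1, 4] }] },
  { _id: "customer_2", parts: [{ bought_parts: [1, 4] }] },
  { _id: "customer_3", parts: [{ bought_parts: [3], date: "15.07.2020" }] },
];

function customersPerPart(parts, buyers) {
  return parts.map((part) => {
    // A part is "related" to itself, to its own sub-parts, and to any
    // part that lists it as a sub-part (one level only).
    const related = new Set([part._id, ...(part.sub_parts || [])]);
    for (const other of parts) {
      if ((other.sub_parts || []).includes(part._id)) related.add(other._id);
    }
    // Keep every customer who bought at least one related part.
    const matching = buyers
      .filter((c) => c.parts.some((p) => p.bought_parts.some((id) => related.has(id))))
      .map((c) => c._id);
    return { _id: part._id, customers: matching };
  });
}

const result = customersPerPart(carParts, customers);
console.log(JSON.stringify(result, null, 2));
```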

MongoDB: returning documents in order until a condition match

In a MongoDB collection, I have documents with a "position" field for ordering and an optional "date" field, e.g.
[
{
"_id": "doc1",
"position": 1
},
{
"_id": "doc2",
"position": 2,
"date": "2021-05-20T08:00:00.000Z"
},
{
"_id": "doc3",
"position": 3
},
{
"_id": "doc4",
"position": 4,
"date": "2021-05-20T08:00:00.000Z"
}
]
I would like to query this collection to get the documents "before" a specified date, in position order. The algorithm would be:
1. find the first element whose date is "after" the specified date
2. return all the documents whose position is less than the position of the element found, sorted by "position"
I have implemented this algorithm naïvely with 2 independent queries. However, I suspect it can be done with a single call to the database, but I have no idea how to proceed. Maybe with an aggregation pipeline?
Can someone give me a clue how this can be done?
EDIT: Here are the current queries I use (roughly):
limit_element = db.getCollection('collection').find({
"date": { "$gte": ISODate("2021-05-20T08:00:00.000Z") }
}).sort({
"position": 1
}).limit(1)
position = limit_element['position']
elements = db.getCollection('collection').find({
"position": { "$lt": position }
}).sort({
"position": 1
})
You can use an aggregation pipeline with two $match stages. Essentially it's the same thing you do now, but within one database round trip, so it may be a bit faster. With aggregation you can access the results of the previous stage in the next stage. Whether that is worth it is for you to decide; I think your naive approach is sensible. In any case this is a conditional problem, so you will first have to find the one element and then fetch the others. The only difference is where the steps run.
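One way such a single-call pipeline might look, as a sketch rather than a tested solution: find the limiting element first, then use a $lookup on the same collection to fetch everything positioned before it. The function name buildBeforeCutoffPipeline and the field name before are made up here. The cutoff must have the same type as the stored date values (a string for the sample documents, a Date if real dates are stored). The code only builds the pipeline as plain objects.

```javascript
// Hypothetical helper: builds a pipeline that returns all documents whose
// position is lower than that of the first document dated at/after `cutoff`.
function buildBeforeCutoffPipeline(collectionName, cutoff) {
  return [
    // Step 1: the first element (by position) whose date is >= cutoff.
    { $match: { date: { $gte: cutoff } } },
    { $sort: { position: 1 } },
    { $limit: 1 },
    // Step 2: self-join to fetch every document positioned before it.
    {
      $lookup: {
        from: collectionName,
        let: { limitPosition: "$position" },
        pipeline: [
          { $match: { $expr: { $lt: ["$position", "$$limitPosition"] } } },
          { $sort: { position: 1 } },
        ],
        as: "before",
      },
    },
    // Turn the joined array back into top-level documents.
    { $unwind: "$before" },
    { $replaceRoot: { newRoot: "$before" } },
  ];
}

const pipeline = buildBeforeCutoffPipeline("collection", "2021-05-20T08:00:00.000Z");
// In the mongo shell: db.getCollection('collection').aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Two caveats: if no document has a date at or after the cutoff, this returns nothing at all, and the joined array lives inside one intermediate document, so it is bounded by the 16 MB document size limit.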

Trying to fetch data from Nested MongoDB Database?

I am a beginner in MongoDB and I'm stuck. I am trying to fetch data from a nested array, but it takes a long time, as there are around 50K records, and the results are not accurate either. The schema structure is below:
{
"_id": {
"$oid": "6001df3312ac8b33c9d26b86"
},
"City": "Los Angeles",
"State":"California",
"Details": [
{
"Name": "Shawn",
"age": "55",
"Gender": "Male",
"profession": " A science teacher with STEM",
"inDate": "2021-01-15 23:12:17",
"Cars": [
"BMW","Ford","Opel"
],
"language": "English"
},
{
"Name": "Nicole",
"age": "21",
"Gender": "Female",
"profession": "Law student",
"inDate": "2021-01-16 13:45:00",
"Cars": [
"Opel"
],
"language": "English"
}
],
"date": "2021-01-16"
}
Here I am trying to filter by date and Details.Cars like this:
db.getCollection('news').find({"Details.Cars":"BMW","date":"2021-01-16"})
It also returns the details of other persons who do not have a BMW. I only want to display the details of a person like Shawn, who has a BMW (or some other specific array value) and matches the date, and not Nicole; the rest should not appear, but that is not what happens.
Any help is appreciated. :)
A combination of $match on the top-level fields and $filter on the array elements will do what you seek.
db.foo.aggregate([
{$match: {"date":"2021-01-16"}}
,{$addFields: {"Details": {$filter: {
input: "$Details",
as: "zz",
cond: { $in: ['BMW','$$zz.Cars'] }
}}
}}
,{$match: {$expr: { $gt:[{$size:"$Details"},0] } }}
]);
Notes:
$unwind is overly expensive for what is needed here and it likely means "reassembling" the data shape later.
We use $addFields where the new field to add (Details) already exists. This effectively means "overwrite in place" and is a common idiom when filtering an array.
The second $match will eliminate docs where the date matches but not a single entry in Details.Cars is a BMW i.e. the array has been filtered down to zero length. Sometimes you want to know this info so if this is the case, do not add the final $match.
I recommend you look into using real dates i.e. ISODate instead of strings so that you can easily take advantage of MongoDB date math and date formatting functions.
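What the $filter stage does to a single document can be mimicked in plain JavaScript. This is only an illustration of the semantics on a trimmed copy of the sample document, not MongoDB code; the function name filterDetailsByCar is made up.

```javascript
// Trimmed copy of the sample document from the question.
const doc = {
  City: "Los Angeles",
  State: "California",
  date: "2021-01-16",
  Details: [
    { Name: "Shawn", Cars: ["BMW", "Ford", "Opel"] },
    { Name: "Nicole", Cars: ["Opel"] },
  ],
};

// Mimics {$filter: {input: "$Details", as: "zz",
//                   cond: {$in: ["BMW", "$$zz.Cars"]}}}:
// keep only the array elements whose Cars list contains `car`,
// and overwrite Details in a copy of the document.
function filterDetailsByCar(document, car) {
  return {
    ...document,
    Details: document.Details.filter((d) => d.Cars.includes(car)),
  };
}

const filtered = filterDetailsByCar(doc, "BMW");
console.log(filtered.Details.map((d) => d.Name)); // [ 'Shawn' ]
```

A plain find() on "Details.Cars" corresponds to returning doc unchanged whenever at least one element matches, which is why Nicole shows up in the original query.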
It is a common mistake to think that find({"nested.array": value}) will return only the matching nested object; actually, this query returns the whole document that contains a nested object with the desired value.
The query is returning the whole document where value BMW exists in the array Details.Cars. So, Nicole is returned too.
To solve this problem:
To get multiple elements that match the criteria, you can use an aggregation with $unwind to split the array into one document per element, and then match on the criteria you want.
db.collection.aggregate([
{
"$match": { "Details.Cars": "BMW", "date": "2021-01-16" }
},
{
"$unwind": "$Details"
},
{
"$match": { "Details.Cars": "BMW" }
}
])
This query first matches on the criteria, to avoid running $unwind over the whole collection.
Then it uses $unwind to get one document per array element, and $match again to keep only the documents you want.
To get only one element (for example, if you match by _id and it's unique) you can use $elemMatch in the projection, like this:
db.collection.find({
"Details.Cars": "BMW",
"date": "2021-01-16"
},
{
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
}
})
You can use $elemMatch in either the query or the projection.
Using $elemMatch in the query looks like this:
db.collection.find({
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
},
"date": "2021-01-16"
},
{
"Details.$": 1
})
The result is the same. In the second case you are using the positional operator, which returns, as the docs say:
The first element that matches the query condition on the array.
That is, the first element where "Cars": "BMW".
You can choose whichever way you prefer.

MongoDB document setup and aggregation

I'm pretty new to MongoDB and while preparing data to be consumed I got into Aggregation... what a powerful little thing this database has! I got really excited and started to test some things :)
I'm saving time entries for a companyId and employeeId ... that can have many entries... those are normally sorted by date, but one date can have several entries (multiple registrations in the same day)
I'm trying to come up with a good schema so I can easily get my data exactly how I need it, and as a newbie I would rather ask for guidance and check whether I'm on the right path
my output should be as
[{
"company": "474A5D39-C87F-440C-BE99-D441371BF88C",
"employee": "BA75621E-5D46-4487-8C9F-C0CE0B2A7DE2",
"name": "Bruno Alexandre",
"registrations": [{
"id": 1448364,
"spanned": false,
"spannedDay": 0,
"date": "2019-01-17",
"timeStart": "09:00:00",
"timeEnd": "12:00:00",
"amount": {
"days": 0.4,
"hours": 2,
"km": null,
"unit": "days and hours",
"normHours": 5
},
"dateDetails": {
"week": 3,
"weekDay": 4,
"weekDayEnglish": "Thursday",
"holiday": false
},
"jobCode": {
"id": null,
"isPayroll": true,
"isFlex": false
},
"payroll": {
"guid": null
},
"type": "Sick",
"subType": "Sick",
"status": "APP",
"reason": "IS",
"group": "LeaveAndAbsence",
"note": null,
"createdTimeStamp": "2019-01-17T15:53:55.423Z"
}, /* more date entries */ ]
}, /* other employees */ ]
what is the best way to add the data into a collection?
Is it more efficient if I create a document per company/employee and add all registration entries inside that document (it could get really big as time passes)... or is it better to have one document per company/employee/date and add all daily events in that document instead?
regarding aggregation, I'm still new to all this, but I'm imagining I could simply call
RegistrationsModel.aggregate([
{
$match: {
date: { $gte: new Date('2019-01-01'), $lte: new Date('2019-01-31') },
company: '474A5D39-C87F-440C-BE99-D441371BF88C'
}
},
{
$group: {
_id: '$employee',
name: { '$first': '$name' }
}
},
{
// ... get all registrations as an Array ...
},
{
$sort: {
'registrations.date': -1
}
}
]);
P.S. I'm taking the Aggregation course to get familiar with all of it
Is it more efficient if I create a document per company/employee and
add all registration entries inside that document (it could get really
big as time passes)... or is it better to have one document per
company/employee/date and add all daily events in that document
instead?
From what I understand of document oriented databases, I would say the aim is to have all the data you need, in a specific context, grouped inside one document.
So what you need to do is identify what data you're going to need (staying close to the features you want to implement) and build your data structure accordingly. Be sure to identify future features too, because the better your data structure is prepared for them, the less tricky it will be to scale your database to your needs.
Your aggregation query looks OK!
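As a sketch of how the placeholder stage in the question could be filled in, assuming one document per registration with company, employee, name and date fields (this layout is an assumption, not something the question settles): the registrations can be collected with $push, and the sort is moved before the $group, because a $sort on 'registrations.date' after grouping would order the grouped documents rather than the entries inside each array. The function name buildRegistrationsPipeline is made up, and the code only builds the pipeline.

```javascript
// Hypothetical helper: one output document per employee, with all of that
// employee's registrations in the given date range collected into an array.
function buildRegistrationsPipeline(company, from, to) {
  return [
    { $match: { company: company, date: { $gte: from, $lte: to } } },
    // Sort first so that entries are pushed newest-first; in practice
    // $push appends documents in the order they reach the $group stage.
    { $sort: { date: -1 } },
    {
      $group: {
        _id: "$employee",
        company: { $first: "$company" },
        name: { $first: "$name" },
        // $$ROOT is the whole input document (one registration).
        registrations: { $push: "$$ROOT" },
      },
    },
  ];
}

const pipeline = buildRegistrationsPipeline(
  "474A5D39-C87F-440C-BE99-D441371BF88C",
  new Date("2019-01-01"),
  new Date("2019-01-31")
);
// With Mongoose this would be passed as: RegistrationsModel.aggregate(pipeline)
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
```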

How to query and return an individual field from a nested embedded document in mongodb?

I'm trying to build a verb database where:
each verb has many conjugations
each conjugation has one tense name (present, imperfect, ...) and many forms
each form has a personal pronoun (io, tu, ...) and the actual conjugated verb (conjugation)
I choose this structure because I want to do two types of queries:
Given a verb, a tense and a pronoun, return the associated conjugation
Given a verb, show all of its tenses, pronouns and conjugations
I'm struggling with the first one. This is what I have so far (but it doesn't work):
db.verbs.findOne({"verb": "comprare", "conjugations": {"$elemMatch": {"tense": "present", "forms.pronoun": "io"}}}, {"conjugations.forms.conjugation": 1})
Here is a reproducible example:
db.verbs.insert([{
"verb": "comprare",
"conjugations": [
{
"tense": "present",
"forms": [{"pronoun": "io", "conjugation": "compro"},
{"pronoun": "tu", "conjugation": "compri"}],
},
{
"tense": "imperfect",
"forms": [{"pronoun": "io", "conjugation": "compravo"},
{"pronoun": "tu", "conjugation": "compravi"}]
}
]
},
{
"verb": "bere",
"conjugations": [
{
"tense": "present",
"forms": [{"pronoun": "io", "conjugation": "bevo"},
{"pronoun": "tu", "conjugation": "bevi"}]
},
{
"tense": "imperfect",
"forms": [{"pronoun": "io", "conjugation": "bevevo"},
{"pronoun": "tu", "conjugation": "bevevi"}]
}
]
}])
I'm willing to change the structure of the database to make it easier to query, so feel free to suggest a more natural way to do it.
You can use the aggregation framework $unwind in the pipeline to create documents from nested arrays. You have two nested arrays, so you will unwind twice.
db.verbs.aggregate([
// match documents with specific verb
{$match: {verb: "comprare"}},
// query each conjugation as a separate document
{$unwind: "$conjugations"},
// match conjugations with the provided tense
{$match: {"conjugations.tense": "present"}},
// query each form as a separate document
{$unwind: "$conjugations.forms"},
// match conjugation form with the provided pronoun
{$match: {"conjugations.forms.pronoun": "io"}},
// only select fields of interest
{$project: {"conjugations": 1, _id: 0}}
]);
You can change the projection to get specific fields and even rename the fields and use an expression:
{"conjugations.forms.pronoun": 1}
-> {conjugations: {forms: {pronoun: "io"}}}
{"conjugation": "$conjugations.forms.conjugation"}
-> {conjugation: "compro"}
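To see what the two $unwind stages do, the same steps can be walked through in plain JavaScript on the sample data. This is only an in-memory illustration of the pipeline's logic, not MongoDB code; the function name conjugate is made up.

```javascript
// Sample data from the question.
const verbs = [
  {
    verb: "comprare",
    conjugations: [
      { tense: "present", forms: [{ pronoun: "io", conjugation: "compro" }, { pronoun: "tu", conjugation: "compri" }] },
      { tense: "imperfect", forms: [{ pronoun: "io", conjugation: "compravo" }, { pronoun: "tu", conjugation: "compravi" }] },
    ],
  },
  {
    verb: "bere",
    conjugations: [
      { tense: "present", forms: [{ pronoun: "io", conjugation: "bevo" }, { pronoun: "tu", conjugation: "bevi" }] },
      { tense: "imperfect", forms: [{ pronoun: "io", conjugation: "bevevo" }, { pronoun: "tu", conjugation: "bevevi" }] },
    ],
  },
];

// Each line mirrors one stage of the aggregation pipeline above.
function conjugate(allVerbs, verb, tense, pronoun) {
  const matches = allVerbs
    .filter((v) => v.verb === verb)        // $match on verb
    .flatMap((v) => v.conjugations)        // $unwind "$conjugations"
    .filter((c) => c.tense === tense)      // $match on tense
    .flatMap((c) => c.forms)               // $unwind "$conjugations.forms"
    .filter((f) => f.pronoun === pronoun)  // $match on pronoun
    .map((f) => f.conjugation);            // $project the conjugated verb
  return matches.length > 0 ? matches[0] : null;
}

console.log(conjugate(verbs, "comprare", "present", "io")); // compro
```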