How to query nested documents in MongoDB?

I am very new to databases and MongoDB. I tried to store financial statement information in the database, with each document representing a company.
I have created nested documents (maybe not a good way), structured as follows: the outermost level contains Annual Statement, Basic Info, Key Map, and Interim Statement. Within Annual Statement there are different dates. Within each date there are different types of statements (INC, BAL, CAS), and the innermost level contains the actual data.
My question is: how can I query the db to get all documents that contain 2017 statements (for example)?
The date is currently formatted as YYYY-MM-DD, but I only want to filter on YYYY.

I strongly discourage using variable values (the dates and INC here) as field names. It makes querying and updating much harder, and you cannot use an index. So it's a very bad idea in most cases, even if it can be 'acceptable' (but still bad practice) for a small number of static values (INC, BAL, CAS).
My advice would be to change your schema to something easier to use, for example:
[
  {
    "annual_statement": [
      {
        "date": ISODate("2017-12-31"),
        "INC": {
          "SREV": 1322.5,
          "RTLR": 1423.4,
          ...
        },
        "BAL": {...},
        "CAS": {...}
      },
      {
        "date": ISODate("2017-12-31"),
        "INC": {
          "SREV": 1322.5,
          "RTLR": 1423.4,
          ...
        },
        "BAL": {...},
        "CAS": {...}
      }
    ]
  }
]
To query this schema, use the following:
db.collection.find({
  "annual_statement": {
    $elemMatch: {
      date: {
        $lt: ISODate("2018-01-01"),
        $gte: ISODate("2017-01-01")
      }
    }
  }
})
This returns whole documents where at least one date is in 2017.
Adding an $elemMatch projection like the following returns only the matching annual statement (note that an $elemMatch projection keeps only the first matching element of the array):
db.collection.find({
  "annual_statement": {
    $elemMatch: {
      date: {
        $lt: ISODate("2018-01-01"),
        $gte: ISODate("2017-01-01")
      }
    }
  }
},
{
  "annual_statement": {
    $elemMatch: {
      date: {
        $lt: ISODate("2018-01-01"),
        $gte: ISODate("2017-01-01")
      }
    }
  }
})
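For readers using Python, here is a minimal sketch of the same year filter built as pymongo-style dicts, assuming `date` is stored as a real date (pymongo maps `datetime` to a BSON date). The helper names `year_range`, `year_filter` and `matches` are hypothetical; `matches` only mirrors the `$elemMatch` range logic client-side so it can be tried without a server.

```python
from datetime import datetime

def year_range(year):
    """Half-open interval [Jan 1 of year, Jan 1 of year + 1)."""
    return {"$gte": datetime(year, 1, 1), "$lt": datetime(year + 1, 1, 1)}

def year_filter(year):
    """Filter: at least one annual statement dated within `year`."""
    return {"annual_statement": {"$elemMatch": {"date": year_range(year)}}}

def matches(doc, year):
    """Client-side mirror of the $elemMatch date-range condition."""
    lo, hi = datetime(year, 1, 1), datetime(year + 1, 1, 1)
    return any(lo <= s["date"] < hi for s in doc.get("annual_statement", []))

# With pymongo these dicts would be passed as collection.find(filter, projection), e.g.
# collection.find(year_filter(2017), {"annual_statement": {"$elemMatch": {"date": year_range(2017)}}})
docs = [
    {"annual_statement": [{"date": datetime(2017, 12, 31)}]},
    {"annual_statement": [{"date": datetime(2016, 12, 31)}]},
]
print([matches(d, 2017) for d in docs])  # [True, False]
```

Using a half-open range (`$gte` Jan 1, `$lt` next Jan 1) avoids having to know the last instant of December 31.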

Use $exists (note: I'm using Python / pymongo driver syntax below; yours may differ)
https://docs.mongodb.com/manual/reference/operator/query/exists/
Example: Find all "2017" "INC" records, by company.
# assumes `db` is a pymongo Database, e.g. MongoClient()["your_db"]
year_exists = db.collection.find({'Annual.2017-12-31': {'$exists': True}})
for business in year_exists:
    bus_name = business['BasicInfo']['CompanyName']
    financials_INC = business['Annual']['2017-12-31']['INC']
    print(bus_name, financials_INC)
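The `$exists` query above only matches the exact key "2017-12-31". With dates used as field names, one way to catch any 2017 date is to filter the keys client-side after fetching. This is a sketch on in-memory dicts shaped like the question's schema; the helper `annual_dates_in_year` and the sample companies are hypothetical, and because the filtering happens in Python it cannot use an index, so it only suits small collections.

```python
def annual_dates_in_year(business, year):
    """Return the Annual keys (YYYY-MM-DD strings) that fall in `year`."""
    prefix = f"{year}-"
    return sorted(k for k in business.get("Annual", {}) if k.startswith(prefix))

# Hypothetical documents; with pymongo this would be the cursor from db.collection.find().
businesses = [
    {"BasicInfo": {"CompanyName": "Acme"},
     "Annual": {"2017-12-31": {"INC": {"SREV": 1322.5}}, "2016-12-31": {"INC": {}}}},
    {"BasicInfo": {"CompanyName": "Globex"},
     "Annual": {"2016-12-31": {"INC": {}}}},
]

for business in businesses:
    for date in annual_dates_in_year(business, 2017):
        print(business["BasicInfo"]["CompanyName"], date, business["Annual"][date]["INC"])
# Acme 2017-12-31 {'SREV': 1322.5}
```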

Related

MongoDB - $push accumulator is slowing query down

To develop a web application, I use MongoDB as the back end, and I need to retrieve data from it. On one page I need to retrieve the price history of specific brands. In my MongoDB collection, here is how a document/product is saved:
{
  brand: "exampleBrand",
  prices: [
    { date: "2022-03-08", price: 1900 },
    { date: "2022-03-09", price: 1910 }
  ]
}
My goal is then to retrieve dates and prices for a specific brand in the following format:
[
  {
    date: "2022-03-08",
    prices: [price_product1, price_product2, ...]
  },
  {
    date: "2022-03-09",
    prices: [price_product1, price_product3, ...]
  }
]
In order to do that, I have designed the following query:
db.Prices.aggregate([
  {
    $match: { brand: "exampleBrand" }
  },
  {
    $project: { _id: 0, prices: 1 }
  },
  {
    $unwind: '$prices'
  },
  {
    $group: {
      _id: "$prices.date",
      prix: { $push: "$prices.price" }
    }
  }
]);
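To see why the `$group` stage is the expensive part, here is a plain-Python sketch of what `$match`, `$unwind` and `$group` with `$push` compute. The function `prices_by_date` and the sample products are hypothetical and run without a server. The work grows with the number of unwound entries (about 90,000 × 30 ≈ 2.7 million here); an index on `brand` can help the `$match`, but the unwinding and grouping still have to touch every entry.

```python
from collections import defaultdict

def prices_by_date(products, brand):
    """Mimic $match -> $unwind '$prices' -> $group {_id: '$prices.date', prix: {$push: ...}}."""
    grouped = defaultdict(list)
    for product in products:
        if product.get("brand") != brand:                    # $match
            continue
        for entry in product.get("prices", []):              # $unwind
            grouped[entry["date"]].append(entry["price"])    # $group + $push
    return [{"_id": date, "prix": prices} for date, prices in grouped.items()]

products = [
    {"brand": "exampleBrand", "prices": [{"date": "2022-03-08", "price": 1900},
                                         {"date": "2022-03-09", "price": 1910}]},
    {"brand": "exampleBrand", "prices": [{"date": "2022-03-08", "price": 2000}]},
    {"brand": "otherBrand", "prices": [{"date": "2022-03-08", "price": 5}]},
]
print(prices_by_date(products, "exampleBrand"))
```

Unlike this dict-based version, MongoDB does not guarantee the order of `$group` output documents; add a `$sort` stage if order matters.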
Once I have these results, I can go on with various calculations to display on my page. However, there are approximately 90,000 documents, each with on average 30 prices and dates. As a result, the $group stage of the aggregation pipeline takes a long time.
I have tried different indexes on "prices", "prices.date", and "brand, prices", but none of them seem to speed up the query. I have also tried reworking the query but couldn't find a more efficient way to get my results. Does anyone have an idea how to achieve this?
Thank you,
For faster querying under these conditions, I think the only way is to use a cache such as Redis or Memcached, because the query, especially over the array, costs a lot of I/O and processing in the aggregation.
P.S. I doubt it, but if you are somehow able to change your data structure so that it is flat, it would be faster, though not as fast as caching.
Example:
{
  brand: "exampleBrand",
  price1: 1900,
  date1: "2022-03-08",
  price2: 2100,
  date2: "2022-03-29"
}
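The caching idea from this answer can be sketched in a few lines. This uses a tiny in-process cache as a stand-in for Redis or Memcached (a swapped-in technique for illustration only); the class `TTLCache` and the function `run_aggregation` are hypothetical. Results may be stale for up to `ttl_seconds`, which is acceptable only if the price history changes rarely, for example once a day.

```python
import time

class TTLCache:
    """Tiny in-process cache with expiry; a stand-in for Redis/Memcached here."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expiry time, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                      # still fresh: skip the aggregation
        value = compute()                      # e.g. run the aggregation
        self._store[key] = (now + self.ttl, value)
        return value

calls = []

def run_aggregation():
    calls.append(1)  # hypothetical: would call collection.aggregate(pipeline)
    return [{"_id": "2022-03-08", "prix": [1900]}]

cache = TTLCache(ttl_seconds=300)
cache.get_or_compute(("prices", "exampleBrand"), run_aggregation)
cache.get_or_compute(("prices", "exampleBrand"), run_aggregation)
print(len(calls))  # 1
```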

MongoDB querying aggregation in one single document

I have a short but important question. I am new to MongoDB and querying.
My database looks like the following; I only have one document stored in it (sorry for the blurring).
The document consists of several fields:
- two are blurred and not important
- datum -> date
- instance -> an array of embedded documents; each instance has an id, two unimportant fields, and a code.
Now I want to count how many times an object in my instance array has the group "a" and the text "sample".
Is this even possible?
I have only found methods to count how many documents match something...
I am using MongoDB Compass, but I can also use PyMongo, MongoEngine, or any other tool for querying MongoDB.
Thank you in advance and if you have more questions please leave a comment!
You can try this:
db.collection.aggregate([
  { $unwind: "$instance" },
  { $unwind: "$instance.label" },
  {
    $match: {
      "instance.label.group": "a",
      "instance.label.text": "sample"
    }
  },
  {
    $group: {
      _id: {
        group: "$instance.label.group",
        text: "$instance.label.text"
      },
      count: { $sum: 1 }
    }
  }
])
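Here is a Python sketch of the same counting logic, assuming (as the answer's pipeline does) that each instance carries a `label` array of objects with `group` and `text`. The function `count_labels` and the sample document are hypothetical. Since `$unwind` treats a non-array field as a single-element array, the mirror does the same. The pipeline list swaps the final `$group` for a `$count` stage, which returns just the total; it could be passed to pymongo's `collection.aggregate(pipeline)`.

```python
def count_labels(doc, group, text):
    """Client-side mirror of $unwind instance -> $unwind instance.label -> $match -> count."""
    total = 0
    for inst in doc.get("instance", []):
        labels = inst.get("label", [])
        if isinstance(labels, dict):      # $unwind treats a non-array as one element
            labels = [labels]
        total += sum(1 for lb in labels if lb.get("group") == group and lb.get("text") == text)
    return total

pipeline = [
    {"$unwind": "$instance"},
    {"$unwind": "$instance.label"},
    {"$match": {"instance.label.group": "a", "instance.label.text": "sample"}},
    {"$count": "count"},  # one output document holding the number of matches
]

doc = {"instance": [
    {"id": 1, "label": [{"group": "a", "text": "sample"}, {"group": "b", "text": "sample"}]},
    {"id": 2, "label": [{"group": "a", "text": "sample"}]},
    {"id": 3, "label": {"group": "a", "text": "other"}},
]}
print(count_labels(doc, "a", "sample"))  # 2
```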

MongoDB - add index on 'calculated' fields

I have a query that includes an $expr operator with a $cond in it.
Basically, I want objects with a timestamp from a certain year. If the timestamp is not set, I use the creation date instead.
{
  $expr: {
    $eq: [
      {
        $cond: {
          'if': { $in: [{ $type: '$TimeStamp' }, ['null', 'missing']] },
          then: { $year: '$Created' },
          'else': { $year: '$TimeStamp' }
        }
      },
      <wanted-year>
    ]
  }
}
It would be nice to have this query use an index. But is that possible? Should I just add indexes on both the TimeStamp and Created fields? Or is it possible to create an index on a Year field that doesn't actually exist on the document itself?
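Not possible.
Indexes are built ahead of time from the values actually stored in the documents, so they cannot cover a value that is only computed while the query runs.
Workaround: On-Demand Materialized Views.
Store your calculated data (with indexes) in a separate collection.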
This can't be done today without precomputing that information and storing it in a field on the document. The closest alternative would probably be to use MongoDB 4.2's aggregation pipeline-powered updates to precompute and store a createdOrTimestamp field whenever your documents are updated. You could then create an index on createdOrTimestamp that would be used when querying for documents that match a certain year.
What this would look like when updating or after inserting your document:
db.collection.update({ _id: ObjectId("5e8523e7ea740b14fb16b5c3") }, [
  {
    $set: {
      createdOrTimestamp: {
        $cond: {
          if: { $gt: ['$TimeStamp', null] },
          then: '$TimeStamp',
          else: '$Created'
        }
      }
    }
  }
])
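The same approach from Python could look like the sketch below, assuming pymongo with MongoDB 4.2+ (`update_many` accepts a pipeline list there). The names `SET_CREATED_OR_TIMESTAMP`, `year_query` and `created_or_timestamp` are hypothetical. The year lookup is written as a date range on the stored field rather than `$year` inside `$expr`, because a plain range predicate is what an index on `createdOrTimestamp` can serve. `created_or_timestamp` just mirrors the `$cond` client-side.

```python
from datetime import datetime

# Pipeline update: TimeStamp when it is set, else Created, stored as createdOrTimestamp.
SET_CREATED_OR_TIMESTAMP = [
    {"$set": {"createdOrTimestamp": {
        "$cond": {
            "if": {"$gt": ["$TimeStamp", None]},  # None is sent as BSON null
            "then": "$TimeStamp",
            "else": "$Created",
        }
    }}}
]

def year_query(year):
    """Index-friendly range filter on the precomputed field."""
    return {"createdOrTimestamp": {"$gte": datetime(year, 1, 1),
                                   "$lt": datetime(year + 1, 1, 1)}}

def created_or_timestamp(doc):
    """Client-side mirror of the $cond: TimeStamp unless missing/null, else Created."""
    ts = doc.get("TimeStamp")
    return ts if ts is not None else doc.get("Created")

# With pymongo (not executed here):
#   collection.update_many({}, SET_CREATED_OR_TIMESTAMP)  # backfill existing documents
#   collection.create_index("createdOrTimestamp")
#   collection.find(year_query(2020))
```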
If documents already exist, you could also send off an updateMany operation with that aggregation to get that computed field into all your existing documents.
It would be really nice to be able to define computed fields declaratively on a collection just like indexes, so that they take care of keeping themselves up to date!

mongodb: pull element from nested array where date is before now

I have an issue that seemed so simple. I have nested documents, each with an expires field. An example document:
{
  _id: ObjectId(),
  stuff: [
    {
      name: 'egg',
      expires: ISODate("2019-07-19T12:52:56.163Z")
    },
    {
      name: 'potato',
      expires: ISODate("2019-07-19T12:52:56.163Z")
    }
  ]
}
I thought I could use a query like this:
db.collection.update({ "_id": ObjectId("578411d30af77c226c52b940") }, {
  "$pull": {
    "stuff.expires": { "$lt": ISODate() }
  }
});
ideally applying it to multiple documents at once. But even when trying to update a single document, I run into this error:
Cannot use the part (expires) of (stuff.expires) to traverse the element
I tried many modifications, but I could not make this work or find a similar example (which seems quite odd for a MongoDB question).
If there is no way to update multiple documents at once, I would be happy with a way to remove all expired items from a single document in an atomic query. The query does not need to work with older MongoDB versions; the latest version is fine.
I think you can achieve what you want using this query:
db.collection.update({}, {
  $pull: {
    stuff: {
      expires: { $lt: ISODate() }
    }
  }
}, {
  multi: true
})
The above query targets all documents and pulls every stuff element whose expires property is earlier than ISODate() (called without arguments, ISODate() returns the current date and time).
The multi: true option allows the update to modify multiple documents.
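In pymongo, `update_many` plays the role of `multi: true`. The sketch below builds the same `$pull` update and adds a client-side mirror of its condition for trying it without a server. The helper names `pull_expired_update` and `apply_pull_expired` and the sample items are hypothetical. Timezone-aware `datetime` values are used throughout, because Python refuses to compare aware and naive datetimes.

```python
from datetime import datetime, timezone

def pull_expired_update(now):
    """Update document: $pull every `stuff` element whose expires is earlier than `now`."""
    return {"$pull": {"stuff": {"expires": {"$lt": now}}}}

def apply_pull_expired(doc, now):
    """Client-side mirror: items without `expires` do not match $lt, so they are kept."""
    doc["stuff"] = [item for item in doc.get("stuff", [])
                    if not (item.get("expires") is not None and item["expires"] < now)]
    return doc

# With pymongo (not executed here):
#   collection.update_many({}, pull_expired_update(datetime.now(timezone.utc)))
now = datetime(2019, 7, 20, tzinfo=timezone.utc)
doc = {"stuff": [
    {"name": "egg", "expires": datetime(2019, 7, 19, 12, 52, 56, tzinfo=timezone.utc)},
    {"name": "milk", "expires": datetime(2019, 8, 1, tzinfo=timezone.utc)},
    {"name": "salt"},
]}
print([item["name"] for item in apply_pull_expired(doc, now)["stuff"]])  # ['milk', 'salt']
```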

MongoDB - Query Available Date Range with a Date Range for Hotel

I have an array of objects in Mongo containing the dates when a hotel is available to book. It looks something like this, using ISO date formats.
Here's what a document looks like, trimmed for the example.
available: [
  {
    "start": "2014-04-07T00:00:00.000000",
    "end": "2014-04-08T00:00:00.000000"
  },
  {
    "start": "2014-04-12T00:00:00.000000",
    "end": "2014-04-15T00:00:00.000000"
  },
  {
    "start": "2014-04-17T00:00:00.000000",
    "end": "2014-04-22T00:00:00.000000"
  }
]
Now I need to query with two dates, a check-in date and a check-out date. If the hotel is available for those dates, Mongo should return the document; otherwise it should not. Here are a few test cases:
2014-04-06 TO 2014-04-08 should NOT return.
2014-04-13 TO 2014-04-16 should NOT return.
2014-04-17 TO 2014-04-21 should return.
How would I turn this into a Mongo query? $elemMatch looked like a good start, but I don't know where to take it from there so that all three examples above work with the same query. Any help is appreciated.
db.collection.find({
  "available": {
    "$elemMatch": {
      "start": { "$lte": new Date("2014-04-17") },
      "end": { "$gte": new Date("2014-04-21") }
    }
  }
})
How about this command?
Well, I actually hope your documents have real ISODates rather than what appear to be strings. When they do, the following query form matches as expected (a slot must start on or before the check-in date and end on or after the check-out date):
db.collection.find({
  "available": {
    "$elemMatch": {
      "start": { "$lte": new Date("2014-04-17") },
      "end": { "$gte": new Date("2014-04-21") }
    }
  }
})
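The three test cases from the question can be checked against this containment condition with a small Python sketch. It assumes the dates are stored as real dates, as recommended above. The helper names `availability_filter` and `is_available` are hypothetical; `availability_filter` builds the pymongo-style filter, and `is_available` mirrors the `$elemMatch` logic client-side.

```python
from datetime import datetime

def availability_filter(check_in, check_out):
    """Match hotels with a single available slot covering the whole stay."""
    return {"available": {"$elemMatch": {
        "start": {"$lte": check_in},
        "end": {"$gte": check_out},
    }}}

def is_available(doc, check_in, check_out):
    """Client-side mirror of the $elemMatch condition above."""
    return any(slot["start"] <= check_in and slot["end"] >= check_out
               for slot in doc.get("available", []))

hotel = {"available": [
    {"start": datetime(2014, 4, 7), "end": datetime(2014, 4, 8)},
    {"start": datetime(2014, 4, 12), "end": datetime(2014, 4, 15)},
    {"start": datetime(2014, 4, 17), "end": datetime(2014, 4, 22)},
]}

stays = [((2014, 4, 6), (2014, 4, 8)), ((2014, 4, 13), (2014, 4, 16)), ((2014, 4, 17), (2014, 4, 21))]
for start, end in stays:
    check_in, check_out = datetime(*start), datetime(*end)
    print(check_in.date(), check_out.date(), is_available(hotel, check_in, check_out))
# 2014-04-06 2014-04-08 False
# 2014-04-13 2014-04-16 False
# 2014-04-17 2014-04-21 True
```

Both conditions must hold on the same array element, which is exactly what `$elemMatch` enforces; without it, `start` and `end` could be satisfied by two different slots.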