My document structure:
{
  _id: ObjectId,
  month: '2014-01',
  daily: {
    '01': {},
    '02': {},
    '03': {},
    ...
    '31': {}
  }
}
Now I want to query the objects in daily that fall within a range, say 08 to 13, i.e. only the objects whose keys are greater than '08' and less than '13'. The keys ('01', '02', ..., '31') in the daily object are generated dynamically. I don't want to retrieve the whole daily object and then process it in the backend. Please help.
You can't query for slices of an embedded document. Since daily is embedded in the month document, you can't treat its individual entries as individual objects.
If your query looks for individual days, you should consider modeling your data appropriately by creating a single document for each day, e.g.:
{
  _id: { month: '2014-01', day: 1 },
  /* rest of daily data here */
}
This will allow you to query for particular days with or without a specific month.
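With that model, a range of days becomes an ordinary range query. A minimal sketch, assuming the per-day documents above (the collection name daily is an assumption):

db.daily.find({
  '_id.month': '2014-01',          // restrict to one month (optional)
  '_id.day': { $gt: 8, $lt: 13 }   // matches days 9 through 12
})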
Related
I am trying to use the Bucket Pattern design for my DB collection, following the concept from this MongoDB post. I wonder what the smart and usual way is to handle the first document when the collection is empty.
For example, say we create a collection named UserBookMarks where each bucket stores up to 50 bookmarks. The first time a user adds a bookmark, there is no document with an array to push to. And when a user adds their 51st bookmark, how can I know that the previous bucket is full so I can create a new one?
Here is the scenario I am considering: when a user adds a bookmark, I query for that user's bucket with count < 50 and push to it. If no such bucket exists, I create one and push. But that costs an extra query to find the data. Does MongoDB have any built-in support for this type of design pattern?
First and foremost, when learning patterns it is imperative to learn them in the context of the use cases where they are appropriate.
From the page referred to in the question:
The Bucket Pattern
With data coming in as a stream
I have a hard time imagining UserBookMarks coming in as a stream, but let's put that aside and assume there is such a stream.
The more important part of the pattern, though, is that it is intended for optimisation: the document contains the bucket boundaries. In the example from the same page:
start_date: ISODate("2019-01-31T10:00:00.000Z"),
end_date: ISODate("2019-01-31T10:59:59.000Z"),
or "a day" if you translate it to English. The point is, the measurements bucket is not limited by number of documents, but timestamps of the documents within the bucket. The idea is you don't index timestamps for all measurements, but only boundaries and save on RAM to keep shorter index. It is not uncommon to combine it with pre-aggregation to calculate some stats on documents in the bucket at write time.
Setting boundaries of the bucket by number of documents in the bucket defeats this purpose.
The answer
Assuming the use case from the article, the "smart and usual way" is to use upserts:
db.collection.updateOne(
  {
    start_date: ISODate("2019-01-31T10:00:00.000Z"),
    end_date: ISODate("2019-01-31T10:59:59.000Z")
  },
  {
    // On an upsert, $push creates the measurements array with this
    // first element, so no $setOnInsert is needed; combining $push
    // and $setOnInsert on the same path would raise a conflict error.
    $push: {
      measurements: {
        timestamp: ISODate("2019-01-31T10:42:00.000Z"),
        temperature: 42
      }
    }
  },
  { upsert: true }
)
The boundaries start_date and end_date are calculated on the application level from the measurement date ISODate("2019-01-31T10:42:00.000Z").
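For illustration, a minimal sketch of that calculation, assuming hourly buckets (the helper name hourBounds is hypothetical):

function hourBounds(ts) {
  const start = new Date(ts);
  start.setUTCMinutes(0, 0, 0);  // truncate to the start of the hour
  // last whole second of the same hour, matching the end_date style above
  const end = new Date(start.getTime() + 60 * 60 * 1000 - 1000);
  return { start_date: start, end_date: end };
}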
If a document with these boundaries already exists in the collection, the command will $push the new measurement onto the bucket. Otherwise a new document is inserted, and $push initialises the measurements array with that first measurement.
I have an application that stores users and their behavior in the form of events (in two collections). An important feature of the application is to query the users based on behavior and properties, for example: "All users with property X that did trigger event Y at least 10 times within the last Z days/weeks/months".
Until now I've been doing this on the raw data using aggregate() on the contact collection with a $lookup on events. However as the event collection grows this becomes slow.
My idea is to store some pre-aggregated version of the event data in a separate collection, with one document for every unique event/user combination. These documents would store:
the total number of times this event was triggered by this user
the last time he triggered it
how often he triggered it in some pre-defined time intervals for the last days/weeks/months
The documents in that collection would look like this:
{
event: 'my_event',
user: '593aaa84c685604066a6a0cf',
total: 79,
last: '2016-11-01T04:39:52.667Z',
days: { 0: 4, 1: 8, 2: 4 ... }, // 7 values here (0=today)
weeks: { ... }, // 3 values here
months: { ... } // 12 values here
}
However I'm struggling to figure out a good strategy for computing the values for the time intervals and how to keep this data up to date.
Do you have any suggestions or alternative approaches? Every idea is welcome :)
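One way to keep such a document current is to bump its counters atomically on every incoming event. A minimal sketch, assuming a collection named event_stats and that days.0 means "today" as in the sample document above:

db.event_stats.updateOne(
  { event: 'my_event', user: '593aaa84c685604066a6a0cf' },
  {
    $inc: { total: 1, 'days.0': 1 },  // lifetime total and today's bucket
    $set: { last: new Date() }        // most recent trigger time
  },
  { upsert: true }
)

Rotating the buckets (shifting days.0 into days.1 at midnight, rolling days into weeks, and so on) would still need a scheduled job; the upsert only keeps the current bucket fresh.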
I'm working with a Cloudant NoSQL database. Data is received every 15 seconds, so every 5 minutes there will be 20 new documents in the database.
I need to get documents whose timestamps are five minutes apart, for example a document with timestamp "2017-03-14 T 18:21:58" and a document with timestamp "2017-03-14 T 18:26:58", and so on...
Sample document:
Make a view keyed on the timestamp. I interpret your question as "get all 20 documents sharing a particular timestamp". If that's the case you can get away with not parsing the timestamp:
function(doc) {
  if (doc && doc.timestamp) {
    emit(doc.timestamp, 1)  // index every document by its raw timestamp string
  }
}
You can now query the view:
% curl 'https://USER:PASS@ACCOUNT.cloudant.com/DATABASE/_design/DDOC/_view/VIEWNAME?key="TIMESTAMP"&include_docs=true&limit=20'
Substitute the uppercase bits with the relevant names and parameters for your system. Note that the key must be JSON-encoded, hence the quotes around the timestamp string.
If you need finer granularity for your query (time series style), there is an old-but-useful blog from one of the Cloudant founders on the topic here: https://cloudant.com/blog/mapreduce-from-the-basics-to-the-actually-useful/
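If instead you want every document whose timestamp falls inside a five-minute window, the same view can be queried as a range. A sketch, where START and END are placeholders for JSON-encoded timestamp strings in the same format the documents use:

% curl 'https://USER:PASS@ACCOUNT.cloudant.com/DATABASE/_design/DDOC/_view/VIEWNAME?startkey="START"&endkey="END"&include_docs=true'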
I'm looking to store pre-aggregated report data, grouped by day, in a document. I've got two options:
Option 1: flat-ish schema
For each day within timeOnSite, use a yymmdd key. All of these keys would be direct children of timeOnSite, so each key path will be timeOnSite.yymmdd.
{
"timeOnSite": {
"150421": 25,
"150418": 42
}
}
Option 2: nested schema
The nested key paths will now look like this: timeOnSite.yy.mm.dd.
{
"timeOnSite": {
"15": {
"04": {
"21": 25,
"18": 42
}
}
}
}
I have very little preference on how the data is stored. I care about:
Fast queries
Using the smallest amount of storage space possible
Anyone know whether either of the above approaches is better? Many thanks.
Update: to address the question "what kind of queries will I run", it could be any of the following:
Get the total time on site in the last 30 days: use a $project in the aggregation pipeline to $add up the total time on site; a further $match would be applied after this step for filtering (see the sketch after this list).
Detect whether customers spent any time on site in the last 30 days, indicated by the presence of a value under any of the above keys. This would be a big $or query, with each clause checking for the existence of a different key.
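For illustration, minimal sketches of both query shapes against Option 1's flat keys (the collection name customers and the specific dates are assumptions):

// 1. Total time on site across the days in the window;
//    $ifNull guards days that have no entry.
db.customers.aggregate([
  { $project: {
      totalTime: { $add: [
        { $ifNull: ['$timeOnSite.150421', 0] },
        { $ifNull: ['$timeOnSite.150418', 0] }
        // ...one operand per day in the 30-day window
      ] }
  } },
  { $match: { totalTime: { $gt: 0 } } }
])

// 2. Any activity in the window: one $exists clause per day.
db.customers.find({
  $or: [
    { 'timeOnSite.150421': { $exists: true } },
    { 'timeOnSite.150418': { $exists: true } }
    // ...one clause per day in the 30-day window
  ]
})

With Option 2's nested schema the paths simply get longer (e.g. timeOnSite.15.04.21); the query shapes stay the same.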
I've a collection named Events. Each Event document has a collection of Participants as embedded documents.
Now my question is: is there a way to query an Event and get only the Participants that match, e.g., Age > 18?
When you query a collection in MongoDB, by default it returns the entire document which matches the query. You could slice it and retrieve a single subdocument if you want.
If all you want is the Participants who are older than 18, it would probably be best to do one of two things:
1. Store them in a subdocument inside the event document called "Over18" or something. Insert them into that document (and possibly the other if you want) and then, when you query the collection, you can instruct the database to return only the "Over18" subdocument. The downside is that you store your participants in two different subdocuments and you have to figure out their age before inserting. This may or may not be feasible depending on your application. If you need to check against arbitrary ages (i.e. sometimes it's 18 but sometimes it's 21 or 25, etc.) then this will not work.
2. Query the collection, retrieve the Participants subdocument, and then filter it in your application code. Despite what some people may believe, this isn't terrible, because you don't want your database to be doing too much work all the time. Offloading the computation to your application can actually benefit your database, because it can then spend more time querying and less time filtering. That leads to better scalability in the long run.
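A sketch of the second option in the mongo shell, assuming a collection named events (eventId is a placeholder; the filtering itself is plain JavaScript):

const event = db.events.findOne({ _id: eventId }, { participants: 1 })
const adults = event.participants.filter(p => p.age > 18)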
Short answer: no. I tried to do the same a couple of months back, but MongoDB does not support it (at least in versions <= 1.8). The same question has been asked in their Google Group for sure. You can either store the participants as a separate collection or get the whole documents and then filter them on the client. Far from ideal, I know. I'm still trying to figure out the best way around this limitation.
For future reference: This will be possible in MongoDB 2.2 using the new aggregation framework, by aggregating like this:
db.events.aggregate([
  { $unwind: '$participants' },
  { $match: { 'participants.age': { $gte: 18 } } },
  { $project: { participants: 1 } }
])
This will return a list of n documents, where n is the number of participants over 18, and each entry looks like this (note that the participants field now holds a single embedded document instead of an array):
{
  _id: objectIdOfTheEvent,
  participants: { firstName: 'only one', lastName: 'participant' }
}
It could probably even be flattened on the server to return a plain list of participants. See the official documentation for more information.
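For reference, on modern servers that flattening can be done in the pipeline itself. A sketch ($replaceRoot requires MongoDB 3.4 or newer):

db.events.aggregate([
  { $unwind: '$participants' },
  { $match: { 'participants.age': { $gte: 18 } } },
  { $replaceRoot: { newRoot: '$participants' } }  // each result is a bare participant document
])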