Mongo - Treat null or undefined field as a specific value - mongodb

I have the following situation:
Consider a collection with the following documents:
[
{
'_id': ObjectId('something'),
'date': null
},
{
'_id': ObjectId('something'),
},
{
'_id': ObjectId('something'),
'date': '2015-01-01 12:12:12'
},
many others
]
Now I have the following query that finds documents with a date between two values:
db.getCollection('validation_archive').find({'date': {$lte: '[date_here]', $gte: '[date_here]'}});
All works fine, except for documents where the field is null or nonexistent.
Is there any way I can tell mongo to treat null as '0000-00-00 00:00:00'?
Edit: I need to do this so that if the date sent in $gt is 0000-00-00 00:00:00, the query returns those documents in the result.

In a general query, no. You can always exclude them from the results, as in:
db.getCollection('validation_archive').find({
"date": { "$lte": date_to, "$gte": date_from, "$ne": null }
})
Or you can be "inclusive" with the "zero" or "epoch" date you suggest using .aggregate():
db.getCollection('validation_archive').aggregate([
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$gte": [ { "$ifNull": [ "$date", new Date(0) ] }, date_from ] },
{ "$lte": [ { "$ifNull": [ "$date", new Date(0) ] }, date_to ] }
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
But in the context of what you are asking, we would have to ask "What is the point?"
Or even if you must:
db.getCollection('validation_archive').aggregate([
{ "$project": {
"date": { "$ifNull": [ "$date", new Date(0) ] }
}},
{ "$match": {
"$or": [
{ "date": { "$lte": date_to, "$gte": date_from } },
{ "date": { "$eq": new Date(0) } }
]
}}
])
And that is completely inclusive in results.
But then again why not just do:
db.getCollection('validation_archive').find({
"$or": [
{ "date": { "$lte": date_to, "$gte": date_from } },
{ "date": null },
{ "date": { "$exists": false } }
]
})
Which is a lot more efficient.
So it is possible to "project" as date where not present, but it mostly makes sense to simply use the basic query operations instead.
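To make the "treat missing as a sentinel" idea concrete outside the database, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB code: the sample documents, the helper names effectiveDate and inRange, and the use of the Unix epoch as the sentinel are all assumptions for illustration, with ?? standing in for $ifNull.

```javascript
// Hypothetical sample documents: one null date, one missing date, one real date.
const docs = [
  { _id: 1, date: null },
  { _id: 2 },
  { _id: 3, date: new Date("2015-01-01T12:12:12Z") },
];

// Assumed sentinel standing in for '0000-00-00 00:00:00'.
const SENTINEL = new Date(0);

// Mirrors what $ifNull does: fall back to the sentinel when the
// field is null or absent.
function effectiveDate(doc) {
  return doc.date ?? SENTINEL;
}

// Inclusive range check on the effective date.
function inRange(doc, from, to) {
  const d = effectiveDate(doc).getTime();
  return d >= from.getTime() && d <= to.getTime();
}

const from = new Date(0);
const to = new Date("2016-01-01T00:00:00Z");
const matchedIds = docs.filter((doc) => inRange(doc, from, to)).map((doc) => doc._id);
console.log(matchedIds); // [ 1, 2, 3 ]
```

If the lower bound is moved past the sentinel, the null and missing documents drop out again, which is the behaviour the question asks for.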

Try to put value of date as 0 at the time of insert itself. It'll help you execute the query.
If the collection is already existing you can update the date field by using update queries.
Use mongo's projection technique to insert a new field, say, newDate with value either 0 or actual date. Use filter queries on newDate field after that.

If you want to exclude results where date is null or nonexistent, then:
db.getCollection('validation_archive').find({
date: {
$ne: null,
$lt: '[date_here]',
$gt: '[date_here]'
}
});

Related

MongoDB Query on multiple field condition

I have a MongoDB model that is currently like this (this is the stripped version):
{
title: String,
type: {
type: String,
lowercase: true,
enum: ['event', 'regular', 'project'],
},
project_start_time: Date,
project_end_time: Date,
regular_start_date: Date,
regular_end_date: Date,
events: [{
id: Number,
date: Date
}]
}
Now, I want to query something like this:
Find data where the regular_end_date, project_end_time, and events at the last index are lower than the date provided
The catch is, not every data has the three criteria above because it is available according to the types (Sorry for the messy data, it is already there). Below is an example:
If the data type is an event, then there are events
If the data type is regular, then there are regular_start_date and regular_end_date
If the data type is a project, then there are project_start_time and project_end_time
So far, I've tried to use this:
db.data.find({
"$or": [
{
"project_end_time": {
"$lt": ISODate("2022-12-27T10:09:49.753Z")
},
},
{
"regular_end_date": {
"$lt": ISODate("2022-12-27T10:09:49.753Z")
}
},
{
"$expr": {
"$lt": [
{
"$getField": {
"field": "date",
"input": {
"$last": "$events"
}
}
},
ISODate("2022-12-27T10:09:49.753Z")
]
}
}
]
})
Also with aggregation pipeline:
db.data.aggregate([
{
$match: {
"$or": [{
"project_end_time": {
"$lt": ISODate("2022-12-27T10:09:49.753Z")
},
},
{
"regular_end_date": {
"$lt": ISODate("2022-12-27T10:09:49.753Z")
}
},
{
"$expr": {
"$lt": [{
"$getField": {
"field": "date",
"input": {
"$last": "$events"
}
}
},
ISODate("2022-12-27T10:09:49.753Z")
]}
}]
}
}
])
But it shows all data as if it wasn't filtered according to the criteria. Any idea what I did wrong?
FYI I am using MongoDB 5.0.2
One option is to check that the relevant field exists before checking its value; otherwise a missing field evaluates to null, which is less than your requested date:
db.collection.find({
$or: [
{$and: [
{project_end_time: {$exists: true}},
{project_end_time: {$lt: ISODate("2022-12-27T10:09:49.753Z")}}
]},
{$and: [
{regular_end_date: {$exists: true}},
{regular_end_date: {$lt: ISODate("2022-12-27T10:09:49.753Z")}}
]},
{$and: [
{"events.0": {$exists: true}},
{$expr: {
$lt: [
{$last: "$events.date"},
ISODate("2022-12-27T10:09:49.753Z")
]
}}
]}
]
})
See how it works on the playground example
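The guard in this answer can be mimicked in plain JavaScript to see why it matters. The sketch below is only an analogy under stated assumptions: the sample documents and the helper names lastEventDate, unguardedBefore, guardedBefore and isOverdue are made up, and the unguarded version simply treats a missing value as "earlier than everything" to imitate how a missing field compares inside $expr.

```javascript
const cutoff = new Date("2022-12-27T10:09:49.753Z");

// Hypothetical documents, one per type, each carrying only its own fields.
const items = [
  { title: "a", type: "event", events: [{ id: 1, date: new Date("2023-06-01T00:00:00Z") }] },
  { title: "b", type: "regular", regular_end_date: new Date("2022-01-01T00:00:00Z") },
  { title: "c", type: "project", project_end_time: new Date("2023-01-01T00:00:00Z") },
];

// Date of the last event, or undefined when there are no events.
function lastEventDate(doc) {
  const events = doc.events ?? [];
  return events.length > 0 ? events[events.length - 1].date : undefined;
}

// Unguarded: a missing value is treated as earlier than any date,
// imitating null sorting before dates in the comparison order.
function unguardedBefore(value, limit) {
  return value === undefined || value < limit;
}

// Guarded: the value must exist before it is compared.
function guardedBefore(value, limit) {
  return value !== undefined && value < limit;
}

// A document qualifies when any of its three date sources is before the cutoff.
function isOverdue(doc, before) {
  return (
    before(doc.project_end_time, cutoff) ||
    before(doc.regular_end_date, cutoff) ||
    before(lastEventDate(doc), cutoff)
  );
}

const unguarded = items.filter((d) => isOverdue(d, unguardedBefore)).map((d) => d.title);
const guarded = items.filter((d) => isOverdue(d, guardedBefore)).map((d) => d.title);
console.log(unguarded); // every document matches
console.log(guarded);   // only the document with a real earlier date matches
```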

How do I query mongodb with aggregation by passing data as a parameter to filter unix time stamp

I'm querying through Metabase which is connected to a Mongodb server. The field which I'm querying is nested and is a Unix timestamp. See below
{
room_data: {
"meta": {
"xxx_unrecognized": null,
"xxx_sizecache": 0,
"id": "Hke7owir4oejq3bMf",
"createdat": 1565336450838,
"updatedat": 1565336651548,
}
}
}
The query I have written is as follows
[
{
$match: {
client_id: "{{client_id}}",
"room_data.meta.createdat": {
$gt: "{{start}}",
$lt: "{{end}}",
}
}
},
{
$group: {
_id: "$room_data.recipe.id",
count: {
$sum: 1
}
}
}
]
I do not get any results because the field room_data.meta.createdat is a Unix timestamp, not a date like the one I'm passing in (Aug 20, 2020). Here start and end are parameters (a Metabase feature) that I'm passing in Date format. I need some help converting those dates into Unix timestamps that can then be used to filter the results between the specific dates.
If you're using Mongo version 4.0+, you can use $toDate in your aggregation like so:
db.collection.aggregate([
{
$match: {
$expr: {
$and: [
{
$eq: [
"$client_id",
{{client_id}}
]
},
{
$lt: [
{
$toDate: "$room_data.meta.createdat"
},
{{end}}
]
},
{
$gt: [
{
$toDate: "$room_data.meta.createdat"
},
{{start}}
]
}
]
}
}
}
])
MongoPlayground
If you're on an older Mongo version, I recommend you either convert your database fields to Date type or convert your input into a number timestamp somehow (I'm unfamiliar with Metabase).
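For the "convert your input into a number timestamp" route, the conversion itself is short in JavaScript. How Metabase would run such a conversion is not covered here; the helper names toUnixMillis and createdBetween and the sample range are assumptions for illustration only.

```javascript
// Convert a date-like input (Date or ISO 8601 string) to milliseconds
// since the Unix epoch, the unit used by createdat in the sample document.
function toUnixMillis(input) {
  const ms = new Date(input).getTime();
  if (Number.isNaN(ms)) {
    throw new Error("Unparseable date: " + input);
  }
  return ms;
}

// Hypothetical helper: build the range condition on the numeric field.
function createdBetween(start, end) {
  return {
    "room_data.meta.createdat": {
      $gt: toUnixMillis(start),
      $lt: toUnixMillis(end),
    },
  };
}

const condition = createdBetween("2019-08-01T00:00:00Z", "2019-09-01T00:00:00Z");
console.log(condition);

// The sample createdat value from the question falls inside that range.
const sample = 1565336450838;
const range = condition["room_data.meta.createdat"];
console.log(sample > range.$gt && sample < range.$lt); // true
```

With numbers on both sides of the comparison, the original $match on room_data.meta.createdat can stay a plain query condition instead of an $expr.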
The last option is to use $subtract, as you can subtract a number from a date in Mongo, then check whether the resulting date is before or after 1970-01-01T00:00:00Z. The problem with this approach is that it does not consider timezones, so if your input's timezone differs from your database's or is dynamic, that is something you'll have to account for.
db.collection.aggregate([
{
$match: {
$expr: {
$and: [
{
$eq: [
"$client_id",
{{client_id}}
]
},
{
$gt: [
{
"$subtract": [
{{end}},
"$room_data.meta.createdat"
]
},
ISODate("1970-01-01T00:00:00.000Z")
]
},
{
$lt: [
{
"$subtract": [
{{start}},
"$room_data.meta.createdat"
]
},
ISODate("1970-01-01T00:00:00.000Z")
]
}
]
}
}
}
])
MongoPlayground

Return Sub-document only when matched but keep empty arrays

I have a collection set with documents like :
{
"_id": ObjectId("57065ee93f0762541749574e"),
"name": "myName",
"results" : [
{
"_id" : ObjectId("570e3e43628ba58c1735009b"),
"color" : "GREEN",
"week" : 17,
"year" : 2016
},
{
"_id" : ObjectId("570e3e43628ba58c1735009d"),
"color" : "RED",
"week" : 19,
"year" : 2016
}
]
}
I am trying to build a query which allows me to return all documents of my collection but only select the field 'results' with subdocuments if week > X and year > Y.
I can select the documents where week > X and year > Y with the aggregate function and a $match but I miss documents with no match.
So far, here is my function :
query = ModelUser.aggregate(
{$unwind:{path:'$results', preserveNullAndEmptyArrays:true}},
{$match:{
$or: [
{$and:[
{'results.week':{$gte:parseInt(week)}},
{'results.year':{$eq:parseInt(year)}}
]},
{'results.year':{$gt:parseInt(year)}},
{'results.week':{$exists: false}}
]
}},
{$group:{
_id: {
_id:'$_id',
name: '$name'
},
results: {$push:{
_id:'$results._id',
color: '$results.color',
numSemaine: '$results.numSemaine',
year: '$results.year'
}}
}},
{$project: {
_id: '$_id._id',
name: '$_id.name',
results: '$results'
}}
);
The only thing I'm missing: I need to get every 'name' even if there are no results to display.
Any idea how to do this without 2 queries?
It looks like you actually have MongoDB 3.2, so use $filter on the array. This will just return an "empty" array [] where the conditions supplied did not match anything:
db.collection.aggregate([
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$filter": {
"input": "$results",
"as": "result",
"cond": {
"$and": [
{ "$eq": [ "$$result.year", year ] },
{ "$or": [
{ "$gt": [ "$$result.week", week ] },
{ "$not": { "$ifNull": [ "$$result.week", false ] } }
]}
]
}
}
}
}}
])
Using $ifNull in place of $exists as a logical test can actually "compact" the condition further, since it returns an alternate value where the property is not present:
db.collection.aggregate([
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$filter": {
"input": "$results",
"as": "result",
"cond": {
"$and": [
{ "$eq": [ "$$result.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$$result.week", week+1 ] },
week
]}
]
}
}
}
}}
])
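The "compacting" trick relies on substituting a value that is guaranteed to pass the comparison. A rough plain-JavaScript equivalent is sketched below; the function names keepExplicit and keepCompact and the sample data are hypothetical, and ?? stands in for $ifNull.

```javascript
// Two-branch form: pass when week is greater OR when week is absent.
function keepExplicit(result, year, week) {
  const weekMissing = result.week === undefined || result.week === null;
  return result.year === year && (result.week > week || weekMissing);
}

// Compact form: substitute week + 1 for a missing week so the single
// "greater than" comparison succeeds on its own.
function keepCompact(result, year, week) {
  return result.year === year && (result.week ?? week + 1) > week;
}

// Hypothetical array content, including one element without a week.
const results = [
  { color: "GREEN", week: 17, year: 2016 },
  { color: "RED", week: 19, year: 2016 },
  { color: "BLUE", year: 2016 },
];

const explicit = results.filter((r) => keepExplicit(r, 2016, 18)).map((r) => r.color);
const compact = results.filter((r) => keepCompact(r, 2016, 18)).map((r) => r.color);
console.log(explicit, compact); // both forms select the same elements

// When nothing matches, the result is an empty array rather than nothing at all.
const none = results.filter((r) => keepCompact(r, 2017, 18));
console.log(none.length); // 0
```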
In MongoDB 2.6 releases, you can probably get away with using $redact and $$DESCEND, but of course need to fake the match in the top level document. This has similar usage of the $ifNull operator:
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$eq": [{ "$ifNull": [ "$year", year ] }, year ] },
{ "$gt": [
{ "$ifNull": [ "$week", week+1 ] },
week
]}
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
If you actually have MongoDB 2.4, then you are probably better off filtering the array content in client code instead. Every language has methods for filtering array content, but as a JavaScript example reproducible in the shell:
db.collection.find().forEach(function(doc) {
doc.results = doc.results.filter(function(result) {
return (
result.year == year &&
( result.hasOwnProperty('week') ? result.week > week : true )
)
});
printjson(doc);
})
The reason is that prior to MongoDB 2.6 you need to use $unwind and $group, and various stages in-between. This is a "very costly" operation on the server, considering that all you want to do is remove items from the arrays of documents and not actually "aggregate" from items within the array.
MongoDB releases have gone to great lengths to provide array processing that does not use $unwind, since its usage for that purpose alone is not a performant option. It should only ever be used in the case where you are removing a "significant" amount of data from arrays as a result.
The whole point is that otherwise the "cost" of the aggregation operation is likely greater than the "cost" of transferring the data over the network to be filtered on the client instead. Use with caution:
db.collection.aggregate([
// Create an array if one does not exist or is already empty
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$cond": [
{ "$ifNull": [ "$results.0", false ] },
"$results",
[false]
]
}
}},
// Unwind the array
{ "$unwind": "$results" },
// Conditionally $push based on match expression and conditionally count
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"user": { "$first": "$user" },
"results": {
"$push": {
"$cond": [
{ "$or": [
{ "$not": "$results" },
{ "$and": [
{ "$eq": [ "$results.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$results.week", week+1 ] },
week
]}
]}
] },
"$results",
false
]
}
},
"count": {
"$sum": {
"$cond": [
{ "$and": [
{ "$eq": [ "$results.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$results.week", week+1 ] },
week
]}
] },
1,
0
]
}
}
}},
// $unwind again
{ "$unwind": "$results" },
// Filter out false items unless count is 0
{ "$match": {
"$or": [
{ "results": { "$ne": false } },
{ "count": 0 }
]
}},
// Group again
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"user": { "$first": "$user" },
"results": { "$push": "$results" }
}},
// Now swap [false] for []
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$cond": [
{ "$ne": [ "$results", [false] ] },
"$results",
[]
]
}
}}
])
Now that is a lot of operations and shuffling just to "filter" content from an array compared to all of the other approaches which are really quite simple. And aside from the complexity, it really does "cost" a lot more to execute on the server.
So if your server version actually supports the newer operators that can do this optimally, then it's okay to do so. But if you are stuck with that last process, then you probably should not be doing it and instead do your array filtering in the client.

Count Multiple Date Ranges in a Query

I have the following aggregate query which gives me counts (countA) for a given date range period. In this case 01/01/2016-03/31/2016. Is it possible to add a second date range period, for example 04/01/2016-07/31/2016, and count these as countB?
db.getCollection('customers').aggregate(
{$match: {"status": "Closed"}},
{$unwind: "$lines"},
{$match: {"lines.status": "Closed"}},
{$match: {"lines.deliveryMethod": "Tech Delivers"}},
{$match: {"date": {$gte: new Date('01/01/2016'), $lte: new Date('03/31/2016')}}},
{$group:{_id:"$lines.productLine",countA: {$sum: 1}}}
)
Thanks in advance
Sure, and you can also simplify your pipeline stages quite a lot, mostly since successive $match stages are really a single stage, and that you should always use match criteria at the beginning of any aggregation pipeline. Even if it doesn't actually "filter" the array content, it at least just selects the documents containing entries that will actually match. This speeds things up immensely, and especially with large data sets.
For the two date ranges, well this is just an $or query argument. Also it would be applied "before" the array filtering is done, since after all it is a document level match to begin with. So again, in the very first pipeline $match:
db.getCollection('customers').aggregate([
// Filter all document conditions first. Reduces things to process.
{ "$match": {
"status": "Closed",
"lines": { "$elemMatch": {
"status": "Closed",
"deliveryMethod": "Tech Delivers"
}},
"$or": [
{ "date": {
"$gte": new Date("2016-01-01"),
"$lt": new Date("2016-04-01")
}},
{ "date": {
"$gte": new Date("2016-04-01"),
"$lt": new Date("2016-08-01")
}}
]
}},
// Unwind the array
{ "$unwind": "$lines" },
// Filter just the matching elements
// Successive $match is really just one pipeline stage
{ "$match": {
"lines.status": "Closed",
"lines.deliveryMethod": "Tech Delivers"
}},
// Then group on the productline values within the array
{ "$group":{
"_id": "$lines.productLine",
"countA": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-01-01") ] },
{ "$lt": [ "$date", new Date("2016-04-01") ] }
]},
1,
0
]
}
},
"countB": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-04-01") ] },
{ "$lt": [ "$date", new Date("2016-08-01") ] }
]},
1,
0
]
}
}
}}
])
The $or basically "joins" two result sets as it looks for "either" range criteria to apply. As this is given in addition to the other arguments, the logic is an "AND" condition as with the others on the criteria met with either $or argument. Note the $gte and $lt combination is also another form of expressing "AND" conditions on the same key.
The $elemMatch is applied since "both" criteria are required on the array element. If you just directly applied them with "dot notation", then all that really asks is that "at least one array element" matches each condition, rather than the array element matching "both" conditions.
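The difference is easy to see in plain JavaScript, where Array.prototype.some plays the part of "at least one array element". This is only an analogy; the sample document and the predicate names isClosed and isTechDelivered are assumptions for illustration.

```javascript
// Hypothetical document: no single line is both "Closed" and "Tech Delivers".
const order = {
  lines: [
    { status: "Closed", deliveryMethod: "Pickup" },
    { status: "Open", deliveryMethod: "Tech Delivers" },
  ],
};

const isClosed = (line) => line.status === "Closed";
const isTechDelivered = (line) => line.deliveryMethod === "Tech Delivers";

// Like separate "dot notation" conditions: each condition may be
// satisfied by a different array element.
const dotNotationStyle = order.lines.some(isClosed) && order.lines.some(isTechDelivered);

// Like $elemMatch: one element must satisfy both conditions.
const elemMatchStyle = order.lines.some((line) => isClosed(line) && isTechDelivered(line));

console.log(dotNotationStyle); // true
console.log(elemMatchStyle);   // false
```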
The later filtering after $unwind can use the "dot notation" since the array elements are now "de-normalised" into separate documents. So there is only one element per document to now match the conditions.
When you apply the $group, instead of just using { "$sum": 1 } you "conditionally" assess whether to count it or not by using $cond. Since both date ranges are within the results, you just need to determine if the current document being "rolled up" belongs to one date range or another. As a "ternary" (if/then/else) operator, this is what $cond provides.
It looks at the values within "date" in the document and if it matches the condition set ( first argument - if ) then it returns 1 ( second argument - then ), else it returns 0, effectively not adding to the current count.
Since these are "logical" conditions then the "AND" is expressed with a logical $and operator, which itself returns true or false, requiring both contained conditions to be true.
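As a rough picture of what the two conditional sums do, here is the same counting written as a plain JavaScript reduce. The rows, the range objects and the helper name oneIfInside are hypothetical; the ternary stands in for $cond and the accumulator object for the $group output.

```javascript
// Hypothetical half-open ranges [start, end), matching the $gte / $lt pairs.
const rangeA = { start: new Date("2016-01-01"), end: new Date("2016-04-01") };
const rangeB = { start: new Date("2016-04-01"), end: new Date("2016-08-01") };

// Plays the role of $cond: 1 when the date is inside the range, else 0.
function oneIfInside(date, range) {
  return date >= range.start && date < range.end ? 1 : 0;
}

// Hypothetical unwound rows, all belonging to one product line.
const rows = [
  { date: new Date("2016-02-15T10:00:00Z") },
  { date: new Date("2016-03-31T23:59:59Z") },
  { date: new Date("2016-04-01T00:00:00Z") },
  { date: new Date("2016-07-31T12:00:00Z") },
];

// Plays the role of the two $sum accumulators in $group.
const counts = rows.reduce(
  (acc, row) => ({
    countA: acc.countA + oneIfInside(row.date, rangeA),
    countB: acc.countB + oneIfInside(row.date, rangeB),
  }),
  { countA: 0, countB: 0 }
);

console.log(counts); // { countA: 2, countB: 2 }
```

Because the ranges are half-open, the row at exactly 2016-04-01T00:00:00Z is counted once, in countB only.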
Also note the correction in the Date object constructors, since if you do not instantiate with the string in that representation then the resulting Date is in "localtime" as opposed to the "UTC" format in which MongoDB is storing the dates. Only use a "local" constructor if you really mean that, and often people really don't.
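The two constructor forms can be compared directly. This is a small sketch, assuming a common JavaScript engine such as V8; parsing of non-ISO strings like "01/01/2016" is implementation-dependent, so treat the second half as typical behaviour rather than a guarantee.

```javascript
// ISO 8601 date-only strings are parsed as UTC midnight.
const isoForm = new Date("2016-01-01");
console.log(isoForm.toISOString()); // 2016-01-01T00:00:00.000Z

// Non-ISO strings such as "01/01/2016" are not covered by the
// ECMAScript date string format; common engines parse them as
// local midnight.
const slashForm = new Date("01/01/2016");
const localMidnight = new Date(2016, 0, 1);
console.log(slashForm.getTime() === localMidnight.getTime());

// The two forms differ by the local UTC offset at that moment,
// so they only coincide when the machine runs in UTC.
const offsetMinutes = (slashForm.getTime() - isoForm.getTime()) / 60000;
console.log(offsetMinutes === slashForm.getTimezoneOffset());
```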
The other note is the $lt date change, which should always be "one day" greater than the last date you are looking for. Remember these are "beginning of day" dates, and therefore you usually want all possible times within the date, and not just up to the beginning. So it's "less than the next day" as the correct condition.
For the record, with MongoDB versions from 2.6, it's likely better to "pre-filter" the array content "before" you $unwind. This removes the overhead of producing new documents in the "de-normalizing" that occurs that would not match the conditions you want to apply to array elements.
For MongoDB 3.2 and greater, use $filter:
db.getCollection('customers').aggregate([
// Filter all document conditions first. Reduces things to process.
{ "$match": {
"status": "Closed",
"lines": { "$elemMatch": {
"status": "Closed",
"deliveryMethod": "Tech Delivers"
}},
"$or": [
{ "date": {
"$gte": new Date("2016-01-01"),
"$lt": new Date("2016-04-01")
}},
{ "date": {
"$gte": new Date("2016-04-01"),
"$lt": new Date("2016-08-01")
}}
]
}},
// Pre-filter the array content to matching elements
{ "$project": {
"lines": {
"$filter": {
"input": "$lines",
"as": "line",
"cond": {
"$and": [
{ "$eq": [ "$$line.status", "Closed" ] },
{ "$eq": [ "$$line.deliveryMethod", "Tech Delivers" ] }
]
}
}
}
}},
// Unwind the array
{ "$unwind": "$lines" },
// Then group on the productline values within the array
{ "$group":{
"_id": "$lines.productLine",
"countA": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-01-01") ] },
{ "$lt": [ "$date", new Date("2016-04-01") ] }
]},
1,
0
]
}
},
"countB": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-04-01") ] },
{ "$lt": [ "$date", new Date("2016-08-01") ] }
]},
1,
0
]
}
}
}}
])
Or for at least MongoDB 2.6, then apply $redact instead:
db.getCollection('customers').aggregate([
// Filter all document conditions first. Reduces things to process.
{ "$match": {
"status": "Closed",
"lines": { "$elemMatch": {
"status": "Closed",
"deliveryMethod": "Tech Delivers"
}},
"$or": [
{ "date": {
"$gte": new Date("2016-01-01"),
"$lt": new Date("2016-04-01")
}},
{ "date": {
"$gte": new Date("2016-04-01"),
"$lt": new Date("2016-08-01")
}}
]
}},
// Pre-filter the array content to matching elements
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$eq": [ "$status", "Closed" ] },
{ "$eq": [
{ "$ifNull": ["$deliveryMethod", "Tech Delivers" ] },
"Tech Delivers"
]}
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}},
// Unwind the array
{ "$unwind": "$lines" },
// Then group on the productline values within the array
{ "$group":{
"_id": "$lines.productLine",
"countA": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-01-01") ] },
{ "$lt": [ "$date", new Date("2016-04-01") ] }
]},
1,
0
]
}
},
"countB": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-04-01") ] },
{ "$lt": [ "$date", new Date("2016-08-01") ] }
]},
1,
0
]
}
}
}}
])
Noting that funny little $ifNull in there which is necessary due to the recursive nature of $$DESCEND, since all levels of the document are inspected, including the "top level" document and then "descending" into subsequent arrays and members or even nested objects. The "status" field is present and has a value of "Closed" due to earlier query selection criteria for the top level field, but of course there is no "top level" element called "deliveryMethod", since it is only within the array elements.
That is basically the "care" that needs to be taken when using $redact like this, and if the structure of the document does not allow such conditions, then it's not really an option, so revert to processing $unwind then $match instead.
But where possible, use those methods in preference to the $unwind then $match processing, as it will save considerable time and use less resources by using the newer techniques instead.

GroupBy DayOfMonth in mongodb but project Complete Date

I have a Collection containing a date field. I want to group it by dayOfMonth but at the time of projection I want to project the complete Date and associated count.
I have a raw Collection in mongodb containing a Timestamp (Date field)
This is my Aggregation query:
db.raw.aggregate(
{
"$match" : { "Timestamp":{$gte:new Date("2012-05-30T00:00:00.000Z"),$lt:new Date("2014-05-31T00:00:00.000Z")}}
},
{
$group:
{
_id: { ApplicationId: "$ApplicationId", date: {$dayOfMonth: '$Timestamp'} },
count: { $sum: 1 }
}
}
)
In the above query I'm grouping with dayOfMonth, but how can I project the complete Date with the count?
Your "Timestamp" values are clearly actual points in time so there really isn't a "complete date" to return. You could just generally "do the math" based on the date range you are applying and the "day of month" values returned as you process the results returned.
But alternately you could just "apply the math" to the date values in order by rounding the "timestamp" values out to the day. The returned values are no longer date objects, but they are the millisecond since epoch values, so it is relatively easy to "seed" those to date functions:
db.raw.aggregate([
{ "$match" : {
"Timestamp":{
"$gte": new Date("2012-05-30"),
"$lt": new Date("2014-05-31")
}
}},
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$Timestamp", new Date("1970-01-01") ] },
{ "$mod": [
{ "$subtract": [ "$Timestamp", new Date("1970-01-01") ] },
1000 * 60 * 60 * 24
]}
]
},
"count": { "$sum": 1 }
}}
])
So when you subtract one date object from another, the difference in milliseconds is returned as a number. So this just normalizes to epoch milliseconds by subtracting the epoch date. The rest is basic date math to round the result down to the start of the current day.
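The same rounding can be checked in plain JavaScript, which also shows how to "seed" the numeric key back into a date on the client. The helper name floorToUtcDay is hypothetical, and the sketch assumes timestamps on or after 1970-01-01 (the remainder operator behaves differently for negative values).

```javascript
const MS_PER_DAY = 1000 * 60 * 60 * 24;

// Same arithmetic as the $subtract / $mod pair: drop the remainder
// of the day from the milliseconds-since-epoch value.
// Assumes a non-negative timestamp (dates from 1970 onwards).
function floorToUtcDay(date) {
  const ms = date.getTime();
  return ms - (ms % MS_PER_DAY);
}

const timestamp = new Date("2014-05-30T13:45:10.500Z");
const dayKey = floorToUtcDay(timestamp);

// The numeric key can be fed straight back into a Date.
console.log(new Date(dayKey).toISOString()); // 2014-05-30T00:00:00.000Z
```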
Alternately again you could just use other date aggregation operators and concatenate to a string, but there would be usually a bit more work involved unless those values were for direct use:
db.raw.aggregate([
{ "$match" : {
"Timestamp":{
"$gte": new Date("2012-05-30"),
"$lt": new Date("2014-05-31")
}
}},
{ "$group": {
"_id": {
"$concat": [
{ "$substr": [{ "$year": "$Timestamp" },0,4] },
"-",
{ "$substr": [{ "$month": "$Timestamp" },0,2] },
"-",
{ "$substr": [{ "$dayOfMonth": "$Timestamp" },0,2] }
]
},
"count": { "$sum": 1 }
}}
])
Neil Lunn has provided a great answer.
There's one more approach that you can use:
db.raw.aggregate([
{
"$match" :
{
"Timestamp":{"$gte": new Date("2012-05-30"), "$lt": new Date("2014-07-31")}
}
},
{
"$group" :
{
"_id":{"$dayOfMonth": "$Timestamp"},
"Date":{"$first":"$Timestamp"},
"count": { "$sum": 1 }
}
}
])
It will return the date along with the count.
Hope this helps.