How to write MongoDB query to filter selectively - mongodb

I want to write a query against a database that stores 2 types of employees. I want a list of all employees sorted by their start date, but if an employee is on 'contract' then I want to include only those whose end date is still in the future.
An example of such a collection would be:
{
  "_id": 123,
  "startDate": some date,
  "endDate": some date,
  "type": "FULL_TIME"
},
{
  "_id": 124,
  "startDate": some date,
  "endDate": some date,
  "type": "CONTRACT"
}

You can try the $or find query below. The first condition selects all active CONTRACT employees, the second selects all FULL_TIME employees, and the results are then sorted ascending on startDate.
db.collection.find({
  $or: [
    {
      $and: [
        { type: "CONTRACT" },
        { endDate: { $gt: new ISODate() } }
      ]
    },
    { type: "FULL_TIME" }
  ]
}).sort({ startDate: 1 });
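For intuition, the same selection logic can be sketched in plain JavaScript on an in-memory array (the sample documents below are hypothetical stand-ins, not data from the question):

```javascript
// Sketch of the query's selection logic on an in-memory array.
const employees = [
  { _id: 1, type: "FULL_TIME", startDate: new Date("2020-03-01"), endDate: new Date("2021-03-01") },
  { _id: 2, type: "CONTRACT",  startDate: new Date("2019-06-01"), endDate: new Date("2099-01-01") },
  { _id: 3, type: "CONTRACT",  startDate: new Date("2018-01-01"), endDate: new Date("2000-01-01") } // expired
];

const now = new Date();
const result = employees
  // Keep FULL_TIME unconditionally; keep CONTRACT only while endDate is in the future.
  .filter(e => e.type === "FULL_TIME" || (e.type === "CONTRACT" && e.endDate > now))
  // Ascending sort on startDate, mirroring .sort({ startDate: 1 }).
  .sort((a, b) => a.startDate - b.startDate);

console.log(result.map(e => e._id)); // the expired contract (_id 3) is excluded
```

Note that the $and wrapper in the query is optional: `{ type: "CONTRACT", endDate: { $gt: new ISODate() } }` expresses the same implicit AND.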

Related

mongodb/aggregation/lookup - Optimize lookup pipeline to only filter the matched record from previous pipeline and also aggregating it

DB Schema:
3 Collections - `Collection1`, `Collection2`, `Collection3`
Column Structure:
Collection1 - [ _id, rid, aid, timestamp, start_timestamp, end_timestamp ]
Collection2 - [ _id, rid, aid, task ]
Collection3 - [ _id, rid, event_type, timestamp ]
MONGO DB VERSION: 4.2.14
The problem statement is that we need to join all three collections using rid. Collection1 is the parent source from which we need the analysis.
Collection2 contains the task for each rid in Collection1; we need a "count" for each rid from this collection. Collection3 contains the event log for each rid. This collection is quite large, so we need to filter only two events, EventA & EventB, for each rid found in the first stage.
I came up with the query below, but it's not working: I am not getting the min date from Collection3 for each rid matched earlier in the pipeline.
Note: the event logs' min date for each event should be associated with the rid matched by the Collection1 filter.
Query:
db.getCollection("Collection1").aggregate([
  {
    $match: {
      start_timestamp: {
        $gte: new Date(ISODate().getTime() - 1000 * 60 * 15)
      }
    }
  },
  {
    $lookup: {
      from: "Collection2",
      localField: "rid",
      foreignField: "rid",
      as: "tasks"
    }
  },
  {
    $lookup: {
      from: "Collection3",
      pipeline: [
        { $match: { event: { $in: ["EventA", "EventB"] } } },
        { $group: { _id: "$event", timestamp: { $min: { updated: "$timestamp" } } } }
      ],
      as: "eventlogs"
    }
  }
]);
Expected Output:
[
  {
    rid: "123",
    aid: "456",
    timestamp: ISODate("2022-06-03T09:46:39.609Z"),
    start_timestamp: ISODate("2022-06-03T09:46:39.657Z"),
    tasks: { count: 5 },
    logs: [
      { "_id": "EventA", "timestamp": { "updated": ISODate("2022-04-27T06:10:44.323Z") } },
      { "_id": "EventB", "timestamp": { "updated": ISODate("2022-05-05T06:36:10.271Z") } }
    ]
  }
]
I need to write a highly optimized query that does the above quickly (assuming proper indexes are in place on each collection's columns). The query should not do a COLLSCAN over the entire Collection3, since that collection is going to be quite large.
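No answer is quoted here, but one direction worth trying on 4.2 is a correlated $lookup with let/pipeline, so the rid filter is pushed into the Collection3 sub-pipeline before the $group. This is an untested sketch built as a plain JS array so its shape is easy to inspect; field names are taken from the question, and whether the $expr correlation can actually use an index on { rid: 1, event_type: 1 } varies by server version, so verify with explain():

```javascript
// Sketch only: a correlated $lookup that filters Collection3 by the matched rid
// before grouping, instead of grouping the whole collection.
const lookupLogs = {
  $lookup: {
    from: "Collection3",
    let: { rid: "$rid" },
    pipeline: [
      { $match: {
          $expr: { $eq: ["$rid", "$$rid"] },        // correlate with the outer document
          event_type: { $in: ["EventA", "EventB"] } // schema uses event_type, not event
      } },
      { $group: { _id: "$event_type", timestamp: { $min: "$timestamp" } } }
    ],
    as: "eventlogs"
  }
};

const pipeline = [
  { $match: { start_timestamp: { $gte: new Date(Date.now() - 1000 * 60 * 15) } } },
  { $lookup: { from: "Collection2", localField: "rid", foreignField: "rid", as: "tasks" } },
  { $addFields: { tasks: { count: { $size: "$tasks" } } } }, // keep only the per-rid count
  lookupLogs
];
```

This would be run as `db.getCollection("Collection1").aggregate(pipeline)`; the $addFields stage is one hypothetical way to reduce the joined tasks array to the count shown in the expected output.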

$ne: Compare Date of entry with field in an array

I created a test collection for this and it looks like this:
{
"createDate": ISODate("2021-01-04T15:00:00.000+00:00"),
"arr": [
{
"date": ISODate("2021-01-04T15:00:00.000+00:00")
}
]
},
{
"createDate": ISODate("2021-01-04T16:00:00.000+00:00"),
"arr": [
{
"date": ISODate("2021-01-04T15:00:00.000+00:00")
}
]
}
The only difference is that the first entry has the same date in both fields, while the second one has different dates.
Now I want to write a query that finds all entries where the two dates differ. I tried this:
db.getCollection('test').find(
{
"createDate": { $ne: "arr.0.date" }
}
)
Unfortunately, this query returns both entries. What am I doing wrong?
This requires $expr, which can compare fields within the same document. To access an element at a specific index, use $arrayElemAt:
db.getCollection('test').find({
$expr: {
$ne: [
"$createDate",
{ $arrayElemAt: ["$arr.date", 0] }
]
}
})
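For intuition, the document-side comparison that $expr performs is analogous to this plain JavaScript (the sample documents mirror the ones in the question):

```javascript
// In-memory analogue of the $expr comparison above.
const docs = [
  { createDate: new Date("2021-01-04T15:00:00Z"), arr: [{ date: new Date("2021-01-04T15:00:00Z") }] },
  { createDate: new Date("2021-01-04T16:00:00Z"), arr: [{ date: new Date("2021-01-04T15:00:00Z") }] }
];

// The original find() compared createDate with the literal string "arr.0.date",
// which is never equal to a date, so every document matched. Comparing the actual
// field values (by timestamp, since Date objects compare by reference) fixes it.
const mismatched = docs.filter(d => d.createDate.getTime() !== d.arr[0].date.getTime());

console.log(mismatched.length); // only the second document remains
```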

$facet \ $bucket date manipulation

I am using a $facet aggregation for an e-commerce style platform.
I have a products collection, and the product schema contains many fields.
I am using the same aggregation process to get all the required facets.
The question is whether I can use the same $facet \ $bucket aggregation to group documents by a manipulation of a specific field.
For example, the product schema has a releaseDate field, which is of type Date.
Currently the aggregation query looks like this:
let facetsBodyExample = {
'facetName': [
{ $unwind: `$${fieldData.key}` },
{ $sortByCount: `$${fieldData.key}` }
],
'facetName2': [
{ $unwind: `$${fieldData.key}` },
{ $sortByCount: `$${fieldData.key}` }
],
...,
...
};
let results = await Product.aggregate({
$facet: facetsBodyExample
});
The documents look like:
{
_id : 'as5d16as1d65sa65d165as1d',
name : 'product name',
releaseDate: '2015-07-01T00:00:00.000Z',
field1:13,
field2:[1,2,3],
field3:'abx',
...
}
I want to create custom facets (groups) by quarter + year, in a format like 'QQ/YYYY', without defining any boundaries.
Right now I am getting groups for each exact date, and I want to group them into quarter + year buckets, if possible within the same $facet aggregation.
Query result of the date field I want to customize:
CURRENT:
{
releaseDate: [
{ "_id": "2017-01-01T00:00:00.000Z", "count": 26 },
{ "_id": "2013-04-01T00:00:00.000Z", "count": 25 },
{ "_id": "2013-07-01T00:00:00.000Z", "count": 23 },
...
]
}
DESIRED:
{
releaseDate: [
{ "_id": "Q1/2014", "count": 100 },
{ "_id": "Q2/2014", "count": 200 },
{ "_id": "Q3/2016", "count": 300 },
...
]
}
Thanks !!
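No answer is quoted here, but note there is no built-in quarter operator in the aggregation pipeline, so the label has to be derived from the month. This sketch shows the key computation in plain JavaScript; the pipeline equivalent (an assumption to verify on your server version, since $toString needs 4.0+) would combine $year, $month, $ceil, $divide and $concat in an $addFields stage before $sortByCount:

```javascript
// Sketch: derive a "Q<quarter>/<year>" label from a release date.
// A pipeline analogue would be roughly:
//   { $concat: ["Q", { $toString: { $ceil: { $divide: [{ $month: "$releaseDate" }, 3] } } },
//               "/", { $toString: { $year: "$releaseDate" } }] }
function quarterLabel(date) {
  const quarter = Math.ceil((date.getUTCMonth() + 1) / 3); // JS months are 0-based
  return `Q${quarter}/${date.getUTCFullYear()}`;
}

console.log(quarterLabel(new Date("2014-02-10T00:00:00Z"))); // "Q1/2014"
console.log(quarterLabel(new Date("2016-07-01T00:00:00Z"))); // "Q3/2016"
```

Grouping by such a computed field inside the same $facet should be possible, since each facet sub-pipeline can carry its own $addFields before $sortByCount.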

Mongo - Return items that exist with one value but not another

I have a collection of documents that represent products available in different locales. Some products are available only in certain locales. If a product isn't specifically listed for a given locale, then it's not available for that locale.
Here's a tiny example, with only the pertinent bits:
{ "locale": "US", "productId": "123" }
{ "locale": "CA", "productId": "123" }
{ "locale": "FR", "productId": "123" }
{ "locale": "US", "productId": "456" }
{ "locale": "FR", "productId": "456" }
I'd like to query for all the productIds that are available in the US, but not in CA. In this example, that'd be productId 456.
If I were doing this in SQL, I'd join the table to itself and find the answer that way. But I'm not sure how to tackle this sort of query in Mongo. What's the most appropriate way to do this sort of thing?
You can use an aggregation pipeline:
db.products.aggregate([
{ $group: { _id: "$productId", locales: { $push: "$locale" } } },
{ $match: { locales: { $eq: "US", $nin: ["CA"] } } },
{ $project: { _id: 0, productId: "$_id" } }
])
Details:
Group products by id and push all product locales into an array.
Select the documents whose locales array contains US but not CA.
Project the results to get only the productId values.
Output:
{ "productId" : "456" }
Note that you can get all product ids in a single array if you do the grouping after the match stage:
{ $group: { _id: 1, productIds: { $push: "$_id" } } }
Following the description in the question above, try executing the following aggregate query in the MongoDB shell.
db.collection.aggregate(
// Pipeline
[
// Stage 1
{
$group: {
_id:{productId:'$productId'},
locale:{$addToSet:'$locale'}
}
},
// Stage 2
{
$match: {
locale:{$in:['US'],$nin:['CA']}
}
},
]
);
The aggregate query above executes the following stages in the pipeline:
The $group stage groups documents by the values of productId and pushes the locale values associated with each productId into an array using the $addToSet operator.
The $match stage then keeps only documents whose locale array contains US, while excluding documents whose array also contains CA.
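The group-then-match logic that both answers rely on can be sketched in plain JavaScript using the sample documents from the question:

```javascript
// In-memory analogue of the group-then-match approach.
const products = [
  { locale: "US", productId: "123" },
  { locale: "CA", productId: "123" },
  { locale: "FR", productId: "123" },
  { locale: "US", productId: "456" },
  { locale: "FR", productId: "456" }
];

// Stage 1 analogue: group by productId, collecting the set of locales
// (a Set mirrors $addToSet's de-duplication).
const byId = new Map();
for (const { productId, locale } of products) {
  if (!byId.has(productId)) byId.set(productId, new Set());
  byId.get(productId).add(locale);
}

// Stage 2 analogue: keep ids available in US but not in CA.
const result = [...byId]
  .filter(([, locales]) => locales.has("US") && !locales.has("CA"))
  .map(([productId]) => productId);

console.log(result); // only productId "456" qualifies
```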

Need to aggregate by hour and $avg not recognized

From a MongoDB collection storing data with time stamps I need to return a single record for each hour.
So far I have selected the set of records between two dates successfully, but I can't figure out how to build the hourly record I need in the $group clause.
var myName = "CollectionName"
//schema for mongoose
var mySchema = new Schema({
dt: Date,
value: Number
});
var myDB = mongoose.createConnection('mongodb://localhost:27017/MYDB');
var myDBObj = myDB.model(myName, mySchema, myName);
The $match in this aggregate call works fine, and $hour creates a record for each hour of the day, but I don't know how to recreate a full date, and I get the error "unknown group operator $avg":
myDBObj.aggregate([
  {
    $match: { "dt": { $gt: new Date("October 13, 2010 12:00:00"), $lt: new Date("November 13, 2010 12:00:00") } }
  },
  {
    $group: {
      "_id": { "dt": { "$hour": "$dt" }, "price": { "$avg": "$price" } }
    }
  }
], function (err, data) { if (err) { return next(err); } res.json(data); });
I think I need to use $dayOfYear so there are separate records for each hour of each day, and include a new Date() somewhere.
Can someone help me do this correctly? Any help is appreciated.
The $group pipeline stage works by "grouping" all data by the "key" specified for _id. Other fields you are actually aggregating are separate from the _id value and are their own field properties.
So your $group becomes this instead:
{ "$group": {
"_id": { "$hour": "$dt" },
"price": { "$avg": "$price" }
}}
Or if you want that broken by day then make a compound key:
{ "$group": {
"_id": {
"day": { "$dayOfYear": "$dt" },
"hour": { "$hour": "$dt" }
},
"price": { "$avg": "$price" }
}}
Or just use date math to produce Date objects rounded by hour:
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$dt", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$dt", new Date(0) ] },
1000 * 60 *60
]}
]},
new Date(0)
]
},
"price": { "$avg": "$price" }
}}
Subtracting one date object (here the epoch date, new Date(0)) from another produces a numeric value you can round with the applied math (1000 milliseconds × 60 seconds × 60 minutes = 1 hour), and adding a number back to a date object produces the date corresponding to that value.
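The same epoch arithmetic works anywhere dates behave as milliseconds; a quick sketch:

```javascript
// Round a Date down to the start of its hour using the same math as the pipeline:
// (dt - epoch) minus ((dt - epoch) mod 1 hour), added back to the epoch.
const HOUR_MS = 1000 * 60 * 60;

function floorToHour(dt) {
  const sinceEpoch = dt.getTime();                       // $subtract: [ "$dt", new Date(0) ]
  return new Date(sinceEpoch - (sinceEpoch % HOUR_MS));  // $mod, then $add back to epoch
}

console.log(floorToHour(new Date("2010-10-13T12:34:56.789Z")).toISOString());
// "2010-10-13T12:00:00.000Z"
```

Grouping on such a value yields a real Date per hour, so no post-processing is needed to reconstruct a full date from $dayOfYear/$hour parts.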
So your problem was that you had everything inside _id, where the $avg accumulator is not recognised. All accumulators need to be specified outside of the grouping key; that is the intent.
If you want to make an accumulated value part of a grouping key (which does not seem relevant here), you instead follow with another $group stage, referencing the field produced by the former.