Determining whether a Mongo document is the first one with a particular attribute - mongodb

I need to produce a report on a set of documents with timestamps between two dates. The report must list each document and include a field indicating whether it is the first document in its group, where the group is given by an attribute.
There's a slight complication: although only documents between the two dates should be included, documents before the start date must still be considered when deciding whether each document is the first in its group.
E.g., given the data
{ "_id": 1, "group": "A", "timestamp": "2015-01-01" }
{ "_id": 2, "group": "B", "timestamp": "2015-01-02" }
{ "_id": 3, "group": "A", "timestamp": "2015-01-03" }
{ "_id": 4, "group": "C", "timestamp": "2015-01-04" }
{ "_id": 5, "group": "B", "timestamp": "2015-01-05" }
{ "_id": 6, "group": "C", "timestamp": "2015-01-06" }
Generating a report from 2015-01-02 to 2015-01-05 would return
{ "_id": 2, "group": "B", "timestamp": "2015-01-02", "first": 1 }
{ "_id": 3, "group": "A", "timestamp": "2015-01-03", "first": 0 }
{ "_id": 4, "group": "C", "timestamp": "2015-01-04", "first": 1 }
{ "_id": 5, "group": "B", "timestamp": "2015-01-05", "first": 0 }
Currently I'm doing this by sorting all documents by group and then timestamp, then looping over the entire dataset while keeping track of the previous row to decide whether a row inside the date range is the first of its group. With a large dataset this is very slow. It feels as though there must be a better way involving grouping or something clever, but my Mongo skills aren't up to the job. Any suggestions?
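One possible direction, sketched under assumptions rather than tested against a real deployment: on MongoDB 5.0 or later, $setWindowFields with $documentNumber can number each document within its group by timestamp, so only the rows up to the end date need to be read and the date filter can be applied afterwards. The collection name, the `position` helper field, and the `markFirst` function below are hypothetical names introduced for illustration; `markFirst` is only a plain-JavaScript mirror of the same logic for checking expected output without a server.

```javascript
// Report bounds taken from the example. ISO date strings compare correctly as text.
const start = "2015-01-02";
const end = "2015-01-05";

// Assumed pipeline for MongoDB 5.0+; run as db.collection.aggregate(pipeline).
const pipeline = [
  // Documents after the end date can never affect the report.
  { $match: { timestamp: { $lte: end } } },
  // Number documents within each group, oldest first ("position" is a made-up name).
  { $setWindowFields: {
      partitionBy: "$group",
      sortBy: { timestamp: 1 },
      output: { position: { $documentNumber: {} } }
  } },
  // Only now drop the documents before the start date.
  { $match: { timestamp: { $gte: start } } },
  { $set: { first: { $cond: [{ $eq: ["$position", 1] }, 1, 0] } } },
  { $unset: "position" },
  { $sort: { timestamp: 1 } }
];

// Plain-JavaScript mirror of the same idea: one pass in timestamp order,
// remembering which groups have already been seen.
function markFirst(docs, start, end) {
  const byTime = [...docs].sort((a, b) =>
    a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0);
  const seen = new Set();
  const report = [];
  for (const doc of byTime) {
    if (doc.timestamp > end) break;
    const first = seen.has(doc.group) ? 0 : 1;
    seen.add(doc.group);
    if (doc.timestamp >= start) report.push({ ...doc, first });
  }
  return report;
}
```

An index on timestamp (or on group and timestamp) would probably help the first $match, but that depends on the real data.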

Related

How to fetch records from mongoDB on the basis of duplicate data in multiple fields

The requirement is to fetch the documents in a collection that share the same values in certain fields.
For example, we have the three documents below:
1. {
"_id": "finance100",
"status": "ACTIVE",
"customerId": "100",
"contactId": "contact_100",
"itemId": "profile_100",
"audit": {
"dateCreated": "2022-02-16T16:34:52.718539Z",
"dateModified": "2022-03-18T09:36:42.774271Z",
"createdBy": "41d38c187155427fa37c855a4d1868d1",
"modifiedBy": "41d38c187155427fa37c855a4d1868d1"
},
"location": "US"
}
2. {
"_id": "finance101",
"status": "ACTIVE",
"customerId": "100",
"contactId": "contact_100",
"itemId": "profile_100",
"audit": {
"dateCreated": "2022-02-16T16:34:52.718539Z",
"dateModified": "2022-03-18T09:36:42.774271Z",
"createdBy": "41d38c187155427fa37c855a4d1868d1",
"modifiedBy": "41d38c187155427fa37c855a4d1868d1"
},
"location": "US"
}
3. {
"_id": "finance102",
"status": "ACTIVE",
"customerId": "100",
"contactId": "contact_100",
"itemId": "profile_100",
"audit": {
"dateCreated": "2022-02-16T16:34:52.718539Z",
"dateModified": "2022-03-18T09:36:42.774271Z",
"createdBy": "41d38c187155427fa37c855a4d1868d1",
"modifiedBy": "41d38c187155427fa37c855a4d1868d1"
},
"location": "UK"
}
The following fields should have the same values:
customerId
contactId
itemId
location
So I need to fetch the records whose values for the fields above are identical across documents.
It should fetch the first two records (1 and 2), because the values of customerId, contactId, itemId, and location are the same in those two documents; in the 3rd document the location value is different, so it should not be fetched.
Could you please share an appropriate Mongo query for this? I tried aggregation, but it did not work. Thanks in advance.
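A minimal sketch of one way this might be done, assuming the four fields are top-level as in the sample: group on the combination of fields, keep only groups with more than one member, and unwind back to the original documents. The `docs` and `count` field names and the `findDuplicates` helper are hypothetical; the helper only mirrors the grouping in plain JavaScript so the expected result can be checked without a database.

```javascript
// Assumed pipeline; run as db.collection.aggregate(pipeline).
const pipeline = [
  // Group on the combination of fields that must match.
  { $group: {
      _id: {
        customerId: "$customerId",
        contactId: "$contactId",
        itemId: "$itemId",
        location: "$location"
      },
      docs: { $push: "$$ROOT" },
      count: { $sum: 1 }
  } },
  // Keep only combinations shared by at least two documents.
  { $match: { count: { $gt: 1 } } },
  // Return the original documents rather than the groups.
  { $unwind: "$docs" },
  { $replaceRoot: { newRoot: "$docs" } }
];

// Plain-JavaScript mirror of the grouping, for checking the expected result.
function findDuplicates(docs, keys) {
  const groups = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(keys.map((k) => doc[k]));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  return [...groups.values()].filter((g) => g.length > 1).flat();
}
```

Note that $push of "$$ROOT" holds whole documents in memory per group, so for very large groups collecting only the _id values may be the safer variant.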

Strange MongoDB $setIntersection behaviour

I want to query a match between records in my db based on certain tags. The match would be calculated based on a formula and the intersection of the tags. Now, even querying the intersection doesn't always work: sometimes it does, sometimes it doesn't. In my example, if I change the displayName attribute to something else (add or remove one character), the query works. In its current state (for demo purposes) it doesn't, as it does not deliver the one intersection match for the last doc with id 3.
https://mongoplayground.net/p/KAYPoV29RFO
That's my query:
db.collection.aggregate([
  {
    $match: {
      "_id": "1"
    }
  },
  {
    "$lookup": {
      from: "collection",
      let: {
        "criteria": "$tags"
      },
      pipeline: [
        {
          $project: {
            "match": {
              $setIntersection: [
                "$tags",
                "$$criteria"
              ]
            },
          }
        }
      ],
      as: "result"
    }
  },
  {
    $project: {
      "tags": 0
    }
  },
])
Here is the example data (simplified):
[
{ "_id": "1", "tags": [{ "_id": "a", "displayName": "a", "level": 1}, {"_id": "b", "displayName": "b", "level": 2}, {"_id": "c", "displayName": "c", "level": 3}]},
{"_id": "2", "tags": [{"_id": "a", "displayName": "a", "level": 1}, {"_id": "b", "displayName": "b", "level": 2}]},
{"_id": "3", "tags": [{"_id": "a", "displayName": "a", "level": 1}, {"_id": "d", "displayName": "d", "level": 4}]}
]
And here is the result as it is. Expected is three matches for id 1, two matches for id 2, and one for the last id. However, the last result has 0 elements in the intersection. Again, when I change "displayName" to "displayNam" or "displayNames" (obviously in all docs), it gives the correct result:
[{
"_id": "1", "result": [
{"_id": "1", "match": [{"_id": "a", "displayName": "a", "level": 1}, {"_id": "b", "displayName": "b", "level": 2},{"_id": "c","displayName": "c","level": 3}]},
{"_id": "2", "match": [{"_id": "a", "displayName": "a", "level": 1}, {"_id": "b", "displayName": "b", "level": 2}]},
{"_id": "3","match": [*here should be the match to _id: "a", but it's not (always) there*]}
]
}]
Does anyone have an idea what I am missing here?
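I can't tell from the post why the whole-document comparison fails here. As a possible workaround (not an explanation), the intersection could be computed on the tag _id values alone with $filter and $in, which avoids comparing entire embedded documents. This changes the semantics slightly: two tags count as equal whenever their _id matches, even if displayName or level differ. The `criteriaIds` variable and the `intersectById` helper are hypothetical names; the helper is a plain-JavaScript mirror for checking the expected output.

```javascript
// Assumed variant of the query; run as db.collection.aggregate(pipeline).
const pipeline = [
  { $match: { _id: "1" } },
  { $lookup: {
      from: "collection",
      // "$tags._id" resolves to the array of _id values of all tags.
      let: { criteriaIds: "$tags._id" },
      pipeline: [
        { $project: {
            match: {
              // Keep each tag whose _id also appears in the criteria.
              $filter: {
                input: "$tags",
                as: "tag",
                cond: { $in: ["$$tag._id", "$$criteriaIds"] }
              }
            }
        } }
      ],
      as: "result"
  } },
  { $project: { tags: 0 } }
];

// Plain-JavaScript mirror: tags of one document whose _id occurs in the criteria.
function intersectById(tags, criteria) {
  const ids = new Set(criteria.map((t) => t._id));
  return tags.filter((t) => ids.has(t._id));
}
```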

MongoDB Aggregation lookup how often Document is mentioned in other Collection

I need to know how often a Document from Collection A is mentioned in Collection B. I am currently doing this with a $lookup aggregation and the size of the resulting array, but I guess that there is a much nicer way to do that?
Example:
Collection A
{ "_id": 1, "name": "User 1" }
{ "_id": 2, "name": "User 2" }
Collection B
{ "_id": 1, "user": 1, ... }
{ "_id": 2, "user": 1, ... }
{ "_id": 3, "user": 2, ... }
Desired result:
{ "_id": 1, "name": "User 1", "mentions": 2 }
{ "_id": 2, "name": "User 2", "mentions": 1 }
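One alternative that might be worth trying, assuming MongoDB 3.6 or later and collections literally named "A" and "B" (both assumptions): let the $lookup sub-pipeline do the counting with $count, so the joined array holds at most one small document instead of every matching document from B. The `userId` variable, the `n` field, and the `countMentions` helper are hypothetical names; the helper mirrors the result in plain JavaScript.

```javascript
// Assumed pipeline; run as db.A.aggregate(pipeline).
const pipeline = [
  { $lookup: {
      from: "B",
      let: { userId: "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$user", "$$userId"] } } },
        // Collapse the matches into a single { n: <count> } document.
        { $count: "n" }
      ],
      as: "mentions"
  } },
  // $count emits nothing when there are no matches, so fall back to 0.
  { $addFields: {
      mentions: { $ifNull: [{ $arrayElemAt: ["$mentions.n", 0] }, 0] }
  } }
];

// Plain-JavaScript mirror: count B documents per user, then attach to A.
function countMentions(a, b) {
  const counts = new Map();
  for (const doc of b) counts.set(doc.user, (counts.get(doc.user) || 0) + 1);
  return a.map((doc) => ({ ...doc, mentions: counts.get(doc._id) || 0 }));
}
```

Whether this is actually faster than $lookup plus $size depends on the data and on an index on B.user, so it is worth measuring.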

Calculate aggregates in a bucket in Upsert MongoDB update statement

My application gets measurements from a device that should be stored in a MongoDB database. Each measurement contains values for several probes of the device.
The measurements should be displayed in an aggregation for a certain amount of time. I'm using the Bucket pattern in order to prepare the aggregates and simplify indexing and querying.
The following sample shows a document:
{
"DeviceId": "Device1",
"StartTime": 100, "EndTime": 199,
"Measurements": [
{ "timestamp": 100, "probeValues": [ { "id": "1", "t": 30 }, { "id": "2", "t": 67 } ] },
{ "timestamp": 101, "probeValues": [ { "id": "1", "t": 32 }, { "id": "2", "t": 67 } ] },
{ "timestamp": 102, "probeValues": [ { "id": "1", "t": 34 }, { "id": "2", "t": 55 } ] },
{ "timestamp": 103, "probeValues": [ { "id": "1", "t": 27 }, { "id": "2", "t": 30 } ] }
],
"probeAggregates": [
{ "id": "1", "cnt": 4, "total": 123 },
{ "id": "2", "cnt": 4, "total": 219 }
]
}
Updating the values and calculating the aggregates in a single request works well if the document already exists (1st block: query, 2nd: update, 3rd: options):
{
  "DeviceId": "Device1",
  "StartTime": 100,
  "EndTime": 199
},
{
  $push: {
    "Measurements": {
      "timestamp": 103,
      "probeValues": [ { "id": "1", "t": 27 }, { "id": "2", "t": 30 } ]
    }
  },
  $inc: {
    "probeAggregates.$[probeAggr1].cnt": 1,
    "probeAggregates.$[probeAggr1].total": 27,
    "probeAggregates.$[probeAggr2].cnt": 1,
    "probeAggregates.$[probeAggr2].total": 30
  }
},
{
  arrayFilters: [
    { "probeAggr1.id": "1" },
    { "probeAggr2.id": "2" }
  ]
}
Now I want to extend the statement to do an upsert if the document does not exist yet. However, if I do not change the update statement at all, there is the following error:
The path 'probeAggregates' must exist in the document in order to apply array updates.
If I try to prepare the probeAggregates array in case of an insert (e.g. by using $setOnInsert or $addToSet), this leads to another error:
Updating the path 'probeAggregates.$[probeAggr1].cnt' would create a conflict at 'probeAggregates'
Both errors can be explained and seem legit. One way to solve this would be to change the document structure and create one document per device, timeframe and probe and by that simplify the required update statement. In order to keep the number of documents low, I'd rather solve this by changing the update statement. Is there a way to create a valid document in an upsert?
(as I'm just learning to use a document db, feel free to share your experience in the comments on whether it is a good goal to keep the number of documents low in real world scenarios)
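One direction that might work, assuming MongoDB 4.2 or later: an update expressed as an aggregation pipeline can read the current field values, so missing arrays can be defaulted with $ifNull and the same statement can serve both insert and update. The sketch below is untested against a server and makes a strong assumption: every measurement carries a value for every probe, because the aggregates array is rebuilt from the probes in the incoming measurement. The `filter`, `update`, and `applyMeasurement` names are hypothetical; `applyMeasurement` mirrors the pipeline in plain JavaScript.

```javascript
const measurement = {
  timestamp: 103,
  probeValues: [{ id: "1", t: 27 }, { id: "2", t: 30 }]
};
const filter = { DeviceId: "Device1", StartTime: 100, EndTime: 199 };

// Assumed usage: db.collection.updateOne(filter, update, { upsert: true });
const update = [
  { $set: {
      // Append the measurement, starting from [] when the bucket is new.
      Measurements: {
        $concatArrays: [{ $ifNull: ["$Measurements", []] }, [{ $literal: measurement }]]
      },
      // For each incoming probe value, add to the previous aggregate if any.
      probeAggregates: {
        $map: {
          input: { $literal: measurement.probeValues },
          as: "pv",
          in: {
            $let: {
              vars: {
                prev: {
                  $arrayElemAt: [
                    { $filter: {
                        input: { $ifNull: ["$probeAggregates", []] },
                        as: "agg",
                        cond: { $eq: ["$$agg.id", "$$pv.id"] }
                    } },
                    0
                  ]
                }
              },
              in: {
                id: "$$pv.id",
                cnt: { $add: [{ $ifNull: ["$$prev.cnt", 0] }, 1] },
                total: { $add: [{ $ifNull: ["$$prev.total", 0] }, "$$pv.t"] }
              }
            }
          }
        }
      }
  } }
];

// Plain-JavaScript mirror of the pipeline for a single bucket document.
function applyMeasurement(bucket, m) {
  const prevAggs = bucket.probeAggregates || [];
  return {
    ...bucket,
    Measurements: [...(bucket.Measurements || []), m],
    probeAggregates: m.probeValues.map((pv) => {
      const prev = prevAggs.find((a) => a.id === pv.id) || { cnt: 0, total: 0 };
      return { id: pv.id, cnt: prev.cnt + 1, total: prev.total + pv.t };
    })
  };
}
```

If probes can be absent from some measurements, the $map would need to start from the union of existing and incoming probe ids instead.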

Aggregate documents with diversity

[
{
"_id": 0,
"type": "cat"
},
{
"_id": 1,
"type": "dog"
},
{
"_id": 2,
"type": "cat"
},
{
"_id": 3,
"type": "unicorn"
}
]
How can I select N documents while preserving diversity based on the field type?
Examples:
If I request 2 documents, I should never get two documents of type "cat".
If I request 4 documents, I should only get 3 (only one document with "type": "cat" should be included).