MongoDB: split string in aggregation pipeline

This is a document stored in my collection:
{
"_id" : UUID("61a2053c-1a79-4649-8793-df6c4dc1973"),
"NotificationId" : UUID("ad068e4e-10e2-528c-a74a-df6c4dd9211"),
"DistributionId" : UUID("f5445ea1-e6cb-4acd-9881-c4122df6c4d"),
"CreationDateTime" : ISODate("2016-07-13T04:20:38.697Z"),
"ExpirationDateTime" : ISODate("2099-01-01T00:00:00.000Z"),
"DeliveryType" : 1,
"DeliveryParams" : [],
"Address" : "Topics/Messages/Global",
"Payload" : "{\"Id\":\"ad067896-10e2-528c-er87-df6c4d123654\",\"CreationDateTime\":\"\\/Date(1468324824751)\\/\",\"DeviceId\":\"456987456985\",\"UserId\":\"64545678-1234-4834-4321-123456789012\",\"UserFullName\":\"test-user\",\"SystemId\":\"com.messaging\",\"SystemTitle\":\"message\",\"EventId\":\"messaging.message\",\"EventTitle\":\"ارسال پیام\",\"EventData\":[],\"BusinessCode\":\"1-2-4-4-5-6-9\",\"ProcessId\":\"55333333-4433-3333-7733-113333333399\",\"WorkItemId\":423458,\"WKT\":\"\"}",
"SendAttempts" : null,
"Sent" : ISODate("2016-11-10T10:01:22.140Z"),
"Delivered" : ISODate("1970-01-01T00:00:00.000Z")
}
My question is: how can I extract \"BusinessCode\":\"1-2-4-4-5-6-9\" from the
Payload field? I just need BusinessCode:1-2-4-4-5-6-9 so I can store it in another field.
I used this script:
db.Messages.find().forEach(function(item)
{
id = item._id;
payload = item.Payload;
matched = payload.match(/\"BusinessCode\":\"(([1-2]?[0-9])-([0-9]*)-([0-9]*)-([0-9]*)-([0-9]*)-([0-9]*)-([0-9]*))\"/);
....
db.Messages.updateOne(
....
This payload.match returns
BusinessCode":"1-2-4-4-5-6-9",1-2-4-4-5-6-9,1,2,4,4,5,6,9
This script is slow and not appropriate for a collection with many documents, so I want to use the aggregation pipeline instead.
How can I get exactly the same result as payload.match from an aggregation pipeline?

There is no regex substring operator in MongoDB as of 3.6.2.
We can use the substring operators to extract the business code; since the payload contains Unicode characters, we need the code point (CP) variants to get the expected results.
db.col.aggregate(
[
{$addFields : {
start : {$indexOfCP : ["$Payload", "BusinessCode"]},
end : { $indexOfCP : ["$Payload", "ProcessId"]}
}
},
{$project : {BusinessCode : {$substrCP : ["$Payload", {$sum : ["$start",15]}, {$subtract : [{$subtract : ["$end", "$start"]}, 18]}]}}}
]
)
result
{ "_id" : "61a2053c-1a79-4649-8793-df6c4dc1973", "BusinessCode" : "1-2-4-4-5-6-9" }


AND operator in Criteria not working as expected for nested documents inside aggregation (Spring Data Mongo)

I am trying to fetch the total number of replies whose read value is true. But I am getting a count of 3, whereas the expected value is 2 (since only two replies have read set to true), using the Aggregation support available in Spring Data Mongo. Below is the code I wrote:
Aggregation sumOfRepliesAgg = newAggregation(
        match(new Criteria().andOperator(
                Criteria.where("replies.repliedUserId").is(userProfileId),
                Criteria.where("replies.read").is(true))),
        unwind("replies"),
        group("replies").count().as("repliesCount"),
        project("repliesCount"));
AggregationResults<Comments> totalRepliesCount = mongoOps.aggregate(sumOfRepliesAgg, "COMMENTS",Comments.class);
return totalRepliesCount.getMappedResults().size();
I am using the AND operator inside the Criteria query and passing two criteria conditions, but it is not working as expected. Below is the sample data set:
{
"_id" : ObjectId("5c4ca7c94807e220ac5f7ec2"),
"_class" : "com.forum.api.domain.Comments",
"comment_data" : "logged by karthe99",
"totalReplies" : 2,
"replies" : [
{
"_id" : "b33a429f-b201-449b-962b-d589b7979cf0",
"content" : "dasdsa",
"createdDate" : ISODate("2019-01-26T18:33:10.674Z"),
"repliedToUser" : "#karthe99",
"repliedUserId" : "5bbc305950a1051dac1b1c96",
"read" : false
},
{
"_id" : "b886f8da-2643-4eca-9d8a-53f90777f492",
"content" : "dasda",
"createdDate" : ISODate("2019-01-26T18:33:15.461Z"),
"repliedToUser" : "#karthe50",
"repliedUserId" : "5c4bd8914807e208b8a4212b",
"read" : true
},
{
"_id" : "b56hy4rt-2343-8tgr-988a-c4f90598h492",
"content" : "dasda",
"createdDate" : ISODate("2019-01-26T18:33:15.461Z"),
"repliedToUser" : "#karthe50",
"repliedUserId" : "5c4bd8914807e208b8a4212b",
"read" : true
}
],
"last_modified_by" : "karthe99",
"last_modified_date" : ISODate("2019-01-26T18:32:41.394Z")
}
What is the mistake in the query that I wrote?
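A likely explanation, sketched here in the mongo shell rather than Spring Data so the stage order is explicit (userProfileId stands in for the actual id value used in the Java code): the $match stage runs against the whole document before $unwind, and the document as a whole satisfies both conditions, so all three unwound replies reach the $group stage and get counted. Repeating the filter after the unwind counts only the replies that individually match:
db.COMMENTS.aggregate([
    // this first match only selects documents; it does not filter array elements
    { $match: { "replies.repliedUserId": userProfileId, "replies.read": true } },
    { $unwind: "$replies" },
    // match again so only the individually matching replies are counted
    { $match: { "replies.repliedUserId": userProfileId, "replies.read": true } },
    { $group: { _id: null, repliesCount: { $sum: 1 } } }
])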

MongoDB searching for gaps in indices

I am caching data from an online resource for future use in machine learning. This data is canonical and has no missing entries.
In the event that the real-time connection is dropped or the machine rebooted, I have a safeguard in place that does a historical search for a range of ids that are missing from the cache.
What I have yet to implement, however, is a mechanism for searching through the collection and identifying ranges where id values have been skipped.
For instance:
{"entry_id": 27497713, ...}
{"entry_id": 27497761, ...}
This data has a clear gap where entries are missing between 27497713 and 27497761.
Is there a way I can find such a gap using queries? Perhaps at least narrowing it down by selecting values between two ranges and checking the count of returned entries? Given how many entries the collection contains, I am trying to avoid lots of queries for efficiency.
Can you try this aggregation?
$group - get the $min and $max entry_id
$addFields - generate a $range between the $min and $max
$lookup - self-lookup joining the generated range ids against the existing entry ids
$project - keep only the non-matching range ids using $setDifference
Pipeline:
db.entries.aggregate(
[
{$group : {_id : null, min : {$min : "$entry_id"}, max : {$max : "$entry_id"}}},
{$addFields : {rangeIds : {$range : ["$min", "$max"]}}},
{$lookup : {from : "entries", localField : "rangeIds", foreignField : "entry_id", as : "entries"}},
{$project : {_id :0, missingIds : {$setDifference : ["$rangeIds", "$entries.entry_id"]}}}
]
)
collection
> db.entries.find()
{ "_id" : ObjectId("5a6fea9b7346ce591a17ad22"), "entry_id" : 27497713 }
{ "_id" : ObjectId("5a6fea9b7346ce591a17ad23"), "entry_id" : 27497761 }
{ "_id" : ObjectId("5a6fea9b7346ce591a17ad24"), "entry_id" : 27497750 }
>
aggregate result
> db.entries.aggregate( [ {$group : {_id : null, min : {$min : "$entry_id"}, max : {$max : "$entry_id"}}}, {$addFields : {rangeIds : {$range : ["$min", "$max"]}}}, {$lookup : {from : "entries", localField : "rangeIds", foreignField : "entry_id", as : "entries"}}, {$project : {_id :0, missingIds : {$setDifference : ["$rangeIds", "$entries.entry_id"]}}} ] )
{ "missingIds" : [ 27497714, 27497715, 27497716, 27497717, 27497718, 27497719, 27497720, 27497721, 27497722, 27497723, 27497724, 27497725, 27497726, 27497727, 27497728, 27497729, 27497730, 27497731, 27497732, 27497733, 27497734, 27497735, 27497736, 27497737, 27497738, 27497739, 27497740, 27497741, 27497742, 27497743, 27497744, 27497745, 27497746, 27497747, 27497748, 27497749, 27497751, 27497752, 27497753, 27497754, 27497755, 27497756, 27497757, 27497758, 27497759, 27497760 ] }
>
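One caveat: $range materializes the whole id range as an in-memory array inside a single document, so for very wide ranges it can get expensive or hit the 16MB document limit. A hedged variant of the idea from the question is to narrow the search first by comparing counts over sub-ranges and only run the pipeline above on the sub-ranges that come up short; findSuspectRanges below is a hypothetical helper, not an existing command:
// report sub-ranges whose document count is lower than expected
function findSuspectRanges(lo, hi, step) {
    var suspects = [];
    for (var start = lo; start < hi; start += step) {
        var end = Math.min(start + step, hi);
        var expected = end - start;                                    // ids expected in [start, end)
        var actual = db.entries.count({ entry_id: { $gte: start, $lt: end } });
        if (actual < expected) {
            suspects.push({ start: start, end: end, missing: expected - actual });
        }
    }
    return suspects;
}

findSuspectRanges(27497713, 27497762, 10)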

Get specific object in array of array in MongoDB

I need to get a specific object from an array of arrays in MongoDB.
I need to get only the task object with _id = ObjectId("543429a2cb38b1d83c3ff2c2").
My document (projects):
{
"_id" : ObjectId("543428c2cb38b1d83c3ff2bd"),
"name" : "new project",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
"members" : [
ObjectId("5424ac37eb0ea85d4c921f8b")
],
"US" : [
{
"_id" : ObjectId("5434297fcb38b1d83c3ff2c0"),
"name" : "Test Story",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
"tasks" : [
{
"_id" : ObjectId("54342987cb38b1d83c3ff2c1"),
"name" : "teste3",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
},
{
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
]
}
]
}
Result expected:
{
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
But I am not getting it.
I am still testing, with no success:
db.projects.find({
"US.tasks._id" : ObjectId("543429a2cb38b1d83c3ff2c2")
}, { "US.tasks.$" : 1 })
I tried with $elemMatch too, but it returns nothing.
db.projects.find({
"US" : {
"tasks" : {
$elemMatch : {
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2")
}
}
}
})
Can I get ONLY my expected result using find()? If not, what should I use and how?
Thanks!
You will need an aggregation for that:
db.projects.aggregate([{$unwind:"$US"},
{$unwind:"$US.tasks"},
{$match:{"US.tasks._id":ObjectId("543429a2cb38b1d83c3ff2c2")}},
{$project:{_id:0,"task":"$US.tasks"}}])
should return
{ task : {
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
}
Explanation:
$unwind creates a new (virtual) document for each array element
$match is the query part of your find
$project is similar to the projection part of find, i.e. it specifies the fields you want in the results
You might want to add another $match before the $unwind stages if you already know which document you are searching (check the performance metrics); a sketch of that variant follows below.
Edit: added a second $unwind since US is an array.
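For example, a sketch of the same pipeline with that extra $match in front, using the parent project's _id from the sample document:
db.projects.aggregate([
    // narrow down to the known project first so only one document gets unwound
    {$match:{_id:ObjectId("543428c2cb38b1d83c3ff2bd")}},
    {$unwind:"$US"},
    {$unwind:"$US.tasks"},
    {$match:{"US.tasks._id":ObjectId("543429a2cb38b1d83c3ff2c2")}},
    {$project:{_id:0,"task":"$US.tasks"}}])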
I don't know what you are doing (so I really can't tell and am just suggesting), but you might want to examine whether your schema (and MongoDB) is ideal for your task: the document looks like denormalized relational data, and a relational database might be a better fit for you.

Subtract two dates and return the value

I need help building a query to subtract two dates in MongoDB.
I have some documents like the one below:
{"_id" : "32472034809", "center": "102030", dateArq : 141010, inDate : "ISODate("2014-06-06T02:57:19.000-03:00)", biDate : ISODate("2014-06-07T02:57:19.000-03:00)"}
And I'm trying to write this query:
db.teste.aggregation([{$match : {dateArq : 141010}},{$project : {$subtract : ["$biDate" "$inDate"]}}])
In fact, what I want is: for each _id, return biDate - inDate, because I need to see whether dateArq keeps to a constant line.
In Oracle I did
select dateArq, (biDate - inDate) diff from teste where dateArq = 141010
Thanks for the help.
The document and aggregation pipeline provided had syntax problems, and you needed to put a field name for the result of the $subtract, but otherwise your pipeline works for me:
> db.test.findOne()
{
"_id" : "32472034809",
"center" : "102030",
"dateArq" : 141010,
"inDate" : ISODate("2014-06-06T05:57:19Z"),
"biDate" : ISODate("2014-06-07T05:57:19Z")
}
> db.test.aggregate([
{ "$match" : { "dateArq" : 141010 } },
{ "$project" : { "dateDiff" : { "$subtract" : ["$biDate", "$inDate"] } } }
])
{ "_id" : "32472034809", "dateDiff" : NumberLong(86400000) }

How can I select a number of records per specific field using MongoDB?

I have a collection of documents in MongoDB, each of which has a "group" field that refers to the group that owns the document. The documents look like this:
{
group: <objectID>
name: <string>
contents: <string>
date: <Date>
}
I'd like to construct a query which returns the most recent N documents for each group. For example, suppose there are 5 groups, each of which has 20 documents. I want to write a query which will return the top 3 for each group, i.e. 15 documents, 3 from each group. Each group gets 3, even if another group has a 4th that's more recent.
In the SQL world, I believe this type of query is done with "partition by" and a counter. Is there such a thing in mongodb, short of doing N+1 separate queries for N groups?
You cannot do this using the aggregation framework yet - you can get the $max or top date value for each group, but the aggregation framework does not yet have a way to accumulate the top N, and there is no way to push the entire document into the result set (only individual fields).
So you have to fall back on MapReduce. Here is something that would work, but I'm sure there are many variants (all of them require somehow sorting an array of objects by a specific attribute; I borrowed my solution from one of the answers to this question).
The map function outputs the group name as the key and the entire rest of the document as the value, but it wraps the value in a document containing an array because we will accumulate an array of results per group:
map = function () {
emit(this.name, {a:[this]});
}
The reduce function will accumulate all the documents belonging to the same group into one array (via concat). Note that if you optimize reduce to keep only the top five array elements by date, you won't need the finalize function and you will use less memory while the MapReduce runs (it will also be faster); a sketch of that variant follows after the finalize function below.
reduce = function (key, values) {
result={a:[]};
values.forEach( function(v) {
result.a = v.a.concat(result.a);
} );
return result;
}
Since I'm keeping all values for each key, I need a finalize function to pull out only the latest five elements per key.
final = function (key, value) {
Array.prototype.sortByProp = function(p){
return this.sort(function(a,b){
return (a[p] < b[p]) ? 1 : (a[p] > b[p]) ? -1 : 0;
});
}
value.a.sortByProp('date');
return value.a.slice(0,5);
}
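As noted above, a variant is to do the trimming inside reduce itself, so finalize is not needed and less memory is used while the job runs; a sketch of that version, reusing the same sort-by-date logic (re-reducing partial results this way still yields the overall top five per key):
reduce = function (key, values) {
    var result = {a:[]};
    values.forEach( function(v) {
        result.a = v.a.concat(result.a);
    } );
    // sort newest first and keep only the five most recent
    result.a.sort(function(a,b){
        return (a.date < b.date) ? 1 : (a.date > b.date) ? -1 : 0;
    });
    result.a = result.a.slice(0,5);
    return result;
}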
Using a template document similar to the one you provided, you run this by calling the mapReduce command:
> db.top5.mapReduce(map, reduce, {finalize:final, out:{inline:1}})
{
"results" : [
{
"_id" : "group1",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe13"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.498Z"),
"contents" : 0.23778377776034176
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0e"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.467Z"),
"contents" : 0.4434165076818317
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe09"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.436Z"),
"contents" : 0.5935856597498059
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe04"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.405Z"),
"contents" : 0.3912118375301361
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfdff"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.372Z"),
"contents" : 0.221651989268139
}
]
},
{
"_id" : "group2",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe14"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.504Z"),
"contents" : 0.019611883210018277
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0f"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.473Z"),
"contents" : 0.5670706110540777
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0a"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.442Z"),
"contents" : 0.893193120136857
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe05"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.411Z"),
"contents" : 0.9496864483226091
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe00"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.378Z"),
"contents" : 0.013748752186074853
}
]
},
{
"_id" : "group3",
...
}
],
"timeMillis" : 15,
"counts" : {
"input" : 80,
"emit" : 80,
"reduce" : 5,
"output" : 5
},
"ok" : 1,
}
Each result has _id set to the group name and value set to an array of the five most recent documents from the collection for that group.
You need the aggregation framework: a $group stage piped into a $limit stage...
You also want to $sort the records in some way, or else the limit will have undefined behaviour: the returned documents will be pseudo-random (the order used internally by Mongo).
Something like this:
db.collection.aggregate([{$group:...},{$sort:...},{$limit:...}])
See the aggregation documentation if you want to know more.
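As an aside: on newer server versions (2.6+ for $$ROOT, 3.2+ for the $slice expression) that skeleton can be turned into a real per-group top-N by pushing whole documents into the group; a hedged sketch using the field names from the question:
db.collection.aggregate([
    {$sort:{group:1, date:-1}},                       // newest first within each group
    {$group:{_id:"$group", docs:{$push:"$$ROOT"}}},   // collect the whole documents per group
    {$project:{top3:{$slice:["$docs", 3]}}}           // keep the 3 most recent per group
])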