How to collect specific samples from MongoDB collections? - mongodb

I have a MongoDB collection "Events" with 1 million documents similar to:
{
"_id" : 32423,
"testid" : 43212,
"description" : "fdskfhsdj kfsdjfhskdjf hksdjfhsd kjfs",
"status" : "error",
"datetime" : ISODate("2018-12-04T15:55:00.000Z"),
"failure" : 0,
}
Considering the documents were sorted based on datetime field (ascending), I want to check them in the chronical order one by one and pick only the records where the "failure" field was 0 in the previous document and it is 1 in the current document. I want to skip other records in between.
For example, if I also have the following records:
{
"_id" : 32424,
....
"datetime" : ISODate("2018-12-04T16:55:00.000Z"),
"failure" : 0,
}
,
{
"_id" : 32425,
....
"datetime" : ISODate("2018-12-04T17:55:00.000Z"),
"failure" : 1,
}
,
{
"_id" : 32426,
....
"datetime" : ISODate("2018-12-04T18:55:00.000Z"),
"failure" : 0,
}
I only want to collect the one with "_id:32425", and repeat the same policy for the following cases.
Of course, if I extract all the data at once, then I can process it using Python for instance. But, extracting all the records would be really time-consuming (1 million documents!).
Is there a way to do the above via MongoDB commands?

Related

Mongo query to find documents which have date less than a particular date and delete it

So this is my data in mongodb
{
"_id" : ObjectId("60f67dc955784b233692a0f2"),
"Id" : "9153",
"InfoList" : [
{
"itemId" : "42342",
"price" : 1009.0,
"date" : ISODate("2021-01-01T08:30:36.131Z")
}
]
}
{
"_id" : ObjectId("6105668a55784bd00ef3ebc6"),
"Id" : "894249",
"InfoList" : [
{
"itemId" : "42342",
"price" : 23.0,
"date" : ISODate("2021-01-01T08:30:36.131Z")
},
{
"itemId" : "3221",
"price" : 44554.0,
"date" : ISODate("2013-07-31T15:05:10.042Z")
}
]
}
I want to find all the items in InfoList for all the documents whose date is less than 2021-02-09 and then delete them.
This is the code that I am using
Query query = new Query();
query.addCriteria(Criteria.where("InfoList")
.elemMatch(Criteria
.where("date")
.lte(date)));
return mongoTemplate.findAllAndRemove(query,ProductInfo.class, CollectionName);
But this code is neither finding the documents which have date < 2021-02-01 nor deleting them. Any suggestions regarding what might be wrong here ?
I'm not familiar with Mongo query I know mongoose more
But do it like this 1-get all the of the object
It will return an array
On this array
2-do an if to check if the date is less than 2021-02-09 if yes remove it using its id

String timestamp performance in Mongo

I have a collection in mongo [4.0.9] which stores document like below
{ "_id" : 187489726, "mykey" : { "data" : [ { "id" : 2, "value" : "No" } ], "timestamp" : "2020-06-03 10:40:52.718" } }
If I fire queries like so, would it have performance impact because timestamp is string in the collection.
db.mycollection.find({"mykey.timestamp" : {$get : "2020-06-05 10:00:10.269"}},{"mykey": 1}).count()
I would have close 20 million records in DB.
Could not find any benchmark or documentation such a scenario. I can still convert all in ISO format.

How do i remove duplicates in mongodb?

I have a database which consists of few collections , i have tried copying from one collection to another .
In this process connection was lost and had to recopy them
now i find around 40000 records duplicates.
Format of my data:
{
"_id" : ObjectId("555abaf625149715842e6788"),
"reviewer_name" : "Sudarshan A",
"emp_name" : "Wilson Erica",
"evaluation_id" : NumberInt(550056),
"teamleader_id" : NumberInt(17199),
"reviewer_id" : NumberInt(1659),
"team_manager" : "Las Vegas",
"teammanager_id" : NumberInt(12245),
"team_leader" : "Thomas Donald",
"emp_id" : NumberInt(7781)
}
here only evaluation id is unique.
Queries that i have tried:
ensureIndex({id:1}, {unique:true, dropDups:true})
dropDups was removed in mongodb ~2.7.
Here is other realization method
but I don't test it

How can I select a number of records per a specific field using mongodb?

I have a collection of documents in mongodb, each of which have a "group" field that refers to a group that owns the document. The documents look like this:
{
group: <objectID>
name: <string>
contents: <string>
date: <Date>
}
I'd like to construct a query which returns the most recent N documents for each group. For example, suppose there are 5 groups, each of which have 20 documents. I want to write a query which will return the top 3 for each group, which would return 15 documents, 3 from each group. Each group gets 3, even if another group has a 4th that's more recent.
In the SQL world, I believe this type of query is done with "partition by" and a counter. Is there such a thing in mongodb, short of doing N+1 separate queries for N groups?
You cannot do this using the aggregation framework yet - you can get the $max or top date value for each group but aggregation framework does not yet have a way to accumulate top N plus there is no way to push the entire document into the result set (only individual fields).
So you have to fall back on MapReduce. Here is something that would work, but I'm sure there are many variants (all require somehow sorting an array of objects based on a specific attribute, I borrowed my solution from one of the answers in this question.
Map function - outputs group name as a key and the entire rest of the document as the value - but it outputs it as a document containing an array because we will try to accumulate an array of results per group:
map = function () {
emit(this.name, {a:[this]});
}
The reduce function will accumulate all the documents belonging to the same group into one array (via concat). Note that if you optimize reduce to keep only the top five array elements by checking date then you won't need the finalize function, and you will use less memory during running mapreduce (it will also be faster).
reduce = function (key, values) {
result={a:[]};
values.forEach( function(v) {
result.a = v.a.concat(result.a);
} );
return result;
}
Since I'm keeping all values for each key, I need a finalize function to pull out only latest five elements per key.
final = function (key, value) {
Array.prototype.sortByProp = function(p){
return this.sort(function(a,b){
return (a[p] < b[p]) ? 1 : (a[p] > b[p]) ? -1 : 0;
});
}
value.a.sortByProp('date');
return value.a.slice(0,5);
}
Using a template document similar to one you provided, you run this by calling mapReduce command:
> db.top5.mapReduce(map, reduce, {finalize:final, out:{inline:1}})
{
"results" : [
{
"_id" : "group1",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe13"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.498Z"),
"contents" : 0.23778377776034176
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0e"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.467Z"),
"contents" : 0.4434165076818317
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe09"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.436Z"),
"contents" : 0.5935856597498059
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe04"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.405Z"),
"contents" : 0.3912118375301361
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfdff"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.372Z"),
"contents" : 0.221651989268139
}
]
},
{
"_id" : "group2",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe14"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.504Z"),
"contents" : 0.019611883210018277
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0f"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.473Z"),
"contents" : 0.5670706110540777
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0a"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.442Z"),
"contents" : 0.893193120136857
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe05"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.411Z"),
"contents" : 0.9496864483226091
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe00"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.378Z"),
"contents" : 0.013748752186074853
}
]
},
{
"_id" : "group3",
...
}
]
}
],
"timeMillis" : 15,
"counts" : {
"input" : 80,
"emit" : 80,
"reduce" : 5,
"output" : 5
},
"ok" : 1,
}
Each result has _id as group name and values as array of most recent five documents from the collection for that group name.
you need aggregation framework $group stage piped in a $limit stage...
you want also to $sort the records in some ways or else the limit will have undefined behaviour, the returned documents will be pseudo-random (the order used internally by mongo)
something like that:
db.collection.aggregate([{$group:...},{$sort:...},{$limit:...}])
here there is the documentation if you want to know more

mongodb get elements which was inserted after some document

I have a document and I need to query mongodb database to return me all the documents which was inserted after current document.
Is it possible and how to do that query?
If you do not override the default _id field you can use that objectID (see the mongodb docs) to make a comparison by time. For instance, the following query will find all the documents that are inserted after curDoc has been inserted (assuming none overwrite the _id field):
>db.test.find({ _id : {$gt : curDoc._id}})
Note that these timestamps are not super granular, if you would like a finer grained view of the time that documents are inserted I encourage you to add your own timestamp field to the documents you are inserting and use that field to make such queries.
If you are using Insert time stamp as on of the parameter, you can query like below
> db.foo.find()
{ "_id" : ObjectId("514bf8bbbe11e483111af213"), "Name" : "abc", "Insert_time" : ISODate("2013-03-22T06:22:51.422Z") }
{ "_id" : ObjectId("514bf8c5be11e483111af214"), "Name" : "xyz", "Insert_time" : ISODate("2013-03-22T06:23:01.310Z") }
{ "_id" : ObjectId("514bf8cebe11e483111af215"), "Name" : "pqr", "Insert_time" : ISODate("2013-03-22T06:23:10.006Z") }
{ "_id" : ObjectId("514bf8eabe11e483111af216"), "Name" : "ijk", "Insert_time" : ISODate("2013-03-22T06:23:38.410Z") }
>
Here my Insert_time corresponds to the document inserted time, and following query will give you the documents after a particular Insert_time,
> db.foo.find({Insert_time:{$gt:ISODate("2013-03-22T06:22:51.422Z")}})
{ "_id" : ObjectId("514bf8c5be11e483111af214"), "Name" : "xyz", "Insert_time" : ISODate("2013-03-22T06:23:01.310Z") }
{ "_id" : ObjectId("514bf8cebe11e483111af215"), "Name" : "pqr", "Insert_time" : ISODate("2013-03-22T06:23:10.006Z") }
{ "_id" : ObjectId("514bf8eabe11e483111af216"), "Name" : "ijk", "Insert_time" : ISODate("2013-03-22T06:23:38.410Z") }
>