I want to query the oplog to find out which operations were made during a particular time period.
How is it possible to find and query the oplog in MongoDB? Where is the oplog stored? Please explain with an example... I couldn't find any tutorial on the internet...
The oplog is a capped collection named oplog.rs in the local database of each replica set member, so switch to that database (use local) before querying it. If you want to query by date, you need to generate a proper MongoDB timestamp; you can use the Date class for that purpose.
If, for example, you want to count the number of entries between two dates, you can try a query similar to this one:
db.oplog.rs.count(
{
ts:
{
$gt: Timestamp(new Date("2020-02-01").getTime() / 1000, 1),
$lt: Timestamp(new Date("2020-03-01").getTime() / 1000, 1)
}
}
);
You always have to divide the Date milliseconds by 1000, because the Timestamp class operates with seconds since 1970-01-01.
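For instance, here is a quick sketch of what that conversion produces in the shell (the concrete millisecond value is just for illustration):
// getTime() returns milliseconds since the Unix epoch, e.g. 1580515200000
var ms = new Date("2020-02-01").getTime();
// dividing by 1000 yields the seconds the Timestamp constructor expects
var ts = Timestamp(ms / 1000, 1); // Timestamp(1580515200, 1)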
Here is another example query which excludes some very common entries in the oplog, e.g. "periodic noop":
db.oplog.rs.find(
{
ts:
{
$gt: Timestamp(new Date("2020-02-01").getTime() / 1000, 1),
$lt: Timestamp(new Date("2020-03-01").getTime() / 1000, 1)
},
"o.msg":
{
$ne: "periodic noop"
},
"ns" : {
$ne: "config.system.sessions"
}
}
).limit(5).pretty();
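To see the time range your oplog currently covers, which is useful to check before running ranged queries against it, the shell provides a built-in helper:
// prints the configured oplog size, the log length,
// and the timestamps of the first and last oplog events
rs.printReplicationInfo()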
A simple query to read from the oplog (using pymongo):
# _TAIL_OPTS was undefined in the original snippet; a tailable cursor is the usual choice (an assumption):
_TAIL_OPTS = {'cursor_type': pymongo.CursorType.TAILABLE_AWAIT}  # requires import pymongo
query = {'ts': {'$gt': some_timestamp}}  # Replace with your own query.
cursor = db.oplog.rs.find(query, **_TAIL_OPTS)
I have a mongo collection with about 4M records; two of the document fields in this collection are dates stored as strings, and I need to change them to ISODate, so I wrote this small script to do that:
db.vendors.find().forEach(function(el){
el.lastEdited = new Date(el.lastEdited);
el.creationDate = new Date(el.creationDate)
db.vendors.save(el)
})
but it takes forever... I added indexes to those fields; can't I do this some other way that will complete in a reasonable time?
Currently you do a find against the whole collection, which could take a while; also, every save call has to make a network round trip from the client (shell) to the server.
EDIT: Removed suggestion to use $match and do this in batches, because $out in fact replaces the collection on each run as noted in #Stennie's comment.
(Needless to say, test it on a sample dataset first in a test environment; I don't know your exact data format, so check that the strings parse as dates.)
db.vendors.aggregate([
{
$project: {
_id: 1,
// NOTE: new Date('$lastEdited') would be evaluated client-side and yield Invalid Date;
// on MongoDB 4.0+, $toDate converts the stored string server-side instead
lastEdited: { $toDate: "$lastEdited" },
creationDate: { $toDate: "$creationDate" },
field1: 1,
field2: 1,
//.. (important to repeat all necessary fields)
},
},
{
$out: 'vendors',
},
]);
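Alternatively, on MongoDB 4.2+ the update commands accept aggregation pipelines, so the conversion can be done in place without re-projecting every field. A minimal sketch, assuming your date strings are in a format the server can parse:
db.vendors.updateMany(
  {},
  [
    {
      // $toDate parses the stored strings server-side; no per-document round trips
      $set: {
        lastEdited: { $toDate: "$lastEdited" },
        creationDate: { $toDate: "$creationDate" }
      }
    }
  ]
);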
A simplified situation is this:
There are 1000+ documents in the MongoDB collection, and certain users (e.g. free accounts) can only operate on the first 100 documents.
The operations include: find, update, and delete.
How can I limit operations to the first 100 documents in a collection? I have the following algorithm in mind:
1) find the first 100 documents
2) do find, update, delete, and paginate only for this subset of documents.
How to achieve this? If possible, please provide some sample code.
I would suggest keeping a numeric id field in your collection.
This way you can easily filter out your first n customers, and you will not have to search the records first and then process them according to your need.
Find a customer whose first name is 'John':
Model.find({
id: {
$gte: 1,
$lte: 100
},
first_name: 'John'
}
)
Delete a customer whose first name is 'John':
Model.deleteOne({
id: {
$gte: 1,
$lte: 100
},
first_name: "John"
}
)
Paginate with 10 docs per page (this example skips the first 10 and returns the second page):
Model.find({
id: {
$gte: 1,
$lte: 100
}
}
).limit(10).skip(10)
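To maintain such a numeric id on insert, one common sketch uses a counters collection (the collection and field names here are illustrative):
// atomically increment a named counter and return the new value
function getNextSequence(name) {
  return db.counters.findOneAndUpdate(
    { _id: name },
    { $inc: { seq: 1 } },
    { upsert: true, returnNewDocument: true }
  ).seq;
}

db.customers.insertOne({ id: getNextSequence("customerid"), first_name: "John" });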
I am developing a financial application with Node.js. I wonder whether it would be possible to compute a simple moving average (the average of the last N days of prices) directly in the Mongo shell rather than reading the data and computing it in Node.js.
Sample documents:
[
  { code: '0001', price: 0.10, date: '2014-07-04T00:00:00.000Z' },
  { code: '0001', price: 0.12, date: '2014-07-05T00:00:00.000Z' },
  { code: '0001', price: 0.13, date: '2014-07-06T00:00:00.000Z' },
  { code: '0001', price: 0.12, date: '2014-07-07T00:00:00.000Z' }
]
If you have more than a trivial number of documents you should use the DB server to do the work rather than JS.
You don't say if you are using mongoose or the node driver directly. I'll assume you are using mongoose as that is the way most people are headed.
So your model would be:
// models/stocks.js
const mongoose = require("mongoose");
const conn = mongoose.createConnection('mongodb://localhost/stocksdb');
const StockSchema = new mongoose.Schema(
{
price: Number,
code: String,
date: Date,
},
{ timestamps: true }
);
module.exports = conn.model("Stock", StockSchema, "stocks");
You rightly suggested that the aggregation framework would be a good way to go here. First, though, if we are dealing with returning values between date ranges, the records in your database need to be date objects; from your example documents it looks like you may have stored strings. An example of inserting documents with dates would be:
db.stocks.insertMany([
  { code: '0001', price: 0.10, date: ISODate('2014-07-04T00:00:00.000Z') },
  { code: '0001', price: 0.12, date: ISODate('2014-07-05T00:00:00.000Z') },
  { code: '0001', price: 0.13, date: ISODate('2014-07-06T00:00:00.000Z') },
  { code: '0001', price: 0.12, date: ISODate('2014-07-07T00:00:00.000Z') }
])
The aggregation pipeline function accepts an array with one or more pipeline stages.
The first pipeline stage we should use is $match; this filters the documents down to only the records we are interested in, which is important for performance:
{ $match: {
date: {
$gte: new Date('2014-07-03'),
$lte: new Date('2014-07-07')
}
}
}
This stage will send only the documents dated from the 3rd to the 7th of July 2014 inclusive to the next stage (in this case, all the example docs).
The next stage is where you can get the average. We need to group the values together, based on one field, multiple fields, or all documents.
As you don't specify a field you want to group by, I'll give an example that groups all the documents together. For this we use the $group stage:
{
$group: {
_id: null,
avg: {
$avg: '$price'
}
}
}
This will take all the documents and display an average of all the prices.
In the case of your example documents this results in
{ _id: null, avg: 0.1175 }
Check the answer:
(0.10 + 0.12 + 0.12 + 0.13) / 4 = 0.1175
FYI: I wouldn't rely on calculations done in JavaScript for anything critical, as Numbers use floating point. See https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html for more details if you are worried about that.
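A one-line illustration of that caveat:
0.1 + 0.2 === 0.3 // false: 0.1 + 0.2 evaluates to 0.30000000000000004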
For completeness here is the full aggregation query
const Stock = require("./models/stocks");
Stock.aggregate([{ $match: {
date: {
$gte: new Date('2014-07-03'),
$lte: new Date('2014-07-07')
}
}},
{
$group: {
_id: null,
avg: {
$avg: '$price'
}
}
}])
.then(console.log)
.catch(error => console.error(error))
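For a true N-day moving average (rather than one overall average), newer servers offer the $setWindowFields stage (MongoDB 5.0+). A hedged sketch, assuming that server version is available:
db.stocks.aggregate([
  { $match: { code: "0001" } },
  {
    $setWindowFields: {
      partitionBy: "$code",
      sortBy: { date: 1 },
      output: {
        movingAvg: {
          $avg: "$price",
          // the current document plus the two before it: a 3-point moving average
          window: { documents: [-2, 0] }
        }
      }
    }
  }
])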
Not sure about your moving average formula, but here is how I would do it:
var moving_average = null
db.test.find().forEach(function(doc) {
if (moving_average==null) {
moving_average = doc.price;
}
else {
moving_average = (moving_average+doc.price)/2;
}
})
output:
> moving_average
0.3
And if you want to define the N days to average over, just modify the argument to find. Note that both bounds must live in a single date object; with two separate "date" keys the second would overwrite the first:
db.test.find({ "date": { $gt: "2014-07-07T00:00:00.000Z", $lt: "2014-07-10T00:00:00.000Z" } })
And if you want to run the above shell code without seeding the first value separately, you can leave moving_average undefined and just check for that before assigning the first value, as sketched below.
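A sketch of that variant:
var moving_average; // undefined until the first document is seen
db.test.find().forEach(function(doc) {
  moving_average = (moving_average === undefined)
    ? doc.price
    : (moving_average + doc.price) / 2;
});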
I have a collection1 of documents with tags in MongoDB. The tags are an embedded array of strings:
{
name: 'someObj',
tags: ['tag1', 'tag2', ...]
}
I want to know the count of each tag in the collection. Therefore I have another collection2 with tag counts:
{
  tag: 'tag1',
  score: 2
}
{
  tag: 'tag2',
  score: 10
}
Now I have to keep both in sync. It is rather trivial when inserting into or removing from collection1. However, when I update collection1 I do the following:
1.) get the old document
var oldObj = collection1.find({ _id: id });
2.) calculate the difference between old and new tag arrays
var removedTags = $(oldObj.tags).not(obj.tags).get();
var insertedTags = $(obj.tags).not(oldObj.tags).get();
3.) update the old document
collection1.update(
{ _id: id },
{ $set: obj }
);
4.) update the scores of inserted & removed tags
// increment score of each inserted tag
insertedTags.forEach(function(val, idx) {
// $inc will set score = 1 on insert
collection2.update(
{ tag: val },
{ $inc: { score: 1 } },
{ upsert: true }
)
});
// decrement score of each removed tag
removedTags.forEach(function(val, idx) {
// $inc will set score = -1 on insert
collection2.update(
{ tag: val },
{ $inc: { score: -1 } },
{ upsert: true }
)
});
My questions:
A) Is this approach of keeping a separate book of scores efficient? Or is there a more efficient one-time query to get the scores from collection1?
B) Even if keeping book separately is the better choice: can that be done in fewer steps, e.g. letting MongoDB calculate which tags are new/removed?
The solution, as nickmilion correctly states, would be an aggregation. Though I would do it with a twist: we'll save its results in a collection. What we will do is trade real-time results for an extreme speed boost.
How I would do it
More often than not, the need for real-time results is overestimated. Hence, I'd go with precalculated stats for the tags and renew them every 5 minutes or so. That should be good enough, since most such calls are requested asynchronously by the client, so some delay, in case the calculation has to be made for a specific request, is negligible.
db.tags.aggregate(
{$unwind:"$tags"},
{$group: { _id:"$tags", score:{"$sum":1} } },
{$out:"tagStats"}
)
db.tagStats.update(
{'lastRun':{$exists:true}},
{'lastRun':new Date()},
{upsert:true}
)
db.tagStats.ensureIndex({lastRun:1}, {sparse:true})
Ok, here is the deal. First, we unwind the tags array, group it by the individual tags, and increment the score for each occurrence of the respective tag. Next, we upsert lastRun into the tagStats collection, which we can do since MongoDB is schemaless. Then we create a sparse index, which only holds entries for documents in which the indexed field exists. If the index already exists, ensureIndex is an extremely cheap operation; and since we issue it from our code, we don't need to create the index manually. With this procedure, the following query
db.tagStats.find(
{lastRun:{ $lte: new Date( ISODate().getTime() - 300000 ) } },
{_id:0, lastRun:1}
)
becomes a covered query: a query which is answered from the index, which tends to reside in RAM, making this query lightning fast (slightly less than 0.5 ms median in my tests). So what does this query do? It will return a record when the aggregation was last run more than 5 minutes (5 * 60 * 1000 = 300000 ms) ago. Of course, you can adjust this to your needs.
Now, we can wrap it up:
// findOne returns null when nothing matches; a bare find() would return a cursor, which is always truthy
var hasToRun = db.tagStats.findOne(
  { lastRun: { $lte: new Date( ISODate().getTime() - 300000 ) } },
  { _id: 0, lastRun: 1 }
);
if(hasToRun){
db.tags.aggregate(
{$unwind:"$tags"},
{$group: {_id:"$tags", score:{"$sum":1} } },
{$out:"tagStats"}
)
db.tagStats.update(
{'lastRun':{$exists:true}},
{'lastRun':new Date()},
{upsert:true}
);
db.tagStats.ensureIndex({lastRun:1},{sparse:true});
}
// For all stats
var tagsStats = db.tagStats.find({score:{$exists:true}});
// score for a specific tag
var scoreForTag = db.tagStats.find({score:{$exists:true},_id:"tag1"});
Alternative approach
If real time results really matter and you need the stats for all the tags, simply use the aggregation without saving it to another collection:
db.tags.aggregate(
{$unwind:"$tags"},
{$group: { _id:"$tags", score:{"$sum":1} } },
)
If you only need the results for one specific tag at a time, a real time approach could be to use a special index, create a covered query and simply count the results:
db.tags.ensureIndex({tags:1})
var numberOfOccurrences = db.tags.find({tags:"tag1"},{_id:0,tags:1}).count();
Answering your questions:
B) You don't have to calculate the diff yourself; use $addToSet.
A) You can get the counts via the aggregation framework with a combination of $unwind and $group.
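For B), a minimal sketch of $addToSet in the style of the code above (the tag value is illustrative):
// $addToSet appends 'tag3' only if it is not already in the array,
// so no client-side diffing is needed for inserts
collection1.update(
  { _id: id },
  { $addToSet: { tags: 'tag3' } }
);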
I know that ObjectIds contain the date they were created on. Is there a way to query this aspect of the ObjectId?
Popping Timestamps into ObjectIds covers queries based on dates embedded in the ObjectId in great detail.
Briefly in JavaScript code:
/* This function returns an ObjectId embedded with a given datetime */
/* Accepts both Date object and string input */
function objectIdWithTimestamp(timestamp) {
/* Convert string date to Date object (otherwise assume timestamp is a date) */
if (typeof(timestamp) == 'string') {
timestamp = new Date(timestamp);
}
/* Convert date object to hex seconds since Unix epoch */
var hexSeconds = Math.floor(timestamp/1000).toString(16);
/* Create an ObjectId with that hex timestamp */
var constructedObjectId = ObjectId(hexSeconds + "0000000000000000");
return constructedObjectId
}
/* Find all documents created after midnight on May 25th, 1980 */
db.mycollection.find({ _id: { $gt: objectIdWithTimestamp('1980/05/25') } });
In pymongo, it can be done this way:
import datetime
from bson.objectid import ObjectId
mins = 15
gen_time = datetime.datetime.today() - datetime.timedelta(minutes=mins)
dummy_id = ObjectId.from_datetime(gen_time)
result = list(db.coll.find({"_id": {"$gte": dummy_id}}))
Using the built-in function provided by the MongoDB driver in Node.js lets you query by any timestamp:
var timestamp = Date.now();
var objectId = ObjectID.createFromTime(timestamp / 1000);
Alternatively, to search for records before the current time, you can simply do:
var objectId = new ObjectID(); // or ObjectId in the mongo shell
Source: http://mongodb.github.io/node-mongodb-native/api-bson-generated/objectid.html
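Putting this together, a hedged sketch of a range query with the Node.js driver (collection is assumed to be an open Collection handle):
var ObjectID = require('mongodb').ObjectID;

// documents created in the last hour: createFromTime expects whole seconds
var secondsAgo = Math.floor((Date.now() - 60 * 60 * 1000) / 1000);
var hourOldId = ObjectID.createFromTime(secondsAgo);

collection.find({ _id: { $gte: hourOldId } }).toArray(function (err, docs) {
  // docs now holds everything inserted in the last hour
});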
You can use the $convert function to extract the date from an ObjectId, starting in version 4.0.
Something like:
$convert: { input: "$_id", to: "date" }
You can then query on the date, comparing between start and end times:
db.collectionname.find({
"$expr":{
"$and":[
{"$gte":[{"$convert":{"input":"$_id","to":"date"}}, ISODate("2018-07-03T00:00:00.000Z")]},
{"$lte":[{"$convert":{"input":"$_id","to":"date"}}, ISODate("2018-07-03T11:59:59.999Z")]}
]
}
})
OR
You can use the shorthand $toDate to achieve the same:
db.collectionname.find({
"$expr":{
"$and":[
{"$gte":[{"$toDate":"$_id"}, ISODate("2018-07-03T00:00:00.000Z")]},
{"$lte":[{"$toDate":"$_id"},ISODate("2018-07-03T11:59:59.999Z")]}
]
}
})
Find command (from 2015-1-12 to 2015-1-15):
db.collection.find({
_id: {
$gt: ObjectId(Math.floor((new Date('2015/1/12'))/1000).toString(16) + "0000000000000000"),
$lt: ObjectId(Math.floor((new Date('2015/1/15'))/1000).toString(16) + "0000000000000000")
}
}).pretty()
Count command (from 2015-1-12 to 2015-1-15):
db.collection.count({
_id: {
$gt: ObjectId(Math.floor((new Date('2015/1/12'))/1000).toString(16) + "0000000000000000"),
$lt: ObjectId(Math.floor((new Date('2015/1/15'))/1000).toString(16) + "0000000000000000")
}
})
Remove command (from 2015-1-12 to 2015-1-15):
db.collection.remove({
_id: {
$gt: ObjectId(Math.floor((new Date('2015/1/12'))/1000).toString(16) + "0000000000000000"),
$lt: ObjectId(Math.floor((new Date('2015/1/15'))/1000).toString(16) + "0000000000000000")
}
})
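Since the same expression repeats in all three commands, a small helper keeps the shell session readable (a sketch; the function name is illustrative):
// build an ObjectId whose embedded timestamp corresponds to a given date
function objectIdFromDate(d) {
  return ObjectId(Math.floor(new Date(d) / 1000).toString(16) + "0000000000000000");
}

db.collection.find({
  _id: { $gt: objectIdFromDate('2015/1/12'), $lt: objectIdFromDate('2015/1/15') }
}).pretty();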
Since the first 4 bytes of an ObjectId represent a timestamp, to query your collection chronologically, simply order by id:
# oldest first; use pymongo.DESCENDING for most recent first
items = db.your_collection.find().sort("_id", pymongo.ASCENDING)
After you get the documents, you can get the ObjectId's generation time like so:
id = some_object_id
generation_time = id.generation_time
Yes, you can query objects by date using the MongoDB-generated ObjectId:
db.collectionname.find({_id: {$lt: ObjectId.fromDate( new ISODate("TZformat") ) } });
Let's suppose users is my collection and I want all users created before 5 January 2018:
db.users.find({_id: {$lt: ObjectId.fromDate( new ISODate("2018-01-05T00:00:00.000Z") ) } });
To compute the boundary date from an expression, we can use something like:
db.users.find({_id: {$lt: ObjectId.fromDate(new Date((new Date().getTime() - (1 * 3 * 60 * 60 * 1000))) ) } })
This returns all the users created before the current time minus 3 hours (i.e. older than 3 hours).
To get documents older than 60 days in a mongo collection, I used the query below in the shell:
db.collection.find({_id: {$lt:new ObjectId( Math.floor(new Date(new Date()-1000*60*60*24*60).getTime()/1000).toString(16) + "0000000000000000" )}})
If you want to make a range query, you can do it like in this post. For example querying for a specific day (i.e. Apr 4th 2015):
> var objIdMin = ObjectId(Math.floor((new Date('2015/4/4'))/1000).toString(16) + "0000000000000000")
> var objIdMax = ObjectId(Math.floor((new Date('2015/4/5'))/1000).toString(16) + "0000000000000000")
> db.collection.find({_id:{$gt: objIdMin, $lt: objIdMax}}).pretty()
From the documentation:
o = new ObjectId()
date = o.getTimestamp()
This way you have a date that is an ISODate.
See http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Extractinsertiontimesfromidratherthanhavingaseparatetimestampfield for more information.
Using the ObjectId directly, you should also find results as given below; ObjectIds compare by their embedded timestamp first, so there is no need to extract the time:
db.mycollection.find({ _id: { $gt: ObjectId("5217a543dd99a6d9e0f74702") } });
A solution for filtering within MongoDB Compass.
Based on versions:
Compass version: 1.25.0
MongoDB version: 4.2.8
Option 1:
#s7vr's answer worked perfectly for me. You can paste this into the Filter field:
{$expr: { $and: [ {$gte: [{$toDate: "$_id"}, ISODate('2021-01-01')]}, {$lt: [{$toDate: "$_id"}, ISODate('2021-02-01')]} ] } }
Option 2:
I also found this to work (remember that the Date constructor's month parameter uses 0-based indexing, so January is 0):
{_id: {$gte: ObjectId(Date(2021, 0, 1) / 1000), $lt: ObjectId(Date(2021, 1, 1) / 1000) } }
Option 3:
Equivalent with ISODate:
{_id: {$gte: ObjectId(ISODate('2021-01-01') / 1000), $lt: ObjectId(ISODate('2021-02-01') / 1000) } }
After writing this post, I decided to run the Explain on these queries. Here's the skinny on performance:
Option 1: 39 ms, 0 indexes used, 30 ms in COLLSCAN
Option 2: 0 ms, _id index used
Option 3: 1 ms, _id index used, 1 ms in FETCH
Based on my rudimentary analysis, it appears that Option 2 is the most efficient. Personally, I will use Option 3, as it is a little cleaner to use ISODate than to remember 0-based month indexing in the Date object.
In Rails with Mongoid you can query using:
time = Time.utc(2010, 1, 1)
time_id = ObjectId.from_time(time)
collection.find({'_id' => {'$lt' => time_id}})