Okay, I'm a SQL Server-based DBA, but there's a biz-critical app that uses MongoDB as a core component, and the problem is that it's grown too large. (The "large-ness" isn't critical yet; I'm trying to be proactive!)
Specifically, it has a "message log" collection that is over 400 GB, where the date-stamp is actually stored as a 2-element array [Int64, Int32]: the 0th element is some measure of time, and the 1st element is just always 0.
So for example, a document:
{
    "_id" : ObjectId("55ef63618c782e0afcf346cf"),
    "CertNumber" : null,
    "MachineName" : "WORKERBEE1",
    "DateTime" : [
        NumberLong(635773487051900000),
        0
    ],
    "Message" : "Waited 00:00:30.0013381 to get queue lock: Specs to verify",
    "ScopeStart" : false
}
And just because 2 is better than 1, another example document:
{
    "_id" : ObjectId("55ef63618c782e0afcf323be"),
    "CertNumber" : null,
    "MachineName" : "WORKERBEE2",
    "DateTime" : [
        NumberLong(635773487056430453),
        0
    ],
    "Message" : "Waited 00:00:30.0012345 to get queue lock: Specs to verify",
    "ScopeStart" : false
}
I need to figure out two things:
What the heck does that "DateTime" really mean? It's not Unix epoch time (neither seconds nor milliseconds); and even if I strip off the trailing 0's, it represents (in millis) 6/20/2171, so unless we're building a time machine here, it makes no sense. If I strip off the last 6 digits, it means 2/23/1990, but even that doesn't seem likely, as this application has only existed since the early 2000's (AFAIK).
Assuming we figure out #1, can we use some kind of command to remove (delete) all documents in the collection that are older than, say, 1/1/2016?
Again, I'm a SQL guy, so try to explain using analogs in that vein, e.g. "this is like your WHERE clause" and such.
PS: Yes, I read through questions such as Find objects between two dates MongoDB and How do I convert a property in MongoDB from text to date type?, but so far nothing has jumped out at me.
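For what it's worth, a value like 635773487051900000 is consistent with a .NET DateTime tick count (100-nanosecond intervals since 0001-01-01 UTC): it decodes to early September 2015, which matches the timestamp embedded in that document's ObjectId. Assuming that interpretation holds, a minimal shell sketch for question #2 might look like this (the collection name is hypothetical):

// Assumption: DateTime[0] holds .NET ticks (100-ns units since 0001-01-01 UTC).
// Ticks for 2016-01-01 UTC = 621355968000000000 (ticks at the Unix epoch)
//                          + 1451606400000 ms * 10000
//                          = 635872032000000000
// The query document is the analogue of your WHERE clause, and remove()
// is the DELETE; "DateTime.0" addresses element 0 of the array.
db.messageLog.remove({ "DateTime.0": { "$lt": NumberLong("635872032000000000") } });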
So I have a program constantly running, watching an API endpoint. When something on the endpoint has changed, it updates the MongoDB document exactly how I want it. Next, I want to be able to get what has changed in the document and, let's just say, use that as a variable or whatever I decide to do with it. Right now, I can get it to update and just tell me that the document has been changed, but not what EXACTLY has changed.
If you are replacing the document, you cannot get what was changed directly, but you can get the version of the document before the change was applied, so you can figure out the differences. In the official MongoDB driver, use the FindOneAndReplace function to get the document as it was before the update, and then compare.
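For example, a rough shell sketch of the same idea (the watched collection, someId, and newDoc are hypothetical); findOneAndReplace returns the pre-update document by default:

// findOneAndReplace hands back the document as it was BEFORE the
// replacement (by default), so it can be diffed against the new one.
var before = db.watched.findOneAndReplace({ _id: someId }, newDoc);
Object.keys(newDoc).forEach(function (k) {
    if (JSON.stringify(before[k]) !== JSON.stringify(newDoc[k])) {
        print("changed: " + k);
    }
});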
You can:
- read the mongo log, or
- write code to compare the original data with the updated data, or
- add one more field to the data that records what was updated.
If you want a more detailed version, I suggest you use the mongo log via the MongoDB profiler (see the Update Operation section).
Here is the snippet of the log:
{
    "op" : "update",
    "ns" : "test.report",
    "command" : {
        "q" : { },
        "u" : { "$rename" : { "a" : "b" } },
        "multi" : true,
        "upsert" : false
    },
    .............
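To get entries like that, profiling has to be switched on first; a minimal sketch (level 2 logs all operations, which is heavy for production):

db.setProfilingLevel(2);                                             // log all operations
db.system.profile.find({ op: "update" }).sort({ ts: -1 }).limit(1); // latest update entry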
Hope it helps.
In my social network I want to get the feed for member A. Member A is following, let's say, 20 categories/members.
When a category/member (followed by member A) performs an activity, it is inserted into a collection called recent_activity:
{
    "content_id": "6",                            // content id member A is following
    "content_type_id": "6",                       // content type (category, other member)
    "social_network_id": "2",                     // the action the category did (add/like/follow)
    "member_id": "51758",                         // member A
    "date_added": ISODate("2014-03-23T11:37:03.0Z"),
    "platform_id": NumberInt(2),
    "_id": ObjectId("532ec75f6b1f76fa2d8b457b"),
    "_type": {
        "0": "Altibbi_Mongo_RecentActivity"
    }
}
I want, when member A logs into the system, to get the last 10 activities for those categories/members.
My problem:
How do I get only 10 activities across all categories/members?
Is it better to do it in one query or in a for loop?
For this use case, I'd suggest inverting the logic and keeping a separate object with the last 10 activities for member A that is kept up to date all the time. While that solution is more write-heavy, it makes reading trivially simple, and it can be extended very easily. I'd like to blatantly advertise a blog post I wrote a while ago about news feeds with MongoDB, which outlines this approach.
This 'fan-out' approach might seem overly complex at first, but when you think about importance filtering / ranking (a la facebook), push messages for particularly important events (facebook, twitter) or regular digest emails (practically all), you will get one location in your code to perform all this logic.
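To illustrate the write side of the fan-out, a minimal sketch (the feeds collection and its field names are hypothetical): whenever an activity is written, it is pushed into each follower's feed document, kept sorted and capped at the 10 most recent.

var newActivity = { /* the recent_activity document just written */ };
db.feeds.update(
    { member_id: "51758" },                  // one feed document per follower
    { $push: { activities: {
        $each:  [ newActivity ],
        $sort:  { date_added: 1 },           // oldest..newest
        $slice: -10                          // keep only the 10 newest
    } } },
    { upsert: true }
);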
I think I commented that I'm not really seeing the selection criteria. So if you are "outside" of a single collection, then you have problems. But if your indicated fields are the things you want to "filter" by, then just do this:
db.collection.find({
"social_network_id": "2",
"content_type_id": "6",
"content_id": "6",
"member_id": { "$ne": "51758" }
})
.sort({ "$natural": -1 })
.limit(10);
So what does that do? You match the various conditions in the data to do the "category match" (if I understood what was meant), then you make sure you are not matching entries by the same member.
The last parts do the "natural" sort. This is important because the ObjectId is monotonic, which is math-speak for "ever increasing". This means the "newest" entries always have the "highest" values, so descending order goes "latest" to "oldest".
And the very final part is a basic "limit". So just return the last 10 entries.
As long as you can "filter" within the same collection in whatever way you want, then this should be fine.
This is the explain output of a simple search in MongoDB; it takes more than 2.4 seconds to retrieve data. If I add an index on the search params, it takes more than 5 seconds.
Query
db.CX_EMPLOYEES.find({ "$or" : [{ "AML_FULLNAME" : /RAJ/ },
{ "AML_FULLALIAS" : /RAJ/ }] })
Explain
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 79,
    "nscannedObjects" : 504570,
    "nscanned" : 504570,
    "nscannedObjectsAllPlans" : 504570,
    "nscannedAllPlans" : 504570,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 2423,
    "indexBounds" : {},
    "server" : "SERVER:27017"
}
Scheduled for the 2.6 version of MongoDB is a full text search feature. It's available as a development preview in current builds if enabled.
Given the nature of your query, it's likely to be the only option that might be effective using only MongoDb. As you're trying to do a "string contains" search according to the regular expressions you provided, the performance of doing a search to match strings on multiple fields will be terrible given the size of your collection. While it's a simple query in concept, translation to an efficient query is very difficult. Mongo needs to scan every document for a match. Breaking the words apart doesn't help, as Mongo still needs to scan every document.
If you can anchor the regular expression, which means changing it to a "string starts with" rather than a "string contains" search, the performance should be reasonable, provided you normalize the strings so that character case is ignored, and bearing in mind that matches will still be exact. For example, a is not á and would need to be handled specially.
Mongo's support for this type of query really is limited for production use. You may find that the full text search feature isn't suitable either. If this query is important, I'd suggest considering alternative search mechanisms; possibly look at something like Elasticsearch, for example.
There is no reason to add an index for these search params, because you are using a regex. An index can improve a find with a regex only if the regex has an anchor for the beginning:
db.CX_EMPLOYEES.find({ "$or" : [{ "AML_FULLNAME" : /^RAJ/ }, { "AML_FULLALIAS" : /^RAJ/ }] })
From documentation:
$regex can only use an index efficiently when the regular expression has an anchor for the beginning (i.e. ^) of a string and is a case-sensitive match. Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent strings, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/, and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.
There is not a lot you can do. You have half a million documents and you are doing a full scan over all of them, so it's no surprise that it takes time. Moreover, your search is based on a regex that can match anywhere in the string, so indexes cannot help you in this case.
If your search is based on words, you can try to create an array from the string. For example, the string 'Salvador Domingo Dali' would be converted to ['Salvador', 'Domingo', 'Dali']. If you add an index on this array and then look for 'Dali', the search will take advantage of that index.
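A rough shell sketch of that idea (the NAME_WORDS field is hypothetical):

// One-off backfill: store a tokenized copy of the full name.
db.CX_EMPLOYEES.find({ AML_FULLNAME: { $type: 2 } }).forEach(function (doc) {
    db.CX_EMPLOYEES.update(
        { _id: doc._id },
        { $set: { NAME_WORDS: doc.AML_FULLNAME.split(" ") } }
    );
});
db.CX_EMPLOYEES.ensureIndex({ NAME_WORDS: 1 });  // multikey index over the array
db.CX_EMPLOYEES.find({ NAME_WORDS: "RAJ" });     // exact-word match, can use the index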
P.S. Databases and indexes are not a silver bullet. Sometimes you need better logic to be able to deal with a lot of data.
First, I'd like to say that I really love NoSQL & MongoDB but I've got some major concerns with its schema-less aspect.
Let's say I have 2 tables. Employees and Movies.
And... I have a very stupid data layer / framework that sometimes likes to save objects in the wrong tables.
So one day, a Movie gets saved in the Employees table. Like this:
> use mongoTests;
switched to db mongoTests
> db.employees.insert({ name : "Max Power", sex : "Male" });
> db.employees.find();
{ "_id" : ObjectId("4fb25ce6420141116081ae57"), "name" : "Max Power", "sex" : "Male" }
> db.employees.insert({ title : "Fight Club", actors : [{ name : "Brad Pitt" }, { name : "Edward Norton" }]});
> db.employees.find();
{ "_id" : ObjectId("4fb25ce6420141116081ae57"), "name" : "Max Power", "sex" : "Male" }
{ "_id" : ObjectId("4fb25db834a31eb59101235b"), "title" : "Fight Club", "actors" : [ { "name" : "Brad Pitt" }, { "name" : "Edward Norton" } ] }
This is VERY wrong.
Let's switch the context: think about Movies and CreditCards (for whatever reason, in this context credit cards would be stored in clear text inside the DB). This is SUPER wrong:
The code would probably explode because it's trying to use one object structure and receives a totally unknown structure.
Even worse, the code actually works, and the webstore visitors actually see credit card information in the "Rent a movie" list.
Is there anything built in that would prevent such a threat from ever happening? Like some way to "force" a schema to be respected for only some tables?
Or is there any way to force MongoDB to make a schema mandatory? (Can't create new fields in a table, etc.)
EDIT: For those who think I'm trolling: I'm really not. This is an important question for me and my team, because it's a big factor in deciding whether or not we're going to use NoSQL.
Thanks and have a nice day.
The schema-less aspect is one of the major positives.
A DB with a schema doesn't fully remove this kind of issue - e.g. there could be a bug in a system that uses an RDBMS that puts the wrong data in the wrong field/table.
IMHO, the bigger concern would be, how did that kind of bug make it through dev, testing and out into production?!
Having said that, you could set up a process that checks the "schema" of documents within a collection (e.g. look at newly added documents and check whether they have the fields you would expect to see in there), then flag them up for investigation. There is such a tool (node.js) here (I think; I've never used it):
http://dhendo.github.com/node-mongodb-schema-validator/
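A bare-bones version of that kind of check in the shell might look like this (the expected field list is hypothetical):

// Flag employee documents carrying fields outside the expected "schema".
var expected = { _id: true, name: true, sex: true };   // hypothetical schema
db.employees.find().forEach(function (doc) {
    Object.keys(doc).forEach(function (k) {
        if (!expected[k]) {
            print("unexpected field '" + k + "' in document " + doc._id);
        }
    });
});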
Edit:
For those finding this question in future, so the link in my comment doesn't go overlooked, there's a jira item for this kind of thing here:
http://jira.mongodb.org/browse/SERVER-3536
Ok, so I'm trying to roll out a small update to my site. One update includes querying upon a field that may or may not exist. This doesn't work as I want, so I decided to just make it so that the field always exists in my database. I used this line at the MongoDB shell:
> db.entries.update({Published: null},{$set: {Published: true}},false,true);
Now, I'm not fully understanding how this caused every entry object where Published is null to be deleted. I mean, they literally were deleted. I tried looking up some IDs, and .findOne() returns null for them.
How does that line work? I thought it would take every entry where Published is null (doesn't exist), and set Published to true.
Reading about operator behavior is better than guessing at it. Searching for null is different from checking for existence.
MongoDB has a dedicated $exists operator:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24exists
To be honest, I'm not sure why it works now with these changes, or, for that matter, why the original command deleted everything.
My ending command looked like this:
db.entries.update({Published: {$exists: false},$atomic: true},{$set:{"Published":true}},false,true);
I thought it would take every entry where Published is null (doesn't exist), and set Published to true.
OK, so these are two different things.
Published is null:
{ Published : null, post : 'blah' }
Published does not exist:
{ post : 'blahblah' }
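The practical difference shows up in queries: { Published: null } matches both shapes (explicit null and missing field), while { $exists: false } matches only the second. A quick illustration in a throwaway collection:

db.demo.insert({ Published: null, post: 'blah' });
db.demo.insert({ post: 'blahblah' });
db.demo.find({ Published: null }).count();                 // 2 -- null OR missing
db.demo.find({ Published: { $exists: false } }).count();   // 1 -- missing field only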
You may want to post this question over at the MongoDB user group (developers check it very often) at http://groups.google.com/group/mongodb-user
Updates do not delete documents. In fact, the update you ran does what you intended. For example, if you wanted y to always have a value:
> db.foo.insert({x:1})
> db.foo.insert({x:2})
> db.foo.insert({y:null})
> db.foo.insert({y:1})
> db.foo.update({y:null},{$set : {y:true}}, false, true)
> db.foo.find()
{ "_id" : ObjectId("4db02aabbe5a5418fb65d24c"), "y" : true }
{ "_id" : ObjectId("4db02aafbe5a5418fb65d24d"), "y" : 1 }
{ "_id" : ObjectId("4db02aa1be5a5418fb65d24a"), "x" : 1, "y" : true }
{ "_id" : ObjectId("4db02aa4be5a5418fb65d24b"), "x" : 2, "y" : true }
There must have been another operation that did the delete. There might be a record of it in the logs (or there might not... it depends how long it took). It's impossible to tell from the info here what caused the deletions, but the update isn't the culprit.