I have the following index:
PaymentSchema.index({ driver_id: 1, year: 1, month: 1 },{ unique: true });
So I want this collection to hold just one record for each distinct combination of the fields driver_id, year and month. I want to update the collection with the upsert option:
var query = {
driver_id: req.params.driver_id,
year: req.params.year,
month: req.params.month,
amount: req.params.old_value
};
var update = {
$set: {
amount: req.params.new_value
}
};
var options = {
upsert: true
};
Payment.update(query,update,options,function(err,rows){
if(err) return next(err);
res.json({});
});
So what I want is to update the document with a given unique key (driver_id + year + month) and with the additional condition amount = .... If the query conditions match, the document is updated - and that works. If no document matches these conditions and no document with that unique key exists, a new one is created. But if a document with the unique key already exists (only the amount condition is wrong), a new document is still created with the same unique key (driver_id + year + month). This is strange, because I declared a unique index on those three fields (driver_id + year + month), and in the mongo shell I can see two documents with the same values for them...
Solved: I had to restart mongod and delete the database (rebuilding the index would probably have worked too).
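In case it helps anyone hitting the same thing, a quick way to confirm the unique index is actually in place is to inspect the collection in the mongo shell (the collection name payments is assumed here, based on the Payment model):
db.payments.getIndexes();
// if the unique index is missing (e.g. its build failed because duplicates already existed),
// remove the duplicates and recreate it explicitly:
db.payments.createIndex({ driver_id: 1, year: 1, month: 1 }, { unique: true });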
Related
I sort and paginate records by a last_message_id field, which holds an ObjectId (not a timestamp), though the two are essentially the same if you dig into it.
The problem is that some records may be skipped during pagination if we paginate using a timestamp, because timestamps are not unique and can be duplicated.
In my project there are two collections, "Rooms" and "Messages". Whenever a user sends a message, the associated room of that message is updated with the _id of the message.
I do this because I want to show recently active rooms to users.
Is there a solution to the problem?
Rooms:

_id | room_name | last_message_id
x   | General   | 612a8e6ab075cf9f2b8c6f9d
y   | Politics  | 612a8e6ab075cf9f2b8c6f9e
Messages:

_id                      | room_id | text
612a8e6ab075cf9f2b8c6f9e | y       | ...
612a8e6ab075cf9f2b8c6f9d | x       | ...
I solved the issue by adding a compound index:
['last_message_id', '_id']
The order of the fields must not be changed.
We can also use a timestamp (created_at) rather than an ObjectId (last_message_id).
Example:
// Pagination result: (limit 1)
[{_id: "613091804ae06dde2e507ff5", last_message_id: "612ff5e11d3130cc1fb1448e"}]
// to get more results:
// http://www.website.com/results?id=613091804ae06dde2e507ff5&last_message_id=612ff5e11d3130cc1fb1448e
// Pseudo-query:
let id = req.query.id;
let last_message_id = req.query.last_message_id;
let query = {};
if (id && last_message_id) {
query = {
$or: [
{
last_message_id: {
$lt: last_message_id
}
},
{
last_message_id,
_id: {
$lt: id
}
}
]
};
}
db.items.find(query).sort({last_message_id: -1, _id: -1}).limit(1);
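For completeness, a sketch of creating the compound index that backs this sort (the collection name items is taken from the query above; the directions match the sort, although a fully ascending index would also work, since the sort reverses both fields):
db.items.createIndex({ last_message_id: -1, _id: -1 });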
In a Meteor project, I would like to find the first item in a collection. It's for a page with a form in which I can edit the content.
I've created a new collection with my data. How can I target only this item without hardcoding the _id in the code (yuck)?
My router:
Router.route('/admin/about/edit', {
name: 'aboutContentAdmin',
layoutTemplate: 'adminLayout',
data: function() {
var about = About.find().sort({
x: 1 // doesn't work
});
return {
about: about
};
}
});
Thank you!
To guarantee the order you will need a key to sort on. The _id field is not naturally sorted in Meteor. The normal pattern is to add a createdAt key of type Date and sort on that. You can then limit the return set to a single document to get the first document:
var about = About.findOne({}, {sort: {createdAt: 1}, limit: 1});
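For this to work, each document needs the createdAt key set when it is inserted; a minimal sketch (the other field names are just illustrative):
About.insert({ title: "About page", content: "Some text", createdAt: new Date() });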
I have a collection1 of documents with tags in MongoDB. The tags are an embedded array of strings:
{
name: 'someObj',
tags: ['tag1', 'tag2', ...]
}
I want to know the count of each tag in the collection. Therefore I have another collection2 with tag counts:
{
    tag: 'tag1',
    score: 2
},
{
    tag: 'tag2',
    score: 10
}
Now I have to keep both in sync. It is rather trivial when inserting to or removing from collection1. However when I update collection1 I do the following:
1.) get the old document
var oldObj = collection1.findOne({ _id: id });
2.) calculate the difference between old and new tag arrays
var removedTags = $(oldObj.tags).not(obj.tags).get();
var insertedTags = $(obj.tags).not(oldObj.tags).get();
3.) update the old document
collection1.update(
{ _id: id },
{ $set: obj }
);
4.) update the scores of inserted & removed tags
// increment score of each inserted tag
insertedTags.forEach(function(val, idx) {
// $inc will set score = 1 on insert
collection2.update(
{ tag: val },
{ $inc: { score: 1 } },
{ upsert: true }
)
});
// decrement score of each removed tag
removedTags.forEach(function(val, idx) {
// $inc will set score = -1 on insert
collection2.update(
{ tag: val },
{ $inc: { score: -1 } },
{ upsert: true }
)
});
My questions:
A) Is this approach of keeping book of scores separately efficient? Or is there a more efficient one-time query to get the scores from collection1?
B) Even if keeping book separately is the better choice: can that be done in less steps, e.g. letting mongoDB calculate what tags are new / removed?
The solution, as nickmilion correctly states, would be an aggregation. Though I would do it with a twist: we'll save its results in a collection. What this does is trade real-time results for an extreme speed boost.
How I would do it
More often than not, the need for real-time results is overestimated. Hence, I'd go with precalculated stats for the tags and renew them every 5 minutes or so. That should be good enough, since most such calls are requested asynchronously by the client, so the delay on the occasional request that triggers a recalculation is negligible.
db.tags.aggregate([
    {$unwind:"$tags"},
    {$group: { _id:"$tags", score:{"$sum":1} } },
    {$out:"tagStats"}
])
db.tagStats.update(
{'lastRun':{$exists:true}},
{'lastRun':new Date()},
{upsert:true}
)
db.tagStats.ensureIndex({lastRun:1}, {sparse:true})
Ok, here is the deal. First, we unwind the tags array, group it by the individual tags and increment the score for each occurrence of the respective tag; the $out stage writes the result to the tagStats collection. Next, we upsert lastRun into tagStats, which we can do since MongoDB is schemaless. Finally, we create a sparse index, which only holds entries for documents in which the indexed field exists. If the index already exists, ensureIndex is an extremely cheap operation, and since we call it from our code anyway, we don't need to create the index manually. With this procedure, the following query
db.tagStats.find(
{lastRun:{ $lte: new Date( ISODate().getTime() - 300000 ) } },
{_id:0, lastRun:1}
)
becomes a covered query: a query which is answered entirely from the index, which tends to reside in RAM, making this query lightning fast (slightly less than 0.5 msecs median in my tests). So what does this query do? It returns a record when the last run of the aggregation was more than 5 minutes (5 * 60 * 1000 = 300000 msecs) ago. Of course, you can adjust this to your needs.
Now, we can wrap it up:
var hasToRun = db.tagStats.findOne(
    {lastRun:{ $lte: new Date( ISODate().getTime() - 300000 ) } },
    {_id:0, lastRun:1}
);
if(hasToRun){
db.tags.aggregate([
    {$unwind:"$tags"},
    {$group: {_id:"$tags", score:{"$sum":1} } },
    {$out:"tagStats"}
])
db.tagStats.update(
{'lastRun':{$exists:true}},
{'lastRun':new Date()},
{upsert:true}
);
db.tagStats.ensureIndex({lastRun:1},{sparse:true});
}
// For all stats
var tagsStats = db.tagStats.find({score:{$exists:true}});
// score for a specific tag
var scoreForTag = db.tagStats.find({score:{$exists:true},_id:"tag1"});
Alternative approach
If real time results really matter and you need the stats for all the tags, simply use the aggregation without saving it to another collection:
db.tags.aggregate([
    {$unwind:"$tags"},
    {$group: { _id:"$tags", score:{"$sum":1} } }
])
If you only need the results for one specific tag at a time, a real time approach could be to use a special index, create a covered query and simply count the results:
db.tags.ensureIndex({tags:1})
var numberOfOccurences = db.tags.find({tags:"tag1"},{_id:0,tags:1}).count();
Answering your questions:
B) You don't have to calculate the diff yourself; use $addToSet.
A) You can get the counts via the aggregation framework with a combination of $unwind and $group.
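For example, a rough sketch of letting MongoDB handle the set logic instead of diffing in application code (the tag values here are purely illustrative, using the question's collection1):
// add new tags without creating duplicates inside the array
collection1.update({ _id: id }, { $addToSet: { tags: { $each: ["tag2", "tag3"] } } });
// remove tags that were dropped
collection1.update({ _id: id }, { $pullAll: { tags: ["tag1"] } });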
I have a collection in MongoDB with around 3 million records. A sample record looks like this:
{ "_id" = ObjectId("50731xxxxxxxxxxxxxxxxxxxx"),
"source_references" : [
"_id" : ObjectId("5045xxxxxxxxxxxxxx"),
"name" : "xxx",
"key" : 123
]
}
I have a lot of duplicate records in the collection with the same source_references.key. (By duplicate I mean source_references.key, not the _id.)
I want to remove the duplicate records based on source_references.key. I'm thinking of writing some PHP code to traverse each record and remove it if a duplicate exists.
Is there a way to remove the duplicates directly from the Mongo shell?
This answer is obsolete: the dropDups option was removed in MongoDB 3.0, so a different approach is required in most cases. For example, you could use aggregation as suggested in: MongoDB duplicate documents even after adding unique key.
If you are certain that the source_references.key identifies duplicate records, you can ensure a unique index with the dropDups:true index creation option in MongoDB 2.6 or older:
db.things.ensureIndex({'source_references.key' : 1}, {unique : true, dropDups : true})
This will keep the first unique document for each source_references.key value, and drop any subsequent documents that would otherwise cause a duplicate key violation.
Important Note: Any documents missing the source_references.key field will be considered as having a null value, so subsequent documents missing the key field will be deleted. You can add the sparse:true index creation option so the index only applies to documents with a source_references.key field.
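For example, the same index creation with sparse added (still MongoDB 2.6 or older, since it relies on dropDups):
db.things.ensureIndex({'source_references.key' : 1}, {unique : true, dropDups : true, sparse : true})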
Obvious caution: Take a backup of your database, and try this in a staging environment first if you are concerned about unintended data loss.
This is the easiest query I used on my MongoDB 3.2
db.myCollection.find({}, {myCustomKey:1}).sort({_id:1}).forEach(function(doc){
db.myCollection.remove({_id:{$gt:doc._id}, myCustomKey:doc.myCustomKey});
})
Index your myCustomKey field before running this to increase speed.
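For example, a one-liner using the same field name as above:
db.myCollection.createIndex({ myCustomKey: 1 });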
While #Stennie's is a valid answer, it is not the only way. In fact, the MongoDB manual asks you to be very cautious while doing that. There are two other options:
Let MongoDB do it for you using map-reduce.
Do it programmatically, which is less efficient.
Here is a slightly more 'manual' way of doing it:
Essentially, first get a list of all the unique keys you are interested in.
Then perform a search using each of those keys and delete the extra documents if that search returns more than one.
db.collection.distinct("key").forEach((num)=>{
var i = 0;
db.collection.find({key: num}).forEach((doc)=>{
if (i) db.collection.remove({key: num}, { justOne: true })
i++
})
});
I had a similar requirement but I wanted to retain the latest entry. The following query worked with my collection which had millions of records and duplicates.
/** Create an array to store the ids of all duplicate records */
var duplicates = [];
/** Start Aggregation pipeline*/
db.collection.aggregate([
{
$match: { /** Add any filter here. Add index for filter keys*/
filterKey: {
$exists: false
}
}
},
{
$sort: { /** Sort it in such a way that you want to retain first element*/
createdAt: -1
}
},
{
$group: {
_id: {
key1: "$key1", key2:"$key2" /** These are the keys which define the duplicate. Here document with same value for key1 and key2 will be considered duplicate*/
},
dups: {
$push: {
_id: "$_id"
}
},
count: {
$sum: 1
}
}
},
{
$match: {
count: {
"$gt": 1
}
}
}
],
{
allowDiskUse: true
}).forEach(function(doc){
doc.dups.shift();
doc.dups.forEach(function(dupId){
duplicates.push(dupId._id);
})
})
/** Delete the duplicates*/
var i,j,temparray,chunk = 100000;
for (i=0,j=duplicates.length; i<j; i+=chunk) {
temparray = duplicates.slice(i,i+chunk);
db.collection.bulkWrite([{deleteMany:{"filter":{"_id":{"$in":temparray}}}}])
}
Expanding on Fernando's answer, I found that it was taking too long, so I modified it.
var x = 0;
db.collection.distinct("field").forEach(fieldValue => {
var i = 0;
db.collection.find({ "field": fieldValue }).forEach(doc => {
if (i) {
db.collection.remove({ _id: doc._id });
}
i++;
x += 1;
if (x % 100 === 0) {
print(x); // Every time we process 100 docs.
}
});
});
The improvement is basically using the document _id for removal, which should be faster, and also printing the progress of the operation; you can change the print interval to whatever amount you like.
Also, indexing the field before the operation helps.
pip install mongo_remove_duplicate_indexes
Create a script in any language and iterate over your collection.
Create a new collection and create a new unique index on it. Remember that this index has to be on the same field you wish to remove duplicates from in your original collection.
For example: you have a collection gaming, and in this collection you have a field genre which contains duplicates that you wish to remove, so just create a new collection:
db.createCollection("cname")
Create the new index:
db.cname.createIndex({'genre': 1}, {unique: true})
Now when you insert documents with the same genre, only the first will be accepted; the others will be rejected with a duplicate key error.
Now just insert the documents you read from the original collection into the new collection and handle the exception, e.g. pymongo.errors.DuplicateKeyError.
Check out the package source code of mongo_remove_duplicate_indexes for a better understanding.
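The same idea also works directly in the mongo shell without the package; a rough sketch with assumed collection names (gaming as the source, gaming_dedup as the new collection with the unique index):
db.createCollection("gaming_dedup");
db.gaming_dedup.createIndex({ genre: 1 }, { unique: true });
db.gaming.find().forEach(function (doc) {
    var res = db.gaming_dedup.insert(doc);
    // duplicate inserts fail with a duplicate key error (code 11000) and are simply skipped
    if (res.hasWriteError() && res.getWriteError().code !== 11000) {
        throw res.getWriteError();
    }
});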
If you have enough memory, you can do something like this in Scala:
cole.find().groupBy(_.customField).filter(_._2.size > 1).map(_._2.tail).flatten.map(_.id)
    .foreach(x => cole.remove({id $eq x}))
I need to know if it is possible to have a list of objects where the objects are unique by day.
I have a collection with this format:
{
    domain: "google.com",
    counters: [
        { day: "2011-08-03", metric1: 10, metric_2: 15 },
        { day: "2011-08-04", metric1: 8, metric_2: 7 },
        { day: "2011-08-05", metric1: 20, metric_2: 150 }
    ]
}
I tried something like this:
db.test.ensureIndex({ domain: 1, 'counters.day': 1 }, { unique: true })
with upsert and $push, but this does not work.
Then I tried with upsert and $addToSet, but I can't specify which fields make an element unique.
I need to push a new counter, if the day exists, it should be replaced.
Unique indexes work only across root documents, not within an embedded array. That means you can't insert two documents with the same domain and counters.day, but you can insert duplicate rows into the embedded counters array of a single document.
I need to push a new counter, if the day exists, it should be
replaced.
When you try to insert a new embedded document, you should check whether a counter for that day already exists; if it does, make an update, otherwise push a new one.
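A rough sketch of that two-step approach in the shell (field names are taken from the question; the metric values and the collection name test are just illustrative):
var day = "2011-08-06";
var counter = { day: day, metric1: 5, metric_2: 9 };
// try to replace an existing counter for that day using the positional operator
var res = db.test.update(
    { domain: "google.com", "counters.day": day },
    { $set: { "counters.$": counter } }
);
if (res.nMatched === 0) {
    // no counter for that day yet, so push a new one
    db.test.update({ domain: "google.com" }, { $push: { counters: counter } });
}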