How do I remove the duplicate key check before insert in MongoDB?

I use MongoDB to manage the DB for a yearly contest. Every year many users simply renew their registration. MongoDB rejects duplicate emails, so they cannot register if they participated in any edition from earlier years.
My question is: is there any way to remove that limitation? Or maybe change the duplicate-key check to use the "_id" (or some other field) instead of the "email"?

Apart from the mandatory _id field, MongoDB will only enforce uniqueness based on additional unique indexes that have been created. In your situation it sounds like there may be a unique index defined on { email: 1 }.
If that is not the logic that you wish to enforce, then you should drop that index and replace it with a different one. How exactly you define that really depends on your desired application logic. If you had a registrationYear field, for example, perhaps a compound unique index on both of those fields ({ email: 1, registrationYear: 1 }) would be appropriate.
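As a minimal sketch of that change in the shell, assuming a users collection, an existing unique index named email_1, and a registrationYear field (all three names are assumptions about your schema):
// Drop the assumed single-field unique index on email
db.users.dropIndex("email_1")
// Enforce uniqueness per email per registration year instead
db.users.createIndex({ email: 1, registrationYear: 1 }, { unique: true })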
But this probably isn't the only way to solve the problem. An alternative approach may be to combine a unique index with a partial index. With this approach, you could define an index as follows, assuming there is some active field in the document that becomes false after the specified amount of time:
db.foo.createIndex({ email: 1 }, { unique: true, partialFilterExpression: { active: true } })
Such an index would only include documents that are currently considered active, thereby enforcing uniqueness only on them. Once it was time to renew and an old document was no longer active, the database would accept a new one.
Another alternative approach would be to just update the existing documents rather than creating new ones. Again this depends on what exactly you are trying to achieve, but you could use a similar approach of marking a document as no longer active and having the registration process perform an upsert (either an insert or an update).
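As a rough sketch of that upsert-based flow (collection and field names are assumptions), the registration handler could do something like:
// Create the registration if it doesn't exist yet, otherwise refresh it in place
db.users.updateOne(
  { email: "user@example.com" },
  {
    $set: { active: true, registeredAt: new Date() },
    $setOnInsert: { createdAt: new Date() }
  },
  { upsert: true }
)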
Finally, if you don't need historical information at all then you could additionally do some sort of archival or deletion (perhaps via a TTL index) to expire the old documents.

Related

How should I efficiently delete a lot of records from a MongoDB collection?

I am using Mongo to store multi-tenant data. As part of data cleanup for a tenant I want to delete everything related to that tenant. The tenantId is indexed, but there are a lot of documents, the delete takes a long time, and I have no easy way to track its progress.
Currently I do something like:
db.records.deleteMany({tenantId: x})
Is there a better way?
I am thinking of doing it in batches: query for x records, build a list of ids, then delete them. It seems very manual, but is it the recommended way?
Some options that I can think of:
Drop the index before deleting. You can recreate the index after the deletion.
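One reading of this option is to drop secondary indexes other than the one backing the filter, so the delete skips their maintenance; the index name below is illustrative:
db.records.dropIndex("createdAt_1")       // drop other secondary indexes so the delete skips their maintenance
db.records.deleteMany({ tenantId: x })    // keep the tenantId index so the filter itself stays indexed
db.records.createIndex({ createdAt: 1 })  // recreate the dropped index afterwards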
Change the write concern to a lower value, possibly 0; the request then won't wait for acknowledgement of the write.
db.records.deleteMany({ tenantId: x }, { writeConcern: { w: 0 } })
If there is another field with enough cardinality to reduce the number of documents, try including that in the query.
Ex: if anotherField has 0, 1, 2, 3 as values, then execute the delete command 4 times, each time with a different value.
db.records.deleteMany({ tenantId: x, anotherField: 0 }, { writeConcern: { w: 0 } })
db.records.deleteMany({ tenantId: x, anotherField: 1 }, { writeConcern: { w: 0 } })
db.records.deleteMany({ tenantId: x, anotherField: 2 }, { writeConcern: { w: 0 } })
db.records.deleteMany({ tenantId: x, anotherField: 3 }, { writeConcern: { w: 0 } })
The performance may depend on a variety of factors, but here are some options you can try to improve it.
Bulk operations
Bulk operations might help here. bulk.find(query).remove() is a version of db.collection.remove(query) that is optimized for large numbers of operations. You can read more about it in the MongoDB documentation.
You can use the following way:
Declare a search query:
var query= {tenantId: x};
Initialize and use a bulk:
var bulk = db.yourCollection.initializeUnorderedBulkOp();
bulk.find(query).remove();  // in mongosh, delete() is the equivalent of remove()
bulk.execute();
The idea here is not so much to speed up the removal as to produce less load.
You could also try bulkWrite():
db.yourCollection.bulkWrite([
  { deleteMany: { filter: query } }
])
TTL indexes
It may not be suitable for your use case, but there is an entirely different approach that avoids doing the removal yourself at all.
If it is acceptable to delete data based on a timestamp, then a TTL index might help you. The idea is that each record is removed automatically once its TTL expires.
Implemented as a special index type, TTL collections make it possible to store data in MongoDB and have the mongod automatically remove data after a specified period of time.
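A minimal sketch, assuming the documents carry a createdAt timestamp; the field name and the 30-day lifetime are assumptions:
// The server removes documents roughly 30 days after their createdAt value
db.records.createIndex({ createdAt: 1 }, { expireAfterSeconds: 60 * 60 * 24 * 30 })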
deleteMany: I think there must be something common between all the documents that you want to remove from the collection.
You can find that out and then create a query accordingly.
This will help you remove those records quickly.
Let me give you one example: I want to remove all the records where the username field does not exist.
db.collection.deleteMany({ username: {$exists: false} })
The best place to start is to find something that all records have in common, in order to remove them all at once.
For example, the following code deletes all entries that don't contain an email address.
db.users.deleteMany({ email: { $exists: false } })
The MongoDB documentation has great examples; a link is provided below.
https://www.mongodb.com/docs/manual/reference/method/db.collection.deleteMany/#delete-multiple-documents
You might also want to consider dropping the index, since it can be recreated after you're done with the operation.
Finally, you might want to lower the write concern in your operation in order to speed things up. A complete list of options can be found here:
https://www.mongodb.com/docs/v5.0/reference/write-concern/#w-option
I found a good tutorial on https://www.geeksforgeeks.org/mongodb-delete-multiple-documents-using-mongoshell/ that might help you further.
apologies for any grammatical mistakes since English is not my native tongue
I would suggest two solutions. Also, please export your collection first so you have a backup of your data if anything goes wrong, or try this in your test DB first.
1) You can use the tenantId field as the condition, not matching on _id but with extra logic: if any of the records have a tenantId, delete them. This way all of your tenant data will be removed using a single query.
db.records.deleteMany({ tenantId: { $exists: true } })
// Suggestion: if any of your tenant data has a tenantId field that is null, you can also check for a null value to delete those records.
 
2) Find some data that is common to all of the records; if there is any, use it as a condition to delete those records.
For example, if all of your tenant data has a common field called type with the same value, use a delete statement like:
db.records.deleteMany({type : 1})

MongoDB _id on bulk insert performance

I have a class/object that has a GUID, and I want to use that field as the _id when it is saved to MongoDB. Is it possible to use another value instead of the ObjectId?
Is there any performance consideration when doing a bulk insert when there is an _id field? Is _id an index? If I set the _id to a different field, would it slow down the bulk insert? I'm inserting about 10 million records.
1) Yes, you can use that field as the _id. There is no mention of what API (if any) you are using for inserting the documents, so if you were doing the insertion at the command line, the command would be:
db.collection.insert({_id : <BSONString_version_of_your_guid_value>, field1 : value1, ...});
It doesn't have to be a BSON string. Change it to whichever BSON type most closely matches your GUID's original type (except the array type; arrays aren't allowed as the value of the _id field).
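For a GUID specifically, a small shell sketch (the hex value and collection name are placeholders) would store it as BinData subtype 4:
// Use a UUID (stored as BinData subtype 4) as the document's _id
db.collection.insert({ _id: UUID("0123456789abcdef0123456789abcdef"), field1: "value1" })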
2) As far as I know, there IS an effect on performance with db.collection.insert when you provide your own ids, especially in bulk, BUT if the ids are sorted (always increasing), there shouldn't be a performance loss. For the reason, I am quoting:
The structure of the index is a B-tree. ObjectIds have an excellent insertion order as far as the index tree is concerned: they are always increasing, meaning they are always inserted at the right edge of the B-tree. This, in turn, means that MongoDB only has to keep the right edge of the B-tree in memory.
Conversely, a random value in the _id field means that _ids will be inserted all over the tree. Then the machine must move a page of the index into memory, update a tiny piece of it, then probably ignore it until it slides out of memory again. This is less efficient.
(from the book `50 Tips and Tricks for MongoDB Developers`)
The tip's title says: "Override _id when you have your own simple, unique id." Clearly it is better to use your own id if you have one and you don't need the properties of an ObjectId. And it is best if your ids are increasing, for the reason stated above.
3) There is a default index on the _id field, created by MongoDB.
So...
Yes, it is possible to use types other than ObjectId, including a GUID, which will be saved as BinData.
Yes, there are considerations. It's better if your _id is always increasing (like a growing number, or an ObjectId); otherwise the index has to do more work to maintain itself. If you plan on using sharding, the _id should also be hashed evenly.
_id indeed has an index automatically.
It depends on the type you choose. See section 2.
Conclusion: It's better to keep using ObjectId unless you have a good reason not to.

How to replace all existing records with one record in MongoDB

In my application I have multiple records that contain username and domain.
Before, I used to keep all records when they had a different value in the version field, but now I want to replace them all with just one version.
For example, I have a device collection that is structured as:
{
  username: "me",
  domain: "stackoverflow.com",
  version: 1
}
{
  username: "me",
  domain: "stackoverflow.com",
  version: 2
}
And I kept upserting whenever there was a new version.
And now I would like to have only one record that replaces all existing documents. Whenever a new document with a new version is upserted, all records that match the username and domain should be gone, merged into the new one.
I tried the upsert: true and multi: true options, but they do not delete the old records.
Any help would be great.
Upsert won't delete the old records; it can only replace existing documents or create new ones (see the documentation for upsert).
You'll need to clean up the old data manually. There are a few options:
Wait till you encounter what would previously have been a new version of the document, and remove all old versions before saving the new one (clean up old, then put down new).
Use the aggregation framework to group on username and domain to return a list of all combinations. Then, for each combination, eliminate all but the newest (you could sort on version to get the highest and then run a query using $ne to remove everything that matches except the highest version number; see the sketch after this list). While this will really hit your database hard, you'd only need to do it once.
Filter the data manually in your favorite programming language and move the data to a new collection. Again, slow, but you'd do it only once.
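A rough sketch of the aggregation-based cleanup (the second option above) in the shell, assuming the device collection from the question and that the highest version per username/domain pair should survive:
// Group by username/domain, remember the _id of the highest version,
// then delete every other document in that group.
db.device.aggregate([
  { $sort: { version: -1 } },
  { $group: {
      _id: { username: "$username", domain: "$domain" },
      keepId: { $first: "$_id" }
  } }
]).forEach(function (group) {
  db.device.deleteMany({
    username: group._id.username,
    domain: group._id.domain,
    _id: { $ne: group.keepId }
  });
});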
If you don't care which one of the duplicates is kept, you can do this by creating a unique index over the two fields and specifying the dropDups: true option when calling ensureIndex, like this (note that dropDups was removed in MongoDB 3.0, so this only works on older versions):
db.device.ensureIndex({username: 1, domain: 1}, {unique: true, dropDups: true})
This will force MongoDB to create the unique index by deleting documents with duplicate values leaving just one of each username/domain pairing (which seems to be just what you're looking for).

MongoDB workaround for not supported sparse unique compound index

I need a workaround because MongoDB does not support sparse unique compound indexes (in a compound index it stores missing values as null, whereas with a non-compound sparse index it simply doesn't add the document to the index). See https://jira.mongodb.org/browse/SERVER-2193
In my particular case I have events. They can either be one-time or recurring. I have a field parent which is only present when the event is an instance of a recurring event (I periodically create new copies of the parent to have the recurring events for the next weeks in the system).
I thought I'd just add this index in order to prevent duplicate copies when the cronjob runs twice:
events.ensureIndex({ dateFrom: 1, dateTo: 1, parent: 1 }, { sparse: true, unique: true })
Unfortunately as said above MongoDB does not support sparse on compound indexes. What this means is that for one-time events the parent field is not present and is set to null by MongoDB. If I now have a second one-time event at the same time, it causes a duplicate key error, which I only want when parent is set.
Any ideas?
Edit: I've seen MongoDB: Unique and sparse compound indexes with sparse values, but checking for uniqueness at the application level is a no-go. I mean, that's what the database is there for, to guarantee uniqueness.
You can add a 4th field which would be dateFrom+dateTo+parent (string concatenation). When parent is null, choose a unique id instead, for example from the ObjectId function, and then put a unique index on that field.
This way you can enforce the uniqueness you want. However you can hardly use it for anything else than enforcing this constraint. (Although queries like "get docs where the string starts with blah blah" may be pretty efficient)
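A minimal sketch of that workaround; the helper name, collection name and sample values are illustrative, not from the original answer:
// Build the synthetic key: a real concatenation for recurring instances,
// a fresh ObjectId for one-time events so it never collides.
function buildUniqueKey(event) {
  if (event.parent) {
    return event.dateFrom.toISOString() + "|" + event.dateTo.toISOString() + "|" + event.parent.toString();
  }
  return new ObjectId();  // one-time event: unique by construction
}

db.events.createIndex({ uniqueKey: 1 }, { unique: true })

var evt = {
  dateFrom: ISODate("2024-06-01T10:00:00Z"),
  dateTo: ISODate("2024-06-01T11:00:00Z"),
  parent: ObjectId("507f191e810c19729de860ea")
}
evt.uniqueKey = buildUniqueKey(evt)
db.events.insertOne(evt)
If the cron job inserts the same recurring instance twice, the second insert now fails on the uniqueKey index, while one-time events never clash with each other.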

The fastest way to show Documents with certain property first in MongoDB

I have collections with a huge amount of documents on which I need to do custom searches with various different queries.
Each document has a boolean property. Let's call it "isInTop".
I need to show documents which have this property first in all queries.
Yes, I can easily sort on this field like:
.sort( { isInTop: -1 } );
And create a proper index with the field "isInTop" as the last field in it. But this will work slowly, as indexes in Mongo work best on high-cardinality fields, and a boolean has only two values.
So is there a solution to show documents with the field "isInTop" at the top of each query?
I see two solutions here.
First: give the documents which need to be on top an _id from the "future". As you know, an ObjectId contains a timestamp, so I can create an ObjectId with a timestamp from the future and rely on the natural order.
Second: create a separate collection for documents which need to be on top, and query it first.
Are there any other solutions for this problem? Which would work faster?
UPDATE
I have solved this by sorting on a custom field which represents a rank.
Using the _id field trick you mention has the problem that at some point in time you will reach the special time, and you can't change the _id field (without inserting a new document and removing the old one).
Creating a special collection which just holds the ones you care about is probably the best option. It gives you the ability to logically (and to some extent, physically) separate the documents.
MongoDB also has newly introduced support for a "sparse" index, which may fulfill your needs as well. You would only set the "isInTop" field when you want a document to be special, and then create a sparse index on it; this avoids the problems you would normally have with indexing a single boolean field across all documents (in B-trees).
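A small sketch of that sparse-index idea; the collection name and the example _id are assumptions:
// Only documents that actually carry isInTop end up in this index
db.items.createIndex({ isInTop: 1 }, { sparse: true })
// Promoted documents set the flag; all other documents simply omit the field
db.items.updateOne({ _id: ObjectId("507f191e810c19729de860eb") }, { $set: { isInTop: true } })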