Given a list of unique discount codes, how to send each one only once - mongodb

I'm wondering about the best way to solve this problem: given a pre-generated list of 500 unique discount codes for an e-commerce site, how do I ensure that each of the first 500 users who receive a discount code gets a unique one? The e-commerce site would be making an asynchronous request to a separate server with the list of discount codes stored in its database. It is this server's job to make sure that it sends back each discount code only once, in chronological order as requests are received.
As this seems like a rather primitive problem, I wonder if there is a clever and elegant way to do this with a relatively low level of effort.

A simple way is to have a collection of your codes and remove items as you select them. Here is a simple example with .findAndModify().
A basic collection example:
db.codes.insert([
    { "a": 1 },
    { "a": 2 },
    { "a": 3 }
])
Issue a .findAndModify():
db.codes.findAndModify({
    "query": {},
    "remove": true,
    "new": false
})
Returns:
{ "_id" : ObjectId("550caf3f7d9c3dc0eab83334"), "a" : 1 }
And the new state of the collection is:
{ "a": 2 }
{ "a": 3 }
So as each document is retrieved it is removed from the collection, preventing further selection. Since .findAndModify() is an atomic operation, no other request can see the same document, and every request will get its own unique response.
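For completeness, here is roughly the same pattern from application code, assuming the Node.js driver and a codes collection pre-seeded with the 500 codes (database, collection and field names are illustrative, not from the question):

const { MongoClient } = require('mongodb');

async function claimCode(client) {
    // findOneAndDelete is atomic in the same way as findAndModify with "remove":
    // each request removes and receives a different document
    const result = await client.db('shop').collection('codes').findOneAndDelete({});
    // older driver versions wrap the document in result.value; newer ones return it directly
    const doc = result && result.value !== undefined ? result.value : result;
    return doc; // null once all 500 codes have been handed out
}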

If your DB has atomic transactions, this is no problem. Just make a table discount with 2 fields, code (a varchar wide enough to hold the code) and used (boolean), indexed by used and then by code. Initially INSERT 500 rows, each with used = false of course. Whenever a request comes in, just SELECT min(code) FROM discount WHERE NOT used FOR UPDATE, and then UPDATE discount SET used = true WHERE NOT used AND code = <that code>, all inside a single DB transaction. (The NOT used part of the UPDATE is not necessary for correctness, but may speed things up by enabling the index to be used.)
If contention is a problem (and I don't see how it could be for 500 requests, but maybe it somehow could be), then add an integer id field containing a unique integer between 1 and 500 to the table. Then on each request, pick a random number r between 1 and 500, and SELECT min(code) FROM discount WHERE NOT used AND (id >= <r> OR id + 500 >= <r>) FOR UPDATE. The condition in parentheses ensures that the search will "wrap around" to lower-numbered discounts if (and only if) all discounts >= r have already been taken.

Related

How do I resolve this design constraint in MongoDB w.r.t. performance?

Currently I have something as below.
Collection1 - system
{
_id: system_id,
... system fields
system_name: ,
system_site: ,
system_group: ,
....
device_errors: [1,2,3,4,5,6,7]
}
I have 2K unique error codes.
I have an error collection as below.
{
_id: error_id,
category,
impact,
action,
}
I have got a use case where each system|burt combination can have a unique error_description, because the error contains some system-specific data.
I am confused about how to handle this scenario.
One system can have many errors.
One error can be part of multiple systems.
Now, how do I maintain the unique details of a burt specific to a system? I thought of having a nested field instead of an array in the system collection, but I am wondering about the scalability.
Any suggestion?
system1|burt1
error_desc:unique system1
system2|burt1
error_Description: unique
If I store it like the above in another collection, the API request has to make three calls and assemble the response:
1. Find all errors for the set of systems
2. Find the top 50 burts from point 1
3. For the top 50 burts, find the error descriptions
Combine all three call responses and reply to the user?
I don't think this is best, as we need to make three data-source calls to respond to a single request.
I have already tried a flattened structure with redundant data:
{
... system1_info
... error1_info
},
{
... system2_info
... error1_info
},
{
... system1_info
... error2_info
},
{
... system10_info
... error1200_info
}
Here, I am using many aggregation stages in a single query, as below (a rough sketch of the pipeline follows the list):
1. Match
2. Group by error
3. Sort
4. Total count of errors (another group)
5. Project
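A rough sketch of what such a pipeline could look like against the flattened collection (the collection name, field names and the systemIds variable are illustrative, not from the question):

db.system_errors.aggregate([
    // 1. match the systems being queried
    { "$match": { "system_id": { "$in": systemIds } } },
    // 2. group per error/burt
    { "$group": { "_id": "$error_id", "systems": { "$addToSet": "$system_id" }, "count": { "$sum": 1 } } },
    // 3. sort by frequency
    { "$sort": { "count": -1 } },
    // 4. total count of errors via another group
    { "$group": { "_id": null, "total": { "$sum": 1 }, "errors": { "$push": "$$ROOT" } } },
    // 5. project only what the API needs (first 100 for pagination)
    { "$project": { "_id": 0, "total": 1, "errors": { "$slice": ["$errors", 100] } } }
])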
I feel this is a heavier query than approach 1 (the actual question above).
Let's say I have 2k errors and 20 million systems, so in total I have 40 million documents.
In the worst case each system has 2k errors. My query should support more than one system; let's say I have to query for 25k systems.
25k systems * 2k errors => match result
Apply all the operations mentioned above
Then slice to 100 (for pagination)
If I go with a relational-style model without redundancy, I fetch the 25k systems and then have to query for only 2k errors, which is far less work than the aggregation above.
Presumably the set of possible errors does not change very frequently. Cache it in the application.
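A minimal sketch of that idea, assuming a Node.js service in front of the two collections described in the question (database, collection, function and variable names are illustrative):

const errorsById = new Map();

// the ~2k error documents easily fit in memory; reload on a timer if they ever change
async function loadErrorCache(db) {
    const errors = await db.collection('errors').find().toArray();
    errorsById.clear();
    for (const e of errors) errorsById.set(String(e._id), e);
}

// one round trip for the systems, then join against the in-memory cache
async function errorsForSystems(db, systemIds) {
    const systems = await db.collection('system').find({ _id: { $in: systemIds } }).toArray();
    return systems.map(s => ({
        system: s,
        errors: (s.device_errors || []).map(id => errorsById.get(String(id)))
    }));
}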

How to create an index in MongoDB which calls a JS function via system.js?

I have two collections viz. whitelist (id, count, expiry) and blacklist (id).
Now I would like to create an index such that when count >= 200, a JS function is called which will remove the document from whitelist and add the id to blacklist.
So can I do this in Mongo using db.collection.createIndex({"count": 1}, ???);
or do I need to write a daemon to scan the entire collection? Or is there a better method for this?
You seem to be asking for what in a SQL relational database we would call a "trigger", which is something completely different from an "index" even in that world.
In the NoSQL world generally, and especially with MongoDB, that sort of "server logic" is relegated to the "client" code rather than the server. Think of it as another part of the "scalability" philosophy of these products, where certain features like "triggers" are taken away on the stance that they "cost" a lot with distributed data.
So in order to do what you want, you do it in "code" instead of defining a database "trigger". The process is simple enough, via .findAndModify() and other wrapping variants available to language APIs:
// Increment only while the count is below 200 and return the modified document
var doc = db.whitelist.findAndModify({
    "query": { "_id": myId, "count": { "$lt": 200 } },
    "update": { "$inc": { "count": 1 } },
    "new": true
});

// Then move the entry to the blacklist once the threshold is reached
if ( doc && doc.count >= 200 ) {
    db.blacklist.insert({ "_id": myId });
    db.whitelist.remove({ "_id": myId });
}
Be careful with the actual language API method variant, as the structure typically differs from the "query/update" keys as provided in the shell method.
The basic principles remain the same: modify and fetch, then move the document to the other collection if your conditions are met. But this takes more than one trip to the server, and there is no way to make the server "trigger" such an action by itself when the condition is met.
db.whitelist.insert(doc);
if (db.whitelist.find(criterion).count() >= 200) {
    var bulkRemove = db.whitelist.initializeUnorderedBulkOp();
    var bulkInsert = db.blacklist.initializeUnorderedBulkOp();
    db.whitelist.find(criterion).forEach(
        function (doc) {
            bulkInsert.insert({ _id: doc._id });
            bulkRemove.find({ _id: doc._id }).removeOne();
        }
    );
    bulkInsert.execute();
    bulkRemove.execute();
}
First, you insert the document as usual. Since criterion should be backed by an index, the if clause can be evaluated quickly and efficiently.
In case we have 200 or more documents matching that criterion, we use bulk operations to insert the ids into the blacklist and remove the documents from the whitelist; both batches are then executed.
The problem with only writing the _id to the blacklist is that you need to check whether the criterion for being blacklisted is matched, so the _id needs to contain that criterion.
A better solution IMHO is to flag entries of a single collection using a field named blacklisted for individual entries, or to use the aggregation framework to find blacklisted documents and write them to a collection using the $out pipeline stage. Sadly, you didn't give example data or a proper description of your use case, so you get a rather unspecific answer.
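For example, a rough sketch of that aggregation alternative, assuming a numeric count field and a threshold of 200 (note that a classic $out completely replaces the contents of the target collection):

db.whitelist.aggregate([
    { "$match": { "count": { "$gte": 200 } } },
    { "$project": { "_id": 1 } },
    { "$out": "blacklist" }   // replaces the blacklist collection with the matched ids
])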

mongodb, make increment several times in single update

Having 2 very simple mongo documents:
{_id:1, v:1}
{_id:2, v:1}
Now, based on an array of _id values, I need to increase the field v as many times as the _id appears. For example [1, 2, 1] should produce
{_id:1, v:3} //increased 2 times
{_id:2, v:2} //increased 1 times
Of course a simple update eliminates the duplicate in $in:
db.r.update({_id:{$in:[1,2,1]}}, {$inc:{v:1}}, {multi:true})
Is there a way to do it without a for-loop? Thank you in advance.
No there isn't a way to do this in a single update statement.
The reason why the $in operator "removes the duplicate" is a simple matter of the fact that the 1 was already matched; there is no point in matching again. So you can't make the document "match twice", as it were.
Also there is no current way to batch update operations. But that feature is coming.
You could look at your "batch" and make a decision to group together occurrences of the same document to be updated, and then issue your increment for the appropriate number of units. However, just like looping over the array items, the operation would be programmatic, albeit a little more efficient.
That isn't possible directly. You'll have to do that in your client, where you can at least try to minimize the number of batch updates required.
First, find the counts. This depends on your programming language, but what you want is something like [1, 2, 1] => [ { 1 : 2 }, { 2 : 1 } ] (these are the counts for the respective ids, i.e. id 1 appears twice, etc.). Something like LINQ or underscore.js is helpful here.
Next, since you can't perform different updates in a single operation, group them by their count, and update all objects whose count must be incremented by a common fixed value in one batch:
Pseudocode:
var groups = data.groupBy(p => p.Value);
foreach (var group in groups)
    db.update({ "_id": { $in: group.values.asArray } },
              // increase by the number of times those ids were present
              { $inc: { v: group.key } })
That is better than individual updates only if there are many documents that must be increased by the same value.
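A concrete mongo shell sketch of that grouping approach, using the collection and fields from the question (the ids variable stands in for the incoming array):

var ids = [1, 2, 1];

// count how often each _id appears, e.g. { "1": 2, "2": 1 }
var counts = {};
ids.forEach(function (id) { counts[id] = (counts[id] || 0) + 1; });

// invert to { increment: [ids...] } so ids needing the same increment share one update
var byIncrement = {};
Object.keys(counts).forEach(function (id) {
    var c = counts[id];
    (byIncrement[c] = byIncrement[c] || []).push(Number(id));
});

// one multi-update per distinct increment value
Object.keys(byIncrement).forEach(function (c) {
    db.r.update(
        { "_id": { "$in": byIncrement[c] } },
        { "$inc": { "v": Number(c) } },
        { "multi": true }
    );
});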

DynamoDB Model/Keys Advice

I was hoping someone could help me understand how to best design my table(s) for DynamoDB. I'm building an application which is used to track the visits a certain user makes to another user's profile.
Currently I have a MongoDB where one entry contains the following fields:
userId
visitedProfileId
date
status
isMobile
How would this translate to DynamoDB in a way that would not be too slow? I would need to do search queries to select all items that have a certain userId, taking status and isMobile into account. What would my keys be? Can I use the limit functionality to only request the latest x entries (sorted by date)?
I really like the way DynamoDB can be used, but it seems kind of complicated to make the switch between a regular NoSQL database and a key-value NoSQL database.
There are a couple of ways you could do this - and it probably depends on any other querying you may want to do on this table.
Make the table's HashKey the userId, and then the RangeKey can be <status>:<isMobile>:<date> (e.g. active:true:2013-03-25T04:05:06.789Z). Then you can query using BEGINS_WITH in the RangeKeyCondition (and ScanIndexForward set to false to return results in descending order, i.e. newest first).
So let's say you wanted to find the 20 most recent rows for user ID 1234abcd that have a status of active and an isMobile of true (I'm guessing that's what you mean by "taking [them] into account"), then your query would look like:
{
    "TableName": "Users",
    "Limit": 20,
    "HashKeyValue": { "S": "1234abcd" },
    "RangeKeyCondition": {
        "ComparisonOperator": "BEGINS_WITH",
        "AttributeValueList": [{ "S": "active:true:" }]
    },
    "ScanIndexForward": false
}
Another way would be to make the HashKey <userId>:<status>:<isMobile>, and the RangeKey would just be the date. You wouldn't need a RangeKeyCondition in this case (and in the example, the HashKeyValue would be { "S": "1234abcd:active:true" }).
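For comparison, the same request against that second layout would look something like the following (same API style as the example above; the values are illustrative):

{
    "TableName": "Users",
    "Limit": 20,
    "HashKeyValue": { "S": "1234abcd:active:true" },
    "ScanIndexForward": false
}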

mongodb limit in the embedded document

I need to create a message system, where a person can have a conversation with many users.
For example, I start to speak with user2, user3 and user4, so any one of them can see the whole conversation, and if the conversation is not private, at any point in time any of the participants can add another person to the conversation.
Here is my idea of how to do this.
I am using Mongo, and my idea is to use a dialog as the stored instance instead of individual messages.
The schema is listed as follows:
{
    _id : ....,                      // dialog id
    'private' : 0,                   // is the conversation private
    'participants' : [1, 3, 5, 6],   // people who are in the conversation
    'msgs' : [
        {
            'mid' : ....,            // id of a message
            'pid' : 1,               // person who wrote the message
            'msg' : 'tafasd'         // message text
        },
        ....
        {
            'mid' : ....,            // id of a message
            'pid' : 1,               // person who wrote the message
            'msg' : 'tafasd'         // message text
        }
    ]
}
I can see some pros for this approach:
- in a big database it will be easy to find the messages for a particular conversation.
- it will be easy to add people to the conversation.
but here is a problem for which I can't find a solution:
the conversation can become too long. Take Skype as an example: it does not show you the whole conversation; it shows you a part and loads additional messages afterwards.
In other situations skip and limit solve the case, but how can I do that here?
If this is impossible what suggestions do you have?
The MongoDB docs explain how to select a subrange of array elements with $slice.
db.dialogs.find({"_id": [dialogId]}, {msgs:{$slice: 5}}) // first 5 comments
db.dialogs.find({"_id": [dialogId]}, {msgs:{$slice: -5}}) // last 5 comments
db.dialogs.find({"_id": [dialogId]}, {msgs:{$slice: [20, 10]}}) // skip 20, limit 10
db.dialogs.find({"_id": [dialogId]}, {msgs:{$slice: [-20, 10]}}) // 20 from end, limit 10
You can use this technique to only select the messages that are relevant to your UI. However, I'm not sure that this is a good schema design. You may want to consider separating out "visible" messages from "archived" messages. It might make the querying a bit easier/faster.
There are caveats if your conversations will have very many messages:
You will notice a significant performance reduction when slicing the msgs array, as MongoDB loads the whole document and slices the list on the server before returning it to the driver.
There is a document size limit (16 MB at the moment) that could possibly be reached with this approach.
My suggestions are:
Use two collections: one for conversations and the other for messages.
Use a dbref (or plain conversation id) in each message pointing to its conversation, and index this field together with the message timestamp to be able to select older ranges on user request.
Additionally, use a separate capped collection for every conversation. It will be easy to find it by name if you build the name like "conversation_<conversation id>".
Result:
You will have to write every message twice, but into separate collections, which is normal.
When you want to show a conversation you just need to select all the data from one (capped) collection in natural sort order, which is very fast.
Your capped collections will automatically keep the most recent messages and delete old ones.
You can show older messages on user request by querying the main messages collection.
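A minimal sketch of the two-collection layout suggested above (collection and field names are illustrative):

// one document per conversation
db.conversations.insert({
    _id: 1,
    private: 0,
    participants: [1, 3, 5, 6]
})

// one document per message, referencing its conversation
db.messages.insert({
    conversation_id: 1,
    pid: 1,
    msg: 'tafasd',
    ts: new Date()
})

// index to support fetching older ranges on demand
db.messages.createIndex({ conversation_id: 1, ts: -1 })

// page through a conversation, newest first
db.messages.find({ conversation_id: 1 }).sort({ ts: -1 }).skip(20).limit(10)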