MongoDB: two collections or one?

I am a beginner in MongoDB.
I have a products collection with documents like:
products: [
    {
        id: 1,
        name: "CocaCola",
        discount: true
    }
]
Some products may have a discount. For that I decided to add a property:
{ discount: true }
Should I create a separate discounts_products collection to store the details of the discount, or is it better to include all the information directly in product.discount?
I am a bit confused, coming from relational databases.
I am trying to consider this decision from several sides (inserting, updating and reading data).

I suggest keeping that as a field on the product.
Instead of a boolean flag, you can store the discount percentage.
If it is a boolean
You can check the discount field with $exists to find discounted products.
If it is a number
You can use $gte and $lte to find products with more or less than a particular discount.
About non-discounted products
It is not mandatory that all documents have that field. You can omit the field for products which have no discount.
If you need a boolean for a particular use case, then you need an additional field to store the discount percentage.
Reasons to avoid a separate collection:
There could be a use case where you need to get both discounted and non-discounted products; you would then need to make two requests.
You would need to move products back and forth between two collections whenever a product changes from discounted to non-discounted.
With a single collection, you can remove the field when a product becomes non-discounted, set it to false if you keep a boolean, or set it to 0 if you keep a number.
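The two query styles described above ($exists for a flag, $gte for a number) can be sketched in plain JavaScript — an in-memory imitation of what the MongoDB filters do, with hypothetical product data:

```javascript
// Hypothetical products; "discount" is simply omitted for non-discounted items.
const products = [
    { id: 1, name: "CocaCola", discount: 10 },
    { id: 2, name: "Pepsi" },
    { id: 3, name: "Fanta", discount: 25 }
];

// In-memory equivalent of: db.products.find({ discount: { $exists: true } })
const discounted = products.filter(p => p.discount !== undefined);

// In-memory equivalent of: db.products.find({ discount: { $gte: 20 } })
const bigDiscounts = products.filter(
    p => p.discount !== undefined && p.discount >= 20
);

console.log(discounted.map(p => p.name));   // CocaCola, Fanta
console.log(bigDiscounts.map(p => p.name)); // Fanta
```

Storing a number instead of a boolean keeps both query styles available at once.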
If it is a complex object, you can have a nested object like:
{
    _id: 12,
    product_name: "phone",
    discount: {
        startTime: time,
        .... // other fields
    }
}

MongoDB is designed to move away from relational data modeling; use that to get the benefit of this pattern and the performance you would expect.
So my answer is to add the discount as a property on products.
When inserting, you could set it to null when that is the default behaviour; otherwise just set the discount amount (or whatever you need to store). Updating and reading the data work the same way (checking whether it is not null).

NoSQL/MongoDB: When do I need _id in nested objects?

I don't understand the purpose of the _id field when it refers to a simple nested object, e.g. one that is not used as a root entity in a dedicated collection.
Let's say we have a simple value object like Money({ currency: String, amount: Number }). Obviously, you are never going to store such a value object by itself in a dedicated collection. It will rather be attached to a Transaction or a Product, which have their own collections.
But you still might query for specific currencies, or for amounts higher/lower than a given value. And you might programmatically (not on the DB level) set the value of a transaction to the same as that of a purchased product.
Would you need an _id in this case? What are the cases where a nested object needs an _id?
Assuming the _id you are asking about is an ObjectId, the only purpose of such field is to uniquely identify an object. Either a document in a collection or a subdocument in an array.
From the description of your money case it doesn't seem necessary to have such an _id field for your value object. A valid use case might be an array of otherwise non-unique/non-identifiable subdocuments, e.g. log entries, encrypted data, etc.
ODMs might add such field by default to have a unified way to address subdocuments regardless of application needs.
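That array use case can be sketched as follows — a hypothetical document whose entries are otherwise identical, so only _id distinguishes them:

```javascript
// Hypothetical audit document: log entries with identical content can only
// be addressed individually via their _id.
const auditDoc = {
    _id: "txn-42",
    logEntries: [
        { _id: "a1", event: "login", ok: true },
        { _id: "a2", event: "login", ok: true }, // same content as a1
        { _id: "a3", event: "logout", ok: true }
    ]
};

// Without an _id there is no way to refer to the second entry specifically,
// e.g. to update or delete exactly that one subdocument.
const target = auditDoc.logEntries.find(e => e._id === "a2");
console.log(target.event); // login
```

For a value object like Money, by contrast, equality of the fields themselves is all you ever compare on, so the _id adds nothing.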

Need to count boolean values in Tableau

I'm using mongo with Tableau and have a boolean called "verified" that shows as true vs false.
Each user can add "certifications" to his/her record, then we go in with an admin tool and flag the cert as verified:true or verified:false. I want to show a simple table that has the number of certifications for each user, then another column with the number verified.
Currently I'm using COUNTD([Certifications.Verified]) to count the number verified, but I don't think it's counting accurately.
It just counts whether the "verified" sub-schema exists with a true or false state, so the numbers are not accurate. Note that in some cases this node doesn't exist at all and shows as null.
I need a way to count 1 when verified=true, and 0 when no verified node exists or verified=false.
How do I add the logic to count this accurately in Tableau?
Update: Thanks for the Mongo queries but I'm looking for Tableau custom fields to show this.
You're going to want to use the $cond pipeline operator within your .aggregate() call. It allows you to specify what you would like returned based on a conditional, which in your case would be the Verified field. I don't know how your data is structured, but I would imagine something like this:
$sum: { $cond: ["$Certifications.Verified", 1, 0] }
If Verified is true for that certification, it will return a 1, which will be accounted for in the $sum operator. Whether you use a $group stage or a $project stage to create this summed field will depend on your preference/use case.
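The counting logic of that $cond can be simulated in plain JavaScript — an in-memory sketch of a hypothetical pipeline, where the field names are assumptions:

```javascript
// Hypothetical certification records; "verified" may be missing entirely.
const certs = [
    { user: "ann", verified: true },
    { user: "ann", verified: false },
    { user: "ann" },                 // no verified node at all
    { user: "bob", verified: true }
];

// In-memory equivalent of:
//   { $group: { _id: "$user",
//               total:    { $sum: 1 },
//               verified: { $sum: { $cond: ["$verified", 1, 0] } } } }
const byUser = {};
for (const c of certs) {
    const row = byUser[c.user] || (byUser[c.user] = { total: 0, verified: 0 });
    row.total += 1;
    row.verified += (c.verified === true ? 1 : 0); // missing/false count as 0
}

console.log(byUser);
```

Note that a missing field and an explicit false both fall into the 0 branch, which is exactly the behaviour the question asks for.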
You can use this to return count.
schemaName.find({ "Certifications.Verified": true }).count(function (error, count) {
    console.log(count);
});
It returns a non-zero value (the number of documents satisfying the condition) if a document with verified = true exists;
otherwise it returns 0.
Replacing whatever your table's key is with RowID:
COUNTD(IIF([Certifications.Verified]=1, RowID, NULL))
COUNTD() is useful but can be computationally inefficient on large data sets, so don’t use COUNTD() in situations where a simpler, faster aggregation function will work equally well.
If you simply want to know how many records satisfy a condition from Tableau, just use SUM(INT(<condition>)). The INT() type conversion function converts True to 1 and False to 0. So if the level of detail of your data has one record per THING, and you want to know how many THINGs satisfy your condition, SUM(INT(<condition>)) will do the trick, faster than using count distinct on record IDs.
There are even some data sources, like MS Access, that don’t implement COUNTD()
Bottom line,
SUM(INT()) is the simplest way to count records that satisfy a condition
COUNTD() is very flexible and useful, but can be slow. For large data sets or high performance environments, consider alternatives such as reshaping your data.
BTW, similar advice applies to LOD calculations. They are very useful, and flexible, but introduce complexity and performance costs. Use them when necessary, but don’t use them when a simpler, faster approach will suffice. I see a lot of people just use FIXED LOD calcs for everything, presumably because it seems a lot like SQL. Overdone, that can lead to a very brittle solution.

Extensive filtering

Example:
{
    shortName: "KITT",
    longName: "Knight Industries Two Thousand",
    fromZeroToSixty: 2,
    year: 1982,
    manufacturer: "Pontiac",
    /* 25 more fields */
}
Ability to query by at least 20 fields, which means that only 10 fields are left unindexed
There are 3 fields (all numbers) that could be used for sorting (both ways)
This leaves me wondering how sites with lots of searchable fields do it: e.g. real estate or car sale sites, where you can filter by every small detail and choose between several sort options.
How could I pull this off with MongoDB? How should I index that kind of collection?
I'm aware that there are DBs made specifically for searching, but there must be general rules of thumb to do this (even if less performant) in every DB. I'm sure not everybody uses Elasticsearch or similar.
---
Optional reading:
My reasoning is that the index could be huge, but the index order matters. You'll always make sure that the fields that return the fewest results come first and the most generic fields come last in the index. However, what if the user chooses only generic fields? Should I include non-generic fields in the query anyway? How do I solve ordering in both directions? Or does index intersection save the day, and I should just add 20 different indexes?
text index is your friend.
Read up on it here: https://docs.mongodb.com/v3.2/core/index-text/
In short, it's a way to tell MongoDB that you want full-text search over a specific field, multiple fields, or all fields (yay!)
To allow text indexing of all fields, use the special symbol $** and define the index with type 'text':
db.collection.createIndex( { "$**": "text" } )
You can also configure it with case insensitivity or diacritic insensitivity, and more.
To perform text searches using the index, use the $text query helper, see: https://docs.mongodb.com/v3.2/reference/operator/query/text/#op._S_text
Update:
In order to allow user to select specific fields to search on, it's possible to use weights when creating the text-index: https://docs.mongodb.com/v3.2/core/index-text/#specify-weights
If you carefully select your fields' weights, for example using different prime numbers only, and then add the $meta text score to your results you may be able to figure out from the "textScore" which field was matched on this query, and so filter out the results that didn't get a hit from a selected search field.
Read more here: https://docs.mongodb.com/v3.2/tutorial/control-results-of-text-search/
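The pieces above fit together roughly like this — the filter, projection, and sort documents you would pass to find(); the collection name and search terms are hypothetical:

```javascript
// Hypothetical: query documents for a $text search, projecting and
// sorting by the relevance score produced by the text index.
const filter = { $text: { $search: "pontiac 1982" } };
const projection = { score: { $meta: "textScore" } };
const sort = { score: { $meta: "textScore" } };

// Shell-style usage would then be roughly:
//   db.collection.find(filter, projection).sort(sort)
console.log(JSON.stringify(filter));
```

With weights set on the index, the projected textScore is what you would inspect to infer which weighted field produced the match.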

How to deal with Many-to-Many relations in MongoDB when Embedding is not the answer?

Here's the deal. Let's suppose we have the following data schema in MongoDB:
items: a collection with large documents that hold some data (it's absolutely irrelevant what it actually is).
item_groups: a collection with documents that contain a list of items._id values, called item_groups.items, plus some extra data.
So, these two are tied together with a Many-to-Many relationship. But there's one tricky thing: for a certain reason I cannot store items within item groups, so -- just as the title says -- embedding is not the answer.
The query I'm really worried about is intended to find particular groups that contain particular items (i.e. I've got a set of criteria for each collection). In fact it also has to say how many items within each found group fit the criteria (no matching items means the group is not found).
The only viable solution I came up with this far is to use a Map/Reduce approach with a dummy reduce function:
function map() {
    // imagine that item_criteria came from the scope.
    // it's a MongoDB query object.
    item_criteria._id = { $in: this.items };
    var group_size = db.items.count(item_criteria);
    // this group holds no relevant items, skip it
    if (group_size == 0) return;
    var key = this._id.str;
    var value = { size: group_size, ... };
    emit(key, value);
}
function reduce(key, values) {
    // since the map function emits each group just once,
    // values will always be a list with length=1
    return values[0];
}
db.runCommand({
    mapreduce: "item_groups",
    map: map,
    reduce: reduce,
    query: item_groups_criteria,
    scope: { item_criteria: item_criteria }
});
The problem line is:
item_criteria._id = {$in: this.items};
What if this.items.length == 5000 or even more? My RDBMS background cries out loud:
SELECT ... FROM ... WHERE whatever_id IN (over 9000 comma-separated IDs)
is definitely not a good way to go.
Thank you sooo much for your time, guys!
I hope the best answer will be something like "you're stupid, stop thinking in RDBMS style, use $its_a_kind_of_magicSphere from the latest release of MongoDB" :)
I think you are struggling with the separation of domain/object modeling from database schema modeling. I too struggled with this when trying out MongoDb.
For the sake of semantics and clarity, I'm going to substitute Groups with the word Categories.
Essentially your theoretical model is a "many to many" relationship, in that each Item can belong to many Categories, and each Category can possess many Items.
This is best handled in your domain object modeling, not in the DB schema, especially when implementing a document database (NoSQL). In your MongoDb schema you "fake" a "many to many" relationship by using a combination of top-level document models and embedding.
Embedding is hard to swallow for folks coming from SQL persistence back-ends, but it is an essential part of the answer. The trick is deciding whether or not it is shallow or deep, one-way or two-way, etc.
Top Level Document Models
Because your Category documents contain some data of their own and are heavily referenced by a vast number of Items, I agree with you that fully embedding them inside each Item is unwise.
Instead, treat both Item and Category objects as top-level documents, each in its own collection, so that each document has its own ObjectId.
The next step is to decide where and how much to embed... there is no right answer as it all depends on how you use it and what your scaling ambitions are...
Embedding Decisions
1. Items
At minimum, your Item objects should have a collection property for their categories. At the very least this collection should contain the ObjectId of each Category.
My suggestion would be to also add to this collection the data you use most often when interacting with the Item...
For example, say I want to list a bunch of items on my web page in a grid and show the names of the categories they are part of. Obviously I don't need to know everything about the Category, but if I only have the ObjectId embedded, a second query would be necessary to get any detail about it at all.
Instead what would make most sense is to embed the Category's Name property in the collection along with the ObjectId, so that pulling back an Item can now display its category names without another query.
The biggest thing to remember is that the key/value objects embedded in your Item that "represent" a Category do not have to match the real Category document model... It is not OOP or relational database modeling.
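Such an Item document might look like this — a hypothetical sketch where only the Category's _id and name are duplicated into the item:

```javascript
// Hypothetical Item with partially-embedded categories: enough to render a
// product grid (names) without a second query, but not the full Category.
const item = {
    _id: "item-1",
    title: "1982 Pontiac Trans Am",
    categories: [
        { _id: "cat-7", name: "Cars" },
        { _id: "cat-9", name: "Classics" }
    ]
};

// Rendering the category names needs no extra lookup:
const names = item.categories.map(c => c.name);
console.log(names.join(", ")); // Cars, Classics
```

The _id is still there when you do need the full Category, so the embedded copy is a cache, not the source of truth.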
2. Categories
In reverse, you might choose to leave the embedding one-way and not have any Item info in your Category documents... or you might choose to add a collection of Item data much like above (ObjectId, or ObjectId + Name)...
In this direction, I would personally lean toward having nothing embedded... more than likely, if I want Item information for my category, I want lots of it, more than just a name... and deep-embedding a top-level document (Item) makes no sense. I would simply resign myself to querying the database for the Items whose collection of Categories contains the ObjectId of my Category.
Phew... confusing for sure. The point is, you will have some data duplication and you will have to tweak your models to your usage for best performance. The good news is that that is what MongoDb and other document databases are good at...
Why not use the opposite design?
You are storing items and item_groups. If your first idea was to store items in item_group entries, then maybe the opposite is not a bad idea :-)
Let me explain:
In each item you store the groups it belongs to. (You are in NoSQL; data duplication is OK!)
For example, let's say you store in item entries a list called groups, and your items look like:
{
    _id: ....,
    name: ....,
    groups: [ObjectId(...), ObjectId(...), ObjectId(...)]
}
Then the idea of map/reduce gains a lot of power:
map = function () {
    var doc = this; // "this" no longer refers to the document inside forEach
    this.groups.forEach(function (groupKey) {
        emit(groupKey, [doc]);
    });
};
reduce = function (key, values) {
    var result = [];
    values.forEach(function (v) {
        result = result.concat(v);
    });
    return result;
};
db.runCommand({
    mapreduce: "items",
    map: map,
    reduce: reduce,
    query: { _id: { $in: [..., ...., .....] } } // put your item ids here
});
You can add some parameters (finalize for instance to modify the output of the map reduce) but this might help you.
Of course you need another collection where you store the details of item_groups if you need them, but in some cases (if this information about item_groups does not exist, doesn't change, or you don't care about having the most up-to-date version of it) you don't need it at all!
Does that give you a hint about a solution to your problem ?
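The grouping that this map/reduce performs can be simulated in plain JavaScript — an in-memory sketch, not a MongoDB API, with hypothetical data:

```javascript
// Hypothetical items, each storing the groups it belongs to (the reverse design).
const items = [
    { _id: "i1", name: "alpha", groups: ["g1", "g2"] },
    { _id: "i2", name: "beta",  groups: ["g2"] },
    { _id: "i3", name: "gamma", groups: ["g1"] }
];

// Equivalent of map: emit(groupKey, [item]); reduce: concatenate the arrays.
const byGroup = {};
for (const item of items) {
    for (const g of item.groups) {
        (byGroup[g] = byGroup[g] || []).push(item._id);
    }
}

console.log(byGroup); // { g1: ["i1","i3"], g2: ["i1","i2"] }
```

Filtering the items first (the query parameter) and counting the array lengths afterwards gives exactly the "how many matching items per group" figure the question asks for, without any $in over thousands of ids.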

Many to many in MongoDB

I decided to give MongoDB a try and see how well we get along. I do have some questions though.
Premise
I have users (id, name, address, password, email, etc.)
I have stamps (id, type, value, price, etc.)
Users browse through a stamp archive and filter it in various ways (pagination; filter by price, type, name, etc.), select a stamp, then add it to their collection.
Users can add more than one stamp to their collection (one piece of mint and one used, or just 2 pieces of used).
Users can flag some of their stamps for sale or trade and perhaps specify a price.
So far
Here's what I have so far:
{
    _id: objectid,
    Name: "bob",
    Email: "bob@bob.com",
    ...
    Stamps: [stampid-1, stampid-543, ..., stampid-23]
}
Questions
How should I add the state of the owned stamp, the quantity and condition?
What would be some sample queries for the situations described earlier?
As far as I know, ensureIndex() makes it so you reduce the number of "scanned" entries.
The accepted answer here keeps changing the index. Is that just for the purpose of explaining it, or is this the way to do it? I mean, it does make sense somehow, but I keep thinking of it in SQL terms and... it does not make ANY sense...
The only change I would make is how you store the stamps that a user owns. I would store an array of objects representing the stamps, duplicating the values that are accessed most often.
For example something like that :
{
    _id: objectid,
    Name: "bob",
    Email: "bob@bob.com",
    ...
    Stamps: [
        {
            _id: id,
            type: 'type',
            price: 20,
            forSale: true/false,
            quantity: 2
        },
        {
            _id: id2,
            type: 'type2',
            price: 5,
            forSale: false,
            quantity: 10
        }
    ]
}
You can see that some data is duplicated between the stamps collection and the Stamps array in the user collection. You do that with the properties you access most often, because otherwise you would have to do a findOne for each stamp, and in MongoDB it is better to read the data directly than to do that. This way you can also add other properties, such as quantity and forSale, here.
The goal of the duplication is to avoid running a query for each stamp in the array.
Here is a link to a video that discusses MongoDB design and also explains what I tried to explain here:
http://lacantine.ubicast.eu/videos/3-mongodb-deployment-strategies/
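One cost of this duplication is keeping the copies in sync, e.g. when a stamp's price changes. Here is an in-memory JavaScript sketch of the two updates involved (in the shell this would roughly be an update on stamps plus an update on users using the positional $ operator; the data is hypothetical):

```javascript
// Hypothetical data: the canonical stamps collection, and a user document
// embedding duplicated stamp fields.
const stamps = [{ _id: "s1", type: "mint", price: 20 }];
const user = {
    _id: "u1",
    Name: "bob",
    Stamps: [{ _id: "s1", type: "mint", price: 20, forSale: true, quantity: 2 }]
};

function setPrice(stampId, newPrice) {
    // 1) Update the canonical document...
    const s = stamps.find(x => x._id === stampId);
    if (s) s.price = newPrice;
    // 2) ...and every embedded copy. Shell equivalent, roughly:
    //    db.users.updateMany({ "Stamps._id": stampId },
    //                        { $set: { "Stamps.$.price": newPrice } })
    for (const e of user.Stamps) {
        if (e._id === stampId) e.price = newPrice;
    }
}

setPrice("s1", 25);
console.log(user.Stamps[0].price); // 25
```

This is why the advice above is to duplicate only the rarely-changing, frequently-read properties.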
Coming from a SQL background, I'm struggling with NoSQL too. It seems to me that a lot hinges on how unchanging the types of data may or may not be. One thing that puzzles me in RDBMS systems is why it is not possible to say a particular column/field is "immutable". If you know a field is immutable (or nearly so), in a NoSQL context it seems more acceptable to duplicate the info. Is it complete heresy to suggest that in many contexts you might actually want a combination of SQL and NoSQL structures?