Should I use selector or views in Cloudant? - ibm-cloud

I'm having confusion about whether to use selector or views, or both, when try to get a result from the following scenario:
I need to do a wildsearch for a book and return the result of the books plus the price and the details of the store branch name.
So I tried using selector to do wildsearch using regex
"selector": {
"_id": {
"$gt": null
},
"type":"product",
"product_name": {
"$regex":"(?i)"+search
}
},
"fields": [
"_id",
"_rev",
"product_name"
]
I am able to get the result. The idea after getting the result is to use all the _id's from the result set and query to views to get more details like price and store branch name on other documents, which I feel is kind of odd and I'm not certain is that the correct way to do it.
Below is just the idea once I get the result of _id's and insert it as a "productId" variable.
var input = {
method : 'GET',
returnedContentType : 'json',
path : 'test/_design/app/_view/find_price'+"?keys=[\""+productId+"\"]",
};
return WL.Server.invokeHttp(input);
so I'm asking for input from an expert regarding this.
Another question is how to get the store_branch_name? Can it be done in a single view where we can get the product detail, prices and store branch name? Or do I need to have several views to achieve this?
expected result
product_name (from book document) : Book 1
branch_name (from branch array in Store document) : store 1 branch one
price ( from relationship document) : 79.9
References:
Book
"_id": "book1",
"_rev": "1...b",
"product_name": "Book 1",
"type": "book"
"_id": "book2",
"_rev": "1...b",
"product_name": "Book 2 etc",
"type": "book"
relationship
"_id": "c...5",
"_rev": "3...",
"type": "relationship",
"product_id": "book1",
"store_branch_id": "Store1_branch1",
"price": "79.9"
Store
{
"_id": "store1",
"_rev": "1...2",
"store_name": "Store 1 Name",
"type": "stores",
"branch": [
{
"branch_id": "store1_branch1",
"branch_name": "store 1 branch one",
"address": {
"street": "some address",
"postalcode": "33490",
"type": "addresses"
},
"geolocation": {
"coordinates": [
42.34493,
-71.093232
],
"type": "point"
},
"type": "storebranch"
},
{
"branch_id": "store1_branch2",
"branch_name":
**details ommit...**
}
]
}

In Cloudant Query, you can specify two different kinds of indexes, and it's important to know the differences between the two.
For the first part of your question, if you're using Cloudant Query's $regex operator for wildcard searches like that, you might be better off creating a Cloudant Query index of type "text" instead of type "json". It's in the Cloudant docs, but see the intro blog post for details: https://cloudant.com/blog/cloudant-query-grows-up-to-handle-ad-hoc-queries/ There's a more advanced post on this that covers the tradeoffs between the two types of indexes https://cloudant.com/blog/mango-json-vs-text-indexes/
It's harder to address the second part of your question without understanding how your application interacts with your data, but there are a couple pieces of advice.
1) Consider denormalizing some of this information so you're not doing the JOINs to begin with.
2) Inject more logic into your document keys, and use the traditional MapReduce View indexing system to emit a compound key (an array), that you can use to emulate a JOIN by taking advantage of the CouchDB/Cloudant index sorting rules.
That second one's a mouthful, but check out this example on YouTube: https://youtu.be/0al1KnCKjlA?t=23m39s
Here's a preview (example map function) of what I'm talking about:
'map' : function(doc)
{
if (doc.type==="user") {
emit( [doc._id], null );
}
else if (doc.type==="edge:follower") {
emit( [doc.user, doc.follows], {"_id":doc.follows} );
}
}
The resulting secondary index here would take advantage of the rules outlined in http://wiki.apache.org/couchdb/View_collation -- that strings sort before arrays, and arrays sort before objects. You could then issue range queries to emulate the results you'd get with a JOIN.
I think that's as much detail that's appropriate for here. Hope it helps!

Related

Index and sort on nested variable field MongoDB

I have a document that looks like this :
{
"_id": "chatID"
"presence": {
"userID1": 1647627240464,
"userID2": 1647227540464
},
}
I need to query for each userID the chats where he is present and order by the timestamp in the presence map.
I am aware that this is probably not the best way to do, before i had 1 element per user meaning duplicating the chatIDs, but it's a pain to update them all because it would look like :
{
"_id": "userID1chatID",
"at": 1647627240464,
"ids": "30EYwO01_Nyq7dMqe_O3vfL3AH",
"members": ["userID1", "userID2", "userID3"],
"owner": "userID1",
"present": true,
"uid": "chatID",
"url": "databaseURL"
}
This would allow me to find the chats where userID1 is present: true and order by at DESCENDING.
The problem with this is that i need to update the at attribute for all the documents (one per user) for this same chat room.
How can i do this same query while maintaining a single document with present as a map ?
Problem : the index would be on a variable : userID1, userID2, etc...
like : present.userID1 and seems to not be convenient for use when userID1 can be removed from the present map if the user leaves the chat.
Please let me know if this is unclear, thanks in advance.

How to join two collection in mongo without lookup

I have two collection, there name are post and comment.
The model structure is in the following.
I want to use aggregation query post and sort by comments like length sum, currently I can query a post comments like length sum in the following query statement.
My question is how can I query post and join comment collection in Mongo version 2.6. I know after Mongo 3.2 have a lookup function.
I want to query post collection and sort by foreign comments likes length. Is it have a best way to do this in mongo 2.6?
post
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
}
comment
/* 1 */
{
"_id": ObjectId("5a595d8c2703892c3d8b4567"),
"uid": "1",
"post_id": "5a39e22c27308912334b4567",
"comment": "hello world",
"like": [
"2"
]
}
/* 2 */
{
"_id": ObjectId("5a595d8c2703892c3d8b4512"),
"uid": "2",
"post_id": "5a39e22c27308912334b4567",
"comment": "hello stackoverflow",
"like": [
"1",
"2"
]
}
Query a post comments like sum
db.getCollection('comment').aggregate([
{
"$match": {
post_id: "5a39e22c27308912334b4567"
}
},
{
"$project": {
"likeLength": {
"$size": "$like"
},
"post_id": "$post_id"
}
},
{
"$group": {
_id: "$post_id",
"likeLengthSum": {
"$sum": "$likeLength"
}
}
}
])
There is no "best" way to query, as it'll really depend on your specific needs, but... you cannot perform a single query across multiple collections (aside from the $lookup aggregation pipeline function in later versions, as you already are aware).
You'll need to make multiple queries: one to your post collection, and one to your comment collection.
If you must perform a single query, then consider storing both types of documents in a single collection (with some identifier property to let you filter on either posts or comments, within your query).
There is no other way to join collections in the current MongoDB v6 without $lookup,
I can predict two reasons that causing you the issues,
The $lookup is slow and expensive - How to improve performance?
$lookup optimization:
Follow the guideline provided in the documentation
Use indexs:
You can use the index on the reference collection's fields, as per your sample data you can create an index for post_id field, an index for uid field, or a compound index for both the fields on the basis of your use cases
You can read more about How to Improve Performance with Indexes and Document Filters
db.comment.createIndex({ "post_id": -1 });
db.comment.createIndex({ "uid": -1 });
// or
db.comment.createIndex({ "post_id": -1, "uid": -1 });
Document Filters:
Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline
You can refer to the documentation for more detailed examples
{ $skip: 0 },
{ $limit: 10 } // as per your use case
Limit the $lookup result:
Try to limit the result of lookup by $limit stage,
Try to coordinate or balance with improved query and the UI/Use cases
You want to avoid $lookup - How to improve the collection schema to avoid $lookup?
Store the analytics/metrics:
If you are trying to get the total counts of the comments in a particular post then you must store the total count in the post collection whenever you get post get a new comment
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
// new fields
"total_comments": 10
}
Store minimum reference data:
If you want to show the comments of a particular post, you can limit the result for ex: show 5 comments per post
You can also store a max of 5 latest comments in the post collection to avoid the $lookup, whenever you get the latest comment then add it and just remove the oldest comment from 5 comments
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
// new fields
"total_comments": 10,
"comments": [
{
"_id": ObjectId("5a595d8c2703892c3d8b4567"),
"uid": "1",
"comment": "hello world"
},
{
"_id": ObjectId("5a595d8c2703892c3d8b4512"),
"uid": "2",
"comment": "hello stackoverflow"
}
]
}
Must read about Reduce $lookup Operations
Must read about Improve Your Schema

Mongodb large document design

I plan to create a database for price history.
The history database should store prices defined 90 days in advance each day in a year.
That means: 90 days x 365 days/year = 32850 database item
Is there any way to design schema to improve query performance ?
my first suggestion was hierarchical store values like:
{
"Address": "xxxxx",
"City": "xxxxx",
"Country": "Deutschland",
"Currency": "EUR",
"Item_Name": "xxxxxx",
"Location": [
log, lat
],
"Postal_code": "xxxx",
"Price_History": [
2014 : [
"January" : {
"CW_1" : { 1: [ price1 .. price90 ], 2: [ price1 .. price90 ], },
"CW_2" : {},
"CW_3" : {},
} ,
"February" : {},
"March" : {},
]
]
}
Thank you in advance!
It all depends on which queries you are planning to run against this data. It seems to me that if you are interested in keeping a history of actions, then your queries will almost always contain a date parameter.
The Price_History array might be better formatted as sub document. Each of these documents would have a varied (but limited) range of values - the year and the month. It might be a good idea to add an index on that attribute. This way, whenever you query by a certain date range, your indexes will assist mongo to find the relevant dataset relatively quickly.
Another option would be to have each price in-itself as a document. The item connected to the price could be a sub-document perhaps not containing all of the item data, but enough to be able to make the calculations and fetch the other relevant data once your dataset is small enough. For this usage, I would recommend creating a single attribute of the date ranges to be indexed and also an index on the item._id attribute. You can still have the individual date components if you still need to query them individually. Something like this:
{
"ind_attr": "2014_January_CW1",
"date": {
"year": 2014,
"month": January",
},
"CW": 1,
"price": [ price1... price90 ],
"item": {
"name": ...,
"_id": ...,
// minimal data about the actual item
}
}
With this document structure, you could easily add an index on the ind_attr attribute. The document.item._id attribute can be used to retrieve more detailed data on the actual item if needed.

How to store option data in mongo?

I have a site that I'm using Mongo on. So far everything is going well. I've got several fields that are static option data, for example a field for animal breeds and another field for animal registrars.
Breeds
Arabian
Quarter Horse
Saddlebred
Registrars
AQHA
American Arabians
There are maybe 5 or 6 different collections like this that range from 5-15 elements.
What is the best way to put these in Mongo? Right now, I've got a separate collection for each group. That is a breeds collection, a registrars collection etc.
Is that the best way, or would it make more sense to have a single static data collection with a "type" field specifying the option type?
Or something else completely different?
Since this data is static then it's better to just embed the data in documents. This way you don't have to do manual joins.
And also store it in a separate collection (one or several, doesn't matter, choose what's easier) to facilitate presentation (render combo-boxes, etc.)
I believe creating multiple collections has collection size implications? (something about MongoDB creating a collection file on disk as twice the size of the previous file [ db.0 = 64MB, db.1 = 128MB and so on)
Here's what I can think of:
1. Storing as single collection
The benefits here are:
You only need one call to Mongo to fetch, and if you can cache the call you quickly have the data.
You avoid duplication: create a single schema that deals with all your options. You can just nest suboptions if there are any.
Of course, you also avoid duplication in statics/methods to modify options.
I have something similar on a project that I'm working on. I have categories and subcategories all stored in one collection. Here's a JSON/BSON dump as example:
In all the data where I need to store my 'options' (station categories in my case) I simply use the _id.
{
"status": {
"code": 200
},
"response": {
"categories": [
{
"cat": "Taxi",
"_id": "50b92b585cf34cbc0f000004",
"subcat": []
},
{
"cat": "Bus",
"_id": "50b92b585cf34cbc0f000005",
"subcat": [
{
"cat": "Bus Rapid Transit",
"_id": "50b92b585cf34cbc0f00000b"
},
{
"cat": "Express Bus Service",
"_id": "50b92b585cf34cbc0f00000a"
},
{
"cat": "Public Transport Bus",
"_id": "50b92b585cf34cbc0f000009"
},
{
"cat": "Tour Bus",
"_id": "50b92b585cf34cbc0f000008"
},
{
"cat": "Shuttle Bus",
"_id": "50b92b585cf34cbc0f000007"
},
{
"cat": "Intercity Bus",
"_id": "50b92b585cf34cbc0f000006"
}
]
},
{
"cat": "Rail",
"_id": "50b92b585cf34cbc0f00000c",
"subcat": [
{
"cat": "Intercity Train",
"_id": "50b92b585cf34cbc0f000012"
},
{
"cat": "Rapid Transit/Subway",
"_id": "50b92b585cf34cbc0f000011"
},
{
"cat": "High-speed Rail",
"_id": "50b92b585cf34cbc0f000010"
},
{
"cat": "Express Train Service",
"_id": "50b92b585cf34cbc0f00000f"
},
{
"cat": "Passenger Train",
"_id": "50b92b585cf34cbc0f00000e"
},
{
"cat": "Tram",
"_id": "50b92b585cf34cbc0f00000d"
}
]
}
]
}
}
I have a call to my API that gets me that document (app.ly/api/v1/stationcategories). I find this much easier to code with.
In your case you could have something like:
{
"option": "Breeds",
"_id": "xxx",
"suboption": [
{
"option": "Arabian",
"_id": "xxx"
},
{
"option": "Quarter House",
"_id": "xxx"
},
{
"option": "Saddlebred",
"_id": "xxx"
}
]
},
{
"option": "Registrars",
"_id": "xxx",
"suboption": [
{
"option": "AQHA",
"_id": "xxx"
},
{
"option": "American Arabians",
"_id": "xxx"
}
]
}
Whenever you need them, either loop through them, or pull specific options from your collection.
2. Storing as a static JSON document
This as #Sergio mentioned, is a viable and more simplistic approach. You can then either have separate docs for separate options, or put them in one document.
You do lose some flexibility here because you can't reference options by Id (which I prefer because changing option name doesn't affect all your other data).
Prone to typos (though if you know what you're doing this shouldn't be a problem).
For Node.js users: this might leave you with a headache from require('../../../options.json') similar to PHP.
The reader will note that I'm being negative about this approach, it works, but is rather inflexible.
Though we're discouraged from using joins unnecessarily on MongoDB, referencing by ObjectId is sometimes useful and extensible.
An example is if your website becomes popular in one region of the world, and say people from Poland start accounting for say 50% of your site visits. If you decide to add Polish translations. You would need to go back to all your documents, and add Polish names (if exists) to your options. If using approach 1, it's as easy as adding a Polish name to your options, and then plucking the Polish name from your options collection at runtime.
I could only think of 2 options other than storing each option as a collection
UPDATE: If someone has positives or negatives for either approach, may you please add them. My bias might be unhelpful to some people as there are benefits to storing static JSON files
MongoDB is schemaless and also no JOIN is supported. So you have to move out of the RDBMS and normalization given the fact that this is purely a different kind of database.
Few rules which you can apply while designing as a guidelines. Of course, you have the choice of keeping it in a separate collection when needed.
Static Master/Reference Data:
You have to always embed them in your documents wherever required. Since the data is not going to be changed, it is not at all bad idea to keep in the same collection. If the data is too large, group them and store them in a separate collection instead of creating multiple collection for the this master data itself.
NOTE: When embedding the collections as sub-documents, always make sure that you are never going to exceed the 16MB limit. That is the limit (at this point) for each collection can take in MongoDB.
Dynamic Master/Reference Data
Try to keep them in a separate collection as the master data is tend to change often.
Always remember, NO join support, so keep them in a way that you can easily access it without querying the database too many times.
So there is NO suggested best way, it always changes based on your need and the design can go either way.

mongodb php - how to do "INNER JOIN"-like query

I'm using the Mongo PHP extension.
My data looks like:
users
{
"_id": "4ca30369fd0e910ecc000006",
"login": "user11",
"pass": "example_pass",
"date": "2010-09-29"
},
{
"_id": "4ca30373fd0e910ecc000007",
"login": "user22",
"pass": "example_pass",
"date": "2010-09-29"
}
news
{
"_id": "4ca305c2fd0e910ecc000003",
"name": "news 333",
"content": "news content 3333",
"user_id": "4ca30373fd0e910ecc000007",
"date": "2010-09-29"
},
{
"_id": "4ca305c2fd0e910ecc00000b",
"name": "news 222",
"content": "news content 2222",
"user_id": "4ca30373fd0e910ecc000007",
"date": "2010-09-29"
},
{
"_id": "4ca305b5fd0e910ecc00000a",
"name": "news 111",
"content": "news content",
"user_id": "4ca30369fd0e910ecc000006",
"date": "2010-09-29"
}
How to run a query similar like this, from PHP?
SELECT n.*, u.*
FROM news AS n
INNER JOIN users AS u ON n.user_id = u.id
MongoDB does not support joins. If you want to map users to the news, you can do the following
1) Do this at the application-layer. Get the list of users, and get the list of news and map them in your application. This method is very expensive if you need this often.
2) If you need to do the previous-step often, you should redesign your schema so that the news articles are stored as embedded documents along with the user documents.
{
"_id": "4ca30373fd0e910ecc000007",
"login": "user22",
"pass": "example_pass",
"date": "2010-09-29"
"news" : [{
"name": "news 222",
"content": "news content 2222",
"date": "2010-09-29"
},
{
"name": "news 222",
"content": "news content 2222",
"date": "2010-09-29"
}]
}
Once you have your data in this format, the query that you are trying to run is implicit. One thing to note, though, is that analytics queries become difficult on such a schema. You will need to use MapReduce to get the most recently added news articles and such queries.
In the end the schema-design and how much denormalization your application can handle depends upon what kind of queries you expect your application to run.
You may find these links useful.
http://www.mongodb.org/display/DOCS/Schema+Design
http://www.blip.tv/file/3704083
I hope that was helpful.
Forget about joins.
do a find on your news. Apply the skip number and limit for paging the results.
$newscollection->find().skip(20).limit(10);
then loop through the collection and grab the user_id in this example you would be limited to 10 items. Now do a query on users for the found user_id items.
// replace 1,2,3,4 with array of userids you found in the news collection.
$usercollection.find( { _id : { $in : [1,2,3,4] } } );
Then when you print out the news it can display user information from the user collection based on the user_id.
You did 2 queries to the database. No messing around with joins and figuring out field names etc. SIMPLE!!!
If you are using the new version of MongoDB (3.2), then you would get something similar with the $lookup operator.
The drawbacks with using this operator are that it is highly inefficient when run over large result sets and it only supports equality for the match where the equality has to be between a single key from each collection. The other limitation is that the right-collection should be an unsharded collection in the same database as the left-collection.
The following aggregation operation on the news collection joins the documents from news with the documents from the users collection using the fields user_id from the news collection and the _id field from the users collection:
db.news.aggregate([
{
"$lookup": {
"from": "users",
"localField": "user_id",
"foreignField": "_id",
"as": "user_docs"
}
}
])
The equivalent PHP example implementation:
<?php
$m = new MongoClient("localhost");
$c = $m->selectDB("test")->selectCollection("news");
$ops = array(
array(
"$lookup" => array(
"from" => "users",
"localField" => "user_id",
"foreignField" => "_id",
"as" => "user_docs"
)
)
);
$results = $c->aggregate($ops);
var_dump($results);
?>
You might be better off embedding the "news" within the users' documents.
You can't do that in mongoDB. And from version 3 Eval() is deprecated, so you shouldn't use stored procedures either.
The only way I know to achieve a server side query involving multiple collections right now it's to use Node.js or similar. But if you are going to try this method, I strongly recommend you to limit the ip addresses allowed to access your machine, for security reasons.
Also, if your collections aren't too big, you can avoid inner joins denormalizing them.