How to structure NoSQL Documents in Azure for Lookup By Array of String contains? - nosql

NoSQL newbie here..
I have Employee documents and every Employee has a name and has one to many tags. Here is a possible representation of an employee object in JSON format:
{
  "name": "John Doe",
  "tags": ["blue", "red", "green"]
}
I want to be able to query Employee instances in Cosmos DB by their tags. For example, I want to find an Employee where tags contains 'green'. An Employee will not have too many tags, maybe up to 10 or 15 at most.
What is the best way to model the document structure for this use case? The Cosmos DB documentation suggests a structure akin to the following, for a reason I do not understand:
{
  "name": "John Doe",
  "tags": [
    {
      "name": "blue"
    },
    {
      "name": "red"
    }
  ]
}
Is there any reason to split a String array into child JSON objects like this?

How you model documents depends entirely on your requirements; there is no strict rule.
For your document structure, I ran a quick test on my side with four sample documents.
I can use the query below to find all employees whose tags contain "green":
SELECT c.name,c.tags FROM c where ARRAY_CONTAINS(c.tags, "green")
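As for why the documentation nests each tag into an object: one possible reason (my assumption, the docs don't state it) is that the object form works with ARRAY_CONTAINS's optional third boolean argument, which enables partial matching against object elements. A sketch of both query shapes:

```sql
-- Plain string array: exact element match only.
SELECT c.name, c.tags FROM c WHERE ARRAY_CONTAINS(c.tags, "green")

-- Object array: the third argument (true) enables partial matching,
-- so tag objects may carry extra properties besides "name".
SELECT c.name, c.tags FROM c WHERE ARRAY_CONTAINS(c.tags, {"name": "green"}, true)
```

For a flat list of 10-15 plain strings, though, the simple string array is fine and the first query works as-is.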

Related

MongoDB Reuse _id

Say I have a simple schema for Users, which gets an automatically generated _id:
{
  _id: ObjectId("9dfhdf9fdhd90dfdhdaf"),
  name: "Joe Shmoe"
}
And then I have a schema for Groups where I can add Users into a members array.
{
  name: "Joe's Group",
  members: [{
    _id: ObjectId("58fdaffdhfd9fdsahfdsfa"),
    name: "Joe Shmoe"
  }]
}
Objects within an array get new autogenerated IDs, but I'd like to keep that _id field consistent and reuse it so that the member in the Group has the same attributes as they do in the Users collection.
My obvious solution is to create an independent id field in the members' User object that references their _id in the Users collection, but that seems cluttered having two separate id fields for each member.
My question is, is what I'm attempting bad practice? What's the correct way to add existing objects into a collection from another collection?
I think what you're referring to is manual references or DBRefs in data modelling:
original_id = ObjectId()

db.places.insert({
  "_id": original_id,
  "name": "Broadway Center",
  "url": "bc.example.net"
})

db.people.insert({
  "name": "Erin",
  "places_id": original_id,
  "url": "bc.example.net/Erin"
})
Check the MongoDB documentation on database references.
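Applied to the Groups example above, reusing the _id is exactly that pattern: store the user's existing _id inside the member subdocument instead of letting a new one be generated. A plain-JavaScript sketch (field names from the question; this only builds the documents, it doesn't touch a database):

```javascript
// Hypothetical user document, as stored in the Users collection.
const user = { _id: "58fdaffdhfd9fdsahfdsfa", name: "Joe Shmoe" };

// Build a group member that reuses the user's _id rather than generating
// a fresh one. Duplicating the name is deliberate denormalization; the
// _id is the reference back to the Users collection.
function toMember(userDoc) {
  return { _id: userDoc._id, name: userDoc.name };
}

const group = { name: "Joe's Group", members: [toMember(user)] };
```

This isn't bad practice: MongoDB only requires unique _ids on top-level documents, not on subdocuments in an array, and Mongoose lets you supply your own subdocument _id (or disable automatic ones with { _id: false } in the subschema).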

Data structure design for fast queries MongoDB

I have several collections: photos, photos_like, photos_comment.
I have two data structure options:
Case 1: photos_like and photos_comment are field arrays embedded in the photos collection.
Case 2: use references to connect the collections.
Which option gives the fastest query performance?
You should go for case 2 - document reference.
If you start to get a lot of traffic, you will end up having photo documents with hundreds or even thousands of comments and likes inside.
You would have a data structure like:
collection: photos
{
  "_id": ObjectId("5c53653451154c6da4623a79"),
  "name": "ocean",
  "path": "path/to/ocean.png"
}
collection: photo_comments
{
  "_id": ObjectId("c73h42h3ch238c7238cyn34y"),
  "comment": "it's actually a lake",
  "photo": ObjectId("5c53653451154c6da4623a79"),
  "user": ObjectId("sd686sd8ywh3rkjiusyyrk32")
}
collection: photo_likes
{
  "_id": ObjectId("x267cb623yru2ru6c4r273bn"),
  "photo": ObjectId("5c53653451154c6da4623a79"),
  "user": ObjectId("sd686sd8ywh3rkjiusyyrk32")
}
You can get more details at Model One-to-Many Relationships with Document References.

Matching array objects where document has all of any given but not necessarily all given

I have a set of data that looks like this:
{
  "name": "A document",
  "version": "1.0",
  "attributes": [
    {
      "name": "a",
      "values": ["1", "2"]
    },
    {
      "name": "b",
      "values": ["3", "4"]
    }
  ]
}
Then for my query, I have been given data that has the same structure as attributes. But there might be lots of them, and with all sorts of different names and values. a and b might be part of it, but there might also be c with ["5", "6"] or d with ["1", "3"].
What I want is to match all documents where all the attribute objects in the db are matched with any of the given attributes in the query.
So in the database document there might be 2 attributes, and in the query I am giving 6 attributes, and the 2 attributes must find 2 matches among those 6.
Another question that I think asks for the same thing is MongoDB Query with Java. Count matches in Array, and I am leaning towards a similar solution: filter on the attribute names in the query, then do the fine-tuning in code using a stream/pointer to reduce memory usage.
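The fine-tuning step in code is small. A sketch, assuming "matched" means an attribute with the same name whose values include all of the document attribute's values (adjust the inner comparison to your actual matching rule):

```javascript
// A document's attributes, as in the example above.
const docAttributes = [
  { name: "a", values: ["1", "2"] },
  { name: "b", values: ["3", "4"] }
];

// Attributes supplied with the query (may contain extras like "c", "d").
const givenAttributes = [
  { name: "a", values: ["1", "2"] },
  { name: "b", values: ["3", "4", "9"] },
  { name: "c", values: ["5", "6"] }
];

// True when every attribute of the document is covered by some given
// attribute: same name, and every document value appears in the given
// attribute's values. Extra given attributes are simply ignored.
function allAttributesCovered(docAttrs, givenAttrs) {
  return docAttrs.every(function (attr) {
    return givenAttrs.some(function (given) {
      return given.name === attr.name &&
        attr.values.every(function (v) {
          return given.values.indexOf(v) !== -1;
        });
    });
  });
}
```

Filtering on names in the database query first keeps the candidate set small, so this check only runs over documents that are already plausible matches.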

Should I use selector or views in Cloudant?

I'm confused about whether to use a selector, views, or both to get a result for the following scenario:
I need to do a wildcard search for a book and return the matching books plus the price and the store branch name.
So I tried using a selector to do the wildcard search with a regex:
"selector": {
  "_id": {
    "$gt": null
  },
  "type": "product",
  "product_name": {
    "$regex": "(?i)" + search
  }
},
"fields": [
  "_id",
  "_rev",
  "product_name"
]
I am able to get the result. The idea is then to take all the _id's from the result set and query views to get more details, like price and store branch name, from other documents. That feels odd to me, and I'm not certain it is the correct way to do it.
Below is the idea once I have the resulting _id's, inserted as a "productId" variable:
var input = {
  method: 'GET',
  returnedContentType: 'json',
  path: 'test/_design/app/_view/find_price' + "?keys=[\"" + productId + "\"]",
};
return WL.Server.invokeHttp(input);
So I'm asking for expert input on this approach.
Another question is how to get the store_branch_name. Can it be done in a single view that returns the product details, prices, and store branch name, or do I need several views?
Expected result:
product_name (from book document): Book 1
branch_name (from branch array in Store document): store 1 branch one
price (from relationship document): 79.9
References:
Book
{
  "_id": "book1",
  "_rev": "1...b",
  "product_name": "Book 1",
  "type": "book"
}
{
  "_id": "book2",
  "_rev": "1...b",
  "product_name": "Book 2 etc",
  "type": "book"
}
relationship
{
  "_id": "c...5",
  "_rev": "3...",
  "type": "relationship",
  "product_id": "book1",
  "store_branch_id": "Store1_branch1",
  "price": "79.9"
}
Store
{
  "_id": "store1",
  "_rev": "1...2",
  "store_name": "Store 1 Name",
  "type": "stores",
  "branch": [
    {
      "branch_id": "store1_branch1",
      "branch_name": "store 1 branch one",
      "address": {
        "street": "some address",
        "postalcode": "33490",
        "type": "addresses"
      },
      "geolocation": {
        "coordinates": [
          42.34493,
          -71.093232
        ],
        "type": "point"
      },
      "type": "storebranch"
    },
    {
      "branch_id": "store1_branch2",
      "branch_name":
      **details omitted...**
    }
  ]
}
In Cloudant Query, you can specify two different kinds of indexes, and it's important to know the differences between the two.
For the first part of your question: if you're using Cloudant Query's $regex operator for wildcard searches like that, you might be better off creating a Cloudant Query index of type "text" instead of type "json". It's in the Cloudant docs, but see the intro blog post for details: https://cloudant.com/blog/cloudant-query-grows-up-to-handle-ad-hoc-queries/. There's a more advanced post covering the tradeoffs between the two index types: https://cloudant.com/blog/mango-json-vs-text-indexes/
It's harder to address the second part of your question without understanding how your application interacts with your data, but there are a couple pieces of advice.
1) Consider denormalizing some of this information so you're not doing the JOINs to begin with.
2) Inject more logic into your document keys, and use the traditional MapReduce View indexing system to emit a compound key (an array), that you can use to emulate a JOIN by taking advantage of the CouchDB/Cloudant index sorting rules.
That second one's a mouthful, but check out this example on YouTube: https://youtu.be/0al1KnCKjlA?t=23m39s
Here's a preview (example map function) of what I'm talking about:
'map': function (doc) {
  if (doc.type === "user") {
    emit([doc._id], null);
  }
  else if (doc.type === "edge:follower") {
    emit([doc.user, doc.follows], { "_id": doc.follows });
  }
}
The resulting secondary index here would take advantage of the rules outlined in http://wiki.apache.org/couchdb/View_collation -- that strings sort before arrays, and arrays sort before objects. You could then issue range queries to emulate the results you'd get with a JOIN.
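To make that concrete, here is a small plain-JavaScript simulation (no Cloudant involved; the names follow the map function above) of what the view emits and what a startkey=["alice"] / endkey=["alice", {}] range query would return in one request:

```javascript
// Rows the view above might emit, already in collation order:
// user docs emit [userId], follow edges emit [user, follows].
const rows = [
  { key: ["alice"], value: null },
  { key: ["alice", "bob"], value: { _id: "bob" } },
  { key: ["alice", "carol"], value: { _id: "carol" } },
  { key: ["bob"], value: null },
  { key: ["bob", "alice"], value: { _id: "alice" } }
];

// Emulate the range query: with CouchDB collation, ["alice"] sorts first,
// then ["alice", <string>] rows, and the {} sentinel in the endkey sorts
// after any string, so the range captures exactly the rows below.
function rangeQuery(allRows, userId) {
  return allRows.filter(function (row) {
    return row.key[0] === userId;
  });
}

const result = rangeQuery(rows, "alice");
// One request returns the user row plus one row per followed user,
// which is the JOIN-like result set.
```

The { "_id": doc.follows } value additionally lets include_docs=true pull the followed user's full document in the same response.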
I think that's as much detail as is appropriate here. Hope it helps!

Querying MongoDB (Using Edge Collection - The most efficient way?)

I've created Users, Clubs and Followers collections for the sake of the example below.
I want to find all user documents from the Users collection that follow "A famous club". How can I find them, and which way is fastest?
More info about 'what do I want to do - Edge collections'
Users collection
{
  "_id": "1",
  "fullname": "Jared",
  "country": "USA"
}
Clubs collection
{
  "_id": "12",
  "name": "A famous club"
}
Followers collection
{
  "_id": "159",
  "user_id": "1",
  "club_id": "12"
}
PS: I can get the documents using Mongoose as shown below. However, building the followers array takes about 8 seconds with 150,000 records, and the second find query, which uses the followers array, takes about 40 seconds. Is that normal?
Followers.find(
  { club_id: "12" },
  '-_id user_id', // select only one field for better performance
  function (err, docs) {
    var followers = [];
    docs.forEach(function (item) {
      followers.push(item.user_id);
    });
    Users.find(
      { _id: { $in: followers } },
      function (error, users) {
        console.log(users); // results
      });
  });
There is no efficient way to perform a many-to-many join in MongoDB, so I combined the collections into embedded documents as shown below. The most important task in this case is creating indexes. For instance, if you want to query by followingClubs, you should create an index like schema.index({ 'followingClubs._id': 1 }) using Mongoose; if you want to query by country and followingClubs, create another index like schema.index({ 'country': 1, 'followingClubs._id': 1 }).
Pay attention when working with embedded documents: http://askasya.com/post/largeembeddedarrays
Then you can fetch your documents quickly. I tried counting 150,000 records this way and it took only 1 second, which is enough for me.
PS: we mustn't forget that in my tests the Users collection never experienced any data fragmentation, so my queries may have shown unusually good performance, especially on the followingClubs array of embedded documents.
Users collection
{
  "_id": "1",
  "fullname": "Jared",
  "country": "USA",
  "followingClubs": [ { "_id": "12" } ]
}
Clubs collection
{
  "_id": "12",
  "name": "A famous club"
}
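With the embedded array, the lookup becomes a single indexed query, e.g. Users.find({ 'followingClubs._id': '12' }) in Mongoose. As a plain-JavaScript sketch of the same predicate (no database; the second user is made up for illustration):

```javascript
const users = [
  { _id: "1", fullname: "Jared", country: "USA", followingClubs: [{ _id: "12" }] },
  { _id: "2", fullname: "Dana", country: "UK", followingClubs: [{ _id: "99" }] }
];

// Equivalent of Users.find({ 'followingClubs._id': clubId }): keep users
// whose embedded followingClubs array contains a club with the given _id.
function followersOfClub(allUsers, clubId) {
  return allUsers.filter(function (u) {
    return u.followingClubs.some(function (c) { return c._id === clubId; });
  });
}

const famousClubFollowers = followersOfClub(users, "12");
```

With the schema.index({ 'followingClubs._id': 1 }) index in place, MongoDB answers this with an index scan instead of the two-query round trip from the question.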