I am pretty new to MongoDB and, coming from a T-SQL background, I am finding it a little hard to understand how joins work in Mongo.
I have a very simple case where I have a "User" table (err, collection) and a "User Audit" collection.
My User collection looks something like this:
{
"_id": LUUID("d991e92a-766c-054e-9ad8-1c902acc6efc"),
"System": {
"VisitCount": 1
},
"UserData": {
"Uid": "46831",
"UserName": "abc.",
"FirstName": "abv",
"LastName": "test",
"EmailId": "abc#gmail.com",
"Region": "Georgia",
"Postal": "10000",
"Country": "United States",
"Phone": "800-000-1734"
}
}
and my User Audit collection looks like this:
{
"_id": LUUID("9561a583-0afe-e844-a090-43ffdab46ed2"),
"UserId": LUUID("914ed252-3fc7-d84c-9731-f382e7cf400b"),
"StartDateTime": ISODate("2016-05-12T04:07:37.299Z"),
"EndDateTime": ISODate("2016-05-12T04:07:42.715Z"),
"SaveDateTime": ISODate("2016-05-12T04:28:23.186Z"),
"Browser": {
"BrowserVersion": "50.0",
"BrowserMajorName": "Chrome",
"BrowserMinorName": "50.0"
},
"Pages": [
{
"DateTime": ISODate("2016-05-12T04:07:37.365Z"),
"Duration": 5416,
"Item": {
"_id": LUUID("f293157a-f22d-fe49-a7b0-f66f412408fe"),
"Language": "en",
"Version": 1
},
"Url": {
"Path": "/"
},
"VisitPageIndex": 1
},
{
"DateTime": ISODate("2016-05-12T04:07:42.781Z"),
"Duration": 0,
"Item": {
"Version": 0
},
"SitecoreDevice": {
"_id": LUUID("df7f5dfe-c089-994d-9aa3-b5fbd009c9f3"),
"Name": "Default"
},
"MvTest": {
"ValueAtExposure": 0
},
"Url": {
"Path": "/Sample Page1"
},
"VisitPageIndex": 2
}
]
}
I need a flat view where each row holds all the user information and the pages the user visited.
The audit information can be grouped by user or repeated per user. My main goal is to combine the user details with the page visit history.
I am looking for something like a left outer join equivalent,
something like
select * from usertable
left outer join useraudittable
on usertable.id = useraudittable.UserId
(grouped by user id, or repeated per user)
MongoDB is a document database and does not offer many relational operations such as joins. Normally you have to do it programmatically: run multiple queries and combine the data in your application code.
MongoDB 3.2 introduced the $lookup stage in the aggregation pipeline, which performs a left outer join and does pretty much what you are looking for. You can use something like this (mongo shell JavaScript syntax as an example; the collection names user and audit stand in for your own):
db.user.aggregate([{
$lookup: {
from: "audit",
localField: "_id",
foreignField: "UserId",
as: "VisitedPages"
}
}]);
If you are on MongoDB 3.2 or later you can use this approach; otherwise you will need to run multiple queries from your application.
Take a look at the documentation
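To make the shape of the result concrete, here is a minimal in-memory sketch in plain JavaScript of what a $lookup stage like the one above returns: every user is kept, and the matching audit documents are attached as an array (empty when nothing matches). The helper name leftOuterLookup and the sample data are made up for illustration; this only mimics the join semantics and says nothing about how MongoDB implements the stage.

```javascript
// Hypothetical helper: mimics the left-outer-join semantics of $lookup in memory.
function leftOuterLookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map(function (doc) {
    // Collect every foreign document whose foreignField equals this doc's localField.
    var matches = foreignDocs.filter(function (f) {
      return f[foreignField] === doc[localField];
    });
    // Copy the local document and attach the matches; unmatched docs get [].
    return Object.assign({}, doc, { [as]: matches });
  });
}

// Made-up sample data (short string ids stand in for the LUUID values).
var users = [
  { _id: "u1", UserName: "abc" },
  { _id: "u2", UserName: "xyz" }
];
var audits = [
  { _id: "a1", UserId: "u1", Pages: [{ Path: "/" }, { Path: "/Sample Page1" }] }
];

var joined = leftOuterLookup(users, audits, "_id", "UserId", "VisitedPages");
console.log(JSON.stringify(joined, null, 2));
```

For a one-row-per-visit view, the aggregation pipeline also has an $unwind stage that can be applied to the joined array after $lookup.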
Let's imagine a Mongo collection of, let's say, magazines. For some reason, we've ended up storing each issue of the magazine as a separate document. Each article is a subdocument inside an Articles array, and the authors of each article are represented as subdocuments inside the Writers array on the article subdocument. Only the name and email of the author are stored inside the article, but there is a Writers array on the magazine level containing more information about each author.
{
"Title": "The Magazine",
"Articles": [
{
"Title": "Mongo Queries 101",
"Summary": ".....",
"Writers": [
{
"Name": "tom",
"Email": "tom#example.com"
},
{
"Name": "anna",
"Email": "anna#example.com"
}
]
},
{
"Title": "Why not SQL instead?",
"Summary": ".....",
"Writers": [
{
"Name": "mike",
"Email": "mike#example.com"
},
{
"Name": "anna",
"Email": "anna#example.com"
}
]
}
],
"Writers": [
{
"Name": "tom",
"Email": "tom#example.com",
"Web": "tom.example.com"
},
{
"Name": "mike",
"Email": "mike#example.com",
"Web": "mike.example.com"
},
{
"Name": "anna",
"Email": "anna#example.com",
"Web": "anna.example.com"
}
]
}
How can one author be completely removed from a magazine?
Finding magazines where the unwanted author exists is quite easy. The problem is pulling the author out of all the subdocuments.
MongoDB 3.6 introduces some new positional operators, $[] and $[<identifier>], and I suspect these could be used with either $pull or $pullAll, but so far I haven't had any success.
Is it possible to do this in one go? Or at least in no more than two? One query for removing the author from all the articles, and one for removing the biography from the magazine?
You can try the query below. It uses the all-positional operator $[], so it needs MongoDB 3.6 or later.
db.col.update(
{},
{"$pull":{
"Articles.$[].Writers":{"Name": "tom","Email": "tom#example.com"},
"Writers":{"Name": "tom","Email": "tom#example.com"}
}},
{"multi":true}
);
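To see what that update is expected to do to one document, here is a small in-memory sketch in plain JavaScript: the writer is filtered out of every article's Writers array (the part expressed by Articles.$[].Writers) and out of the top-level Writers array. The function removeWriter is a hypothetical name used only for this illustration, and it matches on Name and Email only, as the $pull conditions above do.

```javascript
// Hypothetical helper: returns a copy of the magazine without the given writer.
function removeWriter(magazine, name, email) {
  // Keep every writer that does not match both name and email.
  var keep = function (w) {
    return !(w.Name === name && w.Email === email);
  };
  return Object.assign({}, magazine, {
    // Equivalent of pulling from "Articles.$[].Writers": visit every article.
    Articles: magazine.Articles.map(function (article) {
      return Object.assign({}, article, { Writers: article.Writers.filter(keep) });
    }),
    // Equivalent of pulling from the top-level "Writers" array.
    Writers: magazine.Writers.filter(keep)
  });
}

// Trimmed-down version of the sample magazine from the question.
var magazine = {
  Title: "The Magazine",
  Articles: [
    { Title: "Mongo Queries 101", Writers: [
      { Name: "tom", Email: "tom#example.com" },
      { Name: "anna", Email: "anna#example.com" }
    ] },
    { Title: "Why not SQL instead?", Writers: [
      { Name: "mike", Email: "mike#example.com" },
      { Name: "anna", Email: "anna#example.com" }
    ] }
  ],
  Writers: [
    { Name: "tom", Email: "tom#example.com", Web: "tom.example.com" },
    { Name: "mike", Email: "mike#example.com", Web: "mike.example.com" },
    { Name: "anna", Email: "anna#example.com", Web: "anna.example.com" }
  ]
};

var cleaned = removeWriter(magazine, "tom", "tom#example.com");
console.log(JSON.stringify(cleaned, null, 2));
```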
Using MongoDB for storage, if I wanted to represent a tree structure of nodes, where child nodes under a single parent always have unique node-names, I believe the standard approach would be to use collections and to manage the node name uniqueness on the app level:
Approach 1: Collection Based Approach for Tree Data
{ "node_name": "home", "title": "Home", "children": [
{ "node_name": "products", "title": "Products", "children": [
{ "node_name": "electronics", "title": "Electronics", "children": [ ] },
{ "node_name": "toys", "title": "Toys", "children": [ ] } ] },
{ "node_name": "services", "title": "Services", "children": [
{ "node_name": "repair", "title": "Repair", "children": [ ] },
{ "node_name": "training", "title": "Training", "children": [ ] } ] } ] }
I have however thought of the following alternate approach, where node-names become "Object Map" field names, and we do without collections altogether:
Approach 2: Object-Map Based Approach (without Collections)
// NOTE: We don't have the equivalent of "node_name":"home" at the root, but that's not an issue in this case
{ "title": "Home", "children": {
"products": { "title": "Products", "children": {
"electronics": { "title": "Electronics", "children": { } },
"toys": { "title": "Toys", "children": { } } } },
"services": { "title": "Services", "children": {
"repair": { "title": "Repair", "children": { } },
"training": { "title": "Training", "children": { } } } } } }
The question is:
Strictly from a MongoDB perspective (considering querying, performance, data maintainability and data-size and server scaling), are there any major issues with Approach #2 (over #1)?
EDIT: After getting to know MongoDB a bit better (and thanks to Neil's comments below), I realized that both options of this question are generally the wrong way to go, because they assume that it makes sense to store multiple nodes in a single MongoDB document. Ultimately, each "node" should be a separate document and (as Neil Lunn stated in the comments) there are various ways to implement a hierarchy tree, as seen here: Model Tree Structures in MongoDB
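As a rough illustration of the one-document-per-node idea from the edit above, the sketch below uses a parent reference on each node. It is plain JavaScript over an in-memory array rather than a real collection; the field name parent, the path-like _id values, and the helpers childrenOf and hasUniqueSiblingNames are all assumptions made for this example.

```javascript
// Each node is its own document and points at its parent (null for the root).
var nodes = [
  { _id: "home", node_name: "home", title: "Home", parent: null },
  { _id: "home/products", node_name: "products", title: "Products", parent: "home" },
  { _id: "home/services", node_name: "services", title: "Services", parent: "home" },
  { _id: "home/products/electronics", node_name: "electronics", title: "Electronics", parent: "home/products" },
  { _id: "home/products/toys", node_name: "toys", title: "Toys", parent: "home/products" },
  { _id: "home/services/repair", node_name: "repair", title: "Repair", parent: "home/services" },
  { _id: "home/services/training", node_name: "training", title: "Training", parent: "home/services" }
];

// Hypothetical helper: direct children of a node, found by parent reference.
function childrenOf(allNodes, parentId) {
  return allNodes.filter(function (n) {
    return n.parent === parentId;
  });
}

// Hypothetical helper: true when no two siblings share a node_name.
function hasUniqueSiblingNames(allNodes) {
  var seen = new Set();
  return allNodes.every(function (n) {
    var key = JSON.stringify([n.parent, n.node_name]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(childrenOf(nodes, "home").map(function (n) { return n.title; }));
console.log(hasUniqueSiblingNames(nodes));
```

In a real collection, the sibling-name rule checked by hand here could probably be moved into the database with a unique compound index on the parent and node-name fields.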
I don't think this use case is a good fit for MongoDB, because:
there is no compression in MongoDB 2.6, so your documents will be very large
MongoDB 2.6 uses database-level locks, so a write to one large document blocks other operations on that database
it will be hard to index
I think a relational database would be a better solution for this use case.
I have some documents in the "company" collection structured this way :
[
{
"company_name": "Company 1",
"contacts": {
"main": {
"email": "main#company1.com",
"name": "Mainuser"
},
"store1": {
"email": "store1#company1.com",
"name": "Store1 user"
},
"store2": {
"email": "store2#company1.com",
"name": "Store2 user"
}
}
},
{
"company_name": "Company 2",
"contacts": {
"main": {
"email": "main#company2.com",
"name": "Mainuser"
},
"store1": {
"email": "store1#company2.com",
"name": "Store1 user"
},
"store2": {
"email": "store2#company2.com",
"name": "Store2 user"
}
}
}
]
I'm trying to retrieve the document that has store1#company2.com as a contact, but I cannot find how to query a specific value of a specific property in a keyed ("indexed") list of objects.
My feeling is that the contacts should not be keyed by label and should instead be an array, resulting in the following structure:
{
"company_name": "Company 1",
"contacts": [
{
"email": "main#company1.com",
"name": "Mainuser",
"label": "main"
},
{
"email": "store1#company1.com",
"name": "Store1 user",
"label": "store1"
},
{
"email": "store2#company1.com",
"name": "Store2 user",
"label": "store2"
}
]
}
This way I can retrieve matching documents through the following request :
db.company.find({"contacts.email":"main#company1.com"})
But is there any way to do a similar request on documents using the previous structure?
Thanks a lot for your answers!
P.S. : same question for documents structured this way :
{
"company_name": "Company 1",
"contacts": {
"0": {
"email": "main#company1.com",
"name": "Mainuser"
},
"4": {
"email": "store1#company1.com",
"name": "Store1 user"
},
"1": {
"email": "store2#company1.com",
"name": "Store2 user"
}
}
}
Short answer: yes, they can be queried but it's probably not what you want and it's not going to be really efficient.
The document structure in the first and third blocks is basically the same: you have an embedded document. The only difference between them is the names of the keys in the contacts object.
To query documents with that kind of structure you will have to write a query like this:
db.company.find({ $or : [
{"contacts.main.email":"main#company1.com"},
{"contacts.store1.email":"main#company1.com"},
{"contacts.store2.email":"main#company1.com"}
]});
This query will not be efficient, especially if you have a lot of keys in the contacts object. Also, creating a query will be unnecessarily difficult and error prone.
The second document structure, with an array of embedded objects, is optimal. You can create a multikey index on the contacts array which will make your query faster. The bonus is that you can use a short and simple query.
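If existing documents already use the keyed structure, one way to move to the array form is to reshape them in application code. The sketch below is plain JavaScript on an in-memory document; the helper name contactsToArray is hypothetical, and the label field follows the array structure shown in the question.

```javascript
// Hypothetical helper: turns { main: {...}, store1: {...} } into
// [ {..., label: "main"}, {..., label: "store1"} ] without mutating the input.
function contactsToArray(company) {
  var contacts = Object.keys(company.contacts).map(function (label) {
    // Copy the contact and remember the key it was stored under.
    return Object.assign({}, company.contacts[label], { label: label });
  });
  return Object.assign({}, company, { contacts: contacts });
}

var keyed = {
  company_name: "Company 2",
  contacts: {
    main: { email: "main#company2.com", name: "Mainuser" },
    store1: { email: "store1#company2.com", name: "Store1 user" },
    store2: { email: "store2#company2.com", name: "Store2 user" }
  }
};

var reshaped = contactsToArray(keyed);

// With the array form, finding an email no longer depends on the key names.
var hasStore1 = reshaped.contacts.some(function (c) {
  return c.email === "store1#company2.com";
});
console.log(JSON.stringify(reshaped, null, 2), hasStore1);
```

Once stored in this shape, the short "contacts.email" query from the question applies directly.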
I think the easiest is really to shape your documents using the structure described in your 2nd example:
{
"company_name": "Company 1",
"contacts": [
{"email":"main#company1.com","name":"Mainuser", "label": "main", ...},
{"email":"store1#company1.com","name":"Store1 user", "label": "store1", ...},
{"email":"store2#company1.com","name":"Store2 user", "label": "store2", ...}
]
}
That way you can easily query on email independently of the "label".
So if you really want to use the other structure (you need to fix the JSON there too), you will have to write more complex code or an aggregation pipeline, since the names and number of the attributes are not known when querying. These structures are also probably hard for developers to work with, independently of MongoDB queries.
Since it was not clear, let me show what I have in mind:
db.company.save(
{
"company_name": "Company 1",
"contacts":[
{"email":"main#company1.com","name":"Mainuser", "label": "main"},
{"email":"store1#company1.com","name":"Store1 user", "label": "store1"},
{"email":"store2#company1.com","name":"Store2 user", "label": "store2"}
]
}
);
db.company.save(
{
"company_name": "Company 2",
"contacts":[
{"email":"main#company2.com","name":"Mainuser", "label": "main"},
{"email":"store1#company2.com","name":"Store1 user", "label": "store1"},
{"email":"store2#company2.com","name":"Store2 user", "label": "store2"}
]
}
);
db.company.createIndex( { "contacts.email" : 1 } ); // createIndex replaces the deprecated ensureIndex (MongoDB 3.0+)
db.company.find( { "contacts.email" : "store1#company2.com" } );
This allows you to store many emails, and query with an index.
I need to create an Elasticsearch index from MongoDB. The collection name is Product and it has this structure:
{
"_id": ObjectId("5239656f60663de206b1053e"),
"brand": "<brandName>",
"category": {
"$ref": "Category",
"$id": ObjectId("50cb515760663d3577000043"),
"$db": "<dbName>"
},
"image": "<imageUrl>",
"integraId": "<someId>",
"isActive": <isActive>,
"name": "<productName>",
"slug": "<slug>"
}
The Product collection has more than 30,000 documents, but Elasticsearch indexes only about 10,000 of them.
My request to create the index:
{
"type": "mongodb",
"mongodb": {
"servers": [
{ "host": "127.0.0.1", "port": 27017 }
],
"options": {
"secondary_read_preference": true
},
"db": "<dbName>",
"collection": "Product"
},
"index": {
"name": "test",
"type": "test_type"
}
}
And a second question: how can I index only some fields (name, brand, and category, where the category document is fetched by id from the other collection)?
You may have more luck asking in the Elasticsearch Google Group (http://groups.google.com/group/elasticsearch/topics) or on IRC (http://www.elasticsearch.org/community/).
MongoDB has experimental full-text search built in as of version 2.4, if you would like to experiment with that: http://docs.mongodb.org/manual/core/index-text/. You may be able to query more efficiently. I realize this isn't the same as the Elasticsearch solution you're looking for, but it might be another way to solve the problem. Good luck!
Say I have a document that looks something like this:
{
"_id": ObjectId("50b6a7416cb035b629000001"),
"businesses": [{
"name": "Biz1",
"id": ObjectId("50b6bc953e47dc923e000001")
}, {
"name": "Biz2",
"id": ObjectId("50b6ccebae0513bf52000001")
}, {
"name": "Biz3",
"id": ObjectId("50b6d015c58b414156000001")
}, {
"name": "Biz4",
"id": ObjectId("50b6d0c8a4cdd5e356000001")
}]
}
I want to remove
{
"name": "Biz3",
"id": ObjectId("50b6d015c58b414156000001")
}
from the array of businesses. I tried this (using business name instead of id for clarity):
db.users.update({'businesses.name':'Biz3'},{$pull:{'businesses.name':'Biz3'}})
but of course it didn't work. I know that the query part is correct because I get the document back when I do this:
db.users.find({'businesses.name' : 'Biz3'})
So the problem is with the update part.
I just ran a quick test and this works: $pull takes the array field itself as the key, and the condition on the element goes in the value.
db.users.update({'businesses.name':'Biz3'},{$pull:{'businesses':{'name':'Biz3'}}})
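The difference between the two updates is where the condition goes: $pull is given the array field (businesses) and a condition that is checked against each element, rather than a dotted path into the elements. Here is a small in-memory sketch of that behaviour in plain JavaScript, with a hypothetical helper pullWhere and shortened made-up ids in place of the ObjectId values:

```javascript
// Hypothetical helper: returns the array without the elements matching the condition.
function pullWhere(array, condition) {
  return array.filter(function (element) {
    // An element is removed only if every field in the condition matches it.
    var matchesAll = Object.keys(condition).every(function (key) {
      return element[key] === condition[key];
    });
    return !matchesAll;
  });
}

var user = {
  _id: "user1",
  businesses: [
    { name: "Biz1", id: "b1" },
    { name: "Biz2", id: "b2" },
    { name: "Biz3", id: "b3" },
    { name: "Biz4", id: "b4" }
  ]
};

// Same idea as {$pull: {businesses: {name: 'Biz3'}}}.
var remaining = pullWhere(user.businesses, { name: "Biz3" });
console.log(remaining.map(function (b) { return b.name; }));
```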