Elasticsearch index from MongoDB? - mongodb

I need to create an index from MongoDB. The collection is named Product and has the following structure:
{
"_id": ObjectId("5239656f60663de206b1053e"),
"brand": "<brandName>",
"category": {
"$ref": "Category",
"$id": ObjectId("50cb515760663d3577000043"),
"$db": "<dbName>"
},
"image": "<imageUrl>",
"integraId": "<someId>",
"isActive": <isActive>,
"name": "<productName>",
"slug": "<slug>"
}
The Product collection has more than 30,000 documents, but Elasticsearch indexes only about 10,000 of them.
My request to create the index:
{
"type": "mongodb",
"mongodb": {
"servers": [
{ "host": "127.0.0.1", "port": 27017 }
],
"options": {
"secondary_read_preference": true
},
"db": "<dbName>",
"collection": "Product"
},
"index": {
"name": "test",
"type": "test_type"
}
}
And a second question: how can I index only some fields (name, category (resolved by id from the other collection), and brand)?

You may have more luck asking about it in the Google Group http://groups.google.com/group/elasticsearch/topics or on IRC http://www.elasticsearch.org/community/

MongoDB 2.4 has experimental built-in full-text search if you would like to try it: http://docs.mongodb.org/manual/core/index-text/ and you may be able to query more efficiently that way. I realize this isn't the same as the Elasticsearch solution you're looking for, but it might be another way to solve the problem. Good luck!

Related

Delete sub-document from array in array of sub documents

Let's imagine a Mongo collection of, let's say, magazines. For some reason, we've ended up storing each issue of the magazine as a separate document. Each article is a subdocument inside an Articles array, and the authors of each article are represented as subdocuments inside the Writers array on the article subdocument. Only the name and email of each author are stored inside the article, but there is a Writers array on the magazine level containing more information about each author.
{
"Title": "The Magazine",
"Articles": [
{
"Title": "Mongo Queries 101",
"Summary": ".....",
"Writers": [
{
"Name": "tom",
"Email": "tom#example.com"
},
{
"Name": "anna",
"Email": "anna#example.com"
}
]
},
{
"Title": "Why not SQL instead?",
"Summary": ".....",
"Writers": [
{
"Name": "mike",
"Email": "mike#example.com"
},
{
"Name": "anna",
"Email": "anna#example.com"
}
]
}
],
"Writers": [
{
"Name": "tom",
"Email": "tom#example.com",
"Web": "tom.example.com"
},
{
"Name": "mike",
"Email": "mike#example.com",
"Web": "mike.example.com"
},
{
"Name": "anna",
"Email": "anna#example.com",
"Web": "anna.example.com"
}
]
}
How can one author be completely removed from a magazine?
Finding magazines where the unwanted author exists is quite easy. The problem is pulling the author out of all the subdocuments.
MongoDB 3.6 introduces some new placeholder operators, $[] and $[<identifier>], and I suspect these could be used with either $pull or $pullAll, but so far, I haven't had any success.
Is it possible to do this in one go? Or at least no more than two? One query for removing the author from all the articles, and one for removing the biography from the magazine?
You can try the query below (the all-positional operator $[] requires MongoDB 3.6 or later).
db.col.update(
{},
{"$pull":{
"Articles.$[].Writers":{"Name": "tom","Email": "tom#example.com"},
"Writers":{"Name": "tom","Email": "tom#example.com"}
}},
{"multi":true}
);
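To see what the update does to a single magazine document, here is a minimal in-memory sketch in plain JavaScript (Node.js, no database). It is not how MongoDB implements $pull; pullWriter is a hypothetical helper that assumes a writer matches when every listed field is equal and other fields are ignored, which is the behavior the query above relies on for the magazine-level entries that also carry a Web field.

```javascript
// Hypothetical helper: returns a copy of `magazine` with every writer that
// matches `criteria` removed from each article's Writers array and from the
// magazine-level Writers array. A writer matches when every field listed in
// `criteria` is equal; extra fields such as Web are ignored.
function pullWriter(magazine, criteria) {
  const matches = (writer) =>
    Object.entries(criteria).every(([key, value]) => writer[key] === value);

  return {
    ...magazine,
    // Like "Articles.$[].Writers": apply the removal inside every article.
    Articles: magazine.Articles.map((article) => ({
      ...article,
      Writers: article.Writers.filter((writer) => !matches(writer)),
    })),
    // Like "Writers": the magazine-level biographies.
    Writers: magazine.Writers.filter((writer) => !matches(writer)),
  };
}

// Sample data taken from the question (Summary fields omitted).
const magazine = {
  Title: "The Magazine",
  Articles: [
    {
      Title: "Mongo Queries 101",
      Writers: [
        { Name: "tom", Email: "tom#example.com" },
        { Name: "anna", Email: "anna#example.com" },
      ],
    },
    {
      Title: "Why not SQL instead?",
      Writers: [
        { Name: "mike", Email: "mike#example.com" },
        { Name: "anna", Email: "anna#example.com" },
      ],
    },
  ],
  Writers: [
    { Name: "tom", Email: "tom#example.com", Web: "tom.example.com" },
    { Name: "mike", Email: "mike#example.com", Web: "mike.example.com" },
    { Name: "anna", Email: "anna#example.com", Web: "anna.example.com" },
  ],
};

const result = pullWriter(magazine, { Name: "tom", Email: "tom#example.com" });
console.log(JSON.stringify(result.Articles.map((a) => a.Writers.map((w) => w.Name))));
// [["anna"],["mike","anna"]]
console.log(JSON.stringify(result.Writers.map((w) => w.Name)));
// ["mike","anna"]
```

The helper returns a new object instead of mutating the input, so the original sample stays available for comparison.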

Mongo DB Join Collections

I am pretty new to MongoDB and, coming from a T-SQL background, I am finding it a little hard to understand how joins work in Mongo.
I have a very simple case where I have a "User Table.. err.. Collection" and a "User Audit Collection".
My User collection looks something like this:
{
"_id": LUUID("d991e92a-766c-054e-9ad8-1c902acc6efc"),
"System": {
"VisitCount": 1
},
"UserData": {
"Uid": "46831",
"UserName": "abc.",
"FirstName": "abv",
"LastName": "test",
"EmailId": "abc#gmail.com",
"Region": "Georgia",
"Postal": "10000",
"Country": "United States",
"Phone": "800-000-1734"
}
}
and a User Audit collection:
{
"_id": LUUID("9561a583-0afe-e844-a090-43ffdab46ed2"),
"UserId": LUUID("914ed252-3fc7-d84c-9731-f382e7cf400b"),
"StartDateTime": ISODate("2016-05-12T04:07:37.299Z"),
"EndDateTime": ISODate("2016-05-12T04:07:42.715Z"),
"SaveDateTime": ISODate("2016-05-12T04:28:23.186Z"),
"Browser": {
"BrowserVersion": "50.0",
"BrowserMajorName": "Chrome",
"BrowserMinorName": "50.0"
},
"Pages": [
{
"DateTime": ISODate("2016-05-12T04:07:37.365Z"),
"Duration": 5416,
"Item": {
"_id": LUUID("f293157a-f22d-fe49-a7b0-f66f412408fe"),
"Language": "en",
"Version": 1
},
"Url": {
"Path": "/"
},
"VisitPageIndex": 1
},
{
"DateTime": ISODate("2016-05-12T04:07:42.781Z"),
"Duration": 0,
"Item": {
"Version": 0
},
"SitecoreDevice": {
"_id": LUUID("df7f5dfe-c089-994d-9aa3-b5fbd009c9f3"),
"Name": "Default"
},
"MvTest": {
"ValueAtExposure": 0
},
"Url": {
"Path": "/Sample Page1"
},
"VisitPageIndex": 2
}
]
}
I need a flat view where each row holds all the user information and the pages the user visited.
The audit information can be grouped by user or repeated per user. My main idea is to combine the user details with the page-visit history.
I am looking for the equivalent of a left outer join, something like:
SELECT *
FROM usertable
LEFT OUTER JOIN useraudittable
  ON usertable.id = useraudittable.UserId
ORDER BY usertable.id;
MongoDB is a document database and does not offer many relational operations such as joins. Normally you have to do it programmatically: run multiple queries and combine the data in your application code.
MongoDB 3.2 introduced the $lookup stage to the aggregation pipeline, and fortunately it does roughly what you are looking for: a left outer join. You can use something like this (mongo shell JavaScript syntax as an example):
db.user.aggregate([{
$lookup: {
from: "audit",
localField: "_id",
foreignField: "UserId",
as: "VisitedPages"
}
}]);
If you are using MongoDB 3.2 or later, you can try this approach; otherwise you'll need to run multiple queries and combine the results in your application.
Take a look at the $lookup documentation.
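To picture what the $lookup stage produces, and how the flat "one row per visit" view from the question could be obtained, here is a minimal in-memory sketch in plain JavaScript (Node.js, no database). lookup and unwindPreserve are hypothetical helpers that only approximate $lookup and an $unwind stage with preserveNullAndEmptyArrays: true; in the shell you would append such an $unwind stage after $lookup instead of writing these functions. The sample data is made up, with plain strings standing in for the LUUID values.

```javascript
// Hypothetical stand-in for a $lookup stage: attaches to every local document
// an array of the foreign documents whose foreignField equals its localField.
// Local documents with no match get an empty array (left outer join).
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Hypothetical stand-in for $unwind with preserveNullAndEmptyArrays: one row
// per array element; a document whose array is empty or missing is kept once,
// without that field.
function unwindPreserve(docs, field) {
  return docs.flatMap((doc) => {
    const values = doc[field];
    if (!Array.isArray(values) || values.length === 0) {
      const { [field]: _dropped, ...rest } = doc;
      return [rest];
    }
    return values.map((value) => ({ ...doc, [field]: value }));
  });
}

// Made-up sample data; plain strings stand in for the LUUID values.
const users = [
  { _id: "u1", UserData: { UserName: "abc." } },
  { _id: "u2", UserData: { UserName: "nobody" } }, // has no audit documents
];
const audits = [
  { _id: "a1", UserId: "u1", Pages: [{ Url: { Path: "/" } }] },
  { _id: "a2", UserId: "u1", Pages: [{ Url: { Path: "/Sample Page1" } }] },
];

const joined = lookup(users, audits, "_id", "UserId", "VisitedPages");
const flat = unwindPreserve(joined, "VisitedPages");
console.log(JSON.stringify(flat.map((row) => [row._id, row.VisitedPages ? row.VisitedPages._id : null])));
// [["u1","a1"],["u1","a2"],["u2",null]]
```

User u2 still appears once even though it has no audits, which is the "left outer" part of the join.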

Doctrine ODM Query Builder Subquery

I'm using an Akeneo 1.4 system with MongoDB and have several associations for the data. Now I want to query for products with an associationType that contains a specific product. The Mongo data looks like the following in the database:
"associations": [
{
"_id": ObjectId("565867d7c6e41f4408d0068f"),
"associationType": 5,
"groupIds": [
],
"owner": {
"$ref": "pim_catalog_product",
"$id": ObjectId("56560373c6e41f5b688b47d7"),
"$db": "akeneo_pim"
},
"products": [
{
"$ref": "pim_catalog_product",
"$id": ObjectId("56560372c6e41f5b688b4583"),
"$db": "akeneo_pim"
}
]
},
{
"_id": ObjectId("565867d7c6e41f4408d00690"),
"associationType": 6,
"groupIds": [
],
"owner": {
"$ref": "pim_catalog_product",
"$id": ObjectId("56560373c6e41f5b688b47d7"),
"$db": "akeneo_pim"
},
"products": [
]
}
]
I know how to query the products array with in():
$queryBuilder->addOr(
$queryBuilder->expr()->field('associations.products.$id')->in(array(new \MongoId($product->getId())))
);
But I don't know how to query only for products with a specific associationType (e.g. 5) AND the given product ID. Can I do something like a subquery in Doctrine ODM? I already tried with multiple QueryBuilder objects, but that didn't work.

How can I query an indexed object list in mongodb?

I have some documents in the "company" collection structured this way:
[
{
"company_name": "Company 1",
"contacts": {
"main": {
"email": "main#company1.com",
"name": "Mainuser"
},
"store1": {
"email": "store1#company1.com",
"name": "Store1 user"
},
"store2": {
"email": "store2#company1.com",
"name": "Store2 user"
}
}
},
{
"company_name": "Company 2",
"contacts": {
"main": {
"email": "main#company2.com",
"name": "Mainuser"
},
"store1": {
"email": "store1#company2.com",
"name": "Store1 user"
},
"store2": {
"email": "store2#company2.com",
"name": "Store2 user"
}
}
}
]
I'm trying to retrieve the document that has store1#company2.com as a contact, but I cannot find how to query a specific value of a specific property in an "indexed" (keyed) list of objects.
My feeling is that the contacts list should not be indexed, which would result in the following structure:
{
"company_name": "Company 1",
"contacts": [
{
"email": "main#company1.com",
"name": "Mainuser",
"label": "main"
},
{
"email": "store1#company1.com",
"name": "Store1 user",
"label": "store1"
},
{
"email": "store2#company1.com",
"name": "Store2 user",
"label": "store2"
}
]
}
This way I can retrieve matching documents with the following query:
db.company.find({"contacts.email":"main#company1.com"})
But is there any way to do a similar query on documents using the previous structure?
Thanks a lot for your answers!
P.S.: Same question for documents structured this way:
{
"company_name": "Company 1",
"contacts": {
"0": {
"email": "main#company1.com",
"name": "Mainuser"
},
"4": {
"email": "store1#company1.com",
"name": "Store1 user"
},
"1": {
"email": "store2#company1.com",
"name": "Store2 user"
}
}
}
Short answer: yes, they can be queried but it's probably not what you want and it's not going to be really efficient.
The document structures in the first and third blocks are basically the same: you have an embedded document. The only difference between them is the names of the keys in the contacts object.
To query documents with that kind of structure, you will have to write a query like this:
db.company.find({ $or : [
{"contacts.main.email":"main#company1.com"},
{"contacts.store1.email":"main#company1.com"},
{"contacts.store2.email":"main#company1.com"}
]});
This query will not be efficient, especially if you have a lot of keys in the contacts object. Also, creating a query will be unnecessarily difficult and error prone.
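If you do go down this route, generating the $or clauses in code at least avoids typing them by hand. Below is a minimal sketch in plain JavaScript; buildContactFilter is a hypothetical helper, and it still needs the full list of keys up front, which is exactly the weakness described above.

```javascript
// Hypothetical helper: builds the same kind of $or filter as the query above
// from a list of contact keys. Any key that is not in the list (for example a
// future "store3") is silently never searched.
function buildContactFilter(keys, email) {
  return {
    $or: keys.map((key) => ({ [`contacts.${key}.email`]: email })),
  };
}

const filter = buildContactFilter(["main", "store1", "store2"], "store1#company2.com");
console.log(JSON.stringify(filter));
// {"$or":[{"contacts.main.email":"store1#company2.com"},{"contacts.store1.email":"store1#company2.com"},{"contacts.store2.email":"store1#company2.com"}]}
```

In the mongo shell, the same function could build the argument for db.company.find(filter).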
The second document structure, with an array of embedded objects, is optimal. You can create an index on contacts.email, which MongoDB builds as a multikey index because contacts is an array, and that will make your query faster. The bonus is that you can use a short and simple query.
I think the easiest approach really is to shape your documents using the structure described in your second example (where ... stands for any other fields):
{
"company_name": "Company 1",
"contacts": [
{"email":"main#company1.com","name":"Mainuser", "label": "main", ...},
{"email":"store1#company1.com","name":"Store1 user", "label": "store1", ...},
{"email":"store2#company1.com","name":"Store2 user", "label": "store2", ...}
]
}
Like that, you can easily query on email independently of the "label".
If you really want to use the other structure, you will have to write more complex code or an aggregation pipeline, since you do not know the names and number of keys when querying. These structures are also probably hard for developers to use, independently of MongoDB queries.
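One way to move from the keyed structure to the array structure is to reshape documents in application code, for example in a one-off migration script. Below is a minimal sketch in plain JavaScript, assuming contacts is always a plain object of contact subdocuments; toLabeledContacts and hasContactEmail are hypothetical helpers. Note that hasContactEmail scans documents in memory, so it gets no help from an index.

```javascript
// Hypothetical migration helper: turns a keyed contacts object such as
// { main: {...}, store1: {...} } into the array form, keeping each key as a
// "label" field.
function toLabeledContacts(contacts) {
  return Object.entries(contacts).map(([label, contact]) => ({ ...contact, label }));
}

// Hypothetical application-side check that works whatever the key names are.
// Object.values also accepts the array form, so it works after migration too.
function hasContactEmail(company, email) {
  return Object.values(company.contacts).some((contact) => contact.email === email);
}

// "Company 2" from the question, in the keyed structure.
const company2 = {
  company_name: "Company 2",
  contacts: {
    main: { email: "main#company2.com", name: "Mainuser" },
    store1: { email: "store1#company2.com", name: "Store1 user" },
    store2: { email: "store2#company2.com", name: "Store2 user" },
  },
};

const migrated = { ...company2, contacts: toLabeledContacts(company2.contacts) };
console.log(JSON.stringify(migrated.contacts[1]));
// {"email":"store1#company2.com","name":"Store1 user","label":"store1"}
console.log(hasContactEmail(company2, "store1#company2.com")); // true
console.log(hasContactEmail(migrated, "store1#company2.com")); // true
```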
Since it was not clear, let me show what I have in mind:
db.company.insertOne(
{
"company_name": "Company 1",
"contacts":[
{"email":"main#company1.com","name":"Mainuser", "label": "main"},
{"email":"store1#company1.com","name":"Store1 user", "label": "store1"},
{"email":"store2#company1.com","name":"Store2 user", "label": "store2"}
]
}
);
db.company.insertOne(
{
"company_name": "Company 2",
"contacts":[
{"email":"main#company2.com","name":"Mainuser", "label": "main"},
{"email":"store1#company2.com","name":"Store1 user", "label": "store1"},
{"email":"store2#company2.com","name":"Store2 user", "label": "store2"}
]
}
);
db.company.createIndex( { "contacts.email" : 1 } );
db.company.find( { "contacts.email" : "store1#company2.com" } );
This allows you to store many emails, and query with an index.

mongodb - query embedded document - return only medium screenshots for each software in embedded collection

I have this simple mongo collection:
{
"_id": ObjectId("4fb176f964debef01e000000"),
"applicationId": NumberInt(1),
"screenshots": [
{
"caption": "ddd",
"images": [
{
"size": "large",
"file": {
"$ref": "File",
"$id": ObjectId("4fb176f964debef01e000001"),
"$db": "flukeytest"
}
},
{
"size": "medium",
"file": {
"$ref": "File",
"$id": ObjectId("4fb176f964debef01e000002"),
"$db": "flukeytest"
}
},
{
"size": "small",
"file": {
"$ref": "File",
"$id": ObjectId("4fb176f964debef01e000003"),
"$db": "flukeytest"
}
}
]
},
{
"caption": "tetsss",
"images": [
{
"size": "large",
"file": {
"$ref": "File",
"$id": ObjectId("4fb1771164debe9c1a000000"),
"$db": "flukeytest"
}
},
{
"size": "medium",
"file": {
"$ref": "File",
"$id": ObjectId("4fb1771164debe9c1a000001"),
"$db": "flukeytest"
}
},
{
"size": "small",
"file": {
"$ref": "File",
"$id": ObjectId("4fb1771164debe9c1a000002"),
"$db": "flukeytest"
}
}
]
}
]
}
I've been reading a lot about the $where function and map-reduce, but alas I'm not getting very far. I'm trying to select all medium images of every screenshot for one application ID. I'm not sure how I can return just the medium image of each screenshot and nothing else. Any ideas? Any pointers would be great :)
EDIT: I've got this far, but it doesn't work:
db.Screenshot.find({ "applicationId": 1, "$where": "function() { return this.screenshots.images.size == 'medium'; }" }).sort([ ]);
Alas. Still reading up on everything I can find on Google.
Why are you attempting to return only the medium screenshots? Is it for performance reasons? If there's one document per applicationId, then it won't be much more efficient to return only the medium screenshots (for example, MongoDB will do the same amount of physical I/O loading the data from disk). MongoDB is significantly different from an RDBMS in situations like this.
Unless the document in question is huge and you're trying to return the medium screenshots for many applications in one query, there won't be much (if any) performance benefit. I would suggest that you just query for the document by applicationId and then filter out the images you want in code.
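Filtering in code is only a few lines once the document has been fetched (for example with db.Screenshot.findOne({ applicationId: 1 }), using the collection name from the question's attempt). Below is a minimal sketch in plain JavaScript; mediumImages is a hypothetical helper, and each DBRef file field is reduced to its $id string for brevity.

```javascript
// Hypothetical helper: returns the medium image of every screenshot in one
// application document, paired with the screenshot's caption.
function mediumImages(appDoc) {
  return appDoc.screenshots.flatMap((shot) =>
    shot.images
      .filter((image) => image.size === "medium")
      .map((image) => ({ caption: shot.caption, file: image.file }))
  );
}

// The document from the question, with each DBRef reduced to its $id string.
const appDoc = {
  applicationId: 1,
  screenshots: [
    {
      caption: "ddd",
      images: [
        { size: "large", file: { $id: "4fb176f964debef01e000001" } },
        { size: "medium", file: { $id: "4fb176f964debef01e000002" } },
        { size: "small", file: { $id: "4fb176f964debef01e000003" } },
      ],
    },
    {
      caption: "tetsss",
      images: [
        { size: "large", file: { $id: "4fb1771164debe9c1a000000" } },
        { size: "medium", file: { $id: "4fb1771164debe9c1a000001" } },
        { size: "small", file: { $id: "4fb1771164debe9c1a000002" } },
      ],
    },
  ],
};

console.log(JSON.stringify(mediumImages(appDoc).map((m) => m.file.$id)));
// ["4fb176f964debef01e000002","4fb1771164debe9c1a000001"]
```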
Eventually, you will probably be able to do this sort of thing using the new aggregation framework, but it won't be released until 2.2.