Can Elasticsearch provide nested JSON results? - mongodb

I know that Elasticsearch provides good support for nested JSON, with advanced indexing for nested objects.
So, when I make a nested query in Elasticsearch, can the query result be obtained in its original nested form? Or is the result flattened, as in Lucene or Solr?
Note: I have used Apache Solr and Lucene before, and I am evaluating other search platforms for better support for nested JSON objects.

Here is a simple example of results that maintain their depth.
PUT people
{
  "mappings": {
    "list": {
      "properties": {
        "name": {
          "type": "nested"
        }
      }
    }
  }
}
PUT people/list/1
{
  "age" : "19",
  "name" : [
    {
      "first" : "John",
      "last" : "Smith"
    }
  ]
}
PUT people/list/2
{
  "age" : "23",
  "name" : [
    {
      "first" : "Wilber",
      "last" : "Smith"
    }
  ]
}
GET people/list/_search
{
  "query": {
    "match_all": {}
  }
}
As far as I understand, you'll prefer the nested mapping over the object mapping, because the object mapping flattens results. See this for reference:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/nested.html
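Each hit in the search response carries the original document in its _source field, so the nested structure is preserved. A trimmed sketch of what the match_all query above might return (response metadata abbreviated):
{
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "people",
        "_type": "list",
        "_id": "1",
        "_source": {
          "age": "19",
          "name": [
            {
              "first": "John",
              "last": "Smith"
            }
          ]
        }
      },
      ...
    ]
  }
}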

Related

How to find nodes with an object that contains a string value

I'm struggling to create a find query that finds nodes that contain "Item1".
{
    "_id" : ObjectId("589274f49bd4d562f0a15e07"),
    "Value" : [
        [
            "Item1",
            {
                "Name" : "John",
                "Age" : 45
            }
        ],
        [
            "Item2",
            {
                "Address" : "123 Main St.",
                "City" : "Hometown",
                "State" : "ZZ"
            }
        ]
    ]
}
In this example, "Item1" is not a key/value pair, but rather just a string that is part of an array that is part of a larger array. This is a legacy format so I can't adjust it unfortunately.
I've tried something like { Value: { $elemMatch: { $elemMatch: { "Item1" } } } }, but that is not returning any matches. Similarly, $regex is not working, since it only seems to match string fields (and the value here is not a string, but a string inside an array inside an array).
Since "Item1" is an element of an inner array, you can use a nested $elemMatch with the $in (or $eq) operator to match it.
So try this:
db.collection.find({'Value':{$elemMatch:{$elemMatch:{$in:['Item1']}}}})
Or run this projection to get only the specific item:
db.collection.find({},{'Value':{$elemMatch:{$elemMatch:{$in:['Item1']}}}})
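For reference, against the sample document the projection form should return only the first matching inner array, something like this (sketched, not run):
{ "_id" : ObjectId("589274f49bd4d562f0a15e07"), "Value" : [ [ "Item1", { "Name" : "John", "Age" : 45 } ] ] }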
Hope this helps.
var data = {
    "_id": "ObjectId('589274f49bd4d562f0a15e07')",
    "Value": [
        [
            "Item1",
            {
                "Name": "John",
                "Age": 45
            }
        ],
        [
            "Item2",
            {
                "Address": "123 Main St.",
                "City": "Hometown",
                "State": "ZZ"
            }
        ]
    ]
}
data.Value[0][0] // 'Item1'
Copy and paste it into a REPL; it works. There was an error in the structure of your data.
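If the document is already in application code rather than in the database, a plain JavaScript filter over the outer array does the same job. A minimal sketch using the data object above (the variable names are just illustrative):
// Keep only the inner arrays that contain the string "Item1"
var matches = data.Value.filter(function (pair) {
    return pair.indexOf("Item1") !== -1;
});
console.log(matches); // [ [ "Item1", { "Name": "John", "Age": 45 } ] ]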

Aggregating filter for Key

If I have a document as follows:
{
    "_id" : ObjectId("54986d5531a011bb5fb8e0ee"),
    "owner" : "54948a5d85f7a9527a002917",
    "type" : "group",
    "deleted" : false,
    "participants" : {
        "54948a5d85f7a9527a002917" : {
            "last_message_id" : null
        },
        "5491234568f7a9527a002917" : {
            "last_message_id" : null
        },
        "1234567aaaa7a9527a002917" : {
            "last_message_id" : null
        }
    }
}
How do I do a simple filter for all documents that have the participant "54948a5d85f7a9527a002917"?
Thanks
Trying to query structures like this does not work well. There is a whole host of problems with modelling like this, but the clearest problem is using "data" as the names for "keys".
Try to think a little like an RDBMS designer, at least with regard to what a database cannot or should not do. You wouldn't design a "table" in a schema with something like "54948a5d85f7a9527a002917" as a "column" name, would you? But this is essentially what you are doing here.
MongoDB can query this, but not in an efficient way:
db.collection.find({
    "participants.54948a5d85f7a9527a002917": { "$exists": true }
})
Naturally this looks for the "presence" of a key in the data. While this query form is available, it cannot make efficient use of indexes, since indexes apply to "data" and not to "key" names.
A better structure and approach is this:
{
    "_id" : ObjectId("54986d5531a011bb5fb8e0ee"),
    "owner" : "54948a5d85f7a9527a002917",
    "type" : "group",
    "deleted" : false,
    "participants" : [
        { "_id": "54948a5d85f7a9527a002917" },
        { "_id": "5491234568f7a9527a002918" },
        { "_id": "1234567aaaa7a9527a002917" }
    ]
}
Now the "data" you are looking for is actual "data" associated with a "key" ( possibly ) and inside an array for binding to the parent object. This is much more efficient to query:
db.collection.find({
    "participants._id": "54948a5d85f7a9527a002917"
})
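That query can also be backed by an index; a minimal sketch (the collection name is a placeholder, as in the queries above):
// Index the participant ids so the lookup above can avoid a collection scan
db.collection.createIndex({ "participants._id": 1 })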
It's much better to model this way than what you are presently doing, and it makes the objects easier to consume.
BTW, it's probably just a cut-and-paste issue in your question, but you cannot have duplicate keys such as "54948a5d85f7a9527a002917" as shown. That breaks a basic rule of hashes.

How do I query a hash sub-object that is dynamic in MongoDB?

I currently have a Question object and am not sure how to query for it.
{ "title" : "Do you eat fast food?"
"answers" : [
{
"_id" : "506b422ff42c95000e00000d",
"title" : "Yes",
"trait_score_modifiers" : {
"hungry" : 1
}
},
{
"_id" : "506b422ff42c95000e00000e",
"title" : "No",
"trait_score_modifiers" : {
"not-hungry" : -1
}
}]
}
I am trying to query on trait_score_modifiers, whose keys are dynamic (a given key sometimes exists and sometimes does not).
I have the following but it is not dynamic:
db.questions.find({"answers.trait_score_modifiers.not-hungry":{$exists: true}})
How could i do something like this?
db.questions.find({"answers.trait_score_modifiers.{}.size":{$gt: 0}})
You should modify the schema so you have consistent key names to query on. I ran into a similar problem using the aggregation framework, see question: Total values from all keys in subdocument
Something like this should work (not tested):
{
    "title" : "Do you eat fast food?",
    "answers" : [
        {
            "title" : "Yes",
            "trait_score_modifiers" : [
                {"dimension": "hungry", "value": 1}
            ]
        },
        {
            "title" : "No",
            "trait_score_modifiers" : [
                {"dimension": "not-hungry", "value": -1}
            ]
        }
    ]
}
You can return all questions that have a dynamic dimension (e.g. "my new dimension") with:
db.questions.find("answers.trait_score_modifiers.dimension": "my new dimension")
Or limit the returned set to questions that have a specific value on that dimension (e.g. > 0):
db.questions.find({
    "answers.trait_score_modifiers": {
        "$elemMatch": {
            "dimension": "my new dimension",
            "value": {"$gt": 0}
        }
    }
})
Querying nested arrays can be a bit tricky, so be sure to read up on the documentation. In this case $elemMatch is needed because without it you could match a document that has some trait_score_modifier with dimension "my new dimension" while the qualifying value sits in a different array element.
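To illustrate the pitfall, this is what the query looks like without $elemMatch; the two conditions are evaluated independently, so they may be satisfied by different elements of the array:
// May match unintended documents: one array element can satisfy the
// dimension condition while another supplies the value > 0.
db.questions.find({
    "answers.trait_score_modifiers.dimension": "my new dimension",
    "answers.trait_score_modifiers.value": {"$gt": 0}
})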
You need $elemMatch criteria in your query.
Refer to: http://docs.mongodb.org/manual/reference/projection/elemMatch/
Let me know if you need the query.

Indexing embedded MongoDB documents (in an array) with Solr

Is there any way I can make Solr index embedded MongoDB documents? We can already index top-level values of keys in a Mongo document via mongo-connector, which pushes the data to Solr.
However, consider a structure like this one, which represents a post:
{
    author: "someone",
    post_text: "some really long text which is already indexed by solr",
    comments: [
        {
            author: "someone else",
            comment_text: "some quite long comment, which I do not know how to index in Solr"
        },
        {
            author: "me",
            comment_text: "another quite long comment, which I do not know how to index in Solr"
        }
    ]
}
This is just an example structure. In our project we handle more complicated structures, and sometimes the text we want to index is nested two or three levels deep.
I believe there is a community of MongoDB + Solr users, so this issue must have been addressed before, but I was unable to find good material covering it: whether there is a nice way to handle this, or whether there is no solution and workarounds have yet to be found (and maybe you could provide me with one).
For a better understanding: one of our structures has a top-level key whose value is an array of analysis results, one of which contains an array of singular values that are parts of the result. We need to index these values. E.g. (this is not our actual data structure):
{ ...
    Analysis_performed: [
        {
            User_tags: [
                {
                    tag_name: "awesome",
                    tag_score: 180
                },
                {
                    tag_name: "boring",
                    tag_score: 10
                }
            ]
        }
    ]
}
In this case we would need to index on the tag names. It is possible that our structure for storing this data is bad, but we thought hard about it and we think it's quite good. However, even if we switch to less nested information, we will most likely come across at least one situation where we have to index information stored in embedded documents inside an array, and that is the question's main focus. Can we index such data with Solr somehow?
I had a question like this a couple of months ago. My solution was to use a doc_manager.
You can use solr_doc_manager (its upsert method) to modify the document posted to Solr. For example, if you have
ACL: {
    Read: [ id1, id2 ... ]
}
you can handle it with something like this:
from bson.objectid import ObjectId

def upsert(self, doc):
    # Flatten ACL.Read into a top-level multivalued field of id strings
    if ("ACL" in doc) and ("Read" in doc["ACL"]):
        doc["ACL.Read"] = []
        for item in doc["ACL"]["Read"]:
            if not isinstance(item, dict):
                id = ObjectId(item)
                doc["ACL.Read"].append(str(id))
    self.solr.add([doc], commit=False)
It adds a new field, ACL.Read. This field is multivalued and stores the list of ids from ACL: { Read: [...] }.
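For example, a hypothetical input document like the one sketched below would reach Solr with an extra flattened, multivalued field (the original ACL subdocument is left in place by the code above):
// Input from MongoDB:
{ "_id": ..., "ACL": { "Read": [ ObjectId("507f1f77bcf86cd799439011"), ObjectId("507f191e810c19729de860ea") ] } }
// Field added to the document posted to Solr:
"ACL.Read": [ "507f1f77bcf86cd799439011", "507f191e810c19729de860ea" ]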
If you do not want to write your own handlers for nested documents, you can try another mongo connector. The GitHub project page is https://github.com/SelfishInc/solr-mongo-connector. It supports nested documents out of the box.
The official 10gen mongo connector now supports flattening of arrays and indexing of subdocuments.
See https://github.com/10gen-labs/mongo-connector
However, for arrays it does something unpleasant. It would transform this document:
{
    "hashtagEntities" : [
        {
            "start" : "66",
            "end" : "81",
            "text" : "startupweekend"
        },
        {
            "start" : "82",
            "end" : "90",
            "text" : "startup"
        },
        {
            "start" : "91",
            "end" : "100",
            "text" : "startups"
        },
        {
            "start" : "101",
            "end" : "108",
            "text" : "london"
        }
    ]
}
into this:
{
    "hashtagEntities.0.start" : "66",
    "hashtagEntities.0.end" : "81",
    "hashtagEntities.0.text" : "startupweekend",
    "hashtagEntities.1.start" : "82",
    "hashtagEntities.1.end" : "90",
    "hashtagEntities.1.text" : "startup",
    ....
}
The above is very difficult to index in Solr, even more so if you have no stable schema for your documents. We wanted something more like this:
{
    "hashtagEntities.xArray.start": [
        "66",
        "82",
        "91",
        "101"
    ],
    "hashtagEntities.xArray.text": [
        "startupweekend",
        "startup",
        "startups",
        "london"
    ],
    "hashtagEntities.xArray.end": [
        "81",
        "90",
        "100",
        "108"
    ]
}
I have implemented an alternative solr_doc_manager.py. If you want to use this, just replace the flattening function in your doc_manager with the following:
def flattened(doc):
    return dict(flattened_kernel(doc, []))

def flattened_kernel(doc, path):
    # Walk a (possibly nested) document, yielding (dotted-path, value) pairs.
    for k, v in doc.items():
        path.append(k)
        if isinstance(v, dict):
            for inner_k, inner_v in flattened_kernel(v, path):
                yield inner_k, inner_v
        elif isinstance(v, list):
            for inner_k, inner_v in flattened_list(v, path).items():
                yield inner_k, inner_v
            path.pop()  # pop the "xArray" marker appended by flattened_list
        else:
            yield ".".join(path), v
        path.pop()  # pop k

def flattened_list(v, path):
    # Collapse a list into multivalued fields keyed by dotted path; plain
    # (non-document) list items are collected under "<path>.ROOT".
    tem = dict()
    path.append(str("xArray"))
    for li, lv in enumerate(v):
        if isinstance(lv, dict):
            for dk, dv in flattened_kernel(lv, path):
                got = tem.get(dk, list())
                if isinstance(dv, list):
                    got.extend(dv)
                else:
                    got.append(dv)
                tem[dk] = got
        else:
            got = tem.get(".".join(path) + ".ROOT", list())
            if isinstance(lv, list):
                got.extend(lv)
            else:
                got.append(lv)
            tem[".".join(path) + ".ROOT"] = got
    return tem
In case you do not want to lose data from arrays whose elements are not subdocuments, this implementation places such data into an "array.ROOT" attribute. For example, it turns this:
{
    "array" : [
        {
            "innerArray" : [
                {
                    "c" : 1,
                    "d" : 2
                },
                {
                    "ahah" : "asdf"
                },
                42,
                43
            ]
        },
        1,
        2
    ]
}
into:
{
    "array.xArray.ROOT": [
        "1.0",
        "2.0"
    ],
    "array.xArray.innerArray.xArray.ROOT": [
        "42.0",
        "43.0"
    ],
    "array.xArray.innerArray.xArray.c": [
        "1.0"
    ],
    "array.xArray.innerArray.xArray.d": [
        "2.0"
    ],
    "array.xArray.innerArray.xArray.ahah": [
        "asdf"
    ]
}
I had the same issue: I wanted to index/store complicated documents in Solr. My approach was to modify the JsonLoader to accept complicated JSON documents with arrays/objects as values.
It stores the object/array, then flattens it and indexes the fields.
E.g., a basic example document:
{
    "titles_json": {"FR": "This is the FR title", "EN": "This is the EN title"},
    "id": 1000003,
    "guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6"
}
It will store
titles_json: {
    "FR": "This is the FR title",
    "EN": "This is the EN title"
}
and then index the fields
titles.FR:"This is the FR title"
titles.EN:"This is the EN title"
Not only will you be able to index the child documents; when you perform a search on Solr you will also receive the original structure of the document that you indexed.
If you want to check the source code, installation and integration details with your existing solr, check
http://www.solrfromscratch.com/2014/08/20/embedded-documents-in-solr/
Please note that I have tested this with Solr 4.9.0.
M.

How to include different catalogs in a query in MongoDB?

Suppose I have different documents in different collections:
On cars:
{ "_id": 32534534, "color": "red", ... }
On houses:
{ "_id": 93867, "city": "Xanadu", ... }
How can I retrieve the corresponding document to the documents below, in people:
{ "name": "Alonso", "owns": [32534534], ... }
{ "name": "Kublai Khan", "owns": [93867], ... }
Can I use something like the code below?
(Note that I'm not specifying a catalog)
db.find({'_id': 93867})
If not, what would you suggest to achieve this effect?
I have just found this related question: MongoDB: cross-collection queries
Using DBRefs you can store links to documents outside your collection, or even in another MongoDB database. You will have to fetch the references in separate queries; different drivers handle this differently (for example, with the Python driver you can auto-dereference).
An example like yours in the JS shell might look like this:
> red_car = {"color": "red", "model": "Ford Perfect"}
{ "color" : "red", "model" : "Ford Perfect" }
> db.cars.save(red_car)
> red_car
{
    "color" : "red",
    "model" : "Ford Perfect",
    "_id" : ObjectId("4f041d96874e6f24e704f887")
}
> // Save as DBRef
> alonso = {"name": "Alonso", "owns": [new DBRef('cars', red_car._id)]}
{
    "name" : "Alonso",
    "owns" : [
        {
            "$ref" : "cars",
            "$id" : ObjectId("4f041d96874e6f24e704f887")
        }
    ]
}
> db.people.save(alonso)
As you can see, DBRefs are a formal spec for referencing objects: they always contain the ObjectId, but they can also carry information about the database and the collection. In the example above it stores the collection cars in the $ref field. Searching is trivial, as you just query on the DBRef:
> dbref = new DBRef('cars', red_car._id)
> red_car_owner = db.people.find({"owns": {$in: [dbref]}})[0]
> red_car_owner
{
    "_id" : ObjectId("4f0448e3a1c5cd097fc36a65"),
    "name" : "Alonso",
    "owns" : [
        {
            "$ref" : "cars",
            "$id" : ObjectId("4f0448d1a1c5cd097fc36a64")
        }
    ]
}
Dereferencing can be done via the fetch() command in the shell:
> red_car_owner.owns[0].fetch()
{
    "_id" : ObjectId("4f0448d1a1c5cd097fc36a64"),
    "color" : "red",
    "model" : "Ford Perfect"
}
However, depending on your use case, you may want to optimise this and write some code that iterates over the owns array and does as few find() queries as possible...
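A minimal sketch of that optimisation in the JS shell, grouping the $ids by target collection so that each referenced collection is queried once with $in (the helper name is illustrative, not part of any driver API):
// Group the owned DBRefs by collection, then run one find() per collection
function fetchOwned(person) {
    var byCollection = {};
    person.owns.forEach(function (ref) {
        (byCollection[ref.$ref] = byCollection[ref.$ref] || []).push(ref.$id);
    });
    var results = [];
    Object.keys(byCollection).forEach(function (coll) {
        results = results.concat(
            db[coll].find({ "_id": { $in: byCollection[coll] } }).toArray()
        );
    });
    return results;
}
fetchOwned(red_car_owner); // [ { "_id": ..., "color": "red", "model": "Ford Perfect" } ]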
I think there is no way to query multiple collections at once. I would suggest storing them inside the same collection, as below, with a type field.
{ "_id": 32534534, "type": "car", "color": "red", ... }
{ "_id": 93867, "type": "house", "city": "Xanadu", ... }
You would also need to restructure your people documents to include the type:
{ "name": "Alonso", "owns": { "ids": [32534534], "type": "car" }, ... }
{ "name": "Kublai Khan", "owns": { "ids": [93867], "type": "house" }, ... }
so now you can find the people who own the red car with
db.people.find({ "owns.type": "car", "owns.ids": 32534534 })
and houses with
db.people.find({ "owns.type": "house", "owns.ids": 93867 })