How to include different catalogs in a query in MongoDB? - mongodb

Suppose I have different documents in different collections:
On cars:
{ "_id": 32534534, "color": "red", ... }
On houses:
{ "_id": 93867, "city": "Xanadu", ... }
How can I retrieve the corresponding document to the documents below, in people:
{ "name": "Alonso", "owns": [32534534], ... }
{ "name": "Kublai Khan", "owns": [93867], ... }
Can I use something like the code below?
(Note that I'm not specifying a catalog)
db.find({'_id': 93867})
If not, what would you suggest to achieve this effect?
I have just found this related question: MongoDB: cross-collection queries

Using DBRefs you can store links to documents outside your collection, or even in another MongoDB database. You will have to fetch the references in separate queries; drivers handle this differently, and with the Python driver, for example, you can auto-dereference them.
An example of yours in the js shell might look like:
> red_car = {"color": "red", "model": "Ford Perfect"}
{"color": "red", "model": "Ford Perfect"}
> db.cars.save(red_car)
> red_car
{
    "color" : "red",
    "model" : "Ford Perfect",
    "_id" : ObjectId("4f041d96874e6f24e704f887")
}
> // Save as DBRef
> alonso = {"name": "Alonso", "owns": [new DBRef('cars', red_car._id)]}
{
    "name" : "Alonso",
    "owns" : [
        {
            "$ref" : "cars",
            "$id" : ObjectId("4f041d96874e6f24e704f887")
        }
    ]
}
> db.people.save(alonso)
As you can see, DBRefs are a formal spec for referencing documents: they always contain the ObjectId and can also carry the database and collection names. In the example above, the collection cars is stored in the $ref field. Searching is trivial, as you just query on the DBRef:
> dbref = new DBRef('cars', red_car._id)
> red_car_owner = db.people.find({"owns": {$in: [dbref]}})[0]
> red_car_owner
{
    "_id" : ObjectId("4f0448e3a1c5cd097fc36a65"),
    "name" : "Alonso",
    "owns" : [
        {
            "$ref" : "cars",
            "$id" : ObjectId("4f0448d1a1c5cd097fc36a64")
        }
    ]
}
Dereferencing can be done via the fetch() command in the shell:
> red_car_owner.owns[0].fetch()
{
    "_id" : ObjectId("4f0448d1a1c5cd097fc36a64"),
    "color" : "red",
    "model" : "Ford Perfect"
}
However, depending on your use case, you may want to optimise this and write some code that iterates over the owns array and issues as few find() queries as possible.
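For example, a minimal shell sketch (assuming the owns array holds DBRefs exactly as above, and reading the $ref/$id fields shown in the output) could group the references by collection and then issue a single $in query per collection:
// Hypothetical batching sketch: one find() per referenced collection
var person = db.people.findOne({"name": "Alonso"});
var idsByCollection = {};
person.owns.forEach(function (ref) {
    idsByCollection[ref.$ref] = idsByCollection[ref.$ref] || [];
    idsByCollection[ref.$ref].push(ref.$id);
});
var ownedDocs = [];
Object.keys(idsByCollection).forEach(function (coll) {
    ownedDocs = ownedDocs.concat(
        db.getCollection(coll).find({"_id": {$in: idsByCollection[coll]}}).toArray()
    );
});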

I think there is no way to query multiple collections at once. I would suggest storing them inside the same collection, like below, with a type field.
{ "_id": 32534534, "type": "car", "color": "red", ... }
{ "_id": 93867, "type": "house", "city": "Xanadu", ... }

You need to restructure your people documents so that each owns entry carries a type:
{ "name": "Alonso", "owns": {ids:[32534534],type:'car'} ... }
{ "name": "Kublai Khan", "owns":{ids:[93867],type:'house'} ... }
Now you can find the people who own the red car with
db.people.find({"owns.type": "car", "owns.ids": 32534534})
and the house owners with
db.people.find({"owns.type": "house", "owns.ids": 93867})
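To keep those lookups fast, you would probably also want a compound index on the embedded fields, something like:
// Index sketch for the restructured owns subdocument
db.people.ensureIndex({"owns.type": 1, "owns.ids": 1})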

Related

How to combine Documents in aggregation pipeline with MongoDB Java driver 3.6?

I am using an aggregation pipeline with the MongoDB Java driver version 3.6. If I have documents that look something like:
doc1 --
{
    "CAR": {
        "VIN": "ASDF1234",
        "YEAR": "2018",
        "MAKE": "Honda",
        "MODEL": "Accord"
    },
    "FEATURES": [
        {
            "AUDIO": "MP3",
            "TIRES": "All Season",
            "BRAKES": "ABS"
        }
    ]
}
doc2 --
{
    "CAR": {
        "VIN": "ASDF1234",
        "AVAILABILITY": "In Stock"
    }
}
And if I submit a query like:
collection.aggregate(
    Arrays.asList(
        Aggregates.match(
            and(
                in("CAR.VIN", vinList),
                or(
                    eq("CAR.MAKE", carMake),
                    eq("CAR.AVAILABILITY", carAvailability)
                )
            )
        )
    )
)
Let us assume that for every VIN there are exactly two different records matching the "CAR.VIN" criterion, so I am going to get two results. Rather than deal with two results each time, I would like to merge the documents so that the result looks like this:
{
    "CAR": {
        "VIN": "ASDF1234",
        "YEAR": "2018",
        "MAKE": "Honda",
        "MODEL": "Accord",
        "AVAILABILITY": "In Stock"
    },
    "FEATURES": [
        {
            "AUDIO": "MP3",
            "TIRES": "All Season",
            "BRAKES": "ABS"
        }
    ]
}
The example where I have two and only two results trivializes my need for this. Imagine that vinList is a list of 10000 values, and it might return 2 x 10000 documents. When I return an AggregateIterable to the client that is calling my code, I do not want to impose the requirement that they have to group or collate the results in any way, but that they will receive one document for each result that has all of the information that they will want to parse, cleanly and easily.
Of course, people will suggest that the data is simply combined into one document with all of the data in the MongoDB collection. For reasons that I cannot control, there are two separate documents corresponding to each VIN in the same collection, and that is something that I am unable to change. There is a value in our system that makes this more reasonable than it might seem, so please don't focus on this apparent problem with the data.
I am trying, with not much luck, to utilize the Aggregates.group() operation to merge the fields in my aggregation pipeline. Accumulators.push seems to be the closest operation to what I need, but I do not want to complicate the document structure with extra arrays, etc. Is there a straightforward approach that I am not seeing?
You can try $mergeObjects, added in MongoDB v3.6:
db.cc.aggregate(
    [
        {
            $group: {
                _id : "$CAR.VIN",
                CAR : {$mergeObjects : "$CAR"},
                FEATURES : {$mergeObjects : {$arrayElemAt : ["$FEATURES", 0]}}
            }
        }
    ]
).pretty()
result
{
    "_id" : "ASDF1234",
    "CAR" : {
        "VIN" : "ASDF1234",
        "YEAR" : "2018",
        "MAKE" : "Honda",
        "MODEL" : "Accord",
        "AVAILABILITY" : "In Stock"
    },
    "FEATURES" : {
        "AUDIO" : "MP3",
        "TIRES" : "All Season",
        "BRAKES" : "ABS"
    }
}
>
To get FEATURES back as an array:
db.cc.aggregate(
    [
        {
            $group: {
                _id : "$CAR.VIN",
                CAR : {$mergeObjects : "$CAR"},
                FEATURES : {$push : {$arrayElemAt : ["$FEATURES", 0]}}
            }
        }
    ]
).pretty()
result
{
    "_id" : "ASDF1234",
    "CAR" : {
        "VIN" : "ASDF1234",
        "YEAR" : "2018",
        "MAKE" : "Honda",
        "MODEL" : "Accord",
        "AVAILABILITY" : "In Stock"
    },
    "FEATURES" : [
        {
            "AUDIO" : "MP3",
            "TIRES" : "All Season",
            "BRAKES" : "ABS"
        },
        null
    ]
}
>
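The null entry comes from the document that has no FEATURES array. If you do not want it, one option (a sketch along the same lines, not a tested solution) is to strip nulls in a follow-up stage with $filter:
db.cc.aggregate(
    [
        {
            $group: {
                _id : "$CAR.VIN",
                CAR : {$mergeObjects : "$CAR"},
                FEATURES : {$push : {$arrayElemAt : ["$FEATURES", 0]}}
            }
        },
        {
            // drop the null pushed for documents without a FEATURES array
            $addFields: {
                FEATURES : {$filter : {input : "$FEATURES", as : "f", cond : {$ne : ["$$f", null]}}}
            }
        }
    ]
).pretty()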

Can elastic search provide nested json results?

I know that Elasticsearch provides good support for nested JSON, with advanced indexing for nested objects.
So, when I make a nested query in Elasticsearch, can the query result be obtained in its original nested form? Or is the result flattened, as in Lucene or Solr?
Note: I have used Apache Solr and Lucene before, and I am evaluating different search platforms for better support for nested JSON objects.
Here is a simple example of results that maintain the nesting depth.
PUT people { "mappings": {
"list": {
"properties": {
"name": {
"type": "nested"
}
}
} } }
PUT people/list/1 { "age" : "19", "name" : [
{
"first" : "John",
"last" : "Smith"
} ] }
PUT people/list/2 { "age" : "23", "name" : [
{
"first" : "Wilber",
"last" : "Smith"
} ] }
GET people/list/_search { "query": {
"match_all": {} } }
As far as I understand, you'll prefer the nested mapping over the object mapping, because object would flatten the results. See this for reference:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/nested.html

MongoDB: text search on nested view

I have a document in MongoDB 3.2 with the following structure:
"_id" : ObjectId("5759815b94db5928bea3c3a5"),
"source" : "pons1",
"libraries" : [
{
"archive" : “deko1”,
"last_access" : ISODate("2016-06-09T14:45:04.644+0000"),
"books" : [
{
"title": "American Gods",
"author": "Neil Gaiman"
},
{
"title": "A Little Life",
"author": "Hanya Yanagihara"
}
]
},
{
"archive" : “deko90”,
"last_access" : ISODate("2016-06-10T12:45:03.624+0000"),
"books" : [
{
"title": "Sociology of News",
"author": "Michael Schudson"
},
{
"title": "City of God",
"author": "Augustine of Hippo"
}
]
}
]
There is an array ("books") inside another array ("libraries").
Since the book titles are indexed as "text", I want to be able to run a free text search and return only the relevant array elements.
For instance, if I search for the term "Gods", I would like to see the following result:
"_id" : ObjectId("5759815b94db5928bea3c3a5"),
"source" : "pons1",
"libraries" : [
{
"archive" : “deko1”,
"last_access" : ISODate("2016-06-09T14:45:04.644+0000"),
"books" : [
{
"title": "American Gods",
"author": "Neil Gaiman"
}
]
},
{
"archive" : “deko90”,
"last_access" : ISODate("2016-06-10T12:45:03.624+0000"),
"books" : [
{
"title": "City of God",
"author": "Augustine of Hippo"
}
]
}
]
In MongoDB 3.2, you can filter the elements of an array using $filter (https://docs.mongodb.com/manual/reference/operator/aggregation/filter/).
The problem is that you cannot use text search ($text) as a condition for $filter.
$text can only be used in the first stage of the aggregation pipeline ($match) (https://docs.mongodb.com/manual/tutorial/text-search-in-aggregation/).
There is one obvious workaround: give up the power of MongoDB's text search and work with regex instead.
That does not seem like a good option to me. I'd rather not lose the diacritic insensitivity and other useful features of MongoDB's text search.
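For reference, that regex workaround would look roughly like the sketch below in 3.2 (the collection name mycoll is made up, and the pattern /god/i has to be chosen by hand, since regex gives you no stemming or diacritic folding):
db.mycoll.aggregate([
    // $text is only allowed here, in the first $match stage
    { $match: { $text: { $search: "Gods" } } },
    // unwind down to individual books so a plain regex can filter them
    { $unwind: "$libraries" },
    { $unwind: "$libraries.books" },
    { $match: { "libraries.books.title": /god/i } },
    // rebuild the books arrays, then the libraries array
    { $group: {
        _id: { id: "$_id", archive: "$libraries.archive" },
        source: { $first: "$source" },
        last_access: { $first: "$libraries.last_access" },
        books: { $push: "$libraries.books" }
    } },
    { $group: {
        _id: "$_id.id",
        source: { $first: "$source" },
        libraries: { $push: {
            archive: "$_id.archive",
            last_access: "$last_access",
            books: "$books"
        } }
    } }
])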
Is there a way of reconciling $filter and $text in the same MongoDB query?

Can I utilize indexes when querying by MongoDB subdocument without known field names?

I have a document structure like follows:
{
    "_id": ...,
    "name": "Document name",
    "properties": {
        "prop1": "something",
        "2ndprop": "other_prop",
        "other3": ["tag1", "tag2"]
    }
}
I can't know the actual field names in the properties subdocument (they are given by the application user), so I can't create indexes like properties.prop1. Neither can I know the structure of the field values; they can be a single value, an embedded document, or an array.
Is there any practical way to do performant queries to the collection with this kind of schema design?
One option that came to mind is to add a new field to the document, index it, and record the field names used in each document there.
{
    "_id": ...,
    "name": "Document name",
    "properties": {
        "prop1": "something",
        "2ndprop": "other_prop",
        "other3": ["tag1", "tag2"]
    },
    "property_fields": ["prop1", "2ndprop", "other3"]
}
Now I could first run a query against the property_fields field, and then let MongoDB scan the matching documents to check whether properties.prop1 contains the required value. This is definitely slower, but it could be viable.
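A rough sketch of that two-step idea (the collection name docs is made up): the index narrows the candidates down by field name, and the unindexed value condition is then checked only on those documents.
db.docs.ensureIndex({"property_fields": 1})
// Uses the index for the property_fields condition; the properties.prop1
// condition is evaluated only on the documents that matched it.
db.docs.find({"property_fields": "prop1", "properties.prop1": "something"})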
One way of dealing with this is to use a schema like the one below.
{
    "name" : "Document name",
    "properties" : [
        {
            "k" : "prop1",
            "v" : "something"
        },
        {
            "k" : "2ndprop",
            "v" : "other_prop"
        },
        {
            "k" : "other3",
            "v" : "tag1"
        },
        {
            "k" : "other3",
            "v" : "tag2"
        }
    ]
}
Then you can index "properties.k" and "properties.v" for example like this:
db.foo.ensureIndex({"properties.k": 1, "properties.v": 1})
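With that index in place, a lookup by user-supplied key and value can be written with $elemMatch, so that both conditions apply to the same array element, for example:
// Finds documents whose user-defined field "prop1" has the value "something"
db.foo.find({"properties": {$elemMatch: {"k": "prop1", "v": "something"}}})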

How do I query a hash sub-object that is dynamic in mongodb?

I currently have a Question object and am not sure how to query for it.
{ "title" : "Do you eat fast food?"
"answers" : [
{
"_id" : "506b422ff42c95000e00000d",
"title" : "Yes",
"trait_score_modifiers" : {
"hungry" : 1
}
},
{
"_id" : "506b422ff42c95000e00000e",
"title" : "No",
"trait_score_modifiers" : {
"not-hungry" : -1
}
}]
}
I am trying to find questions by querying on trait_score_modifiers (sometimes a given key exists, sometimes not).
I have the following but it is not dynamic:
db.questions.find({"answers.trait_score_modifiers.not-hungry":{$exists: true}})
How could I do something like this?
db.questions.find({"answers.trait_score_modifiers.{}.size":{$gt: 0}})
You should modify the schema so you have consistent key names to query on. I ran into a similar problem using the aggregation framework; see this question: Total values from all keys in subdocument
Something like this should work (not tested):
{
    "title" : "Do you eat fast food?",
    "answers" : [
        {
            "title" : "Yes",
            "trait_score_modifiers" : [
                {"dimension": "hungry", "value": 1}
            ]
        },
        {
            "title" : "No",
            "trait_score_modifiers" : [
                {"dimension": "not-hungry", "value": -1}
            ]
        }
    ]
}
You can return all questions that have a dynamic dimension (e.g. "my new dimension") with:
db.questions.find("answers.trait_score_modifiers.dimension": "my new dimension")
Or limit the returned set to questions that have a specific value on that dimension (e.g. > 0):
db.questions.find({
    "answers.trait_score_modifiers": {
        "$elemMatch": {
            "dimension": "my new dimension",
            "value": {"$gt": 0}
        }
    }
})
Querying nested arrays can be a bit tricky; be sure to read up on the documentation. In this case, $elemMatch is needed because otherwise you would also return documents where one array element has the dimension "my new dimension" but the value condition is only satisfied by a different array element.
You need $elemMatch criteria in your query.
Refer to: http://docs.mongodb.org/manual/reference/projection/elemMatch/
Let me know if you need the query.