Get key from map and retrieve data with it in one query - mongodb

I have two collections in mongo. First, is the actual data:
{ "_id" : "internal_key1", "data" : "some data1" }
{ "_id" : "internal_key2", "data" : "some data2" }
{ "_id" : "internal_key3", "data" : "some data3" }
Another one is a map of keys as provided by some external service to my internal keys:
{ "_id" : "ext_key111", "internal" : "internal_key1" }
{ "_id" : "ext_key222", "internal" : "internal_key2" }
{ "_id" : "ext_key333", "internal" : "internal_key3" }
If I only have external key, can I somehow retrieve data (for example, given "ext_key111" retrieve "some data1") with just one query? Not counting eval-like stuff, of course.

The short answer is NO.
You are basically asking for a join and MongoDB explicitly does not allow joins.
However, depending on your requirements, you may be able to get away with the following structure:
{ "_id" : "internal_key1", "ext_id": "ext_key111", "data" : "some data1" }
You can create an additional index on ext_id. You can even make it unique which seems to match your data.
db.ensureIndex({ ext_id: 1 }, { unique: true, background: true })

Related

What is the best way of writing a collection schema to map another collection?

In mongodb I have many collection like below
boys_fashion
girls_fashion
gents_fashion
ladies_fashion
girls_accessories
gents_accessories
ladies_accessories
based on some fields I need to use different collection. So for that I thought to create a collection which will map to a specific collection. Below is the collection I have created for that.
{
"type" : "Fashion",
"gender" : "Gents",
"kids" : "true",
"collection" : "boys_fashion"
}
{
"type" : "Accessories",
"gender" : "Ladies",
"kids" : "false",
"collection" : "ladies_accessories"
}
{
"type" : "Fashion",
"gender" : "Gents",
"kids" : "false",
"collection" : "gents_fashion"
}
{
"type" : "Accessories",
"gender" : "Ladies",
"kids" : "true",
"collection" : "girls_accessories"
}
{
"type" : "Accessories",
"gender" : "Gents",
"kids" : "true",
"collection" : "gents_accessories"
}
{
"type" : "Accessories",
"gender" : "Gents",
"kids" : "false",
"collection" : "gents_accessories"
}
Is this is the right way to do this? or please suggest me some other ways
If I stored like below(the above option is similar to RDBMS. Since its mongo I guess I used this way). How can I write a query for fetching the collection?
{
"fashion" : {
"gents" : {
"true" : "boys_fashion",
"false" : "gents_fashion"
}
},
"accessories" : {
"ladies" : {
"true" : "girls_accessories",
"false" : "ladies_accessories"
}
}
}
Assumptions:
There were one collection before and you split them into multiple collections as they are getting large and you want to solve it without sharing.
I would not even create a collection for the following data. This data is static reference data and will act as a router. On start up, application loads this data and creates a router.
{
"type" : "Fashion",
"gender" : "Gents",
"kids" : "true",
"collection" : "boys_fashion"
}
{
"type" : "Accessories",
"gender" : "Ladies",
"kids" : "false",
"collection" : "ladies_accessories"
}
...
What do I mean by creating a kind of router by that static configuration file? Lets say you receive a query fashionable items for baby girls. This router will tell you, hey you need to search girls_accessories collection. And you send the query to girls_accessories collection and return the result.
Lets take another example, you receive a query for fashionable items for female. This router will tell you hey you need to search, ladies_accessories and girls_accessories. You send the query to both collections and combine the result and send it back.
Conclusion
If my assumptions are correct, you don't need a collection to store the reference data. What you have done is manual sharding by splitting the data across different collections, the reference data will act as router to tell you where to search and combine
Update based on comments
Query does not involve multiple collections
Administrator can add new collection and application should query it without modifying code.
Solution 1
You create a collection for the reference data too
Downside to this is that every query involves two calls to database. First to fetch the static data and second to the data collection
Solution 2
You create a collection for the reference data too
You also build a Dao on top of that which uses #Cacheable for the method that return this data.
You also add a method in Dao to clear the cache with #CacheEvict and have a rest endpoint like /refresh-static-data that will call this method`
Downside to this method is that whenever administrator add new collection, you need to call the endpoint.
Solution 3
Same as solution 2 but instead of having an endpoint to clear the cache, you combine it with scheduler
#Scheduled(fixedRate = ONE_DAY)
#CacheEvict(value = { CACHE_NAME })
public void clearCache() {
}
Downside to this solution is that you have to come up with a period for fixedRate which is acceptable to your business
I added the collection like below.
/* 1 */
{
"type" : "fashion",
"category" : [
{
"value" : "gents",
"true" : "boys_fashion",
"false" : "gents_fasion"
}
]
}
/* 2 */
{
"type" : "accessories",
"category" : [
{
"value" : "ladies",
"true" : "girls_accessories",
"false" : "ladies_accessories"
}
]
}
and will fetch the data using the below query
findOne({"type":type,"category.value":cvalue},{"_id": 0, "category": {"$elemMatch": {"value": cvalue}}})
Try to make subdocuments in MongoDB, instead of nested objects
https://mongoosejs.com/docs/subdocs.html

MongoDb - Query for specific subdocument

I have a set of mongodb documents with the following structure:
{
"_id" : NUUID("58fbb893-dfe9-4f08-a761-5629d889647d"),
"Identifiers" : {
"IdentificationLevel" : 2,
"Identifier" : "extranet\\test#test.com"
},
"Personal" : {
"FirstName" : "Test",
"Surname" : "Test"
},
"Tags" : {
"Entries" : {
"ContactLists" : {
"Values" : {
"0" : {
"Value" : "{292D8695-4936-4865-A413-800960626E6D}",
"DateTime" : ISODate("2015-04-30T09:14:45.549Z")
}
}
}
}
}
}
How can I make a query with the mongo shell which finds all documents with a specific "Value" (e.g.{292D8695-4936-4865-A413-800960626E6D} in the Tag.Entries.ContactLists.Values path?
The structure is unfortunately locked by Sitecore, so it is not an options to use another structure.
As your sample collection structure show Values is object, it contains only one Value. Also you must check for Value as it contains extra paranthesis. If you want to get Value from given structure try following query :
db.collection.find({
"Tags.Entries.ContactLists.Values.0.Value": "{292D8695-4936-4865-A413-800960626E6D}"
})

Get specific object in array of array in MongoDB

I need get a specific object in array of array in MongoDB.
I need get only the task object = [_id = ObjectId("543429a2cb38b1d83c3ff2c2")].
My document (projects):
{
"_id" : ObjectId("543428c2cb38b1d83c3ff2bd"),
"name" : "new project",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
"members" : [
ObjectId("5424ac37eb0ea85d4c921f8b")
],
"US" : [
{
"_id" : ObjectId("5434297fcb38b1d83c3ff2c0"),
"name" : "Test Story",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
"tasks" : [
{
"_id" : ObjectId("54342987cb38b1d83c3ff2c1"),
"name" : "teste3",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
},
{
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
]
}
]
}
Result expected:
{
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
But i not getting it.
I still testing with no success:
db.projects.find({
"US.tasks._id" : ObjectId("543429a2cb38b1d83c3ff2c2")
}, { "US.tasks.$" : 1 })
I tryed with $elemMatch too, but return nothing.
db.projects.find({
"US" : {
"tasks" : {
$elemMatch : {
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2")
}
}
}
})
Can i get ONLY my result expected using find()? If not, what and how use?
Thanks!
You will need an aggregation for that:
db.projects.aggregate([{$unwind:"$US"},
{$unwind:"$US.tasks"},
{$match:{"US.tasks._id":ObjectId("543429a2cb38b1d83c3ff2c2")}},
{$project:{_id:0,"task":"$US.tasks"}}])
should return
{ task : {
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
Explanation:
$unwind creates a new (virtual) document for each array element
$match is the query part of your find
$project is similar as to project part in find i.e. it specifies the fields you want to get in the results
You might want to add a second $match before the $unwind if you know the document you are searching (look at performance metrics).
Edit: added a second $unwind since US is an array.
Don't know what you are doing (so realy can't tell and just sugesting) but you might want to examine if your schema (and mongodb) is ideal for your task because the document looks just like denormalized relational data probably a relational database would be better for you.

Resolving MongoDB DBRef array using Mongo Native Query and working on the resolved documents

My MongoDB collection is made up of 2 main collections :
1) Maps
{
"_id" : ObjectId("542489232436657966204394"),
"fileName" : "importFile1.json",
"territories" : [
{
"$ref" : "territories",
"$id" : ObjectId("5424892224366579662042e9")
},
{
"$ref" : "territories",
"$id" : ObjectId("5424892224366579662042ea")
}
]
},
{
"_id" : ObjectId("542489262436657966204398"),
"fileName" : "importFile2.json",
"territories" : [
{
"$ref" : "territories",
"$id" : ObjectId("542489232436657966204395")
}
],
"uploadDate" : ISODate("2012-08-22T09:06:40.000Z")
}
2) Territories, which are referenced in "Map" objects :
{
"_id" : ObjectId("5424892224366579662042e9"),
"name" : "Afghanistan",
"area" : 653958
},
{
"_id" : ObjectId("5424892224366579662042ea"),
"name" : "Angola",
"area" : 1252651
},
{
"_id" : ObjectId("542489232436657966204395"),
"name" : "Unknown",
"area" : 0
}
My objective is to list every map with their cumulative area and number of territories. I am trying the following query :
db.maps.aggregate(
{'$unwind':'$territories'},
{'$group':{
'_id':'$fileName',
'numberOf': {'$sum': '$territories.name'},
'locatedArea':{'$sum':'$territories.area'}
}
})
However the results show 0 for each of these values :
{
"result" : [
{
"_id" : "importFile2.json",
"numberOf" : 0,
"locatedArea" : 0
},
{
"_id" : "importFile1.json",
"numberOf" : 0,
"locatedArea" : 0
}
],
"ok" : 1
}
I probably did something wrong when trying to access to the member variables of Territory (name and area), but I couldn't find an example of such a case in the Mongo doc. area is stored as an integer, and name as a string.
I probably did something wrong when trying to access to the member variables of Territory (name and area), but I couldn't find an example
of such a case in the Mongo doc. area is stored as an integer, and
name as a string.
Yes indeed, the field "territories" has an array of database references and not the actual documents. DBRefs are objects that contain information with which we can locate the actual documents.
In the above example, you can clearly see this, fire the below mongo query:
db.maps.find({"_id":ObjectId("542489232436657966204394")}).forEach(function(do
c){print(doc.territories[0]);})
it will print the DBRef object rather than the document itself:
o/p: DBRef("territories", ObjectId("5424892224366579662042e9"))
so, '$sum': '$territories.name','$sum': '$territories.area' would show you '0' since there are no fields such as name or area.
So you need to resolve this reference to a document before doing something like $territories.name
To achieve what you want, you can make use of the map() function, since aggregation nor Map-reduce support sub queries, and you already have a self-contained map document, with references to its territories.
Steps to achieve:
a) get each map
b) resolve the `DBRef`.
c) calculate the total area, and the number of territories.
d) make and return the desired structure.
Mongo shell script:
db.maps.find().map(function(doc) {
var territory_refs = doc.territories.map(function(terr_ref) {
refName = terr_ref.$ref;
return terr_ref.$id;
});
var areaSum = 0;
db.refName.find({
"_id" : {
$in : territory_refs
}
}).forEach(function(i) {
areaSum += i.area;
});
return {
"id" : doc.fileName,
"noOfTerritories" : territory_refs.length,
"areaSum" : areaSum
};
})
o/p:
[
{
"id" : "importFile1.json",
"noOfTerritories" : 2,
"areaSum" : 1906609
},
{
"id" : "importFile2.json",
"noOfTerritories" : 1,
"areaSum" : 0
}
]
Map-Reduce functions should not be and cannot be used to resolve DBRefs in the server side.
See what the documentation has to say:
The map function should not access the database for any reason.
The map function should be pure, or have no impact outside of the
function (i.e. side effects.)
The reduce function should not access the database, even to perform
read operations. The reduce function should not affect the outside
system.
Moreover, a reduce function even if used(which can never work anyway) will never be called for your problem, since a group w.r.t "fileName" or "ObjectId" would always have only one document, in your dataset.
MongoDB will not call the reduce function for a key that has only a
single value

Filtering Mongo items by multiple fields and subfields

I have the following items in my collection:
> db.test.find().pretty()
{ "_id" : ObjectId("532c471a90bc7707609a3d4f"), "name" : "Alice" }
{
"_id" : ObjectId("532c472490bc7707609a3d50"),
"name" : "Bob",
"partner_type1" : {
"status" : "rejected"
}
}
{
"_id" : ObjectId("532c473e90bc7707609a3d51"),
"name" : "Carol",
"partner_type2" : {
"status" : "accepted"
}
}
{
"_id" : ObjectId("532c475790bc7707609a3d52"),
"name" : "Dave",
"partner_type1" : {
"status" : "pending"
}
}
There are two partner types: partner_type1 and partner_type2. A user cannot be accepted partner in the both of types. But he can be a rejected partner in partner_type1 but accepted in the another, for example.
How can I build Mongo query that fetches the users that can become partners?
When your user can only be accepted in one partner-type, you should turn it around: Have a field accepted_as:"partner_type1" or accepted_as:"partner_type2". For people who aren't accepted yet, either have no such field or set it to null.
In both cases, your query to get any non-accepted will then be:
{
data.accepted_as: null
}
(null matches both non-existing fields as well as fields explicitly set to null)
For me the logical schema would be this:
"partner : {
"type": 1,
"status" : "rejected"
}
At least that keeps the paths consistent between documents.
So if you want to stay away from using mapReduce type methods to find out "which field" it is on, and otherwise use plain queries and the aggregation pipeline, then don't vary field paths on documents. If you alter the "data" then that is the most consistent form.