Find all objects whose nested properties have a desired value - mongodb

I have a collection with the following (sample) documents:
{
  "label": "Tree",
  "properties": {
    "height": {
      "type": "int",
      "label": "Height",
      "description": "In meters"
    },
    "coordinates": {
      "type": "coords",
      "label": "Coordinates"
    },
    "age": {
      "type": "int",
      "label": "Age"
    }
  }
}
The keys in the properties attribute differ for almost every document in the collection.
I want to find all documents that have at least one property of given type.
What I'm looking for is something like {"properties.*.type": "coords"}, but that doesn't work - it's only my invented mongo query syntax.
All the help I could find concerned the $elemMatch operator, which I can't use here because properties is an object, not an array.

As far as I know, MongoDB doesn't provide this kind of search. So to find this, I first separated out all the keys using map-reduce and then ran a find query for each of them. The code below should help:
var mapReduce = db.runCommand({
  "mapreduce": "collectionName",
  "map": function() {
    for (var key in this.properties) {
      emit(key, null);
    }
  },
  "reduce": function(key, stuff) {
    return null;
  },
  "out": "collectionName" + "_keys"
})
db[mapReduce.result].distinct("_id").forEach(function(data) {
  // Build the dotted path for this key, e.g. "properties.height.type"
  var findKey = "properties." + data + ".type";
  var query = {};
  query[findKey] = "coords";
  var myCursor = db.collectionName.find(query);
  while (myCursor.hasNext()) {
    print(tojson(myCursor.next()));
  }
})
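On newer servers you may be able to skip the key-extraction step entirely. A minimal sketch, assuming MongoDB 3.4.4+ (for $objectToArray) and the same collectionName as above; propsArr is just a temporary field name:
db.collectionName.aggregate([
  // Turn { height: {...}, age: {...} } into [ { k: "height", v: {...} }, ... ]
  { "$addFields": { "propsArr": { "$objectToArray": "$properties" } } },
  // Keep documents where at least one value has the desired type
  { "$match": { "propsArr.v.type": "coords" } },
  // Drop the temporary field from the output
  { "$project": { "propsArr": 0 } }
])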

MongoDB doesn't support searches on keys, such as properties.* to match all subkeys of properties. Generally, you shouldn't have arbitrary keys, or keys you don't know about, in your schema unless they are just for display, because you won't be able to interact with them very easily in MongoDB.
If you do want to store dynamic attributes, the best approach is usually an array like the following:
{
  "properties" : [
    {
      "key" : "height",
      "value" : {
        "type" : "Int",
        "label" : "Height",
        "description" : "In meters"
      }
    },
    ...
  ]
}
Efficient querying for your use case then becomes possible. For example, finding all documents that have at least one property with a given key results from an index on { "properties.key" : 1 }:
db.test.find({ "properties.key" : { "$in" : ["height", "coordinates", "age"] } })

Related

Elasticsearch Java high level client group by and max

I am using Scala 2.12 and Elasticsearch 6.5, and I query ES with the Java high-level REST client.
As a simple example, the required documents exist in two versions (published twice) with different ids and timestamps: id_123 with timestamp 10 AM, and id_234 with timestamp 11 AM (representative values only).
I just need the documents that are the latest among these, i.e. the 11 AM one.
I have some filter conditions and then need to group on field1 and take the max of field2 (which is timestamp).
val searchRequest = new SearchRequest("index_name")
val searchSourceBuilder = new SearchSourceBuilder()
val qb = QueryBuilders.boolQuery()
  .must(QueryBuilders.matchQuery("myfield.date", "2019-07-02"))
  .must(QueryBuilders.matchQuery("myfield.data", "1111"))
  .must(QueryBuilders.boolQuery()
    .should(QueryBuilders.regexpQuery("myOtherFieldId", "myregex1"))
    .should(QueryBuilders.regexpQuery("myOtherFieldId", "myregex2")))
val myAgg = AggregationBuilders.terms("group_by_Id").field("field1.Id")
  .subAggregation(AggregationBuilders.max("timestamp").field("field1.timeStamp"))
searchSourceBuilder.query(qb)
searchSourceBuilder.aggregation(myAgg)
searchSourceBuilder.size(1000)
searchRequest.source(searchSourceBuilder)
val searchResponse = client.search(searchRequest, RequestOptions.DEFAULT)
Basically, everything works fine if I don't use the aggregation.
When I use the aggregation, I am getting the following error:
ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Expected numeric type on field [field1.timeStamp], but got [keyword]]]
So what am I missing here?
I am basically looking for an SQL-like query that has a filter (WHERE with AND/OR clauses), then groups by a field (Id) and takes only the documents where timeStamp is the max.
UPDATE:
I tried the above query via cURL on the command prompt and get the same error when using "max" in the aggregation.
{
  "query": {
    "bool": {
      "must": [
        { "match": { "myfield.date": "2019-07-02" } },
        { "match": { "myfield.data": "1111" } },
        {
          "bool": {
            "should": [
              { "regexp": { "myOtherFieldId": "myregex1" } },
              { "regexp": { "myOtherFieldId": "myregex2" } }
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "NAME": {
      "terms": {
        "field": "field1.Id"
      },
      "aggs": {
        "NAME": {
          "max": {
            "field": "field1.timeStamp"
          }
        }
      }
    }
  },
  "size": "10000"
}
I checked the mappings of the index; timeStamp is showing as keyword. So how do I take a max over such a field?
Adding the relevant mappings:
{"index_name":{"mappings":{"data":{"dynamic_templates":[{"boolean_as_keyword":{"match":"*","match_mapping_type":"boolean","mapping":{"ignore_above":256,"type":"keyword"}}},{"double_as_keyword":{"match":"*","match_mapping_type":"double","mapping":{"ignore_above":256,"type":"keyword"}}},{"long_as_keyword":{"match":"*","match_mapping_type":"long","mapping":{"ignore_above":256,"type":"keyword"}}},{"string_as_keyword":{"match":"*","match_mapping_type":"string","mapping":{"ignore_above":256,"type":"keyword"}}}],"date_detection":false,"properties":{"header":{"properties":{"Id":{"type":"keyword","ignore_above":256},"otherId":{"type":"keyword","ignore_above":256},"someKey":{"type":"keyword","ignore_above":256},"dataType":{"type":"keyword","ignore_above":256},"processing":{"type":"keyword","ignore_above":256},"otherKey":{"type":"keyword","ignore_above":256},"sender":{"type":"keyword","ignore_above":256},"receiver":{"type":"keyword","ignore_above":256},"system":{"type":"keyword","ignore_above":256},"timeStamp":{"type":"keyword","ignore_above":256}}}}}}}}
UPDATE2:
I think I need to aggregate on timeStamp as a keyword. Note that timeStamp is a sub-field, i.e. under field1. The syntax below for keyword doesn't seem to work, or I am missing something else:
"aggs": {
"NAME" : {
"terms": {
"field": "field1.Id"
},
"aggs": {
"NAME": {
"max" : {
"field": "field1.timeStamp.keyword"
}
}
}
}
}
It fails now saying:
"Invalid aggregator order path [field1.timeStamp]. Unknown aggregation [field1]"

MongoDB - Get Names of All Keys Matching Criteria in a Collection

As the title says, I need to retrieve the names of all the keys in my MongoDB collection, BUT I need them split up based on a key/value pair that each document has. Here's my clunky analogy: If you imagine the original collection is a zoo, I need a new collection that contains all the keys Zebras have, all the keys Lions have, and all the keys Giraffes have. The different animal types share many of the same keys, but those keys are meant to be specific to each type of animal (because the user needs to be able to (for example) search for Zebras taller than 3ft and giraffes shorter than 10ft).
Here's a bit of example code that I ran which worked well - it grabbed all the unique keys in my entire collection and threw them into their own collection:
db.runCommand({
  "mapreduce": "MyZoo",
  "map": function() {
    for (var key in this) { emit(key, null); }
  },
  "reduce": function(key, stuff) { return null; },
  "out": "MyZoo" + "_keys"
})
I'd like a version of this command that would look through the MyZoo collection for animals with "type":"zebra", find all the unique keys, and place them in a new collection (MyZoo_keys) - then do the same thing for "type":"lion" & "type":"giraffe", giving each "type" its own array of keys.
Here's the collection I'm starting with:
{
  "name": "Zebra1",
  "height": "300",
  "weight": "900",
  "type": "zebra",
  "zebraSpecific1": "somevalue"
},
{
  "name": "Lion1",
  "height": "325",
  "weight": "1200",
  "type": "lion"
},
{
  "name": "Zebra2",
  "height": "500",
  "weight": "2100",
  "type": "zebra",
  "zebraSpecific2": "somevalue"
},
{
  "name": "Giraffe",
  "height": "4800",
  "weight": "2400",
  "type": "giraffe",
  "giraffeSpecific1": "somevalue",
  "giraffeSpecific2": "someothervalue"
}
And here's what I'd like the MyZoo_keys collection to look like:
{
  "zebra": [
    {
      "name": null,
      "height": null,
      "weight": null,
      "type": null,
      "zebraSpecific1": null,
      "zebraSpecific2": null
    }
  ],
  "lion": [
    {
      "name": null,
      "height": null,
      "weight": null,
      "type": null
    }
  ],
  "giraffe": [
    {
      "name": null,
      "height": null,
      "weight": null,
      "type": null,
      "giraffeSpecific1": null,
      "giraffeSpecific2": null
    }
  ]
}
That's probably imperfect JSON, but you get the idea...
Thanks!
You can modify your code to dump the results in a more readable and organized format.
The map function:
Emit the type of the animal as the key, and an array of the keys in each animal (document). Leave out the _id field.
Code:
var map = function() {
  var keys = [];
  Object.keys(this).forEach(function(k) {
    if (k != "_id") {
      keys.push(k);
    }
  })
  emit(this.type, { "keys": keys });
}
The reduce function:
For each type of animal, consolidate and return the unique keys.
Use an object (uniqueKeys) to check for duplicates; this reduces the running time at the cost of some memory, since each lookup is O(1).
Code:
var reduce = function(key, values) {
  var uniqueKeys = {};
  var result = [];
  values.forEach(function(value) {
    value.keys.forEach(function(k) {
      if (!uniqueKeys[k]) {
        uniqueKeys[k] = 1;
        result.push(k);
      }
    })
  })
  return { "keys": result };
}
Invoking Map-Reduce:
db.collection.mapReduce(map,reduce,{out:"t1"});
Aggregating the result:
db.t1.aggregate([
  { $project: { "_id": 0, "animal": "$_id", "keys": "$value.keys" } }
])
Sample output:
{
  "animal" : "lion",
  "keys" : [
    "name",
    "height",
    "weight",
    "type"
  ]
}
{
  "animal" : "zebra",
  "keys" : [
    "name",
    "height",
    "weight",
    "type",
    "zebraSpecific1",
    "zebraSpecific2"
  ]
}
{
  "animal" : "giraffe",
  "keys" : [
    "name",
    "height",
    "weight",
    "type",
    "giraffeSpecific1",
    "giraffeSpecific2"
  ]
}
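As a side note, mapReduce also accepts a query option, so restricting a run to one animal type (as originally asked) is possible without changing the map function. A sketch, reusing the zoo example:
db.MyZoo.mapReduce(
  function() { for (var key in this) { emit(key, null); } },
  function(key, stuff) { return null; },
  { "query": { "type": "zebra" }, "out": "MyZoo_zebra_keys" }
)
This produces one keys collection per type, at the cost of one map-reduce pass per type.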

MongoDB conditionally $addToSet sub-document in array by specific field

Is there a way to conditionally $addToSet based on a specific key field in a subdocument on an array?
Here's an example of what I mean - given the collection produced by the following sample bootstrap:
cls
db.so.remove();
db.so.insert({
  "Name": "fruitBowl",
  "pfms" : [
    {
      "n" : "apples"
    }
  ]
});
n defines a unique document key. I only want one entry with the same n value in the array at any one time. So I want to be able to update the pfms array using n so that I end up with just this:
{
  "Name": "fruitBowl",
  "pfms" : [
    {
      "n" : "apples",
      "mState": 1111234
    }
  ]
}
Here's where I am at the moment:
db.so.update({
  "Name": "fruitBowl",
}, {
  // not allowed to do this of course
  // "$pull": {
  //   "pfms": { n: "apples" },
  // },
  "$addToSet": {
    "pfms": {
      "$each": [
        {
          "n": "apples",
          "mState": 1111234
        }
      ]
    }
  }
})
Unfortunately, this adds another array element:
db.so.find().toArray();
[
  {
    "Name" : "fruitBowl",
    "_id" : ObjectId("53ecfef5baca2b1079b0f97c"),
    "pfms" : [
      {
        "n" : "apples"
      },
      {
        "n" : "apples",
        "mState" : 1111234
      }
    ]
  }
]
I need to effectively upsert the apples document matching on n as the unique identifier and just set mState whether or not an entry already exists. It's a shame I can't do a $pull and $addToSet in the same document (I tried).
What I really need here is dictionary semantics, but that's not an option right now, nor is breaking out the document - can anyone come up with another way?
FWIW - the existing format is a result of language/driver serialization, I didn't choose it exactly.
further
I've gotten a little further: in the case where I know the array element already exists, I can do this:
db.so.update({
  "Name": "fruitBowl",
  "pfms.n": "apples",
}, {
  $set: {
    "pfms.$.mState": 1111234,
  },
})
But of course that only works:
for a single array element
as long as I know it exists
The first limitation isn't a disaster, but if I can't effectively upsert or combine $addToSet with the previous $set (which of course I can't), then the only workarounds I can think of for now mean two DB round-trips.
The $addToSet operator of course requires that the "whole" document being "added to the set" is in fact unique, so you cannot change "part" of the document or otherwise consider it to be a "partial match".
You stumbled on to your best approach using $pull to remove any element with the "key" field that would result in "duplicates", but of course you cannot modify the same path in different update operators like that.
So the closest thing you will get is issuing separate operations but also doing that with the "Bulk Operations API" which is introduced with MongoDB 2.6. This allows both to be sent to the server at the same time for the closest thing to a "contiguous" operations list you will get:
var bulk = db.so.initializeOrderedBulkOp();
bulk.find({ "Name": "fruitBowl", "pfms.n": "apples": }).updateOne({
"$pull": { "pfms": { "n": "apples" } }
});
bulk.find({ "Name": "fruitBowl" }).updateOne({
"$push": { "pfms": { "n": "apples", "state": 1111234 } }
})
bulk.execute();
That pretty much is your best approach if it is not possible or practical to move the elements to another collection and rely on "upserts" and $set in order to have the same functionality but on a collection rather than array.
I have faced the exact same scenario: I was inserting and removing likes from a post.
What I did is use mongoose's findOneAndUpdate function (which is similar to the update or findAndModify functions in mongodb).
The key concept is
Insert when the field is not present
Delete when the field is present
The insert is
findOneAndUpdate({ _id: theId, 'likes.userId': { $ne: theUserId } },
  { $push: { likes: { userId: theUserId, createdAt: new Date() } } },
  { 'new': true }, function(err, post) { /* do the needful */ });
The delete is
findOneAndUpdate({ _id: theId, 'likes.userId': theUserId },
  { $pull: { likes: { userId: theUserId } } },
  { 'new': true }, function(err, post) { /* do the needful */ });
This makes the whole operation atomic and there are no duplicates with respect to the userId field.
I hope this helps. If you have any questions, feel free to ask.
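For illustration, a hedged sketch of how the two calls combine into a like toggle (the Post model and callback bodies are assumptions, not from the original):
// Try the conditional $push first; a null result means the user had
// already liked the post, so remove the like with $pull instead.
Post.findOneAndUpdate(
  { _id: theId, 'likes.userId': { $ne: theUserId } },
  { $push: { likes: { userId: theUserId, createdAt: new Date() } } },
  { 'new': true },
  function(err, post) {
    if (!err && post === null) {
      Post.findOneAndUpdate(
        { _id: theId, 'likes.userId': theUserId },
        { $pull: { likes: { userId: theUserId } } },
        { 'new': true },
        function(err, post) { /* like removed */ }
      );
    }
  }
);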
As far as I know, MongoDB (from v3.6) supports filtered positional updates via arrayFilters, and from v4.2 it also allows aggregation pipelines in updates.
A more or less elegant way to make it work (according to the question) with arrayFilters looks like the following:
db.runCommand({
  update: "your-collection-name",
  updates: [
    {
      q: {},
      u: {
        $set: {
          "pfms.$[elem]": {
            "n": "apples",
            "mState": NumberInt(1111234)
          }
        }
      },
      arrayFilters: [
        {
          "elem.n": {
            $eq: "apples"
          }
        }
      ],
      multi: true
    }
  ]
})
In my scenario, the data needs to be initialized when it does not exist, the field updated when it does exist, and the data is never deleted. If your data has these states, you might want to try the following method.
// Mongoose, but mostly the same as plain mongodb
// Update the tag on the user, if one already exists.
const user = await UserModel.findOneAndUpdate(
  {
    user: userId,
    'tags.name': tag_name,
  },
  {
    $set: {
      'tags.$.description': tag_description,
    },
  }
)
  .lean()
  .exec();
// Add a default tag to the user
if (user == null) {
  await UserModel.findOneAndUpdate(
    {
      user: userId,
    },
    {
      $push: {
        tags: new Tag({
          name: tag_name,
          description: tag_description,
        }),
      },
    }
  );
}
This is the cleanest and fastest method for this scenario.
As a business analyst, I had the same problem, and after hours of investigation I hopefully have a solution.
// The customer document:
{
  "id" : "1212",
  "customerCodes" : [
    {
      "code" : "I"
    },
    {
      "code" : "YK"
    }
  ]
}
// The problem: I want to insert the dateField "01.01.2016" into customer documents whose customerCodes list has a document with code "YK" but no dateField. The final document must be as follows:
{
  "id" : "1212",
  "customerCodes" : [
    {
      "code" : "I"
    },
    {
      "code" : "YK",
      "dateField" : "01.01.2016"
    }
  ]
}
// The solution is in three steps:
// PART 1 - Find the customers with customerCode "YK" but without dateField
// PART 2 - Find the index of the "YK" subdocument in the customerCodes list
// PART 3 - Insert the value into the document
// Here is the code
// PART 1
var myCursor = db.customers.find({ customerCodes: { $elemMatch: { code: "YK", dateField: { $exists: false } } } });
// PART 2
myCursor.forEach(function(customer) {
  if (customer.customerCodes != null) {
    var size = customer.customerCodes.length;
    if (size > 0) {
      var iFoundTheIndexOfSubDocument = -1;
      var index = 0;
      customer.customerCodes.forEach(function(clazz) {
        if (clazz.code == "YK" && clazz.dateField == null) {
          iFoundTheIndexOfSubDocument = index;
        }
        index++;
      })
      // PART 3
      // What happens here is: if I found the index of the
      // "YK" subdocument, I create an "updates" document which
      // corresponds to the new data to be inserted
      if (iFoundTheIndexOfSubDocument != -1) {
        var toSet = "customerCodes." + iFoundTheIndexOfSubDocument + ".dateField";
        var updates = {};
        updates[toSet] = "01.01.2016";
        db.customers.update({ "id" : customer.id }, { $set: updates });
        // For the sample document this statement is effectively:
        // db.customers.update({ "id" : "1212" }, { $set: { "customerCodes.1.dateField" : "01.01.2016" } });
      }
    }
  }
});
Have a nice day!

MongoDB different query styles with different results

I have a document:
{
  "_id": ObjectId("5324d5b30cf2df0b84436141"),
  "value": 0,
  "metaId": {
    "uuid": "8df088b2-9aa1-400a-8766-3080a6206ed1",
    "domain": "domain1"
  }
}
I have also ensured an index of this type:
ensureIndex({"metaId.uuid" : 1})
Now here come two queries:
db.test.find({"metaId" : {"uuid" : "8df088b2-9aa1-400a-8766-3080a6206ed1"}}).explain()
"cursor" : "BasicCursor"
No index used!
db.test.find({"metaId.uuid" : "8df088b2-9aa1-400a-8766-3080a6206ed1"}).explain()
"cursor" : "BtreeCursor metaId.uuid_1"
Index used!
Is there a way to make both queries use index ?
Firstly, the following document:
{
  "_id": ObjectId("5324d5b30cf2df0b84436141"),
  "value": 0,
  "metaId": {
    "uuid": "8df088b2-9aa1-400a-8766-3080a6206ed1",
    "domain": "domain1"
  }
}
would not match the query:
db.test.find({
  "metaId": {
    "uuid": "8df088b2-9aa1-400a-8766-3080a6206ed1"
  }
});
That's because it queries by the value of "metaId", which has to match the following exactly:
{
  "uuid": "8df088b2-9aa1-400a-8766-3080a6206ed1",
  "domain": "domain1"
}
In this case, you'd be using the index on "metaId".
There is a known issue on this, SERVER-2953. You can vote that up if you wish.
In the meantime you could do this instead:
{
  "value": 0,
  "metaId": [{
    "uuid": "8df088b2-9aa1-400a-8766-3080a6206ed1",
    "domain": "domain1"
  }]
}
With a slightly different query form, the index will then be selected:
db.test.find({
  "metaId": {
    "$elemMatch": {
      "uuid": "8df088b2-9aa1-400a-8766-3080a6206ed1"
    }
  }
}).explain()
Actually, that query will use the index with your current data form as well; however, it will not return any results. With the data in the array form it will return a match.
It is generally better to use an array element with a "contained" sub-document, even if it is only one. This allows for much more flexible searching, especially if you want to expand on the different field keys in the sub-document in the future.
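To illustrate end to end, a minimal sketch with the array form (values taken from the question):
db.test.insert({
  "value": 0,
  "metaId": [{
    "uuid": "8df088b2-9aa1-400a-8766-3080a6206ed1",
    "domain": "domain1"
  }]
})
db.test.ensureIndex({ "metaId.uuid": 1 })
// Both query styles now use the multikey index:
db.test.find({ "metaId.uuid": "8df088b2-9aa1-400a-8766-3080a6206ed1" }).explain()
db.test.find({ "metaId": { "$elemMatch": { "uuid": "8df088b2-9aa1-400a-8766-3080a6206ed1" } } }).explain()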

Get a list of all unique tags in mongodb

I am getting started with mongodb and have a collection with documents that look like the following:
{
  "type": 1,
  "tags": ["tag1", "tag2", "tag3"]
}
{
  "type": 2,
  "tags": ["tag2", "tag3"]
}
{
  "type": 3,
  "tags": ["tag1", "tag3"]
}
{
  "type": 1,
  "tags": ["tag1", "tag4"]
}
With this, I want a set of all the tags for a particular type. For example, for type 1, I want the set of tag1, tag2, tag3, tag4 (any order).
All I could think of is to get the tags and add them to a set in python, but I wanted to know if there is a way to do it with mongodb's mapreduce or something else. Please advise.
If you just want a (distinct) list of the tags, then using distinct is best. Map/Reduce will be slower and can't use an index for the JavaScript part.
http://docs.mongodb.org/manual/reference/method/db.collection.distinct/
db.coll.distinct("tags", { "type": 1 }) will return the set of tags for type = 1.
You are right that a Map/Reduce might work for what you are trying to accomplish, but a set in Python might be faster and require less code.
> m = function() {
... for (var tag in this.tags) {
... emit(this.tags[tag], 1);
... }
... }
> r = function(key, values) {
... return 1;
... }
> db.tags.mapReduce(m, r, {out: "tags_keys"}).find()
{ "_id" : "tag1", "value" : 1 }
{ "_id" : "tag2", "value" : 1 }
{ "_id" : "tag3", "value" : 1 }