Use distinct() on a variable in MongoDB

I'm trying several queries in MongoDB. Each document of my collection looks like this:
{
"_id" : 1,
"name" : 1,
"isReferenceProteome" : 1,
"isRepresentativeProteome" : 1,
"component" : 1,
"reference" : 1,
"upid" : 1,
"modified" : 1,
"taxonomy" : 1,
"superregnum" : 1,
"description" : 1,
"dbReference" : 1
}
The "reference" field has nested fields. One of them is "authorList", an array of objects that each have a "name" field.
"reference" : {
"authorList" : [
{"name": "author1"},
{"name": "author2"},
{"name": "author3"} ...etc...
]
}
I have stored the result of the following query in a variable:
var testing = db.mycollection.find({'reference.authorList.30': {$exists: true}})
which holds all documents whose authorList has an element at index 30, i.e. at least 31 names.
Then I wanted to call distinct() on this variable to get the distinct names of all authors:
testing.distinct("reference.authorList.name")
I tried it this way because my first query returned an empty array:
db.mycollection.distinct( "reference.authorList.name", {"reference.authorList.name.30": {$exists: true}} )
I'm also trying with the $where operator, but so far I only get a SyntaxError.
What am I missing?
Thanks.

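Use
db.mycollection.distinct( "reference.authorList.name", {"reference.authorList.30": {$exists: true}} )
instead of
db.mycollection.distinct( "reference.authorList.name", {"reference.authorList.name.30": {$exists: true}} )
The index belongs to the authorList array, not to the name field inside each element, so the second filter matches nothing. Also, find() returns a cursor, which has no distinct() method; distinct() has to be called on the collection.
Silly me...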
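To make the filter semantics concrete, here is a small in-memory sketch in plain JavaScript. It is only an illustration of what the corrected query asks for, not how MongoDB evaluates it; the function name distinctAuthorNames and the sample documents are made up for the example.

```javascript
// In-memory illustration only; this is not how MongoDB evaluates the query.
// Collects the distinct author names of every document whose authorList
// has an element at the given index.
function distinctAuthorNames(docs, index) {
  const names = new Set();
  for (const doc of docs) {
    const authors = (doc.reference && doc.reference.authorList) || [];
    // Mirrors the filter {"reference.authorList.<index>": {$exists: true}}
    if (authors.length > index) {
      for (const author of authors) {
        names.add(author.name);
      }
    }
  }
  return Array.from(names).sort();
}

// Hypothetical sample data: only the first and third documents
// have an element at index 2.
const sampleDocs = [
  { reference: { authorList: [{ name: 'a' }, { name: 'b' }, { name: 'c' }] } },
  { reference: { authorList: [{ name: 'a' }] } },
  { reference: { authorList: [{ name: 'c' }, { name: 'd' }, { name: 'e' }] } },
];
console.log(distinctAuthorNames(sampleDocs, 2));
```

With index 30 this corresponds to "at least 31 authors", which is what the corrected filter selects.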

Related

What is an efficient MongoDB query to skip the update and return an error if a value already exists in an array?

I have entries stored in my collection like this:
{
"_id" : ObjectId("5d416c595f19962ff0680dbc"),
"data" : {
"a" : 6,
"b" : [
"5c35f04c4e92b8337885d9a6"
]
},
"image" : "123.jpg",
"hyperlinks" : "google.com",
"expirydate" : ISODate("2019-08-27T06:10:35.074Z"),
"createdate" : ISODate("2019-07-31T10:24:25.311Z"),
"lastmodified" : ISODate("2019-07-31T10:24:25.311Z"),
"__v" : 0
},
{
"_id" : ObjectId("5d416c595f19962ff0680dbd"),
"data" : {
"a" : 90,
"b" : [
"5c35f04c4e92b8337885d9a7"
]
},
"image" : "456.jpg",
"hyperlinks" : "google.com",
"expirydate" : ISODate("2019-08-27T06:10:35.074Z"),
"createdate" : ISODate("2019-07-31T10:24:25.311Z"),
"lastmodified" : ISODate("2019-07-31T10:24:25.311Z"),
"__v" : 0
}
I have to write a query that pushes a user id onto the b array inside the data object and increments the a counter, which is also inside data.
For that, I wrote this code:
db.collection.updateOne({_id: ObjectId("5d416c595f19962ff0680dbd")},
{$inc: {'data.a': 1}, $push: {'data.b': '124sdff54f5s4fg5'}}
)
I also want to check whether the id already exists in the array and, if so, return a response saying that it exists. For that I wrote an extra query that performs the check and returns an error response if the id is found.
My question: can a single query do this? I don't want to write two queries for a single task.
Any help is really appreciated.
You can add one more check in the update query on "data.b". Following would be the query:
db.collection.updateOne(
{
_id: ObjectId("5d416c595f19962ff0680dbd"),
"data.b":{
$ne: "124sdff54f5s4fg5"
}
},
{
$inc: {'data.a': 1},
$push: {'data.b': '124sdff54f5s4fg5'}
}
)
For duplicate entry, you would get the following response:
{ "acknowledged" : true, "matchedCount" : 0, "modifiedCount" : 0 }
If matched count is 0, you can show the error that the id already exists.
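Alternatively, you can use the $addToSet operator, which only adds the element if it does not already exist in the array. Note that with this version the document still matches, so data.a is incremented even when the id was already present: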
db.collection.updateOne({_id: ObjectId("5d416c595f19962ff0680dbd")},
{$inc: {'data.a': 1}, $addToSet: {'data.b': '124sdff54f5s4fg5'}}
)
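The guarded update and the matchedCount check can be wrapped in two small helpers. This is only a sketch: the names buildPushIfAbsent and describeResult are made up, and the helpers just build and interpret plain objects, so the actual call to updateOne is left as a comment.

```javascript
// Builds the filter and update documents for "push only if absent".
function buildPushIfAbsent(docId, userId) {
  return {
    // The $ne condition makes the document not match if userId is already in data.b
    filter: { _id: docId, 'data.b': { $ne: userId } },
    update: { $inc: { 'data.a': 1 }, $push: { 'data.b': userId } },
  };
}

// Turns the result of updateOne into a message for the caller.
// matchedCount is 0 both when the id is already in the array and
// when no document with that _id exists, so the message covers both.
function describeResult(result) {
  if (result.matchedCount === 0) {
    return 'id already exists (or document not found)';
  }
  return 'id added';
}

// Assumed usage in the shell:
//   var op = buildPushIfAbsent(ObjectId("5d416c595f19962ff0680dbd"), "124sdff54f5s4fg5");
//   var res = db.collection.updateOne(op.filter, op.update);
//   print(describeResult(res));
```

Because the check and the write happen in one update, two concurrent requests cannot both push the same id.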

MongoDB - Update or Create an object in nested Array in Pymongo

This is my collection
{
"_id" : '50001',
"data" :
[
{
"name" : "ram",
"grade" : 'A'
},
{
"name" : "jango",
"grade" : 'B'
},
{
"name" : "remo",
"grade" : 'A'
}
]
}
Here I want to update the object with "name": "jango", or add a new entry to the array if "jango" is absent.
I can create a new entry, but I failed at "create or update".
I tried this in the mongo shell:
db.MyCollection.update({'_id': '50001', "data.name" :"jango"}, {'$set':{'data': {'data.$.grade':'A'}}}, upsert=true)
but it fails with:
not okForStorage
For a nested update you need either the position of the element or the positional $ operator. One of the following may help:
db.collectionName.update(
{'_id': '50001', "data.name" :"jango"},
{'$set':{'data.1.grade':'A'}}, {upsert: true})
or
db.collectionName.update(
{'_id': '50001', "data.name" :"jango"},
{'$set':{'data.$.grade':'A'}}, {upsert: true})
You're almost there:
db.YourCollection.update(
{ '_id':'50001', // finds the document
'data.name': 'jango' // finds the element of the array
},
{ '$set': { "data.$.grade" : 'A' } } // .$ refers to the array element matched by the first argument
)
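The positional $ operator only works when the filter matches an array element, so it cannot by itself create a missing element. One way to get "update or create" is two steps: try the positional update, and push a new element if nothing matched. The sketch below assumes a collection object with a synchronous updateOne that returns matchedCount, as in the shell; the function name setOrPushGrade is made up, and with a driver that returns promises you would need to await the calls.

```javascript
// Sets the grade of the array element with the given name,
// or pushes a new element if no such element exists.
// coll is assumed to expose a synchronous updateOne(filter, update).
function setOrPushGrade(coll, docId, name, grade) {
  // Step 1: update in place if an element with this name exists
  const updated = coll.updateOne(
    { _id: docId, 'data.name': name },
    { $set: { 'data.$.grade': grade } }
  );
  if (updated.matchedCount > 0) {
    return 'updated';
  }
  // Step 2: otherwise push a new element; the $ne guard avoids a duplicate
  // if another writer added the same name in the meantime
  const pushed = coll.updateOne(
    { _id: docId, 'data.name': { $ne: name } },
    { $push: { data: { name: name, grade: grade } } }
  );
  return pushed.matchedCount > 0 ? 'pushed' : 'document not found';
}

// Assumed usage in the shell:
//   setOrPushGrade(db.MyCollection, '50001', 'jango', 'A');
```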

MongoDB Why this error : can't append to array using string field name: comments

I have a DB structure like below:
{
"_id" : 1,
"comments" : [
{
"_id" : 2,
"content" : "xxx"
}
]
}
I add a new subdocument to the comments field, and that works:
db.test.update(
{"_id" : 1, "comments._id" : 2},
{$push : {"comments.$.comments" : {_id : 3, content:"xxx"}}}
)
After that, the document looks like this:
{
"_id" : 1,
"comments" : [
{
"_id" : 2,
"comments" : [
{
"id" : 3,
"content" : "xxx"
}
],
"content" : "xxx"
}
]
}
But when I try to add a new subdocument under the comment whose id is 3, I get an error:
db.test.update(
{"_id" : 1, "comments.comments.id" : 3},
{$push : {"comments.comments.$.comments" : {id : 4, content:"xxx"}}}
)
error message:
can't append to array using string field name: comments
Well, it makes total sense if you think about it. MongoDB has the advantage, and the disadvantage, of magically resolving certain things for you.
When you query the database for a specific regular field like this:
{ field : "value" }
the query makes total sense. Strictly speaking it wouldn't if value were part of an array, but Mongo handles that for you, so if the structure is:
{ field : ["value", "anothervalue"] }
Mongo iterates through all the elements and matches "value" against each one, and you don't have to think about it. It works perfectly, but only at one level, because with multiple levels it is impossible to guess what you want to do.
In your case the first query works because it is exactly this situation:
db.test.update(
{"_id" : 1, "comments._id" : 2},
{$push : {"comments.$.comments" : {_id : 3, content:"xxx"}}}
)
It matches _id at the first level and comments._id at the second level. The result is an array, but Mongo is able to resolve it.
But in the second case, think about what you are asking for. Let's isolate the where clause:
{"_id" : 1, "comments.comments.id" : 3},
"Give me the records from the main collection with _id:1" (one doc)
"And the comments whose inner comments have an id=3" (array * array)
The first level, comments, is resolved easily. The second is not, because comments is already an array: one more level down is an array of arrays, so Mongo gets an array of arrays as a result, and it's not possible to push a document into all the records of the array.
The solution is to narrow your where clause so that it selects a unique document in comments (it could be the first one). That is not a good solution, though, because you never know the position of the document you're looking for. Using the shell, I think the only way to be accurate is to do it in two steps. Check this query, which works (it is not the solution anyway) but "solves" the multiple-array part by fixing it to the first record:
db.test.update(
{"_id" : 1, "comments.0.comments._id" : 3},
{$push : {"comments.0.comments.$.comments" : {id : 4, content:"xxx"}}}
)
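On MongoDB 3.6 or later, filtered positional operators ($[identifier] together with the arrayFilters option) can address both array levels without hard-coding an index. The sketch below only builds the arguments; I have not run it against the data above. The helper name buildNestedCommentPush and the identifiers outer and inner are made up, and it assumes the nested comments are keyed by id, as in the stored document shown in the question.

```javascript
// Builds the arguments for an update that pushes newComment under the
// second-level comment whose id equals parentCommentId.
// Assumes MongoDB 3.6+ (arrayFilters) and nested comments keyed by "id".
function buildNestedCommentPush(docId, parentCommentId, newComment) {
  return {
    filter: { _id: docId },
    update: {
      $push: { 'comments.$[outer].comments.$[inner].comments': newComment },
    },
    options: {
      arrayFilters: [
        // outer: first-level comments that contain the target comment
        { 'outer.comments.id': parentCommentId },
        // inner: the target second-level comment itself
        { 'inner.id': parentCommentId },
      ],
    },
  };
}

// Assumed usage in the shell:
//   var op = buildNestedCommentPush(1, 3, { id: 4, content: "xxx" });
//   db.test.updateOne(op.filter, op.update, op.options);
```

Filtering the outer level as well means that first-level comments without a nested comments array are skipped rather than traversed.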

Listing, counting factors of unique Mongo DB values over all keys

I'm preparing a descriptive "schema" (quelle horreur) for a MongoDB I've been working with.
I used the excellent variety.js to create a list of all keys and show the coverage of each key. However, in cases where the values corresponding to a key come from a small set, I'd like to be able to list the entire set as "available values." In R, I'd think of these as the "factors" of a categorical variable, i.e., gender : ["M", "F"].
I know I could just use R + RMongo, query each variable, and basically do the same procedure I would to create a histogram, but I'd like to know the proper Mongo.query()/javascript/Map,Reduce way to approach this. I understand the db.collection.aggregate() functions are designed for exactly this.
Before asking this, I referenced:
http://docs.mongodb.org/manual/reference/aggregation/
http://docs.mongodb.org/manual/reference/method/db.collection.distinct/
How to query for distinct results in mongodb with python?
Get a list of all unique tags in mongodb
http://cookbook.mongodb.org/patterns/count_tags/
But can't quite get the pipeline order right. So, for example, if I have documents like these:
{_id : 1, "key1" : "value1", "key2": "value3"}
{_id : 2, "key1" : "value2", "key2": "value3"}
I'd like to return something like:
{"key1" : ["value1", "value2"]}
{"key2" : ["value3"]}
Or better, with counts:
{"key1" : {"value1" : 1, "value2" : 1}}
{"key2" : {"value3" : 2}}
I recognize one problem with doing this will be any values that have a wide range of different values---so, text fields, or continuous variables. Ideally, if there were more than x different possible values, it would be nice to truncate, say to no more than 20 unique values. If I find it's actually more, I'd query that variable directly.
Is this something like:
db.collection.aggregate(
{$limit: 20,
$group: {
_id: "$??varname",
count: {$sum: 1}
}})
First, how can I reference ??varname? for the name of each key?
I saw this link which had 95% of it:
Binning and tabulate (unique/count) in Mongo
with...
input data:
{ "_id" : 1, "age" : 22.34, "gender" : "f" }
{ "_id" : 2, "age" : 23.9, "gender" : "f" }
{ "_id" : 3, "age" : 27.4, "gender" : "f" }
{ "_id" : 4, "age" : 26.9, "gender" : "m" }
{ "_id" : 5, "age" : 26, "gender" : "m" }
This script:
db.collection.aggregate(
{$project: {gender:1}},
{$group: {
_id: "$gender",
count: {$sum: 1}
}})
Produces:
{"result" :
[
{"_id" : "m", "count" : 2},
{"_id" : "f", "count" : 3}
],
"ok" : 1
}
But what I don't understand is how could I do this generically for an unknown number/name of keys with a potentially large number of return values? This sample knows the key name is gender, and that the response set will be small (2 values).
If you already ran a script that outputs the names of all keys in the collection, you can generate your aggregation framework pipeline dynamically. What that means is either extending the variety.js type script or just writing your own.
Here is what it might look like in JS if passed an array called "keys" which has several non-"_id" named fields (I'm assuming top level fields and that you don't care about arrays, embedded documents, etc).
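var keys = ["key1", "key2"];
// Start with a single group over the whole collection
var group = { "$group" : { "_id" : null } };
// Add one $addToSet accumulator per key, e.g. key1List: { $addToSet: "$key1" }
keys.forEach(function(f) {
  group["$group"][f + "List"] = { "$addToSet" : "$" + f };
});
db.collection.aggregate(group);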
{
"result" : [
{
"_id" : null,
"key1List" : [
"value2",
"value1"
],
"key2List" : [
"value3"
]
}
],
"ok" : 1
}
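For the "with counts" variant and the cap of 20 values mentioned in the question, one option is to build a separate small pipeline per key instead of one big $group. This is a sketch under that assumption; the function name buildValueCountPipeline is made up, and it only covers top-level fields.

```javascript
// Builds a pipeline that counts how often each value of `key` occurs
// and keeps only the `maxValues` most frequent values.
function buildValueCountPipeline(key, maxValues) {
  return [
    // One output document per distinct value, with its number of occurrences
    { $group: { _id: '$' + key, count: { $sum: 1 } } },
    // Most frequent values first
    { $sort: { count: -1 } },
    // Truncate keys with many distinct values (text, continuous variables)
    { $limit: maxValues },
  ];
}

// Assumed usage in the shell, one aggregation per key:
//   ["key1", "key2"].forEach(function (k) {
//     printjson(db.collection.aggregate(buildValueCountPipeline(k, 20)).toArray());
//   });
```

Running one aggregation per key costs one pass over the collection for each key, so for many keys the single $group approach above is cheaper if only the value lists are needed.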

How to write Mongo query to find sub document with condition

I have a document in a collection like this. I need to find the record with Form_Id: 1 and Function_Id: 2. How do I write the Mongo query?
"Form_Id" : 1,
"Function" : [{
"Function_Id" : 1,
"Role" : [{
"Role_Id" : 1,
"UserId" : ["Admin", "001"]
}]
}, {
"Function_Id" : 2,
"Role" : [{
"Role_Id" : 2,
"UserId" : ["Admin", "005"]
}]
}]
You can use dot notation and the $ positional projection operator to do this:
db.test.find({Form_Id: 1, 'Function.Function_Id': 2}, {_id: 0, 'Function.$': 1})
returns:
{"Function": [{"Function_Id": 2, "Role": [{"Role_Id": 2, "UserId": ["Admin", "005"]}]}]}
Since your Function key is an array, you first have to use the $unwind operator so that each array element becomes its own document: http://docs.mongodb.org/manual/reference/aggregation/unwind/
Then you use the $match operator to find the documents that you want: http://docs.mongodb.org/manual/reference/aggregation/match/
Field names are case-sensitive, so they must be written exactly as in the document (Form_Id, Function_Id). Your query should look like this:
db.collection.aggregate([{$unwind:"$Function"},{$match:{"Form_Id":1,"Function.Function_Id":2}}])
By default mongo will display the _id of the document. If you do not want to display the _id, you can use the $project operator after matching the relevant documents: http://docs.mongodb.org/manual/reference/aggregation/project/
db.collection.aggregate([{$unwind:"$Function"},{$match:{"Form_Id":1,"Function.Function_Id":2}},{$project:{"_id":0,"Form_Id":1,"Function":1}}])
If you don't want Form_Id to be displayed, simply leave it out of the $project stage. In an inclusion projection, only the keys whose value is 1 are displayed; a key that is not mentioned is not displayed.
db.collection.aggregate([{$unwind:"$Function"},{$match:{"Form_Id":1,"Function.Function_Id":2}},{$project:{"_id":0,"Function":1}}])
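A possible refinement of the aggregation approach is to $match on the whole document before $unwind, so that only candidate documents get unwound. The helper below is a sketch that just builds the pipeline; the name buildFunctionLookupPipeline is made up, and the field names are taken from the sample document.

```javascript
// Builds a pipeline that returns only the Function element with the
// given Function_Id from documents with the given Form_Id.
function buildFunctionLookupPipeline(formId, functionId) {
  return [
    // Narrow down to candidate documents first, before unwinding
    { $match: { Form_Id: formId, 'Function.Function_Id': functionId } },
    // One document per element of the Function array
    { $unwind: '$Function' },
    // Keep only the element that was asked for
    { $match: { 'Function.Function_Id': functionId } },
    // Hide _id and return just the matching element
    { $project: { _id: 0, Function: 1 } },
  ];
}

// Assumed usage in the shell:
//   db.collection.aggregate(buildFunctionLookupPipeline(1, 2));
```

The first $match can use an index on Form_Id if one exists, which the $match after $unwind cannot.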