MongoDB: distinct tuples

Suppose I have a collection of MongoDB documents with the following structure:
{
id_str: "some_value",
text: "some_text",
some_field: "some_other_value"
}
I would like to filter such documents so as to obtain the ones with distinct text values.
I learned from the MongoDB documentation how to extract unique field values from a collection, using the distinct operation. Thus, by performing the following query:
db.myCollection.distinct("text")
I would obtain an array containing the distinct text values:
["first_distinct_text", "second_distinct_text",...]
However, this is not the result that I would like to obtain. Instead, I would like to have the following:
{ "id_str": "a_sample_of_id_having_first_distinct_text",
"text": "first_distinct_text"}
{ "id_str": "a_sample_of_id_having_second_distinct_text",
"text": "second_distinct_text"}
I am not sure if this can be done with a single query.
I found a similar question which, however, does not fully solve my problem.
Do you have any hint on how to solve this problem?
Thanks.

You should look into an aggregate query using the $group stage, probably together with the $first operator.
Maybe something along the lines of:
db.myCollection.aggregate([
  { $group: {
      _id: { text: "$text" },
      id_str: { $first: "$id_str" }
  } }
])
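To make the idea of grouping by text and keeping the first id_str concrete, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB; the sample documents and the helper name firstIdPerText are invented for illustration. Note that in MongoDB itself, which document counts as "first" in a group is only well defined if a $sort stage precedes $group.

```javascript
// In-memory sketch of grouping by text and keeping the first id_str seen.
// The sample documents are invented for illustration.
const docs = [
  { id_str: "1", text: "hello", some_field: "a" },
  { id_str: "2", text: "world", some_field: "b" },
  { id_str: "3", text: "hello", some_field: "c" },
];

// Hypothetical helper: returns one { id_str, text } pair per distinct text.
function firstIdPerText(documents) {
  const groups = new Map(); // text -> first id_str seen
  for (const doc of documents) {
    if (!groups.has(doc.text)) {
      groups.set(doc.text, doc.id_str);
    }
  }
  return Array.from(groups, ([text, id_str]) => ({ id_str, text }));
}

const distinctByText = firstIdPerText(docs);
console.log(distinctByText);
```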

Try:
db.myCollection.aggregate({$group: {_id: {'text': "$text", 'id_str': '$id_str'}}})
More information here: http://docs.mongodb.org/manual/reference/method/db.collection.aggregate/

Related

Push values in Mongo Nested Array

Let's say we have many documents like the one in the photo.
I have the above schema. I want to find a document by its _id first, and then push an array of values to the providedServices array of the entry in the barbers array that has a given _id.
Any help would be appreciated. I can't seem to figure this out!

You first need to match the relevant array element. For this, you can use $elemMatch, or use dot notation and write it as 'barbers._id': parameter.
Here we find the document by filtering on both its own _id and the barber's _id. You can change the filter as you wish; it can also search on the barber's _id only.
Replace DocumentName, idValue, barbersId, and serviceModel with your own model name and parameters.
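const result = await DocumentName.findOneAndUpdate(
  {
    $and: [
      // match the parent document by its own _id
      { _id: new mongoose.Types.ObjectId(idValue) },
      // match the barber inside the barbers array
      { 'barbers': { $elemMatch: { _id: new mongoose.Types.ObjectId(barbersId) } } }
    ]
  },
  // the positional $ refers to the barbers element matched above
  { $push: { 'barbers.$.providedServices': serviceModel } },
  // return the updated document
  { new: true })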
First we match the relevant element of the barbers array, then we push the model into that element's providedServices array. The positional $ operator in the update refers to the first barbers element matched by the filter.
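Since the question asks about pushing an array of values, a small sketch of the filter and update documents may help. The helper name buildPushServices and the plain-string ids are assumptions made for illustration (real code would pass ObjectId values); $push with the $each modifier and the positional $ operator are standard MongoDB update operators.

```javascript
// Hypothetical helper: builds the filter and update documents for pushing
// several services at once into one barber's providedServices array.
function buildPushServices(docId, barberId, services) {
  return {
    // "barbers._id" in the filter is what the positional $ resolves against
    filter: { _id: docId, "barbers._id": barberId },
    update: {
      // $each pushes every element of the array instead of the array itself
      $push: { "barbers.$.providedServices": { $each: services } },
    },
  };
}

// Plain strings stand in for ObjectId values in this sketch.
const { filter, update } = buildPushServices("doc1", "barber7", ["cut", "shave"]);
console.log(JSON.stringify(filter), JSON.stringify(update));
```

The two resulting objects could then be passed as the first two arguments of findOneAndUpdate. Without $each, the whole array would be appended as a single nested element.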

How to find in MongoDB by last 4 chars in ObjectID?

I don't want to expose the full ObjectId to the client. Instead, I want to show only the last 4 characters of the actual ObjectId of an entity in the collection.
For example: ObjectId("5fcca5d997239a74da0d67a9") will become just 67a9
So it will be much easier to "talk" about documents using short ids instead of the full ObjectId.
Then I need to find the document in the DB using only the 67a9.
Is this possible and how?

According to this issue in Jira the resolution is "Won't fix".
An ObjectId is not a string but a separate type, so $regex cannot be applied to it.
Check this example, where $regex works fine when _id is a string but not when it is an ObjectId.
So one possible option is to duplicate every _id in another field, called id or whatever you like, that holds the id in string format.
Then you can run this query against that string field:
db.collection.find({
  "id": {
    "$regex": "67a9$"
  }
})
Example here, where I've added more _id fields that do not match the pattern.

As pointed out, regex won't work on an ObjectId. But there is an easy workaround. Just use aggregation to first convert your ObjectId into a string and then match it.
db.collection.aggregate([
{
$addFields: {
tempId: { $toString: '$_id' },
}
},
{
$match: {
tempId: { $regex: "67a9$" }
}
}
])
Obviously this is not a great solution for very large collections, since every _id has to be converted to a string and the match cannot use an index.
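The end-of-string anchor matters when matching the last 4 characters. The sketch below runs entirely in memory on hex strings standing in for stringified ObjectIds; the second and third sample values and the helper name suffixPattern are invented for illustration.

```javascript
// Sample hex strings stand in for stringified ObjectIds. The first value is
// the one from the question; the other two are invented.
const ids = [
  "5fcca5d997239a74da0d67a9",
  "5fcc67a997239a74da0d1234", // contains 67a9, but not at the end
  "5fcca5d997239a74da0d67aa",
];

// Hypothetical helper: validates the short id and builds an anchored pattern.
function suffixPattern(shortId) {
  if (!/^[0-9a-f]{4}$/.test(shortId)) {
    throw new Error("short id must be 4 lowercase hex characters");
  }
  return new RegExp(shortId + "$");
}

// Anchored: only ids that END with 67a9.
const anchored = ids.filter((id) => suffixPattern("67a9").test(id));
// Unanchored: ids that contain 67a9 anywhere.
const unanchored = ids.filter((id) => /67a9/.test(id));
console.log(anchored.length, unanchored.length);
```

Keep in mind that four hex characters cover only 65,536 combinations, so several documents may share the same suffix and the lookup can return more than one result.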

Meteor collection get last document of each selection

Currently I use the following find query to get the latest document of a certain ID
Conditions.find({
caveId: caveId
},
{
sort: {diveDate:-1},
limit: 1,
fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});
How can I do the same for multiple ids, using $in for example?
I tried it with the following query. The problem is that it will limit the documents to 1 for all the found caveIds. But it should set the limit for each different caveId.
Conditions.find({
caveId: {$in: caveIds}
},
{
sort: {diveDate:-1},
limit: 1,
fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});
One solution I came up with is using the aggregate functionality.
var conditionIds = Conditions.aggregate(
  [
    {"$match": { caveId: {"$in": caveIds}}},
    // sort first so that $last picks the most recent dive in each group
    {"$sort": { diveDate: 1 }},
    {
      $group:
      {
        _id: "$caveId",
        conditionId: {$last: "$_id"},
        diveDate: { $last: "$diveDate" }
      }
    }
  ]
).map(function(child) { return child.conditionId});
var conditions = Conditions.find({
_id: {$in: conditionIds}
},
{
fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});

You don't want to use $in here as noted. You could solve this problem by looping through the caveIds and running the query on each caveId individually.
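Whichever approach is used, the target result is one latest document per caveId. A minimal in-memory sketch of that result, with invented sample data and a hypothetical helper named latestPerCave, could look like this (dates are ISO strings here so that plain string comparison orders them chronologically):

```javascript
// Invented sample data; diveDate is an ISO date string so that plain string
// comparison orders it chronologically.
const conditions = [
  { _id: "c1", caveId: "A", diveDate: "2015-01-10" },
  { _id: "c2", caveId: "B", diveDate: "2015-02-01" },
  { _id: "c3", caveId: "A", diveDate: "2015-03-05" },
  { _id: "c4", caveId: "C", diveDate: "2015-01-20" },
];

// Hypothetical helper: keeps the most recent condition for each requested cave.
function latestPerCave(docs, caveIds) {
  const latest = new Map(); // caveId -> most recent condition seen so far
  for (const doc of docs) {
    if (!caveIds.includes(doc.caveId)) continue;
    const current = latest.get(doc.caveId);
    if (!current || doc.diveDate > current.diveDate) {
      latest.set(doc.caveId, doc);
    }
  }
  return Array.from(latest.values());
}

const latestIds = latestPerCave(conditions, ["A", "B"]).map((d) => d._id);
console.log(latestIds);
```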

You're basically looking at a join query here: you need all caveIds and then have to look up the last dive for each.
In my opinion (and this is only an opinion!), this is a problem of database schema/denormalization:
You could, as mentioned here, look up all caveIds and then run the single query for each one, every time you need to look up the last dives.
However, I think you are much better off recording/updating the last dive inside your cave document, and then looking up all caveIds of interest while pulling only the lastDive field.
That gives you what you need immediately, rather than going through expensive search/sort queries. This comes at the expense of maintaining that field in the document, but it sounds like it should be fairly trivial, as you only need to update the one field when a new event occurs.

How to ignore a certain field during search in MongoDB

Suppose I have a MongoDB record like the one below:
{
  name:"name",
  streams: [
    {user:"user0", name:"name0", locked:true},
    {user:"user1", name:"name1", locked:true},
    {user:"user2", name:"name2", locked:false}
  ]
}
I want to find all records that have an entry with user0 and name0 in the streams field, but I don't care about the locked field.
find({streams:{user:"user0", name:"name0"}}) doesn't work, since the locked field is not specified.
Thank You,
Gary

You are looking for the $elemMatch operator, which selects documents where at least one sub-document in the array matches all of your conditions, regardless of its other fields:
db.collection.find({
"streams": { "$elemMatch": { "user": "user0", "name": "name0"} }
})
Take some time to go through the Query Operators in the manual. There are lots of useful operations there.
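The difference between matching a whole sub-document and matching with $elemMatch can be illustrated in plain JavaScript. This is only an in-memory approximation of the two behaviours, with hypothetical helper names hasExactElement and hasElemMatch; it does not query MongoDB.

```javascript
const record = {
  name: "name",
  streams: [
    { user: "user0", name: "name0", locked: true },
    { user: "user1", name: "name1", locked: true },
    { user: "user2", name: "name2", locked: false },
  ],
};

// Exact-match approximation: an element must equal the wanted document,
// field for field (so a missing locked field makes it fail).
function hasExactElement(array, wanted) {
  return array.some((el) => JSON.stringify(el) === JSON.stringify(wanted));
}

// $elemMatch approximation: a single element must satisfy every listed
// condition; fields that are not listed are ignored.
function hasElemMatch(array, conditions) {
  return array.some((el) =>
    Object.entries(conditions).every(([key, value]) => el[key] === value)
  );
}

const wanted = { user: "user0", name: "name0" };
console.log(hasExactElement(record.streams, wanted)); // exact match fails
console.log(hasElemMatch(record.streams, wanted)); // elemMatch-style succeeds
```

Note that all conditions must hold for the same element: user0 combined with name1 would not match, because those values live in different array entries.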

How can I get all the doc ids in MongoDB?

How can I get an array of all the doc ids in MongoDB? I only need a set of ids but not the doc contents.

You can do this in the Mongo shell by calling map on the cursor like this:
var a = db.c.find({}, {_id:1}).map(function(item){ return item._id; })
The result is that a is an array of just the _id values.
The way it works in Node is similar.
(This is MongoDB Node driver v2.2, and Node v6.7.0)
db.collection('...')
.find(...)
.project( {_id: 1} )
.map(x => x._id)
.toArray();
Remember to put map before toArray: this map is NOT the JavaScript Array map function but the one provided by the MongoDB cursor, and it is applied to each document as the cursor is consumed, before toArray builds the result array.

One way is to simply use the runCommand API (in this example the collection itself is named distinct).
db.runCommand ( { distinct: "distinct", key: "_id" } )
which gives you something like this:
{
"values" : [
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
],
"stats" : {
"n" : 7,
"nscanned" : 7,
"nscannedObjects" : 0,
"timems" : 2,
"cursor" : "DistinctCursor"
},
"ok" : 1
}
However, there's an even nicer way using the actual distinct API:
var ids = db.distinct.distinct('_id', {}, {});
which just gives you an array of ids:
[
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
]
Not sure about the first version, but the latter is definitely supported in the Node.js driver (which I saw you mention you wanted to use). That would look something like this:
db.collection('c').distinct('_id', {}, {}, function (err, result) {
// result is your array of ids
})

I was also wondering how to do this with the MongoDB Node.js driver, like @user2793120. Someone else said he should iterate through the results with .each, which seemed highly inefficient to me. I used MongoDB's aggregation instead:
myCollection.aggregate([
{$match: {ANY SEARCHING CRITERIA FOLLOWING $match'S RULES} },
{$sort: {ANY SORTING CRITERIA, FOLLOWING $sort'S RULES}},
{$group: {_id:null, ids: {$addToSet: "$_id"}}}
]).exec()
The sorting phase is optional. The match one as well if you want all the collection's _ids. If you console.log the result, you'd see something like:
[ { _id: null, ids: [ '56e05a832f3caaf218b57a90', '56e05a832f3caaf218b57a91', '56e05a832f3caaf218b57a92' ] } ]
Then just use the contents of result[0].ids somewhere else.
The key part here is the $group section. You must specify _id, which is a mandatory field of $group; setting it to null puts all documents into a single group. The stage then creates a new array field with all the _ids. If you don't mind having duplicated ids (depending on the search criteria used in the $match phase, and assuming you are collecting a field other than _id that holds another document's _id), you can use $push instead of $addToSet.
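The difference between the two accumulators can be sketched in memory as follows. The sample values are invented, and a JavaScript Set is used only as an approximation: MongoDB does not guarantee the order of elements produced by $addToSet.

```javascript
// Invented sample: ids collected from a field that may repeat across documents.
const collected = ["id1", "id2", "id1", "id3"];

// $push-like: keeps every value, duplicates included.
const pushed = collected.slice();

// $addToSet-like: keeps each distinct value once.
const addedToSet = Array.from(new Set(collected));

console.log(pushed.length, addedToSet.length);
```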

Another way to do this on mongo console could be:
var arr=[]
db.c.find({},{_id:1}).forEach(function(doc){arr.push(doc._id)})
printjson(arr)
Hope that helps!!!
Thanks!!!

I struggled with this for a long time, and I'm answering this because I've got an important hint. It seemed obvious that:
db.c.find({},{_id:1});
would be the answer.
It worked, sort of. It would find the first 101 documents and then the application would pause. I didn't let it keep going. This was both in Java using MongoOperations and also on the Mongo command line.
I looked at the mongo logs and saw it's doing a colscan, on a big collection of big documents. I thought, crazy, I'm projecting the _id which is always indexed so why would it attempt a colscan?
I have no idea why it would do that, but the solution is simple:
db.c.find({},{_id:1}).hint({_id:1});
or in Java:
query.withHint("{_id:1}");
Then it was able to proceed along as normal, using stream style:
createStreamFromIterator(mongoOperations.stream(query, MortgageDocument.class)).
map(MortgageDocument::getId).forEach(transformer);
Mongo can do some good things and it can also get stuck in really confusing ways. At least that's my experience so far.

Try an aggregation pipeline, like this:
db.collection.aggregate([
{ $match: { deletedAt: null }},
{ $group: { _id: "$_id"}}
])
This is going to return an array of documents with this structure:
_id: ObjectId("5fc98977fda32e3458c97edd")

I had a similar requirement: getting the ids for a collection with 50+ million documents. I tried many ways. The fastest way to get the ids turned out to be running mongoexport with just the ids.

One of the above examples worked for me, with a minor tweak. I left out the second object, since I was using it with my Mongoose schema.
// awaiting the query resolves to the array of ids, so no callback is needed
const idArray = await Model.distinct('_id', {});