MongoDB unique value [duplicate] - mongodb

I am a complete beginner in MongoDB. I want to do the MongoDB equivalent of SQL's SELECT DISTINCT on a column. For example, for a MongoDB collection with the following schema:
{
    "_id" : ObjectId("xxxxxxcbf"),
    "date" : "1462514997209",
    "user_ids" : [
        "userXXXX"
    ],
    "created" : 1462543716,
    "processed" : 1462543716
}
How do I find the unique user_ids in the collection?

You need to use distinct(), like this (assuming your collection is named collection1). Because user_ids is an array, distinct() treats each array element as a separate value:
db.collection1.distinct( "user_ids" )
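If more control is needed later (for example, counting or sorting the values), roughly the same result can be produced with the aggregation framework. The following is only a sketch: the helper name distinctUserIds is made up for illustration, collection1 is the collection name assumed in the answer, and the helper expects a Node.js driver collection object.

```javascript
// Pipeline that roughly mirrors db.collection1.distinct("user_ids"):
// $unwind emits one document per array element, $group collapses duplicates.
const distinctUserIdsPipeline = [
  { $unwind: '$user_ids' },
  { $group: { _id: '$user_ids' } },
];

// Hypothetical helper (not part of any driver): runs the pipeline on a
// Node.js driver collection and returns the plain list of values.
async function distinctUserIds(collection) {
  const rows = await collection.aggregate(distinctUserIdsPipeline).toArray();
  return rows.map((row) => row._id);
}
```

In the mongo shell the same pipeline can be passed directly, as in db.collection1.aggregate(distinctUserIdsPipeline).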

Related

Find by another field in mongoDB [duplicate]

How can I compare one field against another field in MongoDB, like this in SQL:
SELECT `name`,`surname` FROM `users` WHERE `name`=`surname`
So far I have tried:
Credentials.findOne({ usersLen: { $lte: '$usersMaxLen' } });
^^^ - here I want to access the document's usersMaxLen field
but I get this error:
CastError: Cast to number failed for value "$usersMaxLen" (type string) at path "usersLen" for model "Credentials"
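No answer is recorded here. The CastError arises because, in an ordinary query, '$usersMaxLen' is a literal string rather than a field reference, so Mongoose tries to cast it to a number. One common way to compare two fields of the same document is the $expr operator (available since MongoDB 3.6). The sketch below assumes Credentials is the Mongoose model from the question; the names lenWithinMaxFilter and findWithinMax are made up for illustration.

```javascript
// Filter matching documents where usersLen <= usersMaxLen.
// Inside $expr the aggregation form of $lte is used, and '$name' strings
// are read as references to fields of the document being tested.
const lenWithinMaxFilter = {
  $expr: { $lte: ['$usersLen', '$usersMaxLen'] },
};

// Hypothetical helper: Model is expected to be a Mongoose model such as
// the Credentials model from the question.
function findWithinMax(Model) {
  return Model.findOne(lenWithinMaxFilter);
}
```

Usage would then look like findWithinMax(Credentials), which returns whatever Credentials.findOne returns.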

Would like some input on the best indexing for a MongoDB collection for the case below [closed]

I have a collection of objects, each with 4 ids, something like:
{
    location_id : ObjectId,
    project_id : ObjectId,
    department_id : ObjectId,
    element_id : ObjectId
}
I have a specialized endpoint in my data service that takes an array of these objects and must return all objects in the collection whose ids match, where a null in the database always counts as a match.
This is not a difficult query in my book, but what I'm most concerned about is performance. If I simply index all 4 ids, is Mongo smart enough to handle a query on all 4 efficiently? There could be hundreds of millions of records in this collection and hundreds of objects in the passed-in set to query with. Is there a more efficient way using secondary indexes, or is indexing all 4 enough for Mongo's engine to work it out?
Indexing all 4 should be enough, but you have a few options depending on the types of queries your system typically performs.
If your queries always filter on all 4 fields at once, then a single index covering all 4 fields (a compound index) will be enough.
If you also query on combinations of the fields, you can add extra indexes tailored to each query, like:
db.collection.find({ location_id : ObjectId, project_id : ObjectId });
//index: { location_id : 1, project_id : 1 }
db.collection.find({ location_id : ObjectId, department_id : ObjectId });
//index: { location_id : 1, department_id : 1 }
db.collection.find({ project_id : ObjectId, element_id : ObjectId });
//index: { project_id : 1, element_id : 1 }
...
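The indexes named in the comments above could be created roughly as follows. This is a sketch only: ensureIndexes is a made-up helper that expects a Node.js driver collection, and the list contains just the three combinations shown in the answer. Note that a compound index also serves queries on a leading prefix of its fields, so some of these may be redundant depending on how the 4-field index is ordered.

```javascript
// Index specs taken from the comments in the answer.
const indexSpecs = [
  { location_id: 1, project_id: 1 },
  { location_id: 1, department_id: 1 },
  { project_id: 1, element_id: 1 },
];

// Hypothetical helper: creates each index on a Node.js driver collection
// and returns the index names reported by the driver.
async function ensureIndexes(collection) {
  const names = [];
  for (const spec of indexSpecs) {
    names.push(await collection.createIndex(spec));
  }
  return names;
}
```

In the mongo shell the equivalent call is db.collection.createIndex({ location_id : 1, project_id : 1 }), repeated for each spec.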
Under the hood, MongoDB's query planner tries the candidate indexes for a query, picks the plan that performs best, and caches it for later queries of the same shape.
What I usually do is turn on the MongoDB database profiler to log queries slower than 100 ms, and then create the necessary indexes for each case.
To activate the profiler, run this in the mongo shell:
db.setProfilingLevel(
    1, // 0: profiler off; 1: log only operations slower than slowms; 2: log all operations
    { slowms: 100 } // threshold of 100 ms
);
Then check the slowest queries:
db.system.profile.find().sort({millis:-1}).limit(10).pretty();
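For the matching rule in the question (a null stored in the database matches any requested id), the batch lookup could be expressed as a single filter along these lines. This is a sketch under assumptions: buildMatchFilter and ID_FIELDS are made-up names, and every incoming object is assumed to carry all four ids. How it performs on hundreds of millions of documents would still need to be checked with explain() and the profiler.

```javascript
const ID_FIELDS = ['location_id', 'project_id', 'department_id', 'element_id'];

// Builds one filter for a whole batch of lookup objects.
// For each field, { $in: [id, null] } matches the requested id, a stored
// null, or a missing field, so nulls in the database always match.
function buildMatchFilter(items) {
  if (items.length === 0) {
    // MongoDB rejects an empty $or array, so fail early instead.
    throw new Error('buildMatchFilter needs at least one item');
  }
  const clauses = items.map((item) => {
    const clause = {};
    for (const field of ID_FIELDS) {
      clause[field] = { $in: [item[field], null] };
    }
    return clause;
  });
  return { $or: clauses };
}
```

The result can be passed to find(), for example db.collection.find(buildMatchFilter(items)).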

How to find the document having an array element? [duplicate]

I have a MongoDB document as follows:
{
    "_id" : ObjectId("5c29f3123d8cf714fd9cdb87"),
    "Machine" : "host1",
    "Pools" : [
        "Pool1",
        "Pool2"
    ]
}
How do I find all documents in my collection that have Pool1 in the "Pools" array?
I tried the following, but it doesn't seem correct.
db.Resources.find({Pools: {$elemMatch: { "$in", ['Pool1']}}}).pretty()
There are different ways to get what you want.
Find all documents whose Pools array contains Pool1:
db.Resources.find({Pools: 'Pool1'}).pretty()
Find all documents whose Pools array contains all of the listed elements; the order does not matter:
db.Resources.find({Pools: {$all: ['Pool1', ...]}}).pretty()
To read more on querying arrays, see this mongodb post
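The filter shapes discussed above can be collected into small builders, together with an $in variant for the "any of these values" case that the question's attempt seems to be reaching for. The function names are made up for illustration; only the Pools field and the Resources collection come from the question.

```javascript
// Matches documents whose Pools array contains the given value.
function poolsContains(value) {
  return { Pools: value };
}

// Matches documents whose Pools array contains at least one of the values.
function poolsContainsAny(values) {
  return { Pools: { $in: values } };
}

// Matches documents whose Pools array contains all of the values, in any order.
function poolsContainsAll(values) {
  return { Pools: { $all: values } };
}
```

Each result can be passed to find(), for example db.Resources.find(poolsContainsAny(['Pool1', 'Pool2'])).pretty().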

Update a mongo collection of 10 billion documents. Without contaminating the data.

I have a Mongo collection of 10 billion documents, some of which contain incorrect information and require updating. The documents look like this:
{
    "_id" : ObjectId("5567c71e2cdc06be25dbf7a0"),
    "index1" : "stringIndex1",
    "field" : "stringField",
    "index2" : "stringIndex2",
    "value" : 100,
    "unique_id" : "b21fc73e-55a0-4e15-8db0-fa94e4ebcc0b",
    "t" : ISODate("2015-05-29T01:55:39.092Z")
}
I want to update the value field for all documents that match criteria on index1, index2 and field. I want to do this across many combinations of the 3 fields.
In an ideal world we could create a second collection and compare the two before replacing the original, to guarantee that we haven't lost any documents. But the size of the collection means that this isn't possible. Any suggestions for how to update this large amount of data without risking damage to it?
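No answer is recorded here. Purely as an illustration of what an in-place update per combination might look like, the sketch below turns a list of corrections into updateMany operations and sends them in batches with bulkWrite. Everything about the input is an assumption: the corrections array, its shape, the helper names, and the batch size are made up, and the approach has not been validated at this scale.

```javascript
// Hypothetical input: one entry per (index1, index2, field) combination,
// carrying the corrected value. Each entry becomes one updateMany operation.
function buildCorrectionOps(corrections) {
  return corrections.map(({ index1, index2, field, value }) => ({
    updateMany: {
      filter: { index1, index2, field },
      update: { $set: { value } },
    },
  }));
}

// Sends the operations to a Node.js driver collection in batches.
// $set without upsert only changes the value field of matching documents;
// it does not insert or delete documents.
async function applyCorrections(collection, corrections, batchSize = 500) {
  let modified = 0;
  for (let i = 0; i < corrections.length; i += batchSize) {
    const ops = buildCorrectionOps(corrections.slice(i, i + batchSize));
    const result = await collection.bulkWrite(ops, { ordered: false });
    modified += result.modifiedCount;
  }
  return modified;
}
```

Comparing the returned count with a countDocuments() run on the same filters beforehand would be one way to check that each batch touched the expected number of documents.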

mongodb: How to use an index for distinct command and query?

I have some problems with very slow distinct commands that use a query.
From what I have observed, the distinct command only makes use of an index if you do not specify a query.
I have created a test database on my MongoDB 3.0.10 server with 1 million objects. Each object looks as follows:
{
    "_id" : ObjectId("56e7fb5303858265f53c0ea1"),
    "field1" : "field1_6",
    "field2" : "field2_10",
    "field3" : "field3_29",
    "field4" : "field4_64"
}
The numbers at the end of the field values are random 0-99.
Two simple indexes and one compound index have been created on the collection:
{ "field1" : 1 } // simple index on "field1"
{ "field2" : 1 } // simple index on "field2"
{ // compound index on all fields
    "field2" : 1,
    "field1" : 1,
    "field3" : 1,
    "field4" : 1
}
Now I execute distinct commands on that collection:
db.runCommand({ distinct: 'dbtest', key: 'field1' })
The result contains 100 values, nscanned=100, and the index on "field1" was used.
Now the same distinct command is restricted by a query:
db.runCommand({ distinct: 'dbtest', key: 'field1', query: { field2: "field2_10" } })
The result again contains 100 values, but nscanned=9991 and the index used is the third one, on all fields.
Now the third index, which was used in the last command, is dropped, and the last command is executed again:
db.runCommand({ distinct: 'dbtest', key: 'field1', query: { field2: "field2_10" } })
The result again contains 100 values, nscanned=9991, and the index used is the one on "field2".
Conclusion: If I execute a distinct command without a query, the result is taken directly from an index. However, when I combine a distinct command with a query, only the query uses an index; the distinct command itself does not.
My problem is that I need to perform a distinct command with a query on a very large database. The result set of the query is very large but contains only ~100 distinct values. Therefore the complete distinct command takes ages (> 5 minutes) because it has to cycle through all values.
What needs to be done so that the distinct command presented above can be answered by the database directly from an index?
An index is used automatically for distinct commands if your MongoDB version supports it.
Using an index for a distinct command that includes a query requires MongoDB 3.4 or higher; it works with both storage engines, MMAPv1 and WiredTiger.
See also the bug ticket https://jira.mongodb.org/browse/SERVER-19507
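To check whether a given server actually answers the distinct from an index, the command can be wrapped in explain and the winning plan searched for a DISTINCT_SCAN stage, which is the stage name explain reports for an index-backed distinct. The sketch below is an assumption-laden illustration: both helper names are made up, and whether the planner chooses DISTINCT_SCAN depends on the server version and the available indexes.

```javascript
// Builds an explain command document for a distinct that includes a query.
function explainDistinctCommand(collectionName, key, query) {
  return {
    explain: { distinct: collectionName, key, query },
    verbosity: 'queryPlanner',
  };
}

// Recursively searches a plan tree from explain output for a stage name.
// Pass queryPlanner.winningPlan so that rejected plans are not counted.
function planHasStage(node, stageName) {
  if (node === null || typeof node !== 'object') return false;
  if (node.stage === stageName) return true;
  return Object.values(node).some((child) => planHasStage(child, stageName));
}
```

In the mongo shell this could be used as planHasStage(db.runCommand(explainDistinctCommand('dbtest', 'field1', { field2: 'field2_10' })).queryPlanner.winningPlan, 'DISTINCT_SCAN').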