How to count the number of documents in a MongoDB collection

I would like to know how to count the number of documents in a collection. I tried the following:
var value = collection.count();
and
var value = collection.find().count()
and
var value = collection.find().dataSize()
but I always get "method undefined".
Which method should I use to find the total number of documents in a collection?

Switch to the database where your collection resides using the command:
use databasename;
Then invoke the count() function on the collection in the database:
var value = db.collection.count();
Then print(value), or simply value, gives you the count of documents in the collection named collection.
Refer: http://docs.mongodb.org/v2.2/tutorial/getting-started-with-the-mongo-shell/

If you want the number of documents in a collection, use the count method, which returns a Promise. Here's an example:
let coll = db.collection('collection_name');
coll.count().then((count) => {
    console.log(count);
});
This assumes you're using Mongo 3.
Edit: In version 4.0.3, count is deprecated. Use countDocuments instead.

Since v4.0.3 you can use countDocuments(), which returns an accurate count based on a query rather than on collection metadata:
db.collection.countDocuments()

Run each of the following mongo shell commands as a single line.
To get the number of collections in the current database (the loop below counts collections, not documents):
var collectionCount = 0; db.getCollectionNames().forEach(function(collection) { collectionCount++; }); print("Available collections count: " + collectionCount);
Output: Available collections count: 9
To get the document count of every collection
db.getCollectionNames().forEach(function(collection) { var resultCount = db[collection].count(); print("Results count for " + collection + ": " + resultCount); });
Output:
Results count for ADDRESS: 250
Results count for APPLICATION_DEVELOPER: 950
Results count for COUNTRY: 10
Results count for DATABASE_DEVELOPER: 1
Results count for EMPLOYEE: 4500
Results count for FULL_STACK_DEVELOPER: 2000
Results count for PHONE_NUMBER: 110
Results count for STATE: 0
Results count for QA_DEVELOPER: 100
To get the dataSize of every collection
db.getCollectionNames().forEach(function(collection) { var size = db[collection].dataSize(); print("dataSize for " + collection + ": " + size); });
Output:
dataSize for ADDRESS: 250
dataSize for APPLICATION_DEVELOPER: 950
dataSize for COUNTRY: 10
dataSize for DATABASE_DEVELOPER: 1
dataSize for EMPLOYEE: 4500
dataSize for FULL_STACK_DEVELOPER: 2000
dataSize for PHONE_NUMBER: 110
dataSize for STATE: 0
dataSize for QA_DEVELOPER: 100
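The per-collection loop above can also be sketched from Python. This is a minimal sketch, assuming db is a pymongo Database handle; the function name count_per_collection and the database name in the usage comment are made up for illustration.

```python
def count_per_collection(db):
    """Return {collection_name: document_count} for every collection in db.

    `db` is assumed to be a pymongo Database handle.
    """
    counts = {}
    for name in db.list_collection_names():
        # An empty filter matches every document in the collection.
        counts[name] = db[name].count_documents({})
    return counts


# Hypothetical usage against a local server:
#   from pymongo import MongoClient
#   db = MongoClient().my_database
#   for name, n in count_per_collection(db).items():
#       print(f"Results count for {name}: {n}")
```

Returning a dict rather than printing keeps the counting separate from the output format.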

You can simply use:
Model.count({email: 'xyz#gmail.com'}, function (err, count) {
    console.log(count);
});

db.collection('collection-name').count() is now deprecated.
Instead of count(), we should now use countDocuments() and estimatedDocumentCount().
Here is an example:
db.collection('collection-name').countDocuments().then((docs) => {
    console.log('Number of documents:', docs);
});
To know more read the documentation:
https://docs.mongodb.com/manual/reference/method/db.collection.countDocuments/
https://docs.mongodb.com/manual/reference/method/db.collection.estimatedDocumentCount/
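For Python users, the same choice between the two replacement methods might look like the sketch below. It assumes collection is a pymongo Collection; the helper name count_documents_in and its exact flag are hypothetical.

```python
def count_documents_in(collection, query=None, exact=True):
    """Count documents in a pymongo collection.

    With a query, or with exact=True, the count is computed from the
    matching documents. With no query and exact=False, the faster
    metadata-based estimate is used instead.
    """
    if query:
        # estimated_document_count() takes no filter, so a query
        # always goes through count_documents().
        return collection.count_documents(query)
    if exact:
        return collection.count_documents({})
    # Metadata-based: fast, but may be approximate.
    return collection.estimated_document_count()
```

The estimate is usually good enough for dashboards; use the exact count when the number feeds into logic.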

You can see the number of documents in a collection through the MongoDB console.
1. Open the MongoDB console and issue the command "use databasename".
To start the console, go to the bin folder of your MongoDB installation and run mongo.exe.
E.g. if the database is myDB, the command is "use myDB".
2. Execute the command db.collection.count()
A collection is like a table in an RDBMS.
E.g. if your collection name is myCollection, the command is db.myCollection.count();
This command prints the number of documents in the collection to the console.

db.collection.count(function(err, countData) {
    // countData holds the number of documents in the collection
});
Replace collection with the name of your MongoDB collection.

Related

MongoDB atlas trigger find return 50k documents while the count is 63k

I am using a trigger in Atlas. When I fetch records, it returns only 50k records while the total is 63k. Is there a limit on data fetching in an Atlas trigger?
const docs = await collection1.find().sort({ "_id" : -1}).toArray();
const count = await collection1.count({});
console.log('count = ',count);
console.log('found records un-processed', docs.length);
count = 63101
found records un-processed 50000
> result:
{
"$undefined": true
}
> result (JavaScript):
EJSON.parse('{"$undefined":true}')
This is the result I get when I ran the trigger. Need help
See documentation of count
Avoid using the count and its wrapper methods without a query predicate (note: db.collection.estimatedDocumentCount() does not take a query predicate) since without the query predicate, these operations return results based on the collection's metadata, which may result in an approximate count. In particular,
On a sharded cluster, the resulting count will not correctly filter out orphaned documents.
After an unclean shutdown, the count may be incorrect.
Use db.collection.countDocuments() or db.collection.estimatedDocumentCount(), depending on your needs

read values for a mongodb query from file

I'm trying to query all documents from a mongodb collection of which the criteria is inside a file.
criteria-file.txt:
value1
value2
value3
...
Currently I'm building the query like this
built-test.js.sh:
#!/bin/bash
echo 'db.collection.find({keyfield: {$in:[' > test.js
cat criteria-file.txt| while read i
do
echo "\"$i\"," >> test.js
done
echo ']}})' >> test.js
The query document is way under 16MB in size, but I wonder if there's a better way which is more elegant and efficient, especially, because over time I will most probably be over 16MB for the query document. I'm eager to get your suggestions.
BTW, I was wondering, for those 25K criteria values seeking in a collection with currently 200 million entries, the query time is only a bit over a minute and the CPU load doesn't seem to be too bad.
Thanks!
Read the file with the cat() native shell method and split it into an array of lines. Then loop over the array of criteria values to find the matching documents and collect them in an array; this will be your list of matches.
var criteria_file = cat("criteria-file.txt");
// Split into lines and drop empty entries (e.g. from a trailing newline).
var criteria_array = criteria_file.split("\n").filter(function (line) { return line.length > 0; });
var result_ids_arr = [ ];
for (let value of criteria_array) {
    let id_arr = db.collection.find( { keyfield: value }, { _id: 1 } ).toArray();
    result_ids_arr = result_ids_arr.concat(id_arr);
}
The result is an array of the _id values, e.g.: [ { "_id" : 11 }, { "_id" : 34 }, ... ]
All this JavaScript can be run from the command prompt or from the mongo shell using load().
Split the criteria file into chunks, ensuring that no chunk exceeds 16MB.
Run the same query on each chunk; you can now run the queries in parallel.
If you want to get extra fancy, you can use the aggregation pipeline to do a $match query and send the output of each query to a single results collection using $merge.
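The chunking idea above could be sketched in Python roughly as follows. This is only a sketch: it assumes collection is a pymongo Collection, reuses the keyfield name from the question, and chunks by a fixed number of values (the batch_size default is an arbitrary assumption) rather than by byte size. The helper names are made up for illustration.

```python
def read_criteria(path):
    """Read one criteria value per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def chunked(values, size):
    """Yield consecutive slices of `values` with at most `size` items each."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def find_matching_ids(collection, values, batch_size=1000):
    """Run one $in query per chunk and collect the matching _id values."""
    ids = []
    for batch in chunked(values, batch_size):
        # Each query document stays small because it holds one chunk only.
        cursor = collection.find({"keyfield": {"$in": batch}}, {"_id": 1})
        ids.extend(doc["_id"] for doc in cursor)
    return ids
```

The queries run sequentially here; running the chunks in parallel, as suggested above, would need a thread pool or separate processes on top of this.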

Iterating through collections and counting the number of same value appearances pymongo

I have similar data in 5 collections in mongodb as follows (documents)
{
"_id" : ObjectId("53490030cf3b942d63cfbc7b"),
"uNr" : "abdc123abcd",
}
I want to iterate through each collection and check whether a uNr also appears in any other collection. If it does, add that uNr and its count to a new table. For example, if there is a match in 3 collections, it should show {"uNr" : "abcd123", "count": "3"}
If your total number of uNr values is small enough to fit in memory (at most a few million of them), you can total them client-side with a Counter and store them in a MongoDB collection:
from collections import Counter
from pymongo import MongoClient, InsertOne
db = MongoClient().my_database
counts = Counter()
for collection in [db.collection1,
                   db.collection2,
                   db.collection3]:
    for doc in collection.find():
        counts[doc['uNr']] += 1
# Empty the target collection.
db.counts.delete_many({})
inserts = [InsertOne({'_id': uNr, 'n': cnt}) for uNr, cnt in counts.items()]
db.counts.bulk_write(inserts)
Otherwise, query a thousand uNr values at a time and update counts in a separate collection:
from pymongo import MongoClient, UpdateOne, ASCENDING
db = MongoClient().my_database
# Empty the target collection.
db.counts.delete_many({})
db.counts.create_index([('uNr', ASCENDING)])
for collection in [db.collection1,
                   db.collection2,
                   db.collection3]:
    cursor = collection.find(no_cursor_timeout=True)
    # "with" statement helps ensure cursor is closed, since the server will
    # never auto-close it.
    with cursor:
        updates = []
        for doc in cursor:
            updates.append(UpdateOne({'_id': doc['uNr']},
                                     {'$inc': {'n': 1}},
                                     upsert=True))
            if len(updates) == 1000:
                db.counts.bulk_write(updates)
                updates = []
        if updates:
            # Last batch.
            db.counts.bulk_write(updates)

How to delete documents by query efficiently in mongo?

I have a query, which selects documents to be removed. Right now, I remove them manually, like this (using python):
for id in mycoll.find(query, fields={}):
    mycoll.remove(id)
This does not seem to be very efficient. Is there a better way?
EDIT
OK, I owe an apology for forgetting to mention the query details, because it matters. Here is the complete python code:
def reduce_duplicates(mydb, max_group_size):
    # 1. Count the group sizes
    res = mydb.static.map_reduce(jstrMeasureGroupMap, jstrMeasureGroupReduce, 'filter_scratch', full_response = True)
    # 2. For each entry from the filter scratch collection having count > max_group_size
    deleteFindArgs = {'fields': {}, 'sort': [('test_date', ASCENDING)]}
    for entry in mydb.filter_scratch.find({'value': {'$gt': max_group_size}}):
        key = entry['_id']
        group_size = int(entry['value'])
        # 2b. query the original collection by the entry key, order it by test_date ascending, limit to the group size minus max_group_size.
        for id in mydb.static.find(key, limit = group_size - max_group_size, **deleteFindArgs):
            mydb.static.remove(id)
    return res['counts']['input']
So, what does it do? It reduces the number of duplicate keys to at most max_group_size per key value, leaving only the newest records. It works like this:
1. Map-reduce the data to (key, count) pairs.
2. Iterate over all the pairs with count > max_group_size.
3. Query the data by key, sorting it ascending by the timestamp (oldest first) and limiting the result to the count - max_group_size oldest records.
4. Delete each and every found record.
As you can see, this accomplishes the task of reducing the duplicates to at most N newest records. The last two steps are foreach-found-remove, and this is the important detail of my question that changes everything. I should have been more specific about it, sorry.
Now, about the collection remove command. It does accept a query, but mine includes sorting and limiting. Can I do it with remove? Well, I have tried:
mydb.static.find(key, limit = group_size - max_group_size, sort=[('test_date', ASCENDING)])
This attempt fails miserably. Moreover, it seems to screw up mongo. Observe:
C:\dev\poc\SDR>python FilterOoklaData.py
bad offset:0 accessing file: /data/db/ookla.0 - consider repairing database
Needless to say, that the foreach-found-remove approach works and yields the expected results.
Now, I hope I have provided enough context and (hopefully) have restored my lost honour.
You can use a query to remove all matching documents:
var query = {name: 'John'};
db.collection.remove(query);
Be wary, though: if the number of matching documents is high, your database might become less responsive. It is often advised to delete documents in smaller chunks.
Let's say you have 100k documents to delete from a collection. It is better to execute 100 queries that delete 1k documents each than 1 query that deletes all 100k documents.
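A chunked delete along those lines might be sketched in Python as below. It assumes collection is a pymongo Collection; the function name delete_in_batches and the default batch size are illustrative assumptions, not a recommendation.

```python
def delete_in_batches(collection, query, batch_size=1000):
    """Delete all documents matching `query`, `batch_size` at a time.

    Returns the total number of deleted documents.
    """
    total = 0
    while True:
        # Fetch only the _id values of the next batch of matches.
        ids = [doc["_id"]
               for doc in collection.find(query, {"_id": 1}, limit=batch_size)]
        if not ids:
            return total
        # One delete per batch keeps each operation short.
        result = collection.delete_many({"_id": {"$in": ids}})
        total += result.deleted_count
```

For example, delete_in_batches(db.collection, {"name": "John"}, batch_size=1000) would issue roughly 100 small deletes for 100k matching documents.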
You can remove it directly using MongoDB scripting language:
db.mycoll.remove({_id:'your_id_here'});
Would deleteMany() be more efficient? I've recently found that remove() is quite slow for 6m documents in a 100m doc collection. Documentation at (https://docs.mongodb.com/manual/reference/method/db.collection.deleteMany)
db.collection.deleteMany(
<filter>,
{
writeConcern: <document>,
collation: <document>
}
)
I would recommend paging if there is a large number of records.
First, get the count of the data you want to delete:
-------------------------- COUNT --------------------------
var query = {"FIELD": "XYZ", "date": {$lt: new ISODate("2019-11-10")}};
db.COL.aggregate([
    {$match: query},
    {$count: "all"}
])
Second, start deleting chunk by chunk:
-------------------------- DELETE --------------------------
var query = {"FIELD": "XYZ", "date": {$lt: new ISODate("2019-11-10")}};
var cursor = db.COL.aggregate([
    {$match: query},
    {$limit: 5}
])
cursor.forEach(function (doc) {
    db.COL.remove({"_id": doc._id});
});
This should be faster:
var query = {"FIELD": "XYZ", "date": {$lt: new ISODate("2019-11-10")}};
// Materialize the _id values first, then delete them in one call.
var ids = db.COL.find(query, {_id: 1}).limit(5).toArray().map(r => r._id);
db.COL.deleteMany({"_id": {"$in": ids}});
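The same "collect the ids, then delete them in one call" idea also covers the sort-and-limit case from the question. Below is a rough Python sketch, assuming collection is a pymongo Collection and that test_date is the timestamp field as in the question; the function name trim_to_newest is made up.

```python
def trim_to_newest(collection, key_filter, keep, date_field="test_date"):
    """Delete the oldest documents matching `key_filter`, keeping `keep` newest.

    Returns the number of deleted documents.
    """
    excess = collection.count_documents(key_filter) - keep
    if excess <= 0:
        return 0
    # Oldest first (1 means ascending), limited to the surplus documents.
    oldest = collection.find(key_filter, {"_id": 1},
                             sort=[(date_field, 1)], limit=excess)
    ids = [doc["_id"] for doc in oldest]
    # A single delete replaces the per-document remove loop.
    return collection.delete_many({"_id": {"$in": ids}}).deleted_count
```

Since a delete cannot sort or limit by itself, the find does the selection and the delete only receives the resulting _id list.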
Run this query in cmd
db.users.remove( {"_id": ObjectId("5a5f1c472ce1070e11fde4af")});
If you are using Node.js, write this code:
User.remove({ _id: req.body.id }, function(err){...});

Jumbled up ids in mongodb

We have around 20 million records in our mongodb. In my collection called 'posts' there is a field called 'id' which was supposed to be unique, but it has gotten all messed up and now contains many duplicates. We just want it to be unique.
We want to do something like iterating over every record and assigning it a unique id in a loop from 1 to 20 million.
What would be the easiest way to do this?
There are not many options here, really.
1. Pick your language and driver of choice.
2. Fetch N documents.
3. Assign unique ids to them (several options here: a) copy _id; b) assign a new ObjectID; c) assign a plain integer).
4. Save those documents.
5. Fetch the next N documents. Go to step 3.
To fetch next N documents, you should note the last processed document's _id and do this:
db.collection.find({_id: {$gt: last_processed_id}}).sort({_id: 1}).limit(N);
Do not use skip here. It will be too slow.
And, of course, you can always truncate the collection, create unique index on id and populate it again.
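The fetch-next-N step based on the last processed _id could be sketched in Python as below. It assumes collection is a pymongo Collection; the generator name iter_batches is hypothetical.

```python
def iter_batches(collection, batch_size=1000):
    """Yield lists of documents in ascending _id order, `batch_size` at a time."""
    query = {}
    while True:
        batch = list(collection.find(query, sort=[("_id", 1)], limit=batch_size))
        if not batch:
            return
        yield batch
        # Resume strictly after the last _id seen; no skip() needed.
        query = {"_id": {"$gt": batch[-1]["_id"]}}
```

Each batch can then be given new ids and saved before the next one is fetched, so no cursor has to stay open across the whole collection.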
You can use a simple script like this:
db.posts.dropIndex("*id index name here*"); // Drop Unique index
counter = 0;
page = 1;
slice = 1000;
total = db.posts.count();
conditions = {};
while (counter < total) {
    cursor = db.posts.find(conditions, {_id: true}).sort({_id: 1}).limit(slice);
    while (cursor.hasNext()) {
        row = cursor.next();
        db.posts.update({_id: row._id}, {$set: {id: ++counter}});
    }
    conditions['_id'] = {$gt: row._id};
    print("Processed " + counter + " rows");
}
print('Adding id index');
// createIndex replaces the deprecated ensureIndex.
db.posts.createIndex({id: 1}, {unique: true, background: false});
print("done");
Save it to assignids.js and run it as:
$ mongo dbname assignids.js
The outer while selects 1000 rows at a time and prevents cursor timeouts; the inner while assigns each row a new incremental id.