How to print out more than 20 items (documents) in MongoDB's shell? - mongodb

db.foo.find().limit(300)
won't do it. It still prints out only 20 documents.
db.foo.find().toArray()
db.foo.find().forEach(printjson)
will both print out very expanded view of each document instead of the 1-line version for find():

DBQuery.shellBatchSize = 300
MongoDB Docs - Configure the mongo Shell - Change the mongo Shell Batch Size

From the shell you can use:
db.collection.find().toArray()
to display all documents without having to use it.

You can use it inside of the shell to iterate over the next 20 results. Just type it if you see "has more" and you will see the next 20 items.

Could always do:
db.foo.find().forEach(function(f){print(tojson(f, '', true));});
To get that compact view.
Also, I find it very useful to limit the fields returned by the find so:
db.foo.find({},{name:1}).forEach(function(f){print(tojson(f, '', true));});
which would return only the _id and name field from foo.

With newer version of mongo shell (mongosh) use following syntax:
config.set("displayBatchSize", 300)
instead of depreciated:
DBQuery.shellBatchSize = 300
Future find() or aggregate() operations will only return 300 documents per cursor iteration.

I suggest you to have a ~/.mongorc.js file so you do not have to set the default size everytime.
# execute in your terminal
touch ~/.mongorc.js
echo 'DBQuery.shellBatchSize = 100;' > ~/.mongorc.js
# add one more line to always prettyprint the ouput
echo 'DBQuery.prototype._prettyShell = true; ' >> ~/.mongorc.js
To know more about what else you can do, I suggest you to look at this article: http://mo.github.io/2017/01/22/mongo-db-tips-and-tricks.html

In the mongo shell, if the returned cursor is not assigned to a variable using the var keyword, the cursor is automatically iterated to access up to the first 20 documents that match the query. You can set the DBQuery.shellBatchSize variable to change the number of automatically iterated documents.
Reference - https://docs.mongodb.com/v3.2/reference/method/db.collection.find/

show dbs
use your database name
in my case, I'm using - use smartbank
then - show collections - just to check the document collections name.
and finally, db. your collection name.find() or find({}) -
show dbs
use smartbank
show collections
db.users.find() or db.users.find({}) or db.users.find({_id: ObjectId("60c8823cbe9c1c21604f642b")}) or db.users.find({}).limit(20)
you can specify _id:ObjectId(write the document id here) to get the single document
or you can specify limit - db.users.find({}).limit(20)

Related

MongoDB findOneAndReplace log if added as new document or replaced

I'm using mongo's findOneAndReplace() with upsert = true and returnNewDocument = true
as basically a way to not insert duplicate. But I want to get the _id of the new inserted document (or the old existing document) to be passed to a background processing task.
BUT I also want to log if the document was Added-As-New or if a Replacement took place.
I can't see any way to use findOneAndReplace() with these parameters and answer that question.
The only think I can think of is to find, and insert in two different requests which seems a bit counter-productive.
ps. I'm actually using pymongo's find_one_and_replace() but it seems identical to the JS mongo function.
EDIT: edited for clarification.
Is it not possible to use replace_one function ? In java I am able to use repalceOne which returns UpdateResult. That has method for finding if documented updated or not. I see repalce_one in pymongo and it should behave same. Here is doc PyMongo Doc Look for replace_one
The way I'm going to implement it for now (in python):
import pymongo
def find_one_and_replace_log(collection, find_query,
document_data,
log={}):
''' behaves like find_one_or_replace(upsert=True,
return_document=pymongo.ReturnDocument.AFTER)
'''
is_new = False
document = collection.find_one(find_query)
if not document:
# document didn't exist
# log as NEW
is_new = True
new_or_replaced_document = collection.find_one_and_replace(
find_query,
document_data,
upsert=True,
return_document=pymongo.ReturnDocument.AFTER
)
log['new_document'] = is_new
return new_or_replaced_document

call custom python function on every document in a collection Mongo DB

I want to call a custom python function on some existing attribute of every document in the entire collection and store the result as a new key-value pair in that (same) document. May I know if there's any way to do that (since each call is independent of others) ?
I noticed cursor.forEach but can't it be done just using python efficiently ?
A simple example would be to split the string in text and store the no. of words as a new attribute.
def split_count(text):
# some complex preprocessing...
return len(text.split())
# Need something like this...
db.collection.update_many({}, {'$set': {"split": split_count('$text') }}, upsert=True)
But it seems like setting a new attribute in a document based on the value of another attribute in the same document is not possible this way yet. This post is old but the issues seem to be still open.
I found a way to call any custom python function on a collection using parallel_scan in PyMongo.
def process_text(cursor):
for row in cursor.batch_size(200):
# Any complex preprocessing here...
split_text = row['text'].split()
db.collection.update_one({'_id': row['_id']},
{'$set': {'split_text': split_text,
'num_words': len(split_text) }},
upsert=True)
def preprocess(num_threads=4):
# Get up to max 'num_threads' cursors.
cursors = db.collection.parallel_scan(num_threads)
threads = [threading.Thread(target=process_text, args=(cursor,)) for cursor in cursors]
for thread in threads:
thread.start()
for thread in threads:
thread.join()
This is not really faster than cursor.forEach (but not that slow either), but it helps me execute any arbitrarily complex python code and save the results from within Python itself.
Also if I have an array of ints in one of the attributes, doing cursor.forEach converts them to floats which I don't want. So I preferred this way.
But I would be glad to know if there're any better ways than this :)
It is quite unlikely that it will ever be efficient to do this kind of thing in python. This is because the document would have to make a round trip and go through the python function on the client machine.
In your example code, you are passing the result of a function to a mongodb update query, which won't work. You can't run any python code inside mongodb queries on the db server.
As the answer to you linked question suggests, this type of action has to be performed in the mongo shell. e.g:
db.collection.find().snapshot().forEach(
function (elem) {
splitLength = elem.text.split(" ").length
db.collection.update(
{
_id: elem._id
},
{
$set: {
split: splitLength
}
}
);
}
);

check if collection find return a cursor

This Meteor code needs to check if a certain document exist in the collection by checking if a cursor is returned, and if no cursor returned then the document does not exist. but it always returns true even if there is no value "alosh" for the text field in any of the documents in the collection
.
Why and how can it be fixed? Thanks
if (myCollection.find({text: 'alosh'}, {limit: 1})) {console.log('found');}
edit
The reason I did not use colllection.findOne is my understanding that it is much slower according to this post
Idea for solution:
You want to understand if there is a document with a certain value for a certain property. You could use:
if(myCollection.findOne({text: 'alosh'})){
console.log("found");}
Add a count. A cursor is always going to be a truthy value as the cursor itself is returned. You need to check the records contained in the cursor so...
if (myCollection.find({text: 'alosh'}, {limit: 1}).count()) {
console.log('found');
}
Because no docs returned is 0 and 0 is falsey this should only return when documents are found
Mongo cursors seem to behave the same as arrays in that an empty cursor (and an empty array) evaluate to true.
So !![] //true and !!collection.find({text : "alosh"}) //true.
BUT collection.findOne({text : "alosh"}) will fail your if - so use that instead
http://docs.meteor.com/#/full/findone

Mongo find by regex: return only matching string

My application has the following stack:
Sinatra on Ruby -> MongoMapper -> MongoDB
The application puts several entries in the database. In order to crosslink to other pages, I've added some sort of syntax. e.g.:
Coffee is a black, caffeinated liquid made from beans. {Tea} is made from leaves. Both drinks are sometimes enjoyed with {milk}
In this example {Tea} will link to another DB entry about tea.
I'm trying to query my mongoDB about all 'linked terms'. Usually in ruby I would do something like this: /{([a-zA-Z0-9])+}/ where the () will return a matched string. In mongo however I get the whole record.
How can I get mongo to return me only the matched parts of the record I'm looking for. So for the example above it would return:
["Tea", "milk"]
I'm trying to avoid pulling the entire record into Ruby and processing them there
I don't know if I understand.
db.yourColl.aggregate([
{
$match:{"yourKey":{$regex:'[a-zA-Z0-9]', "$options" : "i"}}
},
{
$group:{
_id:null,
tot:{$push:"$yourKey"}
}
}])
If you don't want to have duplicate in totuse $addToSet
The way I solved this problem is using the string aggregation commands to extract the StartingIndexCP, ending indexCP and substrCP commands to extract the string I wanted. Since you could have multiple of these {} you need to have a projection to identify these CP indices in one shot and have another projection to extract the words you need. Hope this helps.

use cursor as iterator with a query

I was reading about mongodb. Came across this part http://www.mongodb.org/display/DOCS/Tutorial It says -
> var cursor = db.things.find();
> printjson(cursor[4]);
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
"When using a cursor this way, note that all values up to the highest accessed (cursor[4] above) are loaded into RAM at the same time. This is inappropriate for large result sets, as you will run out of memory. Cursors should be used as an iterator with any query which returns a large number of elements."
How to use cursor as iterator with a query?Thanks for the help
You've tagged that you're using pymongo, so I'll give you two pymongo examples using the cursor as an iterator:
import pymongo
cursor = pymongo.Connection().test_db.test_collection.find()
for item in cursor:
print item
#this will print the item as a dictionary
and
import pymongo
cursor = pymongo.Connection().test_db.test_collection.find()
results = [item['some_attribute'] for item in cursor]
#this will create a list comprehension containing the value of some_attribute
#for each item in the collection
In addition, you can set the size of batches returned to the pymongo driver by doing this:
import pymongo
cursor = pymongo.Connection().test_db.test_collection.find()
cursor.batchsize(20) #sets the size of batches of items the cursor will return to 20
It is usually unnecessary to mess with the batch size, but if the machine you are running the driver on is having memory issues and page faulting while you are manipulating results from the query, you might have to set this to achieve better performance (this really seems like a painful optimization to me and I've always left the default).
As far as the javascript driver (the driver that loads when you launch the "shell") that part of the documentation is cautioning you not to use "array mode". From the online manual:
Array Mode in the Shell
Note that in some languages, like JavaScript, the driver supports an
"array mode". Please check your driver documentation for specifics.
In the db shell, to use the cursor in array mode, use array index []
operations and the length property.
Array mode will load all data into RAM up to the highest index
requested. Thus it should not be used for any query which can return
very large amounts of data: you will run out of memory on the client.
You may also call toArray() on a cursor. toArray() will load all
objects queries into RAM.
Using the MongoDB Java driver it should be something like:
DBCursor cursor = collection.find( query );
while( cursor.hasNext() ) {
DBObject obj = cursor.next();
// do something tih obj
}
In the mongo console you can do something like:
var cursor = db.things.find();
while(cursor.hasNext()) { printjson(cursor.next()); }
MongoDB returns results in batches. To see how many objects are left in a batch, we use objLeftInBatch() like this:
var c = db.Schools.find();
var doc = function() {return c.hasNext()? c.next : null;}
c.objLeftInBatch();
To iterate through this batch we can use doc() that we setup in the above code block. More learning on cursors can be found at https://docs.mongodb.com/