MongoDB aggregation crashes when cursor or explain is used

Since version 3.6, MongoDB requires the cursor or explain option in aggregate queries. It's a breaking change, so I have to modify my earlier code.
But when I add cursor or explain to my query, the request seems to enter an endless loop and MongoDB never responds. It doesn't even seem to time out.
This simple aggregation just hangs the code and never responds:
db.collection('users').aggregate([{ $match: { username: 'admin' } }],
  { cursor: {} },
  (err, docs) => {
    console.log('Aggregation completed.');
  });
I can replace { cursor: {} } with { explain: true } and the result is the same. It works perfectly under older MongoDB versions without this one parameter.
Without cursor or explain I get this error message:
The 'cursor' option is required, except for aggregate with the explain argument
I'm not the only one who ran into this:
https://github.com/nosqlclient/nosqlclient/issues/419
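The error message above concerns the raw aggregate command document. As a minimal sketch (buildAggregateCommand is a hypothetical helper, not a driver API), this is the shape the command takes with either a cursor document or explain: true:

```javascript
// Hypothetical helper (not part of any MongoDB driver): builds the raw
// aggregate command document. Since MongoDB 3.6 the command must carry
// either a cursor document or explain: true.
function buildAggregateCommand(collection, pipeline, options = {}) {
  const { batchSize, explain = false } = options;
  const command = { aggregate: collection, pipeline: pipeline };
  if (explain) {
    // explain mode: the server reports how the pipeline would be processed
    command.explain = true;
  } else {
    // an empty cursor document leaves the batch size to the server default
    command.cursor = batchSize === undefined ? {} : { batchSize: batchSize };
  }
  return command;
}
```

In the mongo shell the result could be passed to db.runCommand(...), for example db.runCommand(buildAggregateCommand("users", [{ $match: { username: "admin" } }])).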

OK, this was a little tricky, but finally it works. Looks like there are some major breaking changes in MongoDB's Node.js driver which nobody bothered to tell me.
1. The Node.js MongoDB driver has to be upgraded. My current version is 3.0.7.
2. The way MongoDB connects has changed, breaking any old code. The client connection now returns a client object, not merely the db, so it has to be handled differently. There is a SO answer explaining it perfectly:
db.collection is not a function when using MongoClient v3.0
3. Aggregations now return an AggregationCursor object, not an array of data. Instead of getting the results in a callback, you now have to iterate through the cursor:
var cursor = collection.aggregate([ ... ],
  { cursor: { batchSize: 1 } });
cursor.each((err, doc) => {
  // called once per document; doc is null once the cursor is exhausted
  ...
});
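A minimal sketch of points 2 and 3 together with the 3.x driver, assuming `client` is an already connected MongoClient and using "mydb" as a hypothetical database name. Instead of iterating with each(), the cursor is drained with toArray(), which returns a promise when no callback is passed:

```javascript
// Sketch for the Node.js driver 3.x. `client` is assumed to be a connected
// MongoClient; "mydb" is a hypothetical database name.
async function findAdmins(client) {
  // 3.x: get the database from the client instead of receiving it directly
  const users = client.db("mydb").collection("users");
  // aggregate() returns an AggregationCursor rather than an array
  const cursor = users.aggregate(
    [{ $match: { username: "admin" } }],
    { cursor: { batchSize: 100 } }
  );
  // toArray() collects every document into an array (promise-based)
  return cursor.toArray();
}
```

With the 3.x driver the client itself can be obtained with `const client = await MongoClient.connect(url)`. Note that toArray() loads the whole result into memory, so for very large results iterating the cursor remains the better choice.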
So it seems you have to rewrite ALL your db operations after upgrading to MongoDB 3.6. Yay, thanks for all the extra work, MongoDB Team! Guess that's where I'm done with your product.

Related

Mongodb Stitch realtime watch

What I intend to achieve is some sort of "live query" functionality.
So far I've tried using the "watch" method. According to the documentation:
You can open a stream of changes that match a filter by calling collection.watch(delegate:) with a $match expression as the argument. Whenever the watched collection changes and the ChangeEvent matches the provided $match expression, the stream’s event handler fires with the ChangeEvent object as its only argument.
Passing the doc ids as an array works perfectly, but passing a query doesn't work:
this.stitch.db.collection<Queue>('queues')
  .watch({
    hospitalId: this.activehospitalid
  });
I've also tried this:
this.stitch.db.collection<Queue>('queues')
  .watch({
    $match: {
      hospitalId: this.activehospitalid
    }
  });
This throws an error on the console: "StitchServiceError: mongodb watch: filter is invalid (unknown top level operator: $match)". The intention is to watch all documents whose "hospitalId" field matches the provided value, or more generally, to pass a query filter to the watch() method.
After a long search I found that it's possible to filter, but the query needs to be formatted differently:
this.stitch.db.collection<Queue>('queues')
  .watch({
    $or: [
      {
        "fullDocument.hospitalId": this.activehospitalid
      }
    ]
  });
For anyone else who might need this, note the important fullDocument part of the query: the filter is matched against the ChangeEvent, and the changed document sits under its fullDocument field. I haven't found much documentation on this, but I hope it helps.
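The same idea can be written as a small helper that prefixes each field of an ordinary query with "fullDocument.". This is a hypothetical sketch (toChangeEventFilter is not a Stitch API) that only handles top-level field conditions:

```javascript
// Hypothetical helper (not a Stitch SDK function): rewrites a plain field
// filter such as { hospitalId: "h1" } into a filter on change events,
// where the changed document is found under the fullDocument field.
function toChangeEventFilter(filter) {
  const result = {};
  for (const [field, condition] of Object.entries(filter)) {
    if (field.startsWith("$")) {
      // top-level operators such as $or would need recursive rewriting
      throw new Error(`unsupported top-level operator: ${field}`);
    }
    result[`fullDocument.${field}`] = condition;
  }
  return result;
}
```

Calling .watch(toChangeEventFilter({ hospitalId: this.activehospitalid })) should match the same events as the $or query above, since a single-element $or matches exactly what its only clause matches; this has not been verified against Stitch itself.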

MongoDB: Bulk changing all field types in python

I have a ton of documents (around 10 million) and I need to change their field types. The usual forEach approach (just looping through every value) seems to take forever and is clearly not viable in the timeframe I have (it basically took all night for one out of four updates).
I've heard that bulk writes may be able to do it, but I'm getting mixed messages. For example, one confusing answer on this site says that there's no built-in function to do it (you would need some workaround), while others say that it can be done with updates in Python using pymongo.
Is there a quicker way to mass-change field types (string -> double, string -> int) using Python? I can also work from the console, but I find even fewer solutions there.
Thanks
You can try using an aggregation query in the mongo shell, something like:
db.your_collection.aggregate([
  {
    $addFields: {
      field1: {
        $convert: {
          input: "$field1",
          to: "double"  // target type, e.g. "double", "int", "string"
        }
      }
    }
  },
  { $out: "your_collection" }  // replaces the collection with the pipeline output
])
$convert is available from MongoDB 4.0. More info here: https://docs.mongodb.com/manual/reference/operator/aggregation/convert/
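Since the question asks for several conversions (string -> double, string -> int), the pipeline can also be generated for many fields at once. A sketch with a hypothetical helper, buildConvertPipeline; the optional onError value is what $convert returns for values it cannot convert (without onError, the first unconvertible value makes the whole aggregation fail):

```javascript
// Hypothetical helper: builds a pipeline that converts several fields in one
// pass and writes the documents back with $out.
// conversions maps field name -> target type, e.g. { price: "double", qty: "int" }.
function buildConvertPipeline(conversions, outCollection, onError) {
  const fields = {};
  for (const [field, type] of Object.entries(conversions)) {
    const convert = { input: "$" + field, to: type };
    if (onError !== undefined) {
      // returned instead of failing when a value cannot be converted
      convert.onError = onError;
    }
    fields[field] = { $convert: convert };
  }
  return [{ $addFields: fields }, { $out: outCollection }];
}
```

In the mongo shell this could be used as db.your_collection.aggregate(buildConvertPipeline({ field1: "double", field2: "int" }, "your_collection")). Writing to a differently named collection first lets you check the converted documents before replacing the original.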

Aggregate pipeline Error: getMore: cursor didn't exist on server, possible restart or timeout

I am facing this issue with MongoDB.
My code is something like this:
for(loop) {
  var cursorQuery = db.beacon_0000.aggregate([
    {
      $match: {
        ...
      }
    },
    {
      $project: {
        ...
      }
    },
    {
      $group: {
        ...
      }
    },
    {
      $sort: {
        ...
      }
    }
  ], {allowDiskUse: true} );
  ...
  while(cursorQuery.hasNext()) {
    var cursor = cursorQuery.next();
    ...
  }
}
I run the above script from the command line with the mongo shell:
$ mongo dbName file.js
After a while I get the "cursor didn't exist on server" error at cursorQuery.hasNext().
When I get this error with a find query, I can resolve it by adding addOption(DBQuery.Option.noTimeout).
However, this option does not seem to be available with aggregate.
Please let me know how I can resolve or work around this issue.
An additional update:
When I use
var cursor = db.collection.aggregate([ ...], {allowDiskUse: true} ).addOption(DBQuery.Option.noTimeout)
I get this error:
E QUERY TypeError: Object # has no method 'addOption'
However, when I use
var cursor = db.collection.find({...}, {...}).addOption(DBQuery.Option.noTimeout)
it works fine.
Checking the aggregate documentation
https://docs.mongodb.com/v3.0/reference/method/db.collection.aggregate/
it says:
Returns: A cursor to the documents produced by the final stage of the aggregation pipeline operation
And the cursor documentation
https://docs.mongodb.com/v3.0/reference/method/cursor.addOption/#cursor.addOption
gives no indication that the aggregate cursor differs from the find cursor, or that the former does not support DBQuery.Option.noTimeout.
So is this a bug in MongoDB? Is there any way to fix it, or a workaround?
Note: the MongoDB version is 3.0.
I had the same issue and solved it by raising the idle cursor timeout from the default 10 minutes to 1 hour. This has been configurable since MongoDB 2.6.9. See:
https://jira.mongodb.org/browse/SERVER-8188
https://docs.mongodb.com/manual/reference/parameters/#param.cursorTimeoutMillis
The default cursor timeout is 600000 ms = 10 minutes. You can alter it in different ways:
on startup: mongod --setParameter cursorTimeoutMillis=<num>
or: mongos --setParameter cursorTimeoutMillis=<num>
or during operation, using the mongo shell: db.adminCommand({setParameter:1, cursorTimeoutMillis: <num>})
mongos does not forward the command to the mongod instances of the cluster, and the primary does not replicate it to the other replica set members. Thus, you need to run the command on every mongos and mongod where the query might run.
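Because the parameter has to be set on each node separately, it can help to build the command once and apply it to every node. A sketch in modern JavaScript (Node.js or mongosh); the helper names are hypothetical, and adminDbs stands for one admin-database handle per mongos/mongod, which in the shell could come from new Mongo("host:port").getDB("admin"):

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000; // 3,600,000 ms

// Hypothetical helper: builds the admin command that changes the idle
// cursor timeout. The positive-integer check is a sanity check of this sketch.
function cursorTimeoutCommand(millis) {
  if (!Number.isInteger(millis) || millis <= 0) {
    throw new RangeError("cursorTimeoutMillis must be a positive integer");
  }
  return { setParameter: 1, cursorTimeoutMillis: millis };
}

// Hypothetical helper: the setting is neither forwarded by mongos nor
// replicated, so it is sent to every node's admin database individually.
function applyToAllNodes(adminDbs, millis) {
  const command = cursorTimeoutCommand(millis);
  return adminDbs.map(adminDb => adminDb.runCommand(command));
}
```

The returned array holds one command result per node, so a failed node can be spotted by checking each result's ok field.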
You have sort of answered this yourself.
Adding addOption(DBQuery.Option.noTimeout) will indeed fix the issue when using find, because it stops the cursor from timing out, so the cursor still exists when you call .hasNext().
However, the aggregation cursor does not have that option, so unfortunately you can't stop it from timing out.
You can actually use maxTimeMS, as described in the documentation:
Optional. Specifies a time limit in milliseconds for processing operations on a cursor. If you do not specify a value for maxTimeMS, operations will not time out. A value of 0 explicitly specifies the default unbounded behavior.
According to the MongoDB documentation, you can set it to a non-negative number of milliseconds to control how long operations on the cursor may run.
See the documentation for more detail.

Call a custom Python function on every document in a MongoDB collection

I want to call a custom Python function on some existing attribute of every document in the entire collection and store the result as a new key-value pair in that same document. Is there any way to do that (since each call is independent of the others)?
I noticed cursor.forEach, but can't this be done efficiently using just Python?
A simple example would be to split the string in text and store the number of words as a new attribute.
def split_count(text):
    # some complex preprocessing...
    return len(text.split())
# Need something like this...
db.collection.update_many({}, {'$set': {"split": split_count('$text')}}, upsert=True)
But it seems that setting a new attribute in a document based on the value of another attribute in the same document is not possible this way yet. This post is old, but the issues still seem to be open.
I found a way to call any custom python function on a collection using parallel_scan in PyMongo.
import threading
from pymongo import MongoClient
# Connection details and database name ("mydb") are assumptions; adjust to your setup.
db = MongoClient().mydb
def process_text(cursor):
    for row in cursor.batch_size(200):
        # Any complex preprocessing here...
        split_text = row['text'].split()
        db.collection.update_one({'_id': row['_id']},
                                 {'$set': {'split_text': split_text,
                                           'num_words': len(split_text)}},
                                 upsert=True)
def preprocess(num_threads=4):
    # Get up to max 'num_threads' cursors.
    # Note: parallel_scan exists only in PyMongo 3.x (it was removed in PyMongo 4.0).
    cursors = db.collection.parallel_scan(num_threads)
    threads = [threading.Thread(target=process_text, args=(cursor,)) for cursor in cursors]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
This is not really faster than cursor.forEach (but not that slow either), but it lets me run arbitrarily complex Python code and save the results from within Python itself.
Also, if one of the attributes holds an array of ints, cursor.forEach converts them to floats, which I don't want. So I preferred this approach.
But I would be glad to know if there are any better ways than this :)
It is quite unlikely that this kind of thing will ever be efficient in Python, because every document has to make a round trip to the client machine and go through the Python function there.
In your example code, you are passing the result of a Python function to a MongoDB update query, which won't work: split_count('$text') runs on the client with the literal string '$text' before the query is even sent. You can't run Python code inside MongoDB queries on the database server.
As the answer to your linked question suggests, this type of action has to be performed in the mongo shell, e.g.:
// snapshot() was deprecated in MongoDB 3.6 and removed in 4.0; omit it on newer servers.
db.collection.find().snapshot().forEach(
  function (elem) {
    var splitLength = elem.text.split(" ").length;
    db.collection.update(
      { _id: elem._id },
      { $set: { split: splitLength } }
    );
  }
);
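The shell loop above sends one update per document. A variation, sketched here with hypothetical helpers (wordCountOps, makeBatcher), collects the updates into batches and sends each batch with bulkWrite, which cuts the number of round trips; the batch size of 1000 in the usage comment is an arbitrary choice:

```javascript
// Hypothetical helper: turns documents into bulkWrite updateOne operations
// that store the word count of doc.text in doc.split.
function wordCountOps(docs) {
  return docs.map(doc => ({
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { split: doc.text.split(" ").length } }
    }
  }));
}

// Hypothetical helper: groups documents into batches of batchSize and hands
// each full batch (and the final partial one) to flush.
function makeBatcher(batchSize, flush) {
  let batch = [];
  return {
    add(doc) {
      batch.push(doc);
      if (batch.length === batchSize) {
        flush(batch);
        batch = [];
      }
    },
    finish() {
      if (batch.length > 0) {
        flush(batch);
      }
      batch = [];
    }
  };
}

// Possible use in a mongo shell with ES6 support (e.g. mongosh):
// const batcher = makeBatcher(1000, docs => db.collection.bulkWrite(wordCountOps(docs)));
// db.collection.find({}, { text: 1 }).forEach(doc => batcher.add(doc));
// batcher.finish();
```

The computation still happens client side, so this reduces the number of write round trips rather than moving the work to the server.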

Meteor - Cannot Access All Read Operations

I'm new to Meteor. I've been stuck on this problem for a while. I can successfully add items to a collection and see them in full in the console. However, I cannot access all of the read operations in my .js file.
That is, I can use .find() and .findOne() with empty parameters. But when I try to add .sort or an argument I get an error telling me the object is undefined.
Autopublish is turned on, so I'm not sure what the problem is. These calls are being made directly in the client.
This returns something--
Template.showcards.events({
  "click .play-card": function () {
    alert(Rounds.find());
  }
})
And this returns nothing--
Template.showcards.events({
  "click .play-card": function () {
    alert(Rounds.find().sort({player1: -1}));
  }
})
Sorry for the newbie question. Thanks in advance.
Meteor's collection API works a bit differently from the mongo shell's API, which is understandably confusing for new users. You'll need to do this:
Template.showcards.events({
  'click .play-card': function() {
    var sortedCards = Rounds.find({}, {sort: {player1: -1}}).fetch();
    console.log(sortedCards);
  }
});
See this for more details. Also note that logging a cursor (the result of a find) probably isn't what you want. If you want to see the contents of the documents, you need to fetch them.
Rounds.find() returns a cursor, and Meteor cursors have no sort() method (which is why Rounds.find().sort(...) fails), so pass the sort as an option and fetch the results:
Rounds.find({}, {sort: {player1: -1}}).fetch();
Note that this returns an Array of document objects. So you would do something more like this:
var docs = Rounds.find({}, {sort: {player1: -1}}).fetch();
alert(JSON.stringify(docs[0]));
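To make the sort option concrete: {player1: -1} orders documents by player1 in descending order (1 would mean ascending). As an illustrative sketch only (compareBySpec is hypothetical, not Meteor's implementation, and it ignores MongoDB's ordering across different value types), the same order can be applied to an already fetched array whose fields hold numbers or strings:

```javascript
// Hypothetical helper: builds an Array.prototype.sort comparator from a
// sort specifier such as { player1: -1 }. Keys are compared in the order
// they appear; 1 means ascending, -1 descending.
function compareBySpec(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every key
  };
}
```

Passing the sort option to find(), as in the first answer, remains the simpler choice; a comparator like this is only useful when the array has already been fetched.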