Mongodb stored function in aggregation - mongodb

I am using the query below to fetch data from MongoDB:
db.FetchedUrl.aggregate({
"$match": {
"type": db.eval('echoFunction(\"news\")')
}
})
This is my stored function:
db.system.js.save({
_id: "echoFunction",
"value": function(x){return x}
})
This code works fine in the mongo shell. How do I write the equivalent Java code to call a stored function in an aggregation?

I think you need to understand what is actually happening here. Take this as an example, which you can try yourself in the shell. Declare a variable there like this:
var param = db.eval('echoFunction(\"news\")');
And then do the aggregate again like this:
db.FetchedUrl.aggregate({
"$match": {
"type": param
}
})
Here is the thing. You know you don't have a "stored" variable on your server called "param", but of course the result is the same as what you got before.
This is because, just as in your original query, the value gets evaluated in the "shell" before the request is sent to the server.
So in this case, your "server side" function is not providing any server-side evaluation at the time the aggregate is performed.
What this means is that any value your "Java" code computes is evaluated on the client and placed into the BSON document before that document is sent to the server.
So whatever means you are using to "fetch" this value, you need to write that in "Java" code. The value is then placed under the same key in a BSON document built with the methods your driver supplies.
There are some notes on aggregation with the Java driver on the MongoDB website.
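The point about client-side evaluation can be sketched in any driver language. The sketch below uses Python rather than Java, purely as an illustration; echo_function is a hypothetical local stand-in for the stored function, and build_pipeline is an illustrative helper, not part of any driver.

```python
def echo_function(x):
    """Hypothetical client-side stand-in for the stored echoFunction."""
    return x


def build_pipeline(type_value):
    """Build the aggregation pipeline from an already-evaluated value."""
    return [{"$match": {"type": type_value}}]


# The value is computed on the client first...
param = echo_function("news")
# ...and only the finished document would be sent to the server.
pipeline = build_pipeline(param)
print(pipeline)
```

With a driver, the resulting pipeline would then be handed to the collection's aggregate method; the server never sees the function call, only its result.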

Related

How do I create an array for an updateOne() call in mongoc (C library for MongoDB)?

I am completely mystified (and supremely frustrated). How do I create this call using the mongoc library?
I have the following doc structure in the collection
{_id: myOID,
subscriptions: {
newProducts: true,
newBlogPosts: true,
pressReleases: true,
}
}
I want to remove one of the subscriptions, for example, the user no longer wants to receive press releases from me.
This works in the mongo shell. Now I need to do it in C code:
updateOne({_id: myOID}, [{'$unset': 'subscriptions.pressReleases'}], {})
Note how the update parameter in the Mongo shell is an anonymous array. I need to do that for the bson passed in as the update parameter in the mongoc_collection_update_one() API call.
The C code for updateOne is
mongo_status = mongoc_collection_update_one (mongo_collection,
mongo_query,
mongo_update,
NULL, /* No Opts to pass in */
NULL, /* no reply wanted */
&mongo_error);
Also note that in the aggregate() API, this is done with
{"pipeline" : [{'$unset': 'elists.lunch' }] }
Neither the updateOne() shell function nor the mongoc_collection_update_one() API call accepts that; they want just the array.
How do I create the bson to use as the second parameter for mongoc_collection_update_one() API call?
Joe's answer works and I am able to accomplish what I need to do.
The $unset update operator takes an object, just like $set.
updateOne({_id: myOID},{'$unset':{'subscriptions.pressReleases': true}})
OR, alternatively,
updateOne({_id: myOID},{'$unset':{'subscriptions.pressReleases': {'$exists': true}}})
which will remove the subscription flag no matter what the value is for that field. (The first form does the same: $unset ignores whatever value is supplied for the field.)
Doing it this way does not require an anonymous array (which I still don't know how to create).
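As a rough illustration in Python (not C), the two update shapes discussed in this thread can be written as plain data structures. The helper names below are hypothetical. With PyMongo, either return value would be passed as the second argument of update_one; the list form is an aggregation-pipeline update, which assumes MongoDB 4.2 or later.

```python
def unset_update(field_path):
    """Classic update document. The value given to $unset is ignored."""
    return {"$unset": {field_path: ""}}


def unset_pipeline_update(field_path):
    """Pipeline-style update: a list of stages instead of a document."""
    return [{"$unset": field_path}]


classic = unset_update("subscriptions.pressReleases")
as_pipeline = unset_pipeline_update("subscriptions.pressReleases")
print(classic)
print(as_pipeline)
```

The classic form takes an object under $unset, while the pipeline form takes the field path as a string, which is why the two are not interchangeable.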

Mongodb Stitch realtime watch

What I intend to achieve is some sort of "live query" functionality.
So far I've tried using the "watch" method. According to the documentation:
You can open a stream of changes that match a filter by calling
collection.watch(delegate:) with a $match expression as the argument.
Whenever the watched collection changes and the ChangeEvent matches
the provided $match expression, the stream’s event handler fires with
the ChangeEvent object as its only argument
Passing the doc ids as an array works perfectly, but passing a query doesn't work:
this.stitch.db.collection<Queue>('queues')
.watch({
hospitalId: this.activehospitalid
});
I've also tried this:
this.stitch.db.collection<Queue>('queues')
.watch({
$match: {
hospitalId: this.activehospitalid
}
},
);
This throws an error on the console: "StitchServiceError: mongodb watch: filter is invalid (unknown top level operator: $match)". The intention is to watch all documents where the field "hospitalId" matches the provided value, or in effect to pass a query filter to the watch() method.
After a long search I found that it is possible to filter, but the query needs to be formatted differently:
this.stitch.db.collection<Queue>('queues')
.watch({
$or: [
{
"fullDocument.hospitalId": this.activehospitalid
}
]
},
);
For anyone else who might need this, please note the important fullDocument part of the query. I haven't found much documentation relating to this, but I hope it helps.
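The reason the prefix matters is that the filter is matched against the change event, which carries the stored document under its fullDocument field, rather than against the document itself. A minimal Python sketch of that rewrite follows; to_change_event_filter is a hypothetical helper, not part of Stitch or any driver, and it handles only plain field names and the $and/$or/$nor operators.

```python
LOGICAL_OPERATORS = ("$and", "$or", "$nor")


def to_change_event_filter(query):
    """Rewrite a document query so that it targets change events.

    Plain field names get a 'fullDocument.' prefix; logical operators
    are kept and their sub-queries are rewritten recursively.
    """
    rewritten = {}
    for key, value in query.items():
        if key in LOGICAL_OPERATORS:
            rewritten[key] = [to_change_event_filter(sub) for sub in value]
        else:
            rewritten["fullDocument." + key] = value
    return rewritten


print(to_change_event_filter({"hospitalId": "h-1"}))
```

The "h-1" value is made up for the example; in the code above it would be this.activehospitalid.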

How to evaluate simple expressions

When developing complex aggregations, I want the ability to test out simpler expressions as a sanity check. So I'm wondering if mongo shell has the ability to evaluate simple expressions.
For example, I want to do simple things like:
> { $hour: ISODate("2016-01-01T12:30:00Z") }
ISODate("2016-01-01T12:30:00Z")
In the example above it seems the shell isn't evaluating and returning the hour component as desired.
Is it possible to do what I want here?
If you're willing to use something other than the mongo shell, NoSQLBooster will evaluate partial query operations. Just highlight the relevant section and click Run.
This is particularly useful for constructing pipelines with multiple stages. You can evaluate your pipeline one stage at a time to verify which documents are passed to the next stage.
Not an ideal solution, but you can do something like this:
var exprr = { "$hour": ISODate("2016-01-01T12:30:00Z") };
var tempCollection = "tempCollection";
db.getCollection(tempCollection).insert({});
db.getCollection(tempCollection).aggregate([
{"$project" : {"_id" : 0, "result" : exprr}}
]);
db.getCollection(tempCollection).drop();
Or you can wrap the last part in a function and save it for reuse. The idea is to create a temporary collection, insert a blank document into it, and evaluate the expression through an aggregation. The downside is that you can only evaluate expressions that are supported in the $project aggregation stage.
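The "wrap it in a function" idea could look roughly like this in Python. This is a sketch under assumptions: collection is taken to behave like a PyMongo Collection that already holds at least one document, and the function names are illustrative.

```python
from datetime import datetime, timezone


def expression_pipeline(expr):
    """Wrap an aggregation expression in a single $project stage."""
    return [{"$project": {"_id": 0, "result": expr}}]


def evaluate_expression(collection, expr):
    """Evaluate expr via aggregation and return the first result.

    `collection` is assumed to behave like a PyMongo Collection.
    """
    for doc in collection.aggregate(expression_pipeline(expr)):
        return doc["result"]
    return None  # the collection was empty, so nothing was evaluated


# The expression from the question, built with a timezone-aware datetime.
hour_expr = {"$hour": datetime(2016, 1, 1, 12, 30, tzinfo=timezone.utc)}
print(expression_pipeline(hour_expr))
```

Keeping the pipeline builder separate from the call makes it easy to inspect the exact document that would be sent before running it.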

MongoDB: Bulk changing all field types in python

I have a ton of documents (around 10 million) and I need to change the type of their fields. The usual forEach function (just looping through every value) seems to take forever and is clearly not viable in the timeframe I have (it basically took all night for one out of four updates).
I've heard that bulk writes may be able to do it, but I'm getting mixed messages. One confusing answer on this site, for example, says that there's no ready-made function for it (you would have to use some workaround), while others say that it can be done with updates in Python, using pymongo.
I was wondering if there is a quicker way to mass-change field types (string -> double, string -> int) using Python? I can also work from the console, but I find even fewer solutions there.
Thanks
You can try using an aggregation query in the mongo shell.
Something like this (set "to" to the target type, for example "double" or "int"):
db.your_collection.aggregate([
{
$addFields: {
field1: {
$convert: {
input: "$field1",
to: "string"
}
}
}
},
{ $out: "your_collection" }
])
More info here https://docs.mongodb.com/manual/reference/operator/aggregation/convert/
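Since the question asks about Python, the same pipeline can be assembled there as plain dictionaries and passed to a collection's aggregate method. The sketch below only builds the pipeline; the helper names and the price/quantity field names are made up for illustration. Note that $convert needs MongoDB 4.0 or later and, unless an onError value is supplied, raises an error on values it cannot convert.

```python
def convert_stage(field_types):
    """Build an $addFields stage converting each field to its target type."""
    return {
        "$addFields": {
            field: {"$convert": {"input": "$" + field, "to": target}}
            for field, target in field_types.items()
        }
    }


def convert_pipeline(field_types, out_collection):
    """Conversion stage followed by $out, which overwrites out_collection."""
    return [convert_stage(field_types), {"$out": out_collection}]


pipeline = convert_pipeline({"price": "double", "quantity": "int"}, "your_collection")
print(pipeline)
```

Because $out replaces the target collection, writing to a scratch collection first and checking the result is the more cautious way to try this.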

call custom python function on every document in a collection Mongo DB

I want to call a custom Python function on some existing attribute of every document in the entire collection and store the result as a new key-value pair in that same document. Is there any way to do that (since each call is independent of the others)?
I noticed cursor.forEach, but can't it be done efficiently using just Python?
A simple example would be to split the string in the text field and store the number of words as a new attribute.
def split_count(text):
    # some complex preprocessing...
    return len(text.split())
# Need something like this...
db.collection.update_many({}, {'$set': {"split": split_count('$text') }}, upsert=True)
But it seems like setting a new attribute in a document based on the value of another attribute in the same document is not possible this way yet. This post is old but the issues seem to be still open.
I found a way to call any custom python function on a collection using parallel_scan in PyMongo.
import threading
def process_text(cursor):
    for row in cursor.batch_size(200):
        # Any complex preprocessing here...
        split_text = row['text'].split()
        db.collection.update_one({'_id': row['_id']},
                                 {'$set': {'split_text': split_text,
                                           'num_words': len(split_text)}},
                                 upsert=True)
def preprocess(num_threads=4):
    # Get up to max 'num_threads' cursors.
    # Note: parallel_scan was deprecated in PyMongo 3.7 and removed in PyMongo 4.0.
    cursors = db.collection.parallel_scan(num_threads)
    threads = [threading.Thread(target=process_text, args=(cursor,)) for cursor in cursors]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
This is not really faster than cursor.forEach (but not that slow either), but it helps me execute any arbitrarily complex python code and save the results from within Python itself.
Also if I have an array of ints in one of the attributes, doing cursor.forEach converts them to floats which I don't want. So I preferred this way.
But I would be glad to know if there're any better ways than this :)
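One commonly used alternative is to keep the processing in Python but send the writes in batches, so that each round trip carries many updates instead of one. The sketch below assumes PyMongo is installed and that collection is a PyMongo Collection; build_update, batched and apply_in_batches are illustrative names, and the batch size of 1000 is an arbitrary choice.

```python
def build_update(doc):
    """Return the (filter, update) pair for one document."""
    words = doc["text"].split()
    return (
        {"_id": doc["_id"]},
        {"$set": {"split_text": words, "num_words": len(words)}},
    )


def batched(iterable, size):
    """Yield lists of at most `size` items from iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def apply_in_batches(collection, batch_size=1000):
    """Send one bulk_write per batch instead of one update per document."""
    from pymongo import UpdateOne  # imported lazily; requires PyMongo

    # Fetch only the field that the processing needs.
    cursor = collection.find({}, {"text": 1})
    for docs in batched(cursor, batch_size):
        operations = [UpdateOne(*build_update(doc)) for doc in docs]
        collection.bulk_write(operations, ordered=False)
```

The documents still travel to the client and back, so this does not remove the round trip; it only reduces the number of write requests.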
It is quite unlikely that it will ever be efficient to do this kind of thing in python. This is because the document would have to make a round trip and go through the python function on the client machine.
In your example code, you are passing the result of a function to a mongodb update query, which won't work. You can't run any python code inside mongodb queries on the db server.
As the answer to your linked question suggests, this type of action has to be performed in the mongo shell, e.g.:
db.collection.find().snapshot().forEach(
    function (elem) {
        // Count the words in the text field of this document
        var splitLength = elem.text.split(" ").length;
        db.collection.update(
            { _id: elem._id },
            { $set: { split: splitLength } }
        );
    }
);