How to mapReduce() in Studio 3T/MongoDB

I recently started to work with MongoDB and I came across the mapReduce method. I understand the theory behind it, but I'm having problems with the practice; I'll try to explain. I'm using Studio 3T as my IDE, and I saw the 'add/edit stored functions' option when right-clicking a database. I created the map and reduce functions with this option, but I don't know how to call them.
This is how I define the map and reduce functions:
And this is how I call them, receiving the ReferenceError.
EDIT 1: I saw this thread, but it doesn't do what I'd like to do; it defines the functions in the MongoDB shell, whereas I'd like to be able to define them in Studio 3T and call them whenever I want.

Instead of using IntelliShell (the smarter mongo shell equivalent in Studio 3T) for map-reduce functions, it would be simpler to use the dedicated Map-Reduce feature (full documentation here), which will spare you the task of defining, storing, and calling separate functions.
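If you do stay in a shell or a driver instead of the GUI feature, note that the map and reduce functions can also be passed inline with the mapReduce command, so nothing has to be defined and stored first. A minimal sketch from Python with pymongo, with a made-up database, collection, and field:

    from bson.code import Code
    from pymongo import MongoClient

    db = MongoClient()["mydb"]  # hypothetical database

    # Both functions travel with the command itself; no stored functions needed.
    mapper = Code("function() { emit(this.status, 1); }")
    reducer = Code("function(key, values) { return Array.sum(values); }")

    # Dict key order is preserved on Python 3.7+, so "mapReduce" is read as the command name.
    result = db.command({
        "mapReduce": "orders",   # hypothetical source collection
        "map": mapper,
        "reduce": reducer,
        "out": {"inline": 1},    # return results inline rather than writing a collection
    })
    print(result["results"])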

Related

Are mongodb aggregations turing complete?

MongoDB aggregation pipelines and the expressions available inside them make it look like a full language. However, I do not know how to test whether a system is Turing complete. Has anybody written/said anything about the Turing completeness of MongoDB aggregations?
I was curious about this as well. Theoretically it seems like it could be Turing complete, similar to how vanilla SQL can be Turing complete. See the answer here: https://stackoverflow.com/a/7580013/15314201
However, in practice the aggregation pipeline isn't meant for "scripting" but rather a linear flow through the pipeline. Any sort of loops or functions would probably be better done in a language that interfaces with Mongo, such as using pymongo for the linear pipeline and Python for more advanced control flow.
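For what it's worth, a small sketch of that split, with made-up collection and field names: the pipeline itself stays a linear list of stages, and any looping or branching lives in Python around it.

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["sales"]   # hypothetical collection

    def totals_for_region(region):
        # A plain, linear pipeline: match, group, sort.
        pipeline = [
            {"$match": {"region": region}},
            {"$group": {"_id": "$product", "total": {"$sum": "$amount"}}},
            {"$sort": {"total": -1}},
        ]
        return list(coll.aggregate(pipeline))

    # The control flow lives in Python, not in the pipeline.
    for region in ["EU", "US", "APAC"]:
        print(region, totals_for_region(region))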

Can Sphinx query be tested command line w/o an index?

If I just want to test that my query will work against a term inside a sentence such as "and am selling an Iphone 3gs", is it possible to test this from the command line? That way I don't need to keep adding to and rotating an index, but can simply tweak my query and the data I plan on storing. Mainly I am trying to tweak various query parameters like SENTENCE and PROXIMITY against wordforms/stopwords/ignore_char, and would like to be able to work fast and test different query structures against test words/patterns.
In theory you can use BuildExcerpts to run an arbitrary query against a block of text (make sure to use query_mode=true); a rough sketch with the Python API client is shown below.
http://sphinxsearch.com/docs/current.html#api-func-buildexcerpts
But even then I'm not totally sure it will completely honour the query; I'm not sure SENTENCE etc. will truly work.
But if you want to play with ignore_char etc., you are going to be modifying the index config file anyway, so just quickly re-running indexer to rebuild the index and see the results is not that difficult.
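A rough sketch of the BuildExcerpts idea, assuming the sphinxapi.py client that ships with Sphinx; the index name, query, and option keys here are illustrative assumptions rather than a verified recipe:

    import sphinxapi

    cl = sphinxapi.SphinxClient()
    cl.SetServer("localhost", 9312)

    docs = ["and am selling an Iphone 3gs"]   # the text you want to test against
    query = '"selling iphone"~3'              # the query you keep tweaking
    opts = {"query_mode": True}               # honour query syntax instead of bag-of-words

    # "myindex" is only consulted for its tokenisation settings (charset, wordforms, ...)
    excerpts = cl.BuildExcerpts(docs, "myindex", query, opts)
    print(excerpts)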

retrieving python variables from Cell Magic in Ipython notebook from background subprocess

I'm trying to do some quick-and-dirty querying of my MongoDB database using an IPython notebook.
I have several cells, each with its own query. Since MongoDB can support several connections, I would like to run each query in parallel. I thought an ideal way would be to just do something like
%%script --bg python
query = pymongo.find(blahbalhba)
You can imagine several cells, each with its own query. However, I'm not able to access the query variable that pymongo.find returns.
I understand that this runs as a separate background subprocess, but I have no idea how to access the data, since the process is quickly destroyed and its namespace goes away.
I found a similar post for %%bash here, but I'm having trouble translating this to a Python namespace.
%%script is just a convenience magic; it will not replace writing a full-blown magic.
The only thing I can see is to write your own magic. Basically, if you can do it with a function that takes a string as a parameter, you know how to write a magic.
So how would you (like to) write it in pure Python? (Futures, multiprocessing, a queuing library?) ... then move it to a magic.
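As a starting point for the "pure Python" version, here is a minimal sketch that keeps the queries in the notebook process and runs them concurrently with concurrent.futures; the database, collection, and filter names are made up:

    from concurrent.futures import ThreadPoolExecutor
    from pymongo import MongoClient

    client = MongoClient()   # one client; pymongo maintains a connection pool
    db = client["mydb"]      # hypothetical database

    def run_query(collection, filt):
        # find() returns a cursor, so materialise it inside the worker thread.
        return list(db[collection].find(filt))

    with ThreadPoolExecutor(max_workers=4) as pool:
        f1 = pool.submit(run_query, "orders", {"status": "open"})
        f2 = pool.submit(run_query, "users", {"active": True})
        orders, users = f1.result(), f2.result()   # results stay in the notebook namespace

Once something like this works, the same function body can be moved into a custom magic if the cell-per-query layout is important.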

Not recommended to use server-side functions in MongoDB, does this go for MapReduce as well?

The MongoDB documentation states that it is not recommended to use its stored functions feature. This question goes through some of the reasons, but they all seem to boil down to "eval is evil".
Are there specific reasons why server-side functions should not be used in a MapReduce query?
The system.js functions are available to Map Reduce jobs by default (https://jira.mongodb.org/browse/SERVER-8632 notes a slight glitch with that in 2.4.0rc).
They are not actually eval()'d within the native V8/SpiderMonkey environment, so technically that part of the concern is also gone.
So no, there are no real problems; they will run as though native within that Map Reduce and should run just as fast and just as well as any other JavaScript you write. In fact, the system.js collection was designed more to house code for map-reduce jobs; it is later use that sees it treated as a hack for "stored procedures".
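For illustration, a rough sketch of that pattern from Python with pymongo; the database, collection, helper, and field names are made up. system.js is an ordinary collection, so the helper is just a document whose value is a Code object, and the map function then calls it by name, relying on the default availability described above.

    from bson.code import Code
    from pymongo import MongoClient

    db = MongoClient()["mydb"]   # hypothetical database

    # Store a helper under _id "normalize" in system.js.
    db["system.js"].replace_one(
        {"_id": "normalize"},
        {"_id": "normalize", "value": Code("function(s) { return s.toLowerCase(); }")},
        upsert=True,
    )

    # The map function uses normalize() as if it were built in.
    mapper = Code("function() { emit(normalize(this.category), 1); }")
    reducer = Code("function(key, values) { return Array.sum(values); }")

    result = db.command({
        "mapReduce": "products",   # hypothetical source collection
        "map": mapper,
        "reduce": reducer,
        "out": {"inline": 1},
    })
    print(result["results"])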

Performance of repeatedly executing the same javascript on MongoDb?

I'm looking at using some JavaScript in a MongoDB query. I have a couple of choices:
db.system.js.save the function in the db then execute it
db.myCollection.find with a $where clause and send the JS each time
exec_js in MongoEngine (which I imagine uses one of the above)
I plan to use the JavaScript in a regularly used query that's executed as part of a request to a site or API (i.e. not a batch administrative job), so it's important that the query executes with reasonable speed.
I'm looking at a 30ish line function.
Is the Javascript interpreted fresh each time? Will the performance be ok? Is it a sensible basis upon which to build queries?
Is the Javascript interpreted fresh each time?
Pretty much. MongoDB only has one "javascript instance" per running instance of MongoDB. You'll notice this if you try to run two different Map/Reduces at the same time.
Will the performance be ok?
Obviously, there are different definitions of "OK" here. The $where clause cannot use indexes, though you can combine that clause with another, indexed query. In either case each object will need to be pushed from BSON over to the JavaScript run-time and then acted on inside the run-time.
The process is definitely not what you would call "performant". Of course, by that measure Map/Reduce is also not very performant and people use that on production systems.
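As a small illustration of combining $where with an indexed filter, a pymongo sketch with made-up collection and field names; the indexed predicate narrows the candidates first, so the JavaScript only runs over the remaining documents:

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["events"]   # hypothetical collection
    coll.create_index("userId")              # indexed field used to pre-filter

    cursor = coll.find({
        "userId": 42,                                    # served by the index
        "$where": "this.items.length > this.maxItems",   # JS evaluated per remaining document
    })
    for doc in cursor:
        print(doc["_id"])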
Is it a sensible basis upon which to build queries?
The real barrier here isn't the number of lines in the code, it's the number of documents this code will have to interpret. Even though it's "server-side" JavaScript, it's still a bunch of work that the server has to do (in one thread, in an interpreted environment).
If you can test it and scope it correctly, it may well work out. Just don't expect miracles.
What is your point here? Write a JS script and call it regularly through cron. What should be the problem with that?