Does MongoDB's Map/Reduce sort work? - mongodb

If the following is used
Analytic.collection.map_reduce(map, reduce,
:query => {:page => subclass_name},
:sort => [[:pageviews, Mongo::DESCENDING]]).find.to_a
it won't sort by pageviews. Alternatively, if it is an array of hashes:
Analytic.collection.map_reduce(map, reduce,
:query => {:page => subclass_name},
:sort => [{:pageviews => Mongo::DESCENDING}]).find.to_a
it won't work either. I think the reason it has to be an array is to specify which field to sort by first, and so on. I also tried a flat array instead of the array of arrays in the first listing above, and that didn't work either.
Is it not working? This is the spec: http://api.mongodb.org/ruby/current/Mongo/Collection.html#map_reduce-instance_method

What are you trying to do? Sort is really only useful in conjunction with limit: it's applied before the map, so you can MapReduce just the latest 20 items, for example. If you're trying to sort the results, you can just do a normal sort on the output collection.

OK, it is a little bit tricky.
map_reduce() returns a Mongo::Collection object, but its documents are structured like this:
[{"_id":123.0,"value":{"pageviews":3621.0,"timeOnPage":206024.0}},
{"_id":1320.0,"value":{"pageviews":6584.0,"timeOnPage":373195.0}},
...
]
so to do the sort, it has to be:
Analytic.collection.map_reduce(map, reduce,
:query => {:page => subclass_name}).find({},
:sort => [['value.pageviews', Mongo::DESCENDING]])
note the value.pageviews part.
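To illustrate why the sort key has to be value.pageviews rather than pageviews, here is a minimal client-side sketch in plain JavaScript. It does not use any MongoDB driver; the two rows are the ones shown above, and the helper names getPath and sortByPathDesc are hypothetical.

```javascript
// Map/reduce output rows: the reduced fields live under "value".
const rows = [
  { _id: 123, value: { pageviews: 3621, timeOnPage: 206024 } },
  { _id: 1320, value: { pageviews: 6584, timeOnPage: 373195 } },
];

// Hypothetical helper: resolve a dotted path such as "value.pageviews".
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
}

// Return a copy of the rows sorted in descending order of the value at `path`.
function sortByPathDesc(docs, path) {
  return [...docs].sort((a, b) => getPath(b, path) - getPath(a, path));
}

const sorted = sortByPathDesc(rows, "value.pageviews");
console.log(sorted.map((row) => row._id));
```

A top-level pageviews key does not exist on these rows, which is why sorting on it has no effect.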

Related

Mongodb: Is it possible to run map-reduce command and query strings as integers

So I have the following PHP code which runs a map-reduce command on a MongoDB database collection:
$map = new MongoCode("function() { emit(this.app, this.bytes); }");
$reduce = new MongoCode("function(k, vals) { ".
"var sum = 0;".
"for (var i in vals) {".
"sum += vals[i];".
"}".
"return sum; }");
$dateAdded = mktime(0,0,0,5,1,2015);
//echo $dateAdded." = ".date("r",$dateAdded)."<br>\n";
$request = $db->command(array(
"mapreduce" => "log",
"map" => $map,
"reduce" => $reduce,
"query" => array("event" => "destroy", "systimelong" => array('$gt' => $dateAdded)),
"out" => array("inline" => 1)));
var_dump($request);
This actually works really well when the data is stored in the database as an integer. But sometimes the data gets stored as a string. Why? That's another story that can't be changed right now. Ultimately it will be an integer, but I'd really like to know if and how I can modify this to handle the cases where the data is a string, just in case it ever happens.
Since Mongo uses JavaScript, I feel like I should be able to use the parseInt() function inside the $map and/or $reduce functions, but it doesn't seem to be working.
Also, how would I handle the query? The systimelong field is just Unix time, and I am using the PHP mktime() function to generate an integer value for the beginning of the month. Again, it works great when comparing integers, but I need to first convert the string value.
Any ideas?
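One way to check the parseInt() idea without touching the database is to dry-run the map and reduce functions locally. The sketch below is only that: emit() is a stand-in for the function MongoDB provides inside map, the sample documents are hypothetical, and treating unparseable values as 0 is an assumption. It covers the bytes field only and does not address the systimelong query filter.

```javascript
// Stand-in for the emit() that MongoDB provides inside map functions.
var emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// Map: coerce bytes to a number whether it is stored as 42 or "42".
function map() {
  var bytes = parseInt(this.bytes, 10);
  emit(this.app, isNaN(bytes) ? 0 : bytes); // assumption: bad values count as 0
}

// Reduce: same summation as in the question.
function reduce(key, vals) {
  var sum = 0;
  for (var i = 0; i < vals.length; i++) {
    sum += vals[i];
  }
  return sum;
}

// Local dry run over hypothetical documents with mixed types.
var docs = [
  { app: "a", bytes: 10 },
  { app: "a", bytes: "32" },
  { app: "b", bytes: "7" },
];
docs.forEach(function (doc) {
  map.call(doc);
});

var totals = {};
emitted.forEach(function (pair) {
  var key = pair[0];
  totals[key] = reduce(key, [totals[key] || 0, pair[1]]);
});
console.log(totals);
```

If the totals come out right locally, the map and reduce bodies can be pasted into the MongoCode strings unchanged.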

Perl module for Elasticsearch Percolator

I'm trying to use the Elasticsearch Percolator with Perl and I have found this cool module.
The Percolation methods are listed here
As far as I can tell they're just read methods, so it is only possible to read the queries index and see if a query already exists, count the queries matched, etc.
Unless I'm missing something, it is not possible to add queries via the Percolator interface, so what I did is use the normal method to create a document against the .percolator index as follows:
my $e = Search::Elasticsearch->new( nodes => 'localhost:9200' );
$e->create(
    index => 'my_index',
    type  => '.percolator',
    id    => $max_idx,
    body  => {
        query => {
            match => {
                ...whatever the query is....
            },
        },
    },
);
Is that the best way of adding a query to the percolator index via the Perl module?
Thanks!
As per DrTech's answer, the code I posted looks to be the correct way of doing it.

Perl mongo->collection count using '$in' operator

I was wondering if it is possible to use the "in" operator, as you can from the mongo shell, with the Perl MongoDB::Collection module. I have tried a number of things, but haven't quite got the result I am expecting. I've checked the docs and other posts on Stack Overflow but can't seem to find anything specifically about this, unless I am overlooking something.
http://docs.mongodb.org/manual/reference/operator/query/in/
The count query I am running via the mongo shell is
mongo:PRIMARY> db.getCollection("Results").count( { TestClass : "TestClass", TestMethod : { $in: ["method1" , "method2", "method3"] } })
181605
I have tried this a few different ways passing the list as an array or hash-refs or pre-building a string...
my $count = $mongo->{collection}->count({
'TimeStamp' => { '$gt' => $ft, '$lt' => $tt },
'TestClass' => $TestClass,
'TestMethod' => { '$in' => [$whitelist->methods] },
'Result' => $result
});
where dumping $whitelist->methods gives
$VAR1 = {
'method1' => 1,
'method2' => 1,
'method3' => 1
};
I've looked high and low for an answer. Does anyone know if the driver is currently capable of using the $in operator like this? Looping through the methods returned from a previous query and adding up the results would require more code.
The only other Stack Overflow post I have seen about the $in operator was "$in mongoDB operator with _id in perl", which recommends using http://api.mongodb.org/perl/current/MongoDB/OID.html, but I don't think that is relevant to my example, as it looks to be more about IDs.
Any help or discussion would be greatly appreciated.
The problem is that the $in clause expects its value to be an array reference, but you are supplying a hashref (as Dumper's output shows). The easiest way to turn the latter into the former is to apply the keys function:
# ...
'TestMethod' => { '$in' => [keys %{$whitelist->methods}] }
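... or just [keys $whitelist->methods], if you're using Perl 5.14+, as "starting with Perl 5.14, keys can take a scalar EXPR, which must contain a reference to an unblessed hash or array". Note that this was an experimental feature and was removed in Perl 5.24, so the explicit %{...} form is the safer choice.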
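For comparison, the same idea in JavaScript, the language of the mongo shell: $in needs a list, so the keys of the lookup object are extracted first. This is only an illustrative sketch; the methods object mirrors the Dumper output above and the query is merely built, not sent to a server.

```javascript
// Same shape as the Dumper output: method names are the keys of a hash.
const methods = { method1: 1, method2: 1, method3: 1 };

// $in expects an array, so pass the keys rather than the object itself.
const query = {
  TestClass: "TestClass",
  TestMethod: { $in: Object.keys(methods) },
};

console.log(JSON.stringify(query));
```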

Incorrect sort result when using $nearSphere

This sorts correctly when run directly in the shell:
db.users.find({currentloc : {$nearSphere : [115.22804,-8.69914]}})
but when executed from PHP, the results look like they are sorted by _id:
$users = $this->m->mappt->users;
$results = $users->find(
    array(
        'currentloc' => array('$nearSphere' => array(115.22804, -8.69914))
    )
);
$arrayresult = iterator_to_array($results);
Any ideas?
Your query looks fine. I can think of a few things:
You don't have a 2dsphere index on the collection PHP is querying, whereas the one you query from the shell does have it.
iterator_to_array() is messing with the order. If you do a normal foreach(), do you get them in the right order?
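To test either theory, it helps to check the order of the returned documents against the actual distances instead of judging by eye. Below is a minimal sketch in plain JavaScript using the haversine formula; the function names and sample documents are hypothetical, coordinates are assumed to be [longitude, latitude] pairs as in the query, and 6371 km is used as the mean Earth radius.

```javascript
// Great-circle distance in kilometres between two [longitude, latitude] pairs.
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const R = 6371; // mean Earth radius in km
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// True if each document is at least as far from `origin` as the one before it.
function isSortedByDistance(docs, origin) {
  const dist = docs.map((doc) => haversineKm(origin, doc.currentloc));
  return dist.every((d, i) => i === 0 || dist[i - 1] <= d);
}

const origin = [115.22804, -8.69914];
const sample = [
  { _id: 2, currentloc: [115.23, -8.7] }, // hypothetical: a few hundred metres away
  { _id: 1, currentloc: [115.5, -8.5] }, // hypothetical: tens of kilometres away
];
console.log(isSortedByDistance(sample, origin));
```

Running the same check on the array from foreach() and on the one from iterator_to_array() would show whether the conversion is what changes the order.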
Adi, you can get an idea from here. You could try using variables for the geo values.
Or try something like this:
$collection->find(Array("point" => Array('$within' => Array('$center'=> Array(Array(151.1955562233925,-33.87107475181752), 0.1/111 ) ) )));

How to select subdocuments with MongoDB

I have a collection with a tags subdocument, like this:
Collection News :
title (string)
tags: [tag1, tag2...]
I want to select all the tags that start with a pattern, but return only the matching tags.
I already use a regex, but it returns all the news items containing a matching tag. Here is the query:
db.news.find( {"tags":/^proga/i}, ["tags"] ).sort( {"tags":1} ).limit( 0 ).skip( 0 )
My question is: how can I retrieve only the tags that match the pattern?
(The final goal is to make an autocomplete field.)
I also tried using distinct, but I didn't find a way to combine distinct with a find; it always returns all the tags :(
Thanks for your time
A bit late to the party, but hopefully will help others who are hunting for a solution. I've found a way to do this using the aggregation framework and combining $project and $unwind with the $match, by chaining them together. I've done it using PHP but you should get the gist:
// Field names below are placeholders for your own schema.
$ops = array(
    array('$match' => array(
        'collectionColumn' => 'value',
    )),
    array('$project' => array(
        'subCollection' => 1,
    )),
    array('$unwind' => '$subCollection'),
    array('$match' => array(
        'subCollection.subColumn' => 'subColumnValue',
    )),
);
The first match and project are just used to cut down the data early and make it faster. The unwind on the subcollection then emits each subcollection item as its own document, which can be filtered by the final match.
Hope that helps.
UPDATE (from Ryan Wheale):
You can then $group the data back into its original structure. It's like having an $elemMatch which returns more than one subdocument:
array('$group' => array(
    '_id' => '$_id',
    'subCollection' => array(
        '$push' => '$subCollection',
    ),
));
I translated this from Node to PHP, so I haven't tested in PHP. If anybody wants the Node version, leave a comment below and I will oblige.
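To make the three stages concrete for the tags case in the question, here is a small in-memory emulation in plain JavaScript. It is not the aggregation framework itself, only a sketch of what $unwind, $match and $group each do; the sample documents are hypothetical and the results are grouped by title instead of _id for readability.

```javascript
// Hypothetical documents shaped like the news collection in the question.
const news = [
  { title: "A", tags: ["progamer", "php"] },
  { title: "B", tags: ["Proga-news", "sports"] },
  { title: "C", tags: ["music"] },
];

// $unwind: one output document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((item) => ({ ...doc, [field]: item }))
  );
}

// $match on the unwound field, using the regex from the question.
const matched = unwind(news, "tags").filter((doc) => /^proga/i.test(doc.tags));

// $group with $push: collect the surviving tags back into one array per title.
const grouped = {};
for (const doc of matched) {
  (grouped[doc.title] = grouped[doc.title] || []).push(doc.tags);
}
console.log(grouped);
```

Documents with no matching tag, such as "C" here, drop out entirely after the match stage.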
Embedded documents are not collections. Look at your query: db.news.find will return documents from the news collection. tags is not a collection, and cannot be filtered.
There is a feature request for this "virtual collection feature" (SERVER-142), but don't expect to see this too soon, because it's "planned but not scheduled".
You can do the filtering client-side, or move the tags to a separate collection. By retrieving only a subset of fields - only the tags field - this should be reasonably fast.
Hint: Your regex uses the /i flag, which prevents the query from making efficient use of an index. Your db strings should be case-normalized (e.g. all upper case).
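The client-side filtering suggested above could look roughly like this in JavaScript. It is a sketch under assumptions: docs stands for whatever the driver returned for the tags-only query, and matchingTags is a hypothetical helper that lower-cases both sides for comparison and removes duplicates, as an autocomplete field would need.

```javascript
// Collect the distinct tags that start with `prefix`, ignoring case.
function matchingTags(docs, prefix) {
  const wanted = prefix.toLowerCase();
  const seen = new Set();
  for (const doc of docs) {
    for (const tag of doc.tags || []) {
      if (tag.toLowerCase().startsWith(wanted)) {
        seen.add(tag);
      }
    }
  }
  return [...seen].sort();
}

// Hypothetical query result containing only the tags field.
const docs = [
  { tags: ["progamer", "php"] },
  { tags: ["progamer", "progaming"] },
  { tags: [] },
];
console.log(matchingTags(docs, "Proga"));
```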