I implemented the Pearson product-moment correlation via map / reduce / finalize. The missing part is restricting the documents (representing users) that get processed, using a filter query. For a simple query like
mapreduce(mapper, reducer, :finalize => finalizer, :query => { :name => 'Bernd' })
I can get this to work.
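For readers unfamiliar with the approach, here is a minimal Python sketch of how Pearson's r can be split into map, reduce and finalize steps. It is not the poster's code (which is not shown); the data layout (one dict of item ratings per user) and all function names are assumptions. The key property is that reduce only adds sums, so it can safely be re-applied to its own output.

```python
import math
from collections import defaultdict

# Hypothetical data layout: each user is a dict {item_id: rating}.

def map_ratings(user_a, user_b):
    """Map step: emit partial sums for every item that both users rated."""
    for item, x in user_a.items():
        if item in user_b:
            y = user_b[item]
            yield {"n": 1, "sx": x, "sy": y, "sxx": x * x, "syy": y * y, "sxy": x * y}

def reduce_sums(values):
    """Reduce step: add partial sums field by field.

    The result has the same shape as the emitted values, so it can be
    fed back into reduce_sums again (MongoDB may re-reduce).
    """
    total = defaultdict(float)
    for v in values:
        for key, val in v.items():
            total[key] += val
    return dict(total)

def finalize(sums):
    """Finalize step: turn the accumulated sums into Pearson's r."""
    n = sums.get("n", 0)
    if n == 0:
        return 0.0  # design choice: treat "no common items" as r = 0
    sx, sy = sums["sx"], sums["sy"]
    cov = sums["sxy"] - sx * sy / n
    var_x = sums["sxx"] - sx * sx / n
    var_y = sums["syy"] - sy * sy / n
    denom_sq = var_x * var_y
    if denom_sq <= 0:
        return 0.0  # undefined correlation (zero variance), also reported as 0
    return cov / math.sqrt(denom_sq)
```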
But my filter criteria are a little more complicated: I have one set of preferences that needs to have at least one element in common, and another set of preferences that must not have any element in common. In a later step I also want to restrict this to documents (users) within a certain geographical distance.
Currently this logic lives in my map function, but I would prefer to move it into either query parameters as supported by Mongoid or a JavaScript function. All my attempts at this have failed: the code is either ignored or raises an error.
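A hedged sketch of the two set conditions, written in Python for illustration. The field names tags_a and tags_b are hypothetical. On array fields, MongoDB's $in matches when at least one element is in the given list and $nin matches when none is, so both conditions can presumably be expressed as a plain query document with string operator keys rather than code in the map function; this has not been verified against the Mongoid/mapreduce versions listed below.

```python
# Hypothetical field names: "tags_a" must share at least one value with
# `wanted`, "tags_b" must share none with `unwanted`.

def passes_filter(user, wanted, unwanted):
    """Plain-Python version of the check, as it might look inside a map function."""
    a = set(user.get("tags_a", []))
    b = set(user.get("tags_b", []))
    return bool(a & set(wanted)) and not (b & set(unwanted))

def build_query(wanted, unwanted):
    """Equivalent MongoDB query document, using plain string operator keys."""
    return {"tags_a": {"$in": list(wanted)}, "tags_b": {"$nin": list(unwanted)}}

def eval_query(doc, query):
    """Tiny evaluator for $in/$nin on array fields, to check the equivalence.

    A missing field behaves like an empty array: $in fails, $nin passes.
    """
    for field, cond in query.items():
        values = set(doc.get(field, []))
        for op, operand in cond.items():
            if op == "$in" and not values & set(operand):
                return False
            if op == "$nin" and values & set(operand):
                return False
    return True
```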
I did a couple of tests.
[results deleted, see below]
I'm using ruby 1.9.2, mongodb 1.6.5-x86_64, and the mongoid 2.0.0.beta.20, mongo 1.1.5 and bson 1.1.5 gems on MacOS.
What am I doing wrong?
Thanks in advance.
More Tests (problem partly solved):
User.where(:name.in => ["Arno", "Bernd", "Claudia"])
works; however, using mapreduce like this
mapreduce(..., :query => { :name.in => ['Arno', 'Bernd', 'Claudia'] })
results in the BSON error serialize: keys must be strings or symbols (TypeError).
Replacing :name.in with 'name.in' or 'name.$in' makes the query return no results (I guess the key is passed as-is to MongoDB).
However
mapreduce(..., :query => { :name => { '$in' => ['Arno', 'Bernd', 'Claudia'] } })
works, but I didn't have any success with my geospatial query attempts, no matter how I wrote the expression.
mapreduce(..., :query => {:location => { "$near" => [ 13, 52, 1 ] } })
results in this error_message: Database command 'mapreduce' failed: {"assertion"=>"manual matcher config not allowed",....
If anyone could give me an idea of how to write a geospatial query using $near or $within that works with mapreduce, I would be very happy. (I haven't experimented with the sets yet; see above.)
Related
I was having some performance issues using SharpRepository, and after playing around with the SQL Query Profiler I found the reason.
With EF I can do stuff like this:
var books = db.Books.Where(item => item.Year == 2016);
if (!string.IsNullOrEmpty(search_author))
    books = books.Where(item => item.Author.Contains(search_author));
return books.ToList();
EF does not actually run anything until books is enumerated (the last line); at that point it compiles a single query that selects only the small set of rows matching both year and author.
But SharpRepository evaluates books at once, so this:
var books = book_repo.Books.FindAll(item => item.Year == 2016);
if (!string.IsNullOrEmpty(search_author))
    books = books.Where(item => item.Author.Contains(search_author));
return books.ToList();
will compile a query like "select * from Books where Year = 2016" at the first line and fetch ALL of those records from the database! Then the author filter is applied in memory, in the C# code... That behaviour can make a major performance difference on large databases, and it explains why my queries timed out...
I tried using repo.GetAll().Where() instead of repo.FindAll(), but it behaved the same way.
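To make the difference concrete, here is a small Python sketch (not EF or SharpRepository code) contrasting deferred composition, as with IQueryable, with a repository call that executes immediately and returns an in-memory sequence, as with IEnumerable. FakeTable, DeferredQuery and eager_find_all are hypothetical names; rows_transferred stands in for the rows sent from the database to the application.

```python
class FakeTable:
    """Hypothetical stand-in for a database table; counts rows sent to the client."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_transferred = 0

    def execute(self, predicates):
        # "Server side": apply every predicate, then ship only the matches.
        result = [r for r in self.rows if all(p(r) for p in predicates)]
        self.rows_transferred += len(result)
        return result


class DeferredQuery:
    """Like IQueryable: where() only records the predicate; to_list() runs one query."""

    def __init__(self, table, predicates=()):
        self.table = table
        self.predicates = tuple(predicates)

    def where(self, pred):
        return DeferredQuery(self.table, self.predicates + (pred,))

    def to_list(self):
        return self.table.execute(self.predicates)


def eager_find_all(table, pred):
    """Like a repository method returning IEnumerable: the query runs right away."""
    return table.execute([pred])
```

With eager execution, every 2016 book crosses the wire before the author filter runs; with deferred execution, only the final matches do.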
Am I misunderstanding something here, and is there a way around this issue?
You can use repo.AsQueryable(), but by doing that you lose some of the functionality that SharpRepository can provide, like caching or any aspects/hooks you are using. It basically takes you out of the generic repository layer and lets you use the underlying LINQ provider. It has its benefits for sure, but in your case you can just build the predicate conditionally and pass that to the FindAll method.
You can do this by building an Expression predicate or by using Specifications. Working with LINQ expressions does not always feel clean, but it can be done. Alternatively, you can use the Specification pattern built into SharpRepository.
ISpecification<Book> spec = new Specification<Book>(x => x.Year == 2016);
if (!string.IsNullOrEmpty(search_author))
{
spec = spec.And(x => x.Author.Contains(search_author));
}
return repo.FindAll(spec);
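The same idea, building one combined predicate conditionally and passing it in a single call, sketched in Python. Specification, and_ and find_all here are hypothetical stand-ins, not SharpRepository's API. Python lambdas cannot be translated to SQL, so this only shows the composition logic; in C#, the point of passing an expression-based specification is presumably that the provider can translate the whole combined predicate into one query.

```python
class Specification:
    """Hypothetical, minimal specification object wrapping a predicate."""

    def __init__(self, predicate):
        self.predicate = predicate

    def and_(self, other):
        """Return a new specification that requires both predicates."""
        first = self.predicate
        return Specification(lambda item: first(item) and other(item))

    def is_satisfied_by(self, item):
        return self.predicate(item)


def find_all(rows, spec):
    """Stand-in for repo.FindAll(spec): one call, one combined filter."""
    return [r for r in rows if spec.is_satisfied_by(r)]


def search_books(rows, search_author=None):
    spec = Specification(lambda b: b["year"] == 2016)
    if search_author:  # mirrors !string.IsNullOrEmpty(search_author)
        spec = spec.and_(lambda b: search_author in b["author"])
    return find_all(rows, spec)
```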
For more info on Specifications you can look here: https://github.com/SharpRepository/SharpRepository/blob/develop/SharpRepository.Samples/HowToUseSpecifications.cs
Ivan Stoev provided this answer:
"The problem is that most of the repository methods return IEnumerable. Try repo.AsQueryable(). "
Our app has been running smoothly for about 2 years now (at least in the following respect).
But a few weeks ago some of our clients started complaining about a relevant count in the system. We investigated it, reduced the example to a minimal case, and ended up with this weird situation:
db.my_collection.find().count()
=> 250088
db.my_collection.distinct("field_1")
=> [false, true]
db.my_collection.find({"field_1":{$nin:[true, false]}}).count()
=> 0
db.my_collection.find({"field_1":{$in:[true, false]}}).count()
=> 250088
db.my_collection.find({"field_1":true}).count()
=> 140357
db.my_collection.find({"field_1":false}).count()
=> 109731
(140357 + 109731 = 250088, great!)
db.my_collection.find({"field_2":null}).count()
=> 4638
db.my_collection.find({"field_2":{$ne:null}}).count()
=> 245450
(4638 + 245450 = 250088, great!)
db.my_collection.find({"field_2":{$ne:null}, "field_1":{$in:[true, false]}}).count()
=> 245450 (great!)
But look at this:
db.my_collection.find({"field_2":null, "field_1":false}).count()
=> 2669
db.my_collection.find({"field_2":null, "field_1":true}).count()
=> 790
2669 + 790 = 3459 (should sum 4638 !!)
db.my_collection.find({"field_2":{$ne:null}, "field_1":false}).count()
=> 75795
db.my_collection.find({"field_2":{$ne:null}, "field_1":true}).count()
=> 107298
75795 + 107298 = 183093 (should sum 245450 !!)
Also1: This problem only happens with this specific combination of fields in the query. If we combine either of these two fields with others, everything works fine (we didn't do an exhaustive search here, but we tried a bunch of them).
Also2: We tried the queries with various different hint values (including {$natural:1}) to see if it could be a corrupted index, but it is still broken.
Also3: We resynced one of our secondaries, and the error persists.
Also4: We copied the collection, document by document, through the console with a find().forEach({insert}) query (same db version), and IT WORKS CORRECTLY!!!
Also5: we're using mongodb 2.4.8.
WTF might be happening?
What should I do to correct it RIGHT?
What should I do to correct it FAST?
===== UPDATE 18-sep
We found out that we had some indexes with keys that were too large, and MongoDB 2.4 just logs this and ignores it, so the index was silently broken. (It looks like MongoDB 2.6 handles this in a more polished way: http://docs.mongodb.org/master//release-notes/2.6-compatibility/#index-key-length-incompatibility )
We dropped those indexes (they were just some tests we had made and forgotten about) and the queries returned correct results again.
We still haven't figured out why we could not get correct results when giving hints, even with $natural:1...
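A minimal Python simulation of the failure mode described in the update: an index that silently skips entries whose key is too large gives undercounts whenever a query is answered from that index, while a full scan still sees every document. The 1024-byte limit, the repr-based size measure and the compound index layout are assumptions chosen for illustration, not MongoDB internals.

```python
MAX_KEY_BYTES = 1024  # assumed limit for this illustration


def key_size(key):
    """Rough, hypothetical size measure for an index key."""
    return len(repr(key).encode("utf-8"))


def build_index(docs, fields):
    """Compound index: key tuple -> list of document positions.

    Oversized keys are skipped without any error (standing in for 2.4's
    log-and-ignore behaviour, logging omitted), so the index no longer
    covers every document.
    """
    index = {}
    for pos, doc in enumerate(docs):
        key = tuple(doc.get(f) for f in fields)
        if key_size(key) > MAX_KEY_BYTES:
            continue
        index.setdefault(key, []).append(pos)
    return index


def count_via_index(index, prefix):
    """Count documents whose key starts with `prefix`, using only the index."""
    n = len(prefix)
    return sum(len(positions) for key, positions in index.items() if key[:n] == prefix)


def count_with_scan(docs, criteria):
    """Count by scanning every document (the answer the index should agree with)."""
    return sum(1 for d in docs if all(d.get(k) == v for k, v in criteria.items()))
```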
I have a problem that's driving me crazy!
I need to translate the following Mongo query to PHP:
db.mycollection.find({$or: [ {'field.a': 45.4689, 'field.b': 9.18103}, {'field.a' : 40.71455, 'field.b': -74.007124} ]})
It works perfectly from the shell.
I think the query should be translated to PHP in the following way (var_dump):
Array
(
[$or] => Array
(
[0] => Array
(
[field.a] => 45.468945
[field.b] => 9.18103
)
[1] => Array
(
[field.a] => 40.71455
[field.b] => -74.007124
)
)
)
but I get no results in PHP!
Why?
What's wrong?
What's the correct syntax?
Thank you!
Your syntax seems fine to me. I think the main problem is that you are using floating-point numbers, which are not always as accurate as you think, especially if you mix up 45.4689 and 45.468945 yourself. When comparing floating-point numbers directly, you should always allow a small tolerance (fuzz factor).
It looks like you are storing coordinates. If that's the case, I suggest you:
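A sketch of the fuzzing idea in Python: instead of exact equality, each coordinate gets a $gte/$lte range. The tolerance of 1e-4 is an assumption to tune to your data's precision, and matches is only a tiny local evaluator for checking the shape; the nested PHP array would mirror this structure.

```python
def approx(value, eps):
    """Range condition matching anything within +/- eps of value."""
    return {"$gte": value - eps, "$lte": value + eps}


def point_clause(a, b, eps=1e-4):
    return {"field.a": approx(a, eps), "field.b": approx(b, eps)}


def build_query(points, eps=1e-4):
    """$or of one range clause per (a, b) point."""
    return {"$or": [point_clause(a, b, eps) for a, b in points]}


def in_range(x, cond):
    return cond["$gte"] <= x <= cond["$lte"]


def matches(doc, query):
    """Toy evaluator for this exact $or-of-ranges shape on {"field": {"a": .., "b": ..}}."""
    field = doc["field"]
    return any(
        in_range(field["a"], clause["field.a"]) and in_range(field["b"], clause["field.b"])
        for clause in query["$or"]
    )
```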
swap the a and b fields (so that you get longitude, then latitude)
create a 2d index on "field": ensureIndex( array( 'field' => '2d' ) );
use geoNear with a small max distance and a limit of 1.
That should give a much better way of scanning for points. You'd have to run the query twice though, as the geoNear command can't do $or.
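Because geoNear can't do $or, one command per point is needed. The Python sketch below only builds the command documents for the legacy geoNear command (longitude first, num as the limit); the collection name, the 0.001 max distance and the helper names are assumptions. Note that the standalone geoNear command was later removed from MongoDB in favor of the $geoNear aggregation stage.

```python
def geo_near_command(collection, lng, lat, max_distance):
    """Build one legacy geoNear command document (longitude first)."""
    # The command name must be the first key; Python dicts keep insertion order.
    return {
        "geoNear": collection,
        "near": [lng, lat],
        "maxDistance": max_distance,  # in coordinate units for a flat 2d index
        "num": 1,                     # limit of 1 result
    }


def commands_for_points(collection, points_lat_lng, max_distance=0.001):
    """geoNear can't do $or, so build one command per (lat, lng) point."""
    return [geo_near_command(collection, lng, lat, max_distance) for lat, lng in points_lat_lng]
```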
If you only have a discrete set of points (as your comments seem to indicate), then I would recommend not querying on floating-point numbers at all, and simply adding a field that names the city as well.
I have a few thousand strings (items) that I would like to translate. I have structured my MongoDB as follows:
#document = {:item => "hello", :translations => {:fr => {:name => "bonjour",
:note => "easy"}, :es => {:name => "hola", :note => "facil"}}}
The :translations field can contain many more languages and properties. I would like to run queries such as retrieving all items with no translations for a specific language, or retrieving all items having 'bonjour' as a French translation.
I don't see how I can do this. Is there a better way to structure my database for these purposes? I am using Node.js.
Thanks.
I would like to run queries such as retrieving all items with no translations for a specific language,
.find({ 'translations.fr': {$exists:false} })
...or retrieving all items having 'bonjour' as a French translation.
.find({ 'translations.fr.name': "bonjour" })
Is there a better way to structure my database for these purposes?
I believe that you have the correct structure. You will have to become familiar with dot notation.
I'd say that for your purpose the model is good. You need Mongo dot notation: you can use $exists to check for fr, and dot notation to match bonjour:
find({ "translations.fr.name" : "bonjour" })
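To illustrate how the two dot-notation queries behave, here is a small Python evaluator over plain dicts. It is a simplified, hypothetical model of MongoDB matching (no arrays, only equality and $exists), not the real query engine.

```python
_MISSING = object()  # sentinel for "path does not exist"


def get_path(doc, dotted):
    """Follow a dot-notation path like 'translations.fr.name' through nested dicts."""
    cur = doc
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return _MISSING
        cur = cur[part]
    return cur


def matches(doc, query):
    """Every path must satisfy its condition: {'$exists': bool} or plain equality."""
    for path, cond in query.items():
        value = get_path(doc, path)
        if isinstance(cond, dict) and "$exists" in cond:
            if (value is not _MISSING) != cond["$exists"]:
                return False
        elif value is _MISSING or value != cond:
            return False
    return True
```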
Here is the sample code:
$r = $coll->findOne();
$coll->remove(array("_id" => $r["_id"])); // use the same object id as retrieved from the DB
$ret = $coll->findOne(array("_id" => $r["_id"]));
var_dump($ret); // dumps the record that was supposed to be deleted
The records in the collection have MongoDB ObjectIds as _id, not strings.
The same logic in the console works fine and deletes the record correctly.
This is working for me. Here's the code:
$coll->drop();
print("Now have ".$coll->count()." items\n");
$coll->insert(array("x" => 'blah'));
$coll->insert(array("x" => "blahblah"));
print("Inserted ".$coll->count()." items\n");
$x = $coll->findOne();
print("Object X\n");
print_r($x);
$query_x = array('_id' => $x['_id']);
$coll->remove($query_x);
print("Removed 1 item, now have ".$coll->count()." items\n");
$y = $coll->findOne($query_x);
print("Object Y\n");
print_r($y);
Here's the output:
Now have 0 items
Inserted 2 items
Object X
Array
(
[_id] => MongoId Object
(
[$id] => 4d8d124b6803fa623b000000
)
[x] => blah
)
Removed 1 item, now have 1 items
Object Y
Are you sure there's not a typo somewhere?
Unlike PHP's == operator, Mongo's equality matching always uses "object equality", which is like PHP's identical comparison operator (===) or Java's .equals(). While your code looks as though it should work (and it does work fine for me with a test dataset), something about your dataset may be causing PHP to cast the returned MongoId to a string. Read more about MongoId in the PHP driver documentation.
Make sure that your query is supplying a MongoId for comparison by doing a var_dump of the query itself. Also, make sure that you are running the latest version of the PHP Mongo driver.
Since PHP is loosely typed, it is very important to cast all input values and search values to the expected, consistent data type; otherwise the query will not find the document you expect.
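A minimal Python illustration of the casting advice: a string never equals an ObjectId, so the query must carry the same type that is stored. ObjectId here is a hypothetical stand-in class, not the PHP MongoId or the bson package, and find_one is a toy matcher.

```python
class ObjectId:
    """Hypothetical stand-in for a BSON ObjectId / PHP MongoId."""

    def __init__(self, hex_str):
        self.hex = hex_str

    def __eq__(self, other):
        # Strict: only another ObjectId with the same hex value is equal.
        return isinstance(other, ObjectId) and other.hex == self.hex

    def __hash__(self):
        return hash(self.hex)


def find_one(docs, query):
    """Toy matcher: every query field must be present and compare equal."""
    for d in docs:
        if all(k in d and d[k] == v for k, v in query.items()):
            return d
    return None


def as_object_id(value):
    """Cast query input to the stored type before searching."""
    return value if isinstance(value, ObjectId) else ObjectId(str(value))
```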
While using MongoDB within PHP I make it a point to cast everything intentionally in order to avoid any possible confusion or error.
Also, the mongodb-user group at groups.google.com is very good and responsive so if you are not already utilizing that resource I would definitely consider joining it.