MongoDB: Inconsistent query results (corrupted db?)

Our app has been running smoothly for about 2 years now (at least in the following aspect).
But a few weeks ago some of our clients started complaining about an important count in the system. We investigated it, reduced the example to a minimal case, and ended up with this weird situation:
db.my_collection.find().count()
=> 250088
db.my_collection.distinct("field_1")
=> [false, true]
db.my_collection.find({"field_1":{$nin:[true, false]}}).count()
=> 0
db.my_collection.find({"field_1":{$in:[true, false]}}).count()
=> 250088
db.my_collection.find({"field_1":true}).count()
=> 140357
db.my_collection.find({"field_1":false}).count()
=> 109731
(140357 + 109731 = 250088, great!)
db.my_collection.find({"field_2":null}).count()
=> 4638
db.my_collection.find({"field_2":{$ne:null}}).count()
=> 245450
(4638 + 245450 = 250088, great!)
db.my_collection.find({"field_2":{$ne:null}, "field_1":{$in:[true, false]}}).count()
=> 245450 (great!)
But look at this:
db.my_collection.find({"field_2":null, "field_1":false}).count()
=> 2669
db.my_collection.find({"field_2":null, "field_1":true}).count()
=> 790
2669 + 790 = 3459 (should sum 4638 !!)
db.my_collection.find({"field_2":{$ne:null}, "field_1":false}).count()
=> 75795
db.my_collection.find({"field_2":{$ne:null}, "field_1":true}).count()
=> 107298
75795 + 107298 = 183093 (should sum 245450 !!)
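The sanity check used above is that counts over a partition of the collection must add up to the count of the whole. A minimal Python sketch of that check, using only the numbers reported above (the helper name `partition_gap` is made up for illustration):

```python
def partition_gap(total, parts):
    """Return how many documents are unaccounted for when `parts` should sum to `total`."""
    return total - sum(parts)

# Single-field partitions are consistent (gap of 0).
print(partition_gap(250088, [140357, 109731]))   # field_1: true / false
print(partition_gap(250088, [4638, 245450]))     # field_2: null / not null

# Compound queries lose documents (non-zero gap).
print(partition_gap(4638, [2669, 790]))          # field_2 null, split by field_1
print(partition_gap(245450, [75795, 107298]))    # field_2 not null, split by field_1
```

The two compound splits come up 1179 and 62357 documents short, respectively.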
Also1: This problem only happens with this specific combination of fields in the query. If we combine each of these two fields with others, everything works fine (we didn't do an exhaustive search here, but we tried a bunch of them).
Also2: We tried the queries with various different hint values (including {$natural:1}) to see if it could be a corrupted index, but it is still broken.
Also3: We resynced one of our secondaries, and the error persists.
Also4: We copied the collection, document by document, through the console with a find().forEach({insert}) query (same db version), and IT WORKS CORRECTLY!!!
Also5: we're using mongodb 2.4.8.
WTF might be happening?
What should I do to correct it RIGHT?
What should I do to correct it FAST?
===== UPDATE 18-sep
We found out that we had some indexes with a key that was too large, and mongodb 2.4 just logs this and ignores it. So the index was silently breaking. (It looks like mongo 2.6 handles this in a more polished way: http://docs.mongodb.org/master//release-notes/2.6-compatibility/#index-key-length-incompatibility )
We dropped those indexes (they were just some tests we had made and forgotten about) and the queries went back to normal.
We still don't understand why we could not get correct results when giving hints, even with $natural:1...
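For anyone hunting the same problem, a rough way to spot values that could exceed the index key limit is to measure their encoded size. The sketch below is plain Python and does not touch the database. It assumes the pre-2.6 limit of 1024 bytes per index entry and approximates the key size by the UTF-8 length of a string value; the real entry also carries some BSON overhead, so the threshold is only approximate. `oversized_values` and the sample documents are hypothetical.

```python
INDEX_KEY_LIMIT = 1024  # assumed limit in bytes; the real entry also includes BSON overhead

def oversized_values(docs, field, limit=INDEX_KEY_LIMIT):
    """Yield (_id, size) for docs whose string `field` is at or over `limit` bytes."""
    for doc in docs:
        value = doc.get(field)
        if isinstance(value, str):
            size = len(value.encode("utf-8"))
            if size >= limit:
                yield doc["_id"], size

# Hypothetical sample data, standing in for documents read from the collection.
docs = [
    {"_id": 1, "notes": "short"},
    {"_id": 2, "notes": "x" * 2000},
    {"_id": 3},  # field missing entirely
]
print(list(oversized_values(docs, "notes")))
```

Documents flagged this way are the ones an index on that field may have skipped.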

Lucene.NET version 4.8 beta casing issue [duplicate]

I'm using Lucene.NET version 4.8 (beta) for a small search task in a solution I'm working on, but I have problems searching case-insensitively. I know that Lucene isn't case insensitive, but when using the StandardAnalyzer, it should lowercase the stored data (according to the StandardAnalyzer documentation), as long as you make sure the queries are done right.
So any idea what I'm doing wrong here? I've stored the data "Kirsten" in a field in 4 different documents, and when searching for (lowercased) "kirsten" I get no hits, but when searching for "Kirsten" I get the expected 4.
Here's my query code:
query = query.ToLowerInvariant();
BooleanQuery q = new BooleanQuery {
new BooleanClause(new WildcardQuery(new Term(FieldNames.Name, query + WildcardQuery.WILDCARD_STRING)), Occur.SHOULD),
new BooleanClause(new WildcardQuery(new Term("mt-year", query)), Occur.SHOULD),
new BooleanClause(new WildcardQuery(new Term("mt-class", query + WildcardQuery.WILDCARD_STRING)), Occur.SHOULD)
};
And the issue is that the users would always write the lowercase version, and expect it to find both lower- and upper-case.
As @Peska wrote in the comments, this was a case of using StringField instead of TextField when adding the document (and data) to Lucene.
Once I switched to using TextField, everything worked as expected.
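Why the field type matters can be shown with a toy model. This is not the Lucene API, only a Python sketch under simplifying assumptions: an un-analyzed field keeps the whole value as one verbatim term, while an analyzed field is split on whitespace and lowercased. The function names are invented for illustration.

```python
def index_exact(values):
    """Toy model of an un-analyzed field: each whole value is one term, stored verbatim."""
    return set(values)

def index_analyzed(values):
    """Toy model of an analyzed field: split on whitespace and lowercase each token."""
    return {token.lower() for value in values for token in value.split()}

def prefix_hit(terms, query):
    """Mimic a trailing-wildcard query: does any indexed term start with `query`?"""
    return any(term.startswith(query) for term in terms)

names = ["Kirsten"]
query = "Kirsten".lower()   # the query code lowercases the user input

print(prefix_hit(index_exact(names), query))      # False: "Kirsten" was indexed as-is
print(prefix_hit(index_analyzed(names), query))   # True: "kirsten" was indexed
```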

Performance issue with fluent query in EF vs SharpRepository

I was having some performance issues using SharpRepository, and after playing around the SQL Query Profiler I found the reason.
With EF I can do stuff like this:
var books = db.Books.Where(item => item.Year == "2016");
if (!string.IsNullOrEmpty(search_author))
    books = books.Where(item => item.Author.Contains(search_author));
return (books.ToList());
EF will not really do anything until books is used (last line) and then it will compile a query that will select only the small set of data matching year and author from the db.
But SharpRepository evaluates books at once, so this:
var books = book_repo.Books.FindAll(item => item.Year == "2016");
if (!string.IsNullOrEmpty(search_author))
    books = books.Where(item => item.Author.Contains(search_author));
return (books.ToList());
will compile a query like "select * from Books where Year = '2016'" at the first line, and get ALL those records from the database! Then at the second line it will search for the author in C# code, in memory... That behaviour can make a major difference in performance with large databases, and it explains why my queries timed out...
I tried using repo.GetAll().Where() instead of repo.FindAll().... but it worked the same way.
Am I misunderstanding something here, and is there a way around this issue?
You can use repo.AsQueryable(), but by doing that you lose some of the functionality that SharpRepository can provide, like caching or any aspects/hooks you are using. It basically takes you out of the generic repo layer and lets you use the underlying LINQ provider. It has its benefits for sure, but in your case you can just build the predicate conditionally and pass that in to the FindAll method.
You can do this by building an Expression predicate or using Specifications. Working with the Linq expressions does not always feel clean, but you can do it. Or you can use the Specification pattern built into SharpRepository.
ISpecification<Book> spec = new Specification<Book>(x => x.Year == 2016);
if (!string.IsNullOrEmpty(search_author))
{
spec = spec.And(x => x.Author.Contains(search_author));
}
return repo.FindAll(spec);
For more info on Specifications you can look here: https://github.com/SharpRepository/SharpRepository/blob/develop/SharpRepository.Samples/HowToUseSpecifications.cs
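The point of the specification approach is that the whole filter is composed before the repository is asked for anything. A language-neutral sketch of that idea in Python follows; `Spec` and `FakeRepo` are invented stand-ins, not SharpRepository types, and the in-memory filter plays the role of the WHERE clause a real repository would send to the database.

```python
class Spec:
    """Tiny stand-in for a specification: wraps a predicate and supports AND."""
    def __init__(self, predicate):
        self.predicate = predicate

    def and_(self, other):
        # Combine this spec's predicate with another predicate function.
        return Spec(lambda item: self.predicate(item) and other(item))

class FakeRepo:
    """Hypothetical repository that records how many rows it hands back."""
    def __init__(self, rows):
        self.rows = rows
        self.returned = 0

    def find_all(self, spec):
        result = [row for row in self.rows if spec.predicate(row)]
        self.returned = len(result)
        return result

books = [
    {"year": 2016, "author": "Ann Lee"},
    {"year": 2016, "author": "Bob Ray"},
    {"year": 2015, "author": "Ann Lee"},
]
search_author = "Ann"

# Build the complete filter first, then query once.
spec = Spec(lambda b: b["year"] == 2016)
if search_author:
    spec = spec.and_(lambda b: search_author in b["author"])

repo = FakeRepo(books)
print(repo.find_all(spec), repo.returned)
```

Only the single matching row leaves the repository, instead of every 2016 book being fetched and filtered afterwards.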
Ivan Stoev provided this answer:
"The problem is that most of the repository methods return IEnumerable. Try repo.AsQueryable(). "

Exception in Expression Trees

This is my model:
- Business
  - BusinessType - FK
  - Categories (*) - FK
  - Branch (*)
    - BranchType - FK
    - Address
    - Phone (*)
    - CustomFields (*)
    - OpeningTimes (*)
      - WorkingPeriods (*)
  - .....
Now I have a controller-action that accepts a form that consists of the whole bunch of data as a single Business entity with all its properties and collections set fine.
Now I have to walk through all the properties and collections recursively and compare them with the database graph: if they don't exist, add them; if they do, walk through their properties again and do the same one level deeper, until no navigation properties are left. Since I have many more properties and descendants than shown in the example above, it's just insane to walk through them manually.
Thanks to this answer I found GraphDiff which offered a brilliant solution to the situation.
Here's an update query I'm calling:
Context.UpdateGraph(business, bus => bus
    .AssociatedEntity(bu => bu.BusinessType)
    .AssociatedCollection(bu => bu.Categories)
    .OwnedCollection(bu => bu.Branches, branch => branch
        .AssociatedEntity(b => b.BranchType)
        .OwnedEntity(b => b.Address)
        .OwnedCollection(b => b.Phones)
        .OwnedCollection(b => b.CustomFields)
        .OwnedCollection(b => b.OpeningTimes, openingTimes => openingTimes
            .OwnedCollection(b => b.WorkingPeriods)
        )
    )
);
It throws this exception:
System.InvalidCastException: Unable to cast object of type 'System.Linq.Expressions.MethodCallExpressionN' to type 'System.Linq.Expressions.MemberExpression'.
I tried debugging the source code, but I'm not an expert with expression trees. The problem occurs when the internal Include call (which includes the object graph when loading the stored object) tries to attach WorkingPeriods; it looks like it isn't prepared for that depth of recursion. I messed around with it a bit, but I'm sure someone with extensive knowledge of expression trees will be able to solve this easily. Any suggestions on that would be appreciated too.
Here's what the generated include path expression is supposed to look like:
.Include(b =>
b.Branches.Select(br =>
br.OpeningTimes.Select(ot =>
ot.WorkingPeriods)));
Here's the stacktrace of the error.
Essentially, the exception is thrown because the recursive call returns the inner include as a method call, without processing it and returning the collection property it's meant to expose.
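To make the expected shape concrete, the nested include path can be built recursively from a list of property names, with each level wrapping the next in a Select call. The Python sketch below only produces the expression as text. `include_path` and the lambda parameter names `p0`, `p1`, ... are invented for illustration and are not part of GraphDiff.

```python
def include_path(props, depth=0):
    """Build a nested 'x => x.A.Select(y => y.B...)' string from property names."""
    var = f"p{depth}"
    head, *rest = props
    if not rest:
        # Innermost level: expose the collection property itself.
        return f"{var} => {var}.{head}"
    # Outer level: wrap the deeper path in a Select over this collection.
    return f"{var} => {var}.{head}.Select({include_path(rest, depth + 1)})"

print(include_path(["Branches", "OpeningTimes", "WorkingPeriods"]))
```

The innermost call has to return the plain property access; returning another method call at that level corresponds to the failure described above.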
Sorry it took me a while to get back to you.
It's 3am and I've had a fair bit of wine, but the problem is fixed :) If you get the latest version of the code at https://github.com/refactorthis/GraphDiff it should work fine.
I'll update the new nuget package (RefactorThis.GraphDiff) soon.

PHP mongoDb driver , fetching data and then deleting it , is not working

This is the sample code :
$r = $coll->findOne();
$coll->remove(array("_id"=>$r["_id"])); // use the same object id as retrieved from the DB
$ret=$coll->findOne(array("_id"=>($r["_id"])));
var_dump($ret); // dumps the record that was supposed to be deleted
The records in the collection have MongoDB ObjectIds as _id, not strings.
The same logic works fine in the console and deletes the record correctly.
This is working for me. Here's the code:
$coll->drop();
print("Now have ".$coll->count()." items\n");
$coll->insert(array("x" => 'blah'));
$coll->insert(array("x" => "blahblah"));
print("Inserted ".$coll->count()." items\n");
$x = $coll->findOne();
print("Object X\n");
print_r($x);
$query_x = array('_id' => $x['_id']);
$coll->remove($query_x);
print("Removed 1 item, now have ".$coll->count()." items\n");
$y = $coll->findOne($query_x);
print("Object Y\n");
print_r($y);
Here's the output:
Now have 0 items
Inserted 2 items
Object X
Array
(
[_id] => MongoId Object
(
[$id] => 4d8d124b6803fa623b000000
)
[x] => blah
)
Removed 1 item, now have 1 items
Object Y
Are you sure there's not a typo somewhere?
Unlike the php == operator, mongo's equality operator always uses "Object equality" which is like php's identical comparison operator (===) or java's .equals(). While your code looks as though it should work (and it does work fine for me with a test dataset), something about your dataset may be causing php to cast the returned MongoId to a string. Read more about MongoId here.
Make sure that your query is supplying a MongoId for comparison by doing a var_dump of the query itself. Also, make sure that you are running the latest version of the PHP Mongo driver.
Since PHP is loosely typed, it is important to cast all input and search values to the expected, consistent data type; otherwise the query will not locate the expected document.
While using MongoDB within PHP I make it a point to cast everything intentionally in order to avoid any possible confusion or error.
Also, the mongodb-user group at groups.google.com is very good and responsive so if you are not already utilizing that resource I would definitely consider joining it.
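The type-sensitivity described in these answers can be illustrated without a database. The Python sketch below uses a minimal stand-in `ObjectId` class (not the real driver type) and a hypothetical `find_one` helper, to show that an id passed as a plain string never matches a stored object id with the same hex value.

```python
class ObjectId:
    """Minimal stand-in for a driver's ObjectId type (not the real class)."""
    def __init__(self, hex_id):
        self.hex_id = hex_id

    def __eq__(self, other):
        # Equal only to another ObjectId with the same value, never to a str.
        return isinstance(other, ObjectId) and self.hex_id == other.hex_id

    def __hash__(self):
        return hash(("ObjectId", self.hex_id))

def find_one(docs, query_id):
    """Return the first doc whose _id matches by type and value, else None."""
    return next((d for d in docs if d["_id"] == query_id), None)

docs = [{"_id": ObjectId("4d8d124b6803fa623b000000"), "x": "blah"}]

# A string with the same hex digits does not match.
print(find_one(docs, "4d8d124b6803fa623b000000") is None)
# A value of the right type does.
print(find_one(docs, ObjectId("4d8d124b6803fa623b000000"))["x"])
```

If the id ever gets converted to a string on the way into the query, both the remove and the follow-up lookup would silently match nothing.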

restrict documents for mapreduce with mongoid

I implemented the Pearson product-moment correlation via map / reduce / finalize. The missing part is restricting the documents (representing users) to be processed via a filter query. For a simple query like
mapreduce(mapper, reducer, :finalize => finalizer, :query => { :name => 'Bernd' })
I get this to work.
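For context, this is roughly how a Pearson correlation splits into map / reduce / finalize steps. It is a generic Python sketch, not the code used in this question: the mapper emits partial sums per data point, the reducer adds them, and the finalizer turns the totals into the coefficient. All names are illustrative.

```python
import math
from functools import reduce

def mapper(pair):
    """Emit the partial sums needed for Pearson's r from one (x, y) pair."""
    x, y = pair
    return {"n": 1, "sx": x, "sy": y, "sxy": x * y, "sxx": x * x, "syy": y * y}

def reducer(a, b):
    """Combine two partial-sum records; associative, so it can be re-applied."""
    return {key: a[key] + b[key] for key in a}

def finalizer(s):
    """Turn the accumulated sums into the correlation coefficient."""
    num = s["n"] * s["sxy"] - s["sx"] * s["sy"]
    den = math.sqrt((s["n"] * s["sxx"] - s["sx"] ** 2) * (s["n"] * s["syy"] - s["sy"] ** 2))
    return num / den if den else None  # undefined when either variable is constant

pairs = [(1, 2), (2, 4), (3, 6)]  # perfectly correlated sample data
print(finalizer(reduce(reducer, map(mapper, pairs))))
```

A filter query decides which pairs reach the mapper in the first place, which is the part this question is about.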
But my filter criteria are a little more complicated: I have one set of preferences which needs to have at least one element in common, and another set of preferences which must not have any element in common. In a later step I also want to restrict this to documents (users) within a certain geographical distance.
Currently I have this code working in my map function, but I would prefer to separate this into either query params as supported by mongoid or a javascript function. All my attempts to solve this failed since the code is either ignored or raises an error.
I did a couple of tests.
[results deleted, see below]
I'm using ruby 1.9.2, mongodb 1.6.5-x86_64, and the mongoid 2.0.0.beta.20, mongo 1.1.5 and bson 1.1.5 gems on MacOS.
What am I doing wrong?
Thanks in advance.
More Tests (problem partly solved):
User.where(:name.in => ["Arno", "Bernd", "Claudia"])
works, however when using mapreduce like this
mapreduce(..., :query => { :name.in => ['Arno', 'Bernd', 'Claudia'] })
results in the bson error serialize: keys must be strings or symbols (TypeError) mentioned above.
Replacing :name.in with 'name.in' or 'name.$in' makes the query return no results (guess this is passed as is to mongodb).
However
mapreduce(..., :query => { :name => { '$in' => ['Arno', 'Bernd', 'Claudia'] } })
works, but I didn't have any success with my geospatial query attempts, no matter how I wrote the expression.
mapreduce(..., :query => {:location => { "$near" => [ 13, 52, 1 ] } })
results in this error_message: Database command 'mapreduce' failed: {"assertion"=>"manual matcher config not allowed",....
If anyone could give me an idea how to write a geospatial query using near or within and working with mapreduce I would be very happy. (I didn't play with sets yet, see above).
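The working $in variant above shows that the :query option expects raw MongoDB operator syntax rather than the criteria shorthand. A small Python sketch of that rewrite follows, assuming 'field.op' string keys as input; `expand_criteria` and its list of operators are made up for illustration and cover only simple comparison operators, not geospatial ones.

```python
def expand_criteria(criteria):
    """Rewrite 'field.op' keys into the nested {'field': {'$op': value}} form."""
    query = {}
    for key, value in criteria.items():
        field, sep, op = key.rpartition(".")
        if sep and op in {"in", "nin", "ne", "gt", "gte", "lt", "lte"}:
            query.setdefault(field, {})["$" + op] = value
        else:
            query[key] = value  # plain equality match, left untouched
    return query

print(expand_criteria({"name.in": ["Arno", "Bernd", "Claudia"]}))
```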