MongoDB $lte strange results - mongodb

I am trying to implement a (semi-)random result from MongoDB; I know there are plenty of Q&As about this here on SO too. But as I queried MongoDB, just for fun, to see what I would get, $lte gave me some very strange results. Consider this dataset:
1. 0.011224885703995824
2. 0.01718393270857632
3. 0.03377954219467938
4. 0.09617210761643946
5. 0.10130057414062321
6. 0.13116577989421785
7. 0.25664394721388817
8. 0.27124307211488485
9. 0.3029055509250611
10. 0.31508319173008204
11. 0.3163822046481073
12. 0.34581731259822845
13. 0.5077376591507345
14. 0.5806738587561995
15. 0.5997774603310972
16. 0.6492975174915045
17. 0.710568506969139
18. 0.7257499841507524
19. 0.7275129975751042
20. 0.771076871547848
stored in a field called random, and filled with a rand function. There is an index on this field, but with 20 records I don't think that matters. But if I query with: .findOne( {'random': { $lte : 0.59 }} ) I get the result: "random" : 0.34581731259822845 (number 12), while expecting number 14...?
When I tried some more queries, the results were very strange. Sometimes I got what I expected, but most of the time it seemed to... well, I don't know. I do not understand how the query above leads to the result it gives...

Ah! I think I get it now - that is what Yogesh meant: findOne takes the first document, in natural order, that satisfies the criteria! Right, I did not get that. So that means that if the random number is high, it will often return the same document. So Yogesh's answer is correct - although he might have elaborated a bit :-)
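That behaviour can be sketched in plain Python. findOne returns the first document, in storage (natural) order, that satisfies the filter - not the document closest to the bound. The insertion order below is hypothetical, just to illustrate the effect:

```python
# Hypothetical natural (insertion) order -- NOT the sorted listing above.
docs = [
    {"random": 0.71056},   # inserted first, does not match
    {"random": 0.34581},   # first doc in natural order that matches <= 0.59
    {"random": 0.01122},
    {"random": 0.58067},
]

def find_one(collection, predicate):
    """Mimic findOne: return the first matching document in natural order."""
    return next((d for d in collection if predicate(d)), None)

result = find_one(docs, lambda d: d["random"] <= 0.59)
print(result)  # {'random': 0.34581} -- not the value closest to 0.59
```

So a query like $lte: 0.59 simply stops at the first match it scans, which is why the same "low" document keeps coming back.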

Related

Gremlin: Calculate division of based on two counts in one line of code

I have two counts, calculated as follows:
1) g.V().hasLabel('brand').where(__.inE('client_brand').count().is(gt(0))).count()
2) g.V().hasLabel('brand').count()
and I want to get one line of code that results in the first count divided by the second.
Here's one way to do it:
g.V().hasLabel('brand').
  fold().as('a','b').
  math('a/b').
    by(unfold().where(inE('client_brand')).count()).
    by(unfold().count())
Note that I simplify the first traversal to just .where(inE('client_brand')).count() since you only care to count that there is at least one edge, there's no need to count them all and do a compare.
You could also use union(), like:
g.V().hasLabel('brand').
  union(where(inE('client_brand')).count(),
        count()).
  fold().as('a','b').
  math('a/b').
    by(limit(local,1)).
    by(tail(local))
While the first one is a bit easier to read and follow, the second is arguably nicer because it only stores a list of the two counts, whereas the first stores a list of all the "brand" vertices, which would be more memory intensive.
Yet another way, provided by Daniel Kuppitz, that uses groupCount() in an interesting way:
g.V().hasLabel('brand').
  groupCount().
    by(choose(inE('client_brand'),
              constant('a'),
              constant('b'))).
  math('a/(a+b)')
The following solution that uses sack() step shows why we have math() step:
g.V().hasLabel('brand').
  groupCount().
    by(choose(inE('client_brand'),
              constant('a'),
              constant('b'))).
  sack(assign).
    by(coalesce(select('a'), constant(0))).
  sack(mult).
    by(constant(1.0)). /* we need a double */
  sack(div).
    by(select(values).sum(local)).
  sack()
If you can use lambdas then:
g.V().hasLabel('brand').
  union(where(inE('client_brand')).count(),
        count()).
  fold().
  map{ it.get()[0] / it.get()[1] }
This is what worked for me:
g.V().limit(1).project('client_brand_count','total_brands')
.by(g.V().hasLabel('brand')
.where(__.inE('client_brand').count().is(gt(0))).count())
.by(g.V().hasLabel('brand').count())
.map{it.get().values()[0] / it.get().values()[1]}
.project('brand_client_pct')
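For intuition, the ratio all of these traversals compute can be sketched in plain Python over a small hypothetical brand/edge dataset (the names are made up):

```python
# Hypothetical data: brand vertices and their incoming 'client_brand' edges.
brands = ["acme", "globex", "initech", "umbrella"]
client_brand_edges = {  # brand -> number of incoming client_brand edges
    "acme": 3,
    "globex": 0,
    "initech": 1,
    "umbrella": 0,
}

# Numerator: brands with at least one client_brand edge
# (this is why where(inE('client_brand')) suffices -- no full count needed).
with_clients = sum(1 for b in brands if client_brand_edges.get(b, 0) > 0)
# Denominator: all brand vertices.
total = len(brands)

ratio = with_clients / total
print(ratio)  # 0.5
```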

Seeking understanding of the positional operator in the case of a single update of non-periodic time series segment

I'm experimenting with a schema for recording some non-periodic time series data using pymongo. There's a decent amount of stuff out there about periodic time series, but not too much for non-periodic. Taking a leaf from this post.
In this schema, the collection documents each contain fixed count of data/time points. I'm initializing a segment with something like:
nonTime = datetime.fromordinal(1)  # create a time we can use as an "invalid" time entry
segment = {
    'times': [nonTime] * 64,
    'values': [float('nan')] * 64}
collection.insert(segment)
Armed with a beginner's understanding of the $ (positional) operator, I concocted the following update:
collection.update({'times': nonTime}, {'$set': {'times.$': now(), 'values.$': newValue}})
This works like a charm. I'm not sure how expensive the linear seek for the match is, but my data isn't so steady that I'm terribly worried about that part. It keeps the segments small and nice.
What I would like to do though is avoid adding newValue when the previous element is the same value. For example, if my segment looks like
{
    'times': [t1, t2, t3, nonTime, ... nonTime],
    'values': [42.0, 13.0, 42.0, nan, ... nan]}
I want to add a new t4/newValue only if newValue is not equal to that last 42.0 at slot 2 (zero indexed).
I tried the following largish expression on a lark:
collection.update({'$and': [{'times': nonTime}, {'values.$-1': {'$ne': newValue}}]}, {same_set_as_above})
Amazingly, it didn't blow up. But then... it didn't work either. It appended the duplicate 42.0 anyway. Am I just expecting too much of the positional operator? Is there a different way to do this? I was trying to keep it as a single transaction.
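The intended rule is easy to state client-side; here is a plain-Python sketch of it (not atomic - doing this server-side in one update is exactly the open question, and `values.$-1` is not a path the positional operator understands):

```python
# Client-side sketch of "append only if the value changed".
# None stands in for the nonTime sentinel; strings stand in for datetimes.
segment = {
    "times": ["t1", "t2", "t3", None, None],
    "values": [42.0, 13.0, 42.0, float("nan"), float("nan")],
}

def append_if_changed(seg, t, new_value):
    """Fill the first empty slot, but only if new_value differs from
    the last recorded value. Returns True if the segment was modified."""
    try:
        slot = seg["times"].index(None)        # first unfilled position
    except ValueError:
        return False                           # segment is full
    if slot > 0 and seg["values"][slot - 1] == new_value:
        return False                           # duplicate of last value
    seg["times"][slot] = t
    seg["values"][slot] = new_value
    return True

print(append_if_changed(segment, "t4", 42.0))  # False: same as last value
print(append_if_changed(segment, "t4", 7.0))   # True: value changed
```

A read-then-update like this loses the single-operation atomicity the question is after, which is why the server-side dedup check matters.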

mongodb - i cannot update with $pushAll and simple assignment at the same time

the following fails:
db.test.update({_id:102},{$pushAll:{our_days:["sat","thurs","frid"]}, country:"XYZ"}, {upsert:true})
error message: "Invalid modifier specified: country"
The correct way seems to be:
db.test.update({_id:102},{$pushAll:{our_days:["sat","thurs","frid"]}, $set:{country:"XYZ"}}, {upsert:true})
So is it the case that I cannot mix modifiers like "$pushAll" with simple assignments like field:value, in the same update document? Instead I have to use the $set modifier for simple assignments?
Is there anything in the docs that describes this behaviour?
This happens because db.test.update({_id : 1}, {country : 1}) would replace the whole document with {country : 1}, removing everything else.
So MongoDB, being smart, tells you: you are asking it to update a specific element and, at the same time, to replace the whole document (that element included) with country = 1. Most probably this is not what you want, so it raises an error instead.
Regarding the documentation - I think the best thing is to reread the MongoDB docs on update.
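The rule the server enforces can be sketched as: an update document must be either all operator keys ($-prefixed) or a plain whole-document replacement, never a mix. A hypothetical validator in Python:

```python
def classify_update(update_doc):
    """Mimic MongoDB's rule: an update document is either all operators
    ($-prefixed keys, a modification) or all plain fields (a whole-document
    replacement) -- mixing the two is rejected."""
    ops = [k for k in update_doc if k.startswith("$")]
    plain = [k for k in update_doc if not k.startswith("$")]
    if ops and plain:
        raise ValueError("Invalid modifier specified: " + plain[0])
    return "modify" if ops else "replace"

print(classify_update({"$pushAll": {"our_days": ["sat"]}, "$set": {"country": "XYZ"}}))  # modify
print(classify_update({"country": "XYZ"}))  # replace
# classify_update({"$pushAll": {...}, "country": "XYZ"}) raises ValueError
```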

Mongodb $strcasecmp. Strange behaviour when the field content has dollar signs

I'm trying to compare two strings in the MongoDB Aggregation Framework. This is the query I'm using:
db.people.aggregate({
    $project: {
        name: 1,
        balance: 1,
        compareBalance: { $strcasecmp: ["$balance", "$2,500.00"] }
    }
});
My problem is that each "balance" field has a dollar sign at the beginning of the string, and the results returned by the query seem to be incorrect. For example:
{
    "_id" : ObjectId("5257e2e7834a87e7ea509665"),
    "balance" : "$1,416.00",
    "name" : "Houston Monroe",
    "compareBalance" : 1
}
As you can see, the comparison result is 1, but it should be -1 because $2,500.00 is higher than $1,416.00. In fact, all comparisons have a value of 1.
There is a workaround using $substr to strip the dollar sign from the beginning of the fields, but I want to know who is doing this wrong: MongoDB or me.
Thanks in advance.
It sounds like you are trying to use the "balance" field as a numeric value, for example you might want to compare $10 to $100.
The best way to do this is to store the actual value, and add the formatting (the $, the commas, etc.) when displaying it to the user.
So you would have: balance: 2500
Slightly unrelated...
Not sure if you are doing much calculation on the value, but using binary floating-point numbers for currency is a bad idea (they can't accurately represent all numbers), so it's often better to store an integer number of cents (or, if high precision is required, an integer of hundredths of cents).
This could give: balanceCents: 250000 or balanceFourDec: 25000000
Then you can use $gt, $lt, and arithmetic.
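The "store integer cents" advice can be sketched in Python: lexicographic comparison of formatted balance strings orders incorrectly, while integer cents compare as intended (the parsing helper here is just illustrative):

```python
balances = ["$2,500.00", "$10,000.00", "$1,416.00"]

def to_cents(s):
    """Parse a formatted balance like '$1,416.00' into integer cents."""
    return round(float(s.lstrip("$").replace(",", "")) * 100)

# String comparison gets the order wrong: '1' sorts before '2', so
# "$10,000.00" compares as LESS than "$2,500.00".
print("$10,000.00" < "$2,500.00")                      # True (wrong order)
# Integer cents compare numerically, as intended:
print(to_cents("$10,000.00") > to_cents("$2,500.00"))  # True
print(sorted(balances, key=to_cents))
```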
The $ is used as a field reference operator, so the aggregation pipeline is trying to compare the field "$balance" with a field called "$2,500.00":
{
    "balance": "$5,000.00",
    "2,500.00": undefined
}
Of course, that's not what you are looking for.
You shouldn't start the data with a $. Also, unless you've got fixed-length strings, sorting and comparisons aren't going to work the way you would expect if you're trying to store numbers as strings. If you're just doing this as an example, I'd suggest you use the actual math operators for numbers, and leave $strcasecmp to actual strings.
You can use the { $literal: <value> } pipeline operator so the dollar sign is not interpreted as a field reference:
https://docs.mongodb.com/manual/reference/operator/aggregation/literal/

Mongodb - query with '$or' gives no results

There is an extremely weird thing happening in my database. I have a query:
db.Tag.find({"word":"foo"})
This matches one object, which is nice.
Now, there's second query
db.Tag.find({$or: [{"word":"foo"}]})
and the second one does not give any results.
There's some kind of magic I obviously don't understand :( What is wrong with the second query?
In theory, $or requires two or more parameters, so I can fake it with:
db.Tag.find({$or: [{"word":"foo"},{"word":"foo"}]})
but still, no results.
Your second query is perfectly fine, and it should work. Though the docs say that $or performs a logical operation on an array of two or more expressions, it also works for a single expression.
Here's a sample that you can try out to see it work:
> db.col.insert({"foo": "Rohit"})
> db.col.insert({"foo": "Aman", "bar": "Rohit"})
>
> db.col.find({"foo": "Rohit"})
{ "_id" : ObjectId("50ed6bb1a401d9b4576417f7"), "foo" : "Rohit" }
> db.col.find({$or: [{"foo": "Rohit"}]})
{ "_id" : ObjectId("50ed6bb1a401d9b4576417f7"), "foo" : "Rohit" }
So, as you can see, both of your queries work fine against my collection, which means something is wrong somewhere else. Are you sure you have data in your collection?
Okay, the server admin installed MongoDB from the Debian repo. The Debian repo had version 1.4.4 of MongoDB, and it looks like $or is simply not yet supported there :P