MongoDB find() with multiple conditions on the same key

I encountered unexpected behavior in find() on MongoDB Community 5.0.6.
My collection is:
{ _id: 1, a: 1, b: 1 }
{ _id: 2, a: 2, b: 2 }
{ _id: 3, a: 3, b: 3 }
When I execute
db.selection.find({a:1,b:2})
I get an empty result set, as expected for the condition a=1 AND b=2 (AND logic in the selection).
But when I execute
db.selection.find({a:1,a:2})
which should represent the logical condition a=1 AND a=2,
I get this result:
{ _id: 2, a: 2, b: 2 }
Honestly, I was expecting an empty set again.
Can someone explain this?
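A likely explanation, offered as a sketch rather than a confirmed diagnosis: the shell parses the filter as a JavaScript object literal, where a repeated key keeps only its last value, so {a:1,a:2} would reach the server as {a:2}. Python dict literals collapse duplicate keys the same way, which the example below uses as an analogy. The matches helper is a hypothetical stand-in that handles only field equality and $and; it is not MongoDB's matching logic. Writing the filter with $and keeps both conditions.

```python
# Sample documents copied from the question.
docs = [
    {"_id": 1, "a": 1, "b": 1},
    {"_id": 2, "a": 2, "b": 2},
    {"_id": 3, "a": 3, "b": 3},
]

# A repeated key in a dict literal keeps only the last value.
collapsed = {"a": 1, "a": 2}

# $and keeps both conditions as separate sub-filters.
explicit = {"$and": [{"a": 1}, {"a": 2}]}


def matches(doc, query):
    """Hypothetical matcher: supports only field equality and $and."""
    for key, value in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in value):
                return False
        elif doc.get(key) != value:
            return False
    return True


collapsed_result = [d for d in docs if matches(d, collapsed)]
explicit_result = [d for d in docs if matches(d, explicit)]

print(collapsed)         # {'a': 2}
print(collapsed_result)  # [{'_id': 2, 'a': 2, 'b': 2}]
print(explicit_result)   # []
```

In the shell, the equivalent explicit form would be db.selection.find({$and: [{a: 1}, {a: 2}]}).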

Partitions are traversed multiple times invalidating Accumulator consistency

We are trying to use Accumulators to count RDDs without having to force .count() on them for efficiency reasons. We are aware tasks can fail and re-run, which will invalidate the value of the accumulator, so we count the number of times a partition has been traversed, so we can detect this.
The problem is that partitions are being traversed multiple times even though:
- We cache the RDD in memory after applying the logic below.
- No tasks are failing and no executors are dying.
- There is plenty of memory (no RDD eviction).
The code we use:
val count: LongAccumulator
val partitionTraverseCounts: List[LongAccumulator]
def increment(): Unit = count.add(1)
def incrementTimesCalled(partitionIndex: Int): Unit =
  partitionTraverseCounts(partitionIndex).add(1)
def incrementForPartition[T](index: Int, it: Iterator[T]): Iterator[T] = {
  incrementTimesCalled(index)
  it.map { x =>
    increment()
    x
  }
}
How we use the above:
rdd.mapPartitionsWithIndex(safeCounter.incrementForPartition)
We have a 50-partition RDD, and we frequently see odd traverse counts:
traverseCounts: List(2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2)
As you can see, some partitions are traversed twice, while others are traversed only once.
To confirm no task failures:
cat job.log | grep -i task | grep -i fail
To confirm no memory issues:
cat job.log | grep -i memory
Every log line shows multiple GB of free memory.
We also don't see any errors or exceptions.
Question:
Why is Spark traversing a cached RDD multiple times?
Is there any way to disable this?
See also:
https://issues.apache.org/jira/browse/SPARK-40048

How to split a one-line object literal across multiple lines so that each key has its own line?

I have this:
method({ a: 1, b: 2, c: 3 });
I want to find the magic key in VSCode that turns it into this:
method({
  a: 1,
  b: 2,
  c: 3
});
And vice versa.

How to retrieve all documents with the $match query in MongoDB Atlas

I am getting the values a and b from the user through an HTML form and passing them to the query below. I need to retrieve documents based on the a and b values, and if they are empty I need to retrieve all the documents. Can someone help me with the query, please? What should I pass instead of search_data["a"] and search_data["b"] to get all the documents?
query = user_collection.aggregate([
    {
        "$project": {
            "_id": 0
        }
    },
    {
        "$match": {
            "a": search_data['a'],
            "b": search_data['b'],
        }
    }
])
If you are only doing a match and a projection, you don't need an aggregation pipeline; the much simpler find() operation is enough.
The code below takes your search_data dict and uses a dict comprehension to build a search_filter containing only the keys that have data, i.e. it drops None and empty-string ('') values. If every value is empty, the filter is {}, which matches all documents. It is also a nicer solution because you can add more fields without changing the code.
search_filter = {k: v for (k, v) in search_data.items() if not (v is None or v == '')}
query = user_collection.find(search_filter, {'_id': 0})
Full worked example:
from pymongo import MongoClient

db = MongoClient()['mydatabase']
user_collection = db.user_collection

def filtered_query(search_data):
    # Keep only the keys whose value is neither None nor an empty string
    search_filter = {k: v for (k, v) in search_data.items() if not (v is None or v == '')}
    print(search_filter)  # Just to show what it is doing
    return user_collection.find(search_filter, {'_id': 0})

# Test it out!
filtered_query({'a': 1, 'b': ''})
filtered_query({'a': None, 'b': 3, 'c': 'x'})
filtered_query({'a': 'x123', 'b': 3, 'c': 'x', 'd': None, 'e': '', 'f': 'f'})
gives:
{'a': 1}
{'b': 3, 'c': 'x'}
{'a': 'x123', 'b': 3, 'c': 'x', 'f': 'f'}
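The case the question actually asks about, where every form field is empty, can be checked without a database. The sketch below isolates the filter-building step into a helper named build_filter (a hypothetical name, not part of the answer above) and shows that empty input yields an empty filter. An empty filter passed to find() matches every document in the collection.

```python
def build_filter(search_data):
    """Drop keys whose value is None or an empty string."""
    return {k: v for k, v in search_data.items() if not (v is None or v == '')}


# Both form fields left blank: nothing survives the filtering step.
empty_form = build_filter({'a': '', 'b': None})

# Only one field filled in: the other key is dropped.
partial_form = build_filter({'a': 1, 'b': ''})

# Falsy but meaningful values such as 0 are kept.
zero_value = build_filter({'a': 0, 'b': None})

print(empty_form)    # {}
print(partial_form)  # {'a': 1}
print(zero_value)    # {'a': 0}
```

Because the check tests specifically for None and '', a legitimate value such as 0 or False still ends up in the filter.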

Scala "takeWhile" flow confusion

I am currently learning Scala and am forcing myself to write as much as possible in a functional programming style.
The following code has a flow that I don't understand:
object Testing {
  def XForm(i: Int) = {
    println(i)
    if (i < 3) "%d".format(i * i) else ""
  }
  def main(args: Array[String]) {
    print(Range(0, 6).map(XForm).takeWhile(_.nonEmpty))
  }
}
The output is as follows:
0
1
2
3
4
5
Vector(0, 1, 4)
Why is XForm called for the values 4 and 5? I thought the 'loop' using takeWhile (in comparison to filter) terminates on the first false occurrence?
How can I solve this in a different (functional style) way?
The map on Range is strict, so it is evaluated immediately. That is, if you remove the takeWhile, you'll see everything evaluates before you even get to where the takeWhile would happen:
scala> Range(0, 6).map(XForm)
0
1
2
3
4
5
res1: scala.collection.immutable.IndexedSeq[String] = Vector(0, 1, 4, "", "", "")
You can solve this by using a view, which will lazily evaluate the collection.
scala> Range(0, 6).view.map(XForm).takeWhile(_.nonEmpty).force
0
1
2
3
res4: Seq[String] = Vector(0, 1, 4)
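The same strict-versus-lazy distinction can be reproduced outside Scala. As an analogy only, here is a Python sketch in which a list comprehension plays the role of the strict map and the built-in map() iterator plays the role of the view; x_form and the calls list are illustrative names, not part of the original code.

```python
from itertools import takewhile

calls = []  # records which inputs x_form was invoked with


def x_form(i):
    calls.append(i)
    return str(i * i) if i < 3 else ""


# Strict: the list comprehension runs x_form on every input first,
# and only then does takewhile look at the results.
strict = list(takewhile(bool, [x_form(i) for i in range(6)]))
strict_calls = list(calls)

calls.clear()

# Lazy: map() produces one element at a time, so takewhile stops
# pulling as soon as it sees the first empty string.
lazy = list(takewhile(bool, map(x_form, range(6))))
lazy_calls = list(calls)

print(strict, strict_calls)  # ['0', '1', '4'] [0, 1, 2, 3, 4, 5]
print(lazy, lazy_calls)      # ['0', '1', '4'] [0, 1, 2, 3]
```

In both versions the function is still called for 3, because takewhile has to see the first failing element before it can stop.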

Order of parameters to foldRight and foldLeft in Scala

Why does foldLeft take
f: (B, A) => B
and foldRight take
f: (A, B) => B
foldLeft could have been written to take f: (A, B) => B.
I am trying to understand the reasoning for the difference in the order of parameters.
It's supposed to show you the direction of the aggregation. foldLeft aggregates from left to right, so you can imagine the accumulator B as bunching up stuff on the left side as it approaches each A.
If you have something like:
Vector(1,2,3,4,5).foldLeft(0)((b,a) => b + a)
Then you get this behavior:
B ...As...
---------------
(0), 1, 2, 3, 4, 5
(0+1), 2, 3, 4, 5
(0+1+2), 3, 4, 5
(0+1+2+3), 4, 5
(0+1+2+3+4), 5
(0+1+2+3+4+5)
foldRight, on the other hand, aggregates things from the right side. So if you have something like:
Vector(1,2,3,4,5).foldRight(0)((a,b) => a + b)
Then you get this behavior:
...As... B
-----------------
1, 2, 3, 4, 5 (0)
1, 2, 3, 4, (5+0)
1, 2, 3, (4+5+0)
1, 2, (3+4+5+0)
1, (2+3+4+5+0)
(1+2+3+4+5+0)
@dhg already provided a great answer. My example illustrates an interesting subtlety: sometimes the order in which the initial value is passed to the given function matters. So I figured I'd post this on the off chance someone is interested in cases where foldRight can behave differently from foldLeft with the same initial value, same function, and same input list.
Consider the exponentiation below:
def verbosePower(base: Double, exp: Double) = {
  println(s"base=$base / exp=$exp")
  math.pow(base, exp)
}
var X = List(2.0, 3).foldLeft(1.0)(verbosePower)
System.out.println("x:" + X)
X = List(2.0, 3).foldRight(1.0)(verbosePower)
System.out.println("x:" + X)
The output and result from foldLeft are:
base=1.0 / exp=2.0
base=1.0 / exp=3.0
X: Double = 1.0
The output and result from foldRight are:
base=3.0 / exp=1.0
base=2.0 / exp=3.0
X: Double = 8.0
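To make the argument order concrete outside Scala, here is a minimal Python sketch of both folds built on functools.reduce. The names fold_left and fold_right are illustrative helpers, not library functions; they mirror the (B, A) => B and (A, B) => B shapes discussed above.

```python
from functools import reduce


def fold_left(f, init, xs):
    """Calls f(acc, x): the accumulator is the left argument."""
    return reduce(f, xs, init)


def fold_right(f, init, xs):
    """Calls f(x, acc): the accumulator is the right argument."""
    return reduce(lambda acc, x: f(x, acc), reversed(xs), init)


left = fold_left(pow, 1.0, [2.0, 3.0])    # pow(pow(1.0, 2.0), 3.0)
right = fold_right(pow, 1.0, [2.0, 3.0])  # pow(2.0, pow(3.0, 1.0))

print(left, right)  # 1.0 8.0
```

With a commutative and associative function such as addition the two folds agree, which is why the difference only shows up with something like exponentiation.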