MongoDB: sort so documents that have a key come first (ascending) and documents where the key is null or missing come last

I have a collection of 15000 documents. Some documents have sr_no with a numeric value, and others are missing sr_no entirely.
I want all documents that have sr_no to come first, sorted ascending, followed by all the others.
I tried .find().sort({sr_no: 1}), but it returns all the null/missing entries first, then the sr_no values in ascending order.
This question looks close to a duplicate, but it differs slightly because the key here is numeric.
I answered it with the hack below.

I used a dirty hack for this.
The MongoDB docs define a fixed comparison/sort order across BSON types: when sorting ascending, null (and a missing key, which is treated as null) comes before numbers, and numbers come before strings.
So an ascending sort puts all the null/missing entries first, then the numeric entries.
What is the hack?
Store sr_no: "" (an empty string) as the default instead of omitting the field.
Now the sort returns the numeric values first, then the strings.
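A minimal shell sketch of the hack (hypothetical documents; the collection name is made up):
db.items.insert({sr_no: 3})
db.items.insert({sr_no: 1})
db.items.insert({sr_no: ""})  // "no serial number" stored as an empty string
db.items.find().sort({sr_no: 1})
// numbers sort before strings, so this returns sr_no 1, then 3, then ""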

Related

Firestore order by time but sort by ID

I have been trying to figure out a way to query a list of documents where I have a range filter on one field and order by another field, which of course isn't possible; see my other question: Order by timestamp with range filter on different field Swift Firestore
But is it possible to save documents with the timestamp as the ID, so that they sort that way by default? Or maybe hardcode an ID, then retrieve the last created document ID and increase it by one for the next post to be uploaded?
(A screenshot showed how the documents are ordered in the collection.)
Any ideas how to store documents so they are ordered by creation time in the collection?
It will order by document ID (ascending) by default in Swift.
You can use .order(by: "__id__"), but the better/documented way is with FieldPath.documentID(). I don't really know Swift, but I assume it's something like:
.order(by: FirebaseFirestore.FieldPath.documentID())
JavaScript likewise has a built-in FieldPath that simply resolves to __id__:
.orderBy(firebase.firestore.FieldPath.documentId())
Interestingly enough __name__ also works, but that sorts the whole path, including the collection name (and also the id of course).
If I correctly understood your need, by doing the following you should get the correct order:
For each document, add a numeric field, called for example sortNbr, and assign it a timestamp you calculate (e.g. the epoch time; see Get Unix Epoch Time in Swift).
Then build a query sorted on this field value, like:
let docRef = db.collection("xxxx")
docRef.order(by: "sortNbr")
See the doc here: https://firebase.google.com/docs/firestore/query-data/order-limit-data
Yes, you can do this.
By default, a query retrieves all documents that satisfy the query in
ascending order by document ID.
See the docs here: https://firebase.google.com/docs/firestore/query-data/order-limit-data
So if you find a way to use a timestamp or other primary key value where the ascending lexicographical ordering is what you want, you can filter by any fields and still have the results sorted by the primary key, ascending.
Be careful to zero-pad your numbers to a fixed maximum width if you use a numeric key such as seconds since the epoch or an integer sequence: "10" sorts lexicographically before "2", whereas "02" sorts before "10" as intended.
ISO-formatted YYYY-MM-DDTHH:MM:SS date-time strings also work, because they sort naturally in ascending order.
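For example, a small JavaScript sketch of the zero-padding idea (the helper name is made up):
// pad to a fixed width so lexicographic order matches numeric order
const toSortableId = n => String(n).padStart(10, "0");
["2", "10"].sort();                          // ["10", "2"], the lexicographic surprise
[toSortableId(2), toSortableId(10)].sort();  // ["0000000002", "0000000010"]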
The order of the documents shown in the Firebase console is mostly irrelevant to the functioning of your code that uses Firestore. The console is just for browsing data, and that sorting scheme makes it relatively intuitive to find a document you might be looking for, if you know its ID. You can't change this sort order in the console.
Your code is obviously going to have other requirements, and those requirements should be coded into your queries without regard to any sort order you see in the dashboard. If you want time-based ordering of your documents, you'll have to store some sort of timestamp field in the document and use that for ordering. I don't recommend using the timestamp as the ID of a document, as that could cause problems for you in the future.

Get Redis values while scanning

I've created a Redis key/value index this way:
set 7:12:321 '{"some":"JSON"}'
The key is delimited by a colon separator, each part of the key represents a hierarchic index.
get 7:12:321 means that I know the exact hierarchy and want only one single item
scan 7:12:* means that I want every item under id 7 in the first level of hierarchy and id 12 in the second layer of hierarchy.
Problem is: if I want the JSON values, I have to scan first (~50000 entries in a few ms), then get every key returned by scan, one by one (800ms).
This is not very efficient. And this is the only answer I found on Stack Overflow when searching "scanning Redis values".
1/ Is there another way of scanning Redis to get values or key/value pairs, and not only keys? I tried hscan as follows:
hset myindex 7:12:321 '{"some":"JSON"}'
hscan myindex 0 MATCH 7:12:*
But it destroys the performance (almost 4s for the 50000 entries).
2/ Is there another data structure in Redis I could use in the same way but which could "scan for values" (hset ?)
3/ Should I go with another data storage solution (PostgreSQL ltree for instance ?) to suit my use case with huge performance ?
I must be missing something really obvious, 'cause this sounds like a common use case.
Thanks for your answers.
Optimization for your current solution
Instead of getting every key returned by scan one by one, you should use mget to batch-get the values for many keys at once, or use pipelining to reduce RTT.
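A minimal redis-cli sketch of that (hypothetical keys; a single SCAN pass shown):
127.0.0.1:6379> SCAN 0 MATCH 7:12:* COUNT 1000
1) "0"
2) 1) "7:12:321"
   2) "7:12:322"
127.0.0.1:6379> MGET 7:12:321 7:12:322
1) "{\"some\":\"JSON\"}"
2) "{\"some\":\"JSON\"}"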
Efficiency problem of your current solution
The scan command iterates over all keys in the database, even when only a few keys match the pattern, so performance degrades as the total number of keys grows.
Another solution
Since the hierarchic indexes are integers, you can encode them into a single number and use that number as the score of a sorted set. This way, instead of searching by pattern, you can search by score range, which is very fast with a sorted set. Take the following as an example.
Say the last (right-most) hierarchic index is always less than 1000 and the second one is less than 100. Then you can encode an index such as 7:12:321 into a score: 321 + 12 * 1000 + 7 * 100 * 1000 = 712321. Then store the score and the value in a sorted set: zadd myindex 712321 '{"some":"JSON"}'.
When you want to search for keys that match 7:12:*, just use the zrangebyscore command to get data with a score between 712000 and 712999: zrangebyscore myindex 712000 712999 withscores.
This way you get the key (decoded from the returned score) and the value together. It should also be faster than the scan solution.
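A short redis-cli sketch of the round trip, using the encoding above:
127.0.0.1:6379> ZADD myindex 712321 '{"some":"JSON"}'
(integer) 1
127.0.0.1:6379> ZRANGEBYSCORE myindex 712000 712999 WITHSCORES
1) "{\"some\":\"JSON\"}"
2) "712321"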
UPDATE
The solution has a little problem: members of a sorted set must be unique, so you cannot have 2 keys with the same value (i.e. the same JSON string).
// adds a new member, returns (integer) 1
zadd myindex 712321 '{"the_same":"JSON"}'
// does NOT add a second member: the existing member's score is updated to 712322 and (integer) 0 is returned
zadd myindex 712322 '{"the_same":"JSON"}'
In order to solve this problem, you can combine the key with the json string to make it unique:
zadd myindex 712321 '7:12:321-{"the_same":"JSON"}'
zadd myindex 712322 '7:12:322-{"the_same":"JSON"}'
You could consider using a Sorted Set and lexicographical ranges as long as you only need to perform prefix searches. For more information about this and indexing in general refer to http://redis.io/topics/indexes
Updated with an example:
Consider the following -
$ redis-cli
127.0.0.1:6379> ZADD anotherindex 0 '7:12:321:{"some":"JSON"}'
(integer) 1
127.0.0.1:6379> ZRANGEBYLEX anotherindex [7:12: [7:12:\xff
1) "7:12:321:{\"some:\"JSON\"}"
Now go and read about this so you 1) understand what it does and 2) know how to avoid possible pitfalls :)

Mongo distinct query returns several values, one of which is "", but find null fails to find it

I ran a distinct query on a collection. The query syntax:
db.Collection.distinct("dict.field")
I got a set of results, one of which was "" (which I took to be null).
I then tried to find the record that had a null value for the field in question:
db.Collection.find({"dict.field": null})
To my surprise, no record was found.
No indexes are set on this collection other than _id.
The field in question is a dictionary.
What am I missing?
You should look for db.Collection.find({"dict.field": ""}) instead. null and the empty string ("") are different BSON types.
https://docs.mongodb.com/manual/reference/bson-types/
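A quick shell sketch of the distinction (hypothetical data):
db.Collection.find({"dict.field": ""})                 // matches empty strings only
db.Collection.find({"dict.field": null})               // matches null values AND documents missing the field
db.Collection.find({"dict.field": {$type: "null"}})    // matches literal nulls only
db.Collection.find({"dict.field": {$exists: false}})   // matches missing fields only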

Sparse Index in MongoDB not working?

I'm trying to understand sparse index in MongoDB. I understand that if I do this :
> db.check.ensureIndex({"id":1},{sparse:true, unique:true})
I can only insert documents where the id field is not a duplicate and is also not absent.
Hence, I tried,
> db.check.insert({id:1})
> db.check.insert({id:1})
which, as I expected, gave :
E11000 duplicate key error index: test.check.$id_1 dup key: { : 1.0 }
However, inserting a document with a non-existent id field:
> db.check.insert({})
works! What is going wrong?
A sparse unique index means that a document doesn't need to have the indexed field(s), but when it does have the field, the value must be unique.
You can add any number of documents to the collection when the field is absent. Note that when you insert an empty document, the _id field still gets an auto-generated ObjectId, which is (as good as guaranteed) unique.
That's pretty much what sparse means. From the docs:
Sparse indexes only contain entries for documents that have the indexed field. Any document that is missing the field is not indexed.
In other words, your missing id field means the index doesn't even consider that document for the uniqueness check.
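A minimal shell sketch of that behavior (createIndex is the current name for ensureIndex):
db.check.createIndex({id: 1}, {sparse: true, unique: true})
db.check.insert({id: 1})   // ok
db.check.insert({id: 1})   // E11000 duplicate key error
db.check.insert({})        // ok: no id field, so the document never enters the sparse index
db.check.insert({})        // also ok, for the same reason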

Range queries on a large collection in MongoDB

I have a document collection with around 40 million documents (~10 GB). The documents in this collection are fairly small (~1000 bytes). The main fields of interest are as follows:
start_x integer
end_x integer
I have a query to return the row for a given value of x. For any value of x there can only ever be one matching row in the collection. I am using the following selector for this purpose:
"start_x"=>{"$lte"=>1258}, "end_x"=>{"$gte"=> 1258}
I am not getting the expected performance from the query. I started with a compound index (start_x: 1, end_x: 1). The query plan showed around 400K nscanned:
{
  "cursor" => "BtreeCursor start_x_1_end_x_1",
  "nscanned" => 417801,
  "nscannedObjects" => 1,
  "n" => 1,
  "millis" => 3548,
  "nYields" => 0,
  "nChunkSkips" => 0,
  "isMultiKey" => false,
  "indexOnly" => false
}
Subsequently, I added stand-alone indexes on the start_x and end_x fields. The query plan didn't show much improvement.
Why is indexOnly not true even though I have a compound index and all the fields used in the query are covered by the index?
Is there a way to optimize this query?
I ended up using an indexed lookup on the end_x field to address this issue:
Dropped all the indexes on the collection.
Added an ascending index on the end_x field.
Queried for the first row whose upper bound is equal to or above the given value:
row = Model.where(:end_x.gte => 1258).asc(:end_x).limit(1).first
Checked that the returned row's range actually contains the value:
row = (row.present? && 1258.between?(row.start_x, row.end_x)) ? row : nil
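For reference, a hypothetical mongo-shell equivalent of the same two-step lookup (collection name made up):
// step 1: first row whose upper bound is at or above the value, served by the end_x index
var rows = db.ranges.find({end_x: {$gte: 1258}}).sort({end_x: 1}).limit(1).toArray();
// step 2: the query already guarantees end_x >= 1258, so only the lower bound needs checking
var row = (rows.length > 0 && rows[0].start_x <= 1258) ? rows[0] : null;
As an aside on the covered-query question above: indexOnly can only be true when the query projects nothing but the indexed fields and excludes _id; the original query returned whole documents, so it could never be satisfied from the index alone.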