Firestore index on maps and array - clarification - google-cloud-firestore

I'm trying to understand how Firestore creates indexes on fields. Given the following sample document, how are indexes created, especially for the maps/arrays?
I read the documentation at Index types in Cloud Firestore multiple times and I'm still unsure. There it says:
Automatic indexing
By default, Cloud Firestore automatically maintains single-field indexes for each field in a document and each subfield in a map. Cloud Firestore uses the following default settings for single-field indexes:
For each non-array and non-map field, Cloud Firestore defines two collection-scope single-field indexes, one in ascending mode and one in descending mode.
For each map field, Cloud Firestore creates one collection-scope ascending index and one descending index for each non-array and non-map subfield in the map.
For each array field in a document, Cloud Firestore creates and maintains a collection-scope array-contains index.
Single-field indexes with collection group scope are not maintained by default.
If I understand this correctly then there is an index created for each of these fields, even for the values in the alternate_names array.
So if I want to search for any document where fields.alternate_names contains a value of (for example) "Caofang", then Firestore would use an index for its search.
Is my assumption/understanding correct?

No, your understanding is not correct. fields.alternate_names is an array subfield inside a map field, so it is not covered by the second point above, which only creates automatic indexes for non-array, non-map subfields of a map. You can test your assumption simply by issuing the query. If the query fails, you will see in the error message that it failed due to lack of an index.
Firestore will simply not allow queries that are not indexed. The error message from that failure will contain a link to the console that will let you create the index necessary for that query, if such a thing is possible.
If you want to be able to query the contents of fields.alternate_names, consider promoting it to its own top-level field, which will be indexed by default.
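For illustration, a minimal sketch using the Firestore Node.js Admin SDK (the cities collection, the document ID, and the exact document shape are assumptions made up for this example): after promoting alternate_names to a top-level array field, the automatic array-contains index can serve the query.

import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';

initializeApp();
const db = getFirestore();

async function main(): Promise<void> {
  // Hypothetical document: alternate_names promoted from fields.alternate_names
  // to its own top-level array field, which gets an automatic array-contains index.
  await db.collection('cities').doc('example-city').set({
    fields: { name: 'Example City' }, // map: only non-array, non-map subfields are auto-indexed
    alternate_names: ['Caofang'],     // top-level array: automatic array-contains index
  });

  // This query can now be served by the automatic single-field index.
  const snapshot = await db
    .collection('cities')
    .where('alternate_names', 'array-contains', 'Caofang')
    .get();
  snapshot.forEach((doc) => console.log(doc.id, doc.data()));
}

main().catch(console.error);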

Related

Sharding with array in Cloud Firestore with composite index

I have read in the documentation that writes can be limited to 500 per second if a collection contains an indexed field with sequential values.
I can add a shard field to avoid this.
Therefore I should add the shard field before the sequential field in a composite index.
But what if my sequential field is an array?
An array must always be the first field in a composite index.
For example:
I have a Collection "users" with an array field "reminders".
The field reminders contains time strings like ["12:15", "17:45", "20:00", ...].
I think these values could result in hot spotting but maybe I am wrong.
I don't know how Firestore handles arrays in composite indexes.
Could my reminders array slow down the writes per second? And if so, how could I implement a shard field? Or is there a completely different solution?
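For context, the shard-field pattern mentioned in the question looks roughly like this with the Node.js Admin SDK (the collection name, field names, and shard count are assumptions; whether the reminders array actually needs sharding is exactly what the question asks):

import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';

initializeApp();
const db = getFirestore();
const SHARD_COUNT = 5;

// Write a random shard value alongside the sequential data so index entries
// are spread over SHARD_COUNT key ranges instead of one hot range; queries
// then have to filter on shard and merge the per-shard results client-side.
async function addUser(name: string, reminders: string[]): Promise<void> {
  await db.collection('users').add({
    name,
    shard: Math.floor(Math.random() * SHARD_COUNT),
    reminders, // e.g. ['12:15', '17:45', '20:00']
  });
}

addUser('alice', ['12:15', '17:45']).catch(console.error);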

How to create a composite index in Firestore with document Id

So I'm using Firebase Cloud Firestore with Swift (but this is a general Firestore question), and I want to sort through some documents using a query, something like
fetchQ.whereField(fieldName, isGreaterThan: startingValue)
But then I want to guarantee some kind of order if the field has the same value, and it stands to reason that the document id is good for this, so I add
.order(by: FieldPath.documentID(), descending: false)
But now I get the error in the console telling me to paste a URL in order to create a composite index. I do that, except the generated index is only for the single field "fieldName", leaving out the document id, so obviously I get an error for trying to create a composite index with a single field. I also tried it with two fields plus the document id, and sure enough the URL generates a composite index for the two fields while still leaving out the document id.
The composite indexing page in the firebase console also does not have an option to create a composite index involving the document id.
So it would seem that using the document id for sorting is not the intended practice? Should I create a unique id in each document for sorting purposes, or, if I can use the document id for ordering, how should I do it?
From the docs: "By default, a query retrieves all documents that satisfy the query in ascending order by document ID" (firebase.google.com/docs/firestore/query-data/…). So the behaviour you want to achieve is what you get out of the box.
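A short sketch of the same query with the Node.js Admin SDK rather than Swift (the items collection and field names are made up): no explicit ordering on the document ID is added, because documents with equal field values come back in ascending document ID order anyway.

import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';

initializeApp();
const db = getFirestore();

async function fetchOrdered(fieldName: string, startingValue: number): Promise<string[]> {
  const snapshot = await db
    .collection('items')                  // hypothetical collection
    .where(fieldName, '>', startingValue)
    .orderBy(fieldName)                   // ordering on the inequality field
    .get();
  // Documents with equal fieldName values arrive in ascending document ID
  // order by default, so no composite index involving the document ID is needed.
  return snapshot.docs.map((doc) => doc.id);
}

fetchOrdered('score', 10).then(console.log).catch(console.error);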

Does the length of an indexed field matter while searching?

The chat app schema that I have is something like below.
1. conversations {participants: [user_1, user_2], conversation_id}
2. messages {sender: user_1, conversation_id, timestamps}
I want to map this relationship using existing _id:ObjectId which is already indexed.
But if I want to get all conversations of user_1, I first have to search which conversations that user is involved in, get those conversations' _id values, and then search the messages collection for messages with those conversation _ids.
So my questions are -
Does the length of an indexed field (here _id) matter while searching?
Should I create another, shorter indexed field?
Also, if there is a better alternative schema, please suggest it.
I would suggest maintaining the data as subdocuments instead of an array. The advantage is that you can build a separate index on just the conversation_id field, which is what you want to query to find the user's involvement.
When you maintain it as an array, you cannot index the conversation_id field separately; instead you have to build a multikey index, which indexes all the elements of the array (the sender and timestamps fields) that you are never going to query, and it also increases the index size.
Answering your questions:
Does the length of an indexed field (here _id) matter while searching? - Not really.
Should I create another, shorter indexed field? - Create subdocuments and index conversation_id.
Also, if there is a better alternative schema, please suggest it. - Maintain the array fields as subdocuments.
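A rough sketch of the indexing being discussed, using the MongoDB Node.js driver in TypeScript (the connection string, database name, and exact document shapes are assumptions; the answer's subdocument layout isn't fully spelled out, so this keeps the question's schema and just shows the separate indexes):

import { MongoClient } from 'mongodb';

async function main(): Promise<void> {
  const client = new MongoClient('mongodb://localhost:27017'); // assumed connection string
  await client.connect();
  const db = client.db('chat');                                // assumed database name

  // Assumed shapes, following the question's schema:
  //   conversations: { _id, participants: ['user_1', 'user_2'] }
  //   messages:      { _id, sender, conversation_id, timestamps }
  await db.collection('conversations').createIndex({ participants: 1 });   // multikey index on the array
  await db.collection('messages').createIndex({ conversation_id: 1 });     // plain index on the conversation reference

  // The two-step lookup the question describes: conversations of user_1, then their messages.
  const conversationIds = await db
    .collection('conversations')
    .find({ participants: 'user_1' })
    .map((c) => c._id)
    .toArray();

  const messages = await db
    .collection('messages')
    .find({ conversation_id: { $in: conversationIds } })
    .toArray();
  console.log(messages.length);

  await client.close();
}

main().catch(console.error);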

How to create an index exemption on Firestore subdocuments?

We have a database structured as follows:
Collection foo
  Documents
    Collection bar
      Documents with many fields (approaching the 1 MB limit)
Trying to write a document containing 34571 fields to the bar collection, I get (from the Go API):
rpc error: code = InvalidArgument desc = too many builtin index entries for entity
OK, fine, it seems I need to add an exemption:
Large array or map fields
Large array or map fields can approach the limit of 20,000 index entries per document. If you are not querying based on a large array or map field, you should exempt it from indexing.
But how? The console only lets me set a single collection name and a single field path, and slashes aren't accepted.
I tried other combinations, but / isn't accepted in either the Collection ID or the Field path, and using ., while not clearly forbidden, results in a generic error when trying to save the exemption. I'm also not sure if * is allowed.
Index exemptions are based on collection ID and not collection path. In this case, you can enter bar as the collection ID. This also means the exemption applies to all collections with ID bar, regardless of hierarchy.
As for the fields, you can specify only a single field path per exemption. The "*" all-selector is not supported. There is a limit of 200 index exemptions so you wouldn't be able to exempt all 34571 fields. If possible, I suggest moving your fields into a map. Then you could disable indexing on the map field.
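A sketch of that last suggestion with the Firestore Node.js Admin SDK (the payload field name and document IDs are made up): with all of the dynamic keys nested under one map field, a single exemption (collection ID bar, field path payload) can disable their indexing.

import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';

initializeApp();
const db = getFirestore();

// Instead of ~34571 top-level fields, nest them under one map field so one
// single-field exemption (collection ID "bar", field path "payload") covers them.
async function writeBar(fooId: string, barId: string, manyFields: Record<string, unknown>): Promise<void> {
  await db
    .collection('foo').doc(fooId)
    .collection('bar').doc(barId)
    .set({ payload: manyFields });
}

writeBar('foo-1', 'bar-1', { field_0: 0, field_1: 1 }).catch(console.error);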

How does MongoDB order its docs in one collection? [duplicate]

This question already has answers here:
How does MongoDB sort records when no sort order is specified?
In my User collection, MongoDB usually orders each new doc in the same order I create them: the last one created is the last one in the collection. But I have noticed another collection where the last doc I created sits in the 6th position out of 27 docs.
Why is that?
What order do the docs in a MongoDB collection follow?
It's called natural order:
natural order
The order in which the database refers to documents on disk. This is the default sort order. See $natural and Return in Natural Order.
This confirms that in general you get them in the same order you inserted them, but, as you noticed, that's not guaranteed.
Return in Natural Order
The $natural parameter returns items according to their natural order within the database. This ordering is an internal implementation feature, and you should not rely on any particular structure within it.
Index Use
Queries that include a sort by $natural order do not use indexes to fulfill the query predicate with the following exception: If the query predicate is an equality condition on the _id field { _id: <value> }, then the query with the sort by $natural order can use the _id index.
MMAPv1
Typically, the natural order reflects insertion order with the following exception for the MMAPv1 storage engine. For the MMAPv1 storage engine, the natural order does not reflect insertion order if the documents relocate because of document growth or remove operations free up space which are then taken up by newly inserted documents.
Obviously, as the docs mention, you should not rely on this default order ("This ordering is an internal implementation feature, and you should not rely on any particular structure within it").
If you need the results sorted, sort them explicitly.
Basically, the following two calls should return documents in the same order (since the default order is $natural):
db.mycollection.find().sort({ "$natural": 1 })
db.mycollection.find()
If you want to sort by another field (e.g. name) you can do that:
db.mycollection.find().sort({ "name": 1 })
For performance reasons, MongoDB never splits a document on the hard drive.
When you start with an empty collection and insert document after document into it, MongoDB will place them consecutively on the disk.
But what happens when you update a document and it now takes more space and doesn't fit into its old position anymore without overlapping the next? In that case MongoDB will delete it and re-append it as a new one at the end of the collection file.
Your collection file now has a hole of unused space. This is quite a waste, isn't it? That's why the next inserted document that is small enough to fit will be placed in that hole. That's likely what happened in the case of your second collection.
Bottom line: Never rely on documents being returned in insertion order. When you care about the order, always sort your results.
MongoDB does not "order" the documents at all, unless you ask it to.
The basic insertion will create an ObjectId in the _id primary key value unless you tell it to do otherwise. This ObjectId value is a special value with "monotonic" or "ever increasing" properties, which means each value created is, in practice, larger than the last.
If you want "sorted" then do an explicit "sort":
db.collection.find().sort({ "_id": 1 })
Or a "natural" sort means in the order stored on disk:
db.collection.find().sort({ "$natural": 1 })
That is pretty much the default unless stated otherwise, or unless an index selected by the query criteria determines the sort order. But you can use $natural to "force" that order if the query criteria selected an index that sorts otherwise.
MongoDB documents "move" when they grow, and therefore the _id order is not always the same as the order in which documents are retrieved.
I was able to find out more about it thanks to the Return in Natural Order link provided by Ionică Bizău.
"The $natural parameter returns items according to their natural order within the database.This ordering is an internal implementation feature, and you should not rely on any particular structure within it.
Typically, the natural order reflects insertion order with the following exception for the MMAPv1 storage engine. For the MMAPv1 storage engine, the natural order does not reflect insertion order if the documents relocate because of document growth or remove operations free up space which are then taken up by newly inserted documents."