iOS Firestore compound query with multiple ids - swift

This is an issue I came across while trying to mix geo location with Firestore. Long story short - I need restaurants around user's location. In order to get geo search done I use Algolia. When I do the request to Algolia it returns an array of unique restaurant IDs which correspond to Firestore document ids. This works just fine.
What complicates things is that I need two more conditions: the query must be restricted to restaurants with an average rating >= 8, and I want to limit the number of returned documents (2, 5, 20 etc).
This is how it should look in pseudocode:
db.restaurantsCollection
.documentIds([111, 222, 333, 444, 555, 666, 777, 888, 999])
.whereField("averageRating", isGreaterThanOrEqualTo: 8)
.order(by: "averageRating", descending: true)
.limit(to: 2)
.getDocuments()
I know as of today Firestore doesn't support queries with multiple document ids. So what is the most optimized way to perform such a query?
One idea is to store the document ID as an id field in each document, then iterate over all the IDs returned by Algolia and run one query per ID, doing the ordering and limiting in pure Swift afterwards:
for id in ids {
    db.restaurantsCollection
        .whereField("averageRating", isGreaterThanOrEqualTo: 8)
        .whereField("id", isEqualTo: id)
        .getDocuments()
}
But still this means a lot of requests. Any other ideas?

Here is a bit more efficient way to do it:
firestoreDB
    .collection("someCollection")
    .whereField(FieldPath.documentID(), in: [
        "05C0632C-601B-4D98-BD2B-3E809D0496B1", // up to 10 ids
        "087686CA-6B21-4268-9E4C-CF833FCA92DE"
    ]).getDocuments { snap, error in
        if let snap = snap {
            snap.documents.forEach { print($0.data()) }
        } else if let error = error {
            print("Error: \(error)")
        }
    }
Fetching the documents in batches of 10 lets you sort within each query (using .order(by:) etc.).
It is not perfect, but it looks like an improvement on what you have.
You can issue several such calls, merge the results into one array, and then sort that.
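The batch-and-merge idea can be sketched without touching Firestore at all. The snippet below is only an illustration: the Restaurant struct, its field names and the sample ratings are assumptions, and the arrays in fakeBatches stand in for the results of the individual getDocuments calls. The rating filter is applied client side here, since combining it with the document-ID filter in a single query is not confirmed by the answer above.

```swift
import Foundation

// Hypothetical model; the field names are assumptions for illustration.
struct Restaurant: Equatable {
    let id: String
    let averageRating: Double
}

/// Splits `ids` into consecutive chunks of at most `size` elements.
func chunked(_ ids: [String], size: Int) -> [[String]] {
    precondition(size > 0, "chunk size must be positive")
    return stride(from: 0, to: ids.count, by: size).map { start in
        Array(ids[start..<min(start + size, ids.count)])
    }
}

/// Merges per-batch results, keeps restaurants rated at least `minRating`,
/// sorts them by rating (highest first) and returns at most `limit` of them.
func topRated(batches: [[Restaurant]], minRating: Double, limit: Int) -> [Restaurant] {
    let sorted = batches
        .flatMap { $0 }
        .filter { $0.averageRating >= minRating }
        .sorted { $0.averageRating > $1.averageRating }
    return Array(sorted.prefix(limit))
}

// 23 IDs from Algolia become three batches of 10, 10 and 3 IDs.
let allIds = (1...23).map { "id\($0)" }
let idBatches = chunked(allIds, size: 10)

// Stand-in for the documents returned by two separate batch queries.
let fakeBatches: [[Restaurant]] = [
    [Restaurant(id: "a", averageRating: 8.4), Restaurant(id: "b", averageRating: 7.9)],
    [Restaurant(id: "c", averageRating: 9.2)]
]
let best = topRated(batches: fakeBatches, minRating: 8, limit: 2)
print(best.map { $0.id })
```

In a real app each element of idBatches would be passed to one whereField(FieldPath.documentID(), in:) call, and topRated would run once all completion handlers have delivered their documents.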

In case this is helpful to anyone I am posting the options you might consider if you are implementing geo queries the way I implemented them - with Algolia. So the way you do it with Algolia is to keep coordinates with your search indices.
For example:
{
"_geoloc": {
"lng": 8.507785799999999,
"lat": 47.17652589999999
},
"objectID": "1234abcd"
}
Where objectID corresponds to a document ID of a restaurant or whatever kind of venue you want to geo query.
Then you create a query through the Algolia API. To get all items around a CLLocation, set the query.aroundLatLng property. You can also set a distance property in meters. The result will be JSON with your IDs sorted by distance, starting with the closest.
So far so good - we solved our biggest issue - querying by geo location.
Now what if I want to get the top rated among all N IDs I got from Algolia? The sad thing is that so far there is no API to query by multiple IDs. This means that if I want all restaurants with a score >= 8, I need to make N Firestore document reads and then pick the top rated ones. Far from ideal.
Option 1.
Save restaurant averages with Algolia data.
I know this means extra networking but this is a good way to go. Drawbacks - every time you update the restaurant average in Firestore you need to update it in Algolia also. And if you calculate averages with Cloud Functions as I do you need to set them in Algolia too. Yes, this means you need to upgrade to a paid Firebase plan since this is an external Cloud Functions call.
Now that you have averages in your Algolia payload you can easily get all restaurant IDs for your area and then sort the data by their Algolia rating client side. Once you have the ids sorted you can then request the documents from Firestore.
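A rough sketch of the client-side step in Option 1 follows. The AlgoliaHit struct and its averageRating field are assumptions about how the rating might be stored in the index, not part of the Algolia SDK. The hits are assumed to arrive sorted by distance, so ties in rating are broken in favour of the closer restaurant.

```swift
import Foundation

// Hypothetical shape of a decoded Algolia hit; `averageRating` is assumed
// to have been copied into the index alongside `_geoloc`.
struct AlgoliaHit {
    let objectID: String
    let averageRating: Double
}

/// Returns the IDs of the best-rated hits, highest rating first.
/// `hits` is expected in distance order; equal ratings keep that order.
func topRatedIDs(from hits: [AlgoliaHit], minRating: Double, limit: Int) -> [String] {
    let ranked = hits.enumerated()
        .filter { $0.element.averageRating >= minRating }
        .sorted { lhs, rhs in
            if lhs.element.averageRating != rhs.element.averageRating {
                return lhs.element.averageRating > rhs.element.averageRating
            }
            return lhs.offset < rhs.offset // closer hit wins a tie
        }
    return ranked.prefix(limit).map { $0.element.objectID }
}

let hits = [
    AlgoliaHit(objectID: "near", averageRating: 8.5),
    AlgoliaHit(objectID: "mid", averageRating: 9.0),
    AlgoliaHit(objectID: "far", averageRating: 8.5),
    AlgoliaHit(objectID: "low", averageRating: 7.0)
]
let idsToFetch = topRatedIDs(from: hits, minRating: 8, limit: 3)
print(idsToFetch)
```

Only the IDs in idsToFetch then need to be requested from Firestore, so the number of document reads is bounded by the limit rather than by the number of hits.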
Option 2.
Save city name and country
You can save city and country names with your location data in Firestore. This way you can use them for your queries. Pseudo Firestore query code:
db.restaurants.where("city" == "current_city").where("country" == "current_country").where("averageRating" >= 8).getDocuments()
However, this approach won't be very accurate and will mostly work for big cities. It also won't take the proximity to your location into account.
I hope this helps.

Related

How are Firebase Reads charged on snapshot whereField?

In this example from the firebase documentation:
// Create a reference to the cities collection
let citiesRef = db.collection("cities")
// Create a query against the collection.
let query = citiesRef.whereField("state", isEqualTo: "CA")
Suppose I were to add a snapshot listener to the query and use it when a page appears. Would I be charged a read for every city in the collection of "cities" or just the cities where the state is equal to CA? For the query to work, it seems like it would have to search through every city, and I'm wondering if those would count as reads.
You're only charged for documents that need to be read on the server for API calls. Queries are actually handled by accessing one or more indexes, so there is no charge for the query itself (*) only for the documents that are actually returned to the client.
(*) the only exception to this is when there are no results for a query, in which case you'll be charged 1 document read.
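As a small worked example of the rule stated in this answer, and nothing more than that: the function below counts the reads for a single execution of the query. The collection size and the number of matching cities are made-up figures, and later updates delivered to a snapshot listener are not modelled.

```swift
import Foundation

/// Document reads billed for one execution of a query, following the rule
/// quoted above: one read per returned document, with a minimum of one
/// read when the query returns nothing.
func billedReads(returnedDocuments: Int) -> Int {
    precondition(returnedDocuments >= 0, "document count cannot be negative")
    return max(1, returnedDocuments)
}

let citiesInCollection = 500 // hypothetical; does not influence the result
let citiesInCA = 12          // hypothetical number of matching documents
let reads = billedReads(returnedDocuments: citiesInCA)
print("Collection size: \(citiesInCollection), billed reads: \(reads)")
```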

Firestore - how to do a "reverse" array-contains?

I have a collection users in firestore where one of the fields is "contact_person".
Then I have an array arrNames {'Jim', 'Danny', 'Rachel'} in my frontend and I want to get all users that have either of these names in their "contact_person" field.
Something like where("contact_person" IN arrNames)
How could I do that?
This is currently not possible within the Firestore API. You will need to do a separate get for each document, unless the names happen to be in a single contiguous range.
Also see:
FireStore Where In Query
Google Firestore - how to get document by multiple ids in one round trip?
This has been possible since November 2019 (https://firebase.googleblog.com/2019/11/cloud-firestore-now-supports-in-queries.html), but keep in mind that the array size can't be greater than 10.
db.collection("projects").where("status", "in", ["public", "unlisted", "secret"]);

Is there an equivalent to `beginsWith` in Firestore?

The Firestore documentation on making queries includes examples where you can filter a collection of documents based on whether some field in the document either matches, is less than, or is greater than some value you pass in. For example:
db.collection("cities").whereField("population", isLessThan: 100000)
This will return every "city" whose "population" is less than 100000. This type of query can be made on fields of type String as well.
db.collection("cities").whereField("name", isGreaterThanOrEqualTo: "San Francisco")
I don't see a method to perform a substring search. For example, this is not available:
db.collection("cities").whereField("name", beginsWith: "San")
I suppose I could add something like this myself using greaterThan and lessThan but I wanted to check first:
Why doesn't this functionality exist?
My fear is that it doesn't exist because the performance would be terrible.
[Googler here] You are correct, there are no string operations like beginsWith or contains in Cloud Firestore, you will have to approximate your query using greater than and less than comparisons.
You say "it doesn't exist because the performance would be terrible" and while I won't use those exact words you are right, the reason is performance.
All Cloud Firestore queries must hit an index. This is how we can guarantee that the performance of any query scales with the size of the result set even as the data set grows. We don't currently index string data in a way that would make it easy to service the queries you want.
Full text search operations are one of the top Cloud Firestore feature requests so we're certainly looking into it.
Right now if you want to do full text search or similar operations we recommend integrating with an external service, and we provide some guidance on how to do so:
https://firebase.google.com/docs/firestore/solutions/search
It is possible now:
// The upper bound is the prefix with its last character incremented ('n' -> 'o')
db.collection('cities')
    .where('name', '>=', 'San')
    .where('name', '<', 'Sao');
for more details see Firestore query documents startsWith a string
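The two bounds of such a range query can be derived from the prefix mechanically. The helper below is a sketch with a made-up name (prefixRange); it assumes strings are compared in code point order and simply increments the last Unicode scalar of the prefix to get the exclusive upper bound.

```swift
import Foundation

/// Returns the inclusive lower and exclusive upper bound that together match
/// every string starting with `prefix`. Returns nil for an empty prefix or
/// when the last scalar has no valid successor.
func prefixRange(for prefix: String) -> (lower: String, upper: String)? {
    var scalars = Array(prefix.unicodeScalars)
    guard let last = scalars.last,
          let next = Unicode.Scalar(last.value + 1) else { return nil }
    scalars[scalars.count - 1] = next
    var upperView = String.UnicodeScalarView()
    upperView.append(contentsOf: scalars)
    return (lower: prefix, upper: String(upperView))
}

if let range = prefixRange(for: "San") {
    // The bounds would be passed to isGreaterThanOrEqualTo: and isLessThan:.
    print(range.lower, range.upper)
}
```

With these bounds, "San Francisco" and "Santa Cruz" fall inside the range, while "Sacramento" and "Sao Paulo" fall outside it.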

Best way to structure MongoDB with the following use cases?

Sorry to have to ask this, but I am new to MongoDB (I only have experience with relational databases) and was curious how you would structure your MongoDB database.
The documents will be in the format of JSONs with some of the following fields:
{
"url": "http://....",
"text": "entire ad content including HTML (very long)",
"body": "text (50-200 characters)",
"date": "01/01/1990",
"phone": "8001112222",
"posting_title": "buy now"
}
Some of the values will be very long strings.
Each document is essentially an ad from a certain city. We are storing all ads for a lot of big cities in the US (about 422). We are storing more ads every day, and the number of ads per city varies from 0 to as many as 2000. The average is probably around 700-900.
We need to do the following types of queries, in almost instant time (if possible):
Get all ads for any specific city, for any specific date range.
Get all ads that were posted by a specific phone number, for any city, for any date range.
What would you recommend? I'm thinking I should have 422 collections - one for each city. I'm just worried about the query time when we query for phone numbers because it needs to go through each collection. I have an iterable list of all collection names.
Or would it be faster to just have one collection so that I don't have to switch through 422 collections?
Thank you so much, everyone. I'm here to answer any questions!
EDIT:
Here is my "iterating through all collections" snippet:
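import glob

# Raw string so the backslashes in the Windows path are not treated as escapes
for name in glob.glob(r"Data\Nov. 12 - 5pm\*"):
    # Collection name: the file name without its ".json" extension
    val = name.split("5pm")[1].split(".json")[0][1:]
    coll = db[val]
    # Add into collection here...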
MongoDB does not offer any operations which get results from more than one collection, so putting your data in multiple collections is not advisable in this case.
You can considerably speed up all the use-cases you mentioned by creating indexes for them. When you have a very large dataset and always query for exact equality, then hashed indexes are the fastest.
When you query a range of dates (between day x and day y), you should use the Date type and not strings. This not only lets you use lots of handy date operators in aggregation, but also lets you speed up ranged queries and sorts with ascending or descending indexes.
Maybe I'm missing something, but wouldn't making "city" a field in your JSON solve your problem? That way you only need to do something like this: db.posts.find({ city: {$in: ['Boston', 'Michigan']}})

Data model for geo spatial data and multiple queries

I have a MongoDB object, let's call it place, which contains geo information. Here is an example:
{
    "_id": "234235425e3g33424",
    "geo": {
        "lon": 12.23456,
        "lat": 34.23322
    },
    "some_field": "value"
}
A list of features is associated with every place:
{
"_id": "2334sgfgsr435d",
"place_id": "234235425e3g33424",
"feature_field" : "some_value"
}
As you can see, features are linked to places through the place_id field. Now I would like to find the list of features connected with the nearest places. I would also like to add search conditions on place.some_field and feature.feature_field. And, importantly, I would like to limit the results.
Currently I am using this approach:
1. I query places with a condition on geo and some_field
2. I query features with a condition on feature_field and place_id (limited to the places found in 1.)
3. I limit the results in my application code
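The application-side part of this approach can be sketched as follows. This is an in-memory illustration only: the Place and Feature structs mirror the documents above, the places are assumed to arrive from the geo query already sorted nearest first, and the function name is made up.

```swift
import Foundation

// In-memory mirrors of the two document types shown above.
struct Place {
    let id: String
    let someField: String
}

struct Feature: Equatable {
    let id: String
    let placeID: String
    let featureField: String
}

/// Walks the places in distance order and collects their features until
/// `limit` is reached, so the result stays sorted by place distance.
func nearestFeatures(placesByDistance: [Place], features: [Feature], limit: Int) -> [Feature] {
    let byPlace = Dictionary(grouping: features, by: { $0.placeID })
    var result: [Feature] = []
    for place in placesByDistance {
        for feature in byPlace[place.id] ?? [] {
            if result.count == limit { return result }
            result.append(feature)
        }
    }
    return result
}

let places = [Place(id: "p1", someField: "value"), Place(id: "p2", someField: "value")]
let features = [
    Feature(id: "f1", placeID: "p2", featureField: "some_value"),
    Feature(id: "f2", placeID: "p1", featureField: "some_value"),
    Feature(id: "f3", placeID: "p1", featureField: "some_value")
]
let firstTwo = nearestFeatures(placesByDistance: places, features: features, limit: 2)
print(firstTwo.map { $0.id })
```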
My question is: is there a better approach to this task? Right now I cannot use Mongo's limit() function: if I apply it to places, I can end up with too few results, because I still need to run the second query. I cannot apply limit() to the second query either, because its results come back in arbitrary order and I would like to sort them by distance.
I know I can put the data into one document, but I presume that the list of features will be long and that I could exceed the BSON size limit.
Running out of 16 MB for just the features seems unlikely, but it's possible. I don't think you realize how much 16 MB is, so do the maths before assuming anything!
In any case, with MongoDB you cannot run a query with fields from two collections. A query always deals with one specific collection only. I have done something very similar to what you have here, though, which I've described in an article: http://derickrethans.nl/indexing-free-tags.html. Have a look at that for some more inspiration.
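A back-of-the-envelope version of that calculation might look like this. The 200 bytes per embedded feature is purely an assumed figure for illustration; measure your real documents before relying on it.

```swift
import Foundation

/// How many embedded items of a given size fit under a document size limit,
/// ignoring the rest of the document.
func maxEmbeddedItems(limitBytes: Int, bytesPerItem: Int) -> Int {
    precondition(bytesPerItem > 0, "item size must be positive")
    return limitBytes / bytesPerItem
}

let bsonLimitBytes = 16 * 1024 * 1024 // 16 MB document limit
let assumedBytesPerFeature = 200      // assumption, not a measured value
let capacity = maxEmbeddedItems(limitBytes: bsonLimitBytes,
                                bytesPerItem: assumedBytesPerFeature)
print("Roughly \(capacity) features per place document")
```

Under that assumption a single place could hold tens of thousands of features before the limit becomes a concern.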