I am using the distinct clause to group content while retrieving results page by page, but it doesn't seem to be working: I get zero matching results even though matching records exist.
The query parameters I am using are below.
{'hitsPerPage': 30, 'distinct': 5, 'filters': 'created>0', 'facetFilters': ['type:1', ['category:t1', 'category:abc', 'category:t2', 'category:t3']], 'page':1}
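As I understand the Algolia docs, the page parameter is zero-based, so page: 1 requests the second page, and with distinct enabled the page counts are computed after de-duplication, so a later page can be empty even when page 0 has hits; distinct also only takes effect when attributeForDistinct is set in the index settings. A minimal sketch of the same search via the JavaScript client follows; the index handle is assumed, not my exact code:

// Assumes an initialized Algolia index handle, e.g. from
// algoliasearch(appId, apiKey).initIndex('content').
index.search('', {
  hitsPerPage: 30,
  distinct: 5,                  // requires attributeForDistinct in the index settings
  filters: 'created>0',
  facetFilters: ['type:1', ['category:t1', 'category:abc', 'category:t2', 'category:t3']],
  page: 0,                      // pages are zero-based; 0 is the first page
}).then(res => console.log(res.nbHits, res.hits.length));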
Main question
I am making a social media app which recommends posts to users. The posts have fields user, gym and grade, all of which are of type String. user refers to the user that made the post.
I want to recommend posts which have gym and grade that are found in lists of preferred gyms and grades, so I want to use whereIn on those fields. However, I do NOT want to recommend posts that were posted by the user of the app (i.e. I don't want users to see their own posts in their recommended posts), so I want to use isNotEqualTo on the user field.
Additionally, the posts have a timestamp field. I want to recommend the latest posts, so I want to use orderBy on the timestamp field.
Lastly, I only want to load 10 posts at a time, so I want to use limit(10).
However, I have to load the next 10 posts when the user scrolls past 10 posts, so I want to use startAfterDocument for that.
So this is what I attempted:
A possible searchTerms looks like this:
final searchTerms = {
  'notUser': 'nathantew',
  'gyms': ['gym1', 'gym2'],
  'grades': ['grade1', 'grade2'],
};
Then the query using searchTerms (the actual query logic in my code is longer, as the posts have more fields and I attempted to write a generalised query function; I extracted the relevant parts):
// Order by timestamp, then page with startAfterDocument.
var query =
    FirebaseFirestore.instance.collection('posts').orderBy('timestamp');
if (lastDoc != null) {
  query = query.startAfterDocument(lastDoc);
}
// Exclude the current user's own posts.
if (searchTerms['notUser'] != null) {
  query = query.where('user', isNotEqualTo: searchTerms['notUser']);
}
// Only posts at preferred gyms and grades.
if (searchTerms['gyms'] != null) {
  query = query.where('gym', whereIn: searchTerms['gyms']);
}
if (searchTerms['grades'] != null) {
  query = query.where('grade', whereIn: searchTerms['grades']);
}
return query.limit(10).get();
However, that throws an error on the isNotEqualTo filter:
_AssertionError ('package:cloud_firestore/src/query.dart': Failed assertion: line 677 pos 11: 'field == orders[0][0]': The initial orderBy() field '[[FieldPath([timestamp]), false]][0][0]' has to be the same as the where() field parameter 'FieldPath([user])' when an inequality operator is invoked.)
Extra questions and things I tried
I doubt this is how queries actually work in a non-SQL database like Cloud Firestore, but the way I imagine it is: the posts are first ordered by timestamp; we then iterate through them from newest to oldest; and any post whose gym and grade are in the preferred lists AND whose user is not 'nathantew' is added to the list of documents to return, until that list reaches 10 documents.
That way, when I request the next 10 posts, I can pass in the last document of the previous query and start from there.
I tried searching for the error, but it seems my understanding of how queries work is completely wrong, because I quickly found a confusingly large number of rules restricting how queries can be made.
The rules are confusing too. For example, https://firebase.google.com/docs/firestore/query-data/order-limit-data#limitations states: "If you include a filter with a range comparison (<, <=, >, >=), your first ordering must be on the same field." It also says: "You cannot order your query by any field included in an equality (=) or in clause." I'm using isNotEqualTo here, so which rule applies? isNotEqualTo doesn't seem like a "range comparison" filter, but the error raised on it sounds like it is referring to the first rule...
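From the assertion message, isNotEqualTo does appear to be treated as an inequality, so the first rule applies: the first orderBy() has to be on the same field as the inequality filter. A minimal sketch of a query shape that satisfies the assertion (field names from my code); note that it changes the primary sort to user, which is not actually what I want:

// The inequality field must be the first orderBy(); timestamp can only
// come second, so results are grouped by user before time.
var query = FirebaseFirestore.instance
    .collection('posts')
    .where('user', isNotEqualTo: searchTerms['notUser'])
    .orderBy('user')
    .orderBy('timestamp');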
I've seen suggestions to make a less complicated query that omits certain filters and applies them in local app code instead; a sketch of what I think that means follows below. I don't see how that would work well here.
I can't omit the orderBy on timestamp, or I would have to query the entire database to sort it locally.
But if I omit any of the where filters, then I have to fetch an unknown number of documents each time to ensure I end up with 10 that satisfy all the conditions.
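For reference, my understanding of that suggestion as a sketch (inside an async function): drop the isNotEqualTo from the query, over-fetch a few extra rows, and exclude my own posts after fetching. The page sizes are arbitrary and the grade filter is omitted for brevity:

// Keep orderBy and one whereIn server-side; exclude own posts locally.
final snap = await FirebaseFirestore.instance
    .collection('posts')
    .orderBy('timestamp', descending: true) // newest first
    .where('gym', whereIn: searchTerms['gyms'] as List<String>)
    .limit(15) // over-fetch; the local filter below removes some docs
    .get();
final visible = snap.docs
    .where((d) => d['user'] != searchTerms['notUser'])
    .take(10)
    .toList();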
And the more I think about it, the more possible errors I imagine. What if a user deletes their post while I am trying to use that document for the startAfterDocument filter...?
So, how should I build this query? Better yet, where can I find best practices for use cases like this? Are they hidden from amateurs because they are only used by large companies with professional teams?
I am trying to optimize a DB lookup as best I can. From what I understand, my goal should be a winningPlan whose only stage is an IXSCAN. But I have a field containing date keys, and it seems I cannot build a compound index that can directly look up documents when filtering on "null" date values.
My filter query is the following:
{"$and":[
{"published":true},
{"soft_deleted_at":null}, # <-- this one's a date field, I need null values
{"another_filter":false},
{"yet_another_filter":false}
]}`
I tried building a partial index corresponding exactly to this query (partly to save some index memory, since I know I will never have to show documents that are soft-deleted, for example).
(Note that the code is in Ruby, but it translates to MongoDB without any problem via Mongoid.)
index(
{
published: 1,
another_filter: 1,
soft_deleted_at: 1,
yet_another_filter: 1,
},
{
background: true,
name: 'Visible in search engine partial index',
partial_filter_expression: {
'$and': [
{"published":true},
{"soft_deleted_at":null},
{"another_filter":false},
{"yet_another_filter":false}
]
}
}
)
This seems to work well except for the soft_deleted_at filter, since my winning plan looks like
=> {"stage"=>"FETCH",
"filter"=>{"soft_deleted_at"=>{"$eq"=>nil}},
"inputStage"=>
{"stage"=>"IXSCAN",
"keyPattern"=>{"published"=>1, "another_filter"=>1, "soft_deleted_at"=>1, "yet_another_filter"=>1},
"indexName"=>"Visible in search engine partial index",
"isMultiKey"=>false,
"multiKeyPaths"=>{"published"=>[], "another_filter"=>[], "soft_deleted_at"=>[], "yet_another_filter"=>[]},
"isUnique"=>false,
"isSparse"=>false,
"isPartial"=>true,
"indexVersion"=>2,
"direction"=>"forward",
"indexBounds"=>
{"published"=>["[true, true]"], "another_filter"=>["[false, false]"], "soft_deleted_at"=>["[null, null]"], "yet_another_filter"=>["[false, false]"]}}}
So here I have this extra stage, "stage"=>"FETCH", "filter"=>{"soft_deleted_at"=>{"$eq"=>nil}}, which is manually filtering my date field for null values. I was hoping this would already be handled by the partial index and not require more filtering... was I wrong?
Is there some way I can avoid this extra filter stage?
No, there's not. (At least, not with your current data schema.)
Mongo indexes non-existence (null and missing/undefined) a bit differently from existence. The query is actually using the soft_deleted_at key of the index (note that it's scanning the range [null, null]), but that range also matches documents where soft_deleted_at is missing entirely. The index alone can't rule those out, so it has to do that filter step.
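You can see this distinction directly in the shell; a minimal illustration, with a throwaway collection name:

// A {field: null} query matches both explicit nulls and missing fields,
// and the index range alone can't tell the two apart without the fetch.
db.demo.insertMany([
  { name: "a", soft_deleted_at: null },   // explicit null
  { name: "b" }                           // field missing entirely
])
db.demo.find({ soft_deleted_at: null }).count()   // => 2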
While it's generally best to avoid filter stages, this doesn't look like a case where it will be costly. You won't fetch any extra documents, so the only cost is inspecting a single field on documents you were fetching anyway.
The alternative would be to store a value like false and search by that. If you had a field like deleted that was either true or false for every document (and that you updated at the same time as soft_deleted_at), your query plan wouldn't include a filter stage.
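A minimal sketch of that alternative, with illustrative collection and variable names (someId stands for the _id of the document being soft-deleted):

// Maintain an explicit boolean next to the date, set in the same write:
db.items.updateOne(
  { _id: someId },
  { $set: { soft_deleted_at: new Date(), deleted: true } }
)

// Index and filter on the boolean; its bounds are exact ([false, false]),
// so no FETCH-stage filter is needed:
db.items.createIndex(
  { published: 1, another_filter: 1, deleted: 1, yet_another_filter: 1 },
  { partialFilterExpression: {
      published: true, deleted: false,
      another_filter: false, yet_another_filter: false } }
)
db.items.find({ published: true, deleted: false,
                another_filter: false, yet_another_filter: false })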
When I sort by product price and apply skip and limit, pagination works if prices are distinct. But if they are all the same, pagination breaks (the next page result isn't what's expected; it shows already-shown results), as if the sort order is computed differently every time. So I'm wondering: is adding the unique product _id to the sort, i.e. sort: {product_price: 1, product_id: 1}, the correct way to ensure my pagination won't break (so that the sort order is the same every time)? Is there something else I should be aware of?
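For reference, the query shape I have in mind (the collection name is just an example):

// Without a unique tie-breaker, documents with equal prices can come back
// in a different order on each execution, so skip/limit pages may overlap.
// Adding the unique key makes the order total and stable across queries:
db.products.find()
  .sort({ product_price: 1, product_id: 1 })
  .skip(20)    // page 3 with 10 items per page
  .limit(10)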
Thanks
My collection name is trial and the data size is 112 MB.
My query is:
db.trial.find()
and I have added a limit of up to 10:
db.trial.find.limit(10).
but the limit is not working; the entire collection is being scanned.
Replace
db.trial.find.limit(10)
with
db.trial.find().limit(10)
Also, you mention that the entire database is being queried? Run this:
db.trial.find().limit(10).explain()
It will tell you how many documents it looked at before stopping the query (nscanned). You will see that nscanned will be 10.
The .limit() modifier on its own will only "limit" the results of the query that is processed, so it works as designed to "limit" the results returned. In a raw form with no query criteria, the number of documents scanned (nscanned) should just be the limit you want:
db.trial.find().limit(10)
If your intent is to only operate on a set number of documents, you can alter this with the $maxScan modifier:
db.trial.find({})._addSpecial("$maxScan", 11)
This causes the query engine to "give up" after the set number of documents has been scanned. But that should only really matter when there is something meaningful in the query.
If you are actually trying to do "paging", then you are better off using "range" queries with $gt, $lt, and cousins to effectively change the range of selection in your query.
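A minimal sketch of that range-based paging on _id (collection name from the question; the page variable holds the previously returned batch):

// Page 1: take the first 10 documents in _id order.
var page = db.trial.find().sort({ _id: 1 }).limit(10).toArray()

// Next page: continue after the last _id seen instead of using skip(),
// so the server never re-scans the documents already returned.
var lastId = page[page.length - 1]._id
db.trial.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(10)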
I have browsed through various examples but have failed to find what I am looking for. What I want is to fetch a specific document by _id and apply multiple skips within its embedded array using one query, or some alternative that is fast enough for my case.
The following query skips the first comment and returns the second:
db.posts.find({ "_id": 1 }, { comments: { $slice: [1, 1] } })
That is: skip element 0, return element 1, and leave the rest out of the result.
But what if there were, say, 10000 comments and I wanted to apply the same pattern across the whole array, returning values like this:
skip 0, return 1, skip 2, return 3, skip 4, return 5
That would return a document whose comments array has 5000 elements, because every other one is skipped. Is this possible? I used a large number like 10000 because I fear that running multiple queries to achieve this would not perform well (example shown here: multiple queries to accomplish something similar). Thanks!
I went through several resources and concluded that this is currently impossible with one query. Instead, I settled on there being only two options to overcome this problem:
1.) Make a loop of some sort and run several slice queries while increasing the position of the slice, similar to the resource I linked:
var skip = NUMBER_OF_ITEMS * (PAGE_NUMBER - 1)
// $slice must be applied to a named array field in the projection:
db.companies.find({}, { comments: { $slice: [skip, NUMBER_OF_ITEMS] } })
However, depending on the data, I would not want to run 5000 individual queries just to get half of the array contents, so I decided to use option 2.), which seems relatively fast and performance-wise acceptable.
2.) Make a single query by _id for the row you want, and before returning results to the client (or some other part of your code), skip the unwanted array items with a for loop and then return the results. I did this on the Java side, since I talk to Mongo via Morphia. I also ran explain() on the query and found that returning a single row whose array has 10000 items, while specifying _id criteria, is so fast that speed wasn't really an issue; I bet repeated slice skips would only be slower.
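A minimal shell sketch of option 2.), doing the every-other filtering in application code after a single fetch (the _id value is the one from the example above):

// One lookup by _id, then keep every other comment in application code
// (indexes 1, 3, 5, ... to match "skip 0, return 1, skip 2, return 3"):
var doc = db.posts.findOne({ _id: 1 })
var everyOther = doc.comments.filter(function (c, i) { return i % 2 === 1 })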
2.) Make single query by _id to row you want and before returning results to client or some other part of your code, skip your unwanted array items away by using for loop and then return the results. I made this at java side since I talked to mongo via morphia. I also used query explain() to mongo and understood that returning single line with array which has 10000 items while specifying _id criteria is so fast, that speed wasn't really an issue, I bet that slice skip would only be slower.