Range query with no dups - mongodb

I have a collection that I would like to serve out as 'pages'. The collection could get quite large, and I have read that skip is not optimal in that case. I think range queries will work fine for my use case, so I am going to try that route.
My collection will be sorted and paged on a timestamp field. I have implemented the API such that a user passes in a startDate, and I return a certain number of items ('limit', max of 1000). However, I am struggling with how to avoid duplicates on each page when documents have the same time.
As an example (using a small page size to make it easy): I have 6 documents, and let's say docs 3 and 4 have the same time. If I ask for page one, I will get the first three. However, when I ask for page 2 with a startDate that is 'gte' the timestamp of the last doc on page one, I will get a duplicate: the last doc from page one will be the same as the first doc on page 2.
I cannot find a range query example anywhere that deals with dates while not returning dups.
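A common fix (not mentioned in the question) is to page on a compound cursor of (timestamp, _id) instead of the timestamp alone, so ties on the timestamp are broken by the unique _id. A minimal sketch with the Node.js driver, where the database, collection, and field names (app, items, ts) are placeholders:

import { MongoClient, ObjectId } from "mongodb";

// Sort on (ts, _id) and use _id as a tiebreaker in the range filter,
// so documents sharing the same timestamp are never returned twice.
async function fetchPage(
  client: MongoClient,
  lastTs: Date | null,
  lastId: ObjectId | null,
  limit = 1000
) {
  const coll = client.db("app").collection("items");
  const filter =
    lastTs === null
      ? {} // first page
      : {
          $or: [
            { ts: { $gt: lastTs } },              // strictly later timestamps
            { ts: lastTs, _id: { $gt: lastId } }, // same timestamp, later _id
          ],
        };
  return coll.find(filter).sort({ ts: 1, _id: 1 }).limit(limit).toArray();
}

The caller passes the ts and _id of the last document on the previous page; because _id is unique, the filter is strictly exclusive and no document appears on two pages.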

Related

Magento 2.4 collection filtering no longer working - strange paging issue

I'll start by saying that this worked correctly prior to Magento 2.4.
Using $collection->addAttributeToFilter("sku",'21V12') does seem to filter the products, but it behaves very strangely. If I have 20 products in a category and I use that filter, there are 2 different scenarios:
The results are correct and it shows 1 of 1
The results show "We can't find products matching the selection."
The difference is that if the SKU I'm filtering on is on page 1, then I get the first result, but if the SKU is on a different page, I get the "We can't find products matching the selection." page.
If I add the page to the URL (adding p=2 or p=3, for example), I get the result.
Any idea why that is? I've tried this at multiple points in the code to no avail.
That filtering on SKU is a simple example, but let's say we want to do a more involved filter like
$collection->addAttributeToFilter("special_name",'some_custom_text')
and that gives 20 results; sometimes there are none on page 1 and 3 on page 2, etc.
Anyway, it seems to just be hiding items in the display rather than actually giving the results we are looking for.
I've tested this on a baseline 2.4.2 install with the Luma theme.
To verify, the easiest way is to add this at line 147 in vendor/magento/module-catalog/Model/Layer.php:
$collection->addAttributeToFilter("sku",'21V12');
Substitute a SKU you have on page 2 of a category page. You should get the "We can't find products matching the selection." page, but if you add ?p=2 (or whatever page your item is normally on), you'll get that product as a result.

Is there a way to see if a limit offset query has reached the end with pg-promise?

I have a table of posts that I would like to query as pages. Because I want to keep my endpoints stateless, I would like to do this with offset and limit, like this:
SELECT * FROM post ORDER BY id LIMIT 50 OFFSET $1
Where $1 would be the page number times the page size (50). The easy way to check whether we have reached the end would be to see if we got fewer than 50 rows back. The problem, of course, is that if the total number of rows is divisible by 50, we can't be sure.
The way I have solved this until now is by simply fetching 51 posts per query, with the page size still being 50. That way, if the query returns fewer than 51 rows, we have reached the end.
Unfortunately, this seems like a very hacky way to do it. So I was wondering: is there some feature within pg-promise or PostgreSQL that would indicate that I have reached the end of a table without resorting to tricks like this?
The simplest method with the lowest overhead I found:
You can request pageLimit+1 rows on every page request. In your controller, you check whether rowsCount > pageLimit, and if so you know there is more data available. Of course, before returning the rows, you need to remove the extra last element and send, along with the rows, something like a hasNext boolean.
It is usually far cheaper for the DB to retrieve one extra row than to count all rows or to make an extra request for page+1 just to check whether it returns any rows.
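A sketch of that approach with pg-promise, assuming a hypothetical connection string and the post table from the question:

import pgPromise from "pg-promise";

const pgp = pgPromise();
const db = pgp("postgres://user:pass@localhost:5432/app"); // placeholder

const PAGE_SIZE = 50;

// Ask for one extra row; its presence tells us another page exists.
async function fetchPage(page: number) {
  const rows = await db.any(
    "SELECT * FROM post ORDER BY id LIMIT $1 OFFSET $2",
    [PAGE_SIZE + 1, page * PAGE_SIZE]
  );
  const hasNext = rows.length > PAGE_SIZE;
  return { rows: rows.slice(0, PAGE_SIZE), hasNext };
}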
Well, there is no built-in process for this directly. But you can count the rows and add the count to the results. You could then even give the user the number of items or the number of pages:
-- item count
with pc(cnt) as (select count(*) from post)
select p.*, cnt
from post p
cross join pc
order by p.id
limit 50 offset $1;
-- page count
with pc(cnt) as (select count(*)/50 + ((count(*)%50)>0)::int from post)
select p.*, cnt
from post p
cross join pc
order by p.id
limit 50 offset $1;
Caution: The count function can be slow, and even when it is not, it adds to response time. (Note the order by: without a deterministic order, offset pagination can return overlapping or skipped rows.) Is it worth the additional overhead? Only you and your users can answer that.
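For completeness, a sketch of running the page-count variant through pg-promise, reusing the hypothetical db and PAGE_SIZE from the earlier snippet:

// Every row carries the total page count; strip it before responding.
async function fetchPageWithCount(page: number) {
  const rows = await db.any(
    `with pc(cnt) as (select count(*)/50 + ((count(*)%50)>0)::int from post)
     select p.*, cnt
     from post p
     cross join pc
     order by p.id
     limit 50 offset $1`,
    [page * PAGE_SIZE]
  );
  const pageCount = rows.length ? Number(rows[0].cnt) : 0;
  return { rows: rows.map(({ cnt, ...rest }) => rest), pageCount };
}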
This method works well only in a specific setting: a single-page application that caches network requests and wants pagination to feel faster through pre-fetching.
On every page, you make two requests: one for the current page's data and one for the next page's data.
It works if, for example, you use a React single-page application with react-query, where the next page will not be refetched but reused when the user opens it. It even makes the user interface snappier, as the transition to the next page is always instant.
Otherwise, if the next page is not reused, this is worse than checking the total number of rows to determine whether any rows are left, as you make two requests for every page.
This method works well if you have a lot of page transitions, since the total number of calls is numberOfPages+1: if users visit 10 pages on average, that is 10+1 calls, or just 10% overhead. But if your users usually do not go beyond the first page, it makes little sense, as in that case you make numberOfPages+1 = 2 calls for a single page (100% overhead).
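A sketch of the pre-fetching pattern with TanStack react-query (v4 object API), assuming a hypothetical /api/posts endpoint that returns the { rows, hasNext } shape from the earlier snippet:

import { useEffect } from "react";
import { useQuery, useQueryClient } from "@tanstack/react-query";

// Hypothetical fetcher for the paginated endpoint.
const fetchPosts = (page: number) =>
  fetch(`/api/posts?page=${page}`).then((res) => res.json());

function Posts({ page }: { page: number }) {
  const queryClient = useQueryClient();
  const { data } = useQuery({
    queryKey: ["posts", page],
    queryFn: () => fetchPosts(page),
  });

  // Warm the cache for the next page; when the user navigates,
  // react-query serves it instantly instead of refetching.
  useEffect(() => {
    queryClient.prefetchQuery({
      queryKey: ["posts", page + 1],
      queryFn: () => fetchPosts(page + 1),
    });
  }, [page, queryClient]);

  return (
    <ul>
      {data?.rows.map((p: { id: number; title: string }) => (
        <li key={p.id}>{p.title}</li>
      ))}
    </ul>
  );
}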

Displaying a 'Top Result' with Algolia?

I want to search 3 different indices at once, breaking out their results into 3 separate sections.
Above these 3 sections though, I want to display a 4th section with a single result row, and this section will be titled "Best Result".
It should contain the single result, across the 3 indices, that matches the query most closely.
Does anyone know how I can achieve this? Thanks!
By design, when the ranking of each index is properly set (both the attributesToIndex and customRanking settings filled in), the Algolia engine returns the most relevant and popular results for each search request: first the ones without typos and with perfect matches, then the others...
This means that if you want to display all the top results on a single page, you only need to take the first N results of each index.
Then it's just a matter of display. As on the following websites, you could display those results in multiple columns:
http://telly.com/
Prototype built using the TED API
Feel free to look at tips about the best way to display result hits in multiple columns using Algolia and Bootstrap 3.
Besides that, you can also consider passing the getRankingInfo=1 parameter with your search query and filtering the displayed results according to the matching info returned for each hit in the _rankingInfo property:
firstMatchedWord: 2000
geoDistance: 0
geoPrecision: 1
nbExactWords: 2
nbTypos: 0
proximityDistance: 1
userScore: 9499
words: 2
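Putting both ideas together, a sketch with the JavaScript API client (v4), where the three index names and the tie-breaking heuristic are assumptions to adapt to your own ranking configuration:

import algoliasearch from "algoliasearch";

const client = algoliasearch("APP_ID", "SEARCH_KEY"); // placeholder credentials

// Query three (hypothetical) indices in one round trip and pick a
// "Best Result" by comparing each index's top hit on its _rankingInfo.
async function searchWithBestResult(query: string) {
  const params = { hitsPerPage: 5, getRankingInfo: true };
  const { results } = await client.multipleQueries([
    { indexName: "articles", query, params },
    { indexName: "products", query, params },
    { indexName: "users", query, params },
  ]);

  const topHits = results.map((r: any) => r.hits[0]).filter(Boolean);

  // Crude heuristic: fewest typos first, then most exact words.
  topHits.sort(
    (a: any, b: any) =>
      a._rankingInfo.nbTypos - b._rankingInfo.nbTypos ||
      b._rankingInfo.nbExactWords - a._rankingInfo.nbExactWords
  );

  return { bestResult: topHits[0], sections: results };
}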

Sphinx Search Default Order

I just noticed something about Sphinx Search. If I choose a particular order, relevance for example, and I have a number of items from 1 to 10, for some reason the relevant results that come back are still in numbered order; i.e., the records will be in the 1-5 range instead of the 6-10 range. Is there something I am missing or don't understand?
So the only way I can get new results to show is to sort by ID DESC, but the problem there is that I am then only getting results from the newest ID down, and there isn't really any sort on relevance at that point.
Is there some kind of default sort on the back end that can be adjusted?

When I run a find query in MongoDB without a limit and the total size of the rows exceeds 16 MB, an exception shows up

I have a scenario where I need all documents from a collection so I can do some calculations and show an analysis graph to the user.
When I use the find function, it shows me the error: document fragment is too large: 24331229, max: 16777216.
One collection has around 100,000 documents or more.
My current scenario:
I have 3 collections backing 3 types of analysis representations:
visit_data (the user visits the page and fills in some forms containing questions)
graph_data (the graph represents how many questions were answered and what the user's responses were)
tabular_data (shows the questions and answers of the selected questions visited by the user)
Like Google Analytics' graph representation of website analysis, broken down by:
--> date
--> location
--> language
--> etc.
Similarly, I have some filters (date, user, location) for the above types of analysis results.
Around 50,000 documents are inserted into the above collections every day.
If we apply a 7-day or 30-day filter, the total is >100,000 documents and the size is >16 MB.
When I use the find() function, it gives me that exception.
I am using php-mongo to get the values.
Is there any other way to do this?
Please help me.
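The question uses php-mongo, but as an illustration of the general remedy (iterate the cursor in batches with a projection, or push the work into the aggregation pipeline, instead of materializing every document at once), here is a sketch with the Node.js driver; the database, collection, and field names are hypothetical:

import { MongoClient } from "mongodb";

// Hypothetical names: database "analytics", collection "visit_data",
// a date range filter, and a projection keeping only the fields the
// graph needs, so no single batch approaches the 16 MB BSON limit.
async function answersPerDay(client: MongoClient, from: Date, to: Date) {
  const coll = client.db("analytics").collection("visit_data");
  const counts = new Map<string, number>();

  const cursor = coll
    .find({ date: { $gte: from, $lt: to } })
    .project({ date: 1, answers: 1 }) // fetch only what the calculation uses
    .batchSize(1000); // stream in small batches

  // Aggregate incrementally instead of loading all documents into memory.
  for await (const doc of cursor) {
    const day = doc.date.toISOString().slice(0, 10);
    counts.set(day, (counts.get(day) ?? 0) + (doc.answers?.length ?? 0));
  }
  return counts;
}

An aggregation pipeline with $match and $group would let the server compute such summaries directly, which avoids shipping the raw documents at all.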