How to paginate when ordering by non-distinct / non-unique values in PostgreSQL? - postgresql

How can I properly page by ordering on a column that could possibly have repeated values? I have a table called posts, which has a column num_likes that holds the number of likes of a given post, and I want to order by num_likes DESC. But I run into a problem: a new row inserted between two page requests shifts the ordering, so that the next page fetches rows that were already returned.
This link here explains the problem and offers keyset pagination as the solution, but from what I've seen, that only works if the column the rows are sorted on is distinct / unique. How would I do this if that is not the case?

You can easily make the sort key unique by adding the primary key to it.
You don't have to display the primary key to the user, just use it internally to tell “equal” rows apart.
For querying and indexing, you can make use of PostgreSQL's ability to compare like this: (num_likes, id) >= (4, 325698).
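For example, to fetch the next page of ten posts ordered by num_likes DESC, where (4, 325698) are the num_likes and id of the last row on the previous page, a query along these lines should work (a sketch; note that with a descending sort the row comparison flips to <):

SELECT id, num_likes
FROM posts
WHERE (num_likes, id) < (4, 325698)  -- strictly after the previous page's last row
ORDER BY num_likes DESC, id DESC
LIMIT 10;

An index on (num_likes DESC, id DESC) lets PostgreSQL serve each page without scanning the rows that were skipped.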

Related

Sqlalchemy way to handle select based on outdated data

Assume I have a table sets with a field filters containing an array of key-value mappings, a table items to which a select query must be applied to extract rows based on these filters, and an association table for the M:M relation linking each set with each item. I am looking for a method or mechanism to cancel the select query if sets.filters was updated in the meantime; otherwise the M:M relation will be built invalid, based on not-yet-refreshed filters.
The concrete scenario where the problem occurs is:
1. Receive a file with item data, parse it, and insert it into items, returning the new relevant ids (primary keys here);
2. After the insertion, select the filters from the relevant sets;
3. Take the item ids and select from items using those filters;
4. Update the M:M association table for all the items returned at step 3.
Unfortunately, between steps 3 and 4, or even earlier, an API call makes an update to one of the sets rows, changing its filters. As a result the M:M table is invalid, because a filter was changed (say the filters contained a weight <= 100 kilos expression; after the mentioned update it became weight <= 50 kilos, so any new items with weight greater than 50 obviously should not be in the M:M table).
Is there an efficient way to cancel the select query from items during the transaction? Or maybe there is a more robust query to use. My idea is to roll back the changes after the fact by checking the sets.modified_at column, but that seems like extra work that wastes disk and CPU time.
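One option worth considering (my sketch, not something from the thread): take a shared row lock on the sets rows you read, so a concurrent update of sets.filters has to wait until the transaction commits. In plain SQL the idea looks roughly like this; the set id 42 and the sets_items association table are hypothetical names:

BEGIN;

-- Step 2: read the filters, taking a shared lock so a concurrent
-- UPDATE of sets.filters blocks until this transaction commits.
SELECT id, filters
FROM sets
WHERE id = 42
FOR SHARE;

-- Steps 3 and 4: select the matching items and build the M:M rows.
-- The weight predicate stands in for the parsed filter expression.
INSERT INTO sets_items (set_id, item_id)
SELECT 42, i.id
FROM items i
WHERE i.weight <= 100;

COMMIT;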

FullText Index - Searching values from another table

Is it possible, in SQL Server 2008, using the full text index syntax, to run a query such as this one?
SELECT *
FROM TABLE_TO_SEARCH S,
     TABLE_WITH_STRINGS_TO_SEARCH SS
WHERE CONTAINS(S.WHOLE_NAME, SS.FIRST_NAME)
   OR CONTAINS(S.WHOLE_NAME, SS.LAST_NAME)
I need to search for the FIRST_NAME in table TABLE_TO_SEARCH, in the WHOLE_NAME column, which has a full-text index on it. It doesn't seem to be a valid query, though... Is there any workaround using the full-text index search?
LATER EDIT:
Here is the business case: each night I download information about "blacklisted" individuals from several websites and insert it into a table in this format: WholeName, LastName, FirstName, MiddleName. But the data is chaotic: WholeName does not necessarily contain the last, first, or middle name; or WholeName is null while the other three fields have values; or all four fields are null; and so on. The data may also repeat itself, as one blacklisted individual may come from two or more of these websites. What I need to do is compare this data, chaotic as it is, against our customer data based on our customers' first and last names, and give it a matching score (rank) against the files we download from these websites.
First I tried the CHARINDEX and LIKE operators, but I couldn't build a scoring algorithm on top of them, and it took 6 hours to compare just our customers' first and last names against only the WholeName column of TABLE_TO_SEARCH. I thought that implementing the full-text index would make it easier and faster, but... apparently I was wrong.
Has anyone dealt with a task like this? And if so, what was the best approach?
After skimming http://technet.microsoft.com/en-us/library/ms187787.aspx and http://technet.microsoft.com/en-us/library/ms142571.aspx I don't think it is possible to do your search in this way. Not only that, but it seems this type of index wouldn't work well with names anyway.
If you only care about checking one name, all you have to do is assign those values to variables. This approach still lets you use the full-text index.
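As a sketch, CONTAINS accepts a variable as the search condition, so checking a single (made-up) name against the indexed column looks like this:

DECLARE @FIRST_NAME nvarchar(100) = N'John';  -- example values to check
DECLARE @LAST_NAME  nvarchar(100) = N'Smith';

SELECT *
FROM TABLE_TO_SEARCH S
WHERE CONTAINS(S.WHOLE_NAME, @FIRST_NAME)
   OR CONTAINS(S.WHOLE_NAME, @LAST_NAME);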
Otherwise, I would suggest splitting the WHOLE_NAME column (if there is a space or other separator between the first and last name) and comparing each part to those other columns. If you are working with a huge data set, you may want to experiment with doing this in a temp table and creating an index, as sketched below.
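One rough way the split could look (only a sketch: it assumes a single space separates the parts, which the chaotic data will often violate, and the ID column is a hypothetical key):

SELECT S.ID,
       LEFT(S.WHOLE_NAME, CHARINDEX(' ', S.WHOLE_NAME + ' ') - 1) AS FIRST_PART,
       LTRIM(STUFF(S.WHOLE_NAME, 1, CHARINDEX(' ', S.WHOLE_NAME + ' '), '')) AS LAST_PART
INTO #SPLIT_NAMES
FROM TABLE_TO_SEARCH S
WHERE S.WHOLE_NAME IS NOT NULL;

CREATE INDEX IX_SPLIT_NAMES ON #SPLIT_NAMES (FIRST_PART, LAST_PART);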
Good luck!

Google refine cross-reference between row and column

I'm not sure if this can be achieved in Google Refine at all, but basically I have two tables.
The first table is the table of all the users. The second table shows all the friends. However, not every id in the second table's "friends" column exists in the first table, and I want to get rid of those. So how can I look up each id from the friends column of the second table and remove the ids that don't exist in table 1?
Put the two tables in different projects (we'll call them Table1 and Table2).
In Table2, on the friends column:
use "split multi-valued cells" to get each value on a separate row
convert the friends column to numbers (or, conversely, user_id in Table1 to strings)
use "add a new column based on this column" with the expression cross(cell,'Table1','user_id').length()
This will return 0 if there's no match, 1 if there's a match, or N>1 if there are duplicates in Table1.
If you want the data back in the original format, set up a facet to filter on the validity column, blank out all the bad values and then use "join multi-valued cells" to reverse the split operation you did up front.
I fixed some caching bugs with cross() for OpenRefine 2.6, so if the cross doesn't work, try stopping and restarting the Refine server.

Swap the order of items in a SQLite database

I retrieve an ordered list of items from a table of items in an SQLite database. How can I swap the ids of two items so that their order in the table is swapped?
The id shouldn't determine position or ordering. It should be an immutable identifier.
If you need to represent order in the database, you need to create a separate orderNumber column. A couple of options, both sketched below: (1) have values that span a range, or (2) have a pointer to the next item (like a linked list).
For ranges: spanning a range helps you avoid rewriting the orderNumber column for all items after the insert point. For example, the first insert gets 1, the second insert gets the top of the range, and an insert between the first and second gets the mid-range number; if you reposition an item, you assign it the mid-point of the items it lands between. One downside is that if the list gets enough churn (minimized by a large span), you may have to rebalance the ranges. The advantage of this solution is that you can get the ordered list just by ordering by this column in the SQL statement.
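A sketch of the swap with such a column (the items table name, ids 7 and 12, and the position values are examples; read the current positions first, then write each item's position to the other):

SELECT id, orderNumber FROM items WHERE id IN (7, 12);
-- suppose this returns (7, 1000) and (12, 2000)

UPDATE items SET orderNumber = 2000 WHERE id = 7;   -- give 7 the old position of 12
UPDATE items SET orderNumber = 1000 WHERE id = 12;  -- give 12 the old position of 7

SELECT * FROM items ORDER BY orderNumber;           -- the list, in order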
For a linked list: if the table has a next column that points to the id that follows it in order, you only need to update a couple of rows to insert something. The upside is that it's simple. The downside is that you can't order in the SQL statement; you're relying on the code that reads the list to sort it.
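A sketch of the linked-list variant (again with hypothetical names; NULL in next_id marks the tail of the list):

ALTER TABLE items ADD COLUMN next_id INTEGER;

-- Insert item 99 after item 7: copy 7's pointer to 99, then point 7 at 99.
UPDATE items SET next_id = (SELECT next_id FROM items WHERE id = 7) WHERE id = 99;
UPDATE items SET next_id = 99 WHERE id = 7;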
One other variation is to pull the ordered-list data out of that table altogether. For example, you could have an ordered-list table with listid, itemid, and orderNumber columns. That allows you to keep one or more logical ordered lists of the items in the table it references.
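That variation could look like this (a sketch; one items table can then participate in many ordered lists):

CREATE TABLE ordered_lists (
    listid      INTEGER NOT NULL,
    itemid      INTEGER NOT NULL REFERENCES items(id),
    orderNumber INTEGER NOT NULL,
    PRIMARY KEY (listid, itemid)
);

SELECT i.*
FROM ordered_lists ol
JOIN items i ON i.id = ol.itemid
WHERE ol.listid = 1
ORDER BY ol.orderNumber;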
Some other references:
How to store ordered items which often change position in DB
Best way to save a ordered List to the Database while keeping the ordering
https://dba.stackexchange.com/questions/5683/how-to-design-a-database-for-storing-a-sorted-list

Structure a dynamoDB table to enable ASC or DESC ordered pagination on * items in a table

I want to ORDER BY time/date and paginate through all the items in a table. Scan seems designed to paginate through everything, but does not seem to have an ASC/DESC equivalent. Query has ScanIndexForward but requires a specific primary key. (No way to SELECT *?)
Based on the first comment on this question, I'm thinking the only way to achieve this is to use a common primary key (!?) and then Query based on that, focusing on the range key. Is this really how it's supposed to work? I'd have to make a whole separate table with mirrored attributes if I wanted to Query an individual item by a unique primary key.
Please excuse my NoSQL noobness. I'm a front-end dev who's only dabbled in MySQL and SimpleDB.
Yes, this is what Query is for. The hash key identifies the list of things to page over, and the range key indicates the position within the list. If you can tolerate the latency hit, the table you page over only needs to store the primary keys of the items where the actual data lives; you can then issue a BatchGetItem to fetch a pageful of data in parallel.
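To make the pattern concrete, here is a sketch in DynamoDB's PartiQL syntax (a later addition to the service, not something from this thread; the table and attribute names are made up):

SELECT * FROM "posts_by_date"
WHERE bucket = 'ALL'
ORDER BY created_at DESC

Here bucket is the constant hash key shared by every item, and created_at is the range key; DESC corresponds to ScanIndexForward = false in the native Query API.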
Duplicate data isn't the sin in NoSQL that it is in the relational model; you're essentially crafting a MySQL-style index by hand.