Postgres OFFSET by value, not number

I have a table with (at least) a "name" column and an "ordinal_position" column. I want to loop over the rows starting from a row the user specifies. Say the user inputs "John", and his ordinal_position is 6 (out of 10 total). How do I loop over only the last 4 rows without using a subquery? I've tried the OVER() window function, but it doesn't seem to work in the OFFSET part of the query, and OFFSET only accepts numbers (as far as I know), not strings.
EDIT (in response to klin):
INSERT INTO foo(id,name,ordinal_position) VALUES
(DEFAULT,'Peter',1),
(DEFAULT,'James',2),
(DEFAULT,'Freddy',3),
(DEFAULT,'Mark',4),
(DEFAULT,'Jack',5),
(DEFAULT,'John',6),
(DEFAULT,'Will',7),
(DEFAULT,'Robert',8),
(DEFAULT,'Dave',9),
(DEFAULT,'Michael',10);
So in my FOR loop, since the user input "John", I want to loop through Will-Michael. Something like the following, but without a subquery:
SELECT * FROM foo ORDER BY ordinal_position OFFSET
(SELECT ordinal_position FROM foo WHERE name='John');

Unfortunately, you have to query the table to find the ordinal_position for a given name.
However, do not use OFFSET. You can do it in a WHERE clause, which for large tables will be much faster:
select *
from foo
where ordinal_position > (select ordinal_position from foo where name = 'John')
order by ordinal_position;
 id |  name   | ordinal_position
----+---------+------------------
  7 | Will    |                7
  8 | Robert  |                8
  9 | Dave    |                9
 10 | Michael |               10
(4 rows)
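Since the question mentions iterating in a FOR, here is a minimal plpgsql sketch built on the WHERE-clause query above (the function name, parameter, and loop body are hypothetical):
CREATE OR REPLACE FUNCTION loop_after(p_name text) RETURNS void AS $$
DECLARE
    rec foo%ROWTYPE;
BEGIN
    FOR rec IN
        SELECT *
        FROM foo
        WHERE ordinal_position > (SELECT ordinal_position FROM foo WHERE name = p_name)
        ORDER BY ordinal_position
    LOOP
        -- placeholder body: replace with whatever each iteration should do
        RAISE NOTICE '% (position %)', rec.name, rec.ordinal_position;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
-- SELECT loop_after('John');  -- loops over Will .. Michael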

Related

Finding duplicate records posted within a lapse of time, in PostgreSQL

I'm trying to find duplicate rows in a large database (300,000 records). Here's an example of how it looks:
| id | title   | thedate    |
|----|---------|------------|
| 1  | Title 1 | 2021-01-01 |
| 2  | Title 2 | 2020-12-24 |
| 3  | Title 3 | 2021-02-14 |
| 4  | Title 2 | 2021-05-01 |
| 5  | Title 1 | 2021-01-13 |
I found this excellent (i.e. fast) answer here: Find duplicate rows with PostgreSQL
-- adapted from #MatthewJ answering in https://stackoverflow.com/questions/14471179/find-duplicate-rows-with-postgresql/14471928#14471928
select * from (
    SELECT id, title, TO_DATE(thedate, 'YYYY-MM-DD'),
           ROW_NUMBER() OVER (PARTITION BY title ORDER BY id ASC) AS Row
    FROM table1
) dups
where dups.Row > 1
I'm trying to use this as a base to solve my specific problem: I need to find duplicates according to column values like in the example, but only for records posted within 15 days of each other (the date of record insertion is in the column "thedate" in my DB).
I reproduced it in this fiddle http://sqlfiddle.com/#!15/ae109/2, where id 5 (same title as id 1, and posted within 15 days of each other) should be the only acceptable answer.
How would I implement that condition in the query?
With the LAG function you can get the date from the previous row with the same title and then filter based on the time difference.
WITH with_prev AS (
    SELECT
        *,
        LAG(thedate, 1) OVER (PARTITION BY title ORDER BY thedate) AS prev_date
    FROM table1
)
SELECT id, title, thedate
FROM with_prev
WHERE thedate::timestamp - prev_date::timestamp < INTERVAL '15 days'
You don't necessarily need window functions for this; you can use a plain old self-join, like:
select p.id, p.thedate, n.id, n.thedate, p.title
from table1 p
join table1 n on p.title = n.title and p.thedate < n.thedate
where n.thedate::date - p.thedate::date < 15
http://sqlfiddle.com/#!15/a3a73a/7
This has the advantage that it can use some of your indexes on the table, and also that you can decide whether to use the data (i.e. the ID) of the previous row or the next row from each pair.
If your date column however is not unique, you'll need to be a little more specific in your join condition, like:
select p.id, p.thedate, n.id, n.thedate, p.title
from table1 p
join table1 n on p.title = n.title and p.thedate <= n.thedate and p.id <> n.id
where n.thedate::date - p.thedate::date < 15
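For anyone who wants to try both queries locally, the question's sample table can be reproduced with something like this (column types are an assumption; thedate is kept as text, which matches the explicit ::timestamp and ::date casts in the answers above):
CREATE TABLE table1 (
    id      integer PRIMARY KEY,
    title   text,
    thedate text  -- assumed: ISO dates stored as text, hence the casts above
);
INSERT INTO table1 (id, title, thedate) VALUES
    (1, 'Title 1', '2021-01-01'),
    (2, 'Title 2', '2020-12-24'),
    (3, 'Title 3', '2021-02-14'),
    (4, 'Title 2', '2021-05-01'),
    (5, 'Title 1', '2021-01-13');
With this data the LAG query returns only id 5, matching the expected result from the question.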

How to sum children occurrences from a joining table in Postgres?

I need to count how many consultants are using a skill through a joining table (consultant_skills), and the challenge is to roll the children's occurrence counts up to their parents recursively.
Here's the reproduction of what I'm trying to accomplish. The current results are:
 skill_id | count
----------+-------
        2 |     2
        3 |     1
        5 |     1
        6 |     1
But I need to compute the count to the parents recursively, where the expected result would be:
 skill_id | count
----------+-------
        1 |     2
        2 |     2
        3 |     1
        4 |     2
        5 |     2
        6 |     1
Does anyone know how I can do that?
Sqlfiddle Solution
You need to use WITH RECURSIVE, as Mike suggests. His answer is useful, especially in reference to using distinct to eliminate redundant counts for consultants, but it doesn't produce the exact results you're looking for.
See the working solution in the sqlfiddle above. I believe this is what you are looking for:
WITH RECURSIVE results(skill_id, parent_id, consultant_id) AS (
    SELECT skills.id AS skill_id, parent_id, consultant_id
    FROM consultant_skills
    JOIN skills ON skill_id = skills.id
    UNION ALL
    SELECT skills.id AS skill_id, skills.parent_id AS parent_id, consultant_id
    FROM results
    JOIN skills ON results.parent_id = skills.id
)
SELECT skill_id, count(DISTINCT consultant_id)
FROM results
GROUP BY skill_id
ORDER BY skill_id
What is happening in the query below the UNION ALL is that we're recursively joining the skills table to itself, rotating in the previous parent id as the new skill id and picking up the new parent id on each iteration. The recursion stops when the parent id is eventually NULL, because the INNER join then produces no more rows. Hope that makes sense.
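For reference, a minimal schema consistent with this query might look like the following (an assumption; the original fiddle isn't reproduced here):
CREATE TABLE skills (
    id        integer PRIMARY KEY,
    parent_id integer REFERENCES skills(id)  -- NULL for top-level skills
);
CREATE TABLE consultant_skills (
    consultant_id integer NOT NULL,
    skill_id      integer NOT NULL REFERENCES skills(id)
);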

How to split a string in a smart way?

The string_to_array function splits strings without grouping substrings enclosed in quotes:
# select unnest(string_to_array('one, "two,three"', ','));
unnest
--------
one
"two
three"
(3 rows)
I would like to have a smarter function, like this:
# select unnest(smarter_string_to_array('one, "two,three"', ','));
unnest
--------
one
two,three
(2 rows)
Purpose: I know that the COPY command does this in a proper way, but I need the feature internally.
I want to parse the text representation of rows of an existing table. Example:
# select * from dataset limit 2;
 id |      name       | state
----+-----------------+--------
  1 | Smith, Reginald | Canada
  2 | Jones, Susan    |
(2 rows)
# select dataset::text from dataset limit 2;
           dataset
------------------------------
 (1,"Smith, Reginald",Canada)
 (2,"Jones, Susan","")
(2 rows)
I want to do it dynamically in a plpgsql function for different tables. I cannot assume a constant number of columns in a table, nor any particular format of the column values.
There is a nice method to transpose a whole table into a one-column table:
select (json_each_text(row_to_json(t))).value from dataset t;
If the column id is unique then
select id, array_agg(value) arr from (
    select row_number() over() rn, id, value from (
        select id, (json_each_text(row_to_json(t))).value from dataset t
    ) alias
    order by id, rn
) alias
group by id;
gives you exactly what you want. The additional query level with row_number() is necessary to keep the original order of the columns.
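As an aside, for the simple comma-separated example at the top of the question, PostgreSQL's array-literal parser already understands double-quoted elements, so a small wrapper can produce the desired output. This is a sketch, not part of the original answer: it only handles the comma separator and will fail on input containing curly braces or backslashes:
CREATE OR REPLACE FUNCTION smarter_string_to_array(s text)
RETURNS text[] AS $$
    -- Wrap the input in braces and let the text[] input parser do the
    -- splitting; it treats "double-quoted" elements as single values.
    SELECT ('{' || s || '}')::text[];
$$ LANGUAGE sql IMMUTABLE;
-- select unnest(smarter_string_to_array('one, "two,three"'));
-- yields two rows: 'one' and 'two,three'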

Adding the results of two select queries into one table row with PostgreSQL

I am attempting to return the results of two distinct select statements side by side in PostgreSQL. For example, I have two queries that each return the same number of rows:
Select tableid1, tableid2, tableid3 from table1
+----------+----------+----------+
| tableid1 | tableid2 | tableid3 |
+----------+----------+----------+
|        1 |        2 |        3 |
|        4 |        5 |        6 |
+----------+----------+----------+
Select table2id1, table2id2, table2id3, table2id4 from table2
+-----------+-----------+-----------+-----------+
| table2id1 | table2id2 | table2id3 | table2id4 |
+-----------+-----------+-----------+-----------+
|         7 |         8 |         9 |        15 |
|        10 |        11 |        12 |        19 |
+-----------+-----------+-----------+-----------+
Now I want to concatenate these tables, keeping the same number of rows. I do not want to join on any values. The desired result would look like the following:
+----------+----------+----------+-----------+-----------+-----------+-----------+
| tableid1 | tableid2 | tableid3 | table2id1 | table2id2 | table2id3 | table2id4 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
|        1 |        2 |        3 |         7 |         8 |         9 |        15 |
|        4 |        5 |        6 |        10 |        11 |        12 |        19 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
What can I do to the two above queries (select * from table1) and (select * from table2) to return the desired result above?
Thanks!
You can use row_number() for the join, but there is no guarantee that the order of the rows will stay the same as in the tables. So it's better to add some ordering to the over() clause.
with cte1 as (
    select tableid1, tableid2, tableid3,
           row_number() over() as rn
    from table1
), cte2 as (
    select table2id1, table2id2, table2id3, table2id4,
           row_number() over() as rn
    from table2
)
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn
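For example, if the id columns reflect the intended row order, the numbering can be made deterministic like this (the ordering columns are an assumption; use whatever defines "row order" in your data):
with cte1 as (
    select tableid1, tableid2, tableid3,
           row_number() over (order by tableid1) as rn
    from table1
), cte2 as (
    select table2id1, table2id2, table2id3, table2id4,
           row_number() over (order by table2id1) as rn
    from table2
)
select c1.tableid1, c1.tableid2, c1.tableid3,
       c2.table2id1, c2.table2id2, c2.table2id3, c2.table2id4
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn;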
You can't have what you want, as you wrote the question. Your two SELECTs don't have any ORDER BY clause, so the database can return the rows in whatever order it feels like. If it currently matches up, it does so only by accident, and will stop matching up as soon as you UPDATE a row.
You need a key column. Then you need to join on the key column. Anything else is attempting to invent unreliable and unsafe joins without actually using a join.
Frankly, this seems like a pretty dodgy schema. Lots of numbered integer columns like this, and the desire to concatenate them, may be a sign you should be looking at using integer arrays, or using a side-table with a foreign key relationship, instead.
Sample data in case anyone else wants to play:
CREATE TABLE table1(tableid1 integer, tableid2 integer, tableid3 integer);
INSERT INTO table1 VALUES (1,2,3), (4,5,6);
CREATE TABLE table2(table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer);
INSERT INTO table2 VALUES (7,8,9,15), (10,11,12,19);
Depending on what you're actually doing you might really have wanted arrays.
I think you might need to read these two posts:
Join 2 sets based on default order
How keep data don't sort?
which explain that SQL tables just don't have an order. So you cannot fetch them in a particular order.
DO NOT USE THE FOLLOWING CODE, IT IS DANGEROUS AND ONLY INCLUDED AS A PROOF OF CONCEPT:
As it happens you can use a set-returning function hack to very inefficiently do what you want. It's incredibly ugly and completely unsafe without an ORDER BY in the SELECTs, but I'll include it for completeness. I guess.
CREATE OR REPLACE FUNCTION t1() RETURNS SETOF table1 AS $$ SELECT * FROM table1 $$ LANGUAGE sql;
CREATE OR REPLACE FUNCTION t2() RETURNS SETOF table2 AS $$ SELECT * FROM table2 $$ LANGUAGE sql;
SELECT (t1()).*, (t2()).*;
If you use this in any real code then kittens will cry. It'll produce insane and bizarre results if the number of rows in the tables differ and it'll produce the rows in orderings that might seem right at first, but will randomly start coming out wrong later on.
THE SANE WAY is to add a primary key properly, then do a join.

Sorting randomly with grouping in postgresql

I have a table where the data may look like so:
ID | PARENT_ID | several_columns_to_follow
---+-----------+--------------------------
 1 |         1 | some data
 2 |         2 | ...
 3 |         1 | ...
 4 |         4 | ...
 5 |         3 | ...
 6 |         1 | ...
 7 |         2 | ...
 8 |         3 | ...
Per requirements, I need to allow sorting in two ways (per user request):
1) Random order of IDs within sequentially ordered parents - this is achieved easily with
SELECT * FROM my_table ORDER BY parent_id, random()
2) Random order of IDs within randomly sorted parents - and this is where I'm stuck. Obviously, just sorting the whole thing by random() won't be very useful.
Any suggestions? Ideally, in one SQL statement, but I'm willing to go for more than one if needed. The amount of data is not large, so I'm not worried about performance (there will never be more than about 100 rows in the final result).
Maybe this will do:
ORDER BY ((parent_id * date_part('microseconds', NOW()) :: Integer) % 123456),
((id * date_part('microseconds', NOW()) :: Integer) % 123456);
Maybe a prime number instead of 123456 will yield "more random" results.
random() really is a nasty function in a final sort. If almost random is good enough, maybe you could sort on some hashed-up function of the row values, like
SELECT * FROM my_table
ORDER BY parent_id, (id *12345) % 54321
;
If it's never more than 100 records, it may make sense just to select all the records, place them in a local table (or an in-memory data structure on your client end) and do a proper Fisher-Yates shuffle.
Something like this should do it: draw one random value per parent, then shuffle within each parent. (Grouping by parent_id is what ensures every row of the same parent gets the same random sort key, so parents stay contiguous.)
WITH r AS (
    SELECT parent_id, random() AS rand
    FROM my_table
    GROUP BY parent_id
)
SELECT m.*
FROM r
JOIN my_table m USING (parent_id)
ORDER BY r.rand, random();
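The same idea also fits in a single statement, since PostgreSQL allows window functions in the ORDER BY clause. This is a sketch of the same technique, not from the original answer:
SELECT *
FROM my_table
-- min(random()) over the partition yields one consistent random value per
-- parent (parent order); the trailing random() shuffles rows within a parent
ORDER BY min(random()) OVER (PARTITION BY parent_id), random();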