Sorting randomly with grouping in postgresql - postgresql

I have a table where the data may look like so:
ID | PARENT_ID | several_columns_to_follow
---+-----------+--------------------------
1 | 1 | some data
2 | 2 | ...
3 | 1 | ...
4 | 4 | ...
5 | 3 | ...
6 | 1 | ...
7 | 2 | ...
8 | 3 | ...
Per requirements, I need to allow sorting in two ways (per user request):
1) Random order of ID's within sequential parents - this is achieved easily with
SELECT * FROM my_table ORDER BY parent_id, random()
2) Random order of ID's within randomly sorted parents - and this is where I'm stuck. Obviously, just sorting the whole thing by random() won't be very useful.
Any suggestions? Ideally, in one SQL statement, but I'm willing to go for more than one if needed. The amount of data is not large, so I'm not worried about performance (there will never be more than about 100 rows in the final result).

Maybe this will do:
ORDER BY ((parent_id * date_part('microseconds', NOW()) :: Integer) % 123456),
((id * date_part('microseconds', NOW()) :: Integer) % 123456);
Maybe a prime number instead of 123456 will yield "more random" results

random() really is a nasty function in a final sort. If almost random is good enough, maybe you could sort on some hashed-up function of the row values, like
SELECT * FROM my_table
ORDER BY parent_id, (id *12345) % 54321
;

If it's never more than 100 records, it may make sense just to select all the records, place it in a local table (or in memory data structure on your client end) and just do a proper Fisher-Yates shuffle.

Something like this should do it:
WITH r AS (SELECT id, random() as rand FROM my_table ORDER BY rand)
SELECT m.* FROM r JOIN my_table m USING (id)
ORDER BY r.rand, random();

Related

SQL Query for equal and opposite values

Suppose I have a table with two columns id and val. I wan't to find all the distinct ids where there exist a pair of equal and opposite vals. For example suppose you have the following table
id | val
------+------
1 | 3
2 | 5
2 | -5
1 | 4
2 | 6
3 | 9
2 | -6
3 | -9
I want the result to be
result
2
3
2 in the result set because there are values 5, -5 and 6, -6. 3 is in the result set because of 9, -9.
I can do this by using where exists. Something like
select distinct tab1.id from tab tab1
where exists (
select * from tab tab2
where tab1.id = tab2.id
and tab1.val = -tab2.val
);
However I worry that a query like this has time complexity O(n^2) because it is computed like nested loops (?). However it is possible to compute this in O(n) time by scanning the table (and keeping track of previously seen results in a data structure with O(1) lookup time). What is the optimal way to write such a query?
We should have an explain of your request and how you sets indexes.
May be it could be done like this too :
WITH pos AS (
SELECT id, val FROM tab WHERE val > 0),
neg AS (
SELECT id, val FROM tab WHERE val < 0)
SELECT DISTINCT id
FROM pos JOIN neg USING (id)
WHERE pos.val = neg.val;
With right indexation, this could be quick. Depend also of the volume of data.

postgres offset by value not number

I have a table with at least a "name" column and an "ordinal_position" column. I wish to loop each row starting from a certain row the user inputs. Let's say the user inputs "John", and that his ordinal_position is 6 (out of a 10 total). How do I loop only the last 4 rows without using a subquery? I've tried using the "OVER()" window function but it doesn't seem to work on the offset part of the query, and that same offset only takes numbers (as far as I know) not strings.
EDIT (in response to klin):
INSERT INTO foo(id,name,ordinal_position) VALUES
(DEFAULT,'Peter',1),
(DEFAULT,'James',2),
(DEFAULT,'Freddy',3),
(DEFAULT,'Mark',4),
(DEFAULT,'Jack',5),
(DEFAULT,'John',6),
(DEFAULT,'Will',7),
(DEFAULT,'Robert',8),
(DEFAULT,'Dave',9),
(DEFAULT,'Michael',10);
so in my FOR, since the user inputed "John" I want to loop through Will-Michael. Something like the following but without a subquery:
SELECT * FROM foo ORDER BY ordinal_position OFFSET
(SELECT ordinal_position FROM foo WHERE name='John');
Unfortunately, you have to query the table to find an ordinal_position for a given name.
However, do not use offset. You can do it in where clause, for large tables it will be much faster:
select *
from foo
where ordinal_position > (select ordinal_position from foo where name = 'John')
order by ordinal_position;
id | name | ordinal_position
----+---------+------------------
7 | Will | 7
8 | Robert | 8
9 | Dave | 9
10 | Michael | 10
(4 rows)

SUM of COUNTs in the same table

I'm doing many counts that I want to show in a table. And I want in the same table to show the sum of all counts.
Here's what I got (simplified - I got 6 Counts):
SELECT * FROM (SELECT COUNT() AS NB_book
item as a1, metadatavalue as m1, metadatavalue as m12,
WHERE m1.field_id = 64 (because I need that field to exist)
AND m2.field_id = 66
And m2. = book
AND a1.in_archive = TRUE )
(SELECT COUNT() AS NB_toys
metadatavalue as m1, metadatavalue as m12,
WHERE m1.field_id = 64 (because I need that field to exist)
AND m2.field_id = 66
And m2. = toys
AND a1.in_archive = TRUE)
)
Now, I want the display to be like
-------------table ----------
|NB_book | NB_Toys | total_object |
-----------------------------
| 12 | 10 | 22 |
You want something along the lines of:
SELECT
sum(CASE WHEN condition_1 THEN 1 END) AS firstcount,
sum(CASE WHEN condition_2 THEN 1 END) AS secondcount,
sum(thecolumn) AS total
FROM ...
Your example query is too vague to construct something usable from, but this'll give you the idea. The conditions above can be any boolean expression.
If you prefer you can use NULLIF instead of CASE WHEN ... THEN ... END. I prefer to stick to the standard CASE.
It is difficult to figure out what you actually want. You can run completely different queries that each return a one-row result and combine the results like this:
select
(select count(*) from pgbench_accounts) as count1,
(select count(*) from pgbench_tellers) as count2 ;
But perhaps you shouldn't do that. Instead just run each query by itself and use the client, rather than the database engine, to format the results.

Adding the results of two select queries into one table row with PostgreSQL

I am attempting to return the result of two distinct select statements into one row in PostgreSQL. For example, I have two queries each that return the same number of rows:
Select tableid1, tableid2, tableid3 from table1
+----------+----------+----------+
| tableid1 | tableid2 | tableid3 |
+----------+----------+----------+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+----------+----------+----------+
Select table2id1, table2id2, table2id3, table2id4 from table2
+-----------+-----------+-----------+-----------+
| table2id1 | table2id2 | table2id3 | table2id4 |
+-----------+-----------+-----------+-----------+
| 7 | 8 | 9 | 15 |
| 10 | 11 | 12 | 19 |
+-----------+-----------+-----------+-----------+
Now i want to concatenate these tables keeping the same number of rows. I do not want to join on any values. The desired result would look like the following:
+----------+----------+----------+-----------+-----------+-----------+-----------+
| tableid1 | tableid2 | tableid3 | table2id1 | table2id2 | table2id3 | table2id4 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
| 1 | 2 | 3 | 7 | 8 | 9 | 15 |
| 4 | 5 | 6 | 10 | 11 | 12 | 19 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
What can I do to the two above queries (select * from table1) and (select * from table2) to return the desired result above.
Thanks!
You can use row_number() for join, but I'm not sure that you have guaranties that order of the rows will stay the same as in the tables. So it's better to add some order into over() clause.
with cte1 as (
select
tableid1, tableid2, tableid3, row_number() over() as rn
from table1
), cte2 as (
select
table2id1, table2id2, table2id3, table2id4, row_number() over() as rn
from table2
)
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn
You can't have what you want, as you wrote the question. Your two SELECTs don't have any ORDER BY clause, so the database can return the rows in whatever order it feels like. If it currently matches up, it does so only by accident, and will stop matching up as soon as you UPDATE a row.
You need a key column. Then you need to join on the key column. Anything else is attempting to invent unreliable and unsafe joins without actually using a join.
Frankly, this seems like a pretty dodgy schema. Lots of numbered integer columns like this, and the desire to concatenate them, may be a sign you should be looking at using integer arrays, or using a side-table with a foreign key relationship, instead.
Sample data in case anyone else wants to play:
CREATE TABLE table1(tableid1 integer, tableid2 integer, tableid3 integer);
INSERT INTO table1 VALUES (1,2,3), (4,5,6);
CREATE TABLE table2(table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer);
INSERT INTO table2 VALUES (7,8,9,15), (10,11,12,19);
Depending on what you're actually doing you might really have wanted arrays.
I think you might need to read these two posts:
Join 2 sets based on default order
How keep data don't sort?
which explain that SQL tables just don't have an order. So you cannot fetch them in a particular order.
DO NOT USE THE FOLLOWING CODE, IT IS DANGEROUS AND ONLY INCLUDED AS A PROOF OF CONCEPT:
As it happens you can use a set-returning function hack to very inefficiently do what you want. It's incredibly ugly and *completely unsafe without an ORDER BY in the SELECTs, but I'll include it for completeness. I guess.
CREATE OR REPLACE FUNCTION t1() RETURNS SETOF table1 AS $$ SELECT * FROM table1 $$ LANGUAGE sql;
CREATE OR REPLACE FUNCTION t2() RETURNS SETOF table2 AS $$ SELECT * FROM table2 $$ LANGUAGE sql;
SELECT (t1()).*, (t2()).*;
If you use this in any real code then kittens will cry. It'll produce insane and bizarre results if the number of rows in the tables differ and it'll produce the rows in orderings that might seem right at first, but will randomly start coming out wrong later on.
THE SANE WAY is to add a primary key properly, then do a join.

Why do I have to provide an items.id column to the group by clause?

I want to return unique items based on condition, sorted by price asc. My query fails because Postgres wants items.id to be present in the group by clause. If it's included the query returns everything matching the where clause, which is not what I want. Why do I need to include the column?
select items.*
from items
where product_id = 1 and items.status = 'in_stock'
group by condition /* , items.id returns everything */
order by items.price asc
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 2 | good | 5 |
| 3 | good | 3 |
I only want items with ids 1 and 3.
Update: Here's a fiddle using the answer below, which still produces the error:
http://sqlfiddle.com/#!1/33786/2
The problem is that PostgreSQL has no way of knowing which items records you want to take values from; that is, it can't tell that you want this:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 3 | good | 3 |
and not this:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 2 | good | 5 |
To fix this, you need to use some sort of aggregation function, such as MAX:
SELECT MAX(id) AS id,
condition,
MAX(price) AS price
FROM items
WHERE product_id = 1
AND status = 'in_stock'
GROUP BY condition
ORDER BY price ASC
which gives:
| id | condition | price |
--------------------------
| 1 | new | 9 |
| 3 | good | 5 |
(This restriction is part of the SQL standard, and most DBMSes enforce it. One exception is MySQL, which allows your query, but with the caveat that "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate" [link].)
SQL Fiddle
select *
from (
select distinct on (cond)
id, cond, price
from items
where product_id = 1 and items.status = 'in_stock'
order by cond, price
) s
order by price
The SQL standard requires this behaviour, though some databases like MySQL ignore it and instead return unpredictable results.
If there's more than one row for "cond = good" and you ask for the "id" of the row where "cond = good", which row should the database give you? The row with id = 3, or id = 2? How should it know which to pick? MySQL picks an arbitrary row if there are multiple candidates, but this isn't allowed by the standard.
In your case you seem to want to pick the lowest-price row for each condition.
PostgreSQL provides an extension, DISTINCT ON ..., to help with this. Clodaldo has demonstrated this in his answer, so I won't repeat that here. Using DISTINCT ON will be much more efficient than the example below.
The SQL-standard way would be to use a window to rank the results, then filter on the ranked data. Unfortunately this is pretty inefficient as it requires all rows that match the inner where clause to be collected and sorted.
SELECT *
FROM (
SELECT *, dense_rank() OVER w AS itemrank
FROM items
WHERE product_id = 1 AND items.status = 'in_stock'
WINDOW w AS (PARTITION BY cond ORDER BY price ASC)
) ranked_items
WHERE itemrank = 1;
(http://sqlfiddle.com/#!1/33786/19)
Another SQL-standard way is to use an aggregation subquery to find the min prices for each category then display all rows with the min price:
SELECT *
FROM items INNER JOIN (
SELECT cond, min(price) AS minprice
FROM items
WHERE product_id = 1 AND items.status = 'in_stock'
GROUP BY cond
) minprices(cond, price)
ON (items.price = minprices.price AND items.cond = minprices.cond)
ORDER BY items.price;
Unlike the DISTINCT ON version, though, this will display multiple entries if the lowest priced item has more than one entry with the same cond and price.
So.. you should really use the DISTINCT ON approach, but you need to understand it. Start with the PostgreSQL documentation here.
On a side note, newer PostgreSQL versions allow you to refer to any column of a table whose primary key you've listed in GROUP BY; they identify the functional dependency of the other columns on the primary key. So you don't have to aggregate other cols if you've mentioned the PK in newer versions. That's what the standard requires, but older versions weren't smart enough to figure it out and required all columns to be listed explicitly.
That's what people who ask this question usually want to know, but doesn't apply strictly to your question since it turns out you're trying to use GROUP BY to filter rows.