Postgres how can I merge 2 separate select queries into 1 - postgresql

I am using Postgres 9.4 and I would like to merge 2 separate queries into one statement. I have been looking at this How to merge these queries into 1 using subquery post but still can't figure out how to make it work. These 2 queries do work independently. Here they are:
#1: select * from votes v where v.user_id=32 and v.stream_id=130;
#2: select city,state,post,created_on,votes,id as Voted from streams
    where latitudes >=28.0363 AND 28.9059>= latitudes order by votes desc limit 5;
I would like query #2 to be limited to 5 rows; however, I don't want query #1 to be included in that limit, so up to 6 rows could be returned in total. This works like a suggestion engine where query #1 is the main thread and query #2 gives up to 5 different suggestions, which are obviously located in a different table.

Having neither your model nor your data, I simulated this problem with dummies of both in this SQL Fiddle:
CREATE TABLE votes
(
id smallint
, user_id smallint
);
CREATE TABLE streams
(
id smallint
, foo boolean
);
INSERT INTO votes
VALUES (1, 42), (2, 32), (3, 17), (4, 37), (5, 73), (6, 69), (7, 21), (8, 18), (9, 11), (10, 15), (11, 28);
INSERT INTO streams
VALUES (1, true), (2, true), (3, true), (4, true), (5, true), (6, true), (7, false), (8, false), (9, false), (10, false), (11, false);
SELECT id
FROM (SELECT id, 1 AS sort FROM votes WHERE user_id = 32) AS query_1
FULL JOIN (SELECT id FROM streams WHERE NOT foo LIMIT 5) AS query_2 USING (id)
ORDER BY sort
LIMIT 6;
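With the dummy data this should put the voted row on top, followed by the five streams rows, since NULLs in sort sort last in ascending order:
 id
----
  2
  7
  8
  9
 10
 11
(6 rows)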
Also I have to point out that this isn't entirely my work, but an adaptation of this answer I came across the other day. Maybe it is an approach here too.
So, what's going on? Column id stands for any column your tables and sub-queries have in common. votes.user_id I made up to have something to select on in the one sub-query, and streams.foo in the other.
As you demanded 6 rows at the most, I used the limit clause twice: first in the sub-query, just in case there is a huge number of rows in your table you don't want to select, and again in the outer query to finally restrict the number of rows. Fiddle about a little with the two limits and toggle between WHERE foo and WHERE NOT foo and you'll see why.
In the first sub-query I added a sort column, as is done in that answer. That's because I guess you want the result of the first sub-query always on top, too.
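Applied back to your own tables, the same pattern might look roughly like this. It is only a sketch: it assumes votes.stream_id references streams.id and that the column names from your query #2 exist as written, so adjust it to your actual schema:
SELECT s.city, s.state, s.post, s.created_on, s.votes, s.id AS voted
FROM (
    -- the "main thread": the stream the user voted on (assumes votes.stream_id -> streams.id)
    SELECT stream_id AS id, 1 AS sort
    FROM votes
    WHERE user_id = 32 AND stream_id = 130
    UNION ALL
    -- up to 5 suggestions from the latitude range
    (SELECT id, 2 AS sort
     FROM streams
     WHERE latitudes >= 28.0363 AND 28.9059 >= latitudes
     ORDER BY votes DESC
     LIMIT 5)
) sub
JOIN streams s USING (id)
ORDER BY sub.sort
LIMIT 6;
If the voted stream can also appear among the suggestions, you may want to filter it out of the second branch so it isn't listed twice.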

Related

PostgreSQL - Can I get the inverse of distinct rows?

I have a table of contacts. Each contact has an associated website. Each website can have multiple contacts.
I ran a query to get one contact per website with Select distinct on (website). This works fine.
But I want to do something with the rest of the data, the rows not selected by Select distinct on (website). Is there an inverse command where I can find all records from websites that have NOT been processed?
Use except. Here is an illustration. order by is for clarity.
create temporary table the_table (i integer, tx text);
insert into the_table values
(1, 'one'),
(1, 'one more one'),
(1, 'yet another one'),
(2, 'two'),
(2, 'one more two'),
(2, 'yet another two'),
(3, 'three'),
(3, 'three alternative');
select * from the_table
EXCEPT
select distinct on (i) * from the_table
order by i;
 i | tx
---+-------------------
 1 | one more one
 1 | yet another one
 2 | yet another two
 2 | one more two
 3 | three alternative
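For the contacts table from the question, the same idea would look something like this (a sketch, assuming the table is called contacts and the column is website):
select * from contacts
EXCEPT
select distinct on (website) * from contacts;
Note that distinct on without a matching order by picks an arbitrary row per website, so which rows end up in the "rest" is not deterministic unless you add one.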

Concatenate multiple contiguous rows to single row

I have a huge table with IoT data from a lot of IoT devices. Every device sends data once per minute, but only if its counter input got some signals. If not, no data is sent. So in my database the data looks like this:
Today I'm loading all this data into my application and aggregating it by iterating and checking row by row, collapsing it down to 3 rows based on contiguous rows. Contiguous rows are all rows where the next row is one minute later. It works, but it doesn't feel smart and nice.
Does it make sense to do this aggregation on SQL Server - especially to increase performance?
How would you start?
This is a classic Islands and Gaps problem. I'm still mastering Islands and Gaps so I'd love any feedback on my solution from others in the know (please be gentle). There are at least a couple different ways to solve Islands and Gaps but this is the one that is easiest on my brain. Here's how I got it to work:
DDL to set up data:
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp;
CREATE TABLE #tmp
(IoT_Device INT,
Count INT,
TimeStamp DATETIME);
INSERT INTO #tmp
VALUES
(1, 5, '2021-10-27 14:03'),
(1, 4, '2021-10-27 14:04'),
(1, 7, '2021-10-27 14:05'),
(1, 8, '2021-10-27 14:06'),
(1, 5, '2021-10-27 14:07'),
(1, 4, '2021-10-27 14:08'),
(1, 7, '2021-10-27 14:12'),
(1, 8, '2021-10-27 14:13'),
(1, 5, '2021-10-27 14:14'),
(1, 4, '2021-10-27 14:15'),
(1, 5, '2021-10-27 14:21'),
(1, 4, '2021-10-27 14:22'),
(1, 7, '2021-10-27 14:23');
Islands and Gaps Solution:
;WITH CTE_TIMESTAMP_DATA AS (
SELECT
IoT_Device,
Count,
TimeStamp,
LAG(TimeStamp) OVER
(PARTITION BY IoT_Device ORDER BY TimeStamp) AS previous_timestamp,
LEAD(TimeStamp) OVER
(PARTITION BY IoT_Device ORDER BY TimeStamp) AS next_timestamp,
ROW_NUMBER() OVER
(PARTITION BY IoT_Device ORDER BY TimeStamp) AS island_location
FROM #tmp
)
,CTE_ISLAND_START AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY IoT_Device ORDER BY TimeStamp) AS island_number,
IoT_Device,
TimeStamp AS island_start_timestamp,
island_location AS island_start_location
FROM CTE_TIMESTAMP_DATA
WHERE DATEDIFF(MINUTE, previous_timestamp, TimeStamp) > 1
OR previous_timestamp IS NULL
)
,CTE_ISLAND_END AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY IoT_Device ORDER BY TimeStamp) AS island_number,
IoT_Device,
TimeStamp AS island_end_timestamp,
island_location AS island_end_location
FROM CTE_TIMESTAMP_DATA
WHERE DATEDIFF(MINUTE, TimeStamp, next_timestamp) > 1
OR next_timestamp IS NULL
)
SELECT
S.IoT_Device,
(SELECT SUM(Count)
FROM CTE_TIMESTAMP_DATA
WHERE IoT_Device = S.IoT_Device
AND TimeStamp BETWEEN S.island_start_timestamp AND E.island_end_timestamp) AS Count,
S.island_start_timestamp,
E.island_end_timestamp
FROM CTE_ISLAND_START AS S
INNER JOIN CTE_ISLAND_END AS E
ON E.IoT_Device = S.IoT_Device
AND E.island_number = S.island_number;
The CTE_TIMESTAMP_DATA query pulls the IoT_Device, Count, and TimeStamp along with the TimeStamp before and after each record using LAG and LEAD, and assigns a row number to each record ordered by TimeStamp.
The CTE_ISLAND_START query gets the start of each island.
The CTE_ISLAND_END query gets the end of each island.
The main SELECT at the bottom then uses this data to sum the Count within each island.
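With the sample data above, the output should boil down to three islands:
IoT_Device  Count  island_start_timestamp   island_end_timestamp
1           33     2021-10-27 14:03         2021-10-27 14:08
1           24     2021-10-27 14:12         2021-10-27 14:15
1           16     2021-10-27 14:21         2021-10-27 14:23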
This will work with multiple IoT_Devices.
You can read more about Islands and Gaps here or numerous other places online.

Best way to avoid duplicates in table?

I've been given a task that requires writing a script to mass-change items in a table (ProductArea):
ProductID int
SalesareaID int
One ProductID can only exist once per SalesareaID, so there can't be any duplicates in this table. But one ProductID can be sold in multiple SalesareaIDs.
So an example would look something like:
ProductID SalesareaID
1 1
1 2
1 3
2 2
3 1
Now, some areas have merged. So, if I try to run a straight-forward UPDATE to fix this like:
UPDATE ProductArea SET SalesareaID = 4 where SalesareaID IN (2, 3)
it will find (1, 2) and change that to (1, 4). Then it will find (1, 3) and try to change that to (1, 4). But that already exists, so it will crash with a "Cannot insert duplicate key..." error.
Is there a best/recommended way to tell my UPDATE to only update if the resulting (ProductID, SalesareaID) doesn't already exist?
This should work. It uses a window function to remove the rows that would become duplicates before the update:
declare #T table (prodID int, salesID int, primary key (prodID, salesID));
insert into #T values
(1, 1)
, (1, 2)
, (1, 3)
, (2, 2)
, (3, 1);
with cte as
( select t.*
, row_number() over (partition by t.prodID order by t.salesID) as rn
from #T t
where t.salesID in (2, 3)
)
delete cte where rn > 1;
update #T set salesID = 4 where salesID in (2, 3);
select * from #T;
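With the sample rows above, the final select should return:
prodID  salesID
1       1
1       4
2       4
3       1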
If you are creating a new merged region from existing regions then I think the easiest thing to do would be to treat the merge as two separate operations.
First you insert entries for the new area based on the existing areas.
INSERT INTO ProductArea (ProductID, SalesareaID)
SELECT DISTINCT ProductID, 4 FROM ProductArea
WHERE SalesareaID IN (2, 3)
Then you remove the entries for the existing areas.
DELETE FROM ProductArea WHERE SalesareaID IN (2, 3)
The SalesareaID of 4 would need to be replaced by the id of the new Salesarea. The 2 and 3 would also need to be replaced by the ids of the areas you are merging to create the new Salesarea.
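If you go this route, it may be worth wrapping both statements in a single transaction so the table never shows a half-merged state; a minimal sketch, using the same placeholder ids as above:
BEGIN TRANSACTION;

INSERT INTO ProductArea (ProductID, SalesareaID)
SELECT DISTINCT ProductID, 4 FROM ProductArea
WHERE SalesareaID IN (2, 3);

DELETE FROM ProductArea WHERE SalesareaID IN (2, 3);

COMMIT TRANSACTION;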

Performance Issue with finding recent date of each group and joining to all records

I have following tables:
CREATE TABLE person (
id INTEGER NOT NULL,
name TEXT,
CONSTRAINT person_pkey PRIMARY KEY(id)
);
INSERT INTO person ("id", "name")
VALUES
(1, E'Person1'),
(2, E'Person2'),
(3, E'Person3'),
(4, E'Person4'),
(5, E'Person5'),
(6, E'Person6');
CREATE TABLE person_book (
id INTEGER NOT NULL,
person_id INTEGER,
book_id INTEGER,
receive_date DATE,
expire_date DATE,
CONSTRAINT person_book_pkey PRIMARY KEY(id)
);
/* Data for the 'person_book' table (Records 1 - 9) */
INSERT INTO person_book ("id", "person_id", "book_id", "receive_date", "expire_date")
VALUES
(1, 1, 1, E'2016-01-18', NULL),
(2, 1, 2, E'2016-02-18', E'2016-10-18'),
(3, 1, 4, E'2016-03-18', E'2016-12-18'),
(4, 2, 3, E'2017-02-18', NULL),
(5, 3, 5, E'2015-02-18', E'2016-02-23'),
(6, 4, 34, E'2016-12-18', E'2018-02-18'),
(7, 5, 56, E'2016-12-28', NULL),
(8, 5, 34, E'2018-01-19', E'2018-10-09'),
(9, 5, 57, E'2018-06-09', E'2018-10-09');
CREATE TABLE book (
id INTEGER NOT NULL,
type TEXT,
CONSTRAINT book_pkey PRIMARY KEY(id)
) ;
/* Data for the 'book' table (Records 1 - 8) */
INSERT INTO book ("id", "type")
VALUES
( 1, E'Btype1'),
( 2, E'Btype2'),
( 3, E'Btype3'),
( 4, E'Btype4'),
( 5, E'Btype5'),
(34, E'Btype34'),
(56, E'Btype56'),
(67, E'Btype67');
My query should list the names of all persons, and for persons who have received books of the types (book_id IN (2, 4, 34, 56, 67)), it should also display the type and expire date of the most recently received one; if a person hasn't received any such book type, it should display blanks for book type and expire date.
My query looks like this:
SELECT p.name,
pb.expire_date,
b.type
FROM
(SELECT p.id AS person_id, MAX(pb.receive_date) recent_date
FROM
Person p
JOIN person_book pb ON pb.person_id = p.id
WHERE pb.book_id IN (2, 4, 34, 56, 67)
GROUP BY p.id
)tmp
JOIN person_book pb ON pb.person_id = tmp.person_id
AND tmp.recent_date = pb.receive_date AND pb.book_id IN
(2, 4, 34, 56, 67)
JOIN book b ON b.id = pb.book_id
RIGHT JOIN Person p ON p.id = pb.person_id
The (correct) result:
name | expire_date | type
---------+-------------+---------
Person1 | 2016-12-18 | Btype4
Person2 | |
Person3 | |
Person4 | 2018-02-18 | Btype34
Person5 | 2018-10-09 | Btype34
Person6 | |
The query works fine but since I'm right joining a small table with a huge one, it's slow. Is there any efficient way of rewriting this query?
My local PostgreSQL version is 9.3.18, but the query should work on version 8.4 as well since that's our production version.
Problems with your setup
My local PostgreSQL version is 9.3.18, but the query should work on version 8.4 as well since that's our production version.
That makes two major problems before even looking at the query:
Postgres 8.4 is just too old, especially for "production". It reached EOL in July 2014: no more security upgrades, hopelessly outdated. Urgently consider upgrading to a current version.
It's a loaded footgun to use very different versions for development and production, inviting confusion and errors that go undetected. We have seen more than one desperate request here on SO stemming from this folly.
Better query
This equivalent should be substantially simpler and faster (works in pg 8.4, too):
SELECT p.name, pb.expire_date, b.type
FROM (
SELECT DISTINCT ON (person_id)
person_id, book_id, expire_date
FROM person_book
WHERE book_id IN (2, 4, 34, 56, 67)
ORDER BY person_id, receive_date DESC NULLS LAST
) pb
JOIN book b ON b.id = pb.book_id
RIGHT JOIN person p ON p.id = pb.person_id;
To optimize read performance, this partial multicolumn index with matching sort order would be perfect:
CREATE INDEX ON person_book (person_id, receive_date DESC NULLS LAST)
WHERE book_id IN (2, 4, 34, 56, 67);
In modern Postgres versions (9.2 or later) you might append book_id, expire_date to the index columns to get index-only scans. See:
How does PostgreSQL perform ORDER BY if a b-tree index is built on that field?
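A sketch of that extended index, with the extra columns simply appended as trailing key columns (on Postgres 11+ an INCLUDE clause would be an alternative):
CREATE INDEX ON person_book (person_id, receive_date DESC NULLS LAST, book_id, expire_date)
WHERE book_id IN (2, 4, 34, 56, 67);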
About DISTINCT ON:
Select first row in each GROUP BY group?
About DESC NULLS LAST:
PostgreSQL sort by datetime asc, null first?

PostgreSQL Get holes in index column

I suppose it is not easy to query a table for data which doesn't exist, but maybe there is some trick to find the holes in one integer column (rowindex).
Here is a small table illustrating the concrete situation:
DROP TABLE IF EXISTS examtable1;
CREATE TABLE examtable1
(rowindex integer primary key, mydate timestamp, num1 integer);
INSERT INTO examtable1 (rowindex, mydate, num1)
VALUES (1, '2015-03-09 07:12:45', 1),
(3, '2015-03-09 07:17:12', 4),
(5, '2015-03-09 07:22:43', 1),
(6, '2015-03-09 07:25:15', 3),
(7, '2015-03-09 07:41:46', 2),
(10, '2015-03-09 07:42:05', 1),
(11, '2015-03-09 07:45:16', 4),
(14, '2015-03-09 07:48:38', 5),
(15, '2015-03-09 08:15:44', 2);
SELECT rowindex FROM examtable1;
With the query shown I get all used indexes listed.
But I would like to get (say) the first five indexes which are missing, so I can use them to insert new data at the desired rowindex.
In the concrete example the result would be: 2, 4, 8, 9, 12, which represent indexes that are not used.
Is there any trick to build a query which will give the first n missing indexes?
In reality, such a table may contain many rows and the "holes" can be anywhere.
You can do this by generating a list of all numbers using generate_series() and then checking which numbers don't exist in your table.
This can be done either with an outer join:
select nr.i as missing_index
from (
select i
from generate_series(1, (select max(rowindex) from examtable1)) i
) nr
left join examtable1 t1 on nr.i = t1.rowindex
where t1.rowindex is null;
or with a not exists query:
select i
from generate_series(1, (select max(rowindex) from examtable1)) i
where not exists (select 1
from examtable1 t1
where t1.rowindex = i.i);
I have used a hardcoded lower bound for generate_series() so that you would also detect a missing rowindex that is smaller than the lowest existing number.
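To get only the first five missing indexes, as asked, you can order the result and add a limit; a sketch based on the outer-join variant:
select nr.i as missing_index
from generate_series(1, (select max(rowindex) from examtable1)) nr(i)
left join examtable1 t1 on nr.i = t1.rowindex
where t1.rowindex is null
order by nr.i
limit 5;
With the sample data this returns 2, 4, 8, 9, 12.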