Select only those records which appear twice in postgres - postgresql

select distinct(msg_id),sub_id from programs where sub_id IN
(
select sub_id from programs group by sub_id having count(sub_id) = 2 limit 5
)
sub_id means subscriberID.
The inner query returns those subscriberIDs that appear exactly 2 times in the programs table, and the main query returns the distinct msg_ids for those subscriberIDs.
This is the result that gets generated:
msg_id | sub_id
-------|--------
   112 |    313
   111 |    222
   113 |    313
   115 |    112
   116 |    112
   117 |    101
   118 |    115
   119 |    115
   110 |    222
I want it to be:
msg_id | sub_id
-------|--------
   112 |    313
   111 |    222
   113 |    313
   115 |    112
   116 |    112
   118 |    115
   119 |    115
   110 |    222
   117 |    101   (this row should not be in the output because it appears only once)
I want only those records which appear twice.
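For anyone who wants to reproduce this, here is a minimal setup sketch based on the sample rows above (the real programs table presumably has more rows and columns; integer types are an assumption):
create table programs (
    msg_id int,
    sub_id int
);
insert into programs (msg_id, sub_id) values
    (112, 313), (111, 222), (113, 313),
    (115, 112), (116, 112), (117, 101),
    (118, 115), (119, 115), (110, 222);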

I'm not sure, but are you just missing the second field in your in-list?
select distinct msg_id, sub_id, <presumably other fields>
from programs
where (sub_id, msg_id) IN
(
    select sub_id, msg_id
    from programs
    group by sub_id, msg_id
    having count(sub_id) = 2
)
If so, you can also do this with a windowing function:
with cte as (
    select
        msg_id, sub_id, <presumably other fields>,
        count(*) over (partition by msg_id, sub_id) as cnt
    from programs
)
select distinct
    msg_id, sub_id, <presumably other fields>
from cte
where cnt = 2

try this
SELECT msg_id, MAX(sub_id)
FROM programs
GROUP BY msg_id
HAVING COUNT(sub_id) = 2 -- COUNT(sub_id) > 1 if you want all those that repeat more than once
ORDER BY msg_id

Related

Unpivot data in PostgreSQL

I have a table in PostgreSQL with the below values,
empid  hyderabad  bangalore  mumbai  chennai
1      20         30         40      50
2      10         20         30      40
And my output should be like below:
empid  city       nos
1      hyderabad  20
1      bangalore  30
1      mumbai     40
1      chennai    50
2      hyderabad  10
2      bangalore  20
2      mumbai     30
2      chennai    40
How can I do this unpivot in PostgreSQL?
You can use a lateral join:
select t.empid, x.city, x.nos
from the_table t
cross join lateral (
    values
        ('hyderabad', t.hyderabad),
        ('bangalore', t.bangalore),
        ('mumbai', t.mumbai),
        ('chennai', t.chennai)
) as x(city, nos)
order by t.empid, x.city;
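If you want to run that as-is, here is a minimal sketch of the table it assumes (the_table is the placeholder name used in the query above; integer types are an assumption):
create table the_table (
    empid     int,
    hyderabad int,
    bangalore int,
    mumbai    int,
    chennai   int
);
insert into the_table values
    (1, 20, 30, 40, 50),
    (2, 10, 20, 30, 40);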
Or this one: simpler to read, and plain standard SQL ...
WITH
input(empid,hyderabad,bangalore,mumbai,chennai) AS (
          SELECT 1,20,30,40,50
UNION ALL SELECT 2,10,20,30,40
)
,
i(i) AS (
          SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
)
SELECT
  empid
, CASE i
    WHEN 1 THEN 'hyderabad'
    WHEN 2 THEN 'bangalore'
    WHEN 3 THEN 'mumbai'
    WHEN 4 THEN 'chennai'
    ELSE 'unknown'
  END AS city
, CASE i
    WHEN 1 THEN hyderabad
    WHEN 2 THEN bangalore
    WHEN 3 THEN mumbai
    WHEN 4 THEN chennai
    ELSE NULL::INT
  END AS nos
FROM input CROSS JOIN i
ORDER BY empid,i;
-- out  empid |   city    | nos
-- out -------+-----------+-----
-- out      1 | hyderabad |  20
-- out      1 | bangalore |  30
-- out      1 | mumbai    |  40
-- out      1 | chennai   |  50
-- out      2 | hyderabad |  10
-- out      2 | bangalore |  20
-- out      2 | mumbai    |  30
-- out      2 | chennai   |  40

Run a SQL query against ten-minutes time intervals

I have a postgresql table with this schema:
id SERIAL PRIMARY KEY,
traveltime INT,
departuredate TIMESTAMPTZ,
departurehour TIMETZ
Here is a bit of data (edited):
id | traveltime | departuredate | departurehour
----+------------+------------------------+---------------
1 | 73 | 2019-12-24 00:00:03+01 | 00:00:03+01
2 | 73 | 2019-12-24 00:12:16+01 | 00:12:16+01
53 | 115 | 2019-12-24 07:53:44+01 | 07:53:44+01
54 | 116 | 2019-12-24 07:58:45+01 | 07:58:45+01
55 | 119 | 2019-12-24 08:03:46+01 | 08:03:46+01
56 | 120 | 2019-12-24 08:08:47+01 | 08:08:47+01
57 | 121 | 2019-12-24 08:13:48+01 | 08:13:48+01
58 | 121 | 2019-12-24 08:18:48+01 | 08:18:48+01
542 | 112 | 2019-12-26 07:52:41+01 | 07:52:41+01
543 | 114 | 2019-12-26 07:57:42+01 | 07:57:42+01
544 | 116 | 2019-12-26 08:02:43+01 | 08:02:43+01
545 | 116 | 2019-12-26 08:07:44+01 | 08:07:44+01
546 | 117 | 2019-12-26 08:12:45+01 | 08:12:45+01
547 | 118 | 2019-12-26 08:17:46+01 | 08:17:46+01
548 | 118 | 2019-12-26 08:22:48+01 | 08:22:48+01
1031 | 80 | 2019-12-28 07:50:33+01 | 07:50:33+01
1032 | 81 | 2019-12-28 07:55:34+01 | 07:55:34+01
1033 | 81 | 2019-12-28 08:00:35+01 | 08:00:35+01
1034 | 82 | 2019-12-28 08:05:36+01 | 08:05:36+01
1035 | 82 | 2019-12-28 08:10:37+01 | 08:10:37+01
1036 | 83 | 2019-12-28 08:15:38+01 | 08:15:38+01
1037 | 83 | 2019-12-28 08:20:39+01 | 08:20:39+01
I'd like to get the average of all the traveltime values collected, for each 10-minute interval, over several weeks.
Expected result for the data sample: for the 10-minute interval between 08:00 and 08:10, the rows included in the avg are those with id 55, 56, 544, 545, 1033 and 1034,
and so on.
I can get the average for a specific interval:
select avg(traveltime) from belt where departurehour >= '10:40:00+01' and departurehour < '10:50:00+01';
To avoid creating a query for each interval, I used this query to get all the 10-minute intervals for the complete period covered:
select i from generate_series('2019-11-23', '2020-01-18', '10 minutes'::interval) i;
What I miss is a way to apply my AVG query to each of these generated intervals. Any direction would be helpful!
It turns out that generate_series does not actually apply here, since the buckets are the same regardless of the date: the critical part is the 144 10-minute intervals per day. Unfortunately Postgres does not provide an interval type for minutes (perhaps creating one would be a useful exercise), but all is not lost: you can simulate the same thing with BETWEEN, you just need to play with the end of each range.
The following generates this simulation using a recursive CTE, and then, as before, joins it to your table.
set timezone to '+1'; -- necessary to keep my local offset from affecting results.
-- create table and insert data here
-- (additional data was added outside of the date range, so it should not be included)
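-- A minimal sketch of that setup, using the belt table and columns from the question
-- (only the six 08:00-08:10 sample rows are shown here; the extra out-of-range rows
-- mentioned above are omitted):
create table belt (
    id serial primary key,
    traveltime int,
    departuredate timestamptz,
    departurehour timetz
);
insert into belt (id, traveltime, departuredate, departurehour) values
    (  55, 119, '2019-12-24 08:03:46+01', '08:03:46+01'),
    (  56, 120, '2019-12-24 08:08:47+01', '08:08:47+01'),
    ( 544, 116, '2019-12-26 08:02:43+01', '08:02:43+01'),
    ( 545, 116, '2019-12-26 08:07:44+01', '08:07:44+01'),
    (1033,  81, '2019-12-28 08:00:35+01', '08:00:35+01'),
    (1034,  82, '2019-12-28 08:05:36+01', '08:05:36+01');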
with recursive min_intervals as
(select '00:00:00'::timetz start_10Min -- start of 1st 10Min interval
, '00:09:59.999999'::timetz end_10Min -- last microsecond in 10Min interval
, 1 interval_no
union all
select start_10Min + interval '10 min'
, end_10Min + interval '10 min'
, interval_no + 1
from Min_intervals
where interval_no < 144 -- 6 10Min intervals/hr * 24 Hr/day = No of 10Min intervals in any day
) -- select * from min_intervals;
select start_10Min, end_10Min, avg(traveltime) average_travel_time
from min_intervals
join belt
on departuredate::time between start_10Min and end_10Min
where departuredate::date between date '2019-11-23' and date '2020-01-18'
group by start_10Min, end_10Min
order by start_10Min;
-- test result for the rows specified in the question. Note: the added rows fall within the time frame 08:00 to 08:10
-- but should be excluded, so the avg for that period should be the same in both queries.
select avg(traveltime) from belt where id in (55, 56, 544, 545, 1033, 1034);
My issue with the above is that the date range is essentially hard-coded (yes, substitution parameters are available), which is fine for psql or an IDE but not good for a production environment. If this is to be used in that environment, I'd use the following function to return a virtual table with the same results.
create or replace function travel_average_per_10Min_interval(
start_date_in date
, end_date_in date
)
returns table (Start_10Min timetz
,end_10Min timetz
,avg_travel_time numeric
)
language sql
as $$
with recursive min_intervals as
(select '00:00:00'::timetz start_10Min -- start of 1st 10Min interval
, '00:09:59.999999'::timetz end_10Min -- last microsecond in 10Min interval
, 1 interval_no
union all
select start_10Min + interval '10 min'
, end_10Min + interval '10 min'
, interval_no + 1
from Min_intervals
where interval_no < 144 -- 6 10Min intervals/hr * 24 Hr/day = No of 10Min intervals in any day
) -- select * from min_intervals;
select start_10Min, end_10Min, avg(traveltime) average_travel_time
from min_intervals
join belt
on departuredate::time between start_10Min and end_10Min
where departuredate::date between start_date_in and end_date_in
group by start_10Min, end_10Min
order by start_10Min;
$$;
-- test
select * from travel_average_per_10Min_interval(date '2019-11-23', date '2020-01-18');

Showing only TOP 1 value result from join duplicates

I have 3 tables like below. You will see how they are joined.
Orders Table
+---------+------------+
| Orderid | LocationId |
+---------+------------+
| 36 | 14 |
| 38 | 13 |
+---------+------------+
OrdersDetails Table
+-----------+------------+
| Detailsid | OrderId |
+-----------+------------+
| 38 | 36 |
| 39 | 36 |
| 40 | 38 |
+-----------+------------+
OrderLocations
+------------+------------+
| Locationid | DistanceKM |
+------------+------------+
| 13 | 550 |
| 14 | 245 |
+------------+------------+
When doing an inner join of the 3 tables we get:
+---------+-----------+------------+
| Orderid | Detailsid | DistanceKM |
+---------+-----------+------------+
| 36      | 38        | 245        |
| 36      | 39        | 245        |
| 38      | 40        | 550        |
+---------+-----------+------------+
I don't want to have a duplicate DistanceKM, e.g. 245. I would like a 0 instead for line item 2, like this:
+---------+-----------+------------+
| Orderid | Detailsid | DistanceKM |
+---------+-----------+------------+
| 36      | 38        | 245        |
| 36      | 39        | 0          |
| 38      | 40        | 550        |
+---------+-----------+------------+
Here is my solution:
Creating tables:
CREATE TABLE #Orders
(
    Orderid INT, LocationId INT
);
INSERT INTO #Orders
VALUES
(36, 14),
(38, 13);

CREATE TABLE #OrdersDetails
(
    Detailsid INT, OrderId INT
);
INSERT INTO #OrdersDetails
VALUES
(38, 36),
(39, 36),
(40, 38);

CREATE TABLE #OrderLocations
(
    Locationid INT, DistanceKM INT
);
INSERT INTO #OrderLocations
VALUES
(13, 550),
(14, 245);
The actual query:
;WITH cte AS
(
    SELECT o.Orderid, d.Detailsid, l.DistanceKM,
           ROW_NUMBER() OVER (PARTITION BY l.DistanceKM ORDER BY o.Orderid) AS rn
    FROM #Orders AS o
    INNER JOIN #OrdersDetails AS d
        ON o.Orderid = d.OrderId
    INNER JOIN #OrderLocations AS l
        ON o.LocationId = l.Locationid
)
SELECT cte.Orderid, cte.Detailsid,
       CASE
           WHEN cte.rn > 1 THEN 0
           ELSE cte.DistanceKM
       END AS DistanceKM
FROM cte;
And here are the results:
+---------+-----------+------------+
| Orderid | Detailsid | DistanceKM |
+---------+-----------+------------+
| 36      | 38        | 245        |
| 36      | 39        | 0          |
| 38      | 40        | 550        |
+---------+-----------+------------+

Select limited set of fields from inner query with preserved order

I've got a SQL query which involves one-to-many relationships with an ORDER BY clause:
SELECT
s0_.id,
s0_.created_at,
s5_.sort_order
FROM
surveys_submits s0_
INNER JOIN surveys_answers s3_ ON s0_.id = s3_.submit_id
INNER JOIN surveys_questions s4_ ON s3_.question_id = s4_.id
INNER JOIN surveys_questions_references s5_ ON s4_.id = s5_.question_id
ORDER BY
s0_.created_at DESC,
s5_.sort_order ASC
This query returns the following results:
id | created_at | sort_order
----+---------------------+-----------
218 | 2014-03-18 12:21:09 | 1
218 | 2014-03-18 12:21:09 | 2
218 | 2014-03-18 12:21:09 | 3
218 | 2014-03-18 12:21:09 | 4
218 | 2014-03-18 12:21:09 | 5
217 | 2014-03-18 12:20:57 | 1
217 | 2014-03-18 12:20:57 | 2
217 | 2014-03-18 12:20:57 | 3
...
214 | 2014-03-18 12:18:01 | 4
214 | 2014-03-18 12:18:01 | 5
213 | 2014-03-18 12:17:48 | 1
213 | 2014-03-18 12:17:48 | 2
213 | 2014-03-18 12:17:48 | 3
213 | 2014-03-18 12:17:48 | 4
213 | 2014-03-18 12:17:48 | 5
Now, I need to modify this query in a way that would return the first 25 distinct ids from the beginning, with the order preserved.
I've tried something like this:
SELECT DISTINCT id
FROM (
SELECT ... ORDER BY ...
) inner_query
ORDER BY created_at DESC, sort_order ASC
LIMIT 25 OFFSET 0;
But obviously it doesn't work:
ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 16: created_at DESC,
^
********** Error **********
...and I can't add the created_at and sort_order columns to the SELECT clause because it would result in duplicated ids, just like in the first query.
select *
from (
    SELECT distinct on (s0_.id)
        s0_.id,
        s0_.created_at,
        s5_.sort_order
    FROM
        surveys_submits s0_
        INNER JOIN surveys_answers s3_ ON s0_.id = s3_.submit_id
        INNER JOIN surveys_questions s4_ ON s3_.question_id = s4_.id
        INNER JOIN surveys_questions_references s5_ ON s4_.id = s5_.question_id
    ORDER BY
        s0_.id,
        s0_.created_at DESC,
        s5_.sort_order ASC
) s
order by
    created_at desc,
    sort_order ASC
limit 25
From the manual:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
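As a tiny illustration of that rule, on a hypothetical table t(id, val) that is not part of the question: to keep, for each id, the row with the highest val, the ORDER BY must start with the DISTINCT ON expression and then decide which row comes first:
SELECT DISTINCT ON (id) id, val
FROM t
ORDER BY id, val DESC;   -- for each id, only the first row (the highest val) is kept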

SQL Server: FAILING extra records

I have a tableA (ID int, Match varchar, code char, status char)
ID Match code Status
101 123 A
102 123 B
103 123 C
104 234 A
105 234 B
106 234 C
107 234 B
108 456 A
109 456 B
110 456 C
I want to populate status with 'FAIL' when:
For the same Match, there exists a code different from (A, B or C),
or the same code exists multiple times.
In other words, code can only be (A, B, C) and it should exist only once for the same Match, else fail. So the expected result would be:
ID Match code Status
101 123 A NULL
102 123 B NULL
103 123 C NULL
104 234 A NULL
105 234 B NULL
106 234 C NULL
107 234 B FAIL
108 456 A NULL
109 456 B NULL
110 456 C NULL
Thanks
No guarantees on efficiency here...
update tableA
set status = 'FAIL'
where ID not in (
    select min(ID)
    from tableA
    group by Match, code
)
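That handles the duplicate-code case; the other condition (a code outside A, B, C should also be failed) isn't covered by it. A sketch that adds that check, assuming the valid code values are literally the characters 'A', 'B' and 'C':
update tableA
set status = 'FAIL'
where code not in ('A', 'B', 'C')      -- assumption: valid codes are exactly A, B, C
   or ID not in (
       select min(ID)                  -- keep the first row of each (Match, code) group
       from tableA
       group by Match, code
   );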