Choose the duplicates and pick based on non duplicate column - postgresql

I need to write a query on the table below that fetches records only when the same email and name are shared by more than one member. For the example below, I need this result set:
100 a#a.com nameA
300 a#a.com nameA
Table
Member email name
100 a#a.com nameA
100 a#a.com nameA
300 a#a.com nameA
200 b#b.com nameB

I suspect you have a typo and mean 100 instead of 200 in your expected result. If so, here is one way:
with your_table(Member, email , name ) as (
select 100,'a#a.com','nameA' union all
select 100,'a#a.com','nameA' union all
select 300,'a#a.com','nameA' union all
select 200,'b#b.com','nameB'
)
-- below is actual query:
select distinct your_table.*
from your_table
inner join (
select email , name from your_table
group by email , name
having count(distinct Member) > 1
) t
on your_table.email = t.email and your_table.name = t.name
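
To check the behaviour of the query above without a PostgreSQL server, here is a minimal sketch that runs the same join against an in-memory SQLite database from Python. The use of SQLite and the variable name shared are assumptions made for illustration only; the sample rows are the ones from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (member INTEGER, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (100, "a#a.com", "nameA"),
        (100, "a#a.com", "nameA"),
        (300, "a#a.com", "nameA"),
        (200, "b#b.com", "nameB"),
    ],
)

# Keep only (email, name) pairs used by more than one distinct member,
# then join back to list each such member once.
shared = conn.execute(
    """
    SELECT DISTINCT y.member, y.email, y.name
    FROM your_table AS y
    JOIN (
        SELECT email, name
        FROM your_table
        GROUP BY email, name
        HAVING COUNT(DISTINCT member) > 1
    ) AS t ON y.email = t.email AND y.name = t.name
    ORDER BY y.member
    """
).fetchall()

print(shared)
```

COUNT(DISTINCT member) is what keeps the repeated row for member 100 from counting as a second member.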

Related

Selecting other columns not in count, group by

So I have a table as follows
product_id sender_id timestamp ...other columns...
1 2 1222
1 2 3423
1 2 1231
2 2 890
3 4 234
2 3 234234
I want to get rows where sender_id = 2, but I want to count and group by product_id and sort by timestamp descending. This means I need the following result
product_id sender_id timestamp count ...other columns...
1 2 3423 3
2 2 890 1
I tried the following query:
SELECT product_id, sender_id, timestamp, count(product_id), ...other columns...
FROM table
WHERE sender_id = 2
GROUP BY product_id
But I get the following error: ERROR: column "table.sender_id" must appear in the GROUP BY clause or be used in an aggregate function
Seems like I cannot SELECT columns that are not in the GROUP BY. Another method which I found online was to join
SELECT product_id, sender_id, timestamp, count, ...other columns...
FROM table
JOIN (
SELECT product_id, COUNT(product_id) AS count
FROM table
GROUP BY (product_id)
) table1 ON table.product_id = table1.product_id
WHERE sender_id = 2
GROUP BY product_id
But doing this simply lists all rows without grouping or counting. My guess is that the ON part simply extends table again.
Try grouping by both product_id and sender_id:
select product_id, sender_id, count(product_id), max(timestamp) maxtm
from t
where sender_id = 2
group by product_id, sender_id
order by maxtm desc
If you want other columns too:
select t.*, t1.product_count
from t
inner join (
select product_id, sender_id, count(product_id) product_count, max(timestamp) maxtm
from t
where sender_id = 2
group by product_id, sender_id
) t1
on t.product_id = t1.product_id and t.sender_id = t1.sender_id and t.timestamp = t1.maxtm
order by t1.maxtm desc
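
The join-back approach above can be tried out with a small sketch in Python using an in-memory SQLite database. SQLite, the column name ts (used here in place of timestamp) and the variable name latest are assumptions for illustration; the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "ts" stands in for the question's timestamp column (hypothetical name).
conn.execute("CREATE TABLE t (product_id INTEGER, sender_id INTEGER, ts INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 2, 1222), (1, 2, 3423), (1, 2, 1231), (2, 2, 890), (3, 4, 234), (2, 3, 234234)],
)

# Aggregate per (product_id, sender_id), then join back on the maximum
# timestamp so the full latest row is available next to the count.
latest = conn.execute(
    """
    SELECT t.product_id, t.sender_id, t.ts, t1.product_count
    FROM t
    JOIN (
        SELECT product_id, sender_id,
               COUNT(*) AS product_count,
               MAX(ts)  AS maxtm
        FROM t
        WHERE sender_id = 2
        GROUP BY product_id, sender_id
    ) AS t1
      ON t.product_id = t1.product_id
     AND t.sender_id = t1.sender_id
     AND t.ts = t1.maxtm
    ORDER BY t1.maxtm DESC
    """
).fetchall()

print(latest)
```

If two rows of one group share the same maximum timestamp, the join returns both of them, so this sketch relies on the timestamps being unique within a group.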
Here is a worked example with your data:
CREATE TABLE products (product_id INTEGER,
sender_id INTEGER,
time_stamp INTEGER);
INSERT INTO products VALUES
(1,2,1222),
(1,2,3423),
(1,2,1231),
(2,2,890),
(3,4,234),
(2,3,234234);
SELECT product_id,sender_id,string_agg(time_stamp::text,','),count(product_id)
FROM products
WHERE sender_id=2
GROUP BY product_id,sender_id
Each row here has a distinct time_stamp, so you need to either apply an aggregate to that column (string_agg above) or remove it from the select list.
If you remove time_stamp from the select list, the query becomes very simple:
SELECT product_id,sender_id,count(product_id)
FROM products
WHERE sender_id=2
GROUP BY product_id,sender_id

Subsetting records that contain multiple values in one column

In my postgres table, I have two columns of interest: id and name. My goal is to keep only records where id has more than one value in name. In other words, I would like to keep all records of ids that have multiple values, where at least one of those values is B.
UPDATE: I have tried adding WHERE EXISTS to the queries below, but this does not work.
The sample data would look like this:
> test
id name
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
7 7 A
8 2 B
9 1 B
10 2 B
and the output would look like this:
> output
id name
1 1 A
2 2 A
8 2 B
9 1 B
10 2 B
How would one write a query to select only these kinds of records?
Based on your description you would seem to want:
select id, name
from (select t.*, min(name) over (partition by id) as min_name,
max(name) over (partition by id) as max_name
from t
) t
where min_name < max_name;
This can be done using EXISTS:
select id, name
from test t1
where exists (select *
from test t2
where t1.id = t2.id
and t1.name <> t2.name) -- this will select those with multiple names for the id
and exists (select *
from test t3
where t1.id = t3.id
and t3.name = 'B') -- this will select those with at least one b for that id
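
As a quick check of the two EXISTS conditions, the following sketch loads the sample data from the question into an in-memory SQLite database and runs the same query from Python. SQLite, the added ORDER BY and the variable name kept are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "A"), (4, "A"), (5, "A"),
     (6, "A"), (7, "A"), (2, "B"), (1, "B"), (2, "B")],
)

kept = conn.execute(
    """
    SELECT id, name
    FROM test AS t1
    WHERE EXISTS (            -- the id appears with a different name
              SELECT 1 FROM test AS t2
              WHERE t1.id = t2.id AND t1.name <> t2.name)
      AND EXISTS (            -- the id appears with name 'B' at least once
              SELECT 1 FROM test AS t3
              WHERE t1.id = t3.id AND t3.name = 'B')
    ORDER BY id, name
    """
).fetchall()

print(kept)
```

Only ids 1 and 2 survive, which matches the five rows in the expected output of the question.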
You mean the records whose id shows up with more than one name, right?
This could be formulated in SQL as follows (count(distinct name) is used so that an id repeated with the same name does not qualify):
select * from test t1
where id in (
select id
from test t2
group by id
having count(distinct name) > 1)
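
One detail worth checking in a HAVING filter of this shape: COUNT(name) counts rows, whereas COUNT(DISTINCT name) counts different names. The sketch below shows the difference on a few hypothetical rows (not the data from the question), using Python with an in-memory SQLite database as an assumed stand-in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT)")
# Hypothetical rows: id 3 is repeated, but always with the same name.
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "A"), (3, "A"), (3, "A")],
)

def ids_having(condition):
    """Return the ids whose group satisfies the given HAVING condition."""
    sql = f"SELECT id FROM test GROUP BY id HAVING {condition} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

by_rows = ids_having("COUNT(name) > 1")            # counts rows per id
by_names = ids_having("COUNT(DISTINCT name) > 1")  # counts different names per id

print(by_rows, by_names)
```

Here id 3 passes the row count but not the distinct count, so the distinct form is the one that matches "more than one value in name".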

TSQL: Inserting missing records into table

I am stuck at this T-SQL query.
I have table below
Age SectioName Cost
---------------------
1 Section1 100
2 Section1 200
1 Section2 500
3 Section2 100
4 Section2 200
Let's say each section can have a maximum of 5 Ages. In the table above some Ages are missing. How do I insert the missing Ages for each section (preferably without using a cursor)? The cost should be zero for the missing Ages.
So after the insertion the table should look like
Age SectioName Cost
---------------------
1 Section1 100
2 Section1 200
3 Section1 0
4 Section1 0
5 Section1 0
1 Section2 500
2 Section2 0
3 Section2 100
4 Section2 200
5 Section2 0
EDIT1
I should have been clearer in my question. The maximum age is a dynamic value. It could be 5, 6, 10 or some other value, but it will always be less than 25.
I think I got it
;WITH tally AS
(
SELECT 1 AS r
UNION ALL
SELECT r + 1 AS r
FROM tally
WHERE r < 5 -- this value could be dynamic now
)
select n.r, t.SectionName, 0 as Cost
from (select distinct SectionName from TempFormsSectionValues) t
cross join
(select ta.r FROM tally ta) n
where not exists
(select * from TempFormsSectionValues where YearsAgo = n.r and SectionName = t.SectionName)
order by t.SectionName, n.r
You can use this query to select missing value:
select n.num, t.SectioName, 0 as Cost
from (select distinct SectioName from table1) t
cross join
(select 1 as num union select 2 union select 3 union select 4 union select 5) n
where not exists
(select * from table1 where table1.age = n.num and table1.SectioName = t.SectioName)
It creates a Cartesian product of sections and the numbers 1 to 5, and then selects the combinations that don't exist yet. You can then use this query as the source of an insert into your table.
SQL Fiddle (it has order by added to check the results easier but it's not necessary for inserting).
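
The cross join plus NOT EXISTS idea can be combined with a recursive number generator so that the maximum age is a parameter. The following is a minimal sketch in Python against an in-memory SQLite database, not T-SQL; the column name section_name, the variable max_age and the CTE name tally are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (age INTEGER, section_name TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "Section1", 100), (2, "Section1", 200),
     (1, "Section2", 500), (3, "Section2", 100), (4, "Section2", 200)],
)

max_age = 5  # hypothetical parameter; the question says it is always below 25

# Generate 1..max_age, pair every number with every section, and insert
# only the (age, section) combinations that are not in the table yet.
conn.execute(
    """
    WITH RECURSIVE tally(num) AS (
        SELECT 1
        UNION ALL
        SELECT num + 1 FROM tally WHERE num < ?
    )
    INSERT INTO table1 (age, section_name, cost)
    SELECT n.num, s.section_name, 0
    FROM (SELECT DISTINCT section_name FROM table1) AS s
    CROSS JOIN tally AS n
    WHERE NOT EXISTS (
        SELECT 1 FROM table1 AS x
        WHERE x.age = n.num AND x.section_name = s.section_name
    )
    """,
    (max_age,),
)

rows = conn.execute(
    "SELECT age, section_name, cost FROM table1 ORDER BY section_name, age"
).fetchall()
print(rows)
```

Running the insert a second time adds nothing, because every combination already exists by then.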
Use below query to generate missing rows
SELECT t1.Age,t1.Section,ISNULL(t2.Cost,0) as Cost
FROM
(
SELECT 1 as Age,'Section1' as Section,0 as Cost
UNION
SELECT 2,'Section1',0
UNION
SELECT 3,'Section1',0
UNION
SELECT 4,'Section1',0
UNION
SELECT 5,'Section1',0
UNION
SELECT 1,'Section2',0
UNION
SELECT 2,'Section2',0
UNION
SELECT 3,'Section2',0
UNION
SELECT 4,'Section2',0
UNION
SELECT 5,'Section2',0
) as t1
LEFT JOIN test t2
ON t1.Age=t2.Age AND t1.Section=t2.Section
ORDER BY Section,Age
You can use the result set above to insert the missing rows, with the EXCEPT operator excluding the rows that already exist in the table:
INSERT INTO test
SELECT t1.Age,t1.Section,ISNULL(t2.Cost,0) as Cost
FROM
(
SELECT 1 as Age,'Section1' as Section,0 as Cost
UNION
SELECT 2,'Section1',0
UNION
SELECT 3,'Section1',0
UNION
SELECT 4,'Section1',0
UNION
SELECT 5,'Section1',0
UNION
SELECT 1,'Section2',0
UNION
SELECT 2,'Section2',0
UNION
SELECT 3,'Section2',0
UNION
SELECT 4,'Section2',0
UNION
SELECT 5,'Section2',0
) as t1
LEFT JOIN test t2
ON t1.Age=t2.Age AND t1.Section=t2.Section
EXCEPT
SELECT Age,Section,Cost
FROM test;
SELECT * FROM test
ORDER BY Section,Age
http://www.sqlfiddle.com/#!3/d9035/11

postgre group by having

I have a query like this:
select c.travelandsmile_id, c.name, c.surname
from customer c
where c.travelandsmile_id in
(
select s.travelandsmile_id
from spent_kilometers s
group by travelandsmile_id
having count(s.kilometers)=1
)
I want to select the customers that appear only once in the table spent_kilometers and whose kilometers value is greater than 30. But when I add where s.kilometers > 30, the result is wrong: more tuples appear than with the first query.
How can I do that?
select travelandsmile_id, c.name, c.surname
from
customer c
inner join
spent_kilometers s using (travelandsmile_id)
where s.kilometers > 30
group by travelandsmile_id, c.name, c.surname
having count(*) = 1
If I read the question correctly, you want to find all customers who have exactly one record in spent_kilometers, and that record must satisfy s.kilometers > 30.
This can be done with the following SQL.
select c.travelandsmile_id, c.name, c.surname
from customer c
where c.travelandsmile_id in
( /* find all customers that have only one record in spent_kilometers */
select s.travelandsmile_id
from spent_kilometers s
group by s.travelandsmile_id having count(s.travelandsmile_id) = 1
)
and c.travelandsmile_id in
( /* find all customers that have s.kilometers > 30 */
select s.travelandsmile_id
from spent_kilometers s
where s.kilometers > 30
);
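
The reason a plain WHERE s.kilometers > 30 gives more rows is that WHERE filters rows before they are counted, so a customer with several records can be left with exactly one. A variation that keeps everything in one subquery is to test the value inside HAVING. This is a sketch of that alternative, not the query from the answer above; it uses Python with an in-memory SQLite database, and the customer rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (travelandsmile_id INTEGER, name TEXT, surname TEXT)")
conn.execute("CREATE TABLE spent_kilometers (travelandsmile_id INTEGER, kilometers INTEGER)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Cy", "Fox")],
)
conn.executemany(
    "INSERT INTO spent_kilometers VALUES (?, ?)",
    [(1, 50),            # one record, above 30      -> qualifies
     (2, 10),            # one record, not above 30  -> rejected
     (3, 40), (3, 60)],  # two records               -> rejected
)

# Count first, and test the single remaining value in HAVING,
# so the kilometers filter cannot change the count.
once_over_30 = conn.execute(
    """
    SELECT c.travelandsmile_id, c.name, c.surname
    FROM customer AS c
    WHERE c.travelandsmile_id IN (
        SELECT s.travelandsmile_id
        FROM spent_kilometers AS s
        GROUP BY s.travelandsmile_id
        HAVING COUNT(*) = 1 AND MIN(s.kilometers) > 30
    )
    """
).fetchall()

print(once_over_30)
```

With exactly one row per group, MIN(kilometers) is simply that row's value, which is why the aggregate is safe here.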

Grouping SQL results by continuous time intervals (oracle sql)

I have following data in the table as below and I am looking for a way to group the continuous time intervals for each id to return:
CREATE TABLE DUMMY
(
ID VARCHAR2(10 BYTE),
TIME_STAMP VARCHAR2(8 BYTE),
NAME VARCHAR2(255 BYTE)
);
SELECT ID, min(TIME_STAMP) "startDate", max(TIME_STAMP) "endDate", NAME
FROM DUMMY
GROUP BY ID , NAME
something like
100 20011128 20011203 David
100 20011204 20011207 Unknown
100 20011208 20011215 David
100 20011216 20011220 Sara
and so on ...
P.S. I have a sample script, but I don't know how to attach the file.
Hi everyone, here is more input:
There is only one record per time_stamp for a specific ID.
Users can differ from day to day, for example David on day 1, Unknown on day 2, David on day 3 and so on.
So there is one row for every day of the year for each ID, but with different users.
Now I want to see the break points: for a specific ID, going through the days in order from the first day to the last,
I want one time_stamp interval for every unbroken run of the same user.
Query Result should be :
ID NAME MIN_DATE MAX_DATE
100 David 20011128 20050407
100 Sara 20050408 20050417
100 David 20050418 20080416
100 Unknown 20080417 20080507
100 David 20080508 20080508
100 Unknown 20080509 20080607
100 David 20080608 20080608
100 Unknown 20080609 20080921
100 David 20080922 20080922
100 Unknown 20080923 20081231
100 David 20090101 20090405
thanks
Hi again, many thanks to everyone. I have solved the problem; here is the solution:
select id, min(time_stamp), max(time_stamp), name
from ( select id, time_stamp, name,
max(rn) over (order by time_stamp) grp
from ( select id, time_stamp, name,
case
when lag(name) over (order by time_stamp) <> name or
row_number() over (order by time_stamp) = 1
then row_number() over (order by time_stamp)
end rn
from dummy
)
)
group by id, grp, name
order by 1
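
The query above starts a new group every time the name changes from one day to the next. The same logic can be sketched outside the database in Python, where itertools.groupby only merges consecutive items with equal keys. This is an illustration of the idea rather than a replacement for the Oracle query; the sample rows and the variable name intervals are hypothetical.

```python
from itertools import groupby

# Hypothetical daily rows: (id, time_stamp, name).
rows = [
    ("100", "20011128", "David"),
    ("100", "20011129", "David"),
    ("100", "20011130", "Unknown"),
    ("100", "20011201", "David"),
    ("100", "20011202", "Sara"),
    ("100", "20011203", "Sara"),
]

# Sort by id and day so that each run of the same name is contiguous.
rows.sort(key=lambda r: (r[0], r[1]))

intervals = []
# groupby starts a new group whenever (id, name) changes, which is the
# same break point the lag(name) <> name test detects in the query.
for (id_, name), run in groupby(rows, key=lambda r: (r[0], r[2])):
    stamps = [r[1] for r in run]
    intervals.append((id_, name, stamps[0], stamps[-1]))

for interval in intervals:
    print(interval)
```

David appears twice in the result because his two runs are separated by a day that belongs to Unknown, which a plain GROUP BY id, name would have merged into one row.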
Select
ID,
Name,
min(time_stamp) min_date,
max(time_stamp) max_date
from
Dummy
group by
Id,
Name
That should work.
If you want the date range for each Id but still want to list all the names, you can do:
Select
d.Id,
d.Name,
dr.min_date,
dr.max_date
from
Dummy d
JOIN
(Select
Id,
min(time_stamp) min_date,
max(time_stamp) max_date
from
Dummy
group by
Id
) dr
on ( dr.Id = d.Id)