TSQL - Get latest rows whose title is not null

I have the following table:
========================
Id SubCode Title
========================
1 1 test1
1 2 test2
1 3 NULL
1 4 NULL
2 1 k1
2 2 k2
2 3 k3
2 4 NULL
Now I want to select, for each Id, the latest row whose Title is not null. For example, for Id 1 the query must return test2, and for Id 2 it must return k3:
========================
Id SubCode Title
========================
1 2 test2
2 3 k3
I have written this query:
select t.Id, t.SubCode, t.Title from Test t
inner join (
select max(Id) as Id, max(SubCode) as SubCode
from Test
group by Id
) tm on t.Id = tm.Id and t.SubCode = tm.SubCode
But this code gives the wrong result:
========================
Id SubCode Title
========================
1 4 NULL
2 4 NULL
Any idea?

You forgot to exclude the NULL titles with an appropriate WHERE clause (where title is not null).
However, such problems (getting the best / last / ... record) are usually best solved with analytic functions (RANK, DENSE_RANK, ROW_NUMBER) anyway, because with them you access the table only once:
select id, subcode, title
from
(
select id, subcode, title, rank() over (partition by id order by subcode desc) as rn
from test
where title is not null
) ranked
where rn = 1;
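A minimal sketch that runs the ranking query above end to end. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in for SQL Server; the function name `latest_non_null_titles` is hypothetical.

```python
import sqlite3

# Sample data from the question: (Id, SubCode, Title); None becomes SQL NULL.
ROWS = [
    (1, 1, "test1"), (1, 2, "test2"), (1, 3, None), (1, 4, None),
    (2, 1, "k1"), (2, 2, "k2"), (2, 3, "k3"), (2, 4, None),
]


def latest_non_null_titles():
    """Return (id, subcode, title) for the highest subcode per id that has a title."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (id INTEGER, subcode INTEGER, title TEXT)")
    con.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)
    # Drop NULL titles first, then rank what is left per id, highest subcode first.
    query = """
        SELECT id, subcode, title
        FROM (
            SELECT id, subcode, title,
                   RANK() OVER (PARTITION BY id ORDER BY subcode DESC) AS rn
            FROM test
            WHERE title IS NOT NULL
        ) ranked
        WHERE rn = 1
        ORDER BY id
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(latest_non_null_titles())  # [(1, 2, 'test2'), (2, 3, 'k3')]
```

Because the WHERE clause runs before the window function, the NULL rows never take part in the ranking.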

You need a WHERE Title IS NOT NULL clause in your inner select:
select t.Id, t.SubCode, t.Title from Test t
inner join (
select max(Id) as Id, max(SubCode) as SubCode
from Test
where Title is not null
group by Id
) tm on t.Id = tm.Id and t.SubCode = tm.SubCode
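The join version can be checked the same way. This sketch again assumes Python's sqlite3 module as a stand-in for SQL Server, and it selects Id directly in the inner query instead of max(Id), which gives the same value because Id is already in the GROUP BY. The function name is hypothetical.

```python
import sqlite3

# Sample data from the question: (Id, SubCode, Title); None becomes SQL NULL.
ROWS = [
    (1, 1, "test1"), (1, 2, "test2"), (1, 3, None), (1, 4, None),
    (2, 1, "k1"), (2, 2, "k2"), (2, 3, "k3"), (2, 4, None),
]


def latest_via_join():
    """Join each row back to the highest SubCode (with a non-NULL Title) of its Id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Test (Id INTEGER, SubCode INTEGER, Title TEXT)")
    con.executemany("INSERT INTO Test VALUES (?, ?, ?)", ROWS)
    query = """
        SELECT t.Id, t.SubCode, t.Title
        FROM Test t
        INNER JOIN (
            -- The WHERE clause removes NULL titles before MAX() is taken.
            SELECT Id, MAX(SubCode) AS SubCode
            FROM Test
            WHERE Title IS NOT NULL
            GROUP BY Id
        ) tm ON t.Id = tm.Id AND t.SubCode = tm.SubCode
        ORDER BY t.Id
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(latest_via_join())  # [(1, 2, 'test2'), (2, 3, 'k3')]
```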


Selecting other columns not in count, group by

I have a table as follows:
product_id sender_id timestamp ...other columns...
1 2 1222
1 2 3423
1 2 1231
2 2 890
3 4 234
2 3 234234
I want to get the rows where sender_id = 2, but I want to count and group them by product_id and sort by timestamp descending. This means I need the following result:
product_id sender_id timestamp count ...other columns...
1 2 3423 3
2 2 890 1
I tried the following query:
SELECT product_id, sender_id, timestamp, count(product_id), ...other columns...
FROM table
WHERE sender_id = 2
GROUP BY product_id
But I get the following error: ERROR: column "table.sender_id" must appear in the GROUP BY clause or be used in an aggregate function
It seems I cannot SELECT columns that are not in the GROUP BY. Another method I found online was to use a join:
SELECT product_id, sender_id, timestamp, count, ...other columns...
FROM table
JOIN (
SELECT product_id, COUNT(product_id) AS count
FROM table
GROUP BY (product_id)
) table1 ON table.product_id = table1.product_id
WHERE sender_id = 2
GROUP BY product_id
But this simply lists all rows without grouping or counting. My guess is that the ON part simply extends the table again.
Try grouping by both product_id and sender_id:
select product_id, sender_id, count(product_id), max(timestamp) maxtm
from t
where sender_id = 2
group by product_id, sender_id
order by maxtm desc
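A minimal sketch of the grouped query above on the question's sample rows, assuming Python's sqlite3 module as a stand-in for PostgreSQL. The timestamp column is named time_stamp here (as in the worked example further down), and the table name t and function name are hypothetical.

```python
import sqlite3

# (product_id, sender_id, time_stamp) rows from the question.
ROWS = [(1, 2, 1222), (1, 2, 3423), (1, 2, 1231),
        (2, 2, 890), (3, 4, 234), (2, 3, 234234)]


def counts_for_sender(sender_id):
    """Count rows per product for one sender, newest product first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (product_id INTEGER, sender_id INTEGER, time_stamp INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    # Every non-aggregated column is in the GROUP BY; time_stamp is aggregated with MAX.
    query = """
        SELECT product_id, sender_id, COUNT(product_id) AS product_count,
               MAX(time_stamp) AS maxtm
        FROM t
        WHERE sender_id = ?
        GROUP BY product_id, sender_id
        ORDER BY maxtm DESC
    """
    rows = con.execute(query, (sender_id,)).fetchall()
    con.close()
    return rows


print(counts_for_sender(2))  # [(1, 2, 3, 3423), (2, 2, 1, 890)]
```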
If you want other columns too:
select t.*, t1.product_count
from t
inner join (
select product_id, sender_id, count(product_id) product_count, max(timestamp) maxtm
from t
where sender_id = 2
group by product_id, sender_id
) t1
on t.product_id = t1.product_id and t.sender_id = t1.sender_id and t.timestamp = t1.maxtm
order by t1.maxtm desc
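A sketch of the join-back query above, assuming Python's sqlite3 module as a stand-in for PostgreSQL. The note column is a hypothetical placeholder for "...other columns...", and time_stamp replaces timestamp as the column name.

```python
import sqlite3

# (product_id, sender_id, time_stamp, note); note is a hypothetical extra column.
ROWS = [(1, 2, 1222, "a"), (1, 2, 3423, "b"), (1, 2, 1231, "c"),
        (2, 2, 890, "d"), (3, 4, 234, "e"), (2, 3, 234234, "f")]


def latest_rows_with_counts():
    """Return the newest full row per product for sender 2, plus its row count."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (product_id INTEGER, sender_id INTEGER, "
                "time_stamp INTEGER, note TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    # The derived table t1 does the grouping; joining on the max timestamp
    # picks the matching full row from t.
    query = """
        SELECT t.*, t1.product_count
        FROM t
        INNER JOIN (
            SELECT product_id, sender_id, COUNT(product_id) AS product_count,
                   MAX(time_stamp) AS maxtm
            FROM t
            WHERE sender_id = 2
            GROUP BY product_id, sender_id
        ) t1
        ON t.product_id = t1.product_id AND t.sender_id = t1.sender_id
           AND t.time_stamp = t1.maxtm
        ORDER BY t1.maxtm DESC
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(latest_rows_with_counts())  # [(1, 2, 3423, 'b', 3), (2, 2, 890, 'd', 1)]
```

If two rows of one product share the maximum timestamp, this join returns both of them.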
Here is a worked example with your data:
CREATE TABLE products (product_id INTEGER,
sender_id INTEGER,
time_stamp INTEGER);
INSERT INTO products VALUES
(1,2,1222),
(1,2,3423),
(1,2,1231),
(2,2,890),
(3,4,234),
(2,3,234234);
SELECT product_id,sender_id,string_agg(time_stamp::text,','),count(product_id)
FROM products
WHERE sender_id=2
GROUP BY product_id,sender_id;
Each product has several distinct time_stamp values, so you need to apply some aggregate to that column or remove it from the SELECT list.
If you remove time_stamp from the SELECT list, the query is simply:
SELECT product_id,sender_id,count(product_id)
FROM products
WHERE sender_id=2
GROUP BY product_id,sender_id
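The same aggregation can be tried outside PostgreSQL. This sketch assumes Python's sqlite3 module, where group_concat(X, ',') plays the role of string_agg; since the concatenation order is not guaranteed, the values are sorted in Python afterwards. The function name is hypothetical.

```python
import sqlite3

ROWS = [(1, 2, 1222), (1, 2, 3423), (1, 2, 1231),
        (2, 2, 890), (3, 4, 234), (2, 3, 234234)]


def aggregated_timestamps():
    """Return (product_id, sender_id, sorted timestamps, count) for sender 2."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (product_id INTEGER, sender_id INTEGER, time_stamp INTEGER)")
    con.executemany("INSERT INTO products VALUES (?, ?, ?)", ROWS)
    query = """
        SELECT product_id, sender_id, group_concat(time_stamp, ','), COUNT(product_id)
        FROM products
        WHERE sender_id = 2
        GROUP BY product_id, sender_id
        ORDER BY product_id
    """
    rows = con.execute(query).fetchall()
    con.close()
    # Split the comma-separated string back into sorted integers.
    return [(p, s, sorted(int(x) for x in stamps.split(",")), n)
            for p, s, stamps, n in rows]


print(aggregated_timestamps())  # [(1, 2, [1222, 1231, 3423], 3), (2, 2, [890], 1)]
```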

How to get the id of max count group in hive?

I have a table like this:
id , m_id , group_id
1 , a , 0
1 , b , 0
1 , c , 1
1 , d , 1
2 , e , 0
2 , f , 0
2 , g , 0
2 , h , 1
2 , i , 1
For each id, I would like to get the m_id values that belong to the group with the largest number of m_id values. If there is a tie, I will just take any one of the tied groups. Hence the expected output would be:
id , m_id
1 , a
1 , b
2 , e
2 , f
2 , g
Note: group_id only identifies a group within each id, i.e. group_id = 0 does not mean the same thing for id = 1 and id = 2.
My original idea was to get max(group_id) grouped by (id, m_id) and return the id, m_id that has the max(group_id). However, this approach won't handle the tie situation (the id = 1 case).
Really hope someone can help me on this!
Thanks!
Use row_number(), partitioned by id, to find the largest group for each id. Then self-join to get the m_id values for that (id, group_id) pair:
CREATE TABLE test
(
id integer , m_id char(1) , group_id integer
);
INSERT INTO test (id,m_id,group_id) VALUES (1,'a',0);
INSERT INTO test (id,m_id,group_id) VALUES (1,'b',0);
INSERT INTO test (id,m_id,group_id) VALUES (1,'c',1);
INSERT INTO test (id,m_id,group_id) VALUES (1,'d',1);
INSERT INTO test (id,m_id,group_id) VALUES (2,'e',0);
INSERT INTO test (id,m_id,group_id) VALUES (2,'f',0);
INSERT INTO test (id,m_id,group_id) VALUES (2,'g',0);
INSERT INTO test (id,m_id,group_id) VALUES (2,'h',1);
INSERT INTO test (id,m_id,group_id) VALUES (2,'i',1);
select b.id,b.group_id,b.m_id
from (
select id,group_id,row_number() over(partition by id order by count(*) desc) as r_no
from test
group by id,group_id
) a
join test b on b.id=a.id and b.group_id=a.group_id
where a.r_no=1
You can use row_number with aggregation to do this.
select t1.id,t1.group_id,t1.m_id
from (select id,group_id,row_number() over(partition by id order by count(*) desc) as rnum
from tbl
group by id,group_id
) t
join tbl t1 on t1.id=t.id and t1.group_id=t.group_id
where t.rnum=1
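A runnable sketch of the row_number approach, assuming Python's sqlite3 module as a stand-in for Hive. To avoid relying on count(*) inside the window's ORDER BY, the group sizes are computed in an inner derived table first; ordering by that cnt column is equivalent. The function name is hypothetical.

```python
import sqlite3

# (id, m_id, group_id) rows from the question.
ROWS = [
    (1, "a", 0), (1, "b", 0), (1, "c", 1), (1, "d", 1),
    (2, "e", 0), (2, "f", 0), (2, "g", 0), (2, "h", 1), (2, "i", 1),
]


def largest_group_members():
    """Return {id: sorted m_ids} for one largest group per id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER, m_id TEXT, group_id INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    query = """
        SELECT t1.id, t1.m_id
        FROM (
            SELECT id, group_id,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY cnt DESC) AS rnum
            FROM (
                SELECT id, group_id, COUNT(*) AS cnt
                FROM tbl
                GROUP BY id, group_id
            ) counts
        ) t
        JOIN tbl t1 ON t1.id = t.id AND t1.group_id = t.group_id
        WHERE t.rnum = 1
    """
    result = {}
    for id_, m_id in con.execute(query):
        result.setdefault(id_, []).append(m_id)
    con.close()
    return {k: sorted(v) for k, v in result.items()}


print(largest_group_members())
```

For id = 1 both groups have two members, so ROW_NUMBER picks one of them arbitrarily, which matches the "take any tied group" requirement; for id = 2 it always returns e, f, g.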

Subsetting records that contain multiple values in one column

In my Postgres table, I have two columns of interest: id and name. My goal is to keep only the records whose id has more than one value in name. In other words, I would like to keep all records of ids that have multiple values, where at least one of those values is B.
UPDATE: I have tried adding WHERE EXISTS to the queries below, but this does not work.
The sample data would look like this:
> test
id name
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
7 7 A
8 2 B
9 1 B
10 2 B
and the output would look like this:
> output
id name
1 1 A
2 2 A
8 2 B
9 1 B
10 2 B
How would one write a query to select only these kinds of records?
Based on your description you would seem to want:
select id, name
from (select t.*, min(name) over (partition by id) as min_name,
max(name) over (partition by id) as max_name
from t
) t
where min_name < max_name;
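A minimal check of the MIN/MAX window query on the sample data, assuming Python's sqlite3 module (3.25+) as a stand-in for PostgreSQL; the function name is hypothetical. Note that min_name < max_name keeps any id with at least two distinct names; it does not separately test for 'B'.

```python
import sqlite3

# (id, name) rows from the question's sample data.
ROWS = [(1, "A"), (2, "A"), (3, "A"), (4, "A"), (5, "A"), (6, "A"), (7, "A"),
        (2, "B"), (1, "B"), (2, "B")]


def multi_name_rows():
    """Keep rows whose id has at least two distinct names."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    # If the smallest and largest name per id differ, the id has 2+ distinct names.
    query = """
        SELECT id, name
        FROM (SELECT t.*, MIN(name) OVER (PARTITION BY id) AS min_name,
                          MAX(name) OVER (PARTITION BY id) AS max_name
              FROM t
             ) t
        WHERE min_name < max_name
        ORDER BY id, name
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(multi_name_rows())  # [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B'), (2, 'B')]
```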
This can be done using EXISTS:
select id, name
from test t1
where exists (select *
from test t2
where t1.id = t2.id
and t1.name <> t2.name) -- this will select those with multiple names for the id
and exists (select *
from test t3
where t1.id = t3.id
and t3.name = 'B') -- this will select those with at least one b for that id
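A sketch running the double-EXISTS query on the same sample rows, again assuming Python's sqlite3 module as a stand-in for PostgreSQL (function name hypothetical). The first EXISTS requires another row of the same id with a different name; the second requires a 'B' row for that id.

```python
import sqlite3

ROWS = [(1, "A"), (2, "A"), (3, "A"), (4, "A"), (5, "A"), (6, "A"), (7, "A"),
        (2, "B"), (1, "B"), (2, "B")]


def rows_with_b_and_other_name():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (id INTEGER, name TEXT)")
    con.executemany("INSERT INTO test VALUES (?, ?)", ROWS)
    query = """
        SELECT id, name
        FROM test t1
        WHERE EXISTS (SELECT * FROM test t2
                      WHERE t1.id = t2.id AND t1.name <> t2.name)
          AND EXISTS (SELECT * FROM test t3
                      WHERE t1.id = t3.id AND t3.name = 'B')
        ORDER BY id, name
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(rows_with_b_and_other_name())  # [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B'), (2, 'B')]
```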
So you want the records whose id shows up with more than one name, right?
This could be formulated in SQL as follows:
select * from test t1
where id in (
select id
from test t2
group by id
having count(distinct name) > 1)
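COUNT(name) counts rows, not distinct names, so an id with two identical names would also pass a plain COUNT(name) > 1 filter. This sketch shows the difference, assuming Python's sqlite3 module as a stand-in for PostgreSQL and adding one hypothetical duplicate (7, 'A') row to the sample data; the function name is also hypothetical.

```python
import sqlite3

# Sample (id, name) rows plus a hypothetical duplicate (7, 'A'),
# so id 7 has two rows that share the same name.
ROWS = [(1, "A"), (2, "A"), (3, "A"), (4, "A"), (5, "A"), (6, "A"), (7, "A"),
        (2, "B"), (1, "B"), (2, "B"), (7, "A")]


def ids_kept(count_expr):
    """Return the ids selected by HAVING <count_expr> > 1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (id INTEGER, name TEXT)")
    con.executemany("INSERT INTO test VALUES (?, ?)", ROWS)
    # count_expr only ever comes from the two fixed strings used below.
    query = f"""
        SELECT DISTINCT id FROM test
        WHERE id IN (SELECT id FROM test GROUP BY id HAVING {count_expr} > 1)
        ORDER BY id
    """
    ids = [row[0] for row in con.execute(query)]
    con.close()
    return ids


print(ids_kept("COUNT(name)"))           # [1, 2, 7]
print(ids_kept("COUNT(DISTINCT name)"))  # [1, 2]
```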

Row_number() over partition

I am working on PeopleSoft. I have a requirement to update a column with a sequence number that restarts for each ID.
For example:
CA24100001648- 1
CA24100001648- 2
CA24100001664- 1
CA24100001664- 2
CA24100001664- 3
CA24100001664- 4
CA24100001664- 5
CA24100001664- 6
But after the update, I am getting 1 as the value for all the rows.
Here is my query; can anyone please help with this?
UPDATE PS_UC_CA_CONT_STG C
SET C.CONTRACT_LINE_NUM2 = ( SELECT row_number() over(PARTITION BY D.CONTRACT_NUM
order by D.CONTRACT_NUM)
FROM PS_UC_CA_HDR_STG D
WHERE C.CONTRACT_NUM=D.CONTRACT_NUM );
Thanks
update emp a
set comm =
(with cnt as ( select deptno,empno,row_number() over (partition by deptno order by deptno) rn from emp)
select c.rn from cnt c where c.empno=a.empno)
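The answer's idea is to compute the row numbers over the whole table first and then match each row back to its own number through a unique key (empno in the emp example). A sketch of that idea on the question's contract numbers, assuming Python's sqlite3 module as a stand-in for Oracle/PeopleSoft and a hypothetical staging table cont_stg with a unique row_id key, since the question does not show one:

```python
import sqlite3

# Hypothetical staging rows: two lines for one contract, six for the other.
CONTRACTS = ["CA24100001648"] * 2 + ["CA24100001664"] * 6


def number_contract_lines():
    con = sqlite3.connect(":memory:")
    # row_id is an assumed unique key; line_num starts out NULL.
    con.execute("CREATE TABLE cont_stg (row_id INTEGER PRIMARY KEY, "
                "contract_num TEXT, line_num INTEGER)")
    con.executemany("INSERT INTO cont_stg (contract_num) VALUES (?)",
                    [(c,) for c in CONTRACTS])
    # Number all rows per contract in a derived table, then pick the
    # number belonging to the row being updated via row_id.
    con.execute("""
        UPDATE cont_stg
        SET line_num = (
            SELECT c.rn
            FROM (
                SELECT row_id,
                       ROW_NUMBER() OVER (PARTITION BY contract_num ORDER BY row_id) AS rn
                FROM cont_stg
            ) AS c
            WHERE c.row_id = cont_stg.row_id
        )
    """)
    rows = con.execute("SELECT contract_num, line_num FROM cont_stg ORDER BY row_id").fetchall()
    con.close()
    return rows


for contract, line in number_contract_lines():
    print(contract, line)
```

Ordering by a unique column (row_id here) makes the numbering deterministic; ordering by the partition column itself, as in the emp example, leaves the order within each department unspecified.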

unexplained error in sql execution

UPDATE amc_machine b
SET with_parts = a.with_parts,
amc_validity_upto = a.amc_validity_upto
FROM (SELECT CASE
WHEN count(*) > 0 THEN (SELECT DISTINCT ON (machine_id) with_parts, amc_validity_upto, machine_id
FROM amc_amcdetail
WHERE machine_id = 2 AND id != 1
ORDER BY machine_id, amc_validity_upto DESC)
WHEN count(*) = 0 THEN (SELECT FALSE AS with_parts, NULL AS amc_validity_upto, 2 AS machine_id)
END AS a
FROM (SELECT DISTINCT ON (machine_id) with_parts, amc_validity_upto, machine_id
FROM amc_amcdetail
WHERE machine_id = 2
ORDER BY machine_id, amc_validity_upto
) AS T) AS foo
WHERE a.machine_id = b.id
The error shown is:
ERROR: subquery must return only one column
LINE 5: WHEN count(*) > 0 THEN (SELECT DISTINCT ON (machine_id) w...
Can anyone tell me what the problem is?
Basically, the query should update table b with the data from table a if a matching row exists, and otherwise with NULL / FALSE as appropriate.
The query executes when run standalone. I am using Postgres 9.3, but deployment will be on Postgres 9.1.
The subquery inside the CASE returns 3 columns, but a subquery used as a value in an expression must return exactly one:
SELECT DISTINCT ON (machine_id) with_parts, amc_validity_upto, machine_id
Make it return only one column:
SELECT DISTINCT ON (machine_id) with_parts
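The one-column rule is not specific to PostgreSQL. This sketch reproduces it with Python's sqlite3 module as a stand-in: a THEN branch holding a three-column subquery, modelled on the question's fallback row (0 stands in for FALSE), is rejected, while the single-column version runs. The constant names and the helper are hypothetical. If all three values are needed, the subquery has to move out of the expression, for example into the FROM clause.

```python
import sqlite3

# A CASE whose THEN branch is a scalar subquery returning three columns.
THREE_COLUMNS = """
    SELECT CASE WHEN x > 0
                THEN (SELECT 0 AS with_parts, NULL AS amc_validity_upto, 2 AS machine_id)
           END
    FROM (SELECT 1 AS x)
"""
# The same expression with the subquery reduced to a single column.
ONE_COLUMN = """
    SELECT CASE WHEN x > 0 THEN (SELECT 0 AS with_parts) END
    FROM (SELECT 1 AS x)
"""


def run_scalar(sql):
    """Return ("ok", value) if SQLite runs sql, or ("error", message) if it rejects it."""
    con = sqlite3.connect(":memory:")
    try:
        return ("ok", con.execute(sql).fetchone()[0])
    except sqlite3.Error as exc:
        return ("error", str(exc))
    finally:
        con.close()


print(run_scalar(THREE_COLUMNS))  # rejected: the sub-select returns 3 columns
print(run_scalar(ONE_COLUMN))     # ('ok', 0)
```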