How can I get the sum(value) on the latest gather_time per group (name, col1) in PostgreSQL?

I got a good answer to a similar issue in the thread below, but I need one more solution for a different data set.
How to get the latest 2 rows (PostgreSQL)
The data set contains historical data, and I want to get sum(value) for each group at its latest gather_time.
The final result should be as follows:
name | col1 | gather_time | sum
-------+------+---------------------+-----
first | 100 | 2016-01-01 23:12:49 | 6
first | 200 | 2016-01-01 23:11:13 | 4
However, with the query below I can only see data for one group (first-100), meaning there is no row for the second group (first-200).
The thing is, I need one row per group, and the number of groups can vary.
select name,col1,gather_time,sum(value)
from testtable
group by name,col1,gather_time
order by gather_time desc
limit 2;
name | col1 | gather_time | sum
-------+------+---------------------+-----
first | 100 | 2016-01-01 23:12:49 | 6
first | 100 | 2016-01-01 23:11:19 | 6
(2 rows)
Can you advise me on how to accomplish this requirement?
Data set
create table testtable
(
name varchar(30),
col1 varchar(30),
col2 varchar(30),
gather_time timestamp,
value integer
);
insert into testtable values('first','100','q1','2016-01-01 23:11:19',2);
insert into testtable values('first','100','q2','2016-01-01 23:11:19',2);
insert into testtable values('first','100','q3','2016-01-01 23:11:19',2);
insert into testtable values('first','200','t1','2016-01-01 23:11:13',2);
insert into testtable values('first','200','t2','2016-01-01 23:11:13',2);
insert into testtable values('first','100','q1','2016-01-01 23:11:11',2);
insert into testtable values('first','100','q1','2016-01-01 23:12:49',2);
insert into testtable values('first','100','q2','2016-01-01 23:12:49',2);
insert into testtable values('first','100','q3','2016-01-01 23:12:49',2);
select *
from testtable
order by name,col1,gather_time;
name | col1 | col2 | gather_time | value
-------+------+------+---------------------+-------
first | 100 | q1 | 2016-01-01 23:11:11 | 2
first | 100 | q2 | 2016-01-01 23:11:19 | 2
first | 100 | q3 | 2016-01-01 23:11:19 | 2
first | 100 | q1 | 2016-01-01 23:11:19 | 2
first | 100 | q3 | 2016-01-01 23:12:49 | 2
first | 100 | q1 | 2016-01-01 23:12:49 | 2
first | 100 | q2 | 2016-01-01 23:12:49 | 2
first | 200 | t2 | 2016-01-01 23:11:13 | 2
first | 200 | t1 | 2016-01-01 23:11:13 | 2

One option is to join your original table to a table containing only the records with the latest gather_time for each name, col1 group. Then you can take the sum of the value column for each group to get the result set you want.
SELECT t1.name, t1.col1, MAX(t1.gather_time) AS gather_time, SUM(t1.value) AS sum
FROM testtable t1 INNER JOIN
(
    SELECT name, col1, MAX(gather_time) AS maxTime
    FROM testtable
    GROUP BY name, col1
) t2
ON t1.name = t2.name AND t1.col1 = t2.col1 AND
   t1.gather_time = t2.maxTime
GROUP BY t1.name, t1.col1
If you wanted to use a subquery in the WHERE clause to restrict to only the records with the latest gather_time, then you could try the following:
SELECT name, col1, gather_time, SUM(value) AS sum
FROM testtable t1
WHERE gather_time =
(
    SELECT MAX(gather_time)
    FROM testtable t2
    WHERE t1.name = t2.name AND t1.col1 = t2.col1
)
GROUP BY name, col1
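A third option, not from the original answers, just a sketch: a window function. RANK() over each (name, col1) partition, ordered by gather_time descending, marks every row that ties for the latest timestamp with rank 1, and those are exactly the rows to sum:
SELECT name, col1, gather_time, SUM(value) AS sum
FROM (
    SELECT name, col1, gather_time, value,
           RANK() OVER (PARTITION BY name, col1
                        ORDER BY gather_time DESC) AS rnk
    FROM testtable
) ranked
WHERE rnk = 1
GROUP BY name, col1, gather_time
ORDER BY gather_time DESC;
With the sample data this returns the two rows shown in the expected result: 6 for first-100 and 4 for first-200.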

Related

Join and combine tables to get common rows in a specific column together in Postgres

I have a couple of tables in a Postgres database. I have joined and merged the tables. However, I would like common values in a specific column to appear together in the final table (in the end, I would like to perform a group-by and maximum value calculation on the table).
The schema of the test tables looks like this:
Schema (PostgreSQL v11)
CREATE TABLE table1 (
id CHARACTER VARYING NOT NULL,
seq CHARACTER VARYING NOT NULL
);
INSERT INTO table1 (id, seq) VALUES
('UA502', 'abcdef'), ('UA503', 'ghijk'),('UA504', 'lmnop')
;
CREATE TABLE table2 (
id CHARACTER VARYING NOT NULL,
score FLOAT
);
INSERT INTO table2 (id, score) VALUES
('UA502', 2.2), ('UA503', 2.6),('UA504', 2.8)
;
CREATE TABLE table3 (
id CHARACTER VARYING NOT NULL,
seq CHARACTER VARYING NOT NULL
);
INSERT INTO table3 (id, seq) VALUES
('UA502', 'qrst'), ('UA503', 'uvwx'),('UA504', 'yzab')
;
CREATE TABLE table4 (
id CHARACTER VARYING NOT NULL,
score FLOAT
);
INSERT INTO table4 (id, score) VALUES
('UA502', 8.2), ('UA503', 8.6),('UA504', 8.8);
I performed join and union operations on the tables to get the desired columns.
Query #1
SELECT table1.id, table1.seq, table2.score
FROM table1 INNER JOIN table2 ON table1.id = table2.id
UNION
SELECT table3.id, table3.seq, table4.score
FROM table3 INNER JOIN table4 ON table3.id = table4.id
;
The output looks like this:
| id | seq | score |
| ----- | ------ | ----- |
| UA502 | qrst | 8.2 |
| UA502 | abcdef | 2.2 |
| UA504 | yzab | 8.8 |
| UA503 | uvwx | 8.6 |
| UA504 | lmnop | 2.8 |
| UA503 | ghijk | 2.6 |
However, the desired output should be:
| id | seq | score |
| ----- | ------ | ----- |
| UA502 | qrst | 8.2 |
| UA502 | abcdef | 2.2 |
| UA504 | yzab | 8.8 |
| UA504 | lmnop | 2.8 |
| UA503 | uvwx | 8.6 |
| UA503 | ghijk | 2.6 |
How should I modify my query to get the desired output?
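One way, offered here as a sketch since no answer was included for this question: a bare UNION has no guaranteed row order, so add an ORDER BY that keeps rows with the same id together. Sorting by id (and score descending) lists the ids alphabetically rather than in the exact order shown above, but rows sharing an id land together, which is the stated requirement:
SELECT table1.id, table1.seq, table2.score
FROM table1 INNER JOIN table2 ON table1.id = table2.id
UNION
SELECT table3.id, table3.seq, table4.score
FROM table3 INNER JOIN table4 ON table3.id = table4.id
ORDER BY id, score DESC;
This yields UA502 (8.2, 2.2), then UA503 (8.6, 2.6), then UA504 (8.8, 2.8).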

How to join 2 tables without value duplication in PostgreSql

I am joining two tables using:
select table1.date, table1.item, table1.qty, table2.anotherQty
from table1
INNER JOIN table2
on table1.date = table2.date
table1
date | item | qty
july1 | itemA | 20
july1 | itemB | 30
july2 | itemA | 20
table2
date | anotherQty
july1 | 200
july2 | 300
Expected result should be:
date | item | qty | anotherQty
july1 | itemA | 20 | 200
july1 | itemB | 30 | null or 0
july2 | itemA | 20 | 300
so that when I sum(anotherQty) the total is 500 only, instead of:
date | item | qty | anotherQty
july1 | itemA | 20 | 200
july1 | itemB | 30 | 200
july2 | itemA | 20 | 300
That is, 200 + 200 + 300 = 700.
One approach is to pair the rows within each date using ROW_NUMBER(), so each table2 quantity joins to exactly one table1 row:
WITH T1 as (
SELECT *, ROW_NUMBER() OVER (PARTITION BY "date" ORDER BY "item") as rn
FROM Table1
), T2 as (
SELECT *, ROW_NUMBER() OVER (PARTITION BY "date" ORDER BY "anotherQty") as rn
FROM Table2
)
SELECT *
FROM t1
LEFT JOIN t2
ON t1."date" = t2."date"
AND t1.rn = t2.rn
OUTPUT
Filter the columns you want, and change the order if needed.
| date | item | qty | rn | date | anotherQty | rn |
|-------|-------|-----|----|--------|------------|--------|
| july1 | itemA | 20 | 1 | july1 | 200 | 1 |
| july1 | itemB | 30 | 2 | (null) | (null) | (null) |
| july2 | itemA | 20 | 1 | july2 | 300 | 1 |
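For instance, a sketch reusing the T1/T2 CTEs above: replace the final SELECT * with an explicit column list. COALESCE turns the missing quantities into 0, matching the "null or 0" in the expected result:
SELECT t1."date", t1."item", t1."qty",
       COALESCE(t2."anotherQty", 0) AS "anotherQty"
FROM t1
LEFT JOIN t2
  ON t1."date" = t2."date"
 AND t1.rn = t2.rn;
Summing anotherQty over this output gives 200 + 0 + 300 = 500.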
Try the following code, but be aware that as long as the qty values differ across rows, the anotherQty field will still break out into distinct values:
select
table1.date,
table1.item,
table1.qty,
SUM(table2.anotherQty)
from table1
INNER JOIN table2
on table1.date = table2.date
GROUP BY
table1.item,
table1.qty,
table1.date
If you need it to always aggregate down to a single line per item/date, then you will need to add a SUM() to table1.qty as well. Alternatively, you could run a common table expression (WITH statement) for each quantity that you want, summing within each common table expression, and then rejoin the expressions in your final SELECT statement, as sketched below.
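A minimal sketch of that CTE approach, using the demo tables above: sum each quantity per date in its own expression, then join, so no row is double-counted:
WITH t1_sums AS (
    SELECT "date", SUM(qty) AS total_qty
    FROM table1
    GROUP BY "date"
),
t2_sums AS (
    SELECT "date", SUM("anotherQty") AS total_another
    FROM table2
    GROUP BY "date"
)
SELECT t1_sums."date", total_qty, total_another
FROM t1_sums
JOIN t2_sums ON t1_sums."date" = t2_sums."date";
With the sample data this returns july1 | 50 | 200 and july2 | 20 | 300, so summing total_another gives 500.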
Edit:
Based on the comment from @Juan Carlos Oropeza, I'm not sure there is a way to get the summed value of 500 while including table1.date in your query, because you would have to group the output by date, which causes the aggregation to split into distinct lines. The following query gets you the sum of anotherQty, at the sacrifice of displaying date:
select
table1.item,
SUM(table1.qty),
SUM(table2.anotherQty)
from table1
INNER JOIN table2
on table1.date = table2.date
GROUP BY
table1.item
If you need date to persist, you can get the sum to show up by using a window function, but note that this is essentially a running sum and may throw off any subsequent summation you do on this query's output in post-processing:
select
table1.item,
table1.date,
SUM(table1.qty),
SUM(table2.anotherQty) OVER (Partition By table1.item)
from table1
INNER JOIN table2
on table1.date = table2.date
GROUP BY
table1.item,
table1.date,
table2.anotherQty

Updating multiple rows with a certain value from the same table

So, I have the following table:
time | name | ID |
12:00:00| access | 1 |
12:05:00| select | null |
12:10:00| update | null |
12:15:00| insert | null |
12:20:00| out | null |
12:30:00| access | 2 |
12:35:00| select | null |
The table is bigger (approx. 1-1.5 million rows); there will be IDs equal to 2, 3, 4, etc., with rows in between.
The following should be the result:
time | name | ID |
12:00:00| access | 1 |
12:05:00| select | 1 |
12:10:00| update | 1 |
12:15:00| insert | 1 |
12:20:00| out | 1 |
12:30:00| access | 2 |
12:35:00| select | 2 |
What is the simplest method to update the rows without filling the log? Like, one ID at a time.
You can do it with a subquery:
UPDATE YourTable t
SET t.ID = (SELECT TOP 1 s.ID
FROM YourTable s
WHERE s.time < t.time AND s.name = 'access'
ORDER BY s.time DESC)
WHERE t.name <> 'access'
Index on (ID,time,name) will help.
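Note that TOP 1 (and qualifying the SET column with the alias) is SQL Server syntax. Since the main question is tagged postgresql, here is a sketch of the same idea translated to PostgreSQL, assuming the table is named mytable: each non-'access' row takes its ID from the closest preceding 'access' row:
UPDATE mytable t
SET id = (
    SELECT s.id
    FROM mytable s
    WHERE s."time" < t."time"
      AND s.name = 'access'
    ORDER BY s."time" DESC
    LIMIT 1
)
WHERE t.name <> 'access';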
You can also do it using a CTE, as below (note that this updatable-CTE form is SQL Server syntax):
;WITH myCTE
AS ( SELECT time
, name
, ROW_NUMBER() OVER ( PARTITION BY name ORDER BY time ) AS [rank]
, ID
FROM YourTable
)
UPDATE myCTE
SET myCTE.ID = myCTE.rank
SELECT *
FROM YourTable ORDER BY ID

SQL group by distinct array

I have table1:
col1 (integer) | col2 (varchar[]) | col3 (integer)
----------------------------------------------------
1 | {A,B,C} | 2
1 | {A} | 5
1 | {A,B} | 1
2 | {A,B} | 2
2 | {A} | 3
2 | {B} | 1
I want to sum col3 with a GROUP BY on col1, keeping only DISTINCT values in the col2 arrays.
Expected result below :
col1 (integer) | col2 (varchar[]) | col3 (integer)
----------------------------------------------------
1 | {A,B,C} | 8
2 | {A,B} | 6
I tried this:
SELECT col1, array_to_string(array_accum(col2), ','::text),sum(col3) FROM table1 GROUP BY col1
but the result is not the one expected:
col1 (integer) | col2 (varchar[]) | col3 (integer)
---------------------------------------------------------------
1 | {A,B,C,A,A,B} | 8
2 | {A,B,A,B} | 6
Do you have any suggestions?
If the logic for which col2 you want is to take the largest array (as in your expected output, {A,B,C} and {A,B}), you can pick it with a correlated subquery:
SELECT t.col1,
       (SELECT sub.col2
        FROM table1 sub
        WHERE sub.col1 = t.col1
        ORDER BY array_length(sub.col2, 1) DESC
        LIMIT 1) AS col2,
       SUM(t.col3) AS col3
FROM table1 t
GROUP BY t.col1;
SELECT
col1,
array_to_string(array_accum(col2), ','::text),
sum(col3)
FROM table1
GROUP BY col1;
but array_to_string just concatenates array elements using the supplied delimiter and an optional null string; it does not remove duplicates.
You have to devise a different strategy, such as using array_dims(anyarray) to select the array with the most elements, or creating a new aggregate function.
For this, you might be interested in this answer:
eliminate duplicate array values in postgres
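For what it's worth, a more direct route, sketched here under the assumption of PostgreSQL 9.3+ (so unnest can be used as an implicit lateral join): aggregate the distinct elements per group in a correlated subquery while summing col3 in the outer query:
SELECT t.col1,
       (SELECT array_agg(DISTINCT e ORDER BY e)
        FROM table1 s, unnest(s.col2) AS e
        WHERE s.col1 = t.col1) AS col2,
       SUM(t.col3) AS col3
FROM table1 t
GROUP BY t.col1;
With the sample data this returns {A,B,C} with 8 and {A,B} with 6, as expected.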

How to get info about an element's position in the table?

I have a query:
Select * from mytable order by "date"
And result:
date | item_id | user_id | some_data
------------------------------------------
2015-01-01 | 1 | 1 | null
2015-01-01 | 1 | 1 | null
2015-01-02 | 1 | 1 | null
2015-01-03 | 1 | 1 | null
2015-01-03 | 1 | 2 | null
2015-01-04 | 1 | 1 | null
2015-01-05 | 1 | 2 | null
And I want to get the position of the first row where user_id = 2. In this example it would be 5. How can I do that?
select pos_overall
from (
select user_id,
row_number() over (order by "date") as pos_overall,
row_number() over (partition by user_id order by "date") as user_pos
from mytable
) t
where user_id = 2
and user_pos = 1
You can use the row_number() function to number the rows in order of date and user_id, and then select the minimum value:
select min(rn)
from (
select
user_id, row_number() over (order by date, user_id) as rn
from mytable
) x
where user_id = 2;
If the item_id can change, you might want to include it in the ORDER BY clause of the row_number function in the derived table, as in the variant below.
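A sketch of that variant, simply adding item_id to the ordering:
select min(rn)
from (
    select
        user_id, row_number() over (order by date, item_id, user_id) as rn
    from mytable
) x
where user_id = 2;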