Selecting value for the latest two distinct columns - postgresql

I am trying to write an SQL query that returns the latest data value for each distinct tag in my table.
Currently, I select the distinct tag values and then, for each one, run a second query ordered by timestamp with LIMIT 1. There can be any number of tags, and they are not always posted together (one time only tag 1 may be posted; other times tags 1, 2 and 3).
Although it gives the expected outcome, this seems to be inefficient in a lot of ways, and because I don't have enough SQL experience, this was so far the only way I found of performing the task...
| name | tag | timestamp | data |
|------|-----|-----------|------|
| aa   | 1   | 566       | 4659 |
| ab   | 2   | 567       | 4879 |
| ac   | 3   | 568       | 1346 |
| ad   | 1   | 789       | 3164 |
| ae   | 2   | 789       | 1024 |
| af   | 3   | 790       | 3346 |
Therefore the expected outcome is {3164, 1024, 3346}
Currently what I'm doing is:
"select distinct tag from table"
Then I store all the distinct tag values programmatically and iterate programmatically through these values using
"select data from table where '"+ tags[i] +"' in (tag) order by timestamp desc limit 1"
Thanks,

This comes close, but beware: if two rows with the same tag share the maximum timestamp, you will get duplicates in the result set.
select data from table
join (select tag, max(timestamp) maxtimestamp from table t1 group by tag) as latesttags
on table.tag = latesttags.tag and table.timestamp = latesttags.maxtimestamp
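A variant that cannot return duplicates can be sketched with a window function: number the rows within each tag by descending timestamp and keep row 1. The snippet runs the query through Python's sqlite3 module only so it is self-contained (it assumes SQLite 3.25+ for window functions); the table name `readings` and the helper `latest_per_tag` are hypothetical. The SQL itself should also work in PostgreSQL, where `select distinct on (tag) data from readings order by tag, timestamp desc` is a shorter, PostgreSQL-only alternative.

```python
import sqlite3

# Sample rows from the question; the table name "readings" is hypothetical.
ROWS = [
    ("aa", 1, 566, 4659),
    ("ab", 2, 567, 4879),
    ("ac", 3, 568, 1346),
    ("ad", 1, 789, 3164),
    ("ae", 2, 789, 1024),
    ("af", 3, 790, 3346),
]

# Rank rows within each tag, newest first; rn = 1 is the latest row.
# Ties on timestamp still yield exactly one row per tag (which one is unspecified;
# add a tiebreaker column to the ORDER BY if that matters).
LATEST_PER_TAG = """
    SELECT tag, data
    FROM (
        SELECT tag, data,
               ROW_NUMBER() OVER (PARTITION BY tag ORDER BY timestamp DESC) AS rn
        FROM readings
    ) AS ranked
    WHERE rn = 1
    ORDER BY tag
"""


def latest_per_tag(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (name TEXT, tag INTEGER, timestamp INTEGER, data INTEGER)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(LATEST_PER_TAG).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(latest_per_tag(ROWS))  # [(1, 3164), (2, 1024), (3, 3346)]
```

This replaces the per-tag loop with a single round trip, and the data column gives the expected {3164, 1024, 3346}.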

Related

Reset column with numeric value that represents the order when destroying a row

I have a table of users with a column called order that represents the order in which they will be elected.
So, for example, the table might look like:
| id | name | order |
|-----|--------|-------|
| 1 | John | 2 |
| 2 | Mike | 0 |
| 3 | Lisa | 1 |
Say that Lisa now gets destroyed. In the same transaction that destroys Lisa, I would like to update the table so the order stays consistent, so the expected result would be:
| id | name | order |
|-----|--------|-------|
| 1 | John | 1 |
| 2 | Mike | 0 |
Or, if Mike were the one to be deleted, the expected result would be:
| id | name | order |
|-----|--------|-------|
| 1 | John | 1 |
| 3 | Lisa | 0 |
How can I do this in PostgreSQL?
If you are deleting just one row, one option is a CTE whose RETURNING clause feeds an UPDATE (the column is called ord here, because order is a reserved word in SQL):
with del as (
delete from mytable where name = 'Lisa'
returning ord
)
update mytable
set ord = ord - 1
from del d
where mytable.ord > d.ord
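The data-modifying CTE above is PostgreSQL-specific. For comparison, here is a sketch of the same delete-then-shift logic as two plain statements committed together, run against SQLite from Python so it is self-contained; the helper names `make_table`, `delete_and_shift` and `snapshot` are hypothetical.

```python
import sqlite3


def make_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, ord INTEGER)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [(1, "John", 2), (2, "Mike", 0), (3, "Lisa", 1)],
    )
    conn.commit()
    return conn


def delete_and_shift(conn, name):
    """Delete one user and close the gap in ord."""
    # The connection context manager commits the DELETE and UPDATE together,
    # or rolls both back if either raises.
    with conn:
        row = conn.execute("SELECT ord FROM mytable WHERE name = ?", (name,)).fetchone()
        if row is None:
            return
        conn.execute("DELETE FROM mytable WHERE name = ?", (name,))
        # Every user ranked after the deleted one moves up by one position.
        conn.execute("UPDATE mytable SET ord = ord - 1 WHERE ord > ?", (row[0],))


def snapshot(conn):
    return conn.execute("SELECT id, name, ord FROM mytable ORDER BY id").fetchall()


if __name__ == "__main__":
    conn = make_table()
    delete_and_shift(conn, "Lisa")
    print(snapshot(conn))  # [(1, 'John', 1), (2, 'Mike', 0)]
```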
As a more general approach, I would not recommend renumbering the whole table after every delete: it is inefficient, and it gets tedious for multi-row deletes.
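Instead, you could build a view on top of the table that computes a dense, 0-based position on the fly (row_number() starts at 1, hence the - 1):
create view myview as
select id, name, row_number() over(order by ord) - 1 as ord
from mytable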
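A quick way to check the view approach: keep whatever values are stored in ord, and let the view derive the positions, subtracting 1 so they start at 0 like the question's order column. This is a sketch run through SQLite from Python (SQLite 3.25+ assumed for row_number()); the helper name `demo` is hypothetical.

```python
import sqlite3

# Stored ord values may have gaps after deletes; the view renumbers them densely.
SCHEMA = """
CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, ord INTEGER);
CREATE VIEW myview AS
    SELECT id, name, ROW_NUMBER() OVER (ORDER BY ord) - 1 AS ord
    FROM mytable;
"""


def demo(deleted_name):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [(1, "John", 2), (2, "Mike", 0), (3, "Lisa", 1)],
    )
    # A plain delete; no renumbering of the base table is needed.
    conn.execute("DELETE FROM mytable WHERE name = ?", (deleted_name,))
    rows = conn.execute("SELECT id, name, ord FROM myview ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo("Lisa"))  # [(1, 'John', 1), (2, 'Mike', 0)]
    print(demo("Mike"))  # [(1, 'John', 1), (3, 'Lisa', 0)]
```

Both outputs match the expected tables in the question, and multi-row deletes need no extra handling.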

Postgresql Split single row to multiple rows

I'm new to PostgreSQL. I'm getting the results below from a query, and now I need to split each row into multiple rows.
I have gone through the links below, but still couldn't manage it. Please help.
unpivot and PostgreSQL
How to split a row into multiple rows with a single query?
Current result
id,name,sub1code,sub1level,sub1hrs,sub2code,sub2level,sub2hrs,sub3code,sub3level,sub3hrs --continue till sub15
1,Silva,CHIN,L1,12,MATH,L2,20,AGRW,L2,35
2,Perera,MATH,L3,30,ENGL,L1,10,CHIN,L2,50
What we want
id,name,subcode,sublevel,subhrs
1,Silva,CHIN,L1,12
1,Silva,MATH,L2,20
1,Silva,AGRW,L2,35
2,Perera,MATH,L3,30
2,Perera,ENGL,L1,10
2,Perera,CHIN,L2,50
Use UNION ALL (the column names come from the first SELECT, so aliasing it is enough):
select id, 1 as "#", name, sub1code as subcode, sub1level as sublevel, sub1hrs as subhrs
from a_table
union all
select id, 2 as "#", name, sub2code, sub2level, sub2hrs
from a_table
union all
select id, 3 as "#", name, sub3code, sub3level, sub3hrs
from a_table
order by 1, 2;
 id | # |  name  | subcode | sublevel | subhrs
----+---+--------+---------+----------+--------
  1 | 1 | Silva  | CHIN    | L1       |     12
  1 | 2 | Silva  | MATH    | L2       |     20
  1 | 3 | Silva  | AGRW    | L2       |     35
  2 | 1 | Perera | MATH    | L3       |     30
  2 | 2 | Perera | ENGL    | L1       |     10
  2 | 3 | Perera | CHIN    | L2       |     50
(6 rows)
The # column is not necessary if you want to get the result sorted by subcode or sublevel.
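Since the real table continues to sub15, writing fifteen branches by hand is error-prone. A sketch of generating the UNION ALL query and running it against SQLite from Python, assuming the subNcode/subNlevel/subNhrs naming shown in the question; `unpivot_sql` and `demo` are hypothetical helpers, and `pos` plays the role of the "#" column.

```python
import sqlite3


def unpivot_sql(n_groups, table="a_table"):
    """Build a UNION ALL query that turns sub1..subN column groups into rows."""
    branches = [
        f"SELECT id, {i} AS pos, name, sub{i}code AS subcode, "
        f"sub{i}level AS sublevel, sub{i}hrs AS subhrs FROM {table}"
        for i in range(1, n_groups + 1)
    ]
    # Sort by id, then by the position of the column group.
    return "\nUNION ALL\n".join(branches) + "\nORDER BY 1, 2"


def demo():
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(
        f"sub{i}code TEXT, sub{i}level TEXT, sub{i}hrs INTEGER" for i in range(1, 4)
    )
    conn.execute(f"CREATE TABLE a_table (id INTEGER, name TEXT, {cols})")
    conn.executemany(
        "INSERT INTO a_table VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "Silva", "CHIN", "L1", 12, "MATH", "L2", 20, "AGRW", "L2", 35),
            (2, "Perera", "MATH", "L3", 30, "ENGL", "L1", 10, "CHIN", "L2", 50),
        ],
    )
    rows = conn.execute(unpivot_sql(3)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)  # first row: (1, 1, 'Silva', 'CHIN', 'L1', 12)
    print(unpivot_sql(15).count("UNION ALL"))  # 14
```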
You should consider normalization of the model by splitting the data into two tables, e.g.:
create table students (
id int primary key,
name text);
create table hours (
id int primary key,
student_id int references students(id),
code text,
level text,
hrs int);
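With the normalized schema, the desired output is a plain join and no unpivoting is needed. A sketch using the two tables exactly as defined above (SQLite accepts the same DDL; note that `int primary key` does not auto-generate ids there, so they are supplied explicitly); the helper name `demo` is hypothetical.

```python
import sqlite3

# The two tables from the answer, unchanged.
DDL = """
create table students (
    id int primary key,
    name text);
create table hours (
    id int primary key,
    student_id int references students(id),
    code text,
    level text,
    hrs int);
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("insert into students values (?, ?)", [(1, "Silva"), (2, "Perera")])
    conn.executemany(
        "insert into hours values (?, ?, ?, ?, ?)",
        [
            (1, 1, "CHIN", "L1", 12), (2, 1, "MATH", "L2", 20), (3, 1, "AGRW", "L2", 35),
            (4, 2, "MATH", "L3", 30), (5, 2, "ENGL", "L1", 10), (6, 2, "CHIN", "L2", 50),
        ],
    )
    # One row per subject falls out of a simple join.
    rows = conn.execute(
        """select s.id, s.name, h.code, h.level, h.hrs
           from students s join hours h on h.student_id = s.id
           order by s.id, h.id"""
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```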

Query on multiple postgres hstores combined with or

This is a hardcoded example of what I'm trying to achieve:
SELECT id FROM places
WHERE metadata->'route'='Route 23'
OR metadata->'route'='Route 22'
OR metadata->'region'='Northwest'
OR metadata->'territory'='Territory A';
The metadata column is an hstore column, and I want to build the WHERE clause dynamically based on another query against a different table. That table could be either:
id | metadata
---------+----------------------------
1647 | "region"=>"Northwest"
1648 | "route"=>"Route 23"
1649 | "route"=>"Route 22"
1650 | "territory"=>"Territory A"
or
 id | key       | value
----+-----------+-------------
  1 | route     | Route 23
  2 | route     | Route 22
  3 | region    | Northwest
  4 | territory | Territory A
It doesn't really matter; whatever works to build up that WHERE clause. Depending on the other query, it could have anywhere from 1 to n ORs.
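If you build the clause in application code rather than with a join, the OR list can be generated from the key/value layout without splicing values into the SQL string. A minimal sketch, assuming a DB-API driver with `%s` placeholders such as psycopg2 (you would call `cursor.execute(sql, params)`); the function name `build_metadata_filter` is hypothetical.

```python
def build_metadata_filter(pairs):
    """Return (sql, params) selecting places whose hstore metadata matches ANY pair.

    pairs: iterable of (key, value) tuples, e.g. rows from the key/value table.
    Keys and values go into params, never into the SQL text.
    """
    pairs = list(pairs)
    if not pairs:
        # No criteria: match nothing rather than emitting an empty WHERE clause.
        return "SELECT id FROM places WHERE false", []
    conditions = " OR ".join("metadata -> %s = %s" for _ in pairs)
    params = [item for pair in pairs for item in pair]
    return f"SELECT id FROM places WHERE {conditions}", params


if __name__ == "__main__":
    sql, params = build_metadata_filter([("route", "Route 23"), ("region", "Northwest")])
    print(sql)     # SELECT id FROM places WHERE metadata -> %s = %s OR metadata -> %s = %s
    print(params)  # ['route', 'Route 23', 'region', 'Northwest']
```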
I ended up with a solution using a second table, distributions, whose metadata column is also an hstore:
id | metadata
---------+----------------------------
1647 | "region"=>"Northwest"
1648 | "route"=>"Route 23"
1649 | "route"=>"Route 22"
1650 | "territory"=>"Territory A"
I used the following join; the @> operator checks whether places.metadata contains distributions.metadata:
SELECT places.id, places.metadata
FROM places INNER JOIN distributions
ON places.metadata @> distributions.metadata
WHERE distributions.some_other_column = something;
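To see what the containment join returns, here is a plain-Python model of its semantics (not PostgreSQL code): `left @> right` holds when every key/value pair of right also appears in left. The sample places and the helper names are hypothetical. It also shows a pitfall: a place that matches several distribution rows comes back once per match.

```python
# Plain-Python model of the hstore containment join; sample rows are hypothetical.
PLACES = [
    (1, {"route": "Route 23", "region": "Northwest"}),
    (2, {"route": "Route 9", "region": "South"}),
    (3, {"territory": "Territory A"}),
]
DISTRIBUTIONS = [
    {"region": "Northwest"},
    {"route": "Route 23"},
    {"route": "Route 22"},
    {"territory": "Territory A"},
]


def contains(left, right):
    """Model of hstore `left @> right`: every pair of right appears in left."""
    return all(key in left and left[key] == value for key, value in right.items())


def join_ids(places, distributions):
    # INNER JOIN semantics: one output row per matching (place, distribution) pair.
    return [pid for pid, meta in places for crit in distributions if contains(meta, crit)]


def exists_ids(places, distributions):
    # WHERE EXISTS semantics: each place appears at most once.
    return [pid for pid, meta in places if any(contains(meta, c) for c in distributions)]


if __name__ == "__main__":
    print(join_ids(PLACES, DISTRIBUTIONS))    # [1, 1, 3]
    print(exists_ids(PLACES, DISTRIBUTIONS))  # [1, 3]
```

If duplicates matter, `SELECT DISTINCT` or a `WHERE EXISTS (SELECT 1 FROM distributions d WHERE places.metadata @> d.metadata AND ...)` rewrite returns each place once.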

Join column with timestamps where value is maximum

I have a table that looks like
+-------+-----------+
| value | timestamp |
+-------+-----------+
and I'm trying to build a query that gives a result like
+-------+-----------+------------+------------------------+
| value | timestamp | MAX(value) | timestamp of max value |
+-------+-----------+------------+------------------------+
so that the result looks like
+---+----------+---+----------+
| 1 | 1.2.1001 | 3 | 1.1.1000 |
| 2 | 5.5.1021 | 3 | 1.1.1000 |
| 3 | 1.1.1000 | 3 | 1.1.1000 |
+---+----------+---+----------+
but I got stuck on joining the column with the corresponding timestamps.
Any hints or suggestions?
Thanks in advance!
For further information (if that helps):
In the real project the max-values are grouped by month and day (with group by clause, which works btw), but somehow I got stuck on joining the timestamps for max-values.
EDIT
Cross joins are a good idea, but I want to have them grouped by month e.g.:
+---+----------+---+----------+
| 1 | 1.1.1101 | 6 | 1.1.1300 |
| 2 | 2.6.1021 | 5 | 5.6.1000 |
| 3 | 1.1.1200 | 6 | 1.1.1300 |
| 4 | 1.1.1040 | 6 | 1.1.1300 |
| 5 | 5.6.1000 | 5 | 5.6.1000 |
| 6 | 1.1.1300 | 6 | 1.1.1300 |
+---+----------+---+----------+
EDIT 2
I've added a fiddle with some sample data and an example of the current query.
http://sqlfiddle.com/#!1/efa42/1
How to add the corresponding timestamp to the maximum?
Try a cross join between the table and a subquery: the table supplies all records, and the subquery returns the one row that holds the max value and its timestamp, <3;"1000-01-01"> for example.
SELECT col_value,col_timestamp,max_col_value, col_timestamp_of_max_value FROM table1
cross join
(
select max(col_value) max_col_value ,col_timestamp col_timestamp_of_max_value from table1
group by col_timestamp
order by max_col_value desc
limit 1
) A --One row that represents the time_stamp of the max value, ie: <3;"1000-01-01">
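Since you are on PostgreSQL, use window functions:
Select *, max( value ) over (), first_value( timestamp ) over ( order by value desc ) from table
That adds the overall maximum value, and the timestamp of the row holding it, to every row. (max( timestamp ) over () would instead return the latest timestamp, not the timestamp of the maximum value.) To get both per month, add partition by date_trunc('month', timestamp) to each over clause.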
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
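A sketch of the per-month version, run against SQLite from Python so it is self-contained (SQLite 3.25+ assumed). The sample rows and the table name `samples` are hypothetical, timestamps are stored as ISO date text, and `substr(timestamp, 1, 7)` stands in for PostgreSQL's `date_trunc('month', timestamp)`. With the default window frame, `FIRST_VALUE` ordered by value descending picks the timestamp of the month's largest value.

```python
import sqlite3

# Hypothetical sample rows: (value, timestamp as ISO date text).
ROWS = [
    (1, "1000-01-03"),
    (3, "1000-01-01"),
    (2, "1000-01-20"),
    (5, "1000-06-05"),
    (4, "1000-06-02"),
]

QUERY = """
SELECT value, timestamp,
       MAX(value) OVER (PARTITION BY substr(timestamp, 1, 7)) AS max_value,
       FIRST_VALUE(timestamp) OVER (
           PARTITION BY substr(timestamp, 1, 7) ORDER BY value DESC
       ) AS ts_of_max
FROM samples
ORDER BY timestamp
"""


def monthly_max_with_timestamp(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (value INTEGER, timestamp TEXT)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in monthly_max_with_timestamp(ROWS):
        print(row)  # e.g. (1, '1000-01-03', 3, '1000-01-01')
```

No join back to the table is needed, because each row carries its month's maximum and that maximum's timestamp directly.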

Sum of the most recent non-null columns (window function with "ignore nulls")

I am using PostgreSQL 9.1.9.
In the project I am working on, some of the most recent records have null columns because that information was not available when the row was created. I have a view that lists the sums of the rows belonging to the members of a group. Right now, the view sums the most recent columns, which means it uses null values when those are the most recent values. For example,
table1
group_name | member
-------------------
group1 | Andy
group1 | Bob
table2
name | stat_date | col1 | col2 | col3
--------------------------------------
Andy | 6/19/13 | null | 1 | 2
Andy | 6/18/13 | 100 | 3 | 5
Bob | 6/19/13 | 50 | 9 | 12
Bob | 6/18/13 | 111 | 31 | 51
-- creating view would be something like this...
create view v_grouped as
select table1.group_name, stat_date,
sum(col1) as col1_sum, sum(col2) as col2_sum, sum(col3) as col3_sum
from table1
join table2 on table1.member = table2.name
group by table1.group_name, table2.stat_date;
Current view looks like this:
group_name | stat_date | col1_sum | col2_sum | col3_sum
-------------------------------------------------------
group1 | 6/19/13 | 50 | 10 | 14
group1 | 6/18/13 | 211 | 34 | 56
Instead of 50, 150 (Bob's 50 plus Andy's most recent non-null col1 value, 100 from 6/18) would better represent the actual group total, despite the missing data for 6/19. So I want this output:
group_name | stat_date | col1_sum | col2_sum | col3_sum
-------------------------------------------------------
group1 | 6/19/13 | 150 | 10 | 14
group1 | 6/18/13 | 211 | 34 | 56
I've been looking at first_value() from window functions as a possible function to use. I found that Oracle's first_value() supports the ignore nulls option which I believe will do what I want (http://psoug.org/definition/FIRST_VALUE.htm). According to this page I linked, about PL/SQL's first_value() function:
If the first value in the result set is NULL then the function returns NULL unless you specify IGNORE NULLS.
If you use the IGNORE NULLS parameter then FIRST_VALUE will return the first non-null value found in the result set. (If all
values are null then it will return NULL.)
Example Syntax: FIRST_VALUE(expression [IGNORE NULLS]) OVER (analytic_clause)
But PostgreSQL's first_value() does not support such an option. Is there a way to do this in PostgreSql? Thank you in advance!
You can use this custom aggregate as a PostgreSQL variant of FIRST_VALUE(expression IGNORE NULLS), or build your own aggregate with the desired behavior.
Is this what you are trying to describe?
SELECT sum(col1), sum(col2), sum(col3) FROM table2 WHERE col1 IS NOT NULL
(although I omitted the join on table1; that is an exercise for the reader)
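For reference, here is a sketch of the carry-forward without IGNORE NULLS or a custom aggregate. This is a swapped-in technique, not either answer's method: for each member and date, a correlated subquery picks that member's most recent non-null value on or before the date, and the outer query sums per group. It runs against SQLite from Python for self-containment; the SQL uses only standard constructs and is intended to be portable to PostgreSQL. Dates are stored as ISO text so string comparison orders them, and the helper names are hypothetical.

```python
import sqlite3

MEMBERS = [("group1", "Andy"), ("group1", "Bob")]
STATS = [
    ("Andy", "2013-06-19", None, 1, 2),
    ("Andy", "2013-06-18", 100, 3, 5),
    ("Bob", "2013-06-19", 50, 9, 12),
    ("Bob", "2013-06-18", 111, 31, 51),
]


def latest_non_null(col):
    # Most recent non-null value of `col` for this member, on or before the date.
    return f"""(SELECT t2.{col} FROM table2 t2
               WHERE t2.name = t1.member AND t2.stat_date <= d.stat_date
                 AND t2.{col} IS NOT NULL
               ORDER BY t2.stat_date DESC LIMIT 1) AS {col}"""


QUERY = f"""
SELECT group_name, stat_date,
       SUM(col1) AS col1_sum, SUM(col2) AS col2_sum, SUM(col3) AS col3_sum
FROM (
    SELECT t1.group_name, d.stat_date,
           {latest_non_null('col1')},
           {latest_non_null('col2')},
           {latest_non_null('col3')}
    FROM table1 t1
    CROSS JOIN (SELECT DISTINCT stat_date FROM table2) AS d
) AS carried
GROUP BY group_name, stat_date
ORDER BY stat_date DESC
"""


def grouped_sums():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (group_name TEXT, member TEXT)")
    conn.execute(
        "CREATE TABLE table2 (name TEXT, stat_date TEXT, "
        "col1 INTEGER, col2 INTEGER, col3 INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", MEMBERS)
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?, ?)", STATS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in grouped_sums():
        print(row)  # first row: ('group1', '2013-06-19', 150, 10, 14)
```

Each column is carried forward independently, so a null in col1 does not affect col2 or col3 for the same row.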