Comparing the latest record to the previous record in PostgreSQL

I have a table in PostgreSQL DB like this:
Client | Rate | StartDate | EndDate
A      | 1000 | 2005-1-1  | 2005-12-31
A      | 2000 | 2006-1-1  | 2006-12-31
A      | 3000 | 2007-1-1  | 2007-12-31
B      | 5000 | 2006-1-1  | 2006-12-31
B      | 8000 | 2008-1-1  | 2008-12-31
C      | 2000 | 2006-1-1  | 2006-12-31
I want to get the latest change for each client, like the table below. How?
Client | Rate | StartDate | EndDate    | Pre Rate | Pre StartDate | Pre EndDate
A      | 3000 | 2007-1-1  | 2007-12-31 | 2000     | 2006-1-1      | 2006-12-31
B      | 8000 | 2008-1-1  | 2008-12-31 | 5000     | 2006-1-1      | 2006-12-31
C      | 2000 | 2006-1-1  | 2006-12-31 |          |               |

SELECT DISTINCT ON (Client)
       Client,
       Rate,
       StartDate,
       EndDate,
       LAG(Rate)      OVER (PARTITION BY Client ORDER BY StartDate) AS "Pre Rate",
       LAG(StartDate) OVER (PARTITION BY Client ORDER BY StartDate) AS "Pre StartDate",
       LAG(EndDate)   OVER (PARTITION BY Client ORDER BY StartDate) AS "Pre EndDate"
FROM ClientRates
ORDER BY Client, StartDate DESC;
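The combination of LAG and DISTINCT ON can be mirrored outside SQL; here is a minimal Python sketch that partitions rows by client, sorts each partition by start date, and pairs the latest row with its predecessor (dates from the question, written in ISO form so string comparison sorts them correctly):

```python
from collections import defaultdict

rows = [
    ("A", 1000, "2005-01-01", "2005-12-31"),
    ("A", 2000, "2006-01-01", "2006-12-31"),
    ("A", 3000, "2007-01-01", "2007-12-31"),
    ("B", 5000, "2006-01-01", "2006-12-31"),
    ("B", 8000, "2008-01-01", "2008-12-31"),
    ("C", 2000, "2006-01-01", "2006-12-31"),
]

def latest_with_previous(rows):
    # Group rows per client, like PARTITION BY Client.
    by_client = defaultdict(list)
    for client, rate, start, end in rows:
        by_client[client].append((start, rate, end))
    result = []
    for client in sorted(by_client):
        history = sorted(by_client[client])               # ORDER BY StartDate
        start, rate, end = history[-1]                    # DISTINCT ON keeps the latest row
        prev = history[-2] if len(history) > 1 else None  # LAG(...) = previous row or NULL
        result.append((client, rate, start, end,
                       prev[1] if prev else None,
                       prev[0] if prev else None,
                       prev[2] if prev else None))
    return result

for row in latest_with_previous(rows):
    print(row)
```

This reproduces the desired table, including the NULL "previous" columns for client C, which has only one row.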

I can't help thinking there's a simpler way to express this.
with current_start_dates as (
    select client, max(startdate) as cur_startdate
    from client_rates
    group by client
),
extended_client_rates as (
    select client, rate, startdate, enddate,
           lag(rate)      over (partition by client order by startdate) as prev_rate,
           lag(startdate) over (partition by client order by startdate) as prev_startdate,
           lag(enddate)   over (partition by client order by startdate) as prev_enddate
    from client_rates
)
select cr.*
from extended_client_rates cr
inner join current_start_dates csd
        on csd.client = cr.client
       and csd.cur_startdate = cr.startdate

Essentially the same as Tim's answer (+1), with some polishing and a full script for trying/checking:
CREATE TEMP TABLE client_rates (client VARCHAR, rate INTEGER,
start_date DATE, end_date DATE);
INSERT INTO client_rates VALUES ('A',1000,'2005-1-1','2005-12-31');
INSERT INTO client_rates VALUES ('A',2000,'2006-1-1','2006-12-31');
INSERT INTO client_rates VALUES ('A',3000,'2007-1-1','2007-12-31');
INSERT INTO client_rates VALUES ('B',5000,'2006-1-1','2006-12-31');
INSERT INTO client_rates VALUES ('B',8000,'2008-1-1','2008-12-31');
INSERT INTO client_rates VALUES ('C',2000,'2006-1-1','2006-12-31');
SELECT DISTINCT ON (client) *
FROM (
    SELECT client, rate, start_date, end_date,
           lag(rate)       OVER w1 AS prev_rate,
           lag(start_date) OVER w1 AS prev_start_date,
           lag(end_date)   OVER w1 AS prev_end_date
    FROM client_rates
    WINDOW w1 AS (PARTITION BY client ORDER BY start_date)
) AS foo
ORDER BY client, start_date DESC;
 client | rate | start_date | end_date   | prev_rate | prev_start_date | prev_end_date
--------+------+------------+------------+-----------+-----------------+---------------
 A      | 3000 | 2007-01-01 | 2007-12-31 |      2000 | 2006-01-01      | 2006-12-31
 B      | 8000 | 2008-01-01 | 2008-12-31 |      5000 | 2006-01-01      | 2006-12-31
 C      | 2000 | 2006-01-01 | 2006-12-31 |           |                 |

Related

How to join 2 tables without value duplication in PostgreSql

I am joining two tables using:
select table1.date, table1.item, table1.qty, table2.anotherQty
from table1
INNER JOIN table2
on table1.date = table2.date
table1
date | item | qty
july1 | itemA | 20
july1 | itemB | 30
july2 | itemA | 20
table2
date | anotherQty
july1 | 200
july2 | 300
Expected result should be:
date | item | qty | anotherQty
july1 | itemA | 20 | 200
july1 | itemB | 30 | null or 0
july2 | itemA | 20 | 300
So that when I sum(anotherQty) it returns 500 only, instead of:
date | item | qty | anotherQty
july1 | itemA | 20 | 200
july1 | itemB | 30 | 200
july2 | itemA | 20 | 300
That is 200+200+300 = 700
SQL DEMO
WITH T1 as (
SELECT *, ROW_NUMBER() OVER (PARTITION BY "date" ORDER BY "item") as rn
FROM Table1
), T2 as (
SELECT *, ROW_NUMBER() OVER (PARTITION BY "date" ORDER BY "anotherQty") as rn
FROM Table2
)
SELECT *
FROM t1
LEFT JOIN t2
ON t1."date" = t2."date"
AND t1.rn = t2.rn
OUTPUT
Filter the columns you want, and change the order if needed.
| date | item | qty | rn | date | anotherQty | rn |
|-------|-------|-----|----|--------|------------|--------|
| july1 | itemA | 20 | 1 | july1 | 200 | 1 |
| july1 | itemB | 30 | 2 | (null) | (null) | (null) |
| july2 | itemA | 20 | 1 | july2 | 300 | 1 |
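The pairing trick in the query above can be sketched in plain Python: number the rows per date on each side (like ROW_NUMBER() partitioned by date), then LEFT JOIN on (date, rn) so each table2 quantity is matched at most once. Names here are illustrative:

```python
from collections import defaultdict

table1 = [("july1", "itemA", 20), ("july1", "itemB", 30), ("july2", "itemA", 20)]
table2 = [("july1", 200), ("july2", 300)]

def number_per_date(rows):
    # Emulates ROW_NUMBER() OVER (PARTITION BY date): rn restarts at 1 per date.
    counter = defaultdict(int)
    out = []
    for row in rows:
        counter[row[0]] += 1
        out.append(row + (counter[row[0]],))
    return out

t1 = number_per_date(table1)
t2 = {(date, rn): qty for date, qty, rn in number_per_date(table2)}

# LEFT JOIN on (date, rn): unmatched rows get None instead of a duplicated qty.
joined = [(date, item, qty, t2.get((date, rn))) for date, item, qty, rn in t1]
print(joined)

total = sum(q for *_, q in joined if q is not None)
print(total)
```

Because each table2 row is consumed by exactly one table1 row, summing anotherQty yields 500 rather than 700.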
Try the following code, but note that as long as the qty values differ across rows, the anotherQty field will still break out into distinct values:
select
table1.date,
table1.item,
table1.qty,
SUM(table2.anotherQty)
from table1
INNER JOIN table2
on table1.date = table2.date
GROUP BY
table1.item,
table1.qty,
table1.date
If you need it to always aggregate down to a single line per item/date, then you will need to add a SUM() to table1.qty as well. Alternatively, you could use a common table expression (a WITH statement) for each quantity you want, summing within each CTE, and then join the expressions in your final SELECT statement.
Edit:
Based on the comment from @Juan Carlos Oropeza, I'm not sure there is a way to get the summed value of 500 while including table1.date in your query, because grouping the output by date splits the aggregation into distinct lines. The following query gets you the sum of anotherQty, at the cost of not displaying date:
select
table1.item,
SUM(table1.qty),
SUM(table2.anotherQty)
from table1
INNER JOIN table2
on table1.date = table2.date
GROUP BY
table1.item
If you need date to persist, you can get the sum to show up by using a WINDOW function, but note that this is essentially doing a running sum, and may throw off any subsequent summation you're doing on this query's output in terms of post-processing:
select
table1.item,
table1.date,
SUM(table1.qty),
SUM(table2.anotherQty) OVER (Partition By table1.item)
from table1
INNER JOIN table2
on table1.date = table2.date
GROUP BY
table1.item,
table1.date,
table2.anotherQty

How to query just the last record of every second within a period of time in postgres

I have a 'prices' table with hundreds of millions of records and only four columns: uid, price, unit, dt. dt is a datetime in a standard format like '2017-05-01 00:00:00.585'.
I can easily select a period using:
SELECT uid, price, unit from prices
WHERE dt > '2017-05-01 00:00:00.000'
AND dt < '2017-05-01 02:59:59.999'
What I can't work out is how to select the price of the last record within each second. (I also need the very first one of each second, but I guess that will be a similar, separate query.) There are some similar examples (here), but they generated errors when I tried to adapt them to my needs.
Could someone please help me crack this nut?
Let's say there is a table which has been generated with the help of this command:
CREATE TABLE test AS
SELECT timestamp '2017-09-16 20:00:00' + x * interval '0.1' second As my_timestamp
from generate_series(0,100) x
This table contains an increasing series of timestamps, each timestamp differs by 100 milliseconds (0.1 second) from neighbors, so that there are 10 records within each second.
| my_timestamp |
|------------------------|
| 2017-09-16T20:00:00Z |
| 2017-09-16T20:00:00.1Z |
| 2017-09-16T20:00:00.2Z |
| 2017-09-16T20:00:00.3Z |
| 2017-09-16T20:00:00.4Z |
| 2017-09-16T20:00:00.5Z |
| 2017-09-16T20:00:00.6Z |
| 2017-09-16T20:00:00.7Z |
| 2017-09-16T20:00:00.8Z |
| 2017-09-16T20:00:00.9Z |
| 2017-09-16T20:00:01Z |
| 2017-09-16T20:00:01.1Z |
| 2017-09-16T20:00:01.2Z |
| 2017-09-16T20:00:01.3Z |
.......
The below query determines and prints the first and the last timestamp within each second:
SELECT my_timestamp,
CASE
WHEN rn1 = 1 THEN 'First'
WHEN rn2 = 1 THEN 'Last'
ELSE 'Somewhere in the middle'
END as Which_row_within_a_second
FROM (
select *,
row_number() over( partition by date_trunc('second', my_timestamp)
order by my_timestamp
) rn1,
row_number() over( partition by date_trunc('second', my_timestamp)
order by my_timestamp DESC
) rn2
from test
) xx
WHERE 1 IN (rn1, rn2 )
ORDER BY my_timestamp
;
| my_timestamp | which_row_within_a_second |
|------------------------|---------------------------|
| 2017-09-16T20:00:00Z | First |
| 2017-09-16T20:00:00.9Z | Last |
| 2017-09-16T20:00:01Z | First |
| 2017-09-16T20:00:01.9Z | Last |
| 2017-09-16T20:00:02Z | First |
| 2017-09-16T20:00:02.9Z | Last |
| 2017-09-16T20:00:03Z | First |
| 2017-09-16T20:00:03.9Z | Last |
| 2017-09-16T20:00:04Z | First |
| 2017-09-16T20:00:04.9Z | Last |
| 2017-09-16T20:00:05Z | First |
| 2017-09-16T20:00:05.9Z | Last |
You can find a working demo here.
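The grouping idea behind the query (truncate each timestamp to its second, then take the first and last row per group) can also be sketched procedurally. A minimal Python emulation using simple time strings, assuming the input is already sorted by timestamp:

```python
# Ten fractional timestamps per second for three seconds, e.g. "20:00:00.0" .. "20:00:02.9".
stamps = [f"20:00:{s:02d}.{f}" for s in range(3) for f in range(10)]

def first_last_per_second(stamps):
    # Group by the truncated second, like date_trunc('second', my_timestamp).
    # A plain dict preserves insertion order (Python 3.7+).
    groups = {}
    for ts in stamps:
        groups.setdefault(ts.split(".")[0], []).append(ts)
    # rn1 = 1 corresponds to the first member; rn2 = 1 to the last member of each group.
    return [(members[0], members[-1]) for members in groups.values()]

for first, last in first_last_per_second(stamps):
    print(first, last)
```

For the real 'prices' table you would carry uid, price, and unit along with each timestamp, but the group-then-pick-endpoints shape is the same as in the row_number() query above.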

postgresql: splitting time period at event

I have a table of country-periods. In some cases, certain country attributes (e.g. the capital) change on a date within a time period. Here I would like to split the country-period into two new periods, one before and one after the change.
Example:
Country | start_date | end_date | event_date
A | 1960-01-01 | 1999-12-31 | 1994-07-20
B | 1926-01-01 | 1995-12-31 | NULL
Desired output:
Country | start_date | end_date | event_date
A | 1960-01-01 | 1994-07-19 | 1994-07-20
A | 1994-07-20 | 1999-12-31 | 1994-07-20
B | 1926-01-01 | 1995-12-31 | NULL
I considered starting off with generate_series along these lines:
SELECT country, min(p1) as sdate1, max(p1) as edate1,
min(p2) as sdate2, max(p2) as edate2
FROM
(SELECT country,
generate_series(start_date, (event_date-interval '1 day'), interval '1 day')::date as p1,
generate_series(event_date, end_date, interval '1 day')::date as p2
FROM table)t
GROUP BY country
But this seems way too inefficient and messy. Unfortunately I don't have any experience when it comes to writing functions. Any ideas on how I can solve this?
You can do a UNION instead. This way you don't generate unnecessary rows:
SELECT country, start_date,
CASE WHEN event_date BETWEEN start_date AND end_date
THEN event_date - 1
ELSE end_date
END AS end_date, event_date
FROM table1
UNION ALL
SELECT country, event_date, end_date, event_date
FROM table1
WHERE event_date BETWEEN start_date AND end_date
ORDER BY country, start_date, end_date, event_date
Here is a SQLFiddle demo
Output:
| country | start_date | end_date | event_date |
|---------|------------|------------|------------|
| A | 1960-01-01 | 1994-07-19 | 1994-07-20 |
| A | 1994-07-20 | 1999-12-31 | 1994-07-20 |
| B | 1926-01-01 | 1995-12-31 | (null) |
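The UNION ALL logic is easy to verify procedurally: one pass emits the original period trimmed to the day before the event, a second pass emits the from-event-onward period only when the event falls inside the range. A short Python sketch with the sample data:

```python
from datetime import date, timedelta

periods = [
    ("A", date(1960, 1, 1), date(1999, 12, 31), date(1994, 7, 20)),
    ("B", date(1926, 1, 1), date(1995, 12, 31), None),
]

def split_at_event(periods):
    out = []
    for country, start, end, event in periods:
        inside = event is not None and start <= event <= end
        # First branch of the UNION ALL: end the period the day before the event,
        # or keep the original end date when there is no in-range event.
        out.append((country, start,
                    event - timedelta(days=1) if inside else end, event))
        # Second branch: the period from the event onward, only if the event is inside.
        if inside:
            out.append((country, event, end, event))
    return sorted(out)

for row in split_at_event(periods):
    print(row)
```

Country A splits into two adjacent periods around 1994-07-20, while country B (NULL event) passes through unchanged, matching the desired output above.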

Order by created_date if less than 1 month old, else sort by updated_date

SQL Fiddle: http://sqlfiddle.com/#!15/1da00/5
I have a table that looks something like this:
products
+-----------+-------+--------------+--------------+
| name | price | created_date | updated_date |
+-----------+-------+--------------+--------------+
| chair | 50 | 10/12/2016 | 1/4/2017 |
| desk | 100 | 11/4/2016 | 12/27/2016 |
| TV | 500 | 12/1/2016 | 1/2/2017 |
| computer | 1000 | 12/28/2016 | 1/1/2017 |
| microwave | 100 | 1/3/2017 | 1/4/2017 |
| toaster | 20 | 1/9/2017 | 1/9/2017 |
+-----------+-------+--------------+--------------+
I want to order this table so that products created less than 30 days ago show first (ordered by updated date). Products created 30 or more days ago should show after (also ordered by updated date within that group).
This is what the result should look like:
products - desired results
+-----------+-------+--------------+--------------+
| name | price | created_date | updated_date |
+-----------+-------+--------------+--------------+
| toaster | 20 | 1/9/2017 | 1/9/2017 |
| microwave | 100 | 1/3/2017 | 1/4/2017 |
| computer | 1000 | 12/28/2016 | 1/1/2017 |
| chair | 50 | 10/12/2016 | 1/4/2017 |
| TV | 500 | 12/1/2016 | 1/2/2017 |
| desk | 100 | 11/4/2016 | 12/27/2016 |
+-----------+-------+--------------+--------------+
I've started writing this query:
SELECT *,
CASE
WHEN created_date > NOW() - INTERVAL '30 days' THEN 0
ELSE 1
END AS order_index
FROM products
ORDER BY order_index, created_date DESC
but that only brings the rows with created_date less than 30 days old to the top, and then orders by created_date. I want to also sort the rows where order_index = 1 by updated_date.
Unfortunately, in version 9.3 only positional column numbers or expressions involving table columns can be used in ORDER BY, so order_index is not available inside a CASE expression at all, and its position is not well defined because it comes after * in the column list.
This will work.
order by
    created_date <= ( current_date - 30 ),
    case when created_date > ( current_date - 30 ) then created_date
         else updated_date
    end desc
Alternatively a common table expression can be used to wrap the result and then that can be ordered by any column.
WITH q AS(
SELECT *,
CASE
WHEN created_date > NOW() - INTERVAL '30 days' THEN 0
ELSE 1
END AS order_index
FROM products
)
SELECT * FROM q
ORDER BY
order_index ,
CASE order_index
WHEN 0 THEN created_date
WHEN 1 THEN updated_date
END DESC;
A third approach is to exploit nulls.
order by
case
when created_date > ( current_date - 30 ) then created_date
end desc nulls last,
updated_date desc;
This approach can be useful when the ordering columns are of different types.
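All three variants boil down to the same two-level sort key: a boolean "is it recent?" flag first, then the relevant date descending. A Python sketch of that key, with a fixed "today" so the example is reproducible against the sample data:

```python
from datetime import date, timedelta

today = date(2017, 1, 10)            # fixed "now" so the ordering is reproducible
cutoff = today - timedelta(days=30)

products = [
    ("chair", 50, date(2016, 10, 12), date(2017, 1, 4)),
    ("desk", 100, date(2016, 11, 4), date(2016, 12, 27)),
    ("TV", 500, date(2016, 12, 1), date(2017, 1, 2)),
    ("computer", 1000, date(2016, 12, 28), date(2017, 1, 1)),
    ("microwave", 100, date(2017, 1, 3), date(2017, 1, 4)),
    ("toaster", 20, date(2017, 1, 9), date(2017, 1, 9)),
]

def sort_key(p):
    name, price, created, updated = p
    recent = created > cutoff
    # First key: recent products before old ones.
    # Second key: the relevant date, newest first (negated ordinal = descending).
    return (0 if recent else 1,
            -(created if recent else updated).toordinal())

ordered = sorted(products, key=sort_key)
print([p[0] for p in ordered])
```

This mirrors the first answer's ORDER BY: recent rows sort by created_date, older rows by updated_date, and the result matches the desired table (toaster, microwave, computer, chair, TV, desk).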

How to get data from another table into one table and show it?

table1
no   | date    |
J001 | 06 June |
table2
no   | code | qty | /// AVGprice | Total
J001 | B001 | 5   | /// 1500     | 7500
J001 | B003 | 7   | /// 1000     | 7000
table3
code | name        | AVGPrice
B001 | procc       | 1500
B002 | motherboard | 2000
B003 | VGA card    | 1000
table4
no   | code | Price
M001 | B001 | 1000
M001 | B002 | 2000
M002 | B001 | 2000
M002 | B003 | 1000
I get AVGprice from this query:
select t.code, t.name, t.avg
from (select table3.code, table3.name,
             (select avg(table4.price)
              from table4
              where table4.code = table3.code) as avg
      from table3
     ) as t
The result I can produce is:
no | date | Info
J001| 06 June | ABCDEFG
with this query:
select t.no, t.date, t.info
from (select table1.no, table1.date, 'ABCDEFG' as info
      from table1
     ) as t
The result I want is:
no | date | Info | Total
J001| 06 June | ABCDEFG | 14500 --> from sum of Total
I don't know where to put my avg query, or how to sum it...
The following adds the subquery you need to pull the average; summing qty times that average over the matching table2 rows gives you the total per order:
select t.no,
       t.date,
       t.info,
       (select sum(t2.qty * (select avg(t4.price)
                             from table4 t4
                             where t4.code = t2.code))
        from table2 t2
        where t2.no = t.no) as total
from (select table1.no, table1.date, 'ABCDEFG' as info
      from table1
     ) as t
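The arithmetic behind the desired total of 14500 can be checked with a short Python sketch using the question's tables: average each code's price over table4, then sum qty times that average per order number.

```python
from collections import defaultdict

table1 = [("J001", "06 June")]
table2 = [("J001", "B001", 5), ("J001", "B003", 7)]   # no, code, qty
table4 = [("M001", "B001", 1000), ("M001", "B002", 2000),
          ("M002", "B001", 2000), ("M002", "B003", 1000)]

# avg(price) per code, like the correlated subquery over table4.
prices = defaultdict(list)
for _no, code, price in table4:
    prices[code].append(price)
avg_price = {code: sum(v) / len(v) for code, v in prices.items()}

# Total per order line = qty * average price; grand total grouped by 'no'.
totals = defaultdict(float)
for no, code, qty in table2:
    totals[no] += qty * avg_price[code]

result = [(no, dt, "ABCDEFG", totals[no]) for no, dt in table1]
print(result)
```

B001 averages to 1500 and B003 to 1000, so order J001 totals 5*1500 + 7*1000 = 14500, matching the "sum of Total" the asker wants.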