Use another table's data in postgreSQL - postgresql

I have a event table and a transaction log.
And I want count each event's total revenue by one sql.
Could anything tell how to do this.
please be ware there will be more than 100,000 logs in transaction table.
event_table:
Event_id | start_date | end_date
------------------------
11111 | 2013-01-04 | 2013-01-05
11112 | 2013-01-08 | 2013-01-10
11113 | 2013-01-11 | 2013-01-12
11114 | 2013-01-15 | 2013-01-18
11115 | 2013-01-19 | 2013-01-21
11116 | 2013-01-22 | 2013-01-24
11117 | 2013-01-26 | 2013-01-29
transaction_log:
id | name | time_created | Cost
------------------------
1 | michael | 2013-01-04 | 1
2 | michael | 2013-01-08 | 4
3 | mary | 2013-01-11 | 5
4 | john | 2013-01-15 | 2
5 | michael | 2013-01-19 | 3
6 | mary | 2013-01-22 | 2
7 | john | 2013-01-26 | 4
I tried to use the sql like following, but it does not work.
select
event_table.id,
( select sum(Cost)
from transaction_log
where date(time_created) between transaction_log.start_date and transaction_log.end_date ) as revenue
from event_table

It is failing because the fields start_date and end_date are from event_table but you're stating them as transaction_log.start_date and transaction_log.end_date. This will work:
select
event_table.id,
( select sum(Cost)
from transaction_log
where date(time_created) between event_table.start_date and event_table.end_date ) as revenue
from event_table
There is no need to cast time_created as date (date(time_created)) if it is already of date data type. Otherwise, if time_created is timestamp or timestamptz, then for performance you may want to consider doing:
select
event_table.id,
( select sum(Cost)
from transaction_log
where time_created >= event_table.start_date::timestamptz and time_created < (event_table.end_date+1)::timestamptz ) as revenue
from event_table
Also for performance, when executing a query like the one above, PostgreSQL is executing a subquery for each row of the main query (in this case the event_table table). Joining and using GROUP BY will generally provide you with better results:
select e.id, sum(l.Cost) as revenue
from event_table e
join transaction_log l ON (l.time_created BETWEEN e.start_date AND e.end_date)
group by e.id

Related

PostgreSQL - Check if column value exists in any previous row

I'm working on a problem where I need to check if an ID exists in any previous records within another ID set, and create a tag if it does.
Suppose I have the following table
| client_id | order_date | supplier_id |
| 1 | 2022-01-01 | 1 |
| 1 | 2022-02-01 | 2 |
| 1 | 2022-03-01 | 1 |
| 1 | 2022-04-01 | 3 |
| 2 | 2022-05-01 | 1 |
| 2 | 2022-06-01 | 1 |
| 2 | 2022-07-01 | 2 |
And I want to create a column with a "is new supplier" tag (for each client):
| client_id | order_date | supplier_id | is_new_supplier|
| 1 | 2022-01-01 | 1 | True
| 1 | 2022-02-01 | 2 | True
| 1 | 2022-03-01 | 1 | False
| 1 | 2022-04-01 | 3 | True
| 2 | 2022-05-01 | 1 | True
| 2 | 2022-06-01 | 1 | False
| 2 | 2022-07-01 | 2 | True
First I tried doing this by creating a dense_rank and filtering out repeated ranks, but it didn't work:
with aux as (SELECT client_id,
order_date,
supplier_id
FROM table)
SELECT *, dense_rank() over (
partition by client_id
order by supplier_id
) as _dense_rank
FROM aux
Another way I thought about doing this, is by creating an auxiliary id with client_id + supplier_id, ordering by date and checking if the aux id exists in any previous row, but I don't know how to do this in SQL.
You are on the right track.
Instead of dense_rank, you can just use row_number and on your partition by add supplier id..
Don't forget to order by order_date
with aux as (SELECT client_id,
order_date,
supplier_id,
row_number() over (
partition by client_id, supplier_id
order by order_date
) as rank
FROM table)
SELECT client_id,
order_date,
supplier_id,
rank,
(rank = 1) as is_new_supplier
FROM aux

Postgres Hierarchy output

im struggling on how to get the correct output using hierarchy query.
I have one table which loads per day all product and its price. during time this can cancel and being activate again.
I believe with oracle we could use the Connect By.
WITH RECURSIVE cte AS (
select min(event_date) event_date, item_code,sum(price::numeric)/1024/1024 price, 1 AS level
from rdpidevdat.raid_r_cbs_offer_accttype_map where product_type='cars' and item_code in ('Renault')
group by item_code
UNION ALL
SELECT e.event_date, e.item_code, e.price, cte.level + 1
from (select event_date, item_code,sum(price::numeric)/1024/1024 price
from rdpidevdat.raid_r_cbs_offer_accttype_map where product_type='cars' and item_code in ('9859')
group by event_date,item_code) e join cte ON e.event_date = cte.event_date and e.item_code = cte.item_code
)
SELECT *
FROM cte where item_code in ('Renault') ;
how do i put an ouput where will have the range of each product during time?
if we have the data:
EVENT_DATE | ITEM_COD| PRICE
20210910 | Renaut | 2500
20210915 | Renaut | 2500
20210920 | Renaut | 2600
20211020 | Renaut | 2900
20220101 | Renaut | 2500
the expected output should be:
-------------------------------------------------
FROM_EVENT_DATE | TO_EVENT_DATE | ITEM_COD| PRICE
20210910 | 20210915 | Renaut | 2500
20210915 | 20210920 | Renaut | 2600
20210920 | 20211020 | Renaut | 2900
20211020 | 20220101 | Renaut | 2500
Thanks in Advance and Regards!
I already found the solution. Using the Lag and lastvalue function. no need to use the hierarchy one.

Computing rolling sums efficiently in PostgreSQL

Supposing I have a set of transactions (purchases) with dates for a set of customers, I want to calculate a rolling x day sum of purchase amount and number of purchases by customer in that same window. I've gotten it to work using a window function, but I have to fill in for dates where the customer did not make any purchases. In so doing, I'm using a Cartesian product. Is there a more efficient approach so that it's more scalable as the number of customers – and time window – increases?
Edit: As noted in the comments, I'm on PostgreSQL v9.3.
Here's sample data (note that some customers may have 0, 1, or multiple purchases on a given date):
| id | cust_id | txn_date | amount |
|----|---------|------------|--------|
| 1 | 123 | 2017-08-17 | 10 |
| 2 | 123 | 2017-08-17 | 5 |
| 3 | 123 | 2017-08-18 | 5 |
| 4 | 123 | 2017-08-20 | 50 |
| 5 | 123 | 2017-08-21 | 100 |
| 6 | 456 | 2017-08-01 | 5 |
| 7 | 456 | 2017-08-01 | 5 |
| 8 | 456 | 2017-08-01 | 5 |
| 9 | 456 | 2017-08-30 | 5 |
| 10 | 456 | 2017-08-01 | 1000 |
| 11 | 789 | 2017-08-15 | 1000 |
| 12 | 789 | 2017-08-30 | 1000 |
And here's the desired output:
| cust_id | txn_date | sum_dly_txns | tot_txns_7d | cnt_txns_7d |
|---------|------------|--------------|-------------|-------------|
| 123 | 2017-08-17 | 15 | 15 | 2 |
| 123 | 2017-08-18 | 5 | 20 | 3 |
| 123 | 2017-08-20 | 50 | 70 | 4 |
| 123 | 2017-08-21 | 100 | 170 | 5 |
| 456 | 2017-08-01 | 1015 | 1015 | 4 |
| 456 | 2017-08-30 | 5 | 5 | 1 |
| 789 | 2017-08-15 | 1000 | 1000 | 1 |
| 789 | 2017-08-30 | 1000 | 1000 | 1 |
Here's SQL that produces the totals as desired:
SELECT *
FROM (
-- One row per day per user
WITH daily_txns AS (
SELECT
t.cust_id
,t.txn_date AS txn_date
,SUM(t.amount) AS sum_dly_txns
,COUNT(t.id) AS cnt_dly_txns
FROM transactions t
GROUP BY t.cust_id, txn_date
),
-- Every possible transaction date for every user
dummydates AS (
SELECT txn_date, uids.cust_id
FROM (
SELECT generate_series(
timestamp '2017-08-01'
,timestamp '2017-08-30'
,interval '1 day')::date
) d(txn_date)
CROSS JOIN (SELECT DISTINCT cust_id FROM daily_txns) uids
),
txns_dummied AS (
SELECT
d.cust_id
,d.txn_date
,COALESCE(sum_dly_txns,0) AS sum_dly_txns
,COALESCE(cnt_dly_txns,0) AS cnt_dly_txns
FROM dummydates d
LEFT JOIN daily_txns dx
ON d.txn_date = dx.txn_date
AND d.cust_id = dx.cust_id
ORDER BY d.txn_date, d.cust_id
)
SELECT
cust_id
,txn_date
,sum_dly_txns
,SUM(COALESCE(sum_dly_txns,0)) OVER w AS tot_txns_7d
,SUM(cnt_dly_txns) OVER w AS cnt_txns_7d
FROM txns_dummied
WINDOW w AS (
PARTITION BY cust_id
ORDER BY txn_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW -- 7d moving window
)
ORDER BY cust_id, txn_date
) xfers
WHERE sum_dly_txns > 0 -- Omit dates with no transactions
;
SQL Fiddle
Instead of ROWS BETWEEN 6 PRECEDING AND CURRENT ROW did you want to write RANGE '6 days' PRECEEDING ?
This must be what you are looking for:
SELECT DISTINCT
cust_id
,txn_date
,SUM(amount) OVER (PARTITION BY cust_id, txn_date) sum_dly_txns
,SUM(amount) OVER (PARTITION BY cust_id ORDER BY txn_date RANGE '6 days' PRECEDING)
,COUNT(*) OVER (PARTITION BY cust_id ORDER BY txn_date RANGE '6 days' PRECEDING)
from transactions
ORDER BY cust_id, txn_date
Edit: Since you are using an old version (I tested the one above on my postgresql 11), the point above will not make much sense so you will need to old-fashioned SQL (that is, witout window functions).
It is a bit less efficient but does a fair job.
WITH daily_txns AS (
SELECT
t.cust_id
,t.txn_date AS txn_date
,SUM(t.amount) AS sum_dly_txns
,COUNT(t.id) AS cnt_dly_txns
FROM transactions t
GROUP BY t.cust_id, txn_date
)
SELECT t1.cust_id, t1.txn_date, t1.sum_dly_txns, SUM(t2.sum_dly_txns), SUM(t2.cnt_dly_txns)
from daily_txns t1
join daily_txns t2 ON t1.cust_id = t2.cust_id and t2.txn_date BETWEEN t1.txn_date - 7 and t1.txn_date
group by t1.cust_id, t1.txn_date, t1.sum_dly_txns
order by t1.cust_id, t1.txn_date

How to check if there's a record in every hour in specified time-frame and then count it?

I'm using PostgreSQL and this is my table measurement_archive:
+-----------+------------------------+------+-------+
| sensor_id | time | type | value |
+-----------+------------------------+------+-------+
| 123 | 2017-11-26 01:53:11+00 | PM25 | 34.32 |
+-----------+------------------------+------+-------+
| 123 | 2017-11-26 02:15:11+00 | PM25 | 32.1 |
+-----------+------------------------+------+-------+
| 123 | 2017-11-26 04:32:11+00 | PM25 | 75.3 |
+-----------+------------------------+------+-------+
I need a query that will take records from specified timeframe (eg. from 2017-01-01 00:00:00 to 2017-12-01 23:59:59) and then check if in every hour there is at least 1 record - if there is, then add 1 to result.
So, if I make that query from 2017-11-26 01:00:00 to 2017-11-26 04:59:59+00 for sensor_id == 123 on above table then the result should be 3.
select count(*)
from (
select date_trunc('hour', time) as time
from measurement_archive
where
time >= '2017-11-26 01:00:00' and time < '2017-11-26 05:00:00'
and
sensor_id = 123
group by 1
) s
alternative solution would be using distinct,
select count(*) from (select distinct a, extract(hour from time) from t where time >'2017-11-26 01:00:11' and time <'2017-11-26 05:00:00' and sensor_id=123)t;

Order by created_date if less than 1 month old, else sort by updated_date

SQL Fiddle: http://sqlfiddle.com/#!15/1da00/5
I have a table that looks something like this:
products
+-----------+-------+--------------+--------------+
| name | price | created_date | updated_date |
+-----------+-------+--------------+--------------+
| chair | 50 | 10/12/2016 | 1/4/2017 |
| desk | 100 | 11/4/2016 | 12/27/2016 |
| TV | 500 | 12/1/2016 | 1/2/2017 |
| computer | 1000 | 12/28/2016 | 1/1/2017 |
| microwave | 100 | 1/3/2017 | 1/4/2017 |
| toaster | 20 | 1/9/2017 | 1/9/2017 |
+-----------+-------+--------------+--------------+
I want to order this table in a way where if the product was created less than 30 days those results should show first (and be ordered by the updated date). If the product was created 30 or more days ago I want it to show after (and have it ordered by updated date within that group)
This is what the result should look like:
products - desired results
+-----------+-------+--------------+--------------+
| name | price | created_date | updated_date |
+-----------+-------+--------------+--------------+
| toaster | 20 | 1/9/2017 | 1/9/2017 |
| microwave | 100 | 1/3/2017 | 1/4/2017 |
| computer | 1000 | 12/28/2016 | 1/1/2017 |
| chair | 50 | 10/12/2016 | 1/4/2017 |
| TV | 500 | 12/1/2016 | 1/2/2017 |
| desk | 100 | 11/4/2016 | 12/27/2016 |
+-----------+-------+--------------+--------------+
I've started writing this query:
SELECT *,
CASE
WHEN created_date > NOW() - INTERVAL '30 days' THEN 0
ELSE 1
END AS order_index
FROM products
ORDER BY order_index, created_date DESC
but that only bring the rows with created_date less thatn 30 days to the top, and then ordered by created_date. I want to also sort the rows where order_index = 1 by updated_date
Unfortunately in version 9.3 only positional column numbers or expressions involving table columns can be used in order by so order_index is not available to case at all and its position is not well defined because it comes after * in the column list.
This will work.
order by
created_date <= ( current_date - 30 ) , case
when created_date > ( current_date - 30 ) then created_date
else updated_date end desc
Alternatively a common table expression can be used to wrap the result and then that can be ordered by any column.
WITH q AS(
SELECT *,
CASE
WHEN created_date > NOW() - INTERVAL '30 days' THEN 0
ELSE 1
END AS order_index
FROM products
)
SELECT * FROM q
ORDER BY
order_index ,
CASE order_index
WHEN 0 THEN created_date
WHEN 1 THEN updated_date
END DESC;
A third approach is to exploit nulls.
order by
case
when created_date > ( current_date - 30 ) then created_date
end desc nulls last,
updated_date desc;
This approach can be useful when the ordering columns are of different types.