I have a table called #TimeAtHome. It includes an address, the date, and a flag atHome indicating whether the person was at home that day. For each address, I need to capture the min and max date of every stretch of days during which the person was not at home (atHome = 0).
Here is some sample code:
create table #TimeAtHome (
[address] varchar(100),
[date] date,
[atHome] bit
)
insert into #TimeAtHome
values ('123 ABC Street', '2020-01-01', '1'),
('123 ABC Street', '2020-01-02', '1'),
('123 ABC Street', '2020-01-03', '0'),
('123 ABC Street', '2020-01-04', '0'),
('123 ABC Street', '2020-01-05', '0'),
('123 ABC Street', '2020-01-06', '0'),
('123 ABC Street', '2020-01-07', '1'),
('123 ABC Street', '2020-01-08', '0'),
('123 ABC Street', '2020-01-09', '0'),
('123 ABC Street', '2020-01-10', '1'),
('777 Hello Ct', '2020-01-01', '1'),
('777 Hello Ct', '2020-01-02', '1'),
('777 Hello Ct', '2020-01-03', '1'),
('777 Hello Ct', '2020-01-04', '0'),
('777 Hello Ct', '2020-01-05', '1'),
('777 Hello Ct', '2020-01-06', '1')
Here is my desired outcome:
| address        | minDate    | maxDate    |
| -------------- | ---------- | ---------- |
| 123 ABC Street | 2020-01-03 | 2020-01-06 |
| 123 ABC Street | 2020-01-08 | 2020-01-09 |
| 777 Hello Ct   | 2020-01-04 | 2020-01-04 |
This seems like a gaps and islands problem, so I used a fairly simple solution: the LAG() function finds where the islands start and end based on the AtHome flag, a running SUM() turns those change points into a group number, and the dates are aggregated from there:
SELECT Address,Min(Date) minDate, Max(date) maxDate
FROM
(
SELECT *, SUM(CASE WHEN AtHome <> PrevAtHome THEN 1 ELSE 0 END) OVER(PARTITION BY Address order by date) Grp
FROM(
SELECT *, LAG(ATHome,1,AtHome) OVER(PARTITION BY address order by date) PrevAtHome
from #TimeAtHome
) T
) Final
WHERE Athome = 0
GROUP BY Address,Grp
ORDER BY Address
We can try the following:
Get all minDate values by joining the table with itself and checking that the person is at home on the current date and not at home on the next date (subquery 1).
Get all maxDate values the same way as in point 1, except checking that the person is back home on the next date (subquery 2).
For each address, match the first minDate with the first maxDate, the second minDate with the second maxDate, and so on (join subqueries 1 and 2).
SELECT q1.address,
q1.minDate,
q2.maxDate
FROM (
SELECT ROW_NUMBER() OVER(
PARTITION BY t2.address
ORDER BY t2.date
) as row,
t2.address,
t2.date as minDate
FROM #TimeAtHome t1 inner join #TimeAtHome t2 ON t1.address = t2.address and t1.date = DATEADD(DAY, -1, t2.date)
WHERE t1.atHome = 1
AND t2.atHome = 0
) q1
INNER JOIN (
SELECT ROW_NUMBER() OVER(
PARTITION BY t1.address
ORDER BY t1.date
) as row,
t1.address,
t1.date as maxDate
FROM #TimeAtHome t1 INNER JOIN #TimeAtHome t2 ON t1.address = t2.address and t1.date = DATEADD(DAY, -1, t2.date)
WHERE t1.atHome = 0
AND t2.atHome = 1
) q2 ON q1.address = q2.address
AND q1.row = q2.row
Please note the constraints for this query:
Dates in the table are expected to be continuous, so to find the next record we can simply subtract a day: t1.date = DATEADD(DAY, -1, t2.date).
The person starts at home, so the first minDate (when they go out) pairs with the first maxDate (when they are back), and so on.
The query is as follows. Cte1 builds the full view of the data used in the next steps, Cte2 finds the minDate, Cte3 finds the maxDate, and the ROW_NUMBER values are used to join them at the end:
;WITH cte1
AS
(
SELECT *,
LEAD(date) OVER (PARTITION BY address ORDER BY date) AS nextDate,
LEAD(atHome) OVER (PARTITION BY address ORDER BY date) AS NextAtHome
FROM #TimeAtHome
--ORDER BY address, date
),
CTE2 AS
(
SELECT
address,
cte1.nextDate AS minDate,
ROW_NUMBER() OVER (ORDER BY cte1.address , cte1.date) AS R1
FROM cte1
WHERE cte1.atHome = 1 AND cte1.NextAtHome = 0
),
CTE3 AS
(
SELECT
address,
date AS maxDate,
ROW_NUMBER() OVER (ORDER BY cte1.address, cte1.date) AS R2
FROM cte1
WHERE cte1.atHome = 0 AND cte1.NextAtHome = 1
)
SELECT CTE2.address,CTE2.minDate,CTE3.maxDate
FROM cte2
INNER JOIN cte3 ON cte2.R1 = Cte3.R2
Yet another possibility is the classic trick of subtracting a ROW_NUMBER() from the date: within a run of consecutive atHome = 0 days both advance by one per row, so their difference is constant and can be grouped on:
SELECT dt.address,
MIN(dt.Dt) AS minDate,
MAX(dt.Dt) AS maxDate
FROM (
SELECT address,
t.Date AS Dt,
DATEDIFF(D, ROW_NUMBER() OVER(partition by t.address ORDER BY t.Date),
t.Date) AS DtRange
FROM #TimeAtHome t
WHERE t.atHome = 0
) AS dt
GROUP BY dt.address, dt.DtRange
ORDER BY address, minDate;
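To see why this groups consecutive days, it can help to expose the intermediate DtRange value on its own; a small sketch against the same #TimeAtHome table:
-- Within a run of consecutive atHome = 0 dates, the date and the row number
-- advance together, so DtRange stays constant and identifies the island.
SELECT t.address,
       t.[date],
       ROW_NUMBER() OVER(PARTITION BY t.address ORDER BY t.[date]) AS rn,
       DATEDIFF(DAY, ROW_NUMBER() OVER(PARTITION BY t.address ORDER BY t.[date]), t.[date]) AS DtRange
FROM #TimeAtHome t
WHERE t.atHome = 0
ORDER BY t.address, t.[date];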
Here is another way, listing the first and last day of each stretch away from home:
SELECT
    *
FROM
(
    SELECT
        *
        ,ROW_NUMBER() OVER(PARTITION BY address, Grp ORDER BY date) RowAsc
        ,ROW_NUMBER() OVER(PARTITION BY address, Grp ORDER BY date DESC) RowDesc
    FROM
    (
        SELECT *, DATEDIFF(DAY, ROW_NUMBER() OVER(PARTITION BY address ORDER BY date), date) Grp
        FROM #TimeAtHome
        WHERE atHome = 0
    ) G
) A
WHERE RowAsc = 1 OR RowDesc = 1
ORDER BY [address], [date]
I have the following table (with different currencies):
| date             | currency | ex_rate             |
| ---------------- | -------- | ------------------- |
| 30/11/2020 00.00 | USD      | 0.8347245409015025  |
| 27/11/2020 00.00 | USD      | 0.8387854386847845  |
| 26/11/2020 00.00 | USD      | 0.84033613445378152 |
As you can see, data is missing for two dates (28/11 and 29/11). I would like to fill each missing date with the rate from the previous available date, so it would look like this:
| date             | currency | ex_rate             |
| ---------------- | -------- | ------------------- |
| 30/11/2020 00.00 | USD      | 0.8347245409015025  |
| 29/11/2020 00.00 | USD      | 0.8387854386847845  |
| 28/11/2020 00.00 | USD      | 0.8387854386847845  |
| 27/11/2020 00.00 | USD      | 0.8387854386847845  |
| 26/11/2020 00.00 | USD      | 0.84033613445378152 |
Or please redirect me to a question of the same kind.
You can use generate_series to build the series of dates between the earliest and latest values available in the table, then bring in the corresponding rows with a lateral join:
select d.dt, 'USD', t.ex_rate
from (
select generate_series(min(date), max(date), interval '1 day') as dt
from mytable
where currency = 'USD'
) d
cross join lateral (
select t.*
from mytable t
where currency = 'USD' and t.date <= d.dt
order by t.date desc limit 1
) t
I wonder whether a left join on date equality plus a window-function trick to build groups of records might be more efficient. Here count(t.date) ignores nulls, so the running count only increments on dates that actually exist in the table; every generated day therefore falls into the same group as the last real row before it, and max(ex_rate) over that group carries the rate forward:
select dt, 'USD', max(ex_rate) over(partition by grp) as ex_rate
from (
select d.*, t.*,
count(t.date) over(order by d.dt) as grp
from (
select generate_series(min(date), max(date), interval '1 day') as dt
from mytable
where currency = 'USD'
) d
left join mytable t on currency = 'USD' and t.date = d.dt
) t
Note that this can easily be generalized to handle all currencies at once:
select dt, currency, max(ex_rate) over(partition by currency, grp) as ex_rate
from (
select d.dt, c.currency, t.ex_rate,
count(t.date) over(partition by c.currency order by d.dt) as grp
from (select distinct currency from mytable) c
cross join (
select generate_series(min(date), max(date), interval '1 day') as dt
from mytable
) d
left join mytable t on t.currency = c.currency and t.date = d.dt
) t
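If the table is large, a supporting index could help the lateral-join variant find the previous available row quickly. A sketch, assuming the mytable name and columns used above (the index name is just illustrative):
-- hypothetical index covering the (currency, date) lookup in the lateral subquery
create index if not exists mytable_currency_date_idx on mytable (currency, date desc);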
Hi everyone. I am a beginner with PostgreSQL, and recently I ran into a question.
I have one table named 'sales'.
create table sales
(
cust varchar(20),
prod varchar(20),
day integer,
month integer,
year integer,
state char(2),
quant integer
);
insert into sales values ('Bloom', 'Pepsi', 2, 12, 2001, 'NY', 4232);
insert into sales values ('Knuth', 'Bread', 23, 5, 2005, 'PA', 4167);
insert into sales values ('Emily', 'Pepsi', 22, 1, 2006, 'CT', 4404);
insert into sales values ('Emily', 'Fruits', 11, 1, 2000, 'NJ', 4369);
insert into sales values ('Helen', 'Milk', 7, 11, 2006, 'CT', 210);
insert into sales values ('Emily', 'Soap', 2, 4, 2002, 'CT', 2549);
insert into sales values ('Bloom', 'Eggs', 30, 11, 2000, 'NJ', 559);
....
There are 498 rows in total.
Here is the overview of this table:
Now I want to compute the maximum and minimum sales quantities for each product, along with the corresponding customer (who purchased the product), the dates of those maximum and minimum sales, and the state in which the sale took place. I also want the average sales quantity for each product.
The combined one should be like this:
It should have 10 rows because there are 10 distinct products in total.
I have tried:
select prod,
max(quant),
cust as MAX_CUST
from sales
group by prod;
but it returned an error saying that cust should be in the GROUP BY. However, I only want to group by the product. What's more, how can I horizontally combine the max quantity with its customer, date, and state, the min quantity with its customer, date, and state, and also the average quantity, all by product name?
I feel really confused!
You can use the analytic function ROW_NUMBER to rank records by increasing/decreasing sales quantity for each product in a subquery, and then do conditional aggregation:
SELECT
prod product,
MAX(CASE WHEN rn2 = 1 THEN quant END) max_quant,
MAX(CASE WHEN rn2 = 1 THEN cust END) max_cust,
MAX(CASE WHEN rn2 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) max_date,
MAX(CASE WHEN rn2 = 1 THEN state END) max_state,
MAX(CASE WHEN rn1 = 1 THEN quant END) min_quant,
MAX(CASE WHEN rn1 = 1 THEN cust END) min_cust,
MAX(CASE WHEN rn1 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) min_date,
MAX(CASE WHEN rn1 = 1 THEN state END) min_state,
avg_quant
FROM (
SELECT
s.*,
ROW_NUMBER() OVER(PARTITION BY prod ORDER BY quant) rn1,
ROW_NUMBER() OVER(PARTITION BY prod ORDER BY quant DESC) rn2,
AVG(quant) OVER(PARTITION BY prod) avg_quant
FROM sales s
) x
WHERE rn1 = 1 OR rn2 = 1
GROUP BY prod, avg_quant
Getting the corresponding row for two aggregates (min and max) applied to the same column is not that straightforward. If you only wanted one of them, you could do something like the example below with DENSE_RANK (a window function), ranking in a subquery and filtering in the outer query (a window function cannot be referenced directly in WHERE):
SELECT prod, cust, quant
FROM (SELECT prod, cust, quant,
             dense_rank() OVER (PARTITION BY prod ORDER BY quant DESC) AS c_rank
      FROM sales) ranked
WHERE c_rank < 2;
This will give you the row(s) with the maximum quant for each product. You can do the same for the minimum quant. Doing both in the same query is more complicated; a simple way is to build on-the-fly tables for each case and join them, as shown below.
with max_quant as (
    SELECT prod, cust, quant,
           dense_rank() OVER (PARTITION BY prod ORDER BY quant DESC) AS c_rank
    FROM sales
),
min_quant as (
    SELECT prod, cust, quant,
           dense_rank() OVER (PARTITION BY prod ORDER BY quant) AS c_rank
    FROM sales
),
avg_quant as (
    select prod, avg(quant) as avg_quant from sales group by prod
)
select mx.prod, mx.quant as max_quant, mx.cust as max_cust,
       mn.quant as min_quant, mn.cust as min_cust, ag.avg_quant
from max_quant mx
join min_quant mn on mn.prod = mx.prod and mn.c_rank = 1
join avg_quant ag on ag.prod = mx.prod
where mx.c_rank = 1;
You can't use a plain GROUP BY to select the min/max here, because you want the complete row for the min/max value of quant, and that is not possible directly with GROUP BY.
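As a side note, for a single aggregate per group PostgreSQL's DISTINCT ON can return the complete row directly; a minimal sketch for the row with the maximum quant per product (ties are resolved arbitrarily unless more ORDER BY columns are added):
-- One full row per product: the sale with the highest quant.
SELECT DISTINCT ON (prod)
       prod, cust, quant, day, month, year, state
FROM sales
ORDER BY prod, quant DESC;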
I just began learning PostgreSQL recently.
I have a table named 'sales':
create table sales
(
cust varchar(20),
prod varchar(20),
day integer,
month integer,
year integer,
state char(2),
quant integer
);
insert into sales values ('Bloom', 'Pepsi', 2, 12, 2001, 'NY', 4232);
insert into sales values ('Knuth', 'Bread', 23, 5, 2005, 'PA', 4167);
insert into sales values ('Emily', 'Pepsi', 22, 1, 2006, 'CT', 4404);
insert into sales values ('Emily', 'Fruits', 11, 1, 2000, 'NJ', 4369);
insert into sales values ('Helen', 'Milk', 7, 11, 2006, 'CT', 210);
......
It looks like this:
And there are 500 rows in total.
Now I want to write a query to implement this:
For each combination of customer and product, output the maximum sales quantity for NY and the minimum sales quantities for NJ and CT in 3 separate columns. Like the first report, display the corresponding dates (i.e., the dates of those maximum and minimum sales quantities). Furthermore, for CT and NJ, include only the sales that occurred after 2000; for NY, include all sales.
It should be like this:
I have tried the following query:
SELECT
cust customer,
prod product,
MAX(CASE WHEN rn3 = 1 THEN quant END) NY_MAX,
MAX(CASE WHEN rn3 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) date,
MIN(CASE WHEN rn2 = 1 THEN quant END) NJ_MIN,
MIN(CASE WHEN rn2 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) date,
MIN(CASE WHEN rn1 = 1 THEN quant END) CT_MIN,
MIN(CASE WHEN rn1 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) date
FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY cust, prod ORDER BY quant) rn1,
ROW_NUMBER() OVER(PARTITION BY cust, prod ORDER BY quant) rn2,
ROW_NUMBER() OVER(PARTITION BY cust, prod ORDER BY quant DESC) rn3
FROM sales
) x
WHERE rn1 = 1 OR rn2 = 1 or rn3 = 1
GROUP BY cust, prod;
This is the result:
This is wrong because it shows the maximum and minimum over all states, not over the specific states I want. And I have no idea how to handle the year restriction the question asks me to apply.
We can handle this using a separate CTE per state, joined back to the distinct customer/product combinations:
WITH custprod AS (
SELECT DISTINCT cust, prod
FROM sales
),
ny_sales AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY cust, prod ORDER BY quant DESC) rn
FROM sales
WHERE state = 'NY'
),
nj_sales AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY cust, prod ORDER BY quant) rn
FROM sales
WHERE state = 'NJ' AND year > 2000
),
ct_sales AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY cust, prod ORDER BY quant) rn
FROM sales
WHERE state = 'CT' AND year > 2000
)
SELECT
cp.cust,
cp.prod,
nys.quant AS ny_max,
nys.year::text || '-' || nys.month::text || '-' || nys.day::text AS ny_date,
njs.quant AS nj_min,
njs.year::text || '-' || njs.month::text || '-' || njs.day::text AS nj_date,
cts.quant AS ct_min,
cts.year::text || '-' || cts.month::text || '-' || cts.day::text AS ct_date
FROM custprod cp
LEFT JOIN ny_sales nys
ON cp.cust = nys.cust AND cp.prod = nys.prod AND nys.rn = 1
LEFT JOIN nj_sales njs
ON cp.cust = njs.cust AND cp.prod = njs.prod AND njs.rn = 1
LEFT JOIN ct_sales cts
ON cp.cust = cts.cust AND cp.prod = cts.prod AND cts.rn = 1
ORDER BY
cp.cust,
cp.prod;
Note: You didn't provide comprehensive sample data, but the above seems to work against a demo built from the DDL shown.
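For comparison, if only the quantities were needed (without the matching dates), a single pass with aggregate FILTER clauses (PostgreSQL 9.4+) would also express the per-state conditions, including the year restriction for NJ and CT; a minimal sketch:
-- Quantities only; the corresponding date columns still need the CTE/join approach above.
SELECT cust, prod,
       MAX(quant) FILTER (WHERE state = 'NY') AS ny_max,
       MIN(quant) FILTER (WHERE state = 'NJ' AND year > 2000) AS nj_min,
       MIN(quant) FILTER (WHERE state = 'CT' AND year > 2000) AS ct_min
FROM sales
GROUP BY cust, prod
ORDER BY cust, prod;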
Here is a snippet of my data:
| customers | order_id | order_date | order_counter |
| --------- | -------- | ---------- | ------------- |
| 1         | a        | 1/1/2018   | 1             |
| 1         | b        | 1/4/2018   | 2             |
| 1         | c        | 3/8/2018   | 3             |
| 1         | d        | 4/9/2018   | 4             |
I'm trying to get the average number of days between orders for each customer. For the snippet above, the average should be about 32.67 days: there were 3, 63, and 32 days between consecutive orders, so sum those gaps and divide by 3.
My data has customers that may have more than 100 orders.
You could use the LAG function (in Postgres, subtracting one date from another yields an integer number of days, so AVG can be applied directly to the difference):
WITH cte AS (
SELECT customers,order_date-LAG(order_date) OVER(PARTITION BY customers ORDER BY order_counter) AS d
FROM t
)
SELECT customers, AVG(d)
FROM cte
WHERE d IS NOT NULL
GROUP BY customers;
db<>fiddle demo
With a self join, group by customer and get the average difference:
select
t.customers,
round(avg(tt.order_date - t.order_date), 2) averagedays
from tablename t inner join tablename tt
on tt.customers = t.customers and tt.order_counter = t.order_counter + 1
group by t.customers
See the demo.
Results:
| customers | averagedays |
| --------- | ----------- |
| 1 | 32.67 |
Please check the query below.
I inserted data for two customers so we can verify that the average comes out correct for each customer.
DB Fiddle example: https://www.db-fiddle.com/
CREATE TABLE test (
customers INTEGER,
order_id VARCHAR(1),
order_date DATE,
order_counter INTEGER
);
INSERT INTO test
(customers, order_id, order_date, order_counter)
VALUES
('1', 'a', '2018-01-01', '1'),
('1', 'b', '2018-01-04', '2'),
('1', 'c', '2018-03-08', '3'),
('1', 'd', '2018-04-09', '4'),
('2', 'a', '2018-01-01', '1'),
('2', 'b', '2018-01-06', '2'),
('2', 'c', '2018-03-12', '3'),
('2', 'd', '2018-04-15', '4');
commit;
select customers , round(avg(next_order_diff),2) as average
from
(
select customers , order_date , next_order_date - order_date as next_order_diff
from
(
select customers ,
lead(order_date) over (partition by customers order by order_date) as next_order_date , order_date
from test
) a
where next_order_date is not null
) a
group by customers
order by customers
;
Another option. I like the answer from #forpas myself, except that it depends on order_counter increasing monotonically with no gaps (what happens when an order is deleted?). The following accounts for that by actually counting the number of order pairs. It also handles customers who have placed only one order, returning NULL as the average.
select customers, round(sum(nd)::numeric/n, 2) avg_days_to_order
from (
select customers
, order_date - lag(order_date) over(partition by customers order by order_counter) nd
, count(*) over (partition by customers) - 1 n
from test
)d
group by customers, n
order by customers;
I have been trying all afternoon to achieve this with no success.
I have a db with info on customers and the dates they purchased products from the store. It is grouped by a batch ID, which I have converted into a date format.
So in my table I now have:
CustomerID|Date
1234 |2011-10-18
1234 |2011-10-22
1235 |2011-11-16
1235 |2011-11-17
What I want to achieve is to see the number of days between each purchase and the previous one, and so on.
For example:
CustomerID|Date |Outcome
1234 |2011-10-18 |
1234 |2011-10-22 | 4
1235 |2011-11-16 |
1235 |2011-11-17 | 1
I have tried joining the table to itself, but the problem is that the join just gives me back the same rows. I then tried making my join return rows where the dates did not (<>) match.
Hope this makes sense; any help is appreciated. I have searched all the relevant topics on here.
Will there be multiple groups per CustomerID, or are a customer's rows always grouped together? The first query below gives the span between each customer's first and last purchase; the second gives the number of days between each purchase and the previous one.
DECLARE @myTable TABLE
(
CustomerID INT,
Date DATETIME
)
INSERT INTO @myTable
SELECT 1234, '2011-10-14' UNION ALL
SELECT 1234, '2011-10-18' UNION ALL
SELECT 1234, '2011-10-22' UNION ALL
SELECT 1234, '2011-10-26' UNION ALL
SELECT 1235, '2011-11-16' UNION ALL
SELECT 1235, '2011-11-17' UNION ALL
SELECT 1235, '2011-11-18' UNION ALL
SELECT 1235, '2011-11-19'
SELECT CustomerID,
MIN(date),
MAX(date),
DATEDIFF(day,MIN(date),MAX(date)) Outcome
FROM @myTable
GROUP BY CustomerID
SELECT a.CustomerID,
a.[Date],
ISNULL(DATEDIFF(DAY, b.[Date], a.[Date]),0) Outcome
FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY [CustomerID] ORDER BY date) Row,
CustomerID,
Date
FROM @myTable
) A
LEFT JOIN
(
SELECT ROW_NUMBER() OVER(PARTITION BY [CustomerID] ORDER BY date) Row,
CustomerID,
Date
FROM @myTable
) B ON a.CustomerID = b.CustomerID AND A.Row = B.Row + 1
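For what it's worth, on SQL Server 2012 or later the same per-row gap can be computed with LAG instead of the self join; a sketch against the same @myTable (assuming that version is available):
-- Days since the previous purchase per customer; 0 for each customer's first purchase, as above.
SELECT CustomerID,
       [Date],
       ISNULL(DATEDIFF(DAY,
                       LAG([Date]) OVER (PARTITION BY CustomerID ORDER BY [Date]),
                       [Date]), 0) AS Outcome
FROM @myTable
ORDER BY CustomerID, [Date];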