Finding Detail Based on date in SSRS - date

I have a Table that I am using to pull order details in SSRS that has when the price of a product number was changed. It has Data Changed and Updated Cost.
I am pairing up two different tables to create a report that is the cost of the package at the time of the order. Here is how I am pulling my data:
SELECT
WAREHOUSE.ActPkgCostHist.ItemNo AS [ActPkgCostHist ItemNo]
,WAREHOUSE.ActPkgCostHist.ActPkgCostDate
,WAREHOUSE.ActPkgCostHist.ActPkgCost
,ORDER.OrderHist.OrderNo
,ORDER.OrderHist.ItemNo AS [OrderHist ItemNo]
,ORDER.OrderHist.DispenseDt
FROM
WAREHOUSE.ActPkgCostHist
INNER JOIN ORDER.OrderHist
ON WAREHOUSE.ActPkgCostHist.ItemNo = ORDER.OrderHist.ItemNo
Catalog=ShippedOrders
ActPkgCostHist Table has What the cost of an Item was and what date the cost was changed.
OrderHist Table has the complete details of the order except the ActPkgCost at the time of the purchase.
I am attempting to create a table that Has order number, the date of the order and the package cost at the time of the order.

The ROW_NUMBER function is very useful for cases like this.
SELECT WAREHOUSE.ActPkgCostHist.ItemNo AS [ActPkgCostHist ItemNo]
,WAREHOUSE.ActPkgCostHist.ActPkgCostDate
,WAREHOUSE.ActPkgCostHist.ActPkgCost
,ORDER.OrderHist.OrderNo
,ORDER.OrderHist.ItemNo AS [OrderHist ItemNo]
,ORDER.OrderHist.DispenseDt
FROM ORDER.OrderHist
INNER JOIN (
SELECT ItemNo, ActPkgCostDate, ActPkgCost
, ROW_NUMBER() OVER (PARTITION BY ItemNo ORDER BY ActPkgCostDate DESC) as RN
FROM WAREHOUSE.ActPkgCostHist
--if there are future dated changes, limit ActPkgCostDate to be <= the current date
) ActPkgCostHist on ActPkgCostHist.ItemNo = OrderHist.ItemNo
WHERE RN = 1
What this subquery does is group the cost history by ItemNo. Then for each one, it ranks the changes by recency with the most recent change being 1. Then in the main query you filter it to just rows with a 1.

For each item in each order you have to find the latest cost date and use it when joining with the cost table
SELECT C.ItemNo AS [ActPkgCostHist ItemNo],
C.ActPkgCostDate,
C.ActPkgCost,
O.OrderNo,
O.ItemNo AS [OrderHist ItemNo],
O.DispenseDt
FROM WAREHOUSE.ActPkgCostHist AS C
-- JOIN order detail with cost table in order to define the cost date per item/order
INNER JOIN (SELECT Max(CH.ActPkgCostDate) AS ItemCostDate,
OH.OrderNo,
OH.ItemNo,
OH.DispenseDt
FROM WAREHOUSE.ActPkgCostHist AS CH
INNER JOIN ORDER.OrderHist AS OH
ON CH.ItemNo = OH.ItemNo
-- Get the latest cost date only from dates before order date
WHERE CH.ActPkgCostDate <= OH.DispenseDt
GROUP BY OH.OrderNo,
OH.ItemNo,
OH.DispenseDt) AS O
ON C.ItemNo = O.ItemNo
AND C.ActPkgCostDate = O.ItemCostDate

Related

subquery problem - need to get avg of a sum

I have 2 tables
sales table
weekly sales, store, date
store table
store, type, size
my sales table has multiple years, multiple stores and multiple types. I'm trying to get the avg sales by sqft for each store type per year. I have a sub query that shows the sales by sqft for each store but Im having trouble then rolling it up into my main query to get the avg by type
Anything jumps out with my final query?
SELECT
date_part('year', sales.date) AS year,
stores.type,
AVG(sales_by_sqft)
FROM
(SELECT
SUM((sales.weekly_sales)/stores.size) AS sales_by_sqft
FROM SALES
INNER JOIN stores ON sales.store = stores.store
GROUP BY sales.store) AS sq
FROM sales
INNER JOIN stores ON sales.store = stores.store
WHERE date_part('year', date) = 2012
GROUP BY year, stores.type;
getting a syntax error on the second FROM statement
I figured it out. AVG doesn't work on money. Once I changed that data type to integer, it all fell in place
SELECT
year,
type,
ROUND(AVG(sales_by_sqft),2)AS avg_sales_by_sqft
FROM
(SELECT
date_part('year', sales.date) AS year,
stores.type,
sales.store,
stores.size,
SUM(sales.weekly_sales) AS total_sales,
SUM(sales.weekly_sales)/ AVG(stores.size) AS sales_by_sqft
FROM sales
INNER JOIN stores ON sales.store = stores.store
GROUP BY year, stores.type, sales.store, stores.size) AS sq
GROUP BY 1,2
ORDER BY 1,3 DESC;

How to get the MAX(SUM of values) to find the category with the biggest total? PostgreSQL

I have two tables. One is Transactions and the other is Tickets. In Tickets I have the Ticket_Number,the name of the Category(Theater,Cinema,Concert), the Price of the Ticket. In Transactions I also have the Ticket_Number. What i want to do is to Get a SUM of money for each Category, and then with that data I want to Select the Category with the most money.
I already managed to get the SUM for each category but I am stuck here
SELECT category, SUM (Tickets.Price) AS Price
FROM Tickets,Transactions
WHERE Tickets.ticket_num=Transactions.ticket_num
GROUP BY Category
ORDER BY Price DESC;
I know i can add LIMIT 1 but I know it's not correct because 2 or more values can be the same
Using ROW_NUMBER to generate a sequence based on the sum of the price. Then, restrict to only the matching aggregated row with the highest total price.
WITH cte AS (
SELECT category, SUM(t1.Price) AS Price,
ROW_NUMBER() OVER (ORDER BY SUM(t1.Price) DESC) rn
FROM Tickets t1
INNER JOIN Transactions t2
ON t1.ticket_num = t2.ticket_num
GROUP BY Category
)
SELECT category, Price
FROM cte
WHERE rn = 1
ORDER BY Price DESC;
Note that if you want to capture all categories tied for the highest price, should a tie occur, then replace ROW_NUMBER in the above CTE with RANK, keeping everything else the same.
What you are looking for is a window function DENSE_RANK() which will handle ties properly.
RANK() will also work for your case, but if you would like to extend it to get TOP N places with ties (where N > 1), dense rank is the way to go.
SELECT Category, Price
FROM (
SELECT
Category,
SUM(ti.Price) AS Price,
DENSE_RANK() OVER (ORDER BY SUM(ti.Price) DESC) AS rnk
FROM Tickets ti
INNER JOIN Transactions tr ON
ti.ticket_num = tr.ticket_num
GROUP BY Category
) t
WHERE rnk = 1
I've also replaced the old style and not recommended joining of tables as comma separated list in FROM clause to a proper INNER JOIN clause and assigned aliases to tables.
You can use rank() to rank the sums of the prices, more expensive first.
SELECT category,
price
FROM (SELECT category,
sum(tickets.price) price,
rank() OVER (ORDER BY sum(tickets.price) DESC) r
FROM tickets
INNER JOIN transactions
ON transactions.ticket_num = tickets.ticket_num
GROUP BY category) x
WHERE r = 1;
I also took the liberty to rewrite your join from the ancient comma style to a modern, clearer version.

How to join on closest date in Postgresql

Suppose, I have following tables
product_prices
product|price|date
-------+-----+----------
apple |10 |2014-03-01
-------+-----+----------
apple |20 |2014-05-02
-------+-----+----------
egg |2 |2014-03-03
-------+-----+----------
egg |4 |2015-10-12
purchases:
user|product|date
----+-------+----------
John|apple |2014-03-02
----+-------+----------
John|apple |2014-06-03
----+-------+----------
John|egg |2014-08-13
----+-------+----------
John|egg |2016-08-13
What I need is table similar to this:
name|product|purchase date |price date|price
----+-------+--------------+----------+-----
John|apple |2014-03-02 |2014-03-01|10
----+-------+--------------+----------+-----
John|apple |2014-06-03 |2014-05-02|20
----+-------+--------------+----------+-----
John|egg |2014-08-13 |2014-08-13|2
----+-------+--------------+----------+-----
John|egg |2016-08-13 |2015-10-12|4
Or "what is the price for product at this day". Where price is calculated based on date from products table.
On real DB I tried to use something similar to:
SELECT name, product, pu.date, pp.date, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp
ON pu.date = (
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1);
But I keep either getting only left part of table (with (null) instead of product dates and prices) or many rows with all the combinations of prices and dates.
I would suggest changing product_prices table to use a daterange column instead (or at least a start_date and an end_date).
You can use an exclusion constraint to make sure you never have overlapping ranges for one product and an insert trigger that "closes" the "current" prices and creates a new unbounded range for the newly inserted price.
A daterange can efficiently be indexed and with that in place the query gets as easy as:
SELECT name, product, pu.date, pp.valid_during, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp ON pu.date <# pp.valid_during
(assuming the range column is named valid_during)
The exclusion constraint would only work however if the product was an integer (not a varchar) - but I guess your real product_purchases table uses a foreign key to some product table anyway (which is an integer).
The new table definitions could look something like this:
create table purchase_prices
(
product_id integer not null references products,
price numeric(16,4) not null,
valid_during daterange not null
);
And the constraint that prevents overlapping ranges:
alter table purchase_prices
add constraint check_price_range
exclude using gist (product_id with =, valid_during with &&);
The constraint needs the btree_gist extension.
As always improving query speed comes with a price and in this case it's the higher maintenance costs for the GiST index. You would need to run some tests to see if the easier (and most probably much faster) query outweighs the slower insert performance on purchase_prices.
Look at your scalar sub-query very closely. It is not correlated back to the outer query. In other words, it will return the same result every time: the latest date in the product_prices table. Period. Think about the query out of context:
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1
There are two problems with it:
It will return 2015-10-12 for every row in the join and ultimately, nothing was purchased on that date, hence, null.
Your approximation of closest is that the dates are equal. Unless you have a product_prices row for every product for every single date, you'll always have misses. "Closest" implies distance and ranking.
WITH close_prices_by_purchase AS (
SELECT
p.user,
p.product,
p.date pp.date,
pp.price,
row_number() over (partition by pp.product, order by pp.date desc) as distance -- calculate distance between purchase date and price date
FROM purchases AS p
INNER JOIN product_prices AS pp on pp.product = p.product
WHERE pp.date < p.date
)
SELECT user as name, product, pu.date as purchase_date, pp.date as price_date, price
FROM close_prices_by_purchase AS cpbp
WHERE distance = 1; -- shortest distance
You can try something like this, although I am sure there's a better way:
with diffs as (
select
a.*,
b."date" as bdate,
b.price,
b."date" - a."date" as diffdays,
row_number() over (
partition by "user", a."product", a."date"
order by "user", a."product", a."date", b."date" - a."date" desc
) as sr
from purchases a
inner join product_prices b on a.product = b.product
where b."date" - a."date" < 1
)
select
"user" as "name",
product,
"date" as "purchase date",
bdate as "price date",
price
from diffs
where sr = 1
Example: https://www.db-fiddle.com/f/dwQ9EXmp1SdpNpxyV1wc6M/0
Explanation
I attempted to join both tables and find the difference between dates of purchase and price, and ranked them by closest date prior to the purchase. Rank of 1 will go to the closest date. Then, data with rank of 1 was extracted.
This is a great place to use date ranges! We know the start date of the price range and we can use a window function to get the next date. At that point, it's really easy to figure out the price on any day.
with price_ranges as
(select product,
price,
date as price_date,
daterange(date, lead(date, 1)
OVER (partition by product order by date), '[)'
) as valid_price_range from product_prices
)
select "user" as name,
purchases.product,
purchases.date,
price_date,
price
from purchases
join price_ranges on purchases.product = price_ranges.product
and purchases.date <# price_ranges.valid_price_range
order by purchases.date;

How to select corresponding record alongside aggregate function with having clause

Let's say I have an orders table with customer_id, order_total, and order_date columns. I'd like to build a report that shows all customers who haven't placed an order in the last 30 days, with a column for the total amount their last order was.
This gets all of the customers who should be on the report:
select customer, max(order_date), (select order_total from orders o2 where o2.customer = orders.customer order by order_date desc limit 1)
from orders
group by 1
having max(order_date) < NOW() - '30 days'::interval
Is there a better way to do this that doesn't require a subquery but instead uses a window function or other more efficient method in order to access the total amount from the most recent order? The techniques from How to select id with max date group by category in PostgreSQL? are related, but the extra having restriction seems to stop me from using something like DISTINCT ON.
demo:db<>fiddle
Solution with row_number window function (https://www.postgresql.org/docs/current/static/tutorial-window.html)
SELECT
customer, order_date, order_total
FROM (
SELECT
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total,
row_number() OVER w as row_count
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
) s
WHERE row_count = 1 AND order_date < CURRENT_DATE - 30
Solution with DISTINCT ON (https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT):
SELECT
customer, order_date, order_total
FROM (
SELECT DISTINCT ON (customer)
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30
Explanation:
In both solutions I am working with the first_value window function. The window function's frame is defined by customers. The rows within the customers' groups are ordered descending by date which gives the latest row first (last_value is not working as expected every time). So it is possible to get the last order_date and the last order_total of this order.
The difference between both solutions is the filtering. I showed both versions because sometimes one of them is significantly faster
The window function style is creating a row count within the frames. Every first row can be filtered later. This is done by adding a row_number window function. The benefit of this solution comes out when you are trying to filter the first two or three data sets. You simply have to change the filter from WHERE row_count = 1 to WHERE row_count = 2
But if you want only one single row per group you just need to ensure that the expected row per group is ordered to be the first row in the group. Then the DISTINCT ON function can delete all following rows. DISTINCT ON (customer) gives the first (ordered) row per customer group.
Try to join table on itself
select o1.customer, max(order_date),
from orders o1
join orders o2 on o1.id=o2.id
group by o1.customer
having max(o1.order_date) < NOW() - '30 days'::interval
Subqueries in select is a bad idea, because DB will execute a query for each row
If you use postgres you can also try to use CTE
https://www.postgresql.org/docs/9.6/static/queries-with.html
WITH t as (
select id, order_total from orders o2 where o2.customer = orders.customer
order by order_date desc limit 1
) select o1.customer, max(order_date),
from orders o1
join t t.id=o2.id
group by o1.customer
having max(order_date) < NOW() - '30 days'::interval

multiple extract() with WHERE clause possible?

So far I have come up with the below:
WHERE (extract(month FROM orders)) =
(SELECT min(extract(month from orderdate))
FROM orders)
However, that will consequently return zero to many rows, and in my case, many, because many orders exist within that same earliest (minimum) month, i.e. 4th February, 9th February, 15th Feb, ...
I know that a WHERE clause can contain multiple columns, so why wouldn't the below work?
WHERE (extract(day FROM orderdate)), (extract(month FROM orderdate)) =
(SELECT min(extract(day from orderdate)), min(extract(month FROM orderdate))
FROM orders)
I simply get: SQL Error: ORA-00920: invalid relational operator
Any help would be great, thank you!
Sample data:
02-Feb-2012
14-Feb-2012
22-Dec-2012
09-Feb-2013
18-Jul-2013
01-Jan-2014
Output:
02-Feb-2012
14-Feb-2012
Desired output:
02-Feb-2012
I recreated your table and found out you just messed up the brackets a bit. The following works for me:
where
(extract(day from OrderDate),extract(month from OrderDate))
=
(select
min(extract(day from OrderDate)),
min(extract(month from OrderDate))
from orders
)
Use something like this:
with cte1 as (
select
extract(month from OrderDate) date_month,
extract(day from OrderDate) date_day,
OrderNo
from tablename
), cte2 as (
select min(date_month) min_date_month, min(date_day) min_date_day
from cte1
)
select cte1.*
from cte1
where (date_month, date_day) = (select min_date_month, min_date_day from cte2)
A common table expression enables you to restructure your data and then use this data to do your select. The first cte-block (cte1) selects the month and the day for each of your table rows. Cte2 then selects min(month) and min(date). The last select then combines both ctes to select all rows from cte1 that have the desired month and day.
There is probably a shorter solution to that, however I like common table expressions as they are almost all the time better to understand than the "optimal, shortest" query.
If that is really what you want, as bizarre as it seems, then as a different approach you could forget the extracts and the subquery against the table to get the minimums, and use an analytic approach instead:
select orderdate
from (
select o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
from orders o
)
where rn = 1;
ORDERDATE
---------
01-JAN-14
The row_number() effectively adds a pseudo-column to every row in your original table, based on the month and day in the order date. The rn values are unique, so there will be one row marked as 1, which will be from the earliest day in the earliest month. If you have multiple orders with the same day/month, say 01-Jan-2013 and 01-Jan-2014, then you'll still only get exactly one with rn = 1, but which is picked is indeterminate. You'd need to add further order by conditions to make it deterministic, but I have no idea what you might want.
That is done in the inner query; the outer query then filters so that only the records marked with rn = 1 is returned; so you get exactly one row back from the overall query.
This also avoids the situation where the earliest day number is not in the earliest month number - say if you only had 01-Jan-2014 and 02-Feb-2014; comparing the day and month separately would look for 01-Feb-2014, which doesn't exist.
SQL Fiddle (with Thomas Tschernich's anwer thrown in too, giving the same result for this data).
To join the result against your invoice table, you don't need to join to the orders table again - especially not with a cross join, which is skewing your results. You can do the join (at least) two ways:
SELECT
o.orderno,
to_char(o.orderdate, 'DD-MM-YYYY'),
i.invno
FROM
(
SELECT o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
FROM orders o
) o, invoices i
WHERE i.invno = o.invno
AND rn = 1;
Or:
SELECT
o.orderno,
to_char(o.orderdate, 'DD-MM-YYYY'),
i.invno
FROM
(
SELECT orderno, orderdate, invno
FROM
(
SELECT o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
FROM orders o
)
WHERE rn = 1
) o, invoices i
WHERE i.invno = o.invno;
The first looks like it does more work but the execution plans are the same.
SQL Fiddle with your pastebin-supplied query that gets two rows back, and these two that get one.