Let's say I have a table Sales
SaleID int
UserID int
Field1 varchar(10)
Created Datetime
and right now I have loaded and am viewing the record with SaleID = 23.
What's the right way to find out, using a stored procedure, the PREVIOUS and NEXT SaleID values relative to the current SaleID = 23 that belong to me (UserID = 1)?
I could do a
SELECT TOP 1 *
FROM Sales
WHERE SaleID > 23 AND UserID = 1
ORDER BY SaleID
and the same for SaleID < 23 (with ORDER BY SaleID DESC), but that's 2 SQL calls.
Is there a better way?
I'm using SQL Server 2012.
You can get the previous/next SaleID (or any other field) by using the LAG() and LEAD() functions introduced in SQL Server 2012.
For example:
SELECT *,
LAG(SaleID) OVER (PARTITION BY UserID ORDER BY SaleID) Prev,
LEAD(SaleID) OVER (PARTITION BY UserID ORDER BY SaleID) Next
FROM Sales S
SqlFiddle
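One caveat when applying this to a single record: LAG() and LEAD() are evaluated after the WHERE clause, so filtering on SaleID = 23 in the same SELECT would leave the window with a single row and return NULL for both neighbours. The window query has to be wrapped and filtered from the outside. Here is a minimal sketch of that, run through Python's built-in sqlite3 module instead of SQL Server (an assumption, just to keep it self-contained; it needs SQLite 3.25 or later for window functions). The sample rows and the PrevID/NextID names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SaleID INTEGER PRIMARY KEY, UserID INTEGER)")
# Hypothetical sample data: user 1 owns sales 10, 23 and 40; user 2 owns 22 and 24.
conn.executemany(
    "INSERT INTO Sales (SaleID, UserID) VALUES (?, ?)",
    [(10, 1), (22, 2), (23, 1), (24, 2), (40, 1)],
)


def neighbours(sale_id):
    """Return (previous, next) SaleID among the sales of the same user."""
    return conn.execute(
        """
        SELECT PrevID, NextID
        FROM (
            SELECT SaleID,
                   LAG(SaleID)  OVER (PARTITION BY UserID ORDER BY SaleID) AS PrevID,
                   LEAD(SaleID) OVER (PARTITION BY UserID ORDER BY SaleID) AS NextID
            FROM Sales
        ) AS s
        WHERE SaleID = ?  -- filter outside, after the windows are computed
        """,
        (sale_id,),
    ).fetchone()


print(neighbours(23))  # (10, 40)
print(neighbours(10))  # (None, 23)
```

Sales 22 and 24 belong to another user, so they are skipped; a missing neighbour comes back as NULL (None in Python).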
If you omit the PARTITION BY clause from the LAG() and LEAD() functions in thepirat000's answer, you get the previous and next records according to the SaleID column alone, regardless of UserID.
Here is the SQL query
SELECT *,
LAG(SaleID) OVER (ORDER BY SaleID) Prev,
LEAD(SaleID) OVER (ORDER BY SaleID) Next
FROM Sales S
The PARTITION BY clause makes these functions work within a grouping based on UserID, as in thepirat000's code.
If you want the next and previous records only for a single row, or at least for a small set of rows, the following query can also help in terms of performance (as an answer to Eager to Learn's comment):
select
(select top 1 t.SaleID from Sales t where t.SaleID < tab1.SaleID order by t.SaleID desc) as prev_id,
tab1.SaleID as current_id,
(select top 1 t.SaleID from Sales t where t.SaleID > tab1.SaleID order by t.SaleID) as next_id
from Sales tab1 where tab1.SaleID = 2
I am trying to pull unique active users before a date.
So specifically, I have a date range (let's say August - November) where I want to know the cumulative unique active users on or before a day within a month.
So, the pseudocode would look something like this:
SELECT COUNT(DISTINCT USERS) FROM USER_DB
WHERE
Month = [loop through months 8-11]
AND
DAY <= [day in loop of 1:31]
The output I desire is something like this:
step-by-step demo: db<>fiddle
SELECT
mydate,
SUM( -- 3
COUNT(DISTINCT username) -- 1, 2
) OVER (ORDER BY mydate) -- 3
FROM t
GROUP BY mydate -- 2
1. GROUP BY your date and count the users.
2. Because you don't want to count ALL user accesses, but only one access per user and day, you need to add the DISTINCT.
3. This is a window function. It cumulatively sums the per-day counts computed in the previous steps.
If you want to get unique users over ALL days (count each user only on their first access), you can first filter the users with a DISTINCT ON clause:
demo: db<>fiddle
SELECT DISTINCT ON (username)
*
FROM t
ORDER BY username, mydate
Using this as a subquery, the cumulative count (now without the DISTINCT inside COUNT) becomes:
SELECT
mydate,
SUM(
COUNT(*)
) OVER (ORDER BY mydate)
FROM (
SELECT DISTINCT ON (username)
*
FROM t
ORDER BY username, mydate
) s
GROUP BY mydate
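To make the difference between the two variants concrete, here is a small sketch using Python's built-in sqlite3 module instead of PostgreSQL (an assumption, to keep it self-contained; window functions need SQLite 3.25 or later). SQLite has no DISTINCT ON, so the first-access filter is swapped for MIN(mydate) ... GROUP BY username, and the per-day count is computed in a subquery before the running SUM. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (mydate TEXT, username TEXT)")
# Hypothetical accesses: alice appears twice on the first day and again on the second.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2019-08-01", "alice"),
        ("2019-08-01", "alice"),
        ("2019-08-01", "bob"),
        ("2019-08-02", "alice"),
        ("2019-08-02", "carol"),
        ("2019-08-03", "bob"),
    ],
)

# Variant 1: one count per user and day, then a running total over the days.
per_day = conn.execute(
    """
    SELECT mydate, SUM(cnt) OVER (ORDER BY mydate)
    FROM (SELECT mydate, COUNT(DISTINCT username) AS cnt FROM t GROUP BY mydate)
    ORDER BY mydate
    """
).fetchall()

# Variant 2: keep only each user's first access, then a running total.
first_seen = conn.execute(
    """
    SELECT mydate, SUM(cnt) OVER (ORDER BY mydate)
    FROM (
        SELECT mydate, COUNT(*) AS cnt
        FROM (SELECT username, MIN(mydate) AS mydate FROM t GROUP BY username)
        GROUP BY mydate
    )
    ORDER BY mydate
    """
).fetchall()

print(per_day)     # [('2019-08-01', 2), ('2019-08-02', 4), ('2019-08-03', 5)]
print(first_seen)  # [('2019-08-01', 2), ('2019-08-02', 3)]
```

Note that the second variant only returns days on which some user appeared for the first time, so 2019-08-03 is absent from its result.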
Let's say I have an orders table with customer_id, order_total, and order_date columns. I'd like to build a report that shows all customers who haven't placed an order in the last 30 days, with a column for the total amount of their last order.
This gets all of the customers who should be on the report:
select customer, max(order_date), (select order_total from orders o2 where o2.customer = orders.customer order by order_date desc limit 1)
from orders
group by 1
having max(order_date) < NOW() - '30 days'::interval
Is there a better way to do this that doesn't require a subquery but instead uses a window function or other more efficient method in order to access the total amount from the most recent order? The techniques from How to select id with max date group by category in PostgreSQL? are related, but the extra having restriction seems to stop me from using something like DISTINCT ON.
demo:db<>fiddle
Solution with row_number window function (https://www.postgresql.org/docs/current/static/tutorial-window.html)
SELECT
customer, order_date, order_total
FROM (
SELECT
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total,
row_number() OVER w as row_count
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
) s
WHERE row_count = 1 AND order_date < CURRENT_DATE - 30
Solution with DISTINCT ON (https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT):
SELECT
customer, order_date, order_total
FROM (
SELECT DISTINCT ON (customer)
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30
Explanation:
In both solutions I am working with the first_value window function. The window's partition is defined by customer. The rows within each customer's partition are ordered by date descending, which puts the latest row first (last_value does not always work as expected here, because the default window frame ends at the current row). So it is possible to get the last order_date and the last order_total from that first row.
The difference between both solutions is the filtering. I showed both versions because sometimes one of them is significantly faster.
The window function style creates a row count within each partition by adding a row_number window function, so every first row can be filtered afterwards. The benefit of this solution shows when you want the first two or three rows per group: you simply change the filter from WHERE row_count = 1 to WHERE row_count <= 2.
But if you want only one single row per group, you just need to ensure that the expected row is ordered to be the first row in its group. Then the DISTINCT ON clause discards all following rows: DISTINCT ON (customer) gives the first (ordered) row per customer group.
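The row_number filtering can be sketched as follows. This uses Python's built-in sqlite3 module rather than PostgreSQL (an assumption, to keep the example self-contained; it needs SQLite 3.25 or later), and the helper name latest_orders and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer TEXT, order_date TEXT, order_total INTEGER)"
)
# Hypothetical data: customer 'a' has three orders, customer 'b' has one.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("a", "2024-01-01", 10),
        ("a", "2024-02-01", 20),
        ("a", "2024-03-01", 30),
        ("b", "2024-01-15", 5),
    ],
)


def latest_orders(n):
    """Return the n most recent orders per customer."""
    return conn.execute(
        """
        SELECT customer, order_date, order_total
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer ORDER BY order_date DESC
                   ) AS row_count
            FROM orders
        )
        WHERE row_count <= ?
        ORDER BY customer, order_date DESC
        """,
        (n,),
    ).fetchall()


print(latest_orders(1))  # [('a', '2024-03-01', 30), ('b', '2024-01-15', 5)]
print(latest_orders(2))
# [('a', '2024-03-01', 30), ('a', '2024-02-01', 20), ('b', '2024-01-15', 5)]
```

With n = 1 this returns the same rows a DISTINCT ON (customer) query would; only the row_number form extends to n > 1.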
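Try joining the table to itself, using the latest order date per customer to pick out the last order:
select o1.customer, o1.order_date, o1.order_total
from orders o1
join (select customer, max(order_date) as last_order
from orders
group by customer) o2
on o2.customer = o1.customer and o2.last_order = o1.order_date
where o1.order_date < NOW() - '30 days'::interval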
A correlated subquery in the select list can be a bad idea, because the DB may execute it once for each row.
If you use Postgres, you can also try a CTE:
https://www.postgresql.org/docs/9.6/static/queries-with.html
WITH t as (
select distinct on (customer) customer, order_date, order_total
from orders
order by customer, order_date desc
) select customer, order_date, order_total
from t
where order_date < NOW() - '30 days'::interval
So far I have come up with the below:
WHERE (extract(month FROM orderdate)) =
(SELECT min(extract(month from orderdate))
FROM orders)
However, that will consequently return zero to many rows, and in my case, many, because many orders exist within that same earliest (minimum) month, i.e. 4th February, 9th February, 15th Feb, ...
I know that a WHERE clause can contain multiple columns, so why wouldn't the below work?
WHERE (extract(day FROM orderdate)), (extract(month FROM orderdate)) =
(SELECT min(extract(day from orderdate)), min(extract(month FROM orderdate))
FROM orders)
I simply get: SQL Error: ORA-00920: invalid relational operator
Any help would be great, thank you!
Sample data:
02-Feb-2012
14-Feb-2012
22-Dec-2012
09-Feb-2013
18-Jul-2013
01-Jan-2014
Output:
02-Feb-2012
14-Feb-2012
Desired output:
02-Feb-2012
I recreated your table and found out you just messed up the brackets a bit. The following works for me:
where
(extract(day from OrderDate),extract(month from OrderDate))
=
(select
min(extract(day from OrderDate)),
min(extract(month from OrderDate))
from orders
)
Use something like this:
with cte1 as (
select
extract(month from OrderDate) date_month,
extract(day from OrderDate) date_day,
OrderNo
from tablename
), cte2 as (
select min(date_month) min_date_month, min(date_day) min_date_day
from cte1
)
select cte1.*
from cte1
where (date_month, date_day) = (select min_date_month, min_date_day from cte2)
A common table expression lets you restructure your data and then select from the restructured data. The first CTE block (cte1) selects the month and the day for each of your table rows. cte2 then selects min(month) and min(day). The last select combines both CTEs to return all rows from cte1 that have the desired month and day.
There is probably a shorter solution, but I like common table expressions because they are almost always easier to understand than the "optimal, shortest" query.
If that is really what you want, as bizarre as it seems, then as a different approach you could forget the extracts and the subquery against the table to get the minimums, and use an analytic approach instead:
select orderdate
from (
select o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
from orders o
)
where rn = 1;
ORDERDATE
---------
01-JAN-14
The row_number() effectively adds a pseudo-column to every row in your original table, based on the month and day in the order date. The rn values are unique, so there will be one row marked as 1, which will be from the earliest day in the earliest month. If you have multiple orders with the same day/month, say 01-Jan-2013 and 01-Jan-2014, then you'll still only get exactly one with rn = 1, but which is picked is indeterminate. You'd need to add further order by conditions to make it deterministic, but I have no idea what you might want.
That is done in the inner query; the outer query then filters so that only the record marked with rn = 1 is returned, so you get exactly one row back from the overall query.
This also avoids the situation where the earliest day number is not in the earliest month number. Say you only had 02-Jan-2014 and 01-Feb-2014: comparing the day and month separately would look for 01-Jan-2014, which doesn't exist.
SQL Fiddle (with Thomas Tschernich's answer thrown in too, giving the same result for this data).
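The pitfall with separate minimums is easy to reproduce. The sketch below uses Python's built-in sqlite3 module instead of Oracle (an assumption, to keep it self-contained; SQLite 3.25 or later), with strftime('%m%d', ...) standing in for to_char(orderdate, 'MMDD') and dates stored as ISO strings. The helper load and the extra orderdate tie-breaker in the ORDER BY are my additions.

```python
import sqlite3


def load(dates):
    """Create a throwaway in-memory orders table holding the given ISO dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (orderdate TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(d,) for d in dates])
    return conn


# Month and day minimums taken independently of each other.
SEPARATE_MINIMUMS = """
    SELECT orderdate FROM orders
    WHERE strftime('%m', orderdate) = (SELECT MIN(strftime('%m', orderdate)) FROM orders)
      AND strftime('%d', orderdate) = (SELECT MIN(strftime('%d', orderdate)) FROM orders)
"""

# Analytic approach: order by month-and-day together, keep the first row.
ROW_NUMBER_QUERY = """
    SELECT orderdate
    FROM (
        SELECT orderdate,
               ROW_NUMBER() OVER (
                   ORDER BY strftime('%m%d', orderdate), orderdate
               ) AS rn
        FROM orders
    )
    WHERE rn = 1
"""

conn = load(["2014-01-02", "2014-02-01"])
separate = conn.execute(SEPARATE_MINIMUMS).fetchall()
analytic = conn.execute(ROW_NUMBER_QUERY).fetchall()

print(separate)  # []  (looks for 01-Jan, which is not in the table)
print(analytic)  # [('2014-01-02',)]
```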
To join the result against your invoice table, you don't need to join to the orders table again - especially not with a cross join, which is skewing your results. You can do the join (at least) two ways:
SELECT
o.orderno,
to_char(o.orderdate, 'DD-MM-YYYY'),
i.invno
FROM
(
SELECT o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
FROM orders o
) o, invoices i
WHERE i.invno = o.invno
AND rn = 1;
Or:
SELECT
o.orderno,
to_char(o.orderdate, 'DD-MM-YYYY'),
i.invno
FROM
(
SELECT orderno, orderdate, invno
FROM
(
SELECT o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
FROM orders o
)
WHERE rn = 1
) o, invoices i
WHERE i.invno = o.invno;
The first looks like it does more work but the execution plans are the same.
SQL Fiddle with your pastebin-supplied query that gets two rows back, and these two that get one.
I have a table with a date attribute and I need a query that gets the MIN date and the next date after the MIN.
I tried this:
select min(SC.TIMESTAMP) as minDate, result.TIMESTAMP
from Event SC
INNER JOIN
(SELECT TIMESTAMP from Event
HAVING TIMESTAMP > min(SC.TIMESTAMP)
) as result on result.BUSINESSID1 = SC.BUSINESSID1
where SC.BUSINESSSTEP = 'CONTAINER_PLACING_EVENT'
and SC.LOCATIONCODE = '1';
Could you please advise how to do that?
Thanks in Advance
Perhaps you can rearrange your query into this form:
select
min(TS), min(TS2)
from
event,
(select TS as TS2 from event where TS > (select min(TS) from event))
Add extra criteria as desired. I would try to rewrite yours, but it isn't entirely clear what the criteria for the count are supposed to be. If you are expecting more than one row (for example, the min and min2 of each LOCATIONCODE) then you will probably want a GROUP BY in there.
Also, I wouldn't call a column TIMESTAMP as it is a reserved word.
You can use the ROW_NUMBER() OLAP Function:
SELECT *
FROM (
SELECT
TIMESTAMP
,ROW_NUMBER() OVER (
PARTITION BY BUSINESSSTEP, LOCATIONCODE
ORDER BY TIMESTAMP ASC
) AS RN
FROM EVENT
WHERE BUSINESSSTEP = 'CONTAINER_PLACING_EVENT'
AND LOCATIONCODE = '1'
) A
WHERE RN < 3
This will return as rows instead of columns, but it should get you what you want. If you think your original query would have returned multiple rows (for multiple entities), you can change the PARTITION BY clause to include the column that makes them distinct.
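If the two values are needed as columns after all, the numbered rows can be pivoted with conditional aggregation. Here is a rough sketch using Python's built-in sqlite3 module (an assumption; the original database is not SQLite, and window functions need SQLite 3.25 or later). The column is named ts instead of TIMESTAMP to avoid the reserved word, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (ts TEXT, businessstep TEXT, locationcode TEXT)")
# Hypothetical events; only the first three match both filter conditions.
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?)",
    [
        ("2020-01-03 10:00", "CONTAINER_PLACING_EVENT", "1"),
        ("2020-01-01 08:00", "CONTAINER_PLACING_EVENT", "1"),
        ("2020-01-02 09:00", "CONTAINER_PLACING_EVENT", "1"),
        ("2019-12-31 00:00", "OTHER_EVENT", "1"),
        ("2019-12-30 00:00", "CONTAINER_PLACING_EVENT", "2"),
    ],
)

min_and_next = conn.execute(
    """
    SELECT MIN(CASE WHEN rn = 1 THEN ts END) AS min_ts,
           MIN(CASE WHEN rn = 2 THEN ts END) AS next_ts
    FROM (
        SELECT ts, ROW_NUMBER() OVER (ORDER BY ts) AS rn
        FROM event
        WHERE businessstep = ? AND locationcode = ?
    )
    WHERE rn < 3
    """,
    ("CONTAINER_PLACING_EVENT", "1"),
).fetchone()

print(min_and_next)  # ('2020-01-01 08:00', '2020-01-02 09:00')
```

The CASE expressions return NULL for non-matching rows, and MIN ignores NULLs, so each output column picks exactly one of the two numbered rows.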
The query below returns 9,817 records. Now, I want to SELECT one more field from another table. See the 2 lines that are commented out, where I've simply selected this additional field and added a JOIN statement to bring in this new column. With these lines added, the query now returns 649,200 records and I can't figure out why! I guess something is wrong with my WHERE criteria in conjunction with the JOIN statement. Please help, thanks.
SELECT DISTINCT dbo.IMPORT_DOCUMENTS.ITEMID, BEGDOC, BATCHID
--, dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.CATEGORY_ID
FROM IMPORT_DOCUMENTS
--JOIN dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS ON dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID = dbo.IMPORT_DOCUMENTS.ITEMID
WHERE (BATCHID LIKE 'IC0%' OR BATCHID LIKE 'LP0%')
AND dbo.IMPORT_DOCUMENTS.ITEMID IN
(SELECT dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID FROM
CATEGORY_COLLECTION_CATEGORY_RESULTS
WHERE SCORE >= .7 AND SCORE <= .75 AND CATEGORY_ID IN(
SELECT CATEGORY_ID FROM CATEGORY_COLLECTION_CATS WHERE COLLECTION_ID IN (11,16))
AND Sample_Id > 0)
AND dbo.IMPORT_DOCUMENTS.ITEMID NOT IN
(SELECT ASSIGNMENT_FOLDER_DOCUMENTS.Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS)
One possible reason is that one of your tables contains data at a finer grain than your join key. For example, there may be multiple records per item id, so the same item id is repeated X times and every matching row is joined once per repeat. Without knowing the data, I would fix the query as shown below. If the output is not what you're looking for, convert it into a SELECT within a SELECT.
Hope this helps....
Try this SQL:
SELECT DISTINCT a.ITEMID, a.BEGDOC, a.BATCHID, b.CATEGORY_ID
FROM IMPORT_DOCUMENTS a
JOIN (SELECT DISTINCT ITEMID, CATEGORY_ID
FROM CATEGORY_COLLECTION_CATEGORY_RESULTS
WHERE SCORE >= .7 AND SCORE <= .75
AND CATEGORY_ID IN (SELECT DISTINCT CATEGORY_ID FROM CATEGORY_COLLECTION_CATS WHERE COLLECTION_ID IN (11,16))
AND Sample_Id > 0) b ON a.ITEMID = b.ITEMID
WHERE (a.BATCHID LIKE 'IC0%' OR a.BATCHID LIKE 'LP0%')
AND a.ITEMID NOT IN (SELECT DISTINCT Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS)
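To see how repeated keys inflate the row count, here is a tiny sketch using Python's built-in sqlite3 module (an assumption; the original is SQL Server). The two tables, docs and results, are simplified hypothetical stand-ins for IMPORT_DOCUMENTS and CATEGORY_COLLECTION_CATEGORY_RESULTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE docs (itemid INTEGER, batchid TEXT);
    CREATE TABLE results (itemid INTEGER, category_id INTEGER);
    INSERT INTO docs VALUES (1, 'IC01'), (2, 'LP02');
    -- Item 1 has three category rows, item 2 has one.
    INSERT INTO results VALUES (1, 10), (1, 11), (1, 12), (2, 10);
    """
)

# Filtering with IN never multiplies rows: one row per document.
without_join = conn.execute(
    "SELECT DISTINCT itemid, batchid FROM docs "
    "WHERE itemid IN (SELECT itemid FROM results)"
).fetchall()

# Joining returns one row per (document, category) pair; DISTINCT cannot
# collapse them because category_id differs on each row.
with_join = conn.execute(
    "SELECT DISTINCT d.itemid, d.batchid, r.category_id "
    "FROM docs d JOIN results r ON r.itemid = d.itemid"
).fetchall()

# Diagnostic: which join keys occur more than once on the joined side?
repeats = conn.execute(
    "SELECT itemid, COUNT(*) FROM results GROUP BY itemid HAVING COUNT(*) > 1"
).fetchall()

print(len(without_join), len(with_join))  # 2 4
print(repeats)                            # [(1, 3)]
```

Running the diagnostic query against the real category results table should show how many rows each ITEMID contributes to the join.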