TSQL Keyword Previous or Last or something similar - tsql

This question is aimed at those with more SQL experience than I have.
I am writing a query (it will eventually be a stored procedure, but that should be irrelevant) that counts how many of the most recent rows share the same value as the latest entry. The count should keep going back through earlier entries until it hits one with a different value. (That is poorly explained, so here is an example.)
In my table I have a column 'Product_Id'. When this query runs, I want it to take the most recent Product_Id and compare it to the previously entered Product_Id; if they are the same, I want to add one to the count, and I want it to keep checking earlier Product_Ids until it runs into a different one.
I'm hoping it sounds more complicated than it is, and that the query would look something like
Select count(Product_ID)
FROM dbo.myTable
Where Product_Id = previous(Product_Id)
Now, I know that previous isn't a keyword in T-SQL, and neither is Last, but I'm hoping someone knows a keyword that does what I am asking.
Edit for Sam
USE DbName;
GO
WITH OrderedCount as
(
select ROW_NUMBER() OVER (Order by dbo.Line_Production.Run_Date DESC) as RowNumber,
Line_Production.Product_ID
From dbo.Line_Production
)
Select RowNumber, COUNT(OrderedCount.Product_ID) as PalletCount
From OrderedCount
WHERE OrderedCount.RowNumber + 1 = RowNumber
and Product_ID = Product_ID
Group by RowNumber
The OrderedCount portion works and returns the data the way I want it. I'm now having trouble comparing the Product_IDs for different RowNumbers;
my WHERE clause is wrong.

There's no keyword. That would be a nice magic solution, but it doesn't exist, at least in part because there is no guaranteed ordering (okay, you could have the keyword only if there is an ORDER BY...). I can write you a query, but that'll take time, so for now I'll give you a few steps and I'll come back and see if you still need help in a bit.
Figure out an ORDER BY, otherwise no order is guaranteed. If there is a time-entered field, that's a good choice; an index works too.
Learn to use Row_Number.
Compare the table (with Row_Number) to itself where instance1.row - 1 = instance2.row.
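The three steps above can be sketched as follows. This is only an illustration, not the query promised in the answer: it runs against an in-memory SQLite database from Python (SQLite 3.25 or later is assumed for window functions), the table name line_production and the sample rows are made up, and run_date is assumed to be unique so the ordering is well defined. The same CTE shape should carry over to T-SQL, but that is not tested here.

```python
import sqlite3

# Hypothetical sample data: (run_date, product_id), with unique run dates.
rows = [
    ("2024-01-01", "A"),
    ("2024-01-02", "B"),
    ("2024-01-03", "A"),
    ("2024-01-04", "A"),
    ("2024-01-05", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_production (run_date TEXT, product_id TEXT)")
conn.executemany("INSERT INTO line_production VALUES (?, ?)", rows)

query = """
WITH ordered AS (
    -- Steps 1 and 2: pick an ordering and number the rows, newest first.
    SELECT ROW_NUMBER() OVER (ORDER BY run_date DESC) AS rn, product_id
    FROM line_production
),
first_break AS (
    -- Step 3: join each row to the one just before it in that ordering
    -- and find the first position where the product changes.
    SELECT MIN(cur.rn) AS rn
    FROM ordered AS cur
    JOIN ordered AS prev ON cur.rn - 1 = prev.rn
    WHERE cur.product_id <> prev.product_id
)
-- Rows before the first change form the most recent run; if the product
-- never changes, the run is the whole table.
SELECT COALESCE((SELECT rn FROM first_break) - 1,
                (SELECT COUNT(*) FROM ordered)) AS pallet_count
"""
pallet_count = conn.execute(query).fetchone()[0]
print(pallet_count)  # 3: the three newest rows are all product "A"
```

The join condition is the piece the WHERE clause in the question was missing: the CTE has to appear twice under different aliases so that one row number can be compared with its neighbour.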

If product_id is an identity column, couldn't you just do product_id - 1? In other words, if it's sequential, it's the same as using ROW_NUMBER mentioned in the previous comment.

Related

SQL query to calculate first and second order

I have a table with order_id, Customer_Id and Order_date. I can calculate first time order, as it is an easy 'min' on the date, but for some reason I can't find a way to get the second order date per customer in SQL.
Order_id|Customer_Id|Order_date
Ideal result:
Customer_Id|First_order_date|Second_order_date
I am using Postgres and have beginners level of SQL.
Happy to get any suggestions.
Many thanks for your help,
Misael
What I have tried
Tried using 'row_number() over(partition by customer_id', but I can't seem to make it work. I believe I need to declare what a second purchase means and use a common table expression.
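One way the row_number() idea could be made to work is sketched below. This is a hedged illustration only: it runs in an in-memory SQLite database from Python rather than in Postgres (SQLite 3.25 or later is assumed), the table name orders and the sample rows are invented, and customers who ordered twice on the same date would need an extra tie-breaker in the ORDER BY.

```python
import sqlite3

# Hypothetical sample data: (order_id, customer_id, order_date).
rows = [
    (1, 1, "2024-01-05"),
    (2, 2, "2024-01-10"),
    (3, 1, "2024-02-01"),
    (4, 1, "2024-03-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

query = """
WITH ranked AS (
    -- Number each customer's orders from oldest to newest.
    SELECT customer_id,
           order_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY order_date) AS rn
    FROM orders
)
-- Pivot the first and second order into columns; a customer with a
-- single order gets NULL for second_order_date.
SELECT customer_id,
       MAX(CASE WHEN rn = 1 THEN order_date END) AS first_order_date,
       MAX(CASE WHEN rn = 2 THEN order_date END) AS second_order_date
FROM ranked
GROUP BY customer_id
ORDER BY customer_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Here "second order" simply means the row numbered 2 within each customer's date-ordered list, which is the definition the common table expression makes explicit.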

PostgreSQL how to GROUP BY single field from returned table

I have a complicated query; simplified, it looks like this:
SELECT
t.*,
SUM(a.hours) AS spent_hours
FROM (
SELECT
person.id,
person.name,
person.age,
SUM(contacts.id) AS contact_count
FROM
person
JOIN contacts ON contacts.person_id = person.id
) AS t
JOIN activities AS a ON a.person_id = t.id
GROUP BY t.id
This query works fine in MySQL, but Postgres needs to know that the GROUP BY field is unique, and although it actually is, in this case I would need to GROUP BY all fields returned from the t subquery.
I can do that, but I don't believe it will work efficiently with large data sets.
I can't JOIN with activities directly in the first query, as a person can have several contacts, which would make the query count the hours of an activity once for every joined contact.
Is there a Postgres way to make this query work? Maybe force Postgres to treat t.id as unique, or some other solution that achieves the same thing the Postgres way?
This query will not work on either database system as written: there is an aggregate function in the inner query, but you are not grouping it (unless you use window functions). MySQL is a special case: it accepts the query if ONLY_FULL_GROUP_BY is removed from sql_mode. So MySQL allows this usage because of that engine setting, but you cannot do that in PostgreSQL.
I knew MySQL allowed indeterminate grouping, but I honestly never knew how it implemented it... it always seemed imprecise to me, conceptually.
So depending on what that means (I'm too lazy to look it up), you might need one of two possible solutions, or maybe a third.
If your intent is to see all rows (perform the aggregate function but not consolidate/group rows), then you want a window function, invoked by partition by. Here is a really dumbed down version of your query:
SELECT
t.*,
SUM (a.hours) over (partition by t.id) AS spent_hours
FROM t
JOIN activities AS a ON a.person_id = t.id
This means you want all records in table t, not one record per t.id. But each row will also contain the sum of the hours over all rows with that value of id.
For example the sum column would look like this:
Name   Hours  Sum Hours
-----  -----  ---------
Smith     20        120
Jones     30         30
Smith    100        120
Whereas a group by would have had Smith once and could not have displayed the hours column in detail.
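To make the difference concrete, here is a small sketch that reproduces the Smith/Jones table above with a window function. It is an assumption-laden illustration: it uses an in-memory SQLite database from Python (SQLite 3.25 or later) instead of Postgres, and plain person and activities tables with made-up rows stand in for the t subquery from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE activities (person_id INTEGER, hours INTEGER)")
# Hypothetical rows matching the example table above.
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Smith"), (2, "Jones")])
conn.executemany(
    "INSERT INTO activities VALUES (?, ?)", [(1, 20), (2, 30), (1, 100)]
)

query = """
-- Every activity row is kept; the window function adds the per-person
-- total next to each detail row instead of collapsing rows.
SELECT p.name,
       a.hours,
       SUM(a.hours) OVER (PARTITION BY p.id) AS sum_hours
FROM person AS p
JOIN activities AS a ON a.person_id = p.id
ORDER BY p.name, a.hours
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

No GROUP BY is involved, so there is nothing for the database to complain about regarding ungrouped columns.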
If you really did only want one row per t.id, then Postgres will require you to tell it how to determine which row. In the example above for Smith, do you want to see the 20 or the 100?
There is another possibility, but I think I'll let you reply first. My gut tells me option 1 is what you're after and you want the analytic function.

NOT IN query performance issue with large data

I was trying to get the id and the number from a table where the number does not appear among the ids.
select id,number from tmp_t where number not in (select id from tmp_t)
I have tried the query and it takes very long: after almost 40 minutes I got disconnected from the server.
So what should I do? The data is around 500K rows.
What I want to show is: here are the id and the number for rows whose number does not exist as an id.
I tried to insert the number, but the number is a foreign key that depends on the id, so I wanted to know which ids and numbers are affected; that's why I'm using NOT IN.
Maybe someone knows? By the way, I'm using PostgreSQL 13.
You can write it with NOT EXISTS instead, although the two queries will give different results if any value of id is NULL (in which case NOT IN probably does not yield the answer you want, so NOT EXISTS is better from that perspective as well).
select id,number from tmp_t where not exists
(select 1 from tmp_t a where a.id=tmp_t.number);
But your formulation is also efficient as long as work_mem is large enough.
Typically NOT EXISTS is faster (and doesn't suffer from surprises if NULL values are involved):
select t1.id, t1.number
from tmp_t t1
where not exists (select *
from tmp_t t2
where t2.id = t1.number)
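The NULL surprise mentioned in both answers can be seen in a tiny sketch. This is an illustration only: it uses an in-memory SQLite database from Python rather than PostgreSQL 13, with three made-up rows, and it says nothing about the performance side of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_t (id INTEGER, number INTEGER)")
# Hypothetical rows; note the NULL id in the last one.
conn.executemany(
    "INSERT INTO tmp_t VALUES (?, ?)", [(1, 2), (2, 5), (None, 1)]
)

# NOT IN: once the subquery returns a NULL, "number NOT IN (...)" can
# never be true, so no row survives the filter.
not_in_rows = conn.execute(
    "SELECT id, number FROM tmp_t "
    "WHERE number NOT IN (SELECT id FROM tmp_t)"
).fetchall()

# NOT EXISTS: the NULL id simply never matches, so the row whose
# number (5) is missing from the ids is still reported.
not_exists_rows = conn.execute(
    "SELECT t1.id, t1.number FROM tmp_t AS t1 "
    "WHERE NOT EXISTS (SELECT 1 FROM tmp_t AS t2 WHERE t2.id = t1.number)"
).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2, 5)]
```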

Understanding a simple DISTINCT ON in postgresql

I am having a small difficulty understanding the below simple DISTINCT ON query:
SELECT DISTINCT
ON (bcolor) bcolor,
fcolor
FROM
t1
ORDER BY
bcolor,
fcolor;
I have this table here:
What is the order of execution of the above query, and why am I getting the following result:
As I understand it, since ORDER BY is used it will display both table columns in alphabetical order, and since ON is used it will return the first matching duplicate, but I am still confused about how the resulting table comes about.
Can somebody take me through how exactly this query is executed ?
This is an odd one since you would think that the SELECT would happen first, then the ORDER BY like any normal RDBMS, but the DISTINCT ON is special. It needs to know the order of the records in order to properly determine which records should be dropped.
So, in this case, it orders first by the bcolor, then by the fcolor. Then it determines distinct bcolors, and drops any but the first record for each distinct group.
In short, it does ORDER BY then applies the DISTINCT ON to drop the appropriate records. I think it would be most helpful to think of 'DISTINCT ON' as being special functionality that differs greatly from DISTINCT.
Added after initial post:
This could be done using window functions and a subquery as well:
SELECT
bcolor,
fcolor
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY bcolor ORDER BY fcolor ASC) as rownumber,
bcolor,
fcolor
FROM t1
) t2
WHERE rownumber = 1
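The "sort first, then keep the first row of each group" description can be checked with a short sketch. Everything here is hypothetical: the original table was not preserved, so the rows are invented, and because SQLite has no DISTINCT ON the sketch compares the ROW_NUMBER form above (run in an in-memory SQLite database, version 3.25 or later) against the same logic written out by hand in Python.

```python
import itertools
import sqlite3

# Hypothetical (bcolor, fcolor) rows standing in for the missing table.
rows = [
    ("red", "red"),
    ("red", "green"),
    ("green", "red"),
    ("green", "green"),
    ("blue", "green"),
    ("blue", "blue"),
    ("red", "blue"),
]

# By hand: ORDER BY bcolor, fcolor, then keep the first row per bcolor.
ordered = sorted(rows)
by_hand = [
    next(group)
    for _, group in itertools.groupby(ordered, key=lambda r: r[0])
]

# Same result through the ROW_NUMBER formulation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (bcolor TEXT, fcolor TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", rows)
via_sql = conn.execute(
    """
    SELECT bcolor, fcolor
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY bcolor
                                  ORDER BY fcolor ASC) AS rownumber,
               bcolor,
               fcolor
        FROM t1
    ) AS t2
    WHERE rownumber = 1
    ORDER BY bcolor
    """
).fetchall()

print(by_hand)
print(via_sql)
```

Both lists contain one row per bcolor, and in each case it is the row whose fcolor sorts first within that bcolor.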

sql date order by problem

I have an image table that has two or more rows with the same date. I'm trying to ORDER BY created_date DESC, which works fine and shows the rows in the same positions, but when I change the query and try again, it shows them in different positions. I don't have any other ORDER BY field, so I'm a bit confused about why it does this and how I can fix it.
Can you please help with this?
To get reproducible results you need to have columns in your order by clause that together are unique. Do you have an ID column? You can use that to tie-break:
ORDER BY created_date DESC, id
I suspect that this is happening because MySQL is not given any ordering information other than ORDER BY created_date DESC, so it does whatever is most convenient for MySQL depending on its complicated inner workings (caching, indexing, etc.). Assuming you have a unique key id, you could do:
SELECT * FROM table t ORDER BY t.created_date DESC, t.id ASC
Which would give you the same result every time because putting a comma in the arguments following ORDER BY gives it a secondary ordering rule that is executed when the first ordering rule doesn't produce a clear order between two rows.
To have consistent results, you will need to add at least one more column to the 'ORDER BY' clause. Since the values in the created_date column are not unique, there is not a defined order. If you wanted that column to be 'unique', you could define it as a timestamp.
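A minimal sketch of the tie-breaker suggested in the answers above, under some assumptions: it runs in an in-memory SQLite database from Python rather than MySQL, and the table name images, the id column, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, created_date TEXT)")
# Hypothetical rows: ids 1 and 4 share a date, as do ids 2 and 3.
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [(1, "2024-05-01"), (2, "2024-05-02"), (3, "2024-05-02"), (4, "2024-05-01")],
)

# created_date alone leaves the order of tied rows up to the database;
# adding the unique id as a second sort key pins it down.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM images ORDER BY created_date DESC, id ASC"
    )
]
print(ids)  # [2, 3, 1, 4]
```

Because the combination (created_date, id) is unique, the database has exactly one valid order to return, however the rest of the query changes.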