TSQL identifying first record in noncontiguous sequences

I tried for 1 1/2 shifts to solve my problem set-based but couldn't quite get there. Solved it in about 15 mins with cursor and it runs fast enough.
But I wonder if there is a way to do it set-based.
We have records of employee status changes extracted from 3rd party HR app: empid,recorddate,status. I need to identify the recorddate and status for each change of an emp's status over time. However there is a problem in the data. Sometimes there will be rows with different record dates for an emp, but the status does NOT change.
create table #test (empid int, recorddate date, status varchar(10))
insert into #test (empid, recorddate, status) values
(1,'1/1/2000','a'),
(1,'2/1/2000','b'),
(1,'3/1/2000','b'),
(1,'3/3/2000','b'),
(1,'4/1/2000','c'),
(2,'2/1/2000','a'),
(2,'3/1/2000','c'),
(1,'5/1/2000','a'),
(1,'6/1/2000','a'),
(2,'7/1/2000','c')
I need to return the recorddate and status for each change in status for an emp.
So in the example below, no record is returned for emp #1 for the record dates 3/1/2000 and 3/3/2000 because the status is the same as for the preceding record date of 2/1/2000, and no record is returned for emp #1 for 6/1/2000 because the status did not change versus the record with the closest earlier recorddate.
The same concept applies to emp #2: no record is returned with the 7/1/2000 recorddate because the status did not change from the closest earlier recorddate.
empid, recorddate, status
--------------------------------------
1,'1/1/2000','a'
1,'2/1/2000','b'
1,'4/1/2000','c'
1,'5/1/2000','a'
2,'2/1/2000','a'
2,'3/1/2000','c'
I tried numbering the unchanging sequences of status using partition by empid and status, order by empid, recorddate, and then selecting row number 1 from each window "frame" to get the earliest occurrence, but no luck. The row number would not reset to 1 when a status occurs more than once in the records for an emp but in runs that are not contiguous across recorddates.
thanks
ken

You can easily solve this using the LAG window function:
select empid, recorddate, status
from (
    select empid, recorddate, status,
           coalesce(lag(status) over (partition by empid
                                      order by recorddate), '') as prevstatus
    from #test) as t
where status <> prevstatus
order by empid, recorddate
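As a rough check of the LAG approach, here is a minimal sketch that runs the same query against the question's sample rows. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in for SQL Server, a regular table named test instead of #test, and ISO-formatted date strings so that text ordering matches date ordering. None of those choices come from the original post.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (an assumption)
# so that ordering by the text column matches chronological order.
ROWS = [
    (1, "2000-01-01", "a"),
    (1, "2000-02-01", "b"),
    (1, "2000-03-01", "b"),
    (1, "2000-03-03", "b"),
    (1, "2000-04-01", "c"),
    (2, "2000-02-01", "a"),
    (2, "2000-03-01", "c"),
    (1, "2000-05-01", "a"),
    (1, "2000-06-01", "a"),
    (2, "2000-07-01", "c"),
]

# Same shape as the answer's query: compare each status with the previous
# status for the same employee and keep only the rows where it differs.
QUERY = """
SELECT empid, recorddate, status
FROM (
    SELECT empid, recorddate, status,
           COALESCE(LAG(status) OVER (PARTITION BY empid
                                      ORDER BY recorddate), '') AS prevstatus
    FROM test
) AS t
WHERE status <> prevstatus
ORDER BY empid, recorddate
"""


def status_changes(rows):
    """Load rows into an in-memory table and return the first row of each run."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE test (empid INTEGER, recorddate TEXT, status TEXT)"
        )
        conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


changes = status_changes(ROWS)
for row in changes:
    print(row)
```

With the sample data this returns six rows, matching the expected output listed in the question. Note that the COALESCE to an empty string treats the first row per employee as a change only if real status values are never empty strings.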

Related

Redshift insert a date value into a table

insert into table1 (ID,date)
select
ID,sysdate
from table2
Assume I insert a record into table2 with the values ID: 1, date: 2023-1-1.
The expected result is that the ID in table1 is taken from the ID in table2, and the date in table1 is taken from the sysdate selected alongside it from table2.
select *
from table1;
The expected result after running the insert statement is:
ID, date
--------------------------------------
1, 2023-1-6
But what I get is:
ID, date
--------------------------------------
1, 2023-1-1
I see a few possibilities based on the information given:
You say "the expected result is update the ID of table1 base on the ID from table2" and this begs the question - did ID = 1 exist in table1 BEFORE you ran the INSERT statement? If so are you expecting that the INSERT will update the value for ID #1? Redshift doesn't enforce or check uniqueness of primary keys and you would get 2 rows in the table1 in this case. Is this what is happening?
SYSDATE on Redshift provides the start timestamp of the current transaction, NOT the current statement. Have you had the current transaction open since the 1st?
You didn't COMMIT the results (or the statement failed) and are checking from a different session. It could also be that the transaction in the second session started before the COMMIT completed. Working with MVCC across multiple sessions can trip anyone up.
There are likely other possible explanations. If you could provide DDL, sample data, and a simple test case so that others can recreate what you are seeing it would greatly narrow down the possibilities.

Postgresql order by and limit, the same record appears in multiple pages

SELECT id, name, port, created_at, updated_at
FROM test_table
ORDER BY updated_at DESC LIMIT 10 OFFSET 0;
Every record has the same updated_at value because I dumped 5000 records into the database.
I set the page size to 10 by using LIMIT 10 and OFFSET 0.
The issue I face is that a particular record from page 1 shows up on multiple pages. In my case I saw that record on every page up to page 5.
Can anyone tell me why this occurs and how to solve it?
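The ORDER BY needs a unique way to order your items. When rows tie on updated_at, PostgreSQL may return them in any order, and that order can differ from one query to the next, so the same row can land on several pages. I would just add the id if it's unique.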
SELECT id, name, port, created_at, updated_at
FROM test_table
ORDER BY updated_at DESC, id LIMIT 10 OFFSET 0;
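To illustrate the tie-breaker, here is a minimal sketch that pages through rows sharing one updated_at value and checks that no id repeats. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, a cut-down test_table with hypothetical sample rows, and a hypothetical fetch_page helper. SQLite will not reproduce PostgreSQL's unstable ordering of ties, so this only shows that the corrected query yields a deterministic, non-overlapping sequence of pages.

```python
import sqlite3

PAGE_SIZE = 10


def fetch_page(conn, page):
    """Return the ids on a zero-based page, ordered by updated_at then id."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT id FROM test_table "
            "ORDER BY updated_at DESC, id LIMIT ? OFFSET ?",
            (PAGE_SIZE, page * PAGE_SIZE),
        )
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
# Fifty hypothetical rows that all share the same timestamp, as in the question.
conn.executemany(
    "INSERT INTO test_table (id, name, updated_at) VALUES (?, ?, ?)",
    [(i, f"row{i}", "2023-01-01 00:00:00") for i in range(1, 51)],
)

seen = []
for page in range(5):
    seen.extend(fetch_page(conn, page))
conn.close()

# Every id appears exactly once across the five pages.
print(len(seen), len(set(seen)))
```

For large offsets, keyset pagination on the same unique ordering (filtering on the last seen updated_at and id) is a common alternative, but that goes beyond what the answer proposes.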

sql for each district increment column facility

I am trying to update a column [facility_id] with an incrementing integer for each group of districts. The facility_id needs to start at 1 and end at x, depending on how many rows each district has. I have been playing with loops all day but I have nothing that works, and I dread doing this by hand because I have 3,000 rows to manipulate.
bad table
good table
I am new and still learning, please teach.
Thank you!
Use row_number() window function to update the column:
update tablename
set Facility_ID = t.rn
from (
select id, row_number() over (partition by District order by ID) rn
from tablename
) t
where t.ID = tablename.ID
See the demo.
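As a rough, self-contained check of the row_number() idea, the sketch below numbers facilities within each district. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later), hypothetical sample rows, and a correlated subquery in place of the UPDATE ... FROM form, because older SQLite versions do not accept UPDATE ... FROM. The window function itself is the same as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "id INTEGER PRIMARY KEY, district TEXT, facility_id INTEGER)"
)
# Hypothetical sample data: three districts with interleaved ids.
conn.executemany(
    "INSERT INTO tablename (id, district) VALUES (?, ?)",
    [(1, "North"), (2, "North"), (3, "South"),
     (4, "North"), (5, "South"), (6, "East")],
)

# Number the rows inside each district by id, then copy that number
# back onto the matching row.
conn.execute("""
UPDATE tablename
SET facility_id = (
    SELECT t.rn
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY district ORDER BY id) AS rn
        FROM tablename
    ) AS t
    WHERE t.id = tablename.id
)
""")

numbered = conn.execute(
    "SELECT id, district, facility_id FROM tablename ORDER BY id"
).fetchall()
conn.close()

for row in numbered:
    print(row)
```

Each district's counter starts at 1 and increases in id order, independently of the other districts.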

Find time difference between two most recent orders

I am trying to estimate the time of a new order from repeat customers by finding the time difference between the most recent order and the second most recent order, and then adding that difference to the most recent order.
I have been trying limit and offset, but this returns a blanket date for every row. I am thinking I need to do a lateral join, but not sure how to implement it correctly. When I try to do it, I receive no output.
select public.orders.customer_id,
max(public.orders.created_at) as last_order_date,
(select created_at from public.orders group by created_at order by created_at desc limit 1 offset 1) as second_last
from public.orders
inner join
(select
customer_id, count(*)
from public.orders
where status = 'fulfilled'
group by public.orders.customer_id
having count(customer_id) >1) repeat_customers
on public.orders.customer_id = repeat_customers.customer_id
group by public.orders.customer_id;
I wanted the second_last field to be populated by the second most recent date for each customer_id, but the output is the second most recent date for the entire table, resulting in the same date for every entry.
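For your second_last column you're not limiting the subquery per customer, so it returns the second most recent date for the entire table, just like the results you've seen. The WHERE clause in the example below should solve this. The outer column is fully qualified so that the comparison refers to the outer query's row, and the ordering is descending so that OFFSET 1 picks the second most recent order:
(SELECT
po.created_at
FROM
public.orders po
WHERE
po.customer_id = public.orders.customer_id
ORDER BY
po.created_at DESC
LIMIT 1 OFFSET 1) AS second_last
I've also aliased the inner table so the two references to public.orders are not ambiguous. An unqualified customer_id inside the subquery would resolve to the inner table and match every row.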
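A minimal sketch of the correlated subquery follows. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, an unqualified orders table with hypothetical sample rows, and a HAVING clause in place of the question's join to repeat_customers. Those simplifications are mine, not the answer's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, created_at TEXT, status TEXT)"
)
# Hypothetical orders: customer 3 has a single order and should be filtered out.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2023-01-01", "fulfilled"),
        (1, "2023-01-11", "fulfilled"),
        (1, "2023-01-31", "fulfilled"),
        (2, "2023-02-01", "fulfilled"),
        (2, "2023-02-08", "fulfilled"),
        (3, "2023-03-01", "fulfilled"),
    ],
)

# The subquery is correlated on o.customer_id, so LIMIT 1 OFFSET 1 over a
# descending sort picks each customer's own second most recent order.
recent = conn.execute("""
SELECT o.customer_id,
       MAX(o.created_at) AS last_order_date,
       (SELECT po.created_at
        FROM orders AS po
        WHERE po.customer_id = o.customer_id
        ORDER BY po.created_at DESC
        LIMIT 1 OFFSET 1) AS second_last
FROM orders AS o
GROUP BY o.customer_id
HAVING COUNT(*) > 1
ORDER BY o.customer_id
""").fetchall()
conn.close()

for row in recent:
    print(row)
```

Each repeat customer now gets its own second_last value instead of one date shared across the whole table.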

Handling PostgreSQL similar, but not quite duplicate records

I have a table that contains a number of rows that are duplicated. I can get rid of those with:
DELETE FROM files
WHERE id IN (
SELECT id
FROM (SELECT id, ROW_NUMBER() OVER (partition BY name, active, filesize, start_timestamp, end_timestamp ORDER BY id) AS rnum
FROM files) t
WHERE t.rnum > 1);
This works fine most of the time. However, I have a number of rows where the filesize and end_timestamp change but all of the remaining data stays the same. What I would like to do, when such duplicate records exist, is set the active attribute to false on the records with the smallest filesize and end_timestamp.
I'm just having a moment and can't seem to figure out how to do that.