Lead() considering where clause, don't want it to

I have a simple log table named TaskLog. I want to see how long a certain step takes by looking at the time difference between that step and the one after it.
The query below measures the time between consecutive rows that read 'performing media relations rollup', which is not what I want. I don't want the LEAD() rows to be subject to the WHERE clause: they should take the next row in cteTask regardless of the value of Notes.
;with cteTask as
(select * from Tasklog where prog = 'StatsMajor' and Date>'8/1/2014')
, cteLead as
(select *
, LEAD(ID) over (order by ID) NextID
, LEAD(date) over (order by ID) NextDt
from cteTask
where notes = 'performing media relations rollup'
)
select *, DATEDIFF(second, Date, NextDt) as Secs
from cteLead

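Just filter afterwards? Move the WHERE on notes to the outer query, so that LEAD() is computed over every row of cteTask first: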
;with cteTask as
(select * from Tasklog where prog = 'StatsMajor' and Date>'8/1/2014')
, cteLead as
(select *
, LEAD(ID) over (order by ID) NextID
, LEAD(date) over (order by ID) NextDt
from cteTask
)
select *, DATEDIFF(second, Date, NextDt) as Secs
from cteLead
where notes = 'performing media relations rollup'
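
To see why filtering afterwards helps, here is a minimal sketch of the same idea using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The four rows and the timestamps are invented for illustration, and strftime('%s', ...) stands in for T-SQL's DATEDIFF, which SQLite does not have.

```python
import sqlite3

# Hypothetical miniature of TaskLog: column names follow the question,
# the rows are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TaskLog (ID INTEGER PRIMARY KEY, Date TEXT, Notes TEXT)")
conn.executemany(
    "INSERT INTO TaskLog VALUES (?, ?, ?)",
    [
        (1, "2014-08-02 10:00:00", "performing media relations rollup"),
        (2, "2014-08-02 10:00:30", "some other step"),
        (3, "2014-08-02 10:05:00", "performing media relations rollup"),
        (4, "2014-08-02 10:05:45", "some other step"),
    ],
)

# LEAD() runs in the CTE over all rows; the Notes filter is applied
# only in the outer query, so NextID/NextDt point at the true next row.
query = """
WITH cteLead AS (
    SELECT ID, Date, Notes,
           LEAD(ID)   OVER (ORDER BY ID) AS NextID,
           LEAD(Date) OVER (ORDER BY ID) AS NextDt
    FROM TaskLog
)
SELECT ID, NextID,
       CAST(strftime('%s', NextDt) AS INTEGER)
         - CAST(strftime('%s', Date) AS INTEGER) AS Secs
FROM cteLead
WHERE Notes = 'performing media relations rollup'
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each rollup row is paired with the row immediately after it (IDs 2 and 4 here), not with the next rollup row.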

Related

Postgres select works 3x faster than a function with that select

I have a SELECT in Postgres:
SELECT DISTINCT ON (price) price, quantity, is_ask, final_update_id
FROM (SELECT *
FROM ((SELECT price, quantity, is_ask, book_depth.final_update_id
FROM order_depth
LEFT JOIN book_depth ON book_depth_id = book_depth.id
WHERE book_depth_id IN (SELECT id
FROM book_depth
WHERE final_update_id > (SELECT last_update_id
FROM order_book
WHERE symbol_name = 'XRPRUB'
ORDER BY last_update_id DESC
LIMIT 1)
AND symbol_name = 'XRPRUB'))
UNION
(SELECT price, quantity, is_ask, order_book_id
FROM "order"
WHERE order_book_id = (SELECT id
FROM order_book
WHERE symbol_name = 'XRPRUB'
ORDER BY last_update_id DESC
LIMIT 1))
ORDER BY final_update_id DESC) AS t) AS t1
ORDER BY price, final_update_id DESC;
It runs in about 20 seconds.
But when I create a function containing this select, the function takes about 1 min 40 seconds. Can someone explain whether this is normal, or am I making a mistake somewhere?

Selecting the 1st and 10th Records Only

I have a table with 3 columns: ID, Signature, and Datetime. It is grouped by Signature, keeping only signatures having count(*) > 9:
select * from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
I now want to select only the 1st and 10th records per Signature. Rank is determined by Datetime descending, so I would expect every Signature to have exactly 2 rows.
Thanks,

I would go with a couple of common table expressions.
The first selects all records from the table along with a count of records per signature. The second selects from the first where the record count is greater than 9 and adds a row_number partitioned by signature. The final query then just selects the rows where the row_number is either 1 or 10:
With cte1 AS
(
SELECT ID, Signature, Datetime, COUNT(*) OVER(PARTITION BY Signature) As NumberOfRows
FROM #Sigs
), cte2 AS
(
SELECT ID, Signature, Datetime, ROW_NUMBER() OVER(PARTITION BY Signature ORDER BY DateTime DESC) As Rn
FROM cte1
WHERE NumberOfRows > 9
)
SELECT ID, Signature, Datetime
FROM cte2
WHERE Rn IN (1, 10)
ORDER BY Signature desc
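
As a rough, runnable illustration of this approach, the sketch below uses Python's sqlite3 module (SQLite 3.25 or later) with an ordinary table named Sigs instead of the temp table #Sigs. The data is made up, and the two CTEs are folded into one for brevity, which is a variation on the answer rather than a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sigs (ID INTEGER PRIMARY KEY, Signature TEXT, DateTime TEXT)"
)
# Invented data: signature 'A' has 12 rows, 'B' has only 3.
data = [("A", f"2020-01-{d:02d}") for d in range(1, 13)]
data += [("B", f"2020-02-{d:02d}") for d in range(1, 4)]
conn.executemany("INSERT INTO Sigs (Signature, DateTime) VALUES (?, ?)", data)

# Count rows per signature and number them newest-first in one pass,
# then keep positions 1 and 10 of signatures with more than 9 rows.
query = """
WITH counted AS (
    SELECT ID, Signature, DateTime,
           COUNT(*) OVER (PARTITION BY Signature) AS NumberOfRows,
           ROW_NUMBER() OVER (PARTITION BY Signature
                              ORDER BY DateTime DESC) AS Rn
    FROM Sigs
)
SELECT Signature, DateTime, Rn
FROM counted
WHERE NumberOfRows > 9 AND Rn IN (1, 10)
ORDER BY Signature, Rn
"""
picked = conn.execute(query).fetchall()
print(picked)
```

Signature 'B' drops out entirely because it has fewer than 10 rows, and 'A' contributes its newest and tenth-newest rows.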

Because I don't know what your data looks like, this might need some adjustment.
The simplest way here, since you already know your sort order (DateTime DESC) and partitioning (Signature), is probably to assign row numbers and then select the rows you want.
SELECT *
FROM
(
select o.Signature
,o.DateTime
,ROW_NUMBER() OVER (PARTITION BY o.Signature ORDER BY o.DateTime DESC) [Row]
from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
) ranked -- a derived table needs an alias and cannot contain ORDER BY without TOP
WHERE [Row] IN (1,10)
ORDER BY Signature desc, [Row]

Limit by percent instead of number of rows without subqueries

I would like to select the top 1% of rows; however, I cannot use subqueries to do it. I.e., this won't work:
SELECT * FROM mytbl
WHERE var='value'
ORDER BY id,random()
LIMIT(SELECT (COUNT(*) * 0.01)::integer FROM mytbl)
How would I accomplish the same output without using a subquery with limit?

You can utilize PERCENT_RANK:
WITH cte(ID, var, pc) AS
(
SELECT ID, var, PERCENT_RANK() OVER (ORDER BY random()) AS pc
FROM mytbl
WHERE var = 'value'
)
SELECT *
FROM cte
WHERE pc <= 0.01
ORDER BY id;
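
A small sketch of how the PERCENT_RANK cutoff behaves, run against SQLite (3.25 or later) through Python's sqlite3 module rather than Postgres. The 200-row table is invented. PERCENT_RANK is (rank - 1) / (rows - 1), so with 200 rows only ranks 1 and 2 fall at or below 0.01 (0 and 1/199), giving 2 rows, which is 1% of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytbl (id INTEGER, var TEXT)")
# Invented data: 200 rows that all match var = 'value'.
conn.executemany(
    "INSERT INTO mytbl VALUES (?, 'value')", [(i,) for i in range(200)]
)

# Rank the matching rows in random order and keep the first 1%.
query = """
WITH cte AS (
    SELECT id, var, PERCENT_RANK() OVER (ORDER BY random()) AS pc
    FROM mytbl
    WHERE var = 'value'
)
SELECT id, var
FROM cte
WHERE pc <= 0.01
ORDER BY id
"""
sample = conn.execute(query).fetchall()
print(len(sample), sample)
```

Which two ids come back changes from run to run; only the sample size is stable. Note that the percentage is taken over the rows that pass the WHERE clause, not over the whole table.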

I solved it with Python using the psycopg2 package:
# cur is an already-open psycopg2 cursor
cur.execute("SELECT ROUND(COUNT(id) * 0.01, 0) FROM mytbl")
nrows = int(cur.fetchone()[0])  # 1% of the table, as an integer
cur.execute(
    """SELECT *
       FROM mytbl
       WHERE var = 'value'
       ORDER BY id, random()
       LIMIT %s""",
    (nrows,),  # query parameters must be passed as a sequence
)
Perhaps there is a more elegant or more efficient solution using just SQL, but this does exactly what I'm looking for.

If I got it right, you need:
A random 1% sample of all rows.
If some id is within the sample, all rows with the same id must be there too.
The following SQL should do the trick:
with ids as (
select id,
total,
sum(cnt) over (order by max(rnd)) running_total
from (
select id,
count(*) over (partition by id) cnt,
count(*) over () total,
row_number() over(order by random()) rnd
from mytbl
) q
group by id,
cnt,
total
)
select mytbl.*
from mytbl,
ids
where mytbl.id = ids.id
and ids.running_total <= ids.total * 0.01
order by mytbl.id;

I don't have your data, of course, but I have no trouble using a subquery in the LIMIT clause.
However, the subquery contains only the count(*) part, and I then multiply the result by 0.01:
SELECT * FROM mytbl
WHERE var='value'
ORDER BY id,random()
LIMIT(SELECT count(*) FROM mytbl)*0.01;

selecting only two employees from every department

Can you let me know how to select only two employees from every department? The table has deptname, ssn, and name. I am doing a sampling and I need only two ssns for every department name. Can someone help?

You can accomplish this with an "OLAP expression" row_number()
with e as
( select deptname, ssn, empname,
row_number() over (partition by deptname order by empname) as pick
from employees
)
select deptname, ssn, empname
from e
where pick < 3
order by deptname, ssn
This example will give you the two employees whose names sort first in each department, because that is the ordering specified in the row_number() (order by) expression.
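
A quick way to try this out is the following sketch with Python's sqlite3 module (SQLite 3.25 or later). The employees table mirrors the column names used in the answer, and the six rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (deptname TEXT, ssn TEXT, empname TEXT)")
# Invented sample data: three departments of different sizes.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Sales", "111", "Ann"),
        ("Sales", "222", "Bob"),
        ("Sales", "333", "Cid"),
        ("IT", "444", "Dee"),
        ("IT", "555", "Eve"),
        ("HR", "666", "Fay"),
    ],
)

# Number employees within each department by name, keep the first two.
query = """
WITH e AS (
    SELECT deptname, ssn, empname,
           ROW_NUMBER() OVER (PARTITION BY deptname
                              ORDER BY empname) AS pick
    FROM employees
)
SELECT deptname, ssn, empname
FROM e
WHERE pick < 3
ORDER BY deptname, ssn
"""
two_per_dept = conn.execute(query).fetchall()
print(two_per_dept)
```

Departments with fewer than two employees simply return what they have. For a random rather than alphabetical pick, the ORDER BY inside the window could be swapped for random().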

Try this:
select *
from t t1
where (
select count(*)
from t t2
where
t2.deptname = t1.deptname
and
t2.ssn <= t1.ssn) <= 2
order by deptname, ssn,name;
The above returns the two "smallest" ssn values per department.
If you want the top 2 instead, change the comparison to t2.ssn >= t1.ssn.

select * from
( select rank() over (partition by deptname order by empname) as count , *
from employees
) ranked
where count<=2
order by deptname, ssn,name;

T-SQL if value exists use it otherwise use the value before

I have the following table:
Account#   Period   Balance
12345      200901   $11554
12345      200902   $4353
12345      201004   $34
12345      201005   $44
12345      201006   $1454
45677      200901   $14454
45677      200902   $1478
45677      201004   $116776
45677      201005   $996
56789      201006   $1567
56789      200901   $7894
56789      200902   $123
56789      201003   $543345
56789      201005   $114
56789      201006   $54
I want to select the account#s that have a period of 201005.
This is fairly easy using the code below. The problem is that if a user enters a period that doesn't exist for an account, such as 201003, I want the query to select the previous period's value instead. *NOTE that there is an account# that has a 201003 period and I still want to select it too.*
I tried CASE, IF ELSE, and IN, but I was unsuccessful.
PS: I cannot create temp tables due to a system limitation of 5000 rows.
Thank you.
DECLARE @INPUTPERIOD INT
SET @INPUTPERIOD = 201005
SELECT ACCOUNT#, PERIOD , BALANCE
FROM TABLE1
WHERE PERIOD = @INPUTPERIOD

SELECT t.ACCOUNT#, t.PERIOD, t.BALANCE
FROM (SELECT ACCOUNT#, MAX(PERIOD) AS MaxPeriod
FROM TABLE1
WHERE PERIOD <= @INPUTPERIOD
GROUP BY ACCOUNT#) q
INNER JOIN TABLE1 t
ON q.ACCOUNT# = t.ACCOUNT#
AND q.MaxPeriod = t.PERIOD
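
To check the behaviour of this query on a subset of the question's data, here is a small sketch using Python's sqlite3 module. The column is named Account instead of ACCOUNT# and the input period is bound as a ? parameter; both are adaptations for SQLite, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Account INTEGER, Period INTEGER, Balance INTEGER)")
# A subset of the rows shown in the question.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (12345, 200901, 11554),
        (12345, 200902, 4353),
        (12345, 201004, 34),
        (56789, 200902, 123),
        (56789, 201003, 543345),
        (56789, 201005, 114),
    ],
)

# For each account, find the latest period at or before the input,
# then join back to fetch that row's balance.
query = """
SELECT t.Account, t.Period, t.Balance
FROM (SELECT Account, MAX(Period) AS MaxPeriod
      FROM Table1
      WHERE Period <= ?
      GROUP BY Account) q
INNER JOIN Table1 t
        ON q.Account = t.Account
       AND q.MaxPeriod = t.Period
ORDER BY t.Account
"""
result = conn.execute(query, (201003,)).fetchall()
print(result)
```

Account 56789 has a 201003 row and returns it directly, while account 12345 falls back to 200902, its latest earlier period.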

select top 1 account#, period, balance
from table1
where period <= @inputperiod
order by period desc

; WITH Base AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Period DESC) RN FROM #MyTable WHERE Period <= 201003
)
SELECT * FROM Base WHERE RN = 1
This uses a CTE and ROW_NUMBER(): we take all the rows with Period <= the selected period and keep the top one (the one with the auto-generated ROW_NUMBER() = 1).
; WITH Base AS
(
SELECT *, 1 AS RN FROM #MyTable WHERE Period = 201003
)
, Alternative AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Period DESC) RN FROM #MyTable WHERE NOT EXISTS(SELECT 1 FROM Base) AND Period < 201003
)
, Final AS
(
SELECT * FROM Base
UNION ALL
SELECT * FROM Alternative WHERE RN = 1
)
SELECT * FROM Final
This one is a lot more complex but does nearly the same thing. It is more "imperative-like". It first tries to find a row with the exact Period, and if it doesn't exist, it does the same thing as before. At the end it unites the two result sets (one of the two is always empty). I would always use the first one, unless profiling showed me that SQL Server wasn't able to comprehend what I'm trying to do. Then I would try the second one.
