TSQL Cursor Alternative to Speed up my query - tsql

Row Status Time
1 Status1 1383264075
2 Status1 1383264195
3 Status1 1383264315
4 Status2 1383264435
5 Status2 1383264555
6 Status2 1383264675
7 Status2 1383264795
8 Status1 1383264915
9 Status3 1383265035
10 Status3 1383265155
11 Status2 1383265275
12 Status3 1383265395
13 Status1 1383265515
14 Status1 1383265535
15 Status2 1383265615
The [Time] column holds POSIX time.
I want to be able to calculate the number of seconds a given [Status] is active for within a given time period without using CURSORS. If a cursor is the only way, then that is fine, as I've already done that.
Using the above sample data extract, how do I calculate how long "Status1" has been active for?
That is, subtract Row1.[Time] from Row4.[Time], subtract Row8.[Time] from Row9.[Time], and subtract Row13.[Time] from Row15.[Time].
Thank you in advance.

Assuming that each row represents that the specific Status is active from the specified Time until the next row, one would have to somehow calculate the difference between row N and N+1. One way would be to use a nested query (try it here: SQL Fiddle).
SELECT SUM(Duration) AS Duration
FROM (
    SELECT f.Status, s.Time - f.Time AS Duration
    FROM Table1 f
    JOIN Table1 s ON s.Row = f.Row + 1
    WHERE f.Status = 'Status1'
) a;
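The self-join pairs each row with the row whose Row value is exactly one higher. A minimal Python sketch of that same pairing, run on the sample data from the question, makes the arithmetic easy to check (the list `rows` and the function `status_duration_by_row` are hypothetical names, not part of the original answer):

```python
# Sample data from the question: (Row, Status, Time as POSIX seconds)
rows = [
    (1, "Status1", 1383264075), (2, "Status1", 1383264195),
    (3, "Status1", 1383264315), (4, "Status2", 1383264435),
    (5, "Status2", 1383264555), (6, "Status2", 1383264675),
    (7, "Status2", 1383264795), (8, "Status1", 1383264915),
    (9, "Status3", 1383265035), (10, "Status3", 1383265155),
    (11, "Status2", 1383265275), (12, "Status3", 1383265395),
    (13, "Status1", 1383265515), (14, "Status1", 1383265535),
    (15, "Status2", 1383265615),
]


def status_duration_by_row(rows, status):
    """Mimic `JOIN Table1 s ON s.Row = f.Row + 1`: pair each row with Row + 1."""
    time_by_row = {row: time for row, _, time in rows}
    total = 0
    for row, row_status, time in rows:
        # An inner join drops rows that have no Row + 1 partner (e.g. the last row).
        if row_status == status and row + 1 in time_by_row:
            total += time_by_row[row + 1] - time
    return total


print(status_duration_by_row(rows, "Status1"))  # 580
```

The 580 seconds equal (Row4 - Row1) + (Row9 - Row8) + (Row15 - Row13) = 360 + 120 + 100, which is the breakdown asked for in the question.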

The solution by @erikxiv will work if the Row values have no gaps. If they do have gaps, you could try the following method:
SELECT
    TotalDuration = SUM(next.Time - curr.Time)
FROM
    dbo.atable AS curr
CROSS APPLY
    (
        SELECT TOP (1) Time
        FROM dbo.atable
        WHERE Row > curr.Row
        ORDER BY Row ASC
    ) AS next
WHERE
    curr.Status = 'Status1'
;
For every row matching the specified status, the correlated subquery in the CROSS APPLY clause will fetch the next Time value based on the ascending order of Row. The current row's time is then subtracted from the next row's time and all the differences are added up using SUM().
Please note that in both solutions it is implied that the order of Row values follows the order of Time values. In other words, ORDER BY Row is assumed to be equivalent to ORDER BY Time or, if Time can have duplicates, to ORDER BY Time, Row.
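The ordering assumption can be made concrete with a minimal Python sketch that finds each row's successor in (Time, Row) order, which is the role the CROSS APPLY ... TOP (1) subquery plays when Row order matches Time order. The function `status_duration_by_time` and the gapped sample data are hypothetical:

```python
def status_duration_by_time(rows, status):
    """Sum (next Time - current Time) for rows with the given status.

    Rows are (Row, Status, Time) tuples. The successor is the next row in
    (Time, Row) order, so gaps in the Row numbering do not matter.
    """
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))
    total = 0
    # zip pairs each row with the one after it; the last row has no successor.
    for (_, cur_status, cur_time), (_, _, next_time) in zip(ordered, ordered[1:]):
        if cur_status == status:
            total += next_time - cur_time
    return total


# Hypothetical data with gaps in Row (2, 4, 5 and 8 are missing).
rows_with_gap = [
    (1, "Status1", 100), (3, "Status1", 160),
    (6, "Status2", 220), (7, "Status1", 300), (9, "Status2", 330),
]
print(status_duration_by_time(rows_with_gap, "Status1"))  # 150
```

On the same data, pairing by Row + 1 would find no partners for rows 1, 3 and 7 and would return 0.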

Related

How can I count elements satisfying a condition in a group, with PostgreSQL

With this query:
SELECT date_trunc('minute', ts) ts, instrument
FROM test
GROUP BY date_trunc('minute', ts), instrument
ORDER BY ts
I am grouping rows by minutes but I would like to generate a boolean value that tells me if, in the group, there is at least one row with the timestamp where the seconds are < 10 and at least one row with the timestamp where the seconds are > 50.
In short, something like:
lessThan10 = false
moreThan50 = false
for each row in the one-minute group:
    if row.ts.seconds < 10 then lessThan10 = true
    if row.ts.seconds > 50 then moreThan50 = true
return lessThan10 && moreThan50
What I am trying to achieve is to find out if all the records I aggregate cover the beginning and the end of the minute; it's ok if there are holes here and there, but it's possible the data we capture stops and restarts at second 40 for example and, in that case, I'd like to be able to discard the whole minute.
As the data rate varies quite a lot, I can't check for a minimum number of rows. There may be a better solution to achieve this, so I'm open to it as well.
Use EXTRACT() to get the seconds of the min and max values of ts:
SELECT date_trunc('minute', ts) ts, instrument,
EXTRACT(SECOND FROM MIN(ts)) < 10 lessThan10,
EXTRACT(SECOND FROM MAX(ts)) > 50 moreThan50
FROM test
GROUP BY date_trunc('minute', ts), instrument
ORDER BY ts
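Only the earliest and latest timestamp of each group matter, which is why MIN and MAX are enough. A minimal Python sketch of the same per-minute check (the helpers `seconds` and `covered_minutes` and the sample timestamps are hypothetical):

```python
from collections import defaultdict
from datetime import datetime


def seconds(ts):
    """Seconds within the minute, fractional part included (like EXTRACT(SECOND ...))."""
    return ts.second + ts.microsecond / 1_000_000


def covered_minutes(samples):
    """Map (minute, instrument) -> True if the group has a sample before
    second 10 and one after second 50."""
    groups = defaultdict(list)
    for ts, instrument in samples:
        minute = ts.replace(second=0, microsecond=0)  # like date_trunc('minute', ts)
        groups[(minute, instrument)].append(ts)
    # Only the earliest and latest sample of each group need to be checked.
    return {key: seconds(min(stamps)) < 10 and seconds(max(stamps)) > 50
            for key, stamps in groups.items()}


# Hypothetical samples: the 12:01 minute stops at second 40.
samples = [
    (datetime(2021, 1, 1, 12, 0, 3), "A"),
    (datetime(2021, 1, 1, 12, 0, 55), "A"),
    (datetime(2021, 1, 1, 12, 1, 5), "A"),
    (datetime(2021, 1, 1, 12, 1, 40), "A"),
]
for key, ok in sorted(covered_minutes(samples).items()):
    print(key[0].strftime("%H:%M"), key[1], ok)
# 12:00 A True
# 12:01 A False
```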

Postgres Function: how to return the first full set of data that occurs after specified date/time

I have a requirement to extract rows of data, but only if all said rows make a full set. We have a sequence table that is updated every minute, with data for 80 bins. We need to know the status of bins 1 through 80 every minute as part of our production process.
I am generating a new report (a Postgres function) that needs to take a snapshot at roughly 00:01:00 AM (i.e. 1 minute past midnight). Initially I thought this would be an easy task: just grab the first 80 rows of data that occur at or after this time. However, depending on network activity and industrial computer priorities, the table is not reliably updated at exactly 00:01:00 AM, or at any exact minute for that matter. Updates can occur milliseconds or even seconds later, and take 500 ms to 800 ms to write to the database. Sometimes a given minute is missing altogether (production processes take precedence over data capture, but the sequence data is not super critical anyway).
My thinking is it would be more reliable to look for the first complete set of data anytime from 00:01:00AM onwards. So effectively, I have a table that looks a bit like this:
Apologies, I know images of tables are discouraged here, but I could not figure out how to create a textual table like this (pressing Enter was ignored!)
Basically, the above table is typical, but the first minute is not guaranteed, and for that matter, I would not be 100% confident that all 80 bins are logged for a given minute. Hence my question: how do I return the first complete set of data, where all 80 bins (rows) have been captured for a particular minute?
Thinking about it, I could do some sort of row count in the function, ensuring there are 80 rows for a given minute, but this seems less intuitive. I would like to know for sure that, within a given minute, bin 1 is represented, bin 2, bin 3...
Ultimately a call to this function will supply a min/max date/time, and that period of time will be checked for the first available minute with a full set of bins data.
I am reasonably sure this will involve a window function, as all rows have to be assessed prior to data extraction. I've used window functions a few times now, but I'm still a newbie compared to others here, so help is appreciated.
My final code, thanks to help from @klin:
StartTime = DATE_TRUNC('minute', tme1);
EndTime = DATE_TRUNC('day', tme1) + '23 hours'::interval;
SELECT "BinSequence".*
FROM "BinSequence"
JOIN (
    SELECT "binMinute" AS binminute, count("binMinute")
    FROM "BinSequence"
    WHERE ("binTime" >= StartTime) AND ("binTime" < EndTime)
    GROUP BY 1
    HAVING COUNT(DISTINCT "binBinNo") = 80 -- verifies that each and every bin is represented in returned data
) theseTuplesOnly
    ON theseTuplesOnly.binminute = "binMinute"
WHERE ("binTime" >= StartTime) AND ("binTime" < EndTime)
GROUP BY 1
ORDER BY 1
LIMIT 80;
Use the aggregate function count(*) grouping data by minutes (date_trunc('minute', datestamp) gives full minutes from datestamp), e.g.:
create table bins(datestamp time, bin int, param text);
insert into bins values
('00:01:10', 1, 'a'),
('00:01:20', 2, 'b'),
('00:01:30', 3, 'c'),
('00:01:40', 4, 'd'),
('00:02:10', 3, 'e'),
('00:03:10', 2, 'f'),
('00:03:10', 3, 'g'),
('00:03:10', 4, 'h');
select date_trunc('minute', datestamp) as minute, count(bin)
from bins
group by 1
order by 1;
minute | count
----------+-------
00:01:00 | 4
00:02:00 | 1
00:03:00 | 3
(3 rows)
If you are not sure that each bin appears only once per minute, use distinct (this will make the query slower):
select date_trunc('minute', datestamp) as minute, count(distinct bin)
...
You cannot select the per-minute counts and all columns of the table in a single simple select. If you want to do that, you should join a derived table, use the IN operator, or use a window function. A join seems to be the simplest:
select b.*, count
from bins b
join (
select date_trunc('minute', datestamp) as minute, count(bin)
from bins
group by 1
having count(bin) = 4
) s
on date_trunc('minute', datestamp) = minute
order by 1;
datestamp | bin | param | count
-----------+-----+-------+-------
00:01:10 | 1 | a | 4
00:01:20 | 2 | b | 4
00:01:30 | 3 | c | 4
00:01:40 | 4 | d | 4
(4 rows)
Note also how HAVING is used to filter the aggregated results in the above query.
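The per-minute check (the question's requirement that every bin is represented, which the final code expresses as HAVING COUNT(DISTINCT "binBinNo") = 80) can be sketched in Python on the answer's sample data, using 4 bins instead of 80. The function `first_complete_minute` and its `start` parameter, which stands in for the function's min date/time, are hypothetical:

```python
from collections import defaultdict


def first_complete_minute(rows, expected_bins, start="00:00:00"):
    """Return the first minute (HH:MM) at or after `start` in which every
    expected bin appears at least once, or None if there is none.

    rows are (HH:MM:SS string, bin number) pairs.
    """
    bins_by_minute = defaultdict(set)
    for stamp, bin_no in rows:
        if stamp >= start:  # zero-padded HH:MM:SS strings compare chronologically
            bins_by_minute[stamp[:5]].add(bin_no)  # "00:01:10"[:5] -> "00:01"
    for minute in sorted(bins_by_minute):
        # Subset test: each expected bin must be present, not just enough rows.
        if set(expected_bins) <= bins_by_minute[minute]:
            return minute
    return None


rows = [("00:01:10", 1), ("00:01:20", 2), ("00:01:30", 3), ("00:01:40", 4),
        ("00:02:10", 3), ("00:03:10", 2), ("00:03:10", 3), ("00:03:10", 4)]
print(first_complete_minute(rows, range(1, 5)))                    # 00:01
print(first_complete_minute(rows, range(1, 5), start="00:02:00"))  # None
```

Using a set per minute means duplicate rows for the same bin cannot make an incomplete minute look complete, which is the concern behind counting distinct bins.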

Min value with GROUP BY in Power BI Desktop

id datetime new_column datetime_rankx
1 12.01.2015 18:10:10 12.01.2015 18:10:10 1
2 03.12.2014 14:44:57 03.12.2014 14:44:57 1
2 21.11.2015 11:11:11 03.12.2014 14:44:57 2
3 01.01.2011 12:12:12 01.01.2011 12:12:12 1
3 02.02.2012 13:13:13 01.01.2011 12:12:12 2
3 03.03.2013 14:14:14 01.01.2011 12:12:12 3
I want to make a new column which holds, for each row, the minimum datetime value of that row's id group.
How could I do it in Power BI Desktop using a DAX expression?
Use this expression:
NewColumn =
CALCULATE(
    MIN(Table[datetime]),
    FILTER(Table, Table[id] = EARLIER(Table[id]))
)
In Power BI using a table with your data it will produce this:
UPDATE: Explanation and EARLIER function usage.
Basically, the EARLIER function gives you access to values from an outer row context.
A calculated column is evaluated once for every row of the table, so the expression runs inside a row context; in effect, DAX iterates over every table row. The same happens when you use the FILTER function: it iterates over the whole table and evaluates every row against the filter condition.
So we have two row contexts: the outer one from the calculated column and the inner one created by FILTER. Note that FILTER uses EARLIER to access the outer row context. In our case, for every row in the outer context, FILTER returns the set of rows that share the current id, and CALCULATE evaluates MIN over those rows.
If you have a programming background, it may help to think of this as a nested loop.
This Python code illustrates the main idea:
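# (id, datetime) rows; ISO strings so that min() compares them chronologically
table = [(1, '2015-01-12 18:10:10'), (2, '2014-12-03 14:44:57'),
         (2, '2015-11-21 11:11:11'), (3, '2011-01-01 12:12:12'),
         (3, '2012-02-02 13:13:13'), (3, '2013-03-03 14:14:14')]
new_column = []
for outer_id, _ in table:                 # outer row context (the calculated column)
    filtered = []
    for inner_id, inner_dt in table:      # inner row context created by FILTER
        if inner_id == outer_id:          # this comparison is what FILTER and EARLIER do
            filtered.append(inner_dt)
    new_column.append(min(filtered))      # MIN over the filtered rows
print(new_column)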
UPDATE 2: Adding a ranking column.
To get the desired rank you can use this expression:
RankColumn =
RANKX(
    CALCULATETABLE(Table, ALLEXCEPT(Table, Table[id])),
    Table[datetime],
    Table[datetime],
    1
)
This is the table with the rank column:
Let me know if this helps.
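The expected new_column and datetime_rankx values from the question can be cross-checked with a small Python sketch. It mirrors the logic (group by id, take the minimum, rank ascending with RANKX's default Skip handling of ties), not how the DAX engine evaluates it; the function `min_and_rank` is hypothetical:

```python
from collections import defaultdict
from datetime import datetime

# Table from the question: (id, datetime), dates written as dd.mm.yyyy.
raw = [
    (1, "12.01.2015 18:10:10"), (2, "03.12.2014 14:44:57"),
    (2, "21.11.2015 11:11:11"), (3, "01.01.2011 12:12:12"),
    (3, "02.02.2012 13:13:13"), (3, "03.03.2013 14:14:14"),
]
table = [(row_id, datetime.strptime(s, "%d.%m.%Y %H:%M:%S")) for row_id, s in raw]


def min_and_rank(table):
    """Return (id, datetime, min datetime of the id group, ascending rank in group)."""
    by_id = defaultdict(list)
    for row_id, dt in table:
        by_id[row_id].append(dt)
    result = []
    for row_id, dt in table:
        group = by_id[row_id]  # rows sharing this id (what FILTER/EARLIER and ALLEXCEPT select)
        # 1 + number of strictly earlier values: ties share a rank, the next rank is skipped
        rank = 1 + sum(other < dt for other in group)
        result.append((row_id, dt, min(group), rank))
    return result


for row_id, dt, first_dt, rank in min_and_rank(table):
    print(row_id, dt, first_dt, rank)
```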

Divide records into groups - quick solution

I need to use an UPDATE command to divide rows (selected by a subselect) of a PostgreSQL table into groups; each group will be identified by an integer value in one of the table's columns. The groups should all be the same size. The source table contains billions of records.
For example, I need to divide 213 selected rows into groups, and every group should contain 50 records. The result will be:
1 - 50. row => 1
51 - 100. row => 2
101 - 150. row => 3
151 - 200. row => 4
201 - 213. row => 5
There is no problem doing it with a loop (or with PostgreSQL window functions), but I need to do it very efficiently and quickly. I can't use a sequence on the id because the ids can have gaps.
I had the idea of using a random integer generator as the default value for the column, but that is not usable when I need to adjust the group size.
The query below should display 213 rows with a group number from 0 to 4. Just add 1 if you want 1 to 5:
SELECT i, (row_number() OVER (ORDER BY i) - 1) / 50 AS grp
FROM generate_series(1001,1213) i
ORDER BY i;
create temporary sequence s minvalue 0 start with 0;
select *, nextval('s') / 50 grp
from t;
drop sequence s;
I think this has the potential to be faster than @Richard's row_number version, but the difference may not matter, depending on the specifics.
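Both answers compute the group number as a zero-based counter (row_number() - 1, or a sequence starting at 0) divided by the group size using integer division. A quick Python check of that arithmetic for the 213-row example, with a hypothetical helper `group_numbers`:

```python
def group_numbers(n_rows, group_size, first_group=1):
    """Group number for each of n_rows rows, in order: (row_number - 1) // size."""
    return [(rn - 1) // group_size + first_group for rn in range(1, n_rows + 1)]


groups = group_numbers(213, 50)
sizes = {g: groups.count(g) for g in sorted(set(groups))}
print(sizes)  # {1: 50, 2: 50, 3: 50, 4: 50, 5: 13}
```

With first_group=1 this reproduces the 1 to 5 numbering from the question; first_group=0 matches the raw output of both queries.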

TSQL Syntax, Replace Existing "Wrong Value" with previous "Correct Value"

I have an application that makes an entry every hour in a MS SQL database.
The last entry on the 12th FEB is a zero value and is showing in my weekly report.
What I want to do is take the value from the previous count and enter it into the field instead of the zero value.
Can someone offer some advice on how to do this, because it is beyond my T-SQL skills?
SELECT * FROM [dbo].[CountDetails]
WHERE [updateTime] < '2013-02-13'
  AND [updateTime] > '2013-02-12'
  AND DATEPART(hh, [updateTime]) IN (22, 23)
Note: The application is supposed to zero the count at midnight, but on 12 Feb it happened early, and I know why.
EDIT: There are 5 IP addresses in total and 6 counters in total because 192.168.168.11 has 2 counters. So 2111 to 2116 make up an entire entry for all available counters at 22:58, and 2117 to 2122 make up an entire entry for all available counters at 23:58. I need to replace the 23:58 values with the corresponding values from 22:58.
Guessing here, but an update that joins on the ipAddress, counterNumber, and the datetime excluding fractional seconds, separated by an hour (do the SELECT part first for safety):
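UPDATE b
SET count = a.count
-- SELECT *
FROM dbo.CountDetails a
JOIN dbo.CountDetails b
    ON a.ipAddress = b.ipAddress
    AND a.counterNumber = b.counterNumber
    -- b.updateTime is exactly one hour after a.updateTime (style 120 drops fractional seconds)
    AND CONVERT(VARCHAR(20), b.updateTime, 120) = CONVERT(VARCHAR(20), DATEADD(HOUR, 1, a.updateTime), 120)
-- only touch the zeroed rows on 12 Feb; without this every row would take the previous hour's value
WHERE b.count = 0
    AND b.updateTime >= '2013-02-12' AND b.updateTime < '2013-02-13'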
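The effect of the join can be checked with a minimal Python sketch: each zero count takes the value of the same IP address and counter one hour earlier. The function `fill_zero_counts` and the readings are hypothetical; only the IP address 192.168.168.11 and the 22:58/23:58 times come from the question:

```python
from datetime import datetime, timedelta


def fill_zero_counts(counts):
    """counts maps (ip, counter, timestamp) -> count.

    Each zero count is replaced by the count of the same ip/counter one hour
    earlier. Lookups use the original values, so two consecutive zeros are
    not chained.
    """
    fixed = dict(counts)
    for (ip, counter, ts), count in counts.items():
        previous = counts.get((ip, counter, ts - timedelta(hours=1)))
        if count == 0 and previous is not None:
            fixed[(ip, counter, ts)] = previous
    return fixed


# Hypothetical readings for one IP with two counters at 22:58 and 23:58.
t1 = datetime(2013, 2, 12, 22, 58)
t2 = datetime(2013, 2, 12, 23, 58)
counts = {
    ("192.168.168.11", 1, t1): 1500, ("192.168.168.11", 2, t1): 730,
    ("192.168.168.11", 1, t2): 0, ("192.168.168.11", 2, t2): 0,
}
print(fill_zero_counts(counts)[("192.168.168.11", 1, t2)])  # 1500
```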