Get distinct rows based on one column with T-SQL - tsql

I have a column in the following format:
Time Value
17:27 2
17:27 3
I want to get the distinct rows based on one column: Time. So my expected result would be one result. Either 17:27 3 or 17:27 3.
Distinct
T-SQL uses distinct on multiple columns instead of one. Distinct would return two rows since the combinations of Time and Value are unique (see below).
select distinct [Time], * from SAPQMDATA
would return
Time Value
17:27 2
17:27 3
instead of
Time Value
17:27 2
Group by
Also group by does not appear to work
select * from table group by [Time]
Will result in:
Column 'Value' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Questions
How can I select all unique 'Time' columns without taking into account other columns provided in a select query?
How can I remove duplicate entries?

This is where ROW_NUMBER will be your best friend. Using this as your sample data...
time value
-------------------- -----------
17:27 2
17:27 3
11:36 9
15:14 5
15:14 6
.. below are two solutions with that you can copy/paste/run.
DECLARE #youtable TABLE ([time] VARCHAR(20), [value] INT);
INSERT #youtable VALUES ('17:27',2),('17:27',3),('11:36',9),('15:14',5),('15:14',6);
-- The most elegant way solve this
SELECT TOP (1) WITH TIES t.[time], t.[value]
FROM #youtable AS t
ORDER BY ROW_NUMBER() OVER (PARTITION BY t.[time] ORDER BY (SELECT NULL));
-- A more efficient way solve this
SELECT t.[time], t.[value]
FROM
(
SELECT t.[time], t.[value], ROW_NUMBER() OVER (PARTITION BY t.[time] ORDER BY (SELECT NULL)) AS RN
FROM #youtable AS t
) AS t
WHERE t.RN = 1;
Each returns:
time value
-------------------- -----------
11:36 9
15:14 5
17:27 2

Related

how to get last known contiguous value in postgres ltree field?

I have a child table called wbs_numbers. the primary key id is a ltree
A typical example is
id
series_id
abc.xyz.00001
1
abc.xyz.00002
1
abc.xyz.00003
1
abc.xyz.00101
1
so the parent table called series. it has a field called last_contigous_max.
given the above example, i want the series of id 1 to have its last contigous max be 3
can always assume that the ltree of wbs is always 3 fragment separated by dot. and the last fragment is always a 5 digit numeric string left padded by zero. can always assume the first child is always ending with 00001 and the theoretical total children of a series will never exceed 9999.
If you think of it as gaps and islands, the wbs_numbers will never start with a gap within a series. it will always start with an island.
meaning to say this is not possible.
id
series_id
abc.xyz.00010
1
abc.xyz.00011
1
abc.xyz.00012
1
abc.xyz.00101
1
This is possible
id
series_id
abc.xyz.00001
1
abc.xyz.00004
1
abc.xyz.00005
1
abc.xyz.00051
1
abc.xyz.00052
1
abc.xyz.00100
1
abc.xyz.10001
2
abc.xyz.10002
2
abc.xyz.10003
2
abc.xyz.10051
2
abc.xyz.10052
2
abc.xyz.10100
2
abc.xyz.20001
3
abc.xyz.20002
3
abc.xyz.20003
3
abc.xyz.20004
3
abc.xyz.20052
3
abc.xyz.20100
3
so the last max contiguous in this case is
for series id 1 => 1
for series id 2 => 3
for series id 3 => 4
What's the query to calculate the last_contigous_max number for any given series_id?
I also don't mind having another table just to store "islands".
Also, you can safely assume that wbs_number records will never be deleted once created. The id in the wbs_numbers table will never be altered once filled in as well.
Meaning to say islands will only grow and never shrink.
You can carry out your problem following these steps:
extract your integer value from your "id" field
compute a ranking value sided with your id value
filter out when your ranking value does not match your id value
get tied last row for each of your matches
WITH cte AS (
SELECT *, CAST(RIGHT(id_, 4) AS INTEGER) AS idval
FROM tab
), ranked AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY series_id ORDER BY idval) AS rn
FROM cte
)
SELECT series_id, idval
FROM ranked
WHERE idval = rn
ORDER BY ROW_NUMBER() OVER(PARTITION BY series_id ORDER BY idval DESC)
FETCH FIRST ROWS WITH TIES
Check the demo here.

Replace content in 'order' column with sequential numbers

I have an 'order' column in a table in a postgres database that has a lot of missing numbers in the sequence. I am having a problem figuring out how to replace the numbers currently in the column, with new ones that are incremental (see examples).
What I have:
id order name
---------------
1 50 Anna
2 13 John
3 2 Bruce
4 5 David
What I want:
id order name
---------------
1 4 Anna
2 3 John
3 1 Bruce
4 2 David
The row containing the lowest order number in the old version of the column should get the new order number '1', the next after that should get '2' etc.
You can use the window function row_number() to calculate the new numbers. The result of that can be used in an update statement:
update the_table
set "order" = t.rn
from (
select id, row_number() over (order by "order") as rn
from the_table
) t
where t.id = the_table.id;
This assumes that id is the primary key of that table.

SQL - how to sum groups of 15 rows and find the max sum

The purpose of this question is to optimize some SQL by using set-based operations vs iterative (looping, like I'm doing below):
Some Explanation -
I have this cte that is inserted to a temp table #dataForPeak. Each row represents a minute, and a respective value retrieved.
For every row, my code uses a while loop to add 15 rows at a time (the current row + the next 14 rows). These sums are inserted into another temp table #PeakDemandIntervals, which is my workaround for then finding the max sum of these groups of 15.
I've bolded my end goal above. My code achieves this but in about 12 seconds for 26k rows. I'll be looking at much more data, so I know this is not enough for my use case.
My question is,
can anyone help me find a fast alternative to this loop?
It can include more tables, CTEs, nested queries, whatever. The while loop might not even be the issue, it's probably the inner code.
insert into #dataForPeak
select timestamp, value
from cte
order by timestamp;
while ##ROWCOUNT<>0
begin
declare #timestamp datetime = (select top 1 timestamp from #dataForPeak);
insert into #PeakDemandIntervals
select #timestamp, sum(interval.value) as peak
from (select * from #dataForPeak base
where base.timestamp >= #timestamp
and base.timestamp < DATEADD(minute,14,#timestamp)
) interval;
delete from #dataForPeak where timestamp = #timestamp;
end
select max(peak)
from #PeakDemandIntervals;
Edit
Here's an example of my goal, using groups of 3min instead of 15min.
Given the data:
Time | Value
1:50 | 2
1:51 | 4
1:52 | 6
1:53 | 8
1:54 | 6
1:55 | 4
1:56 | 2
the max sum (peak) I'm looking for is 20, because the group
1:52 | 6
1:53 | 8
1:54 | 6
has the highest sum.
Let me know if I need to clarify more than that.
Based on the example given it seems like you are trying to get the maximum value of a rolling sum. You can calculate the 15-minute rolling sum very easily as follow:
SELECT [Time]
,[Value]
,SUM([Value]) OVER (ORDER BY [Time] ASC ROWS 14 PRECEDING) [RollingSum]
FROM #dataForPeak
Note the key here is the ROWS 14 PRECEDING statement. It effectively state that SQL Server should sum the preceding 14 records with the current record which will give you your 15 minute interval.
Now you can simply max the result of the rolling sum. The full query will look as follow:
;WITH CTE_RollingSum
AS
(
SELECT [Time]
,[Value]
,SUM([Value]) OVER (ORDER BY [Time] ASC ROWS 14 PRECEDING) [RollingSum]
FROM #dataForPeak
)
SELECT MAX([RollingSum]) AS Peak
FROM CTE_RollingSum

PGSQL duplicate record in same column

i have a table and i want to know where duplicate records are present for same columns. These are my columns and i want to get record where group_id or week are different for same code and fweek and newcode
Id newcode fweek code group_id week
1 343001 2016-01 343 100 8
2 343002 2016-01 343 100 8
3 343001 2016-01 343 101 08
Required record is
Id newcode fweek code group_id week
3 343001 2016-01 343 101 08
To find the duplicate values i have joined the table with itself.
and we need to group the results with code,fweek and newcode to get more than one duplicate rows if they exist. i have used max() to get last inserted row.
you don't need to use is distinct from (it is same for inequality + NULL). if you don't want to compare NULL ones, use <> operator.
You find more information about here info
select r.*
from your_table r
where r.id in (select max(r.id)
from your_table r
join your_table r2 on r2.code = r.code and r2.fweek = r.fweek and r2.newcode = r.newcode
where
r2.group_id is distinct from r.group_id or
r2.week is distinct from r.week
group by r.code,
r.fweek,
r.newcode
having count(*) > 1)

Postgresql difference between rows

My data:
id value
1 10
1 20
1 60
2 10
3 10
3 30
How to compute column 'change'?
id value change | my comment, how to compute
1 10 10 | 20-10
1 20 40 | 60-20
1 60 40 | default_value-60. In this example default_value=100
2 10 90 | default_value-10
3 10 20 | 30-10
3 30 70 | default_value-30
In other words: if row of id is last, then compute 100-value,
else compute next_value-value_now
You can access the value of the "next" (or "previous") row using a window function. The concept of a "next" row only makes sense if you have a column to define an order on the rows. You said you have a date column on which you can order the result. I used the column name your_date_column for this. You need to replace that with the actual column name of course.
select id,
value,
lead(value, 1, 100) over (partition by id order by your_date_column) - value as change
from the_table
order by id, your_date_column
lead(value, 1, 100) says: take the column value of the "next" row (that's the 1). If there is no such row, use the default value 100 instead.
Join on a subquery and use ROW_NUMBER to find the last value per group
WITH CTE AS(
SELECT id,value,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) rn,
(LEAD(value) OVER (PARTITION BY id ORDER BY date)-value) change FROM t)
SELECT cte.id,cte.value,
(CASE WHEN cte.change IS NULL THEN 100-cte.value ELSE cte.change END)as change FROM cte LEFT JOIN
(SELECT id,MAX(rn) mrn FROM cte
GROUP BY id) as x
ON x.mrn=cte.rn AND cte.id=x.id
FIDDLE