Select top 5 and bottom 5 columns from a list of columns based on their values

Select top 5 and bottom 5 columns from a list of columns based on their values - tsql

I have a requirement where I need to Select top 5 and bottom 5 columns from a list of columns based on their values.
If more than 1 column has same value then select any one from them.
Eg
CREATE TABLE #b(Company VARCHAR(10),A1 INt,A2 INt,A3 INt,A4 INt,B1 INt,G1 INt,G2 INt,G3 INt,HH5 INt,SS6 INt)
INSERT INTo #b
SELECT 'test_A',8,10,6,10,0,6,0,6,13,4 UNION ALL
SELECT 'test_B',17,7,0,1,3,18,0,6,9,5 UNION ALL
SELECT 'test_C',0,0,6,1,2,6,3,4,3,2 UNION ALL
SELECt 'test_D',13,1,4,1,4,1,9,0,0,5
SELECT * FROM #b
Desired Output:
Company
Top5
Bottom5
test_A
HH5,A2,A1,A3,SS6
B1,SS6,A3,A1,A2
test_B
G1,A1,HH5,A2,G3
A3,A4,B1,SS6,G3
I am able to find the top values but not the column names.
Here is I am stuck at, I am able to find the max scores but not sure how to find the column that holds this max value.
SELECT Company,(
SELECT MAX(myval)
FROM (VALUES (A1),(A2),(A3),(A4),(B1),(G1),(G2),(G3),(HH5)) AS temp(myval))
AS MaxOfColumns
FROM #b

As Larnu suggested, the first step would be to UNPIVOT the data into a form like (Company, ColumnName, Value). You can then use the ROW_NUMBER() window function to assign ordinals 1 - 10 to each value for each company based on the sorted value.
Next, you can wrap the above in a Common Table Expression (CTE) to feed a query that, for each Company, uses conditional aggregation with the STRING_AGG() to selectively combine the top 5 and bottom 5 column names to produce the desired result.
Something like:
;WITH Data AS (
SELECT
Company,
ColumnName,
Value,
ROW_NUMBER() OVER(PARTITION BY Company ORDER BY Value DESC, ColumnName) AS Ord
FROM #b
UNPIVOT (
Value FOR ColumnName IN (A1, A2, A3, A4, B1, G1, G2, G3, HH5, SS6)
) U
)
SELECT
D.Company,
STRING_AGG(CASE WHEN D.Ord BETWEEN 1 AND 5 THEN D.ColumnName END, ', ')
WITHIN GROUP (ORDER BY D.ORD) AS Top5,
STRING_AGG(CASE WHEN D.Ord BETWEEN 6 AND 10 THEN D.ColumnName END, ', ')
WITHIN GROUP (ORDER BY D.ORD) AS Bottom5
FROM Data D
GROUP BY D.Company
ORDER BY D.Company
For older SQL Server versions that don't support STRING_AGG(), the FOR XML PATH(''),TYPE construct can be used to concatenate text. The .value('text()[1]', 'varchar(max)') function is then used to safely extract the result from the XML, and finally the STUFF() function is used to strip out the leading separator (comma-space).
;WITH Data AS (
SELECT
Company,
ColumnName,
Value,
ROW_NUMBER() OVER(PARTITION BY Company ORDER BY Value DESC, ColumnName) AS Ord
FROM #b
UNPIVOT (
Value FOR ColumnName IN (A1, A2, A3, A4, B1, G1, G2, G3, HH5, SS6)
) U
)
SELECT B.Company, C.Top5, C.Bottom5
FROM #b B
CROSS APPLY (
SELECT
STUFF((
SELECT ', ' + D.ColumnName
FROM Data D
WHERE D.Company = B.Company
AND D.Ord BETWEEN 1 AND 5
ORDER BY D.ORD
FOR XML PATH(''),TYPE
).value('text()[1]', 'varchar(max)'), 1, 2, '') AS Top5,
STUFF((
SELECT ', ' + D.ColumnName
FROM Data D
WHERE D.Company = B.Company
AND D.Ord BETWEEN 6 AND 10
ORDER BY D.ORD
FOR XML PATH(''),TYPE
).value('text()[1]', 'varchar(max)'), 1, 2, '') AS Bottom5
) C
ORDER BY B.Company
See this db<>fiddle fr a demo.
If you also want lists of the top 5 and bottom 5 values, you can repeat the aggregations above while substituting CONVERT(VARCHAR, D.Value) for D.ColumnName where appropriate.

Related

TSQL - in a string, replace a character with a fixed one every 2 characters

I can't replace every 2 characters of a string with a '.'
select STUFF('abcdefghi', 3, 1, '.') c3,STUFF('abcdefghi', 5, 1,
'.') c5,STUFF('abcdefghi', 7, 1, '.') c7,STUFF('abcdefghi', 9, 1, '.')
c9
if I use STUFF I should subsequently overlap the strings c3, c5, c7 and c9. but I can't find a method
can you help me?
initial string:
abcdefghi
the result I would like is
ab.de.gh.
the string can be up to 50 characters

Create a numbers / tally / digits table, if you don't have one already, then you can use this to target each character position:
with digits as ( /* This would be a real table, here it's just to test */
select n from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))x(n)
), t as (
select 'abcdefghi' as s
)
select String_Agg( case when d.n%3 = 0 then '.' else Substring(t.s, d.n, 1) end, '')
from t
cross apply digits d
where d.n <Len(t.s)
Using for xml with existing table
with digits as (
select n from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))x(n)
),
r as (
select t.id, case when d.n%3=0 then '.' else Substring(t.s, d.n, 1) end ch
from t
cross apply digits d
where d.n <Len(t.s)
)
select result=(select '' + ch
from r r2
where r2.id=r.id
for xml path('')
)
from r
group by r.id

You can try it like this:
Easiest might be a quirky update ike here:
DECLARE #string VARCHAR(100)='abcdefghijklmnopqrstuvwxyz';
SELECT #string = STUFF(#string,3*A.pos,1,'.')
FROM (SELECT TOP(LEN(#string)/3) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM master..spt_values) A(pos);
SELECT #string;
Better/Cleaner/Prettier was a recursive CTE:
We use a declared table to have some tabular sample data
DECLARE #tbl TABLE(ID INT IDENTITY, SomeString VARCHAR(200));
INSERT INTO #tbl VALUES('')
,('a')
,('ab')
,('abc')
,('abcd')
,('abcde')
,('abcdefghijklmnopqrstuvwxyz');
--the query
WITH recCTE AS
(
SELECT ID
,SomeString
,(LEN(SomeString)+1)/3 AS CountDots
,1 AS OccuranceOfDot
,SUBSTRING(SomeString,4,LEN(SomeString)) AS RestString
,CAST(LEFT(SomeString,2) AS VARCHAR(MAX)) AS Growing
FROM #tbl
UNION ALL
SELECT t.ID
,r.SomeString
,r.CountDots
,r.OccuranceOfDot+2
,SUBSTRING(RestString,4,LEN(RestString))
,CONCAT(Growing,'.',LEFT(r.RestString,2))
FROM #tbl t
INNER JOIN recCTE r ON t.ID=r.ID
WHERE r.OccuranceOfDot/2<r.CountDots-1
)
SELECT TOP 1 WITH TIES ID,Growing
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY OccuranceOfDot DESC);
--the result
1
2 a
3 ab
4 ab
5 ab
6 ab.de
7 ab.de.gh.jk.mn.pq.st.vw.yz
The idea in short
We use a recursive CTE to walk along the string
we add the needed portion together with a dot
We stop, when the remaining length is to short to continue
a little magic is the ORDER BY ROW_NUMBER() OVER() together with TOP 1 WITH TIES. This will allow all first rows (frist per ID) to appear.

Parse Numeric Ranges in PostgreSQL

I would like to produce a string containing some parsed numeric ranges.
I have a table with some data
b_id,s_id
1,50
1,51
1,53
1,61
1,62
1,63
2,91
2,95
2,96
2,97
Using only SQL in PostgreSQL, how could I produce this output:
b_id,s_seqs
1,"50-51,53,61-63"
2,"91,95-97"
How on earth do I do that?

select b_id, string_agg(seq, ',' order by seq_no) as s_seqs
from (
select
b_id, seq_no,
replace(regexp_replace(string_agg(s_id::text, ','), ',.+,', '-'), ',', '-') seq
from (
select
b_id, s_id,
sum(mark) over w as seq_no
from (
select
b_id, s_id,
(s_id- 1 <> lag(s_id, 1, s_id) over w)::int as mark
from my_table
window w as (partition by b_id order by s_id)
) s
window w as (partition by b_id order by s_id)
) s
group by 1, 2
) s
group by 1;
Here you can find a step-by-step analyse from the innermost query towards the outside.

Converting two Text/Numerical data to Ranges

I'm attempting to convert data from two columns (one with text and one with numbers) to a range.
I've searched and unable to find something that works for this needed solution:
Table:
ColumnA Nvarchar(50)
ColumnB Int
Table Sample:
ColumnA ColumnB
AA 1
AA 2
AA 3
AA 4
AA 5
AB 1
AB 2
AB 3
AB 4
Desired Output:
AA:1-5, AB:1-4
Any help would be greatly appreciated

Note I am assuming the reason you're asking the question is that you can have broken ranges and you're not simply looking for the min/max ColumnB for each ColumnA.
If you ask me, this type of thing is probably best handled in code on either an intermediate layer or directly in your presentation layer. Sort the rows by (ColumnA, ColumnB) in your query, then you can get the desired results in a single pass as you read rows - by comparing the current values with the previous row, and outputting a row when either ColumnA changes or ColumnB is not adjacent.
However, if you're bent on doing this in SQL, you can use a recursive CTE. The basic premise would be to correlate each row with an adjacent row and hold on to the beginning value of ColumnB as you proceed. An adjacent row is defined as a row with the same value of ColumnA and the next value of ColumnB (i.e. the previous row + 1).
Something like the following ought to do:
;with cte as (
select a.ColumnA, a.ColumnB, a.ColumnB as rangeStart
from myTable a
where not exists ( --make sure we don't keep 'intermediate rows' as start rows
select 1
from myTable b
where b.ColumnA = a.ColumnA
and b.ColumnB = a.ColumnB - 1
)
union all
select a.ColumnA, b.ColumnB, a.rangeStart
from cte a
join myTable b on a.ColumnA = b.ColumnA
and b.ColumnB = a.ColumnB + 1 --correlate with 'next' row
)
select ColumnA, rangeStart, max(ColumnB) as rangeEnd
from cte
group by ColumnA, rangeStart
And given your sample data, indeed it does.
And for kicks, here is another Fiddle with data having gaps in ColumnB.

Note the group by clause for the continuous values by doing some math.
DECLARE #Data table (ColumnA Nvarchar(50), ColumnB Int)
INSERT #Data VALUES
('AA', 1),
('AA', 2),
('AA', 3),
--('AA', 4),
('AA', 5),
('AB', 1),
('AB', 2),
('AB', 3),
('AB', 4)
;WITH Ordered AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY ColumnA ORDER BY ColumnB) AS Seq,
*
FROM #Data
)
SELECT
ColumnA,
CASE
WHEN 1 = 0 THEN ''
-- if the ColumnA only has 1 row, the display is 1-1? or just 1?
--WHEN MIN(ColumnB) = MAX(ColumnB) THEN CONVERT(varchar(10), MIN(ColumnB))
ELSE CONVERT(varchar(10), MIN(ColumnB)) + '-' + CONVERT(varchar(10), MAX(ColumnB))
END AS Range
FROM Ordered
GROUP BY
ColumnA,
ColumnB - Seq -- The math
ORDER BY ColumnA, MIN(ColumnB)
SQL Fiddle

Dealing with periods and dates without using cursors

I would like to solve this issue avoiding to use cursors (FETCH).
Here comes the problem...
1st Table/quantity
------------------
periodid periodstart periodend quantity
1 2010/10/01 2010/10/15 5
2st Table/sold items
-----------------------
periodid periodstart periodend solditems
14343 2010/10/05 2010/10/06 2
Now I would like to get the following view or just query result
Table Table/stock
-----------------------
periodstart periodend itemsinstock
2010/10/01 2010/10/04 5
2010/10/05 2010/10/06 3
2010/10/07 2010/10/15 5
It seems impossible to solve this problem without using cursors, or without using single dates instead of periods.
I would appreciate any help.
Thanks

DECLARE #t1 TABLE (periodid INT,periodstart DATE,periodend DATE,quantity INT)
DECLARE #t2 TABLE (periodid INT,periodstart DATE,periodend DATE,solditems INT)
INSERT INTO #t1 VALUES(1,'2010-10-01T00:00:00.000','2010-10-15T00:00:00.000',5)
INSERT INTO #t2 VALUES(14343,'2010-10-05T00:00:00.000','2010-10-06T00:00:00.000',2)
DECLARE #D1 DATE
SELECT #D1 = MIN(P) FROM (SELECT MIN(periodstart) P FROM #t1
UNION ALL
SELECT MIN(periodstart) FROM #t2) D
DECLARE #D2 DATE
SELECT #D2 = MAX(P) FROM (SELECT MAX(periodend) P FROM #t1
UNION ALL
SELECT MAX(periodend) FROM #t2) D
;WITH
L0 AS (SELECT 1 AS c UNION ALL SELECT 1),
L1 AS (SELECT 1 AS c FROM L0 A CROSS JOIN L0 B),
L2 AS (SELECT 1 AS c FROM L1 A CROSS JOIN L1 B),
L3 AS (SELECT 1 AS c FROM L2 A CROSS JOIN L2 B),
L4 AS (SELECT 1 AS c FROM L3 A CROSS JOIN L3 B),
Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS i FROM L4),
Dates AS(SELECT DATEADD(DAY,i-1,#D1) AS D FROM Nums where i <= 1+DATEDIFF(DAY,#D1,#D2)) ,
Stock As (
SELECT D ,t1.quantity - ISNULL(t2.solditems,0) AS itemsinstock
FROM Dates
LEFT OUTER JOIN #t1 t1 ON t1.periodend >= D and t1.periodstart <= D
LEFT OUTER JOIN #t2 t2 ON t2.periodend >= D and t2.periodstart <= D ),
NStock As (
select D,itemsinstock, ROW_NUMBER() over (order by D) - ROW_NUMBER() over (partition by itemsinstock order by D) AS G
from Stock)
SELECT MIN(D) AS periodstart, MAX(D) AS periodend, itemsinstock
FROM NStock
GROUP BY G, itemsinstock
ORDER BY periodstart

Hopefully a little easier to read than Martin's. I used different tables and sample data, hopefully extrapolating the right info:
CREATE TABLE [dbo].[Quantity](
[PeriodStart] [date] NOT NULL,
[PeriodEnd] [date] NOT NULL,
[Quantity] [int] NOT NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[SoldItems](
[PeriodStart] [date] NOT NULL,
[PeriodEnd] [date] NOT NULL,
[SoldItems] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO Quantity (PeriodStart,PeriodEnd,Quantity)
SELECT '20100101','20100115',5
INSERT INTO SoldItems (PeriodStart,PeriodEnd,SoldItems)
SELECT '20100105','20100107',2 union all
SELECT '20100106','20100108',1
The actual query is now:
;WITH Dates as (
select PeriodStart as DateVal from SoldItems union select PeriodEnd from SoldItems union select PeriodStart from Quantity union select PeriodEnd from Quantity
), Periods as (
select d1.DateVal as StartDate, d2.DateVal as EndDate
from Dates d1 inner join Dates d2 on d1.DateVal < d2.DateVal left join Dates d3 on d1.DateVal < d3.DateVal and d3.DateVal < d2.DateVal where d3.DateVal is null
), QuantitiesSold as (
select StartDate,EndDate,COALESCE(SUM(si.SoldItems),0) as Quantity
from Periods p left join SoldItems si on p.StartDate < si.PeriodEnd and si.PeriodStart < p.EndDate
group by StartDate,EndDate
)
select StartDate,EndDate,q.Quantity - qs.Quantity
from QuantitiesSold qs inner join Quantity q on qs.StartDate < q.PeriodEnd and q.PeriodStart < qs.EndDate
And the result is:
StartDate EndDate (No column name)
2010-01-01 2010-01-05 5
2010-01-05 2010-01-06 3
2010-01-06 2010-01-07 2
2010-01-07 2010-01-08 4
2010-01-08 2010-01-15 5
Explanation: I'm using three Common Table Expressions. The first (Dates) is gathering all of the dates that we're talking about, from the two tables involved. The second (Periods) selects consecutive values from the Dates CTE. And the third (QuantitiesSold) then finds items in the SoldItems table that overlap these periods, and adds their totals together. All that remains in the outer select is to subtract these quantities from the total quantity stored in the Quantity Table

John, what you could do is a WHILE loop. Declare and initialise 2 variables before your loop, one being the start date and the other being end date. Your loop would then look like this:
WHILE(#StartEnd <= #EndDate)
BEGIN
--processing goes here
SET #StartEnd = #StartEnd + 1
END
You would need to store your period definitions in another table, so you could retrieve those and output rows when required to a temporary table.
Let me know if you need any more detailed examples, or if I've got the wrong end of the stick!

Damien,
I am trying to fully understand your solution and test it on a large scale of data, but I receive following errors for your code.
Msg 102, Level 15, State 1, Line 20
Incorrect syntax near 'Dates'.
Msg 102, Level 15, State 1, Line 22
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Line 25
Incorrect syntax near ','.

Damien,
Based on your solution I also wanted to get a neat display for StockItems without overlapping dates. How about this solution?
CREATE TABLE [dbo].[SoldItems](
[PeriodStart] [datetime] NOT NULL,
[PeriodEnd] [datetime] NOT NULL,
[SoldItems] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO SoldItems (PeriodStart,PeriodEnd,SoldItems)
SELECT '20100105','20100106',2 union all
SELECT '20100105','20100108',3 union all
SELECT '20100115','20100116',1 union all
SELECT '20100101','20100120',10
;WITH Dates as (
select PeriodStart as DateVal from SoldItems
union
select PeriodEnd from SoldItems
union
select PeriodStart from Quantity
union
select PeriodEnd from Quantity
), Periods as (
select d1.DateVal as StartDate, d2.DateVal as EndDate
from Dates d1
inner join Dates d2 on d1.DateVal < d2.DateVal
left join Dates d3 on d1.DateVal < d3.DateVal and
d3.DateVal < d2.DateVal where d3.DateVal is null
), QuantitiesSold as (
select StartDate,EndDate,SUM(si.SoldItems) as Quantity
from Periods p left join SoldItems si on p.StartDate < si.PeriodEnd and si.PeriodStart < p.EndDate
group by StartDate,EndDate
)
select StartDate,EndDate, qs.Quantity
from QuantitiesSold qs
where qs.quantity is not null

Query to get row from one table, else random row from another

tblUserProfile - I have a table which holds all the Profile Info (too many fields)
tblMonthlyProfiles - Another table which has just the ProfileID in it (the idea is that this table holds 2 profileids which sometimes become monthly profiles (on selection))
Now when I need to show monthly profiles, I simply do a select from this tblMonthlyProfiles and Join with tblUserProfile to get all valid info.
If there are no rows in tblMonthlyProfile, then monthly profile section is not displayed.
Now the requirement is to ALWAYS show Monthly Profiles. If there are no rows in monthlyProfiles, it should pick up 2 random profiles from tblUserProfile. If there is only one row in monthlyProfiles, it should pick up only one random row from tblUserProfile.
What is the best way to do all this in one single query ?
I thought something like this
select top 2 * from tblUserProfile P
LEFT OUTER JOIN tblMonthlyProfiles M
on M.profileid = P.profileid
ORder by NEWID()
But this always gives me 2 random rows from tblProfile. How can I solve this ?

Try something like this:
SELECT TOP 2 Field1, Field2, Field3, FinalOrder FROM
(
select top 2 Field1, Field2, Field3, FinalOrder, '1' As FinalOrder from tblUserProfile P JOIN tblMonthlyProfiles M on M.profileid = P.profileid
UNION
select top 2 Field1, Field2, Field3, FinalOrder, '2' AS FinalOrder from tblUserProfile P LEFT OUTER JOIN tblMonthlyProfiles M on M.profileid = P.profileid ORDER BY NEWID()
)
ORDER BY FinalOrder
The idea being to pick two monthly profiles (if that many exist) and then 2 random profiles (as you correctly did) and then UNION them. You'll have between 2 and 4 records at that point. Grab the top two. FinalOrder column is an easy way to make sure that you try and get the monthly's first.
If you have control of the table structure, you might save yourself some trouble by simply adding a boolean field IsMonthlyProfile to the UserProfile table. Then it's a single table query, order by IsBoolean, NewID()

In SQL 2000+ compliant syntax you could do something like:
Select ...
From (
Select TOP 2 ...
From tblUserProfile As UP
Where Not Exists( Select 1 From tblMonthlyProfile As MP1 )
Order By NewId()
) As RandomProfile
Union All
Select MP....
From tblUserProfile As UP
Join tblMonthlyProfile As MP
On MP.ProfileId = UP.ProfileId
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) >= 1
Union All
Select ...
From (
Select TOP 1 ...
From tblUserProfile As UP
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) = 1
Order By NewId()
) As RandomProfile
Using SQL 2005+ CTE you can do:
With
TwoRandomProfiles As
(
Select TOP 2 ..., ROW_NUMBER() OVER ( ORDER BY UP.ProfileID ) As Num
From tblUserProfile As UP
Order By NewId()
)
Select MP.Col1, ...
From tblUserProfile As UP
Join tblMonthlyProfile As MP
On MP.ProfileId = UP.ProfileId
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) >= 1
Union All
Select ...
From TwoRandomProfiles
Where Not Exists( Select 1 From tblMonthlyProfile As MP1 )
Union All
Select ...
From TwoRandomProfiles
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) = 1
And Num = 1
The CTE has the advantage of only querying for the random profiles once and the use of the ROW_NUMBER() column.
Obviously, in all the UNION statements the number and type of the columns must match.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse