SQL Renumbering index after group by - tsql

I have the following input table:
Seq  Group  GroupSequence
1    0
2    4      A
3    4      B
4    4      C
5    0
6    6      A
7    6      B
8    0
Output table is:
Line  NewSeq  GroupSequence
1     1
2     2       A
3     2       B
4     2       C
5     3
6     4       A
7     4       B
8     5
The rules for the input table are:
Any positive integer in the Group column indicates that rows sharing that value are grouped together. The field may be NULL or blank. A NULL or 0 indicates that the row is processed on its own. In the example above there are two groups and three 'single' rows.
The GroupSequence column is a single character that sorts rows within a group. NULL, blank, 'A', 'B', 'C', and 'D' are the only values allowed.
If Group holds a positive integer, there must be an alphabetic character in GroupSequence.
I need a query that creates the output table with a new column that sequences as shown.
External apps need to iterate through this table in either Line or NewSeq order (same order, different values).
I've tried variations on GROUP BY, PARTITION BY, OVER(), etc. with no success.
Any help much appreciated.

Perhaps this will help
The only trick here is Flg, which flags the first row of each new group sequence (its value is 1 or 0). Then it is a small matter of summing Flg via a window function.
Edit - Updated Flg method
Example
Declare @YourTable Table ([Seq] int,[Group] int,[GroupSequence] varchar(50))
Insert Into @YourTable Values
(1,0,null)
,(2,4,'A')
,(3,4,'B')
,(4,4,'C')
,(5,0,null)
,(6,6,'A')
,(7,6,'B')
,(8,0,null)
Select Line = Row_Number() over (Order by Seq)
      ,NewSeq = Sum(Flg) over (Order by Seq)
      ,GroupSequence
 From (
       Select *
             ,Flg = case when [Group] = lag([Group],1) over (Order by Seq) then 0 else 1 end
        From @YourTable
      ) A
 Order By Line
Returns
Line  NewSeq  GroupSequence
1     1       NULL
2     2       A
3     2       B
4     2       C
5     3       NULL
6     4       A
7     4       B
8     5       NULL
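One caveat, in case it matters for real data: Flg only compares [Group] to the previous row's value, so two adjacent stand-alone rows that both carry Group = 0 would be folded into a single NewSeq. A sketch of a defensive variant, under the assumption that 0 or NULL always means "process alone":

Select Line = Row_Number() over (Order by Seq)
      ,NewSeq = Sum(Flg) over (Order by Seq)
      ,GroupSequence
 From (
       Select *
             ,Flg = case when IsNull([Group],0) = 0 then 1  -- stand-alone rows always start a new sequence
                         when [Group] = lag([Group],1) over (Order by Seq) then 0
                         else 1 end
        From @YourTable
      ) A
 Order By Line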

Related

PySpark subtract last row from first row in a group

I want to use a window function that partitions by ID and subtracts the first row of each group from the last row, creating a separate column with the output. What is the cleanest way to achieve that result?
ID col1
1 1
1 2
1 4
2 1
2 1
2 6
3 5
3 5
3 7
Desired output:
ID col1 col2
1 1 3
1 2 3
1 4 3
2 1 5
2 1 5
2 6 5
3 5 2
3 5 2
3 7 2
Code below:
from pyspark.sql import Window
from pyspark.sql.functions import first, last

w = Window.partitionBy('ID').orderBy('col1').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
df.withColumn('col2', last('col1').over(w) - first('col1').over(w)).show()
Sounds like you’re defining the “first” row as the row with the minimum value of col1 in the group, and the “last” row as the row with maximum value of col1 in the group. To compute them, you can use the MIN and MAX window functions:
SELECT
  ID,
  col1,
  MAX(col1) OVER (PARTITION BY ID) - MIN(col1) OVER (PARTITION BY ID) AS col2
FROM
  ...
If you’re defining “first” and “last” row somehow differently (e.g., in terms of some timestamp), you can use the more general FIRST_VALUE and LAST_VALUE window functions:
SELECT
  ID,
  col1,
  LAST_VALUE(col1) OVER (PARTITION BY ID ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    - FIRST_VALUE(col1) OVER (PARTITION BY ID ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    AS col2
FROM
  ...
The two snippets above are equivalent, but the latter is more general: you can specify ordering by a different column and/or you can modify the window specification.
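For a quick end-to-end check, the first query can be run against the sample data inlined with Spark SQL's VALUES syntax (a sketch; the table alias t is made up):

SELECT
  ID,
  col1,
  MAX(col1) OVER (PARTITION BY ID) - MIN(col1) OVER (PARTITION BY ID) AS col2
FROM VALUES
  (1, 1), (1, 2), (1, 4),
  (2, 1), (2, 1), (2, 6),
  (3, 5), (3, 5), (3, 7) AS t(ID, col1)

This should reproduce the desired output above.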

Postgresql: only keep unique values from integer array

Let's say I have an array of integers
1 6 6 3 3 8 4 4
It will always be of the form: n pairs of numbers, plus 2 unique numbers.
Is there an efficient way of keeping only the 2 unique values (i.e. the 2 with a single occurrence)?
Here, I would like to get 1 and 8.
So far this is what I have:
SELECT node_id
FROM
  ( SELECT node_id, COUNT(*)
    FROM unnest(array[1, 6, 6, 3, 3, 8, 4, 4]) AS node_id
    GROUP BY node_id
  ) foo
ORDER BY count LIMIT 2;
You are very close, I think:
SELECT node_id
FROM (SELECT node_id, COUNT(*)
      FROM unnest(array[1, 6, 6, 3, 3, 8, 4, 4]) AS node_id
      GROUP BY node_id
      HAVING count(*) = 1
     ) foo;
You can group these back into an array, if you like, using array_agg().
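For instance (a sketch wrapping the same query; array_agg's element order is arbitrary unless you add ORDER BY inside it):

SELECT array_agg(node_id) AS unique_values
FROM (SELECT node_id
      FROM unnest(array[1, 6, 6, 3, 3, 8, 4, 4]) AS node_id
      GROUP BY node_id
      HAVING count(*) = 1
     ) foo;
-- returns {1,8}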

combining results of CTEs

I have several CTEs. CTE1A counts the number of type A shops in area 1, CTE1B counts the number of type B shops in area 1, and so on up to CTE1D. Similarly, CTE2B counts the number of type B shops in area 2, and so on. The shop_types CTE selects all types of shops: A, B, C, D. How do I display a table that shows, for each area (column), how many shops of each type there are (rows)?
For example:
   1  2  3  4  5
A  0  7  4  0  0
B  2  3  8  2  9
C  8  5  8  1  6
D  7  1  5  4  3
Database has 2 tables:
Table regions: shop_id, region_id
Table shops: shop_id, shop_type
WITH
shop_types AS (
    SELECT DISTINCT shops.shop_type AS type
    FROM shops
    WHERE shops.shop_type != '-9999'
      AND shops.shop_type != 'Other'
),
cte1A AS (
    SELECT regions.region_id, COUNT(regions.shop_id) AS shops_number, shops.shop_type
    FROM regions
    RIGHT JOIN shops ON shops.shop_id = regions.shop_id
    WHERE regions.region_id = 1
      AND shops.shop_type = 'A'
    GROUP BY shops.shop_type, regions.region_id
)
SELECT * FROM cte1A
I'm not entirely sure I understand what you are after, but it seems you are looking for something like this:
select sh.shop_type,
       count(case when r.region_id = 1 then 1 end) as region_1_count,
       count(case when r.region_id = 2 then 1 end) as region_2_count,
       count(case when r.region_id = 3 then 1 end) as region_3_count
from shops sh
  left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;
You need to add one case statement for each region you want to have in the output.
If you are using Postgres 9.4, you can replace the case expressions with a FILTER condition, which makes the intention a bit easier to understand (I think):
count(*) filter (where r.region_id = 1) as region_1_count,
count(*) filter (where r.region_id = 2) as region_2_count,
...
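Assembled, the FILTER version of the full query would look like this (same schema assumptions as the query above):

select sh.shop_type,
       count(*) filter (where r.region_id = 1) as region_1_count,
       count(*) filter (where r.region_id = 2) as region_2_count,
       count(*) filter (where r.region_id = 3) as region_3_count
from shops sh
  left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;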
SQLFiddle: http://sqlfiddle.com/#!1/98391/1
And before you ask: no, you can't make the number of columns "dynamic" based on a select statement. The column list for a query must be defined before the statement is actually executed.

How to number rows with a repeating 1,2,3,4, 1,2,3,4,... series

How can I add a repeating series of length 4 to a table like this:
Source table:
id
1
2
3
4
5
6
7
8
Results table:
id series
1 1
2 2
3 3
4 4
5 1
6 2
7 3
8 4
I'm using PostgreSQL 9.1.
If your IDs are really consecutive and gapless, you can just use (id - 1) % 4 + 1. But I imagine that in reality your IDs aren't so orderly, and if they're generated from a SEQUENCE you shouldn't rely on them being gapless.
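As a query, that shortcut would be (valid only under the gapless assumption):

SELECT id, (id - 1) % 4 + 1 AS series
FROM Table1
ORDER BY id;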
You can do it properly with row_number(), as shown here: http://sqlfiddle.com/#!12/22767/5
SELECT
  id,
  (row_number() OVER (ORDER BY id) - 1) % 4 + 1 AS series
FROM Table1
ORDER BY 1;
It's also possible to do this using generate_series as a set-returning function in the SELECT list, but that's a PostgreSQL extension, whereas the above is standard SQL that'll work in any modern database except MySQL, which doesn't support window functions.
If you want to actually add a column to the table it gets a bit more complicated. I'm not really sure why you'd want to do that, but it's possible using UPDATE ... FROM:
BEGIN;

ALTER TABLE table1 ADD COLUMN col2 INTEGER;

WITH gen_num(id, n) AS (
    SELECT id,
           (row_number() OVER (ORDER BY id) - 1) % 4 + 1
    FROM Table1
)
UPDATE table1 SET col2 = n
FROM gen_num
WHERE gen_num.id = table1.id;

COMMIT;
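A quick sanity check afterwards (against the same table) is to read the new column back:

SELECT id, col2 FROM table1 ORDER BY id;

col2 should cycle 1, 2, 3, 4 in id order.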

T-SQL table variable data order

I have a UDF which returns a table variable, like:
--
--
RETURNS @ElementTable TABLE
(
ElementID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
ElementValue VARCHAR(MAX)
)
AS
--
--
Is the order of data in this table variable guaranteed to be the same as the order in which the data was inserted? E.g. if I issue
INSERT INTO @ElementTable(ElementValue) VALUES ('1')
INSERT INTO @ElementTable(ElementValue) VALUES ('2')
INSERT INTO @ElementTable(ElementValue) VALUES ('3')
I expect the data will always be returned in that order when I say
select ElementValue from @ElementTable -- Here I don't use order by
EDIT:
If the order is not guaranteed, then the following query
SELECT T1.ElementValue, T2.ElementValue FROM dbo.MyFunc() T1
Cross Apply dbo.MyFunc() T2
order by t1.elementid
will not consistently produce this 9-row matrix:
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
Is there any possibility that it could be like
1 2
1 1
1 3
2 3
2 2
2 1
3 1
3 2
3 3
How can I do it using my function above?
No, the order is not guaranteed to be the same.
Unless, of course, you are using ORDER BY; then it is guaranteed to be the same.
Given your update, you obtain it in the obvious way: you ask the system to give you the results in the order you want:
SELECT T1.ElementValue,T2.ElementValue FROM dbo.MyFunc() T1
Cross join dbo.MyFunc() T2
order by t1.elementid, t2.elementid
You are guaranteed that, if you use (inefficient) single-row inserts within your UDF, the IDENTITY values will match the order in which the individual INSERT statements were specified.
The order is not guaranteed.
But if all you want is simply to get your records back in the same order you inserted them, then just order by your primary key. Since you already have that field set up as an auto-increment, it should suffice.
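For example (a sketch against the function above):

SELECT ElementValue FROM dbo.MyFunc() ORDER BY ElementID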
...or use a deterministic function
SELECT TOP 9
M1 = (ROW_NUMBER() OVER(ORDER BY id) + 2) / 3,
M2 = (ROW_NUMBER() OVER(ORDER BY id) + 2) % 3 + 1
FROM sysobjects
Returns
M1 M2
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3