T-SQL generate sequence from string and count - tsql

I need to generate a sequence starting from a CSV string and a maximum count.
When the sequence is exhausted, I need to start it again and continue until the number of rows reaches the COUNT variable.
I have the following CSV:
A,B,C,D
In order to get 4 rows out of this CSV I am using XML and the following statement:
DECLARE @xml_csv XML;
SET @xml_csv = N'<root><r>' + REPLACE('A, B, C, D', ',', '</r><r>') + '</r></root>';
SELECT
    REPLACE(t.value('.', 'varchar(max)'), ' ', '') AS [delimited items]
FROM
    @xml_csv.nodes('//root/r') AS a(t);
Now my SELECT returns the following output:
|-------------|
| A |
| B |
| C |
| D |
Assuming I have a @count variable set to 9, I need to output the following:
|--|-----------|
|1 |A |
|2 |B |
|3 |C |
|4 |D |
|5 |A |
|6 |B |
|7 |C |
|8 |D |
|9 |A |
I tried to join a table called master..[spt_values], but for COUNT = 10 I get 10 rows for A, 10 for B and so on, whereas I need the sequence in order and repeated until COUNT rows are produced.

Basically you are on the correct path. Joining the split result with a numbers table will get you the correct output.
I've chosen a different function for splitting the CSV data, since it also uses a numbers table for the split (taken from this great article).
First, if you don't already have a numbers table, create one. Here is the script used in the article I've linked to:
SET NOCOUNT ON;
DECLARE @UpperLimit INT = 1000;
WITH n AS
(
SELECT
x = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
CROSS JOIN sys.all_objects AS s3
)
SELECT Number = x
INTO dbo.Numbers
FROM n
WHERE x BETWEEN 1 AND @UpperLimit;
GO
CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers(Number)
WITH (DATA_COMPRESSION = PAGE);
GO
Then, create the split function:
CREATE FUNCTION dbo.SplitStrings_Numbers
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
FROM dbo.Numbers
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
);
GO
Next step: Join the split results with the numbers table:
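DECLARE @Csv varchar(20) = 'A,B,C,D';
-- Each item is paired with every number; sorting by Number repeats the list.
-- Rows that share a Number have no guaranteed order among themselves.
SELECT TOP 10 Item
FROM dbo.SplitStrings_Numbers(@Csv, ',')
CROSS JOIN dbo.Numbers
ORDER BY Number;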
Output:
Item
----
A
B
C
D
A
B
C
D
A
B
Great thanks to Aaron Bertrand for sharing his knowledge.
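The intended result can also be checked outside the database. The Python sketch below is not part of the answer; it only models the expected output by mapping each position 1..count to the item at index (position - 1) modulo the number of items. The function name cycle_items is made up for illustration.

```python
def cycle_items(csv, count):
    """Return (position, item) pairs, repeating the CSV items until count rows exist."""
    # Split and trim, mirroring the REPLACE(..., ' ', '') used in the question.
    items = [part.strip() for part in csv.split(",")]
    # Position p (1-based) maps to the item at index (p - 1) mod len(items).
    return [(p, items[(p - 1) % len(items)]) for p in range(1, count + 1)]


rows = cycle_items("A, B, C, D", 9)
for position, item in rows:
    print(position, item)
```

A T-SQL version of the same modulo idea would need each item's ordinal position, which the split function shown here does not return.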

Related

Efficient way to retrieve all values from a column that start with other values from the same column in PostgreSQL

For the sake of simplicity, suppose you have a table with numbers like:
| number |
----------
|123 |
|1234 |
|12345 |
|123456 |
|111 |
|1111 |
|2 |
|700 |
What would be an efficient way of retrieving the shortest numbers (call them roots or whatever) and all values derived from them, e.g.:
| root | derivatives |
--------------------------------
| 123 | 1234, 12345, 123456 |
| 111 | 1111 |
Numbers 2 & 700 are excluded from the list because they're unique, and thus have no derivatives.
An output as the above would be ideal, but since it's probably difficult to achieve, the next best thing would be something like below, which I can then post-process:
| root | derivative |
-----------------------
| 123 | 1234 |
| 123 | 12345 |
| 123 | 123456 |
| 111 | 1111 |
My naive initial attempt to at least identify roots (see below) has been running for 4h now with a dataset of ~500k items, but the real one I'd have to inspect consists of millions.
select number
from numbers n1
where exists(
select number
from numbers n2
where n2.number <> n1.number
and n2.number like n1.number || '_%'
);
This works if number is an integer or bigint:
select min(a.number) as root, b.number as derivative
from nums a
cross join lateral generate_series(1, 18) as gs(power)
join nums b
on b.number / (10^gs.power)::bigint = a.number
group by b.number
order by root, derivative;
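To see what the integer division is doing, here is a small Python sketch of the same idea, assuming non-negative integers. It is only a model of the query's logic, not a replacement for it, and the name roots_by_truncation is invented for this example.

```python
def roots_by_truncation(numbers):
    """Pair each derivative with its smallest root by dropping trailing digits."""
    present = set(numbers)
    pairs = []
    for b in numbers:
        candidates = []
        power = 1
        # b // 10**power drops the last `power` digits, like the integer
        # division in the query; stop once no digits are left.
        while b // 10 ** power > 0:
            prefix = b // 10 ** power
            if prefix in present:
                candidates.append(prefix)
            power += 1
        if candidates:
            # min() plays the role of min(a.number) ... group by b.number.
            pairs.append((min(candidates), b))
    return sorted(pairs)


sample = [123, 1234, 12345, 123456, 111, 1111, 2, 700]
pairs = roots_by_truncation(sample)
for root, derivative in pairs:
    print(root, derivative)
```

Each value needs at most one set lookup per digit, which is why this approach avoids comparing every row with every other row.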
EDIT: I moved a non-working query to the bottom. It fails for reasons outlined by @Morfic in the comments.
We can do a similar and simpler join using like for character types:
select min(a.number) as root, b.number as derivative
from numchar a
join numchar b on b.number like a.number||'%'
and b.number != a.number
group by b.number
order by root, derivative;
Updated fiddle.
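For character data, the same result can be modelled in a few lines of Python. This is a sketch for checking expectations against the sample data, not the query itself; roots_by_prefix is a hypothetical name. It also builds the grouped form (one root with its list of derivatives) that the question calls ideal.

```python
def roots_by_prefix(values):
    """Pair each value with the shortest other value that is a prefix of it."""
    present = set(values)
    pairs = []
    for value in values:
        # Try proper prefixes from shortest to longest; the first hit is the root.
        for length in range(1, len(value)):
            prefix = value[:length]
            if prefix in present:
                pairs.append((prefix, value))
                break
    return sorted(pairs)


sample = ["123", "1234", "12345", "123456", "111", "1111", "2", "700"]
pairs = roots_by_prefix(sample)

# Collapse the pairs into root -> list of derivatives.
grouped = {}
for root, derivative in pairs:
    grouped.setdefault(root, []).append(derivative)

for root, derivatives in grouped.items():
    print(root, ", ".join(derivatives))
```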
Faulty Solution Follows
If number is a character type, then try this:
with groupings as (
select number,
case
when number like (lag(number) over (order by number))||'%' then 0
else 1
end as newgroup
from numchar
), groupnums as (
select number, sum(newgroup) over (order by number) as groupnum
from groupings
), matches as (
select min(number) over (partition by groupnum) as root,
number as derivative
from groupnums
)
select *
from matches
where root != derivative;
There should be only a single sort on groupnum in this execution since the column is your table's primary key.
db<>fiddle here

How to split string in PostgreSQL to make combination with another string

I have data like below
Id | Data |Parent Id
----------------------------------------------------------------------------------
1 | IceCream # Chocolate # SoftDrink |0
2 | Amul,Havemore#Cadbary,Nestle#Pepsi |1
3 | Party#Wedding |0
I want to split this data into the format below, where row 2 depends on row 1. I have added ParentId, which is used to find the dependency.
IceCream | Amul | Party
IceCream | Havemore | Party
IceCream | Amul | Wedding
IceCream | Havemore | Wedding
Chocolate | Cadbary | Party
Chocolate | Nestle | Party
Chocolate | Cadbary | Wedding
Chocolate | Nestle | Wedding
SoftDrink | Pepsi | Party
SoftDrink | Pepsi | Wedding
I have used unnest(string_to_array) to split the string, but I am unable to loop through the parts to build these combinations.
This is very "unstable", like sitting on a knife edge, and could easily fall apart. It depends on assigning a number to each delimited value and then joining on those numbers. Maybe those flags that are known to you (but unfortunately not to us) can stabilize it. But it does match your indicated expectations. It uses the function regexp_split_to_table rather than unnest to split on the delimiters.
with base (num, list) as
( values (1,'IceCream#Chocolate#SoftDrink')
, (2,'Amul,Havemore#Cadbary,Nestle#Pepsi')
, (3,'Party#Wedding')
)
, product as
(select p, row_number() over() pn
from (
select regexp_split_to_table(list,'#') p
from base
where num=1
) x
)
, maker as
(select regexp_split_to_table(m, ',') m, row_number() over() mn
from (
select regexp_split_to_table(list,'#') m
from base
where num=2
) y
)
, event as
( select regexp_split_to_table(regexp_split_to_table(list,'#'), ',') e
from base
where num=3
)
select p as product
, m as maker
, e as event
from (product join maker on pn = mn) cross join event e
order by pn, e, m;
Hope it helps.
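The pairing the query relies on (the n-th product goes with the n-th group of makers, and every such pair is crossed with every event) can be sketched in Python to check the expected rows. This is an illustration under that positional assumption; the function name combine is invented here.

```python
from itertools import product as cartesian


def combine(products_raw, makers_raw, events_raw):
    """Pair products with maker groups by position, then cross with all events."""
    products = [p.strip() for p in products_raw.split("#")]
    # Each '#'-separated group holds the comma-separated makers of one product.
    maker_groups = [[m.strip() for m in group.split(",")]
                    for group in makers_raw.split("#")]
    events = [e.strip() for e in events_raw.split("#")]

    rows = []
    for prod, makers in zip(products, maker_groups):
        # Events vary slowest, makers fastest, matching "order by pn, e, m".
        for event, maker in cartesian(events, makers):
            rows.append((prod, maker, event))
    return rows


rows = combine("IceCream # Chocolate # SoftDrink",
               "Amul,Havemore#Cadbary,Nestle#Pepsi",
               "Party#Wedding")
for row in rows:
    print(" | ".join(row))
```

If the number of products and maker groups ever differ, zip silently drops the extras, which is the same fragility the answer warns about.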

Left Join two tables - dont include the joins where second table has more than 1 row for value from first table; rejects

As the title says, I want to reject rows so that I do not create duplicates.
The first step is not to join on values that have more than one row in the second table.
Here is an example if needed:
Table a:
aa |bb |
---|----|
1 |111 |
2 |222 |
Table h:
hh |kk |
---|----|
1 |111 |
2 |111 |
3 |222 |
Using Normal Left join:
SELECT
*
FROM a
LEFT JOIN h
ON a.bb = h.kk
;
I get:
aa |bb |hh |kk |
---|----|---|----|
1 |111 |1 |111 |
1 |111 |2 |111 |
2 |222 |3 |222 |
I want to get rid of first two rows, where aa = 1.
...
The second step would be another query, probably with some CASE, where I keep only those rows of table a that have more than one matching row in table h.
Therefore I want to create table c, which will contain:
aa |bb |
---|----|
1 |111 |
Can someone help me please?
Thank you.
To get only the 1:1 joins:
SELECT a.aa, a.bb, MIN(h.hh) AS hh
FROM a
LEFT JOIN h ON a.bb = h.kk
GROUP BY a.aa, a.bb
HAVING COUNT(h.kk) = 1;
To get only the 1:n joins:
SELECT a.aa, a.bb
FROM a
LEFT JOIN h ON a.bb = h.kk
GROUP BY a.aa, a.bb
HAVING COUNT(h.kk) > 1;
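The question does not say which database is in use, so here is a hedged way to try the idea: a Python script that loads the sample tables into an in-memory SQLite database (a stand-in, not necessarily the asker's engine) and runs a variant of the two queries that groups by every selected column of a, which stricter engines require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (aa INTEGER, bb INTEGER);
    CREATE TABLE h (hh INTEGER, kk INTEGER);
    INSERT INTO a VALUES (1, 111), (2, 222);
    INSERT INTO h VALUES (1, 111), (2, 111), (3, 222);
""")

# Rows of a with exactly one match in h (the 1:1 joins).
one_to_one = conn.execute("""
    SELECT a.aa, a.bb, MIN(h.hh) AS hh
    FROM a
    LEFT JOIN h ON a.bb = h.kk
    GROUP BY a.aa, a.bb
    HAVING COUNT(h.kk) = 1
""").fetchall()

# Rows of a with more than one match in h (the rejects, i.e. table c).
one_to_many = conn.execute("""
    SELECT a.aa, a.bb
    FROM a
    LEFT JOIN h ON a.bb = h.kk
    GROUP BY a.aa, a.bb
    HAVING COUNT(h.kk) > 1
""").fetchall()

print(one_to_one)   # [(2, 222, 3)]
print(one_to_many)  # [(1, 111)]
conn.close()
```

COUNT(h.kk) ignores the NULLs produced by unmatched left-join rows, so rows of a without any match in h fall into neither result.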

Postgresql select, show fixed count rows

Simple question. I have a table "tablename" with 3 rows. I need my select to show 5 rows when the row count is < 5.
select * from tablename
+------------------+
|colname1 |colname2|
+---------+--------+
|1 |AAA |
|2 |BBB |
|3 |CCC |
+---------+--------+
This query shows all rows in the table.
But I need to show 5 rows, 2 of which are empty.
For example (I need):
+------------------+
|colname1 |colname2|
+---------+--------+
|1 |AAA |
|2 |BBB |
|3 |CCC |
| | |
| | |
+---------+--------+
The last 2 rows are empty.
Is it possible?
Something like this:
with num_rows (rn) as (
select i
from generate_series(1,5) i -- adjust here the desired number of rows
), numbered_table as (
select colname1,
colname2,
row_number() over (order by colname1) as rn
from tablename
)
select t.colname1, t.colname2
from num_rows r
left outer join numbered_table t on r.rn = t.rn
order by r.rn; -- keeps the empty rows at the end
This assigns a number for each row in tablename and joins that to a fixed number of rows. If you know that your values in colname1 are always sequential and without gaps (which is highly unlikely) then you can remove the generation of row numbers in the second CTE using row_number().
If you don't care which rows are returned, you can leave out the order by inside row_number() over (...), but then which rows are matched is arbitrary. Leaving out that order by will be a bit more efficient.
The above will always return exactly 5 rows, regardless of how many rows tablename contains. If you want at least 5 rows (every row of tablename, padded with empty rows when there are fewer than 5), use a full outer join instead:
....
select t.colname1, t.colname2
from numbered_table t
full outer join num_rows r on r.rn = t.rn
order by coalesce(t.rn, r.rn);
SQLFiddle example: http://sqlfiddle.com/#!15/e5770/3
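The difference between "exactly 5 rows" and "at least 5 rows" is easy to get wrong, so here is a small Python model of the two behaviours. It is only a sketch for checking expectations; exactly_n and at_least_n are hypothetical helper names, and empty rows are represented as tuples of None.

```python
def exactly_n(rows, n, width=2):
    """Always return n rows: truncate extra rows, pad missing ones with blanks."""
    blank = (None,) * width
    return rows[:n] + [blank] * max(0, n - len(rows))


def at_least_n(rows, n, width=2):
    """Return every row, padded with blanks only when there are fewer than n."""
    blank = (None,) * width
    return rows + [blank] * max(0, n - len(rows))


table = [(1, "AAA"), (2, "BBB"), (3, "CCC")]
for row in exactly_n(table, 5):
    print(row)
```

With the 3-row sample both helpers give the same 5 rows; they only differ once the table holds more than 5 rows.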

How to eliminate repeated field with GROUP BY clause?

I have 3 tables called:
1.app_tenant pk:id, fk:pasar_id
---+--------+-----------+
id | nama | pasar_id |
----+--------+-----------+
1 | joe | 1 |
2 | adi | 2 |
3 | adam | 3 |
2.app_pasar pk:id
----+------------- +
id | nama |
----+------------- +
1 | kosambi |
2 | gede bage |
3 | pasar minggu |
3.app_kios pk:id, fk:tenant_id
----+---------------+----------
id | nama |tenant_id
----+-------------- +----------
1 | kios1 |1
2 | kios2 |2
3 | kios3 |3
4 | kios4 |1
5 | kios5 |1
6 | kios6 |2
7 | kios7 |2
8 | kios8 |3
9 | kios9 |3
Then, with a LEFT JOIN query and grouping by id in every table, I want to display data like this:
----+---------------+------------+-----------
id | nama_tenant |nama_pasar |nama_kios
----+-------------- +------------------------
1 | joe |kosambi |kios 1
2 | adi |gede bage |kios 2
3 | adam |pasar minggu|kios 3
But after I execute this query, the data is not shown as expected. The problem is redundancy in the nama_tenant field. How can I eliminate repeated nama_tenant records?
This is my query:
select a.id,a.nama as nama_tenant,
b.nama as nama_pasar,
c.nama as nama_kios
from app_tenant a
left join app_pasar b on a.id=b.id
left join app_kios c on a.id= c.tenant_id
group by
a.id,
b.id,
c.id
Table definitions:
CREATE TABLE app_tenant (
id serial PRIMARY KEY,
nama character varying,
pasar_id integer);
CREATE TABLE app_kios (
id serial PRIMARY KEY,
nama character varying,
tenant_id integer REFERENCES app_tenant);
The problem is that tenants can have multiple kiosks. From your sample data it looks like you want to display the first kiosk of every tenant (although "first" is a vague concept on strings, here I use alphabetical sort order). Your query would be like this:
SELECT t.id, t.nama AS nama_tenant, p.nama AS nama_pasar, k.nama AS nama_kios
FROM app_tenant t
LEFT JOIN app_pasar p ON p.id = t.pasar_id
LEFT JOIN (
    SELECT tenant_id, nama,
           rank() OVER (PARTITION BY tenant_id ORDER BY nama) AS rnk
    FROM app_kios
) k ON k.tenant_id = t.id AND k.rnk = 1 -- a window alias cannot be filtered in its own WHERE
ORDER BY t.id;
The sub-query on app_kios uses a window function to get the first kiosk name after sorting the names of the kiosk for each tenant.
I would also suggest using meaningful aliases for table names instead of simply a, b, c.
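To check what "first kiosk per tenant" should produce for the sample data, the logic can be modelled in Python. This is a sketch using the rows from the question, with alphabetical order standing in for "first" as the answer assumes; the variable names are chosen for this example only.

```python
tenants = [(1, "joe", 1), (2, "adi", 2), (3, "adam", 3)]       # id, nama, pasar_id
pasar = {1: "kosambi", 2: "gede bage", 3: "pasar minggu"}      # id -> nama
kios = [(1, "kios1", 1), (2, "kios2", 2), (3, "kios3", 3),
        (4, "kios4", 1), (5, "kios5", 1), (6, "kios6", 2),
        (7, "kios7", 2), (8, "kios8", 3), (9, "kios9", 3)]     # id, nama, tenant_id

# Keep the alphabetically first kiosk name per tenant (the rank = 1 row).
first_kios = {}
for _kios_id, nama, tenant_id in kios:
    if tenant_id not in first_kios or nama < first_kios[tenant_id]:
        first_kios[tenant_id] = nama

# Emulate the two left joins: missing matches become None.
result = [(tid, nama, pasar.get(pasar_id), first_kios.get(tid))
          for tid, nama, pasar_id in sorted(tenants)]

for row in result:
    print(row)
```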