Add ORDINALITY to expanded JSON array in Postgres 11.7

I'm taking two JSONB arrays, unpacking them, and combining the results. I'm trying to add WITH ORDINALITY to the JSON array unpacking, but I've been unable to figure out how. For some reason, I can't find WITH ORDINALITY in the documentation for Postgres 11's JSON tools:
https://www.postgresql.org/docs/11/functions-json.html
I've seen examples using jsonb_array_elements ... WITH ORDINALITY, but haven't been able to get it to work. First, a working example based on Postgres arrays:
WITH
first AS (
SELECT * FROM
UNNEST (ARRAY['Charles','Jane','George','Percy']) WITH ORDINALITY AS x(name_, index)
),
last AS (
SELECT * FROM
UNNEST (ARRAY['Dickens','Austen','Eliot']) WITH ORDINALITY AS y(name_, index)
)
SELECT first.name_ AS first_name,
last.name_ AS last_name
FROM first
JOIN last ON (last.index = first.index)
This gives the desired output:
first_name last_name
Charles Dickens
Jane Austen
George Eliot
I'm using the ORDINALITY index to make the JOIN, as I'm combining two lists for pair-wise comparison. I can assume my lists are equally sized.
However, my input is going to be a JSON array, not a Postgres array. I've got the unpacking working with jsonb_to_recordset, but have not got the ordinality generation working. Here's a sample that does the unpacking part correctly:
DROP FUNCTION IF EXISTS tools.try_ordinality (jsonb, jsonb);
CREATE OR REPLACE FUNCTION tools.try_ordinality (
base_jsonb_in jsonb,
comparison_jsonb_in jsonb)
RETURNS TABLE (
base_text citext,
base_id citext,
comparison_text citext,
comparison_id citext)
AS $BODY$
BEGIN
RETURN QUERY
WITH
base_expanded AS (
select *
from jsonb_to_recordset (
base_jsonb_in)
AS base_unpacked (text citext, id citext)
),
comparison_expanded AS (
select *
from jsonb_to_recordset (
comparison_jsonb_in)
AS comparison_unpacked (text citext, id citext)
),
combined_lists AS (
select base_expanded.text AS base_text,
base_expanded.id AS base_id,
comparison_expanded.text AS comparison_text,
comparison_expanded.id AS comparison_id
from base_expanded,
comparison_expanded
)
select *
from combined_lists;
END
$BODY$
LANGUAGE plpgsql;
select * from try_ordinality (
'[
{"text":"Fuzzy Green Bunny","id":"1"},
{"text":"Small Gray Turtle","id":"2"}
]',
'[
{"text":"Red Large Special","id":"3"},
{"text":"Blue Small","id":"4"},
{"text":"Green Medium Special","id":"5"}
]'
);
But that's a CROSS JOIN:
base_text base_id comparison_text comparison_id
Fuzzy Green Bunny 1 Red Large Special 3
Fuzzy Green Bunny 1 Blue Small 4
Fuzzy Green Bunny 1 Green Medium Special 5
Small Gray Turtle 2 Red Large Special 3
Small Gray Turtle 2 Blue Small 4
Small Gray Turtle 2 Green Medium Special 5
I'm after a pair-wise result with only two rows:
Fuzzy Green Bunny 1 Red Large Special 3
Small Gray Turtle 2 Blue Small 4
I've tried switching to jsonb_array_elements, as in this snippet:
WITH
base_expanded AS (
select *
from jsonb_array_elements (
base_jsonb_in)
AS base_unpacked (text citext, id citext)
),
I get back
ERROR: a column definition list is only allowed for functions returning "record"
Is there a straightforward way to get ordinality on an unpacked JSON array? It's very easy with UNNEST on a Postgres array.
I'm happy to learn I've screwed up the syntax.
I can CREATE TYPE, if it's of any help.
I can convert to a Postgres array, if that's straightforward to do.
Thanks for any suggestions.

You do it exactly the same way.
with first as (
select *
from jsonb_array_elements('[
{"text":"Fuzzy Green Bunny","id":"1"},
{"text":"Small Gray Turtle","id":"2"}
]'::jsonb) with ordinality as f(element, idx)
), last as (
select *
from jsonb_array_elements('[
{"text":"Red Large Special","id":"3"},
{"text":"Blue Small","id":"4"},
{"text":"Green Medium Special","id":"5"}
]'::jsonb) with ordinality as f(element, idx)
)
SELECT first.element ->> 'text' AS first_name,
last.element ->> 'text' AS last_name
FROM first
JOIN last ON last.idx = first.idx
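If you would rather keep the typed columns from jsonb_to_recordset, the same idea seems to work when the call is wrapped in ROWS FROM (...), which lets the column definition list and WITH ORDINALITY coexist. A minimal sketch, assuming plain text columns instead of citext and using the sample arrays from the question; the aliases b, c and idx are only illustrative:

```sql
WITH base_expanded AS (
    SELECT *
    FROM ROWS FROM (
        -- the column definition list belongs to the function call inside ROWS FROM
        jsonb_to_recordset('[
            {"text":"Fuzzy Green Bunny","id":"1"},
            {"text":"Small Gray Turtle","id":"2"}
        ]'::jsonb) AS (text text, id text)
    ) WITH ORDINALITY AS b(base_text, base_id, idx)
),
comparison_expanded AS (
    SELECT *
    FROM ROWS FROM (
        jsonb_to_recordset('[
            {"text":"Red Large Special","id":"3"},
            {"text":"Blue Small","id":"4"},
            {"text":"Green Medium Special","id":"5"}
        ]'::jsonb) AS (text text, id text)
    ) WITH ORDINALITY AS c(comparison_text, comparison_id, idx)
)
-- pair the rows by their position in the original arrays
SELECT b.base_text, b.base_id, c.comparison_text, c.comparison_id
FROM base_expanded AS b
JOIN comparison_expanded AS c USING (idx)
ORDER BY idx;
```

Inside the function from the question, the two literals would be replaced by base_jsonb_in and comparison_jsonb_in.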

Related

TSQL Comparing sets of similar and repeating items

I have a table of orders waiting to be fulfilled and a table of returned orders. There is only one product but you can order in different quantity packs. My job is to pair these orders with returns but the return order must match exactly with the current order in terms of the number of packs ordered and quantity in each pack. So matching on the number of packs ordered is no issue but matching up the quantities is giving me a headache. My orders and returns are pipe delimited fields. An order/return of 3 packs of 30 each will look like "30|30|30". An order of 3 packs, 2 of 15 and 1 of 30 will look like "15|15|30", "15|30|15", or "30|15|15". An order of "15|15|30" can be paired up with a return of "15|30|15" as they are the same. I know I need to parse out the items in the fields into a table first. But how do I compare them?
I have gone through all the examples here: TSQL Comparing two Sets
The INTERSECT and CROSS JOIN examples don't work when there are duplicates in the set: (a,a,b) and (b,a,a) do not match.
The FULL JOIN example doesn't work when two sets have different quantities of the same elements: (a,a,c) and (a,c,c) are reported as a match.
So my thoughts at this point are to maybe parse into table, sort, reassemble back into ordered piped string and compare the 2 strings. That would work but is it the best way to do this?
Editing to add - I cannot change the data model. SQL Server 2017
Sample data (records 2 and 3 would be a match):
declare @comp table(
OrderNo int,
OrderPackCount int,
OrderTtlPieces int,
OrderQtys varchar (50),
ReturnNo int,
RtnPackCount int,
RtnTtlPieces int,
RtnQtys varchar(50))
insert into @comp values
(55500, 2, 100, '50|50|', 401, 2, 100, '75|25|'),
(55501, 2, 60, '20|40|', 404, 2, 60, '40|20|'),
(55504, 3, 75, '15|30|30|', 385, 7, 75, '30|15|30|'),
(55508, 3, 90, '30|30|30|', 422, 7, 75, '50|30|10|')

Couple options to try.
Here's an example of splitting the values, reordering, putting them back to do the compare. Since you mention SQL2017 we can use STRING_SPLIT and then use STRING_AGG with WITHIN GROUP ( ORDER BY <order_by_expression_list> [ ASC | DESC ] ) to reorder and concatenate the values.
SELECT [comp].*
FROM @comp [comp]
CROSS APPLY (
SELECT STRING_AGG([value], '|') WITHIN GROUP(ORDER BY [value]) AS [OrdStr]
FROM STRING_SPLIT([comp].[OrderQtys], '|')
WHERE [value] <> ''
) AS [ord]
CROSS APPLY (
SELECT STRING_AGG([value], '|') WITHIN GROUP(ORDER BY [value]) AS [RtnStr]
FROM STRING_SPLIT([comp].[RtnQtys], '|')
WHERE [value] <> ''
) AS [Rnt]
WHERE [ord].[OrdStr] = [Rnt].[RtnStr];
Another option is to identify the rows that do not match and then remove them with EXCEPT. Split the values, group them to get a count per value, then OUTER APPLY the return values that are equal and have the same count; any row for which no such return value is found does not match. EXCEPT then returns the rows that are not in that result.
SELECT *
FROM @comp
EXCEPT
SELECT [comp].*
FROM @comp [comp]
OUTER APPLY (
SELECT [value] AS [OrdValue]
, COUNT(*) AS [OrdCnt]
FROM STRING_SPLIT([comp].[OrderQtys], '|')
WHERE [value] <> ''
GROUP BY [value]
) AS [ord]
OUTER APPLY (
SELECT [value] AS [RntValue]
, COUNT(*) AS [RntCnt]
FROM STRING_SPLIT([comp].[RtnQtys], '|')
WHERE [value] <> ''
AND [value] = [ord].[OrdValue]
GROUP BY [value]
HAVING COUNT(*) = [ord].[OrdCnt]
) AS [Rnt]
WHERE [Rnt].[RntValue] IS NULL;
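The second query only checks that every order quantity has a return quantity with the same count; a return that contains an additional, different quantity would go unnoticed unless the pack counts are compared as well. A possible symmetric variant is sketched below on a cut-down copy of the sample data. The table variable holds only the four columns needed here, and the aliases o, r and d are made up for illustration. It compares the per-value counts in both directions with a FULL JOIN:

```sql
-- Cut-down sample data: only the columns needed for the comparison.
DECLARE @comp TABLE (
    OrderNo   int,
    OrderQtys varchar(50),
    ReturnNo  int,
    RtnQtys   varchar(50));

INSERT INTO @comp VALUES
    (55500, '50|50|',    401, '75|25|'),
    (55501, '20|40|',    404, '40|20|'),
    (55504, '15|30|30|', 385, '30|15|30|'),
    (55508, '30|30|30|', 422, '50|30|10|');

SELECT c.*
FROM @comp AS c
CROSS APPLY (
    -- Count the quantities whose number of occurrences differs between the two lists.
    SELECT COUNT(*) AS mismatches
    FROM (
        SELECT [value], COUNT(*) AS cnt
        FROM STRING_SPLIT(c.OrderQtys, '|')
        WHERE [value] <> ''
        GROUP BY [value]
    ) AS o
    FULL JOIN (
        SELECT [value], COUNT(*) AS cnt
        FROM STRING_SPLIT(c.RtnQtys, '|')
        WHERE [value] <> ''
        GROUP BY [value]
    ) AS r ON r.[value] = o.[value]
    -- a NULL count means the quantity is missing on that side
    WHERE o.cnt IS NULL OR r.cnt IS NULL OR o.cnt <> r.cnt
) AS d
WHERE d.mismatches = 0;
```

With this data, orders 55501 and 55504 should come back, which is what the question expects.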

How to aggregate all the resulted rows column data in one column?

I have a CASE-driven query. Below is its simplest form:
select Column 1 from mytable
Results :
Column 1
latinnametest
LatinManual
LatinAuto
Is it possible to show the aggregated Column 1 values of all the result rows, comma separated, in another column (say Column 5) on each row?
Expected :
Column 1 Column 2
latinnametest latinnametest,LatinManual,LatinAuto
LatinManual latinnametest,LatinManual,LatinAuto
LatinAuto latinnametest,LatinManual,LatinAuto
I have tried array_agg and concat(), but they only aggregate the data of the same row into Column 2, rather than combining the values of all rows into a comma-separated list as expected. Any help, please.
Edit :
I have tried the solution mentioned below, but I am getting repeated data in the aggregated column (visible when hovering the mouse over the last column of the result). Is there a solution to this?

You can use string_agg() as a window function:
select column_1,
string_agg(column_1, ',') over () as all_values
from the_table;
Edit, after the scope was changed:
If you need distinct values, use a derived table:
select column_1,
string_agg(column_1, ',') over () as all_values
from (
select distinct column_1
from the_table
) t;
Alternatively with a common table expression:
with vals as (
select string_agg(distinct column_1, ',') as all_values
from the_table
)
select t.column_1, v.all_values
from the_table t
cross join vals v
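Note that none of these queries fixes the order of the values inside the aggregated string. If that matters, an ORDER BY can be placed inside the aggregate in the CTE variant. A small self-contained sketch, assuming the sample values from the question plus one duplicate row added for illustration:

```sql
WITH the_table(column_1) AS (
    -- sample rows from the question, with one duplicate added on purpose
    VALUES ('latinnametest'), ('LatinManual'), ('LatinAuto'), ('LatinAuto')
),
vals AS (
    -- DISTINCT removes the duplicate, ORDER BY makes the list order deterministic
    SELECT string_agg(DISTINCT column_1, ',' ORDER BY column_1) AS all_values
    FROM the_table
)
SELECT DISTINCT t.column_1, v.all_values
FROM the_table AS t
CROSS JOIN vals AS v;
```

The DISTINCT in the outer SELECT is only there to collapse the duplicated source row; drop it if every source row should appear in the output.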

Postgres - Repeating an element N times as array

For example, where the element is 'hi', and where N is 3, I need a PostgreSQL snippet I can use in a SELECT query that returns the following array:
['hi', 'hi', 'hi']

Postgres provides array_fill for this purpose, e.g.:
SELECT array_fill('hi'::text, '{3}');
SELECT array_fill('hi'::text, array[3]);
The two examples are equivalent but the 2nd form is more convenient if you wish to replace the dimension 3 with a variable.
See also: https://www.postgresql.org/docs/current/functions-array.html
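To illustrate the second form with a count that is not hard-coded, here is a small sketch that takes the element and the count from a row; the names v, elem and n are made up for the example:

```sql
-- elem and n stand in for whatever columns or parameters hold the real values
SELECT array_fill(v.elem, ARRAY[v.n]) AS repeated
FROM (VALUES ('hi'::text, 3)) AS v(elem, n);
-- repeated: {hi,hi,hi}
```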

You may use array_agg with generate_series
select array_agg(s) from ( values('hi')) as t(s) cross join generate_series(1,3)
Generic
select array_agg(s) from ( values(:elem)) as t(s) cross join generate_series(1,:n)

sql demo
with cte as (
select 'hi' as rep_word, generate_series(1, 3) as value
) -- ^^^ n = 3
select array(SELECT rep_word::text from cte);

Converting a table with a key and comment field into a key and row for every word in the column field

I have a table with unstructured data that I am trying to analyze in order to build a relational lookup. I do not have access to word cloud software.
I really have no idea how to solve this problem. Searching for solutions has led me to tools that might do this for me but cost money, not to coded solutions.
Basically my data looks like this:
CK1 CK2 Comment
--------------------------------------------------------------
1 A This is a comment.
2 A Another comment here.
And this is what I need to create:
CK1 CK2 Words
--------------------------------------------------------------
1 A This
1 A is
1 A a
1 A comment.
2 A Another
2 A comment
2 A here.

What you are trying to do is tokenize a string using a space as the delimiter. In the SQL world, functions that do this are often called "splitters". The potential pitfall of using a splitter for this type of thing is that words can be separated by multiple spaces, tabs, CHAR(10)s, CHAR(13)s, and so on. Poor punctuation, such as a missing space after a period, results in this:
" End of sentence.Next sentence"
sentence.Next is returned as a word.
The way I like to tokenize human text is to:
1) Replace any character that isn't a letter with a space
2) Replace duplicate spaces
3) Trim the string
4) Split the newly transformed string using a space as the delimiter
Below is my solution followed by the DDL to create the functions used.
-- Sample Data
DECLARE @yourtable TABLE (CK1 INT, CK2 CHAR(1), Comment VARCHAR(8000));
INSERT @yourtable (CK1, CK2, Comment)
VALUES
(1,'A','This is a typical comment...Follewed by another...'),
(2,'A','This comment has double spaces and tabs and even carriage
returns!');
-- Solution
SELECT t.CK1, t.CK2, split.itemNumber, split.itemIndex, split.itemLength, split.item
FROM @yourtable AS t
CROSS APPLY samd.patReplace8K(t.Comment,'[^a-zA-Z ]',' ') AS c1
CROSS APPLY dbo.RemoveDupChar8K(c1.newString,' ') AS c2
CROSS APPLY samd.delimitedSplitAB8K(LTRIM(RTRIM(c2.NewString)),' ') AS split;
Results (truncated for brevity):
CK1 CK2 itemNumber itemIndex itemLength item
----------- ---- -------------------- ----------- ----------- --------------
1 A 1 1 4 This
1 A 2 6 2 is
1 A 3 9 1 a
1 A 4 11 7 typical
1 A 5 19 7 comment
...
2 A 1 1 4 This
2 A 2 6 7 comment
2 A 3 14 3 has
2 A 4 18 6 double
...
Note that the splitter I'm using is based on Jeff Moden's Delimited Split8K with a couple of tweaks.
Functions used:
CREATE FUNCTION dbo.rangeAB
(
@low bigint,
@high bigint,
@gap bigint,
@row1 bit
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH L1(N) AS
(
SELECT 1
FROM (VALUES
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0)) T(N) -- 90 values
),
L2(N) AS (SELECT 1 FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c),
iTally AS (SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM L2 a CROSS JOIN L2 b)
SELECT r.RN, r.OP, r.N1, r.N2
FROM
(
SELECT
RN = 0,
OP = (@high-@low)/@gap,
N1 = @low,
N2 = @gap+@low
WHERE @row1 = 0
UNION ALL -- COALESCE required in the TOP statement below for error handling purposes
SELECT TOP (ABS((COALESCE(@high,0)-COALESCE(@low,0))/COALESCE(@gap,0)+COALESCE(@row1,1)))
RN = i.rn,
OP = (@high-@low)/@gap+(2*@row1)-i.rn,
N1 = (i.rn-@row1)*@gap+@low,
N2 = (i.rn-(@row1-1))*@gap+@low
FROM iTally AS i
ORDER BY i.rn
) AS r
WHERE @high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0;
GO
CREATE FUNCTION samd.NGrams8k
(
@string VARCHAR(8000), -- Input string
@N INT -- requested token size
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
position = r.RN,
token = SUBSTRING(@string, CHECKSUM(r.RN), @N)
FROM dbo.rangeAB(1, LEN(@string)+1-@N,1,1) AS r
WHERE @N > 0 AND @N <= LEN(@string);
GO
CREATE FUNCTION samd.patReplace8K
(
@string VARCHAR(8000),
@pattern VARCHAR(50),
@replace VARCHAR(20)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newString =
(
SELECT CASE WHEN @string = CAST('' AS VARCHAR(8000)) THEN CAST('' AS VARCHAR(8000))
WHEN @pattern+@replace+@string IS NOT NULL THEN
CASE WHEN PATINDEX(@pattern,token COLLATE Latin1_General_BIN)=0
THEN ng.token ELSE @replace END END
FROM samd.NGrams8K(@string, 1) AS ng
ORDER BY ng.position
FOR XML PATH(''),TYPE
).value('text()[1]', 'VARCHAR(8000)');
GO
CREATE FUNCTION samd.delimitedSplitAB8K
(
@string VARCHAR(8000), -- input string
@delimiter CHAR(1) -- delimiter
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
itemNumber = ROW_NUMBER() OVER (ORDER BY d.p),
itemIndex = CHECKSUM(ISNULL(NULLIF(d.p+1, 0),1)),
itemLength = CHECKSUM(item.ln),
item = SUBSTRING(@string, d.p+1, item.ln)
FROM (VALUES (DATALENGTH(@string))) AS l(s) -- length of the string
CROSS APPLY
(
SELECT 0 UNION ALL -- for handling leading delimiters
SELECT ng.position
FROM samd.NGrams8K(@string, 1) AS ng
WHERE token = @delimiter
) AS d(p) -- delimiter.position
CROSS APPLY (VALUES( --LEAD(d.p, 1, l.s+l.d) OVER (ORDER BY d.p) - (d.p+l.d)
ISNULL(NULLIF(CHARINDEX(@delimiter,@string,d.p+1),0)-(d.p+1), l.s-d.p))) AS item(ln);
GO
CREATE FUNCTION dbo.RemoveDupChar8K(@string varchar(8000), @char char(1))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT NewString =
replace(replace(replace(replace(replace(replace(replace(
@string COLLATE LATIN1_GENERAL_BIN,
replicate(@char,33), @char), --33
replicate(@char,17), @char), --17
replicate(@char,9 ), @char), -- 9
replicate(@char,5 ), @char), -- 5
replicate(@char,3 ), @char), -- 3
replicate(@char,2 ), @char), -- 2
replicate(@char,2 ), @char); -- 2
GO

1) If we are using SQL Server 2016 or later, we should probably use the built-in function STRING_SPLIT:
-- SQL Server 2016 and above
DECLARE @txt NVARCHAR(100) = N'This is a comment.'
select [value] from STRING_SPLIT(@txt, ' ')
2) Only if option 1 does not fit, and the number of separators (spaces, in our case) is at most 3, which fits your sample data, we should probably use PARSENAME:
-- Before SQL Server 2016, if we have at most 4 parts
DECLARE @txt NVARCHAR(100) = N'This is a comment.'
DECLARE @Temp NVARCHAR(200) = REPLACE (@txt,'.','#')
SELECT t FROM (VALUES(1),(2),(3),(4))T1(n)
CROSS APPLY (SELECT REPLACE(PARSENAME(REPLACE(@Temp,' ','.'),T1.n), '#','.'))T2(t)
3) Only if options 1 and 2 do not fit should we use a SQLCLR function:
http://dataeducation.com/sqlclr-string-splitting-part-2-even-faster-even-more-scalable/
4) Only if we cannot use options 1 and 2 and cannot use SQLCLR (which implies a really problematic administration and has nothing to do with security, since you can keep all the SQLCLR functions in a read-only database for the use of all users, as I explain in my lectures) should you use T-SQL and create a UDF:
https://sqlperformance.com/2012/07/t-sql-queries/split-strings
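Applied to a table rather than a single variable, option 1 could look like the sketch below. It assumes a table variable shaped like the sample in the question (the name @yourtable is borrowed from the other answer) and filters out the empty strings that STRING_SPLIT returns for consecutive spaces. Note that STRING_SPLIT does not guarantee the order of the returned rows, and that punctuation stays attached to the words.

```sql
-- Sample data shaped like the table in the question.
DECLARE @yourtable TABLE (CK1 INT, CK2 CHAR(1), Comment VARCHAR(8000));

INSERT @yourtable (CK1, CK2, Comment)
VALUES (1, 'A', 'This is a comment.'),
       (2, 'A', 'Another comment here.');

-- One output row per word, keeping the key columns of the source row.
SELECT t.CK1, t.CK2, s.[value] AS Words
FROM @yourtable AS t
CROSS APPLY STRING_SPLIT(t.Comment, ' ') AS s
WHERE s.[value] <> '';   -- skip empty tokens produced by repeated spaces
```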

postgis advanced (?) selection query

The problem: for each building in my table that has, say, at least 2 pharmacies and 2 education centers within a radius of 1 km, I need to select all POIs (pharmacies, commercial centres, medical centers, education centers, police stations, fire stations) that are within 1 km of that building. Table structure:
building (id serial, name varchar )
poi_category(id serial, cname varchar) --cname being the category name of course
poi(id serial, name varchar, c_id integer)-- c_id is the FK referencing poi_category(id)
All coordinate columns are of type geometry, not geography (let's call them geom).
Here's the way I thought it should be done, but I'm not sure it's even correct, let alone the optimal solution to this problem:
SELECT r.id_b, r.id_p
FROM (
SELECT b.id AS id_b, p.id AS id_p, pc.id AS id_pc,pc.cname
FROM building AS b, poi AS p, poi_category AS pc
WHERE ST_DWithin(b.geom,p.geom, 1000) AND p.c_id=pc.id
) AS r,
(
SELECT * FROM r GROUP BY id_b
) AS r1
HAVING count (
SELECT *
FROM r, r1
WHERE r1.id_b=r.id_b AND r.id_pc='pharmacy'
)>1
AND
count (
SELECT *
FROM r, r1
WHERE r1.id_b=r.id_b AND r.id_pc='ed. centre'
)>1
Is this the way to go for what I need? Which solution would be better from a performance point of view? What about the most elegant solution?
I've also posted here: http://gis.stackexchange.com/questions/11445/postgis-advanced-selection-query

This is a solution I elaborated. It's the fastest one I could find but it's still slow. Given the nature of the task I doubt it can be made faster...
WITH
building AS (
SELECT way, osm_id
FROM osm_polygon
WHERE tags @> hstore('building','yes')
--ORDER BY 1
LIMIT 1000
),
pharmacy AS (
SELECT way
FROM osm_poi
WHERE tags @> hstore('amenity','pharmacy')
),
school AS (
SELECT way
FROM osm_poi
WHERE tags @> hstore('amenity','school')
)
SELECT ST_AsText(building.way) AS geom, building.osm_id AS label
FROM building
WHERE
(SELECT count(*) > 1
FROM pharmacy
WHERE ST_DWithin(building.way,pharmacy.way,1000))
AND
(SELECT count(*) > 1
FROM school
WHERE ST_DWithin(building.way,school.way,1000))
Yours. S.
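Against the tables from the question, and returning the POIs as well as the buildings, the same idea might be written roughly as follows. This is an untested sketch that assumes Postgres 9.4 or later for the FILTER clause, a projected SRID whose units are meters (so that 1000 means 1 km), and that the categories are named 'pharmacy' and 'ed. centre' in poi_category.cname, as in the question's attempt. The CTE name qualifying is made up for the example.

```sql
WITH qualifying AS (
    -- buildings with at least 2 pharmacies and 2 education centers within 1 km
    SELECT b.id
    FROM building AS b
    JOIN poi AS p ON ST_DWithin(b.geom, p.geom, 1000)
    JOIN poi_category AS pc ON pc.id = p.c_id
    GROUP BY b.id
    HAVING count(*) FILTER (WHERE pc.cname = 'pharmacy') >= 2
       AND count(*) FILTER (WHERE pc.cname = 'ed. centre') >= 2
)
-- all POIs, of any category, within 1 km of each qualifying building
SELECT b.id AS building_id, p.id AS poi_id
FROM qualifying AS q
JOIN building AS b ON b.id = q.id
JOIN poi AS p ON ST_DWithin(b.geom, p.geom, 1000);
```

Spatial (GiST) indexes on the geom columns are what allow ST_DWithin to avoid comparing every building with every POI.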
