Postgresql: Get records having similar column values - postgresql

Table A
id name keywords
1 Obj1 a,b,c,austin black
2 Obj2 e,f,austin black,h
3 Obj3 k,l,m,n
4 Obj4 austin black,t,u,s
5 Obj5 z,r,q,w
I need to get those records which contains similar type of keywords. Hence the result for the table needs to be:
Records:
1,2,4
Since records 1,2,4 are the one whose some or the other keyword match with at least any other keyword.

You can convert the "csv" to an array and then use Postgres' array functions:
select *
from the_table t1
where exists (select *
from the_table t2
where string_to_array(t1.keywords, ',') && string_to_array(t2.keywords, ',')
and t1.id <> t2.id);

Related

How to use OPENJSON on multiple rows

I have a temp table with multiple rows in it and each row has a column called Categories; which contains a very simple json array of ids for categories in a different table.
A few example rows of the temp table:
Id Name Categories
---------------------------------------------------------------------------------------------
'539f7e28-143e-41bb-8814-a7b93b846007' Test 1 ["category1Id", "category2Id", "category3Id"]
'f29e2ecf-6e37-4aa9-aa56-4a351d298bfc' Test 2 ["category1Id", "category2Id"]
'34e41a0a-ad92-4cd7-bf5c-8df6bfd6ed5c' Test 3 NULL
Now what I would like to do is to select all of the category ids from all of the rows in the temp table.
What I have is the following and it's not working as it's giving me the error of :
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
SELECT
c.Id
,c.[Name]
,c.Color
FROM
dbo.Category as c
WHERE
c.Id in (SELECT [value] FROM OPENJSON((SELECT Categories FROM #TempTable)))
and c.IsDeleted = 0
Which I guess it makes sense that's failing on that because I'm selecting multiple rows and needing to parse each row's respective category ids json. I'm just not sure what to do/change to give me the results that I want. Thank you in advance for any help.
You'd need to use CROSS APPLY like so:
SELECT id ,
name ,
t.Value AS category_id
FROM #temp
CROSS APPLY OPENJSON(categories, '$') t;
And then, you can JOIN to your Categories table using the category_id column, something like this:
SELECT id ,
name ,
t.Value AS category_id,
c.*
FROM #temp
CROSS APPLY OPENJSON(categories, '$') t
LEFT JOIN Categories c ON c.Id = t.Value

SQL left join case statement

Need some help working out the SQL. Unfortunately the version of tsql is SybaseASE which I'm not too familiar with, in MS SQL I would use a windowed function like RANK() or ROW_NUMBER() in a subquery and join to those results ...
Here's what I'm trying to resolve
TABLE A
Id
1
2
3
TABLE B
Id,Type
1,A
1,B
1,C
2,A
2,B
3,A
3,C
4,B
4,C
I would like to return 1 row for each ID and if the ID has a type 'A' record that should display, if it has a different type then it doesn't matter but it cannot be null (can do some arbitrary ordering, like alpha to prioritize "other" return value types)
Results:
1, A
2, A
3, A
4, B
A regular left join (ON A.id = B.id and B.type = 'A') ALMOST returns what I am looking for however it returns null for the type when I want the 'next available' type.
You can use a INNER JOIN on a SubQuery (FirstTypeResult) that will return the minimum type per Id.
Eg:
SELECT TABLEA.[Id], FirstTypeResult.[Type]
FROM TABLEA
JOIN (
SELECT [Id], Min([Type]) As [Type]
FROM TABLEB
GROUP BY [Id]
) FirstTypeResult ON FirstTypeResult.[Id] = TABLEA.[Id]

Using a where clause in a combination with a subquery which result is a record containin an array

I found some results/answers, regarding searching in array, but:
WHERE = ANY, works with a column, but not with a subquery that returns one record that contains an array as a result, triggers error
WHERE IN, also triggers the same error
I also tested untest on the subquery, similar error as the first 2
I don't want to check if a value is in an array but get/execute the query for the values in the array, like WHERE IN (1,2,3,4), not like in the other questions/answers
The error:
No operator matches the given name and argument types. You might need
to add explicit type casts.
or
No operator matches int = int[]
path is an int[]array type.
Structure:
id | name | | slug | path | parent_id
1 name1 slug1 {1} null
2 name2 slug2 {1,2} 1
3 name3 slug3 {1,2,3} 2
4 nam4 slug4 {4} null
What I tries as base:
SELECT t.id, t.name, t.slug FROM types AS t
WHERE t.id in (SELECT t.path FROM types AS t WHERE t.id = 24)
ORDER BY depth ASC
Basically path is like a breadcrumb , {grandparent,parent,type}
Here's one using IN and unnest()
SELECT t1.id,
t1."name",
t1.slug
FROM types t1
WHERE t1.id IN (SELECT un.e
FROM types t2
CROSS JOIN LATERAL unnest(t2.path) un (e)
WHERE t2.id = 2)
ORDER BY array_length(t1.path, 1);
And another one using the array is contained by operator <#.
SELECT t1.id,
t1."name",
t1.slug
FROM types t1
WHERE ARRAY[t1.id] <# (SELECT t2.path
FROM types t2
WHERE t2.id = 2)
ORDER BY array_length(t1.path, 1);
And one using = ANY.
SELECT t1.id,
t1."name",
t1.slug
FROM types t1
WHERE t1.id = ANY ((SELECT t2.path
FROM types t2
WHERE t2.id = 2)::integer[])
ORDER BY array_length(t1.path, 1);
db<>fiddle
You didn't include depth in your sample data so I replaced it with array_length(t1.path, 1) which is probably what it is.

T-SQL query to find common features

I have this table:
ID Value
------------
1 car
1 moto
2 car
2 moto
3 moto
3 apple
4 gel
4 moto
5 NULL
note that moto is common to all IDs.
I would to obtain a single row with this result
car*, moto, apple*, gel*
i.e.
car, apple, gel with an asterisk because is present but NOT in all IDs
moto without an asterisk because is COMMON to all IDs
If ID + Value are Unique
SELECT Value, CASE WHEN COUNT(*) <> (SELECT COUNT(DISTINCT ID) FROM MyTable) THEN '*' ELSE '' END AS Asterisk FROM MyTable WHERE Value IS NOT NULL GROUP BY Value
Note that this won't group in a single line. And note that your question is wrong. ID 5 is an ID, so moto isn't common to all the IDs. It's common to all the IDs that have at least a value.
If we filter these IDs as written,
SELECT Value, CASE WHEN COUNT(*) <> (SELECT COUNT(DISTINCT ID) FROM MyTable WHERE Value IS NOT NULL) THEN '*' ELSE '' END FROM MyTable WHERE Value IS NOT NULL GROUP BY Value
To "merge" the * with Value, simply replace the , with a +, like:
SELECT Value + CASE WHEN COUNT(*) <> (SELECT COUNT(DISTINCT ID) FROM MyTable WHERE Value IS NOT NULL) THEN '*' ELSE '' END Value FROM MyTable WHERE Value IS NOT NULL GROUP BY Value
To make a single line use one of https://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/ I'll add that, sadly, tsql doesn't have any native method to do it, and all the alternatives are a little ugly :-)
In general, the string aggregation part is quite common on SO (and outside of it) Concatenate row values T-SQL, tsql aggregate string for group by, Implode type function in SQL Server 2000?, How to return multiple values in one column (T-SQL)? and too many others to count :-)

Conditional Union in T-SQL

Currently I've a query as follows:
-- Query 1
SELECT
acc_code, acc_name, alias, LAmt, coalesce(LAmt,0) AS amt
FROM
(SELECT
acc_code, acc_name, alias,
(SELECT
(SUM(cr_amt)-SUM(dr_amt))
FROM
ledger_mcg l
WHERE
(l.acc_code LIKE a.acc_code + '.%' OR l.acc_code=a.acc_code)
AND
fy_id=1
AND
posted_date BETWEEN '2010-01-01' AND '2011-06-02') AS LAmt
FROM
acc_head_mcg AS a
WHERE
(acc_type='4')) AS T1
WHERE
coalesce(LAmt,0)<>0
Query 2 is same as Query 1 except that acc_type = '5' in Query 2. Query 2 always returns a resultset with a single row. Now, I need the union of the two queries i.e
Query 1
UNION
Query 2
only when the amt returned by Query 2 is less than 0. Else, I don't need a union but only the resulset from Query 1.
The best way I can think of is to create a parameterised scalar function. How best can I do this?
You could store the result of the first query into a temporary table, then, if the table wasn't empty, execute the other query.
IF OBJECT_ID('tempdb..#MultipleQueriesResults') IS NOT NULL
DROP TABLE #MultipleQueriesResults;
SELECT
acc_code, acc_name, alias, LAmt, coalesce(LAmt,0) AS amt
INTO #MultipleQueriesResults
FROM
(SELECT
acc_code, acc_name, alias,
(SELECT
(SUM(cr_amt)-SUM(dr_amt))
FROM
ledger_mcg l
WHERE
(l.acc_code LIKE a.acc_code + '.%' OR l.acc_code=a.acc_code)
AND
fy_id=1
AND
posted_date BETWEEN '2010-01-01' AND '2011-06-02') AS LAmt
FROM
acc_head_mcg AS a
WHERE
(acc_type='4')) AS T1
WHERE
coalesce(LAmt,0)<>0;
IF NOT EXISTS (SELECT * FROM #MultipleQueriesResults)
… /* run Query 2 */