Basically I'm trying to compare two JSONB rows and return a numeric value. But I wanna be able to query for it. I'm not sure whether I should use a custom SQL function, a calculated field, or a Postgres generated column, so I need a bit of advice.
I have a jsonb column for each user that holds a few hundred key/value pairs, like this:
USERS TABLE:
| username | user_jsonb_column                       |
|----------|-----------------------------------------|
| 'user1'  | {"key1":"value1", "key2":"value2" ... } |
| 'user2'  | {"key2":"value2", "key3":"value3" ... } |
I am trying to calculate the similarity of the jsonb rows of 2 users with a very simple SQL query as such:
SELECT ROUND((
    SELECT COUNT(*) FROM (
        SELECT jsonb_each(user_jsonb_column)
        FROM users WHERE username = 'johndoe'
        INTERSECT
        SELECT jsonb_each(user_jsonb_column)
        FROM users WHERE username = 'janedoe'
    ) AS SAME_PAIRS
)::decimal
/ ( -- divide it by
    SELECT COUNT(*) FROM (
        SELECT jsonb_object_keys(user_jsonb_column)
        FROM users WHERE username = 'johndoe'
        INTERSECT
        SELECT jsonb_object_keys(user_jsonb_column)
        FROM users WHERE username = 'janedoe'
    ) AS SAME_KEYS
) * 100) AS similarity_percentage;
This is working as intended and gives me the similarity result between 2 json objects as a percentage.
I am trying to turn this into a function so that I can query for the similarity percentage of 2 users as such:
query {
calculate_similarity_percentage(
args: {user1: "johndoe", user2: "janedoe"}
){
similarity_percentage_value
}
}
But I'm stuck at this point because I'm not sure whether I should think in terms of a trackable custom SQL function (which should return SETOF <TABLE> but I need a numeric value), a computed field (which can also return BASE type), or maybe a Postgres generated column in my situation.
I've been reading https://hasura.io/docs/1.0/graphql/core/schema/custom-functions.html and https://hasura.io/docs/1.0/graphql/core/schema/computed-fields.html but I couldn't quite figure out how to approach this, so any kind of help or comment would be appreciated.
Update: Yes, as Laurenz Albe pointed out, I am able to create a function like this:
CREATE OR REPLACE FUNCTION public.calculate_similarity_percentage(text, text)
RETURNS numeric
LANGUAGE sql
STABLE
AS $function$
SELECT ROUND(
(select count(*) from (
SELECT jsonb_each(user_jsonb_column) FROM users WHERE username = $1
INTERSECT
SELECT jsonb_each(user_jsonb_column) FROM users WHERE username = $2
) as SAME_PAIRS
)::decimal / (
select count(*) from (
SELECT jsonb_object_keys(user_jsonb_column) FROM users WHERE username = $1
INTERSECT
SELECT jsonb_object_keys(user_jsonb_column) FROM users WHERE username = $2
) as SAME_KEYS
)
* 100) as similarity_percentage
$function$
Then I can execute this function:
SELECT calculate_similarity_percentage('johndoe','janedoe')
And it returns this without any problem:
similarity_percentage
62
However, I would like Hasura to track this function so that I can query it on graphQL as:
query MyQuery {
calculate_similarity_percentage(args: {user1: "johndoe", user2: "janedoe"}) {
similarity_percentage
}
}
But if I try to track the function above, Hasura says:
**SQL Execution Failed**
in function "calculate_similarity_percentage":
the function "calculate_similarity_percentage" cannot be tracked for the following reasons:
• the function does not return a "COMPOSITE" type
• the function does not return a SETOF
• the function does not return a SETOF table
I have no idea if I can find a workaround and return a numeric value as a "COMPOSITE" or SETOF table.
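One workaround for the "SETOF table" requirement is to give the function a table to return. The sketch below is untested against any particular Hasura version and rests on several assumptions. `similarity_result` is a hypothetical table that exists only to describe the result shape. `calculate_similarity_percentage_set` is a hypothetical new name, needed because `CREATE OR REPLACE` cannot change the return type of the existing function. The argument names `user1` and `user2` are taken from the GraphQL query above. `NULLIF` is an addition so that two users with no keys in common yield NULL rather than a division-by-zero error. Both the table and the function would still need to be tracked in Hasura.

```sql
-- Hypothetical table: it only describes the shape of the function's result.
CREATE TABLE similarity_result (
    similarity_percentage numeric
);

-- Hypothetical name; same logic as calculate_similarity_percentage above,
-- but returning SETOF <table> and using named arguments.
CREATE OR REPLACE FUNCTION public.calculate_similarity_percentage_set(user1 text, user2 text)
RETURNS SETOF similarity_result
LANGUAGE sql
STABLE
AS $function$
    SELECT ROUND(
        (SELECT count(*) FROM (
            SELECT jsonb_each(user_jsonb_column) FROM users WHERE username = user1
            INTERSECT
            SELECT jsonb_each(user_jsonb_column) FROM users WHERE username = user2
        ) AS same_pairs
        )::decimal
        -- NULLIF avoids division by zero when the users share no keys
        / NULLIF((
            SELECT count(*) FROM (
                SELECT jsonb_object_keys(user_jsonb_column) FROM users WHERE username = user1
                INTERSECT
                SELECT jsonb_object_keys(user_jsonb_column) FROM users WHERE username = user2
            ) AS same_keys
        ), 0)
        * 100) AS similarity_percentage
$function$;

-- Plain SQL usage
SELECT * FROM calculate_similarity_percentage_set('johndoe', 'janedoe');
```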
Here is how I kind of solved my case. But this was not the optimal solution so I'm not accepting this as an answer.
I ended up creating another table like this:
USER_RELATION_TABLE:
| user1_col | user2_col  |
|-----------|------------|
| 'johndoe' | 'janedoe'  |
| 'brad'    | 'angelina' |
| ...       | ...        |
Then I added a computed field on the relation table with the following function:
CREATE OR REPLACE FUNCTION public.calculate_similarity_percentage(user_relation_row user_relation_table)
RETURNS numeric
LANGUAGE sql
STABLE
AS $function$
SELECT ROUND(
(select count(*) from (
SELECT jsonb_each(user_jsonb_column) FROM users
WHERE username = user_relation_row.user1_col
INTERSECT
SELECT jsonb_each(user_jsonb_column) FROM users
WHERE username = user_relation_row.user2_col
) as SAME_PAIRS
)::decimal / (
select count(*) from (
SELECT jsonb_object_keys(user_jsonb_column) FROM users
WHERE username = user_relation_row.user1_col
INTERSECT
SELECT jsonb_object_keys(user_jsonb_column) FROM users
WHERE username = user_relation_row.user2_col
) as SAME_KEYS
)
* 100) as similarity_percentage
$function$
Now I can query it on the graphQL like this:
query MyQuery {
user_relation_table {
similarity
}
}
Related
I have a column in db which constains JSON values like:
{"key-1": "val-1", "key-2": "val-2", "key-3": "val-3"}
By query like..
SELECT column->>'key-1' FROM table;
I can get my val-1.
Is there a way to get value with key as JSON in sql query from already existed JSON value?
I want to get result like:
{"key-1": "val-1"}
from
{"key-1": "val-1", "key-2": "val-2", "key-3": "val-3"}
using sql query.
Use ampersand operator, &, e.g.,
Live test: https://www.db-fiddle.com/f/9izCEH75JhwVDvsGvsZomG/0
with the_table as
(
select '{"key-1": "val-1", "key-2": "val-2", "key-3": "val-3"}'::jsonb as d
)
select d & 'key-1' as j from the_table
Output:
| j |
| ----------------- |
| {"key-1":"val-1"} |
Just kidding :) Create a function that extracts the desired key value pair, and then create your own user-defined operator for it.
create or replace function extract_one_jsonb(j jsonb, key text)
returns jsonb
as
$$
select jsonb_build_object(key, j->key)
$$ language sql;
create operator & (
leftarg = jsonb,
rightarg = text,
procedure = extract_one_jsonb
);
Of course you can just use a function, or if creating a user-defined operator is not an option:
with the_table as
(
select '{"key-1": "val-1", "key-2": "val-2", "key-3": "val-3"}'::jsonb as d
)
select extract_one_jsonb(d, 'key-1') as j from the_table
Output:
| j |
| ----------------- |
| {"key-1":"val-1"} |
If you extract a key/value pair from jsonb many times, it's worth defining an operator for it, e.g., &. Postgres is pretty flexible about the names of user-defined operators, so this one can be created too: ->>>.
Live test: https://www.db-fiddle.com/f/9izCEH75JhwVDvsGvsZomG/1
create operator ->>> (
leftarg = jsonb,
rightarg = text,
procedure = extract_one_jsonb
);
Output:
| j |
| ----------------- |
| {"key-1":"val-1"} |
->> is already used by Postgres: https://www.postgresql.org/docs/11/functions-json.html
You can create '->>>' instead. ->>> looks more like an extractor operator than the ampersand &. Besides, it looks good even if you stick it to the source field (that is, without spaces):
with the_table as
(
select '{"key-1": "val-1", "key-2": "val-2", "key-3": "val-3"}'::jsonb as d
)
select d->>>'key-1' as j from the_table
I tried the following and it works too; it looks like a pair of scissors (for cutting): %>
select d%>'key-1' as j from the_table
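The %> operator is used above without its definition being shown. A minimal sketch of how it could be declared, assuming it is bound to the same extract_one_jsonb helper as the other operators (the helper is repeated here only so the snippet stands alone):

```sql
-- Helper from earlier in this answer, repeated so the snippet is self-contained.
create or replace function extract_one_jsonb(j jsonb, key text)
returns jsonb
as
$$
    select jsonb_build_object(key, j->key)
$$ language sql;

-- Assumed definition of %>: same argument types and function as & and ->>>.
create operator %> (
    leftarg = jsonb,
    rightarg = text,
    procedure = extract_one_jsonb
);

select '{"key-1": "val-1", "key-2": "val-2"}'::jsonb %> 'key-1' as j;
```

Note that if the key is missing, j->key is NULL, so the result carries a JSON null for that key rather than being an empty object.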
The only thing I can think of is to get the key/value pair and assemble that back into a single JSON value:
select jsonb_build_object(j.k, j.v)
from the_table t, jsonb_each(t.json_col) as j(k,v)
where j.k = 'key-1'
and ... more conditions ...;
Online example: https://rextester.com/VGSX43955
I am trying to select all records from Transaction_Table where Tr_Amount = Instrument_Number using following Code
Select * from Transaction_Table
where abs(Tr_Amount) = Cast(Instrument_number as INTEGER)
However, there are some rows in the table where Instrument_Number is alphanumeric instead of purely numeric. Is there a way to skip the alphanumeric values in the Instrument_Number field in this query?
Switch to TO_NUMBER, which returns NULL for bad data:
Select * from Transaction_Table
where abs(Tr_Amount) = TO_NUMBER(Instrument_number)
TD15.10 implements a TRYCAST:
Select * from Transaction_Table
where abs(Tr_Amount) = TRYCAST(Instrument_number as INTEGER)
I have the following heap of text:
"BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,
URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,
16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,
1.0,Version,1.0,".
What I'd like to do is extract data from this in the following manner:
BundleSize:155648
DynamicSize:204800
Identifier:com.URLConnectionSample
Name:URLConnectionSample
ShortVersion:1.0
Version:1.0
BundleSize:155648
DynamicSize:16384
Identifier:com.IdentifierForVendor3
Name:IdentifierForVendor3
ShortVersion:1.0
Version:1.0
All tips and suggestions are welcome.
It isn't quite clear what you need to do with this data. If you really need to process it entirely in the database (it looks like a task for your favorite scripting language instead), one option is to use hstore.
Converting records one by one is easy:
Assuming
%s =
BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0
SELECT * FROM each(hstore(string_to_array(%s, ',')));
Output:
key | value
--------------+-------------------------
Name | URLConnectionSample
Version | 1.0
BundleSize | 155648
Identifier | com.URLConnectionSample
DynamicSize | 204800
ShortVersion | 1.0
If you have a table with columns exactly matching the field names (note the quotes; populate_record is case-sensitive to key names):
CREATE TABLE data (
"BundleSize" integer, "DynamicSize" integer, "Identifier" text,
"Name" text, "ShortVersion" text, "Version" text);
You can insert hstore records into it like this:
INSERT INTO data SELECT * FROM
populate_record(NULL::data, hstore(string_to_array(%s, ',')));
Things get more complicated if you have comma-separated values for more than one record.
%s = BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,1.0,Version,1.0,
You need to break up an array into chunks of number_of_fields * 2 = 12 elements first.
SELECT hstore(row) FROM (
SELECT array_agg(str) AS row FROM (
SELECT str, row_number() OVER () AS i FROM
unnest(string_to_array(%s, ',')) AS str
) AS str_sub
GROUP BY (i - 1) / 12) AS row_sub
WHERE array_length(row, 1) = 12;
Output:
"Name"=>"URLConnectionSample", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.URLConnectionSample", "DynamicSize"=>"204800", "ShortVersion"=>"1.0"
"Name"=>"IdentifierForVendor3", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.IdentifierForVendor3", "DynamicSize"=>"16384", "ShortVersion"=>"1.0"
And inserting this into the aforementioned table:
INSERT INTO data SELECT (populate_record(NULL::data, hstore(row))).* FROM ...
the rest of the query is the same.
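The chunking query above relies on row_number() OVER () and array_agg() happening to follow the order of the unnested elements. As a sketch under the assumption of PostgreSQL 9.4 or later with the hstore extension available, WITH ORDINALITY makes that order explicit. The aliases t, str, i, and rec are illustrative names only.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

SELECT hstore(array_agg(str ORDER BY i)) AS rec
FROM unnest(string_to_array(
        'BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,1.0,Version,1.0,',
        ',')) WITH ORDINALITY AS t(str, i)  -- i is the 1-based position of each element
GROUP BY (i - 1) / 12   -- 6 fields * 2 (key + value) = 12 elements per record
HAVING count(*) = 12    -- drops the incomplete chunk left by the trailing comma
ORDER BY (i - 1) / 12;
```

Ordering inside array_agg matters because hstore(text[]) pairs up adjacent elements as key, value.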
I have a table EmployeeMoves:
| EmployeeID | CityIDs
+------------------------------
| 24 | 23,21,22
| 25 | 25,12,14
| 29 | 1,2,5
| 31 | 7
| 55 | 11,34
| 60 | 7,9,21,23,30
I'm trying to figure out how to expand the comma-delimited values from the EmployeeMoves.CityIDs column to populate an EmployeeCities table, which should look like this:
| EmployeeID | CityID
+------------------------------
| 24 | 23
| 24 | 21
| 24 | 22
| 25 | 25
| 25 | 12
| 25 | 14
| ... and so on
I already have a function called SplitADelimitedList that splits a comma-delimited list of integers into a rowset. It takes the delimited list as a parameter. The SQL below will give me a table with split values under the column Value:
select value from dbo.SplitADelimitedList ('23,21,1,4');
| Value
+-----------
| 23
| 21
| 1
| 4
The question is: How do I populate EmployeeCities from EmployeeMoves with a single (even if complex) SQL statement using the comma-delimited list of CityIDs from each row in the EmployeeMoves table, but without any cursors or looping in T-SQL? I could have 100 records in the EmployeeMoves table for 100 different employees.
This is how I tried to solve this problem. It seems to work and is very quick in performance.
INSERT INTO EmployeeCities
SELECT
em.EmployeeID,
c.Value
FROM EmployeeMoves em
CROSS APPLY dbo.SplitADelimitedList(em.CityIDs) c;
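For comparison, here is a sketch of the same statement without a user-defined function. It assumes SQL Server 2016 or later (database compatibility level 130+), where the built-in STRING_SPLIT is available, and it assumes the target columns are named EmployeeID and CityID as in the example table above.

```sql
INSERT INTO EmployeeCities (EmployeeID, CityID)
SELECT
    em.EmployeeID,
    CAST(s.value AS int) AS CityID   -- STRING_SPLIT returns a single column named value
FROM EmployeeMoves AS em
CROSS APPLY STRING_SPLIT(em.CityIDs, ',') AS s
WHERE LTRIM(RTRIM(s.value)) <> '';   -- skip empty pieces, e.g. from a trailing comma
```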
UPDATE 1:
This update provides the definition of the user-defined function used in the query above to split a comma-delimited list into a table of integer values (it is created here under the name dbo.fn_SplitADelimitedList1).
CREATE FUNCTION dbo.fn_SplitADelimitedList1
(
@String NVARCHAR(MAX)
)
RETURNS @SplittedValues TABLE(
Value INT
)
AS
BEGIN
DECLARE @SplitLength INT
DECLARE @Delimiter VARCHAR(10)
SET @Delimiter = ',' --set this to the delimiter you are using
WHILE len(@String) > 0
BEGIN
SELECT @SplitLength = (CASE charindex(@Delimiter, @String)
WHEN 0 THEN
datalength(@String) / 2
ELSE
charindex(@Delimiter, @String) - 1
END)
INSERT INTO @SplittedValues
SELECT cast(substring(@String, 1, @SplitLength) AS INTEGER)
WHERE
ltrim(rtrim(isnull(substring(@String, 1, @SplitLength), ''))) <> '';
SELECT @String = (CASE ((datalength(@String) / 2) - @SplitLength)
WHEN 0 THEN
''
ELSE
right(@String, (datalength(@String) / 2) - @SplitLength - 1)
END)
END
RETURN
END
Preface
This is not the right way to do it. You shouldn't create comma-delimited lists in SQL Server. This violates first normal form, which should sound like an unbelievably vile expletive to you.
It is trivial for a client-side application to select rows of employees and related cities and display this as a comma-separated list. It shouldn't be done in the database. Please do everything you can to avoid this kind of construction in the future. If at all possible, you should refactor your database.
The Right Answer
To get the list of cities, properly expanded, from a table containing lists of cities, you can do this:
INSERT dbo.EmployeeCities
SELECT
M.EmployeeID,
C.Value
FROM
EmployeeMoves M
CROSS APPLY dbo.SplitADelimitedList(M.CityIDs) C
;
The Wrong Answer
I wrote this answer due to a misunderstanding of what you wanted: I thought you were trying to query against properly-stored data to produce a list of comma-separated CityIDs. But I realize now you wanted the reverse: to query the list of cities using existing comma-separated values already stored in a column.
WITH EmployeeData AS (
SELECT
M.EmployeeID,
M.CityID
FROM
dbo.SplitADelimitedList ('23,21,1,4') C
INNER JOIN dbo.EmployeeMoves M
ON Convert(int, C.Value) = M.CityID
)
SELECT
E.EmployeeID,
CityIDs = Substring((
SELECT ',' + Convert(varchar(max), CityID)
FROM EmployeeData C
WHERE E.EmployeeID = C.EmployeeID
FOR XML PATH (''), TYPE
).value('.[1]', 'varchar(max)'), 2, 2147483647)
FROM
(SELECT DISTINCT EmployeeID FROM EmployeeData) E
;
Part of my difficulty in understanding is that your question is a bit disorganized. Next time, please clearly label your example data and show what you have, and what you're trying to work toward. Since you put the data for EmployeeCities last, it looked like it was what you were trying to achieve. It's not a good use of people's time when questions are not laid out well.
I have the following SQL:
SELECT ',' + LTRIM(RTRIM(CAST(vessel_is_id as CHAR(2)))) + ',' AS 'Id'
FROM Vessels
WHERE ',' + LTRIM(RTRIM(CAST(vessel_is_id as varCHAR(2)))) + ',' IN (',1,2,3,4,5,6,')
Basically, I want to filter vessel_is_id against a variable list of integer values (which is passed into the stored proc as a varchar). The above SQL does not work: I do have rows in the table with a `vessel_is_id` of 1, but they are not returned.
Can someone suggest a better approach to this for me? Or, if the above is OK
EDIT:
Sample data
| vessel_is_id |
| ------------ |
| 1 |
| 2 |
| 5 |
| 3 |
| 1 |
| 1 |
So I want to return all of the above rows where vessel_is_id is in a variable filter, e.g. '1,3', which should return 4 records.
Cheers.
Jas.
IF OBJECT_ID(N'dbo.fn_ArrayToTable', N'TF') IS NOT NULL
DROP FUNCTION [dbo].[fn_ArrayToTable]
GO
CREATE FUNCTION [dbo].fn_ArrayToTable (@array VARCHAR(MAX))
-- =============================================
-- Author: Dan Andrews
-- Create date: 04/11/11
-- Description: String to Table-Valued Function
--
-- =============================================
RETURNS @output TABLE (data VARCHAR(256))
AS
BEGIN
DECLARE @pointer INT
SET @pointer = CHARINDEX(',', @array)
WHILE @pointer != 0
BEGIN
INSERT INTO @output
SELECT RTRIM(LTRIM(LEFT(@array,@pointer-1)))
SELECT @array = RIGHT(@array, LEN(@array)-@pointer),
@pointer = CHARINDEX(',', @array)
END
-- Whatever remains after the last comma is the final element
IF LEN(@array) > 0
INSERT INTO @output
SELECT RTRIM(LTRIM(@array))
RETURN
END
Which you may apply like:
SELECT * FROM dbo.fn_ArrayToTable('2,3,4,5,2,2')
and in your case:
SELECT LTRIM(RTRIM(CAST(vessel_is_id AS CHAR(2)))) AS 'Id'
FROM Vessels
WHERE LTRIM(RTRIM(CAST(vessel_is_id AS VARCHAR(2)))) IN (SELECT data FROM dbo.fn_ArrayToTable('1,2,3,4,5,6'))
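If a user-defined splitter is not wanted, the same filter can be sketched with built-ins. This assumes SQL Server 2016 or later (compatibility level 130+) for STRING_SPLIT, that vessel_is_id is an integer column, and a hypothetical variable @filter standing in for the stored procedure's varchar parameter. TRY_CAST turns any non-numeric piece into NULL, which simply never matches.

```sql
DECLARE @filter varchar(max) = '1,3';   -- hypothetical stand-in for the proc parameter

SELECT v.vessel_is_id
FROM Vessels AS v
WHERE v.vessel_is_id IN (
    SELECT TRY_CAST(s.value AS int)      -- NULL for pieces that are not integers
    FROM STRING_SPLIT(@filter, ',') AS s
);
```

Comparing integers directly also avoids the CAST/LTRIM/RTRIM round trip on vessel_is_id.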
Since SQL Server doesn't have an array type, you may want to consider passing in the set of values as an XML type. You can then turn the XML into a relation and join on it. The example below draws on the time-tested pubs database. Of course, your client may or may not have an easy time generating the XML for the parameter value, but this approach is safe from SQL injection, which most "comma-separated" value approaches are not.
declare @stateSelector xml
set @stateSelector = '<values>
<value>or</value>
<value>ut</value>
<value>tn</value>
</values>'
select * from authors
where state in ( select c.value('.', 'varchar(2)') from @stateSelector.nodes('//value') as t(c))