In Postgres, how can I efficiently filter using the inner numbers of this jsonb structure?

So I work with PostgreSQL, and I have a jsonb column with the following structure:
{
    "Store1": [
        {
            "price": 5.99,
            "seller": "seller"
        },
        {
            "price": 56.43,
            "seller": "seller"
        }
    ],
    "Store2": [
        {
            "price": 45.65,
            "seller": "seller"
        },
        {
            "price": 44.66,
            "seller": "seller"
        }
    ]
}
I have a jsonb like this for every product in the database. I want to run an SQL query that will answer the following question:
For each product, is at least one of the prices in this JSON bigger than/equal to/smaller than X?
Basically, filter the products to include only the ones that have at least one price satisfying a comparison.
How can I do it efficiently? What's the best way in Postgres to iterate over a JSON like this, with a relatively complex inner structure?
Also, if I could control the way the data is structured (to an extent, I can), what changes can I make to make this query more efficient?
Thanks!

Use a json path expression:
WHERE col @@ '$.*[*].price < 20'
or
WHERE col @? '$.*[*] ? (@.price < 20)'
If you need to compare to another column or make the query parameterised, you can either build the jsonpath dynamically
WHERE col @@ format('$.*[*].price < %s', $1)::jsonpath
WHERE col @? format('$.*[*] ? (@.price < %s)', $1)::jsonpath
or you can use the respective function and pass variables as an object:
WHERE jsonb_path_match(col, '$.*[*].price < $limit', jsonb_build_object('limit', $1))
WHERE jsonb_path_exists(col, '$.*[*] ? (@.price < $limit)', jsonb_build_object('limit', $1))
I admit I had to check my cheat sheet to figure out the right combination of operator and expression. Takeaways:
if a comparison operator needs to work with multiple values, it generally functions as an ANY
@@ does not work with ? (@ …) filter expressions, since they don't return a boolean;
@? does not work with predicates, since they always return a value (even if it's false).
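To see both operators in action, here is a minimal self-contained sketch (the products table and its rows are invented for illustration, not from the question):
CREATE TABLE products (id int PRIMARY KEY, col jsonb);
INSERT INTO products VALUES
    (1, '{"Store1": [{"price": 5.99, "seller": "seller"}]}'),
    (2, '{"Store1": [{"price": 56.43, "seller": "seller"}]}');

-- both forms return only product 1:
SELECT id FROM products WHERE col @@ '$.*[*].price < 20';
SELECT id FROM products WHERE col @? '$.*[*] ? (@.price < 20)';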

What changes can I make to make this query more efficient?
As @jjanes commented on my other answer, the jsonpath match col @@ '$.*[*].price < $limit' isn't going to be fast and needs a full table scan, at least for < and >. To make a useful index, a different approach is required: an index can only compare against a single value, not an arbitrary number of them. For that, we need to change the condition from EXISTS(SELECT prices_of(col) WHERE price < $limit) to (SELECT MIN(prices_of(col))) < $limit.
With this idea it is possible to build an expression index on the result of a custom immutable function:
CREATE FUNCTION min_price(data jsonb) RETURNS float
    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT
RETURN (
    SELECT min((offer ->> 'price')::float)
    FROM jsonb_each(data) AS entries(name, store),
         LATERAL jsonb_array_elements(store) AS elements(offer)
);
CREATE INDEX example_min_data_price_idx ON example (min_price(data));
which you can use as
SELECT * FROM example WHERE min_price(data) < 20;
Looking for rows with a price larger than a certain number requires a separate index on max_price(data). If you want to use the index in a JOIN with more conditions, consider making it a multi-column index.
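Such a max_price function could mirror the min_price function above; only the aggregate changes (a sketch following the same pattern and the same made-up names):
CREATE FUNCTION max_price(data jsonb) RETURNS float
    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT
RETURN (
    SELECT max((offer ->> 'price')::float)
    FROM jsonb_each(data) AS entries(name, store),
         LATERAL jsonb_array_elements(store) AS elements(offer)
);

CREATE INDEX example_max_data_price_idx ON example (max_price(data));

SELECT * FROM example WHERE max_price(data) > 50;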
Looking for rows with a price equal to a certain number can be optimised by indexing the jsonb column and using a jsonpath:
CREATE INDEX example_data_idx ON example USING GIN (data jsonb_ops);
SELECT * FROM example WHERE data @@ '$.*[*].price == 20';
SELECT * FROM example WHERE data @? '$.*[*] ? (@.price == 20)';
Unfortunately you can't use jsonb_path_ops here since that doesn't support the wildcard.
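For contrast, a jsonb_path_ops index can serve jsonpath queries whose keys are spelled out, just not the $.* wildcard (a sketch; verify with EXPLAIN whether the planner actually uses it):
CREATE INDEX example_data_path_idx ON example USING GIN (data jsonb_path_ops);

-- a concrete key can use this index:
SELECT * FROM example WHERE data @? '$.Store1[*] ? (@.price == 20)';
-- the wildcard key cannot:
SELECT * FROM example WHERE data @? '$.*[*] ? (@.price == 20)';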

Related

How to properly parameterize my postgresql query

I'm trying to parameterize my postgresql query in order to prevent SQL injection in my ruby on rails application. The SQL query will sum a different value in my table depending on the input.
Here is a simplified version of my function:
def self.calculate_value(value)
  calculated_value = ""
  if value == "quantity"
    calculated_value = "COALESCE(sum(amount), 0)"
  elsif value == "retail"
    calculated_value = "COALESCE(sum(amount * price), 0)"
  elsif value == "wholesale"
    calculated_value = "COALESCE(sum(amount * cost), 0)"
  end
  query = <<-SQL
    select CAST(? AS DOUBLE PRECISION) as ? from table1
  SQL
  return Table1.find_by_sql([query, calculated_value, value])
end
If I call calculate_value("retail"), it will execute the query like this:
select location, CAST('COALESCE(sum(amount * price), 0)' AS DOUBLE PRECISION) as 'retail' from table1 group by location
This results in an error. I want it to execute without the quotes like this:
select location, CAST(COALESCE(sum(amount * price), 0) AS DOUBLE PRECISION) as retail from table1 group by location
I understand that the addition of quotations is what prevents the sql injection but how would I prevent it in this case? What is the best way to handle this scenario?
NOTE: This is a simplified version of the queries I'll be writing and I'll want to use find_by_sql.
A prepared statement cannot change the query structure: table or column names, ORDER BY clauses, function names, and so on. Only literals can be passed this way.
Where is the SQL injection here? You are not putting a user-defined value into the query text. Instead, you check the given value against an allowed list and use only SQL fragments you wrote yourself. In this case, there is no danger of SQL injection.
I also want to link to this article. It is safe to build query text dynamically if you control all parts of that query, and it is much better for the RDBMS than putting clever branching logic into the query itself.
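For illustration, once the value has passed the allow-list check, the assembled statement contains only developer-written SQL (this mirrors the desired query from the question; no user input appears in the text):
SELECT location,
       CAST(COALESCE(sum(amount * price), 0) AS DOUBLE PRECISION) AS retail
FROM table1
GROUP BY location;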

Indexing a jsonb column in PostgreSQL

I have a column in a PostgreSQL table with type jsonb.
{
    .....
    "type": "car",
    "vehicleIds": [
        "980e3761-935a-4e52-be77-9f9461dec4d1",
        "980e3761-935a-4e52-be77-9f9461dec4d2"
    ]
    .....
}
The application runs queries against these fields to fetch records. I need to index this column only for these fields.
How can this be done?
This is the query structure, with properties as the column name:
SELECT *
FROM Vehicle f
WHERE f.properties::text @@ CONCAT('$.vehicleIds[*] >', :vehicleId) = true
  AND f.properties::text @@ CONCAT('$.type >', :type) = true
The query you are using is highly confusing: it boils down to a text search query, because @@ is applied to a text value.
I also don't understand the '$.type > ... condition. With values like car I would expect an equality operator rather than "greater than". Using > together with a UUID also doesn't seem to make sense.
If you want to search for rows whose type is car and whose vehicleIds contains a given ID, the "contains" operator @> is a better way to do that:
SELECT *
FROM Vehicle f
WHERE f.properties @> '{"type": "car", "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d1"]}'
The above could make use of a GIN index on the properties column:
create index on vehicles using gin (properties);
If the type key is always queried with equality (which I assume), a combined index might be more efficient:
create index on vehicles using gin ( (properties ->> 'type'), (properties -> 'vehicleIds') );
You need to install the btree_gin extension in order to create that index.
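For reference, installing it is a one-liner (it needs sufficient privileges on the database):
CREATE EXTENSION IF NOT EXISTS btree_gin;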
That index would be a bit smaller but needs a different query:
SELECT *
FROM Vehicle f
WHERE f.properties ->> 'type' = 'car'
AND f.properties -> 'vehicleIds' @> '["980e3761-935a-4e52-be77-9f9461dec4d1"]'
You will need to validate whether the indexes are used, and which one is more efficient, by looking at the execution plan.
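A quick way to perform that check (a sketch; it assumes the GIN index from above and reuses the question's example UUID):
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM Vehicle f
WHERE f.properties @> '{"type": "car", "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d1"]}';
-- a Bitmap Index Scan on the GIN index in the plan means the index is used;
-- a Seq Scan means it is not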

How to use select syntax for group by field which is array in Dynamics AX

I have a field Value in table finStatementTrans which is an array.
How should I write a select statement with group by and a sum over this field?
while select finStatementTable
    join DataClassParagraph, sum(Value) from finStatementTrans
        group by finStatementTrans.DataClassParagraph
        where finStatementTable.RecId == finStatementTrans.FinStatementTable_FK
           && finStatementTable.FinStatementTableParent_FK == 5637569094
{
    info(strFmt("%1, %2", finStatementTrans.DataClassParagraph, finStatementTrans.Value[1]));
}
Is this correct?
sum(Value[1])
With this, it doesn't compile.
As Aliaksandr Maksimau mentioned in his comment, aggregating array fields is not possible. Aggregations are only supported for integer and real data type fields.
See also X++ data selection and manipulation, paragraph select statements, last sentence.

Complex SphinxQL Query

I'm trying to write a SphinxQL query that would replicate the following MySQL in a Sphinx RT index:
SELECT id FROM table WHERE colA LIKE 'valA' AND (colB = valB OR colC = valC OR ... colX = valX ... OR colY LIKE 'valY' .. OR colZ LIKE 'valZ')
As you can see, I'm trying to get all the rows where one string column matches a certain value AND any one of a list of conditions holds (mixing and matching string and integer columns/values).
This is what I've gotten so far in SphinxQL:
SELECT id, (intColA = intValA OR intColB = intValB ...) as intCheck FROM rt_index WHERE MATCH('@requiredMatch = requiredValue');
The problem I'm running into is in matching all of the potential optional string values. The best possible query (if multiple MATCH statements were allowed and they were allowed as expressions) would be something like
SELECT id, (intColA = intValA OR MATCH('@checkColA valA|valB') OR ...) as optionalMatches FROM rt_index WHERE optionalMatches = 1 AND MATCH('@requireCol requiredVal')
I can see a potential way to do this with CRC32 string conversions and MVA attributes, but these aren't supported with RT indexes and I REALLY would prefer not to switch from them.
One way would be to simply convert all your columns to normal full-text fields. Then you can put all this logic inside MATCH(...), i.e. not using attributes at all.
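For illustration, a hedged sketch of that approach (field names are placeholders; it relies on Sphinx's extended query syntax, where @field limits keywords to a field, | is OR, and parentheses group):
SELECT id
FROM rt_index
WHERE MATCH('@requiredCol requiredVal (@colA valA | @colB valB)');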
Yes, you can only have one MATCH per query.
Otherwise, yes, you could use the CRC trick to turn string attributes into integer ones, so they can be used for filtering.
Not sure why you would need MVAs, but they are supported in RT indexes as of 2.0.2.

Are you able to use a custom Postgres comparison function for ORDER BY clauses?

In Python, I can write a sort comparison function which returns an item in the set {-1, 0, 1} and pass it to a sort function like so:
sorted(["some","data","with","a","nonconventional","sort"], custom_function)
This code will sort the sequence according to the collation order I define in the function.
Can I do the equivalent in Postgres?
e.g.
SELECT widget FROM items ORDER BY custom_function(widget)
Edit: Examples and/or pointers to documentation are welcome.
Yes you can; you can even create a functional index to speed up the sorting.
Edit: Simple example:
CREATE TABLE foo(
    id serial primary key,
    bar int
);

-- create some data
INSERT INTO foo(bar) SELECT i FROM generate_series(50,70) i;

-- show the result
SELECT * FROM foo;

CREATE OR REPLACE FUNCTION my_sort(int) RETURNS int
    LANGUAGE sql
    IMMUTABLE -- needed so the function can be used in an index
AS
$$
    SELECT $1 % 5; -- get the modulo (remainder)
$$;

-- let's sort!
SELECT *, my_sort(bar) FROM foo ORDER BY my_sort(bar) ASC;

-- make an index as well:
CREATE INDEX idx_my_sort ON foo ((my_sort(bar)));
The manual is full of examples of how to use your own functions; just start playing with it.
SQL: http://www.postgresql.org/docs/current/static/xfunc-sql.html
PL/pgSQL: http://www.postgresql.org/docs/current/static/plpgsql.html
We can avoid confusion about ordering methods by naming them:
"score function", as in standard SQL select * from t order by f(x) clauses, and
"compare function" ("sort function" in the question text), as in Python's sort method.
The ORDER BY clause of PostgreSQL has three mechanisms to sort:
Standard, using a "score function", which you can also use with an INDEX.
Special string-comparison alternatives, via collation configuration (only for text, varchar, and similar datatypes).
The ORDER BY ... USING clause. See this question or the docs example. Example: SELECT * FROM mytable ORDER BY somecol USING ~<~, where ~<~ is an operator that embeds a compare function (a short demo follows this list).
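A minimal demonstration of that third mechanism, using the built-in operator ~<~ (it compares text byte-wise, ignoring collation):
CREATE TABLE t (x text);
INSERT INTO t VALUES ('b'), ('A'), ('a');

SELECT x FROM t ORDER BY x USING ~<~;
-- byte-wise result: 'A', 'a', 'b' (a linguistic collation would
-- typically give 'a', 'A', 'b' instead)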
Perhaps "standard way" in a RDBMS (as PostgreSQL) is not like Python's standard because indexing is the aim of a RDBMS, and it's easier to index score functions.
Answers to the question:
Direct solution. There is no direct way to use a user-defined function as a compare function, as in the sort method of languages like Python or JavaScript.
Indirect solution. You can use a user-defined compare function in a user-defined operator, and a user-defined operator class to make it indexable (a sketch follows this list). See the PostgreSQL docs:
CREATE OPERATOR with the compare function;
CREATE OPERATOR CLASS, to be indexable.
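To make that concrete, here is a self-contained sketch of the indirect solution (all names are invented; it orders text by length, then alphabetically; note that CREATE OPERATOR CLASS requires superuser privileges):
CREATE FUNCTION bylen_cmp(a text, b text) RETURNS int
LANGUAGE sql IMMUTABLE STRICT AS $$
    SELECT CASE
        WHEN length(a) < length(b) THEN -1
        WHEN length(a) > length(b) THEN  1
        WHEN a < b THEN -1
        WHEN a > b THEN  1
        ELSE 0
    END
$$;

-- boolean wrappers for each btree strategy
CREATE FUNCTION bylen_lt(a text, b text) RETURNS bool LANGUAGE sql IMMUTABLE STRICT AS $$ SELECT bylen_cmp(a, b) < 0 $$;
CREATE FUNCTION bylen_le(a text, b text) RETURNS bool LANGUAGE sql IMMUTABLE STRICT AS $$ SELECT bylen_cmp(a, b) <= 0 $$;
CREATE FUNCTION bylen_eq(a text, b text) RETURNS bool LANGUAGE sql IMMUTABLE STRICT AS $$ SELECT bylen_cmp(a, b) = 0 $$;
CREATE FUNCTION bylen_ge(a text, b text) RETURNS bool LANGUAGE sql IMMUTABLE STRICT AS $$ SELECT bylen_cmp(a, b) >= 0 $$;
CREATE FUNCTION bylen_gt(a text, b text) RETURNS bool LANGUAGE sql IMMUTABLE STRICT AS $$ SELECT bylen_cmp(a, b) > 0 $$;

CREATE OPERATOR <^  (LEFTARG = text, RIGHTARG = text, FUNCTION = bylen_lt);
CREATE OPERATOR <=^ (LEFTARG = text, RIGHTARG = text, FUNCTION = bylen_le);
CREATE OPERATOR =^  (LEFTARG = text, RIGHTARG = text, FUNCTION = bylen_eq);
CREATE OPERATOR >=^ (LEFTARG = text, RIGHTARG = text, FUNCTION = bylen_ge);
CREATE OPERATOR >^  (LEFTARG = text, RIGHTARG = text, FUNCTION = bylen_gt);

-- the operator class ties the operators and the compare function together
CREATE OPERATOR CLASS bylen_ops FOR TYPE text USING btree AS
    OPERATOR 1 <^,
    OPERATOR 2 <=^,
    OPERATOR 3 =^,
    OPERATOR 4 >=^,
    OPERATOR 5 >^,
    FUNCTION 1 bylen_cmp(text, text);

-- now the operator can drive a sort (and an index, if you create one):
SELECT x FROM (VALUES ('ccc'), ('a'), ('bb')) AS v(x) ORDER BY x USING <^;
-- result: 'a', 'bb', 'ccc'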
Explaining compare functions
In Python, the compare function looks like this:
def compare(a, b):
    return 1 if a > b else 0 if a == b else -1
A compare function uses less CPU than a score function. It is also useful for expressing an order when a score function is unknown.
See complete descriptions elsewhere:
for the C language, see https://www.gnu.org/software/libc/manual/html_node/Comparison-Functions.html
for JavaScript, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort#Description
Other typical compare functions
Wikipedia's example to compare tuples:
function tupleCompare((lefta, leftb, leftc), (righta, rightb, rightc))
    if lefta ≠ righta
        return compare(lefta, righta)
    else if leftb ≠ rightb
        return compare(leftb, rightb)
    else
        return compare(leftc, rightc)
In JavaScript:
function compare(a, b) {
  if (a is less than b by some ordering criterion) {
    return -1;
  }
  if (a is greater than b by the ordering criterion) {
    return 1;
  }
  // a must be equal to b
  return 0;
}
C example from the PostgreSQL docs:
static int
complex_abs_cmp_internal(Complex *a, Complex *b)
{
    double amag = Mag(a),
           bmag = Mag(b);

    if (amag < bmag)
        return -1;
    if (amag > bmag)
        return 1;
    return 0;
}
You could do something like this:
SELECT DISTINCT ON (interval_alias) *,
       to_timestamp(floor((extract('epoch' FROM index.created_at) / 10)) * 10)
           AT TIME ZONE 'UTC' AS interval_alias
FROM index
WHERE index.created_at >= '{start_date}'
  AND index.created_at <= '{end_date}'
  AND product = '{product_id}'
GROUP BY id, interval_alias
ORDER BY interval_alias;
First you define the expression that will be your ordering column with AS. It can be a function call or any other SQL expression. Then you refer to it in the ORDER BY clause and you're done!
In my opinion, this is the smoothest way to do such an ordering.