PostgreSql Search JSON And Match Whole Term

PostgreSql Search JSON And Match Whole Term - postgresql

I have a Postgres database using JSON storage. I have a table of cameras and lenses with a single property to search against called BrandAndModel. The relevant JSON portion looks like this and is stored in a column called "data":
"BrandAndModel": "nikon nikkor 50mm f/1.4 ai-s"
I have a LIKE query running against this brand and model string but it only returns a result of the sequence of characters matches. For instance, the above does get results for "nikkor 50mm" but NOT "nikon 50mm".
I'm no SQL expert and I'm not sure what I need to use to match more possible combinations.
My query looks like this
SELECT * FROM listing where data ->> 'Product' ->> 'BrandAndModel' like '%nikon 50mm%'
How could I get this query to match "nikon 50mm"?

You may use ANY with an array for multiple comparisons.
LIKE ANY(ARRAY['%nikon%ai-s%', 'nikon%50mm%', '%nikkor%50mm%'])

Related

Redshift Spectrum table doesnt recognize array

I have ran a crawler on json S3 file for updating an existing external table.
Once finished I checked the SVL_S3LOG to see the structure of the external table and saw it was updated and I have new column with Array<int> type like expected.
When I have tried to execute select * on the external table I got this error: "Invalid operation: Nested tables do not support '*' in the SELECT clause.;"
So I have tried to detailed the select statement with all columns names:
select name, date, books.... (books is the Array<int> type)
from external_table_a1
and got this error:
Invalid operation: column "books" does not exist in external_table_a1;"
I have also checked under "AWS Glue" the table external_table_a1 and saw that column "books" is recognized and have the type Array<int>.
Can someone explain why my simple query is wrong?
What am I missing?

Querying JSON data is a bit of a hassle with Redshift: when parsing is enabled (eg using the appropriate SerDe configuration) the JSON is stored as a SUPER type. In your case that's the Array<int>.
The AWS documentation on Querying semistructured data seems pretty straightforward, mentioning that PartiQL uses "dotted notation and array subscript for path navigation when accessing nested data". This doesn't work for me, although I don't find any reasons in their SUPER Limitations Documentation.
Solution 1
What I have to do is set the flags set json_serialization_enable to true; and set json_serialization_parse_nested_strings to true; which will parse the SUPER type as JSON (ie back to JSON). I can then use JSON-functions to query the data. Unnesting data gets even crazier because you can only use the unnest syntax select item from table as t, t.items as item on SUPER types. I genuinely don't think that this is the supposed way to query and unnest SUPER objects but that's the only approach that worked for me.
They described that in some older "Amazon Redshift Developer Guide".
Solution 2
When you are writing your query or creating a query Redshift will try to fit the output into one of the basic column data types. If the result of your query does not match any of those types, Redshift will not process the query. Hence, in order to convert a SUPER to a compatible type you will have to unnest it (using the rather peculiar Redshift unnest syntax).
For me, this works in certain cases but I'm not always able to properly index arrays, not can I access the array index (using my_table.array_column as array_entry at array_index syntax).

How can I prevent SQL injection with arbitrary JSONB query string provided by an external client?

I have a basic REST service backed by a PostgreSQL database with a table with various columns, one of which is a JSONB column that contains arbitrary data. Clients can store data filling in the fixed columns and provide any JSON as opaque data that is stored in the JSONB column.
I want to allow the client to query the database with constraints on both the fixed columns and the JSONB. It is easy to translate some query parameters like ?field=value and convert that into a parameterized SQL query for the fixed columns, but I want to add an arbitrary JSONB query to the SQL as well.
This JSONB query string could contain SQL injection, how can I prevent this? I think that because the structure of the JSONB data is arbitrary I can't use a parameterized query for this purpose. All the documentation I can find suggests I use parameterized queries, and I can't find any useful information on how to actually sanitize the query string itself, which seems like my only option.
For example a similar question is:
How to prevent SQL Injection in PostgreSQL JSON/JSONB field?
But I can't apply the same solution as I don't know the structure of the JSONB or the query, I can't assume the client wants to query a particular path using a particular operator, the entire JSONB query needs to be freely provided by the client.
I'm using golang, in case there are any existing libraries or code fragments that I can use.
edit: some example queries on the JSONB that the client might do:
(content->>'company') is NULL
(content->>'income')::numeric>80000
content->'company'->>'name'='EA' AND (content->>'income')::numeric>80000
content->'assets'#>'[{"kind":"car"}]'
(content->>'DOB')::TIMESTAMP<'2000-01-30T10:12:18.120Z'::TIMESTAMP
EXISTS (SELECT FROM jsonb_array_elements(content->'assets') asset WHERE (asset->>'value')::numeric > 100000)
Note that these don't cover all possible types of queries. Ideally I want any query that PostgreSQL supports on the JSONB data to be allowed. I just want to check the query to ensure it doesn't contain sql injection. For example, a simplistic and probably inadequate solution would be to not allow any ";" in the query string.

You could allow the users to specify a path within the JSON document, and then parameterize that path within a call to a function like json_extract_path_text. That is, the WHERE clause would look like:
WHERE json_extract_path_text(data, $1) = $2
The path argument is just a string, easily parameterized, which describes the keys to traverse down to the given value, e.g. 'foo.bars[0].name'. The right-hand side of the clause would be parameterized along the same rules as you're using for fixed column filtering.

Amazon Redshift get all keys from JSON

I looked at the documentation of Amazon redshift and I'm not able to see a function which will give me what I want.
https://docs.aws.amazon.com/redshift/latest/dg/json-functions.html
I have a column in my database which contains JSON like this
{'en_IN-foo':'bla bla', 'en_US-foo':'bla bla'}
I want to extract all keys from json which have foo. So I want to extract
en_IN-foo
en_US-foo
How can I get what I want? The closest to my requirement is JSON_EXTRACT_PATH_TEXT function but that can only extract the key when you know the key name. in my case I want all keys which have a pattern but I don't know the key names.
I also tried abandoning the JSON function way and going the REGEX way. I wrote this code
select distinct regexp_substr('{en_in-foo:FOO, en_US-foo:BAR}','[^.]{5}-foo')
but this finds only the first match. I need all the matches.

Redshift is not flexible with JSON, so I don't think getting keys from an arbitrary JSON document is possible. You need to know the keys upfront.
option 1
If possible change your JSON document to have a static schema:
{"locale":"en_IN", "foo": "bla bla"}
Or even
{"locale":"en_IN", "name": "foo", "value": "bla bla"}
Option 2
I can see that your prefix may be known to you as it looks like the locale. What you could do is to create a static table of locales, and then CROSS JOIN it with your JSON column.
locales_table:
Id | locale
----------------
1 | en_US
2 | en_IN
The query would look like this:
SELECT
JSON_EXTRACT_PATH_TEXT(json_column, locale || '-foo', TRUE) as foo_at_locale
FROM json_table
CROSS JOIN locales_table
WHERE foo_at_locale IS NOT NULL

Querying on multiple LINKMAP items with OrientDB SQL

I have a class that contains a LINKMAP field called links. This class is used recursively to create arbitrary hierarchical groupings (something like the time-series example, but not with the fixed year/month/day structure).
A query like this:
select expand(links['2017'].links['07'].links['15'].links['10'].links) from data where key='AAA'
Returns the actual records contained in the last layer of "links". This works exactly as expected.
But a query like this (note the 10,11 in the second to last layer of "links"):
select expand(links['2017'].links['07'].links['15'].links['10','11'].links) from data where key='AAA'
Returns two rows of the last layer of "links" instead:
{"1000":"#23:0","1001":"#24:0","1002":"#23:1"}
{"1003":"#24:1","1004":"#23:2"}
Using unionAll or intersect (with or without UNWIND) results in this single record:
[{"1000":"#23:0","1001":"#24:0","1002":"#23:1"},{"1003":"#24:1","1004":"#23:2"}]
But nothing I've tried (including various attempts at "compound" SELECTs) will get the expand to work as it does with the original example (i.e. return the actual records represented in the last LINKMAP).
Is there a SQL syntax that will achieve this?
Note: Even this (slightly modified) example from the ODB docs does not result in a list of linked records:
select expand(records) from
(select unionAll(years['2017'].links['07'].links['15'].links['10'].links, years['2017'].links['07'].links['15'].links['11'].links) as records from data where key='AAA')
Ref: https://orientdb.com/docs/2.2/Time-series-use-case.html

I'm not sure of what you want to achieve, but I think it's worth trying with values():
select expand(links['2017'].links['07'].links['15'].links['10','11'].links.values()) from data where key='AAA'

PostgreSql Dynamic JSON Indexing

I am new to PostgreSql world. We chose this DB so that we could query our JSON results for filter queries like contains, less than , greater than, etc. JSON results are dynamic and we cannot know in advance what keys will be generated as the output. Table (result_id (int64),jsondata(jsonb)) data looks like this
id1,{k1:vab,k2:abc,k3:def}
id1,{k1:abv,k2:v7,k3:ghu}
id1,{k1:v5,k2:vdd,k3:vew}
id1,{k1:v6,k2:v9s,k3:ved}
id2,{k4:vw,k5:vds,k6:vdss}
id2,{k4:v1,k5:fgg,k6:dd}
id2,{k4:qw,k5:gfd,k6:ess}
id2,{k4:er,k5:dfs,k6:fss}
My queries would be something like
Select * from table where result_id = 'id1' and jsondata->'k1' contains 'ab'
My script outputs a json content that I store in this table.
Each json key is represented in a Grid column and json key's values are column data.Grid offers filtering capabilities, which means filtering on JSON data.
My problem is that the filtering can happen on any JSON key, but key names are not static. Keys (json output) might change when the script content is changed So previously indexed keys would become irrelevant. But if the script is not changed the keys remain constant.
How do I apply indexing so that my JSON filter operations become faster? The same table contains many keys within the same JSON row and across rows. Wouldn't it be inefficient to index all keys so that filtering can be made efficient?

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse