How to understand the return type? - postgresql

I'm building a framework for rust-postgres.
I need to know what value type row.try_get will return, so that I can store the value in a variable of the appropriate type.
I can get the SQL type from row.columns()[index].type_(), but not whether the value is nullable, so I can't decide whether to put the value in a plain type or in an Option<T>.
I can only use the contents of the row to work this out; I can't do things like "get the table structure from PostgreSQL".
Is there a way?

The reason that the Column type does not expose any way to find out if a result column is nullable is because the database does not return this information.
Remember that result columns are derived from running a query, and that query may contain arbitrary expressions. If the query was a simple SELECT of columns from a table, then it would be reasonably simple to determine if a column could be nullable.
But it could also be a very complex expression, derived from multiple columns, subselects or even custom functions. Postgres can figure out the data type of each column, but in the general case it doesn't know if a result column may contain nulls.
If your application is only performing simple queries, and you know which table column each result column comes from, then you can find out if that table column is nullable like this:
SELECT is_nullable
FROM information_schema.columns
WHERE table_schema='myschema'
AND table_name='mytable'
AND column_name='mycolumn';
If your queries are not that simple then I recommend you always get the result as an Option<T> and handle the possibility that the result might be None.
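The same limitation can be seen in a small sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL (an assumption: it illustrates general driver behaviour, not rust-postgres itself), and the person/pet tables are hypothetical. The pet_name column is declared NOT NULL, yet the LEFT JOIN makes the result column nullable, and the cursor metadata does not report nullability at all; only the catalog (here PRAGMA table_info, SQLite's rough counterpart to information_schema.columns) knows about table-level NOT NULL constraints.

```python
import sqlite3

# In-memory SQLite database: a stand-in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE pet (owner_id INTEGER NOT NULL, pet_name TEXT NOT NULL);
    INSERT INTO person VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO pet VALUES (1, 'rex');
""")

# pet_name is NOT NULL in its table, but the LEFT JOIN can still produce NULL.
cur = conn.execute(
    "SELECT p.name AS name, pet.pet_name AS pet_name "
    "FROM person AS p LEFT JOIN pet ON pet.owner_id = p.id "
    "ORDER BY p.id"
)
rows = cur.fetchall()

# Result metadata: column name plus null_ok (index 6), which is None, i.e. unknown.
result_columns = [(d[0], d[6]) for d in cur.description]

# Table-level nullability comes from the catalog, and only for plain table columns.
notnull = {row[1]: row[3] for row in conn.execute("PRAGMA table_info(pet)")}

print(rows)            # [('ann', 'rex'), ('bob', None)]
print(result_columns)  # [('name', None), ('pet_name', None)]
print(notnull)         # {'owner_id': 1, 'pet_name': 1}
```

In rust-postgres the corresponding defensive move is to request an Option, e.g. row.try_get::<_, Option<i32>>(index), which works because Option<T> implements FromSql and maps NULL to None.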

Related

Redshift Spectrum table doesn't recognize array

I ran a crawler on a JSON file in S3 to update an existing external table.
Once it finished, I checked SVL_S3LOG to see the structure of the external table and saw that it was updated and that I have a new column of type Array<int>, as expected.
When I tried to execute SELECT * on the external table, I got this error: "Invalid operation: Nested tables do not support '*' in the SELECT clause.;"
So I tried listing all the column names explicitly in the SELECT statement:
select name, date, books.... (books is the Array<int> type)
from external_table_a1
and got this error:
"Invalid operation: column "books" does not exist in external_table_a1;"
I also checked the table external_table_a1 in AWS Glue and saw that column "books" is recognized and has the type Array<int>.
Can someone explain why my simple query is wrong?
What am I missing?
Querying JSON data is a bit of a hassle with Redshift: when parsing is enabled (e.g. using the appropriate SerDe configuration), the JSON is stored as a SUPER type. In your case that's the Array<int>.
The AWS documentation on querying semistructured data seems pretty straightforward, mentioning that PartiQL uses "dotted notation and array subscript for path navigation when accessing nested data". This doesn't work for me, although I can't find any reason for that in their SUPER limitations documentation.
Solution 1
What I had to do was set the flags set json_serialization_enable to true; and set json_serialization_parse_nested_strings to true;, which turn the SUPER data back into JSON. I can then use JSON functions to query the data. Unnesting data gets even crazier, because on SUPER types you can only use the unnest syntax select item from table as t, t.items as item. I genuinely don't think this is the intended way to query and unnest SUPER objects, but it's the only approach that worked for me.
This is described in an older version of the "Amazon Redshift Developer Guide".
Solution 2
When you write a query, Redshift will try to fit the output into one of the basic column data types. If the result of your query does not match any of those types, Redshift will not process the query. Hence, to convert a SUPER to a compatible type you have to unnest it (using the rather peculiar Redshift unnest syntax).
For me, this works in certain cases, but I'm not always able to index arrays properly, nor can I access the array index (using the my_table.array_column as array_entry at array_index syntax).

How does implicit casting work in Oracle NoSQL Database?

I am trying to understand the implicit cast behavior.
I have a column called ticketNo; it is a string and it is the primary key.
Using the same data type on both sides, this query returns one row:
SELECT * FROM demo d WHERE ticketNo = "1762386738153"
When I do an explicit cast, this query returns the same row:
SELECT * FROM demo d WHERE cast (ticketNo as Long)= 1762386738153
Now, when I rely on an implicit cast, this query returns no rows:
SELECT * FROM demo d WHERE ticketNo = 1762386738153
Any ideas?
There is no implicit cast behavior in Oracle NoSQL Database. String types are not comparable to Long types, so the predicate ticketNo = 1762386738153 always returns false in your case. A string item is comparable to another string item, and also to an enum item.
In your case this is your primary key, so for the best performance a CAST is not recommended; validate the types before running this query. A primary key is always typed: no wildcard or complex types are accepted.
This behavior is intentional:
"The reason for returning false for incomparable items, instead of raising an error, is to handle truly schemaless applications, where different table rows may contain very different data or differently shaped data. As a result, even the writer of the query may not know what kind of items an operand may return and an operand may indeed return different kinds of items from different rows."
When you do need to compare across types, you can always execute the explicit CAST operation, as you did.
If you are interested in more information: https://docs.oracle.com/en/database/other-databases/nosql-database/20.3/sqlreferencefornosql/value-comparison-operators.html
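To make the rule concrete, here is a deliberately simplified Python model of the equality predicate described above. It is a sketch under stated assumptions, not Oracle's implementation: model_equals and is_number are hypothetical helpers, only string and numeric items are modelled (enums and other item kinds are left out), and Python's bool is excluded from the numeric kind.

```python
def is_number(item: object) -> bool:
    # bool is a subclass of int in Python; treat it as a separate, non-numeric kind.
    return isinstance(item, (int, float)) and not isinstance(item, bool)


def model_equals(left: object, right: object) -> bool:
    """Simplified model of '=': incomparable kinds give False instead of an error."""
    if isinstance(left, str) and isinstance(right, str):
        return left == right  # string vs. string: comparable
    if is_number(left) and is_number(right):
        return left == right  # number vs. number: comparable
    return False  # e.g. string vs. long: not comparable, so the predicate is false


ticket_no = "1762386738153"  # the string primary key value from the question

print(model_equals(ticket_no, "1762386738153"))     # True
print(model_equals(int(ticket_no), 1762386738153))  # True  (explicit cast)
print(model_equals(ticket_no, 1762386738153))       # False (no implicit cast)
```

The last call mirrors the query that returns no rows: no error is raised, the comparison is simply false for every row.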

How can I prevent SQL injection with arbitrary JSONB query string provided by an external client?

I have a basic REST service backed by a PostgreSQL database with a table with various columns, one of which is a JSONB column that contains arbitrary data. Clients can store data filling in the fixed columns and provide any JSON as opaque data that is stored in the JSONB column.
I want to allow the client to query the database with constraints on both the fixed columns and the JSONB. It is easy to translate some query parameters like ?field=value and convert that into a parameterized SQL query for the fixed columns, but I want to add an arbitrary JSONB query to the SQL as well.
This JSONB query string could contain SQL injection; how can I prevent this? Because the structure of the JSONB data is arbitrary, I don't think I can use a parameterized query for this purpose. All the documentation I can find suggests using parameterized queries, and I can't find any useful information on how to actually sanitize the query string itself, which seems to be my only option.
For example a similar question is:
How to prevent SQL Injection in PostgreSQL JSON/JSONB field?
But I can't apply the same solution, because I don't know the structure of the JSONB or of the query: I can't assume the client wants to query a particular path using a particular operator. The entire JSONB query needs to be freely provided by the client.
I'm using Go, in case there are any existing libraries or code fragments that I can use.
Edit: some example queries on the JSONB that the client might send:
(content->>'company') is NULL
(content->>'income')::numeric>80000
content->'company'->>'name'='EA' AND (content->>'income')::numeric>80000
content->'assets'#>'[{"kind":"car"}]'
(content->>'DOB')::TIMESTAMP<'2000-01-30T10:12:18.120Z'::TIMESTAMP
EXISTS (SELECT FROM jsonb_array_elements(content->'assets') asset WHERE (asset->>'value')::numeric > 100000)
Note that these don't cover all possible types of queries. Ideally I want any query that PostgreSQL supports on the JSONB data to be allowed. I just want to check the query to ensure it doesn't contain sql injection. For example, a simplistic and probably inadequate solution would be to not allow any ";" in the query string.
You could allow the users to specify a path within the JSON document, and then parameterize that path within a call to a function like jsonb_extract_path_text (json_extract_path_text is the variant for json columns). That is, the WHERE clause would look like:
WHERE jsonb_extract_path_text(data, VARIADIC $1::text[]) = $2
The path argument is just an array of keys (a text[] value), easily parameterized, which lists the keys and array indexes to traverse down to the given value, e.g. '{foo,bars,0,name}' for foo.bars[0].name. The right-hand side of the clause would be parameterized along the same rules as you're using for fixed column filtering.
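PostgreSQL's jsonb_extract_path_text takes its path as separate text elements (variadic text), not as one dotted string, so a client path written as foo.bars[0].name first has to be split into ['foo', 'bars', '0', 'name']. The sketch below shows one way to do that in Python and pair the result with a fixed SQL string. Assumptions: the dotted/bracketed path syntax, the table name items, the column name content, and the helpers parse_path and build_filter are all hypothetical; the $1/$2 placeholders follow PostgreSQL's native numbered style (as used by drivers such as Go's pgx). The shape check only rejects malformed paths; it is the parameter binding, not the check, that keeps client text out of the SQL.

```python
import re
from typing import List, Tuple

# Whole path: a key with optional [n] indexes, then ".key[n]..." repeated.
_PATH_SHAPE = re.compile(r"[^.\[\]]+(?:\[\d+\])*(?:\.[^.\[\]]+(?:\[\d+\])*)*")
# One path element: either a key or a bracketed array index.
_ELEMENT = re.compile(r"([^.\[\]]+)|\[(\d+)\]")

# Fixed SQL text: client input is only ever passed as bind parameters.
FILTER_SQL = (
    "SELECT id FROM items "
    "WHERE jsonb_extract_path_text(content, VARIADIC $1::text[]) = $2"
)


def parse_path(path: str) -> List[str]:
    """Turn 'foo.bars[0].name' into ['foo', 'bars', '0', 'name']."""
    if not _PATH_SHAPE.fullmatch(path):
        raise ValueError(f"malformed path: {path!r}")
    # Each match fills exactly one group: a key or an index.
    return [key or index for key, index in _ELEMENT.findall(path)]


def build_filter(path: str, value: str) -> Tuple[str, list]:
    """Return the fixed SQL plus its parameters: [$1 path elements, $2 value]."""
    return FILTER_SQL, [parse_path(path), value]


sql, params = build_filter("company.name", "EA")
print(params)  # [['company', 'name'], 'EA']
```

Because the SQL string never changes, a path such as x'; DROP TABLE t; -- simply becomes one (non-existent) key name. Note that this supports only text equality on a single path, which is far narrower than the arbitrary expressions listed in the question.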

PostgreSQL queries treat Int as string datatypes

I store the following rows in my table ('DataScreen') under a JSONB column ('Results')
{"Id":11,"Product":"Google Chrome","Handle":3091,"Description":"Google Chrome"}
{"Id":111,"Product":"Microsoft Sql","Handle":3092,"Description":"Microsoft Sql"}
{"Id":22,"Product":"Microsoft OneNote","Handle":3093,"Description":"Microsoft OneNote"}
{"Id":222,"Product":"Microsoft OneDrive","Handle":3094,"Description":"Microsoft OneDrive"}
In these JSON objects, "Id" and "Handle" are integer properties and the others are string properties.
When I query my table like below
Select Results->>'Id' From DataScreen
order by Results->>'Id' ASC
I get incorrect results: the ->> operator returns text, so PostgreSQL orders the values as text and not as integers.
Hence it gives the result as
11,111,22,222
instead of
11,22,111,222.
I don't want to use explicit casting like this:
Select Results->>'Id' From DataScreen order by CAST(Results->>'Id' AS INT) ASC
because I can't be sure of the data type: the JSON structure is dynamic, the keys and values may change next time, and the same problem could occur with another JSON document that has integer and string values.
I want the integers in the JSON stored in the JSONB column to be treated as integers and not as text (strings).
How do I write my query so that Id and Handle are retrieved as integer values and not as strings, without explicit casting?
I think your assumptions about the id field don't make sense. You said,
(a) Either id contains integers only or
(b) it contains strings and integers.
I'd say,
If (a) then numerical ordering is correct.
If (b) then lexical ordering is correct.
But if (a) for some time and then (b) then the correct order changes, too. And that doesn't make sense. Imagine:
For the current database you expect the order 11,22,111,222. Then you add a row
{"Id":"aa","Product":"Microsoft OneDrive","Handle":3095,"Description":"Microsoft OneDrive"}
and suddenly the correct order of the other rows changes to 11,111,22,222,aa. That sudden change is what bothers me.
So I would either expect a lexical ordering ab initio, or restrict my id field to integers and use explicit casting.
Every other option I can think of is just not practical. You could, for example, create a custom < and > implementation for your id field which results in 11,22,111,222,aa ("order all integers by numerical value and all strings by lexical order, and put all integers before the strings").
But that is a lot of work (it involves a custom data type, a custom cast function and a custom operator function) and yields some counterintuitive results, e.g. 11,22,111,222,0a,1a,2a,aa (note the position of 0a and so on: they come after 222).
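The orderings discussed in this answer can be compared with a quick Python sketch. Python's code-point string ordering stands in for the database's text collation (an assumption; collations can differ for other characters), and ints_then_strings is a hypothetical sort key implementing the "integers first by value, then strings lexically" rule.

```python
from typing import Tuple, Union

Item = Union[int, str]

ids_as_text = ["11", "111", "22", "222"]  # what Results->>'Id' yields: text values

lexical = sorted(ids_as_text)                 # like ORDER BY Results->>'Id'
numeric = sorted(ids_as_text, key=int)        # like ORDER BY (Results->>'Id')::int
with_string = sorted(ids_as_text + ["aa"])    # lexical order after adding "aa"


def ints_then_strings(item: Item) -> Tuple[int, int, str]:
    """Hypothetical custom order: integers by value first, then strings lexically."""
    if isinstance(item, int):
        return (0, item, "")
    return (1, 0, item)


mixed = sorted([11, 111, 22, 222, "0a", "1a", "2a", "aa"], key=ints_then_strings)

print(lexical)      # ['11', '111', '22', '222']
print(numeric)      # ['11', '22', '111', '222']
print(with_string)  # ['11', '111', '22', '222', 'aa']
print(mixed)        # [11, 22, 111, 222, '0a', '1a', '2a', 'aa']
```

The tuple key keeps integers and strings in separate groups, so Python never has to compare an int with a str directly.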
Hope, that helps ;)
If Id is always an integer, you can cast it in the SELECT list and just use ORDER BY 1:
select (Results->>'Id')::int From DataScreen order by 1 ASC

What is the difference between type record and type row in PostgreSQL?

As the title says: while reading the manual, I found the record type and the row type, which are both composite types. I want to understand how they differ.
They're similar once defined but tend to have different use cases.
A RECORD type has no predefined structure and is typically used when the row type might change or is out of your control, for example if you're referencing a record in a FOR loop.
A %ROWTYPE variable has the predefined structure of a particular table's row, so if anything deviates from that structure you will get runtime errors.
It all depends what you're trying to achieve.
For cursor loops I use a RECORD.
For more information:
http://www.postgresql.org/docs/current/static/plpgsql-declarations.html