I need to execute some query that returns (SELECTs) a PostgreSQL array, but the documentation of SOCI's PostgreSQL backend doesn't mention anything about arrays.
If I just try to put it into a soci::rowset, it thinks that it is a string column and returns a string like "{1, 2, 3}" which I would hate to parse. Is there any way to get SOCI to handle that data type automatically, either with soci::into or via a soci::rowset? Or do I have to resort to JOINing with the array in order to get separate rows in the resultset?
I'm using SOCI 3.2 and PostgreSQL 9.3.
I have run a crawler on a JSON S3 file to update an existing external table.
Once it finished, I checked SVL_S3LOG to see the structure of the external table and saw that it was updated and that I have a new column with the Array<int> type, as expected.
When I tried to execute select * on the external table I got this error: "Invalid operation: Nested tables do not support '*' in the SELECT clause.;"
So I tried to spell out the select statement with all the column names:
select name, date, books.... (books is the Array<int> type)
from external_table_a1
and got this error:
Invalid operation: column "books" does not exist in external_table_a1;"
I have also checked the table external_table_a1 under "AWS Glue" and saw that the column "books" is recognized and has the type Array<int>.
Can someone explain why my simple query is wrong?
What am I missing?
Querying JSON data is a bit of a hassle with Redshift: when parsing is enabled (e.g. using the appropriate SerDe configuration) the JSON is stored as a SUPER type. In your case that's the Array<int>.
The AWS documentation on Querying semistructured data seems pretty straightforward, mentioning that PartiQL uses "dotted notation and array subscript for path navigation when accessing nested data". This doesn't work for me, and I can't find any reason for that in their SUPER Limitations documentation.
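For reference, the navigation described there would look roughly like this on the question's table (just a sketch of the documented form; as said, it did not work for me):
SELECT t.name, t.books[0]
FROM external_table_a1 AS t;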
Solution 1
What I have to do is set the flags set json_serialization_enable to true; and set json_serialization_parse_nested_strings to true;, which makes Redshift serialize the SUPER type as JSON (i.e. back to JSON text). I can then use JSON functions to query the data. Unnesting the data gets even crazier, because on SUPER types you can only use the unnest syntax select item from table as t, t.items as item. I genuinely don't think that this is the intended way to query and unnest SUPER objects, but it's the only approach that worked for me.
They described that in some older "Amazon Redshift Developer Guide".
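To make that concrete, a minimal sketch (JSON_SERIALIZE and JSON_EXTRACT_ARRAY_ELEMENT_TEXT are just examples of the JSON functions I mean; your column names may differ):
-- serialize SUPER values back to JSON text for this session
set json_serialization_enable to true;
set json_serialization_parse_nested_strings to true;
-- the SUPER column can then be treated as JSON text, e.g. grab the first array element
select json_extract_array_element_text(json_serialize(books), 0)
from external_table_a1;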
Solution 2
When you write a query, Redshift will try to fit the output into one of the basic column data types. If the result of your query does not match any of those types, Redshift will not process the query. Hence, in order to convert a SUPER value to a compatible type you have to unnest it (using the rather peculiar Redshift unnest syntax).
For me, this works in certain cases, but I'm not always able to properly index arrays, nor can I always access the array index (using the my_table.array_column as array_entry at array_index syntax).
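For reference, that unnest-with-index form looks roughly like this on the question's table (a sketch only, since as noted it doesn't always behave for me):
-- each array element becomes a row; "at" exposes its position
select book, idx
from external_table_a1 as t, t.books as book at idx;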
I have a basic REST service backed by a PostgreSQL database with a table with various columns, one of which is a JSONB column that contains arbitrary data. Clients can store data filling in the fixed columns and provide any JSON as opaque data that is stored in the JSONB column.
I want to allow the client to query the database with constraints on both the fixed columns and the JSONB. It is easy to translate some query parameters like ?field=value and convert that into a parameterized SQL query for the fixed columns, but I want to add an arbitrary JSONB query to the SQL as well.
This JSONB query string could contain SQL injection; how can I prevent this? I think that, because the structure of the JSONB data is arbitrary, I can't use a parameterized query for this purpose. All the documentation I can find suggests using parameterized queries, and I can't find any useful information on how to actually sanitize the query string itself, which seems like my only option.
For example a similar question is:
How to prevent SQL Injection in PostgreSQL JSON/JSONB field?
But I can't apply the same solution, as I don't know the structure of the JSONB or of the query: I can't assume the client wants to query a particular path using a particular operator; the entire JSONB query needs to be freely provided by the client.
I'm using golang, in case there are any existing libraries or code fragments that I can use.
edit: some example queries on the JSONB that the client might do:
(content->>'company') is NULL
(content->>'income')::numeric>80000
content->'company'->>'name'='EA' AND (content->>'income')::numeric>80000
content->'assets' @> '[{"kind":"car"}]'
(content->>'DOB')::TIMESTAMP<'2000-01-30T10:12:18.120Z'::TIMESTAMP
EXISTS (SELECT FROM jsonb_array_elements(content->'assets') asset WHERE (asset->>'value')::numeric > 100000)
Note that these don't cover all possible types of queries. Ideally I want any query that PostgreSQL supports on the JSONB data to be allowed. I just want to check the query to ensure it doesn't contain SQL injection. For example, a simplistic and probably inadequate solution would be to disallow any ";" in the query string.
You could allow the users to specify a path within the JSON document, and then parameterize that path within a call to a function like json_extract_path_text. That is, the WHERE clause would look like:
WHERE json_extract_path_text(data, $1) = $2
The path is just a sequence of keys (and array indices) describing how to traverse down to the given value, e.g. 'foo', 'bars', '0', 'name', and it can be bound as a single parameter. The right-hand side of the clause would be parameterized along the same rules as you're using for fixed column filtering.
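If the client should be able to address nested keys, the jsonb variant of the function accepts the whole path as one array argument, so it can still be bound as a single parameter. A minimal sketch, assuming the JSONB column is called data as above:
-- $1 is bound to a text[] such as '{company,name}', $2 to the value to compare against
WHERE jsonb_extract_path_text(data, VARIADIC $1) = $2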
Oracle has the ability to do bulk inserts by passing arrays as bind variables. The database then does a separate row insert for each member of the array:
http://www.oracle.com/technetwork/issue-archive/2009/09-sep/o59odpnet-085168.html
Thus if I have an array:
string[] arr = { "1", "2", "3" };
And I pass this as a bind to my SQL:
insert into my_table(my_col) values (:arr)
I end up with 3 rows in the table.
Is there a way to do this in PostgreSQL w/o modifying the SQL? (i.e. I don't want to use the copy command, an explicit multirow insert, etc)
The nearest thing you can use is:
insert into my_table(my_col) SELECT unnest(:arr)
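For example, with the bind expanded to a literal array (purely illustrative values):
insert into my_table(my_col) select unnest(array[1, 2, 3]);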
PgJDBC supports COPY, and that's about your best option. I know it's not what you want, and it's frustrating that you have to use a different row representation, but it's about the best you'll get.
That said, you will find that if you prepare a statement then addBatch and executeBatch, you'll get pretty solid performance. Sufficiently so that it's not usually worth caring about using COPY. See Statement.executeBatch. You can create "array bind" on top of that with a trivial function that's a few lines long. It's not as good as server-side array binding, but it'll do pretty well.
No, you cannot do that in PostgreSQL.
You'll either have to use a multi-row INSERT or a COPY statement.
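As a sketch, with placeholder values:
-- explicit multi-row INSERT
insert into my_table(my_col) values (1), (2), (3);
-- or COPY, streaming the rows from the client
copy my_table(my_col) from stdin;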
I'm not sure which language you're targeting, but in Java, for example, this is possible using Connection.createArrayOf().
Related question / answer:
error setting java String[] to postgres prepared statement
I have a CLOB(2000000) field in a db2 (v10) database, and I would like to run a simple UPDATE query on it to replace each occurances of "foo" to "baaz".
Since the contents of the field are more than 32k, I get the following error:
"{some char data from field}" is too long.. SQLCODE=-433, SQLSTATE=22001
How can I replace the values?
UPDATE:
The query was the following (changed UPDATE into SELECT for easier testing):
SELECT REPLACE(my_clob_column, 'foo', 'baaz') FROM my_table WHERE id = 10726
UPDATE 2
As mustaccio pointed out, REPLACE does not work on CLOB fields (or at least not without doing a cast to VARCHAR on the data - which in my case is not possible, since the size of the data is more than 32k) - the question is about finding an alternative way to achieve the REPLACE functionality for CLOB fields.
Thanks,
krisy
Finally, since I found no way to do this with an SQL query, I ended up exporting the table, editing its LOB content in Notepad++, and importing the table back again.
Not sure if this applies to your case: there are 2 different REPLACE functions offered by DB2, SYSIBM.REPLACE and SYSFUN.REPLACE. The version of REPLACE in SYSFUN accepts CLOBs and supports values up to 1 MByte. If your values are longer than that, you would need to write your own (SQL-based?) function.
BTW: You can check function resolution by executing "values(current path)"
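Putting that together, a sketch using the table and column names from the question (keep in mind the 1 MByte limit still applies):
-- check which schema's REPLACE is resolved first
values(current path);
-- call the SYSFUN version explicitly, since it accepts CLOB arguments
UPDATE my_table
SET my_clob_column = SYSFUN.REPLACE(my_clob_column, 'foo', 'baaz')
WHERE id = 10726;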
I have a generic code that is used to retrieve DDL information from a Firebird database (FB2.1). It generates SQL code like
SELECT * FROM MyTable where 'c' <> 'c'
I cannot change this code. Actually, if that matters, it is inside Report Builder 10.
The fact is that some tables in my database are becoming a little too populated (>1M records) and that query is starting to take too long to execute.
If I try to execute
SELECT * FROM MyTable where SomeIndexedField = SomeImpossibleValue
it will obviously use that index and run very quickly.
Well, it shouldn't be that hard for the database to figure out that this is an impossible condition, apply some sort of optimization, and avoid testing it against each row.
Is there any way to make my Firebird database optimize that query?
As the filter condition is a negative proposition (and also doesn't reference a column to search on, but only a value to compare to another value), Firebird needs to do a full table scan (without using any index) to confirm that there isn't any record that meets your criteria.
If you can't change the query, you need to wait for the upcoming 3.0 version, which will implement the Boolean data type and therefore should start to evaluate "constant" fake comparisons in advance (maybe the client library will do this evaluation before sending the statement to the server?).