When connecting to AWS Athena, a required parameter is s3_staging_dir to specify the output directory of the query. Is there any way to specify this parameter in scalikejdbc? I've tried looking through all of scalikejdbc's docs, but I found nothing of this sort.
Athena doc: http://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html
Scalikejdbc doc: http://scalikejdbc.org/documentation/configuration.html
I just tried to do it using a custom connection pool factory.
I did manage to connect to Athena, but I couldn't execute any SQL since "prepareStatement" is not implemented in the Athena JDBC driver.
So don't try it; it'll be useless.
Sorry :(
PostgreSQL has excellent support for evaluating JSONPath expressions against JSON data.
For example, this query returns true because the value of the nested field is indeed "foo".
select '{"header": {"nested": "foo"}}'::jsonb @? '$.header ? (@.nested == "foo")'
Notably this query does not reference any schemas or tables. Ideally, I would like to use this functionality of PostgreSQL without creating or connecting to a full database instance. Is it possible to run PostgreSQL in such a way that it doesn't have schemas or tables, but is still able to evaluate "standalone" queries?
Some other context on the project, we need to evaluate JSONPath expressions against JSON data in both a Postgres database and Python application. Unfortunately, Python does not have any JSONPath libraries that support enough of the spec to be useful to us.
Ideally, I would like to use this functionality of PostgreSQL without creating or connecting to a full database instance.
Well, it is open source. You can always pull out the source code for this functionality you want and adapt it to compile by itself. But that seems like a large and annoying undertaking, and I probably wouldn't do it. And short of that, no.
Why do you need this? Are you worried about scalability or ease of installation or performance or what? If you are already using PostgreSQL anyway, spinning up a dummy connection just to fire some queries at the JSONB engine doesn't seem too hard.
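A minimal sketch of that "dummy connection" approach: the query selects against literals only, so no schemas or tables are involved. This assumes psycopg2 and a reachable Postgres server; the DSN in the usage comment is made up. The query builder is split out so it can be exercised without a server.

```python
# Evaluate a JSONPath expression against a JSON document using only a
# SELECT over literals -- no schemas or tables required on the server.
import json

def jsonpath_match_query(document: dict, path: str) -> tuple[str, tuple]:
    """Build a parameterized query applying Postgres's @? match operator."""
    sql = "SELECT %s::jsonb @? %s::jsonpath"
    return sql, (json.dumps(document), path)

def jsonpath_match(conn, document: dict, path: str) -> bool:
    """Run the query on an existing psycopg2 connection; returns the boolean."""
    sql, params = jsonpath_match_query(document, path)
    with conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchone()[0]

# Usage (requires a live server; connection details are hypothetical):
#   import psycopg2
#   conn = psycopg2.connect("dbname=postgres user=myUser")
#   jsonpath_match(conn, {"header": {"nested": "foo"}},
#                  '$.header ? (@.nested == "foo")')
```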
I am using tDBRow to query Snowflake, but there are a couple of queries separated by semicolons. I know I need to set MULTI_STATEMENT_COUNT to allow the multiple queries to run, but I don't know where to set it.
Is it in the Snowflake connection tDBConnection, and if so, where? I tried it under Advanced Settings - Additional JDBC Parameters, but it didn't accept that setting.
I am using Athena Federated Query, with a connection to Amazon RDS. When setting up the connection, in the Lambda function's JdbcConnectorConfig section, for DefaultConnectionString I use:
postgres://jdbc:postgresql://xxxxx.xxxxxxx.ap-southeast-2.rds.amazonaws.com:5432/postgres?user=myUser&password=myPassword
This works well. However, I notice in Athena that when I use this connection I get 2 databases:
pg_catalog: is empty
public: where the desired tables are
Do I need to amend this connection string, and how, so that only the tables within public are displayed? I have reviewed the following documentation:
PostgreSQL JDBC Driver
I tried the following, unsuccessfully, based on a similar question on Stack Overflow (Similar question):
postgres://jdbc:postgresql://xxxxx.xxxxxxx.ap-southeast-2.rds.amazonaws.com:5432/postgres?user=myUser&password=myPassword&currentSchema=public
and also
postgres://jdbc:postgresql://xxxxx.xxxxxxx.ap-southeast-2.rds.amazonaws.com:5432/postgres?user=myUser&password=myPassword&searchpath=public
and also
postgres://jdbc:postgresql://xxxxx.xxxxxxx.ap-southeast-2.rds.amazonaws.com:5432/postgres?user=myUser&password=myPassword&options=--search_path=public
I am currently extracting data from PostgreSQL using its own ODBC driver.
The basic parameters described on Connection Strings work so far, but I was not able to find which other parameters are supported.
The Devart ODBC driver also supports a Schema field, which does not seem to work with the driver from the PostgreSQL project.
Last but not least, the documentation of the ODBC driver has a list of connection keywords, but these do not match the ones in Connection Strings either.
Is there any resource or standard describing the Connection String parameters I missed?
You should trust the documentation of the product rather than an unrelated site.
If you want to set the search_path with a connection option, you can use the pqopt parameter, like this:
pqopt={search_path=myschema,public}
Disclaimer: I didn't test it.
I am working on writing a Spring Java program accessing data from Athena, but I found that the Athena JDBC driver does not support PreparedStatement. Does anyone have an idea about how to avoid SQL injection on Athena?
Update: I originally answered this question in 2018, and since then Athena now supports query parameters.
Below is my original answer:
You'll have to format your SQL query as a string before you run it, and include variables by string concatenation.
In other words, welcome to PHP programming circa 2005! :-(
This puts the responsibility on you and your application code to ensure the variables are safe, and don't cause SQL injection vulnerabilities.
For example, you can cast variables to numeric data types before you interpolate them into your SQL.
Or you can create an allowlist when it's possible to declare a limited set of values that may be allowed. If you accept input, check it against the allowlist; if the input is not in the allowlist, don't use it as part of your SQL statement.
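The two defenses above can be sketched like this. The helper names, table names, and query shape are all hypothetical, invented for illustration; the point is that every value interpolated into the SQL is either drawn from a fixed allowlist or forced through a type cast first.

```python
# Hypothetical helpers showing safe interpolation when the driver
# can't parameterize: an allowlist for identifiers, a cast for values.

ALLOWED_TABLES = {"events", "clicks", "impressions"}  # example allowlist

def safe_table(name: str) -> str:
    """Only accept identifiers from a fixed allowlist."""
    if name not in ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {name!r}")
    return name

def safe_int(value) -> int:
    """Cast to int so the interpolated text can only ever be digits."""
    return int(value)

def build_query(table: str, min_id) -> str:
    return f"SELECT * FROM {safe_table(table)} WHERE id >= {safe_int(min_id)}"

print(build_query("events", "42"))
# SELECT * FROM events WHERE id >= 42
```

An injection attempt like `"events; DROP TABLE clicks"` fails the allowlist check and raises before any SQL is assembled.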
I recommend you give feedback to the AWS Athena project and ask them when they will provide support for SQL query parameters in their JDBC driver. Email them at Athena-feedback@amazon.com
See also this related question: AWS Athena JDBC PreparedStatement
Athena now has support for prepared statements (this was not the case when the question was asked).
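For the parameter support mentioned above, Athena's StartQueryExecution API accepts an ExecutionParameters list for queries written with `?` placeholders. A hedged sketch follows; actually executing it requires boto3 and AWS credentials, so the request is built as a plain dict (the bucket and table names are made up).

```python
# Sketch: Athena query parameters via StartQueryExecution's
# ExecutionParameters field (placeholders are written as ? in the SQL).
# Building the request dict needs no AWS access; running it does.

def build_request(sql: str, params: list, output: str) -> dict:
    return {
        "QueryString": sql,
        "ExecutionParameters": params,
        "ResultConfiguration": {"OutputLocation": output},
    }

request = build_request(
    "SELECT * FROM events WHERE user_id = ?",  # hypothetical table
    ["12345"],
    "s3://my-bucket/athena-results/",  # hypothetical output location
)

# Execution (assumes credentials are configured):
#   import boto3
#   athena = boto3.client("athena")
#   response = athena.start_query_execution(**request)
```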
That being said, prepared statements aren't the only way to guard against SQL injection attacks in Athena, and SQL injection attacks aren't as serious as they are in a database.
Athena is just a query engine, not a database. While dropping a table can be disruptive, tables are just metadata, and the data is not dropped along with it.
Athena's API does not allow multiple statements in the same execution, so you can't sneak a DROP TABLE foo into a statement without completely replacing the query.
Athena does not, by design, have any capability of deleting data. Athena has features that can create new data, such as CTAS, but it will refuse to write into an existing location and cannot overwrite existing data.