Is it possible to evaluate a Postgres expression without connecting to a database? - postgresql

PostgreSQL has excellent support for evaluating JSONPath expressions against JSON data.
For example, this query returns true because the value of the nested field is indeed "foo".
select '{"header": {"nested": "foo"}}'::jsonb #? '$.header ? (#.nested == "foo")'
Notably this query does not reference any schemas or tables. Ideally, I would like to use this functionality of PostgreSQL without creating or connecting to a full database instance. Is it possible to run PostgreSQL in such a way that it doesn't have schemas or tables, but is still able to evaluate "standalone" queries?
Some other context on the project, we need to evaluate JSONPath expressions against JSON data in both a Postgres database and Python application. Unfortunately, Python does not have any JSONPath libraries that support enough of the spec to be useful to us.

Ideally, I would like to use this functionality of PostgreSQL without creating or connecting to a full database instance.
Well, it is open source. You can always pull out the source code for this functionality you want and adapt it to compile by itself. But that seems like a large and annoying undertaking, and I probably wouldn't do it. And short of that, no.
Why do you need this? Are you worried about scalability or ease of installation or performance or what? If you are already using PostgreSQL anyway, firing up a dummy connection to just fire some queries at the JSONB engine doesn't seem too hard.

Related

Using variables for schema and table names in a Redshift query

I want to be able to use the variable names in Redshift which refers to my DB Objects (like schema and table names). Something like...
SET my_schema="schema":
SET my_table="table";
SELECT * from #my_schema.#my_table;
But looks like Redshift doesn't have such feature. Is there any workaround possible to achieve this?
There are a few ways you try to attack this. But first trying to use a database engine for functions beyond querying the database is a waste of horsepower and the road to db lock-in. So I'm going to focus on ways to do this before the database.
The most complete way is to use a front-end system that clients connect to and then this system in turn connects to the db. The one I've used in the past is pgbounce-rr which pools connections to the the db but also allow for modifications to the SQL before being sent on. This will do what you want but you will need a computer to perform this work.
If you use Redshift data-api you could put a Lambda function in series which performs the SQL modifications you desire (but make sure you get your API permissions right). However, I expect it is unlikely that you are looking to move to an API access model.
Many benches support variable substitution and simple replacements in the SQL can be done by the bench. However, this is very dependent on which bench you use and having all users' benches configured correctly.
Bottom line - if you want something to modify your SQL do if before it goes to Redshift.

Upsert in Amazon RedShift without Function or Stored Procedures

As there is no support for user defined functions or stored procedures in RedShift, how can i achieve UPSERT mechanism in RedShift which is using ParAccel, a PostgreSQL 8.0.2 fork.
Currently, i'm trying to achieve UPSERT mechanism using IF...THEN...ELSE... statement
e.g:-
IF NOT EXISTS(SELECT...WHERE(SELECT..))
THEN INSERT INTO tblABC() SELECT... FROM tblXYZ
ELSE UPDATE tblABC SET.,.,.,. FROM tblXYZ WHERE...
which is giving me error. As i'm writing this code independently without including it in function or SP's.
So, is there any solution to achieve UPSERT.
Thanks
You should probably read this article on upsert by depesz. You can't rely on SERIALIABLE for this since, AFAIK, ParAccel doesn't support full serializability support like in Pg 9.1+. As outlined in that post, you can't really do what you want purely in the DB anyway.
The short version is that even on current PostgreSQL versions that support writable CTEs it's still hard. On an 8.0 based ParAccel, you're pretty much out of luck.
I'd do a staged merge. COPY the new data to a temporary table on the server, LOCK the destination table, then do an UPDATE ... FROM followed by an INSERT INTO ... SELECT. Doing the data uploads in big chunks and locking the table for the upserts is reasonably in keeping with how Redshift is used anyway.
Another approach is to externally co-ordinate the upserts via something local to your application cluster. Have all your tools communicate via an external tool where they take an "insert-intent lock" before doing an insert. You want a distributed locking tool appropriate to your system. If everything's running inside one application server, it might be as simple as a synchronized singleton object.

NoSql Injection in Python

when trying to make this question, i got this one it is using Java, and in the answer it gave a Ruby example, and it seems that the injection happens only when using Json? because i've an expose where i'll try to compare between NoSQL and SQL and i was trying to said: be happy, nosql has no sql injection since it's not sql ...
can you please explain me:
how sql injection happens when using Python driver (pymongo).
how to avoid it.
the comparison using the old way sql injection using the comment in the login form.
There are a couple of concerns with injection in MongoDB:
$where JS injection - Building JavaScript functions from user input can result in a query that can behave differently to what you expect. JavaScript functions in general are not a responsible method to program MongoDB queries and it is highly recommended to not use them unless absolutely needed.
Operator injection - If you allow users to build (from the front) a $or or something they could easily manipulate this ability to change your queries. This of course does not apply if you just take data from a set of text fields and manually build a $or from that data.
JSON injection - Quite a few people recently have been trying to convert a full JSON document sent (saw this first in JAVA, ironically) from some client side source into a document for insertion into MongoDB. I shouldn't need to even go into why this is bad. A JSON value for a field is fine since, of course, MongoDB is BSON.
As #Burhan stated injection comes from none sanitized input. Fortunately for MongoDB it has object orientated querying.
The problem with SQL injection comes from the word "SQL". SQL is a querying language built up of strings. On the other hand MongoDB actually uses a BSON document to specify a query (an Object). If you keep to the basic common sense rules I gave you above you should never have a problem with an attack vector like:
SELECT * FROM tbl_user WHERE ='';DROP TABLE;
Also MongoDB only supports one operation per command atm (without using eval, don't ever do that though) so that wouldn't work anyway...
I should add that this does not apply to data validation only injection.
SQL injection has nothing to do with the database. It is a type of vulnerability that allows for execution of arbitrary SQL commands because the target system does not sanitize the SQL that is given to the SQL server.
It doesn't matter if you are on NoSQL or not. If you have a system running on mongodb (or couchdb, or XYZ db), and you provide a front end where users can enter records - and you don't correctly escape and sanitize the input coming from the front end; you are open to SQL injection.

Mongodb access through sql like syntax

Is there any library where i can access mongodb by using sql like syntax.
Example
use db
select * from table1
insert into table1 values (a,b,c)
delete from table
select a,b,count(*) from table1 group by a,b
select a.field1,b.field2 from a,b where a.id=b.id
Thanks
Raman
The learning curve is small only if you are only doing extremely simple sql queries. If the extent of your SQL querying is "select * from X", then MongoDB looks like a brilliant idea to cut through all the too-complicated SQL. But if you need to perform left outer joins, test for null, check for ranges, subselects, grouping and summation, then you will soon end up with a round concave dent in your desk after being moved to Mongo. The sick punchline is that half the time, the thing you are trying to do can't be done in the Mongo interface. Mongo represents a bold new world where instead of databases doing things like aggregation and query optimization, it just stores data and all the magic is done by retrieving everything, slowly, storing it in app memory, and doing all that stuff in code instead.
YES!
A company called UnityJDBC makes a JDBC driver for mongodb. Unlike the mongo java driver, this JDBC driver allows you to run SQL queries against MongoDB and the driver is supported by any Java appliaction that uses JDBC.
to download this driver go to...
http://www.unityjdbc.com/mongojdbc/mongo_jdbc.php
Its free to download too!
hope this helps
MoSQL might satisfy your needs. It'll require you to run a new PostgreSQL instance but from there you can query your entire Mongo dataset with SQL.
"MoSQL imports the contents of your MongoDB database cluster into a PostgreSQL instance, using an oplog tailer to keep the SQL mirror live up-to-date. This lets you run production services against a MongoDB database, and then run offline analytics or reporting using the full power of SQL."
Have a look at this recent project: http://www.mongosql.com/. I've been looking at it over the last few weeks and it looks very promising.
For those of you who have questioned the usefulness of SQL against MongoDB, consider the large number of not-very-technical users in many organizations, like business analysts, who may know SQL, but don't want to make the leap to JavaScript and JSON. Tools like mongoSQL can help push the adoption of MongoDB in an organization.
There are a few solutions out there, but nearly all of them fail to truly represent the MongoDB data model in a way that the "relationally" minded ODBC/JDBC applications and users desire/require. A recent commercial product was released that addresses these challenges
ODBC:
http://www.progress.com/products/datadirect-connect/odbc-drivers/data-sources/mongodb
JDBC:
http://www.progress.com/products/datadirect-connect/jdbc-drivers/data-sources/mongodb
To address the need for ODBC/JDBC (SQL) access...While there are strong arguments for writing new applications using Mongo's clients, there is still a strong need in the marketplace for quality ODBC/JDBC and SQL based access to MongoDB. This need largely arises from all the reporting, analytic, and BI applications that rely on ODBC/JDBC connectivity and do not offer native integration with MongoDB.
Free NoSQL Viewer supports conversion of SQL queries to MongoDB shell syntax. Furthermore, in SQL Viewer you can even use SQL SELECT statements to query MongoDB collections data without knowing MongoDB query syntax. Check out NoSQL Viewer here www.spviewer.com/nosqlviewer.html
Mongodb and its current driver do not support direct SQL like syntax.
However, all operations are easily doable with the driver specific operations.
Here is a brief mapping of mongodb operations to corresponding SQL like query :
http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart
There are a couple projects underway to emulate a SQL interface for MongoDB. While they provide a familiar interface, in general they should be avoided. They operate on a fundamentally flawed premise in that they parse strings and translate them into method calls.
Once you work with MongoDB you will find the approach of using classes and methods a much more accessible interface as it works exactly like all other parts of your application. Yes there is a small learning curve as you first start, but for the most part, the interface in MongoDB works how you would expect it to.

Data Warehousing Postgres

We're considering using SSIS to maintain a PostgreSql data warehouse. I've used it before between SQL Servers with no problems, but am having a lot of difficulty getting it to play nicely with Postgres. I’m using the evaluation version of the OLEDB PGNP data provider (http://www.postgresql.org/about/news.1004).
I wanted to start with something simple like UPSERT on the fact table (10k-15k rows are updated/inserted daily), but this is proving very difficult (not to mention I’ll want to use surrogate keys in the future).
I’ve attempted (Link) and (http://consultingblogs.emc.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx) which are effectively the same (except I don’t really understand the union all at the end when I’m trying to upsert) But I run into the same problem with parameters when doing the update using a OLEDb command – which I tried to overcome using (http://technet.microsoft.com/en-us/library/ms141773.aspx) but that just doesn’t seem to work, I get a validation error –
The external columns for complent.... are out of sync with the datasource columns... external column “Param_2” needs to be removed from the external columns.
(this error is repeated for the first two parameters as well – never came across this using the sql connection as it supports named parameters)
Has anyone come across this?
AND:
The fact that this simple task is apparently so difficult to do in SSIS suggests I’m using the wrong tool for the job - is there a better (and still flexible) way of doing this? Or would another ETL package be better for use between two Postgres database? -Other options include any listed on (http://en.wikipedia.org/wiki/Extract,_transform,_load#Open-source_ETL_frameworks). I could just go and write a load of SQL to do this for me, but I wanted a neat and easily maintainable solution.
I have used the Slowly Changing Dimension wizard for this with good success. It may give you what you are looking for especially with the Wizard
http://msdn.microsoft.com/en-us/library/ms141715.aspx
The External Columns Out Of Sync: SSIS is Case Sensitive - I encountered this issue multiple times and it makes me want to pull my hair out.
This simple task is going to take some work either way. SSIS is by no means an enterprise class ETL product yet, but it does give you some quick and easy functionality, and is sufficient for most ETL work. I guess it is also about your level of comfort with it as well.
SCD is way too slow for what I want. I need to use set based sql.
It turned out that a lot of my problems were with bugs in the provider.
I opened a forum topic (http://www.pgoledb.com/forum/viewtopic.php?f=4&t=49) and had a useful discussion with the moderator/support/developer person.
Also Postgres doesn't let you do cross db querys, so I solved the problem this way:
Data Source from Production DB to a temp Archive DB table
Run set based query between temp table and archive table
Truncate temp table
Note that the temp table is not atchally a temp table, but a copy of the archive table schema to temporarily stored data in.
Took a while, but I got there in the end.
This simple task is going to take some work either way. SSIS is by no means an enterprise class ETL product yet, but it does give you some quick and easy functionality, and is sufficient for most ETL work. I guess it is also about your level of comfort with it as well.
What enterprise ETL solution would you suggest?