Is it possible to query the CKAN datastore with comparisons other than exact matching? - postgresql

Is it possible to pass any of the comparison operators listed here https://www.postgresql.org/docs/9.1/static/functions-comparison.html to the datastore_search api?
I'm aware of the datastore_search_sql function, but it seems like pretty bad practice to be passing SQL queries directly from the frontend.

I'm afraid datastore_search only does = (see https://github.com/ckan/ckan/blob/master/ckanext/datastore/backend/postgres.py#L341). This API call is designed for simple filtering and sorting, mirroring the controls in the resource preview widget.
I'm not clear on your situation with the frontend sending SQL queries, but I don't see much difference between using datastore_search_sql and datastore_search. They are both relatively simple wrappers around Postgres SQL.
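To make the difference concrete, here is a rough Python sketch (the CKAN base URL, resource id and field names are placeholders, not taken from the question): datastore_search filters only express field = value, while anything like > or BETWEEN has to go through datastore_search_sql, which is best built on the server rather than accepting raw SQL from the frontend.

    # Sketch only: base URL, resource id and field names are placeholders.
    import requests

    CKAN = "https://demo.ckan.org/api/3/action"
    RESOURCE_ID = "00000000-0000-0000-0000-000000000000"

    # datastore_search: "filters" supports exact matches only (field = value).
    r = requests.post(f"{CKAN}/datastore_search", json={
        "resource_id": RESOURCE_ID,
        "filters": {"country": "NZ"},   # behaves like WHERE country = 'NZ'
        "limit": 10,
    })
    print(r.json()["result"]["records"])

    # datastore_search_sql: needed for <, >, BETWEEN and friends. Build the SQL
    # on the server side instead of passing it through from the frontend.
    r = requests.post(f"{CKAN}/datastore_search_sql", json={
        "sql": f'SELECT * FROM "{RESOURCE_ID}" WHERE "population" > 1000000 LIMIT 10'
    })
    print(r.json()["result"]["records"])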

Related

Scala generic query predicate generation DSL (possibly similar to AlaSQL in JavaScript)?

I'm building a generic API that provides access to a ton of different data sets, and I would like to avoid creating specific query APIs for each data type, e.g. having to manually implement name="Joe" filtering for the Users endpoint. I would rather the user be able to use some query language, like a SQL WHERE predicate or something similar, to filter down these data sets. The set of data is ever growing, and we need a generic way to form query predicates.
Back when I was working with JavaScript, I used https://github.com/agershun/alasql to do simple predicates over objects in memory.
I'm looking for something similar in Scala. It doesn't need to be SQL; it could be JSON or some other DSL.
I've looked at Calcite, and was able to get it to do WHERE predicates on data, but it took a lot of hacking. The Calcite library is incredibly large and complex. It requires a lot of objects to be instantiated to even generate one query. I don't want to pull that kind of heavy dependency into the project.
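For illustration only (the question asks for a Scala library, so this is just a Python sketch of the kind of generic, data-driven predicate evaluation meant; the records and field names are made up):

    # Sketch: evaluate simple JSON-style filter clauses against in-memory
    # records, roughly what AlaSQL-style predicates do over objects.
    users = [
        {"name": "Joe", "age": 34},
        {"name": "Ann", "age": 28},
    ]

    OPS = {
        "=": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }

    def matches(record, clauses):
        # clauses: list of (field, operator, value) triples, ANDed together
        return all(OPS[op](record[field], value) for field, op, value in clauses)

    print([u for u in users if matches(u, [("name", "=", "Joe")])])
    print([u for u in users if matches(u, [("age", ">", 30)])])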

Combining MongoDB and a GraphDB like Neo4J

As part of a CMS I'm developing, I've got MongoDB as the primary datastore, which feeds into ElasticSearch and Redis. All this is configured declaratively.
I'm currently trying to develop a declarative API in JSON (a DSL of sorts) which, when implemented, will enable me to write uniform queries in JSON, while at the backend these datastores work in tandem to come up with the result. Federated search, if you will.
Now, while fleshing out the supported types of queries for this JSON API, I've come across a class of queries not (efficiently) supported by my current setup: graph-based queries, like friend-of-friend, RDF queries, etc. This is something I'd like to support as well.
So I'm looking for a way to introduce a GraphDB into this ecosystem with the best fit. I should probably say the app-layer sits in Node.js.
I've come across lots of articles comparing Neo4J (a popular GraphDB) with MongoDB, but not many actual use cases or real-world scenarios in which the two complement each other.
Any pointers highly appreciated.
You might want to take a look at structr[1], which has a RESTful graph database backend that you can configure using Java beans. In future versions, there will be a configuration option using REST calls only, so that you can fire up a structr server and configure and use it as a standalone graph database backend.
Just contact us on twitter or via email.
(disclaimer: I'm one of the developers of structr, so this comment may not be 100% impartial :))
[1] http://structr.org
The databases are very much complementary.
Use MongoDB to store your raw data/system of record and load the raw data into Neo4j for additional insights/analysis. When you are dealing with unstructured data, you want to store the information in a datastore which is conducive to unstructured data - MongoDB fits the bill (as do other similar NoSQL databases). While Neo4j is considered a NoSQL database, it doesn't fit the bill for unstructured data: because you have to determine what is a relationship, what is a node, and what properties are stored for each, it's better suited to semi-structured data where you have some understanding of the type of analysis you want to do.
A great architecture is to store your unstructured data in MongoDB and use jobs to load it into Neo4j. This allows you to reload your graph if you figure out there are new pieces of information you'd like to store in the graph for additional analysis.
They are definitely NOT replacements for each other. They fit very different use cases.
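As a rough illustration of the kind of reload job described above (shown in Python even though the question's app layer sits in Node.js; the database, collection, labels and credentials are all placeholders):

    # Sketch of a reload job: read raw documents from MongoDB (the system of
    # record) and project only the parts you want to analyse into Neo4j.
    from pymongo import MongoClient
    from neo4j import GraphDatabase

    mongo = MongoClient("mongodb://localhost:27017")
    articles = mongo["cms"]["articles"]          # placeholder database/collection

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

    def load_article(tx, doc):
        # MERGE keeps the job idempotent, so the whole graph can be rebuilt
        # from MongoDB whenever you decide to store new pieces of information.
        tx.run("MERGE (a:Article {mongoId: $id}) SET a.title = $title",
               id=str(doc["_id"]), title=doc.get("title", ""))
        for author in doc.get("authors", []):
            tx.run("MATCH (a:Article {mongoId: $id}) "
                   "MERGE (p:Person {name: $name}) "
                   "MERGE (p)-[:WROTE]->(a)",
                   id=str(doc["_id"]), name=author)

    with driver.session() as session:
        for doc in articles.find():
            session.execute_write(load_article, doc)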

Using MicroORM for read layer in CQRS

Folks,
I'm considering using a micro ORM such as Dapper.NET for the read access component of a CQRS application (ASP.NET MVC), with Entity Framework being used for manipulating the domain.
This is CQRS light; I am not using event sourcing, etc. I have seen it mentioned several times that the read-only model in CQRS should be as light/simple as possible when querying the data layer, possibly using something like ADO.NET.
That implies potentially hardcoding SQL query strings in our code or in some XML file. How should I go about justifying this approach, where we have to maintain the domain mappings on one side and SQL statements on the other?
Has anyone used micro ORMs in a CQRS solution in this way?
Thanks
Mick
Yes, absolutely you can use Dapper, PetaPoco, Massive, Simple.Data, or any other micro ORM you would like. In the past we have used NHibernate to solve the problem but it was a 10,000 lbs. gorilla compared to what we needed.
One thing that we really liked about Simple.Data and Petapoco in our evaluation of those libraries was that they each could adapt your queries to different database engines (including Mongo) with minimal tweaking necessary, whereas Dapper was basically one big bunch of SQL strings--it was "stringly typed". Don't get me wrong, Dapper's great and is very, very fast and will absolutely work great. Just evaluate your functional and non-functional requirements before committing.
Here are the relative number of downloads using NuGet for each of the primary micro ORMs (as of about 1/1/2012). For us, having a good community with lots of downloads is always a must in order to help iron out issues when they arise:
5568 Simple.Data
4990 Petapoco
4913 Dapper
2203 Massive
1152 OrmLite
Lastly, one thing you may want to investigate is your reasoning behind SQL altogether for your read models. If your domain is publishing events (regardless of event sourcing), and you're writing to simple, flat/non-relational view models, you may be able to get away with something as simple as JSON files that are pushed to the browser which the browser then interprets and uses to populate your HTML templates. There's all kinds of options that are available, you just need to determine what works best in your scenario.
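Whichever storage you pick, the read side stays deliberately thin. A minimal, language-agnostic sketch of that idea (shown here in Python with an in-memory SQLite database; the table and columns are made up): the whole read model is a hand-written, parameterised query mapped straight onto a flat view model, with no ORM or domain mapping involved.

    # Minimal illustration of a "thin" CQRS read side: a raw SQL query mapped
    # straight onto flat view-model dicts. Table and columns are hypothetical.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript("""
        CREATE TABLE order_summary (id INTEGER, customer TEXT, total REAL);
        INSERT INTO order_summary VALUES (1, 'Mick', 42.0), (2, 'Joe', 7.5);
    """)

    def orders_for_customer(customer):
        # Parameterised raw SQL: this query *is* the read model.
        rows = conn.execute(
            "SELECT id, customer, total FROM order_summary WHERE customer = ?",
            (customer,),
        )
        return [dict(r) for r in rows]

    print(orders_for_customer("Mick"))  # [{'id': 1, 'customer': 'Mick', 'total': 42.0}]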

Where can I get the ANSI or ISO standards for the RDBMS queries?

I want to write some queries which can work in almost all the databases without any SQLExceptions. So, where can I get the ANSI standards for writing the queries?
Not sure that'll help you.
Vendors are touch and go as far as standards implementation goes, and often the standards themselves are imprecise enough that you could never write a query that would work with all implementations.
For example, SQL-92 defines the concatenation operator as ||, but neither MySQL nor MSSQL uses this (Oracle does). Vendor-independent string concatenation is impossible.
Similarly, a standard escape character is not specified so how you handled that might not work in all vendors.
Having said that:
SQL 92:
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
Wiki article with links to SQL 99 ISO documents:
http://en.wikipedia.org/wiki/SQL:1999
From wikipedia:
The SQL standard is not freely available. The whole standard may be purchased from the ISO as ISO/IEC 9075(1-4,9-11,13,14):2008.
Nevertheless, I would not advise you to follow this strategy, because no database engine follows any SQL standard (SQL 99, 2003, etc.) to the letter. All of them take liberties in the way they handle instructions or define variables (for example, when comparing two strings, different engines handle case sensitivity differently). A method that is very efficient with one engine can be terribly inefficient for another.
A suggestion would be to define a standard group of queries and develop different classes that contain the specific implementation of each query for a particular target RDBMS.
Hope this helped
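A rough sketch of that per-RDBMS-implementation suggestion (shown in Python; the class names, table, and dialect details are illustrative only) could look like this:

    # Sketch: define the queries you need once, and keep a vendor-specific
    # implementation of each behind a common interface.
    class Queries:
        def full_name(self):
            """Return SQL that concatenates first and last name."""
            raise NotImplementedError

    class PostgresQueries(Queries):
        def full_name(self):
            return "SELECT first_name || ' ' || last_name FROM users"   # SQL-92 ||

    class SqlServerQueries(Queries):
        def full_name(self):
            return "SELECT first_name + ' ' + last_name FROM users"     # T-SQL +

    class MySqlQueries(Queries):
        def full_name(self):
            return "SELECT CONCAT(first_name, ' ', last_name) FROM users"

    def queries_for(vendor):
        return {"postgres": PostgresQueries,
                "mssql": SqlServerQueries,
                "mysql": MySqlQueries}[vendor]()

    print(queries_for("mysql").full_name())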
Check out the BNF of the core SQL grammars available at http://savage.net.au/SQL/
This is part of the answer - the rest, as pointed out by Kiranu and MattMitchell, is that different vendors implement the standard differently. No DBMS adheres perfectly to even SQL-92, though most are pretty close.
One observation: the SQL standard says nothing about indexes - so there is no standard syntax for creating an index. It also says nothing about how to create a database; each vendor has their own mechanisms for doing that.
The SQL-92 standard is probably the one you want to target. I believe it's supported by most of the major RDBMSs.
Here is a less terse link. Sample content:
PostgreSQL: Has views. Breaks standard by not allowing updates to views...
DB2: Conforms to at least SQL-92.
MSSQL: Conforms to at least SQL-92.
MySQL: Conforms to at least SQL-92.
Oracle: Conforms to at least SQL-92.
Informix: Conforms to at least SQL-92.
Something else you might consider, if you're using .NET, is to use the factory pattern in System.Data.Common which does a good job of abstracting provider specifics for a number of RDBMSs.
If you are trying to make a product that will work against multiple databases, I think trying to use only standard SQL is not the way to go, as other answers have indicated, due to the different 'interpretations' of the standard. Instead, you should if possible have some kind of data access layer in your application which has different implementations specific to each database. Depending on what you are trying to do, there are tools such as Hibernate which will do a lot of the heavy lifting in this regard for you.

Storing parts of user data in files for preventing SQL injection

I am new to web programming and have been exploring issues related to web security.
I have a form where the user can post two types of data - let's call them "safe" and "unsafe" (from the point of view of SQL).
Most places recommend storing both parts of the data in the database after sanitizing the "unsafe" part (to make it "safe").
I am wondering about a different approach - storing the "safe" data in the database and the "unsafe" data in files (outside the database). Of course this approach creates its own set of problems related to maintaining the association between files and DB entries. But are there any other major issues with this approach, especially related to security?
UPDATE: Thanks for the responses! Apologies for not being clear regarding what I am considering "safe", so some clarification is in order. I am using Django, and the form data that I am considering "safe" is accessed through the form's "cleaned_data" dictionary, which does all the necessary escaping.
For the purpose of this question, let us consider a wiki page. The title of a wiki page does not need to have any styling attached to it, so it can be accessed through the form's "cleaned_data" dictionary, which will convert the user input to a "safe" format. But since I wish to provide users the ability to arbitrarily style their content, I perhaps can't access the content part using the "cleaned_data" dictionary.
Does the file approach solve the security aspects of this problem? Or are there other security issues that I am overlooking?
You know the "safe" data you're talking about? It isn't. It's all unsafe and you should treat it as such. Not by storing it all in files, but by properly constructing your SQL statements.
As others have mentioned, using prepared statements, or a library which simulates them, is the way to go, e.g.
$db->Execute("insert into foo(x,y,z) values (?,?,?)", array($one, $two, $three));
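The same bound-variable idea in Python's DB-API looks like this (sqlite3 and its ? placeholder are used purely for a self-contained example; other drivers use %s or named placeholders, and the table is made up):

    # Placeholders: the driver sends the values separately from the SQL text,
    # so the "unsafe" string is stored verbatim and never parsed as SQL.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (x TEXT, y TEXT, z TEXT)")

    one, two, three = "a'); DROP TABLE foo; --", "b", "c"   # hostile-looking input

    conn.execute("INSERT INTO foo (x, y, z) VALUES (?, ?, ?)", (one, two, three))

    print(conn.execute("SELECT x FROM foo").fetchone()[0])  # the raw string, intact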
What do you consider "safe" and "unsafe"? Are you considering data with the slashes escaped to be "safe"? If so, please don't.
Use bound variables with SQL placeholders. It is the only sensible way to protect against SQL injection.
Splitting your data will not protect you from SQL injection; it'll just limit the data that can be exposed through it, and that's not the only risk of the attack anyway. Attackers can also delete data, add bogus data, and so on.
I see no justification for using your approach, especially given that prepared statements are supported in many, if not all, development platforms and databases.
And that's without even getting into the nightmare that your approach will end up being.
In the end, why would you use a database if you don't trust it? Just use plain files if you wish; a mix is a no-no.
SQL injection can target the whole database, not only one user's data, and it is a matter of the query itself (a poisoned query). So for me the best way (if not the only way) to avoid an SQL injection attack is to control your query and protect it from being injected with malicious characters, rather than splitting the storage.