NoSQL Injection? (PHP->phpcassa->Cassandra) - nosql

Anyone familiar enough with the Cassandra engine (via PHP using phpcassa lib) to know offhand whether there's a corollary to the sql-injection attack vector? If so, has anyone taken a stab at establishing best practices to thwart them? If not, would anyone like to ; )

No. The Thrift layer used by phpcassa is an rpc framework, not based on string parsing.

An update - Cassandra v0.8 introduced CQL, which might have brought with it the possibility of injection attacks. However:
Prepared statements were then introduced in Cassandra v1.1.0, which help to prevent such attacks.
Furthermore, see this posting which explains features of CQL that make it resistant to injection, including:
each CQL query must contain exactly one statement
as a rule of thumb, there are also no statement types that contain other statements, which would be another common vector for an injection.

Related

SphinxQL with php mysqli/pdo and prepared statements

When querying Sphinx through SphinxQL would you gain the standard benefits of using mysqli/pdo in PHP?
In addition, is there any benefit to using prepared statements with SphinxQL? Are they even supported?
I don't think proper binary (i.e. in the protocol, server-side) prepared statements are supported. They would have to be emulated in software (client-side), which wouldn't bring much benefit.
In general, one of the main reasons for prepared statements (other than SQL injection protection) is to avoid the overhead of full SQL parsing on every command. The SQL dialect understood by Sphinx is much simpler than that of a full-blown database server, so it should in general be much quicker at parsing the incoming statements.
You may as well use mysqli, I would think, but PDO wouldn't bring much benefit.
But at the end of the day, use which is most familiar to you, rather than worrying about the tiny benefits each might bring :)

Databases non-ORM and Scala

What is the best non-ORM database to work with Scala? I found this link, but it does not answer my question fully.
Desirable features would be performance, scalability, and ease of expressing complex structures of relationships between data.
Thanks
Do you mean non-relational? There are Scala client libraries/wrappers for many NoSQL databases, including Cassandra, MongoDB, Redis, Voldemort, CouchDB, etc.
If by "complex structures of relationships between data" you mean that you'd prefer not to have to normalize, any NoSQL database should do reasonably well.
However, note that none of them--to my knowledge--will do anything like enforcing a referential integrity constraint or dereferencing object navigation paths for you. For that you may want to consider a graph database or OODBMS; unfortunately I'm not aware of any that's open source, liberally licensed and clusterable.
Update: I just found OrientDB, which actually meets two of these three criteria.
Update 2: OrientDB's clustering support isn't released yet. As a wise man once said, two out of three ain't bad.
The best solution is probably not to worry about it...
Abstract away from the problem by using the pluggable-persistence support in Akka: http://doc.akkasource.org/persistence
Then you can try them all, and take your pick based on profiling results :)

Smart way to evaluate what is the right NoSQL database for me?

There appears to be a myriad of NoSQL databases available these days:
CouchDB
MongoDB
Cassandra
Hadoop
There's also a boundary between these tools and tools such as Redis that work as a memcached replacement.
Without hand waving and throwing too many buzz words - my question is the following:
How does one intelligently decide which tool here makes the most sense for their project? Are the projects similar enough that the answer is subjective, e.g. Ruby is better than Python or Python is better than Ruby? Or are we talking apples and oranges here, in that each of them solves a different problem?
What's the best way to educate myself on this new trend?
Perhaps one way to think of it is, programming has recently evolved from using one general-purpose language for everything to using the general-purpose language for most things, plus domain-specific languages for the more appropriate parts. For example, you might use Lua to script artificial intelligence of a character in a game.
NoSQL databases might be similar. SQL is the general purpose database with the longest and broadest adoption. While it could be shoehorned to serve many tasks, programmers are beginning to use NoSQL as a domain-specific database when it is more appropriate.
I would argue that the 4 major players you named have quite different feature sets and try to solve different problems with different priorities.
For instance, as far as I know, Cassandra's (and I assume Hadoop's) central focus is on large-scale installations.
MongoDB tries to be a better-scaling alternative to classic SQL servers by providing comparably powerful query functions.
CouchDB's focus is comparably small scale (it will not shard at all, "only" replicate), high durability and easy synchronization of data.
You might want to check out http://nosql-database.org/ for some more information.
I am facing pretty much the same problem as you, and I would say there is no real alternative to looking at all the solutions in detail.
Check out this site: http://cattell.net/datastores/ and in particular the PDF linked at the bottom (CACM Paper). The latter contains an excellent discussion of the relative merits of various data store solutions.
It's easy. NoSQL databases are ACID-compliant databases minus some guarantees. So just decide which guarantees you can do without and find the database that fits. If you don't need durability, for example, maybe Redis is best. Or if you don't need multi-record transactions, then perhaps look into MongoDB.

Where can I get the ANSI or ISO standards for the RDBMS queries?

I want to write some queries which can work in almost all databases without any SQLExceptions. So, where can I get the ANSI standards to write the queries?
Not sure that'll help you.
Vendors are hit and miss as far as standards implementation goes, and often the standards themselves are imprecise enough that you could never write a query that would work with all implementors.
For example, SQL 92 defines the concatenation operator as || but neither MySQL nor MSSQL use this (Oracle does). Vendor-independent string concatenation is impossible.
Similarly, a standard escape character is not specified, so however you handle escaping might not work with all vendors.
Having said that:
SQL 92:
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
Wiki article with links to SQL 99 ISO documents:
http://en.wikipedia.org/wiki/SQL:1999
From wikipedia:
The SQL standard is not freely available. The whole standard may be purchased from the ISO as ISO/IEC 9075(1-4,9-11,13,14):2008.
Nevertheless, I would not advise you to follow this strategy, because no database engine follows any SQL standard (SQL 99, 2003, etc.) to the letter. All of them take liberties in the way they handle instructions or define variables (for example, when comparing two strings, different engines handle case sensitivity differently). A method that is very efficient with one engine can be terribly inefficient with another.
A suggestion would be to develop a standard group of queries and develop different classes that contain the specific implementation of that query for a certain target RDBMS.
Hope this helped
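The suggestion above of one implementation class per target RDBMS could be sketched roughly as follows, using the string concatenation difference mentioned earlier as the example. This is only an illustration under assumptions: the class names (Dialect, SqlStandardDialect, MySqlDialect, SqlServerDialect), the people table and its columns are all hypothetical, and a real project would cover far more than one operator.

```python
from abc import ABC, abstractmethod


class Dialect(ABC):
    """Hypothetical base class: one subclass per target RDBMS."""

    @abstractmethod
    def concat(self, left: str, right: str) -> str:
        """Return the SQL expression that concatenates two string expressions."""

    def full_name_query(self) -> str:
        # The logical query is shared; only the concatenation syntax differs.
        expr = self.concat(self.concat("first_name", "' '"), "last_name")
        return f"SELECT {expr} AS full_name FROM people"


class SqlStandardDialect(Dialect):
    # SQL-92 style || operator (used by Oracle, per the answer above).
    def concat(self, left: str, right: str) -> str:
        return f"{left} || {right}"


class MySqlDialect(Dialect):
    # MySQL offers the CONCAT() function instead.
    def concat(self, left: str, right: str) -> str:
        return f"CONCAT({left}, {right})"


class SqlServerDialect(Dialect):
    # SQL Server concatenates strings with +.
    def concat(self, left: str, right: str) -> str:
        return f"{left} + {right}"


if __name__ == "__main__":
    for dialect in (SqlStandardDialect(), MySqlDialect(), SqlServerDialect()):
        print(type(dialect).__name__, "->", dialect.full_name_query())
```

The calling code only ever asks for full_name_query(), so supporting another engine means adding one more subclass rather than touching every query.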
Check out the BNF of the core SQL grammars available at http://savage.net.au/SQL/
This is part of the answer - the rest, as pointed out by Kiranu and MattMitchell, is that different vendors implement the standard differently. No DBMS adheres perfectly to even SQL-92, though most are pretty close.
One observation: the SQL standard says nothing about indexes - so there is no standard syntax for creating an index. It also says nothing about how to create a database; each vendor has their own mechanisms for doing that.
The SQL-92 standard is probably the one you want to target. I believe it's supported by most of the major RDBMSs.
Here is a less terse link. Sample content:
PostgreSQL: Has views. Breaks standard by not allowing updates to views...
DB2: Conforms to at least SQL-92.
MSSQL: Conforms to at least SQL-92.
MySQL: Conforms to at least SQL-92.
Oracle: Conforms to at least SQL-92.
Informix: Conforms to at least SQL-92.
Something else you might consider, if you're using .NET, is to use the factory pattern in System.Data.Common which does a good job of abstracting provider specifics for a number of RDBMSs.
If you are trying to make a product that will work against multiple databases, I think trying to use only standard SQL is not the way to go, as other answers have indicated, due to the different 'interpretations' of the standard. Instead you should, if possible, have some kind of data access layer in your application which has different implementations specific to each database. Depending on what you are trying to do, there are tools such as Hibernate which will do a lot of the heavy lifting for you in this regard.

Storing parts of user data in files for preventing SQL injection

I am new to web programming and have been exploring issues related to web security.
I have a form where the user can post two types of data - let's call them "safe" and "unsafe" (from the point of view of SQL).
Most places recommend storing both parts of the data in the database after sanitizing the "unsafe" part (to make it "safe").
I am wondering about a different approach - to store the "safe" data in the database and the "unsafe" data in files (outside the database). Of course this approach creates its own set of problems related to maintaining the association between files and DB entries. But are there any other major issues with this approach, especially related to security?
UPDATE: Thanks for the responses! Apologies for not being clear about what I am considering "safe"; some clarification is in order. I am using Django, and the form data that I am considering "safe" is accessed through the form's "cleaned_data" dictionary which does all the necessary escaping.
For the purpose of this question, let us consider a wiki page. The title of the wiki page does not need to have any styling attached to it. So, this can be accessed through the form's "cleaned_data" dictionary which will convert the user input to "safe" format. But since I wish to give users the ability to arbitrarily style their content, I perhaps can't access the content part using the "cleaned_data" dictionary.
Does the file approach solve the security aspects of this problem? Or are there other security issues that I am overlooking?
You know the "safe" data you're talking about? It isn't. It's all unsafe and you should treat it as such. Not by storing it all in files, but by properly constructing your SQL statements.
As others have mentioned, using prepared statements, or a library which simulates them, is the way to go, e.g.
$db->Execute("insert into foo(x,y,z) values (?,?,?)", array($one, $two, $three));
What do you consider "safe" and "unsafe"? Are you considering data with the slashes escaped to be "safe"? If so, please don't.
Use bound variables with SQL placeholders. It is the only sensible way to protect against SQL injection.
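Since the question mentions Django, here is a minimal sketch of bound variables in Python, using the standard library's sqlite3 module rather than the PHP call shown above. The pages table, the insert_page helper and the sample input are hypothetical and only serve to show that a hostile string is stored as plain data when it is passed as a parameter.

```python
import sqlite3


def insert_page(conn: sqlite3.Connection, title: str, body: str) -> None:
    # The ? placeholders are bound by the driver; the values are never
    # spliced into the SQL text, so they cannot change the statement.
    conn.execute("INSERT INTO pages (title, body) VALUES (?, ?)", (title, body))


# In-memory database with a hypothetical wiki table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (title TEXT, body TEXT)")

# Input that would break a query built by string concatenation.
hostile = "x'); DROP TABLE pages; --"
insert_page(conn, "Home", hostile)

# The table still exists and the input was stored verbatim.
rows = conn.execute("SELECT title, body FROM pages").fetchall()
print(rows)
```

The same idea applies to any driver that supports placeholders: the query text stays fixed, and both the "safe" and the "unsafe" fields travel as parameters.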
Splitting your data will not protect you from SQL injection; it'll just limit the data which can be exposed through it, and exposure is not the only risk of the attack. Attackers can also delete data, add bogus data and so on.
I see no justification for your approach, especially given that prepared statements are supported in many, if not all, development platforms and databases.
And that is without even getting into the nightmare that your approach will end up being.
In the end, why would you use a database if you don't trust it? Just use plain files if you wish; a mix is a no-no.
SQL injection can target the whole database, not only a single user, and it is a matter of the query itself being poisoned. So for me the best (if not the only) way to avoid a SQL injection attack is to control your query and protect it from having malicious characters injected, rather than splitting the storage.