How to safely store a string with apostrophe in JSONB in postgres

I have a case where addresses and country names have special characters. For example:
People's Republic of Korea
De'Paul & Choice Street
etc.
This data gets sent as a JSON payload to the backend to be inserted into a JSONB column in postgres.
The insert statement gets messed up because of the single quote and ends up erroring out.
The front-end developers say that they are using popular libraries to get country names etc. and don't want to touch the data. They just want to pass it as is.
Any tips on how to process such data with special characters, especially characters that conflict with JSON-formatted data, and safely insert it into postgres?

Your developers are using the popular libraries, whatever they may be, in the wrong fashion. The application is obviously vulnerable to SQL injection, the most popular way to attack a database application.
Use prepared statements, then the problem will go away. If you cannot do that, use the popular library's functions to escape the input string for use as an SQL string literal.
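A minimal sketch of the bound-parameter approach, written in Python. The standard-library sqlite3 module is used here only as a stand-in so the snippet runs anywhere; the table name addresses and the column doc are made up for illustration. With a PostgreSQL driver the idea is the same, only the placeholder syntax and the column type (JSONB) differ.

```python
import json
import sqlite3

def insert_address(conn, payload):
    # The JSON text travels as a bound parameter, so apostrophes in it
    # never become part of the SQL statement itself.
    conn.execute("INSERT INTO addresses (doc) VALUES (?)", (json.dumps(payload),))

# Hypothetical table, in-memory database used purely for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (doc TEXT)")

payload = {
    "country": "People's Republic of Korea",
    "street": "De'Paul & Choice Street",
}
insert_address(conn, payload)

# Read the value back to confirm it survived unchanged.
stored = json.loads(conn.execute("SELECT doc FROM addresses").fetchone()[0])
```

The front end can keep sending the data as is; nothing in the payload needs to be touched before it is bound.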

Related

REST API - string or numerical identifier in URL

We're developing a REST API for our platform. Let's say we have organisations and projects, and projects belong to organisations.
After reading this answer, I would be inclined to use numerical IDs in the URL, so that some of the URLs would become (say with a prefix of /api/v1):
/organisations/1234
/organisations/1234/projects/5678
However, we want to use the same URL structure for our front end UI, so that if you type these URLs in the browser, you will get the relevant webpage in the response instead of a JSON file. Much in the same way you see relevant names of persons and organisations in sites like Facebook or Github.
Using this, we could get something like:
/organisations/dutchpainters
/organisations/dutchpainters/projects/nightwatch
It looks like Github actually exposes their API in the same way.
The advantages and disadvantages I can come up with for using names instead of IDs in URLs are the following:
Advantages:
More intuitive URLs for end users
1 to 1 mapping of front end UI and JSON API
Disadvantages:
Have to use unique names
Have to take care of conflict with reserved names, such as count, so later on, you can still develop an API endpoint like /organisations/count and actually get the number of organisations instead of the organisation called count.
Especially the latter one seems to become a potential pain in the rear. Still, after reading this answer, I'm almost convinced to use the string identifier, since it doesn't seem to make a difference from a convention point of view.
My questions are:
Did I miss important advantages / disadvantages of using strings instead of numerical IDs?
Did Github develop their string-based approach after their platform matured, or did they know from the start that it would imply some limitations (like the one I mentioned earlier, it seems that they did not implement such functionality)?

It's common to use a combination of both:
/organisations/1234/projects/5678/nightwatch
where the last part is simply ignored by the server and is only there to make the URL more readable.
In your case, with multiple levels of collections you could experiment with this format:
/organisations/1234/dutchpainters/projects/5678/nightwatch
If somebody writes
/organisations/1234/germanpainters/projects/5678/wanderer
it would still map to the same Rembrandt project, but that should be OK. That leaves room for editing the names without breaking URLs that are already out there. Also, names don't have to be unique if you don't really need that.
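A rough sketch of how such a route could be resolved, in Python. The regular expression and the function name resolve are my own assumptions, not part of any framework: the numeric IDs are captured and the readable name segments are accepted but ignored.

```python
import re

# Each ID may be followed by one optional, purely decorative name segment.
ROUTE = re.compile(
    r"^/organisations/(?P<org_id>\d+)(?:/[^/]+)?"
    r"/projects/(?P<project_id>\d+)(?:/[^/]+)?/?$"
)

def resolve(path):
    """Return (org_id, project_id) for a matching path, or None."""
    match = ROUTE.match(path)
    if match is None:
        return None
    return int(match.group("org_id")), int(match.group("project_id"))

canonical = resolve("/organisations/1234/dutchpainters/projects/5678/nightwatch")
renamed = resolve("/organisations/1234/germanpainters/projects/5678/wanderer")
bare = resolve("/organisations/1234/projects/5678")
```

All three paths resolve to the same pair of IDs, which is what makes renaming safe.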

Reserved HTTP characters: such as “:”, “/”, “?”, “#”, “[”, “]” and “@” – These characters and others are “reserved” in the HTTP protocol to have “special” meaning in the implementation syntax so that they are distinguishable from other data in the URL. If a variable value within the path contains one or more of these reserved characters then it will break the path and generate a malformed request. You can work around reserved characters in query string parameters by URL encoding them or sometimes by double escaping them, but you cannot in path parameters.
https://www.serviceobjects.com/blog/path-and-query-string-parameter-calls-to-a-restful-web-service

Consecutive numerical IDs are no longer recommended because they make it very easy to guess records in your database, and some might use that to obtain information they should not have access to.
Numerical IDs are used because in the database they have fixed-length storage, which makes indexing easy. For example, INT takes 4 bytes in MySQL and BIGINT takes 8 bytes, so every number has the same length in memory (100 stored as INT takes as much space as 200), which makes it very easy to index and search for records.
If you have a lot of entries in the database, then indexing on a VARCHAR field is a bad idea. You should use a fixed-width field like CHAR(32) and pad the difference with spaces, but you then have to add logic in your program to handle the padding when searching the database.
Another idea would be to use slugs, but here you should take into consideration that some records might end up with the same slug, depending on what you use to form it. https://en.wikipedia.org/wiki/Semantic_URL#Slug
I would recommend using UUIDs since they have the same length and resolve this issue easily.
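For illustration, a small Python sketch of generating such identifiers with the standard uuid module. The helper name new_public_id is hypothetical; version 4 (random) UUIDs are one possible choice, picked here because they are not guessable from neighbouring records.

```python
import uuid

def new_public_id():
    # Random (version 4) UUID rendered in its canonical 36-character form.
    return str(uuid.uuid4())

first = new_public_id()
second = new_public_id()

# The same value without dashes is 32 hex characters, so it would fit a CHAR(32) column.
compact = uuid.UUID(first).hex
```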

How can I use ormlite to escape my insert?

I have ormlite integrated into an application I'm working on. Right now I'm trying to build in functionality to easily switch from automatically inserting data to the database to outputting the equivalent collection of insert statements to a file for later use. The data isn't user input but still requires proper escaping to handle basic gotchas like apostrophes.
Ideas I've burned through:
Dao.create() writes to the database directly, so that's a no-go.
QueryBuilder can't handle inserts.
JdbcDatabaseConnection.compileStatement() might work but the amount of setup required is inappropriate.
Using a java.sql.PreparedStatement has a reasonable enough interface (if toString() returns the SQL like I would hope) but it's not compatible with ormlite's connection types.
This should be very easy and if it is, I can't find the right combination of method calls to make it happen.

Right now I'm trying to build in functionality to easily switch from automatically inserting data to the database to outputting the equivalent collection of insert statements to a file for later use.
Interesting. So one hack would be to use the MappedCreate class. The MappedCreate.build(...) method takes a DatabaseType and a TableInfo which is available from the dao.getTableInfo().
The mappedCreate.toString() exposes the generated INSERT statement (with a prefix), which might help, but you would still need to replace the ? arguments with the actual values, with quotes escaped. That you would have to do in your own code.
Hope this helps somewhat.
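The "do it in your own code" step could look roughly like the following. This is a language-neutral sketch in Python, not ORMLite API: the functions sql_literal and render_insert are hypothetical, and it assumes the generated statement contains ? only as placeholders and that the target database uses standard SQL string literals, where a single quote is escaped by doubling it.

```python
def sql_literal(value):
    """Render a Python value as a standard SQL literal (assumed dialect)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: wrap in single quotes and double any embedded single quote.
    return "'" + str(value).replace("'", "''") + "'"

def render_insert(template, args):
    """Replace each ? in the template with the matching escaped literal."""
    parts = template.split("?")
    if len(parts) - 1 != len(args):
        raise ValueError("placeholder count does not match argument count")
    out = [parts[0]]
    for arg, tail in zip(args, parts[1:]):
        out.append(sql_literal(arg))
        out.append(tail)
    return "".join(out)

statement = render_insert(
    "INSERT INTO addresses (id, street) VALUES (?, ?)",
    [7, "De'Paul & Choice Street"],
)
```

Databases that also treat backslashes as escape characters inside string literals would need extra handling, so check the target dialect before relying on this.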

Rest Search Query handling special characters design standards

I am designing a rest api where users can pass in queries using a search query language I will define.
The language will allow a number of operators such as eq, ne, gt, lt (equals, not equals, greater than, less than), etc.
The language will allow grouping and logical operators AND and OR.
So for example a query about companies may look like the following
/api/companies?q=(CompanyName eq Microsoft Or CompanyName eq Apple) And State eq California
So this should give me all companies where company name equals 'Microsoft' or 'Apple' and the state is California.
So this all works fine except for the fact that the system that I am writing the api against is extremely flexible and allows almost any character to be inserted into fields values. Additionally, I also must support custom fields and those are able to have special characters in the field name.
Initially my main concern was fields that contained parentheses. I will be converting this query into a SQL Server query and I need a way to ensure that I do not confuse a parenthesis in a field value with one that is intended for grouping. My second thought was to force field values to be quoted, but I think this will also cause similar problems.
I was also considering that there may be a simple approach involving html encoding, but I am unable to see exactly how that would work.
What I am looking for is any advice or examples of reasonable approaches to handle a rest search query with such flexible data.

You should use percent encoding to escape characters in your query string, see RFC 3986. This previous StackOverflow post contains some useful background information about URI encoding.
Initially my main concern was fields that contained parentheses. I will be converting this
query into a SQL Server query and I need a way to ensure that I do not confuse a parenthesis
in a field value with one that is intended for grouping
If this might be a problem then it sounds like your application will be susceptible to SQL injection. You should be escaping any external data before constructing an SQL query.
/api/companies?q=(CompanyName eq Microsoft Or CompanyName eq Apple) And State eq California
Based on this example you could take advantage of the URI query string to better represent your query:
/api/companies?CompanyName=Microsoft%20OR%20Apple&State=California
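As a sketch of the percent-encoding step, here is how that query string could be produced in Python with the standard urllib.parse module. The variable names are illustrative only. Passing quote_via=quote makes spaces come out as %20 rather than +.

```python
from urllib.parse import parse_qs, quote, urlencode

params = {"CompanyName": "Microsoft OR Apple", "State": "California"}

# Percent-encode every value; reserved characters such as ( ) ' & are escaped too.
query = urlencode(params, quote_via=quote)
url = "/api/companies?" + query

# A value containing parentheses and an apostrophe round-trips safely.
tricky = urlencode({"CompanyName": "De'Paul (Holdings) & Co"}, quote_via=quote)
decoded = parse_qs(tricky)["CompanyName"][0]
```

Encoding only protects the URL syntax. The decoded value still has to be bound as a parameter, not concatenated, when the SQL query is built.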

Here is an example.
http://www.sqlservercentral.com/articles/Full-Text+Search+(2008)/64248/

Zend Framework disable string escape in ->insert

How can I disable string escaping in $db->insert? I need to insert HTML into my database, so I don't want any string escaping. Any solutions?

You don't want to disable that escaping.
Escaping data doesn't prevent you from inserting anything. In fact, quite the opposite: escaping data enables you to properly insert characters like quote marks that could otherwise confuse the database. More importantly, passing unescaped data directly to a database exposes an enormous security hole, making it trivial for a "hacker" (if we use the term liberally) to gain unrestricted access to your site and to your database.
You're probably confusing SQL escaping (which escapes data for use in SQL queries) with htmlspecialchars(), which escapes data for use on webpages. The two are unrelated.
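To make the distinction concrete, here is a small sketch in Python rather than PHP, using the standard sqlite3 and html modules as stand-ins; the table pages is made up. The HTML goes into the database byte for byte through a bound parameter, and HTML escaping is a separate step that only matters when the text is later displayed.

```python
import html
import sqlite3

snippet = '<p class="note">It\'s <b>bold</b></p>'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (body TEXT)")

# SQL side: the driver handles quoting, and the stored value is unchanged.
conn.execute("INSERT INTO pages (body) VALUES (?)", (snippet,))
stored = conn.execute("SELECT body FROM pages").fetchone()[0]

# HTML side: only needed if the markup should be shown as text on a page.
as_text = html.escape(stored)
```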

Core Data Query slow

What's the secret to pulling up items that match characters typed into the search bar that react instantaneously? For instance, if I type in a letter "W" in the search bar, all phrases that contain a letter "W" in any character position within the phrase are returned immediately.
So if a database of 20,000 phrases contains 500 phrases with the letter "W", they would appear as soon as the user typed the first character. Then, as additional characters are typed, the list would automatically get shorter.
I can send queries up to a SQL server from the iPhone and get this type of response. However, no matter what we try, including the suggestions of other users, we still can't get a good response time when storing the database locally on the iPhone.
I know that this performance is available, because there are many other apps out there that display results as soon as you start typing.
Please note that this isn't the same as indexing all words in every phrase, as this only will bring up matches where the word starts with the character typed in. In this case, we're looking for characters within words.

I think asynchronous results filtering is the answer. Instead of updating the search results every time the user types a new character, put the db query on a background thread when the first character is typed. If a new character is typed before the query is finished, cancel the old query and start a new one. Finally, you will get to the point where the user stops typing long enough for the query to return. That way, the query itself never blocks the user's typing.
I believe the UISearchDisplayController class offers this type of asynchronous search, though whether you want to use that class or just adopt the asynchronous design pattern from it is up to you.
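The cancel-and-restart pattern can be sketched independently of iOS. The following Python example uses asyncio; the SearchBox class, the phrase list, and the artificial delay standing in for the slow database query are all assumptions made for the demo.

```python
import asyncio

PHRASES = ["Wanderer", "Night Watch", "Sunflowers", "The Swing"]

async def slow_query(text):
    # Stand-in for the database lookup; the sleep simulates its latency.
    await asyncio.sleep(0.05)
    return [p for p in PHRASES if text.lower() in p.lower()]

class SearchBox:
    def __init__(self):
        self.task = None
        self.results = []

    async def _run(self, text):
        self.results = await slow_query(text)

    def on_text_changed(self, text):
        # A new keystroke makes the in-flight query obsolete, so cancel it.
        if self.task is not None and not self.task.done():
            self.task.cancel()
        self.task = asyncio.create_task(self._run(text))

async def demo():
    box = SearchBox()
    box.on_text_changed("w")   # cancelled by the next keystroke
    box.on_text_changed("wa")  # only this query is allowed to finish
    await box.task
    return box.results

results = asyncio.run(demo())
```

Typing never waits on a query here; only the last query issued gets to deliver results.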

If you're willing to get away from the database for this, you could use a generalized suffix tree with all the terms in your phrases. You can build a suffix tree in linear time and, I believe, use it to find all occurrences of a substring very quickly. The web has lots of pages about suffix trees and suffix arrays. Wikipedia is probably a good place to start.
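As a simplified illustration of the idea, here is a Python sketch that uses a sorted list of suffixes (a naive suffix array) instead of a real suffix tree. It is not linear-time to build and it stores every suffix explicitly, so treat it as a demonstration of the lookup, not as a production structure. The function names are my own.

```python
import bisect

def build_suffix_index(phrases):
    """Collect every suffix of every phrase, tagged with the phrase index."""
    entries = []
    for i, phrase in enumerate(phrases):
        lowered = phrase.lower()
        for start in range(len(lowered)):
            entries.append((lowered[start:], i))
    entries.sort()
    return entries

def find_substring(entries, needle):
    """Return sorted indices of phrases containing needle (case-insensitive)."""
    needle = needle.lower()
    # Every occurrence of needle is a prefix of some suffix, and those
    # suffixes sit next to each other in sorted order.
    pos = bisect.bisect_left(entries, (needle,))
    matches = set()
    for suffix, i in entries[pos:]:
        if not suffix.startswith(needle):
            break
        matches.add(i)
    return sorted(matches)

phrases = ["Night Watch", "Wanderer", "Sunflowers"]
index = build_suffix_index(phrases)
```

With this index, find_substring(index, "wa") finds matches in the middle of a phrase as well as at the start of a word.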

I have a fun scheme for you. You can build an index of the characters that exist in each phrase via a 32-bit integer. Flip the bits [0-25] to represent the characters (case-insensitive) a-z that exist in the phrase. Build a second bitmap of the query string. Now you can do comparisons via bitwise operations (& and |) to determine matches. This is very fast and believe it or not SQLite actually supports bitwise operations in queries - so you can even use this scheme to go straight to the database. I have working code that does this built into one of our iPhone applications - Alphagram.
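A sketch of that scheme in Python with the standard sqlite3 module. This is my reading of the description, not the Alphagram code: the table layout and function names are invented. Note that the mask only says which letters occur somewhere in the phrase, so it works as a fast prefilter and a real substring check is still applied to the surviving rows.

```python
import sqlite3

def letter_mask(text):
    """Set bit 0 for 'a' through bit 25 for 'z' for each letter present."""
    mask = 0
    for ch in text.lower():
        if "a" <= ch <= "z":
            mask |= 1 << (ord(ch) - ord("a"))
    return mask

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrases (text TEXT, mask INTEGER)")
for phrase in ["Night Watch", "Wanderer", "Sunflowers", "A window"]:
    conn.execute("INSERT INTO phrases VALUES (?, ?)", (phrase, letter_mask(phrase)))

def search(conn, query):
    q = letter_mask(query)
    # Bitwise prefilter in SQL: keep rows whose mask has every query letter.
    rows = conn.execute(
        "SELECT text FROM phrases WHERE (mask & ?) = ? ORDER BY rowid", (q, q)
    )
    # Exact check on the few candidates that are left.
    return [text for (text,) in rows if query.lower() in text.lower()]

hits = search(conn, "wa")
```

"Sunflowers" is rejected by the bitmask alone (no "a"), while "A window" passes the bitmask but fails the substring check.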