Anyone had success using a specific locale for a PostgreSQL database so that text comparison is case-insensitive? [duplicate] - postgresql

This question already has answers here:
Change postgres to case insensitive
(2 answers)
Closed last year.
I'm developing an app in Rails on OS X using PostgreSQL 8.4. I need to set up the database for the app so that standard text queries are case-insensitive. For example:
SELECT * FROM documents WHERE title = 'incredible document'
should return the same result as:
SELECT * FROM documents WHERE title = 'Incredible Document'
Just to be clear, I don't want to use:
(1) LIKE in the where clause or any other type of special comparison operators
(2) citext for the column datatype or any other special column index
(3) any type of full-text software like Sphinx
What I do want is to set the database locale to support case-insensitive text comparison. I'm on Mac OS X (10.5 Leopard) and have already tried setting the Encoding to "LATIN1", with the Collation and Ctype both set to "en_US.ISO8859-1". No success so far.
Any help or suggestions are greatly appreciated.
Thanks!
Update
I have marked one of the answers as the accepted answer out of respect for the folks who responded. However, I've chosen to solve this issue differently than suggested. After further review of the application, there are only a few places where I need a case-insensitive comparison against a database field, so I'll be creating shadow database fields for the ones I need to compare case-insensitively, for example name and name_lower. I believe I came across this solution on the web somewhere. Hopefully PostgreSQL will offer collation options similar to what SQL Server provides (e.g. DOCI) in the future.
Special thanks to all who responded.
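A minimal sketch of the shadow-column approach described in the update. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL so it runs without a server; the table `people` and the helpers `add_person` / `find_by_name` are hypothetical. The application writes `name_lower` itself on every insert, and lookups compare against the lowercased search term.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: name keeps the original spelling, name_lower is the shadow copy.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, name_lower TEXT)")
# Index the shadow column so equality lookups on it can use an ordinary index.
conn.execute("CREATE INDEX people_name_lower_idx ON people (name_lower)")

def add_person(name):
    # The application keeps the shadow column in sync on every write.
    conn.execute(
        "INSERT INTO people (name, name_lower) VALUES (?, ?)",
        (name, name.lower()),
    )

def find_by_name(name):
    # Compare the lowercased search term against the shadow column.
    rows = conn.execute(
        "SELECT name FROM people WHERE name_lower = ?", (name.lower(),)
    )
    return [row[0] for row in rows]

add_person("Incredible Document")
print(find_by_name("incredible document"))  # ['Incredible Document']
```

The catch with this design is that every UPDATE of `name` must also rewrite `name_lower`; maintaining it in a database trigger rather than in application code is one way to keep the two from drifting apart.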

You will likely need to apply a function to the column to normalize the text, e.g. convert it to upper case. For example:
SELECT * FROM documents WHERE upper(title) = upper('incredible document')
Note that wrapping the column in a function means a plain index on title can no longer be used for this comparison. If that becomes a performance problem, you can define an index on the expression itself, e.g.:
CREATE INDEX I1 on documents (upper(title))
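The two statements above can be tried end to end in a small sketch. It uses SQLite through Python's sqlite3 module purely as a runnable stand-in (SQLite supports indexes on expressions since 3.9.0, and its built-in upper() only folds ASCII letters); the helper `find_title` is hypothetical. The key point is that the same function is applied to both sides of the comparison, and the index is defined on exactly that expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO documents (title) VALUES (?)",
    [("Incredible Document",), ("Another Document",)],
)
# Expression index, mirroring: CREATE INDEX I1 ON documents (upper(title))
conn.execute("CREATE INDEX i1 ON documents (upper(title))")

def find_title(term):
    # Apply the same function to both sides so the comparison ignores case.
    rows = conn.execute(
        "SELECT title FROM documents WHERE upper(title) = upper(?)", (term,)
    )
    return [row[0] for row in rows]

print(find_title("incredible document"))  # ['Incredible Document']

# EXPLAIN QUERY PLAN reports whether the planner picked the expression index;
# the exact wording of the plan differs between SQLite versions.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM documents WHERE upper(title) = upper(?)",
    ("x",),
).fetchall()
print(plan)
```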

With all the limitations you have set, possibly the only way to make it work is to define your own = operator for text. It is very likely that it will create other problems, such as creating broken indexes. Other than that, your best bet seems to be to use the citext datatype; that would still let the ORM stuff you're using generate the SQL.
(I am not mentioning the possibility of creating your own locale definition because I haven't ever heard of anyone doing it.)

Your problem and your exclusives are like saying "I want to swim, but I don't want to have to move my arms.".
You will drown trying.

I don't think that is what locale or encoding is used for. Encoding is for picking a character set, not for determining how to compare characters. If there were a setting for this, it would be in the config, but I haven't seen one.
If you do not want to use ILIKE for fear of not being able to port to another database, then I would suggest looking into what options your ORM offers, e.g. ActiveRecord if you are using that.
here is something from one of the top postgres guys: http://archives.postgresql.org/pgsql-php/2003-05/msg00045.php
edit: fixed specific references to locale.

SELECT * FROM documents WHERE title ~* 'incredible document'
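Note that `~*` is PostgreSQL's case-insensitive POSIX regular-expression match, not an equality test: the pattern can match anywhere inside the value, and characters such as `.` or `(` in the search term act as regex metacharacters. The sketch below uses Python's `re` module as a rough stand-in (its regex dialect is not identical to PostgreSQL's) to contrast the unanchored match with an anchored, escaped one; the function names are hypothetical.

```python
import re

titles = ["Incredible Document", "An incredible document, revised", "Other"]

def unanchored_ci(pattern, values):
    # Roughly what title ~* 'pattern' does: a case-insensitive search anywhere in the value.
    return [v for v in values if re.search(pattern, v, re.IGNORECASE)]

def exact_ci(term, values):
    # Escaped and anchored with ^...$, so the term must match the whole value, ignoring case.
    pattern = "^" + re.escape(term) + "$"
    return [v for v in values if re.search(pattern, v, re.IGNORECASE)]

print(unanchored_ci("incredible document", titles))
# ['Incredible Document', 'An incredible document, revised']
print(exact_ci("incredible document", titles))
# ['Incredible Document']
```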

Related

PostgreSQL SELECT can alter a table?

So I'm new to SQL databases, and the place I work at migrated to PostgreSQL. One table appears to have drastically shrunk. The thing is, I only ran SELECT statements and renamed columns with AS. Is there a way I might have changed the table data?
When you migrate from one DBMS to another, you must make sure that the objects created are strictly equivalent. The question seems trivial, but it isn't.
As a matter of fact, one important consideration for literals (char/varchar...) is to check the collation used in the old system against the collation you used to create the new database in PostgreSQL.
Collation in an RDBMS is the way to adjust the behavior of character strings with regard to certain parameters, such as whether upper- and lower-case letters are distinguished, whether diacritical characters (accents, ligatures...) are distinguished, language-specific sorting, etc. It builds on top of the character encoding.
Did you check this point when using WHERE clauses to search for literals? If not, try applying the right collation to the comparison (the COLLATE clause) or use the UPPER function to remove the distinction between upper- and lower-case characters...
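To see how much the collation alone can change a WHERE clause, here is a small sketch using SQLite's built-in BINARY and NOCASE collations through Python's sqlite3 module. This is an illustration of the concept only: PostgreSQL's collation names and behavior differ, and its COLLATE clause is available from 9.1 on. The table `customers` and the helper `count_matches` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [("Smith",), ("SMITH",), ("smith",)])

def count_matches(sql, term):
    return conn.execute(sql, (term,)).fetchone()[0]

# Default BINARY collation: the comparison is case-sensitive.
binary = count_matches("SELECT count(*) FROM customers WHERE name = ?", "smith")
# Explicit COLLATE NOCASE: ASCII letters compare equal regardless of case.
nocase = count_matches(
    "SELECT count(*) FROM customers WHERE name = ? COLLATE NOCASE", "smith"
)
# UPPER() on both sides: the function-based alternative mentioned above.
upper = count_matches(
    "SELECT count(*) FROM customers WHERE upper(name) = upper(?)", "smith"
)

print(binary, nocase, upper)  # 1 3 3
```

The same data and the same literal yield one row or three depending only on the comparison rules, which is why a collation mismatch after a migration can look like missing data.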

Postgres - disable lowering case of query

Is there any flag or option that can be set to stop Postgres from lowercasing identifiers in queries? (i.e. SELECT firstName, lastName, ... is converted by Postgres to SELECT firstname, lastname, ...)
Yes, I already know that if you use double quotes, it will preserve case. And I know that because of this annoying behavior, most people recommend not using case-sensitive column names, which forces users into something like snake_case instead of PascalCase. I don't get why this behavior exists in the first place.
SQL identifiers must be case-insensitive, unless quoted, according to the standard. So, no, you cannot change this behaviour (unless you're willing to modify Postgres source code and render it even less standard-compliant than it already is).
See also this Q&A
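To make the folding rule concrete, here is a small Python model of how PostgreSQL treats an identifier token: unquoted identifiers have their ASCII letters lowercased, while double-quoted identifiers keep their exact spelling, with a doubled `""` standing for one literal quote character. The function `fold_identifier` is hypothetical and ignores details such as non-ASCII letters and the identifier length limit. (The SQL standard actually specifies folding unquoted names to upper case; PostgreSQL's lower-case folding is a documented deviation from it.)

```python
def fold_identifier(token):
    # Hypothetical model of PostgreSQL's identifier handling (ASCII only).
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # Quoted: keep the exact spelling; "" inside stands for a single ".
        return token[1:-1].replace('""', '"')
    # Unquoted: lowercase ASCII A-Z, leave every other character alone.
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in token)

for ident in ['firstName', '"firstName"', 'FIRST_NAME', '"say ""hi"""']:
    print(ident, "->", fold_identifier(ident))
```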

ends with (suffix) and contains string search using MATCH in SQLite FTS

I am using SQLite FTS extension in my iOS application.
It performs well, but the problem is that it only matches string prefixes ("starts with" keyword searches).
For example,
this works:
SELECT * FROM tablename WHERE columnname MATCH 'searchterm*'
but the following two don't:
SELECT * FROM tablename WHERE columnname MATCH '*searchterm'
SELECT * FROM tablename WHERE columnname MATCH '*searchterm*'
Is there any workaround for this, or any way to use FTS to build a query similar to a LIKE '%searchterm%' query?
EDIT:
As pointed out by Retterdesdialogs, storing the entire text in reverse order and running a prefix search on a reverse string is a possible solution for ends with/suffix search problem, which was my original question, but it won't work for 'contains' search. I have updated the question accordingly.
In my iOS and Android applications, I have shied away from FTS search for exactly the reason that it doesn't support substring matches due to lack of suffix queries.
The workarounds seem complicated.
I have resorted to using LIKE queries, which while being less performant than MATCH, served my needs.
The workaround is to store the reversed string in an extra column. See this link (it's not exactly the same, but it should give you an idea):
Search Suffix using Full Text Search
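A minimal sketch of the reversed-column idea, assuming a plain SQLite table rather than an FTS virtual table so that it runs with any SQLite build: the reversed text lives in a hypothetical `word_rev` column, and an "ends with" search becomes a prefix search on it. With FTS you would instead run a prefix MATCH query (e.g. 'mret*') against the reversed column. Note that SQLite's LIKE is case-insensitive for ASCII by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: word holds the text, word_rev holds the same text reversed.
conn.execute("CREATE TABLE words (word TEXT, word_rev TEXT)")

def add_word(word):
    conn.execute("INSERT INTO words VALUES (?, ?)", (word, word[::-1]))

def ends_with(suffix):
    # Reverse the suffix first, then escape LIKE wildcards so it is matched literally.
    rev = suffix[::-1].replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT word FROM words WHERE word_rev LIKE ? ESCAPE '\\' ORDER BY word",
        (rev + "%",),
    )
    return [r[0] for r in rows]

for w in ["searchterm", "longterm", "terminal"]:
    add_word(w)

print(ends_with("term"))  # ['longterm', 'searchterm']
```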
To get it to work for contains queries, you need to store all suffixes of the terms you want to be able to search. This has the downside of making the database really large, but that can be avoided by compressing the data.
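The all-suffixes approach rests on one observation: any substring of a term is a prefix of one of that term's suffixes. The sketch below shows this with a hypothetical side table `term_suffixes` in plain SQLite (not FTS), using an exact prefix comparison; in an FTS table the same lookup would be a prefix MATCH on the suffix column. It also makes the storage cost visible: a term of length n adds n rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical side table: one row per suffix of each stored term.
conn.execute("CREATE TABLE term_suffixes (term TEXT, suffix TEXT)")

def add_term(term):
    # "search" -> "search", "earch", "arch", "rch", "ch", "h"
    conn.executemany(
        "INSERT INTO term_suffixes VALUES (?, ?)",
        [(term, term[i:]) for i in range(len(term))],
    )

def contains(fragment):
    # A substring of a term is a prefix of one of its suffixes (case-sensitive here).
    rows = conn.execute(
        "SELECT DISTINCT term FROM term_suffixes "
        "WHERE substr(suffix, 1, ?) = ? ORDER BY term",
        (len(fragment), fragment),
    )
    return [r[0] for r in rows]

for t in ["searchterm", "research", "terminal"]:
    add_term(t)

print(contains("arch"))  # ['research', 'searchterm']
```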
SQLite FTS contains and suffix matches

String sort order (LC_COLLATE and LC_CTYPE)

Apparently PostgreSQL allows different locales for each database since version 8.4
So I went to the docs to read about locales (http://www.postgresql.org/docs/8.4/static/locale.html).
String sort order is of my particular interest (I want strings sorted like 'A a b c D d' and not 'A B C ... Z a b c').
Question 1: Do I only need to set LC_COLLATE (String sort order) when I create a database?
I also read about LC_CTYPE (Character classification (What is a letter? Its upper-case equivalent?))
Question 2: Can someone explain what this means?
The sort order you describe is the standard in most locales.
Just try for yourself:
SELECT regexp_split_to_table('D d a A c b', ' ') ORDER BY 1;
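The two orderings from the question can be reproduced in plain Python. This is only an approximation of what a locale does: PostgreSQL 8.4 delegates string comparison to the operating system's strcoll() for the database's LC_COLLATE, and real locales have their own rules for accents, punctuation, and the relative order of upper- and lower-case letters. The "C" locale compares bytes, which corresponds to the first sort below.

```python
words = ["D", "d", "a", "A", "c", "b"]

# Plain code-point order: every upper-case letter sorts before every lower-case one.
print(sorted(words))  # ['A', 'D', 'a', 'b', 'c', 'd']

# Compare case-insensitively first, then break ties by code point ('A' before 'a').
print(sorted(words, key=lambda s: (s.casefold(), s)))  # ['A', 'a', 'b', 'c', 'D', 'd']
```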
When you initialize your db cluster with initdb, you can pick a locale with --locale=some_locale. In my case it's --locale=de_AT.UTF-8. If you don't specify anything, the locale is inherited from the environment, i.e. your current system locale will be used.
The template database of the cluster will be set to that locale. When you create a new database, it inherits the settings from the template. Normally you don't have to worry about anything, it all just works.
Read the chapter on CREATE DATABASE for more.
If you want to speed up text search with indexes, be sure to read about operator classes, as well.
All links to version 8.4, as you specifically asked for that.
In PostgreSQL 9.1 or later, there is collation support that allows more flexible use of collations:
The collation feature allows specifying the sort order and character
classification behavior of data per-column, or even per-operation.
This alleviates the restriction that the LC_COLLATE and LC_CTYPE
settings of a database cannot be changed after its creation.
Compared to other databases, PostgreSQL is a lot more stringent about case sensitivity. To avoid this when ordering, you can use string functions to make the sort case-insensitive:
SELECT * FROM users ORDER BY LOWER(last_name), LOWER(first_name);
If you have a lot of data it will be inefficient doing this across a whole table every time you want to display a list of records. An alternative is to use the citext module, which provides a type that is internally case insensitive when doing comparisons.
Bonus:
You might run into this issue when searching, too; for that, there is a case-insensitive pattern-matching operator, ILIKE (note the single quotes: in PostgreSQL, double quotes denote identifiers, not strings):
SELECT * FROM users WHERE first_name ILIKE '%john%';
Answer to question 1:
The LC_COLLATE and LC_CTYPE settings are determined when a database is created, and cannot be changed except by creating a new database.

How to alter Postgres table data based on its contents?

This is probably a super simple question, but I'm struggling to come up with the right keywords to find it on Google.
I have a Postgres table that has among its contents a column of type text named content_type. That stores what type of entry is stored in that row.
There are only about 5 different types, and I decided I want to change one of them to display as something else in my application (I had been directly displaying these).
It struck me that it's funny that my view is being dictated by my database model, and I decided I would convert the types being stored in my database as strings into integers, and enumerate the possible types in my application with constants that convert them into their display names. That way, if I ever got the urge to change any category names again, I could just change it with one alteration of a constant. I also have the hunch that storing integers might be somewhat more efficient than storing text in the database.
First, a quick threshold question of, is this a good idea? Any feedback or anything I missed?
Second, and my main question, what's the Postgres command I could enter to make an alteration like this? I'm thinking I could start by renaming the old content_type column to old_content_type and then creating a new integer column content_type. However, what command would look at a row's old_content_type and fill in the new content_type column based off of that?
If you're finding that you need to change the display values, then yes, it's probably a good idea not to store them in a database. Integers are also more efficient to store and search, but I really wouldn't worry about it unless you've got millions of rows.
You just need to run an update to populate your new column:
update table_name set content_type = (case when old_content_type = 'a' then 1
when old_content_type = 'b' then 2 else 3 end);
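The rename / add / populate steps from the question can be walked through end to end in a sketch. It uses an in-memory SQLite database via Python's sqlite3 module as a stand-in for Postgres (ALTER TABLE ... RENAME COLUMN needs SQLite 3.25 or later); the table `entries`, the sample values 'a', 'b', 'c', and the `CONTENT_TYPES` display names are all hypothetical.

```python
import sqlite3

# Hypothetical application-side constants mapping stored integers to display names.
CONTENT_TYPES = {1: "Article", 2: "Blog post", 3: "Other"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, content_type TEXT)")
conn.executemany(
    "INSERT INTO entries (content_type) VALUES (?)", [("a",), ("b",), ("c",)]
)

# Step 1: keep the old values under a new name.
conn.execute("ALTER TABLE entries RENAME COLUMN content_type TO old_content_type")
# Step 2: add the new integer column.
conn.execute("ALTER TABLE entries ADD COLUMN content_type INTEGER")
# Step 3: populate it from the old column with the same CASE expression as above.
conn.execute(
    """UPDATE entries SET content_type = CASE
           WHEN old_content_type = 'a' THEN 1
           WHEN old_content_type = 'b' THEN 2
           ELSE 3
       END"""
)

rows = conn.execute(
    "SELECT old_content_type, content_type FROM entries ORDER BY id"
).fetchall()
print(rows)  # [('a', 1), ('b', 2), ('c', 3)]
print([CONTENT_TYPES[t] for _, t in rows])  # ['Article', 'Blog post', 'Other']
```

One design note: the ELSE branch is a catch-all, so any unexpected value (and NULL) silently becomes 3. Listing every known value in its own WHEN and leaving out the ELSE (so strays become NULL) makes them easier to spot before dropping the old column.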
If you're on Postgres 8.4 (enum types are available from 8.3 on), then using an enum type instead of a plain integer might be a good idea.
Ideally you'd have these fields referring to a table containing the definitions of type. This should be via a foreign key constraint. This way you know that your database is clean and has no invalid values (i.e. referential integrity).
There are many ways to handle this:
(1) Having a table for each field that can contain a number of values (i.e. like an enum) is the most obvious, but it breaks down when you have a table that requires many attributes.
(2) You can use the entity-attribute-value model, but beware that it is too easy to abuse and causes problems as things grow.
(3) You can use, or refer to, my implementation solution PET (Parameter Enumeration Tables). This is a halfway house between (1) and (2).
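A minimal sketch of the first option, a lookup table referenced through a foreign key, using an in-memory SQLite database via Python's sqlite3 module as a stand-in. The tables `content_types` and `entries` are hypothetical. One SQLite-specific detail: foreign keys are only enforced after `PRAGMA foreign_keys = ON`, whereas PostgreSQL always enforces them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: enable foreign-key enforcement for this connection.
conn.execute("PRAGMA foreign_keys = ON")
# Hypothetical lookup table holding the allowed content types.
conn.execute(
    "CREATE TABLE content_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """CREATE TABLE entries (
           id INTEGER PRIMARY KEY,
           content_type INTEGER NOT NULL REFERENCES content_types (id)
       )"""
)
conn.executemany(
    "INSERT INTO content_types VALUES (?, ?)", [(1, "Article"), (2, "Blog post")]
)

conn.execute("INSERT INTO entries (content_type) VALUES (1)")  # accepted: type 1 exists

try:
    conn.execute("INSERT INTO entries (content_type) VALUES (99)")  # no such type
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Display names come from a join instead of constants in application code.
names = [
    row[0]
    for row in conn.execute(
        "SELECT t.name FROM entries e JOIN content_types t ON t.id = e.content_type"
    )
]
print(rejected, names)  # True ['Article']
```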