PostgreSQL SELECT can alter a table? - postgresql

So I'm new to SQL-like databases, and the place I work at migrated to PostgreSQL. One table's contents shrank drastically. The point is, I only used SELECT statements and renamed the columns with AS. Is there a way I might have changed the table data?

When you migrate from one DBMS to another, you must make sure that the objects created are strictly equivalent. The question seems trivial, but it isn't.
As a matter of fact, one important consideration for string literals (char/varchar...) is to compare the collation used formerly with the collation you used to create the new database in PostgreSQL.
Collation in an RDBMS is the way to adjust the behavior of character strings with regard to certain parameters, such as whether upper and lower case letters are distinguished, whether diacritical characters (accents, ligatures...) are distinguished, language-specific sort order, etc. It goes beyond the character encoding.
Did you verify this point when using a WHERE clause to search for literals? A comparison that matched regardless of case under the old collation can return fewer rows under a case-sensitive one, which would look like missing data. If you did not check, try applying the right collation to the comparison (COLLATE clause), or use the UPPER function on both sides to avoid distinguishing between upper and lower case characters.
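As a rough sketch of the checks suggested above: the first query reads the collation settings from the pg_database catalog, the other two show the UPPER and COLLATE approaches. The table customers and its column name are hypothetical, and the COLLATE clause assumes PostgreSQL 9.1 or later.

```sql
-- Which collation and character classification does each database use?
SELECT datname, datcollate, datctype FROM pg_database;

-- Hypothetical table and column: customers(name).
-- Case-insensitive match by folding both sides to upper case.
SELECT * FROM customers WHERE upper(name) = upper('dupont');

-- Apply an explicit collation to a single comparison.
SELECT * FROM customers WHERE name < 'M' COLLATE "C";
```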

Related

Should I save ASCII-only varchar in UTF-8 or ASCII?

I have a varchar column that contains only ASCII symbols. I don't need to sort by this field, but I need to search it by full equality.
Default locale is en.UTF8. Will I gain anything if I create this column with collate "C"?
Yes, it makes a difference.
Even if you do not sort deliberately, there are various operations requiring sort steps internally (some aggregate functions, DISTINCT, merge joins etc.).
Also, any index on the field has to sort values internally - and observe collation rules unless COLLATE "C" applies (no collation rules).
For searches by full equality you'll want an index - which works either way (for equality), but it's faster overall without collation rules. Depending on the details of your use case, the effect may be negligible or substantial. The impact grows with the length of your strings. I ran a benchmark on a related case some time ago:
Slow query ordering by a column in a joined table
Also, there are more pattern matching options with locale "C". The alternative would be to create an index with the special varchar_pattern_ops operator class.
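A minimal sketch of both options, assuming PostgreSQL 9.1 or later for the per-column COLLATE clause. The table items and its columns are made up for illustration.

```sql
-- Hypothetical table; "C" collation on the ASCII-only column.
CREATE TABLE items (
    code  varchar(64) COLLATE "C" NOT NULL,
    label varchar(200)            -- keeps the database default collation
);

-- Plain B-tree index: compares bytewise because of COLLATE "C",
-- and supports equality as well as left-anchored LIKE patterns.
CREATE INDEX items_code_idx ON items (code);

-- On a column with a non-C collation, the operator class supports LIKE 'abc%'.
CREATE INDEX items_label_pattern_idx ON items (label varchar_pattern_ops);

SELECT * FROM items WHERE code = 'A-1001';
```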
Related:
PostgreSQL LIKE query performance variations
Operator “~<~” uses varchar_pattern_ops index while normal ORDER BY clause doesn't?
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
Postgres 9.5 introduced performance improvements with a technique called "abbreviated keys", which ran into problems with some locales. So it was deactivated, except for the C locale. Quoting the release notes of Postgres 9.5.2:
Disable abbreviated keys for string sorting in non-C locales (Robert Haas)
PostgreSQL 9.5 introduced logic for speeding up comparisons of string
data types by using the standard C library function strxfrm() as a
substitute for strcoll(). It now emerges that most versions of glibc
(Linux's implementation of the C library) have buggy implementations
of strxfrm() that, in some locales, can produce string comparison
results that do not match strcoll(). Until this problem can be better
characterized, disable the optimization in all non-C locales. (C
locale is safe since it uses neither strcoll() nor strxfrm().)
Unfortunately, this problem affects not only sorting but also entry
ordering in B-tree indexes, which means that B-tree indexes on text,
varchar, or char columns may now be corrupt if they sort according to
an affected locale and were built or modified under PostgreSQL 9.5.0
or 9.5.1. Users should REINDEX indexes that might be affected.
It is not possible at this time to give an exhaustive list of
known-affected locales. C locale is known safe, and there is no
evidence of trouble in English-based locales such as en_US, but some
other popular locales such as de_DE are affected in most glibc
versions.
The problem also illustrates where collation rules come in, generally.
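The REINDEX step recommended in the release notes could look like this. The names my_idx and my_table are placeholders, and REINDEX blocks writes to the table while it runs, so it is worth scheduling.

```sql
-- Hypothetical names: my_idx, my_table.
REINDEX INDEX my_idx;      -- rebuild one index
REINDEX TABLE my_table;    -- rebuild every index on a table
```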

Do text_pattern_ops comparators understand UTF-8?

According to the PostgreSQL 9.2 documentation, if I am using a locale other than the C locale (en_US.UTF-8 in my case), btree indexes on text columns for supporting queries like
SELECT * from my_table WHERE text_col LIKE 'abcd%'
need to be created using text_pattern_ops like so
CREATE INDEX my_idx ON my_table (text_col text_pattern_ops)
Now section 11.9 of the documentation states that this results in a "character by character" comparison. Are these (non-wide) C characters or does the comparison understand UTF-8?
Good question, I'm not totally sure but my tentative understanding is:
Here PostgreSQL means "real characters" (possibly multibyte), not bytes. The comparison always "understands UTF-8", with or without this special index.
The point is that, for locales that have special (non-C) collation rules, we normally want to follow those rules (and call the respective locale libraries) when doing comparisons (<, >...) and sorting. But we don't want to use those collations for POSIX regular expression matching and LIKE patterns. Hence the existence of two different types of indexes for text.
The operators in the text_pattern_ops operator class actually do a memcmp() on the strings, so the documentation is perhaps slightly inaccurate talking about characters.
But this doesn't really affect the question whether they support UTF-8. The indexing of pattern matching operations in the described fashion does support UTF-8. The underlying operators don't have to worry about the encoding.
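To make the "two types of indexes" point concrete, here is a small sketch using the table and column names from the question. The index names are made up, and whether the planner actually picks either index depends on table size and statistics.

```sql
-- Supports LIKE 'abcd%' and anchored regular expressions in a non-C locale.
CREATE INDEX my_idx_pattern ON my_table (text_col text_pattern_ops);

-- Supports <, <=, >, >= and ORDER BY under the locale's collation.
CREATE INDEX my_idx_plain ON my_table (text_col);

EXPLAIN SELECT * FROM my_table WHERE text_col LIKE 'abcd%';   -- can use my_idx_pattern
EXPLAIN SELECT * FROM my_table ORDER BY text_col LIMIT 10;    -- can use my_idx_plain
```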

Postgresql: auto lowercase text while (or before) inserting to a column

I want to achieve case insensitive uniqueness in a varchar column. But, there is no case insensitive text data type in Postgres. Since original case of text is not important, it will be a good idea to convert all to lowercase/uppercase before inserting in a column with UNIQUE constraint. Also, it will require one INDEX for quick search.
Is there any way in Postgres to manipulate data before insertion?
I looked at this other question: How to automatically convert a MySQL column to lowercase.
It suggests using triggers on insert/update to lowercase text or to use views with lowercased text. But, none of the suggested methods ensure uniqueness.
Also, since this data will be read/written by various applications, lowercasing data in every individual application is not a good idea.
ALTER TABLE your_table
ADD CONSTRAINT your_table_the_column_lowercase_ck
CHECK (the_column = lower(the_column));
From the manual:
The use of indexes to enforce unique constraints could be considered
an implementation detail that should not be accessed directly.
You don't need a case-insensitive data type (although there is one)
CREATE UNIQUE INDEX idx_lower_unique
ON your_table (lower(the_column));
That way you don't even have to mess around with the original data.
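A short sketch of how the unique expression index behaves, using a hypothetical accounts table. The second INSERT is expected to fail with a unique violation because both values lower-case to the same string.

```sql
-- Hypothetical table for illustration.
CREATE TABLE accounts (email varchar(255) NOT NULL);

CREATE UNIQUE INDEX accounts_email_lower_key
    ON accounts (lower(email));

INSERT INTO accounts VALUES ('Alice@example.com');
INSERT INTO accounts VALUES ('alice@EXAMPLE.com');  -- rejected: duplicate under lower()

-- The same index also serves case-insensitive lookups.
SELECT * FROM accounts WHERE lower(email) = lower('ALICE@example.com');
```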

String sort order (LC_COLLATE and LC_CTYPE)

Apparently PostgreSQL allows different locales for each database since version 8.4
So I went to the docs to read about locales (http://www.postgresql.org/docs/8.4/static/locale.html).
String sort order is of my particular interest (I want strings sorted like 'A a b c D d' and not 'A B C ... Z a b c').
Question 1: Do I only need to set LC_COLLATE (String sort order) when I create a database?
I also read about LC_CTYPE (Character classification (What is a letter? Its upper-case equivalent?))
Question 2: Can someone explain what this means?
The sort order you describe is the standard in most locales.
Just try for yourself:
SELECT regexp_split_to_table('D d a A c b', ' ') ORDER BY 1;
When you initialize your db cluster with initdb you can pick a locale with --locale=some_locale. In my case it's --locale=de_AT.UTF-8. If you don't specify anything the locale is inherited from the environment - your current system locale will be used.
The template database of the cluster will be set to that locale. When you create a new database, it inherits the settings from the template. Normally you don't have to worry about anything, it all just works.
Read the chapter on CREATE DATABASE for more.
If you want to speed up text search with indexes, be sure to read about operator classes, as well.
All links to version 8.4, as you specifically asked for that.
In PostgreSQL 9.1 or later, there is collation support that allows more flexible use of collations:
The collation feature allows specifying the sort order and character
classification behavior of data per-column, or even per-operation.
This alleviates the restriction that the LC_COLLATE and LC_CTYPE
settings of a database cannot be changed after its creation.
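For illustration, per-column and per-operation collation might be used as follows in 9.1 or later. The cities table is hypothetical; "C" is used because it is always available, while other collation names depend on what the installation provides.

```sql
-- List the collations available in this installation.
SELECT collname FROM pg_collation ORDER BY collname;

-- Per-column collation (hypothetical table).
CREATE TABLE cities (
    name_local text COLLATE "C",
    name_intl  text               -- database default collation
);

-- Per-operation collation: override the sort order for one query.
SELECT name_intl FROM cities ORDER BY name_intl COLLATE "C";
```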
Compared to other databases, PostgreSQL is a lot more stringent about case sensitivity. To work around this when ordering, you can use string functions to make the comparison case insensitive:
SELECT * FROM users ORDER BY LOWER(last_name), LOWER(first_name);
If you have a lot of data it will be inefficient doing this across a whole table every time you want to display a list of records. An alternative is to use the citext module, which provides a type that is internally case insensitive when doing comparisons.
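Two possible ways to reduce that cost, sketched under assumptions: an expression index matching the ORDER BY above, and a citext column. The members table is hypothetical, and CREATE EXTENSION requires PostgreSQL 9.1 or later (older versions install citext from the contrib SQL script instead).

```sql
-- Expression index matching ORDER BY lower(last_name), lower(first_name).
CREATE INDEX users_name_lower_idx
    ON users (lower(last_name), lower(first_name));

-- citext: a text type that compares case-insensitively.
CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE members (
    last_name  citext,
    first_name citext
);

SELECT * FROM members ORDER BY last_name, first_name;
```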
Bonus:
You might run into this issue when searching, too. For that there is a case-insensitive pattern matching operator:
SELECT * FROM users WHERE first_name ILIKE '%john%';
Answer for question 1 (One)
The LC_COLLATE and LC_CTYPE settings are determined when a database is created, and cannot be changed except by creating a new database.
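Setting both values at creation time could look like the sketch below. The database name app_db is made up, the locale must be installed on the server's operating system, and template0 is used because a database copied from another template has to keep that template's locale.

```sql
-- Hypothetical database name.
CREATE DATABASE app_db
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_COLLATE 'de_AT.UTF-8'
    LC_CTYPE 'de_AT.UTF-8';

-- Check what was stored.
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = 'app_db';
```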

Anyone had success using a specific locale for a PostgreSQL database so that text comparison is case-insensitive? [duplicate]

I'm developing an app in Rails on OS X using PostgreSQL 8.4. I need to setup the database for the app so that standard text queries are case-insensitive. For example:
SELECT * FROM documents WHERE title = 'incredible document'
should return the same result as:
SELECT * FROM documents WHERE title = 'Incredible Document'
Just to be clear, I don't want to use:
(1) LIKE in the where clause or any other type of special comparison operators
(2) citext for the column datatype or any other special column index
(3) any type of full-text software like Sphinx
What I do want is to set the database locale to support case-insensitive text comparison. I'm on Mac OS X (10.5 Leopard) and have already tried setting the Encoding to "LATIN1", with the Collation and Ctype both set to "en_US.ISO8859-1". No success so far.
Any help or suggestions are greatly appreciated.
Thanks!
Update
I have marked one of the answers given as the correct answer out of respect for the folks who responded. However, I've chosen to solve this issue differently than suggested. After further review of the application, there are only a few instances where I need case-insensitive comparison against a database field, so I'll be creating shadow database fields for the ones I need to compare case-insensitively. For example, name and name_lower. I believe I came across this solution on the web somewhere. Hopefully PostgreSQL will allow similar collation options to what SQL Server provides in the future (i.e. DOCI).
Special thanks to all who responded.
You will likely need to apply a function to the column to normalize the text, for example converting it to upper case:
SELECT * FROM documents WHERE upper(title) = upper('incredible document')
Note that this can hurt performance, because a plain index on title can no longer be used for the comparison. If that becomes a problem, you can define an index on the same expression, e.g.
CREATE INDEX I1 on documents (upper(title))
With all the limitations you have set, possibly the only way to make it work is to define your own = operator for text. It is very likely that it will create other problems, such as creating broken indexes. Other than that, your best bet seems to be to use the citext datatype; that would still let the ORM stuff you're using generate the SQL.
(I am not mentioning the possibility of creating your own locale definition because I haven't ever heard of anyone doing it.)
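For readers on much newer versions than the 8.4 discussed here: PostgreSQL 12 and later, when built with ICU support, allow nondeterministic collations, which come close to what the question asks for. The sketch below follows the pattern shown in the PostgreSQL documentation; the collation name case_insensitive and the table documents_ci are illustrative, and some releases reject LIKE on columns with such a collation.

```sql
-- Requires PostgreSQL 12+ built with ICU; not available in 8.4.
CREATE COLLATION case_insensitive (
    provider = icu,
    locale = 'und-u-ks-level2',
    deterministic = false
);

-- Hypothetical table using the collation on one column.
CREATE TABLE documents_ci (
    title text COLLATE case_insensitive
);

INSERT INTO documents_ci VALUES ('Incredible Document');

-- Plain equality, intended to match regardless of case.
SELECT * FROM documents_ci WHERE title = 'incredible document';
```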
Your problem and your exclusions are like saying "I want to swim, but I don't want to have to move my arms."
You will drown trying.
I don't think that is what locale or encoding is used for. Encoding is about picking a character set, not about determining how characters are compared. If there were such a setting it would be in the config, but I haven't seen one.
If you do not want to use ilike for fear of not being able to port to another database then I would suggest you look into what ORM options might be available with ActiveRecord if you are using that.
Here is something from one of the top Postgres guys: http://archives.postgresql.org/pgsql-php/2003-05/msg00045.php
edit: fixed specific references to locale.
SELECT * FROM documents WHERE title ~* 'incredible document'
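One caveat with the regular expression operator: ~* matches anywhere in the string, so it is not equivalent to a case-insensitive equality test unless the pattern is anchored. A small sketch on the same documents table follows; titles containing regex metacharacters would additionally need escaping.

```sql
-- Unanchored: also matches titles such as 'My Incredible Document, v2'.
SELECT * FROM documents WHERE title ~* 'incredible document';

-- Anchored: matches only when the whole title is equal, ignoring case.
SELECT * FROM documents WHERE title ~* '^incredible document$';
```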