Firebird sort order with other character set

The sorting in this query does not take symbols into account, only letters:
SELECT CAST(Text AS VARCHAR(20) CHARACTER SET ISO8859_1) COLLATE NO_NO Result FROM (
select CAST('_Anon' AS VARCHAR(20)) COLLATE UNICODE_CI_AI as Text from RDB$DATABASE
UNION
SELECT CAST('Abba' AS VARCHAR(20)) COLLATE UNICODE_CI_AI AS Text from RDB$DATABASE
UNION
SELECT CAST('Beatles' AS VARCHAR(20)) COLLATE UNICODE_CI_AI AS Text from RDB$DATABASE)
ORDER BY Result
Expected sort order (non-alphanumeric before any letter):
_Anon
Abba
Beatles
But I get:
Abba
_Anon
Beatles
The collation does not matter. If you delete "COLLATE NO_NO" it still sorts incorrectly.
Edit: I found that the ES_ES collation sorts this correctly, but it fails to sort Norwegian characters.
Is this a bug or am I missing something in this query?
What I'm trying to do is get the correct sort order in Norwegian, and none of the collations in UNICODE_CI_AI gives me the correct order.
Update: Expanded the example with another sub-query so that it shows the point more clearly.

Mark's hint to look at the collation pointed me in the direction of a solution.
I do consider this a bug, so I was going to file a bug report with firebirdsql, but found out that it's a "Won't fix" and that the workaround below is the official fix.
Of all the base collations defined, ES_* is the only one with the attribute SPECIALS-FIRST=1 set. In fact, it's the only collation with any attribute set.
That attribute specifies that special characters should be sorted before alphanumeric characters.
So the workaround is to create a new collation based on the NO_NO collation:
CREATE COLLATION NO_NO_NOPAD_CI_SF
FOR ISO8859_1
FROM NO_NO
NO PAD
CASE INSENSITIVE
'SPECIALS-FIRST=1';
then using the new collation like this:
SELECT CAST(Text AS VARCHAR(20) CHARACTER SET ISO8859_1) COLLATE NO_NO_NOPAD_CI_SF Result FROM (
select CAST('_Anon' AS VARCHAR(20)) COLLATE UNICODE_CI_AI as Text from RDB$DATABASE
UNION
SELECT CAST('Abba' AS VARCHAR(20)) COLLATE UNICODE_CI_AI AS Text from RDB$DATABASE
UNION
SELECT CAST('Beatles' AS VARCHAR(20)) COLLATE UNICODE_CI_AI AS Text from RDB$DATABASE)
ORDER BY Result
Yields the expected result:
_Anon
Abba
Beatles
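
To see why the two orders differ, here is a rough model in Python. It is only an illustration, not Firebird's actual algorithm: the key functions `ignore_specials_key` and `specials_first_key` are made-up names, and the assumption is that the default collation behaves as if special characters were skipped when comparing, while SPECIALS-FIRST=1 ranks them before letters and digits.

```python
def ignore_specials_key(text):
    # Assumed model of the default behaviour: non-alphanumeric
    # characters are skipped and case is ignored.
    return "".join(ch for ch in text if ch.isalnum()).casefold()


def specials_first_key(text):
    # Assumed model of SPECIALS-FIRST=1: any non-alphanumeric character
    # ranks before every letter or digit (False sorts before True);
    # case is still ignored.
    return [(ch.isalnum(), ch.casefold()) for ch in text]


words = ["Abba", "_Anon", "Beatles"]

default_order = sorted(words, key=ignore_specials_key)
specials_first_order = sorted(words, key=specials_first_key)

print(default_order)         # ['Abba', '_Anon', 'Beatles']
print(specials_first_order)  # ['_Anon', 'Abba', 'Beatles']
```

The first list matches the order the original query returned, the second matches the order produced with the new collation.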

Related

Oracle vs Postgres order by

I am running the query below in Oracle and Postgres, and the two show different output with respect to the ordering of the values.
with test as (
select 'Summary-Account by User (Using Contact ID)' col1 from dual
union all
select 'Summary-Account by User by Client by Day (Using Contact ID)' col1 from dual
)
select * from test
order by col1 desc;
Below is Oracle one
Postgres
with test as (
select 'Summary-Account by User (Using Contact ID)' col1
union all
select 'Summary-Account by User by Client by Day (Using Contact ID)' col1
)
select * from test
order by col1 desc;
Oracle collation is AL32UTF8
Postgres has LC_CTYPE set to en_US.UTF-8
Both look the same in terms of how the database should behave. How can I fix this?
I have read a few posts on Stack Overflow about POSIX and C. After changing the query's ORDER BY to order by col1 collate "C" desc; the result matches the Oracle output.
Is there any way to apply this permanently?
AL32UTF8 is not a collation, but an encoding (character set).
Oracle uses the “binary collation” by default, which corresponds to the C or POSIX collation in PostgreSQL.
You have several options to get a similar result in PostgreSQL:
create the database with LOCALE "C"
if you are selecting from a table, define the column to use the "C" collation:
ALTER TABLE tab ALTER col1 TYPE text COLLATE "C";
add an explicit COLLATE clause:
ORDER BY col1 COLLATE "C"
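
The difference between the two orderings can be reproduced outside either database. The Python sketch below is only an approximation: plain string comparison in Python goes by code point, which behaves like a binary or "C" collation, while `letters_only_key` is a made-up, crude stand-in for a linguistic collation such as en_US.UTF-8 that gives punctuation and spaces little or no weight.

```python
rows = [
    "Summary-Account by User (Using Contact ID)",
    "Summary-Account by User by Client by Day (Using Contact ID)",
]

# Binary-style ordering: plain code point comparison, so "(" (U+0028)
# sorts before "b" (U+0062).
binary_desc = sorted(rows, reverse=True)


def letters_only_key(text):
    # Hypothetical stand-in for a linguistic collation: drop spaces and
    # punctuation, ignore case, then compare what is left.
    return "".join(ch for ch in text if ch.isalnum()).casefold()


linguistic_desc = sorted(rows, key=letters_only_key, reverse=True)

print(binary_desc[0])      # the "by Client by Day" row comes first
print(linguistic_desc[0])  # the "(Using Contact ID)" row comes first
```

The two rows share the prefix "Summary-Account by User ", so the result hinges on whether "(" is compared as a character in its own right or skipped in favour of the "U" that follows it.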

Need to use COLLATION in a SELECT DISTINCT

I am trying to apply a collation to a SELECT DISTINCT statement. Does anyone know how to do this?
One would think that DISTINCT would treat upper and lower case as different, e.g. 'Yes' and 'YES'.
But DISTINCT does not appear to be case sensitive, so I believe I need to add COLLATE...
SELECT DISTINCT COLLATE Latin1_General_CS_AS Shrt_Text AS Sht_text
FROM tblMatStrings
Any idea on how to distinguish upper and lower in a SELECT DISTINCT?
You've just got your syntax a bit backwards:
SELECT DISTINCT
Shrt_Text COLLATE Latin1_General_CS_AS As Shrt_text
FROM tblMatStrings
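
Why the collation changes what DISTINCT returns can be shown with a small Python model. This is only a sketch of the idea, not how SQL Server implements it, and the sample values are invented: under a case-insensitive collation, values that differ only in case count as the same value, while under a case-sensitive one they stay separate.

```python
values = ["Yes", "YES", "yes", "No"]  # invented sample data

# Case-insensitive DISTINCT: values that differ only in case fall into
# one group. This sketch keeps the first spelling it sees; which
# spelling a database returns is not guaranteed.
groups = {}
for value in values:
    groups.setdefault(value.casefold(), value)
ci_distinct = list(groups.values())

# Case-sensitive DISTINCT: every spelling is its own value.
cs_distinct = list(dict.fromkeys(values))

print(ci_distinct)  # ['Yes', 'No']
print(cs_distinct)  # ['Yes', 'YES', 'yes', 'No']
```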

What's the difference between SQL_Latin1_General_CP1_CI_AS and SQL_Latin1_General_CP1_CI_AI

I get this error when I run an update query in Microsoft SQL Server:
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "SQL_Latin1_General_CP1_CI_AI" in the equal to operation.
The query uses only two tables: the table it's updating and a temp table that it inner joins to. I have not specified the collation of either table, and they are both on the same database, which means they should both have the database default collation, right?
Looking at the collations, the only difference is the last character. All I understand of the last part is that CI stands for Case Insensitive. If I were to take a stab in the dark, I would think AI stands for Auto Increment, but I have no idea what AS stands for.
AI stands for accent insensitive and AS stands for accent sensitive (i.e. the setting determines whether cafe = café).
You can use the collate keyword to convert one (or both) of the values' collations.
See link for more info: http://msdn.microsoft.com/en-us/library/aa258237(v=sql.80).aspx
Example: DBFiddle
--set up a couple of tables, populate them with the same words, only varying whether the accents are included
create table SomeWords (Word nvarchar(32) not null)
create table OtherWords (Word nvarchar(32) not null)
insert SomeWords (Word) values ('café'), ('store'), ('fiancé'), ('ampère'), ('cafétería'), ('fête'), ('jalapeño'), ('über'), ('zloty'), ('Zürich')
insert OtherWords (Word) values ('cafe'), ('store'), ('fiance'), ('ampere'), ('cafétería'), ('fete'), ('jalapeno'), ('uber'), ('zloty'), ('Zurich')
--now run a join between the two tables, showing what comes back when we use AS vs AI.
--NB: Since this could be run on a database of any collation I've used COLLATE on both sides of the equality operator
select sw.Word MainWord
, ow1.Word MatchAS
, ow2.Word MatchAI
from SomeWords sw
left outer join OtherWords ow1 on ow1.Word collate SQL_Latin1_General_CP1_CI_AS = sw.Word collate SQL_Latin1_General_CP1_CI_AS
left outer join OtherWords ow2 on ow2.Word collate SQL_Latin1_General_CP1_CI_AI = sw.Word collate SQL_Latin1_General_CP1_CI_AI
Example output:
MainWord   MatchAS    MatchAI
café                  cafe
store      store      store
fiancé                fiance
ampère                ampere
cafétería  cafétería  cafétería
fête                  fete
jalapeño              jalapeno
über                  uber
zloty      zloty      zloty
Zürich                Zurich
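
The AS versus AI distinction can also be modelled in a few lines of Python. This is a simplified sketch, not SQL Server's comparison rules: the helper names are made up, and "accent insensitive" is approximated by decomposing each character and dropping the combining marks.

```python
import unicodedata


def strip_accents(text):
    # Decompose each character (NFD) and drop the combining marks,
    # so "é" becomes "e".
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_ci_as(a, b):
    # Case-insensitive, accent-sensitive: accents must match.
    normalize = lambda s: unicodedata.normalize("NFC", s).casefold()
    return normalize(a) == normalize(b)


def equal_ci_ai(a, b):
    # Case-insensitive, accent-insensitive: accents are ignored.
    return strip_accents(a).casefold() == strip_accents(b).casefold()


print(equal_ci_as("café", "cafe"))  # False
print(equal_ci_ai("café", "cafe"))  # True
print(equal_ci_as("Store", "store"))  # True
```

Only the accent-insensitive comparison pairs café with cafe, which is why the MatchAS column is empty for the accented words in the output above.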

Can I specify the collation when performing a SELECT INTO

If I'm selecting from one source into another, can I specify the collation at the same time?
e.g.
SELECT Column1, Column2
INTO DestinationTable
FROM SourceTable
Where 'DestinationTable' doesn't already exist.
I know I can do something like
SELECT Column1, Column2 COLLATE Latin1_General_CI_AS
INTO DestinationTable
FROM SourceTable
In my real problem the data types of the column aren't known in advance so I can't just add the collation to each column. It's in a corner of a legacy application using large nasty stored procedures that generate SQL and I'm trying to get it working on a new server that has a different collation in tempdb with minimal changes.
I'm looking for something like:
SELECT Column1, Column2
INTO DestinationTable COLLATE Latin1_General_CI_AS
FROM SourceTable
But that doesn't work.
Can you create the table first?
You can define a collation for the relevant columns. On INSERT, they will be coerced.
It sounds like you don't know the structure of the target table though... so then no, you can't without dynamic SQL. Which will make things worse...
You can do it like this, if it helps:
SELECT *
INTO DestinationTable
FROM
(
SELECT Column1 COLLATE Latin1_General_CI_AS, Column2 COLLATE Latin1_General_CI_AS
FROM SourceTable
) as t
To correct Kasia's answer:
SELECT *
INTO DestinationTable
FROM
(
SELECT Column1 COLLATE Latin1_General_CI_AS as Column1
,Column2 COLLATE Latin1_General_CI_AS as Column2
FROM SourceTable
) as t
You have to add an alias for each column to get this to work.

how to do 'any(::text[]) ilike ::text'

here is table structure
table1
pk int, email character varying(100)[]
data
1, {'mr_a#gmail.com', 'mr_b#yahoo.com', 'mr_c#postgre.com'}
What I am trying to achieve is to find any record containing 'gmail'.
query
select * from table1 where any(email) ilike '%gmail%';
But any() can only be on the right-hand side of the operator, and unnest() might slow down performance. Does anyone have any idea?
Edit
Actually, I was a bit confused when I first posted. What I am trying to achieve goes through any(array[]).
this is my actual structure
pk int,
code1 character varying(100),
code2 character varying(100),
code3 character varying(100), ...
My first approach is
select * from table1 where code1 ilike '%code%' or code2 ilike '%code%' or...
Then I tried
select * from table1 where any(array[code1, code2, ...]) ilike '%code%'
which is not working.
Create an operator that implements ILIKE "backwards", e.g.:
CREATE FUNCTION backward_texticlike(text, text) RETURNS boolean
STRICT IMMUTABLE LANGUAGE SQL
AS $$ SELECT texticlike($2, $1) $$;
CREATE OPERATOR !!!~~* (
PROCEDURE = backward_texticlike,
LEFTARG = text,
RIGHTARG = text,
COMMUTATOR = ~~*
);
(Note that ILIKE internally corresponds to the operator ~~*. Pick your own name for the reverse.)
Then you can run
SELECT * FROM table1 WHERE '%code%' !!!~~* ANY(ARRAY[code1, code2, ...]);
Store email addresses in a normalized table structure. Then you can avoid the expense of unnest, have "proper" database design, and take full advantage of indexing. If you're looking to do full-text style queries, you should be storing your email addresses in a table and then using a tsvector datatype so you can perform full-text queries AND use indexes. ILIKE '%whatever%' is going to result in a full table scan since the planner can't take advantage of any index. With your current design and a sufficient number of records, unnest will be the least of your worries.
Update: Even with the updates to the question, using a normalized codes table will cause you the least amount of headache and result in optimal scans. Any time you find yourself creating numbered columns, it's a good indication that you might want to normalize. That being said, you can create a computed text column to use as a search words column. In your case you could create a search_words column that is populated on insert and update by a trigger. Then you can create a tsvector to build full-text queries on the search_words column.