How to make "case-insensitive" query in Postgresql? - postgresql

Is there any way to write case-insensitive queries in PostgreSQL, E.g. I want that following 3 queries return same result.
SELECT id FROM groups where name='administrator'
SELECT id FROM groups where name='ADMINISTRATOR'
SELECT id FROM groups where name='Administrator'

Use LOWER function to convert the strings to lower case before comparing.
Try this:
SELECT id
FROM groups
WHERE LOWER(name)=LOWER('Administrator')

using ILIKE instead of LIKE
SELECT id FROM groups WHERE name ILIKE 'Administrator'

The most common approach is to either lowercase or uppercase the search string and the data. But there are two problems with that.
It works in English, but not in all languages. (Maybe not even in
most languages.) Not every lowercase letter has a corresponding
uppercase letter; not every uppercase letter has a corresponding
lowercase letter.
Using functions like lower() and upper() will give you a sequential
scan. It can't use indexes. On my test system, using lower() takes
about 2000 times longer than a query that can use an index. (Test data has a little over 100k rows.)
There are at least three less frequently used solutions that might be more effective.
Use the citext module, which mostly mimics the behavior of a case-insensitive data type. Having loaded that module, you can create a case-insensitive index by CREATE INDEX ON groups (name::citext);. (But see below.)
Use a case-insensitive collation. This is set when you initialize a
database. Using a case-insensitive collation means you can accept
just about any format from client code, and you'll still return
useful results. (It also means you can't do case-sensitive queries. Duh.)
Create a functional index. Create a lowercase index by using CREATE
INDEX ON groups (LOWER(name));. Having done that, you can take advantage
of the index with queries like SELECT id FROM groups WHERE LOWER(name) = LOWER('ADMINISTRATOR');, or SELECT id FROM groups WHERE LOWER(name) = 'administrator'; You have to remember to use LOWER(), though.
The citext module doesn't provide a true case-insensitive data type. Instead, it behaves as if each string were lowercased. That is, it behaves as if you had called lower() on each string, as in number 3 above. The advantage is that programmers don't have to remember to lowercase strings. But you need to read the sections "String Comparison Behavior" and "Limitations" in the docs before you decide to use citext.

You can use ILIKE. i.e.
SELECT id FROM groups where name ILIKE 'administrator'

You can also read up on the ILIKE keyword. It can be quite useful at times, albeit it does not conform to the SQL standard. See here for more information: http://www.postgresql.org/docs/9.2/static/functions-matching.html

You could also use POSIX regular expressions, like
SELECT id FROM groups where name ~* 'administrator'
SELECT 'asd' ~* 'AsD' returns t

use ILIKE
select id from groups where name ILIKE 'adminstration';
If your coming the expressjs background and name is a variable
use
select id from groups where name ILIKE $1;

Using ~* can improve greatly on performance, with functionality of INSTR.
SELECT id FROM groups WHERE name ~* 'adm'
return rows with name that contains OR equals to 'adm'.

ILIKE work in this case:
SELECT id
FROM groups
WHERE name ILIKE 'Administrator'

For a case-insensitive parameterized query, you can use the following syntax:
"select * from article where upper(content) LIKE upper('%' || $1 || '%')"

-- Install 'Case Ignore Test Extension'
create extension citext;
-- Make a request
select 'Thomas'::citext in ('thomas', 'tiago');
select name from users where name::citext in ('thomas', 'tiago');

If you want not only upper/lower case but also diacritics, you can implement your own func:
CREATE EXTENSION unaccent;
CREATE OR REPLACE FUNCTION lower_unaccent(input text)
RETURNS text
LANGUAGE plpgsql
AS $function$
BEGIN
return lower(unaccent(input));
END;
$function$;
Call is then
select lower_unaccent('Hôtel')
>> 'hotel'

A tested approach is using ~*
As in the example below
SELECT id FROM groups WHERE name ~* 'administrator'

select id from groups where name in ('administrator', 'ADMINISTRATOR', 'Administrator')

Related

Postgres full text search for partial word

I've been seeing a lot of examples around like this one:
postgres full text search like operator
They all specify that you can do a prefix search like this:
SELECT *
FROM eventlogging
WHERE description_tsv ## to_tsquery('mess:*');
and it will retrieve a word like: "message"
However, what I do not see anywhere is whether or not there is a way to search for different parts of a word, such as a suffix?
The example that I am having trouble with right now is this:
CREATE TABLE IF NOT EXISTS project (
id VARCHAR NOT NULL,
org_name VARCHAR NOT NULL DEFAULT '',
project_name VARCHAR NOT NULL DEFAULT ''
);
insert into project(id, org_name, project_name) values ('123', 'org', 'proj');
insert into project(id, org_name, project_name) values ('456', 'huh', 'org');
insert into project(id, org_name, project_name) values ('789', 'orgs', 'project');
CREATE OR REPLACE FUNCTION get_projects(query_in VARCHAR)
RETURNS TABLE (id VARCHAR, org_name VARCHAR, project_name VARCHAR) AS $$
BEGIN
RETURN QUERY
SELECT * FROM project WHERE (
to_tsvector('simple', coalesce(project.project_name, '')) ||
to_tsvector('simple', coalesce(project.org_name, ''))
) ## to_tsquery('simple', query_in);
END;
$$ LANGUAGE plpgsql;
The following example returns:
select * from get_projects('org');
id org_name project_name
----------------------------
123 org proj
456 huh org
My question is: why does it not return orgs? Similarly, if I search for proj, I only get the project named "proj" but not the one named "project."
Bonus points: how can I get results if I search for a substring? For example, if I search for the string jec, I would like to get back the project named project. I'm not really looking for fuzzy searching, but I would say that I am looking for substring searching.
Am I completely wrong to be using to_tsquery? I also tried plainto_tsquery and I tried using english instead of simple, but several references said to stick with simple.
Full text search is different from substring search. Full text search is about searching whole words, omitting frequent words from indexing, ignoring inflection and the like. PostgreSQL full text search extends that somewhat by allowing prefix searches.
To search for substrings, you have to search with a condition like
WHERE word ~ 'suffix\M'
(This would be a suffix search with the regular expression matching operator ~.)
To speed up a search like that, create a trigram index:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON tab USING gin (doc gin_trgm_ops);
So-called prefix searching doesn't really thematically belong in full text searching. I think it was tossed in because, given that tokens would be stored in a btree anyway, adding that "feature" was free. No other types of partial matching are mentioned in the context of FTS because they don't exist.
You discuss the partial matching that does exist with FTS, the :* notation. But then in your example, you don't actually use it. That is why you don't see it working, because you don't use it. If you do use it, it does work:
select * from get_projects('org:*');
But given your description, it sounds like you don't want FTS in the first place. You want LIKE or regex, perhaps with index support from pg_trgm.
but several references said to stick with simple.
It is hard to know how good the judgement of anonymous references are, but if you only want to use 'simple' than most likely you shouldn't be using FTS in the first place. 'simple' is useful for analyzing or learning or debugging real FTS situations, and can be used as a baseline for building up more complex configurations.

Alphanumeric sorting without any pattern on the strings [duplicate]

I've got a Postgres ORDER BY issue with the following table:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record to the table,
I select the last record with SELECT * FROM employees ORDER BY em_code DESC
Strip alphabets from em_code usiging reg exp and store in ec_alpha
Cast the remating part to integer ec_num
Increment by one ec_num++
Pad with sufficient zeors and prefix ec_alpha again
When em_code reaches EM1000, the above algorithm fails.
First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code, breaking the unique key constraint.
Any idea how to select EM1000?
Since Postgres 9.6, it is possible to specify a collation which will sort columns with numbers naturally.
https://www.postgresql.org/docs/10/collation.html
-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en#colNumeric=yes');
-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
Value
0
0001
001
1
06
6
13
۱۳
14
One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
you can use just this line
"ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"
I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
I came up with something slightly different.
The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
ORDER BY ARRAY(
SELECT ROW(
CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
match[2]
)
FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
AS match
)
I thought about another way of doing this that uses less db storage than padding and saves time than calculating on the fly.
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
https://github.com/ccsalway/dbNaturalSort
The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.

Wildcard search in postgresql

In postgresql, I have mangaged to add wildcard pattern (*) to the query using SIMILAR TO option. So my query will be,
SELECT * FROM table WHERE columnName SIMILAR TO 'R*'
This query would return all entities starting from 'R' and not 'r'. I want to make it case insensitive.
Use ILIKE:
SELECT * FROM table WHERE columnName ILIKE 'R%';
or a case-insensitive regular expression:
SELECT * FROM table WHERE columnName ~* '^R.*';
Both are PostgreSQL extensions. Sanjaya has already outlined the standards-compliant approaches - filtering both sides with lower(...) or using a two-branch SIMILAR TO expression.
SIMILAR TO is less than lovely and best avoided. See this earlier answer.
You could write:
SELECT * FROM table WHERE columnName SIMILAR TO '(R|r)%'
but I don't particularly recommend using SIMILAR TO.
try
SELECT * FROM table WHERE columnName SIMILAR TO 'R%|r%'

How to select usernames start with a-to-c range in PostgreSQL?

I got a table with username. I want to select username starts with a to c. What is the SQL syntax for this in PostgreSQL? Thanks.
There are a couple of different ways, but I'd probably use a regular expression with a character set match:
SELECT * FROM users WHERE username ~ '^[a-cA-C]';
or a substring search:
SELECT * FROM users WHERE lower(left(username,1)) BETWEEN 'a' AND 'c';
In older versions of PostgreSQL the left function isn't available, so you have to use substring(username from 1 for 1) instead.
See string functions and pattern matching for more information.

PostgreSql XML Text search

I have a text column in a table. We store XML in this column. Now I want to search for tags and values
Example data:
<bank>
<name>Citi Bank</name>
.....
.....
/<bank>
I would like to run the following query:
select * from xxxx where to_tsvector('english',xml_column) ## to_tsquery('<name>Citi Bank</name>')
This works fine but it also works for tags like name1 or no tag.
How do I have to setup my search in order for this to work so I get an exact match for the tag and value ?
You could use the xpath function like this
select *
from xxx
where xpath(xml_column, 'bank/name/text()') = 'CitiBank';
BUT it won't use the full-text search index. You could use a subquery to find probable matches and avoid full scans, and the xpath expression for getting correct answers, or create a function index if the queries are going to be always the same.
You might want to reconsider storing XML in a database, instead you could look at inserting the data into related tables, since using XML is a poor replacement for a relational store. Even if you go with XML in database, use the XML type, not the TEXT type, and create an index like this (yes, basically you'd need an index per xpath expression):
CREATE INDEX my_funcidx ON my_table USING GIN ( CAST(xpath('/bank/name/text()', xmlfield) AS TEXT[]) );
then, query it like this:
SELECT * FROM my_table WHERE CAST(xpath('/bank/name/text()', xmlfield) AS TEXT[]) #> '{Citi Bank}'::TEXT[];
and this will use the index, as EXPLAIN will indicate.
The important part is the CASTing to TEXT[], as XML[], which the xpath function returns, isn't indexable by default.