Django ORM: MyModel.objects.filter(name__iexact__in=[...]) - django-orm

I have a list of strings and I would like to do a case-insensitive "in" lookup with the Django ORM.
I know that I can do this:
from functools import reduce
from django.db.models import Q

name_list = ['Alpha', 'bEtA', 'omegA']
q_list = [Q(name__iexact=n) for n in name_list]
query = reduce(lambda a, b: a | b, q_list)
MyModel.objects.filter(query)
But maybe there is a simpler solution with the more modern Django ORM?

The IN operator (and other MySQL operators, except binary comparisons) is case-insensitive by default, unless you have changed the table collation. In most cases, you can simply use MyModel.objects.filter(name__in=name_list).
See https://blog.sqlauthority.com/2007/04/30/case-sensitive-sql-query-search/ to know more about collation.
P.S. Avoid building queries with map/reduce like this: if name_list is long enough, the resulting chain of ORed conditions can bring your MySQL server down.
Edit:
For PostgreSQL you can do the following (Lower comes from django.db.models.functions): MyModel.objects.annotate(name_lower=Lower('name')).filter(name_lower__in=[name.lower() for name in name_list])
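The annotate-plus-Lower query boils down to lowering both sides of the comparison in SQL. Here is a minimal sketch of that idea using Python's built-in sqlite3 module; the table and column names are invented for illustration and do not come from the question:

```python
import sqlite3

# Stand-in table for MyModel; lower() both the column and the
# search values, which is what the Lower() annotation compiles to.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (name TEXT)")
conn.executemany("INSERT INTO mymodel VALUES (?)",
                 [("Alpha",), ("BETA",), ("gamma",)])

name_list = ["Alpha", "bEtA", "omegA"]
lowered = [n.lower() for n in name_list]
placeholders = ", ".join("?" for _ in lowered)
rows = conn.execute(
    f"SELECT name FROM mymodel WHERE lower(name) IN ({placeholders})",
    lowered,
).fetchall()
print(sorted(r[0] for r in rows))  # ['Alpha', 'BETA'] — gamma not in the list
```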


Parse query, identify field/table usage

Is there some modern Postgres feature that allows auditing which tables and fields are used in SELECTs?
I saw that tooling such as the Envoy proxy has something for parsing and basic stats, but it seems to fall short of a more complete analysis.
e.g. a table such as:
schema, table, field, selected_times, in_where_clause_times
We wrote a library that does just that, to power pganalyze (a Postgres monitoring tool).
The library is called pg_query (Ruby, open-source, MIT licensed), and takes the Postgres parser source code and packages it as a library:
PgQuery.parse("SELECT ? FROM x JOIN y USING (id) WHERE z = ?").tables
=> ["x", "y"]
PgQuery.parse("SELECT ? FROM x WHERE x.y = ? AND z = ?").filter_columns
=> [["x", "y"], [nil, "z"]]
See https://github.com/pganalyze/pg_query#extracting-tables-from-a-query and https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser for more context. There are also wrappers around the core parser in other languages, but not all have the tables/filter_columns helpers.
Note that this is based on the raw parser, i.e. you need to interpret which tables this refers to (e.g. if you have different schemas and set a search_path, etc).
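For a rough feel of what such extraction involves, here is a deliberately naive Python sketch that pulls identifiers following FROM or JOIN with a regex. This is only a toy: a real tool like pg_query uses the actual Postgres grammar and correctly handles subqueries, CTEs, quoting, and comments, which this does not:

```python
import re

def naive_tables(sql: str) -> list[str]:
    # Toy illustration only: grab identifiers that follow FROM or JOIN.
    # Breaks on subqueries, CTEs, quoted names, comments, etc.
    return re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_.]*)",
                      sql, flags=re.IGNORECASE)

print(naive_tables("SELECT ? FROM x JOIN y USING (id) WHERE z = ?"))
# ['x', 'y']
```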

PostgreSQL case-insensitive and accent-insensitive search

I have a data table and I would like to filter the columns. For example, to search for a user by his fullname.
However, I would like the user's search phrase to match case-insensitively and accent-insensitively.
So I have checked these (and more) sources and questions:
https://stackoverflow.com/a/11007216
How to ignore case sensitive rows in PostgreSQL
https://www.postgresql.org/docs/current/collation.html#COLLATION-NONDETERMINISTIC
I thought nondeterministic collations might finally be the right way to achieve that, but unfortunately I don't know how to:
combine case_insensitive and ignore_accents into one collation
allow searching by substring in such a WHERE clause (e.g., find "Jóhn Doe" by the string "joh" alone), since nondeterministic collations do not support LIKE or regular expressions
which index to use
I would be very grateful for any advice on how to finally deal with this type of problem.
Thanks!
Creating case and accent insensitive ICU collations is pretty simple:
CREATE COLLATION english_ci_ai (
PROVIDER = icu,
DETERMINISTIC = FALSE,
LOCALE = 'en-US-u-ks-level1'
);
Or, equivalently (this syntax also works with old ICU versions):
CREATE COLLATION english_ci_ai (
PROVIDER = icu,
DETERMINISTIC = FALSE,
LOCALE = 'en-US@colStrength=primary'
);
See the ICU documentation for details and my article for a detailed discussion.
But your problem is that you want substring search. So you should create a trigram index:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS unaccent;
CREATE INDEX ON tab USING gin (unaccent(doc) gin_trgm_ops);
(Note that unaccent() is not marked IMMUTABLE, so to use it in an index expression you may have to wrap it in your own IMMUTABLE SQL function.)
Then you can search like this:
SELECT * FROM tab
WHERE unaccent(doc) ILIKE unaccent('%joh%');
Note that you have to force a minimal length of 4 or so on the search string if you want that to be efficient.
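The unaccent step itself can be approximated outside the database with Python's stdlib unicodedata, which is handy for normalizing the search string on the application side before it is sent to Postgres. This is a sketch of the idea, not the Postgres extension:

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Approximates unaccent(): decompose to NFD, then drop the
    # combining marks, leaving only the base characters.
    return "".join(ch for ch in unicodedata.normalize("NFD", text)
                   if not unicodedata.combining(ch))

needle = strip_accents("Jóhn Doe").lower()
print(needle)          # john doe
print("joh" in needle) # True — "Jóhn" is now findable by plain "joh"
```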

Slick compile query with Set[Int] parameter

I have a query that takes Seq[Int] as its argument (and performs filtering like WHERE x IN (...)), and I need to compile it since this query is fairly complex. However, when I try the naive approach:
Compiled((xs: Set[Int]) => someQuery.filter(_.x inSet xs))
It fails with the message:
Computation of type Set[Int] => Query[SomeTable, SomeValue, Seq] cannot be compiled (as type C)
Can Slick compile queries that take a set of integers as a parameter?
UPDATE: I use PostgreSQL as database, so it can be possible to use arrays instead of IN clause, but how?
As for the PostgreSQL database, the solution is much simpler than I expected.
First of all, you need a special Slick driver for PostgreSQL that supports arrays. It is usually already included in projects that rely on PostgreSQL features, so there is no trouble at all. I use this driver.
The main idea is to replace the plain SQL IN (...) clause, which takes as many bind parameters as there are items in the list and thus cannot be statically compiled by Slick, with the PostgreSQL-specific array operator x = ANY(arr), which takes only one parameter for the whole array. It's easy to do with code like this:
val compiledQuery = Compiled((x: Rep[List[Int]]) => query.filter(_.id === x.any))
This code will generate query like WHERE x = ANY(?) which will use only one parameter, so Slick will accept it for compilation.
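The same one-parameter trick can be sketched outside Slick: instead of N placeholders for IN (...), pass the whole list as one bound value so the statement text never changes and can be prepared once. In PostgreSQL that is = ANY(?); the stdlib-only analogue below uses SQLite's json_each to expand a single JSON-encoded parameter (assumes a SQLite build with JSON support, as in recent CPython releases):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

ids = {2, 3, 5, 7}  # the Set[Int] from the question
# One bind parameter regardless of how many ids there are.
rows = conn.execute(
    "SELECT id FROM t WHERE id IN (SELECT value FROM json_each(?))",
    (json.dumps(sorted(ids)),),
).fetchall()
print(sorted(r[0] for r in rows))  # [2, 3, 5, 7]
```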

PostgreSQL like not returning matching instances

How does the PostgreSQL LIKE operator work? I'm using token inputs to limit user input to existing values only.
I have the following values in the DB:
`Yellow, White, Orange...`
My Code
@colors = Color.where("name like ?", "%#{params[:q]}%")
If I type in "w", for example, "White" is not proposed; I have to type a second letter before it appears. Because the DB values all start with a capital letter, I suspect a difference from SQLite.
I found this post, which mentions ILIKE, but I was wondering if there is common code that works with both Postgres and SQLite.
The SQLite LIKE operator is case-insensitive by default.
In PostgreSQL ILIKE is the case insensitive version of LIKE. There are also operators:
~~ .. LIKE
~~* .. ILIKE
!~~ .. NOT LIKE
!~~* .. NOT ILIKE
These three expressions are all effectively the same in PostgreSQL:
name ilike '%w%'
name ~~* '%w%'
lower(name) like lower('%w%')
The last line mostly works in both SQLite and PostgreSQL.
A limitation applies: SQLite only understands lower / upper case of ASCII characters, while PostgreSQL understands other UTF-8 characters, too.
The case-sensitivity of LIKE depends on the database you use. Some databases ignore case when using LIKE, some don't, some look at various configuration options. One way around this is to normalize the case yourself by converting to upper or lower case:
@colors = Color.where("lower(name) like ?", "%#{params[:q].downcase}%")
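The portable lower-both-sides variant is easy to see with Python's stdlib sqlite3; the same SQL runs unchanged on Postgres. The table and values below are stand-ins inspired by the question, not its actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colors (name TEXT)")
conn.executemany("INSERT INTO colors VALUES (?)",
                 [("Red",), ("White",), ("Orange",)])

q = "w"  # the user's one-letter search term
# Lower both the column and the pattern, so 'w' matches 'White'
# regardless of the database's LIKE case rules.
rows = conn.execute(
    "SELECT name FROM colors WHERE lower(name) LIKE ?",
    (f"%{q.lower()}%",),
).fetchall()
print([r[0] for r in rows])  # ['White']
```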

String sort order (LC_COLLATE and LC_CTYPE)

Apparently PostgreSQL allows different locales for each database since version 8.4
So I went to the docs to read about locales (http://www.postgresql.org/docs/8.4/static/locale.html).
String sort order is of my particular interest (I want strings sorted like 'A a b c D d' and not 'A B C ... Z a b c').
Question 1: Do I only need to set LC_COLLATE (String sort order) when I create a database?
I also read about LC_CTYPE (character classification: what is a letter? what is its upper-case equivalent?).
Question 2: Can someone explain what this means?
The sort order you describe is the standard in most locales.
Just try for yourself:
SELECT regexp_split_to_table('D d a A c b', ' ') ORDER BY 1;
When you initialize your db cluster with initdb you can pick a locale with --locale=some_locale. In my case it's --locale=de_AT.UTF-8. If you don't specify anything, the locale is inherited from the environment: your current system locale will be used.
The template database of the cluster will be set to that locale. When you create a new database, it inherits the settings from the template. Normally you don't have to worry about anything, it all just works.
Read the chapter on CREATE DATABASE for more.
If you want to speed up text search with indexes, be sure to read about operator classes, as well.
All links to version 8.4, as you specifically asked for that.
In PostgreSQL 9.1 or later, there is collation support that allows more flexible use of collations:
"The collation feature allows specifying the sort order and character classification behavior of data per-column, or even per-operation. This alleviates the restriction that the LC_COLLATE and LC_CTYPE settings of a database cannot be changed after its creation."
Compared to other databases, PostgreSQL is a lot more stringent about case sensitivity. To work around this when ordering, you can use string functions to make the comparison case-insensitive:
SELECT * FROM users ORDER BY LOWER(last_name), LOWER(first_name);
If you have a lot of data it will be inefficient doing this across a whole table every time you want to display a list of records. An alternative is to use the citext module, which provides a type that is internally case insensitive when doing comparisons.
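The effect of ordering by LOWER(...) can be seen in plain Python: sorting with a casefolded key mimics what the SQL does (ignoring full locale collation rules), giving the mixed 'A a b c D d' style order instead of all capitals first:

```python
words = ["D", "d", "a", "A", "c", "b"]

# ASCII byte order puts all capitals first: A D a b c d
print(sorted(words))                    # ['A', 'D', 'a', 'b', 'c', 'd']

# Case-insensitive key, like ORDER BY LOWER(col): cases interleave.
print(sorted(words, key=str.casefold))  # ['a', 'A', 'b', 'c', 'D', 'd']
```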
Bonus:
You might come into this issue when searching too, in this there is a case insensitive pattern matching operator:
SELECT * FROM users WHERE first_name ILIKE '%john%';
Answer to question 1:
The LC_COLLATE and LC_CTYPE settings are determined when a database is created, and cannot be changed except by creating a new database.